PASIG 2019 LOCKSS Seminar: Technical Overview
Thib Guicherd-Callin – Technical Manager, LOCKSS – github.com/thibgc
1. Origins of LOCKSS Technology in Research Libraries
2. LOCKSS Approach to Digital Preservation Threat Models
3. LOCKSS Polling and Repair Primer
4. LOCKSS Software in Motion
1. Origins of LOCKSS Technology in Research Libraries
Research Libraries in the Paper Era
● Ownership model
● Many independent replicas
● Features
  ○ Disaster resistance
  ○ Disaster recovery
  ○ Tamper evident
  ○ Permanent access
Research Libraries in the Web Era
● Leasing model
● One master copy
● Misfeatures
  ○ Disaster resistance?
  ○ Disaster recovery?
  ○ Tamper evident?
  ○ Permanent access?
LOCKSS Technology in Response
● Re-establish ownership
● Inter-library collaboration
● Diversity
  ○ Geography
  ○ Hardware
  ○ Software
  ○ Organizational structure
  ○ Jurisdiction
2. LOCKSS Approach to Digital Preservation Threat Models
Digital Preservation Threat Models
David S.H. Rosenthal, Thomas S. Robertson, Tom Lipkis, Vicky Reich, Seth Morabito. Requirements for Digital Preservation Systems: A Bottom-Up Approach. D-Lib Magazine, vol. 11, iss. 11, November 2005. DOI: 10.1045/november2005-rosenthal
Digital Preservation: Definition and Key Properties
The goal of a digital preservation system is that the information it contains remains accessible to users over a period of time much longer than the lifetime of individual storage media, hardware and software components.
● No single point of failure
● Media, hardware and software components flow through the system as they fail or are replaced
● Regular audits, frequent enough to keep the probability of irrecoverable failure acceptably low
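To make the audit-frequency requirement concrete, here is a back-of-the-envelope model that is not part of the talk. It assumes each replica is damaged independently, as a Poisson process with a fixed yearly rate, and that every audit detects and repairs all damage, so content is lost only if every replica is damaged within the same audit interval. The function name and the parameter values are illustrative assumptions.

```python
import math

def loss_probability(replicas: int, damage_rate_per_year: float,
                     audits_per_year: int, horizon_years: int) -> float:
    """Probability that, in some audit interval within the horizon, every
    replica is damaged before the next audit can trigger a repair.

    Illustrative model only: independent Poisson damage per replica,
    perfect detection and repair at each audit."""
    interval = 1.0 / audits_per_year
    # Chance that a single replica is damaged during one audit interval.
    p = -math.expm1(-damage_rate_per_year * interval)
    # All replicas must be damaged in the same interval for content to be lost.
    per_interval_loss = p ** replicas
    intervals = audits_per_year * horizon_years
    # 1 - (1 - x)^n, computed with log1p/expm1 to stay accurate for tiny x.
    return -math.expm1(intervals * math.log1p(-per_interval_loss))

for audits in (1, 12, 52):
    risk = loss_probability(replicas=3, damage_rate_per_year=0.05,
                            audits_per_year=audits, horizon_years=50)
    print(f"{audits:>2} audits/year: P(loss over 50 years) = {risk:.2e}")
```

With more than one replica, shortening the audit interval shrinks the per-interval loss term faster than it adds intervals, which is why audit frequency appears among the key properties. The model's independence assumption is exactly what correlated threats (a shared software bug, a single jurisdiction, one organization) violate, which is the reason the LOCKSS approach stresses diversity.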
Threat Taxonomy
● Media failure
● Hardware failure
● Software failure
● Communication errors
● Failure of network services
● Natural disaster

● Media and hardware obsolescence
● Software obsolescence

● Operator error
● Economic failure
● Organizational failure

● External attack
● Internal attack
LOCKSS Polling and Repair
● Landslide agreement: take no action (high confidence in outcome)
● Inconclusive agreement: take no action and raise an alarm (low confidence in outcome)
● Landslide disagreement: seek repair and notify (high confidence in outcome)
Attacker's goal (stealth modification gap)
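The three outcomes can be expressed as a small classification rule. This is a sketch, not the LCAP implementation: the 80% landslide threshold and the names `classify_poll` and `PollOutcome` are assumptions chosen for illustration, and the sketch reduces a poll to a simple count of agreeing and disagreeing votes.

```python
from enum import Enum

class PollOutcome(Enum):
    LANDSLIDE_AGREEMENT = "take no action"
    INCONCLUSIVE = "take no action and raise an alarm"
    LANDSLIDE_DISAGREEMENT = "seek repair and notify"

# Hypothetical threshold for illustration; not a LOCKSS configuration value.
LANDSLIDE_THRESHOLD = 0.8

def classify_poll(agree: int, disagree: int,
                  threshold: float = LANDSLIDE_THRESHOLD) -> PollOutcome:
    """Map vote counts to one of the three poll outcomes."""
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1.0] so outcomes cannot overlap")
    total = agree + disagree
    if total == 0:
        raise ValueError("a poll needs at least one vote")
    if agree / total >= threshold:
        return PollOutcome.LANDSLIDE_AGREEMENT
    if disagree / total >= threshold:
        return PollOutcome.LANDSLIDE_DISAGREEMENT
    # Neither side won decisively, so the result itself is suspicious.
    return PollOutcome.INCONCLUSIVE

print(classify_poll(agree=10, disagree=0).value)
print(classify_poll(agree=6, disagree=4).value)
```

Everything between the two thresholds forms the inconclusive band: a poll landing there triggers no repair but does raise an alarm, so a human can investigate rather than the system acting on a low-confidence result.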
3. LOCKSS Polling and Repair Primer
Polling and repair walkthrough (six peers, P1 to P6, each holding a replica of content X):
1. The peers hold identical replicas of X. Peer P1 calls a poll on content X, asking the other peers: "What is hash(X)?"
2. P2, P3, P4, P5 and P6 each reply hash(X) = h1, matching P1's own hash. P1 concludes: "P2, P3, P4, P5, P6 agreed with me on X." Outcome: landslide agreement.
3. Peer P2 calls a poll on content X: "What is hash(X)?"
4. P1, P3, P4, P5 and P6 each reply hash(X) = h1. P2 concludes: "P1, P3, P4, P5, P6 agreed with me on X." Outcome: landslide agreement.
5. Peer P1 incurs damage on content X. Later, P1 calls a poll on content X: "What is hash(X)?"
6. The other five peers still reply hash(X) = h1, but P1's damaged replica now hashes to h2. Outcome: landslide disagreement.
7. P1 sends a repair request: "Help me repair X."
8. A peer that has previously seen P1 agree with it on X (P2 recorded this in step 4) reasons "P1 agreed with me on X" and sends P1 a repair.
9. The peers once again hold identical replicas of X.
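The walkthrough can be reproduced with a toy simulation. It simplifies heavily and is not the LCAP protocol: every other peer votes (no sampling of voters), SHA-256 stands in for the hash, a fresh random nonce is mixed into each poll so a voter must hash the content it holds now, the 80% landslide threshold is illustrative, and the names `Peer`, `call_poll` and `request_repair` are hypothetical. The repair rule follows the slides: a peer serves a repair only to a peer that previously agreed with it on X.

```python
from __future__ import annotations

import hashlib
import os

# Illustrative landslide threshold; not a LOCKSS configuration value.
LANDSLIDE = 0.8

class Peer:
    def __init__(self, name: str, content: bytes) -> None:
        self.name = name
        self.content = content
        # Names of peers that agreed with this peer in polls it called.
        self.agreed_with: set[str] = set()

    def hash_with(self, nonce: bytes) -> str:
        # A fresh nonce forces each voter to hash the content it holds now.
        return hashlib.sha256(nonce + self.content).hexdigest()

def call_poll(poller: Peer, voters: list[Peer]) -> str:
    """Ask each voter "What is hash(X)?" and classify the result."""
    nonce = os.urandom(16)
    expected = poller.hash_with(nonce)
    agreeing = [v for v in voters if v.hash_with(nonce) == expected]
    poller.agreed_with.update(v.name for v in agreeing)
    disagreeing = len(voters) - len(agreeing)
    if len(agreeing) / len(voters) >= LANDSLIDE:
        return "landslide agreement"
    if disagreeing / len(voters) >= LANDSLIDE:
        return "landslide disagreement"
    return "inconclusive"

def request_repair(requester: Peer, peers: list[Peer]) -> Peer | None:
    """Copy X from the first peer that has seen the requester agree with it."""
    for peer in peers:
        if requester.name in peer.agreed_with:
            requester.content = peer.content
            return peer
    return None

peers = {n: Peer(n, b"content X") for n in ("P1", "P2", "P3", "P4", "P5", "P6")}

def others(name: str) -> list[Peer]:
    return [p for n, p in peers.items() if n != name]

print(call_poll(peers["P1"], others("P1")))  # P1 polls on X
print(call_poll(peers["P2"], others("P2")))  # P2 polls, recording P1's agreement
peers["P1"].content = b"damaged X"           # P1 incurs damage on X
print(call_poll(peers["P1"], others("P1")))  # P1 polls again
repairer = request_repair(peers["P1"], others("P1"))
print("repaired by", repairer.name if repairer else None)
print(call_poll(peers["P1"], others("P1")))  # replicas identical again
```

In this sketch, the repair rule means a peer can only obtain a repair of content it has previously shown, through an agreeing vote, that it already held.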
4. LOCKSS Software in Motion
Functionality of the LOCKSS System
● Storage layer (POSIX file system → HDFS)
● Web crawler (LOCKSS Crawler)
● LOCKSS polling and repair protocol (LCAP)
● Metadata extraction and metadata database
● Web replay (ServeContent → OpenWayback, Pywb)
Evolution of the LOCKSS System
● Standalone Java software stack performing all functions, controlled via a Web user interface
● Standalone Java software stack performing all functions, controlled via a Web user interface and a limited set of Web services
● Suite of Java software components performing specialized functions, controlled via a Web user interface and REST Web services
Evolution of the LOCKSS Platform
● A dedicated physical machine with a read-only OpenBSD operating system and read-only configuration data, running the standalone LOCKSS software exclusively, with locally attached disk storage
● A physical or virtual machine with a Linux operating system, running the standalone LOCKSS software exclusively or non-exclusively, with locally attached or proximally available storage, with an optional local database
● A physical or virtual machine, running a set of Docker containers, with local or remote HDFS storage and database connections
Software Modernization Initiative
● "A successful 20-year-old open source codebase is still a 20-year-old codebase"
● Spring Framework
● Major refactoring
● Distribution as artifacts on Maven Central and as Docker containers on Docker Hub
● Orchestration through Docker (Docker Swarm, Docker Stack)
Thank You
● Resources
  ○ LOCKSS Web site: lockss.org
  ○ LOCKSS Documentation Portal: lockss.github.io
● Software
  ○ LOCKSS at GitHub: github.com/lockss
  ○ LOCKSS at Maven Central: group ID org.lockss
  ○ LOCKSS at Docker Hub: hub.docker.com/u/lockss
● Communication
  ○ Twitter: twitter.com/lockss
  ○ Slack: tinyurl.com/slackjoinlockss
● Q&A