PASIG 2019 LOCKSS Seminar: Technical Overview Thib Guicherd-Callin – Technical Manager, LOCKSS Program [email protected] – github.com/thibgc

PASIG 2019 LOCKSS Seminar: Technical Overview

1. Origins of LOCKSS Technology in Research Libraries

2. LOCKSS Approach to Digital Preservation Threat Models

3. LOCKSS Polling and Repair Primer

4. LOCKSS Software in Motion

1. Origins of LOCKSS Technology in Research Libraries

Research Libraries in the Paper Era

● Ownership model
● Many independent replicas
● Features
  ○ Disaster resistance
  ○ Disaster recovery
  ○ Tamper evident
  ○ Permanent access

Research Libraries in the Web Era

● Leasing model
● One master copy
● Misfeatures
  ○ Disaster resistance?
  ○ Disaster recovery?
  ○ Tamper evident?
  ○ Permanent access?

LOCKSS Technology in Response

● Re-establish ownership
● Inter-library collaboration
● Diversity
  ○ Geography
  ○ Hardware
  ○ Software
  ○ Organizational structure
  ○ Jurisdiction

2. LOCKSS Approach to Digital Preservation Threat Models

Digital Preservation Threat Models

David S.H. Rosenthal, Thomas S. Robertson, Tom Lipkis, Vicky Reich, Seth Morabito. Requirements for Digital Preservation Systems: A Bottom-Up Approach. D-Lib Magazine, vol. 11, iss. 11, November 2005. DOI: 10.1045/november2005-rosenthal

Digital Preservation: Definition and Key Properties

The goal of a digital preservation system is that the information it contains remains accessible to users over a period of time much longer than the lifetime of individual storage media, hardware and software components.

● No single point of failure
● Media, hardware and software flow through as they fail or are replaced
● Regular audits, frequent enough to keep the probability of irrecoverable failure acceptable
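
The link between audit frequency and the probability of irrecoverable failure can be illustrated with a rough back-of-the-envelope sketch. The model is an assumption made for illustration, not something stated in the talk: each replica is damaged independently with some probability per audit interval, an audit repairs all damage as long as one replica is intact, and the per-interval damage probability scales linearly with the interval length. The function name and all numbers are hypothetical.

```python
def loss_probability(p_damage: float, replicas: int, intervals: int) -> float:
    """Probability of irrecoverable loss over a number of audit intervals.

    Assumed toy model: every replica is damaged independently with
    probability p_damage per interval, and each audit fully repairs
    the damage if at least one replica is still intact.
    """
    p_all_damaged = p_damage ** replicas  # every replica hit in the same interval
    return 1.0 - (1.0 - p_all_damaged) ** intervals


# Hypothetical numbers, for illustration only: 4 replicas over 10 years.
yearly = loss_probability(p_damage=0.05, replicas=4, intervals=10)
monthly = loss_probability(p_damage=0.05 / 12, replicas=4, intervals=120)
print(f"audit once a year:  {yearly:.2e}")
print(f"audit once a month: {monthly:.2e}")
```

Under these assumptions, shortening the audit interval lowers the chance that every replica is damaged before the next audit can repair it.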

Threat Taxonomy

● Media failure
● Hardware failure
● Software failure
● Communication errors
● Failure of network services
● Natural disaster

● Media and hardware obsolescence
● Software obsolescence

● Operator error
● Economic failure
● Organizational failure

● External attack
● Internal attack

LOCKSS Polling and Repair

● Landslide agreement: take no action (high confidence in outcome)
● Inconclusive agreement: take no action and raise alarm (low confidence in outcome)
● Landslide disagreement: seek repair and notify (high confidence in outcome)

Attacker's goal (stealth modification gap)
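
The three outcomes above can be sketched as a small classifier over a vote tally. This is a minimal illustration, not the LOCKSS implementation: the `landslide` threshold and its default value are hypothetical, and the real protocol decides this differently in detail. One reading of the "gap" annotation is that a tally cannot drift from one landslide to the other without passing through the inconclusive band, which raises an alarm.

```python
from enum import Enum


class Outcome(Enum):
    LANDSLIDE_AGREEMENT = "take no action"
    INCONCLUSIVE = "take no action and raise alarm"
    LANDSLIDE_DISAGREEMENT = "seek repair and notify"


def classify(agree: int, disagree: int, landslide: float = 0.8) -> Outcome:
    """Map a vote tally to one of the three outcomes on the slide.

    `landslide` is a hypothetical threshold: the fraction of votes that
    must point the same way for the result to count as a landslide.
    """
    total = agree + disagree
    if total == 0:
        return Outcome.INCONCLUSIVE  # no votes, so no confidence either way
    if agree / total >= landslide:
        return Outcome.LANDSLIDE_AGREEMENT
    if disagree / total >= landslide:
        return Outcome.LANDSLIDE_DISAGREEMENT
    return Outcome.INCONCLUSIVE  # the band between the two landslides


for tally in [(5, 0), (3, 2), (0, 5)]:
    outcome = classify(*tally)
    print(tally, outcome.name, "->", outcome.value)
```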

3. LOCKSS Polling and Repair Primer

[Diagram: peers P1 through P6, each holding a replica of X]

The peers hold identical replicas of X
Peer P1 calls a poll on content X: "What is hash(X)?"

[Diagram: P2, P3, P4, P5 and P6 each reply hash(X) = h1]

P1: "P2, P3, P4, P5, P6 agreed with me on X"
Landslide agreement
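
A poll of this kind can be sketched in a few lines. This is a simplified illustration under stated assumptions: SHA-256 stands in for the unspecified hash(X), a dictionary stands in for the network of peers, and the function names are hypothetical. The actual LOCKSS polling protocol is more involved than a plain hash comparison.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash of a replica; SHA-256 is an assumption, the slides only say hash(X)."""
    return hashlib.sha256(data).hexdigest()


def call_poll(poller: str, replicas: dict[str, bytes]) -> tuple[list[str], list[str]]:
    """Return (agreeing peers, disagreeing peers) from the poller's point of view."""
    mine = content_hash(replicas[poller])
    agree, disagree = [], []
    for peer, data in replicas.items():
        if peer == poller:
            continue  # the poller does not vote in its own poll
        (agree if content_hash(data) == mine else disagree).append(peer)
    return agree, disagree


# Six peers holding identical replicas of X, as on the slide.
replicas = {f"P{i}": b"content X" for i in range(1, 7)}
agree, disagree = call_poll("P1", replicas)
print("agreed with P1:", agree)
print("disagreed with P1:", disagree)
```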

[Diagram: peers P1 through P6, each holding a replica of X]

Peer P2 calls a poll on content X: "What is hash(X)?"

[Diagram: P1, P3, P4, P5 and P6 each reply hash(X) = h1]

P2: "P1, P3, P4, P5, P6 agreed with me on X"
Landslide agreement

[Diagram: peers P1 through P6; P1's replica of X is damaged]

Peer P1 incurs damage on content X
Peer P1 later calls a poll on content X: "What is hash(X)?"

[Diagram: P2 through P6 each reply hash(X) = h1; P1's damaged replica gives hash(X) = h2]

Landslide disagreement

[Diagram: P1 sends a repair request to another peer]

P1: "Help me repair X"
Repair request

[Diagram: the repairing peer sends P1 a copy of X]

Repairing peer: "P1 agreed with me on X"
Repair

[Diagram: peers P1 through P6, each holding a replica of X]

The peers hold identical replicas of X
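
The whole walk-through (two successful polls, damage, landslide disagreement, repair) can be sketched end to end. This is a toy model under stated assumptions, not the LOCKSS code: the `Peer` class, its method names and the use of SHA-256 are hypothetical, and the rule that a peer only repairs someone who agreed with it earlier is inferred from the "P1 agreed with me on X" caption.

```python
import hashlib


class Peer:
    def __init__(self, name: str, replica: bytes):
        self.name = name
        self.replica = replica
        self.agreed_with_me: set[str] = set()  # peers that matched in polls I called

    def vote(self) -> str:
        # SHA-256 is an assumption; the slides only say hash(X).
        return hashlib.sha256(self.replica).hexdigest()

    def call_poll(self, others: list["Peer"]) -> list["Peer"]:
        """Poll the other peers; remember who agreed, return who disagreed."""
        mine = self.vote()
        disagreeing = []
        for peer in others:
            if peer.vote() == mine:
                self.agreed_with_me.add(peer.name)
            else:
                disagreeing.append(peer)
        return disagreeing

    def supply_repair(self, requester: "Peer") -> bool:
        """Send my replica only to a peer that agreed with me in the past."""
        if requester.name not in self.agreed_with_me:
            return False
        requester.replica = self.replica
        return True


peers = [Peer(f"P{i}", b"content X") for i in range(1, 7)]
p1, p2 = peers[0], peers[1]

p1.call_poll([p for p in peers if p is not p1])  # landslide agreement
p2.call_poll([p for p in peers if p is not p2])  # P2 records that P1 agreed
p1.replica = b"c0ntent X"                        # P1 incurs damage
disagreeing = p1.call_poll([p for p in peers if p is not p1])

repaired = False
if len(disagreeing) == len(peers) - 1:           # landslide disagreement
    repaired = any(peer.supply_repair(p1) for peer in disagreeing)

still_disagreeing = p1.call_poll([p for p in peers if p is not p1])
print("repaired:", repaired, "| peers disagreeing after repair:", len(still_disagreeing))
```

In this sketch only P2 can supply the repair, because it is the only other peer that has called a poll and therefore the only one with a record of P1's earlier agreement.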

4. LOCKSS Software in Motion

Functionality of the LOCKSS System

● Storage layer (POSIX file system → HDFS)
● Web crawler (LOCKSS Crawler)
● LOCKSS polling and repair protocol (LCAP)
● Metadata extraction and metadata database
● Web replay (ServeContent → OpenWayback, Pywb)

Evolution of the LOCKSS System

● Standalone Java software stack performing all functions, controlled via a Web user interface

● Standalone Java software stack performing all functions, controlled via a Web user interface and a limited set of Web services

● Suite of Java software components performing specialized functions, controlled via a Web user interface and REST Web services

Evolution of the LOCKSS Platform

● A dedicated physical machine with a read-only OpenBSD operating system and read-only configuration data, running the standalone LOCKSS software exclusively, with locally attached disk storage

● A physical or virtual machine with a Linux operating system, running the standalone LOCKSS software exclusively or non-exclusively, with locally attached or proximally available storage, with an optional local database

● A physical or virtual machine, running a set of Docker containers, with local or remote HDFS storage and database connections

Software Modernization Initiative

● "A successful 20-year-old open source codebase is still a 20-year-old codebase"
● Spring Framework
● Major refactoring
● Distribution as artifacts on Maven Central, Docker containers on Docker Hub
● Orchestration through Docker (Docker Swarm, Docker Stack)

Thank You

● Resources
  ○ LOCKSS Web site: lockss.org
  ○ LOCKSS Documentation Portal: lockss.github.io
● Software
  ○ LOCKSS at GitHub: github.com/lockss
  ○ LOCKSS at Maven Central: group ID org.lockss
  ○ LOCKSS at Docker Hub: hub.docker.com/u/lockss
● Communication
  ○ Twitter: twitter.com/lockss
  ○ Slack: tinyurl.com/slackjoinlockss
● Q&A