Overview of LOCKSS

Page 1: Overview of LOCKSS

Overview of LOCKSS

Page 2: Overview of LOCKSS

Session Learning Objectives

Provide an overview of the LOCKSS architecture.

Describe the LOCKSS polling process.

Describe how LOCKSS private networks differ.

Provide a vocabulary of technical terms used frequently with LOCKSS networks.

Page 3: Overview of LOCKSS

Architectural Components

Provider Sites (digital collections)

LOCKSS nodes (aka “peers”)

Plugins / Plugin Repository

Cache Manager

Title Database / Conspectus Database

Page 4: Overview of LOCKSS

Provider Sites

Prepare a digital collection so that it is web accessible to the preservation nodes

Expose a “manifest” web page for each collection, according to LOCKSS specifications

The manifest page grants permission for LOCKSS to crawl

The manifest page gives the starting point for the crawl
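As a rough illustration of the two jobs a manifest page does (granting permission and giving a crawl starting point), the sketch below parses a manifest and reports both. The permission wording, the function name inspect_manifest and the example.org URL are assumptions made for illustration; the LOCKSS specifications define the exact statement a network requires.

```python
from html.parser import HTMLParser

# Assumed wording of the permission statement; check the LOCKSS
# specifications for the exact text your network requires.
PERMISSION_TEXT = (
    "LOCKSS system has permission to collect, preserve, "
    "and serve this Archival Unit"
)


class ManifestParser(HTMLParser):
    """Collects visible text and link targets from a manifest page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def inspect_manifest(html):
    """Return (permission_granted, crawl_start_links) for a manifest page."""
    parser = ManifestParser()
    parser.feed(html)
    parser.close()
    # Normalize whitespace so line breaks inside the statement do not matter.
    text = " ".join(" ".join(parser.text_parts).split())
    return PERMISSION_TEXT in text, parser.links


# Hypothetical manifest page for one collection.
sample = """
<html><body>
<p>LOCKSS system has permission to collect, preserve, and serve this Archival Unit</p>
<a href="http://example.org/collection1/2009/">Collection 1, 2009</a>
</body></html>
"""
granted, links = inspect_manifest(sample)
print(granted, links)
```

A crawler would refuse to start if the first value is false, and would begin from the returned links otherwise.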

Provide information sufficient to create a LOCKSS plugin for the collection (or else create the plugin themselves and reposit that plugin with the LOCKSS network)

Page 5: Overview of LOCKSS

LOCKSS Peer Nodes

Data caches for harvested content

Caches organized into archival units (AUs)

Nodes can select which AUs to crawl and preserve

There must be >= 6 copies of an AU in order for the polling process to work properly

Page 6: Overview of LOCKSS

Plugins / Plugin Repository

Tell LOCKSS where, how and how often to crawl a provider site for AUs

Plugins are Java based

Distinct from core LOCKSS software

Page 7: Overview of LOCKSS

Cache Manager

Distributed separately from LOCKSS

Can remotely inspect and manage the caches on the various peer nodes

Page 8: Overview of LOCKSS

Title / Conspectus Databases

Title database on each node describes and manages which AUs to preserve on that node

The Conspectus Database, designed for the MetaArchive Project, provides more extensive metadata about the preserved digital collections and feeds the Title database with entries

Page 9: Overview of LOCKSS

[Diagram: Digital Collection 1 and Digital Collection 2, each exposing a manifest page and divided into archival units (labels include Web Site, Source Code, SQL Dump, AU 1, AU 2, AU 3); a Plugin Repository; and Private LOCKSS Network Nodes 1 through 9, each holding copies labeled DC1 and/or DC2.]

Page 10: Overview of LOCKSS

The Polling Process

Page 11: Overview of LOCKSS

Polling Process resulting in “landslide loss”, AU repair

[Diagram: Nodes 1, 2, 4, 5, 7, 8 and 9, with copies of DC2-AU1.]

Node 5 calls a poll on AU 1 of Digital Collection 2 (DC2-AU1).

Invitation: Node 5 invites some recently encountered peers to vote. (Each node maintains a reference list of the recently encountered peers.) Those invited are the “inner circle” for this opinion poll.

SHA1: Invited nodes create a fresh SHA1 digest of the AU.

PollChallenge: Affirmative PollChallenge message responses allow that inner circle node to participate in the poll.

PollProof: A Poll Effort Proof is cryptographically derived and sent in response to each affirmative voter’s challenge.

Node 9 nominates Nodes 7 and 8. Nominated Nodes 7 and 8 belong to the “outer circle” and can be invited to subsequent voting rounds by Node 5. Node 5 discovers new peers through the nomination process.

Votes shown in the diagram: one valid vote agrees, three valid votes disagree.

There is a “landslide” of valid, disagreeing votes against Node 5’s SHA1 digest of DC2-AU1.

Since agreeing votes are below threshold, Node 5 picks a random disagreeing voter from the inner circle.

Node 5 sends an encrypted RepairRequest message, and the repair is made.

Once the repair is completed, Node 5 immediately calls a new poll, which effectively verifies, or invalidates and corrects, the repair.
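The digest comparison and repairer selection in this walkthrough can be sketched in a few lines. This is a deliberately simplified model, not the LCAP implementation: an AU is assumed to be a dictionary of URL to content bytes, votes are plain digests, and the effort proofs, challenges and encryption are left out. The names au_digest, tally and choose_repairer, the example URLs and the choice of which nodes vote are all hypothetical.

```python
import hashlib
import random


def au_digest(files):
    """SHA-1 digest over an AU, modeled as a dict of URL -> content bytes."""
    h = hashlib.sha1()
    for url in sorted(files):          # fixed order so equal AUs hash equally
        h.update(url.encode("utf-8"))
        h.update(b"\0")
        h.update(files[url])
    return h.hexdigest()


def tally(poller_digest, votes):
    """Split valid votes (peer -> digest) into agreeing and disagreeing peers."""
    agree = sorted(p for p, d in votes.items() if d == poller_digest)
    disagree = sorted(p for p, d in votes.items() if d != poller_digest)
    return agree, disagree


def choose_repairer(disagree, inner_circle, rng=random):
    """Pick a random disagreeing voter from the inner circle, if any."""
    candidates = [p for p in disagree if p in inner_circle]
    return rng.choice(candidates) if candidates else None


# Hypothetical content: the poller (Node 5) holds a damaged copy.
good = {"http://example.org/a": b"alpha", "http://example.org/b": b"beta"}
damaged = {"http://example.org/a": b"alpha", "http://example.org/b": b"bet@"}

poller_digest = au_digest(damaged)
votes = {
    1: au_digest(good),
    2: au_digest(good),
    4: au_digest(good),
    9: au_digest(damaged),
}
agree, disagree = tally(poller_digest, votes)
repairer = choose_repairer(disagree, inner_circle={1, 2, 4, 9})
print("agree:", agree, "disagree:", disagree, "repair from:", repairer)
```

With one agreeing and three disagreeing votes, the poller would request a repair from the chosen peer and then call a fresh poll to check the result.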

Page 12: Overview of LOCKSS

Polling Refresh Timer

A peer sets a refresh timer for a given AU to determine the interval between successive polls

System parameter R is the mean for the possible random values generated for the refresh timer
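One way to picture a randomized refresh timer with mean R is shown below. The slide only says that R is the mean of the random values; the uniform distribution, the spread parameter and the 90-day value are assumptions chosen for illustration.

```python
import random


def next_poll_delay(r_mean, spread=0.5, rng=random):
    """Draw a refresh interval uniformly from [R*(1-spread), R*(1+spread)].

    The uniform distribution is an assumption; it is symmetric around
    r_mean, so the long-run average interval equals r_mean.
    """
    low = r_mean * (1.0 - spread)
    high = r_mean * (1.0 + spread)
    return rng.uniform(low, high)


rng = random.Random(42)                # seeded so the sketch is repeatable
R_DAYS = 90.0                          # hypothetical value of R, in days
delays = [next_poll_delay(R_DAYS, rng=rng) for _ in range(10_000)]
mean_delay = sum(delays) / len(delays)
print(f"mean interval over {len(delays)} draws: {mean_delay:.1f} days")
```

Randomizing the interval keeps peers from polling in lockstep, while the mean R still controls how often each AU is audited on average.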

Page 13: Overview of LOCKSS

System Parameter – ‘Quorum’

Q = # of valid inner circle votes required to conclude a poll successfully

Q = 6 is the thoroughly tested value in use

If votes < Q, poller invites additional peers, or else aborts the opinion poll
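The quorum rule reads naturally as a small decision function. In this sketch the function name, the return labels, and the rule that the poller aborts only when it has no further peers to invite are assumptions; the slide does not say what triggers the abort.

```python
def quorum_step(valid_votes, uninvited_peers, quorum=6):
    """Decide the poller's next action from its count of valid inner circle votes.

    valid_votes:     valid inner circle votes received so far
    uninvited_peers: peers the poller could still invite (assumed input)
    quorum:          Q, the number of valid votes needed to conclude
    """
    if valid_votes >= quorum:
        return "conclude"
    if uninvited_peers > 0:
        return "invite_more"
    return "abort"


print(quorum_step(6, 0))   # enough votes
print(quorum_step(4, 3))   # short of Q, more peers available
print(quorum_step(4, 0))   # short of Q, nobody left to invite
```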

Page 14: Overview of LOCKSS

Polling Outcome – ‘Landslide Win’

The poller considers its current copy to have integrity

This is the only scenario in which an opinion poll concludes successfully

The poller updates its reference list and then waits until the next polling period (determined by the refresh timer)

Page 15: Overview of LOCKSS

Reference List Update

Happens only after a successful poll

Poller removes the inner circle peers who had valid votes in the last opinion poll

Culls peers it has not been able to contact for some time

Adds outer circle peers whose votes were valid and eventually agreeing
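The three update steps map directly onto set operations. The sketch below assumes peers are identified by node number and that the caller already knows which peers were unreachable and which outer circle votes ended up agreeing; the function and parameter names are hypothetical.

```python
def update_reference_list(reference_list, inner_valid_voters,
                          unreachable, outer_agreeing):
    """Return the poller's reference list after a successful poll."""
    updated = set(reference_list)
    updated -= set(inner_valid_voters)   # remove inner circle peers with valid votes
    updated -= set(unreachable)          # cull peers not contacted for some time
    updated |= set(outer_agreeing)       # add valid, eventually agreeing outer circle peers
    return updated


new_list = update_reference_list(
    reference_list={1, 2, 3, 4, 6, 9},
    inner_valid_voters={1, 2, 4, 9},
    unreachable={3},
    outer_agreeing={7, 8},
)
print(sorted(new_list))
```

Removing peers that just voted and adding newly discovered ones means successive polls draw on different voters.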

Page 16: Overview of LOCKSS

Polling Outcome - Inconclusive

D = max allowed “minority” votes

If Agreeing Votes > D, and Agreeing Votes < Total valid votes - D, then the poll is inconclusive and raises an alarm

Human intervention needed to determine if nodes have been compromised

Peers voting in agreement with a known bad copy are blacklisted if that peer node can’t be identified or it won’t cooperate
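The inconclusive condition can be checked with a short classifier. Only the inconclusive rule comes from the slide; treating the two remaining cases as a landslide loss (agreeing votes at or below D) and a landslide win (disagreeing votes at or below D) is an inference from that rule, and the value D = 1 in the examples is hypothetical.

```python
def classify_poll(agreeing, total_valid, d):
    """Classify a concluded tally as a landslide win, landslide loss or inconclusive.

    agreeing:    valid votes that agree with the poller's digest
    total_valid: all valid votes counted
    d:           D, the maximum allowed number of "minority" votes
    """
    if total_valid <= 2 * d:
        # With so few votes the win and loss regions would overlap.
        raise ValueError("total_valid must exceed 2 * d")
    if agreeing <= d:
        return "landslide_loss"          # inferred: at most D peers agree
    if agreeing >= total_valid - d:
        return "landslide_win"           # inferred: at most D peers disagree
    return "inconclusive"                # D < agreeing < total_valid - D


print(classify_poll(agreeing=6, total_valid=6, d=1))
print(classify_poll(agreeing=1, total_valid=4, d=1))
print(classify_poll(agreeing=3, total_valid=6, d=1))
```

The middle example mirrors the earlier repair walkthrough, where one vote agrees and three disagree.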

Page 17: Overview of LOCKSS

Further Details on Polling Process

Petros Maniatis, Mema Roussopoulos, TJ Giuli, David S. H. Rosenthal, Mary Baker, and Yanto Muliadi, "LOCKSS: A Peer-to-Peer Digital Preservation System", ACM Transactions on Computer Systems (TOCS). http://www.eecs.harvard.edu/~mema/publications/TOCS2005.pdf

See also LOCKSS related publications at http://www.lockss.org/lockss/Publications

Page 18: Overview of LOCKSS

The LOCKSS Private Network Difference

More flexible (not appliance based)

Can run on any operating system that supports Java

LOCKSS Team maintains rpm packages for Linux installations

Peer Node administrators have greater discretion configuring access, customizing functionality, e.g. altering system parameters

Page 19: Overview of LOCKSS

The LOCKSS Private Network Difference (cont.)

Can extend LOCKSS core functionality with supplemental tools and methods to fit new use cases

E.g. the MetaArchive Conspectus database

Page 20: Overview of LOCKSS

Vocabulary

(Please refer to the workshop binder for terminology and definitions)

Page 21: Overview of LOCKSS

Overview of LCAP version 3