LOTS OF COPIES KEEP STUFF SAFE

Lots of LOCKSS Keeping Stuff Safe: The Future of the LOCKSS Program

Nicholas Taylor (@nullhandle)
Program Manager for LOCKSS and Web Archiving
Stanford University Libraries
CNI Fall 2016 Membership Meeting
12 December 2016

why more LOCKSS?

• mature, community-validated technology
• research-based + built to a specific threat model
• web-centric preservation for web-centric scholarship
• community-centric preservation for collective challenges + opportunities
• robust, distributed digital preservation

“Cologne Love Padlocks” by orkomedix under CC BY-NC-SA 2.0

Program History

“Grant Park” by xelipe under CC BY-NC-SA 2.0

inception

• a serials librarian + a computer scientist
• print journals → Web
• conserve library’s role as preserver
• collect from publishers’ websites
• preserve w/ cheap, distributed, library-managed hardware
• disseminate when unavailable from publisher

Chris Dobson: “From Bright Idea to Beta Test”

philosophy + focus

• lots of copies keep stuff safe
• preservation is an active community effort
• lots of communities keep stuff safe
• enable communities to preserve + access their scholarly record

“Le Penseur” by Ian Abbott under CC BY-NC-SA 2.0

present day

• financially self-sustaining
• tens of networks
• hundreds of institutions
• all types of content

“LOCKSS | Lots of Copies Keep Stuff Safe”

looking forward

• organizational changes
• software evolution
• LOCKSS networks
• distributed digital preservation

“Looking for a brighter Future?” by Vincent Brassinne under CC BY-NC-ND 2.0

“Olympic Relay Handoff” by Dr. Mark Kubert under CC BY-NC-ND 2.0

Organizational Changes

David + Vicky

American Library Association: “Victoria Reich and David S. H. Rosenthal”

personal introduction

• 10 years in research libraries:
  • Stanford University Libraries (2013–present)
  • Library of Congress (2010–2013)
  • U.S. Supreme Court (2007–2010)
• professional background:
  • web archives
  • digital library services
  • library technology
• what I care about:
  • scalability + sustainability of PLNs, CLOCKSS
  • mainstreaming LOCKSS for digital preservation
  • building collaborative technical communities

SUL Web Archiving

• end-to-end service:
  • collect
  • preserve
  • make accessible
  • make discoverable
  • integrate w/ collection development
• use cases:
  • scholarly inputs/outputs
  • institutional legacy/compliance
  • government information

Internet Archive: “Stanford University Homepage”

LOCKSS + DLSS administrativa

• LOCKSS integrating w/ SUL Digital Library Systems & Services (DLSS)
• led by Tom Cramer, Director & Associate University Librarian
• LOCKSS + SUL Web Archiving, under Nicholas Taylor

“SPO.101514.SLIDERlathrop.jpg” by Michael Hong

LOCKSS + DLSS synergies

• realize operational efficiencies
• adopt, drive shared engineering best practices
• promote API-oriented architectures
• streamline repository → PLN data hand-offs
• contribute upstream to shared tools
• broaden, diversify community outreach

Software Evolution

“Why we love our macs” by Jason Corneveaux under CC BY-NC-ND 2.0

new functionality

• supported by Mellon Foundation grant
• ingest/harvest
  • form-filling
  • AJAX
• dissemination
  • Memento
  • Shibboleth
• preservation
  • polling performance

“1.13.09: versatility” by Team Dalog under CC BY 2.0
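
Memento, listed under dissemination, is the datetime-negotiation framework of RFC 7089: a client asks a TimeGate for the capture of a URL closest to a chosen time by sending an Accept-Datetime request header. The following is a minimal Python sketch of preparing such a request. The TimeGate base URL and the function names are hypothetical placeholders for illustration, not LOCKSS endpoints or APIs.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(when: datetime) -> dict:
    """Build the Accept-Datetime request header defined by RFC 7089."""
    # Memento expects an RFC 1123 date expressed in GMT.
    utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}


def timegate_url(timegate_base: str, original_url: str) -> str:
    """Append the original URL to a TimeGate base (hypothetical URL layout)."""
    return timegate_base.rstrip("/") + "/" + original_url


# Example: ask for the capture closest to the date of this talk.
headers = memento_headers(datetime(2016, 12, 12, tzinfo=timezone.utc))
target = timegate_url("https://example.org/timegate/", "http://www.stanford.edu/")
```

A client would send a GET or HEAD request for `target` with `headers` and follow the redirect to the selected capture.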

new architecture

• existing functionality
• discrete components as web services
• incorporate external software

“San Francisco Oakland Bay Bridge, East Spans New and Old” by Shanan under CC BY-NC 2.0

web services imperative

1. “All teams will henceforth expose their data and functionality through service interfaces.”
2. “Teams must communicate with each other through these interfaces.”
3. “There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team's data store, no shared-memory model, no back-doors whatsoever.”
4. “All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world.”
5. “Anyone who doesn't do this will be fired.”

Steve Yegge: “Stevey's Google Platforms Rant”

risk of large projects

• small projects (<$1 million): 76% successful, 20% challenged, 4% failed
• large projects (>$10 million): 10% successful, 52% challenged, 38% failed

• successful: on time, on budget
• challenged: late, over budget, lacking functionality
• failed: cancelled, or delivered and never used

Based on an 8-year survey of 50,000 software projects by the Standish Group.

Standish Group: “Chaos Manifesto 2013: Think Big, Act Small”

why re-architect LOCKSS?

• reduce support + operations costs
  • leverage web-scale open-source software
  • align w/ web archiving mainstream
• de-silo components + enable external integration
  • metadata extraction
  • archive access via DOI + OpenURL
  • polling + repair protocol
• prepare to evolve w/ the Web
  • web services architecture as flexible foundation
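
The idea behind polling + repair can be illustrated with a deliberately simplified Python sketch: peers compare hashes of their copies, the majority hash wins, and any peer that disagrees replaces its copy from an agreeing peer. This is only an illustration of the "lots of copies" principle under assumed names (`poll_and_repair`, the peer labels); the actual LOCKSS protocol is considerably more involved and is not reproduced here.

```python
import hashlib
from collections import Counter


def poll_and_repair(copies: dict) -> list:
    """Compare peers' copies by hash and repair those that disagree with the majority.

    `copies` maps a hypothetical peer name to the bytes that peer holds.
    Returns the names of the peers whose copies were replaced.
    """
    if not copies:
        return []
    digests = {peer: hashlib.sha256(data).hexdigest() for peer, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) / 2:
        raise ValueError("no majority agreement; cannot choose a repair source")
    source = next(peer for peer, digest in digests.items() if digest == winner)
    repaired = [peer for peer, digest in digests.items() if digest != winner]
    for peer in repaired:
        copies[peer] = copies[source]  # overwrite the damaged copy with a good one
    return repaired


# Example: one of three peers holds a damaged copy.
network = {"peer-a": b"article", "peer-b": b"article", "peer-c": b"artXcle"}
fixed = poll_and_repair(network)
```

After the call, `fixed` names the single damaged peer and all three copies in `network` agree again.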

integration opportunities

• polling + repair
  • repository replication layer
  • other distributed digital preservation systems
• access
  • Dockerized full-text search for web archives
  • DOI + OpenURL access to web archives
• metadata extraction

“A Different Kind of Weave” by Barbara Courouble under CC BY-NC 2.0

aligning with web archiving

Web ARChive (WARC) format

compatible technologies

• Heritrix
• OpenWayback
• WarcBase
• Web Archiving Proxy

web archiving system APIs (WASAPI)

leveraging community components

development progress

• access WARC-stored content via:
  • DOI
  • OpenURL
  • URL
  • Solr full-text search
• web services:
  • metadata extraction
  • metadata database

“Milestones” by Dheeraj Nagwani under CC BY-NC-ND 2.0
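
To make DOI and OpenURL access concrete, here is a minimal Python sketch that builds an OpenURL 1.0 (Z39.88-2004) key/value query identifying an item by its DOI. The resolver base URL and the function name are hypothetical, chosen only for illustration; the slides do not specify the endpoint a LOCKSS service exposes.

```python
from urllib.parse import urlencode


def doi_openurl(resolver_base: str, doi: str) -> str:
    """Build an OpenURL 1.0 key/value query that identifies an item by DOI."""
    query = urlencode({
        "url_ver": "Z39.88-2004",     # OpenURL 1.0 version tag
        "rft_id": "info:doi/" + doi,  # referent identifier in info URI form
    })
    return resolver_base + "?" + query


# Example with a placeholder resolver and an illustrative DOI.
link = doi_openurl("https://archive.example.org/openurl", "10.1000/182")
```

A resolver that understands the query can map the DOI to the preserved copy, so readers do not need to know the archived URL.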

product roadmap

• 2017
  • Docker-ize components
  • web harvest framework
  • polling + repair web service
  • release to PLNs
• 2018
  • IP address + Shibboleth access via OpenWayback
  • OpenWayback format negotiation framework
  • full-text search web service
  • release to GLN

“Printemps, work in progress” by Eric Gjerde under CC BY-NC 2.0

LOCKSS Networks

“Railroad Wye Switch” by Noel Hankamer under CC BY-NC-SA 2.0

Controlled LOCKSS (CLOCKSS)

• what is it?
  • library/publisher partnership
  • preserve the scholarly record
  • 12 globally-distributed nodes
  • dark until no longer accessible
  • triggered content world-accessible
• looking forward
  • expand capacity
  • increase pursuit of long tail
  • champion standards to simplify archiving (e.g., Signposting)

Private LOCKSS Networks (PLNs)

• what are they?
  • community of interest
  • jointly designate content
  • run distributed nodes
  • establish governance
  • preservation via diverse technologies, institutions, networks
• looking forward
  • create documentation
  • enable self-setup
  • support community collaboration
  • preserve web archives

national networks

• what are they?
  • in-country preservation
  • local stewardship
  • perpetual access
  • non-consumptive use
• looking forward
  • more networks
  • preserving national long-tail content

“1951 World Map” by peonyandthistle under CC BY-NC-ND 2.0

Distributed Preservation

“Catho longtime [explored]” by Bill Collison under CC BY-NC 2.0

distributed preservation landscape

• better understanding of role of distributed dark archives
• next logical step beyond mature local preservation
• appealing option for those w/o mature local preservation

a greater role for LOCKSS?

• bolster existing efforts
• undergird PLN service providers
• mainstream distributed digital preservation

“DSCN7867” by tyalis_2 under CC BY-NC-ND 2.0

LOCKSS for web archiving

• growth in web archiving
• centralization in web archiving
• native WARC support
• logical complement for web archive preservation

NDSA: “Web Archiving in the United States”

reliance on service provider

• 2011: local 25.40%, external 60.32%, both 14.29%
• 2013: local 19.51%, external 63.41%, both 15.85%
• 2015: local 4.81%, external 63.29%, both 30.38%

NDSA: “2016 NDSA Web Archiving Survey”

flat data transfer trend

• 2011: transfer data 19.15%, do not transfer data 80.85%
• 2013: transfer data 20.29%, do not transfer data 79.71%
• 2015: transfer data 20.27%, do not transfer data 79.73%

NDSA: “2016 NDSA Web Archiving Survey”

Recap

“Rearview” by jenkinson2455 under CC BY-NC 2.0

vision

• better ensure the preservation of web archives
• LOCKSS team more actively engaged in community-supported development efforts
• communities enabled to more easily contribute to LOCKSS software, or run it w/o our help
• a longer tail of institutions able to capitalize on distributed digital preservation
• LOCKSS components applied in contexts other than LOCKSS networks

Questions?

“stanford dish at sunset” by Dan under CC BY-NC-SA 2.0