Building Web Archiving Technology, Together Nicholas Taylor Web Archiving Service Manager Stanford University Libraries Web Archives 2015: Capture, Curate, Analyze November 13, 2015


Page 1: Building Web Archiving Technology, Together

Building Web Archiving Technology, Together

Nicholas Taylor
Web Archiving Service Manager
Stanford University Libraries

Web Archives 2015: Capture, Curate, Analyze
November 13, 2015

Page 2: Building Web Archiving Technology, Together

overview

• why build together?

• community for collaborative work

• APIs for collaborative work

“LAX on take off” by Doug under CC BY-NC-ND 2.0

Page 3: Building Web Archiving Technology, Together

not a programmer

“Bug” by Randall Munroe under CC BY-NC 2.5

Page 4: Building Web Archiving Technology, Together

aspiring OSS contributor

GitHub: “nullhandle (Nicholas Taylor)”

Page 5: Building Web Archiving Technology, Together

studying the landscape

“2010 Grand Canyon Celebration of Art 172” by Grand Canyon National Park under CC BY 2.0

Page 6: Building Web Archiving Technology, Together

a centralized enterprise

[Bar chart: External / Local / Both, 2011 vs. 2013]

• External: 60% (2011), 63% (2013)

• Local: 25% (2011), 20% (2013)

• Both: 14% (2011), 16% (2013)

NDSA: “Web Archiving in the U.S.: A 2013 Survey”

Page 7: Building Web Archiving Technology, Together

a centralized enterprise

[Bar chart by year, 1996 to 2013; series: “Number of organizations” and “Archive-It Partner as of 2013”]

NDSA: “Web Archiving in the U.S.: A 2013 Survey”

Page 8: Building Web Archiving Technology, Together

minimal local preservation

[Bar chart: Transferred / Haven't transferred, 2011 vs. 2013]

• Transferred: 19% (2011), 20% (2013)

• Haven't transferred: 81% (2011), 80% (2013)

NDSA: “Web Archiving in the U.S.: A 2013 Survey”

Page 11: Building Web Archiving Technology, Together

opportunities for research

“Exploring the Canadian Political Interest Group and Political Parties Web Sphere” by Ian Milligan under Standard YouTube License

Page 15: Building Web Archiving Technology, Together

community analysis

[Diagram of communities: SAA Web Archiving Roundtable, Archive-It Partners, IIPC]

NDSA: “Web Archiving in the U.S.: A 2013 Survey”

Page 18: Building Web Archiving Technology, Together

models of software production

(irrespective of license)

• sole source
– single developer

• closed source
– team/corporate dev; no outside contributions

• club source
– pool resources for solo/team/corporate dev

• community source
– direct and distributed community participation

• open source
– grassroots, democratic, meritocratic participation

Tom Cramer: “Collaborative Open Source Software Production & APIs”

Page 19: Building Web Archiving Technology, Together

club source examples

• Archivematica, AtoM (Artefactual)

• ArchivesSpace (Lyrasis)

• BitCurator (Educopia)

• Fedora (DuraSpace)

• JHOVE (OPF)

• LOCKSS (Stanford University)

• Omeka (George Mason University)

Page 20: Building Web Archiving Technology, Together

community source examples

Page 21: Building Web Archiving Technology, Together

community architecture

• privileges community over code

• recognizes distribution of investment

• embraces community diversity

• models open processes and governance

• encourages varied contributions

• serves community needs

Page 23: Building Web Archiving Technology, Together

success of a standard

• capture: DeDuplicator, Heritrix, python-heritrix, SiteStory, WAIL, WARCreate, WarcMITMProxy, WarcProxy, Webrecorder, wget, Wpull

• access: OpenWayback, pywb, warc-proxy, WarcManager, Wayback Machine, Web Archive Discovery, WebArchivePlayer

• utilities: JHOVE2, JWAT, Megawarc, pylibwarc, WARCAT, Warcbase, warctools, Web Archive Commons

Page 24: Building Web Archiving Technology, Together

web archiving lifecycle

Internet Archive: “The Web Archiving Life Cycle Model”

Page 27: Building Web Archiving Technology, Together

smaller projects do better

[Two charts: small projects (<$1 million) and large projects (>$10 million), each broken down into on time/budget, challenged, and failed]

Standish Group: “Chaos Manifesto 2013: Think Big, Act Small”

Page 28: Building Web Archiving Technology, Together

IIPC community interest in APIs

contribution type                      % of respondents   # of respondents
help define functional requirements    94%                15
contribute use cases                   81%                13
help define technical details          69%                11
help schedule and run meetings         19%                3
implement and test                     6%                 1

Andrea Goethals: “Results of the Web Archiving API Survey of IIPC Members”

Page 29: Building Web Archiving Technology, Together

API candidates

• capture tool/proxy interconnect

• capture tool management

• data import/export

• query + extraction

• integrity audit + repair

• descriptive metadata

• logs + analytics

• renderings/derivative formats

• federated data delivery

• federated replay

• federated full-text search