SolrWayback Demo of the SolrWayback search interface, tools and playback engine for WARCs Thomas Egense IT specialist Anders Klindt Myrvoll Programme Manager the Danish web archive IIPC 2018, Wellington

SolrWayback - International Internet Preservation Consortiumnetpreserve.org/ga2018/wp-content/uploads/2018/11/IIPC... · 2018-11-23 · SolrWayback Demo of the SolrWayback search

Page 1:

SolrWayback

Demo of the SolrWayback search interface, tools and playback engine for WARCs

Thomas Egense

IT specialist

Anders Klindt Myrvoll

Programme Manager – the Danish web archive

IIPC 2018, Wellington

Page 2:

https://da.aoab.dk/view/state-and-university-li

https://www.instagram.com/librarylovestories/ for more library pics!

Page 3:

http://www.kb.dk/en/index.html

Page 4:

Page 5:

Architecture overview (diagram labels):

• Web archive with ARC/WARC files
• Index: British Library webarchive-discovery / warc-indexer framework
• Search / front-end interface
• Playback engine
• Tools
• Harvest
• PWID
• WWW
• Built-in SOCKS proxy to prevent leaking

Out-of-the-box, open source web application for researchers to explore ARC/WARC files.

Page 6:

PWID/XML

pwid:mia.oszk.hu:2018-04-24T09:06:21:page:https://mnm.hu/en/museum

The parts of the PWID, in order:

• web archive: mia.oszk.hu
• time of archiving: 2018-04-24T09:06:21
• content coverage: page
• archived URL: https://mnm.hu/en/museum

Best poster, iPRES 2018: Eld Zierau (online soon)
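As a rough illustration of the four parts labelled on this slide, the sketch below splits a PWID into archive, timestamp, coverage and URL. It is not taken from SolrWayback or from the PWID specification: the names `Pwid`, `PWID_PATTERN` and `parse_pwid` are hypothetical, and the pattern only assumes the full date-time form that appears on the slides.

```python
import re
from typing import NamedTuple


class Pwid(NamedTuple):
    archive: str    # web archive identifier, e.g. "mia.oszk.hu"
    timestamp: str  # time of archiving, kept exactly as written
    coverage: str   # content coverage, e.g. "page"
    url: str        # archived URL


# The timestamp itself contains colons, so a plain split(":") would cut it
# apart. Assumption: only the full date-time form seen on the slides is handled.
PWID_PATTERN = re.compile(
    r"^(?:urn:)?pwid:"
    r"(?P<archive>[^:]+):"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?):"
    r"(?P<coverage>[^:]+):"
    r"(?P<url>.+)$"
)


def parse_pwid(text: str) -> Pwid:
    """Split a PWID string into its four labelled parts."""
    match = PWID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a PWID in the expected form: {text!r}")
    return Pwid(**match.groupdict())


if __name__ == "__main__":
    example = "pwid:mia.oszk.hu:2018-04-24T09:06:21:page:https://mnm.hu/en/museum"
    print(parse_pwid(example))
```

The archived URL is matched last and greedily, so any colons inside it (as in `https://`) stay part of the URL.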

Page 7:

PWID-resolver

Prototype resolver for resolving PWIDs to entries (Netarkivet & open web archives), e.g. the PWID

urn:pwid:archive.org:2018-02-22T11:54:11Z:page:https://ipres2018.org/

resolves to:

http://web.archive.org/web/20180222115411/https://ipres2018.org/

Source at https://github.com/netarchivesuite/NAS-research/releases
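The mapping in the example above can be sketched in a few lines. This is only an illustration of that one example, not the code of the prototype resolver: `resolve_pwid` and `PWID_PATTERN` are hypothetical names, and the sketch assumes that only the `archive.org` case is handled, by compacting the timestamp to its 14 digits and placing it in the `web.archive.org/web/` URL form shown on the slide.

```python
import re

# Assumption: PWIDs use the full date-time form shown on the slide.
PWID_PATTERN = re.compile(
    r"^(?:urn:)?pwid:"
    r"(?P<archive>[^:]+):"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?):"
    r"(?P<coverage>[^:]+):"
    r"(?P<url>.+)$"
)


def resolve_pwid(pwid: str) -> str:
    """Turn an archive.org PWID into a playback URL (illustrative only)."""
    match = PWID_PATTERN.match(pwid.strip())
    if match is None:
        raise ValueError(f"not a PWID in the expected form: {pwid!r}")
    if match["archive"] != "archive.org":
        # Other archives (e.g. Netarkivet) need their own URL scheme,
        # which this sketch does not attempt to guess.
        raise LookupError(f"no resolver rule for archive {match['archive']!r}")
    # 2018-02-22T11:54:11Z -> 20180222115411 (keep the digits only)
    compact = re.sub(r"\D", "", match["timestamp"])
    return f"http://web.archive.org/web/{compact}/{match['url']}"


if __name__ == "__main__":
    print(resolve_pwid(
        "urn:pwid:archive.org:2018-02-22T11:54:11Z:page:https://ipres2018.org/"
    ))
```

A real resolver would need one such rule per supported archive; raising an error for unknown archives keeps the sketch from producing URLs that were never verified.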

Page 8:

Visualization of crawl times

Page 9:

Domain development over time

Page 10:

Installing SolrWayback

• Easy to install and use on Mac, Linux and Windows. Contains a web server, Solr and a WARC-indexing tool. Just drop ARC/WARC files into a folder and start exploring the corpus.

• GitHub link: https://github.com/netarchivesuite/solrwayback

Page 11:

More info

The National Széchényi Library, Hungary: http://193.6.201.202/solrwayback/

Gabor Vitez rewrote the geo search from Google Maps to OpenStreetMap.

Athens University of Economics and Business (older version of SolrWayback): http://archive.aueb.gr/

• Toke Eskildsen has helped with sparring / performance tuning and a WARC export.

• Niels Gamborg has made 75% of the front-end search interface and tools. Retired now.

Abstract

Page 12:

Contact

SolrWayback - Thomas Egense

[email protected] @ThomasEgense

PWID - Eld Zierau

[email protected] @EldZierau

General inquiries – Anders Klindt Myrvoll

[email protected] @AndersKlindt

Page 13:

Questions and discussion

Page 14:

Search example showing hits. Images are shown in the search results.

Page 15:

Google-like image search in the web archive.

Page 16:

SolrWayback showing an archived web page with an overlay of statistics and further navigation options.

Page 17:

Page previews for different harvest times of a given URL. Images are generated in real time and use the built-in SOCKS proxy to prevent leaking to the live web.

Page 18:

Interactive domain link graph

Page 19:

Visualization of crawl times

Page 20:

XML/PWID

Page 21:

Search by GPS location for images that have EXIF location information.

Page 22:

Domain development over time

Page 23: