A new class of primary source? Prospects and pitfalls in using web archives for research
Dr Peter Webster
Webster Research and Consulting
@pj_webster
The web its own archive?
Open UK Web Archive 2004-13 comparison. @anjacks0n http://britishlibrary.typepad.co.uk/webarchive/2014/10/what-is-still-on-the-web-after-10-years-of-archiving-.html
Reasons to care about web archiving
• education and research
• enforcement of the law
• public accountability
Three archives for the UK
Archive | Temporal scope | Content scope | Access
Open UKWA | 2004-present | Selective (14.7k) | Online
Legal Deposit UKWA | 2013-present | Comprehensive (for UK) | Onsite
JISC UK Domain Dataset | 1996-2013 | Comprehensive (for .uk) | Index only
JISC UK Web Domain Dataset (1996-2013)
• copy of Internet Archive holdings for .uk
• bought by JISC, held by British Library
• 60TB of data
• no direct access to content
• prototype search at webarchive.org.uk/shine
• derived datasets in public domain
Web archives for NI and RoI
Archive | Temporal scope | Content scope | Access
NLI Web Archive | 2011-present | Selective (542) | Online
PRONI Web Archive | 2010-present | Selective (115) | Online
Legal Deposit UKWA | 2013-present | Comprehensive (for UK!) | Onsite (TCD)
Ways to use the archived web
• URL search -> single page
• Full-text search -> single page
• Visualisation -> trend -> page
• Direct access to WARC
• Derived datasets
• API access
Derived datasets from the BL
From JISC UK Web Domain Dataset (1996-2010)
• File format profile
• Geo-index
• Crawled URL Index (CDX)
• Host Link Graph
Public domain at data.webarchive.org.uk
Creationism?
• non-evolutionary account of human origins
• modern
• a long history
• a feature of some parts of evangelicalism
• (anti-evolutionism, Intelligent Design)
The creationist web: three questions
A justified conspiracy theory about marginalisation of creationist voices?
A real danger or a moral panic (Truth in Science)?
The web as friend of the marginalised opinion?
http://peterwebster.me/2014/11/18/reading-creationism-in-the-web-archive/
UK Host Link Graph (1996-2010)
2008 | newsimg.bbc.co.uk | youtube.com | 45
2008 | archbishopofyork.org.uk | flickr.com | 1
2002 | secularism.org.uk | geocities.com | 1
Public domain at: data.webarchive.org.uk
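A minimal sketch of reading rows like those above, assuming each row holds a year, a linking host, a linked-to host and a count of links, separated by " | " as on this slide. The layout of the published files may differ, and the names HostLink and parse_row are illustrative only.

```python
from typing import NamedTuple


class HostLink(NamedTuple):
    year: int
    source: str   # host that contains the links (assumed column meaning)
    target: str   # host that is linked to (assumed column meaning)
    count: int    # number of links observed in that year (assumed)


def parse_row(line: str) -> HostLink:
    """Split one pipe-separated row into typed fields."""
    year, source, target, count = (field.strip() for field in line.split("|"))
    return HostLink(int(year), source, target, int(count))


# The three sample rows shown on the slide.
rows = [
    "2008 | newsimg.bbc.co.uk | youtube.com | 45",
    "2008 | archbishopofyork.org.uk | flickr.com | 1",
    "2002 | secularism.org.uk | geocities.com | 1",
]
links = [parse_row(row) for row in rows]
```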
Approach
• selection of key UK creationist sites
• extraction of all unique inbound referring hosts for 1996-2010
• inspection and classification
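The extraction step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure actually used: rows are taken to be "year | linking host | linked-to host | count", self-links are dropped as a choice of this sketch, and every host name in the sample data is invented. The inspection and classification step remains manual.

```python
from collections import defaultdict


def inbound_hosts(rows, targets, first_year=1996, last_year=2010):
    """Map each selected target host to the set of distinct hosts linking to it."""
    inbound = defaultdict(set)
    for line in rows:
        year, source, target, _count = (f.strip() for f in line.split("|"))
        in_range = first_year <= int(year) <= last_year
        # Self-links are skipped here; that is an assumption of this sketch.
        if target in targets and in_range and source != target:
            inbound[target].add(source)
    return dict(inbound)


# Invented sample rows; the hosts are placeholders, not real sites.
rows = [
    "2007 | church.example | creationist.example | 3",
    "2008 | church.example | creationist.example | 1",
    "2008 | school.example | creationist.example | 2",
    "2008 | school.example | other.example | 7",
    "2011 | late.example | creationist.example | 1",
]
result = inbound_hosts(rows, {"creationist.example"})
```

A host that links in several years is counted once, which matches the idea of unique referring hosts; the per-row link counts are ignored.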
Caveats on method
• partial nature of the dataset
• benchmarking of absolute numbers
• selective sample
• what does a link mean, anyway?
• not looking at number of linking resources per host
Truth in Science: how significant?
• only 46 unique inbound hosts
• … of which many were other creationist or secularist sites
• two churches, one school
• fewer in 2010 than 2007
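One reading of the year-on-year comparison is a count of distinct inbound hosts per year. A minimal sketch, assuming rows of the form "year | linking host | linked-to host | count"; the function name and all hosts in the sample data are hypothetical and the figures are not taken from the dataset.

```python
from collections import defaultdict


def inbound_hosts_by_year(rows, target):
    """Count the distinct hosts linking to `target` in each year."""
    by_year = defaultdict(set)
    for line in rows:
        year, source, linked, _count = (f.strip() for f in line.split("|"))
        if linked == target:
            by_year[int(year)].add(source)
    return {year: len(hosts) for year, hosts in sorted(by_year.items())}


# Invented rows for illustration only.
rows = [
    "2007 | a.example | site.example | 2",
    "2007 | b.example | site.example | 1",
    "2010 | a.example | site.example | 1",
    "2010 | a.example | elsewhere.example | 4",
]
counts = inbound_hosts_by_year(rows, "site.example")
```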
Next steps (1)
1. NI the 'creationism capital of Europe'? Analysis of:
• links from GB organisations to NI creationists
• links from NI to RoW
2. What about creationism in .ie?
Next steps (2)
Project: EU National Web Spheres
• part of resaw.eu
• investigating the nature of a national web domain
• … including the interlinking between them
• case study I: Anglican & Presbyterian churches in Ireland, north and south
Web Archives for Historians
@HistWebArchives, http://webarchivehistorians.org/