Web 1.0, Web 2.0 and Digital Preservation

Brian Kelly
UKOLN, University of Bath, Bath, UK
Email [email protected]

UKOLN is supported by:

About This Talk
A recap of Web preservation challenges and approaches to the preservation of Web content. But will use of Web 2.0 services lead to new preservation concerns? How much of a concern is this? And what steps can be taken to minimise the risks of data loss?

This work is licensed under an Attribution-NonCommercial-ShareAlike 2.5 licence (but note caveat)

http://www.ukoln.ac.uk/cultural-heritage/events/sharing-made-simple-200811/

Resources bookmarked using 'sharing-made-simple-200811' tag


A centre of expertise in digital information management

www.ukoln.ac.uk



Contents

What’s the Problem?
• Disappearing domains
• Disappearing data
• Broken services

Preservation and Web 1.0
• Case studies
• Mothballing

Preservation and Web 2.0
• Third party services
• Communications rather than resources


Is Web Site Preservation An Issue?

Digital Resources Don't Rot

• Digital resources (images, video, software, Web sites, …) don't degrade due to environmental factors. This is a key difference from physical resources.

• Web sites are made from various digital resources: HTML pages, image files (GIF, JPEG, etc.), PDF resources, software (scripts, JavaScript, etc.)

• These won't degrade so why is Web site preservation an issue?

• Isn't the fact that old Web sites won't disappear and may be embarrassing more of a challenge?


Digital Resources Do Rot!

In fact digital resources do 'rot':

• Operating systems are upgraded and existing applications cease to work

• Security holes are identified and there is a need to install patches

• Resources may be dependent on external resources (e.g. links, news feeds, …) which may disappear

• Resources may be hosted by external services and there is a need for ongoing funding for the hosting

• …


Preservation In A Web 1.0 World

The Web 1.0 environment:
• Static content
• Managed by organisation

The challenges:
• Changes in funding
• “Mothballing”
• Legal issues (not covered!)
• …


The Nightmare Scenario

To be avoided:
• The funding finishes
• Project staff leave, partnership dissolves
• Hosting agency upgrades operating system, with the result that scripts which access resources from the backend database are broken
• User finds page with invitation to project launch and travels to meeting. Unfortunately the event took place in 2002
• Invoice for domain name is not paid, as administrator has left
• Web site domain taken over by porn company
• Prime Minister picks up pen containing project URL and visits pornographic Web site


It Has Happened

Webtechs.com

• Software company which hosted early HTML validation service

• In 1998/99 confusion over payment of domain name

• March 1999 company receives many messages saying validation service is now a porn site

• Over 30,000 links to Web site!

• Sept 1999 porn company agrees to sell domain name back to Webtechs


Technical Issues

Standards And Formats
• Has the Web site been designed using open standards, which should help future-proofing?
• Have proprietary formats been used (for which backwards compatibility may not be considered)?

Architecture & Implementation
• Has the technical architecture of the Web site been documented?
• Can I continue to use technical systems after funding has finished?


Note that in reality content owners may have little control over the formats used and the technical architecture.


Content Issues

Accuracy:
• Is the content of my Web site accurate today – and tomorrow?
• Could the content of my Web site be misleading?

Usability:
• Are links working today – and tomorrow?

Legal:
• Is my Web site legal today (accessibility; copyright; defamation; IPR; …)?
• Will my Web site be legal tomorrow, if new legislation is enacted?


Note that in reality rather than necessarily taking a safe position over, say, legal issues, a risk assessment approach may be taken


Mothballing Your Web Site (1)

Before funding finishes you should take steps for the mothballing of your Web site:

• Run a link check across the Web site. Fix broken internal links and as many external links as is reasonable. Document the link report.

• Run HTML (and CSS) validation checks across the Web site. Fix as many invalid pages as is reasonable. Document the findings.

• Run an accessibility check across the Web site. Fix as many inaccessible pages as is reasonable. Document the findings.

This should not be an onerous task if you have been following best practices. Note that any errors found later will have occurred after your funding finished.
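The link-check step above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a dedicated link checker: the function names are hypothetical, and the `is_alive` callback (which decides whether a URL still resolves) is assumed to be supplied by the caller so that the report logic can be exercised without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href/src targets found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def classify_links(html, page_url):
    """Return (internal, external) lists of absolute URLs for one page."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    internal, external = [], []
    for link in collector.links:
        absolute = urljoin(page_url, link)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if urlparse(absolute).netloc == site:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


def link_report(html, page_url, is_alive):
    """List broken links; is_alive(url) -> bool is a caller-supplied check."""
    internal, external = classify_links(html, page_url)
    return {
        "broken_internal": [u for u in internal if not is_alive(u)],
        "broken_external": [u for u in external if not is_alive(u)],
    }
```

Keeping internal and external links separate mirrors the advice on the slide: internal links should all be fixed, while external links are fixed only as far as is reasonable. The resulting dictionary can be saved as the documented link report.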


Mothballing Your Web Site (2)

You should also address technical areas:
• Remove any backend scripts which are no longer needed (e.g. online booking forms for old events).
• Remember that scripts, etc. are liable to go wrong. Ensure that applications are configured to break gracefully and provide meaningful errors:

The config.ssi is missing. This should be reported to the systems administrator (email [email protected] or ring +44 020 123 123). Please provide the URL of the broken page and the project name.

Apache error 6963
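As a rough illustration of breaking gracefully, the sketch below returns a meaningful message when an include file is missing, rather than letting a bare server error reach the user. The function name, the `ADMIN_CONTACT` placeholder and the example arguments are all hypothetical; a real site would use its own generic contact details.

```python
from pathlib import Path

# Hypothetical placeholder; a real page would name a generic, long-lived contact.
ADMIN_CONTACT = "the systems administrator"


def include_config(config_path, page_url, project_name):
    """Return the include file's text, or a meaningful message if it is missing."""
    path = Path(config_path)
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return (
            f"The {path.name} file is missing. This should be reported to "
            f"{ADMIN_CONTACT}. Please provide the URL of the broken page "
            f"({page_url}) and the project name ({project_name})."
        )
```

The message names the missing file, says who to tell and what to include in the report, which is the information a future administrator with no knowledge of the project would need.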


Mothballing Your Web Site (3)

You should also address the content of your Web site:

• Clarify the status of the Web site on the home page.

• Ensure the tense of the content reflects the position, i.e. don't say "This project will …"

• Ensure that contact details will remain valid, i.e. provide generic email addresses rather than an individual's

• Remember that many users will arrive deep in your Web site (e.g. using Google). If necessary use CSS to flag all pages with a watermark, e.g. "This Web site is no longer maintained. See home page for details"

See <http://www.ukoln.ac.uk/cultural-heritage/documents/briefing-32/>
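One way to flag every page, sketched here under stated assumptions, is to insert a styled notice directly after each page's opening `<body>` tag. This uses an inline-styled banner rather than a shared stylesheet watermark, so it is a substitute for the CSS approach and not the same technique; the banner styling and the function name are illustrative only.

```python
import re

NOTICE = "This Web site is no longer maintained. See home page for details"

# Illustrative inline styling; a shared stylesheet rule would work equally well.
BANNER = (
    '<div style="position:fixed;top:0;left:0;right:0;padding:4px;'
    'background:#ffc;text-align:center;">' + NOTICE + "</div>"
)


def flag_page(html):
    """Insert the notice banner directly after the opening <body> tag, once."""
    if NOTICE in html:
        return html  # already flagged, so leave the page unchanged
    return re.sub(
        r"(<body\b[^>]*>)",
        lambda match: match.group(1) + BANNER,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
```

Because the function does nothing when the notice is already present, it can safely be re-run over the whole site. Pages without a `<body>` tag are returned untouched and would need checking by hand.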


Case Study 1 - Exploit Interactive

Exploit Interactive:
• EU-funded ejournal available at <http://www.exploit-lib.org/>
• Funded from Jan 1999 – Dec 2000
• Web site is still hosted locally

Issues:
• Should we continue hosting domain after 3 years?
• What is the cost of this (domain name registration, disk storage, system maintenance)?


Case Study - Exploit Interactive

Findings:
• Disk storage is 4 GB (large proportion is log files)
• A 30 GB disk drive costs ~ £40
• It was decided to run an annual link check of the Web site. Although there were broken links to external sites, the internal links all worked.

• It was estimated that it would take about 30 minutes / year to run a link check and document findings.

• A policy for the ongoing provision of the Web site was agreed

• See <http://www.ukoln.ac.uk/qa-focus/documents/case-studies/case-study-17/>
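The storage figures quoted above imply a very small hardware cost. The sketch below works through the arithmetic, assuming the site's share of a drive is costed pro rata; that costing rule and the function name are assumptions for illustration, not part of the case study.

```python
def storage_cost(site_gb, disk_gb, disk_price):
    """Pro-rata cost of the disk space used by the site (an assumed costing rule)."""
    return disk_price * site_gb / disk_gb


# Figures from the case study: 4 GB site, 30 GB drive costing about 40 pounds.
cost = storage_cost(site_gb=4, disk_gb=30, disk_price=40.0)
print(f"Approximate storage cost: £{cost:.2f}")
```

On these figures the disk space costs a little over £5, so the estimated 30 minutes per year of staff time for the link check is likely to be the larger ongoing cost.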


Is Web 2.0 Different?

Implications of Web 2.0 for Web site preservation:
• Use of 3rd party services (‘network as platform’)
• Content collaboration and communication
• Richer diversity of services (not just a file on a filestore/CMS/database)
• More complex IPR issues

Let’s look at:
• Case study 1 – wikis
• Case study 2 – blogs
• Case study 3 – reusing data
• Case study 4 – comms tools
• Case study 5 – recording events
• Case study 6 – Slideshare
• Case study 7 – Use of video tools
• Case study 8 – social networks


Case Study 1: A Public Wiki

WetPaint wiki used to support UKOLN workshop

Approaches taken:

• Open access to all prior to & during event (to minimise barriers to creating content)

• Access restricted to WetPaint users after event

• Access later restricted to event organisers


Many aspects of Web site curation are to do with implementing such best practices, rather than implementing technical solutions


Case Study 1: A Public Wiki

WetPaint provides an option for backing up data.

A zipped file of the pages can be saved for storing on a locally managed service.


There are limitations in this particular service (poor quality HTML, internal links don’t work, …)

But this does illustrate an approach which can be taken.
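Given the limitations noted above, it may be worth checking a downloaded backup before relying on it. The sketch below scans a zipped export for relative links that point at files missing from the archive. It assumes the export is a zip of HTML pages with relative links between them; the actual layout of a WetPaint backup is not described in the slides, and the function name is hypothetical.

```python
import io
import posixpath
import zipfile
from html.parser import HTMLParser
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect the href targets of anchor tags in one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def broken_internal_links(zip_bytes):
    """Map each HTML page to the relative links that match no archived file."""
    broken = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = set(archive.namelist())
        for name in sorted(names):
            if not name.lower().endswith((".html", ".htm")):
                continue
            collector = HrefCollector()
            collector.feed(archive.read(name).decode("utf-8", errors="replace"))
            for href in collector.hrefs:
                parts = urlparse(href)
                if parts.scheme or parts.netloc or not parts.path:
                    continue  # external link or in-page anchor
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(name), parts.path)
                )
                if target not in names:
                    broken.setdefault(name, []).append(href)
    return broken
```

An empty result suggests the pages link to each other correctly within the backup; any entries show which pages would need repair before the copy is hosted on a locally managed service.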


Case Study 2a: Blog Migration

How might you migrate the contents of a blog (e.g. you’re leaving college)?

This question was raised by Casey Leaver, shortly before leaving Warwick University

Case Study 2a: Blog Migration

She migrated her blog from the blogs service at Warwick University to WordPress

Note, though, that not all data was transferred (e.g. title, but not contents) so there’s a need to check transfer mechanisms
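A simple way to check a transfer mechanism is to compare the posts before and after migration. The sketch below assumes each post has been exported as a dictionary with hypothetical `title` and `body` fields; real blog exports will use their own formats and would need converting first.

```python
def check_migration(source_posts, migrated_posts):
    """Report posts that are missing, or whose body was lost, after a migration."""
    migrated = {post["title"]: post for post in migrated_posts}
    problems = []
    for post in source_posts:
        copy = migrated.get(post["title"])
        if copy is None:
            problems.append((post["title"], "missing"))
        elif post["body"].strip() and not copy.get("body", "").strip():
            problems.append((post["title"], "title transferred but contents lost"))
    return problems
```

Matching on title is a simplification; two posts with the same title would be treated as one, so a unique identifier or date would be a safer key where the export provides it.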


Case Study 2a: Blog Migration

A backup of the UK Web Focus blog is available on Vox:

• Manual migration of new posts every few weeks

• Only migrates text

• Doesn’t migrate images, embedded videos, internal links, comments, …


Migration of blogs, wikis, etc. is not currently an easy task


Case Study 2b: The Individual’s Blog (1)

Auricle blog:
• Launched Jan 2004 by head of e-learning team, Bath
• High profile & public visibility by early adopter & evangelist

Today:
• It’s gone
• Lost after evangelist left, new staff arrive, new priorities, …

Thoughts?


Case Study 2b: The Individual’s Blog (2)

Auricle reborn:
• Further Googling revealed the blog has been reborn
• New domain (www.auricle.org/)
• New engine (WordPress) & look and feel (but old engine still available)
• New content being added
• Old content still accessible

Thoughts?


Case Study 3: Reusing Data

Blog post in Facebook. Possible concerns:
• It’s not sustainable
• You’ve given ownership to Facebook

Response:
• The post is managed in WordPress; Fb displays copy (to new audience)
• Fb don’t claim ownership – they claim rights to make money (e.g. through ads)


Case Study 4: Disposable Data

Twitter – example of a micro-blogging application

Facebook status messages are another related example

Issues:
• Will the Twitter service be sustainable over a long period?
• What will happen to the data?
• What about the IPR for ‘tweets’?
• …


Case Study 4: Disposable Data

Many twitterers regard their tweets as disposable

I tend to use Twitter as a ‘virtual water cooler’ – sharing gossip, jokes and occasional work-related information with (mainly) people I know

You could make use of clients which manage your tweets (e.g. treat like email)

But you should develop your policies first, prior to exploring technologies

Case Study 4: Disposable Data

VoIP applications such as Skype are growing in popularity

Issues:
• Is the digital data (the call) preserved?
• What about the video and the IM chats?

Possible responses:
• Am I bovvered?
• I didn’t bother with analogue phones, why should I worry now?


Case Study 5: Digitized Talks

Seminar on Open Science given at UKOLN in Feb 2008.
Video clip of opening 10 mins taken & uploaded to YouTube

Issues:
• Privacy
• Quality
• Benefits
• Long term access

Benefits identified – now how do we seek to deploy recordings of seminars, conferences, etc. on a more systematic basis?
This is work in progress – but see IWMW 2007 videos


Case Study 6: Slideshare

What happens to your slides if Slideshare disappears?

My approach:
• Master copy held on managed environment
• Info on master on title slide and metadata
• CC licence & download available – many copies


Case Study 7 – Video Tools

Requirement:
• Provide brief video clips for colleagues running workshop
• Initial idea – use Seesmic video micro-blogging service (can include video responses)

But:
• No video export function (yet)
• Accessibility of responses

Approach taken:
• Create video locally
• Upload video to YouTube (to allow textual comments)
• Link to managed master file

Seesmic and YouTube Web sites and Twirl client are access tools; the data is managed elsewhere


Case Study 8: Social Networks

University of Wales, Newport and University of Bradford have set up Ning networks for supporting their students:

• Bradford: Aimed at students during their first term at University

• Newport: Open; intended for students about to arrive at institution

What does preservation mean in this context?
Answers to this question will be left as an exercise for the participants

http://newstudents.newport.ac.uk/


What Do We Do For SNs?

The Open University has a presence in Facebook. On 9 Sep 2008:

• 9,785 fans
• 1,233 wall posts
• 138 discussion topics

Is anyone:
• Recording the history?
• Curating the data?
• Managing possible risks?


Role Of The Internet Archive

Can we leave everything to the Internet Archive (IA)?

• Has role to play in Web 1.0

• Seems to archive some public blogs

• May not access images or other embedded content

• Still has limitations (cf. UCE/BCU)

Can’t access, e.g., Facebook pages

IA is a 3rd party Web 2.0 service
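To see whether the Internet Archive holds a copy of a given page, one option is its Wayback Machine availability service. The sketch below assumes that service is reachable at the endpoint shown and returns JSON with an `archived_snapshots` / `closest` structure; treat the endpoint, the response shape and the function names as assumptions to verify against the Internet Archive's current documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the Wayback Machine availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url):
    """Build the query URL for one page."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) from a response body, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, timeout=10):
    """Query the service over the network and return the closest snapshot, if any."""
    with urlopen(availability_url(page_url), timeout=timeout) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

A result of `None` for an important page is a prompt to make local arrangements. A snapshot being listed does not guarantee that images or other embedded content were captured, as noted above.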


The Research Challenges

Some thoughts:
• Preservation of Web sites is known to be difficult
• Additional difficulties in a Web 2.0 world
• Complexities include technical challenges and business issues

However:
• Is avoiding Web 2.0 a realistic answer?
• There may be some simple processes which may help


Questions