Digital preservation for ongoing access Presentation for Council July 2008 David Pearson Manager, Digital Preservation Section

Digital preservation


Page 1: Digital preservation

Digital preservation for ongoing access

Presentation for Council July 2008

David Pearson
Manager, Digital Preservation Section

Page 2: Digital preservation

Overview

1. We have lots of “digital stuff” in our collections and it is growing

2. We will lose access to it unless we take action

3. We need to manage the process of keeping it accessible and usable

4. Solutions have to be scalable, reliable and automated

Page 3: Digital preservation

1. “Digital stuff”: many collections

Oral History

Pictures

Historical Newspapers

Maps

Manuscripts

Books

Web sites

Ephemera

Sheet music

Serials

Page 4: Digital preservation

How does it grow?

1. We collect it
   – Physical carriers
   – Online
      • PANDORA web archive
      • Australian web domain harvests

2. We create it
   – Oral history interviews
   – Photographs
   – Publications

3. We convert it
   – Digitise our collections

Page 5: Digital preservation

Web Archives

• Web sites are collected selectively

– Individually for access via PANDORA, or

– On a large scale via annual domain snapshots

• No control over content creation

• Lots of

– File formats

– Individual files (PANDORA ≈ 51 million files, domain harvests ≈ 1.3 billion files)

– Links

– Software (browser, plug-ins, readers)

• Internet content changes over time

Page 6: Digital preservation
Page 7: Digital preservation
Page 8: Digital preservation

Digitisation

• Around 135,000 items digitised

• Newspaper project = 4 million pages by 2010

• Internally created, so we can control

– Standards

– File formats (e.g. TIFF, JPEG, PDF)

– Metadata

– Workflows

• Issues

– Growing volume

Page 9: Digital preservation

Physical carriers

• Approx. 12,000 items – grows by 1,000 a year

Issues

• No control over creation

• Time lag before acquisition

• Variety of carriers (fragile) and file formats

• Require various hardware, software, operating systems, drivers to access

• Labour intensive to process and transfer to safe storage (growing backlog)

Page 10: Digital preservation

Growth: digital collection storage

[Chart: storage size (terabytes), axis 0 to 350, plotted half-yearly from Jan-03 to Jul-08. Labels: Australian Web Harvests, Newspapers]

Page 11: Digital preservation

Type of Digital Collections 2008

• PANDORA: 3%
• Maps: 2%
• Sheet Music: 4%
• Manuscripts: 2%
• Pictures: 7%
• Oral History: 18%
• Other: 3%
• Historical Newspapers: 21%
• Australian Web Harvest: 40%

Page 12: Digital preservation

Growth: compared to books

[Chart: Comparison of books collection & digital collection "book equivalents". Y-axis: "book equivalents" (millions), 0.00 to 6.00. X-axis: year end June, 2005 to 2008. Series: Digital Collection (20 MB "book equivalents"), Books Collection]
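The "book equivalent" measure in the chart legend is a simple conversion: storage volume divided by 20 MB per book. A minimal sketch of that arithmetic follows. The function name, the use of decimal units (1 TB = 1,000,000 MB) and the 100 TB input are assumptions made for illustration; they are not figures from the presentation.

```python
# One "book equivalent" is 20 MB, as stated in the chart legend.
MB_PER_BOOK_EQUIVALENT = 20

# Assumption: decimal units, so 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def book_equivalents(storage_tb: float,
                     mb_per_book: float = MB_PER_BOOK_EQUIVALENT) -> float:
    """Convert a storage volume in terabytes to a count of book equivalents."""
    return storage_tb * MB_PER_TB / mb_per_book


if __name__ == "__main__":
    # Hypothetical input: 100 TB of digital collection storage.
    millions = book_equivalents(100) / 1_000_000
    print(f"{millions:.2f} million book equivalents")
```

With these assumptions, each terabyte of digital collection corresponds to 50,000 book equivalents.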

Page 13: Digital preservation

2. Act or risk losing it

• “Digital stuff” is dependent on technology at all stages

– Creation/capture

– Storage

– Access

• Technology changes rapidly, so software, hardware, media, file formats and operating systems become obsolete

• Unless managed, deterioration can occur rapidly, e.g. data can be corrupted or lost in storage or during transfer

Page 14: Digital preservation

Computer Museum

Page 15: Digital preservation

3. Managing to keep it

• “Not managing it” is not an option

• We need to

– Understand our “digital stuff” & associated risks

– Provide safe storage & ensure integrity

– Ensure access over time as technology changes

– Develop & implement preservation workflows, skills, standards, & strategies for ongoing access

– Enable content to be shared and used in different ways in the future
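One common way to "ensure integrity" in storage is a fixity check: record a checksum for every file at ingest, then recompute and compare it later. The sketch below illustrates that general technique only; it is not a description of the Library's actual systems. The function names (`sha256_of`, `build_manifest`, `audit`) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (done once, at ingest)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> list[str]:
    """Re-check every file in the manifest and report missing or altered files."""
    problems = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems.append(f"missing: {name}")
        elif sha256_of(target) != expected:
            problems.append(f"changed: {name}")
    return problems
```

An empty list from `audit` means every recorded file is still present and unchanged. Because the check needs no human judgement, it can be scheduled and run across very large collections.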

Page 16: Digital preservation

4. Solutions and implications

• Large scale automated processes

• Original research & time to deliver the solutions

• Reasonably long lead times

• Audit processes and quality control monitoring are critical

• Significant resources are required

Page 17: Digital preservation

Conclusions

• We are responsible for a lot of “digital stuff”

• If we simply collect and store it, it will become unusable in a relatively short time as technologies change

• Maintaining the ability to access it requires a lot of good management, planning, & dedicated resources

• We have to find and use solutions that can be applied automatically and reliably to billions of digital files