How to build your own Dark Archive (in your spare time) Priscilla Caplan FCLA


How to build your own Dark Archive (in your spare time)

Priscilla Caplan, FCLA


Topics

• History: What we thought we were going to do

• Geography: Where theory meets reality

• Horticulture: Some thorny details


FCLA Digital Archive Plan

• Dark archive using tape storage

• 3-year project with help from IMLS

• Focus on data for cost analysis

• Treatment based on Action Plans

• Limit ingest to formats with Action Plan

• Canonicalization & forward format migration

• Make tools available as Open Source


FCLA Digital Archive Plan

• Dark archive using tape storage

• ?-year project with help from IMLS

• Focus on data for cost analysis

• Treatment based on Action Plans

• Limit ingest to formats with Action Plan

• Canonicalization & forward format migration

• Make tools available as Open Source


FCLA Digital Archive Plan

• Dark archive using tape storage

• ?-year project with help from IMLS

• Focus on designing DAITSS

• Treatment based on Action Plans

• Limit ingest to formats with Action Plan

• Canonicalization & forward format migration

• Make tools available as Open Source


FCLA Digital Archive Plan

• Dark archive using tape storage

• ?-year project with help from IMLS

• Focus on designing DAITSS

• Treatment based on Action Plans and Background Reports

• Limit ingest to formats with Action Plan

• Canonicalization & forward format migration

• Make tools available as Open Source


FCLA Digital Archive Plan

• Dark archive using tape storage

• ?-year project with help from IMLS

• Focus on designing DAITSS

• Treatment based on Action Plans and Background Reports

• Unlimited ingest; two preservation levels

• Canonicalization & forward format migration

• Make tools available as Open Source


FCLA Digital Archive Plan

• Dark archive using tape storage

• ?-year project with help from IMLS

• Focus on designing DAITSS

• Treatment based on Action Plans and Background Reports

• Unlimited ingest; two preservation levels

• Normalization, forward migration, bit preservation of original

• Make tools available as Open Source


FCLA Digital Archive Plan

• Dark archive using tape storage

• ?-year project with help from IMLS

• Focus on designing DAITSS

• Treatment based on Action Plans and Background Reports

• Unlimited ingest; two preservation levels

• Normalization, forward migration, bit preservation of original

• Make DAITSS available as Open Source


Theory 1: Preservation Strategies


[Figure: preservation strategies arranged by OBJECTIVE (Preserve Technology to Preserve Objects) and APPLICABILITY (Specific to General). Strategies shown: Maintain original technology, Programmable Chips, Emulation, Viewer, Re-engineer Software, Virtual Machine, Universal Virtual Computer, Version Migration, Format Standardization, Rosetta Stone Translation, Typed Object Conversion, Persistent Archives, Object Interchange Format.]

Source: Thibodeau, 2002.


Mass Migration

[Diagram: format versions A, B, and C, with migration programs P1 and P2.]


Migration On Request

[Diagram: format versions A, B, and C, with migration programs P1, P2, and P3.]


Mass Migration Or MOR

[Diagram: format versions A, B, and C, with migration programs P1, P2, and P3.]


Mass Migration Or MOR + Normalization

[Diagram: formats A and B, programs P1 and P2, and objects labeled N and M.]


Theory 2: OAIS

[Figure: OAIS functional entities (OAIS Figure 4-1): Ingest, Archival Storage, Data Management, Administration, Preservation Planning, and Access, set between PRODUCER, MANAGEMENT, and CONSUMER. Labeled flows: SIP, AIP, DIP, Descriptive Info, queries, result sets, orders.]


Formal OAIS Compliance

A conforming OAIS archive...

• “… shall support the model of information described in 2.2”

• “… shall fulfill the responsibilities listed in 3.1”


OAIS Information Model

• Content Information
– Content data object
– Representation Information

• Preservation Description Information
– Context Info
– Reference Info
– Provenance Info
– Fixity Info


Responsibilities in 3.1


FCLA’s OAIS Compliance

• Formal agreements with “Producers”

• Documented SIP, DIP, AIP

• Metadata stored redundantly with content data objects

• Retaining both original and migrated AIPs

• No content data objects altered in repository

• All representation info ends in specification library

• Clear separation of functions (4.1)


DAITSS Functional Architecture

[Diagram: components Ingest (SIP in, AIP out), Storage management, Dissemination (DIP out), Reporting, and the management database (Mgmt DB).]


Ingest Functions

• METS validation and metadata extraction

• File format identification and validation

• Extraction of technical metadata

• Harvesting of external files

• Normalization and Forward Migration

• AIP creation

• Storage update
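The format identification step can be illustrated with a minimal sketch. It assumes identification by leading "magic" bytes, which is one common technique and not necessarily how DAITSS does it. The `SIGNATURES` table and `identify_format` are hypothetical names, and the table holds only a few well-known signatures.

```python
# Hypothetical signature table: leading "magic" bytes -> format label.
# A production registry would be far larger; these entries are well-known.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x1f\x8b", "gzip"),
    (b"<?xml", "XML"),
]


def identify_format(data: bytes) -> str:
    """Return a format label based on the first bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


if __name__ == "__main__":
    print(identify_format(b"%PDF-1.4\n..."))
    print(identify_format(b"plain text with no signature"))
```

Identification is not validation: a file can start with the right bytes and still violate its format specification, which is why the slide lists the two together.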


What’s a (S)(A)(D)IP anyway?

[Diagram: a SIP containing an XML file, a PDF file, and an AVI file.]


[Diagram: the same SIP (XML, PDF, AVI) shown next to the resulting AIP, which contains multiple XML files and TIFF files, with a Database alongside.]


Theory 3: Risk Management


Formats

• Risk of format obsolescence

• Risk of loss in migration

• Action Plans and Background Reports
– whether to normalize
– long-term strategy and short-term actions
– when to revisit


Background Reports

• Format description

• Pointer to specification

• How to recognize

• History and duration

• Openness, maintenance body

• Platform support

• Legal issues

• Perceived popularity

• Limitations

• Related specifications

• Conclusions

• ALL GOOD THINGS FOR A GLOBAL DIGITAL FORMATS REGISTRY!


TANSTAASF

• There ain’t no such thing as a simple format
– XML?
  • Extension technologies
  • External references (DTDs, entity references, Schema, external files, stylesheets, …)
– ASCII?
  • No way to indicate character encoding
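The character encoding point can be made concrete with a small sketch. The byte string and the `decode_candidates` helper are illustrative assumptions, not part of DAITSS. The same four bytes are decoded under several encodings, and nothing in the bytes says which reading was intended.

```python
def decode_candidates(data: bytes,
                      encodings=("ascii", "latin-1", "cp1251", "utf-8")) -> dict:
    """Try each encoding; map it to the decoded text, or None if decoding fails."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


if __name__ == "__main__":
    raw = b"caf\xe9"  # a "plain text" file with no encoding label
    for enc, text in decode_candidates(raw).items():
        print(enc, repr(text))
```

Two of the encodings accept the bytes and yield different text, so a "text file" without recorded encoding metadata is ambiguous even though every byte is preserved.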


Redundancy

• Content:
– multiple independently written masters
– routine normalization
– bit preservation of original
– retention of intermediate versions

• Integrity: SHA-1 and MD5 checksums

• Metadata: in XML with content and in RDBMS
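A minimal sketch of the integrity check, assuming Python's standard `hashlib`. The function names `compute_checksums` and `verify` are hypothetical. Both digests are computed in a single pass over the data, so large files are read only once.

```python
import hashlib
import io


def compute_checksums(stream, chunk_size=65536) -> dict:
    """Read a binary stream once and return its SHA-1 and MD5 hex digests."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        sha1.update(chunk)
        md5.update(chunk)
    return {"sha1": sha1.hexdigest(), "md5": md5.hexdigest()}


def verify(stream, expected: dict) -> bool:
    """Return True only if both recorded digests still match the stream."""
    return compute_checksums(stream) == expected


if __name__ == "__main__":
    recorded = compute_checksums(io.BytesIO(b"archived content"))
    print(verify(io.BytesIO(b"archived content"), recorded))
    print(verify(io.BytesIO(b"archived c0ntent"), recorded))
```

Recording two independent digests is itself a form of redundancy: an accidental change would have to collide under both algorithms to go unnoticed.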


Metadata Redundancy

• How to store all metadata pertaining to an object with the object?

• No existing / suitable METS extension schema

• Direct map to DAITSS tables
– elements for each table
– sub-elements for each column
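The direct table-to-XML mapping might look like the sketch below. The root element name `daitss`, the table name `DATA_FILE`, and its columns are hypothetical placeholders, not the actual DAITSS schema. Each database row becomes one element named after its table, with one sub-element per column.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(tables: dict) -> ET.Element:
    """Map {table_name: [row_dict, ...]} to XML: an element per table row,
    a sub-element per column."""
    root = ET.Element("daitss")  # hypothetical wrapper element
    for table_name, rows in tables.items():
        for row in rows:
            table_el = ET.SubElement(root, table_name)
            for column, value in row.items():
                ET.SubElement(table_el, column).text = str(value)
    return root


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    tables = {"DATA_FILE": [{"DFID": "F0001", "SIZE": 1024}]}
    print(ET.tostring(rows_to_xml(tables), encoding="unicode"))
```

Because the mapping is mechanical, the XML stored with the object can be regenerated from the database, or the database rebuilt from the XML.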


Theory 4: File formats


Preferred file formats

• Pass fidelity test

• Pass “future” test
– Well documented, well supported
– Standards or de facto standards (widely used)
– Without proprietary technologies e.g. codecs

• Without access inhibitors e.g. encryption


Preferred file formats for FDA

• We can’t control what comes in

• Will do bit-level preservation on anything

• Will normalize to preferred format if possible

• Encourage use of preferred formats on campuses
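The policy above can be sketched as a small decision function. The `NORMALIZATION_TARGETS` table and its entries are hypothetical examples, not FDA's actual Action Plans. Every file gets bit-level preservation, and normalization is added only when a preferred target format is known.

```python
# Hypothetical mapping from incoming format to preferred format.
# Real decisions would come from the archive's Action Plans.
NORMALIZATION_TARGETS = {
    "GIF": "TIFF",
    "DOC": "PDF",
}


def plan_treatment(file_format: str) -> list:
    """Return the preservation actions for a file of the given format."""
    actions = ["bit-level preservation of original"]
    target = NORMALIZATION_TARGETS.get(file_format)
    if target is not None:
        actions.append("normalize to " + target)
    return actions


if __name__ == "__main__":
    print(plan_treatment("GIF"))
    print(plan_treatment("XYZ"))
```

Nothing is rejected at ingest under this scheme; an unrecognized format simply receives the lower preservation level.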


But what’s a file format anyway?

• Format profiles, e.g. GeoTIFF or XML document with DTD

• Technical characteristics adhere to bitstreams

[Diagram: a TIFF 6.0 file containing the bitstreams Metadata-1, Image-1, Image-2, and Metadata-2.]


And files can have multiple layered formats

[Diagram: Foo.AVI, Foo.PDF, and Foo.XML packed inside Foo.tar, which is in turn wrapped as Foo.tgz.]
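Layered formats can be demonstrated with a short sketch using Python's standard `gzip` and `tarfile` modules. The helper names `build_example_tgz` and `peel_layers` are hypothetical, and the payloads are tiny stand-ins for real files. The archive has to be unwrapped one layer at a time before the formats of the inner files can even be examined.

```python
import gzip
import io
import tarfile


def build_example_tgz() -> bytes:
    """Create an in-memory Foo.tgz holding stand-ins for Foo.XML and Foo.PDF."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in [("Foo.XML", b"<?xml version='1.0'?><foo/>"),
                              ("Foo.PDF", b"%PDF-1.4\n")]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def peel_layers(data: bytes) -> list:
    """Report each wrapper format found, outermost first, then member names."""
    layers = []
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        layers.append("gzip")
        data = gzip.decompress(data)
    if data[257:262] == b"ustar":  # tar magic at offset 257 of the first header
        layers.append("tar")
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
            layers.append(sorted(tar.getnames()))
    return layers


if __name__ == "__main__":
    print(peel_layers(build_example_tgz()))
```

A single "file format" field cannot describe such an object: the outer file is gzip, its content is tar, and each member has a format of its own.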


DAITSS Data Model

[Diagram: entities and cardinalities: Information Package, Intellectual entity (1), Data File (1..n), Bitstream (0..n).]
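One way to read the data model is sketched below with Python dataclasses. The nesting is an assumption inferred from the cardinalities on the slide (a package describes one intellectual entity, holds one or more data files, and each data file holds zero or more bitstreams). The class and field names are hypothetical, not DAITSS code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bitstream:
    kind: str  # e.g. "Image", "Audio", "Text", "Video"


@dataclass
class DataFile:
    name: str
    bitstreams: List[Bitstream] = field(default_factory=list)  # 0..n


@dataclass
class InformationPackage:
    intellectual_entity: str    # exactly 1
    data_files: List[DataFile]  # 1..n

    def __post_init__(self):
        # Enforce the lower bound of the 1..n cardinality.
        if not self.data_files:
            raise ValueError("an information package needs at least one data file")


if __name__ == "__main__":
    tiff = DataFile("page1.tif", [Bitstream("Image"), Bitstream("Image")])
    package = InformationPackage("Example entity", [tiff])
    print(len(package.data_files), len(tiff.bitstreams))
```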


DAITSS Data File Object

[Diagram: Data File object hierarchy with the labels Data File, Text File, PDF File, TIFF File, Markup File, XML, SGML, and DTD.]


DAITSS Bitstream Object

[Diagram: Bitstream object hierarchy with the labels Bitstream, Audio, Image, Text, Video, JPEG Image, and TIFF Image.]


Environment

• Software (rendering, runtime, OS, driver)

• Hardware (processor, memory, video card)

• Is environment a property of file format?

• Which of many environments do you record?

• To be meaningful, must environment be arbitrarily recursive?


http://www.fcla.edu/digitalArchive/
[email protected]