What Agencies Should Know About PDF/A-1 April 6, 2006 Mark Giguere mark.giguere@nara.gov

Page 1: What Agencies Should Know About PDF/A-1 April 6, 2006 Mark Giguere mark.giguere@nara.gov

What Agencies Should Know About PDF/A-1

April 6, 2006

Mark Giguere

mark.giguere@nara.gov

Introduction

Agenda

• Why long-term preservation of PDF is an issue

• Overview of PDF/A-1 and the ISO Process

• Discussion of PDF/A-1 Standard and NARA’s Transfer Guidance for Permanent PDF records

• Roles of both PDF/A-1 and NARA’s PDF Transfer Guidance in Federal recordkeeping

• Conclusion and Questions

Long-term preservation of PDF is an issue

Wide use of PDF

• PDF is a ubiquitous open format for electronic documents

– Proprietary, but with publicly available specification

• Much important information maintained in PDF

– Permanent archival records, in some cases

• The feature-rich nature of PDF can complicate preservation efforts

PDF Not a Suitable Archival Format

• PDF itself is not suitable as an archival format

– Some features not compatible with current archival requirements

    • Not necessarily self-contained

    • Encryption

    • Not all PDFs are created equal

• Long-term solution needed

– Permanent archival records, in some cases

– Administrative Office of U.S. Courts initiated idea for an ISO Standard based on PDF (PDF/A)

Overview of PDF/A-1 and the ISO Process

• Multi-part ISO International Standard

– ISO 19005-1:2005, Document management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1)

– Part 2 (19005-2) intended to bring PDF/A into conformance with PDF 1.6

– Part 3 (19005-3) intended to address dynamic content (e.g., JavaScript)

– And additional future parts, as necessary

PDF/A-1 Approach

• PDF/A-1 specifies:

– The subset of PDF components (from the PDF 1.4 Reference) that are either required, restricted, or prohibited, and

– How these components may be used by software

[Diagram: PDF/A as a subset of the PDF 1.4 Reference, specifying required, restricted, and prohibited features]

PDF/A-1 Requirements

• Disallows or limits features that could complicate long term preservation, and

• Maximizes:

– Device independence

    • Can be reliably and consistently rendered without regard to the hardware/software platform

– Self-contained

    • Contains all resources necessary for rendering

– Self-documenting

    • Contains its own description

– Transparency

    • Amenable to direct analysis with basic tools

PDF/A-1 Table of Contents

• 1 Scope

• 2 Normative References

• 3 Terms and Definitions

• 4 Notation

• 5 Conformance Levels

• 6 Technical Requirements

– 6.1 File Structure

– 6.2 Graphics

– 6.3 Fonts

– 6.4 Transparency

– 6.5 Annotations

– 6.6 Actions

– 6.7 Metadata

– 6.8 Logical Structure

– 6.9 Interactive Forms

• Informative annexes

– Annex A - PDF/A-1 Conformance Summary

– Annex B - Best Practices for PDF/A

• Bibliography

Two Conformance Levels

• Level A - Promotes the creation of PDF/A files with rich semantic and structural information

– Uses “Tagged PDF” and Unicode character maps

• Level B - Allows less complex files such as scanned images

– Includes all requirements of 19005-1 minimally necessary to preserve the visual appearance

– Does not require users to define structure or other descriptive information
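
As a rough illustration of how the two levels relate, the sketch below picks a level from three yes/no traits. The `DocumentTraits` fields and the `conformance_level` function are hypothetical names invented for this example; they are not part of ISO 19005-1, and the standard itself has far more requirements than three flags can capture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocumentTraits:
    """Hypothetical summary of a file; not a structure defined by the standard."""
    meets_level_b: bool     # satisfies the requirements needed to preserve visual appearance
    is_tagged: bool         # carries "Tagged PDF" structure
    has_unicode_maps: bool  # text can be mapped to Unicode


def conformance_level(traits: DocumentTraits) -> Optional[str]:
    """Return "A", "B", or None under the simplified model described above."""
    if not traits.meets_level_b:
        return None  # Level B is the minimum; without it neither level applies
    if traits.is_tagged and traits.has_unicode_maps:
        return "A"   # Level A adds structure and Unicode on top of Level B
    return "B"


# A scanned image with no tagging would land at Level B in this model.
scanned = DocumentTraits(meets_level_b=True, is_tagged=False, has_unicode_maps=False)
print(conformance_level(scanned))
```

The point of the sketch is only that Level A builds on Level B: a file that fails the Level B requirements cannot claim either level.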

Annexes of the Draft PDF/A Standard

• Informative Annexes provide supplemental information including:

– Summary of the PDF structures and components disallowed, required, or limited

– Best Practices for PDF/A-1

    • Guidelines for capturing or converting electronic documents to PDF/A-1

        – To replicate the exact quality and content of source documents

        – Required for compliance with NARA’s PDF Transfer Guidance

PDF/A-1 Dos and Don’ts

PDF/A-1 Dos:

• Embed fonts

• Device-independent color

• XMP metadata

• Tagging

PDF/A-1 Don’ts:

• Encryption

• LZW Compression

• Embedded files

• External content references

• Transparency

• Multi-media

• JavaScript
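
Several of the "Don'ts" above leave recognizable name tokens in the raw bytes of a PDF file, so a first-pass screen can be sketched with basic tools. The example below is a minimal heuristic, assuming that the tokens `/Encrypt`, `/LZWDecode`, `/EmbeddedFiles`, and `/JavaScript` appear uncompressed in the file; the function name `find_dont_markers` is made up for illustration. A hit is only a hint that a prohibited feature may be present, and a clean result does not show that a file conforms to PDF/A-1. Real conformance checking needs a dedicated validator.

```python
import re

# PDF name tokens associated with some of the "Don'ts" listed above.
# The mapping is illustrative and deliberately incomplete.
DONT_MARKERS = {
    "Encryption": rb"/Encrypt\b",
    "LZW compression": rb"/LZWDecode\b",
    "Embedded files": rb"/EmbeddedFiles\b",
    "JavaScript": rb"/JavaScript\b",
}


def find_dont_markers(pdf_bytes: bytes) -> list:
    """Return labels of the markers found in the raw bytes (heuristic only)."""
    return [
        label
        for label, pattern in DONT_MARKERS.items()
        if re.search(pattern, pdf_bytes)
    ]


# Usage with a tiny hand-written fragment rather than a real PDF file.
fragment = b"%PDF-1.4\ntrailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n"
print(find_dont_markers(fragment))
```

To screen a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function.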

NARA’s Expectations for PDF/A

– PDF/A-1 should address some of the PDF archival issues and enable PDF records to be maintained longer as PDF

– Standard maintained by ISO, not just vendors

– Agencies should implement PDF/A-1 along with records management policies and procedures

• Such as:

– NARA’s PDF Transfer Guidance

– AOUSC’s document management program

How NARA is Addressing PDF

• Issued PDF Transfer Guidance in March of 2003

– Allowing agencies to transfer permanent records to NARA in PDF

• Participating in PDF/A ISO Standard Development

– To influence the process

– To gain knowledge

Transfer Format versus File Format

NARA’s transfer guidance and PDF/A-1 have a similar goal: to ensure that valuable electronic information in PDF is not lost.

But different purposes:

• Transfer Format - NARA’s PDF Transfer Guidance

– Specifies NARA transfer requirements

– Applies to existing and future records in PDF

• File Format - The PDF/A ISO Standard (PDF/A-1)

– Specifies a subset of the PDF file format

– More format reliability, fewer “bells & whistles”

– PDF should be maintained longer as PDF (e.g., within agencies)

Scope and Usage

NARA’s PDF Transfer Guidance

• Usage: Instructions on what is required to transfer existing permanent PDF records to NARA.

• Scope

– Applies to permanent records

– PDF 1.0 - 1.4

– Addresses quality criteria, laws and regulations, transfer documentation, NARA contact information

PDF/A-1 ISO Standard

• Usage: Programming specification to create and process the file format

• Scope

– Applies to one aspect of long-term preservation (i.e., file format)

– PDF 1.4

– Addresses how to use the PDF 1.4 reference to create and process a flavor of PDF that is more amenable to long-term preservation.

– Should be used as one piece of the archival puzzle
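
The version ranges in the scope lists above can be checked against the `%PDF-1.x` header that opens a PDF file. The sketch below is one hedged way to do that; `pdf_header_version` and `in_transfer_guidance_scope` are hypothetical helper names, and the check looks only at the header line. A file can declare a different version elsewhere, so the result is an indication rather than a definitive answer.

```python
import re
from typing import Optional, Tuple

# A PDF file normally begins with a header such as "%PDF-1.4".
HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the header at the start of the data, or None."""
    match = HEADER.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def in_transfer_guidance_scope(version: Optional[Tuple[int, int]]) -> bool:
    """True when the header version falls in the PDF 1.0 - 1.4 range listed above."""
    return version is not None and (1, 0) <= version <= (1, 4)


# Usage with header bytes only; a real check would read the start of a file.
print(in_transfer_guidance_scope(pdf_header_version(b"%PDF-1.3\n")))
print(in_transfer_guidance_scope(pdf_header_version(b"%PDF-1.6\n")))
```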

Requirements - PDF/A and NARA’s PDF Transfer Guidance

Embedded fonts

• PDF/A-1 and NARA’s PDF Transfer Guidance both require that fonts be embedded

– NARA guidance phases in requirements for workstation resident fonts.

Encryption

• PDF/A-1 and NARA’s PDF Transfer Guidance both prohibit encryption

– NARA guidance phases in requirement as long as we can open, view and print

Requirements - PDF/A and NARA’s PDF Transfer Guidance

Special Features

• PDF/A-1 restricts special features

– Embedded files, external links, JavaScript

– PDF/A-1 promotes tagged PDF as a higher level of conformance

• NARA evaluates special features on a case-by-case basis at the time of scheduling

Metadata/Documentation

• PDF/A requires that embedded metadata be in Adobe XMP

• NARA requires transfer documentation (e.g., SF-258), and would evaluate embedded metadata at the time of scheduling
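
Because XMP metadata is XML, it can be read with a general-purpose XML parser. The sketch below assumes that a PDF/A file identifies itself in XMP through `pdfaid:part` and `pdfaid:conformance` properties under the namespace `http://www.aiim.org/pdfa/ns/id/`. The slides do not state these details, so treat the namespace URI, the property names, the sample packet, and the `read_pdfa_id` helper as assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the PDF/A identification properties (not given in the slides).
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"

# Hand-written sample packet for illustration; not taken from a real record.
SAMPLE_XMP = """
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
"""


def read_pdfa_id(xmp_xml: str):
    """Return (part, conformance) from an XMP packet, or None for missing values."""
    root = ET.fromstring(xmp_xml)
    found = {}
    for name in ("part", "conformance"):
        qname = f"{{{PDFAID_NS}}}{name}"
        for elem in root.iter():
            # The property may be written as a child element...
            if elem.tag == qname and elem.text:
                found[name] = elem.text.strip()
                break
            # ...or as an attribute on the enclosing description element.
            if qname in elem.attrib:
                found[name] = elem.attrib[qname]
                break
    return found.get("part"), found.get("conformance")


print(read_pdfa_id(SAMPLE_XMP))
```

A declared part and conformance level is only a claim made by the file; it does not replace the case-by-case evaluation described above.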

Requirements - PDF/A and NARA’s PDF Transfer Guidance

Quality Requirements

• PDF/A-1 as a file format does not address quality/creation requirements such as exact replication of source material

– Informative Annex B identifies recommended creation guidelines

– Agencies must implement these guidelines to comply with NARA’s PDF Transfer Guidance

• NARA’s PDF Transfer Guidance includes quality requirements regarding:

– scanning quality

– lossy compression

– substitution of characters with OCR’d text

Take Away

• For records in PDF, agencies need to understand that:

– PDF/A-1 is one option for long-term preservation of electronic documents

– PDF/A-1, by itself, does not guarantee exact replication of source material

– Agencies must implement PDF/A-1 in conjunction with additional requirements to meet NARA standards for transferring permanent records to NARA (i.e., NARA’s PDF Transfer Guidance)

More Information is Available

• More information on NARA’s PDF Transfer Guidance on NARA’s Web Site

– http://www.archives.gov/records-mgmt/initiatives/pdf-records.html

• More information on PDF/A on AIIM Web Site

– http://www.aiim.org/standards.asp?ID=25013

• Contact Susan Sullivan at [email protected]

Questions/Discussion