
© 2011 Adobe Systems Incorporated. All Rights Reserved.

Acrobat X and PDF/A Standards Long Term Preservation of Documents

Rick Borstein Business Development Manager [email protected]

Mark Middleton Business Development Manager [email protected]

Presenter
Presentation Notes
Mark Hello and welcome to Acrobat X and PDF/A Standards, an Adobe eSeminar. My name is Mark Middleton and I am an Acrobat Business Development Manager at Adobe. I want to thank you for joining us. With me today is Rick Borstein, who is also a Business Development Manager for Adobe. Many of our customers, especially government regulatory bodies, records managers, archivists, and industry compliance professionals, are concerned about the long-term preservation of digital documents. The purpose of today’s event is to make you aware of how Adobe Acrobat X can be used to easily preserve and protect final documents of record as self-contained files, helping to ensure future access to information. More specifically, today we’ll be talking about PDF/A, the PDF for Archiving standard. OK, let’s discuss what we’ll cover.


What we’ll cover . . .

Slides

Digital Archiving

What is PDF/A?

Why use PDF/A?

PDF/A Standards (Flavors)

PDF/A Requirements

PDF/A Conformance

Acrobat PDF/A Matrix

Demonstration

Wrap Up


Presenter
Presentation Notes
MARK Thanks for sharing that information with us. To kick it off, I’m going to quickly cover our agenda for today. First, we’ll have a few slides to cover digital archiving concepts, some definitions, and information about the PDF/A standard, including the various PDF/A “flavors”. During the slides, we’ll also talk about PDF/A conformance and requirements, and how the various Acrobat products can view and create PDF/A files. Oh, and by the way, when I say conformance, I mean taking an existing PDF file and turning it into a valid PDF/A file. More on that later. Before we go any further, we want to ask you a few questions in the form of a poll. Rick?


Existing PDF Standards

PDF 1.7 (ISO-32000)

PDF/A (archive): ISO-19005, based on PDF 1.4

PDF/E (engineering): ISO-24517, based on PDF 1.6

PDF/UA (accessibility): ISO-14289, based on ISO-32000

PDF/X (graphic arts): ISO-15930, based on PDF 1.4, 1.6

Presenter
Presentation Notes
Mark There are a number of different PDF standards. The ones we’ve listed here are ISO (International Organization for Standardization) standards. ISO, in case you’re not familiar with it, is a standards body founded in 1947 to help promote worldwide standards in manufacturing and commerce. Today, there are 163 ISO member nations, so ISO is truly a worldwide organization. ISO’s work is really broad, and the organization has over 2700 technical committees and subcommittees which create and maintain standards of many types. Rick I first learned about ISO many years ago, as a kid, when I was buying film for my camera. That box of Kodachrome had an ISO film speed rating and I had to ask my Dad what ISO was. Mark Film, yes, for sure, but that’s just one example among thousands of ISO standards. By the way, you might have noticed that each of these PDF specifications has an ISO number assigned to it. For example, PDF itself is ISO-32000. Rick Right. In 2007, Adobe released the PDF specification to ISO. ISO-32000 is essentially the PDF 1.7 specification, which corresponds to Acrobat 8. Mark PDF/A is ISO-19005 and it is based on PDF 1.4, the Acrobat 5 specification. So, as you can see, PDF/A predates the ISO-32000 standard. You might be wondering, what is the relationship between ISO and Adobe? Adobe, as you might expect, has representatives on the ISO committees that work on the PDF standards and has been very active in leading many of them. I’m told that Adobe’s PDF Architect and Standards Evangelist Leonard Rosenthol is watching today, and answering questions, so we’d better stay on our toes.


What is PDF/A?

An ISO Standard

“... a mechanism for representing [PDF] electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems used for creating, storing, or rendering the files.” ISO Committee

A long term preservation format

Presenter
Presentation Notes
Mark So, let’s drill into PDF/A. Well, first of all, PDF/A is an ISO standard, so it is a worldwide, non-proprietary specification. And I’m just going to read this . . . Here’s what ISO says is the intention of the PDF/A standard . . . It’s “... a mechanism for representing PDF electronic documents in a manner that preserves their visual appearance over time, independent of the tools and systems used for creating, storing, or rendering the files.” PDF/A is designed for the LONG TERM preservation of digital material, so that the material remains readable fifty years, a hundred years, or even much longer from now.


Preservability – Format Comparison

Criteria compared: fidelity, presentation & reliability; access to content & searching; dependencies (platform, application, OS); access to specifications; file size

Formats compared: PDF; image formats (TIFF, JPEG, BMP); native formats (doc, Office); HTML; XML

[Chart: rating of each format against each criterion]

Presenter
Presentation Notes
Mark When ISO met to choose a new digital preservation standard, they chose to base the standard on PDF. Why, you might ask, did they recommend PDF instead of, say . . . TIFF? While image files like TIFF and JPEG are widely used for capturing and keeping information, they’re not particularly searchable, which limits their usefulness for multi-page documents that need to be searched easily. Why not Office file types like Word? One reason is that native formats suffer from platform and software dependencies: how can you guarantee that a copy of an operating system (or the hardware on which to run it) will be available even in 10 years’ time? What about XML? That seems to be a hot topic these days. Well, XML doesn’t really provide a human-friendly view of information, among other considerations. What about HTML? Well, you probably know that web sites look different depending on the browser you use, so preserving the visual appearance can’t be guaranteed. Ultimately, PDF provided the best fidelity to the original document, combined with the best access to content, without a dependency on platform or application. Basing an open format on PDF also ensured that the specification would be published and publicly available, so that anyone could access the information, even building their own viewer if Adobe Reader were no longer in existence. The PDF/A working group decided on PDF because of its merits for long-term preservation of digital information.


When do you need to use PDF/A?

Regulatory Requirements

Federal Courts

State Courts

National Archives and Records Administration

California Public Utilities Commission

Preserve important documents

Protect your organization and reduce risk

Presenter
Presentation Notes
Mark Although many organizations are choosing PDF/A, they may have different reasons for doing so. You or your organization may be obliged to provide PDF/A files to a regulatory body. We’ve listed just a few examples here. Recently, the US Federal Courts announced on their PACER website that they are transitioning to PDF/A. We expect many state courts to follow their lead. Other regulatory agencies, like the California Public Utilities Commission, require PDF/A. Rick That makes sense. I would certainly hope that the records for nuclear power plants could be read hundreds of years from now. Mark Me, too. Enterprises are also choosing PDF/A to preserve important archive documents like financial statements, compliance reports, technical information, etc. Another reason to choose PDF/A is to protect your organization and reduce risk. For example, let’s say you are an engineering or architecture firm. You may be called upon to defend your work in court, perhaps years from now. With computer systems changing at the pace we’ve seen, it’s hard to say whether an AutoCAD 2010 file will open correctly in AutoCAD 2025. If you save your key work in PDF/A, then you know you’ll be able to open that document years from now. Moreover, you can argue that you used the best practice available at the time for preserving documents.


PDF/A-1A and PDF/A-1B Specifications

PDF 1.4 Reference Acrobat 5.0

Recommended Features

Required Features

Prohibited Features

Presenter
Presentation Notes
Mark Let’s dive into the PDF/A specification a bit. In this case, I’m going to talk specifically about PDF/A-1A and PDF/A-1B. These PDF/A flavors are based on the Acrobat 5 specification, which is called PDF 1.4. The PDF/A spec calls out features which are required, features which are prohibited and features which are recommended. Let’s look into all the PDF/A flavors and compare them. Rick?


Comparing PDF/A “Flavors”

Flavors compared: PDF/A-1A, PDF/A-1B, PDF/A-2A, PDF/A-2B, PDF/A-2U

Version compatibility: PDF 1.4 (PDF/A-1A, 1B); ISO-32000 (PDF/A-2A, 2B, 2U)

Embed all fonts: Required (all flavors)

Multimedia: Prohibited (all flavors)

JavaScript: Prohibited (all flavors)

Encryption: Prohibited (all flavors)

Attachments: Prohibited (PDF/A-1A, 1B); PDF/A attachments only (PDF/A-2A, 2B, 2U)

Transparency: Prohibited (PDF/A-1A, 1B); Allowed (PDF/A-2A, 2B, 2U)

Tagged & accessibility: Required (PDF/A-1A, 2A); Optional (PDF/A-1B, 2B, 2U)

Searchable Unicode text: Required (PDF/A-1A, 2A, 2U); Optional (PDF/A-1B, 2B)

Metadata: Required, document only (PDF/A-1A); Optional (PDF/A-1B); Required, document & object (PDF/A-2A); Optional (PDF/A-2B, 2U)

Presenter
Presentation Notes
Rick Thanks, Mark. Oh, how I loathe sharing these eye charts, but I need to tell you what is allowed and not allowed in the various flavors of PDF/A. So, starting at the left side of the chart, we have PDF/A-1A and PDF/A-1B, which are the original PDF/A flavors. You’ll notice that the first line in the chart is version compatibility. That can be a little confusing; at least it was to me. Mark That doesn’t sound that confusing to me. It’s the version of the PDF file, right? Rick No, not exactly, and that’s what confused me. Version compatibility means that the file doesn’t use any features newer than those included in Acrobat 5. It doesn’t actually mean that the file has to be a PDF 1.4 file. You could have a valid PDF/A-1 file that is, in fact, a PDF 1.5 file or later. How weird is that? Mark Plenty weird, if you ask me. Rick Now, the PDF/A-2 flavors were just ratified in October of last year, and they are based on ISO-32000, the Acrobat 8 specification. There are a number of advantages to having any PDF based on newer specifications. For example, ISO-32000 can compress more of a document’s internal structure, and that results in smaller documents. All versions of PDF/A require all of the fonts to be embedded. We’ll cover how to do this during the demonstration. One principle of PDF/A is that documents must be self-contained, with no reliance on external players. For that reason, multimedia, such as movies and sounds, is not allowed. Since archives should not change, JavaScript and any other dynamic elements are prohibited. Here’s another prohibition: no encryption allowed. Mark Well, that seems pretty obvious. After all, what good is your important document fifty years from now if you’ve forgotten the Open password? Rick Good point. Now, we are going to talk about some of the differences between PDF/A flavors. Using Acrobat, you can embed a file into a PDF. That’s not allowed in the PDF/A-1 flavors.
In PDF/A-2, you can embed files as long as the embedded files are also PDF/A. Another difference is in the area of transparency. ISO-32000 PDFs support transparency, which is important in print workflows and even for preserving the fidelity of PowerPoint presentations. Accessibility and tagging are another area where there are differences. Mark, you’re always good at explaining that . . . Mark A tagged or accessible PDF has structural information in it so that people who are visually impaired can use it with screen reading software on their computers. For example, let’s say you had a PDF page with three columns. Because the document is “tagged”, the screen reading software knows to read column 1, followed by column 2, followed by column 3. Rick Another difference between the various flavors is the requirement for searchable Unicode text. Unicode simply means that you can represent a very large number of different characters, supporting many different languages. A PDF you generate from Word would meet that requirement. But what about a scanned image of, say, a paper memo? Mark Well, you could OCR the document in Acrobat to make it searchable. Rick Right, you could. This is an interesting change that the PDF/A committee made when they rolled out PDF/A-2. One thing they recognized is that they didn’t have a standard for searchable, scanned documents. Searchable text is optional with PDF/A-1B and, for reasons we will see later, it can be really challenging to take paper documents and make them accessible. So, we now have PDF/A-2U, which requires searchable text but does not require tagging. Finally, and I can hear our legal customers on the call getting restless, there is the way that PDF/A deals with metadata. Metadata is information about a file, such as title, subject, author, etc. Metadata could also be information about an element inside the file . . . for example, the metadata attached to a photo from a digital camera. Descriptive metadata like this is optional in PDF/A, but it is recommended. Every PDF/A file does, however, have to carry the metadata that identifies it as PDF/A.
One difference in PDF/A-2 is that object metadata is supported.
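To make the rows of the flavor comparison chart concrete, here is a minimal Python sketch that checks a document summary against one flavor’s rules. It is not a PDF/A validator: it assumes the hard part (parsing the PDF and deciding whether fonts are embedded, whether it is tagged, and so on) has already been done elsewhere, and the `DocFeatures`, `RULES`, and `check_flavor` names are hypothetical. It covers only the chart rows discussed here, leaves out the metadata row, and ignores the many detailed requirements the standard itself imposes.

```python
from dataclasses import dataclass, field

# Hypothetical summary of what a PDF contains. A real tool would fill this in
# by parsing the file; these field names are illustrative, not a real API.
@dataclass
class DocFeatures:
    fonts_embedded: bool = True
    has_multimedia: bool = False
    has_javascript: bool = False
    encrypted: bool = False
    attachments_pdfa: list = field(default_factory=list)  # one bool per attachment: is it PDF/A?
    uses_transparency: bool = False
    tagged: bool = False
    unicode_text: bool = False

# Simplified rules transcribed from the flavor comparison chart (metadata row omitted).
RULES = {
    "1A": {"part": 1, "tagged": True,  "unicode": True},
    "1B": {"part": 1, "tagged": False, "unicode": False},
    "2A": {"part": 2, "tagged": True,  "unicode": True},
    "2B": {"part": 2, "tagged": False, "unicode": False},
    "2U": {"part": 2, "tagged": False, "unicode": True},
}

def check_flavor(doc, flavor):
    """Return chart-level violations of the given flavor (empty list = none found)."""
    rule = RULES[flavor]
    problems = []
    # Rows that are the same for every flavor.
    if not doc.fonts_embedded:
        problems.append("all fonts must be embedded")
    if doc.has_multimedia:
        problems.append("multimedia is prohibited")
    if doc.has_javascript:
        problems.append("JavaScript is prohibited")
    if doc.encrypted:
        problems.append("encryption is prohibited")
    # Rows where PDF/A-1 and PDF/A-2 differ.
    if rule["part"] == 1:
        if doc.attachments_pdfa:
            problems.append("attachments are prohibited in PDF/A-1")
        if doc.uses_transparency:
            problems.append("transparency is prohibited in PDF/A-1")
    elif not all(doc.attachments_pdfa):
        problems.append("PDF/A-2 allows only PDF/A attachments")
    # Rows that depend on the conformance level (A, B or U).
    if rule["tagged"] and not doc.tagged:
        problems.append("document must be tagged")
    if rule["unicode"] and not doc.unicode_text:
        problems.append("text must be searchable Unicode")
    return problems

# A scanned memo without OCR: no tags, no searchable text.
memo = DocFeatures()
print(check_flavor(memo, "1B"))  # []
print(check_flavor(memo, "2U"))  # ['text must be searchable Unicode']
```

The memo example mirrors the discussion: a plain scan can meet PDF/A-1B as-is, while PDF/A-2U additionally needs searchable text, for instance after running OCR.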


More Importantly . . .

PDF/A-1a ensures the preservation of a document’s logical structure and content text stream in natural reading order. The text extraction is especially important when the document must be displayed on a mobile device (for example a PDA) or other devices in accordance with Section 508 of the US Rehabilitation Act. In such cases the text must be reorganized on the limited screen size (re-flow). This feature is also known as “Tagged PDFs”.

Rick’s Observations

You need to start with source documents such as Word files

Bringing an existing PDF into PDF/A-1a conformance is extremely difficult

PDF/A-1b ensures that the text (and additional content) can be correctly displayed (e.g. on a computer monitor), but does not guarantee that extracted text will be legible or comprehensible. It therefore does not guarantee compliance with Section 508.

Rick’s Observations

Can bring most PDFs into compliance

Presenter
Presentation Notes
Rick While I covered PDF/A-2, it’s so new that I don’t know of any government agencies that require it yet. So, I’m going to confine my discussion to the differences between PDF/A-1A and PDF/A-1B that matter TO YOU. 1A is sometimes called a “Full Conformance” file and you might think . . . well, I should do that. Don’t go there, seriously. Let’s say you have a PDF you want to make into a PDF/A-1A file. Tagging the document for accessibility can be very time consuming. Realistically speaking, the only way you’re going to be able to create PDF/A-1A files is to start upstream in the authoring application. The Adobe PDF Makers support tagging. Few clones do, and those that do often don’t do it effectively. PDF/A-1B is much more forgiving. It doesn’t require tagging. Scanned documents are fine and they don’t even need to be searchable.


What software should you choose to work with PDF/A?

PDF/A conformance requires a significant effort from the producers of PDF software.

Creation: This stage deals with the generation of PDF/A-conforming documents from various source formats.

Correction: A PDF document may have to be modified in order to achieve conformance with PDF/A.

Processing: Conformance must be preserved when a PDF/A document is modified.

Display: This refers to the presentation of a PDF/A file in accordance with the requirements. Simply displaying a PDF/A file “somehow”, as is the case with many viewers, is insufficient.

Validation: It is often necessary to verify that a PDF/A file actually conforms to the standard.


Source: PDF/A Competence Center, www.pdfa.org

Presenter
Presentation Notes
Rick So, you need to work with PDF/A. What software should you use? Mark Selfishly, I’d have to say Adobe’s . . . Rick I’m not going to argue with that, but you may want to consider what an independent group like the PDF/A Competence Center says. One thing they say is that PDF/A requires a lot of work on the part of developers. And there’s more to getting it right than just creating a PDF/A file, even assuming you can get that part right. You also need to be able to correct issues, display PDF/A files correctly, and validate the results. Mark So, to me, that sounds like PDF/A is more than a single bullet on a feature chart. Rick There are lots of tools that work with PDF out there. I always like to say that Adobe does more with PDF than any other vendor, and we do more of it right. We think that makes a difference.


PDF/A Workflow Considerations


Should I create all documents as PDF/A? Change docs? Combine docs?

Use “Near-to PDF/A” settings for documents: embed all fonts, correct version

Watch out for metadata scrubbers: they remove PDF/A information

Know how your tools treat PDF/A: many tools invalidate PDF/A without warning

Presenter
Presentation Notes
Rick Every organization is different, but I’d like to talk about a few issues that have come up with our customers who want to implement PDF/A in their organizations. One question we hear a lot is: should I routinely create PDF/A instead of using one of the other PDF settings in Acrobat? Mark Well, Rick, I don’t think there is a clear answer to that . . . It depends. Rick Right, it depends primarily on whether you will later need to perform operations on the PDFs, for example, taking out pages, inserting pages, combining with other files, etc. A lot of the law firms that have been calling me over the last couple of months want to know what to do since the Federal Courts are moving to PDF/A. Since law firms regularly edit PDFs and add supplementary materials, I don’t recommend that firms create PDF/A files all the time. Instead, I recommend using settings that are near-to PDF/A and allow for easy conformance. As I mentioned earlier, one important requirement for PDF/A is that all fonts are embedded. If you change your default PDF settings to always embed fonts, it will be much easier to conform to PDF/A. Another consideration is the use of metadata scrubbers. Metadata scrubbing software is available from a variety of vendors, and we see it primarily in law firms, corporate legal departments and government agencies. These kinds of applications are often integrated with Outlook and remove metadata from a variety of file types, including PDF, to prevent accidental disclosures. Unfortunately, removing PDF metadata also invalidates PDF/A documents. So, that could be a real problem. Finally, know how your PDF tools treat PDF/A. Most PDF editing products do not warn you when you open or edit a PDF/A file, and you can easily invalidate it.
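The “embed all fonts” advice maps onto something concrete in the file. In a PDF, a font descriptor points to an embedded font program through a FontFile (Type 1), FontFile2 (TrueType), or FontFile3 (other formats, such as CFF) entry; a font with none of these is not embedded. The Python sketch below is a simplified pre-check, not a validator: it assumes some PDF library has already parsed the fonts into plain dictionaries (the input shape and the `unembedded_fonts` name are hypothetical), and it ignores Type 3 fonts and the descendant fonts of composite (Type 0) fonts.

```python
# Font-descriptor keys that reference an embedded font program.
EMBED_KEYS = ("FontFile", "FontFile2", "FontFile3")

def unembedded_fonts(fonts):
    """fonts: {base_font_name: font_descriptor_dict or None} (hypothetical shape).

    Returns the sorted names of fonts that carry no embedded font program.
    """
    missing = []
    for name, descriptor in fonts.items():
        # No descriptor at all (allowed for the standard 14 fonts such as
        # Helvetica in older files) means there is nowhere to embed a program.
        if not descriptor or not any(key in descriptor for key in EMBED_KEYS):
            missing.append(name)
    return sorted(missing)

fonts = {
    "ArialMT": {"FontName": "ArialMT", "FontFile2": b"..."},  # embedded TrueType
    "Helvetica": None,                                       # not embedded
}
print(unembedded_fonts(fonts))  # ['Helvetica']
```

An empty result is what “near-to PDF/A” settings aim for: with every font already embedded, one of the most common obstacles to later conformance is out of the way.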


PDF/A View Mode


Presenter
Presentation Notes
Mark One exclusive feature of the entire Acrobat product line, including Reader, is PDF/A View Mode. PDF/A View Mode automatically detects PDF/A files and prevents you from changing your archive file. After all, if you edit the file, it’s no longer an archive, is it? We recommend leaving PDF/A View Mode on to prevent accidentally changing archive files. We’ll show you how to turn it off, too, if needed.
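PDF/A View Mode has to recognize an archive file somehow. A PDF/A file declares its part and conformance level in its XMP metadata, using the PDF/A identification schema (`pdfaid:part` and `pdfaid:conformance`). The Python sketch below reads that declaration from an XMP packet that is assumed to have been extracted from the file already; it only illustrates the idea, is not how Acrobat or Reader is implemented, and `pdfa_flavor` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespaces that appear in an XMP packet. "pdfaid" is the PDF/A
# identification schema, where a file declares its part and conformance level.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "pdfaid": "http://www.aiim.org/pdfa/ns/id/",
}

def pdfa_flavor(xmp_xml):
    """Return a label such as 'PDF/A-1B' if the XMP declares PDF/A, else None."""
    root = ET.fromstring(xmp_xml)
    pdfaid = "{" + NS["pdfaid"] + "}"
    for desc in root.iter("{" + NS["rdf"] + "}Description"):
        # XMP allows properties as attributes or as child elements; check both.
        part = desc.get(pdfaid + "part") or desc.findtext("pdfaid:part", namespaces=NS)
        level = desc.get(pdfaid + "conformance") or desc.findtext("pdfaid:conformance", namespaces=NS)
        if part:
            return "PDF/A-" + part.strip() + (level or "").strip().upper()
    return None

sample = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
        pdfaid:part="1" pdfaid:conformance="B"/>
  </rdf:RDF>
</x:xmpmeta>"""
print(pdfa_flavor(sample))  # PDF/A-1B
```

If a metadata scrubber strips the packet, or just the `pdfaid` properties, this check returns `None`; that is one way to see why scrubbing metadata invalidates a PDF/A file.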


PDF/A Features in Acrobat X Products

Use Actions to automate PDF/A Conversion, Conformance, Verification

Validate and Report PDF/A Compliance

Use PREFLIGHT to conform to PDF/A-1A, 1B, 2A, 2B or 2U

Use SAVE AS to conform to PDF/A-1A, 1B, 2A, 2B or 2U

Scan to PDF/A-1B using Create PDF from Scanner

Create PDF/A-1B using Adobe PDF Print Driver

Create PDF/A-1A using Office PDF Makers

PDF/A View Mode and Standards Panel

Products compared: Reader, Standard, Pro, Suite

Presenter
Presentation Notes
MARK While the entire Acrobat product line supports PDF/A, there are some differences between Acrobat products that you should know about. Let’s take a quick look at the products. [build] Adobe Reader is installed on over 90% of all internet-connected desktops. Adobe Reader supports PDF/A View Mode, so Reader users will be alerted that they are looking at an archive file. And you may be thinking, well, Reader can’t really change a file, right? That’s not completely true. Reader X users have the sticky note and highlight tools, and Reader users can change form information, too. The whole product line also supports the Standards panel, which offers details about PDF/A. [ build ] Adobe Acrobat Standard can create PDF/A-1A files using the Office PDF Makers for Word, Excel and PowerPoint. Using the PDF Print Driver, Standard users can create PDF/A-1B files from any application that can print. Acrobat Standard offers a great scanning interface that can scan, OCR, add metadata and save as PDF/A-1B in a single step. We’ll show that to you later. [ build] Next are Adobe Acrobat X Pro and the Pro Suite. These products are a superset of Acrobat Standard, which means that they contain the same set of PDF/A features plus several additional ones. One big difference is that Acrobat Pro can take existing, non-compliant PDFs and CONFORM them to the standard using Preflight. Acrobat Pro can also do this via File > SAVE AS. One other important feature of Acrobat Pro is that it can verify and report on PDF/A compliance. It can even embed the report in the PDF/A file while maintaining adherence to the standard. Lastly, Acrobat Pro can automate many PDF/A steps. For example, let’s say you wanted to combine a Word file, an Excel file and an existing PDF, add a unifying set of headers and footers, and export the newly combined file as PDF/A. No problem. Acrobat Pro can automate many PDF-related operations, including conforming to PDF/A.


Demonstration

PDF Creation

PDF/A-1A with the PDF Makers in Word, Excel, PowerPoint

PDF/A-1B using the PDF Print Driver and from a scanner

Creating Near-PDF/A PDF Settings

PDF/A View Mode

Standards Panel

Preflight (Acrobat Pro Only)

Verify Compliance

Report on Compliance

Bring documents into Compliance

Removing PDF/A Information

Actions and PDF/A

Presenter
Presentation Notes
Rick I think that’s enough slides. Mark I couldn’t agree more. Rick Well, here’s what we’re going to cover in our demonstration. Let’s get started.


PDF/A Resources

Acrobat Standards Page http://www.adobe.com/products/acrobat/standards.html

PDF/A Competence Center http://www.pdfa.org

“Long Term Storage” from Acrobat User Community Website http://acrobatusers.com/tutorials/long-term-pdf-storage

AIIM PDF/A Committee http://www.aiim.org/Resources/Standards/Committees/PDFA

ISO PDF/A Documentation http://www.iso.org/iso/catalogue_detail?csnumber=38920

Presenter
Presentation Notes
Resources