PDF is dead. Long live PDF!


PDF made easy with iText 7

PDF is dead! Long live PDF!

Benoit Lagae, Developer, iText Software

Bruno Lowagie, Chief Strategy Officer, iText Group

Is PDF dead?

PDF specifications

Everybody uses HTML

Source: http://duff-johnson.com/2014/03/10/98-percent-of-dot-com-is-html-but-38-percent-of-dot-gov-is-pdf/

But governments love PDF

Source: http://duff-johnson.com/2014/03/10/98-percent-of-dot-com-is-html-but-38-percent-of-dot-gov-is-pdf/

Percentage of PDF files:
• .org: 15%
• .gov: 38%
• .edu: 27%

Publication versus …

• No need to be self-contained
• May change over time
• Not all content produced by the author
  • e.g. advertisements
• Becoming more interactive
  • e.g. comments on a news article

… Document

• Self-contained
• Unchanging (non-dynamic)
• Able to be authenticated
• Able to be secured/protected

Not counting HTML, PDF is king

Source: http://duff-johnson.com/2015/02/12/the-8-most-popular-document-formats-on-the-web-in-2015/

Publication: HTML depends on context

Document: PDF is forever

An umbrella of standards:

• PDF (Portable Document Format): first released by Adobe in 1993, ISO standard since 2008 (ISO 32000)
• PDF/X (graphic arts): since 2001, ISO 15930
• PDF/A (archive): since 2005, ISO 19005
• PDF/E (engineering): since 2008, ISO 24517
• PDF/VT (printing): since 2010, ISO 16612
• PDF/UA (accessibility): since 2012, ISO 14289

Related: XFDF (ISO), ECMAScript (ISO), PRC (ISO), PAdES (ETSI), ZUGFeRD

iText 7: a PDF engine

Image example

Image fox = new Image(ImageFactory.getImage(FOX));
Image dog = new Image(ImageFactory.getImage(DOG));
Paragraph p = new Paragraph("The quick brown ")
    .add(fox)
    .add(" jumps over the lazy ")
    .add(dog);
document.add(p);

On the importance of making a document accessible

Can everyone read this?

Some structure is helpful

[Figure: a page annotated with its structure: title, three list items, label, content]

Can everyone read this?

How do we read a spider chart?

[Figure: spider chart with the axes Risk Management; Structured Finance; Mergers & acquisitions; Governance & Internal Control; Accounting Operations; Treasury operations; Management Information & Business Decision Support; Business Planning & Strategy; Finance Contribution to IT Management; Commercial Activities; Taxation; Functional Leadership]

Resolve abbreviations

What goes into rows / columns?

Make info color independent

Is this a better way to read data?

Adapting the ‘quick brown fox’ example for PDF/UA

PDF/UA (part 1)

PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);
// Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
    new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/UA example");
// Create XMP metadata
pdf.createXmpMetadata();

PDF/UA (part 2)

// Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
// PDF/UA: set alt text
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
// PDF/UA: set alt text
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();

Result

On the importance of making a document archivable

PDF/A

• ISO 19005
  – Long-term preservation of documents
  – Approved parts will never become invalid
  – Individual parts define new, useful features
• Obligations and restrictions
  – Metadata: ISO 16684, eXtensible Metadata Platform (XMP)
  – The document must be self-contained:
    • All fonts need to be embedded
    • No external movie, sound or other binary files
  – No JavaScript allowed
  – No encryption allowed

Three standards

• PDF/A-1 (2005)
  – Based on PDF 1.4
  – Level B (“basic”): visual appearance
  – Level A (“accessible”): visual appearance + structural and semantic properties (Tagged PDF)
• PDF/A-2 (2011)
  – Based on ISO 32000-1
  – Features introduced in PDF 1.5, 1.6, and 1.7:
    • Added support for JPEG2000, Collections, object-level XMP, optional content
    • Improved support for transparency, comment types and annotations, digital signatures
  – Level U (“Unicode”): visual appearance + all text is in Unicode
• PDF/A-3 (2012)
  – Based on PDF/A-2 with only one difference: attachments do not need to be PDF/A
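The part/level combinations listed above can be written down as a small lookup. This is only a sketch for the three parts named on the slide; the class `PdfAParts`, the enum `Level`, and both methods are hypothetical names and are not part of iText. It assumes that PDF/A-3 keeps the same levels as PDF/A-2, on which it is based.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical lookup table, not an iText class.
final class PdfAParts {

    // A = accessible, B = basic, U = Unicode
    enum Level { A, B, U }

    // Returns the conformance levels defined for PDF/A part 1, 2 or 3.
    static Set<Level> levelsFor(int part) {
        switch (part) {
            case 1:
                // PDF/A-1 only defines levels A and B.
                return EnumSet.of(Level.A, Level.B);
            case 2:
            case 3:
                // Level U was introduced with PDF/A-2.
                return EnumSet.allOf(Level.class);
            default:
                throw new IllegalArgumentException("Part not covered here: " + part);
        }
    }

    static boolean isValid(int part, Level level) {
        return levelsFor(part).contains(level);
    }

    public static void main(String[] args) {
        System.out.println(isValid(1, Level.U)); // false
        System.out.println(isValid(3, Level.A)); // true
    }
}
```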

Adapting the ‘quick brown fox’ example for PDF/A

PDF/A-1b example

PdfADocument pdf = new PdfADocument(
    new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1B,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
// Create XMP metadata
pdf.createXmpMetadata();
// Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
p.add(dogImage);
document.add(p);
document.close();

Resulting PDF/A-1b

PDF/A-1a example

PdfADocument pdf = new PdfADocument(
    new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
// Level A requires Tagged PDF
pdf.setTagged();
pdf.createXmpMetadata();
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();

Resulting PDF/A-1a

Real-world use: publishing a CSV file as PDF/A-3a and PDF/UA

United States database

United States example, part 1: initializations

PdfADocument pdf = new PdfADocument(
    new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_3A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf, PageSize.A4.rotate());
// Setting some required parameters
pdf.setTagged();                                            // PDF/UA and PDF/A Level A
pdf.getCatalog().setLang(new PdfString("en-US"));           // PDF/UA
pdf.getCatalog().setViewerPreferences(                      // PDF/UA
    new PdfViewerPreferences().setDisplayDocTitle(true));   // PDF/UA
PdfDocumentInfo info = pdf.getDocumentInfo();               // PDF/UA
info.setTitle("iText7 PDF/A-3 example");                    // PDF/UA
// Create XMP metadata
pdf.createXmpMetadata();                                    // PDF/UA and PDF/A Level A

United States example, part 2: add attachment

// Add attachment
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.ModDate, new PdfDate().getPdfObject());
PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(
    pdf, Files.readAllBytes(Paths.get(DATA)), "united_states.csv",
    "united_states.csv", new PdfName("text/csv"), parameters,
    PdfName.Data, false);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
pdf.addFileAttachment("united_states.csv", fileSpec);
PdfArray array = new PdfArray();
array.add(fileSpec.getPdfObject().getIndirectReference());
pdf.getCatalog().put(new PdfName("AF"), array);

United States example, part 3: parse CSV file

PdfFont font = PdfFontFactory.createFont(FONT, true);
PdfFont bold = PdfFontFactory.createFont(BOLD_FONT, true);
// Parse a CSV file and add the data to a table
Table table = new Table(new float[]{4, 1, 3, 4, 3, 3, 3, 3, 1});
table.setWidthPercent(100);
BufferedReader br = new BufferedReader(new FileReader(DATA));
// The first line holds the column headers
String line = br.readLine();
process(table, line, bold, true);
while ((line = br.readLine()) != null) {
    process(table, line, font, false);
}
br.close();
document.add(table);
document.close();

United States example, part 4: process each line

public void process(Table table, String line, PdfFont font, boolean isHeader) {
    StringTokenizer tokenizer = new StringTokenizer(line, ";");
    while (tokenizer.hasMoreTokens()) {
        if (isHeader) {
            table.addHeaderCell(
                new Cell().add(
                    new Paragraph(tokenizer.nextToken()).setFont(font)));
        } else {
            table.addCell(
                new Cell().add(
                    new Paragraph(tokenizer.nextToken()).setFont(font)));
        }
    }
}
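One caveat about tokenizer-based parsing like the above: `java.util.StringTokenizer` skips empty fields, so a line such as `a;;c` yields two tokens instead of three, and every cell after the gap would shift one column to the left. The following standard-library-only sketch shows the difference. The class `CsvFields` and its two methods are hypothetical helpers, not part of iText or of the original slides, and the sketch assumes that fields never contain a quoted `;`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of iText: splits one semicolon-separated line.
final class CsvFields {

    // Keeps empty fields (limit -1), so every row yields the same number of cells.
    static List<String> split(String line) {
        return Arrays.asList(line.split(";", -1));
    }

    // What a StringTokenizer loop sees: empty fields are silently dropped.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, ";");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "a;;c"; // made-up sample line with an empty middle field
        System.out.println(tokenize(line).size()); // 2
        System.out.println(split(line).size());    // 3
    }
}
```

If the data can contain empty fields, iterating over the result of `split` inside `process` would keep the columns aligned.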

United States example: result


Real-world use: ZUGFeRD, the future of invoicing

Invoices: need to be archived

Invoices: need to be accessible

Invoices: need to be machine-readable


iText 7 and its value add-ons

New in iText 7: improved typography and support for Indic scripts

iText 5: missing links

Indic scripts:
• Only unsupported major script family
• Feature request #1
• Huge opportunity
  • Limited support in most other PDF libraries

Other features:
• Optional ligatures in Latin script
• Vowel diacritics in Arabic

Indic scripts: problems

• Lack of expertise
  • Unicode encodes 49 Indic scripts
  • Complex scripts with unique features
    • Glyph repositioning: ह + ि = हि
    • Glyph substitution: ம + ு = மு
    • Half-characters: त + ् + य = त्य
• Unsolvable issues for iText 5 font engine
  • No dedicated Unicode points for half-characters
  • No font lookups past ‘\uFFFF’
  • Ligaturization is context-dependent (virama)
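One way to see why a ‘\uFFFF’ limit matters: a Java `char` is a 16-bit unit, so any code point above U+FFFF is stored as a surrogate pair, and a lookup keyed on single `char` values cannot address it. The sketch below uses only the standard library. The class `BeyondBmp` is a hypothetical name, the code point U+1108D (from the Kaithi block) was picked purely for illustration, and nothing here reflects how the iText 5 font engine is actually implemented.

```java
// Hypothetical illustration, not iText code.
final class BeyondBmp {

    // Number of 16-bit char units in the text.
    static int charUnits(String text) {
        return text.length();
    }

    // Number of Unicode code points in the text.
    static int codePoints(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // One code point above U+FFFF, built from its numeric value.
        String kaithi = new String(Character.toChars(0x1108D));
        System.out.println(charUnits(kaithi));                      // 2
        System.out.println(codePoints(kaithi));                     // 1
        System.out.println(Character.isSurrogate(kaithi.charAt(0))); // true
    }
}
```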

Indic scripts: solutions

Writing a new font engine

• Automatic script recognition
  • Based on Unicode ranges
• Flexibility = extensibility
  • Generic Shaper class
  • Separate module, only called when necessary
• Glyph replacement rules
  • Different per writing system
  • Alternate glyphs are font-dependent
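Script recognition based on Unicode ranges can be illustrated with the JDK alone, since `Character.UnicodeScript` already maps code points to scripts. This is a minimal sketch of the general idea; the class `ScriptDetector` and its `detect` method are hypothetical, and the slides do not show how iText 7 itself performs the recognition.

```java
import java.lang.Character.UnicodeScript;

// Hypothetical illustration, not iText's implementation.
final class ScriptDetector {

    // Returns the script of the first code point that belongs to a specific
    // script, skipping COMMON (spaces, digits, punctuation), INHERITED
    // (marks that take the script of their base character) and UNKNOWN.
    static UnicodeScript detect(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(codePoint);
            if (script != UnicodeScript.COMMON
                    && script != UnicodeScript.INHERITED
                    && script != UnicodeScript.UNKNOWN) {
                return script;
            }
            i += Character.charCount(codePoint);
        }
        // Nothing script-specific was found.
        return UnicodeScript.COMMON;
    }

    public static void main(String[] args) {
        System.out.println(detect("\u0938\u093E\u0939\u093F\u0924\u094D\u092F")); // DEVANAGARI
        System.out.println(detect("\u0B8E\u0BB4\u0BC1\u0BA4\u0BCD"));             // TAMIL
        System.out.println(detect("\u0627\u0644\u0643\u0627\u062A\u0628"));       // ARABIC
        System.out.println(detect("writer"));                                     // LATIN
    }
}
```

A caller could use the detected script to decide whether a shaping step is needed at all, which matches the "only called when necessary" idea on the slide.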

Indic scripts: examples

PdfFont font = PdfFontFactory.createFont(arial, PdfEncodings.IDENTITY_H, true);

// Devanagari
String txt = "\u0938\u093E\u0939\u093F\u0924\u094D\u092F\u0915\u093E\u0930"; // saahityakaar
document.add(new Paragraph(txt).setFont(font));

// Tamil
String txt = "\u0B8E\u0BB4\u0BC1\u0BA4\u0BCD\u0BA4\u0BBE\u0BB3\u0BB0\u0BCD"; // eluttaalar
document.add(new Paragraph(txt).setFont(font));

Other scripts: examples

PdfFont font = PdfFontFactory.createFont(arial, PdfEncodings.IDENTITY_H, true);

// Arabic
String txt = "\u0627\u0644\u0643\u0627\u062A\u0628"; // al-katibu
document.add(new Paragraph(txt).setFont(font));

// Latin with optional ligatures
String txt = "writer";
GlyphLine glyphLine = font.createGlyphLine(txt);
Shaper.applyLigaFeature(foglihtenNo07, glyphLine, null);
canvas.showText(glyphLine);

Status of advanced typography in iText 7

• Indic scripts
  • We already support:
    • Devanagari
    • Tamil
  • Coming soon:
    • Telugu
    • Others: based on customer demand
• Arabic
  • Support for vocalized Arabic (diacritics) is in development
• Latin
  • Optional ligatures are fully supported