PDF.JS at SwissJeese 2012

Julian Viereck

@jviereck+julian.viereck

Overview

• What is PDF.JS about

• How PDF is structured & processing in PDF.JS

• “Why are you doing this?”

• Firefox Integration

• What’s next?

• Demo

• Q & A

5

10

15

5

5

15

5

About me

• Bespin / Skywriter

• Ace

• Firefox DevTools

• ETH Zurich (Physics)

• PDF.JS

• ?

PDF Viewer using Open Web Standards

What is PDF.JS

• building faithful & efficient PDF viewer

• HTML5 technology experiment

• no native code

• secure (web sandbox)

• Mozilla Labs Project - Open Source (Github)

What is PDF.JS

• Not Firefox-Specific - all modern browsers

• 1.3 MB uncompressed JS

• ~33,000 lines of code

• viewer in different languages

• async API

How PDF is structured

PDF file:

• Header: PDF version

• Body [Objects]: sequence of objects (fonts, drawing cmds, images, words, bookmarks, form fields)

• xRef Table: mapping objID ⇔ byte offset

• Trailer: root objID, xRef byte offset (root obj = ref to pages catalog)
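To make the file layout concrete, here is a minimal sketch that reads the PDF version from the header and the xRef byte offset from the end of the file. It assumes the whole file is already in memory as a Uint8Array; the helper names are hypothetical and this is not PDF.JS's actual parser.

```javascript
// Hypothetical helpers for illustration, not PDF.JS's actual code.

// Decode bytes one-to-one into characters (Latin-1 style).
function bytesToString(bytes) {
  let s = '';
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

// Header: the file starts with "%PDF-<major>.<minor>".
function readPdfVersion(bytes) {
  const head = bytesToString(bytes.subarray(0, 16));
  const match = /^%PDF-(\d+\.\d+)/.exec(head);
  return match ? match[1] : null;
}

// End of file: the keyword "startxref" is followed by the byte offset
// of the xRef table. Only the last 1024 bytes are searched here.
function readXRefOffset(bytes) {
  const tail = bytesToString(bytes.subarray(Math.max(0, bytes.length - 1024)));
  const idx = tail.lastIndexOf('startxref');
  if (idx === -1) return null;
  const match = /^startxref\s+(\d+)/.exec(tail.slice(idx));
  return match ? parseInt(match[1], 10) : null;
}
```

With the offset in hand, a reader can jump to the xRef table and resolve any objID to its byte position without scanning the whole body.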

Let’s look at it

Processing in PDF.JS

• get plain Uint8Array via XHR2, build Stream

• new PDFDoc(stream): read xRef, root object

• page = PDFDoc.getPage(N)

• page.startRendering(graphics)

• PartialEvaluator: read & convert all PDF cmds ➟ OL (OperationList)

• load required objects (fonts, images)

• CanvasGraphics: graphics.executeOperatorList(OL)
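The first step, getting the file as a plain Uint8Array via XHR2, could look roughly like the sketch below. The function name and the `XhrCtor` parameter are assumptions added for illustration (the parameter only exists so the function can be tried outside a browser); the later steps use the API names from the slide and are not reproduced here.

```javascript
// Sketch only: fetch a PDF as raw bytes with XMLHttpRequest Level 2.
// `XhrCtor` is a hypothetical injection point; in a page it falls back
// to the global XMLHttpRequest.
function fetchPdfBytes(url, onDone, onError, XhrCtor) {
  const Ctor = XhrCtor || XMLHttpRequest;
  const xhr = new Ctor();
  xhr.open('GET', url, true);
  xhr.responseType = 'arraybuffer'; // XHR2: binary response as ArrayBuffer
  xhr.onload = function () {
    if (xhr.status === 200) {
      onDone(new Uint8Array(xhr.response));
    } else {
      onError(new Error('Unexpected status ' + xhr.status));
    }
  };
  xhr.onerror = function () {
    onError(new Error('Network error'));
  };
  xhr.send(null);
}
```

The Uint8Array handed to `onDone` is the input that gets wrapped in a Stream before the document is constructed.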

Execution Example

• “get page 2” ➟ Data ➟ drawing cmds ➟ PartialEvaluator

• PartialEvaluator builds: draw(obj#3, dict.x, dict.y)

• Graphics asks Data: obj#3? dict.x, .y?

• Data answers: obj#3 = “foo”, x = 20, y = 30

• Graphics: draw on canvas

Problem Processing

• Extracting data slow (compressed)

• Transform data (images) slow

• Sometimes a lot of objects on page

➡ Freezes UI

➡ Use WebWorker

➡ :( no direct memory access, only postMessage

Web Worker

• “get page 2” ➟ Data ➟ PartialEvaluator

• PartialEvaluator builds: draw(obj#3, dict.x, dict.y)

• sends Operation List + Data to the main thread

Main Thread

• OpList with data resolved: draw(“foo”, 20, 30)

• Graphics: draw on canvas
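Because the main thread cannot reach into the worker's memory, every reference in the operation list has to be replaced by its value before the list is posted. A minimal sketch of that step follows; the `{ ref: ... }` shape, the `lookup` table and the function name are hypothetical, not PDF.JS's real data model.

```javascript
// Hypothetical data model: an operation is { fn, args }, and an argument
// that still needs resolving looks like { ref: 'obj#3' }.
function resolveOperatorList(opList, lookup) {
  return opList.map(function (op) {
    return {
      fn: op.fn,
      args: op.args.map(function (arg) {
        if (arg !== null && typeof arg === 'object' && 'ref' in arg) {
          if (!(arg.ref in lookup)) {
            throw new Error('Unresolved reference: ' + arg.ref);
          }
          return lookup[arg.ref]; // plain value, safe to copy across threads
        }
        return arg;
      })
    };
  });
}

// Worker side (sketch): postMessage copies its argument, so only plain
// values should remain in the list:
//   self.postMessage({ page: 2, opList: resolveOperatorList(opList, data) });
```

For the example on the slide, `draw(obj#3, dict.x, dict.y)` becomes `draw("foo", 20, 30)` on the worker side, and the main thread can execute it without asking for more data.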

PDF content stream (input):

5 0 obj
<< /Length 8 0 R >>
stream
/GS1 gs
/F0 12 Tf
BT
100 700 Td
(Hello World!) Tj
ET
50 600 m
400 600 l
S
endstream
endobj

PartialEvaluator (uses xRef, catalog, resources) ➟ OL:

setGState: [ LW: 10 ]
dependency: [ font0 ]
setFont: font0, 12
beginText
moveText: 100, 700
showText: “Hello World!”
endText
moveTo: 50, 600
lineTo: 400, 600
stroke

OL ➟ Graphics
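The conversion from content stream to operation list can be sketched as a small tokenizer: operands pile up until an operator arrives, then both are emitted as one entry. This is a simplified illustration under stated assumptions. It only knows the nine operators used in this example, it does not look up resources (so `/GS1` is not expanded to `[ LW: 10 ]` and no `dependency` entry is produced), and string escapes are not decoded. The function and table names are hypothetical.

```javascript
// The PDF operators from the example, mapped to the names on the slide.
const OPERATOR_NAMES = {
  gs: 'setGState', Tf: 'setFont', BT: 'beginText', Td: 'moveText',
  Tj: 'showText', ET: 'endText', m: 'moveTo', l: 'lineTo', S: 'stroke'
};

// Turn one operand token into a JS value.
function parseOperand(token) {
  if (token[0] === '(') return token.slice(1, -1); // literal string
  if (token[0] === '/') return token.slice(1);     // name, e.g. /F0
  return Number(token);                            // anything else: number
}

// Operands precede their operator (postfix notation).
function toOperatorList(content) {
  // A token is either a parenthesized string or a run of non-space chars.
  const tokens = content.match(/\((?:[^()\\]|\\.)*\)|[^\s()]+/g) || [];
  const list = [];
  let operands = [];
  for (const token of tokens) {
    if (Object.prototype.hasOwnProperty.call(OPERATOR_NAMES, token)) {
      list.push({ fn: OPERATOR_NAMES[token], args: operands });
      operands = [];
    } else {
      operands.push(parseOperand(token));
    }
  }
  return list;
}
```

Running it over the stream body above yields nine entries, starting with `setGState` and ending with `stroke`, in the same order as the drawing commands.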

Images

• JPEG streams:

• DOMImg.src = 'data:image/jpeg;base64,' + window.btoa(bytesToString(bytes));

• If not JPEG stream:

• read bytes, convert to colorspace

• imgData = canvas.getImageData()

• fillWithPixelData(bytes, imgData)

• canvas.putImageData(imgData)
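Both image paths can be sketched in a few lines. The helper names below are hypothetical, and the non-JPEG path assumes the bytes have already been converted to 8-bit RGB. `fillRgba` here is a simplified stand-in for the `fillWithPixelData` step on the slide, not its actual implementation.

```javascript
// Decode bytes one-to-one into a binary string, as btoa expects.
function bytesToString(bytes) {
  let s = '';
  for (let i = 0; i < bytes.length; i++) {
    s += String.fromCharCode(bytes[i]);
  }
  return s;
}

// JPEG path: let the browser decode the stream via a data: URL.
function jpegDataUrl(bytes) {
  return 'data:image/jpeg;base64,' + btoa(bytesToString(bytes));
}

// Non-JPEG path: copy RGB triples into the RGBA buffer of an ImageData-like
// object (anything with a `data` array of width * height * 4 entries).
function fillRgba(rgbBytes, imgData) {
  const out = imgData.data;
  for (let src = 0, dst = 0; src < rgbBytes.length; src += 3, dst += 4) {
    out[dst] = rgbBytes[src];
    out[dst + 1] = rgbBytes[src + 1];
    out[dst + 2] = rgbBytes[src + 2];
    out[dst + 3] = 255; // fully opaque
  }
}

// Browser usage (sketch):
//   img.src = jpegDataUrl(jpegBytes);
//   const imgData = ctx.getImageData(0, 0, width, height);
//   fillRgba(rgbBytes, imgData);
//   ctx.putImageData(imgData, 0, 0);
```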

JPEG, but...

• no native support for JPEG 2000, CMYK

➡ use JS implementation

‣ works, not that performant but good enough
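As an illustration of the kind of work that moves into JavaScript, here is a naive CMYK to RGB conversion. It is a sketch under simplifying assumptions: 8-bit samples, 0 meaning no ink, and no color profile handling. It is not the conversion PDF.JS uses, and the function name is hypothetical.

```javascript
// Naive conversion: each RGB channel is the complement of its ink,
// scaled by the complement of black. Ignores ICC profiles entirely.
function cmykToRgb(cmyk) {
  const pixels = Math.floor(cmyk.length / 4);
  const rgb = new Uint8ClampedArray(pixels * 3); // rounds and clamps on write
  for (let i = 0; i < pixels; i++) {
    const c = cmyk[i * 4];
    const m = cmyk[i * 4 + 1];
    const y = cmyk[i * 4 + 2];
    const k = cmyk[i * 4 + 3];
    rgb[i * 3] = (255 - c) * (255 - k) / 255;
    rgb[i * 3 + 1] = (255 - m) * (255 - k) / 255;
    rgb[i * 3 + 2] = (255 - y) * (255 - k) / 255;
  }
  return rgb;
}
```

A per-pixel loop like this is exactly the sort of code that is slower than native decoding but, as the slide says, good enough in practice.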

Fonts

• There are lots of different font formats!

• fonts are converted to OpenType

• use CSS for loading: @font-face { font-family:'font0'; src:url(data:font/opentype;base64, ...) }

• Fonts are sanitized by browser

• Need to rebuild malformed fonts :/
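The CSS loading step can be sketched as a function that turns converted OpenType bytes into an `@font-face` rule. The function name is hypothetical, and the conversion to OpenType itself (the hard part) is assumed to have happened already.

```javascript
// Build an @font-face rule that embeds the font as a base64 data: URL.
function fontFaceRule(fontName, otfBytes) {
  let binary = '';
  for (let i = 0; i < otfBytes.length; i++) {
    binary += String.fromCharCode(otfBytes[i]);
  }
  const url = 'url(data:font/opentype;base64,' + btoa(binary) + ')';
  return "@font-face { font-family:'" + fontName + "'; src:" + url + ' }';
}

// Browser usage (sketch):
//   const style = document.createElement('style');
//   document.head.appendChild(style);
//   style.sheet.insertRule(fontFaceRule('font0', bytes), 0);
```

Since the browser sanitizes fonts loaded this way, a malformed font is rejected rather than rendered, which is why broken fonts have to be rebuilt before this step.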

“Why are you doing this?”

aka. ∃ C/C++ libraries ⇒ isn’t that faster?

“Performance is not the only measure”

1. Security

Most vulnerable programs

Source: http://www.csis.dk/en/csis/news/3321

~25% of crashes in Firefox are plugin-related

2. Web-Specific Viewer

3. Drive Innovation

4. Speed

• Rendering slower than C/C++

• BUT

• Partial downloading

• Render page in background

• Make the slow parts faster

• Mostly: Good enough

5. Can do better

6. Push Web Platform

B2G aka. Boot2Gecko

New API: Printing

• Printing very limited on the web right now

• no way to achieve native printing experience

• NEED: New API for printing

• mozPrintCallback

• define canvas content during printing

• send drawing commands directly to printer

Web Page ➟ Print ➟ Single Pages (Page 1, Page 2)

• Find print canvas on page

• Execute printCallback (canvas.mozPrintCallback)

• All canvases done ➠ print page
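A minimal sketch of hooking into the print flow, assuming Firefox's experimental `mozPrintCallback`, where the callback receives an object exposing a `context` to draw on and a `done()` method. The wrapper name and the `drawPage` function are hypothetical.

```javascript
// Register a print-time drawing routine on a canvas.
// `drawPage(ctx)` is a hypothetical function that issues drawing commands.
function setupPrintCanvas(canvas, drawPage) {
  canvas.mozPrintCallback = function (obj) {
    const ctx = obj.context; // context whose output goes to the printer
    drawPage(ctx);
    obj.done();              // signal that this canvas is finished
  };
}
```

Once every print canvas on a page has called `done()`, the browser can go ahead and print that page.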

Firefox Integration

• PDF.JS as bundled Addon in Firefox Nightly

• Getting in Release Channel is hard

• 400M users have expectations

• more testing coverage

• accessibility

• match UX expectation

• fallback if something is not working

Firefox Integration

• Try to make it in time for the Aurora merge (6/5)

• Firefox Specific, BUT

• quality improvements are browser-independent

• only small parts Firefox specific

What’s next

• Fix broken PDFs

• Improve performance

• Improve Text selection

• Text search

• Form support

• Printing support

Demo

Contributing

• Lots of areas

• Translation

• Writing Code (embeddable viewer?)

• Testing (Firefox Auto-Update Addon)

Q & A