Introduction to EPUB for
PDF Developers
PDF Technical Conference 2015
Bill McCoy, Executive Director, International Digital Publishing Forum; President, Readium Foundation
International Digital Publishing Forum: the global trade association for digital publishing
• Founded in 1999 (originally Open eBook Forum)
• 350+ member organizations from 45 countries
• Member organizations include publishers, service providers, e-retailers, reading system developers, libraries, educational institutions, government agencies, national and global industry bodies
Web’s Fundamental Architecture Is Optimized for Decentralized Growth
PostScript
History of Document Standards
A Martian Odyssey
• In 2006, Adobe previewed an XML serialization of PDF
  • Initially “Project Mars”, then “PDFXML”
  • Acrobat/Reader import/export plug-ins
  • Page contents SVG, OCF packaging shared with EPUB 2
• Project discontinued in 2009
• PDF page contents can be losslessly represented as SVG (slightly extended, mainly for unusual cases)
• But XML serialization of binary PDF didn’t solve big problems or expand usefulness of PDF
• And Microsoft XPS was not a threat to PDF ecosystem
• .ZIP-compatible archive containing:
mimetype
META-INF/
  container.xml
  [manifest.xml]
  [metadata.xml]
  [signatures.xml]
  [encryption.xml]
  [rights.xml]
content/
  Great_Expectations.opf
  cover.html
  chapters/
    chapter01.html
    chapter02.html
    …
  assets/
    style.css
    image1.jpg
    …
<.epub File Structure>
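The archive layout on this slide can be sketched with Python's standard `zipfile` module. This is a minimal illustration, not a conformant EPUB builder: the function name `build_epub_container` and its `files` argument are hypothetical, and the package document path is simply the one shown on the slide. It shows the two container rules that matter most to a reader coming from PDF: the `mimetype` entry is written first and stored uncompressed, and `META-INF/container.xml` points to the package (.opf) file.

```python
import io
import zipfile

# container.xml tells a reading system where the package document lives.
# The full-path below is the example path from the slide.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="content/Great_Expectations.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_container(files):
    """Return ZIP bytes laid out like the .epub tree on the slide.

    `files` (hypothetical parameter) maps archive paths such as
    "content/cover.html" to their contents as bytes or str.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must come first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        for path, data in files.items():
            zf.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

A real publication would also need a valid package document, navigation document, and content files; a validator such as EPUBCheck is the usual way to confirm conformance.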
This Was Only The Beginning …
• Styling & layout improvements for reflow content
• Fixed-layout content
• Interactivity and rich media
• Global language support
• Comprehensive accessibility support
• Improve Web Standards alignment
EPUB 3 (2011)
Open Web Platform
• The Open Web Platform is a comprehensive framework for rich, interactive, cross-platform content and applications.
• HTML5 is the cornerstone
• Separation of content and presentation is fundamental (CSS)
EPUB 3 Accessibility Features
• Reliable navigation with defined reading order
• Publication-level metadata
• Semantic markup in HTML5 & via epub:type
• Fallback framework
• Media Overlays
  • Audio/video synched with text rendering
  • Can add fine-grained reading order onto document
• Text-To-Speech (TTS) Support
  • SSML phonemes
  • Pronunciation Lexicons (PLS)
  • CSS 3 Speech (aural styles, e.g. rate, pitch, stress)
• Multiple rendition support
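The "defined reading order" in the list above comes from the package document: the manifest lists every resource, and the spine lists content documents in the order they should be read. The sketch below, assuming a trimmed sample package document (`SAMPLE_OPF`, with the required metadata section omitted for brevity) and a hypothetical helper `reading_order`, shows how that order can be recovered regardless of how files are arranged in the archive.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Trimmed sample for illustration only: a real package document also
# carries a <metadata> section and a navigation document entry.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch2" href="chapters/chapter02.html" media-type="application/xhtml+xml"/>
    <item id="cover" href="cover.html" media-type="application/xhtml+xml"/>
    <item id="ch1" href="chapters/chapter01.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="cover"/>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def reading_order(opf_text):
    """Return content document hrefs in spine (reading) order."""
    root = ET.fromstring(opf_text)
    # Map manifest ids to hrefs; manifest order carries no meaning.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine defines the linear reading order by referencing manifest ids.
    return [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


print(reading_order(SAMPLE_OPF))
# ['cover.html', 'chapters/chapter01.html', 'chapters/chapter02.html']
```

This is the structural information that unstructured PDF usually lacks, and it is what assistive technology relies on to move through a publication predictably.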
Why Do We Need Portable Documents in a Web World?
• High-quality presentation and printing
• Archivable, offline-usable content
• Content can be redistributed and sold via channels
• Ease of authoring
• Vendor-neutral open format (interoperability) vs. closed silos delivered via HTML+JS
• Deterministic structural semantics = accessibility
EPUB makes Web content into reliable publications
• Structured • Navigable • Accessible • Metadata • Packaged
http://www.epubcafe.jp/epub-japanese-layout
Example: OverDrive Read
EPUB 3 Global Adoption
• Reading System support for EPUB 3
  • Apple iBooks, Google Play Books, OverDrive, VitalSource, Kobo, Sony, Adobe Digital Editions, …
• EPUB 3 authoring tool options
  • Apps from professional (Adobe InDesign, Apple iBooks Author) to mass-market (Apple Pages, OpenOffice, Ichitaro, Hangul WP)
  • Specialized workflow tools (Aerbook, PressBooks, Metrodigi Chaucer, Aquafadas, Inkling, Gutenberg, PubCoder, …)
• Japan has holistically adopted EPUB 3
  • EPUB 3 widely adopted for both trade eBooks and e-Manga
• Global education publishing adopting EPUB 3 for e-textbooks and other learning content
EDUPUB Alliance
• A global community for focused coordination of ongoing work among multiple standards organizations and their stakeholders
• Initiative launched October 2013
• Developing EPUB 3 Education Profile (e-textbooks and other learning content) – see http://idpf.org/edupub
• Integrating e-Learning API and content format standards
• There is not and may never be a “one size fits all” EPUB authoring tool (unlike PDF)
• Authoring EPUB is creating a special kind of website
• Infinite range of potential expression
• Wide range of suitable approaches
• Specialized tools may be best for piece parts (widgets, animations)
• EPUB reading system implementations have differing levels of feature support
Authoring EPUB 3
XML first in novels from books to ebooks
38 25/03/2013
Page layout on XML
Manuscript corrections
DAM
Author word files
Images
XML conversion and editing
XML to EPUBs conversion
PDF print file
Printer
XML file
EPUB file
Content proofing
Content driven Light Semantics
Hachette Livre
XML process : Where do you do XML?
Page layout on XML
Author word files
XML structure recognition
PDF image file plus hidden text
XML file
Content driven Light Semantics
Scanning plus OCR
PDF text print file
XML conversion and editing
Author XML files
Back List Front List
Fixed Layout production
InDesign to Fixed Layout • InDesign Scripting and IDML export • Conversion to HTML5 with absolute positioning
Layout driven
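As a rough illustration of "conversion to HTML5 with absolute positioning", the sketch below turns a list of positioned text runs into one fixed-layout XHTML page. It is an assumption-laden toy, not the workflow shown on the slide: the function `fixed_layout_page` and the `(x, y, text)` run format are hypothetical, and a real converter would also carry fonts, sizes, and images. The one detail taken from EPUB 3 itself is that each fixed-layout XHTML page declares its dimensions in a `viewport` meta element.

```python
from html import escape


def fixed_layout_page(width, height, runs):
    """Build one fixed-layout XHTML page as a string.

    `runs` (hypothetical format) is a list of (x, y, text) tuples, with
    x and y in CSS pixels measured from the page's top-left corner.
    """
    spans = "\n".join(
        f'  <span style="position:absolute; left:{x}px; top:{y}px; '
        f'white-space:pre;">{escape(text)}</span>'
        for x, y, text in runs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        '<head>\n'
        '  <title>Page</title>\n'
        # Fixed-layout pages state their size via the viewport meta element.
        f'  <meta name="viewport" content="width={width}, height={height}"/>\n'
        '</head>\n'
        f'<body style="position:relative; width:{width}px; '
        f'height:{height}px; margin:0;">\n'
        f'{spans}\n'
        '</body>\n'
        '</html>\n'
    )


page = fixed_layout_page(600, 800, [(72, 96, "Chapter 1"), (72, 140, "Fish & Chips")])
```

The package document would additionally need to mark the publication (or the individual spine items) as pre-paginated so reading systems do not reflow these pages.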
Fixed Layout production
PDF to Fixed Layout • SVG export with appropriate tools • Use SVG directly in spine
Layout driven
Fixed Layout production
PDF to Fixed Layout Images • High quality image export from PDF • Embed images in HTML5 files
Layout driven
Unstructured PDF
PDF does encode precise, fixed-layout drawing instructions for glyphs, shapes and images, enabling a consistent appearance
PDF does not (usually) encode reading order, words, paragraphs, columns, lists, tables, stories, chapters, text roles, shape/image groupings, shape/image roles, etc.
Structure Reconstruction
Analysis → Structure
Border shapes → Tables, Callouts, Articles
Clusters of glyph positions → Words
Clusters of word positions → Alignment
Vertical white strips → Line breaks, Columns, Lists
Text line spacing & alignment → Paragraphs
Shape grouping → Diagram
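The "clusters of glyph positions → words" step above can be sketched as a simple gap test along one text line. This is an illustrative heuristic, not the method any particular tool uses: the function `cluster_words`, the `(char, x, width, font_size)` glyph tuples, and the `gap_factor` threshold are all assumptions chosen for the example.

```python
def cluster_words(glyphs, gap_factor=0.3):
    """Group the glyphs of one text line into words.

    `glyphs` is a list of (char, x, width, font_size) tuples in page
    units (hypothetical format). A new word starts whenever the
    horizontal gap to the previous glyph exceeds gap_factor * font_size.
    """
    words = []
    current = ""
    prev_end = None
    # PDF content streams need not draw glyphs left to right, so sort by x.
    for char, x, width, size in sorted(glyphs, key=lambda g: g[1]):
        if prev_end is not None and x - prev_end > gap_factor * size:
            words.append(current)
            current = ""
        current += char
        prev_end = x + width
    if current:
        words.append(current)
    return words


# Glyphs deliberately listed out of order, as they might appear in a PDF.
line = [
    ("t", 22, 4, 10), ("P", 0, 6, 10), ("o", 26, 5, 10),
    ("D", 6, 6, 10), ("F", 12, 6, 10),
]
print(cluster_words(line))
# ['PDF', 'to']
```

The same idea, applied to word boxes and then to line boxes, gives the higher rows of the table: alignment, columns, and paragraphs. Real documents need more care (kerning, justified text, rotated text), which is why reconstructed structure is never as reliable as structure authored natively.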
The Future of Publishing is Native Digital Content not Digitized Print
New Publishing Models: Reference Modules
• Trustworthy • Current & e-first • Discoverable • Hosted on ScienceDirect
Adding Value to Content: Enhanced e-books
• Data manipulation • Calculations & formulae • Virtual microscopes • ePub3 today
Understanding Researcher User Needs:
• User Centered Design • Integrated Content • Enriched Content • Tools
Evolving From Digital Replicas to Native Digital Content Creation
Readium Foundation
• Created March 2013 as an independent nonprofit corporation
• Apache / Eclipse / OpenStack model
• Readium Foundation mission: develop production-level open source software to advance EPUB and the Open Web Platform for publishing
• Now over 50 organization members worldwide
• Some key members: Hachette, Gallimard, Editis, Adobe, Google, Kobo, Ingram/VitalSource, HMH, Deutsche Telekom, IBM, NYPL, Bokbasen, Bluefire, TXTR, ACCESS, Sony, DAISY, Mantano, Intel, Penguin Random House
Readium – Current Projects
• Readium SDK
  • Rendering engine for native mobile and device apps
  • 20+ shipping apps based on Readium
• Readium JS
  • Readium Cloud for browser-stack EPUB content
  • Readium for Chrome - 500K active users
• Readium LCP
  • Lightweight DRM system
  • First LCP-based app recently shipped (Learning Ally)
• More info: http://readium.org - github.com/readium
epubtest.org
• Collaboration between IDPF, Book Industry Study Group (BISG), and DAISY Consortium
• Conformance test suite for EPUB 3
• Database of results/scores for different reading systems
• Creating peer pressure for improvement and giving content authors objective guidance on feature support across implementations
IDPF and W3C Collaboration: Digital Publishing Interest Group at W3C
IDPF Focus: Standards for the Book Reading Experience (EPUB) IDPF/EPUB built on W3C standards
W3C Focus: Standards for General Web Technologies Advance publishing industry requirements for the Open Web Platform through a focused Digital Publishing Interest Group (IG) and related Working Groups (e.g. HTML, CSS)
S. Korea Collaborations with IDPF/Readium
• KERIS (department of Education Ministry) a major instigator and partner of EDUPUB
• EPUB 3 has been adopted as a Korean National Standard, and Korea submitted EPUB 3.0 as the basis of a recently standardized ISO Technical Specification via ISO JTC1/SC32
• Upcoming EPUB 3.1 expected to become full IS (International Standard)
• Korea Copyright Commission has funded development of Open DRM for EPUB, which has become a foundation of the Readium LCP open source DRM project
IDPF and Readium Europe: September 2015
• French government Ministry of Culture and Ministry of Finance have partnered with industry stakeholders to fund 1.5 million Euros for European headquarters for IDPF and Readium Foundation
• New European Digital Reading Lab (EDRLab) has been formed to host the activities within an existing tech accelerator in central Paris
• Will help advance open, interoperable standards and open source for EPUB and Web technologies for publishing in and beyond Europe
EPUB Ecosystem – Other Milestones
• September 2013
  • Italian Minister of Education issues decree on digital content mandating adaptive display and accessibility
• April 2014
  • IBM endorses EPUB 3 as preferred portable document format
  • Apple Pages adds EPUB 3 export
• June, August, September 2014
  • EPUB / EDUPUB focused events in Oslo, Beijing, Tokyo
• April 2015
  • NYPL wins grant for White House Open Ebook initiative, using Readium
  • edX settlement with Department of Justice on ADA specifies EPUB 3
• June 2015
  • EU Publications Office and IDPF co-host organization publisher workshop
• July 2015
  • First CSS-based POD solution for EPUB 3 launches beta
  • Apple iBooks Author adds EPUB 3 template and export support
Specification Work Since EPUB 3.0
• EPUB CSS Page Templates: information document August 2012
• EPUB 3.0.1 update: approved June 2014
• EPUB Dictionaries and Indexes specifications: approved August 2015
• EPUB Multiple Renditions, Previews, and Guided Navigation specifications: approved August 2015
• Open Annotations in EPUB specification: draft July 2015
• EDUPUB EPUB profile and EDUPUB Structural Semantics specifications: drafts July 2015
• EPUB Scriptable Components specification: draft July 2015
• EPUB Distributable Objects specification: draft July 2015
• See: http://idpf.org/ongoing
What’s Next for EPUB?
• EPUB 3.1 update – just kicking off
  • Simplification
  • Consolidation of modular extensions
  • Improve Web Platform alignment
  • New features (meet needs of additional segments of publishing, improve accessibility)
• Digital Publishing Interest Group at W3C
  • Working on a longer-range vision of “EPUB+Web” that could fully unify online content and offline-capable portable documents
  • Informing EPUB 3.1 work and could fold into a future EPUB 4
  • Near-term priority is improving building-block Web Standards (e.g. CSS) for high-design content publishing
• EPUB Certification initiative
  • Continue to “raise the bar” for the overall EPUB Platform
  • Accessibility as “table stakes” and “mine canary”
EPUB and the Open Web Platform Advancing Via Global Collaborations
Layout & Styling
Metadata
Accessibility
• PDF and EPUB can be viewed as complementary serializations of final-form content
• In many cases PDF may be preferable (e.g. prepress support, ubiquitous support across systems)
• In many cases EPUB may be preferable to a PDF (e.g. a11y, mobile, interactivity)
• In some cases it may not matter (FXL EPUB and PDF can be viewed as alternative serializations of the same infoset)
• A bigger portable document platform, natively part of the Web, will increase reach and relevance for content publishers and solution providers, and improve content consumption user experiences
• We can squabble about how government committee meeting minutes should be published, or collaborate to make solutions better for everyone
The Big Tent of Portable Documents
Thank You! http://epubzone.org http://idpf.org http://readium.org Email: [email protected] Twitter: @[email protected]