Electronic Books
● Lecture 6, Ing. Miloslav Nič, Ph.D., summer semester 2010-2011, BI-XML, © Miloslav Nič, 2011
European Social Fund, Prague & EU: We invest in your future
E-book
● Wikipedia:
An electronic book (also e-book, ebook, digital book) is a text and image-based publication in digital form produced on, published by, and readable on computers or other digital devices.
E-book formats
● TXT
● HTML collection
● PDF
● Kindle (based on Mobipocket)
● EPUB
● ... and many more based on similar principles
EPUB x PDF
● http://www.adobe.com/content/dam/Adobe/en/devnet/digitalpublishing/pdfs/EPUB_datasheet.pdf
● PDF: a fixed page; the publisher is in complete control over page layout and presentation
● EPUB: text reflows according to screen size
International Digital Publishing Forum (IDPF)
● http://idpf.org/
● a global trade and standards organization
● develops and maintains the EPUB content publication standard
EPUB
● a distribution and interchange format standard for digital publications and documents
● latest stable version: EPUB 2.0.1
● EPUB 2 initially standardized in 2007
● EPUB 3 in the process of being standardized (2011?)
Google and EPUB
Project Gutenberg
ePub Readers
● see e.g. http://www.jedisaber.com/eBooks/Readers.asp
● Some examples:
– Bookworm
– Calibre
– FB Reader
– Mobipocket
– Stanza
– ....
ePUB and Kindle
● no direct support at this moment
● several converters available
Bookworm.oreilly.com
FBReader
● my favourite reader (both Linux and Android in my case; installers for other platforms, e.g. Windows and Mac, also exist)
● http://www.fbreader.org/
EPUB Standards
● Open Publication Structure (OPS)
– book content in XHTML or DTBook
● Open Packaging Format (OPF)
– book structure and metadata
● Open Container Format (OCF)
– book file structure and compression to a single file
Open Publication Structure (OPS)
● XML files
● Namespaces:
– XHTML: http://www.w3.org/1999/xhtml
– DAISY: http://www.daisy.org/z3986/2005/dtbook/
– OPS: http://www.idpf.org/2007/ops
XHTML
● XHTML 1.1; only some modules are included
● a selection of supported elements:
– html, head, title, body
– abbr, acronym, address, blockquote, br, cite, code, dfn, div, em, h1, h2, h3, h4, h5, h6, kbd, p, pre, q, samp, span, strong, var, dl, dt, dd, ol, ul, li, sub, sup
– a, img, caption, col, colgroup, table, tbody, td, tfoot, th, thead, tr
CSS
● a subset of CSS 2 supported
● must be supplied with the book (not via web)
● e-book readers vary widely (screen size, graphic capabilities), so CSS stylesheets are very useful
Images
● @alt of <img> is required
● core media types that must be supported:
– image/gif
– image/jpeg
– image/png
– image/svg+xml
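The two image rules above could be checked mechanically. The following is a minimal sketch using only the Python standard library; the function name `check_images`, the extension table and the sample file names are hypothetical, and the media type is guessed from the file extension only, which a real checker would not rely on.

```python
import os
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Core image media types from the slide, keyed by file extension (assumed mapping).
EXTENSION_TO_TYPE = {
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".svg": "image/svg+xml",
}


def check_images(xhtml_text):
    """Return a list of problems found in the <img> elements of one XHTML document."""
    problems = []
    root = ET.fromstring(xhtml_text)
    for img in root.iter("{%s}img" % XHTML_NS):
        src = img.get("src", "")
        # Rule 1: every <img> needs an alt attribute.
        if img.get("alt") is None:
            problems.append("missing alt: " + src)
        # Rule 2: the image should be one of the core media types.
        extension = os.path.splitext(src)[1].lower()
        if extension not in EXTENSION_TO_TYPE:
            problems.append("not a core image type: " + src)
    return problems
```

An image outside the core types is not necessarily an error: the @fallback attribute of the manifest can point to a replacement in a supported format.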
DTBook (Digital Talking Book)
● an XML vocabulary defined in ANSI/NISO Z39.86-2005 Standard (http://www.niso.org/workrooms/daisy/Z39-86-2005.html)
● recommended for more advanced applications (e.g. educational books)
● supports footnotes, sidebars, annotations, page numbers, etc.
DTBook features
● hierarchical navigation
● sequential reading with choices (e.g. skip footnotes)
● specific reading methods for different components (e.g. tables)
● time synchronization via SMIL
Navigation Control File (NCX)
● http://www.niso.org/workrooms/daisy/Z39-86-2005.html#NCX
● exposes the hierarchical structure of a book
Open Packaging Format (OPF)
● describes and references all components of the electronic publication (e.g. markup files, images, navigation structures)
● provides publication-level metadata
● specifies the linear reading-order of the publication
● provides fallback information to use when unsupported extensions to OPS are employed
● provides a mechanism to specify a declarative global navigation structure (the NCX)
OPF File Structure
● Package:
– Metadata
– Manifest
– Spine
– Guide
<package>
● root element of OPF package
● Attributes:
– xmlns="http://www.idpf.org/2007/opf"
– version="2.0"
– unique-identifier="an-unique-id"
● primary book identifier selected from a collection of Dublin Core identifier elements in <metadata>
● if not world-wide unique it may cause problems in libraries and catalogues
<metadata>
● a required child of <package>
● its children are elements from the Dublin Core namespace and/or <meta> elements with the same syntax as in XHTML
<dc:elements>
● Dublin core: http://dublincore.org/documents/dces/
● Elements: contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title, type
● e.g.:
– <dc:title>A book</dc:title>
– <dc:identifier>uhf-232-dsds</dc:identifier>
<dc:identifier>
● at least one <identifier> with attribute @id must be present inside <metadata>
● the value of an @id attribute must be equal to the @unique-identifier of <package> element
– content of the <identifier> element with such @id is used to uniquely identify the book in libraries and catalogues
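The link between @unique-identifier and <dc:identifier> can be followed in a few lines. This is a minimal sketch with the Python standard library; the function name `find_unique_identifier` is hypothetical, and the Dublin Core namespace URI comes from the Dublin Core documentation, not from the slides.

```python
import xml.etree.ElementTree as ET

# Namespace of Dublin Core elements (assumption: taken from Dublin Core, not the slides).
DC = "{http://purl.org/dc/elements/1.1/}"


def find_unique_identifier(opf_text):
    """Return the text of the <dc:identifier> whose @id equals
    package/@unique-identifier, or None if no such element exists."""
    package = ET.fromstring(opf_text)
    wanted_id = package.get("unique-identifier")
    if wanted_id is None:
        return None
    for identifier in package.iter(DC + "identifier"):
        if identifier.get("id") == wanted_id:
            return (identifier.text or "").strip()
    return None
```

A result of None means the package breaks the rule above: no identifier carries the @id that <package> refers to.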
<manifest>
● the next required child of <package>
● provides a list of all the files that are part of the publication (xhtml, css, images, …)
● each file listed in a child <item>
● each file must be given precisely once but the order of files is not significant
<item> child of <manifest>
● Attributes, all required:
– @id
– @href
● relative paths interpreted relative to the location of the OPF file containing the <manifest>
– @media-type
● Optional attribute:
– @fallback
● provides an @id of another item to be used if this item's @media-type is not supported
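The @fallback mechanism can be pictured as following a chain of items until a supported media type is found. This is a minimal sketch, assuming the manifest has already been read into a plain dictionary; the function name `resolve_item`, the dictionary layout and the sample file names are hypothetical, and real reading systems may resolve fallbacks differently.

```python
def resolve_item(manifest, item_id, supported_types):
    """Follow @fallback links starting at item_id and return the @href of the
    first item whose @media-type is supported, or None if there is none.

    manifest maps an item @id to a dict with the keys "href", "media-type"
    and, optionally, "fallback" (the @id of another item).
    """
    seen = set()
    while item_id is not None and item_id not in seen:
        seen.add(item_id)  # guards against circular fallback chains
        item = manifest.get(item_id)
        if item is None:
            return None  # fallback points to an id missing from the manifest
        if item["media-type"] in supported_types:
            return item["href"]
        item_id = item.get("fallback")
    return None
```

For example, an animation in an unsupported format could name a PNG item as its fallback, and a reader that handles only the core image types would then display the PNG.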
<spine>
● the next required element
● collects “main eBook pages”
● contains one or more <itemref> elements
– <itemref idref='anID'>
● anID is @id of a <manifest>/<item>
● @toc of <spine>
– contains a value of @id of an <item> which provides the table of contents for the eBook, usually in NCX format
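Putting <package>, <metadata>, <manifest> and <spine> together, a small OPF file could be generated as follows. This is a sketch under stated assumptions, not a complete OPF writer: the function name `build_opf`, the id `BookId` and the file names in the usage example are hypothetical; the Dublin Core namespace URI and the NCX media type are not from the slides; and <dc:language> is included because OPF 2.0.1 lists it as required alongside title and identifier.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"  # assumption: from Dublin Core, not the slides


def build_opf(title, identifier, language, items, spine_ids, toc_id):
    """Return an OPF package document as a string.

    items is a list of (id, href, media_type) tuples for <manifest>;
    spine_ids lists manifest ids in linear reading order;
    toc_id is the manifest id of the NCX file.
    """
    ET.register_namespace("", OPF_NS)
    ET.register_namespace("dc", DC_NS)

    package = ET.Element(
        "{%s}package" % OPF_NS,
        {"version": "2.0", "unique-identifier": "BookId"},
    )

    metadata = ET.SubElement(package, "{%s}metadata" % OPF_NS)
    ET.SubElement(metadata, "{%s}title" % DC_NS).text = title
    # The @id here must equal package/@unique-identifier.
    ET.SubElement(metadata, "{%s}identifier" % DC_NS, {"id": "BookId"}).text = identifier
    ET.SubElement(metadata, "{%s}language" % DC_NS).text = language

    manifest = ET.SubElement(package, "{%s}manifest" % OPF_NS)
    for item_id, href, media_type in items:
        ET.SubElement(
            manifest,
            "{%s}item" % OPF_NS,
            {"id": item_id, "href": href, "media-type": media_type},
        )

    spine = ET.SubElement(package, "{%s}spine" % OPF_NS, {"toc": toc_id})
    for item_id in spine_ids:
        ET.SubElement(spine, "{%s}itemref" % OPF_NS, {"idref": item_id})

    return ET.tostring(package, encoding="unicode")


if __name__ == "__main__":
    print(build_opf(
        "A book", "uhf-232-dsds", "en",
        [("ncx", "toc.ncx", "application/x-dtbncx+xml"),
         ("ch1", "chapter1.xhtml", "application/xhtml+xml")],
        ["ch1"], "ncx",
    ))
```

The optional <guide> element is left out of this sketch.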
Open Container Format (OCF)
● a general-purpose container technology
● collects a related set of files into a single-file container
● the required format for a file containing an EPUB book
● a ZIP archive
OCF file structure
● File mimetype
● Directory META-INF with files:
– container.xml (required)
– manifest.xml
– metadata.xml
– signatures.xml
– encryption.xml
– rights.xml
● Directory OEBPS with EPUB files (which may be in subdirectories)
● Other directories, e.g. PDF for alternative book versions
file: mimetype
● in the root of the ZIP archive
● it must be the first file in the archive
● must contain text:
application/epub+zip
● make sure there is no whitespace around this text
● simplifies automatic recognition of the archive
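The mimetype rules translate directly into how the archive is written. The following minimal sketch uses Python's `zipfile` module; the function names `write_epub` and `has_valid_mimetype` are hypothetical. Storing the mimetype entry without compression is a requirement of the OCF specification rather than something stated on the slide.

```python
import io
import zipfile

MIMETYPE = b"application/epub+zip"


def write_epub(files):
    """Build an EPUB archive in memory.

    files maps archive paths (e.g. "META-INF/container.xml") to bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # mimetype goes first, uncompressed, with no surrounding whitespace.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path, data in files.items():
            archive.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def has_valid_mimetype(epub_bytes):
    """Check that the first archive entry is an uncompressed, exact mimetype file."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        entries = archive.infolist()
        if not entries:
            return False
        first = entries[0]
        return (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and archive.read(first) == MIMETYPE
        )
```

Many general-purpose ZIP tools add files in alphabetical order or compress every entry, which is why the mimetype file usually has to be added in a separate first step.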
container.xml
● in directory META-INF
● format:
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/an_OPF_file.opf"
              media-type="application/oebps-package+xml" />
  </rootfiles>
</container>
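A reading system starts from container.xml to locate the OPF file. This is a minimal sketch of that lookup with the Python standard library; the function name `find_opf_paths` is hypothetical.

```python
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_MEDIA_TYPE = "application/oebps-package+xml"


def find_opf_paths(container_xml):
    """Return the @full-path of every <rootfile> that points to an OPF package."""
    root = ET.fromstring(container_xml)
    return [
        rootfile.get("full-path")
        for rootfile in root.iter("{%s}rootfile" % CONTAINER_NS)
        if rootfile.get("media-type") == OPF_MEDIA_TYPE
    ]
```

The returned path is relative to the root of the ZIP archive, so for the example above the OPF file would be read from OEBPS/an_OPF_file.opf.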
EPUB 3.0
● http://idpf.org/epub/30/spec/epub30-overview.html
● 4 specifications:
– EPUB Publications 3.0
– EPUB Content Documents 3.0
– EPUB Open Container Format (OCF) 3.0
– EPUB Media Overlays 3.0
● in draft stage
Some changes from v.2
● http://idpf.org/epub/30/spec/epub30-changes.html
● HTML5 syntax (DTBook no longer an alternative syntax to XHTML)
● NCX superseded by EPUB Navigation Document (uses <nav> from HTML5)
● text-to-speech facilities
● multimedia support (via HTML5 <audio> and <video>)
EPUB Media Overlays 3.0
● defines a usage of SMIL
● a simplified subset of SMIL 3.0 that allows sequencing of clips
● <par> + <seq>
● @clipBegin, @clipEnd
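To make <par>, @clipBegin and @clipEnd concrete, the sketch below generates a tiny overlay document. Everything beyond those names is an assumption: the SMIL namespace URI, the <smil>, <body>, <text> and <audio> elements and their @src attributes follow the published Media Overlays specification rather than the slides, and the function name `build_overlay` and the file names are hypothetical. Because the specification was still a draft when the slides were written, details may differ.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"  # assumption: not given on the slides


def build_overlay(clips):
    """Return a media overlay document as a string.

    clips is a list of (text_src, audio_src, clip_begin, clip_end) tuples;
    each tuple becomes one <par> pairing a text fragment with an audio clip.
    """
    ET.register_namespace("", SMIL_NS)
    smil = ET.Element("{%s}smil" % SMIL_NS, {"version": "3.0"})
    body = ET.SubElement(smil, "{%s}body" % SMIL_NS)
    for text_src, audio_src, clip_begin, clip_end in clips:
        # <par>: the text fragment and the audio clip are rendered in parallel.
        par = ET.SubElement(body, "{%s}par" % SMIL_NS)
        ET.SubElement(par, "{%s}text" % SMIL_NS, {"src": text_src})
        ET.SubElement(
            par,
            "{%s}audio" % SMIL_NS,
            {"src": audio_src, "clipBegin": clip_begin, "clipEnd": clip_end},
        )
    return ET.tostring(smil, encoding="unicode")


if __name__ == "__main__":
    print(build_overlay([
        ("chapter1.xhtml#s1", "chapter1.mp3", "0s", "2.5s"),
        ("chapter1.xhtml#s2", "chapter1.mp3", "2.5s", "6s"),
    ]))
```

The <par> elements are written in reading order; <seq> elements, omitted here, can group them to mirror nested structures such as sections.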