
Publish Your Own ebooks with FOSS Tools - Notes


DESCRIPTION

The notes for a presentation given at FSOSS 2011 on publishing ebooks (in EPUB format) using free and Open Source tools.


Publish Your Own ebooks with Open Source Tools

or: Adventures in Modern Publishing

By: Scott Nesbitt

© 2011 Scott Nesbitt

Publishing has changed dramatically in the last 20 years. In fact, it’s undergone something of a minor revolution in the last 10. You don’t need me to tell you that, but it’s a good opportunity for me to tell a story.

Once upon a time, publishing was the domain of the folks who could afford printing presses. Not just to own them, but to operate and maintain them. Printing presses were, and are, huge machines that require skilled people to work with them.

Compounding that, if you were a writer, the only way you could get your book published was to put the fate of your book in the hands of an editor at a publishing house. Your chances then, as now, weren’t that good. While you could go the vanity publishing route, that was expensive and out of reach for most.

But as time went on, alternatives emerged: devices like the typewriter, the mimeograph machine, and the photocopier. The quality varied, depending on what people were using, but those devices put printing in the hands of ordinary people. They enabled everyday folk to put things in print, and in some level of mass quantity. Still, paper and binding could be expensive. That was, and is still (to a degree), a big barrier to entry.

But high-quality publishing did move closer into the hands of the average person and most authors. I like to think that that move started with Donald Knuth and his creation, the TeX typesetting system. For me, though, the turning point came in 1988. I was in journalism school and the newsroom of the student newspaper got a battery of Macintosh computers. The ones that we now call Classic Macs. Using Microsoft Word, a laser printer, and the venerable techniques of paste-up, we were able to quickly assemble an edition of the paper and send it to the printer.

I remember one instance in particular, where my class covered an event late one Thursday afternoon. We rushed back to the newsroom, wrote up our stories, put together the paper, and sent it to the printer. All before 7:00 p.m. the same day.

What we did was primitive, but it opened my eyes. As did more powerful tools like Ventura Publisher, QuarkXPress, PageMaker and, in the world of technical communication, FrameMaker. But those tools had one thing in common. Although the work was done on computers, the goal was to put the work into print.

Going mobile

It wasn’t until the mid-1990s, when truly mobile devices -- ones with screens only a few inches across -- started to hit the market en masse, that some people got the idea to create books that were meant to be read on those devices.

Admittedly, most of those books were public domain works, classics, and reference material. There was little, if any, new or mainstream content. But the seeds were sown, and from those days until today a variety of devices (whether for reading ebooks or not) have come and gone. And ebook formats have bloomed like a thousand flowers.

There are a number of ebook formats available today. Most are just niche or marginal. The two that are arguably the most popular are PDF and EPUB.


PDF versus EPUB

PDF has been around since the early 1990s. At the time, it was somewhat revolutionary. Here was a format that could literally take a snapshot of the look and feel of any document no matter how complex the layout. That, in itself, was pretty impressive. For the time and even for now.

But PDF, no matter what Adobe says, is really a format for printing. At best, it’s a format for viewing on larger screens -- desktop monitors, laptops, and (at a stretch) larger tablets.

EPUB, on the other hand, is a young upstart. From day one, EPUB had the advantage of being created in the right place at the right time. EPUB was built for viewing on screen. Print wasn’t even an afterthought -- I don’t think it was even considered to be a necessary feature of EPUB.

While EPUB files might not be as visually pretty as PDFs, they’re more than up to the task for reading on screen. Any screen. Let me give you a few reasons why.

Why EPUB?

I’d like to take a moment to look at why I think that EPUB is the best format (at the moment, anyway) for publishing ebooks.

First off, EPUB is based on open standards. I’ll be talking a bit more about this in a moment. While PDF (or, at least, some variants of PDF) is an ISO standard, it’s not really open. To be honest, I’d rather use an open standard than a closed one. Or a closed format.

Secondly, EPUB is widely supported. Most ebook readers can handle EPUB files, and reader software for computers and tablets and smartphones (most of it free or Open Source) can too. There are even browser-based EPUB readers, like the extension for Firefox called EPUBReader.

Third, EPUB content is designed to flow. What do I mean by that? Think of all of the devices that you’d read an ebook on: a computer or laptop, a tablet, an ebook reader, or a smartphone. All of them have different sized screens and different screen resolutions. EPUB pages aren’t exactly one-size-fits-all. They’re more one-size-adapts-to-all. You always (well, there are exceptions) get text on a single page, within the margins of the screen.


With a PDF file, things can be very different. I’ve used readers that leave widows and orphans. On top of that, one strength of PDF is a major drawback when the format is used for ebooks. And that’s PDF’s ability to maintain the layout, the look, and the feel of a printed document. It’s always nice to admire the work of a good layout or design person. But when reading that on a small screen, you often wind up scrolling and resizing. That disrupts the flow of reading, and gets really frustrating.

Finally, EPUB is very well suited for text-heavy books. You can include vector and raster images as well. And, unlike PDF, including graphics won’t overly bloat the size of the file.

Drawbacks of EPUB

I’d be remiss if I didn’t mention a few of EPUB’s drawbacks. The main ones are:

● It’s not suited to books with more complex and precise layouts -- for example photo books or digital comics.

● When it comes to scientific and technical publishing, EPUB doesn’t support equations set using MathML (an XML variant for presenting the structure and content of mathematics). Instead, you need to use image files.

● There’s no provision for linking into or between books.

Taking a peek into EPUB

This isn’t going to be an in-depth technical look at the innards of EPUB. I just want to give you a bird’s-eye view of the format so you know what it consists of and how it works.

Remember when I said that EPUB is based on open standards? Well, those standards are XHTML, XML, and CSS.

The text of a book is in XHTML. Yes, one of the file formats used to create Web pages. So if you have existing content -- for example, articles that have been published on the Web or blog posts -- you can use them as the basis of an EPUB book. More about this shortly.

CSS, if you don’t know, is short for Cascading Style Sheets. Cascading Style Sheets let you apply formatting to a Web page. Think of a CSS file as being like a template in a word processor. By changing attributes in a CSS file, you can change the look and feel of an EPUB file.

XML comes in with an EPUB file’s table of contents file (usually named toc.ncx) and its metadata file (usually named content.opf). The table of contents file not only provides structure to an ebook, it also provides navigation. Yes, a true table of contents. The metadata file, obviously, contains information about the book -- like its title, author, language, the software used to publish it, and the like. This is information that readers rarely, if ever, see but which should be in an EPUB file to make it complete.
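To make the metadata file a little less abstract, here is a minimal sketch in Python that reads the title, author, and language out of one. The sample package document and its values are made up for illustration, and the helper name read_metadata is hypothetical; a real content.opf will usually carry more entries than this.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by the package file and its Dublin Core metadata.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A made-up, minimal content.opf for illustration only.
SAMPLE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>An Example Book</dc:title>
    <dc:creator>A. N. Author</dc:creator>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
  </spine>
</package>
"""


def read_metadata(opf_text):
    """Return the title, creator, and language found in an OPF document string."""
    root = ET.fromstring(opf_text.encode("utf-8"))
    metadata = root.find("opf:metadata", NS)
    result = {}
    for field in ("title", "creator", "language"):
        node = metadata.find("dc:" + field, NS)
        # A missing element is reported as None rather than raising an error.
        result[field] = node.text if node is not None else None
    return result


print(read_metadata(SAMPLE_OPF))
```

The manifest and spine in the sample show the other job of this file: listing the files in the book and the order in which they are read.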

EPUB files have the extension .epub. What a surprise … But .epub isn’t some esoteric and murky format like, say, .doc. It’s actually a ZIP file. You can open an EPUB file using any file compression utility -- like Archive Manager in GNOME or WinZip in Windows.
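Because an EPUB is a ZIP file, any ZIP library can look inside one. The sketch below uses Python’s standard zipfile module. It builds a tiny stand-in archive in memory so that it runs without a real book; the OEBPS folder and the file names are common conventions used here as assumptions, not requirements.

```python
import io
import zipfile


def list_epub_entries(epub_file):
    """Return the names of the files stored inside an EPUB (a ZIP archive)."""
    with zipfile.ZipFile(epub_file) as archive:
        return archive.namelist()


# Build a small stand-in archive in memory (placeholder contents only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/container.xml", "<container/>")
    archive.writestr("OEBPS/content.opf", "<package/>")
    archive.writestr("OEBPS/toc.ncx", "<ncx/>")
    archive.writestr("OEBPS/chapter1.xhtml", "<html/>")

buffer.seek(0)
for name in list_epub_entries(buffer):
    print(name)
```

With a real book, you would pass the path to the .epub file instead of the in-memory buffer.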

Let’s look at some tools

A while ago, I heard someone say that creating EPUBs in 2011 is like creating Web pages circa 1997. The implication there was that a lot of manual work is involved. I don’t agree. Sure, you can assemble your own EPUB books (including building your own table of contents files by hand). Why do that? Why not let the tools do the bulk of the work for you?

I’ll be looking at five tools. Well, not all of them are tools -- two of them are markup languages. For the purposes of this talk, let’s just pretend they’re all tools. They’re not the only games in town, but they’re the ones I’m most familiar with.

I’m going to put these tools into three broad categories:

● Conversion

● Native authoring

● A hybrid solution

The tools I’m going to discuss, for the most part, aren’t meant for high-volume publishing. But for a lone writer wanting to produce EPUB books, or even a small firm wanting to put out content as EPUB, they’re more than up to the task.

Let’s get to the tools.

DocBook

DocBook is a markup language, based on XML, that’s widely used in documenting hardware and software. But a few publishers, notably O’Reilly Media and XML Press, use DocBook for publishing their books. There’s even a subset of DocBook aimed at publishers.

If you want to create EPUBs from DocBook source files, it’s a lot easier than it used to be. That’s because the DocBook stylesheets now support EPUB output. In case you’re wondering, DocBook’s stylesheets are simply a set of files that aid in converting XML documents to various formats like HTML, PDF, and EPUB.

The EPUB stylesheets are a relatively recent addition. When I first tried them, the EPUB stylesheets left a lot to be desired. They’ve gotten a lot better though.

In addition to the stylesheets, you’ll need an XSLT processor. An XSLT processor is software that does the actual work of transforming a DocBook file into another format. Most XSLT processors are command line tools, but they’re easy to use. If you use Linux, many distributions come with one called xsltproc already installed. You can also download and install a couple of other popular processors called Saxon and Xalan.

Let’s assume you’ve got everything you need -- the stylesheets and an XSLT processor installed, and a DocBook source file. What do you do? You point the processor at the right stylesheet and tell it the name of the file you want to transform. With xsltproc, you’d use this command:

xsltproc [path to stylesheets]/epub/docbook.xsl [your_file.xml]

That was easy, wasn’t it? Well, things get a bit messier from here. Remember the .epub container I mentioned earlier? The DocBook tools only create the files that go into that container; you’ll need to create the container yourself. That’s fairly easy. Just use a file archiving utility to create a .zip file, then change the extension to .epub. There are a few other things you need to do, which are explained in detail in this article.
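One of those other things is worth spelling out: the EPUB container format expects a file named mimetype, containing the text application/epub+zip, to be the first entry in the archive and to be stored without compression. The following is a minimal sketch in Python of packaging a folder that way. The function name build_epub is hypothetical, and the sketch assumes the folder already holds the generated files, including META-INF/container.xml.

```python
import os
import zipfile


def build_epub(source_dir, epub_path):
    """Zip the files under source_dir into an EPUB container at epub_path.

    epub_path should be outside source_dir so the archive does not
    try to include itself.
    """
    with zipfile.ZipFile(epub_path, "w") as epub:
        # The mimetype entry goes in first and is stored uncompressed.
        epub.writestr(
            "mimetype",
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        for folder, _subfolders, files in os.walk(source_dir):
            for filename in sorted(files):
                full_path = os.path.join(folder, filename)
                # Archive names are relative to source_dir and use forward slashes.
                arcname = os.path.relpath(full_path, source_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                epub.write(full_path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Calling build_epub("my_book", "my_book.epub"), with both names being placeholders, would produce a file that already carries the .epub extension, so no renaming is needed.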

To me, what I just mentioned is the biggest drawback to using DocBook to create EPUB books. One complaint (well, actually a whine) that I constantly hear is that DocBook has too many tags. Over 400 of them, as I recall. People complain that they can’t possibly learn them all. Guess what? You don’t have to learn them all. You might use a dozen or two tags at the most. Focus on those, and use reference material for the rest.

AsciiDoc

AsciiDoc is one of those quintessential Open Source projects. The programmer behind it, Stuart Rackham, wanted to use DocBook to document the software he was writing. But he found that:

DocBook is a complex language, the markup is difficult to read and even more difficult to write directly -- I found I was spending more time typing markup tags, consulting reference manuals and fixing syntax errors, than I was writing the documentation.

So he came up with AsciiDoc.

AsciiDoc is a couple of things. First, it’s a lightweight markup language. Unlike HTML and XML, which use tags surrounded by angle brackets to format a document, AsciiDoc uses keyboard symbols to apply formatting. For example, if you want to create a heading you put a set of dashes below the text of the heading. A numbered list consists of items with a number and a period before them. I think that you get the idea.

Second, it’s a set of scripts and stylesheets that will convert a marked-up file to various formats like XHTML, PDF, and EPUB.

One thing that I should mention is that AsciiDoc is a command line tool. But don’t worry: you don’t need to remember a long string of commands and options. Rackham wrote a script named a2x which does all the heavy lifting for you. All you need to do is tell the script what format you want to output and what file you want to convert.

Here’s how to use the script:

a2x -fepub -dbook [ebook_source.txt]

Overall, AsciiDoc outputs a nice-looking EPUB. Of course, to do that you should follow the format for preparing the source file. If you do that, you’ll run into fewer headaches.

OpenOffice.org/LibreOffice

OK, you’re probably thinking: using a word processor as an ebook publishing tool? There’s no reason why you can’t. People have written and published ebooks using OpenOffice.org and LibreOffice. OK, those ebooks were PDFs ... What about EPUB?


Thanks to an extension for OpenOffice.org Writer called Writer2EPUB, you can do just that. In case you’re wondering: the extension works with LibreOffice Writer, too.

After you’ve installed the extension, using it is quite easy. Just open your book file in OpenOffice.org or LibreOffice Writer. Then, click the Writer2EPUB button on your toolbar. You can enter metadata (remember, that’s information about the book) and even attach a separate cover file if you have one. Then, click OK.

I’ll be honest: I’ve only experimented with files about 50 or 60 pages long at the most. That said, the conversion was fairly fast and quite smooth. The book looked good to boot.

When you’re preparing content for conversion to EPUB with Writer (and even if you aren’t), always keep this in mind: use styles. Don’t apply formatting manually -- for example, don’t create a heading by making text 22 point DejaVu Sans and applying bolding. Apply the Heading 1 style instead.

The reason you need to do this is simple. EPUB files are very structured. Styles, while they can help make a document look nice, are there to enforce consistency and structure. If you don’t use styles, there can be problems. The biggest one is that the table of contents for your EPUB file won’t generate properly. Which means you won’t have proper navigation or structure.

Sigil

In some ways, I consider Sigil to be the main event. It pretty much does it all when it comes to creating and publishing EPUB files.

Sigil is a simple application, but it works. Consider it a WYSIWYG word processor for creating EPUB files. And guess what? Its native format is .epub.

All you need to do is download and install Sigil, and then just fire it up. From there, you can start typing. Just remember to start a new document for each chapter and for the cover of your ebook.

Earlier in this talk, I mentioned that if you have articles published on the Web or blog posts you can use them as the basis of an ebook. Sigil can help you do just that. You can import HTML and XHTML files into Sigil, and they’ll become chapters in your book.


You can arrange the chapters, edit them, add images … well, everything that you’d do in a word processor to tidy up or change their look.

Sigil will also automatically generate a table of contents using the headings in the chapters of your ebook. You can also add some basic metadata to the file -- title, author, and language.
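To see why headings matter so much for a generated table of contents, here is a rough sketch in Python of the general idea: walk each chapter, collect the heading elements, and indent them by level. This is not Sigil’s actual code, and the class name, function name, and sample chapters are all made up for illustration.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for the h1-h3 headings in one chapter."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == "h%d" % self._level:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(chapters):
    """Build indented table of contents lines from (filename, xhtml) pairs."""
    lines = []
    for filename, xhtml in chapters:
        collector = HeadingCollector()
        collector.feed(xhtml)
        collector.close()
        for level, text in collector.headings:
            lines.append("  " * (level - 1) + text + " -> " + filename)
    return lines


# Made-up chapters. The bold paragraph only looks like a heading.
SAMPLE_CHAPTERS = [
    ("ch1.xhtml", "<h1>Getting Started</h1><p><b>Looks like a heading</b></p><h2>Installing</h2>"),
    ("ch2.xhtml", "<h1>Next Steps</h1>"),
]

for line in outline(SAMPLE_CHAPTERS):
    print(line)
```

The bold paragraph in the first sample chapter never makes it into the outline, which is the same reason manually formatted headings go missing from a generated table of contents.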

Sigil is a quick and easy way to create an EPUB book. In fact, I used it to create the EPUB version of my first ebook. I was very pleased with the results.

Booki

I have a soft spot in my heart for Booki. It’s the tool that the FLOSS Manuals project uses to write and publish its guides. Booki isn’t a desktop application. It’s a wiki. In fact, I’ve heard Booki described as a wiki that produces books instead of Web pages. It does produce Web pages, too, but that’s beside the point …

Booki is fairly easy to use. There’s no wiki markup to deal with. The editing interface is like a Web-based word processor. You can change formatting with a click or two. I’ve worked with a number of people who, when being thrown into Booki, adapted to it within 30 minutes.

Being a wiki, Booki’s backend format is (as you might have guessed) XHTML. Which, as we know, is one of the components of an EPUB file.

There are two ways you can create an EPUB with Booki. One, just go to any manual on the FLOSS Manuals site. Then, click the EPUB button in the navigation panel on the left side of any page. After a few seconds, you get a nicely-formatted EPUB file.

The other way is to go to objavi.flossmanuals.cc. Objavi is the publishing backend of Booki, and using it enables you to choose from a number of output types including EPUB. You can also modify the default Cascading Style Sheet or point to another one on the Web. That, as you know, will let you change the look and feel of the book. Why do that? While the default stylesheet is fine, you might want to change the font being used or the spacing between paragraphs or the size of headings.

In either case, Booki assembles the chapter files, creates a table of contents, and surrounds it all with the EPUB wrapper.

A quick note about PDF

DocBook, AsciiDoc, and Booki all have one crucial piece of flexibility: if you need a PDF, you can easily create it. That probably sounds strange, especially after what I said about PDF earlier in this presentation.

Even though ebooks are all the rage, you might want to print your work. EPUB isn't suited for that. But PDF is. Last year, I ran a FLOSS Manuals book sprint at Toronto Open Source Week. We used Booki to create a manual for the Thunderbird email client. To do something special for the participants, I generated a PDF and printed copies of the manual using something called the Espresso Book Machine.

But let’s face it: like it or not, PDF is a de facto electronic publishing standard. Some commercial electronic publishing channels will only distribute PDFs. And there are a number of people who only know PDF.

And not everyone owns an ebook reader or a tablet. They’ll read on their desktop or laptop computers.

For now, PDF is still a bit more popular than EPUB. Here’s a very unscientific example. Recently, I published my first ebook. It’s sold as a PDF through e-junkie.com (an electronic fulfillment service) and as an EPUB in Amazon’s Kindle Store. The PDF version outsells the EPUB version by about 1.5:1.

Validation and testing

So you’ve got a nicely-formatted EPUB file. Now, all you have to do is let it loose into the wild. Not so fast. You can do that, but it’s not the best move. Before offering your EPUB for download or for sale, you should validate and test it first.

Let’s take a look at both processes.

Validation

Validation is the process of making sure that your EPUB books contain all the elements that ebook readers expect. Like what? Here’s a partial list:

● Complete metadata

● The proper directory structure in the EPUB file

● Valid XHTML

● Working links and references to files in the EPUB file

● A table of contents

And a lot more. If you don’t validate your EPUB book, chances are it will still render properly in your ebook reader. But why take that chance? Don’t worry: validation isn’t difficult to do. There’s some good software, and there are some good services, that let you do just that.
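To give a feel for what a validator looks at, here is a small sketch in Python that checks only a few of the structural items from the list above: that the mimetype entry comes first, and that META-INF/container.xml exists and points at a package file that is really in the archive. It is nowhere near a replacement for a real validator. The function names and the sample archives are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

# A made-up container.xml that points at OEBPS/content.opf.
SAMPLE_CONTAINER = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def make_archive(entries):
    """Build an in-memory ZIP from (name, text) pairs, for demonstration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in entries:
            archive.writestr(name, text)
    buffer.seek(0)
    return buffer


def basic_epub_problems(epub_file):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    with zipfile.ZipFile(epub_file) as epub:
        names = epub.namelist()
        if not names or names[0] != "mimetype":
            problems.append("mimetype is not the first entry")
        elif epub.read("mimetype") != b"application/epub+zip":
            problems.append("mimetype has unexpected content")
        if "META-INF/container.xml" not in names:
            problems.append("META-INF/container.xml is missing")
            return problems
        container = ET.fromstring(epub.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            problems.append("container.xml names no package file")
        elif rootfile.get("full-path") not in names:
            problems.append("package file %s is missing" % rootfile.get("full-path"))
    return problems


good = make_archive([
    ("mimetype", "application/epub+zip"),
    ("META-INF/container.xml", SAMPLE_CONTAINER),
    ("OEBPS/content.opf", "<package/>"),
])
print(basic_epub_problems(good))
```

Checks like valid XHTML, complete metadata, and working links are where the dedicated validators earn their keep.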

One of the features of Sigil that I didn’t mention earlier is its built-in validator. All you need to do is open your EPUB file in Sigil, click a button, and after a few seconds it points out any problems.

Another validator you might want to consider is the online validator maintained by digital publishing firm ThreePress. Just upload your ebook and the service does the rest. If you don’t want to do that, then download and install epubcheck. epubcheck is what powers the ThreePress validator. It’s a command line Java application that’s quite easy to use. Just run the command:

java -jar epubcheck-0.9.2.jar ebook_file.epub

That seems simple enough, doesn’t it? There is one catch, though. Validators are great at finding problems. But in many cases, they’re lacking when it comes to explaining what those problems are, specifically. The validators assume that you have the knowledge to understand and fix the problem. That’s not always the case.

When I was validating an ebook, I got an error message telling me that there was invalid HTML syntax in a particular file. I went to the line number that the validator pointed to in the file, and I didn’t see anything wrong. And I have a strong knowledge of HTML. Well, it turned out that the validator was expecting paragraph tags (<p> and </p>) around text surrounded by <blockquote> tags. I only figured that out by running the offending HTML file through an HTML validator.
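A problem like that one can be hunted down with a few lines of script. The sketch below, in Python, looks for blockquote elements that hold text directly rather than inside a child element such as a paragraph. The function name and the sample page are made up, and the sketch assumes the chapter is well-formed XHTML that declares the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# A made-up page: the first quote holds bare text, the second wraps it in <p>.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <blockquote>Text placed directly in the quote.</blockquote>
    <blockquote><p>Text wrapped in a paragraph.</p></blockquote>
  </body>
</html>
"""


def bare_blockquotes(xhtml_text):
    """Return the loose text of each <blockquote> that holds text directly."""
    root = ET.fromstring(xhtml_text.encode("utf-8"))
    found = []
    for quote in root.iter(XHTML + "blockquote"):
        # Loose text sits before the first child or after any child element.
        pieces = [quote.text or ""] + [child.tail or "" for child in quote]
        loose = " ".join(piece.strip() for piece in pieces if piece.strip())
        if loose:
            found.append(loose)
    return found


print(bare_blockquotes(SAMPLE_XHTML))
```

Only the first quote in the sample is reported. Wrapping its text in paragraph tags is the fix.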

Testing

Like validation, testing is optional. But it’s worthwhile doing it, if only as a final quality check. Dotting “i”s, crossing “t”s, making sure that line and paragraph breaks are accurate. That sort of thing.

In a perfect world, someone publishing an ebook would have access to one of every device on which people read electronic books -- ebook readers, tablets, and smartphones. Sadly, it’s not a perfect world.

So, what do you do? Use the devices that you have. They should give you a good idea of how your ebook will look when people read it. Also, consider using Calibre. Calibre is an Open Source ebook management application for desktop and laptop computers. While it’s not (as some people believe) primarily a tool for reading ebooks, Calibre does have a solid ebook reading feature. One sneaky trick you can use is to resize Calibre’s ereader window to simulate how your ebook will look on screens of various sizes.

Chances are, you won’t find many (if any) problems.

Final thoughts

As with a number of other areas, Open Source tools are more than up to the task of publishing ebooks. It doesn’t hurt when one of the most popular formats for distributing ebooks is an open standard, either.

Whether you’re creating a short report or manual, a longer nonfiction book, or a novel, there’s an Open Source tool that will help you do the job. While I don’t believe that creating an EPUB in 2011 is like creating a Web page in 1997, I do have to admit that there’s still a way to go. That said, those of us in the Open Source world have some solid tools at our disposal. And they’re only getting better.

Remember, though, that all the tools in the world won’t make your book worth reading. That will only happen if you have an interesting idea and do a good job of presenting that idea in writing. EPUB is just a delivery system. It’s the content that counts.

Want to connect?

Web site:

http://scottnesbitt.net

Blog:

http://weblog.scottnesbitt.net

Twitter:

http://www.twitter.com/ScottWNesbitt

identi.ca:

http://identi.ca/scottnesbitt
