Web Standards for E-books – A List Apart

The internet did not replace television, which did not replace cinema, which did not replace books. E-books aren’t going to replace books either. E-books are books, merely with a different form.

The electronic book is the latest example of how HTML continues to win out over competing, often nonstandardized, formats. E-books aren’t websites, but E-books are distributed electronically. Now the dominant E-book format is XHTML. Web standards take on a new flavor when rendering literature on the screen, and classic assumptions about typography (or “formatting”) have to be adjusted.

HTML isn’t just for the web

It’s for any text distributed online.

Technology predictions can come back to haunt you, but this one I’m sure about: The fate of non-HTML formats has been sealed by HTML5 and the iPad. People are finally noticing what was staring them in the face all along: HTML is great for expressing words. The web is mostly about expressing words, and HTML works well for it. The same holds true for electronic books.

  • E-books are usually not “websites.” You can post your book copy as web pages, but the E-book as a logical entity is not a website.

  • ePub, the international E-book standard, is HTML (XHTML 1.1 with minor exclusions). Two other formats (certain kinds of “real” XML and DTBook) have equal status in ePub; most developers will use XHTML.

  • Every E-reader under the sun except the Amazon Kindle can display ePub electronic books. (A Kindle can show you its own variant, .AZW, of a variant of HTML [Mobipocket]; that’s two steps removed from the real thing. A Kindle can also convert HTML to displayable format, presumably AZW.)

It may be unseemly to dance on graves, but HTML wins again.

HTML doesn’t work for all documents, as it lacks important structural features. (HTML5 addresses some of those deficiencies but won’t help today’s E-books.) HTML does work for huge numbers of documents, many of which we call books. Bet against HTML for online distribution and you’ve backed the wrong horse.

Philosophical digression

Every article on electronic books must ritually address the concept of book and the relation of form to book. In this case I’ll acknowledge the remarks of internet pioneer Jaron Lanier, who warns in his book You Are Not a Gadget that early software decisions can dramatically constrain what later becomes possible. (Others have said the same thing: the type designers at LettError complained a decade ago about how software tools constrain ideas.)

I’m articulating an HTML-triumphalist view of E-book production. By backing what I feel is clearly the right horse, I’m contributing to the strangulation of new or uninvented forms of the book. Advocacy of one digital format is always a process of eugenics; other formats will never be born or will die prematurely. I’m doing that right now by downplaying the importance of XML and DTBook variants of ePub.

I’m happy to contribute to the death of “vooks” and other multimedia websites masquerading as books. (I don’t need a rectangle of video yammering at me while I’m trying to read.) They’re like animated popunder ads in that no actual “user” wants them, but somebody with an agenda does. Exterminating that species is something to which I’m proud to contribute. For other forms of books, advocating strict HTML markup will cause as-yet-unknowable harm.

I still maintain that typical works of fiction, and many works of nonfiction, can be expressed very well indeed in HTML E-books. To achieve this degree of expression, we have to rid ourselves of print conventions that don’t work in electronic media.

Another way of saying this is that books need to be as bookish as possible under the circumstances. Printed books need to take advantage of everything print has to offer (resolution, tactility, portability, collectibility), while electronic books have to do likewise for their own form (economy, copyability, reflow, searching and indexing, interlinking).

Two problems to be solved

If HTML is the dominant markup language for most E-books, then web standards come into play. Frankly, I don’t want to relive the late 1990s and early 2000s, in which standardistas had to come up with one slightly different way after another to convince developers to code their sites properly. You still don’t see valid HTML very often on real-world sites, but tables for layout are largely a thing of the past and semantics are vastly improved. Maybe pure web standards didn’t “win,” but whatever web standards are, they definitely didn’t lose.

It would be overconfident to assume that this success will immediately replicate itself with E-books. Publishers (there are barely any “developers” in the E-book sphere) will not automatically do the right thing, and so far they seem to be doing exactly the wrong thing.

If we want publishers’ code in E-books to be as good as standardistas’ code on actual websites, we’ve got two problems to solve.

Markup


The underlying code for typical ePub electronic books is XHTML 1.1. That means you need valid code with no errors: the ePub standard requires XML error handling, so you can’t get away with HTML 4.0–style tag soup.

Novels and many nonfiction books are semantically simple. Most can get by with a tiny range of tags:

  • P (but don’t mark up everything as a paragraph)
  • Headings (arguably H1 should be reserved for the title of the book)
  • Emphasis (perennial debates over semantics of CITE vs. EM vs. I may hereby resume)
  • Lists
  • Images (with mandatory alternate text)

Even nonexperts can be readily trained to recognize simple structures like these. But people untrained in even the simplest markup are the problem.
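
Because the vocabulary is this small, the stylesheet that accompanies it can be small too. What follows is a minimal sketch, assuming a book that uses only the elements listed above. The values are placeholders rather than recommendations, and nothing here is prescribed by ePub.

```css
/* Illustrative values only; the element selectors match the short tag list. */
h1 { font-size: 1.6em; text-align: center; }   /* arguably reserved for the book title */
h2 { font-size: 1.3em; margin: 2em 0 1em; }    /* chapter headings */
h3 { font-size: 1.1em; margin: 1.5em 0 0.5em; }

p { margin: 0; line-height: 1.4; }

em, cite { font-style: italic; }               /* the markup carries the semantic difference */

ul, ol { margin: 1em 0 1em 2em; padding: 0; }

img { max-width: 100%; }                       /* alternate text lives in the markup, not here */
```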

Production methods

For E-books to have good code, good code has to be found at every stage of the production process. That’s not how things are done right now.

Screenshot: … as ?.?.?.?

Thin spaces between dots in an ellipsis become question marks. For more examples of typographic tragicomedy in E-books, see this article’s Sidebar.

Hundreds, if not thousands, of commercially available E-books from legacy publishing houses were converted to “digital format” by scanning printed books and turning the resulting OCR book copy into text files. (Indeed just text files, not structured markup.) Copy errors are so rampant that E-books are the first category of book in human history that could actually be returned as defective. This in turn has led to the equally rampant mythology that E-books are all about “formatting.” (They aren’t: they’re about structured text with styles attached.)

Why would publishers scan hardcopies? Aren’t all books produced on computers these days? Yes, but do publishers own those files, or do various freelance designers? Can anybody even find the files? What if they were saved in an old version of Quark Xpress or Ventura Publisher? Instead of rooting around in files resident on computers they don’t really understand anyway (these are book people), publishers find it easier to just send print books out to low bidders for scanning.

Now there’s a cottage industry selling conversion services for E-texts. One competitor in the E-book “space,” Kobo (né Shortcovers), promises conversions for “as little as $29… per title.” Another competitor, eBook Architects, converts (“to Mobipocket/Kindle first”) for about $400 in typical cases. The New York Times estimated that to “convert the text to a digital file, typeset it in digital form and copy-edit it” costs a mere 50¢.

Rates this low are unsustainable and cannot possibly lead to good markup and clean copy.

This isn’t hypothetical. We have numerous examples to look at right now (see sidebar).

Race to the bottom

E-books are barely beginning to catch on and already the most important parts of an E-book (copy and markup) are suffering from a race to the bottom.

What’s the solution? The canonical format of a book should be HTML. Authors should write in HTML, creating a manuscript immediately transformable to an E-book. A manuscript could then be imported into that fossil the publishing industry refuses to leave behind, Microsoft Word. (MS Word’s Track Changes feature has become a kind of methadone for an addicted publishing industry.)

To typeset a print book from this source, translating twice (HTML → Word → InDesign) is a proven workflow with the added advantage of outputting tagged PDFs with good semantics.

Now, the foregoing is so optimistic as to be ridiculous. Authors are not going to start writing in HTML, let alone the full-on XML that Ben Hammersley has called for. Book copy will continue to be saved as MS Word, Xpress, and/or InDesign files. Though mangled and inadequate, such copy will then be “exported” for E-book “formatting.”

Instead of avoiding errors in the first place, the publishing industry may choose to fix errors after they’re made, but only if authors, especially big-name authors with ruthless literary agents, complain loudly until publishers have entire imprints’ E-books repaired. This will not result in authors writing good strong HTML for new books, but will clean up part of the mess.

Ongoing E-book experiments

There’s a lot of activity in the electronic-book “space,” from digital think tanks like the Book Oven to crowdsourced copy-editing at Bite-Size(d) Edits, to name two sites comanaged by impresarios Hugh McGuire and Stephanie Troeth. Two other projects are working on the possibilities of standardized structured code in the E-book process.

  • ePub Zen Garden aims to do for electronic-book layout and type what CSS Zen Garden did for web design, which was a lot. The new Zen Garden could benefit from the experience of the old Zen Garden by offering more than one canonical text to style, but the concept is a proven winner. (You can help by contributing.)

  • Simon Fraser University’s Thinkubator is slowly developing a project that expands on InDesign’s ability to save a complete round-trip representation of an InDesign file as XML. Converting XML output to ePub XHTML may not be trivial, but it is not impossible and could be automated.

    At that point, we wouldn’t have to retrain authors to write in HTML; we’d just have to retrain desktop publishers to use structural, not presentational, style names (Heading 2, Emphasis, Blockquote) for later translation. For code-competent authors, this same production method accepts XHTML as a source file, which can then be translated to a native InDesign document or PDF without intermediary files.

Separation of content and structure has never been more important

ePub uses XHTML 1.1 as a markup language. You may also associate stylesheets (explicitly CSS2, not any other version). As such and as ever, markup must be separated from presentation.

But E-book creators come from the publishing business. They’re writers, editors, desktop publishers. They will naturally attempt to hack and deform code and text to reproduce features from print layouts that should really be governed by CSS, handled by the E-book reader, or forgotten about entirely. In some cases, you actually have to alter the text of a book to make it work as an E-book; in other cases you must not do that.

Tasks CSS must handle

  • Drop caps. It’s easy to find commercial E-books the first word of which has an error: The word is written as its first letter followed by a space and the rest of its letters. It’s an artifact of drop caps, which in desktop publishing are usually rendered as a separate letter disconnected from the rest of the word. In standards-compliant E-books, you should forget about drop caps or use a CSS selector (:first-letter).

    The same goes for type treatments on the first words (often the first n words) of a chapter or section. Maybe the first five words use small caps or bold. There is no way to do that in CSS as yet, though you can style the entire first line of a paragraph. You might have to wrap the first n words in a SPAN with a classname (which can then carry over into Word and InDesign for later styling).

  • Small caps. Software that renders HTML (not just web browsers) has a hard time with small capitals. The CSS is easy enough to declare (font-variant: small-caps). But even if the software has access to a font with genuine designed small caps, it usually won’t use them. It will use fake small caps instead (regular capitals at a smaller point size). Fake small caps are usually too short, almost always too light, and often spaced too close together.

    E-books must use CSS to specify small caps. But what you’ll end up seeing for now is fake small caps, not real ones.

  • Columns. Despite what former Microsoft researcher Bill Hill may think, multicolumn continuous text makes no sense in a window that can resize and/or scroll. (Do you want your columns continually redrawing themselves before your very eyes?) Columns may make sense in a display that stays fixed and immovable. For that purpose, the CSS3 columns module can be tried, though real-world use may show its weaknesses, as with positioning illustrations, column-spanning headings, and callouts.

  • Indents. One of the simplest (also least followed) conventions of book typography, indenting the first line of a paragraph that follows another paragraph but nothing else, has never been simpler to set up than in CSS: p+p { text-indent: quantity }.

    Blank lines between paragraphs are a Microsoft Word artifact that is also widely used in onscreen text. In book typesetting, they’re a mistake (but don’t tell that to O’Reilly, the computer-book publisher that loves this “format”). If you really want a blank line between paragraphs, add a margin-bottom to P. Source copy should not be polluted with extraneous carriage-return characters, which are difficult to suppress.
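
The CSS-side tasks in the list above can be gathered into one small stylesheet. This is a minimal sketch, not a recommended house style: the class names `chapter-opening`, `lead-in`, and `fixed-page` are hypothetical, the sizes are placeholders, and whether a given E-reader honors each rule is an assumption to test on real devices.

```css
/* Hypothetical class names; sizes are illustrative placeholders. */

/* Drop cap: style the first letter without splitting the word in the markup. */
p.chapter-opening:first-letter {
  float: left;
  font-size: 3em;
  line-height: 0.85;
  padding-right: 0.05em;
}

/* First n words: CSS cannot count words, so the markup wraps them, e.g.
   <span class="lead-in">It was the best of</span> times ... */
span.lead-in {
  font-variant: small-caps;   /* many renderers will synthesize these from full caps */
}

/* Book-style indent: only a paragraph that directly follows another paragraph. */
p {
  margin: 0;
  text-indent: 0;
}
p + p {
  text-indent: 1.5em;
}

/* Alternative to indents: space between paragraphs, set in CSS rather than
   with extra carriage returns in the source. Use one approach, not both.
p { margin-bottom: 1em; }
*/

/* Columns come from the CSS3 multi-column module, not CSS2, so treat
   support as unverified and reserve them for a fixed-size display. */
div.fixed-page {
  column-count: 2;
  column-gap: 2em;
}
```

The point of the sketch is that every one of these treatments lives in the stylesheet or in a class name, so the words of the book stay intact.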

Tasks the reader software must handle

  • H&J. Everyone complains about full-justified text in E-readers (text with straight left and right margins). It’s harder to read because letterspacing and wordspacing are worse, causing rivers of whitespace. The reason? E-readers tend not to hyphenate words. Hyphenation is complex and still has not been perfected even for languages where there is a strong market incentive to do so, like English.

    To use the industry jargon, this issue is all about H&J (hyphenation and justification). Authors need to resist the temptation to add soft-hyphen characters to E-texts. Hyphenation is just a display convention. Hyphenation changes when the layout changes (like switching from tall view to wide view).

    E-book hyphenation must be performed by computer algorithms and dictionaries. In print publishing, expert human proofreaders can override a system’s H&J decisions, but when you’re reading an E-book you don’t have one of those expert proofreaders seated alongside you. E-reader software has to implement hyphenation; nobody else should touch it.

  • Ligatures. One of the very first things anyone with an interest in typography learns about is the use of ligatures, usually f followed by f, i, or l. Joining the letters together into ligatures avoids unpleasant collisions, like the top of an f hitting the dot of an i.

    As with hyphenation, ligatures are purely a display artifact. Your rendering engine has to put them in. Don’t pollute your source text with ligature characters. (What if I want to capitalize large blocks of text? What if I want to search the text, or look up a word containing a ligature character in a dictionary? Of course you could program very intelligent software to overcome the problem. It’s easier to avoid the problem.) Rarer ligatures, like ct and st, are also an issue for display engines, not underlying text.

    When you have to actively prevent ligature use, as in a URL that includes the letters fi or fl, there seems to be no way around adding a zero-width nonjoiner character between the letters. (There is no CSS declaration to turn ligatures on and off, though a CSS3 proposal would let you do that.)

  • Hanging (or hung) punctuation. Typesetting some punctuation marks, like quotation marks and dashes, slightly outside the margin makes printed text look better and may also make onscreen text look better. This too is up to the display engine, not the text or its author.
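
For reference, CSS3-era drafts do define properties that hand these three jobs to the rendering engine. The sketch below uses property names from the CSS Text and CSS Fonts modules as I understand them. They are not part of the CSS2 that ePub specifies, the `url` class is hypothetical, and support in any given E-reader is an assumption that needs testing.

```css
/* CSS3-era properties, outside ePub's CSS2; reader support is an assumption. */

/* Ask the engine to hyphenate; no soft hyphens are added to the text. */
p {
  text-align: justify;
  hyphens: auto;
}

/* Suppress common ligatures (fi, fl) where they would mislead, e.g. in URLs.
   "url" is a hypothetical class name. */
span.url {
  font-variant-ligatures: no-common-ligatures;
}

/* Let an opening quotation mark hang outside the start of the first line. */
blockquote {
  hanging-punctuation: first;
}
```

Automatic hyphenation generally depends on the document declaring its language so the engine can choose a dictionary. Either way the source text stays clean: the stylesheet only requests behavior, and an engine that ignores these rules simply renders as it did before.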

Alterations to book text

Pure separation of structural markup and presentation will be impossible to achieve in books more often than on websites. Common book-typography features can be adequately expressed in E-books only by the sacrilege of altering the source manuscript.

  • Dashes. As commonly used in print books, em dash (—) with no spaces on either side doesn’t work in onscreen text. Rendering engines may be too dumb to break a line before or after the em dash. Of course that may be solved someday. But in any event the character fails at its intended function, which is to break up text, as for appositives and parenthetical statements. En dash (–) surrounded by spaces avoids linebreak problems and works better at the intended purpose. (Stated concisely: Nospace-emdash-nospace doesn’t work; space-endash-space does.)

  • Space characters. You absolutely can use space characters wider and narrower than a typical word space. Em, en, and thin spaces are all defined in Unicode, along with many others, and display support is quite good and improving. A standard word space or a nonbreaking word space is the wrong character in many constructions, as between nested levels of quotation marks or apostrophe adjacent to quotation mark:

    • “I’ve Got Chills. They’re Multiplyin’ ”
      (apostrophe; thin space; end double quote)
    • “Technical is something techies do. ‘I’m a creative – I don’t touch that!’ ”
      (end single quote; thin space; end double quote)
    • It’s a nod to the “ ’80s New Wave” sound of the Cars and Blondie
      (open double quote; thin space; apostrophe)
  • Superiors, inferiors, fractions. In theory any character can be typeset as a superscript or subscript, usually changing the meaning (πr² and πr2 are two different things). Fonts often come equipped with pre-designed superior and inferior characters, typically digits (⁰¹²³⁴⁵⁶⁷⁸⁹) and letters used in ordinals (13th, 13e) and salutations (Mlle, Sra.). Fonts often have more superscripts and subscripts than are defined in Unicode, but where a Unicode superior or inferior exists, use it instead of SUB or SUP markup.

    Math is a separate discussion. (It always is.) Still, don’t try to fake out fractions as though you were using a typewriter. The small number of Unicode characters for vulgar fractions should be used in all cases. There is no reliable method in HTML and CSS to assemble fractions from superiors and inferiors and fraction slash, nor a way to create stacked fractions.

  • Sections. HTML’s single biggest deficiency for long documents is its lack of sections. They exist in HTML5, but ePub doesn’t use HTML5. Sections in nonfiction books may sometimes be differentiable through the use of headings, but the classic book-design paradigm of leaving extra space between sections (with different type on initial words of the new section) simply cannot be marked up in HTML. (In unusual cases, section breaks like these occur right at the bottom of a printed page and have to be inferred.)

    There is another tradition in book composition that can be adapted: typesetting a fleuron or dash between sections. It’s functionally equivalent to the use of HR, which can, with difficulty, be styled to be less intrusive. Still, you’re merely suggesting that sections have changed; what you aren’t doing is definitively encapsulating sections in their own markup.
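
The HR approach can be toned down with a few CSS2 rules. This is a minimal sketch with placeholder values. The `section-start` class is hypothetical and stands in for whatever name a production workflow assigns to the first paragraph of a section.

```css
/* Quiet section break: a short centered hairline instead of a full-width rule. */
hr {
  border: 0;
  border-top: 1px solid #999;
  width: 20%;
  margin: 1.5em auto;
}

/* The paragraph right after a break starts flush left, as in print. */
hr + p {
  text-indent: 0;
}

/* Without HR: extra space above a paragraph marked by a hypothetical class.
   This hints at a new section but still does not encapsulate it. */
p.section-start {
  margin-top: 2em;
  text-indent: 0;
}
```

Neither rule set gives the section its own container. Both only style the boundary, which is exactly the limitation described above.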

Special note about tables

Over and over, tables are held up as something E-books pretty much can’t do. I read this as an admission that people doing E-book “conversion” don’t understand table markup. Horrendously complex tables can be marked up in HTML. (What they may really be complaining about is how much width a table takes up, perhaps more than a certain E-reader display natively has.)

Experimenting with the form of the book is one thing, but E-book structure is not something we should make up as we go along. We shouldn’t pretend there are no rules, nor should we import print-book concepts that don’t work in onscreen books. The dominant E-book format of the future, ePub, can benefit from our nearly ten years’ experience building standards-compliant websites.
