The Trouble With EM ’n EN (and Other Shady Characters) – A List Apart

A note from the editors: This article, while brilliant for its time, is now obsolete.

The printing press gave us type that was clearer and easier to read than that produced from a typewriter, because the typesetter had extra tools at his disposal—and knew how to use them. The web has cost us some of these tools.


Lack of tools and knowledge

There are two problems here. The first is that until HTML 4 came along, the web was missing almost all of these tools (it’s still missing many important ones).

But the larger problem is that, now that they’re available, almost no one publishing on the web today knows how to use them—or often even knows of their existence.

Read this, though, and you’ll understand the answers to both problems far better than almost anyone else, including your English teachers.

Most HTML References Are Wrong

I’ve lost count of all the books, articles, and web sites that claim an em dash is “&#151;”—but they’re all wrong. The entire range from &#128; through &#159; consists of invalid characters, and consequently shouldn’t be used.

Since Netscape 4.x browsers don’t understand many of the named entity references (such as &rsquo; for a right single quote), I’m not going to mention any of them here (though they’ve been used by A List Apart, bless its little heart).

The most reliable way to insert special characters by far is to use decimal entity notation. Some characters have four methods of reference: named, decimal, hexadecimal, and UTF-8 (Unicode), but only the decimal form is reliable across browsers and platforms. Use the others if you wish, but only if you want to be bombarded by Netscape 4.x users complaining about your “corrupted” pages.
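As a rough illustration of the decimal-notation advice, here is a minimal Python sketch. The function name `to_decimal_entities` is made up for this example and does not come from any tool mentioned in this article. It rewrites every character above 127 as a decimal entity reference and leaves plain ASCII alone.

```python
def to_decimal_entities(text: str) -> str:
    """Replace each character above code point 127 with a decimal reference.

    Hypothetical helper for illustration only; ASCII characters pass through.
    """
    return "".join(
        ch if ord(ch) < 128 else f"&#{ord(ch)};"
        for ch in text
    )


if __name__ == "__main__":
    # \u2014 is the em dash and \u2019 is the right single quote.
    print(to_decimal_entities("Stop\u2014don\u2019t"))
```

Python's built-in `text.encode("ascii", "xmlcharrefreplace")` should give the same result as bytes; the explicit loop is shown only to make the rule visible.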

UTF-8 encoding to the rescue—almost

The only way to insert these characters (and any character beyond 127) properly without using entity codes is to use the UTF-8 character encoding (the default for XHTML and XML documents).

Unfortunately, very few text editors support this, and many more browsers choke on UTF-8 characters than do on named entities, so don’t use them unless you don’t give a hoot about Netscape 4 users.

(FrontPage and Dreamweaver don’t insert most of these characters properly, so don’t rely on their “insert symbol” tools either.)

Hyphens are Not Dashes

Stop! Go back and re-read the subhead above—at least 2–3 times—then let it sink in before continuing.

The sentence above illustrates the proper use of the hyphen and the two main types of dashes. They are not the same, and must not be confused with each other. In some fancy fonts the difference is more than just the width—hyphens have a distinct serif. If you don’t know the rules already, let’s review them. First, though, a definition:

An “em” is a unit of measurement defined as the point size of the font—12 point type uses a 12 point “em.” An “en” is one-half of an “em.”

Though some of the finer points in the rules are complex, their basic purposes are clear-cut and their misuse easily identifiable. First, neither an em dash nor an en dash should be confused with the hyphen (-), which is used to join compound words together.

The correct use of em and en

The em dash (&#8212;) is used to indicate a sudden break in thought (“I was thinking about writing a—what time did you say the movie started?”), a parenthetical statement that deserves more attention than parentheses indicate, or instead of a colon or semicolon to link clauses. It is also used to indicate an open range, such as from a given date with no end yet (as in “Peter Sheerin [1969—] authored this document.”), or vague dates (as a stand-in for the last two digits of a four-digit year).

Two adjacent em dashes (a 2-em dash) are used to indicate missing letters in a word (“I just don’t f——ing care about 3.0 browsers”).

Three adjacent em dashes (a 3-em dash) are used to substitute for the author’s name when a repeated series of works is presented in a bibliography, as well as to indicate an entire missing word in the text.

The en dash (&#8211;) is used to indicate a range of just about anything with numbers, including dates, numbers, game scores, and pages in any sort of document.

It is also used instead of the word “to” or a hyphen to indicate a connection between things, including geographic references (like the Mason–Dixon Line) and routes (such as the New York–Boston commuter train).

It’s used to hyphenate compounds of compounds, where at least one pair is already hyphenated (as in “Netscape 6.1 is an Open-Source–based browser.”). The Chicago Manual of Style also states that it should be used “Where one of the components of a compound adjective contains more than one word,” instead of a hyphen (as in “Netscape 6.1 is an Open Source–based browser”). Both of these rules are for clarity in indicating exactly what is being modified by the compound.

Other sources also specify the use of an en dash when referring to joint authors, as in the “Bose–Einstein” paper. Some also prefer it to a hyphen when text is set in all capital letters.

Some typographers prefer to use an en dash surrounded by full spaces instead of an em dash. Others prefer to insert hair spaces on either side of the em dash, but this is problematic with some web browsers (see the section on spaces for more detail).
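The range rule for the en dash is mechanical enough to sketch in code. The Python below is only a rough heuristic under a loud assumption: that any hyphen-minus sitting directly between two digits is meant as a range. The names `EN_DASH` and `en_dash_ranges` are invented for this example. It will also rewrite phone numbers, ISO-style dates, and subtraction, so treat it as a starting point for a human proofreader, not a replacement for one.

```python
import re

# Decimal entity reference for the en dash.
EN_DASH = "&#8211;"

# A hyphen-minus with a digit immediately on each side, e.g. "12-15".
_DIGIT_RANGE = re.compile(r"(?<=\d)-(?=\d)")


def en_dash_ranges(text: str) -> str:
    """Swap the hyphen-minus in digit-to-digit ranges for an en dash reference.

    Heuristic only: it cannot tell a range from a phone number or a subtraction.
    """
    return _DIGIT_RANGE.sub(EN_DASH, text)


if __name__ == "__main__":
    print(en_dash_ranges("See pages 12-15 of the well-known 1969-2001 edition."))
```

Hyphenated compounds such as “well-known” are left untouched because no digits surround the hyphen.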

That hyphen you can insert with the key next to the zero on your keyboard is an ambiguous character suffering from an identity crisis. It can’t decide if it’s a hyphen, a minus, or an en dash—in fact, the Unicode specification describes it as “hyphen-minus” and defines very specific replacements for each of its personalities.

Use it when you need to insert a hyphen, but never for a minus (&#8722;) or a dash, since it doesn’t have the correct width for either, or the vertical position for the latter (compare “1+4-2=3” to “1+4−2=3”).

The soft hyphen (&#173;, a.k.a. “discretionary hyphen” and “optional hyphen”) is to be used for one purpose only—to indicate where a word may be broken at the end of a line. Otherwise, it is to remain invisible and not affect the appearance of the word.

Some browsers display it no matter where it falls, but this is not the correct behavior. Others in the past have recommended against its use because its behavior was not well-defined, but the HTML 4.01 spec makes its use and behavior clear and unambiguous.

Three other hyphen characters exist in Unicode, but are unfortunately not defined in the HTML entity set (although they should be):

  1. The non-breaking hyphen (&#8209;, not in HTML) does just what its name implies.
  2. The hyphen character (&#8208;, not in HTML) is meant to be used in place of the hyphen-minus when a hyphen is exactly the desired character.
  3. The hyphenation point (&#8231;, not in HTML) is that bullet-like character you find in some dictionaries to separate syllables. That’s its only use, but if you’re creating an online dictionary, using it will make your entries look more professional.

There are fifteen space characters defined in Unicode. That’s right, fifteen. Most aren’t defined in HTML, and you can ignore many of these. Though many should be part of the web, let’s deal with the ones that are defined first.

A standard space (a.k.a. “word space”) is your trusty old friend coming in at &#32;.

The non-breaking space (&#160; or &nbsp;), commonly found in otherwise-empty table cells, is safely referred to by either its numeric or named entity reference in all 3.0-level and higher browsers.

I said earlier that some prefer to surround single em dashes with a hair space (&#8202;, not in HTML), which is between one-tenth and one-sixteenth of an em, but it isn’t defined in HTML 4.01.

The thin space (&#8201;) is the most similar space character which is defined in HTML. It is supposed to be one-fifth of an em in width, but is almost always rendered much wider. The only font I’ve found with a correctly designed hair space is Arial Unicode MS, and it renders both with almost exactly the same width.

Bottom line: Unless you can be sure that your target audience has Arial Unicode MS installed, neither of these spaces has anything close to the desired and correct appearance.

The last two spaces in the HTML repertoire are the en space (&#8194;) and the em space (&#8195;). Can you guess how wide each is?

Both are visibly wider than a normal space, and once again, Arial Unicode MS is the only mainstream font that includes both, even though they’re part of the official HTML 4.01 specification.
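If you’d like to see what those fractions of an em come to in actual points, the arithmetic is easy to sketch. This Python snippet simply applies the ratios given in this article (em equals the point size, en is half of that, thin space is one-fifth, hair space is one-tenth to one-sixteenth). The function name `space_widths` and the dictionary labels are invented for the example, and the results are nominal widths, not what any particular font or browser will render.

```python
from fractions import Fraction


def space_widths(point_size: int) -> dict:
    """Nominal widths, in points, of the spaces discussed for a given type size."""
    em = Fraction(point_size)  # an em equals the point size of the font
    return {
        "em space": em,
        "en space": em / 2,                # one-half of an em
        "thin space": em / 5,              # one-fifth of an em
        "hair space (widest)": em / 10,    # one-tenth of an em
        "hair space (narrowest)": em / 16, # one-sixteenth of an em
    }


if __name__ == "__main__":
    for name, width in space_widths(12).items():
        print(f"{name}: {float(width):.2f} pt")
```

`Fraction` keeps the ratios exact, so 12 point type gives a thin space of exactly 12/5 of a point instead of a rounded float.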

That leaves the spaces defined by Unicode but not HTML. Use them at your own risk:

Sometimes in typesetting you need to provide a hint that the computer can break a long word in a particular place without any other interpretation or visible indication. This is the zero width space (&#8203;, not in HTML). It is not defined in HTML 4.01, and it doesn’t work in IE unless you’re using Arial Unicode MS.

Its evil twin is the zero width no-break space (&#65279;, not in HTML), which can (theoretically) be used to keep a word from breaking at that point. Surprisingly, though this isn’t defined in HTML either, it works as-designed in IE6/Win.

I’m going to make life easy on you here (well, mostly). There are actually fourteen quotation characters. (Eighteen if you count the large, bold versions in the Dingbats section of Unicode.) I’m going to pretend that most of them don’t exist—you’ll only need them for foreign languages anyway.

Newspapers of (broken) record

Methods for correctly inserting curly quotes in web pages are not well understood. Do not, under any circumstances, use &#147; through &#149; for curly quotes.

Don’t ever trust the 8-bit representations to be correct, because they almost certainly won’t be. The biggest problem is that many web browsers assume that 8-bit characters refer to the native character system, translating your curly quotes or dashes into Greek or accented Latin characters on other platforms. These same browsers always get the numeric entity references right.
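If you inherit pages full of references in the invalid 128 to 159 range, a small script can help with the cleanup. The sketch below rests on one assumption that you should check against your own pages: that the stray numbers are really Windows-1252 byte values (which is where 151 for an em dash comes from). The function name `repair_reference` is made up for this example. It uses Python's standard `cp1252` codec to find the intended character and returns the proper decimal reference.

```python
def repair_reference(code: int) -> str:
    """Return a decimal reference for `code`, remapping the invalid 128-159 range.

    Assumes out-of-range numbers are Windows-1252 byte values (an assumption,
    not something a page can tell you by itself).
    """
    if 128 <= code <= 159:
        try:
            # Read the number as a single Windows-1252 byte to recover the
            # character the author most likely meant.
            code = ord(bytes([code]).decode("cp1252"))
        except UnicodeDecodeError:
            # A few values in this range are unassigned in Windows-1252.
            raise ValueError(f"no Windows-1252 character for {code}") from None
    return f"&#{code};"


if __name__ == "__main__":
    for bad in (145, 146, 147, 148, 150, 151):
        print(bad, "->", repair_reference(bad))
```

Numbers outside the invalid range pass through unchanged, so the function is safe to run over every numeric reference in a page.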

And don’t ever try to ``fake it´´ with doubled-up grave accents and straight single quotes or acute accents, as many of the ``best-known newspapers" do.

Do use:

  1. &#8216; for an opening single quote (Ctrl + ` ` in Word—that’s two grave accents—that character on the tilde key).
  2. &#8217; for a closing single quote (or an apostrophe) (Ctrl + ' ' in Word).
  3. &#8220; for an opening double quote (Ctrl + ` " in Word).
  4. &#8221; for a closing double quote (Ctrl + ' " in Word).

I’ll bet you didn’t know this about HTML—the <q> and <blockquote> elements are designed to have quote marks automatically inserted in the appropriate locations. No current browser does this by default, however, and even those that do when confronted with the appropriate style sheet markup (as detailed in CSS) get it wrong, especially with curly quotes.

HTML 4.01 mandates that this occur for the <q> element, and advises authors against placing quotes manually, since this could result in double quotes.

My recommendation: Avoid the use of <q> entirely until this is widely supported, and either do the same with <blockquote> (possible because indented paragraphs are implied to be quotations by convention in the English language), or place the automatic quote code in your style sheet and tolerate the fact that some browsers will produce garbage. (Or, just define all of these quotes to be plain-old straight quotes, and avoid most of the problems.)

This is such a shame, since CSS can automatically apply one of the forgotten rules about quotes: When quoting multiple paragraphs, each one starts with an opening quote, but only the last paragraph has a closing quote. Oh, well…
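Until browsers catch up, the multi-paragraph rule can be applied by hand or by script before the text ever reaches the page. Here is a minimal Python sketch of the rule as stated: every paragraph gets an opening quote, and only the last one gets a closing quote. The names `quote_paragraphs`, `OPEN_QUOTE`, and `CLOSE_QUOTE` are invented for this example, and it assumes the paragraphs arrive as plain strings without quote marks of their own.

```python
from typing import List

OPEN_QUOTE = "&#8220;"   # opening double quote
CLOSE_QUOTE = "&#8221;"  # closing double quote


def quote_paragraphs(paragraphs: List[str]) -> List[str]:
    """Open every paragraph with a quote mark; close only the last one."""
    quoted = [OPEN_QUOTE + text for text in paragraphs]
    if quoted:
        quoted[-1] += CLOSE_QUOTE
    return quoted


if __name__ == "__main__":
    for line in quote_paragraphs(["First paragraph.", "Second paragraph."]):
        print(line)
```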

Many people (most, from what I’ve observed) believe that curly single opening and double opening quotes are the correct symbols for feet and inches. If you are one of these people, put out your hand so I can slap it with a ruler.

The correct symbols to use are prime and double prime. They look similar to curly quotes in a few fonts, but are usually far more distinct. They never, ever look like commas. They are usually set at a slight angle of 75–80 degrees, and are also usually tapered from the top to the bottom.

A single prime is used to represent feet or minutes (&#8242;), while a double prime is used to indicate inches or seconds (&#8243;). (I won’t make you learn about the triple prime and the three reversed versions of these characters.)
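To keep curly quotes out of your measurements, it can help to generate them instead of typing them. This small Python sketch formats a length in inches as feet and inches using the prime and double prime references. The function name `feet_and_inches` and the constants are hypothetical, and the sketch assumes a non-negative whole number of inches.

```python
PRIME = "&#8242;"         # feet or minutes
DOUBLE_PRIME = "&#8243;"  # inches or seconds


def feet_and_inches(total_inches: int) -> str:
    """Format a non-negative inch count as feet and inches with prime marks."""
    feet, inches = divmod(total_inches, 12)
    return f"{feet}{PRIME} {inches}{DOUBLE_PRIME}"


if __name__ == "__main__":
    print(feet_and_inches(74))
```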

Finally, here are some fine points on the use of the ellipsis (&#8230;):

  1. An ellipsis is most often used to indicate one or more missing words in a quotation. It is also used to indicate when a thought or quotation trails off.
  2. When it occurs at the end of a sentence, it should be treated in one of three ways, depending on usage:
    1. If the ellipsis is being used to indicate one or more missing words in the sentence, then it should be followed by a period.
    2. If it indicates one or more missing sentences, then it should appear after the period of the preceding sentence, and with a space on either side.
    3. But if it indicates that the thought or quote is just trailing off at the end of a sentence, then only the ellipsis is used, to make clear that no words from a quotation have been omitted, as would be the case if the additional period were there.

There are more shady characters lurking in the background, but the ones described above are the most common and important.

In conclusion, I could tell you about all the places where you should be using the one dot leader (&#8228;, and not in HTML 4.01) instead of the plain old period, but that would just be too cruel. Besides, all you really need to know is the period’s official Unicode name—full stop.

  1. William Strunk: Elements of Style
  2. W3C HTML 4.01 entity definitions
  3. Unicode Consortium
  4. International System of Units
  5. NASA’s A Handbook for Technical Writers and Editors is of great help even if you don’t write about technical subjects.
  6. Jukka Korpela provides a great deal of detail on special characters as part of a larger series on characters, and a buttload of additional web authoring–related information.
  7. Got a detailed question about which characters allow line breaks to occur? An update to the Unicode specification has all the answers.
