More About Custom DTDs – A List Apart

In a previous issue of A List Apart, Peter-Paul Koch discussed adding non-standard attributes to his markup to create JavaScript triggers. J. David Eisenberg noted that since these attributes were not part of XHTML, the W3C Markup Validator was likely to reject documents using them, and wrote an article on the topic of custom DTDs and validation.

This article follows up on those writings by discussing the need for custom DTDs: why creating a custom DTD for the sole purpose of validation is a mistake, and in which cases it does make sense to create and use one. For those cases, it also presents methods for creating clean custom DTDs and avoiding hacks.

Of poetry and markup languages

Instead of talking about custom DTDs, let’s just talk about poetry for a moment. Poetry and web documents are very similar in their language constraints, and poetry also addresses the topics of love, flowers and starry skies, something DTDs seldom do. So, poetry wins.

An impetuous poet may, from time to time, commit a spelling mistake or a grammatical blunder in even the most perfect ode to love. A dictionary or spell checker can therefore be a useful tool, even if it won’t fix a lame rhyme or make a bland metaphor shine; talent, inspiration, and a few other tools will help with that.

Can a poet decide to invent new words, or develop new grammatical constructs? Certainly! The new language would not be English, but that’s what poetic license is all about, isn’t it? And if in the end the poem uses a language so remote from English that nobody understands it, who cares? We can still call it art…

Unfortunately, browsers, search engine indexers and other agents on the web have a very limited understanding of art and poetic license. Web content, unlike poetry, benefits from the use of a lingua franca. The interoperability derived from using a common language is, after all, the goal behind the existence of “web standards.”

Web standards? That’s precisely why I want to validate!

And you are right, of course! Unless, of course, you treat both validation and web standards as some kind of sacred cow, almost forgetting the meaning behind their importance.

This is probably the worst mistake that advocates of web standards can make: to fight for an abstract, arcane concept of standards and to consider validation for the sake of validity a goal in itself.

The power of a common language

Babel. The mythical tower teaches us that there is great social value in a widely used language, and that the power of these shared semantics is maximized when the common language is used properly and consistently by all parties involved in a given communication.

In the web paradigm, where communication is seldom one-to-one but mostly one-to-many, the above logic is even more important, and translates to: “The proper use of the shared semantics of a standard markup language empowers me as a web author, and because I know that validation is one practical way of getting close to that goal, I validate my web documents.”

It is important to remain aware, however, that validity alone is not a guarantee of compliance, and is even further from being a guarantee of quality.

Many normative ideas of the HTML specs can’t be expressed when it comes to DTD constraints. The content material of attributes, for example, are usually outlined within the DTD as being CDATA (roughly equal to “any form of textual content”) despite the fact that the specification defines a really clear and strict syntax for them. Nothing in a DTD, for example, can implement that the content material of the lang attribute should be a language code from RFC1766…

Not only is validity no magic key to quality, there isn’t even such a thing as absolute, generic “validity.” A document can claim to use a specific DTD and be valid with regard to that DTD. This is what is often summarized as “a valid document.”

The above does not deprecate the process of validation. It does, however, remind us of a crucial point: validation is only one step in checking the correctness of the language used in markup documents, and should never become a goal in itself. That is why the idea of creating a custom DTD for the sole purpose of getting documents to pass validation may be well-motivated, but it is ultimately a deluded endeavor.

When custom DTDs make sense

There are, of course, real cases in which the creation and use of a custom DTD for a custom markup language makes sense. Taking the existing XHTML and enriching it with new elements can be an interesting and useful practice, for instance, to manage an internal information database in a specific field.

The situations in which custom markup languages are needed typically involve relatively specific, closed environments. These environments can benefit from extending beyond the semantics provided by the standard languages without worrying about altering the semantics of a global language or about the lack of interoperability with generic agents. If done properly, the extended or proprietary language can fairly easily be transformed into “regular” XHTML.

When such a case arises, downloading and modifying the standard DTD until it fits the requirements is not an elegant method; it is even discouraged by the HTML 4.01 specification:

For reasons of interoperability, authors should not “extend” HTML through the available SGML mechanisms (e.g., extending the DTD, adding a new set of entity definitions, etc.).

A much more elegant and efficient method exists: the Modularization of XHTML provides a clean framework for creating semantically rich languages. It is, for instance, the method used to build MathML2 (DTD) and the WAP Forum’s XHTML Mobile Profile (DTD).

Creating a custom markup language with the modularization of XHTML

Creating a custom DTD using the XHTML Modularization method is not very complicated, but it is not trivial either. A step-by-step guide would require a full article, so we will not go into details here, but here are a few pointers and reminders:

  • The instructions in the Modularization of XHTML specification are a good place to learn the practicalities of modular DTD creation, but the tutorial written by HTML Working Group member Shane McCarron is an even better place to start.
  • It is usually not necessary to start from scratch, which can be tedious. In most cases, extending XHTML will be relatively easy, and offers the benefit of building on the already defined semantics of standard XHTML. If you don’t need the whole of XHTML as a starting point, chances are that picking a few modules and leaving others out will give you a compact, yet useful, basis for the new language.
  • When creating new elements, avoid overloading the html namespace. Creating the new elements in a new XML namespace, as recommended by the specification, is only marginally more complicated, and ensures that semantics are clearly defined and distinguished.

Transforming our custom language into plain XHTML

We are now able to create documents using our new XHTML Host Language (the formal name for a language built using the modularization of XHTML) and its extended semantics. In many cases the new language will be used only in a closed environment, but at some point we will want to put our data on the whole, wild World Wide Web.

As discussed above, it would not be a good idea to put documents authored in our proprietary language directly on the web, since few or no user agents would understand its peculiar semantics. But since both our custom language and XHTML are XML-based, we can simply use XSLT to transform a document written in the former into a document in the latter.

As a simple example, consider the case where we have extended XHTML with a new <poetryml:writer> element, using the uri and name attributes, and an empty <poetryml:pause /> element marking speech pauses in a poetry verse.

The poetryml2xhtml.xsl stylesheet could then be used to transform the PoetryML markup <poetryml:writer name="foo" uri="http://people.example.org/misterfoo" /> into XHTML: <address class="writer"><a href="http://people.example.org/misterfoo">foo</a></address>.

The stylesheet is rather long, but the transformation mentioned above is a matter of a few lines:

<xsl:template match="poetryml:writer">
    <address class="writer">
        <a href="{@uri}"><xsl:value-of select="@name" /></a>
    </address>
</xsl:template>
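For readers who want to experiment without an XSLT processor, here is the same writer-to-address mapping re-expressed in Python with the standard library’s ElementTree. This is a substitute for XSLT, not the article’s poetryml2xhtml.xsl: it recursively copies the tree, turns each poetryml:writer into an address element wrapping a link, and, as an arbitrary choice of ours that the article does not specify, turns each poetryml:pause into an empty span with class="pause". The namespace URI and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
POETRYML_NS = "http://example.org/ns/poetryml"  # hypothetical namespace URI


def xhtml(local: str) -> str:
    """Qualified ElementTree name for an XHTML element."""
    return f"{{{XHTML_NS}}}{local}"


def to_xhtml(elem: ET.Element) -> ET.Element:
    """Return a transformed copy of elem in which PoetryML elements become XHTML."""
    if elem.tag == f"{{{POETRYML_NS}}}writer":
        # Mirrors the XSLT template: address class="writer" wrapping a link to @uri.
        out = ET.Element(xhtml("address"), {"class": "writer"})
        link = ET.SubElement(out, xhtml("a"), {"href": elem.get("uri", "")})
        link.text = elem.get("name", "")
    elif elem.tag == f"{{{POETRYML_NS}}}pause":
        # Assumed mapping: an empty span that a stylesheet could render as a pause.
        out = ET.Element(xhtml("span"), {"class": "pause"})
    else:
        # Any other element is copied with its attributes, text and children.
        out = ET.Element(elem.tag, dict(elem.attrib))
        out.text = elem.text
        for child in elem:
            out.append(to_xhtml(child))
    out.tail = elem.tail  # keep the text that follows the element
    return out


ET.register_namespace("", XHTML_NS)  # serialize XHTML as the default namespace
source = f"""<html xmlns="{XHTML_NS}" xmlns:poetryml="{POETRYML_NS}"><body>
<p>Roses are red,<poetryml:pause />violets are blue</p>
<poetryml:writer name="foo" uri="http://people.example.org/misterfoo" />
</body></html>"""

result = to_xhtml(ET.fromstring(source))
print(ET.tostring(result, encoding="unicode"))
```

Building a fresh tree rather than editing in place keeps the sketch close to the XSLT model, where the output document is always a new tree produced from the input.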

Is this a loss of semantics? From a PoetryML standpoint, it is, since our poetryml:writer element would have been very precisely defined; but seen from a generic web user agent, address has a much clearer meaning than an unknown element in a foreign namespace.

Custom languages created with the method above are obviously not global standards, but within a given environment, for instance the information database of an organization, the specific semantics of the proprietary language are well understood by the range of tools for which they have been designed.

The HTML standard, unlike these proprietary languages, is developed through a process ensuring, among other benefits, the widest possible interoperability across web browsers. Building a proprietary language upon this standard is a perfectly acceptable practice as long as you are aware that the proprietary language loses these benefits. A document written using such a custom DTD may be validated against this DTD, but it will not be valid XHTML 1.0 Strict, HTML 4.01 Transitional, or any other version of the HTML standard. It will be valid… something else.

Custom DTDs can be a very useful tool to enrich existing markup languages or to create entirely new ones. One always has to keep in mind that they are tantamount to creating a new language, and that proprietary languages are best kept in closed environments where they can be taught to a limited set of agents and tools, and NOT unleashed in the wild, turning the web into a modern version of the Tower of Babel.