From the Top: Document Type Definitions

"From the Top" is a series of articles that I am publishing to explain, concisely, how and why to construct a high-quality, web-standards-compliant head section for a web page. This first article starts at line 2 of the web page, with the Document Type Definition (DTD). Line 2? Tune in same time next week to learn what you might find at line 1.

What is a Document Type Definition (DTD)?

A Document Type Definition provides a set of declarations that define a particular markup syntax. HTML is an application of SGML and XHTML is an application of XML, and each markup language is defined by a DTD. A DTD defines the following "building blocks" of an HTML or XHTML document:

  • Elements: such as head, body and p, and how these may be nested (their parent/child relationships).
  • Attributes: these provide extra information about an element and have an associated value (e.g. title="value").
  • Entities: named references used to represent common text such as &amp;, &lt; and &gt; (&, < and > respectively).
  • PCDATA: Parsed Character Data. This is text that the parser examines for markup: tags are recognized and entity references are expanded.
  • CDATA: Character Data. Here < and & are treated as literal characters rather than as the start of a tag, an entity reference or a character reference. In SGML, content declared as CDATA in the DTD ends at the first </.
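To make the PCDATA/CDATA distinction concrete, here is a small Python sketch. It relies on Python's standard html.parser module, which, like browsers, treats the content of script elements as raw character data; the HTML 4.01 DTD declares that content as CDATA. html.parser is not a DTD-driven SGML parser, so treat this as an illustration of the behaviour rather than a validator. The TextCollector class is a hypothetical helper.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: records start tags and the text found inside each element."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entity references in parsed text are expanded
        self.open_elements = []  # stack of currently open element names
        self.start_tags = []     # every start tag seen, in document order
        self.text = {}           # element name -> concatenated text content

    def handle_starttag(self, tag, attrs):
        self.open_elements.append(tag)
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than overwrite.
        if self.open_elements:
            tag = self.open_elements[-1]
            self.text[tag] = self.text.get(tag, "") + data


markup = "<p>Fish &amp; chips</p><script>if (a < b && c) { go(); }</script>"
collector = TextCollector()
collector.feed(markup)
collector.close()

print(collector.start_tags)      # only p and script: nothing inside script became a tag
print(collector.text["p"])       # PCDATA: &amp; has been expanded to &
print(collector.text["script"])  # CDATA: < and && are kept as literal characters
```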

Why do we need a DTD?

It is important to add a DTD declaration to any HTML or XHTML document to establish that the document is a valid instance of the declared DTD. The declaration tells the World Wide Web Consortium (W3C) validator which version of (X)HTML you are using. Without it, you cannot validate your markup and Cascading Style Sheets (CSS), and you will fail to meet checkpoints in the W3C web accessibility guidelines to boot.

A DTD is also important for the proper rendering and functionality of web documents in browsers. A correct declaration tells the browser to render in standards-compliant mode, so the (X)HTML, CSS and Document Object Model (DOM) code that you write is treated as expected. Leave out the DTD declaration, or declare it incorrectly, and you put the browser into "quirks" mode: the browser assumes you've written invalid markup and code from the 1990s, renders your CSS as though it were an old browser such as Internet Explorer 4, and falls back on legacy, browser-specific DOM behaviour for your JavaScript. In 2006, this is not good.
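The mode switch can be modelled, very loosely, as a lookup on a DOCTYPE declaration's public identifier and system identifier (its URI). The Python sketch below is a hypothetical simplification, not any browser's actual algorithm: real browsers check long lists of legacy public identifiers. It covers just three cases: no DOCTYPE at all, a transitional DOCTYPE with and without its URI, and everything else. Transitional DOCTYPEs get what Gecko-based browsers call "almost standards" mode, which differs from full standards mode chiefly in the vertical sizing of table cells that contain images.

```python
from typing import Optional

# Hypothetical, deliberately tiny subset of the rules browsers use to pick a mode.
HTML401_TRANSITIONAL = "-//W3C//DTD HTML 4.01 Transitional//"
XHTML10_TRANSITIONAL = "-//W3C//DTD XHTML 1.0 Transitional//"


def rendering_mode(public_id: Optional[str], system_id: Optional[str]) -> str:
    """Guess the rendering mode from a DOCTYPE's identifiers.

    public_id is None when the page has no DOCTYPE at all;
    system_id is None when the DOCTYPE omits the URI.
    """
    if public_id is None:
        return "quirks"  # no DOCTYPE: assume 1990s-era markup
    if public_id.startswith(HTML401_TRANSITIONAL):
        # The same public identifier gives different modes with and without the URI.
        return "almost standards" if system_id else "quirks"
    if public_id.startswith(XHTML10_TRANSITIONAL):
        return "almost standards"
    return "standards"  # simplification: many older identifiers really trigger quirks mode


print(rendering_mode(None, None))
print(rendering_mode("-//W3C//DTD HTML 4.01 Transitional//EN", None))
print(rendering_mode("-//W3C//DTD XHTML 1.0 Strict//EN",
                     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"))
```

The HTML 4.01 Transitional case shows why an incomplete declaration matters: dropping the system identifier alone is enough to change the mode.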

How do we use a DTD?

A Document Type Definition is declared in a web page by using the DOCTYPE (Document Type Declaration). The DOCTYPE is case-sensitive and comprises two parts: the public identifier (its name) and the system identifier (a Uniform Resource Identifier, or URI, pointing to the DTD). Although the DOCTYPE contains the URI of the Document Type Definition, browsers do not fetch the DTD from it; they hold an internal list of known DOCTYPEs and use the URI only as a reference. Here's an example of a correct DOCTYPE, showing the public identifier on the first line followed by the system identifier on the second:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
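The two-part structure is regular enough to pick apart mechanically. The Python sketch below uses a hypothetical parse_doctype helper with a deliberately simplified regular expression that only understands the PUBLIC form shown above, with both identifiers in double quotes; it is not a full SGML or XML parser. Because the whitespace between the parts is matched with \s+, the line break between the public and system identifiers makes no difference.

```python
import re
from typing import Optional, Tuple

# Simplified pattern for <!DOCTYPE root PUBLIC "public-id" "system-id">.
# Matching is case-sensitive, so DOCTYPE and PUBLIC must be uppercase (as XML requires).
DOCTYPE_PATTERN = re.compile(
    r'<!DOCTYPE\s+(?P<root>\w+)\s+PUBLIC\s+'
    r'"(?P<public>[^"]*)"\s+"(?P<system>[^"]*)"\s*>'
)


def parse_doctype(markup: str) -> Optional[Tuple[str, str, str]]:
    """Return (root element, public identifier, system identifier), or None."""
    match = DOCTYPE_PATTERN.search(markup)
    if match is None:
        return None
    return match.group("root"), match.group("public"), match.group("system")


example = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'''

root, public_id, system_id = parse_doctype(example)
print(root)       # the root element name
print(public_id)  # the public identifier
print(system_id)  # the system identifier (URI of the DTD)
```

A missing quote around either identifier makes the pattern fail, so parse_doctype returns None for a malformed declaration.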

The full list of recommended DOCTYPEs is available from the W3C website. I will put my head on the block and recommend a subset from that list for your web pages:

  1. <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
  2. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  3. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  4. <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
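Since the advice is to copy a DOCTYPE exactly, one simple check is to compare a page's declaration against the recommended list character for character, ignoring only differences in whitespace such as a line break between the two identifiers. The Python sketch below is a hypothetical identify_doctype helper built on that assumption; the labels are just descriptive shorthand.

```python
from typing import Optional

# The four recommended DOCTYPEs, keyed by a descriptive label.
RECOMMENDED_DOCTYPES = {
    "HTML 4.01 Strict":
        '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">',
    "XHTML 1.0 Transitional":
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
    "XHTML 1.0 Strict":
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    "XHTML 1.1":
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
        '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">',
}


def identify_doctype(declaration: str) -> Optional[str]:
    """Return the label of a recommended DOCTYPE, or None if there is no exact match."""
    # Collapse runs of whitespace (including line breaks) to single spaces.
    normalised = " ".join(declaration.split())
    for label, known in RECOMMENDED_DOCTYPES.items():
        if normalised == known:
            return label
    return None  # unknown, mistyped or differently cased DOCTYPE


print(identify_doctype('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
                       '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'))
```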

As we progress down the list, the rules on which elements and attributes are allowed become more tightly controlled. If you are unable or not comfortable using XHTML, then HTML 4.01 Strict is the DOCTYPE for you; next week's article will probably explain why you might still develop a new website to this DOCTYPE. XHTML 1.0 Transitional is ideal for beginners, as it remains as flexible as HTML 4.01 Transitional but introduces (enforces) well-formedness. XHTML 1.0 Strict, the basis of this website and what I use for new work, is stricter in allowable elements and attributes as it attempts to remove presentational markup from the structure. Purists will probably want to go all the way and use XHTML 1.1.
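Well-formedness, which XHTML introduces, is the XML rule that every element is closed and properly nested. A quick way to see it is to hand fragments to an XML parser, which rejects markup that is not well-formed. The sketch below uses Python's standard xml.etree.ElementTree module; the is_well_formed helper is hypothetical, and well-formedness is only a first step, since it does not check the markup against a DTD the way the W3C validator does.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if an XML parser accepts the fragment (no DTD validation is done)."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Valid in HTML 4.01, but not well-formed XML: <br> is never closed.
html_style = "<p>First line<br>Second line</p>"
# The XHTML equivalent closes the empty element with />.
xhtml_style = "<p>First line<br />Second line</p>"

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))

# Improperly nested (overlapping) elements are also rejected.
print(is_well_formed("<p><em>overlap</p></em>"))
```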

Conclusion

  1. Producing a standards-compliant web page starts right at the top of the code with a correctly formed DOCTYPE, that is, one that comprises the public identifier and the system identifier (a URI), with its case preserved exactly. Either cut and paste the DOCTYPE declaration from the W3C resource page or make sure that your HTML editor outputs it correctly.
  2. A properly declared DOCTYPE allows the browser to render your markup and code in "standards-compliant" mode; in other words, as you would expect it to.
  3. A properly declared DOCTYPE allows a validator to validate your markup and CSS; this is especially important for accessibility purposes.
  4. The only decision to make is "which DOCTYPE do I want to use?"

Next in "From the Top"

Next week I will introduce you to content negotiation, its relevance to DOCTYPEs and why you need to know about it. What you might find at line 1 of an HTML document will also be revealed.

This entry was posted on Monday, January 9th, 2006 at 9:19 am and is filed under Markup, Standards.

17 Responses to “From the Top: Document Type Definitions”

  1. Robin

    Of course, in the long run the doctype isn't the best of solutions: http://tbray.org/ongoing/When/200x/2005/12/15/Drop-the-Doctype

  2. Mike Cherim

    Good write-up Karl. I don't think you missed a thing. My guess is serving it up comes next? Just thinking where line 1 will come into play? ;)

  3. Blair Millen

    This looks like this could be a useful set of articles Karl; ideal place to send those new to web design (or those in need of a refreshing).

  4. Vincent

    This is always useful as a refresher and for anyone who really does not understand why a Doctype should be there and correct implementation.

  5. Dwacon

    Great article. If you expand and add a bit more content, could be foundation for a book. In which case, put me down for a copy!

  6. Thierry

    Smart idea. I'll be back next week...

  7. Joshue O Connor

    Good article. DTDs are a confusing and broad subject. You have managed to condense much of why a correct Doc type is important in a brief and readable article. Well done and I will keep my eye on the site for future reference.

  8. Karl Dawson

    I've changed the title of the series as it's the same name as Drew McLellan's blog.
    I hadn't been there until today, when a couple of people mentioned it :(
    I mailed Drew immediately and changed the series name.

    I have left the title of this article the same due to not wanting to break bookmarks etc.

    Regards, Karl

  9. Colin Steele

    A nice article that makes DocTypes easy to grasp. Bookmarked for future reference.

  10. Lars Gunther

    Good article. I think it's wise to introduce the concept of doctype switching as soon as possible, though.

  11. mike

    Interesting article, but you aren't very consistent in your references - one doesn't 'implement a DTD' at all, rather we create a reference to an existing DTD. This can be rather confusing at times.

  12. Karl Dawson

    Thanks for the grammar check Mike, I've changed the single use of the word "implement" (and its link) to "use" - words that perhaps I interchange too freely. I don't feel that the word "reference" carries enough weight for that header.

    Regards, Karl

  13. Richard Ishida

    Some quick thoughts:

    [1] "A Document Type Definition defines a set of declarations that conform to a particular markup syntax."

    Shouldn't that be 'a Document Type Definition provides a set of declarations that define a particular markup syntax'?

    [2] "It is important to add a DTD declaration to any HTML or XHTML document to establish that the document is an instance of the defined DTD. "

    maybe "establish that the document is a valid instance"

    [3] You may want to add http://www.w3.org/International/articles/serving-xhtml/ to your list of references.

  14. Karl Dawson

    Updated:

    Amendments as suggested by Richard Ishida (W3C) above.

    Regards, Karl

  15. zcorpan

    CDATA — This is also Character Data but the text is not parsed by a parser.

    Really? What is it parsed by, then? Or is it not parsed at all?

    CDATA means that < and & are to be parsed as character literals, not as a start of a tag or a general entity reference or a character reference. In SGML </ closes the element if it is declared as CDATA in the DTD.

  16. Karl Dawson

    zcorpan: Thanks for clarifying the treatment of CDATA. I summed that up way too short - and badly as a result.

    Regards, Karl

  17. CSS Mastery Book Review | That Standards Guy

    [...] The foundation chapter provides a clear and easy to understand introduction to meaningful markup techniques for CSS “hooks” — divs, spans, ids and classes as well as discussion on DOCTYPEs, browser modes and validation before diving in to CSS selector types, the cascade and specificity. The chapter finishes with discussion on how best to organise your stylesheets - no, don’t just lump it all together in a single file [...]