From the Top: Defining Content Language

The opening html tag, when expanded with a few attributes, is a real boon for the accessibility of your web pages. In this, the third article in my “From the Top” series, I’ll introduce each of those attributes and explain the benefits of using them.

The lang Attribute

Declaring the language of a web page occurs at two levels. Firstly, the primary language of a document is declared either in the Hypertext Transfer Protocol (HTTP) Content-Language header or as a content-language meta tag, and may be used by search engines to return only web pages in a specific language. Whether you have access to server settings, or are able to perform content negotiation, will affect your decision on which method to use.
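
As a rough sketch of the meta tag option, the declaration sits in the head of the document. The page title is a placeholder of my own, and the equivalent HTTP header is shown only as a comment because how you set it depends on your server.

```html
<!-- Equivalent HTTP response header, if you can configure your server:
     Content-Language: en -->
<head>
  <!-- Declares English as the primary language of the document -->
  <meta http-equiv="Content-Language" content="en">
  <title>Example page</title>
</head>
```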

The second, more specific level defines the default text-processing language that a specific range of text is written in. For Hypertext Markup Language (HTML) this is achieved using the lang attribute, and for Extensible Markup Language (XML), in our case Extensible Hypertext Markup Language (XHTML), using xml:lang. Where both are present (i.e. in backwards compatibility mode), xml:lang takes precedence.

For both flavours of markup, the text-processing (or natural) language is inherited along the document hierarchy, so the html tag is the ideal place to apply your main language to the entire document. The default text-processing language can be changed further down the hierarchy by applying the lang attribute to a more specific element.
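
To illustrate the inheritance, here is a minimal HTML sketch. The sample sentences are my own invention; the point is only that the html tag sets the default and more specific elements override it.

```html
<html lang="en">
  <head>
    <title>Example page</title>
  </head>
  <body>
    <!-- Inherits lang="en" from the html element -->
    <p>This paragraph is treated as English.</p>

    <!-- Only the span is treated as French; the rest stays English -->
    <p>She has a certain <span lang="fr">je ne sais quoi</span>.</p>

    <!-- Everything inside the blockquote is treated as German -->
    <blockquote lang="de">
      <p>Dieser Absatz ist auf Deutsch.</p>
    </blockquote>
  </body>
</html>
```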

Language Tags and Localisation

The value of the lang (or xml:lang) attribute is referred to as a language tag. It comprises a primary subtag, optionally followed by further subtags, each separated by a hyphen. The primary subtag is a two- or three-letter language code such as en for English, de for German and fr for French. Where both a two-letter and a three-letter code exist for the same language, the two-letter code should be used. By including a further subtag, the natural language of the document can be localised for dialect or region, so en-GB would identify British English text and fr-CA would indicate content written in Canadian French. Subtags are case-insensitive. There are special-case primary subtags of i- and x-, but these will be outside your normal usage (unless you have a killer site for Klingons, that is), so I will leave those for you to look at another day.
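
A short sketch of language tags with a region subtag follows. The sentences are illustrative only, chosen because the spelling or vocabulary differs between regions.

```html
<!-- British English -->
<p lang="en-GB">The colour of the lorry was grey.</p>

<!-- American English -->
<p lang="en-US">The color of the truck was gray.</p>

<!-- Canadian French -->
<p lang="fr-CA">Bienvenue au Québec.</p>

<!-- Subtags are case-insensitive, so this is equivalent to en-GB -->
<p lang="EN-gb">The colour of the lorry was grey.</p>
```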

Reading Order

It is easy to forget that whereas “western” languages are read from left to right, there are also major languages such as Arabic and Hebrew that are read in the opposite direction. The reading order is not necessarily inherited from the chosen language tag, so we will add the dir attribute to the html tag and assign it the value “ltr”. For languages read from right to left, the attribute value would be “rtl”. These are the only two options; imagine the fun to be had with “ttb” (top to bottom) for authentic Japanese writing (and yes, “rtl” also).
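
As a sketch of the right-to-left case, an Arabic page might open as shown here. The content is a simple greeting of my own choosing, with one English paragraph switched back to left to right.

```html
<html lang="ar" dir="rtl">
  <head>
    <title>مثال</title>
  </head>
  <body>
    <!-- Inherits Arabic and right-to-left from the html element -->
    <p>مرحبا بالعالم</p>

    <!-- Overrides both the language and the direction for this paragraph -->
    <p lang="en" dir="ltr">Hello, world</p>
  </body>
</html>
```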

Why it’s Important to Define the Language

Declaring the text-processing, or “natural” language of a page is beneficial for many purposes:

  • To assist screen readers and braille translators.
  • To meet World Wide Web Consortium (W3C) Web Accessibility Initiative (WAI) guidelines – specifically checkpoints 4.1 and 4.3.
  • To meet legislative requirements, for example the Disability Discrimination Act (DDA) in the UK.
  • To provide authoring tools with the ability to check spelling and grammar.
  • To identify the correct language of a section of text for translation tools.
  • To style information in a specified language using the Cascading Style Sheets (CSS) :lang pseudo class.
  • To filter search engine results based on the user’s language preference.
  • To assist other people or devices in parsing the text of the document with XSL or some other scripting.
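
The CSS point above can be sketched as follows, assuming a browser that supports the :lang pseudo class. The italic style and the sample sentence are arbitrary choices for illustration.

```html
<style type="text/css">
  /* Matches any element whose language is French, whether set
     directly with lang="fr" or inherited from an ancestor */
  :lang(fr) { font-style: italic; }
</style>

<p lang="en">She said <span lang="fr">au revoir</span> and left.</p>
```

Because :lang matches on the inherited language, the rule also applies to elements nested inside a French section without needing their own lang attribute.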

The xmlns Attribute

If your markup is XHTML, another attribute you must include is the xmlns declaration for the XHTML namespace. Remember that XHTML is a reformulation of HTML as an application of XML. An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, that are used in XML documents as element types and attribute names. You need to declare the namespace so that a user agent knows which elements belong to which language. The namespace is declared using the attribute xmlns with the URI as its value, which for our purposes is http://www.w3.org/1999/xhtml.

Conclusion

To maximise the universal accessibility of our pages, we should always include language information in them. We can identify the natural language of the content by using the lang attribute and/or the xml:lang attribute for XHTML, and must always include the XML namespace if using XHTML. Additionally, we can specify the primary language of the document using HTTP headers or the content-language meta tag. Examples of the opening html tag include:

For XHTML 1.0 in backwards compatibility mode:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">

For XHTML as application/xhtml+xml:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr">

For HTML 4:

<html lang="en" dir="ltr">

Next in “From the Top”

Next week I’ll cover the title tag and provide a few tips from around the Internet on writing effective page titles.

This entry was posted in Markup, Standards.

6 Responses to From the Top: Defining Content Language

  1. Lachlan Hunt says:

    It’s important to understand that there is a distinct difference between the meaning of the Content-Language HTTP header and the lang and xml:lang attributes. The HTTP header indicates the language of the intended audience, whereas the attributes indicate the actual language of the content. In most cases, the language of the document and the intended audience language will be the same, but not always. For example, a tutorial comprising predominantly French text, intended to teach an English-speaking reader, may contain lang="fr" but be served with Content-Language: en.

  2. Lachlan Hunt says:

    One more thing: it doesn’t make any sense to discuss xmlns in an article about natural languages. Can you explain why that was included?

  3. Karl Dawson says:

    Thanks, Lachlan. The article is about writing a complete tag. If you are writing a document in XHTML then you need to include the XML namespace too. I added it for completeness.

    I am also trying to protect novice or busy developers from the really technical details by pitching a concise summary of “what”, “why” and “how”. The references can then lead the keen into the technical background.

  4. A couple of comments, in addition to Lachlan’s:

    The dir="ltr" on the html tag is redundant. By default an (X)HTML document is “ltr” unless otherwise specified.

    An interesting case for language tags is Unicode documents containing hanzi/kanji/hanja (CJK ideographs). Simplified Chinese, Traditional Chinese, Japanese and Korean can share a codepoint, but have different glyph variations for the character represented by that codepoint. If there is no appropriate font declaration in the style sheet, and no language tagging, then web browsers will use a default language for rendering. This may result in an incorrect font being used, e.g. a Japanese font used to render Simplified Chinese text. Appropriate language tagging of CJK text is important in order for culturally appropriate glyphs to be used.

  5. Mike Cherim says:

    The use of the letter direction isn’t always needed as it can be implied by DTD, charset, language, etc. But I don’t feel its use is a boo-boo per se. Being redundant isn’t bad until its redundancy interferes with something. In this case adding it to this article is good “general” advice, especially since this article isn’t confined to XHTML alone (or it is and I missed that part).

  6. Karl Dawson says:

    Andrew / Mike,

    Yes, I can’t remember the source of my opinion on including dir="ltr". It goes back at least a year, I think, and it must have been based on something more than redundancy of function, but if it’s not in my del.icio.us I’m doomed to forget. In the research around this article I didn’t even discover the default behaviour info; obviously that would have changed my mind immediately. Thanks for posting about it. I’m always ready to modify an opinion I may hold about something.

    Mike: No, you didn’t miss anything – this article caters for both markup languages.