That Standards Guy



About

That Standards Guy is the online persona of Karl Dawson, a web developer living and working in Ipswich, England.

I'm a member of the Guild of Accessible Web Designers and the Web Standards Group and team member at Accessites—an awards site to recognise accessible and usable websites.

I specialise as a front-end developer and worry about the minutiae of semantic (X)HTML and CSS, accessibility, microformats, typographic rhythm and grid design. I also care about the user experience and remind myself constantly of visitor site goals when working with clients and their aims.

From the Top: Defining Content Language

The opening html tag, when expanded with a few attributes, is a real boon for increasing the accessibility of your web pages. In this, the third article in my “From the Top” series, I’ll introduce each of those attributes and explain the benefits of their use.

The lang Attribute

Declaring the language of a web page occurs at two levels. Firstly, the primary language of a document may be used by search engines to return only web pages in a specific language, and is declared either in the Hypertext Transfer Protocol (HTTP) header or as a content-language meta tag. Whether you have access to server settings, or the ability to perform content negotiation, will affect your decision on which method to use.
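As a rough illustration of the two document-level options, here is a minimal sketch assuming a page whose primary language is English (the value en is my assumption for the example, not a requirement):

```html
<!-- Option 1: sent by the server as an HTTP response header.
     This is not markup; it is shown in a comment for comparison:

     Content-Language: en
-->

<!-- Option 2: a meta element placed inside the head of the page -->
<meta http-equiv="Content-Language" content="en">
```

The meta element is the fallback when you cannot change what the server sends.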

The second, and more specific, level defines the default text-processing language that a given range of text is written in. For Hypertext Markup Language (HTML) this is achieved using the lang attribute, and for Extensible Markup Language (XML), in our case Extensible Hypertext Markup Language (XHTML), using xml:lang. Where both are present (i.e. in backwards compatibility mode), xml:lang takes precedence.

For both flavours of markup the text-processing or natural language is inherited along the document hierarchy, so to apply your main language to the entire document the html tag is ideal. The default text-processing language can be changed further along the hierarchy by applying the lang attribute to a more specific element.
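A short sketch of that inheritance in an HTML 4 page follows. The sample sentences and the choice of French and German are mine, purely for illustration:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
  <head>
    <title>Language inheritance</title>
  </head>
  <body>
    <!-- Inherits lang="en" from the html element -->
    <p>The dish of the day is <span lang="fr">coq au vin</span>.</p>

    <!-- Overrides the inherited language for this element and its children -->
    <blockquote lang="de">
      <p>Guten Appetit!</p>
    </blockquote>

    <!-- Back to the inherited English -->
    <p>Enjoy your meal.</p>
  </body>
</html>
```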

Language Tags and Localisation

The value of the lang (or xml:lang) attribute is referred to as a language tag. It comprises the primary subtag optionally followed by further subtags separated by a hyphen. Language tags use two or three letter language codes such as en for English, de for German and fr for French. Where both a two and a three letter code exist for the same language, the two letter code should be used. By including a subtag, the natural language of the document can be localised further for dialect or region, so en-GB would identify British English text and fr-CA would indicate content written in Canadian French. Subtags are case-insensitive. There are special-case primary subtags of i- and x-, but these will be outside your normal usage (unless you have a killer site for Klingons, that is), so I will leave those for you to look at another day.
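To make the tag structure concrete, here are a few hedged examples. The sample text is invented for illustration; only the tag values come from the paragraph above:

```html
<!-- Primary subtag only: English, no region specified -->
<p lang="en">The colour (or color) of the sky.</p>

<!-- Primary subtag plus a region subtag: British English -->
<p lang="en-GB">The colour of the sky.</p>

<!-- Canadian French -->
<p lang="fr-CA">La couleur du ciel.</p>

<!-- Subtags are case-insensitive, so this identifies the same language as en-GB -->
<p lang="EN-gb">The colour of the sky.</p>
```

The conventional casing (lowercase language, uppercase region) is easier to read, which is why I stick to it.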

Reading Order

It is easy to forget that whereas “western” languages are read from left to right, there are also major languages such as Arabic and Hebrew that read in the opposite direction. The reading order is not necessarily inherited from the chosen language tag, so we will add the dir attribute to the html tag and assign the value “ltr” to it. For languages read from right to left the attribute value would be “rtl”. These are the only two options; imagine the fun to be had with “ttb” (top to bottom) for authentic Japanese writing (and yes, “rtl” also).
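For comparison, a right-to-left page might open as sketched below. This assumes Arabic content with one embedded English phrase; the sample text is mine:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="ar" dir="rtl">
  <head>
    <title>مرحبا</title>
  </head>
  <body>
    <!-- Inherits both lang="ar" and dir="rtl" from the html element -->
    <p>مرحبا</p>

    <!-- A left-to-right passage inside the right-to-left document -->
    <p lang="en" dir="ltr">Hello</p>
  </body>
</html>
```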

Why it’s Important to Define the Language

Declaring the text-processing, or “natural” language of a page is beneficial for many purposes:

  • To assist screen readers and braille translators.
  • To meet World Wide Web Consortium (W3C) Web Accessibility Initiative (WAI) guidelines, specifically Web Content Accessibility Guidelines 1.0 checkpoints 4.1 and 4.3.
  • To meet legislative requirements, for example the Disability Discrimination Act (DDA) in the UK.
  • To provide authoring tools with the ability to check spelling and grammar.
  • To identify the correct language of a section of text for translation tools.
  • To style information in a specified language using the Cascading Style Sheets (CSS) :lang pseudo class.
  • To filter search engine results based on the user’s language preference.
  • To assist other people and devices in parsing the text of the document with XSL or some other scripting.
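As an example of the styling benefit in the list above, here is a minimal sketch of the :lang pseudo class. The rules and sample sentences are my own illustration, and browser support for :lang varies:

```html
<style type="text/css">
  /* Italicise any element whose language is French */
  :lang(fr) { font-style: italic; }

  /* Choose quotation marks for q elements to suit the language */
  q:lang(en) { quotes: "\201C" "\201D" "\2018" "\2019"; }
  q:lang(fr) { quotes: "\00AB\00A0" "\00A0\00BB"; }
</style>

<p lang="en">She called it a <span lang="fr">fait accompli</span>.</p>
<p lang="fr">Il a dit <q>bonjour</q>.</p>
```

Because :lang matches on the inherited language, the q element in the second paragraph picks up the French rule without needing its own lang attribute.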

The xmlns Attribute

If your markup is XHTML, another attribute you must include is the xmlns declaration for the XHTML namespace. Remembering that XHTML is a reformulation of HTML as an application of XML, an XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, that are used in XML documents as element types and attribute names. You need to declare the namespace so that a user agent knows which elements belong to which language. The namespace is declared using the attribute xmlns followed by the URI, which for our purposes is http://www.w3.org/1999/xhtml.

Conclusion

To maximise the universal accessibility of our pages we should always include language information in them. We can identify the natural language of the content by using the lang attribute and/or the xml:lang attribute for XHTML, and must always include the XML namespace if using XHTML. Additionally, we can specify the primary language of the document using HTTP headers or the content-language meta tag. Examples of the opening html tag include:

For XHTML 1.0 in backwards compatibility mode:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">

For XHTML as application/xhtml+xml:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr">

For HTML 4:

<html lang="en" dir="ltr">

Next in “From the Top”

Next week I’ll cover the title tag and provide a few tips from around the Internet on writing effective page titles.

6 Responses to “From the Top: Defining Content Language”

  1. It’s important to understand that there is a distinct difference between the meaning of the Content-Language HTTP header and the lang and xml:lang attributes. The HTTP header indicates the language of the intended audience, whereas the attributes indicate the actual language of the content. In most cases, the language of the document and the intended audience language will be the same, but not always. For example, a tutorial comprising predominantly French text for teaching an English-speaking reader may contain lang="fr", but be served with Content-Language: en.

  2. One more thing: it doesn’t make any sense to discuss xmlns in an article about natural languages. Can you explain why that was included?

  3. Thanks Lachlan. The article is about writing a complete html tag. If you are writing a document in XHTML then you need to include the XML namespace too. I added it for completeness.

    I am also trying to protect novice or busy developers from the really technical details by pitching a concise summary of “what”, “why” and “how”. The references can then lead the keen into the technical background.

  4. A couple of comments, in addition to Lachlan’s:

    The dir="ltr" on the html tag is redundant. By default an (X)HTML document is “ltr” unless otherwise specified.

    An interesting case for language tags is Unicode documents containing hanzi/kanji/hanja (CJK ideographs). Simplified Chinese, Traditional Chinese, Japanese and Korean can share a codepoint, but have different glyph variations for the character represented by that codepoint. If there is no appropriate font declaration in the style sheet, and no language tagging, then web browsers will use a default language for rendering. This may result in an incorrect font being used, e.g. a Japanese font used to render Simplified Chinese text. Appropriate language tagging of CJK text is important in order for culturally appropriate glyphs to be used.

  5. The use of the letter direction isn’t always needed as it can be implied by DTD, charset, language, etc. But I don’t feel its use is a boo-boo per se. Being redundant isn’t bad until its redundancy interferes with something. In this case adding it to this article is good “general” advice, especially since this article isn’t confined to XHTML alone (or it is and I missed that part).

  6. Andrew / Mike,

    Yes, I can’t remember the source of my opinion on including dir="ltr". It goes back at least a year, I think, and it must have been based on something more than redundancy of function, but if it’s not in my del.icio.us I’m doomed to forget. In the research around this article I didn’t even discover the default behaviour info, as obviously that would have changed my mind immediately. Thanks for posting about it; I’m always ready to modify an opinion I may hold about something.

    Mike: No, you didn’t miss anything - this article caters for both markup languages.