That Standards Guy



About

That Standards Guy is the online persona of Karl Dawson, a web developer living and working in Ipswich, England.

I'm a member of the Guild of Accessible Web Designers and the Web Standards Group and team member at Accessites—an awards site to recognise accessible and usable websites.

I specialise as a front-end developer and worry about the minutiae of semantic (X)HTML and CSS, accessibility, microformats, typographic rhythm and grid design. I also care about the user experience and remind myself constantly of visitor site goals when working with clients and their aims.

Archive for January, 2006

From the Top: Defining Content Language

The opening html tag, when expanded with a few attributes, is a real boon for the accessibility of your web pages. In this, the third article in my “From the Top” series, I’ll introduce each of those attributes and explain the benefits of their use.

The lang Attribute

The language of a web page is declared at two levels. The first is the primary language of the document, which search engines may use to return only web pages in a specific language. It is declared either in the Hypertext Transfer Protocol (HTTP) Content-Language header or in a content-language meta tag. Which method you use will depend on whether you have access to the server settings or the ability to perform content negotiation.

The second, more specific level defines the default text-processing language of a particular range of text. In Hypertext Markup Language (HTML) this is done with the lang attribute, and in Extensible Markup Language (XML), in our case Extensible Hypertext Markup Language (XHTML), with xml:lang. Where both are present (i.e. in backwards compatibility mode), xml:lang takes precedence.

For both flavours of markup the text-processing or natural language is inherited down the document hierarchy, so the html tag is the ideal place to apply your main language to the entire document. The default text-processing language can be changed further down the hierarchy by applying the lang attribute to a more specific element.
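The inheritance rule can be illustrated with a small sketch. It is written in Python, which is my choice for illustration and not something the article itself uses, and the sample markup, the id values and the effective_language helper are all hypothetical. It walks from an element up to the root and returns the first language it finds, preferring xml:lang over lang.

```python
import xml.etree.ElementTree as ET

# The reserved "xml" prefix is always bound to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample page: British English with one French phrase.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">
  <body>
    <p id="greeting">Hello, and <span id="french" xml:lang="fr" lang="fr">bonjour</span>.</p>
  </body>
</html>"""


def effective_language(root, target):
    """Return the language that applies to `target`, walking up to the root."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        # xml:lang takes precedence over lang when both are present.
        lang = node.get(XML_LANG) or node.get("lang")
        if lang:
            return lang
        node = parents.get(node)
    return None


root = ET.fromstring(SAMPLE)
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
print(effective_language(root, by_id["greeting"]))  # en-GB, inherited from html
print(effective_language(root, by_id["french"]))    # fr, set on the span itself
```

The paragraph carries no language of its own, so it picks up en-GB from the html element, while the span overrides it locally.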

Language Tags and Localisation

The value of the lang (or xml:lang) attribute is referred to as a language tag. It comprises the primary subtag, optionally followed by further subtags separated by hyphens. Language tags use two or three letter language codes such as en for English, de for German and fr for French. Where both a two and a three letter code exist for the same language, the two letter code should be used. By including a subtag the natural language of the document can be localised further for dialect or region, so en-GB would identify British English text and fr-CA would indicate content written in Canadian French. Subtags are case-insensitive. There are special-case primary subtags of i- and x- but these will be outside your normal usage (unless you have a killer site for Klingons, that is) so I will leave those for you to look at another day.
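To make the structure of a language tag concrete, here is a minimal Python sketch (Python is my choice, not the article's). The function names parse_language_tag and same_language are hypothetical. It only splits on hyphens and lower-cases the result, which is enough to show the primary subtag, the further subtags and the case-insensitivity described above. It does not validate codes against any registry.

```python
def parse_language_tag(tag):
    """Split a language tag into its primary subtag and any further subtags."""
    # Subtags are case-insensitive, so normalise to lower case before comparing.
    primary, *rest = tag.lower().split("-")
    return primary, rest


def same_language(tag_a, tag_b):
    """True when two tags share a primary language, ignoring case and region."""
    return parse_language_tag(tag_a)[0] == parse_language_tag(tag_b)[0]


print(parse_language_tag("en-GB"))   # ('en', ['gb'])
print(same_language("fr-CA", "FR"))  # True
print(same_language("en-GB", "de"))  # False
```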

Reading Order

It is easy to forget that whereas “western” languages are read from left to right, there are also major languages such as Arabic and Hebrew that read in the opposite direction. The reading order is not necessarily inherited from the chosen language tag, so we will add the dir attribute to the html tag and assign the value “ltr” to it. For languages read from right to left the attribute value would be “rtl”. These are the only two options; imagine the fun to be had with “ttb” (top to bottom) for authentic Japanese writing (and yes, “rtl” also).

Why it’s Important to Define the Language

Declaring the text-processing, or “natural” language of a page is beneficial for many purposes:

  • To assist screen readers and braille translators.
  • To meet World Wide Web Consortium (W3C) Web Accessibility Initiative (WAI) guidelines - specifically checkpoints 4.1 and 4.3.
  • To meet legislative requirements, for example the Disability Discrimination Act (DDA) in the UK.
  • To provide authoring tools with the ability to check spelling and grammar.
  • To identify the correct language of a section of text for translation tools.
  • To style information in a specified language using the Cascading Style Sheets (CSS) :lang pseudo class.
  • To filter search engine results based on the user’s language preference.
  • To help other people’s tools and devices parse the text of the document with XSL or other scripting.

The xmlns Attribute

If your markup is XHTML, another attribute you must include is the xmlns declaration for the XHTML namespace. Remember that XHTML is a reformulation of HTML as an application of XML. An XML namespace is a collection of names, identified by a Uniform Resource Identifier (URI) reference, that are used in XML documents as element types and attribute names. You need to declare the namespace so that a user agent knows which elements belong to which language. The namespace is declared using the attribute xmlns followed by the URI, which for our purposes is http://www.w3.org/1999/xhtml.
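A short Python sketch (my choice of language, not the article's) shows what the declaration changes from an XML parser's point of view. The helper is_xhtml_root is a hypothetical name. With xmlns present, the parser reports every element as belonging to the XHTML namespace; without it, html is just an anonymous element name that happens to look familiar.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

with_ns = ET.fromstring('<html xmlns="http://www.w3.org/1999/xhtml"><head/></html>')
without_ns = ET.fromstring("<html><head/></html>")


def is_xhtml_root(element):
    """True when the element is html in the XHTML namespace."""
    return element.tag == "{%s}html" % XHTML_NS


print(with_ns.tag)       # {http://www.w3.org/1999/xhtml}html
print(with_ns[0].tag)    # {http://www.w3.org/1999/xhtml}head (default namespace is inherited)
print(without_ns.tag)    # html
print(is_xhtml_root(with_ns), is_xhtml_root(without_ns))  # True False
```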

Conclusion

To maximise the universal accessibility of our pages we should always include language information in our pages. We can identify the natural language of the content by using the lang attribute and/or the xml:lang attribute for XHTML and must always include the XML namespace if using XHTML. Additionally, we can specify the primary language of the document using HTTP headers or the content-language meta tag. Examples of the opening html tag include:

For XHTML 1.0 in backwards compatibility mode:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">

For XHTML as application/xhtml+xml:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr">

For HTML 4:

<html lang="en" dir="ltr">

Next in “From the Top”

Next week I’ll cover the title tag and provide a few tips from around the Internet to writing effective page titles.

From the Top: MIME and Content Negotiation

Content negotiation at its simplest is a conversation between your web server and a user agent (browser, search engine bot etc) to determine the preferred format or version of a resource to serve. In this, the second in my article series “From the Top” I will introduce you to the web (head) waiter that knows how to correctly serve your web page to a user agent.

MIME and Content Negotiation

To negotiate, a user agent sends a Hypertext Transfer Protocol (HTTP) “Accept” header to the web server. The header lists the Multipurpose Internet Mail Extensions (MIME) types it prefers, each with an optional ranking or weighting (the Quality Value) of how well it understands that MIME type. The ranking runs from 0 to 1, to at most three decimal places. If no Quality Value (q) is defined for a MIME type then q=1.0 is assumed.

MIME, as the name suggests, was first used as an extension to email but is also used by HTTP. It is simply a way to define what type of media (resource) is being sent, be it an image, Flash, text etc. Each resource has a MIME type consisting of two parts separated by a forward slash, “/”. The first part is called the top-level media type and the second is the subtype. For example, the MIME type for a Graphics Interchange Format (GIF) image would be image/gif.

A Mozilla accept header may look like this:
Accept: text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, text/plain;q=0.8, image/png, image/jpeg, image/gif;q=0.2, */*;q=0.1
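To show how such a header breaks down, here is a minimal Python sketch (Python is my choice here, not a language the article uses). The function name parse_accept is hypothetical, and treating an unparsable q as 0 is my assumption. It splits the header on commas, reads the optional q parameter of each entry and defaults to 1.0 when none is given.

```python
def parse_accept(header):
    """Return (mime_type, q) pairs from an Accept header, most preferred first."""
    entries = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        mime_type = parts[0].lower()
        if not mime_type:
            continue
        q = 1.0  # no q parameter means q=1.0 is assumed
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: an unparsable q counts as "not acceptable"
        entries.append((mime_type, q))
    # sorted() is stable, so types with equal q keep their order in the header
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


MOZILLA = ("text/xml, application/xml, application/xhtml+xml, text/html;q=0.9, "
           "text/plain;q=0.8, image/png, image/jpeg, image/gif;q=0.2, */*;q=0.1")

for mime_type, q in parse_accept(MOZILLA):
    print(f"{q:.1f}  {mime_type}")
```

Note that a q value applies only to the entry it is attached to, so in the header above image/png and image/jpeg are rated 1.0 and only image/gif is rated 0.2.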

For Hypertext Markup Language (HTML) the MIME type is text/html. User agents treat this in a very forgiving way but this tag soup rendering mode is slower to display as a result.

If you are using Extensible Hypertext Markup Language (XHTML) markup the correct MIME type is application/xhtml+xml. This is because XHTML is HTML reformulated as an Extensible Markup Language (XML) application and as such must be well-formed with no overlapping elements, properly closed tags, attribute values enclosed in quotes and care taken with case-sensitivity (all elements and attributes written in lowercase). Due to these quality checks, user agents are able to handle the markup more efficiently and render quicker than tag soup.
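The well-formedness rules are easy to try out with any XML parser. The sketch below uses Python's standard library (my choice of language, not the article's) and a hypothetical helper named is_well_formed. It is only a rough stand-in for what a browser does with application/xhtml+xml: it checks well-formedness, not validity against the XHTML DTD, and it says nothing about element or attribute case.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """True when an XML parser accepts the markup without a fatal error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p><em>fine</em></p>"))          # True
print(is_well_formed("<p><b><i>overlap</b></i></p>"))  # False: overlapping elements
print(is_well_formed("<p>line one<br></p>"))           # False: br is never closed
print(is_well_formed("<p>line one<br /></p>"))         # True
print(is_well_formed("<p class=intro>text</p>"))       # False: unquoted attribute value
```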

User agent support for application/xhtml+xml

Unfortunately, Internet Explorer 6 Service Pack 2 (IE 6 SP2) and below do not understand this particular MIME type and will attempt to download the page as an XML file. The MIME type is also buggy in several other user agents, although as time goes by compatibility will naturally improve. Until then, there are two methods to get around this showstopper. The first is to use XHTML 1.0 in what is termed backwards compatibility mode and the other is through content negotiation.

XHTML 1.0 Backwards Compatibility Mode

By following the World Wide Web Consortium (W3C) guidelines, XHTML 1.0 can be served with the MIME type of text/html. These guidelines cover techniques such as including a space before the trailing /> when closing an empty element, avoiding white space and line breaks in attribute values, and encoding ampersands in content, including in Uniform Resource Identifiers (URIs) referenced in hyperlinks. A lot of web developers coding to web standards (including the author and this website at time of writing) work in this mode, and whether the method is correct is a matter of hot debate. Although we may not be utilising XML within a particular website at launch, XHTML 1.0, for me at least, offers forward compatibility with future XML applications I may be asked to implement; HTML cannot offer that. I can and do serve XHTML 1.0 as text/html without specific content negotiation and it works (obviously), but I want to do this right and so does this article series.

Gotchas of Serving application/xhtml+xml

Before we embark on serving our XHTML with the correct MIME type there are several issues that you must consider — remember you might be coding for a Content Management System (CMS) / multi-author environment. The text editor needs to be capable and configured properly.

  • Code must be well-formed, because XHTML is a reformulation of HTML as an application of XML.
  • The XML Declaration is required for character encodings other than UTF-8 and UTF-16. It forms part of the XML Prolog and goes on line 1 of your code in the format <?xml version="1.0" encoding="yourChosenCharset"?>
  • Stylesheets may be referenced with an XML stylesheet Processing Instruction (PI) as part of the XML Prolog (along with the DOCTYPE). The Processing Instruction <?xml-stylesheet href="myStyle.css" type="text/css" ?> is written much the same way as the HTML 4 <link rel="stylesheet">. If you are serving alternative stylesheets then <link href="myStyle.css" title="Medium" rel="alternate stylesheet" type="text/css"> becomes <?xml-stylesheet alternate="yes" href="myStyle.css" title="Medium" type="text/css"?> as a style sheet PI. The W3C have written a very clear normative recommendation on associating style sheets with XML documents including several more examples.
  • Only five named character entities are “safe”: &lt;, &gt;, &amp;, &quot; and &apos;. It should be noted however that &apos; is undefined in HTML 4 and unsupported in Internet Explorer. You will need to ensure that all other character references are numeric in nature. Lachy’s log explains character references in greater detail.
  • Anything within style or script tags is treated as XML, so you must wrap content that uses < or & in a Character Data (CDATA) section.
  • No elements are inferred, for example, tbody.
  • Scripting with document.write doesn’t work; you must use the Document Object Model (DOM) core methods instead. If you use Google’s AdSense on your website then you may need to apply a Google-approved workaround (if they haven’t fixed it already).
  • Cascading Style Sheets (CSS) are applied slightly differently. For example, to apply a background colour to the body element would require the html element to be styled also as the body element doesn’t cover the whole viewport when using XHTML.
  • HTML comments in scripts or styles, for example <script type="text/javascript"><!-- ... //--></script>, will break an XHTML document served as application/xhtml+xml. An XML parser treats them as genuine comments, so the script or style inside is thrown away, and a pair of hyphens (--) anywhere inside the comment is a well-formedness error, which is fatal (the page won’t display as a result). The correct way to write script or style blocks for XHTML when served as application/xhtml+xml is in the format <script type="text/javascript"><![CDATA[ { // do something } //]]></script>. Of course, the easiest way to avoid all this is to put your scripts and styles into external files in the first place. Lachlan Hunt has a great in-depth article that goes into the whys and wherefores of HTML comments in scripts.
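Two of the gotchas above, named entities and raw < or & inside script blocks, can be reproduced with a plain XML parser. This is a hedged sketch in Python (my choice of language, not the article's), with a hypothetical helper named parses and made-up sample markup. A browser's XML parser may know the XHTML DTD's entities when a DOCTYPE is present, so treat the &nbsp; result as what a DTD-less parser does.

```python
import xml.etree.ElementTree as ET


def parses(markup):
    """True when the markup is accepted by a DTD-less XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The five predefined entities and numeric references are always safe.
print(parses("<p>Fish &amp; chips</p>"))   # True
print(parses("<p>caf&#233;</p>"))          # True
# A named HTML entity is undefined unless a DTD declares it.
print(parses("<p>non&nbsp;breaking</p>"))  # False

# Raw < and & inside a script element are markup to an XML parser...
print(parses("<script>if (a < b && c) { go(); }</script>"))  # False
# ...unless the script is wrapped in a CDATA section.
wrapped = "<script><![CDATA[ if (a < b && c) { go(); } ]]></script>"
print(parses(wrapped))                     # True
print(ET.fromstring(wrapped).text)         # the script text, CDATA markers removed
```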

Doing the “Right Thing” ™

Rather than letting the server decide whether to serve a page as index.xhtml or index.html (note these would be separate files) based on the preferences sent in the accept header, content negotiation should be configured on the web server if you have access, or through scripting in your template. If you have an Apache server, I’ll send you off to read the manual now, as I want to concentrate on providing an overview of scripting a solution in this article.

Whether you use asp.NET, PHP or something else, the following thought process is required:

  1. Lower the Quality of Source (qs) parameter for application/xhtml+xml on the server to account for possibly incomplete accept headers.
  2. Specifically test for the W3C validator as it doesn’t send a complete accept header.
  3. Parse the http_accept header - find out the user agent’s preference.
  4. Send the preferred MIME type.
  5. Send a Vary header to inform proxy servers that content negotiation is taking place.
  6. Send the correct DOCTYPE.
  7. Send the appropriate opening html tag — this will be discussed further in next week’s article.
  8. If application/xhtml+xml is preferred, send the XML Declaration — do not include it with text/html as this will put IE for Windows into Quirks Mode.
  9. If application/xhtml+xml is preferred, send the XML Stylesheet declaration(s).
  10. If text/html is preferred, the closing of tags with “ />” needs to be changed to “>”.
  11. For text/html it is best to define the character encoding in the HTTP header rather than hard code <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> into your pages or templates. Again, the W3C have a very straightforward document explaining server configuration techniques.
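The decision at the heart of steps 2 to 5 can be sketched independently of any server language. The version below is in Python (my choice, not a language the article uses), and the names quality and negotiate are hypothetical. As an assumption of the sketch, only MIME types named explicitly in the Accept header are counted, so a user agent that sends nothing but */* is given text/html.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def quality(accept, mime_type):
    """Return the q value the Accept header gives mime_type (0.0 if not listed)."""
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != mime_type:
            continue
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    return float(value)
                except ValueError:
                    return 0.0
        return 1.0  # listed without a q parameter
    return 0.0


def negotiate(accept, user_agent=""):
    """Choose the MIME type and build the headers to send with it."""
    mime = HTML
    if "w3c_validator" in user_agent.lower():
        # Step 2: the validator does not send a complete Accept header.
        mime = XHTML
    elif quality(accept, XHTML) > 0 and quality(accept, XHTML) >= quality(accept, HTML):
        # Steps 3 and 4: XHTML is accepted and rated at least as highly as HTML.
        mime = XHTML
    # Step 5: tell proxies that the response depends on the Accept header.
    headers = {"Content-Type": f"{mime}; charset=utf-8", "Vary": "Accept"}
    return mime, headers


print(negotiate("application/xhtml+xml, text/html;q=0.9")[0])   # application/xhtml+xml
print(negotiate("*/*")[0])                                      # text/html
print(negotiate("", "W3C_Validator/1.0")[0])                    # application/xhtml+xml
```

The remaining steps (DOCTYPE, opening html tag, XML Declaration and stylesheet PIs) then follow from the chosen MIME type.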

Code Example — PHP

The following code snippet is a small modification to Neil Crosby’s original work. Test and refine on a development server please, I offer no warranties on it working straight off the bat. Write a simple include at the top of your web page (or template) to reference this external file.

<?php
$charset = "utf-8";
$mime = "text/html";

function fix_code($buffer) {
   return (str_replace(" />", ">", $buffer));
}

if(isset($_SERVER["HTTP_ACCEPT"]) && stristr($_SERVER["HTTP_ACCEPT"],"application/xhtml+xml")) {
   # if there's a Q value for "application/xhtml+xml" then also
   # retrieve the Q value for "text/html"
   if(preg_match("/application\\/xhtml\\+xml;q=0(\\.[0-9]+)/i",
                 $_SERVER["HTTP_ACCEPT"], $matches)) {
      $xhtml_q = $matches[1];
      if(preg_match("/text\\/html;q=0(\\.[0-9]+)/i",
                    $_SERVER["HTTP_ACCEPT"], $matches)) {
         $html_q = $matches[1];
         # if the Q value for XHTML is greater than or equal to that
         # for HTML then use the "application/xhtml+xml" mimetype
         if($xhtml_q >= $html_q) {
            $mime = "application/xhtml+xml";
         }
      }
   # if there was no Q value, then just use the
   # "application/xhtml+xml" mimetype
   } else {
      $mime = "application/xhtml+xml";
   }
}

# special check for the W3C_Validator
if (isset($_SERVER["HTTP_USER_AGENT"]) && stristr($_SERVER["HTTP_USER_AGENT"],"W3C_Validator")) {
   $mime = "application/xhtml+xml";
}

# set the prolog_type according to the mime type which was determined
if($mime == "application/xhtml+xml") {
   $prolog_type = "<?xml version=\"1.0\" encoding=\"$charset\" ?>
<?xml-stylesheet type=\"text/css\" href=\"/styles/initial.css\" media=\"all\"?>
<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">
<html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"en-GB\" lang=\"en-GB\" dir=\"ltr\">\n";

} else {
   ob_start("fix_code");
   $prolog_type = "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\" \"http://www.w3.org/TR/html4/strict.dtd\">
<html lang=\"en-GB\" dir=\"ltr\">\n";
}

# finally, output the mime type and prolog type
header("Content-Type: $mime;charset=$charset");
header("Vary: Accept");
print $prolog_type;
?>

Code Example — asp.NET

Via Roger Johansson’s article Content Negotiation, Justin Perkins provides an asp.NET example (same disclaimer as before):

string http_accept = Request.ServerVariables["HTTP_ACCEPT"];
string http_user_agent = Request.ServerVariables["HTTP_USER_AGENT"];

if (((http_accept != null) && (http_accept.ToLower().IndexOf("application/xhtml+xml") > -1)) || ((http_user_agent != null) && (http_user_agent.ToLower().IndexOf("w3c_validator") > -1))){
    Response.ContentType = "application/xhtml+xml";
    Response.Write("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n");
}
else{
    Response.ContentType = "text/html";
}
Response.Charset = "iso-8859-1";
Response.AddHeader("Vary", "Accept");

Conclusion

Developing now with XHTML 1.0 allows for forward compatibility but many developers only deploy websites with XHTML 1.0 in backwards compatibility mode (using the text/html MIME type). In order to get the most benefit from XHTML 1.0, developers need to properly consider the issues surrounding the application/xhtml+xml MIME type and implement content negotiation accordingly. It is both worthwhile and achievable and once the solution has been written it is available for re-use within your quality development framework.

Next in “From the Top”

Next week will (thankfully) be a shorter article (perhaps!) explaining why the HTML element is actually required.

Failed Redesign: MSN UK Entertainment

After reading Joe Clark’s blog entry on failed redesigns I thought I’d keep an eye out for UK sites failing to meet web standards after a redesign. Well, I went looking a moment ago and, prompted by the Netimperative article “MSN redesigns UK entertainment channel”, trooped off to the MSN UK Entertainment website.

Visually, it’s very clean and simple - under the hood though? Oh my God!

  • No doctype ;)
  • In-line JavaScript
  • JavaScript to give Jeremy Keith a fit (probably)
  • Tables! Lots and lots of tables
  • In-line styles
  • Multiple &nbsp; to force white-space
  • Unescaped ampersands (naturally)

I could probably go on. They have the cheek to say this is accessible? When they are using an 8 point font size somewhere in there?

Pull the other one, it’s got Bill Gates on it.
