The transition from HTML to XHTML will come with a fair number of bumps. While later chapters introduce tools to help you get past those bumps – and figure out where they come from – this chapter examines what's going to change and demonstrates a few strategies for handling those changes. Along the way, we visit the ghosts of browsers past and explore problems that exist in current browsers. In turn, you discover how prepared and unprepared various tools are for XHTML.
Note Some of the solutions covered in this chapter apply tools described in much more detail in later chapters — notably the XHTML DTDs and cascading style sheets. If you encounter issues you don't understand, keep them in mind and study the chapters describing those issues more closely when you get to them. The steps covered in this chapter are more important for establishing the context in which you use certain technologies than for explaining those technologies.
An Initial HTML Document
The following document, which I use as a test case, isn't an ordinary HTML document. It's designed to contain some of the serious "gotchas" that conversions to XHTML involve. It's more or less a worst-case scenario, although its contents aren't unusual. (It's a little more meaningless than usual, but fairly ordinary otherwise.) This single document produces five derivatives, representing different paths to XHTML conformance. It is reasonably small, but it packs a lot of problems into that space:
<HTML>
<HEAD>
<TITLE>Non-XHTML HTML</TITLE>
<SCRIPT LANGUAGE="JAVASCRIPT">
function presentCount() {
counter="";
for (i=0; i<10; i++) {
counter=counter + " " + i;
}
alert (counter);
}
</SCRIPT>
</HEAD>
<BODY BGCOLOR=#FFFFFF>
<FONT FACE="Times" SIZE="24" COLOR="BLUE"><B>Non-XHTML HTML</FONT></B>
<P><A NAME="description">This document opens in most HTML browsers, but it is definitely not
XHTML.</A>
<P>The cleanup shouldn't cause too many problems, we hope.
<LI><a href="javascript:presentCount()">Click me for a count!</a><br>
<LI><a href="query.htm?val1=1&val2=2&val3=3">Click here for a query!</a>
<li><a href="#description">Click here for a description of this page</a>
<p>Copyright 2000 by the Wacki HTML Writer <br>
All rights reserved.
</BODY>
</HTML>
Two Remedies
While the initial HTML isn't in incredibly bad shape, it uses the FONT element – a deprecated element that the W3C is trying to stamp out and replace with cascading style sheets (CSS). Web designers have two choices for dealing with this shift. The first approach uses XHTML 1.0's transitional DTD to avoid this complication entirely, while the second bites the bullet and makes some more structural changes to fit the document into the strict DTD. While the first approach is simpler in the short run, it may mean more work later. The second approach has more of an up-front cost – and may mean that you spend considerable time toiling over complex documents – but it should prove more stable and more manageable in the long run.
There are also a number of cases in which XHTML provides multiple approaches to solving the same problem. We'll take advantage of the fact that we're creating two different versions of the XHTML document. The two versions will test two strategies for keeping the < sign in the script from causing problems in browsers and XML parsers. (Neither works especially well in HTML browsers, as it turns out.) We'll also put each strategy through two different phases of development. The first phase keeps all the resources used by a document (such as scripts and style sheets) inside of the document, while the second phase moves those resources to separate files.
Remedy 1: The Transitional DTD and CDATA Sections
By using the transitional DTD, you can preserve the formatting used in the document – mostly the large, blue headline – without having to change the overall document structure in any significant way. While this document is simple enough that the changes aren't that difficult (as shown in the second approach), more complex documents require an enormous investment of time to convert them to the strict DTD. For starters, you need to add the DOCTYPE declaration to the start of your document. (You can also add the XML declaration, but leave that to the second approach.) For the transitional DTD, that means:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/1999/PR-xhtml1-19991210/DTD/xhtml1-transitional.dtd">
This identifies the document as using the XHTML 1.0 transitional DTD from the W3C, allowing validating XML parsers to check the document using the formal declarations it contains. The opening HTML tag needs several changes. First, you must change it to lowercase; second, you must include an attribute declaring the XHTML namespace for its contents (as described in Article 4). The new version looks like this:
<html xmlns="http://www.w3.org/1999/xhtml">
You need to change the tags for the HEAD and TITLE elements to lowercase, as well as change the title to reflect the document's new identity:
<head> <title>Transitional XHTML - Phase 1</title>
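Lowercasing tags by hand gets tedious in longer documents. The following is a minimal sketch of how the step could be automated; the function name lowercaseTagNames is hypothetical, and the approach assumes simple markup. It lowercases element names only (not attribute names), and it would also touch a < that is directly followed by a letter inside a script, so treat it as a starting point rather than a finished converter.

```javascript
// Hypothetical helper: lowercase element names in start and end tags.
// Attribute names and values are left untouched.
function lowercaseTagNames(markup) {
  // "<" or "</" followed by a name that starts with a letter.
  // Declarations such as <!DOCTYPE ...> do not match because of the "!".
  return markup.replace(
    /<(\/?)([A-Za-z][A-Za-z0-9]*)/g,
    (match, slash, name) => "<" + slash + name.toLowerCase()
  );
}

console.log(lowercaseTagNames("<HEAD> <TITLE>Non-XHTML HTML</TITLE>"));
```

Run against the opening of the original document, this turns the HEAD and TITLE tags into their lowercase forms while leaving the title text alone.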
The SCRIPT element presents a larger problem. It contains the forbidden character <, which needs to be escaped to get past an XML parser. For this pass, use a CDATA section to mark off the contents of the (now lowercase) script element. This allows the characters <, >, and & to appear anywhere within a script. (If the sequence ]]> appears, you need to break it up with whitespace like ]] >.) The script element also needs to have a type attribute added to it. The W3C supports the language attribute, but insists on a type attribute with a MIME content type identifying the scripting language as well.
<script language="javascript" type="text/javascript">
<![CDATA[
function presentCount() {
counter="";
for (i=0; i<10; i++) {
counter=counter + " " + i;
}
alert (counter);
}
]]>
</script>
</head>
The script element is inside the head element, so the CDATA section shouldn't cause problems with display – although it may make browser scripting engines malfunction.
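If you generate pages from a program, the CDATA wrapping described above can be sketched as a small function. This is only an illustration under stated assumptions: the name wrapInCdata is hypothetical, and the whitespace inserted into ]]> is harmless in expressions but would alter a string literal that happens to contain that sequence.

```javascript
// Hypothetical helper: wrap script source in a CDATA section.
function wrapInCdata(scriptSource) {
  // Break up any "]]>" so it cannot end the CDATA section early.
  const safeSource = scriptSource.split("]]>").join("]] >");
  return "<![CDATA[\n" + safeSource + "\n]]>";
}

console.log(wrapInCdata("for (i=0; i<10; i++) { counter=counter + i; }"));
```

The < in the loop condition passes through unchanged, because an XML parser treats everything between the CDATA markers as plain character data.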
Tip Another trick that can help you avoid problems with < in scripts is to recast expressions like i<10 to 10>i. XML parsers may raise warnings when they encounter the > symbol, however.
The body of the document presents some more complicated problems. Because you're using the transitional DTD, you can keep the bgcolor attribute (put in lowercase, of course) on the body element. However, you have to add quotes:
<body bgcolor="#FFFFFF">
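The quoting step can also be sketched in code. The function below is a hypothetical helper, quoteAttributes, that assumes it receives a single start tag; it lowercases the names of unquoted attributes and wraps their values in double quotes. Attributes that are already quoted are left exactly as they are, including their case.

```javascript
// Hypothetical helper: quote unquoted attribute values in one start tag.
function quoteAttributes(startTag) {
  // Whitespace, an attribute name, "=", then a value with no quotes or spaces.
  return startTag.replace(
    /\s([A-Za-z][A-Za-z0-9:-]*)=([^\s"'>]+)/g,
    (match, name, value) => " " + name.toLowerCase() + '="' + value + '"'
  );
}

console.log(quoteAttributes("<body BGCOLOR=#FFFFFF>"));
```

Applied to the body tag of the test document (with the element name already lowercased), this yields the quoted, lowercase bgcolor attribute.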
The headline is the next challenge. The transitional DTD supports the font and b elements, but you need to rearrange them so that they nest cleanly. You also need to store these elements in a higher-level element. The p element serves nicely, although you can also use the div element. We'll also change the size attribute's value to 6, because font sizes are supposed to be expressed on a scale from 1 to 7, not as a point size:
<p><font face="Times" size="6" color="blue"><b> Transitional XHTML - Phase 1</b></font></p>
Once again, you change the title so that it more accurately describes the content of the page. The next element, the first paragraph, includes an anchor with a NAME attribute. Lowercase this and then supplement it with an id attribute. The p element also needs a closing tag at the end of the paragraph.
<p><a name="description" id="description">This document is transitional XHTML - we'll see how it does in the browsers.</a></p>
(Yes, the text changed again.) The next paragraph just needs you to make its P element into a lowercase p and give it a closing tag:
<p>The cleanup shouldn't cause too many problems, we hope.</p>
You need to put the following list items in lowercase, give them end tags, and enclose them in some kind of list element – ul, for unordered list, seems most appropriate. The br element following the first list item is unnecessary so you can remove it.
<ul> <li><a href="javascript:presentCount()">Click me for a count!</a></li>
The use of javascript in href attributes isn't recommended, but you can leave it for now as it isn't expressly prohibited (although you change it in the second approach). The next line also includes a URL, this time with ampersands. The cleanup process needs to replace each of them with &amp;.
<li><a href="query.htm?val1=1&amp;val2=2&amp;val3=3">Click here for a query!</a></li>
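Escaping ampersands is easy to get wrong by hand, especially when a document already contains some entity references. Here is a minimal sketch of the replacement, assuming a hypothetical helper named escapeAmpersands that receives an attribute value or a run of text. It skips any & that already begins an entity or character reference, so running it twice does no harm.

```javascript
// Hypothetical helper: escape bare ampersands for use in XHTML.
function escapeAmpersands(value) {
  // The lookahead leaves references such as &amp; &#38; and &#x26; alone.
  return value.replace(
    /&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/g,
    "&amp;"
  );
}

console.log(escapeAmpersands("query.htm?val1=1&val2=2&val3=3"));
```

The browser turns each escaped ampersand back into a plain & before it requests the URL, so the query string the server receives is unchanged.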
The last list element is mostly fine, although it needs an end tag. You must close the ul element as well:
<li><a href="#description">Click here for a description of this page</a></li> </ul>
At the end, you have a paragraph containing a line break. You need to add a closing tag for the p element and make the br element into an empty tag rather than just a start tag:
<p>Copyright 2000 by the Wacki HTML Writer <br /> All rights reserved.</p>
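The same change applies to every element that never has content. A rough sketch of the conversion follows; the helper name closeEmptyElements and the short list of element names are assumptions made for illustration, and the pattern does not cope with a > inside an attribute value.

```javascript
// Assumed, incomplete list of elements that never have content.
const EMPTY_ELEMENTS = ["br", "hr", "img", "input", "meta", "link"];

// Hypothetical helper: rewrite <br> as <br />, <IMG ...> as <img ... />, etc.
function closeEmptyElements(markup) {
  const pattern = new RegExp(
    "<(" + EMPTY_ELEMENTS.join("|") + ")\\b([^>]*?)\\s*/?>",
    "gi"
  );
  // The space before "/>" keeps older HTML browsers from choking on the tag.
  return markup.replace(
    pattern,
    (match, name, attrs) => "<" + name.toLowerCase() + attrs + " />"
  );
}

console.log(closeEmptyElements("Copyright 2000 by the Wacki HTML Writer <br>"));
```

Tags that are already in the empty-tag form come out unchanged, so the function can safely be run over a partly converted document.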
Finally, you must convert the closing tags of the BODY and HTML elements into lowercase to match the start tags:
</body> </html>
This completes the cleaned-up version:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/1999/PR-xhtml1-19991210/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Transitional XHTML - Phase 1</title>
<script language="javascript" type="text/javascript">
<![CDATA[
function presentCount() {
counter="";
for (i=0; i<10; i++) {
counter=counter + " " + i;
}
alert (counter);
}
]]>
</script>
</head>
<body bgcolor="#FFFFFF">
<p><font face="Times" size="6" color="blue"><b>
Transitional XHTML - Phase 1</b></font></p>
<p><a name="description" id="description">This document is
transitional XHTML - we'll see how it does in the browsers.</a></p>
<p>The cleanup shouldn't cause too many problems, we hope.</p>
<ul>
<li><a href="javascript:presentCount()">Click me for a count!</a></li>
<li><a href="query.htm?val1=1&amp;val2=2&amp;val3=3">Click here for a query!</a></li>
<li><a href="#description">Click here for a description of this page</a></li>
</ul>
<p>Copyright 2000 by the Wacki HTML Writer <br />
All rights reserved.</p>
</body>
</html>
To test it out, send it to the W3C's HTML Validation Service at http://validator.w3.org/.
You can take the example a little bit further by removing the script from the document and storing it in an external file. This enables you to get rid of the CDATA section, since script files don't have to be XML. The new script element references the code file using the src attribute, and it looks like this:
<script language="javascript" type="text/javascript" src="mycode.js" ></script>
While it is acceptable XML practice to use an empty tag instead of the opening and closing tags, most browsers don't recognize that approach and try to treat the rest of the document as a script. The script goes into a separate file named mycode.js:
function presentCount() {
counter="";
for (i=0; i<10; i++) {
counter=counter + " " + i;
}
alert (counter);
}
The document as a whole now reads:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/1999/PR-xhtml1-19991210/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Transitional XHTML - Phase 2</title>
<script language="javascript" type="text/javascript" src="mycode.js" ></script>
</head>
<body bgcolor="#FFFFFF">
<p><font face="Times" size="6" color="blue"><b> Transitional XHTML - Phase 2</b></font></p>
<p><a name="description" id="description">This document is transitional XHTML -
we'll see how it does in the browsers.</a></p>
<p>The cleanup shouldn't cause too many problems, we hope.</p>
<ul>
<li><a href="javascript:presentCount()">Click me for a count!</a></li>
<li><a href="query.htm?val1=1&amp;val2=2&amp;val3=3">Click here for a query!</a></li>
<li><a href="#description">Click here for a description of this page</a></li>
</ul>
<p>Copyright 2000 by the Wacki HTML Writer <br />
All rights reserved.</p>
</body>
</html>
Remedy 2: The Strict DTD and Entity Replacement
While the files produced using the first approach are valid XHTML, a little more work can produce documents that are easier to manage in the long run. This requires making a few more structural changes to the document and adding some cascading style sheets information. In your first pass, you convert the document to the HTML 4.01 strict DTD without worrying about XHTML. Then you convert it to XML in two slightly different ways. You also try a different approach in the scripts on the first XML pass – one that works well on XML processors but which still fails in most HTML processors.
