Converting to strict HTML and XHTML

an article added by: Albert Lichtblau at 06022007




Converting to strict HTML

You start out by declaring your intention to use the strict HTML 4.01 DTD by putting the appropriate DOCTYPE declaration at the head of the document:

   <!DOCTYPE HTML
   PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
  

Now the first section of the document, including the HTML opening tag and the HEAD element and its contents, is fine except for one line. The SCRIPT element no longer supports a LANGUAGE attribute – instead, a TYPE attribute containing a MIME content identifier (text/javascript) for the script is required:

   <HTML>
   <HEAD>
   <TITLE>Non-XHTML Strict  HTML</TITLE>
   <SCRIPT  TYPE="text/javascript">
   function presentCount() {
   counter="";
   for (i=0; i<10; i++) {
   counter=counter + " " + i;
   }
   alert (counter);
   }
   </SCRIPT>
 </HEAD>

Because this is still regular HTML and not XHTML, the < sign and the uppercase element names in the script are fine. When you read the BODY start tag and the headline, however, you should notice a problem. The BGCOLOR attribute of the BODY element isn't supported by the strict DTD and neither is the FONT element used for the headline.

There are two ways to handle this problem. The first approach simply moves the formatting information to a different place within the elements concerned: the STYLE attribute. This approach, called in-line styling, is more of a quick-fix solution. It solves the immediate problem of preserving formatting, but it doesn't make the document any more manageable in the long term. The new BODY start tag and headline look like this:

   <BODY  STYLE="background-color:#FFFFFF">
 <P STYLE="color:blue; font-family:Times,  serif; font-size:24pt"><b>Non-XHTML Strict  HTML</b></p>

The second solution separates the style information from the element markup entirely, putting it in its own place inside of the document's head element. This requires two steps. First, you clean up the elements using an H1 element in place of the p element (after all, this is a headline):

   <BODY>
 <H1>Strict Non-XHTML HTML</H1>

Next, you add a style element to the head element of the document, containing the same formatting information that appears in the style attributes. The style element uses cascading style sheets syntax to identify the elements to which the formatting is applied and to describe the formatting:

   <STYLE TYPE="text/css">
   BODY {background-color:#FFFFFF }
   H1 {color:blue; font-family:Times,  serif; font-size:24pt}
 </STYLE>

Because the information now is stored at the beginning of the document in a style element, you can use that formatting across elements anywhere in the document. While you might have only one H1 element in a given document, it isn't unusual for a document to have many copies of lower-level headings or other components. As phase 2 demonstrates, this approach also enables you to store style information in a form that can be shared across multiple documents. This makes it easy to define and modify a look for a set of documents. The next few paragraphs are fine as they stand.

   <P><A NAME="description">This document opens in most HTML browsers, but it is definitely not
   XHTML.</A>
   <P>The cleanup shouldn't cause too many problems, we hope.

The LI elements of the list need to be contained within a UL element. Now it's time to change the approach used by the link that calls the script. The earlier approach left the javascript: URL in the href attribute, but you change it here. First you use a span element to replace the a element, and use its onclick attribute to capture the event.

   <UL>
   <LI><SPAN ONCLICK="presentCount()">Click me for a count!</SPAN></LI>
 

Tip: For an explanation of why the use of javascript: URLs is discouraged, see http://lists.w3.org/Archives/Public/www-html/2000Feb/0039.html.

Although case is mixed up in the next few LI elements and their contents, these elements require very few changes. You need to replace the ampersands in the query string in the link with the &amp; entity, and you need to add a closing UL tag.

   <LI><a  href="query.htm?val1=1&amp;val2=2&amp;val3=3">Click here  for a query!</a>
   <li><a href="#description">Click  here for a description of this page</a>
 </UL>
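
To make the two list changes concrete, here is a rough before-and-after sketch. The "before" markup is an assumption about what the earlier version of the list looked like (a javascript: URL in the href attribute and raw ampersands in the query string); it is not quoted from the original document.

```html
<!-- Before (assumed form): javascript: URL in HREF, raw ampersands, no UL wrapper -->
<LI><A HREF="javascript:presentCount()">Click me for a count!</A>
<LI><A HREF="query.htm?val1=1&val2=2&val3=3">Click here for a query!</A>

<!-- After: the event moves to ONCLICK on a SPAN, each & becomes &amp;,
     and the items sit inside a UL element -->
<UL>
<LI><SPAN ONCLICK="presentCount()">Click me for a count!</SPAN></LI>
<LI><A HREF="query.htm?val1=1&amp;val2=2&amp;val3=3">Click here for a query!</A></LI>
</UL>
```

The browser still requests query.htm?val1=1&val2=2&val3=3; the &amp; form only affects how the URL is written in the markup.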

The remainder of the document is acceptable as is:

   <p>Copyright 2000 by the Wacki  HTML Writer <br>
   All rights reserved.
   </BODY>
   </HTML>

The document as a whole now looks like this:

   <!DOCTYPE HTML
   PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
   <HTML>
   <HEAD>
   <TITLE>Non-XHTML Strict HTML</TITLE>
   <SCRIPT TYPE="text/javascript">
   function presentCount() {
   counter="";
   for (i=0; i<10; i++) {
   counter=counter + " " + i;
   }
   alert (counter);
   }
   </SCRIPT>
   <STYLE TYPE="text/css">
   BODY {background-color:#FFFFFF }
   H1 {color:blue; font-family:Times, serif;  font-size:24pt}
   </STYLE>
   </HEAD>
   <BODY>
   <H1>Strict Non-XHTML HTML</H1>
   <P><A  NAME="description">This document opens in most HTML browsers, but  it is definitely not
   XHTML.</A>
   <P>The cleanup shouldn't cause too many  problems, we hope.
   <UL>
   <LI><SPAN  ONCLICK="presentCount()">Click me for a  count!</SPAN></LI>
   <LI><a  href="query.htm?val1=1&amp;val2=2&amp;val3=3">Click here  for a query!</a>
   <li><a  href="#description">Click here for a description of this  page</a>
   </UL>
   <p>Copyright 2000 by the Wacki HTML Writer  <br>
   All rights reserved.
   </BODY>
   </HTML>
 

Converting to strict XHTML

The conversion to strict HTML does a lot to simplify the process of converting to strict XHTML, but there's still a lot to do. For starters, you use the XML declaration and a different DOCTYPE declaration at the top of this document. The XML declaration enables you to declare the document encoding (which you do again in the head element) and the version of XML used, while the DOCTYPE declaration tells processors that this document will abide by the rules of the XHTML strict DTD:

   <?xml version="1.0" encoding="UTF-8"?>
   <!DOCTYPE html
   PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Once again, the HTML element needs some modification: you make it lowercase and add the XHTML namespace declaration (xmlns). Take the opportunity to add some information about the language this document uses (English), and do so using both the old-style HTML lang attribute and the XHTML xml:lang attribute.

   <html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">

The head element gets some extra information as well. While this addition isn't necessary to meet the demands of the strict DTD, it makes sense in the context of the strict approach and identifies the encoding used in this document to HTML browsers. Because meta is an empty element, it has to be closed with /> to keep the document well-formed:

   <head>
   <title>Strict XHTML - Phase 1</title>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

You can experiment with the script element in this document using a character entity to represent the < character rather than hiding the script within a CDATA section:

   <script type="text/javascript">
   function presentCount() {
   counter="";
   for (i=0; i&lt;10; i++) {
   counter=counter + " " + i;
   }
   alert (counter);
   }
 </script>
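
For comparison, the CDATA alternative mentioned above could look roughly like the sketch below. Putting JavaScript line comments in front of the CDATA markers is a common convention rather than something this article prescribes; it is assumed here so that browsers treating the page as plain HTML skip the markers as comments. How any particular older browser reacts to it has not been verified.

```html
<script type="text/javascript">
// <![CDATA[
function presentCount() {
  counter = "";
  // Inside a CDATA section the < sign can stay as-is; no &lt; is needed
  for (i = 0; i < 10; i++) {
    counter = counter + " " + i;
  }
  alert(counter);
}
// ]]>
</script>
```

An XML parser sees the script body as character data and hands it over unchanged, while the script itself stays readable as ordinary JavaScript.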

Using entities may prove easier in an XML-only context than with CDATA sections, but it may cause problems (as you'll see) in HTML browsers. You need to add a style element in the head as well.

   <style type="text/css">
   body {background-color:#FFFFFF }
   h1 {color:blue; font-family:Times, serif;  font-size:24pt}
   </style>
 </head>

You already cleaned up the architecture of the body and h1 elements, so just move them to lowercase.

   <body>
 <h1>Strict XHTML - Phase 1</h1>

The next element, the first paragraph, includes an anchor with a NAME attribute. Just like with the transitional version, you need to lowercase this and supplement it with an id attribute. The p element also needs a closing tag at the end of the paragraph.

   <p><a name="description"  id="description">This document is strict
   XHTML - we'll see how it does in the  browsers.</a></p>

(Yes, the text changed yet again.)

The next paragraph just needs you to make its P element into a lowercase p and give it a closing tag:

   <p>The cleanup shouldn't cause too many problems, we hope.</p>

You must put the list item elements that follow in lowercase and give them end tags. The br element following the first list item is unnecessary, so you can remove it. Otherwise, just make the markup lowercase and close the li element.

   <ul>
   <li><span onclick="presentCount()">Click me for a count!</span></li>

The rest of the conversion can follow the previously established pattern for the transitional DTD. The next two list items need end tags.

   <li><a href="query.htm?val1=1&amp;val2=2&amp;val3=3">Click here for a query!</a></li>
   <li><a href="#description">Click here for a description of this page</a></li>
   </ul>

At the end, you have a paragraph containing a line break. You need to add a closing tag for the p element and make the br element into an empty-element tag rather than just a start tag:

   <p>Copyright 2000 by the Wacki HTML Writer  <br />
   All rights reserved.</p>
   </body>
   </html>
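
As with the strict HTML version, it can help to see the pieces in one place. The listing below is a sketch assembled from the fragments in this section, not a listing from the original article. Two details are assumptions: it uses the DTD location from the final XHTML 1.0 Recommendation, and it writes the meta element in closed form with an unquoted charset value.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>Strict XHTML - Phase 1</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<script type="text/javascript">
function presentCount() {
counter="";
for (i=0; i&lt;10; i++) {
counter=counter + " " + i;
}
alert (counter);
}
</script>
<style type="text/css">
body {background-color:#FFFFFF }
h1 {color:blue; font-family:Times, serif; font-size:24pt}
</style>
</head>
<body>
<h1>Strict XHTML - Phase 1</h1>
<p><a name="description" id="description">This document is strict
XHTML - we'll see how it does in the browsers.</a></p>
<p>The cleanup shouldn't cause too many problems, we hope.</p>
<ul>
<li><span onclick="presentCount()">Click me for a count!</span></li>
<li><a href="query.htm?val1=1&amp;val2=2&amp;val3=3">Click here for a query!</a></li>
<li><a href="#description">Click here for a description of this page</a></li>
</ul>
<p>Copyright 2000 by the Wacki HTML Writer <br />
All rights reserved.</p>
</body>
</html>
```

The &lt; inside the script is correct for an XML parser, but a browser that treats the page as plain HTML passes it to the script engine literally, which is the kind of problem the article warns about for entities in scripts.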
 

Browser Testing

While the W3C's HTML Validation Service is a useful tool for making sure that documents conform to the specification, most of the documents created previously will have at least some problems in existing browsers. To demonstrate the kinds of problems you may encounter as you deploy XHTML, the next few pages show the results of running the original HTML, the strict HTML, and all of their variations through a variety of browsers of different vintages. No browser accepts every version, but you can see trends emerging over time.

The browsers tested here range from the obsolete to the experimental. While very few users still work with Netscape Navigator 1.22 (though it's still used on some older servers), its response to XHTML documents demonstrates how some aspects of the strict approach can make XHTML more palatable to even the oldest of commercial browsers. Newer browsers have an extraordinary number of quirks, which suggests that Web designers will need to test their work in multiple browsers for some time to come.

Because the Microsoft Internet Explorer versions tend to vary widely on different platforms, I provide samples for both Macintosh and Windows. The Netscape and Amaya browsers display the same results whatever operating system they use, so I show their results only for Windows NT and Windows 95.

Note: You can run these same sets of tests on your own browser. The test files are available at http://www.simonstl.com/xhtml/code/chap5/.

While the browser tests may not make the browsers look great at handling XHTML, this is hardly a knock on their performance. Most of these browsers were written well before XHTML even began to germinate, so you can't hold them responsible for ideas hatched long after their code was completed. This set of tests provides benchmarks you can use to determine your strategy for creating XHTML documents, not to evaluate browser performance.

Lessons

Creating XHTML that meets the W3C's specs clearly is not enough to achieve interoperability with older browsers. While you probably can disregard some of the experiments in very old browsers, there are still some tools with serious problems that you shouldn't ignore, and some issues (such as the use of CDATA sections in scripts) are here to stay. Although most designers now expect at least version 3 or version 4 functionality in their pages, the preceding examples demonstrate that even those browsers aren't really enough to handle full-fledged XHTML. In addition, many programs use toolkits for integrating browser functionality that support more or less the equivalent of Netscape 1 or 2.

Complying with both the W3C standards and the compatibility limitations of existing browsers is yet another painful challenge for Web developers, much like the early problems with JavaScript, dynamic HTML, and cascading style sheets. Making any of these technologies work in a mixed environment is difficult, and there are very few completely pure communities of users working with only the latest browsers.

The test suite does demonstrate some strategies for avoiding problems when creating XHTML pages that have to work in older browsers. First, it's easiest to move scripts outside of the document. Although Netscape 3.0 has some problems with this strategy, storing script elements inside the body element rather than the head seems to get around them. (This is acceptable to the XHTML DTD, although it's not exactly best practice in documents with lots of scripts at the front of the document.)

Similarly, cascading style sheets information should be stored in external files because doing so keeps the style rules from cluttering the top of the document. It may delay browser rendering if style sheet retrieval is slow, but it does make file management easier.

The XML declaration causes problems in many older browsers. (In some embedded browser tools, it even keeps documents from displaying at all.) Although leaving out the XML declaration effectively restricts documents to using UTF-8 or UTF-16, this may not pose a problem if you have tools for editing documents in these encodings. Java uses UTF-8 by default, and various Microsoft tools are capable of exporting Unicode text in these formats. If your pages can get by using the ASCII subset of characters (basically American English), your files will be UTF-8 compatible automatically. Users of the common Latin-1 or other character encodings may want to upgrade their tools so that they can save their files to UTF-8 or UTF-16 if the XML declaration display issue is a serious concern.
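
Taken together, these lessons suggest a page skeleton along the following lines. This is a minimal sketch, not a tested recipe: the file names count.js and styles.css and the page title are hypothetical, and the DTD location is the one from the final XHTML 1.0 Recommendation. The XML declaration is left out, so the file is assumed to be saved as UTF-8.

```html
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>Strict XHTML - external files</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<!-- styles.css (hypothetical) holds the body and h1 rules from the style element -->
<link rel="stylesheet" type="text/css" href="styles.css" />
<!-- count.js (hypothetical) holds the presentCount() function -->
<script type="text/javascript" src="count.js"></script>
</head>
<body>
<h1>Strict XHTML - external files</h1>
<p><span onclick="presentCount()">Click me for a count!</span></p>
</body>
</html>
```

Because the script lives in its own file, the question of how to write the < sign (entity or CDATA section) no longer arises in the markup. The script element is written with an explicit end tag rather than as an empty-element tag because HTML browsers may not recognize the latter for script.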
