Fixing Static HTML

an article added by: Albert Lichtblau at 06022007




The Big Clean-Up: Fixing Static HTML (The Easy Part)

Why Convert Existing HTML?

Before we get started, you may be asking yourself, "Why should I convert my Web site, which works perfectly fine in HTML, to this newfangled XHTML?" If you've made it this far into the article, your curiosity is most likely piqued. Still, here are some answers to soothe your fears and to help you justify the jump to XHTML to management. XHTML provides two main improvements over HTML:

- Cleaner structures (which make things such as dynamic HTML and styling a lot easier)
- The ability to use tools developed for XML

Cleaner structures are a basic convenience: they make it much easier for a browser to reference parts of a document reliably. They also avoid some of the crazier cross-browser hopscotching caused by different interpretations of structures. (Guessing what a document author intended is harder when parts are missing or misplaced.)

The ability to use XML tools opens the field to a lot of new possibilities, from Scalable Vector Graphics for graphics and layout, to MathML for math, to SMIL for multimedia, to MyML for my stuff and "YourML" for your stuff and "HisML" and "HerML" and "TheirML." It's not just that developers will be able to extend the HTML vocabulary. They will be able to do it in controllable ways and in a context in which clients, servers, and peers can negotiate the kind of information they accept and make smarter decisions than today's browser and object sniffing techniques allow.

XHTML is also much more reusable than HTML. If you need to convert your documents to Wireless Markup Language (WML) or some other presentation variant, you can use scripts and the DOM, Extensible Stylesheet Language Transformations (XSLT), or other XML-based tools to handle the conversion. You can move from one structure to another without having to read the markup byte by byte and guess what's supposed to happen.

Finally, XHTML lets you keep your documents in XML repositories, opening up easier and more powerful referencing, fragmenting, and searching possibilities. Instead of storing your static documents as plain old files, an XML repository stores them according to their XML structure. Such tools make it easier for multiple people to edit (and version) a document, and they can make analyzing your Web site a lot easier.

Some situations in which converting is beneficial include:

- Customers who want the older content delivered as XHTML so they can use XML processing tools on it. (Say you're converting a Web site to a print article, and they want to use XSLT to convert it to PDF under their control.)
- New storage requirements. Somebody orders the site moved into an XML-aware content management system, perhaps to support versioning of documents or to add functionality such as that provided by Zope, a powerful Web application builder.
- Downtime (it happens once in a while). XHTML conversion can be a good, nonintrusive project for developers who are stuck waiting for clients.
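The payoff from XML tooling is easy to see once a page is well formed. The following is a minimal sketch, not part of the original article, that assumes Python 3 and its standard-library `xml.etree.ElementTree` module; the sample markup and the `list_links` helper are hypothetical illustrations. It parses a small XHTML fragment and lists every link, a job a strict XML parser refuses to do on markup that isn't well formed.

```python
import xml.etree.ElementTree as ET

# Namespace URI that XHTML documents declare on their <html> element.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# A small, well-formed XHTML fragment (hypothetical sample, modeled on
# the page converted later in this article).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>BK's Home Page</title></head>
<body>
<p><a href="http://www.attrition.org/eq/crimen/">Crimen Talionis</a><br />
<a href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</a></p>
</body>
</html>"""


def list_links(xhtml_text):
    """Return (href, link text) pairs for every <a> element in the document."""
    root = ET.fromstring(xhtml_text)
    links = []
    # iter() walks the whole tree; XHTML element names carry the namespace.
    for anchor in root.iter(XHTML_NS + "a"):
        links.append((anchor.get("href"), "".join(anchor.itertext())))
    return links


if __name__ == "__main__":
    for href, text in list_links(SAMPLE):
        print(href, "->", text)
```

The same parsed tree could just as easily be fed to a transformation step; the point is that no guessing about the markup is needed.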

Starting with Your HTML Document

Find a document authored in HTML. It should be somewhat complex: for your primary learning page, you don't want something with only two lines beyond the basic HTML structure. If you don't have a page in mind, you can find the HTML page used in this example at http://www.zotgroup.com/development/test-suites/xhtml10/htmlto-xhtml.html. Here's the code:

   <HTML>
   <HEAD>
   <TITLE>BK's Home Page</TITLE>
   </HEAD>
   <BODY bgcolor=#ffffff>
   <FONT color=red><H1 align=center>BK's Home Page</H1></FONT>
   This is the homepage of BK DeLong. Enjoy your visit!<P>
   My EverQuest Friends:<BR>
   <A href="http://www.attrition.org/eq/crimen/"><B>Crimen Talionis</A></B><BR>
   MacIntyre<BR>
   Kiekre<BR>
   Krimzor<BR>
   Catya<P>
   <HR>
   B.K. DeLong<BR>
   <A href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</A>
   </BODY>
   </HTML>

Pretty nasty HTML. For beginners who learned HTML without any background in SGML and thought it was meant for designing pages, this is typical of a first Web site. (I'm sure it's quite similar to my own first Web site.) Now clean it up. You won't necessarily clean up your HTML in the same order you'd build an XHTML file from scratch; instead, start with the things that are easiest to check.
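To see why the page counts as "nasty" from an XML point of view, you can hand it to an XML parser. This is a minimal sketch assuming Python 3's standard `xml.etree.ElementTree`; `ORIGINAL` is an abbreviated copy of the page above and `is_well_formed` is a hypothetical helper. The uppercase names don't trip the parser (lowercase is an XHTML rule, not an XML one), but Expat, the parser behind ElementTree, stops at the first real problem it finds, here the unquoted `bgcolor` value, so expect to fix errors one at a time.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the original page: uppercase tags, unquoted
# attribute values, and an unclosed <P> element.
ORIGINAL = """<HTML>
<HEAD>
<TITLE>BK's Home Page</TITLE>
</HEAD>
<BODY bgcolor=#ffffff>
<FONT color=red><H1 align=center>BK's Home Page</H1></FONT>
This is the homepage of BK DeLong. Enjoy your visit!<P>
</BODY>
</HTML>"""


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


if __name__ == "__main__":
    ok, message = is_well_formed(ORIGINAL)
    print("well formed" if ok else "not well formed: " + message)
```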

Step 1: All elements are lowercase

The most tedious part of working with your HTML files is making sure that all the elements are lowercase. The only text that should be uppercase is your own content. Some of the authoring tools discussed toward the end of this article can make this difficult task easier. Once you're finished, the file should look like this:

   <html>
   <head>
   <title>BK's Home Page</title>
   </head>
   <body bgcolor=#ffffff>
   <font color=red><h1 align=center>BK's Home Page</h1></font>
   This is the homepage of BK DeLong. Enjoy your visit!<p>
   My EverQuest Friends:<br>
   <a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</a></b><br>
   MacIntyre<br>
   Kiekre<br>
   Krimzor<br>
   Catya<p>
   <hr>
   B.K. DeLong<br>
   <a href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</a>
   </body>
   </html>
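If you'd rather not retype every tag, a regular expression can do the bulk of Step 1. This is a rough sketch assuming Python 3's `re` module; `lowercase_element_names` is a hypothetical helper, not one of the authoring tools the article mentions. It only touches element names, so attribute names (which XHTML also requires in lowercase) still need checking, and it will misfire on a literal `<` followed by a letter inside script text.

```python
import re

# "<" or "</" followed by an element name. Declarations such as
# <!DOCTYPE ...> and <?xml ...?> don't match, because the character
# after "<" is not a letter.
TAG_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def lowercase_element_names(html):
    """Lowercase element names in start and end tags; leave the rest alone."""
    return TAG_NAME.sub(lambda m: "<" + m.group(1) + m.group(2).lower(), html)


if __name__ == "__main__":
    line = "<FONT color=red><H1 align=center>BK's Home Page</H1></FONT>"
    print(lowercase_element_names(line))
    # <font color=red><h1 align=center>BK's Home Page</h1></font>
```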
 

Step 2: All attribute values must have quotes

This is another easy step, although it can be tedious, and spotting every unquoted attribute value can be tough. You may have to rely on the validation process to uncover the ones you miss.

   <html>
   <head>
   <title>BK's Home Page</title>
   </head>
   <body bgcolor="#ffffff">
   <font color="red"><h1 align="center">BK's Home Page</h1></font>
   This is the homepage of BK DeLong. Enjoy your visit!<p>
   My EverQuest Friends:<br>
   <a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</a></b><br>
   MacIntyre<br>
   Kiekre<br>
   Krimzor<br>
   Catya<p>
   <hr>
   B.K. DeLong<br>
   <a href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</a>
   </body>
   </html>
You're getting there.
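Step 2 can also be roughed out in code. The sketch below assumes Python 3's `re` module and uses a hypothetical helper, `quote_attribute_values`, that looks only inside start tags and wraps any unquoted value in double quotes. It's a heuristic, not a parser: a quoted value that itself contains text shaped like ` name=value` would be mangled, so review the result or run it through a validator.

```python
import re

# A start tag: "<" plus a letter, then anything except ">" up to the ">".
START_TAG = re.compile(r"<[A-Za-z][^>]*>")
# name=value where the value does not begin with a quote character.
UNQUOTED_ATTR = re.compile(r"(\s[A-Za-z_:][-A-Za-z0-9_:.]*)=([^\s\"'>]+)")


def quote_attribute_values(html):
    """Wrap unquoted attribute values in double quotes, inside start tags only."""
    def fix_tag(match):
        return UNQUOTED_ATTR.sub(r'\1="\2"', match.group(0))
    return START_TAG.sub(fix_tag, html)


if __name__ == "__main__":
    print(quote_attribute_values("<body bgcolor=#ffffff>"))
    # <body bgcolor="#ffffff">
```

Restricting the substitution to start tags keeps ordinary text such as `x=1` in a paragraph from being touched.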

Step 3: All elements must end

Now that you've completed the most basic cleanup of your HTML document, you need to start fixing the actual structure. A document isn't well formed, and so can't be valid, if elements that are started aren't closed. Cleanly authored XHTML (as with any markup language) makes it a lot easier to figure out what's going on in the document, so logically this is the next step of your cleanup.

There are three elements in your HTML document that you need to address. The two easiest are the "empty" elements <hr> and <br>. All you need to add to them is a space followed by a slash (/) so they look like this: <br />. Is the space required? Not by XML, but some older browsers treat <br/> as an unknown element named BR/ rather than as a <br> element. The browser doesn't know what a BR/ element is, but it recognizes the BR if a space separates it from the slash.

The other element that needs closing is <p>. The first rule of markup is that it's meant to denote context, not design or layout. So, strictly speaking, the <p> element should not sit at the end of a paragraph to create a line break. Instead, <p> belongs at the beginning of a paragraph, with </p> at the end. After making these changes, you have better-formed HTML to work with.

   <html>
   <head>
   <title>BK's Home Page</title>
   </head>
   <body bgcolor="#ffffff">
   <font color="red"><h1 align="center">BK's Home Page</h1></font>
   <p>This is the homepage of BK DeLong. Enjoy your visit!</p>
   <p>
   My EverQuest Friends:<br />
   <a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</a></b><br />
   MacIntyre<br />
   Kiekre<br />
   Krimzor<br />
   Catya
   </p>
   <hr />
   B.K. DeLong<br />
   <a href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</a>
   </body>
   </html>
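The empty-element part of Step 3 is mechanical enough to script. This is a minimal sketch assuming Python 3's `re` module; `close_empty_elements` is a hypothetical helper that handles only `<br>` and `<hr>` without attributes, writing them in the `<br />` form with the space the article recommends for older browsers. Other empty elements such as `<img>`, `<meta>`, and `<link>` need the same treatment, and moving `<p>` to the start of each paragraph is a judgment call best made by hand.

```python
import re

# <br>, <br/>, <br />, <BR>, and the same forms of <hr>, in any letter case.
EMPTY_TAG = re.compile(r"<(br|hr)\s*/?>", re.IGNORECASE)


def close_empty_elements(html):
    """Rewrite attribute-less <br> and <hr> tags in the '<br />' form."""
    return EMPTY_TAG.sub(lambda m: "<" + m.group(1).lower() + " />", html)


if __name__ == "__main__":
    print(close_empty_elements("Kiekre<br>\nKrimzor<br>\n<hr>"))
```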

 

All elements must be in the right place

A few problems still remain in this document. First, you cannot have a heading element, <h1>, contained by a <font> element. It's not legal according to the XHTML 1.0 DTDs (not even the transitional one). To fix this, place the <font> element inside the <h1>:

 <h1 align="center"><font color="red">BK's Home Page</font></h1>

Going further isn't required to make your document well formed or valid, but because the <font> element is deprecated in favor of style sheets, you should change the line to read:

 <h1 style="text-align:center;color:red">BK's Home Page</h1>

You also need to fix the way you use the <b> element in the "My EverQuest Friends" list. As most people know by now, elements can't overlap in XHTML. The line in question currently reads like this:

 <a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</a></b><br />

You should modify it to look like this:

 <a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</b></a><br />

Again, the <b> element deals with the style of text. While it's not required, I recommend replacing <b> with a <strong> element, which better denotes its context. If you'd rather set the style inline, the line should look like this:

   <a href="http://www.attrition.org/eq/crimen/" style="font-weight:bold">Crimen Talionis</a><br />
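You can confirm that an XML parser cares about overlap, while HTML browsers usually shrug it off, with a quick check. This sketch assumes Python 3's `xml.etree.ElementTree`; `parses` is a hypothetical helper, and the two strings are the before and after versions of the EverQuest line wrapped in a `<p>`.

```python
import xml.etree.ElementTree as ET

# The EverQuest line before and after fixing the <a>/<b> overlap.
OVERLAPPING = '<p><a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</a></b></p>'
NESTED = '<p><a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</b></a></p>'


def parses(fragment):
    """True if the fragment is well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("overlapping:", parses(OVERLAPPING))  # False (mismatched tag)
    print("nested:", parses(NESTED))            # True
```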
 

Adding XHTML declarations and definitions

At this point, your core document structure is as XHTML-ized as possible. Now you need to add the declarations that identify this document as XHTML instead of HTML. First, declare the XHTML namespace. As you may remember from previous articles, you do this by adding the xmlns attribute to the <html> element:

 <html xmlns="http://www.w3.org/1999/xhtml">
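Declaring the namespace is what lets XML tools recognize these elements as XHTML rather than as arbitrary elements that happen to be named `html`. A minimal sketch, assuming Python 3's `xml.etree.ElementTree`: the parser reports namespaced names as `{namespace-URI}local-name`, and child elements inherit the default namespace declared on `<html>`. The prefix `x` below is an arbitrary, hypothetical choice used only for the lookup.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

without_ns = ET.fromstring("<html><head><title>t</title></head></html>")
with_ns = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head></html>'
)

# ElementTree writes namespaced names as "{namespace-URI}local-name".
print(without_ns.tag)  # html
print(with_ns.tag)     # {http://www.w3.org/1999/xhtml}html

# Descendants inherit the default namespace declared on <html>, so the
# lookup must use it too; "x" is just a local prefix for the search.
title = with_ns.find("x:head/x:title", {"x": XHTML})
print(title.text)      # t
```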

Next, make sure you declare the correct DOCTYPE for XHTML. Are you using strict, frameset, or transitional? In this case, I think you should stick with transitional, especially if you keep the <b> and <font> elements or presentational attributes such as bgcolor and align:

   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
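The DOCTYPE carries three pieces: the root element name, the public identifier, and the system identifier (the DTD's URL). As a sketch assuming Python 3's standard `xml.dom.minidom`, you can read those pieces back from a parsed document. minidom does not download the DTD, so this only reports which DTD is declared; it does not validate against it.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body></body></html>"""

# minidom records the DOCTYPE parts without fetching the DTD itself.
doc = minidom.parseString(DOC)
print(doc.doctype.name)      # html
print(doc.doctype.publicId)  # -//W3C//DTD XHTML 1.0 Transitional//EN
print(doc.doctype.systemId)  # http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
```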

Last, add the XML declaration. It isn't required in an XHTML document, but I strongly encourage you to put it into all your XHTML documents so you stay in the habit of using it with other XML documents. The only issue worth mentioning is that some older browsers may display the declaration as text.

 <?xml version="1.0"?>

Your final XHTML document should look like this now:

   <?xml version="1.0"?>
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
   <title>BK's Home Page</title>
   </head>
   <body bgcolor="#ffffff">
   <h1 align="center" style="color:red">BK's Home Page</h1>
   <p>This is the homepage of BK DeLong. Enjoy your visit!</p>
   <p>
   My EverQuest Friends:<br />
   <a href="http://www.attrition.org/eq/crimen/" style="font-weight:bold">Crimen Talionis</a><br />
   MacIntyre<br />
   Kiekre<br />
   Krimzor<br />
   Catya
   </p>
   <hr />
   B.K. DeLong<br />
   <a href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</a>
   </body>
   </html>
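Before moving on, it's worth confirming that the result really is well formed. This sketch assumes Python 3's `xml.etree.ElementTree`; `FINAL` is a copy of the document above. `ET.fromstring` raises `ET.ParseError` on the first well-formedness error, but it does not check the document against the XHTML DTD, so run it through a validating tool such as the W3C Markup Validation Service as well.

```python
import xml.etree.ElementTree as ET

FINAL = """<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>BK's Home Page</title>
</head>
<body bgcolor="#ffffff">
<h1 align="center" style="color:red">BK's Home Page</h1>
<p>This is the homepage of BK DeLong. Enjoy your visit!</p>
<p>
My EverQuest Friends:<br />
<a href="http://www.attrition.org/eq/crimen/" style="font-weight:bold">Crimen Talionis</a><br />
MacIntyre<br />
Kiekre<br />
Krimzor<br />
Catya
</p>
<hr />
B.K. DeLong<br />
<a href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</a>
</body>
</html>"""

NS = {"x": "http://www.w3.org/1999/xhtml"}

# Raises ET.ParseError if the document is not well formed.
root = ET.fromstring(FINAL)
print(root.tag)
print([a.get("href") for a in root.iterfind(".//x:a", NS)])
```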
 

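Note: Notice that the DOCTYPE keyword at the beginning of the document is uppercase. The DOCTYPE declaration isn't an element, so the lowercase rule doesn't apply to it; in fact, XML requires the keywords DOCTYPE and PUBLIC to be written in uppercase. However, the "html" identifier in the DOCTYPE declaration should be lowercase to match the <html> root element, because XML names are case-sensitive.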
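The case rule for DOCTYPE is easy to verify. In this sketch, assuming Python 3's `xml.etree.ElementTree` and a hypothetical `parses` helper, the same document is parsed with an uppercase and a lowercase declaration; the XML grammar spells the `DOCTYPE` and `PUBLIC` keywords in uppercase, so only the first version is well formed.

```python
import xml.etree.ElementTree as ET

BODY = '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body></body></html>'
IDS = ('"-//W3C//DTD XHTML 1.0 Transitional//EN" '
       '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"')

UPPER = "<!DOCTYPE html PUBLIC " + IDS + ">" + BODY
LOWER = "<!doctype html public " + IDS + ">" + BODY


def parses(text):
    """True if the text is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print("<!DOCTYPE ...>:", parses(UPPER))  # True
    print("<!doctype ...>:", parses(LOWER))  # False
```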
