Fixing Static HTML

an article added by: Albert Lichtblau at 06022007




The Big Clean-Up: Fixing Static HTML (The Easy Part)

  

Why Convert Existing HTML?

Before we get started, you may be asking yourself, "Why should I convert my Web site, which works perfectly fine in HTML, to this newfangled XHTML?" If you've made it this far into the article, your curiosity is most likely piqued. Still, here are some answers to soothe your fears and to help you justify the jump to XHTML to management. XHTML provides two main improvements over HTML:

- Cleaner structures (which make things such as dynamic HTML and styling a lot easier)
- The ability to use tools developed for XML

Cleaner structures are a basic convenience: they make it much easier for a browser to reference parts of a document reliably. They also avoid some of the crazier cross-browser hopscotching caused by different interpretations of structures. (Guessing what a document author intended is harder when parts are missing or misplaced.)

The ability to use XML tools opens the field to a lot of new possibilities: Scalable Vector Graphics for graphics and layout, MathML for math, SMIL for multimedia, MyML for my stuff, "YourML" for your stuff, and "HisML" and "HerML" and "TheirML." It's not just that developers will be able to extend the HTML vocabulary. They will be able to do it in controllable ways, and in a context in which clients, servers, and peers can negotiate the kinds of information they accept and make smarter decisions than today's browser and object sniffing techniques allow.

XHTML is also much more reusable than HTML. If you need to convert your documents to Wireless Markup Language or some other presentation variant, you can use scripts and the DOM, Extensible Stylesheet Language Transformations (XSLT), or other XML-based tools to handle the conversion. You can move from one structure to another without having to read the markup byte by byte and guess what's supposed to happen.

Finally, XHTML lets you keep your documents in XML repositories, which opens up easier and more powerful referencing, fragmenting, and searching. Instead of storing your static documents as plain old files, an XML repository stores them according to their XML structure. Such tools make it easier for multiple people to edit (and version) a document, and they can make analyzing your Web site a lot easier.

Some examples of situations in which converting is beneficial include:

- Customers who want the older content delivered as XHTML so they can use XML processing tools on it. (Say you're converting a Web site to a print article, and they want to use XSLT to convert it to PDF under their control.)
- New storage requirements. Somebody orders the site moved into an XML-aware content management system, perhaps to support versioning of documents or to add new functionality such as you can with Zope, a powerful Web application builder.
- Downtime (it happens once in a while). XHTML conversion can be a good, nonintrusive project for developers when they're stuck waiting for clients.
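To make the reusability point concrete, here is a minimal sketch of a DOM-style conversion. Python and its standard xml.etree.ElementTree module are choices made for this illustration, not tools the article prescribes. The sketch renames elements in the parsed tree instead of scanning the raw markup. The target names title, para, and break are hypothetical and invented for the example; they are not WML. To keep element names short, the fragment also omits the XHTML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical target vocabulary, invented for this example; these are not
# WML element names.
ELEMENT_MAP = {"h1": "title", "p": "para", "br": "break"}


def convert(markup: str, element_map: dict) -> str:
    """Rename elements by walking the parsed tree instead of scanning text."""
    root = ET.fromstring(markup)
    for el in root.iter():
        # Elements missing from the map keep their original names.
        el.tag = element_map.get(el.tag, el.tag)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    page = "<body><h1>BK's Home Page</h1><p>Hello<br />world</p></body>"
    print(convert(page, ELEMENT_MAP))
```

Run on the sample fragment, it prints `<body><title>BK's Home Page</title><para>Hello<break />world</para></body>`. The text and nesting survive unchanged, and only the element names differ. That kind of guarantee is hard to get from HTML that a parser has to guess at.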

Starting with Your HTML Document

Find a document authored in HTML. It should be somewhat complex, because you don't want your primary learning page to be something with only two lines beyond the basic HTML structure. If you don't have a page in mind, you can find the HTML page used in this example at http://www.zotgroup.com/development/test-suites/xhtml10/htmlto- xhtml.html. Here's the code:

   <HTML>
   <HEAD>
   <TITLE>BK's Home Page</TITLE>
   </HEAD>
   <BODY bgcolor=#ffffff>
   <FONT color=red><H1 align=center>BK's Home Page</H1></FONT>
   This is the homepage of BK DeLong. Enjoy your visit!<P>
   My EverQuest Friends:<BR>
   <A href="http://www.attrition.org/eq/crimen/"><B>Crimen Talionis</A></B><BR>
   MacIntyre<BR>
   Kiekre<BR>
   Krimzor<BR>
   Catya<P>
   <HR>
   B.K. DeLong<BR>
   <A href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</A>
   </BODY>
   </HTML>

Pretty nasty HTML. This is typical of a first Web site built by a beginner who learned HTML without any background in SGML and assumed HTML was meant for designing pages. (I'm sure it's quite similar to my own first Web site.) Now clean it up. You won't necessarily clean up your HTML in the same order you would build an XHTML file from scratch. Instead, start with the easiest things to check.

Step 1: All elements are lowercase

The most tedious part of fixing your HTML files is making sure that all element names are lowercase (attribute names must be lowercase in XHTML, too). The only text that should be uppercase is your own content. Some of the authoring tools discussed toward the end of this article can make this chore easier. Once you're finished, the file should look like this:

   <html>
   <head>
   <title>BK's Home Page</title>
   </head>
   <body bgcolor=#ffffff>
   <font color=red><h1 align=center>BK's Home Page</h1></font>
   This is the homepage of BK DeLong. Enjoy your visit!<p>
   My EverQuest Friends:<br>
   <a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</a></b><br>
   MacIntyre<br>
   Kiekre<br>
   Krimzor<br>
   Catya<p>
   <hr>
   B.K. DeLong<br>
   <a href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</a>
   </body>
   </html>
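If you have many files to convert, a short script can handle the bulk of this step. The Python sketch below is an illustration under stated assumptions, not a complete converter. The function name lowercase_tags is made up. It lowercases only element names, so attribute names such as BGCOLOR would still need fixing. It also assumes that no "<" followed by a letter appears inside scripts, comments, or attribute values.

```python
import re

# "<" or "</" followed by an element name. Declarations such as "<!DOCTYPE"
# and comment openers ("<!--") start with "!" and therefore never match.
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def lowercase_tags(html: str) -> str:
    """Lowercase element names in start and end tags; leave everything else."""
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), html)


if __name__ == "__main__":
    print(lowercase_tags("<BODY bgcolor=#ffffff><H1 align=center>BK's Home Page</H1>"))
```

Text such as "BK's Home Page" is untouched because only names directly after "<" or "</" are rewritten.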
 

Step 2: All attribute values must have quotes

This is another easy step, although it can be tedious, and spotting every unquoted attribute value can be tough. You may have to rely on the validation process to uncover the ones you miss.

   <html>
   <head>
   <title>BK's Home Page</title>
   </head>
   <body bgcolor="#ffffff">
   <font color="red"><h1 align="center">BK's Home Page</h1></font>
   This is the homepage of BK DeLong. Enjoy your visit!<p>
   My EverQuest Friends:<br>
   <a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</a></b><br>
   MacIntyre<br>
   Kiekre<br>
   Krimzor<br>
   Catya<p>
   <hr>
   B.K. DeLong<br>
   <a href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</a>
   </body>
   </html>
You're getting there.
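A similar Python sketch can handle most of Step 2. It finds each start tag and wraps unquoted attribute values in double quotes. The function name quote_attributes is made up. The regular expressions are assumptions that work on simple pages like this one. They can misfire on a quoted value that happens to contain text like ` name=value`, or on a tag with a ">" inside a quoted value; those cases need a real parser.

```python
import re

# A start tag: "<", a letter, then everything up to the next ">".
START_TAG = re.compile(r"<[A-Za-z][^>]*>")
# Whitespace, an attribute name, "=", then an unquoted value: a run of
# characters that are not whitespace, quotes, or ">".
UNQUOTED_ATTR = re.compile(r"(\s[A-Za-z][-A-Za-z0-9:]*)=([^\s\"'>]+)")


def quote_attributes(html: str) -> str:
    """Wrap every unquoted attribute value in double quotes."""
    def fix_tag(match):
        return UNQUOTED_ATTR.sub(r'\1="\2"', match.group(0))
    return START_TAG.sub(fix_tag, html)


if __name__ == "__main__":
    print(quote_attributes("<font color=red><h1 align=center>Hi</h1></font>"))
```

Already-quoted values, such as the href on the EverQuest link, are left alone because an unquoted value may not start with a quote character.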

Step 3: All elements must end

Now that you've completed the most basic cleanup of your HTML document, you need to start fixing the actual structure. A document isn't considered well formed or valid if elements that are started aren't closed. Cleanly authored XHTML (as with any markup language) makes it a lot easier to figure out what's going on in the document, so logically this is the next step of your cleanup.

Your HTML document has three elements that need attention. The two easiest are the "empty" elements <hr> and <br>. All you need to add to them is a space and a forward slash (/), so they look like this: <br />. Is the space required? No, but many older browsers read the tag name in <br/> as BR/ rather than BR. They don't know what a BR/ element is, but they do recognize BR when a space separates it from the slash.

The other element that needs closure is the <p> element. The first rule of markup is that it's meant to denote context, not design or layout. So, technically, a <p> should not be placed at the end of a paragraph just to create a carriage return. The <p> start tag belongs at the beginning of a paragraph, with a </p> at the end. After making these changes, you have better-formed HTML to work with.

   <html>
   <head>
   <title>BK's Home Page</title>
   </head>
   <body bgcolor="#ffffff">
   <font color="red"><h1 align="center">BK's Home Page</h1></font>
   <p>This is the homepage of BK DeLong. Enjoy your visit!</p>
   <p>
   My EverQuest Friends:<br />
   <a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</a></b><br />
   MacIntyre<br />
   Kiekre<br />
   Krimzor<br />
   Catya
   </p>
   <hr />
   B.K. DeLong<br />
   <a href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</a>
   </body>
   </html>
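The empty-element part of Step 3 is mechanical enough to script. This Python sketch rewrites <br> and <hr> into the form with the space before the slash. It accepts either case, with or without attributes or an existing slash. The function name close_empty_elements is made up. The sketch handles only these two elements because they are the only empty elements on this page; a general version would also need img, meta, link, input, and the others. Paragraphs still have to be fixed by hand, because deciding where a paragraph ends requires knowing what the author meant.

```python
import re

# <br> or <hr>, optional attributes, optional existing slash, then ">".
# Only the two empty elements used on this page are covered.
EMPTY = re.compile(r"<(br|hr)(\s[^>]*?)?\s*/?>", re.IGNORECASE)


def close_empty_elements(html: str) -> str:
    """Rewrite <br>, <BR/>, <hr size=2> and the like as <br />, <hr size=2 />."""
    def fix(match):
        name = match.group(1).lower()
        attrs = (match.group(2) or "").rstrip()
        return f"<{name}{attrs} />"
    return EMPTY.sub(fix, html)


if __name__ == "__main__":
    print(close_empty_elements("Catya<br>\n<HR>\nB.K. DeLong<br/>"))
```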

 

All elements must be in the right place

A few problems still remain in this document. First, you cannot have a heading element, <h1>, contained by a <font> element. It's not legal according to the XHTML 1.0 DTD (even the transitional one). To fix this, place the <font> element inside the <h1>:

   <h1 align="center"><font color="red">BK's Home Page</font></h1>

The next change is not required to make your document well formed or valid. However, because the <font> element is deprecated in favor of style sheets, you should change the line to read:

   <h1 style="text-align:center;color:red">BK's Home Page</h1>
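If you have many headings like this one, the change can be scripted. Here is a minimal Python sketch using xml.etree.ElementTree (an illustrative choice). The function name h1_font_to_style is made up. The sketch assumes it receives a single well-formed heading whose only child is a <font> holding plain text, as on this page. It also overwrites any style attribute the heading already has.

```python
import xml.etree.ElementTree as ET


def h1_font_to_style(h1: ET.Element) -> ET.Element:
    """Move align="..." and a lone <font color="..."> child into style="...".

    Sketch only: handles a <font> that is the heading's single child and
    contains plain text; anything else is returned unchanged. An existing
    style attribute would be overwritten.
    """
    children = list(h1)
    if len(children) != 1 or children[0].tag != "font" or len(children[0]):
        return h1
    font = children[0]
    rules = []
    align = h1.attrib.pop("align", None)
    if align:
        rules.append(f"text-align:{align}")
    color = font.get("color")
    if color:
        rules.append(f"color:{color}")
    # Keep all heading text: before, inside, and after the <font> element.
    h1.text = (h1.text or "") + (font.text or "") + (font.tail or "")
    h1.remove(font)
    if rules:
        h1.set("style", ";".join(rules))
    return h1


if __name__ == "__main__":
    heading = ET.fromstring(
        '<h1 align="center"><font color="red">BK\'s Home Page</font></h1>')
    print(ET.tostring(h1_font_to_style(heading), encoding="unicode"))
```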

You also need to fix the way you use the <b> element in the "My EverQuest Friends" list. Most people know by now that elements can't overlap in XHTML: an element that starts inside another element must also end inside it. The line in question currently reads like this:

   <a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</a></b><br />

You should modify it to look like this:

   <a href="http://www.attrition.org/eq/crimen/"><b>Crimen Talionis</b></a><br />
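An XML parser is a quick way to find overlapping elements like this one. This sketch uses Python's xml.etree.ElementTree (a choice made for illustration) to report where a fragment stops being well formed. The function name check_well_formed is made up. A well-formedness check catches overlaps and unclosed elements. It does not catch validity problems such as an <h1> inside a <font>; finding those requires validating against the DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup: str):
    """Return None if the markup parses as XML, else the error's (line, column)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return err.position
    return None


if __name__ == "__main__":
    overlapping = '<p><a href="x"><b>Crimen Talionis</a></b></p>'
    nested = '<p><a href="x"><b>Crimen Talionis</b></a></p>'
    print(check_well_formed(overlapping))  # a (line, column) tuple
    print(check_well_formed(nested))       # None
```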

Again, the <b> element deals with the style of text. While it's not required, I recommend changing the <b> element to a <strong> element, which better denotes its context. If you'd rather keep the bold styling and apply it inline instead, the line should look like this:

   <a href="http://www.attrition.org/eq/crimen/" style="font-weight:bold">Crimen Talionis</a><br />
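For the <strong> option mentioned above, the renaming is a one-liner once the markup parses as XML. This Python sketch (xml.etree.ElementTree, an illustrative choice; the function name b_to_strong is made up) renames every <b> in a tree and leaves text, attributes, and nesting untouched.

```python
import xml.etree.ElementTree as ET


def b_to_strong(root: ET.Element) -> ET.Element:
    """Rename every <b> element to <strong>, keeping its text and attributes."""
    # Collect the matches first, then rename them.
    for el in list(root.iter("b")):
        el.tag = "strong"
    return root


if __name__ == "__main__":
    line = ET.fromstring('<p><a href="x"><b>Crimen Talionis</b></a></p>')
    print(ET.tostring(b_to_strong(line), encoding="unicode"))
```

This sketch assumes markup without a namespace declaration. In a document that declares the XHTML namespace, both the search name and the new name would need the qualified form, such as {http://www.w3.org/1999/xhtml}b.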
 

Adding XHTML declarations and definitions

At this point, your core document structure is as XHTMLized as possible. Now you need to add the headers that identify this document as XHTML instead of HTML. First, you need to declare the XHTML namespace. As you may remember from previous articles, you do this by adding the xmlns attribute to the <html> element:

   <html xmlns="http://www.w3.org/1999/xhtml">
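Declaring the namespace changes how XML tools see every element. In this Python sketch (xml.etree.ElementTree, chosen for illustration; the function names are made up), the element names in a document with xmlns="http://www.w3.org/1999/xhtml" come back qualified with the namespace URI, so lookups must use that namespace too. The prefix x in the lookup is an arbitrary local choice, not something XHTML defines.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def element_names(markup: str) -> list:
    """List element names as the XML parser reports them."""
    return [el.tag for el in ET.fromstring(markup).iter()]


def page_title(markup: str):
    """Find <title> in an XHTML-namespaced document, or return None."""
    ns = {"x": XHTML_NS}  # "x" is an arbitrary prefix used only for lookup
    title = ET.fromstring(markup).find("x:head/x:title", ns)
    return None if title is None else title.text


if __name__ == "__main__":
    page = ('<html xmlns="http://www.w3.org/1999/xhtml">'
            "<head><title>BK's Home Page</title></head></html>")
    print(element_names(page))
    print(page_title(page))
```

Without the xmlns attribute, the same lookup finds nothing, because a plain head element is not the XHTML head element.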

Next, you need to make sure you declare the correct DOCTYPE for XHTML. Are you using strict, frameset, or transitional? In this case, I think you should stick with transitional, especially if you keep the <b> and <font> elements or presentational attributes such as bgcolor and align:

   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Last, add the XML declaration. An XHTML document doesn't require it as long as the document is encoded in UTF-8 or UTF-16, but I strongly encourage you to put it into all your XHTML documents so you stay in the habit of using it with other XML documents. The only issue worth mentioning with this declaration is that some older browsers may display it as text on the page.

   <?xml version="1.0"?>
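With the prolog in place, a script can read back which XHTML 1.0 flavor a page declares. This Python sketch uses xml.dom.minidom (an illustrative choice; the function name xhtml_flavor is made up). It reads the public identifier straight from the DOCTYPE declaration. minidom is a non-validating parser, so this tells you what the page claims to be, not whether it actually conforms to that DTD.

```python
from xml.dom import minidom

# Public identifiers of the three XHTML 1.0 DTDs.
FLAVORS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "frameset",
}


def xhtml_flavor(markup: str):
    """Return "strict", "transitional", "frameset", or None if not recognized."""
    doc = minidom.parseString(markup)
    if doc.doctype is None:
        return None
    return FLAVORS.get(doc.doctype.publicId)


if __name__ == "__main__":
    page = ('<?xml version="1.0"?>\n'
            '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
            '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
            '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>'
            '</head><body></body></html>')
    print(xhtml_flavor(page))
```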

Your final XHTML document should look like this now:

   <?xml version="1.0"?>
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
   <title>BK's Home Page</title>
   </head>
   <body bgcolor="#ffffff">
   <h1 align="center" style="color:red">BK's Home Page</h1>
   <p>This is the homepage of BK DeLong. Enjoy your visit!</p>
   <p>
   My EverQuest Friends:<br />
   <a href="http://www.attrition.org/eq/crimen/" style="font-weight:bold">Crimen Talionis</a><br />
   MacIntyre<br />
   Kiekre<br />
   Krimzor<br />
   Catya
   </p>
   <hr />
   B.K. DeLong<br />
   <a href="mailto:bkdelong@zotgroup.com">bkdelong@zotgroup.com</a>
   </body>
   </html>
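Because the finished page is well-formed XML, XML tools can now work with it directly, which is the payoff promised at the start of this article. As an illustration, this sketch pulls every link out of the page. Python's xml.etree.ElementTree is a choice made here, and list_links is a made-up name. In a real file, the XML declaration must be the very first thing, with no indentation or blank line before it.

```python
import xml.etree.ElementTree as ET

# ElementTree reports namespaced names as "{namespace-uri}localname".
A_TAG = "{http://www.w3.org/1999/xhtml}a"


def list_links(xhtml: str) -> list:
    """Return (href, link text) pairs for every <a> in an XHTML document."""
    root = ET.fromstring(xhtml)
    return [(a.get("href"), "".join(a.itertext()).strip())
            for a in root.iter(A_TAG)]
```

Run on the final document above, it should return the EverQuest link and the mailto link with their text. The same parse would fail outright on the original HTML, whose overlapping and unclosed elements are not well formed.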
 

Note: The DOCTYPE declaration at the beginning of the document is in uppercase. That doesn't break the lowercase rule, which applies to XHTML element and attribute names. DOCTYPE and PUBLIC are XML keywords and must be written in uppercase. The "html" in the DOCTYPE declaration, however, names the root element, so it must be lowercase to match <html>.
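The two case rules in the DOCTYPE are enforced differently, which this Python sketch demonstrates (xml.etree.ElementTree is an illustrative choice; the function name parses is made up). A non-validating XML parser rejects a lowercase doctype keyword outright. The same parser accepts a DOCTYPE whose name ("HTML") doesn't match the root element (html), because that mismatch is a validity error. Only a parser validating against the DTD reports it.

```python
import xml.etree.ElementTree as ET


def parses(markup: str) -> bool:
    """True if a non-validating XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(parses("<!DOCTYPE html><html/>"))  # keyword in uppercase
    print(parses("<!doctype html><html/>"))  # keyword in lowercase
    print(parses("<!DOCTYPE HTML><html/>"))  # name does not match the root
```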
