Coding Styles: HTML's Maximum Flexibility
The XHTML 1.0 specification provides a set of rules for XHTML software (under the heading User Agent Conformance) that includes a rough description of how XHTML software differs from HTML software, though these rules exist mostly to bring XHTML rendering practice in line with the rules for parsing XML 1.0. XHTML is also designed to remain (mostly) compatible with the previous generation of HTML applications, so the transition may take a while. Pure XHTML user agents (also known as XHTML processing software) aren't likely to be useful for some time, at least without some kind of conversion process that lets the enormous amount of legacy HTML enter in some form.
Developers who want to build XHTML processors can get started with the wide variety of tools available from XML sources. Parsers, various kinds of processors, integration with databases and object structures, transformation engines, and more are often available as open source. Building XHTML applications generally involves integrating tools and making them meet your needs – more so than starting from scratch to build a piece of software that understands everything about XHTML. While the legacy HTML problem remains daunting for now, the tools and techniques discussed in the chapters to follow help you get over those hurdles and enable you to start applying these kinds of techniques to your daily Web site work. As XHTML becomes more widespread, vendors hopefully will provide many of the tools just described to enable you to work more efficiently without having to build your own tools.
Tip: If you need to track down XML development tools and software, try http://www.xmlsoftware.com. For news on the latest emerging tools, go to http://www.xmlhack.com. For coverage of XML application design, read Building XML Applications by Simon St. Laurent and Ethan Cerami (McGraw-Hill, 1999) or XML and Java by Hiroshi Maruyama, Kent Tamura, and Naohiko Uramoto (Addison-Wesley, 1999).
Understood Omissions: Leaving Out Endings

HTML picked up a convenient trick from SGML: enabling developers to leave out end tags in many cases. This trick works best when it's obvious that one element can't contain another and must end before the second element starts. For example, it doesn't make sense for one paragraph to contain another paragraph, so the beginning of a new paragraph is treated as the end of any previous paragraph. For example,
<p>
As more and more people create vocabularies, a certain amount of standardization will no doubt emerge, based on the convenience factor it promises. While mapping information between schemas may not be terribly difficult, common vocabularies promise to reduce the need to do such work at all. Rather than starting with a complete vocabulary, however, a distributed approach would let people build their own vocabularies and gradually map their intersections into 'suggested' conventions.
</p>
<p>
While this approach might take longer than an expert community developing standards, it might also better reflect the needs of all involved. Experts might well have a role in exploring intersections and developing solutions that will be optimal, for a time, but the point is to leave final decision making with users rather than strapping them into a straitjacket someone else built.
</p>
<p>...
The end tags for the paragraphs (</p>) are optional, so the browser treats them as being there whether or not they actually appear. (Sometimes browsers present information slightly differently depending on the details of the markup.) The same thing happens within lists, where the </li> end tags are optional, as shown here:
<ul>
<li>bananas</li>
<li>apples</li>
<li>oranges</li>
<li>persimmons</li>
</ul>
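The "a new element ends the previous one" rule can be sketched in a few lines of Python. This is only an illustration built on the standard library's html.parser module, not how any particular browser works; the class name, the helper function, and the two-element rule set (p and li) are assumptions made for this example.

```python
from html.parser import HTMLParser

# Simplified, assumed rule set: a start tag for one of these elements
# implicitly closes an element of the same kind that is still open.
CLOSED_BY_REPEAT = {"p", "li"}


class ImpliedEndTagParser(HTMLParser):
    """Records a tag event stream with omitted end tags filled in."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.events = []         # normalized stream, e.g. ["<p>", "</p>"]

    def handle_starttag(self, tag, attrs):
        # A new <p> or <li> ends the previous one if it is the innermost open element.
        if tag in CLOSED_BY_REPEAT and self.open_elements[-1:] == [tag]:
            self.open_elements.pop()
            self.events.append(f"</{tag}>")
        self.open_elements.append(tag)
        self.events.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            return  # stray end tag: ignore it
        # Close everything down to and including the named element.
        while self.open_elements:
            closed = self.open_elements.pop()
            self.events.append(f"</{closed}>")
            if closed == tag:
                break


def normalize(html):
    parser = ImpliedEndTagParser()
    parser.feed(html)
    parser.close()
    return parser.events


with_end_tags = normalize("<ul><li>bananas</li><li>apples</li></ul>")
without_end_tags = normalize("<ul><li>bananas<li>apples</ul>")
print(with_end_tags == without_end_tags)
```

Under these assumptions both spellings of the list produce the same event stream, which is the sense in which the end tags are "understood" even when they are left out.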
Although paragraphs and lists are fairly simple cases, similar things happen throughout HTML in most browsers, despite subtle variations in the rules for interpreting them. The following code adds an open b element, which appears in the third line of code (but is never closed).
<html>
<body>
<p>Hello! <b>This is a stickup!
<p>Hand over all your money.
<h2>I mean it!</h2>
<p>Thank you for your time.
</body></html>
Tip: Although it hasn't taken the world by storm, the Amaya browser is an incredibly useful tool for learning how the W3C sees the world. While Amaya hasn't implemented W3C specifications completely, it sticks much closer to the letter of the spec than any of its commercial competitors and is driven by the W3C's agenda. It also now supports XHTML, making it the first browser to do so. You can find out more about Amaya at http://www.w3.org/Amaya/.

Developers who rely on HTML browsers to fill in their end tags have encountered these kinds of issues for a while. Making dynamic HTML work (even in a single browser) sometimes requires cleaning up documents to clarify their structure; style sheets that rely on document structure to apply formatting often have similar problems. Still, letting the browser figure out where an element ends is a common (and successful) practice, and it is built into HTML tools of all shapes and sizes.
Note: Some HTML browsers took advantage of the loose structure of HTML to produce special effects. For instance, Netscape enabled developers to flash background colors using multiple BODY opening tags that specify different colors. Most of these effects aren't in common use anymore, and some of them were declared bugs. Generally, scripting techniques that accomplish pretty much the same things in more structured ways replaced them.
Overlaps

Most HTML browsers do more than just close your tags automatically; they also support more complex markup such as overlapping tags, where one element starts inside another but ends outside it. Such structures are common in HTML documents, produced by tools as well as by hand-coding.
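As a hypothetical illustration (the markup sample below is invented for this sketch, not taken from the original text), overlapping tags can be spotted with a short Python script built on the standard html.parser module. The class and function names are assumptions, and the check is deliberately naive: it treats every start tag as needing an end tag, so documents that also omit optional end tags or use empty elements such as BR would produce false reports.

```python
from html.parser import HTMLParser


class OverlapFinder(HTMLParser):
    """Reports end tags that close an element other than the innermost open one."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of open tag names
        self.overlaps = []       # (closed tag, tags still open inside it)

    def handle_starttag(self, tag, attrs):
        self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            return  # stray end tag: ignore it
        # Locate the most recently opened element with this name.
        index = len(self.open_elements) - 1 - self.open_elements[::-1].index(tag)
        still_open = self.open_elements[index + 1:]
        if still_open:
            self.overlaps.append((tag, still_open))
        del self.open_elements[index]


def find_overlaps(html):
    finder = OverlapFinder()
    finder.feed(html)
    finder.close()
    return finder.overlaps


# Hypothetical sample: the b element ends while the i element inside it is still open.
sample = "<p><b>bold <i>bold italic</b> italic</i></p>"
print(find_overlaps(sample))
```

For the sample above the script reports that b was closed while i was still open; properly nested markup yields an empty list.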
Abbreviated Attributes

HTML supports a feature from SGML that enables document creators to include the name of an attribute without any value. This feature exists even in the "strict" version of HTML 4.0. For example, the checked and disabled attributes of checkboxes (or any input component) can be written like this:
<input type="checkbox" checked disabled>
HTML 4.0's transitional version (and most browsers) also supports a compact attribute for lists:
<ul compact>
<li>Squeezed tight!</li>
</ul>
Even though no value is provided for these attributes, browsers note their existence. (It actually doesn't matter which value you provide!) If a compact attribute appears at all, the browser may display the list in a more compact form.

HTML also enables developers to omit quotes around attribute values. While the quotes are necessary for values that contain spaces, they aren't required for simple values made up only of letters, digits, hyphens, periods, underscores, and colons. You also can write the input element just shown like this:
<input type=checkbox checked disabled>
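A quick way to see that the quoted and unquoted spellings carry the same information is to run both through a parser. The sketch below uses Python's standard html.parser module, which reports a value-less attribute with the value None; the class and helper names are assumptions made for this example, and expand_minimized simply shows the fully spelled-out name="name" form.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Collects (tag, attribute list) pairs for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def collect_attributes(html):
    collector = AttributeCollector()
    collector.feed(html)
    collector.close()
    return collector.seen


def expand_minimized(attrs):
    """Give each value-less attribute its own name as its value."""
    return [(name, name if value is None else value) for name, value in attrs]


quoted = collect_attributes('<input type="checkbox" checked disabled>')
unquoted = collect_attributes("<input type=checkbox checked disabled>")
print(quoted == unquoted)
print(expand_minimized(quoted[0][1]))
```

Both calls yield the same attribute list, with checked and disabled present but carrying no value.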
Multiple Names

There are two separate mechanisms within HTML for identifying particular elements. The first, which comes from HTML's hyperlinking within documents, uses the A element and a NAME attribute to identify a position in a document:
<H2><A NAME="Section1_1">1.1 Conformance</A></H2>
The second flavor of identification, used most frequently in dynamic HTML implementations, uses ID attributes on elements to identify them to scripts:
<H2 ID="Section1_1">1.1 Conformance</H2>
While both of these attributes identify content within documents, they remain separate pieces in HTML. This enables hypertext link managers and script developers to stay out of each other's way.
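The separation between the two mechanisms can be made concrete with a small sketch that gathers link targets (A elements with NAME) and script targets (any element with ID) into two independent lists. It uses Python's standard html.parser module; the class name, function name, and the idea of keeping two lists are assumptions for illustration only.

```python
from html.parser import HTMLParser


class TargetCollector(HTMLParser):
    """Keeps hyperlink anchors and script identifiers in separate lists."""

    def __init__(self):
        super().__init__()
        self.link_targets = []    # values of NAME on A elements
        self.script_targets = []  # values of ID on any element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # the parser lowercases tag and attribute names
        if tag == "a" and attributes.get("name"):
            self.link_targets.append(attributes["name"])
        if attributes.get("id"):
            self.script_targets.append(attributes["id"])


def collect_targets(html):
    collector = TargetCollector()
    collector.feed(html)
    collector.close()
    return collector.link_targets, collector.script_targets


page = (
    '<H2><A NAME="Section1_1">1.1 Conformance</A></H2>'
    '<H2 ID="Section1_1">1.1 Conformance</H2>'
)
link_targets, script_targets = collect_targets(page)
print(link_targets, script_targets)
```

The same value shows up once in each list, because nothing in HTML ties the NAME on an anchor to the ID on an element.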
Tag Soup

HTML browsers typically ignore any elements or attributes that they don't recognize. This makes the development of new versions of HTML much simpler because older browsers don't have problems digesting new code. At the same time, it enables browser vendors to modify the language. They can add new features such as BLINK, MARQUEE, and LAYER without fearing that they might set off catastrophic problems for users of other browsers. While these vendor-centric creations may cause Web designers heartburn, the general rule that browsers ignore mysterious tags makes it possible to create cross-browser solutions that work even for complex problems (like the wild variations between dynamic HTML as proposed by Netscape and Microsoft).

This feature also enables Microsoft to create XML data islands within HTML documents, storing information in a non-HTML vocabulary within an HTML document without fearing serious problems in browsers. This is probably the most extreme case of HTML extension, but fortunately its side effects in older legacy tools are fairly minimal. (Its effects on future browsers will likely be much more complicated.)
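The "ignore what you don't recognize" rule is easy to sketch. The following Python example, based on the standard html.parser module, keeps the text inside unknown elements while skipping the tags themselves. The tiny KNOWN_TAGS vocabulary, the class name, and the sample markup are assumptions chosen for this illustration, not a model of any real browser.

```python
from html.parser import HTMLParser

# Assumed, deliberately tiny vocabulary for this sketch.
KNOWN_TAGS = {"html", "body", "p", "b", "h2"}


class ForgivingRenderer(HTMLParser):
    """Collects text content and notes which tags it did not recognize."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_TAGS:
            self.ignored.append(tag)  # unknown tag: skip it, keep going

    def handle_data(self, data):
        self.text.append(data)  # text is kept whatever element it sits in


def render(html):
    renderer = ForgivingRenderer()
    renderer.feed(html)
    renderer.close()
    return "".join(renderer.text), renderer.ignored


text, ignored = render("<p>Sale ends <blink>today</blink>!</p>")
print(text)
print(ignored)
```

The text survives intact and only the unfamiliar blink tag is dropped, which is why a page using a vendor extension still reads sensibly elsewhere.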
Extending the Browser

HTML presentation remains the core of Web browser functionality, although scripting has become an important component of that presentation. Developers who need more capabilities than HTML plus scripting can provide have to extend the browser. Java applets are one solution, plug-ins another, ActiveX components one more, and helper applications still another. Integrating these tools with HTML can be difficult because there isn't really a way to express the information they need through HTML, except as a series of name-value parameters. The following examples show one style of parameter passing:
<APPLET CODE="StockGraph.class" WIDTH=100 HEIGHT=100>
<PARAM NAME="company" value="T">
<PARAM NAME="startDate" value="12221999">
<PARAM NAME="endDate" value="01302000">
<PARAM NAME="scale" value="60">
<PARAM NAME="style" value="tradLine">
</APPLET>

or:

<object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"
 codebase="http://active.macromedia.com/flash2/cabs/swflash.cab#version=3,0,0,0"
 id="OpenFlash" width="200" height="200" name="Flash Demo" align="left">
<param name="movie" value="FlashDemo.swf">
<param name="quality" value="high">
<param name="bgcolor" value="#000000">
</object>
The following excerpt illustrates the approach taken by many extensions: using HTML only to set up the presentation of the incoming content, but then referencing an external file that contains all the information the extension needs rather than providing it through the HTML.
<APPLET CODE="LinkMap.class" WIDTH=405 HEIGHT=257>
<PARAM NAME="src" value="maps/map1.xml">
</APPLET>
HTML itself provides just enough room to support these kinds of extensions, although developers find plenty of ways around its limitations.
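From the extension's side, the parameters above amount to a flat table of names and values. A minimal sketch of reading that table, using Python's standard html.parser module (the class and function names are assumptions for this example), might look like this:

```python
from html.parser import HTMLParser


class ParamCollector(HTMLParser):
    """Gathers PARAM name-value pairs into a dictionary."""

    def __init__(self):
        super().__init__()
        self.params = {}

    def handle_starttag(self, tag, attrs):
        if tag == "param":
            attributes = dict(attrs)  # names arrive lowercased
            if "name" in attributes:
                self.params[attributes["name"]] = attributes.get("value")


def read_params(html):
    collector = ParamCollector()
    collector.feed(html)
    collector.close()
    return collector.params


applet = """
<APPLET CODE="StockGraph.class" WIDTH=100 HEIGHT=100>
<PARAM NAME="company" value="T">
<PARAM NAME="startDate" value="12221999">
<PARAM NAME="scale" value="60">
</APPLET>
"""
params = read_params(applet)
print(params)
```

Every value comes back as a plain string, so anything richer (dates, numbers, structured data) has to be decoded by the extension itself or fetched from an external file, as the LinkMap example does.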
Creative Comments

There are a few cases in which HTML's "ignore tags you don't understand" approach can't prevent conflicts with newer flavors of content. Browser developers have had to improvise to support these cases, and thus have found a few tricks to avoid the problems. The main issue surfaced when JavaScript appeared, using < to mean "less than" instead of "markup tag starts here." To keep browsers from displaying scripts on pages and tripping over < signs, developers use comments to hide scripts as shown here:
<SCRIPT LANGUAGE="JavaScript">
<!-- Hide This Code From Non-JS Browsers
document.writeln("<P>See? This was created today! </P>");
var today = new Date(); // Use today's date
var text = "Today is " + (today.getMonth() + 1) + "/" + today.getDate() + "/" + today.getFullYear() + ".";
document.writeln(text);
//-->
</SCRIPT>
JavaScript ignores lines that begin with an HTML comment opener, <!--; the closing of the comment is hidden from JavaScript with a JavaScript comment, //. Older browsers just interpret the contents of the SCRIPT element as one large comment and don't display any of the material inside. Newer browsers understand that the comment is "just kidding" and process the JavaScript properly. Similar tactics are often used with STYLE elements when they contain style sheets directly:
<STYLE TYPE="text/css"><!--
H1 {font-family: Arial, Helvetica; font-weight: bold; font-size: x-large; color: red}
H2 {font-family: Arial, Helvetica; font-weight: bold; font-size: large; color: blue}
H3 {font-family: Arial, Helvetica; font-weight: bold; color: green}
A:link {color: red}
A:visited {color:lime}
A:active {color:yellow}
H1 B {color:purple}
H1.black {font-family: serif; color: black}
H3#freaky {font-family: serif; color: aqua}
/* End of stylesheet */--></STYLE>
Browsers that support cascading style sheets ignore the comments, while other browsers treat the style sheet as a comment and politely ignore it.
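The effect of the comment trick on a browser that knows nothing about SCRIPT or STYLE can be approximated with two regular expressions: drop comments, drop tags, and see what text is left to display. This is a rough sketch under those assumptions, not a model of any real legacy browser; the function name and the shortened script samples are invented for the illustration.

```python
import re


def legacy_visible_text(html):
    """Approximate a pre-script browser: drop comments and tags, keep the text."""
    without_comments = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    without_tags = re.sub(r"<[^>]*>", "", without_comments)
    return " ".join(without_tags.split())


# Script wrapped in an HTML comment, as in the technique described above.
hidden = """<SCRIPT LANGUAGE="JavaScript">
<!-- Hide This Code From Non-JS Browsers
var today = new Date();
document.writeln(today.getFullYear());
//-->
</SCRIPT>"""

# The same script without the comment wrapper.
exposed = """<SCRIPT LANGUAGE="JavaScript">
var today = new Date();
document.writeln(today.getFullYear());
</SCRIPT>"""

print(repr(legacy_visible_text(hidden)))
print(repr(legacy_visible_text(exposed)))
```

With the wrapper nothing is left to show; without it the script source would spill onto the page as ordinary text.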
Validate? Why?

The W3C has spent a fair amount of time (with some success) trying to convince developers to check their pages against the standard. Many HTML documents are prefixed now with a DOCTYPE declaration similar to:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
The DOCTYPE declaration points the browser (and other processors) to the formal SGML document type definition for HTML. HTML 4 actually has three different document types (strict, transitional, and frameset); the preceding declaration points to the "strict" version, which is probably the least used in practice. While most browsers don't validate documents, the W3C does provide a service that checks your documents for conformance (go to http://validator.w3.org/). There's even an icon that you can put on your pages after you validate them to let the world know you're paying close attention to the spec.

Validation, if used consistently, can help developers ensure that their pages conform to the specification. However, it doesn't do much to satisfy clients, who tend to see documents from the same point of view as users: as a particular rendering in a specific browser. If making a page look right (or simply consistent across implementations) matters more than conforming to an abstract specification, validation isn't going to receive high priority. Browsers aren't concerned about validity either, and they support all kinds of possibilities that fall well outside its rules. Thus, validation isn't a high priority for most Web developers. With XHTML, that will change.
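Checking which document type a page claims is a small first step that falls well short of validation. The sketch below reads the DOCTYPE with Python's standard html.parser module and maps the HTML 4.01 public identifiers to the three document types; the class and function names are assumptions for this example, and the code only reports the declared type without checking the page against the DTD.

```python
from html.parser import HTMLParser

# Public identifiers of the three HTML 4.01 document types.
HTML4_DOCTYPES = {
    "-//W3C//DTD HTML 4.01//EN": "strict",
    "-//W3C//DTD HTML 4.01 Transitional//EN": "transitional",
    "-//W3C//DTD HTML 4.01 Frameset//EN": "frameset",
}


class DoctypeSniffer(HTMLParser):
    """Remembers which HTML 4.01 document type, if any, a page declares."""

    def __init__(self):
        super().__init__()
        self.flavor = None

    def handle_decl(self, decl):
        # decl holds the text between "<!" and ">".
        for public_id, flavor in HTML4_DOCTYPES.items():
            if f'"{public_id}"' in decl:
                self.flavor = flavor


def declared_flavor(html):
    sniffer = DoctypeSniffer()
    sniffer.feed(html)
    sniffer.close()
    return sniffer.flavor


page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><body><p>Hello!</body></html>"
)
print(declared_flavor(page))
```

A page without a recognized declaration returns None, which is the situation most browsers quietly tolerate.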
