Using XHTML in Traditional HTML Applications
Before moving into the much more complicated terrain of converting older HTML content to the newer XHTML rules, let's take a look at how the shift to XHTML affects day-to-day Web development and the construction of new content. Web development has been in nearly constant flux since its beginnings, and developers are accustomed to (if perhaps tired of) the challenges that come with every new standard and every new browser. Some of the challenges XHTML presents are familiar, although a few new twists brought on by XHTML's view of processing beyond the browser make new demands.
Lessons from Previous Technology Shifts
XHTML is the latest in a long line of technologies that have changed the way the Web works. Some of those technologies have bounced off the Web, proving too complex or too finicky for Web developers to use easily or reliably. Other technologies – such as in-line images (presented as part of the document), image maps, JavaScript, cookies, and dynamic HTML – have become part of the mainstream of Web development. All of these technologies hit some bumps as they entered the market, largely caused by the widespread use of older software, but they solved enough real problems that developers used them anyway. As newer software has appeared and become more widespread, compatibility issues have eased – although new issues continue to emerge.
For developers, however, the authoring tools and viewers are only part of the process. JavaScript drove a lot of people who focused on making their sites look good into some degree of programming, while Cascading Style Sheets have proven important to developers using dynamic HTML. Dynamic HTML has forced designers to coordinate their work with developers creating scripts; over time, the skill set for building Web sites has broadened considerably.
Fortunately, XHTML 1.0 is a relatively nondisruptive technology. It makes certain older technologies, notably Cascading Style Sheets and dynamic HTML, easier to use reliably. It does require some minor tweaks in authoring tools and some style changes for those who hand-code their own pages. As a management challenge, XHTML 1.0 should rate very low on the difficulty scale. Developer retraining shouldn't require more than presenting a fairly short list of guidelines and insisting that authors test their pages – validate them – before making them public.
Even in the early stages of XHTML 1.0 adoption, when browsers haven't yet learned to treat XHTML as anything different from HTML, XHTML 1.0 is likely to cause more cultural problems ("You mean I really have to add all those end tags, and use lowercase?") than technical difficulties.
Making Certain Nothing Looks Different (to the User)
Based on the testing done in that article, describing the functionality subset considered safe isn't very difficult. Scripts and style sheets are best stored outside of the document. XHTML's shift from hiding scripts in comments (<!-- -->) to hiding scripts in CDATA sections (<![CDATA[ ]]>) creates problems for even the latest browser releases. Admittedly, Netscape 3.0 had some trouble with a script file referenced from a script element inside the head element, but it does better with script elements that appear inside the body element. Style sheets stored in document head elements caused problems in some older browsers; but in the very worst case, external style sheets were simply ignored. Storing scripts and style sheets in external files has additional advantages: it becomes much easier to reuse them in multiple pages, even across a site, and to change all of them from a single location. Editors and other tools can focus on a particular syntax, rather than dealing with three or four different systems at once.
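As a rough illustration of the "keep scripts and style sheets external" guideline, the following Python sketch flags inline style elements and script elements that lack a src attribute. The function name find_inline_code and the two sample documents are hypothetical, invented for this example; the check assumes the page is already well-formed XHTML in the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical sample: style sheet and script both live in external files.
EXTERNAL_ONLY = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>External resources</title>
    <link rel="stylesheet" type="text/css" href="site.css" />
    <script type="text/javascript" src="site.js"></script>
  </head>
  <body><p>Hello</p></body>
</html>"""

# Hypothetical sample: the same page with inline style and script content.
WITH_INLINE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Inline resources</title>
    <style type="text/css">p { color: red; }</style>
    <script type="text/javascript">window.status = "ready";</script>
  </head>
  <body><p>Hello</p></body>
</html>"""


def find_inline_code(text):
    """Return a list of notes about inline style or script content."""
    root = ET.fromstring(text)
    found = []
    for element in root.iter():
        if element.tag == XHTML + "style":
            found.append("inline style element")
        elif element.tag == XHTML + "script" and "src" not in element.attrib:
            found.append("inline script element")
    return found


if __name__ == "__main__":
    print(find_inline_code(EXTERNAL_ONLY))
    print(find_inline_code(WITH_INLINE))
```

A check like this could run alongside validation before pages are published; it reports only the presence of inline code and makes no attempt to judge whether a particular browser would mishandle it.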
The other XHTML-specific issue that has caused a lot of trouble, even for newer browsers, is the XML declaration. The XML declaration is critical in certain cases for XML parsing, but it goes unused by HTML browsers. If you choose to leave XML declarations off your XHTML documents, especially if you go so far as to prohibit their use, you should note the encoding problems that working without an encoding declaration may cause for XML parsers (as described in the next section of this article).
Apart from these significant compatibility issues, the rest of the guidelines for using XHTML in an HTML production environment flow in two general streams: enforcing the syntactical restrictions of XHTML and choosing a strategy regarding which document type definition to use for documents. The syntactical part isn't so difficult. The following list provides a quick start:
- Every document must have a DOCTYPE declaration before its html element, and the document must be validated successfully against the type specified in that declaration.
- The html element must include the xmlns attribute with the XHTML namespace declaration.
- All element and attribute names must be in lowercase.
- Every start tag (<name>) must have an end tag (</name>).
- All empty elements – such as hr, br, and img – must be represented using syntax such as <name></name> or <name />.
- All attribute values must be enclosed in single or double quotes.
This part of the explanation isn't so difficult, even when there are additional issues such as XHTML 1.0's rules for transitioning from name to id attributes and supplementing the lang attribute with xml:lang. It's a good idea to encourage developers to talk about elements rather than tags in an effort to clarify that structure is more important than markup, but generally XHTML 1.0's syntactical policing isn't that difficult to enforce.
The harder choices come from XHTML 1.0's insistence on the inclusion of the DOCTYPE declaration and its provision of three different DTDs. While the transitional DTD maximizes compatibility with older HTML and reflects the way most authoring tools create HTML, it's fairly clear that the long-term future of XHTML in the W3C's vision is that of the strict DTD. XHTML 1.1 and its likely successors base their primary form on the strict DTD, though their modularization includes (unused) support for frames and other features. While developers can add their own modules to a driver file to support the functionality in the transitional and frameset DTDs – it's likely that someone will, if not the W3C – those modules aren't likely to receive any kind of official blessing.
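The syntactic rules in the list above can be approximated with a short Python sketch. It is not a validator: it does not read the DTD, so it cannot confirm that a document is valid against its declared type. It only looks for a DOCTYPE, relies on the XML parser to reject missing end tags and unquoted attribute values, and then checks the namespace and lowercase names. The function name check_xhtml_syntax and the SAMPLE document are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page that follows every rule in the list.
SAMPLE = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title>Sample</title></head>
  <body>
    <p>First line<br />second line</p>
    <hr />
  </body>
</html>"""


def local_name(qualified):
    """Strip the '{namespace}' prefix that ElementTree adds to names."""
    return qualified.rsplit("}", 1)[-1]


def check_xhtml_syntax(text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if "<!DOCTYPE" not in text:
        problems.append("missing DOCTYPE declaration")
    try:
        # The parser itself rejects missing end tags and unquoted attributes.
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return problems + ["not well-formed: %s" % err]
    if root.tag != "{%s}html" % XHTML_NS:
        problems.append("root element is not html in the XHTML namespace")
    for element in root.iter():
        name = local_name(element.tag)
        if name != name.lower():
            problems.append("element name not lowercase: %s" % name)
        for attribute in element.attrib:
            attr_name = local_name(attribute)
            if attr_name != attr_name.lower():
                problems.append("attribute name not lowercase: %s" % attr_name)
    return problems


if __name__ == "__main__":
    print(check_xhtml_syntax(SAMPLE))
```

Full validation against the strict, transitional, or frameset DTD still requires a validating parser or a validation service; this sketch covers only the rules that a non-validating parser can see.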
Note: How much does "official blessing" matter in this case? Past experience suggests that browser makers support features when it seems convenient to them, not because the W3C says so. The more standards-friendly approach of the Mozilla project may change this, but it's unlikely that the mainstream of browser development will purge frame or font elements anytime soon.
Tip: For a vision of a small HTML that goes well beyond the strict DTD, see the W3C's XHTML Basic at http://www.w3.org/TR/xhtml-basic/. You should keep in mind that pure XHTML Basic implementations likely will be used only in environments with very limited processing such as appliances, personal digital assistants (PDAs), and cell phones. If your developers feel constrained by the strict DTD, you can use this as a handy rhetorical device to demonstrate that a smaller subset is possible.
If your organization or your site supports a wide variety of different HTML approaches, you may find it easiest not to make a decision and simply require that document authors choose a given DTD and apply it on a per-document basis. This provides maximum flexibility and enables Web developers to transition at their own pace, without forcing them to change their vocabulary as well as their syntax. If you plan to take advantage of the W3C's ongoing XHTML development, however, you may find it easiest to stay within the confines of the strict DTD. The capabilities provided by Cascading Style Sheets more than make up for the formatting information provided by the transitional DTD. Developers have to learn CSS, or perhaps use a standardized style sheet across all of their documents, imposing some additional costs on organizations in which CSS isn't already in widespread use. While switching to the strict DTD may cause some differences in appearance from pages created using other forms of HTML or XHTML, you can control those differences using CSS.
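If authors choose a DTD document by document, it helps to be able to see which one a given page declares. The sketch below lists the public and system identifiers of the three XHTML 1.0 DTDs and reads the choice back out of a DOCTYPE declaration. The names XHTML_DTDS, doctype_for, and detect_dtd are hypothetical, and the regular expression assumes the conventional double-quoted form of the declaration.

```python
import re

# Public and system identifiers for the three XHTML 1.0 DTDs.
XHTML_DTDS = {
    "strict": (
        "-//W3C//DTD XHTML 1.0 Strict//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    ),
    "transitional": (
        "-//W3C//DTD XHTML 1.0 Transitional//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd",
    ),
    "frameset": (
        "-//W3C//DTD XHTML 1.0 Frameset//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd",
    ),
}

# Assumes the usual form: <!DOCTYPE html PUBLIC "public id" "system id">
_DOCTYPE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"')


def doctype_for(kind):
    """Build the DOCTYPE declaration for 'strict', 'transitional', or 'frameset'."""
    public_id, system_id = XHTML_DTDS[kind]
    return '<!DOCTYPE html PUBLIC "%s" "%s">' % (public_id, system_id)


def detect_dtd(text):
    """Return the DTD kind a document declares, or None if unrecognized."""
    match = _DOCTYPE.search(text)
    if match is None:
        return None
    for kind, (public_id, _system_id) in XHTML_DTDS.items():
        if match.group(1) == public_id:
            return kind
    return None


if __name__ == "__main__":
    page = doctype_for("transitional") + "\n<html></html>"
    print(detect_dtd(page))
```

A report built on detect_dtd could show how much of a site still depends on the transitional DTD, which is one way to track a gradual move toward the strict DTD.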
Tip: If switching to the strict DTD sounds impossible because your site uses frames, you can use the frameset DTD exclusively for defining framesets and rely on the strict DTD for all of the documents inside of those frames. The strict DTD doesn't include the target attribute, however, which may limit how well your framesets work.
Supporting the Widest Possible Base in XML
While making good use of existing HTML skills and browsers is important, making your documents acceptable to XML parsers is the other half of the XHTML transition. In the short term, minimizing transition costs for older HTML software is a worthy goal; but you should ensure that your documents do in fact make the transition to XML. As previously noted, the XML declaration presents problems for many older browsers – but it is critical for XML processing of documents in many commonly used encodings. Dropping this declaration may keep your documents from being processed by search engines, stored in document repositories, or even read by users of XML-based clients.
Although HTML doesn't use it, the encoding declaration in the XML declaration is critically important to XML parsing. In fact, the encoding declaration is important for all cases in which non-Unicode character encodings are used. XML parsers should be capable of auto-detecting the UTF-8 (which includes basic ASCII) and UTF-16 encodings, but they may not be capable of detecting other commonly used encodings such as ISO-Latin-1 and Shift-JIS. This means that leaving off the XML declaration requires you to store documents in UTF-8 or UTF-16 if interoperability with XML parsers is important. Some tools can create and manage documents in these encodings, while others can't.
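The effect of a missing encoding declaration is easy to reproduce. The Python sketch below feeds the same one-element document to the standard library's XML parser three ways: as UTF-8 without a declaration, as Latin-1 without a declaration, and as Latin-1 with one. The helper name parses_as_xml and the sample content are invented for this example, and other parsers may report the failure differently.

```python
import xml.etree.ElementTree as ET

# One paragraph containing a single non-ASCII character (e with acute accent).
BODY = "<p>caf\u00e9</p>"

UTF8_NO_DECL = BODY.encode("utf-8")
LATIN1_NO_DECL = BODY.encode("iso-8859-1")
LATIN1_WITH_DECL = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>' + LATIN1_NO_DECL
)


def parses_as_xml(data):
    """Return True if the bytes parse as well-formed XML, False otherwise."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for label, data in (
        ("UTF-8, no declaration", UTF8_NO_DECL),
        ("Latin-1, no declaration", LATIN1_NO_DECL),
        ("Latin-1, with declaration", LATIN1_WITH_DECL),
    ):
        print(label, "->", parses_as_xml(data))
```

Without a declaration the parser assumes UTF-8, so the lone Latin-1 byte for the accented character is rejected as malformed input; with the declaration in place the same bytes parse cleanly.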
Note: It's a safe bet that any XML-oriented or Java-based tool can handle UTF-8 and UTF-16 character encodings. Other programs and environments may vary.
Until Unicode support becomes more widespread, you may find it worth your effort to explore some strategies to ensure your XHTML is acceptable to XML parsers. If you can stand to have the declaration appear at the top of the page in some browsers (especially if you work in an environment that doesn't use those problem browsers), keeping the declaration is a good idea. If you work with pure ASCII documents, they can pass as UTF-8 and the declaration isn't required. Users of the Latin-1 character set can replace all of the Latin-1 characters that aren't in ASCII with their equivalents in the Latin-1 entity set – all of HTML's built-in entities remain available. Users of other character encodings are faced with converting their documents to UTF-8 or UTF-16, or using numeric character references throughout their documents – not an especially readable or efficient approach, but one that does have the virtue of reliability.
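Both replacement strategies described above, named entities for Latin-1 characters and numeric character references for everything else, can be sketched in a few lines of Python. The function names are hypothetical. The first relies on the standard xmlcharrefreplace error handler; the second uses the entity table in html.entities and falls back to a numeric reference when no named entity exists.

```python
from html.entities import codepoint2name


def to_numeric_references(text):
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_named_entities(text):
    """Replace non-ASCII characters with named entities where HTML defines one.

    Characters without a named entity fall back to a numeric reference.
    ASCII characters, including markup, are left untouched.
    """
    pieces = []
    for character in text:
        code_point = ord(character)
        if code_point < 128:
            pieces.append(character)
        elif code_point in codepoint2name:
            pieces.append("&%s;" % codepoint2name[code_point])
        else:
            pieces.append("&#%d;" % code_point)
    return "".join(pieces)


if __name__ == "__main__":
    sample = "<p>caf\u00e9</p>"
    print(to_numeric_references(sample))  # <p>caf&#233;</p>
    print(to_named_entities(sample))      # <p>caf&eacute;</p>
```

Either output is pure ASCII, so it passes as UTF-8 without an XML declaration. One caveat: named entities such as &eacute; are defined in the XHTML DTDs, so an XML parser that does not read the DTD may not resolve them; numeric references do not depend on the DTD.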
Tip: XHTML developers working with Asian character encodings, particularly Chinese, may want to visit Academia Sinica's Chinese XML Now! Web site at http://www.ascc.net/xml/. The site includes a Frequently Asked Questions list and a section on XHTML. In addition, it has a version of the Tidy XHTML clean-up program customized to work with Chinese and Japanese encodings.
If your documents are generated dynamically, you also may be able to check the software requesting documents and add or leave out the declaration on a case-by-case basis. It requires extra processing, but it supports the widest possible range of both XML and HTML clients and tools.
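One way to make that case-by-case decision is sketched below. It is only an illustration under a stated assumption: that a client advertising an XML media type in its Accept header is XML-aware and should receive the declaration, while everything else is treated as an HTML browser. The names XML_MEDIA_TYPES, client_accepts_xml, and render are hypothetical, and a real site might key on the User-Agent header or some other signal instead.

```python
# Assumption: these media types in an Accept header signal an XML-aware client.
XML_MEDIA_TYPES = ("application/xhtml+xml", "application/xml", "text/xml")


def client_accepts_xml(accept_header):
    """Return True if the Accept header explicitly lists an XML media type."""
    offered = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    return any(media_type in offered for media_type in XML_MEDIA_TYPES)


def render(body, accept_header, encoding="iso-8859-1"):
    """Encode the page, adding the XML declaration only for XML-aware clients.

    The encoding named in the declaration always matches the encoding
    actually used for the returned bytes.
    """
    if client_accepts_xml(accept_header):
        declaration = '<?xml version="1.0" encoding="%s"?>\n' % encoding
        body = declaration + body
    return body.encode(encoding)


if __name__ == "__main__":
    page = "<html><body><p>caf\u00e9</p></body></html>"
    print(render(page, "text/html, image/gif"))
    print(render(page, "application/xml;q=0.9, text/html"))
```

The declaration is built from the same encoding argument used to encode the page, which keeps the two from drifting apart.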
Balancing Needs and Retraining
If you're reading this article because you need to apply XHTML to your own one-person projects, you probably already made some decisions about the compatibility trade-offs. If you're using XHTML as part of a larger project, the decision-making process is likely to be a lot more difficult because different participants with different needs have very different perspectives about the usefulness of these trade-offs. Moving from HTML to XHTML 1.0 involves changing some habits and looking more closely at features such as character encodings, which most developers take for granted. Some Web designers may chafe at the syntactical restrictions imposed by XHTML, while others (particularly those who work with dynamic HTML and Cascading Style Sheets) may observe many of the restrictions already. Spreading the gospel of XHTML isn't always easy, especially at this early stage when tools (even XML-oriented tools) are far more HTML-oriented than XHTML-oriented.
If you decide that the potential of XML is worth the trouble of making some changes, make certain that those changes are explained to everyone involved in your Web development organization. With luck, an explanation of the benefits and the direction the W3C is taking HTML can give developers more motivation to change their habits. However, you may find it necessary to make all the accommodations possible – notably using the transitional DTD and leaving off the XML declaration – to keep HTML developers comfortable with the brave new world of XHTML.
