Coding Styles: XML and XHTML's Maximum Structure
Overview
XML parsers are far more brutal than HTML browsers about rejecting documents they don't like. XML's clear focus on structure means that the practices described in the previous chapter must change. However, most of those changes shouldn't cause more than minor inconvenience, at least for newly created documents.
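To see how much stricter an XML parser is than a browser, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree (which wraps the expat parser) as a stand-in for a "commodity XML parser." The helper name is_well_formed is hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Browsers render this happily: unquoted attribute, unclosed <p> and <br>.
sloppy = "<body><p class=intro>Hello<br>world</body>"
# The same content, tidied up so an XML parser will accept it.
clean = '<body><p class="intro">Hello<br />world</p></body>'

print(is_well_formed(sloppy))  # False
print(is_well_formed(clean))   # True
```

A browser shows both versions identically; the XML parser refuses the first outright rather than guessing what the author meant.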
Note If reading this chapter makes you groan with pain about the amount of work this transition involves, don't panic. I devote much of the rest of this book to making these changes easy and (where possible) automated. Some of the choices the XHTML team made may not be to your liking, but you can adjust to most of them fairly easily. (I even learned to accept lowercase markup after years of protest.)
Cleaning up HTML
The issues described in this section are changes you can make in existing HTML without knowing about any of the new features introduced by XML. For the most part, this cleanup dominates the transition to XHTML 1.0. While some of these issues may require developers to rethink the way they create documents, they generally don't cause problems for older browsers.
Case matters
XML is case-sensitive: it treats IMG and img as two entirely different element names. In large part, this is because XML supports a much wider set of characters than most HTML implementations, and many languages either don't have case or follow different rules for how case works. As a result, the W3C settled on a single case for XHTML markup. It chose lowercase for all element and attribute names, and anything that purports to be XHTML must use lowercase. The same applies to attribute values that are chosen from a fixed list of options. For example, in regular HTML 4.0, you can include this code in a form:
<INPUT TYPE="TEXT" VALUE="Singing Fish"></INPUT>
To represent the same item in XHTML, you have to change the case of almost the entire element:
<input type="text" value="Singing Fish"></input>
The element name is now in lowercase, as are the attribute names. The type attribute value changes to lowercase as well because it represents an option chosen from a list of possibilities. The value attribute's content, however, can appear in whatever case is appropriate – it only represents the default value for the text, not a particular choice an XHTML browser needs to understand.
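Case sensitivity is easy to observe directly in a parser. This sketch again assumes Python's xml.etree.ElementTree as the XML parser; the file names a.gif and b.gif are placeholders.

```python
import xml.etree.ElementTree as ET

# An XML parser keeps IMG and img as two unrelated element names.
doc = ET.fromstring('<body><IMG src="a.gif" /><img src="b.gif" /></body>')
tags = [child.tag for child in doc]
print(tags)                       # ['IMG', 'img']
print(len(doc.findall("img")))    # 1: a lookup for img ignores IMG

# Start and end tags that differ only in case do not match.
mismatch_rejected = False
try:
    ET.fromstring('<INPUT type="text"></input>')
except ET.ParseError as err:
    mismatch_rejected = True
    print("rejected:", err)
```

Note that the parser itself doesn't care whether type says "TEXT" or "text"; that restriction on attribute values comes from the XHTML DTDs, which only a validating tool checks.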
Clean (and explicit) element structures
HTML browsers have never been picky about element structures, but that will change with the advent of XHTML. HTML documents are supposed to have a structure like that shown here:
<html>
  <head>...</head>
  <body>...</body>
</html>
Most browsers don't enforce this structure, however. Browsers display fragments quite happily – with or without html, head, and body tags. In XHTML, you must provide this basic framework and put content in only the body element.
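A tool that receives XHTML can check for this framework mechanically. The sketch below is a hypothetical helper, has_xhtml_skeleton, built on Python's xml.etree.ElementTree; it strips any namespace prefix from tag names so it works whether or not the document declares one.

```python
import xml.etree.ElementTree as ET

def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def has_xhtml_skeleton(markup: str) -> bool:
    """Hypothetical check: root is html, with exactly head then body inside."""
    root = ET.fromstring(markup)
    if local_name(root.tag) != "html":
        return False
    return [local_name(child.tag) for child in root] == ["head", "body"]

full = "<html><head><title>Hi</title></head><body><p>Hi</p></body></html>"
fragment = "<p>Hi</p>"
print(has_xhtml_skeleton(full))      # True
print(has_xhtml_skeleton(fragment))  # False
```

A browser would display the fragment without complaint; the check above treats it as incomplete, which is how XHTML sees it.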
Empty elements
XML uses a slightly different syntax than HTML for empty elements (elements that don't contain other elements or text), and XHTML requires a further adjustment. In HTML, a normal start tag represents an empty element:
<img src="mypic.gif">
In XHTML, you need to add a slash to the end of the tag:
<img src="mypic.gif" />
The space before the slash isn't required by XML, but it keeps some older browsers from displaying the slash on the page. The same guideline applies to horizontal rule and line break elements, which you should enter as:
<br /> <hr />
Note You can also write empty elements as <br></br>, with no whitespace between the start and end tags, but this tends to confuse older browsers.
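To an XML parser, all three spellings of an empty element are the same thing. This sketch, assuming Python's xml.etree.ElementTree, parses each form and records what the parser sees, then tries the plain HTML form.

```python
import xml.etree.ElementTree as ET

# Three spellings an XML parser accepts for the same empty br element.
forms = ["<p>a<br/>b</p>", "<p>a<br />b</p>", "<p>a<br></br>b</p>"]
parsed = []
for markup in forms:
    br = ET.fromstring(markup).find("br")
    # Record the tag, child count, text content, and the text that follows it.
    parsed.append((br.tag, len(br), br.text, br.tail))
print(parsed)  # the same ('br', 0, None, 'b') tuple three times

# The HTML spelling leaves <br> open, so </p> arrives while br is unclosed.
html_br_rejected = False
try:
    ET.fromstring("<p>a<br>b</p>")
except ET.ParseError as err:
    html_br_rejected = True
    print("rejected:", err)
```

Since the parsed results are identical, the choice among these forms is purely about how older browsers react, which is why the space-before-slash form is the usual recommendation.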
Quoting and expanding attribute values
XHTML makes more demands on attribute formatting than HTML does. The most obvious change is that every attribute value must be surrounded by quotes, whether or not it contains spaces, whether its content is text or a number, and whether it is chosen from a list or written freely. You still have one option: you can use single quotes or double quotes as you like, provided that you start and end with the same kind of quote. This means that the following examples are both legal XHTML:
<img src="mypic.gif" />
<img src='mypic.gif' />
Despite this leniency, XHTML does require that all attributes have values. The mere existence of an attribute name is no longer enough. This HTML:
<input type="checkbox" checked disabled>
must become this XHTML:
<input type="checkbox" checked="checked" disabled="disabled" />
and this HTML:
<ul compact><li>Squeezed tight!</li></ul>
must become this XHTML:
<ul compact="compact"><li>Squeezed tight!</li></ul>
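The quoting and expansion rules can be confirmed with any XML parser. This is a small sketch assuming Python's xml.etree.ElementTree; the helper name parses is hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(markup: str) -> bool:
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Single and double quotes are interchangeable; the parsed values are identical.
double = ET.fromstring('<img src="mypic.gif" />')
single = ET.fromstring("<img src='mypic.gif' />")
print(double.get("src") == single.get("src"))  # True

# Unquoted values and minimized (value-less) attributes are rejected...
print(parses("<img src=mypic.gif />"))                         # False
print(parses('<input type="checkbox" checked disabled />'))    # False
# ...while the fully expanded form is accepted.
expanded = '<input type="checkbox" checked="checked" disabled="disabled" />'
print(parses(expanded))                                        # True
print(ET.fromstring(expanded).get("checked"))                  # checked
```

Repeating the attribute name as its own value looks redundant, but it gives the parser the name="value" pair it insists on.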
XHTML has one other important attribute "gotcha." While HTML browsers accept bare ampersands within attribute values (they're common in URI query strings, for instance), XHTML requires that you use the entity &amp; in place of each ampersand. The HTML form:
<a href="http://www.simonstl.com/example/test.jsp?name=Simon&birthday=1125&haircolor=brown">Birthday link</a>
must become this XHTML form:
<a href="http://www.simonstl.com/example/test.jsp?name=Simon&amp;birthday=1125&amp;haircolor=brown">Birthday link</a>
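Escaping is best left to a library function rather than done by hand. This sketch assumes Python, using xml.sax.saxutils.quoteattr to escape and quote the URL from the example, then xml.etree.ElementTree to show that the parser turns each &amp; back into a plain ampersand.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

url = "http://www.simonstl.com/example/test.jsp?name=Simon&birthday=1125&haircolor=brown"

# quoteattr escapes & (plus < and >) and wraps the value in quotes.
link = "<a href=" + quoteattr(url) + ">Birthday link</a>"
print(link)  # ...name=Simon&amp;birthday=1125&amp;haircolor=brown...

# After parsing, the attribute value is the original URL again.
print(ET.fromstring(link).get("href") == url)  # True

# A bare ampersand starts an entity reference the parser cannot complete.
bare_rejected = False
try:
    ET.fromstring('<a href="' + url + '">Birthday link</a>')
except ET.ParseError as err:
    bare_rejected = True
    print("rejected:", err)
```

The entity exists only in the markup: any software reading the parsed document, including the browser following the link, sees the plain & characters.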
Unique identifiers
The conflict between NAME and ID described previously was resolved in favor of ID (although now it's id). The XHTML specification describes the name attribute as deprecated, a limbo that lets developers use the attribute but suggests a short lifespan. Deprecated features mostly survive in HTML browsers, but it's unclear whether XHTML will treat deprecation, and eventual removal from the specification, more seriously. In XHTML 1.0, you can create identifiers in two ways. The first way is simpler, but it loses backward compatibility:
<h2><a id="Section1_1">1.1 Conformance</a></h2>
The second way looks like unnecessary duplication, but it works for both HTML and XHTML browsers:
<h2><a id="Section1_1" name="Section1_1">1.1 Conformance</a></h2>
In the long term, shifting to ids will make it simpler to integrate XHTML with the new tools for hypertext linking that are emerging in the XML world. It will also encourage consistency in existing projects such as dynamic HTML by making it easier to apply Cascading Style Sheets and the Document Object Model. The change to XHTML brings with it one additional shift for identifiers: they now have to start with a letter, underscore, or colon, and may consist only of letters, digits, underscores, colons, hyphens, and periods. Spaces, for example, are not permitted.
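The identifier rule translates directly into a regular expression. This is a sketch in Python under one stated simplification: it accepts only ASCII letters, whereas XML's actual Name production also allows many non-ASCII letters. The names ID_PATTERN and is_valid_id are hypothetical.

```python
import re

# ASCII-only approximation of the identifier rule: first character is a
# letter, underscore, or colon; the rest are letters, digits, underscores,
# colons, hyphens, or periods.
ID_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

def is_valid_id(value: str) -> bool:
    """Hypothetical check: does value look like a legal XHTML id?"""
    return ID_PATTERN.fullmatch(value) is not None

for candidate in ["Section1_1", "_note", "1.1", "my section"]:
    print(candidate, is_valid_id(candidate))
# Section1_1 True, _note True, 1.1 False, my section False
```

A check like this is handy when converting old name values, which could contain spaces or start with digits, into id values.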
Validation and reliability
XHTML 1.0 makes validation an important part of processing. While it doesn't require XHTML 1.0 processors to validate, it does require that strictly conforming XHTML documents be valid XML documents that conform to one of the three document type definitions (DTDs) the W3C created for XHTML. The DTDs are the core of the XHTML specification and the foundation for further development.
In some ways, this isn't a departure from its HTML predecessors, which also had validation discussions, but the move to XML may make it more likely that validation will matter. Very few validation tools are available for HTML. While some HTML editors included validation as an option, and the W3C hosts its own Validation Service, validation checking was unlikely ever to take place in a client, and if it did, it would be optional. As XHTML brings HTML into an XML environment, however, validation is likely to become much more important. Applications that use commodity XML parsers and expect all documents to be valid (a fairly common requirement in the XML world) will be unable to read XHTML documents that don't meet the validation requirements. As various XML applications and repositories link into, process, and store XHTML, the demand for valid XHTML will increase.
Validation promises to simplify XHTML processing by reducing the number of possibilities that browsers need to accommodate. Structures meant to mark up text within a block of text (such as b, strong, and em) can be restricted to appearing inside appropriate block elements (such as p and li), which simplifies the work browsers must perform to render a page. Designers building style sheets don't have to support odd usages, because validation makes structures very predictable.
Using validation regularly, and breaking the habits ingrained by years of HTML, may be difficult, but cleaning things up lets you use new tools and avoid complicated situations.
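It helps to separate well-formedness from validity. Python's xml.etree.ElementTree checks only well-formedness and does not validate against a DTD, so the sketch below hand-codes a single rule in the spirit of the XHTML 1.0 Strict DTD: inline elements may not sit directly inside body. It is a toy illustration, not a validator; the INLINE set is a hypothetical subset and inline_directly_in_body is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A subset of inline elements that Strict forbids as direct children of body.
INLINE = {"b", "i", "strong", "em", "span", "a", "img"}

def inline_directly_in_body(markup: str) -> list[str]:
    """Hypothetical check: tags of inline elements that are children of body."""
    root = ET.fromstring(markup)  # raises ParseError if not well-formed
    body = root.find("body")
    if body is None:
        return []
    return [child.tag for child in body if child.tag in INLINE]

doc = ("<html><head><title>t</title></head><body>"
       "<p><em>fine inside p</em></p><strong>stray</strong>"
       "</body></html>")
print(inline_directly_in_body(doc))  # ['strong']
```

The document above is perfectly well-formed, yet it breaks a content-model rule; catching that class of problem is exactly what DTD validation adds on top of parsing.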
New to XHTML
XHTML brings a few new tools to your Web development arsenal. In some cases, they replace older HTML tools; in other cases, they bring XML functionality to XHTML. You should get used to these fairly quickly, although some of them may cause problems in making XHTML work with older HTML browsers. As the shift from HTML to XHTML becomes more pronounced, you'll be able to use these more and more easily.
XML declarations
XML documents typically are prefixed with an XML declaration, an odd-looking bit of markup that indicates the XML version number and sometimes the encoding of the characters used. For example, a document might start with:
<?xml version="1.0" encoding="UTF-8"?>
This indicates that the document is an XML document (or should be, anyway!) written to conform to version 1.0. The character encoding used is UTF-8, an 8-bit transformation format for Unicode. The values used in the encoding declaration are the same as those used in the charset parameter of the HTML meta element's content attribute, and the XHTML recommendation suggests using both. (In case of a conflict, though, the XML declaration wins.) For example, an XHTML document might start out like this:
<?xml version="1.0" encoding="US-ASCII"?>
<html>
  <head>
    <title>My US-ASCII document</title>
    <meta http-equiv="Content-type" content="text/html; charset=US-ASCII" />
  </head>
  ...
The XML declaration is optional, and so is the encoding declaration it contains. If you do include an XML declaration, however, it must specify the version. For example, you can start an XHTML document with this minimal XML declaration:
<?xml version="1.0"?>
Or this one:
<?xml version="1.0" encoding="UTF-8"?>
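Whether a particular XML declaration is acceptable is easy to check with a parser. This sketch assumes Python's xml.etree.ElementTree and feeds it raw bytes, as a file or network response would. It shows that the declaration can be left out entirely, but that a declaration without version information is rejected, because XML 1.0 makes the version mandatory whenever the declaration is present. The helper name accepts is hypothetical.

```python
import xml.etree.ElementTree as ET

def accepts(document: bytes) -> bool:
    """Return True if an XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

body = b"<html><head><title>t</title></head><body /></html>"
print(accepts(body))                                              # True: no declaration
print(accepts(b'<?xml version="1.0"?>' + body))                   # True
print(accepts(b'<?xml version="1.0" encoding="UTF-8"?>' + body))  # True
print(accepts(b'<?xml encoding="UTF-8"?>' + body))                # False: no version
print(accepts(b"<?xml?>" + body))                                 # False: no version
```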
Some older HTML browsers display the XML declaration at the top of the page, so you can omit it if this bothers you. Without the XML declaration, however, you are limited to encoding your documents in UTF-8 or UTF-16 – at least if XML software processes your XHTML documents at any point.
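The encoding limitation shows up as soon as a document contains non-ASCII characters. This sketch assumes Python's xml.etree.ElementTree and uses ISO-8859-1 as an example of an encoding other than UTF-8 or UTF-16: without a declaration the parser assumes UTF-8 and rejects the bytes, while with a declaration it reads them correctly.

```python
import xml.etree.ElementTree as ET

text = "<p>caf\u00e9</p>"  # contains one non-ASCII character

# UTF-8 bytes need no declaration.
print(ET.fromstring(text.encode("utf-8")).text)  # café

# The same document in ISO-8859-1, undeclared, is read as UTF-8 and rejected.
latin1 = text.encode("iso-8859-1")
latin1_rejected = False
try:
    ET.fromstring(latin1)
except ET.ParseError as err:
    latin1_rejected = True
    print("rejected:", err)

# Declaring the encoding lets the parser decode the bytes correctly.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1
print(ET.fromstring(declared).text)  # café
```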
