Exploring the XHTML DTDs
Choosing Your DTD
XHTML 1.0 provides three DTDs that describe different sets of XHTML elements and reflect the three choices provided in HTML 4.0: strict, transitional, and frameset. The strict DTD is probably the one that the W3C would like to see developers adhere to, but the transitional DTDs reflect the reality of HTML usage much more accurately. Appendix A lists the elements in the three different DTDs, along with notes regarding attributes.
To identify the DTD for a given document, you must use a DOCTYPE declaration in the prolog of your document. The XHTML 1.0 Recommendation provides three options, one for each DTD. They look much like their HTML 4.01 predecessors, although the public identifiers and URLs differ slightly and the root element name is now the lowercase html rather than HTML. For the strict DTD, this HTML 4.01 declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
becomes this XHTML 1.0 declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
For the transitional DTD, this HTML 4.01 declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
becomes this XHTML 1.0 declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
And for the frameset DTD, this HTML 4.01 declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
becomes this XHTML 1.0 declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Whichever declaration you choose, it must appear after the XML declaration (if there is one) and before the root element of the document. If your document passes through a validating parser, it checks your document to make sure that its contents conform to the rules laid out in the DTD.
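The placement rule can be sketched as a small check. This is only an illustration, not a validating parser: the function name identify_dtd and the regular expression are assumptions made for this example, and the pattern deliberately ignores comments or processing instructions that a real prolog may contain.

```python
import re

# Public identifiers copied from the three XHTML 1.0 declarations above.
XHTML_DTDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": "strict",
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "transitional",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "frameset",
}

# Simplified prolog: an optional XML declaration at the very start,
# then the DOCTYPE declaration, then the html root element.
PROLOG = re.compile(
    r'(?:<\?xml\b[^>]*\?>)?\s*'
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"\s+"([^"]+)"\s*>'
    r'\s*<html\b'
)

def identify_dtd(document):
    """Return 'strict', 'transitional', or 'frameset', or None when the
    DOCTYPE is missing, misplaced, or not one of the XHTML 1.0 options."""
    match = PROLOG.match(document)
    if match is None:
        return None
    return XHTML_DTDS.get(match.group(1))
```

Because the match is case-sensitive, an HTML 4.01 declaration that names the root element HTML is not recognized, which mirrors the case change described above.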
Caution The XHTML 1.0 Recommendation doesn't say anything about using another XML feature, the internal subset of the DOCTYPE declaration. While its use isn't prohibited, you should avoid using it with XHTML documents.
Starting Out
All three DTDs follow roughly the same layout, with a few more or fewer sections depending on the particular DTD you read. The first few sections of a DTD are often the most frustrating (they often put people off) because they lay groundwork for later declarations rather than make concrete declarations. Reading page after page of somewhat abstract declarations outside of their context may not feel rewarding, but you need to understand these preliminaries to make sense of the concrete declarations that follow.
Tip While these preliminaries are important in XHTML 1.0, they will become even more important when XHTML is modularized in XHTML 1.1. Then you may need to choose which modules are used in documents. Understanding how these pieces fit together is critical as the specification is broken into smaller pieces.
Including character entities
After some introductory comments, the three XHTML DTDs all start by referencing the entity sets (character mnemonic entities) supported by HTML: Latin-1, Symbols, and Special. Because these entity sets are stored in separate files, the DTDs can reference them easily without requiring a special set for each DTD. (It also means that other XML applications can reference the XHTML entity sets easily without needing to incorporate the entire DTD.) The declaration for the Latin-1 set, immediately followed by a parameter entity reference that pulls in the declared material, looks like:
<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent">
%HTMLlat1;
The entity declaration creates a parameter entity named HTMLlat1. HTMLlat1 references a set of declarations using two different identifiers. The first is a public identifier (-//W3C//ENTITIES Latin 1 for XHTML//EN) that applications can use if they already know what these entities are and don't want to retrieve information from the URL. The second is a system identifier, the URL xhtml-lat1.ent. Applications that don't understand the public identifier, like most XML processors, can use the URL to retrieve the full set of declarations. Either way, documents that use the XHTML DTDs may use the full set of entities.
Note The URLs for the entity set locations are given as relative URLs. If you want to reference these sets in your own DTDs, use the full form: http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent. You may also want to create a local copy, because not all users of your DTDs may have access to the Internet or the W3C site. The copyright statement at the top of the DTD makes it clear that this kind of usage is acceptable.
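The choice between the two identifiers can be sketched as a lookup with a fallback. The catalog dictionary, its local path, and the function name resolve_entity are hypothetical and exist only for this example. The base URL is the one used in the strict DOCTYPE declaration.

```python
from urllib.parse import urljoin

# URL of the DTD that contains the relative reference "xhtml-lat1.ent".
DTD_URL = "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"

# Hypothetical local catalog: public identifier -> path of a local copy.
LOCAL_CATALOG = {
    "-//W3C//ENTITIES Latin 1 for XHTML//EN": "entities/xhtml-lat1.ent",
}

def resolve_entity(public_id, system_id, base_url=DTD_URL, catalog=LOCAL_CATALOG):
    """Prefer a known public identifier; otherwise resolve the system
    identifier (a possibly relative URL) against the DTD's own URL."""
    if public_id in catalog:
        return catalog[public_id]
    return urljoin(base_url, system_id)
```

With an empty catalog, the relative system identifier resolves to the full URL quoted in the note. With the catalog entry in place, no network access is needed.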
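Imported names
The next section of each DTD declares parameter entities that name the data types used for attribute values, each accompanied by a comment describing the type. In the XHTML 1.0 DTDs, the declaration <!ENTITY % Character "CDATA"> is followed by the comment <!-- a single character from [ISO10646] -->. This declaration creates the Character parameter entity, and the comment tells developers that attributes declared using this parameter entity must contain a single character as defined in ISO 10646.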
Note Appendix E of the XHTML 1.0 specification mostly omits the specs listed in square brackets, but they are available at http://www.w3.org/TR/xhtml1/#refs. If you need to look up the RFCs, see http://www.rfc-editor.org. For more on ISO 10646, see the XML 1.0 references at http://www.w3.org/TR/REC-xml#sec-existing-stds.
Many of the types are defined more simply, without referring to outside specifications. The Number entity, for instance, is described as "one or more digits." The Shape entity doesn't have a description, but its declaration limits it to a small set of well-known types:
<!ENTITY % Shape "(rect|circle|poly|default)">
The transitional and frameset DTDs include two additional entities, ImgAlign and Color, which support formatting properties left out of the strict DTD. These entities are declared in a slightly different style, with their descriptive comments preceding the declaration rather than following it. These DTDs also provide a list of commonly supported colors in comments, although the list isn't formally a part of the DTD that an XML parser understands.
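The difference between a type that the DTD enforces (Shape) and one that is only described in a comment (Number) can be sketched as follows. The helper names are made up for this example, and the Number check assumes that the entity itself expands to plain CDATA, so the "one or more digits" rule has to be checked by the application rather than by the parser.

```python
import re

# Replacement text of the Shape entity, copied from the declaration above.
SHAPE_DECL = "(rect|circle|poly|default)"

def enumerated_values(decl):
    """Split an enumerated attribute type such as "(a|b|c)" into its tokens."""
    return {token.strip() for token in decl.strip("()").split("|")}

def is_number(value):
    """Apply the rule stated in the DTD comment: one or more digits."""
    return re.fullmatch(r"[0-9]+", value) is not None

SHAPES = enumerated_values(SHAPE_DECL)
```

A validating parser rejects shape="Circle" because enumerated values are matched exactly, including case. A value that breaks the Number rule passes validation unless something like is_number is applied separately.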
Generic attributes
The next section of each of the DTDs defines entities describing numerous attributes that are applied to many different elements. For the most part, all three DTDs define the same set of attributes for their elements. This section, in a sense, defines the framework with which the W3C wants developers to build XHTML applications. It contains the hooks for styling, internationalization, and scripting, all key tools for moving beyond static Web pages built for Western organizations. The generic attributes make XHTML more active and more inclusive at the same time. The first two entities, coreattrs and i18n, supply the styling and internationalization hooks.
The next two sets of entities define attributes used to connect XHTML elements to user interfaces and the scripts that respond to user activities. The events entity defines a set of attributes that connect scripts to particular user-driven events, such as onclick and onkeypress, and is employed widely on elements in the body of HTML documents. The focus entity provides additional hooks for elements that can receive and lose user-interface focus. (Oddly enough, the focus entity is never used anywhere in the three DTDs, although its contents appear regularly.)
Then, three of these entities (coreattrs, i18n, and events) are combined into a single large attrs entity for use on many of the textual elements. The transitional and frameset DTDs also declare the TextAlign entity, which defines the align formatting attribute for many of the block-level elements.
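How attrs is built from the three smaller entities can be sketched with a tiny parameter entity expander. The entity contents below are abbreviated stand-ins rather than the full lists from the DTDs, and the expand function is a simplification written for this example: it does not guard against circular references, which XML forbids anyway.

```python
import re

# Abbreviated stand-ins; the real entities declare more attributes
# and use named types instead of plain CDATA.
ENTITIES = {
    "coreattrs": "id ID #IMPLIED class CDATA #IMPLIED",
    "i18n": "lang CDATA #IMPLIED dir (ltr|rtl) #IMPLIED",
    "events": "onclick CDATA #IMPLIED onkeypress CDATA #IMPLIED",
    "attrs": "%coreattrs; %i18n; %events;",
}

# A parameter entity reference has the form %name;
REFERENCE = re.compile(r"%([A-Za-z_][\w.\-]*);")

def expand(text, entities=ENTITIES):
    """Replace %name; references repeatedly until none remain."""
    while True:
        expanded = REFERENCE.sub(lambda m: entities[m.group(1)], text)
        if expanded == text:
            return expanded
        text = expanded

ATTRS = expand("%attrs;")
```

An attribute list declaration that contains only %attrs; therefore picks up every attribute from all three groups, which is why the DTDs can give dozens of elements the same hooks with one reference each.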
Text elements
The next few sections define element content for various parts of XHTML. The first, text elements, defines content that is used throughout the set of elements that present text. In this section, the first large differences between the strict DTD and the transitional and frameset DTDs become apparent. While all of the DTDs declare the same set of entities, the strict DTD leaves out several of the elements that the other DTDs permit through their special and fontstyle entities, effectively abolishing iframe, u, s, strike, font, and basefont from the XHTML vocabulary. This isn't new (it happened in HTML 4.0), but it indicates the direction the W3C wants developers to take: away from explicit formatting in markup and toward a more abstract approach that applies style sheets to the structures formed by that markup.
The rest of the text elements entities, culminating in the Inline entity, describe different content models that can appear inside textual content. This section defines markup that you can use inside of paragraphs and other block-level elements. One entity, misc, provides support for content that may appear in both the textual and block-level contexts, such as ins, del, script, and noscript.
Block-level elements
The next section describes structures that operate at a higher level than the text elements, creating the structures in which those text elements can appear. Here the three DTDs almost converge, defining sets of block-level elements that fit into the relatively neat categories of heading, lists, and blocktext, and then adding the p, div, fieldset, and table element types to form the main block entity. The strict DTD leaves out isindex, menu, dir, center, and noframes, which appear in the other two DTDs. The block entity then combines with the misc entity and the form element to create the Block entity. Remember, XML's case sensitivity means that block and Block are completely different things.
For cases in which an element may contain either block-level or textual content, this section also defines the Flow entity. This entity adds the inline entity and text to the combination of components that make up Block. The Flow entity is used in elements that step outside the usual block-text distinction and permit either form to appear.
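The relationship among block, Block, and Flow can be sketched with sets. The member lists for heading, lists, and blocktext are assumed from the strict DTD, and the inline set is only a small sample of the inline elements, so treat the result as an illustration rather than a copy of the content models.

```python
# Categories named in the text; members assumed from the strict DTD.
HEADING = {"h1", "h2", "h3", "h4", "h5", "h6"}
LISTS = {"ul", "ol", "dl"}
BLOCKTEXT = {"pre", "hr", "blockquote", "address"}
MISC = {"ins", "del", "script", "noscript"}

# Lowercase block: the plain list of block-level element types.
block = {"p", "div", "fieldset", "table"} | HEADING | LISTS | BLOCKTEXT

# Capitalized Block: block plus form and the misc elements.
Block = block | {"form"} | MISC

# A small sample of inline elements, for illustration only.
INLINE_SAMPLE = {"a", "em", "strong", "span"}

# Flow: everything in Block plus inline content and character data.
Flow = Block | INLINE_SAMPLE | {"#PCDATA"}
```

Python names are case-sensitive in the same way as XML names, so block and Block are separate objects here just as they are separate entities in the DTD.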
Content models for exclusions
This is one of the odder sections of the XHTML 1.0 DTDs. Effectively, it declares content models for particular elements using models much like those in the block-level area, but with minor changes explained in comments.
This section of the DTD is the result of the switch to XML. Older versions of HTML used a feature of SGML, called exclusions, to specify rules such as "no a element can contain another a element." XML dropped that feature for the sake of simplicity. As a result, this section of the DTD redefines a few of the models from the previous section to suit the needs of particular elements: a, pre, form, and button. There are also some differences among the DTDs. The content model for form, for instance, includes the Block model in the strict DTD but the Flow model in the transitional and frameset DTDs.
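An XML content model only describes the direct children of an element, so a redefined model for a can keep another a out of its immediate content but cannot see one nested more deeply. A check at any depth therefore has to happen outside the DTD. The sketch below uses Python's standard xml.etree.ElementTree; the function name nested_anchors is made up for this example.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as {namespace}localname.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

def nested_anchors(root):
    """Return every a element that has another a element as an ancestor."""
    found = []

    def walk(element, inside_a):
        is_a = element.tag in ("a", XHTML_NS + "a")
        if is_a and inside_a:
            found.append(element)
        for child in element:
            walk(child, inside_a or is_a)

    walk(root, False)
    return found
```

For example, parsing <a href="#x">outer <em><a href="#y">inner</a></em></a> and passing the result to nested_anchors reports the inner link, even though the outer a has only an em element as its direct child.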