XHTML Namespaces Validation and Other Complexities

an article added by: Albert Lichtblau at 06022007


Namespaces, Validation, and Other Complexities

The XHTML 1.1 specification is lucky: it's all assembled within a single namespace. Extensions have a much harder time negotiating a number of implementation issues involving namespaces and different kinds of XML processing. Because XHTML is the dominant vocabulary when html is the root element, it's quite clear that XHTML documents should declare the default namespace there. On the other hand, it's difficult to decide both what to do with namespace declarations for extensions and how to include XHTML sensibly within other modules, because it may not be wise to always assign XHTML the default namespace.

Namespace problems arise (for now) because XML 1.0 DTDs are not "namespace-aware." Validating parsers treat a prefix as nothing more than part of the character string that makes up a name. The Namespaces in XML Recommendation makes it clear that you can change the prefix to something else while referencing the same URI and still describe the same thing. Because validating parsers don't recognize namespaces, however, documents that are identical under the Namespaces in XML Recommendation but use different prefixes aren't recognized as such under XML 1.0, and nothing has been done to resolve this.

XHTML 1.1 doesn't go to the (admittedly extreme) length of parameterizing element names to allow namespace prefix changes, at least as of the 5 January 2000 Last Call drafts. This means that module developers have to watch out for namespace prefix collisions if they plan to use their modules in environments where they might encounter validating XML parsers, such as an XML document storage system. Keeping out of trouble on this issue can be easy, although perhaps not quite perfect. When I chose the prefix for the XHTML-Biography Module, I chose the more verbose (but less common) biog over the easier (but more likely to conflict) bio.
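A namespace-aware parser can show both views of a name side by side. The following minimal sketch, assuming Python 3 and its standard xml.dom.minidom module, parses two illustrative documents that bind the biography namespace URI to different prefixes (biog and a hypothetical b). minidom does not validate: dtd_view only mimics the way a DTD-based validator compares names as plain strings, while namespace_view reports the (namespace URI, local name) pairs that Namespaces in XML uses to identify elements.

```python
from xml.dom import minidom

BIOG_NS = "http://www.simonstl.com/xhtml/xhtml-biography/"

# Two illustrative documents that differ only in the prefix bound to BIOG_NS.
DOC_A = ('<biog:biography xmlns:biog="%s">'
         '<biog:name>Jane Example</biog:name></biog:biography>' % BIOG_NS)
DOC_B = ('<b:biography xmlns:b="%s">'
         '<b:name>Jane Example</b:name></b:biography>' % BIOG_NS)


def _elements(doc):
    """Parse doc and return the root element plus its element children."""
    root = minidom.parseString(doc).documentElement
    children = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
    return [root] + children


def dtd_view(doc):
    """Names as a DTD-based validator compares them: plain qualified strings."""
    return [el.tagName for el in _elements(doc)]


def namespace_view(doc):
    """(namespace URI, local name) pairs, the identity used by Namespaces in XML."""
    return [(el.namespaceURI, el.localName) for el in _elements(doc)]


if __name__ == "__main__":
    print(dtd_view(DOC_A))   # ['biog:biography', 'biog:name']
    print(dtd_view(DOC_B))   # ['b:biography', 'b:name']
    print(namespace_view(DOC_A) == namespace_view(DOC_B))   # True
```

Under Namespaces in XML the two documents describe the same thing; a DTD that declares only biog:biography would accept the first and reject the second.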
The more specific (or downright weird) the prefix you choose, the less likely it is that others will choose the same one and then want to mix their content with yours. This relatively easy strategy may be all that many modules need. Another option is to parameterize the names within your own DTD modules. This might look like the following code:

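   <!-- XHTML-Biography namespace. To use another prefix, declare biog.prefix
        before this module is read (the first declaration is binding). Assumes
        the XHTML driver has already declared Block.mix and Common.attrib. -->
   <!ENTITY % biog.prefix  "biog">
   <!ENTITY % biog.nsprefix  "%biog.prefix;:">
   <!ENTITY % biog.nsuri  "http://www.simonstl.com/xhtml/xhtml-biography/">

   <!-- Names are assembled inside entity values (XML 1.0 section 4.4.5), so no
        spaces are added; PEs are not recognized in attribute default literals. -->
   <!ENTITY % biog.nsElementName  "%biog.nsprefix;biography">
   <!ENTITY % biog.xmlns.attrib
      "xmlns:%biog.prefix;  CDATA  #FIXED '%biog.nsuri;'">

   <!ENTITY % biog.biography.element  "INCLUDE">
   <![%biog.biography.element;[
   <!-- Block.mix is a choice (a | b | ...), so it needs its own group -->
   <!ENTITY % biog.biography.content
      "(%biog.nsprefix;title, %biog.nsprefix;name, %biog.nsprefix;birth,
        %biog.nsprefix;death?, (%Block.mix;)*)">
   <!ELEMENT %biog.nsElementName;  %biog.biography.content;>
   ]]>

   <!ENTITY % biog.biography.attlist  "INCLUDE">
   <![%biog.biography.attlist;[
   <!ATTLIST %biog.nsElementName;
      %biog.xmlns.attrib;
      %Common.attrib;>
   ]]>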

Section 4.4.8 of the XML 1.0 Recommendation states: "When a parameter entity reference is recognized in the DTD and included, its replacement text is enlarged by the attachment of one leading and one following space (#x20) character." That means a reference such as %biog.nsprefix;biography written directly into an element or attribute declaration would produce a name with a space between the prefix and the local name, which is prohibited. However, by creating yet another parameter entity and combining the namespace prefix with the element name inside its entity value, you can fall back on section 4.4.5, under which parameter entity references within entity values are expanded in place without the added spaces.

The last option, and the one most likely to be honored in practice, is to hope that namespace prefix collisions never occur in a validating XML environment. Most browsers, for instance, are non-validating and never check the names and structures in the document against the DTD. This makes the DTD a formal exercise rather than a binding commitment: a passport into validating environments for especially conformant documents, rather than a set of rules that every document of that type must follow. Non-validating environments may never read the DTD at all, so the #FIXED attribute values used in the preceding code may have no effect there. Always include your namespace declarations within the document as often as necessary to make sure that your elements are identified correctly.
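The advice to declare namespaces in the document itself can be checked with a non-validating parser. This minimal sketch assumes Python's standard xml.etree.ElementTree, whose expat-based parser, in its default configuration, does not fetch external DTDs. The first document counts on a #FIXED xmlns:biog default from an external DTD (biog.dtd is a hypothetical file name), so the biog: prefix is never bound and the parser rejects the document; the second carries its own declaration and parses normally.

```python
import xml.etree.ElementTree as ET

BIOG_NS = "http://www.simonstl.com/xhtml/xhtml-biography/"

# Counts on a #FIXED xmlns:biog default in the external DTD. "biog.dtd" is a
# hypothetical file name; ElementTree's expat parser does not read it.
RELIES_ON_DTD = ('<!DOCTYPE biog:biography SYSTEM "biog.dtd">'
                 '<biog:biography><biog:name>Jane Example</biog:name>'
                 '</biog:biography>')

# Declares the namespace in the document instance itself.
DECLARES_NS = ('<!DOCTYPE biog:biography SYSTEM "biog.dtd">'
               '<biog:biography xmlns:biog="%s"><biog:name>Jane Example'
               '</biog:name></biog:biography>' % BIOG_NS)


def root_name(doc):
    """Return the root's '{uri}local' name, or None if the parser rejects doc."""
    try:
        return ET.fromstring(doc).tag
    except ET.ParseError as err:
        print("rejected:", err)   # an "unbound prefix" error for RELIES_ON_DTD
        return None


if __name__ == "__main__":
    print(root_name(RELIES_ON_DTD))
    print(root_name(DECLARES_NS))
```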

Tip: XML Schema should ease this problem, because schema processors are namespace-aware. When schema-based modules appear, they may be worth investigating for this feature in particular.

Documenting Extensions

The abstract module and the DTD are a fine start toward building a module, but they're not especially human-friendly. Abstract modules are a definite improvement over DTD modules for casual reading, but they still don't explain your module's purpose or its inner workings. They provide a guide to structure, not to usage or processing. To produce a "complete" module, you should provide a few more pieces.

The first piece, unless you plan to be the only one working with your module, is a user's guide. The user's guide is basically for authors and readers; it explains the semantics of your document structure. When should you use biog:given, and how should it interact with biog:middle? What happens when a biography includes 50 other people's names? Should they be marked up? Issues that seem obvious to the creator of a vocabulary are often confusing, invisible, or opaque to the people who use that vocabulary, especially as time passes. The more information you can provide for your users (without putting them to sleep entirely, of course), the better.

The second piece is an explanation for those who need to process your vocabulary. Some of this information overlaps with the user's guide, but it may go into much more detail about interactions between elements and include more formal descriptions of things such as style sheet conventions. A reference implementation may be part of this documentation, or it may be left as an exercise for the reader, depending on the situation. There isn't any standard approach for including this kind of material with XHTML modules, but references to it from the DTD (in comments) and from the abstract module should suffice. If they don't, this material can be part of a larger site explaining how to make the module work.

Tip: While there is no universal convention for where to put your documentation, it might be smart to keep it in the same directory as your DTD and to put comments at the beginning of the DTD explaining where to find the detailed documentation.

Supporting Your Extensions on the Server

Defining a vocabulary doesn't mean that applications suddenly will understand it or have any idea what to do with it. Some levels of generic XML processing are possible (such as storing XML in databases or transforming it with style sheets), but making good use of many modules requires some custom code. Support on the server is usually much easier than support on the client because you typically have more control over the server's setup. You can control the hardware, the software, the software versions, and the surrounding environment, and you can code in the language you find most comfortable. Even if you support multiple platforms in a commercial product, the number of server installations is typically much smaller than the number of clients, which reduces the probability of odd conflicts.

Some XHTML applications probably will be entirely server-based, converting the XHTML to dynamic HTML or some other interface structure for presentation in a browser. Others (likely including the XHTML-Biography Module) will provide only server-side functionality for tasks such as search engine assistance. The transformation of XHTML to another format may take place at user request, on the server, or as part of an authoring process that saves both the native XHTML for editing and another format for presentation. Search engine and agent-oriented server tasks may require customizing existing code to look for particular vocabularies, but they will probably need entirely new software only when substantial or complex new vocabularies are introduced. Server software also may play an important role in mediating the transmission of XHTML documents to a variety of clients, some of which may understand only portions of those documents as they stand.
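As an illustration of the server-side tasks above, the following sketch uses Python's standard xml.etree.ElementTree to do two things with a page containing biography markup: collect the biog:* fields a search engine might index, and rewrite those elements into plain XHTML for clients that understand only XHTML. The sample page, its placeholder text, and the div/span-with-class mapping are assumptions made for this example, not part of the XHTML-Biography Module.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
BIOG_NS = "http://www.simonstl.com/xhtml/xhtml-biography/"

# Illustrative page: element names follow the biography content model,
# the text content is placeholder data.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:biog="http://www.simonstl.com/xhtml/xhtml-biography/">
  <head><title>Sample</title></head>
  <body>
    <biog:biography>
      <biog:title>Sample Biography</biog:title>
      <biog:name>Jane Example</biog:name>
      <biog:birth>1901</biog:birth>
      <p>Some block content.</p>
    </biog:biography>
  </body>
</html>"""


def split_tag(tag):
    """Split an ElementTree tag such as '{uri}local' into (uri, local)."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return None, tag


def index_fields(root):
    """Collect non-empty biog:* text for a search index, keyed by local name.

    If a name repeats, the last value wins (acceptable for a sketch).
    """
    fields = {}
    for el in root.iter():
        uri, local = split_tag(el.tag)
        if uri == BIOG_NS and el.text and el.text.strip():
            fields[local] = el.text.strip()
    return fields


def downgrade_for_html_client(root):
    """Rename biog:* elements to XHTML div/span elements carrying a class.

    Assumed mapping: biography becomes a div, every other biog element a span.
    """
    for el in root.iter():
        uri, local = split_tag(el.tag)
        if uri == BIOG_NS:
            el.set("class", local)
            el.tag = "{%s}%s" % (XHTML_NS, "div" if local == "biography" else "span")
    return root


if __name__ == "__main__":
    page = ET.fromstring(PAGE)
    print(index_fields(page))
    downgrade_for_html_client(page)
    print([el.get("class") for el in page.iter("{%s}span" % XHTML_NS)])
```

Both functions match on the namespace URI rather than the biog: prefix, so they keep working if a document binds the module to a different prefix.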

Supporting Your Extensions on the Client

Sending XHTML to clients can be very easy or very difficult, depending on how much processing your vocabulary needs and, should you choose to stay in the browser, on how well the target browser complies with standards. Building your own client software gives you incredible freedom, but it means that you have to write a lot more code from the ground up. Conversely, relying on browsers means relying on other people's code, which is sometimes a good thing and sometimes a bad thing. Most browsers don't support XHTML 1.0 yet, so expecting compliance with (and understanding of) XHTML 1.1 is too much to ask. Instead, the key tools for using XHTML 1.1 modules in Web browsers are the tools for using its foundations: XML 1.0 and namespaces; the DOM for programmatic access to, and modification of, information; and Cascading Style Sheets Level 2 (CSS2) and/or the Extensible Stylesheet Language (XSL) for presentation.

Tip: You may be able to get away with less for particular applications, but XHTML doesn't directly support the alternatives (such as Microsoft's xml element in HTML documents). You can, of course, create modules that add support for them.

The easiest way to integrate new XHTML modules into the client is through style sheets. CSS style sheets let you deliver your XHTML directly to the client and provide some supplementary formatting so that readers can explore your documents. (As CSS develops further, you also may be able to use style sheets to specify behavior, much as is done today with dynamic HTML.) If your module is designed only to present extra information or to annotate information within an XHTML framework, you may not need a style sheet at all, or you may need only an extremely simple one consisting mostly of display: inline declarations. More sophisticated modules may need more complex style sheets with more elaborate, semantics-driven rules.

The Document Object Model (DOM) provides similar capabilities with more flexible programmability. The capabilities that the DOM provides for HTML and XML processing are available for XHTML without modification. (XHTML is just HTML and XML, after all!) A client application using an XHTML 1.1 module might be nothing more than an HTML and JavaScript wrapper around some extra XML content that the script processes at the user's request. These kinds of applications are extremely useful for prototyping, and are sometimes robust enough to be useful in the longer term as well.

Building your own client from scratch requires a much larger investment, as toolkits for doing so aren't readily available. Java applets, ActiveX controls, and browser plug-ins are lighter-weight options that let you deliver a lot of custom code without the need for a large framework. However, building new applications that implement the XHTML vocabulary can be very difficult, even with the structural ambiguities of older HTML removed. In some cases, such as the wireless applications discussed in Article 18, this may be the only viable strategy. In others, like those described in the next article, the amount of HTML vocabulary used may be small or isolated enough that the other XHTML modules deserve their own program, with the HTML vocabulary given minimal attention.
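The DOM calls a script in such a wrapper would rely on (getElementsByTagNameNS and createElementNS from DOM Level 2 Core, plus insertBefore) also exist in Python's standard xml.dom.minidom, which makes for a runnable sketch. The hypothetical add_summaries function below finds each biography and inserts an XHTML summary paragraph in front of it; the sample document and its placeholder text are invented for illustration.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
BIOG_NS = "http://www.simonstl.com/xhtml/xhtml-biography/"

# Illustrative document with placeholder content.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:biog="http://www.simonstl.com/xhtml/xhtml-biography/">
  <head><title>Sample</title></head>
  <body>
    <biog:biography>
      <biog:title>Sample Biography</biog:title>
      <biog:name>Jane Example</biog:name>
      <biog:birth>1901</biog:birth>
      <p>Some block content.</p>
    </biog:biography>
  </body>
</html>"""


def text_of(node):
    """Concatenate the text-node children of node."""
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


def add_summaries(doc):
    """Insert an XHTML <p class="summary"> before each biog:biography."""
    for bio in doc.getElementsByTagNameNS(BIOG_NS, "biography"):
        name = bio.getElementsByTagNameNS(BIOG_NS, "name")[0]
        birth = bio.getElementsByTagNameNS(BIOG_NS, "birth")[0]
        summary = doc.createElementNS(XHTML_NS, "p")
        summary.setAttribute("class", "summary")
        summary.appendChild(doc.createTextNode(
            "%s (born %s)" % (text_of(name), text_of(birth))))
        bio.parentNode.insertBefore(summary, bio)
    return doc


if __name__ == "__main__":
    doc = add_summaries(minidom.parseString(DOC))
    body = doc.getElementsByTagNameNS(XHTML_NS, "body")[0]
    print(body.toxml())
```

minidom's getElementsByTagNameNS returns a plain list, so the insertions don't disturb the loop; browser DOMs return live collections, which is harmless here because the inserted p elements are not biography elements.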

XHTML 1.1 is still designed with an HTML-oriented browser in mind; after all, it's a product of the W3C's HTML Working Group. Many of the design decisions for XHTML 1.1, notably its use of the default namespace, make it difficult to use XHTML in anything other than a dominant position within the document. Nonetheless, there are plenty of reasons for XML developers to use XHTML within their work, and it's possible to make some accommodations with XHTML 1.1 in its current form to produce a workable set of tools. When XML first appeared, some people considered it an opportunity to throw away the bloated and misused HTML vocabulary and start all over again.

Yet the HTML vocabulary can provide a useful set of basic formatting and linking tools, letting XML developers who need a bit of human-readable markup in their documents avoid the extra work of building their own structures and processing tools.
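To illustrate XHTML in a supporting rather than dominant role, here is a minimal sketch, again using Python's standard xml.etree.ElementTree, built around a hypothetical notes vocabulary (the http://example.com/ns/notes URI is invented). The notes vocabulary keeps the default namespace; XHTML appears once under an explicit html: prefix and once under a default namespace redeclared on a single p element. Because the code matches on the XHTML namespace URI, it finds the links written either way.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
NOTES_NS = "http://example.com/ns/notes"   # hypothetical vocabulary

# The notes vocabulary owns the default namespace. XHTML appears once under
# an explicit html: prefix and once under a locally redeclared default.
NOTES = """<notes xmlns="http://example.com/ns/notes"
       xmlns:html="http://www.w3.org/1999/xhtml">
  <note id="n1">
    <html:p>See <html:a href="http://www.w3.org/TR/xhtml1/">XHTML 1.0</html:a>.</html:p>
  </note>
  <note id="n2">
    <p xmlns="http://www.w3.org/1999/xhtml">See also
      <a href="http://www.w3.org/TR/REC-xml-names/">Namespaces in XML</a>.</p>
  </note>
</notes>"""


def xhtml_links(root):
    """Return (note id, href, link text) for every XHTML <a>, whatever its prefix."""
    links = []
    for note in root.iter("{%s}note" % NOTES_NS):
        for a in note.iter("{%s}a" % XHTML_NS):
            links.append((note.get("id"), a.get("href"), "".join(a.itertext())))
    return links


if __name__ == "__main__":
    for link in xhtml_links(ET.fromstring(NOTES)):
        print(link)
```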
