Using XHTML in an XML Context

an article added by: Albert Lichtblau at 06022007



XHTML Inside XML: Using XHTML in an XML Context

While supplementing the XHTML vocabulary with your own XML modules solves many problems, there are also many cases in which XHTML can usefully supplement XML vocabularies. HTML is ubiquitous, well known, and supported by toolkits that make integrating it with some kinds of XML development easier. If XML documents need documentation, if they need a presentation vocabulary, or if they need to act as containers for Web documents, then putting XHTML in XML can be a relatively quick fix.

Beyond the Browser and Within the Browser

IBTWSH (John Cowan's Itsy Bitsy Teeny Weeny Simple Hypertext DTD) enabled its users to include XML content within its XML element type, making it possible for IBTWSH to both contain XML and be contained by an XML document.

Document Definition Markup Language (DDML – at http://www.w3.org/TR/NOTE-ddml ), a simple schema vocabulary, uses IBTWSH for human-readable documentation of schema components. The following DTD fragment pulls in IBTWSH and uses it for the content model of the DDML:Doc element:

   <!ENTITY % ibtwsh SYSTEM  "http://www.ccil.org/~cowan/XML/ibtwsh.dtd">
   %ibtwsh;
   <!ELEMENT DDML:Doc  %struct.model;>
   <!ATTLIST DDML:Doc
   xmlns CDATA #FIXED  ""><!--IBTWSH has no namespace-->

The struct.model entity is much like the Block.mix entity in XHTML; it enables schema designers to add comprehensive comments to their schemas, not just a few sentences. Although DDML files aren't meant for presentation in a browser (they describe document structures, after all), the HTML vocabulary lets schema creators use familiar markup and familiar tools to describe the structures they create formally with the DDML schema vocabulary.

Documentation creators can create transformations that turn the schema "inside out," building an HTML document describing the schema that uses the DDML:Doc elements as a foundation. Even without that extra step, developers may be able to open a DDML file in a browser and get a reasonable amount of information about the schema, depending on how (and if) the schema creator uses the DDML:Doc element.
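One way to build that "inside out" view is with a transformation language such as XSLT. The text does not name a particular tool, so treat the following as a minimal sketch under stated assumptions rather than the DDML authors' method. It assumes the schema uses the literal DDML: prefix, as the DTD fragment does, and that the contents of each DDML:Doc element are in no namespace. The page title and the wrapping div elements are illustrative choices.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Build one HTML page out of every DDML:Doc element in the schema -->
  <xsl:template match="/">
    <html>
      <head><title>Schema documentation</title></head>
      <body>
        <!-- Match on the prefixed name the DTD declares, so the
             stylesheet need not know the DDML namespace URI -->
        <xsl:for-each select="//*[name() = 'DDML:Doc']">
          <div>
            <xsl:apply-templates select="node()" mode="doc"/>
          </div>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

  <!-- Rebuild each documentation element by its local name so that
       no namespace declarations from the schema leak into the page -->
  <xsl:template match="*" mode="doc">
    <xsl:element name="{local-name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()" mode="doc"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

Applied to a DDML file with an XSLT 1.0 processor, a stylesheet along these lines should produce a plain HTML page holding the documentation blocks in document order. A fuller version would probably also pull in the names of the schema components each block documents.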

What HTML Has to Offer XML

HTML provides a well-known vocabulary for which an amazing amount of infrastructure is already available. Browsers, widely available for free, are only the most visible aspect of that infrastructure. HTML editors are commodity tools, available in a range of prices and sophistication. Components for viewing HTML in programs are available in environments from Windows (where you can "borrow" Internet Explorer's HTML rendering DLL) to Java (where the Swing library includes a set of components for viewing and editing HTML) to much smaller, text-only environments. HTML has been thoroughly poked, explored, and criticized, leading to the development of best practices that can support information access for the disabled and the integration of complex information structures from a wide variety of sources.

While HTML has its limits, and developers encounter those every day, XHTML seems to lead the path forward to using HTML vocabularies within XML documents, as well as using XML to enrich HTML documents. You can work around XHTML's insistence on the default namespace, as you see in the next section. While many XML documents may get along just fine without XHTML or the HTML vocabulary, the low cost of using HTML should make it an attractive option wherever it is useful.

As of this writing, the strengths of HTML are obvious, but the means of integrating that infrastructure with XML are not. The current generation of HTML tools is designed to create HTML documents, not XML documents that may include some HTML. As a result, taking advantage of HTML within XML is difficult, though it may be worth the trouble. At the same time, browsers remain very HTML-oriented, and no standard yet exists (XLink may come eventually) that lets XML do tasks that are very simple in HTML, such as the inclusion of images, scripts, and other components. So far, the browser vendors seem content to let XML documents access these facilities through the HTML vocabulary.

Applications for XHTML Islands

While Microsoft is pushing XML data islands within HTML documents, let's explore how XHTML document islands can fit into an XML framework. We examine both XHTML 1.1-conformant approaches and the more informal set of rules supported by the early XML-aware browsers, trying to find a consistent means of using HTML vocabulary within XML documents that can work over the long run.

Note: Within this section, you explore the implications of using the XHTML vocabulary within XML documents, not its impact on XML DTDs. The next section discusses the DTD issues for various types of XML development.

Images, scripts, and forms in browsers

HTML provides a rich set of functionality that the XML specifications don't support yet. While any XML document can contain markup describing images, scripts, and forms, there are no standardized tools for notifying applications that this markup should be treated as images, scripts, or forms. Things that seem extremely easy in HTML can be very difficult in XML simply because applications arrive with very few assumptions about XML content. Cascading style sheets, originally built on top of HTML, didn't address these issues because HTML already did. As a result, there's no easy way to include such content in XML documents for display in today's Web browsers – unless you fall back on an HTML vocabulary.

Namespaces ride to the rescue here (although with a few hitches). They enable XML document creators to identify some elements and their attributes as HTML. Those elements must use the HTML vocabulary (you can't rename an img element type image or picture), but this approach can solve some problems for XML developers targeting Web browsers. At present, the Web browsers (Netscape 6 preview release, Opera 4 beta, and Internet Explorer 5.x) use a namespace based on HTML 4.0 to identify HTML information within an XML document, not the namespaces defined in XHTML 1.0. The URI in use is:

   http://www.w3.org/TR/REC-html40

In most of the browsers, you can assign this URI to the default namespace or to a different prefix (typically html or xhtml). However, Microsoft Internet Explorer supports only html as the prefix (and regards all elements prefixed with html as HTML vocabulary even if they are assigned a different namespace).

Tip: For more detailed information on using XML and CSS to create documents for Web browser presentation, see my series of articles at XML.com: http://www.xml.com/pub/au/St._Laurent_Simon.

Let's start with a simple example that you can use for either browser display or machine-to-machine communication that provides some basic information about an article. It uses the HTML img element type to bring in a picture, a for a link, and h1 for a headline:

   <?xml version="1.0"?>
   <?xml-stylesheet type="text/css" href="articlestest.css"?>
   <catalog xmlns:html="http://www.w3.org/TR/REC-html40">
     <html:h1>An XML Introduction</html:h1>
     <article>
       <cover>
         <html:a href="http://www.amazon.com/exec/obidos/ISBN=076453310X">
           <html:img src="http://images.amazon.com/images/P/076453310X.01.MZZZZZZZ.gif"/>
         </html:a>
       </cover>
       <author>Simon St.Laurent</author>
       <title><html:a href="http://www.amazon.com/exec/obidos/ISBN=076453310X/">XML: A Primer, 2nd Ed.</html:a></title>
       <pubyear>1999</pubyear>
       <publisher>IDG Books</publisher>
       <isbn>0-7645-3310-X</isbn>
       <price>$19.99</price>
     </article>
   </catalog>

You can display this XML document (which has no DTD whatsoever) very easily using the minimalist style sheet shown here:

   catalog { display: block; }
   article { display: block; padding: 5px; }
   article * { display: block; }
 

Caution: The current release of Internet Explorer 5.0 for the Macintosh displays the link, but doesn't make it active. (The images work fine.)

The browsers can render the XML portions of the document using the rules provided in the style sheet, and they can render the HTML portion using their built-in understanding of what the HTML vocabulary does. This isn't exactly XHTML (the namespace and the use of the prefix are different, there is no DOCTYPE declaration, and no modules are included), but it's the likely path toward XHTML within browsers.
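The earlier remark that most browsers accept the HTML 4.0 URI on the default namespace or on another prefix can be illustrated with a cut-down variant of the catalog. This is a sketch only: it reuses the element names and URLs from the example above, drops the style sheet and most of the data, and has not been checked against any particular browser release.

```xml
<?xml version="1.0"?>
<catalog>
  <!-- The HTML 4.0 URI as the default namespace, for this element only -->
  <h1 xmlns="http://www.w3.org/TR/REC-html40">An XML Introduction</h1>
  <article>
    <author>Simon St.Laurent</author>
    <!-- The same URI bound to the xhtml prefix instead of html -->
    <title><xhtml:a xmlns:xhtml="http://www.w3.org/TR/REC-html40"
        href="http://www.amazon.com/exec/obidos/ISBN=076453310X/">XML: A Primer, 2nd Ed.</xhtml:a></title>
  </article>
</catalog>
```

Because Internet Explorer 5.x keys on the html prefix rather than on the URI, neither form in this variant should be expected to work there.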

Caution: Unless XHTML modularization changes to permit namespaces other than the default namespace to represent XHTML, the preceding example will never be valid XHTML. Making it valid requires the creation of a different namespace for the XML and a declaration making the XHTML namespace the default namespace. That will break Internet Explorer's current HTML-in-XML support, which depends on the html: prefix.
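The namespace arrangement that the caution describes might look like the following sketch. The cat prefix and its URI (http://example.com/ns/catalog) are hypothetical names invented for this illustration; http://www.w3.org/1999/xhtml is the namespace defined by XHTML 1.0. The document shows only where the declarations go. It has no DOCTYPE declaration and no DTD built from XHTML modules, so it is not a valid document as it stands.

```xml
<?xml version="1.0"?>
<!-- cat: and its URI are hypothetical; only the XHTML URI is real -->
<cat:catalog xmlns:cat="http://example.com/ns/catalog"
             xmlns="http://www.w3.org/1999/xhtml">
  <!-- Unprefixed elements now belong to the XHTML namespace -->
  <h1>An XML Introduction</h1>
  <cat:article>
    <cat:author>Simon St.Laurent</cat:author>
    <cat:title><a href="http://www.amazon.com/exec/obidos/ISBN=076453310X/">XML: A Primer, 2nd Ed.</a></cat:title>
    <cat:isbn>0-7645-3310-X</cat:isbn>
  </cat:article>
</cat:catalog>
```

The trade-off is the one already noted: the catalog's own elements take on a prefix, and a browser that recognizes HTML only through the html: prefix no longer sees the h1 and a elements as HTML.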
