Moving Forward into XML Using XSL to Generate XHTML

an article added by: Albert Lichtblau at 06022007



While XML has inherited an enormous amount of familiar infrastructure from the world of Web development, its SGML ancestry has brought with it some tools and innovations that may seem strikingly unfamiliar to Web developers. Extensible Stylesheet Language (XSL) and Extensible Stylesheet Language Transformations (XSLT) originally were developed as industrial-strength formatting tools, but they have application to XHTML work as well. XSLT is probably more interesting to developers who want to work with the HTML vocabulary because XSL is largely about the creation of a markup vocabulary to replace HTML for formatting.

Note This article shows you what XSL has to offer XHTML developers, but it's not a full-scale XSL tutorial. XSL, even just XSLT, is an enormous subject worthy of its own book-length treatment. You may want to explore Elliotte Rusty Harold's XML Bible (IDG Books, 1999) for a thorough introduction to XPath and XSLT. Ken Holman has a complete set of training materials available through http://www.cranesoftwrights.com/training/index.htm; the first and last two chapters are available as free downloads. The XSL specification is available at http://www.w3.org/TR/xsl/. The XSLT specification, which you apply in this article, is available at http://www.w3.org/TR/xslt. The XPath specification, which XSLT uses, is available at http://www.w3.org/TR/xpath. If you need XSL help, the XSL-List (at http://www.mulberrytech.com/xsl/xsl-list/index.html) is a great place to start.

Introduction to XSL

While XSL has been much slower in development than XML, the ideas behind it coalesced around the same time as XML itself. XML's roots are in SGML, while XSL's roots are in a styling language for SGML: the Document Style Semantics and Specification Language (DSSSL). While XML was largely a simplification of SGML, XSL has proven more of an inheritor and reinterpreter of DSSSL. In effect, XSL makes some aspects of DSSSL that hadn't received much use (transformations) more central to the project and reconciles DSSSL's model for document formatting to some extent with the W3C's Cascading Style Sheets (CSS).

XSL is in some ways a competitor to CSS, although its proponents consider them different enough that they don't compete officially. While CSS describes formatting for particular structures within a document, XSL describes a transformation from the original document to a set of formatting objects, possibly reorganizing, filtering, or even discarding the original structures along the way. While CSS is annotative, XSL is transformative. CSS works well in environments in which documents are either static or generated by code that isn't format-specific; XSL, on the other hand, assumes that it has much more work to do in building a document.

Extensible Stylesheet Language Transformations (XSLT) processors take an XML document as input (the source tree) and create a result tree based on the template rules provided in the style sheet. That result tree may contain XSL Formatting Objects (often called XSL-FOs) or it may contain other information, typically HTML or XHTML. The output is already a tree, and the XSLT processor has to reserialize it anyway, so converting it to XHTML is easy. Effectively, XSLT provides a simple way to convert XML documents to XHTML, making it easy to present content from XML documents to browsers that know nothing of XML itself. (XSLT input documents must be XML, so you can't use this tool on HTML that doesn't conform to XHTML's rules.)

Note XSL Formatting Objects provide an explicit XML vocabulary for describing formatted text. XSL-FOs are still in development at the W3C and haven't received wide implementation (at least in browsers) yet. For the latest specification, see http://www.w3.org/TR/xsl/. When they're ready, XSL will provide a complete formatting solution, using XSLT transformations to convert documents into formatting objects describing information presentation.

The other possibility that XSLT opens, but which isn't implemented widely yet, is sending XML information to clients, which then perform the XSLT transformation locally. Most servers process many simultaneous requests while most browsers sit more or less idle, so moving the transformation to the client redistributes the processing and improves server response. So far, however, Microsoft is the only vendor actively pursuing this strategy, and the old version of XSLT that Microsoft currently supports is decidedly different from the standard. (The Mozilla project is pursuing standard XSLT support, although that remains in the early stages.) For now, most XSLT processing has to take place on the server, where developers have more control over the environment.

XSLT processing is fairly resource-intensive, requiring the construction of object trees in memory. This can become a burden for servers that process large numbers of requests or process very large documents. There are several strategies for avoiding this bottleneck – from buying more hardware, to sending processing to the client when possible, to aggressively caching the result documents produced by transformations in order to avoid processing the same document and style sheet combinations repeatedly. In some cases, batch processing can make conversions before users actually retrieve the files and keep the server load minimal.

Note XSLT is new enough that it isn't a standard feature of most server environments yet, although this is changing slowly. There are a number of different XSLT processors available, most of which conform to the W3C Recommendation closely. Many of them are freely distributed or open source, requiring only some integration with your processing environment. For a list of XSLT processors, see http://www.xslinfo.com/. News on recent developments in XSLT is available at http://xmlhack.com/list.php?cat=2.

Basic Transformation Principles

XSLT style sheets are XML documents that combine an XSLT vocabulary with the vocabulary that the information is transformed into – in this case, XHTML. (In some cases, an extension vocabulary for a particular processor also may appear.) The XSLT vocabulary defines the rules for processing, while the other vocabulary provides parts and structures that are assembled into the result document.

Preliminaries

An xsl:stylesheet element can contain the entire style sheet:

   <?xml  version="1.0"?>
   <xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   version="1.0">
   ...stylesheet...
   </xsl:stylesheet>

An xsl:transform element used the same way may be substituted for xsl:stylesheet; technically, neither of these elements is necessary. It's also a good idea to define any namespaces you plan to use in the result document here. An XSLT style sheet can be any XML document, and only elements using the XSL namespace are processed. Despite that incredible flexibility, let's stick to a more conservative approach.

The next piece you need for XHTML creation is the xsl:output element, which enables you to specify the type of output you're creating and provides access to the DOCTYPE declaration. Note that doctype-public takes the formal public identifier and doctype-system takes the URL of the DTD:

   <xsl:output method="xml" indent="yes"
   doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
   doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
   omit-xml-declaration="yes" />

While most XSLT processors provide an html output method, this leaves off end tags (or empty tags) for empty elements and may leave off end tags for some elements with content. Using the xml output method must suffice until developers begin supporting XHTML explicitly.
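
The remark that neither wrapper element is strictly necessary refers to the simplified form that XSLT 1.0 allows: the root element of the result document carries an xsl:version attribute, and the whole file acts as a single template for the root node. The following is only a minimal sketch of that form; the page content is made up for illustration and does not come from the examples in this article.

```xml
<?xml version="1.0"?>
<!-- Simplified style sheet: the result document's root element doubles as the style sheet. -->
<!-- xsl:version on the root element takes the place of an xsl:stylesheet wrapper. -->
<html xsl:version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Element count</title>
  </head>
  <body>
    <!-- XSLT instructions may still appear anywhere inside the literal markup -->
    <p>The source document contains
       <xsl:value-of select="count(//*)"/> elements.</p>
  </body>
</html>
```

Because this form has no place for top-level elements such as xsl:output, it offers no control over the DOCTYPE declaration, which is one practical reason to stay with an explicit xsl:stylesheet element for XHTML work.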
  

Caution While XSL processors produce valid XHTML using the xml setting, they don't do things such as insert the space before the /> of an empty tag, so they produce <br/> instead of <br />. You can do a search-and-replace after the transformation to add the space, or add dummy attributes (like class="") to keep older browsers from choking on the empty tags. If you're doing batch processing, rather than generating files on the fly, you also can use the Tidy program (described in Article 10) on the results to add the needed space.

The indent attribute is handy if you want to produce more readable markup, but it doesn't have much effect on the output seen in the browser window because of the way HTML and XHTML discard extra whitespace.

The next two attributes, doctype-public and doctype-system, are critical if you're creating strictly conforming XHTML because they enable you to specify the public and system identifiers of the XHTML vocabulary you're using. The example here uses the identifiers for the XHTML 1.0 strict DTD, but you can replace these with values for the transitional or frameset DTDs or with XHTML 1.1 (and beyond) identifiers when they become available.
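
As a sketch of swapping in a different DTD, the fragment below uses the XHTML 1.0 Transitional identifiers. The formal public identifier goes in doctype-public and the DTD's URL goes in doctype-system; the surrounding xsl:stylesheet element is included only so the fragment is well-formed on its own.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- XHTML 1.0 Transitional instead of Strict -->
  <xsl:output method="xml" indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
      omit-xml-declaration="yes"/>
  <!-- template rules go here -->
</xsl:stylesheet>
```

For the frameset DTD, the corresponding pair is "-//W3C//DTD XHTML 1.0 Frameset//EN" and "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd".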

The last attribute, omit-xml-declaration, keeps the XML declaration from appearing at the front of the document when its value is set to yes. If you generate XHTML that has to go to a wide range of browsers, particularly older browsers that sometimes display the XML declaration at the top of the screen, this is probably a good idea. If you're less interested in backward compatibility and more interested in forward compatibility with more character encodings for internationalization, you should set this value to no.
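
If you do keep the XML declaration, a configuration along the following lines is one possibility. The encoding attribute is a standard part of xsl:output in XSLT 1.0; the choice of UTF-8 here is an assumption made for illustration, not something this article prescribes.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <!-- Keep the XML declaration and state the output encoding explicitly -->
  <xsl:output method="xml" indent="yes"
      encoding="UTF-8"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
      omit-xml-declaration="no"/>
  <!-- template rules go here -->
</xsl:stylesheet>
```

With these settings the serialized result should begin with an XML declaration that names the encoding, followed by the DOCTYPE declaration.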

Creating the result document

Now that you've specified the overall form of the result document, you need to start describing its content. XSLT enables you to specify content using a mix of the result document and XSLT-specific elements and attributes that build the document from information in the source document. XSLT provides some default behavior that sets the processor to explore the document tree until it finds a match, and a rule that copies the text of nodes. For the first example, you override those rules and create a style sheet that completely ignores the content of the source tree and just produces XHTML:

   <?xml version="1.0"?>
   <xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   version="1.0">
   <xsl:output method="xml"  indent="yes"
   doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
   doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
   omit-xml-declaration="yes" />
   <xsl:template match="/">
   <html  xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US"  lang="en-US">
   <head>
   <title>Hello World!</title>
   </head>
   <body>
   <h1>Hello World!</h1>
   <p>Hello World!</p>
   </body>
   </html>
   </xsl:template>
   </xsl:stylesheet>

The output looks like this:

   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   <html xml:lang="en-US"  lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
   <head>
   <title>Hello World!</title>
   </head>
   <body>
   <h1>Hello World!</h1>
   <p>Hello World!</p>
   </body>
 </html>

While this simple example isn't convincing, it does provide a foundation for future work. The output is notable for several things, including the omission of the XML declaration (which you said you don't want) and the DOCTYPE, which you set. Also notable is the change in the sequence of attributes on the html element – attribute order isn't considered important in XML, HTML, or XHTML, and XSLT doesn't preserve it either.

The xsl:template element does the real work; its match attribute specifies the content to which the template applies, and its contents specify the results that should be included. Because you are replacing the entire document, you match against the root node (/, an XPath expression). The contents of this template then become the result document.

Although they don't appear in the style sheet, there are also default rules built into XSLT (described in Section 5.8 of the XSLT specification) that get tested – but only if none of the explicit rules match. The first is this:

   <xsl:template  match="*|/">
   <xsl:apply-templates/>
 </xsl:template>

The match attribute uses XPath notation to say that the template should apply to any element (*) or (|) to the root node (/) of the document. The xsl:apply-templates element inside the xsl:template element tells the XSLT processor to process the children of the current node, checking each for templates that apply. This allows recursive processing of documents because the explicit rules provided in the style sheet can begin with content further into the document than the root node, and some content may be skipped. The second rule is normally this:

   <xsl:template  match="text()|@*">
   <xsl:value-of  select="."/>
 </xsl:template>

By default, this applies to all text nodes (text()) and the contents of all attributes (@*) and includes their content in the document. The xsl:value-of element retrieves that information, using the select attribute value (.) to get the content from the current node. (There is also a default rule that drops processing instructions and comments from the original document.)
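
The rule for processing instructions and comments mentioned above can be written out in the same way, and the same pattern offers a common way to silence the text rule. The sketch below shows both; the second template is an optional override, not one of the built-in rules.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <!-- Built-in rule (Section 5.8): an empty template, so processing
       instructions and comments contribute nothing to the result -->
  <xsl:template match="processing-instruction()|comment()"/>

  <!-- Optional override: an empty template for text nodes stops the
       built-in text rule from copying text into the result -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

An empty text() rule affects only text reached through xsl:apply-templates; xsl:value-of still retrieves text directly. A style sheet that depends on xsl:apply-templates to copy element content into the result would lose that content with this override in place.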

Note The XSLT implementation in Internet Explorer 5.5, apart from using a slightly different syntax, also doesn't support these default rules. Future versions may provide better support.

On just this small foundation, you can create some XSLT style sheets that do real work. You can take a simple XML document and convert it into an XHTML table. Start with an XML document describing a set of articles:

   <catalog>
   <article>
   <author>Simon  St.Laurent</author>
   <title>XML Elements of  Style</title>
   <pubyear>2000</pubyear>
   <publisher>McGraw-Hill</publisher>
   <isbn>0-07-212220-X</isbn>
   <price>$29.99</price>
   </article>
   <article>
   <author>Elliotte Rusty  Harold</author>
   <title>XML Bible</title>
   <pubyear>1999</pubyear>
   <publisher>IDG Articles</publisher>
   <isbn>0764532367</isbn>
   <price>$49.99</price>
   </article>
   <article>
   <author>Robert Eckstein</author>
   <title>XML Pocket Reference</title>
   <pubyear>1999</pubyear>
   <publisher>O'Reilly and  Associates</publisher>
   <isbn>1-56592-709-5</isbn>
   <price>$8.95</price>
   </article>
   <article>
   <author>Kevin Dick</author>
   <title>XML: A Manager's  Guide</title>
   <pubyear>1999</pubyear>
   <publisher>Addison-Wesley</publisher>
   <isbn>0201433354</isbn>
   <price>$29.95</price>
   </article>
   <article>
   <author>Simon St.Laurent</author>
   <title>XML: A Primer, 2nd  Ed.</title>
   <pubyear>1999</pubyear>
   <publisher>IDG Articles</publisher>
   <isbn>0-7645-3310-X</isbn>
   <price>$19.99</price>
   </article>
   <article>
   <author>Simon St.Laurent</author>
   <title>Building XML  Applications</title>
   <pubyear>1999</pubyear>
   <publisher>McGraw-Hill</publisher>
   <isbn>0-07-134116-1</isbn>
   <price>$49.99</price>
   </article>
 </catalog>

The style sheet includes a rule to build the HTML document as a whole, including a table element, and then rules to build rows and cells:

   <?xml version="1.0"?>
   <xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns="http://www.w3.org/1999/xhtml"
   version="1.0">
   <xsl:output method="xml" indent="yes"
   doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
   doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
   omit-xml-declaration="yes" />
   <xsl:template match="/">
   <html  xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US"  lang="en-US">
   <head>
   <title>Catalog</title>
   </head>
   <body>
   <h1>Articles</h1>
   <table>
   <xsl:apply-templates/>
   </table>
   </body>
   </html>
   </xsl:template>
   <xsl:template match="article">
   <tr><xsl:apply-templates/></tr>
   </xsl:template>
   <xsl:template match="article/*">
   <td><xsl:apply-templates/></td>
   </xsl:template>
   </xsl:stylesheet>

The first rule matches the root node (/) and builds an XHTML document framework, just like the previous example. In this case, however, the rule adds a table element and includes an xsl:apply-templates element to let the XSLT processor build the table from the rest of the document. The second rule matches any article elements it encounters and builds table rows (tr elements) to contain their content. Again, xsl:apply-templates lets the processor continue to work on the contents of the article element. The last rule matches any child element of any article element (article/*) and enables you to avoid the task of creating rules for the author, title, pubyear, and other elements specifically. These become the table cells (td elements), and xsl:apply-templates is applied yet again. The transformation produces this XHTML:

   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   <html xml:lang="en-US"  lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
   <head>
   <title>Catalog</title>
   </head>
   <body>
   <h1>Articles</h1>
   <table>
   <tr>
   <td>Simon St.Laurent</td>
   <td>XML Elements of Style</td>
   <td>2000</td>
   <td>McGraw-Hill</td>
   <td>0-07-212220-X</td>
   <td>$29.99</td>
   </tr>
   <tr>
   <td>Elliotte Rusty Harold</td>
   <td>XML Bible</td>
   <td>1999</td>
   <td>IDG Articles</td>
   <td>0764532367</td>
   <td>$49.99</td>
   </tr>
   <tr>
   <td>Robert Eckstein</td>
   <td>XML Pocket Reference</td>
   <td>1999</td>
   <td>O'Reilly and Associates</td>
   <td>1-56592-709-5</td>
   <td>$8.95</td>
   </tr>
   <tr>
   <td>Kevin Dick</td>
   <td>XML: A Manager's Guide</td>
   <td>1999</td>
   <td>Addison-Wesley</td>
   <td>0201433354</td>
   <td>$29.95</td>
   </tr>
   <tr>
   <td>Simon St.Laurent</td>
   <td>XML: A Primer, 2nd Ed.</td>
   <td>1999</td>
   <td>IDG Articles</td>
   <td>0-7645-3310-X</td>
   <td>$19.99</td>
   </tr>
   <tr>
   <td>Simon St.Laurent</td>
   <td>Building XML Applications</td>
   <td>1999</td>
   <td>McGraw-Hill</td>
   <td>0-07-134116-1</td>
   <td>$49.99</td>
   </tr>
   </table>
   </body>
   </html>
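
The catch-all article/* rule copies every child of article into the row in document order. If you want only some columns, or a different column order, one alternative is to pull values out explicitly with xsl:value-of. The fragment below is a sketch of that variation; the choice of title, author, and price as the columns is arbitrary, and the root rule and xsl:output element are assumed to stay as in the catalog style sheet.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="1.0">

  <!-- xsl:output and the root (/) rule are unchanged and omitted here -->

  <!-- Replaces both the article rule and the article/* rule:
       each cell names the child element it wants -->
  <xsl:template match="article">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="author"/></td>
      <td><xsl:value-of select="price"/></td>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```

The trade-off is that this version no longer adapts when new child elements are added to article; the catch-all rule does.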
This example is pretty simple because it doesn't need to create or access any attributes. To make these entries referenceable, use the ISBN of each article to create an id attribute on the table row. You only need to change the rule that handles the article element, although you reach into the isbn element to create the attribute.
   <xsl:template match="article">
   <tr>
   <xsl:attribute name="id">
   <xsl:value-of select="./isbn"/>
   </xsl:attribute>
   <xsl:apply-templates/></tr>
   </xsl:template>
The xsl:attribute element enables you to add attributes to the current element – in this case the tr element. The xsl:value-of element fills in the content based on its select attribute's value. The select attribute's value is "./isbn", meaning to start from the current source tree node and find a child isbn element. The xsl:apply-templates element then lets the rest of the processing continue as usual. The new entries in the table now look like this:
   <tr id="0-07-212220-X">
   <td>Simon St.Laurent</td>
   <td>XML Elements of Style</td>
   <td>2000</td>
   <td>McGraw-Hill</td>
   <td>0-07-212220-X</td>
   <td>$29.99</td>
   </tr>
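
One caveat about the id values above: XHTML declares id as an XML ID, and an ID must begin with a letter or underscore, so a value starting with a digit will not validate against the strict DTD. A possible workaround, sketched here, is to add a prefix using an attribute value template, where an XPath expression inside curly braces is evaluated in a literal attribute. The "isbn-" prefix is an arbitrary choice for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="1.0">

  <!-- xsl:output, the root (/) rule, and the article/* rule are unchanged and omitted -->

  <!-- {isbn} is replaced by the string value of the article's isbn child -->
  <xsl:template match="article">
    <tr id="isbn-{isbn}">
      <xsl:apply-templates/>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```

For the first catalog entry this yields a row that opens with <tr id="isbn-0-07-212220-X">. The attribute value template is also a more compact alternative to xsl:attribute when the attribute name is fixed.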
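
Similarly, you can reach into attributes for their values using XPath's @name syntax for referencing attributes. Remember that the default rules built into XSLT copy attribute values into your content whenever templates are applied to attribute nodes (for example, through xsl:apply-templates with select="@*"). An xsl:apply-templates element with no select attribute visits only child nodes, and attributes aren't children, so they are not reached that way. If your style sheet does apply templates to attributes, you may want to override this behavior, as shown here:

   <xsl:template  match="@*">
   </xsl:template>

This lets the default rule for text apply, but it prevents attributes from showing up.

XSLT offers an enormous number of options that build on these basic structures and enable you to sort, recombine, split up, or modify your content.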
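
As one example of those options, sorting can be added to the catalog style sheet with a single extra rule. The sketch below assumes the root rule still calls xsl:apply-templates inside the table element, so the catalog element is matched here and hands its article children to the existing article rule in sorted order. The sort keys are an arbitrary choice.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="1.0">

  <!-- xsl:output and the root, article, and article/* rules are unchanged and omitted -->

  <!-- Newest publication year first, then by title within a year -->
  <xsl:template match="catalog">
    <xsl:apply-templates select="article">
      <xsl:sort select="pubyear" data-type="number" order="descending"/>
      <xsl:sort select="title"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

With the sample catalog, the 2000 entry comes first, followed by the 1999 entries ordered by title. Selecting article explicitly also skips the whitespace text nodes between article elements.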
