While XML has inherited an enormous amount of familiar infrastructure from the world of Web
development, its SGML ancestry has brought with it some tools and innovations that may seem
strikingly unfamiliar to Web developers. Extensible Stylesheet Language (XSL) and Extensible
Stylesheet Language Transformations (XSLT) originally were developed as industrial-strength
formatting tools, but they have application to XHTML work as well. XSLT is probably more interesting to
developers who want to work with the HTML vocabulary because XSL is largely about the creation of a
markup vocabulary to replace HTML for formatting.
Note
This article shows you what XSL has to offer XHTML developers, but it's not a
full-scale XSL tutorial. XSL, even just XSLT, is an enormous subject worthy of its
own book-length treatment. You may want to explore Elliotte Rusty Harold's XML
Bible (IDG Books, 1999) for a thorough introduction to XPath and XSLT. Ken
Holman has a complete set of training materials available through
http://www.cranesoftwrights.com/training/index.htm; the first
and last two articles are available as free downloads.
The XSL specification is available at http://www.w3.org/TR/xsl/. The XSLT
specification, which you apply in this article, is available at
http://www.w3.org/TR/xslt. The XPath specification, which XSLT uses, is
available at http://www.w3.org/TR/xpath. If you need XSL help, the XSL-List (at
http://www.mulberrytech.com/xsl/xsl-list/index.html) is a great place to start.
Introduction to XSL
While XSL has been much slower in development than XML, the ideas behind it coalesced around the
same time as XML itself. XML's roots are in SGML, while XSL's roots are in a styling language for
SGML – the Document Style Semantics and Specification Language (DSSSL). While XML was largely a
simplification of SGML, XSL has proven more of an inheritor and reinterpreter of DSSSL. In effect, XSL
makes some aspects of DSSSL that hadn't received much use (transformations) more central to the
project and reconciles DSSSL's model for document formatting to some extent with the W3C's
cascading style sheets.
XSL is in some ways a competitor to CSS, although its proponents consider them different enough that
they don't compete officially. While CSS describes formatting for particular structures within a document, XSL describes a transformation from the original document to a set of formatting objects –
possibly reorganizing, filtering, or even discarding the original structures along the way. While CSS is
annotative, XSL is transformative. CSS works well in environments in which documents are either static
or generated by code that isn't format-specific; XSL, on the other hand, assumes that it has much more
work to do in building a document.
Extensible Stylesheet Language Transformations (XSLT) processors take an XML document as
input, the source tree, and create a result tree based on the template rules provided in the style sheet.
That result tree may contain XSL Formatting Objects (often called XSL-FOs) or it may contain other
information, typically HTML or XHTML. The output is already a tree, and the XSLT processor has to
reserialize it anyway, so converting it to XHTML is easy. Effectively, XSLT provides a simple way to
convert XML documents to XHTML, making it easy to present content from XML documents to browsers
that know nothing of XML itself. (XSLT input documents must be XML – you can't use this tool on HTML
that doesn't conform to XHTML's rules.)
Note
XSL Formatting Objects provide an explicit XML vocabulary for describing
formatted text. XSL-FOs are still in development at the W3C and haven't
received wide implementation (at least in browsers) yet. For the latest
specification, see http://www.w3.org/TR/xsl/. When they're ready, XSL
will provide a complete formatting solution, using XSLT transformations to
convert documents into formatting objects describing information presentation.
The other possibility that XSLT opens, but which isn't implemented widely yet, is sending XML
information to clients, which then perform the XSLT transformation locally. Most servers process
multiple simultaneous requests, while most browsers are more or less idle, so this approach redistributes
processing for better server response. So far, however, Microsoft is the only vendor actively pursuing
this strategy, and the old version of XSLT that Microsoft currently supports is decidedly different from the
standard. (The Mozilla project is pursuing standard XSLT support, although that remains in the early
stages.) For now, most XSLT processing has to take place on the server, where developers have more
control over the environment.
XSLT processing is fairly resource-intensive, requiring the construction of object trees in memory. This
can become a burden for servers that process large numbers of requests or process very large
documents. There are several strategies for avoiding this bottleneck – from buying more hardware, to
sending processing to the client when possible, to aggressively caching the result documents produced
by transformations in order to avoid processing the same document and style sheet combinations
repeatedly. In some cases, batch processing can make conversions before users actually retrieve the
files and keep the server load minimal.
Note
XSLT is new enough that it isn't a standard feature of most server environments
yet, although this is changing slowly. There are a number of different XSLT
processors available, most of which conform to the W3C Recommendation
closely. Many of them are freely distributed or open source, requiring only some
integration with your processing environment. For a list of XSLT processors, see
http://www.xslinfo.com/. News on recent developments in XSLT is
available at http://xmlhack.com/list.php?cat=2.
Basic Transformation Principles
XSLT style sheets are XML documents that combine an XSLT vocabulary with the vocabulary that the
information is transformed into – in this case, XHTML. (In some cases, an extension vocabulary for a
particular processor also may appear.) The XSLT vocabulary defines the rules for processing, while the
other vocabulary provides parts and structures that are assembled into the result document.
Preliminaries
An xsl:stylesheet element can contain the entire style sheet:
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
...stylesheet...
</xsl:stylesheet>
An xsl:transform element used the same way may be substituted for xsl:stylesheet.
Technically, neither of these elements is necessary: a style sheet can also be an ordinary XML document
whose root is a literal result element carrying an xsl:version attribute, in which case only elements and
attributes in the XSLT namespace are treated as instructions. Despite that flexibility, let's stick to a
more conservative approach. It's also a good idea to declare any namespaces you plan to use in the
result document on the xsl:stylesheet element.
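As a rough illustration of the case in which neither wrapper element appears, the sketch below writes an entire style sheet as a single literal result element. The xsl:version attribute on the root element is what XSLT 1.0 requires for this simplified form; the page content and the count(//*) expression are assumptions chosen only so that the example works with any source document.

```xml
<?xml version="1.0"?>
<!-- Simplified style sheet: the root is a literal result element,
     and xsl:version marks the document as an XSLT style sheet. -->
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xsl:version="1.0"
      xml:lang="en-US" lang="en-US">
  <head>
    <title>Element count</title>
  </head>
  <body>
    <!-- Only elements in the XSLT namespace are treated as instructions;
         everything else is copied to the result tree. -->
    <p>Elements in the source document:
      <xsl:value-of select="count(//*)"/>
    </p>
  </body>
</html>
```

This form behaves like a style sheet containing one template that matches the root node. It has no place for top-level XSLT elements such as xsl:output, which is one practical reason to prefer the xsl:stylesheet wrapper.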
The next piece you need for XHTML creation is the xsl:output element, which enables you to specify
the type of output you're creating and provides access to the DOCTYPE declaration.
<xsl:output method="xml" indent="yes"
    doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
    doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
    omit-xml-declaration="yes" />
While most XSLT processors provide an html output method, this leaves off end tags (or empty tags)
for empty elements and may leave off end tags for some elements with content. Using the xml output
method must suffice until developers begin supporting XHTML explicitly.
Note
While XSL processors produce valid XHTML using the xml setting, they don't
do things such as insert the space before the /> of an empty tag, so they produce
<br/> instead of <br />. You can do a search-and-replace after the
transformation to add the space, or add dummy attributes (like class="") to
keep older browsers from choking on the empty tags.
Creating the result document
Now that you've specified the overall form of the result document, you need to start describing its
content. XSLT enables you to specify content using a mix of result-document markup and XSLT-specific
elements and attributes that build the document from information in the source document. XSLT
provides some default behavior that sets the processor to explore the document tree until it finds a
match, and a rule that copies the text of nodes. For the first example, you override those rules and
create a style sheet that completely ignores the content of the source tree and just produces XHTML:
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="xml" indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
      omit-xml-declaration="yes" />
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
      <head>
        <title>Hello World!</title>
      </head>
      <body>
        <h1>Hello World!</h1>
        <p>Hello World!</p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
The output looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="en-US" lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Hello World!</title>
  </head>
  <body>
    <h1>Hello World!</h1>
    <p>Hello World!</p>
  </body>
</html>
While this simple example isn't convincing, it does provide a foundation for future work. The output is
notable for several things, including the proper handling of the XML declaration (which you said you
don't want) and the DOCTYPE, which you set. Also notable is the change in the sequence of attributes
on the html element – attribute order isn't considered important in XML, HTML, or XHTML, and XSLT
doesn't preserve it either.
The xsl:template element does the real work; it specifies both the content to which it should be
applied and the results that should be included. Because you just replaced the entire document, you
match against the root node (/, an XPath expression) of the source tree. The body of the template then
stands in for the whole source document, and the output is generated from it.
Although they don't appear in the style sheet, there are also default rules built into XSLT (described in
Section 5.8 of the XSLT specification) that are applied, but only when no explicit rule matches a node.
The first is this:
<xsl:template match="*|/">
  <xsl:apply-templates/>
</xsl:template>
The match attribute uses XPath notation to say that the template should apply to any element (*) or (|)
to the root node (/) of the document. The xsl:apply-templates element inside the
xsl:template element tells the XSLT processor to process the children of the current node, looking for
templates that apply to each of them. This allows recursive processing of documents
because the explicit rules provided in the style sheet can begin with content further into the document
than the root node, and some content may be skipped.
The second rule is normally this:
<xsl:template match="text()|@*">
  <xsl:value-of select="."/>
</xsl:template>
This rule applies to all text nodes (text()) and attribute nodes (@*) that the processor visits, and it
copies their values into the result document. The xsl:value-of element retrieves that information, using
the select attribute value (.) to get the content from the current node. (There is also a default rule that
drops processing instructions and comments from the original document.)
Note
The XSLT implementation in Internet Explorer 5.5, apart from using a slightly
different syntax, also doesn't support these default rules. Future versions may
provide better support.
On just this small foundation, you can create some XSLT style sheets that do real work. You can take a
simple XML document and convert it into an XHTML table. Start with an XML document describing a set
of articles:
<catalog>
  <article>
    <author>Simon St.Laurent</author>
    <title>XML Elements of Style</title>
    <pubyear>2000</pubyear>
    <publisher>McGraw-Hill</publisher>
    <isbn>0-07-212220-X</isbn>
    <price>$29.99</price>
  </article>
  <article>
    <author>Elliotte Rusty Harold</author>
    <title>XML Bible</title>
    <pubyear>1999</pubyear>
    <publisher>IDG Articles</publisher>
    <isbn>0764532367</isbn>
    <price>$49.99</price>
  </article>
  <article>
    <author>Robert Eckstein</author>
    <title>XML Pocket Reference</title>
    <pubyear>1999</pubyear>
    <publisher>O'Reilly and Associates</publisher>
    <isbn>1-56592-709-5</isbn>
    <price>$8.95</price>
  </article>
  <article>
    <author>Kevin Dick</author>
    <title>XML: A Manager's Guide</title>
    <pubyear>1999</pubyear>
    <publisher>Addison-Wesley</publisher>
    <isbn>0201433354</isbn>
    <price>$29.95</price>
  </article>
  <article>
    <author>Simon St.Laurent</author>
    <title>XML: A Primer, 2nd Ed.</title>
    <pubyear>1999</pubyear>
    <publisher>IDG Articles</publisher>
    <isbn>0-7645-3310-X</isbn>
    <price>$19.99</price>
  </article>
  <article>
    <author>Simon St.Laurent</author>
    <title>Building XML Applications</title>
    <pubyear>1999</pubyear>
    <publisher>McGraw-Hill</publisher>
    <isbn>0-07-134116-1</isbn>
    <price>$49.99</price>
  </article>
</catalog>
The style sheet includes a rule to build the XHTML document as a whole, including a table element, and
then rules to build rows and cells:
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="1.0">
  <xsl:output method="xml" indent="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
      omit-xml-declaration="yes" />
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
      <head>
        <title>Catalog</title>
      </head>
      <body>
        <h1>Articles</h1>
        <table>
          <xsl:apply-templates/>
        </table>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="article">
    <tr><xsl:apply-templates/></tr>
  </xsl:template>
  <xsl:template match="article/*">
    <td><xsl:apply-templates/></td>
  </xsl:template>
</xsl:stylesheet>
The first rule matches the root node (/) and builds an XHTML document framework, just like the
previous example. In this case, however, the rule adds a table element and includes an
xsl:apply-templates element to let the XSL processor build the table from the rest of the document.
The second rule matches any article elements it encounters and builds table rows (tr elements) to
contain their content. Again, xsl:apply-templates lets the processor continue to work on the
contents of the article element.
The last rule matches any child element of any article element (article/*) and enables you to avoid the
task of creating rules for the author, title, pubyear, and other elements specifically. These become
the table cells (td elements), and xsl:apply-templates is applied yet again.
The XSL-generated XHTML, which a Web browser displays as a table, looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xml:lang="en-US" lang="en-US" xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Catalog</title>
  </head>
  <body>
    <h1>Articles</h1>
    <table>
      <tr>
        <td>Simon St.Laurent</td>
        <td>XML Elements of Style</td>
        <td>2000</td>
        <td>McGraw-Hill</td>
        <td>0-07-212220-X</td>
        <td>$29.99</td>
      </tr>
      <tr>
        <td>Elliotte Rusty Harold</td>
        <td>XML Bible</td>
        <td>1999</td>
        <td>IDG Articles</td>
        <td>0764532367</td>
        <td>$49.99</td>
      </tr>
      <tr>
        <td>Robert Eckstein</td>
        <td>XML Pocket Reference</td>
        <td>1999</td>
        <td>O'Reilly and Associates</td>
        <td>1-56592-709-5</td>
        <td>$8.95</td>
      </tr>
      <tr>
        <td>Kevin Dick</td>
        <td>XML: A Manager's Guide</td>
        <td>1999</td>
        <td>Addison-Wesley</td>
        <td>0201433354</td>
        <td>$29.95</td>
      </tr>
      <tr>
        <td>Simon St.Laurent</td>
        <td>XML: A Primer, 2nd Ed.</td>
        <td>1999</td>
        <td>IDG Articles</td>
        <td>0-7645-3310-X</td>
        <td>$19.99</td>
      </tr>
      <tr>
        <td>Simon St.Laurent</td>
        <td>Building XML Applications</td>
        <td>1999</td>
        <td>McGraw-Hill</td>
        <td>0-07-134116-1</td>
        <td>$49.99</td>
      </tr>
    </table>
  </body>
</html>
This example is pretty simple because it doesn't need to create or access any attributes. To make these
entries referenceable, use the ISBN of each article to create an id attribute on the table row. You only
need to change the rule that handles the article element, although you reach into the isbn element to
create the attribute.
<xsl:template match="article">
  <tr>
    <xsl:attribute name="id">
      <xsl:value-of select="./isbn"/>
    </xsl:attribute>
    <xsl:apply-templates/>
  </tr>
</xsl:template>
The xsl:attribute element enables you to add attributes to the current element – in this case the tr
element. The xsl:value-of element fills in the content based on its select attribute's value. The
select attribute's value is "./isbn", meaning to start from the current source tree node and find a
child isbn element. The xsl:apply-templates element then lets the rest of the processing continue
as usual. The new entries in the table now look like this:
<tr id="0-07-212220-X">
  <td>Simon St.Laurent</td>
  <td>XML Elements of Style</td>
  <td>2000</td>
  <td>McGraw-Hill</td>
  <td>0-07-212220-X</td>
  <td>$29.99</td>
</tr>
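For attributes whose names are fixed, XSLT also allows a shorter notation, the attribute value template, in which an XPath expression in curly braces is written directly inside a literal attribute. The sketch below is one possible variation on the rule above, trimmed to the table itself. The isbn- prefix is an assumption added here because XHTML declares id as an XML ID, and an ID may not begin with a digit.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="1.0">
  <xsl:template match="/">
    <table>
      <xsl:apply-templates select="catalog/article"/>
    </table>
  </xsl:template>
  <!-- The braces mark an attribute value template: {isbn} is replaced by
       the string value of the article's first isbn child. The "isbn-"
       prefix is an assumption so that the id starts with a letter. -->
  <xsl:template match="article">
    <tr id="isbn-{isbn}">
      <xsl:apply-templates/>
    </tr>
  </xsl:template>
  <xsl:template match="article/*">
    <td><xsl:apply-templates/></td>
  </xsl:template>
</xsl:stylesheet>
```

The xsl:attribute form remains the better fit when the attribute's name or its presence has to be decided during the transformation.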
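Similarly, you can reach into attributes for their values using XPath's @name syntax for referencing
attributes. Remember that xsl:apply-templates without a select attribute visits only child nodes, and
attributes aren't children, so attribute values reach the output only when a rule selects them explicitly
(for example, with select="@*"). At that point the default rule built into XSLT copies their values. If you
select attributes but don't want those values copied, override the default rule with an empty template:
<xsl:template match="@*">
</xsl:template>
This lets the default rule for text apply, but it prevents attribute values from showing up unless another
rule handles them.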
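As a small sketch of the @name syntax, suppose each price element in the catalog carried a currency attribute, as in <price currency="USD">29.99</price>. That attribute is hypothetical and does not appear in the sample document; the style sheet below only shows how it could be read.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="1.0">
  <xsl:template match="/">
    <table>
      <xsl:apply-templates select="catalog/article"/>
    </table>
  </xsl:template>
  <xsl:template match="article">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td>
        <xsl:value-of select="price"/>
        <!-- xsl:text keeps the separating space, which would otherwise
             be stripped as whitespace-only text in the style sheet. -->
        <xsl:text> </xsl:text>
        <!-- price/@currency reads the hypothetical currency attribute
             of the price child; it yields an empty string if absent. -->
        <xsl:value-of select="price/@currency"/>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
```

Because the attribute is pulled in with xsl:value-of rather than reached through xsl:apply-templates, the empty template for @* shown above has no effect on it.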
XSLT offers an enormous number of options that build on these basic structures and enable you to sort,
recombine, split up, or modify your content.
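Sorting is a convenient first taste of those options. The sketch below, which assumes the catalog document shown earlier, places xsl:sort elements inside xsl:apply-templates so that rows appear newest first and, within a year, alphabetically by title. The choice of sort keys is only an illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml"
    version="1.0">
  <xsl:template match="/">
    <table>
      <!-- Select the article elements explicitly and sort them
           before the matching templates are applied. -->
      <xsl:apply-templates select="catalog/article">
        <!-- Primary key: publication year, compared as a number -->
        <xsl:sort select="pubyear" data-type="number" order="descending"/>
        <!-- Secondary key: title, compared as text (ascending by default) -->
        <xsl:sort select="title"/>
      </xsl:apply-templates>
    </table>
  </xsl:template>
  <xsl:template match="article">
    <tr><xsl:apply-templates/></tr>
  </xsl:template>
  <xsl:template match="article/*">
    <td><xsl:apply-templates/></td>
  </xsl:template>
</xsl:stylesheet>
```

The source document is left untouched; only the order in which the processor visits the selected nodes changes.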