Overview
Shifting from HTML to XHTML requires a significant change in mindset from the design-oriented free-for-all that characterized the early years of the Web. This change in style reflects movement in the underlying architecture toward a more powerful and more controllable approach to document creation, presentation, and management. Understanding the connections between the architectural and stylistic changes may help you find more immediate benefits from XHTML – even as the tools only start to catch up. Looking ahead to the possibilities that XHTML opens can assist you in planning a transition to more sophisticated Web applications.
From Presentation to Reprocessing and Interaction
HTML is designed to present users with reasonably attractive pages (although Web designers always can make them glorious or hideous) and to support a very simple level of interaction through forms and hyperlinks. The application logic that Web browsers support – at least on the level of HTML pages and scripts, not the extensions of Java applets, plug-ins, or ActiveX controls – is relatively simple. Applications designed for the Web tend to centralize their processing on the server, storing information in databases and using Web browsers as mere windows on the server's information. This makes it possible to use more sophisticated server-side facilities for security, processing, and connectivity. While dynamic HTML made Web browsers a more advanced interface capable of animated views of information, the forms interface remains the main way for users to manipulate information and enter new information. Some Web browsers enable users to edit HTML and send it back to a server; but the editor is more or less a separate application useful only for editing HTML, not general-purpose interaction with a server application.
XHTML provides a transition from the HTML model for Web applications to the more powerful and more flexible XML model. While XHTML applications will start out much like HTML applications, XHTML will enable application developers to integrate XML tools with the HTML vocabulary. XHTML is not merely a foil for XML's eventual takeover – it promises to keep the well-known HTML vocabulary alive in this new world.
Flows and Trees: HTML and XML Parsing
HTML and XML processors tend to treat the text they receive very differently. While both kinds of processors read a document from start to finish, HTML processors read HTML documents using HTML-specific rules. XML processors, by contrast, tend to parse documents more generically. Applications then apply their own logic to the results of the parse, without really participating in the parse itself. This separation requires that XML documents conform tightly to the XML specification because applications can't apply their own logic to provide loopholes or modify basic structures.
HTML parsers typically are built for one purpose: to read HTML. Whether the parser builds a browser view of the document, retrieves information for a search engine, or feeds a shopping agent information, HTML parsers need to know a lot about HTML's vocabulary. This crucial information includes a complex set of rules about which elements don't need end tags, how to properly end elements when end tags are omitted, and rules for dealing with some particularly tricky elements. The META element, for instance, defines its real purpose in an attribute and that purpose may influence the parsing process substantially for the rest of the document when things such as character encodings are declared. The INPUT element similarly uses an attribute to define its true purpose. It would require processors to keep track of a considerable amount of information to process a form correctly if INPUT elements are nested, so nesting INPUT elements is outlawed.
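The strictness described above can be seen with any generic XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the two markup fragments and the helper name is_well_formed are illustrative assumptions, not part of the original text.

```python
import xml.etree.ElementTree as ET

# XHTML-style markup: every element is explicitly closed.
well_formed = "<ul><li>one</li><li>two</li></ul>"

# HTML-style markup that relies on the rule that LI end tags may be omitted.
html_style = "<ul><li>one<li>two</ul>"


def is_well_formed(text):
    """Return True if a generic XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(well_formed))  # True
print(is_well_formed(html_style))   # False
```

The XML parser has no vocabulary knowledge it could use to guess where the first li ends, so it reports an error instead of repairing the document.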
As a result, HTML parsers tend to be tightly bonded to their particular applications, applying processing rules that make sense for their particular application. Search engines, for instance, usually discard all markup and focus on text – except for META elements that provide keyword information. Browsers need to collect as much information as possible from the parser, but they apply their own rules as to how markup transforms into document structures.
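As a rough illustration of the search-engine case, the sketch below uses Python's standard html.parser module to keep text and META keywords while discarding everything else. The class name IndexingParser, the index helper, and the sample page are assumptions made for this example; real indexers are far more elaborate.

```python
from html.parser import HTMLParser


class IndexingParser(HTMLParser):
    """Keeps text and META keywords, discards all other markup."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # Application-specific rule: META's purpose lives in its attributes.
        if tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "keywords":
                content = attr.get("content") or ""
                self.keywords += [k.strip() for k in content.split(",") if k.strip()]

    def handle_data(self, data):
        # Every other tag is ignored; only the text between tags is kept.
        self.words += data.split()


def index(html_text):
    parser = IndexingParser()
    parser.feed(html_text)
    parser.close()
    return parser.words, parser.keywords


page = """<html><head><meta name="keywords" content="xhtml, xml">
<title>Demo</title></head><body><p>Hello <b>markup</b> world</body></html>"""
words, keywords = index(page)
print(words)     # ['Demo', 'Hello', 'markup', 'world']
print(keywords)  # ['xhtml', 'xml']
```

Note that the sample page omits the P end tag; the HTML-aware parser tolerates this, which is exactly the kind of vocabulary knowledge the paragraph describes.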
While XML parsers have a similar job to do, they don't expect to see a particular vocabulary; hence they can't do the kind of interpretation that HTML parsers do. Instead of interpreting the flow of information with a sophisticated set of guidelines, XML parsers extract and report a tree structure that is described by the elements, attributes, text, and other information within the document markup. XML parsers rely on explicit markup structures in the document to determine what gets reported to the application, but they don't take orders from the application much beyond instructions for which file to parse. This loose connectivity makes it easy to use the same XML parser to interpret XHTML, MathML, SVG, or any other possible vocabularies and structures. Applications gain the option of processing information generically, which opens up a new set of architectures for handling information.
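To make the tree-reporting idea concrete, here is a small sketch in which one generic routine reports the structure of two unrelated vocabularies. It uses Python's standard xml.etree.ElementTree module; the function name outline and both sample documents are assumptions for illustration, and namespace declarations are left out to keep the fragments short.

```python
import xml.etree.ElementTree as ET


def outline(text):
    """Return (depth, element name) pairs for every element, in document order."""
    result = []

    def walk(element, depth):
        result.append((depth, element.tag))
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(text), 0)
    return result


xhtml = "<html><body><p>Hi <em>there</em></p></body></html>"
svg = "<svg><g><rect/><circle/></g></svg>"

print(outline(xhtml))  # [(0, 'html'), (1, 'body'), (2, 'p'), (3, 'em')]
print(outline(svg))    # [(0, 'svg'), (1, 'g'), (2, 'rect'), (2, 'circle')]
```

Nothing in outline knows what p or rect means; the same code handles both documents because the structure is fully explicit in the markup.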
Application Layers for XML Document Processing
XML's generic approach to markup opens numerous new possibilities for document handling, all of which you can use with XHTML. While it might seem counterintuitive that 'dumber' processors can lead to more powerful applications, XML's approach leaves more room for applications to solve a much wider range of problems.
Presenting documents
XML parsers don't make any assumptions about how information should be presented – they really can't because they don't interpret the vocabularies used in documents. P, B, EM, FONT, CITE, and everything else used in HTML are just names to an XML parser – nothing else. On the other hand, XML does provide a very clean set of structures on which presentation information can be layered to build the information needed by a browser.
Cascading Style Sheets (CSS) provide one set of tools for annotating document structures with rules for presentation. CSS includes a formal vocabulary for describing different types of presentation roles for elements (such as blocks, tables, or inline text) and details about how their content should be presented, from color to font family to font size. Extensible Stylesheet Language (XSL) is another possibility, as described in the following section, "Transforming documents." You can use CSS with both HTML and XML, but it is more important and easier to use with XML. When used with HTML, CSS supplements – and to some extent overrides – the rules for presenting particular elements. On the other hand, XML provides a clean slate on which CSS can operate. In fact, the CSS2 specification provides a "sample style sheet" for HTML that outlines a nearly complete set of presentation rules an XML application can use to render HTML. (See http://www.w3.org/TR/REC-CSS2/sample.html for details.)
XHTML offers the possibility of bridging these two approaches. When an HTML processor is used, it can understand the markup well enough to produce a rendering – with or without the assistance of the style sheet. When an XML processor is used, it can apply the rules in the style sheet to produce a rendering without having to understand the ins and outs of HTML. Developers who have relied on HTML's internal mechanisms for describing presentation (the FONT, B, I and other tags) may find it worthwhile to switch to the XML model. Separating the presentation description from the documents makes it much simpler to reuse formatting across a large number of documents (for example, building a consistent look without relying on templates).
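The idea of layering presentation rules on top of a structure the parser does not interpret can be sketched in a few lines. This is not CSS and does not implement selectors or the cascade; the RULES table and the styled function are hypothetical stand-ins for a style sheet and a renderer's lookup step.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for a style sheet:
# element name -> presentation properties.
RULES = {
    "p": {"display": "block"},
    "em": {"display": "inline", "font-style": "italic"},
    "b": {"display": "inline", "font-weight": "bold"},
}


def styled(text, rules):
    """Pair each element name with the properties the rule table assigns it."""
    return [(el.tag, rules.get(el.tag, {})) for el in ET.fromstring(text).iter()]


fragment = "<p>Some <em>emphasis</em> and <cite>a title</cite></p>"
for name, properties in styled(fragment, RULES):
    print(name, properties)
```

Because the rules live outside the document, the same table can be reused across many documents, and an element with no rule (cite here) simply receives no presentation information.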
Transforming documents
Because the structure of XML documents is defined tightly within the document, it's relatively easy to convert information from one vocabulary and structure to another. HTML documents typically are treated as final containers for information and used primarily for delivery to end users. You can use XML documents – and XHTML documents – as waystations for information, holding information in a particular form until the user wants to work with it in a different form.
A simple example of this is a set of information, such as a table storing financial results over a ten-year period. While reading a table is useful, being able to tell the application to "show me this information as a bar graph" also is handy. Right now, that process typically requires copying the information out of the HTML table, pasting it into an application that supports graphing, and then creating the graph. If the table is stored in XML or XHTML, you easily can tell an application to apply a style sheet to the table that presents the information as a graph – perhaps using the W3C's Scalable Vector Graphics (SVG) XML vocabulary for displaying graphics.
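A table-to-graph conversion of this kind might look roughly like the following sketch, written as ordinary Python code with the standard xml.etree.ElementTree module rather than as a style sheet. The sample table, its figures, the function name table_to_svg, and the layout parameters are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed table; the numbers are made up for the example.
TABLE = """<table>
  <tr><th>Year</th><th>Revenue</th></tr>
  <tr><td>1998</td><td>40</td></tr>
  <tr><td>1999</td><td>65</td></tr>
  <tr><td>2000</td><td>n/a</td></tr>
</table>"""


def table_to_svg(table_text, bar_width=30, gap=10, height=100):
    """Turn (label, number) rows into SVG rect elements; skip non-numeric rows."""
    table = ET.fromstring(table_text)
    svg = ET.Element("svg", {"xmlns": "http://www.w3.org/2000/svg",
                             "height": str(height)})
    x = 0
    for row in table.findall("tr"):
        cells = row.findall("td")
        if len(cells) != 2:
            continue  # header row or unexpected shape
        try:
            value = float(cells[1].text)
        except (TypeError, ValueError):
            continue  # a value such as 'n/a' has no sensible bar
        bar = ET.SubElement(svg, "rect", {
            "x": str(x), "y": str(height - value),
            "width": str(bar_width), "height": str(value),
        })
        ET.SubElement(bar, "title").text = cells[0].text
        x += bar_width + gap
    return svg


svg = table_to_svg(TABLE)
print(ET.tostring(svg, encoding="unicode"))
```

The conversion works only because the table is well-formed: rows and cells can be found by position without guessing where omitted end tags would have been.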
Such conversions can be written as program code in JavaScript, VBScript, Java, or whatever is convenient. You also can create style sheets, typically using the W3C's Extensible Stylesheet Language Transformations (XSLT), that automate conversions from one format to another. These conversions, once written, provide pathways among different formats that you can reuse on different instances of the same format. There are some limitations because a graphing vocabulary might not understand what to do with certain content – for instance, converting 'n/a' in a table to a bar graph – but a whole new range of possibilities emerges.
The W3C's Extensible Stylesheet Language (XSL) is probably the most developed use of this approach. XSL style sheets are written as transformations (in XSLT) from particular XML document structures to a vocabulary composed of formatting objects, elements, and attributes that describe presentation in a very detailed way. While CSS (described in the preceding section) merely annotates document structures to provide rules for presentation, XSL enables developers to transform any kind of XML documents into documents that purely describe presentation.
While XSL is probably overkill for most designers working with XHTML, XHTML is a popular target for XSLT transformations. Converting information stored in XML documents into XHTML makes it possible to read that information on a much wider range of browsers using a commonly understood vocabulary.
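The sketch below shows the shape of such a conversion into XHTML. Python's standard library does not include an XSLT processor, so the transformation is written as plain code with xml.etree.ElementTree instead of as an XSLT style sheet; the catalog vocabulary and the function name catalog_to_xhtml are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source vocabulary, invented for this example.
CATALOG = """<catalog>
  <book><title>XML Basics</title><price>20</price></book>
  <book><title>Styling Documents</title><price>25</price></book>
</catalog>"""


def catalog_to_xhtml(catalog_text):
    """Build an XHTML list with one item per book element."""
    catalog = ET.fromstring(catalog_text)
    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    body = ET.SubElement(html, "body")
    ul = ET.SubElement(body, "ul")
    for book in catalog.findall("book"):
        li = ET.SubElement(ul, "li", {"class": "book"})
        li.text = f"{book.findtext('title')} ({book.findtext('price')})"
    return html


html = catalog_to_xhtml(CATALOG)
print(ET.tostring(html, encoding="unicode"))
```

An XSLT style sheet would express the same mapping declaratively, as templates matched against book elements, and could be reused on any document that follows the same source structure.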
Linking into and referencing documents
Because HTML documents have such flexible structures (enough so that different processors can interpret them differently), it's very difficult to create reliable and usable tools for describing locations within HTML documents. Even something as simple as "the third paragraph of the second section" is hard to pinpoint. Because XML is designed so that every parser sees the same structure in every document, it's much simpler to describe locations within XML documents. This makes it much easier to build links to and from portions of documents without requiring the use of anchor tags (<A NAME="location">) throughout a document. Effectively, it enables developers to point to parts of documents they don't control. This, in turn, makes it possible to build much more detailed pointers from search engines, bibliography sites, or just general reference without coordination between the people creating the link and the owners of the target document.
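As a small illustration, "the third paragraph of the second section" can be written as a path once the structure is guaranteed. This sketch uses the limited XPath subset built into Python's xml.etree.ElementTree; the sample document, the use of div elements to stand for sections, and the helper name locate are assumptions for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<html><body>
  <div><h2>One</h2><p>a</p><p>b</p></div>
  <div><h2>Two</h2><p>c</p><p>d</p><p>e</p></div>
</body></html>"""


def locate(text, path):
    """Return the first element matching the path, or None if there is none."""
    return ET.fromstring(text).find(path)


# Positions are 1-based and count only siblings with the same element name.
target = locate(DOC, "./body/div[2]/p[3]")
print(target.text)  # e
```

No anchor had to be placed in the target document; the path alone identifies the paragraph, and every conforming parser resolves it to the same element.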
Storing documents
XML's hierarchical nature opens new possibilities for document storage and management as well. While many HTML documents are generated from databases, it's very difficult to cram HTML into databases in any form more useful than an ordinary file system. HTML's chaotic flows of text work well when stored as linear files, but they're very hard to break down into smaller components for storing and indexing. You can store XML as a text flow, but it also is possible to decompose XML into a lot of smaller bits, store them in a database, and retrieve and recombine those bits as needed. This allows random access to the information stored in those documents without requiring applications to load an entire document, parse it, and pull out the desired information.
This approach is useful in two cases. In the first case, the information in the XML document is a data stream much like those traditionally stored in relational databases. Mapping XML information into and out of a relational database isn't very difficult, and tools for making this process look like an ordinary file system appear in databases from Oracle, IBM, and other vendors. In the second case, fragmenting XML documents gives readers and writers access to smaller pieces of documents so they can avoid downloading and operating on potentially enormous documents merely to retrieve a tiny bit. In this case, the XML document's native hierarchical structure is preserved – not just a mapping to and from a set of tables. While it's possible to do this fragmentation in a relational database framework (several relational vendors are pushing this), other options such as hierarchical and object databases provide a different storage mechanism that more naturally reflects the structures inside the XML document. This tends to work better for XHTML documents in which the structures may contain wildly varying amounts of text and other content.
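One simple way to picture the fragmenting approach is a table with one row per element, each row pointing at its parent. The sketch below does this with Python's standard sqlite3 and xml.etree.ElementTree modules; the node table layout and the function names store and fetch_text are assumptions, and the sketch deliberately ignores attributes and text that follows child elements.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store(conn, text):
    """Decompose a document into one row per element, keeping the hierarchy."""
    conn.execute("""CREATE TABLE IF NOT EXISTS node (
        id INTEGER PRIMARY KEY, parent INTEGER, position INTEGER,
        tag TEXT, content TEXT)""")

    def insert(element, parent, position):
        cur = conn.execute(
            "INSERT INTO node (parent, position, tag, content) VALUES (?, ?, ?, ?)",
            (parent, position, element.tag, (element.text or "").strip()))
        node_id = cur.lastrowid
        for child_position, child in enumerate(element):
            insert(child, node_id, child_position)

    insert(ET.fromstring(text), None, 0)
    conn.commit()


def fetch_text(conn, tag):
    """Retrieve the text of every element with this name, without reparsing."""
    rows = conn.execute(
        "SELECT content FROM node WHERE tag = ? ORDER BY id", (tag,))
    return [row[0] for row in rows]


DOC = "<html><body><h1>Report</h1><p>First</p><p>Second</p></body></html>"
conn = sqlite3.connect(":memory:")
store(conn, DOC)
print(fetch_text(conn, "p"))  # ['First', 'Second']
```

Once the document is stored this way, a reader can pull out individual pieces with a query instead of downloading and parsing the whole file.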
Searching and indexing documents
The same structures that make referencing and storing XML documents easy make searching and indexing them simple as well. With the referencing tools, you easily can build tables of contents and indexes that address the parts of an XML document where a search result appears. In addition, the flexibility of XML's naming structures makes it possible to search for information in particular fields. Documents using XHTML lose some of the field-based potential because they employ HTML's vocabulary for presenting information. However, other possibilities within XHTML – such as using the class attribute to provide the "real" description of what a given element contains – can provide hooks similar to XML element names.
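The class-attribute hook can be sketched as a field search over a well-formed XHTML fragment. The example uses Python's standard xml.etree.ElementTree module; the sample page, the class names product and price, and the function name field_values are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
  <p>The <span class="product">Widget</span> costs
     <span class="price">9.95</span>.</p>
  <p>The <span class="product">Gadget</span> costs
     <span class="price">19.50</span>.</p>
</body></html>"""


def field_values(text, field):
    """Return the text of every element whose class attribute equals field."""
    root = ET.fromstring(text)
    return [el.text for el in root.findall(f".//*[@class='{field}']")]


print(field_values(PAGE, "product"))  # ['Widget', 'Gadget']
print(field_values(PAGE, "price"))    # ['9.95', '19.50']
```

This simple match compares the whole attribute value, so an element carrying several space-separated class names would need a slightly more careful test.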
Most search engines today discard the markup in HTML documents, preferring to use full-text strategies. While META elements occasionally may receive some attention, no conventions for identifying content and content types ever emerged in the HTML world. XHTML may not provide the free-form labeling of content that XML offers, but its capability of reliably referencing fragments should make it easier to find information within documents.