XML and the Next Generation of the Web

an article added by: Albert Lichtblau at 06022007



Making Your Mozquito HTML/JavaScript Accessible

While Mozquito Factory produces HTML and JavaScript that functions on any browser supporting JavaScript 1.2, you need to exert some extra effort to make your documents as accessible as possible. Unfortunately, many Web developers think Web accessibility matters only to people with disabilities. Keep in mind that people with older versions of browsers, text-based browsers, and browsers on new devices such as cell phones and PDAs will also have great difficulty with Mozquito-produced content. Open up a Mozquito-produced HTML file in Notepad, WordPad, or any other text editor. Scroll down to the very bottom of the document and you can see its use of the <NOSCRIPT> element:

   <noscript>
     <center>
       Sorry, your browser doesn't support JavaScript, or you have turned off JavaScript in your browser.<p>
       <b>Please activate JavaScript or get the latest
       <a href="http://home.netscape.com">Netscape Navigator</a> or
       <a href="http://www.microsoft.com/windows/ie/default.htm">Internet Explorer</a>
       to view this page properly!</b><p>
     </center>
   </noscript>

As I mentioned several times in this article, FML closely resembles the HTML 4.01 version of forms. Strip out the preloading images, editable lists, and layers, and you can duplicate your entire FML form in HTML 4.01 tags. If you have time to spare after you complete your FML document, make the edits in a separate HTML file and copy them over to your Mozquito HTML document. Keep in mind that each time you export from your FML document to the Mozquito HTML document, you lose those edits. That's why it's important to save them in a separate HTML file and add them to your Mozquito HTML just before you post it on your Web server. For more information on making your JavaScript and forms more accessible, take a look at the W3C Web Content Accessibility Guidelines (<http://www.w3.org/TR/WAI-WEBCONTENT>).

XML and the Next Generation of the Web

You've looked at XHTML from all different angles, from the new capabilities it introduces to the new costs it imposes, and pondered its use in devices from cell phones to Web browsers on PCs to Web servers and even larger-scale devices. Now that you've waded through all of that, it's time to consider the long-term payoff – the overall impact on the once familiar World Wide Web.

Person to Person and Machine to Machine

So far, the Web has mostly been a tool for person-to-person and person-to-machine connections. While simple advertising-oriented brochureware Web sites and most of the Web's information content are intended for human consumption, much of the driving force (read: investment opportunities) behind the Web has come from projects that make it easier for humans to connect to machines. Humans connect to machines to enter orders for goods, for instance, setting off a whole series of events that is largely managed by the computers while involving many people along the way.

For the most part, humans have maintained a "don't call me, I'll call you" attitude toward computers. Commercial automated e-mail, commonly known as spam when it is unsolicited, is seen as a bane of the Internet and not one of its attractions. While machine-to-person communications got a small boost in the brief period when push seemed popular, bandwidth concerns and the growing ease with which people could retrieve information themselves left push without many customers. Similarly, people don't seem excited over the prospect of computer monitoring of their Web surfing that results in suggestions about buying products seemingly appropriate to their interests.

XHTML enters this framework – in which markup has provided human-readable information and form responses have provided machine-readable information – and it opens some new doors. Markup still presents information to people, but it also carries information from machine to machine. XHTML modularization and the extensibility it can provide, specifically for forms, promise sizable improvements in the kinds of information people can send to machines. And while nothing in XHTML makes spam any more interesting, XHTML at least opens the possibility of machine-to-person transmissions that carry useful information for your computer that you don't need to read. A teacher can read a neatly formatted message that three new students have been added to her class, sent automatically by the school's computer. Meanwhile, her computer has already extracted their names and added them to the grade book.
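The teacher's message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster namespace `http://example.org/ns/roster`, the `student` element, its `id` attribute, and the student names are all hypothetical, invented here to show how one document can serve a human reader and a program at once.

```python
import xml.etree.ElementTree as ET

# Hypothetical application-specific vocabulary; not a real W3C module.
ROSTER_NS = "http://example.org/ns/roster"

# One message, two audiences: a browser renders the list for the teacher,
# while a program reads only the r:student elements.
MESSAGE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:r="http://example.org/ns/roster">
  <head><title>Class roster update</title></head>
  <body>
    <p>Three new students have been added to your class:</p>
    <ul>
      <li><r:student id="s101">Ana Ruiz</r:student></li>
      <li><r:student id="s102">Ben Okafor</r:student></li>
      <li><r:student id="s103">Chen Wei</r:student></li>
    </ul>
  </body>
</html>"""

def extract_students(xhtml_text):
    """Return (id, name) pairs for every r:student element in the message."""
    root = ET.fromstring(xhtml_text)
    students = []
    for element in root.iter("{%s}student" % ROSTER_NS):
        students.append((element.get("id"), (element.text or "").strip()))
    return students

# The "grade book" here is just a dictionary keyed by student id.
grade_book = {}
for student_id, name in extract_students(MESSAGE):
    grade_book[student_id] = {"name": name, "grades": []}
```

The program never has to guess which list items are students; the extra vocabulary says so directly, and the surrounding XHTML is simply ignored.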

Automating – and Fragmenting – the Web

The preceding example shows a tiny piece of what XHTML makes possible. By enabling developers to create application-specific vocabularies and use them in combination with the more generic HTML vocabulary, XHTML lets documents carry multiple layers of information. These layers may be aimed at different "customers" of the document, with one layer (likely using the HTML vocabulary) presenting the message as a document for human consumption and the other layers containing information for use in automated processing tools.

Although HTML may look fragmented and riddled with incompatibilities if you're a Web developer trying to perform complex tasks across browsers from multiple vendors, the overall similarities of those implementations generally outweigh their differences. The expectation of similarity that simple HTML creates often makes it more frustrating when the differences begin to appear. XHTML to some extent – and XML to a much greater extent – has frightened lots of people with the prospect of wildly different vocabularies shattering the shared understanding that has kept the Web (mostly) unified up to this point. As the Web grows, however, demand for such customized vocabularies rises. The value of more specific descriptions becomes more obvious as developers of Web applications try to build in additional functionality.

Many intranet sites already include bastardized HTML, containing markup that isn't HTML. The generic div and span elements have become placeholders for this kind of information for developers who want to stay within the HTML framework. They can use the class attribute to indicate what the information really is. (This attribute offers limited extensibility.) Microsoft provides XML data islands within HTML documents that give developers a more formal set of tools for working with this information, although that set only works within Microsoft's own software frameworks. The primary benefit of this additional vocabulary is increased customizability, which enables developers to build all kinds of application hooks into documents that let scripts or programs process them efficiently and reliably. The costs are a bit more complex, but they mostly stem from the fact that not all of the potential recipients of a document have the tools needed to process that document completely. Web developers who rely on plug-in capabilities already face this problem, but extending the HTML vocabulary threatens to make it worse, at least in the short run. Developers can either ship all the information, whether the recipient can use it or not, or spend processing cycles negotiating which information the recipient can process.
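The class-attribute approach described above might look like the following sketch. The class names (`part`, `part-number`, `description`, `quantity`) and the parts data are hypothetical; the point is only that a script can recover records from generic div and span elements by convention.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Generic div/span markup; the class values are a private convention
# (hypothetical names) rather than anything the HTML vocabulary defines.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Parts list</title></head>
  <body>
    <div class="part">
      <span class="part-number">A-1041</span>
      <span class="description">Left-hand bracket</span>
      <span class="quantity">12</span>
    </div>
    <div class="part">
      <span class="part-number">B-2200</span>
      <span class="description">Hex bolt, 10 mm</span>
      <span class="quantity">250</span>
    </div>
  </body>
</html>"""

def read_parts(xhtml_text):
    """Build one dictionary per div whose class list includes 'part'."""
    root = ET.fromstring(xhtml_text)
    parts = []
    for div in root.iter(XHTML + "div"):
        if "part" not in div.get("class", "").split():
            continue
        record = {}
        for span in div.iter(XHTML + "span"):
            # class may hold several space-separated names; record each one
            for name in span.get("class", "").split():
                record[name] = (span.text or "").strip()
        parts.append(record)
    return parts
```

Nothing checks that a `quantity` span really holds a number or that every part has a part number, which is one way to read the remark that this attribute offers only limited extensibility.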

The shape of these negotiating and processing frameworks isn't clear yet. While it's reasonable to assume that they will be built on the structures already used for content negotiation (such as HTTP headers and MIME content types) and markup processing (such as the Document Object Model and XSLT), lots of missing pieces remain. Using XHTML to extend the HTML vocabulary will be a risky process, and will at least involve some serious inefficiencies at first. Negotiation can consume resources, while skipping negotiation and just shipping information may mean users get information for which they don't have tools. Unlike the information sent for use with plug-ins today, XHTML doesn't provide an extra built-in step that gives the user a chance to say, "No, I don't want that content or the software to display it." Using XHTML (as a foundation) and additional XML (incorporated as XHTML modules) to extend that foundation should ensure a basic level of understanding for users, even if their tools can't process the entire document. As the level of XML content rises, however, it may become more difficult for users to handle documents appropriately without the right tools. Infrastructure for dealing with these cases and for helping users find the right tools is just getting started. For now, extending XHTML is a fairly risky task that may cause more trouble than it's worth.
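As a rough sketch of the negotiation step, the following Python picks a representation from an HTTP Accept header. It is a simplified reading of the header's q-value rules, not a complete implementation, and the function names `parse_accept`, `quality`, and `choose` are hypothetical. The media type `application/xhtml+xml` is the one later registered for XHTML documents.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an HTTP Accept header value."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    return prefs

def quality(offer, prefs):
    """q-value the client gives an offered type; the most specific match wins."""
    main = offer.split("/")[0]
    best_rank, best_q = -1, 0.0
    for media_type, q in prefs:
        if media_type == offer:
            rank = 2
        elif media_type == main + "/*":
            rank = 1
        elif media_type == "*/*":
            rank = 0
        else:
            continue
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q

def choose(offers, accept_header):
    """Pick the offer the client rates highest, or None if none is acceptable."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for offer in offers:  # ties go to the earlier offer
        q = quality(offer, prefs)
        if q > best_q:
            best, best_q = offer, q
    return best

OFFERS = ["application/xhtml+xml", "text/html"]
```

A server holding both an extended XHTML version and a plain HTML version could call `choose(OFFERS, header)` and ship whichever the client says it can handle. This covers only the media type; whether the client understands a particular added vocabulary inside the document is the part for which no settled mechanism exists.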

Caution: Automation of the kind just described may incur security risks. Building programs that respond to content in messages makes those messages the bearers of potentially damaging information. If you write these kinds of applications, make certain to build them within a secure framework that includes authentication and provides safeguards against corrupted or lost information. It's also worthwhile to set boundaries that require human intervention, as many workflow applications have found.
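One possible shape for such safeguards, sketched in Python: check a keyed hash of the message before acting on it, and hand large changes to a person. The shared key, the threshold of five changes, and the function names are all assumptions made for illustration; a real deployment would need proper key management and a fuller policy.

```python
import hashlib
import hmac

SHARED_KEY = b"replace-with-a-real-secret"  # hypothetical key shared with the sender
MAX_AUTOMATIC_CHANGES = 5                   # hypothetical limit before a person must approve

def sign(body, key=SHARED_KEY):
    """Hex HMAC-SHA256 of the message body, computed by the sender."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()

def decide(body, signature, change_count):
    """Return 'reject', 'needs-review', or 'apply' for an incoming message."""
    expected = sign(body)
    # compare_digest avoids leaking information through comparison timing
    if not hmac.compare_digest(expected, signature):
        return "reject"        # altered, corrupted, or from an unknown sender
    if change_count > MAX_AUTOMATIC_CHANGES:
        return "needs-review"  # authentic, but too big to apply unattended
    return "apply"
```

The signature check covers both authentication and corruption in transit, while the threshold is the boundary that brings a human back into the loop.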

Information Leaks

As XHTML documents come to include more and more "real" information, the risks of unplanned information distribution increase. HTML documents can, of course, contain confidential or other sensitive information. However, HTML has a more comforting "all the information is on the surface" style. As developers start to include multiple layers of information in documents, some of those layers may not be visible to users directly.

To take an extreme case, imagine a corporate annual report prepared for public consumption. Underneath the calculated public numbers and pretty pie charts lie an enormous number of confidential details about the company's operations, along with auditing information and production notes. All of this information is removed from the final HTML version, which fits the preceding description – all the information is on the surface.

Suppose, however, that someone decides that the annual report might be very useful to certain parts of the company – say top management or the board of directors – as an interface to the more concrete details. Unlike the flat HTML version, this enhanced XHTML version would enable its users to click through tables and charts to reach the underlying information, rearranging it if needed for different viewpoints. When opened, the interface is very familiar; the annual report looks just like it did before, in HTML. The extra features and information require user interaction to set them off.

If this thoroughly enhanced XHTML document is mistaken for its flatter cousin and reaches the outside world (an analyst, say), the consequences could be dire. The problem doesn't involve crackers breaking into systems; it involves human error and a lack of infrastructure for managing such information. While this is pretty much a worst-case scenario, it warns of things that are newly possible when sophisticated representations of private information are used in the same framework as their public versions. XHTML opens new possibilities, but it brings with it new responsibilities. The security infrastructure isn't there yet, and markup provides no security on its own.
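One modest safeguard is a release step that strips everything outside the XHTML vocabulary before a document leaves the building. The sketch below assumes the hidden layers live in their own namespaces; the finance namespace, the sample report, and the function name `flatten_for_release` are hypothetical. It is no substitute for real access control.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XHTML_PREFIX = "{%s}" % XHTML_NS
XML_PREFIX = "{http://www.w3.org/XML/1998/namespace}"  # xml:lang and similar
ET.register_namespace("", XHTML_NS)  # serialize XHTML as the default namespace

def _remove_keeping_tail(parent, child):
    """Remove child but keep the text that followed it in the parent."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        parent.text = (parent.text or "") + tail
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)

def flatten_for_release(xhtml_text):
    """Return (clean_markup, dropped_names) with non-XHTML layers removed."""
    root = ET.fromstring(xhtml_text)
    dropped = []
    for element in list(root.iter()):
        # Namespaced attributes from other vocabularies are removed.
        for name in list(element.attrib):
            if name.startswith("{") and not name.startswith((XHTML_PREFIX, XML_PREFIX)):
                dropped.append(name)
                del element.attrib[name]
        # Child elements outside XHTML are removed along with their contents.
        for child in list(element):
            if not child.tag.startswith(XHTML_PREFIX):
                dropped.append(child.tag)
                _remove_keeping_tail(element, child)
    return ET.tostring(root, encoding="unicode"), dropped

# Hypothetical enhanced report: the fin: layer is invisible in a browser.
REPORT = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:fin="http://example.org/ns/finance">
  <body>
    <p>Revenue rose to <span fin:source="ledger-7">4.2 million</span>
    <fin:detail>Division B lost 0.9 million</fin:detail> this year.</p>
  </body>
</html>"""

clean, dropped = flatten_for_release(REPORT)
```

Returning the list of dropped names gives whoever publishes the report something to review, which speaks to the human-error side of the problem as much as the technical one.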

Reviving the Agent Dream

While information leakage may be harmful in some contexts, it reopens the door to a whole range of applications that weren't possible in the HTML Web. Agents, software designed to automatically find and process information to meet user needs, may have another chance. While agents originally promised to give users customized tools for finding the information they wanted (sale prices on tuxedos, for instance), they were often stymied by the difficulty of sorting out HTML markup and the imprecise nature of the human languages surrounding the information.

XHTML isn't a magic cure-all for these problems. Human language remains an important part of the content that agents must deal with for many kinds of searches, and the core of XHTML itself remains fairly difficult for agents to interpret. If prices, for instance, are rendered as red and bold using cascading style sheets, that information might not even appear within the document. Agents need to figure out something else (the class attribute?) to latch on to, if they hope to reliably extract information that users want.

On the other hand, XHTML's extensibility may give agents some real information to work with in the form of embedded XML content. If, for instance, a common module for marking up sales information was widely used – or even if multiple modules came into use – agents would have meaningful pointers to the information they wanted. While companies may be concerned about enabling comparison shopping by providing such information, they may find that it brings them new customers as well.
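To make the idea concrete, here is a sketch of a comparison-shopping agent in Python. It assumes a shared sales module that does not actually exist: the namespace `http://example.org/ns/sales`, the `item` element, its `sku`, `currency`, and `price` attributes, the two shop sites, and the prices are all hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical shared vocabulary for sales information.
SALES_NS = "http://example.org/ns/sales"
ITEM = "{%s}item" % SALES_NS

# Two differently worded pages that mark up their offers the same way.
PAGES = {
    "shop-a.example": """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:s="http://example.org/ns/sales">
  <body><p>This week only:
    <s:item sku="tux-42" currency="USD" price="189.00">Classic tuxedo</s:item>
  </p></body>
</html>""",
    "shop-b.example": """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:s="http://example.org/ns/sales">
  <body><h1>Formal wear clearance</h1>
    <div><s:item sku="tux-42" currency="USD" price="174.50">Tuxedo, black</s:item></div>
  </body>
</html>""",
}

def offers(pages, sku):
    """Return (price, currency, site) tuples for one product, cheapest first."""
    found = []
    for site, text in pages.items():
        for item in ET.fromstring(text).iter(ITEM):
            if item.get("sku") == sku:
                found.append((Decimal(item.get("price")), item.get("currency"), site))
    return sorted(found)

best_price, currency, site = offers(PAGES, "tux-42")[0]
```

The agent ignores layout, wording, and styling entirely. It depends only on the shared module, which is why wide adoption of such a module matters more than any particular parsing trick.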

Will XHTML Survive?

Some early critics of HTML have waited a long while for a replacement to come along. From their perspective, XML offers a much more versatile set of tools with a minimal learning curve, and it can fit into the same infrastructures (browsers, HTTP, and Web servers) that HTML does today. Tools such as XLink can give XML hyperlinking capabilities that go far beyond the simple mechanisms provided in HTML, and XSL style sheets promise formatting power that similarly surpasses the wildest dreams of HTML-based Web developers. XML makes it possible to create vocabularies, such as Scalable Vector Graphics (SVG) and Synchronized Multimedia Integration Language (SMIL), which can present graphics and multimedia far better than the more general-purpose, document-oriented HTML. Seen from this perspective, HTML is past its prime – a weak tool whose replacement is only forestalled by the existence of many millions of legacy browsers.

A friendlier perspective finds the HTML vocabulary more valuable. Even apart from the millions of browsers already distributed, or the large community of developers who already have a solid understanding of how it works, HTML still works well for many of the reasons that catapulted it to prominence in the first place. It's not difficult to create HTML documents, and even while XHTML imposes a few more rules on structure, those rules can actually help keep beginners out of trouble. The fixed HTML vocabulary provides a set of boundaries that keeps projects from aiming at impossible goals, while giving document creators the power they need to build usable interfaces. HTML has already proven capable of accommodating extensions, from scripting to style sheets to applets and objects. You can argue that much of the world gets along just fine without XML and won't gain that much by using it.

It seems likely that Web development will follow a more moderate course than these two proposals. The HTML vocabulary is too well known and too well supported to disappear quickly, and it will probably always provide a kind of baseline vocabulary for many types of markup. The HTML vocabulary contains some other features that will be a long time coming in XML, providing semantics for information that isn't just formatting. HTML forms are one area in which HTML has a distinct advantage, but HTML includes a lot of other features for describing content that have yet to be implemented in any widely used manner in XML. XML provides no general tool for including scripts in documents and it lacks a general way of including style sheet information within a document. Ad hoc solutions to all of these problems can be developed on a vocabulary-by-vocabulary basis, but XHTML already has ready-made solutions to these problems and a large community of developers who know how to use them.

XHTML's development promises to eradicate the largest problem facing HTML: its brittleness brought on by its lack of extensibility. At the same time, XHTML may solve some of the problems XML developers face as they bring XML into the Web environment by providing reusable solutions to real-world problems. While XHTML documents may eventually look very little like their HTML forebears, it seems probable that many of HTML's features will last beyond the transition period (perhaps with some remodeling). Making the leap directly to XML will remain difficult unless more tools for integrating it with other Web tools appear, and XHTML already holds much of that needed toolkit.

Efficient, Friendly, Invisible

XHTML is probably the biggest change to the underlying architecture of the World Wide Web since it first appeared. HTTP 1.1 refined the protocol for transferring information, but XHTML remodels HTML in a way that may eventually make it unrecognizable. Instead of battling tag soup, the ever-growing and uncontrolled additions to HTML made by vendors, the W3C has changed its tune and thrown the doors open to new vocabularies. New vocabularies should come properly attired in namespaces and XHTML modules, but the possibilities are there.

XHTML promises to change the Web from a medium that people use to communicate with other people to a medium that people and computers use to communicate with other people and computers. This transition will incur some costs and produce some problems along the way, but the end result may be a Web that saves people time and effort. The Web has already demonstrated that large networks can create new opportunities, but its current form means that many opportunities have been ignored or wasted. These problems involve not the more obvious bandwidth issues, although those remain important, but what we can do with that bandwidth.

Perhaps the most important aspect of this change is how small it is, at least at first. As you've seen, XHTML 1.0 starts the transition with as little disruption as possible (although some disruption is unavoidable). While the transition through XHTML 1.1 to the future XHTML 2.0 is likely to involve more bumps, these new structures are being built on the same familiar infrastructure that has supported HTML for years. XHTML isn't starting afresh with a brand-new Web; it's adding new potential to the existing Web. Users and developers, building on familiar tools, hopefully will find that the XHTML tune-up gives them a more useful Web without requiring them to understand the underpinnings.

Tip: Still want to know more about XHTML, or discuss how it works? Try the XHTML-L list. Details are available at http://www.egroups.com/group/XHTML-L.
