XML and CDATA

an article added by: Albert Lichtblau at 06022007



Processing instructions

XML also enables developers to pass information to the application through processing instructions (often called PIs). Processing instructions use a syntax similar to that of the XML declaration, although the rules for them are much less strict. Processing instructions begin with <? and end with ?>, but the developer generally dictates their contents. The text before the first space in a PI is called the target. The target must start with a letter, underscore, or colon, and may consist of letters, digits, underscores, colons, hyphens, and periods. A target can't be xml in any case variation (XML, Xml, and so on), and names that begin with those letters are reserved for standardization. After the target, any characters may appear, with one exception: if ?> appears inside PI content, the PI ends there and the rest of the document probably won't parse. The general syntax is:

 <?target whatever?>

For example, you can use a processing instruction like this:

 <?page 212-555-1212?>

in the middle of an XML document, or:

 <?paint mix red and green and smear them  around?>

Obviously, most XHTML applications don't know what to do with these, and many older browsers treat the contents of the processing instruction (or part of the contents) as text and include them in the document. Using processing instructions is not a good idea unless you pass your XHTML through XML processors that understand particular processing instructions or the W3C creates some standard ones, which isn't very likely to happen for XHTML.

Processing instructions can appear anywhere in an XML document except inside of markup. They can appear before a document (but after the XML declaration, if there is one), any place text can appear within elements (though not within the tags), and after a document. They follow the same placement rules as comments, and you can think of them as comments meant for computer consumption.
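To see how an XML processor reports processing instructions, here is a minimal sketch using Python's standard xml.dom.minidom module. The note and body element names and the collect_pis helper are made up for illustration; the two PIs are the ones shown earlier.

```python
from xml.dom import minidom

# One PI sits before the root element, the other inside element content.
DOC = """<?xml version="1.0"?>
<?page 212-555-1212?>
<note>
  <?paint mix red and green and smear them around?>
  <body>Hello</body>
</note>"""


def collect_pis(node):
    """Return (target, data) pairs for every processing instruction under node."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
            found.append((child.target, child.data))
        else:
            found.extend(collect_pis(child))
    return found


pis = collect_pis(minidom.parseString(DOC))
for target, data in pis:
    print(f"target={target!r} data={data!r}")
```

The parser hands over the target and the remaining text separately. The XML declaration is not reported as a processing instruction, even though it looks like one.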

CDATA sections

XML provides a new tool for protecting content, such as scripts and styles, that uses markup characters (<, &, and >) for purposes other than markup. CDATA (or character data) marked sections tell parsers to treat everything inside the section as plain text, ignoring any markup, until the closing ]]> is reached. CDATA sections use a fairly distinctive syntax that is hard to miss:

 <![CDATA[protected content]]>

To protect this script, for example, you can use:

   <script type="text/javascript">
   <![CDATA[
   document.writeln("<p>See? This was created today!</p>");
   var today = new Date(); // Use today's date
   var text = "Today is " + (today.getMonth() + 1) + "/" + today.getDate() + "/" + today.getFullYear() + ".";
   document.writeln(text);
   ]]>
   </script>

This isn't a perfect solution because older browsers may choke on the unfamiliar syntax and scripts may not behave. However, it does make it much easier to integrate XHTML with XML processing. You can use CDATA sections any place you expect to have a run of markup characters, or you can use the built-in entities instead (&lt; for <, &amp; for &, and &gt; for >).
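The equivalence between a CDATA section and entity escaping can be checked with any XML parser. The sketch below uses Python's standard library; the one-line script in SCRIPT_TEXT is a made-up sample, not part of the example above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Sample script text containing all three markup characters.
SCRIPT_TEXT = 'if (a < b && b > 0) document.writeln("<p>ok</p>");'

# Variant 1: wrap the text in a CDATA section and leave it untouched.
with_cdata = f"<script><![CDATA[{SCRIPT_TEXT}]]></script>"

# Variant 2: replace &, <, and > with the built-in entities.
with_entities = f"<script>{escape(SCRIPT_TEXT)}</script>"

text_from_cdata = ET.fromstring(with_cdata).text
text_from_entities = ET.fromstring(with_entities).text

# Both variants hand the application exactly the same character data.
print(text_from_cdata == text_from_entities == SCRIPT_TEXT)
```

Either form works for an XML processor. The practical difference is readability of the source, plus the fact that a CDATA section cannot itself contain the sequence ]]>.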
  

Namespaces

Namespaces are one of the most controversial aspects of XML, and their usage in XHTML produced a significant obstacle in XHTML's passage toward becoming a W3C Recommendation. Fortunately, the scheme in question was dropped in favor of a much simpler one, so you can work with the results easily.

Namespaces address the key problem of overlapping names that emerges when developers try to mix more than one markup language. A title in XHTML is a title for the Web page, while a title in a markup language describing books probably identifies the title of the book. As XHTML is expected to be used (eventually) both as a container for XML information and within XML documents, some mechanism needs to distinguish XHTML elements and attributes from those in other markup languages. (This mechanism makes it much easier to build applications that process XHTML as well.)

Namespaces enable document authors to assign Uniform Resource Identifiers (URIs), a superset of the familiar URLs used to identify documents and other components on the Web, to element and attribute names. For example, the namespace for XHTML is:

   http://www.w3.org/1999/xhtml

Effectively, namespaces can add this to every element name in an XHTML document to identify them clearly as XHTML. Typing this over and over is repetitive, and most URIs would result in prohibited element and attribute names anyway, so the namespaces tools provide an easier mechanism. Namespaces are declared in special attributes that begin with xmlns. These namespaces then are available to all the child elements of the element containing the attribute, unless those child elements override the declaration by making a new one of their own. It sounds a bit tricky, but it's actually easier than it sounds.
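The scoping rule is easier to see in a parsed document. The following sketch, using Python's standard ElementTree, declares one namespace with an xmlns attribute on the root and overrides it on a child element. The book element and the http://example.com/books URI are invented for illustration. ElementTree writes each element name as {namespace-URI}local-name.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
BOOKS_NS = "http://example.com/books"  # hypothetical namespace URI

DOC = f"""<html xmlns="{XHTML_NS}">
  <body>
    <p>XHTML paragraph</p>
    <book xmlns="{BOOKS_NS}">
      <title>Book title</title>
    </book>
    <p>XHTML again</p>
  </body>
</html>"""

root = ET.fromstring(DOC)

# Walk the tree in document order and collect the expanded element names.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```

The book element and its title child pick up the overriding declaration, while the paragraph that follows the book falls back to the declaration inherited from html.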

There are two ways to attach namespaces to elements and attributes. Both use the same declaration mechanism; but one allows the creation of a default namespace, while the other creates namespaces that correspond to particular prefixes. The default namespace is used by most XHTML. The prefix mechanism will probably be applied to other types of XML contained within XHTML, and occasionally to XHTML contained in other types of XML. To declare a default namespace, create an attribute named xmlns and assign it a URI value. For example,

 <html xmlns="http://www.w3.org/1999/xhtml">

The default namespace is applied to the html element in which the declaration is made and to all of the elements contained within that html element that don't have namespace prefixes or new declarations of the default namespace. Attributes are handled differently: under the Namespaces in XML recommendation, a default namespace does not apply to attribute names, so unprefixed attributes belong to no namespace and take their meaning from the element on which they appear. In XHTML, this means unprefixed attributes on XHTML elements are understood as XHTML attributes. For example, in the following simple XHTML document, all of the elements are in the XHTML namespace (http://www.w3.org/1999/xhtml), and their attributes (apart from the namespace declaration itself, the xmlns attribute) are interpreted as XHTML attributes. The namespace declaration is required for XHTML 1.0 documents.

   <?xml version="1.0"?>
   <html xmlns="http://www.w3.org/1999/xhtml">
   <head>
   <title>Namespace test</title>
   </head>
   <body>
   <h1>Namespaces!</h1>
   <p>All of the elements in this document are in the http://www.w3.org/1999/xhtml namespace,
   even the picture.</p>
   <img src="namespacesquare.gif" height="100" width="100" />
   </body>
   </html>
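As a rough illustration of what a namespace-aware parser sees in a document like this one, the sketch below parses a shortened copy with Python's standard ElementTree and splits each element name into its namespace and local name. The split_name helper is made up for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

DOC = """<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Namespace test</title></head>
<body>
<h1>Namespaces!</h1>
<img src="namespacesquare.gif" height="100" width="100" />
</body>
</html>"""


def split_name(tag):
    """Split ElementTree's '{uri}local' form into a (uri, local) pair."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


root = ET.fromstring(DOC)
names = [split_name(el.tag) for el in root.iter()]
for uri, local in names:
    print(f"{local:6} -> {uri}")

# Unprefixed attribute names come back without any namespace URI.
img = root.find(f".//{{{XHTML_NS}}}img")
attribute_names = sorted(img.attrib)
print(attribute_names)
```

Every element arrives as a namespace and local name pair. ElementTree reports the unprefixed attributes of img as plain names, leaving their interpretation to the element they sit on.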

An XHTML parser reading this document receives two pieces of information about every element here: its name and the namespace attached to it. You can represent the same document using a different namespace mechanism: prefixes. You declare prefixes using a similar attribute syntax, but the prefix follows the xmlns and a colon. Prefixes cannot begin with xml or any case variant of xml, such as XML or XmL. For example, to declare the namespace prefix xhtml, use the attribute name xmlns:xhtml. A version of the same document that uses this format looks like:

   <?xml version="1.0"?>
   <xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml">
   <xhtml:head>
   <xhtml:title>Namespace test</xhtml:title>
   </xhtml:head>
   <xhtml:body>
   <xhtml:h1>Namespaces!</xhtml:h1>
   <xhtml:p>All of the elements in this document are in the
   http://www.w3.org/1999/xhtml namespace, even the picture.</xhtml:p>
   <xhtml:img src="namespacesquare.gif" height="100" width="100" />
   </xhtml:body>
   </xhtml:html>

There are a lot of issues with namespaces and XML 1.0, the worst of which is incompatibility between XML 1.0 validation and namespace prefix changes. DTDs know nothing about namespaces and match element names literally, prefix included, so a DTD that declares html does not recognize xhtml:html. As a result, this document, which technically represents the exact same information as the preceding version, won't make it through a validating XML parser, although it may well work in non-validating environments. This form is available if you need to include XHTML content in other XML documents, but it's best to stick with the simpler default namespace form for XHTML documents.
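A non-validating, namespace-aware parser can confirm that the two forms carry the same information. The sketch below uses shortened copies of both documents and Python's standard ElementTree, which does not validate against a DTD. The expanded_names helper is made up for this example.

```python
import xml.etree.ElementTree as ET

DEFAULT_FORM = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Namespace test</title></head>
<body><h1>Namespaces!</h1></body>
</html>"""

PREFIXED_FORM = """<xhtml:html xmlns:xhtml="http://www.w3.org/1999/xhtml">
<xhtml:head><xhtml:title>Namespace test</xhtml:title></xhtml:head>
<xhtml:body><xhtml:h1>Namespaces!</xhtml:h1></xhtml:body>
</xhtml:html>"""


def expanded_names(xml_text):
    """Return the '{uri}local' name of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


same = expanded_names(DEFAULT_FORM) == expanded_names(PREFIXED_FORM)
print(same)  # True
```

The prefix itself never reaches the comparison: once the parser has resolved it to a URI, xhtml:title and title under the default namespace are the same name.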

Caution: I suggest you do not apply prefixes to XHTML attributes. While it may be appropriate if you want to apply XHTML attributes to non-XHTML element names in some combination with other vocabularies, no real rules exist for processing such documents.
