XML DTD Modules Part One

an article added by: Albert Lichtblau at 06022007



XML DTD Modules

XML DTD modules are better defined than abstract modules, although not quite as flexible. They rely heavily on parameterization, which is extremely powerful but does take some getting used to.

Tip

XHTML 1.1 DTD modules are a lot harder to read than many XML DTDs. If you can't penetrate the formal description of a given module, the abstract module should help you. If you write your own modules, it is critical that you provide abstract modules for them as well.

XREF

The rules for creating XHTML 1.1's XML DTD modules are presented in Section 5 of the Building XHTML Modules draft and demonstrated in Section 6. Modularization of XHTML also uses a few additional conventions that Building XHTML Modules doesn't describe, and I cover those here as well. They appear useful, and they help explain some of the syntactical shortcuts (such as the Common attributes) used in abstract modules.

Parameterization just means putting all the contents of declarations into parameter entities. This makes the declarations easier to manage and, at the same time, much easier to modify. You can override a parameter entity or an attribute definition by declaring it again, but XML treats the first declaration it encounters as binding, so the override must be read before the original. Element type declarations are stricter: XML prohibits multiple declarations for the same element type. By putting the contents of those declarations into parameter entities, the creators of XHTML modules provide a lot more flexibility. The modules mark each kind of parameter entity with a naming suffix (.datatype, .attrib, .attlist, .content, and .class). Let's look at examples of each of these suffixes taken from the W3C draft DTD, building from the smallest atomic pieces to the largest.

.datatype

The data types in XHTML 1.1 are direct descendants of those in XHTML 1.0, and they are declared in Section B.2.1. Most of the data types are simply more precise names for CDATA (character data, that is, plain text):

   <!-- a Uniform Resource  Identifier, see [URI] -->
 <!ENTITY % URI.datatype  "CDATA" >

These data types are then used in attribute declarations:

   <!ATTLIST a
   %Common.attrib;
   href %URI.datatype; #IMPLIED
   charset %Charset.datatype; #IMPLIED
   type %ContentType.datatype; #IMPLIED
   hreflang %LanguageCode.datatype;  #IMPLIED
   rel %LinkTypes.datatype; #IMPLIED
   rev %LinkTypes.datatype; #IMPLIED
   accesskey %Character.datatype;  #IMPLIED
   tabindex %Number.datatype; #IMPLIED
   > 

Nearly all of these data type declarations resolve to CDATA when an XML processor reads the DTD (LanguageCode.datatype, which resolves to NMTOKEN, is the exception here), but the names make it much clearer what kind of content belongs in each attribute.
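To make the substitution concrete, here is a minimal Python sketch (not part of any XHTML tooling) that replaces parameter entity references in a fragment of the a element's attribute list. The entity table is an assumption trimmed to three entries. It follows the XHTML 1.1 datatype module in mapping LanguageCode.datatype to NMTOKEN and the others to CDATA. A real parser also pads each replacement with a space on either side, which this sketch skips.

```python
import re

# Matches a parameter entity reference such as %URI.datatype;
PE_REF = re.compile(r"%([A-Za-z][\w.-]*);")

# Assumed, trimmed replacement values: most datatypes are CDATA,
# but LanguageCode.datatype is NMTOKEN.
DATATYPES = {
    "URI.datatype": "CDATA",
    "LanguageCode.datatype": "NMTOKEN",
    "Number.datatype": "CDATA",
}

def expand(text: str, entities: dict[str, str]) -> str:
    """Replace every %name; reference with its replacement text (one pass)."""
    return PE_REF.sub(lambda m: entities[m.group(1)], text)

ATTLIST = """<!ATTLIST a
   href     %URI.datatype;          #IMPLIED
   hreflang %LanguageCode.datatype; #IMPLIED
   tabindex %Number.datatype;       #IMPLIED
>"""

print(expand(ATTLIST, DATATYPES))
```

One pass is enough here because none of these replacement values contains another reference.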

Tip

While XML 1.0 processors can't do much to enforce data typing today, schema processors should be capable of accomplishing more with this information in the future. Think of this approach as adding information to the DTD so it's ready for the next version. These data type names are used in the abstract modules for XHTML 1.1 as well, supplementing the core XML 1.0 set of types.

.attrib

The .attrib suffix is used on parameter entities that represent one or more attribute specifications: the part of an attribute list declaration that defines individual attributes, their types, defaults, and possible values. These entities sometimes describe only one attribute, like this one for the id attribute:

   <!ENTITY % Id.attrib
   "id ID
   #IMPLIED"
   >

They may also specify multiple attributes, like this one for xml:lang and dir:

   <!ENTITY % I18n.attrib
   "xml:lang %LanguageCode.datatype; #IMPLIED
   dir ( ltr | rtl ) #IMPLIED"
   >

These entities also may include other entities with the .attrib suffix, as in the ubiquitous Common.attrib entity:

   <!ENTITY % Common.attrib
   "%Core.attrib;
   %I18n.attrib;
   %Events.attrib;"
   > 

This simply includes all of the attribute specifications declared in the Core.attrib, I18n.attrib, and Events.attrib entities, building a large list of common components. The quotes are still required even though the entity's entire value consists of parameter entity references, because an entity declaration's value must always be a quoted literal.
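Because .attrib entities nest, expansion has to repeat until no references remain. The following sketch expands %Common.attrib; and then groups the result into (name, type, default) attribute specifications. It assumes a hypothetical, heavily trimmed entity table; the real Core.attrib and Events.attrib declare more attributes than these.

```python
import re

PE_REF = re.compile(r"%([A-Za-z][\w.-]*);")
# A token is either a parenthesized enumeration or a run of non-space characters.
TOKEN = re.compile(r"\([^)]*\)|\S+")

# Hypothetical, trimmed-down entity table for illustration only.
ENTITIES = {
    "LanguageCode.datatype": "NMTOKEN",
    "Id.attrib": "id ID #IMPLIED",
    "Core.attrib": "%Id.attrib; class CDATA #IMPLIED",
    "I18n.attrib": "xml:lang %LanguageCode.datatype; #IMPLIED dir ( ltr | rtl ) #IMPLIED",
    "Events.attrib": "onclick CDATA #IMPLIED",
    "Common.attrib": "%Core.attrib; %I18n.attrib; %Events.attrib;",
}

def expand(text: str, entities: dict[str, str], max_passes: int = 50) -> str:
    """Substitute references repeatedly until nested entities are fully expanded."""
    for _ in range(max_passes):
        if not PE_REF.search(text):
            return text
        text = PE_REF.sub(lambda m: entities[m.group(1)], text)
    raise ValueError("entity nesting too deep (recursive entity?)")

def attribute_specs(text: str, entities: dict[str, str]) -> list[tuple[str, ...]]:
    """Group the expanded tokens into (name, type, default) triples.

    Assumes every default is a single token, so #FIXED "value" is not handled.
    """
    tokens = TOKEN.findall(expand(text, entities))
    return [tuple(tokens[i:i + 3]) for i in range(0, len(tokens), 3)]

for spec in attribute_specs("%Common.attrib;", ENTITIES):
    print(spec)
```

It should print one tuple per attribute, starting with ('id', 'ID', '#IMPLIED'). The pass limit mirrors XML's rule that a parsed entity must not refer to itself, directly or indirectly.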

.attlist

The .attlist suffix (not documented in Building XHTML Modules) is used in the XHTML 1.1 DTDs to turn ATTLIST declarations on and off. Parameter entities with the .attlist suffix take one of two values: INCLUDE or IGNORE. They work with a feature of XML 1.0 DTDs not used in XHTML 1.0: conditional sections.

Tip

For a much more detailed explanation of conditional sections and their use in other XML contexts, see Article 16 of XML Elements of Style by Simon St. Laurent (McGraw-Hill, 2000).

Conditional sections may appear only in the external parts of a DTD (the external subset and external parameter entities), never in the internal subset or in document content. They enable DTD designers to turn sets of declarations on and off. By using parameter entities to determine whether to include or ignore a section, developers make it possible to use portions of a DTD or even choose among different variations on a single DTD. For example, this DTD fragment includes the attribute list declaration for the title element type:

   <!ENTITY % Title.attlist  "INCLUDE" >
   <![%Title.attlist;[
   <!ATTLIST title
   %I18n.attrib;
   > 
 <!-- end of Title.attlist  -->]]>

The first line creates a parameter entity named Title.attlist whose value is INCLUDE. In the next line, the reference %Title.attlist; is replaced with that value, producing these declarations:

   <![INCLUDE[
   <!ATTLIST title
   %I18n.attrib;
   > 
 <!-- end of Title.attlist  -->]]>
An XML parser then strips out the conditional section markers (<![INCLUDE[ and ]]>) and the comment, leaving a core of:
 <!ATTLIST title
 %I18n.attrib;
 > 
Expanding %I18n.attrib; turns this into:
 <!ATTLIST title
 xml:lang %LanguageCode.datatype;  #IMPLIED
 dir ( ltr | rtl ) #IMPLIED
 > 
and, once %LanguageCode.datatype; is expanded, finally:
 <!ATTLIST title
 xml:lang NMTOKEN #IMPLIED
 dir ( ltr | rtl ) #IMPLIED
 > 
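If, on the other hand, a module or driver file that is read earlier declares the Title.attlist entity as IGNORE (the first declaration of an entity is binding, so the override must come first):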
 <!ENTITY % Title.attlist  "IGNORE" >
then the result is:
 <![IGNORE[
 <!ATTLIST title
 %I18n.attrib;
 > 
 <!-- end of Title.attlist  -->]]>
which tells the parser to skip the declarations inside that section entirely, leaving title with no declared attributes. Entities with the .attlist suffix surround the attribute list declarations for every element type in the Modularization of XHTML draft.
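The INCLUDE/IGNORE switch can be sketched in a few lines of Python. This is a simplified model, not a real DTD parser. It assumes conditional sections are not nested and that each keyword is a single parameter entity reference. It also strips comments with a regular expression and expands no other entities.

```python
import re

# A non-nested conditional section whose keyword is a parameter entity
# reference, e.g. <![%Title.attlist;[ ... ]]>. Nested sections are not handled.
COND = re.compile(r"<!\[\s*%([\w.-]+);\s*\[(.*?)\]\]>", re.S)
COMMENT = re.compile(r"<!--.*?-->", re.S)

def resolve_conditionals(dtd: str, entities: dict[str, str]) -> str:
    """Strip comments, keep INCLUDE sections' contents, drop IGNORE sections."""
    def keep_or_drop(m: re.Match) -> str:
        keyword = entities[m.group(1)].strip()
        if keyword == "INCLUDE":
            return m.group(2)
        if keyword == "IGNORE":
            return ""
        raise ValueError(f"%{m.group(1)}; must be INCLUDE or IGNORE, not {keyword!r}")
    return COND.sub(keep_or_drop, COMMENT.sub("", dtd))

MODULE = """<![%Title.attlist;[
<!ATTLIST title
   %I18n.attrib;
>
<!-- end of Title.attlist -->]]>"""

print(resolve_conditionals(MODULE, {"Title.attlist": "INCLUDE"}))
print(repr(resolve_conditionals(MODULE, {"Title.attlist": "IGNORE"})))
```

With INCLUDE, the ATTLIST declaration survives. With IGNORE, the whole section collapses to an empty string.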
  

.content

The .content suffix is used for parameter entities that describe content models for particular element types. The simplest example, for an EMPTY content model, looks like this:

   <!ENTITY % Input.content  "EMPTY" >
 <!ELEMENT input %Input.content;  >

When processed, this resolves to:

   <!ELEMENT input EMPTY >

and defines the input element as having an empty content model. By declaring a different value for an entity with a .content suffix before the module's own declaration is read, other modules can easily modify the content model of an element.
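The following hypothetical sketch shows why the override has to come first. XML 1.0 makes the first declaration of an entity binding, so a later redeclaration is silently ignored. The ( #PCDATA ) content model is purely illustrative and is not something XHTML defines for input.

```python
import re

PE_REF = re.compile(r"%([A-Za-z][\w.-]*);")

def declare(entities: dict[str, str], name: str, value: str) -> None:
    """Record a parameter entity; as in XML 1.0, the first declaration is binding."""
    entities.setdefault(name, value)

def expand(text: str, entities: dict[str, str]) -> str:
    """Replace each %name; reference with its replacement text."""
    return PE_REF.sub(lambda m: entities[m.group(1)], text)

ELEMENT_DECL = "<!ELEMENT input %Input.content; >"

# The module on its own.
module_only: dict[str, str] = {}
declare(module_only, "Input.content", "EMPTY")
print(expand(ELEMENT_DECL, module_only))  # <!ELEMENT input EMPTY >

# A hypothetical extension module read *before* the input module.
extended: dict[str, str] = {}
declare(extended, "Input.content", "( #PCDATA )")  # hypothetical override, read first
declare(extended, "Input.content", "EMPTY")        # module's default, now ignored
print(expand(ELEMENT_DECL, extended))  # <!ELEMENT input ( #PCDATA ) >
```

The element declaration itself never changes. Only the entity feeding it does, which is exactly the flexibility parameterization is meant to provide.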

.class (and .extra)

The .class suffix is used for parameter entities that may appear repeatedly in the content models of multiple elements, but only when their contents are element type names that share something in common. In XHTML, this tends to mean that block elements form one class, while inline elements form another. With one exception (noted below), these entities aren't defined in the Modularization of XHTML draft. Instead, they are defined in the customization file, another module, in Appendix C of XHTML 1.1 - Module-based XHTML. For example:

 <!ENTITY % Inlstruct.class  "br | span" >

The abbreviated name Inlstruct (inline structural) indicates that these are structural element types that may appear as inline elements. br is used for line breaks within block elements, while span is a generic element that is mostly useful for marking off inline content in ways that other inline elements don't express. This entity and several of its siblings get combined into a larger Inline.class entity:

   <!ENTITY % Inline.class
   "%Inlstruct.class;
   %Inlphras.class;
   %Inlpres.class;
   %I18n.class;
   %Anchor.class;
   %Inlspecial.class;
   %Ruby.class;
   %Inline.extra;"
 > 
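Here is a sketch of how these class entities flatten into a single list of element type names. Only the Inlstruct.class value and the Inline.extra value (the one the DTD declares for it) come from the article. The other class values are assumed, trimmed stand-ins.

```python
import re

PE_REF = re.compile(r"%([A-Za-z][\w.-]*);")

# Assumed, trimmed values except Inlstruct.class and Inline.extra.
ENTITIES = {
    "Inlstruct.class": "br | span",
    "Inlphras.class": "| em | strong",
    "Inlpres.class": "| b | i",
    "I18n.class": "| bdo",
    "Anchor.class": "| a",
    "Inlspecial.class": "| img",
    "Ruby.class": "| ruby",
    "Inline.extra": "| input | select | textarea | label | button",
    "Inline.class": ("%Inlstruct.class; %Inlphras.class; %Inlpres.class; "
                     "%I18n.class; %Anchor.class; %Inlspecial.class; "
                     "%Ruby.class; %Inline.extra;"),
}

def expand(text: str, entities: dict[str, str]) -> str:
    """Substitute references repeatedly until none remain."""
    while PE_REF.search(text):
        text = PE_REF.sub(lambda m: entities[m.group(1)], text)
    return text

def element_names(class_entity: str, entities: dict[str, str]) -> list[str]:
    """Split an OR-separated class entity into its element type names."""
    expanded = expand(f"%{class_entity};", entities)
    return [name.strip() for name in expanded.split("|") if name.strip()]

names = element_names("Inline.class", ENTITIES)
print(names)
# Illustration only: the pieces can be dropped into a mixed content model.
print(f"({' | '.join(['#PCDATA'] + names)})*")
```

Every entity after Inlstruct.class begins with |. That convention lets the pieces concatenate into one OR-separated group, like the mixed content model printed on the last line, which is built here for illustration only.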

One oddity in Inline.class is Inline.extra: Building XHTML Modules describes no "official" convention for .extra. Inline.extra has this declaration:

   <!ENTITY % Inline.extra
 "| input | select | textarea |  label | button" >

The DTD comments describe how to use the .extra suffix:

   While in some cases this module may need to be rewritten to accommodate changes to the document model, minor extensions may be accomplished by redeclaring any of the three *.extra; parameter entities to contain extension element types as follows:

   %Misc.extra; whose parent may be any block or inline element.
   %Inline.extra; whose parent may be any inline element.
   %Block.extra; whose parent may be any block element.

   If used, these parameter entities must be an OR-separated list beginning with an OR separator ("|"), e.g., "| a | b | c"

While .extra is undocumented (so far) in Building XHTML Modules, it is a critical piece for developers who want to add their own extensions to XHTML 1.1.

The .class suffix is also used in at least one place for an attribute type. The following entity includes all of the input types:

   <!ENTITY % InputType.class
   "( text | password | checkbox |  radio | submit
   | reset | file | hidden | image  )"
 > 

This is then used in the attribute list declaration for input (shown here with its other attributes omitted):

   <!ATTLIST input
   %Common.attrib;
   type %InputType.class; 'text'
   >

This anomaly probably derives from the input element's unusual use of an attribute to signify its "real" content.
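Because the enumerated type lists every legal value, a validator can both supply the 'text' default and reject unknown types. This minimal Python sketch imitates that check. effective_type and its dictionary argument are hypothetical stand-ins for what a validating parser does internally.

```python
# The enumerated type from InputType.class, as declared in the DTD.
INPUT_TYPE_CLASS = """( text | password | checkbox | radio | submit
    | reset | file | hidden | image )"""

DEFAULT_TYPE = "text"  # the 'text' default from the ATTLIST declaration

def enumeration(group: str) -> list[str]:
    """Turn a parenthesized, |-separated group into a list of tokens."""
    group = group.strip()
    if not (group.startswith("(") and group.endswith(")")):
        raise ValueError("expected a parenthesized enumeration")
    return [token.strip() for token in group[1:-1].split("|")]

ALLOWED_TYPES = enumeration(INPUT_TYPE_CLASS)

def effective_type(attributes: dict[str, str]) -> str:
    """Roughly what a validating parser does with the type attribute:
    supply the default when it is missing and reject values outside the list."""
    value = attributes.get("type", DEFAULT_TYPE)
    if value not in ALLOWED_TYPES:
        raise ValueError(f"type={value!r} is not one of {ALLOWED_TYPES}")
    return value

print(effective_type({}))                    # text
print(effective_type({"type": "checkbox"}))  # checkbox
```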
