XML DTD Modules

an article added by: Albert Lichtblau at 06022007




DTD modules are better defined than abstract modules, although not quite as flexible. Because they use the formal syntax of XML 1.0 DTDs (as described in Article 6), DTD modules have all the capabilities and all the limitations of any XML DTD. XHTML 1.1 DTD modules are also more complex than the average XML DTD, using a set of naming conventions that takes full advantage of parameter entities to create customizable descriptions of document structures. Parameterization is extremely powerful, but it does take some getting used to.

Tip XHTML 1.1 DTD modules are a lot harder to read than many XML DTDs. If you can't penetrate the formal description of a given module, the abstract module should help you. If you write your own modules, it is critical that you include abstract modules.

XREF If you haven't done much with parameter entities, you may want to go back to Article 6 and review their syntax and usage. The rules for creating XHTML 1.1's XML DTD modules are presented in Section 5 of the Building XHTML Modules draft and demonstrated in Section 6. There are a few additional conventions used in Modularization of XHTML that Building XHTML Modules doesn't describe, which I cover here as well. They appear useful and help explain some of the syntactical shortcuts (such as the Common attributes) used in abstract modules.

Parameterization just means putting all the contents of declarations into parameter entities. This makes the declarations easier to manage, and at the same time makes it much easier to modify them. While you can override attribute declarations and parameter entities by supplying your own declaration ahead of the original (an XML processor uses the first declaration it reads), XML prohibits multiple declarations for element types. By putting the contents of those declarations into parameter entities, the creators of XHTML modules can provide a lot more flexibility. The XHTML 1.1 DTDs mark the role of each parameter entity with a suffix on its name: .datatype, .attrib, .attlist, .content, .class, .extra, .mix, .mod, and .module. Let's look at examples of each of these suffixes taken from the W3C draft DTD, building from the smallest atomic pieces to the largest.

.datatype The data types in XHTML 1.1 are direct descendants of those in XHTML 1.0, and they are declared in Section B.2.1. Most of the data types are simply more precise names for CDATA, textual content:

   <!-- a Uniform Resource Identifier, see [URI] -->
   <!ENTITY % URI.datatype "CDATA" >

These data types are then used in attribute declarations:

   <!ATTLIST a
      %Common.attrib;
      href %URI.datatype; #IMPLIED
      charset %Charset.datatype; #IMPLIED
      type %ContentType.datatype; #IMPLIED
      hreflang %LanguageCode.datatype; #IMPLIED
      rel %LinkTypes.datatype; #IMPLIED
      rev %LinkTypes.datatype; #IMPLIED
      accesskey %Character.datatype; #IMPLIED
      tabindex %Number.datatype; #IMPLIED
   >

Most of these data type declarations resolve to CDATA when an XML processor reads the DTD (a few, such as LanguageCode.datatype, resolve to NMTOKEN instead), but they make the content that should be stored in these attributes much more identifiable.

Tip While XML 1.0 processors can't do much to enforce data typing today, schema processors should be capable of accomplishing more with this information in the future. Think of this approach as adding information to the DTD so it's ready for the next version. These data type names are used in the abstract modules for XHTML 1.1 as well, supplementing the core XML 1.0 set of types.

.attrib The .attrib suffix is used on parameter entities that represent one or more attribute specifications – the part of an attribute list declaration that defines individual attributes, their types, defaults, and possible values. These entities sometimes describe only one attribute, like this one for the id attribute:

   <!ENTITY % Id.attrib
      "id ID #IMPLIED"
   >

They may specify multiple attributes, like this one for xml:lang and dir:

   <!ENTITY % I18n.attrib
      "xml:lang %LanguageCode.datatype; #IMPLIED
      dir ( ltr | rtl ) #IMPLIED"
   >

These entities also may include other entities with the .attrib suffix, as in the ubiquitous Common.attrib entity:

   <!ENTITY % Common.attrib
   "%Core.attrib;
   %I18n.attrib;
   %Events.attrib;"
 > 

This just includes all of the attribute specifications declared in the Core.attrib, I18n.attrib, and Events.attrib entities, building a large list of common components. The quotes need to be used even though all of the contents of the entity are contained in parameter entities.
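As a minimal sketch of the same pattern applied to a module of your own, the fragment below builds an attribute bundle for a hypothetical note element. The names Priority.attrib, Note.attrib, and note are invented for illustration and appear in no W3C module; Id.attrib and I18n.attrib are assumed to have been declared earlier, as shown above. Like the other fragments in this article, it has to live in an external DTD file, because XML 1.0 does not allow parameter entity references inside declarations in the internal subset.

```dtd
<!-- hypothetical: a single attribute of our own, defaulting to 'normal' -->
<!ENTITY % Priority.attrib
   "priority ( low | normal | high ) 'normal'"
>

<!-- hypothetical: bundle it with entities that were declared earlier -->
<!ENTITY % Note.attrib
   "%Id.attrib;
    %I18n.attrib;
    %Priority.attrib;"
>

<!ELEMENT note (#PCDATA) >
<!ATTLIST note
   %Note.attrib;
>
```

Under those assumptions, the note element ends up with id, xml:lang, dir, and priority attributes, and a later module could change the whole set by supplying its own Note.attrib first.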

.attlist The .attlist suffix (not documented in Building XHTML Modules) is used in the XHTML 1.1 DTDs to turn ATTLIST declarations on and off. Parameter entities that have the .attlist suffix take one of two values: INCLUDE or IGNORE. These function with a feature of XML 1.0 DTDs not used in XHTML 1.0: conditional sections.

Tip For a much more detailed explanation of conditional sections and their use in other XML contexts, see Article 16 of XML Elements of Style by Simon St. Laurent (McGraw-Hill, 2000).

Conditional sections may appear in DTDs only; they enable DTD designers to turn sets of declarations on and off. By using parameter entities to determine whether to include or ignore a section, developers make it possible to use portions of a DTD or even choose among different variations on a single DTD. For example, this DTD fragment includes the attributes for the title element type:

   <!ENTITY % Title.attlist "INCLUDE" >
   <![%Title.attlist;[
   <!ATTLIST title
      %I18n.attrib;
   >
   <!-- end of Title.attlist -->]]>

The first line creates a parameter entity named Title.attlist whose value is INCLUDE. In the next line, the entity is substituted with %Title.attlist; to produce these resulting declarations:

   <![INCLUDE[
   <!ATTLIST title
      %I18n.attrib;
   >
   <!-- end of Title.attlist -->]]>

An XML parser strips out the INCLUDE section and the comment, leaving a core of:

   <!ATTLIST title
      %I18n.attrib;
   >

Which then becomes:

   <!ATTLIST title
      xml:lang %LanguageCode.datatype; #IMPLIED
      dir ( ltr | rtl ) #IMPLIED
   >

and finally:

   <!ATTLIST title
      xml:lang NMTOKEN #IMPLIED
      dir ( ltr | rtl ) #IMPLIED
   >

If, on the other hand, another module has already declared the Title.attlist entity to be IGNORE (its declaration must be read first, because an XML processor keeps the first declaration of an entity and ignores later ones):

   <!ENTITY % Title.attlist "IGNORE" >

then the result is:

   <![IGNORE[
   <!ATTLIST title
      %I18n.attrib;
   >
   <!-- end of Title.attlist -->]]>

which prohibits the parser from processing the declarations at all, leaving title with no attributes. Entities with the .attlist suffix surround the attribute list declarations for every element type in the Modularization of XHTML draft.
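The switch can be seen end to end in a small sketch. It assumes a hypothetical driver file whose declaration of Title.attlist is read before the module's own default; the "driver" and "module" labels in the comments are only illustrative, and I18n.attrib is assumed to be declared elsewhere.

```dtd
<!-- driver (hypothetical): read first, so this value is binding -->
<!ENTITY % Title.attlist "IGNORE" >

<!-- module: this default arrives too late and is skipped -->
<!ENTITY % Title.attlist "INCLUDE" >

<!-- in this sketch the element declaration itself is not switched off -->
<!ELEMENT title (#PCDATA) >

<!-- expands to <![IGNORE[ ... ]]>, so the ATTLIST is never processed -->
<![%Title.attlist;[
<!ATTLIST title
   %I18n.attrib;
>
<!-- end of Title.attlist -->]]>
```

Deleting the first declaration is enough to bring the attributes back, since the module's INCLUDE default then becomes the first declaration the parser sees.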
  

.content The .content suffix functions for parameter entities that describe content models for particular element types. The simplest example, for an EMPTY content model, looks like this:

   <!ENTITY % Input.content "EMPTY" >
   <!ELEMENT input %Input.content; >

When processed, this resolves to:

   <!ELEMENT input EMPTY >

and defines the input element as having an empty content model. By redeclaring entities with a .content suffix, other modules easily can modify the content model of an element.
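To sketch how such an override might look, suppose a module declares a sidebar element through a Sidebar.content entity. The element and the entity name are hypothetical, and Inline.class is assumed to be declared already as an OR-separated list of element type names. A driver that wants a stricter content model supplies its own Sidebar.content first; the module's later declaration of the same entity is then ignored.

```dtd
<!-- driver (hypothetical): declared first, so it wins -->
<!ENTITY % Sidebar.content "( #PCDATA )" >

<!-- module (hypothetical): default content model, ignored here -->
<!ENTITY % Sidebar.content "( #PCDATA | %Inline.class; )*" >

<!-- resolves to <!ELEMENT sidebar ( #PCDATA ) > -->
<!ELEMENT sidebar %Sidebar.content; >
```

Only the entity is declared twice; the element type declaration appears once, which is what keeps the DTD within XML's rule against multiple declarations for an element type.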
 

.class (and .extra) The .class suffix functions for parameter entities that may be used repeatedly in content models for multiple elements, but only when the contents are element type names that all share something in common. In XHTML, this tends to mean that block elements are one class, while inline elements are another class. These entities aren't defined (with one exception, noted next) in the Modularization of XHTML draft. They are defined in the customization file, another module, in Appendix C of XHTML 1.1 - Module-based XHTML. For example:

 <!ENTITY % Inlstruct.class  "br | span" >

Through the abbreviations, you can see that these are structural element types that may appear as inline elements. br is used for line breaks within block elements, while span is an abstract element mostly useful for marking off inline content in ways that aren't reflected by other inline content. This entity and several of its siblings get combined into a larger Inline.class entity:

   <!ENTITY % Inline.class
   "%Inlstruct.class;
   %Inlphras.class;
   %Inlpres.class;
   %I18n.class;
   %Anchor.class;
   %Inlspecial.class;
   %Ruby.class;
   %Inline.extra;"
 > 

One oddity here is Inline.extra – Building XHTML Modules describes no "official" convention for .extra. Inline.extra has this declaration:

   <!ENTITY % Inline.extra
 "| input | select | textarea |  label | button" >

The DTD comments describe how to use this .extra suffix:

   While in some cases this module may need to be rewritten to accommodate changes to the document model, minor extensions may be accomplished by redeclaring any of the three *.extra; parameter entities to contain extension element types as follows:

   %Misc.extra; whose parent may be any block or inline element.
   %Inline.extra; whose parent may be any inline element.
   %Block.extra; whose parent may be any block element.

   If used, these parameter entities must be an OR-separated list beginning with an OR separator ("|"), eg., "| a | b | c"

While .extra is undocumented (so far) in Building XHTML Modules, it is a critical piece for developers who want to add their own extensions to XHTML 1.1.

The .class suffix also functions in at least one place for attributes. The following entity includes all of the input types:

   <!ENTITY % InputType.class
      "( text | password | checkbox | radio | submit
      | reset | file | hidden | image )"
   >

This is then used in an attribute declaration:

   <!ATTLIST input
      %Common.attrib;
      type %InputType.class; 'text'
      ...
   >

This anomaly probably derives from the input element's unusual use of an attribute to signify its "real" content.
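Returning to the .extra convention described earlier, a minimal sketch of an extension might look like the following. The chem element and its content model are made up for illustration; only the Inline.extra name, its original value, and the leading-OR rule come from the DTD itself. The new declaration has to be read before the one in the customization file, and it replaces that value rather than adding to it, so the form element types are repeated here.

```dtd
<!-- hypothetical extension, declared ahead of the customization file -->
<!ENTITY % Inline.extra
   "| input | select | textarea | label | button | chem" >

<!-- hypothetical element type contributed by the extension -->
<!ELEMENT chem (#PCDATA) >
```

Because Inline.extra is folded into Inline.class, every content model built on Inline.class would then accept chem wherever it accepts the other inline element types.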

.mix The .mix suffix creates lists of elements for use in content models in which different classes of items get combined. The Flow.mix entity (also from the customization file of XHTML 1.1 - Module-based XHTML) is a good example:

   <!ENTITY % Flow.mix
   "%Heading.class;
   | %List.class;
   | %Block.class;
   | %Inline.class;
   %Misc.class;"
 > 
Any element that uses Flow.mix within its content model can include just about any element in the XHTML vocabulary. It's definitely a combination.
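As a rough sketch of how a .mix entity is consumed, the fragment below declares a hypothetical panel element that accepts text plus anything in Flow.mix. The panel and Panel.content names are invented, and the sketch assumes that Flow.mix expands to an OR-separated list of element type names with no leading separator.

```dtd
<!-- hypothetical content model: text plus anything in Flow.mix -->
<!ENTITY % Panel.content "( #PCDATA | %Flow.mix; )*" >
<!ELEMENT panel %Panel.content; >
```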
  

.mod Parameter entities that end in .mod assemble complete DTDs out of all of these little parts that comprise XHTML. XHTML 1.1's driver file (included as Appendix B of XHTML 1.1 - Module-based XHTML) contains lots of these entities.

   <!ENTITY % xhtml-form.mod
   PUBLIC "-//W3C//ELEMENTS XHTML  1.1 Forms 1.0//EN"
   "xhtml11-form-1.mod" >
 %xhtml-form.mod;

In this case, the entity is declared using a public identifier and a system identifier. If the application or parser processing this understands the public identifier, it can use that information to locate the module. If not, it can use the relative URL that follows to retrieve the file. After the declaration, the %xhtml-form.mod; reference immediately includes the contents of the file in the DTD.

.module Parameter entities that end in .module turn the entities that end in .mod on and off using the same conditional statements (INCLUDE and IGNORE) that .attlist entities use. For example, the XHTML form module normally gets loaded because of this code:

   <!ENTITY % xhtml-form.module  "INCLUDE" >
   <![%xhtml-form.module;[
   <!ENTITY % xhtml-form.mod
   PUBLIC "-//W3C//ELEMENTS XHTML  1.1 Forms 1.0//EN"
   "xhtml11-form-1.mod" >
 %xhtml-form.mod;]]>

If you want to keep the form module from loading, all you have to do is define a new xhtml-form.module entity before that one, overriding it with a value of IGNORE:

   <!ENTITY % xhtml-form.module "IGNORE" >
   ...
   <!ENTITY % xhtml-form.module "INCLUDE" >
   <![%xhtml-form.module;[
   <!ENTITY % xhtml-form.mod
      PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Forms 1.0//EN"
      "xhtml11-form-1.mod" >
   %xhtml-form.mod;]]>

The result is this:

   <![IGNORE[
   <!ENTITY % xhtml-form.mod
      PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Forms 1.0//EN"
      "xhtml11-form-1.mod" >
   %xhtml-form.mod;
   ]]>

The module doesn't load. Note the naming convention here – both the entities (.mod and .module) have the same name except for the suffix. This makes the DTDs much more manageable.
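If you write a module of your own, the same pairing could be sketched as follows. The example-chem names and the file name example-chem-1.mod are hypothetical, and a SYSTEM identifier is used on the assumption that a private module has no registered public identifier.

```dtd
<!-- hypothetical switch; declare it as "IGNORE" earlier to skip the module -->
<!ENTITY % example-chem.module "INCLUDE" >
<![%example-chem.module;[
<!-- hypothetical module file, located by a relative URL -->
<!ENTITY % example-chem.mod
   SYSTEM "example-chem-1.mod" >
%example-chem.mod;]]>
```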

Schema Modules XHTML 1.1 will use XML Schemas when they're ready. The Modularization of XHTML draft's Appendix A states, "This appendix will contain implementations of the modules defined in XHTML Abstract Modules via XML Schema [XMLSCHEMA] when the XML Schema becomes a W3C approved recommendation." The Building XHTML Modules draft doesn't specify how to build Schema modules. The current Schema drafts don't use mechanisms that correspond directly to parameter entities, but similar approaches may be possible using general entities (because Schemas are XML documents themselves) and the extension and restriction mechanisms of XML Schemas.
