Reading the XHTML DTDs: A Guide to XML Declarations

Although the W3C has long had document type definitions (DTDs) for HTML, few developers actually use those DTDs as a foundation for learning HTML. XHTML 1.0 simplifies those DTDs with the slightly friendlier XML syntax (they previously used SGML's more complex syntax), and the increased emphasis on validation may lead developers to explore them more closely. Making good use of XHTML 1.1 requires some level of understanding of DTDs, so getting started now is a good idea. Fortunately, XHTML doesn't use every tool XML provides; figuring out XHTML is easier than learning all about XML.

Note: The W3C is moving slowly toward its new XML Schemas standard for describing document structures. You'll want to learn XML Schemas when they're ready, but the DTDs described here provide a solid foundation for figuring them out.

You can work with XHTML 1.0 without any comprehension of the DTD because the rules for element and attribute usage are the same as those for HTML 4.0. However, if you plan on using validating parsers with XHTML 1.0, you should know enough about DTDs to figure out some of the error messages you may encounter. In addition, understanding DTDs can help you out considerably with XHTML 1.1 and its modular approach.

Note: Because you don't necessarily need to understand DTD syntax to use XHTML, you're welcome to bail out of this article if you prefer, and come back to it if and when you need it.

The W3C wrote the XHTML DTDs for its own convenience, making them more manageable (and, at an abstract level, more readable), but at the cost of requiring some cross-referencing to figure out exactly what's included in a particular element or attribute. As a result, the XHTML DTDs aren't recommended reading for developers without an XML or SGML background.
The following sections introduce the different kinds of declarations used within the XHTML DTDs in their simpler forms, building up to the more complex rules used to assemble the XHTML 1.0 DTDs.

Tip: If you want a guide to creating and reading XML DTDs in all their glory, try XML: A Primer, 2nd Edition by Simon St. Laurent (IDG Books, 1999). For even more details on XML technicalities, see XML Elements of Style (McGraw-Hill, 1999), also by Simon St. Laurent.

Element Type Declarations

Every valid document needs one or more element type declarations, which describe the element names used within a document and the content that may appear within a given element. If an element name appears in a document and there is no corresponding element type declaration, validating parsers report an error. (Some parsers also halt processing, although that isn't required.) Similarly, if an element appears in a context where it's not supposed to appear, validating parsers report errors. The syntax for element type declarations is simple:

<!ELEMENT elementName contentModel>

Element names must begin with letters, underscores, or colons, and they may contain letters, underscores, colons, digits, hyphens, and periods. Element names beginning with xml (or any case variation on that, such as XmL or XML) are reserved for the use of the W3C. The use of colons is discouraged except for use with namespaces, which Article 4 describes.

Content models can be a lot more complicated, enabling designers to specify intricate combinations of elements and text. There are four basic types of content models available: EMPTY, ANY, structured content models, and mixed content models.

Note: Element type declarations don't provide any background on what the element is for, what contexts it may be used in, or what its appearance in a given context might mean. You have to provide that information separately, typically in documentation.
Element type declarations only describe a small, but important, set of element properties: name and allowed contents.

The EMPTY content model

The EMPTY content model is the simplest model available. EMPTY elements may use either empty element tags or a set of start and end tags with no content whatsoever (not even whitespace) between them. However, they may (and usually do) store information in attributes, which are declared separately. The img and br elements are both examples of elements with EMPTY content models, and their declarations are very similar:

<!ELEMENT img EMPTY>
<!ELEMENT br EMPTY>

The ANY content model

The ANY content model is nearly as simple as the EMPTY model. Elements declared as ANY can contain any mix of text and (declared) elements. The ANY content model is never used within XHTML 1.0, but it sometimes appears in XML documents that contain XHTML content (perhaps followed by a comment):

<!ELEMENT documentation ANY>
<!--Please note: XHTML is the preferred content for the documentation element, but other models may be used.-->

XML developers frown on the widespread use of ANY, seeing it as introducing serious weaknesses, but you may use it appropriately in your own DTDs at the beginning of a project or to preserve spaces for future extensions.

Structured content models

Structured content models list the child elements that an element may contain, using a small set of symbols that serve as a decoder key: a comma (,) separates items that must appear in sequence; a vertical bar (|) separates alternatives; a question mark (?) marks an item that may appear zero or one times; an asterisk (*) allows zero or more appearances; a plus sign (+) requires one or more; and parentheses group items together. The table element type, for example, is declared like this:

<!ELEMENT table (caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))>

Using this decoder key, you can translate the content model of the table element type's declaration and its pieces into English. The outside parentheses just enclose the entire content model, a requirement for structured content model declarations. The first item inside the parentheses, caption?, indicates that a caption element may appear once as the first element inside the table element (but it is optional). The comma following caption? indicates that the other items following it must appear in sequence.
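One way to get a feel for this notation is to notice that a structured content model behaves much like a regular expression over the names of an element's children. The following Python sketch is an illustration only, not a description of how validating parsers work. The helper names content_model_to_regex and children_match are hypothetical, and the table content model is assembled from the pieces this section walks through: caption?, (col*|colgroup*), thead?, tfoot?, and (tbody+|tr+). The sketch understands only element names and the symbols , | ? * + and parentheses; it does not handle EMPTY, ANY, or #PCDATA.

```python
import re

# Content model for table, assembled from the pieces discussed in the text.
TABLE_MODEL = "(caption?, (col*|colgroup*), thead?, tfoot?, (tbody+|tr+))"


def content_model_to_regex(model):
    """Translate a structured content model into a compiled regex.

    Hypothetical helper for illustration: it understands element names and
    the symbols , | ? * + ( ) only (no EMPTY, ANY, or #PCDATA).
    """
    compact = re.sub(r"\s+", "", model)
    # Wrap each element name in a group that matches "name " (name + space),
    # so that "col" can never match the beginning of "colgroup".
    named = re.sub(
        r"[A-Za-z_:][\w:.-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A comma means "followed by", which is plain concatenation in a regex.
    return re.compile(named.replace(",", ""))


def children_match(model, children):
    """Return True if the list of child element names satisfies the model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match(TABLE_MODEL, ["caption", "col", "col", "thead", "tbody"]))  # True
print(children_match(TABLE_MODEL, ["col", "colgroup", "tr"]))  # False: col and colgroup mixed
print(children_match(TABLE_MODEL, ["caption"]))  # False: no tbody or tr
print(children_match("(col|colgroup)*", ["col", "colgroup"]))  # True: mixing allowed here
```

The ?, *, +, and | symbols carry over to the regular expression unchanged, which is why only the names and commas need any translation.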
The next chunk provides some options:

(col*|colgroup*)

This grouping means that either col or colgroup elements may appear after the caption and before the thead (if those elements appear), but that col and colgroup elements may not be mixed within a given table element. In other words, either zero or more col elements or zero or more colgroup elements may appear at this point. If the developers of the XHTML standard had wanted to allow col and colgroup elements to be mixed, they could have written:

(col|colgroup)* <!--this is not the route XHTML chose-->

This says that zero or more instances of the col or colgroup elements may appear, without prohibiting both from appearing in a single sequence.

A comma follows the (col*|colgroup*) grouping, followed by thead?. Like caption?, this allows the thead element to appear zero or one times. Another comma follows, and then tfoot? allows a tfoot element to appear zero or one times. The last portion of the content model is similar to the (col*|colgroup*) grouping, but with a slight change:

(tbody+|tr+)

Again, either tbody or tr elements may appear in this location within the content model. However, at least one instance of one of these elements is required for a valid document. This is the only required content within a table element: no instance of the table element may appear without containing at least a tbody or a tr element.

Mixed content models

Most of HTML's elements use mixed content models, which enable document authors to mix text and elements together to create Web pages. Mixed content models in XML come in two varieties. The simpler variety enables you to create elements that may contain only text. The title element, for example, is declared this way:

<!ELEMENT title (#PCDATA)>

PCDATA stands for parsed character data, the only one of SGML's textual types that XML supports.
You can write the same declaration like this:

<!ELEMENT title (#PCDATA)*>

The asterisk is optional when a text-only element is declared, but including it makes the declaration more consistent with other mixed content models.

Mixed content models that describe a mixture of text and elements are more complicated. They look like structured content models, using the | and * indicators, but you are very limited in how you can use them. The general syntax for an element type declaration using mixed content of this kind looks like this:

<!ELEMENT elementName (#PCDATA | child1 | child2 | ...)*>

Mixed content models only enable you to list a set of elements that may appear mixed with text; you cannot specify their sequence or the number of times they may appear. For example, if a very simple paragraph element only contains text mixed with bold and italic elements, the declarations might look like this:

<!ELEMENT bold (#PCDATA)>
<!ELEMENT italic (#PCDATA)>
<!ELEMENT paragraph (#PCDATA | bold | italic)*>

Based on those declarations, all of the paragraphs shown here are legal:

<paragraph>There's just text in this one!</paragraph>
<paragraph><bold>This one's bold!</bold></paragraph>
<paragraph><italic>This one's italic!</italic></paragraph>
<paragraph><bold>This one's part bold</bold> and <italic>part italic!</italic></paragraph>
<paragraph><italic>This one's part italic</italic> and <bold>part bold - </bold> and then <bold>bold again!</bold></paragraph>

Mixed declarations are used throughout the XHTML 1.0 DTDs; to understand their usage there, you need to know about parameter entities (which I cover later in this article).

Attribute List Declarations

Attribute list declarations enable you to specify the attributes that you can use on particular element types. Every element in XHTML has at least one core set of attributes, so attribute list declarations (sometimes abbreviated ATTLIST declarations) are an important part of the XHTML 1.0 DTDs.
You have more options for attribute list declarations than for element type declarations in XML, but fortunately the XHTML 1.0 specification stays away from the most complicated types of attributes. The basic syntax for an attribute list declaration looks like this:

<!ATTLIST elementName
   attName attType default
   attName attType default
   ... >

Multiple attribute list declarations may appear for a single element type. If the same attribute is defined more than once for a given element, the first definition is the one that gets used. Any number of attributes may be defined for a particular element in a given attribute list declaration, even none:

<!ATTLIST myElement>

Attribute names are subject to the same rules as element names: they must begin with letters, underscores, or colons, and may contain letters, underscores, colons, digits, hyphens, and periods. Attribute names beginning with xml (or any case variation on that, such as xMl or XML) are reserved for the use of the W3C. Furthermore, the use of colons is discouraged except for use with namespaces.

The simplest type of attribute is the CDATA type, an abbreviation for Character DATA. The simplest default is the keyword #IMPLIED, which doesn't supply any default value for the attribute. A very simple attribute declaration might look like this:

<!ATTLIST myElement note CDATA #IMPLIED>

The following sections discuss the attribute types and default options in more detail.

Types of attributes

Let's take a look at how these attribute types are used by exploring subsets of the declarations employed in the XHTML DTD. The DTD uses parameter entities, covered later in this article, and smaller examples are easier to work with, so we'll create examples that are easy to read but aren't exact quotations from the XHTML DTD. Also, as you'll see, the W3C uses parameter entities to specify expectations for attribute content that can't be expressed using the basic types.

Attributes of type CDATA appear throughout the XHTML DTDs.
CDATA is the loosest model, accommodating all kinds of needs while setting very few expectations. CDATA attribute types can hold URLs, numeric information, style information, and basically anything else that can be expressed as text. A subset of the attribute list declaration for the img element, for example, might look like this:

<!--These are compatible with the XHTML DTDs but do not represent the complete declarations from the XHTML DTD-->
<!ATTLIST img
   src CDATA #REQUIRED
   alt CDATA #REQUIRED
   height CDATA #IMPLIED
   width CDATA #IMPLIED >

The src attribute, which takes a URL, is represented as CDATA. The alt attribute, which contains text to display if the image isn't loaded, also is represented as CDATA despite the differences between its content and that of the src attribute. The height and width attributes, which accept lengths, also use CDATA. CDATA can handle all of these different types because it places so few restrictions on its content.

The XHTML 1.0 Recommendation names all of its attributes of type ID as id and makes them available to almost every element in the DTD. To add the id attribute to the img element, you just use this:

<!ATTLIST img id ID #IMPLIED >

Or add this to the preceding list:

<!--These are compatible with the XHTML DTDs but do not represent the complete declarations from the XHTML DTD-->
<!ATTLIST img
   src CDATA #REQUIRED
   alt CDATA #REQUIRED
   height CDATA #IMPLIED
   width CDATA #IMPLIED
   id ID #IMPLIED >

The IDREF and IDREFS attribute types are used more sparingly. The label element, which enables the creation of labels for other elements in a document (typically form controls), has a for attribute that should contain an ID value identifying the element being labeled:

<!--This is compatible with the XHTML DTDs but does not represent the complete declarations from the XHTML DTD-->
<!ATTLIST label for IDREF #IMPLIED >

This mechanism allows the label to refer to one and only one element in a document: the one that has an id attribute value matching that of the label's for attribute.
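A validating parser performs this check for you, but the idea is easy to sketch by hand. The following Python example uses the standard library's xml.etree.ElementTree module, which parses XML but does not validate against a DTD, to list label elements whose for attribute matches no id value in the document. The function name unresolved_label_targets and the sample markup are made up for illustration, and the sample leaves out the XHTML namespace to keep element lookups simple.

```python
import xml.etree.ElementTree as ET

# Made-up sample: the second label points at an id that does not exist.
SAMPLE = """<form>
  <label for="email">E-mail</label>
  <input id="email" type="text" />
  <label for="phone">Phone</label>
</form>"""


def unresolved_label_targets(markup):
    """Return the for values of label elements that match no id attribute."""
    root = ET.fromstring(markup)
    # Collect every id value that appears anywhere in the document.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    # An IDREF is only satisfied when some element carries the matching id.
    return [
        label.get("for")
        for label in root.iter("label")
        if label.get("for") is not None and label.get("for") not in ids
    ]


print(unresolved_label_targets(SAMPLE))  # ['phone']
```

A document with an unresolved reference like this one would be reported as invalid by a validating parser, because every IDREF value must match some ID value in the same document.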
IDREFS are used similarly, although they permit a single attribute to refer to multiple ID values. XHTML 1.0 uses IDREFS to allow table cells to point to the header labels that describe them:

<!--This is compatible with the XHTML DTDs but does not represent the complete declarations from the XHTML DTD-->
<!ATTLIST td headers IDREFS #IMPLIED >
<!ATTLIST th headers IDREFS #IMPLIED >

Complex tables sometimes sprout multiple levels of headers; this can help manage table reorganization or analysis.

XHTML uses the NMTOKEN attribute type to restrict content to a single word (a name token). In the a, map, and object elements, NMTOKEN is used to restrict the value of name attributes to nearly the same rules that apply to id attributes:

<!--This is compatible with the XHTML DTDs but does not represent the complete declarations from the XHTML DTD-->
<!ATTLIST a name NMTOKEN #IMPLIED id ID #IMPLIED >
<!ATTLIST map name NMTOKEN #IMPLIED id ID #IMPLIED >
<!ATTLIST object name NMTOKEN #IMPLIED id ID #IMPLIED >

XHTML uses enumerated attributes to restrict the values for an attribute to a small set of permitted choices, presented as a list. Enumerated attributes appear throughout the DTDs.
The use of an enumerated attribute type to restrict values is especially useful for input elements, in which the type attribute defines the "real" meaning of the element:

<!--This is compatible with the XHTML DTDs but does not represent the complete declarations from the XHTML DTD-->
<!ATTLIST input type (text | password | checkbox | radio | submit | reset | file | hidden | image | button) "text" >

Enumerated types also are used for certain attributes that can take only one value, such as the ismap attribute for img elements:

<!ATTLIST img ismap (ismap) #IMPLIED>

If the img element should be treated as an image map, the document creator should use the ismap attribute as shown here:

<img src='whatever.png' ismap='ismap' />

If the image isn't a map, the img element shouldn't have an ismap attribute at all, as shown here:

<img src='whatever.png' />

XHTML 1.0 doesn't use the NMTOKENS, NOTATION, ENTITY, or ENTITIES attribute types at all. However, their use is not prohibited in XML DTDs that are designed to include or be included by XHTML. If you encounter these types in a DTD you use with XHTML, consult the documentation for that DTD regarding their proper use.
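The effect of the enumerated declarations shown above for input and img can be imitated in a few lines of Python. This is a rough sketch rather than a real validator: the function name check_enumerated is hypothetical, attributes are modeled as a plain dictionary, and a default of None stands in for #IMPLIED.

```python
# Values copied from the enumerated declarations for input and img.
INPUT_TYPES = ("text", "password", "checkbox", "radio", "submit",
               "reset", "file", "hidden", "image", "button")
ISMAP_VALUES = ("ismap",)


def check_enumerated(attributes, name, allowed, default=None):
    """Return the effective value of an enumerated attribute.

    attributes: dict of the attributes actually written on the element.
    allowed:    the values listed in the attribute list declaration.
    default:    the declared default value, or None for #IMPLIED.
    Raises ValueError when the written value is not in the list.
    """
    value = attributes.get(name, default)
    if value is not None and value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {allowed}")
    return value


# An input element with no type attribute falls back to the default.
print(check_enumerated({}, "type", INPUT_TYPES, default="text"))  # text

# An img element without ismap simply has no value for it.
print(check_enumerated({"src": "whatever.png"}, "ismap", ISMAP_VALUES))  # None

# Values are case-sensitive in XML, so "Radio" is rejected.
try:
    check_enumerated({"type": "Radio"}, "type", INPUT_TYPES, default="text")
except ValueError as error:
    print("invalid:", error)
```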