Working with DTDs
Understanding Document Type Definitions (DTDs)
A Document Type Definition (DTD) is a file that describes the structure of a group of documents by means of declarations written in a formal notation defined by the World Wide Web Consortium (W3C) in the XML specification. Think of it as the definition of the grammar of an XML document.
If you will be working regularly with XML, it's a good idea to bookmark the XML pages of the World Wide Web Consortium (W3C) (http://www.w3.org/XML/) to stay abreast of critical developments and to refer to current standards.
The DTD is at the heart of your XML or SGML implementation. It defines the types of data you want to collect, process, and present, as well as the relationships between data elements. Data elements defined in the DTD are the building blocks of your entire customization.
Every XML file you edit with XMetaL must be associated with either a schema or a DTD. SGML files can only be associated with DTDs. These are typically provided to developers by systems analysts and designers. There are also numerous proven DTDs available without charge from reputable bodies on the Web.
Defining your DTD is critical to your project's workflow design. Before you begin any coding, you should have a carefully considered DTD already in place. Even so, DTDs will usually require tweaking or maintenance as the project develops.
This lesson assumes you are familiar with XML. However, we will review some core concepts as a refresher, and also as a way of introducing you to some important rules about XMetaL's treatment of DTDs, rules files, and schemas.
This lesson will take about 30 minutes to complete.
Understanding Schema and Rules Files
A schema is the definition of a structural model for a group of XML files. Schemas are typically used for machine validation of XML document structure. If you work with a software application or client/server application that receives a large quantity of XML files from a variety of sources, you may have to work with Schemas.
Human beings use schema in the broad sense all the time in their daily lives. Imagine a postal address. You'll know the structure of an address immediately from your mental schema of an address - it has a name, a street address, a city, a country, and a postal code or zip code.
XML Schemas (with a capital S) express their rules as "constraints" that define what is and is not valid. In the XML world, Schemas are regularly used to ensure that things like address elements don't contain two instances of <city>, or a <country> value that contains only integers.
The W3C also publishes the XML Schema standard, and if you work with Schemas, you should visit its XML Schema page (http://www.w3.org/XML/Schema). Here you'll find definitions and standards, as well as valuable tools for validating schema files.
One example of a schema with wide industry support is the RIXML schema specification, which provides a common language to improve the value of broker investment research. Support for RIXML schemas can be developed using XML.
XML Schema files have the file extension ".xsd", for XML Schema Definition. If properly constructed, an XML Schema can replace a standard DTD and add functionality at the same time.
XML Schemas have certain advantages over DTDs. They:
  1. are extensible to future additions
  2. are richer and more useful than DTDs
  3. are written in XML
  4. support data types
  5. support namespaces.
There are certain limitations to the use of schemas with XMetaL:
  1. Identity-constraint definitions are ignored
  2. Wildcards are ignored
  3. The <redefine> tag is not supported
  4. The instance attributes xsi:nil and xsi:type are ignored, and cannot be edited in Normal or Tags On view.
  5. Checking an XML Schema (.xsd file) for errors is limited in XMetaL. We recommend the use of third-party tools, such as those available from W3C (http://www.w3.org/XML/Schema).
XMetaL can use DTDs or Schemas in two formats: as text files, or as rules files. A rules file is a DTD or Schema that has been compiled into binary format. When you open or create a document that uses a DTD for which there is no corresponding rules file, XMetaL automatically compiles a rules (.rlx) file that encodes the information in the DTD. Rules files for DTDs have a ".rlx" extension, while rules files for Schemas have a ".rld" extension.
Viewing and Modifying DTDs and Schemas
You can view DTDs and Schemas from XMetaL Developer by double-clicking their file names in the Solution Explorer.
Try double-clicking on the "Meeting.dtd" file in the MeetingMinutes customization. You'll see a list of all elements in the project, with their properties listed in the Properties window. You'll find that you will often reference this file when building XML solutions.
To edit a DTD or Schema, use the Visual Studio .NET XSD Viewer/Editor. For further information, see the VS .NET online documentation.
Understanding DOCTYPE Declarations
The prolog of an XML document contains a declaration called a document type declaration (DOCTYPE), which follows the XML declaration (<?xml version="1.0"?>) if one is present. The DOCTYPE associates the document with a DTD or rules file by means of an external identifier.
Every XML document you create will contain a DOCTYPE declaration, and unless its syntax is correct, your document will not be valid, so it's a good idea to become familiar with DOCTYPEs.
Here is an example of a DOCTYPE declaration:
<!DOCTYPE BOOK PUBLIC "-//Blast Radius//Book v1.0//EN" "book.dtd">
Following the DOCTYPE keyword is the document type name. In the above example, the document type name is BOOK. By default, this is the top-level element in the DTD or rules file. However, as you are editing a document, XMetaL changes this to the current top-level element in the document.
Following the document type name is an external identifier. An external identifier consists of the keyword SYSTEM or PUBLIC, followed by one or two quoted strings that identify the DTD or rules file. If the external identifier starts with SYSTEM, it has only a system identifier; if it starts with PUBLIC, it has a public identifier followed by a system identifier.
The system identifier is generally the filename or URL of the DTD or rules file. The public identifier is an arbitrary identifier, usually one agreed upon by various organizations that use the DTD. Certain DTDs used by a large number of organizations have a standard public identifier.
Here are two examples of DOCTYPE declarations, one with a PUBLIC keyword and one with a SYSTEM keyword, that could be used to refer to the same DTD:
<!DOCTYPE BOOK PUBLIC "-//Blast Radius//Book v1.0//EN" "book.dtd">
The keyword PUBLIC indicates that the first string in quotes that follows it is the public identifier, and the second string in quotes that follows it is the system identifier. This DOCTYPE refers to a DTD that has the public identifier -//Blast Radius//Book v1.0//EN and the system identifier book.dtd.
Now let's look at another reference to the same file:
<!DOCTYPE BOOK SYSTEM "book.dtd">
The keyword SYSTEM indicates that the identifier that follows it is the system identifier. If the external identifier starts with SYSTEM, there cannot be a public identifier. This DOCTYPE refers to a DTD that has the system identifier book.dtd.
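As a quick illustration of the two forms, here is a minimal Python sketch that pulls the document type name, public identifier, and system identifier out of a DOCTYPE declaration. The helper `parse_doctype` is hypothetical (it is not part of XMetaL), and it assumes XML rules: PUBLIC is always followed by both identifiers, and the internal subset, comments, and parameter entities are not examined.

```python
import re
from typing import Optional, Tuple

# A quoted literal may use double or single quotes (one capturing group).
_LIT = r'("[^"]*"' + r"|'[^']*')"

# Document type name, then an optional PUBLIC (two literals) or SYSTEM (one literal)
# external identifier. Anything after that, such as an internal subset, is ignored.
_DOCTYPE = re.compile(
    r"<!DOCTYPE\s+([^\s\[>]+)"
    + r"(?:\s+PUBLIC\s+" + _LIT + r"\s+" + _LIT
    + r"|\s+SYSTEM\s+" + _LIT + r")?"
)


def _unquote(literal: Optional[str]) -> Optional[str]:
    """Strip the surrounding quotes from a literal, if there is one."""
    return literal[1:-1] if literal is not None else None


def parse_doctype(decl: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (document type name, public identifier, system identifier)."""
    m = _DOCTYPE.match(decl.strip())
    if m is None:
        raise ValueError("not a DOCTYPE declaration")
    name, public, system_after_public, system_only = m.groups()
    return name, _unquote(public), _unquote(system_after_public or system_only)


print(parse_doctype('<!DOCTYPE BOOK PUBLIC "-//Blast Radius//Book v1.0//EN" "book.dtd">'))
print(parse_doctype('<!DOCTYPE BOOK SYSTEM "book.dtd">'))
```

A real XML parser does this for you; the sketch only makes the PUBLIC/SYSTEM distinction concrete.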
Understanding the Internal Subset of a DOCTYPE
Instead of, or in addition to, the external identifier, the DOCTYPE declaration can have an internal subset containing further declarations. An external DTD file is known as the "external subset", while similar definitions inside an XML document are called the "internal subset". Both work together to create a document type definition.
For example:
<!DOCTYPE Article SYSTEM "journalist.dtd" [
  <!ENTITY Title "Weasel populations in a forest in Poland">
  ...
]>
Take a look at the ENTITY declaration above. It declares a general entity named Title: wherever the entity reference &Title; appears in the document, the parser replaces it with the text "Weasel populations in a forest in Poland". The replacement text of an entity, like the value of an attribute, must be enclosed in quotes.
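You can watch this replacement happen with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; because ElementTree does not fetch external DTDs, the example drops the journalist.dtd external identifier and keeps only the internal subset, with an illustrative `<Title>` element that contains the reference.

```python
import xml.etree.ElementTree as ET

# Internal subset only: ElementTree does not load external DTDs such as journalist.dtd.
DOC = """<!DOCTYPE Article [
  <!ENTITY Title "Weasel populations in a forest in Poland">
]>
<Article><Title>&Title;</Title></Article>"""

root = ET.fromstring(DOC)
# The parser has already replaced the &Title; reference with the entity's text.
print(root.find("Title").text)
```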
The internal subset can contain markup declarations such as ELEMENT, ATTLIST, and ENTITY declarations. Declarations in the internal subset are read before declarations in the external DTD or rules file, and because the first declaration of an entity or attribute is the binding one, they override any external declarations of the same attribute or entity.
ATTLIST declarations identify which element types may have attributes, what types those attributes are, and what their default values are. ATTLIST declarations specifying different attributes of the same element are combined, but if the same attribute is specified both internally and externally, the specification in the internal subset takes precedence.
Duplicate ELEMENT declarations are not allowed and result in an error message.
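The precedence rule is part of the XML specification rather than something specific to XMetaL, so it can be demonstrated with an ordinary parser. This Python sketch uses the standard library's expat bindings, which read the internal subset first and keep the first declaration of an attribute. The external DTD is an in-memory string standing in for a hypothetical book.dtd, and `security_default` is an illustrative helper; it does not show how XMetaL itself is implemented.

```python
from xml.parsers import expat

# Stand-in for the contents of a hypothetical external DTD file "book.dtd".
EXTERNAL_DTD = """<!ELEMENT Para (#PCDATA)>
<!ATTLIST Para Security CDATA "external">"""


def security_default(document: str) -> str:
    """Return the Security value expat reports for <Para> in `document`."""
    parser = expat.ParserCreate()
    # Ask expat to read the external subset; it calls load_external for it.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    found = {}

    def load_external(context, base, system_id, public_id):
        # Parse the in-memory DTD instead of opening system_id from disk.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(EXTERNAL_DTD, True)
        return 1  # non-zero tells expat the entity was handled

    def start(name, attrs):
        if name == "Para":
            found.update(attrs)  # includes attributes filled in from defaults

    parser.ExternalEntityRefHandler = load_external
    parser.StartElementHandler = start
    parser.Parse(document, True)
    return found["Security"]


with_internal = """<!DOCTYPE Para SYSTEM "book.dtd" [
  <!ATTLIST Para Security CDATA "internal">
]>
<Para>text</Para>"""
external_only = '<!DOCTYPE Para SYSTEM "book.dtd">\n<Para>text</Para>'

print(security_default(with_internal))  # internal
print(security_default(external_only))  # external
```

The internal ATTLIST wins because it is read first; when the document has no internal declaration, the external default applies.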
A DOCTYPE declaration can omit the external identifier, so that the document's DTD is internal (contained completely in the internal subset). For example:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE Article [
  <!ELEMENT Article (Title, Sect1+)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Sect1 (Title, Para+)>
  <!ELEMENT Para (#PCDATA)>
  <!ATTLIST Article Id ID #IMPLIED>
]>
<Article> ... </Article>
The internal subset can refer to an external DTD using a parameter entity reference:
<?xml version="1.0"?>
<!DOCTYPE Article [
  <!ENTITY % journalist.dtd SYSTEM "journalist.dtd">
  %journalist.dtd;
]>
<Article> ... </Article>
When users create an entity with any of the entity-creation commands on the Tools menu in XMetaL Author, the entity declarations are placed in the internal subset. However, if the internal subset contains any declarations other than ENTITY declarations, it is read-only in Tags On and Normal views, and the entity-creation commands are unavailable.
Mapping External Identifiers to Files
XMetaL uses the OASIS catalog mechanism to associate the external identifier in a DOCTYPE or in an external entity declaration with the name and location of a DTD, rules file, or entity file.
You would typically use this mechanism only in the following situations:
  1. If the document's DOCTYPE contains only a public identifier.
  2. If the DTD or rules file is not stored in the Rules folder.
  3. If the system identifier in the DOCTYPE does not match the DTD or rules file that you want to use.
If the catalog mechanism does not provide a result, XMetaL tries to resolve the external identifier using the following methods, in the order given, until a result is obtained.
  1. The external identifier map file (extid.map). This mechanism is provided for backward compatibility with previous versions of XMetaL, and can be disabled.
  2. Attempting to retrieve the system identifier as a URL (relative URLs are relative to the document instance).
  3. Attempting to retrieve the system identifier as a file path (relative paths are relative to the document instance).
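The lookup order above is a simple fallback chain. The following Python sketch models it with hypothetical stand-ins: dictionaries play the role of the catalog and extid.map, and the last step simply returns the system identifier, collapsing the URL and file-path attempts into one step. None of these names are XMetaL APIs.

```python
from typing import Callable, Iterable, Optional

# A resolver takes (public_id, system_id) and returns a location, or None if it cannot help.
Resolver = Callable[[Optional[str], Optional[str]], Optional[str]]


def resolve_external_id(public_id: Optional[str], system_id: Optional[str],
                        resolvers: Iterable[Resolver]) -> Optional[str]:
    """Try each resolver in order and return the first result that is not None."""
    for resolver in resolvers:
        location = resolver(public_id, system_id)
        if location is not None:
            return location
    return None


# Hypothetical stand-ins for the catalog and the extid.map file.
catalog = {"-//Blast Radius//Book v1.0//EN": "C:/DTDs/book.dtd"}
extid_map = {"legacy.dtd": "C:/DTDs/legacy.dtd"}

chain = [
    lambda pub, sid: catalog.get(pub),    # catalog mechanism
    lambda pub, sid: extid_map.get(sid),  # extid.map (backward compatibility)
    lambda pub, sid: sid,                 # system identifier as a URL or file path
]

print(resolve_external_id("-//Blast Radius//Book v1.0//EN", "book.dtd", chain))
print(resolve_external_id(None, "legacy.dtd", chain))
print(resolve_external_id(None, "other.dtd", chain))
```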
For a complete and formal OASIS specification, see OASIS Technical Resolution 9401:1997 (http://www.oasis-open.org/specs/a401.htm).
Understanding the External Identifier Map File
XMetaL provides a backup mechanism called "the external identifier map file" for mapping the external identifier in a DOCTYPE to the name and location of a DTD or rules file. XMetaL uses this mechanism if the catalog mechanism does not resolve the public identifier.
Note: You can disable the external identifier map mechanism by setting use_extid_mapping to false in the xmetal45.ini file.
You would typically use this mechanism only in the following situations:
  1. If the document's DOCTYPE contains only a public identifier.
  2. If the DTD, Schema, or rules file is not stored in the Rules folder.
  3. If the system identifier in the DOCTYPE does not match the DTD or rules file that you want to use.
  4. If you want to use patterns (regular expressions) to match a set of public or system identifiers and map them on to a set of filenames.
The external identifier map file is, by default, the file "extid.map" in the top-level XMetaL folder. You can use a different file by specifying a value for extid_map in the xmetal45.ini file.
The external identifier map file consists of lines in this form:
public-id system-id DTD/rulesfile
The first two values are strings or patterns that match the public and system identifiers respectively. The third value is the name of the DTD or rules file that these identifiers refer to. Here is an example:
"-//Blast Radius//Book v1.0//EN" ! book.dtd
If you open a file whose DOCTYPE contains the public identifier -//Blast Radius//Book v1.0//EN, XMetaL scans the external identifier map file until it comes to the line in the example. It sees that the two identifiers match, and therefore it looks for the DTD "book.dtd". The exclamation mark (!) is a special character that means "match any identifier", so in this example it does not matter what the system identifier is, or if one is present.
Setting up the External Identifier Map file
XMetaL needs to refer to the external identifier map file (extid.map) only when the DOCTYPE in a document does not have a system identifier that is the same as the filename of a DTD or rules file stored in the Rules folder. Let's look at some examples of this:
Using an alternative DTD/rules location
If you store your DTD, Schema, or rules file somewhere other than the Rules folder, there are two ways to tell XMetaL its location.
You can put the rules file location in the DOCTYPE explicitly:
<!DOCTYPE BOOK SYSTEM "C:\DTDs\book.dtd">
Or, you can use the extid.map to point to the location of the DTD or rules file:
"-//Blast Radius//Book v1.0//EN" ! "C:/DTDs/book.dtd"
! "book.dtd" "C:/DTDs/book.dtd"
The first example maps a public identifier to a DTD; the second maps a system identifier to a DTD. Either form is valid.
Mapping one system identifier to another
By default, if the system identifier specifies "dtdname.dtd", XMetaL automatically looks for the rules file "dtdname.rlx". If the system identifier does NOT correspond to the desired DTD or rules file in this regular way, you must create an entry in the external identifier map file.
The system identifier in the DOCTYPE may specify a DTD name, as in this example:
<!DOCTYPE BOOK SYSTEM "book.dtd">
If you want to use the rules file realbook.rlx, instead of book.rlx, you can either change the DOCTYPE to refer to the rules file, or create an entry in the external identifier map file that tells XMetaL which rules file corresponds to the DTD name.
Note: If the DOCTYPE contains a reference to a rules file (instead of a DTD), the DOCTYPE no longer adheres to the XML specification.
To map a system identifier to a file name, use an entry like this example:
! "book.dtd" "realbook.rlx"
If you use several rules files, and there is a regular correspondence between DTD names and rules file names (other than the default correspondence between .dtd and .rlx files), you can map them all using one entry.
For example, if you use names of the form "anything.dtd" for all your DTD file names, and call the corresponding rules files "anything.rules", the following line in the external identifier map tells XMetaL to use the .rules file corresponding to the DTD (no matter whether the public identifier is present, or what it is):
! (.*)\.dtd \1.rules
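To pull the matching rules together, here is a hedged Python sketch of an extid.map lookup using the entries from this section. It assumes that the first matching line wins, that quoted fields are compared literally, that unquoted fields other than ! are regular expressions matched against the whole identifier, and that \1-style references in an unquoted target are filled from that match. XMetaL's actual pattern syntax may differ, so treat this as an illustration, not a reimplementation.

```python
import re
from typing import List, Optional, Tuple

# A field is either a double-quoted string or a run of non-space characters.
_FIELD = re.compile(r'"[^"]*"|\S+')


def parse_extid_map(text: str) -> List[Tuple[str, str, str]]:
    """Return (public field, system field, target) for each three-field line."""
    entries = []
    for line in text.splitlines():
        fields = _FIELD.findall(line)
        if len(fields) == 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def _match_field(field: str, ident: Optional[str]):
    """Return (matched, regex match or None) for one map field."""
    if field == "!":
        return True, None                  # "!" matches any identifier, or none
    if ident is None:
        return False, None
    if field.startswith('"'):
        return field[1:-1] == ident, None  # assumption: quoted = literal
    m = re.fullmatch(field, ident)         # assumption: unquoted = regex
    return m is not None, m


def lookup(entries, public_id: Optional[str], system_id: Optional[str]) -> Optional[str]:
    """Return the target of the first entry whose two fields both match."""
    for pub_field, sys_field, target in entries:
        pub_ok, pub_m = _match_field(pub_field, public_id)
        sys_ok, sys_m = _match_field(sys_field, system_id)
        if not (pub_ok and sys_ok):
            continue
        if target.startswith('"'):
            return target[1:-1]
        m = sys_m or pub_m
        return m.expand(target) if m else target  # fill in \1-style groups
    return None


EXTID_MAP = r'''"-//Blast Radius//Book v1.0//EN" ! book.dtd
! "book.dtd" "realbook.rlx"
! (.*)\.dtd \1.rules
'''

entries = parse_extid_map(EXTID_MAP)
print(lookup(entries, "-//Blast Radius//Book v1.0//EN", None))  # book.dtd
print(lookup(entries, None, "book.dtd"))                         # realbook.rlx
print(lookup(entries, None, "manual.dtd"))                       # manual.rules
```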
Using Catalogs
"Catalogs" allow XML processing tools like XMetaL to use a local copy or fragment of a DTD or Schema if it is available, even if your local XML document refers to a DTD at an external URL.
XML catalogs are anchored in a root catalog. Catalogs form a tree of catalog files that map public and system identifiers to locally installed copies of the resources they name. When XMetaL is asked to process a resource, it automatically checks for a locally available version, starting from the root catalog and following links to sub-catalogs, until it either finds the resource in a catalog or runs out of catalogs to search.
If the catalogs can't help XMetaL locate a resource locally, XMetaL tries to retrieve it from the Web, so in most cases a catalog miss is recoverable. This gives the document considerable platform independence.
XMetaL can use catalog files to help identify external references. Let's look at some examples:
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN" "isolat1.ent"
SYSTEM "sqdoc.dtd" "sqdoc-xml.dtd"
ENTITY face1 "c:\project1\smallfaces\face1.gif"
The PUBLIC entry in the first line above associates the public identifier "ISO 8879-1986//ENTITIES Added Latin 1//EN" with the filename "isolat1.ent". This entry could resolve the following declaration in a DTD file:
<!ENTITY % isolat1 PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN"> %isolat1;
When XMetaL encounters the %isolat1; parameter entity reference, it scans the declaration of the isolat1 entity and finds a public identifier. It then looks in the catalog file for a PUBLIC entry matching the same identifier. The filename (isolat1.ent) specified in this entry is then used as the replacement for the entity reference.
The SYSTEM entry associates the system identifier "sqdoc.dtd" with the filename "sqdoc-xml.dtd". This entry could resolve the following DOCTYPE declaration.
<!DOCTYPE DOC SYSTEM "sqdoc.dtd">
When XMetaL reads this declaration at the top of an XML document, it finds the system identifier "sqdoc.dtd" and then looks in the catalog file for a SYSTEM entry matching that identifier. The filename (sqdoc-xml.dtd) found in this entry is then used as the DTD file for the document.
The ENTITY line associates the entity name "face1" with the filename "c:\project1\smallfaces\face1.gif". When XMetaL encounters a reference to the external entity "face1", it scans the declaration of face1 for a system and/or public identifier. It then reads the catalog file, looking for SYSTEM and/or PUBLIC entries specifying these identifiers. If it does not find a matching entry, it then looks for an ENTITY entry that matches the entity name in question. In the example above, the file "c:\project1\smallfaces\face1.gif" would be used as the replacement for the entity reference.
Note: Filenames can be absolute or relative paths, or URLs. Relative filenames in a catalog are interpreted as relative to the location of the catalog file, unless the catalog file contains a BASE entry. It is best to use backslashes with Windows file paths, but forward slashes are accepted.
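The three kinds of entries can be modeled in a few lines of Python. This sketch (with hypothetical helper names) reads entries as a stream of tokens, so an entry may span lines, and resolves in the order described above: SYSTEM, then PUBLIC, then ENTITY. It handles neither catalog comments nor BASE entries, assumes Windows-style paths and a hypothetical catalog folder C:\catalogs, and resolves relative filenames against the catalog's folder as the note above describes.

```python
import re
from pathlib import PureWindowsPath
from typing import Dict, Optional

# Tokens: double-quoted, single-quoted, or bare words.
_TOKEN = re.compile(r'"[^"]*"' + r"|'[^']*'" + r"|\S+")

# Number of arguments each recognised keyword takes.
_ARITY = {"PUBLIC": 2, "SYSTEM": 2, "ENTITY": 2, "DELEGATE": 2,
          "CATALOG": 1, "BASE": 1, "OVERRIDE": 1}


def _unquote(token: str) -> str:
    return token[1:-1] if token[:1] in ('"', "'") else token


def parse_catalog(text: str) -> Dict[str, Dict[str, str]]:
    """Collect PUBLIC, SYSTEM and ENTITY entries; other keywords are skipped."""
    tables: Dict[str, Dict[str, str]] = {"PUBLIC": {}, "SYSTEM": {}, "ENTITY": {}}
    tokens = _TOKEN.findall(text)
    i = 0
    while i < len(tokens):
        keyword = tokens[i].upper()
        arity = _ARITY.get(keyword, 0)
        args = [_unquote(t) for t in tokens[i + 1:i + 1 + arity]]
        if keyword in tables and len(args) == 2:
            tables[keyword].setdefault(args[0], args[1])  # assumed: first entry wins
        i += 1 + arity
    return tables


def resolve_from_catalog(tables, catalog_dir: str, public_id: Optional[str] = None,
                         system_id: Optional[str] = None,
                         entity_name: Optional[str] = None) -> Optional[str]:
    """Try SYSTEM, then PUBLIC, then ENTITY entries; else keep the system identifier."""
    hit = (tables["SYSTEM"].get(system_id)
           or tables["PUBLIC"].get(public_id)
           or tables["ENTITY"].get(entity_name))
    if hit is None:
        return system_id
    if "://" in hit:
        return hit  # URLs are used as-is
    # Relative filenames are taken relative to the catalog's folder.
    return str(PureWindowsPath(catalog_dir, hit))


CATALOG = r'''PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN" "isolat1.ent"
SYSTEM "sqdoc.dtd" "sqdoc-xml.dtd"
ENTITY face1 "c:\project1\smallfaces\face1.gif"
'''

tables = parse_catalog(CATALOG)
print(resolve_from_catalog(tables, r"C:\catalogs",
                           public_id="ISO 8879-1986//ENTITIES Added Latin 1//EN"))
print(resolve_from_catalog(tables, r"C:\catalogs", system_id="sqdoc.dtd"))
print(resolve_from_catalog(tables, r"C:\catalogs", entity_name="face1"))
```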
Locating Catalog Files
Let's say, for the sake of illustration, that our current document is called "docname.xml" and is located in the folder "docfldr". If XMetaL needs to resolve a reference to an external entity by referring to a catalog file, it searches for the following files, in the order given, until it finds the catalog file that references the needed resource:
  1. docfldr\docname.soc (a file in the same folder as the XML document, whose name is the same as the document except for the .soc file extension)
  2. docfldr\catalog (a file called catalog in the same folder as the XML document)
  3. docfldr\catalog.soc (a file called catalog.soc in the same folder as the XML document)
  4. Rules\catalog (a file called catalog in the XMetaL Rules folder)
  5. Rules\catalog.soc (a file called catalog.soc in the XMetaL Rules folder)
The file referencing the needed resource is the root catalog. It may have links to other catalog files via the CATALOG and DELEGATE keywords (see below), which may in turn have their own links, and so on. XMetaL looks for matches only in the root catalog file and its linked files (at all levels of linking).
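The search order listed above amounts to a short list of candidate paths. This Python sketch builds that list with `pathlib.PureWindowsPath` and returns the first candidate that resolves the reference; the `resolves` callback, the folder names, and the Rules folder location (C:\XMetaL\Rules) are hypothetical placeholders for XMetaL's own checks.

```python
from pathlib import PureWindowsPath
from typing import Callable, List, Optional


def catalog_candidates(document: str, rules_folder: str) -> List[str]:
    """Catalog files to try for `document`, in search order."""
    doc = PureWindowsPath(document)
    folder = doc.parent
    rules = PureWindowsPath(rules_folder)
    candidates = [
        doc.with_suffix(".soc"),  # docname.soc next to the document
        folder / "catalog",
        folder / "catalog.soc",
        rules / "catalog",
        rules / "catalog.soc",
    ]
    return [str(path) for path in candidates]


def root_catalog(document: str, rules_folder: str,
                 resolves: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate catalog for which `resolves` reports a match."""
    for path in catalog_candidates(document, rules_folder):
        if resolves(path):
            return path
    return None


for path in catalog_candidates(r"C:\docfldr\docname.xml", r"C:\XMetaL\Rules"):
    print(path)
```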
There are two ways to specify alternate catalog files from within a catalog file:
A catalog file entry such as:
CATALOG "catalog2"
specifies an alternate catalog file. If XMetaL encounters such an entry, it continues reading the current catalog file, and if it does not find a matching entry, it reads the alternate file. If no matching entry is found, XMetaL continues with the next catalog file in the normal sequence. A catalog file can contain several CATALOG entries.
A catalog file entry of the form
DELEGATE public-id-prefix catalog-file
can be used if XMetaL is currently attempting to match a public identifier (though PUBLIC entries take precedence). If XMetaL encounters one or more DELEGATE lines (in a single catalog file) in which the public-id-prefix matches a substring of the public identifier in question (starting at the first character) then XMetaL looks for matching entries in the catalog files specified by the DELEGATE entries. It does not return to the normal sequence of catalog files.
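Prefix matching for DELEGATE entries can be sketched in Python as follows. The function name, the catalog filenames, and the choice to return the delegated catalogs as a list are illustrative assumptions; only the precedence of PUBLIC entries and the prefix rule come from the description above.

```python
from typing import Dict, List, Optional, Tuple


def delegate_catalogs(public_id: str, public_entries: Dict[str, str],
                      delegates: List[Tuple[str, str]]) -> Tuple[Optional[str], List[str]]:
    """Return (direct PUBLIC match, catalogs to consult via DELEGATE) for one catalog file."""
    if public_id in public_entries:
        return public_entries[public_id], []  # PUBLIC entries take precedence
    # A DELEGATE entry applies when its prefix matches the start of the identifier.
    return None, [catalog for prefix, catalog in delegates if public_id.startswith(prefix)]


# Hypothetical DELEGATE entries from one catalog file.
delegates = [("-//Blast Radius//", "blastradius.cat"), ("-//OASIS//", "oasis.cat")]
print(delegate_catalogs("-//Blast Radius//Book v1.0//EN", {}, delegates))
```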
Using Catalogs to give Priority to Identifiers
The system identifier (if there is one) in an external entity declaration is generally the real name of the file represented by the entity. Sometimes, however, this may not be the case, and the catalog mechanism provides the option of using other means to obtain the filename:
If the catalog file contains a SYSTEM entry matching the system identifier in question, then the filename specified in that entry is used to resolve the entity reference.
If the catalog file contains the entry OVERRIDE YES and there is no matching SYSTEM entry, then
  1. If the entity declaration contains a public identifier, and a matching PUBLIC entry is found, then the filename specified in that entry is used to resolve the entity reference.
  2. If a matching ENTITY entry is found, then the filename specified in that entry is used to resolve the entity reference.
  3. Otherwise, the system identifier is used to resolve the entity reference.
If the catalog file contains the entry OVERRIDE NO and there is no matching SYSTEM entry, then the system identifier is used to resolve the entity reference. In this case XMetaL does not attempt to match the public identifier or entity name.
An OVERRIDE YES or OVERRIDE NO entry is in effect until the end of the current catalog file, or until an OVERRIDE entry with the opposite setting is encountered.
The default mode (YES or NO) is set using the "OASIS_override" setting in the "xmetal45.ini" file. The default setting is true (YES).
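The OVERRIDE rules reduce to a small decision function. In this Python sketch the catalog lookups have already been done, so each `*_hit` argument is the filename from a matching entry, or None. One assumption goes beyond this lesson and is labeled in the code: when the declaration has no system identifier at all, the sketch still consults PUBLIC and ENTITY matches under OVERRIDE NO. The function name `choose_file` is hypothetical.

```python
from typing import Optional


def choose_file(system_id: Optional[str], system_hit: Optional[str],
                public_hit: Optional[str], entity_hit: Optional[str],
                override: bool = True) -> Optional[str]:
    """Pick the file used to resolve an external entity reference.

    system_hit, public_hit and entity_hit are filenames from matching SYSTEM,
    PUBLIC and ENTITY catalog entries (None when nothing matched); override is
    the OVERRIDE YES (True) or NO (False) mode in effect.
    """
    if system_hit is not None:
        return system_hit
    # Assumption: with no system identifier, PUBLIC/ENTITY are used in either mode.
    if override or system_id is None:
        if public_hit is not None:
            return public_hit
        if entity_hit is not None:
            return entity_hit
    return system_id


print(choose_file("book.dtd", None, "pub-book.dtd", None, override=True))
print(choose_file("book.dtd", None, "pub-book.dtd", None, override=False))
```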
Setting up a DTD
In the simplest case, a DTD consists of only a single file. Often, however, several files are involved:
  1. The main DTD file
  2. DTD fragments referred to in the main DTD file
  3. Files of entity declarations referred to in the main DTD file or a DTD fragment
  4. An attribute description file.
In order for your DTD to be read correctly by XMetaL, any required DTD fragments and entity files must be at the locations specified by the system identifier used to refer to them. For example, a DTD fragment may be referenced in the following entity declaration in the DTD:
<!ENTITY % calsdtd PUBLIC "CALS Table DTD" "dtds/cals.dtd">
In this case, the required DTD fragment should be in the file "cals.dtd", located in the folder "dtds", which should be in the same folder as the main DTD file.
The attribute description file should be located in the same folder as the main DTD file; if the DTD is named "dtdname.dtd", the attribute description file should be named "dtdname.att".
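The placement rules above can be expressed as two small path calculations. This Python sketch assumes Windows paths and a main DTD at a hypothetical location, C:\XMetaL\Rules\book.dtd; it only computes where XMetaL would expect to find the files and does not check that they exist.

```python
from pathlib import PureWindowsPath


def fragment_location(main_dtd: str, system_id: str) -> str:
    """Where a DTD fragment with a relative system identifier is expected."""
    return str(PureWindowsPath(main_dtd).parent / system_id)


def attribute_description_file(main_dtd: str) -> str:
    """dtdname.dtd -> dtdname.att, in the same folder."""
    return str(PureWindowsPath(main_dtd).with_suffix(".att"))


main = r"C:\XMetaL\Rules\book.dtd"
print(fragment_location(main, "dtds/cals.dtd"))
print(attribute_description_file(main))
```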
Note: When you open or create a document that uses a DTD for which there is no corresponding rules file, XMetaL compiles a rules (.rlx) file that encodes the information in the DTD. XMetaL then uses the rules file instead of the DTD. If the DTD is changed, the rules file must be deleted so that XMetaL can automatically recompile a new rules file.
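If you edit DTDs often, a small script can delete stale rules files for you. This Python sketch is a hypothetical helper, not an XMetaL feature: it assumes the rules file sits next to the DTD with the same base name and an .rlx extension, and it compares only the main DTD's modification time, so changes to DTD fragments or entity files are not detected.

```python
import os
import tempfile


def remove_stale_rules(dtd_path: str) -> bool:
    """Delete dtdname.rlx if the DTD was modified after it; return True if deleted."""
    rules_path = os.path.splitext(dtd_path)[0] + ".rlx"
    if (os.path.exists(rules_path)
            and os.path.getmtime(dtd_path) > os.path.getmtime(rules_path)):
        os.remove(rules_path)
        return True
    return False


# Demonstration in a throwaway folder: make the DTD newer than its rules file.
with tempfile.TemporaryDirectory() as folder:
    dtd = os.path.join(folder, "book.dtd")
    rlx = os.path.join(folder, "book.rlx")
    for path in (dtd, rlx):
        with open(path, "w"):
            pass
    os.utime(rlx, (1_000_000, 1_000_000))  # older timestamp
    os.utime(dtd, (2_000_000, 2_000_000))  # newer timestamp
    first = remove_stale_rules(dtd)        # the stale book.rlx is deleted
    rules_left = os.path.exists(rlx)
    second = remove_stale_rules(dtd)       # nothing left to delete
```

After the script runs, XMetaL recompiles the rules file the next time a document using the DTD is opened.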
Understanding the Attribute Description File
An attribute description file provides help strings for the Attribute Inspector. This file contains descriptions of attributes, which are displayed at the bottom of the Attribute Inspector when you click an attribute name. The attribute description file consists of entries of the form:
Element Attribute "Help String"
This example supplies a help string for the Security attribute of the Para element:
Para Security "Security level"
Attribute description files can be used with DTDs or compiled rules files. The attribute description file for a DTD must have the same name as the DTD, but with the file extension changed to .att; it should be in the same folder as the DTD (by default, the folder Rules). If you are compiling a rules file, you can choose the attribute description file from the XMetaL Rules Maker interface.
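Because each entry has a fixed shape, an attribute description file is easy to read programmatically, for example to review help strings before compiling a rules file. This Python sketch assumes one entry per line with a double-quoted help string, and silently skips any line that does not fit that shape; `parse_att_file` is a hypothetical helper, and the Article entry in the usage example is illustrative.

```python
import re
from typing import Dict, Tuple

# Element name, attribute name, then a double-quoted help string.
_ENTRY = re.compile(r'^\s*(\S+)\s+(\S+)\s+"([^"]*)"\s*$')


def parse_att_file(text: str) -> Dict[Tuple[str, str], str]:
    """Map (element, attribute) to its help string."""
    help_strings = {}
    for line in text.splitlines():
        m = _ENTRY.match(line)
        if m:
            element, attribute, help_text = m.groups()
            help_strings[(element, attribute)] = help_text
    return help_strings


ATT = '''Para Security "Security level"
Article Id "Unique identifier for the article"
'''
print(parse_att_file(ATT))
```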
Understanding Content Types
XMetaL follows the content types set out in the W3C XML specification. Content types that are accepted by XMetaL are:
  1. Mixed content - Can contain a mixture of character data and child elements.
  2. Element content - Contains only elements.
  3. Character data (CDATA) - Contains only CDATA.
  4. Parsed character data (PCDATA) - Contains only PCDATA.
  5. Any content - Can contain any or none of the different sets above.
  6. Empty content - Must be empty i.e., <element/>.
Last modified: Friday, May 21, 2004 4:26:43 PM