9.1 Working with XML: An XML Primer

An XML document is a plain text file that contains hierarchically structured data. You can edit the file in any text editor. If the document contains non-ASCII characters, you must save it in UTF-8 encoding.
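Why the encoding matters can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustration only: the `name` element and its text are made up for the example. When a document carries no encoding declaration, an XML parser assumes UTF-8, so the same text saved in another encoding is rejected unless the document declares that encoding.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document containing a non-ASCII character (e with acute accent).
text = "<name>caf\u00e9</name>"
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# UTF-8 bytes parse without any encoding declaration.
parsed = ET.fromstring(utf8_bytes)

# The same text saved as Latin-1 is not valid UTF-8, so parsing fails.
try:
    ET.fromstring(latin1_bytes)
    latin1_ok = True
except ET.ParseError:
    latin1_ok = False

# Declaring the encoding in the XML declaration makes the Latin-1 bytes readable.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_bytes
declared_text = ET.fromstring(declared).text

print(parsed.text == declared_text)  # True
print(latin1_ok)                     # False
```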

An XML document describes the data it contains using elements and attributes. An element identifies what the data is, and an attribute supplies metadata about that data. The relative placement of elements matters because an element can take on a different meaning depending on where it is located in the XML structure.

The following sections describe these concepts in more detail.

9.1.1 Elements

An element uses a pair of markup tags to delimit and define each piece of data in a document. The pair consists of a start tag and an end tag. For example:


<tagname>data</tagname>

An element consists of the start tag, the end tag, and everything in between.

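If the data contains the less-than (<) or ampersand (&) character, you must either replace the character with its entity reference (&lt; or &amp;) or enclose the data in a CDATA section. The greater-than (>) character is usually allowed as is, but it can also be written as &gt;. A CDATA section has the following form: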

<![CDATA[data]]>

For example:


<tagname><![CDATA[1&2]]></tagname>
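The effect of a CDATA section can be checked with a short sketch that uses Python's standard `xml.etree.ElementTree` module (the choice of parser is an assumption made for illustration; any conforming XML parser should behave the same way). It also shows the predefined entity reference `&amp;` as an alternative way to carry the same text.

```python
import xml.etree.ElementTree as ET

# Two ways to carry the literal text "1&2" inside an element.
with_cdata = ET.fromstring("<tagname><![CDATA[1&2]]></tagname>")
with_entity = ET.fromstring("<tagname>1&amp;2</tagname>")

# The parser hands both forms to the application as the same string.
print(with_cdata.text)   # 1&2
print(with_entity.text)  # 1&2

# A bare ampersand is rejected because the document is not well formed.
try:
    ET.fromstring("<tagname>1&2</tagname>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False

print(raw_ok)  # False
```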

9.1.2 Attributes

Elements can be annotated with any number of attributes, provided that each attribute name is unique within the element. An attribute is a name/value pair separated by an equal sign (=), and the value must be enclosed in double quotes or single quotes. Attributes belong in the start tag, never in the end tag. For example:


<tagname attribute_name="value">data</tagname>
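As a minimal sketch of how an application might read attributes, the following uses Python's standard `xml.etree.ElementTree` module on the example above. The attribute name `missing` and the duplicate-attribute document are hypothetical, added only to show the edge cases.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring('<tagname attribute_name="value">data</tagname>')

# Attributes are exposed as a dictionary on the parsed element.
print(element.attrib)                 # {'attribute_name': 'value'}
print(element.get("attribute_name"))  # value
print(element.get("missing", "n/a"))  # n/a (default for an absent attribute)

# Single quotes are equally valid delimiters for the value.
single = ET.fromstring("<tagname attribute_name='value'>data</tagname>")
print(single.attrib == element.attrib)  # True

# Repeating an attribute name within one start tag is not well formed.
try:
    ET.fromstring('<tagname a="1" a="2"/>')
    duplicate_ok = True
except ET.ParseError:
    duplicate_ok = False

print(duplicate_ok)  # False
```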

9.1.3 Hierarchical Relationships between XML Elements

Elements have a hierarchical structure that defines the relationships between parent and child elements. Every XML document has exactly one top-level element, known as the root element. The root element is mandatory, even if it has no content. All other elements are its descendants.

Some elements appear only once in a document, while others can appear multiple times. A child of the root element can itself be the parent of multiple elements. Each element has exactly one parent, but elements with the same name can appear under different parent elements.
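These relationships can be explored with a small sketch in Python's standard `xml.etree.ElementTree` module. The `library`, `book`, `author`, and `name` elements are hypothetical names chosen for the example; they do not come from any particular schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one root, a repeated child, and a "name"
# element that appears under two different kinds of parent.
doc = """
<library>
  <book><name>XML Basics</name></book>
  <book><name>Advanced XML</name></book>
  <author><name>A. Writer</name></author>
</library>
"""
root = ET.fromstring(doc)

print(root.tag)                       # library
print([child.tag for child in root])  # ['book', 'book', 'author']

# ElementTree elements do not store a parent pointer, so build a
# child-to-parent map by walking every element in the tree.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Each "name" element has exactly one parent, but the parents differ.
name_parents = [parent_of[name].tag for name in root.iter("name")]
print(name_parents)                   # ['book', 'book', 'author']
```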

9.1.4 Element Content

Element content is whatever appears between the start tag and the end tag of an element: nothing at all, parsed character data (PCDATA), or child elements. An element with no content is called an empty element. The table below shows examples of some common XML tag constructs.

| Element Content | Example of Tag Construct | Description |
|---|---|---|
| No content | <tagname></tagname> | An empty element |
| No content | <tagname/> | An abbreviated form of an empty element |
| Parsed character data (PCDATA) | <tagname>text</tagname> | An element with data |
| Parsed character data (PCDATA) | <tagname attribute_name="value">text</tagname> | An element with data and with one or more attributes that describe the data |
| One or more child elements | <tagname><child1_tag>text</child1_tag><child2_tag></child2_tag></tagname> | An element with two child elements: an element with PCDATA and an empty element |
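The constructs above can be told apart programmatically. The following is a rough sketch using Python's standard `xml.etree.ElementTree` module; the helper `content_kind` is a hypothetical function written for this illustration, not part of any library. It also shows that the long and abbreviated forms of an empty element parse to the same result.

```python
import xml.etree.ElementTree as ET

def content_kind(element):
    """Classify an element's content (hypothetical helper for illustration)."""
    if len(element) > 0:   # len() counts direct child elements
        return "child elements"
    if element.text:       # text is None when there is no character data
        return "character data"
    return "empty"

long_form = ET.fromstring("<tagname></tagname>")
short_form = ET.fromstring("<tagname/>")
with_text = ET.fromstring("<tagname>text</tagname>")
with_children = ET.fromstring(
    "<tagname><child1_tag>text</child1_tag><child2_tag></child2_tag></tagname>"
)

print(content_kind(long_form))      # empty
print(content_kind(short_form))     # empty
print(content_kind(with_text))      # character data
print(content_kind(with_children))  # child elements
```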

9.1.5 Rules for Well-Formed XML Documents

As you configure the XML file, keep in mind the following rules for well-formed XML documents:

  • Every XML document has exactly one root element.

  • Every start tag must have a matching end tag. The exception is the abbreviated version of an empty element (<tagname/>).

  • Tags cannot overlap; every element must be properly nested.

  • Element names and attribute names are case sensitive.

  • White space in element content is significant: an XML parser passes it to the application rather than discarding it.
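One way to check a document against these rules before using it is to run it through a parser. The sketch below assumes Python's standard `xml.etree.ElementTree` module; `is_well_formed` is a hypothetical helper name, and the sample strings are made up to break one rule each.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b/></a>"))     # True
print(is_well_formed("<a/><b/>"))        # False: two root elements
print(is_well_formed("<a><b></a>"))      # False: <b> has no end tag
print(is_well_formed("<a><b></a></b>"))  # False: overlapping tags
print(is_well_formed("<a></A>"))         # False: names are case sensitive

# White space inside element content reaches the application unchanged.
padded = ET.fromstring("<a>  x  </a>")
print(repr(padded.text))                 # '  x  '
```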

9.1.6 Additional Information

For more information about using XML, consult an XML programming textbook or search the Internet for an XML tutorial.