
Using W3C XML Schema
by Eric van der Vlist
October 17, 2001
The W3C XML Schema Definition Language is an XML language for describing and constraining the content of XML documents. W3C XML Schema is a W3C Recommendation.
This article is an introduction to using W3C XML Schemas, and also includes a comprehensive reference to the Schema datatypes and structures.
(Editor's note: this tutorial has been updated since its first publication in 2000, to reflect the finalization of W3C XML Schema as a Recommendation.)
Introducing our First Schema
Let's start by having a look at this simple document which describes a book:
<?xml version="1.0" encoding="UTF-8"?>
<book isbn="0836217462">
  <title>
    Being a Dog Is a Full-Time Job
  </title>
  <author>Charles M. Schulz</author>
  <character>
    <name>Snoopy</name>
    <friend-of>Peppermint Patty</friend-of>
    <since>1950-10-04</since>
    <qualification>
      extroverted beagle
    </qualification>
  </character>
  <character>
    <name>Peppermint Patty</name>
    <since>1966-08-22</since>
    <qualification>bold, brash and tomboyish</qualification>
  </character>
</book>
Get a copy of library1.xml for reference.
To write a schema for this document, we could simply follow its structure and define each element as we find it. To start, we open an xs:schema element:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  .../...
</xs:schema>
The schema element opens our schema. It can also hold the definition of the target namespace and several default options, some of which we will see in the following sections.
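As a rough illustration of what the schema element can carry, here is a minimal sketch of an opening tag that declares a target namespace and two default options. The namespace URI http://example.org/ns/library is a made-up placeholder, and the book example in this article does not use a target namespace at all.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the targetNamespace URI is a hypothetical placeholder. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/ns/library"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- Element and attribute declarations would go here. -->
</xs:schema>
```

With elementFormDefault set to qualified, locally declared elements belong to the target namespace; when the attribute is omitted, its value defaults to unqualified.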
To match the start tag for the book element, we define an element named book. This element has attributes and non-text children, so we declare it with a complexType (the other kind of datatype, simpleType, is reserved for datatypes holding only values, with no element or attribute sub-nodes). The list of children of the book element is described by a sequence element:
<xs:element name="book">
  <xs:complexType>
    <xs:sequence>
      .../...
    </xs:sequence>
    .../...
  </xs:complexType>
</xs:element>
The sequence element is a "compositor" that defines an ordered sequence of sub-elements. We will see the two other compositors, choice and all, in the following sections.
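As a preview, and only as a rough sketch, the schema below shows how the two other compositors might look. The element names (contact, email, phone, person, first-name, last-name) are invented for illustration and are not part of the book example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: all element names here are hypothetical. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="contact">
    <xs:complexType>
      <!-- choice: exactly one of the listed children appears -->
      <xs:choice>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="phone" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>

  <xs:element name="person">
    <xs:complexType>
      <!-- all: with the default cardinality, each child appears once, in any order -->
      <xs:all>
        <xs:element name="first-name" type="xs:string"/>
        <xs:element name="last-name" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

</xs:schema>
```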
Now we can define the title and author elements as simple types: they don't have attributes or non-text children, so each can be described directly within a degenerate (empty) xs:element element. The type (xs:string) is prefixed by the namespace prefix associated with XML Schema, indicating a predefined XML Schema datatype:
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"/>
Now we must deal with the character element, which is a complex type. Note how its cardinality is defined:
<xs:element name="character" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      .../...
    </xs:sequence>
  </xs:complexType>
</xs:element>
Unlike some other schema definition languages (DTDs, for instance, offer only the ?, * and + occurrence indicators), W3C XML Schema lets us define the cardinality of an element (i.e. the number of its possible occurrences) with some precision. We can specify both minOccurs (the minimum number of occurrences) and maxOccurs (the maximum number of occurrences). Here maxOccurs is set to unbounded, which means that there can be as many occurrences of the character element as the author wishes. Both attributes have a default value of one.
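To make the defaults concrete, the sketch below puts a few combinations of minOccurs and maxOccurs side by side. It is a simplified variant of the book schema, not the article's own listing: the subtitle element and the limit of three authors are assumptions added for illustration, and character is reduced to a plain string to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: "subtitle" and the 3-author limit are hypothetical. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- Neither attribute given: both default to 1, so exactly one title -->
        <xs:element name="title" type="xs:string"/>
        <!-- minOccurs="0", maxOccurs defaults to 1: optional, at most one -->
        <xs:element name="subtitle" type="xs:string" minOccurs="0"/>
        <!-- minOccurs defaults to 1, maxOccurs="3": between one and three -->
        <xs:element name="author" type="xs:string" maxOccurs="3"/>
        <!-- Zero or more occurrences, with no upper limit -->
        <xs:element name="character" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```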
We then specify the list of all its children in the same way:
<xs:element name="name" type="xs:string"/>
<xs:element name="friend-of" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="since" type="xs:date"/>
<xs:element name="qualification" type="xs:string"/>
We terminate its description by closing the sequence, complexType and element elements.
We can now declare the attributes of the document elements, which must always come last, after the element declarations. There appears to be no special reason for this, but the W3C XML Schema Working Group considered it simpler to impose a relative order on the definitions of elements and attributes within a complex type, and more natural to define the attributes after the elements.
<xs:attribute name="isbn" type="xs:string"/>
Finally, we close all the remaining elements.
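The placement rule for attributes can be sketched as follows, using a cut-down version of the book element with a single child. The use="required" setting is an addition for illustration and is not part of the article's schema, where isbn keeps the default and is therefore optional.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: use="required" is an illustrative assumption. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <!-- The compositor and its element declarations come first... -->
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
      <!-- ...and attribute declarations follow the closing sequence tag -->
      <xs:attribute name="isbn" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Moving the xs:attribute declaration above xs:sequence would make the schema itself invalid.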
That's it! This first design, sometimes known as the "Russian Doll Design", closely follows the structure of our example document.
One of the key features of such a design is that each element and attribute is defined within its context, which allows multiple occurrences of the same element name to carry different definitions.
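To illustrate that last point, here is a hedged sketch in which the name element is defined twice with different content. It departs from the article's schema: the structured author element, with its first and last children, is hypothetical and is only there to show two local definitions of name coexisting.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the structured "author" and "first"/"last" are hypothetical. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="author">
          <xs:complexType>
            <xs:sequence>
              <!-- Inside author, "name" is a complex type with two children -->
              <xs:element name="name">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="first" type="xs:string"/>
                    <xs:element name="last" type="xs:string"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- Inside character, "name" is a plain string -->
              <xs:element name="name" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because each name is declared locally inside a different complex type, the two definitions do not conflict.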
Complete listing of this first example:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="friend-of" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
              <xs:element name="since" type="xs:date"/>
              <xs:element name="qualification" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
Download this schema: library1.xsd
The next section explores how to subdivide schema designs to make them more readable and maintainable.