Canonical XML
Version 1.0

W3C Working Draft 19 January 2000

This version: https://www.w3.org/TR/2000/WD-xml-c14n-20000119
Latest version: https://www.w3.org/TR/xml-c14n
Previous versions: https://www.w3.org/TR/1999/WD-xml-c14n-19991115; https://www.w3.org/TR/1999/WD-xml-c14n-19991109
Editors: Tim Bray <tbray@textuality.com>; James Clark <jjc@jclark.com>; James Tauber <jtauber@jtauber.com>; John Cowan <jcowan@reutershealth.com>

Abstract

This document describes a subset of the information contained in an XML document and a syntax for expressing that subset. This syntax, called Canonical XML, is designed to encode the logical structure of XML documents; two XML documents whose Canonical-XML form is identical will be considered equivalent for the purposes of many applications.

Status of this document

The XML Core Working Group, with this 19 January 2000 Infoset Last Call working draft, invites comment on this specification. The Last Call period ends on 22 February 2000.

The W3C Membership and other interested parties are invited to review the specification and report implementation experience. Please send comments to www-xml-canonicalization-comments@w3.org (archive).

Note: The XML Core Working Group strongly solicits commentary, especially from early implementors of this Working Draft, on the appropriateness of the requirement that Canonical XML be in W3C normalized text form as well. The Working Group has published a minority report on this question at https://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2000Jan/0000.html. A rationale for the majority viewpoint embodied in this draft has been published at https://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2000Jan/0001.html.

For background on this work, please see the XML Activity Statement. While we welcome implementation experience reports, the XML Core Working Group will not allow early implementation to constrain its ability to make changes to this specification prior to final release.

A list of current W3C Recommendations and other technical documents can be found at https://www.w3.org/TR.

Appendices

A References
B Acknowledgements (Non-Normative)

1 Introduction

The XML 1.0 Recommendation [XML] describes the syntax of a class of resources called XML documents. It is possible for XML documents which are equivalent for the purposes of many applications to differ in their physical representation. In particular, they may differ in their entity structure, attribute ordering, and character encoding. This means that much equivalence testing of XML documents cannot be done at the byte-comparison level. This Canonical XML specification aims to introduce a notion of equivalence between XML documents which can be tested at the syntactic level and, in particular, by byte-for-byte comparison. In the syntax it describes, logically equivalent documents are byte-for-byte identical.

The syntax described in this specification is called Canonical XML. XML documents may be transformed into Canonical XML (with potentially some information loss) - the result of this transformation is described as the canonical form of the original document. Canonical XML is XML - that is to say, the canonical form of any XML document is an XML document.

There are two essential aspects to the specification of Canonical XML:

Which information from an XML document is included in its canonical form (and which is not).
How information is expressed in Canonical XML.

2 Information Included in Canonical XML

For the purposes of this specification, the information in an XML document is that described by the XML Information Set Specification [Infoset]. The canonical form of an XML document, which is itself an XML document, also has an information set. This section describes what portion of an XML document's information set is included in that of its canonical form.

Note that information not included in Canonical XML may still be used to produce it. In particular:

Attribute types serve as the basis of the normalization process for attribute values in Canonical XML, but the type of attributes is not preserved in it.
The replacement text of general parsed entities that are referenced is included in Canonical XML, but the information about which entity any character or logical structure came from is not.
Attribute values provided by default are included in Canonical XML, but the fact that the value was provided by default is not.

2.1 The Document Information Item

The information set of the canonical form includes only the "children" property of the document information item. It does not include any of the peripheral properties of the document information item, nor the "notations" or "entities" properties.

2.2 Element Information Items

The information set of the canonical form includes the properties: "namespace URI," "local name," "children" and "attributes" from each element information item. It does not include the "declared namespaces" property, nor any of the peripheral properties. Note that the infoset lists the "children" property as including references to skipped entity information items, but the canonical form does not include these.

2.3 Attribute Information Items

The information set of the canonical form includes all of the core properties, but none of the peripheral properties, of the attribute information item.

2.4 Processing Instruction Information Item

For Processing Instructions appearing outside of the Document Type Definition, the information set of the canonical form includes all of the core properties, but none of the peripheral properties, of the processing instruction information item. For those which appear in the Document Type Definition, the information set of the canonical form includes no Processing Instruction information items.

2.5 Reference to Skipped Entity Information Items

References to skipped entity information items are not included in the information set of the canonical form of a document. Such information items could not appear in Canonical XML because canonicalization requires the reading of declarations for all entities referenced in a document.

2.6 Character Information Items

The information set of the canonical form includes the core "character code" property of the character information item. None of the peripheral properties of the character information item are included.

2.7 Comment Information Items

The information set of the canonical form does not include comment information items.

2.8 Document Type Declaration Information Items

The information set of the canonical form does not include document type declaration information items.

2.9 Entity Information Items

The information set of the canonical form does not include entity information items.

2.10 Notation Information Items

The information set of the canonical form does not include notation information items.

2.11 Entity Start Marker Information Items

The information set of the canonical form does not include entity start marker information items.

2.12 Entity End Marker Information Items

The information set of the canonical form does not include entity end marker information items.

2.13 CDATA Start Marker Information Items

The information set of the canonical form does not include CDATA start marker information items.

No CDATA sections occur in the information set of the canonical form. They are not necessary since all syntactically-significant characters in Canonical XML are escaped in the fashion described in this specification.
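
As a non-normative illustration, the following Python sketch shows character data from a CDATA section being re-emitted with escaping instead. The choice of the standard library's xml.etree.ElementTree parser and xml.sax.saxutils.escape helper is an assumption made for the example, not something this specification prescribes, and the carriage-return escaping of Canonical XML is not covered here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser reports the content of the CDATA section as ordinary
# character data; the CDATA markers themselves are not preserved.
root = ET.fromstring("<a><![CDATA[1 < 2 & 3]]></a>")

# escape() replaces '&', '<' and '>' with '&amp;', '&lt;' and '&gt;'.
canonical = "<a>" + escape(root.text) + "</a>"
print(canonical)
```

The escaped output carries the same character information items as the CDATA section did, which is why the markers can be dropped without loss.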

2.14 CDATA End Marker Information Items

The information set of the canonical form does not include CDATA end marker information items.

2.15 Namespace Declaration Information Items

The information set of the canonical form does not include namespace declaration information items.

3 Document Type Definition Processing

The process of canonicalizing an XML document depends on its standalone document declaration. If the declaration is present and its value is "yes", then assuming the XML document satisfies the Standalone Document Declaration validity constraint, no external portion of the DTD can contain material which affects its canonical form.

In all other cases, the process of canonicalization requires reading the whole of the DTD. The following information from the DTD affects the canonical form of an XML document:

Default attribute values.
Declarations of general entities which are referenced in the document.
Attribute type declarations which affect the process of attribute value normalization.

Note that the process of canonicalization is effectively impossible for a non-standalone document for which some external component of the DTD cannot be retrieved. Implementors of software which is designed to produce Canonical XML should provide an interface to users which allows this error condition to be signaled.

The canonical form of an XML document is standalone.
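
The effect of default attribute values can be sketched as follows (non-normative). The example assumes Python's xml.etree.ElementTree parser, which reads declarations in the internal DTD subset; the element and attribute names are hypothetical. This parser does not retrieve external DTD subsets by default, so on its own it would not be sufficient for non-standalone documents.

```python
import xml.etree.ElementTree as ET

# A document whose internal DTD subset declares a default value
# for the (hypothetical) "lang" attribute of the "note" element.
DOC = """<!DOCTYPE note [
  <!ATTLIST note lang CDATA "en">
]>
<note>hi</note>"""

root = ET.fromstring(DOC)

# The defaulted attribute is reported like an explicitly specified one;
# the fact that it came from a default is not visible here.
defaulted = dict(root.attrib)
print(defaulted)
```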

4 Entity and Reference Processing

The canonical form of an XML document contains no general entity references - all such references are expanded so that the canonical form contains only the replacement text. Since it contains no DTD, it also contains no parameter entity references.

Suppose a file named "e1.xml" contains the following text, with no trailing newline (#xA) character.

Hallelujah, I'm a bum!

then if the following XML document is stored in a file in the same directory

<!DOCTYPE d [ 
 <!ENTITY lsb '['> 
 <!ENTITY rsb ']'> 
 <!ENTITY bum SYSTEM "e1.xml">
]>
<d>&lsb;&bum;&rsb;</d>

its canonical form is

<d>[Hallelujah, I'm a bum!]</d>
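
A non-normative sketch of the expansion of internal general entities follows. It assumes Python's xml.etree.ElementTree parser and uses a reduced variant of the document above containing only the two internal entities, because this parser does not load external entities such as "e1.xml" by default.

```python
import xml.etree.ElementTree as ET

# Reduced variant of the example: internal entities only.
DOC = """<!DOCTYPE d [
 <!ENTITY lsb '['>
 <!ENTITY rsb ']'>
]>
<d>&lsb;Hallelujah&rsb;</d>"""

root = ET.fromstring(DOC)

# The parser has replaced each reference with its replacement text;
# no trace of the entity boundaries remains in the character data.
expanded = root.text
print("<d>" + expanded + "</d>")
```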

5 The Syntax of Canonical XML

This section describes the syntax of Canonical XML. This syntax is a proper subset of the syntax of XML 1.0. The canonical form of an XML document is identical to its original form except as described in this section.

Each Canonical XML document must match the production labeled canonXML in the grammar below, where the notation and the semantics of the word "match" are those described in the XML 1.0 specification.

Canonical XML

[1]	canonXML	::=	(PI #xA)* element #xA (PI #xA)*
[2]	element	::=	Stag (Datachar | element | PI)* Etag
[3]	Stag	::=	'<' Name NSDecl? (Att NSDecl?)* '>'
[4]	Etag	::=	'</' Name '>'
[5]	NSDecl	::=	#x20 'xmlns:' Prefix '=' '"' Attvalchar* '"'
[6]	Att	::=	#x20 Name '=' '"' Attvalchar* '"'
[7]	Datachar	::=	'&amp;' | '&lt;' | '&gt;' | '&#13;'
			| (Char - ('&' | '<' | '>' | #xD))
[8]	Attvalchar	::=	'&amp;' | '&lt;' | '&quot;' | '&#9;' | '&#10;' | '&#13;'
			| (Char - ('&' | '<' | '"' | #x9 | #xA | #xD))
[9]	Name	::=	(Prefix ':')? NCName
[10]	Prefix	::=	'n' [1-9] [0-9]*
[11]	PI	::=	'<?' PITarget (#x20 (Char+ - (Char* '?>' Char*)))? '?>'
[12]	PITarget	::=	NCName - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

The remainder of this section expresses additional constraints beyond those expressed in the grammar and provides further explanatory material on key aspects of Canonical XML.

5.1 Character Encoding

Canonical XML uses UTF-8 in the normalized form recommended by [CharModel] as the character encoding.

For example, consider the following small document:

<?xml version="1.0" encoding="ISO-8859-1"?>
<lang>Español</lang>

Since it is encoded in ISO-8859-1 ("ISO Latin-1"), the character "ñ" is represented as #xF1. In Canonical XML, however, that character must be represented using UTF-8 in two bytes whose values are #xC3 and #xB1.

The Unicode standard [Unicode] allows multiple different representations of certain "precomposed characters" (a simple example is "ç"). Thus two XML documents with content that is equivalent for the purposes of most applications may contain differing character sequences. The W3C has recommended a normalized representation [CharModel]. Canonical XML uses this normalized form.
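
The two steps described above, transcoding and normalization, can be sketched as follows (non-normative). The function name to_canonical_bytes is hypothetical, and the sketch assumes that the normalized form recommended by [CharModel] corresponds to Unicode Normalization Form C as implemented by Python's unicodedata module.

```python
import unicodedata

def to_canonical_bytes(raw, source_encoding):
    """Decode raw bytes, normalize the text, and re-encode it as UTF-8."""
    text = raw.decode(source_encoding)
    # Assumption: the [CharModel] normalized form is taken to be NFC.
    text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")

# "Espanol" with n-tilde, encoded in ISO-8859-1: the n-tilde is byte #xF1.
latin1 = "Espa\u00f1ol".encode("iso-8859-1")
canonical = to_canonical_bytes(latin1, "iso-8859-1")

# "c" followed by a combining cedilla becomes the precomposed c-cedilla.
composed = to_canonical_bytes("c\u0327".encode("utf-8"), "utf-8")
```

After conversion the n-tilde occupies the two bytes #xC3 and #xB1, and both spellings of the c-cedilla yield the same byte sequence.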

Note: The XML Core Working Group strongly solicits commentary, especially from early implementors of this Working Draft, on the appropriateness of this requirement for normalized form. The Working Group has published a minority report on this question at https://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2000Jan/0000.html. A rationale for the majority viewpoint embodied in this draft has been published at https://lists.w3.org/Archives/Public/www-xml-canonicalization-comments/2000Jan/0001.html.

5.2 Character Escaping

The XML 1.0 specification requires XML processors to perform certain simple transformations on white-space characters in XML documents, when they serve as line separators and when they appear in attribute values. For each character in the result of the transformation, there will be a character information item as described by the Information Set. For example, in an XML 1.0 document:

Where two lines in the content of an element are separated by CR-NL (#xD, #xA), the information set contains a single NL (#xA) character information item.
Where an element or attribute value contains the string "&#13;", the information set contains a single CR (#xD) character information item.
Where a CDATA attribute value contains a TAB (#x9) character, the information set contains a single space (#x20) character information item.
Where a non-CDATA attribute value contains a TAB (#x9) character, the information set contains a single space (#x20) character information item if the TAB character immediately followed a non-white-space character, and otherwise contains nothing at all.
Where an attribute value contains the string "&#9;", the information set contains a TAB character (#x9).

All character information items are represented in a Canonical XML document by their UTF-8 encoding, with the following exceptions:

In character data and attribute values, the character information items "<" and "&" are represented by "&lt;" and "&amp;" respectively.
In character data, the character information item ">" is represented by "&gt;".
In attribute values, the double-quote character information item (") is represented by "&quot;".
In character data, the carriage-return (#xD) character information item is represented by "&#13;".
In attribute values, the character information items TAB (#x9), newline (#xA), and carriage-return (#xD) are represented by "&#9;", "&#10;", and "&#13;" respectively.

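The exceptions listed above can be sketched as two small escaping functions (non-normative). The names escape_text and escape_attr are hypothetical, and the sketch assumes that the numeric character references are written in decimal form.

```python
def escape_text(text):
    """Escape character data for use inside element content."""
    # '&' is replaced first so the references added below are not re-escaped.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\r", "&#13;"))

def escape_attr(value):
    """Escape an attribute value that will be delimited by double quotes."""
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace('"', "&quot;")
                 .replace("\t", "&#9;")    # assumption: decimal references
                 .replace("\n", "&#10;")
                 .replace("\r", "&#13;"))
```

Note that ">" is left untouched in attribute values and that TAB and newline are left untouched in character data, matching the two lists of exceptions.
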
5.3 Prolog

Canonical-XML documents have a prolog which contains only those Processing Instructions appearing before the start-tag of the root element but not within the Document Type Definition. Each PI is followed by a single newline (#xA) character. These PIs and newline characters make up the whole content of the prolog. If there are no such PIs, the first character is the "<" marking the beginning of the root element's start-tag.

For the following XML document

<!DOCTYPE x PUBLIC "myX" "x.dtd" [
 <!ENTITY a "aVal"> ]>
 <x>y</x>

the canonical form is

<x>y</x>

If PIs are involved

<?t1  t1-body  ?>
<!DOCTYPE x PUBLIC "myX" "x.dtd" [
 <?t2 t2-body ?>
 <!ENTITY a "aVal"> ]>
<?xml-stylesheet 
  href="mystyle.css" 
  type="text/css" ?> <?rating    mostly-harmless?> <x>y</x><?t3 ?>

the canonical form is

<?t1 t1-body  ?>
<?xml-stylesheet href="mystyle.css" 
  type="text/css" ?>
<?rating mostly-harmless?>
<x>y</x>
<?t3?>

5.4 Epilog

The epilog of all Canonical-XML documents contains a single newline (#xA) character, which immediately follows the ">" marking the end of the root element's end-tag. If the epilog contains Processing Instructions they are preserved in the Canonical-XML epilog, each followed by a newline (#xA) character.

For the following XML document

<x>y</x><?audio stop here ?>
<!--
Local variables:
mode: xml
End:
--><?pi?>

the canonical form is

<x>y</x>
<?audio stop here ?>
<?pi?>
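
The overall layout of prolog, root element and epilog, as given by the canonXML production, can be sketched as follows (non-normative). The function name assemble is hypothetical, and its arguments are assumed to be strings that are already in canonical form.

```python
def assemble(prolog_pis, root_element, epilog_pis):
    """Join canonical PIs and the canonical root element into one document.

    Every PI and the root element are each followed by exactly one
    newline (#xA) character, and nothing else separates them.
    """
    items = list(prolog_pis) + [root_element] + list(epilog_pis)
    return "".join(item + "\n" for item in items)

document = assemble([], "<x>y</x>", ["<?audio stop here ?>", "<?pi?>"])
print(document, end="")
```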

5.5 Elements

In Canonical XML, all elements have a start-tag and an end-tag. For elements which have no content, the end-tag follows the start-tag with no intervening characters.

For the following element

<x>
<a n="1"/><b n="2"/>
<c n="3"/></x>

the canonical form is

<x>
<a n="1"></a><b n="2"></b>
<c n="3"></c></x>
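
As a non-normative illustration, Python's xml.etree.ElementTree serializer can be asked to write explicit end-tags for empty elements through its short_empty_elements option. The use of this library is an assumption made for the example; its output is not Canonical XML in other respects (for instance, it does not apply the attribute ordering or namespace rules of this specification).

```python
import xml.etree.ElementTree as ET

SOURCE = '<x>\n<a n="1"/><b n="2"/>\n<c n="3"/></x>'
root = ET.fromstring(SOURCE)

# short_empty_elements=False forces <a ...></a> instead of <a ... />.
expanded = ET.tostring(root, encoding="unicode", short_empty_elements=False)
print(expanded)
```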

5.6 Tags

In Canonical XML, for end-tags and start-tags which contain no attributes, the ">" character closing the tag follows the element type immediately with no intervening white space. Any attributes and namespace declarations follow with each attribute and namespace declaration preceded by one space (#x20) character. When the element type and the attribute names do not have namespaces, the attributes are sorted lexicographically by attribute name (based on Unicode character code points); the ordering when namespaces are present is described in [5.9 Namespaces].

The canonical form of an XML document includes all its attributes, whether provided explicitly or by default in the original document.

For the following element

<x a="Earth"
   ñ="Wind"
   z="Fire"
>!!</x
>

the canonical form is

<x a="Earth" z="Fire" ñ="Wind">!!</x>
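
The ordering rule for attributes without namespaces can be sketched as follows (non-normative). The function name start_tag is hypothetical, and attribute values are assumed to be escaped already. Python compares strings by Unicode code point, which is the ordering this section asks for.

```python
def start_tag(name, attributes):
    """Build a start-tag with attributes sorted by Unicode code point."""
    parts = ["<" + name]
    # sorted() on str keys orders them by code point, so "z" (U+007A)
    # comes before n-tilde (U+00F1).
    for attr_name in sorted(attributes):
        parts.append(' %s="%s"' % (attr_name, attributes[attr_name]))
    parts.append(">")
    return "".join(parts)

tag = start_tag("x", {"a": "Earth", "\u00f1": "Wind", "z": "Fire"})
print(tag)
```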

5.7 Attributes

In the canonical form of an XML document, attribute values are normalized in the fashion required of an XML processor.

In Canonical XML, attribute names and values are separated by a single "=" character and no spaces. All attribute values are delimited by double-quote (") characters. Within attribute values, all occurrences of double-quote are replaced by "&quot;".

For the following start-tag

<x a = '"Don&apos;t!", he cried.' b = "'>'">

the canonical form is

<x a="&quot;Don't!&quot;, he cried." b="'>'">

5.8 Processing Instructions

In Canonical XML, there is no Document Type Definition and thus no PIs contained in it. PIs which precede and follow the root element are normalized as follows:

The white-space separating the PI Target from the rest of the PI contents is replaced by a single space (#x20) character.
The "?>" sequence which closes the PI is followed by a single newline (#xA) character.

PIs which are contained in the content of an element are normalized as follows:

The white-space separating the PI Target from the rest of the PI contents is replaced by a single space (#x20) character.

For the following XML document

<?pi1 
  v1 ?><?pi2
  v2 ?><root>Hello <?audio
 bang!
?> he said.</root><?pi3?>

the canonical form is

<?pi1 v1 ?>
<?pi2 v2 ?>
<root>Hello <?audio bang!
?> he said.</root>
<?pi3?>
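
The white-space rule for PIs can be sketched with a regular expression (non-normative). The function name normalize_pi is hypothetical; it operates on the raw text of a single PI and leaves the trailing newline required for PIs outside the root element to the caller.

```python
import re

# Target, then any white space, then the rest of the PI up to "?>".
_PI = re.compile(r"<\?([^\s?]+)\s*(.*?)\?>", re.DOTALL)

def normalize_pi(raw_pi):
    """Replace the white space after the PI target with one space."""
    match = _PI.fullmatch(raw_pi)
    if match is None:
        raise ValueError("not a processing instruction: %r" % raw_pi)
    target, data = match.groups()
    # With no data, the closing "?>" follows the target immediately.
    return "<?%s %s?>" % (target, data) if data else "<?%s?>" % target
```

White space inside or after the PI data is left as it is; only the separator between target and data is collapsed.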

5.9 Namespaces

In Canonical XML, namespace prefixes always have the form n1, n2 and so on. The positive integer following the n is called the index of the prefix.

A start-tag always contains namespace declarations for exactly those prefixes that are used in the element type and the attribute names occurring in the start-tag. Namespace declarations are never inherited.

NOTE: This approach was chosen so that canonicalization is context-independent: the canonical form of an element is independent of where it occurs in the document.

The default namespace is never used. An attribute name never has the same prefix as the element type or another attribute name. The namespace declaration for a prefix immediately follows the element type or attribute that uses the prefix. Attributes are ordered primarily by the lexicographic order of the namespace URI with which the prefix of the attribute name is associated, and secondarily by the lexicographic order of the local part of the attribute name. A null namespace URI is considered to precede a non-null namespace URI: thus all attributes without prefixes precede all attributes with prefixes.

In the start-tag namespace prefixes occur in order of prefix index. The index of the first namespace prefix in the start-tag is always 1. The indices of the prefixes occurring in the start-tag are always consecutive integers. Thus if the element type has a prefix, its prefix will be n1; the prefix of the first attribute name in the start-tag that has a prefix will be n2 if the element type has a prefix, and n1 otherwise; for subsequent attributes, the index of the prefix of the attribute name will be one greater than the index of the prefix of the name of the preceding attribute.

For example, for the following element

<doc xmlns:x="https://w3.org/2" xmlns:y="https://w3.org/1">
<x:e a="a"/>
<x:e x:a="x:a"/>
<e x:a="x:a"/>
<e x:a="x:a" y:a="y:a"/>
<e x:a="x:a" a="a"/>
<e x:a="x:a" x:b="x:b"/>
</doc>

the canonical form is

<doc>
<n1:e xmlns:n1="https://w3.org/2" a="a"></n1:e>
<n1:e xmlns:n1="https://w3.org/2" n2:a="x:a" xmlns:n2="https://w3.org/2"></n1:e>
<e n1:a="x:a" xmlns:n1="https://w3.org/2"></e>
<e n1:a="y:a" xmlns:n1="https://w3.org/1" n2:a="x:a" xmlns:n2="https://w3.org/2"></e>
<e a="a" n1:a="x:a" xmlns:n1="https://w3.org/2"></e>
<e n1:a="x:a" xmlns:n1="https://w3.org/2" n2:b="x:b" xmlns:n2="https://w3.org/2"></e>
</doc>
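
The prefix assignment and attribute ordering described in this section can be sketched as follows (non-normative). The function name canonical_start_tag and its data layout are hypothetical: names are given as (namespace URI, local name) pairs with None for a null namespace URI, and attribute values and URIs are assumed to be escaped already.

```python
from itertools import count

def canonical_start_tag(element_name, attributes):
    """Build a start-tag with fresh n1, n2, ... prefixes and declarations."""
    indices = count(1)  # prefix indices start at 1 in every start-tag

    def qualify(name):
        uri, local = name
        if uri is None:
            return local, ""
        prefix = "n%d" % next(indices)
        return "%s:%s" % (prefix, local), ' xmlns:%s="%s"' % (prefix, uri)

    qname, declaration = qualify(element_name)
    parts = ["<" + qname + declaration]
    # Order by namespace URI, then local name; a null URI sorts first.
    ordered = sorted(attributes.items(),
                     key=lambda item: (item[0][0] or "", item[0][1]))
    for name, value in ordered:
        qname, declaration = qualify(name)
        # Each declaration immediately follows the name that uses it.
        parts.append(' %s="%s"%s' % (qname, value, declaration))
    parts.append(">")
    return "".join(parts)
```

Because every prefixed name receives its own prefix and declaration, the result depends only on the element itself, which is the context independence that the note above describes.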

A References

CharModel: Character Model for the World Wide Web, ed. Martin J. Dürst, François Yergeau. Available at https://www.w3.org/TR/charmod.
Infoset: XML Information Set, ed. John Cowan. Available at https://www.w3.org/TR/xml-infoset.
Namespaces: Namespaces in XML, eds. Tim Bray, Dave Hollander, and Andrew Layman. Available at https://www.w3.org/TR/REC-xml-names.
Unicode: The Unicode Consortium. The Unicode Standard, version 3.0. ISBN 0-201-61633-5. Described at https://www.unicode.org/unicode/standard/versions/Unicode3.0.html.
XML: Extensible Markup Language (XML) 1.0, eds. Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen. 10 February 1998. Available at https://www.w3.org/TR/REC-xml.

B Acknowledgements (Non-Normative)

The work of producing this specification was accomplished by the membership of the W3C XML Syntax Working Group and its successor, the W3C XML Core Working Group:

Joel Nava, Adobe (Co-chair, Syntax)
Tim Bray, Invited Expert (Co-chair, Syntax; Editor)
Paul Grosso, Arbortext (Co-chair, Core)
Arnaud Le Hors, IBM (Co-chair, Core)
James Clark, Invited Expert (Editor)
James Tauber, Bow Street Software (Editor)
John Cowan, Reuters (Editor)
Bert Bos, W3C (W3C Liaison, Syntax)
Joseph Reagle, W3C (W3C Liaison, Syntax)
Dan Connolly, W3C (W3C Liaison, Core)
Daniel Veillard, W3C (W3C Liaison, Core)
Daniel Austin, Ask Jeeves
Gary Bisaga, Mitre
Tim Boland, NIST, Invited Expert
Allen Brown, Microsoft
John Evdemon, XMLSolutions
Charles Frankston, Microsoft
Eduardo Gutentag, Sun Microsystems
Michael Hyman, Microsoft
Murata Makoto, Fuji Xerox
Eve Maler, Sun Microsystems
Murray Maloney, Commerce One
Jonathan Marsh, Microsoft
Mark Needleman, Data Research Associates
Anguel Novoselsky, Oracle
David Orchard, IBM
Lew Shannon, NCR
Michael Sperberg-McQueen, U. Ill. and W3C
Steph Tryphonas, Microstar
Norman Walsh, Arbortext
François Yergeau, Alis
