This specification defines the Document Object Model Load and
Save Level 3, a platform- and language-neutral interface that
allows programs and scripts to dynamically load the content of
an XML document into a DOM document and serialize a DOM document
into an XML document; DOM documents being defined in
This document contains the Document Object Model Level 3 Load
and Save specification and is a
It is based on the feedback received during the
W3C Advisory Committee Representatives are now invited to submit
their formal review via Web form, as described in the Call for
Review. Additional comments may be sent to a Team-only list,
Publication as a Proposed Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
Patent disclosures relevant to this specification may be found
on the Working Group's
Created in electronic form.
$Revision: 1.3 $
Copyright © 2004
This document is published under the
This section is a copy of the W3C® Document
Notice and License and could be found at
Copyright © 2004
https://www.w3.org/Consortium/Legal/2002/copyright-documents-20021231
Public documents on the W3C site are provided by the copyright holders under the following license. By using and/or copying this document, or the W3C document from which this statement is linked, you (the licensee) agree that you have read, understood, and will comply with the following terms and conditions:
Permission to copy, and distribute the contents of this document,
or the W3C document from which this statement is linked, in any
medium for any purpose and without fee or royalty is hereby
granted, provided that you include the following on
A link or URL to the original W3C document.
The pre-existing copyright notice of the original author, or
if it doesn't exist, a notice (hypertext is preferred, but a
textual representation is permitted) of the form:
"Copyright © [$date-of-document]
When space permits, inclusion of the full text of this
No right to create modifications or derivatives of W3C documents is
granted pursuant to this license. However, if additional requirements
(documented in the
THIS DOCUMENT IS PROVIDED "AS IS," AND COPYRIGHT HOLDERS MAKE NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, NON-INFRINGEMENT, OR TITLE; THAT THE CONTENTS OF THE DOCUMENT ARE SUITABLE FOR ANY PURPOSE; NOR THAT THE IMPLEMENTATION OF SUCH CONTENTS WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.
COPYRIGHT HOLDERS WILL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF ANY USE OF THE DOCUMENT OR THE PERFORMANCE OR IMPLEMENTATION OF THE CONTENTS THEREOF.
The name and trademarks of copyright holders may NOT be used in advertising or publicity pertaining to this document or its contents without specific, written prior permission. Title to copyright in this document will at all times remain with copyright holders.
This section is a copy of the W3C® Software
Copyright Notice and License and could be found at
Copyright © 2004
https://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
This work (and included software, documentation such as READMEs, or other related items) is being provided by the copyright holders under the following license. By obtaining, using and/or copying this work, you (the licensee) agree that you have read, understood, and will comply with the following terms and conditions.
Permission to copy, modify, and distribute this software and its documentation, with or without modification, for any purpose and without fee or royalty is hereby granted, provided that you include the following on ALL copies of the software and documentation or portions thereof, including modifications:
The full text of this NOTICE in a location viewable to users of the redistributed or derivative work.
Any pre-existing intellectual property disclaimers, notices,
or terms and conditions. If none exist, the
Notice of any changes or modifications to the files, including the date changes were made. (We recommend you provide URIs to the location from which the code is derived.)
THIS SOFTWARE AND DOCUMENTATION IS PROVIDED "AS IS," AND COPYRIGHT HOLDERS MAKE NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE OR DOCUMENTATION WILL NOT INFRINGE ANY THIRD PARTY PATENTS, COPYRIGHTS, TRADEMARKS OR OTHER RIGHTS.
COPYRIGHT HOLDERS WILL NOT BE LIABLE FOR ANY DIRECT, INDIRECT, SPECIAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF ANY USE OF THE SOFTWARE OR DOCUMENTATION.
The name and trademarks of copyright holders may NOT be used in advertising or publicity pertaining to the software without specific, written prior permission. Title to copyright in this software and any associated documentation will at all times remain with copyright holders.
This section is a copy of the W3C® Short Software
Notice and could be found at
Copyright © 2004
Copyright © [$date-of-software]
[1] https://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
This section defines a set of interfaces for loading and saving
document objects as defined in Document
is defined in
The proposal for loading is influenced by the Java APIs for XML
Processing
The list of interfaces involved with the Loading and Saving of XML documents is:
DOMImplementationLS -- An extended DOMImplementation
interface that provides the factory methods for creating the
objects required for loading and saving.
LSParser -- An interface for parsing data into DOM documents.
LSInput -- Encapsulates information about the data to be loaded.
LSResourceResolver -- Provides a way for applications to
redirect references to external resources when parsing.
LSParserFilter -- Provides the ability to examine and
optionally remove nodes as they are being processed while parsing.
LSSerializer -- An interface for serializing DOM documents or nodes.
LSOutput -- Encapsulates information about the destination for
the data to be output.
LSSerializerFilter -- Provides the ability to examine and
filter DOM nodes as they are being processed for the serialization.
To ensure interoperability, this specification specifies the following basic types used in various DOM modules. Even though the DOM uses the basic types in the interfaces, bindings may use different types and normative bindings are only given for Java and ECMAScript in this specification.
LSInputStream
type
This type is used to represent a sequence of input bytes.
An LSInputStream represents a reference to a
byte stream source of an XML input.
For Java, LSInputStream is bound to the java.io.InputStream
type. For ECMAScript, LSInputStream is bound to Object.
LSOutputStream
type
This type is used to represent a sequence of output bytes.
An LSOutputStream represents a byte
stream destination for the XML output.
For Java, LSOutputStream is bound to the java.io.OutputStream
type. For ECMAScript, LSOutputStream is bound to Object.
LSReader
type
This type is used to represent a sequence of input characters
in
An LSReader represents a character
stream for the XML input.
For Java, LSReader is bound to the java.io.Reader
type. For ECMAScript,
LSReader
is
LSWriter
type
This type is used to represent a sequence of output characters
in
An LSWriter represents a character
stream for the XML output.
For Java, LSWriter is bound to the java.io.Writer
type. For ECMAScript,
LSWriter
is
The interface within this section is considered fundamental, and must be fully implemented by all conforming implementations of the DOM Load and Save module.
A DOM application may use the hasFeature(feature, version)
method of the DOMImplementation interface with parameter values
"LS" (or "LS-Async") and "3.0" (respectively)
to determine whether or not these interfaces are supported by
the implementation. In order to fully support them, an
implementation must also support the "Core" feature defined in
A DOM application may use the hasFeature(feature, version)
method of the DOMImplementation interface with parameter values
"LS-Async" and "3.0" (respectively) to determine whether or not
the asynchronous mode is supported by the implementation. In
order to fully support the asynchronous mode, an
implementation must also support the "LS" feature
defined in this section.
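The feature checks described above can be sketched in Java as follows. This is a non-normative illustration: the class name LsFeatureCheck is hypothetical, and the sketch assumes that the runtime provides the DOM Level 3 bootstrap class DOMImplementationRegistry (as the Java platform does). Whether "LS-Async" is reported as supported depends on the implementation.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;

final class LsFeatureCheck {

    /** Asks the bootstrap registry for any implementation that claims "LS". */
    static DOMImplementation findImplementation() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return registry.getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("DOM bootstrap failed", e);
        }
    }

    /** True if the synchronous Load and Save interfaces are supported. */
    static boolean supportsLoadSave(DOMImplementation impl) {
        return impl != null && impl.hasFeature("LS", "3.0");
    }

    /** True if the asynchronous mode is supported as well. */
    static boolean supportsAsyncLoadSave(DOMImplementation impl) {
        return impl != null && impl.hasFeature("LS-Async", "3.0");
    }

    public static void main(String[] args) {
        DOMImplementation impl = findImplementation();
        System.out.println("LS 3.0 supported:       " + supportsLoadSave(impl));
        System.out.println("LS-Async 3.0 supported: " + supportsAsyncLoadSave(impl));
    }
}
```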
For additional information about
Parser or write operations may throw an LSException
if the processing is stopped. The processing can be stopped due to
a DOMError with a severity of DOMError.SEVERITY_FATAL_ERROR
or a non recovered DOMError.SEVERITY_ERROR, or if
DOMErrorHandler.handleError() returned false.
As suggested in the definition of the constants in the
DOMError interface, a DOM implementation may choose
to continue after a fatal error, but the resulting DOM tree is
then implementation dependent.
An integer indicating the type of error generated.
If an attempt was made to load a document, or an XML Fragment,
using LSParser
and the processing has been stopped.
If an attempt was made to serialize a Node
using
LSSerializer
and the processing has been stopped.
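As a non-normative sketch of how an application might observe a stopped parse, the following Java class feeds text to a synchronous LSParser and returns the exception code. The class name LsExceptionDemo and the sample input are hypothetical. The sketch assumes the Java binding, where LSException exposes a public code field and the constants PARSE_ERR and SERIALIZE_ERR.

```java
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class LsExceptionDemo {

    static DOMImplementationLS loadSave() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return (DOMImplementationLS) registry.getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("DOM bootstrap failed", e);
        }
    }

    /** Returns the LSException code raised while parsing, or -1 if parsing succeeded. */
    static short parseErrorCode(String xml) {
        DOMImplementationLS ls = loadSave();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        try {
            parser.parse(input);
            return -1;
        } catch (LSException e) {
            // Processing was stopped, e.g. by a fatal well-formedness error.
            return e.code;
        }
    }

    public static void main(String[] args) {
        short code = parseErrorCode("<a><b></a>"); // mismatched tags
        System.out.println(code == LSException.PARSE_ERR ? "PARSE_ERR" : "code " + code);
    }
}
```

An implementation without a registered error handler may additionally log the underlying error itself; that behavior is implementation dependent.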
DOMImplementationLS contains the factory methods for
creating Load and Save objects.
The expectation is that an instance of the
DOMImplementationLS interface can be obtained by
using binding-specific casting methods on an instance of the
DOMImplementation interface or, if the
Document supports the feature "Core" version "3.0"
defined in DOMImplementation.getFeature
with parameter values "LS" (or "LS-Async") and
"3.0" (respectively).
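Both ways of obtaining a DOMImplementationLS can be sketched in Java as shown below. This is a non-normative illustration with a hypothetical class name, LsBootstrap; it assumes a DOM Level 3 Core implementation reachable through DOMImplementationRegistry. Either method returns null when the implementation does not expose the interface that way.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;

final class LsBootstrap {

    static DOMImplementation implementation() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return registry.getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("DOM bootstrap failed", e);
        }
    }

    /** Binding-specific cast: in Java this is an ordinary type cast. */
    static DOMImplementationLS viaCast(DOMImplementation impl) {
        return (impl instanceof DOMImplementationLS) ? (DOMImplementationLS) impl : null;
    }

    /** Feature lookup through DOMImplementation.getFeature("LS", "3.0"). */
    static DOMImplementationLS viaGetFeature(DOMImplementation impl) {
        Object feature = impl.getFeature("LS", "3.0");
        return (feature instanceof DOMImplementationLS) ? (DOMImplementationLS) feature : null;
    }

    public static void main(String[] args) {
        DOMImplementation impl = implementation();
        System.out.println("via cast:       " + (viaCast(impl) != null));
        System.out.println("via getFeature: " + (viaGetFeature(impl) != null));
    }
}
```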
Integer parser mode constants.
Create a synchronous LSParser.
Create an asynchronous LSParser.
Create a new LSParser. The newly constructed
parser may then be configured by means of its
DOMConfiguration object, and used to parse documents by
means of its parse method.
The mode argument is either MODE_SYNCHRONOUS or
MODE_ASYNCHRONOUS. If mode is MODE_SYNCHRONOUS, the
LSParser that is created will operate in synchronous mode;
if it is MODE_ASYNCHRONOUS, the LSParser that is created
will operate in asynchronous mode.
An absolute URI representing the type of the Document
using the newly created LSParser. Note that no lexical
checking is done on the absolute URI. In order to create a
LSParser for any kind of schema types
(i.e. the LSParser will be free to use any schema found),
use the value null.
For W3C XML Schema, use the value "http://www.w3.org/2001/XMLSchema".
For XML DTD, use the value "http://www.w3.org/TR/REC-xml". Other
schema languages are outside the scope of the W3C and therefore
should recommend an absolute URI in order to use this
method.
The newly created LSParser object. This
LSParser is either synchronous or asynchronous
depending on the value of the mode argument.
By default, the newly created LSParser does
not contain a DOMErrorHandler, i.e. the value
of the "error-handler" configuration parameter is
null. However,
implementations may provide a default error handler at
creation time. In that case, the initial value of the
"error-handler" configuration parameter on the
new LSParser object contains a reference to
the default error handler.
NOT_SUPPORTED_ERR: Raised if the requested mode or schema type is not supported.
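The following non-normative Java sketch shows one way an application might call createLSParser and react to NOT_SUPPORTED_ERR by falling back to the synchronous mode. The class and method names (LsParserFactory, createParser, parseRootName) are hypothetical, and the schema type is left as null so that the parser is free to use any schema found.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class LsParserFactory {

    static DOMImplementationLS loadSave() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return (DOMImplementationLS) registry.getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("DOM bootstrap failed", e);
        }
    }

    /** Tries the asynchronous mode first if requested, else uses the synchronous mode. */
    static LSParser createParser(DOMImplementationLS ls, boolean preferAsync) {
        if (preferAsync) {
            try {
                return ls.createLSParser(DOMImplementationLS.MODE_ASYNCHRONOUS, null);
            } catch (DOMException e) {
                if (e.code != DOMException.NOT_SUPPORTED_ERR) {
                    throw e;
                }
                // Asynchronous mode is not available; fall back below.
            }
        }
        return ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    }

    /** Parses the text with a synchronous parser and returns the root element name. */
    static String parseRootName(String xml) {
        DOMImplementationLS ls = loadSave();
        LSParser parser = createParser(ls, false);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        Document doc = parser.parse(input);
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) {
        LSParser parser = createParser(loadSave(), true);
        System.out.println("asynchronous parser: " + parser.getAsync());
        System.out.println("root element: " + parseRootName("<a/>"));
    }
}
```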
Create a new LSSerializer object.
The newly created LSSerializer object.
By default, the newly created LSSerializer
has no DOMErrorHandler, i.e. the value of the
"error-handler" configuration parameter is
null. However, implementations may provide a
default error handler at creation time. In that case, the
initial value of the "error-handler"
configuration parameter on the new
LSSerializer object contains a reference to the
default error handler.
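A minimal, non-normative Java sketch of creating an LSSerializer and serializing a small document to a string follows. The class name LsSerializeToString and the document content are hypothetical. The exact form of the XML declaration in the result is implementation dependent, so the sketch does not rely on it.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

final class LsSerializeToString {

    static DOMImplementation implementation() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return registry.getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("DOM bootstrap failed", e);
        }
    }

    /** Builds <greeting>hello</greeting> in memory and serializes it. */
    static String serializeGreeting() {
        DOMImplementation impl = implementation();
        Document doc = impl.createDocument(null, "greeting", null);
        doc.getDocumentElement().setTextContent("hello");

        LSSerializer serializer = ((DOMImplementationLS) impl).createLSSerializer();
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) {
        System.out.println(serializeGreeting());
    }
}
```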
Create a new empty input source object where
LSInput.characterStream, LSInput.byteStream,
LSInput.stringData, LSInput.systemId,
LSInput.publicId, LSInput.baseURI,
and LSInput.encoding are null, and
LSInput.certifiedText is false.
The newly created input object.
Create a new empty output destination object where
LSOutput.characterStream, LSOutput.byteStream,
LSOutput.systemId, and LSOutput.encoding are null.
The newly created output object.
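As a non-normative sketch, an application might fill in an empty LSOutput and hand it to an LSSerializer as shown below. The class name LsWriteToOutput and the sample document are hypothetical; a java.io.StringWriter stands in for the character stream destination.

```java
import java.io.StringWriter;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

final class LsWriteToOutput {

    static DOMImplementation implementation() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return registry.getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("DOM bootstrap failed", e);
        }
    }

    /** Serializes <note>saved</note> into a character stream destination. */
    static String writeNote() {
        DOMImplementation impl = implementation();
        DOMImplementationLS ls = (DOMImplementationLS) impl;

        Document doc = impl.createDocument(null, "note", null);
        doc.getDocumentElement().setTextContent("saved");

        // All attributes of a freshly created LSOutput are null.
        LSOutput output = ls.createLSOutput();
        StringWriter sink = new StringWriter();
        output.setCharacterStream(sink);
        output.setEncoding("UTF-8");

        LSSerializer serializer = ls.createLSSerializer();
        boolean written = serializer.write(doc, output);
        if (!written) {
            throw new IllegalStateException("serialization did not complete");
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeNote());
    }
}
```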
An interface to an object that is able to build, or augment, a DOM tree from various input sources.
LSParser provides an API for parsing XML and
building the corresponding DOM document structure. A
LSParser instance can be obtained by invoking the
DOMImplementationLS.createLSParser() method.
As specified in
there will never be two adjacent nodes of type NODE_TEXT, and there will never be empty text nodes.
it is expected that the value
and
nodeValue
attributes of an Attr
node initially return the true
, depending on the attribute
normalization used, the attribute values may differ from the
ones obtained by the XML 1.0 attribute
normalization. false
, the XML 1.0 attribute
normalization is guaranteed to occur, and if the attributes
list does not contain namespace declarations, the
attributes
attribute on Element
node represents the property
[attributes] defined in
Asynchronous LSParser objects are expected to also
implement the events::EventTarget interface so that
event listeners can be registered on asynchronous
LSParser objects.
Events supported by asynchronous LSParser objects are:
The LSParser finishes loading the
document. See also the definition of the
LSLoadEvent interface.
The LSParser signals progress as data is
parsed.
This specification does not attempt to define exactly when progress events should be dispatched, that is intentionally left as implementation dependent, but here is one example of how an application might dispatch progress events. Once the parser starts receiving data, a progress event is dispatched to indicate that the parsing starts, then from there on, a progress event is dispatched for every 4096 bytes of data that is received and processed. This is only one example, though, and implementations can choose to dispatch progress events at any time while parsing, or not dispatch them at all.
See also the definition of the LSProgressEvent interface.
All events defined in this specification use the namespace URI
"http://www.w3.org/2002/DOMLS".
While parsing an input source, errors are reported to the
application through the error handler
(
Raised if the parameter "
Raised if the configuration parameter "
Raised when loading a document and no input is specified
in the
Raised if a processing instruction is encountered in a
location where the base URI of the processing
instruction can not be preserved.
One example of a case where this warning will be raised is
if the configuration parameter "
And
An implementation dependent warning that may be raised
if the configuration parameter "
Raised if the configuration parameter "
Raised if an unsupported encoding is encountered.
Raised if the configuration parameter "LSParser.domConfig
's "DOMError.type
) of errors and warnings defined by
this specification are:
true
and a doctype is encountered.
LSInput
object.
false
and the following XML file is
parsed:
subdir/myentity.ent
contains:
true
and an unbound namespace
prefix is encountered in an entity's replacement
text. Raising this warning is not enforced since some
existing parsers may not recognize unbound namespace
prefixes in the replacement text of entities.
false
and a character is
encountered for which the processor cannot determine the
normalization properties.
true
and an unsupported media type
is encountered.
In addition to raising the defined errors and warnings, implementations are expected to raise implementation specific errors and warnings for any other error and warning cases such as IO errors (file not found, permission denied,...), XML well-formedness errors, and so on.
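The following non-normative Java sketch registers a DOMErrorHandler through the "error-handler" configuration parameter and records the DOMError.type of every report it receives. The class name LsErrorCollector and the sample inputs are hypothetical. Which type strings are reported for well-formedness errors is implementation specific, so the sketch only collects them.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class LsErrorCollector {

    static DOMImplementationLS loadSave() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return (DOMImplementationLS) registry.getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("DOM bootstrap failed", e);
        }
    }

    /** Parses the text and returns the types of all errors and warnings reported. */
    static List<String> collectErrorTypes(String xml) {
        DOMImplementationLS ls = loadSave();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

        List<String> seen = new ArrayList<>();
        DOMErrorHandler handler = error -> {
            seen.add(String.valueOf(error.getType()));
            return false; // ask the parser to stop processing
        };
        parser.getDomConfig().setParameter("error-handler", handler);

        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        try {
            parser.parse(input);
        } catch (LSException e) {
            // Processing was stopped; the details are already in 'seen'.
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectErrorTypes("<a><b></a>"));
    }
}
```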
The DOMConfiguration object used when parsing an
input source. This DOMConfiguration is specific to
the parse operation and no parameter values from this
DOMConfiguration object are passed automatically to
the DOMConfiguration object on the
Document that is created, or used, by the parse
operation. The DOM application is responsible for passing any
needed parameter values from this DOMConfiguration
object to the DOMConfiguration object referenced by
the Document object.
In addition to the parameters recognized in on the DOMConfiguration
objects for LSParser
add or modify the following parameters:
[
If a higher level protocol such as HTTP LSInput
overrides any encoding from
the protocol.
[
The parser ignores any character set encoding information from higher-level protocols.
[
Throw a fatal "doctype-not-allowed" error if a doctype node is found while parsing the document. This is useful when dealing with things like SOAP envelopes where doctype nodes are not allowed.
[
Allow doctype nodes in the document.
[
If, while verifying full normalization when
This parameter is ignored for
[
Report a fatal "unknown-character-denormalization" error if a character is encountered for which the processor cannot determine the normalization properties.
See the definition of DOMConfiguration
for
a description of this parameter. Unlike in true
for LSParser
.
[
Perform the namespace processing as defined in
[
Do not perform the namespace processing.
[
A reference to a LSResourceResolver
object, or null. If the value of this parameter is not
null when an external resource (such as an external XML
entity or an XML schema location) is encountered, the
implementation will request that the
LSResourceResolver referenced in this
parameter resolves the resource.
[
Check that the media type of the parsed resource
is a supported media type. If an unsupported media
type is encountered, a fatal error of type
"unsupported-media-type" will be
raised. The media types defined in
[
Accept any media type.
The parameter "false
.
When a filter is provided, the implementation will call out to the filter as it is constructing the DOM tree structure. The filter can choose to remove elements from the document being constructed, or to terminate the parsing early.
The filter is invoked after the operations requested by the
DOMConfiguration
parameters have been applied. For
example, if "true
, the validation is done before
invoking the filter.
true if the LSParser is asynchronous,
false if it is synchronous.
true if the LSParser is currently
busy loading a document, otherwise false.
Parse an XML document from a resource identified by a
LSInput.
The LSInput from which the source of
the document is to be read.
If the LSParser is a synchronous LSParser, the newly
created and populated Document is returned. If the
LSParser is asynchronous, null is
returned since the document object may not yet be constructed
when this method returns.
INVALID_STATE_ERR: Raised if the LSParser's
LSParser.busy attribute is true.
PARSE_ERR: Raised if the LSParser
was unable to
load the XML document. DOM applications should attach a
DOMErrorHandler
using the parameter "
Parse an XML document from a location identified by a
URI reference
The location of the XML document to be read.
If the LSParser is a synchronous LSParser, the newly
created and populated Document is returned, or null if an
error occurred. If the LSParser is asynchronous,
null is returned since the document object may
not yet be constructed when this method returns.
INVALID_STATE_ERR: Raised if the LSParser.busy
attribute is true.
PARSE_ERR: Raised if the LSParser
was unable to
load the XML document. DOM applications should attach a
DOMErrorHandler
using the parameter "
A set of possible actions for the parseWithContext method.
Append the result of the parse operation as children of the
context node. For this action to work, the context node must
be an Element or a DocumentFragment.
Replace all the children of the context node with the result
of the parse operation. For this action to work, the context
node must be an Element, a Document, or a DocumentFragment.
Insert the result of the parse operation as the immediately
preceding sibling of the context node. For this action to
work the context node's parent must be an
Element or a DocumentFragment.
Insert the result of the parse operation as the immediately
following sibling of the context node. For this action to
work the context node's parent must be an
Element or a DocumentFragment.
Replace the context node with the result of the parse
operation. For this action to work, the context node must
have a parent, and the parent must be an
Element or a DocumentFragment.
Parse an XML fragment from a resource identified by a
LSInput and insert the content into an existing
document at the position specified with the
context and action arguments. When
parsing the input stream, the context node (or its parent,
depending on where the result will be inserted) is used for
resolving unbound namespace prefixes. The context node's
ownerDocument node (or the node itself if the
node is of type DOCUMENT_NODE) is used to resolve
default attributes and entity references.
As the new data is inserted into the document, at least one mutation event is fired per new immediate child or sibling of the context node.
If the context node is a Document node and the
action is ACTION_REPLACE_CHILDREN, then the
document that is passed as the context node will be changed
such that its xmlEncoding, documentURI, xmlVersion,
inputEncoding, xmlStandalone, and all
other such attributes are set to what they would be set to if
the input source was parsed using LSParser.parse().
This method is always synchronous, even if the
LSParser is asynchronous (LSParser.async is true).
If an error occurs while parsing, the caller is notified through
the ErrorHandler
instance associated with the
"DOMConfiguration
.
When calling parseWithContext
, the values of the
following configuration parameters will be ignored and their
default values will always be used instead: "LSParserFilter
just as if a
whole document was parsed.
The LSInput from which the source document
is to be read. The source document must be an XML
fragment, i.e. anything except a complete XML document
(except in the case where the context node is of type
DOCUMENT_NODE, and the action is
ACTION_REPLACE_CHILDREN), a DOCTYPE (internal
subset), entity declaration(s), notation declaration(s),
or XML or text declaration(s).
The node that is used as the context for the data that is
being parsed. This node must be a Document
node, a DocumentFragment node, or a node of a
type that is allowed as a child of an Element
node, e.g. it cannot be an Attribute node.
This parameter describes which action should be taken
between the new set of nodes being inserted and the
existing children of the context node. The set of possible
actions is defined in ACTION_TYPES above.
Return the node that is the result of the parse operation. If the result is more than one top-level node, the first one is returned.
HIERARCHY_REQUEST_ERR: Raised if the content cannot
replace, be inserted before, after, or as a
child of the context node (see also
Node.insertBefore
or
Node.replaceChild
in
NOT_SUPPORTED_ERR: Raised if the LSParser
doesn't support this method, or if the context node is of
type Document and the DOM implementation
doesn't support the replacement of the
DocumentType child or Element child.
NO_MODIFICATION_ALLOWED_ERR: Raised if the context node is a
INVALID_STATE_ERR: Raised if the LSParser.busy
attribute is true.
PARSE_ERR: Raised if the LSParser
was unable to
load the XML fragment. DOM applications should attach a
DOMErrorHandler
using the parameter "
Abort the loading of the document that is currently being
loaded by the LSParser. If the
LSParser is currently not busy, a call to this
method does nothing.
This interface represents an input source for data.
This interface allows an application to encapsulate information about an input source in a single object, which may include a public identifier, a system identifier, a byte stream (possibly with a specified encoding), a base URI, and/or a character stream.
The exact definitions of a byte stream and a character stream are binding dependent.
The application is expected to provide objects that implement
this interface whenever such objects are needed. The application
can either provide its own objects that implement this
interface, or it can use the generic factory method
DOMImplementationLS.createLSInput() to create
objects that implement this interface.
The LSParser will use the LSInput object to determine how to
read data. The LSParser will look at the different inputs
specified in the LSInput in the following order to know which
one to read from; the first one that is not null and not an
empty string will be used:
LSInput.characterStream
LSInput.byteStream
LSInput.stringData
LSInput.systemId
LSInput.publicId
If all inputs are null, the LSParser will report a
DOMError with its DOMError.type set to
"no-input-specified" and its DOMError.severity set to
DOMError.SEVERITY_FATAL_ERROR.
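The lookup order above can be illustrated with the following non-normative Java sketch. The class name LsInputOrder and the element names are hypothetical. The first helper sets both a character stream and string data on one LSInput, so a parser that follows the order above reads the character stream. The second helper passes an LSInput with no input at all and relies on the parse being stopped with an LSException after the fatal error.

```java
import java.io.StringReader;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class LsInputOrder {

    static DOMImplementationLS loadSave() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return (DOMImplementationLS) registry.getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("DOM bootstrap failed", e);
        }
    }

    /** An input that specifies two sources; characterStream comes first in the order. */
    static LSInput bothSources() {
        LSInput input = loadSave().createLSInput();
        input.setCharacterStream(new StringReader("<fromReader/>"));
        input.setStringData("<fromString/>");
        return input;
    }

    /** Parses the input synchronously and returns the root element name. */
    static String rootNameOf(LSInput input) {
        LSParser parser = loadSave().createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        Document doc = parser.parse(input);
        return doc.getDocumentElement().getNodeName();
    }

    /** True if parsing an LSInput with no input specified is stopped with an LSException. */
    static boolean failsWithoutInput() {
        DOMImplementationLS ls = loadSave();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        try {
            parser.parse(ls.createLSInput());
            return false;
        } catch (LSException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("source used: " + rootNameOf(bothSources()));
        System.out.println("empty input rejected: " + failsWithoutInput());
    }
}
```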
LSInput objects belong to the application. The DOM
implementation will never modify them (though it may make copies
and modify the copies, if necessary).
An attribute of a language and binding dependent type that
represents a stream of
An attribute of a language and binding dependent type that represents a stream of bytes.
If the application knows the character encoding of the byte stream, it should set the encoding attribute. Setting the encoding in this way will override any encoding specified in an XML declaration in the data.
String data to parse. If provided, this will always be treated
as a sequence of
The system identifier, a URI reference LSParser
will
only attempt to fetch the resource identified by the URI
reference if there is no other input available in the
input source).
If the application knows the character encoding of the object
pointed to by the system identifier, it can set the encoding
using the encoding attribute.
If the specified system ID is a relative URI reference (see
section 5 in baseURI
as the base, if that fails, the behavior is
implementation dependent.
The public identifier for this input source. This may be mapped to an input source using an implementation dependent mechanism (such as catalogues or other mappings). The public identifier, if specified, may also be reported as part of the location information when errors are reported.
The base URI to be used (see section 5.1.4 in systemId
to an absolute URI.
If, when used, the base URI is itself a relative URI, an empty string, or null, the behavior is implementation dependent.
The character encoding, if known. The encoding must be a
string acceptable for an XML encoding declaration (
This attribute has no effect when the application provides a
character stream or string data. For other sources of input, an
encoding specified by means of this attribute will override
any encoding specified in the XML declaration or the Text
declaration, or an encoding obtained from a higher level
protocol, such as HTTP
If set to true, assume that the input is certified (see
section 2.13 in
LSResourceResolver provides a way for applications
to redirect references to external resources.
Applications needing to implement custom handling for external
resources can implement this interface and register their
implementation by setting the "resource-resolver" parameter of
DOMConfiguration objects attached to
LSParser and LSSerializer. It can also
be registered on DOMConfiguration objects attached to
Document if the "LS" feature is supported.
The LSParser will then allow the application to
intercept any external entities, including the external DTD subset
and external parameter entities, before including them. The
top-level document entity is never passed to the
resolveResource method.
Many DOM applications will not need to implement this interface, but it will be especially useful for applications that build XML documents from databases or other specialized input sources, or for applications that use URNs.
LSResourceResolver is based on the SAX2 EntityResolver interface.
Allow the application to resolve external resources.
The LSParser will call this method before opening
any external resource, including the external DTD subset,
external entities referenced within the DTD, and external
entities referenced within the document element (however, the
top-level document entity is not passed to this method). The
application may then request that the LSParser
resolve the external resource itself, that it use an alternative
URI, or that it use an entirely different input source.
Application writers can use this method to redirect external system identifiers to secure and/or local URIs, to look up public identifiers in a catalogue, or to read an entity from a database or other input source (including, for example, a dialog box).
The type of the resource being resolved. For XML, the value is
"http://www.w3.org/TR/REC-xml"; for XML Schema, the value is
"http://www.w3.org/2001/XMLSchema". Other
types of resources are outside the scope of this
specification and therefore should recommend an absolute
URI in order to use this method.
The namespace of the resource being resolved, e.g. the
target namespace of the XML Schema
The public identifier of the external entity being
referenced, or null if no public identifier
was supplied or if the resource is not an entity.
The system identifier, a URI reference null
if no system identifier was supplied.
The absolute base URI of the resource being parsed, or
null if there is no base URI.
A LSInput object describing the new input
source, or null to request that the parser open
a regular URI connection to the resource.
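A non-normative Java sketch of a resolver that serves an external entity from memory is given below. The class name LsResolverDemo, the system identifier "urn:example:greeting", and the entity content are hypothetical. For any other resource the resolver returns null, which asks the parser to open the resource itself.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSResourceResolver;

final class LsResolverDemo {

    static DOMImplementationLS loadSave() {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            return (DOMImplementationLS) registry.getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("DOM bootstrap failed", e);
        }
    }

    /** Parses a document whose external entity is supplied by the resolver. */
    static String resolveAndRead() {
        DOMImplementationLS ls = loadSave();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);

        LSResourceResolver resolver = (type, namespaceURI, publicId, systemId, baseURI) -> {
            if (!"urn:example:greeting".equals(systemId)) {
                return null; // let the parser resolve everything else itself
            }
            LSInput replacement = ls.createLSInput();
            replacement.setSystemId(systemId);
            replacement.setStringData("hello"); // in-memory entity content
            return replacement;
        };
        parser.getDomConfig().setParameter("resource-resolver", resolver);

        LSInput input = ls.createLSInput();
        input.setStringData(
            "<!DOCTYPE r [<!ENTITY e SYSTEM \"urn:example:greeting\">]><r>&e;</r>");
        Document doc = parser.parse(input);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(resolveAndRead());
    }
}
```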
LSParserFilters provide applications the ability to
examine nodes as they are being constructed while parsing.
As each node is examined, it may be modified or removed,
or the entire parse may be terminated early.
At the time any of the filter methods are called by the parser,
the owner Document and DOMImplementation objects exist and are
accessible. The document element is never passed to the
LSParserFilter methods, i.e. it is not possible to
filter out the document element. Document, DocumentType,
Notation, Entity, and Attr nodes are never passed
to the acceptNode method on the filter. The child
nodes of an EntityReference node are passed to the
filter if the parameter "false
. Note that, as described by the
parameter "
All validity checking while parsing a document occurs on the source document as it appears on the input stream, not on the DOM document as it is built in memory. With filters, the document in memory may be a subset of the document on the stream, and its validity may have been affected by the filtering.
All default attributes must be present on elements when the elements are passed to the filter methods. All other default content must be passed to the filter methods.
DOM applications must not raise exceptions in a filter. The effect of throwing exceptions from a filter is DOM implementation dependent.
Constants returned by startElement and acceptNode.
Accept the node.
Reject the node and its children.
Skip this single node. The children of this node will still be considered.
Interrupt the normal processing of the document.
The parser will call this method after each
Element start tag has been scanned, but before
the remainder of the Element is processed. The
intent is to allow the element, including any children, to be
efficiently skipped. Note that only element nodes are passed
to the startElement function.
The element node passed to startElement for
filtering will include all of the Element's attributes,
but none of the children nodes. The Element may not yet be
in place in the document being constructed (it may not have
a parent node.)
A startElement filter function may access or change the
attributes for the Element. Changing Namespace declarations will
have no effect on namespace resolution by the parser.
For efficiency, the Element node passed to the filter may not be the same one as is actually placed in the tree if the node is accepted. And the actual node (node object identity) may be reused during the process of reading in and filtering a document.
The newly encountered element. At the time this method is called, the element is incomplete - it will have its attributes, but no children.
FILTER_ACCEPT
if the Element
should be included in the DOM document being built.
FILTER_REJECT
if the Element
and all of its children should be rejected.
FILTER_SKIP
if the Element
should be skipped. All of its children are inserted in
place of the skipped Element
node.
FILTER_INTERRUPT if the filter wants to stop the processing of the document. Interrupting the processing of the document no longer guarantees that the resulting DOM tree is XML well-formed. The Element is rejected.
Returning any other values will result in unspecified behavior.
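As a rough illustration of the return values above, the following Java sketch installs a filter whose startElement rejects every element with a given tag name. It assumes the Java binding (package org.w3c.dom.ls) and an LS-capable implementation reachable through DOMImplementationRegistry. The names RejectElementsSketch, RejectByName, parseWithout and countElements are hypothetical and are not defined by this specification.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

class RejectElementsSketch {

    /** Hypothetical filter: rejects elements with one tag name, subtree included. */
    static final class RejectByName implements LSParserFilter {
        private final String rejectedName;

        RejectByName(String rejectedName) {
            this.rejectedName = rejectedName;
        }

        @Override
        public short startElement(Element element) {
            // Only the attributes are available here; no children have been parsed yet.
            return rejectedName.equals(element.getTagName())
                    ? LSParserFilter.FILTER_REJECT
                    : LSParserFilter.FILTER_ACCEPT;
        }

        @Override
        public short acceptNode(Node node) {
            // Completed nodes are kept unchanged in this sketch.
            return LSParserFilter.FILTER_ACCEPT;
        }

        @Override
        public int getWhatToShow() {
            return NodeFilter.SHOW_ELEMENT;
        }
    }

    /** Parses the XML text synchronously with the rejecting filter installed. */
    static Document parseWithout(String xml, String rejectedName) {
        DOMImplementationLS ls;
        try {
            ls = (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation registry available", e);
        }
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(new RejectByName(rejectedName));
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    /** Counts the elements named countedName that remain after filtering. */
    static int countElements(String xml, String rejectedName, String countedName) {
        return parseWithout(xml, rejectedName)
                .getElementsByTagName(countedName)
                .getLength();
    }

    public static void main(String[] args) {
        String xml = "<root><keep/><drop><inner/></drop><keep/></root>";
        System.out.println("keep:  " + countElements(xml, "drop", "keep"));
        System.out.println("drop:  " + countElements(xml, "drop", "drop"));
        System.out.println("inner: " + countElements(xml, "drop", "inner"));
    }
}
```

Rejecting in startElement rather than in acceptNode lets the parser avoid building the rejected subtree at all, which is the efficiency motivation stated above.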
This method will be called by the parser at the completion of the parsing of each node. The node and all of its descendants will exist and be complete. The parent node will also exist, although it may be incomplete, i.e. it may have additional children that have not yet been parsed. Attribute nodes are never passed to this function.
From within this method, the new node may be freely modified - children may be added or removed, text nodes modified, etc. The state of the rest of the document outside this node is not defined, and the effect of any attempt to navigate to, or to modify, any other part of the document is undefined.
For validating parsers, the checks are made on the original document, before any modification by the filter. No validity checks are made on any document modifications made by the filter.
If this new node is rejected, the parser might reuse the new node and any of its descendants.
The newly constructed element. At the time this method is called, the element is complete - it has all of its children (and their children, recursively) and attributes, and is attached as a child to its parent.
FILTER_ACCEPT
if this
Node
should be included in the DOM
document being built.
FILTER_REJECT
if the
Node
and all of its children should
be rejected.
FILTER_SKIP
if the Node
should be skipped and the Node
should
be replaced by all the children of the
Node
.
FILTER_INTERRUPT if the filter wants to stop the processing of the document. Interrupting the processing of the document no longer guarantees that the resulting DOM tree is XML well-formed. The Node is accepted and will be the last completely parsed node.
Tells the LSParser what types of nodes to show to the method LSParserFilter.acceptNode. If a node is not shown to the filter using this attribute, it is automatically included in the DOM document being built. See NodeFilter for definition of the constants. The constants SHOW_ATTRIBUTE, SHOW_DOCUMENT, SHOW_DOCUMENT_TYPE, SHOW_NOTATION, SHOW_ENTITY, and SHOW_DOCUMENT_FRAGMENT are meaningless here; those nodes will never be passed to LSParserFilter.acceptNode.
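The interplay between whatToShow and acceptNode can be sketched in Java as follows. This is an illustration under stated assumptions, not a normative example: it assumes the Java binding and an LS-capable implementation, and the names DropCommentsOnLoadSketch, CommentDropper, countComments and commentsAfterParsing are hypothetical. Only comment nodes are shown to the filter, so every other node type is included automatically.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

class DropCommentsOnLoadSketch {

    /** Hypothetical filter: asks to see comments only, and rejects each one. */
    static final class CommentDropper implements LSParserFilter {
        @Override
        public short startElement(Element element) {
            return LSParserFilter.FILTER_ACCEPT;
        }

        @Override
        public short acceptNode(Node node) {
            // Defensive type check; with SHOW_COMMENT only comments should arrive here.
            return node.getNodeType() == Node.COMMENT_NODE
                    ? LSParserFilter.FILTER_REJECT
                    : LSParserFilter.FILTER_ACCEPT;
        }

        @Override
        public int getWhatToShow() {
            return NodeFilter.SHOW_COMMENT;
        }
    }

    /** Recursively counts comment nodes at or below the given node. */
    static int countComments(Node node) {
        int count = node.getNodeType() == Node.COMMENT_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countComments(child);
        }
        return count;
    }

    /** Parses the XML text, optionally with the filter, and counts surviving comments. */
    static int commentsAfterParsing(String xml, boolean filtered) {
        DOMImplementationLS ls;
        try {
            ls = (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation registry available", e);
        }
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        if (filtered) {
            parser.setFilter(new CommentDropper());
        }
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        Document document = parser.parse(input);
        return countComments(document);
    }

    public static void main(String[] args) {
        String xml = "<doc><!-- a --><p>text<!-- b --></p></doc>";
        System.out.println("unfiltered: " + commentsAfterParsing(xml, false));
        System.out.println("filtered:   " + commentsAfterParsing(xml, true));
    }
}
```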
The constants used here are defined in
This interface represents a progress event object that notifies
the application about progress as a document is parsed. It extends
the Event
interface defined in
The units used for the attributes position
and
totalSize
are not specified and can be implementation
and input dependent.
The input source that is being parsed.
The current position in the input source, including all external entities and other resources that have been read.
The total size of the document including all external resources. This number might change as a document is being parsed if references to more external resources are seen. A value of 0 is returned if the total size cannot be determined or estimated.
This interface represents a load event object that signals the completion of a document load.
The document that finished loading.
The input source that was parsed.
An LSSerializer provides an API for serializing (writing) a DOM document out into XML. The XML data is written to a string or an output stream. Any changes or fixups made during the serialization affect only the serialized data. The Document object and its children are never altered by the serialization operation.
During serialization of XML data, namespace fixup is done as
defined in namespaceURI
of a Node
is
empty string, the serialization will treat them as
null
, ignoring the prefix if any.
LSSerializer accepts any node type for serialization. For nodes of type Document or Entity, well-formed XML will be created when possible (well-formedness is guaranteed if the document or entity comes from a parse operation and is unchanged since it was created). The serialized output for these node types is either an XML document or an External XML Entity, respectively, and is acceptable input for an XML parser. For all other types of nodes the serialized form is implementation dependent.
Within a Document, DocumentFragment, or Entity being serialized, Nodes are processed as follows:
Document
nodes are written, including the XML
declaration (unless the parameter "false
) and a DTD subset, if one exists
in the DOM. Writing a Document
node serializes
the entire document.
Entity nodes, when written directly by LSSerializer.write, output the entity expansion, but no namespace fixup is done. The resulting output will be valid as an external entity.
If the parameter "true
, EntityReference
nodes are serialized as an entity reference of the form
"&entityName;
" in the output. Child nodes
(the expansion) of the entity reference are ignored. If the
parameter "false
, only the children of the entity
reference are serialized. EntityReference
nodes
with no children (no corresponding Entity
node or
the corresponding Entity
nodes have no children)
are always serialized.
CDATASection nodes
containing content characters that
cannot be represented in the specified output encoding are
handled according to the "
If the parameter is set to true, CDATASection nodes are split, and the unrepresentable characters are serialized as numeric character references in ordinary content. The exact position and number of splits are not specified.
If the parameter is set to false
, unrepresentable
characters in a CDATASection
are reported as
"wf-invalid-character"
errors if the parameter
"true
. The error is not recoverable -
there is no mechanism for supplying alternative characters and
continuing with the serialization.
DocumentFragment
nodes are serialized by
serializing the children of the document fragment in the order
they appear in the document fragment.
All other node types (Element, Text, etc.) are serialized to their corresponding XML source form.
The serialization of a Node
does not always
generate a LSParser
might throw fatal
errors when parsing the resulting serialization.
Within the character data of a document (outside of markup), any characters that cannot be represented directly are replaced with character references. Occurrences of '<' and '&' are replaced by the predefined entities &lt; and &amp;. The other predefined entities (&gt;, &apos;, and &quot;) might not be used, except where needed (e.g. using &gt; in cases such as ']]>'). Any characters that cannot be represented directly in the output character encoding are serialized as numeric character references (and since character encoding standards commonly use hexadecimal representations of characters, using the hexadecimal representation when serializing character references is encouraged).
To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;". New line characters and other characters that cannot be represented directly in attribute values in the output character encoding are serialized as numeric character references.
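The escaping rules above can be observed with a short Java sketch. It assumes the Java binding, an LS-capable implementation, and that the parameter controlling the XML declaration is named "xml-declaration" as in the Java binding's implementations. The names EscapingSketch and serializeNote are hypothetical. The exact serialized form beyond the escaped characters is not asserted here.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class EscapingSketch {

    /** Looks up an implementation that supports the "LS" feature. */
    private static DOMImplementation implementation() {
        try {
            return DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation registry available", e);
        }
    }

    /** Builds a one-element document and serializes it to a string. */
    static String serializeNote(String text, String title) {
        DOMImplementation impl = implementation();
        Document document = impl.createDocument(null, "note", null);
        Element root = document.getDocumentElement();
        root.setAttribute("title", title); // attribute value with quotes
        root.setTextContent(text);         // character data with '<' and '&'

        LSSerializer serializer = ((DOMImplementationLS) impl).createLSSerializer();
        // Assumed parameter name; omits the XML declaration to keep the output short.
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
        return serializer.writeToString(document);
    }

    public static void main(String[] args) {
        System.out.println(serializeNote("1 < 2 & 3", "say \"hi\""));
    }
}
```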
Within markup, but outside of attributes, any occurrence of a character that cannot be represented in the output character encoding is reported as a DOMError fatal error. An example would be serializing the element <LaCañada/> with encoding="us-ascii". This will result in the generation of a DOMError "wf-invalid-character-in-node-name" (as proposed in "
When requested by setting the parameter "LSSerializer
to true, character normalization is
performed according to the definition of
When outputting Unicode data, whether a byte order mark is serialized, and whether the output is big-endian or little-endian, is implementation dependent.
Namespaces are fixed up during serialization: the serialization process will verify that namespace declarations, namespace prefixes and the namespace URI associated with elements and attributes are consistent. If inconsistencies are found, the serialized form of the document will be altered to remove them. The method used for doing the namespace fixup while serializing a document is the algorithm defined in Appendix B.1, "Namespace normalization", of
While serializing a document, the parameter "
While serializing, errors and warnings are reported to the
application through the error handler
(
Raised when writing to a
Raised if the configuration parameter "
Raised if an unsupported encoding is encountered.
LSSerializer.domConfig
's "DOMError.type
) of errors and
warnings defined by this specification are:
LSOutput
if no
output is specified in the LSOutput
.
true
and an entity whose
replacement text contains unbound namespace prefixes is
referenced in a location where there are no bindings for
the namespace prefixes.
In addition to raising the defined errors and warnings, implementations are expected to raise implementation-specific errors and warnings for any other error and warning cases, such as IO errors (file not found, permission denied, ...).
The DOMConfiguration
object used by the
LSSerializer
when serializing a DOM node.
In addition to the parameters recognized in on the DOMConfiguration
objects for
LSSerializer
adds, or modifies, the following
parameters:
[
Writes the document according to the rules specified
in true
will set the parameters "false
. Setting one of those
parameters to true
will set this
parameter to false
. Serializing an XML
1.1 document when "canonical-form" is
true
will generate a fatal error.
[
Do not canonicalize the output.
[
Use the Attr.specified
attribute to
decide what attributes should be discarded. Note
that some implementations might use whatever
information available to the implementation
(i.e. XML schema, DTD, the
Attr.specified
attribute, and so on) to
determine what attributes and content to discard if
this parameter is set to true
.
[
Keep all attributes and all content.
[
Format the output by adding whitespace to produce a pretty-printed, indented, human-readable form. The exact form of the transformations is not specified by this specification. Pretty-printing changes the content of the document and may affect the validity of the document; validating implementations should preserve validity.
[
Don't pretty-print the result.
[
If, while verifying full normalization when
"unknown-character-denormalization"
warning (instead of raising an error, if this
parameter is not set) and ignore any possible
denormalizations caused by these characters.
[
Report a fatal error if a character is encountered for which the processor cannot determine the normalization properties.
This parameter is equivalent to the one defined by
DOMConfiguration
in true
. While DOM
implementations are not required to support
[
If a Document
, Element
,
or Entity
node is serialized, the XML
declaration, or text declaration, should be
included. The version
(Document.xmlVersion
if the document
is a Level 3 document and the version is non-null,
otherwise use the value "1.0"), and the output
encoding (see LSSerializer.write
for
details on how to find the output encoding) are
specified in the serialized XML declaration.
[
Do not serialize the XML and text
declarations. Report a
"xml-declaration-needed"
warning if
this will cause problems (i.e. the serialized data
is of an XML version other than
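Setting serializer parameters through LSSerializer.domConfig might look like the following Java sketch. The parameter names "xml-declaration" and "format-pretty-print" are assumed here to be the names of the parameters described above, and the names SerializerConfigSketch and serialize are hypothetical. Because a parameter value may be optional, the sketch asks canSetParameter before enabling pretty-printing.

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class SerializerConfigSketch {

    /** Serializes a small document with or without the XML declaration. */
    static String serialize(boolean xmlDeclaration) {
        DOMImplementation impl;
        try {
            impl = DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation registry available", e);
        }
        Document document = impl.createDocument(null, "config", null);
        document.getDocumentElement().appendChild(document.createElement("entry"));

        LSSerializer serializer = ((DOMImplementationLS) impl).createLSSerializer();
        DOMConfiguration config = serializer.getDomConfig();

        // Assumed parameter name for including or omitting the XML declaration.
        config.setParameter("xml-declaration", Boolean.valueOf(xmlDeclaration));

        // Pretty-printing may be unsupported, so check before setting it.
        if (config.canSetParameter("format-pretty-print", Boolean.TRUE)) {
            config.setParameter("format-pretty-print", Boolean.TRUE);
        }
        return serializer.writeToString(document);
    }

    public static void main(String[] args) {
        System.out.println(serialize(true));
        System.out.println(serialize(false));
    }
}
```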
The end-of-line sequence of characters to be used in the XML being written out. Any string is supported, but XML treats only certain character sequences as end-of-line (see section 2.11, "End-of-Line Handling" in
On retrieval, the default value of this attribute is the implementation-specific default end-of-line sequence. DOM implementations should choose the default to match the usual convention for text files in the environment being used. Implementations must choose a default sequence that matches one of those allowed by XML 1.0 or XML 1.1, depending on the serialized content. Setting this attribute to null will reset its value to the default value.
When the application provides a filter, the serializer will call out to the filter before serializing each Node. The filter implementation can choose to remove the node from the stream or to terminate the serialization early.
The filter is invoked after the operations requested by the
DOMConfiguration
parameters have been applied. For
example, CDATA sections won't be passed to the filter if
"false
.
Serialize the specified node as described above in the general
description of the LSSerializer
interface. The
output is written to the supplied LSOutput
.
When writing to an LSOutput, the encoding is found by looking at the encoding information that is reachable through the LSOutput and the item to be written (or its owner document) in this order:
LSOutput.encoding
,
Document.inputEncoding
,
Document.xmlEncoding
.
If no encoding is reachable through the above properties, a default encoding of "UTF-8" will be used.
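The lookup order above can be sketched in Java as follows. This is only an illustration of the described fallback chain, not how any particular implementation is written; the names EncodingLookupSketch, firstAvailable and resolve are hypothetical.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.LSOutput;

class EncodingLookupSketch {

    /** Returns the first non-null encoding in the documented order, else "UTF-8". */
    static String firstAvailable(String outputEncoding, String inputEncoding, String xmlEncoding) {
        if (outputEncoding != null) {
            return outputEncoding;   // LSOutput.encoding
        }
        if (inputEncoding != null) {
            return inputEncoding;    // Document.inputEncoding
        }
        if (xmlEncoding != null) {
            return xmlEncoding;      // Document.xmlEncoding
        }
        return "UTF-8";              // documented default
    }

    /** Applies the same order to an LSOutput and the node to be written. */
    static String resolve(LSOutput output, Node node) {
        // Use the node itself if it is a Document, otherwise its owner document.
        Document document = (node instanceof Document)
                ? (Document) node
                : node.getOwnerDocument();
        String inputEncoding = document == null ? null : document.getInputEncoding();
        String xmlEncoding = document == null ? null : document.getXmlEncoding();
        return firstAvailable(output.getEncoding(), inputEncoding, xmlEncoding);
    }

    public static void main(String[] args) {
        System.out.println(firstAvailable(null, null, null));
        System.out.println(firstAvailable(null, "UTF-16", "ISO-8859-1"));
    }
}
```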
If the specified encoding is not supported an "unsupported-encoding" fatal error is raised. When outputting XML data, implementations are required to support the encodings "UTF-8", "UTF-16BE", and "UTF-16LE" to guarantee that data is serializable in all encodings that are required to be supported by all XML parsers.
If no output is specified in the LSOutput
, a
"no-output-specified" fatal error is raised.
The implementation is responsible for associating the appropriate media type with the serialized data.
When writing to an HTTP URI, an HTTP PUT is performed. When writing to other types of URIs, the mechanism for writing the data to the URI is implementation dependent.
The node to serialize.
The destination for the serialized DOM.
Returns true if node was successfully serialized. Returns false if the normal processing stopped but the implementation kept serializing the document; the result of the serialization is then implementation dependent.
SERIALIZE_ERR: Raised if the LSSerializer
was
unable to serialize the node. DOM applications should attach
a DOMErrorHandler
using the parameter
"
A convenience method that acts as if LSSerializer.write was called with an LSOutput with no encoding specified and LSOutput.systemId set to the uri argument.
The node to serialize.
The URI to write to.
Returns true if node was successfully serialized. Returns false if the normal processing stopped but the implementation kept serializing the document; the result of the serialization is then implementation dependent.
SERIALIZE_ERR: Raised if the LSSerializer
was
unable to serialize the node. DOM applications should attach
a DOMErrorHandler
using the parameter
"
Serialize the specified node as described above in the general
description of the LSSerializer
interface. The
output is written to a DOMString
that is returned
to the caller. The encoding used is the encoding of the
DOMString
type, i.e. UTF-16.
The node to serialize.
Returns the serialized data.
DOMSTRING_SIZE_ERR: Raised if the resulting string is too long to
fit in a DOMString
.
SERIALIZE_ERR: Raised if the LSSerializer
was unable to
serialize the node. DOM applications should attach a
DOMErrorHandler
using the parameter "
This interface represents an output destination for data.
This interface allows an application to encapsulate information about an output destination in a single object, which may include a URI, a byte stream (possibly with a specified encoding), a base URI, and/or a character stream.
The exact definitions of a byte stream and a character stream are binding dependent.
The application is expected to provide objects that implement
this interface whenever such objects are needed. The application
can either provide its own objects that implement this
interface, or it can use the generic factory method
DOMImplementationLS.createLSOutput()
to create
objects that implement this interface.
The LSSerializer will use the LSOutput object to determine where to serialize the output to. The LSSerializer will look at the different outputs specified in the LSOutput in the following order to know which one to output to; the first one that is not null and not an empty string will be used:
LSOutput.characterStream
LSOutput.byteStream
LSOutput.systemId
LSOutput
objects belong to the application. The
DOM implementation will never modify them (though it may make
copies and modify the copies, if necessary).
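The order in which the outputs are consulted can be illustrated with the Java sketch below, which sets both a character stream and a byte stream on one LSOutput. Under the order given above, only the character stream should receive data. The sketch assumes the Java binding, where the binding-dependent stream types are java.io.Writer and java.io.OutputStream; the names LSOutputPrioritySketch and writtenSizes are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringWriter;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

class LSOutputPrioritySketch {

    /**
     * Serializes a small document to an LSOutput that has both streams set.
     * Returns { characters written, bytes written }.
     */
    static int[] writtenSizes() {
        DOMImplementation impl;
        try {
            impl = DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation registry available", e);
        }
        DOMImplementationLS ls = (DOMImplementationLS) impl;
        Document document = impl.createDocument(null, "greeting", null);
        document.getDocumentElement().setTextContent("hello");

        StringWriter characters = new StringWriter();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        LSOutput output = ls.createLSOutput();
        output.setCharacterStream(characters); // consulted first
        output.setByteStream(bytes);           // consulted only if no character stream

        LSSerializer serializer = ls.createLSSerializer();
        serializer.write(document, output);

        return new int[] { characters.toString().length(), bytes.size() };
    }

    public static void main(String[] args) {
        int[] sizes = writtenSizes();
        System.out.println("characters written: " + sizes[0]);
        System.out.println("bytes written:      " + sizes[1]);
    }
}
```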
An attribute of a language and binding dependent type that
represents a writable stream to which
An attribute of a language and binding dependent type that represents a writable stream of bytes.
The system identifier, a URI reference
If the system ID is a relative URI reference (see section 5 in
The character encoding to use for the output. The encoding
must be a string acceptable for an XML encoding declaration
(
LSSerializerFilters provide applications with the ability to examine nodes as they are being serialized and to decide which nodes should be serialized. The LSSerializerFilter interface is based on the NodeFilter interface defined in
Document
, DocumentType
,
DocumentFragment
, Notation
,
Entity
, and children of Attr
nodes are
not passed to the filter. The child nodes of an
EntityReference
node are only passed to the filter if
the EntityReference
node is skipped by the method
LSParserFilter.acceptNode()
.
When serializing an Element
, the element is passed
to the filter before any of its attributes are passed to the
filter. Namespace declaration attributes, and default attributes
(except in the case when "false
), are never passed to the filter.
The result of any attempt to modify a node passed to a
LSSerializerFilter
is implementation dependent.
DOM applications must not raise exceptions in a filter. The effect of throwing exceptions from a filter is DOM implementation dependent.
For efficiency, a node passed to the filter may not be the same as the one that is actually in the tree, and the actual node (node object identity) may be reused while a document is being filtered and serialized.
Tells the LSSerializer what types of nodes to show to the filter. If a node is not shown to the filter using this attribute, it is automatically serialized. See NodeFilter for definition of the constants. The constants SHOW_DOCUMENT, SHOW_DOCUMENT_TYPE, SHOW_DOCUMENT_FRAGMENT, SHOW_NOTATION, and SHOW_ENTITY are meaningless here; such nodes will never be passed to an LSSerializerFilter.
Unlike SHOW_ATTRIBUTE
constant indicates that the
Attr
nodes are shown and passed to the filter.
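A serializer filter might be sketched in Java as follows. The example is deliberately limited to comment nodes, which have no children, so it does not depend on how an implementation treats the subtree of a rejected node. It assumes the Java binding and an LS-capable implementation; the names DropCommentsOnSaveSketch, DropComments and serialize are hypothetical.

```java
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.w3c.dom.ls.LSSerializerFilter;
import org.w3c.dom.traversal.NodeFilter;

class DropCommentsOnSaveSketch {

    /** Hypothetical filter: asks to see comments only, and removes them from the output. */
    static final class DropComments implements LSSerializerFilter {
        @Override
        public short acceptNode(Node node) {
            return node.getNodeType() == Node.COMMENT_NODE
                    ? NodeFilter.FILTER_REJECT
                    : NodeFilter.FILTER_ACCEPT;
        }

        @Override
        public int getWhatToShow() {
            // Nodes of other types are not shown and are serialized automatically.
            return NodeFilter.SHOW_COMMENT;
        }
    }

    /** Builds a document containing a comment and serializes it, optionally filtered. */
    static String serialize(boolean filtered) {
        DOMImplementation impl;
        try {
            impl = DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No DOM implementation registry available", e);
        }
        Document document = impl.createDocument(null, "doc", null);
        Element root = document.getDocumentElement();
        root.appendChild(document.createComment("internal note"));
        Element paragraph = document.createElement("p");
        paragraph.setTextContent("public text");
        root.appendChild(paragraph);

        LSSerializer serializer = ((DOMImplementationLS) impl).createLSSerializer();
        if (filtered) {
            serializer.setFilter(new DropComments());
        }
        // The in-memory document is not modified; only the serialized form changes.
        return serializer.writeToString(document);
    }

    public static void main(String[] args) {
        System.out.println(serialize(false));
        System.out.println(serialize(true));
    }
}
```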
The constants used here are defined in
This appendix contains the complete OMG IDL
The IDL files are also available as:
This appendix contains the complete Java
The Java files are also available as
This appendix contains the complete ECMAScript
Many people contributed to the DOM specifications (Level 1, 2 or 3), including participants of the DOM Working Group and the DOM Interest Group. We especially thank the following:
Andrew Watson (Object Management Group), Andy Heninger (IBM),
Angel Diaz (IBM), Arnaud Le Hors (W3C and IBM), Ashok Malhotra
(IBM and Microsoft), Ben Chang (Oracle), Bill Smith (Sun), Bill
Shea (Merrill Lynch), Bob Sutor (IBM), Chris Lovett (Microsoft),
Chris Wilson (Microsoft), David Brownell (Sun), David Ezell
(Hewlett-Packard Company), David Singer (IBM), Dimitris
Dimitriadis (Improve AB and invited expert), Don Park (invited),
Elena Litani (IBM), Eric Vasilik (Microsoft), Gavin Nicol
(INSO), Ian Jacobs (W3C), James Clark (invited), James Davidson
(Sun), Jared Sorensen (Novell), Jeroen van Rotterdam (X-Hive
Corporation), Joe Kesselman (IBM), Joe Lapp (webMethods), Joe
Marini (Macromedia), Johnny Stenback (Netscape/AOL), Jon
Ferraiolo (Adobe), Jonathan Marsh (Microsoft), Jonathan Robie
(Texcel Research and Software AG), Kim Adamson-Sharpe (SoftQuad
Software Inc.), Lauren Wood (SoftQuad Software Inc.,
Thanks to all those who have helped to improve this specification by sending suggestions and corrections (Please, keep bugging us with your issues!).
Many thanks to Elliotte Rusty Harold, Andrew Clover, Anjana Manian, Christian Parpart, Mikko Honkala, and François Yergeau for their review and comments of this document.
Special thanks to the
This specification was written in XML. The HTML, OMG IDL, Java and ECMAScript bindings were all produced automatically.
Thanks to Joe English, author of
After DOM Level 1, we used
Thanks also to Jan Kärrman, author of
Some of the following term definitions have been borrowed or modified from similar definitions in other W3C or standards documents. See the links within the definitions for more information.
The base unit of a DOMString
. This indicates that
indexing on a DOMString
occurs in units of 16 bits.
This must not be misunderstood to mean that a DOMString
can store arbitrary 16-bit units. A DOMString
is a
character string encoded in UTF-16; this means that the restrictions
of UTF-16 as well as the other relevant restrictions on character strings
must be maintained. A single character, for example in the form of a
numeric character reference, may correspond to one or two 16-bit units.
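The difference between 16-bit units and characters can be illustrated in Java, whose String type is likewise indexed in UTF-16 code units. This is an illustration only; the names Utf16UnitsSketch, units and characters are hypothetical.

```java
class Utf16UnitsSketch {

    /** Number of 16-bit units, which is what indexing and length are based on. */
    static int units(String text) {
        return text.length();
    }

    /** Number of characters (Unicode code points) in the same string. */
    static int characters(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane and needs a surrogate pair.
        String supplementary = new String(Character.toChars(0x1F600));
        System.out.println("units:      " + units(supplementary));
        System.out.println("characters: " + characters(supplementary));
    }
}
```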
An
An
An
The process by which an
The process by which an
A
A [client] application is any software that uses the Document Object Model programming interfaces provided by the hosting implementation to accomplish useful work. Some examples of client applications are scripts within an HTML or XML document.
The
A
A model for a document that represents the document
after it has been manipulated in some way. For example, any
combination of any of the following transformations would
create a cooked model:
Expansion of internal text entities.
Expansion of external entities.
Model augmentation with style-specified generated text.
Execution of style-specified reordering.
Execution of scripts.
A
A
When new releases of specifications are released, some older
features may be marked as being
A
There is only one document element in a Document
. This
element node is a child of the Document
node. See
There is an ordering,
The term "DOM Level 0" refers to a mix (not formally specified) of HTML document functionalities offered by Netscape Navigator version 3.0 and Microsoft Internet Explorer version 3.0. In some cases, attributes or methods have been included for reasons of backward compatibility with "DOM Level 0".
The programming language defined by the ECMA-262 standard
Each document contains one or more elements, the
boundaries of which are either delimited by start-tags and
end-tags, or, for empty elements by an empty-element tag.
Each element has a type, identified by name, and may have a
set of attributes. Each attribute has a name and a value.
See
An event is the representation of some asynchronous occurrence (such as a mouse click on the presentation of the element, or the removal of a child node from an element, or any of unthinkably many other possibilities) that gets associated with an
The object to which an
Two nodes are
Two nodes are NodeList
objects, and the pairs of
equivalent attributes must in fact be deeply equivalent.
Two NodeList
objects are
Two NamedNodeMap
objects are
Two DocumentType
nodes are NamedNodeMap
objects.
An information item is an abstract representation of some
component of an XML document. See the
Text
or CDATASection
nodes that can be visited
sequentially in Element
,
Comment
, or ProcessingInstruction
nodes.
A
A [hosting] implementation is a software module that provides an implementation of the DOM interfaces so that a client application can use them. Some examples of hosting implementations are browsers, editors and document repositories.
The HyperText Markup Language (
An Interface Definition Language (
Companies, organizations, and individuals that claim to support the Document Object Model as an API for their products.
In object-oriented programming, the ability to create new
classes (or interfaces) that contain all the methods and properties
of another class (or interface), plus additional methods and
properties. If class (or interface) D inherits from class (or
interface) B, then D is said to be
Also known as the
An
A programming
An object is
A
A
A
A
A
A node is a
An
A
A node in a DOM tree is
A
A
The
A
Two nodes are
When string matching is required, it is to occur as
though the comparison was between 2 sequences of code points
from
A document is
The target node is the node representing the
The process by which an
An information item such as an
The description given to various information items (for example,
attribute values of various types, but not including the StringType
CDATA) after having been processed by the XML processor. The process
includes stripping leading and trailing white space, and replacing
multiple space characters by one. See the definition of
A document is
See initial structure model.
A node is a
Extensible Markup Language (
See
An
For the latest version of any W3C specification please consult the list of
DOM Requirements for DOM Level 3 in
OMG IDL Syntax and Semantics defined in