Hypertext Markup Language (HTML) Tim Berners-Lee, CERN
Internet Draft Daniel Connolly, Atrium
IIIR Working Group June 1993
Hypertext Markup Language (HTML)
A Representation of Textual Information and MetaInformation
for Retrieval and Interchange
Status of this Document
This document is an Internet Draft. Internet Drafts are working
documents of the Internet Engineering Task Force (IETF), its Areas,
and its Working Groups. Note that other groups may also distribute
working documents as Internet Drafts.
Internet Drafts are working documents valid for a maximum of six
months. Internet Drafts may be updated, replaced, or obsoleted by
other documents at any time. It is not appropriate to use Internet
Drafts as reference material or to cite them other than as a
"working draft" or "work in progress".
Distribution of this document is unlimited. The document is a
draft form of a standard for interchange of information on the
network which is proposed to be registered as a MIME (RFC1341)
content type. Please send comments to timbl@info.cern.ch or the
discussion list www-talk@info.cern.ch.
This is version 1.2 of this draft. This document is available in
hypertext on the World-Wide Web as
https://info.cern.ch/hypertext/WWW/MarkUp/HTML.html
Abstract
HyperText Markup Language (HTML) can be used to represent
Hypertext news, mail, online documentation, and collaborative
hypermedia;
Menus of options;
Database query results;
Simple structured documents with inlined graphics;
Hypertext views of existing bodies of information.
The World Wide Web (W3) initiative links related information
throughout the globe. HTML provides one simple format for
providing linked information, and all W3 compatible programs are
required to be capable of handling HTML. W3 uses an Internet
Berners-Lee and Connolly 1
protocol (Hypertext Transfer Protocol, HTTP), which allows transfer
representations to be negotiated between client and server, the
result being returned in an extended MIME message. HTML is
therefore just one, but an important one, of the representations
used with W3.
HTML is proposed as a MIME content type.
HTML refers to the URL specification of RFCxxxx.
Implementations of HTML parsers and generators can be found in the
various W3 servers and browsers, in the public domain W3 code, and
may also be built using various public domain SGML parsers such as
[SGMLS]. HTML is an SGML document type with fairly generic
semantics appropriate for representing information from a wide
range of applications. It is more generic than many specific SGML
applications, but is still completely device-independent.
IN THIS DOCUMENT
This document contains the following parts:
Vocabulary used in this document, degrees of imperative.
HTML and MIME with discussion of character sets.
HTML and SGML and the relationship between them, and
Structured text : an introduction for
beginners to SGML.
HTML Elements A list with description, example, and
typical rendering.
HTML Entities Entities used to describe characters.
The HTML DTD The text of the SGML DTD for HTML
Link relationship values .
A provisional list. Not part of the
standard.
Registration Authority
The authority for extending lists of valid
values.
References to related documents
Authors addresses Contact information.
Vocabulary
This specification uses the words below with the precise meaning
given.
Representation The encoding of information for interchange.
For example, HTML is a representation of
hypertext.
Rendering The form of presentation of information to
the human reader.
IMPERATIVES
may The implementation is not obliged to follow
this in any way.
must If this is not followed, the implementation
does not conform to this specification.
shall Same as "must".
should If this is not followed, though the
implementation officially conforms to the
standard, undesirable results may occur in
practice.
typical Typical rendering is described for many
elements. This is not a mandatory part of the
standard but is given as guidance for
designers and to help explain the uses for
which the elements were intended.
NOTES
Sections marked "Note:" are not mandatory parts of the
specification but for guidance only.
STATUS OF FEATURES
Mainstream All parsers must recognize these features.
Features are mainstream unless otherwise
mentioned.
Extra Standard HTML features which may safely be
ignored by parsers. It is legal to ignore
these and to treat the contents as though the
tags were not there (e.g. EM, and any
undefined elements).
Obsolete Not standard HTML. Parsers should implement
these features as far as possible in order to
preserve back-compatibility with previous
versions of this specification.
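Note: The treatment of "Extra" features described above can be sketched as follows. This is only an illustration, not a conforming SGML parser: it uses Python's standard html.parser module, the class and function names are hypothetical, the set of known tags is an assumption made for the example, and attributes on retained tags are dropped for brevity.

```python
from html.parser import HTMLParser


class TagIgnoringParser(HTMLParser):
    """Keep tags named in `known`; drop any other tag but keep its content."""

    def __init__(self, known):
        super().__init__(convert_charrefs=True)
        self.known = {name.lower() for name in known}
        self.out = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag in self.known:
            self.out.append(f"<{tag}>")  # attributes omitted in this sketch

    def handle_endtag(self, tag):
        if tag in self.known:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Content is always kept, whether or not the enclosing tag is known.
        self.out.append(data)


def strip_unknown(text, known=("h1", "p")):
    """Return text with unrecognised tags removed and their content kept."""
    parser = TagIgnoringParser(known)
    parser.feed(text)
    parser.close()  # flushes any buffered trailing text
    return "".join(parser.out)


print(strip_unknown("<h1>Title</h1>Some <em>emphasised</em> text"))
```

Here EM is treated as unrecognised, so its content is rendered as though the tags were not there.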
HTML AND MIME
The definition of the HTML content subtype is
MIME type name: text
MIME subtype name: html
Required parameters: none
Optional parameters: charset
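Note: As an illustration of the parameters listed above, the following sketch labels an HTML instance as text/html with the optional charset parameter. It uses Python's standard email package. The function name, the sample instance text, and the default of ISO-8859-1 are assumptions made for the example and are not part of this specification.

```python
from email.message import EmailMessage


def make_html_message(instance: str, charset: str = "iso-8859-1") -> EmailMessage:
    """Wrap an HTML instance in a MIME entity of type text/html."""
    msg = EmailMessage()
    # subtype="html" gives Content-Type: text/html; charset is the optional parameter.
    msg.set_content(instance, subtype="html", charset=charset)
    return msg


msg = make_html_message("<TITLE>A sample HTML instance</TITLE>")
print(msg.get_content_type())     # text/html
print(msg.get_content_charset())  # iso-8859-1
```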
Character sets
The base character set (the SGML BASESET) for HTML is ISO Latin-1.
This is the set referred to by any numeric character references.
The actual character set used in the representation of an HTML
document may be ISO Latin 1, or its 7-bit subset which is ASCII.
There is no obligation for an HTML document to contain any
characters above decimal 127. It is possible that a transport
medium such as electronic mail imposes constraints on the number of
bits in a representation of a document, though the HTTP access
protocol used by W3 always allows 8 bit transfer.
When an HTML document is encoded using 7-bit characters, then the
mechanisms of character references and entity references may be
used to encode characters in the upper half of the ISO Latin-1 set.
In this way, documents may be prepared which are suitable for
mailing through 7-bit limited systems.
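Note: A minimal sketch of the 7-bit preparation described above, assuming the document text is held as a Python string. The function name is hypothetical. Characters above decimal 127 are replaced by numeric character references; the example relies on the fact that ISO Latin-1 character numbers coincide with the first 256 Unicode code points.

```python
def to_seven_bit(text: str) -> bytes:
    """Encode text as ASCII, using numeric character references elsewhere."""
    # Reject anything outside the ISO Latin-1 base character set first;
    # this raises UnicodeEncodeError for such characters.
    text.encode("latin-1")
    # xmlcharrefreplace writes each non-ASCII character as &#NNN; (decimal).
    return text.encode("ascii", errors="xmlcharrefreplace")


print(to_seven_bit("Kurt Gödel"))  # b'Kurt G&#246;del'
```

The result contains only characters below decimal 128 and so can pass through a 7-bit limited mail system unchanged.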
INTRODUCTION
The HyperText Markup Language is defined in terms of the ISO
Standard Generalized Markup Language [SGML]. SGML is a system for
defining structured document types and markup languages to
represent instances of those document types.
Every SGML document has three parts:
An SGML declaration, which binds SGML processing quantities and
syntax token names to specific values. For example, the SGML
declaration in the HTML DTD specifies that the string that opens
a tag is < and the maximum length of a name is 40 characters.
A prologue including one or more document type declarations,
which specify the element types, element relationships and
attributes, and references that can be represented by markup.
The HTML DTD specifies, for example, that the HEAD element
contains at most one TITLE element.
An instance, which contains the data and markup of the document.
We use the term HTML to mean both the document type and the markup
language for representing instances of that document type.
All HTML documents share the same SGML declaration and prologue.
Hence implementations of the World-Wide Web generally only transmit
and store the instance part of an HTML document. To construct an
SGML document entity for processing by an SGML parser, it is
necessary to prefix the text from ``HTML DTD'' on page 10 to the
HTML instance.
Conversely, to implement an HTML parser, one need only implement
those parts of an SGML parser that are needed to parse an instance
after parsing the HTML DTD.
Structured Text
An HTML instance is like a text file, except that some of the
characters are interpreted as markup. The markup gives structure to
the document.
The instance represents a hierarchy of elements. Each element has a
name, some attributes, and some content. Most elements are
represented in the document as a start tag, which gives the name
and attributes, followed by the content, followed by the end tag.
For example:
A sample HTML instance
is legal, but these others are not:
Character Data
The keyword CDATA indicates that the content of an element is
character data. Character data is all the text up to the next end
tag open delimiter-in-context. For example:
<!ELEMENT XMP - - CDATA>
specifies that the following text is a legal XMP element:
<xmp>Here's an example. It looks like it has <tags> and
<!--comments--> in it, but it does not. Even this </ is data.</xmp>
The string </ is only recognized as the opening delimiter of an end
tag when it is ``in context,'' that is, when it is followed by a
letter. However, as soon as the end tag open delimiter is
recognized, it terminates the CDATA content. The following is an
error:
<xmp>There is no way to represent </end> tags in CDATA </xmp>
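Note: The rule above for finding the end of CDATA content can be sketched as a search for the first end tag open delimiter that is followed by a letter. This is an illustration only: the names are hypothetical, and the sketch does not check that the end tag found actually names the open element.

```python
import re

# "</" counts as an end tag open delimiter only when a letter follows it.
ETAGO_IN_CONTEXT = re.compile(r"</(?=[A-Za-z])")


def split_cdata(text: str):
    """Split the text after a CDATA start tag into (content, remainder)."""
    match = ETAGO_IN_CONTEXT.search(text)
    if match is None:
        return text, ""  # no end tag found: everything is character data
    return text[:match.start()], text[match.start():]


# "</ " is data because no letter follows it; "</xmp>" ends the content.
print(split_cdata("Even this </ is data.</xmp>"))

# "</end>" is recognized in context, so the content stops earlier than the
# writer presumably intended.
print(split_cdata("no way to represent </end> tags</xmp>"))
```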
Replaceable Character Data
Elements with RCDATA content behave much like those with CDATA,
except for character references and entity references. Elements
declared like:
<!ELEMENT TITLE - - RCDATA>
can have any sequence of characters in their content.
Character References
To represent a character that would otherwise be recognized as
markup, use a character reference. The string &# signals a
character reference when it is followed by a letter or a digit. The
delimiter is followed by the decimal character number and a
semicolon. For example:
You can even represent &#60;/end> tags in RCDATA
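Note: A minimal sketch of how a reader of RCDATA might resolve decimal character references. The names are hypothetical, and the sketch handles only the form described above (delimiter, decimal number, semicolon). It relies on ISO Latin-1 character numbers coinciding with the first 256 Unicode code points.

```python
import re

# Delimiter "&#", a decimal character number, and a closing semicolon.
CHAR_REF = re.compile(r"&#([0-9]+);")


def decode_char_refs(text: str) -> str:
    """Replace decimal character references with the characters they name."""
    return CHAR_REF.sub(lambda m: chr(int(m.group(1))), text)


print(decode_char_refs("represent &#60;/end> tags"))  # represent </end> tags
```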
Entity References
The HTML DTD declares entities for the less than, greater than, and
ampersand characters and each of the ISO Latin 1 characters so that
you can reference them by name rather than by number.
The string & signals an entity reference when it is followed by a
letter or a digit. The delimiter is followed by the entity name and
a semicolon. For example:
Kurt G&ouml;del was a famous logician and mathematician.
Note: To be sure that a string of characters has
no markup, HTML writers should represent all
occurrences of <, >, and & by character or
entity references.
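Note: The recommendation above can be sketched with the escape function from Python's standard html module, which replaces the three characters by the entity references &lt;, &gt; and &amp;. The wrapper name and the sample string are assumptions made for the example.

```python
import html


def escape_markup(text: str) -> str:
    """Replace <, > and & so that the text contains no markup."""
    # quote=False leaves quotation marks alone; only &, < and > are replaced.
    return html.escape(text, quote=False)


print(escape_markup("if a < b & b > c"))  # if a &lt; b &amp; b &gt; c
```

The ampersand is replaced first, so the ampersands introduced by the other two replacements are not escaped a second time.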
Element Content
Some elements have, instead of a keyword that states the type of
content, a content model, which tells what patterns of data and
nested elements are allowed. If the content model of an element
does not include the symbol #PCDATA , the content is element
content.
Whitespace in element content is considered markup and ignored. Any
characters that are not markup, that is, data characters, are
illegal.
For example:
declares an element that may be used as follows:
Head Example
But the following are illegal:
no data allowed!
Two isindex tags
Mixed Content
If the content model includes the symbol #PCDATA, the content of
the element is parsed as mixed content. For example:
<!ELEMENT PRE - - (#PCDATA | A | B | I | U | P)+>
This says that the PRE element contains one or more A, B, I, U, or
P elements or data characters. Here's an example of a PRE element:
An Example of Structure
Here's a typical paragraph.
- Item one has an anchor
- Here's item two.
NAME
cat -- concatenate files
EXAMPLE
cat
The content of the above PRE element is:
A B element
The string `` cat -- concatenate''
An A element
The string ``\n''
Another B element
The string ``\n cat
After the comment delimiter, all text up to the next occurrence of
-- is ignored. Hence comments cannot be nested. Whitespace is
allowed between the closing -- and >. (But not between the opening
<! and --.)
HTML Guide: Recommended Usage
There are a few other SGML markup constructs that are deprecated or
illegal.
Delimiter Signals...
<? Processing instruction. Terminated by >.
LINE BREAKS
A line break character is considered markup (and ignored) if it is
the first or last piece of content in an element. This allows you
to write either
some example text
or
some example text
and these will be processed identically.
Also, a line that's not empty but contains no content will be
ignored altogether. For example, the element
first line
third line
fourth line
contains only the strings first line third line fourth line.
SPACES AND TABS
Space characters must be rendered as horizontal white space. In
HTML, multiple spaces should be rendered as proportionally larger
spaces. The rendering of a horizontal tab (HT) character is not
defined, and HT should therefore not be used, except within a PRE
(or obsolete XMP, LISTING or PLAINTEXT) element.
Neither spaces nor tabs should be used to make SGML source layout
more attractive or easier to read.
SUMMARY OF MARKUP SIGNALS
The following delimiters may signal markup, depending on context.
Delimiter Signals
LINK RELATIONSHIP VALUES
Status: This list is not part of the standard. It is intended to
illustrate the use of link relationships and to provide a framework
for further development. Additions to this list will be controlled
by the HTML registration authority.
Experimental values may be used on the condition that they begin
with "X-".
These values of the REL attribute of hypertext links have a
significance defined here, and may be treated in special ways by
HTML applications.
These relationships relate whole documents (objects), rather than
particular anchors within them. If the relationship value is used
with a link between anchors rather than whole documents, the
semantics are considered to apply to the documents.
In the explanations which follow, A is the source document of the
link and B is the destination document specified by the HREF
attribute.
A relationship marked "Acyclic" has the property that no sequence
of links with that relationship may be followed from any document
back to itself. These types of links may therefore be used to
define trees.
Relationships between documents
These relationships are between the documents themselves rather
than the subjects of the documents.
USEINDEX
B is a related index for a search by a user reading this document
who asks for an index search function. A document may have any
number of index links, causing several indexes to be searched in a
client-defined manner. B must support SEARCH operations under its
access protocol.
USEGLOSSARY
B is an index which should be used to resolve glossary queries in
the document. (Typically, a double-click on a word which is not
within an anchor). A document may have any number of glossary
links.
ANNOTATION
The information in B is additional to and subsidiary to that in A.
Annotation is used by one person to write the equivalent of "margin
notes" or other criticism on another's document, for example.
Example: The relationship between a newsgroup and its articles.
Acyclic.
REPLY
Similar to Annotation, but there is no suggestion that B is
subsidiary to A: A and B are on equal footings. Example: The
relationship between a mail message and its reply, a news article
and its reply. Acyclic.
EMBED
If this link is followed, the node at the end of it is embedded
into the display of the source document. Acyclic.
PRECEDES
In an ordered structure defined by the author, A precedes B, B
follows A. Acyclic. Any document may only have one link of this
relationship, and/or one link of the reverse relationship.
Note: May be used to control navigational aids, generate printed
material, etc. In conjunction with "subdocument", may be used to
define a tree such as a printed book made of hypertext documents.
The document can only have one such tree.
SUBDOCUMENT
B is a lower part in the author's hierarchy to A. Acyclic. See
also Precedes.
PRESENT
Whenever A is presented, B must also be presented. This implies
that whenever A is retrieved, B must also be retrieved.
SEARCH
When the link is followed, the node B should be searched rather
than presented. That is, where the client software allows it, the
user should immediately be presented with a search panel and
prompted for text. The search is then performed without an
intermediate retrieval or presentation of the node B.
SUPERSEDES
B is a previous version of A. Acyclic.
HISTORY
B is a list of versions of A. A reverse link must exist from B to
A and to all other known versions of A.
Relationships about subjects of documents
These relationships convey semantics about objects described by
documents, rather than the documents themselves.
INCLUDES
A includes B, B is part of A. For example, a person described by
document B is a part of the group described by document A.
Acyclic.
MADE
Person (etc) described by node A is author of, or is responsible
for B. This information can be used for protection, and informing
authors of interest, for sending mail to authors, etc.
INTERESTED
Person (etc) described by A is interested in node B. This
information can be used for notification of changes. Typically,
this is a request that, when object B changes in some way, a new
link is made to object A.
The phrase "object B changes" may be interpreted narrowly (as "B
itself changes") or widely (as "B or anything linked to it or
related to it closely changes"). The amount of change considered
worth notifying people about is also subject to interpretation,
varying from bit changes in the source to a "new edition" statement
by the publisher.
REGISTRATION AUTHORITY
The HTTP Registration Authority is responsible for maintaining
lists of:
Relationship names for link and anchor elements
It is proposed that the Internet Assigned Numbers Authority or
their successors take this role.
Unregistered values may be used for experimental purposes if they
start with "X-".
REFERENCES
SGML ISO 8879:1986, Information Processing - Text and Office
Systems - Standard Generalized Markup Language (SGML).
SGMLS An SGML parser by James Clark, derived from the ARCSGML
parser materials which were written by Charles F. Goldfarb.
The source is available on the ifi.uio.no FTP server in the
directory /pub/SGML/SGMLS .
WWW The World-Wide Web, a global information initiative. For
bootstrap information, telnet info.cern.ch or find documents
by ftp://info.cern.ch/pub/www/doc
URL Universal Resource Locators. RFCxxx. Currently available by
anonymous FTP from info.cern.ch in /pub/ietf.
AUTHORS' ADDRESSES
This document was prepared with the help and advice of many people
across the net. Dan Connolly prepared the DTD and the section on
HTML and SGML whilst with Convex Computer Corporation of 3000
Waterview Parkway, Richardson, TX 75083. He is now with Atrium
Technology Inc., and is not a current editor of the document.
Tim Berners-Lee
Address: CERN
1211 Geneva 23
Switzerland
Telephone: +41(22)767 3755
Fax: +41(22)767 7155
email: timbl@info.cern.ch
Daniel Connolly
Address: Atrium Technologies, Inc.
5000 Plaza on the Lake, Suite 275
Austin, TX 78746 USA
email: connolly@atrium.com