XML Entity Definitions for Characters (3rd Edition)
Editor's Draft 03 January 2023
- This version:
- https://w3c.github.io/xml-entities/
- Latest version:
- https://www.w3.org/TR/2023/WD-xml-entity-names-20230103/
- Previous versions:
- https://www.w3.org/TR/2014/REC-xml-entity-names-20140410/
- https://www.w3.org/TR/2010/REC-xml-entity-names-20100401/
- Editors' version:
- https://w3c.github.io/xml-entities/
- Editors:
- David Carlisle, NAG
- Patrick Ion, Mathematical Reviews, American Mathematical Society
Please refer to the errata for this document, which may include some normative corrections.
See also translations.
Copyright © 1998-2023 World Wide Web Consortium. W3C® liability, trademark and permissive document license rules apply.
Abstract
This document defines several sets of names, so that to each name is assigned a Unicode character or sequence of characters. Each of these sets is expressed as a file of XML entity declarations.
Status of this Document
This document is an editors' copy that has no official standing.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document has been reviewed by W3C Members, by software developers, and by other W3C groups and interested parties, and is endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
This third edition is based on Unicode 17 and incorporates changes to Unicode since Unicode 5.2 and 6.3, on which the first and second editions of this document were based. Note that these updates only affect the non-normative descriptions of the Unicode blocks; there are no changes to the normative entity definitions. The document has also been updated and restructured slightly to note that [HTML5] now uses these definitions and to highlight more clearly that the HTML-MathML entity set should be used in preference to the older ISO sets that are also defined in this document.
This document was produced by the W3C Math Working Group as a Recommendation and as part of the W3C Math Activity. The goals of the W3C Math Working Group are discussed in the W3C Math WG Charter. The authors of this document are W3C Math Working Group members.
Comments should be sent to the Public W3C Math mailing list (list archives; see also instructions). When sending an e-mail comment on the XML Entity Definitions for Characters, please put the text “XML-Entities” in the subject line, preferably like this: “[XML-Entities] …summary of comment”. Alternatively, report an issue at this specification's GitHub repository.
This document is governed by the 2 November 2021 W3C Process Document.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Appendix B Changes details the changes since earlier versions of this document.
1 Introduction
Notation and symbols have proved very important for human communication, especially in scientific documents. Mathematics has grown in part because its notation continually changes toward being succinct and suggestive. There have been many new signs developed for use in mathematical notation, and mathematicians have not held back from making use of many symbols originally introduced elsewhere. The result is that science in general, and particularly mathematics, makes use of a very large collection of symbols. It is difficult to write science fluently if these characters are not available for use. It is difficult to read science if corresponding glyphs are not available for presentation on specific display devices. In the majority of cases it is preferable to store characters directly as Unicode character data or as XML numeric character references.
However, in some environments it is more convenient to use the ASCII input mechanism provided by XML entity references. Many entity names are in common use, and this specification aims to provide standard mappings to Unicode for each of these names. It introduces no names that have not already been used in earlier specifications. Note that these names are short mnemonic names designed for input methods such as XML entity references, not the longer formal names that form part of the Unicode standard.
Specifically, the entity names in the sets starting with the letters "iso" were first standardized in SGML ([SGML]) and updated in [ISO9573-13-1991]. The W3C Math Working Group has been invited to take over the maintenance and development of these sets by the original standards committee (ISO/IEC JTC1 SC34). The sets with names starting "mml" were first standardized in MathML [MathML2] and those starting with "xhtml" were first standardized in HTML [HTML4].
This document is the result of years of employing entity names on the Web. There were always a few named entities used for special characters in HTML, and many more names used for MathML. This means that this document can be viewed as an extension and final revision of Chapter 6 of the MathML 2.0 [MathML2] recommendation. Now it presents a completed listing harmonizing the known uses of character entity names in XML and HTML, together with defined mappings to Unicode.
Since there are so many character entity names, and the files specifying them are resources that may be subject to frequent lookup, a template catalog file has also been provided. Users are strongly encouraged to design their implementations so that relevant entity name tables are cached locally, since it is not expected that the listings provided with this specification will need changing for a long time.
2 Sets of names
2.1 The HTML MathML Entity Set
Historically the entity sets have been split into relatively small groups of related characters. However, for any new document type that is being defined, it is strongly recommended that the combined htmlmathml set is used. This defines an identical set of names to those built into the HTML parser (derived from the same source materials as this document; see D Source Files).
To incorporate the htmlmathml set into an XML DTD, a typical construct is:
<!ENTITY % htmlmathml-f PUBLIC
         "-//W3C//ENTITIES HTML MathML Set//EN//XML"
         "https://www.w3.org/2003/entities/2007/htmlmathml-f.ent"
>
%htmlmathml-f;
The public identifier should always be used verbatim; the system identifier should be changed to suit local requirements.
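For quick local experiments that do not fetch the entity file, declarations of the same kind can be placed in an internal DTD subset. The following is a minimal sketch, not part of this specification: it assumes that Python's standard `html.entities.html5` table can stand in for the htmlmathml set, and the helper names `entity_declarations` and `parse_with_entities` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.entities import html5

# XML already predefines these five names, so they are not redeclared.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def entity_declarations(names):
    """Return <!ENTITY ...> declarations for the given entity names."""
    lines = []
    for name in names:
        if name in PREDEFINED:
            continue
        refs = []
        for ch in html5[name + ";"]:
            if ch in "&<":
                # Double-escape so the replacement text is re-parsed safely.
                refs.append(f"&#38;#{ord(ch)};")
            else:
                refs.append(f"&#x{ord(ch):X};")
        value = "".join(refs)
        lines.append(f'<!ENTITY {name} "{value}">')
    return "\n".join(lines)


def parse_with_entities(body, names):
    """Parse an XML body after declaring the named entities inline."""
    subset = entity_declarations(names)
    document = f"<!DOCTYPE root [\n{subset}\n]>\n{body}"
    return ET.fromstring(document)


root = parse_with_entities(
    "<root>&alpha; &le; &beta;</root>", ["alpha", "le", "beta"]
)
print(ascii(root.text))
```

In a real document type the external parameter entity shown above is preferable, since it keeps the public identifier intact and lets a catalog redirect the system identifier.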
The entity set is available in two forms:
- htmlmathml-f the expanded set of HTML and MathML entity definitions
- htmlmathml the HTML and MathML entities defined via reference to the legacy entity set definitions as listed in the following section
The information is also available in JSON format. The JSON arrays encode the entity names and mappings to Unicode and also a list of those entity references for which the HTML (but not XML) parser allows the trailing semicolon to be omitted. So &amp may be used as well as &amp; when using HTML.
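The semicolon rule can be explored with Python's standard library, whose `html.entities.html5` table lists a name both with and without the trailing `;` when HTML permits the short form. This is only an illustrative sketch; it assumes that table reflects the HTML parser's list and does not read the JSON files published with this specification.

```python
import html
from html.entities import html5

# Keys that lack the trailing ";" are the names HTML accepts without it.
semicolon_optional = sorted(name for name in html5 if not name.endswith(";"))

print(len(semicolon_optional), "names may omit the semicolon in HTML")

# Both spellings expand to the same character in an HTML context.
short_form = html.unescape("&amp")
long_form = html.unescape("&amp;")
print(short_form == long_form)
```

An XML parser would reject the short form, so content intended for both HTML and XML should always include the semicolon.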
An XSLT2 stylesheet is available which performs the reverse mapping, replacing Unicode characters by entity references.
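The reverse mapping can be sketched outside XSLT as well. The example below is an assumption-laden illustration rather than a port of the stylesheet: it uses Python's `html.entities.html5` table as the data source, handles only names whose replacement text is a single non-ASCII character, and picks one name per character with a hypothetical tie-break (shortest name, then alphabetical), which may differ from the choice made by the published stylesheet.

```python
import html
from html.entities import html5


def build_reverse_map():
    """Map each non-ASCII code point to one entity name (no "&" or ";")."""
    reverse = {}
    for name, text in html5.items():
        # Keep only ";"-terminated names with a single non-ASCII character.
        if not name.endswith(";") or len(text) != 1 or ord(text) < 128:
            continue
        bare = name[:-1]
        current = reverse.get(ord(text))
        # Hypothetical tie-break: shortest name, then alphabetical order.
        if current is None or (len(bare), bare) < (len(current), current):
            reverse[ord(text)] = bare
    return reverse


REVERSE = build_reverse_map()


def to_entity_refs(text):
    """Replace mapped characters by entity references, leave the rest."""
    return "".join(
        f"&{REVERSE[ord(ch)]};" if ord(ch) in REVERSE else ch for ch in text
    )


encoded = to_entity_refs("\u03b1 \u2264 \u03b2")
print(encoded)
```

Expanding the result again, for example with `html.unescape`, should give back the original string, which is a convenient round-trip check for any such mapping.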
2.2 Legacy Entity Sets
This specification defines mappings to Unicode of many sets of names that have been defined by earlier specifications.
We present two tables listing all the sets combined, first in Unicode order and then in alphabetic order:
- All in Unicode order
- All in alphabetic order
These are followed by tables documenting each of the entity sets. Each set has a link to the DTD entity declarations for the corresponding entity set, and also a link to an XSLT2 stylesheet that implements a reverse mapping from characters to entity names (this is, of course, only possible for entity names that map to a single Unicode code point).
- isobox Box and Line Drawing
- isocyr1 Russian Cyrillic
- isocyr2 Non-Russian Cyrillic
- isodia Diacritical Marks
- isolat1 Added Latin 1
- isolat2 Added Latin 2
- isonum Numeric and Special Graphic
- isopub Publishing
- isoamsa Added Math Symbols: Arrow Relations
- isoamsb Added Math Symbols: Binary Operators
- isoamsc Added Math Symbols: Delimiters
- isoamsn Added Math Symbols: Negated Relations
- isoamso Added Math Symbols: Ordinary
- isoamsr Added Math Symbols: Relations
- isogrk1 Greek Letters (not in MathML3 / HTML5)
- isogrk2 Monotoniko Greek (not in MathML3 / HTML5)
- isogrk3 Greek Symbols
- isogrk4 Alternative Greek Symbols (not in MathML3 / HTML5)
- isomfrk Math Alphabets: Fraktur
- isomopf Math Alphabets: Open Face
- isomscr Math Alphabets: Script
- isotech General Technical
- mmlextra Additional MathML Symbols
- mmlalias MathML Aliases
- xhtml1-lat1 Latin for HTML
- xhtml1-special Special for HTML
- xhtml1-symbol Symbol for HTML
- html5-uppercase uppercase aliases for HTML
- predefined Predefined XML
In addition to the stylesheets and entity files corresponding to each individual entity set, a combined stylesheet is provided, as well as a combined entity set, in two formats, as for the HTML MathML set described above.
- w3centities W3C entities collection; referencing all entity sets listed above
- w3centities-f the same set of entity definitions, expanded into a single file, with duplicates removed
3 Unicode Character Ranges for Scientific Documents
Certain characters are of particular relevance to scientific document production. The following tables display Unicode ranges containing the characters that are most used in mathematics.
Note that each of the tables linked from this section contains 256 images and may take a while to load if the images have not been cached locally.
- 000 C0 Controls and Basic Latin, C1 Controls and Latin-1 Supplement
- 001 Latin Extended-A, Latin Extended-B
- 002 IPA Extensions, Spacing Modifier Letters
- 003 Combining Diacritical Marks, Greek and Coptic
- 004 Cyrillic
- 006 Arabic
- 020 General Punctuation, Superscripts and Subscripts, Currency Symbols, Combining Diacritical Marks for Symbols
- 021 Letterlike Symbols, Number Forms, Arrows
- 022 Mathematical Operators
- 023 Miscellaneous Technical
- 024 Control Pictures, Optical Character Recognition, Enclosed Alphanumerics
- 025 Box Drawing, Block Elements, Geometric Shapes
- 026 Miscellaneous Symbols
- 027 Dingbats, Miscellaneous Mathematical Symbols-A, Supplemental Arrows-A
- 029 Supplemental Arrows-B, Miscellaneous Mathematical Symbols-B
- 02A Supplemental Mathematical Operators
- 02B Miscellaneous Symbols and Arrows
- 0FB Alphabetic Presentation Forms, Arabic Presentation Forms-A
- 0FE Variation Selectors, Vertical Forms, Combining Half Marks, CJK Compatibility Forms, Small Form Variants, Arabic Presentation Forms-B
- 1D4 Mathematical Alphanumeric Symbols
- 1D5 Mathematical Alphanumeric Symbols (continued)
- 1D6 Mathematical Alphanumeric Symbols (continued)
- 1D7 Mathematical Alphanumeric Symbols (continued)
- 1EE Arabic Mathematical Alphabetic Symbols
4 Mathematical Alphanumeric Characters
Many of the entities defined by this specification relate to the mathematical alphanumeric characters contained in the letter-like symbols block of Unicode Plane 0, or in the Mathematical Alphanumeric Symbols block in Unicode Plane 1. The following tables list all these symbols, highlighting those that are not in Plane 1, and giving entity names where appropriate.
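The split between the two planes can be seen by expanding two script-letter entities. This is a small sketch that assumes Python's `html.unescape` uses the same name-to-character mappings as this specification.

```python
import html
import unicodedata

hscr = html.unescape("&Hscr;")  # script H, in the Letterlike Symbols block
ascr = html.unescape("&Ascr;")  # script A, in Mathematical Alphanumeric Symbols

for ref, ch in (("&Hscr;", hscr), ("&Ascr;", ascr)):
    plane = ord(ch) >> 16  # the plane number is the code point divided by 0x10000
    print(ref, f"U+{ord(ch):04X}", f"plane {plane}", unicodedata.name(ch))
```

Software that assumes all characters fit in 16 bits will handle the first of these but not the second, which is one reason the tables highlight the Plane 0 exceptions.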
5 Entities for Negated and Variant Characters
Most of the entity definitions in this specification expand to a single Unicode character. The definitions that expand to a sequence of two or more characters are outlined in this section.
5.1 Negated Mathematical Characters
In addition to the Unicode Characters so far listed, one may use the combining characters U+0338 (/), U+20D2 (|) and U+20E5 (\) to produce negated or canceled forms of characters. A combining character should be placed immediately after its "base" character, with no intervening markup or space, just as is the case for combining accents.
In principle, the negation characters may be applied to any Unicode character, although fonts designed for mathematics typically have some negated glyphs ready composed. A MathML renderer should be able to use these pre-composed glyphs in these cases. A compound character code either represents a UCS character that is already available, as in the case of U+003D U+0338 which amounts to U+2260, or it does not, as is the case for U+2202 U+0338. The common cases of negations, of the latter type, that have been identified are listed in the tables.
Note that it is the policy of the W3C and of Unicode that if a single character is already defined for what can be achieved with a combining character, that character must be used instead of the decomposed form. It is also intended that no new single characters representing what can be done with existing compositions will be introduced. For further information on these matters see the Unicode Standard Annex 15, Unicode Normalization Forms [Unicode15], especially the discussion of Normalization Form C.
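The two cases described above can be observed with Normalization Form C. The sketch below uses Python's standard `unicodedata` module; the variable names are illustrative only.

```python
import unicodedata

# "=" followed by U+0338 COMBINING LONG SOLIDUS OVERLAY
negated_equals = "\u003d\u0338"
# U+2202 PARTIAL DIFFERENTIAL followed by U+0338
negated_partial = "\u2202\u0338"

composed_equals = unicodedata.normalize("NFC", negated_equals)
composed_partial = unicodedata.normalize("NFC", negated_partial)

# A precomposed character exists for the first pair but not for the second.
print([f"U+{ord(ch):04X}" for ch in composed_equals])
print([f"U+{ord(ch):04X}" for ch in composed_partial])
```

The first sequence collapses to the single character U+2260, while the second remains a two-character sequence, which is why entities for such negations expand to more than one character.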
5.2 Variant Mathematical Characters
Unicode attempts to avoid having several character codes for simple font variants. For a code point to be assigned there should be more than a nuance in glyphs to be recorded. To record variants worth noting there is a special character in Unicode 3.2, U+FE00 (VARIATION SELECTOR-1), which acts as a postfix modifier. However, the legally allowed combinations with this variation selector are restricted to a list recorded as part of Unicode. The VARIATION SELECTOR-1 character may only be applied to the characters listed here. The resulting combination is not regarded by Unicode as a separate character, but as a variation on the base character. Unicode-aware systems may render the combination as the base character if the available fonts do not support the variant glyph shape.
A Special Considerations
A.1 Epsilon
Historically there has been much confusion and lack of agreement over variant forms for lower case epsilon.
This specification uses the definitions below. Note that the name epsilon is used for the character used in textual Greek (U+03B5) and varepsilon is used for the epsilon symbol character more commonly used in mathematics (U+03F5). This usage is compatible with the naming of similar pairs of characters (for example theta, vartheta) but incompatible with the naming convention used in TeX, MathML2 and some earlier mappings of the ISO entity sets to Unicode.
Entity | Set | Description | Unicode | Character name
---|---|---|---|---
eacgr | isogrk2 | =small epsilon, accent, Greek | U+03AD | GREEK SMALL LETTER EPSILON WITH TONOS
egr | isogrk1 | =small epsilon, Greek | U+03B5 | GREEK SMALL LETTER EPSILON
epsi | isogrk3 | /epsilon | U+03B5 | GREEK SMALL LETTER EPSILON
epsilon | xhtml1-symbol | | U+03B5 | GREEK SMALL LETTER EPSILON
epsiv | isogrk3 | /straightepsilon, small epsilon, Greek | U+03F5 | GREEK LUNATE EPSILON SYMBOL
straightepsilon | mmlalias | alias ISOGRK3 epsiv | U+03F5 | GREEK LUNATE EPSILON SYMBOL
varepsilon | mmlalias | alias ISOGRK3 epsiv | U+03F5 | GREEK LUNATE EPSILON SYMBOL
bepsi | isoamsr | /backepsilon R: such that | U+03F6 | GREEK REVERSED LUNATE EPSILON SYMBOL
backepsilon | mmlalias | alias ISOAMSR bepsi | U+03F6 | GREEK REVERSED LUNATE EPSILON SYMBOL
b.epsi | isogrk4 | small epsilon, Greek | U+1D6C6 | MATHEMATICAL BOLD SMALL EPSILON
b.epsiv | isogrk4 | variant epsilon | U+1D6DC | MATHEMATICAL BOLD EPSILON SYMBOL
A.2 Phi
The situation for phi is very similar to that of epsilon, although with the further complication that early versions of Unicode had the sample glyphs for U+03C6 and U+03D5 swapped from the current usage, and some older fonts still in use follow that older convention. The definitions used in this specification are as listed below.
Entity | Set | Description | Unicode | Character name
---|---|---|---|---
phi | isogrk3 | /phi - small phi, Greek | U+03C6 | GREEK SMALL LETTER PHI
phi | xhtml1-symbol | greek small letter phi | U+03C6 | GREEK SMALL LETTER PHI
phgr | isogrk1 | =small phi, Greek | U+03C6 | GREEK SMALL LETTER PHI
straightphi | mmlalias | alias ISOGRK3 phiv | U+03D5 | GREEK PHI SYMBOL
phiv | isogrk3 | /varphi - straight phi | U+03D5 | GREEK PHI SYMBOL
varphi | mmlalias | alias ISOGRK3 phiv | U+03D5 | GREEK PHI SYMBOL
b.phi | isogrk4 | small phi, Greek | U+1D6D7 | MATHEMATICAL BOLD SMALL PHI
b.phiv | isogrk4 | variant phi | U+1D6DF | MATHEMATICAL BOLD PHI SYMBOL
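The epsilon and phi assignments described in A.1 and A.2 can be compared against an HTML implementation. The following check is a sketch that assumes Python's `html.unescape` follows the HTML named character references, which are derived from the same source as this specification; the `EXPECTED` dictionary simply restates code points given in this appendix.

```python
import html

# Code points as given in A.1 and A.2 of this document.
EXPECTED = {
    "epsilon": 0x03B5,
    "epsi": 0x03B5,
    "varepsilon": 0x03F5,
    "epsiv": 0x03F5,
    "straightepsilon": 0x03F5,
    "phi": 0x03C6,
    "varphi": 0x03D5,
    "phiv": 0x03D5,
    "straightphi": 0x03D5,
}

# Expand each name and record the code point of the single resulting character.
actual = {name: ord(html.unescape(f"&{name};")) for name in EXPECTED}

for name, code_point in actual.items():
    status = "ok" if code_point == EXPECTED[name] else "MISMATCH"
    print(f"{name:16} U+{code_point:04X} {status}")
```

Older fonts or TeX-derived conventions may display the two forms of each letter the other way round, so a check of this kind confirms the code points only, not the glyph shapes.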
A.3 Multiple Character Entities
In addition to the combining and variant character combinations listed in the previous sections, the following table lists the remaining entity replacement texts that consist of more than one character.
Entity | Set | Description | Unicode | Character description
---|---|---|---|---
fjlig | isopub | small fj ligature | U+0066 U+006A | fj ligature
ThickSpace | mmlextra | space of width 5/18 em | U+205F U+200A | space of width 5/18 em
race | isoamsb | reverse most positive, line below | U+223D U+0331 | REVERSED TILDE with underline
acE | isoamsb | most positive, two lines below | U+223E U+0333 | INVERTED LAZY S with double underline
Unicode does not have an fj character, although the other common f ligatures such as fi (U+FB01) are contained in the Alphabetic Presentation Forms block. The fjlig entity is mapped to the pair of characters "fj"; modern typesetting engines should automatically use the fj ligature for this combination if the font supplies such a ligature.
Unicode has a range of space characters (including all multiples of 1/18 em up to 6/18 em, except for 5/18 em), so the ThickSpace entity is mapped to a pair of space characters. An alternative would have been to use U+2005 (1/4 em), but 1/4 em is not equal to 5/18 em, so the above definition was chosen, even though the difference is unlikely to be noticeable at most typeset font sizes.
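The width arithmetic behind this choice can be written out with exact fractions. The sketch below takes the widths stated in this document as given (4/18 em for U+205F and 1/18 em for U+200A); the constant names are illustrative.

```python
from fractions import Fraction

# Widths in em, as stated in this document.
MEDIUM_MATHEMATICAL_SPACE = Fraction(4, 18)  # U+205F
HAIR_SPACE = Fraction(1, 18)                 # U+200A
FOUR_PER_EM_SPACE = Fraction(1, 4)           # U+2005

thick_space = MEDIUM_MATHEMATICAL_SPACE + HAIR_SPACE
shortfall = thick_space - FOUR_PER_EM_SPACE

print("ThickSpace width:", thick_space, "em")
print("U+2005 would be narrower by", shortfall, "em")
```

The pair sums to exactly 5/18 em, whereas U+2005 alone would fall short by 1/36 em.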
The entities race and acE denote underlined characters for which Unicode does not have codepoints, thus combining underline characters have been used, in a way analogous to the use of combining strokes for negated operators.
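Entities with multi-character replacement text are easy to pick out programmatically. The sketch below assumes Python's `html.entities.html5` table as a stand-in for the entity files; note that the resulting dictionary also contains the negated and variant combinations from section 5, not only the four entries in the table above.

```python
from html.entities import html5

# Collect ";"-terminated names whose replacement text has several characters.
multi = {
    name[:-1]: text
    for name, text in html5.items()
    if name.endswith(";") and len(text) > 1
}

for name in ("fjlig", "ThickSpace", "race", "acE"):
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in multi[name])
    print(f"{name:10} {code_points}")
```

Code that maps entity names to a single code point, such as a reverse character map, needs to skip or special-case every name in this dictionary.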
A.4 Entities Defined to be a Combining Character
The following table lists the entity replacement texts that consist of a combining character.
Entity | Set | Description | Unicode | Character name
---|---|---|---|---
DownBreve | mmlextra | breve, inverted (non-spacing) | U+0311 | COMBINING INVERTED BREVE
tdot | isotech | three dots above | U+20DB | COMBINING THREE DOTS ABOVE
TripleDot | mmlalias | alias ISOTECH tdot | U+20DB | COMBINING THREE DOTS ABOVE
DotDot | isotech | four dots above | U+20DC | COMBINING FOUR DOTS ABOVE
For reasons explained further in [Charmod-norm], it is not advisable to start the replacement text of an entity with a combining character, as then potentially different results may be produced depending on the order in which entity expansion and Unicode normalisation are performed. As far as possible this specification uses non-combining characters; however, in the cases of DownBreve, tdot, TripleDot and DotDot, Unicode only has combining forms of the accents.
Earlier versions of this specification defined these entities with the replacement text starting with a space, to avoid the possibility that the expansion of the entity combined with preceding text. However for various reasons the entities as incorporated in HTML do not have a space here, and so the definitions now consist just of the combining character so that HTML and XHTML are consistent with any specifications using these definitions.
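The order dependence mentioned above can be reproduced with a two-step pipeline. This is an illustrative sketch using Python's `html.unescape` for entity expansion and `unicodedata.normalize` for NFC; the input string is a made-up example.

```python
import html
import unicodedata

source = "a&DownBreve;"  # letter "a" followed by an entity for U+0311

# Expand first: "a" + U+0311 composes to U+0203 under NFC.
expand_then_normalize = unicodedata.normalize("NFC", html.unescape(source))

# Normalize first: the ASCII source is unchanged, so the expanded
# result keeps the decomposed pair and is no longer in NFC.
normalize_then_expand = html.unescape(unicodedata.normalize("NFC", source))

for label, text in (
    ("expand, then normalize", expand_then_normalize),
    ("normalize, then expand", normalize_then_expand),
):
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

The two orders yield different code point sequences for the same source text, so processors that compare or index text should normalize after entity expansion.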
B Changes
B.1 Changes since 2014-04-10 (Second Edition Recommendation)
Source files updated to Unicode 15.0, affecting the character tables, but with no changes to generated entity files or stylesheets. New table for the U+FE01 Variation selector and greatly extended set of variations in the U+FE00 table (most of these standardised variants were added at Unicode 14). The script alphabet table has been extended to show both variants.
Reference added to the November 2021 W3C Process Document.
Some changes to the front matter including link to GitHub as required by the latest W3C publication process.
Adjustments to CSS styling to match new W3C document style.
The source repository has been moved to GitHub so the log is now public.
As detailed in A.4 Entities Defined to be a Combining Character, DownBreve, tdot, TripleDot and DotDot are no longer prefixed by a space.
B.2 Changes between 2010-04-01 and 2014-04-10 (First and Second Edition Recommendations)
Source files updated to Unicode 6.3, affecting the character tables, but with no changes to generated entity files or stylesheets.
Source files updated with Unicode 6.1 data on Arabic math alphabets (U+1EE??). Additional tables shown in Sections 3 and 4.
Section 2 Sets of names reorganized to highlight the htmlmathml set which is used in MathML and HTML. Also link to XSL and JSON formats for the HTML MathML set.
B.3 Changes between 2010-04-01 and 2010-02-11
Several example images improved, bringing them more in line with the Unicode reference images.
B.4 Changes between 2010-02-11 and 2009-11-17
Various editorial improvements, including using Unicode U+1234 notation more consistently rather than displaying the internal IDs of the form U01234.
The combined entities file distributed with the 2009-11-17 draft introduced an error that if two entity names differed only by case, only one was included. This has been corrected.
The combined entity set htmlmathml corresponding to the entities usable in HTML and MathML is now explicitly provided. The predefined set, corresponding to the entities predefined in XML is now documented (it was previously used internally).
The entities xvee and xwedge had the correct Unicode assignments (U+22C1 and U+22C0) but their descriptions were swapped. This has been corrected: xvee is logical or and xwedge is logical and. This error in [ISO9573-13-1991] was reported in 1999, in a Proposed Technical Corrigendum, but not previously fixed. The entity files are unaffected by this change.
The entity NotGreaterFullEqual which had been erroneously assigned to a negated less than operator (U+2266 U+0338) has been corrected to be the negated greater than operator (U+2267 U+0338).
A sample catalog is now provided to redirect references to the entity files to copies on the local machine rather than the W3C server.
B.5 Changes between 2009-11-17 and 2008-07-21
The html5-uppercase set is now documented.
The entities ohm and angst have changed to U+03A9 and U+00C5 to match NFC. See W3C Bugzilla entry.
The entity race, which had been erroneously assigned U+29DA, is now assigned the combination U+223D U+0331. (U+223D isn't quite the shape shown in the original ISO document which is a rotated S rather than a rotated tilde, but this appears to be the closest character in Unicode 5.2.)
The entities bsolhsub and suphsol which were previously mapped to two-character combinations U+005C U+2282 and U+2283 U+002F are now mapped to the Unicode 5 characters that were added specifically to support these entities, U+27C8 and U+27C9.
The source files have all been updated to match Unicode 5.2.
The entity ThickSpace now maps to the pair U+205F U+200A rather than the triple U+2009 U+200A U+200A, that is, (4/18 + 1/18) em rather than (3/18 + 1/18 + 1/18) em.
The entity UnderBar maps to the spacing character _ rather than the combining character U+0332.
The entity OverBar maps to the spacing character U+203E (like the XHTML entity oline) rather than the macron character U+00AF.
The entities epsiv and varepsilon are now mapped to the epsilon symbol U+03F5 rather than being aliases for the entity epsilon, U+03B5.
The entities phiv and varphi are now mapped to the phi symbol U+03D5 rather than being aliases for the entity phi, U+03C6.
C Differences between these entities and earlier W3C DTDs
C.1 Differences from XHTML 1.0
The differences between the XHTML entity definitions described here and the entity set described in the XHTML 1.0 DTD are listed below.
- lang and rang
- U+27E8 and U+27E9; XHTML 1.0 used U+2329 and U+232A (which have canonical decomposition to U+3008 and U+3009).
Note:
The current drafts of [HTML5] use entity definitions derived from this specification.
C.2 Differences from MathML 2.0 (second edition)
The differences between MathML 2 and the current entity definitions are listed below.
- fjlig
- ISOPUB (and MathML 1) defined an fj ligature; Unicode does not have a specific character and the entity was dropped from MathML2. It is re-instated here for maximum compatibility with [SGML].
- phi
- U+03C6 GREEK SMALL LETTER PHI (the definition used in HTML4); MathML2 used U+03D5 GREEK PHI SYMBOL.
- epsiv, varepsilon, phiv, varphi
- these have been changed to map to the symbol character (to match other uses of the var prefix such as vartheta).
- jmath
- U+0237; MathML 2 used U+006A (j) as there was no dotless j before Unicode 4.1.
- trpezium, elinters
- U+23E2 and U+23E7; MathML 2 used U+FFFD (REPLACEMENT CHARACTER) as these characters were added at Unicode 5.0 specifically to support these entities.
- ohm, angst
- As noted above, the definitions of these entities have been changed so that the definitions use characters that are in NFC normal form.
- bsolhsub and suphsol
- U+27C8 and U+27C9; MathML2 used U+005C U+2282 and U+2283 U+002F.
- NotGreaterFullEqual
- U+2267 U+0338; MathML2 used the erroneous definition U+2266 U+0338.
The following bracket symbols have been added to the Mathematical symbols block in Unicode versions between 3.1 and 5.1. MathML2 used similar characters intended for CJK punctuation.
- lang, langle, LeftAngleBracket and rang, rangle, RightAngleBracket
- U+27E8 and U+27E9; MathML2 used U+2329 and U+232A (which have canonical decomposition to U+3008 and U+3009).
- Lang and Rang
- U+27EA and U+27EB; MathML2 used U+300A and U+300B.
- lbbrk and rbbrk
- U+2772 and U+2773; MathML2 used U+3014 and U+3015.
- loang and roang
- U+27EC and U+27ED; MathML2 used U+3018 and U+3019.
- lobrk and robrk
- U+27E6 and U+27E7; MathML2 used U+301A and U+301B.
- OverBrace and UnderBrace
- U+23DE and U+23DF; MathML2 used U+FE37 and U+FE38.
- OverParenthesis and UnderParenthesis
- U+23DC and U+23DD; MathML2 used U+FE35 and U+FE36.
- LeftDoubleBracket and RightDoubleBracket
- U+27E6 and U+27E7; MathML2 used U+301A and U+301B.
Note:
[MathML3] uses the entity sets defined by this specification.
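Several of the changes listed in this appendix replace characters that do not survive Normalization Form C. The sketch below illustrates this for ohm, angst and the angle brackets; it assumes Python's `html.unescape` reflects the current definitions, and the variable names are illustrative.

```python
import html
import unicodedata


def nfc(text):
    """Shorthand for Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# Characters used by earlier definitions, and what NFC turns them into.
old_ohm = nfc("\u2126")        # OHM SIGN
old_angst = nfc("\u212b")      # ANGSTROM SIGN
old_lang = nfc("\u2329")       # LEFT-POINTING ANGLE BRACKET
old_rang = nfc("\u232a")       # RIGHT-POINTING ANGLE BRACKET

# Current definitions are already in NFC, so normalization leaves them alone.
current = html.unescape("&ohm;&angst;&lang;&rang;")
current_is_stable = nfc(current) == current

print([f"U+{ord(ch):04X}" for ch in old_ohm + old_angst + old_lang + old_rang])
print([f"U+{ord(ch):04X}" for ch in current], current_is_stable)
```

With the earlier bracket definitions, normalization would silently replace mathematical brackets with CJK punctuation, which is the incompatibility the newer assignments avoid.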
D Source Files
All data files used to construct the entity declarations, XSLT character maps, and HTML tables referenced from this document are available from https://github.com/w3c/xml-entities/.
- unicode.xml master file detailing all Unicode characters with names in various entity sets and applications, TeX equivalents and other data. This file has been maintained for many years, originally by Sebastian Rahtz as part of the jadetex distribution and since around 1999 as part of the MathML specification sources by David Carlisle. The current version encodes data for all characters in Unicode 17. Note: unicode.xml is over 5MB in size and may not really be suitable for direct viewing in a browser. You may prefer to save the file rather than follow the above link to unicode.xml in a browser.
- charlist.rnc RELAX NG schema for unicode.xml.
- unicode.xsl XSLT stylesheet that renders unicode.xml as an HTML table.
- character-set.xml the source file for this document.
- xmlspec.xsl a copy of the standard xmlspec stylesheet.
- run small script file that builds this collection.
- xhtml1.xml record of XHTML 1.0 entity definitions.
- mml2.xml record of MathML 2.0 (second edition) entity definitions.
- unicodedata.xsl stylesheet that generates a new copy of unicode.xml, incorporating data from the Unicode data file, used to update unicode.xml as new versions of Unicode are released.
- entities.xsl stylesheet to generate the DTD declarations for the entities.
- charmap.xsl stylesheet to generate the XSLT character maps.
- characters.xsl stylesheet to generate this document, including the referenced HTML tables.
- schemas.xml file associating XML documents with appropriate RELAX NG schema.
- catalog Sample OASIS XML catalog that redirects references to the entity or stylesheet files at https://www.w3.org/2003/entities/2007/ to the local file system at /etc/xml/w3c-entities. It should be edited to refer to the location of a local copy of the files. Many XML parsers may be configured to read this catalog format, but the specific options depend on the parser being used.
E References
- SGML
- ISO/IEC 8879:1986, Information processing — Text and office systems — Standard Generalized Markup Language (SGML)
- ISO9573-13-1991
- ISO/IEC TR 9573-13:1991, Information technology — SGML support facilities — Techniques for using SGML — Part 13: Public entity sets for mathematics and science
- Unicode
- The Unicode Consortium. The Unicode Standard, Version 5.2.0, defined by: The Unicode Standard, Version 5.2 (Mountain View, CA: The Unicode Consortium, 2009. ISBN 978-1-936213-00-9). Unicode 6.3 update (https://www.unicode.org/versions/Unicode6.3.0/)
- Unicode15
- Unicode Standard Annex 15, Version 6.3.0; Unicode Normalization Forms, The Unicode Consortium, 2013-09-20. (https://www.unicode.org/reports/tr15/)
- Unicode25
- Barbara Beeton, Asmus Freytag, Murray Sargent III, Unicode Support for Mathematics, Unicode Technical Report #25 2012-04-02. (https://www.unicode.org/reports/tr25/)
- MathML2
- David Carlisle, Patrick Ion, Robert Miner, Nico Poppelier, Mathematical Markup Language (MathML) Version 2.0 (Second Edition) W3C Recommendation 21 October 2003 (https://www.w3.org/TR/2003/REC-MathML2-20031021/)
- MathML3
- David Carlisle, Patrick Ion, Robert Miner, Mathematical Markup Language (MathML) Version 3.0 2nd Edition W3C Recommendation 10 April 2014 (https://www.w3.org/TR/2014/REC-MathML3-20140410/)
- HTML4
- Dave Raggett, Arnaud Le Hors, Ian Jacobs, HTML 4.01 Specification W3C Recommendation 24 December 1999 (https://www.w3.org/TR/1999/REC-html401-19991224)
- HTML5
- Robin Berjon, Steve Faulkner, Travis Leithead, Erika Doyle Navara, Edward O'Connor, Silvia Pfeiffer, Ian Hickson HTML 5, A vocabulary and associated APIs for HTML and XHTML W3C Candidate Recommendation 6 August 2013 (https://www.w3.org/TR/html5/)
- Charmod-norm
- François Yergeau, Martin J. Dürst, Richard Ishida, Addison Phillips, Misha Wolf, Tex Texin, Character Model for the World Wide Web 1.0: Normalization W3C Working Draft 1 May 2012 (https://www.w3.org/TR/charmod-norm/)