© 1999–2021 Rick Jelliffe. PageSeeder and hosting generously provided by Allette Systems (Australia)
A language for making assertions about the presence or absence of patterns in linked XML documents, and reporting them in useful ways.
Schematron is:
- a simple but powerful language for asserting or reporting the presence or absence of arbitrary patterns in data, in particular using XPath over XML documents (but not limited to either) with particular strengths in capturing natural language descriptions and diagnostics;
- an ISO/IEC International Standard, one of the Document Schema Description Languages (DSDL) standards
- with mature Open Source implementations
- used in sectors such as finance, health, aerospace, homeland security, tax, international business, scientific computing, and publishing, where the value or volume or complexity or interchangeability or required regulatory compliance of the data is high.
Schematron is used for data integrity checking, business rules validation, data reporting, general validation, quality control, quality assurance, firewalling, filtering, constraint checking, naming and design rules checking, statistical consistency, data exploration, transformation testing, feature extraction, house-style rules-checking, automated document correction, and verification that a new information system successfully replaces an old system.
In the spectrum of AI technologies, Schematron can be classed as an “expert system” tool: especially useful for capturing and implementing the intent of Subject Matter Experts.
Cheat Sheet
Summary of Schematron language and SVRL
A Schematron schema has:
- patterns, which have
- rules, which make
- assertions, which can have
- various associated rich texts, diagnostics, and properties defined by you.
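As a rough illustration of that nesting, here is a minimal sketch in Python (standard library only). The invoice vocabulary, the ids, and the messages are invented for this example and do not come from any real schema. The code only walks the schema's structure; it does not validate anything.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace

# Hypothetical schema: element names, ids and messages are assumptions for illustration.
SCHEMA = """\
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern id="invoice-basics">
    <rule context="invoice">
      <assert test="count(line) &gt;= 1" diagnostics="d-no-lines">
        An invoice must have at least one line.
      </assert>
      <report test="@total = 0">
        This invoice has a zero total.
      </report>
    </rule>
  </pattern>
  <diagnostics>
    <diagnostic id="d-no-lines">Found <value-of select="count(line)"/> lines.</diagnostic>
  </diagnostics>
</schema>
"""

def outline(schema_text):
    """Return (pattern id, rule context, kind, test) for every assert or report."""
    ns = {"sch": SCH_NS}
    root = ET.fromstring(schema_text)
    rows = []
    for pattern in root.findall("sch:pattern", ns):
        for rule in pattern.findall("sch:rule", ns):
            for check in rule:
                kind = check.tag.split("}")[-1]  # drop the "{namespace}" prefix
                if kind in ("assert", "report"):
                    rows.append((pattern.get("id"), rule.get("context"),
                                 kind, check.get("test")))
    return rows

for row in outline(SCHEMA):
    print(row)
```

An `assert` states what should be true at the rule's context and fires when its test fails; a `report` fires when its test succeeds. The `diagnostics` attribute points at extra text kept outside the pattern.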
Typically, validating a document against a Schematron document produces an SVRL (Schematron Validation Report Language) document which can be expressed as XML or JSON etc. and used by downstream processes.
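A downstream process might consume SVRL along these lines. This is a minimal sketch in Python (standard library only); the report content is hand-written for illustration rather than produced by a real validator, and the JSON shape printed at the end is just one possible choice, not a standard format.

```python
import json
import xml.etree.ElementTree as ET

SVRL_NS = "http://purl.oclc.org/dsdl/svrl"  # SVRL namespace

# Hypothetical SVRL report: locations and messages are assumptions for illustration.
REPORT = """\
<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:active-pattern id="invoice-basics"/>
  <svrl:fired-rule context="invoice"/>
  <svrl:failed-assert test="count(line) &gt;= 1" location="/invoice[1]">
    <svrl:text>An invoice must have at least one line.</svrl:text>
  </svrl:failed-assert>
  <svrl:successful-report test="@total = 0" location="/invoice[1]">
    <svrl:text>This invoice has a zero total.</svrl:text>
  </svrl:successful-report>
</svrl:schematron-output>
"""

def findings(svrl_text):
    """Collect failed assertions and successful reports as plain dictionaries."""
    ns = {"svrl": SVRL_NS}
    root = ET.fromstring(svrl_text)
    out = []
    for kind in ("failed-assert", "successful-report"):
        for node in root.findall(f"svrl:{kind}", ns):
            text = node.findtext("svrl:text", default="", namespaces=ns)
            out.append({
                "kind": kind,
                "location": node.get("location"),
                "test": node.get("test"),
                "message": " ".join(text.split()),  # normalize whitespace
            })
    return out

# One possible JSON rendering of the same findings.
print(json.dumps(findings(REPORT), indent=2))
```

The `svrl:fired-rule` and `svrl:active-pattern` elements are ignored here; a fuller consumer could use them to show which rules ran without finding anything.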
Overview
USE-CASE: During the COVID-19 epidemic, Schematron helped organize the monitoring of services: in the US, real-time data on Emergency Medical admissions and causes is collected by the National Emergency Medical Services Information System (NEMSIS). Schematron gave them a practical route to having subject-matter experts specify rules in plain English, then having developers implement exactly those rules. Read their excellent Schematron Guide, or see their online Library of national and state-level Schematron Rules. (Hint: try “PA”)
New Book
Schematron: A Language for Validating XML
Erik Siegel’s new book is a systematic and thorough introduction explaining all elements and attributes of Schematron with common ways to use them.
Developers will appreciate the large XPath primer, together with appendixes on Schematron, Namespaces, SVRL and Schematron Quick Fix. It follows the most recent version of the ISO Standard and is geared for XPath3.
The 260 page XML Press book is available in paperback; electronic versions are available in ePub, mobi (Kindle), and PDF. Release announced with overview: XML.COM (November 2022.)
A book review by Rick Jelliffe is here.
Erik also introduces Schematron in two YouTube videos from Declarative Amsterdam (1 and 2) Nov 7-9, 2022.
Open Source Implementations
There are two Open Source implementations of Schematron for XSLT at GitHub:
- The skeleton implementation, which was the original code by Rick Jelliffe with numerous contributions. It is not expected that this code will be developed further.
- SchXslt [ʃˈɛksl̩t], a cleaner re-implementation by David Maus, very compatible with the skeleton. The initial version only took 704 lines of code!
- Other Schematron tools are at the Awesome Schematron repository.
Search the web for implementations of Schematron in your language of choice: from Ant to Scala; these typically host the XSLT scripts and look after housekeeping. (I currently do not recommend the libxml implementation.)
As well, several systems are available which support Schematron embedded in other Schema languages: for example, the Apache Daffodil system allows Schematron embedded in the XSD-subset DFDL.
Open Standard
The international standard is ISO/IEC 19757-3. Available for $300 AUD.
The schemas for Schematron from the international standard are Open Source/Open Standards: they are available at this GitHub site.
- Release comments on the 3rd edition, 2020
- Release comments on the 2nd edition, 2016
ISO Schematron is an open standard, in the sense of the IEEE, ISOC, W3C, IETF, IAB, Foundation for Free Information Interchange (FFII), Free Software Foundation Europe (FSFE), requiring a formal, open process and unencumbered use.
In September 2022, a new ISO working group was established to prepare a new edition of ISO Schematron, due for 2025. There are numerous proposals for enhancements. To comment on proposals, or to raise issues and make your own proposals for the ISO Schematron standard, use the GitHub Issues facilities on Tony Graham's schematron-enhancement-proposals.
Open Licensing
In the long run, I think Schematron may well be the XML project’s greatest technical legacy to the world.
Simon St Laurent, Technical Journalist and O’Reilly Editor, xml-DEV list, 19 May 2016
Community
Schematron is remarkable in how few questions people have about it. This is because the language is so small, and many questions people might have are actually XPath questions.
- Andrew Sales hosts the Schematronist mail-list.
- Betty Harvey's schematron-love-in mail list is no longer active but has archives.
- The annual XML Prague conference hosts a Schematron user group day or half-day pre-conference
- The annual Balisage (USA) conference typically has several papers presented on Schematron topics and implementation experience
Schematron Topics
Pesky Humans
Fundamental Concepts
- Fundamental Structural Patterns Bolognese
- Highly Generic Schemas
- Quasi-static and quasi-dynamic constraints
- Islands of validity
- Standard severity levels
- The most common programming error with Schematron
- From Grammars to the Schematron (PDF) (1999) - the first public presentation of Schematron, at ASCC, Academia Sinica, Taiwan, explained in terms of cohesion and coupling.
Understanding Assertions
Performance
Document Metrics and Testing
Converting XML Schemas to Schematron
xsd2sch
It may seem impossible, or mad, to attempt to convert XSD to Schematron; or, rather, to partly implement XSD in XSLT 2 through Schematron. In 2007-2008, JSTOR funded an exploratory project at Allette Systems to write such a converter from XSD to Schematron, for a large subset they specified.
The techniques we developed were detailed in a long series of articles on the O'Reilly website: Converting XML Schemas to Schematron, which are now collected here with the related articles from the same time.
The XSLT2 code is available on GitHub under the MIT license.
Software
Current
- SchXslt - David Maus's Schematron engine in XSLT
- XSLT 1,2,3
- uses same pre-processors as the skeleton (below)
- potentially faster than skeleton, especially for documents sparsely validated by patterns with few rules and few assertions?
Legacy
- Schematron skeleton - Rick Jelliffe’s Schematron engine in XSLT
- XSLT 1,2
- not actively maintained now
- potentially faster than SchXslt e.g. for small-medium documents intensely validated by complex patterns with lengthy chains of interactive rules with many assertions?
Ancient
- Topologi Schematron Validator - 20-year-old desktop application in VB
- Pre-ISO Schematron 1.6
- Free to use, not open source
- Supports xsd, RELAX NG, DTD, schematron in XSD or RELAX NG
Schematron extended
Beyond Schematron
RAN 乱 - a modernized XML for parallel parsing
Apatak - streaming validation of arbitrary segments
One good bubble deserves another. If RAN is a markup language which can be divided into separate fragments and parsed by separate threads, it also needs a validation language that can work on arbitrary segments without requiring content.
Apatak has some similarities in its pairwise approach to 2002's one-element Hook schema language.
Feature Grammars - a little language for feature extraction
Feature extraction discovers some general property of an XML document, to direct subsequent processing. Is it a New Zealand document or a Fijian? Does it use the old tags or the new ones? Is it a tax return with no income? Is it a form where the person claims to be both single and married? This can certainly be done with Schematron and SVRL.
However, Feature Grammars provides a simpler and more direct language and XSLT for it.
PRESTO - all documents; each grain; any formats; every URL
“All documents, views and metadata at all significant levels of granularity and composition should be available in the best formats practical from their own permanent hierarchical URIs.”
Thought patterns and schema languages
XML beyond XML
- Rapid Access Notation (2024) - remove barriers to parallel lexing/parsing
- The X Refactor (2018) - XML ecosystem refactored into 5 layers
- XMON combining XML and JSON (2017) - allow structs in start-tags
- Editor's Concrete Syntax (2002) - a lexical profile of SGML for coloring editors
Schema Languages real and imagined
- Feature Grammars - a little language for extracting general features of documents
- Hook - a one-element schema language using partial ordering
- RAN Apatak - streaming validation of arbitrary segments
- SHRVL - Schematron Hierarchical Report View Language - nested version of SVRL
- Probabilistic schemas, hidden Markov models, neural nets for XML
- Sugar-Free XSD - resolved XSD subset with Schematron validator
- Lightweight schemas above structs - inline Attribute Grammars in PIs
- XML Notation Schemas (1999) - a framework with pluggable mini-validators
- Weak Validation (1999) - the basics of "feasible validation" later implemented
- Using XSLT as a validation language (1999) - the start of the line of thinking that led to Schematron
- Family Tree of Schema Languages for XML (2007) (PNG)
Computer Languages, Libraries
Updates
- What’s in Java 18 for XML Developers?
- What's in Java 17 for XML Developers?
- What's in Java 13-16 for XML Developers?
- What's in Java 11 for XML Developers?
- What's in Java 10 (and 9) for XML developers?
- XPath 3.1 adds Maps and Arrays, and new operators '!' '?' - so ~ LISP?
- Overview of Rust and Pony
- Can Intel ISPC help stagnant C get its mojo back?
- The Fastest Growing Programming Language of 2018?
- Using C++ Intrinsic Function For Pipelined Text Processing
Issues