Related link: https://simonstl.com/articles/sanity2/
Sane XML
Is XML driving you insane? It shouldn’t, and it doesn’t have to. Sanity is within reach, if you’re willing to discard a lot of junk and take a look at some tools that fit XML neatly.
Back in 1996, the XML effort began, focused on creating a subset of SGML that would be easier to work with and could finally reach a much wider group of developers and users.
Today, in 2004, XML is far more tangled than SGML ever dreamed of being. Even if you include other SGML-related ISO projects, like HyTime, in the mix, XML has far outstripped its supposedly complex parent. In practice, most users focus only on a tiny subset of the capabilities that standards bodies and vendors have provided, but choosing a subset that doesn’t inflict major pain over the course of a project is still difficult.
Over three years ago, a group of developers in which I participated proposed “Common XML”, a subset of XML 1.0 and Namespaces in XML. We thought it trimmed the fat pretty reasonably and enhanced the interoperability that had been compromised by several design decisions in XML 1.0 itself. In practice, I think we got things mostly right, as developers who work with XML tend to stick to the parts whose use we encouraged, and seem to have gotten the message that some of the pieces we described as extensions may or may not work as expected across applications. (I don’t credit Common XML with making any changes; it just codified practices people have largely found on their own.)
Today, the XML landscape is far more complex, with specifications good and bad littering the computing world. One of the most bloated, W3C XML Schema, has dominated the tools world despite interoperability and complexity issues. Thanks in large part to early support from vendors, this collection of issues masquerading as a schema language continues to dominate the XML world - and, in my opinion, makes the cost of using XML much higher for both vocabulary creators and consumers of those vocabularies. W3C XML Schema is only one of a number of complicating specifications from the W3C, and the W3C is starting to find itself troubled by the additions it made to SGML.
Developers don’t need these headaches, though they may feel trapped by currently available tools, and many of them haven’t heard that there are in fact alternatives to W3C XML Schema. Using XML shouldn’t be a mind-bending experience, and it’s possible to discard most of W3C XML Schema and still get work done - even get more and better work done.
The key to this sanity is a strict focus on XML and XML documents. Stop pretending that these things have object hierarchies, and stop hacking around the conflicts between object hierarchies and document realities with broken tools like substitution groups and keys. Focus on the documents themselves and the structures you’d like to have in those documents, and there’s a chance you’ll produce documents that are a pleasure, rather than a burden, to work with. You can build schemas using this understanding of documents with RELAX NG, a schema language that describes document structures, not type structures abstracted on top of document structures.
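To make "document structures, not type structures" a little more concrete, here is a minimal sketch. It is not taken from the presentation or the book, and the article, title, and para names are hypothetical, invented purely for illustration. The schema is written in RELAX NG's XML syntax; the Python around it (standard library only) just parses the schema and lists the element names it declares, to show that the schema nests the same way the documents it describes do. It does not validate anything, which would take an actual RELAX NG validator.

```python
import xml.etree.ElementTree as ET

# The RELAX NG structure namespace.
RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Hypothetical schema for an illustrative vocabulary (assumption, not from
# the article): an <article> holds one <title> followed by one or more <para>.
SCHEMA = """\
<element name="article" xmlns="http://relaxng.org/ns/structure/1.0">
  <element name="title"><text/></element>
  <oneOrMore>
    <element name="para"><text/></element>
  </oneOrMore>
</element>
"""


def element_outline(node, depth=0):
    """Return indented names of the <element> patterns found under node."""
    lines = []
    if node.tag == f"{{{RNG_NS}}}element":
        lines.append("  " * depth + node.get("name"))
        depth += 1  # patterns nested inside this element sit one level deeper
    for child in node:
        lines.extend(element_outline(child, depth))
    return lines


schema_root = ET.fromstring(SCHEMA)
outline = element_outline(schema_root)
print("\n".join(outline))
```

The same hypothetical schema in RELAX NG's compact syntax would read: element article { element title { text }, element para { text }+ }. Either way, the schema looks like the document, with no separate layer of named types between the two.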
I gave a presentation last week on how to use RELAX NG to create schemas which work with W3C XML Schema tools - you don’t have to give up compatibility with current tools to escape the complexity. There’s plenty of information at XML.com to get you started, as well as a new O’Reilly book that’s also available online.
Take a look at RELAX NG, and start using it where you can. Start by writing new schemas in RELAX NG, and convert them to W3C XML Schema later if you need to. Ask other developers for schemas in RELAX NG format. Even the W3C, purveyor of W3C XML Schema, has found RELAX NG to be useful.
XML was never meant to be complicated. You shouldn’t have to buy a continuous stream of books, even O’Reilly books, to get your work done using XML. (Given the state of the XML book market, it seems clear that the treadmill has exhausted people.) While you’ll undoubtedly still find data modeling a challenge, RELAX NG will let you focus on your information structures rather than on the intricacies of a bloated schema specification half-hidden by tools.
What else about XML makes you crazy?