XML-bashing seems to have become a semi-popular pastime of late… of the many critiques I’ve come across, this presentation is one of the best. Here are a few hopefully reasonable comments addressing the general anti-XML sentiment that’s floating around, and this critique in particular.
Up front… I’m not particularly an XML advocate; I’ve been involved in none of the XML specifications or working groups. However, while I share some of the same feelings that somehow XML is rather “grotty,” I’ve developed what I hope is a reasonable position on the matter.
Some of Aaron’s arguments are pretty good, but some rest on a few assumptions and philosophical positions that are, IMHO, erroneous.
What Technology Should (and Shouldn’t) Try to Achieve
All instances of technology have one meta-purpose: to accomplish or achieve some function, feature, or design requirement. That’s it. It’s not the job of a technology to be beautiful, aesthetically pleasing, etc. In fact, there’s *no such thing* as beautiful or aesthetically pleasing technology. We technologists are prone to thinking that technology can have these qualities because we “feel” that some technologies do; however, this is a trap — one that we technologists often fall into.
A technology should accomplish what it is designed to accomplish in a reasonably efficient manner, with minimal cognitive overhead. Let’s break that down: technology should be designed to accomplish some purpose. It should not attempt to accomplish things for which it is not explicitly and specifically designed. (A desire to make our artifacts very general is another pervasive trap that we techies fall into.) It should fulfill its design in a reasonably efficient manner; this means that it shouldn’t be obviously and grossly inefficient, i.e., some other design should not be able to fulfill the specific requirements in a significantly more efficient manner. Efficiency is a loosey-goosey term, but it can mean computational efficiency, storage or bandwidth efficiency, ease-of-use, or any combination of these and other types of efficiency. Efficiency is likely to be domain-specific, so the statement needs to be interpreted in the context of the particular application. Minimal cognitive overhead means that the simplest design that achieves the design requirements reasonably efficiently wins.
All other considerations about a technology are illusory, insignificant, unimportant. A technology that does what it is supposed to in an efficient manner with no more than a minimum amount of complexity is “good enough” — and there’s no such thing as “better than good enough.” The mistaken belief that there is such a thing is what leads to a chronic and puzzling phenomenon in the marketplace: technologies that are deemed “best of breed” (i.e., exceed their design requirements on somebody’s subjective quality assessment) *ALWAYS* underperform “inferior” but adequate technologies in the market. (NeXT, Beta, Objective C, the Mac, Be, Newton, etc. etc. etc.)
Aaron falls into this trap when he describes XML as “technologically terrible.” He’s expressing an aesthetic opinion dressed up as a technological argument. The question we should ask when evaluating XML (or any technology) is: “can something else do as good or better a job on the relevant / important dimensions?” The answer for XML *might* be “yes”, but in the absence of any compelling evidence to the contrary it’s probably “no.”
“Does XML Suck?” Revisited
Aaron lists “verbosity” as one of the problems with XML. He’s not alone in that complaint. However, this criticism is off base for several reasons. First, it’s important to distinguish between XML-the-syntax and XML-the-data-structure. Complaints about syntax are, generally, pretty silly: two otherwise equivalent syntaxes for something should be considered the same, and a range of techniques exist for reducing the verbosity of XML (including judicious use of namespaces and schemas, as well as binary and indexed representations). Complaining about XML’s verbosity is a generalization from certain bad examples of XML. And some researchers at IBM Almaden about two years ago [lost the ref, anybody] showed that a reasonably efficient XML representation of something was, IIRC, the same order of size as the minimal representation / encoding carrying the same structural and semantic information. That is, with appropriately efficient use of namespaces, etc., XML will be less than 10x the minimal compressed size of the same info under any non-lossy compression scheme. In my experience, differences of that order can largely be ignored in almost all systems. (It’s the O(n^2), O(n!) etc. stuff that we’ve got to worry about.)
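The compression point is easy to illustrate for yourself. Here’s a minimal sketch (with made-up records; this is not a reproduction of the IBM result, just a demonstration that compression shrinks the gap between a verbose XML encoding and a terse one carrying the same information):

```python
import zlib

# The same records, once as verbose XML and once as a terse
# comma-separated encoding carrying identical information.
records = [("Alice", 30), ("Bob", 25), ("Carol", 41)] * 100

xml = "<people>" + "".join(
    f"<person><name>{n}</name><age>{a}</age></person>" for n, a in records
) + "</people>"
terse = "\n".join(f"{n},{a}" for n, a in records)

xml_z = len(zlib.compress(xml.encode()))
terse_z = len(zlib.compress(terse.encode()))

print(len(xml), len(terse))  # raw sizes: XML is several times larger
print(xml_z, terse_z)        # compressed sizes: the gap narrows sharply
print(xml_z / terse_z)       # ratio comfortably under 10
```

Because the markup is highly repetitive, a dictionary-based compressor like zlib’s makes most of the tag overhead disappear, which is exactly why raw byte counts are a misleading measure of XML’s cost.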
Aaron also emphasizes that XML isn’t the most human-readable representation. But that assumes XML is intended to be read / written by humans. It’s tempting to make that assumption (we make it because HTML is often read / written by humans), but IMHO it’s incorrect. Just as *more* HTML is created / processed by software than by people, so too (and even more so) is more XML created / processed by software than by people. The nice thing about both is that they *can* — when necessary — be processed by humans. XML represents a nice tradeoff between human readability and efficient machine representation.
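That tradeoff is visible in practice: the fragment below is legible to a person in a pinch, but it’s the kind of thing software normally produces and consumes. A minimal sketch using Python’s standard-library parser (the element and attribute names are invented for illustration):

```python
import xml.etree.ElementTree as ET

# A fragment a human *can* read, but a machine processes trivially.
doc = "<order id='42'><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>"

root = ET.fromstring(doc)
total_qty = sum(int(item.get("qty")) for item in root.findall("item"))
print(root.get("id"), total_qty)  # order id and summed quantity
```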
Aaron’s arguments about complexity sound reasonable on the surface. However, complexity is a tough thing to pin down. In computer science we have good (or at least adequate) tools for analyzing and understanding computational complexity, e.g., time-space tradeoffs and algorithmic complexity, and information theory gives us some tools for dealing with information complexity. But we have very poor or no tools for analyzing and quantitatively addressing the dynamic complexity of component interactions in systems, the representational complexity of data structures, the expressivity of languages, etc. I’ve spent over a year trying to create a theory of the first of these (compositional complexity in software architectures), and let me tell you, complexity is a complex notion. ;-)
Representational complexity and expressivity are even less studied and less understood areas, and while Aaron may in fact be right, his argument isn’t well supported. And there are hints from information theory — such as the size of XML relative to the theoretic optimum — that suggest it’s wrong.
If Aaron can state exactly what he means by “complexity” and quantify / generalize his argument, it would be significant not only as XML criticism but as an important result in computer science.
The “acronym proliferation” problem is very real, but it’s a function of where this technology is in its lifecycle and the amount of attention it has received. It’s not surprising that there’s a “fan-out” of overlapping applications / standards / etc. related to XML — it’s relatively early, very general, and lots of people are trying to do stuff. That leads to quite a bit of noise and frustration but — inevitably — there will be a “fan-in” to a few general, standard tools for various things. Aaron even recognizes that this is happening: “Even here, the situation is improving.”
The bottom line is this: it *might* be possible to design a similar representational mechanism that accomplishes all the things that XML accomplishes — i.e., multidimensional reference structures with arbitrary attribution and strong typing… But *today* there are no existence proofs of such alternatives and, indeed, I believe that if there were they would strongly resemble XML except in the trivial details. In the absence of proposals for such alternatives, it would seem that criticizing XML is a rather empty exercise.
And Aaron recognizes the most important argument *for* XML — its socioeconomic benefits. “Everybody’s doing it” is a very *good* argument for any technology: per Metcalfe’s law, systems that can communicate through such a mechanism grow in value with the square of the number of components. Anyone using idiosyncratic technologies to accomplish some or all of the same things actually inhibits the overall growth of value of the system.
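The Metcalfe’s-law arithmetic is simple enough to sketch. Counting pairwise links (n·(n−1)/2; the constant factor doesn’t matter for the argument), doubling the number of components that speak a common format roughly quadruples the number of possible connections:

```python
def links(n: int) -> int:
    """Number of pairwise connections among n interoperating components."""
    return n * (n - 1) // 2

# Doubling participation roughly quadruples potential connectivity,
# which is why "everybody's doing it" is a real technical argument.
print(links(10), links(20))
```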
What do you think? Does XML suck? Is it horribly inefficient? Are there better alternatives that accomplish the same thing?