Related link: https://www.cafeconleche.org/oldnews/news2004May5.html
While writing a report on XML Europe for distribution at work, I couldn’t find the excellent notes taken by a former neighbor from my Brooklyn days, Elliotte Rusty Harold, so I e-mailed him to ask. (It turns out that he archives Cafe Con Leche news items on individual pages for each day of news; for example, his notes on the second day of the conference are at https://www.cafeconleche.org/oldnews/news2004April20.html.) I also suggested that he add ID values to his block-level elements so that documents like my report could link to his discussions of specific talks at the conference.
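To make the suggestion concrete, here is a minimal sketch of what automated ID assignment could look like. It is not what Elliotte or anyone else actually runs; it assumes the page is well-formed XML with no namespace on its elements, and the function name `add_ids`, the `BLOCK_TAGS` set, and the `b1`, `b2`, ... naming scheme are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of which elements count as "block-level" here.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6",
              "li", "blockquote", "pre", "div"}

def add_ids(xml_text, prefix="b"):
    """Give every block-level element without an id a generated one."""
    root = ET.fromstring(xml_text)
    # Remember ids already in the document so we never duplicate one.
    existing = {el.get("id") for el in root.iter() if el.get("id")}
    counter = 0
    for el in root.iter():
        if el.tag in BLOCK_TAGS and el.get("id") is None:
            counter += 1
            new_id = f"{prefix}{counter}"
            while new_id in existing:  # skip over ids that are taken
                counter += 1
                new_id = f"{prefix}{counter}"
            el.set("id", new_id)
            existing.add(new_id)
    return ET.tostring(root, encoding="unicode")

print(add_ids('<body><h2>Talk</h2><p id="b1">kept</p><p>new</p></body>'))
```

Existing IDs are left alone, since other documents may already link to them. IDs generated by position like this will shift if blocks are inserted later, so a real tool would probably assign them once and store them in the source.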
I’ve written before in this forum (1, 2) about the value of adding ID values to HTML and XML block-level elements. As I wrote to Elliotte, “Simple automated ways to add genuinely useful metadata are few and far between, so I think it’s worth jumping on any we can find.” Tim Bray wrote an excellent essay on metadata with the assertion, catchy enough that Simon St. Laurent blogged it on the O’Reilly Network, that “there is no cheap metadata.” Tim contradicted himself, however, by listing some metadata that’s free: “filename, created/modified dates, who created it, what kind of file (HTML, Excel, PowerPoint), how big it is.” This metadata, although free, has definite value. The knowledge that Google’s number one hit for my search term is a four meg Word file and the number two hit is a 200K HTML file strongly influences my choice of which link to follow first, and providing criteria for making link traversal decisions is the whole point of link metadata.
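The "free" metadata in Tim's list really is nearly free to collect. The sketch below gathers most of it for a local file using only the Python standard library; the function name `free_metadata` and the dictionary keys are my own invention, the file type is guessed from the extension alone, and the file's creator is left out because reading ownership portably across platforms takes more than a few lines.

```python
import mimetypes
import os
from datetime import datetime, timezone

def free_metadata(path):
    """Collect metadata the file system already keeps for us."""
    st = os.stat(path)
    # Guess the kind of file from its extension only; no content sniffing.
    kind, _encoding = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "kind": kind or "unknown",
        "size_bytes": st.st_size,
    }
```

Calling `free_metadata("report.html")` on an existing file returns a small dictionary that a search interface could show next to each hit, which is exactly the kind of hint that helps a reader choose between a large Word file and a small HTML page.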
As Tim suggests, the best metadata comes from a paid staff making human judgments about what to add. I’ll call this “judgment-call metadata” to distinguish it from metadata generated with algorithms. This is expensive, but makes sense at a business like my employer’s because lawyers will pay extra for summaries of court decisions and for the ability to search legal cases using keywords from a carefully maintained taxonomy. But what about users without the kind of working budget that lawyers have?
Some useful metadata is still pretty cheap. I managed to convince Elliotte that the trouble of adding IDs to block elements was worth it. Larry Page and Sergey Brin, who didn’t start off with giant server farms but by doing academic research, identified and took advantage of a new kind of web metadata that was cheaper than human editors, and it certainly worked out well for them.
Google’s PageRank algorithm and the automatic addition of ID values are just two sources of inspiration to spur us to look for new sources of cheap metadata, or at least for new, inexpensive incentives for people to add judgment-call metadata. I’d love to see the semantic web movement more concerned with finding and generating usable metadata and less focused on what to do with that metadata come the revolution. FOAF files are fun, but demonstrating the potential value of the semantic web will require more metadata than information about which of our friends also have FOAF files.
A bit of lobbying might help. I’ve asked O’Reilly to include the Subject, Secondary Subject, and Topic values entered with each of these O’Reilly Network weblog postings in the RSS feeds about them. Is your blogging tool collecting more metadata about each entry than it includes in its RSS feeds? Why? Ask the people behind it.
What about new incentives for adding judgment-call metadata? Stephen Cayzer’s work at HP Labs (see his XML Europe paper), which demonstrates how better user interfaces can make the entry of metadata less trouble for the user, will hopefully inspire others to think more about acquiring good metadata and postpone some of their ideas about what to do with that metadata.
The success of javadoc compared with the overall slow progress of literate programming should also give us some ideas: why do the majority of Java programmers consider the trouble/payoff ratio for adding javadoc comments and tags to be low enough that they follow through and do it, while the developers who follow through with all the principles of literate programming are still a tiny minority? What sweet spot has the javadoc system hit that people bother to add this metadata to their code even though their code will compile and run just fine without it?
Where will new sources of inexpensive, parsable metadata come from?