Laura Dawson: February 2009
Taxonomies and Starting With XML
Laura Dawson
February 25, 2009
This is an excerpt from a blog post I wrote last week on taxonomies and chunking.
Last October, the StartWithXML team wrote a post called "To Chunk or Not To Chunk," where we discussed tagging and infrastructure issues, and a discussion ensued about what happens when you don't know what you'll be using chunks for. How do you tag those?
Later, in our StartWithXML One-Day Forum, we included a presentation on tagging and chunking best practices, where it was pointed out that no taxonomy for chunk-level content currently exists.
We have taxonomies for book-level content. These include formalized code sets such as the Library of Congress subject codes, the BISAC codes, and the Dewey Decimal System, among others. There are also informal code sets, like the tag sets on Shelfari or LibraryThing. There are proprietary taxonomies at Amazon and B&N.com that enable effective browsing.
But nothing like this exists for sub-book-level content. It's never been traded before. We've never really needed a taxonomy for it before.
Other industries that traditionally distribute "chunks" have their own taxonomies that might prove useful in building a book-chunk schema. These include the IPTC news codes, which identify the content of a particular news story -- that's the closest analogy I can find for small gobbets of content that require organization.
Industries have proprietary taxonomies to identify certain concepts -- culinary arts, music, agriculture, engineering, the sciences, literature and criticism, education, and on and on and on. But these do not necessarily identify concepts within a book.
Some might argue that we don't necessarily need taxonomies -- why can't we use natural-language search and the semantic Web to "bubble up" the "right" concepts? I'd argue that words don't always mean what we think they mean. A classic example from my library days is the term "mercury." That could mean the planet, the car or the element. Proponents of semantic search would say that the context in which "mercury" is mentioned should take care of defining that term. I'd say context settles the meaning in about 50 percent of cases, not reliably enough to get it right 75 to 100 percent of the time.
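To make the "mercury" problem concrete, here is a minimal sketch in Python of the difference between matching on words and matching on taxonomy codes. Everything in it is an assumption made for illustration: the `Chunk` class, the sample text, and especially the codes (such as `CHEM.ELEMENT.MERCURY`) are invented here and do not come from BISAC, IPTC, or any other real code set.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Chunk:
    """A sub-book-level piece of content plus the taxonomy codes assigned to it."""
    chunk_id: str
    text: str
    codes: Set[str] = field(default_factory=set)


# Hypothetical chunks and hypothetical codes, made up for this example only.
CHUNKS = [
    Chunk("c1", "Mercury is the closest planet to the sun.", {"ASTRO.PLANET.MERCURY"}),
    Chunk("c2", "The dealer had a used Mercury sedan on the lot.", {"AUTO.MAKE.MERCURY"}),
    Chunk("c3", "Mercury is a liquid metal at room temperature.", {"CHEM.ELEMENT.MERCURY"}),
]


def keyword_search(chunks: List[Chunk], term: str) -> List[str]:
    """Return the ids of chunks whose text contains the term, ignoring case."""
    term = term.lower()
    return [c.chunk_id for c in chunks if term in c.text.lower()]


def taxonomy_search(chunks: List[Chunk], code: str) -> List[str]:
    """Return the ids of chunks that were tagged with the given code."""
    return [c.chunk_id for c in chunks if code in c.codes]


if __name__ == "__main__":
    # The word alone cannot tell the planet, the car and the element apart.
    print(keyword_search(CHUNKS, "mercury"))
    # A code assigned at tagging time picks out a single meaning.
    print(taxonomy_search(CHUNKS, "CHEM.ELEMENT.MERCURY"))
```

The keyword search returns all three chunks, while the code lookup returns only the one about the element. The catch, of course, is that someone has to assign those codes in the first place, which is exactly the gap at the chunk level.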
My original post gets into more detail about why taxonomies are important search tools, and how the digitization of books requires a good taxonomy ... and who should do it.
Tools of Change for Publishing is a division of O'Reilly Media, Inc.
© 2009, O'Reilly Media, Inc. | (707) 827-7000 / (800) 998-9938