Tools: February 2009
Taxonomies and Starting With XML
Laura Dawson
February 25, 2009
This is an excerpt from a blog post I wrote last week on taxonomies and chunking.
Last October, the StartWithXML team wrote a post called "To Chunk or Not To Chunk," where we discussed tagging and infrastructure issues, and a discussion ensued about what happens when you don't know what you'll be using chunks for. How do you tag those?
Later, in our StartWithXML One-Day Forum, we included a presentation on tagging and chunking best practices, where it was pointed out that no taxonomy for chunk-level content currently exists.
We have taxonomies for book-level content. These include formalized code sets such as the Library of Congress subject codes, the BISAC codes, and the Dewey Decimal System, among others. There are also informal code sets, like the tag sets on Shelfari or LibraryThing. There are proprietary taxonomies at Amazon and B&N.com that enable effective browsing.
But nothing like this exists for sub-book-level content. It's never been traded before. We've never really needed a taxonomy for it before.
Other industries that traditionally distribute "chunks" have their own taxonomies that might prove useful in building a book-chunk schema. These include the IPTC news codes, which identify the content of a particular news story -- that's the closest analogy I can find for small gobbets of content that require organization.
Industries have proprietary taxonomies to identify certain concepts -- culinary arts, music, agriculture, engineering, the sciences, literature and criticism, education, and on and on and on. But these do not necessarily identify concepts within a book.
Some might argue that we don't necessarily need taxonomies -- why can't we use natural-language search and the semantic Web to "bubble up" the "right" concepts? I'd argue that words don't always mean what we think they mean. A classic example from my library days is the term "mercury." That could mean the planet, the car or the element. Proponents of semantic search would say that the context in which "mercury" is mentioned should take care of defining that term. I'd say that's true in about 50 percent of all cases, but not reliably enough to cover 75 to 100 percent of them.
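To make the "mercury" problem concrete, here is a minimal sketch of chunk-level tagging against a controlled vocabulary. Everything in it is an assumption for illustration: the taxonomy codes are invented (they are not BISAC, IPTC, or Library of Congress codes), and the `Chunk` class and helper functions are hypothetical names, not part of any existing tool.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary. These codes are invented for this
# example and do not come from BISAC, IPTC, or any other real scheme.
TAXONOMY = {
    "SCI.ASTRO.PLANET.MERCURY": "Mercury (planet)",
    "TRANS.AUTO.MAKE.MERCURY": "Mercury (car)",
    "SCI.CHEM.ELEMENT.HG": "Mercury (element)",
}


@dataclass
class Chunk:
    """A sub-book-level piece of content with its taxonomy codes."""
    chunk_id: str
    text: str
    codes: set = field(default_factory=set)


def tag(chunk, code):
    """Attach a code to a chunk, rejecting codes outside the vocabulary."""
    if code not in TAXONOMY:
        raise KeyError(f"unknown taxonomy code: {code}")
    chunk.codes.add(code)


def find_by_keyword(chunks, word):
    """Plain keyword search: matches the word wherever it appears."""
    return [c.chunk_id for c in chunks if word.lower() in c.text.lower()]


def find_by_code(chunks, code):
    """Taxonomy search: matches only chunks tagged with the exact concept."""
    return [c.chunk_id for c in chunks if code in c.codes]


chunks = [
    Chunk("ch1", "Mercury completes an orbit of the sun in 88 days."),
    Chunk("ch2", "The Mercury was restored with its original chrome trim."),
    Chunk("ch3", "Mercury is liquid at room temperature."),
]
tag(chunks[0], "SCI.ASTRO.PLANET.MERCURY")
tag(chunks[1], "TRANS.AUTO.MAKE.MERCURY")
tag(chunks[2], "SCI.CHEM.ELEMENT.HG")

print(find_by_keyword(chunks, "mercury"))           # ['ch1', 'ch2', 'ch3']
print(find_by_code(chunks, "SCI.CHEM.ELEMENT.HG"))  # ['ch3']
```

The keyword search returns all three chunks because the word alone cannot tell planet from car from element, while the code lookup returns only the chunk about the element. The hard part, of course, is agreeing on the vocabulary in the first place.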
My original post gets into more detail about why taxonomies are important search tools, and how the digitization of books requires a good taxonomy ... and who should do it.
Virginia Open Sourcing Physics Textbook ("Flexbook")
Andrew Savikas
February 18, 2009
I was part of a brief Twitter exchange recently with Cengage's Ken Brooks about the cost of textbooks:
kenbrooks: @doctorow #toc That depends entirely on the type of book. A K-12 reading program costs $millions.
andrewsavikas: @kenbrooks not necessarily. See ck12.org
kenbrooks: @andrewsavikas Talk to McGraw Hill or Pearson about basal reading programs. The intricacies are staggering. #toc
I like Ken a lot personally (and respect him a ton professionally), and I have no reason to doubt that it does take millions to develop many educational programs. But my reference to ck12.org (whose founder, Neeru Khosla, keynoted at TOC 2008) was because if it does cost that much, then something's wrong with the system, and that's not likely to change without the work of groups like ck12.
In fact, Virginia is already in the process of developing an open-source "flexbook" for physics using the ck12 platform:
Secretary of Technology Aneesh Chopra and Secretary of Education Tom Morris today announced the selection of thirteen individuals to form a core team to pilot the development and release of an open-source physics "flexbook" for Virginia. This electronic material will focus on high school physics and contain contemporary and emerging 21st century physics and modern laboratory experiments.
The Virginia Physics "Flexbook" project is a collaborative effort of the Secretaries of Education and Technology and the Department of Education that seeks to elevate the quality of physics instruction across the Commonwealth by allowing educators to create and compile supplemental materials relating to 21st century physics in an open-source format that can be used to strengthen physics content. The Commonwealth is partnering with the Palo Alto, California-based non-profit, CK-12 on this initiative as they will provide the free, open-source technology platform to facilitate the publication of the newly developed content as a "flexbook" -- defined simply as an adaptive, web-based set of instructional materials.
"We need transformational ideas to ensure all Virginians are educated to compete in an increasingly competitive global economy," said Secretary Chopra. "This pilot initiative is a step in the right direction to introduce our students to contemporary physics topics and lab materials at no additional cost to the taxpayers or students," added Secretary Morris.
There is certainly a place for the investment-intensive educational publishing programs that only a firm with the resources of Cengage or Pearson or McGraw-Hill can provide. But there's also enormous opportunity to try new models that take advantage of the kind of collaboration that underpins all of academia to develop and distribute quality learning material for students at lower costs. (BTW, ck12 is hiring.)
Video: Android meets Eink
Andrew Savikas
February 13, 2009
Keeping with the "labs" theme for recent posts, via a tweet from George Walkley:
Lots of talk about devices at TOC - now just saw this, Android + e-ink https://vimeo.com/3162590 #toc
The guys at MOTO labs have hacked together a prototype showing Google's Android operating system running on an e-ink display:
Android Meets E Ink from MOTO Development Group on Vimeo.
The "O'Reilly Bump" and Bookworm
Andrew Savikas
February 12, 2009
During his TOC Keynote, Tim O'Reilly talked about how the status he confers through "retweets" on Twitter is really just another form of publishing, not much different from the status we confer on authors by publishing them, or speakers by featuring them (especially at multiple conferences), or hackers by inviting them to Foo Camp.
On the Web, the effects are easily measured, and Liza Daly has a post over at O'Reilly Labs talking about the bump Bookworm got from the association with O'Reilly. Her graph tells the main story, but digging deeper reveals some notable nuggets (emphasis in the original):
Because of this integration [with Stanza], iPhone and iPod Touch users account for 10-20% of all visitors to Bookworm on any given day
Photos from New York Times R&D Lab
Andrew Savikas
February 12, 2009
Nick Bilton was a hit yesterday at the TOC Conference, and during his keynote he talked about what they're working on with content at the NYT R&D Lab. Nick was kind enough to give a few of us a private tour earlier this week, and here are some photos from the trip:
Open XML API for O'Reilly Metadata
Andrew Savikas
February 10, 2009
In addition to Bookworm, O'Reilly Labs now includes an RDF-based API into all of O'Reilly's books:
Most publishers are familiar with the ONIX standard for exchanging metadata about books among trading partners. Anyone who's actually spent time working with ONIX knows that its syntax is abstruse at best. While ONIX does use XML, there are more modern, more general, and more immediately comprehensible standards out there, particularly for the basic details like "author," "title," and "edition." One of those standards is RDF, or "Resource Description Framework." This experimental O'Reilly Product Metadata Interface (OPMI) exposes RDF for all of O'Reilly's titles, organized by ISBN.
If anyone onsite (or otherwise) puts anything interesting together with the data, we'll be happy to feature it here on the TOC Blog, just let us know in the comments.
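For anyone who wants a starting point, here is a rough sketch of reading basic details like title and author out of an RDF/XML record with nothing but the Python standard library. The sample record is a made-up placeholder, not actual OPMI output, and the use of Dublin Core `dc:title` and `dc:creator` elements is an assumption; check the vocabulary the interface really returns before relying on it. The `parse_records` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and Dublin Core elements.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Placeholder record for illustration only. The ISBN URN, title, and
# author are invented, and the real OPMI vocabulary may differ.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="urn:isbn:0000000000000">
    <dc:title>Example Title</dc:title>
    <dc:creator>Example Author</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def parse_records(rdf_xml):
    """Return {resource identifier: {"title": ..., "creators": [...]}}."""
    root = ET.fromstring(rdf_xml)
    records = {}
    # ElementTree addresses namespaced names as "{namespace-uri}localname".
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        about = desc.get(f"{{{RDF_NS}}}about", "")
        records[about] = {
            "title": desc.findtext(f"{{{DC_NS}}}title"),
            "creators": [e.text for e in desc.findall(f"{{{DC_NS}}}creator")],
        }
    return records


for identifier, details in parse_records(SAMPLE).items():
    print(identifier, details["title"], details["creators"])
```

Keying the result by the `rdf:about` identifier mirrors the "organized by ISBN" idea: each record can be looked up directly by the resource it describes. For anything beyond simple records, a dedicated RDF library would be a better fit than raw XML parsing.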
At TOC: Cory Doctorow to Publishers: Demand Option To *Not* Use DRM
Andrew Savikas
February 10, 2009
I knew Cory Doctorow would be a great wrap-up to the first-day morning keynotes at TOC, and he more than delivered.
He ended the keynote with a challenge to publishers: withhold digital content from any device or service that doesn't give you the option to exclude DRM. (For example, right now publishers cannot sell books on the Kindle or audio books on Audible without DRM.) He's proposing "Doctorow's Law" which I'm paraphrasing here from memory:
If someone takes something that belongs to you, and puts a lock on it that you don't have a key for, that lock isn't in your best interests.
We couldn't agree more, and it's a big reason we sell all of our ebooks (now more than 400) without DRM (and with a Kindle-compatible format that can be added manually to a Kindle), and why we don't enable DRM in our iPhone Apps either. I agree with Cory, and strongly encourage publishers to not use DRM at all for their digital content, but at a minimum, it should at least be a choice for a publisher to make.
Good Company Culture Comes in Small Packages
Kate Eltham
February 5, 2009
Common wisdom says that small companies are more nimble, responsive and adaptable than their larger cousins.
My personal experience reflects this. I've worked in large organisations -- FMCG corporates, international aid organisations and government -- and I've worked in small ones -- private consulting firms and small non-profits. In each case I've found that small enterprises outperform large ones when it comes to transformation. Smaller companies are faster to identify industry trends and respond to new business opportunities. They also punch above their weight on some forms of R&D, particularly business process innovation. Put simply, small companies are more fleet of foot.
But why?
We're seeing a lot of reports come through about how small publishers are responding to trends and opportunities. MediaBistro and The Christian Science Monitor have both reported small publishers are leading the charge when it comes to digitization. In his article, "E-book revolution favors the agile", Matthew Shaer said:
But it's not the bigger houses, such as Macmillan or HarperCollins, that are moving the fastest. Instead, some of the most extensive restructuring efforts are being undertaken in the independent publishing world, traditionally a hotbed for innovation and experimentation.
Soft Skull Press, Canongate, and Akashic are all good examples. Shaer also points out that publishing is emulating the music industry in this pattern and, I'd wager, other industries as well.
Again, I ask why?
The obvious reasons are the ones people usually point to. Smaller companies are like the canary in the coal mine. They are first to feel the effects of major shifts within an industry and may need to move faster to find solutions. On the other hand, small publishers also have an incentive to exploit technological efficiencies that might even up the playing field against big competitors.
Small size also helps with changing direction. This week Wheatland Press announced it is taking a publishing hiatus in 2009:
What this means is that I will publish no new books during 2009 (including Polyphony 7). I will continue to fill orders on existing titles and will keep those titles available through Amazon and Barnes & Noble.com ... I will explore ways to put Wheatland Press on a firmer financial footing including, but not limited to, seeking external funding via arts councils, seeking partnerships with other presses, etc. I hope the break will allow me to return to a regular publishing schedule in 2010.
On one level this could be regarded as just another volley of bad news from a publisher affected by global economic conditions. But it's worth noting that only a small publisher could make this kind of decision. HarperCollins and Random House can't make the choice to stop publishing books for a year to sort out their business model and make necessary changes. They can cut costs through staff layoffs and tightening budgets, but their operational overheads are way too large to ever get off the treadmill of publishing hundreds of titles a year.
Underneath it all, though, the one thing that has the biggest impact on a company's ability to transform is the one thing that almost never gets talked about in the publishing industry: organizational culture. Paul Biba of TeleRead, quoted in the Shaer article, hints at this but doesn't quite nail it down:
"In general, I'd say the big publishers tend to be really dinosaurs, intrigued by e-books but afraid of them ... [Younger readers] have grown up with a whole different way of looking at the world, and I don't think many publishers understand this. They think people are just sitting down in leather chairs and reading hardcopy books."
I'm not sure this is a fair characterization of publisher attitudes today, but I do think it alludes to a bigger problem that is stopping large publishers from embracing new opportunities.
Big trade publishers are fighting a losing battle against their own organizational cultures. The history of business is littered with examples of companies that couldn't transition from one paradigm to the next, not because they couldn't see the necessity, but because they couldn't undertake the necessary internal change.
The larger a company is, the harder organisational change is to effect. The big trade publishers are now subsidiaries of the largest media companies in the world with thousands of employees, hundreds of offices and decades of crusted-on beliefs, traditions and systems. Small teams, by virtue of scale, can change their organisational culture quickly, sometimes through shifts in personnel, other times by the sheer force of personality from a charismatic leader. In any case, smaller teams tend to adopt a tenacious, can-do, try-anything culture because they have to.
Organisational culture is the bedrock of performance. This, more than any problem of physical infrastructure or technical or financial systems, makes big publishers slow to adapt. Too slow, I fear, to survive the speed of change within the cultural and economic ecology of which they are a part.
New experiments are popping up, such as HarperStudio, which could be the exception that proves the rule. Only by hiving itself off as a separate, entrepreneurial unit within HarperCollins, with its own small-team culture, has HarperStudio been able to achieve the clear-eyed perspective and momentum to try really different and new ways of publishing.
Paul Biba may have called it right by using the word "dinosaur." After all, it was the small dinosaurs, with modern-day descendants still thriving, who made the successful adaptation that evolution requires. The big guys fell hard and fast and it's increasingly rare to find any evidence of their impact on us at all.
StartWithXML Research Report Now Available for Sale
Andrew Savikas
February 4, 2009
If you weren't able to attend the StartWithXML Forum last month in New York, the accompanying research report is available for sale. The report covers topics like:
- Where am I and where do I want to end up?
- How much benefit do I want to obtain from content reuse and repurposing?
- How much work do I want to do myself?
- How much time and money will this take?
When you purchase the report, you get it as our full eBook Bundle, including PDF, EPUB, and Kindle-compatible Mobipocket formats.
If you're ready for a deeper dive into XML, there are two very complementary tutorials lined up during next week's TOC Conference:
And if that's still not enough angle brackets for you, check out the Introduction to XML course from the O'Reilly School of Technology, which earns you four CEUs (Continuing Education Units) and a CEU letter from the University of Illinois Office of Continuing Education. Save $50 with discount code SWXML09.
Tools of Change for Publishing is a division of O'Reilly Media, Inc.
© 2009, O'Reilly Media, Inc. | (707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.