StartWithXML is an industry-wide project to understand and spread the knowledge publishers need to move forward with XML. It's about the business issues driving the "why" of XML and the technical and organizational issues, strategies, and tactics underlying the "how" of getting started. There are four components to the project:
StartWithXML Blog:
Slides from "Essential Tools of an XML Workflow" Webcast
Mac Slocum
December 12, 2008
Laura Dawson has made her slides available from the recent TOC Webcast, "Essential Tools of an XML Workflow." A complete recording of the event will be posted here soon.
[TOC Webcast] Essential Tools of an XML Workflow
Mac Slocum
December 9, 2008
Tools of Change for Publishing, in conjunction with StartWithXML, will host "Essential Tools of an XML Workflow," a free webcast with presenter Laura Dawson, on Thursday, Dec. 11 at 1 p.m. Eastern (10 a.m. Pacific).
Webcast Overview
This webcast is for those publishers who have made the decision to pursue digital channels for their content. What tools are out there? What do all those acronyms mean? How can publishers implement new strategies without disrupting current workflows? Here we'll explore the alphabet soup of digital publishing, sort out the tools that are most useful, and help publishers find some solid ground.
A Correction!
Laura Dawson
November 26, 2008
Frank Grazioli, of Wiley, writes in to correct my last post about taxonomies:
Wiley has been exploring taxonomies for its travel content business; the cooking/psych/accounting spaces might be our next logical opportunities because the disciplines are well developed, specific, etc., that content is authored or edited in fairly controlled templates that map to our own XML content models and our belief in content models and XML has evolved that "lighter" and "more agile" are better than taggy and dense. As you so aptly point to the contextuality and "rigor" of taxonomies, these tools would allow our XML to "slip on the right jacket" for the occasion. I apologize if we led you to believe that we already have firm taxonomies in place for the three areas you specify--I wouldn't want readers/event guests to get that impression anyway.
Beyond the Tag Cloud
Laura Dawson
November 11, 2008
This is an excerpt from our research paper, which will publish in concert with the StartWithXML Forum on January 13th at the McGraw-Hill Auditorium in New York. Early bird discounting for BISG members is ending soon!
A good taxonomy is the backbone of your business -- it's how you sort your content. It allows for effective merchandising, effective marketing -- you can aim your content with the precision of a pool cue. It allows for inventorying your content -- so you know what you have ... and what you need. With your content tagged and organized, you know where everything is and how to deploy it.
Taxonomies are contextually sensitive and rigorous -- and in establishing your own, it helps to look at what other industries are doing. Wiley has adopted accounting and cooking and psychology taxonomies from those industries to organize information in its professional development titles. Educational publishers are increasingly arranging their textbooks around "learning objects" -- taxonomized pedagogical goals developed by educators themselves. Even the BISAC codes -- which are part of the ONIX system of organizing book information and therefore an XML-based taxonomy -- are developed very carefully and consensually among book industry professionals in monthly meetings.
An important aspect of taxonomy development is scope notes. Terms need definition and clarity around how they're going to be used. Documenting your taxonomy -- what you mean when you say "porcelain" (collectible china, dental work, household fixtures?), parent-child relationships between categories, and why you choose certain terms over others -- is important for the long term. Future editors and authors will need to know why your taxonomy has developed as it has.
Consistency in application is also crucial. Drop-down menus (as opposed to free-text fields) enforce structure and ensure that users don't come up with their own terms that pollute your taxonomy with duplicates or irrelevancies (or misspellings).
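As a rough illustration of the two points above (scope notes and a closed list of terms), here is a minimal Python sketch. The term labels, scope notes, and the `validate_tags` helper are all hypothetical, invented for this example; a real taxonomy would live in whatever system feeds your drop-down menus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    """One taxonomy term, documented with a scope note and an optional parent."""
    label: str
    scope_note: str
    parent: Optional[str] = None


# Hypothetical vocabulary: the labels and scope notes are illustrative only.
VOCABULARY = {
    "antiques": Term("antiques", "Top-level category for collectible objects."),
    "dentistry": Term("dentistry", "Top-level category for dental practice."),
    "porcelain (collectibles)": Term(
        "porcelain (collectibles)",
        "Collectible china; not dental work or household fixtures.",
        parent="antiques",
    ),
    "porcelain (dentistry)": Term(
        "porcelain (dentistry)",
        "Porcelain used in crowns and veneers.",
        parent="dentistry",
    ),
}


def validate_tags(tags):
    """Split tags into those found in the vocabulary and those that are not.

    This plays the role of a drop-down menu: free-text terms, duplicates in
    disguise, and misspellings end up in the rejected list.
    """
    accepted = [tag for tag in tags if tag in VOCABULARY]
    rejected = [tag for tag in tags if tag not in VOCABULARY]
    return accepted, rejected


if __name__ == "__main__":
    ok, bad = validate_tags(["porcelain (dentistry)", "porcelian"])
    print("accepted:", ok)
    print("rejected:", bad)
```

Keeping the scope note next to the term means a future editor can see what a label was meant to cover before applying it.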
An advantage to using XML is that you don't have to accomplish everything at once, perfectly, from the outset. You will not be able to tag your documents thoroughly right off the bat -- who can know everything in advance? The act of tagging is recursive, and depends on market and company needs. XML allows for this flexibility. Depending on how you envision chunking and re-use, you'll tag your documents differently with each iteration. Unlike the "fire and forget" model, iterative tagging means that your books are living documents.
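To make the idea of iterative tagging concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The element names (`book`, `chapter`, `subject`) and the sample content are assumptions made up for the example, not a recommended schema. A first pass carries only structure; a later pass layers subject tags onto the same document without disturbing what is already there.

```python
import xml.etree.ElementTree as ET

# First pass: structural tags only. Element names are hypothetical.
FIRST_PASS = """<book>
  <chapter id="c1"><title>Arriving in Paris</title><para>Getting to your hotel.</para></chapter>
  <chapter id="c2"><title>Eating Out</title><para>Bistros near the Marais.</para></chapter>
</book>"""


def add_subjects(xml_text, subjects_by_chapter):
    """Return a new XML string with <subject> tags added to the named chapters.

    Chapters that are not mentioned are left exactly as they were, so the
    function can be run again later as new needs emerge.
    """
    root = ET.fromstring(xml_text)
    for chapter in root.iter("chapter"):
        for subject in subjects_by_chapter.get(chapter.get("id"), []):
            ET.SubElement(chapter, "subject").text = subject
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Second pass: tag only the chapter we care about today.
    second_pass = add_subjects(FIRST_PASS, {"c2": ["restaurants", "Marais"]})
    # Third pass, some time later: tag another chapter.
    third_pass = add_subjects(second_pass, {"c1": ["airports"]})
    print(third_pass)
```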
Another Position: XML Alone is Not Enough
Mike Shatzkin
November 6, 2008
George Lossius, the CEO of Publishing Technology PLC, wrote a very thoughtful post about our StartWithXML project for the new UK blog, BookBrunch. He comments after a report on the presentation I did at Frankfurt about our project.
George's point is that XML "is not enough." Books will live in a larger world that also uses XML, and highly internal standards and procedures for XML use, whether internal to a company or internal to the book business, do not necessarily equip a publisher to live in the larger world of the semantic web.
We don't disagree with George's premise that XML can be used to position publishers better for the semantic web. The question for all publishers will be how much they can take on how fast, particularly in pursuit of models and opportunities that haven't really emerged yet. But the most forward-thinking always lead the target a bit, and George's post enumerates one aspect of that.
We urge our readers to check out George's post. And we encourage George to put his XML commentary right here on this blog; we're delighted to receive it.
For a Workflow Change, Support from the Top is Required
Mike Shatzkin
November 5, 2008
Last week Laura Dawson and I spoke about StartWithXML to a group of IT and operations people from publishers at the User Group meeting for Global Turnkey Systems, a company owned by one of our lead sponsors, Klopotek.
We got some great questions afterwards. On reflection, we realized that they touched an important theme: the need for CEO-level support for the change initiatives to put XML into the workflow. There are savings of time and money to be made by doing this, but that's not the immediate result. In the short run, the changes require more work, more effort, and, sometimes it would seem, generate a less desirable result.
This echoes what we've heard from Andrew Savikas of O'Reilly. Instead of characterizing the two elements of a publishing organization as "hard" (production, accounting, ops) and "soft" (editorial, marketing), Andrew says that for XML change they are "hard" and "harder." Trying to get the most creative people in a publishing company to do something that is "harder" requires a top-down understanding that doing it is important to the business.
That's why we asked David Young, the CEO of Hachette Books in the US, to deliver our keynote address. He'll be speaking on the topic "XML: Why Bother?" That's the question every CEO must answer to get the collaboration up and down an organization that large and systemic change requires.
Recommended Reading on XML and Publishing
Andrew Savikas
October 27, 2008
While clearing out some old files, I came across a folder of articles culled during research about three years ago, while I was building the case for increasing our use of XML for book production. If you're looking to take a break from the steady stream of terrifying financial news, here's a few hours of time well-spent on angle brackets. Much of this skews fairly technical (including actual math), but there's some useful context to an XML conversation:
When Word-to-XML conversions get nasty from Mike Gross at Data Conversion Laboratory. "Before you begin a conversion, look through your source Word documents to see how well they were formatted but be prepared you may be horrified with what you find."
From the Journal of Digital Information, a paper by Terje Hillesund, Many Outputs -- Many Inputs: XML for Publishers and E-book Designers. Terje takes a contrarian view on XML, though specifically calls out what many trade publishers primarily deal with as well-suited for XML: "For many typographically simple genres, like most present fiction, reuse has already proved to be relatively easy ... In the future, XML-based workflows will make re-use of many fiction genres even easier, as these visually and navigationally uncomplicated texts can be made into a variety of paper and electronic editions from the same XML document by use of style sheets ..."
The response to Hillesund from XML guru Norm Walsh, XML: One Input -- Many Outputs: A response to Hillesund. "Before considering the flaws in each of [his] arguments, it is interesting, if slightly incongruous to his arguments, to note that Hillesund's paper includes no less than four examples of the successful use of XML precisely for the publication of multiple output formats from a single input document."
A fascinating paper from 1998, On the Pagination of Complex Documents, which discusses the challenges inherent with automated pagination of the kind found in many XML-based rendering systems (as well as older systems such as LaTeX). "Using competitive analysis we show that, under realistic assumptions, not only first-fit but any online pagination algorithm may produce results that are arbitrarily worse than necessary. This explains why so many people are not satisfied with paginations produced by LaTeX if no manual improvement is done"
It hasn't been updated since 2005, but Choosing an XML editor, from Thijs van den Broek offers a nice survey of XML editors. "The study consisted of a literature search, surveys to identify user needs, current usage, existing editors, and (existing and desired) features of editors, as well as an evaluation exercise."
Here at O'Reilly our workflow is centered around DocBook XML, but DITA (Darwin Information Typing Architecture) is a more recent XML vocabulary, also designed primarily for technical information. IBM developerWorks has a nice overview, Introduction to the Darwin Information Typing Architecture. "This document is a roadmap for the Darwin Information Typing Architecture: what it is and how it applies to technical documentation. It is also a product of the architecture, having been written entirely in XML and produced using the principles described here."
Written from the perspective of a technical documentation group at Cisco, Low-Cost, Flat-File XML for the Masses is an interesting case study from a team committed to finding a way to use XML that was both better for writers and didn't require a large investment in new software: "You can realize the benefits of publishing from modularized XML, without the expense of an enterprise publishing system, by implementing the authoring environment on top of nothing more than your operating system's file system. Although this environment is not adequate for enterprise publishing needs, it is more than adequate for the needs small writing teams, businesses with a limited number of related products, proof-of-concept demonstrations, and even home users."
Can XML Help you Avoid a Disruptive Innovation?
Brian O'Leary
October 24, 2008
This semester, I'm fortunate to spend my Wednesday nights teaching management to students who are part of NYU's M.S. in publishing program. Although a significant share of the course is given over to management fundamentals, the students are for the most part already working in publishing, so they also look for connections between lessons learned and their real-world application.
One recent class was given over to "managing in periods of change" (always relevant, seemingly more so this semester). Part of the lesson includes a discussion of disruptive innovation, a term coined in the mid-1990s by Joseph Bower and Clayton Christensen to describe upstart innovations that grow to disrupt or destroy the business you are in.
Disruptive innovations typically start out as inferior ways to meet the needs of customers who are currently not served at all or who are over-served by existing options and are open to a simpler or cheaper option. Walking through this description, I was asked for a content-related example.
Maybe I do my best work on my feet (you'd have to ask the class), but I started to describe travel books. "People visit France," I said, "but not all of it. Maybe they want information on just the area around their hotel in Paris ... What's a good restaurant, a trendy bar, a place where you won't pay an arm and a leg for show tickets ...
"Today, you could get this information, but you might have to buy all of three or four different books to combine it. After that, you might go to the Web to get current information on the shows that are scheduled for the day you are in Paris. And then, you'd probably try to print maps to get you from your hotel to wherever you decided was interesting.
"Suppose instead, we created a travel database that you could search using criteria that mattered to you -- proximity to a hotel, a particular neighborhood, a time of year, your preference for trendy bars ... Zagat does this for its database, after all, and still makes printed guides. And maybe you'd buy just the parts you want, download them to your laptop or handheld and head to Paris, lighter, greener and better informed."
A structured approach to content development and management -- XML -- makes it possible to create and serve relevant searchable content.
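For readers who want to see what "relevant searchable content" might look like in practice, here is a minimal Python sketch of the travel example. Everything in it is hypothetical: the `guide` and `entry` elements, the attribute names, and the sample listings are invented for illustration, and a production system would sit on a real database rather than an in-memory string.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured travel content; names and entries are made up.
GUIDE = """<guide city="Paris">
  <entry type="bar" neighborhood="Marais"><name>Example Bar</name><tag>trendy</tag></entry>
  <entry type="restaurant" neighborhood="Marais"><name>Example Bistro</name></entry>
  <entry type="bar" neighborhood="Montmartre"><name>Another Bar</name><tag>trendy</tag></entry>
</guide>"""


def search(xml_text, entry_type, neighborhood, tag=None):
    """Return the names of entries matching a type, a neighborhood, and an optional tag."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.iter("entry"):
        if entry.get("type") != entry_type:
            continue
        if entry.get("neighborhood") != neighborhood:
            continue
        tags = {t.text for t in entry.findall("tag")}
        if tag is not None and tag not in tags:
            continue
        results.append(entry.findtext("name"))
    return results


if __name__ == "__main__":
    # "A trendy bar near my hotel in the Marais"
    print(search(GUIDE, "bar", "Marais", tag="trendy"))
```

The search is only possible because the neighborhood and the "trendy" label were captured as structure rather than buried in running prose.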
Someone said, rightly, "But that would hurt (print) book sales." I had to agree. Disruptive innovations fundamentally disrupt the old model. If you're in a market that will be disrupted, the choice isn't whether you get disrupted; it's whether you are one of the firms that disrupts.
Ultimately, XML won't help you avoid a disruptive innovation. Depending on the type of book you publish, XML could provide the vehicle that sponsors the disruption. The choice you make in considering XML (or, to pre-empt my friend bowerbird, some form of structured content) may be between staying with your existing business model until it runs out, or hastening its demise in pursuit of a blended mix of new revenue opportunities.
StartWithXML Survey Results Preview
Brian O'Leary
October 23, 2008
The survey for this project closed a little more than a week ago. We're continuing the analysis now, and we'll do a full briefing in early November, but here are some highlights:
- 165 publishing professionals responded
- the majority (60%) work in trade or consumer publishing, with a healthy cross-section of professional, school and academic publishers also represented
- almost half of those responding work at smaller firms (fewer than 100 employees)
- of those responding, about a third currently maintain digital files in an XML format that supports content re-use and reformatting
- while a quarter of the respondents report using their own (proprietary) DTD or schema, the majority of those responding are still thinking through what they want to do in this area
Our initial review suggests that many publishers are making progress in the use of XML as a content management tool, and many more have an interest in learning more about current and best practice before committing to greater use of XML or an alternative. While we feel that the research paper and the forum in January will address practice and future use, it's clear from the survey that publishers are already looking at this topic.
To Chunk or Not To Chunk?
Laura Dawson
October 16, 2008
This is excerpted from a column I wrote for the most recent issue of The Big Picture, my free newsletter about technology and the book industry.
As we're proceeding with Start With XML, I'm thinking a lot about chunking.
Chunking, at least as we're talking about it, means carving up your content into chunks and distributing those discrete pieces of it. Travel content (distributed over GPS, the web, and in book form) and recipes (distributed via Epicurious and AllRecipes.com as well as in book form) are the most obvious examples of this. Textbook publishing does this as well - certain assets can be used in the main text, in supplementary workbooks and lab manuals, as individual activities to be downloaded to an iPod, or embedded in e-books.
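Here is one way the carving-up could be sketched in Python, using the recipe case. The `cookbook` and `recipe` element names, the `source` attribute, and the sample recipes are assumptions invented for this example; the point is only that once content is tagged, each chunk can be lifted out as a standalone piece.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged cookbook; element names and recipes are made up.
COOKBOOK = """<cookbook title="Weeknight Dinners">
  <recipe id="r1"><title>Tomato Soup</title><step>Simmer tomatoes.</step></recipe>
  <recipe id="r2"><title>Noodle Salad</title><step>Boil noodles.</step></recipe>
</cookbook>"""


def chunk(xml_text, element_name):
    """Split a document into standalone XML strings, one per matching element.

    Each chunk records the title of the book it came from in a "source"
    attribute, so the piece can travel on its own without losing its origin.
    """
    root = ET.fromstring(xml_text)
    chunks = {}
    for element in root.iter(element_name):
        element.set("source", root.get("title", ""))
        element.tail = None  # drop the whitespace that followed it in the book
        chunks[element.get("id")] = ET.tostring(element, encoding="unicode")
    return chunks


if __name__ == "__main__":
    for chunk_id, xml_chunk in chunk(COOKBOOK, "recipe").items():
        print(chunk_id, xml_chunk)
```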
And as we talk about chunking, it's clear that there are certain types of content that don't immediately lend themselves to that kind of carved-up distribution. Novels, for example. Narrative nonfiction such as memoirs. Philosophical or political works, where tracing the author's thought from beginning to end is important.
The truth is, we may not quite know what will chunk readily and what will not. There are some blue-sky ideas right now - tagging content within narratives, to be pulled out later and stand on its own - but we just don't know yet if readers are interested in that kind of thing.
But publishers can't afford NOT to prepare for the unknown. There has never been uncertainty like this in publishing - uncertainty in stock prices and supply chain issues (paper prices, transportation/shipping costs, the costs of composition and conversion), uncertainty in revenue-generation, uncertainty as to who's going to buy what in which format - and it's not going to get any clearer for quite some time.
And you can't chunk at all if you haven't tagged - you can't even begin to think about chunking if you haven't tagged. Tagging is never a bad strategy - you will never regret doing it. But the risk of NOT doing it - the risk of not being ready for the next wave of consumer demand whatever that demand may be - means that you can't afford not to do it.
Standardizing Tags in the Metadata Minefield
Laura Dawson
October 14, 2008
One issue we haven't discussed much is that of metadata. XML documents are by definition rife with metadata. At what point does metadata cross the line from useful to pollution?
When it's not standardized.
The kind of XML tagging we're primarily talking about can be sectioned into three buckets: rights data ("this picture is good for print products but not electronic ones," "we can use this graphic anywhere," "these animations are exclusively for the workbook"), formatting data ("this is a chapter," "this is a footnote"), and context data ("Paris," "1955," "General Robert E. Lee," "noodles").
This is a perfect recipe for complete chaos. Obviously standards are crucial to the success of using XML in publishing. Even standards within a department -- using tags the same way from one project to the next, from one PERSON to the next -- are crucial.
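To picture the three buckets side by side, here is a small Python sketch. The markup is hypothetical (the `rights`, `context`, and `figure` element names and the `print`/`electronic` attributes are invented for the example, not drawn from any standard): formatting data is carried by the structural tags, rights data by the `rights` element, and context data by the `context` elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter mixing formatting, rights, and context tags.
CHAPTER = """<chapter id="ch1">
  <title>Postwar Paris</title>
  <para>Cafe life returned quickly.<footnote>Source note.</footnote></para>
  <figure id="fig1">
    <rights print="yes" electronic="no"/>
    <context>Paris</context><context>1955</context>
  </figure>
  <figure id="fig2">
    <rights print="yes" electronic="yes"/>
    <context>noodles</context>
  </figure>
</chapter>"""


def figures_for(xml_text, channel):
    """Return ids of figures whose rights data allows use in the given channel."""
    root = ET.fromstring(xml_text)
    allowed = []
    for figure in root.iter("figure"):
        rights = figure.find("rights")
        if rights is not None and rights.get(channel) == "yes":
            allowed.append(figure.get("id"))
    return allowed


if __name__ == "__main__":
    print("print:", figures_for(CHAPTER, "print"))
    print("electronic:", figures_for(CHAPTER, "electronic"))
```

A query like this only gives trustworthy answers if everyone records rights the same way, which is the case for standards made above.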
There's been some talk about the role of the Book Industry Study Group in developing tagging standards, in the same way they've developed BISAC code standards. And this makes a great deal of sense. The rights and formatting tag standards should be relatively easy to establish -- publishing houses, no matter whether big or small, tend to use this data fairly consistently. It's the context tags that pose the more serious challenges.
The Library of Congress has done this sort of thing with its subject headings. But, like the BISAC codes, these refer to the subject of an entire book. Many books, however, are composed of more than one topic - many chapters are composed of more than one topic. That level of granularity has never been taxonomized before.
Still, it's important to do so in a standardized way, to avoid a cacophony that drowns out meaning. (Is it "pasta" or "noodles"? When you say "diamond," are you talking about baseball or gemstones or Neil? Why is a chapter published by Mosby about dentistry coming up in search results with the chapters on collecting Limoges china published by Antique Trader? Hint: "porcelain.")
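A toy Python sketch of what standardizing context tags could involve: synonyms collapse to one preferred term, and ambiguous words are refused until the tagger picks a specific sense. The mappings and the `normalize` function are hypothetical, and the choice of "pasta" over "noodles" as the preferred term is arbitrary, made only for the example.

```python
# Hypothetical mappings, for illustration only.
PREFERRED = {
    "pasta": "pasta",
    "noodles": "pasta",  # arbitrary choice of preferred term
}

AMBIGUOUS = {
    "diamond": ["diamond (baseball)", "diamond (gemstone)", "Diamond, Neil"],
    "porcelain": ["porcelain (collectibles)", "porcelain (dentistry)"],
}


def normalize(tag):
    """Map a free-text tag to its preferred term.

    Raises ValueError for ambiguous tags (listing the qualified senses to
    choose from) and for tags that are not in the vocabulary at all.
    """
    key = tag.strip().lower()
    if key in AMBIGUOUS:
        options = ", ".join(AMBIGUOUS[key])
        raise ValueError(f"'{tag}' is ambiguous; choose one of: {options}")
    if key in PREFERRED:
        return PREFERRED[key]
    raise ValueError(f"'{tag}' is not in the vocabulary")


if __name__ == "__main__":
    print(normalize("Noodles"))
    try:
        normalize("porcelain")
    except ValueError as error:
        print(error)
```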
If you've ever seen a tag cloud on a website, you'll know what I mean. You never know what you're going to get when you click on it. Standardizing context tags is probably the most thankless, boring job publishers will ever engage in. But it's also the one that's going to ensure that books are actually discoverable the way they're meant to be discovered.
When it Comes to Search, How Low Can You Go?
Brian O'Leary
October 13, 2008
I came back mid-week from the American Magazine Conference, where I heard Paul Saffo talk about the future of content, including what search tool might eventually trump Google. He introduced the term "quantum of search" - the lowest level or most granular search possible - and used it to say that the future of search will depend on your ability to return the precise results needed for each and every search.
While Saffo counseled editors and publishers in attendance to develop the lowest level "quantum of search" possible, he stopped short of saying something that is in my mind directly related: publishers have a tremendous advantage in defining what good search looks like.
Figuring out how to accurately respond to a narrow search requires intimate knowledge of both content and market. Search informs an increasingly niche-driven publishing model, a prediction that Mike Shatzkin and others have advanced, but good search is more than just an algorithm. As we migrate to a more richly defined, "semantic" web, content that has been given meaning through well-designed editorial processes will not only be more easily sold and repurposed; it will be more easily found by the people who are most likely to benefit from finding it.
So, publishers worried about a content glut have at least two opportunities to define themselves and redefine their role. The first opportunity comes in organizing around audience-valued content niches. Generally, lawyers don't go to Google to find legal information, and in a more niche-driven world, vertical content plays will be increasingly preferred. Even if I try Google first, the trusted vertical niche with deep content should be high on the list of returned links. As publishers, we need to make sure we are there.
The second opportunity comes in using the tools we are examining here - structured content, appropriately tagged - to capture the editorial insight and rich meaning that is lost when we render content to print books and magazines. Investing now to keep that meaning and provide it in a form linked to the content will help publishers demonstrate primacy in defining Saffo's "quantum of search." The discipline of XML-driven workflows can capture that insight.
StartWithXML at Frankfurt Book Fair
Brian O'Leary
October 9, 2008
Next Friday (October 17), Mike Shatzkin (Idea Logical), Michael Healy (BISG) and Andrew Savikas (O'Reilly Media) are presenting an overview of this project at the Frankfurt Book Fair. If you are attending the fair and are interested in our work, consider attending the panel, which will provide:
- a description of book publishing's changing environment;
- ways that XML workflows can help publishers meet the demands of this changing environment;
- an update on the project, its surveys and interviews and related research now under way;
- background on how this project "fits" with BISG's long-standing commitment to the development and promulgation of meaningful standards; and
- first-hand experience from a publisher whose use of XML is established and evolving.
The presentations and discussion will take place at noon on October 17 in Brillianz Room, Halle 4.2.
Respond to the StartWithXML Survey Before It Closes on Friday!
Mike Shatzkin
October 8, 2008
We are very pleased that over 125 people have already responded to our StartWithXML industry survey, which you can find here.
We will start blogging a bit about the results later in October. Complete results will be published in our Research Paper, which will debut at the Forum on January 13, 2009 at the McGraw-Hill Auditorium.
There's no attempt to be "scientific" here, but we are getting some very thought-provoking results.
StartWithXML Research Paper: A Work in Progress
Brian O'Leary
October 4, 2008
The January forum will be accompanied by a research paper whose content is informed by a combination of publisher interviews, supplier discussions, two surveys and the conversation that continues on this blog. If you haven't had an opportunity to answer the publisher survey, this is the week: it closes on October 10.
Based on the feedback we have received from publishers and suppliers, we have started to refocus some parts of the original research paper outline, which you can read here. Changes we are planning include:
- moving the current first section to the appendix, so that we can ...
- start by making the case for why publishers should consider XML now
We are also adding two practical components, an XML "starter kit" (what do you need to get off the ground?) and an XML "checklist" (what kinds of things should you consider before undertaking an approach that starts with XML?)
The outline guides us, and it remains a work in progress. We welcome any comments you may have on its contents or structure.
Tools of Change for Publishing is a division of O'Reilly Media, Inc.
© 2008, O'Reilly Media, Inc. | (707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.