Recently by Liza Daly
Slides from "What Publishers Need to Know about Digitization" Webcast
Liza Daly
November 13, 2008
Permalink | Comments (0)
TOC will be posting a complete recording of the presentation, but in the meantime I've posted the slides from yesterday's webcast, "What publishers need to know about digitization" on Slideshare.
Thanks to everyone who attended and especially to those who asked so many excellent questions.

The Analog Hole: Another Argument Against DRM
Liza Daly
October 23, 2008
Permalink | Comments (2)
Digital rights management (DRM) might be unpopular with the public and plagued with social and technical challenges, but at least it's a guarantee that digital books can't be pirated — right?
Not so fast. Experienced computer crackers will find weaknesses in any encryption scheme, but regular folks with basic computer skills can exploit the one weakness found in all DRM'ed media: the analog hole.
What is the Analog Hole?
The "analog hole" reflects a basic principle of physics: before humans can consume any digital media, the ones and zeroes that computers understand must be converted into an analog format that our senses can perceive. For music, it's sound waves; for video and for digital books, it's patterns of light.
If you've ever visited a major metropolitan city you've probably seen the analog hole in action: street vendors selling pirated copies of popular movies, often months before they're officially released on DVD. Most of these are "cam" films, shot in real movie theaters using camcorders. Even without access to a physical copy of the film, pirates are able to capture its analog expression: the sound and pictures as perceived by a theater-goer.
In music, the analog hole is often used to get around software that prevents digital copying. A user simply plays the desired song on their computer using the legal DRM-enabled software and records the audio coming out of it. Now they have a copy of the sound recording, which can be re-imported into the computer and digitally encoded, with the original DRM stripped out. (A similar principle is at work when DRM systems go defunct and users are told to pirate their own music, although the industry uses the euphemism "making a backup.")
Film and music companies are painfully aware of the analog hole and have taken steps to close it, either by monitoring patron behavior (as in movie theaters) or by petitioning to legally limit the recording features of consumer electronics.
Because reading is a visual experience, ebooks are vulnerable to an analog hole exploit as well. Unlike camcorder copies or re-recorded MP3s, the copy can suffer no loss in quality. And with a little ingenuity, the process can be made completely automatic.
One example: Ebooks and Optical Character Recognition (OCR)
Here's a sample digital book as displayed in Adobe Digital Editions. (This book is public domain and isn't technically covered by DRM, but the principle is exactly the same.)

I hid as much of the Digital Editions menus as I could and took a screenshot of this first page of Pride and Prejudice.
Next I downloaded some free optical character recognition (OCR) software. OCR programs can "read" images and output the words in them as plain text. It's a normal part of digitization projects, in which archival printed material is first scanned and its text is automatically extracted. At the consumer level, OCR software is often bundled with commercial scanners and fax machines.
I took my screenshot and fed it to the OCR software. Here's what I got without any special fine-tuning or spell-checking. Note that all typos are from the OCR software.
Chapter 1
It is a truth universally acknowledged, that a single man in possession ofa large fortune must be in want of a wife, However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of someone or other of their daughters.
"My dear Mr. Bennet," said his lady to him one day, "have you heard that Netherfield Park is let at last?"
Mr. Bennet replied that he had not.
...and on through the entire first page. This output was in HTML, ready to be posted to the Web for anyone to read.
The OCR isn't 100 percent accurate, of course, but neither are the widely-available pirated ebooks created by laborious scanning of physical books, page after page. I was also using free software that requires careful fine-tuning to get working optimally; commercial OCR software is much better, especially when combined with spell-checking.
It wouldn't be difficult to automate the process of advancing one page in Digital Editions, taking a screenshot, and passing that on to my OCR software. Once the workflow was in place, I could strip hundreds or thousands of books of their DRM in a matter of minutes.
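To make the idea concrete, here is a minimal Python sketch of that capture-then-OCR loop. The tool names are assumptions, not a working recipe: "screencapture" is the Mac screenshot utility, tesseract is a free OCR engine (its "hocr" mode emits HTML), and sending the "next page" keystroke to Digital Editions would need an OS-specific automation tool I've left as a comment.

```python
import subprocess
import time

def build_commands(num_pages, screenshot_tool="screencapture", ocr_tool="tesseract"):
    """Return the commands for a capture-then-OCR loop over num_pages.

    The tool names are placeholders; substitute whatever your platform has.
    """
    commands = []
    for page in range(1, num_pages + 1):
        image = f"page-{page:04d}.png"  # zero-padded so files sort in page order
        commands.append([screenshot_tool, image])  # grab the current page
        commands.append([ocr_tool, image, f"page-{page:04d}", "hocr"])  # OCR to HTML
        # ...then send a "next page" keystroke to the reader and loop
    return commands

def rip(num_pages):
    """Run the loop for real, pausing so the reader can render each page."""
    for cmd in build_commands(num_pages):
        subprocess.run(cmd, check=True)
        time.sleep(0.5)
```

Once a loop like this is wired up, the per-book effort drops to essentially zero, which is the point: the DRM never has to be broken, only bypassed at the screen.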
Another Possibility: Speech Recognition
My local library is kind enough to allow me to check out digital audiobooks. Naturally they're also secured with DRM (so much so that I can't actually play them, as they require Windows Media Player and I have only Mac and Linux computers). But assuming I could play them, I'd have available to me a nice stream of professionally-produced audio.
You've used speech recognition software any time you've called a customer service line and an automated voice prompted you to speak your credit card number. If so, you also know that speech recognition isn't 100 percent accurate yet, though under certain conditions it can be quite good. Automatic speech-to-text transcription isn't nearly as far along as optical character recognition, but it's another analog hole exploit that will eventually become trivial to perform.
Does This Mean Publishers Shouldn't Produce Ebooks or Audiobooks?
No! What I hope to convey is that DRM is not a true safeguard against ebook piracy. (It is, however, a known deterrent to ebook adoption.) I've heard a lot of passing the buck on DRM: publishers claim authors want it, booksellers claim publishers insist on it. These days it's hard to find someone to publicly state that they're actually for it.
I think of DRM like this: years ago my apartment was broken into and I called a locksmith to replace the door. My landlord had authorized me to get "the best lock possible," and the locksmith obliged with a four-foot steel bolt. It was almost too heavy to turn but made a very satisfying noise when it snapped shut.
I asked the locksmith, "Is this really unbreakable?"
"The lock is, sure." He slapped the door frame. "But this is made out of wood. If I really wanted to get in I'd just kick out the door. That's why I'm honest about what I sell." When I looked puzzled he handed me his business card. It contained his name, phone number, and company slogan: "A feeling of security."
Authors and publishers should be compensated for their talent and their hard work, and the desire for DRM is understandable. Book lovers, too, want their favorite authors to succeed. But digital books are a form of technology as much as they are literature, and technologies that are successful adapt to people's needs. Is just a "feeling" of security worth the ire of good customers who want to read their books wherever and however they like?
Publishing Lessons from Web 2.0 Expo
Liza Daly
September 26, 2008
Permalink | Comments (0)
Last week I was in New York for the city's first Web 2.0 Expo. I was a member of the program committee and one of our goals was to make it a uniquely New York event. This meant a real focus on measurable outcomes and integrating Web 2.0 principles into established business, in contrast with the more startup-friendly atmosphere of the San Francisco event. The fact that the conference ran during the week of the Wall Street meltdown only reinforced the need for pragmatism in tough economic times.
Naturally I was interested in applying what I learned to the publishing world. If you couldn't make it to the event, here were my big take-aways:
Web 2.0 is social software
Consultant Dion Hinchcliffe's tutorial on the Web 2.0 landscape summed it up best: Web 2.0 means software that gets better the more people use it. This is radically different from traditional software development, which gets better only when programmers add new features. (In the case of Microsoft Word, it generally gets worse.)
The best example in the publishing space is LibraryThing, which has a more accurate book catalog than Amazon.com, but also content found nowhere else. My favorites are the Legacy Libraries, which collect works associated with famous dead people. The Legacy Library project illustrates a related principle of Web 2.0: encourage unintended uses. LibraryThing was designed for individuals to catalog and rate their own books, but this user-driven initiative has added tremendous unexpected value.
Thinking outside the box
That is, outside of a single computer (geeks like to call them "boxes"). More Web applications are either being built on top of other services, or make use of so-called cloud computing. Amazon, Google and other providers now offer a wealth of ready-made software and infinite computing power to allow companies to leapfrog over problems of cost and scaling.
Only a few years ago when I was approached by a publisher to start a project, we would begin at the beginning: purchasing a computer, selecting a service provider, writing some HTML, crunching some data. With services like Amazon's Elastic Compute Cloud, there's no longer any need to buy hardware: instantly an application can be deployed on one computer, or a thousand, at very low cost. This makes experimentation much more feasible: if no users come to a new product, no expensive hardware investment has been wasted. If it's successful, a few keystrokes can add 10X the computing power.
Cloud computing has also created tremendous benefit for offline processing tasks, as shown by The New York Times when converting their digitized archive for use on the Web.
It's not just about people, it's about data
Finally, Toby Segaran's talk on "The Ecosystem of Corporate and Social Data" reminded me how much value publishers have. Toby explored clever ways of finding usually-expensive data for free (for example, rather than paying for Yellow Page listings of restaurants, he scraped the New York City health department Web site, which includes ratings of every food-service facility).
Diving deeper, he emphasized how much value can be added to digital services if they are already full of content. Wikipedia came preloaded with a public domain encyclopedia, as it's much easier to correct or update old content than to enter it wholesale. The more of your content that users can find and interact with (for example, by providing an extensive full-content backlist), the more engaged they'll be.
Speaker presentations for the conference are available here: Web 2.0 NYC presentations.
How to Read any Type of Document on the Kindle (Almost)
Liza Daly
August 26, 2008
Permalink | Comments (5)
There are a few options for readers who want to convert PDFs or other non-supported files to the Kindle's AZW format. Amazon's recommended method is to email the file to your personal Kindle email address. It's also possible for users to convert PDFs and other document types themselves using Mobipocket Creator or Stanza.
All of the above methods have the same flaw: AZW does not support the kind of advanced layout available in formats like PDF, and non-Latin fonts aren't easy to convert. What if you need to review a complex legal form, or read a graphic novel, or one in Chinese? A hidden feature can help.
The Kindle has an undocumented picture-viewing mode that was first uncovered by Igor Skochinsky. Although the black and white E Ink screen is not especially good at displaying actual photographs, it is quite good at rendering line art and text.
Here's how to do it, using PDF as an example. Note that unofficial features may be buggy and could damage your Kindle; proceed at your own risk.
- Convert the PDF to a series of images. Commercial versions of Acrobat should be able to do this in batch, but users of free readers may have to convert a page at a time. The Kindle can read JPEG, PNG and GIF; the latter two will work best. Because the picture-viewing application doesn't support a table of contents, you'll need to name the image files in ascending alphabetical or numeric order (e.g. "0001.png," "0002.png," etc.). For best results, resize the images to 600 x 800, the resolution of the Kindle screen.
- Connect the Kindle to your computer using the USB cable. Once connected, browse to the Kindle's drive. If you have an SD card installed that will appear on your computer as well. The following procedure works on either the Kindle or the SD card. I prefer to do everything on the SD card -- it feels safer.
- Create a folder called "pictures," and a folder inside of that with the name of your "document." Put the images in the document folder. Disconnect the Kindle from the PC. When you go to the Kindle's home screen, nothing will have changed. This is where the secret feature comes in:
- Press Alt-Z from the home screen. Your book title should appear in the list.
- Click on the book title. It will open the first image. Use the normal Kindle next/previous buttons to page through the "book." The picture viewer has menu options of its own to control the size of the image and how it's rendered.
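The folder layout and naming rule from the steps above can be sketched in Python (the helper name is my own; "drive" is wherever your Kindle or SD card mounts):

```python
import os

def prepare_kindle_pictures(drive, title, num_pages, ext="png"):
    """Create pictures/<title>/ on the Kindle (or SD card) drive and
    return the zero-padded filenames the page images should use."""
    folder = os.path.join(drive, "pictures", title)
    os.makedirs(folder, exist_ok=True)
    width = max(4, len(str(num_pages)))  # pad so alphabetical order == page order
    return [os.path.join(folder, f"{i:0{width}d}.{ext}")
            for i in range(1, num_pages + 1)]
```

The zero-padding is the part people get wrong: without it, "10.png" sorts before "2.png" and the pages display out of order.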

Credit: octopus pie
Of course because the "PDF" is really an image it's not possible to search the document or rescale the fonts. Text-heavy PDFs should be converted in one of the recommended ways.
This same technique can be used to load image-based documents directly, such as comics. (Peeking inside the "pictures" folder after it's been read by the Kindle reveals a file with the extension manga, suggesting that the picture viewer was intended to be used for this purpose).
It's also possible to convert documents in Russian, Chinese or other non-Latin scripts this way. The Kindle does have support for embedded non-Latin fonts as part of its "Topaz" file format, but there are no tools for end-users that output Topaz.
(Screenshots courtesy the undocumented Alt-Shift-G feature, which saves to the root of the SD card.)
Optimizing Web Content for the Kindle Browser
Liza Daly
August 13, 2008
Permalink | Comments (0)
Amazon's Kindle store is convenient, easy-to-use and stocked with thousands of titles.
But what about publishers and content distributors who want to reach the estimated 240,000 Kindle users without going through Amazon's program? And what about content formats that the Kindle does not directly support?
One selling point of the device is its free, ubiquitous Internet service and Web browser. Amazon has filed the browser under "Experimental" but it's quite usable as-is. With a few simple changes to a Web site's HTML code, it's even possible to specially cater to Kindle users.
The screenshots used in this article are from the mobile version of Bookworm, my Web application for reading ebooks in the EPUB format. Although what's being displayed is ebook content, it's being delivered by the Kindle's browser, not the Kindle ebook technology, which does not yet support EPUB.
Because the mobile Web version is already heavily optimized for small devices, the layout is simpler than a traditional Web site. What works for an iPhone or other wireless device will also be a good starting point for the Kindle, although we'll see there are some special considerations that don't apply to any other device.
Default or Advanced Mode?
When the Kindle ships, its Web browser is in "default mode." It will not load images or CSS styles, but it does render basic HTML tags like the italic tag <i>. Personally, I prefer "advanced mode," which displays Web pages more like a traditional browser, but some sites can be unreadable in this mode.
When optimizing for the Kindle it's best to consider that most users will not change from "default mode," or even realize that the option exists.
How different are these modes? Here is a comparison shot of the same screen from Bookworm in both modes:
[Image: My list of books in Advanced mode, showing tabular layout and more advanced font styles]
[Image: My list of books in Default mode]
In Default mode, all the information about the books runs together. It would be better to present this as a simple vertical list, the way the Amazon Kindle store does, rather than as a table.
Font Size Considerations
You can choose from six font sizes in the Kindle browser. As a content creator, you can specify a wider range of font sizes in your Kindle-formatted Web page, but take care that they aren't too small: the device doesn't clearly display fonts smaller than the smallest of its six built-in sizes.
In this screenshot, the table of contents for a Bookworm book is not readable, even though this page has already been tailored for the small display of mobile phones:
This problem is only likely to occur in Advanced mode where stylesheets are activated.
Usability
The Kindle's method of selecting and traversing hyperlinks is unique. The user activates links by selecting along the vertical, or Y-axis, using the scroll wheel. When multiple links fall on the same line, the Kindle will open a dialog box so the user can clarify which link is the target.
In Bookworm, users move to the next or previous chapter by selecting navigation links lined up horizontally (see the top row of the first image). In the Kindle, this presentation forces the user to click a second time to select the appropriate one:
For commonly-used navigational items like this, line up the links in a vertical row:
- Next
- Contents
- Previous
Now no second click (and accompanying page refresh) is necessary.
It's also important to remember that the Kindle is a black-and-white device. If your site uses text color to convey any useful information (such as what is or is not a hyperlink), re-work the design to accommodate a grayscale display.
Finally, keep pages short. The Kindle cannot scroll; long Web pages are paginated like books. Pagination with E Ink devices is slow relative to scrolling on a computer screen. If possible, keep all your content on the first Kindle "page" when viewed at the default font size.
Targeting the Kindle
Web browsers are identified using their "user-agent" string. The current version of the Kindle broadcasts this user-agent:
Mozilla/4.0 (compatible; Linux 2.6.10) NetFront/3.3 Kindle/1.0 (screen 600x800)
It's beyond the scope of this article to describe how to set up your Web site to deliver different kinds of content to different browsers, a process that varies considerably with your site's technology.
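Whatever your server stack, the detection itself is simple string matching on the user-agent header. A minimal Python sketch (the helper names are mine):

```python
import re

# The user-agent string the current Kindle broadcasts, for the demo below.
KINDLE_UA = ("Mozilla/4.0 (compatible; Linux 2.6.10) "
             "NetFront/3.3 Kindle/1.0 (screen 600x800)")

def is_kindle(user_agent):
    """True if the browser identifies itself as a Kindle."""
    return "Kindle/" in (user_agent or "")

def kindle_screen(user_agent):
    """Return the (width, height) the device reports, or None."""
    m = re.search(r"screen (\d+)x(\d+)", user_agent or "")
    return (int(m.group(1)), int(m.group(2))) if m else None
```

Matching on "Kindle/" rather than the full string means the check should survive minor firmware updates, though a new Kindle could of course change the format entirely.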
How do you test your layout if you don't have a Kindle? There's no substitute for having the real device (tell your boss it's for "research"), and currently Amazon does not offer any kind of browser emulator. Some possibilities:
- Disable stylesheets on your browser and look at the output. Does it still make sense? (Instructions for disabling stylesheets; Firefox users should install the Web Developer add-on)
- Use a text-only browser like Lynx
Some Last Advice
Don't spend too much time on this process. The next version of the Kindle is expected soon, no doubt with an improved browser. Indeed, Amazon could offer a new version of the existing browser at any time. Most of the changes recommended above should take little time and money to implement, and can make a great difference in user experience.
In addition, optimizing your site for small-screen browsers can have other benefits: they allow an increasing number of mobile users to get quick access to your content, and aid accessibility for screen-readers and other non-standard browser types.
Processing the Deep Backlist at the New York Times
Liza Daly
August 1, 2008
Permalink | Comments (0)
At the O'Reilly Open Source Convention (OSCON), Derek Gottfrid of the New York Times led a fascinating session on how the Times was able to utilize Amazon's cloud computing services to quickly and cheaply get their huge historical archive online and freely viewable to the public.
How big is the archive? Eleven million individual articles from 1851 to 1980, or 4 terabytes of data (over 4,000 gigabytes). The Times got it ready for distribution in 24 hours, for a total cost of $240 in computing fees and $650 in storage fees.
As part of their original TimesSelect subscription service, the paper had scanned their entire print archive. Each full-page scan was cut into individual articles. Typical of newspaper format, the articles often spanned column or page boundaries, which meant that many articles were composed of several scans. In the original subscription-based program, whenever a reader requested one of these historical articles, the Times computer would need to stitch together all of the scans for a particular article before presenting it.
This on-demand process used significant computing resources, but because TimesSelect was subscription-based there was never much traffic. Once this archive was open to the public it was expected to generate greater usage, and the safest approach in those cases is to serve pre-generated versions of all 11 million articles. Using traditional software development practices -- with a single computer churning through one article at a time -- the processing could potentially take weeks and tie up Times servers that were needed for other tasks.
Gottfrid turned to Amazon Web Services (AWS) and its two main products:
Amazon Elastic Compute Cloud (EC2) is a form of "virtualization" where one very large computer is divided up into many virtual computers that can be individually leased out for use. Traditional hosting costs money whether the server is working or idle; in EC2 you pay only as long as the virtual computer is running. When it's no longer needed, it's shut down. This makes the service ideal for one-off processing jobs.
In addition, Amazon doesn't care whether you use one EC2 "instance" 100 times or 100 instances all at once -- the cost is the same. The difference comes when you can usefully divide a job into 100 concurrent tasks, because then it takes 1/100th the total time.
Amazon's other major AWS offering is the Simple Storage Service (S3), for large-scale file hosting. Like EC2, it is a leased model -- you pay only for the space that you use in a given time period.
Gottfrid leveraged these technologies in combination with a relatively new software library called Hadoop. Hadoop is written in the Java language and is based on work done at Google. It allows programmers to very easily write programs that can be run simultaneously on multiple computers.
Combining Hadoop concurrency with EC2 and S3, the Times took a job that might have required weeks of processing time and completed it in 24 hours, using 100 EC2 instances. They were pleased enough with S3 that it became their permanent hosting platform for the scans. Hosting with Amazon or other cloud computing services is usually cheaper and has much better bandwidth than the average provider, although downtime can and does occur.
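The pattern Hadoop enables can be shown in miniature with Python's standard library: one function applied independently to every item, with a pool of workers deciding who handles what. This is a toy on a single machine (the stitching function is a stand-in), whereas Hadoop distributes the same idea across many computers.

```python
from multiprocessing.dummy import Pool  # a simple thread pool from the stdlib

def stitch_article(article_id):
    """Stand-in for the real work: stitching an article's scans into one image."""
    return f"article-{article_id:07d}.png"

def process_archive(article_ids, workers=100):
    # The map step: the same function over every article, results returned
    # in order. With 100 workers on 100 machines, weeks become a day.
    with Pool(workers) as pool:
        return pool.map(stitch_article, article_ids)
```

Because each article is processed independently, the job parallelizes almost perfectly, which is exactly why the Times archive was such a good fit for this approach.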
At last year's OSCON, the Times announced the formation of its developer blog, Open. You can read more about the original AWS project as well as TimesMachine, a project that became economically feasible due to the low cost of AWS.
ALA 2008: Librarians and Patrons Want More Openness
Liza Daly
July 9, 2008
Permalink | Comments (0)
At this year's American Library Association (ALA) conference in Anaheim, Calif., one theme emerged in talk after talk: librarians and the readers they serve demand more flexibility, transparency and openness in publishers' offerings. This affects not just digital-only reference works, but also book acquisition via library catalogs and standalone ebooks.
Reference publishing and resource discovery -- Reference publishers invest time and money in bespoke search interfaces for advanced users, but are users finding them? In the ALA panel "The Future of Electronic Reference Publishing," librarians repeatedly commented that multiple reference sources are confusing to users, and that resources must also be discoverable via Google and the library's own digital catalog.
If users do go directly to an individual resource or platform, the search interface should behave "like Google." Although the panel of major reference publishers did state that they are converging on Google's query language, many legacy systems remain that would be economically infeasible to re-tool.
Library catalogs and systems -- The need for more transparent, network-based services applies to the library catalog as well. In the marathon session, "The Ultimate Debate on the Future of the Library Catalog," speakers identified a critical need for geo-based services and APIs for finding what's in my local library -- now. Once a book is located I should be only a few clicks away from reserving it or even ordering it for delivery to my home.
That dream is still far off -- even with a service like WorldCat it's not currently possible for me to find and reserve a book at my local library. The closest offering presented on WorldCat is Harvard University's library, which is not about to lend to the likes of me. The problem is even worse for rural libraries. As for my local library -- I love books and this post is the first time it even occurred to me to visit their site. I'm not alone in that.
Ebooks -- This is a transitional time in publishing, and while many patrons still prefer print, an increasing number are asking for electronic books, especially in university libraries. Students and academics emphatically reject DRM and restrictions on usage, but many ebooks sold to libraries have technical barriers to printing, cut-and-paste and downloading.
Licensing and subscription costs are also a concern for libraries. Ebooks may be re-priced or re-bundled, challenging the basic assumption that once a library buys a title, it owns the book indefinitely. Librarians want assurances that the products they purchase are either available perpetually, or at least have clearly-stated licensing terms that do not change without notice.
The ability to safely and permanently archive electronic books has been a long-time concern of some librarians, but the floods in New Orleans and Iowa have changed some minds. Off-site electronic archiving would save at least some resources, especially for very small or rural libraries that can't afford state-of-the-art preservation facilities.
Exploring DIY E-Reader Platforms
Liza Daly
June 23, 2008
Permalink | Comments (1)
I've been working with the EPUB open ebook format a lot lately, but when I want to read a book in it, I have to use my computer. There just aren't any devices which support it yet. Naturally this leads me to wonder whether I could build my own e-reader.
I'm not a hardware person, but the last few years have seen an emergence of open hardware platforms designed to allow even ordinary programmers like me to modify and customize small devices. As far as software goes, an e-reader is pretty straightforward: it's just some text on a screen. That shouldn't be too hard, right?
Surveying the landscape of hardware options, I've ranked below a variety of devices from "friendliest" to "most-intensive DIY." I'm not addressing PDA or phone devices here, largely because I consider their screen size and text rendering insufficient (but plenty of people disagree).
The Chumby -- With a 3.5" touch screen and reasonable $175 price tag, this little wireless computer in a bean bag is an obvious candidate. There's a full-fledged development environment and large community of users. Most Chumby applications are written in a lightweight version of Flash, which is easy enough to develop in.
It has a few downsides, though. The Chumby doesn't have much storage space at all, so any ebooks would have to be saved online and streamed to it, a page or a chapter at a time. Since it's meant to be an always-on wireless device, that seems doable. The screen might be too small to comfortably read lots of text, as it's meant for short bursts like RSS feeds or Twitter updates.
Unfortunately, it's powered by a wall outlet, with only a small 9-volt battery for emergency backup. People on the hardware forums have managed to hack in rechargeable batteries, and I wouldn't be surprised if a totally-wireless Chumby is forthcoming. [Disclosure: O'Reilly AlphaTech Ventures is an investor in Chumby Industries.]
BugLabs -- The most open of the commercial hardware platforms, BugLabs sells individual pluggable modules that support various features, from touchscreens to cameras to GPS. It looks like a great platform, but it's very expensive ($349 for the base module plus $119 for the 2.5" touch-sensitive screen). The screen is probably too small for comfortable reading, but the company Web site promises a larger display soon.
Both the Chumby and BugLabs have touchscreens, which is key for making small screens more usable.
The Kindle -- All the heavy lifting has been done already to get into the Kindle filesystem and peek inside. It's probably too difficult to extend the existing Kindle environment without true source code, but it might be possible to do some simple things, like add new fonts. Few people have really explored hacking on e-ink devices, largely due to high cost and low availability. I suspect when the first color e-ink devices come out, used black and white ones will become popular playthings for enthusiasts.
YBox2 -- For the ultimate DIY experience, the YBox2 platform is a pile of electronic parts you solder together and assemble in an Altoids tin. It doesn't come with a touch-screen, or any screen at all: you connect it to a television or monitor. It uses the tiny Propeller chip, which powers many hobbyist devices and small robots. Like the Chumby, the YBox2 comes with networking capability but little storage, and would need to stream book content from the Internet. The networking isn't wireless and of course there's no handy rechargeable battery, but if you are the kind of person who can build a YBox2 you probably know how to make those too. I am not that kind of person.
While I'd be happy to crawl around a hacked Kindle, I know I'm not ready to program my own microcontroller. BugLabs seems great from a developer standpoint, especially when they release a larger screen, but I'm unwilling to shell out almost $500 just to experiment. The Sony Reader doesn't have networking, so that's much less interesting. Perhaps a Chumby is in my future. Any other options?
Release Early, Release Often: Agile Software Development in Publishing
Liza Daly
June 12, 2008
Permalink | Comments (0)
"How do Web startups release three or four new versions of a product in the time it takes publishers to launch just one new feature on their online platforms?"
This question framed "The Agile IT Organization," a lively and well-informed discussion at the recent Society for Scholarly Publishing annual conference in Boston. As a software engineer, I've used both agile and traditional product development methodologies and I was interested to hear the perspectives of other programmers as well as publishers who've gone through the process.
Geoffrey Bilder of CrossRef provided an introduction to agile development practices, which are concisely summarized in plain English by a core set of principles.
Summarizing even further, agile development means:
- Minimal up-front specification. A project has high-level goals (e.g. "make our back catalog searchable and available for print-on-demand purchase"), but is not fully described before development begins.
- Frequent, short-cycle releases. A project is broken up into mini-projects, each with a small set of features that take only a few weeks to implement. Every release ("iteration") has a specification, development and testing phase. This means that every couple of weeks the software is fully usable, although it may have very few features at the start.
- Change to the product design is accommodated and even expected. Market conditions, corporate re-organization or user demands may mean that new features are added or old ones are re-worked. Changes are treated as just another iteration.
The panel at SSP focused on two approaches: internal, IT-driven products, and those developed by a third-party vendor. Larry Belmont, manager of online development at the American Institute of Physics, gave an excellent presentation on the in-house approach. His organization ran its first agile project with a timeline measured in days rather than weeks or months.
Leigh Dodds, CTO of Ingenta, provided the vendor perspective, and described the principles of a formal type of agile development known as Scrum.
The panel was, to their credit, enthusiastic about the approach, but agile development requires commitment and is not right for every organization or project. Some caveats need to be emphasized:
- Short development cycles come with a price: you will be asked to review and comment on small pieces of the larger project, and be involved on an almost daily basis. Many publishers need vendors they can treat like plumbers: "I want a new sink put here, it should look like this, call me when it's done." If someone in your organization isn't prepared to think very hard every day about copper pipe fittings, agile isn't right for you.
- Project managers must be empowered to make decisions. Whether the project is in-house or vendor-driven, every day the PM will be asked to make calls without appealing to higher powers. When editorial buy-in is required, or when the product needs a larger review, consider a hybrid approach: appoint a single decision-maker with deep editorial knowledge to work on evaluating, testing and approving each iteration, but use a more traditional alpha/beta/gold release process for the wider group.
- Product features may change, but time and budget should be invariant. Hard deadlines might seem to be antithetical to the free-wheeling, change-friendly agile approach, but in my experience they're critical. They focus the entire team: key decision-makers cannot spend weeks in committee, IT personnel don't fear the "death march" project with no end in sight, and it's more difficult to introduce budget overruns that cause friction with management and vendors. If an agile project does run out of time, you will still have a launchable product that's been thoroughly tested and reviewed all the way down the line, not something just out of beta with weeks of QA ahead. Many agile methodologies use the hard deadline, or timebox, as the primary method of structuring the project.
"Release early, release often" can sound a lot like "throw whatever we've got out the door." This is one reason why the iterative approach has been so embraced by Web startups: each small release has been thoroughly tested and evaluated, and there's never a moment where the software doesn't work. It's possible to to go live with a project that might not be "finished" according to the original master plan, but might otherwise be caught up in insurmountable technical hurdles or tied up in editorial review.
If publishers are going to be ready for an "iPod moment," this kind of flexibility and responsiveness is critical.
What OpenID Can Do for Academic Publishers
Liza Daly
May 29, 2008
| Permalink
| Comments (2)
OpenID is a free, decentralized system for managing your identity online. What does that mean? It's easy to explain by example.
Right now you probably have dozens of accounts on different Web sites. It's likely that you use the same (or similar) user names and passwords on all of them. OpenID solves the problem of creating nearly-identical accounts on different services, and also allows you to control how much personal information you provide to each service that asks for your OpenID.
What makes OpenID interesting in the publishing community is that it distinguishes between two concepts that are often conflated:
- Identity: Who am I?
- Authentication: What do I have access to?
Traditional user name and password schemes are used for both purposes, but they are actually quite different.
Identity only -- When I shop at Amazon.com (assuming I'm not boycotting it), I only need to provide my identity. I don't need any special permission to access Amazon's search and browse features. What I do want to protect are my account information and shopping cart, but arguably those belong to me, not Amazon.
Identity and authentication -- When I want to post to the TOC blog, I need to provide both types of credentials: identity, so the blog software can put my name under my post, but also authentication to prove that I'm a registered contributor. If you write a comment to this post, you'll only be asked to provide identity.
Authentication only -- The third case -- authentication without identity -- is common in subscription-based journals and research material. I can go to the Boston Public Library, sit at a terminal, and get access to hundreds of online resources in the deep web that aren't available to the general public. The library has paid for the right to access the resources, but those sites only need to know that I'm authenticated through an institutional subscription, not who I am as an individual. This is the correct default behavior, and it's admirable that librarians fight hard on behalf of patrons to explicitly protect users' identities.
This leaves academic and journal publishers without an obvious way to offer their users some of the benefits of identity-based systems: bookmarking, tagging, annotating, and sharing. One solution is to build another layer of access control: first I authenticate, either by using a library terminal or entering my library card number, and then I identify myself with yet another user name and password. Only then do I get the ability to save searches, bookmark documents and possibly share those with other authenticated users of the resource.
Publishers could instead use OpenID to handle identity management in these products. Compared with building such a system from scratch, OpenID is inexpensive and is already fully-implemented in many programming languages.
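To make the protocol less abstract, here is a minimal sketch in Python of the first step an OpenID 1.x relying party performs: fetching the page at the user's claimed identifier URL and discovering their identity provider from its openid.server link tag. The sample page and all URLs below are hypothetical; a real implementation would use one of the existing OpenID libraries rather than hand-rolled parsing.

```python
from html.parser import HTMLParser

class OpenIDLinkParser(HTMLParser):
    """Finds <link rel="openid.server" href="..."> in an identity page."""
    def __init__(self):
        super().__init__()
        self.server_url = None

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attrs = dict(attrs)
            if attrs.get("rel") == "openid.server":
                self.server_url = attrs.get("href")

# In practice this HTML would be fetched from the URL the user typed in
# as their identity (e.g. a personal home page); this sample is made up.
page_html = """
<html><head>
  <link rel="openid.server" href="https://provider.example.com/openid">
</head><body>My home page</body></html>
"""

parser = OpenIDLinkParser()
parser.feed(page_html)
print(parser.server_url)
```

The point of the sketch is that the user's identity is just a URL they control, and any page can delegate to any provider; the relying party never stores a password at all.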
Users benefit in several ways: they don't have to create a new account and remember another set of credentials, and now they have new options for personalizing their research experience. It also opens up the possibility of tying together saved resources across multiple products owned by different publishers, similar to some types of citation management software.
Currently, signing up and using OpenID can be a bit confusing for novices, but the user experience is expected to improve. In the near future it's likely to be largely opaque to end-users, who will only need to know that their identity is managed by a source they already trust.
One last point that's relevant to library users: an OpenID account can still provide anonymity. There's no requirement or guarantee that my OpenID account name has anything to do with my legal name. It's likely that many users will have multiple OpenIDs in the same way that people use throwaway email accounts when registering on Web sites. However, the onus is still on the end-user to be careful where and how they distribute their personal information.
Storytelling 2.0: Alternate Reality Games
Liza Daly
May 21, 2008
| Permalink
| Comments (2)
Publishers are experimenting with an emerging form of interactive entertainment known as Alternate Reality Games (ARG). ARGs are mediated by the Web but they also extend into the real world, with players traveling to physical places and interacting with game characters via email, text messaging, Twitter, and even "old-fashioned" telephones.
I spoke to the founders of ARG design firm Fourth Wall Studios, the company that created the first publishing ARG, Cathy's Book. I wanted to know if ARGs are a viable form of commercial storytelling, if they can be packaged up after the experience has ended, and if they can engage with a wider audience beyond hard-core gamers.
Q: Do you think the high level of engagement required of an ARG limits the audience? Is there such a thing as a "casual" ARG, that can be enjoyed in the spare moments between soccer practice and dinner time?
A: Elan Lee, Fourth Wall Studios Founder/Chief Designer: ARGs up until now have been like rock concerts. Thousands (if not millions) of people come together at one point in time to collectively experience something incredible. They have a good time, sing along, maybe buy a t-shirt, but when they go home to tell their friends about it, there's no action their friends can take other than to hope they don't miss the next one. The traditional ARG is an experience that exists between the start and end date of the campaign, and if you weren't there at the right time, you simply miss out.
To continue the metaphor, think of our games [at Fourth Wall] as ARG "albums" instead of concerts: something you can play when, where, and how you want. Ultimately, it is only through this "album" approach that this new form of entertainment is going to evolve into a mainstream genre of storytelling.
Q: Many ARGs have been developed as promotional tools for other media: music releases, films, TV series, video games, and now books. Is there a perception that ARGs have to be in support of something else, rather than entertainment themselves?
A: Elan Lee: ARGs have had their roots in marketing because frankly, at this early stage, that's a great place to find money. Marketers have a tougher job every day of finding ways to get their message heard above the noise, and they have a lot of money to throw at the problem. It's a great situation for both sides: marketers get to engage their audience in a way that attracts, involves, and maintains an audience around a product. ARGs benefit in that we get to run wild and ground-breaking experiments as we birth this new art form.
Also, at least in the case of Nine Inch Nails' Year Zero and Cathy's Book, the ARG elements were not conceived as marketing, but as an inextricable part of the content. An album or a book was the spine of the experience, but the work of art itself was conceived as an interactive multimedia whole.
Q: Cathy's Book was targeted at a young adult (YA) audience. Do you think YA is a strong market for this kind of interactive entertainment? Would it be possible to engage even younger children?
A: Sean Stewart, Fourth Wall Studios Founder/Chief Creative: Cathy's Book and the new hardcover, Cathy's Key, are designed to be first and foremost a fun (and funny) adventure story. We've added a lot of "fourth wall" elements -- you can call Cathy's phone number and leave her a message, investigate clues she doesn't have time to investigate or write to email addresses you find in the book and see what responses come back to you. Cathy even hosts a gallery where readers can submit their own artwork -- the best of which will be published in the paperback of Cathy's Key. The basic impulse behind this series is to make books -- a traditionally passive, solitary activity -- something with an active, social component as well.
"Fourth Wall" fiction -- experiences that play out at least partly over your browser, your phone, your life -- feels somehow very right for this new age; it's a kind of storytelling that arises naturally from the world of three-way calls, instant messenger, text messaging, and shooting a friend an email with a link to something cool you saw on the Web. To that extent, it's going to feel the most natural to the people most comfortable with that kind of wired world.
When I was in New York last year, meeting with the publisher of Cathy's Book, my 12-year-old daughter emailed me a PowerPoint slide deck, complete with music and animations, explaining why I should get her a Mac laptop for Christmas. Yeah, I think her generation finds interactive entertainment more natural than mine. And yes, I think it would be not only possible, but really effective, to build interactive, exploratory stories for even younger kids -- but to do that, we need to get away from the traditional ARG's willingness to be confusing. Most people like to have some clue what the heck they are supposed to do next. It won't surprise you to learn that this is another crucial design issue Fourth Wall Studios has set out to solve.
Q: Reading is usually a solitary pursuit, but there's an almost universal desire to "live" in some genres, whether it's idealized period romances, spy novels, or detective stories (murder mystery parties, especially popular in the 1980s, illustrate this). How important are traditional fiction genres in ARG? Can there be an element of role-playing involved? Are there genres that haven't been explored yet that have potential?
A: Sean Stewart: The first paid writing I ever did, actually, was for live action role playing games and murder mystery dinner parties in the '80s. I never would have guessed that writing for those things would turn out to be extremely important training for me, but in fact the intersection of writing and theater, where you try to find ways for the audience to participate in the story, lies at the heart, I think, of the next evolution in storytelling.
We believe that immersing yourself in a world is a fundamental part of what makes fiction fun. Any time I follow a character -- whether in a Jane Austen novel or a "Matrix" movie -- I am imagining what that must be like. One of the biggest pay-offs in an ARG is that you don't just imagine a fictional world, as in a book, or see it, as in a movie: you actually inhabit it. When I read a Harry Potter novel, I get to go to Hogwarts vicariously; when I play an ARG, I get to go myself. I am finding Web sites on my browser, I am talking to characters on my phone: the world of the fiction has reached out to me.
That proposition, by the way, shouldn't be limited by genre. ARGs have often had a thriller/science fiction slant to them, but even inside our games we've done romantic comedies, spy plots, documentary-style slice-of-life experiences, tragedies, and even Westerns. Fourth-wall fiction isn't about a given genre: it's a set of tools and approaches for letting the audience participate in any kind of story.
Q: What happens when the game is over? Is it possible to package up an ARG as a complete work (whether online or in print) to be experienced linearly? Or is the experience meaningless without real-time participation?
A: Elan Lee: Here's where I'm going to try to get as much mileage out of the "rock concert" metaphor as I can. There is no denying the electric energy present at a concert and there is absolutely no substitute for "being there." However, there are only so many available seats per venue, and only so many venues you can play before exhaustion sets in (both for the artist and the audience). For ARGs to evolve into a mainstream form of entertainment, they must create their own version of "albums" to complement the "concert." Don't get me wrong, I'm not saying we have to find a way to put a package around these things and call it a day; I only suggest that both pieces of the experience must exist for the real potential of the form to be realized.
What Makes a Collaborative Writing Project Successful?
Liza Daly
May 13, 2008
| Permalink
| Comments (1)
Penguin's collaborative writing experiment A Million Penguins was launched in February 2007 and completed in March 2007. This month saw its final scholarly assessment published in a research report out of De Montfort University in Leicester, UK.
The results? Terrible, according to Gawker, echoing a consensus that the project failed as literature. As a study of online behavior, though, it's quite fascinating, and the research paper describes examples of all types of user contributions, from the grandiose and self-serving to the quietly constructive.
But if "every book needs its author," game-like fiction has been shown to be more amenable to collaboration. Each of Penguin's We Tell Stories pieces was co-written by interactive developers and a novelist. This month, the Guardian has launched a participatory interactive fiction project.
Although technically a type of computer game, interactive fiction has a long association with print authors, starting with the commercially successful adaptation of Douglas Adams' The Hitchhiker's Guide to the Galaxy (1984). In 2003 Adam Cadre (Ready, Okay!, HarperCollins, 2000) wrote the game Narcolepsy incorporating 12 dream sequences written by different authors (of which I was one). In a more experimental vein, the recent UpRightDown project released its first story, which generated submissions in multiple media, including some interactive works.
One lesson from these experiments is that while a work of fiction may not need a single author, it does need a single editor or authority to weave together disparate contributions and reject the obvious vandals. A unified final work has the potential to be a marketable product rather than a research project. (On the other hand, if the printed German Wikipedia sells, all bets are off.) Scale is important as well: two or even three dozen contributors are probably manageable; A Million Penguins had 1,700.
The Guardian's interactive fiction project is being managed using wiki software at textadventure.org.uk. The organizers are soliciting both programmers and non-technical writers. It is scheduled to run through at least the end of May.
iLiad Book Edition E-Reader Coming to UK
Liza Daly
May 8, 2008
| Permalink
| Comments (0)
Just in time for our discussion on the ideal e-book reader comes a new product that will be the first e-reader sold in the United Kingdom.
Trading Wi-Fi for increased storage and an overall price drop, the iLiad Book Edition is a successor to the iLiad 2. Both use the same iRex e-ink technology and feature a tablet-based touch screen. There is no bundled online service or book store, but both iLiads have support for open formats such as PDF. 50 public domain books are preloaded.
Borders UK will sell the device in a small number of stores, and will launch an online ebook store shortly thereafter.
Unfortunately, even this "reduced" price of £399/€499 is unlikely to win over e-reader skeptics, especially without network connectivity. Buying books will always require tethering the device to a computer and completing the purchase over the Web.
Other iLiad Book Edition technical specs:
- 8.1-inch (diagonal) Electronic Paper Display
- 8.5 inches high x 6.1 inches wide; weight 15.3 ounces
- 768 x 1024 pixel resolution, 160 DPI, 16 levels of grey-scale
- File formats supported: PDF, HTML, TXT, JPG, BMP, PNG, PRC (Mobipocket)
- 128MB accessible flash memory; storage expandable via USB, MMC or CF cards
- Built-in stereo speakers and mini-headphone jack
- USB Connectivity to PC
- Optional external 10/100Mbps Ethernet networking via Travel Hub
Tutorial: Add AB Meta Tagging to Your Blog
Liza Daly
May 5, 2008
| Permalink
| Comments (1)
Many publishers use blogs to promote new products and engage customers. Dedicated blog readers will subscribe and receive every post, but the best way to reach a wider audience is still via search engines.
Embedding simple machine-readable code is a key component of the "semantic" Web, in which search engines don't just treat Web pages as a jumble of keywords, but instead can understand their meaning.
Technology firm Adaptive Blue has recently released a scheme for tagging books, movies and other media to enable search engines to label media products appropriately. Because Adaptive Blue's AB Meta is so new, there aren't yet dedicated tools for it. Fortunately, the scheme is very simple and re-uses basic Web tagging. Publishers can use this scheme -- today -- to enrich blogs and product pages.
Here we provide instructions for adding AB Meta content to a WordPress blog. Examples for integrating the format into other blogging software can be found in the description of AB Meta.
Using AB Meta with WordPress
- Download the HeadMeta plugin
- Unzip the plug-in and copy the headmeta folder to your wp-content/plugins directory.
- Enable the plug-in in the WordPress Plugin Management page (/wp-admin/plugins.php)
- When writing a new post, look under Advanced Options -> Custom Fields.
The Custom Fields form allows you to set two items, a key and a value:
- The key will always be "head_meta".
- The value will be in the following general format:
name="an AB Meta field" content="the field's value"
Here's an example for a book title:
name="book.title" content="The Kite Runner"
To qualify as AB Meta content, one field is required and should always be added:
name="object.type" content="book"
After that, you will add fields that are specific to your book content. Here are some examples from the Adaptive Blue site for the book The Kite Runner:
name="object.type" content="book"
name="book.title" content="The Kite Runner"
name="book.author" content="Khaled Hosseini"
name="book.isbn" content="1594480001"
name="book.year" content="2004"
name="book.link" content="https://books.com/1594480001.html"
name="book.image" content="https://books.com/1594480001.jpg"
name="book.tags" content="fiction, afghanistan, bestseller"
name="book.description" content="Story of an Afghan immigrant."
For WordPress, in the Custom Fields option, these would all be entered like this:
In the key field: head_meta
In the value field: name="object.type" content="book"
In the key field: head_meta
In the value field: name="book.title" content="The Kite Runner"
... and so on, through all of the metadata fields to be included with the blog post.
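Entering each field by hand gets tedious if you post about many books. The head_meta values can be generated programmatically; here is a minimal sketch in Python. The field names come from the AB Meta examples above, but the helper function and the sample book record are illustrative, not part of any official tool.

```python
from html import escape

def ab_meta_values(fields):
    """Render a dict of AB Meta fields as head_meta value strings.

    Each returned string is exactly what you would paste into the
    value box of a WordPress Custom Field, with quotes escaped so
    titles containing " don't break the attribute syntax.
    """
    return ['name="%s" content="%s"' % (escape(name, quote=True),
                                        escape(content, quote=True))
            for name, content in fields.items()]

# A sample book record using AB Meta's documented field names.
book = {
    "object.type": "book",
    "book.title": "The Kite Runner",
    "book.author": "Khaled Hosseini",
    "book.isbn": "1594480001",
}

for value in ab_meta_values(book):
    print(value)
```

If your catalog already lives in a database or spreadsheet, a loop like this can emit the complete set of Custom Field values for every title in one pass.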
What advantages are there to using AB Meta?
At the time of this writing, there are no applications that are specifically indexing AB Meta content. However, the scheme is quite simple, both for human and computer readers, and is likely to see widespread adoption. Tagging content with it now means that when these tools become available, you will already have significant inventory indexed. In addition:
- Many of the fields in AB Meta correspond to values in the Google Book Search API. This should make it trivial for Google to match articles about books to specific entries in Google Books, where customers can preview content before buying.
- It's likely that tools based on Amazon Web Services will be built on top of AB Meta to allow those tags to generate direct or affiliate links to the Amazon.com book store.
- Some XML-based workflows already store book metadata in the Dublin Core schema, and AB Meta supports Dublin Core directly.
- Simpler blog plug-ins that support or even can auto-generate AB Meta are certain to be developed.
So get tagging! In the meantime, we'll continue to monitor AB Meta's adoption and the tools built around it.
Ebook Format Primer
Liza Daly
April 21, 2008
| Permalink
| Comments (7)
The simplest solution, of course, is to partner directly with the ebook device manufacturers and let them take care of the details. But these partnerships must be drawn up anew for each platform, and publishers are subject to the whims of the device makers' terms of use. Innovative publishers may want to experiment on their own first and be prepared to shift platforms strategically: this means ebook distribution must fit into existing workflows. Although some of the formats below support digital rights management, consider eschewing DRM in favor of flexibility and cross-platform support.
Let's start with the major devices first:
- The Sony Reader primarily uses Sony's proprietary Broadband eBook (BBeB) format for documents with DRM, but also supports RTF and non-DRM PDF. Sony does not provide any official tools for end users to convert to BBeB, although at least one unofficial open source tool can convert HTML to BBeB. The most flexible non-DRM formats are RTF and PDF. Microsoft Word can readily save to RTF, and Microsoft offers detailed instructions on converting from XML to RTF, but pure open-source alternatives are not mature. XML-to-PDF conversion has stronger open source support, but files may need to be specially tweaked for optimum display on the Reader.
- The Amazon Kindle uses Amazon's proprietary AZW format, which supports DRM. There are no tools available to directly convert to AZW, but AZW is a wrapper around the Mobipocket format and DRM-free Mobipocket files can be read on the device. Mobipocket documents can be created using a free (but not open-source) tool called Mobipocket Creator. As if the format wars weren't confusing enough already, "Mobipocket DRM" is not the same as AZW, and files created as Mobipocket DRM cannot be read on the Kindle. Mobipocket Creator does have a "batch" creation mode which could be integrated into an existing workflow, but the software is Windows-only. The Kindle also supports HTML and Word documents, but not PDF.
Specialized readers aren't the only way consumers may be viewing ebook content. Ultra-portable laptops like the Eee PC and OLPC XO are price-competitive with standalone readers. (I have an OLPC and reading by the pool in bright sunlight is quite a joy.) The next version of the iPhone is expected soon, and while the first edition was already a serviceable reader, the next version is likely to be more so, and to reach a wider audience.
All the devices listed above, except the Sony Reader, can read a common format: HTML. If XML is already a part of your workflow, converting to HTML is trivial. If not, HTML is a worthwhile investment for a number of reasons:
- XHTML is the standard markup for book content in OPS/.epub. .epub support is just getting off the ground but is expected to become widespread.
- If your publishing workflow includes HTML, your organization is able to distribute content to dozens of devices in addition to the open Web.
HTML is also the lingua franca of online search engines, and inclusion of partial or full HTML books will attract casual surfers and can drive community engagement with your content. Whether it's BBeB or AZW that becomes the Betamax of the next decade (and one, if not both, will be obsolete by then), HTML conversion is guaranteed to pay off in the foreseeable future.
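As a concrete illustration of how little is involved when XML is already in the workflow, here is a minimal sketch using only Python's standard library to turn a simple chapter record into an XHTML fragment. The <chapter> schema here is hypothetical; your own DTD or schema will differ, but the shape of the transformation is the same.

```python
import xml.etree.ElementTree as ET

# A hypothetical fragment from a publisher's XML workflow.
chapter_xml = """
<chapter>
  <title>The Analog Hole</title>
  <para>Before humans can consume digital media, it must
  become analog.</para>
  <para>Street vendors demonstrate this principle daily.</para>
</chapter>
"""

def chapter_to_html(xml_text):
    """Convert a simple <chapter> record to an XHTML fragment."""
    chapter = ET.fromstring(xml_text)
    parts = ["<h1>%s</h1>" % chapter.findtext("title")]
    for para in chapter.findall("para"):
        # Normalize the whitespace left over from XML indentation.
        parts.append("<p>%s</p>" % " ".join(para.text.split()))
    return "\n".join(parts)

print(chapter_to_html(chapter_xml))
```

Real book markup carries inline elements (emphasis, links, footnotes) that need more care, usually via XSLT, but the mapping from semantic XML to HTML remains largely mechanical.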