Tools: October 2008
Harvard Won't Permit Google Scans of In-Copyright Material
Mac Slocum
October 31, 2008
Harvard University Library (HUL) has been a partner in Google's library scanning project since 2004, but the boundaries of that partnership will not expand to the in-copyright works covered under Google's new Book Search settlement. From the Harvard Crimson:
In a letter released to library staff, University Library Director Robert C. Darnton '60 said that uncertainties in the settlement made it impossible for HUL to participate.
"As we understand it, the settlement contains too many potential limitations on access to and use of the books by members of the higher education community and by patrons of public libraries," Darnton wrote.
"The settlement provides no assurance that the prices charged for access will be reasonable," Darnton added, "especially since the subscription services will have no real competitors [and] the scope of access to the digitized books is in various ways both limited and uncertain."
The Crimson notes that Harvard will continue to allow scanning of books with expired copyrights.
(Via Jose Alonso Furtado's Twitter stream)
New Project Examines Close Reading and Web Collaboration
Mac Slocum
October 31, 2008
On Nov. 10, Doris Lessing's The Golden Notebook will be read and discussed by seven readers in a new experiment that explores "close reading" and the mechanisms of online conversation.
The project is the brainchild of Bob Stein, founder of the Institute for the Future of the Book. Stein outlined the project's goals in an email announcement:
Fundamentally this is an experiment in how the web might be used as a space for collaborative close-reading. We don't yet understand how to model a complex conversation in the web's two-dimensional environment and we're hoping this experiment will help us learn what's necessary to make this sort of collaboration work as well as possible.
The seven readers will discuss the book through margin notes and a group blog, and a public forum will be available for others to join the conversation. Further details are available through the project site.
New York Times Movie Reviews Released as API
Peter Brantley
October 30, 2008
The New York Times has released an application programming interface (API) for its movie reviews, a significant step because it opens the paper's review archive to outside developers. From the Times' Open blog:
Finally -- and this is the key -- we're giving you access to our Movies search feature, containing all 22,000 reviews indexed by title, reviewer's name, director's name, names of the top five actors, and plot keywords. So, if you'd like to build a list of what The New York Times thinks of Pedro Almodóvar or Lindsay Lohan, we've got you covered. And this is only the beginning: in the next few weeks we'll be rolling out better lookup and search features that will let you call up reviews based on publication date or the movie's release date, just to name two.
The Times also released campaign finance and metadata APIs earlier this month.
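To make the search fields described above concrete, here is a minimal sketch of how a client might assemble a query against such an API. The endpoint URL, the `api-key` parameter, and the filter names (`director`, `reviewer`) are placeholders chosen for illustration, not the Times' documented interface, so check the official documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the real base URL, parameter names, and
# authentication scheme must come from the Times' own documentation.
BASE_URL = "https://api.example.com/movies/reviews/search"


def build_review_query(api_key, **filters):
    """Return a search URL for the given filters (e.g. director, reviewer)."""
    # Drop empty filters so the query string only carries real criteria.
    params = {name: value for name, value in sorted(filters.items()) if value}
    params["api-key"] = api_key
    return BASE_URL + "?" + urlencode(params)


url = build_review_query("YOUR_KEY", director="Pedro Almodóvar", reviewer="")
print(url)
```

The sketch only builds the URL and makes no network request, which keeps it runnable without an API key.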
Analytics: Are Streams the New Hits?
Mac Slocum
October 30, 2008
Web analytics folks have been trying for years to remove the term "hits" from the analytics lexicon because it's an inherently flaky measurement (one Web page could theoretically yield hundreds of hits). That same flakiness has unfortunately infiltrated another measurement tool: "streams," a key metric for online video.
An off-hand mention in a New York Times article reveals cracks in the "stream" definition:
Despite all the experimentation, it is still difficult to know exactly how many viewers are watching individual TV shows and movies online. Hulu ranks its most popular content, but unlike YouTube it doesn't show the view count for each video. Still, it is clear that millions of viewers are watching some shows online. The Season 3 premiere of "Heroes" in September was streamed 8.1 million times on Hulu and NBC.com, according to the network. (All online streams are not counted as equal, because on NBC.com each segment of an episode is counted as a stream, so a full episode could count as six streams. On Hulu, one episode equals one stream.) [Emphasis added.]
This is a problem. Most digital content models rely on advertising as a revenue stream, and ad rates are generally associated with key analytics (impressions, page views, unique users, streams, clicks, etc.). Redefining a common metric puts the entire industry in flux because advertisers rarely buy inventory on just one site. Now they'll need to monitor both their active campaigns and the variations in campaign metrics (i.e., is this a Hulu stream or an NBC stream?). The last thing digital content needs is more complexity.
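The mismatch is easy to see with a little arithmetic. The sketch below converts raw stream counts into full-episode equivalents using the two definitions the Times describes (six streams per episode on NBC.com, one on Hulu). The per-site figures are invented for illustration, since the article reports only a combined 8.1 million, and the conversion assumes every NBC.com viewer watched all six segments.

```python
from dataclasses import dataclass


@dataclass
class StreamReport:
    site: str
    streams: int               # raw stream count as reported by the site
    streams_per_episode: int   # streams generated by one full episode


def episode_views(report):
    """Convert a raw stream count into full-episode equivalents."""
    return report.streams / report.streams_per_episode


# Illustrative numbers only; the article gives no per-site breakdown.
reports = [
    StreamReport("NBC.com", streams=6_000_000, streams_per_episode=6),
    StreamReport("Hulu", streams=1_000_000, streams_per_episode=1),
]

raw_total = sum(r.streams for r in reports)
normalized_total = sum(episode_views(r) for r in reports)
print(raw_total, normalized_total)
```

With these made-up inputs, 7 million raw streams shrink to 2 million episode equivalents, which is the kind of gap an advertiser comparing the two sites would have to account for.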
Reaction to Google Book Search Settlement
Mac Slocum
October 29, 2008
Updated 10/30, 7:53 AM -- Publishing experts, bloggers and interested parties are weighing in on the Google Book Search settlement. I'll be updating this post as new material comes in. If you see something that deserves notice please post a comment:
Posts Added October 30
On the Google Book Search agreement
(Larry Lessig, Lessig Blog)
The hard question for the registry is how far they will go to support the range of business models that authors and publishers might have. E.g., Yale Press "Books Unbound" and Bloomsbury Academic both have Creative Commons licensed authors. Will the registry enable that fact to be recognized? Indeed, though the comment was made by someone from the plaintiffs' side that it would be "perverse" for authors to choose free licensing, it is perfectly plausible that an author would choose to make his or her work available freely electronically, but contract with one commercial publisher to deal with selling the physical book, or licensing rights commercially. That, again, is the Bloomsbury Academic business model. Ideally, this non-profit should encourage the widest range of rights-respecting business models. One clear signal about what kind of organization this is will come from this.
Posts Added October 29
My initial take on the Google-publishers settlement
(Siva Vaidhyanathan, The Googlization of Everything)
From the beginning, this has seemed to be a major example of corporate welfare. Libraries at public universities all over this country (including the one that employs me) have spent many billions of dollars collecting these books. Now they are just giving away access to one company that is cornering the market on on-line access. They did this without concern for user confidentiality, preservation, image quality, search prowess, metadata standards, or long-term sustainability. They chose the expedient way rather than the best way to build and extend their collections.
Short Term Profits Over Long Term Principles; Google's Caving On Book Scanning Is Bad News (Mike Masnick, Techdirt)
... it's quite upsetting to see Google cave on this. The settlement does not establish any sort of precedent on the legality of creating such an index of books, and, if anything pushes things in the other direction, saying that authors and publishers now have the right to determine what innovations there can be when it comes to archiving and indexing works of content. Unfortunately, this was really inevitable. As was the case with Google caving on YouTube and the Associated Press, it becomes a situation where Google realizes it can throw a little cash at the problem to make it go away -- while also creating a large barrier to entry for any more innovative startup. From a short-term business perspective this might make sense, but from a long-term business perspective (and wider cultural perspective) it's terrible.
Google Book Search Lawsuit Settled, Fair Use Questions Remain ... (Sherwin Siy, Public Knowledge)
But while the legal landscape isn't altered too much by the settlement, the practical landscape could be. Rightsholders and other potential plaintiffs might view this settlement as the model for all future relationships with digitization efforts--if Google pays for digitizing, why shouldn't everyone else? Such a landscape might make a plaintiff more likely to sue, although the results in court, ideally, shouldn't differ, with or without this settlement in place.
Boondoggle in Google Rights Win? (Warning, Rant) (Erik Sherman, Erik Sherman's WriterBiz)
Going forward, people will buy books they want online and libraries will pay for access. Who gets 37 percent of the revenue? Google. Plus, there's advertising revenue and Google gets the same percentage of that. So for $125 million, it's probably nailed down many, many times more future revenue. This will turn out to be a pretty cheap business acquisition for them.
Author's Guild Settlement Insta-Blogging (James Grimmelmann, The Laboratorium)
The issue is that this is a class-action settlement requiring judicial approval to bind all authors. It's practically impossible for anyone else to take advantage of Google's terms without filing suit to obtain a similar class-binding order. Individual license negotiation -- the route that Google considered and rejected when it started the project -- is utterly infeasible. Since voluntary negotiation can't produce the result one needs to do comprehensive indexing, there's still no market for it, and this settlement therefore shouldn't prejudice future fair use claims by search engines.
Recommended Reading on XML and Publishing
Andrew Savikas
October 27, 2008
While clearing out some old files, I came across a folder of articles culled during research about three years ago, while I was building the case for increasing our use of XML for book production. If you're looking to take a break from the steady stream of terrifying financial news, here are a few hours of time well spent on angle brackets. Much of this skews fairly technical (including actual math), but there's some useful context for an XML conversation:
When Word-to-XML conversions get nasty from Mike Gross at Data Conversion Laboratory. "Before you begin a conversion, look through your source Word documents to see how well they were formatted but be prepared you may be horrified with what you find."
From the Journal of Digital Information, a paper by Terje Hillesund, Many Outputs -- Many Inputs: XML for Publishers and E-book Designers. Terje takes a contrarian view on XML, though he specifically calls out what many trade publishers primarily deal with as well-suited for XML: "For many typographically simple genres, like most present fiction, reuse has already proved to be relatively easy ... In the future, XML-based workflows will make re-use of many fiction genres even easier, as these visually and navigationally uncomplicated texts can be made into a variety of paper and electronic editions from the same XML document by use of style sheets ..."
The response to Hillesund from XML guru Norm Walsh, XML: One Input -- Many Outputs: A response to Hillesund. "Before considering the flaws in each of [his] arguments, it is interesting, if slightly incongruous to his arguments, to note that Hillesund's paper includes no less than four examples of the successful use of XML precisely for the publication of multiple output formats from a single input document."
A fascinating paper from 1998, On the Pagination of Complex Documents, which discusses the challenges inherent in automated pagination of the kind found in many XML-based rendering systems (as well as older systems such as LaTeX). "Using competitive analysis we show that, under realistic assumptions, not only first-fit but any online pagination algorithm may produce results that are arbitrarily worse than necessary. This explains why so many people are not satisfied with paginations produced by LaTeX if no manual improvement is done."
It hasn't been updated since 2005, but Choosing an XML editor, from Thijs van den Broek offers a nice survey of XML editors. "The study consisted of a literature search, surveys to identify user needs, current usage, existing editors, and (existing and desired) features of editors, as well as an evaluation exercise."
Here at O'Reilly our workflow is centered around DocBook XML, but DITA (Darwin Information Typing Architecture) is a more recent XML vocabulary, also designed primarily for technical information. IBM developerWorks has a nice overview, Introduction to the Darwin Information Typing Architecture. "This document is a roadmap for the Darwin Information Typing Architecture: what it is and how it applies to technical documentation. It is also a product of the architecture, having been written entirely in XML and produced using the principles described here."
Written from the perspective of a technical documentation group at Cisco, Low-Cost, Flat-File XML for the Masses is an interesting case study from a team committed to finding a way to use XML that was both better for writers and didn't require a large investment in new software: "You can realize the benefits of publishing from modularized XML, without the expense of an enterprise publishing system, by implementing the authoring environment on top of nothing more than your operating system's file system. Although this environment is not adequate for enterprise publishing needs, it is more than adequate for the needs small writing teams, businesses with a limited number of related products, proof-of-concept demonstrations, and even home users."
BBC Shifts Conversation Style: Go Where They're Already Talking
Peter Brantley
October 13, 2008
I think this deserves to be pondered. BBC News is moving away from merely hosting comments to inciting discussion in a variety of formats and locations. From Reportr.net:
For the US presidential debates, it [the BBC] has opened channels on video services Qik, 12Seconds and Phreadz. Some of the videos were subsequently edited and posted on the BBC News website.
The purpose, explains [BBC Editor] Matthew Eltringham, is "to join in conversations wherever they were happening rather than expect people to come to us and host them on the BBC's platforms."
This is a major change in the BBC's approach to user-generated content. It signals a shift away from the idea that the BBC should host the conversation. [Link added]
Overestimating the Home Page
Mac Slocum
October 13, 2008
Brett Crosby from Google Analytics says a home page is often mistaken as the most important part of a Web site. From TechRadar:
Where are your visitors landing, bouncing, and viewing? It's often assumed user experience begins on the homepage, and this misconception drives many an ecommerce site to waste hours of design work in the wrong place. Search engines dig deeper into ecommerce sites, bringing visitors to not just 'electronics', but also televisions, MP3 players or sat navs. Analytics data will tell you where your real 'homepages' reside, so you can focus your design work there.
Crosby's point applies to content-based sites as well. Visitors often enter through an individual story page or blog post, not the home page. This is why there's value in serving up related posts, embedded links and call-outs to other features and tools on story-level pages.
(Via Jeremiah Owyang's Twitter stream)
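As a rough illustration of Crosby's point, the sketch below counts where visitors actually enter a site. The log format, visitor IDs, and page paths are all hypothetical, and treating each visitor's first recorded page as the landing page is a simplification; real analytics tools group views into sessions.

```python
from collections import Counter

# Hypothetical page-view log: (visitor_id, page), in chronological order.
page_views = [
    ("v1", "/blog/xml-workflows"),
    ("v1", "/"),
    ("v2", "/blog/xml-workflows"),
    ("v3", "/"),
    ("v2", "/blog/google-settlement"),
    ("v4", "/blog/google-settlement"),
    ("v5", "/blog/xml-workflows"),
]


def landing_pages(views):
    """Count the first page each visitor saw."""
    first_page = {}
    for visitor, page in views:
        # setdefault keeps only the earliest page per visitor.
        first_page.setdefault(visitor, page)
    return Counter(first_page.values())


entries = landing_pages(page_views)
print(entries.most_common())
```

In this toy data a single story page draws more entrances than the home page, so that is where related links and call-outs would do the most good.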
Watch the YouTube Video, Buy the Product
Mac Slocum
October 8, 2008
YouTube's Content ID service, something we've covered in the past, gives publishers two options for handling unauthorized videos: the material can be removed from YouTube or it can be turned into advertising/revenue opportunities.
An article in today's New York Times shows which option Google prefers -- Content ID can now be used to associate "click-to-buy" links with video clips:
Music labels could choose to place the e-commerce links next to their own videos or on videos uploaded by users, whose images or soundtrack they identified using YouTube's Content ID system, which allows content owners to find unauthorized material on the site.
Click-to-buy links are shown below the video player on YouTube pages. It's unclear if this functionality will be integrated into videos embedded on external sites since this would require some sort of revenue share between the content owner, YouTube, the retailer and Web sites that publish embedded clips.
Links are currently limited to iTunes and Amazon products and are only viewable by U.S. visitors. YouTube says expansion plans are in the works.
Related Stories:
- PaidContent.org: "YouTube Adds Affiliate Links To Its Videos; Amazon and iTunes Downloads"
- Piracy and Advertising: An Unlikely Union that Just Might Work
- Official Google Blog: "Making Money on YouTube with Content ID"
- New York Times: "Google to Sell Ads for Web Games"
- Treating Ebooks Like Software
- Free Ebooks with Embedded Ads Via Scribd-Lulu Partnership
Amazon Launches UK POD Service; Partner Unknown
Mac Slocum
October 7, 2008
TheBookseller says Amazon is launching a print-on-demand service in the United Kingdom:
Amazon.com owns POD publisher BookSurge in the US, but the UK business has not divulged who will be handling the printing of POD titles in the UK.
In April, a spokesperson for Amazon.co.uk said the company -- at that time -- had no plans to bring BookSurge to the UK.
The Confusion Between Content and Containers
Mac Slocum
October 6, 2008
The digital realm allows content and containers to exist separately, but their old bond is still tough to break. An article in yesterday's New York Times education section illustrates this point:
Spurred by arguments that video games also may teach a kind of digital literacy that is becoming as important as proficiency in print, libraries are hosting gaming tournaments, while schools are exploring how to incorporate video games in the classroom...
... But doubtful teachers and literacy experts question how effective it is to use an overwhelmingly visual medium to connect youngsters to the written word. They suggest that while a handful of players might be motivated to pick up a book, many more will skip the text and go straight to the game. Others suggest that video games detract from the experience of being wholly immersed in a book.
The problem with this thinking is that it only assigns "literacy" value to books. Certainly, books are an essential learning tool and students should be exposed to them early and often, but if the goal is to improve literacy -- i.e. "being able to read and write" -- then the argument against games falls apart. A game-based project that boosts reading and writing skills in even a small percentage of children is still worthwhile, especially if it's one initiative amidst a broader literacy effort.
The anti-game contingent noted in the Times piece is falling into a familiar trap: assigning value to a container instead of content. The container trap was innocuous in years past because the audience (consumers, students, etc.) was limited to passive acceptance of a few choices. Now that digital delivery empowers audiences to naturally gravitate toward material they deem worthwhile, shoehorning people into a particular form diverges from bigger goals. If you want to accomplish something -- be it literacy improvement or creation of sustainable revenue streams -- you need to go with the audience grain, not against it.
Getting Some Perspective on Cloud Computing
Mac Slocum
October 2, 2008
Richard Stallman, creator of the GNU operating system and founder of the Free Software Foundation, is no fan of cloud computing. From The Guardian:
"One reason you should not use web applications to do your computing is that you lose control," Stallman said. "It's just as bad as using a proprietary program. Do your own computing on your own computer with your copy of a freedom-respecting program."
Stallman's comments have inspired a host of counter arguments, including some nice publishing-centric analysis from Adam Hodgkin at Exact Editions:
This obsession with self-sufficiency and self-reliance, veers in the direction of paranoia. You don't necessarily lose control if you outsource a service, especially if there is competition between various service providers. I am sure that there are dangers with a model of cloud computing in which only one company provides a platform for published books (that company would at the moment look like being Google) but there is really no reason why only one company should host and serve print in the cloud.
Stallman took a provocative route to an important caveat: a wholesale transfer to the cloud could bring unwanted repercussions, such as lock-in or -- if things go horribly awry -- lock-out. But, to Hodgkin's point, publishers who carefully consider their needs may find significant value in cloud toolsets. Dismissing the cloud outright is just as egregious as blindly committing.
Related Stories:
- Matthew Ingram: "Hey hey, you you -- get off of my cloud"
- Ars Technica: "Why Stallman is wrong when he calls cloud computing stupid"
- Mashable: "If Web Apps are Evil, Why Do We Use Them?"
- Tim O'Reilly: "Open Source and Cloud Computing"
- Cloud Computing's Potential Impact on Publishing
- The Kindle, the Cloud and Mixed Signals
Balancing the Benefits and Costs of XML for Book Production
Andrew Savikas
October 1, 2008
O'Reilly engineer and XML guru Keith Fahlgren kicked off a lively conversation on an internal mailing list this week by asking whether (and how much) we're "eating our own dogfood" in terms of Tim O'Reilly's recent post about IT.
Along the way, XML.com editor Kurt Cagle weighed in with his thoughts on the importance of an XML workflow (specifically one that plays nicely with his needs running a destination Web site):
Overall, I'd like to see us move to an all XML pipeline, not because I'm the XML editor (I'm actually writing more economics articles of late than anything) but because I think that a cohesive XML workflow provides us with the cleanest implementations that we can have, and ironically it's the one type of flow that may actually make it easier us to work with the content without needing to break open the content to do tedious search and replace operations. It provides the best reuse story -- it's a relatively simple proposition to convert a DocBook publication into an embedded Web block, for instance -- and it integrates well with feed production.
O'Reilly Publishing Services Manager Adam Witwer responded, and included some critical lessons learned about the challenges with moving to XML:
Over the past year or so, we in publishing services have adopted an all DocBook XML pipeline for several of the main book series (Animal, Cookbook, Theory in Practice, In a Nutshell, etc.). Retraining staff has been a huge challenge. From a technical perspective, developing the XSL-FO has taken (and continues to take) lots of time and iterations. But the biggest challenge has been convincing others that the small sacrifices that come with an XML workflow are worth it. We have less control over things like page layout in a book, and certain style elements that are easy in InDesign or Frame are difficult to replicate with XSL-FO stylesheets. For us in publishing services, those things seem like small trade-offs for the gain of having a single set of source files that are much easier to reuse, most notably on Safari, and to update. This ceases to be an issue when the stylesheets get to be nearly indistinguishable from the InDesign/Frame templates on which they are based, so that's what we've tried to do, and we've transitioned away from the traditional page layout programs and general approach to book production.
It's worth noting that this all applies to what happens after we receive a manuscript, many of which are still being written in Word. There's a lot that can be done without ever opening the can of worms that is authoring in XML.
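The reuse argument made above can be sketched in a few lines. The example below takes one small DocBook-style fragment and produces two outputs from it: an HTML block for the Web and a plain-text version. It is a plain Python tree walk, not the XSL-FO stylesheets Witwer describes, and the tag mapping and sample content are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny DocBook-style fragment (no namespace, three element types only).
SOURCE = """<section>
  <title>Single-Source Publishing</title>
  <para>One set of files feeds print and the Web.</para>
  <para>Updates are made once.</para>
</section>"""

# Hypothetical mapping from source elements to HTML tags.
TAG_MAP = {"section": "div", "title": "h2", "para": "p"}


def to_html(element):
    """Recursively copy an element tree, renaming tags via TAG_MAP."""
    html = ET.Element(TAG_MAP.get(element.tag, "span"))
    html.text = element.text
    html.tail = element.tail
    for child in element:
        html.append(to_html(child))
    return html


def to_plain_text(element):
    """A second output from the same source: one line per child block."""
    return "\n".join(child.text for child in element)


root = ET.fromstring(SOURCE)
web_block = ET.tostring(to_html(root), encoding="unicode")
plain = to_plain_text(root)
print(web_block)
print(plain)
```

Because both outputs are derived from the same source tree, a correction made once in the XML shows up in every format, which is the trade-off being weighed against reduced layout control.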
Tools of Change for Publishing is a division of O'Reilly Media, Inc.
© 2009, O'Reilly Media, Inc. | (707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.