Liza Daly: August 2008
How to Read any Type of Document on the Kindle (Almost)
Liza Daly
August 26, 2008
There are a few options for readers who want to convert PDFs or other unsupported files to the Kindle's AZW format. Amazon's recommended method is to email the file to your personal Kindle email address. Users can also convert PDFs and other document types themselves using Mobipocket Creator or Stanza.
All of the above methods have the same flaw: AZW does not support the kind of advanced layout available in formats like PDF, and text in non-Latin scripts isn't easy to convert. What if you need to review a complex legal form, or read a graphic novel, or read a document in Chinese? A hidden feature can help.
The Kindle has an undocumented picture-viewing mode that was first uncovered by Igor Skochinsky. Although the black and white E Ink screen is not especially good at displaying actual photographs, it is quite good at rendering line art and text.
Here's how to do it, using PDF as an example. Note that unofficial features may be buggy and could damage your Kindle; proceed at your own risk.
- Convert the PDF to a series of images. Commercial versions of Acrobat should be able to do this in batch, but users of free readers may have to convert a page at a time. The Kindle can read JPEG, PNG and GIF; the latter two will work best. Because the picture-viewing application doesn't support a table of contents, you'll need to name the image files in ascending alphabetical or numeric order (e.g. "0001.jpg," "0002.jpg," etc.). For best results, resize each image to 600 x 800, the resolution of the Kindle screen.
- Connect the Kindle to your computer using the USB cable. Once connected, browse to the Kindle's drive. If you have an SD card installed, it will appear on your computer as well. The following procedure works on either the Kindle or the SD card. I prefer to do everything on the SD card -- it feels safer.
- Create a folder called "pictures," and a folder inside of that with the name of your "document." Put the images in the document folder. Disconnect the Kindle from the PC. When you go to the Kindle's home screen, nothing will have changed. This is where the secret feature comes in:
- Press Alt-Z from the home screen. Your book title should appear in the list.
- Click on the book title. It will open the first image. Use the normal Kindle next/previous buttons to page through the "book." The picture viewer has menu options of its own to control the size of the image and how it's rendered.
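The file-naming and folder steps above can be scripted. The following is a minimal sketch, not a tested Kindle tool: the function name `stage_pages` and its parameters are hypothetical, it assumes the page images already exist and sort correctly by their current file names, and it only copies and renames them (resizing to 600 x 800 would need a separate image tool).

```python
import shutil
from pathlib import Path

# Formats the article says the Kindle picture viewer can read.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def stage_pages(source_dir, kindle_root, document_name):
    """Copy page images into pictures/<document_name>/ under zero-padded names.

    kindle_root is wherever the Kindle (or its SD card) is mounted;
    that path is an assumption and differs from machine to machine.
    """
    source = Path(source_dir)
    target = Path(kindle_root) / "pictures" / document_name
    target.mkdir(parents=True, exist_ok=True)

    # Keep only image files, in the order their current names sort.
    pages = sorted(
        p for p in source.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )

    staged = []
    for number, page in enumerate(pages, start=1):
        destination = target / f"{number:04d}{page.suffix.lower()}"
        shutil.copyfile(page, destination)
        staged.append(destination.name)
    return staged
```

Calling `stage_pages("exported_pages", "/path/to/kindle", "legal-form")` would leave files named 0001, 0002, and so on in the document folder, ready for the Alt-Z step.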

Credit: octopus pie
Of course, because the "PDF" is really a series of images, it's not possible to search the document or rescale the fonts. Text-heavy PDFs should be converted in one of the recommended ways.
This same technique can be used to load image-based documents directly, such as comics. (Peeking inside the "pictures" folder after it's been read by the Kindle reveals a file with the extension manga, suggesting that the picture viewer was intended to be used for this purpose.)
It's also possible to convert documents in Russian, Chinese or other non-Latin scripts this way. The Kindle does have support for embedded non-Latin fonts as part of its "Topaz" file format, but there are no tools for end-users that output Topaz.
(Screenshots courtesy the undocumented Alt-Shift-G feature, which saves to the root of the SD card.)
Optimizing Web Content for the Kindle Browser
Liza Daly
August 13, 2008
Amazon's Kindle store is convenient, easy to use and stocked with thousands of titles. But what about publishers and content distributors who want to reach the estimated 240,000 Kindle users without going through Amazon's program? And what about content formats that the Kindle does not directly support?
One selling point of the device is its free, ubiquitous Internet service and Web browser. Amazon has filed the browser under "Experimental" but it's quite usable as-is. With a few simple changes to a Web site's HTML code, it's even possible to specially cater to Kindle users.
The screenshots used in this article are from the mobile version of Bookworm, my Web application for reading ebooks in the EPUB format. Although what's being displayed is ebook content, it's being delivered by the Kindle's browser, not the Kindle ebook technology, which does not yet support EPUB.
Because the mobile Web version is already heavily optimized for small devices, the layout is simpler than a traditional Web site. What works for an iPhone or other wireless device will also be a good starting point for the Kindle, although we'll see there are some special considerations that don't apply to any other device.
Default or Advanced Mode?
When the Kindle ships, its Web browser is in "default mode." It will not load images or CSS styles, but it does render basic HTML tags like the italic tag <i>. Personally, I prefer "advanced mode," which displays Web pages more like a traditional browser, but some sites can be unreadable in this mode.
When optimizing for the Kindle it's best to consider that most users will not change from "default mode," or even realize that the option exists.
How different are these modes? Here is a comparison shot of the same screen from Bookworm in both modes:
[Screenshots: my list of books in Advanced mode, showing tabular layout and more advanced font styles; my list of books in Default mode]
In Default mode, all the information about the books runs together. It would be better to present this as a simple vertical list, the way the Amazon Kindle store does, rather than as a table.
Font Size Considerations
You can choose from six font sizes in the Kindle browser. As a content creator, you can provide a wider range of font sizes in your Kindle-formatted Web page, but take care that they aren't too small. The device doesn't clearly display fonts that are smaller than the smallest of its six built-in sizes.
In this screenshot, the table of contents for a Bookworm book is not readable, even though this page has already been tailored for the small display of mobile phones:
This problem is only likely to occur in Advanced mode where stylesheets are activated.
Usability
The Kindle's method of selecting and traversing hyperlinks is unique. The user activates links by moving a selector along the vertical axis (the Y-axis) with the scroll wheel. When multiple links fall on the same line, the Kindle will open a dialog box so the user can clarify which link is the target.
In Bookworm, users move to the next or previous chapter by selecting navigation links lined up horizontally (see the top row of the first image). In the Kindle, this presentation forces the user to click a second time to select the appropriate one:
For commonly-used navigational items like this, line up the links in a vertical row:
- Next
- Contents
- Previous
Now no second click (and accompanying page refresh) is necessary.
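As a rough illustration of the vertical layout, here is a minimal sketch that renders navigation links as an HTML unordered list, one link per item. The function name `vertical_nav` and the example URLs are hypothetical, and the sketch assumes the Kindle browser places each list item on its own line, as the list above suggests.

```python
from html import escape


def vertical_nav(links):
    """Render (label, url) pairs as an unordered list, one link per line."""
    items = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for label, url in links
    )
    return f"<ul>\n{items}\n</ul>"


# Hypothetical chapter URLs, for illustration only.
print(vertical_nav([
    ("Next", "/book/1/chapter/3"),
    ("Contents", "/book/1/"),
    ("Previous", "/book/1/chapter/1"),
]))
```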
It's also important to remember that the Kindle is a black-and-white device. If your site uses text color to convey any useful information (such as what is or is not a hyperlink), re-work the design to accommodate a grayscale display.
Finally, keep pages short. The Kindle cannot scroll; long Web pages are paginated like books. Pagination with E Ink devices is slow relative to scrolling on a computer screen. If possible, keep all your content on the first Kindle "page" when viewed at the default font size.
Targeting the Kindle
Web browsers are identified using their "user-agent" string. The current version of the Kindle is broadcasting this user-agent:
Mozilla/4.0 (compatible; Linux 2.6.10) NetFront/3.3 Kindle/1.0 (screen 600x800)
It's beyond the scope of this article to describe how to set up your Web site to deliver different kinds of content to different browsers, a process that varies considerably with your site's technology.
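Whatever the site's technology, the first step is recognizing the user-agent. The sketch below shows one way to do that in Python, assuming future Kindle versions keep the same "Kindle/x.y" and "(screen WxH)" tokens seen in the string above; the function name `parse_kindle_user_agent` is hypothetical, and how the result is wired into a given Web framework is left out.

```python
import re

# Token shapes taken from the user-agent string quoted above.
KINDLE_TOKEN = re.compile(r"\bKindle/(\d+(?:\.\d+)*)")
SCREEN_TOKEN = re.compile(r"\(screen (\d+)x(\d+)\)")


def parse_kindle_user_agent(user_agent):
    """Return Kindle details from a user-agent string, or None for other browsers."""
    version = KINDLE_TOKEN.search(user_agent or "")
    if version is None:
        return None
    screen = SCREEN_TOKEN.search(user_agent)
    return {
        "version": version.group(1),
        "screen": (int(screen.group(1)), int(screen.group(2))) if screen else None,
    }


ua = "Mozilla/4.0 (compatible; Linux 2.6.10) NetFront/3.3 Kindle/1.0 (screen 600x800)"
info = parse_kindle_user_agent(ua)
if info is not None:
    print("Kindle", info["version"], "screen", info["screen"])
```

A site could call this once per request and choose a simplified template whenever the result is not None.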
How do you test your layout if you don't have a Kindle? There's no substitute for having the real device (tell your boss it's for "research"), and currently Amazon does not offer any kind of browser emulator. Some possibilities:
- Disable stylesheets on your browser and look at the output. Does it still make sense? (Instructions for disabling stylesheets; Firefox users should install the Web Developer add-on)
- Use a text-only browser like Lynx
Some Last Advice
Don't spend too much time on this process. The next version of the Kindle is expected soon, no doubt with an improved browser. Indeed, Amazon could offer a new version of the existing browser at any time. Most of the changes recommended above should take little time and money to implement, and can make a great difference in user experience.
In addition, optimizing your site for small-screen browsers can have other benefits: it gives an increasing number of mobile users quick access to your content, and it aids accessibility for screen-readers and other non-standard browser types.
Processing the Deep Backlist at the New York Times
Liza Daly
August 1, 2008
At the O'Reilly Open Source Convention (OSCON), Derek Gottfrid of the New York Times led a fascinating session on how the Times used Amazon's cloud computing services to quickly and cheaply get its huge historical archive online and freely viewable to the public.
How big is the archive? Eleven million individual articles from 1851 to 1980, or 4 terabytes of data (over 4,000 gigabytes). The Times got it ready for distribution in 24 hours, for a total cost of $240 in computing fees and $650 in storage fees.
As part of their original TimesSelect subscription service, the paper had scanned their entire print archive. Each full-page scan was cut into individual articles. Typical of newspaper format, the articles often spanned column or page boundaries, which meant that many articles were composed of several scans. In the original subscription-based program, whenever a reader requested one of these historical articles, the Times computer would need to stitch together all of the scans for a particular article before presenting it.
This on-demand process used significant computing resources, but because TimesSelect was subscription-based there was never much traffic. Once this archive was open to the public it was expected to generate greater usage, and the safest approach in those cases is to serve pre-generated versions of all 11 million articles. Using traditional software development practices -- with a single computer churning through one article at a time -- the processing could potentially take weeks and tie up Times servers that were needed for other tasks.
Gottfrid turned to Amazon Web Services (AWS) and its two main products:
Amazon Elastic Compute Cloud (EC2) is a form of "virtualization" where one very large computer is divided up into many virtual computers that can be individually leased out for use. Traditional hosting costs money whether the server is working or idle; in EC2 you pay only as long as the virtual computer is running. When it's no longer needed, it's shut down. This makes the service ideal for one-off processing jobs.
In addition, Amazon doesn't care whether you use one EC2 "instance" 100 times, or 100 instances all at once -- the cost is the same. The difference comes when you can usefully divide a job into 100 concurrent tasks, because then it finishes in roughly 1/100th of the total time.
Amazon's other major AWS offering is the Simple Storage Service (S3), for large-scale file hosting. Like EC2, it is a leased model -- you pay only for the space that you use in a given time period.
Gottfrid leveraged these technologies in combination with a relatively new software library called Hadoop. Hadoop is written in the Java language and is based on work done at Google. It allows programmers to very easily write programs that can be run simultaneously on multiple computers.
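Hadoop implements the MapReduce programming model. The article does not describe how the Times structured its job, so the following is only a single-process Python sketch of that model applied to a stitching-like task, not Hadoop's Java API. The record fields (`article_id`, `sequence`, `fragment`) and all function names are hypothetical, and text strings stand in for scanned image pieces.

```python
from collections import defaultdict


def map_scan(scan):
    """Map step: emit (article_id, (sequence, fragment)) for one scan."""
    return scan["article_id"], (scan["sequence"], scan["fragment"])


def reduce_article(article_id, fragments):
    """Reduce step: join one article's fragments in reading order."""
    ordered = [fragment for _, fragment in sorted(fragments)]
    return article_id, " ".join(ordered)


def run_job(scans):
    """Run map, shuffle and reduce in one process to show the data flow."""
    grouped = defaultdict(list)
    for key, value in map(map_scan, scans):  # map
        grouped[key].append(value)           # shuffle: group values by key
    return dict(reduce_article(k, v) for k, v in grouped.items())  # reduce


scans = [
    {"article_id": "a", "sequence": 2, "fragment": "second column"},
    {"article_id": "a", "sequence": 1, "fragment": "first column"},
    {"article_id": "b", "sequence": 1, "fragment": "single scan"},
]
print(run_job(scans))
```

Because each map call and each reduce call touches only its own record or key, a framework can hand them to different machines, which is what makes the model easy to run concurrently.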
Combining Hadoop concurrency with EC2 and S3, the Times was able to run a job that might have taken weeks of processing time and complete it in 24 hours, using 100 EC2 instances. They were pleased enough with S3 that it became their permanent hosting platform for the scans. Hosting with Amazon or other cloud computing services is usually cheaper and has much better bandwidth than the average provider, although downtime can and does occur.
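The figures quoted above ($240 in computing fees, 100 instances, 24 hours) allow a quick back-of-the-envelope check. The sketch below assumes every instance ran for the full 24 hours and that the work divides perfectly, neither of which the article states, so the results are implied estimates rather than reported numbers.

```python
def instance_hours(instances, hours_each):
    """Total billable instance-hours when every instance runs the full time."""
    return instances * hours_each


def implied_hourly_rate(total_fee_usd, total_instance_hours):
    """Fee per instance-hour implied by a total bill."""
    return total_fee_usd / total_instance_hours


# Figures from the article.
compute_fee_usd = 240
instances = 100
wall_clock_hours = 24

total_hours = instance_hours(instances, wall_clock_hours)
rate = implied_hourly_rate(compute_fee_usd, total_hours)
serial_days = total_hours / 24  # same work on one instance, if it divided perfectly

print(f"{total_hours} instance-hours at about ${rate:.2f} each")
print(f"about {serial_days:.0f} days on a single instance")
```

Under those assumptions the bill is the same either way; only the wall-clock time changes.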
At last year's OSCON, the Times announced the formation of its developer blog, Open. You can read more about the original AWS project as well as TimesMachine, a project that became economically feasible due to the low cost of AWS.
Tools of Change for Publishing is a division of O'Reilly Media, Inc.
© 2008, O'Reilly Media, Inc. | (707) 827-7000 / (800) 998-9938