Four short links: 14 May 2009
Open Source Ebook Reader, Libraries and Ebooks, Life Lessons, and Government Licenses
by Nat Torkington | comments: 1
- Open Library Book Reader -- the page-turning book reader software that the Internet Archive uses is open source. One of the reasons library scanning programs are ineffective is that they try to build new viewing software for each scan-a-bundle-of-books project they get funding for.
- Should Libraries Have eBooks? -- blog post from an electronic publisher made nervous by the potential for libraries to lend unlimited "copies" of an electronic work simultaneously. He suggests turning libraries into bookstores, compensating publishers for each loan (interestingly, some of the first circulating libraries were established by publishers and booksellers precisely to have a rental trade). I'm wary of the effort to profit from every use of a work, though. I'd rather see libraries limit simultaneous access to in-copyright materials if there's no negotiated license opening access to more. Unlike the author, I don't see this as a situation that justifies DRM, whose poison extends past the term of copyright. (via Paul Reynolds)
- Lessons Learned from Previous Employment (Adam Shand) -- great summary of what he learned in the different jobs he's had over the years. Sample:
- More than any other single thing, being successful at something means not giving up.
- Everything takes longer than you expect. Lots longer.
- In a volunteer based non-profit people don't have the shared goal of making money. Instead every single person has their own personal agenda to pursue.
- Unfortunately "dreaming big" is more fun and less work than "doing big".
- Flickr Creates New License for White House Photos (Wired) -- photos from the White House photographer were originally CC-licensed (yay, a step forward), but when it was pointed out that, as government-produced information, those photos can't be copyrighted, the White House relicensed them as "United States Government Work". Flickr had to add the category, which differs from "No Known Copyright", and it's something that all sharing sites will need to consider if they are going to offer their service to the Government.
tags: business, copyright, creative commons, drm, ebooks, flickr, gov2.0, government, libraries, life hacks
Credit card company data mining makes us all instances of a type
by Andy Oram | @praxagora | comments: 2
The New York Times has recently published one of their in-depth, riveting descriptions of how credit card companies use everything they can learn about us. Any detail can be meaningful: what time of day you buy things, or the quality of the objects you choose.
The way credit collectors use psychology reminds me of CIA interrogators (without the physical aspects of pressure). In fact, they're probably more effective than CIA interrogators because they stick to the basic insight that kindness elicits more cooperation than threats.
So who gave them permission to use our purchase information against us? What law could possibly address this kind of power play?
There's another disturbing aspect to the data mining: it treats us all as examples of a pattern rather than as individuals. Almost eleven years ago, I wrote an article criticizing this trend. The New York Times article shows how much we've lost from what we consider essential to our identity--our individuality.
tags: bill collectors, credit cards, data mining, data retention, mining, privacy
Google's Rich Snippets and the Semantic Web
by Tim O'Reilly | @timoreilly | comments: 6
There's a long-time debate between those who advocate for semantic markup, and those who believe that machine learning will eventually get us to the holy grail of a Semantic Web, one in which computer programs actually understand the meaning of what they see and read. Google has of course been the great proof point of the power of machine learning algorithms.
Earlier this week, Google made a nod to the other side of the debate, introducing a feature that they call "Rich Snippets." Basically, if you mark up pages with certain microformats (and soon, with RDFa), Google will take this data into account, and will provide enhanced snippets in the search results. Supported microformats in the first release include those for people and for reviews.
So, for example, consider the snippet for the Yelp review page on the Slanted Door restaurant in San Francisco.
The snippet is enhanced to show the number of reviews and the average star rating, with a snippet actually taken from one of the reviews. By contrast, the Citysearch results for the same restaurant are much less compelling.
(Yelp is one of Google's partners in the rollout of Rich Snippets; Google hopes that others will follow their lead in using enhanced markup, enabling this feature.)
Rich snippets could be a turning point for the Semantic Web, since, for the first time, they create a powerful economic motivation for semantic markup. Google has told us that rich snippets significantly enhance click-through rates. That means that anyone who has been doing SEO is now going to have to add microformats and RDFa to their toolkit.
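To make that concrete, here's a minimal sketch of what hReview microformat markup on a review page might look like. The class names follow the hReview draft, but the restaurant, reviewer, and values are invented for illustration; this is not markup taken from Yelp's actual pages:

<!-- hypothetical hReview markup; the business, reviewer, and values are made up -->
<div class="hreview">
  <span class="item"><span class="fn">Example Bistro</span></span>
  reviewed by <span class="reviewer vcard"><span class="fn">Jane Example</span></span>
  on <span class="dtreviewed">2009-05-12</span>.
  Rating: <span class="rating">4.5</span> out of 5.
  <span class="description">Great spring rolls; book a window table for sunset.</span>
</div>

With markup like this in place, a crawler can pull the rating, reviewer, and review text directly from the page instead of having to infer them.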
Historically, the biggest block to the Semantic Web has been the lack of a killer app that would drive widespread adoption. There was always a bit of a chicken-and-egg problem, in which users would need to do a lot of work to mark up the data for the benefit of others before getting much of a payoff themselves. But as Dan Bricklin remarked so insightfully in his 2000 paper on Napster, The Cornucopia of the Commons, the most powerful online dynamics are released not by appeals to volunteerism, but by self-interest:
What we see here is that increasing the value of the database by adding more information is a natural by-product of using the tool for your own benefit. No altruistic sharing motives need be present...
(Aside: @akumar, this is the answer to your question on Twitter about why in writing up this announcement we didn't make more of Yahoo!'s prior support for microformats in SearchMonkey. You guys did pioneering work, but Google has the market power to actually get people to pay attention.)
What I also find interesting about the announcement is the blurring line between machine learning and semantic markup.
Machine learning isn't just brute force analysis of unstructured data. In fact, while Google is famous as a machine-learning company, their initial breakthrough with pagerank was based on the realization that there was hidden metadata in the link structure of the web that could be used to improve search results. It was precisely their departure from previous brute force methods that gave them some of their initial success. Since then, they have been diligent in developing countless other algorithms based on regular features of the data, and in particular regular associations between data sets that routinely appear together - implied metadata, so to speak.
So, for example, people are associated with addresses, with dates, with companies, with other people, with documents, with pictures and videos. Those associations may be made explicitly, via tags or true structured markup, but given a large enough data set, they can be extracted automatically. Jeff Jonas calls this process "context accumulation." It's the way that our own brains operate: over time, we make associations between parallel data streams, each of which informs us about the other. Semantic labeling (via language) is only one of many of those data streams. We may see someone and not remember their name; we may remember the name but not the face that goes with it. We might connect the two given the additional information that we met at such and such conference three years ago.
Google is in the business of making these associations, finding pages that are about the same thing, and they use every available handle to help them do it. Seen in this way, SEO is already a kind of semantic markup, in which self-interested humans try to add information to pages to enhance their discoverability and ranking by Google. What the Rich Snippets announcement does is tell webmasters and SEO professionals a new way to add structure to their markup.
The problem with explicit metadata like this is that it's liable to gaming. But more dangerously, it generally only captures what we already know. By contrast, implicit metadata can surprise us, giving us new insight into the world. Consider Flickr's maps created by geotagged photos, which show the real boundaries of where people go in cities and what they do there. Here, the metadata may be added explicitly by humans, but it is increasingly added automatically by the camera itself. (The most powerful architecture of participation is one in which data is provided by default, without the user even knowing he or she is doing it.)
Google's Flu Trends is another great example. By mining its search database (what John Battelle calls "the database of intentions") for searches about flu symptoms, Google is able to generate maps of likely clusters of infection. Or look at Jer Thorp's fascinating project announced just the other day, Just Landed: Processing, Twitter, MetaCarta & Hidden Data. Jer modeled the possible spread of swine flu by extracting the string "Just landed in..." from Twitter. Since Twitter profiles include a location, and the object of the phrase above is also likely to be a location, he was able to create the following visualization of travel patterns:
Just Landed - Test Render (4 hrs) from blprnt on Vimeo.
This is where the rubber meets the road of collective intelligence. I'm a big fan of structured markup, but I remain convinced that even more important is to discover new metadata that is produced, as Wallace Stevens so memorably said, "merely in living as and where we live."
P.S. There's some small irony that in its first steps towards requesting explicit structured data from webmasters, Google is specifying the vocabularies that can be used for its Rich Snippets rather than mining the structured data formats that already exist on the web. It would be more "googlish" (in the machine learning sense I've outlined above) to recognize and use them all, rather than asking webmasters to adopt a new format developed by Google. There's an interesting debate about this irony over on Ian Davis' blog. I expect there to be a lot more debate in the weeks to come.
tags: google, microformats, semantic web
Come to Ignite Where & Launchpad
by Brady Forrest | @brady | comments: 1
Every year we kick off Where 2.0 with a combination Launchpad and Ignite event. This year is no different. So far we've got 11 geo-oriented Ignite talks paired with 5 product demos spread across two sets. We'll be starting the show at 7PM and will conclude by 9PM on May 19th at the Fairmont in San Jose. Bar opens at 6:30.
RSVP @ Facebook. RSVP @ Upcoming.
First Set (Starts 7:00)
Demo: Andrew Weinreich - Xtify
Xtify is a location-based services platform offered to website developers. Xtify is able to abstract location without the involvement of wireless carriers.
Demo: Brian Trussel - Glympse: Socializing LBS
The next generation of personal location-based services should be much more like sharing a phone call and a lot less like forming a baseball team. Sharing location is impulsive, like text messaging, and it needs to be instant, simple, and clean.
Demo: Noam Bardin - Waze
Waze drivers are building the first dynamic driving map, reflecting the roads right now. Driving with the Waze mobile client lets users passively and actively share real-time data and receive the optimal route to their destination. This level of dynamic information can only be achieved by drivers participating and sharing real driving data. Waze is all over Israel and will be coming to the US (currently Android only).
David Troy - Election 2008: Mapping Voter Experiences with Twitter Vote Report
With irregularities in the election process widely reported in 2000 and 2004, the 2008 election represented one of the first opportunities to use technologies like Twitter, SMS, and cell phones to document and map the election process. Twitter Vote Report was the result of work by activists and technologists, and created a permanent document of the 2008 election.
Sam Hiatt - Implementing Web Services for NASA's Terrestrial Observation and Prediction System
The ecological monitoring and forecasting lab at NASA Ames Research Center produces daily global estimates of parameters related to ecosystem condition. Implementing web services has increased accessibility and greatly improved the usefulness of our data products. We present the TOPS data gateway and show how it is being used by the US National Parks Service to assist resource management.
David Felcan - A Crime Early Warning System: Using Spatial Statistics to Detect Changing Geographic Patterns in Crime
Large quantities of spatial data can be as much a burden as a boon without the tools to properly tease out important details. For police, HunchLab enables early detection of changes in crime patterns, pulls information automatically out of millions of incident records, and provides the means of detecting and stopping crime spikes earlier than they would be found through more conventional means.
Adam DuVander - How Open Should Mapping APIs Be?
Google Maps is innovative, but also proprietary. Yahoo, Microsoft, and MapQuest have equally closed platforms, while the open source JavaScript library Mapstraction ties them together with a single interface. This panel will discuss whether there should be a standard for interoperable mapping APIs, or whether there's more benefit and innovation in remaining proprietary.
Michelle Bowman - Here There Be Lions: The Cartography of the Future
A new breed of maps is emerging, revealing breakthroughs in our understanding of biology, neuroscience, ecology and the physical world. We're now able to map not just physical geographies, but genomes, neural pathways, emotions, social networks - even the global movement of ideas. These new maps tell powerful stories about the changes that will shape society over the next twenty years.
Second Set (Starts 8:15)
Demo: Tom Link - Product Launch: SpatialKey
SpatialKey is a next generation Information Visualization, Analysis and Reporting System. It is designed to help organizations quickly assess location based information critical to their organizational goals, decision making processes and reporting requirements.
Demo: Ahmed Lacevic & He Huang - Demographic Data Mining Using Social Explorer
We present a very powerful new tool for mining current and historical demographic data online. We will show a quick and easy way to find the data, visualize change over time using beautiful thematic maps, create slide-shows with a click of a button, and explore everything from income to rent affordability to slavery in 1790.
Peter Batty - Social Networking Based on Future Location
This presentation talks about the challenges in building a fine-grained model of a person's future location, and about the range of powerful applications that can be built off such a model. Many applications focus on the current location of a person and their friends - future location is harder to handle but arguably more useful.
Ariel Waldman - Space Hacks
From creating remote-sensing cubesats to analyzing aerogel: how the public is hacking into space exploration.
Tim Waters - MapWarper, An Open Source Online Map Rectifier
Utilising open source tools, a website is presented enabling a user to upload an image and rectify it. Maps can be rectified by the crowd. Rectified maps can be used as WMS or packaged and downloaded as tiles. Metadata regarding provenance and licensing is captured. All maps are searchable, resulting in a library of user-submitted maps. The application is free and open source.
Ian White - Got Smarts
The coming wave of Intelligent Transportation Systems (ITS) has been underway in the world of public infrastructure for over 10 years. Few are aware of the vast implications--fuel efficiency gains, lessened congestion, on-time trains, decreased accident rates/fatalities, the list goes on...But few outside the public sector are aware of what this means and how it will affect the morning commute.
Martin Flynn - OpenGTS - Open Source GPS Tracking System
OpenGTS (Open Source GPS Tracking System) was first made available in January of 2007 and is now in use in at least 33 different countries around the world for tracking vehicles, trucks, delivery vans, ships, people, phones, etc. This session will be an overview of the features and capabilities of the OpenGTS System available on SourceForge.
Eric Gundersen - Washington, DC's Government Push for Open Data and Map Mashups
This session will provide an overview of the Washington, D.C. government's recent decision to open up many of its public data streams for easy public use and the contest they sponsored to highlight the usefulness of this data.
tags: geo, ignite, where 2.0
Four short links: 13 May 2009
by Nat Torkington | comments: 0
- How NPR Tweets Topical Archive Material (Nieman Lab) -- NPR Twitter bot that tweets relevant links to archived NPR material based on what people are currently searching for. What an elegant way to inform online discussion: it's like a heads-up display providing context for what you're currently talking about! The article talks about when it doesn't succeed as well as when it does. (via Evolving Newsroom)
- COMP 8440: Intro to Open Source -- Andrew Tridgell taught a 15-lecture class on open source at Australian National University. It's a comprehensive bootcamp, right up there with Producing OSS. (via fmarier on Twitter)
- DIY Broadband (ArsTechnica) -- Norwegian ISP offers cheaper rates if you dig your own fibre tunnels. (via dsearls on Twitter)
- Arduino Simple Walker -- a how-to on building a simple two-servo Arduino walker with body and legs made on demand by Ponoko. Apparently it's great for kids, too. (via makezine)
2 Years Later, the Facebook App Platform is Still Thriving
by Ben Lorica | comments: 7
In a few weeks, the Facebook application platform will mark its second anniversary. While it garnered lots of press coverage in the months after it launched, the arrival of the iTunes app store shifted attention away from Facebook's vibrant ecosystem. The media glow is understandable: among other things, the younger iTunes platform is adding apps at a much faster rate than Facebook or Myspace.

Games comprise a sizable chunk of app revenues on all three platforms, and recent stories suggest that 2009 has been a great year for developers. The substantial revenue generated by popular Facebook (and Myspace) apps has been the subject of articles in VentureBeat, TechCrunch, and Inside Facebook. There have also been recent estimates for the revenue generated by iPhone apps (see here and here). Game developers in particular are benefiting from having a multitude of platforms: Games are the largest iTunes category, and the second largest category in both Facebook and Myspace. In addition, 4 of the top 10 most successful Facebook app providers are game developers.
tags: facebook, iphone, myspace, platform, platforms, social media
Four short posts: 12 May 2009
by Marc Hedlund | comments: 1
[Stealing Nat's "Four short" format again...]
- I went to Google and searched for a non-location-specific term today (I can't be more specific since the search was for a birthday present for my wife, but let's pretend it was "baseball cards," since that was the general form -- a noun with nothing geographically-specific about it). On the first page of results was a list of shops in my neighborhood that sell that thing (in our pretend example, baseball card trading stores). The specificity of the local results was quite good. Now, I know full well that my IP address identifies my location all too accurately, and that Google and many other sites track that information -- and I've known that for a long time. Nonetheless, seeing my neighborhood right there in the search results made me want to never use any Google site again. Call it "uncanny valley" or "rubbing your face in it" or whatever you want -- it was just too close to home in the most literal sense. I'm off trying Yahoo Search as an alternative -- not that I have any reason to believe Yahoo treats such data any differently, but simply because having alternatives is a good thing. (For the record, I'm a noted privacy freak and I don't pretend to speak for anyone else on this topic. I know that resistance is futile. I continue to believe that there is a great divide on sensitivity about privacy -- you've either had your identity stolen or been stalked or had some great intrusion you couldn't fend off, or you haven't. I'm in the former camp and it colors the way I view and think about privacy online. It makes me indescribably sad to see how clearly I and others in my camp are losing this battle.)
- I'd really like to end up on Wrong Tomorrow for predicting that the iPhone OS will be dominant for the next decade. Who knows? Prediction is completely impossible, which is one of the things that makes life fun. The tech industry seems particularly predictable, though, in that it just keeps acting in waves. The iPhone OS seems to be playing its cards right. Go ahead, commenters, freak out like you did the last time I said this. :)
- I noted to a friend the other day (while encouraging him to go work there) that I measured Twitter's value by seeing Tweetie (an awesome iPhone Twitter client) ascend to one of the four apps in the bottom bar of my iPhone. Those bottom bar apps are the ones I use all the time (the others being the phone, SMS, and email apps). Tweetie replaced Safari, the web browser, which is pretty amazing as a symbolic shift. No other third-party app -- including my own company's app -- has made it into the bottom bar for me. Who says Twitter isn't valuable?
- In contrast, using Twitter makes Facebook feel like watching repeats on local TV when you're home sick. I really hope the automated and out-of-control cross-posting comes to an end soon. Facebook wins for posting private messages and having inline replies; Twitter wins by letting you see the data some way other than through the official orifice (desktop clients, iPhone apps, SMS, etc). I would accept separate message streams for different types of data; or the death of Facebook in my friend group -- whatever. Unfortunately I doubt I'll get either wish.
History of Fonts on the Ignite Show
by Brady Forrest | @brady | comments: 1
Bram Pitoyo gave a great and informative talk on the History of Fonts at Ignite Portland 5. It's this week's episode of the Ignite Show. Enjoy!
Subscribe to the Ignite Show via iTunes
Google Engineering Explains Microformat Support in Searches
by James Turner | comments: 6
You may also download this file. Running time: 18:24
Subscribe to this podcast series via iTunes. Or, visit the O'Reilly Media area at iTunes to find other podcasts from O'Reilly.
Today, Google is releasing support for parsing and display of microformat data in their search results. While the initial launch will be limited to a specific set of partners (including LinkedIn, Yelp and CNet reviews), the intent is that very quickly, anyone who marks their pages up with the appropriate microformat data will be able to make their information understandable by Google. This technology would allow you to explicitly search, for example, for only printers that had an average customer review of 3 stars or higher. Initial support will include things such as:
- Review Ratings
- Product Prices
- Personal Details
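For the "Personal Details" case above, the relevant microformat is hCard. As a rough, hedged sketch (standard hCard class names, an invented person and details, not markup from Google's documentation), a profile page might carry something like:

<!-- hypothetical hCard markup; the person and details are made up -->
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="title">Systems Engineer</span> at
  <span class="org">Example Corp</span>,
  <span class="adr"><span class="locality">San Jose</span>, <span class="region">CA</span></span>
</div>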
We talked this morning with Othar Hansson and RV Guha, two of the Google engineers responsible for the new functionality, and you can listen to them discuss it in this exclusive O'Reilly interview.
JAMES TURNER: Why don't you guys start by introducing yourselves?
OTHAR HANSSON: Sure. I'm Othar Hansson, and I'm a tech lead on this project. And I'm in Google's Search UI Group.
RV GUHA: My name is Guha. I'm an engineer at Google and I do stuff across the board.
JT: So can you describe briefly, to start off, exactly what it is you're releasing today?
RVG: Okay. We are asking webmasters who have pieces of data like reviews or people profiles, and in an experimental form, things like information about organizations and products, to put the structured data representing the content of the webpage in a machine-understandable form on the page. Typically, what happens is that if you take a website (and having created Epinions, I can talk in the context of Epinions), you would typically have a database in the back-end which has lots of information about products. People write reviews about them. And you get information such as the number of reviews, the average rating of the reviews, the price of the product, who sells it, et cetera, et cetera, et cetera. It's stored in a structured database in your back-end. You then use some scripts to format it into HTML as per the site's design. Now going from the structured data to the HTML is quite straight-forward. But going from the HTML back to the structured data in a fashion which works across sites is very, very, very hard. It's very difficult for a search engine to get back the structured data for all of the sites. Now if it were to understand that, if it were to understand that this is a review site where the product being reviewed is such and such and it has 30 reviews with an average rating of 3.2 and so on and so forth, we could do a better job of the search. In particular, we could do a better job of presenting the two or three lines of text that appear as part of the search result so that the user has a better idea of what to expect on that page. And from our experiments, it seemed that giving the user a better idea of what to expect on the page increases the click-through rate on the search results. So if the webmasters do this, it's really good for them. They get more traffic. It's good for users because they have a better idea of what to expect on the page. And, overall, it's good for the web.
JT: So in some ways, that's in the same way that right now for certain sites, you'll give the internal structure of the site as part of the search result or for shopping results, you'll give price ranges and things like this. This is just, again, enriching and providing more structured -- more than just a snippet, giving more of a structured display of the information on that page?
RVG: Yes. If we have structured data, we can do lots of things. We're starting off by improving the snippets. It's an absolute no-brainer. It seems to be helping everybody. And, as you know, we keep playing with different ideas and different things. As structured data becomes more prevalent, there's a ton of ideas, both inside Google and outside Google, on how you might improve search.
tags: google, interviews, microformats, search, seo
Google Announces Support for Microformats and RDFa
by Timothy M. O'Brien | comments: 18
Don't miss James Turner's Interview with Google Engineering's Othar Hansson and RV Guha
On Tuesday, Google introduced a feature called Rich Snippets, which provides users with a convenient summary of a search result at a glance. They have been experimenting with microformats and RDFa, and are now officially introducing the feature and allowing more sites to participate. While the Google announcement makes it clear that this technology is being phased in over time, with no guarantee that your site's RDFa or microformats will be parsed, Google has given us a glimpse of the future of indexing. Read this article to find out about the underlying technology and how you can prepare your own content for it.
What is RDFa?
While Google's announcement today focuses on microformats, they will soon release support for RDFa. From the W3C RDFa in XHTML Specification:
The current Web is primarily made up of an enormous number of documents that have been created using HTML. These documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience: an event on a web page can be directly imported into a user's desktop calendar; a license on a document can be detected so that users can be informed of their rights automatically; a photo's creator, camera setting information, resolution, location and topic can be published as easily as the original photo itself, enabling structured search and sharing.
Let's take a quick look at a review from Amazon, and see how it would be marked up with RDFa to provide more information for Rich Snippets. First, here's a review from the Amazon site:

Next, let's take a look at a (very simplified) example of markup that might be used to generate this review:
<div>
  <div>
    79 of 98 people found the following review helpful:
  </div>
  <div>
    <span>5.0 out of 5 stars</span>
    <span><b>American Biographer: Jon Meacham</b></span>
  </div>
  <div>
    <a href="https://www.amazon.com/gp/pdp/profile/A2G8PQ9HNUY6NA/"><span>Marian the Librarian</span></a> (NY, NY) -
  </div>
  <div>
    <b>This review is from:
      <a href="https://www.amazon.com/American-Lion-Andrew-Jackson-White/dp/1400063256/">American Lion: Andrew Jackson in the White House (Hardcover)</a>
    </b>
  </div>
  <div class="review">
    American Lion is a wonderfully crafted biography about an incredibly interesting and oft-overlooked American who helped shaped this country...
  </div>
</div>
Next, let's add the RDFa markup that would allow Google to integrate this review into Rich Snippets. To mark up this XHTML with RDFa, you use the https://data-vocabulary.org namespace and a set of attributes. To see a list of attributes that work with Google's indexing technology, see this RDF for data-vocabulary.org:
<div xmlns:v="https://rdf.data-vocabulary.org" typeof="v:review">
  <div>
    79 of 98 people found the following review helpful:
  </div>
  <div>
    <span property="v:rating">5.0 out of 5 stars</span>
    <span><b>American Biographer: Jon Meacham</b></span>
  </div>
  <div>
    <a href="https://www.amazon.com/gp/pdp/profile/A2G8PQ9HNUY6NA/"><span property="v:reviewer" about="https://www.amazon.com/gp/pdp/profile/A2G8PQ9HNUY6NA/">Marian the Librarian</span></a> (NY, NY) -
    <span property="v:dtreviewed">1st April 2009</span>
  </div>
  <div>
    <b>This review is from:
      <a property="v:itemreviewed" about="https://www.amazon.com/American-Lion-Andrew-Jackson-White/dp/1400063256/" href="https://www.amazon.com/American-Lion-Andrew-Jackson-White/dp/1400063256/">American Lion: Andrew Jackson in the White House (Hardcover)</a>
    </b>
  </div>
  <div class="review" property="v:description">
    American Lion is a wonderfully crafted biography about an incredibly interesting and oft-overlooked American who helped shaped this country...
  </div>
</div>
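For comparison, here is a rough sketch of how the same review might be expressed with the hReview microformat rather than RDFa. The class names follow the hReview draft; treat this as illustrative, not as markup taken from Google's documentation:

<!-- hypothetical hReview equivalent of the RDFa example above -->
<div class="hreview">
  <div>
    79 of 98 people found the following review helpful:
  </div>
  <div>
    <span class="rating">5.0</span> out of 5 stars
    <span><b class="summary">American Biographer: Jon Meacham</b></span>
  </div>
  <div>
    <span class="reviewer vcard"><a class="fn url" href="https://www.amazon.com/gp/pdp/profile/A2G8PQ9HNUY6NA/">Marian the Librarian</a></span> (NY, NY) -
    <abbr class="dtreviewed" title="2009-04-01">1st April 2009</abbr>
  </div>
  <div class="item">
    <b>This review is from:
      <a class="fn url" href="https://www.amazon.com/American-Lion-Andrew-Jackson-White/dp/1400063256/">American Lion: Andrew Jackson in the White House (Hardcover)</a>
    </b>
  </div>
  <div class="description">
    American Lion is a wonderfully crafted biography about an incredibly interesting and oft-overlooked American who helped shaped this country...
  </div>
</div>

Either form carries the same information; the difference is whether the vocabulary comes from the data-vocabulary.org RDF schema or from the microformats class-name conventions.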
This initial release covers people and reviews, but Google will be slowly rolling out support for other RDFa vocabularies and microformats as they become available. For more information, see "Marking up content with RDFa" on the Google Webmaster/Site Owners Help site.
Analysis
While the Semantic Web has been around for years, it has yet to live up to the audacious promises that heralded its introduction to the world. What is the Semantic Web? Here's the definition from Wikipedia in case you need a refresher:
Humans are capable of using the Web to carry out tasks such as finding the Finnish word for "monkey", reserving a library book, and searching for a low price for a DVD. However, a computer cannot accomplish the same tasks without human direction because web pages are designed to be read by people, not machines. The semantic web is a vision of information that is understandable by computers, so that they can perform more of the tedious work involved in finding, sharing, and combining information on the web.
In short, the Semantic Web is about more "meaningful" content. We've perfected the art of scanning text and creating massive distributed indexes that produce highly relevant search results, but when you type in "Swine Flu" you are really still dealing with an inefficient indexing approach that doesn't know about the meaning of the text being parsed and indexed. Moving toward the Semantic Web will allow our searching technologies to become more intelligent and will set the stage for the next revolution in which computing systems can become more aware of the "meaningfulness of data".
We've already seen a shift toward "semantic search": Google has been augmenting search results with Google Maps and limited catalog searches, while more recent entries into the search market, such as Amazon's A9 and the yet-to-be-released Wolfram Alpha, differentiate themselves by the structured data and content that can be extracted from a search result. Until today, though, webmasters had no compelling reason to add RDFa or microformats to a site so that this semantic data could be mined; now Google has provided a real incentive for site designers. This shift toward semantic markup promises to disrupt existing SEO approaches, which are built atop the platform Google provides.
With Google in the game, it becomes an imperative: sites that want to be listed in search results with Rich Snippets will need to think about RDFa and microformats. Tools designed to present person and review data will now output RDFa and microformat markup compatible with Google by default. Blogging systems like Movable Type or WordPress, ecommerce tools like Magento, and content management tools like Alfresco and Drupal will, very quickly, adopt the formats supported by Google, and in five years' time we won't be able to imagine a web that isn't supported by semantic markup. We'll reminisce about the days when search results were produced by ad-hoc text processing technologies unsupported by meaningful data. The search results you are used to today will seem quaint in comparison to the rich, data-centric experience of the emerging Semantic Web.
"The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation. " - Tim Berners-Lee
UPDATE (3:52PM): We've had some responses about failing to mention Yahoo's SearchMonkey, which also supports RDFa and microformats. Google is certainly not the first search engine to support these formats, but with roughly 72% of the search market it has the most influence, and the clout to make people pay attention to RDFa and microformats.
Four short links: 12 May 2009
Storage Superfluity, Data-Driven Design, Twit-Mapping, and DIY Biohacking
by Nat Torkington | comments: 1
- LaCie 10TB Storage -- for what used to be the price of a good computer, you can now buy 10TB of storage. Storage on sale goes for less than $100 a terabyte. This obviously promotes collecting, hoarding, packratting, and the search technology necessary to find what you've stashed away. Analogies to be drawn between McMansions full of Chinese-made crap and terabyte drives full of downloaded crap. Do we need to keep it? Are there psychological consequences to clutter? (via gizmodo)
- In Defense of Data-Driven Design -- a thoughtful response to the "Google hates design!" hashmob formed around designer Douglas Bowman's departure from Google. When you’ve got the enormous traffic necessary to work out if miniscule changes have some minor, statistically significant effect, then sure, if you can do it quickly, why wouldn’t you? But that’s optimization that should happen at the very end of the design cycle. The cart goes after the horse. Put it the other way ‘round and you have a broken setup. It doesn’t mean horses suck. It doesn’t mean carts suck. Carts are not the enemy of horses. Optimization is not the enemy of design. Get them in the right order and you have something really useful. Get them the wrong way around and you have something broken.
- Just Landed: Processing + Twitter + Metacarta + Hidden Data -- Jer searched Twitter for "just landed in", used Metacarta to extract the locations mentioned, and then used Processing to build visualizations.
- Do It Yourself Genetic Sleuthing -- MIT is becoming a hotbed of DIY biologists. The 23-year-old MIT graduate uses tools that fit neatly next to her shoe rack. There is a vintage thermal cycler she uses to alternately heat and cool snippets of DNA, a high-voltage power supply scored on eBay, and chemicals stored in the freezer in a box that had once held vegan "bacon" strips. Aull is on a quirky journey of self-discovery for the genetics age, seeking the footprint of a disease that can be fatal but is easily treated if identified. But her quest also raises a broader question: If hobbyists working on computers in their garages can create companies such as Apple, could genetics follow suit? It's unclear what those DIY-started "genetics" companies would look like--the potential is there, but it has yet to meet the right problem. (via Andy Oram)
Just Landed - 36 Hours from blprnt on Vimeo.
What is the Right Amount of Swine Flu Coverage?
by Brady Forrest | @brady | comments: 5
Dr. Hans Rosling (Gapminder) has posted a short, but effective video comparing the coverage of Swine Flu to a more constant killer like Tuberculosis. He decries the fact that Swine flu has generated many orders of magnitude more coverage per death than Tuberculosis.
Dr. Rosling has a point. The media could be said to be disproportionately covering Swine Flu. However, how can the media not be expected to cover Swine Flu? It is new. It is spreading quickly. It is something that will potentially impact the daily lives of their readers (and themselves). Tuberculosis, while on the rise (see the chart to the right), is a known threat, is relatively contained, and has a vaccine.
Which should the media focus on? Which would you expect them to? While the media coverage may be overblown (and I questioned putting this post up at all), I think it is understandable to want to track this potential new threat closely.
[Tuberculosis Growth Chart via Wikipedia]
Updated: I realized that this post was incomplete without checking some trend data to see how people's interest compares. Here's the Wikirank comparison chart:
And the Google Trends comparison:
For "fun" I included H1N1 to see if the name change was working. Based on search volume it does not seem to have been effective use of re-marketing dollars.
It's clear that the news is driving a lot of interest in Swine Flu and that there is very little residual interest in Tuberculosis. Whether this is the tail wagging the dog remains to be seen.