Unlikely Group Working Happily Together To Solve Patent Problem
by Carl Malamud | @CarlMalamud | comments: 3
People following the issue of open sourcing the U.S. Patent Database might have been surprised to read an announcement in the official business opportunities web site of the U.S. Government: Synopsis for Public Data Dissemination Sole Source Contract to Google, Inc.
While the first reaction of many might be "OMG, WTF, how could they," this is actually good news, with an unlikely cast of characters working together including Google, Intellectual Ventures, and the Internet Archive.
In September, the Patent Office announced a rather strange "Request for Information" (RFI). Under this proposed scheme, the Patent Office would receive a substantial (upwards of $10 million!) donation of equipment from a vendor. In return, the vendor would get to be the official distributor of the patent database to the public, and would get to sell "value-added products." Among other things, the vendor would get access to the patents before the public does, allowing them to mine the database, and would be allowed to sell a variety of bulk products.
While the RFI makes a nod to public access, like all these Zero-Dollar deals the government cuts, there would be a lot of limits on what is "public" data as the vendor tries to recoup their investment by selling the so-called "value-added" products. Readers may remember a similar fiasco with the Government Accountability Office where the Federal Legislative Histories were given away to Thomson West and now even the U.S. Congress has to pay to access this material.
The patent database is no ordinary database. This is the only database specifically called out in the U.S. Constitution as being the responsibility of the U.S. Executive Branch to run! A lot of people think this Zero-Dollar deal the Patent Office is contemplating kind of stinks, and I'm really pleased to announce that a broad coalition has come together to make this data more broadly available immediately:
- Intellectual Ventures, the IP group founded by Nathan Myhrvold, is donating several terabytes of the back file to Public.Resource.Org, the Internet Archive, and a variety of other groups to make available to everybody.
- Google asked for permission to crawl the public application system (known as "PAIR"). The announcement by the Patent Office of a "sole source contract to Google" was the government's way of saying we have permission to crawl their system and bypass the CAPTCHAs. This is good news, because the PAIR system contains the "binders," which is all the material that supplements the basic applications and grants.
- The Internet Archive has set aside a boatload of disk drives to serve this data. In addition, Public.Resource.Org will provide the usual rsync and FTP, and we expect a variety of other groups to provide mirrors both for bulk access and end-user systems.
It goes without saying that Google, the Internet Archive, and Intellectual Ventures are 3 groups that don't often work together, and I think this illustrates the compelling public interest in making the patent database more broadly available. We announced this Section 8 Task Force in a letter to Congressman Mike Honda. And, we also sent in a FOIA request to the Patent Office, putting them on notice that we expect any responses to their RFI $0 boondoggle to be made available to the public, as required by law.
In the long term, the Patent Office just needs to fix their system instead of resorting to silly $0 deals. They have 600 staff in Information Technology and spend hundreds of millions of dollars. Surely, they can find a way to serve the public as part of that? Putting a lien on the Patent database in return for $10 million in hardware instead of fixing their '70s-era mainframes just doesn't make sense.
In the meantime, we should have the first 8 terabytes of data up pretty soon. Those interested in learning more about the issue are urged to consult the paper trail on our PTO page which includes letters to and from Congress, and pointers to the Patent Office procurement docs.
tags: gov2.0, open data, open source
Three Paradoxes of the Internet Age - Part Three
by Joshua-Michéle Ross | @jmichele | comments: 4
The myth of personal empowerment takes root amidst a massive loss of personal control.
Social technologies are cloaked in a rhetoric of liberation (customers are in control, the internet fosters democracy, social technologies propagate truth, etc.) that tends to obscure the fact that never before have we handed over so much personal information in exchange for so little in return.
As we move from the “web of information” to the “web of people” (aka the Social Web) the output of all of this social participation is massive dossiers on individual behavior (your social network profiles, photos, location, status updates, searches etc.) and social activity.
This loss of control over personal information is on a collision course with the law of unintended consequences: MIT’s Project Gaydar can spot your sexual preference by your social ties, Facebook checks are occurring at customs, and every quiz you take on Facebook delivers a shocking amount of personally identifiable information to third parties.
Amidst this barrage of good news for how much power we wield in the transaction of commerce one has to wonder if we are giving away something quite precious in the bargain.
Here are links to the previous posts in this series:
One: More access to information doesn’t bring people together, often it isolates us.
Two: Individual perception of increased choice can occur while the overall choice pool is getting smaller
What are other paradoxes of the Internet Age? What did I get wrong above?
tags: MIT, paradox, social web
Four short links: 6 November 2009
Barcode Scanning, Downloadable Community Book, Gov Hack Day, Android Kludges
by Nat Torkington | @gnat | comments: 2
- Red Laser -- "impossibly accurate barcode scanning". Uses Google Product Search to identify products that you scan using the camera on the phone. I remember Rael and I talking to Jeff Bezos about this years ago, before camphones had the resolution to decode barcodes. The future is here and it's $1.99 on the App Store ... (via Ed Corkery on Twitter)
- The Art of Community For Free Download -- Jono Bacon's O'Reilly book on community management now available for free download (still available for purchase!).
- Gov Hack -- Australian government ran a hack day with their open data, this is their writeup.
- Android Mythbusters -- slides for talk by Matt Porter at Embedded Linux Conference Europe. A (long) catalogue of the kludges in Android.
tags: android, augmented reality, book related, community, gov2.0, hacking, linux
Three Paradoxes of the Internet Age - Part Two
by Joshua-Michéle Ross | @jmichele | comments: 8
Individual perception of increased choice can occur while the overall choice pool is getting smaller
This gem from Whimsley makes the point - with extensive statistical modeling supporting the argument - that our algorithm-obsessed, long tail merchants are actually depleting the overall choice pool despite the fact that as individuals we may be experiencing a sense of more choice through recommendations engines...
Online merchants such as Amazon, iTunes and Netflix may stock more items than your local book, CD, or video store, but they are no friend to "niche culture". Internet sharing mechanisms such as YouTube and Google PageRank, which distil the clicks of millions of people into recommendations, may also be promoting an online monoculture. Even word of mouth recommendations such as blogging links may exert a homogenizing pressure and lead to an online culture that is less democratic and less equitable, than offline culture.
In short, the long tail has gangrene at its extremity - the niche. More disarming is the conclusion that it isn't just the output of our recommendation algorithms that is leading to what the author calls "monopoly populism" and the end of niche culture:
"The recommender "system" could be anything that tends to build on its own popularity, including word of mouth...Our online experiences are heavily correlated, and we end up with monopoly populism...A "niche", remember, is a protected and hidden recess or cranny, not just another row in a big database. Ecological niches need protection from the surrounding harsh environment if they are to thrive. Simply putting lots of music into a single online iTunes store is no recipe for a broad, niche-friendly culture."
The network effects that so characterize Internet services are a positive feedback loop where the winners take all (or most). The issue isn't what they bring to the table, it is what they are leaving behind.
Here is a link to yesterday's post: More access to information doesn’t bring people together, often it isolates us.
Tomorrow: The myth of personal empowerment takes root amidst a massive loss of personal control.
tags: google, itunes, netflix, page rank, paradox, recommendations
Four short links: 5 November 2009
Heat Maps in R, EC2 Blackhat Tricks, Snickersome Unicode, and Decoding Statistics
by Nat Torkington | @gnat | comments: 0
- Heat Maps in R -- We used financial data here because it's easier to access than the airline data, but it's actually a pretty interesting way of looking at a financial time series. Weekend and holiday effects are a bit more obvious, and it's a bit like being able to see the daily, weekly, monthly and yearly closes all at once (by scanning your eye over the calendar in different directions). Includes source code. (via migurski on Delicious)
- BlackHat and EC2 -- Theft of resources is the red-headed step-child of attack classes and doesn't get much attention, but on cloud platforms where resources are shared amongst many users these attacks can have a very real impact. With this in mind, we wanted to show how EC2 was vulnerable to a number of resource theft attacks and the videos below demonstrate three separate attacks against EC2 that permit an attacker to boot up massive numbers of machines, steal computing time/bandwidth from other users and steal paid-for AMIs. (via straup on Delicious)
- Funny Characters in Unicode -- I never get tired of the wacky stuff in Unicode. I love the thought of a Unicode committee somewhere arguing passionately about the number of buttons on the snowman .... (via Hacker News)
- Statistics to English Translation -- The terms sensitivity and specificity generally refer to diagnostic or screening procedures, such as an HIV or allergy tests. The sensitivity of a test is its true positive rate; the specificity is its true negative rate, although it can be more intuitive to think of specificity as the complement of the false positive rate. This matters. Bandying around numbers with misleading labels, or misinterpreting numbers that have a precise and defined meaning, does not further understanding. (Said 78.4% of statisticians, with a 20% confidence factor probability of false positives)
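The sensitivity and specificity definitions quoted in the last link reduce to simple ratios over the four cells of a confusion matrix. Here is a minimal Python sketch of that arithmetic. The function names and the screening counts are hypothetical, made up for illustration, and do not come from the linked article.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """True positive rate: share of actual positives the test catches."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """True negative rate: share of actual negatives the test clears."""
    return true_neg / (true_neg + false_pos)


def false_positive_rate(true_neg: int, false_pos: int) -> float:
    """Share of actual negatives the test wrongly flags as positive."""
    return false_pos / (true_neg + false_pos)


# Hypothetical screening results: 100 people with the condition, 1000 without.
tp, fn = 90, 10
tn, fp = 950, 50

print(sensitivity(tp, fn))          # 0.9
print(specificity(tn, fp))          # 0.95
print(false_positive_rate(tn, fp))  # 0.05
```

Specificity and the false positive rate share a denominator (everyone who does not have the condition), which is why one is the complement of the other.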
tags: amazon, cloud, ec2, language, R, security, statistics, visualization
Twitter Approval Matrix - October 2009
by Mike Hendrickson | @mikehatora | comments: 6
This is the fifth post for the Twitter Approval Matrix, with data that spanned the month of October and came from different sources such as tweetsentiment.com, scraped archives, and observations. This month I received help from Joe Fernandez, the CEO of Klout.com. Joe continues to provide some great 'hard' data that allowed me to better place more items on the grid this month.
A quick refresher: the matrix shows four quadrants used to describe trends found on Twitter. The Y-axis is partly analytical and shows popularity (mostly through scraped numbers) or perceived popularity (in the future nominated by you). The other part of the grid is more curated and subjective. The X-axis has been plotted based on my personal opinion. You may agree or disagree with my placements and that's all good to me. After all, this is partially about taste and numbers. The matrix and plots do not represent a thorough analytical treatment, but rather a view of the trends that could be found in data sources allowing me to plot with some sense of relevance.
For this post, I've limited the data and activity to the month of October. Again, I'll continue with this project as long as I get enough feedback/help. So, if you are interested in contributing, you can comment here, or read the original post to figure out the best way for you to submit your plots.
I hope you enjoy this and see it as a potentially useful tool to monitor trends that your fellow readers are both contributing to and tracking.
tags: social web, twitter
Three Paradoxes of the Internet Age - Part One
by Joshua-Michéle Ross | @jmichele | comments: 15
In the circles that I travel, the Internet is often breathlessly embraced as the herald of all things good; the bringer of increased choice, personal empowerment, social harmony...and the list goes on. And yet, as with any powerful technology, the truth of its consequences eludes such a singular and happy narrative.
Here is the first of three paradoxes of the Internet Age. I would love to see Radar readers point out others.
More access to information doesn’t bring people together, often it isolates us.
Elizabeth Kolbert has a piece in this week’s New Yorker reviewing Cass Sunstein’s new book, “On Rumors: How Falsehoods Spread, Why We Believe Them, What Can Be Done.” In the review she lays out the concept of "group polarization":
People’s tendency to become more extreme after speaking with like-minded others has become known as “group polarization,” and it has been documented in dozens of other experiments. In one, feminists who spoke with other feminists became more adamant in their feminism. In a second, opponents of same-sex marriage became even more opposed to the idea, while proponents shifted further in favor. In a third, doves who were grouped with other doves became more dovish still.
The Internet is becoming a vast petri dish for the group polarization phenomenon. As Sunstein puts it, “The most striking power provided by emerging technologies” is the “growing power of consumers to ‘filter’ what they see.” (Thanks to Jim Stogdill for surfacing this link via email)
tags: long tail, paradox, ratings, recommendations, reviews, social web
Four short links: 4 November 2009
Electronics Hacking FAQs, Speech-To-Text Democracy, Open Source Column Database, Massive Online Analysis
by Nat Torkington | @gnat | comments: 1
- ChipHacker -- collaborative FAQ site for electronics hacking. Based on the same StackExchange software as RedMonk's FOSS FAQ for open source software.
- Democracy Live -- BBC launch searchable coverage of parliamentary discussion, using speech-to-text. One aspect we're particularly proud of is that we've managed to deliver good results for speech-to-text in Welsh, which, we're told, is unique. I think of this as the start of a They Work For You for video coverage. I'd love to be able to scale this to local government coverage, which is disappearing as local newspapers turn into delivery mechanisms for real estate advertisements.
- InfiniDB: Open Source Column Database -- hooks into MySQL, uses MySQL for SQL parsing, security, etc. The commercial enterprise version has multi-server support (parallel scale-out). (via Brian Aker)
- Massive Online Analysis -- MOA is a framework for data stream mining. Includes tools for evaluation and a collection of machine learning algorithms. Related to the WEKA project, also written in Java, while scaling to more demanding problems. (via joshua on Delicious)
tags: big data, collective intelligence, databases, democracy, gov2.0, hardware, maker, open source
Following Lists
The Twitter Lists Feature is a Game Changer
by Brian Ahier | @ahier | comments: 6
Guest blogger Brian Ahier is a City Councilor in The Dalles, Oregon, and he works in Information Systems at Mid-Columbia Medical Center. He is passionate about healthcare reform, government 2.0 and health IT.
One of the interesting things about the new Lists feature is the expansion of the asymmetrical nature of relationships on Twitter. I use Twitter Lists to control the flow of the fire hose of my data streams into manageable list streams. But another important aspect is the ability to create lists composed of accounts I don't follow. This is radically changing relationships and the way we build communities on Twitter. As Mark Drapeau pointed out it will become more important which lists you are on than who is following you. You could actually follow no one at all and have lists for each group of accounts you want to follow. You can create listreams to follow rather than following individual people.
This is also going to add a new dimension to some of the social aspects of twitter. People will share lists, recommend lists, and get bent out of shape when they are not included on certain lists.
But you don't have to go out and find all the accounts to create great lists. You can subscribe to lists created by Twitter superstars such as Robert Scoble's fascinating lists, Tim O’Reilly's great resources, or Muck Rack’s list of journalists lists. A new service called Listorious will point you to some useful lists to follow. I have created some lists and my best so far contains most of the Twitter healthcare community. Companies will create lists of employees like the Twitter employees list and you can follow their tweets without having to follow every employee.
Twitter Lists also eventually means the death of the Suggested User List. At the Web 2.0 Summit Tim O'Reilly asked Ev Williams if it wasn't time to move past SUL. Tim admitted that he has benefited from being on the list, but implied that it did not reflect actual authority and suggested it may be time for it to die. Ev Williams said, "It has been time to retire suggested user lists for a while... once we get lists rolled out we can retire the Suggested User List and make that, as we like to say, much more Twittery and democratic."
Since you can follow other people’s lists, others can follow yours, and you do not have to actually follow an account on your list, Twitter Lists is a game changer.
tags: social media, twitter
Games Top the Charts in the iPhone and Android App Markets
by Ben Lorica | @dliman | comments: 2
While it might be true that the number of Book apps is growing at a faster rate, Games continue to dominate the list of popular U.S. iTunes Apps. Games accounted for about a fifth of all iTunes apps over the past week, but the category continued to have a disproportionate share of the Top 100 charts, accounting for 52% of the Top Grossing, 56% of the Top Paid, and 50% of the Top Free apps:

Since most Book apps are actually individual e-books, the Gaming category would have a hard time keeping up with the ever increasing number of Books. Once publishers figured out how to turn their titles into iPhone apps, the number of Book apps started growing faster than Games. Nevertheless Games continue to rule the Top 100 charts.
A similar story is playing out on the Android platform: the most popular Android apps are primarily Games. (In the Android taxonomy, most Books are in the Reference category.)

Returning to the top iPhone apps, the price of the Top Grossing apps stabilized somewhat last week. Except for the top decile (rank 1 through 10) for which the median price was about $7, the median price across the other deciles was around $5.
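For readers who want to reproduce this kind of summary on their own chart data, here is a minimal Python sketch of a median-price-by-decile calculation. The function name and the 20-app price list are hypothetical and only illustrate the grouping; they are not the data behind this post.

```python
from statistics import median


def median_price_by_decile(prices_by_rank, decile_size=10):
    """Group a rank-ordered price list into blocks of `decile_size`
    (ranks 1-10, 11-20, ...) and return the median price of each block.

    prices_by_rank[0] is the price of the rank-1 app.
    """
    return [
        median(prices_by_rank[start:start + decile_size])
        for start in range(0, len(prices_by_rank), decile_size)
    ]


# Hypothetical prices for the first 20 ranks of a chart (illustration only).
prices = [
    9.99, 6.99, 6.99, 7.99, 4.99, 6.99, 9.99, 6.99, 4.99, 7.99,  # ranks 1-10
    4.99, 2.99, 4.99, 5.99, 4.99, 0.99, 4.99, 6.99, 4.99, 2.99,  # ranks 11-20
]

print(median_price_by_decile(prices))  # [6.99, 4.99]
```

The median is used rather than the mean so that a single expensive app in a block does not drag the summary upward.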

Over the last week, the Top Paid Games were slightly more expensive than apps that made the overall Top 100 Paid list. iPhone Game developers will tell you that (visually) compelling and engaging iPhone Games are far from trivial to design and market. So it's no surprise that the creators of the most popular Games are starting to charge a little more for their software.
Note: Data for this post was for the week ending 11/1/2009.
Note on designing and marketing iPhone Games: First, designing for such a small screen poses a major challenge. Secondly, the sheer number of Game apps (close to 20K last week) makes it hard to create something that turns into a long-running top-seller.
tags: android, iphone, mobile, platform, smartphone
Four short links: 3 November 2009
Electoral Cryptography, Dataless Airport Security, Visualising Transport Data, Mathematically Insecure Social Asymmetry
by Nat Torkington | @gnat | comments: 0
- First Test for Election Cryptography (MIT Technology Review) -- The first government election to use a new cryptographic scheme that lets both voters and auditors check that votes were cast and recorded accurately will be held tomorrow in Takoma Park, MD. Founder of the company behind the technology is David Chaum, who ran the first electronic currency company in the 90s. That was ahead of its time (Internet faced a credibility problem, not a convenience problem), but his timing for this seems spot-on. (via timoreilly on Twitter)
- Do I Have The Right To Refuse This Search? -- a former police officer questions the efficacy of TSA screenings and is doubly worried by the lack of data collected. For years in policing, we relied on random patrols to curb crime. We relied upon this “strategy” until someone went out and captured some data, and did a study that demonstrated conclusively that random patrols do not work (Kansas City Study). As police have employed other types of “random” interventions, as in DWI checkpoints, they have had to develop policies, procedures and training to ensure that the “random” nature of these intrusions is truly random. Whether every car gets checked, or every tenth car, police must demonstrate that they have attempted to eliminate the effects of active and passive discrimination when using “random” strategies. No such accountability currently exists at TSA. A trend I see lately is a return to quantitative decision making, reality-based data-directed system interventions. (via BoingBoing)
- Visualising Transport Data -- It can be hard to make meaningful information from huge amounts of data, a graph and a table doesn't always communicate all it should do. We have been working hard on technology to visualise big datasets into compelling stories that humans can understand. We were really pleased with what we came up with in just one and a half days. Like many places, the UK data.gov ran a dev camp to jumpstart people using their data. These appear to be successful, but I'm not aware of studies into the longterm effects nor the "value" of different types of developers.
- Why Your Friends Have More Friends Than You Do -- there's a numerical optical illusion at work here: count your friends, then ask them to count their friends. If you average the friend counts of your peers, it'll probably be higher than your friend count. The reason for this is also why (on average!) your sexual partners seem to have had more sexual partners than you, and why previous generations seem more fecund than current generations. It's because connectors (with large numbers of friends) distort the average, so unless you're the connector (and if you're reading this, you might well be!) the average will be bigger than a normal person's friend count. Left unmentioned is what kind of person would count the number of friends they have, then ask their friends for their counts .... (via Hacker News)
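The friend-count effect in the last link is easy to check on a toy network. Here is a minimal Python sketch, assuming a made-up friendship graph with one "connector" and four people who know only the connector. The names and function names are hypothetical, not taken from the linked article.

```python
def build_friend_sets(friendships):
    """Turn a list of (person, person) pairs into {person: set of friends}.
    Friendship is treated as mutual (an undirected graph)."""
    friends = {}
    for a, b in friendships:
        friends.setdefault(a, set()).add(b)
        friends.setdefault(b, set()).add(a)
    return friends


def average_friend_count(friends):
    """Average number of friends per person."""
    return sum(len(f) for f in friends.values()) / len(friends)


def average_friends_of_friends(friends):
    """For each person, average their friends' friend counts,
    then average that figure over everyone."""
    per_person = [
        sum(len(friends[f]) for f in own) / len(own)
        for own in friends.values()
    ]
    return sum(per_person) / len(per_person)


# Hypothetical network: one connector with four friends who know only them.
friendships = [("hub", "ann"), ("hub", "bob"), ("hub", "cat"), ("hub", "dan")]
friends = build_friend_sets(friendships)

print(average_friend_count(friends))        # 1.6
print(average_friends_of_friends(friends))  # 3.4
```

Four of the five people see a friend (the connector) with four friends while having only one themselves, so the connector is counted once in the first average but four times in the second.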
Four short links: 2 November 2009
Inside Botnets, Creating Choropleths, Privacy Simplified, Massively Machiavellian Online Social Gaming
by Nat Torkington | @gnat | comments: 1
- Your Botnet is My Botnet (PDF) -- 2008 USENIX Security paper analysing >70G of data gathered when security researchers hijacked the Torpig botnet. A major limitation of analyzing a botnet from the inside is the limited view. Most current botnets use stripped-down IRC or HTTP servers as their command and control channels, and it is not possible to make reliable statements about other bots. In particular, it is difficult to determine the size of the botnet or the amount and nature of the sensitive data that is stolen. One way to overcome this limitation is to “hijack” the entire botnet, typically by seizing control of the C&C channel. [...] As a result, whenever a bot resolves a domain (or URL) to connect to its C&C server, the connection is redirected or sinkholed. This provides the defender with a complete view of all IPs that attempt to connect to the C&C server as well as interesting information that the bots might send.
- cartographer.js -- build thematic maps using Google Maps. To be precise, you can build a choropleth, which is my word of the day. (via Simon Willison)
- Making Privacy Policies Not Suck (Aza Raskin) -- interested in developing a standard set of privacy policy components the way that Creative Commons has created a standard set of copyright license components.
- Scamville: The Social Gaming Ecosystem of Hell (TechCrunch) -- many of those games on Facebook that your friends play are evil. To get in-game money or objects, they'll let you take a survey but at the end you're signed up for crap you never wanted. Related: this article on monetizing social networks which talks about social gaming's business model.
tags: creative commons, gaming, google maps, mapping, privacy, research, security, social