Four short links: 1 June 2010
Legal XML, Big Social Data, Crowdsourcing Tips, Copyright Balkanization
by Nat Torkington | @gnat | comments: 1
- XML in Legislature/Parliament Environments (Sean McGrath) -- quite detailed background on the use of XML in legislation drafting systems, and the problems caused by convention in that world--page/line number citations, in particular. (Quick gloat: NZ's legislature management system is kick-ass, and soon we'll switch from print authoritative to digital authoritative)
- Large-Scale Social Media Analysis with Hadoop -- In this tutorial we will discuss the use of Hadoop for processing large-scale social data sets. We will first cover the map/reduce paradigm in general and subsequently discuss the particulars of Hadoop's implementation. We will then present several use cases for Hadoop in analyzing example data sets, examining the design and implementation of various algorithms with an emphasis on social network analysis. Accompanying data sets and code will be made available. (via atlamp on Delicious)
- Breaking Monotony with Meaning; Motivation in Crowdsourcing Markets (Crowdflower) -- This finding has important implications for those who employ labor in crowdsourcing markets. Companies and intermediaries should develop an understanding of what motivates the people who work on tasks. Employers must think beyond monetary incentives and consider how they can reward workers through non-monetary incentives such as by changing how workers perceive their task. Alienated workers are less likely to do work if they don’t know the context of the work they are doing and employers may find they can get more work done for the same wages simply by telling turkers why they are working.
- Balkanizing the Web -- The very absurdity of the global digital system is revealing itself. It created all the instruments for global access and, then, turned around and arbitrarily restricted its commercial use, paving the way for piracy. Think about it: our broadband networks now allow seamless streaming of films, TV shows, music and, soon, of a variety of multimedia products; we have created sophisticated transaction systems; we are getting extraordinary devices to enjoy all this; there is a growing English-speaking population that, for a significant part of it, is solvent and eager to buy this globalized culture and information. But guess what? Instead of a well-crafted, smoothly flowing distribution (and payment) system, we have these Cupertino, Seattle or Los Angeles-engineered restrictions. The U.S. insists on exporting harsh copyright penalties and restrictions, while not exporting license agreements and Fair Use, so the rest of the world gets very grumpy.
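For readers new to the map/reduce paradigm mentioned in the Hadoop tutorial link above, here is a minimal single-process sketch in plain Python. It is not Hadoop's API and not the tutorial's code: the function names and the (follower, followed) edge format are assumptions made up for illustration. It counts followers per account, a very simple kind of social network analysis.

```python
from collections import defaultdict


def map_phase(edge):
    """Map step: for one (follower, followed) edge, emit (followed, 1)."""
    follower, followed = edge
    yield (followed, 1)


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all the 1s emitted for an account into a count."""
    return key, sum(values)


def follower_counts(edges):
    """Run map, shuffle and reduce in sequence over a hypothetical edge list."""
    mapped = (pair for edge in edges for pair in map_phase(edge))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample data: each tuple reads "first follows second".
    sample_edges = [("ann", "bob"), ("cat", "bob"), ("bob", "ann")]
    print(follower_counts(sample_edges))
```

In a real cluster the map and reduce steps would run in parallel on many machines and the grouping would happen over the network; the sketch only shows the shape of the computation.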
tags: big data, copyright, Crowdflower, crowdsourcing, gov20, hadoop, social graph, xml
Four short links: 31 May 2010
Data and Context, Twitpic Hot or Not, Failing to Save Journalism, Flash in Javascript
by Nat Torkington | @gnat | comments: 0
- Transparency is Not Enough (danah boyd) -- we need people to not just have access to the data, but have access to the context surrounding the data. A very thoughtful talk from Gov 2.0 Expo about meaningful data release.
- Feed6 -- the latest from Rohit Khare is a sort of a "hot or not" for pictures posted to Twitter. Slightly addictive, while somewhat purposeless. Remarkable for how banal the "most popular" pictures are, it reminds me of the way Digg, Reddit, and other such sites trend towards the uninteresting and dissatisfying. Flickr's interestingness still remains one of the high points of user-curated notability. (via rabble on Twitter)
- Potential Policy Recommendations to Support the Reinvention of Journalism (PDF) -- FTC staff discussion document that floats a number of policy proposals around journalism: additional IP rights to defend against aggregators like Google News; protection of "hot news" facts; statutory limits to "fair use"; antitrust exemptions for cartel paywalls; and more. Jeff Jarvis hates it, but Alexander Howard found something to love in the proposal that the government "maximize the easy accessibility of government information" to help journalists find and investigate stories more easily. (via Jose Antonio Vargas)
- Smokescreen -- a Flash player in Javascript. See Simon Willison's explanation of how it works. Was created by the fantastic Chris Smoak, who was an early Google Maps hacker and built the BusMonster interface to Seattle public transport. (via Simon Willison)
tags: collective intelligence, data, Flash, gov2.0, javascript, journalism, programming, transparency, twitter
Putting Online Privacy in Perspective
by Tim O'Reilly | @timoreilly | comments: 33
When I wrote last week about the Facebook privacy flap, I was speaking out of the frustration that many technologists with a sense of perspective feel when we see uninformed media hysteria about the impact of new technology. (How many of you remember all the scare stories about the risks of using a credit card online from back in the mid-1990s, all of them ignoring the risks that consumers blithely took for granted in the offline world?)
Search engine expert Danny Sullivan vented some of this frustration on a private mailing list the other day. He gave me permission to reprint his remarks here. Danny was responding to a discussion of a Washington Post story about online privacy that started out with concerns about how information posted online is routinely being discovered and used against people in legal cases. (But even then, as you'll see, they left out a crucial part of the story.)
But then, the story goes on to link these cases with the general idea of data collection online.
In the 15 years since the World Wide Web brought the Internet to the masses, the most successful companies have been those that collect information about users and use it to sell things. Google, for instance, has confirmed that it keeps track of search queries sent from a particular IP address. (A spokesman said the company anonymizes IP addresses associated with search queries after nine months and cookies after 18 months.)
The problem with linking these two ideas is that the kind of data in the examples above is exactly the kind of data online companies need to collect in order to manage and improve their services. They are a lot like the data collected by your car - some of which, like your speed, is reported to you, and much of which is only reported to a mechanic via a diagnostic computer. That this kind of data is collected is not only no surprise to computer professionals, it's taught as basic practice!
Companies are loath to talk about what information they track, but internal compliance manuals for law enforcement for Facebook, Yahoo and Microsoft reviewed by The Washington Post show that their data collection is much more extensive than users might believe based on what they themselves can access.
For example: Microsoft tracks the Xbox LIVE start and end dates and times for game-playing and notes the game played, such as "SW: Jedi Academy." Yahoo keeps chat and instant messenger logs for 45 to 60 days and notes the time/date and IP address for when content is added or deleted to someone's profile or to its Flickr photo service.
Facebook's data collection is among the most detailed.
For every user id, Facebook keeps a log of the IP address that accessed the account, the date and time, and what exactly the user did -- clicking on an advertisement, looking at someone else's profile, posting a photo or sending a message to a friend, etc.
Danny was particularly put off by the hysteria about well-known facts, and by the scrutiny given to trivial pieces of online data collection while ignoring far more massive collection of data by more familiar means. He wrote:
Heh. Google has confirmed it tracks queries to a particular IP address. Like this wasn't something we knew for any search engine back in say, 1995. Or as if Google ever made a secret of it. Or more to the point, like tracking to an IP address is the issue versus the bigger issue of people having search histories (if people opt in) linked to real, personally identifiable accounts.
There are real privacy issues to be faced in the data collected by web companies. But they are part of a far bigger picture of how the world is changing. We need thoughtful understanding of what the real risks are, not finger pointing by the media (and even more frighteningly, by members of Congress) at companies that are easy targets because they make good political theater.
Heaven help us, though -- let's keep talking IP addresses and cookies. And let's ignore the fact that in virtually every court case where search queries have been notable as evidence, those queries were obtained ... wait for it ... off the person's own computer. Dude, when you're searching for ways to kill your wife, clear your browser history. Seriously, sad but true story.
I think the internet companies are indeed going to face more scrutiny, because they are big fat targets for lazy legislators who are loath to provide some real security over, I dunno, my credit card purchases?
I mean, can you imagine if when using Google and Yahoo and Bing, they reported all your searches to a "search bureau" that was pretty easy for anyone to access? Oh, and if you disagreed with something listed, well, good luck with getting that removed. But we tolerate that bull from our credit card companies.
My credit card company knows everything I've purchased, which is a pretty personal trail. That doesn't get "anonymized" after 9 months or 18 months. I have no idea at all what happens to it. I can't, like at Google, push a button and make it go poof, either. I don't think I have any rights over it at all.
My grocery store knows all the things I've purchased using my store discount card -- no idea who they hand that out to.
My telephone company keeps my phone records for I don't know how long. Imagine that. They know who I called and for how long.
But yeah, thank you Washington Post for focusing on the fact that Xbox Live keeps track of when I began and ended my game playing. Yeah, thanks for spending time talking about IP addresses. Could they have shoved even one paragraph of perspective in there? Could we get one of the privacy groups to maybe call for some better national standards protecting user information on and OFFline? If they are, I never hear the offline part.
Rant over. I've just seen this same obsession with IP addresses over years. Years and years, rather than focusing on the bigger and more important privacy issues on a broader perspective.
tags: facebook, google, privacy
California: There's an app for that
The state of California will partner with Microsoft, Google and Programmable Web to run an apps contest this summer.
by Alex Howard | @digiphile | comments: 3
Can California's budget-stricken government be improved through citizen engagement and civic developers? If a new application contest that launches this week bears digital fruit, there just might be an app for that.
The state of California will partner with Microsoft, Google and Programmable Web to run an apps contest this summer. "While California is one of the anchor supporters, it wouldn't be possible without the help of the Center for Digital Government, which brought together the framework for the contest to be held," said Adrian Farley, chief deputy CIO for the State of California, speaking in an interview Tuesday morning. "Without their sponsorship, this wouldn't have happened."
For those keeping score, that means two of the biggest technology companies in the world will be partnering with California to bring its open data to life. And the applications developed to create value from that open government data are likely to run on the iPhone, made by Apple, the company that brought the concept of a platform for applications to unprecedented heights.
Winners will be presented with their prizes at the "Best of the Web" awards in Hollywood in mid-September. The app contest will be coupled with a refreshed Data.CA.gov, which is now in soft launch. Data.CA.gov now has 400 major data sources, in formats including XLS, CSV and XML. State officials conservatively estimate that the site contains over 100 million records.
tags: app contest, gov 2.0, gov 20
Data and simplicity can build the government platform
Aneesh Chopra and Tim O'Reilly on government as a platform, open data and more.
by Mac Slocum | @macslocum | comments: 4
Tim O'Reilly and Aneesh Chopra, Federal Chief Technology Officer, had a wide-ranging discussion at this week's Gov 2.0 Expo in Washington D.C. As a relative novice in government matters, I was fortunate to be a fly on the wall during their chat. My understanding of the issues and opportunities at play increased exponentially during their 15-minute conversation. It was the highlight of the Expo for me.
The real head-turner was this: turning government into a platform is not as complicated or far off as I previously believed.
tags: aneesh chopra, gov 2.0, gov 20
Gov 2.0 Week in Review
by Alex Howard | @digiphile | comments: 1
Will Gov 2.0 be the next Internet boom? Yesterday's special report from Businessweek explored some of the entrepreneurs that are finding success applying government data and innovative technology to deliver better services. As SeeClickFix's Ben Berkowitz put it, government 2.0 is a way of "redistributing governance to the hands of citizens."
This week's review is necessarily heavy on the video, people and services that were featured in the past week's Gov 2.0 Expo in Washington, D.C. You can find aggregated coverage of "#g2e" at the Expo website and videos on YouTube. Make sure to catch Tim Berners-Lee on Data.gov.uk and open government, Tim O'Reilly's conversation with U.S. CTO Aneesh Chopra about government as a platform and life in the data cloud, Facebook, privacy and more for the PBS Newshour. Video is embedded below:
This past week also saw the launch of America Speaking Out on Microsoft Town Hall and the official registration of Law.gov. You can read more about Carl Malamud's vision for Law.gov at public.resource.org.
More news, exclusive interviews, video and government 2.0 resources after the jump.
tags: Expo, gov 2.0, government 2.0, open data, open government
Four short links: 28 May 2010
Understanding a Shuffle, Bias, Open Source a Success in Malaysia, and Guardian APIs
by Nat Torkington | @gnat | comments: 0
- The Intuition Behind the Fisher-Yates Shuffle -- this is a simple algorithm to randomize a list of things, but most people are initially puzzled that it is more efficient than a naive shuffling algorithm. This is a nice explanation of the logic behind it.
- Wikipedia and Inherent Open Source Bias -- a specific case of what I think of as the Firefly Principle: what happens on the Internet isn't representative of real life.
- Malaysian Public Sector Open Source Program -- the Malaysian government is a heavy and successful user of open source.
- Guardian's Platform Now Open for Business (GigaOm) -- elegant summary breakdown of services from the Guardian: metadata for free, content if you pay, custom APIs and applications if you pay more. I'm interested to see how well this works, given that the newspaper business is struggling to find a business model that values content.
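To make the Fisher-Yates link above concrete, here is a minimal Python sketch of the algorithm. It is an illustration, not code from the linked article, and the function name and the optional `rng` parameter are my own choices. The efficiency comes from doing a single pass with one swap per element, so the running time grows linearly with the length of the list.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Return a new list holding the elements of `items` in random order."""
    result = list(items)  # copy, so the caller's list is left untouched
    # Walk from the last index down to 1. At each step, pick a random
    # position j in the not-yet-fixed prefix [0, i] and swap it into slot i.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        result[i], result[j] = result[j], result[i]
    return result


if __name__ == "__main__":
    print(fisher_yates_shuffle(["a", "b", "c", "d", "e"]))
```

Passing a seeded `random.Random` instance as `rng` makes the shuffle repeatable, which is handy for testing. For everyday use, Python's built-in `random.shuffle` already does this job in place.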
tags: api, government, Guardian, open source, programming, wikipedia
Tim Berners-Lee on Data.gov.uk, open linked data and open standards
Gov 2.0 Expo Keynote Video and Exclusive Interview
by Alex Howard | @digiphile | comments: 1
Can you explain open linked data using a bag of chips? Tim Berners-Lee did precisely that yesterday in his keynote at the Gov 2.0 Expo. You can watch the video below:
After the jump, you can watch an exclusive interview with Berners-Lee exploring open linked data, how governments' open data efforts should be judged, and more.
tags: gov 2.0, government 2.0, open data
Four short links: 27 May 2010
Big Dumps, 3D Printing Atom Movers, Faceted Browsing, and Useful Math
by Nat Torkington | @gnat | comments: 0
- Socorro: Mozilla's Crash Reporting System (Laura Thomson) -- We receive on our peak day each week 2.5 million crash reports, and process 15% of those, for a total of 50 GB. In total, we receive around 320 GB each day. Moving to a Hadoop-based system in the future, as they're limited by database and filesystem storage.
- DIY Atomic Force Microscopy -- use a 3D printer to make the parts so you can build a cheap and simple AFM head suitable for single molecule force spectroscopy. (via Vik Olliver)
- Elastic Lists -- open-sourced ActionScript for a clever faceted browsing system. (via Flowing Data)
- The Most IMPORTANT Video You'll Ever See (YouTube) -- a math lesson everyone should have. (via Hacker News)
tags: 3D printers, business, math, mozilla, numbers, open source, science, ui, visualization
Venture capitalists do it. Why shouldn't philanthropists do it, too?
by Elizabeth Corcoran | comments: 17
Pitch a new idea to venture capitalists and the first question they’ll shoot back is: “Who else is in your space?”
If you can’t answer that question, go straight back to “Go” and don’t even dream of collecting $200.
VCs, of course, need to weigh competitive as well as potentially complementary efforts. But answering that question should help the entrepreneur, too. Entrepreneurs are most likely to help a field move forward if they build on the knowledge and the mistakes of the past rather than tripping down the well-trodden road.
Really compelling ideas draw multiple entrepreneurs (think of how the idea of social networking brought out Facebook, MySpace and a swarm of other startups). And sometimes ideas have to wait for the technology to catch up (picture phones and electronic books come to mind).
Smart startups, however, look for unique approaches even when tackling a problem that others are taking on or have taken on. And the fastest way to assess whether an approach is fresh or a rerun is to know what else is going on.
So what about the educational-technology space? We want to invent new approaches and ideas that will engage students, teachers (and even the occasional parent). But do we have good maps of what’s going on—not just in the for-profit venture sector but in the philanthropic sector, too?
Dale Dougherty, who’s no slouch when it comes to staying on top of the latest technology, summed up the problem well in his recent post:
“I wished the teams themselves were a better judge of their own proposals, and that they understood how their project advanced appropriate uses of technology in education. I wished that each of the applicants had been able to consult an evolving set of best practices for developing educational technology projects.
. They might help others avoid pitfalls and learn from failures.”
Our problems in education are too intense, funding is too thin and time too precious to take on duplicative efforts. We need to apply some of the same discriminating standards in our philanthropic Edu2.0 projects that we use in for-profit ones.
So what would be the relevant features of a topographical map of the educational-technology sector? Here’s one set of categories:
Projects aimed at:
• Improving instruction
• Individualized (adaptive) instruction
• Doing assessment
• Improving teacher practices
• Promoting project-based learning
• Improving transparency
• Bridging the school-home communications gap
• Improving school infrastructure
What would you add? What elements do you think would help people designing education-technology projects get a useful picture of what else is going on?
tags: edu 2.0, education, technology
Crisis Commons releases open source oil spill reporting
by Alex Howard | @digiphile | comments: 4
Crisis Commons has released a new open data initiative to enable response organizations to report from the oil spill. Oil Reporter allows response workers to capture and share data with the public as they respond to the Deepwater Horizon Oil Spill.
"The cool thing about the app is that the photos and information will be open to anyone to use," said Heather Blanchard, co-founder of Crisis Commons. "We want response organizations to use it. They can localize the app with their own logo and add data elements, thus expanding the API. They can be assigned a code so they can compare their data with the public. We believe the data with codes would be more of a verified set, as they would be response organizations and their volunteers using those codes."
tags: android, crisiscamp, crisiscommons, disaster tech, gov 2.0, iphone app, mobile systems, oil spill
Four short links: 26 May 2010
Reading Outlook in Open Source, Android Tablets, Websocket Editing, Jabber for Node.js
by Nat Torkington | @gnat | comments: 0
- PSTSDK -- Apache-licensed code from Microsoft to read Outlook files. Covered by Microsoft's Open Specification Promise not to assert related patents against users of this library.
- Cheap Android Tablet -- not multitouch, but only $136. Good for hacking with in the meantime. (via Hacker News)
- Real-Time Collaborative Editing with Websockets, node.js, and Redis -- uses Chrome's websockets alternative to Comet and other long-polling web connections.
- XMPP Library for Node.js -- I'm intrigued to see how quickly Node.js, the Javascript server environment, has taken off.
tags: apache, Microsoft, Node.js, open source, real-time, XMPP
O'Reilly Home | Privacy Policy © 2005 - 2010, O'Reilly Media, Inc. | (707) 827-7000 / (800) 998-9938
All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.