Operations
On the performance of clouds
A study ran cloud providers through four tests. Here's some of the results.
by Alistair Croll | @acroll

Public clouds are based on the economics of sharing. Cloud providers can charge less, and sell computing on an hourly basis without long-term contracts, because they're spreading costs and skills across many customers.
But a shared model means that your application is competing with other users' applications for scarce resources. The pact you're making with a public cloud, for better or worse, is that the advantages of elasticity and pay-as-you-go economics outweigh any problems you'll face.
Enterprises are skeptical because clouds force them to relinquish control over the underlying networks and architectures on which their applications run. Is performance acceptable? Will clouds be reliable? What's the tradeoff, particularly now that we know speed matters so much?
We (Bitcurrent) decided to find out. With the help of Webmetrics, we built four test applications: a small object, a large object, a million calculations, and a 500,000-row table scan. We ported the applications to five different clouds and monitored them for a month. We discovered that performance varies widely by test type and cloud.
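To give a concrete sense of what each probe measured, here is a minimal sketch of the kind of timing test involved, in Python. The endpoint URLs are placeholders invented for illustration; the actual study ran its tests through Webmetrics' monitoring service, not a script like this.

    import time
    import urllib.request

    # Hypothetical endpoints exposing the four test applications on one provider;
    # the real study monitored five clouds through Webmetrics for a month.
    TESTS = {
        "small_object": "https://example-cloud.test/small.gif",
        "large_object": "https://example-cloud.test/large.bin",
        "cpu_million_calcs": "https://example-cloud.test/calc?n=1000000",
        "table_scan_500k": "https://example-cloud.test/scan?rows=500000",
    }

    def time_request(url, timeout=60):
        """Return wall-clock seconds to fetch the URL once."""
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
        return time.monotonic() - start

    if __name__ == "__main__":
        for name, url in TESTS.items():
            print(name, round(time_request(url), 3), "seconds")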
tags: cloud computing, operations, ops, velocity2010
How Facebook satisfied a need for speed
Facebook boosted speed 2x. Director of engineering Robert Johnson explains how.
by Mac Slocum | @macslocum
Remember how Facebook used to lumber and strain? And have you noticed how it doesn't feel slow anymore? That's because the engineering team pulled off an impressive feat: an in-depth optimization and rewrite project made the site twice as fast.
Robert Johnson, Facebook's director of engineering and a speaker at the upcoming Velocity and OSCON conferences, discusses that project and its accompanying lessons learned below. Johnson's insights have broad application -- you don't need hundreds of millions of users to reap the rewards.
Facebook recently overhauled its platform to improve performance. How long did that process take to complete?
Robert Johnson: Making the site faster isn't something we're ever really done with, but we did make a big push the second half of last year. It took about a month of planning and six months of work to make the site twice as fast.
tags: facebook, operations, optimization, speed
Velocity Culture: Web Operations, DevOps, etc...
by Jesse Robbins | @jesserobbins
Velocity 2010 is happening on June 22-24 (right around the corner!). This year we've added a third track, Velocity Culture, dedicated to exploring what we've learned about how great teams and organizations work together to succeed at scale.
Web Operations, or WebOps, is what many of us have been calling these ideas for years. Recently the term "DevOps" has become a kind of rallying cry that is resonating with many, along with variations on Agile Operations. No matter what you call it, our experiences over the past decade taught us that Culture matters more than any tool or technology in building, adapting, and scaling the web.
Here is a small sample of the upcoming Velocity Culture sessions:
Ops Meta-Metrics: The Currency You Use to Pay For Change
Presenter: John Allspaw (Etsy.com)
Change to production environments can cause a good deal of stress and strain amongst development and operations teams. More and more organizations are seeing benefits from deploying small code changes more frequently, for stability and productivity reasons. But how can you figure out how much change is appropriate for your application or your culture?
A Day in the Life of Facebook Operations
Presenter: Tom Cook (Facebook)
Facebook's Technical Operations team has to balance the need for constant availability with a fast-moving and experimental engineering culture. We release code every day. Additionally, we are supporting exponential user growth while still managing an exceptionally high ratio of users per employee within engineering and operations.
tags: cloud, development, devops, operations, velocity10, velocity2010, velocityconf, web2.0, webops
White House moves Recovery.gov to Amazon's cloud
by Alex Howard | @digiphile
Earlier today in a blog post on WhiteHouse.gov, federal CIO Vivek Kundra announced that Recovery.gov would be moving to the cloud. The Recovery Accountability and Transparency Board's primary contractor, Smartronix, chose Amazon's Elastic Compute Cloud (EC2) to host the site. NASA has used EC2 for testing, but this will be the first time a government website -- a ".gov" -- has been hosted on Amazon's EC2. Kundra estimated the savings to the operational budget to run Recovery.gov at approximately $750,000, with $334,000 coming in 2010 alone.
"This is a production system," said Kundra, during a press briefing today. "That's a critical difference from other agencies that have been testing or piloting. We don't have data that's sensitive in nature or vital to national security here."
The recovery board plans to redirect more than $1 million in computer hardware and software that were being used to host Recovery.gov to fraud oversight operations. It's a move that Earl Devaney, chairman of the recovery board, said will help identify fraud, waste and abuse in the recovery program.
tags: amazon ec2, cloud computing, gov 2.0, gov 20, operations
Preparing for the realtime web
How the shift to realtime will affect the web (and why info overload is overblown).
by Mac Slocum | @macslocum
The dominance of static web pages -- and their accompanying user expectations and analytics -- is drawing to a close. Taking over: the links, notes, and updates that make up the realtime web. Ted Roden, author of O'Reilly's upcoming "Building the Realtime User Experience" and a creative technologist at the New York Times, discusses the realtime web's impact in the following Q&A.
Mac Slocum: Have we shifted from a website-centric model to a user-centric model?
Ted Roden: It used to be that a user sat down at a computer and checked Yahoo and CNN.com and whatever else. Now, users get their Yahoo updates via Twitter and pushed into Facebook, wherever they are. So rather than a user going to a specific website, websites are coming to where the users already are.
MS: Has push technology finally found its footing with realtime applications?
TR: I think so. It's not that push technology was a solution looking for a problem; it was only a partial solution. But now broadband penetration is wide, browsers are much more stable and resource-friendly, servers are cheap or free, and the development of realtime applications has gotten drastically easier. Using push technology without all of those other pieces in place was a lot more painful for everybody involved, and as a result designers and programmers stuck with standard web apps. Now we can take a lot of that for granted and think like desktop application designers.
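To make the "push" pattern concrete, here is a minimal long-polling client sketch in Python. The feed URL, the `since` parameter, and the response format are invented for illustration and don't correspond to any particular product's API; they just show the hold-the-connection-open pattern that realtime apps build on.

    import json
    import time
    import urllib.request

    # Hypothetical endpoint: the server holds the request open until new events
    # exist (or a timeout passes), so the client sees updates almost immediately.
    FEED_URL = "https://example.com/events?since={last_id}"

    def poll_forever(last_id=0):
        while True:
            try:
                url = FEED_URL.format(last_id=last_id)
                with urllib.request.urlopen(url, timeout=90) as resp:
                    events = json.loads(resp.read().decode("utf-8"))
            except Exception:
                time.sleep(5)           # back off on errors, then reconnect
                continue
            for event in events:        # assumed: a list of {"id": ..., "text": ...}
                print(event["text"])
                last_id = max(last_id, event["id"])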
tags: internet operating system, operations, push, real-time, web as platform, web2.0
What will the browser look like in five years?
Opera's Charles McCathieNevile on the web browser's near-term future.
by Mac Slocum | @macslocum
The web browser was just another application five years ago. A useful app, no doubt, but it played second fiddle to operating systems and productivity software.
That's no longer the case. Browsers have matured into multi-purpose tools that connect to the Internet (of course) and also grant access to a host of powerful online applications and services. Shut off your web connection for a few minutes and you'll quickly understand the browser's impact.
I got in touch with Charles McCathieNevile, Opera chief standards officer and a speaker at the upcoming Web 2.0 Expo, to discuss the current role of web browsers and their near-term future. He shares his predictions in the following Q&A.
MS: Will the web browser become the primary tool on computers?
Charles McCathieNevile: It isn't already? Email, document management, and device control are all done through the browser. Games are increasingly browser-based -- gaming is not the only area that has taken time to move to the web, but it is one of the biggest. There will always be applications that don't run in the browser, just because that is the way people are. But there is little reason for the browser not to be a primary application already.

MS: Will we even see a browser in five years? Or will it simply blend with the operating system?
CM: We will see it, but as its importance increases it will be the part people see of their interface to the computer. So it will be less noticeable. Five years ago people chose their computer for the OS, and the software available for that OS. Ten years ago much more so. Increasingly, the browser will be the thing people choose.
tags: browsers, internet operating system, opera, operations, web2.0
Big data analytics: From data scientists to business analysts
by Ben Lorica | @dliman

The growing popularity of Big Data management tools (Hadoop; MPP, real-time SQL, and NoSQL databases; and others (1)) means many more companies can handle large amounts of data. But how do companies analyze and mine their vast amounts of data? The cutting-edge (social) web companies employ teams of data scientists (2) who comb through data using different Hadoop interfaces and use custom analysis and visualization tools. Other companies integrate their MPP databases with familiar Business Intelligence tools. For companies that already have large amounts of data in Hadoop, there's room for even simpler tools that would allow business users to directly interact with Big Data.
A startup aims to expose Big Data to analysts charged with producing most routine reports. Datameer (3) has an interesting workflow model that enables spreadsheet users to quickly perform analytics with data in Hadoop. The Datameer Analytics Solution (DAS) assumes data sits in Hadoop (4), and from there a business analyst can rapidly load, transform, analyze, and visualize data:

Datameer's workflow uses the familiar spreadsheet interface as a data processing pipeline. Random samples are pulled into worksheets, where spreadsheet functions let analysts customize transformations, aggregations, and joins (5). Once their analytic models are created, results are computed via Hadoop's distributed processing technology (computations are initiated through a simple GUI). DAS contains over a hundred standard spreadsheet functions, NLP tools (tokenization, ngrams) for unstructured data, and basic charting tools.
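As a rough illustration of what a DAS-style worksheet formula ends up computing, here is a map-and-reduce-flavored group-by in plain Python. This is our own sketch with invented field names, not Datameer's implementation; in DAS the analyst would express the same aggregation with spreadsheet functions, and the computation would run as a Hadoop job over the full dataset.

    from collections import defaultdict

    # Invented sample rows; in DAS the analyst would see a random sample of the
    # Hadoop data in a worksheet instead of a Python list.
    rows = [
        {"country": "US", "amount": 120.0},
        {"country": "DE", "amount": 80.0},
        {"country": "US", "amount": 45.5},
    ]

    # Map phase: emit (key, value) pairs, like grouping on a worksheet column.
    pairs = [(r["country"], r["amount"]) for r in rows]

    # Reduce phase: aggregate per key, like a SUM over the grouped column.
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value

    print(dict(totals))   # {'US': 165.5, 'DE': 80.0}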
What's intriguing about DAS is that it opens up Big Data analysis to large sets of business users. Based on the private demo we saw last week, we think Datameer is off to a good start. While still in beta, DAS has been deployed by many customers and feedback from users has resulted in an intuitive and extremely useful analytic tool. With DAS, spreadsheet users will be able to perform Big Data analysis without assistance from their colleagues in IT.
The buzz over Big Data has so far centered largely on (new) data management tools (6). More recently, we're hearing from companies eager to tackle the next step: Big Data analysis ranging from routine reports to complex quantitative models. On one end, machine-learning algorithms and statistics are starting to appear as in-database analytic functions. At the other end, companies besides Datameer will develop Big Data analysis tools for average users (i.e., users who won't learn BI tools, SQL, Pig, Hive, and the like). If money isn't an issue, IBM's ambitious (and still immature) BigSheets project goes a step further than Datameer. It aims to provide data scientists with a single tool that can handle data acquisition (web crawlers), data management (Hadoop), text mining, and visualization (Many Eyes).
(1) Splunk is a tool that does both Big Data management and analytics.
(2) In fact, data scientist is a title that's increasingly used in companies like Yahoo!, Facebook, LinkedIn, Twitter, the NY Times, ...
(3) Datameer is a San Mateo startup, with some engineers in Germany. The company name is based on the German word for ocean.
(4) DAS can actually handle data from a variety of other sources, but for now, data from other sources gets pipelined to Hadoop in (near) real-time.
(5) Spreadsheet users should quickly be able to merge data sources with DAS: joins are done between worksheets and are intuitive. DAS is a single tool that can handle data manipulation, analysis, and visualization, thus reducing the need to switch back and forth between multiple tools.
(6) Along with the cool new data management tools, there are occasional stories of amazing custom analytics produced by data scientists.
tags: analytics, big data, data scientist, hadoop, mpp, nosql, operations
Twitter By The Numbers
by Ben Lorica | @dliman

I collected some interesting stats from today's presentations at Chirp. Over a thousand people attended the conference, and the numbers below attest to how vibrant the Twitter platform is. Today's announced API enhancements (e.g., user streams, annotations) will make the Twitter ecosystem even more interesting:
1. # of registered users: 105,779,710 (1,500% growth over the last three years.)
2. # of new sign-ups per day: ~ 300,000 (More recently, 60% of new accounts were from outside the U.S.)
3. # of new tweets per day: 55 million
4. # of unique daily visitors to the site twitter.com: ~ 180 million. (That's actually dwarfed by the traffic that flows through twitter's API -- 75% of traffic is through the API.)
5. # of API requests per day: 3 billion
6. # of registered apps: 100,000 (from 50,000 in Dec/2009)
7. # of search queries per day: 600 million
8. Twitter's instance of their recently open-sourced graph database (FlockDB) has 13 billion edges and handles 100,000 reads per second.
9. # of servers (1): "... in the hundreds"
10. BlackBerry's just-released Twitter app accounted for 7% of new sign-ups over the last few days
11. A NY Times story gets tweeted every 4 seconds.
(1) No surprise that Google has the most servers.
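For a sense of scale, here is a quick back-of-the-envelope conversion of a few of the daily totals above into average per-second rates (simple arithmetic on the reported figures, nothing more):

    SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

    daily_totals = {
        "tweets": 55_000_000,           # item 3
        "api_requests": 3_000_000_000,  # item 5
        "search_queries": 600_000_000,  # item 7
    }

    for name, per_day in daily_totals.items():
        print(f"{name}: ~{per_day / SECONDS_PER_DAY:,.0f} per second on average")
    # tweets: ~636/sec, api_requests: ~34,722/sec, search_queries: ~6,944/sec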
tags: big data, chirp, factoid, operations, twitter
Web operators are brain surgeons
Our increased reliance on web-based intelligence makes speed and reliability even more important.
by Alistair Croll | @acroll
As humans rely on the Internet for all aspects of our lives, our ability to think increasingly depends on fast, reliable applications. The web is our collective consciousness, which means web operators become the brain surgeons of our distributed nervous system.
Each technology we embrace makes us more and more reliant on the web. Armed with mobile phones, we forget phone numbers. Given personal email, we ditch our friends' postal addresses. With maps on our hips, we ignore the ones in our glovebox.
For much of the Western world, technology, culture, and society are indistinguishable. We're sneaking up on the hive mind, as the ubiquitous computing envisioned by Mark Weiser over 20 years ago becomes a reality. Today's web tells you what's interesting. It learns from your behavior. It shares, connects, and suggests. It's real-time and contextual. These connected systems augment humanity, and we rely on them more and more while realizing that dependency less and less. Twitter isn't a site; it's a message bus for humans.
The singularity is indeed near, and its grey matter is the web.
Now think what that means for those who make the web run smoothly. Take away our peripheral brains, and we're helpless. We'll suddenly be unable to do things we took for granted, much as a stroke victim loses the ability to speak. Take away our web, and we'll be unable to find our way, or translate text, or tap into the wisdom of crowds, or alert others to an emergency.
tags: operations, ops, singularity, velocity10, web operators
Brian Aker on post-Oracle MySQL
A deep look at Oracle's motivations and MySQL's future
by James Turner
Brian Aker parted ways with the mainstream MySQL release, and with Sun Microsystems, when Sun was acquired by Oracle. These days, Aker is working on Drizzle, one of several MySQL offshoot projects. In time for next week's MySQL Conference & Expo, Aker discussed a number of topics with us, including Oracle's motivations for buying Sun and the rise of NoSQL.
The key to the Sun acquisition? Hardware:
Brian Aker: I have my opinions, and they're based on what I see happening in the market. IBM has been moving their P Series systems into datacenter after datacenter, replacing Sun-based hardware. I believe that Oracle saw this and asked themselves "What is the next thing that IBM is going to do?" That's easy. IBM is going to start pushing DB2 and the rest of their software stack into those environments. Now whether or not they'll be successful, I don't know. I suspect once Oracle reflected on their own need for hardware to scale up on, they saw a need to dive into the hardware business. I'm betting that they looked at Apple's margins on hardware, and saw potential in doing the same with Sun's hardware business. I'm sure everything else Sun owned looked nice and scrumptious, but Oracle bought Sun for the hardware.
The relationship between Oracle and the MySQL Community:
BA: I think Oracle is still figuring things out as far as what they've acquired and who they've got. All of the interfacing I've done with them so far has been pretty friendly. In the world of Drizzle, we still make use of the InnoDB plugin, though we are transitioning to the embedded version. Everything there has gone along swimmingly. In the MySQL ecosystem you have MariaDB and the other distributions. They're doing the same things that Ubuntu did for Debian, which is that they're taking something that's there and creating a different sort of product around it. Essentially, though, it's still exactly the same product. I think some patches are flowing from MariaDB back into MySQL, or at least I've seen some notice of that. So for the moment it looks like everything's as friendly as it is going to be.
tags: databases, geodata, geolocation, interviews, mysql, nosql, operations, oracle, sun
Google Fiber and the FCC National Broadband Plan
by Mike Loukides | @mikeloukides

I've puzzled over Google's Fiber project ever since they announced it. It seemed too big, too hubristic (even for a company that's already big and has earned the right to hubris) -- and also not a business Google would want to be in. Providing the "last mile" of Internet service is a high-cost, low-payoff business that I'm glad I escaped (a friend and I seriously considered starting an ISP back in '92, until we said "How would we deal with customers?").
But the FCC's announcement of their plans to widen broadband Internet access in the US (the "National Broadband Strategy") puts Google Fiber in a new context. The FCC's plans are cast in terms of upgrading and expanding the network infrastructure. That's a familiar debate, and Google is a familiar participant. This is really just an extension of the "network neutrality" debate that has been going on with fits and starts over the past few years.
Google has been outspoken in their support for the idea that network carriers shouldn't discriminate between different kinds of traffic. The established Internet carriers have largely opposed network neutrality, arguing that they can't afford to build the kind of high-bandwidth networks required for delivering video and other media. While the debate over network neutrality has quieted down recently, the issues are still floating out there, and they are no less important. Will the networks of the next few decades be able to handle whatever kinds of traffic we want to throw at them?
In the context of network neutrality, and in the context of the FCC's still unannounced (and certain to be controversial) plans, Google Fiber is the trump card. It's often been said that the Internet routes around damage. Censorship is one form of damage; non-neutral networks are another. Which network would you choose? One that can't carry the traffic you want, or one that will? Let's get concrete: if you want video, would you choose a network that only delivers real-time video from providers who have paid additional bandwidth charges to your carrier? Google's core business is predicated upon the availability of richer and richer content on the net. If they can ensure that all the traffic that people want can be carried, they win; if they can't, if the carriers mediate what can and can't be carried, they lose. But Google Fiber ensures that our future networks will indeed be able to "route around damage", and makes what the other carriers do irrelevant. Google Fiber essentially tells the carriers "If you don't build the network we need, we will; you will either move with the times, or you won't survive."
Looked at this way, non-network-neutrality requires a weird kind of collusion. Deregulating the carriers by allowing them to charge premium prices for high-bandwidth services only works as long as all the carriers play the same game and all raise similar barriers against high-bandwidth traffic. As soon as one carrier says "Hey, we have a bigger vision; we're not going to put limits on what you want to do," the game is over. You'd be a fool not to use that carrier. You want live high-definition video conferencing? You got it. You want 3D video, requiring astronomical data rates? You want services we haven't imagined yet? You can get those too. AT&T and Verizon don't like it? Tough; it's a free market, and if you offer a non-competitive product, you lose. The problem with the entrenched carriers' vision is that, if you discriminate against high-bandwidth services, you'll kill those services off before they can even be invented.
The U.S. is facing huge problems with decaying infrastructure. At one time, we had the best highway system, the best phone system, the most reliable power grid; no longer. Public funding hasn't solved the problem; in these tea-party days, nobody's willing to pay the bills, and few people understand why the bills have to be as large as they are. (If you want some insight into the problems of decaying infrastructure, here's an op-ed piece on Pennsylvania's problems repairing its bridges.) Neither has the private sector, where short-term gain almost always wins over the long-term picture.
But decaying network infrastructure is a threat to Google's core business, and they aren't going to stand by idly. Even if they don't intend to become a carrier themselves, as Eric Schmidt has stated, they could easily change their minds if the other carriers don't keep up. There's nothing like competition (or even the threat of competition) to make the markets work.
We're looking at a rare conjunction. It's refreshing to see a large corporation talk about creating the infrastructure they need to prosper -- even if that means getting into a new kind of business. To rewrite the FCC Chairman's metaphor, it's as if GM and Ford were making plans to upgrade the highway system so they could sell better cars. It's an approach that's uniquely Googley; it's the infrastructure analog to releasing plugins that "fix" Internet Explorer for HTML5. "If it's broken and you won't fix it, we will." That's a good message for the carriers to hear. Likewise, it's refreshing to see the FCC, which has usually been a dull and lackluster agency, taking the lead in such a critical area. An analyst quoted by the Times says, "Once again, the FCC is putting the service providers on the spot." As well they should. A first-class communications network for all citizens is essential if the U.S. is going to be competitive in the coming decades. It's no surprise that Google and the FCC understand this, but I'm excited by their commitment to building it.
tags: broadband, carrier, fcc, google, infrastructure, network neutrality, operations
Code review redux (good news from GitHub)
by Marc Hedlund
I wrote in 2008 about Review Board, a code review package I'd tried and liked. Unfortunately our developers didn't like it as much as I did, and having learned my lesson (thanks, FogBugz), I declined to impose a tool choice on them. They chose Gerrit instead, which is more tightly bound to Git and has some nice features related to that (such as pushing to master from a button in the UI when the review is complete). The rest of the UI is very unpolished, but it has been getting progressively better.
Code review caused some frustrations for us -- the immediacy of "code, check in, ship" was lost, and it took some time for us to get to a new running pace. The benefits, though, were very obvious: we had dramatically fewer periods of downtime or instability after introducing reviews, and the overall quality and consistency of the code went up a lot. The mutual obligations created by asking for reviews changed the social dynamic for the better. People reported that peer pressure made them much more hesitant to check in something with poor test coverage or an embarrassing hack. While anything that slows the pace of development kills me, the net payoff was high. (See Cedric Beust's 2006 post, "Why code reviews are good for you," for a great discussion of code review models and tradeoffs.)
When looking for a tool we also considered GitHub:FI, the "behind the firewall" version of GitHub. It wasn't really up to par when compared with Review Board, Crucible, or Gerrit. But so many things about GitHub are so appealing that we all wanted it to work. That's why I was excited to see today's announcement from GitHub, "Introducing GitHub Compare View" -- especially this note at the bottom of the post:
Compare View is the first of many code review related features we plan to introduce this year. We'll be incorporating Compare View into other areas of the site and developing entirely new features with Compare View as a core component.
Great. That's awesome. Can't wait to see what's coming.
tags: operations
Recent Posts
- NoSQL conference coming to Boston | by Andy Oram on February 24, 2010
- Cyber warfare: don't inflate it, don't underestimate it | by Mac Slocum on February 11, 2010
- What's going on with OAuth? | by David Recordon on January 8, 2010
- Velocity 2010: Fast By Default | by Jesse Robbins on November 24, 2009
- More on how web performance impacts revenue... | by Jesse Robbins on October 1, 2009
- Four short links: 4 September 2009 | by Nat Torkington on September 4, 2009
- Is intimate personal information a toxic asset in client-cloud datacenters? | by Carl Hewitt on August 17, 2009
- John Adams on Fixing Twitter: Improving the Performance and Scalability of the World's Most Popular Micro-blogging Site | by Jesse Robbins on August 6, 2009
- Velocity and the Bottom Line | by Steve Souders on July 1, 2009
- Four short links: 29 June 2009 | by Nat Torkington on June 29, 2009
- Jonathan Heiliger on Web Performance, Operations, and Culture | by Jesse Robbins on June 24, 2009
- Announcing: Spike Night at Velocity | by Scott Ruthfield on June 19, 2009