Jesse Robbins

Jesse Robbins is passionate about Infrastructure, Emergency Management, and technology that helps people be safe, happy, and free. He serves as co-chair of the Velocity Performance & Operations Conference and is part of the O’Reilly Radar. Jesse currently advises companies in Seattle and San Francisco. He previously worked at Amazon.com where his title was “Master of Disaster” and where he was responsible for Website Availability. Jesse is a volunteer Firefighter/EMT & Emergency Manager, and led a task force deployed in Operation Hurricane Katrina.
Thu, Feb 5, 2009
Understanding Web Operations Culture - the Graph & Data Obsession
by Jesse Robbins | comments: 6
We’re quite addicted to data pr0n here at Flickr. We’ve got graphs for pretty much everything, and add graphs all of the time.
-John Allspaw, Operations Engineering Manager at Flickr & author of The Art of Capacity Planning
One of the most interesting parts of running a large website is watching unrelated events affect user traffic in aggregate. Web traffic is something that companies typically keep very secret, and often the only time engineers can talk about it is late at night, at a bar, and very much off the record.
There are many good reasons for keeping this kind of information confidential, particularly for publicly traded companies with complicated disclosure requirements. There are also downsides, the biggest being that it is difficult for peers to learn from each other and compare notes.
John Allspaw recently created a WebOps Visualizations group on Flickr for sharing these kinds of graphs with the confidential information removed. Here’s an example of a traffic drop seen both by Flickr & by Last.FM that coincided with President Obama’s inauguration.

Similar traffic drop on Last.FM seen on the right
Google saw a similar drop as well
Was it because everybody went to Twitter?
Besides being an interesting story, sharing these kinds of graphs helps people build better monitoring tools and processes. As just one example: How should the WebOps team respond to this dip in traffic? Is it an outage? The inauguration was a very well-known event, so it’s easy to explain the drop in traffic… but what happens when a similar drop in traffic occurs? Should the WebOps team be looking at CNN (or trends on Twitter) along with everything else?
How do you tell when that unexpected 10% drop in traffic is really just people with something more important to do than browse your site?
(Note: Updated since original posting to add Google & Twitter graphs and annotations, and to switch the Last.FM graphic with an annotated one after I got permission.)
tags: big data, culture, enterprise 2.0, flickr, infovis, john allspaw, last.fm, metrics, monitoring, operations, velocity, velocity09, web2.0, webops
Sat, Nov 29, 2008
Data Center Power Efficiency
by Jesse Robbins | comments: 8
James Hamilton is one of the smartest and most accomplished engineers I know. He now leads Microsoft's Data Center Futures Team, and has been pushing the opportunities in data center efficiency and internet scale services both inside & outside Microsoft. His most recent post explores misconceptions about the Cost of Power in Large-Scale Data Centers:
I’m not sure how many times I’ve read or been told that power is the number one cost in a modern mega-data center, but it has been a frequent refrain. And, like many stories that get told and retold, there is an element of truth to it. Power is absolutely the fastest growing operational cost of a high-scale service. Except for server hardware costs, power and costs functionally related to power usually do dominate.
However, it turns out that power alone isn’t anywhere close to the most significant cost. Let’s look at this more deeply. If you amortize power distribution and cooling systems infrastructure over 15 years and amortize server costs over 3 years, you can get a fair comparative picture of how server costs compare to infrastructure (power distribution and cooling). But how to compare the capital costs of servers, and of power and cooling infrastructure, with that monthly bill for power?
The approach I took is to convert everything into a monthly charge. [...]
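To make the "convert everything into a monthly charge" idea concrete, here is a minimal sketch. It assumes simple straight-line amortization and ignores the cost of money, which a real comparison would probably include. Every dollar figure is a made-up placeholder and not one of James's numbers; only the 3-year and 15-year periods come from the excerpt above.

```python
def monthly_charge(capital_cost: float, years: int) -> float:
    """Straight-line amortization: spread a capital cost evenly over its life."""
    return capital_cost / (years * 12)

# All dollar figures below are hypothetical placeholders for illustration.
servers = monthly_charge(36_000_000, years=3)           # servers, 3-year life
infrastructure = monthly_charge(90_000_000, years=15)   # power distribution + cooling, 15-year life
power_bill = 400_000                                     # utility bill, already a monthly figure

costs = {"servers": servers, "infrastructure": infrastructure, "power": power_bill}
total = sum(costs.values())

# Once everything is expressed per month, the shares are directly comparable.
for name, cost in costs.items():
    print(f"{name:>14}: ${cost:>12,.0f}  ({cost / total:.0%})")
```

With everything on the same monthly basis, the question of which cost dominates becomes a simple comparison of shares.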
tags: cloud computing, energy, james hamilton, microsoft, operations, performance, platforms, utilities, utility computing, velocity, velocity09, web2.0
Thu, Nov 20, 2008
Velocity 2009: Themes, ideas, and call for participation...
by Jesse Robbins | comments: 0
Last year's Velocity conference was an incredible success. We expected around 400 people and we ended up maxing out the facility with over 600. This year we're moving the conference to a bigger space and extending it to 3 days to accommodate workshops and longer sessions.
Velocity 2009 will be on June 22-24, 2009 at the Fairmont Hotel in San Jose, CA.
This year's conference will be especially important. I've said many times that Web Performance and Operations is critical to the success of every company that depends on the web. In the current economic situation, it's becoming a matter of survival. The competitive advantage comes from the ability to do two things:
Our Velocity 2009 mantra is "Fast, Scalable, Efficient, Available", a slight change from last year. (We've replaced "Resilient" with "Efficient" to make the focus clear.)
I'm excited to announce that joining Steve Souders & me on this year's program committee are John Allspaw, Artur Bergman, Scott Ruthfield, Eric Schurman, and Mandi Walls. We've already started working on the program, and have just opened the Call for Participation.
tags: artur bergman, conferences, Eric Schurman, John Allspaw, mandi walls, operations, performance, scott ruthfield, steve souders, velocity, velocity09, web2.0, webops
Mon, Nov 3, 2008
Major milestone for ProgrammableWeb & "The Web as Platform"
by Jesse Robbins | comments: 2
Last week marked an important milestone for the "Web as Platform" as the 1,000th API was added to the ProgrammableWeb registry. John Musser (see: Web2.0 Report) started tracking the first few web service APIs back in 2005.
Congratulations!
How do these 1000 APIs break down by type? The following chart, derived from our database, shows the top 15 sectors or markets with the greatest number of competing API providers. As you can see there are already 71 mapping-related APIs alone.
tags: apis, mashup, programmable web, web 2.0, web as platform, web2.0, web2summit
Sat, Nov 1, 2008
DisasterTech: "Decisions for Heroes"
by Jesse Robbins | comments: 2
One of the most interesting DisasterTech projects I've been following is "Decisions for Heroes" led by developer and Irish Coast Guard volunteer Robin Blandford.
Decisions is like Basecamp for volunteer Search & Rescue teams. The focus is on providing "just enough" process to complement the real-world workflow of a rescue team, without unnecessary complexity. One of Robin's design goals is that:
User requirements are nil. Nobody likes reading manuals - if we have to write one, we've gotten too complicated.
This is the winning approach for building systems that "serve those that serve others", and is echoed by InSTEDD's design philosophy and the Sahana disaster management system.
Teams begin by entering their responses to incidents and training exercises. They then tag them with things like the weather conditions, the tools and skills required, and who from the team was deployed.
As a team's incident database grows, this information can be used to show heatmaps and provide powerful insight into the locations, weather conditions, and times of year at which various incidents occur. Over time this kind of data could be analyzed in aggregate across multiple teams and regions to create an incredibly powerful resource for Emergency Managers. This is very similar to what Wesabe does for consumers with financial transaction data today (disclosure: OATV investment).

Rescue team members enter training dates and levels. The system tracks certification expiration dates and prompts team members & leaders to plan classes and remain current. This is a huge issue for volunteers who have to manage professional-level training requirements with the demands of a regular career.
As more incidents are entered into the system, it compares the skills required for each of the rescues with the team training exercises. This allows teams to identify areas to focus, train, and develop new skills.

tags: disaster tech, disastertech, emergency management, firefighting, humanitarian aid, ict, innovation, operations, rescue, social networking, web 2.0, webops
Fri, Oct 31, 2008
Sprint blocking Cogent network traffic...
by Jesse Robbins | comments: 3
It appears that Sprint has stopped routing traffic (called "depeering") from Cogent as a result of some sort of legal dispute. Sprint customers cannot reach Cogent customers, and vice versa. The effect is similar to what would happen if Sprint were to block voice phone calls to AT&T customers.
Here's a graph that shows the outage, courtesy of Keynote:
Rich Miller at DataCenterKnowledge has a great summary of the issues behind the incident, which has happened with Cogent before. Rich says:
At the heart of it, peering disputes are really loud business negotiations, and angry customers can be used as leverage by either side. This one will end as they always do, with one side agreeing to pay up or manage their traffic differently.
I think this is particularly Radar-worthy because it provides an example of the complex issues around Net Neutrality. In this case, customers are harmed and most (especially Sprint wireless customers) will have no immediate recourse.
tags: cloud computing, cogent, disruption, innovation, internet policy, network neutrality, operations, sprint, utilities, utility computing, webops
Fri, Oct 24, 2008
Amazon's new EC2 SLA
by Jesse Robbins | comments: 7
Amazon announced a new SLA for EC2, similar to the one for S3. This is a notable step for Amazon and cloud computing as a whole, as it establishes a new bar for utility computing services.
Amazon is committing to 99.95% availability for the EC2 service on a yearly basis, which corresponds to approximately four hours and twenty-three minutes of downtime per year (0.05% of the 8,760 hours in a year). It's important to remember that an SLA is just a contract that provides a commitment to a certain level of performance and some form of compensation when a provider fails to meet it.
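The downtime figure is simple arithmetic, sketched below for a 365-day year. The function name is just for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR * 60

minutes = allowed_downtime_minutes(99.95)
hours, rem = divmod(minutes, 60)
print(f"{int(hours)}h {rem:.1f}m")  # prints: 4h 22.8m
```

The same function shows how quickly the budget shrinks as nines are added: 99.99% leaves under an hour per year.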
Here's the summary of the EC2 SLA (emphasis added):
Service Commitment
AWS will use commercially reasonable efforts to make Amazon EC2 available with an Annual Uptime Percentage (defined below) of at least 99.95% during the Service Year. In the event Amazon EC2 does not meet the Annual Uptime Percentage commitment, you will be eligible to receive a Service Credit as described below. [...]
To receive a Service Credit, you must submit a request by sending an e-mail message to aws-sla-request @ amazon.com. To be eligible, the credit request must [...] include your server request logs that document the errors and corroborate your claimed outage (any confidential or sensitive information in these logs should be removed or replaced with asterisks)
- “Annual Uptime Percentage” is calculated by subtracting from 100% the percentage of 5 minute periods during the Service Year in which Amazon EC2 was in the state of “Region Unavailable.” If you have been using Amazon EC2 for less than 365 days, your Service Year is still the preceding 365 days but any days prior to your use of the service will be deemed to have had 100% Region Availability [...]
- “Unavailable” means that all of your running instances have no external connectivity during a five minute period and you are unable to launch replacement instances. [...]
This new SLA does not appear to address the reliability of server instances individually or in aggregate. For example, if half of a customer's EC2 instances lose their connections or die every 6 minutes, EC2 would still be considered "available" even if it is essentially unusable.
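Here is a rough sketch of how that plays out under the definitions quoted above. The `Period` model is my own simplification, and it treats a customer's "Unavailable" periods as the "Region Unavailable" periods (the full SLA text is elided above, so that mapping is an assumption). The point is only to show that partial failures never count against the uptime number.

```python
from dataclasses import dataclass

@dataclass
class Period:
    """One five-minute window for a single customer (hypothetical model)."""
    running: int        # instances the customer had running
    disconnected: int   # of those, how many had no external connectivity
    can_launch: bool    # whether replacement instances could be launched

def is_unavailable(p: Period) -> bool:
    # Per the quoted definition: ALL running instances lack connectivity
    # AND replacement instances cannot be launched.
    return p.running > 0 and p.disconnected == p.running and not p.can_launch

def annual_uptime_pct(periods: list[Period]) -> float:
    """100% minus the percentage of five-minute periods counted as unavailable."""
    bad = sum(is_unavailable(p) for p in periods)
    return 100.0 - 100.0 * bad / len(periods)

PERIODS_PER_YEAR = 365 * 24 * 12  # 105,120 five-minute periods

# Half of a customer's 10 instances are unreachable in every period, all year.
degraded_year = [Period(running=10, disconnected=5, can_launch=True)] * PERIODS_PER_YEAR
print(annual_uptime_pct(degraded_year))  # prints: 100.0
```

By this measure a year of constant partial failure still scores 100%, while 53 fully dark five-minute periods (about 4 hours 25 minutes) would be enough to drop below 99.95%.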
If the entire EC2 service is down for more than a cumulative four hours and twenty-three minutes, customers must furnish proof of the outage to Amazon to be eligible for the 10% credit. This seems like an onerous process for very little compensation, and isn't in line with Amazon's famous "Relentless Customer Obsession". Amazon takes monitoring very seriously and should take the lead by tracking, reporting, and proactively compensating customers when it lets them down.
tags: amazon, availability, cloud computing, ec2, operations, s3, sla, webops
Wed, Oct 15, 2008
Incredible images of the Sun
by Jesse Robbins | comments: 9
The Boston Globe has assembled a beautiful gallery of images of the Sun.
This LASCO C2 image, taken 8 January 2002, shows a widely spreading coronal mass ejection (CME) as it blasts more than a billion tons of matter out into space at millions of kilometers per hour. The C2 image was turned 90 degrees so that the blast seems to be pointing down. An EIT 304 Angstrom image from a different day was enlarged and superimposed on the C2 image so that it filled the occulting disk for effect (Courtesy of SOHO/LASCO consortium)
[link courtesy Barry Brumitt]
tags: science, science education, sensors
Tue, Sep 23, 2008
Apple's restrictions mean more jailbreaking & Android adoption
by Jesse Robbins | comments: 2
When Apple announced the iPhone SDK last year I said:
[...] Jobs makes it clear that the platform won't be completely open. While he says that this is to balance the benefits of an open platform with user security protection, it's unclear where Apple will draw those lines. Will there be a Skype client? Third-party media apps?
Almost a year later Apple is using their control of the App Store to block innovative developers from reaching their customers. The most recent example is the "Podcaster" iPhone app which allows you to download and manage podcasts on the iPhone directly, without having to boot your computer to sync in iTunes.
It would have been better if Apple had announced [the details] when it released the iPhone. I'm hopeful that Apple will now embrace the existing iPhone developer community, and won't use “security” as a way to keep potential competitors off its platform.
According to the developer, Apple blocked this application from the App store, saying:
Since Podcaster assists in the distribution of podcasts, it duplicates the functionality of the Podcast section of iTunes.
If you want to build a platform, you have to compete fairly with the developers on your platform (if you must compete at all). By restricting developers, Apple is stifling innovation and their long-term growth. Frustrated customers and developers who "think different" are jailbreaking their iPhones and getting excited about Google's Android.
Remember: Successful platforms create more value than they capture.
Update: Apple is apparently responding to the backlash by using an NDA to prohibit discussion of its rejection letters.
tags: android, apple, google, iphone, mobile, open source, platforms, web 2.0
Thu, Aug 7, 2008
Kaminsky DNS Patch Visualization
by Jesse Robbins | comments: 4
Dan Kaminsky has posted the details of the widespread DNS vulnerability. Clarified Networks created this visualization of DNS patch deployment over the past month:
Red = Unpatched
Yellow = Patched, "but NAT is screwing things up"
Green = OK
tags: internet policy, operations, platform plays, velocity, worries