Operations
Data Center Power Efficiency
by Jesse Robbins
James Hamilton is one of the smartest and most accomplished engineers I know. He now leads Microsoft's Data Center Futures Team, and has been pushing the opportunities in data center efficiency and internet scale services both inside & outside Microsoft. His most recent post explores misconceptions about the Cost of Power in Large-Scale Data Centers:
I’m not sure how many times I’ve read or been told that power is the number one cost in a modern mega-data center, but it has been a frequent refrain. And, like many stories that get told and retold, there is an element of truth to it. Power is absolutely the fastest-growing operational cost of a high-scale service. Except for server hardware costs, power and costs functionally related to power usually do dominate.
However, it turns out that power alone isn’t anywhere close to the most significant cost. Let’s look at this more deeply. If you amortize power distribution and cooling infrastructure over 15 years and server costs over 3 years, you get a fair comparative picture of how server costs compare to infrastructure (power distribution and cooling). But how do you compare the capital costs of servers and of power and cooling infrastructure with that monthly bill for power?
The approach I took is to convert everything into a monthly charge. [...]
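To make that conversion concrete, here's a minimal sketch of turning capital and operating costs into comparable monthly charges. The amortization periods match the quote above, but every dollar figure, the 5% cost of money, and the power draw are placeholder assumptions for illustration, not Hamilton's numbers:

```python
# A minimal sketch (not Hamilton's actual model) of converting capital and
# operating costs into comparable monthly charges. All numbers are placeholders.

def monthly_amortized(capital_cost, years, annual_rate=0.05):
    """Amortize a capital cost into a monthly payment (standard annuity formula)."""
    n = years * 12            # number of monthly payments
    r = annual_rate / 12      # monthly cost of money
    return capital_cost * r / (1 - (1 + r) ** -n)

# Placeholder inputs -- illustrative only.
servers = monthly_amortized(45_000_000, years=3)     # servers amortized over 3 years
facility = monthly_amortized(80_000_000, years=15)   # power + cooling infrastructure over 15 years
power_kw = 8_000                                      # assumed average draw in kW
power_bill = power_kw * 24 * 30 * 0.07                # ~30-day month at $0.07/kWh

for name, cost in [("servers", servers), ("power+cooling infra", facility), ("power bill", power_bill)]:
    print(f"{name:>20}: ${cost:,.0f}/month")
```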
tags: cloud computing, energy, James Hamilton, microsoft, operations, performance, platforms, utilities, utility computing, velocity, velocity09, web2.0
My Web Doesn't Like Your Enterprise, at Least While it's More Fun
by Jim Stogdill
The other day Jesse posted a call for participation for the next Velocity Web Operations Conference. My background is in the enterprise space, so, despite Velocity's web focus, I wondered if there might not be interest in a bit of enterprise participation. After all, enterprise data centers deal with the same "Fast, Scalable, Efficient, and Available" imperatives. I figured there might be some room for the two communities to learn from each other. So, I posted to the internal Radar authors' list to see what everyone else thought.
Mostly silence. Until Artur replied with this quote from one of his friends employed at a large enterprise: "What took us a weekend to do, has taken 18 months here." That concise statement seems to sum up the view of the enterprise, and I'm not surprised. For nearly six years I've been swimming in the spirit-sapping molasses that is the Department of Defense IT Enterprise so I'm quite familiar with the sentiment. I often express it myself.
We've had some of this conversation before at Radar. In his post on Enterprise Rules, Nat used contrasting frames of reference to describe the web as your loving, dear old API-provisioning dad, while the enterprise is the belt-wielding, standing-in-the-front-door-when-you-come-home-after-curfew stepfather.
While I agree that the enterprise is about control and the web is about emergence (I've made the same argument here at Radar), I don't think this negative characterization of the enterprise is all that useful. It seems to imply that the enterprise's orientation toward control springs fully formed from the minds of an army of petty controlling middle managers. I don't think that's the case.
I suspect it's more likely the result of large scale system dynamics, where the culture of control follows from other constraints. If multiverse advocates are right and there are infinite parallel universes, I bet most of them have IT enterprises just like ours; at least in those shards that have similar corporate IT boundary conditions. Once you have GAAP, Sarbox, domain-specific regulation like HIPAA, quarterly expectations from "The Street," decades of MIS legacy, and the talent acquisition realities that mature companies in mature industries face, the strange attractors in the system will pull most of those shards to roughly the same place. In other words, the IT enterprise is about control because large businesses in mature industries are about control. On the other hand, the web is about emergence because in this time, place, and with this technology discontinuity, emergence is the low energy state.
Also, as Artur acknowledged in a follow-up email to the list, no matter what business you're in, it's always more fun to be delivering the product than to be tucked away in a cost center. On the web, bits are the product. In the enterprise, bits are squirreled away in a supporting cost center that always needs to be ten percent smaller next year.
tags: operations, web2.0
Velocity 2009: Themes, ideas, and call for participation...
by Jesse Robbins
Last year's Velocity conference was an incredible success. We expected around 400 people and we ended up maxing out the facility with over 600. This year we're moving the conference to a bigger space and extending it to 3 days to accommodate workshops and longer sessions.
Velocity 2009 will be held June 22-24, 2009, at the Fairmont Hotel in San Jose, CA.
This year's conference will be especially important. I've said many times that Web Performance and Operations is critical to the success of every company that depends on the web. In the current economic situation, it's becoming a matter of survival. The competitive advantage comes from the ability to do two things:
Our Velocity 2009 mantra is "Fast, Scalable, Efficient, Available", a slight change from last year. (We've replaced "Resilient" with "Efficient" to make the focus clear.)
I'm excited to announce that joining Steve Souders & me on this year's program committee are John Allspaw, Artur Bergman, Scott Ruthfield, Eric Schurman, and Mandi Walls. We've already started working on the program and have just opened the Call for Participation.
tags: Artur Bergman, conferences, Eric Schurman, John Allspaw, Mandi Walls, operations, performance, Scott Ruthfield, Steve Souders, velocity, velocity09, web2.0, webops
DisasterTech: "Decisions for Heroes"
by Jesse Robbins
One of the most interesting DisasterTech projects I've been following is "Decisions for Heroes" led by developer and Irish Coast Guard volunteer Robin Blandford.
Decisions is like Basecamp for volunteer Search & Rescue teams. The focus is on providing "just enough" process to complement the real-world workflow of a rescue team, without unnecessary complexity. One of Robin's design goals:
User requirements are nil. Nobody likes reading manuals - if we have to write one, we've gotten too complicated.
This is the winning approach for building systems that "serve those that serve others", and is echoed by InSTEDD's design philosophy and the Sahana disaster management system.
Teams begin by entering their responses to incidents and training exercises. They then tag them with things like the weather conditions, the tools and skills required, and who from the team was deployed.
As a team's incident database grows, this information can be used to show heatmaps and provide powerful insight into the locations, weather conditions, and times of year at which various incidents occur. Over time this kind of data could be analyzed in aggregate across multiple teams and regions to create an incredibly powerful resource for Emergency Managers. This is very similar to what Wesabe does for consumers with financial transaction data today (disclosure: OATV investment).

Rescue team members enter training dates and levels. The system tracks certification expiration dates and prompts team members & leaders to plan classes and remain current. This is a huge issue for volunteers, who have to balance professional-level training requirements against the demands of a regular career.
As more incidents are entered into the system, it compares the skills required for each rescue with the team's training exercises. This allows teams to identify where to focus, train, and develop new skills.
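To make the idea concrete, here's a minimal sketch of the two analyses described above: counting incidents by conditions for a heatmap-style summary, and comparing the skills that incidents demanded against the skills the team has trained. The record layout and field names are hypothetical, not Decisions' actual data model:

```python
# A minimal sketch of the analyses described above. The incident records and
# field names are hypothetical, not Decisions for Heroes' actual data model.
from collections import Counter

incidents = [
    {"location": "cliffs", "weather": "gale", "skills": {"rope rescue", "first aid"}},
    {"location": "harbour", "weather": "fog", "skills": {"boat handling"}},
    {"location": "cliffs", "weather": "gale", "skills": {"rope rescue", "night ops"}},
]
trained_skills = {"rope rescue", "first aid", "boat handling"}

# Heatmap-style summary: how often each (location, weather) combination occurs.
by_conditions = Counter((i["location"], i["weather"]) for i in incidents)
print(by_conditions.most_common())        # [(('cliffs', 'gale'), 2), ...]

# Skill gap: skills that incidents demanded but the team has not trained for.
required = set().union(*(i["skills"] for i in incidents))
print("train next:", required - trained_skills)   # {'night ops'}
```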

tags: disaster tech, disastertech, emergency management, firefighting, humanitarian aid, ict, innovation, operations, rescue, social networking, web 2.0, webops
Sprint blocking Cogent network traffic...
by Jesse Robbins
It appears that Sprint has stopped routing traffic from Cogent (called "depeering") as a result of some sort of legal dispute. Sprint customers cannot reach Cogent customers, and vice versa. The effect is similar to what would happen if Sprint were to block voice phone calls to AT&T customers.
Here's a graph that shows the outage, courtesy of Keynote:
Rich Miller at DataCenterKnowledge has a great summary of the issues behind the incident, which has happened with Cogent before. Rich says:
At the heart of it, peering disputes are really loud business negotiations, and angry customers can be used as leverage by either side. This one will end as they always do, with one side agreeing to pay up or manage their traffic differently.
I think this is particularly Radar-worthy because it provides an example of the complex issues around Net Neutrality. In this case customers are harmed and most (especially Sprint wireless customers) will have no immediate recourse.
tags: cloud computing, cogent, disruption, innovation, internet policy, network neutrality, operations, sprint, utilities, utility computing, webops
Amazon's new EC2 SLA
by Jesse Robbins
Amazon announced a new SLA for EC2, similar to the one for S3. This is a notable step for Amazon and cloud computing as a whole, as it establishes a new bar for utility computing services.
Amazon is committing to 99.95% availability for the EC2 service on a yearly basis, which corresponds to approximately four hours and twenty-three minutes of downtime per year. It's important to remember that an SLA is just a contract that provides a commitment to a certain level of performance and some form of compensation when a provider fails to meet it.
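As a quick sanity check on that figure, here's the back-of-the-envelope arithmetic (assuming a 365-day Service Year):

```python
# Back-of-the-envelope check: allowed downtime under a 99.95% annual SLA.
hours_per_year = 365 * 24                  # 8,760 hours
allowed_downtime = (1 - 0.9995) * hours_per_year
print(allowed_downtime)                    # ~4.38 hours
print(f"{int(allowed_downtime)}h {round(allowed_downtime % 1 * 60)}m")  # 4h 23m
```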
Here's the summary of the EC2 SLA (emphasis added):

Service Commitment: AWS will use commercially reasonable efforts to make Amazon EC2 available with an Annual Uptime Percentage (defined below) of at least 99.95% during the Service Year. In the event Amazon EC2 does not meet the Annual Uptime Percentage commitment, you will be eligible to receive a Service Credit as described below. [...]

To receive a Service Credit, you must submit a request by sending an e-mail message to aws-sla-request @ amazon.com. To be eligible, the credit request must [...] include your server request logs that document the errors and corroborate your claimed outage (any confidential or sensitive information in these logs should be removed or replaced with asterisks).
- “Annual Uptime Percentage” is calculated by subtracting from 100% the percentage of 5 minute periods during the Service Year in which Amazon EC2 was in the state of “Region Unavailable.” If you have been using Amazon EC2 for less than 365 days, your Service Year is still the preceding 365 days but any days prior to your use of the service will be deemed to have had 100% Region Availability [...]
- “Unavailable” means that all of your running instances have no external connectivity during a five minute period and you are unable to launch replacement instances. [...]
This new SLA does not appear to address the reliability of server instances individually or in aggregate. For example, if half of a customer's EC2 instances lose their connections or die every 6 minutes, EC2 would still be considered "available" even if it is essentially unusable.
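To illustrate how the definitions above interact, here's a minimal sketch of the Annual Uptime Percentage calculation. The outage counts are invented, and the availability test is a simplification of the contract language:

```python
# Sketch of the Annual Uptime Percentage as defined above: the fraction of
# 5-minute periods in the Service Year that were NOT "Region Unavailable".
# Per the definition, a period counts as unavailable only if *all* running
# instances lost external connectivity AND replacements could not be launched,
# so partial failures (e.g. half the fleet dying) never add to this count.

PERIODS_PER_YEAR = 365 * 24 * 12           # 5-minute periods in 365 days

def annual_uptime_percentage(unavailable_periods: int) -> float:
    return 100.0 - 100.0 * unavailable_periods / PERIODS_PER_YEAR

# Invented example: 50 fully-unavailable 5-minute periods (~4h10m of outage).
print(annual_uptime_percentage(50))        # 99.952... -> still meets 99.95%
print(annual_uptime_percentage(53))        # 99.949... -> breaches the SLA
```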
If the entire EC2 service is down for more than a cumulative four hours and twenty minutes, customers must furnish proof of the outage to Amazon to be eligible for the 10% credit. This seems like an onerous process for very little compensation, and it isn't in line with Amazon's famous "Relentless Customer Obsession". Amazon takes monitoring very seriously and should take the lead by tracking, reporting, and proactively compensating customers when it lets them down.
tags: amazon, availability, cloud computing, ec2, operations, s3, sla, webops
Kaminsky DNS Patch Visualization
by Jesse Robbins
Dan Kaminsky has posted the details of the widespread DNS vulnerability. Clarified Networks created this visualization of DNS patch deployment over the past month:
Red = Unpatched
Yellow = Patched, "but NAT is screwing things up"
Green = OK
tags: internet policy, operations, platform plays, velocity, worries
The new internet traffic spikes
by Jesse Robbins
Theo Schlossnagle, author of Scalable Internet Architectures, gave a great explanation of how internet traffic spikes are shifting:
Lately, I see more sudden eyeballs, and what used to be an established trend seems to fall into a more chaotic pattern that is the aggregate of different spike signatures around a smooth curve. This graph is from two consecutive days where we have a beautiful comparison of a relatively uneventful day followed by a long-exposure spike (nytimes.com) compounded by a short-exposure spike (digg.com):

The disturbing part is that this occurs even on larger sites now due to the sheer magnitude of eyeballs looking at today's already popular sites. Long story short, this makes planning a real bitch.
[...] What isn't entirely obvious in the above graphs? These spikes happen inside 60 seconds. The idea of provisioning more servers (virtual or not) is unrealistic. Even in a cloud computing system, getting new system images up and integrated in 60 seconds is pushing the envelope, and that would assume a zero-second response time. This means it is about time to adjust what our systems architecture should support. The old rule of 70% utilization accommodating an unexpected 40% increase in traffic is unraveling. At least eight times in the past month, we've experienced sudden 100% to 1000% increases in traffic across many of our clients.
[Link]
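The headroom arithmetic behind that old rule, and why the new spike sizes break it, is easy to sketch (illustrative numbers only):

```python
# Why the "70% utilization absorbs a 40% spike" rule breaks down.
# Illustrative numbers only.
baseline_utilization = 0.70

for spike in (0.40, 1.00, 10.00):          # +40%, +100%, +1000% traffic
    needed = baseline_utilization * (1 + spike)
    status = "fits" if needed <= 1.0 else f"needs {needed:.1f}x current capacity"
    print(f"+{spike:.0%} spike -> {needed:.0%} of capacity ({status})")
```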
tags: operations, trends, velocity, web 2.0, worries
Video of Rich Wolski's EUCALYPTUS talk at Velocity
by Jesse Robbins
Rich Wolski gave a truly impressive talk at Velocity about EUCALYPTUS, an open-source software infrastructure for cloud computing. The API is compatible with Amazon's EC2 interface, and the underlying infrastructure is designed to support multiple client-side interfaces. EUCALYPTUS is implemented using commonly available Linux tools and basic Web-service technologies, making it easy to install and maintain. Watch and learn...
You can see more videos from Velocity on Blip.tv.
tags: cloud computing, ec2, movers and shakers, open source, operations, platform plays, science, utility computing, velocity, velocity08, videos, web 2.0
Hyperic CloudStatus service dashboard launches at Velocity!
by Jesse Robbins
Javier Soltero just launched CloudStatus during his Hyperic sponsor session today at Velocity. CloudStatus is a public health dashboard for web services like Amazon's EC2/S3, and Google's App Engine.
Javier called to tell me about this last week after I declared that "Service Monitoring Dashboards are mandatory". This comes right after Amazon and Google had visible outages, and couldn't have happened at a better time. I'm really excited to see this idea take off, as it's something that is critical to the broad adoption of web services and cloud computing.
tags: cloudstatus, hyperic, monitoring, operations, outages, platform plays, specialized services, startups, velocity, velocity08, web 2.0, webops
Service Monitoring Dashboards are mandatory for production services!
by Jesse Robbins
Google App Engine went down earlier today. GAE is still a developer preview release and currently lacks a public monitoring dashboard. Unfortunately, this means that many people found out either when their app and admin consoles became unavailable or from Mike Arrington's post on TechCrunch.
Google has a strong Web Operations culture, and there are numerous internal monitoring tools in use across the company, along with a smaller set available to customers. It's surprising that Google launched a developer platform without providing something beyond an email group, although they are by no means the first to do so.
Service Monitoring Dashboards are mandatory for production services and platforms!
- If you launch a platform that people pay you money for, you need to have a real-time service dashboard. Ideally this should be decoupled from the rest of your infrastructure (see the sketch after this list).
- Don't rely on platforms that lack service monitoring dashboards for production.
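As a concrete (if deliberately tiny) illustration of what "decoupled" means in the first point, the sketch below serves status from a standalone process that only reads a JSON file published by your monitoring pipeline, so the dashboard can stay up even when the platform it describes does not. The file name and fields are hypothetical:

```python
# A deliberately tiny, standalone status endpoint -- a sketch, not anyone's
# production dashboard. It only reads a status.json file that an external
# monitoring pipeline publishes, so it shares no infrastructure with the
# service it describes and should be hosted separately from it.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

STATUS_FILE = "status.json"   # hypothetical: written by an external monitor

class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            with open(STATUS_FILE) as f:
                body = f.read().encode()
        except OSError:
            # If even the status file is missing, say so rather than lie.
            body = json.dumps({"status": "unknown"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8080), StatusHandler).serve_forever()
```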
Many companies are initially reluctant to provide this kind of monitoring to the public, and only do so in reaction to an outage. However, it seems that every company that offers such a dashboard uses it as a source of competitive advantage.
The best example of this is trust.salesforce.com, which Salesforce launched after a series of outages in 2006. Amazon (eventually) launched a status dashboard for AWS, and added RSS feeds for specific services, which I think is pretty cool.
Javier Soltero at Hyperic points out:
1. The reports of service outages arrive long after anyone who depends on the services can possibly do anything to mitigate their effect.
2. The services themselves seem incapable of providing any visibility into the circumstances that might lead to future outages. [...]

Even TechCrunch points out that the Google Apps blog doesn’t even mention the outage. Other clouds rely on blogs such as this one, this one, or maybe even this one (from our good friends at Mosso). These are all places where outages can be discussed, but they are not the right means for people to find out whether it was their application that crashed or the cloud that it depends on.
(Updated: Niall Kennedy pointed out that GAE is still a preview release, and I agree that my original wording was wrong. My intent is to emphasize the importance of providing a public service dashboard and so I've edited accordingly.)
tags: failure happens, google app engine, infrastructure, internet policy, monitoring, operations, outages, platform plays, platforms, saas, velocity, web 2.0, web services, webops
Two new open source projects at Velocity
by Jesse Robbins
At Velocity next week, two significant open source projects will be debuting. The first is Jiffy: Open Source Performance Measurement and Instrumentation, a tool created by Scott Ruthfield and his team at Whitepages.com.
Most tools for measuring web performance come in two flavors:
- Developer-installed tools (Firebug, Fiddler, etc.) that allow individuals to closely trace single sessions
- Third-party performance monitoring systems (Gomez, Keynote, etc.) that will hit your site occasionally and report back component-level metrics (for a fee)
Neither of these tools gives you real-world information on what’s actually happening with your clients: how long pages really take to load, what the real cost of client-side execution is, and what the impact of your loading or dependency chain is. This is even more important when you don’t host all of your own assets (when you load ads or JavaScript from third parties, for example) and need to monitor their performance.
Thus we built Jiffy—an end-to-end system for instrumenting your web pages, capturing client-side timings for any event that you determine, and storing and reporting on those timings. You run Jiffy yourself, so you aren’t dependent on the performance characteristics, inflexibility, or costs of third-party hosted services.
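For a sense of what the "reporting" half of this kind of instrumentation involves, here's a minimal sketch that summarizes collected client-side timings with percentiles rather than averages (averages hide the slow tail). It is not Jiffy's actual code, and the beacon fields are invented:

```python
# A sketch of reporting on collected client-side timings -- not Jiffy's actual
# code. Assume each beacon recorded (page, event, elapsed milliseconds).
import math
from collections import defaultdict

def percentile(sorted_values, p):
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    k = math.ceil(p / 100 * len(sorted_values))   # 1-indexed rank
    return sorted_values[max(k, 1) - 1]

# Invented beacon data: (page, event, elapsed milliseconds).
beacons = [("/home", "onload", ms) for ms in (180, 220, 240, 260, 310, 2900)] + \
          [("/search", "onload", ms) for ms in (400, 450, 480, 5200)]

timings = defaultdict(list)
for page, event, ms in beacons:
    timings[(page, event)].append(ms)

for key, values in timings.items():
    values.sort()
    print(key, "median:", percentile(values, 50), "p95:", percentile(values, 95))
```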
The second project is EUCALYPTUS, the Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems, presented by Rich Wolski from UCSB. This project has already started getting attention. (Many thanks to Surj Patel of Structure08/GigaOM for connecting us!)
Eucalyptus is an open-source software infrastructure for implementing "cloud computing" on clusters. The current interface to EUCALYPTUS is compatible with Amazon's EC2 interface, but the infrastructure is designed to support multiple client-side interfaces. EUCALYPTUS is implemented using commonly available Linux tools and basic Web-service technologies, making it easy to install and maintain.
The talk will focus on the design, the implementation tradeoffs we have identified in implementing Eucalyptus as an exploratory tool, and the ways in which we have chosen to address these tradeoffs in the first version of the software.
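Because the interface is EC2-compatible, existing EC2 tooling can in principle be pointed at a EUCALYPTUS endpoint instead of at Amazon. The sketch below shows the idea using boto3, a modern client rather than period-accurate tooling; the endpoint URL and credentials are hypothetical:

```python
# Sketch of the practical upshot of an EC2-compatible API: point an ordinary
# EC2 client at a different endpoint. The endpoint URL and credentials are
# hypothetical, and boto3 is a modern client, not period-accurate tooling.
import boto3

ec2 = boto3.client(
    "ec2",
    endpoint_url="https://eucalyptus.example.edu:8773/services/compute",  # hypothetical
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
    region_name="eucalyptus",
)

# The same calls you would make against Amazon EC2.
for reservation in ec2.describe_instances()["Reservations"]:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"], instance["State"]["Name"])
```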
tags: cloud, cloud computing, ec2, gomez, jiffy, keynote, metrics, open source, operations, performance, platform plays, startups, structure08, velocity, velocity08, web 2.0, web monitoring, webops