Brady Forrest

Brady Forrest is Chair for O'Reilly's Where 2.0 and Emerging Technology conferences. Additionally, he co-Chairs Web 2.0 Expo in San Francisco, Berlin and NYC. Brady writes for O'Reilly Radar tracking changes in technology. He previously worked at Microsoft on Live Search (he came to Microsoft when it acquired MongoMusic). Brady lives in Seattle, where he builds cars for Burning Man and runs Ignite. You can track his web travels at Truffle Honey.
Wed, Jul 1, 2009
Everyblock's Code is Open-Sourced
by Brady Forrest | @brady | comments: 2
The code for Adrian Holovaty's EveryBlock has been released. The open-sourcing of the site's system was part of the Knight News Challenge program. EveryBlock is a very impressive site that aggregates and geocodes local data -- news, crime, fire, restaurant inspections and reviews -- and then lets users define their interests down to the block level.
Adrian made the announcement on 6/30. Here's the list of newly open-sourced, GPL'd goodies found on Everyblock's new Code page:
- The main package (probably the thing you're looking for) is the publishing system, known as ebpub.
- Second, the packages ebdata and ebgeo contain Python modules for processing data and making maps.
- Third, the packages ebinternal and everyblock round out the code that powers EveryBlock.com. They're internal tools and are likely not of general use, but we're including them to be complete.
- Finally, ebblog and ebwiki are our blog and wiki software, respectively. Because, dammit, the world needs another Django-powered blogging tool.
Django fans, Python geohackers and anyone who wants to build a local data aggregator are going to be thrilled. Adrian was one of the co-creators of Django and one of the earliest Google Maps mashup creators.
EveryBlock has so far launched only in major US cities. There's plenty of room in the market for locals to create their own versions. EveryBlock spends a lot of time curating the incoming data feeds, so I doubt that anyone will be able to roll out new sites too quickly. One thing to note: the trademark EveryBlock is not available. However, the EveryBlock team would not mind being acknowledged if you use their code. Personally, I get a lot of value out of EveryBlock in my city. I get a daily email with all the crime, news and errata near my house.
EveryBlock is now going to move on to the second stage of its existence. About five months ago Adrian blogged about the dilemma they would face when they open-sourced their software. As he said at the time:
But now we've reached an interesting point in our project's growth: our grant ends on June 30, and, under the terms of our grant, we're open-sourcing the EveryBlock publishing system so that anybody will be able to take the code to create similar sites. That's a Good Thing, in that EveryBlock's philosophies and tools will have the opportunity to spread around the world much faster than we could have done on our own, but it puts the six of us EveryBlockers in an odd spot. How do we sustain our project if our code is free to the world?
At the time I suggested that they try to federate with new EveryBlocks. After yesterday's announcement I emailed Adrian to ask him for a hint about their future plans, but for now he's keeping mum.
tags: geo, web 2.0
Tue, Jun 30, 2009
Bing's Sanaz Ahari on System Feedback (2 of 2)
by Brady Forrest | @brady | comments: 2
A couple of weeks ago Bing held a small search summit for analysts, bloggers, SEO experts, entrepreneurs and advertisers. It was held in Bellevue; they put us up in the hotel and fed us. While there we received demos from Bing project teams. I was able to snag an interview with Sanaz Ahari, a Lead PM on Bing, who led the team that developed the categories you see on a Bing web search. The interview was based on the slides from her presentation at the event; I have posted the significant images from her slides. The first portion of the interview focuses on how the Bing team handles query-level categorization and some of the problems they face. This second portion focuses on the systems used to generate the categorization.
Disclosure: I was on the MSN Search team (now the Bing team) from 2004 to March 2006. I knew Sanaz at that time.
Brady Forrest: Now on this image, it shows the ranking model and then it shows engagement and measurement.
Sanaz Ahari: Yes.
Brady Forrest: How do engagement and analytics factor into tweaking the ranking, measurement and engagement algorithms?
Sanaz Ahari: So the key thing about engagement is really there's two things: A, how often do people click on the different categories and then B, once they click on it, what do they do after that? So we basically feed that back into figuring out, "Okay. Did we actually put up the right thing? If something lower down is getting clicked on more, does it deserve to be higher? If something is not getting enough engagement, does it need to be bumped down?" And as we really expand the system, I'd have to say for us as a team, this is really the first step towards what we want to do. And, ideally, we want to get to the point for where we have enough understanding about every single query that we can really help you refine your tasks and your categories. So the engagement model can also help us in the future as we go in deeper into queries for helping people. We shouldn't just say, "Seattle, I'm going to Seattle restaurants." You should be able to go to Seattle restaurants and go in really deep and say, "I want restaurants in this neighborhood. I want of this price range, et cetera." So all of the engagement metrics can actually help us figure out what are the follow on tasks that users engage in the most as well.
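To make that feedback loop concrete, here's a minimal, hypothetical sketch of the idea; the data, the smoothing prior and the function names are all made up for illustration and are not Bing's actual model:
// Hypothetical illustration only -- not Bing's actual code or data model.
// Each category carries counts of how often it was shown and clicked.
var categories = [
  { name: 'lyrics', impressions: 1000, clicks: 240 },
  { name: 'tour dates', impressions: 1000, clicks: 95 },
  { name: 'ringtones', impressions: 1000, clicks: 310 }
];

// Score by click-through rate, smoothed so low-traffic categories
// aren't promoted or demoted on just a handful of clicks.
function engagementScore(cat)
{
  var priorClicks = 10, priorImpressions = 100; // made-up smoothing prior
  return (cat.clicks + priorClicks) / (cat.impressions + priorImpressions);
}

// Categories that earn more engagement bubble up; ignored ones get bumped down.
categories.sort(function (a, b) { return engagementScore(b) - engagementScore(a); });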
Brady Forrest: And so what is the second flow chart?
Sanaz Ahari: So the second area, so once we felt that we could deliver intent understanding at a level of quality that we felt comfortable with, then we tackled the second area of problems, which is equally difficult, which is really around, okay, how do we know that J Lo is a musician in the first place. And this is really around the query understanding aspect of things. And this is an area where we, again, explored multiple different approaches. We could've done a kind of clustering on the entire corpus of our queries. Or we could've said, "We're going to start a little bit more targeted and only go after the domains that we really want to go after." Like we said, "Let's just go after health and see if we can solve a small problem before trying to take on the entire corpus of the web."
For the Bing release, we focused -- and this was just a principle that we had as a team -- we really wanted to start small and see if we can get the level of quality that we wanted before trying to take on a lot more different challenges. And so, in this case, we definitely -- we went after the types of domains that we knew were strategic for us. So, all of a sudden, our corpus of queries that we were interested in was a lot smaller. And we already have abilities to classify queries into domains and understand, okay, this query is a music query or this query is a health query, et cetera, et cetera. And so the other problem that falls out of that is, okay, when people do health queries, what are the categories that fall out of that? Like how do we know that people are going to care about diseases and symptoms, et cetera, et cetera. And then the next problem after that is how do we know that we have a comprehensive understanding of all diseases? So we may be able to understand that there are N different diseases, but how do we know that that's actually a comprehensive list?
And then lastly, there's a problem about -- and this is one of the fascinating search problems -- users query for the same thing in many, many different ways. So an example that I had was, for example, health is actually a very complicated one where the ALS disease is also known as Lou Gehrig's disease. And it's also known as one other thing which sounds kind of complicated. I don't even know how to say it. But there's lots of different ways that people basically query for the same thing. And so those were the three different problems that we really had to tackle in the query understanding space. So the two areas that we basically looked at were, A, if we are able to identify a seed set of queries in a category, how can we actually expand that out and be confident that we have a comprehensive list? Like if we start with N items, are we able to expand it out and get a more comprehensive list of items that are very similar to the existing seed set that we started out with? And that's really the query expansion problem.
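As a rough, hypothetical illustration of the query expansion idea (not Bing's algorithm), here's a sketch that grows a seed set of queries using overlap in the URLs users clicked for them:
// Hypothetical sketch of seed set expansion -- not Bing's algorithm.
// Assume a map, built from logs, of query -> URLs users clicked for it.
var clicksByQuery = {
  "lou gehrig's disease": ['als.org', 'mayoclinic.com/als'],
  'als': ['als.org', 'mayoclinic.com/als', 'wikipedia.org/ALS'],
  'ms symptoms': ['mayoclinic.com/ms', 'webmd.com/ms']
};

// Jaccard similarity between the clicked-URL sets of two queries.
function similarity(a, b)
{
  var setA = clicksByQuery[a], setB = clicksByQuery[b];
  var shared = setA.filter(function (u) { return setB.indexOf(u) !== -1; }).length;
  return shared / (setA.length + setB.length - shared);
}

// Expand a seed set: pull in any logged query close enough to a seed.
function expandSeedSet(seeds, threshold)
{
  var expanded = seeds.slice();
  for (var candidate in clicksByQuery) {
    if (expanded.indexOf(candidate) !== -1) continue;
    for (var i = 0; i < seeds.length; i++) {
      if (similarity(seeds[i], candidate) >= threshold) { expanded.push(candidate); break; }
    }
  }
  return expanded;
}

// expandSeedSet(["lou gehrig's disease"], 0.5) => ["lou gehrig's disease", "als"]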
Brady Forrest: And what type of numbers are you talking about? Is it 100 or 1,000 or 100,000?
Sanaz Ahari: Oh, for the seed sets? It completely varies. It completely varies. There are some categories that are small. There are some that are large. Like if you try to tackle musicians as a whole, that's huge. Whereas if you try to tackle like sports teams or something, that's pretty small. So it varies.
Brady Forrest: And are you pulling category names? Like are you pulling Wikipedia? Like proper nouns in the case of musicians or are you also pulling raw queries from the logs?
Sanaz Ahari: There's definitely both. We use a whole bunch of different features. We do a lot of work from logs. We do a lot of work on document extraction as well. What's very interesting is logs can give you a lot of great information where we have enough data, so they don't necessarily help you address the tail with precision. And document extraction can potentially help you with more comprehensiveness. And one of the things I would say is we also realized the good thing about the approach as a whole, actually, both on the intent extraction side and on the query understanding side, has been that it was an amazing learning experience for the team to tackle the problems one at a time, because we realized there were so many intricacies. There are some things where we can build a generic system and it can help every category. But there were also cases where we would find a lot of intricacies in some categories where we had to do --
Brady Forrest: So what's a query that you're proud of that was like really hard and you feel like -- like an example of a query that really came a long way?
Sanaz Ahari: I actually don't have one at the tip of my -- I do like the experience for Jennifer Lopez because she has a lot of different attributes.
Brady Forrest: What's one that you really want to improve but you didn't want to tweak by hand?
Sanaz Ahari: Actually, the Jaguar one was one (Bing search), the one this morning that we talked about. That was a great query. And in some ways, I actually think we do a lot of positive things with that query. Like in one sense, I would say that we definitely deliver a diversified experience. And we at least capture the different intents. Whereas without the left rail altogether, you get the -- most users don't really go past the third algorithmic result. And that in and of itself doesn't really give users enough diversity to say, "Okay. This is really my intent. And this is what I really want to dig down to." So on one hand, I like what we have done. But in the ideal scenario, I envision us being able to enumerate all of the different intents and all of the different tasks that actually fall under every single intent. So ideally, we should be able to call out animal, team, car, et cetera and then call out the individual tasks that the users want to do beneath every single one of them. There are two areas that I really, really want us to improve. One is around that: I think that disambiguation is a pretty hard problem where we've barely scratched the surface. And then the second area is the depth of our coverage. You know, I really want us to have a much deeper experience where if I type in Indian restaurants in Fremont (Bing search), I should be able to still get a categorized experience where I can still dig in deeper.
Brady Forrest: What percentage of queries get the categorized experience?
Sanaz Ahari: So today, 20 percent of our queries have a categorized experience. And the team is actively working on our next release, increasing both the quality and the coverage and specifically going more into longer queries.
Brady Forrest: Okay. Well, thank you very much, Sanaz.
Sanaz Ahari: Thank you.
tags: bing, sanaz ahari, web 2.0
Mon, Jun 29, 2009
Bing's Sanaz Ahari on Query Level Categorization (1 of 2)
by Brady Forrest | @brady | comments: 0
A couple of weeks ago Bing held a small search summit for analysts, bloggers, SEO experts, entrepreneurs and advertisers. It was held in Bellevue; they put us up in the hotel and fed us. While there we received demos from Bing project teams. I was able to snag an interview with Sanaz Ahari, a Lead PM on Bing, who led the team that developed the categories you see on a Bing web search. The interview was based on the slides from her presentation at the event; I have posted the significant images from her slides. This first portion of the interview focuses on how the Bing team handles query-level categorization and some of the problems they face. The second portion (up shortly) focuses on the systems used to generate the categorization.
Disclosure: I was on the MSN Search team (now the Bing team) from 2004 to March 2006. I knew Sanaz at that time.
Brady Forrest: Hi, this is Brady Forrest with O'Reilly Radar, and I'm here with Sanaz Ahari, Lead PM on Bing Search. And she's going to lead us through the categorization process that you see on every page. Hey, Sanaz.
Sanaz Ahari: Hey, Brady. So I'm going to walk you through basically kind of just the journey that we went through for coming up with our categorized experience. And so the categorized experience is basically the left rail experience that you see on Bing today. It doesn't show up for every single query today, but when it does show up, it's really about helping the users complete their task, essentially. So just to take a step back, when we started on the project, we had done a lot of analysis on queries just in a vacuum. And queries are always part of users completing a task. And in a lot of the analysis we did, we noticed that a lot of the tasks are common. And it's really just common sense. When you're looking for a car, you're either researching it, you already own it, or you want to buy one. When you're looking for a musician, you want to see if they're on tour; you want lyrics, songs, albums, et cetera.
And so our challenge was: can we apply some of that essentially structured aspect to queries? And this is really similar to what you see on sites like Amazon, IMDB, et cetera. They do just a really kick-ass job of categorizing their content. The challenge is that, A, those sites are really about one domain, and B, those sites are really operating on top of already structured data. And so the challenge that we have with search is that, A, we are a general-purpose search engine, and then B, the data that we have is not structured. So the goal that we started out with was we wanted to start very simple. And categorization, clustering, et cetera are nothing really new in the search space. People in research and computer science have been working around this space for years.
So when we started out, there were two key principles that we wanted. One of them was, A, can we achieve aspects and categories that are really, really intuitive to users? And B, can we achieve this across a query class? One of the things that we really wanted was, in order for us to build a habit for our users, we needed to deliver a predictable, consistent experience across a query class. So if I went and told my dad, "Hey, Dad, try any car," I really want him to get a categorized experience for any car. So those are the two kind of constraints that we really set for ourselves. We said, "Unless we meet these two criteria, it's not really successful." And so we started out with a lot of prototyping around, "Hey, can we actually extract intent from queries?" So we started from the intent aspect. And I'll walk you through an example just to show you a simplistic view and how it gets very easily complicated.
So in the example that you see here, we started out with musicians. So with musicians as a whole, the categories and the tasks essentially that the users do generically are fairly straightforward, you know, people want lyrics, songs, tabs, tour dates, ring tones, et cetera, and the list goes on.
Brady Forrest: And are musicians judged as a category?
Sanaz Ahari: Yes, so musicians here is, for example, a category. Yes. Now this is fairly -- what I would say, it's a fairly meaty, high-level category though, because as you dig in deep, there are a lot of different attributes about musicians. So the three different examples I have here are -- well, two of them are my favorite bands, but not J Lo exactly. And they kind of cover a wide range. So you've got Jennifer Lopez (Bing search), and she's a pop musician, but she's one of those people that does a whole bunch of other things as well. You've got Gotan Project (Bing search), a little bit more tail, and they're a trip pop band. And then third, you've got Rodrigo y Gabriela (Bing search), who are more rhythmic guitarists. And you can think about all different sorts of attributes. You've got musicians that may not be alive anymore, et cetera. So there's all sorts of different attributes that fall out of even just a single musician's class. And so in this example, ideally, you should nail the right categories that apply to these three different examples.
So in one case, you've got the guitarists; ideally for this case, you know, tabs are pretty relevant. Lyrics definitely don't make a whole lot of sense. And then you've got J Lo, and she is multifaceted, and we should really try and capture most of her facets. She's a fashion designer, she's an actress, and she's a musician, et cetera. So this shows you kind of the types of problems that we have to solve. A is that a query might fall under different classes. B is that even if you're under a single class, the intent from that class may not be the same. And then there's the problem of head queries and tail queries -- ones where we have a lot of data and ones where we don't. So from here on, we go on to basically our approach for solving this problem. I should say that this is an area where we had a brilliant set of folks working on it. We collaborated pretty closely with research. We had a brilliant set of engineers working on it. And the model that we converged on is one where we basically do category-level inference as well as query-level inference. So in this case, at the category level, we want to figure out -- given a class of queries that are all similar -- what are the top things that users are interested in?
In this case, our algorithms basically used a whole bunch of different features, everything from query clustering, query clicks, session analysis, document extraction, contextual analysis, et cetera. And all of the features that we added were based on a lot of quick iteration to figure out what is good, what is bad, and where we fall short, so we could figure out what extra things we really needed to add to our algorithm. So measurement was a very key process in our system, because we really, really wanted to achieve categories that the users could make a lot of sense of.
So algorithms don't often give you things that users really understand. So we really, really wanted to deliver things that made sense to the users. And then on the second level, we really wanted to understand everything about just a query standalone, as much as possible. And this is to balance the whole, "Okay. What are the top things people care about in a whole category?" If I've got this bag of categories that users care about, now how do I pick the right ones that apply to only this query? And that is why we had an approach at the category level and also at the query level. Lastly, we did a lot of work around determining, if we know that a query is in a category, is that actually the primary intent for that query? So, I don't know, like Traffic may be a movie, but a lot of users, when they type in traffic, are actually just looking for how bad the traffic is right now. And that's an example of a query where, even though it belongs to a category, that category may be an obscure intent.
Lastly, we have our ranking model. And our ranking model basically takes all of the different inputs at the category level and at the query level in order to do some modeling around what are the top intents that apply to a given query. And, of course, we have a very tight feedback loop from what users engage with, feeding back into the ranking of the categories as well as discovering new ones.
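A toy sketch of the blending idea, with invented weights and scores rather than anything from Bing's actual ranking model, might look like this:
// Hypothetical sketch of blending category-level and query-level signals.
// The weights and scores are invented for illustration.

// Category level: across all musician queries, how often each task is wanted.
var categoryPrior = { 'lyrics': 0.4, 'tour dates': 0.3, 'tabs': 0.1 };

// Query level: what the logs say about this particular query.
var queryScores = { 'lyrics': 0.05, 'tour dates': 0.2, 'tabs': 0.6 };

// Blend the two, leaning on the category prior when query-level data is thin.
function rankIntents(prior, perQuery, priorWeight)
{
  var ranked = [];
  for (var intent in prior) {
    var score = priorWeight * prior[intent] + (1 - priorWeight) * (perQuery[intent] || 0);
    ranked.push({ intent: intent, score: score });
  }
  ranked.sort(function (a, b) { return b.score - a.score; });
  return ranked;
}

// For a guitar duo with few lyric clicks, tabs now outrank lyrics.
rankIntents(categoryPrior, queryScores, 0.3);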
Brady Forrest: And how fast do you have to make this calculation for each query?
Sanaz Ahari: I mean, it's all pretty fast because we are scaling through millions of queries. So there's a combination of things: for performance optimization, we do some things offline and we do some things online. For things that don't change a lot, where it makes sense for us to do it offline, we try to optimize that way. But it's definitely a combination of the two. And with users, performance is just an expectation. So that's something that we can't compromise on. Everything happens in a matter of milliseconds, basically, for all of our computations.
Brady Forrest: And how much are you able to cache in case suddenly a query starts to trend up?
Sanaz Ahari: Right. For a lot of our head queries, we definitely do a lot of caching, et cetera. And for real-time spiky things, we have invested in an entirely different system where we're constantly monitoring for spiky trends. So the two systems are basically optimized individually, so that we are always aware of the things that are all of a sudden spiking a lot, while being smart about the things that have already been -- you know, that are head queries that people are re-querying for.
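In rough, hypothetical terms (again, not Bing's architecture), the split she describes might look something like this:
// Toy illustration of "cache the head queries, watch for spikes".
// Hypothetical names only -- not how Bing's systems are actually built.
var precomputed = {};   // offline results for known head queries
var recentCounts = {};  // per-query counts for a separate spike monitor

function categoriesFor(query)
{
  // Count every query so the monitoring system can notice sudden spikes.
  recentCounts[query] = (recentCounts[query] || 0) + 1;

  // Head queries are served from the precomputed (offline) results.
  if (precomputed[query]) return precomputed[query];

  // Everything else falls through to the online classification path.
  return classifyOnline(query);
}

function classifyOnline(query)
{
  return []; // stand-in for the real-time pipeline
}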
The second portion of this interview will be posted shortly.
tags: bing, internet, sanaz ahari, search
Fri, Jun 26, 2009
How Active is Twitter Now? Tweespeed
by Brady Forrest | @brady | comments: 5
As of Friday, June 26th, 2009 at 1:10 PM Pacific, Twitter is pumping out 13,574 tweets per minute. I know this thanks to TweeSpeed, the Twitter Instant Speed Meter. The auto-refreshing application averages the last five minutes of Twitter's public timeline to get its figure.
The simple app was built using "Java (JSP), uses the Twitter Java API, and runs on Google App Engine, Ajax, Sitemesh for page decoration, Eclipse as dev tool, Google Visualization API Gauge."
So why is TweeSpeed important? Twitter's activity is a reflection of people's reaction to and excitement about news. The above graph shows the TweeSpeed over the past 24 hours. The spike coincides with the news of Michael Jackson's death and is almost double the current TweeSpeed. If you're in the news business, getting an alert when the TweeSpeed shoots through the roof could be valuable. TweeSpeed hasn't implemented alerts, but it does have a widget if you want TweeSpeed on your internet dashboard.
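The underlying arithmetic is simple. TweeSpeed itself is written in Java, but here's a hypothetical JavaScript sketch of the five-minute averaging it describes:
// Hypothetical sketch of the averaging, not TweeSpeed's actual Java code.
// tweetTimestamps holds the times (in milliseconds) of recently seen tweets.
function tweetsPerMinute(tweetTimestamps, now)
{
  var windowMs = 5 * 60 * 1000; // the last five minutes
  var recent = tweetTimestamps.filter(function (t) { return now - t <= windowMs; });
  return recent.length / 5; // five-minute window, so divide for a per-minute rate
}

// For example, 67,870 tweets seen in the window works out to 13,574 per minute.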
(via Programmableweb)
tags: twitter
Fri, Jun 26, 2009
Scott Berkun on Why You Should Speak (at Ignite)
by Brady Forrest | @brady | comments: 0
A lot of people feel that Ignite is great training for speakers. The strict format and auto-advancing slides can really solidify your self-confidence. Scott Berkun, the author of Making Things Happen and of an upcoming book on public speaking, gave a talk on just that at the last Ignite Seattle. We've edited that talk and made it this week's Ignite Show.
In addition to giving the talk, Scott wrote about how to give a great talk on his Speaker Confessions blog. Here are some of his tips:
- Don't get hung up on slides. Good slides support what you're saying, not the other way around. The last thing you want is to end up chasing your slides, a common problem at Ignite, as you'll never catch up. Pick simple images, and if you must use text, be sparing. No bullet lists; just one or two points. Make the slides flexible enough that if you fall behind, it's easy to skip something to catch up.
- You can hack the format. The idea of a 'slide' is vestigial - they're not slides anymore. I've hacked the format a few times, including using a special time counter deck to give me more flexibility (see photo at right). You can see this in action in my ignite talk on Attention and Sex or grab the deck here if you want to use or hack it further.
- Plan to lose your first and last slide. Time will get eaten by the audience laughing, by any ad-libs you do, etc. so plan for about 4:30 instead of the full 5:00.
You can also get the Ignite Show on iTunes.
Wed, Jun 24, 2009
Case Study: Twitter Usage at Wordcamp SF
by Brady Forrest | @brady | comments: 11
One of my many hats is as an events organizer. Twitter has become an invaluable tool for me to gauge the mood of the attendees. Are they excited by the current speaker? Bored or excited at the latest news? Are they having a good time? And most important, are they making connections?
Pathable, an events social networking company, has posted an analysis on the use of Twitter at WordCamp SF. The above chart shows how 797 tweets were categorized by a Pathable intern. Disclosure: I am friends with the co-founders of Pathable and a proud advisor of the company.
Or as Pathable more broadly classifies them:
- Tweets that are not directly relevant to the vast majority of event attendees ("Here's what I'm doing / feeling", "talking directly to someone else") make up about 1/3 of the tweets sent.
- Tweets that are useful to people who can't physically be at the event ("Comments / Quotes about speakers", "Announcements / Info / Questions related to event") make up more than 1/3 of the tweets.
- Tweets that report people's intended or actual location make up around 1/6 of the tweets ("Traveling to", "At the event / session").
And who do you think sent those tweets?
While 258 people in total sent at least one tweet, 20 people account for more than half of those tweets. That's consistent at a high level with the "long tail" notion of user-generated content (i.e., a large number of people contribute small amounts of content, but that content in aggregate accounts for a large proportion of the total). The numbers, however, don't fit cleanly into the 80/20 or 90/10 buckets that are often cited. Instead, it's more like 50/50: 50% of the content is accounted for by a small number of high-activity contributors, 50% by everybody else.
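The concentration figure is easy to reproduce. Here's a hypothetical sketch (not Pathable's code) that computes the share of tweets contributed by the top N posters:
// Hypothetical sketch (not Pathable's code): what share of all tweets
// came from the N most active posters? tweetCounts is one number per person.
function topContributorShare(tweetCounts, topN)
{
  var sorted = tweetCounts.slice().sort(function (a, b) { return b - a; });
  var total = 0, top = 0;
  for (var i = 0; i < sorted.length; i++) {
    total += sorted[i];
    if (i < topN) top += sorted[i];
  }
  return top / total;
}

// With WordCamp SF's numbers (797 tweets from 258 people), Pathable's tally
// put topContributorShare(counts, 20) at a bit over 0.5.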
tags: twitter
Tue, Jun 23, 2009
Bing and Google Agree: Slow Pages Lose Users
by Brady Forrest | @brady | comments: 11
Today, representatives of the Google Search and Microsoft Bing teams, Jake Brutlag and Eric Schurman respectively, presented the results of user performance tests at the Velocity Conference. The talk was entitled The User and Business Impact of Server Delays, Additional Bytes, and HTTP Chunking in Web Search. These long-term tests were designed to see which aspects of performance matter most. To know how to improve their sites, both Bing and Google need to know which tweaks to page-load perception and reality help or hurt the user experience. This is one of the first sets of performance tests backed by actual data (rather than being strictly anecdotal). The numbers may seem small, but if you are dealing in millions or billions of queries they add up quickly.
Here are Brutlag's and Schurman's final points:
- "Speed matters" is not just lip service
- Delays under half a second impact business metrics
- The cost of delay increases over time and persists
- Use progressive rendering
- Number of bytes in response is less important than what they are and when they are sent
Server-side Delays Test:
Server-side delays that slow down page delivery can significantly and, more importantly, permanently affect usage by the users in the test. Both Bing and Google ran similar tests that support this claim.
Bing's test: Bing delayed server responses by amounts ranging from 50ms to 2000ms for test groups of users. You can see the results of the tests above. Though the numbers may seem small, they represent large shifts in usage, and applied over millions of searches they can be very significant to usage and revenue. The results of the test were so clear that Bing ended it earlier than originally planned. The Time To Click metric is quite interesting: notice that as the delays get longer, Time To Click increases at a more extreme rate (a 1000ms delay increases it by 1900ms). The theory is that the user gets distracted and disengaged from the page. In other words, they've lost the user's full attention and have to get it back.
Google's test: Google ran a similar experiment in which they tested delays ranging from 50ms to 400ms. The chart above shows the impact on users over the 7 weeks they were in the test. The most interesting thing to note is the continued effect the experiment had on users even after it had ended. Some of the users never recovered, especially those with the greater delay of 400ms. Google tracked the users for an additional 5 weeks (for a total of 12).
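Neither company has published its test harness, but a crude, hypothetical sketch of how a page might measure a Time To Click metric on the client side could look like this (the /log endpoint is invented):
// Hypothetical client-side instrumentation for a "time to first click" metric.
// This is not the Bing or Google test harness, just the basic idea.
var pageShownAt = new Date().getTime();
var reported = false;

document.addEventListener('click', function () {
  if (reported) return;
  reported = true;
  var timeToClick = new Date().getTime() - pageShownAt; // milliseconds
  // Report the measurement without blocking the click ("/log" is invented).
  new Image().src = '/log?ttc=' + timeToClick;
}, false);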
(I've included more on the other tests after the jump.)
Mon, Jun 22, 2009
Before and After Shots of Google's Iran Maps
by Brady Forrest | @brady | comments: 5
There are many places in the world that it is not feasible for larger companies to map. Sometimes the reasons are economic, as is the case for Black Rock City (the temporary 40,000-person home of Burning Man). Sometimes they are political, as is the case for Iran and countries such as China.
As I mentioned the other day, Google greatly improved its map coverage of Iran via user contributions through its Mapmaker program. Those user contributions were applied just a few weeks ago. Here are before and after screenshots of two Iranian cities. The before shots were taken on September 22, 2008; the after shots were taken on May 18, 2009.
Mashhad (Before and After)
Tabriz (Before and After)
tags: geo, geodata, open street map
Thu, Jun 18, 2009
Geolocating Your iPhone Users via the Browser
by Brady Forrest | @brady | comments: 9
Hallelujah! Geolocation is available in the iPhone's browser. I was thrilled to finally have this app ask to use my location. This is only true for the new 3.0 version of the browser (oddly, geolocation is *not* available in the Mac version of Safari 4). Adding the ability to geolocate users via the browser opens up a whole new range of web apps.
If you're eager to start catering to the legion of iPhone users ready to tell you where they are, Adam DuVander (the fellow behind the Portland Wifi Finder, among other things) has written up an excellent post on how to access their location. The iPhone uses the W3C Geolocation spec. If you are running the latest version of the iPhone OS you can try it out at https://bit.ly/w3cgeo
The code itself is very simple as Adam's sample demonstrates:
// Ask the browser for the current position; the first callback runs on
// success, the second on failure (permission denied, no fix, timeout).
navigator.geolocation.getCurrentPosition(foundLocation, noLocation);

function foundLocation(position)
{
  // The W3C API reports coordinates in decimal degrees.
  var lat = position.coords.latitude;
  var long = position.coords.longitude;
  alert('Found location: ' + lat + ', ' + long);
}

function noLocation()
{
  alert('Could not find location');
}
(you can get more information on the behavior in Adam's post)
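If you want to be a bit more defensive you can check for API support before calling it and pass the options the W3C spec defines. This hypothetical variant reuses Adam's foundLocation and noLocation callbacks:
// A more defensive variant (hypothetical): check that the browser supports
// the W3C Geolocation API before calling it, and pass the spec's options.
if (navigator.geolocation)
{
  navigator.geolocation.getCurrentPosition(foundLocation, noLocation, {
    enableHighAccuracy: true, // use GPS on the iPhone when it's available
    timeout: 10000            // give up after ten seconds
  });
}
else
{
  alert('Geolocation is not supported in this browser');
}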
Apple was smart about the user experience and kept the user in control. I was prompted to give Safari permission to access my location (I should only be asked this question one more time). I was then prompted to share my location with the website (in this case, Adam's test page). I expect many companies to quickly update their mobile sites to include location. (Google already identifies your location if you are using the Android browser, so I hope they update the iPhone version of their homepage shortly.)
Wed, Jun 17, 2009
Want a Map of Tehran? Use Open Street Map or Google
by Brady Forrest | @brady | comments: 8
All eyes are on Tehran right now. As the center of the Iranian election protests, the city has become increasingly important to websites this week. To keep its site up to date with this latest crisis area, Flickr switched out the Yahoo road map for Open Street Map. When I heard about this, I wondered how the other major mapping sites fared.
So I examined the road and satellite maps of Yahoo, Mapquest, Google, and Bing (formerly Live Maps). Looking at the images below, it becomes very clear that user-generated maps win in hard-to-reach places. Both Open Street Map (above) and Google (below) rely on user contributions. Open Street Map relies almost entirely on user-uploaded GPS tracks for its mapping data across the world. After the jump I've included the satellite maps from each service (except for Mapquest, which did not have them).
Google is using data acquired from its just-under-a-year-old Mapmaker program (Radar post). With Mapmaker, users can add roads, POIs, regions and features. It's a very powerful tool that has greatly expanded Google's coverage. Google has been slow and deliberate in using Mapmaker data on its main site. In fact, it was just a couple of weeks ago that Iran's Mapmaker data "graduated" to the main site. There are now 64 countries on Google that have been updated with Mapmaker data.
This isn't the first time Flickr has done this (Radar post). They've also used Open Street Map for Beijing, Black Rock City (2008), Tokyo, Buenos Aires, Adelaide, Sydney, Brisbane, Canberra, Melbourne, Baghdad, Kabul, Kinshasa, Mogadishu, Harare, Nairobi, Accra, Cairo, and Algiers.
So what's holding back Microsoft, Yahoo and Mapquest? Unknown, but hopefully they'll realize that their top-down approach isn't working.
Compare the Maps for yourself:
Note: I have included data layers where they were available (Google and Microsoft).
(The markers include Wikipedia articles, photos, video, webcams, POIs, and public transit stations)
Bing Maps:
(The markers include Photosynths, user collections, photos and Wikipedia articles)
(This took a while to find; I had to locate the International Maps page and click through a couple more pages to get to the map.)
tags: geo, google, osm, tehran