April 06, 2006
Blocking Image Bandwidth Theft with URL Rewriting
I like to periodically watch the HTTP traffic on my server. I can see what I'm actually serving up over the wire, and how much bandwidth I'm using.
That's how I noticed that I've become somewhat popular with direct-link image bandwidth thieves. In other words, people who thoughtlessly (or maliciously) embed these IMG links in their web page:
<img src="https://www.codinghorror.com/blog/images/qbert_regex_16.png">
That means the image qbert_regex_16.png is served by my webserver to every user who happens to request this myspace profile page.
Warning: like all myspace pages, that page is
- Not really safe for work
- Incredibly, mind-bendingly ugly
- Filled with thousands of images, animated images, flash, MIDI samples, embedded MP3s
- Utterly and completely incomprehensible
In short, a trainwreck. Every time I visit myspace, I feel a little bit stupider, ala Billy Madison:
Principal: Mr. Madison, what you've just said is one of the most insanely idiotic things I have ever heard. At no point in your rambling, incoherent response were you even close to anything that could be considered a rational thought. Everyone in this room is now dumber for having listened to it. I award you no points, and may God have mercy on your soul.
Billy Madison: Okay, a simple no would've done just fine.
I have no idea why myspace is so popular. I guess the best I can hope for is that those damn kids stay off my lawn.
Anyway, back to business. The most common technique for blocking direct image links is to check the HTTP referer header. Here's the complete HTTP header set of an image request that just came through:
GET /blog/images/logitech_g15_keyboard.jpg HTTP/1.1
Accept: */*
Referer: https://www2.gamelux.nl/forum/topics/10072/38/
Accept-Language: nl
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Host: www.codinghorror.com
Connection: Keep-Alive
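As a rough illustration of where the Referer value lives, here is a minimal Python sketch that parses the request above. The function name parse_headers is hypothetical, and a real server or filter would use its own header parser rather than string splitting.

```python
# The request shown above, as raw text.
RAW_REQUEST = """\
GET /blog/images/logitech_g15_keyboard.jpg HTTP/1.1
Accept: */*
Referer: https://www2.gamelux.nl/forum/topics/10072/38/
Accept-Language: nl
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
Host: www.codinghorror.com
Connection: Keep-Alive
"""


def parse_headers(raw):
    """Hypothetical helper: return (request_line, headers).

    Header names are lower-cased so lookups are case-insensitive.
    """
    lines = raw.strip().splitlines()
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; the value may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


request_line, headers = parse_headers(RAW_REQUEST)
# A missing Referer header is treated as an empty string.
referer = headers.get("referer", "")
```

Here referer holds the forum URL that embedded the image, which is the value the blocking rule has to inspect.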
Prior to serving up the image, we should check the Referer HTTP header, and make sure it's either:
- Blank
- In a list of known whitelisted referring domains
If it isn't, we will serve up either a 404 error, or a "hey, stop stealing our bandwidth" image of some kind. Because I'm a nice guy, I chose this image:
All this can be done through incredibly powerful URL Rewriting, which has been standard on Apache for some time. There's a nice walkthrough on how to set up image link blocking in Apache on Tom Sherman's site.
Unfortunately, IIS 6 doesn't have native support for URL Rewriting*, but there are any number of third party ISAPI filters that can do it. The one I use is ISAPI Rewrite. It's very similar to the Apache version, in that it is driven by the httpd.ini file in the root of each website. I struggled a bit with the rules, but thanks to a helpful forum post, I realized that I needed to put all the whitelisted domains on a single line to get a boolean "or" that included the empty referer case, like so:
[ISAPI_Rewrite]
# Block external image linking
RewriteCond Referer: (?!https://(?:www\.codinghorror\.com|www\.bloglines\.com|www\.google\.com)).+
RewriteRule .*\.(?:gif|jpg|png) /images/block.gif [I,O]
So, as outlined above: unless the referer is blank, or in the whitelist, they get shunted to the blocked image.**
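To see how the two rules interact, here is a small Python sketch that mimics them. It is an approximation, not ISAPI Rewrite itself: it assumes each pattern must match the whole string, treats a missing Referer header as an empty string, and the rewrite function and constant names are made up for illustration.

```python
import re

# Mirrors the RewriteCond: matches any non-empty referer that does NOT
# start with one of the whitelisted hosts (negative lookahead).
BLOCKED_REFERER = re.compile(
    r"(?!https://(?:www\.codinghorror\.com|www\.bloglines\.com|www\.google\.com))"
    r".+",
    re.IGNORECASE,
)

# Mirrors the RewriteRule: any URL ending in an image extension.
IMAGE_URL = re.compile(r".*\.(?:gif|jpg|png)", re.IGNORECASE)

BLOCK_IMAGE = "/images/block.gif"


def rewrite(path, referer):
    """Return the path to serve for a request (hypothetical helper)."""
    referer = referer or ""  # absent header behaves like a blank one
    # ".+" needs at least one character, so a blank referer never matches
    # the blocking condition and the image is served normally.
    if IMAGE_URL.fullmatch(path) and BLOCKED_REFERER.fullmatch(referer):
        return BLOCK_IMAGE
    return path


print(rewrite("/blog/images/qbert_regex_16.png", ""))
print(rewrite("/blog/images/qbert_regex_16.png", "https://www.codinghorror.com/blog/"))
print(rewrite("/blog/images/logitech_g15_keyboard.jpg",
              "https://www2.gamelux.nl/forum/topics/10072/38/"))
```

The first two calls return the original image path and the third returns the block image. Note that the whitelist is tied to the exact "www." host names, so in this sketch a referer such as https://bloglines.com/ (no www) counts as unknown.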
Take that, 26 zillion myspace users.
* I'm pretty sure URL Rewriting will be in IIS7, since they're finally getting around to making a really good copy of Apache's modular architecture in version 7.
** This is done at the ISAPI level, so unlike the cheesy ASP.NET "URL rewriting" solutions, it also works on generic URLs, not just URLs that end in .aspx or some other extension that is sent to the ASP.NET handler. This has long been a pet peeve of mine, but it's really the fault of IIS. And it's changing in IIS 7.
Thanks for adding bloglines to the list. There's probably a couple more you will have to add over time.
However, as with every header field, it really isn't reliable. The next step would probably be to use cookies or GET/HEAD pictures only in some pre-defined order.
Henry Boehlert on April 6, 2006 01:55 AM
Please add newsgator.com to the list.
Thanks!
> The next step would probably be to use cookies or GET/HEAD pictures only in some pre-defined order.
I understand the cookie approach, but describe the GET/HEAD approach?
Also, I added live.com and newsgator.com to the whitelist based on some additional sniffer trace monitoring.
Jeff Atwood on April 6, 2006 02:49 AM
Also, I found a nifty tool that lets you test whether or not your anti-hotlink approach is working on your server:
https://coldlink.com/htm/tool.htm
Be *sure* to clear your browser cache before running the test; stuff on disk will always show up.
Looks like the various anti-hotlink alternatives are also enumerated on that site:
https://coldlink.com/htm/tech.htm
They sell a product that generates random URLs on the server side which are only valid for a fixed amount of time, eg, "ColdLink". Interesting.
Jeff Atwood on April 6, 2006 02:58 AM
I'm glad I'm not the only one who's had to resort to image blocking because of those damn MySpace users.
Daniel on April 6, 2006 03:25 AM
Been an avid reader for a while. Just noticed the added image-parsing required to post.
While I understand the need to avoid spamming the board, have you considered that a blind person will now require the aid of a friend to post a comment to your blog? The solution is very far from perfect. I can't give you a better solution off the cuff, but you should be aware that it does cause problems for some users.
Ulrik Jensen on April 6, 2006 04:41 AM
This can be done with IIS 6
I've set it up at https://www.safecam.org.uk/ to stop other sites nicking the photos and maps.
I can't remember exactly what I did off the top of my head; if anyone is interested, I'll dig out my source.
Peter Bridger on April 6, 2006 05:46 AM
Wow, I really should have turned down my speakers before clicking the myspace link....actually...I should have just not clicked the myspace link. Nothing good can ever come from that place.
Thanks for the great image blocking technique.
Nidifice on April 6, 2006 06:41 AM
While understanding the reasons for this step it is also bad for some users. From now on I can see just WTF images in my own feedreader.
ciao Ronny
Ulrik Jensen: Why would a blind person care about imaging blocking posts?
For that matter, what percentage of CodingHorror readers are so blind that they have to use a screen reader as their only possible means of surfing the web? I will go out on a limb and say very few are.
My father-in-law is just about completely blind, but can see shapes out of the corner of one eye and he is STILL able to browse the web and look at images. Of course, he has a special magnification utility that goes far beyond the one built into windows, but so would anyone else that can barely see.
I'm more concerned about all the poor lynx users https://lynx.browser.org/ :(
Rick Scott on April 6, 2006 07:17 AM
Oh, and why is the captcha always "orange"? That's not very hard to defeat lol.
Rick Scott on April 6, 2006 07:19 AM
It works for pretty much all the standard spam-bots that are out there, which is pretty much all this site gets. It works for now, and is easy to habituate. I'm guessing if anyone bothers to "break" it, he'll change it to random words.
[ICR] on April 6, 2006 07:32 AM
For IIS you can use <a href="https://www.isapirewrite.com/">isapi rewrite</a>, there is a free "lite" version available that works like magic.
Another way they seem to be able to reach you is using a redirect from google images, so be careful with what you add to your accept list.
Werner on April 6, 2006 07:47 AM
OK the link didn't show up in the previous comment: https://www.isapirewrite.com/
Werner on April 6, 2006 07:48 AM
Yes, I've had to do the same for coinop.org - funny that my q*bert pictures also get leeched. The other culprit (myspace is bad, yes) is ebay - people selling "emulator paks" while thieving other people's code are also likely to thieve on the bandwidth as well. For those I usually replace them with a funny custom image involving a baby and excrement and then report them to ebay for having offensive images. Then again I'm vindictive.
I have a custom image deliverer that can scale up and down images and it also checks to make sure the referer is me. It catches 99% of the links and returns an "image missing" - figure that will confuse people and waste their time.
MySpace is popular because it's chaotic and allows you to do what you feel like without much structure. You can do what you want where you want to do it. It's like IM gone mental, with the output stored for future reference.
Friendster was more structured and lost popularity for that reason, as well as having a hostile administrator and slow system response for a long period of time... but it was a lot more structured.
Anyone who thinks that the up-and-coming generation are tech-whizzes who can do great things with technology should take a look at MySpace as a counter-example. They're just consumers of what's put in front of them, and that's about the extent of it.
mattbg on April 6, 2006 08:47 AM
> actually...I should have just not clicked the myspace link. Nothing good can ever come from that place
LOL
> The other culprit (myspace is bad, yes) is ebay
And online forums. Some guy in the UK made that Q*bert image his forum avatar, so it showed up in every post he made.. :P
> From now on I can see just WTF images in my own feedreader.
Ronny:
As long as your feedreader (I assume a Windows app?) is sending blank referers, it will work. I only disallow unknown referers, not blank or empty ones. There should be no "referer" for a Windows app to use, as it's not coming from a website!
Right now the whitelist is:
- my site (duh, that'd be funny ;)
- google.com
- bloglines.com
- newsgator.com
- live.com
- (blank referer)
If it *is* sending a referer, let me know what the URL is and I will happily add it to the whitelist for you.
Jeff Atwood on April 6, 2006 09:22 AM
That is a nice blocked image. I'll be updating my httpd.ini file on my server to be:
RewriteRule .*\.(?:gif|jpg|png) https://www.codinghorror.com/images/block.gif [I,O]
:)
I cannot WAIT for IIS 7 to be released and adopted widespread!
All the hoops I have to jump through with Subtext to allow you to create a blog in a "virtual" subfolder WITHOUT setting up a virtual directory in IIS and without mapping * to aspnet_IIS.
This would allow you to create a URL like https://example.org/MyBlogFolder/ without having a physical (or even virtual) folder named "MyBlogFolder".
In the end there's no way to do it without either mapping * to aspnet_IIS or using a custom 404 page (which is the choice I made).
Ideally, I want my URLs to be really pretty. Like ponies.
Haacked on April 6, 2006 09:36 AM
> I cannot WAIT for IIS 7 to be released and adopted widespread!
It's gonna be a while. All versions of Vista come with IIS7 (as we found out at Mix), but those are all desktop operating systems. Are you gonna install Vista on your hosting services' servers? That's what I thought.
We can develop against it. But we'll all be waiting for Longhorn server before we can use IIS 7 for real, production websites. I have no idea when that will be out!
> mapping * to aspnet_IIS
I do not think you should ever map * to the ASP.NET handler. Stated another way: I think this is a really bad idea.
There's no perfect solution right now, but that particular "solution" is gonna cause problems.
You should get a copy of ISAPI Rewrite and do this the right way. Obviously the subtext project can't make this a requirement, though, but as a personal workaround engine, it's nice.
Jeff Atwood on April 6, 2006 09:43 AM
Another solution is to just host your images at Flickr to start with. Then, who cares if someone hotlinks - you're not serving them. Cuts down on bandwidth, too.
And if those MySpace biters took the time to Coralize the links, you wouldn't have been hit with the bandwidth. https://www.coralcdn.org/
How hard is it to change the link to
https://www.codinghorror.com.nyud.net:8080/images/block.gif
Jon Galloway, stop thinking outside the box. Put yourself BACK in the box, man!
But seriously. I am a huge fan of Coral. I am not a huge fan of becoming dependent on another website for core functionality.. eg, Feedburner (RSS feed), Flickr (images), etcetera.
Jeff Atwood on April 6, 2006 10:04 AM
Wow... how many RSS reading site owners are gonna be on your whitelist? I hope it doesn't get too long to parse...
And yes, that's a reversed invitation to put mine up as well.
Gabri on April 6, 2006 10:05 AM
> how many RSS reading site owners are gonna be on your whitelist
The use cases for web sites that tend to be aggregated are definitely different from those of a traditional website. I think either..
A) You're a giant RSS aggregator, so you'll be on a limited whitelist.
B) You're a small RSS aggregator, so you need to write image retrieval code that passes in blank referers.
I'm not the only site that blocks unknown referers from retrieving images! As you know, all it takes is a few idiots to ruin it (free, unlimited remote image linking) for everyone.
Jeff Atwood on April 6, 2006 10:26 AM
It's interesting to see that I'm not the only guy out there using ISAPI Rewrite. I've found it to be very, very useful. You can pull off some truly neat tricks with it. For example, at https://www.practicelink.com/jobs/ the entire directory tree more or less just runs off of one .aspx page. I've got ISAPI Rewrite set up to map all requests that match /jobs/.+? over to my aspx page, while the user (or search engine, which is the real idea) is none the wiser.
I've also started getting into making it so I can add some virtual directories via ISAPI rewrite via some other aspx page. The page just generates the appropriate regexes and ISAPIRewrite code and uses filestreams to update the httpd.ini file. Since ISAPI Rewrite requires no IIS restart or anything like that after you update its httpd.ini for what you changed to take effect, this works like a charm.
Alan Pack on April 6, 2006 11:46 AM
Nice article...I've long used Apache and now am trying to figure out how to do things that used to be easy on IIS for work. This will help.
Thought you might want to know that images are rewritten when seen through bloglines. Additionally, I am switching to https://rojo.com for RSS feeds - it's the best feed aggregator I've come across yet, it would be nice if you could add that to your list. I always read Coding Horror, you're my absolute favorite .Net related writer!
(I thought it might be that I often leave the www out of the url since that is kind of redundant, but in either case, still not seeing your images on Bloglines.)
mahalie on April 6, 2006 02:00 PM
Hey Jeff,
Mate, feel your pain. You might get a chuckle out of the following article:
https://attrition.org/news/content/05-12-31.001.html
- Dugie
Andrew Dugdell on April 6, 2006 04:04 PM
Andrew, that's hilarious, LOL!
> I am switching to https://rojo.com for RSS feeds
I'll add that to the whitelist later tonight.
Jeff Atwood on April 6, 2006 04:15 PM
Okay, I'll get mostly back in the box. But one more thing to think about - if _you_ coralize all your image links, everyone who copies your image links gets the coralized copies. Instead of spending your time chasing ISAPI rules, you could change your blog rendering code to coralize your image links when it writes them out, so you could turn it off with a config setting.
And yes, I should probably spend more time writing my own stuff and less time commenting on yours.
Jon Galloway on April 7, 2006 12:08 AM
Your image is too small for the teenyboppers on Myspace and the like to notice.
I would change your image to be *much* bigger, but use some sort of graphic format like gif that doesn't increase the filesize much. If you make the image say 800*800px then it will be noticed and get removed.
Thomas Tallyce on April 7, 2006 04:24 AM
Rick Scott: I do agree that the problem is probably very limited. And the fact that the image is the same every time (so far) does make it easier.
However, this is a site that focuses a lot on usability, with which I feel accessibility is pretty tightly connected, so I think it is relevant to consider that the solution, although widely used, isn't anything near perfect.
It's been a pet peeve of mine since I had to help a blind friend sign up for a site that used this technique. There has to be a better way of protecting against spam-bots, although I am not myself smart enough to find one.
Ulrik Jensen on April 7, 2006 08:46 AM
You could add the Yahoo! mail beta RSS reader, too.
Moreover, you could just output some more innocuous placeholder, or maybe nothing and let the browser fall back on its broken image link. I wouldn't mind clicking through to your site to see the images, so long as the replacement image isn't painful to look at.
Thanks.
Marc on April 7, 2006 09:18 AM
"I do not think you should ever map * to the ASP.NET handler. Stated another way: I think this is a really bad idea."
Is there a particular reason for this? With ASP.NET 2.0 and IIS 6 the ASP.NET handler is designed to be usable in this way and can pass back requests to IIS (so, for example, you can use Forms Authentication to protect ALL resources on your website (such as images) and not just aspx/ascx/etc. files).
I don't really know much about URL rewriting, but have been looking into it for a web app I'm working on and would appreciate any input. I was going to do a wildcard mapping to the ASP.NET handler, but will have another look if this is not a good idea.
MT on April 8, 2006 04:23 AM
> I was going to do a wildcard mapping to the ASP.NET handler, but will have another look if this is not a good idea.
For one thing, this doesn't work for folders, eg, https://mywebsite.com/myfolder/
It's also unnecessary overhead for serving up basic files like CSS and images.
Jeff Atwood on April 8, 2006 12:28 PM
Bloglines isn't working for me, I still get the wtf pics.
Chris on April 8, 2006 01:39 PM
re: bloglines, the issue is not having www. as Jeff's rewrites (all of them) require www.bloglines.com. We'll need to rewrite the rewrites. :)
Scott Hanselman on April 8, 2006 09:27 PM
This kind of thing works, BTW, re the www. or no www. thing. Fixed my bloglines and yahoo mail problems.
(www\.)?netvibes\.com
Scott Hanselman on April 8, 2006 09:47 PM
I think I'm going to go with (anything.)domain.com .. for all the whitelisted domains. I just haven't had a chance to update the rules yet. But I will!
Jeff Atwood on April 8, 2006 11:16 PM
How many different inline linking attempts do you get?
I mean, would it not be simpler to blacklist instead of whitelist? That would solve the problem of other on-line aggregators, other search engines, and so on.
Not realistic if you get many attempts from many different domains, but if there are only a few big ones, then letting a small number of image hits happen once in a while may not be too bad if it allows all legitimate linking through...
Hi! Is it possible to add livejournal.com to the whitelist? I have your blog syndicated with the RSS feed to my friends page...
Anonymous on April 19, 2006 03:14 PM
Scott Hanselman: Oh! Finally I know what happened - at the beginning of April I thought that somebody hacked your blog :)
Laboremus on April 25, 2006 05:25 PM
Um, so far when viewing from google reader, I still get WTFs.
I see that google.com is whitelisted, but as I am in Canada, I use google.ca
Can you please whitelist that one as well? (And I guess for other international users you may have to add google.co.uk, google.??)
-greg
OK I added livejournal.com and I modified the google check to
(anything).google.(2 or 3 characters)
Pesky canadians.. ;)
Jeff Atwood on May 9, 2006 12:04 AM
Try Ionic's ISAPI Rewrite Filter - an ISAPI for IIS5 or IIS6 that does URL rewriting.
https://cheeso.members.winisp.net/IIRF.aspx
I know this is an older thread, but could you add the newshutch.com feed reader to your whitelist?
Nathan Bowers on August 3, 2006 12:31 PM
Hello, I am using Ionic's ISAPI Rewrite filter and have a question about using it. If a url has 10 parameters how do you get the 10th one, or 11th one etc...?
Using $10 does not work, it returns the value of $1 appended with a zero on the end...
i.e. <$1>0
Any suggestions?
Thanks,
Jason
jsmithe3@gmail.com
I added newsalloy.com and newshutch.com to the whitelist.
Jeff Atwood on August 15, 2006 02:49 PM
Wow, great site. I came over from spcr through a link to the blog about quiet computing. Good stuff. Myspace is pretty bad, I don't think it was built for so many users, there's always errors and maintenance going on. It's simply a poorly done, chaotic, but very open forum. I'm bookmarking this page instead of going to your main page just because of the Billy Madison quote.
Randy on August 24, 2006 05:00 PM
I added any URLs beginning with "localhost" to the whitelist. Per one of the SharpReader developers, the IE ActiveX control always sends localhost:port as the referer when requesting images..
Jeff Atwood on August 27, 2006 02:06 PM
thanks mate!
Worked like a charm
Gadabout on August 31, 2006 05:44 PM
If you are using Apache/PHP you might be interested in the tutorial at:
https://allaboutdatingsites.com/forums/showthread.php?t=25
enjoy!
m
AllAboutDatingSites Webmaster on December 21, 2006 07:19 AM
I think direct linking is a sin. So I always use my photobucket account to host my images I use on forums or in a blog I can't upload images to.
I thought it was just common sense that you get hosting that is your own anyways because I always feel very guilty when I get lazy and go direct link a image without uploading it to my photobucket account.
Even I know this even though I am a n00b and I total idiot and I lack common sense in most everything else ; including properly commenting on a blog entry.
clueless_furball on December 24, 2006 01:15 PM
That's sticking it to the 122 viewers of the thread where I direct linked an image of the g-15 from your site. Woosh!
joe on February 14, 2007 12:40 AM
As of today, I have removed the referer restriction, because I have offloaded image hosting to external services like imageshack (for older archived posts) and amazon's S3 (for new posts).
https://www.codinghorror.com/blog/archives/000807.html
Jeff Atwood on March 25, 2007 04:01 PM
You can use IIS Mod-Rewrite for fully compatible mod_rewrite functionality on IIS.
https://www.micronovae.com/ModRewrite/ModRewrite.html
Craig on July 11, 2007 01:19 PM
Hi,
I have this problem and aren't myspace customers friendly when you tell them to take it off!
Is there any way of doing this with HTML? I don't have a database system running; I'm old-school in that regard, with hand-written HTML. I know I'm prehistoric..
Can you help or point in the right direction?
Glen
https://attrition.org/news/content/05-12-31.001.html
That article was amusing, but the fact that 2/3 of its inline images are broken is the real WTF.
Jason Grange on January 28, 2009 10:38 PM
Content (c) 2009 Jeff Atwood. Logo image used with permission of the author. (c) 1993 Steven C. McConnell. All Rights Reserved.