April 26, 2009
Has The Virtualization Future Arrived?
On the eve of the Windows 7 release candidate, Microsoft announced that Windows 7 will include a fully licensed, virtualized copy of Windows XP:
XP Mode consists of the Virtual PC-based virtual environment and a fully licensed copy of Windows XP. It will be made available, for free, to users of Windows 7 Professional, Enterprise, and Ultimate editions via a download from the Microsoft web site. XP Mode works much like today's Virtual PC products, but with one important exception: it does not require you to run the virtual environment as a separate Windows desktop. Instead, as you install applications inside the virtual XP environment, they are published to the host OS, with shortcuts placed in the Start Menu. Users can run Windows XP-based applications alongside Windows 7 applications under a single desktop.
I've been talking about our virtual machine future for years. Shipping a fully licensed, virtualized XP along with some editions of Windows 7 has huge implications for backwards compatibility in the Windows world.
For one thing, Windows XP is ancient. While XP may have been the apple of 2001's eye, in computing dog years, it's basically.. dead. The original system requirements for Windows XP are almost comically low:
- 233 MHz processor
- 64 MB of RAM (128 MB recommended)
- Super VGA (800 x 600) display
- CD-ROM or DVD drive
- Keyboard and mouse
It doesn't take much to virtualize an OS as old as Windows XP today. I was able to cram a full Windows XP image into 641 MB of disk space, and depending on what sort of apps you're running, 256 MB of memory is often plenty.
The attraction of virtualizing older operating systems is that it throws off the eternal yoke of backwards compatibility. Instead of bending over backwards to make sure you never break any old APIs, you can build new systems free of the contortions and compromises inherent in guaranteeing that new versions of the operating system never break old applications.
Modern virtualization solutions can make running applications in a virtual machine almost seamless, as in the Coherence mode of Parallels, or the Unity mode of VMware. Here's a shot of Internet Explorer 7 running under OS X, for example.
From the user's perspective, it's just another application in a window on their desktop. They don't need to know or care if the application is running in a virtual machine. Substitute Windows 7 for OS X, and you get the idea. Same principle. Virtualization delivers nearly perfect backwards compatibility, because you are running a complete copy of the old operating system alongside the new one.
While the screenshot gallery makes it clear to me that this feature of Windows 7 is not nearly as seamless as I'd like it to be, it's a small but important step forward. The demand for perfect backwards compatibility has held the industry back for too long, and having an officially blessed virtualization solution available in a major operating system release (albeit as a downloadable extra, and only in certain editions) opens the door for innovation. It frees software developers from the crushing weight of their own historical software mistakes.
April 21, 2009
A Modest Proposal for the Copy and Paste School of Code Reuse
Is copying and pasting code dangerous? Should control-c and control-v be treated not as essential programming keyboard shortcuts, but as registered weapons?
(yes, I know that in OS X, the keyboard shortcut for cut and paste uses "crazy Prince symbol key" instead of control, like God intended. Any cognitive dissonance you may be experiencing right now is also intentional.)
Here's my position on copy and paste for programmers:
Copy and paste doesn't create bad code. Bad programmers create bad code.
Or, if you prefer, guns don't kill people, people kill people. Just make sure that source code isn't pointed at me when it goes off. There are always risks. When you copy and paste code, vigilance is required to make sure you (or someone you work with) isn't falling into the trap of copy and paste code duplication:
Undoubtedly the most popular reason for creating a routine is to avoid duplicate code. Similar code in two routines is a warning sign. David Parnas says that if you use copy and paste while you're coding, you're probably committing a design error. Instead of copying code, move it into its own routine. Future modifications will be easier because you will need to modify the code in only one location. The code will be more reliable because you will have only one place in which to be sure that the code is correct.
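Here's a deliberately tiny, hypothetical sketch (in Python) of what Parnas is getting at -- not anyone's real code, just the shape of the fix:

# Before (the copy and paste version): the same normalization logic was
# pasted into both format_username() and format_tag().
#
# After: the shared logic lives in one routine, so a future change -- say,
# also collapsing runs of dashes -- happens in exactly one place.

def slugify(raw: str) -> str:
    return raw.strip().lower().replace(" ", "-")

def format_username(raw: str) -> str:
    return slugify(raw)

def format_tag(raw: str) -> str:
    return slugify(raw)

print(format_tag("  Copy And Paste  "))   # "copy-and-paste"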
Some programmers agree with Parnas, going so far as to advocate disabling cut and paste entirely. I think that's rather extreme. I use copy and paste while programming all the time, but never in a way that runs counter to Curly's Law.
But pervasive high-speed internet -- and a whole new generation of hyper-connected young programmers weaned on the web -- has changed the dynamics of programming. Copy and paste is no longer a pejorative term, but a simple observation about how a lot of modern coding gets done, like it or not. This new dynamic was codified into law as Bambrick's 8th Rule of Code Reuse:
It's far easier and much less trouble to find and use a bug-ridden, poorly implemented snippet of code written by a 13 year old blogger on the other side of the world than it is to find and use the equivalent piece of code written by your team leader on the other side of a cubicle partition. (And I think that the copy and paste school of code reuse is flourishing, and will always flourish, even though it gives very suboptimal results.)
Per Mr. Bambrick, copy and pasted code from the internet is good because:
- Code stored on blogs, forums, and the web in general is very easy to find.
- You can inspect the code before you use it.
- Comments on blogs give some small level of feedback that might improve quality.
- PageRank means that you're more likely to find code that might be higher quality.
- Code that is easy to read and understand will be copied and pasted more, leading to a sort of viral reproductive dominance.
- The programmer's ego may drive her to only publish code that she believes is of sufficient quality.
But copy and pasted code from the internet is bad because:
- If the author improves the code, you're not likely to get those benefits.
- If you improve the code, you're not likely to pass those improvements back to the author.
- Code may be blindly copied and pasted without understanding what the code actually does.
- PageRank doesn't address the quality of the code, or its fitness for your purpose.
- Code is often 'demo code' and may purposely gloss over important concerns like error handling, SQL injection, encoding, security, etc.
Now, if you're copying entire projects or groups of files, you should be inheriting that code from a project that's already under proper source control. That's just basic software engineering (we hope). But the type of code I'm likely to cut and paste isn't entire projects or files. It's probably a code snippet -- an algorithm, a routine, a page of code, or perhaps a handful of functions. There are several established code snippet sharing services devoted to exactly this.
Source control is great, but it's massive overkill for, say, this little Objective-C animation snippet:
- (void)fadeOutWindow:(NSWindow*)window
{
    float alpha = 1.0;
    [window setAlphaValue:alpha];
    [window makeKeyAndOrderFront:self];
    for (int x = 0; x < 10; x++) {
        alpha -= 0.1;
        [window setAlphaValue:alpha];
        [NSThread sleepForTimeInterval:0.020];
    }
}
To me, the most troubling limitation of copypasta programming is the complete disconnect between the code you've pasted and all the other viral copies of it on the web. It's impossible to locate new versions of the snippet, or fold your features and bugfixes back into the original snippet. Nor can you possibly hope to find all the other nooks and crannies of code all over the world this snippet has crept into.
What I propose is this:
// codesnippet:1c125546-b87c-49ff-8130-a24a3deda659
- (void)fadeOutWindow:(NSWindow*)window
{
    // code
}
Attach a one-line comment containing a new GUID to any code snippet you publish on the web. This ties the snippet of code to its author and any subsequent clones. A trivial search for the code snippet GUID would identify every other copy of the snippet on the web:
https://www.google.com/search?q=1c125546-b87c-49ff-8130-a24a3deda659
I realize that what I'm proposing, as simple as it is, might still be an onerous requirement for copy-paste programmers. They're too busy copying and pasting to bother with silly conventions! Instead, imagine the centralized code snippet sharing services automatically applying a snippet GUID comment to every snippet they share. If they did, this convention could get real traction virtually overnight. And why not? We're just following the fine software engineering tradition of doing the stupidest thing that could possibly work.
No, it isn't a perfect system, by any means. For one thing, variants and improvements of the code would probably need their own snippet GUID, ideally by adding a second line to indicate the parent snippet they were derived from. And what do you do when you combine snippets with your own code, or merge snippets together? But let's not overthink it, either. This is a simple, easily implementable improvement over what we have now: utter copy-and-paste code chaos.
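To make the convention concrete, here's a minimal sketch -- Python, with made-up function names, not anything a real snippet service implements today -- of how a service might stamp each published snippet with a GUID comment, and record the parent snippet a variant was derived from:

import uuid

def tag_snippet(code, comment_prefix="//", parent_guid=None):
    """Prepend a snippet GUID comment, plus an optional parent reference, to a snippet."""
    header = f"{comment_prefix} codesnippet:{uuid.uuid4()}\n"
    if parent_guid:
        header += f"{comment_prefix} derivedfrom:{parent_guid}\n"
    return header + code

# Tag a brand new snippet, and a variant derived from the fade-out snippet above.
original = tag_snippet("- (void)fadeOutWindow:(NSWindow*)window { /* ... */ }")
variant  = tag_snippet("- (void)fadeOutWindow:(NSWindow*)window { /* tweaked */ }",
                       parent_guid="1c125546-b87c-49ff-8130-a24a3deda659")
print(variant)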
Sometimes, small code requires small solutions.
April 20, 2009
How Not to Conduct an Online Poll
Inside the Precision Hack is a great read. It's all about how the Time Magazine World's Most Influential People poll was gamed. But the actual hack itself is somewhat less impressive when you start digging into the details.
Here's the voting UI for the Time poll in question.
Casting a vote submits an HTTP GET in the form of:
https://www.timepolls.com/contentpolls/Vote.do?pollName=time100_2009&id=1883924&rating=1
Where id is a number associated with the person being voted for, and rating is how influential you think that person is from 1 to 100. Simple enough, but Time's execution was .. less than optimal.
In early stages of the poll, Time.com didn't have any authentication or validation -- the door was wide open to any client that wanted to stuff the ballot box. Soon afterward, it was discovered that the Time.com Poll didn't even range check its parameters to ensure that the ratings fell within the 1 to 100 range.
The outcome of the 2009 Time 100 World's Most Influential People poll isn't that important in the big scheme of things, but it's difficult to understand why a high profile website would conduct an anonymous worldwide poll without even the most basic of safeguards in place. This isn't high security; this is web 101. Any programmer with even a rudimentary understanding of how the web works would have thought of these exploits immediately.
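To see just how rudimentary, here's roughly what "stuffing the ballot box" looks like in code -- a sketch using nothing but Python's standard library and the URL format shown above. (The poll endpoint is long dead; this is purely illustrative.)

from urllib.parse import urlencode
from urllib.request import urlopen

def cast_vote(candidate_id, rating):
    """Submit a single vote via the poll's plain HTTP GET endpoint."""
    params = urlencode({"pollName": "time100_2009", "id": candidate_id, "rating": rating})
    url = "https://www.timepolls.com/contentpolls/Vote.do?" + params
    # No authentication, no token, no validation -- just an open GET request.
    with urlopen(url) as response:
        return response.status

# Wrap that in a loop and you have an "autovoter". That's the entire barrier to entry.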
Without any safeguards, wannabe "hackers" set out to game the poll in every obvious way you can think of. Time eventually responded -- with all the skill and expertise of ... a team who put together the world's most insecure online poll.
Shortly afterward, Time.com changed the protocol to attempt to authenticate votes by requiring a key be appended to the poll submission URL. The key consisted of an MD5 hash of the URL + a secret word (aka 'the salt'). [hackers eventually] discovered that the salt [..] was poorly hidden in Time.com's voting flash application. With the salt extracted, the autovoters were back online, rocking the vote.
So-called secret poorly hidden on the client: check!
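Once the salt was extracted from the Flash client, recreating the "authentication" key was a one-liner. A sketch of the scheme as described above -- the salt value here is made up, obviously:

import hashlib

def vote_key(vote_url, salt):
    """MD5 of the submission URL plus the 'secret' salt, per the scheme described above."""
    return hashlib.md5((vote_url + salt).encode("utf-8")).hexdigest()

url = "https://www.timepolls.com/contentpolls/Vote.do?pollName=time100_2009&id=1883924&rating=1"
print(vote_key(url, "not-the-real-salt"))  # append this to the submission URL and the server is satisfied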
Another challenge faced by the autovoters was that if you voted for the same person more often than once every 13 seconds, your IP would be banned from voting. However, it was noticed that you could cycle through votes for other candidates during those 13 seconds. The autovoters quickly adapted to take advantage of this loophole, interleaving up-votes for moot with down-votes for the competition -- ensuring that no candidate received a vote more frequently than once every 13 seconds, maximizing the voting leverage.
Sloppy, incomplete IP throttling: check!
At this point, here's the mental image I had of the web developers running the show at time.com:
Remember my advice from design for evil?
When good is dumb, evil will always triumph.
Well, here's your proof. I'm not sure they come any dumber than these clowns.
The article goes on to document how the "hackers" exploited these truck sized holes in the time.com online voting system to not only put moot on top, but spell out a little message, too, for good measure:
Looking at the first letters of each of the top 21 leading names in the poll we find the message "marblecake, also the game". The poll announces (perhaps subtly) to the world, that the most influential are not the Obamas, Britneys or the Rick Warrens of the world, the most influential are an extremely advanced intelligence: the hackers.
It's a nice sentiment, I suppose. But is it really a precision hack when your adversaries are incompetent? If you want to read about a real hack -- one that took "extremely advanced intelligence" in the face of a nearly unstoppable adversary -- try the black sunday hack. Now that's a hack.
April 16, 2009
Exception-Driven Development
If you're waiting around for users to tell you about problems with your website or application, you're only seeing a tiny fraction of all the problems that are actually occurring. The proverbial tip of the iceberg.
Also, if this is the case, I'm sorry to be the one to have to tell you this, but you kind of suck at your job -- which is to know more about your application's health than your users do. When a user informs me about a bona fide error they've experienced with my software, I am deeply embarrassed. And more than a little ashamed. I have failed to see and address the issue before they got around to telling me. I have neglected to crash responsibly.
The first thing any responsibly run software project should build is an exception and error reporting facility. Ned Batchelder likens this to putting an oxygen mask on yourself before you put one on your child:
When a problem occurs in your application, always check first that the error was handled appropriately. If it wasn't, always fix the handling code first. There are a few reasons for insisting on this order of work:
- With the original error in place, you have a perfect test case for the bug in your error handling code. Once you fix the original problem, how will you test the error handling? Remember, one of the reasons there was a bug there in the first place is that it is hard to test it.
- Once the original problem is fixed, the urgency for fixing the error handling code is gone. You can say you'll get to it, but what's the rush? You'll be like the guy with the leaky roof. When it's raining, he can't fix it because it's raining out, and when it isn't raining, there's no leak!
You need to have a central place that all your errors are aggregated, a place that all the developers on your team know intimately and visit every day. On Stack Overflow, we use a custom fork of ELMAH.
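ELMAH does the heavy lifting for us, but the core idea is small enough to sketch. Here's the concept in a few lines of Python -- not ELMAH, just an illustration: funnel every unhandled exception into one log the whole team watches.

import logging
import sys

# One central log that every unhandled exception funnels into. In production this
# would be a database or service the whole team can query, not a local file.
logging.basicConfig(filename="exceptions.log",
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("exceptions")

def log_unhandled(exc_type, exc_value, exc_traceback):
    """Record the full stack trace, then let the program crash loudly as usual."""
    log.critical("Unhandled exception", exc_info=(exc_type, exc_value, exc_traceback))
    sys.__excepthook__(exc_type, exc_value, exc_traceback)

sys.excepthook = log_unhandled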
We monitor these exception logs daily; sometimes hourly. Our exception logs are a de facto to-do list for our team. And for good reason. Microsoft has collected similar sorts of failure logs for years, both for themselves and other software vendors, under the banner of their Windows Error Reporting service. The resulting data is compelling:
When an end user experiences a crash, they are shown a dialog box which asks them if they want to send an error report. If they choose to send the report, WER collects information on both the application and the module involved in the crash, and sends it over a secure server to Microsoft. The mapped vendor of a bucket can then access the data for their products, analyze it to locate the source of the problem, and provide solutions both through the end user error dialog boxes and by providing updated files on Windows Update.
Broad-based trend analysis of error reporting data shows that 80% of customer issues can be solved by fixing 20% of the top-reported bugs. Even addressing 1% of the top bugs would address 50% of the customer issues. The same analysis results are generally true on a company-by-company basis too.
Although I remain a fan of test driven development, the speculative nature of the time investment is one problem I've always had with it. If you fix a bug that no actual user will ever encounter, what have you actually fixed? While there are many other valid reasons to practice TDD, as a pure bug fixing mechanism it's always seemed far too much like premature optimization for my tastes. I'd much rather spend my time fixing bugs that are problems in practice rather than theory.
You can certainly do both. But given a limited pool of developer time, I'd prefer to allocate it toward fixing problems real users are having with my software based on cold, hard data. That's what I call Exception-Driven Development. Ship your software, get as many users in front of it as possible, and intently study the error logs they generate. Use those exception logs to home in on the problem areas of your code. Rearchitect and refactor your code so the top 3 errors can't happen any more. Iterate rapidly, deploy, and repeat the process. This data-driven feedback loop is so powerful you'll have (at least from the users' perspective) a rock-stable app in a handful of iterations.
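And the "top 3 errors" part isn't hand-waving; it falls straight out of the logs. A sketch, assuming each logged exception records its type and the frame it blew up in (hypothetical data, hypothetical field names):

from collections import Counter

def top_errors(exception_log, n=3):
    """Bucket logged exceptions by type and failing location, most frequent first."""
    buckets = Counter((e["type"], e["frame"]) for e in exception_log)
    return buckets.most_common(n)

# Hypothetical entries -- in practice these come straight from your error reporting facility.
log = [
    {"type": "NullReferenceException", "frame": "QuestionController.Show"},
    {"type": "SqlException",           "frame": "UserRepository.Load"},
    {"type": "NullReferenceException", "frame": "QuestionController.Show"},
]
print(top_errors(log))  # your de facto to-do list, ordered by user pain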
Exception logs are possibly the most powerful form of feedback your customers can give you. It's feedback based on shipping software that you don't have to ask or cajole users to give you. Nor do you have to interpret your users' weird, semi-coherent ramblings about what the problems are. The actual problems, with stack traces and dumps, are collected for you, automatically and silently. Exception logs are the ultimate in customer feedback.
Am I advocating shipping buggy code? Incomplete code? Bad code? Of course not. I'm saying that the sooner you can get your code out of your editor and in front of real users, the more data you'll have to improve your software. Exception logs are a big part of that; so is usage data. And you should talk to your users, too. If you can bear to.
Your software will ship with bugs anyway. Everyone's software does. Real software crashes. Real software loses data. Real software is hard to learn, and hard to use. The question isn't how many bugs you will ship with, but how fast can you fix those bugs? If your team has been practicing exception-driven development all along, the answer is -- why, we can improve our software in no time at all! Just watch us make it better!
And that is sweet, sweet music to every user's ears.
April 15, 2009
Is Open Source Experience Overrated?
I'm a big advocate of learning on the battlefield. And that certainly includes what may be the most epic battle of them all: open source software.
Contribute to an open-source project. There are thousands, so pick whatever strikes your fancy. But pick one and really dig in, become an active contributor. Absolutely nothing is more practical, more real, than working collaboratively with software developers all over the globe from all walks of life.
If you're looking to polish your programming chops, what could possibly be better, more job-worthy experience than immersing yourself in a real live open source software project? There are thousands, maybe hundreds of thousands, and a few of them have arguably changed the world.
Unfortunately, that wasn't what happened for one particular open source developer. In an anonymous email to me, he related his experiences:
I'm a programmer with 14 years of experience both inside academics and in commercial industry currently looking for work. In both my cover letters and my resume I indicate that I am the architect of a couple of open source Java projects where the code, design and applications were available on the web. One company seemed impressed with my enthusiasm for the job but it was part of their policy to provide coding tests. This seemed perfectly reasonable and I did it by using the first solution I thought about. When I got to the phone interview, the guy spent about five minutes telling me how inefficient my coding solution was and that they were not very impressed. Then I asked whether he had looked at the open source projects I mentioned. He said no - but it seems his impression was already set based on my performance in the coding test. The coding test did not indicate what criteria they were using for evaluation but my solution seemed to kill the interview.
In another call, I was talking with a recruiter who was trying to place someone for a contract Java development assignment. I told her that most of my recent work was open source and that she could inspect it if she wanted to assess my technical competence. Five minutes later she phoned back and said I appeared to lack any recent commercial experience. I had demonstrable open source applications that used the technologies they wanted, but it didn't appear to matter.
With yet another recruiter I told him that even years ago when I had worked on commercial projects, before I went back to school, the proprietary nature of my jobs prevented me from mentioning the specifics about a lot of what I did. The badge of commercial software experience didn't necessarily prove either my technical competence or my relative contribution to the projects. What my experience of working in industry long ago did teach me was how to fill out a time sheet and estimate time for deliverables. But this experience would seem a bit dated now for recruiters.
That's a terrible interview track record for the open source experience that I advocated so strongly. He continues:
I think it's important that I try to see their point of view. A lot of open source projects are probably poorly written and made in response to a neat idea rather than to requirements from some user community. In academia, the goal for development is often more about publishing papers than establishing a user base. Industry people sometimes have the view (sometimes justified and sometimes not) that open source developers who emerge from academic projects lack practical skills. I don't necessarily claim my open source code is the best in the world but it works, it's documented and it's available for scrutiny. One of the reasons I worked so hard on open source projects was to make job interviews easier. By providing prospective employers with large samples of publicly available working code, I thought I would give them something more useful to think about than my performance on a particular coding test or whether the acronyms in the job skills matched my "years spent". I am very aware of the hype behind open source. I've heard it, lived it and even spun some of it myself. But sometimes it's good to take a sobering reality check -- is open-source experience overrated?
It's disheartening to hear so many prospective employers completely disregard experience on open source projects. It's a part of your programming portfolio, and any company not even willing to take a cursory look at your portfolio before interviewing you is already suspect. This reflects poorly on the employers. I'm not sure I'd want to work at a place where a programmer's prior body of work is treated as inconsequential.
On the other hand, perhaps the choice of open source project matters almost as much as the programming itself. How many open source projects labor away in utter obscurity, solving problems that nobody cares about, problems so incredibly narrow that the authors are the only possible beneficiaries? Just as commercial software can't possibly exist without customers, perhaps open source experience is only valid if you work on a project that attains some moderate level of critical mass and user base. Remember, shipping isn't enough. Open source or not, if you aren't building software that someone finds useful, if you aren't convincing at least a small audience of programmers that your project is worthwhile enough to join --
Then what are you really doing?