A Refactoring Example
by Michael SchwernOctober 09, 2003
About a year ago, a person asked the Fun With Perl mailing list about some code they had written to do database queries. It's important to note that this person was posting from an .it address; why will become apparent later. The code was reading records in from a text file and then doing a series of queries based on that information. They wanted to make it faster.
Here's his code:
open (INPUT, "< $filepageid") || &file_open_error("$filepageid");
while ($riga=<INPUT>){
$nump++;
chop($riga);
$pagina[$nump] = $riga;
$sth= $dbh->prepare("SELECT count(*) FROM lognew WHERE
pageid='$pagina[$nump]' and data>='$startdate'");
$sth->execute;
$totalvisit[$nump] = $sth->fetchrow_array();
$sth = $dbh->prepare("SELECT count(*) FROM lognew WHERE
(pageid='$pagina[$nump]' and data='$dataoggi')");
$sth->execute;
$totalvisittoday[$nump] = $sth->fetchrow_array();
$sth = $dbh->prepare("SELECT count(*) FROM lognew WHERE
(pageid='$pagina[$nump]' and data='$dataieri')");
$sth->execute;
$totalyvisit[$nump] = $sth->fetchrow_array();
$sth= $dbh->prepare("SELECT count(*) FROM lognew WHERE
(pageid='$pagina[$nump]' and data<='$fine30gg' and data>='$inizio30gg')");
$sth->execute;
$totalmvisit[$nump] = $sth->fetchrow_array();
}
I decided that rather than try to read through this code and figure out what it's doing and how to make it faster, I'd clean it up first. Clean it up before you figure out how it works? Yes, using a technique called Refactoring.
Refactoring?
In his book, Martin Fowler defines Refactoring as "the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure." In other words, you clean up your code but don't change what it does.
Refactoring can be as simple as changing this code:
$exclamation = 'I like '.$pastry.'!';
To this:
$exclamation = "I like $pastry!";
Still does the same thing, but it's easier to read.
It's important to note that I don't need to know anything about the
contents of $pastry
or how $exclamation
is used. The change is
completely self-contained and does not affect surrounding code or
change what it does. This is Refactoring.
On the principle of "show me don't tell me," rather than talk about it, we'll dive right into refactoring our bit of code.
Fix the Indentation
Your first impulse when faced with a hunk of code slammed against the left margin is to indent it. This is our first refactoring.
open (INPUT, "< $filepageid") || &file_open_error("$filepageid");
while ($riga=<INPUT>){
$nump++;
chop($riga);
$pagina[$nump] = $riga;
$sth= $dbh->prepare("SELECT count(*) FROM lognew WHERE
pageid='$pagina[$nump]' and data>='$startdate'");
$sth->execute;
$totalvisit[$nump] = $sth->fetchrow_array();
$sth = $dbh->prepare("SELECT count(*) FROM lognew WHERE
(pageid='$pagina[$nump]' and data='$dataoggi')");
$sth->execute;
$totalvisittoday[$nump] = $sth->fetchrow_array();
$sth = $dbh->prepare("SELECT count(*) FROM lognew WHERE
(pageid='$pagina[$nump]' and data='$dataieri')");
$sth->execute;
$totalyvisit[$nump] = $sth->fetchrow_array();
$sth= $dbh->prepare("SELECT count(*) FROM lognew WHERE
(pageid='$pagina[$nump]' and data<='$fine30gg'
and data>='$inizio30gg')");
$sth->execute;
$totalmvisit[$nump] = $sth->fetchrow_array();
}
close (INPUT);
Already it looks better. We can see that we're iterating over a file, performing some SELECTs on each line and shoving the results into a bunch of arrays.
A Single, Simple Change
One of the most important principles of Refactoring is that you work in small steps. This re-indentation is a single step. And part of this single step includes running the test suite, logging the change, and checking it into CVS.
Checking into CVS after something this simple? Yes. Many programmers ask the question, "When should I check in?" When you're refactoring it's simple: check in when you've done one refactoring and have tested that it works. Our re-indentation is one thing; we test that it works and check it in.
This may seem excessive, but it prevents us from entangling two unrelated changes together. By doing one change at a time we know that any new bugs were introduced by that one change. Also, you will often decide in the middle of a refactoring that it's not such a good idea. When you've checked in at every one you can simply rollback to the last version rather than having to undo it by hand. Convenient, and you're sure no stray bits of your aborted change are hanging around.
So our procedure for doing a proper refactoring is:
- Make one logical change to the code.
- Make sure it passes tests.
- Log and check in.
Big Refactorings from Small
The goal of this refactoring is to make the code go faster. One of
the simplest ways to do achieve that is to pull necessary code out of
the loop. Preparing four new statements in every iteration of the
loop seems really unnecessary. We'd like to pull those
prepare()
statements out of the loop. This is a refactoring. To achieve this
larger refactoring, a series of smaller refactorings must be done.
Use Bind Variables
Each time through the loop, a new set of SQL statements is created
based on the line read in. But they're all basically the same, just
the data is changing. If we could pull that data out of the statement
we'd be closer to our goal of pulling the prepare()
s out of the loop.
So my next refactoring pulls variables out of the SQL statements and
replaces them with placeholders. Then the data is bound to the
statement using bind variables. This means we're now
prepare()
ing the
same statements every time through the loop.
Before refactoring:
$sth= $dbh->prepare("SELECT count(*) FROM lognew WHERE
pageid='$pagina[$nump]' and data>='$startdate'");
$sth->execute;
After refactoring:
$sth= $dbh->prepare('SELECT count(*) FROM lognew WHERE
pageid=? and data>=?');
$sth->execute($pagina[$nump], $startdate);
Bind variables also protect against a naughty user from trying to slip some extra SQL into your program via the data you read in. As a side-effect of our code cleanup, we've closed a potential security hole.
open (INPUT, "< $filepageid") || &file_open_error("$filepageid");
while ($riga=<INPUT>){
$nump++;
chop($riga);
$pagina[$nump] = $riga;
$sth= $dbh->prepare('SELECT count(*) FROM lognew WHERE
pageid=? and data>=?');
$sth->execute($pagina[$nump], $startdate);
$totalvisit[$nump] = $sth->fetchrow_array();
$sth = $dbh->prepare('SELECT count(*) FROM lognew WHERE
(pageid=? and data=?)');
$sth->execute($pagina[$nump], $dataoggi);
$totalvisittoday[$nump] = $sth->fetchrow_array();
$sth = $dbh->prepare('SELECT count(*) FROM lognew WHERE
(pageid=? and data=?)');
$sth->execute($pagina[$nump], $dataieri);
$totalyvisit[$nump] = $sth->fetchrow_array();
$sth= $dbh->prepare('SELECT count(*) FROM lognew WHERE
(pageid=? and data<=? and data>=?)');
$sth->execute($pagina[$nump], $fine30gg, $inizio30gg);
$totalmvisit[$nump] = $sth->fetchrow_array();
}
close (INPUT);
Test. Log. Check in.
Split a Poorly Reused Variable
The next stumbling block to pulling the prepare()
statements out of
the loop is that they all use the same variable, $sth
. We'll have
to change it so they all use different variables. While we're at it,
we'll name those statement handles something more descriptive of what
the statement does. Since at this point we haven't figured out what
the statements do, we can base the name on the array it gets assigned
to.
While we're at it, throw in some my()
declarations to limit the scope
of these variables to just the loop.
open (INPUT, "< $filepageid") || &file_open_error("$filepageid");
while ($riga=<INPUT>){
$nump++;
chop($riga);
$pagina[$nump] = $riga;
my $totalvisit_sth = $dbh->prepare('SELECT count(*) FROM lognew WHERE
pageid=? and data>=?');
$totalvisit_sth->execute($pagina[$nump], $startdate);
$totalvisit[$nump] = $totalvisit_sth->fetchrow_array();
my $totalvisittoday_sth = $dbh->prepare('SELECT count(*) FROM lognew WHERE
(pageid=? and data=?)');
$totalvisittoday_sth->execute($pagina[$nump], $dataoggi);
$totalvisittoday[$nump] = $totalvisittoday_sth->fetchrow_array();
my $totalyvisit_sth = $dbh->prepare('SELECT count(*) FROM lognew WHERE
(pageid=? and data=?)');
$totalyvisit_sth->execute($pagina[$nump], $dataieri);
$totalyvisit[$nump] = $totalyvisit_sth->fetchrow_array();
my $totalmvisit_sth= $dbh->prepare('SELECT count(*) FROM lognew WHERE
(pageid=? and data<=? and data>=?)');
$totalmvisit_sth->execute($pagina[$nump], $fine30gg, $inizio30gg);
$totalmvisit[$nump] = $totalmvisit_sth->fetchrow_array();
}
close (INPUT);
Test. Log. Check in.
