Archive for the ‘ Refactoring ’ Category

Macros: You Oughta Know

One of the most useful tools available in any decent text editor is the macro recorder, but it’s criminally underused. It seems most people either don’t know the functionality exists, or simply ignore it. This is a shame, since it’s a great timesaver.

I don’t know why macros are so underused. It might be a mindset thing – it can take a little while to develop the ability to spot repetitive editing tasks quickly (i.e. not when you’re 75% of the way through thinking dang, I have to do this again?), so maybe many people never quite make the leap.

It’s worth it though, because once you get your eye in you see chances to use macros everywhere.

I had a useful example just yesterday, in which I needed to make a change to a colossal switch statement (220 branches! Run the cyclomatic-complexity doohickey on THAT!) and had no unit tests to fall back on.

If I had to modify (and hopefully refactor) such a huge construct I wanted to be able to compare before-and-after test results, but I didn’t much fancy hand-cranking a few hundred unit tests.

By recording a temporary macro, however, it took just a couple of minutes to cover every branch. I’ve decided to post a detailed walkthrough of the process here in the hopes that a fairly simple example will be illustrative for those that don’t already lean heavily on macros.

Note that this is not an advanced tutorial. Please refrain from leaving snarky comments about how macros are so much more powerful than this – I’m just doing some introductory material here :-)

Here is a representative snippet of the C# source. It’s part of a legacy permissioning system that, under certain circumstances, needs to check for the existence of a permission represented by an enum against a permission table containing a string-based hierarchy (application/role/permission). The code I was modifying did the appropriate conversion:

    case PermissionKey.SecurityParameterManagementAdd:
    {
        return new string[] {"Security", "ParameterManagement", "Add"};
    }
    case PermissionKey.SecurityRoleManagementAdd:
    {
        return new string[] {"Security", "RoleManagement", "Add"};
    }
    case PermissionKey.SecurityRoleManagementModify:
    {
        return new string[] {"Security", "RoleManagement", "Modify"};
    }
    case PermissionKey.SecurityUserManagementDelete:
    {
        return new string[] {"Security", "UserManagement", "Delete"};
    }
    case PermissionKey.SecurityPermissionManagementAdd:
    {
        return new string[] {"Security", "PermissionManagement", "Add"};
    }

I needed to add a couple of branches to this, but I also wanted to tidy up the code by removing the superfluous braces, as a precursor to converting it into something a bit more robust and maintainable. I wanted unit test coverage to give me confidence that I hadn’t mucked up some logic and inadvertantly granted admin access to the helpdesk trainee role or something.

So, I copied the entire switch body into Notepad++ (well, vim really, but I’ll pretend it’s Notepad++ for the sake of making this post a bit more accessible) and set to work[1].

Before recording my macro, I needed to do a bit of preprocessing to trim the code down to just the data I wanted to work with. The following steps show the ‘find’ regexes I used (in each case, the value of the replace field was empty, so these are effectively deletes), and the effect on the first switch branch from the list above:

1) Remove opening and closing braces from every switch branch:

^\s+[\{\}]$
    case PermissionKey.SecurityParameterManagementAdd:
 
        return new string[] {"Security", "ParameterManagement", "Add"};

2) Remove blanks – TextFX/Edit/Delete Blank Lines

    case PermissionKey.SecurityParameterManagementAdd:
        return new string[] {"Security", "ParameterManagement", "Add"};

3) Remove case statements and leading whitespace:

^\s+case\s+
PermissionKey.SecurityParameterManagementAdd:
PermissionKey.SecurityParameterManagementAdd:
        return new string[] {"Security", "ParameterManagement", "Add"};

4) Remove colon from end of case statement:

:$

5) Remove return statement and leading whitespace:

^\s+return new string\[\]\s*

I ended up with a sequence of couplets looking similar to this one:

PermissionKey.SecurityParameterManagementAdd
{"Security", "ParameterManagement", "Add"};

Now the fun starts – lets walk through the process.

We want to convert the first couplet into a simple unit test fixture, and record the process. This will be our macro – the instructions for converting one couplet into one unit test. We can then play the macro multiple times to convert all the others effortlessly.

Start by moving the cursor to the start of the line, before the ‘P’ of PermissionKey. This is the start point of the macro, so for the macro to be repeatable we must make sure that we finish recording the macro in perfect position to run it again, i.e. before the ‘P’ of PermissionKey for the next couplet (column 0 line 3). Hit Ctrl-Shift-R to start recording.

It is important not to use the mouse when editing – stick to the keyboard. It’s also important not to record keystrokes that are too specific to one bit of code. For instance, don’t use the arrow keys to move left and right character-by-character, because it won’t work on longer or shorter lines.

Instead, use the Home and End keys to jump to the start or end of the line, and hold Ctrl whilst arrowing left or right to move a word at a time instead of a character at a time (this is one of the areas where vim’s movement commands really differentiate it from wannabes like Notepad++…but I digress). See the ‘Detailed Instructions’ section below for more information.

Assume the original switch body is in a method called ‘LookupEnumPermission’. The couplet should be edited to look like this (without the linewrap…):

[Test]
public void TestSecurityParameterManagementAdd()
{
    string[] result = LookupEnumPermission(
            PermissionKey.SecurityParameterManagementAdd);
    Assert.AreEqual("Security", result[0]);
    Assert.AreEqual("ParameterManagement", result[1]);
    Assert.AreEqual("Add", result[2]);
}

Make sure you finish by moving the cursor into position for the next couplet, and hit Ctrl-Shift-R again to stop recording.

Now, hit Ctrl-Shift-P to play back the macro. If you’ve done everything right, the next couplet should magically format itself into a unit test. Hit Ctrl-Shift-P again, and the next couplet will change too. Under the Macro menu, select ‘Run a macro multiple times…’ and you can enter a fixed number of iterations, or just apply the macro over and over again until the end of the file is reached.

Finally, you can copy the unit tests into a new or existing test fixture, and you’re done! In much less time (hopefully) and with fewer errors than if the tests had been written one-by-one.

Detailed Instructions:

These are ley-by-key instructions in Notepad++, in case something in the description above is unclear. Visual Studio should be similar. Vim will be faster once you’ve learned how, but I’ll assume if you use vim you’re already au fait with this sort of editing :-)

  1. Type [Test], and hit enter to start a new line.
  2. Type ‘public void Test’ and hit Enter.
  3. Type ‘{‘ and hit Enter, then Tab.
  4. Hold Ctrl and tap the right arrow twice to jump over a couple of words and place the cursor at the start of the word SecurityParameterManagementAdd, then hold Ctrl-Shift and right arrow again to select the word. Ctrl-C to copy, then arrow up two lines and paste it after the word ‘Test’ to create the full function name TestSecurityParameterManagementAdd. Type () for the empty parameter list.
  5. Arrow down two lines and hit Home to jump to the start of the line. Type ‘string[] result = LookupEnumPermission(‘, then hit End to jump to the end of the line and type ‘);’.
  6. Arrow down one line, hit Home, then Tab. Type ‘Assert.Equals(‘ then hit Delete to remove the ‘{‘. Hold Ctrl and move right three times (to move the cursor just past the comma) and type ‘result[0]);’ and hit Enter.
  7. Repeat variations of step 7 a couple of times to convert the next two lines. Remember to use the correct indexes (result[1] and result[2]). Hit Enter after the last line and type ‘}’ to close the function body.
  8. Arrow down one line and hit Home to place the cursor at the correct start position for the next couplet, and end the macro by hitting Ctrl-Shift-R again.

[1]I could have just done this in a new file in Visual Studio, but for some reason I find VS intolerably slow at running macros once recorded. So slow, in fact, that you can watch the cursor laboriously complete each step – I wind up thinking it would have been quicker to do it manually. That might just be something odd about my VS installation though, as no-one else seems to think it’s slow.

  • Share/Bookmark

Test-specific Shims in Production Code

We’re currently on a fairly major kick to increase automated test coverage of our software. This doesn’t just mean ‘get the unit test coverage up to scratch’, it also means we are working towards full end-to-end integration testing using, amongst other tools, some front-end automation tools such as QTP and Selenium.

Of course, nothing is ever easy when trying to polish away the tarnish of ancient code. One particular problem we face regularly is patching up code that breaks the fragile expectations of some of these automation tools.

Some of our applications – including the one I am working to refactor – contain UI widgets that use a lot of custom painting routines and conceal data pretty well. One widget, for instance, needs to display data with a fast refresh rate and so uses a double-buffered approach to avoid flicker. The data it displays, however, is not stored anywhere; it is discarded as soon as it is rendered. And since the whole widget view is rendered as a bitmap and blitted to screen, there’s no convenient hierarchy of panels, labels, text boxes, or any other standard controls.

This, the Automated QA folks tell me, causes a problem since QTP mainly works by reflecting on properties exposed by controls to get at their data. So, if QTP wants to read some data from a text box, it accesses the Text property of that text box. Simple. But this particular widget doesn’t have the equivalent of a Text property.

This isn’t really an oversight from a purely functional point of view, since no part of the actual application code ever needs to get data from the widget – it’s a display mechanism only, not an interactive widget like a text box. Data is received from a web service, processed a bit, and dumped into the widget. The widget is the last object to do anything with the data – no other part of the app ever needs it again.

Since there are no properties on the widget exposing the data, QTP can’t get at it.

Of course, there are ways to keep QTP happy. We can add a few properties to the widget and keep some data around in member variables, or we can write some extensions for QTP that allow it to access some of the widget’s internals. The second way is probably the ‘right’ way since it keeps test-related code external to the application code – but it’s more time-consuming, and also has a training cost since most developers aren’t going to be familiar with QTP’s API.

This leaves the first option. Traditionally I’ve always been a bit wary of having what is effectively test code (since it only exists for testing purposes) deployed with production code. Furthermore, doesn’t it undermine the tests themselves, since they are dependent on code that never gets executed in production?

On the other hand, in some instances it may be the more pragmatic thing to do. It’s difficult to justify spending a day or two writing a few hundred lines of QTP extension code when the same effect can be garnered by adding a single read-only property. It still doesn’t quite sit right for me though, and I can’t find much in the way of authoritative literature that argues one way or the other.

  • Share/Bookmark

The P.G. Wodehouse Method Of Refactoring

I am much given to ruminating on refactoring at the moment, as one of my current projects is a major overhaul of a fairly large (>31,000 lines) application which has exactly the kind of dotted history any experienced developer has learned to fear – written by many different people, including short-term contractors, at a time in the company’s life when first-mover advantage was significantly more important than coding best-practice, and without any consistent steer on the subjects of structure, coding conventions, unit tests, and so on.

In other words, here be dragons.

In fairness, the application works and has been a critical part of a company that has gone from nothing to market-leading multinational in 7 years, so it has certainly pulled its weight. It is in desperate need of a spring-clean though, and my team volunteered to spend 3 months evicting the cobwebs and polishing the brasswork.

Yes, volunteered – it’s a fascinating challenge, though perhaps not something you’d want to make a career of.

Now, the first mistake to avoid here is the compulsion to throw it away and rewrite from scratch. So often when confronted with a vast seething moiling spiritless mass of code a developer throws his hands into the air and declares it a lost cause. How seductive is the thought that 31,000 lines of code could be thrown away and replaced with ~15,000 lines of clean, well-designed, beautiful code?

Sadly, that’s often a path to disaster. It’s almost a rule of the game. jwz left Netscape because he knew their decision to rewrite from scratch was doomed. Joel Spolsky wrote a rant about the same decision – in fact, the Netscape rewrite is commonly cited as a major factor in Netscape losing the first browser war.

The problem is that warty old code isn’t always just warty – it’s battle-scarred. It has years of tweaks and bug-fixes in there to deal with all sorts of edge conditions and obscure environments. Throw that out and replace it with pristine new code, and you’ll often find that a load of very old issues suddenly come back to haunt you.

So, a total rewrite is out. This means working with the old code, and finding ways to wrestle it into shape. Naturally, Working Effectively With Legacy Code now has an even more firmly established place on my ‘critical books’ bookshelf than it did before.

Inspiration came from a less well-known book, however. Buried in Chapter 10 of Code Reading is a single paragraph suggesting that it can be useful when working with unfamiliar code to paste it into a word processor and zoom out, getting a ‘bird’s eye’ view.

One other interesting way to look at a whole lot of source code quickly under Windows is to load it into Microsoft Word and then set the zoom factor to 10%. Each page of code will appear at about the size of a postage stamp, and you can get a surprising amount of information about the code’s structure from the shape of the lines.

(Spinellis, 2003)

The idea is that this lets you immediately identify potential trouble spots – if you see pages where the code is all bunched up on the right, it indicates massive nesting and over-long functions. If you see heavy congestion, it indicates dense code. It’s also easy to spot giant switch statements and other crimes against humanity.

Of course, you don’t actually need MS Word to do this – the Print Preview in Open Office is more than sufficient, and no doubt most office suites can do the same.

This 50,000ft view could be a useful tool in tracking progress. I mean sure, we can have our build system spit out cyclomatic complexity and code size metrics, but wouldn’t it be neat if we could do a weekly bird’s-eye printout of the source code and pin it up on the wall, giving a nice simple visual representation of the simplification of the code?

Except, of course, that with average page lengths of 45 lines we’d need almost 700 pages each time, and a hell of a lot of wall space.

A better solution would be to print a class per page. At the start of the project, the application had about 150 classes, and the refactoring effort is focussed on about 80 of those. Initially, gigantic classes would be an incomprehensible smudge of grey, but as the refactoring process starts tidying the code and factoring out into other classes, the weekly printout would start to literally come into focus, hopefully ending up with many pages actually containing readable code (which happens roughly when the class is small enough to fit on no more than 3 pages at normal size).

The first time we pinned up the printouts, I suddenly recalled a Douglas Adams foreword reprinted in The Salmon of Doubt. Adams was a great fan of P.G. Wodehouse, and explained Wodehouse’s interesting drafting technique:

It is the next stage of writing—the relentless revising, refining, and polishing—that turned his works into the marvels of language we know and love. When he was writing a book, he used to pin the pages in undulating waves around the wall of his workroom. Pages he felt were working well would be pinned up high, and those that still needed work would be lower down the wall. His aim was to get the entire manuscript up to the picture rail before he handed it in.

(Adams, 2002)

Hmm, isn’t redrafting a literary cousin of refactoring? In many ways, I think it is – so why not apply this technique to refactoring?

And we’ve made it so. We tied a piece of string horizontally across the wall – that’s our ‘picture rail’. Every week we reprint the classes we have been working on, and replace the old printouts. Then we move them up towards the string, in accordance with how happy we are with the view.

Obviously, this doesn’t replace all the other tools we have for evaluating code quality – e.g. the aforementioned metrics, unit tests, manual QA, and so on. It does, however, make for a brilliant way of tracking our subjective satisfaction with the class. Software quality tools can never completely replace the gut instinct of a developer – you might have massive test coverage, but that won’t help with subjective measures such as code smells. With Wodehouse-style refactoring, we can now easily keep track of which code we are happy with, and which code we remain deeply suspicious of.

As an added benefit, all those pages nicely cover up the hideous wall colour. Bonus!

  • Share/Bookmark

Legacy Code, Refactoring, and Ownership

Refactoring is good. Everyone knows that. Since Fowler popularised the concept with the seminal Refactoring: Improving the Design of Existing Code it’s become a staple of the industry, and has pride of place on many a bookshelf. In the many, many articles and discussions of the subject, the key goals and benefits of refactoring are generally taken to be the improvement of readability, testability, decoupling, and other similar worthy ideals. For me, however, there is another very distinct benefit, often overlooked. Fowler touches upon it, but doesn’t really develop it, early on in Refactoring:

I use refactoring to help me understand unfamiliar code. When I look at unfamiliar code, I have to try to understand what it does. I look at a couple of lines and say to myself, oh yes, that’s what this bit of code is doing. With refactoring I don’t stop at the mental note. I actually change the code to better reflect my understanding, and then I test that understanding by rerunning the code to see if it still works.

(Fowler, Refactoring, 1999)

By investigating a piece of code thoroughly enough to understand how it works, refactoring it to map directly on to your understanding, and reinforcing everything with good unit tests, you take ownership of the code. It’s yours now.

This is very important, psychologically. Almost every developer feels more at home with their own code than somebody else’s. That’s why you feel uncomfortable and deflated when, 20 minutes into deciphering a nasty bit of opaque gibberish, you realise it was something you yourself wrote a year earlier and subsequently forgot about.

When you refactor, you rewrite code to a greater or lesser extent. Having done so, the resulting feeling of ownership (alongside increased understanding, of course) makes the code much less scary. The benefit of this is less marked in agile methodologies or TDD, of course, since in those cases quite often the code you are refactoring was written by you anyway. Working with legacy code, though, it’s a big deal.

In the preface to Working Effectively With Legacy Code, Feathers asks “what do you think about when you hear the term legacy code?” (Feathers, 2004). He answers by stating that the standard definition is “difficult-to-change code that we don’t understand” and adds his own preferred definition which is, in essence, “code without tests”.

My own definition of legacy code would include, in many cases, code that isn’t mine. By ‘mine’ I don’t exclusively mean code I wrote personally; I also mean code written by my team, or even code written by people who sit a couple of desks down who I can go and pester about it (which is stretching the definition a bit, admittedly).

In short, legacy code for me is code that no longer has any accessible owner. Like a stray cat or dog, code without an owner goes feral. Refactoring is the process of taming feral code, but as with stray cats much of the benefit comes from re-homing. This is a vital process, even if a fairly unconscious one. When you first come face to face with some hideous 5000-line spaghetti monster of a function your heart sinks – how can anyone ever hope to understand that, let alone modify it safely? Especially if the only people that ever worked with it left the company 3 years ago?

Refactoring allows you to split this code up, create classes to better represent the problem domain, improve abstraction, add tests, and all that other good stuff; at the same time, the process of doing so makes the code yours. You make the decisions about the classes to create and the abstractions to introduce. You write the tests that ferret out all the little idiosyncrasies, and uncover the unwritten assumptions. By the end of the process, the code feels like yours. And that means that the next time you have to make a change there, you benefit from the double whammy of code that is not only well-written and tested, but recognisably yours; and that’s the kind of code that you won’t mind working with.

  • Share/Bookmark