Archive for July, 2008

Comment Discontent

Wednesday, July 30th, 2008

There seems to have been a recent outbreak in blog posts about code commenting. As is so often the case with topics such as this, everyone has an opinion and they all seem to be different. It’s quite an eye-opener seeing some of the explanations, justifications, and outright haranguing used in defence of all sorts of weird and wonderful stances.

I got a wry smile from stevey’s post, as I recognise only too well the tendency to write narrative comments. I’m sure there’s plenty of code from early in my career still floating around in various company codebases where the code/comment ratio is something embarrassing. I’ve mostly shaken that off now, though I sometimes have to fight my inner raconteur when writing something I think is neat or clever.

Jeff Atwood, as is so often the case recently, contradicted his own previous post on the matter (replacing the statement “comments can never be replaced by code alone” with “if your feel your code is too complex to understand without comments, your code is probably just bad”) and endearingly veered wildly to and fro across a sensible medium[1], without ever quite hitting it. Coding Horror, indeed.

So far, so blah; every time an argument on comments flares up we see the same thing. Something I’ve not noticed before though, either because I wasn’t paying attention or because it’s a new thing, is a trend amongst the I-don’t-need-comments crowd to advocate very long and detailed method names as an alternative.

As neophyte coders we all have it drilled into us that we must use descriptive names. Programming gospel, as handed down in sacred tomes such as Code Complete, tell us not to use names like ‘i’ and ‘tmp’ except in very specific circumstances (e.g. loop indexes and tempfile handles). And, without question, this is good solid advice. Take heed, young Padawan, etc.

But can you take it too far? It’s not something I’ve really come up against, but it seems to be increasingly popular. One response to Jeff’s post suggested (only in passing, to be fair) using a function name like newtonRaphsonSquareRoot. A digg comment (OK, OK, not exactly the fount of all knowledge) vehemently defended the virtue of the frankly-scary RunEndOfMonthReportsUnlessTheMonthStartsOnAFridayIn
WhichCaseRunTheWeeklyReportInstead
(!)

The argument is that with names like these, you don’t need comments, since it is perfectly clear what the function does. Is it perfectly clear at the wrong level though? Function names like this, in my opinion, are so ‘clear’ that they leak. These are function names that violate the principle of encapsulation.

If I write a square root function, why do I need to burden all my clients with information about how I’ve implemented it? By naming it newtonRaphsonSquareRoot, that’s exactly what I’m doing. Unless there are specific performance implications/requirements that favour Newton-Raphson, in most cases my clients just want a damn square root calculated to within a specified tolerance and don’t care whether I used Newton’s method or one of the army of alternatives. The implementation should be private to the method, and no-one else’s business.

Worse, what if a requirements change means a switch to Walsh’s fast reciprocal method? Uh-oh, now my method name is completely misleading, so I have to change it. Oops, now I have to change all the client code that calls it! I’d better hope no-one has exposed this with [WebMethodAttribute] since I wrote it, otherwise there could be thousands of client applications out there relying on it. My funky rename refactoring can’t save me now.

If every tiny change propagates through the system requiring hundreds of source files to change, and possibly external apps as well, you may as well just copy ‘n’ paste the code everywhere it’s needed and doing away with the function completely. Hell, who needs abstraction anyway?

We all do, of course, which is why I think names like this are a bad smell. The same goes for RunEndOfMonthReportsUnless... - what happens when the requirements change? This method name couples the public interface (method name) to the private implementation, which is exactly what you’re not supposed to do. RunEndOfMonthReports is probably sufficient. Separate interface and implementation. This is programming 101, people, it shouldn’t be beyond our grasp.

[1]I agree with Dan Dyer that the best choice is as follows:

/**
 *  Approximate the square root of n, to within the specified
 *  tolerance, using the Newton-Raphson method.
 */
private double approximateSquareRoot(double n, double tolerance)
{
    double root = n / 2;
    while (abs(root - (n / root)) > tolerance)
    {
        root = 0.5 * (root + (n / root));
    }
    return root;
}

The function name is descriptive and clear whilst remaining general enough to allow an alternative implementation. Anyone who cares enough about the implementation (for performance reasons, or simply curiosity) can find enough information in the comment to start their investigation, without having the details jammed in their face every time they call it.

These…Are Not The Hammer

Sunday, July 20th, 2008

So, yesterday saw the end of a weird week of Whedon wonderfulness when the third and final episode of Dr Horrible’s Sing-Along Blog was released. All three episodes are now free to view until the end of the day, so run, don’t walk, over there now. After all, how often do you get the chance to see a musical comedy about a supervillain who video blogs? With Doogie Howser and Malcolm Reynolds!

Lexical Closures in C# 3.0

Tuesday, July 1st, 2008

There’s a slightly weird article up on Dobbs Code Talk this week, speculating that aggregate functions are “the next big programming language feature” after closures. The slight weirdness comes from the fact that both features have been around for decades, and not just in dusty academic languages either.

Still, there’s some interesting discussion in the comments about whether .Net’s closures are proper first-class lexically-scoped closures. The answer is yes - but with a fun twist.

The twist has been around for a long time - Brad Abrams blogged about it way back in 2004, for instance - but it’s probably worth going over it again, since the recent arrival of LINQ and lambda syntax in C# 3.0 will presumably lead to more people being bitten by this as the use of closures becomes more mainstream.

A key thing to remember is that C# lambdas are just anonymous delegates in skimpy syntax. Behind the scenes the compiler turns them into classes - if you were looking at disassembled MSIL you wouldn’t be able to tell whether the code was written with lambda syntax or anonymous delegate syntax. Therefore, the issue discussed by Brad has not gone anywhere.

Lets revisit the problem, with a 2008 sheen applied (i.e. I’ll use lambda syntax rather than anonymous delegate syntax). What does the following code display?

Func<int>[] funcs = new Func<int>[10];
for (int i = 0; i < 10; ++i)
{
    funcs[i] = () => i * i;
}

funcs.ForEach(f => Console.WriteLine(f()));

If you answered something along the lines of “prints the square of every number between 0 and 9″ you’d be…wrong. Really, try it out. See?

Now, a lexical closure is supposed to capture its environment, meaning that the lambda stored on the first loop would capture i when i==0, the second loop would capture i when i==1, and so on. If this happened, then executing all the lambdas would indeed result in the squares of the numbers 0-9 being printed. So what gives?

The problem stems from the fact that the lambda is binding itself to a variable that is accessible outside the closure, which is being changed in every iteration of the loop. The closure doesn’t capture the value of i, it captures a reference to i itself, which is mutable.

You could actually make a case that this is bad code anyway, since it gives two responsibilities to the loop index - control the loop, and act as data in the closures. If we were being pedantic, we could split the responsibilities by creating a new variable, j, to be the closure data each iteration, and let i concentrate on being an index:

for (int i = 0; i < 10; ++i)
{
    int j = i;
    funcs[i] = () => j * j;
}

Lo and behold, the code now works! Pedantry rules! Take a look with Reflector or ildasm to see what’s going on here. The executive summary is that the compiler captures the environment (i in the first example, j in the second) by creating a member variable within the class it generates for the closure. Previously, since the same instance of i lived for the entire duration of the loop, only one instance of the generated class was created and shared. Now, however, a new instance of the generated class is created in each iteration of the loop (since j is scoped within the loop body and thus we have a new j every time round). Thus, the data is not shared and we get the expected output.

There are two important points to consider here:

  1. The problem goes away if you write code more declaratively. Do away with the clunky for loop and everything works OK.
    Enumerable.Range(0, 10).Select(x => x * x);
  2. It isn’t always bad that multiple closures can capture a reference - since one closure can ’see’ updates made to the shared data by another closure, you could use this as a coordination mechanism.

This is not an issue that’s going to crop up every day - the example above is fairly contrived - but knowing about it will save some painful debugging sessions when inevitably you do run into it. The fix is always to take a local copy of the mutable data to coerce the compiler into generating code that creates multiple instances of the class generated to represent the closure.

Simple, yes? ;-)