
Dynamic Async Batching with PFX

The PFX Team blog has been posting some excellent articles recently on the subject of task batching using the June 2008 CTP release of the Task Parallel Library. It’s really cool to see some of these techniques abstracted properly in .Net, and I hope it eventually becomes part of the core libraries.

I’ve been playing around a bit recently with the June CTP in the context of batching up web service calls, as that’s something I do quite a lot. One particular problem that comes up occasionally is a two-stage series of requests to download a complete set of paged data. I might do this if I wanted to download an entire discussion thread, for instance, or a large account statement from my online bank.

Typically in this situation the web service will limit the number of records I can retrieve in one request, and allow me to specify start and count parameters to the request. The response will also include a total record count, so I know how much data there is.

The normal use case for this is to request the first page of data, and use the total record count to display a list of page links that my user can click on to navigate the data or jump to any page. In my case, however, I want ALL the data as quickly as possible.

So, imagine a situation where I am using a service that lets me download a maximum of 200 records per request. My first step is to request the maximum 200 records starting from index 0, i.e. the first page of data. In the response will be a total record count; if that number is no more than the 200 records I asked for, I’ve got everything in one hit and can stop. But what if the total record count is, say, 1000? I need to make four more requests (I already have the first 200 records, which leaves 800 more to get in batches of 200 each).
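
The arithmetic above can be sketched as a small helper that lists the start index of each follow-up request. The class and method names are hypothetical and not part of any service API; the page size of 200 is simply the limit assumed in the example.

```csharp
using System.Collections.Generic;

public static class Paging
{
    // Returns the start index of every page still to be fetched
    // once the first page (starting at index 0) has been received.
    public static List<int> RemainingPageStarts(int totalRecordCount, int pageSize)
    {
        var starts = new List<int>();
        for (int start = pageSize; start < totalRecordCount; start += pageSize)
        {
            starts.Add(start);
        }
        return starts;
    }
}
```

For a total of 1000 records and a page size of 200, Paging.RemainingPageStarts(1000, 200) yields the start indexes 200, 400, 600 and 800, which are the four extra requests. A total of 200 or fewer yields an empty list.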

Naturally I want to do this asynchronously, using as few resources as I can. This means all webservice calls should be using the APM pattern (thus using IO completion ports, and not consuming worker threads from the thread pool or creating my own threads) and, preferably, not blocking anywhere except when I actually need some data before continuing.

The two-stage process can be successfully captured asynchronously by combining a future and a continuation. I encapsulate the initial request in a Future object (which is a subclass of Task), and handle the check-record-count-and-get-more-records-if-required logic in the continuation. The code for this basically looks as follows:

public Future<List<Item>> GetAllItemsAsync()
{
    var f = Create<GetItemsResponse>(
            ac => Service.BeginGetItems(0, ac, null),
            Service.EndGetItems);

    var start = 200;

    var resultFuture = f.ContinueWith(
        r =>
            {
                // Batch retrieval here...
            });

    return resultFuture;
}

In order to support the APM pattern neatly, I’m using the following method from the PFX blog:

private static Future<T> Create<T>(
        Action<AsyncCallback> beginFunc,
        Func<IAsyncResult, T> endFunc)
{
    var f = Future<T>.Create();
    beginFunc(iar =>
        {
            try
            {
                f.Value = endFunc(iar);
            }
            catch (Exception e)
            {
                f.Exception = e;
            }
        });
    return f;
}

This could be coded as an extension method, though I haven’t bothered yet as I’m hopeful this immensely useful snippet will be integrated into the library itself.

Now I need to make a number of calls to get the rest of the data, so I loop until I’ve made the required number of async service calls:

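var resultFuture = f.ContinueWith(r =>
    {
        var items = new ConcurrentQueue<Item>();
        var handles = new List<WaitHandle>();

        // Keep the first page, which the initial request already returned.
        r.Value.Items.ForEach(items.Enqueue);

        while (start < r.Value.TotalRecordCount)
        {
            // Request the next page, beginning at the current offset.
            var asyncResult = Service.BeginGetItems(start,
                ar => Service.EndGetItems(ar).Items
                    .ForEach(items.Enqueue), null);

            handles.Add(asyncResult.AsyncWaitHandle);
            start += 200;
        }

        // Block until every outstanding request has signalled completion.
        handles.ForEach(h => h.WaitOne());
        return items.ToList();
    });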

I’m about 85% happy with this as an approach. I’m not completely happy, however, because of the WaitOne calls, which mean that I’m blocking on a threadpool thread until all the calls complete. Given that this is all wrapped up in a future, I may not actually need to access the data until well after the calls have completed, in which case I am wastefully consuming a threadpool thread for some period of time. So the $64,000 question is, how do I get rid of it? I’m sure there’s a way to do it, but my brain has gone on a protest march about all the time I’m forcing it to spend thinking about this stuff.
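
One possible way to avoid the blocking wait, offered as an assumption rather than as anything the June 2008 CTP provides: count the outstanding calls down with Interlocked and complete the result from whichever callback finishes last. The sketch below uses TaskCompletionSource from the Task Parallel Library as it later shipped in .NET 4, and BatchCollector is a hypothetical name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: gathers pages from several async callbacks and
// completes a task when the last one arrives, without blocking a thread.
public sealed class BatchCollector<T>
{
    private readonly ConcurrentQueue<T> items = new ConcurrentQueue<T>();
    private readonly TaskCompletionSource<List<T>> done =
        new TaskCompletionSource<List<T>>();
    private int pending;

    public BatchCollector(int pendingCalls)
    {
        pending = pendingCalls;
        // Nothing to wait for, so the result is available immediately.
        if (pendingCalls == 0) done.SetResult(new List<T>());
    }

    // Completes once Add has been called for every pending call.
    public Task<List<T>> Completion { get { return done.Task; } }

    // Call once from each callback with the page it received.
    public void Add(IEnumerable<T> page)
    {
        foreach (var item in page) items.Enqueue(item);

        // The callback that brings the count to zero publishes the result.
        if (Interlocked.Decrement(ref pending) == 0)
        {
            done.TrySetResult(items.ToList());
        }
    }

    // Call from a callback whose request failed.
    public void Fail(Exception e)
    {
        done.TrySetException(e);
    }
}
```

In the continuation above, each callback would call Add with the items from Service.EndGetItems(ar), and the method would hand back Completion instead of waiting on the handles. Note that the order of items in the result depends on the order in which the callbacks run.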


C# 3.0, Parallel LINQ, And The Betfair API – An Introduction

My pal Jan has a habit of waxing lyrical about the wonders of Parallel LINQ (PLINQ) as soon as you make the mistake of mentioning multithreading within earshot. I’ve been playing around with .Net 3.5 recently, and I write a lot of async code day-to-day when struggling to keep desktop webservice clients responsive when making lots of webservice calls, so I thought it high time I took a closer look.

The Problem

A key goal for the kind of async work I do is to batch multiple calls up, so that I get all the responses at once. This is important for keeping the rest of the code clean. To illustrate, imagine you are writing an application against the Betfair API, and you have a screen that displays a market, your current profit and loss on that market, and your unmatched bets on that market. Populating this screen requires four API calls: getMarket(), getMarketPrices(), getMarketProfitAndLoss(), and getCurrentBets().

Now, the worst (though easiest) thing to do is make the four calls sequentially on the UI thread. The problem with this is it’s slow, and the UI freezes during the process (since you’re blocking on the UI thread), which is a lousy user experience.

A slightly better approach is to spin off a thread, and make the four calls there, raising an event on completion. This gets all the work off the UI thread and therefore keeps the application responsive, but it’s still slow as the calls are still sequential.

To speed it up, you can create a thread per call (so four threads in this case). There’s a whole lot of complexity around working out the optimum number of threads to use (depending on how many processors you have, how many simultaneous connections you are allowed to open, etc) but that’s a bit beyond the scope of this post, so for now we’ll go with the one-thread-per-task approach and assume it’s optimal.

So, each thread makes one webservice call, and raises an event to signify that it’s finished. Simple, right? Unfortunately, this can lead to some real headaches in collating the data.

Imagine a user has hundreds of bets on the market, and therefore the getCurrentBets() call takes a bit longer to execute than the other three. The user clicks on a market, and the threads responsible for getting market data and P&L raise their events quickly, so you display the screen with the data you have and plan to display the bets as and when they arrive.

Before the bets are received, however, the user clicks on another market. Again, the market data and P&L come back quickly and you display them. Then, finally, the original getCurrentBets() call completes. But wait! You’ve moved onto another market now, so you don’t care about those bets any more! So you have to write some code to make sure that each piece of data received is still relevant. This can become very onerous very quickly, as you struggle to determine your UI state and work out what data you want and what should be discarded.

Now imagine that your application has timers firing all over the place to update prices and P&L on the market every second or two, so you have events being raised all the time.

I’ve worked with code that ventured down this path, and believe me, you don’t want to go there.

The Solution

The best approach is to batch these calls up, so that each happens on a separate thread, but only one event is raised – when all of the data has been received. That way, you can be sure that when you handle the event, all the data is consistent.

Since this is one of the things that PLINQ does for you, it seems like a good candidate for kicking the tyres, so to speak. First, though, I’ll do a quick run through of how to do this without PLINQ, for comparison’s sake. The task will be to display a list of all the Premiership matches available on Betfair at the time the code runs.

Take Out The Old

Betfair list Premiership matches grouped by fixture date, under the Barclays Premiership node in the event tree. It looks something like this:

Soccer
    English Soccer
        Barclays Premiership
            Fixtures 23 February
                Fulham v West Ham
                Liverpool v Middlesbrough
                ...
            Fixtures 24 February
                Blackburn v Bolton
                Reading v Aston Villa
            Fixtures 25 February
                Man City v Everton

The Barclays Premiership event node has an ID that doesn’t change (2022802), so I can jump straight to that node and save myself the bother of having to navigate the Soccer and English Soccer parent nodes.

I’ll assume you already know how to create Service References for Betfair’s global WSDL, and skip straight on to creating some useful helper methods. I need to be able to call getEvents(), obviously:

private GetEventsResp GetEvents(int parentEventID)
{
    return m_global.getEvents(
            MakeEventRequest(parentEventID)).Result;
}

private getEventsIn MakeEventRequest(int parentEventID)
{
    return new getEventsIn(new GetEventsReq()
        {
            header = new APIRequestHeader()
            {
                sessionToken = m_sessionToken
            },
            eventParentId = parentEventID
        });
}

If you’re not used to C# 3.0, this is taking advantage of object initialisers to create nested objects without having to create a bunch of extra local variables. You can write the exact same method without object initialisers like this:

private getEventsIn MakeEventRequest(int parentEventID)
{
    APIRequestHeader header = new APIRequestHeader();
    header.sessionToken = m_sessionToken;
    GetEventsReq req = new GetEventsReq();
    req.header = header;
    req.eventParentId = parentEventID;
    return new getEventsIn(req);
}

The first thing I need to do is get a list of fixture nodes. I can do this by asking for child events of the Premiership node, and filtering for the events whose names start with the word ‘Fixtures’. This can be achieved with a simple regex and a bit of normal LINQ:

private List<BFEvent> GetPremiershipFixtureEvents()
{
    return GetEvents(PREMIERSHIP).eventItems.Where(
        ev => Regex.IsMatch(ev.eventName, "^Fixtures")
        ).ToList();
}

Assume PREMIERSHIP is a const int with the value 2022802. The Where() method works as a filter: you pass it a delegate, and it executes that delegate against each element of the sequence and returns a new sequence containing only the elements for which the delegate returned true. The ToList() call then copies that sequence into a list.

In this case, I’m creating the delegate with a lambda expression, which returns true for elements with an event name that is matched by the regex.

Now I’ve got the fixture events, I need to get the child events of each, which correspond to the actual matches. I want each call to be asynchronous so that they happen in parallel, rather than sequentially. I also want to wait for all calls to complete before continuing, so I use the WaitHandle.WaitAll() method:

private List<BFEvent> GetMatchEvents(
    List<BFEvent> fixtureDateEvents)
{
    List<BFEvent> matchEvents = new List<BFEvent>();
    var callbacks = (
        from ev in fixtureDateEvents
        select StartGetEvents(ev.eventId, matchEvents)
        ).ToList();
    WaitHandle.WaitAll(callbacks.ConvertAll(
                ar => ar.AsyncWaitHandle).ToArray());
    return matchEvents;
}

Here, the LINQ expression and the ConvertAll() method call are doing similar things – converting all elements of a list into another type. In the case of the LINQ expression, I am effectively obtaining a list of IAsyncResult objects by calling StartGetEvents() on each event in my list and storing the return value of each call. In the case of the ConvertAll() call, I am obtaining a list of WaitHandle objects by accessing the AsyncWaitHandle property of each IAsyncResult object in the list.

It is perfectly possible to replace the LINQ expression with a call to ConvertAll(), or the ConvertAll() call with another LINQ expression. Which one you use in cases like this is largely a matter of preference.
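
To illustrate that equivalence, here is a minimal sketch that performs the same projection both ways. It uses plain strings rather than the Betfair types, and the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ProjectionDemo
{
    // Projection written as a LINQ query expression.
    public static List<int> LengthsWithQuery(List<string> names)
    {
        return (from n in names select n.Length).ToList();
    }

    // The same projection written with List<T>.ConvertAll().
    public static List<int> LengthsWithConvertAll(List<string> names)
    {
        return names.ConvertAll(n => n.Length);
    }
}
```

Both methods return a list with one entry per input element, in the same order. One practical difference is that ConvertAll() is only available on List<T> (and arrays, via Array.ConvertAll), whereas the query expression works on any IEnumerable<T>.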

The StartGetEvents() method needs to make an asynchronous webservice call and append the results to the provided list. Since multiple threads are accessing the list, the write must be protected with a lock:

private IAsyncResult StartGetEvents(int parentEventID,
    List<BFEvent> matchEvents)
{
    return m_global.BegingetEvents(MakeEventRequest(parentEventID),
        delegate(IAsyncResult ar)
        {
            lock (matchEvents)
            {
                matchEvents.AddRange(
                    m_global.EndgetEvents(ar).Result.eventItems);
            }
        },
        m_global);
}

I am using an anonymous delegate for the callback here. All it does is lock the list and add the events contained in the response. Note that in production code you might want to be a bit more diligent about locking strategies and so on – I’ve written the code like this for conciseness, not necessarily for production-grade correctness.

Now the whole shebang can be invoked very simply:

var fixtures = GetPremiershipFixtureEvents();
GetMatchEvents(fixtures).ForEach(
        e => Console.WriteLine(e.eventName));

Note that the calling code is very clean and simple, and doesn’t care about threads or anything like that – all that async plumbing is nicely contained in the GetMatchEvents() and StartGetEvents() methods.

Bring In The New

So how can PLINQ help with this? Well, it lets me get rid of those GetMatchEvents() and StartGetEvents() methods, which contain all the fiddly async code and are easily the most complex methods in the code above.

First, I’ll create a simple task class which represents the task of getting events for a particular ID:

public class GetEventsTask
{
    private int m_parentEventID;
    private string m_sessionToken;

    public GetEventsTask(string sessionToken,
            int parentEventID)
    {
        m_sessionToken = sessionToken;
        m_parentEventID = parentEventID;
    }

    public List<BFEvent> GetEvents()
    {
        BFGlobalService svc = new BFGlobalServiceClient();
        APIRequestHeader header = new APIRequestHeader()
            { sessionToken = m_sessionToken };
        return new List<BFEvent>(svc.getEvents(
            new getEventsIn(new GetEventsReq()
            {
                eventParentId = m_parentEventID,
                header = header
            })).Result.eventItems);
    }
}

Once I’ve instantiated an instance of this class, a call to GetEvents() will get me all the child events for the specified parent node.

To use PLINQ, all I have to do is create an array of these task objects – one per fixture date – and use the AsParallel() extension method to specify that I want the task processing done in parallel:

    GetEventsTask[] tasks = (
            from ev in fixtureDateEvents
            select new GetEventsTask(m_sessionToken, ev.eventId)
            ).ToArray();
    var taskResults = (
            from t in tasks.AsParallel()
            select t.GetEvents()
            ).ToList();

Neat, eh? Note that PLINQ will also take care of deciding how many threads to use, neatly sidestepping the work I alluded to earlier.

One wrinkle is that my PLINQ statement results in a list of lists, so I need to flatten it out before returning.

List<BFEvent> matchEvents = new List<BFEvent>();
taskResults.ForEach(results => matchEvents.AddRange(results));
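
An alternative to the explicit AddRange() loop is LINQ’s SelectMany(), which flattens a sequence of sequences in a single expression. This is a sketch with a hypothetical helper name, shown on generic lists rather than the Betfair types.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FlattenDemo
{
    // Concatenates the inner lists, preserving their order.
    public static List<T> Flatten<T>(IEnumerable<List<T>> lists)
    {
        return lists.SelectMany(list => list).ToList();
    }
}
```

With this in place, the flattening step could be written as FlattenDemo.Flatten(taskResults), or inlined as taskResults.SelectMany(results => results).ToList().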

Obviously this is only scratching the surface, not only of PLINQ but of LINQ itself. Much more powerful expressions can be created with a little tweaking of the objects generated from the Betfair WSDL – but that’s a topic for another article.


Coding by Convention

I’ve been meaning for a while to have a play around with Ruby on Rails, on the basis that anything generating so much hype over the last year or two deserves some level of investigation, if only to see whether the hype is justified. So, I spent a couple of days working through Agile Web Development with Rails and, well, it’s pretty nice. I can certainly appreciate a development environment that goes to such endearing effort to do work for you without getting in the way – a fairly tricky balancing act. I came to the book with a working knowledge of Ruby but zero practical exposure to Rails, and on top of that I’m not a web developer so could not bring much contextual experience to the table. Despite this, I worked through the book and ended up with a functional book-store application in about 15 hours. Not too shabby.

So how does Rails achieve such power and productivity? The answer is largely that Rails, more so than pretty much any other development environment I’ve used, leverages the power of convention. That is, if you stay ‘on rails’ and behave the way Rails wants you to, then in return you get a great deal of functionality for free. A kind of technological “you scratch my back, and I’ll scratch yours”. If you structure your application as Rails expects, then Rails will automatically hook everything up for you. If you name your database tables as Rails wants you to, and create the primary/foreign key id columns that Rails expects, then Rails will take care of all your object-relational-mapping needs for you. Sounds like a good deal, yes?

Rails doesn’t expect you to jump through all these hoops yourself though. It provides a number of useful scripts that you can use to perform the common tasks you want to do, in the way that Rails wants you to do them. Probably the best example of this is when you first start a new project. You ask Rails to create an application for you, with the name you specify, then off it goes – and creates 45 files in 37 directories, without you having to lift a finger.

$ rails dummy
      create
      create  app/controllers
      ...
      <snip>
      ...
      create  log/development.log
      create  log/test.log
$ find dummy/ -type f |wc -l
45
$ find dummy/ -type d |wc -l
37

Compare this to a newborn ASP.Net application created using the Web Site wizard in Visual Studio 2005:

$ find WebSite1/ -type f |wc -l
2
$ find WebSite1/ -type d |wc -l
2

A pretty substantial difference. And if you stay within the confines of Rails’ expectations when adding to the project – which is very easy to do since you are provided with more generators for creating models, controllers, and migrations (basically incremental DB deployment scripts) – then you end up with a nicely structured application in accordance with the hallowed principles of MVC design, and everything is glued together automatically. Create a new data model, and your controller is immediately able to load it from the database along with all its relational buddies in a nice aggregated object structure with just one line of code (as long as you remembered to add all the has_many and belongs_to calls, of course). Store that data object in a controller member variable, and your views can access it for display. Use one of the magical rake incantations and get a DB-backed session management system which will horizontally scale in a load-balanced environment. Run the script/console script and you are dropped into a fully interactive command-line environment similar to irb, where you can instantiate and interact with all your objects dynamically. Tail the development log and you can see all the generated SQL as it is executed, and even get indicated performance in terms of theoretical request-per-second capacity. It’s all just fab. Nothing spectacularly new, of course; each individual feature has been done before, but Rails pulls them all together very nicely indeed.

As I worked through the aforementioned book, however, it was very clear that without the guiding instruction of the esteemed Dave Thomas and DHH I’d be up the creek without a paddle, and that got me thinking. Programming by convention is all great and frictionless and wonderful as long as you know the conventions. Imagine, if you will, the sheer blank incomprehension of a maintenance programmer who’s never heard of Rails, sitting down to tweak a Rails application.

Wait, what? How can this happen? Surely everyone has heard of Rails by now? Nope, sorry, but the truth is that the majority of programmers are clock-punchers living in a single-language world who don’t read blogs, or play around with tech in their own time, and haven’t even heard of Linux, let alone Rails. Their single language will likely be an everyday static language like Java or C#, which will leave them ill-prepared for many of the dynamic tricks in idiomatic Ruby.

Ah, but surely the kind of forward-thinking proto-company that builds its product on RoR would never hire non-Ruby-savvy developers anyway? That might be the case if you drink the 37signals Kool-Aid and think that any RoR company is by default über-smart and infallible, but in the real world it doesn’t work like that; there are countless tiny non-technical companies out there with just one or two developers – I know, because I spent a few years working at one – and maybe their current developers are cool enough to use RoR, but when they inevitably leave and the tech-illiterate management hire a replacement, you can guarantee that the job spec will not include minor details like “must have at least heard of Ruby on Rails”.

So, our imaginary maintenance guy – let’s call him Ted – hired by a non-technical company to look after a web application, peers for the first time into the 37 directories (assuming no new ones have been added) and >45 files (since new ones will most certainly have been added), and nothing makes any sense. Even assuming Ted is smart enough to make reasonable guesses about Ruby syntax, and knows what MVC is, there’s no visible link between the different layers of the application. It isn’t clear how data is shuttled to and from the database. It isn’t clear why things don’t always work as expected when Ted tries to manually add new things, rather than using the Rails scripts (which he doesn’t know about), even when diligently trying to emulate the structure and layout of existing code. It all seems like sorcery. What is Ted to do?

The correct answer is to go and buy a Rails book of course, or at least try and pick out the decent tutorials on the web (unfortunately, there’s a lot more chaff than wheat in this area, maybe a sign that Rails is becoming a bit more mainstream?). A few days of getting up-to-speed, and Ted achieves enlightenment and becomes mega-productive, and lives happily ever after. So coding by convention is a good thing, right?

Maybe.

I’m still uneasy about sorcery, and Rails is some of the most effective sorcery I’ve seen. The main problem is that, well, it’s sorcery. A couple of times in the 15 hours I spent going through Agile Web Development with Rails I hit problems. Not major ones, and always of my own making: some silly typo, or a mistake coming from only having a working knowledge of Ruby rather than cosy familiarity. As is my wont, failure to spot the error after a cursory glance through the code led to a quick google search to see if I’d hit a common problem, before resigning myself to going through the code in detail to sort it out (like all good programmers I’m a lazy devil).

On these periodic google jaunts I found lots and lots of forum posts and blog entries from people who, and let’s not mince words here, hadn’t the first clue what they were doing. People that had heard the Rails hype, bought the book (and probably the t-shirt), hit problems, and were now running around like a cargo cult expecting magic spells to solve all their problems. Restart WEBrick. rake db:sessions:clear. Roll the most recent migration back then forward again. None of these work? Sorry, can’t help. It reminds me of The IT Crowd‘s “have you tried turning it off and on again?”.

I shouldn’t be harsh on these folks; at least they’re getting excited by Rails and are rolling up their sleeves and having a go, and no doubt some of them will succeed wildly and become far better, richer, more attractive programmers than I can ever hope to be. Also let me be clear that I think the productivity gains of software like Rails are a good thing, and Rails is certain to account for a good chunk of my tinkering time for the next few months. It worries me, however, when people try to run before they can walk, and the magic of coding by convention tends to encourage it.

I’ll leave it as an exercise for the reader to consider the implications of the fact that the sample application being conjured here by all these sorcerers’ apprentices is an e-commerce site, at a time when online fraud is skyrocketing.

I don’t mean to single out Ruby on Rails specifically, by the way, it’s just handy as an example due to its profile. Coding by convention is not new; if you want an older example of what happens when people are given programming tools that allow them to get something working – for fairly loose definitions of ‘working’ – without knowing much about what’s happening under the hood, then look at the atrocities committed with VB and databinding over the years.

Steve Yegge has a characteristically long and insightful rant on this subject, and is troubled by the difficulty of working out where to draw the line. The line, in this case, being the level of abstraction at which a programmer should understand a system – high enough not to be bogged down in insane detail (e.g. knowing how semiconductors work) but not so high that the role of programmer is reduced to that of sideshow conjurer, waving a cheap trick-shop wand and trusting to a higher power that everything will work out OK.

Maybe it’s just a generational disease. Maybe in ten years’ time all the apprentices who have graduated to fully-fledged sorcerers will be looking on in dismay at the young scamps creating Web 5.0 applications using Ruby on MagLev simply by burping commands into their Skype headsets, and writing cautionary blogs about the dangers of not knowing how to write a partial web template.

Theoretically, a perfect system – perhaps a descendant of Rails in the dim and distant future – would contain such exquisitely crafted assumptions and such frictionless conventions that it would never go wrong and always do the right thing. Thus, the need to understand anything at the lower level of abstraction required to sort out any problems is obviated, unless you are one of the very few Grand High Wizards who keep everything running smoothly. I don’t know whether it’s fortunate or unfortunate that such a system is unlikely to appear within my lifetime.


Technical Book Club

Back in October, personal finance blogger Trent at The Simple Dollar started an online book club for one of his favourite finance books. Good idea, I thought, so I’m nicking it. Starting in January, I’m running a technical book club at work with a few .Net devs, and I’ll write everything up and post it here, so if you’re so inclined you can follow along at home.

To start with, we’ll be reading language-agnostic books covering the fundamentals of software development in the real world, since it’s always valuable to refresh knowledge on the cornerstones of modern professional coding; later on this can diversify into specific technologies and subjects with more arcane, academic, or abstract overtones. Another benefit of starting with the basics is that we can concentrate on getting the format right without feeling overwhelmed by unfamiliar material.

So, the initial batch of texts will cover object-oriented design, design patterns, refactoring, code quality, and so on. Later, the idea is to study less immediate (but still vital) subjects like functional programming, compiler design, operating systems, etc.; and also to gain deeper knowledge of common specific technologies, e.g. the inner workings of the CLR or a JVM. I suspect people like stevey will argue that these latter subjects are more important than the others and should be done first, and they might even be right, but I’ve picked my approach and I’m sticking with it, so nuts to you stevey.

So, here’s the early schedule and probable books. The order we do these books might change – in fact the books themselves might change if, for example, we decide that Fowler’s Enterprise Patterns is more appropriate than the GoF’s Design Patterns. Each book will be agreed for certain in good time for it to be ordered and delivered before the scheduled start date, obviously.

Topic            Book                                                            Start Date
OO Design        Object-Oriented Analysis and Design with Applications           14/01/2008
Design Patterns  Design Patterns: Elements of Reusable Object-Oriented Software  03/03/2008
Refactoring      Refactoring: Improving the Design of Existing Code              05/05/2008
Code Quality     Pragmatic Programmer                                            07/07/2008
Legacy Code      Working Effectively With Legacy Code                            01/09/2008

In addition to these, we’ll also cover one chapter of Code Complete per week. So, there it is. If you want to tag along, get yourself a copy of Booch’s Object-Oriented Analysis and Design with Applications and McConnell’s seminal Code Complete, and tune in next month.
