C# 3.0, Parallel LINQ, And The Betfair API - An Introduction

My pal Jan has a habit of waxing lyrical about the wonders of Parallel LINQ (PLINQ) as soon as you make the mistake of mentioning multithreading within earshot. I’ve been playing around with .NET 3.5 recently, and I write a lot of async code day-to-day to keep desktop webservice clients responsive while they make lots of webservice calls, so I thought it high time I took a closer look.

The Problem

A key goal for the kind of async work I do is to batch multiple calls up, so that I get all the responses at once. This is important for keeping the rest of the code clean. To illustrate, imagine you are writing an application against the Betfair API, and you have a screen that displays a market, your current profit and loss on that market, and your unmatched bets on that market. To populate this screen will require four API calls - getMarket(), getMarketPrices(), getMarketProfitAndLoss(), and getCurrentBets().

Now, the worst (though easiest) thing to do is make the four calls sequentially on the UI thread. The problem with this is it’s slow, and the UI freezes during the process (since you’re blocking on the UI thread), which is a lousy user experience.

A slightly better approach is to spin off a thread, and make the four calls there, raising an event on completion. This gets all the work off the UI thread and therefore keeps the application responsive, but it’s still slow as the calls are still sequential.

To speed it up, you can create a thread per call (so four threads in this case). There’s a whole lot of complexity around working out the optimum number of threads to use (depending on how many processors you have, how many simultaneous connections you are allowed to open, etc.) but that’s a bit beyond the scope of this post, so for now we’ll go with the one-thread-per-task approach and assume it’s optimal.

So, each thread makes one webservice call, and raises an event to signify that it’s finished. Simple, right? Unfortunately, this can lead to some real headaches in collating the data.

Imagine a user has hundreds of bets on the market, and therefore the getCurrentBets() call takes a bit longer to execute than the other three. The user clicks on a market, and the threads responsible for getting market data and P&L raise their events quickly, so you display the screen with the data you have and plan to display the bets as and when they arrive.

Before the bets are received, however, the user clicks on another market. Again, the market data and P&L come back quickly and you display them. Then, finally, the original getCurrentBets() call completes. But wait! You’ve moved on to another market now, so you don’t care about those bets any more! So you have to write some code to make sure that each piece of data received is still relevant. This can become very onerous very quickly, as you struggle to determine your UI state and work out what data you want and what should be discarded.
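
To make that concrete, here is a minimal sketch of the kind of relevance check I mean. MarketScreen, SelectMarket() and OnBetsReceived() are names I have made up for illustration, and the bets are plain strings rather than real Betfair types:

```csharp
using System;

// Hypothetical screen model: remembers which market is selected and
// drops any response that was requested for a different market.
public class MarketScreen
{
    private readonly object m_sync = new object();
    private int m_currentMarketId;

    public void SelectMarket(int marketId)
    {
        lock (m_sync) { m_currentMarketId = marketId; }
    }

    // Called from a worker thread when a response arrives. Returns true
    // if the data was accepted, false if it was stale and discarded.
    public bool OnBetsReceived(int requestedMarketId, string[] bets)
    {
        lock (m_sync)
        {
            if (requestedMarketId != m_currentMarketId)
                return false; // the user has moved on to another market

            Console.WriteLine("Showing {0} bet(s) for market {1}",
                bets.Length, requestedMarketId);
            return true;
        }
    }
}

public static class StaleResponseDemo
{
    public static void Main()
    {
        var screen = new MarketScreen();
        screen.SelectMarket(1);
        screen.SelectMarket(2); // user clicks a second market

        // The slow response for market 1 finally arrives and is dropped.
        Console.WriteLine(screen.OnBetsReceived(1, new[] { "bet A" }));
        // The response for the current market is accepted.
        Console.WriteLine(screen.OnBetsReceived(2, new[] { "bet B" }));
    }
}
```

One check like this is manageable. The trouble starts when every kind of response needs its own version of it.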

Now imagine that your application has timers firing all over the place to update prices and P&L on the market every second or two, so you have events being raised all the time.

I’ve worked with code that ventured down this path, and believe me, you don’t want to go there.

The Solution

The best approach is to batch these calls up, so that each happens on a separate thread, but only one event is raised - when all of the data has been received. That way, you can be sure that when you handle the event, all the data is consistent.
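
As a rough sketch of the idea (not the code I go on to write below), something like the following would do. CallBatch is a hypothetical helper of my own invention, and the "calls" are just delegates returning strings, standing in for real webservice calls:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: runs each call on its own thread and raises a
// single Completed event once every call has returned.
public class CallBatch
{
    public event Action<IDictionary<string, string>> Completed;

    private readonly Dictionary<string, Func<string>> m_calls =
        new Dictionary<string, Func<string>>();

    public void Add(string name, Func<string> call)
    {
        m_calls.Add(name, call);
    }

    public void Start()
    {
        var results = new Dictionary<string, string>();
        int remaining = m_calls.Count;
        if (remaining == 0) { Raise(results); return; }

        foreach (var entry in m_calls)
        {
            var name = entry.Key;   // copy loop values for the closure
            var call = entry.Value;
            new Thread(() =>
            {
                string result = call();
                lock (results) { results[name] = result; }

                // The last thread to finish raises the one and only event.
                if (Interlocked.Decrement(ref remaining) == 0)
                    Raise(results);
            }).Start();
        }
    }

    private void Raise(IDictionary<string, string> results)
    {
        var handler = Completed;
        if (handler != null) handler(results);
    }
}

public static class CallBatchDemo
{
    public static void Main()
    {
        var batch = new CallBatch();
        batch.Add("market", () => { Thread.Sleep(50); return "market data"; });
        batch.Add("bets", () => { Thread.Sleep(200); return "current bets"; });

        var done = new ManualResetEvent(false);
        batch.Completed += results =>
        {
            Console.WriteLine("Received {0} results together", results.Count);
            done.Set();
        };
        batch.Start();
        done.WaitOne();
    }
}
```

Note that the event is raised on whichever worker thread finishes last, so a real desktop client would still need to marshal back to the UI thread before touching any controls.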

Since this is one of the things that PLINQ does for you, it seems like a good candidate for kicking the tyres, so to speak. First, though, I’ll do a quick run through of how to do this without PLINQ, for comparison’s sake. The task will be to display a list of all the Premiership matches available on Betfair at the time the code runs.

Take Out The Old

Betfair list Premiership matches grouped by fixture date, under the Barclays Premiership node in the event tree. It looks something like this:

Soccer
    English Soccer
        Barclays Premiership
            Fixtures 23 February
                Fulham v West Ham
                Liverpool v Middlesbrough
                ...
            Fixtures 24 February
                Blackburn v Bolton
                Reading v Aston Villa
            Fixtures 25 February
                Man City v Everton

The Barclays Premiership event node has an ID that doesn’t change (2022802), so I can jump straight to that node and save myself the bother of having to navigate the Soccer and English Soccer parent nodes.

I’ll assume you already know how to create Service References for Betfair’s global WSDL, and skip straight on to creating some useful helper methods. I need to be able to call getEvents(), obviously:

private GetEventsResp GetEvents(int parentEventID)
{
    return m_global.getEvents(
            MakeEventRequest(parentEventID)).Result;
}

private getEventsIn MakeEventRequest(int parentEventID)
{
    return new getEventsIn(new GetEventsReq()
        {
            header = new APIRequestHeader()
            {
                sessionToken = m_sessionToken
            },
            eventParentId = parentEventID
        });
}

If you’re not used to C# 3.0, this is taking advantage of object initialisers to create nested objects without having to create a bunch of extra local variables. You can write the exact same method without object initialisers like this:

private getEventsIn MakeEventRequest(int parentEventID)
{
    APIRequestHeader header = new APIRequestHeader();
    header.sessionToken = m_sessionToken;
    GetEventsReq req = new GetEventsReq();
    req.header = header;
    req.eventParentId = parentEventID;
    return new getEventsIn(req);
}

The first thing I need to do is get a list of fixture nodes. I can do this by asking for child events of the Premiership node, and filtering for the events whose names start with the word ‘Fixtures’. This can be achieved with a simple regex and a bit of normal LINQ:

private List<BFEvent> GetPremiershipFixtureEvents()
{
    // Keep only the child events whose names start with "Fixtures".
    return GetEvents(PREMIERSHIP).eventItems.Where(
        ev => Regex.IsMatch(ev.eventName, "^Fixtures.*")
        ).ToList();
}

Assume PREMIERSHIP is a const int with the value 2022802. The Where() method works as a filter - you pass it a delegate, and it returns a sequence containing only the elements for which the delegate returned true. The sequence is evaluated lazily, so the delegate doesn’t actually run against each element until ToList() enumerates the results.

In this case, I’m creating the delegate with a lambda expression, which returns true for elements with an event name that is matched by the regex.
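
If you want to try the filter without a Betfair account, here is a self-contained sketch that runs the same kind of lambda over plain strings. The event names are sample data I have made up, and WhereDemo and FixturesOnly() are illustrative names only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhereDemo
{
    // Returns a lazily evaluated sequence of the names that start
    // with "Fixtures". Nothing is filtered until it is enumerated.
    public static IEnumerable<string> FixturesOnly(IEnumerable<string> names)
    {
        return names.Where(name => Regex.IsMatch(name, "^Fixtures"));
    }

    public static void Main()
    {
        // Made-up sample names, standing in for BFEvent.eventName values.
        var names = new[]
        {
            "Fixtures 23 February",
            "Top Goalscorer",
            "Fixtures 24 February"
        };

        // The foreach is what actually runs the lambda against each name.
        foreach (var name in FixturesOnly(names))
            Console.WriteLine(name);
    }
}
```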

Now I’ve got the fixture events, I need to get the child events of each, which correspond to the actual matches. I want each call to be asynchronous so that they happen in parallel, rather than sequentially. I also want to wait for all calls to complete before continuing, so I use the WaitHandle.WaitAll() method:

private List<BFEvent> GetMatchEvents(
    List<BFEvent> fixtureDateEvents)
{
    List<BFEvent> matchEvents = new List<BFEvent>();
    var callbacks = (
        from ev in fixtureDateEvents
        select StartGetEvents(ev.eventId, matchEvents)
        ).ToList();
    WaitHandle.WaitAll(callbacks.ConvertAll(
                ar => ar.AsyncWaitHandle).ToArray());
    return matchEvents;
}

Here, the LINQ expression and the ConvertAll() method call are doing similar things - converting all elements of a list into another type. In the case of the LINQ expression, I am effectively obtaining a list of IAsyncResult objects by calling StartGetEvents() on each event in my list and storing the return value of each call. In the case of the ConvertAll() call, I am obtaining a list of WaitHandle objects by accessing the AsyncWaitHandle property of each IAsyncResult object in the list.

It is perfectly possible to replace the LINQ expression with a call to ConvertAll(), or the ConvertAll() call with another LINQ expression. Which one you use in cases like this is largely a matter of preference.
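
To show that the two really are interchangeable, here is a small stand-alone sketch that performs the same conversion both ways on a list of integers. ProjectionDemo and its methods are illustrative names, nothing to do with the Betfair code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProjectionDemo
{
    // Conversion written as a LINQ query expression.
    public static List<string> ViaQuery(List<int> ids)
    {
        return (from id in ids select "event " + id).ToList();
    }

    // The same conversion written with List<T>.ConvertAll().
    public static List<string> ViaConvertAll(List<int> ids)
    {
        return ids.ConvertAll(id => "event " + id);
    }

    public static void Main()
    {
        var ids = new List<int> { 1, 2, 3 };

        // Both approaches produce the same elements in the same order.
        Console.WriteLine(ViaQuery(ids).SequenceEqual(ViaConvertAll(ids)));
    }
}
```

One practical difference: ConvertAll() only exists on List<T> and arrays, whereas the query expression works on anything enumerable.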

The StartGetEvents() method needs to make an asynchronous webservice call and append the results to the provided list. Since multiple threads are accessing the list, the write must be protected with a lock:

private IAsyncResult StartGetEvents(int parentEventID,
    List<BFEvent> matchEvents)
{
    return m_global.BegingetEvents(MakeEventRequest(parentEventID),
        delegate(IAsyncResult ar)
        {
            lock (matchEvents)
            {
                matchEvents.AddRange(
                    m_global.EndgetEvents(ar).Result.eventItems);
            }
        },
        m_global);
}

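I am using an anonymous delegate for the callback here. All it does is lock the list and add the events contained in the response. Note that I’ve written the code like this for conciseness, not for production-grade correctness. In particular, an IAsyncResult’s wait handle can be signalled before its callback has finished running, so WaitAll() may return while a callback is still adding events to the list. Production code should have each callback signal its own completion after it has updated the list.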
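
For what it’s worth, here is one way the more diligent version might look. This is a sketch under my own assumptions rather than a drop-in replacement: it swaps the generated Begin/End methods for ThreadPool.QueueUserWorkItem(), uses a fake FakeGetEvents() method in place of the webservice, and gives each work item its own ManualResetEvent which is only set once the shared list has been updated:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class WaitAllDemo
{
    // Stand-in for a slow webservice call (hypothetical).
    private static string[] FakeGetEvents(int parentId)
    {
        Thread.Sleep(100);
        return new[] { "match " + parentId + "a", "match " + parentId + "b" };
    }

    public static List<string> GetAll(int[] parentIds)
    {
        var results = new List<string>();
        if (parentIds.Length == 0) return results; // WaitAll rejects empty arrays

        var handles = new ManualResetEvent[parentIds.Length];
        for (int i = 0; i < parentIds.Length; i++)
        {
            int id = parentIds[i]; // copy for the closure
            var done = handles[i] = new ManualResetEvent(false);
            ThreadPool.QueueUserWorkItem(state =>
            {
                try
                {
                    var events = FakeGetEvents(id);
                    lock (results) { results.AddRange(events); }
                }
                finally
                {
                    done.Set(); // signal only after the list has been updated
                }
            });
        }

        // Blocks until every work item has signalled. WaitAll accepts
        // at most 64 handles, so this suits small batches only.
        WaitHandle.WaitAll(handles);
        return results;
    }

    public static void Main()
    {
        Console.WriteLine(GetAll(new[] { 1, 2, 3 }).Count);
    }
}
```

Because each handle is set inside the work item, after the lock has been released, the caller can safely read the list as soon as WaitAll() returns.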

Now the whole shebang can be invoked very simply:

var fixtures = GetPremiershipFixtureEvents();
GetMatchEvents(fixtures).ForEach(
        e => Console.WriteLine(e.eventName));

Note that the calling code is very clean and simple, and doesn’t care about threads or anything like that - all that async plumbing is nicely contained in the GetMatchEvents() and StartGetEvents() methods.

Bring In The New

So how can PLINQ help with this? Well, it lets me get rid of those GetMatchEvents() and StartGetEvents() methods, which contain all the fiddly async code and are easily the most complex methods in the code above.

First, I’ll create a simple task class which represents the task of getting events for a particular ID:

public class GetEventsTask
{
    private int m_parentEventID;
    private string m_sessionToken;

    public GetEventsTask(string sessionToken,
            int parentEventID)
    {
        m_sessionToken = sessionToken;
        m_parentEventID = parentEventID;
    }

    public List<BFEvent> GetEvents()
    {
        BFGlobalService svc = new BFGlobalServiceClient();
        APIRequestHeader header = new APIRequestHeader()
            { sessionToken = m_sessionToken };
        return new List<BFEvent>(svc.getEvents(
            new getEventsIn(new GetEventsReq()
            {
                eventParentId = m_parentEventID,
                header = header
            })).Result.eventItems);
    }
}

Once I’ve created an instance of this class, a call to GetEvents() will get me all the child events for the specified parent node.

To use PLINQ, all I have to do is create an array of these task objects - one per fixture date - and use the AsParallel() extension method to specify that I want the task processing done in parallel:

    GetEventsTask[] tasks = (
            from ev in fixtureDateEvents
            select new GetEventsTask(m_sessionToken, ev.eventId)
            ).ToArray();
    var taskResults = (
            from t in tasks.AsParallel()
            select t.GetEvents()
            ).ToList();

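Neat, eh? Note that PLINQ also decides how many threads to use, neatly sidestepping the work I alluded to earlier. Bear in mind that its choice is based on the number of processors, which suits CPU-bound work better than calls that spend most of their time waiting on the network.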
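
Here is a self-contained sketch of the same query shape that you can run without the Betfair service. It assumes the PLINQ API as it shipped in .NET 4 (AsParallel() and WithDegreeOfParallelism() in System.Linq), which may differ from the preview bits, and FakeGetEvents() is a made-up stand-in for the webservice call:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public static class PlinqDemo
{
    // Stand-in for GetEventsTask.GetEvents() (hypothetical): sleeps to
    // imitate network latency, then returns two child events.
    private static List<string> FakeGetEvents(int parentId)
    {
        Thread.Sleep(100);
        return new List<string>
        {
            "match " + parentId + "a",
            "match " + parentId + "b"
        };
    }

    public static List<List<string>> GetAll(IEnumerable<int> parentIds)
    {
        // WithDegreeOfParallelism() is optional. It caps the number of
        // concurrent workers; without it PLINQ picks a value itself.
        return (
            from id in parentIds.AsParallel().WithDegreeOfParallelism(4)
            select FakeGetEvents(id)
            ).ToList();
    }

    public static void Main()
    {
        var results = GetAll(new[] { 1, 2, 3, 4 });
        Console.WriteLine("{0} result lists", results.Count);
    }
}
```

The results are not guaranteed to come back in input order unless you also call AsOrdered(), which doesn’t matter here since I only want the full set of matches.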

One wrinkle is that my PLINQ statement results in a list of lists, so I need to flatten it out before returning.

List<BFEvent> matchEvents = new List<BFEvent>();
taskResults.ForEach(results => matchEvents.AddRange(results));
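
An alternative, if you would rather stay within LINQ, is the SelectMany() operator, which flattens a sequence of sequences in a single call. A minimal sketch, using strings in place of BFEvent objects and a made-up FlattenDemo class:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FlattenDemo
{
    // Flattens a list of lists into one list, preserving order.
    public static List<string> Flatten(List<List<string>> nested)
    {
        return nested.SelectMany(inner => inner).ToList();
    }

    public static void Main()
    {
        var nested = new List<List<string>>
        {
            new List<string> { "Fulham v West Ham", "Liverpool v Middlesbrough" },
            new List<string> { "Blackburn v Bolton" }
        };

        foreach (var match in Flatten(nested))
            Console.WriteLine(match);
    }
}
```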

Obviously this is only scratching the surface, not only of PLINQ but of LINQ itself. Much more powerful expressions can be created with a little tweaking of the objects generated from the Betfair WSDL - but that’s a topic for another article.

