Marshalling a Variable-Length Array From Unmanaged Code In C#

I recently spent time working on some C# code to interact with a simple DNS-SD system. This requires using DNS TXT records, which are not supported in the System.Net.Dns class. After a few google searches failed to turn up a pure .Net client library that met my needs, I settled on an approach based around p/invoking the Win32 DnsQuery function.

And quickly ran into problems.

For DNS TXT records, DnsQuery returns a DNS_TXT_DATA structure in the Data field of the DNS_RECORD structure. DNS_TXT_DATA is declared like this:

typedef struct {
    DWORD dwStringCount;
    PWSTR pStringArray[1];
} DNS_TXT_DATA,
    *PDNS_TXT_DATA;

Using the very handy P/Invoke Interop Assistant, we see that this struct can be represented like this in managed code:

[StructLayout(LayoutKind.Sequential)]
public struct DNS_TXT_DATA {
 
    /// DWORD->unsigned int
    public uint dwStringCount;
 
    /// PWSTR[1]
    [MarshalAs(UnmanagedType.ByValArray,
            SizeConst=1,
            ArraySubType=UnmanagedType.SysUInt)]
    public IntPtr[] pStringArray;
}

There is a problem with pStringArray, unfortunately. The System.Runtime.InteropServices.Marshal class cannot marshal a variable length array, as it needs to know in advance how big the array is in order to allocate memory. That’s why the managed structure needs SizeConst specified in the MarshalAs attribute.

However, if the DNS TXT record data contains multiple quoted strings separated by whitespace, DnsQuery will return a structure with a variable number of elements in pStringArray. Since SizeConst is set at compile-time, when we marshal this into the managed struct defined above, we only get the first element in our single-element array. Rats.

More googling turned up very little info on dealing with this, though I found indications that others had run into the same problem without finding a satisfactory conclusion. DnsQuery is not the only Win32 function that returns variable-length arrays, and p/invoking any of the others has the same issue.

Simply declaring SizeConst to be bigger than we need – “hey, I know I’ll never get more than 10 or so strings back, so why not declare SizeConst to be 128?” – is inelegant (hardcoded upper limits, ugh) and doesn’t work properly anyway. Since the struct layout is sequential the marshaller will copy over (e.g.) 128*sizeof(IntPtr) sequential bytes (a total of 512 bytes, in this case). That much memory was never allocated on the unmanaged side, so we end up with a load of junk in the tail of pStringArray, and more often than not the marshaller chokes on this junk and throws an AccessViolationException. Fun.

There IS a way to get round the problem, though. I’m not sure it’s the best way, but it works and seems stable, so I thought I’d throw it out there in case anyone else can use it (or maybe explain to me why it’s an unsafe stupid thing to do…)

Basically, since we’re dealing with sequential memory, we can use Marshal.PtrToStructure to marshal the DNS_TXT_DATA structure as defined above, then use pointer arithmetic to gain access to any further data that needs marshalling.

Pointer arithmetic? Oh yes, even in the safe and secure world of managed code it’s sometimes still necessary to get our hands dirty, and situations like this illustrate that it will always be valuable to have some hard-earned Assembly/C/C++ war wounds.

So, assuming we have valid p/invoke declarations and data structures (I’ve included a complete source program below), DnsQuery is called like so:

var pServers = IntPtr.Zero;
var ppQueryResultsSet = IntPtr.Zero;
var ret = DnsQuery(domain,
        DnsRecordType.TEXT,
        DnsQueryType.STANDARD,
        pServers,
        ref ppQueryResultsSet,
        IntPtr.Zero);
if (ret != 0)
    throw new ApplicationException("DnsQuery failed: " + ret);

If we examine the memory location of ppQueryResultsSet (Ctrl-Alt-M,1 or Debug->Windows->Memory->Memory1 in Visual Studio) we’ll see something like the following (actual address locations may vary – just copy the int value of ppQueryResultsSet to the Address bar of the memory window):

0x049E0878  00 00 00 00  ....
0x049E087C  b8 09 9e 04  ¸.ž.
0x049E0880  10 00 20 00  .. .
0x049E0884  19 30 00 00  .0..
0x049E0888  00 00 00 00  ....
0x049E088C  00 00 00 00  ....
0x049E0890  06 00 00 00  ....
0x049E0894  b8 08 9e 04  ¸.ž.
0x049E0898  d8 08 9e 04  Ø.ž.
0x049E089C  f8 08 9e 04  ø.ž.
0x049E08A0  28 09 9e 04  (.ž.
0x049E08A4  68 09 9e 04  h.ž.
0x049E08A8  88 09 9e 04  ˆ.ž.

I’ve set the column size to 4 here, as most of the values we are dealing with are 4 bytes in size. This effectively shows one value per line.

The first 6 rows (24 bytes) correspond to the DNS_RECORD structure up until (but not including) the DNS_TXT_DATA structure in DNS_RECORD’s Data union. We can marshal this first structure without problem:

var dnsRecord = (DnsRecord) Marshal.PtrToStructure(
        ppQueryResultsSet, typeof (DnsRecord));

The DNS_TXT_DATA structure starts at address 0×049E0890 in my example. Having already marshalled the DNS_RECORD structure, now I want a pointer to the DNS_TXT_DATA structure. I can do this by creating a new pointer at the address of ppQueryResultsSet plus 24 bytes, and marshalling again:

var ptr = new IntPtr(
        ppQueryResultsSet.ToInt32() + Marshal.SizeOf(dnsRecord));
var txtData = (DNS_TXT_DATA) Marshal.PtrToStructure(
        ptr, typeof (DNS_TXT_DATA));

Because of the definition of DNS_TXT_DATA, this only marshals 8 bytes – 4 bytes for dwStringCount, and 4 bytes for the single element in pStringArray (an IntPtr). Since we know the memory is sequential, however, this gives us everything we need – we now know how many strings have been received (6 in this case, as indicated at 0×049E0890), and the location of the pointer to the first string (0×049E0894).

With this info, we can marshal all the pointers into an array with a length of dwStringCount:

ptr = new IntPtr(ptr.ToInt32() + sizeof (uint)); // move to first
var ptrs = new IntPtr[txtData.dwStringCount]; // dest array
Marshal.Copy(ptr, ptrs, 0, ptrs.Length);

And finally we iterate through those pointers, marshalling the string pointed at by each:

var strings = new List<string>();
for (var i = 0; i < ptrs.Length; ++i)
{
    strings.Add(Marshal.PtrToStringAnsi(ptrs[i]));
}

While the example I’ve presented here is specific to DnsQuery, the general approach should be applicable to any situation where you need to marshal a data structure containing a variable-length array.

Source code

  • Share/Bookmark

Project Euler Problem 8

Problem 8

“Find the greatest product of five consecutive digits in the 1000-digit number.

73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450″

The first step here is to find a representation for that fairly humungous number. Obviously it’s not going to fit into a paltry 32-bit int…but then we don’t need it to. The problem description requires us to think in terms of smaller (5-digit) numbers, not one giant 1000-digit number.

So, it is sufficient for us to consider the number as an enumerable stream of single digits, which we can conveniently represent as IEnumerable. I could use a macro to convert the number into a collection initialiser, but it’s much easier to treat the string as an IEnumerable<char> and let LINQ do the heavy lifting.

var nums = Enumerable.AsEnumerable(
        "73167176531330624919225119674426574742355349194934" +
        "96983520312774506326239578318016984801869478851843" +
 
        // ..... etc etc ......
 
        "05886116467109405077541002256983155200055935729725" +
        "71636269561882670428252483600823257530420752963450"
        ).Select(x => Convert.ToInt32(x.ToString()));

This gives us an IEnumerable<int> containing every digit in the 1000-digit number. Now, the ‘obvious’ way to solve the problem is to iterate through the collection, and at each index multiply the value against the next four indexes. A simple loop should deal with it:

private static int SimpleSolver(int[] ints)
{
    int max = 0;
    for (int i = 0; i < ints.Length - 4; i++)
    {
        int tmp = ints[i] * ints[i + 1] * ints[i + 2]
            * ints[i + 3] * ints[i + 4];
        max = Math.Max(max, tmp);
    }
 
    return max;
}

As ever, though, that’s pretty ugly – the loop condition and product calculation is tied to the sequence size of 5, and messing with an index variable is tedious.

An alternative approach is to take advantage of LINQ’s Skip and Take methods to split the problem domain into overlapping ’slices’. Similar to the for loop above, the core of the approach is to iterate through the digits, and at each digit grab a number of subsequent digits and calculate the product.

Lets look at the 5-digit slices available from the first 10 digits:

  7   3   1   6   7   1   7   6   5   3
|       73167       |
    |       31671       |
        |       16717       |
            |       67176       |
                |       71765       |
                    |       17653       |

We can use Skip to progressively move the starting index forward, and Take to grab the 5 digits we need. So, starting with i=0, each successive slice can be sliced from the whole with:

var slice = ints.Skip(i++).Take(5);

To calculate the product of the digits in the slice, we can use the Aggregate operation:

slice.Aggregate(1, (curr, next) => curr*next);

We’ve met Aggregate before – it’s basically a fold, which collapses a sequence to a single item by repeatedly applying an operation to an accumulating result.

This can all be wrapped up as an iterator block, like so:

private static IEnumerable<int> EnumerateSlices(
        IEnumerable<int> ints, int sliceSize)
{
    int i = 0;
    while (true)
    {
        var slice = ints.Skip(i++).Take(sliceSize);
 
        if (slice.Count() < sliceSize)
            yield break; // end
 
        yield return slice.Aggregate(1,
                (curr, next) => curr*next);
    }
}

Note the termination condition – when we have enumerated every slice, our next slice will contain only 4 elements (3, 4, 5, and 0 from the end of the sequence) – that’s our cue to exit the loop.

Also note that this approach makes the algorithm trivial to parameterize – it will work just as well with slice sizes other than 5.

This iterator will produce an IEnumerable containing the products of all slices, so the final step is to select the largest:

var result = EnumerateSlices(nums, 5).Max();
  • Share/Bookmark

Killing and Reviving an Aspire One

I just spent 2 hours reviving my Aspire One netbook after inadvertently killing it whilst fiddling about configuring dropbox. I found the whole process unnecessarily fiddly and information on the interwebs to be a bit scarcer than I would have liked, so I’m documenting it here in case I need it in the future. Hopefully it’ll be useful to someone else too.

So, the cause of death was a typo when trying to set up the dropboxd daemon to start automatically on boot. I’m not running nautilus so couldn’t use one of the prepackaged releases, and it’s completely my fault that I made a mess of installing the vanilla x86 build.

After making the fatal change and rebooting, the system would only boot up to a blank black screen with a default X mouse cursor. This is because the system was trying to run my broken command, failing, and therefore never getting to the main desktop.

In the world of normal linux, there’s all sorts of ways of dealing with this, but despite plenty of googling I couldn’t find a way to use run-level 2 or 3 on an Aspire One, and the Ctrl+Alt+F1-F6 key combos for switching away from X to a terminal don’t work either. There seems to be no way of preventing the system following the same doomed process over and again if you break X.

Frustrated, I thought about using the restore disk, but that’s a nuclear option – it re-paves the whole machine, so bye-bye data. That seemed a bit drastic when all I needed to do was edit a single text file to fix the system.

Ironically, this was happening as a result of me trying to install a file sync system as a simple backup. Grr.

Still, like countless thousands before me, I was saved by a live linux distro – in this case, a USB bootable one (since the Aspire One has no optical drive). Following the instructions[1] at pendrivelinux I created a bootable Feather Linux USB drive, and booted the netbook from it by hitting F12 on the post screen and selecting to boot from the USB stick.

At the boot prompt, I used ‘knoppix 3′ to boot the system up to a command line, mounted /dev/hdc1 as an ext2 filesystem, and fixed my typo. Reboot, and tada! Everything was working again (well, after hitting Fn-F7 to reenable the touchpad, which I had accidentally disabled whilst mashing the keyboard in frustration at the sight of a blank screen about an hour earlier, heh).

[1] Note that I had to use a newer version of syslinux than the one referenced on pendrivelinux. This one worked for me.

  • Share/Bookmark

Macros: You Oughta Know

One of the most useful tools available in any decent text editor is the macro recorder, but it’s criminally underused. It seems most people either don’t know the functionality exists, or simply ignore it. This is a shame, since it’s a great timesaver.

I don’t know why macros are so underused. It might be a mindset thing – it can take a little while to develop the ability to spot repetitive editing tasks quickly (i.e. not when you’re 75% of the way through thinking dang, I have to do this again?), so maybe many people never quite make the leap.

It’s worth it though, because once you get your eye in you see chances to use macros everywhere.

I had a useful example just yesterday, in which I needed to make a change to a colossal switch statement (220 branches! Run the cyclomatic-complexity doohickey on THAT!) and had no unit tests to fall back on.

If I had to modify (and hopefully refactor) such a huge construct I wanted to be able to compare before-and-after test results, but I didn’t much fancy hand-cranking a few hundred unit tests.

By recording a temporary macro, however, it took just a couple of minutes to cover every branch. I’ve decided to post a detailed walkthrough of the process here in the hopes that a fairly simple example will be illustrative for those that don’t already lean heavily on macros.

Note that this is not an advanced tutorial. Please refrain from leaving snarky comments about how macros are so much more powerful than this – I’m just doing some introductory material here :-)

Here is a representative snippet of the C# source. It’s part of a legacy permissioning system that, under certain circumstances, needs to check for the existence of a permission represented by an enum against a permission table containing a string-based hierarchy (application/role/permission). The code I was modifying did the appropriate conversion:

    case PermissionKey.SecurityParameterManagementAdd:
    {
        return new string[] {"Security", "ParameterManagement", "Add"};
    }
    case PermissionKey.SecurityRoleManagementAdd:
    {
        return new string[] {"Security", "RoleManagement", "Add"};
    }
    case PermissionKey.SecurityRoleManagementModify:
    {
        return new string[] {"Security", "RoleManagement", "Modify"};
    }
    case PermissionKey.SecurityUserManagementDelete:
    {
        return new string[] {"Security", "UserManagement", "Delete"};
    }
    case PermissionKey.SecurityPermissionManagementAdd:
    {
        return new string[] {"Security", "PermissionManagement", "Add"};
    }

I needed to add a couple of branches to this, but I also wanted to tidy up the code by removing the superfluous braces, as a precursor to converting it into something a bit more robust and maintainable. I wanted unit test coverage to give me confidence that I hadn’t mucked up some logic and inadvertantly granted admin access to the helpdesk trainee role or something.

So, I copied the entire switch body into Notepad++ (well, vim really, but I’ll pretend it’s Notepad++ for the sake of making this post a bit more accessible) and set to work[1].

Before recording my macro, I needed to do a bit of preprocessing to trim the code down to just the data I wanted to work with. The following steps show the ‘find’ regexes I used (in each case, the value of the replace field was empty, so these are effectively deletes), and the effect on the first switch branch from the list above:

1) Remove opening and closing braces from every switch branch:

^\s+[\{\}]$
    case PermissionKey.SecurityParameterManagementAdd:
 
        return new string[] {"Security", "ParameterManagement", "Add"};

2) Remove blanks – TextFX/Edit/Delete Blank Lines

    case PermissionKey.SecurityParameterManagementAdd:
        return new string[] {"Security", "ParameterManagement", "Add"};

3) Remove case statements and leading whitespace:

^\s+case\s+
PermissionKey.SecurityParameterManagementAdd:
PermissionKey.SecurityParameterManagementAdd:
        return new string[] {"Security", "ParameterManagement", "Add"};

4) Remove colon from end of case statement:

:$

5) Remove return statement and leading whitespace:

^\s+return new string\[\]\s*

I ended up with a sequence of couplets looking similar to this one:

PermissionKey.SecurityParameterManagementAdd
{"Security", "ParameterManagement", "Add"};

Now the fun starts – lets walk through the process.

We want to convert the first couplet into a simple unit test fixture, and record the process. This will be our macro – the instructions for converting one couplet into one unit test. We can then play the macro multiple times to convert all the others effortlessly.

Start by moving the cursor to the start of the line, before the ‘P’ of PermissionKey. This is the start point of the macro, so for the macro to be repeatable we must make sure that we finish recording the macro in perfect position to run it again, i.e. before the ‘P’ of PermissionKey for the next couplet (column 0 line 3). Hit Ctrl-Shift-R to start recording.

It is important not to use the mouse when editing – stick to the keyboard. It’s also important not to record keystrokes that are too specific to one bit of code. For instance, don’t use the arrow keys to move left and right character-by-character, because it won’t work on longer or shorter lines.

Instead, use the Home and End keys to jump to the start or end of the line, and hold Ctrl whilst arrowing left or right to move a word at a time instead of a character at a time (this is one of the areas where vim’s movement commands really differentiate it from wannabes like Notepad++…but I digress). See the ‘Detailed Instructions’ section below for more information.

Assume the original switch body is in a method called ‘LookupEnumPermission’. The couplet should be edited to look like this (without the linewrap…):

[Test]
public void TestSecurityParameterManagementAdd()
{
    string[] result = LookupEnumPermission(
            PermissionKey.SecurityParameterManagementAdd);
    Assert.AreEqual("Security", result[0]);
    Assert.AreEqual("ParameterManagement", result[1]);
    Assert.AreEqual("Add", result[2]);
}

Make sure you finish by moving the cursor into position for the next couplet, and hit Ctrl-Shift-R again to stop recording.

Now, hit Ctrl-Shift-P to play back the macro. If you’ve done everything right, the next couplet should magically format itself into a unit test. Hit Ctrl-Shift-P again, and the next couplet will change too. Under the Macro menu, select ‘Run a macro multiple times…’ and you can enter a fixed number of iterations, or just apply the macro over and over again until the end of the file is reached.

Finally, you can copy the unit tests into a new or existing test fixture, and you’re done! In much less time (hopefully) and with fewer errors than if the tests had been written one-by-one.

Detailed Instructions:

These are ley-by-key instructions in Notepad++, in case something in the description above is unclear. Visual Studio should be similar. Vim will be faster once you’ve learned how, but I’ll assume if you use vim you’re already au fait with this sort of editing :-)

  1. Type [Test], and hit enter to start a new line.
  2. Type ‘public void Test’ and hit Enter.
  3. Type ‘{‘ and hit Enter, then Tab.
  4. Hold Ctrl and tap the right arrow twice to jump over a couple of words and place the cursor at the start of the word SecurityParameterManagementAdd, then hold Ctrl-Shift and right arrow again to select the word. Ctrl-C to copy, then arrow up two lines and paste it after the word ‘Test’ to create the full function name TestSecurityParameterManagementAdd. Type () for the empty parameter list.
  5. Arrow down two lines and hit Home to jump to the start of the line. Type ’string[] result = LookupEnumPermission(‘, then hit End to jump to the end of the line and type ‘);’.
  6. Arrow down one line, hit Home, then Tab. Type ‘Assert.Equals(‘ then hit Delete to remove the ‘{‘. Hold Ctrl and move right three times (to move the cursor just past the comma) and type ‘result[0]);’ and hit Enter.
  7. Repeat variations of step 7 a couple of times to convert the next two lines. Remember to use the correct indexes (result[1] and result[2]). Hit Enter after the last line and type ‘}’ to close the function body.
  8. Arrow down one line and hit Home to place the cursor at the correct start position for the next couplet, and end the macro by hitting Ctrl-Shift-R again.

[1]I could have just done this in a new file in Visual Studio, but for some reason I find VS intolerably slow at running macros once recorded. So slow, in fact, that you can watch the cursor laboriously complete each step – I wind up thinking it would have been quicker to do it manually. That might just be something odd about my VS installation though, as no-one else seems to think it’s slow.

  • Share/Bookmark

OOD – The Second Coming?

Over the last couple of months I’ve been burning up my free time on a pet project (hence the scarcity of posting here). This particular project is a web application, and since I’ve always been a desktop or middle-tier dude in my day job, it’s a bit of a step out of my normal environment to grapple with browser compatibility and suchlike.

Still, I wanted it to be a decent learning experience, so after a brief dalliance with Rails I scrapped the idea of using any framework sorcery and decided to write everything in plain ol’ PHP, and lean heavily on YUI and jQuery to sort out the browser stuff.

This probably isn’t an approach I’d use in future projects, but I bet I’ll appreciate those frameworks a lot more once I’ve encountered and understood the problems they attempt to solve.

So, having strayed from the comfort of Rails and its clones, I had to think about lots of things like security, validation, data access, and how to organise my code. Just because I’d abandoned the training wheels I had no intention of falling over all the time – I still wanted a nice, maintainable app with sensible  abstractions, properly decoupled, and resilient to failure. Time to start reading articles and the odd open source project, obviously.

It’s at this point I noticed something interesting. Since I regularly read plenty of development websites it could scarcely have escaped my notice that the trendy framework players (e.g. Rails, Django, Cake, ASP.NET MVC) strongly advocate the MVC pattern and class-based object-oriented design. What I hadn’t really realised until now is how endemic that viewpoint had become.

In fact, beyond a few admirably out-there frameworks like Seaside, it’s almost universal. OOD = good, EVERYTHING ELSE = bad. MVC = good, EVERYTHING ELSE = bad. No shades of grey, no room for dissenting opinion.

Go anywhere where best-practices are discussed and mention you’re writing some procedural code, and watch the fireworks. It doesn’t matter if your application has fewer lines of code than a newly-created Rails app has source files – if you haven’t structured it with models, views, and controllers you may as well have written it in Visual Basic for all the bile you’re going to have thrown at you.

If you say you’re writing functional code, you might get away with it, since functional programming is still widely misunderstood and you’ll likely be classified as some weird LISPer or Schemer doing something arcane and thus ignored.

Ironically, of course, if you grab a random Rails/Django/Cake app from github or Google Code, there’s a pretty fair chance that what you’ll find isn’t particularly object-oriented anyway. Hint – usage of the ‘class’ keyword does not an object-oriented design make. And sweet zombie Jesus, I’ve never seen such abuse of the singleton pattern. That’s a sure sign someone doesn’t ‘get’ OO – the singleton pattern is evil and basically a way to shoehorn globals into an application without admitting it to your friends.

So, we have massive fanatical advocacy of a technique that will allegedly solve all your problems, coupled with large-scale real-world misunderstandings and misapplication. Does this remind anyone of anything? Say, for example, the last time OOD swept the world, panacea to all programming woes, about 20 or so years ago?

Don’t get me wrong, I’m not arguing that class-based OO itself is just a fad – it deserves its place as a paradigm alongside procedural, functional, parallel, and numerous others. It’s the heralding of OO as the one true way that seems faddish.

So, in the very best “bah, humbug” traditions I’ve written my app in unashamedly procedural PHP code. I don’t mix my presentation and content. My data layer is decoupled and unit tested. Every bit of SQL is a parameterised query, to guard against injection. I don’t have a single echo() statement containing any html tags. All my errors are exception based, and I don’t have a single die() call anywhere. My average function size is about 10 lines, and my longest is about 20. I’ve no doubt that there’s a legion of 15-year-old self-appointed geniuses ready to accuse me of inflicting yet more spaghetti code junk on the world just because “find . -iname ‘*php’ | xargs grep class” comes up empty, but hey I’m OK with that.

I’m writing my next app in brainf*ck using ed.

  • Share/Bookmark