Archive for the ‘ Mathematics ’ Category

Project Euler Problem 7

Problem 7

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.

What is the 10001st prime number?

Ah, what a nice, straightforward, unambiguous spec! If only business software specifications were so precise.

Way back in problem 3, I took a bit of a wander off-topic and built a prime generator in .Net using the Sieve of Eratosthenes. Armed with this, problem 7 should be easy, right? The sieve implementation generates an IEnumerable, which is non-indexable (i.e. I can’t just say Primes()[10001]), but I can take the first 10,001 and then ask for the last element, which will be the answer to the problem.

There’s a problem with this, however. The sieve requires an upper bound during initialisation. This means it’s great for solving problems like “generate all the primes less than 10,001″, but not so great at answering questions like “what is the 10,001st prime number?”, since it requires foreknowledge of the upper bound.

To illustrate the problem, I’ll take a wild guess at the upper bound. I’m going to guess that the 10,001st prime number is less than 99,999. What happens?

var sieve = new SieveOfEratosthenes(99999);
var result = sieve.Primes().Take(10001).Last();

This generates an answer of 99,991. If I enter this into the Project Euler website, however, it tells me the answer is wrong. Gah! What went wrong? A simple test reveals the problem:

var sieve = new SieveOfEratosthenes(99999);
var primes = sieve.Primes().Take(10001);
var count = primes.Count();

There’s only 9,592 primes generated! As the docs for Take() state (emphasis mine):

Take<TSource> enumerates source and yields elements until count elements have been yielded or source contains no more elements.

Damn. So, looks like my 99,999 guess was too small – with that as an upper bound, the sieve only finds 9,592 primes, and I need the 10,001st. OK, I’ll bump it up by an order of magnitude:

var sieve = new SieveOfEratosthenes(999999);
var result = sieve.Primes().Take(10001).Last();

This gives me the correct answer. Not exactly a wonderful solution though; the idea of having to guess the upper bound is pretty horrendous, and if this was real code it wouldn’t be particularly maintainable – what if the requirements changed and we had to find the nth prime, which happened to be >99,999? We’d have to guess again. Ugh.

Worse, the sieve algorithm precomputes all the primes up to the specified upper bound, meaning that in the above approach I’ve asked the sieve to generate primes up to 999,999 (all 78,498 of them!) despite only needing 10,001. Not very efficient.

Fortunately, the upper bound can be calculated separately. Where n>8601, as in this case, we can use the following equation:

p(n) < n (loge n + loge loge n - 0.9427)

where p(n) is the nth prime number.

Alternatively, for flexibility in handling n<8601, we can use the less accurate

p(n) < n loge loge n

which works for n>5. We can easily precompute the answers for n<=5, or simply calculate on demand.

The formula can be implemented on the sieve class, with a factory method to help when we want to use it:

public static SieveOfEratosthenes
    CreateSieveWithAtLeastNPrimes(int n)
{
    return new SieveOfEratosthenes((long)
            Math.Ceiling(UpperBoundEstimate(n)));
}
 
private static double UpperBoundEstimate(int n)
{
    return n * Ln(n) + n * (Ln(Ln(n)));
}
 
private static double Ln(double n)
{
    return Math.Log(n, Math.E);
}

This leaves us with an overall solution like so:

var sieve = SieveOfEratosthenes.CreateSieveWithAtLeastNPrimes(10001);
var result = sieve.Primes().Take(10001).Last();

This generates a total 10,018 primes, cutting the wasted effort from almost 70,000 superfluous primes to just 17, and takes around 20ms to execute on my machine. Plenty fast enough, I think.

  • Share/Bookmark

Project Euler Problem 6

Onwards to…

Problem 6

The sum of the squares of the first ten natural numbers is,

12 + 22 + … + 102 = 385

The square of the sum of the first ten natural numbers is,

(1 + 2 + … + 10)2 = 552 = 3025

Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 385 = 2640.

Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.

Bit of a disappointment, problem 6; it’s too easy. It’s rated as the third-easiest, i.e. easier than problems 3, 4, and 5 which I’ve already covered. In fact, for my money it’s easier than problem 2 as well. Ah well, the difficulty ramps up soon enough, trust me.  Here’s the very simple python solution:

sum_sq = sum([ x*x for x in xrange(1, 101)])
sq_sum = sum(xrange(1, 101)) ** 2
 
print sq_sum - sum_sq

As you can see, it’s pretty intuitive. You sum the squares, square the sum, and calculate the difference. The answer is basically in the description, you just have to scale up a little.

There’s not much else to say about this one. Even if I abandon the functional approach and write a straightforward imperative solution it’s still very straightforward. In (deliberately non-idiomatic, so don’t whine at me) ruby:

sum_of_squares = 0
sum = 0
 
1.upto 100 do |x|
    sum_of_squares += x * x
    sum += x
end
 
p (sum * sum) - sum_of_squares
  • Share/Bookmark

Project Euler Problem 5

On to the next Project Euler problem (after a bit of a hiatus)…

Problem 5

2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.

What is the smallest number that is evenly divisible by all of the numbers from 1 to 20?

In common with many of the other Euler problems, there’s a brute-force way to solve this, and a clean algorithmic way. And in common with my other Euler posts so far, I’ll start with the brute-force way ;-)

This problem can be tackled head-on with the following approach: Start from n=1 and increment in a loop. Test each value of n by attempting to divide it by all numbers m from 1 to 20. The first number to pass the test (i.e. n mod m is 0 for all values of m) is the answer.

private static long BruteForceSolver()
{
    long result;
    for (result = 1; !Check(result); ++result)
        ;
    return result;
}

private static bool Check(long result)
{
    for (int i = 1; i <= 20; ++i)
    {
        if (result % i != 0)
            return false;
    }

    return true;
}

This works, but it takes >12 seconds to execute on my PC, so it’s not what you’d call efficient (though it is well within the Euler execution time guidelines).

Some speed gains can be achieved by exploiting the information provided in the question itself. We are told that 2520 is the lowest number evenly divisible by all numbers from 1 to 10. Since the problem space (1 to 20) includes all these numbers, the answer must also be evenly divisible by 2520. This allows much bigger increments each loop – rather than incrementing by 1, why not increment by 2520? And since the answer must be greater than or equal to 2520, why not start the loop there instead of 1? Finally, since we already know that 1 to 10 divide evenly into 2520, each inner loop only needs to check numbers 11 to 20.

That should speed things up a bit:

private long BruteForceSolver()
{
    long result;
    for (result = 2520; !Check(result); result += 2520)
        ;
    return result;
}

private bool Check(long result)
{
    for (int i = 11; i <= 20; ++i)
    {
        if (result % i != 0)
            return false;
    }

    return true;
}

And indeed, on my machine this is now down to 150ms or so. It’s still not a very nice way to tackle the problem, though.

Thinking about it from a different angle yields an altogether smarter approach. Imagine we are looking for the lowest number evenly divisible by the numbers 1 to 2.

[1, 2]

Well that’s easy; since there are only two numbers we just find the lowest common multiple (LCM), which in this case is 2 (since 2 % 2 == 0, and 2 % 1 == 0). If we call this sequence s1, we can say that LCM(s1) = 2.

OK, now imagine we are solving the same problem for s2, which contains the numbers 1 to 3.

[1, 2, 3]

You’ll notice that s2 contains s1 in its entirety. LCM(s2) must therefore be a multiple of LCM(s1), so we can rewrite s2 as [LCM(s1), 3], or [2, 3] (since we know LCM(s1) = 2). Now we are down to two numbers again, so we can calculate the LCM of 2 and 3, which is 6, so LCM(s2) = 6.

OK, now we solve the problem for the first 4 numbers (s3).

[1, 2, 3, 4]

This sequence contains s2, therefore LCM(s3) is a multiple of LCM(s2). We can rewrite s3 as [LCM(s2), 4], or [6, 4]. Thus, LCM(s3) = 12.

This can be repeated as many times as necessary. Generally, we have sn = [LCM(sn-1), n+1] where n > 0.

This looks recursive, but a better way to think of it is as an excellent example of a fold. A fold is one of the fundamental tools of functional programming. In fact, it is perhaps the most fundamental, since map, filter etc can be implemented as right folds[1].

I won’t inflict my pitiful Photoshop skills on anyone by trying to graphically represent a fold – try looking at this Wikipedia article if you want to try and visualise it.

Broadly, the behaviour of a fold is to apply a combining function to elements in a list (or other data structure) and accumulate the results. That’s exactly what we want here – our combining function is LCM, and our accumulating value is the LCM of the whole list. Effectively, for list s3 above, we have

LCM(s3) = LCM(LCM(LCM(1,2),3),4) = 12

Note how the result of the innermost LCM (applied to values 1 and 2) becomes a parameter to the next LCM, which in turn becomes a parameter to the outermost LCM which returns the result we want.

By using a fold, we can generalise. In Haskell, the whole problem is a one-liner:

foldl lcm 1 [1..20]

The 1 passed in as a parameter represents the terminating value to use when the end of the list is reached. It is common for this value to be the first element of the list, so Haskell provides a convenience function that removes the need to specify it as a parameter:

foldl1 lcm [1..20]

Not all languages and platforms provide an LCM function right out of the box, so to take this neat Haskell solution and port it to .Net, the LCM function needs to be implemented. This is easily done in terms of the greatest common divisor (GCD) like so:

.Net doesn’t provide a GCD function either, so I’ll implement it using Euclid’s Algorithm as an extension method on long ints:

public static long GCD(this long a, long b)
{
    while (b != 0)
    {
        long tmp = b;
        b = a % b;
        a = tmp;
    }

    return a;
}

With GCD defined, LCM can be implemented as above:

public static long LCM(this long a, long b)
{
    return (a * b) / a.GCD(b);
}

With this in place, it’s a simple matter to use .Net’s equivalent of fold – a method on IEnumerable called Aggregate – to get the answer[2]:

return LongEnumerable.Range(1, 20)
    .Aggregate(1L, (curr, next) => curr.LCM(next));

And indeed, the same basic pattern can be used to solve the problem in a number of languages. In F#, given implementations of LCM and GCD as above, we have:

List.fold_left lcm 1 [1..20]

And in ruby:

require 'rational'
(1..20).inject { |c, n| c.lcm n }

Given that the right algorithm makes this problem a fairly trivial expression in all these languages, it’s pretty hard to identify which is the nicest. I think overall I’ll give the nod to Haskell, however, for not making me implement LCM and because I find ruby’s ‘inject’ a less intuitive function name than foldr (but that’s probably because I learned the technique in Haskell in the first place and am set in my ways…)


[1]For example, in F#:

let filter p lst =
  List.fold_right (fun x xs -> if p x then x::xs else xs) lst []

let map f lst =
  List.fold_right (fun x xs -> f x :: xs) lst []

[2]Note that in this code LongEnumerable is just a very simple partial reimplementation of Enumerable, using longs instead of ints

  • Share/Bookmark

Project Euler Problem 4: Extra

Couple of things to add to yesterday’s post about problem 4. As is so often the case in life, no sooner had I finished the article than I realised there was an obvious additional step I could make, which I’d somehow failed to spot.

Regarding the C# solution, an easy win having implemented the Reverse extension method would be to add an IsPalindrome extension to the string class too. The implementation is straightforward:

public static bool IsPalindrome(this string s)
{
    return s == s.Reverse();
}

With this done, the where clause in the LINQ query is more readable, and we have a couple of handy reusable string extensions into the bargain.

var result = (from product in AllProducts.From(100).To(999)
              where product.ToString().IsPalindrome()
              select product).Max();

Also, Sol commented that the C code could have a direct implementation of a palindrome function, rather than messing about with strrev, since the implementations are very similar. Whilst this series isn’t really focussed on the performance benefit of this approach, it does also make the code more expressive, so I’ll include it:

int strpalindrome(char* s) {
    char *s1, *s2;

    s1 = s2 = s;
    while (*s2)
        s2++;

    while (s1 < s2) {
        if (*(s1++) != *(--s2))
            return 0;
    }

    return 1;
}

The loop now looks like this:

    for (i = 100; i < 1000; ++i) {
        for (j = i; j < 1000; ++j) {
            int sum = i * j;
            sprintf(s1, "%d", sum);
            if (strpalindrome(s1) && sum > largest) {
                largest = sum;
            }
        }
    }

If you are interested in the Project Euler problems but craving more detailed analysis, Joel Neely is working through at a similar rate to me, but focusing his efforts on Scala and studying each problem and its solution in greater depth rather than flitting from language to language. Highly recommended.

  • Share/Bookmark

Project Euler Problem 4

Problem 4 is as follows:

A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.

Find the largest palindrome made from the product of two 3-digit numbers.

Bit of an easy one, this. The approach is pretty simple to understand – first calculate all the products of every pair of numbers between 100 and 999, then filter for the palindromic ones, and finally select the largest. The only even vaguely tricky bit is determining if the number is palindromic. The easiest check is to simply convert the number to a string, and check if the string is equal to itself when reversed.

To shake myself out of C#/python complacency, I decided to write my first attempt at this in good old C. I’m a bit rusty so this took a few goes to get right (the shame).

First, I need a string reverse function. For those of us that learned to program when C and C++ were king (before that upstart Java came long and ousted languages with pointers from classrooms up and down the land), this is bread and butter.

char* strrev(char* s) {
    char *s1, *s2;
    char c;                           

    s1 = s2 = s;
    while (*s2)
        s2++;                           

    while (s1 < s2) {
        c = *(--s2);
        *s2 = *s1;
        *s1++ = c;
    }
    return s;
}

If you can’t read that, shame on you, go and pick up a copy of K&R and read it until you weep. In the meantime, basically what happens here is I set pointers to the start (s1) and end (s2) of the original string (s), then swap the pointed-to characters using a temporary variable (c) and move both pointers 1 character towards each other. Repeat until they meet in the middle.

In this day and age of immutable strings, this old friend now feels a little weird – although I retain (and eventually return) the original pointer, I have in fact modified the actual string that was passed in. Contrast this with C’s trendy modern progeny, where you can’t change a string at all and have to use a StringB[uilder|uffer] when mutating strings (unless lousy performance makes you smile, of course).

Still, now it is fairly simple to solve the problem. A nested for loop will let me calculate all the products, and then I just need to convert the results to strings and do the palindrome test. I keep track of the largest palindromic number found so far, and print it at the end.

int main() {
    int i, j;
    int largest = -1;
    char s1[7];
    char s2[7];                           

    for (i = 100; i < 1000; ++i) {
        for (j = i; j < 1000; ++j) {
            int sum = i * j;
            sprintf(s1, "%d", sum);
            strcpy(s2, s1);
            strrev(s2);
            if (strcmp(s1, s2) == 0 && sum > largest) {
                largest = sum;
            }
        }
    }                           

    printf("%d\n", largest);                           

    return 0;
}

Note I’m using the much-maligned strcpy function here (the cause of most of the buffer overflow attacks that were so endemic a few years back), but since I completely control the input it’s no problem. Also note the use of sprintf to convert the int to a string, and I have to make a copy of the resulting string since my strrev function is destructive. The char arrays are of size 7, since the largest possible product in the problem space is 999*999=998001 which is 6 digits – plus 1 for the null terminator. Which I didn’t forget about at all in my first attempt at this, nosirree.

To make the code a bit more ‘modern’ I could do the allocation and copy in the strrev function, so that the passed-in string remains unchanged and a new string gets returned, but without a garbage collector to rely on this potentially leads to memory leaks (since strrev allocates the memory but relies on the caller to free it – easy to forget!). Anyway, I’m wallowing in nostalgia here so who cares about modern idioms.

Whilst I chose C for this on a whim, it is (as always) enlightening to write a little C now and then, as it reminds you of the cost of things that are often taken for granted. Some people still don’t understand why a StringBuilder offers better performance (trust me, I have interviewed more than a few of them), and are happy to write string manipulation code using immutable strings that results in countless allocations and destructions taking place, for no justifiable reason. Writing some string manipulation (or anything else) in C is a nice way to regain a bit of insight and perspective if you are spoiled by quad-core PCs and high-falutin’ generational garbage collectors and smartass runtimes that don’t let you write past the end of an array.

So, now we’ve got a nuts ‘n’ bolts reference implementation, let’s look at some more exotic approaches.

As regular readers will have noticed by now, I’m fairly keen on LINQ – so it should be no surprise that C# is my next port of call. I was amused to recall that the .Net string class lacks a Reverse method so I had to write my own for this, about 20 minutes after I finished pontificating about the clean healthy virtue of doing so in C! (I didn’t plan it this way, honest.)

There are of course many ways of writing a string reversal routine, but rather than attempt to mimic the fairly idiomatic C code above, I did the simple thing:

public static string Reverse(this string s)
{
    char[] ch = s.ToCharArray();
    Array.Reverse(ch);
    return new string(ch);
}

Since the Array class contains a Reverse method, I can just convert my string to an array (of chars), reverse that, then create a new string from it. Done. Things to meditate on regarding this approach:

  1. It relies on a char array. Strings may look like a native type these days, but I have to expose their dark and shameful lineage to get the job done here.
  2. The throwback nature of this implementation does not, of course, extend as far as modifying the parameter. A new string is returned and the original remains intact. The garbage collector will take care of deallocation.
  3. This is an extension method on System.String, so I can use it naturally on any string.

Whatever faults it may have, it’s definitely easier to read than the C code, since the syntax is much closer to the problem domain. This is a recurring theme when looking at expressiveness. The C code has to specify the entire algorithm for reversing the string, whereas here the Array.Reverse method allows us to ignore the details of how the string is reversed. For the purposes of this problem, we don’t really care how the string is reversed, just that it is reversed.

It’s still warty, however, in that we have to know to turn the string into an array first, which may be completely non-intuitive to someone who’s never tangled with C-style strings.

With this minor omission from the .Net libraries sorted, the problem can be solved with a single compound LINQ query:

return (from product in
            (
             from i in Enumerable.Range(100, 900)
             from j in Enumerable.Range(i, 1000 - i)
             select i * j)
        where product.ToString() == product.ToString().Reverse()
        select product).Max();

I think that’s pretty concise, and quite readable too. The main thing I don’t like about it is the use of Enumerable.Range, which takes ‘start’ and ‘count’ parameters rather than ‘from’ and ‘to’, which would look more natural in this case. 

Parameters aside, it’s interesting to note the relative clumsiness of the twin calls to Enumerable.Range. Back when looking at problem 1, replacing a for loop with a more declarative alternative made the code considerably more expressive. In this case, however, I don’t think it helps quite so much. Once again, it’s to do with the nature of the problem domain – a nested for loop is quite a natural way to represent the process of generating the products, so the benefit of a declarative approach is less marked.

How to improve it? For fun, lets go the whole hog and make a simple fluent interface to improve readability of the LINQ query. This is what we want:

return (from product in AllProducts.From(100).To(999)
        where product.ToString() == product.ToString().Reverse()
        select product).Max();

Nifty, huh? Well actually I have my reservations about fluent interfaces, but it’s quite the fashion these days so I thought I’d give it a chance. The example above is trivial to achieve. On a class called AllProducts we need a static method called From which acts as a factory method, and an instance method called To which returns an IEnumerable<int> for use in the LINQ query. The class looks like this:

class AllProducts
{
    private int m_from;
    private AllProducts(int @from)
    {
        m_from = @from;
    }                    

    public static AllProducts From(int from)
    {
        return new AllProducts(@from);
    }                    

    public IEnumerable<int> To(int to)
    {
        int inclusiveTo = to + 1; // to is inclusive                    

        return from i in Enumerable.Range(
                   m_from, inclusiveTo - m_from)
               from j in Enumerable.Range(i, inclusiveTo - i)
               select i * j;
    }
}

Fluent interfaces are definitely this season’s black. I heard about a guy who wrote a mocking library without using fluent interfaces for expectations, and 500 angry TDD advocates chased him out of the building with pitchforks.

I’m still a bit wary though, tweedy programmer as I am – I just get a bit nervous about writing hideously contorted classes with a mixture of static and instance methods, some returning the this ref, some returning arbitrary IEnumerables, some acting as factories – and all so the calling code can prance about in a tailored coat and a cool pair of shades. A noble goal, to be sure, but is the price too high? I guess we’ll know in a year or two when some maintenance programmer has to try and debug it.

Enough of the lousy clothes metaphor. To finish up what turned out to be a longer post than expected, here’s an F# solution I hacked together before getting sidetracked with the whole fluent thing. I read a blog post here about problem 4 (warning – also contains solution to problem 6) but didn’t like it too much. It seems to be quite common when reading F# code on the web for there to be a reliance on Seq.unfold – I’m not sure it’s always the right tool. Then again, my F# is sketchy at best for the moment, so maybe I should shut up. For balance, however, this is my solution without using unfold.

Note first that F#, as a .Net language, also lacks a built-in way to reverse strings. The implementation is extremely similar to the C# approach above:

let rev (s:string) =
    let ch = s.ToCharArray()
    let ra = Array.rev ch
    let r = new string(ra)
    r

The actual implementation involves two helpers: a small recursive function to produce all possible products of the numbers contained in two lists, and a simple check to see if a string is equal to itself reversed.

let Euler4 =
    let rec allProducts l1 l2 =
        let mul a lst =
            List.map (fun x -> a * x) lst
        match l1 with
        | [] -> []
        | h::t -> (mul h l2) @ (allProducts t l2)               

    let isPalindrome x =
        let str = Int32.to_string x
        str = rev str

With the plumbing in place, we can use the very groovy forward pipe operator to chain together some very readable code. The only thing to note in here is the reverse ordering of the parameters in the call to compare – this is because we want the results in descending order, so that the largest is at the head of the list and easily accessible with List.hd.

    allProducts [100..999] [100..999]
        |> List.filter isPalindrome
        |> List.sort (fun x y = compare y x)
        |> List.hd
  • Share/Bookmark