Archive for the ‘Mathematics’ Category

Project Euler Problem 3

Monday, April 7th, 2008

Next up in the list of Project Euler problems is this one:

Problem 3

The prime factors of 13195 are 5, 7, 13 and 29.

What is the largest prime factor of the number 600851475143?

This, obviously, is a factorisation problem. There is a colossal amount of material on the web for dealing with prime factorisation - a simple google search pulls up lots of information. Prime factorisation (and the difficulty of doing it with sufficiently large numbers) is at the heart of the cryptographic methods we currently use on the internet - every time you buy something from Amazon, you are protected by the fact that evil black-hats can’t find the prime factors of your encryption key fast enough to steal your credit card number (OK, bit of a generalisation, but that’s the gist).

One of the key phrases in the above paragraph is ’sufficiently large numbers’. For a computer, 600851475143 is not a particularly big number, so this problem can be brute-forced fairly easily. Of course, not all brute force approaches are created equal. The most naive algorithm would be something along the lines of a three-pass sweep - firstly test every single number between 2 and 600851475143 to see if it divides cleanly into 600851475143 (pass 1); then test each factor from pass 1 to see if it is prime (pass 2); and finally take the biggest of the pass 2 numbers to get your answer (pass 3).

This would work, but it sucks.

Fortunately, it’s easy to optimise. Let the prime factors of our number N be f1, f2 … fn. If I start with the lowest prime number and work up from there looking for a factor, I know that the first factor I find will be prime (since if it wasn’t prime, it would have factors of its own, which by definition would also be factors of our target number). This number is f1. I can divide the target number by f1 and then factorise the result to find f2. Continuing this process will result in a list of prime factors, and then it’s simply a case of selecting the largest.

I can optimise further by not resetting the factor to the lowest prime number each time - since having found f1 I know that there aren’t any smaller factors, so I don’t have to waste time looking for them. Here’s the implementation in python:

def primeFactors(n, factor):
    factors = []
    while (n % factor != 0):
        factor = factor + 1

    factors.append(factor)

    if n > factor:
        factors.extend(primeFactors(n / factor, factor))

    return factors

print max(primeFactors(600851475143, 2))

Note that in the recursive call the current factor is retained, so that the code doesn’t repeat itself.

This executes pretty quickly, but it could be better. For a start, since 600851475143 is odd there’s no need to start with the only even prime number (2). Instead, I could just start at 3, and in the while loop skip over even numbers. This would cut the number of tested numbers in half.

A more efficient trial division approach, however, would be to generate a list of primes, divide 600851475143 by each prime to find the prime factors, then simply select the largest. To use this solution, a prime number generator is needed.

This is an interesting diversion - I’ve peeked at some of the other Project Euler problems and know that prime numbers will pop up again, so it may prove useful to have a generator handy for when I get to those. Some languages, like Ruby, have library functions that can give you primes, but other languages don’t. If you’re not interested in generating primes and just want to know the answer to problem 3, execute the code above and you’re free to get down from the table.

A Random Walk Off-Topic

The simplest way to generate primes is known as the Sieve of Eratosthenes after the Greek mathematician who invented it. In principle it’s straightforward - take a list of all integers up to an arbitrary limit, then starting from 2 (the smallest prime), mark all the numbers that are multiples of 2. Then move to the next unmarked number (i.e. 3) and mark all the multiples of 3. Then you move to the next unmarked number (5, since 4 was marked as a multiple of 2) and mark all multiples. And so on, until you get to the end of your list. Whatever numbers remain unmarked are all the primes up to your arbitrary limit.

.Net lacks a built-in prime generator, so to demonstrate the algorithm I’ll create a simple C# implementation. The list of numbers is represented as an array of booleans, all set to true by default except indexes 0 and 1 (since we aren’t interested in evaluating those numbers as prime).

The other requirement for a funky contemporary .Net implementation is, of course, to expose the results with IEnumerable. This achieves two things - firstly, it lets the sieve class control enumeration and thus skip over the marked numbers (making the calling code cleaner), and secondly it lets me use LINQ to query it.

So, here’s the code:

public class SieveOfEratosthenes
{
    private bool[] m_numbers;

    public SieveOfEratosthenes(long limit)
    {
        m_numbers = new bool[limit + 1];
        for (long l = 2; l < m_numbers.LongLength; ++l)
        {
            m_numbers[l] = true;
        }

        for (int i = 2; i != -1;
                i = Array.FindIndex(m_numbers, i + 1,
                    b => b == true))
        {
            for (int j = i * 2; j < m_numbers.Length; j += i)
                m_numbers[j] = false;
        }
    }

    public IEnumerable<long> Primes()
    {
        for (long i = 2; i < m_numbers.LongLength; ++i)
            if (m_numbers[i])
                yield return i;
    }
}

Fairly straightforward. Basically, I start by marking 2 as prime. Then, an inner loop sets all multiples of 2 to false, since no (other) even numbers are prime. Each time round the loop, we find the next true element of the array (which will have a prime index), and mark all multiples false as per the description above. The loop terminates when FindIndex fails to find any more true elements.

This results in an array where the only elements with a value of true are those with a prime index. This makes the actual IEnumerable generator very easy to write - it yield returns the index whenever it finds a true element.

There’s a problem with this code, however, that makes it unusable with Euler problem 3 (at least in .Net - hence why I called it a ‘diversion’ earlier, rather than an alternative solution). In .Net, you can’t create an array with 600851475143 elements, since an array with 600851475143 elements is way above the maximum array size limit of 2GB. Even if each element is only a single byte, 600851475143 bytes is about 560GB.

Therefore, you can’t create a sieve big enough to solve the problem.

When using trial division it seems that it is enough to only generate primes up to the square root of N, though there are cases when this is not true (e.g. where N=15, sqrt(N) = ~3.873, but the largest prime factor of 15 is 5), and I don’t have the maths (yet!) to know how big a fudge-factor is needed. I’ve seen a solution on the Project Euler forum that generates primes up to sqrt(N) + 10, which solves the example of N=15 above, but does it solve ALL cases? Another approach might be to generate the list of primes such that the largest prime in the list is the first prime > sqrt(N) - but now I’m completely guessing.

Still, for numbers that fit inside the 2GB limit, we can find the largest factor easily.

var sieve = new SieveOfEratosthenes(15);

Now I have my IEnumerable sieve, I can craft a LINQ query to find the largest factor. All I need to do is filter my list of primes for those which divide directly into 15 and call the Max() method:

return (from p in sieve.Primes()
        where 15 % p == 0
        select p).Max();

Done! And I have a handy reusable prime generator for later on.

Project Euler Problems 1 and 2

Saturday, March 22nd, 2008

Browsing through Nate Hoellein’s blog recently led me to Project Euler. This is a problem - I have a horrendous feeling I’m about to get addicted to it, to the cost of just about everything else that normally occupies my free time. Ack.

Still, at least it provides some blogging material. I’m going to start working my way through the list, and try to create idiomatic solutions in a number of languages. I won’t always look for the most efficient solution, since I’m also interested in expressiveness (see here and here for previous posts on the subject).

To start with, here’s some code and thoughts for problems 1 and 2.

Problem 1

Add all the natural numbers below 1000 that are multiples of 3 or 5.

This is generally regarded as the easiest Euler problem, so shouldn’t present too many problems. Mainstream software development is still dominated by imperative languages and styles, so the most recognisable solution to this would be a straightforward for-loop. Here is an imperative C# solution:

int result = 0;
for (int i = 0; i < 1000; ++i)
{
    if (i % 3 == 0 || i % 5 == 0)
        result += i;
}

Simple enough, but as with all for-loops the guts are a little too visible. I have to explicitly declare and increment an accumulator variable as well as the loop counter. A functional style (Haskell in this case) allows a more declarative solution:

sum $ filter (\n -> n `mod` 3 == 0 || n `mod` 5 == 0) $ [1..999]

Or, with list comprehensions:

sum [n | n <- [1..999], n `mod` 3 == 0 || n `mod` 5 == 0]

In both cases, the loop is replaced by a list generated from Haskell’s range operator. [1..999] creates a list containing every integer between 1 and 999 inclusive. The modulo test is basically the same, though Haskell lacks a modulo operator (% in most C-family languages) so the mod function is used instead.

Just for fun, here’s an F# solution too:

let sum = List.fold_left (+) 0
let mod35 = fun x -> x % 3 = 0 || x % 5 = 0

List.filter mod35 [1..999] |> sum

This could be compressed into a one-liner like the Haskell solutions, but it would be a bit long for my taste. Also note F# is slightly hamstrung by the lack of a built-in sum function, so I have to define my own using fold. Another F# solution is here, but I prefer mine. There’s a very nice snippet in the comments of that page, though, which I like even more:

Seq.fold1 (+) {for i in 1..999 when i % 3 * i % 5 = 0 -> i}

Interestingly, C# is gaining some fairly powerful functional techniques lately, in particular LINQ. I can use the new Enumerable class to mimic range syntax and filter functionality from other languages, and lambdas to keep the code concise.

Enumerable.Range(1, 999).Where(
        f => f % 3 == 0 || f % 5 == 0).Sum();

Note the similarity between F#/Haskell’s lambda syntax and that of C#. It’s very cool that a mainstream C-derivative language is getting this sort of syntax added to it.

Alternatively, I could use LINQ query expressions for a different approach:

var nums = from n in Enumerable.Range(1, 999)
           where n % 3 == 0 || n % 5 == 0
           select n;
nums.Sum();

Fun!

It should be noted that all the Project Euler problems I’ve seen so far have mathematical solutions, meaning if you are able to classify the problem correctly it is straightforward to work out the answer with pen and paper. In this case, the problem is based around an arithmetic progression, and there are powerful formulae for reasoning about those. If you’re interested, check out the forum for problem 1.

Problem 2

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

Find the sum of all the even-valued terms in the sequence which do not exceed four million.

Ooh, Fibonacci. I’ve been here before. Using the Haskell code from that post makes problem 2 a snip:

fib = 0 : 1 : zipWith (+) fib (tail fib)
sum $ filter even $ takeWhile (<4000000) fib

Given the lazy Fibonacci generator in the first line, this just uses standard Haskell functions from the Prelude to do all the work - reading right to left, takeWhile pulls data from the fib sequence until the test fails (i.e. we’ve reached 4,000,000), filter even does exactly what it says on the tin, and sum does the business on the result.

C# could solve problem 1 very neatly - can it keep up the pace in problem 2? Actually, yes it can, after a fashion. The lazy Fibonacci generator can be implemented using the yield statement added in C# 2.0. This is much more efficient than the naive recursive solution I looked at in my previous post about Fibonacci sequences. Once I have the generator, the LINQ statement is very concise and quite similar to the Haskell code - C# 3.0 even has TakeWhile!

IEnumerable<long> Fibs()
{
    long a = 0, b = 1;

    while (true)
    {
        yield return b;

        b = a + b;
        a = b - a;
    }
}

Fibs().TakeWhile(f => f < 4000000).Where(f => f % 2 == 0).Sum();

The more I use C# 3.0, the more I like it - there’s quite a bit of power in there.

As with problem 1, there are some fascinating mathematical tricks that can be utilised when solving problem 2, and I recommend you check out the forum. It’s particularly cool to see how the Golden Ratio can be brought into play when working with Fibonacci sequences - I had no idea these techniques existed. So much to learn!