Programming Pearl - Generating k Random Integers

Logical Moon

2015-09-15

.net

I mentioned in a previous post that I have begun reading Programming Pearls by Jon Bentley. There’s so much to say about this book but one solution in particular really caught my eye and at first, it didn’t make much sense to me. The question was this:

How could you generate a file of k unique random integers between 0 and n - 1, in random order?

It makes sense to try to fully understand the question before jumping into any answers, so, what is it asking us to do…really? Well, I won’t be generating any files because that’s not important to the problem but what are k and n? It looks like I need to produce k (as in a total) random numbers but they can be chosen from a population which is n - 1 in size. Put another way, if k was 5 and n was 10, I would end up with 5 random numbers which could be anything between 0 and 9, but crucially, they would be unique (i.e. no duplicates).

I’d just started reading the book, it was late after a long day, it was hot (add in 30 other excuses), so I jumped to the answer at the back and it made no sense to me despite appearing short and simple :-)

for i = [0, n)
    x[i] = i
for i = [0, k)
    swap(i, randint(i, n-1))
    print x[i]

Reading the above, it felt like one of those situations when someone says a sentence to you and whilst you know what each word means individually, the whole sentence might as well be in Latin. To decipher this, let’s start with some notation. Bentley states in the preface that:

…left and right parentheses denote open ranges (which do not include the end values), and left and right square brackets denote closed ranges (which do include the end values).

Ah! So, re-interpreting that first for loop, we get something like this in C#:

int n = 10;
var x = new int[n];
for (int i = 0; i < n; i++)
{
    x[i] = i;
}

Giving us an array of integers, each containing a value which matches the index. Shown graphically, it looks an awful lot like:

Index  Value
[0]    0
[1]    1
[2]    2
[n]    n

i, then, is just a loop counter. n the total number of random values to pick from and x, the integer array where it is all stored. So even though I might only want k values, I’m creating an array of n size. Hmm. What about the next for loop? I see that it first mentions the variable k, which was referred to in the question and this loop seems to go from 0 to k - 1 (because of the rounded bracket). Next, whilst going through the loop, it then swaps two things around (we know from the solution text – not shown above – that it is swapping elements in x) before printing the value in the array at x[i]. I think we need to start padding this out, implementing the parts we are sure about.

int k = 5;
for (int i = 0; i < k; i++)
{
    Swap The Two Elements in X;
    Console.WriteLine(x\[i\]);
}

There are various tricks for swapping values in place, but I will keep this simple so that I can focus on the algorithm. I notice though that we need a random number (randint) which I’m guessing is between i and n - 1. Or is that inclusive? Hmm. Not sure.

Let’s start with the random part and handle the swapping afterwards. Fortunately, the .NET libraries are quite generous and I can use something like this to generate the random integers:

static int randint(int start, int end)
{
    Random r = new Random();
    return r.Next(start, end);
}

People with tweezers will notice that I am continually creating the Random object. I know. They might also realise that I’m most likely hampering the true random element by not seeding it. I know that too :-) Don’t worry. This is just for fun and not meant to be fully operational; the point is to understand the algorithm. Oh, one last thing. Am I generating numbers between 0 and n - 1? The following comment answers that:

//Arguments: minValue -- the least legal value for the Random number.
//maxValue -- One greater than the greatest legal return value.

OK, so I’ve got my random numbers. What about swapping those elements? I’m going to deviate ever so slightly from the solution because I don’t want to make my array a global object, giving me:

static void swap(int[] arr, int pos1, int pos2)
{
    int tmp = arr[pos1];
    arr[pos1] = arr[pos2];
    arr[pos2] = tmp;
}

That’s all my pieces. Let’s look at the program as a whole, putting the swap and randint together:

namespace ProgrammingPearls
{
    class Program
    {
        static void Main(string[] args)
        {
            int n = 10;
            var x = new int[n];
            for (int i = 0; i < n; i++)
            {
                x[i] = i;
            }

            int k = 5;
            for (int i = 0; i < k; i++)
            {
                swap(x, i, randint(i, n - 1));
                Console.WriteLine(x[i]);
            }
        }

        static void swap(int[] arr, int pos1, int pos2)
        {
            int tmp = arr[pos1];
            arr[pos1] = arr[pos2];
            arr[pos2] = tmp;
        }

        static int randint(int start, int end)
        {
            Random r = new Random();
            return r.Next(start, end);
        }
    }
}

If we run it a few times, we get the following (kinds of) output:

Run #1
1
3
5
3
9

Run #2
6
2
4
7
9

Run #3
1
5
2
8
7

Looking good! Time to think about what is really happening here though unless we want to live in black-box town. We began by generating an array of all of the numbers we could get a random number of. That part was easy but less obvious was the reason and that’s why the second for loop is important. That one executes k times: one for each random number we want to generate. Using only the first k elements, we swap what was in that position of the array with something which is in the rest of the array (chosen by the random number, which won’t go lower than the current position). But how does it guarantee uniqueness? This is the really clever part. There are no duplicates in the array - we ensured that at the start - so we are simply picking up one value from another part of the array and using that. Since we don’t touch anything from lower down in the array (below the current position of i), we make sure we don’t interfere anything; they’ve already been printed, remember? That, then, is the end of this interesting algorithm.

As I have already said, there is so much in this book that I am sure I will come back to it again and again in my blog, but I hope it’s prompted you to give it a look, too.

Hi! Did you find this useful or interesting? I have an email list coming soon, but in the meantime, if you ready anything you fancy chatting about, I would love to hear from you. You can contact me here or at stephen ‘at’ logicalmoon.com