Sorting, Random Permutations, and a Little Bit of Probability

In this article, we’re going to discuss a shuffling algorithm, and then quick_sort – a sorting algorithm. We will be introducing a lot of new concepts. It will also be our first discussion of probabilities. There’s a lot of material to cover.

Here’s a python script which outputs a permutation of the integers \( \{0, 1, 2, …, N-1\}\).

random_permutation()

import random

def random_permutation(N):
  indices = []
  for i in range(N):
    indices += [i]
  # indices is now [0, 1, 2, ..., N-1]
  for i in range(N):
    # r is an integer chosen from {0, 1, ..., i}
    r = random.random()*(i + 1)
    r = int(r)
    indices[i], indices[r] = indices[r], indices[i]
  return indices

We can think of \( \text{random_permutation}(N) \) as a random variable on the space of all permutations of the set of indices \( I_N := \{ i \in \mathbb{Z} \mid 0 \leq i < N \} \). This means that we can consider the output to be an element of \(S_{I_N} \) (the group of all permutations on \(I_N\), which we introduced in Some of the Math of Permutations and in More Math of Permutations). When we think of it as an element of that group, we’ll sometimes refer to it as \(\rho \).

Now, since \( \text{random_permutation}(N) \) is a random variable, it has a distribution. That is, each possible output \( \rho \in S_{I_N}\) has a probability associated with it, \( P(\rho) \). We want to show that if \( \text{random.random()} \) is uniformly distributed on the interval \( [0,1] \), then \( \text{random_permutation}(N) \) is uniformly distributed on \( S_{I_N} \).

When we say that \(\text{random.random()}\) is “uniformly distributed” we mean that

$$ P(\text{random.random()} \in [a,b]) = b - a $$

where \( 0 \leq a \leq b \leq 1 \). On the other hand, when we say that \(\rho\) is uniformly distributed among all of the permutations of \(S_{I_N} \) we mean that \( P(\rho = \sigma) = (N!)^{-1} \), where \(\sigma\) is any permutation in \(S_{I_N}\). What this means in words is that every possible permutation of the indices \( \{0, 1, 2, …, N-1\} \) is an equally likely output of \( \text{random_permutation}(N) \).
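Before proving this, here’s a quick empirical sanity check (not part of the proof). This little script assumes the \( \text{random_permutation} \) function from the script above is in scope; it tallies the outputs of \( \text{random_permutation}(3) \), and each of the \(3! = 6\) permutations should show up with frequency close to \(1/6 \approx 0.167\).

from collections import Counter

counts = Counter()
trials = 60000
for _ in range(trials):
  counts[tuple(random_permutation(3))] += 1

for perm, count in sorted(counts.items()):
  # each printed frequency should be close to 1/6
  print(perm, count / trials)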

Theorem (Random Permutation)

\( \text{random_permutation}(N) \) is uniformly distributed on \(S_{I_N}\) if \( \text{random.random}() \) is uniformly distributed on \( [0, 1] \).

Proof

We’ll be using induction to prove this, so suppose that for all \( M < N \), the probability that \( \text{random_permutation}(M) \) equals any given permutation \(\sigma\) of the set \(I_M\) is \( (M!)^{-1} \). Notice that the base case, where \(M = 1\), is satisfied, because \((1!)^{-1} = 1 \).

Let \(\sigma \) be an arbitrary element of \(S_{I_N} \) and let \( t:= (\sigma(N-1), N-1) \), the \(2\)-cycle which swaps \( \sigma(N-1)\) with \(N-1 \) while leaving everything else unchanged. If \( \sigma(N-1) = N-1\), then \(t\) is the identity permutation instead of a \(2\)-cycle. In either case, \(t\) is well-defined. Notice that \(\sigma = \rho \) if and only if \( t \circ \sigma = t \circ \rho \). This happens because permutations form a group under composition and we can apply cancellations within groups (as explained in More Introductory Group Theory – cancellations mini-lemma). Notice also that \( t \circ \sigma(N-1) = N-1 \), which means that \( t \circ \sigma \) can be viewed as a permutation on the set of indices \(I_{N-1} \). By induction,

$$ P\big( t \circ \sigma = \text{random_permutation}(N-1) \big) = ((N-1)!)^{-1} $$

Now, notice that in computing \( \text{random_permutation}(N) \), by the end of iteration \(i = N-2 \) of the for loop we’ve generated a permutation of the first \(N-1\) elements in the array \( \text{indices}[] \). This must be true because, for all of the loop iterations where \( i = 0, 1, 2, \dots , N-2\), we have \(r \leq i\), so \( \text{indices}[N-1] \) (the index stored at position \(N-1\)) cannot be affected by any of those iterations.
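The bound \(r \leq i\) follows directly from the way \(r\) is generated: \( \text{random.random()} \) returns a value in \([0, 1)\), so

$$ 0 \leq r = \lfloor \text{random.random()} \cdot (i+1) \rfloor \leq i $$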

Now, what will make the output of \( \text{random_permutation}(N) = \sigma \) ? At the top of the for loop at iteration \(i = N-1\), we have to have \( \text{indices}[] \) correspond to the permutation \( t \circ \sigma \), which happens with probability \( ((N-1)!)^{-1} \). Then, for that iteration, we have to have \(r = \sigma(N-1)\). When (and only when) that happens, the final state of \(\text{indices}[] \) is equal to the array which corresponds to the permutation \( \sigma \).

What’s the probability that, in the last iteration, \(r = \sigma(N-1) \)? It’s exactly \( N^{-1} \), since \(r\) is selected uniformly at random from the indices \(0, 1, 2, \dots , N -1 \).
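To spell that out with the definition of “uniformly distributed” from above: in the last iteration \(r = \lfloor \text{random.random()} \cdot N \rfloor \), so for any \(k \in I_N\),

$$ P(r = k) = P\left( \text{random.random()} \in \left[ \tfrac{k}{N}, \tfrac{k+1}{N} \right) \right) = \frac{k+1}{N} - \frac{k}{N} = \frac{1}{N} $$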

Therefore, the probability that \( \sigma = \rho \) is equal to \( ((N-1)!)^{-1} \) times \( N^{-1} \), because each iteration uses a different call to \( \text{random.random}() \), and we’re assuming that each call is independent of all of the calls before it. Multiplying those two probabilities gives us \( (N!)^{-1}\), as we wanted.
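In symbols, using that independence:

$$ P(\rho = \sigma) = P\big(t \circ \sigma = \text{random_permutation}(N-1)\big) \cdot P\big(r = \sigma(N-1)\big) = \frac{1}{(N-1)!} \cdot \frac{1}{N} = \frac{1}{N!} $$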

QED

We used, but didn’t prove, the fact that the probability of two independent events \(A\) and \(B\) both occurring is equal to \(P(A) \cdot P(B) \). It is true, and we’ll get to it in a later entry.

Now that we’ve discussed a nice way to scramble an array, let’s discuss some possible ways to unscramble it. That is, let’s discuss sorting algorithms.

There’s a good chance that you know more about them than we do. We’re not sortingologists, or anything like that.

We’re going to start by discussing one of the most classic sorting algorithms of them all, quick sort. Here’s our rendition of it in C++.

quick_sort()

#include <cstdint>
#include <vector>

// Forward declaration, so the one-argument overload below can call it.
void quick_sort(std::vector<_Float64>& input, int64_t start, int64_t end);

void quick_sort(std::vector<_Float64>& input)
{
  quick_sort(input, 0, static_cast<int64_t>(input.size()));
}

// Sorts the subvector input[start], ..., input[end - 1] in place.
void quick_sort(std::vector<_Float64>& input, int64_t start, int64_t end)
{
  if(end - start < 2) return;

  _Float64 tmp;
  int64_t i = start;   // current position of the value that began at input[start]
  int64_t j = end - 1;

  while (i < j)
  {
    if (input[i] >= input[i+1])
    {
      // input[i+1] belongs on the left: swap it past input[i]
      tmp = input[i];
      input[i] = input[i+1];
      input[i+1] = tmp;
      i++;
    }
    else
    {
      // input[i+1] belongs on the right: send it to position j
      tmp = input[i+1];
      input[i+1] = input[j];
      input[j] = tmp;
      j--;
    }
  } // i == j
  quick_sort(input, start, i);
  quick_sort(input, i+1, end);
}

This function sorts a vector of floating point numbers. It has some very good qualities. Perhaps the best is that it’s easy to understand, and in most cases it’s about as fast as the best sorting algorithms. However, it does have its drawbacks. The biggest one is that its performance depends on the initial ordering of the input vector. This might seem obvious, as if it had to be a characteristic of all sorting functions, but it isn’t: there are sorting algorithms whose performance doesn’t depend on how the input vector is arranged at all. We won’t address any of those algorithms in this entry, though.

In order to understand it, we have to understand some of the basics of recursion. Notice we’ve used two quick_sort functions. The first just calls the second and passes along two additional parameters, the \(\text{start}\) and \(\text{end} \) variables. \(\text{start} \) is initialized to \(0\), and \(\text{end} \) is initialized to \(\text{input.size()} \), which is the length of \(\text{input}\). To understand what’s going on, we have to figure out what’s happening with the second, longer, quick_sort function.

This function isn’t very complicated, but we do have to figure out what’s going on just before the end where it calls itself twice. This is what makes this implementation of quick_sort recursive. Recursion happens whenever a function, say \(f: \mathcal{X} \rightarrow \mathcal{Y}\) is defined in a way that references itself on a smaller input. There also has to be a base case, i.e. a smallest possible input, and in that case \(f\) is defined without reference to itself. This is like induction, where there has to be a base case, and all other cases are resolved by assuming a solution exists for smaller cases.

For \(\text{quick_sort}\), the size of a case is \(\text{end} - \text{start} \), which is the length of the subvector to be sorted. The base case is when the length is less than \(2\). In that case, the subvector is already sorted. However, if that isn’t the case, then we define two more variables, \(\text{i} := \text{start} \) and \(\text{j} := \text{end} - 1\). These correspond to the positions of the first and last entries of the subvector that will be handled by this call (or instance) of quick_sort.

Since \(\text{end} - \text{start} \geq 2\), we have \(j - i \geq 1\), so they are not equal to begin with. This means that the \( \text{while} \) loop will execute at least once. In fact, we can calculate exactly how many times this while loop will execute. Notice that on each iteration, either \(i \mapsto i+1 \) or \(j \mapsto j - 1\), but not both. The loop stops when \(i = j\), so it executes exactly \((\text{end} - 1) - \text{start}\) times (the initial value of \(j - i\)), which is \(1\) less than the length of the subvector.

Notice that \(i\) will always refer to the index of the floating point number which is initially stored at \(\text{input[start]} \). To better see why this is the case, notice that when the while loop is called for the first time, \(i = \text{start} \), and if \(\text{input[i]} < \text{input[i+1]}\) then we don’t change \(\text{input[i]} \) or the variable \(i\) at all. On the other hand, if \(\text{input[i]} \geq \text{input[i+1]}\), then we swap the values of those two entries of \(\text{input}\), and increase \(i\) to \(i+1\).

For example, if \(\text{input} = [1.5_i, 0.5, 2.5, 0.5, 3.5_j] \), where we’ve written subscripts to indicate the positions pointed to by \(i\) and \(j\), then after one iteration of the while loop, we will get \( [0.5, 1.5_i, 2.5, 0.5, 3.5_j] \) and further iterations of the loop will yield

$$ \begin{align} &[0.5, 1.5_i, 3.5, 0.5_j, 2.5] \\ &[ 0.5, 1.5_i, 0.5_j, 3.5, 2.5 ] \\ \end{align} $$

and finally \[ [ 0.5, 0.5, 1.5_{i = j}, 3.5, 2.5 ] \]
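To experiment with this partition step on its own, here’s a small Python transcription of that while loop (just a sketch for illustration; partition_step is our name, not part of the C++ code above). Running it on the example vector reproduces the final state we just walked through.

def partition_step(input, start, end):
  # mirrors the while loop inside the C++ quick_sort above
  i = start
  j = end - 1
  while i < j:
    if input[i] >= input[i+1]:
      # input[i+1] is <= the value that started at input[start]: move it left
      input[i], input[i+1] = input[i+1], input[i]
      i += 1
    else:
      # input[i+1] is strictly greater: send it to the right end
      input[i+1], input[j] = input[j], input[i+1]
      j -= 1
  return i

v = [1.5, 0.5, 2.5, 0.5, 3.5]
print(partition_step(v, 0, len(v)), v)  # prints: 2 [0.5, 0.5, 1.5, 3.5, 2.5]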

Also notice that by the end of the while loop, all of the entries to the left of \(i\) are less than or equal to \(\text{input[i]} \), while those to the right of the \(i^{\text{th}} \) position are strictly greater than \(\text{input[i]} \). This happens because at every iteration we’re comparing \(\text{input[i]} \) with \(\text{input[i+1]}\). If \(\text{input[i+1]}\) is less than or equal to \(\text{input[i]}\), we swap them and move \(i\) up one (to the new position of what was in \(\text{input[i]}\)); otherwise, we swap \(\text{input[i+1]} \) with another entry from the right side – the entry at \(\text{input[j]}\). After swapping, we move \(j\) down one, so that we will never compare it with the same item again.

So, ultimately, at the end of each call of quick_sort, we perform a partial sort of \(\text{input}\): the value that started at \(\text{input[start]} \) sits at position \(i\), every other entry less than or equal to it ends up to the left of position \(i\), and every entry strictly greater than it ends up to the right. \(i\) ends up somewhere \(\geq \text{start}\) and \( \leq \text{end} - 1\). Then we call quick_sort to sort each side further. When we call quick_sort to sort out the left and the right side of \(i\), we don’t need \(\text{input[i]}\) to be a part of either of those sub-vectors. This is because it is already where it should be (it’s to the right of everything less than or equal to it and to the left of everything greater than it). It works, and it will finish in a finite amount of time. However, we’d like to say more about its running time.

By our analysis, we can see that if we define \(t(n)\) to be the number of times the while loop is executed during the sorting of a vector of length \(n\), then \(t(n) = (n-1) + t(l) + t(r) \), where \(l + r = n-1 \). We have \(t(1) = t(0) = 0\) and \(t(2) = 1\). However, already at \(t(3)\) we aren’t guaranteed a specific answer. It’s possible that

\[t(3) = (3-1) + t(2) + t(0) = 3\]

but we could also get lucky and have

\[t(3) = (3 - 1) + t(1) + t(1) = 2 \]

For larger values of \(n\) the range of possible values will become even larger.
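To get a feel for that range, here’s a short script (a sketch; t_min and t_max are our hypothetical helper names, not anything from the article’s code) that computes the smallest and largest values of \(t(n)\) allowed by the recurrence, by taking the best or worst split \(l + r = n-1\) at every step.

from functools import lru_cache

@lru_cache(maxsize=None)
def t_min(n):
  # best case: take the most favorable split l + r = n - 1 at every level
  if n < 2: return 0
  return (n - 1) + min(t_min(l) + t_min(n - 1 - l) for l in range(n))

@lru_cache(maxsize=None)
def t_max(n):
  # worst case: take the least favorable split at every level
  if n < 2: return 0
  return (n - 1) + max(t_max(l) + t_max(n - 1 - l) for l in range(n))

for n in (2, 3, 8, 16):
  # t_max(n) works out to n*(n-1)/2, which stays below n*n
  print(n, t_min(n), t_max(n), n*n)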

We can prove that \(t(n) \leq n^2 \), using induction. For the base cases, we’ve already shown that for \(n \leq 3\) we have \(t(n) \leq n^2\). Now suppose that for all \(k < n\), we have \(t(k) \leq k^2\). Let \( l = (n-1) - r\). Then we need to show that

$$ \begin{align} t(n) &= (n-1) + t(r) + t(l) \\ &\leq (n-1) + l^2 + r^2 \\ &\leq n^2 \\ \end{align} $$

where the first inequality comes from the induction hypothesis. The remaining inequality is equivalent to

$$ (n-1) + r^2 + ( (n-1) - r)^2 \leq n^2 \tag{1}\label{1} $$

where \( 0 \leq r < n \).

Expanding the left side out, we get

$$ \begin{align} &n^2 - n - 2(n-1)r + 2r^2 \\ &\text{which is } \leq n^2 \\ &\text{if and only if } \\ &n + 2(n-1)r \geq 2r^2 \\ &\text{and that happens if and only if} \\ &n + 2nr \geq 2r^2 + 2r \\ \end{align} $$

If we let \(n = r + d\), where \(d \geq 1 \), then we have

\[ (r + d) + 2(r + d)r \geq 2r^2 + 2r \]

doing some algebra gives us

\[ d + 2dr \geq r \]

which must be true, since \(d \geq 1\) gives us \(d + 2dr \geq 1 + 2r > r\). Therefore, we’ve shown that \(t(n) \leq n^2 \).

QED

If we assume that the overall running time of \(\text{quick_sort}\) is proportional to the number of while loop iterations, then its worst-case running time is “on the order of” \(n^2\); that is, \(\text{quick_sort}\) is an \(O(n^2)\) sorting algorithm in the worst case. We’ll go into that in more detail in a later entry, but this result is actually not very impressive. A good sorting algorithm should be in the class of \(O(n \cdot \log(n))\) algorithms. What makes \(\text{quick_sort}\) often useful, however, is that even though its worst-case running time can be \(O(n^2)\), it typically does much better. In fact, it’s typically about as fast as an \(O(n \cdot \log(n))\) algorithm.
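To see the gap between the worst case and the typical case, here’s a small experiment: a Python port of the C++ routine above (a sketch; count_iterations is our hypothetical name), instrumented to return the total number of while loop iterations while it sorts the input in place. For an already sorted list of distinct values, the comparison input[i] >= input[i+1] is never true, so one side of every split is empty and we hit the full \(n(n-1)/2\); a shuffled input usually does far better.

import random

def count_iterations(input, start, end):
  # Python port of the C++ quick_sort above, returning the total number
  # of while loop iterations instead of nothing
  if end - start < 2:
    return 0
  i = start
  j = end - 1
  count = 0
  while i < j:
    count += 1
    if input[i] >= input[i+1]:
      input[i], input[i+1] = input[i+1], input[i]
      i += 1
    else:
      input[i+1], input[j] = input[j], input[i+1]
      j -= 1
  return count + count_iterations(input, start, i) + count_iterations(input, i+1, end)

n = 500
already_sorted = [float(k) for k in range(n)]
shuffled = list(already_sorted)
random.shuffle(shuffled)
print(count_iterations(already_sorted, 0, n))  # n*(n-1)/2 = 124750, the worst case
print(count_iterations(shuffled, 0, n))        # typically on the order of n*log(n)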

We’ve got a lot more material to cover later on, but this will do it for this entry. Bye!