More on Sorting and Analyzing Run Time Complexity
In the previous entry (Sorting, Random Permutations, and little bit of Probability), we introduced quick_sort, gave a version of it in C++, and started to analyze how many steps it takes to sort a vector of floating point numbers. In this entry, we continue that analysis and prove some results that will help us get a feel for other recursive algorithms.
In what follows, we’re going to reference the C++ version of quick_sort given in (Sorting, Random Permutations, and little bit of Probability). It might be useful to keep that page open in a side tab. We’re going to focus on the quick_sort function that takes two indices and a vector as its input. The two indices are start and end, respectively. Notice that the while loop has two indices, i and j, and that the loop executes as long as i < j. Initially, i = start and j = end - 1. We also showed that the while loop executes exactly \(\text{end} - \text{start} - 1 \) times.
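To make these quantities concrete, here is a rough sketch of such a quick_sort. It is not the exact code from the previous entry, just a plausible reconstruction with the same interface (it uses input[start] as the pivot, starts with i = start and j = end - 1, and loops while i < j), instrumented with a while_iterations counter so you can watch the total number of loop iterations across all recursive calls. The next paragraph bounds that total by \(n^2\).

#include <cstddef>
#include <iostream>
#include <random>
#include <utility>
#include <vector>

// Total number of times the while loop below runs, across all recursive calls.
static long long while_iterations = 0;

// A rough reconstruction (not the exact code from the previous entry) of
// quick_sort(input, start, end): sorts input[start, end) using input[start]
// as the pivot.  Each iteration of the while loop either grows the
// "<= pivot" prefix or moves a large element to the back, so the loop runs
// exactly (end - 1) - start times per call.
void quick_sort(std::vector<double>& input, std::size_t start, std::size_t end) {
    if (end - start < 2) {
        return;  // zero or one element: nothing to do
    }
    double pivot = input[start];
    std::size_t i = start;
    std::size_t j = end - 1;
    while (i < j) {
        ++while_iterations;
        if (input[i + 1] <= pivot) {
            std::swap(input[i], input[i + 1]);  // pivot slides one slot to the right
            ++i;
        } else {
            std::swap(input[i + 1], input[j]);  // large element goes to the back
            --j;
        }
    }
    // Now input[i] == pivot, [start, i) holds elements <= pivot, (i, end) holds larger ones.
    quick_sort(input, start, i);
    quick_sort(input, i + 1, end);
}

int main() {
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    for (std::size_t n : {100u, 1000u, 10000u}) {
        std::vector<double> v(n);
        for (double& x : v) x = dist(gen);
        while_iterations = 0;
        quick_sort(v, 0, v.size());
        std::cout << "n = " << n << ", while iterations = " << while_iterations
                  << ", n^2 = " << n * n << "\n";
    }
}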
This allowed us to prove that the while loop never runs more than \(n^2\) times, where \(n\) equals input.size(). Since the total number of operations involved in completing a run of quick_sort is proportional to the number of times the while loop executes, the amount of time it takes to finish on a given input vector of length \(n\) is no more than \( c \cdot n^2\), for some \(c \in \mathbb{R} \). This is often written as \( O(n^2)\). More precisely,
Definition (big O notation)
Let \(g : X \rightarrow Y \) be a function where both sets \(X\) and \(Y\) have a total ordering. Then we define
$$ \begin{align} O(g) := \{ f : X \rightarrow Y \mid\ &\exists\, c \in Y,\ \exists\, w_{f} \in X, \\ &\forall x \geq w_{f},\ f(x) \leq c \cdot g(x) \} \\ \end{align} $$
That is, a different \(w_f\) and \(c\) may be chosen for each function \(f\). Sometimes we say “\(f \) is \(O(g) \)”, instead of \( f \in O(g) \).
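As a quick sanity check of the definition, here is a worked example showing that \(f(n) = 3n^2 + 5n\) is \(O(n^2)\), with \(c = 4\) and \(w_f = 5\):
$$ \begin{align} n \geq 5 \implies 5n \leq n^2 &\implies 3n^2 + 5n \leq 3n^2 + n^2 \\ &\implies f(n) \leq 4 \cdot n^2 \end{align} $$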
We’re going to prove some results which will help us determine what \(O\) set a recursively defined function \(T\) belongs to. But first, we consider a useful function.
Definition ( i_log_b(n) )
Let \(n, b \in \mathbb{N} \) be positive natural numbers (\( b \geq 2\)). Let \( \left\lceil \frac{\cdot}{b} \right\rceil : \mathbb{N} \rightarrow \mathbb{N}\) be the function which maps \(n\) to the least integer greater than or equal to \(\frac{n}{b}\) (also written \( \left\lceil \frac{n}{b} \right\rceil \)), then we define \( \text{i_log}_b(n) := \) the least integer \(i\), such that \( {\left\lceil \frac{\cdot}{b} \right\rceil}^i (n) = 1 \). Where, for example, \( {\left\lceil \frac{\cdot}{b} \right\rceil}^2 (n) \) equals \( \left\lceil \frac{ \left\lceil \frac{n}{b} \right\rceil }{b} \right\rceil\).
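The definition may be easier to internalize as code. Here is a small helper (purely illustrative; the name and types are made up for this post) that computes \( \text{i_log}_b(n) \) literally, by counting how many ceiling divisions by \(b\) are needed to reach \(1\):

#include <cassert>

// i_log(n, b): the number of times we must apply n -> ceil(n / b) before reaching 1.
// Requires n >= 1 and b >= 2.
unsigned int i_log(unsigned long long n, unsigned long long b) {
    assert(n >= 1 && b >= 2);
    unsigned int i = 0;
    while (n > 1) {
        n = (n + b - 1) / b;  // integer ceiling of n / b
        ++i;
    }
    return i;
}

// For example, i_log(8, 2) == 3, i_log(9, 2) == 4, and i_log(1, b) == 0.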
Lemma (1)
\( \text{i_log}_b(n) = \left\lceil { \log_b(n) } \right\rceil\).
(This is the reason nobody uses \( \text{i_log}_b(n)\))
Proof
We start by proving that \( \text{i_log}_b(n) \) is a non-decreasing function, i.e. \( \text{i_log}_b(n) \leq \text{i_log}_b(n+1) \). Suppose for a contradiction that it is sometimes decreasing, and let \(s\) be the smallest positive natural number such that \( \text{i_log}_b(s) > \text{i_log}_b(s+1) \). Then, by definition, we have \( \text{i_log}_b( \left\lceil \frac{s}{b} \right\rceil) > \text{i_log}_b( \left\lceil \frac{s+1}{b} \right\rceil ) \). We’re almost to a contradiction, but there are a couple of minor things we need to show. First, notice that \( \left\lceil \frac{\cdot}{b} \right\rceil \) is non-decreasing, so \( \left\lceil \frac{s}{b} \right\rceil \leq \left\lceil \frac{s+1}{b} \right\rceil \), and the two ceilings differ by at most one. Second, we claim that since \(b \geq 2\), we have \( \left\lceil \frac{s}{b} \right\rceil < s \) (and likewise for \(s + 1\)). Notice that if we can prove this, then we have a contradiction: either the two ceilings are equal, which is impossible since a number cannot be greater than itself, or \( \left\lceil \frac{s+1}{b} \right\rceil = \left\lceil \frac{s}{b} \right\rceil + 1 \), in which case \( \left\lceil \frac{s}{b} \right\rceil \) is a smaller counterexample than \(s\), contradicting the minimality of \(s\).
Now, write \( s = b \cdot q + r \), where \(0 \leq r < b\), and suppose \( \left\lceil \frac{s}{b} \right\rceil \geq s \). Then \( \left\lceil \frac{b \cdot q + r}{b} \right\rceil \geq s \), which implies that \( q + \left\lceil \frac{r}{b} \right\rceil \geq b \cdot q + r\). This is impossible because \(b \geq 2 \) and \(s \geq 2\) (note that \(s = 1\) cannot be a counterexample, since \( \text{i_log}_b(1) = 0 \) is the smallest possible value). Therefore, we’ve shown that \( \text{i_log}_b(n) \) is non-decreasing.
Finally, notice that for all \(k \geq 0 \), \( \text{i_log}_b(b^k) = \log_b(b^k) = k \). Since \( n \leq b^{\left\lceil { \log_b(n) } \right\rceil} \) and \( \text{i_log}_b \) is non-decreasing, we have \( \text{i_log}_b(n) \leq \text{i_log}_b( b^{\left\lceil { \log_b(n) } \right\rceil} ) \), which equals \( \left\lceil \log_b(n) \right\rceil \). Therefore, if we can show that \( \text{i_log}_b(b^k + 1) = k + 1 \), then we are done, because then, for any \(n\) with \( b^k < n \leq b^{k+1} \), monotonicity gives \( k + 1 \leq \text{i_log}_b(n) \leq \left\lceil \log_b(n) \right\rceil = k + 1 \).
This is true, because \( \left\lceil \frac{b^k + 1}{b} \right\rceil \) is equal to \( \left\lceil b^{k-1} + \frac{1}{b} \right\rceil\), and that is equal to \( b^{k-1} + 1\). So we will end up with a chain of integers, \( b^k + 1\), \( b^{k-1} + 1\), … , \(b^2 + 1 \), \(b + 1\), \(2\), and then \( 1\). However, without the \( + 1\), we would have \( b^k\), \(b^{k-1}\), … , \(b^2\), \(b\), and \(1\), which is one number shorter. Therefore \( \text{i_log}_b(b^k + 1) = \text{i_log}_b(b^k) + 1 = k + 1\), as required.
QED
Now we return to our discussion of recursively defined functions. Suppose that \( T(n) := f(n) + a \cdot T( \left\lceil \frac{n}{b} \right\rceil ) \), where \(a \in \mathbb{Q} \) is non-negative and \(b \in \mathbb{N} \) satisfies \(b \geq 2\), then notice that we can rewrite \(T\) as follows
\[ T(n) = \sum\limits_{i = 0}^{\lambda} a^i f\!\left( {\left\lceil \frac{\cdot}{b} \right\rceil}^i (n) \right) \tag{1}\label{1}\]
Also notice that this holds even if we replace equality in the definition of \(T\) with \( \leq \). If we take \(T(1) = f(1)\) as the base case of the recursion, then \(\lambda = \text{i_log}_b(n) = \left\lceil \log_b(n) \right\rceil \), by lemma (1). Furthermore, we can see that if \(f \) is non-decreasing (which, since \(a \geq 0\), makes \(T\) non-decreasing as well) and \( n \geq b \), then
$$ \begin{align} &T(b^{ \left\lceil { \log_b(n) } \right\rceil - 1 }) \leq \\ &T(n) \leq T( b^{ \left\lceil { \log_b(n) } \right\rceil } ) \tag{2}\label{2} \\ \end{align} $$
This is quite useful, because by equation (1), we can see that if \(n = b^k \), then we have
\[ T(b^{k}) = \sum\limits_{i = 0}^{ k } a^i f(b^{k - i}) \tag{3}\label{3}\] and further, if \(f(n) = cn^{\alpha} \), then \[ T(b^{k}) = \sum\limits_{i = 0}^{ k } a^i c (b^{k - i})^{\alpha} \tag{4}\label{4}\] which is equal to \[ T(b^{k}) = \sum\limits_{i = 0}^{ k } a^i c (b^{\alpha})^{k - i} \tag{5}\label{5}\] If we factor out \(c\) and \((b^{\alpha})^k \), we get \[ T(b^k) = c(b^{\alpha})^k \sum\limits_{i=0}^{k} { \left(\frac{a}{b^{\alpha}}\right)^i } \tag{6}\label{6}\] which is a geometric sum. After some algebra, we have \[ T(b^k) = \frac{c}{a - b^{\alpha}} [a^{k+1} - (b^{\alpha})^{k+1}] \tag{7}\label{7}\] This holds as long as \( a \neq b^{\alpha} \). However, if those two are equal, then by equation (6) we see that \[ T(b^k) = c (b^{\alpha})^k \cdot (k+1) = f(b^k) (k+1) \tag{8}\label{8}\] because each of the \(k+1\) terms of the sum is equal to \(1\) in that case. With a bit more algebra, and using the fact that \(a^k = (b^{\log_b(a)})^k = (b^k)^{\log_b(a)} \), we see that equation (7) implies \[ T(b^k) = \frac{c}{a - b^{\alpha}} [ a (b^k)^{\log_b(a)} - \frac{b^{\alpha}}{c} f(b^k) ] \tag{9}\label{9} \]
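The algebra above is easy to get wrong, so here is a small, purely illustrative check: it evaluates \(T(b^k)\) directly from the recurrence \(T(n) = f(n) + aT(\left\lceil \frac{n}{b} \right\rceil)\) with base case \(T(1) = f(1)\) and \(f(n) = cn^{\alpha}\), and compares the result with the closed forms (7) and (8). The parameter values are arbitrary, chosen only to hit all three regimes.

#include <cmath>
#include <cstdio>

// f(n) = c * n^alpha
double f(double c, double alpha, long long n) { return c * std::pow(double(n), alpha); }

// T(n) = f(n) + a * T(ceil(n / b)), with base case T(1) = f(1).
double T(double a, long long b, double c, double alpha, long long n) {
    if (n == 1) return f(c, alpha, 1);
    long long next = (n + b - 1) / b;  // ceil(n / b)
    return f(c, alpha, n) + a * T(a, b, c, alpha, next);
}

int main() {
    const double c = 1.0, alpha = 1.0;  // f(n) = n
    const long long b = 2;              // so b^alpha = 2
    const double b_alpha = std::pow(double(b), alpha);
    for (double a : {1.0, 2.0, 4.0}) {  // a < b^alpha, a == b^alpha, a > b^alpha
        for (int k = 1; k <= 10; ++k) {
            long long n = 1;
            for (int i = 0; i < k; ++i) n *= b;  // n = b^k
            double recurrence = T(a, b, c, alpha, n);
            double closed = (a == b_alpha)
                ? c * std::pow(b_alpha, k) * (k + 1)                                    // equation (8)
                : c / (a - b_alpha) * (std::pow(a, k + 1) - std::pow(b_alpha, k + 1));  // equation (7)
            std::printf("a = %.0f, k = %2d: recurrence = %10.0f, closed form = %10.0f\n",
                        a, k, recurrence, closed);
        }
    }
}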
Lemma (2)
Let \(h\) and \(l\) be non-decreasing functions from the positive natural numbers to the non-negative reals, such that for some \(b \geq 2\), and every \(k \geq 0 \), \(h(b^k) \leq l(b^k)\). Then for all positive \(n \in \mathbb{N}\), we have \(h(n) \leq l(bn) \).
Proof
Notice that \( n \leq b^{ \left\lceil \log_b(n) \right\rceil } \leq bn \). Since \(h\) is non-decreasing, \(h(n) \leq h(b^{ \left\lceil \log_b(n) \right\rceil }) \), and this is less than or equal to \( l(b^{ \left\lceil \log_b(n) \right\rceil }) \leq l(bn)\), because \(l\) is non-decreasing.
QED
Lemma (3)
- If \(a < b^{\alpha} \), then \(T(b^k) \leq \frac{2c}{b^{\alpha}-a}\frac{b^{\alpha}}{c}f(b^k)\).
- If \(a > b^{\alpha} \), then \(T(b^k) \leq \frac{2c}{a-b^{\alpha}}a(b^k)^{ \log_b(a) } \).
Proof
By equation (6), \(T(b^k)\) equals \(c(b^{\alpha})^k\) times a sum of non-negative terms, so it is non-negative. In case 1, \(a < b^{\alpha}\) implies \(a^{k+1} < (b^{\alpha})^{k+1}\) (since \(x \geq y \geq 0\) implies \(x^{k+1} \geq y^{k+1}\)), so equation (7) can be rewritten as \(T(b^k) = \frac{c}{b^{\alpha} - a}[(b^{\alpha})^{k+1} - a^{k+1}]\). Dropping the subtracted term only increases the right hand side, so \(T(b^k) \leq \frac{c}{b^{\alpha} - a}(b^{\alpha})^{k+1} = \frac{c}{b^{\alpha} - a}\frac{b^{\alpha}}{c}f(b^k) \leq \frac{2c}{b^{\alpha} - a}\frac{b^{\alpha}}{c}f(b^k)\). Case 2 is symmetric: dropping the subtracted \((b^{\alpha})^{k+1}\) from equation (7) gives \(T(b^k) \leq \frac{c}{a - b^{\alpha}}a^{k+1} = \frac{c}{a - b^{\alpha}}a(b^k)^{\log_b(a)} \leq \frac{2c}{a - b^{\alpha}}a(b^k)^{\log_b(a)}\).
QED
These results allow us to prove something which can be quite useful. It’s almost as powerful as the “Master Theorem” from the Introduction to Algorithms textbook.
Corollary (Almost the Master Theorem)
Let \(T(n) = f(n) + aT(\left\lceil {\frac{n}{b}} \right\rceil)\) be as above (with base case \(T(1) = f(1)\)), where \(f(n) = cn^{\alpha}\), then
- If \(a < b^{\alpha} \), then \(T(n) \in O(f(n)) \).
- If \(a > b^{\alpha} \), then \(T(n) \in O(n^{\log_b(a)}) \).
- If \(a = b^{\alpha}\), then \(T(n) \in O(\log_b(n)f(n))\).
Proof
Case 1. Combining the previous two lemmas, we see that \(T(n) \leq \frac{2c}{b^{\alpha}-a}\frac{b^{\alpha}}{c}f(bn) \). Notice that since \(f(n) = cn^{\alpha}\), we have \(f(bn) = c(bn)^{\alpha} = c(b^{\alpha})n^{\alpha}\), and that is equal to \(b^{\alpha}f(n) \). This proves case 1.
Case 2. The proof is very similar to case 1.
Case 3. Equation (8) says \(T(b^k) = f(b^k)(k+1) = f(b^k)[\log_b(b^k) + 1]\), so combining lemma (2) with equation (8), we get \(T(n) \leq f(bn)[\log_b(bn) + 1] \). We just showed in the proof of case 1 that \(f(bn) = b^{\alpha}f(n)\). Combining that with the fact that \(\log_b(bn) = 1 + \log_b(n) \), we get \(T(n) \leq b^{\alpha}f(n)[\log_b(n) + 2] \). When \(n \geq b\), we have \(\log_b(n) \geq 1\), so \(\log_b(n) + 2 \leq 3\log_b(n)\) and \(T(n) \leq 3b^{\alpha}f(n)\log_b(n) \), which proves case 3.
QED
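To make the three cases concrete, here are a few example recurrences with \(c = 1\) and \(\alpha = 1\) (so \(f(n) = n\) and \(b^{\alpha} = b\)); the constants are illustrative and not tied to any particular algorithm:
$$ \begin{align} T(n) &= n + T\!\left(\left\lceil \tfrac{n}{2} \right\rceil\right): & a &= 1 < 2 = b^{\alpha}, \text{ so } T(n) \in O(n) \\ T(n) &= n + 4\,T\!\left(\left\lceil \tfrac{n}{2} \right\rceil\right): & a &= 4 > 2 = b^{\alpha}, \text{ so } T(n) \in O(n^{\log_2(4)}) = O(n^2) \\ T(n) &= n + 2\,T\!\left(\left\lceil \tfrac{n}{2} \right\rceil\right): & a &= 2 = b^{\alpha}, \text{ so } T(n) \in O(n\log_2(n)) \end{align} $$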
We see from our implementation of quick_sort that \(f(n) = n \), so \(\alpha = 1\). Thus, if we had \(a = b\), then by case 3 of the corollary, we would have \(T(n) \in O(n\log_b(n))\), which is better than \(O(n^2)\). It turns out that with our implementation, this isn’t possible. We can, however, show a weaker result about the expected number of steps it will take to finish on a given input vector.
On a given call to quick_sort(input, start, end), the while loop exits with j == i. If we treat the final value of those variables as a random variable with possible values in
$$ \{\text{start}, \text{start} + 1, \ldots, \text{end} - 1 \} $$
and if this random variable is uniformly distributed (see Sorting, Random Permutations, and little bit of Probability) among those values, then \(E[T(n)] \in O(n\log(n)) \), where \(E[T(n)]\) is the expected value of \(T(n)\).
You’ve probably encountered expected values before, but for clarity: if \(X\) is a discrete random variable taking values in a set \(\mathcal{X}\), with probability \(P(X = x)\) for each \(x \in \mathcal{X} \), then \(E[X] = \sum\limits_{x \in \mathcal{X}} x\, P(X = x) \).
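For example, if \(X\) is the value shown by a fair six-sided die, then each value has probability \(\frac{1}{6}\) and
$$ E[X] = \sum\limits_{x = 1}^{6} x \cdot \frac{1}{6} = \frac{21}{6} = 3.5 $$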
In order to get the result, we’ll have to use a bit of calculus. We’ll use a trick very similar to what we used in (The Sum of all Prime Reciprocals is Infinite). We won’t go into it in as much detail this time.
Lemma (4)
\( \sum\limits_{k = 1}^{n-1} k\ln(k) \leq \frac{1}{2}\left[ n^2\left(\ln(n) - \frac{1}{2}\right) + \frac{1}{2}\right] \)
Proof
Since \( \frac{d}{dx} x \ln(x) = 1 + \ln(x) \), \(x \ln(x)\) is a non-decreasing function when \(x \geq 1\). That means that \( k\ln(k) \leq \int_k^{k+1} x\ln(x)\,dx \) for every \(k \geq 1\), so \( \sum\limits_{k = 1}^{n-1} k\ln(k) \leq \int_1^n x \ln(x) dx\). Integrating by parts, that integral is equal to \(\frac{1}{2}[n^2(\ln(n) - \frac{1}{2}) + \frac{1}{2}] \).
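In detail, integrating by parts with \(u = \ln(x)\) and \(dv = x\,dx\):
$$ \begin{align} \int_1^n x\ln(x)\,dx &= \left[\frac{x^2}{2}\ln(x)\right]_1^n - \int_1^n \frac{x}{2}\,dx \\ &= \frac{n^2}{2}\ln(n) - \frac{n^2 - 1}{4} \\ &= \frac{1}{2}\left[ n^2\left(\ln(n) - \frac{1}{2}\right) + \frac{1}{2}\right] \end{align} $$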
QED
Now we calculate
$$ \begin{align} &E[T(n)] = \\ &E[ n + \tag{10}\label{10} \\ &\sum\limits_{s = 0}^{n-1}P(s = i)\{T(s) + T((n-1) - s)\} ] \\ \end{align} $$
Since we’re assuming that \(i\) is uniformly distributed, \(P(s = i) = \frac{1}{n}\), for all \(s\). We also observe that, since the probabilities are all equal, we can sum the second term of the summand in reverse order without changing the total, which means that
$$ \begin{align} &\sum\limits_{s=0}^{n-1}\{\frac{1}{n} T((n-1) -s)\} \\ = &\sum\limits_{s=0}^{n-1} \{ \frac{1}{n}T(s) \} \\ \end{align} $$
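For completeness, the equality above is just the change of index \(s' := (n-1) - s\), which runs through the same set \(\{0, 1, \ldots, n-1\}\) in the opposite order:
$$ \sum\limits_{s=0}^{n-1} T((n-1) - s) = \sum\limits_{s'=0}^{n-1} T(s') $$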
This lets us rewrite equation (10) as
$$ \begin{align} &E[T(n)] = \\ &E[ n + \sum\limits_{s = 0}^{n-1}\frac{2}{n}T(s) ] \tag{11}\label{11} \\ \end{align} $$
Expectation is linear (expected values are sums, or more generally integrals), so the right side of (11) is equal to \( E[n] + \sum\limits_{s = 0}^{n-1}\frac{2}{n}E[T(s)] \). Since \(n\) is a constant, \(E[n] = n \), and if we assume, inductively, that \(E[T(s)] \leq cs\ln(s) \) for all \( s < n\), then (11) and lemma (4) get us
$$ \begin{align} &E[T(n)] \leq n + \\ &\frac{2}{n}\frac{1}{2}c[ n^2(\ln(n) - \frac{1}{2}) + \frac{1}{2} ] \tag{12}\label{12} \\ \end{align} $$
The right hand side of (12) is equal to \( n + c[ n(\ln(n) - \frac{1}{2}) + \frac{1}{2n} ] \), which is equal to \( cn\ln(n) + n(1 - \frac{c}{2}) + \frac{c}{2n} \). If \(c\) is at least \(3\) and \(n\) is at least \(2\), then this is less than or equal to \(c n\ln(n) \). This completes the induction and proves that \( E[T(n)] \leq cn\ln(n) \), i.e. \(E[T(n)] \in O(n\log(n))\).
QED
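As an informal, empirical complement to this argument (not a proof), here is a simulation sketch built on the same quick_sort reconstruction used near the top of this post. It sorts many uniformly random permutations, averages the total number of while-loop iterations, and prints \(n\ln(n)\) alongside for scale. The point is only that the average grows like a constant times \(n\ln(n)\); the constant itself depends on implementation details.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

static long long while_iterations = 0;

// Same reconstruction of quick_sort as in the earlier sketch.
void quick_sort(std::vector<double>& input, std::size_t start, std::size_t end) {
    if (end - start < 2) return;
    double pivot = input[start];
    std::size_t i = start;
    std::size_t j = end - 1;
    while (i < j) {
        ++while_iterations;
        if (input[i + 1] <= pivot) { std::swap(input[i], input[i + 1]); ++i; }
        else                       { std::swap(input[i + 1], input[j]); --j; }
    }
    quick_sort(input, start, i);
    quick_sort(input, i + 1, end);
}

int main() {
    std::mt19937 gen(7);
    const int trials = 200;
    for (std::size_t n : {1000u, 10000u, 100000u}) {
        long long total = 0;
        for (int t = 0; t < trials; ++t) {
            std::vector<double> v(n);
            std::iota(v.begin(), v.end(), 0.0);     // distinct values 0, 1, ..., n-1
            std::shuffle(v.begin(), v.end(), gen);  // a uniformly random permutation
            while_iterations = 0;
            quick_sort(v, 0, v.size());
            total += while_iterations;
        }
        double average = double(total) / trials;
        std::cout << "n = " << n << ", average while iterations = " << average
                  << ", n*ln(n) = " << n * std::log(double(n)) << "\n";
    }
}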
We haven’t proven that the final value of i is uniformly distributed among the integers in \( [\text{start} .. \text{end}) \), and we won’t in this article. That is something worth revisiting in a later entry. This is it for now.