29

The following question was asked and not (yet) answered at Math Stack Exchange.

There are $n$ horses. At a time only $k$ horses can run in the single race. What is the minimum number of races required to find the top $m$ fastest horses? Please explain your answer.

The $n = 25, k = m = 5$ case was a Google interview question and there are various answers on the web. But I am not sure what the right answer should be for this. Any ideas?

  • Try math.stackexchange instead f MO. – Misha Nov 30 '12 at 16:33
  • 3
    @Misha: I have already posted the MSE link. – Debanjan Chanda Nov 30 '12 at 16:44
  • 3
    This seems like a nontrivial algorithms question to me. It may be a standard result for the right people, but I don't think it is obvious and I would be interested in learning the answer. Voting to reopen. – David E Speyer Nov 30 '12 at 20:43
  • 7
    If $k=2$, this is a well studied but not solved problem http://en.wikipedia.org/wiki/Partial_sorting . I have to assume that someone knows something about larger $k$. – David E Speyer Dec 01 '12 at 00:32
  • @David: This is even weaker than partial sorting, since we don't need to know the exact rank of elements $1$ to $m$, just the set of elements in those positions. – Zack Wolske Dec 02 '12 at 23:37
  • Indeed, for k=2 you are looking at $m$ order statistics, which is linear in $n$. – Ralph Furman Dec 03 '12 at 00:19
  • It is clear that O(nlogn) is an upper bound on what is necessary, simply by using a race as a comparison (and forgetting about 3 of the horses) and doing something like mergesort to get a complete ranking. For a complete ranking with races, information theory should yield the same order as a lower bound. For ranking a small fraction m of the n horses, it may be possible to do it in O(nlogm), but I would expect the constants for small m to outweigh logm, so practical methods would look like O(nm) in those cases. Gerhard "Ask Me About System Design" Paseman, 2012.12.02 – Gerhard Paseman Dec 03 '12 at 00:42
  • 7
    I just wanted to say that this is the same as Generalization of a horse-racing puzzle. – Dylan Pizzo Dec 24 '13 at 23:02
  • I believe the Google interview question was $m=3$ rather than $m=5$. – Eric Naslund Dec 25 '13 at 01:29
  • 8
    It isn't clearly stated whether we have to plan all the races in advance or we can plan each race after knowing the results of the previous races. The answer could change. – Brendan McKay Dec 25 '13 at 01:36
  • Phrase "Please explain your answer" sounds... I don't know. Is it necessary? – Włodzimierz Holsztyński Apr 10 '14 at 05:37
  • 3
    This has been asked before, now I cannot find the other question but see http://www.researchgate.net/publication/3042594_Sorting_n_objects_with_a_k-sorter – domotorp Apr 10 '14 at 09:57

1 Answers1

15

A trivial lower bound is $n/k$, since clearly every horse must race to obtain the answer.

I think we can get $O(n/k)$ upper bound, by adapting the median-of-medians selection algorithm.

First, note that, up to a constant factor, this problem is equivalent to finding the $m$th best horse. For the reduction one way, simply find the $m$ best horses and the $m-1$ best horses and see which horse mysteriously disappeared. For the reduction the other way, find the $m$th best horse, then race every other against it to find which are better and which are worse. (In fact, you don't need to do this - it's not hard to check that if you've found the $m$th best horse through repeated racing, you already know which are better and which are worse.)

For $k=2$, the median-of-medians algorithm is known to give an $O(n)$ time bound, as was pointed out by Ralph Furmaniak. We will also use this to handle $k\leq 4$.

So assume $k\geq 5$ and is odd. Let $T(n)$ be the time it takes to find the $m$th horse among $n$ horses. Then divide the horses into groups of $k$, race them in time $n/k$, and take the median of each. Then select the median of those medians in time $T(n/k)$, and pivot on it in time $n/(k-1)$. (to pivot, the pivot horse must race each other horse only once. This allows us to remove at least $n(k+1)/4k$ of the horses, so we can find the $m$th horse in time $T(n (3k-1)/4k)$. So we get:

$$T(n) \leq T\left(\frac{n}{k}\right) + T\left( \frac{n(3k-1)}{4k}\right) + \frac{n}{k}+\frac{n}{k-1} $$

By induction,

$$T(n) \leq \frac{4k}{k-3} \cdot \left( \frac{n}{k}+\frac{n}{k-1} \right) = O \left(\frac{n}{k}\right)$$

In fact we obtain an explicit constant of $8+o(1)$.

This ignores non-unique divisibility, which should only lead to a small error term unless $k$ is very large as a fraction of $n$.

Will Sawin
  • 135,926