If I randomly sample with replacement $P$ times from a set of all possible binary strings of length $L$, what is a good lowerbound on the expected minimum Hamming distance between any two of my $P$ strings? Can we generalize this for larger ternary/etc. string alphabets?
-
4possibly relevant --- http://mathoverflow.net/questions/16359/minimum-hamming-distance-distribution-in-a-random-subset-of-binary-vectors – tergi Aug 14 '12 at 15:14
-
@tergi Gabor Lugosi's bound seems to be quite loose. I mean absolutely no offense when saying this, but is it possible to do a bit better? – Roger S. Aug 14 '12 at 16:28
-
@tergi I also edited the problem to specify that I'm looking for a lowerbound on the expected minimum Hamming distance. – Roger S. Aug 14 '12 at 16:31
-
A trivial bound can be obtained by considering this as an instance of the "birthday problem". There are $2^L$ birthdays and $p$ people. The probability we get no matching birthdays is $(2^L-p+1)(2^L-p+2)\cdots 2^L/(2^L)^p$. Set $z = 2^L-p$. Then, the probability of no matches is $(1+z/p)^{-p} z(z+1)\cdots (z+p)/z^{p+1} \sim p! z^{p-1} / (z+p)^p \Gamma(z)$, and indeed, the convergence $p! z^p / \prod_{k=0}^p (z+k) \to \Gamma(z)$ is monotonically increasing. So, we have $$ \mathbb E \min_{i,j} H(x_i,x_j) \geq \frac{p! z^{p-1}}{(z+p)^p \Gamma(z)} \geq \frac{p! e^{-p^2/z}}{z \Gamma(z)}.$$ – cardinal Aug 14 '12 at 17:52
-
The OP might want to enlighten us as to what sort of bound he is hoping for... – Igor Rivin Aug 14 '12 at 17:56
-
Not so related but still: http://mathoverflow.net/questions/89399/how-many-vectors-of-hamming-weight-l-in-random-k-dimensional-subspace-of-f-2n – Alexander Chervov Aug 15 '12 at 10:55
1 Answers
The expected value of the distance is the sum of the probabilities that the distance is greater than $d$ for $d=0,1,2, ...$. Unfortunately, it's hard to determine the exact value of this probability. Whether this is greater than $0$ determines whether a code of that size exists.
There are some easy bounds on the probabilities which translate to bounds on the expected minimum Hamming distance. Take any ordering of the points, and check whether each point is consistent with the previous points. As an upper bound, the balls of radius $\lfloor d/2 \rfloor$ about the previous points must be disjoint, and the current point must not be contained in these balls. As a lower bound, each previous choice rules out at most a ball of radius $d$ new points.
Let $B(L,d,q) = \sum_{i=0}^d {L \choose i}(q-1)^i$ be the size of the ball of radius $d$ with an alphabet of size $q$. Let $Pr(L,d,q,P)$ be the probability that $P$ random points are at distances strictly greater than $d$ from each other.
$$ \prod_{i=1}^{P-1} \max(0,1-\frac{iB(L,d,q)}{q^L}) \le Pr(L,d,q,P) \le \prod_{i=1}^{P-1} \max(0,1-\frac{iB(L,\lfloor d/2 \rfloor, q)}{q^L}).$$
The upper bound on the expected minimum Hamming distance from summing the probabilities is precisely twice the lower bound, so these bounds get you within a factor of $2$. Actually, that upper bound on the probabilities is so weak that a better bound in many situations is $Pr(L,d,q,P) \le (1-B(L,d,q)/q^L)^{P-1}$. You get that by saying that the second and subsequent points have to avoid the ball of radius $d$ about the first point.
For example, suppose you take $q=2, L=20, P=10.$ The lower bound on the expected minimum Hamming distance is $5.10246$. If you use the second upper bound when it is better, the upper bound on the expected minimum distance is $6.71209.$

- 27,806
-
@Douglas Zare, also, how do we justify the assumption for the upper bound that balls of radius Floor[d/2] about the previous points must be disjoint? – Roger S. Aug 15 '12 at 16:02
-
If not, then two of the previous points must be within distance $d$ of each other. – Douglas Zare Aug 15 '12 at 16:57