7

Assume that $1\le k \le n$ and let $\mathscr{Z}$ be the family of all subsets of $\{1,\ldots,n\}$ with at most $k$ elements. Pick a random element $X$ of $\mathscr{Z}$ (we consider the probablity distribution on $\mathscr{Z}$ is uniform, that is, each $X$ is chosen with probability $1/|\mathscr{Z}|$). What is the expected value for the random variable $\xi_n^k=|X|$, where $|X|$ stands for the cardinality of $X$?

In addition, what can be said about asymptotic behaviour of $\xi_n^{\lfloor n\delta\rfloor}$ for a fixed $\delta>0$?

  • 4
    How precise asymptotic behaviour do you need? It is not difficult to see that $\xi_n^{n\delta}\sim n\delta$ for fixed $0<\delta<1/2$, and $\xi_n^{n\delta}\sim n/2$ for $\delta\ge1/2$. More precisely, it should be $n\delta-O(1)$ in the first case, and $n/2-O(\sqrt n)$ in the second case. – Emil Jeřábek May 16 '18 at 18:45
  • 1
    Is the question in the body of the text really the same as the one in the title? – LSpice May 16 '18 at 20:41
  • Thanks Emil Jeřábek! These asymptotics seem sufficient for us. Is your answer related to Max Alekseyev answer below? – Dominik Kwietniak May 17 '18 at 06:55
  • 2
    @LSpice, I agree that it's not the same. Do you first pick a $k$-subset and then a subset of it, or do you first take all $(\leq k)$-subsets and then pick one of them. If $n = 2$ and $k = 1$, then the title question gives $1/4$ for ${1}$ and ${2}$, and $1/2$ for $\emptyset$. The content question yields $1/3$ each. No? – Christian Stump May 17 '18 at 14:09
  • Title has been changed to address the issues mentioned by LSpice and Christian Stump in their comments. – Dominik Kwietniak May 17 '18 at 14:23
  • 2
    As @EmilJeřábek perfectly explains in its answer : for $\delta < 1/2$ $\delta k -\xi_n^{\delta n}$ converge to the geometric law of parameter $\delta/(1-\delta)$. For $\delta > 1/2$ $\xi_n^{\delta n}$ converge to the gaussian of mean $n/2$ and variance $n/4$. And for $\delta =1/2$ it converge to the lower half of this gaussian. – RaphaelB4 May 17 '18 at 17:34

3 Answers3

10

As I understand the problem, the expected value is $$\frac{\sum_{i=0}^k i\binom{n}{i}}{\sum_{i=0}^k \binom{n}{i}}$$ which, for $k=n$, reduces by nice identities to $\frac{n}{2}$.

I don't know of nice formulas for partial sums of (weighted) binomial coefficients. Below are the simplified expressions for small values of $k$.

  • $k=1$: $\frac{n}{n+1} \sim 1$,
  • $k=2$: $\frac{2n^2}{n^2+n+2} \sim 2$,
  • $k=3$: $\frac{3n^3-3n^2+6n}{n^3+5n+6} \sim 3$,
  • $k=4$: $\frac{4n^4-12n^3+32n^2}{n^4-2n^3+11n^2+14n+24} \sim 4$.

(The coefficients in the denominator expressions are given by A054651; I don't find an OEIS entry for the analogous numerator coefficients.)

8

$\DeclareMathOperator\E{E}$As already noted by the other answers, $$\E\xi^k_n=\frac{\sum_{i=0}^ki\binom ni}{\sum_{i=0}^k\binom ni}.$$ One can then easily determine the asymptotics of $\E\xi_n^{\lfloor n\delta\rfloor}$ for fixed $0\le\delta\le1$:

Case 1: $0\le\delta<1/2$. Then $$k-\frac{\delta}{1-2\delta}\le\E\xi_n^k\le k,$$ where $k=\lfloor n\delta\rfloor$. This can be shown by approximation of $\binom ni$ by a geometric series. That is, we have $$0<i\le k\implies\binom n{i-1}=\frac i{n-i+1}\binom ni\le\frac i{n-i}\binom ni\le\frac\delta{1-\delta}\binom ni,$$ hence $$0\le j\le i\le k\implies \binom nj\le\left(\frac\delta{1-\delta}\right)^{i-j}\binom ni.$$

Thus, $$\begin{align} k-\E\xi^k_n=\frac{\sum_{i=0}^k(k-i)\binom ni}{\sum_{i=0}^k\binom ni} &=\frac{\sum_{j=1}^k\sum_{i=0}^{k-j}\binom ni}{\sum_{i=0}^k\binom ni}\\ &\le\frac{\sum_{j=1}^k\left(\frac\delta{1-\delta}\right)^j\sum_{i=j}^k\binom ni}{\sum_{i=0}^k\binom ni}\\ &\le\sum_{j=1}^k\left(\frac\delta{1-\delta}\right)^j\le\frac\delta{1-2\delta}. \end{align}$$

One can show that the lower bound is closer to the truth: $\E\xi^k_n=k-\frac\delta{1-2\delta}+O\bigl(\frac{\log n}n\bigr)$.

Case 2: $1/2<\delta\le1$. Then $\E\xi^k_n=\frac n2-O(\gamma^n)$ for some $\gamma<1$ (depending on $\delta$).

Indeed, Stirling bounds give $$2^n-\sum_{i=0}^k\binom ni=\sum_{i=k+1}^n\binom ni=O(\alpha^n)$$ for some constant $\alpha<2$, hence $$\E\xi^k_n=\frac{\sum_{i=0}^ni\binom ni+O(n\alpha^n)}{\sum_{i=0}^n\binom ni+O(\alpha^n)}=\frac n2+O\bigl(n(\alpha/2)^n\bigr).$$

Case 3: $\delta=1/2$. Then $$\frac n2-\frac{\sqrt n}2\le\E\xi^k_n\le\frac n2.$$ Indeed, if $Y$ is drawn from the binomial distribution $B(n,1/2)$, we have $$\frac n2-\E\xi^k_n=\E\left|Y-\frac n2\right|\le\sqrt{\E\left(Y-\frac n2\right)^2}=\sqrt{\operatorname{Var}Y}=\frac{\sqrt n}2.$$

In this case, approximation of the binomial distribution by Gaussian distribution with mean $n/2$ and variance $n/4$ suggests that the true value of $\E\xi^k_n$ should be roughly $\frac n2-\sqrt{\frac{n}{2\pi}}$, but I will not try to make this rigorous.

5

Let $f(n,k):=\sum_{i=0}^k\binom{n}{i}$. It can be seen that $$\sum_{i=0}^k i\binom{n}{i} = \sum_{i=1}^k n\binom{n-1}{i-1} = n\cdot f(n-1,k-1).$$ So, it remains to evaluate $$\frac{n\cdot f(n-1,k-1)}{f(n,k)},$$ where sharp bounds for the numerator and denominator are known -- e.g., see answers at Lower bound for sum of binomial coefficients?

Max Alekseyev
  • 30,425