I have been struggling quite a bit with reconciling my intuitive understanding of probability distributions with the weird properties that almost all topologies on probability distributions possess.

For example, consider a mixture random variable $X_n$: draw from a Gaussian centered at 0 with variance 1, and, with probability $\frac{1}{n}$, add $n$ to the result. The sequence of such random variables converges (weakly and in total variation) to a Gaussian centered at 0 with variance 1, yet the mean of $X_n$ is always $1$ and the variances diverge to $+\infty$. Because of that, I really don't like saying that this sequence converges.

edit: $X_n$ has density

$$p_n(x) = \frac{n-1}{n} g(x) + \frac{1}{n} g(x-n)$$

where $g$ is the density of the Gaussian with mean 0 and unit variance.
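
For concreteness, the moments and the total-variation distance to $g$ follow directly from this density:

$$E[X_n] = \frac{n-1}{n}\cdot 0 + \frac{1}{n}\cdot n = 1, \qquad \mathrm{Var}(X_n) = \frac{n-1}{n}\cdot 1 + \frac{1}{n}\left(1 + n^2\right) - 1 = n,$$

and, since $p_n - g = \frac{1}{n}\big(g(\cdot - n) - g\big)$, the convention $d_{TV}(p,q) = \frac{1}{2}\int |p - q|$ gives

$$d_{TV}(p_n, g) \leq \frac{1}{2n}\int \big(g(x-n) + g(x)\big)\,dx = \frac{1}{n}.$$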

It took me quite some time to remember everything I've forgotten about topologies, but I finally figured out what was so unsatisfying to me about such examples: the limit of the sequence is not a conventional distribution. In the example above, the limit is a weird "Gaussian of mean 1 and infinite variance". In topological terms, the set of probability distributions isn't complete under the weak topology (nor under TV, nor any of the other topologies I've looked at).

(Note: the problem remains if everything is phrased in terms of probability measures.)
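
A quick way to see these numbers concretely is to simulate. Here is a minimal NumPy sketch (function names and sample sizes are my own choices, not part of the original post) that draws from $p_n$ and checks that the sample mean stays near $1$ while the sample variance grows like $n$:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_X_n(n, size):
    """Draw from p_n: a standard Gaussian, shifted by +n with probability 1/n."""
    z = rng.standard_normal(size)
    shifted = rng.random(size) < 1.0 / n  # indicator of the rare shifted component
    return z + n * shifted

for n in (10, 100, 1000):
    x = sample_X_n(n, 1_000_000)
    # Theory: E[X_n] = 1 and Var(X_n) = n, even though d_TV(p_n, g) <= 1/n -> 0.
    print(f"n={n}: mean = {x.mean():.3f}, var = {x.var():.1f}")
```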

I then face the following questions:

  • Does there exist a topology under which the ensemble of probability distributions is complete?

  • If not, does that absence reflect an interesting property of the ensemble of probability distributions, or is it just boring?

Original post (crossposted from Cross Validated): https://stats.stackexchange.com/questions/186670/topologies-for-which-the-ensemble-of-probability-distributions-is-complete

  • The space of regular Borel probability measures on $\mathbb{R}$ is complete with respect to the total variation norm; by the Riesz representation theorem you can identify it with a closed subset of the dual of the Banach space $C_0(\mathbb{R})$, for instance. I'm a little confused by your example, but if the distributions really do converge in total variation then the limit is a regular Borel probability measure. – Paul Siegel Dec 17 '15 at 11:42
  • That's interesting. The $X_n$ do converge in TV (the distance is upper-bounded: $d(X_n,X_\infty)\leq 1/n$). Is completeness defined in some weird way for measures? – Guillaume Dehaene Dec 17 '15 at 15:41
  • Completeness isn't anything weird for measures, but the measures themselves can be weird. Could you elaborate on exactly what your $X_n$ are, perhaps by writing out their density functions? – Paul Siegel Dec 17 '15 at 15:51
  • I think that maybe the concept you need is compactness not completeness. The relevant concept is often called tightness. – kjetil b halvorsen Dec 17 '15 at 16:05
  • It seems your problem is the opposite of what you state: in your example, the sequence of measures does have an actual probability measure as a limit, but you want the limit to not be a probability measure. Maybe what you want is not a topology under which the space of probability measures is complete, but rather a completion of the space under your favorite topology. – Yoav Kallus Dec 17 '15 at 16:26
  • I have added the densities of the $X_n$ in the main post for @PaulSiegel. My problem is that the limit isn't really N(0,1), because the limit mean is 1 and the limit variance is infinite: maybe your way of phrasing it is the better way. Would it make sense to complete the space under TV, for example? – Guillaume Dehaene Dec 20 '15 at 18:45
  • What do you mean by what "the limit is really"? Under total variation, as you noted, the limit is really N(0,1). – Yoav Kallus Dec 20 '15 at 19:06
  • It seems the reason you don't like the convergence is that the moments don't converge? Maybe you should just look at convergence of moments directly? Something like in this question: http://mathoverflow.net/questions/102964/convergence-of-moments-implies-convergence-to-normal-distribution – Yoav Kallus Dec 20 '15 at 19:14
  • OK, I agree that your $X_n$ converge in TV to $N(0,1)$, and that the first two moments of $X_n$ converge to $1$ and $\infty$, respectively. But this is not an issue with completeness, it's an issue with continuity. I think you are really trying to ask: "Is there a topology on the space of probability distributions with the property that the functions $X \mapsto E(X)$ and $X \mapsto Var(X)$ defined on this space are continuous?" – Paul Siegel Dec 20 '15 at 20:17
  • I'm pretty sure that the answer to this question is "no" - there are just too many counterexamples. My guess is that the best you can hope for is convergence theorems for moments of specific classes of random variables, but I'm not sure. – Paul Siegel Dec 20 '15 at 20:20
  • @PaulSiegel I think you have your finger on the issue, but would just like to add that since not every probability distribution has finite variance, $X \mapsto \mathrm{Var}(X)$ isn't even well-defined -- unless you're thinking of it as taking values in the extended positive reals? – Yemon Choi Dec 21 '15 at 00:44
  • Guillaume: perhaps the following analogy would help. Consider $C[0,1]$ and a sequence $(f_n)\subset C[0,1]$ where $f_n(0)=0$, $f_n(n^{-2})=1/n$, $f_n(1)=1$ and we use piecewise-linear interpolation to define $f_n$ everywhere else. Now $C[0,1]$ is complete in the uniform norm, and $(f_n)$ is a sequence in the unit sphere of this Banach space which converges to the function $g(t)=t$ in the uniform norm. On the other hand, if we look at the Lipschitz constants of these functions $f_n$, we see that they blow up as $n\to\infty$, even though the limit function $g$ has Lipschitz constant $1$. – Yemon Choi Dec 21 '15 at 00:56
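
Yemon Choi's analogy above is easy to check numerically. A minimal sketch (the grid resolution is an arbitrary choice) confirming that $f_n \to g$ uniformly while the Lipschitz constants blow up:

```python
import numpy as np

def f_n(t, n):
    """Piecewise-linear function through (0, 0), (1/n^2, 1/n), (1, 1)."""
    return np.interp(t, [0.0, 1.0 / n**2, 1.0], [0.0, 1.0 / n, 1.0])

t = np.linspace(0.0, 1.0, 2_000_001)  # fine grid, resolves the kink at 1/n^2
for n in (10, 100, 1000):
    sup_dist = np.abs(f_n(t, n) - t).max()  # uniform distance to g(t) = t
    lipschitz = n  # slope of the first segment: (1/n) / (1/n^2) = n
    print(f"n={n}: sup|f_n - g| = {sup_dist:.4f}, Lipschitz constant = {lipschitz}")
```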

1 Answer

As I asserted in my comments, I think it is too much to hope for a reasonable topology on the space of random variables which makes the map $X \mapsto E(X)$ continuous. This is a bit like hoping that pointwise convergence or convergence in measure implies convergence in the $L^1$ norm; it seems reasonable, but there are simple counter-examples.

But all is not lost. In analysis one salvages the situation by making stronger assumptions: for instance, if $f_n \to f$ pointwise and $|f_n(x)| \leq g(x)$ for some integrable function $g$ and all $n$, then $\int f_n \to \int f$ (the dominated convergence theorem). There is a sort of counterpart to this in probability theory.

Definition: A sequence $X_n$ of random variables is uniformly integrable if:

  • $E(|X_n|)$ is uniformly bounded in $n$: there is a constant $K$ such that $E(|X_n|) \leq K$ for all $n$
  • For every $\varepsilon > 0$ there exists $\delta > 0$ such that $\int_A |X_n| \, dP < \varepsilon$ for all $n$ whenever $P(A) < \delta$

Uniform integrability is implied by the stronger (but more easily checked) condition that $E(|X_n|^{1 + \delta})$ is uniformly bounded for some $\delta > 0$.

Theorem: Suppose $X_n$ converges to $X$ in distribution and the sequence $|X_n|^k$ is uniformly integrable. Then $E(X_n^j) \to E(X^j)$ whenever $1 \leq j \leq k$. (Reference)
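
To connect this theorem back to the question: the $X_n$ from the original post are not uniformly integrable, which is exactly why $E(X_n) = 1$ fails to converge to the mean $0$ of the distributional limit. A rough numerical illustration, using the equivalent tail criterion $\sup_n E[|X_n| \mathbf{1}_{|X_n| > M}] \to 0$ as $M \to \infty$ (the truncation level and sample sizes below are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_X_n(n, size):
    """Same mixture as in the question: N(0,1), shifted by +n with prob. 1/n."""
    z = rng.standard_normal(size)
    return z + n * (rng.random(size) < 1.0 / n)

M = 20.0  # truncation level; UI would force this tail expectation to 0 uniformly in n
for n in (50, 500, 5000):
    x = sample_X_n(n, 2_000_000)
    tail = np.where(np.abs(x) > M, np.abs(x), 0.0).mean()
    # The rare component has mass 1/n at distance ~n, contributing ~1 for every n > M.
    print(f"n={n}: E[|X_n| 1(|X_n|>{M:g})] = {tail:.3f}")
```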

There are some other results of this flavor on the Wikipedia page on convergence of random variables, but this theorem is the best result that I know.

Paul Siegel
  • Uniform integrability reminds me a little of equicontinuity. I wonder if the theorem above is a special case of a suitable generalization of the Arzela-Ascoli theorem. – Paul Siegel Dec 20 '15 at 23:25
  • I sort of agree with the basic point, but the choice of example in your first paragraph is unfortunate, if I am not mistaken; the map $f\to \int f$ is continuous as a linear functional on the normed space $L^1({\bf R})$ – Yemon Choi Dec 21 '15 at 00:45
  • @YemonChoi Arg, continuity of that map is essentially a tautology. After recovering from flashbacks to my qualifying exams in graduate school, I amended the answer. – Paul Siegel Dec 21 '15 at 01:28
  • @PaulSiegel: thank you for the answer! However, there is a simple metric topology which I think works (though, since I've forgotten my topology, I'm not completely sure what "making the map continuous" means exactly): the Wasserstein-1 metric. More generally, the Wasserstein-k metric (I think) works for all statistics with at most polynomial growth. – Guillaume Dehaene Jan 04 '16 at 07:36
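
This last suggestion is easy to probe empirically with SciPy's one-dimensional `wasserstein_distance` on samples (the sample sizes are arbitrary choices of mine): the estimated $W_1$ distance between $X_n$ and $N(0,1)$ stays near $1$ instead of vanishing, because transporting the $\frac{1}{n}$ chunk of mass back across distance $n$ always costs about $1$. So under $W_1$ the sequence simply does not converge to $N(0,1)$, which is consistent with $X \mapsto E(X)$ being continuous in that metric (since $|E(X) - E(Y)| \leq W_1(X, Y)$).

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(2)

def sample_X_n(n, size):
    """Same mixture as in the question: N(0,1), shifted by +n with prob. 1/n."""
    z = rng.standard_normal(size)
    return z + n * (rng.random(size) < 1.0 / n)

ref = rng.standard_normal(1_000_000)  # sample from the candidate limit N(0,1)
for n in (10, 100, 1000):
    x = sample_X_n(n, 1_000_000)
    # Transporting mass 1/n over distance n costs about 1, for every n.
    print(f"n={n}: empirical W1(X_n, N(0,1)) = {wasserstein_distance(x, ref):.3f}")
```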