I'm not sure if the question even makes sense, but I wonder if there is any categorical reason that explains the importance of the Gaussian/normal distribution. In ordinary probability theory, I guess the central limit theorem explains why the normal distribution is important (or at least shows up a lot). I just found out that there is something called categorical probability theory (and also this) that I don't fully understand. From this viewpoint, do we still have a central limit "theorem"? Otherwise, can we still give any meaningful result about a "categorical" normal distribution (whatever that is)? It would be great if it satisfied a certain universal property.

Seewoo Lee
  • As far as I know this is still wide open, although it's certainly something that people in the categorical probability community have given some thought. We're currently working on a categorical formulation and proof of certain laws of large numbers, including the Glivenko–Cantelli theorem. We have some initial indications that it may be possible to treat the central limit theorem in a similar fashion, but I don't want to speculate too much here. I'll be happy to write an answer as soon as we have more to say. – Tobias Fritz Jan 13 '24 at 02:02
  • @TobiasFritz It is good to know that people are actually interested in this problem - thank you for your reply! I hope there are some updates in the near future. – Seewoo Lee Jan 13 '24 at 05:15

1 Answer

$\newcommand{\R}{\mathbb{R}}$ Although this doesn't answer the question, it may be worth noting that there is a Markov category $\mathsf{Gauss}$ of Gaussian distributions and Gaussian Markov kernels, described in Section 6 of Fritz's paper A synthetic approach to Markov kernels, conditional independence and theorems on sufficient statistics.

The objects in the category are the natural numbers. A morphism $n\to m$ is a triple $(M,C,s)\in\R^{m\times n}\times\R^{m\times m}\times\R^{m\times1}$, where $C$ is symmetric positive semidefinite. Such a triple can be thought of as the linear statistical model $$Y:=MX+\xi, \tag{1}\label{1}$$ where $X$ is a random vector in $\R^{n\times1}$ and $\xi$ is a Gaussian random vector in $\R^{m\times1}$ (independent of $X$) with mean $s$ and covariance matrix $C$. Such a statistical model can alternatively be viewed as a transition kernel, as these appear, for example, in Gauss–Markov models (where usually $s = 0$ is assumed).

The composition of two morphisms $$(M,C,s)\colon n\to m\quad\text{and}\quad(N,D,t)\colon m\to k$$ is obtained by recalling \eqref{1} and similarly writing $Z:=NY+\eta$, with $\eta$ independent of $\xi$. This gives $Z=NMX+N\xi+\eta$, with $E(N\xi+\eta)=Ns+t$ and $\operatorname{Cov}(N\xi+\eta)=NCN^T+D$, which results in the following definition of the composition of the two morphisms: $$(N,D,t)\circ(M,C,s):=(NM,NCN^T+D,Ns+t).$$
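
For concreteness, here is a minimal NumPy sketch of this data and of the composition rule above; the class name `GaussMorphism` and the code layout are my own choices and are not taken from the cited paper.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class GaussMorphism:
    """A morphism n -> m of Gauss: Y = M X + xi, where xi ~ N(s, C) is independent of X."""
    M: np.ndarray  # shape (m, n)
    C: np.ndarray  # shape (m, m), symmetric positive semidefinite covariance
    s: np.ndarray  # shape (m,), mean of the noise term xi

    def compose(self, other: "GaussMorphism") -> "GaussMorphism":
        """Composite self after other (apply `other` first, then `self`).

        For other = (M, C, s): n -> m and self = (N, D, t): m -> k,
        the composite is (N M, N C N^T + D, N s + t): n -> k.
        """
        N, D, t = self.M, self.C, self.s
        M, C, s = other.M, other.C, other.s
        return GaussMorphism(N @ M, N @ C @ N.T + D, N @ s + t)
```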

To see how this category is relevant in probability and statistics, consider for example the Kalman filter. If one formulates the theory of hidden Markov models and the Bayes filter for Markov categories in general, then this theory can be instantiated in $\mathsf{Gauss}$, where it specializes to the usual formulas that define the Kalman filter. This is part of a preprint that should appear very soon; see also this talk at ACT 2023.
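
As a hedged illustration of one ingredient of this (reusing the `GaussMorphism` sketch above): a Gaussian distribution on $\R^n$ can be encoded as a morphism $0\to n$ with an $n\times 0$ matrix, and composing it with a linear-Gaussian dynamics morphism reproduces the Kalman prediction (time-update) step. The names below (`state`, `dynamics`, `A`, `Q`) are my own and are not taken from the preprint or talk.

```python
import numpy as np

n = 2

# Prior state distribution N(mean, P), viewed as a morphism 0 -> n.
state = GaussMorphism(M=np.zeros((n, 0)),
                      C=np.eye(n),                               # P
                      s=np.array([0.0, 1.0]))                    # mean

# Linear-Gaussian dynamics x' = A x + w with w ~ N(0, Q), a morphism n -> n.
dynamics = GaussMorphism(M=np.array([[1.0, 1.0], [0.0, 1.0]]),   # A
                         C=0.1 * np.eye(n),                      # Q
                         s=np.zeros(n))

# Composition in Gauss gives the predicted state distribution:
predicted = dynamics.compose(state)
# predicted.s equals A @ mean           (predicted mean)
# predicted.C equals A @ P @ A.T + Q    (predicted covariance)
```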

Final remark: One can also construct a larger Markov category containing $\mathsf{Gauss}$ by dropping the assumption of $\xi$ being Gaussian, and taking general linear statistical models as morphisms that compose in a similar way.

Tobias Fritz
Iosif Pinelis
  • I've taken the liberty to rework this answer in a neutral tone, and also added a bit of additional info on how the category $\mathsf{Gauss}$ is relevant. – Tobias Fritz Jan 14 '24 at 00:11
  • @TobiasFritz : If "[o]ne can also construct a larger Markov category containing $\mathsf{Gauss}$ by dropping the assumption of $\xi$ being Gaussian", why even consider this $\mathsf{Gauss}$ category? I mean, are there any interesting results specific to the $\mathsf{Gauss}$ category, as opposed to other Markov categories? – Iosif Pinelis Jan 14 '24 at 01:11
  • @TobiasFritz : More generally, I have a question for you as apparently an expert in category theory. As I understand it, the idea, w.r.t. probability, is to translate a probabilistic object into the category theory language, then use category theory on the translated object to produce some result, then translate that result back into probabilistic terms and thus get something that was too hard, or not possible at all, to get by probabilistic tools. Is this understanding correct? If so, have there been any interesting results obtained this way? – Iosif Pinelis Jan 14 '24 at 01:21
  • That's a good question, and I'm not really aware of any results that would apply to $\mathsf{Gauss}$ but not more generally. The significance of $\mathsf{Gauss}$ as I see it is rather to formalize the idea that there is something like "Gaussian probability theory" which can function as a complete standalone theory of uncertainty. This is often used in particular in signal processing (as for the Kalman filter mentioned above; the Rauch-Tung-Striebel smoother is another instance of this idea) – Tobias Fritz Jan 14 '24 at 01:25
  • Yes, that understanding of categorical probability would be one of its goals, though getting new results for probability is of course a high bar. Perhaps the most interesting result that we have in this direction so far is that every Markov kernel between standard Borel spaces splits, which strengthens a theorem of Blackwell. This has apparently not been proven measure-theoretically before (undoubtedly partly because the statement itself has a categorical flavour). – Tobias Fritz Jan 14 '24 at 01:30
  • A different point of view: categorical probability relates to ordinary probability in the same way as ring theory relates to the theory of concrete number systems. You can prove some things about number systems in ring-theoretic terms, but doing so gives you vast generalizations of those results. In categorical probability, a Markov category can be thought of as a theory of uncertainty, of which probability theory is only one. (E.g. non-determinism in the computer science sense would be another.) Any result in categorical probability automatically applies to many theories of uncertainty. – Tobias Fritz Jan 14 '24 at 01:37
  • Conceptually, my feeling is that the CLT may turn out to be analogous to the fundamental theorem of algebra. The latter is a very specific result about one particular number system. But once you've developed enough theory of rings and fields, you can largely prove the fundamental theorem of algebra purely algebraically (together with some crucial analytic input). Similarly, the CLT is a result about one particular theory of uncertainty, and one may hope that categorical probability can become powerful enough to be able to do the heavy lifting in the proof while being more broadly applicable. – Tobias Fritz Jan 14 '24 at 01:47
  • @TobiasFritz : Thank you for your responses. My impression, as one who knows next to nothing in category theory, is that it formalizes the general idea of composition of relations (with functorial relations between different classes of compositions). In probability, compositions of relations (especially long chains or more complicated graphs of such compositions) don't seem to play a major role, except, of course, for Markov chains or perhaps Markov processes in general. – Iosif Pinelis Jan 14 '24 at 01:55
  • @TobiasFritz : Even for Markov chains, specific probabilistic/analytical tools -- rather than those of category theory -- seem to be needed to obtain interesting results. What are your thoughts on this? – Iosif Pinelis Jan 14 '24 at 01:55
  • I guess it depends on what exactly you want to do, but there are surprisingly many things for which a categorical treatment is possible. For example, one might think that the de Finetti theorem requires analytical tools, but then again we've given a categorical proof. The trick is to find suitable axioms for Markov categories which enable one to formulate the statement and conduct the proof in categorical terms, and the analytical input is then reduced to showing that the axioms hold. – Tobias Fritz Jan 14 '24 at 02:02
  • Based on the same axioms as for de Finetti, we also have a tentative categorical proof of the Aldous-Hoover theorem, which is considered a much deeper result. On the other hand, certainly other types of statements, say on mixing times for Markov chains and eigenvalues, will be much less amenable to a categorical treatment. Just like how some statements about the real numbers do not have interesting generalizations to abstract algebra. Does this all seem reasonable to you? – Tobias Fritz Jan 14 '24 at 02:03
  • @TobiasFritz : Thank you for your further responses. – Iosif Pinelis Jan 14 '24 at 02:41
  • One more clarification: I agree that "long chains" of compositions don't play a major role in probability in general. But Markov categories are symmetric monoidal categories, which means that we also have a "parallel" composition of morphisms. In probability, this corresponds to forming the product of Markov kernels, of which product measures are a special case. Now combinations of these two kinds of compositions are ubiquitous in probability, even if it's not always obvious. – Tobias Fritz Jan 14 '24 at 17:14
  • @TobiasFritz : I see. Product measures are a game changer indeed. I should try to study this. – Iosif Pinelis Jan 14 '24 at 17:19