Is there an introduction to probability theory from a structuralist/categorical perspective?

Question

The title really is the question, but allow me to explain.

I am a pure mathematician working outside of probability theory, but the concepts and techniques of probability theory (in the sense of Kolmogorov, i.e., probability measures) are appealing and potentially useful to me. It seems to me that, perhaps more than most other areas of mathematics, there are many, many nice introductory (as well as not so introductory) texts on this subject.

However, I haven't found any that are written from what is arguably the dominant school of thought of contemporary mainstream mathematics, i.e., from a structuralist (think Bourbaki) sensibility. E.g., when I started writing notes on the texts I was reading, I soon found that I was asking questions and setting things up in a somewhat different way. Here are some basic questions I couldn't stop myself from asking:

[0) Define a Borel space to be a set $X$ equipped with a $\sigma$-algebra of subsets of $X$. This is already not universally done (explicitly) in standard texts, but from a structuralist approach one should gain some understanding of such spaces before one considers the richer structure of a probability space.]

1) What is the category of Borel spaces, i.e., what are the morphisms? Does it have products, coproducts, initial/final objects, etc.? A significant example here is the notion of the product Borel space, which is exactly what you would expect if you know about the product topology, but which seemed underemphasized in the standard treatments.
2) What is the category of probability spaces, or is this not a fruitful concept (and why?)? For instance, a subspace of a probability space is, apparently, not a probability space: is that a problem? Is the right notion of morphism of probability spaces a measure-preserving function?
3) What are the functorial properties of probability measures? E.g., what are the basic results on pushing them forward, pulling them back, passing to products and quotients, etc.? Here again I will mention that the product of an arbitrary family of probability spaces (a very useful-looking concept!) seems not to be treated in most texts. Not that it's hard to do: see e.g.

http://alpha.math.uga.edu/~pete/saeki.pdf

I am not a category theorist, and my taste for how much categorical language to use is probably towards the middle of the spectrum: that is, I like to use a very small categorical vocabulary (morphisms, functors, products, coproducts, etc.) as often as seems relevant (which is very often!). It would be a somewhat different question to develop a truly categorical take on probability theory. There is definitely some nice mathematics here, e.g. I recall an arxiv article (unfortunately I cannot put my hands on it at this moment) which discussed independence of events in terms of tensor categories in a very persuasive way. So answers which are more explicitly categorical are also welcome, although I wish to be clear that I'm not asking for a categorification of probability theory per se (at least, not so far as I am aware!).

I am certainly not an expert, but I was looking for a similar thing, and found Dudley's book (http://books.google.com/books?id=Wv_zxEExK3QC&lpg=PP1&dq=dudley%20probablity&pg=PA259#v=onepage&q&f=false) promising. He doesn't mention categories at all, but it seems that he has them in mind. In particular, he defined "measurable function" between any two measurable spaces (p. 116), [which is different from the definition in Rudin]. Also, while he proves the existence of countable products of probability spaces, he does remark on converting the proof to an arbitrary product (p. 259). — user2734, Apr 08 '10 at 16:16
This is not developed enough to be a (partial) answer rather than a comment, but see perhaps: http://golem.ph.utexas.edu/category/2007/02/category_theoretic_probability_1.html (and other google/Mathscinet results for "Giry monad") — Yemon Choi, Apr 08 '10 at 16:16
One thing I thought I'd mention - as a probabilist manqué - is a comment at the beginning of Williams' Probability with Martingales, where he says something along the lines of "it would be nice if we could think of random variables as equivalence classes of functions rather than functions, so that we don't need to keep inserting 'a.e.' everywhere; but this point of view runs into trouble when dealing with continuous-time stochastic processes". Which implies he is not keen on a 'structuralist POV', although it doesn't rule out the possibility. — Yemon Choi, Apr 08 '10 at 16:20
Something that you may want to consider is the fact that probability spaces are not the essential objects in probability, for at least two reasons. First, it is very common to change the underlying probability space, as long as the distributions of the relevant random variables remain the same. This allows one to consider new events along the way. As suggested in Neel's answer, this may have a categorical formulation. But worse than that is the fact that often (every time martingales appear, at least) you want to leave the space unchanged and vary the sigma-algebra. — Andrea Ferretti, Apr 09 '10 at 16:29
Indeed, one of the major differences between measure theory and probability theory (besides the perspective being completely different) is that in measure theory one fixes one sigma algebra, and in probability one considers relationships between multiple sigma algebras. — Mark Meckes, Apr 09 '10 at 16:54
A beautiful, structural account of Borel spaces and measurable mappings is given in: http://www.ma.utexas.edu/mp_arc/c/02/02-156.pdf The category of Borel spaces and measurable mappings is sometimes denoted by meas (for measurable spaces, the nowadays more common name for Borel spaces), but the only real application I know of is in epistemic game theory, see for example chapter 7 here: https://scholarworks.iu.edu/dspace/bitstream/handle/2022/7065/umi-indiana-1146.pdf?sequence=1 — Michael Greinecker, Jan 01 '12 at 00:55
@Michael and @Pete I am a bit confused by the use of Borel space for measurable space and not for a Borel subset of a Polish space. Is it a usual convention? — SBF, Mar 31 '12 at 20:06
@Ilya: I believe this use of "Borel space" is a reasonably standard convention, although of course it is not the only one. — Pete L. Clark, Apr 06 '12 at 03:36
A paper from 2013: A categorical foundation for Bayesian probability by J. Culbertson and K. Sturtz — Andrew, Dec 05 '17 at 00:25
A paper that I can never seem to find but want to read is the PhD thesis by Fred Linton. According to the math genealogy database it is entitled "The Functorial Foundations of Measure Theory". It is held at Columbia University, the US Library of Congress, and a university in Germany, from what I can tell. — Samantha Y, Feb 04 '19 at 07:14
An extremely belated addition to @MichaelGreinecker's link to Berberian's notes (which I see include the author's lament for the butchery perpetrated by the published printed version): https://web.ma.utexas.edu/mp_arc-bin/mpa?yn=02-156 is a link with an abstract and slightly more metadata — Yemon Choi, May 16 '20 at 06:24
@SamanthaY: Linton's PhD thesis is available on his homepage: http://tlvp.net/~fej.math.wes/FredLintonPhDThesis_compressed.pdf — Dmitri Pavlov, May 06 '21 at 02:07

Dmitri Pavlov · Accepted Answer · 2020-05-15T20:24:36.693

$\def\Spec{\mathop{\rm Spec}} \def\R{{\bf R}} \def\Ep{{\rm E}^+} \def\L{{\rm L}} \def\EpL{\Ep\L}$ One can argue that an object of the right category of spaces in measure theory is not a set equipped with a σ-algebra of measurable sets, but rather a set $S$ equipped with a σ-algebra $M$ of measurable sets and a σ-ideal $N$ of negligible sets, i.e., sets of measure 0. The reason for this is that you can hardly state any theorem of measure theory or probability theory without referring to sets of measure 0. However, objects of this category contain less data than the usual measured spaces, because they are not equipped with a measure. Therefore I prefer to call them enhanced measurable spaces, since they are measurable spaces enhanced with a σ-ideal of negligible sets. A morphism of enhanced measurable spaces $(S,M,N)→(T,P,Q)$ is a map $S\to T$ such that the preimage of every element of $P$ is a union of an element of $M$ and a subset of an element of $N$ and the preimage of every element of $Q$ is a subset of an element of $N$.
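As a small illustration of this definition (a sketch only; the symbols $\mathcal L$ and $\mathcal Z$, standing for the Lebesgue measurable and Lebesgue null subsets of $\R$, are introduced just for this example), the two conditions on a map $f\colon S\to T$ can be written as

$$f^{-1}(B)=A\cup Z_0\ \text{ with } A\in M,\ Z_0\subset Z\in N\quad(B\in P),\qquad f^{-1}(C)\subset Z'\in N\quad(C\in Q).$$

Take $S=T=\R$, $M=P=\mathcal L$, $N=Q=\mathcal Z$. The dilation $f(x)=2x$ satisfies both conditions, since preimages of measurable sets are measurable and preimages of null sets are null. The constant map $g(x)=0$ does not:

$$\{0\}\in\mathcal Z,\qquad g^{-1}(\{0\})=\R\not\subset Z'\ \text{ for any } Z'\in\mathcal Z.$$

So morphisms in this sense are not allowed to collapse a non-negligible set onto a negligible one.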

Irving Segal proved in “Equivalences of measure spaces” (see also Kelley's “Decomposition and representation theorems in measure theory”) that for an enhanced measurable space $(S,M,N)$ that admits a faithful measure (meaning $μ(A)=0$ if and only if $A∈N$) the following properties are equivalent.

1. The Boolean algebra $M/N$ of equivalence classes of measurable sets is complete;
2. The space of equivalence classes of all bounded (or unbounded) real-valued functions on $S$ modulo equality almost everywhere is Dedekind-complete;
3. The Radon-Nikodym theorem is true for $(S,M,N)$;
4. The Riesz representation theorem is true for $(S,M,N)$ (the dual of $\L^1$ is isomorphic to $\L^∞$);
5. Equivalence classes of bounded functions on $S$ form a von Neumann algebra (alias W*-algebra).

An enhanced measurable space that satisfies these conditions (including the existence of a faithful measure) is called localizable. This theorem tells us that if we want to prove anything nontrivial about measurable spaces, we had better restrict ourselves to localizable enhanced measurable spaces. We also have a nice illustration of the claim I made in the first paragraph: none of these statements would be true without identifying objects that differ on a set of measure 0. For example, take a nonmeasurable set $G$ and a family of singleton subsets of $G$ indexed by themselves. This family of measurable sets does not have a supremum in the Boolean algebra of measurable sets, thus disproving a naive version of (1).

But restricting to localizable enhanced measurable spaces does not eliminate all the pathologies: one must further restrict to the so-called compact and strictly localizable enhanced measurable spaces, and use a coarser equivalence relation on measurable maps: $f$ and $g$ are weakly equal almost everywhere if for any measurable subset $B$ of the codomain the symmetric difference $f^*B⊕g^*B$ of preimages of $B$ under $f$ and $g$ is a negligible subset of the domain. (For codomains like real numbers this equivalence relation coincides with equality almost everywhere.)
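A toy example may help to see how weak equality almost everywhere can be strictly coarser than equality almost everywhere. This is only a sketch; the specific spaces are chosen for illustration and do not come from the answer. Let the domain be $[0,1]$ with its Lebesgue measurable sets and Lebesgue null sets, and let the codomain be $T=\{0,1\}$ with the trivial σ-algebra $\{\emptyset,T\}$ and $\{\emptyset\}$ as the only negligible set (it carries the faithful finite measure $\mu(T)=1$). For the constant maps $f\equiv 0$ and $g\equiv 1$ we have

$$f^*\emptyset⊕g^*\emptyset=\emptyset,\qquad f^*T⊕g^*T=[0,1]⊕[0,1]=\emptyset,$$

so $f$ and $g$ are weakly equal almost everywhere, although $f(x)\ne g(x)$ for every $x\in[0,1]$. The codomain has too few measurable sets to tell its two points apart, which is exactly the situation the coarser relation is designed to handle.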

An enhanced measurable space is strictly localizable if it splits as a coproduct (disjoint union) of σ-finite (meaning there is a faithful finite measure) enhanced measurable spaces. An enhanced measurable space $(X,M,N)$ is (Marczewski) compact if there is a compact class $K⊂M$ such that for any $m∈M∖N$ there is $k∈K∖N$ such that $k⊂m$. Here a compact class is a collection $K⊂2^X$ of subsets of $X$ such that for any $K'⊂K$ the following finite intersection property holds: if for any finite $K''⊂K'$ we have $⋂K''≠∅$, then also $⋂K'≠∅$.
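As a sanity check, the real line with its Lebesgue measurable sets and Lebesgue null sets should satisfy both definitions. The following is a sketch under that assumption, with $\lambda$ denoting Lebesgue measure (notation introduced here only for the example). A faithful finite measure is given by a Gaussian density:

$$\mu(A)=\int_A e^{-x^2}\,dx,\qquad \mu(\R)=\sqrt{\pi}<\infty,\qquad \mu(A)=0\iff\lambda(A)=0.$$

For compactness, take $K$ to be the collection of compact subsets of $\R$. It is a compact class because a family of compact subsets of a Hausdorff space with the finite intersection property has nonempty intersection. Inner regularity of Lebesgue measure,

$$\lambda(m)=\sup\{\lambda(k) : k\subset m,\ k \text{ compact}\},$$

then shows that every non-negligible measurable $m$ contains a non-negligible compact $k$.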

The best argument for such restrictions is the following Gelfand-type duality theorem for commutative von Neumann algebras.

Theorem. The following 5 categories are equivalent.

1. The category of compact strictly localizable enhanced measurable spaces with measurable maps modulo weak equality almost everywhere.
2. The category of hyperstonean topological spaces and open continuous maps.
3. The category of hyperstonean locales and open maps.
4. The category of measurable locales (and arbitrary maps of locales).
5. The opposite category of commutative von Neumann algebras and normal (alias ultraweakly continuous) unital *-homomorphisms.

I actually prefer to work with the opposite category of the category of commutative von Neumann algebras, or with the category of measurable locales. The reason for this is that the point-set definition of a measurable space exhibits immediate connections only (perhaps) to descriptive set theory, and with additional effort to Boolean algebras, whereas the description in terms of operator algebras or locales immediately connects measure theory to other areas of the central core of mathematics (noncommutative geometry, algebraic geometry, complex geometry, differential geometry, topos theory, etc.).

Additionally, note how the fourth category (measurable locales) is a full subcategory of the category of locales. Roughly, the latter can be seen as a slight enlargement of the usual category of topological spaces, for which all the usual theorems of general topology continue to hold (e.g., Tychonoff, Urysohn, Tietze, various results about paracompact and uniform spaces, etc.). In particular, there is a fully faithful functor from sober topological spaces (which includes all Hausdorff spaces) to locales. This functor is not surjective, i.e., there are nonspatial locales that do not come from topological spaces. As it turns out, all measurable locales (excluding discrete ones) are nonspatial. Thus, measure theory is part of (pointfree) general topology, in the strictest sense possible.

The non-point-set languages (2–5) are also easier to use in practice. Let me illustrate this statement with just one example: when we try to define measurable bundles of Hilbert spaces on a compact strictly localizable enhanced measurable space in a point-set way, we run into all sorts of problems if the fibers can be nonseparable, and I do not know how to fix this problem in the point-set framework. On the other hand, in the algebraic framework we can simply say that a bundle of Hilbert spaces is a Hilbert W*-module over the corresponding von Neumann algebra.

Categorical properties of von Neumann algebras (hence of compact strictly localizable enhanced measurable spaces) were investigated by Guichardet in “Sur la catégorie des algèbres de von Neumann”. Let me mention some of his results, translated into the language of enhanced measurable spaces. The category of compact strictly localizable enhanced measurable spaces admits equalizers and coequalizers, arbitrary coproducts, hence also arbitrary colimits. It also admits products (and hence arbitrary limits), although they are quite different from what one might think. For example, the product of two real lines is not $\R^2$ with the two obvious projections. The product contains $\R^2$, but it also has a lot of other stuff, for example, the diagonal of $\R^2$, which is needed to satisfy the universal property for the two identity maps on $\R$. The more intuitive product of measurable spaces ($\R\times\R=\R^2$) corresponds to the spatial tensor product of von Neumann algebras and forms a part of a symmetric monoidal structure on the category of measurable spaces. See Guichardet's paper for other categorical properties (monoidal structures on measurable spaces, flatness, existence of filtered limits, etc.).
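The failure of $\R^2$ to be the categorical product can be checked directly against the definition of a morphism. The following is a sketch assuming $\R$ and $\R^2$ carry their Lebesgue measurable and Lebesgue null sets; $\delta$, $\Delta$, $\pi_1$, $\pi_2$ and $\lambda^2$ are names introduced only for this illustration. If $\R^2$ with the projections $\pi_1,\pi_2$ were the product, the pair of identity maps would have to factor through the diagonal map

$$\delta\colon\R\to\R^2,\qquad \delta(x)=(x,x),\qquad \pi_1\circ\delta=\pi_2\circ\delta={\rm id}_{\R}.$$

But the diagonal $\Delta=\{(x,x) : x\in\R\}$ is negligible in $\R^2$, while its preimage is not negligible in $\R$:

$$\lambda^2(\Delta)=0,\qquad \delta^{-1}(\Delta)=\R.$$

So $\delta$ is not a morphism of enhanced measurable spaces, and the true product must contain an extra summand isomorphic to $\R$ to receive it.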

Another property worth mentioning is that the category of commutative von Neumann algebras is a locally presentable category, which immediately allows one to use the adjoint functor theorem to construct commutative von Neumann algebras (hence enhanced measurable spaces) via their representable functors.

Finally let me mention pushforward and pullback properties of measures on enhanced measurable spaces. I will talk about the more general case of $\L^p$-spaces instead of just measures (i.e., $\L^1$-spaces). For the sake of convenience, denote $\L_p(M)=\L^{1/p}(M)$, where $M$ is an enhanced measurable space. Here $p$ can be an arbitrary complex number with a nonnegative real part. We do not need a measure on $M$ to define $\L_p(M)$. For instance, $\L_0$ is the space of all bounded functions (i.e., the commutative von Neumann algebra corresponding to $M$), $\L_1$ is the space of finite complex-valued measures (the dual of $\L_0$ in the ultraweak topology), and $\L_{1/2}$ is the Hilbert space of half-densities. I will also talk about the extended positive part $\EpL_p$ of $\L_p$ for real $p$. In particular, $\EpL_1$ is the space of all (not necessarily finite) positive measures on $M$.
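To relate this indexing to the classical one, it may help to fix a faithful measure $\mu$ on $M$ (an auxiliary choice made only for this sketch; the spaces $\L_p(M)$ themselves do not depend on it). For real $p\ge0$ one then gets identifications with the usual spaces $L^q(\mu)$:

$$\L_0(M)\cong L^\infty(\mu),\qquad \L_{1/2}(M)\cong L^2(\mu),\qquad \L_1(M)\cong L^1(\mu),\qquad \L_p(M)\cong L^{1/p}(\mu).$$

One convenience of the reciprocal index is that Hölder's inequality becomes additive in $p$: for real $p,q\ge0$,

$$\|fg\|_{L^{1/(p+q)}(\mu)}\le\|f\|_{L^{1/p}(\mu)}\,\|g\|_{L^{1/q}(\mu)},\qquad\text{so that}\qquad \L_p(M)\cdot\L_q(M)\subset\L_{p+q}(M).$$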

Pushforward for $\L_p$-spaces. Suppose we have a morphism of enhanced measurable spaces $M\to N$. If $p=1$, then we have a canonical map $\L_1(M)\to\L_1(N)$, which is just the dual of $\L_0(N)→\L_0(M)$ in the ultraweak topology. Geometrically, this is the fiberwise integration map. If $p≠1$, then we only have a pushforward map of the extended positive parts, namely, $\EpL_p(M)→\EpL_p(N)$, which is nonadditive unless $p=1$. Geometrically, this is the fiberwise $\L_p$-norm. Thus $\L_1$ is a functor from the category of enhanced measurable spaces to the category of Banach spaces and $\EpL_p$ is a functor to the category of “positive homogeneous $p$-cones”. The pushforward map preserves the trace on $\L_1$ and hence sends a probability measure to a probability measure.
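In point-set terms, the case $p=1$ is the familiar pushforward of measures. As a sketch, write $f\colon M\to N$ for the morphism and $\mu\in\L_1(M)$ for a finite measure (the names $f$, $\mu$, $h$ are introduced here for illustration). Then

$$(f_*\mu)(B)=\mu\bigl(f^{-1}(B)\bigr)\quad\text{for measurable } B\subset N,\qquad \int_N h\,d(f_*\mu)=\int_M (h\circ f)\,d\mu\quad\text{for } h\in\L_0(N).$$

The first formula makes sense because $f^{-1}(B)$ is measurable up to a subset of a negligible set and $\mu$ vanishes on negligible sets; the second is the duality with $\L_0(N)\to\L_0(M)$, $h\mapsto h\circ f$. Taking $h=1$ gives preservation of total mass:

$$(f_*\mu)(N)=\mu\bigl(f^{-1}(N)\bigr)=\mu(M).$$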

To define pullbacks of $\L_p$-spaces (in particular, $\L_1$-spaces) one needs to pass to a different category of enhanced measurable spaces. In the algebraic language, if we have two commutative von Neumann algebras $A$ and $B$, then a morphism from $A$ to $B$ is a usual morphism of commutative von Neumann algebras $f\colon A\to B$ together with an operator valued weight $T\colon\Ep(B)\to\Ep(A)$ associated to $f$. Here $\Ep(A)$ denotes the extended positive part of $A$. (Think of positive functions on $\Spec A$ that can take infinite values.) Geometrically, this is a morphism $\Spec f\colon\Spec B\to\Spec A$ between the corresponding enhanced measurable spaces and a choice of measure on each fiber of $\Spec f$. Now we have a canonical additive map $\EpL_p(\Spec A)\to\EpL_p(\Spec B)$, which makes $\EpL_p$ into a contravariant functor from the category of enhanced measurable spaces and measurable maps equipped with a fiberwise measure to the category of “positive homogeneous additive cones”.

If we want to have a pullback of $\L_p$-spaces themselves and not just their extended positive parts, we need to replace operator valued weights in the above definition by finite complex-valued operator valued weights $T\colon B\to A$ (think of a fiberwise finite complex-valued measure). Then $\L_p$ becomes a functor from the category of enhanced measurable spaces to the category of Banach spaces (if the real part of $p$ is at most $1$) or quasi-Banach spaces (if the real part of $p$ is greater than $1$). Here $p$ is an arbitrary complex number with a nonnegative real part. Notice that for $p=0$ we get the original map $f\colon A\to B$ and in this (and only this) case we do not need $T$.

Finally, if we restrict ourselves to an even smaller subcategory defined by the additional condition $T(1)=1$ (i.e., $T$ is a conditional expectation; think of a fiberwise probability measure), then the pullback map preserves the trace on $\L_1$ and in this case the pullback of a probability measure is a probability measure.
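For $p=1$ this can be spelled out in the algebraic language. The following is a sketch which assumes that the pullback of a measure is given by precomposition with $T$; the symbols $\nu$ and $b$ are introduced only for this illustration. Regard a finite measure on $\Spec A$ as a normal functional $\nu\colon A\to{\bf C}$. Its pullback along $(f,T)$ is then the functional on $B$ given by

$$(f^*\nu)(b)=\nu\bigl(T(b)\bigr),\qquad b\in B.$$

If $\nu$ is a probability measure, i.e., $\nu$ is positive and $\nu(1_A)=1$, and $T$ is positive with $T(1_B)=1_A$, then

$$(f^*\nu)(1_B)=\nu\bigl(T(1_B)\bigr)=\nu(1_A)=1,$$

so $f^*\nu$ is again a probability measure. Geometrically, $f^*\nu$ integrates a function on $\Spec B$ first along each fiber against the fiberwise probability measure and then over the base against $\nu$.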

There is also a smooth analog of the theory described above. The category of enhanced measurable spaces and their morphisms is replaced by the category of smooth manifolds and submersions, $\L_p$-spaces are replaced by bundles of $p$-densities, operator valued weights are replaced by sections of the bundle of relative 1-densities, the integration map on 1-densities is defined via Poincaré duality (to avoid any dependence on measure theory) etc. There is a forgetful functor that sends a smooth manifold to its underlying enhanced measurable space.

Of course, the story does not end here: there are many other interesting topics to consider, such as products of measurable spaces, the difference between Borel and Lebesgue measurability, conditional expectations, etc. An index of my writings on this topic is available.

This is amazing! Where can I read about these ideas and further developments? For instance, I'd really love to see information theory developed with this language. — Saal Hardali, Oct 31 '17 at 13:08
@SaalHardali: I am not aware of any exposition of information theory in this language. I have been meaning to write a survey of measure theory along the lines outlined in my answer for a while, but my research leaves me very little time for side projects. That being said, some additional material can be found in my paper https://arxiv.org/abs/1309.7856. — Dmitri Pavlov, Oct 31 '17 at 17:13
What does 'contains' in "The product contains $\mathbf R^2$, but it also has a lot of other stuff, for example, the diagonal of $\mathbf R^2$" mean? That is, does it mean, for example, that the set "diagonal of $\mathbf R^2$" is a point of the product of the two copies of $\mathbf R$? — LSpice, Jan 19 '18 at 18:38
@LSpice: It means that the categorical product is a disjoint union of R^2 (with its Lebesgue measurable/negligible sets), R^1, and a lot of other stuff. The diagonal is not a point, it is isomorphic to R and splits off as a disjoint summand. — Dmitri Pavlov, Jan 19 '18 at 23:52
Ah, I see. So the diagonal in some sense "occurs twice", once in the expected fashion as a subset of $\mathbf R^2$? — LSpice, Jan 20 '18 at 16:56
@LSpice: Yes, although one of the diagonals is a negligible set, hence invisible in the category. — Dmitri Pavlov, Jan 20 '18 at 22:08
@DmitriPavlov, the proposition in the grey field, where is it proved exactly: "The category of localizable measurable spaces is equivalent to the category of commutative von Neumann algebras (alias W*-algebras) and their morphisms (normal unital homomorphisms of *-algebras)." — Sergei Akbarov, Jun 05 '18 at 15:00
Actually, it would be good if you add more references to this text. The link to the paper by Irving Segal would be also useful. — Sergei Akbarov, Jun 05 '18 at 15:29
@SergeiAkbarov: I added references to the papers by Segal and Kelley, a remark about the Riesz representation theorem, and a source for the measurable version of Gelfand duality. — Dmitri Pavlov, Jun 05 '18 at 15:44
Dmitri, Takesaki's Theorem III.1.18 is a weaker proposition. What is meant by the morphisms in the category of measurable spaces? And the reference is needed. — Sergei Akbarov, Jun 05 '18 at 15:57
@SergeiAkbarov: Morphisms of measurable spaces are defined in the first paragraph of my answer. There is no single source for the entire proof, it must be assembled from several pieces. Theorem III.1.18 addresses the most difficult part (how to construct a measurable space from a von Neumann algebra), and the other difficult part is supplied by the von Neumann-Maharam lifting theorem (how to construct a morphism of measurable spaces from a homomorphism of von Neumann algebras). I added a reference to Fremlin's book. — Dmitri Pavlov, Jun 05 '18 at 16:20
@SergeiAkbarov: An additional discussion of sources is found here: https://mathoverflow.net/questions/23408/reference-for-the-gelfand-neumark-theorem-for-commutative-von-neumann-algebras — Dmitri Pavlov, Jun 05 '18 at 16:21
Dmitri, I still don't understand something. If the measure $\mu$ is not supposed to be given on the measurable space $(S,M,N)$, then how is the functor of passage from the measurable space $(S,M,N)$ to the corresponding von Neumann algebra defined? As far as I understand, this is not $\mu\mapsto L^\infty(\mu)$... — Sergei Akbarov, Jun 05 '18 at 17:24
@SergeiAkbarov: To construct a von Neumann algebra from (S,M,N), take the complex *-algebra of Borel measurable functions S→C and mod out by the *-ideal of functions that vanish outside of an element of N. By Segal's theorem this is a von Neumann algebra. (More elegantly, one can construct C as a *-algebra in measurable spaces, then morphisms (S,M,N)→(C,M_C,N_C) automatically form a *-algebra.) — Dmitri Pavlov, Jun 06 '18 at 01:25
The relevant theorem in Fremlin's book appears to be 343B, but I do not see how to bridge the gap between complete strictly localizable measure spaces and the localizable measurable spaces of your answer. — Andre Kornell, Nov 27 '18 at 23:42
@AndreKornell: Completeness is taken care of by 322D. Strict localizability is taken care of by 322Ld(ii). — Dmitri Pavlov, Nov 28 '18 at 05:03
Dmitri, I am looking on this page from time to time and I see more details. From what is written it follows that if the Boolean algebra $M/N$ is complete, then there exists a localizable measure on $M$ with $N$ as the class of 0-sets. How (or where) is this measure constructed? — Sergei Akbarov, Jan 14 '20 at 20:29
@SergeiAkbarov: Not every complete Boolean algebra has this property (see my other question https://mathoverflow.net/questions/71259/which-complete-boolean-algebras-arise-as-the-algebras-of-projections-of-commutat). The cited result by Segal is actually useless for constructing such measures, since it assumes the existence of a faithful measure by definition. — Dmitri Pavlov, Jan 14 '20 at 21:13
Dmitri, I again don't understand something. You write that what you call localizable measurable spaces (i.e. the triples $(S,M,N)$ with the properties you give) are in one-to-one correspondence with the commutative von Neumann algebras. On the other hand, commutative von Neumann algebras are exactly the ones that can be presented as $L^\infty(\mu)$ with localizable measures (see Sakai 1.18.1). Doesn't this mean that each localizable $(S,M,N)$ has a localizable measure $\mu$? — Sergei Akbarov, Jan 14 '20 at 21:35
@SergeiAkbarov: (S,M,N) is localizable if S is a set, M is a σ-algebra on S, N is a σ-ideal on S, N⊂M, there is a measure μ on M whose null sets coincide with N, and the Boolean algebra M/N is complete. So the answer to the question is yes, by definition. I am actually preparing a manuscript (almost ready) with a complete proof of the categorical equivalence between l.m.s. and commutative von Neumann algebra, since I could not find any references in the literature about this, and neither could the others (see https://mathoverflow.net/questions/23408/). All the details will be there. — Dmitri Pavlov, Jan 14 '20 at 21:41
Ah, so the existence of a measure is included in the definition... (and I suppose, "faithful" means "semifinite"?) Pardon, I did not understand. But if so, then the question arises whether this condition can be replaced by something more simple? The existence of a measure looks strange if the idea is to get rid of measures. — Sergei Akbarov, Jan 14 '20 at 21:53
@SergeiAkbarov: The idea is to get rid of a specific choice of measure, but we still need a condition that guarantees existence of sufficiently many measures. The definition of a von Neumann algebra as a C*-algebra with a predual is also a definition of this type: the existence of a predual is precisely saying that one has sufficiently many measures on a von Neumann algebra. — Dmitri Pavlov, Jan 14 '20 at 22:10
Dmitri, however, before this detail (the existence of a localized measure in the definition) had been revealed, the construction seemed much more elegant. I think, there must be a simpler definition, without the mentioning of measures. (And the way from operator algebras is of course less elegant.) — Sergei Akbarov, Jan 14 '20 at 22:19
@SergeiAkbarov: This problem (getting rid of measures) was formulated by von Neumann in 1937 and has remained unsolved ever since. Some partial progress has been made (cf. the negative solution to the control measure problem by Talagrand and the solution to von Neumann–Maharam problem, see the work of Balcar, Jech, Pazák, and also Todorcevic), but the original problem remains elusive. — Dmitri Pavlov, Jan 15 '20 at 17:27
@SergeiAkbarov: You asked for a reference, and I am glad to say that one is now available: https://arxiv.org/abs/2005.05284. — Dmitri Pavlov, May 12 '20 at 03:23
A somewhat unrelated question: can the theorem about the canonical systems of conditional measures be stated as a certain property of the category of measure spaces? — R W, May 15 '20 at 21:18
@RW: There are many theorems that could be described by this name. Which theorem specifically do you have in mind? — Dmitri Pavlov, May 15 '20 at 22:20
Rokhlin's theorem on the existence of systems of conditional measures for homomorphisms of Lebesgue spaces. — R W, May 15 '20 at 23:10
@RW: Yes, Rokhlin's disintegration of measures can be performed in this context. Even better: the equivalence of 5 categories stated in the answer can be upgraded to an equivalence of 5 categories where morphisms are now equipped with fiberwise measures, which are composed like in the statement of the disintegration theorem. The disintegration theorem can be now stated very neatly by saying that the forgetful functor from the new category to the old category that throws away the fiberwise measure is a Grothendieck fibration of categories. — Dmitri Pavlov, May 16 '20 at 04:32

Mark Meckes · Answer 2 · 2023-02-08T20:26:27.203

In the spirit of this answer to a different question, I'll offer a contrarian answer. How to understand probability theory from a structuralist perspective:

Don't.

To put it less provocatively, what I really mean is that probabilists don't think about probability theory that way, which is why they don't write their introductory books that way. The reason probabilists don't think that way is that probability theory is not about probability spaces. Probability theory is about families of random variables. Probability spaces are the mathematical formalism used to talk about random variables, but most probabilists keep the probability spaces in the background as much as possible. Doing probability theory while dwelling on probability spaces is a little like doing number theory while dwelling on a definition of 1 as $\{\{\}\}$ etc. (That last sentence is definitely an overstatement, but I can't think of a more apt analogy offhand.)

That said, multiple perspectives are always good to have, so I'm very happy you asked this question and that you've gotten some very nice noncontrarian answers that I hope to digest better myself.

Added: Here is something which is perhaps more similar to dwelling on probability spaces. To set the stage for graph theory carefully one may start by defining a graph as a pair $(V,E)$ in which $V$ is a (finite, nonempty) set and $E$ is a set of cardinality 2 subsets of $V$. You need to start tweaking this in various ways to allow loops, directed graphs, multigraphs, infinite graphs, etc. But worrying about the details of how you do this is a distraction from actually doing graph theory.

Added much later: For a completely different perspective, based on new developments since this question was first asked, see my more recent answer.

edited Feb 08 '23 at 20:26

answered Apr 09 '10 at 12:28


Indeed, I saw a quote from somebody famous (if I think of the author I'll edit) to the effect that "one could say that probability theory is the study of measure spaces with measure one, but this is like saying that number theory is the study of finite strings of the digits {0,...,9}." – Nate Eldredge May 25 '10 at 13:36
Yes, I've seen that quote too, and it's better than my analogies now that you remind me of it. But I also forget the author. – Mark Meckes May 26 '10 at 11:18
Another great quote along the same lines, from Rudin (Real and Complex Analysis, page 18 in my edition): "For instance, the real line may be described as a quadruple $(R^1, +, \cdot, <)$ where $+$, $\cdot$, and $<$ satisfy the axioms of a complete archimedean ordered field. But it is a safe bet that very few mathematicians think of the real field as an ordered quadruple." – Carl Offner May 30 '10 at 14:47
That is a great quote, but it doesn't make all the points Nate's quote does. If you think of the reals as a quadruple, you have the formalism necessary to understand and prove theorems about real numbers, although you may lack the intuition needed to appreciate the theorems. But if you think of natural numbers as strings of digits, you're missing not only intuition but also interesting algebraic structure. Likewise, a measure space with measure one is insufficient structure for probability; you need some additional algebraic or geometric structure before you can even talk about expectations. – Mark Meckes May 30 '10 at 23:35
I'm not sure exactly what you mean by additional structure, because it seems to me that expectations are pretty immediate. But I think your general point is correct. A great quote illuminating that (which due to space limitations I can't quite copy here) is in Fremlin's Measure Theory treatise -- Vol.2, the second paragraph of the introduction to Chapter 27. I think it's well worth looking at, and also very well written – Carl Offner May 31 '10 at 02:24
You can't talk about expectations if all you have is a probability space. You need to look at a measurable function (random variable) from your probability space into $\mathbb{R}$ or a similar algebraic structure; or equivalently you need your probability space itself to have some algebraic structure. – Mark Meckes May 31 '10 at 03:16
Fair enough. If you get a chance, take a look at Fremlin's statement. I think you'll like it. – Carl Offner May 31 '10 at 13:46
@Carl: Indeed! The paragraph following the one you pointed out is even better for getting to the real point here. For anyone else interested, here's the link to Fremlin's treatise (although it's easy enough to find via Google): http://www.essex.ac.uk/maths/staff/fremlin/mt.htm Watch out for the fact that it's plain TeX (not LaTeX)! – Mark Meckes Jun 01 '10 at 15:12
The quote I mentioned above appears in Terry Tao's blog at http://terrytao.wordpress.com/2010/01/01/254a-notes-0-a-review-of-probability-theory/ dated January 1, 2010, which is cited in another answer to this question. I am not sure if that is where I read it, but it's the earliest reference I have so far. – Nate Eldredge Aug 23 '10 at 20:39
Just for the record, that quote is my own, though the general sentiment that probability is not about measure spaces is certainly very widely held among probabilists. – Terry Tao Sep 03 '10 at 19:08
Dmitri Pavlov's answer mentions that "The category of localizable measurable spaces is equivalent to the category of commutative von Neumann algebras." Doesn't this mean that studying families of random variables is the same thing as studying probability spaces? – Vectornaut Sep 23 '12 at 16:33
@Vectornaut: No, it doesn't. For a start, this equivalence doesn't mention anything about a measure on the one side or an expectation on the other side. Maybe the equivalence can be made to include those data as well, but the major issue -- if you want to think about things in such terms -- is that in probability theory you should not only mod out by sets of measure 0, you should also mod out by measure-automorphisms of the domain which preserve the joint distribution of the random variables under consideration. – Mark Meckes Sep 23 '12 at 20:27
Re: the quote from Rudin, I once graded a real analysis course using Wade's text, which essentially proceeds by saying "assume there exists a complete ordered field" and then proving theorems that hold in such an object. I had to tweak the way I graded the course since students didn't really appreciate that perspective on analysis at that stage in their development. – Daniel McLaury Nov 20 '13 at 03:25
"probability theory is not about probability spaces" --- yes, but it is perhaps about the category of probability spaces and measurable maps. Random variables are morphisms in that category. The approach to look at families of random variables to understand some distribution is really in the spirit of the Yoneda Lemma. Morphisms are more important than objects. – Martin Brandenburg Mar 25 '15 at 22:47
By the way, I disagree with Rudin's comment. It basically says "It is a tradition to ignore forgetful functors, so I will follow this tradition" and "argues" for this procedure by saying that otherwise one would "have to" imagine the reals as a quadruple. No. It is really about the question in which category one works. There is a field of real numbers, there is a measure space of real numbers, there is a topological space of real numbers, etc., and it is very unfortunate that all are denoted by the same symbol. At least, one should not forget the forgetful functors between these categories. – Martin Brandenburg Mar 25 '15 at 22:53
One can talk about a field without having to write it down as a tuple. Just say $K$ is the field, an object of the category of fields and not of the category of sets, and don't confuse it with its underlying set $|K|$. Similarly, we can talk about graphs without having to think of them as pairs of sets. The answer claims that it is somehow difficult or distracting to define graphs that way, but I don't agree with this either. – Martin Brandenburg Mar 25 '15 at 22:56
@MartinBrandenburg: I don't claim that it's distracting to define graphs as pairs of sets. I claim that if you do get distracted by the details of such a definition, then you're not doing graph theory. – Mark Meckes Mar 26 '15 at 01:59
@MartinBrandenburg: Moreover, I would need convincing that probability theory is about the category of probability spaces and measurable maps. That category surely plays a role, but most of what's of interest in the theory seems to take place outside of it. In particular, if you can show me how to view the law of large numbers or the central limit theorem as being about the category of probability spaces and measurable maps, then I'll reconsider. – Mark Meckes Mar 26 '15 at 10:42
"I was surprised by Kazhdan’s request since “everybody knows” that a random variable is just a measurable function $X(ω)$ from $Ω$ to $\mathcal X$. He answered “yes, but that’s not what it means to people working in probability” and of course he was right." — P. Diaconis – Watson Jul 08 '18 at 09:51

score 57 · Answer 3 · edited Jun 21 '20 at 04:19

A few months ago, Terry Tao had a really insightful post about "the probabilistic way of thinking", in which he suggested that a nice category of probability spaces was one in which the objects were probability spaces and the morphisms were extensions (i.e., measurable surjections which are probability preserving). By avoiding looking at the details of the sample space, you can elegantly capture the style of probabilistic arguments in which you introduce new sources of randomness as needed.
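
For finite sample spaces the notion of an extension can be made concrete. The following is only a sketch under assumptions that are not in the post: a probability space is a dict from outcomes to `Fraction` weights (with every subset measurable), and the names `pushforward` and `is_extension` are hypothetical helpers chosen for illustration.

```python
from fractions import Fraction

def pushforward(mu, f):
    """Push the finite measure `mu` (dict: outcome -> weight) forward along f."""
    nu = {}
    for omega, p in mu.items():
        y = f(omega)
        nu[y] = nu.get(y, Fraction(0)) + p
    return nu

def is_extension(mu, nu, f):
    """Check that f maps the space `mu` onto the space `nu` and preserves probability.

    Surjectivity is checked against the outcomes listed in `nu`.
    """
    image = {f(omega) for omega in mu}
    return image == set(nu) and pushforward(mu, f) == nu

# Base space: one fair coin.
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Extended space: the same coin together with an independent second fair coin.
two_coins = {(a, b): Fraction(1, 4) for a in "HT" for b in "HT"}

def first(pair):
    """Projection that forgets the extra source of randomness."""
    return pair[0]

print(is_extension(two_coins, coin, first))  # True
```

Here the projection `first` plays the role of the extension: the larger space carries an extra coin flip, yet every statement about the original coin has the same probability when read off the larger space.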

edited Jun 21 '20 at 04:19

Martin Sleziak

answered Apr 08 '10 at 16:46

Neel Krishnaswami

This is a very interesting (and impressively long!) set of notes; thanks for linking to them. I may make more comments after I've digested them. – Pete L. Clark Apr 09 '10 at 03:11

score 32 · Answer 4 · edited Jun 21 '20 at 04:19

I want to post the following as a comment on many of the answers and comments already given.

Several people have said, "Well, watch out -- probability theory is not really the study of probability measures, but rather the study of certain quantities preserved under certain equivalence relations on probability measures, like distribution functions."

I certainly accept this point. In fact, I had more or less accepted it before I asked the question, although I admittedly didn't give much indication of this in the question itself. To be clear, I am aware that the "rewriting" impulses I have when reading about basic measure-theoretic probability are taking me in a direction away from the material of mainstream probability theory. I have two responses to this:

1) Okay, let's agree that the definition and study of a category of probability spaces is not the domain of probability theory per se. But this does not mean that it's not useful or worth studying.

1a) If this branch of mathematics is not probability theory, what is it? [User "coudy" gave an answer saying that this is ergodic theory. I was unduly dismissive of this answer at first, and I apologize for that. I still don't think that "ergodic theory" is exactly the answer to my question, for instance because so far as I understand the subject it focuses almost exclusively on the dynamical aspects of iterating a measure-preserving transformation of a probability space. (By way of analogy, the branch of mathematics that studies the category of finite type schemes over a field $K$ is arithmetic geometry, not arithmetic dynamics.)]

1b) While I agree that probability theory is at present not concerned with such structuralist questions, is it clear that it shouldn't be? Or, in less polemical terms, is there no advantage or insight to be gained by studying the structural aspects of probability spaces?

2) I think an outsider to probability theory has a right to ask: "Okay, if probability spaces are really not the point of probability theory, why do they appear so prominently in all (so far as I know) modern foundations of the subject? Wouldn't -- or couldn't -- it be better to isolate exactly the structure that probability theory actually does care about and study this structure explicitly from the outset?"

By way of analogy, consider the notion of a "differentiable atlas" in the study of smooth manifold theory. Gian-Carlo Rota referred to atlases as a polite fiction, meaning (I think) that they are present in the foundations of the subject but do not really exist in the sense that the practitioners of the subject do not think about them and ask questions about them. They don't do any harm so long as you don't take them very seriously, but I have seen students get caught up on this point and "ask too many questions". The more modern approach of a structure sheaf seems like an improvement here -- it does the same work as an atlas but is something that the practitioners of the subject actually care about, so it is not at all a waste of time to "think deeply about structure sheaves". Indeed, the concept of "structure sheaf" is incredibly prevalent in other areas of mathematics, to the extent that if you are founding a new branch of geometry, knowing about structure sheaves will ease the birthing process.

So the dual question to 1) here is "What is the kind of mathematical structure that probability theorists are interested in studying?" (Happily, many of the very nice answers above do in fact address this question.)

Unfortunately I didn't see this nice answer/commentary until just now. A propos of your question 2), you might be interested to look at the classic two-volume probability text by Feller, in which probability spaces play a surprisingly small role and are not even introduced until well into volume 2. — Mark Meckes, Sep 04 '12 at 15:27

coudy · Answer 5 · 2020-01-15T11:10:09.603

A category consists of a class of objects together with a class of morphisms. Measure theory together with morphisms between measure spaces is the topic of ergodic theory. So if you are interested in a categorical viewpoint on measure theory, you should take a look at advanced books on ergodic theory.

Here are some references. Glasner's book "ergodic theory via joinings" is probably the closest thing to a full-blown categorical account of some basic concepts in ergodic theory. Rudolph's "Fundamentals of measurable dynamics: ergodic theory on Lebesgue spaces" is also pretty geared toward such an account. If you are interested in applications of ergodic theory to Lie group actions and Diophantine approximation, you may want to consult the appendices in R. Zimmer's book "ergodic theory and semisimple Lie groups". These appendices summarize the categorical results relevant to these questions.

Note however that many books on ergodic theory are pretty quick on the categorical stuff. Ergodic theory is a subject which is of interest to group theorists, dynamicists, probabilists, combinatorialists, physicists, computer scientists,... So, really, it is not very useful to spend too much time on foundational material that is irrelevant to these people, and to many applications.

In contrast to algebraic geometry, which is built like a cathedral, and for which category theory is very interesting foundational material, ergodic theory is more like a bazaar. Its structure is definitely transverse to the usual classification of mathematics (algebra, analysis, geometry), and even transverse to the classification of science (math, physics, computer science, biology) you may be accustomed to. Much of the steam in ergodic theory comes from the many interactions between these communities. It is important to keep the entrance level as low as possible to get as many people as possible on the boat. Putting forward a categorical approach in the textbooks or in conferences would do much harm to the field.

The references I provide should answer your four questions. Let me just add a comment. If you define a Borel space as a set endowed with a $\sigma$-algebra, you will soon run into many problems (e.g. a morphism at the level of the algebras does not necessarily come from a map between the sets; also, a non-Borel, non-Lebesgue measurable subset of $[0,1]$ endowed with the Lebesgue measure is a perfectly well defined measure space, and you definitely don't want it), so that's why people don't usually define it that way. There are several choices in use at the moment, for example the standard Borel spaces (Zimmer's appendices) and the Lebesgue spaces (Rudolph's book).

Pete, I think you were too quick to dismiss coudy's answer (and frankly, I don't see how you found it disrespectful). Many results and methods in probability theory can be rephrased in terms of ergodic theory, which means this is a perfectly on-topic response to your question. — Tom LaGatta, May 24 '10 at 01:14
@Tom LaGatta: I agree with you, and awhile ago I deleted these comments and removed my downvote. In some sense coudy's answer is the closest I have received so far to one of the aspects of the question, although I still maintain that it is not quite dead-on. I have explained this in more detail in my CW answer below. — Pete L. Clark, Dec 19 '10 at 10:44

score 20 · Answer 6 · edited Jun 21 '20 at 04:24

There is an early paper by Victor Bogdan called "A new approach to the theory of probability via algebraic categories" (#54 here or here) which may be of interest.

edited Jun 21 '20 at 04:24

Martin Sleziak

answered Apr 08 '10 at 16:24

Steve Huntsman

Apologies. I seem to have clicked a down vote by mistake, and now can't undo it. – David Corfield Apr 20 '17 at 13:57

score 18 · Answer 7 · answered Apr 09 '10 at 13:04

As already noted, most probabilists identify random variables essentially with their distribution. The problem is that the kind of operations one can do with random variables often depends on the spaces they are defined on. The probability spaces random variables are usually defined on, such as the unit interval with Lebesgue measure, do not allow for all the constructions one wants to make (an uncountable family of independent random variables, for example). In order to make all the constructions one wants to work with possible, one needs to work with more esoteric tools from measure theory. The problem is even larger when one turns to stochastic processes or adapted stochastic processes.

For this reason, people have worked on probability theory from the model-theoretic view, which gives answers to existence questions much closer to the categorical view. A relatively readable introduction to this field is given in the book "Model Theory of Stochastic Processes" by Fajardo and Keisler. Their paper "Existence Theorems in Probability" might also be of interest.

I should have added a caveat to my answer that working with stochastic processes forces one to grapple with probability spaces more. But I've never heard of anyone actually wanting to consider an uncountable family of independent random variables. — Mark Meckes, Apr 09 '10 at 13:19
They actually occur in mathematical economics. One wants to have a continuum of agents in order to apply analysis, and one wants their independent actions to cancel out in the aggregate. One wants a law of large numbers for such cases. Finding spaces on which one can make this work turned out to be hard but possible. — Michael Greinecker, Apr 09 '10 at 13:36

score 16 · Answer 8 · edited Feb 21 '19 at 21:15

Misha Gromov, "In a Search for a Structure, Part 1: On Entropy." https://www.ihes.fr/~gromov/wp-content/uploads/2018/08/structre-serch-entropy-july5-2012.pdf provides some interesting category-theoretic musings, among other things. One curious 'other thing' is that the Fisher metric is the flat metric on complex projective space.

He also gave a series of lectures on probability from the category theoretic perspective, "Probability, Symmetry, Linearity":

IHES talk on YouTube: https://www.youtube.com/playlist?list=PLx5f8IelFRgGo3HGaMOGNAnAHIAr1yu5W
Slides of the lectures: https://www.ihes.fr/~gromov/wp-content/uploads/2018/08/probability-huge-Lecture-Nov-2014.pdf

score 11 · Answer 9 · answered Apr 18 '17 at 10:24

For a recent approach that looks to provide a better categorical environment for probability theory:

Chris Heunen, Ohad Kammar, Sam Staton, Hongseok Yang, A Convenient Category for Higher-Order Probability Theory.

It replaces the category of measurable spaces, which isn't cartesian closed, with the category of quasi-Borel spaces, which is. As they point out in section IX, what they're doing is working with concrete sheaves on an established category of spaces, rather like the move to diffeological spaces.

score 8 · Answer 10 · answered May 13 '10 at 09:47

Last year Voevodsky gave a talk at MIAN about his approach to probability theory; a video recording in Russian is available online. I do not know if anything is written on this.

There was also an old Russian book (in Russian, afaik not translated, from the 70s) developing a somewhat similar approach but I do not quite remember the reference. I could look for it, though, if there is interest...

answered May 13 '10 at 09:47

mmm

I don't know what is possible after 2 years - but if you're still here, I am interested – SBF Mar 31 '12 at 20:08
I am also interested. Such a book would be a great way to learn more about probability theory as well as to practice my mathematical Russian. – Chill2Macht Jan 13 '17 at 12:37
What does "afaik" mean ? – Duchamp Gérard H. E. Mar 28 '19 at 02:23
afaik it means "as far as I know" – Vincent Jul 09 '19 at 09:50
For what it's worth, a recording online (title is "Categorical Probability") is https://www.youtube.com/watch?v=TIfo1aIeBN4 and a set of notes by Voevodsky in English is item 57 on the page https://www.math.ias.edu/Voevodsky/voevodsky-publications.html. Click the .pdf link there. – KConrad Jan 15 '20 at 05:40

score 8 · Answer 11 · answered Feb 08 '23 at 20:24

This old question and my old answer continue to get occasional attention, and I believe it's time for a new answer.

Is there an introduction to probability theory from a structuralist/categorical perspective?

I now say: at the time you asked the question, I don't believe there was a well-developed structural/categorical approach to probability theory (as opposed to probability spaces). But by now there is at least one such approach that I find compelling, namely the theory of Markov categories as developed by Tobias Fritz, Paolo Perrone, and their collaborators. This approach is still being developed, and I'm just beginning to really learn about it, but I really like what I've seen so far. I won't attempt an introduction to it here, or an explanation of what I like better about it than other approaches I've seen, but this answer by Tobias Fritz to another question summarizes some highlights.

I will point out, though, that I and many others have argued that probability theory is not well described by any category of measure spaces and measurable maps; the categories that arise here are more subtle beasts. Basically, one works with "the Kleisli category of the Giry monad on the category of measurable spaces and measurable maps", or some variation thereof. Fortunately for us non-category theorists, this thing can be given a concrete description (well-explained in this paper): the objects are measurable spaces, the morphisms are stochastic maps (basically Markov kernels, or functions with random outputs), and there is some additional structure.
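
In the finite case the composition of stochastic maps can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not in the answer: spaces are finite, a distribution is a dict of `Fraction` weights, a stochastic map is a Python function returning such a dict, and the names `compose`, `dirac` and `weather` are made up for the example. It is not the measure-theoretic Giry monad itself, just its finite analogue.

```python
from fractions import Fraction

def compose(g, f):
    """Kleisli composite "g after f" of finite stochastic maps.

    f sends x to a distribution over Y, g sends y to a distribution over Z;
    the composite sums over the intermediate point y (Chapman-Kolmogorov).
    """
    def h(x):
        out = {}
        for y, p in f(x).items():
            for z, q in g(y).items():
                out[z] = out.get(z, Fraction(0)) + p * q
        return out
    return h

def dirac(x):
    """Identity morphism: the point mass at x."""
    return {x: Fraction(1)}

def weather(today):
    """A hypothetical one-step Markov kernel on {"sun", "rain"}."""
    if today == "sun":
        return {"sun": Fraction(3, 4), "rain": Fraction(1, 4)}
    return {"sun": Fraction(1, 2), "rain": Fraction(1, 2)}

two_days = compose(weather, weather)
print(two_days("sun"))  # sun has weight 11/16, rain has weight 5/16
```

Composing with `dirac` on either side returns the original kernel, which is the identity law of the category in this finite setting.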

One striking aspect of this picture is that the familiar basic things in probability, like probability spaces and random variables, show up not as objects or morphisms but as diagrams in this category. And more complex diagrams can encode more complicated things, like coupling arguments, that probabilists make use of every day.

I particularly like the fact that a probability measure on $\Omega$ is naturally identified with $I \xrightarrow{\varphi} \Omega$, where $I$ is some fixed one-element space and $\varphi$ is a stochastic map. This is reminiscent of the fact that in the category of sets and functions, elements of a set $X$ can be identified with functions $I \xrightarrow{f} X$. (In both cases, $I$ is the identity for a natural monoidal structure on the category in question.) Intuitively: a probability measure on $\Omega$ is the same thing as a random element of the set $\Omega$!
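
A finite sketch of this identification, again under assumptions that are not in the answer (finite spaces, distributions as dicts of `Fraction` weights, and hypothetical helper names `state`, `deterministic` and `compose`): a probability measure becomes a stochastic map out of a one-point space, and the law of a random variable is obtained by composition.

```python
from fractions import Fraction

STAR = "*"  # the single element of the one-point space I

def compose(g, f):
    """Composite "g after f" of finite stochastic maps (sum over the middle point)."""
    def h(x):
        out = {}
        for y, p in f(x).items():
            for z, q in g(y).items():
                out[z] = out.get(z, Fraction(0)) + p * q
        return out
    return h

def state(mu):
    """View a finite distribution mu on Omega as a stochastic map from I to Omega."""
    return lambda _star: dict(mu)

def deterministic(func):
    """View an ordinary function as a stochastic map with point-mass outputs."""
    return lambda x: {func(x): Fraction(1)}

# A fair die as a morphism from I to {1, ..., 6}.
die = state({k: Fraction(1, 6) for k in range(1, 7)})

# A random variable on the die, viewed as a deterministic morphism.
parity = deterministic(lambda k: "even" if k % 2 == 0 else "odd")

# The law of the random variable is the composite morphism out of I.
law_of_parity = compose(parity, die)
print(law_of_parity(STAR))  # odd and even each have weight 1/2
```

The composite never inspects which face of the die produced a given parity, which matches the point that this perspective steers attention away from pointwise behavior.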

This perspective on elements of sets gives a way to talk about elements in structural approaches to set theory like ETCS, while simultaneously discouraging you from focusing on the action of functions on individual elements. In the same way, this perspective on probability naturally discourages you from focusing on the pointwise behavior of random-variables-as-measurable-functions. Which any probability theorist will tell you is exactly what you should not focus on in probability theory.

I am reminded that someone wiser than me wrote that "good general theory does not search for the maximum generality, but the right generality" — Yemon Choi, Feb 08 '23 at 21:09
Thanks, Mark! It's nice to hear that things are advancing in this subject. — Pete L. Clark, Feb 08 '23 at 21:16

score 5 · Answer 12 · edited Jun 21 '20 at 04:23

I found this: http://etd.library.pitt.edu/ETD/available/etd-04202006-065320/unrestricted/Matthew_Jackson_Thesis_2006.pdf. (Wayback Machine, A Sheaf Theoretic Approach to Measure Theory by Matthew Jackson)

I also found "Bichteler: Integration, Springer LNM 315".

It is about the foundations of the theory, its style is similar to Bourbaki's, and it may be adaptable to a categorical view.

edited Jun 21 '20 at 04:23

Martin Sleziak

answered May 23 '10 at 09:37

Buschi Sergio

score 5 · Answer 13 · edited Mar 15 '21 at 08:20

Bart Jacobs has a new textbook out now. It is on his webpage. It is called "Structured Probabilistic Reasoning". I believe this is going to be an important reference in the next five years.

edited Mar 15 '21 at 08:20

მამუკა ჯიბლაძე

answered Feb 23 '20 at 20:13

Ben Sprott
