69

The cumulants $\kappa_n$ of a random variable $X$ are defined via the cumulant generating function $$ g(t)\stackrel{\tiny def}{=} \log E(e^{tX}) = \sum_{n=1}^\infty \kappa_n \frac{t^n}{n!}.$$ Cumulants have some nice properties, including additivity: for statistically independent variables $X$ and $Y$ we have $$ g_{X+Y}(t)=g_X(t)+g_Y(t). $$ Additionally, in a multivariate setting, mixed cumulants vanish when the variables are statistically independent, and so they generalize correlation in some sense. They are related to moments by Möbius inversion. They are a standard feature in undergraduate probability courses because they feature in a simple proof of the Central Limit Theorem (see for example here).
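For concreteness, here is a minimal sketch (in Python with sympy; the helper name `cumulants` is purely illustrative) that reads off the first few cumulants by expanding $\log E(e^{tX})$ as a power series: for a normal distribution only $\kappa_1$ and $\kappa_2$ survive, while for a Poisson distribution every cumulant equals $\lambda$.

```python
import sympy as sp

t, mu, sigma, lam = sp.symbols('t mu sigma lambda', positive=True)

def cumulants(mgf, n_max=4):
    """kappa_1..kappa_n_max from the power series of log E[e^{tX}]."""
    m = sp.series(mgf, t, 0, n_max + 1).removeO()        # Taylor polynomial of the MGF
    g = sp.series(sp.log(m), t, 0, n_max + 1).removeO()  # cumulant generating function
    return [sp.simplify(g.coeff(t, n) * sp.factorial(n)) for n in range(1, n_max + 1)]

# Normal(mu, sigma^2): MGF = exp(mu*t + sigma^2*t^2/2); only kappa_1, kappa_2 are nonzero.
print(cumulants(sp.exp(mu*t + sigma**2*t**2/2)))    # [mu, sigma**2, 0, 0]

# Poisson(lambda): MGF = exp(lambda*(e^t - 1)); every cumulant equals lambda.
print(cumulants(sp.exp(lam*(sp.exp(t) - 1))))       # [lambda, lambda, lambda, lambda]
```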

So the cumulants are given by a formula and have a list of good properties. A cumulant is clearly a fundamental concept, but I'm having difficulty figuring out what it is actually measuring, and how it is more than just a computational convenience.

Question: What are cumulants actually measuring? What is their conceptual meaning? Are they measuring the connectivity or cohesion of something?

I apologise that this question is surely completely elementary. I'm in low dimensional topology, and I'm having difficulty wrapping my head around this elementary concept in probability; Google did not help much. I'm vaguely imagining that perhaps they are some kind of measure of "cohesion" of the probability distribution in some sense, but I have no idea how.

  • A great introduction to both classical and free cumulants is "Three lectures on free probability" by Novak and LaCroix, http://arxiv.org/abs/1205.2097. – Tom Copeland Feb 20 '15 at 22:13
  • Related to Duchamp's answer to http://mathoverflow.net/questions/214927/important-formulas-in-combinatorics/215053#215053. – Tom Copeland Sep 26 '16 at 21:22
  • My first interesting intro to cumulants and associated diagrammatics was via Ma's book Stat. Mech., and a second, much later, was Itzykson and Drouffe's Stat. Field Theory, Vol. 2 (see https://oeis.org/A036040 and https://oeis.org/A127671). For the more recent construct of free cumulants, in addition to Novak's above, the intro by Speicher, "Free probability and non-crossing partitions", is brief and informative. – Tom Copeland Dec 22 '21 at 17:50
  • Speicher has some notes on the history of the topic in his blog post "On the origin of moment-cumulant formulas" (https://wordpress.com/read/blogs/155738873/posts/1080). – Tom Copeland Mar 24 '22 at 18:59
  • @TomCopeland: The link that you have given is... weird! For me, it leads to a log-in page, for which I do not have the credentials. Furthermore, that page has a link towards the blog of some IT guy named Paul Waring. I do not know what to make of all this. The correct link to Roland Speicher's blog post seems to be https://rolandspeicher.com/2020/07/02/origin-of-moment-cumulant-formulas/. – Alex M. Jun 03 '22 at 16:38
  • https://arxiv.org/abs/math/0601149 At this URL I wrote some things about what cumulants are, but I think it stops short of answering this question. Some prominence should be given to several properties of cumulants: – Michael Hardy Jun 19 '22 at 18:31
  • 1. For $n\ge2$, the $n$th cumulant is translation-invariant, i.e. if $X$ is a random variable and $c$ is constant then the $n$th cumulant of $X$ equals that of $X+c$. 2. The $n$th cumulant is homogeneous of degree $n$, i.e. the $n$th cumulant of $cX$ is $c^n$ times that of $X$. – Michael Hardy Jun 19 '22 at 18:31
  • 3. The $n$th cumulant of (the distribution of) the sum of finitely many random variables is the sum of their cumulants. (This is where the $n$th cumulant differs from the $n$th central moment when $n\ge4$, but not when $n=2$ or $3$.) 4. The $n$th cumulant is a polynomial function of the first $n$ moments. One term of the polynomial is the $n$th raw moment, with coefficient $1$. (When $n=2$ or $3$, it's just the central moment.) – Michael Hardy Jun 19 '22 at 18:31
  • $\uparrow$ i.e. when $n=2$ or $3$, the $n$th cumulant is just the $n$th central moment. – Michael Hardy Jun 19 '22 at 18:42
  • An interesting concrete application: "Do all pieces make a whole: Thiele cumulants and the free energy decomposition" by M. Bren, Florian, Mavri, and U. Bren (https://www.researchgate.net/publication/225702917_Do_all_pieces_make_a_whole_Thiele_cumulants_and_the_free_energy_decomposition). See also refs in https://oeis.org/A263634 as well as the two other OEIS entries mentioned above. – Tom Copeland Aug 27 '22 at 17:54
  • A little history on the classical cumulants is in Problem 11: Cumulants of "Twelve problems in probability no one likes to bring up" by GC Rota. – Tom Copeland Mar 02 '23 at 02:01
  • @AlirezaBakhtiari Please don't "correct" British spelling (apologise) to American spelling (apologize) in the original post. – Dave Benson Oct 26 '23 at 14:39

6 Answers

52

Cumulants have many other names depending on the context (statistics, quantum field theory, statistical mechanics, ...): seminvariants, truncated correlation functions, connected correlation functions, Ursell functions... I would say that the $n$-th cumulant $\langle X_1,\ldots,X_n\rangle^{T}$ of random variables $X_1,\ldots,X_n$ measures the interaction of the variables which is genuinely of $n$-body type. By interaction I mean the opposite of independence. Denoting the expectation by $\langle\cdot\rangle$ as in statistical mechanics, independence implies the factorization $$ \langle X_1\cdots X_n\rangle=\langle X_1\rangle\cdots\langle X_n\rangle\ . $$ If the variables are jointly Gaussian and centered then for instance $$ \langle X_1 X_2 X_3 X_4\rangle=\langle X_1 X_2\rangle\langle X_3 X_4\rangle +\langle X_1 X_3\rangle\langle X_2 X_4\rangle +\langle X_1 X_4\rangle\langle X_2 X_3\rangle $$ so the lack of factorization is due to $2$-body interactions: namely the absence of factorization for $\langle X_i X_j\rangle$. The $4$th cumulant for variables with vanishing moments of odd order would be the difference $\mathrm{LHS}-\mathrm{RHS}$ of the previous equation. Thus it would measure the "interaction" between the four variables which is due to their conspiring all together instead of being a consequence of conspiring in groups of two at a time. For higher cumulants, the idea is the same.
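As a quick numerical illustration of this answer (a sketch added for concreteness, not part of the original), a Monte Carlo estimate confirms that for centered jointly Gaussian variables the difference $\mathrm{LHS}-\mathrm{RHS}$ above, i.e. the fourth joint cumulant, is zero up to sampling noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# A correlated, centered 4-dimensional Gaussian (covariance chosen positive definite).
cov = np.array([[1.0, 0.5, 0.3, 0.2],
                [0.5, 1.0, 0.4, 0.1],
                [0.3, 0.4, 1.0, 0.6],
                [0.2, 0.1, 0.6, 1.0]])
x = rng.multivariate_normal(np.zeros(4), cov, size=1_000_000)
X1, X2, X3, X4 = x.T

def m(*vs):
    """Empirical mixed moment <v1 v2 ...>."""
    return np.mean(np.prod(vs, axis=0))

lhs = m(X1, X2, X3, X4)
rhs = m(X1, X2)*m(X3, X4) + m(X1, X3)*m(X2, X4) + m(X1, X4)*m(X2, X3)
print(lhs - rhs)   # fourth joint cumulant: ~0 up to Monte Carlo noise (order 1e-3)
```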

Cumulants are definitely related to connectedness. For instance, for variables whose joint probability density is a Gaussian density multiplied by a factor $\exp(-V)$ where $V$ is quartic, one can at least formally write moments as a sum of Feynman diagrams. Cumulants are given by similar sums with the additional requirement that these diagrams or graphs must be connected.


26

A nice question, with probably many possible answers. I'll give it a shot. I think three phenomena should be noted.

i) The cumulant function is the logarithm of the moment generating function $t \mapsto \mathbb E[e^{tX}]$, which is (up to a sign convention) the Laplace transform of the probability distribution. Uniqueness of Laplace transforms then tells you that the cumulant function can be used to fully characterize your probability distribution (and in particular its properties, like its connectivity or cohesion, whatever these might be). Since a probability distribution is essentially a measure, and it is often more convenient to work with functions, the Laplace transform is useful. As an example, all moments may be computed from the cumulant function, and probability distributions whose moments coincide are the same (under some extra conditions). The idea of transforming a probability distribution into a function is also exemplified by the Fourier transform of a probability distribution, i.e. the characteristic function $u \mapsto \mathbb E[e^{i u X} ]$, with $u \in \mathbb R$. For this transform there is the well-known result that pointwise convergence of characteristic functions is equivalent to weak convergence (narrow convergence, from the analysis point of view) of the corresponding probability measures. See [Williams, Probability with Martingales].

ii) Sums of independent random variables. Their probability distributions are given by convolutions, and are thus hard to work with. In the Laplace/Fourier domain this difficulty disappears: convolution becomes multiplication, and taking the logarithm turns it into addition, which is exactly the additivity of cumulants.
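A small self-contained check of point ii) (a sketch, not part of the original answer, reusing the same series trick as the sketch in the question above): the moment generating function of an independent sum is the product of the individual ones, so the cumulant generating functions, and hence the cumulants, simply add.

```python
import sympy as sp

t = sp.symbols('t')

def kappa(mgf, n_max=4):
    """Cumulants kappa_1..kappa_n_max read off from a moment generating function."""
    m = sp.series(mgf, t, 0, n_max + 1).removeO()        # Taylor polynomial of the MGF
    g = sp.series(sp.log(m), t, 0, n_max + 1).removeO()  # cumulant generating function
    return [g.coeff(t, n) * sp.factorial(n) for n in range(1, n_max + 1)]

mgf_exp = 1 / (1 - t)                 # Exponential(1)
mgf_uni = (sp.exp(t) - 1) / t         # Uniform(0, 1)

k_sum = kappa(mgf_exp * mgf_uni)      # MGF of the independent sum = product of MGFs
k_add = [a + b for a, b in zip(kappa(mgf_exp), kappa(mgf_uni))]

print([sp.simplify(a - b) for a, b in zip(k_sum, k_add)])   # [0, 0, 0, 0]
```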

iii) The soft-max principle. This idea plays a key role in large deviations theory. Note that $\frac 1 t \log \mathbb E[e^{t X}] \rightarrow \operatorname{ess\,sup} X$ as $t \rightarrow \infty$. Related terminology is the 'Laplace approximation of an integral' in physics (see here). Extensions of this idea, combined with a little convex optimization theory (in particular Legendre-Fenchel transforms), allow one to deduce estimates on the distribution of sums of (not necessarily independent) random variables. Consult e.g. the Gärtner-Ellis theorem in any textbook on large deviations theory (recommended are [Varadhan], [den Hollander] or [Dembo and Zeitouni]), or here. Again, this explains mostly why the cumulant is useful, but not really what it is.
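A tiny numerical illustration of point iii) (again a sketch, not from the answer): for $X\sim\mathrm{Uniform}(0,1)$ one has $\mathbb E[e^{tX}]=(e^t-1)/t$, and $\frac1t\log\mathbb E[e^{tX}]$ indeed climbs toward $\operatorname{ess\,sup} X=1$ as $t$ grows.

```python
import numpy as np

def scaled_cgf_uniform(t):
    # (1/t) * log E[exp(t X)] for X ~ Uniform(0, 1), where E[exp(tX)] = (e^t - 1)/t.
    # log(e^t - 1) is computed stably as t + log(1 - e^{-t}) to avoid overflow.
    log_mgf = t + np.log1p(-np.exp(-t)) - np.log(t)
    return log_mgf / t

for t in [1, 10, 100, 1000]:
    print(t, scaled_cgf_uniform(t))   # 0.54, 0.77, 0.95, 0.99... -> ess sup X = 1
```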

The somewhat disappointing summary is that it seems from the above observations that the (log) cumulant function is mostly a technical device. But a very useful one.

Hopefully somebody else has a suggestion on how the cumulant function may be given a more intuitive meaning, perhaps even related to your suggestion of the cumulant function measuring cohesion of probability measures. I would certainly be interested in such an explanation.

26

It might help to take a broader perspective: in some contexts (notably quantum optics) the emphasis is not on cumulants but on factorial cumulants, with generating function $h(t)=\log E(t^X)$. While the cumulants tell you how close a distribution is to a normal distribution, the factorial cumulants tell you how close it is to a Poisson distribution (since factorial cumulants of order two and higher vanish for a Poisson distribution).
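One way to see the Poisson analogy concretely (a sketch, not part of the answer; the helper `factorial_cumulants` is just illustrative): expand $h(t)=\log E(t^X)$ around $t=1$. For a Poisson distribution only the first factorial cumulant survives, whereas for a binomial distribution the higher ones do not vanish.

```python
import sympy as sp

s, lam, p, N = sp.symbols('s lambda p N', positive=True)

def factorial_cumulants(pgf_at_1_plus_s, n_max=4):
    """Coefficients c_n in log E[t^X] = sum_n c_n s^n / n!, written with t = 1 + s."""
    h = sp.series(sp.expand_log(sp.log(pgf_at_1_plus_s), force=True),
                  s, 0, n_max + 1).removeO()
    return [sp.simplify(h.coeff(s, n) * sp.factorial(n)) for n in range(1, n_max + 1)]

# Poisson(lambda): E[t^X] = exp(lambda*(t - 1)) = exp(lambda*s)
print(factorial_cumulants(sp.exp(lam * s)))       # [lambda, 0, 0, 0]

# Binomial(N, p): E[t^X] = (1 - p + p*t)^N = (1 + p*s)^N
print(factorial_cumulants((1 + p * s) ** N))      # [N*p, -N*p**2, 2*N*p**3, -6*N*p**4]
```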

So I would think that any privileged role of cumulants is linked to the prevalence of normal distributions.

Carlo Beenakker
  • I don't know what happened but I seems I voted this down by mistake (instead of up), and now I cannot repair this as long as the post is not edited. So think (-(-1) + 1) = +2! Or, if you wish to make the effort, make a slight edit so I can correct my mistake (I don't want to interfere with your post myself.) My apologies!! – Joris Bierkens Oct 15 '13 at 10:45
  • @JorisBierkens --- thank you, Joris, for the explanation; I made the small edit, so go ahead. – Carlo Beenakker Oct 15 '13 at 14:39
  • I corrected my mistake. – Joris Bierkens Oct 15 '13 at 15:37
17

Suppose you have N billiard balls on a pool table. If N is not too large, the collisions (hence correlations) will mostly involve two balls at a time. However, if you add more balls to the table, you will start seeing new types of collisions in which three, four or more balls hit each other at the same time (one ball hitting a second and then a third still falls under the category of two-ball collisions).

The n-th cumulant quantifies the probability of n balls bumping into each other at the same time at the same point. We can depict the fourth cumulant with this diagram

[diagram: the fourth cumulant, depicted as a single connected four-ball collision]

Cumulants are also called "connected" correlation functions, and that is how the diagram above depicts them.

The n-th moment quantifies the probability of n balls bumping into each other at the same time at the same point, as well as of some balls hitting only some other ones. For example, the fourth moment includes four-ball correlations (as in the diagram above) as well as three-, two- and one-ball correlations (the last one is the "mean"). The fourth moment will include, but is not limited to, the three disconnected four-ball diagrams constructed from two connected two-ball diagrams, as shown below

[diagram: three disconnected four-ball diagrams, each built from two connected two-ball diagrams]

These diagrams are "disconnected;" they contribute to the fourth moment, but they do not contribute to the fourth cumulant.

So, moments quantify correlations in general (both connected and disconnected), whereas cumulants quantify only the direct, simultaneous correlations (connected). For example, consider three balls: one ball hitting the second but not the third contributes to the third moment, but not to the third cumulant.

(Warning: This explanation includes oversimplifications for pedagogical purposes. Here I used simultaneity for colliding billiard balls (contact interaction); however, simultaneity is not a necessary condition in general.)
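The connected/disconnected bookkeeping above is exactly the set-partition formula for moments in terms of joint cumulants, spelled out in Michael Hardy's comments below. The following sketch (added for illustration, not part of the answer) rebuilds the fourth moment of a single variable from its first four cumulants by summing over all 15 partitions of $\{1,2,3,4\}$; the single-block partition is the "connected" piece $\kappa_4$.

```python
import sympy as sp

k1, k2, k3, k4 = sp.symbols('kappa_1 kappa_2 kappa_3 kappa_4')
kappa = {1: k1, 2: k2, 3: k3, 4: k4}

def set_partitions(items):
    """Yield all partitions of a list of distinct items, as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # put `first` into an existing block ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i+1:]
        # ... or into a block of its own
        yield [[first]] + part

# E[X^4] = sum over partitions pi of the product over blocks B of kappa_{|B|}
moment4 = sum(sp.Mul(*[kappa[len(block)] for block in p])
              for p in set_partitions([1, 2, 3, 4]))
print(sp.expand(moment4))
# kappa_4 + 4*kappa_1*kappa_3 + 3*kappa_2**2 + 6*kappa_1**2*kappa_2 + kappa_1**4
# (up to the ordering of terms)
```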

  • Although I am not a mathematician, I found this explanation to be quite intuitive. Incidentally, it is the main reason (or explains) why cumulants are used in SOFI imaging instead of other correlation measures. I ended up here trying to figure out the significance of cumulants in SOFI :) – Kris Sep 14 '15 at 16:17
  • Actually, is there some kind of resource that somehow explains why this is so? One of the previous answers also goes into the difference between the "simultaneous" correlation and pairwise ones... – Kris Sep 14 '15 at 16:27
  • A less vague account would be clearer. – Michael Hardy Jun 19 '22 at 21:05
  • I don't know what you're trying to say in this answer, but I'm going to guess that the following is related: For four random variables $X_1,X_2,X_3,X_4,$ $$ \begin{align} \operatorname E(X_1X_2X_3X_4) = {} & \kappa(X_1,X_2,X_3,X_4) \\[1ex] & {} + \kappa(X_1,X_2,X_3)\kappa(X_4) \\ & {} + \kappa(X_1,X_2,X_4)\kappa(X_3) \\ & {} + \kappa(X_1,X_3,X_4)\kappa(X_2) \\ & {} + \kappa(X_2,X_3,X_4)\kappa(X_1) \\[1ex] & {} + \kappa(X_1,X_2)\kappa(X_3,X_4) \\ & {} + \kappa(X_1,X_3)\kappa(X_2,X_4) \\ & {} + \kappa(X_1,X_4)\kappa(X_2,X_3) \\[1ex] & {} + \cdots\cdots \end{align} $$ – Michael Hardy Jun 19 '22 at 21:21
  • $$ \begin{align} \cdots\cdots\cdots\cdots\cdots\cdots\cdots & {} + \kappa(X_1,X_2)\kappa(X_3)\kappa(X_4) \\ & {} + \kappa(X_1,X_3)\kappa(X_2)\kappa(X_4) \\ & {} + \kappa(X_1,X_4)\kappa(X_2)\kappa(X_3) \\ & {} + \kappa(X_2,X_3)\kappa(X_1)\kappa(X_4) \\ & {} + \kappa(X_2,X_4)\kappa(X_1)\kappa(X_3) \\ & {} + \kappa(X_3,X_4)\kappa(X_1)\kappa(X_2) \\[1ex] & {} + \kappa(X_1)\kappa(X_2)\kappa(X_3)\kappa(X_4). \end{align} $$ – Michael Hardy Jun 19 '22 at 21:23
  • The $\kappa$s are joint cumulants. There is one term in the sum for each partition of the set of four random variables. – Michael Hardy Jun 19 '22 at 21:24
  • The connection (pun intended) with connectedness is through the exp formula. I don't know if someone ever made this precise, but in many cases the log operates on the power series level, rearranging the terms to form connected objects (connected graphs and trees, "clusters" in physics) out of generally disconnected objects. – Rnhmjoj Oct 17 '22 at 09:42
  • @Rnhmjoj, your comment is spot on. Speaking in the quantum field theory context, the partition function $Z$ (moment generator) includes all connected and disconnected graphs, whereas $\log Z$ only includes the connected graphs, and the latter generates cumulants. – Sener Ozonder Oct 18 '22 at 17:05
15

The cumulants beyond the second are all zero for a normal distribution, so intuitively they measure deviations from normality. In more detail, V. V. Petrov, Sums of Independent Random Variables (Springer-Verlag, 1975), has estimates of the approach to normality in central (actually local) limit theorems, and they involve cumulants.
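A small empirical illustration of this point (a sketch, not from the answer), using scipy's k-statistics, which are unbiased estimators of the cumulants: for normal data the estimated third and fourth cumulants hover near zero, while for exponential data they do not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200_000

normal_sample = rng.normal(loc=2.0, scale=1.0, size=n)   # cumulants: [2, 1, 0, 0]
expon_sample = rng.exponential(scale=1.0, size=n)         # cumulants: kappa_k = (k-1)! -> [1, 1, 2, 6]

for name, sample in [("normal", normal_sample), ("exponential", expon_sample)]:
    ks = [stats.kstat(sample, k) for k in (1, 2, 3, 4)]   # unbiased cumulant estimates
    print(name, np.round(ks, 3))
# normal:      roughly [2, 1, 0, 0]  (third and fourth cumulants ~ 0)
# exponential: roughly [1, 1, 2, 6]
```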

In statistical physics there are often sums of almost independent variables, for example displacements in a diffusion process. In this context the cumulants are called Burnett coefficients, and they correspond to a diffusion equation with higher-derivative terms. See for example H. van Beijeren, Rev. Mod. Phys. 54, 195-234 (1982); R. Steinigeweg and T. Prosen, Phys. Rev. E 87, 050103(R) (2013).

  • Thank you for this. In what sense are they a "natural" measure of deviation from normality, more so than the moment generating function itself? What does "deviation from normality" mean, precisely? – Daniel Moskovich Oct 14 '13 at 12:23
  • @Daniel Clearly there is a 1:1 correspondence between moments and cumulants - they convey the same information. The naturalness is what you have already mentioned, the additivity, and the fact they are zero exactly at an interesting limit, the normal distribution. –  Oct 14 '13 at 12:47
  • Yes- I made a silly comment without thinking things through. Now I understand your answer better- thanks! I'll also have a look at the book by Petrov which you recommended. – Daniel Moskovich Oct 14 '13 at 13:25
4

There are two articles that answer your question at different levels of detail: first "What are cumulants?" and then "Cumulants are universal homomorphisms into Hausdorff groups".

What I can say in one sentence: they are a sequence of numbers that, like the moments, determine a probability distribution (under suitable conditions), but, unlike the moments, they are homomorphisms from the space of distributions equipped with convolution to the real numbers equipped with $+$, meaning the cumulants of a sum of independent random variables are just the sums of the cumulants of the individual random variables.