7

I am studying Quantum Information now, and I need to understand the entropy of a quantum system. But before I go there, I need to understand Shannon entropy, which is defined as:

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2{p(x_i)}$$

where $X$ is a discrete random variable with possible outcomes $x_{1},\dots,x_{n}$, which occur with probabilities $p(x_{1}),\dots,p(x_{n})$. This is the entropy used in information theory, but we know that entropy was already defined way back in thermodynamics by Clausius as:

$$dS = \frac{\delta Q}{T}$$

Then, in statistical physics, entropy is defined by Boltzmann as:

$$S=k_B \ln{\Omega}$$

where $\Omega$ is the number of microstates of the system. How can I derive the Shannon entropy from these thermodynamic and statistical-physics definitions of entropy?
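For concreteness, here is a minimal numerical illustration of the Shannon definition, assuming a hypothetical biased coin with $p(\text{heads})=0.9$:

```python
import math

# Shannon entropy (in bits) of a hypothetical biased coin with p(heads) = 0.9
p = [0.9, 0.1]
H = -sum(p_i * math.log2(p_i) for p_i in p)
print(H)  # ~0.469 bits, compared with 1 bit for a fair coin
```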

DanielC
  • 4,333
  • 1
    Related: https://physics.stackexchange.com/a/669707/247642, https://physics.stackexchange.com/a/656263/247642 – Roger V. May 20 '22 at 08:59
  • What you may want is the careful derivation of Boltzmann's $H$ Theorem, which was what showed that there was a connection between entropy and (schematically) $p\log p$. – Buzz May 20 '22 at 20:38

5 Answers

5

These are not the same.

Shannon entropy (information entropy), $H_\alpha=-\sum_i p_i\log_\alpha p_i$, where $\alpha$ is the base of the logarithm, applies to any system with specified probabilities $p_i$.

Boltzmann entropy, defined via the famous $S=k\log\Omega$, implies that the system occupies all the accessible states with equal probability, $p_i=1/\Omega$. This is a particular case of the information entropy, as can be seen by plugging $p_i=1/\Omega$ into the Shannon formula, taking the natural logarithm as the base, and discarding the customary dimensional coefficient $k$.
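For instance, here is a minimal numerical sketch of this reduction (my own illustration, assuming $\Omega=1000$ equally likely microstates and natural logarithms):

```python
import numpy as np

# Hypothetical system with Omega equally likely microstates, p_i = 1/Omega
Omega = 1000
p = np.full(Omega, 1.0 / Omega)

# Shannon entropy with natural logarithm (dimensionless, i.e. the factor k is dropped)
H = -np.sum(p * np.log(p))

print(H, np.log(Omega))  # both equal ln(Omega) ~ 6.9078
```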

Gibbs entropy, defined via the Clausius inequality, $dS\geq \delta Q/T_{env}$, is introduced empirically, as a quantity that increases monotonically in spontaneous processes and thus makes thermodynamic processes irreversible.

Furthermore, Boltzmann entropy and Gibbs entropy can be shown to be equivalent (in the thermodynamic limit), reflecting the equivalence between the microscopic statistical physics and the phenomenological thermodynamics.

Finally, let me point out that entropy may mean different things: Jaynes, in his article The Minimum Entropy Production Principle, claims that there are six different types of entropy with somewhat different meanings.

Remark:
There is some disagreement about what is called Gibbs entropy, as Gibbs actually introduced two entropies - one along the lines of Clausius, and another more similar to the Boltzmann entropy. These are sometimes referred to as Gibbs I and Gibbs II. For more ways to introduce entropy, see this answer to Is information entropy the same as thermodynamic entropy?

Roger V.
  • 58,522
  • I do not understand why you call Gibbs entropy the Thermodynamic expression. Usually, people use the name Gibbs entropy for the same formula as Shannon's entropy but limited to the case of the equilibrium ensemble probability distribution. – GiorgioP-DoomsdayClockIsAt-90 Dec 07 '22 at 06:36
  • @GiorgioP sometimes these are referred to as Gibbs I and Gibbs II. There is now a more complete summary here. Anyhow the point is that there is entropy defined phenomenologically (in thermodynamics), there is entropy in statistical physics, and there is information entropy. Was your downvote only due to the disagreement in terminology or do you sincerely think that this answer is not useful as per SE policies? – Roger V. Dec 07 '22 at 08:29
  • I never downvoted an answer for terminology reasons. However, I think that your answer does not provide a clear answer to the question (derivation of Sh. entropy from Clausius). Moreover, it contains some inaccuracy (the equivalence between Boltzmann entropy and Clausius entropy is not unconditionally valid). – GiorgioP-DoomsdayClockIsAt-90 Dec 07 '22 at 10:54
  • @GiorgioP the point of the answer is that Shannon entropy cannot be derived from Clausius. – Roger V. Dec 07 '22 at 10:55
  • I think it should be stated more clearly than saying they are not the same. Also Shannon and Boltzmann's entropies are not the same, but one can obtain the latter from the first. – GiorgioP-DoomsdayClockIsAt-90 Dec 07 '22 at 11:51
  • @GiorgioP thank you for sharing your opinion. Also Shannon and Boltzmann's entropies are not the same, but one can obtain the latter from the first. - I think you miss an important point here, but this has been discussed in connection to your own answer in this thread. I pretty much share the opinions that @JánLalinský has expressed there. – Roger V. Dec 07 '22 at 12:02
5

Yes, you can.

I'm going to show that the expression of the Shannon entropy can be deduced from statistical and thermodynamic relations.

We know that the entropy defined by $\mathrm{d}S=\delta{Q}/T$ can be related to thermodynamic potentials. The formula we will use here is $$ F=U-TS, $$ where $U$ is the internal energy and $F$ is the Helmholtz function (also known as the Helmholtz free energy).

In statistical mechanics, $F$ is related to the partition function $Q$ of the canonical ensemble: $$ F=-kT\ln Q, $$ where $Q=\sum_{r}\mathrm{e}^{-\beta E_r}$ and, as usual, $\beta = \frac{1}{kT}$.

The probability of any microstate $r$ is given by $$ P_r=\frac{\mathrm{e}^{-\beta E_r}}{Q}. $$

OK, here it comes. Take the logarithm of $P_r$ and average it over the ensemble, and we have Eq. (3.3.13) in Pathria & Beale (2021): $$ \langle \ln P_r \rangle=-\ln Q-\beta \langle E_r\rangle=\beta(F-U) = -\frac{S}{k}. $$ The average here is just $$ \langle \ln P_r \rangle = \sum_{r} P_r \ln P_r. $$ At last, we obtain the Shannon entropy (forgetting about the Boltzmann constant $k$): $$ S = -\sum_{r} P_r \ln P_r. $$
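As a quick numerical sanity check (my own sketch with a made-up three-level system, not from Pathria & Beale), one can verify that $-\sum_r P_r\ln P_r$ computed from the canonical weights coincides with $\beta(U-F)=S/k$:

```python
import numpy as np

# Hypothetical three-level system; units chosen so that k = 1
E = np.array([0.0, 1.0, 2.5])   # energies E_r
T = 1.3                         # temperature
beta = 1.0 / T

# Canonical ensemble: partition function and microstate probabilities
Q = np.sum(np.exp(-beta * E))
P = np.exp(-beta * E) / Q

# Thermodynamic route: S/k = beta * (U - F), with F = -kT ln Q and U = <E_r>
F = -T * np.log(Q)              # k = 1 here
U = np.sum(P * E)
S_over_k_thermo = beta * (U - F)

# Shannon route: S/k = -sum_r P_r ln P_r
S_over_k_shannon = -np.sum(P * np.log(P))

print(S_over_k_thermo, S_over_k_shannon)  # agree to numerical precision
```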


Ref.

R. K. Pathria and P. D. Beale, Statistical Mechanics, 4th ed., Academic Press, 2021.

hft
  • 19,536
Xiaosheng Yang
  • 299
  • 1
  • 10
  • I don't understand why this is not the chosen answer. This is an excellent derivation, thank you – bananenheld Jul 08 '22 at 18:38
  • 1
    @bananenheld This answer is possibly not chosen because OP seems to be asking about the microcanonical ensemble, but this answer works in the canonical ensemble. I.e., OP asks about why $S=k\log(\Omega),$ which only holds when the microstate probabilities are all equal (all equal to $1/\Omega$), but in the canonical ensemble the probabilities are given by $e^{-\beta E_i}/Q$. – hft Dec 07 '22 at 01:19
  • @bananenheld The problem with this answer is that it allows deriving a formula looking like Shannon's formula but is limited to the case of the probability of equilibrium ensembles in Statistical Mechanics. The derivation does not allow the use of the formula in other contexts, for example, the case of signal transmission that was Shannon's starting point. – GiorgioP-DoomsdayClockIsAt-90 Dec 07 '22 at 06:32
3

You can't. It is not possible to derive a more general formula from a less general one. Of course, one can find hints for the generalization, but the validity of the generalization has to be proved independently.

The relationship between the formulas is the following: Shannon's formula is more general (it applies to every probability distribution, even non-equilibrium ones and even if there is no energy underlying the probabilities).

Statistical mechanics entropies (different in different ensembles) are special cases of Shannon's formula, obtained when the probabilities take the equilibrium values appropriate to each ensemble.

The Clausius formula is connected with the statistical mechanics entropies only in the so-called thermodynamic limit, i.e. in the limit of a very large system.

hft
  • 19,536
  • 1
    It is not a more general formula, it is a formula for a different thing. Shannon entropy, or information-theoretic entropy, is a different concept altogether from the Clausius entropy. Clausius entropy can sometimes be calculated as the value of the Shannon entropy; but this does not make the latter a more general formula for the same thing. – Ján Lalinský May 23 '22 at 19:33
  • @JánLalinský I do not agree with your point of view. Shannon entropy can be defined for every probability distribution, not only the one corresponding to an equilibrium ensemble. But if used with the equilibrium probability distribution of a Statistical Mechanics ensemble it provides exactly the Statistical Mechanics expression for the entropy. I would call this situation a particularization of a more general formula to a more specific case. In a similar way, the Statistical Mechanics entropy, in general, does not coincide with the thermodynamics entropy. However, with additional conditions... – GiorgioP-DoomsdayClockIsAt-90 May 23 '22 at 21:57
  • ... i.e. with the thermodynamic limit, it coincides with the thermodynamic entropy. What else, to be allowed to speak about a transition from more general to more specific concepts? – GiorgioP-DoomsdayClockIsAt-90 May 23 '22 at 21:59
  • 1
    "...Statistical Mechanics...I would call this situation a particularization of a more general formula to a more specific case." -- that is true, but that generalization consists of going from Shannon entropy of special probability distribution to Shannon entropy of any probability distribution. Not from going from Clausius entropy to Shannon entropy. Clausius entropy isn't logically dependent on statistical mechanics or probability distributions. – Ján Lalinský May 23 '22 at 22:07
  • @JánLalinský Sure, as Clausius himself, one can have Clausius entropy even without knowing that there is something called Statistical Mechanics. However, if Statistical Mechanics plus some additional prescription (existence of the thermodynamic limit) allows for justifying the properties of Clausius entropy, I would call it a more general theory. But maybe we are just disagreeing about the term generalization. – GiorgioP-DoomsdayClockIsAt-90 May 23 '22 at 22:25
  • "if Statistical Mechanics plus some additional prescription (existence of the thermodynamic limit) allows for justifying the properties of Clausius entropy, I would call it a more general theory" -- SM being more general than classical thermodynamics is a different issue. It is more general in the sense its method of description and analysis can be applied to almost any system, not merely systems close to equilibrium. But that does not mean that Shannon entropy is more general than Clausius entropy. They are still functions of different variables, hence different functions.

    – Ján Lalinský May 23 '22 at 22:38
  • 1
    If we defined entropy for a body out of equilibrium as function of macroscopic description quantities such as energy density field and mass density field, using SM or not, then that would be generalization of the Clausius entropy, producing a concept of the same kind for more general class of systems. But Shannon entropy is not a generalization in this sense; it is a different concept that turned out to be relevant to Clausius entropy, but it is not the same kind of concept. – Ján Lalinský May 23 '22 at 22:42
  • @JánLalinský Shannon entropy is not a generalization in this sense; it is a different concept that turned out to be relevant to Clausius entropy, but it is not the same kind of concept. According to this point of view, one should say the same for Statistical Mechanics entropy. I think it is partially true. But I find it more useful to stress the existing conceptual relationships between these concepts (which exist!) than the differences. – GiorgioP-DoomsdayClockIsAt-90 May 24 '22 at 06:23
  • I get your point of view. I prefer to stress both the relation and differences of the two on this site, because people learning physics often assume all those different views and descriptions of entropy are somehow about the same thing, which is not so, and this leads to confusion and prevents understanding of the different concepts of entropy. – Ján Lalinský May 24 '22 at 09:43
1

A better approach would be to use the Shannon entropy to derive the Gibbs entropy, $S=-k\sum_n p_n \ln(p_n)$. The two equations are very similar, and therefore the connection is much easier to understand. From there it is easy to arrive at the Boltzmann entropy and finally at Clausius.
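To spell out the second step: for $\Omega$ equally probable microstates, $p_n=1/\Omega$, the Gibbs formula reduces to the Boltzmann one,
$$ S=-k\sum_{n=1}^{\Omega}\frac{1}{\Omega}\ln\frac{1}{\Omega}=k\ln\Omega. $$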

DanielC
  • 4,333
Stevan V. Saban
  • 1,720
  • 3
  • 17
0

You can derive the Gibbs entropy, which is simply the Shannon entropy multiplied by the Boltzmann constant, from the Boltzmann entropy and a multinomial distribution. This video proceeds with the derivation:

https://www.youtube.com/watch?v=h3xVAVcYfjk

In that demonstration, the Boltzmann entropy is not a particular case of the Gibbs entropy; it is quite the other way around. True, the Boltzmann entropy has equal probabilities while they may differ in the Gibbs entropy. However, the probabilities refer to microstates of a canonical ensemble in the Boltzmann entropy, but to individual particle energy states in the Gibbs entropy.
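For reference, here is a minimal sketch of the combinatorial argument behind such a derivation (my paraphrase, assuming $N$ particles with occupation numbers $n_i$ of the individual energy states and Stirling's approximation; the video may differ in the details). Starting from the Boltzmann form with the multinomial count,
$$ S=k\ln W=k\ln\frac{N!}{\prod_i n_i!}\approx k\Big(N\ln N-\sum_i n_i\ln n_i\Big)=-kN\sum_i p_i\ln p_i,\qquad p_i=\frac{n_i}{N}, $$
so dividing by $N$ gives the per-particle Gibbs (Shannon) entropy $-k\sum_i p_i\ln p_i$.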