5

What exactly does $S$ represent in the CHSH inequality

$$-2 \leq S \leq 2\,?$$

Sorry I've been reading for a couple days and I can't figure out what exactly $S$ is and the math is a bit over my head.

Any help is much appreciated.

glS
  • 14,271

2 Answers

7

The $S$ in this inequality is defined as $$ S = E(a b) − E(a b') + E(a' b) + E(a' b') $$ where $E(M)$ is the "expectation value of $M$" which means the average value calculated from many repetitions of the same experiment (empirically) or from the probability distributions (theoretically).

All the expectation values are taken from the products of two quantities in the list $\{a,a',b,b'\}$. Each of these four quantities – which represent the spin measurement of the first particle with respect to two axes ($a,a'$) or the second particle with respect to two axes ($b,b'$) – is equal to either $+1$ or $-1$ so all the four products above are either $+1$ or $-1$, too. The positive value means that the two factors in the product yield the same sign; the negative value means that they have the opposite sign.

For example, $E(ab)$ may be interpreted as $1-2P_{a\neq b}$ where $P_{a \neq b}$ is the probability that $a,b$ will be measured with opposite signs if you decide to measure the unprimed spin of the first particle and the unprimed spin of the second particle. The other three $E(\dots)$ terms are similarly linked to the analogous probabilities $P_{i\neq j}$. So $$ S = 2 - 2P_{a \neq b} + 2P_{a\neq b'} - 2P_{a'\neq b} - 2P_{a'\neq b'}. $$ You saw that $S$ is composed of four terms and each of them is between $-1$ and $+1$, so a priori $S$ could be anything between $-4$ and $+4$. However, the assumptions of realism and locality are enough to prove that in local realist theories, $S$ is always between $-2$ and $+2$. Quantum mechanics allows $S$ to be as high as $2\sqrt{2}\approx 2.83$, which exceeds the interval allowed by local realist theories, and experiments confirm that this value is realized in the appropriate spin experiments, thus falsifying local realist theories.
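As a quick numerical illustration (a sketch, not part of the original argument): assuming correlations of the form $E(\theta_1,\theta_2)=\cos(\theta_1-\theta_2)$ for measurements along coplanar axes, the form quantum mechanics predicts for suitably correlated spin pairs, the standard choice of angles gives exactly $S=2\sqrt2$:

```python
import math

def E(t1, t2):
    # Assumed quantum correlation between settings at angles t1, t2
    return math.cos(t1 - t2)

# Standard optimal angle choices (radians):
a, a_p = 0.0, math.pi / 2               # the two settings of the first party
b, b_p = math.pi / 4, 3 * math.pi / 4   # the two settings of the second party

S = E(a, b) - E(a, b_p) + E(a_p, b) + E(a_p, b_p)
print(S)  # 2.828..., i.e. 2*sqrt(2) > 2
```

Other angle choices give a smaller $|S|$; these particular ones saturate the quantum (Tsirelson) bound $2\sqrt2$.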

An intuitive sketch of why $S$ can't be greater than two in local realist theories is this: to maximize $S$, you want the three terms entering with plus signs to be close to $+1$ and $E(ab')$ to be close to $-1$. However, that's not possible. If the first, third, and fourth terms are very close to one, it means that $a$ is highly correlated with $b$, $a'$ is also highly correlated with $b$, and $a'$ is highly correlated with $b'$. Transitivity is enough to see that the first two conditions imply that $a$ is highly correlated with $a'$; combining this with the last correlation, that of $a'$ with $b'$, we find that $a$ is highly correlated with $b'$, too. So $E(ab')$ is close to $+1$, the term $-E(ab')$ is close to $-1$, and it mostly cancels the gains. Therefore, we get a number close to $S=2$ and not $S=4$.
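The bound is easy to check by brute force: in a local realist (deterministic) description, each of $a,a',b,b'$ has a definite value $\pm1$ in every run, so one can simply enumerate all $2^4$ assignments. A minimal sketch:

```python
from itertools import product

# Enumerate every deterministic assignment of +1/-1 to a, a', b, b'
# and evaluate the CHSH combination for each.
values = [a * b - a * bp + ap * b + ap * bp
          for a, ap, b, bp in product((-1, 1), repeat=4)]

print(sorted(set(values)))  # [-2, 2]: the combination only ever equals -2 or +2
```

The reason is visible in the algebra: $ab - ab' + a'b + a'b' = a(b-b') + a'(b+b')$, and exactly one of $b-b'$ and $b+b'$ is zero while the other is $\pm2$.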

Maths is needed to show that $2$ is the upper bound in local realist theories and $2\sqrt2\approx 2.83$ is the prediction in quantum mechanics (confirmed experimentally). In quantum mechanics, it is impossible to deduce that $a$ is highly correlated with $a'$ from the first two conditions: for a perpendicular pair of axes, $a$ and $a'$ may be totally uncorrelated even though the previous two correlations are very high.

Luboš Motl
  • 179,018
  • Does "upper bound" mean that if I calculate S, based on for example 4 (or more) experiments, it can be less than 2 in quantum mechanics, but some later calculated S will be more than 2? In other words, does violating the CHSH inequality mean measuring |S| > 2 at least once, or always? – dk14 Oct 01 '14 at 00:47
  • 1
    Dear dk, local realist theories imply that $|S|\leq 2$ is true, i.e. it always holds. The negation of this statement is that $|S|\gt 2$ holds at least sometimes. So it's enough to find one example where $|S|\gt 2$ and all local realist theories are ruled out. – Luboš Motl Oct 01 '14 at 07:20
  • Thanks. I realized that it's just a result of the Boole–Fréchet inequality, so it's like receiving heads 1000000 times when flipping a quantum coin. http://physics.stackexchange.com/questions/138080/should-s-always-be-more-than-2-to-violate-chsh-inequality – dk14 Oct 01 '14 at 10:00
  • Just one more question. What if I have a hypothetical experiment where I measured states of particles 1000 times and calculated S = 1.8, then (maybe in the same experiment) I measured states of particles 1000 times and calculated S = 2.21; does it mean I violated CHSH (with such a dynamic mathematical expectation)? – dk14 Oct 02 '14 at 00:19
  • dear @dk14, if you want to violate (rule out) an inequality by an experiment, you have to keep track of the inevitable error margin of the experiment. With the distribution of the "true" value of $S$ indicated by your measured mean value and error margin, you must calculate the probability that you got such a high value just by chance. And only if this probability is tiny, like smaller than $10^{-6}$ (the usual 5-sigma criterion), you may say that the experiment has established something. 1,000 measurements yielding $S=2.21\pm \epsilon$ is probably more than enough to falsify the $S\lt 2$ law. – Luboš Motl Oct 02 '14 at 06:37
  • 1
    There are systematic errors and statistical errors; the latter (relative statistical errors) typically decrease as $1/\sqrt{N}$ with the number of measurements $N$, and you should learn how all these things work. These are sort of 101 issues of an experimenter's life. – Luboš Motl Oct 02 '14 at 06:38
  • What if $E(ab)$ and $E(a'b)$ are always $= 1$, but each of $E(ab')$ and $E(a'b')$ is equal to $0.5 \pm 5\sigma$ (assume every particle randomly chooses whether to correlate when $b'$ is measured), where $\sigma$ depends on N. Could S be more than 2 then? – dk14 Oct 08 '14 at 21:53
  • You may even calculate $\sigma$ for N = 1000: it will be 0.032 for N/4. In the worst case (when $E(ab')$ and $E(a'b')$ receive symmetric errors) it will be 0.064. Result: $S = 2 \pm 0.32~(5\sigma)$. So my 2.21 doesn't violate the CHSH inequality, as you can see. – dk14 Oct 08 '14 at 22:05
  • Do we have $\max(A+B)=\max(A)+\max(B)$ ? – QuantumPotatoïd Sep 03 '23 at 04:36
  • Sorry, I don't understand what it means max with a single argument. If it is the maximum element of a set, then max(A) with one argument is just A, and your equation says A+B=A+B which is true. If you maximize some entropy of something, you probably need to specify what is the something and what you maximize over, I just don't get it in this simple way. Whenever it will be something nontrivial, the additivity will be broken. – Luboš Motl Sep 04 '23 at 05:38
  • I mean the quantum operators B and B' don't commute, so there will be different times and we get $$\max_{x,y}{A(a,x)B(b,x)-A(a,y)B'(b',y)}$$. But if we want quantum to commute one should use $$A\otimes 1_2\otimes B\otimes 1_2-A\otimes 1_2\otimes 1_2\otimes B'$$ and there the eigenvalues will be 2 and -2. – QuantumPotatoïd Jan 11 '24 at 07:31
4

General Bell nonlocality framework

Consider a setup where Alice and Bob perform a measurement, each on their share of the system, which can result in one of two outputs. Each one of them is allowed to choose between two different ways of measuring (independently of each other, so Alice might choose two measurement bases and Bob two different ones).

The general way to describe correlations in such a setup is via the joint probability distribution $p(xy|ab)$, where $x,y\in\{0,1\}$ represent the possible measurement outcomes, and $a,b\in\{0,1\}$ the possible measurement choices, for Alice and Bob respectively. Notice that we might equivalently have chosen $x,y,a,b\in\{+1,-1\}$ instead; this is purely conventional.

The gist of Bell's nonlocality is that the assumption of local realism in itself constrains the form of $p(xy|ab)$. More precisely, "local realism" means that we can write $$p(xy|ab) = \sum_\lambda p_\lambda p_\lambda(x|a) p_\lambda(y|b),\tag1$$ for some "local hidden variable" $\lambda$ and some probability distributions $p_\lambda,p_\lambda(x|a),p_\lambda(y|b)$. To be clear, nonlocality means that there is no way to write $p(xy|ab)$ in such a form. It is worth noting that there is some abuse of notation in the above equation, with the symbol $p$ meaning different things in different parts. A more precise way to write it would be $$p_{XY|AB}(xy|ab)=\sum_{\lambda\in\Lambda}p_\Lambda(\lambda) p_{X|A\Lambda}(x|a\lambda)p_{Y|B\Lambda}(y|b\lambda),$$ to distinguish between the different functions. Here $$p_{XY|AB}(xy|ab)\equiv \operatorname{Prob}(X=x,Y=y|A=a,B=b),$$ and similarly for the other quantities. I will keep using the shorthand notation in the following, as the correct way to interpret the notation is generally clear from the context.

Standard form of $S$ operator

The standard way to show that CHSH's $S$ does the trick is to write it as $$S \equiv \mathbb{E}_{00}[XY] + \mathbb{E}_{01}[XY] + \mathbb{E}_{10}[XY] - \mathbb{E}_{11}[XY],$$ where $\mathbb{E}_{ab}[XY]$ stands for the expectation value of the random variable $XY$, computed over the probability distribution corresponding to the measurement choices $a,b\in\{0,1\}$. The random variables $X,Y$ are functions of the possible outcomes $x,y$, except that they mark the outcomes as $\pm1$ rather than as $0,1$. So, more explicitly, these are $$\mathbb{E}_{ab}[XY]= \sum_{x,y\in\{0,1\}} (-1)^{x+y} p(xy|ab).$$ We can then also write $S$ in similarly compact notation as $$S = \sum_{a,b} (-1)^{ab} \mathbb{E}_{ab}[XY] = \sum_{a,b,x,y} (-1)^{x+y+ab} p(xy|ab).$$
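To make the compact formula concrete, here is a minimal sketch (the helper name `chsh_S` is mine, not from any library) that evaluates $S$ from a table of conditional probabilities $p(xy|ab)$. For outcomes that are uniformly random regardless of the settings, every expectation value vanishes and $S=0$:

```python
from itertools import product

def chsh_S(p):
    """p[(x, y, a, b)] = p(xy|ab); returns S = sum over x,y,a,b of (-1)^(x+y+ab) p(xy|ab)."""
    return sum((-1) ** (x + y + a * b) * p[(x, y, a, b)]
               for x, y, a, b in product((0, 1), repeat=4))

# Completely uncorrelated behaviour: p(xy|ab) = 1/4 for every x, y, a, b
uniform = {key: 0.25 for key in product((0, 1), repeat=4)}
print(chsh_S(uniform))  # 0.0
```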

Local realism constraint on $S$

Now, suppose $p(xy|ab)$ can be written as in (1). Then $$\mathbb{E}_{ab}[XY] = \sum_\lambda p_\lambda \mathbb{E}_{a,\lambda}[X] \mathbb{E}_{b,\lambda}[Y],$$ where $\mathbb{E}_{a,\lambda}$ denotes the expectation value computed with respect to the marginal probability distribution $x\mapsto p_\lambda(x|a)$, and similarly for $\mathbb{E}_{b,\lambda}$. Thus $$S = \sum_\lambda p_\lambda \underbrace{\sum_{a,b}(-1)^{ab} \mathbb{E}_{a,\lambda}[X]\mathbb{E}_{b,\lambda}[Y]}_{\equiv S_\lambda}.$$ One can show that $-2\le S_\lambda\le 2$ for all $\lambda$, and thus $-2\le S\le 2$. To this end, an observation which significantly simplifies the calculation is that if $p(xy|ab)$ is as in (1), then, up to a redefinition of the hidden variable $\lambda$, we can assume the local distributions to be deterministic. This means we can assume without loss of generality that $p_\lambda(x|a)=\delta_{x,x_a}$ for some binary function $a\mapsto x_a$ (which may depend on $\lambda$). In other words, for any $\lambda$ and $a$, either $p_\lambda(0|a)=1$ or $p_\lambda(1|a)=1$.

The reason this makes things a lot easier is that, if the local distributions are deterministic, then the expectation values are just $\mathbb{E}_{a,\lambda}[X],\mathbb{E}_{b,\lambda}[Y]\in\{+1,-1\}$. But then, dropping the $\lambda$ subscripts for brevity, $$S_\lambda = \mathbb{E}_0[X] ( \mathbb{E}_0[Y] + \mathbb{E}_1[Y]) + \mathbb{E}_1[X] ( \mathbb{E}_0[Y] - \mathbb{E}_1[Y]) \in \{\pm2\mathbb{E}_0[X], \pm2 \mathbb{E}_1[X]\} \subseteq\{+2,-2\}.$$ In words, we just showed that $S_\lambda\in\{-2,2\}$, thus $|S_\lambda|=2$, and thus $|S|\le 2$.

$S$ operator via $P_{\rm same}$ probabilities

I personally am not a fan of the above formalism with expectation values of introduced random variables. Fortunately, there is a way to introduce $S$ sticking to more "natural" probability distributions.

For a given pair of measurement choices $a,b$, consider the probability of Alice and Bob obtaining the same measurement outcome. This reads $$P_{\rm same}(a,b) \equiv p(00|ab) + p(11|ab).$$ Observe the connection with the previously introduced formalism with expectation values: $$\mathbb{E}_{ab}[XY] = p(00|ab) - p(01|ab) - p(10|ab) + p(11|ab) = 2P_{\rm same}(a,b) - 1.$$ This is nice because it shows that the expectation value is essentially just probing the probability of getting identical outcomes.

Local realism constraint

Now, as above, consider the case in which the probability can be described by a local hidden variable. For each fixed value of the hidden variable (which I leave implicit in the following), we can write the probability as $$P_{\rm same}(a,b) = p(0|a)p(0|b) + p(1|a)p(1|b).$$ Note that I'm somewhat abusing notation here: the two factors are different functions (that is, even if $a=b$, we could have $p(0|a)\neq p(0|b)$); more precisely, we should have written something like $p_A(0|a)p_B(0|b)$.

Furthermore, as shown above, we can assume the associated local conditional probabilities to be deterministic. A nice way to express this algebraically is to observe that there are precisely $4^2=16$ (local) deterministic assignments, corresponding to all the possible pairs of binary functions $x:\{0,1\}\to\{0,1\}$ and $y:\{0,1\}\to\{0,1\}$. For example, $x(0)=x(1)=1$ and $y(0)=y(1)=0$ corresponds to the case in which Alice always finds the outcome "$1$" and Bob always finds the outcome "$0$". There is a nice way to connect (local deterministic) probabilities and these functions: $$p_A(1|a) = x(a), \qquad p_A(0|a) = 1-x(a),$$ and same for $p_B$ and $y$. We can thus write $$P_{\rm same}(a,b) = 1 - x(a) - y(b) + 2x(a) y(b),$$ and thus $$\frac12 \sum_{a,b} (-1)^{ab}P_{\rm same}(a,b) = 1 - x(0) - y(0) + x(0)[y(0)+y(1)]+x(1)[y(0)-y(1)]. \tag2$$ One can now check that this quantity only ever takes the values $0$ or $1$, regardless of the choice of functions $x,y:\{0,1\}\to\{0,1\}$. A general local behaviour is a convex mixture of such deterministic ones, so for it the left-hand side of (2) lies in $[0,1]$. In other words, any joint probability distribution for which this quantity falls outside $[0,1]$ is necessarily nonlocal.
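The deterministic case is easy to verify by brute force: there are only $16$ pairs of binary functions, so one can enumerate them all (a sketch using the deterministic form of $P_{\rm same}$ derived above):

```python
from itertools import product

results = set()
# Each deterministic strategy is a pair of binary functions,
# encoded as tuples x = (x(0), x(1)) and y = (y(0), y(1)).
for x0, x1, y0, y1 in product((0, 1), repeat=4):
    x, y = (x0, x1), (y0, y1)
    # Deterministic P_same(a,b) = 1 - x(a) - y(b) + 2 x(a) y(b)
    half_sum = sum((-1) ** (a * b) * (1 - x[a] - y[b] + 2 * x[a] * y[b])
                   for a in (0, 1) for b in (0, 1)) / 2
    results.add(half_sum)

print(results)  # {0.0, 1.0}
```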

Examples of nonlocal behaviour

As an example of nonlocal behaviour, consider the following: $$\begin{array}{c|cccc} & 00&01 & 10 & 11\\\hline 00 &1/2&1/2&1/2&0 \\ 01 &0&0&0&1/2 \\ 10 &0&0&0&1/2 \\ 11 &1/2&1/2&1/2&0\end{array}$$ where each column corresponds to a measurement choice $ab$, and each row to the corresponding possible measurement outcomes $xy$. Notice that this gives $$P_{\rm same}(00) = P_{\rm same}(01) = P_{\rm same}(10) = 1$$ and $P_{\rm same}(11)=0$, which results in $$\frac12\sum_{a,b}(-1)^{ab}P_{\rm same}(a,b) = \frac32,$$ which, as discussed above, means the behaviour is nonlocal (in fact, as it turns out, this behaviour is also incompatible with quantum mechanics).
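As a direct numerical check (a sketch; the probability table is transcribed from above), one can evaluate the left-hand side of (2) for this behaviour:

```python
# p[(x, y)][(a, b)] = p(xy|ab), transcribed from the table above
p = {
    (0, 0): {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0},
    (0, 1): {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5},
    (1, 0): {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5},
    (1, 1): {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.0},
}

def P_same(a, b):
    # Probability of equal outcomes for measurement choices a, b
    return p[(0, 0)][(a, b)] + p[(1, 1)][(a, b)]

half_sum = sum((-1) ** (a * b) * P_same(a, b)
               for a in (0, 1) for b in (0, 1)) / 2
print(half_sum)  # 1.5, outside [0, 1], hence nonlocal
```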

Interpretation of constraint in terms of 4 coin flips

A nice way to understand where the constraint in the above quantity comes from is to realise that, once we reduced the problem to analysing local deterministic behaviours, we are essentially studying the probability of getting pairs of identical outcomes when tossing four independent (possibly unfair) coins.

Consider for example the previously introduced nonlocal behaviour, labelling the coins 1 and 2 for Alice's two settings and 3 and 4 for Bob's. $P_{\rm same}(00)=P_{\rm same}(01)=1$ says that coin 1 always matches both coin 3 and coin 4, so coins 3 and 4 must always match each other. $P_{\rm same}(10)=1$ says that coin 2 always matches coin 3, hence it must also match coin 4. But this is incompatible with $P_{\rm same}(11)=0$, which says that coins 2 and 4 never match; hence the conclusion.



glS
  • 14,271
  • Nice. I only find the notation $\sum_\lambda p_\lambda p_\lambda(x|a) p_\lambda(y|b)$ confusing. IMO, $\sum_\lambda p(\lambda) p_\lambda(x|a) p_\lambda(y|b)$ is better, though it is not ideal too. $\sum_\lambda p(\lambda)$ is probability that "the hidden parameter" takes value $\lambda$, while $p_\lambda(..)$ is probability of some event given "the hidden parameter" takes value $\lambda$. This is the difference. – kludg May 20 '23 at 15:11
  • tbh I don't really see a big difference between those, as the notation still doesn't explicit the fact that the different $p$ are really different functions. But yes, I agree that when writing these things there's a large degree of notational abuse. A more careful way would be to write something like $\sum_\lambda p_\Lambda(\lambda)p_{X|A\Lambda}(x|a,\lambda)p_{Y|B\Lambda}(y|b,\lambda)$. Which is also a common notational approach for probabilities in situations where different registers are important (eg Mark Wilde's book when discussing entropies and capacities) – glS May 20 '23 at 17:22
  • though I guess if you think of $p$ not as a function but more as a shorthand for $\operatorname{Prob}$, then something like $p(\lambda)$ might make sense, understood as shorthand for $p(\Lambda=\lambda)$, and similarly for the other terms – glS May 20 '23 at 17:27
  • $p$ is a common name for PMF (Probability Mass Function) of a random variable, and good notation should include the name of this variable, and its argument, like $p_\Lambda(\lambda)$ or $p_{X|\Lambda}(x|\lambda)$. This is how I was taught in my probability course. – kludg May 20 '23 at 17:50
  • I'd finally have written your first formula as $$p_{XY}(xy|ab) = \sum_\lambda p_\Lambda(\lambda) p_{X|\Lambda}(x|\lambda,a) p_{Y|\Lambda}(y|\lambda,b)$$ – kludg May 20 '23 at 18:29