What exactly does $S$ represent in the CHSH inequality
$$-2~~\leq ~S~\leq ~2?$$
Sorry I've been reading for a couple days and I can't figure out what exactly $S$ is and the math is a bit over my head.
Any help is much appreciated.
The $S$ in this inequality is defined as $$ S = E(a b) - E(a b') + E(a' b) + E(a' b') $$ where $E(M)$ is the "expectation value of $M$", which means the average value calculated from many repetitions of the same experiment (empirically) or from the probability distributions (theoretically).
All the expectation values are taken from the products of two quantities in the list $\{a,a',b,b'\}$. Each of these four quantities – which represent the spin measurement of the first particle with respect to two axes ($a,a'$) or the second particle with respect to two axes ($b,b'$) – is equal to either $+1$ or $-1$ so all the four products above are either $+1$ or $-1$, too. The positive value means that the two factors in the product yield the same sign; the negative value means that they have the opposite sign.
For example, $E(ab)$ may be interpreted as $1-2P_{a\neq b}$ where $P_{a \neq b}$ is the probability that $a,b$ will be measured with the opposite sign if you decide to measure the unprimed spin for the $a$ particle and unprimed spin for the $b$ particle, too. The other three $E(\dots)$ terms are similarly linked to the analogous probabilities $P_{a\neq b'}$, $P_{a'\neq b}$ and $P_{a'\neq b'}$. So $$ S = 2 - 2P_{a \neq b} + 2P_{a\neq b'} - 2P_{a'\neq b} - 2P_{a'\neq b'} $$ You saw that $S$ is composed of four terms and each of them is in between $-1$ and $+1$. So a priori, $S$ could be anything between $-4$ and $+4$. However, the assumptions of realism and locality are enough to prove that in local realist theories, $S$ is always between $-2$ and $+2$. Quantum mechanics allows $S$ to be as high as $2\sqrt{2}\approx 2.8$, which exceeds the interval allowed by local realist theories, and experiments confirm that this $2.8$ is realized in the appropriate spin experiments, thus falsifying local realist theories.
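To see the quantum value come out numerically, here is a minimal sketch. It assumes the textbook singlet-state prediction $E(uv)=-\cos(\theta_u-\theta_v)$ for spin measurements along coplanar axes at angles $\theta_u,\theta_v$; that formula, the function names and the particular angles are my assumptions for illustration, not something stated above.

```python
import math

def correlation(theta_u, theta_v):
    """Assumed singlet-state correlation E(uv) = -cos(theta_u - theta_v)."""
    return -math.cos(theta_u - theta_v)

def chsh(a, a_prime, b, b_prime):
    """S = E(ab) - E(ab') + E(a'b) + E(a'b') for four axis angles (radians)."""
    return (correlation(a, b) - correlation(a, b_prime)
            + correlation(a_prime, b) + correlation(a_prime, b_prime))

# Hypothetical choice of axes: a = 0, a' = 90 degrees, b = 225 degrees, b' = 315 degrees.
S = chsh(0.0, math.pi / 2, 5 * math.pi / 4, 7 * math.pi / 4)
print(S, 2 * math.sqrt(2))
```

With these angles each of the four terms contributes $\sqrt2/2$, so the printed $S$ agrees with $2\sqrt2$ up to rounding.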
An intuitive sketch of why $S$ can't be greater than two in local realist theories is this one: to maximize $S$, you want the three correlations entering with plus signs to be close to $+1$ and the correlation $E(ab')$, which enters with a minus sign, to be close to $-1$. However, it's not possible. If the first, third, and fourth terms are very close to one, it means that $a$ is highly correlated with $b$, $a'$ is also correlated with $b$, and $a'$ is correlated with $b'$. Transitivity is enough to see that the first two conditions imply that $a$ is highly correlated with $a'$ and in combination with the last correlation, $a'$ with $b'$, it means that $a$ is correlated with $b'$, too. So $E(ab')$ is close to $+1$, the second term $-E(ab')$ is close to $-1$, and it gets mostly subtracted. Therefore, we get a number close to $S=2$ and not $S=4$.
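The intuition can be checked by brute force in the simplest local realist situation, where all four quantities have definite values $\pm1$ in every run. The sketch below (function and variable names are mine) enumerates the 16 possible assignments and collects the resulting values of $ab - ab' + a'b + a'b'$.

```python
from itertools import product

def chsh_value(a, a_prime, b, b_prime):
    """Single-run value of ab - ab' + a'b + a'b' for predetermined outcomes."""
    return a * b - a * b_prime + a_prime * b + a_prime * b_prime

# All 16 assignments of +1/-1 to (a, a', b, b').
values = {chsh_value(*assignment) for assignment in product((+1, -1), repeat=4)}
print(sorted(values))
```

Only $\pm2$ ever appear, because the expression equals $a(b-b') + a'(b+b')$ and exactly one of the two brackets vanishes. Averaging over many runs therefore cannot push $S$ outside $[-2,2]$.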
Maths is needed to show that $2$ is the upper bound in local realist theories and $2.8$ is the prediction in quantum mechanics (confirmed experimentally). In quantum mechanics, it is impossible to deduce that $a$ is highly correlated with $a'$ out of the first two conditions. For a perpendicular pair of axes, they may be totally uncorrelated even though the previous two correlations may be very high.
Consider a setup where Alice and Bob perform a measurement, each on their share of the system, which can result in one of two outputs. Each one of them is allowed to choose between two different ways of measuring (independently of each other, so Alice might choose two measurement bases and Bob two different ones).
The general way to describe correlations in such a setup is via the joint probability distribution $p(xy|ab)$, where $x,y\in\{0,1\}$ represent the possible measurement outcomes, and $a,b\in\{0,1\}$ the possible measurement choices, for Alice and Bob respectively. Notice that we might have equivalently chosen $x,y,a,b\in\{+1,-1\}$ instead; this is purely conventional.
The gist of Bell's nonlocality is that the assumption of local realism in itself constrains the form of $p(xy|ab)$. More precisely, "local realism" means that we can write $$p(xy|ab) = \sum_\lambda p_\lambda p_\lambda(x|a) p_\lambda(y|b),\tag1$$ for some "local hidden variable" $\lambda$ and some probability distributions $p_\lambda,p_\lambda(x|a),p_\lambda(y|b)$. To be clear, nonlocality means that there is no way to write $p(xy|ab)$ in such a form. It is worth noting that there is some abuse of notation in the above equation, with the symbol $p$ meaning different things in different parts. A more precise way to write it would be $$p_{XY|AB}(xy|ab)=\sum_{\lambda\in\Lambda}p_\Lambda(\lambda) p_{X|A\Lambda}(x|a\lambda)p_{Y|B\Lambda}(y|b\lambda),$$ to distinguish between the different functions. Here $$p_{XY|AB}(xy|ab)\equiv \operatorname{Prob}(X=x,Y=y|A=a,B=b),$$ and similarly for the other quantities. I will keep using the shorthand notation in the following, as the correct way to interpret the notation is generally clear from the context.
The standard way to show that CHSH's $S$ does the trick is to write it as $$S \equiv \mathbb{E}_{00}[XY] + \mathbb{E}_{01}[XY] + \mathbb{E}_{10}[XY] - \mathbb{E}_{11}[XY],$$ where $\mathbb{E}_{ab}[XY]$ stands for the expectation value of the random variable $XY$, computed over the probability distribution corresponding to the measurement choices $a,b\in\{0,1\}$. The random variables $X,Y$ are functions of the possible outcomes $x,y$, except that they mark the outcomes as $\pm1$ rather than as $0,1$. So, more explicitly, these are $$\mathbb{E}_{ab}[XY]= \sum_{x,y\in\{0,1\}} (-1)^{x+y} p(xy|ab).$$ We can then also write $S$ in similarly compact notation as $$S = \sum_{a,b} (-1)^{ab} \mathbb{E}_{ab}[XY] = \sum_{a,b,x,y} (-1)^{x+y+ab} p(xy|ab).$$
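As a concrete reading of these formulas, here is a small sketch that computes $\mathbb{E}_{ab}[XY]$ and $S$ from a table of joint probabilities. The dictionary layout keyed by `(x, y, a, b)` and the two example behaviours are my own illustrative choices.

```python
from itertools import product

BITS = (0, 1)

def expectation(p, a, b):
    """E_ab[XY] = sum over x, y of (-1)^(x+y) * p(xy|ab)."""
    return sum((-1) ** (x + y) * p[(x, y, a, b)] for x, y in product(BITS, repeat=2))

def chsh(p):
    """S = sum over a, b of (-1)^(ab) * E_ab[XY]."""
    return sum((-1) ** (a * b) * expectation(p, a, b) for a, b in product(BITS, repeat=2))

# Hypothetical behaviour 1: both parties always output 0, whatever the settings.
p_constant = {(x, y, a, b): 1.0 if (x, y) == (0, 0) else 0.0
              for x, y, a, b in product(BITS, repeat=4)}

# Hypothetical behaviour 2: outcomes are uniformly random and uncorrelated.
p_uniform = {key: 0.25 for key in product(BITS, repeat=4)}

S_constant = chsh(p_constant)
S_uniform = chsh(p_uniform)
print(S_constant, S_uniform)
```

The constant behaviour has every $\mathbb{E}_{ab}[XY]=1$, so $S = 1+1+1-1 = 2$, while the uniform one has all expectation values equal to zero.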
Now, suppose $p(xy|ab)$ can be written as (1). Then $$\mathbb{E}_{ab}[XY] = \sum_\lambda p_\lambda \mathbb{E}_{a,\lambda}[X] \mathbb{E}_{b,\lambda}[Y],$$ where $\mathbb{E}_{a,\lambda}$ denotes the expectation value computed with respect to the marginal probability distribution $x\mapsto p_\lambda(x|a)$, and similarly for $\mathbb{E}_{b,\lambda}$. Thus $$S = \sum_\lambda p_\lambda \underbrace{\sum_{a,b}(-1)^{ab} \mathbb{E}_{a,\lambda}[X]\mathbb{E}_{b,\lambda}[Y]}_{\equiv S_\lambda}.$$ One can show that $-2\le S_\lambda\le 2$ for all $\lambda$, and thus $-2\le S\le 2$. To this end, an observation which significantly simplifies calculations is to notice that if $p(xy|ab)$ is as in (1), then up to a redefinition of the hidden variable $\lambda$, we can assume the local distributions to be deterministic. This means we can assume without loss of generality that $p_\lambda(x|a)=\delta_{x,x_a}$ for some binary function $a\mapsto x_a$. We are saying that, for any $\lambda$ and $a$, either $p_\lambda(0|a)=1$ or $p_\lambda(1|a)=1$.
The reason this makes things a lot easier is that, if the local distributions are deterministic, then the expectation values are just $\mathbb{E}_{a}[X],\mathbb{E}_{b}[Y]\in\{+1,-1\}$. But then, $$S_\lambda = \mathbb{E}_0[X] ( \mathbb{E}_0[Y] + \mathbb{E}_1[Y]) + \mathbb{E}_1[X] ( \mathbb{E}_0[Y] - \mathbb{E}_1[Y]) \in \{\pm2\mathbb{E}_0[X], \pm2 \mathbb{E}_1[X]\} \subseteq\{+2,-2\}.$$ In words, we just showed that $S_\lambda\in\{-2,2\}$, thus $|S_\lambda|=2$, and thus $|S|\le 2$.
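For a numerical sanity check of the bound, the sketch below builds behaviours of the form (1) with randomly drawn, generally non-deterministic, local distributions and confirms that $|S|$ never exceeds $2$. The number of hidden-variable values, the sample count and all names are arbitrary choices of mine.

```python
import random
from itertools import product

BITS = (0, 1)

def random_lhv_behaviour(rng, n_lambda=5):
    """Draw p(xy|ab) = sum_l w_l * pA_l(x|a) * pB_l(y|b) with random ingredients."""
    weights = [rng.random() for _ in range(n_lambda)]
    total = sum(weights)
    weights = [w / total for w in weights]
    # prob_x1[l][a] is p_l(x=1|a); prob_y1[l][b] is p_l(y=1|b).
    prob_x1 = [[rng.random() for _ in BITS] for _ in range(n_lambda)]
    prob_y1 = [[rng.random() for _ in BITS] for _ in range(n_lambda)]
    p = {}
    for x, y, a, b in product(BITS, repeat=4):
        p[(x, y, a, b)] = sum(
            w
            * (prob_x1[l][a] if x == 1 else 1 - prob_x1[l][a])
            * (prob_y1[l][b] if y == 1 else 1 - prob_y1[l][b])
            for l, w in enumerate(weights)
        )
    return p

def chsh(p):
    """S = sum over a, b, x, y of (-1)^(x+y+ab) * p(xy|ab)."""
    return sum((-1) ** (x + y + a * b) * p[(x, y, a, b)]
               for x, y, a, b in product(BITS, repeat=4))

rng = random.Random(0)  # fixed seed so the run is reproducible
largest = max(abs(chsh(random_lhv_behaviour(rng))) for _ in range(1000))
print(largest <= 2)
```

Random sampling of course proves nothing; it only illustrates that the bound derived above is respected by every local model one happens to try.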
I personally am not a fan of the above formalism with expectation values of introduced random variables. Fortunately, there is a way to introduce $S$ sticking to more "natural" probability distributions.
For a given pair of measurement choices $a,b$, consider the probability of Alice and Bob obtaining the same measurement outcome. This reads $$P_{\rm same}(a,b) \equiv p(00|ab) + p(11|ab).$$ Observe the connection with the previously introduced formalism with expectation values: $$\mathbb{E}_{ab}[XY] = p(00|ab) - p(01|ab) - p(10|ab) + p(11|ab) = 2P_{\rm same}(a,b) - 1.$$ This is nice because it shows that the expectation value is essentially just probing the probability of getting identical outcomes.
Now, as above, consider the case in which the probability can be described by a local hidden variable. Note that in this case we can write the probability as $$P_{\rm same}(a,b) = p(0|a)p(0|b) + p(1|a)p(1|b).$$ Note that I'm somewhat abusing notation here: the marginal conditional probabilities need not be equal to each other (that is, even if $a=b$, we could have $p(0|a)\neq p(0|b)$; more precisely, we should have written something like $p_A(0|a)p_B(0|b)$ etc).
Furthermore, as shown above, we can assume the associated local conditional probabilities to be deterministic. A nice way to express this algebraically is to observe that there are precisely $4^2$ (local) deterministic assignments, corresponding to all the possible binary functions $x=x(a),y=y(b):\{0,1\}\to\{0,1\}$. For example, $x(a)=x(a')=1- y(b)=1- y(b')=1$ corresponds to the case in which Alice always finds the outcome "$1$" and Bob always finds the outcome "$0$". There is a nice way to connect (local deterministic) probabilities and these functions: $$p_A(1|a) = x(a), \qquad p_A(0|a) = 1-x(a),$$ and same for $p_B$ and $y$. We can thus write $$P_{\rm same}(a,b) = 1 - x(a) - y(b) + 2x(a) y(b),$$ and thus $$\frac12 \sum_{a,b} (-1)^{ab}P_{\rm same}(a,b) = 1 - x(a) - y(b) + x(a)[y(b)+y(b')]+x(a')[y(b)-y(b')], \tag2$$ where on the right-hand side the unprimed $a,b$ denote the measurement choice $0$ and the primed $a',b'$ denote the measurement choice $1$. One can now observe that this quantity can only take the values $0$ or $1$, regardless of the choice of functions $x,y:\{0,1\}\to\{0,1\}$. Since a general local behaviour is a mixture of deterministic ones, for it the left-hand side of (2) lies in the interval $[0,1]$. In other words, any joint probability distribution for which the left-hand side of (2) falls outside $[0,1]$ is necessarily nonlocal.
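The claim about (2) is easy to verify exhaustively. In the sketch below (names are mine), the functions $x$ and $y$ are represented as pairs `(value at setting 0, value at setting 1)`, and both sides of (2) are evaluated for all 16 deterministic assignments.

```python
from itertools import product

BITS = (0, 1)

def p_same(x, y, a, b):
    """P_same(a,b) = 1 - x(a) - y(b) + 2 x(a) y(b) for deterministic x, y."""
    return 1 - x[a] - y[b] + 2 * x[a] * y[b]

def half_signed_sum(x, y):
    """Left-hand side of (2): (1/2) * sum over a, b of (-1)^(ab) * P_same(a,b)."""
    return sum((-1) ** (a * b) * p_same(x, y, a, b)
               for a, b in product(BITS, repeat=2)) / 2

def closed_form(x, y):
    """Right-hand side of (2), with index 0 for unprimed and 1 for primed settings."""
    return 1 - x[0] - y[0] + x[0] * (y[0] + y[1]) + x[1] * (y[0] - y[1])

values = set()
for x in product(BITS, repeat=2):      # the 4 functions x: {0,1} -> {0,1}
    for y in product(BITS, repeat=2):  # the 4 functions y: {0,1} -> {0,1}
        assert half_signed_sum(x, y) == closed_form(x, y)
        values.add(closed_form(x, y))
print(sorted(values))
```

Both sides agree for every assignment, and the only values that occur are $0$ and $1$.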
As an example of nonlocal behaviour, consider the following: $$\begin{array}{c|cccc} & 00&01 & 10 & 11\\\hline 00 &1/2&1/2&1/2&0 \\ 01 &0&0&0&1/2 \\ 10 &0&0&0&1/2 \\ 11 &1/2&1/2&1/2&0\end{array}$$ where each column corresponds to a measurement choice, and each row to the corresponding possible measurement outcomes. Notice that this gives $$P_{\rm same}(00) = P_{\rm same}(01) = P_{\rm same}(10) = 1$$ and $P_{\rm same}(11)=0$, which results in $$\frac12\sum_{a,b}(-1)^{ab}P_{\rm same}(a,b) = \frac32,$$ which as discussed above means the behaviour is nonlocal (in fact, as it turns out, this behaviour is also incompatible with quantum mechanics).
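The numbers for this table can be reproduced with a few lines of exact arithmetic. The sketch below encodes the table through the rule "probability $1/2$ whenever $x\oplus y = ab$, zero otherwise", which is my compact restatement of its entries, and evaluates $P_{\rm same}$, the signed half-sum, and the corresponding $S = \sum_{a,b}(-1)^{ab}\,[2P_{\rm same}(a,b)-1]$.

```python
from fractions import Fraction
from itertools import product

BITS = (0, 1)

# p[(x, y, a, b)] = p(xy|ab); entries match the table column by column.
p = {(x, y, a, b): Fraction(1, 2) if (x ^ y) == (a & b) else Fraction(0)
     for x, y, a, b in product(BITS, repeat=4)}

def p_same(a, b):
    """P_same(a,b) = p(00|ab) + p(11|ab)."""
    return p[(0, 0, a, b)] + p[(1, 1, a, b)]

half_sum = sum((-1) ** (a * b) * p_same(a, b)
               for a, b in product(BITS, repeat=2)) / 2
S = sum((-1) ** (a * b) * (2 * p_same(a, b) - 1)
        for a, b in product(BITS, repeat=2))
print(half_sum, S)
```

The half-sum comes out as $3/2$, which exceeds the largest value reachable by local deterministic assignments, and the corresponding $S$ is $4$, the algebraic maximum.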
A nice way to understand where the constraint in the above quantity comes from is to realise that, once we reduced the problem to analysing local deterministic behaviours, we are essentially studying the probability of getting pairs of identical outcomes when tossing four independent (possibly unfair) coins.
Consider for example the previously introduced nonlocal behaviour. We can understand why it is incompatible with local realism by thinking of four coins: if we find the first two to be in the same state (say, they are both heads), and the other two in opposite states (one is heads and the other is tails), then necessarily either the 1st and 3rd, or the 2nd and 4th, must also be in different states. But this is incompatible with $P_{\rm same}(01)=P_{\rm same}(10)=1$, hence the conclusion.
Some related posts include: