How exactly does the proof of Bell's theorem fail if you remove the locality assumption?

Question

In this paper Bell derives his famous inequality using the assumptions of locality and realism. To understand how the locality assumption affects the derivation of the inequality, and why it is needed, I have attempted to re-derive the inequality twice: first assuming locality, then assuming nonlocality, to see what the difference is. However, my derivations seem to show no difference, which would imply that nonlocality cannot be concluded from a Bell test. That must be wrong (or some other, smarter, mathematician would have pointed it out by now). Where am I making my mistake(s)? Note: I know there are other similar questions regarding nonlocality in the CHSH inequality. I have read them, and I don't see how they apply to this (the original) form of Bell's inequality: they use a different mathematical formalism and expressions that do not appear in Bell's original derivation.

The system is a pair of entangled particles. Let $A = \pm 1$ be the result of Alice's measurement of one particle's spin, and let $B = \pm 1$ be the result of Bob's measurement of the other's. Let $\mathbf{\alpha}$ and $\mathbf{\beta}$ be unit vectors representing Alice and Bob's measurement directions respectively. Let $\lambda$ represent a set of any number of hidden variables and $\rho = \rho(\lambda)$ the normalized probability distribution of $\lambda$.

As far as I can tell, the locality assumption amounts to assuming that $A = A(\mathbf{\alpha}, \lambda) \neq A(\mathbf{\alpha}, \mathbf{\beta}, \lambda)$, or that $A$ is independent of $\mathbf{\beta}$, and likewise for $B$ and $\mathbf{\alpha}$ (this may be my mistake if there is more to it than this).

Local derivation: $A(\mathbf{\alpha}, \lambda) = \pm 1$, $B(\mathbf{\beta}, \lambda) = \pm 1$. The expectation value of $AB$ is

\begin{equation} P(\mathbf{\alpha}, \mathbf{\beta}) = \int \rho A(\mathbf{\alpha}, \lambda) B(\mathbf{\beta}, \lambda)\, d \lambda. \end{equation}

For a given measurement direction $\mathbf{a}$,

\begin{equation} P(\mathbf{a}, \mathbf{a}) = \int \rho A(\mathbf{a}, \lambda) B(\mathbf{a}, \lambda)\, d \lambda = -1 \implies A(\mathbf{a}, \lambda) = -B(\mathbf{a}, \lambda). \end{equation}

$P(\mathbf{a}, \mathbf{a}) = -1$ implies that the particles are anticorrelated, and so by rewriting the expectation value of $A B$ as

\begin{equation} P(\mathbf{\alpha}, \mathbf{\beta}) = -\int \rho A(\mathbf{\alpha}, \lambda) A(\mathbf{\beta}, \lambda)\, d \lambda \tag{1} \end{equation}

(in other words, by assuming $B(\mathbf{\beta}, \lambda) = -A(\mathbf{\beta}, \lambda)$ holds for every direction $\mathbf{\beta}$) we mathematically represent the assumption that the state of our two-particle system is restricted to a maximally anticorrelated state (the singlet state $| \Psi^- \rangle$). Using this last expression, we get (for some unit vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$)

\begin{align} P(\mathbf{a}, \mathbf{b}) - P(\mathbf{a}, \mathbf{c}) =& -\int \rho \Big( A(\mathbf{a}, \lambda)A(\mathbf{b}, \lambda) - A(\mathbf{a}, \lambda) A(\mathbf{c}, \lambda) \Big) d\lambda \\ =& -\int \rho A(\mathbf{a}, \lambda)A(\mathbf{b}, \lambda) \Big( 1 - \frac{A(\mathbf{a}, \lambda) A(\mathbf{c}, \lambda)}{A(\mathbf{a}, \lambda)A(\mathbf{b}, \lambda)} \Big) d\lambda \\ =& \int \rho A(\mathbf{a}, \lambda)A(\mathbf{b}, \lambda) \Big( A(\mathbf{b}, \lambda) A(\mathbf{c}, \lambda) - 1 \Big) d\lambda, \end{align}

\begin{equation} |P(\mathbf{a}, \mathbf{b}) - P(\mathbf{a}, \mathbf{c})| \leq \int \rho \Big( 1 - A(\mathbf{b}, \lambda)A(\mathbf{c}, \lambda) \Big) d\lambda = 1 + P(\mathbf{b}, \mathbf{c}), \end{equation}

where the last step uses expression $(1)$ in the form $\int \rho A(\mathbf{b}, \lambda)A(\mathbf{c}, \lambda)\, d\lambda = -P(\mathbf{b}, \mathbf{c})$, so that

\begin{equation} |P(\mathbf{a}, \mathbf{b}) - P(\mathbf{a}, \mathbf{c})| - P(\mathbf{b}, \mathbf{c}) \leq 1. \end{equation}
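
As a quick numerical sanity check of what this local bound excludes, here is a minimal sketch. It evaluates the bound in its integral form, $|P(\mathbf{a}, \mathbf{b}) - P(\mathbf{a}, \mathbf{c})| \leq 1 - \int \rho A(\mathbf{b}, \lambda) A(\mathbf{c}, \lambda)\, d\lambda$, where by expression $(1)$ the integral equals $-P(\mathbf{b}, \mathbf{c})$, against the standard quantum singlet correlation $P(\mathbf{a}, \mathbf{b}) = -\mathbf{a} \cdot \mathbf{b}$. The coplanar angles (0, 60, 120 degrees) and the helper names `unit` and `P_qm` are illustrative choices, not part of the derivation above.

```python
import math

def unit(theta_deg):
    """Unit vector in the plane at the given angle (degrees)."""
    t = math.radians(theta_deg)
    return (math.cos(t), math.sin(t))

def P_qm(u, v):
    """Quantum singlet correlation for measurement directions u and v."""
    return -(u[0] * v[0] + u[1] * v[1])

# Illustrative choice of three coplanar measurement directions.
a, b, c = unit(0), unit(60), unit(120)

lhs = abs(P_qm(a, b) - P_qm(a, c))   # |P(a,b) - P(a,c)|
rhs = 1 + P_qm(b, c)                 # 1 - integral, with integral = -P(b,c)

print("lhs =", lhs, "rhs =", rhs, "bound satisfied:", lhs <= rhs)
```

For these settings the left-hand side is larger than the right-hand side, so the singlet correlations cannot be written in the local form $(1)$.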

Nonlocal derivation: $A(\mathbf{\alpha}, \mathbf{\beta}, \lambda) = \pm 1$, $B(\mathbf{\beta}, \mathbf{\alpha}, \lambda) = \pm 1$. The expectation value of $AB$ is

\begin{equation} P(\mathbf{\alpha}, \mathbf{\beta}) = \int \rho A(\mathbf{\alpha}, \mathbf{\beta}, \lambda) B(\mathbf{\beta}, \mathbf{\alpha}, \lambda)\, d\lambda. \end{equation}

\begin{equation} P(\mathbf{a}, \mathbf{a}) = \int \rho A(\mathbf{a}, \mathbf{a}, \lambda) B(\mathbf{a}, \mathbf{a}, \lambda)\, d\lambda = -1 \implies A(\mathbf{a}, \mathbf{a}, \lambda) = -B(\mathbf{a}, \mathbf{a}, \lambda), \end{equation}

\begin{equation} P(\mathbf{\alpha}, \mathbf{\beta}) = -\int \rho A(\mathbf{\alpha}, \mathbf{\beta}, \lambda) A(\mathbf{\beta}, \mathbf{\alpha}, \lambda)\, d\lambda, \tag{2} \end{equation}

\begin{align} P(\mathbf{a}, \mathbf{b}) - P(\mathbf{a}, \mathbf{c}) =& -\int \rho \Big( A(\mathbf{a}, \mathbf{b}, \lambda) A(\mathbf{b}, \mathbf{a}, \lambda) - A(\mathbf{a}, \mathbf{c}, \lambda) A(\mathbf{c}, \mathbf{a}, \lambda) \Big) d\lambda \\ =& -\int \rho A(\mathbf{a}, \mathbf{b}, \lambda) A(\mathbf{b}, \mathbf{a}, \lambda) \Big( 1 - \frac{A(\mathbf{a}, \mathbf{c}, \lambda) A(\mathbf{c}, \mathbf{a}, \lambda)}{A(\mathbf{a}, \mathbf{b}, \lambda) A(\mathbf{b}, \mathbf{a}, \lambda)} \Big) d\lambda, \end{align}

\begin{equation} |P(\mathbf{a}, \mathbf{b}) - P(\mathbf{a}, \mathbf{c})| \leq 1 - \int \rho \big( A(\mathbf{a}, \mathbf{c}, \lambda) A(\mathbf{c}, \mathbf{a}, \lambda) A(\mathbf{a}, \mathbf{b}, \lambda) A(\mathbf{b}, \mathbf{a}, \lambda) \big) d\lambda, \end{equation}

\begin{equation} |P(\mathbf{a}, \mathbf{b}) - P(\mathbf{a}, \mathbf{c})| + \int \rho \big( A(\mathbf{a}, \mathbf{c}, \lambda) A(\mathbf{c}, \mathbf{a}, \lambda) A(\mathbf{a}, \mathbf{b}, \lambda) A(\mathbf{b}, \mathbf{a}, \lambda) \big) d\lambda \leq 1. \end{equation}

Question: My result is of the same form as Bell's, but I cannot simplify the third term on the left to $P(\mathbf{b}, \mathbf{c})$, so the third term retains its nonlocal dependence on $A$'s second argument. Despite this, both $\int \rho \big( A(\mathbf{a}, \mathbf{c}, \lambda) A(\mathbf{c}, \mathbf{a}, \lambda) A(\mathbf{a}, \mathbf{b}, \lambda) A(\mathbf{b}, \mathbf{a}, \lambda) \big) d\lambda$ and $P(\mathbf{b}, \mathbf{c})$ are restricted to the range $-1 \leq x \leq 1$, so both inequalities should lead to the same experimental conclusions regarding local realism. So what difference does the locality assumption make? What assumption am I misrepresenting? Or what other mistake am I making?

score 6 · Accepted Answer · answered Sep 25 '18 at 23:52

In my derivation, I make my error at equation $(2)$, attempting to extend the logic employed by Bell in arriving at equation $(1)$.

Bell's local derivation uses the assumption that the system being observed is in an anticorrelated state to obtain the equality

\begin{equation} A(\mathbf{a}, \lambda) = -B(\mathbf{a}, \lambda), \end{equation}

in which $\mathbf{a}$ represents a specific choice of measurement angle. However, there is no dependence on another angle $\mathbf{b}$ in the above, and so it is just as general as writing the equality

\begin{equation} A(\mathbf{\beta}, \lambda) = -B(\mathbf{\beta}, \lambda). \end{equation}

This allows us to obtain expression $(1)$:

\begin{equation} P(\mathbf{\alpha}, \mathbf{\beta}) = -\int \rho A(\mathbf{\alpha}, \lambda) A(\mathbf{\beta}, \lambda)\, d\lambda. \end{equation}

In the nonlocal derivation, however, $A = A(\mathbf{\alpha}, \mathbf{\beta}, \lambda)$ and $B = B(\mathbf{\beta}, \mathbf{\alpha}, \lambda)$ have nonlocal dependence on two angles, not just one. The assumption of the singlet state gives us

\begin{equation} A(\mathbf{a}, \mathbf{a}, \lambda) = -B(\mathbf{a}, \mathbf{a}, \lambda). \end{equation}

In the above, $A$ and $B$ are equal in magnitude and opposite in sign only when Alice and Bob choose the same measurement angle, that is, when $\mathbf{\alpha} = \mathbf{\beta}$, and so the above can be written

\begin{equation} A(\mathbf{\beta}, \mathbf{\beta}, \lambda) = -B(\mathbf{\beta}, \mathbf{\beta}, \lambda) \neq -B(\mathbf{\beta}, \mathbf{\alpha}, \lambda) \quad \text{in general}. \end{equation}

It is important to note that, because $A$ and $B$ depend on two angles, the relationship above is only guaranteed when the two angles are the same. In the expression $P(\mathbf{\alpha}, \mathbf{\beta}) = \int \rho A(\mathbf{\alpha}, \mathbf{\beta}, \lambda) B(\mathbf{\beta}, \mathbf{\alpha}, \lambda)\, d\lambda$, the factor $B(\mathbf{\beta}, \mathbf{\alpha}, \lambda)$ has two different angles as arguments, so $-A(\mathbf{\beta}, \mathbf{\beta}, \lambda)$ cannot be substituted for it, and the step that produced expression $(2)$ is not justified:

\begin{equation} P(\mathbf{\alpha}, \mathbf{\beta}) = \int \rho A(\mathbf{\alpha}, \mathbf{\beta}, \lambda) B(\mathbf{\beta}, \mathbf{\alpha}, \lambda)\, d\lambda \neq -\int \rho A(\mathbf{\alpha}, \mathbf{\beta}, \lambda) A(\mathbf{\beta}, \mathbf{\beta}, \lambda)\, d\lambda. \end{equation}

This inability to rewrite $P(\mathbf{\alpha}, \mathbf{\beta})$ for the singlet state halts the nonlocal derivation if attempting to apply the same steps as Bell in his local derivation.
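
A toy model makes the failed substitution concrete. The sketch below is purely illustrative and is not taken from Bell's paper: the hidden variable is assumed to be a pair $\lambda = (s, u)$ with $s = \pm 1$ equally likely and $u$ uniform on $[0, 1)$, Alice's outcome is $s$, and Bob's outcome is allowed to depend on Alice's setting. By construction it satisfies $A(\mathbf{a}, \mathbf{a}, \lambda) = -B(\mathbf{a}, \mathbf{a}, \lambda)$ and gives $P(\mathbf{\alpha}, \mathbf{\beta}) = -\mathbf{\alpha} \cdot \mathbf{\beta}$, yet $B(\mathbf{\beta}, \mathbf{\alpha}, \lambda) \neq -A(\mathbf{\beta}, \mathbf{\alpha}, \lambda)$ for some $\lambda$.

```python
import math

def unit(theta_deg):
    """Unit vector in the plane at the given angle (degrees)."""
    t = math.radians(theta_deg)
    return (math.cos(t), math.sin(t))

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def A(alpha, beta, lam):
    """Alice's outcome; in this toy model it happens to ignore beta."""
    s, _u = lam
    return s

def B(beta, alpha, lam):
    """Bob's outcome; depends on BOTH settings (the nonlocal ingredient)."""
    s, u = lam
    return -s if u < (1 + dot(alpha, beta)) / 2 else s

def P(alpha, beta, n=20000):
    """Expectation of A*B, averaging over s = +/-1 and a midpoint grid in u."""
    total = 0
    for s in (1, -1):
        for k in range(n):
            lam = (s, (k + 0.5) / n)
            total += A(alpha, beta, lam) * B(beta, alpha, lam)
    return total / (2 * n)

a, b, c = unit(0), unit(60), unit(120)

print("P(a,a) =", P(a, a))                        # perfect anticorrelation
print("P(a,b) =", P(a, b), "vs -a.b =", -dot(a, b))
lam = (1, 0.9)                                    # one particular hidden state
print("B(b,a,lam) =", B(b, a, lam), " -A(b,a,lam) =", -A(b, a, lam))
```

For the hidden state `lam = (1, 0.9)` the two printed outcomes differ, which is exactly the substitution that expression $(2)$ would require to hold.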

Side note: This does not prove that another approach could not render a Bell's inequality with the assumption of nonlocality, but proving that was not my purpose.

Connor Dolan · Answer 2 · 2018-09-27T14:17:34.083

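"Despite this, both $\int \rho \big( A(\mathbf{a}, \mathbf{c}, \lambda) A(\mathbf{c}, \mathbf{a}, \lambda) A(\mathbf{a}, \mathbf{b}, \lambda) A(\mathbf{b}, \mathbf{a}, \lambda) \big) d\lambda$ and $P(\mathbf{b}, \mathbf{c})$ are restricted to the range $-1 \leq x \leq 1$, so both inequalities should lead to the same experimental conclusions regarding local realism."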

This does not follow. The fact that the correlation must obey $-1 \leq x \leq 1$ is less restrictive than Bell's inequality. As far as I can tell, you didn't make any mistake: the first derivation gives the inequality that the correlations of a local theory must obey, and the second has a term you can't reduce to $P(b, c)$, which allows it to violate the inequality.
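
To spell out why the range argument is not enough, here is a short sketch in the question's notation, where $T$ is a shorthand introduced only for this remark:

\begin{equation} T = \int \rho \big( A(\mathbf{a}, \mathbf{c}, \lambda) A(\mathbf{c}, \mathbf{a}, \lambda) A(\mathbf{a}, \mathbf{b}, \lambda) A(\mathbf{b}, \mathbf{a}, \lambda) \big) d\lambda, \qquad -1 \leq T \leq 1. \end{equation}

Knowing only the range of $T$, the nonlocal inequality reduces to

\begin{equation} |P(\mathbf{a}, \mathbf{b}) - P(\mathbf{a}, \mathbf{c})| \leq 1 - T \leq 2, \end{equation}

which any two correlations satisfy anyway, since each $P$ lies in $[-1, 1]$. The local inequality is stronger because its third term is fixed by the measurable correlation $P(\mathbf{b}, \mathbf{c})$, so it ties three measurable numbers together. By contrast, $T$ multiplies outcomes for two different setting pairs at the same $\lambda$, so it does not correspond to any single measured correlation.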

In any case, I found Bell's original derivation hard to follow until I understood the inequality another way. He also misuses conditional probabilities as noted by E.T. Jaynes (though I think the error is ultimately not fatal).

I offer the following derivation if you wish to use it for understanding Bell more clearly in hindsight. Consider three ordered lists containing elements $-1$ and $1$.

$$a=\{1, 1, -1, \ldots \}$$ $$b=\{1, -1, 1, \ldots \}$$ $$c=\{-1, 1, 1, \ldots \}$$

Denoting elements $a_i$, $b_i$ and $c_i$, we have:

$$a_ib_i-a_ic_i=a_ib_i-a_ic_i$$

Since $b_i^2=1$:

$$\implies a_ib_i-a_ic_i=a_ib_i(1-b_ic_i)$$

$$\implies |a_ib_i-a_ic_i|=|1-b_ic_i|$$

Since the RHS is never negative, we may drop the absolute value:

$$\implies |a_ib_i-a_ic_i|=1-b_ic_i$$

Now by summing over the terms and denoting:

$$\langle ab\rangle =\frac{1}{N}\sum_{i=1}^N a_ib_i$$

And using the fact that:

$$\sum_{i=1}^N|A_i|\geq\left|\sum_{i=1}^NA_i\right|$$

We obtain:

$$|\langle ab\rangle -\langle ac\rangle|\leq 1 - \langle bc \rangle$$

This holds identically: for any three lists $a$, $b$, and $c$ of $\pm 1$ entries, the inequality always holds.
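
If you want to convince yourself numerically, here is a minimal sketch that checks the inequality on randomly generated lists. The list length, the number of trials, the seed, and the helper names `pair_sum` and `bell_gap` are arbitrary illustrative choices. Sums are kept as integers so that rounding cannot produce a spurious violation.

```python
import random

def pair_sum(x, y):
    """Integer sum of elementwise products, i.e. N * <xy>."""
    return sum(xi * yi for xi, yi in zip(x, y))

def bell_gap(a, b, c):
    """(1 - <bc>) - |<ab> - <ac>|; non-negative when the inequality holds."""
    n = len(a)
    gap_times_n = (n - pair_sum(b, c)) - abs(pair_sum(a, b) - pair_sum(a, c))
    return gap_times_n / n

rng = random.Random(0)  # fixed seed so the run is repeatable

def random_list(n):
    return [rng.choice((-1, 1)) for _ in range(n)]

# Smallest gap seen over many random triples of lists.
worst = min(
    bell_gap(random_list(50), random_list(50), random_list(50))
    for _ in range(1000)
)
print("inequality held in every trial:", worst >= 0)
```

The check never fails because the inequality already holds element by element, before any averaging.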

However, you might find yourself in the peculiar situation that you can only sample two of these lists at a time for any element $i$. Now there is a chance it will be violated, but if we assume that those three lists of numbers exist in principle (a.k.a. hypothetical measurements), then violations can only happen up to statistical fluctuations of the order $\sim 1/\sqrt{N}$.

QM violates this, which apparently means that those three lists don't exist, even in principle.

Of course, these lists also might not exist for variables that somehow communicate with each other, or if the system knew ahead of time which quantities we were going to measure and conspired against us. In both of these cases, if we measure $a$ and $b$ we can't talk about what $c$ would have been, because what we measure plays an active role in the outcome.

+1 for the alternate view of the derivation. I have an easier time seeing realism in your derivation than locality, complementary to Bell's derivation in which I see locality more readily but cannot locate realism as of yet. That may be another question I post later... — The Ledge, Sep 26 '18 at 15:03
You said that you don't think I made any mistakes in my derivation. I don't know if you saw my answer where I identified a mistake, but if you think my answer is incorrect would you mind commenting and pointing out how my supposed "error" isn't an error? — The Ledge, Sep 26 '18 at 15:06
The derivation above necessitates either non-locality or non-realism or super-determinism, but it doesn't make an assumption regarding them or say which one is favored (you can even pick multiple if you really want physicists to suffer). The fact that the lists don't exist can be interpreted through some esoteric notion of "superposition" and not being "real" until you created it with a measurement, but you'd obtain the same result with the more pedestrian picture that the variables are somehow in non-local communication, or are masterminds that predict what you are going to measure. — Connor Dolan, Sep 26 '18 at 15:52
As for your mistake, I think you addressed it sufficiently, though I haven't had time to look into it carefully. That $A(a, a, \lambda)=-B(a, a, \lambda)$ does not imply $A(a, b, \lambda)=-B(b, a, \lambda)$ seems right. If you want to work it out carefully, so that you can follow the logic in gory detail, use Bayes' rule and define correlations by $\langle AB \rangle = P(AB)+P(\bar{A}\bar{B})-P(A\bar{B})-P(\bar{A}B)$. — Connor Dolan, Sep 26 '18 at 16:14
is there a typo in the last equation? shouldn't it be $\lvert\langle ab\rangle-\langle ac\rangle\rvert \le 1 - \langle bc\rangle$? — glS, Sep 27 '18 at 08:15

H. Cooper · Answer 3 · 2018-09-26T03:33:01.873

0

It seems to me that $A(a,b,\lambda) =\pm 1$ is not the correct way of specifying a non-locality. If Alice makes a measurement then the non-locality assumption implies the remote $b$-vector is effectively superimposed on her location when her local setting is $a$. Since this would constitute a superposition of both vectors, they should be added (and normalized) to get the unit vector $w$. Evidently the same argument would have Bob using $w$. Then, if $A(w,\lambda) =\pm 1$, it must be the case that $B(w,\lambda) =\mp 1$.

Because of the non-locality, both observers are forced to use the same measurement direction locally with the result that they are always anti-correlated in a way that is inconsistent with the notion of a locality that would allow locally independent measurement directions.

It follows that there is no Bell theorem to be derived under the non-locality assumption.

edited Sep 26 '18 at 03:33

answered Sep 26 '18 at 02:57


@HCooper your argument seems to imply that $A$ and $B$ are dependent on $\mathbf{a}$ and $\mathbf{b}$ in the same way ($\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a} = \mathbf{w}$). However, it seems that $A$ would depend more heavily on $\mathbf{a}$ than on $\mathbf{b}$, and $B$ more heavily on $\mathbf{b}$ than on $\mathbf{a}$, so $A = A(\mathbf{a} + \mathbf{b}), B = B(\mathbf{b} + \mathbf{a})$ wouldn't be the correct representation either. In general, I would expect $A(\mathbf{a}, \mathbf{b}) \neq A(\mathbf{b}, \mathbf{a})$. – The Ledge Sep 26 '18 at 15:30
OK, suppose that dependence is expressed as $A = A(a + fb)$ and $B = B(b + fa)$, where $f$ is less than one. Then, in violation of locality, any observation by Alice would be instantly modified by any change in $b$ made by Bob. – H. Cooper Sep 26 '18 at 22:13
Of course this locality violation is implicit in your original definition of $A = A(a,b)$, so it seems that a derivation of Bell's theorem is not a possibility, since it would rule out that non-locality. – H. Cooper Sep 26 '18 at 22:38
Correct me if I've misunderstood, but you're essentially saying that $A(\mathbf{a}, \mathbf{b}, \lambda)$ is more general than, and inclusive of, $A(c_{1} \mathbf{a} + c_{2} \mathbf{b}, \lambda)$, so by the reasoning in my answer the nonlocality you suggested also prevents the construction of Bell's inequality? – The Ledge Sep 27 '18 at 00:56
I think the best way to put it is that if you had succeeded in constructing Bell's theorem by your approach, it would show that there is a class of non-local correlation theories that are inconsistent with the non-local correlations of QM, which would be an important result. – H. Cooper Sep 27 '18 at 02:26
Of course, since this would only rule out the non-quantum localities, there is no reason to believe Bell's inequality cannot be derived in this case. In fact, by defining three distinct linear combinations x, y, z of a, b, c, the c1 and c2 coefficients could be locally adjusted so that x, y, z are parallel or anti-parallel. The derivation given by Connor Dolan could then be used to derive Bell's inequality for that special case. – H. Cooper Oct 05 '18 at 19:46
