Why must we take the orthogonal projection to determine the probability distribution for measuring a self-adjoint operator?

Question

This is inspired by Brian Hall's "Quantum Theory for Mathematicians", in which he says (page 125):

Suppose $A$ is a self-adjoint operator. Given a Borel set $E$ of $\mathbb{R}$, let $V_E$ be the closed span of all the eigenvectors for $A$ with eigenvalues in $E$, and let $P_E$ be the orthogonal projection onto $V_E$. Then for any unit vector $\psi$, we have $$\text{prob}_\psi(A \in E) = \langle \psi, P_E\psi\rangle.$$

Why exactly is the orthogonal projection required here? What is the intuition, both mathematically and physically?

Do you mean, why must $P_E$ be an orthogonal projection in particular? Or are you asking why projection operators are involved at all? — J. Murray, Jun 29 '22 at 06:12
@J.Murray I understand why we must consider a projection, since we are only interested in what happens in $V_E$. I am curious as to why we require the projection to necessarily be orthogonal. — CBBAM, Jun 29 '22 at 06:44
Well, the statement is wrong as it stands. For instance, the position operator has no eigenvectors (in the proper sense of the statement), so the final identity makes no sense. It makes sense in finite-dimensional Hilbert spaces, where the final identity is a postulate. — Valter Moretti, Jun 29 '22 at 08:29
@ValterMoretti This is assuming the operator does have an orthonormal basis of eigenvectors, so I believe it is correct. I should have included that assumption in the OP; my apologies. — CBBAM, Jun 29 '22 at 18:17
How would you non-orthogonally project onto a particular eigenspace of an operator that has orthogonal eigenvectors? — Connor Behan, Jun 29 '22 at 20:22

J. Murray · Accepted Answer · 2022-06-30T18:55:48.923

A quick review, for those who are less familiar with the text. At its core, a physical theory is a mechanism for assigning probabilities to the possible outcomes of experiments. More specifically, given an $\mathbb R$-valued observable $\mathscr O$ and a (Borel-measurable) set $E\subseteq \mathbb R$, we may ask for the probability that we measure $\mathscr O$ to take its value in $E$.

In the standard formulation of quantum mechanics on an $n$-dimensional Hilbert space $\mathscr H$, we model an observable via a self-adjoint operator $\hat{\mathscr O}$. The possible outcomes of an ideal measurement correspond to the operator's spectrum $\sigma\big(\hat{\mathscr O}\big)$, which consists of the eigenvalues of $\hat{\mathscr O}$. The spectral theorem tells us that $\hat{\mathscr O}$ induces a splitting of the Hilbert space $$\mathscr H = \bigoplus_{i=1}^K V_i = V_1\oplus \ldots\oplus V_K$$ where $V_i$ is the $i^{th}$ eigenspace of $\hat{\mathscr O}$, $K$ is the number of distinct eigenvalues, and $V_i\perp V_j$ for all $i\neq j$. As a result, any vector $\psi\in \mathscr H$ can be uniquely written as $\psi = \sum_{i=1}^K \psi_i$ where $\hat{\mathscr O}\psi_i = \lambda_i \psi_i$. It is then a postulate of the theory that if the state of the system is represented by a normalized vector $\psi$, the probability of measuring $\mathscr O$ to take the value $\lambda_i$ is $\Vert \psi_i \Vert^2$.
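The decomposition $\psi=\sum_i\psi_i$ and the postulate $\mathrm{Prob}(\lambda_i)=\Vert\psi_i\Vert^2$ can be checked numerically. The sketch below assumes NumPy; `eigenspace_components` is a hypothetical helper name, and grouping eigenvalues by rounding is just one simple (assumed) way to detect degeneracy in floating point. The example operator has a doubly degenerate eigenvalue, so one of the $\psi_i$ lives in a two-dimensional eigenspace.

```python
import numpy as np

def eigenspace_components(A, psi, decimals=8):
    """Split psi into components psi_i, one per distinct eigenvalue of the Hermitian matrix A.

    Eigenvalues that agree after rounding to `decimals` places are treated as
    one degenerate eigenvalue (a floating-point heuristic, not part of the theory).
    Returns a dict {eigenvalue: psi_i}.
    """
    eigvals, eigvecs = np.linalg.eigh(A)      # columns of eigvecs: orthonormal eigenvectors
    components = {}
    for lam in np.unique(np.round(eigvals, decimals)):
        mask = np.abs(eigvals - lam) < 10.0 ** (-decimals)
        Q = eigvecs[:, mask]                  # orthonormal basis of the eigenspace V_i
        components[float(lam)] = Q @ (Q.conj().T @ psi)   # psi_i
    return components

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])               # eigenvalues 1, 3, 3
psi = np.array([1.0, 0.0, 0.0])               # a normalized state

comps = eigenspace_components(A, psi)
probs = {lam: np.linalg.norm(v) ** 2 for lam, v in comps.items()}
print(probs)                                  # both probabilities are (numerically) 0.5
```

The useful sanity checks are that the components sum back to $\psi$, that each $\psi_i$ satisfies $A\psi_i=\lambda_i\psi_i$, and that the numbers $\Vert\psi_i\Vert^2$ add up to 1.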

To make this framework cleaner, we define the projection-valued measure $\pi$, which eats a Borel-measurable set $E\subseteq \mathbb R$ and spits out the orthogonal projector onto the direct sum of eigenspaces whose eigenvalues lie in $E$. For example: $$\hat{\mathscr O}=\pmatrix{1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2}$$ $$\pi\big(\{1\}\big) = \pmatrix{1&0&0\\0&0&0\\0&0&0} \quad \pi\big(\{2\}\big) = \pmatrix{0&0&0\\0&1&0\\0&0&1} \quad \pi\big(\{1,2\}\big) = \pmatrix{1&0&0\\0&1&0\\0&0&1}$$ $$\pi\big(\{-7\}\big) = \pmatrix{0&0&0\\0&0&0\\0&0&0}$$ This allows us to write the spectral decomposition $\hat{\mathscr O}=\sum_{\lambda\in \mathbb R} \lambda \cdot \pi\big(\{\lambda\}\big)$, where only the finitely many eigenvalues contribute because $\pi\big(\{\lambda\}\big)=0$ for every other $\lambda$. It also allows us to answer our motivating question in a straightforward way: for a state represented by a normalized vector $\psi$, the probability of measuring $\mathscr O$ to take its value in a Borel-measurable set $E\subseteq \mathbb R$ is simply $$\mathrm{Prob}_\psi(E) = \langle \psi, \pi(E) \psi\rangle$$
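The $3\times3$ example can be reproduced with a few lines of NumPy. This is a sketch rather than a general implementation: `pi` is a hypothetical helper that only accepts a finite set $E$ of reals (a general Borel set would have to be passed as a membership test instead), and the state $\psi=(1,1,1)/\sqrt3$ is an arbitrary choice for illustration.

```python
import numpy as np

O = np.diag([1.0, 2.0, 2.0])                 # the example operator

def pi(E, A=O, tol=1e-9):
    """Orthogonal projector onto the eigenspaces of Hermitian A with eigenvalues in the finite set E."""
    eigvals, eigvecs = np.linalg.eigh(A)
    mask = np.array([any(abs(lam - e) < tol for e in E) for lam in eigvals], dtype=bool)
    Q = eigvecs[:, mask]                     # orthonormal basis of the selected eigenspaces
    return Q @ Q.conj().T                    # zero matrix if no eigenvalue lies in E

# Spectral decomposition: only the eigenvalues 1 and 2 contribute.
O_rebuilt = sum(lam * pi({lam}) for lam in (1.0, 2.0))

psi = np.ones(3) / np.sqrt(3)                # an arbitrary normalized state

def prob(E):
    """Prob_psi(E) = <psi, pi(E) psi>."""
    return np.vdot(psi, pi(E) @ psi).real

print(prob({1.0}), prob({2.0}), prob({1.0, 2.0}), prob({-7.0}))  # approximately 1/3, 2/3, 1, 0
```

Building each projector as $QQ^\dagger$ from an orthonormal basis $Q$ of the selected eigenspaces is what makes it orthogonal; this is the same construction as Hall's $P_E$.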

Why exactly is the orthogonal projection required here?

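Self-adjoint operators come with a canonical set of orthogonal projectors which send vectors in $\mathscr H$ to the various eigenspaces of the operator. We use these projectors to extract the individual $\psi_i$'s from the decomposition $\psi = \sum_{i=1}^K \psi_i$, and they are orthogonal because the distinct eigenspaces of a self-adjoint operator are orthogonal. Orthogonality is also exactly what makes the probability formula agree with the postulate: an orthogonal projector satisfies $\pi(E)^2=\pi(E)=\pi(E)^\dagger$, so $\langle \psi, \pi(E)\psi\rangle = \langle \pi(E)\psi, \pi(E)\psi\rangle = \Vert \pi(E)\psi\Vert^2 = \sum_{i:\,\lambda_i\in E}\Vert \psi_i\Vert^2$, which is automatically a number in $[0,1]$.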
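To see concretely what goes wrong without orthogonality, here is a two-dimensional toy check (NumPy assumed; the matrices and the state are chosen for illustration and do not come from Hall's text). Both matrices are idempotent and map onto the same eigenspace $V_1=\mathrm{span}(e_1)$ of $\mathrm{diag}(1,2)$, but the second one projects along $\mathrm{span}\big((1,1)\big)$ instead of along the other eigenspace $V_2=\mathrm{span}(e_2)$. Plugging it into $\langle\psi,P\psi\rangle$ gives a negative "probability".

```python
import numpy as np

# Eigenspaces of diag(1, 2): V_1 = span(e1), V_2 = span(e2).
psi = np.array([0.6, 0.8])                 # normalized: 0.36 + 0.64 = 1

P_orth = np.array([[1.0, 0.0],
                   [0.0, 0.0]])            # orthogonal projector onto V_1
P_obl = np.array([[1.0, -1.0],
                  [0.0, 0.0]])             # onto V_1 along span((1, 1)); not self-adjoint

values = {}
for name, P in [("orthogonal", P_orth), ("oblique", P_obl)]:
    idempotent = np.allclose(P @ P, P)
    self_adjoint = np.allclose(P, P.conj().T)
    values[name] = psi @ P @ psi           # <psi, P psi> for real vectors
    print(f"{name}: P^2 = P? {idempotent}, P self-adjoint? {self_adjoint}, "
          f"<psi, P psi> = {values[name]:.2f}")
# orthogonal: P^2 = P? True, P self-adjoint? True, <psi, P psi> = 0.36
# oblique: P^2 = P? True, P self-adjoint? False, <psi, P psi> = -0.12
```

This is also the point of Connor Behan's comment: because $V_2 = V_1^\perp$, the only projection onto $V_1$ along the remaining eigenspace is the orthogonal one, so a non-orthogonal projector cannot respect the eigenspace decomposition.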

(Not OP's question) What happens when $\mathscr H$ is not finite-dimensional?

In infinite-dimensional spaces, we run into the possibility that the spectrum of the operator in question has a continuous component. In this case, we must turn to the more sophisticated tools of functional analysis. If we are willing to play with the idea of generalized (non-normalizable) eigenvectors, however, the only real change is that the spectral decomposition will include an integral over the continuous spectrum as well as a sum over the discrete spectrum. Since the spirit of the answer doesn't really change, I don't think it's necessary to make this explicit. If the spectrum consists purely of a discrete (but infinite) set of eigenvalues, then everything written above stays essentially the same, with the sums extended to infinity.

This level of exposition should be in textbooks, thank you very much! — CBBAM, Jun 29 '22 at 20:46
