Justification of the $U(1)$ gauge for electromagnetism?

Question

Why should we expect or require that there is a $U(1)$-gauge symmetry in the theory of a charged particle (such as QED), namely that its physical properties should not change under local changes of the wavefunction $\psi(x) \rightarrow \psi(x) e^{i \alpha(x)}$? For example, demanding this $U(1)$-symmetry be satisfied justifies the usage of a covariant derivative $D_\mu(x) = \partial_\mu + i q B_\mu(x)$, while $B_\mu$ transforms by $B_\mu(x) \rightarrow B_\mu(x) - \frac1q \partial_\mu \alpha(x)$. Letting $B_\mu$ be the vector potential $A_\mu$ in electromagnetism allows an interaction/coupling between the fermion wavefunction $\psi$ and electromagnetism, although I don't understand why that is indeed the correct thing to do. I heard that this is precisely called the minimal coupling, but I don't understand the exact details.

In summary I have two questions:

Question 1. What is the physical reason why we should require a $U(1)$-gauge symmetry to be present for the wavefunction of a charged fermion?

Question 2. After we impose this gauge symmetry on the Dirac Lagrangian, what is the physical justification that we should let the connection 1-form term to be precisely the vector potential $A_\mu$, and not other terms from electromagnetism (say, $F_{\mu\nu}A^\nu$)?

There are (many) other questions on Physics SE that address a similar concern (such as this and this), but not exactly this.

J. Murray · Accepted Answer · 2020-12-28T16:04:43.867

Ultimately, the physical reason for doing this is that it works. There’s a fairly natural line of reasoning which leads to this procedure - that’s not a proof, because proofs don’t exist in physics, but it’s suggestive motivation. I'll run through this reasoning, breaking up the narrative with some elaboration that may be helpful.

If you impose a local symmetry under the action of a Lie group $G$, then you are immediately led to the need for a connection, which can be represented by a one-form $\mathbf A$ which takes its values in the Lie algebra $\frak g$ associated to $G$, and a covariant derivative $D = \partial + A$.

Let $\Psi : \mathbb R^3\mapsto \mathbb C^n$ be an $n$-component wavefunction. We choose a basis $\{\hat e_1,\hat e_2,\ldots,\hat e_n\}$ for $\mathbb C^n$ and express our wavefunction in component form $\Psi = \psi^a \hat e_a$. If we allow the basis to be position-dependent, then differentiation of the wavefunction yields $$\partial_{\color{red}{\mu}} \Psi = (\partial_\color{red}{\mu} \psi^a)\hat e_a + \psi^a\partial_\color{red}{\mu}(\hat e_a)$$ where I use the red, Greek subscript to denote the spatial index and Latin super/subscripts to denote the $\mathbb C^n$ indices. The expression $\partial_\color{red}{\mu}(\hat e_a)$ will be some element of $\mathbb C^n$, so we can express it in the local basis as $\partial_\color{red}{\mu}(\hat e_a) = {A_\color{red}{\mu}}^b_{\ \ a} \hat e_b$. Plugging this back in and relabeling indices yields $$\partial_\color{red}{\mu} \Psi = (\partial_\color{red}{\mu} \psi^a + {A_\color{red}{\mu}}^a_{\ \ b} \psi^b)\hat e_a$$ This motivates the definition $D_\color{red}{\mu} \equiv \partial_\color{red}{\mu} + A_\color{red}{\mu}$ (where, for each $\color{red}{\mu}$, $A_\color{red}{\mu}$ is interpreted as an $n\times n$ complex matrix), so $\partial_\color{red}{\mu} \Psi = (D_\color{red}{\mu}\psi)^a \hat e_a$. Under change of basis via some $\Omega \in G$, $\psi^a \mapsto \Omega^a_{\ \ b} \psi^b$ and $\hat e_a \mapsto (\Omega^{-1})^b_{\ \ a} \hat e_b$ to preserve the value of $\Psi$ itself. Note: From here on, I'll drop the spatial index, because it just sits there and comes along for the ride. You can always put it back in if you want.

Exercise for the reader: Using the definition $\partial(\hat e_a) = A^b_{\ \ a}\hat e_b$, show that under change of basis, $A \mapsto \Omega(A + \partial) \Omega^{-1}$. Further, argue that if $\Omega = e^\theta$ for some $\theta \in \frak{g}$, consistency requires that $A \in \frak{g}$.

We now ask whether this connection has any physical meaning. If it can be uniformly set to zero by appropriate change of gauge, then we can perform all of our calculations in that gauge; since all gauges are physically equivalent, this implies that $A$ cannot manifest any physical effects. Setting the connection to zero means that $\Omega(A + \partial)\Omega^{-1} = 0 \iff A = -(\partial \Omega^{-1})\Omega = \Omega^{-1} \partial \Omega$ for some $\Omega \in G$.

Exercise for the reader: Show that if $A_\mu = \Omega^{-1} \partial_\mu \Omega$, then $$(dA)_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu = -[A_\mu,A_\nu]$$ so the $\frak{g}$-valued(!) 2-form $F$ with components $F_{\mu\nu} \equiv (dA)_{\mu\nu} + [A_\mu,A_\nu] = 0$. Furthermore, show that under change of gauge, $F \mapsto \Omega F \Omega^{-1}$.

That the connection can be set to zero implies that $F$ (called the curvature form of $A$) vanishes. The reverse is also true (at least locally), but that's considerably harder to show.

If $F$ doesn't vanish, then we need some way to determine what it should be. One way to do this is to make a scalar (density) out of $\mathbf F$ and use it as a Lagrangian density. Recall that $F$ has two spatial indices which need to be taken care of. Generically$^\dagger$, the simplest scalar one can make out of $\mathbf F$ is $-\frac{1}{4}\operatorname{Tr}(F^2)$, where $F^2=F_{\mu\nu}F^{\mu\nu}$ and the numerical factor is added for conventional reasons.

Note that $g^{\mu\nu}F_{\mu\nu}$ vanishes identically, so that's no good. However, $F^2 \equiv g^{\mu\alpha}g^{\nu \beta} F_{\mu \nu} F_{\alpha \beta}$ does not. But we must be careful - each $F_{\mu\nu}$ is a matrix. Written properly with all of the relevant indices, $$F^2 = g^{\mu\alpha}g^{\nu\beta}\left( F^a_{\ \ b}\right)_{\mu\nu} \left( F^b_{\ \ c}\right)_{\alpha\beta} = (F^2)^a_{\ \ c}$$ We still need to get rid of those vector space indices, so we can just trace over them to get a real scalar: $$\operatorname{Tr}(F^2) = (F^2)^a_{\ \ a} = g^{\mu\alpha}g^{\nu\beta}\left( F^a_{\ \ b}\right)_{\mu\nu} \left( F^b_{\ \ a}\right)_{\alpha\beta}$$

Demanding that the Lagrangian be gauge-invariant rules out terms like $A_\mu A^\mu$, which would give the $A$ fields a mass; as a result, all of the auxiliary fields obtained in this way are massless.

At the end of the day, the coupling to the matter in the theory arises naturally from the presence of the connection in the covariant derivative. The dynamics arise from using the simplest gauge-invariant scalar as a Lagrangian density.

Question 1. What is the physical reason why we should require a $U(1)$-gauge symmetry to be present for the wavefunction of a charged fermion?

There’s no particular reason why we should - other than that if we do, then electromagnetism falls into our lap. If we repeat the procedure with different gauge groups, we arrive at different theories, some of which seem to be manifested in reality and others which do not.

Question 2. After we impose this gauge symmetry on the Dirac Lagrangian, what is the physical justification that we should let the connection 1-form term to be precisely the vector potential $A_\mu$, and not other terms from electromagnetism (say, $F_{\mu\nu}A^\nu$)?

I think are imagining that $U(1)$ symmetry is imposed and subsequently married to electromagnetism, but that's not quite right. $U(1)$ symmetry is imposed and then becomes electromagnetism. It's not that we find ourselves searching for a connection 1-form and decide that it should be the vector potential; it's that if you follow the procedure above, then the connection 1-form automatically obeys the Maxwell equations and exerts a Lorentz force on charged matter.

In other words, you can call it what you like but the imposition of a local $U(1)$ symmetry requires the introduction of an auxiliary field which behaves exactly like the electromagnetic 4-potential and has exactly the same effect on matter. If it walks like a duck and quacks like a duck...

$^\dagger$A fun exception to this rule is if the dimension of the underlying space and the dimension of the vector space on which $G$ acts are the same; then the Greek and Latin indices run over precisely the same values. This is true when we consider $d$-dimensional spacetime and its $d$-dimensional tangent spaces, for example.

In that case, we can contract one Latin index with one Greek index in the expression $(F^a_{\ \ b})_{\mu\nu}$, and then contract the result with the metric. The only non-trivial way to do this is

$$ g^{\mu \nu} (F^a_{\ \ \mu})_{a \nu}$$

This term is linear in $F$ rather than quadratic, and appears in general relativity; the connection is the Christoffel connection, $F$ is the Riemann curvature tensor, and the above-obtained scalar is the Ricci scalar.

I like the answer! I was thinking that maybe would be nice to add that, imposing the gauge invariance to the lagrangian, makes the photon massless, as mass terms breaks this invariance — Ratman, Dec 27 '20 at 23:37
@J.Murray do the quotes you have provided in your answer come from a specific source? They are a very nice and I would like to read more. — Schoppe, Apr 20 '22 at 16:00

Justification of the $U(1)$ gauge for electromagnetism?

1 Answers1