
I have been trying to understand general relativity from a first-principles perspective in my spare time, and I have been unable to find a convincing derivation of the Einstein equations. The most complete one I can find is the one on Wikipedia, but it has a big mathematical gap that I can't figure out. Namely, when computing the variation of the Riemann curvature tensor, the author assumes that the variation operator is a derivation, i.e. satisfies the product rule for derivatives. This seems to be false, because the variation in question is not itself an ordinary derivative, but rather the Euler-Lagrange "derivative", whose definition for a function of the (inverse) metric and its first two partials (like the Riemann tensor) is

$$ \frac{\delta \mathcal{L}(g^{ij}, \partial_k g^{ij}, \partial_l \partial_k g^{ij})}{\delta g^{ij}} = \frac{\partial \mathcal{L}}{\partial g^{ij}} - \partial_k \frac{\partial \mathcal{L}}{\partial(\partial_k g^{ij})} + \partial_l \partial_k \frac{\partial \mathcal{L}}{\partial(\partial_l \partial_k g^{ij})}. $$

The second and third terms do not satisfy the product rule. It appears almost as though in the linked derivation the author is taking simple partials with respect to the inverse metric, which is entirely wrong. And yet, that derivation is linked to Carroll's textbook, so it must have some credibility. I don't have the textbook, so I can't check whether it explains this logic more completely. Therefore I turn to Physics.SE. What's going on here?
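To make my worry concrete, here is a toy one-dimensional example of my own (not taken from the linked page): with $F = q$ and $G = \tfrac{1}{2}\dot q^2$, the Euler-Lagrange derivative of the product is

$$ \frac{\delta (FG)}{\delta q} = \frac{\partial (FG)}{\partial q} - \frac{d}{dt}\frac{\partial (FG)}{\partial \dot q} = \tfrac{1}{2}\dot q^2 - \frac{d}{dt}\left(q \dot q\right) = -\tfrac{1}{2}\dot q^2 - q\ddot q, $$

whereas a naive product rule would give

$$ G\,\frac{\delta F}{\delta q} + F\,\frac{\delta G}{\delta q} = \tfrac{1}{2}\dot q^2\cdot 1 + q\cdot(-\ddot q) = \tfrac{1}{2}\dot q^2 - q\ddot q, $$

which differs from the correct result by $\dot q^2$.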


3 Answers


What I think is tripping you up here is the use of partial-derivative notation in the calculus of variations. It's generally a lot easier, particularly when doing calculations in GR, to use $\delta$-operator notation instead. (The $\delta$-operator, by definition, does obey the product rule: $\delta(fg) = f \delta g + g \delta f$.) Nonetheless, I've written up the basics of what's going on here in your notation; I do have to make a bit of a fudge at the end (see if you can spot it!), but rest assured that writing everything out using $\delta$-operators makes everything a bit more rigorous.

So let's first write out the variation of the product $F (g, \partial g)\, G(g, \partial g)$, from which the functional derivative can be read off: $$ \delta \left[ F (g, \partial g) G(g, \partial g) \right] = \left( \frac{\partial F}{\partial g^{ij}} \delta g^{ij} + \frac{\partial F}{\partial (\partial_k g^{ij})} \delta( \partial_k g^{ij}) \right) G( g^{ij}, \partial_k g^{ij}) + \left( \frac{\partial G}{\partial g^{ij}} \delta g^{ij} + \frac{\partial G}{\partial (\partial_k g^{ij})} \delta( \partial_k g^{ij}) \right) F( g^{ij}, \partial_k g^{ij}) $$ The first terms in each set of brackets (proportional to $\partial F/\partial g^{ij}$ and $\partial G/\partial g^{ij}$) obviously obey the product rule when taken together, so let's focus on the others:
$$ \left(\frac{\partial F}{\partial (\partial_k g^{ij})} \delta( \partial_k g^{ij}) \right) G + \left( \frac{\partial G}{\partial (\partial_k g^{ij})} \delta( \partial_k g^{ij}) \right) F \\ = \partial_k \left[ \left( \frac{\partial F}{\partial (\partial_k g^{ij})} G + \frac{\partial G}{\partial (\partial_k g^{ij})} F \right) \delta g^{ij} \right] - \partial_k \left[ \frac{\partial F}{\partial (\partial_k g^{ij})} G + \frac{\partial G}{\partial (\partial_k g^{ij})} F \right] \delta g^{ij} $$ and so (discarding the total derivative) we have $$ \frac{\delta \left[ F (g, \partial g) G(g, \partial g) \right]}{\delta g^{ij}} = G \frac{\partial F}{\partial g^{ij}} + F\frac{\partial G}{\partial g^{ij}} - \partial_k \left[ \frac{\partial F}{\partial (\partial_k g^{ij})} G + \frac{\partial G}{\partial (\partial_k g^{ij})} F \right]. \qquad {(1)} $$ This last term will not, as you note, in general be equal to $$ - G \partial_k \left[ \frac{\partial F}{\partial (\partial_k g^{ij})} \right] - F \partial_k \left[\frac{\partial G}{\partial (\partial_k g^{ij})} \right] $$ as one would expect from the product rule.

However, in the case of the Einstein-Hilbert action, we have $F = \sqrt{-g}$ and $G = R$. Since $\nabla_k F = 0$ (remember, we should really be using covariant derivatives above) and since $\partial F/\partial (\partial_k g^{ij}) = 0$, the second term in (1) becomes $$ - \partial_k \left[ \frac{\partial F}{\partial (\partial_k g^{ij})} G + \frac{\partial G}{\partial (\partial_k g^{ij})} F \right] = - F \partial_k \left[ \frac{\partial G}{\partial (\partial_k g^{ij})} \right] = - G \partial_k [0] - F \partial_k \left[ \frac{\partial G}{\partial (\partial_k g^{ij})} \right], $$ which is what you would expect from the product rule. Similar logic extends to the case where $G$ (but not $F$) depends on higher derivatives of the fields.
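For completeness, carrying this through for the Einstein-Hilbert action reproduces the standard textbook result (quoted here rather than re-derived): using $\delta\sqrt{-g} = -\tfrac{1}{2}\sqrt{-g}\,g_{ij}\,\delta g^{ij}$ and the fact that the $g^{ij}\,\delta R_{ij}$ piece contributes only a total derivative,

$$ \frac{1}{\sqrt{-g}}\,\frac{\delta\left(\sqrt{-g}\,R\right)}{\delta g^{ij}} = R_{ij} - \tfrac{1}{2} R\, g_{ij}, $$

so setting the functional derivative to zero gives the vacuum Einstein equations.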

  • Thanks for your answer. It doesn't completely solve my problem, since, for example, in the first step of the Wikipedia proof the product rule is applied to a product of two Christoffel symbols rather than simply to $\sqrt{-g} R$, but it helped me see what I was misunderstanding. – Ryan Reich Jun 16 '17 at 21:12

I think it is important to understand what exactly the functional differential $\delta$ is doing. We have a functional $S:\mathscr E\to \Bbb R$, where $\mathscr E$ is some vector space of field configurations (smooth tensor fields or whatever). We take a smooth family $\psi_\epsilon$ of fields, where $\epsilon\in (-\epsilon_0,\epsilon_0)$. As $\epsilon$ varies, we obtain a family $S_\epsilon=S(\psi_\epsilon)$. We define $\frac{\delta S}{\delta\psi}(\psi_0)$ by the condition that for every such family with $\psi_0$ fixed, we have $$\delta S:=\left.\frac{dS_\epsilon}{d\epsilon}\right|_{\epsilon=0}=\int_M \frac{\delta S}{\delta\psi}(\psi_0)\,\delta\psi,\quad \delta\psi:=\left.\frac{d\psi_\epsilon}{d\epsilon}\right|_{\epsilon=0}.$$ If $S=\int_M L\,\mu$, where $\mu$ is some measure not depending on $g$, then we simply have $$\frac{dS}{d\epsilon}=\int_M \frac{dL}{d\epsilon}\,\mu.$$ Setting $\epsilon=0$ gives $$\delta S=\int_M\delta L\,\mu.$$ So $\delta$ is indeed a derivation because it is $d/d\epsilon$.
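As a small check in the same notation, applying this to a product of two such families gives the product rule directly, since $d/d\epsilon$ obeys the Leibniz rule:

$$ \delta(FG) = \left.\frac{d}{d\epsilon}\bigl(F_\epsilon G_\epsilon\bigr)\right|_{\epsilon=0} = \left.\frac{dF_\epsilon}{d\epsilon}\right|_{\epsilon=0} G_0 + F_0 \left.\frac{dG_\epsilon}{d\epsilon}\right|_{\epsilon=0} = (\delta F)\,G + F\,(\delta G). $$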

– Ryan Unger

After pondering Michael Seifert's answer, I have realized what the full resolution of my problem is. The issue is that the expression $\delta \mathcal{L}$, which is defined to be

$$ \delta \mathcal{L} = \frac{\partial \mathcal{L}}{\partial g^{ij}} \delta g^{ij} + \frac{\partial \mathcal{L}}{\partial (\partial_k g^{ij})} \partial_k (\delta g^{ij}) + \frac{\partial \mathcal{L}}{\partial (\partial_l \partial_k g^{ij})} \partial_l \partial_k (\delta g^{ij}),$$

must not be conflated with the functional derivative $\frac{\delta\mathcal{L}}{\delta g^{ij}}$, the way an ordinary differential can be identified with its partial-derivative coefficients. This is because we do not have the exact linear relation

$$ \delta \mathcal{L} = \frac{\delta\mathcal{L}}{\delta g^{ij}} \delta g^{ij} $$

as, again, we do have for differentials, but rather

$$ \delta \mathcal{L} = \frac{\delta\mathcal{L}}{\delta g^{ij}} \delta g^{ij} + \partial_i f^i, $$

for some vector $f^i$. This extra divergence term is what prevents $\frac{\delta \mathcal{L}}{\delta g^{ij}}$ from being a derivation. Doing the whole computation with the operator $\delta$, rather than with the functional derivative $\frac{\delta}{\delta g^{ij}}$, works out just fine. This is in fact what the Wikipedia page does; I had simply assumed that the $\delta$-differential notation was shorthand for the functional derivative.
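For concreteness, integrating the expression for $\delta \mathcal{L}$ above by parts shows that one admissible choice of $f^k$ (boundary terms of this kind are not unique) is

$$ f^k = \left[ \frac{\partial \mathcal{L}}{\partial(\partial_k g^{ij})} - \partial_l \frac{\partial \mathcal{L}}{\partial(\partial_l \partial_k g^{ij})} \right] \delta g^{ij} + \frac{\partial \mathcal{L}}{\partial(\partial_l \partial_k g^{ij})}\, \partial_l\left(\delta g^{ij}\right), $$

using the symmetry of $\partial_l \partial_k$; the coefficient of $\delta g^{ij}$ that remains after subtracting $\partial_k f^k$ is exactly the Euler-Lagrange expression quoted in the question.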