104

What is the reason for the observation that across the board fields in physics are generally governed by second order (partial) differential equations?


If someone on the street were to ask me that question flat out, I'd probably mumble something about physicists wanting to be able to use the Lagrangian approach. And to allow a positive, rotation- and translation-invariant energy term that permits local propagation, you need something like $-\phi\Delta\phi$.

I assume the answer goes in this direction, but I can't really justify why more complex terms in the Lagrangian are not allowed or why higher orders are a physical problem. Even if these require more initial data, I don't see the a priori problem.

Furthermore, you could come up with quantities in the spirit of $F\wedge F$ and $F \wedge *F$, and okay, yes... maybe any made-up scalar just doesn't describe physics or misses valuable symmetries. On the other hand, in the whole renormalization business, they seem to be allowed to use lots and lots of terms in their Lagrangians. And if I understand correctly, supersymmetry theory is basically a method of introducing new Lagrangian densities too.

Do we know the limit for making up these objects? What is the fundamental justification for order two?

Qmechanic
Nikolaj-K
  • Related: https://physics.stackexchange.com/q/4102/2451 – Qmechanic Dec 21 '11 at 14:24
  • I'm surprised that in the 10 answers below, nobody mentioned "causality" and "locality". Third-order and higher differential equations (like the Abraham-Lorentz-Dirac equation) have problems with causality and locality. – Cham Sep 11 '17 at 16:34

12 Answers

37

First of all, it's not true that all important differential equations in physics are second-order. The Dirac equation is first-order.

The number of derivatives in the equations is equal to the number of derivatives in the corresponding relevant term of the Lagrangian. These kinetic terms have the form $$ {\mathcal L}_{\rm Dirac} = \bar \Psi \gamma^\mu \partial_\mu \Psi $$ for Dirac fields. Note that the term has to be Lorentz-invariant – a generalization of rotational invariance for the whole spacetime – and for spinors, one may contract them with $\gamma_\mu$ matrices, so it's possible to include just one derivative $\partial_\mu$.

However, for bosons, which have integer spin, there is nothing like $\gamma_\mu$ acting on them. So Lorentz invariance, i.e. the disappearance of the Lorentz indices in the terms with derivatives, has to be achieved by having an even number of them, as in $$ {\mathcal L}_{\rm Klein-Gordon} = \frac{1}{2} \partial^\mu \Phi \partial_\mu \Phi $$ which inevitably produces second-order equations. Now, what about the terms in the equations with fourth or higher derivatives?

They're actually present in the equations, too. But their coefficients are powers of a microscopic length scale $L$, because these terms originate in short-distance phenomena. Every time you add a derivative $\partial_\mu$ to a term, you must add a factor of $L$ as well, so as not to change the units of the term. Consequently, the coefficients of higher-derivative terms are positive powers of $L$, which means that these terms, including the derivatives, when applied to a typical macroscopic situation, are of order $(L/R)^k$ relative to the two-derivative terms, where $1/R^k$ comes from the $k$ extra derivatives $\partial_\mu^k$ and $R$ is a distance scale of the macroscopic problem we are solving (the typical scale over which the field changes by 100 percent or so).

Consequently, the coefficients with higher derivatives may be neglected in all classical limits. They are there but they are negligible. Einstein believed that one should construct "beautiful" equations without the higher-derivative terms and he could guess the right low-energy approximate equations as a result. But he was wrong: the higher derivative terms are not really absent.
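The order-of-magnitude estimate can be checked with a toy calculation. The sketch below assumes a field profile $\sin(x/R)$ and a schematic equation of motion in which each pair of extra derivatives carries a factor $L^2$; the function names and the sample values of $L$ and $R$ are hypothetical, chosen only for illustration.

```python
def derivative_amplitude(order, R):
    """Amplitude of the order-th derivative of sin(x/R): each derivative gives 1/R."""
    return R ** (-order)


def higher_to_second_ratio(L, R, extra=2):
    """Size of a term with (2 + extra) derivatives and coefficient L**extra,
    relative to the two-derivative term, for a field varying on scale R."""
    second = derivative_amplitude(2, R)
    higher = L ** extra * derivative_amplitude(2 + extra, R)
    return higher / second  # equals (L / R) ** extra up to rounding


# Illustrative numbers (assumed): a microscopic scale of about 1e-35 m
# compared with fields varying over nuclear, atomic and human scales.
L_micro = 1e-35
for R in (1e-15, 1e-10, 1.0):
    print(f"R = {R:g} m  ->  ratio ~ {higher_to_second_ratio(L_micro, R):.1e}")
```

The ratio shrinks as $R$ grows, which is the sense in which the higher-derivative terms are present but negligible at long distances.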

Now, why don't we encounter equations whose lowest-order derivative terms are absent? It's because their coefficient in the Lagrangian would have to be strictly zero but there's no reason for it to be zero. So it's infinitely unlikely for the coefficient to be zero. It is inevitably nonzero. This principle is known as Gell-Mann's anarchic (or totalitarian) principle: everything that isn't prohibited is mandatory.

Luboš Motl
  • Thanks for the answer. What is the reason that "their coefficients are powers of a microscopic scale or distance scale $L$"? In the last paragraph you use this again, where it's implied that the lower order derivatives are a priori related to a bigger scale, which then outweighs the later ones associated with higher orders. Is there a justification, which goes back to axiomatic assumptions or is it "just" an empirical insight from dealing with effective field theories? – Nikolaj-K Dec 21 '11 at 15:08
  • Dear @Nikolaj, $L$ determining the coefficients is microscopic because microscopic scales are the natural ones for the formulation of the laws of physics. By definition, microscopic scales are the scales associated with the elementary particles. These general discussions talk about many things at the same moment. For example, in GR, the typical scale is the Planck length, $10^{-35}$ meters, which is the shortest one. In other theories, the typical scale is longer. But it's always microscopic because it determines the internal structure/behavior of the fields and particles which are small. – Luboš Motl Dec 21 '11 at 16:06
  • The comment that the derivatives are not just related to the long scale but produce it was meant as a self-evident tautology. What I mean is that if we consider a field that is changing in space, e.g. as a wave with wavelength $R$, then the derivative will pick up a factor of order $1/R$, too. For example, the derivative of $\sin(x/R)$, the wave of length $2\pi R$, is $\cos(x/R)/R$. Cos and sin are almost the same thing, of the same order 1, and we therefore picked up an extra factor of $1/R$. All these things are order-of-magnitude estimates. Macroscopic usage of field theory has a macroscopic $R$. – Luboš Motl Dec 21 '11 at 16:09
  • I'm not sure if I successfully pointed out my problem in the comment. My question is: What is the justification for assuming the coefficient of smaller orders would describe a bigger scale? What speaks against a situation, where the fourth order term has a small coefficient, but the second order term has an even smaller one? Then in the classical limit, just the fourth order expression would survive. – Nikolaj-K Dec 21 '11 at 18:36
  • Dear @Nikolaj, it's likely that I don't understand your continued confusion at all. Whether a term may be neglected depends on the relative magnitude of the two terms, the neglected one and the surviving one. So I am estimating the ratio of higher-derivative terms and two-derivative terms and it scales like $(L/R)^k$, a small number, so the higher-derivative terms may be neglected if the two-derivative terms are there. It doesn't matter how you normalize both of these terms in an "absolute way". What matters for being able to neglect one term is the ratio of the two terms. – Luboš Motl Dec 21 '11 at 18:53
  • What I'm saying is that you might consider $A(\partial \phi)^2+B(\partial \phi)^4$, where $A$ and $B$ are different and $A$ is much much smaller than $B$. So small, that in the limit where it comes to comparing them (even times powers of $R$ and whatnot), the $(\partial \phi)^2$ has to be neglected. Then in the classical limit, not the second order expression would survive. Is there a reason why this couldn't happen?? – Nikolaj-K Dec 21 '11 at 19:47
  • Dear @Nikolaj, unfortunately, you haven't started to understand the answer at all. The point is that $A,B$ have different units so statements like "$A$ is much smaller than $B$" are meaningless. Which of the terms is more important and which of them may be neglected depends on the situation, on the particular scale $R$ of the problem you're solving. The estimate is that $B\sim AL^2$ always holds where $L$ is a microscopic scale associated with the "fundamental physics of $\phi$", so the $B$ term produces negligible effects $B/R^2\sim A L^2/R^2 \ll A$ at long distances $R\gg L$. – Luboš Motl Dec 22 '11 at 08:37
  • Mhm yes, I don't seem to understand yet. Say $A$ has units $a$ and $B$ has units $al^2$. If you compare $B$ with $AL^2$, then $B$ could still be much bigger. "Which of the terms is more important and which of them may be neglected depends on the situation". Say we consider the situation where $B(\partial\phi)^4$ is more important than $A(\partial\phi)^2$; then the power-4 effects might be more important. Or are you really saying that the scale $R$ is not fixed a priori and at some point the $\frac{1}{R^2}$ will always kill all other effects? – Nikolaj-K Dec 22 '11 at 09:26
  • This doesn't apply to general relativity, where the equations are nevertheless of second order. According to your argument, the universe should be flat. – Arnold Neumaier Nov 12 '12 at 16:41
  • Nor does it apply to the equations of fluid mechanics, as these are not directly governed by microscopic considerations. – Arnold Neumaier Nov 12 '12 at 16:43
  • I have quoted and commented some of your claims in my answer. I hope this does not offend you. Maybe you want to reply. – Diego Mazón Nov 13 '12 at 22:04
  • Dear Lubos, can we rephrase your answer as follows: derivatives wrt space are the appropriate momenta in momentum space and thus the terms containing higher derivative terms contribute less in lower energies? – TheQuantumMan Jun 20 '17 at 23:19
  • Yes, I think so. It's not quite exactly the answer to the question that was asked but it is a concise version of a big part of my answer written above. – Luboš Motl Jun 23 '17 at 17:41
34

One can rewrite any PDE of any order as a system of first-order PDEs, hence the assumption behind the question is somewhat questionable. Also, there exist first-order PDEs of relevance to physics (the Dirac equation and the Burgers equation, to name just two).

However, it is common that quantities in physics appear in conjugate pairs of potential fields and their associated field strength, defined by the potential gradient. Now the gradients of field strength act as generalized forces that try to move the system to an equilibrium state at which these gradients vanish. (They will succeed only if there is sufficient friction and no external force.)

In a formulation where only one half of each conjugate pair is explicit in the equations, a second order differential equation results.

For example, in the Hamiltonian formulation of conservative mechanics, we have $$\dot q=\partial_p H(p,q),~~~\dot p = -\partial_q H(p,q).$$ In the most common special case, where $H(p,q)=p^2/2m+V(q)$, these become $$\dot q=p/m,~~~\dot p = -\partial_q V(q).$$ Eliminating $p$ leaves a second-order equation.
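
Spelled out for this special case, the elimination is one line: differentiate the first equation with respect to time and substitute the second (a short worked step, writing $\partial_q V$ for the derivative of the potential): $$\ddot q = \frac{\dot p}{m} = -\frac{1}{m}\,\partial_q V(q) \quad\Longrightarrow\quad m\,\ddot q = -\partial_q V(q),$$ which is Newton's second law with force $-\partial_q V$.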

27

Here we will for simplicity limit ourselves to systems that have an action principle. (For fundamental and quantum mechanical systems, this is often the case.) Let us reformulate OP's question as follows:

Why do the Euler-Lagrange equations of motion for a relativistic (non-relativistic) system have at most two spacetime-derivatives (time-derivatives), respectively?

(Here the precise number of derivatives depends on whether one considers the Lagrangian or the Hamiltonian formulation, which are related via Legendre transformation. In case of a singular Legendre transformation, one should use the Dirac-Bergmann or the Faddeev-Jackiw method to go back and forth between the two formalisms. See also this Phys.SE post.)

Answer:

The higher-derivative terms are in certain theories suppressed for dimensional reasons by the natural scales of the problem. This may e.g. happen in renormalizable theories.

But the generic answer is that the equations of motion actually don't have to be of order $\leq 2$.

However, for a generic higher-order quantum theory, if higher-derivative terms are not naturally suppressed, this typically leads to ghosts of the so-called bad type with wrong sign of the kinetic term, negative norm states and unitarity violation.

At the naive level, explicit appearances of higher time-derivatives may be removed in formulas by introducing more variables, either via the Ostrogradsky method, or equivalently, via the Lagrange multiplier method. However, the positivity problem is not cured by such rewritings due to the Ostrogradsky instability, and the quantum system remains ill-defined. See also e.g. this and this Phys.SE answer.

Hence one can often not make consistent sense of higher-order theories, and this may be why OP seldom faces them.

Finally, let us mention that it is nowadays popular to study effective higher-derivative field theory, with the possibly unfounded hope, that an underlying, supposedly well-defined, unitary description, e.g. string theory, will cure all pathologies.

Qmechanic
  • Thought-experiment for later: If we have a dispersion relation $\omega=\sqrt[n]{k^2+m^2}$, then the group velocity is $\quad v_g=\frac{\partial \omega}{\partial k}=\frac{2k}{n\omega^{n-1}}$. The limits are $\quad\lim_{k\to 0} v_g=0$. $\quad\lim_{k\to \infty} v_g=0$ for $n>2$. – Qmechanic Jan 24 '20 at 13:09
21

The reason the equations of physics are of at most second order is the so-called Ostrogradskian instability (see the paper by Woodard). This is a theorem which states that equations of motion with higher-order derivatives are in principle unstable or non-local. This is easily shown using the Lagrangian and Hamiltonian formalism.

The key point is that in order to get an equation of motion of third or higher order in the derivatives, we need a Lagrangian that depends on the coordinates, the generalized velocities, and the accelerations: $L(q,\dot{q},\ddot{q})$. Performing a Legendre transformation to obtain the Hamiltonian then requires two generalized momenta. The Hamiltonian turns out to be linear in at least one of the momenta, and therefore it is unbounded from below (it can become arbitrarily negative). This corresponds to a phase space in which there are no stable orbits.
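
A compact sketch of the standard construction, assuming a single coordinate $q$ and a nondegenerate Lagrangian (that is, $\partial^2 L/\partial\ddot q^2 \neq 0$); the symbols $Q_i$, $P_i$ and $A$ are notation introduced here for illustration. The Ostrogradsky canonical variables are
$$Q_1 = q,\qquad Q_2 = \dot q,\qquad P_1 = \frac{\partial L}{\partial \dot q} - \frac{d}{dt}\frac{\partial L}{\partial \ddot q},\qquad P_2 = \frac{\partial L}{\partial \ddot q}.$$
Nondegeneracy lets one solve the last relation for the acceleration, $\ddot q = A(Q_1,Q_2,P_2)$, and the Hamiltonian becomes
$$H = P_1 Q_2 + P_2\,A(Q_1,Q_2,P_2) - L\big(Q_1,Q_2,A(Q_1,Q_2,P_2)\big).$$
The momentum $P_1$ appears only in the term $P_1 Q_2$, so $H$ is linear in $P_1$ and can be made arbitrarily negative by choosing $P_1$ with the opposite sign to $Q_2$.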

I would like to write the proof here, but it was already answered in this post. There the question is why Lagrangians only have one derivative, but it is actually closely related, since one can always find the equations of motion from a Lagrangian and vice versa.

Citing Woodard (https://arxiv.org/pdf/hep-th/0207191v1.pdf): "It has long seemed to me that the Ostrogradskian instability is the most powerful, and the least recognized, fundamental restriction upon Lagrangian field theory. It rules out far more candidate Lagrangians than any symmetry principle. Theoretical physicists dislike being told they cannot do something and such a bald no-go theorem provokes them to envisage tortuous evasions. ... The Ostrogradskian instability should not seem surprising. It explains why every single system we have so far observed seems to be described, on the fundamental level, by a local Lagrangian containing no higher than first time derivatives. The bizarre and incredible thing would be if this fact was simply an accident."

Santiago
7

Actually, evolution equations are even more than just second order in time: they don't depend naively on the first-order derivative, that is, on "velocity". This can be easily understood from the fact that there exists no privileged inertial frame. The change (that is, what is absolute) is given by acceleration and not velocity. If it depended naively on some velocity terms, then it would imply that there's a privileged frame.

Let us make an analogy with Newtonian mechanics. If we were living in an Aristotelian universe with a privileged frame of reference, then $F = mv$. Motion would therefore be absolute, and so would velocity. Because there is no such privileged frame of reference, but a whole class of privileged ones (the inertial ones), $F = ma$. Why couldn't it be that we live in a universe where $F = m \dot a$? Simply because of Galilean principles.

If you believe that acceleration and velocities are "cancellable", and that real change is given by the derivative of acceleration, then you would have to believe in a second-order Galilean principle of invariance and inertia. A second-order principle of invariance would tell you that the laws of physics have to be the same in all inertial frames and all uniformly accelerated frames; otherwise there would be a way to discriminate between them, and thus no equivalence between being inertial and being uniformly accelerated. This implies, in particular, that if you're inside one of these frames and you see someone uniformly accelerated along your $x$ axis, that is, $x_1(t) = gt^2/2$, and you also see someone accelerated in the opposite direction, that is, $x_2(t) = -gt^2/2$, then from the point of view of the second observer, the first object will be described by $x_1(t) - x_2(t) = g t^2$. This implies that you would be able to see objects with arbitrarily high acceleration, without the need to consume any "energy".

This is not what we observe in this universe: you don't uniformly accelerate an object "for free". So it looks like nature chose to be as simple as possible in order to keep a symmetry between all inertial frames: it's second order in time, not third or even worse. Note that one could say that it's Machian, that is, that it is symmetric up to all orders in acceleration. This would imply that there is no difference at all between rotating and being inertial. That is to say, if I look at a guy spinning with a ball in his hands who eventually lets it go, the ball will then make a spiral movement and its angular velocity will keep increasing as it gets further from the guy who launched it (indeed, the latter has to see it going in a straight line by Galileo's principle of inertia). The universe is therefore not Machian either.

Then why does Schrödinger's equation depend on first order in time? Because it is a modal equation: it needs an observer to make sense and to make measurements. Hence, there is one Schrödinger equation per observer (the Hamiltonian depends on the observer and the system he is looking at; see the relational interpretations). At least, this is my interpretation of it.

sure
6

"First of all, it's not true that all important differential equations in physics are second-order. The Dirac equation is first-order."

This is correct. However, physical evolution equations are second-order (in time) hyperbolic equations. In fact, each component of the Dirac spinor obeys a second-order equation, namely the Klein-Gordon equation.
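
A quick check of the last statement, assuming the free Dirac equation in the convention $(i\gamma^\mu\partial_\mu - m)\psi = 0$ with $\{\gamma^\mu,\gamma^\nu\} = 2\eta^{\mu\nu}$ (these conventions are an assumption; they are not fixed in the text above). Acting with $(i\gamma^\nu\partial_\nu + m)$ from the left gives
$$(i\gamma^\nu\partial_\nu + m)(i\gamma^\mu\partial_\mu - m)\psi = -\left(\gamma^\nu\gamma^\mu\partial_\nu\partial_\mu + m^2\right)\psi = -\left(\partial^\mu\partial_\mu + m^2\right)\psi = 0,$$
where the symmetry of $\partial_\nu\partial_\mu$ was used to replace $\gamma^\nu\gamma^\mu$ by $\tfrac12\{\gamma^\nu,\gamma^\mu\} = \eta^{\nu\mu}$. The resulting operator is proportional to the identity in spinor space, so every component satisfies the Klein-Gordon equation separately.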

"Now, what about the terms in the equations with fourth or higher derivatives?"

"They're actually present in the equations, too."

Neither the Standard Model (SM) Lagrangian nor the Einstein-Hilbert (EH) action contains higher than second-order temporal derivatives. These are the actions which are experimentally tested, and these two theories are the most fundamental scientific theories we have. We know that there is physics beyond these two theories, and people have good candidates for the underlying theories, but physics is an experimental science and these theories are not experimentally verified. The effective SM Lagrangian (a Lorentz-invariant theory with the gauge symmetries of the SM but with irrelevant operators) does contain higher than second-order temporal derivatives. The same holds for the EH action plus higher-order scalars. Two clarifications are, however, in order:

  • These irrelevant terms are not experimentally verified. Almost everyone is sure that neutrino mass terms (which are irrelevant operators but do not contain higher order derivatives) exist in order to explain neutrino oscillations, but so far we do not have direct measurements of neutrino masses thus we are not allowed to claim that these terms exist. Summarizing: the effective SM is not a verified theory.

  • These irrelevant terms originate from integrating out fields with a mass much greater than the energy scale we are interested in. This could be the case for the neutrino mass term and a right-handed neutrino. For instance, in quantum electrodynamics, if one is interested in the physics at energies much lower than the electron mass, one can integrate out the electron field (or nature integrates it out), obtaining an effective Lagrangian (the Euler-Heisenberg Lagrangian) with higher-order derivative terms like $\frac{\alpha ^2}{m_e^4}~F_{\mu\nu}~F^{\mu\nu}~F_{\rho\sigma}~F^{\rho\sigma}$ (which contains four derivatives). These terms are suppressed by coupling constants ($\alpha$) and high-energy scales ($m_e$). There are terms with an arbitrarily high number of derivatives, and they come from inverses of differential operators. As a result, the higher-order derivatives do not enter the zeroth-order equation of motion.

However, in a fundamental theory (in contrast to an effective one), finite higher-order derivatives are not allowed in interacting theories (there are some exceptions with gauge fields, but for example a generic $f(R)$ theory of gravity is inconsistent). The reason is that those theories are not bounded from below (see Why are there only derivatives to the first order in the Lagrangian?) or, in some quantizations, contain negative-norm states. These terms are among the forbidden operators in Gell-Mann's totalitarian principle.

In summary, evolution equations are of order two because of the existence of a normalizable vacuum state and unitarity (including here the fact that physical states must have positive norm). Newton was right when he wrote $$\ddot x=f(x,\dot x)$$

Diego Mazón
4

Weinberg gives a pretty good answer for this in Volume 1 of his QFT opus: 2nd order differential equations appear in the field theories relevant to particle physics because of the relativistic mass-shell condition $p^2 = m^2$.

If we have a quantum field $\phi$, and we think of its Fourier modes $\phi(p)$ as creating particles with 4-momentum $p$, then the mass-shell condition provides a constraint: $(p^2 - m^2)\phi(p) = 0$, because we don't want particle creation off-shell. Fourier-transform this back to position space, and you find that $\phi$ has to obey a 2nd order differential equation.
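
Explicitly, assuming the convention $\phi(x) = \int \frac{d^4p}{(2\pi)^4}\, e^{-ip\cdot x}\,\phi(p)$ with metric signature $(+,-,-,-)$ (a choice made here for illustration), each derivative $\partial_\mu$ acting on $\phi(x)$ pulls down a factor $-ip_\mu$, so $\partial^\mu\partial_\mu \leftrightarrow -p^2$ and
$$(p^2 - m^2)\,\phi(p) = 0 \quad\Longleftrightarrow\quad \left(\partial^\mu\partial_\mu + m^2\right)\phi(x) = 0,$$
which is the Klein-Gordon equation. The order of the differential equation is two because the mass-shell condition is quadratic in $p$.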

user1504
  • This doesn't apply to general relativity, where nevertheless equations are of second order. – Arnold Neumaier Nov 12 '12 at 16:39
  • It does tell you that the linearized Einstein equations should be second order. And it explains why the renormalization flow should be defined in such a way that the kinetic term is fixed, which is an important assumption implicit in Lubos' answer. – user1504 Nov 12 '12 at 16:42
2

Occasionally higher-order differential equations do come up: the equations of motion for a particle experiencing the Abraham-Lorentz force are third-order. (Although to be fair, this is a big part of the reason why a lot of physicists dislike the concept of the Abraham-Lorentz force!)

tparker
1

It was already noted in other answers that fields in physics are not always governed by second order partial differential equations (PDEs). It was said, e.g., that the Dirac equation is a first-order PDE. However, the Dirac equation is a system of PDEs for four complex functions - components of the Dirac spinor. It was also mentioned that any PDE is equivalent to a system of PDEs of the first order.

I mentioned previously that the Dirac equation in electromagnetic field is generally equivalent to a fourth-order partial differential equation for just one complex component, which component can also be made real by a gauge transform (http://akhmeteli.org/wp-content/uploads/2011/08/JMAPAQ528082303_1.pdf (my article published in the Journal of Mathematical Physics) or http://arxiv.org/abs/1008.4828 ). Let me also mention my article http://arxiv.org/pdf/1111.4630.pdf , where it is shown that the equations of spinor electrodynamics (the Dirac-Maxwell electrodynamics) are generally equivalent to a system of PDEs of the third order for complex four-potential of electromagnetic field (producing the same electromagnetic field as the usual real four-potential of electromagnetic field).

akhmeteli
0

(adding comment as answer)

Actually, all of classical mechanics (and quantum mechanics) can be formulated with only 1st-order derivatives (at the expense of adding extra dimensions, i.e. phase space, the Hamiltonian formalism).

This indeed makes for a dynamic description of a physical system. Furthermore any order of differential equations can be made into 1st order by the same token.
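
The reduction to first order is mechanical. As a minimal sketch (the helper names `as_first_order`, `rk4_step` and `integrate` are hypothetical, and the harmonic oscillator $\ddot x = -x$ is just a convenient test case), a second-order equation $\ddot x = f(x,\dot x)$ becomes a first-order system for the pair $(x, v)$ and can be handed to any first-order integrator:

```python
import math


def as_first_order(f):
    """Turn x'' = f(x, x') into the first-order system (x, v)' = (v, f(x, v))."""
    def rhs(state):
        x, v = state
        return (v, f(x, v))
    return rhs


def rk4_step(rhs, state, dt):
    """One classical fourth-order Runge-Kutta step for an autonomous system."""
    k1 = rhs(state)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt * (a + 2 * b + 2 * c + d) / 6
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(rhs, state, t_end, steps):
    """Advance the state from t = 0 to t = t_end in equal steps."""
    dt = t_end / steps
    for _ in range(steps):
        state = rk4_step(rhs, state, dt)
    return state


# Harmonic oscillator x'' = -x with x(0) = 1, x'(0) = 0; exact solution is cos(t).
oscillator = as_first_order(lambda x, v: -x)
x, v = integrate(oscillator, (1.0, 0.0), math.pi, 1000)
print(x, v)  # should be close to cos(pi) and -sin(pi)
```

The price of the first-order form is visible in the state: two numbers per degree of freedom instead of one, which is the phase-space doubling mentioned above.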

Non-linear dynamics (i.e. chaos theory) makes heavy use of only 1st-order dynamical laws in its studies.

Adding more orders to dynamical laws requires more information (initial conditions) and becomes intractable to solve explicitly or algorithmically in most cases.

Furthermore, first-order dynamical laws do provide (at least) good approximations, or even complete coverage, of the dynamical evolution of a system under study.

Nikos M.
0

A second-order wave equation can be factorized, resulting in two 1st-order wave equations (one-way wave equations). Second-order wave equations are ambiguous with respect to the wave propagation direction because the wave velocity appears squared, whereas one-way wave equations have a predefined propagation direction. The second-order wave equation, or "two-way wave equation", rather describes a standing wave field.
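
In one spatial dimension with constant wave speed $c$ (assumptions made here to keep the sketch short), the factorization reads
$$\left(\partial_t^2 - c^2\,\partial_x^2\right)u = \left(\partial_t - c\,\partial_x\right)\left(\partial_t + c\,\partial_x\right)u = 0.$$
Solutions of $(\partial_t + c\,\partial_x)u = 0$ have the form $u = f(x - ct)$ and travel in the $+x$ direction, while solutions of $(\partial_t - c\,\partial_x)u = 0$ have the form $u = g(x + ct)$ and travel in the $-x$ direction; the second-order equation admits both.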

https://en.wikipedia.org/wiki/One-way_wave_equation#:~:text=A%20one%2Dway%20wave%20equation,propagating%20wave%20travelling%20in%20a

0

A little late to the party, but let me add something to the previous answers.

The top reason should indeed be the Ostrogradsky instability mentioned already. But there are others.

In stochastic differential equations, we often encounter the Fokker-Planck equation, while a description with a higher-order equation can be obtained from the Kramers-Moyal expansion as well. One reason for truncating the expansion after the second-order term is the Pawula theorem: The expansion ends after the second term or has infinitely many terms. If you do truncate such an expansion after a finite number of terms, solving the resulting equation can yield negative probability densities. (see the classic book by Risken!) Pawula calls this a 'logical inconsistency', which I think is wrong, given that the expansion is asymptotic and a small error term changing a non-negative value to a small-magnitude negative value does not constitute an inconsistency.
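
For reference, in a common convention (a single variable $x$, with the coefficients $D^{(n)}$ absorbing the factor $1/n!$; this notation is an assumption, not taken from the text above), the Kramers-Moyal expansion for the probability density $P(x,t)$ reads
$$\frac{\partial P(x,t)}{\partial t} = \sum_{n=1}^{\infty}\left(-\frac{\partial}{\partial x}\right)^{n}\left[D^{(n)}(x,t)\,P(x,t)\right],$$
and keeping only $n=1$ (drift) and $n=2$ (diffusion) gives the Fokker-Planck equation
$$\frac{\partial P}{\partial t} = -\frac{\partial}{\partial x}\left[D^{(1)}P\right] + \frac{\partial^2}{\partial x^2}\left[D^{(2)}P\right].$$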

Also, in material modelling there are higher-order materials, as in strain gradient elasticity (works by Mindlin and Toupin in the 60s, ongoing research today...). In comparison, classical elasticity is obtained by removing the higher-order terms, and for most intents and purposes the classical theory will already be enough. The effects of higher-order models only come into play in special scenarios, for example when modelling small-scale phenomena. The fact that you will mostly work with second-order equations is thus down to them being the simplest successful approximation.

And now to give some examples for higher-order equations (not all field equations in physics are of second order!): https://en.wikipedia.org/wiki/Cahn%E2%80%93Hilliard_equation, https://en.wikipedia.org/wiki/Euler%E2%80%93Bernoulli_beam_theory, https://en.wikipedia.org/wiki/Kramers%E2%80%93Moyal_expansion, https://en.wikipedia.org/wiki/Korteweg%E2%80%93De_Vries_equation

kricheli