There are indeed nonconvex Lagrangians, and they are a problem for the Legendre transformation because they make it multiple valued (indeed, a convex but not strictly convex Lagrangian will throw up the same problem). If one cannot get rid of this multiple valuedness by splitting the problem into convex and concave "sectors" (piecewise analysis where the domain of interest is restricted, as in the first paragraph of Qmechanic's answer) or by imposing constraints, then it's game over for the Hamiltonian approach. Indeed, multiple valuedness of the Legendre transformation implies nonuniqueness of the solution of the Euler-Lagrange equation. I'd like to talk about a famous convex, but not strictly convex, example from my own field and what people do about the problems it raises. It provides an interesting illustration of how difficulties with convexity arise (as noted by the OP, they are tantamount to multiple valuedness of the Legendre transformation), and there are at least two common solutions to this particular problem whose appropriateness differs between fields of physics! Your solution depends on what you want to achieve with your Hamiltonian.
This is the calculation of geodesics in a (semi) Riemannian manifold, so that:
$$\mathcal{L} = \sqrt{g(X)(\dot{X},\,\dot{X})} = \sqrt{g_{ij}(X)\,\dot{x}^i\,\dot{x}^j}\tag{1}$$
This is also the same problem as the calculation of rays from Fermat's least time principle, if you include optical density (refractive index) information in the metric tensor. Thus ray optics in an isotropic medium is the geometry of a conformally flat manifold (since $g$'s matrix in Cartesian co-ordinates is the squared refractive index times the identity); anisotropic mediums give more general geometries.
This example is historically important, not only for General Relativity but also because ray optics - this very problem - was the field that piqued Hamilton's interest in these matters.
The Bad and The Ugly
The Lagrangian is convex, but not strictly so. Consider the linear path:
$$\sigma(t) = t\,\dot{X}_0\tag{2}$$
in the tangent space, i.e. where one moves by scaling the tangent vector $\dot{X}_0$. In response, the Lagrangian in (1) also scales linearly (it is positively homogeneous of degree one), so the linear path lies exactly in the graph / on the edge of the epigraph of the Lagrangian. Thus, the conjugate momentum $\partial_{\dot{X}}\,\mathcal{L}$, being the one-form:
$$P(\cdot) = \frac{g(\dot{X},\,\cdot)}{\sqrt{g(\dot{X},\,\dot{X})}}\tag{3}$$
is independent of $t$ as our point moves according to (2). Thus $P$ is a highly many-to-one function on the tangent space at any point: every tangent vector of the form $t\,\dot{X}_0$ with $t>0$ has the same value in (3). The Legendre transformation fails to pick out a unique $\dot{X}$ for each $P$. Not surprisingly, therefore, if one does the Legendre transformation, one gets:
$$\mathcal{H} = P(\dot{X}) - \mathcal{L}=\frac{g(\dot{X},\,\dot{X})}{\sqrt{g(\dot{X},\,\dot{X})}}-\mathcal{L}=0\quad\quad \text{OMG!!}\tag{4}$$
The Legendre transformation in this case is clearly many to one (try saying that aloud with a straight face).
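To see this concretely, here is a small symbolic check (my own illustration, not part of the original argument; the diagonal toy metric is an arbitrary stand-in) that the momentum (3) is unchanged when $\dot{X}$ is rescaled by any $t>0$ and that the would-be Hamiltonian (4) vanishes identically:

```python
import sympy as sp

# Toy 2D example: diagonal metric G = diag(f(x,y), f(x,y)) (an arbitrary choice).
x, y, xd, yd, t = sp.symbols('x y xdot ydot t', positive=True)
f = 1 + x**2 + y**2                       # stand-in for e.g. a squared refractive index
G = sp.diag(f, f)                         # matrix of the metric tensor
Xd = sp.Matrix([xd, yd])

L = sp.sqrt((Xd.T * G * Xd)[0, 0])        # square-root Lagrangian, eq. (1)
P = sp.Matrix([sp.diff(L, v) for v in (xd, yd)])   # conjugate momenta, eq. (3)

# Rescale the velocity by t > 0: the momenta are unchanged -> the Legendre map is many-to-one.
P_scaled = P.subs({xd: t * xd, yd: t * yd})
print(sp.simplify(P_scaled - P))          # zero matrix

# The would-be Hamiltonian P(Xdot) - L vanishes identically, eq. (4).
H = (P.T * Xd)[0, 0] - L
print(sp.simplify(H))                     # 0
```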
Let's look at this another way. Even the solution of the Euler-Lagrange equation for this Lagrangian - quite aside from wishing for a Hamiltonian counterpart - is fraught in this problem (though it can be done with some care). The Hessian matrix of $\mathcal{L}$ with respect to $\dot{X}$, i.e. the Jacobian of the map $\dot{X} \mapsto P = \partial_{\dot{X}} \mathcal{L}$, is:
$$h_{ij} = \partial_{\dot{x}^i} \partial_{\dot{x}^j} \mathcal{L} = \left(g(\dot{X},\,\dot{X})\,g_{ij} - g_{i k}\,\dot{x}^k\,g_{j \ell}\,\dot{x}^\ell\right)\,\mathcal{L}^{-3}\tag{5}$$
For our purposes, (5) is more transparent if we put it in matrix notation (here $G$ is the matrix of the metric tensor):
$$H = \frac{G}{\sqrt{\dot{X}^T\,G\,\dot{X}}}\,\left(\mathrm{id} - \frac{\dot{X}\,\dot{X}^T\,G}{\dot{X}^T\,G\,\dot{X}}\right)\tag{6}$$
The rightmost term in the brackets, $\dot{X}\,\dot{X}^T\,G/(\dot{X}^T\,G\,\dot{X})$, is recognized to be the projector onto the line spanned by $\dot{X}$, so the Hessian matrix is singular on every tangent space to the configuration space, with kernel given by the line $\{t\,\dot{X}:\,t\in\mathbb{R}\}$. A scale multiple of the Hessian matrix is the co-efficient of $\ddot{X}$ in the Euler-Lagrange equation, so the equation cannot determine $\ddot{X}$ uniquely and it has a whole family of solutions.

Lastly, we can look at the action integral itself and what happens to it when we rescale the path parameter $\tau$. Suppose the action is computed over the interval $\tau\in[0,\,1]$, and we introduce a transformation $\tau=\zeta(\sigma)$, where $\zeta$ is any smooth, monotonic function with $\zeta(0)=0,\,\zeta(1) = 1$, and write $Y(\sigma) = X(\zeta(\sigma))$, $\dot{Y}(\sigma) = \mathrm{d}_\sigma X(\zeta(\sigma))$; then:
$$\mathcal{S} = \int_{\tau=0}^1\,\sqrt{g(\dot{X}(\tau),\,\dot{X}(\tau))}\,\mathrm{d}\tau = \int_{\sigma=0}^1\,\sqrt{g\left(\frac{\dot{Y}(\sigma)}{\frac{\mathrm{d}\zeta}{\mathrm{d}\sigma}},\,\frac{\dot{Y}(\sigma)}{\frac{\mathrm{d}\zeta}{\mathrm{d}\sigma}}\right)}\,\frac{\mathrm{d}\zeta}{\mathrm{d}\sigma}\,\mathrm{d}\sigma \\= \int_{\tau=0}^1\,\sqrt{g(\dot{Y}(\tau),\,\dot{Y}(\tau))}\,\mathrm{d}\tau\tag{7}$$
Thus if $X(\tau)$ is an extremal path, then so is $X(\zeta(\tau))$ for any smooth, monotonic function with $\zeta(0)=0;\,\zeta(1)=1$. Intuitively, if we drive from A to B along the shortest (or longest) road, we can drive it with any speed versus time graph we choose, but we've still driven the extremal road.
The geodesic flow in the tangent bundle therefore has a whole sheet of flowlines over each geodesic through configuration space: at any given point of the geodesic, the preimage under the projection onto configuration space is a ray of tangent vectors, all of them scale multiples of one another.
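Both observations are easy to check numerically. The sketch below (an illustrative check of my own, not part of the original argument; the metric, velocity and test curve are arbitrary) builds the Hessian (6) at a sample point and confirms it annihilates $\dot{X}$, and then verifies that a discretised version of the length integral in (7) is insensitive to a reparametrisation $\zeta$:

```python
import numpy as np

# Sample positive-definite metric and velocity; arbitrary illustrative values.
G = np.array([[2.0, 0.3],
              [0.3, 1.0]])
Xdot = np.array([0.7, -1.2])

# Hessian (6) of the square-root Lagrangian with respect to the velocity components.
q = Xdot @ G @ Xdot
H = (G / np.sqrt(q)) @ (np.eye(2) - np.outer(Xdot, Xdot @ G) / q)
print(np.linalg.matrix_rank(H), H @ Xdot)     # rank 1, and H @ Xdot is (numerically) zero

# Reparametrisation invariance (7): the discretised length of one curve computed
# with two different parametrisations agrees to discretisation error.
def length(path, N=20001):
    s = np.linspace(0.0, 1.0, N)
    X = path(s)
    dX = np.gradient(X, s, axis=0)
    speed = np.sqrt(np.einsum('ni,ij,nj->n', dX, G, dX))
    return np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(s))

curve = lambda s: np.stack([np.cos(s), np.sin(s)], axis=-1)   # an arbitrary test curve
zeta = lambda s: s**3                                         # zeta(0) = 0, zeta(1) = 1, monotonic
print(length(curve), length(lambda s: curve(zeta(s))))
```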
The Good
Let's look at the solution to this problem in semi Riemannian geometry. Here we cheat a bit and extremize the action integral:
$$\int_0^1 g(\dot{X},\,\dot{X})\,\mathrm{d}\tau\tag{8}$$
i.e. we simply forget about the square root! (I suspect this crazy idea was originally tried through sheer desperation). Now we look at the Cauchy-Schwarz inequality for a relationship between the "real" and "cheat" actions:
$$\int_0^1\,\sqrt{g(\dot{X},\,\dot{X})}\,\cdot 1\,\mathrm{d}\tau \leq \sqrt{\int_0^1\,g(\dot{X},\,\dot{X})\,\mathrm{d}\tau}\,\sqrt{\int_0^1\,1\,\mathrm{d}\tau} = \sqrt{\int_0^1\,g(\dot{X},\,\dot{X})\,\mathrm{d}\tau}\tag{9}$$
with equality if and only if $\sqrt{g(\dot{X},\,\dot{X})}$ is constant. We have already seen that if $X(\tau)$ minimizes the leftmost integral in (9), then so does $X(\zeta(\tau))$ for any smooth, monotonic $\zeta$ with $\zeta(0)=0,\,\zeta(1)=1$. So, for the minimizing $X$, we choose the $\zeta$ that makes $\sqrt{g(\dot{Y},\,\dot{Y})}$ constant and equal to the average speed of $X$. Cauchy-Schwarz saturates in this case, so the constant-speed path that minimizes (8) also minimizes the leftmost integral in (9) (the two minimum values being related by a square), given that the integrand is positive.

On the other hand, if we are seeking to maximize the original action, which is the case for timelike geodesics in a Lorentzian manifold, then we simply go ahead and maximize (8). By "coincidence", we find that the maximization happens when $g(\dot{X},\,\dot{X})$ is constant, so that the bound represented by (9) is saturated in this case, and we have again found one of the paths that extremizes $\int_0^1\,\sqrt{g(\dot{X},\,\dot{X})}\,\mathrm{d}\tau$. Having found the one solution that extremizes the left hand side of (9), we can characterize all the others through a transformation $\tau=\zeta(\sigma)$ with $\zeta(0)=0,\,\zeta(1)=1$. Or, in General Relativity, we ignore all the other solutions, because we postulate that the physical one is the one along which an observer's proper time $\tau$ advances uniformly: the four-velocity then has constant norm, the acceleration is Minkowski-orthogonal to the velocity and $\tau$ is thus affine. So we actually get more than the shape of the geodesic path with this approach; we also get an affine path parameterization.
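As a concrete check of the Cauchy-Schwarz argument (again my own illustration, not from the original derivation; a flat metric, an arbitrary circular arc and arbitrary choices of $\zeta$ are used for simplicity), the sketch below evaluates the length (the leftmost integral in (9)) and the "cheat" action (8) for one fixed path under several reparametrisations:

```python
import numpy as np

# One fixed path (a circular arc in the plane, flat metric) traversed with several
# different parametrisations zeta: the length is the same every time, while the
# "cheat" action (8) is smallest, and equal to length squared, only for the
# constant-speed (affine) parametrisation - the Cauchy-Schwarz saturation in (9).
def length_and_energy(zeta, N=20001):
    s = np.linspace(0.0, 1.0, N)
    X = np.stack([np.cos(zeta(s)), np.sin(zeta(s))], axis=-1)
    dX = np.gradient(X, s, axis=0)
    speed2 = np.einsum('ni,ni->n', dX, dX)          # flat metric: g = identity
    trap = lambda f: np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s))
    return trap(np.sqrt(speed2)), trap(speed2)

for zeta in (lambda s: s, lambda s: s**2, lambda s: np.sin(0.5 * np.pi * s)):
    L, E = length_and_energy(zeta)
    print(f"length = {L:.5f}  cheat action = {E:.5f}  length^2 = {L*L:.5f}")
```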
So we now easily get our Hamiltonian formulation: if we put $\mathcal{L}=\frac{1}{2}\,g(\dot{X},\,\dot{X})$, then the conjugate momentum $P=\dot{X}_\flat$, i.e. $p_k = g_{kj}\,\dot{x}^j$, is simply the covector found by lowering the index of $\dot{X}$, and so we have:
$$\mathcal{L} = \mathcal{H} = \frac{1}{2} g(\dot{X},\,\dot{X}) = \frac{1}{2}\,g_{ij}\dot{x}^i\,\dot{x}^j = \frac{1}{2} g^{\sharp\kern+1.4pt\sharp}(P,\,P)=\frac{1}{2}\,g^{ij}\,p_i\,p_j\tag{11}$$
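Continuing the toy symbolic example from above (an illustration only; the diagonal metric is an arbitrary stand-in), one can check that the velocity-momentum map is now invertible and that the Legendre transform indeed lands on (11):

```python
import sympy as sp

# Same toy diagonal metric as before: with L = (1/2) g(Xdot, Xdot) the velocity-to-
# momentum map is now invertible, and the Legendre transform reproduces (11).
x, y, xd, yd = sp.symbols('x y xdot ydot', positive=True)
f = 1 + x**2 + y**2
G = sp.diag(f, f)
Xd = sp.Matrix([xd, yd])

L = sp.Rational(1, 2) * (Xd.T * G * Xd)[0, 0]
P = sp.Matrix([sp.diff(L, v) for v in (xd, yd)])        # p_k = g_{kj} xdot^j

Xd_of_P = G.inv() * P                                    # a unique Xdot for each P
H = sp.simplify((P.T * Xd)[0, 0] - L)                    # Legendre transform
print(sp.simplify(Xd_of_P - Xd))                         # zero: the inversion is consistent
print(sp.simplify(H - sp.Rational(1, 2) * (P.T * G.inv() * P)[0, 0]))   # zero: H = (1/2) g^{ij} p_i p_j
```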
We now see another reason why this crazy solution is beloved of physicists: the Lagrangian and Hamiltonian in (11) are those of the corresponding formulations of Newtonian mechanics for a free particle. This is therefore a most pleasing, natural analogy when we are thinking about a particle that is "coasting" in an inertial frame. The Euler-Lagrange equation for (11) is readily shown to be $\ddot{x}^k + \Gamma^k_{ij}\,\dot{x}^i\,\dot{x}^j=0$; the analogy with a free particle makes it very satisfying to witness that Newton's second law becomes $-g^{kj}\,\partial_{x^j} V = F^k = m\left(\ddot{x}^k + \Gamma^k_{ij}\,\dot{x}^i\,\dot{x}^j\right)$ when one puts a potential $V(x)$ into the mix. It's a thoroughly beautiful physical analogy. Hamilton's equations for the geodesic are:
$$\dot{x}^k = g^{kj}\,p_j;\quad \dot{p}_k = -\frac{1}{2}\,\left(\partial_{x^k}\,g^{ij}\right)\,p_i\,p_j\tag{12}$$
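To see (12) in action, here is a minimal numerical sketch (my own illustration, not from the original answer) that integrates the geodesic flow for the conformally flat, isotropic-optics metric $g_{ij}=n(x)^2\,\delta_{ij}$ mentioned earlier; the refractive-index profile and launch conditions are made up, and scipy's `solve_ivp` does the integration. The conserved Hamiltonian is a useful sanity check:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Geodesic flow via Hamilton's equations (12) for a conformally flat metric
# g_ij = n(x)^2 delta_ij (the isotropic ray-optics case); n below is a made-up
# smooth refractive-index profile, chosen only for illustration.
def n2(x, y):
    return (1.5 - 0.4 * np.exp(-(x**2 + y**2)))**2

def grad_n2(x, y, h=1e-6):
    return np.array([(n2(x + h, y) - n2(x - h, y)) / (2 * h),
                     (n2(x, y + h) - n2(x, y - h)) / (2 * h)])

def hamilton_rhs(tau, state):
    x, y, px, py = state
    xdot = np.array([px, py]) / n2(x, y)            # xdot^k = g^{kj} p_j, with g^{ij} = delta_ij / n^2
    # pdot_k = -(1/2) (d_k g^{ij}) p_i p_j, and d_k g^{ij} = -delta_ij d_k(n^2) / n^4
    pdot = 0.5 * (px**2 + py**2) * grad_n2(x, y) / n2(x, y)**2
    return np.concatenate([xdot, pdot])

def hamiltonian(state):
    x, y, px, py = state
    return 0.5 * (px**2 + py**2) / n2(x, y)

state0 = np.array([-2.0, 0.3, 1.4, 0.1])            # arbitrary launch point and momentum
sol = solve_ivp(hamilton_rhs, (0.0, 5.0), state0, rtol=1e-9, atol=1e-12)
print(hamiltonian(sol.y[:, 0]), hamiltonian(sol.y[:, -1]))   # H is conserved along the flow
```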
Much of the time, this solution is perfectly acceptable in optics, too. Naturally, it will handle all the calculations of rays in smoothly inhomogeneous mediums. In optics, the affine parameter corresponding to proper time in GR is the optical path length, or the total phase delay along the path.
What seems like a cheat at first leads to a solution that is very elegant, smooth and easy and, for General Relativity and indeed most geometry, thoroughly complete.
However, this elegant solution has an awkward property in optics when we hit abrupt interfaces between dielectric mediums, which is an essential situation to analyze when we're talking about lenses and mirrors, for example. The Hamiltonian approach needs a Lagrangian that is at least a $C^2$ function of the state, and this assumption breaks down at such an abrupt interface. OK, so we use the Hamiltonian approach away from the interface, and we work out what transformation the interface imparts to the ray state $(X,\,P)$. But if we do this, then Snell's law shows us that:
The transverse components of the optical momentums are continuous across the interface, whereas the normal component necessarily is not.
That is, the transformation wrought on the optical state $(X,\,P)$ by the ray's passage across the abrupt interface is not a symplectomorphism. The same is true of mirrors with this approach: $X$ is continuous across the interface, whereas $P\mapsto -P$, so the determinant of this linear transformation is $-1$ in three dimensions. The easiest way to understand all this is to note that the Hamiltonian in (11) equals half the constant squared speed of the point in $(\text{optical phase per unit time})^2$; we can arbitrarily set this to $1/2$ units - we can choose any constant as long as we are consistent (scaled and shifted affine parameters are still affine). If we adopt this convention, and if we use co-ordinates which are locally Cartesian at the interface, with the $x$-$y$ plane parallel to and the $z$ direction normal to the interface, then the optical momentums can be shown to be $p_k = n\,\gamma_k$, where $n$ is the refractive index at the point where the ray meets the interface and $\gamma_k$ are the direction cosines that the ray's direction makes with the orthonormal axes. From here one can readily prove the above assertion about Snell's law.
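A quick numerical illustration of that assertion (my own sketch; the two indices and the incidence angle are arbitrary values): writing the momentum as $p_k = n\,\gamma_k$ and applying Snell's law shows that the tangential components match across the interface while the normal component does not.

```python
import numpy as np

# With p = n * (direction cosines), the components of p tangential to a flat
# interface are continuous across it, while the normal component is not.
# n1, n2 and the incidence angle are arbitrary illustrative values.
n1, n2 = 1.0, 1.5
theta1 = np.radians(35.0)                         # angle of incidence from the interface normal (z)
theta2 = np.arcsin(n1 * np.sin(theta1) / n2)      # Snell's law

p_in  = n1 * np.array([np.sin(theta1), 0.0, np.cos(theta1)])   # p_k = n * gamma_k
p_out = n2 * np.array([np.sin(theta2), 0.0, np.cos(theta2)])

print(p_in, p_out)
# Tangential (x, y) components agree: n1*sin(theta1) == n2*sin(theta2) is Snell's law itself.
# The normal (z) component jumps from n1*cos(theta1) to n2*cos(theta2).
```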
This situation brings us to the commoner method of handling singular Legendre transformations - the use of constraints to get rid of "The Bad and The Ugly" redundancy that we discussed above. The use of $\mathcal{L} =\frac{1}{2}\,g(\dot{X},\,\dot{X})$ can be thought of as belonging to this idea if we think of it as finding the geodesic together with the constraint that our path parameter should be affine, so that $\mathcal{L}=\mathrm{const}$. In optics, when there are lenses and mirrors involved, the common solution is to constrain the speed along the path so that one of the co-ordinates $x^i$, say $x^3$, is itself the path parameter, i.e. $\dot{x}^3=1$. The most obvious illustration of this idea is where the optical system has an optical axis: we measure the $z$ co-ordinate along this axis, and the $z$ co-ordinate is the path parameter. This culls $x^3$ and $p_3$ from the Lagrangian and Hamiltonian, and now the phase space is four rather than six dimensional.

More generally, we use generalized co-ordinates such that $\partial_1$ and $\partial_2$ are orthogonal to $\partial_3$ and surfaces of constant $x^3$ are aligned with the dielectric interfaces. Clearly we can do this: we can use "voltage" co-ordinates (my name, not used in the literature) wherein the lens surfaces are equipotential surfaces, i.e. surfaces of constant $x^3$, in an electrostatics problem, the directions of increasing $x^1$ and $x^2$ lie within the equipotential surfaces, and the third co-ordinate is the voltage at any point. If we do this, then the transverse components of the optical momentum are still continuous across each interface. Since the $x^1$ and $x^2$ co-ordinates are also continuous, the dielectric interface now imparts a symplectomorphism - indeed the identity operator - on the optical state in this four-dimensional approach. In Cartesian co-ordinates, with $z$ as the path parameter, this approach looks like:
$$\begin{array}{ll}\mathcal{L} = n\,\sqrt{1+\dot{x}^2+\dot{y}^2} & \mathcal{H}=-\sqrt{n^2-p_x^2-p_y^2}\\
p_x = \dfrac{n\,\dot{x}}{\sqrt{1+\dot{x}^2+\dot{y}^2}} & p_y = \dfrac{n\,\dot{y}}{\sqrt{1+\dot{x}^2+\dot{y}^2}}\end{array}\tag{13}$$
and the graphs of both $\mathcal{L}$ and $\mathcal{H}$ are perfectly well behaved: $\mathcal{L}$ is now strictly convex in $(\dot{x},\,\dot{y})$, so the Legendre transformation is single valued. The path parameter is not affine, though, so you can't easily use this approach to calculate where the phasefronts are.
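Here is a minimal ray-tracing sketch using (13) (an illustration under assumptions of my own: a made-up gradient-index profile, numerical differentiation of $n$, and scipy's `solve_ivp`); it shows the reduced, four-dimensional Hamiltonian in use with $z$ as the path parameter.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Ray tracing with z as the path parameter using the reduced Hamiltonian of (13),
# H = -sqrt(n^2 - px^2 - py^2); the parabolic index profile is a made-up example
# (roughly a gradient-index fibre), not taken from the discussion above.
def n(x, y, z):
    return 1.6 - 0.1 * (x**2 + y**2)

def rhs(z, state, h=1e-6):
    x, y, px, py = state
    root = np.sqrt(n(x, y, z)**2 - px**2 - py**2)
    xdot, ydot = px / root, py / root                 # dx/dz = dH/dpx, dy/dz = dH/dpy
    dndx = (n(x + h, y, z) - n(x - h, y, z)) / (2 * h)
    dndy = (n(x, y + h, z) - n(x, y - h, z)) / (2 * h)
    # dpx/dz = -dH/dx = n * (dn/dx) / root, and similarly for y
    return [xdot, ydot, n(x, y, z) * dndx / root, n(x, y, z) * dndy / root]

state0 = [0.2, 0.0, 0.0, 0.05]                        # launch position and transverse momenta
sol = solve_ivp(rhs, (0.0, 20.0), state0, max_step=0.01)
print(sol.y[0, ::500])                                # x oscillates about the axis, as expected of a GRIN ray
```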
Sometimes in optics we use both approaches: if you want to calculate where the wavefronts of a field diverging from a source are, then you clearly need the affine path parameter - and thus the six-dimensional, $\mathcal{L} = \frac{1}{2}\,g(\dot{X},\,\dot{X})$ approach - to know where the surfaces of constant phase cross the geodesics. To do Ray Transfer Analysis, or if you need to avail yourself of the powerful optical invariant or étendue notions (both invariant differential forms on symplectic optical phase space), then you need all the interfaces in the system to impart symplectomorphisms on the optical state, and so one uses the four-dimensional approach.