
I am looking for a general method to obtain derivative rules for a constrained matrix with respect to its matrix elements.

In the case of a symmetric matrix $S_{ij}$ (with $S_{ij}=S_{ji}$), one way to do that is the following (see Variation of the metric with respect to the metric). We say that a variation of a matrix element $\delta S_{ij}$ is the same as that of $\delta S_{ji}$, and thus $$ \delta S_{ij}=\frac{\delta S_{ij}+\delta S_{ji}}{2}=\frac{\delta_{ik}\delta_{jl}+\delta_{il}\delta_{jk}}{2}\delta S_{kl}=\mathcal S_{ij;kl}\delta S_{kl}. $$ The tensor $\mathcal S_{ij;kl}$ has the nice property that $\mathcal S_{ij;kl}\mathcal S_{kl;mn}=\mathcal S_{ij;mn}$. One then says that $$ \frac{\delta S_{ij}}{\delta S_{kl}}=\mathcal S_{ij;kl}. $$
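(As a sanity check, here is a quick numpy sketch of these properties, nothing more than a numerical restatement of the formula above; the dimension $N=3$ and the random variation are arbitrary choices.)

```python
import numpy as np

N = 3
I = np.eye(N)
# S_{ij;kl} = (delta_ik delta_jl + delta_il delta_jk) / 2
S = 0.5 * (np.einsum('ik,jl->ijkl', I, I) + np.einsum('il,jk->ijkl', I, I))

# Idempotence: S_{ij;kl} S_{kl;mn} = S_{ij;mn}
assert np.allclose(np.einsum('ijkl,klmn->ijmn', S, S), S)

# Acting on an arbitrary variation it returns the symmetrized part,
# so it leaves a symmetric variation untouched
dM = np.random.default_rng(0).normal(size=(N, N))
assert np.allclose(np.einsum('ijkl,kl->ij', S, dM), (dM + dM.T) / 2)
```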

I must admit that it is not quite clear to me why this is the correct procedure (it seems rather arbitrary, although it obviously works for computing derivatives of a function of a symmetric matrix). In particular, it is not clear to me how to generalize it when the constraint is different.

For example, let's take the set of matrices $O$ belonging to the group $SO(N)$. Is there a way to write $\frac{\delta O_{ij}}{\delta O_{kl}}$ in terms of a tensor $\mathcal B_{ij;kl}$ with all the same nice properties?

In the case of $SO(2)$, this seems quite easy, since then $O_{ji}=(-1)^{i+j}O_{ij}$, and one finds in that case $$ \frac{\delta O_{ij}}{\delta O_{kl}}=\frac{\delta_{ik}\delta_{jl}+(-1)^{i+j}\delta_{il}\delta_{jk}}{2}, $$ which indeed does the job. Note however that I haven't used the defining property of $SO(N)$, that is $O O^T=1$, and I am not sure whether this is relevant...

Already in the case of $SO(3)$, it does not seem to be easy to find the equivalent tensor...


Side note: using the defining property of $SO(2)$, one can massage the formulas to obtain $$ \frac{\delta O_{ij}}{\delta O_{kl}}=-O_{il}O_{kj}. $$ First of all, it depends explicitly on $O$, which seems bad. Furthermore, if we tentatively define $\mathcal B_{ij;kl}[O]=-O_{il}O_{kj}$ (which is already different from what we found above for $SO(2)$), then we have $\mathcal B_{ij;kl}[O]\mathcal B_{kl;mn}[O]=\delta_{im}\delta_{jn}$, which seems pretty weird...
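For what it's worth, here is a quick numerical check (a sketch of my own, not a derivation) of the $SO(2)$ statements above: the relation $O_{ji}=(-1)^{i+j}O_{ij}$, the idempotence of the tensor built from it, its action on variations tangent to $SO(2)$, and the curious property $\mathcal B_{ij;kl}[O]\mathcal B_{kl;mn}[O]=\delta_{im}\delta_{jn}$. The angle $\theta=0.7$ is arbitrary.

```python
import numpy as np

theta = 0.7
O = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# O_ji = (-1)^{i+j} O_ij
sign = np.array([[(-1) ** (i + j) for j in range(2)] for i in range(2)])
assert np.allclose(O.T, sign * O)

# Candidate tensor: (delta_ik delta_jl + (-1)^{i+j} delta_il delta_jk) / 2
I2 = np.eye(2)
P = 0.5 * (np.einsum('ik,jl->ijkl', I2, I2)
           + np.einsum('ij,il,jk->ijkl', sign, I2, I2))
# It is idempotent, like the symmetric projector S_{ij;kl}
assert np.allclose(np.einsum('ijkl,klmn->ijmn', P, P), P)

# A variation tangent to SO(2): dO = O A with A antisymmetric
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
dO = O @ A
assert np.allclose(np.einsum('ijkl,kl->ij', P, dO), dO)   # P leaves dO unchanged

# Side-note tensor B_{ij;kl}[O] = -O_il O_kj
B = -np.einsum('il,kj->ijkl', O, O)
assert np.allclose(np.einsum('ijkl,kl->ij', B, dO), dO)   # also identity on tangents
BB = np.einsum('ijkl,klmn->ijmn', B, B)
assert np.allclose(BB, np.einsum('im,jn->ijmn', I2, I2))  # B B = delta delta
```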


If anyone knows the standard procedure (if any exists) or a good reference, that would be greatly appreciated. In any case, a nice explanation (maybe a bit formal) of the case of a symmetric matrix might also help me get my head around the problem.

Adam

2 Answers

  1. Setup. Let there be given an $m$-dimensional manifold $M$ with coordinates $(x^1, \ldots, x^m)$. Let there be given an $n$-dimensional physical submanifold $N$ with physical coordinates $(y^1, \ldots, y^n)$. Let there be given $m-n$ independent constraints $$ \chi^1(x)~\approx~ 0,\quad \ldots,\quad \chi^{m-n}(x)~\approx~ 0, \tag{1}$$ which define the physical submanifold $N$. [Here the $\approx$ symbol means weak equality, i.e. equality modulo the constraints.] Assume that $$(y^1, \ldots, y^n, \chi^1, \ldots,\chi^{m-n})\tag{2}$$ constitutes a coordinate system for the extended manifold $M$.

  2. Dirac derivative. In analogy with the Dirac bracket, let us introduce a Dirac derivative $$\begin{align}\left(\frac{\partial}{\partial x^i}\right)_{\! D} ~:=~&\frac{\partial}{\partial x^i} -\sum_{a=1}^{m-n}\frac{\partial \chi^a}{\partial x^i}\left(\frac{\partial }{\partial \chi^a}\right)_{\! y} ~=~\sum_{\alpha=1}^n\frac{\partial y^{\alpha}}{\partial x^i} \left(\frac{\partial }{\partial y^{\alpha}}\right)_{\! \chi} ,\cr & \qquad i~\in~\{1,\ldots, m\}, \end{align}\tag{3}$$ that projects onto the physical submanifold $$\left(\frac{\partial}{\partial x^i}\right)_{\! D} y^{\alpha}~=~\frac{\partial y^{\alpha}}{\partial x^i} , \qquad \left(\frac{\partial}{\partial x^i}\right)_{\! D} \chi^a~=~0, $$ $$i~\in~\{1,\ldots, m\},\qquad \alpha~\in~\{1,\ldots, n\},\qquad a~\in~\{1,\ldots, m\!-\!n\}.\tag{4}$$

  3. Remark. In many important cases it is possible to choose the physical coordinates $(y^1, \ldots, y^n)$ such that the Dirac derivative (4) can be written as linear combinations of unconstrained partial $x$-derivatives only, without referring to the $(y,\chi)$-coordinate system (2), cf. eqs. (10) & (14) below.

  4. Do Dirac derivatives commute? Does the commutator $$\left[ \left(\frac{\partial}{\partial x^i} \right)_{\! D}, \left(\frac{\partial}{\partial x^j}\right)_{\! D}\right] ~=~\sum_{\alpha,\beta=1}^n\frac{\partial y^{\alpha}}{\partial x^i}\left[\left(\frac{\partial }{\partial y^{\alpha}}\right)_{\! \chi}, \frac{\partial y^{\beta}}{\partial x^j}\right] \left(\frac{\partial }{\partial y^{\beta}}\right)_{\! \chi} - (i\leftrightarrow j)~\stackrel{?}{\approx}~0 \tag{5}$$ vanish weakly? Not necessarily. But if the coordinate transformation $x^i \leftrightarrow (y^{\alpha}, \chi^a)$ is linear, then the Dirac derivatives commute.

  5. Example. Let the physical subspace be the hyperplane $N=\{\chi(x)=0\}$ with the constraint $$\chi~=~\sum_{i=1}^m x^i.\tag{6}$$ Define physical coordinates $$y^{\alpha}~=~ x^{\alpha} -\frac{1}{m}\sum_{i=1}^m x^i, \qquad \alpha~\in~\{1,\ldots, n\!=\!m\!-\!1\}.\tag{7} $$ Conversely, $$ x^{\alpha}~=~y^{\alpha} +\frac{1}{m}\chi, \qquad \alpha~\in~\{1,\ldots, n\}, \qquad x^m~=~-\sum_{\beta=1}^n y^{\beta}+\frac{1}{m}\chi.\tag{8}$$ The derivatives are related as $$\frac{\partial}{\partial x^{\alpha}} ~=~\left(\frac{\partial }{\partial y^{\alpha}}\right)_{\! \chi} -\frac{1}{m} \sum_{\beta=1}^n\left(\frac{\partial }{\partial y^{\beta}}\right)_{\! \chi} +\left(\frac{\partial }{\partial \chi}\right)_{\! y}, \qquad \alpha~\in~\{1,\ldots, n\}, $$ $$\frac{\partial}{\partial x^m} ~=~ -\frac{1}{m} \sum_{\beta=1}^n\left(\frac{\partial }{\partial y^{\beta}}\right)_{\! \chi} +\left(\frac{\partial }{\partial \chi}\right)_{\! y}.\tag{9}$$ The Dirac derivative becomes after some algebra $$\left(\frac{\partial}{\partial x^i}\right)_{\! D} ~=~\frac{\partial}{\partial x^i} -\left(\frac{\partial }{\partial \chi}\right)_{\! y} ~=~\frac{\partial}{\partial x^i} -\frac{1}{m} \sum_{j=1}^m\frac{\partial}{\partial x^j},\qquad i~\in~\{1,\ldots, m\}.\tag{10}$$ A short symbolic check of eq. (10) is sketched after this list.

  6. Example. Differentiation wrt. a symmetric matrix can be viewed as a Dirac differentiation (3), where the constraints (1) are given by antisymmetric matrices. Define $$ \begin{align}s_{(ij)}~:=~ \frac{M_{ij}+M_{ji}}{2} &\quad\text{and}\quad a_{(ij)}~:=~ \frac{M_{ij}-M_{ji}}{2}\quad\text{for}\quad i~>~j;\cr &\quad\text{and}\quad d_{(i)}~:=~M_{ii}.\end{align}\tag{11}$$ Conversely, $$ M_{ij}~=~\theta_{ij}(s_{(ij)}+a_{(ij)})+\theta_{ji}(s_{(ji)}-a_{(ji)}) + \delta_{ij}d_{(i)} ,\tag{12}$$ where the discrete Heaviside step function $\theta_{ij}$ here is assumed to obey $\theta_{ii}=0$ (no implicit sum). The derivatives are related as $$\frac{\partial}{\partial M_{ij}} ~=~\frac{\theta_{ij}}{2}\left(\frac{\partial}{\partial s_{(ij)}}+\frac{\partial}{\partial a_{(ij)}}\right)+\frac{\theta_{ji}}{2}\left(\frac{\partial}{\partial s_{(ji)}}-\frac{\partial}{\partial a_{(ji)}}\right) + \delta_{ij}\frac{\partial}{\partial d_{(i)}} .\tag{13}$$ The Dirac derivative becomes after some algebra $$ \left(\frac{\partial}{\partial M_{ij}}\right)_{\! D}~=~\frac{\theta_{ij}}{2}\frac{\partial}{\partial s_{(ij)}}+\frac{\theta_{ji}}{2}\frac{\partial}{\partial s_{(ji)}} + \delta_{ij}\frac{\partial}{\partial d_{(i)}} ~=~\frac{1}{2} \left(\frac{\partial}{\partial M_{ij}} +\frac{\partial}{\partial M_{ji}}\right).\tag{14}$$ A similar symbolic check of eq. (14) is sketched after this list.

  7. Remark. Additional complications arise if the coordinates and/or constraints are not globally defined. For starters, it is actually enough if (2) is a coordinate system in a tubular neighborhood of $N$.

  8. Reparametrizations of the constraints. Assume that there exists a second coordinate system $$(\tilde{y}^1, \ldots, \tilde{y}^n, \tilde{\chi}^1, \ldots,\tilde{\chi}^{m-n})\tag{15}$$ (which we adorn with tildes), such that $$\tilde{y}^{\alpha}~=~f^{\alpha}(y), \qquad \tilde{\chi}^a ~=~ g^a(y,\chi)~\approx~0.\tag{16}$$ This implies that $$ \left(\frac{\partial }{\partial \chi^a}\right)_{\! y} ~=~ \left(\frac{\partial \tilde{\chi}^b}{\partial \chi^a}\right)_{\! y} \left(\frac{\partial }{\partial \tilde{\chi}^b}\right)_{\! \tilde{y}}, \qquad \left(\frac{\partial }{\partial y^{\alpha}}\right)_{\! \chi} ~\approx~\left(\frac{\partial \tilde{y}^{\beta}}{\partial y^{\alpha}}\right)_{\! \chi} \left(\frac{\partial }{\partial \tilde{y}^{\beta}}\right)_{\! \tilde{\chi}}, \tag{17}$$ i.e. $$\Delta_{\chi}~:=~{\rm span}\left\{\left(\frac{\partial }{\partial \chi^1}\right)_{\! y}, \ldots ,\left(\frac{\partial }{\partial \chi^{m-n}}\right)_{\! y}\right\} ~\subseteq~ TM\tag{18}$$ is an involutive distribution, while $$\Delta_y~:=~{\rm span}\left\{\left(\frac{\partial }{\partial y^1}\right)_{\! \chi}, \ldots ,\left(\frac{\partial }{\partial y^n}\right)_{\! \chi}\right\} ~\subseteq~ TM\tag{19}$$ is a weak distribution.

    One may show that the Dirac derivative and its commutators $$ \left(\frac{\partial}{\partial x^i}\right)^{\sim}_{\! D}~\approx~\left(\frac{\partial}{\partial x^i}\right)_{\! D}, \qquad \left[ \left(\frac{\partial}{\partial x^i} \right)^{\sim}_{\! D}, \left(\frac{\partial}{\partial x^j}\right)^{\sim}_{\! D}\right]~\approx~ \left[ \left(\frac{\partial}{\partial x^i} \right)_{\! D}, \left(\frac{\partial}{\partial x^j}\right)_{\! D}\right], \tag{20}$$ [wrt. the tilde and the untilde coordinate systems (15) and (2), respectively] agree weakly. This shows that the Dirac derivative (3) is a geometric construction.

  9. Subsubmanifold. Let there be given a $p$-dimensional physical subsubmanifold $P$ with physical coordinates $(z^1,\ldots,z^p)$, and let there be given $n-p$ independent constraints $$ \phi^1(y)~\approx~ 0,\quad \ldots,\quad \phi^{n-p}(y)~\approx~ 0, \tag{21}$$ which define the physical subsubmanifold $P$. Assume that $$(z^1, \ldots, z^p, \phi^1, \ldots,\phi^{n-p})\tag{22}$$ constitutes a coordinate system for the submanifold $N$. One may show that $$\left(\frac{\partial}{\partial x^i}\right)^{\!(P)}_{\! D} ~=~\left(\frac{\partial}{\partial x^i}\right)^{\!(N)}_{\! D} - \sum_{a=1}^{n-p}\left(\frac{\partial \phi^a}{\partial x^i}\right)^{\!(N)}_{\! D}\left(\frac{\partial }{\partial \phi^a}\right)_{\!z}, \qquad i~\in~\{1,\ldots, m\}. \tag{23}$$ This shows that the Dirac derivative construction behaves naturally wrt. further constraints.
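Here is a minimal sympy sketch of the hyperplane example in point 5 (a verification only; the dimension $m=4$ and the cubic test function are arbitrary choices): the Dirac derivative (3), written out in the $(y,\chi)$-coordinates, agrees identically with the shortcut formula (10).

```python
import sympy as sp

m = 4                                            # dimension of the x-space
x = sp.symbols(f'x1:{m+1}')                      # x^1, ..., x^m
chi = sum(x)                                     # constraint (6)
y = [x[a] - chi / m for a in range(m - 1)]       # physical coordinates (7)

f = sum(xi**3 for xi in x)                       # an arbitrary test function f(x)

# Rewrite f in the (y, chi) coordinate system using the inverse map (8)
ys = sp.symbols(f'y1:{m}')                       # y^1, ..., y^{m-1}
chi_s = sp.Symbol('chi')
x_of = [ys[a] + chi_s / m for a in range(m - 1)] + [-sum(ys) + chi_s / m]
f_ychi = f.subs({x[i]: x_of[i] for i in range(m)})

for i in range(m):
    # Dirac derivative (3): sum_alpha (dy^alpha/dx^i) (df/dy^alpha)_chi,
    # evaluated back in the x-coordinates
    lhs = sum(sp.diff(y[a], x[i]) * sp.diff(f_ychi, ys[a]) for a in range(m - 1))
    lhs = lhs.subs({ys[a]: y[a] for a in range(m - 1)}).subs(chi_s, chi)
    # Shortcut formula (10): df/dx^i - (1/m) sum_j df/dx^j
    rhs = sp.diff(f, x[i]) - sum(sp.diff(f, x[j]) for j in range(m)) / m
    assert sp.expand(lhs - rhs) == 0
```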
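And a completely analogous sketch for point 6, checking that the coordinate-based definition (3) reproduces the symmetrized derivative (14) when differentiating wrt. a matrix constrained to be symmetric (again with an arbitrary test function, here $\mathrm{tr}(M^2)+\det M$, and $N=3$).

```python
import sympy as sp
from itertools import product

N = 3
M = sp.Matrix(N, N, lambda i, j: sp.Symbol(f'M{i}{j}'))
f = (M * M).trace() + M.det()                    # an arbitrary test function of M

# Coordinates (11): s_(ij) and a_(ij) for i > j, and the diagonal d_(i)
s = {(i, j): sp.Symbol(f's{i}{j}') for i in range(N) for j in range(i)}
a = {(i, j): sp.Symbol(f'a{i}{j}') for i in range(N) for j in range(i)}
d = {i: sp.Symbol(f'd{i}') for i in range(N)}

# Inverse map (12): M_ij as a function of (s, a, d)
def M_of(i, j):
    if i > j:
        return s[(i, j)] + a[(i, j)]
    if i < j:
        return s[(j, i)] - a[(j, i)]
    return d[i]

f_sad = f.subs({M[i, j]: M_of(i, j) for i in range(N) for j in range(N)})

# Express the (s, a, d) coordinates back in terms of M for the final comparison
back = {s[(i, j)]: (M[i, j] + M[j, i]) / 2 for (i, j) in s}
back.update({a[(i, j)]: (M[i, j] - M[j, i]) / 2 for (i, j) in a})
back.update({d[i]: M[i, i] for i in d})

for i, j in product(range(N), repeat=2):
    # Dirac derivative (3): keep only the physical (s, d) directions
    lhs = sum(sp.diff((M[k, l] + M[l, k]) / 2, M[i, j]) * sp.diff(f_sad, s[(k, l)])
              for (k, l) in s)
    lhs += sum(sp.diff(M[k, k], M[i, j]) * sp.diff(f_sad, d[k]) for k in d)
    lhs = lhs.subs(back)
    # Shortcut formula (14)
    rhs = (sp.diff(f, M[i, j]) + sp.diff(f, M[j, i])) / 2
    assert sp.expand(lhs - rhs) == 0
```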

Qmechanic
  • Also, in practice it is usually far more useful to forget that your matrix is symmetric when you're differentiating, and impose the symmetry condition later. – Prof. Legolasov May 13 '18 at 02:31
  • Thanks for the answer. However, it is quite unclear to me how to implement this for a specific case (for example, in the case of symmetric matrices, what do $x^i$ and $y^i$ represent? The independent matrix elements for the latter?). Would you mind giving the explicit calculation for the symmetric case, so that I can try to generalize it to my cases? Also, what is a good reference to start learning about this? – Adam May 13 '18 at 09:20
  • Also, I don't understand what $(\partial/\partial \chi^a)_y$ is supposed to mean in practice... – Adam May 13 '18 at 16:59
  • Thanks a lot for the example! If I translate everything between 2 and 3, the $x$'s correspond to the (independent) elements of $M$, the $s$'s and $d$'s to the $y$'s, and the $a$'s to the $\chi$'s. I'll now try to see if I can make sense of all that for my more complicated cases! – Adam May 13 '18 at 18:02
  • @Qmechanic: I think I understand my confusion better, although I still have some issues. If I were to use standard constrained differentials, I would impose $dM_{ji}=dM_{ij}$, and I would get $\partial_{M_{ij}}|_C=\partial_{M_{ij}}+\partial_{M_{ji}}$ (which is what one would expect naively). But what you seem to do is to allow for arbitrary $dM_{ji}$ and $dM_{ij}$, while projecting onto the space $M_{ij}=M_{ji}$, which gives $\partial_{M_{ij}}|_D=\frac12\partial_{M_{ij}}+\frac12\partial_{M_{ji}}$. My question is: why use one and not the other? What is the distinction between the two? – Adam May 14 '18 at 13:06
  • Your method seems to be consistent with this paper: https://doi.org/10.1016/0895-7177(95)00082-D Do you have other references about this kind of method? – Adam May 14 '18 at 13:09
  • I'm not sure what $\partial_{M_{ij}}|_C$ is supposed to mean. Consider elaborating its definition in detail. Only $\partial_{M_{ij}}|_D$ seems valid. The Dirac derivative was developed from scratch, inspired by constrained dynamics, cf. e.g. Henneaux & Teitelboim. – Qmechanic May 14 '18 at 18:50
  • Here is a simple example of what I mean. Take a function $f(x,y)$, with the constraint $x=y$. The differential of $f$, with constraint, is $df=f^{(1,0)}dx+f^{(0,1)}dy$. The standard method to implement the constraint is to say that $dy=dx$ and thus $df=(f^{(1,0)}+f^{(0,1)})dx$, which gives that the derivative of $f$ wrt $x$ with the constraint is $df/dx|_C=f^{(1,0)}+f^{(0,1)}$. On the other hand, your method allows an arbitrary change of $dx$ and $dy$, which are then projected onto the subspace of the constraint: $(\delta x,\delta y)=P(dx,dy)$, which in this case means... – Adam May 15 '18 at 09:03
  • ... $\delta x=\frac12(dx+dy)$ and $\delta y=\frac12(dx+dy)$, while the variation of $f$ is given by $\delta f = f^{(1,0)}\delta x+ f^{(0,1)}\delta y=\frac{ f^{(1,0)}+ f^{(0,1)}}{2}(dx+dy)$, which means that $df/dx|_D=\frac{ f^{(1,0)}+ f^{(0,1)}}{2}$. (NB: if we put $dy=dx$ in the previous formula, we get the same result as with the standard method.) – Adam May 15 '18 at 09:06

To me it seems a bit inappropriate to differentiate an orthogonal matrix with respect to its components. By definition, this would mean that you want to find out how the other matrix components change if you vary one component. However, this is only uniquely defined for SO(2), not for SO($N>2$). To see this more explicitly, consider a rotation in 3D. Here you have three angles, and if you want to change one entry, there are in general several ways to do so. This is of course nothing but the statement that SO($N>2$) has more than one generator.

Therefore, a more reasonable way (IMHO) to differentiate an orthogonal matrix is to write it as $$ O=\exp(T)\;,\quad\text{where}~T~\text{is antisymmetric}$$ and differentiate w.r.t. the components of $T$, in a way analogous to what you quote for the symmetric-matrix differentiation. This can be applied in a similar fashion to all matrix groups, e.g. for unitary matrices $T$ will be anti-Hermitian.
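For instance, here is a small numerical sketch of this idea for SO(3) (an illustration only, assuming numpy and scipy are available; the helper `antisym` and the finite-difference step are arbitrary choices): differentiating $O=\exp(T)$ with respect to the three independent components of $T$ automatically produces variations tangent to the group, i.e. $\delta O\, O^T + O\, \delta O^T=0$.

```python
import numpy as np
from scipy.linalg import expm

def antisym(t):
    """Antisymmetric 3x3 matrix built from its three independent components."""
    a, b, c = t
    return np.array([[0.0,   a,   b],
                     [ -a, 0.0,   c],
                     [ -b,  -c, 0.0]])

rng = np.random.default_rng(1)
t0 = rng.normal(size=3)
O = expm(antisym(t0))
assert np.allclose(O @ O.T, np.eye(3))           # O is orthogonal

eps = 1e-6
for k in range(3):
    dt = np.zeros(3)
    dt[k] = eps
    # dO/dT_k by central finite differences
    dO = (expm(antisym(t0 + dt)) - expm(antisym(t0 - dt))) / (2 * eps)
    # The variation stays tangent to SO(3): dO O^T + O dO^T = 0
    assert np.allclose(dO @ O.T + O @ dO.T, 0.0, atol=1e-8)
    # Derivatives of any function of O then follow by the chain rule from dO
```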

Just to elaborate on your statement that the explicit dependence of the derivative on $O$ seems bad: you could derive the formula for SO(2) also by using the parametrization $$ O=\exp(\theta\cdot T_1)=\begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix}\;,$$ where $T_1$ is the antisymmetric "unit" matrix. Then $$ \frac{\partial O_{ij}}{\partial O_{k\ell}}=\frac{\partial O_{ij}}{\partial \theta}\cdot\frac{\partial \theta}{\partial O_{k\ell}}= \frac{\partial O_{ij}}{\partial \theta}\cdot\left(\frac{\partial O_{k\ell}}{\partial \theta}\right)^{-1}\;.$$ This leads to the same result as above since $$ \frac{\partial O}{\partial \theta}=\begin{pmatrix}-\sin\theta & \cos\theta\\ -\cos\theta & -\sin\theta\end{pmatrix}\;.$$ But it is also clear that the point at which you take the derivative matters.

  • In the problem I am interested in, I really need to differentiate with respect to the matrix elements, unfortunately. And I also need a general method; the O(N) case is just an example (which is not exactly the one I am interested in) – Adam May 13 '18 at 09:47