34

Below, we compute with exact real numbers using a realistic/conservative model of computability like Type Two Effectivity (TTE).

Assume that there is an algorithm that, given a symmetric real matrix $M$, finds an eigenvector $v$ of $M$ of unit length.

Let

$$M(\epsilon) = \begin{cases} \left[\begin{matrix}1 & \epsilon\\ \epsilon & 1\end{matrix}\right] ,& \epsilon \geq 0 \\ \left[\begin{matrix}1 - \epsilon & 0\\0 & 1 + \epsilon\end{matrix}\right] ,& \epsilon \leq 0 \end{cases}$$ and assume that it's possible to find an eigenvector $v$ of $M(\epsilon)$.

  • If $\epsilon > 0$ then $v$ must necessarily be $\pm \frac 1 {\sqrt 2}\left[\begin{matrix}1\\1\end{matrix}\right]$ or $\pm \frac 1 {\sqrt 2}\left[\begin{matrix}-1\\1\end{matrix}\right]$. Observe that in all four cases, the $L^1$ norm of $v$ is $\sqrt 2$.

  • If $\epsilon < 0$, then $v$ must necessarily be $\pm\left[\begin{matrix}1\\0\end{matrix}\right]$ or $\pm\left[\begin{matrix}0\\1\end{matrix}\right]$. Observe that in all four cases, the $L^1$ norm of $v$ is $1$.

Given $v$, we can run in parallel the two semi-decidable tests "the $L^1$ norm of $v$ is less than $\sqrt 2$" and "the $L^1$ norm of $v$ is greater than $1$"; at least one of them must succeed, and the first can only succeed when $\epsilon \leq 0$ while the second can only succeed when $\epsilon \geq 0$. Therefore we can decide whether $\epsilon \leq 0$ or $\epsilon \geq 0$, which is impossible!

In a way, this is strange, because many sources say that the Singular Value Decomposition (SVD) and Schur Decomposition (which are generalisations of the Spectral Decomposition) are numerically stable. They're also widely used in numerical applications. But I've just tested the examples above for small $\epsilon$ using SciPy and got incorrect results.
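
For concreteness, here is a minimal sketch of that kind of experiment (NumPy/SciPy; the choice of $\epsilon$ and the commented outputs are illustrative, not my exact script). It uses the closed form for $M(\epsilon)$ given in the comments below, and shows how the computed eigenvectors, and hence their $L^1$ norms, jump as $\epsilon$ crosses $0$:

```python
import numpy as np
from scipy.linalg import eigh

def M(eps):
    # Closed form equivalent to the piecewise definition above:
    # M(eps) = [[min(0, eps) + 1, max(0, eps)], [max(0, eps), 1 - min(0, eps)]]
    return np.array([[min(0.0, eps) + 1.0, max(0.0, eps)],
                     [max(0.0, eps),       1.0 - min(0.0, eps)]])

for eps in (1e-8, -1e-8):
    w, V = eigh(M(eps))                 # columns of V are unit eigenvectors
    l1_norms = np.abs(V).sum(axis=0)
    print(eps, l1_norms)                # ~[1.414, 1.414] for eps > 0, ~[1, 1] for eps < 0
```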

So my question is, how do numerical analysts get around this problem? Or why is this apparently not a problem?

I could venture some guesses: While finding eigenvectors of general matrices may be impossible, it is possible to find their eigenvalues. Also, it's possible to "shift" a problematic matrix by some small $\epsilon$ so that its eigendecomposition is computable.

wlad
  • 4,823
  • 2
    I feel that the first sentence is a bit vague: "Assume that there is an algorithm that..." but you don't say in which computation model you place the question. This seems to me of paramount importance for the rest of the question (BSS will give you a different answer compared to the case where $\epsilon$ must be written down in a real-world computer memory) (also the 0 test is computable in floating-point arithmetic in real-world computers). – Loïc Teyssier Aug 24 '20 at 14:33
  • 1
    @LoïcTeyssier I assume a realistic model of computation like TTE. BSS is not realistic in my view – wlad Aug 24 '20 at 14:33
  • 3
    @ogogmad, the word "realistic" doesn't help –  Aug 24 '20 at 14:35
  • @LoïcTeyssier See edit – wlad Aug 24 '20 at 14:39
  • @MattF. See edit – wlad Aug 24 '20 at 14:45
  • Your matrix function $M : \epsilon\mapsto M(\epsilon)\in M_{2\times2}(\mathbb{R})$ is itself not computable, or is it ? – Loïc Teyssier Aug 24 '20 at 14:56
  • 2
    @LoïcTeyssier It is. It's a continuous function – wlad Aug 24 '20 at 14:57
  • 1
    Not every continuous function is computable, I guess – Loïc Teyssier Aug 24 '20 at 14:58
  • 9
    @LoïcTeyssier $M(\epsilon) = \left[\begin{matrix}\min\left(0, \epsilon\right) + 1 & \max\left(0, \epsilon\right)\\ \max\left(0, \epsilon\right) & 1 - \min\left(0, \epsilon\right)\end{matrix}\right] $, which is clearly computable – wlad Aug 24 '20 at 15:00
  • 3
    I stand corrected ;) – Loïc Teyssier Aug 24 '20 at 15:02
  • @MattF. What do you mean by it doesn't help? – wlad Aug 25 '20 at 08:51
  • 1
    It’s like “vote for my candidate because my candidate is better” — the phrase “because my candidate is better” or “because my model is more realistic” doesn’t add any information. –  Aug 25 '20 at 11:02
  • 1
    @MattF. I don't want to get into a long discussion about this, but BSS is an unrealistic model of computation with real numbers, and TTE is easily implemented physically. And I'm not the first to say that one is more realistic than the other. Your analogy is flawed. I did not say that one was better, I said that one was more realistic – wlad Aug 25 '20 at 11:21
  • I know nothing about computability theory, but I don't understand why determining if $\epsilon \geq 0$ gives you some kind of contradiction if, as you assert, $\min(0,\epsilon)$ is computable. What am I missing? – Jason DeVito - on hiatus Aug 26 '20 at 22:25
  • 4
    @JasonDeVito If $x$ and $y$ are approximately equal, then they are both approximately equal to $\min(x, y)$. Thus one can get a good approximation of $\min(x, y)$ from good approximations of $x$ and $y$, even if the approximations aren't good enough to be certain which is actually smaller. – lambda Aug 27 '20 at 01:03
  • @lambda: That makes sense! Thanks – Jason DeVito - on hiatus Aug 27 '20 at 01:29
  • Can you make this a bit more self-contained by saying what "TTE" is? – Michael Hardy Nov 12 '22 at 19:57
  • @MichaelHardy I've expanded the acronym. The standard reference is Computable Analysis by Weihrauch. I might include a link to that. [UPDATE] Done. – wlad Nov 14 '22 at 13:36
  • @wlad : Thank you. – Michael Hardy Nov 15 '22 at 21:32

3 Answers

78

The singular value decomposition, when applied to a real symmetric matrix $A = \sum_i \lambda_i(A) u_i(A) u_i(A)^T$, computes a stable mathematical object (spectral measure $\mu_A = \sum_i \delta_{\lambda_i(A)} u_i(A) u_i(A)^T$, which is a projection-valued measure) using a partially unstable coordinate system (the eigenvalues $\lambda_i(A)$ and eigenvectors $u_i(A)$; the eigenvalues are stable, but the eigenvectors are not). The numerical instability of the latter reflects the coordinate singularities of this coordinate system, but does not contradict the stability of the former. But in numerical computations we have to use the latter rather than the former, because standard computer languages have built-in data representations for numbers and vectors, but usually do not have built-in data representations for projection-valued measures.

An analogy is with floating-point arithmetic. The operation of multiplication of two floating point numbers (expressed in binary $x = \sum_i a_i(x) 2^{-i}$ or decimal $x = \sum_i b_i(x) 10^{-i}$) is a stable (i.e., continuous) operation on the abstract real numbers ${\bf R}$, but when viewed in a binary or decimal representation system becomes "uncomputable". For instance, the square of $1.414213\dots$ could be either $1.99999\dots$ or $2.0000\dots$, depending on exactly what is going on in the $\dots$; hence questions such as "what is the first digit of the square of $1.414213\dots$" are uncomputable. But this is an artefact of the numeral representation system used and is not an indicator of any lack of stability or computability for any actual computational problem that involves the abstract real numbers (rather than an artificial problem that is sensitive to the choice of numeral representation used). In contrast, floating point division when the denominator is near zero is a true singularity; regardless of what numeral system one uses, this operation is genuinely discontinuous (in a dramatic fashion) on the abstract reals and generates actual instabilities that cannot be explained away as mere coordinate singularity artefacts.
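
A tiny illustration of this digit-sensitivity (a sketch using Python's `decimal` module; the two approximations of $\sqrt 2$ are arbitrary): two inputs that agree to fifteen decimal digits have squares whose leading digit differs, so no finite prefix of the input's digits determines the first digit of the output.

```python
from decimal import Decimal, getcontext

getcontext().prec = 40

low  = Decimal("1.414213562373095")   # slightly below sqrt(2)
high = Decimal("1.414213562373096")   # slightly above sqrt(2)
print(low * low)    # 1.99999999999999...  -> first digit 1
print(high * high)  # 2.00000000000000...  -> first digit 2
```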

Returning back to matrices, whereas the individual eigenvectors $u_i(A)$ of a real symmetric matrix $A$ are not uniquely defined (there is a choice of sign for $u_i(A)$, even when there are no repeated eigenvalues) or continuously dependent on $A$, the spectral measure $\mu_A := \sum_i \delta_{\lambda_i(A)} u_i(A) u_i(A)^T$ is unambiguous; it is the unique projection-valued measure for which one has the functional calculus $$ f(A) = \int_{\bf R} f(E)\ d\mu_A(E)$$ for any polynomial $f$ (or indeed for any continuous function $f \colon {\bf R} \to {\bf R}$). The spectral measure $\mu_A$ depends continuously on $A$ in the vague topology; indeed one has the inequality $$ \| f(A) - f(B) \|_F \leq \|f\|_\text{Lip} \|A-B\|_F$$ for any real symmetric $A,B$ and any Lipschitz $f$, where $\|\cdot\|_F$ denotes the Frobenius norm (also known as the Hilbert-Schmidt norm or 2-Schatten norm). This allows for the possibility for stable computation of this measure, and indeed standard algorithms such as tridiagonalisation methods using (for instance) the QR factorisation and Householder reflections do allow one to compute this measure in a numerically stable fashion (e.g., small roundoff errors only lead to small variations in any test $\int_{\bf R} f(E)\ d\mu_A(E)$ of the spectral measure $\mu_A$ against a given test function $f$), although actually demonstrating this stability rigorously for a given numerical SVD algorithm does require a non-trivial amount of effort.
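
As a quick numerical sanity check of this inequality (a sketch in NumPy; the test function $f(x) = \max(x,1)$, with Lipschitz constant $1$, is just an illustrative choice), one can evaluate both sides on the two matrices $M(\pm\epsilon)$ from the question:

```python
import numpy as np

def f_of(A, f):
    # Functional calculus via the spectral decomposition: f(A) = V diag(f(w)) V^T
    w, V = np.linalg.eigh(A)
    return V @ np.diag(f(w)) @ V.T

f = lambda x: np.maximum(x, 1.0)                       # Lipschitz constant 1
eps = 1e-8
A = np.array([[1.0, eps], [eps, 1.0]])                 # M(+eps)
B = np.array([[1.0 - eps, 0.0], [0.0, 1.0 + eps]])     # M(-eps)

lhs = np.linalg.norm(f_of(A, f) - f_of(B, f), 'fro')   # ~eps
rhs = np.linalg.norm(A - B, 'fro')                     # ~2*eps
print(lhs, rhs)    # lhs <= rhs, even though the eigenvectors of A and B differ completely
```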

The practical upshot of this is that if one uses a numerically stable SVD algorithm to compute a quantity that can be expressed as a numerically stable function of the spectral measure (e.g., the inverse $A^{-1}$, assuming that the spectrum is bounded away from zero), then the computation will be stable, despite the fact that the representation of this spectral measure in eigenvalue/eigenvector form may contain coordinate instabilities. In examples involving eigenvalue collision such as the one you provided in your post, the eigenvectors can change dramatically (while the eigenvalues remain stable), but when the time comes to apply the SVD to compute a stable quantity such as the inverse $A^{-1}$, these dramatic changes "miraculously" cancel each other out and the algorithm becomes numerically stable again. (This is analogous to how a stable floating point arithmetic computation (avoiding division by very small denominators) applied to an input $x = 1.99999\dots$ and an input $x' = 2.00000\dots$ will lead to outcomes that are very close to each other (as abstract real numbers), even though all the digits in the representations of $x$ and $x'$ are completely different; the changes in digits "cancel each other out" at the end of the day.)
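
A small NumPy illustration of this cancellation on the matrices from the question (a sketch; the value of $\epsilon$ is arbitrary): the eigenvector matrices returned for $M(+\epsilon)$ and $M(-\epsilon)$ are completely different, yet the inverses reconstructed from them agree to order $\epsilon$.

```python
import numpy as np

eps = 1e-10
A = np.array([[1.0, eps], [eps, 1.0]])                # M(+eps)
B = np.array([[1.0 - eps, 0.0], [0.0, 1.0 + eps]])    # M(-eps)

wA, VA = np.linalg.eigh(A)
wB, VB = np.linalg.eigh(B)
print(VA)   # columns ~ (1, -1)/sqrt(2) and (1, 1)/sqrt(2), up to sign
print(VB)   # columns ~ (1, 0) and (0, 1)

invA = VA @ np.diag(1.0 / wA) @ VA.T                  # A^{-1} via the spectral decomposition
invB = VB @ np.diag(1.0 / wB) @ VB.T
print(np.linalg.norm(invA - invB, 'fro'))             # ~2e-10: the wildly different eigenvectors cancel out
```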

[The situation is a bit more interesting when applying the SVD to a non-symmetric matrix $A = \sum_i \sigma_i(A) u_i(A) v_i(A)^T$. Now one gets two spectral measures, $\mu_{(A^* A)^{1/2}} = \sum_i \delta_{\sigma_i(A)} v_i(A) v_i(A)^T$ and $\mu_{(AA^*)^{1/2}} = \sum_i \delta_{\sigma_i(A)} u_i(A) u_i(A)^T$ which are numerically stable, but these don't capture the full strength of the SVD (for instance, they are not sufficient for computing $A^{-1}$). The non-projection-valued spectral measure $\mu_A = \sum_i \delta_{\sigma_i(A)} u_i(A) v_i(A)^T$ does capture the full SVD in this case, but is only stable using the vague topology on the open half-line $(0,+\infty)$, that is to say $\int_0^\infty f(E)\ d\mu_A(E)$ varies continuously with $A$ as long as $f$ is a test function compactly supported in $(0,+\infty)$, but is unstable if tested by functions that do not vanish at the origin. This is ultimately due to a genuine singularity in the polar decomposition of a non-selfadjoint matrix when the matrix becomes singular, which in one dimension is simply the familiar singularity in the polar decomposition of a complex number near the origin.]
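
A brief sketch of that genuine singularity (using `scipy.linalg.polar`; the particular non-symmetric test matrix is an arbitrary choice): as $t$ crosses $0$, the positive factor $P$ of $A(t)$ varies continuously while the orthogonal factor $U$ jumps.

```python
import numpy as np
from scipy.linalg import polar

def A(t):
    return np.array([[0.0, 1.0], [t, 0.0]])   # non-symmetric, singular exactly at t = 0

for t in (1e-12, -1e-12):
    U, P = polar(A(t))        # right polar decomposition: A = U @ P with P = (A^T A)^{1/2}
    print(t)
    print(P)                  # ~diag(|t|, 1): depends continuously on t
    print(U)                  # ~[[0, 1], [sign(t), 0]]: flips as t crosses 0
```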

Michael Hardy
  • 11,922
  • 11
  • 81
  • 119
Terry Tao
  • 108,865
  • 31
  • 432
  • 517
  • 3
    Are you sure about the Lipschitz estimate in the operator norm? On $B(\ell^2)$ already the absolute value map is not Lipschitz continuous on the self-adjoint part, so I would expect the constant in this inequality to blow up as the size of the matrices increases. It is fine for the Hilbert-Schmidt norm, though. – Mateusz Wasilewski Aug 24 '20 at 19:41
  • 1
    Oops, you are right, thanks! – Terry Tao Aug 24 '20 at 20:03
  • 1
    So the purpose of the spectral decomposition is to extend a continuous function $f:\mathbb R \to \mathbb R$ to matrices, by using the identity $f(U\Sigma U^*) = U f(\Sigma) U^*$. Have I understood that correctly? – wlad Aug 25 '20 at 07:46
  • In the case of the SVD, the integral $\int_0^\infty f(E)\ d\mu_A(E)$ where $\mu_A = \sum_i \delta_{\sigma_i(A)} u_i(A) v_i(A)^T$ doesn't seem to correspond to any such identity – wlad Aug 25 '20 at 07:53
  • 16
    I would just like to remark to this very nice answer that there is no a priori reason why we could not directly implement the spectral measure on a computer, nor is there any reason, other than historic accident, that the floating-point format uses numerically unstable representation of the reals. Signed binary (with digits $-1, 0, 1$) is numerically stable. – Andrej Bauer Aug 25 '20 at 08:12
  • @AndrejBauer I found this page about signed digit representation where it mentions Booth representation. Is that what you're referring to? – Loïc Teyssier Aug 25 '20 at 10:52
  • @LoïcTeyssier: yes. Although the most efficient exact real arithmetic implementations tend to use sequences of (nested) intervals to represent reals, not streams of signed digits, see for example iRRAM – Andrej Bauer Aug 25 '20 at 11:15
  • 1
    Perhaps it is helpful to think about some of the standard things you would want to do with an SVD decomposition from this perspective. For example, you often replace a matrix $\sum \lambda_i u_i v_i^T$ by a low rank approximation $\sum_{\lambda_i > \kappa} \lambda_i u_i v_i^T$ for some cut off $\kappa$. This is integrating the spectral measure against $x \chi_{\kappa}(x)$ where $\chi_{\kappa}$ is the characteristic function of $(\kappa, \infty)$. (continued) – David E Speyer Aug 25 '20 at 14:26
  • The characteristic function is discontinuous at $\kappa$ but, if we replaced it by some function which was $0$ on $[0, \kappa-\epsilon]$, $1$ on $[\kappa+\epsilon,\infty)$ and continuous between, then this would fit into the framework of this answer. Similarly, if we are doing dimensional reduction by PCA, then we use the projection operator $\sum_{\lambda>\kappa} u_i u_i^T$ and again, if we fix the discontinuity in the characteristic function, we are in the framework of this answer. – David E Speyer Aug 25 '20 at 14:29
  • That said, it seems to me that we could construct situations where this would cause a problem. For example, spectral clustering algorithms https://en.wikipedia.org/wiki/Spectral_clustering often compute the largest eigenvector. The largest eigenvector (or the projection onto its span) is a discontinuous function of $A$ near matrices with repeated large eigenvalue. Is this a problem in practice? I don't know. – David E Speyer Aug 25 '20 at 14:33
  • @ogogmad Yes, in the (essentially) self-adjoint case the spectral measure is precisely the Riesz representation of the functional calculus. For non-selfadjoint $A$ one has $\int_0^\infty f(E)\ d\mu_A(E) = A g((A^*A)^{1/2}) = g((AA^*)^{1/2}) A$ whenever $f(E) = E g(E)$ for a continuous $g$ (this occurs in particular when $f$ is an odd polynomial). With a left-polar decomposition $A = UP$ or right-polar decomposition $A=QV$ of a non-singular $A$ one has $\int_0^\infty f(E)\ d\mu_A(E) = U f(P) = f(Q) V$ for any continuous $f$. – Terry Tao Aug 25 '20 at 15:31
  • 1
    @DavidESpeyer If the spectrum is sufficiently discrete then instead of selecting a cutoff $\kappa$ in advance, one can first scan the relevant portion of the spectrum for a decent spectral gap (or appeal to the pigeonhole principle for the abstract existence of such a gap), and then place the spectral cutoff in this gap, in which case one achieves a level of numerical stability inversely proportional to the width of the gap. – Terry Tao Aug 25 '20 at 15:40
  • As a variant of this idea, instead of working with gaps that are completely devoid of spectrum, one can also use the pigeonhole principle (or direct numerical search) to locate intervals which contain only a small (but possibly nonzero) amount of "spectral energy". This "energy pigeonholing" idea for instance powers the spectral proof of the Szemeredi regularity lemma https://terrytao.wordpress.com/2012/12/03/the-spectral-proof-of-the-szemeredi-regularity-lemma/ , and was also widely used by Bourgain in several of his papers on PDE and also density Ramsey theory. – Terry Tao Aug 25 '20 at 15:42
23

The SVD decomposition falls under the family of phenomena where discontinuity implies non-computability. (Intuitively, this is because, at the point of discontinuity, infinite precision is required.)

In this particular case we speak of the (dis)continuity of a multivalued function which takes a matrix to any of its decompositions, or better, the non-existence of a realizer for the statement "For every matrix $M$ there exist suitable $U$, $\Sigma$, $V$ yielding an SVD decomposition of $M$." I believe this statement has no continuous realizer in function realizability, and hence no computable one either.

Some other examples of this phenomenon are:

  • The sign function $\mathrm{sgn} : \mathbb{R} \to \{-1,0,1\}$ is discontinuous, therefore non-computable. In fact, every computable map $\mathbb{R} \to \{0,1\}$ is constant.
  • The rank of a matrix is non-computable (see the sketch after this list).
  • Gaussian elimination (as taught in school) cannot be performed because testing for zero is non-computable.
  • The number of distinct zeroes of a polynomial is non-computable.

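To illustrate the rank example in floating point (a sketch; NumPy's `matrix_rank` with its tolerance parameter stands in for any finite-precision rank computation), the answer one gets depends on an essentially arbitrary cutoff:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1e-20]])                  # rank 2 exactly, but "numerically" rank 1
print(np.linalg.matrix_rank(A))               # 1 with the default tolerance
print(np.linalg.matrix_rank(A, tol=1e-30))    # 2 with a smaller tolerance
```
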
So why are these, along with your observation, a problem? There are several answers, depending on the context.

In floating-point numerics, all computations are done with a fixed finite precision and numerical errors are simply unavoidable. In this setting the non-computability manifests itself as numerical instability. In your case, we might simply compute the wrong decomposition.

In some situations we can restrict to computation in a subring of $\mathbb{R}$ in which the problem disappears. For example, many of the above problems are non-existent when we restrict to $\mathbb{Q}$ or the algebraic numbers.

In exact real-arithmetic there are no numerical errors, as precision always adapts automatically to achieve the desired result. In this setting non-computability really is non-computability. The algorithm will diverge at points of discontinuity. In your case, it will just run forever trying to determine in which of the two cases it is.
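
A toy sketch of what that divergence looks like (a simplified Cauchy-representation model in Python, not a real exact-real-arithmetic library such as iRRAM): a real is given by arbitrarily good rational approximations, and a sign test refines the precision until it can decide, which never happens at $0$.

```python
from fractions import Fraction

# A real number is modelled as a function n -> rational approximation within 2**-n.
def sign(x):
    """Decide the sign of x by refining precision; diverges if x is exactly 0."""
    n = 0
    while True:
        approx, err = x(n), Fraction(1, 2**n)
        if approx - err > 0:
            return +1
        if approx + err < 0:
            return -1
        n += 1   # not yet enough precision; refine -- forever, if x == 0

print(sign(lambda n: Fraction(1, 10**6)))   # terminates (after ~20 refinements) with +1
# sign(lambda n: Fraction(0))               # would loop forever: the point of discontinuity
```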

There are models of real-number computation that pretend we can perform exact zero-testing, notably the Blum-Shub-Smale model. They are often used in computational geometry to side-step questions about non-computability. There are various theorems guaranteeing that a small perturbation of the input can get us out of trouble, at the price of possibly computing the wrong result.

Andrej Bauer
  • 47,834
  • I've known about the uncomputability of discontinuous maps for a long time... – wlad Aug 24 '20 at 12:39
  • Also, you haven't answered my question. Read it again. I said that SVD is considered numerically stable by many sources – wlad Aug 24 '20 at 12:39
  • So numerical stability is not as closely linked to discontinuity as you claim – wlad Aug 24 '20 at 12:40
  • You cannot compute an eigenvector given an eigenvalue. That's a mistake – wlad Aug 24 '20 at 12:40
  • 4
    I thought your question was, and I quote: "So my question is, how do numerical analysts get around this problem? Or why is this apparently not a problem?" What about SVD? Are you asking whether it is computable? – Andrej Bauer Aug 24 '20 at 12:40
  • I know it's not computable. But it's apparently not a problem – wlad Aug 24 '20 at 12:41
  • 1
    Ok, at this point I do not know what I am supposed to answer, sorry. I mean, I can answer "yeah, it seems so", but that's not what you're expecting. – Andrej Bauer Aug 24 '20 at 12:41
  • I'm not sure what I'm asking either, to be honest... – wlad Aug 24 '20 at 12:42
  • I'd say leave the question. If it starts getting negative votes, then maybe delete it. But it's in general useful to let Google slurp it, as the question is a pretty common one. – Andrej Bauer Aug 24 '20 at 12:45
  • 5
    I really liked this answer. – Nik Weaver Aug 24 '20 at 14:21
  • Again, you cannot compute an eigenvector given an eigenvalue $\lambda$. Lemme find a counterexample – wlad Aug 24 '20 at 14:26
  • Actually, the $M(\epsilon)$ in my question provides a counterexample – wlad Aug 24 '20 at 14:27
  • 1
    Computing an eigenvector to a given eigenvalue would be possible if one knew the dimension of the corresponding eigenspace, if I recall correctly. – Arno Aug 24 '20 at 15:11
  • The SVD is indeed perfectly stable and computable for a fixed matrix - in your case, for a fixed value of $\epsilon$. The fact that the singular values and vectors may be discontinuous functions of $\epsilon$ is well known - at least, as an applied mathematician working in industry, it is something that "everybody working in my problem domain knows" if only from practical experience. (It does have practical consequences, but in fact using a more complete model of the underlying physics removes the mathematical discontinuity at $\epsilon = 0$ anyway.) – alephzero Aug 24 '20 at 17:38
  • Thanks @Arno, I removed that false claim about eigenvectors. – Andrej Bauer Aug 24 '20 at 18:53
  • 6
    @alephzero: stability and computability don't make sense for a fixed matrix. There needs to be some sort of a parameter somewhere, because for a fixed matrix there is nothing to compute, as the SVD decomposition is fixed as well. Perhaps I misunderstand your point. – Andrej Bauer Aug 24 '20 at 18:54
  • 1
    @alephzero: I would be amazed if SVD decomposition is really numerically stable (without any hidden assumptions) because I have never seen a procedure that is both non-computable and numerically stable. Can you provide a reference to the claim that SVD is stable? Just to avoid a misunderstanding: by "stable" I mean "stable under small perturbations". – Andrej Bauer Aug 24 '20 at 19:07
  • 4
    @AndrejBauer The confusion here is backwards vs. forwards stability. When people say that the SVD is stable, they mean that it is backwards stable, whereas here you are talking about forwards stability. – Nick Alger Aug 25 '20 at 03:01
  • @NickAlger: Thanks for the clarification. – Andrej Bauer Aug 25 '20 at 08:06
  • The first paragraph is wrong. First of all, the SVD is not a function because it's one-to-many. Therefore, its uncomputability is not reducible to discontinuity in a straightforward way – wlad Aug 25 '20 at 09:57
  • For example, there is no continuous function which given a vector in $\mathbb R^3$ produces a perpendicular vector. But the problem is still computable – wlad Aug 25 '20 at 09:59
  • 1
    I did not mean to be condescending, I apologize if I gave you that impression. I edited the first paragraph and added a bit more information on how exactly this is about discontinuity to address your other remark. But no, I do not remember every single person on the internet that I have communicated with. – Andrej Bauer Aug 25 '20 at 10:21
22

This is primarily an issue of backwards vs. forwards stability. Good SVD algorithms are backwards stable in the sense that the computed singular values and singular vectors are the true singular values and singular vectors of a slightly perturbed problem. You may see this by noting that while $P$ may change drastically as you change $\epsilon$, the product $PDP^T$ changes negligibly.
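
A quick sketch of that statement on the matrices from the question (illustrative NumPy/SciPy code, not the LAPACK internals): even for $\epsilon$ near the breakdown point, the computed factors reproduce the input matrix to machine precision.

```python
import numpy as np
from scipy.linalg import eigh

for eps in (1e-15, -1e-15):
    A = np.array([[min(0.0, eps) + 1.0, max(0.0, eps)],
                  [max(0.0, eps),       1.0 - min(0.0, eps)]])   # M(eps)
    w, P = eigh(A)
    # Backward stability: P @ diag(w) @ P.T reproduces A to ~machine epsilon,
    # even though P itself swaps between very different orthonormal bases as eps changes sign.
    print(np.linalg.norm(P @ np.diag(w) @ P.T - A))
```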

The SVD is not forwards stable when the singular values have small spectral gap, as your example demonstrates and other answers here discuss in more detail.

For more on backwards and forwards stability, see, e.g., this post and the links therein: https://math.stackexchange.com/a/78907/3060

SciPy uses LAPACK; some details on the stability of the algorithm are provided here: https://www.netlib.org/lapack/lug/node97.html

Nick Alger
  • 1,150
  • Now I'm confused: I always thought that backwards stability implies forwards stability. (Indeed, with the definitions I'm familiar with, I seem to remember the proof to be quite simple.) Are there different definitions in play here? – gmvh Nov 14 '22 at 14:57
  • 3
    @gmvh The forward error is bounded by the backwards error times the condition number of the problem. But here the condition number (of the mapping from matrices to their singular vectors) can become arbitrarily large – Nick Alger Nov 14 '22 at 19:58