113

In a lot of computational mathematics and operations research (algorithm design for optimization problems and the like), authors like to use $\langle \cdot, \cdot \rangle$ as opposed to $(\cdot)^T (\cdot)$, even when the space is clearly Euclidean and the operation is clearly the dot product. What is the benefit or advantage of doing so? Is it so that the notation generalizes nicely to other spaces?

Update: Thank you for all the great answers! Will take a while to process...

  • 15
    If the space is explicitly Euclidean, then perhaps to emphasise that the vectors are geometric objects rather than just arrays of numbers? Otherwise, often you are given a vector space with a scalar product, but no natural choice of a basis. – erz Jul 20 '20 at 06:15
  • 5
    a typographic advantage is that you free the superscript for a label; a logical advantage is that the same notation can be used for complex vectors; furthermore, if you replace the , by a | you can insert operators, $\langle v|O|u\rangle$ (Dirac bra-ket notation). – Carlo Beenakker Jul 20 '20 at 06:16
  • 4
    The bracket notation emphasizes the bi-linear nature of the inner product. For instance, you can use a notation like $\langle x, \cdot \rangle$ to indicate that one argument is fixed, or you can give a very clear interpretation of adjointness (i.e., transposition in the standard case) as "moving the operator through (to the other side)": $$\langle Ax, y \rangle = \langle x, A^* y \rangle$$ – Francesco Polizzi Jul 20 '20 at 07:11
  • 26
    The bracket notation is clearly symmetrical while the other is not. And, in fact, the question should be asked the other way around: why use $(\cdot)^T (\cdot)$ instead of brackets? – user1123502 Jul 20 '20 at 07:26
  • 37
    $\langle u;v\rangle$ is explicitly a number, whereas $u^Tv$ is a 1 by 1 matrix :). – Kostya_I Jul 20 '20 at 07:57
  • 4
    Vectors do not have transposes, they are ordered sets of numbers; that Matlab requires them to be written as $1 \times n$ (or $n \times 1$) matrices is a lack of imagination on their part :-) – J.J. Green Jul 20 '20 at 08:10
  • 6
    @Kostya_I There is a canonical isomorphism between numbers and 1x1 matrices. :) – Federico Poloni Jul 20 '20 at 09:22
  • 3
    (Could we please stop writing answers as comments, so that I do not have to write comments-to-answers in comments-to-comments)? – Federico Poloni Jul 20 '20 at 09:24
  • 2
    I think history should be considered. When I learned mathematics and physics w.r.t. notation (about 1970) there were two worlds. There were no programs like Matlab or R. In "Linear algebra" we almost always used the notation $x'y$, and $x$ and $y$ were considered explicitly as column vectors and $x'y$ explicitly as an element of $\mathbb{C}$ or $\mathbb{R}$. Contrary to that, in the lectures on quantum mechanics this notation was never used, but always $\langle x,y \rangle$, with $x,y$ being elements of some Hilbert space. – Dieter Kadelka Jul 20 '20 at 09:28
  • 50
    Also, $\langle \cdot,\cdot\rangle$ looks like a curious face without a mouth, while $(\cdot)^T(\cdot)$... uh, nevermind. – Federico Poloni Jul 20 '20 at 11:33
  • 1
    But writing an answer is much more demanding than writing a comment, so I understand that these were posted as comments. – YCor Jul 20 '20 at 11:46
  • 1
    This notation has always bugged me. It seems that the transpose style notation is used much more in statistics, optimization etc. whereas mathematicians (myself included :) ) seem to prefer $\langle \cdot , \cdot \rangle$ – rubikscube09 Jul 20 '20 at 15:08
  • 5
    Wherever possible, I like to use the more economical "undergraduate, dot-product" notation $x\cdot y$. In one of my papers, with a large number of instances of the inner product of vectors in a real Hilbert space, I used the even more economical notation $(xy)$, without any apparent problems. – Iosif Pinelis Jul 20 '20 at 15:45
  • 31
    I have voted to reopen, on the grounds that any question that can attract an answer such as Tao's is surely worth asking. – Mark Wildon Jul 20 '20 at 20:22
  • 5
    A question with net + 4 upvotes, and answers with collective 59 upvotes and 0 downvotes is closed for being off-topic. Marvelous. Simply marvelous. – Mark L. Stone Jul 21 '20 at 01:21
  • 1
    I think it is great that this question was re-opened. I think notation and thinking about notation is really important. – Mary Sp. Jul 21 '20 at 19:01
  • 3
    @FedericoPoloni a bicycle! – Ziofil Jul 22 '20 at 10:21
  • 2
    Here's some controversy...in the complex case, should the first or second entry be conjugate-linear? – Jon Bannon Jul 22 '20 at 18:31
  • For typesetting such stuff: $\langle x^T, y^T \rangle$! (More generally, if $x$ and $y$ need to carry either superscripts or subscripts, then $\langle \cdot, \cdot \rangle$ seems much nicer aesthetically.) – Suvrit Jul 22 '20 at 19:09
  • 1
    @JonBannon The one you've chosen upfront and stated unambiguously in the "notation and conventions" section. – lisyarus Jul 24 '20 at 07:55
  • Interestingly, in my studies up until now, I have mostly encountered the notation $(\cdot)^T (\cdot)$ in computational topics such as optimization and $\langle \cdot, \cdot \rangle$ in "purer topics". – Qi Zhu Aug 12 '20 at 10:04

10 Answers

337

Mathematical notation in a given mathematical field $X$ is basically a correspondence $$ \mathrm{Notation}: \{ \hbox{well-formed expressions}\} \to \{ \hbox{abstract objects in } X \}$$ between mathematical expressions (or statements) on the written page (or blackboard, electronic document, etc.) and the mathematical objects (or concepts and ideas) in the heads of ourselves, our collaborators, and our audience. A good notation should make this correspondence $\mathrm{Notation}$ (and its inverse) as close to a (natural) isomorphism as possible. Thus, for instance, the following properties are desirable (though not mandatory):

  1. (Unambiguity) Every well-formed expression in the notation should have a unique mathematical interpretation in $X$. (Related to this, one should strive to minimize the possible confusion between an interpretation of an expression using the given notation $\mathrm{Notation}$, and the interpretation using a popular competing notation $\widetilde{\mathrm{Notation}}$.)
  2. (Expressiveness) Conversely, every mathematical concept or object in $X$ should be describable in at least one way using the notation.
  3. (Preservation of quality, I) Every "natural" concept in $X$ should be easily expressible using the notation.
  4. (Preservation of quality, II) Every "unnatural" concept in $X$ should be difficult to express using the notation. [In particular, it is possible for a notational system to be too expressive to be suitable for a given application domain.] Contrapositively, expressions that look clean and natural in the notation system ought to correspond to natural objects or concepts in $X$.
  5. (Error correction/detection) Typos in a well-formed expression should create an expression that is easily corrected (or at least detected) to recover the original intended meaning (or a small perturbation thereof).
  6. (Suggestiveness, I) Concepts that are "similar" in $X$ should have similar expressions in the notation, and conversely.
  7. (Suggestiveness, II) The calculus of formal manipulation in $\mathrm{Notation}$ should resemble the calculus of formal manipulation in other notational systems $\widetilde{\mathrm{Notation}}$ that mathematicians in $X$ are already familiar with.
  8. (Transformation) "Natural" transformation of mathematical concepts in $X$ (e.g., change of coordinates, or associativity of multiplication) should correspond to "natural" manipulation of their symbolic counterparts in the notation; similarly, application of standard results in $X$ should correspond to a clean and powerful calculus in the notational system. [In particularly good notation, the converse is also true: formal manipulation in the notation in a "natural" fashion can lead to discovering new ways to "naturally" transform the mathematical objects themselves.]
  9. etc.

To evaluate these sorts of qualities, one has to look at the entire field $X$ as a whole; the quality of notation cannot be evaluated in a purely pointwise fashion by inspecting the notation $\mathrm{Notation}^{-1}(C)$ used for a single mathematical concept $C$ in $X$. In particular, it is perfectly permissible to have many different notations $\mathrm{Notation}_1^{-1}(C), \mathrm{Notation}_2^{-1}(C), \dots$ for a single concept $C$, each designed for use in a different field $X_1, X_2, \dots$ of mathematics. (In some cases, such as with the metrics of quality in desiderata 1 and 7, it is not even enough to look at the entire notational system $\mathrm{Notation}$; one must also consider its relationship with the other notational systems $\widetilde{\mathrm{Notation}}$ that are currently in popular use in the mathematical community, in order to assess the suitability of use of that notational system.)

Returning to the specific example of expressing the concept $C$ of a scalar quantity $c$ being equal to the inner product of two vectors $u, v$ in a standard vector space ${\bf R}^n$, there are not just two notations commonly used to capture $C$, but in fact over a dozen (including several mentioned in other answers):

  1. Pedestrian notation: $c = \sum_{i=1}^n u_i v_i$ (or $c = u_1 v_1 + \dots + u_n v_n$).
  2. Euclidean notation: $c = u \cdot v$ (or $c = \vec{u} \cdot \vec{v}$ or $c = \mathbf{u} \cdot \mathbf{v}$).
  3. Hilbert space notation: $c = \langle u, v \rangle$ (or $c = (u,v)$).
  4. Riemannian geometry notation: $c = \eta(u,v)$, where $\eta$ is the Euclidean metric form (also $c = u \neg (\eta \cdot v)$ or $c = \iota_u (\eta \cdot v)$; one can also use $\eta(-,v)$ in place of $\eta \cdot v$. Alternative names for the Euclidean metric include $\delta$ and $g$).
  5. Musical notation: $c = u_\flat(v)$ (or $c = u^\flat(v)$).
  6. Matrix notation: $c = u^T v$ (or $c = \mathrm{tr}(vu^T)$ or $c = u^* v$ or $c = u^\dagger v$).
  7. Bra-ket notation: $c = \langle u| v\rangle$.
  8. Einstein notation, I (without matching superscript/subscript requirement): $c = u_i v_i$ (or $c=u^iv^i$, if vector components are denoted using superscripts).
  9. Einstein notation, II (with matching superscript/subscript requirement): $c = \eta_{ij} u^i v^j$.
  10. Einstein notation, III (with matching superscript/subscript requirement and also implicit raising and lowering operators): $c = u^i v_i$ (or $c = u_i v^i$ or $c = \eta_{ij} u^i v^j$).
  11. Penrose abstract index notation: $c = u^\alpha v_\alpha$ (or $c = u_\alpha v^\alpha$ or $c = \eta_{\alpha \beta} u^\alpha v^\beta$). [In the absence of derivatives this is nearly identical to Einstein notation III, but distinctions between the two notational systems become more apparent in the presence of covariant derivatives ($\nabla_\alpha$ in Penrose notation, or a combination of $\partial_i$ and Christoffel symbols in Einstein notation).]
  12. Hodge notation: $c = \mathrm{det}(u \wedge *v)$ (or $u \wedge *v = c \omega$, with $\omega$ the volume form). [Here we are implicitly interpreting $u,v$ as covectors rather than vectors.]
  13. Geometric algebra notation: $c = \frac{1}{2} \{u,v\}$, where $\{u,v\} := uv+vu$ is the anticommutator.
  14. Clifford algebra notation: $uv + vu = 2c1$.
  15. Measure theory notation: $c = \int_{\{1,\dots,n\}} u(i) v(i)\ d\#(i)$, where $d\#$ denotes counting measure.
  16. Probabilistic notation: $c = n {\mathbb E} u_{\bf i} v_{\bf i}$, where ${\bf i}$ is drawn uniformly at random from $\{1,\dots,n\}$.
  17. Trigonometric notation: $c = |u| |v| \cos \angle(u,v)$.
  18. Graphical notations such as Penrose graphical notation, which would use something like $\displaystyle c =\bigcap_{u\ \ v}$ to capture this relation.
  19. etc.

It is not a coincidence that there is a lot of overlap and similarity between all these notational systems; again, see desiderata 1 and 7.
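
For concreteness, here is a minimal Python/NumPy sketch (an added illustration with arbitrary vectors; `np.einsum` here stands in for Einstein notation I) checking that several of these expressions denote the same number:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -1.0, 0.5])

c_pedestrian = sum(u[i] * v[i] for i in range(len(u)))  # sum_i u_i v_i
c_euclidean  = np.dot(u, v)                             # u . v
c_matrix     = u @ v                                    # u^T v (1-D arrays give a scalar directly)
c_trace      = np.trace(np.outer(v, u))                 # tr(v u^T)
c_einstein   = np.einsum('i,i->', u, v)                 # u_i v_i, repeated index summed

assert np.allclose([c_euclidean, c_matrix, c_trace, c_einstein], c_pedestrian)
```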

Each of these notations is tailored to a different mathematical domain of application. For instance:

  • Matrix notation would be suitable for situations in which many other matrix operations and expressions are in use (e.g., the rank one operators $vu^T$).
  • Riemannian or abstract index notation would be suitable in situations in which linear or nonlinear changes of variable are frequently made.
  • Hilbert space notation would be suitable if one intends to eventually generalize one's calculations to other Hilbert spaces, including infinite dimensional ones.
  • Euclidean notation would be suitable in contexts in which other Euclidean operations (e.g., cross product) are also in frequent use.
  • Einstein and Penrose abstract index notations are suitable in contexts in which higher rank tensors are heavily involved. Einstein I is more suited for Euclidean applications or other situations in which one does not need to make heavy use of covariant operations, otherwise Einstein III or Penrose is preferable (and the latter particularly desirable if covariant derivatives are involved). Einstein II is suitable for situations in which one wishes to make the dependence on the metric explicit.
  • Clifford algebra notation is suitable when working over fields of arbitrary characteristic, in particular if one wishes to allow characteristic 2.

And so on and so forth. There is no unique "best" choice of notation to use for this concept; it depends on the intended context and application domain. For instance, matrix notation would be unsuitable if one does not want the reader to accidentally confuse the scalar product $u^T v$ with the rank one operator $vu^T$, Hilbert space notation would be unsuitable if one frequently wished to perform coordinatewise operations (e.g., Hadamard product) on the vectors and matrices/linear transformations used in the analysis, and so forth.

(See also Section 2 of Thurston's "Proof and progress in mathematics", in which the notion of derivative is deconstructed in a fashion somewhat similar to the way the notion of inner product is here.)

ADDED LATER: One should also distinguish between the "one-time costs" of a notation (e.g., the difficulty of learning the notation and avoiding standard pitfalls with that notation, or the amount of mathematical argument needed to verify that the notation is well-defined and compatible with other existing notations), with the "recurring costs" that are incurred with each use of the notation. The desiderata listed above are primarily concerned with lowering the "recurring costs", but the "one-time costs" are also a significant consideration if one is only using the mathematics from the given field $X$ on a casual basis rather than a full-time one. In particular, it can make sense to offer "simplified" notational systems to casual users of, say, linear algebra even if there are more "natural" notational systems (scoring more highly on the desiderata listed above) that become more desirable to switch to if one intends to use linear algebra heavily on a regular basis.

Terry Tao
  • 108,865
  • 31
  • 432
  • 517
  • 3
    The geometric algebra notation can also be concisely written as $\frac{1}{2}\{u,v\}$ where $\{\cdot,\cdot\}$ is the anticommutator. – user76284 Jul 20 '20 at 23:26
  • Fair enough, I have edited the text accordingly. – Terry Tao Jul 21 '20 at 18:38
  • 4
    For what it’s worth, I learned Einstein summation as only summing on pairs of up and down indices. – Aaron Bergman Jul 21 '20 at 19:35
  • 4
    Both variants of Einstein summation are in use, with the formulation you state preferred if one wants to take full advantage of covariance, but the more relaxed formulation suitable for Euclidean contexts in which one will not need to rely much on covariant transformations. See https://en.wikipedia.org/wiki/Einstein_notation#Application . (Personally, if one is going to make heavy use of covariant operations, and particularly covariant derivatives, I would recommend using Penrose abstract summation notation instead, unless one really likes Christoffel symbols for some reason.) – Terry Tao Jul 21 '20 at 20:05
  • 1
    @AaronBergman and Terry, there is in fact an important distinction between summing over repeated indices where both are up or both are down, and summing over repeated indices where one is up and the other is down. And that's my quibble with Terry's description of Einstein notation. – Deane Yang Jul 21 '20 at 21:01
  • 1
    It is often important to distinguish between the contraction of a dual vector and a vector and the inner product of two vectors. In my work, I work without an inner product, so whether the indices are up or down is quite crucial. So in the description of Einstein notation, I would not view the two different notations are being equivalent. Repeated indices that are both up or both down indicates the use of an inner product AND an orthonormal basis. Repeated indices with one up and one down is the contraction of a dual vector with a vector with respect to ANY basis. – Deane Yang Jul 21 '20 at 21:01
  • I have now separated the two (actually three) flavours of Einstein notation in the list for disambiguation. – Terry Tao Jul 21 '20 at 21:07
  • For the musical notation example, $\flat$ is not appropriate as a subscript, as mathematically it is more appropriately interpreted as a function. Alone it cannot identify a member of a series, such as a note from a musical scale. $n$ would be a more appropriate and inclusive subscript, as it can describe series besides the common 12-fold system that $\flat$ suggests. – ctpenrose Jul 22 '20 at 04:05
  • However, a G-clef symbol might be a cute and appropriate subscript instead. – ctpenrose Jul 22 '20 at 04:12
  • 1
    A lot of these "notations" are actually theorems, or rely on them for their well-definedness (e.g., the Hodge "notation"). – darij grinberg Jul 22 '20 at 10:27
  • 9
    Wonderful comments on notation. The mapping 'Notation' is what is usually called "denotational semantics" in computer science, and written [[-]], because programs are indeed 'notation'. – Jacques Carette Jul 22 '20 at 13:55
  • 1
    @ctpenrose The $\flat$ in musical notation is often written as a superscript instead, see https://en.wikipedia.org/wiki/Musical_isomorphism , which brings it in line with other standard functions such as transpose $(\cdot)^T$ or adjoint $(\cdot)^\dagger$. I prefer the variant in which the raising operator $(\cdot)^\sharp$ is superscripted and the lowering operator $(\cdot)_\flat$ is subscripted, to more closely resemble the Einstein and Penrose conventions (even if this deviates a bit more from musical usage). – Terry Tao Jul 22 '20 at 15:27
  • 24
    @darijgrinberg That's a feature, not a bug! See for instance the second part of desiderata 8 (Transformation). Good notation should be able to do a lot of the heavy mathematical lifting for you by efficiently incorporating standard theorems. (Even the bare-bones pedestrian notation implicitly uses the associativity-of-addition theorem.) Constructing such good notation (and establishing well-definedness and compatibility with other notation) can require substantial mathematical effort, but it is a one-off investment that can yield lasting dividends. – Terry Tao Jul 22 '20 at 15:29
  • 1
    @JacquesCarette How wonderfully meta that the discussion of notation applies to the concept of notation itself. I wonder how many other notations for notation there are extant in the literature... . – Terry Tao Jul 22 '20 at 15:36
  • 1
    Strictly speaking, MathML and, to a poorer extent, parts of LaTeX, are notations for notations. Theorem provers (like Isabelle, Agda and Coq) all have notations for defining notations. – Jacques Carette Jul 22 '20 at 15:45
  • @TerryTao yes, the superscript is often used too. My point was to indicate that $\flat$ isn't used with variables but is generally reserved as a modifier for a small set of constants such as the pitch classes {a, b, c, d, e, f, g}. Though it could be interpreted as a function. – ctpenrose Jul 23 '20 at 16:50
  • 2
    That TeX abuse to simulate Penrose notation is very clever. – LSpice Aug 20 '20 at 20:54
    I find the up-down summation much better for checking that an equation even makes sense! Also see Rainich's comments that the last thing you want to do is write it in coordinates ;-) – Jim Stasheff Feb 23 '21 at 20:49
26

One huge advantage, to my mind, of the bracket notation is that it admits 'blanks'. So one can specify the notation for an inner product as $\langle \ , \ \rangle$, and given $\langle \ , \ \rangle : V \times V \rightarrow K$, one can define elements of the dual space $V^\star$ by $\langle u , - \rangle$ and $\langle -, v \rangle$. (In the complex case one of these is only conjugate linear.)

More subjective I know, but on notational grounds I far prefer to write $\langle Au, v \rangle = \langle u, A^\dagger v \rangle$ for the adjoint map rather than $(Au)^t v = u^t (A^tv)$. The former also emphasises that the construction is basis independent. It generalises far better to Hilbert spaces and other spaces with a non-degenerate bilinear form (not necessarily an inner product).

I'll also note that physicists, and more recently anyone working in quantum computing, have taken the 'bra-ket' formulation to the extreme, and use it to present quite intricate eigenvector calculations in a succinct way. For example, here is the Hadamard transform in bra-ket notation:

$$ \frac{| 0 \rangle + |1 \rangle}{\sqrt{2}} \langle 0 | + \frac{| 0 \rangle - |1\rangle}{\sqrt{2}} \langle 1 |. $$

To get the general Hadamard transform on $n$ qubits, just take the $n$th tensor power: this is compatible with the various implicit identifications of vectors and elements of the dual space.
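
To see the bra-ket expression concretely, here is a small Python/NumPy sketch (an added illustration; the helper name `hadamard` is ad hoc) that builds the Hadamard matrix from the outer products above and forms its tensor powers with `np.kron`:

```python
import numpy as np
from functools import reduce

ket0 = np.array([[1.0], [0.0]])   # |0>
ket1 = np.array([[0.0], [1.0]])   # |1>
bra0, bra1 = ket0.T, ket1.T       # <0|, <1|

# H = (|0> + |1>)/sqrt(2) <0|  +  (|0> - |1>)/sqrt(2) <1|
H = (ket0 + ket1) / np.sqrt(2) @ bra0 + (ket0 - ket1) / np.sqrt(2) @ bra1

def hadamard(n):
    """The n-th tensor (Kronecker) power of H: the Hadamard transform on n qubits."""
    return reduce(np.kron, [H] * n)

print(H)            # approximately [[0.707, 0.707], [0.707, -0.707]]
print(hadamard(2))  # the 4x4 Hadamard transform on 2 qubits
```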

Finally, may I issue a plea for everyone to use $\langle u ,v \rangle$, with the LaTeX \langle and \rangle rather than the barbaric $<u,v>$.

Mark Wildon
  • 10,750
  • 3
  • 44
  • 70
  • 6
    The physicists' bra-ket notation is very "type confusing". It's not just making use of a reasonable symbol of scalar product $\langle \,\cdot\, | \,\cdot\, \rangle$ to denote the "metric" dual $\langle v | \,\cdot\, \rangle$ of a vector $v$, or to denote an operator of the form $v\otimes u^{\vee}$ as $v \langle u |$, which would be totally standard for mathematicians. No, they use $|\quad\rangle$ as a sort of blank in which to insert some symbol: substitute any symbol in place of the box in $|\square\rangle$ as in $v_{\square}$, (...) – Qfwfq Jul 20 '20 at 15:06
  • 6
    (...) no matter if such a symbol (inserted inside the ket) denotes a vector itself, as in $| v \rangle$, an eigen-value, as in $| \lambda_i \rangle$, or an index, as in $|\spadesuit\rangle$ or $| 0 \rangle$ or $| \uparrow \rangle$. – Qfwfq Jul 20 '20 at 15:07
  • 7
    I don’t think this is particularly strange. The ket is the vector, what goes in it is the label for the vector. If you have a vector $|v \rangle$, then its Hermitian dual is the bra $\langle v|$. Similarly, the operator is $|v \rangle \langle u|$. $v$ by itself isn’t anything at all, just a label, which could be anything. – Aaron Bergman Jul 20 '20 at 18:50
  • 1
    @Aaron Bergman: when $\psi$ is an $L^2$ function, as is often the case in Q.M., then $\psi\neq |\psi\rangle$ I suppose? And $| \psi_1+\psi_2\rangle\neq |\psi_1\rangle +| \psi_2 \rangle$, I guess? Or are you really saying that the $\psi$ in $| \psi\rangle$ is just "any symbol"? A bit like the $+$ in $v_+$? It does make sense, strictly speaking, sure, but it strikes me as confusing. It also conflicts with the usual mathematical notation $\langle u | v \rangle$ when $u$ and $v$ are the vectors of which we're taking the scalar product. – Qfwfq Jul 20 '20 at 20:15
  • 4
    The latter. $\psi$ isn't a thing, necessarily. The trick in the notation is that, as opposed to $(u,v)$ being the inner product of $u$ and $v$, in physics $\langle u|v \rangle$ is the inner product of $|u\rangle$ and $|v\rangle$. Of course, you can choose the labels to be evocative. For example, if $\psi(x)$ is a function, you could write $\psi(x) = \langle x | \psi \rangle$. Then $|\psi\rangle$ is a vector (and $|x\rangle$ is the real abuse of notation). – Aaron Bergman Jul 20 '20 at 20:23
  • 3
    @Aaron: yes I understand. My impression is that this notation makes most sense if you constrain yourself to only denote "abstract Hilbert space vectors" (whatever abstract means) with a string of symbols always surrounded by a "ket". A bit like when they assume something is an "operator" if and only if it is written $\hat{\square}$ for $\square$ any other symbol. I venture to say that this way of "constraining type" might be seen a bit naif by many mathematicians, though I don't know this for a fact :) – Qfwfq Jul 20 '20 at 20:40
  • 10
    I think it may be best to think of $|\ \rangle$ (resp. $\langle\ |$) as a type conversion (or casting) operator from almost any type to a vector type (resp. covector type). https://en.wikipedia.org/wiki/Type_conversion . Type conversion operators are commonplace in C-type programming languages and I think some form of them could be safely adopted in mathematical notation more often than is currently done in my opinion. – Terry Tao Jul 20 '20 at 23:14
15

An inner product is defined axiomatically, as a function from $V\times V\to k$, where $k$ is a field and $V$ is a $k$-vector space, satisfying the three well-known axioms. The usual notation is $(x,y)$. So when you want to say anything about an arbitrary inner product, you use this notation (or some similar one). $(x,y)=x^*y$ is just one example of an inner product on the space $\mathbb C^n$. There are other examples on the same space, $(x,y)=x^*Ay$ where $A$ is an arbitrary Hermitian positive definite matrix, and there are inner products on other vector spaces.

10

One advantage of $\langle \cdot, \cdot \rangle$ is that you don't have to worry about changes in basis.

Suppose we have a coordinate system $\alpha$ in which our (real) inner product space is explicitly Euclidean, and an alternative coordinate system $\beta$. A vector $v$ is expressed in the coordinate systems as, respectively, the column vectors $[v]_\alpha$ and $[v]_\beta$. Let $P$ denote the change of basis matrix

$$ [v]_\beta = P [v]_\alpha $$

The inner product, which in coordinate system $\alpha$ is $\langle v, v\rangle = [v]_{\alpha}^T [v]_{\alpha}$, is certainly not in general $[v]_\beta^T[v]_\beta$ in the second coordinate system. (It is only so if $P$ is orthogonal.)
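
A quick numerical check of this (an added sketch; the random matrices are only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
v_alpha = rng.standard_normal(3)            # [v]_alpha, coordinates in the Euclidean system

P = rng.standard_normal((3, 3))             # a generic, non-orthogonal change of basis
v_beta = P @ v_alpha                        # [v]_beta = P [v]_alpha

print(v_alpha @ v_alpha)                    # <v, v> computed in alpha coordinates
print(v_beta @ v_beta)                      # generally different: the naive formula fails in beta

# In beta coordinates the inner product is represented by the Gram matrix P^{-T} P^{-1}:
Pinv = np.linalg.inv(P)
print(v_beta @ (Pinv.T @ Pinv) @ v_beta)    # agrees with <v, v> again

Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # an orthogonal change of basis
w_beta = Q @ v_alpha
print(w_beta @ w_beta)                      # an orthogonal P does preserve the naive formula
```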


That said: given any Hilbert space $V$, by the Riesz representation theorem there exists an (anti-)isomorphism from $V$ to its dual space $V^*$. You can certainly choose to call this mapping $v \mapsto v^*$ (in Riemannian geometry contexts this is more usually denoted using the musical isomorphism notation $\flat$ and $\sharp$) and I don't think in this case there are reasons to prefer one to another. But a major caveat if you do things this way is that unless you are working in an orthonormal basis, you cannot associate $v \mapsto v^*$ with the "conjugate transpose" operation on matrices.

Willie Wong
  • 37,551
  • @Federico Poloni: if I understand correctly, $[v]_\alpha$ is the column vector of coordinates of the abstract vector $v$ w.r.t. the basis $\alpha$. So, $[v]_\beta$ is a column vector too, and $[v]_\beta = P[v]_\alpha$ (i.e. there's a small typo). – Qfwfq Jul 20 '20 at 18:13
  • @FedericoPoloni: thanks for pointing out the typo (and thanks to Qfwfq for the correct interpretation.) Typo is now fixed. – Willie Wong Jul 20 '20 at 18:32
  • Real Hilbert spaces, I guess? – LSpice Jul 20 '20 at 23:12
  • @LSpice: why so? Riesz representation works also for complex Hilbert spaces, no? Or do you mean that the "Transpose" notation only works for real ones, whereas for complex you need "conjugate transpose"? Or am I overlooking something else that I mistyped above? – Willie Wong Jul 21 '20 at 14:23
  • I assumed you meant that we have an isomorphism $V \to V^*$ given by $u \mapsto \langle u, {-}\rangle$. If $u$ is in the conjugate-linear slot, then that's an isomorphism $V \to \overline{V^*}$ (the complex conjugate of $V^*$). If $u$ is in the linear slot, then that's an isomorphism from $V$ to $(\overline V)^*$. If you meant something else, then I missed it! – LSpice Jul 21 '20 at 15:07
  • 1
    @LSpice: bah, I forgot to include the "(anti-)". Fixing it now. – Willie Wong Jul 21 '20 at 15:11
  • I fully agree with your last paragraph (and so did Souriau for an entire book), yet one must admit that trouble looms when one wants to start talking about $\langle u,v\rangle = \operatorname{Tr}(u^*v)$ on $\mathfrak{gl}(n,\mathbb{C})$. – Francois Ziegler Jul 21 '20 at 15:41
  • 2
    @FrancoisZiegler: I admit it! The best way out of this conundrum, I think, is to define $\langle u,v\rangle = \mathrm{Tr} u^* v$ even for vectors. :-) This way you also satisfy the type theorists who refuse to identify $1\times 1$ matrices with scalars. – Willie Wong Jul 21 '20 at 16:46
  • @FrancoisZiegler: this is actually super useful when trying to explain to students that when $A\in L(V,V)$ and $x \in V$, the quadratic form $\langle Ax, Ax\rangle$ not only defines a semidefinite product on $V$ (for fixed $A$) but also a semidefinite product on $L(V,V)$ for fixed $x$. – Willie Wong Jul 21 '20 at 16:52
9

This is to expand on my comment in response to Federico Poloni:

$\langle u,v\rangle $ is explicitly a number, whereas $u^Tv$ is a 1 by 1 matrix :).

While it is true that there is a canonical isomorphism between the two, how do you write the expansion of $u$ in an orthonormal basis $\{v_i\}$? Something like $$ u=\sum_i u^Tv_i v_i $$ feels uncomfortable since, if you view everything as matrices, the dimensions do not allow for multiplication. So, I would at least feel a need to insert parentheses, $$ u=\sum_i (u^Tv_i) v_i, $$ to indicate that the canonical isomorphism is applied. But that is still vague-ish while already cancelling any typographical advantages of $u^Tv$.

(I do also share the sentiment that the basis-dependent language is inferior and should be avoided when possible.)
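
A small numerical illustration of the two ways of writing this expansion (an added sketch; here the orthonormal basis $\{v_i\}$ is simply the columns of the $Q$-factor of a random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # columns form an orthonormal basis {v_i}
u = rng.standard_normal(4)

# "scalar times vector" form:  u = sum_i (u^T v_i) v_i
u1 = sum((u @ Q[:, i]) * Q[:, i] for i in range(4))

# "projector" form:  u = (sum_i v_i v_i^T) u, a fully associative matrix product
u2 = sum(np.outer(Q[:, i], Q[:, i]) for i in range(4)) @ u

assert np.allclose(u1, u) and np.allclose(u2, u)
```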

Kostya_I
  • 8,662
  • 9
    You can write it $\sum_j v_j v_j^Tu$. In this form everything is perfectly associative, and you can even factor out nicely the projection matrix $(\sum_j v_j v_j^T)u$. I was also taught to put scalars to the left of column vectors, but now I think it is a lot better to put them on the right, to retain associativity in cases like this one. Actually I think orthogonal projections make a compelling case for the transpose notation, and I have written an answer to discuss it. – Federico Poloni Jul 20 '20 at 11:40
  • @FedericoPoloni, I like your argument, but surely sacrificing commutativity with scalars is a heavy price to pay for associativity? – LSpice Jul 20 '20 at 23:17
  • 4
    One can also use $\mathrm{tr}()$ (or, in more "multiplicative" contexts, $\mathrm{det}()$) to explicitly write the conversion from $1 \times 1$ matrices to scalars if one wished to be unambiguous and pedantically precise. – Terry Tao Jul 20 '20 at 23:28
  • 1
    @FedericoPoloni, besides being formally correct, the notation should reflect the way we think. Your notation is fine if one thinks of vectors as linear maps from a one-dimensional linear space, scalars as linear maps on that space, and multiplication by scalars as pre-composition. I, on the other hand, am used to thinking of vectors as elements of the linear space, and multiplication by scalars as operation applied on these elements, if anything a post-composition with a scalar matrix, not pre-composition. – Kostya_I Jul 21 '20 at 10:17
  • 3
    I can understand, different fields require different ways of thought. But for me scalars-right is 100% the most comfortable version of linear algebra, after one gets over the initial awkwardness. For instance, the eigenvalue/eigenvector relation when written as $Av=v\lambda$ generalizes perfectly to the invariant subspace relation $AV=VB$ and to the full eigendecomposition $AV=VD$, or matrix-vector multiplication $Ax$ corresponds perfectly to the linear combination $\sum_j A^j x_j$ of the columns $A^j$ with coefficients the entries $x_j$. One never needs to move scalars around in this way. – Federico Poloni Jul 21 '20 at 10:30
  • @FedericoPoloni Out of curiosity, how would you establish that the eigenvalues of a matrix are the roots of the characteristic polynomial if one has to always keep scalars on the right? – Terry Tao Jul 21 '20 at 14:50
  • 1
    @TerryTao Good point; I don't have a way to do it that avoids sneaking in somewhere the fact that $(\alpha I)v = v \alpha$. That is a special case of the "moving scalars from vectors to matrices" property of matrix-vector products, $(\alpha A)v = A(v\alpha)$. I'd guess that this property is the crucial point that needs to be used in all proofs, in a more or less hidden way. (One could define eigenvectors directly from invariant subspaces like in Axler's "Linear algebra done right", of course, but that would not solve this problem.) – Federico Poloni Jul 21 '20 at 15:51
  • 2
    In a sense, in this setting the true outlier is the scalar-matrix product $\alpha A$, which does not match the usual rules for compatible dimensions in a product (while $v\alpha$ does, when written in this order). Maybe the formally correct way out is seeing it as a tensor (Kronecker) product $\alpha \otimes A$ with the $1\times 1$ matrix $\alpha$... – Federico Poloni Jul 21 '20 at 15:53
  • 1
    One way to think about it is to interpret a scalar $\alpha$ as a natural transformation from the identity functor on $\mathrm{Vec}$ to itself, that on every vector space $V$ evaluates to the endomorphism $\alpha_V := \alpha 1_V: V \to V$. So if for instance one sees an expression like $S \alpha T$ where $T: U \to V$ and $S: V \to W$ are morphisms in $\mathrm{Vec}$ (i.e. linear transformations), this automatically should be interpreted as $S \alpha_V T$. That is to say, scalars automatically resize to fit whatever square matrix size is needed for the location it is placed in. ... – Terry Tao Jul 21 '20 at 18:24
  • 3
    The naturality of $\alpha$ then amounts to saying that $\alpha$ may be freely moved around as one pleases: $S \alpha T = ST\alpha = \alpha ST$. In fact, this is even a category-theoretic definition of a scalar (as an element of the endomorphism ring of the identity functor on $\mathrm{Vec}$), very much in the spirit of Grothendieck style mathematics. – Terry Tao Jul 21 '20 at 18:24
9

I consider the distinction quite important. There are two separate operations that look superficially like each other but are in fact different.

First, the abstract description. If $V$ is an abstract vector space and $V^*$ is its dual, then there is the natural evaluation operation of $v \in V$ and $\theta \in V^*$, which is commonly written as $$ \langle\theta,v\rangle = \langle v,\theta\rangle $$ No inner product is needed here. If you choose a basis $(e_1, \dots, e_n)$ of $V$ and use the corresponding dual basis $(\eta^1, \dots, \eta^n)$ of $V^*$ and write $v = v^ie_i$ and $\theta = \theta_i\eta^i$, then $$ \langle\theta,v\rangle = \theta_iv^i. $$ The distinction between up and down indices indicates whether the object is a vector or a dual vector ($1$-form).

If $V$ has an inner product and $(e_1, \dots, e_n)$ is an orthonormal basis, then given two vectors $v = v^ie_i, w = w^ie_i \in V$, $$ v\cdot w = v^iw^i $$ Notice that here both indices are up. There is a similar formula for the dot product of two dual vectors. Here, the formula only works if the basis is orthonormal.

How does this look in terms of row and column vectors? My personal convention, a common one, is the following:

  1. When writing the components of a matrix as $A^i_j$, I view the superscript as the row index and the subscript as the column index.
  2. I view a vector $v \in V$ as a column vector, which is why its coefficients are superscripts (and the basis elements are labeled using subscripts).
  3. This means that a dual vector $\theta$ is a row vector, which is why its coefficients are subscripts.
  4. With these conventions $$ \langle \theta,v\rangle = \theta v, $$ where the right side is matrix multiplication. The catch here is that the dual vector has to be the left factor and the vector the right factor. To avoid this inconsistency, I always write either $\langle \theta,v\rangle$ or $\theta_iv^i = v^i\theta_i$. Again, note that these formulas hold for any basis of $V$.
  5. If $V$ has an inner product and $v, w$ are written with respect to an orthonormal basis, then indeed $$ v\cdot w = v^Tw = v^iw^i $$ You can, in fact, lower (or raise) all of the indices and have an implicit sum for any pair of repeated indices. This is, in fact, what Chern would do.
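
A numerical sketch of the distinction (added for illustration; the basis here is random and deliberately not orthonormal):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))       # columns are a (non-orthonormal) basis (e_1, ..., e_n)

x, y = rng.standard_normal(3), rng.standard_normal(3)   # two vectors, standard coordinates
f = rng.standard_normal(3)                               # a dual vector (row of coefficients)

v = np.linalg.solve(B, x)             # components v^i of x in the basis B
w = np.linalg.solve(B, y)             # components w^i of y in the basis B
theta = f @ B                         # components theta_i of f in the dual basis

# the contraction theta_i v^i equals the pairing f(x) in *any* basis
assert np.isclose(np.einsum('i,i->', theta, v), f @ x)

# but v^i w^i only equals the inner product x . y when the basis is orthonormal
print(np.einsum('i,i->', v, w), x @ y)   # generally different for this B
```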

ASIDE: I gotta say that having such precisely defined conventions is crucial to my ability to do nontrivial calculations with vectors and tensors. When I was a graduate student, my PhD advisor, Phillip Griffiths, once asked me, "Have you developed your own notation yet?" I also have to acknowledge that my notation is either exactly or based closely on Robert Bryant's notation.

Deane Yang
  • 26,941
7

The family $F$ of (real) quadratic polynomials is a vector space isomorphic to the vector space $\mathbb{R}^3.$ One way to make $F$ an inner product space is to define $\langle f, g \rangle =\int_a^bf(t)g(t)\,dt$ for some fixed interval $[a,b].$ Instead of quadratic polynomials one might consider all polynomials or all bounded integrable functions. One could also define the inner product as $\langle f, g \rangle =\int_a^bf(t)g(t)\mu(t)dt$ for some weight function $\mu.$ There isn’t a natural role for transposes here.
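
A minimal sketch of this function-space inner product (added for illustration; the quadrature and the specific polynomials are my own choices):

```python
import numpy as np

def inner(f, g, a=0.0, b=1.0, weight=lambda t: 1.0):
    """<f, g> = integral_a^b f(t) g(t) mu(t) dt, via Gauss-Legendre quadrature."""
    t, w = np.polynomial.legendre.leggauss(20)   # nodes and weights on [-1, 1]
    t = 0.5 * (b - a) * t + 0.5 * (b + a)        # map nodes to [a, b]
    w = 0.5 * (b - a) * w
    return np.sum(w * f(t) * g(t) * weight(t))

f = np.polynomial.Polynomial([1, 0, 1])   # 1 + t^2
g = np.polynomial.Polynomial([0, 2])      # 2t

print(inner(f, g))   # integral_0^1 (1 + t^2)(2t) dt = 3/2
```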

  • 1
    Or this is just a special case of the product of matrices with continuously, rather than discretely, indexed rows and columns. :-) – LSpice Jul 20 '20 at 23:14
7

Lots of great answers so far, but I'll add another (hopefully at least good) answer: the notation $v^T u$ makes it somewhat difficult to speak of collections of bilinear pairings depending on a parameter. Typical examples:

  • "Let $\langle \cdot, \cdot \rangle_i$ be a finite set of inner products on a vector space $V$"
  • "Let $\langle \cdot, \cdot \rangle_p$, $p \in M$, be a Riemannian metric on a manifold $M$"
  • "Let $\langle \cdot, \cdot \rangle_t$ be a continuously varying family of inner products on a Hilbert space $H$"

These are all difficult to express using the transpose notation. The closest you can get is to write, for instance, $v^T A_i u$ where $A_i$ is a family of matrices, but particularly when one is speaking of continuously varying families of inner products you run into all sorts of difficult issues with coordinate systems, and it becomes very difficult to keep things straight.
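
As a sketch (the toy metric below is my own assumption, not from the answer), a parameter-dependent inner product in code is just a function of the parameter that returns a positive definite matrix, which matches the bracket-with-subscript notation directly:

```python
import numpy as np

def metric(p):
    """A toy family of inner products <.,.>_p: a positive definite matrix varying with p."""
    x, y = p
    return np.array([[1.0 + x**2, x * y],
                     [x * y,      1.0 + y**2]])

def inner(p, u, v):
    """<u, v>_p, the member of the family indexed by the point p."""
    return u @ metric(p) @ v

p = np.array([0.5, -1.0])
u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(inner(p, u, v))   # equals metric(p)[0, 1] = x*y = -0.5
```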

Paul Siegel
  • 28,772
6

Maybe it's worth mentioning that the computer language APL has a "generalized" inner product where you can use any two functions of two arguments (i.e., "dyadic functions" in APL terms) to form an inner product. Thus, for example, the ordinary inner product is written as "A+.xB", which can apply to two arrays A, B of any dimension whatsoever (vectors, matrices, three-dimensional arrays, etc.), provided that the last dimension of A matches the first dimension of B.

Thus, for example, A^.=B represents string matching of A against B, Ax.*B evaluates a number given its prime divisors A and prime factorization exponents B, etc.
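
A rough Python sketch of APL's generalized inner product for plain vectors (the helper name `inner_product` is mine; APL's operator also handles higher-dimensional arrays, which this sketch does not):

```python
from functools import reduce

def inner_product(f, g, A, B):
    """APL-style A f.g B for two sequences: combine elementwise with g, then reduce with f."""
    return reduce(f, (g(a, b) for a, b in zip(A, B)))

# ordinary inner product, A +.x B
print(inner_product(lambda x, y: x + y, lambda x, y: x * y, [1, 2, 3], [4, 5, 6]))   # 32

# string matching, A ^.= B : do all characters agree?
print(inner_product(lambda x, y: x and y, lambda x, y: x == y, "cat", "cat"))        # True

# A x.* B : a number from its prime divisors A and exponents B
print(inner_product(lambda x, y: x * y, lambda x, y: x ** y, [2, 3, 5], [3, 1, 2]))  # 600
```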

The authors of APL, Iverson and Falkoff, cared intensely about notation and tried to find the most general interpretation of every new item they added to the language.

5

I do not see a compelling argument for $\langle \cdot, \cdot \rangle$ over $(\cdot)^T(\cdot)$, or, better, $(\cdot)^*(\cdot)$, so that the star operator can be generalized to other more complicated settings (complex vectors, Hilbert spaces with a dual operation).

Let me summarize the arguments in the comments:

  • emphasizes vectors as geometric objects: not clear why $u^*v$ is less geometric.
  • free space for a superscript: I agree, that is an argument in favor of $\langle \cdot, \cdot \rangle$. In a setting where I need many superscripts, I would probably favor that notation.
  • emphasizes bilinearity: disagree. In the complex case, it makes a lot less clear why one of these two arguments is not like the other and implies a conjugation, and it does not make clear which one it is: is $\langle \lambda u,v \rangle$ equal to $\lambda\langle u,v \rangle$ or to $\overline{\lambda}\langle u,v \rangle$? Is there a way to recall it other than remembering it?
  • Leaves room for an operator and gives a clear interpretation of adjointness: I find $(Au)^*v=u^*A^*v = u^*(A^*v)$ equally clear, and it relies only on manipulations that are well ingrained in the minds of mathematicians.
  • Gives an interpretation for the linear functional $\langle u, \cdot \rangle$: but what is $u^*$ or $u^T$ if not a representation for that same linear functional?

An advantage of the $u^*v$ notation, in my view, is that it makes clear that some properties are just a consequence of associativity. Consider for instance the orthogonal projection onto the space orthogonal to $u$:

$$Pv = (I-uu^*)v = v - u(u^*v).$$

If one writes it as $v - \langle v,u \rangle u$ (especially by putting the scalar on the left as is customary), it is less clear that it is equivalent to applying the linear operator $I-uu^*$ to the vector $v$. Also, the notation generalizes nicely to repeated projections $$ (I-u_1u_1^* - u_2u_2^*)v = (I - \begin{bmatrix}u_1 & u_2\end{bmatrix}\begin{bmatrix}u_1^* \\ u_2^*\end{bmatrix})v = (I - UU^*)v. $$
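
A short numerical check of these identities (an added sketch with arbitrary random vectors):

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal(5)
u /= np.linalg.norm(u)                  # unit vector
v = rng.standard_normal(5)

# the two readings of P v = (I - u u^*) v = v - u (u^* v)
Pv_matrix = (np.eye(5) - np.outer(u, u)) @ v
Pv_assoc  = v - u * (u @ v)
assert np.allclose(Pv_matrix, Pv_assoc)

# repeated projections: (I - U U^*) v with U = [u_1  u_2]
U, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # two orthonormal columns u_1, u_2
assert np.allclose(v - U @ (U.T @ v), (np.eye(5) - U @ U.T) @ v)
```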

A disadvantage, of course, is working with spaces of matrices, where transposes already have another meaning; for instance, working with the trace scalar product $\langle A,B \rangle := \operatorname{Tr}(A^TB)$ one really needs the $\langle A,B \rangle$ notation.

  • 4
    I stand by my comment that bilinearity is better emphasized (in the real case) by using the bracket notation. It is clearly more flexible and general: you can surely write $$\langle f, g\rangle = \int_a^b f(x)g(x)\,\mathrm{d}x$$ and I think it would be awkward to write $f^*g$ for the same bilinear form on functions (this could be easily confused with the pull-back, for instance). – Francesco Polizzi Jul 20 '20 at 12:07
  • 1
    There is definitely no way to recall whether the $u$ slot or the $v$ slot in $\langle u, v\rangle$ is linear other than by remembering it, because it is a convention that different fields, and even different mathematicians, fix differently. – LSpice Jul 20 '20 at 23:16
  • 2
    @LSpice: or even the same mathematician, on different days of the week... – Willie Wong Jul 22 '20 at 00:33
  • 2
    @WillieWong, "I don't care if Monday's blue, Tuesday's $v$ and Wednesday's $u$ …." – LSpice Jul 22 '20 at 03:26