22

If $\frac{d}{dx}$ is a differential operator, what are its inputs? If the answer is "(differentiable) functions" (i.e., variable-agnostic sets of ordered pairs), we have difficulty distinguishing between $\frac{d}{dx}$ and $\frac{d}{dt}$, which in practice have different meanings. If the answer is "(differentiable) functions of $x$", what does that mean? It sounds like a peculiar hybrid of mathematical object (function) with mathematical notation (variable $x$).

Does $\frac{d}{dx}$ have an interpretation as an operator, distinct from $\frac{d}{dt}$, and consistent with its use in first-year Calculus?

Wojowu
    What does your question have to do with logic? People do write $\frac{d}{dx} f$ sometimes. And $\frac{d}{dx}$ and $\frac{d}{dt}$ are frequently seen as essentially identical in these contexts. Many calculus textbooks where functions are not synonymous with formulas have this outlook. – Ryan Budney Dec 04 '12 at 16:27
  • 12
    I don't think this is, at least as stated, a good question for MO (and apparently I am not the only one, because I did not go as far as to downvote). Also, I would challenge some of your assumptions. The notation $\frac{df}{dx}$ is used very frequently in the books I'm reading, for instance. Also, does $\frac{d}{dx}$ denote a single operator? As Poincare remarked: "Mathematics is the art of giving the same name to different things". – Thierry Zell Dec 04 '12 at 16:30
  • 1
    Thank you for the criticisms. I have rephrased to clarify my intended meaning. – Jason Howald Dec 04 '12 at 16:42
  • 11
    Mathematics is a human activity, hence it often benefits from some http://en.wikipedia.org/wiki/Abuse_of_notation – Qfwfq Dec 04 '12 at 17:14
  • 10
    Jason -- you could try asking this on math.stackexchange.com; there is a chance you'll get a detailed answer there. – algori Dec 04 '12 at 17:39
  • 28
    I find this question to be both deeper and more interesting than it appears by the comments that some others here do. So I have voted to reopen, and would look forward to reading a thoughtful answer that takes the issue seriously. – Joel David Hamkins Dec 04 '12 at 18:10
  • 3
    I've voted to reopen as well, as I would be interested to hear what Joel Hamkins has to say about this question. – algori Dec 04 '12 at 18:35
  • 2
    As long as we are only looking at functions in one variable, $d/dx$ and $d/dt$ mean the same thing, and one of them is redundant. As soon as we look at functions in, say, two variables, we implicitly introduce an (arbitrary) order of variables, say $x$ is the first and $t$ the second, and $d/dx$ is the derivative with respect to the first variable. This makes sense even for variable-agnostic sets of ordered pairs. True, it is an abuse of notation, but $(d/dx, d/dt)$ is sometimes just more readable than $(D_1, D_2)$. – Goldstern Dec 04 '12 at 18:37
  • 2
    Thanks, algori, although I was hoping to read an answer, rather than write one. – Joel David Hamkins Dec 04 '12 at 18:38
  • 1
    @Goldstern gives a serious answer, but (basically) the same argument can be stated in an entertaining way; see Darsh Ranjan's answer to this question: http://mathoverflow.net/questions/1083/do-good-math-jokes-exist-closed – Margaret Friedland Dec 04 '12 at 20:26
  • 6
    Of course we all understand the basic issue---none of us is confused about any elementary matter---but I feel that the explanations given don't yet get at the conflation of syntax and semantics that the question is about. For example, Goldstern says "$d/dx$ and $d/dt$ mean the same thing", but I find it unlikely that he would write ${d/dt} (x^2)=2x$ in place of ${d/dx}(x^2)=2x$ for his calculus students, without further remarks. It is the precise nature of this particular confusion that the question is about. For example, the $\lambda$-calculus is quite insistent about avoiding this collision. – Joel David Hamkins Dec 04 '12 at 20:50
  • 32
    I've always thought of $x$ as just a choice of coordinate on the $1$-manifold $\mathbb R$ (i.e. picking a diffeomorphism with a "standard" copy of $\mathbb R$). Without choosing coordinates, we have for each function $f:\mathbb R \to \mathbb R$ a linear operator $df$ on each tangent space. Picking a coordinate function $x$ allows us to express these as numbers, hence we get a function $df/dx$. – Sam Gunningham Dec 04 '12 at 23:41
  • I also would like to see this reopened. I'm a little confused though -- JDH and algori both say they have reopened, but the reopen counter is only at 1 (my click). Are we vote trading? http://tea.mathoverflow.net/discussion/506 I accept and follow the vote trading protocol, but it hasn't been clearly initiated. – David E Speyer Dec 05 '12 at 14:15
  • @David Speyer: please just check the revision history, it was reopened and reclosed. (I had a comment to that effect but decided it was too unfriendly so deleted it after some rethinking but did not manage to think up a friendly one, so there is no trace of the reclosure.) The meta thread you link to was declared essentially obsolete a long time ago. But in any case those written votes are effective/technical votes so it would not even apply regardless. –  Dec 05 '12 at 14:21
  • 6
    Well, I'm not satisfied by the current answers. I've written up an answer, which I will post if the question is reopened again. – Joel David Hamkins Dec 05 '12 at 15:14
  • I totally agree with Sam, though I would tend to say that we're working with the manifold $\mathbb{R}$, together with a volume form $vol=dx$, so one could write this operator as $d/vol$, if so desired. – Peter Dalakov Dec 05 '12 at 17:49
  • 5
    I posted my answer, such as it is, on my blog at http://jdh.hamkins.org/the-differential-operator-ddx-binds-variables/. – Joel David Hamkins Dec 05 '12 at 22:39
  • 2
    Since the question opened up again, I've now posted here. Thanks! – Joel David Hamkins Dec 06 '12 at 07:01
  • 3
    I'm surprised to see such a lengthy discussion of this notation with no mention of the history. To me, this is a case where historical changes have led to the reinterpretation of an elegant old notation in an awkward new way. The Leibniz notation $dy/dx$ was originally meant to represent the quotient of two infinitesimal changes. After Cauchy-Weierstrass, people wanted to make $d/dx$ an operator. It doesn't quite make sense because it's an attempt to retrofit an old notation. There's a similar awkwardness in NSA, where we'd like $dy/dx$ to mean the standard part of the quotient $dy/dx$. –  Dec 07 '12 at 16:09
  • 2
    Btw: the OP is in very good company with his question: https://math.stackexchange.com/questions/1258923/what-did-alan-turing-mean-when-he-said-he-didnt-fully-understand-dy-dx – Michael Bächtold Aug 12 '18 at 16:33
  • At least we have given up 19th-century notations like $$\frac{d^n\,0^m}{d0^n}$$ – Gerald Edgar Oct 22 '23 at 23:38

10 Answers

48

(From the post on my blog:)

To my way of thinking, this is a serious question, and I am not really satisfied by the other answers and comments, which seem to answer a different question than the one that I find interesting here.

The problem is this. We want to regard $\frac{d}{dx}$ as an operator in the abstract senses mentioned by several of the other comments and answers. In the most elementary situation, it operates on functions of a single real variable, returning another such function, the derivative. And the same for $\frac{d}{dt}$.

The problem is that, described this way, the operators $\frac{d}{dx}$ and $\frac{d}{dt}$ seem to be the same operator, namely, the operator that takes a function to its derivative, but nevertheless we cannot seem freely to substitute these symbols for one another in formal expressions. For example, if an instructor were to write $\frac{d}{dt}x^3=3x^2$, a student might object, "don't you mean $\frac{d}{dx}$?" and the instructor would likely reply, "Oh, yes, excuse me, I meant $\frac{d}{dx}x^3=3x^2$. The other expression would have a different meaning."

But if they are the same operator, why don't the two expressions have the same meaning? Why can't we freely substitute different names for this operator and get the same result? What is going on with the logic of reference here?

The situation is that the operator $\frac{d}{dx}$ seems to make sense only when applied to functions whose independent variable is described by the symbol "x". But this collides with the idea that what the function is at bottom has nothing to do with the way we represent it, with the particular symbols that we might use to express which function is meant. That is, the function is the abstract object (whether interpreted in set theory or category theory or whatever foundational theory), and is not connected in any intimate way with the symbol "$x$". Surely the functions $x\mapsto x^3$ and $t\mapsto t^3$, with the same domain and codomain, are simply different ways of describing exactly the same function. So why can't we seem to substitute them for one another in the formal expressions?

The answer is that the syntactic use of $\frac{d}{dx}$ in a formal expression involves a kind of binding of the variable $x$.

Consider the issue of collision of bound variables in first order logic: if $\varphi(x)$ is the assertion that $x$ is not maximal with respect to $\lt$, expressed by $\exists y\ x\lt y$, then $\varphi(y)$, the assertion that $y$ is not maximal, is not correctly described as the assertion $\exists y\ y\lt y$, which is what would be obtained by simply replacing the occurrence of $x$ in $\varphi(x)$ with the symbol $y$. For the intended meaning, we cannot simply syntactically replace the occurrence of $x$ with the symbol $y$, if that occurrence of $x$ falls under the scope of a quantifier.
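
The collision can be checked mechanically on a small model. The following is a minimal sketch, assuming a three-element linear order as the domain; the names `DOMAIN`, `not_maximal`, `naive_substitution` and `renamed_substitution` are illustrative and not part of the argument above.

```python
# Illustrative assumption: a three-element linear order 0 < 1 < 2.
DOMAIN = range(3)

def not_maximal(x):
    """phi(x): there exists y in DOMAIN with x < y."""
    return any(x < y for y in DOMAIN)

def naive_substitution():
    """Textually replacing x by y in 'exists y, x < y' gives 'exists y, y < y'."""
    return any(y < y for y in DOMAIN)

def renamed_substitution(y):
    """Rename the bound variable to z first, then substitute: 'exists z, y < z'."""
    return any(y < z for z in DOMAIN)
```

Here `naive_substitution()` is false no matter what, whereas `renamed_substitution(y)` agrees with `not_maximal(y)` for every element: the naive version has let the quantifier capture the substituted variable.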

Similarly, although the functions $x\mapsto x^3$ and $t\mapsto t^3$ are equal as functions of a real variable, we cannot simply syntactically substitute the expression $x^3$ for $t^3$ in $\frac{d}{dt}t^3$ to get $\frac{d}{dt}x^3$. One might even take the latter as a kind of ill-formed expression, without further explanation of how $x^3$ is to be taken as a function of $t$.

So the expression $\frac{d}{dx}$ causes a binding of the variable $x$, much like a quantifier might, and this prevents free substitution in just the way that collision does. But the case here is not quite the same as the way $x$ is a bound variable in $\int_0^1 x^3\ dx$, since $x$ remains free in $\frac{d}{dx}x^3$, but we would say that $\int_0^1 x^3\ dx$ has the same meaning as $\int_0^1 y^3\ dy$.

Of course, the issue evaporates if one uses a notation, such as the $\lambda$-calculus, which insists that one be completely explicit about which syntactic variables are to be regarded as the independent variables of a functional term, as in $\lambda x.x^3$, which means the function of the variable $x$ with value $x^3$. And this is how I take several of the other answers to the question, namely, that the use of the operator $\frac{d}{dx}$ indicates that one has previously indicated which of the arguments of the given function is to be regarded as $x$, and it is with respect to this argument that one is differentiating. In practice, this is almost always clear without much remark. For example, our use of $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial y}$ seems to manage very well in complex situations, sometimes with dozens of variables running around, without adopting the onerous formalism of the $\lambda$-calculus, even if that formalism is what these solutions are essentially really about.
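
A programming language with explicit binders behaves the way the $\lambda$-calculus does here. The following is a minimal sketch, assuming a central-difference approximation stands in for differentiation; `derivative` and the step size `h` are hypothetical names chosen for illustration.

```python
def derivative(f, h=1e-6):
    """Return a numerical approximation to f' by central differences."""
    def f_prime(a):
        return (f(a + h) - f(a - h)) / (2 * h)
    return f_prime

cube_x = lambda x: x ** 3   # the function x |-> x^3
cube_t = lambda t: t ** 3   # the function t |-> t^3: same function, different bound name

d_cube_x = derivative(cube_x)
d_cube_t = derivative(cube_t)

# Reading "x^3" as a function of t, with x an introduced constant,
# gives a constant function of t, whose derivative vanishes.
x = 2.0
const_in_t = lambda t: x ** 3
d_const = derivative(const_in_t)
```

The operator `derivative` cannot tell `cube_x` from `cube_t`, since the bound name is invisible from outside the `lambda`. The ambiguity in $\frac{d}{dt}x^3$ only disappears once one writes down which `lambda` is meant.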

Meanwhile, it is easy to make examples where one must be very specific about which variables are the independent variable and which are not, as Todd mentions in his comment to David's answer. For example, cases like

$$\frac{d}{dx}\int_0^x(t^2+x^3)dt\qquad \frac{d}{dt}\int_t^x(t^2+x^3)dt$$

are surely clarified for students by a discussion of the usage of variables in formal expressions and more specifically the issue of bound and free variables.
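
As a worked sketch of the first expression: $t$ is bound by the integral and $x$ is free in the integrand, so

$$\int_0^x (t^2+x^3)\,dt = \left[\frac{t^3}{3} + x^3 t\right]_{t=0}^{t=x} = \frac{x^3}{3} + x^4, \qquad \frac{d}{dx}\int_0^x (t^2+x^3)\,dt = x^2 + 4x^3.$$

The second expression uses $t$ both as a limit and as the variable of integration, and the notation alone does not settle how to read it. On one reading (an assumption made here for illustration), the $t$ in the integrand is the bound one, and renaming it to $s$ while holding $x$ fixed gives

$$\int_t^x (s^2+x^3)\,ds = \frac{x^3}{3} + x^4 - \frac{t^3}{3} - x^3 t, \qquad \frac{d}{dt}\int_t^x (s^2+x^3)\,ds = -t^2 - x^3.$$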

  • I assume you still define integration as an operation on a function, as opposed to a purely syntactic process (which for calculus would mostly be possible). So, if you write $\int_{0}^x x^3 dt$ and a student asks: please explain precisely which function is meant (with domain of definition and all that) by $x^3$, what do you reply? Same question for $7$ in $\int_{0}^x 7 dt$. –  Dec 06 '12 at 15:31
  • Well, of course I would give the answer explaining precisely which function I had meant; the notation is ambiguous without doing so, and this is the point of the question and my answer. One can imagine a variety of perfectly reasonable answers that would correspond to different intended meanings here. The discussion of whether $x$ in the integrand was meant to depend on $t$ or be an independent variable from $t$ and so on was the discussion I alluded to at the end of my answer. The part of the question interesting me is the precise nature of this particular ambiguity. – Joel David Hamkins Dec 06 '12 at 15:48
  • I did not mean anything fancy. The situation I envision is just that $x^3$ does not depend on $t$, for a result of $x^4$, and $7x$ in the latter case. But what is the precise nature of the function in the integral? In particular, please make it so that the $d/dx$ application you give still makes sense. –  Dec 06 '12 at 16:07
  • In a now-deleted answer, "none" says, "The issue of switching variables and messing up the answer has been called "perturbation confusion" in automatic differentiation (AD) and it's discussed at some length here (pdf): http://www.bcl.hamilton.ie/~qobi/nesting/papers/ifl2005.pdf and here (sigfpe blog): http://blog.sigfpe.com/2011/04/perturbation-confusion-confusion.html . It is apparently sometimes a (pun not intended) confusing issue. AD itself is discussed by sigfpe here: http://blog.sigfpe.com/2005/07/automatic-differentiation.html ." – S. Carnahan Dec 06 '12 at 16:20
  • The function to be integrated in your case, as I am sure you know very well (and so it is hard to take you seriously here), is the function with constant value $x^3$, constant as a function of $t$. One can think multivariately, but this is not actually necessary, for it suffices to have a separate function specific to the value of $x$, which in effect here is an introduced constant, as with the coefficients in $ax^2+bx+c$. The subsequent differentiation with respect to $x$ is with respect to the function $x\mapsto x^4$, on your set-up. But I don't think you are confused about it... – Joel David Hamkins Dec 06 '12 at 16:24
  • To use the $\lambda$-calculus notation, the multivariate approach is dealing with the binary function $\lambda x,t.x^3$, and the scheme-of-unary-functions approach involves the function-valued function $\lambda x.(\lambda t.x^3)$. The introduced constant concept is the role played by $G$, when one wants to prove a theorem about all groups, and begins, "Let $G$ be a group". – Joel David Hamkins Dec 06 '12 at 17:01
  • 5
    I am grateful to Joel for his support of the question, including this interesting answer. Certainly $\frac{d}{dx}$ is similar to a quantifier: It "shields" occurrences of the variable $x$ in its scope from direct substitution. It is defined in terms of the limit, which also binds a variable, as a quantifier could. It is a very strange quantifier, though, as $x$ once again occurs free in the ("bound"?) expression $\frac{d}{dx} x^3$ since $\frac{d}{dx} x^3 = 3x^2$. – Jason Howald Dec 06 '12 at 19:35
  • 4
    I agree with this answer. While professional mathematicians differentiate functions, in basic calculus we differentiate expressions. – Donu Arapura Dec 07 '12 at 14:58
  • 1
    this last comment by @DonuArapura is all this is about. d/dx can be applied to expressions (hopefully containing x), but without abuse of notation (or fancy definitions with coordinate functions) not to functions. since some people have a tendency to identify f with f(x) and thereby functions with expressions, they can apply d/dx to f. – peter Sep 06 '18 at 18:25
  • 2
    Even without the differentiation $\dfrac{d}{dt}$, I'm not really sure how I would approach the integral $\displaystyle \int_t^x (t^2+x^3) dt$. Using $t$ both as a limit of integration and also as the variable of integration is a little confusing. – Zach Teitler Feb 23 '23 at 23:09
34

Not sure why this question is back on the front page, but I just wanted to add that the situation seems to be clarified by temporarily generalising to higher dimensions and to curved spaces, i.e., by taking a differential geometry perspective.

Firstly, a quick reminder of the concept of a dual basis in linear algebra: if one has an $n$-dimensional vector space $V$ (let's say over the reals ${\bf R}$ for sake of discussion), and one has a basis $e^1,\dots,e^n$ of it, then there is a unique dual basis $e_1,\dots,e_n$ of the dual space $V^* = \mathrm{Hom}(V,{\bf R})$, such that $e_i(e^j) = \delta_i^j$ for all $i,j=1,\dots,n$ ($\delta_i^j$ being the Kronecker delta, and where I am trying to choose subscripts and superscripts in accordance with Einstein notation). It is worth pointing out that while each dual basis element $e_i$ is "dual" to its counterpart $e^i$ in the sense that $e_i(e^i) = 1$, $e_i$ is not determined purely by $e^i$ (except in the one-dimensional case $n=1$); one must also know all the other vectors in the basis besides $e^i$ in order to calculate $e_i$.

In a similar spirit, whenever one has an $n$-dimensional smooth manifold $M$, and (locally) one has $n$ smooth coordinate functions $x^1,\dots,x^n: M \to {\bf R}$ on this manifold, whose differentials $dx^1,\dots,dx^n$ form a basis of the cotangent space at every point $p$ of the manifold $M$, then (locally at least) there is a unique "dual basis" of derivations $\partial_1,\dots,\partial_n$ on $C^\infty(M)$ with the property $\partial_i x^j = \delta_i^j$ for $i,j=1,\dots,n$. (By the way, proving this claim is an excellent exercise for someone who really wants to understand the modern foundations of differential geometry.)

Now, traditionally, the derivation $\partial_i$ is instead denoted $\frac{\partial}{\partial x^i}$. But the notation is a bit misleading as it suggests that $\frac{\partial}{\partial x^i}$ only depends on the $i^{th}$ coordinate function $x^i$, when in fact it depends on the entire basis $x^1,\dots,x^n$ of coordinate functions. One can fix this by using more complicated notation, e.g., $\frac{\partial}{\partial x^i}|_{x^1,\dots,x^{i-1},x^{i+1},\dots,x^n}$, which informally means "differentiate with respect to $x^i$ while holding the other coordinates $x^1,\dots,x^{i-1},x^{i+1},\dots,x^n$ fixed". One sees this sort of notation for instance in thermodynamics. Of course, things are much simpler in the one-dimensional setting $n=1$; here, any coordinate function $x$ (with differential $dx$ nowhere vanishing) gives rise to a unique derivation $\frac{d}{dx}$ such that $\frac{d}{dx} x = 1$.

With this perspective, we can finally answer the original question. The symbol $x$ refers to a coordinate function $x: M \to {\bf R}$ on the one-dimensional domain $M$ that one is working on. Usually, one "simplifies" things by identifying $M$ with ${\bf R}$ (or maybe a subset thereof, such as an interval $[a,b]$) and setting $x$ to be the identity function $x(p) = p$, but here we will adopt instead a more differential geometric perspective and refuse to make this identification. The inputs to $\frac{d}{dx}$ are smooth (or at least differentiable) functions $f$ on the one-dimensional domain $M$. Again, one usually "simplifies" things by thinking of $f$ as functions of the coordinate function $x$, but really they are functions of the position variable $p$; this distinction between $x$ and $p$ is usually obscured due to the above-mentioned "simplification" $x(p)=p$, which is convenient for calculation but causes conceptual confusion by conflating the map with the territory.

Thus, for instance, the identity $$ \frac{d}{dx} x^2 = 2x$$ should actually be interpreted as $$ \frac{d}{dx} (p \mapsto x(p)^2) = (p \mapsto 2x(p)),$$ where $p \mapsto x(p)^2$ denotes the function that takes the position variable $p$ to the quantity $x(p)^2$, and similarly for $p \mapsto 2x(p)$.

If one also had another coordinate $t: M \to {\bf R}$ on the same domain $M$, then one would have another differential $\frac{d}{dt}$ on $M$, which is related to the original differential $\frac{d}{dx}$ by the usual chain rule $$ \frac{d}{dt} f = \left(\frac{d}{dt} x\right) \left(\frac{d}{dx} f\right).$$ Again, for conceptual clarity, $t, x, f: M \to {\bf R}$ should all be viewed here as functions of a position variable $p \in M$, rather than being viewed as functions of each other.
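
The chain rule above can be checked numerically. The following is a minimal sketch, assuming $M=(0,\infty)$ is parametrised by a real number $p$ purely so that it can be computed with, and that central differences stand in for the differential; the helpers `d_dp` and `d_by` and the particular choice $t = x^2$ are illustrative assumptions.

```python
def d_dp(g, h=1e-6):
    """Central-difference derivative of g with respect to the position p."""
    return lambda p: (g(p + h) - g(p - h)) / (2 * h)

def d_by(coord):
    """The operator d/d(coord): differentiate in p, then divide by d(coord)/dp."""
    def operator(f):
        return lambda p: d_dp(f)(p) / d_dp(coord)(p)
    return operator

x = lambda p: p          # one coordinate function on M
t = lambda p: p ** 2     # another coordinate function on M (dt is nonzero for p > 0)
f = lambda p: x(p) ** 2  # the function written "x^2", as a function of position

d_dx = d_by(x)
d_dt = d_by(t)

p0 = 3.0
lhs = d_dt(f)(p0)                # (d/dt) f
rhs = d_dt(x)(p0) * d_dx(f)(p0)  # (d/dt x) (d/dx f)
```

With these choices `d_dx(f)(p0)` is close to $2x = 6$, while `d_dt(f)(p0)` is close to $1$, as expected since $f = t$ here. The two operators act on the same function of position and give different answers, and `lhs` agrees with `rhs` up to rounding.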

Terry Tao
  • 3
    The answer to why this thread is on the front page now is that Michael Bächtold has been reactivating interest in these issues recently: see his answer here which links to some related threads he's interested in. Some of the people participating in those threads are also recently participating here (e.g., Mike Shulman in comments). – Todd Trimble Aug 21 '18 at 02:50
  • Ah, thanks for clearing that up. And now I see that Michael's answer in fact has a lot of overlap with mine. – Terry Tao Aug 21 '18 at 03:00
  • 2
    I noticed that you switched from $\partial/\partial{x^1}$ to $d/dx$ when $n=1$ without comment. We might think of it as a traditional variation in notation, but it can also be interpreted as taking the differential and then dividing by $dx$. Dividing by a vector (in the cotangent space) is unusual, but it makes perfect sense in $1$ dimension, and then it's a theorem that the two interpretations of $d/dx$ are equivalent. (Even in $1$ dimension, you can't divide by the zero vector, but $dx$ cannot be zero anywhere if $x$ is a coordinate, so that's OK.) – Toby Bartels Aug 23 '18 at 15:42
  • With your interpretation of $\frac{\text{d}}{\text{d}x}$, the identity $\frac{\text{d}}{\text{d}t} x^2 = 2x$ would be true (if taken as a shorthand for the longer $\frac{\partial}{\partial t} (p \mapsto p^2) = (p \mapsto 2p)$), since in almost any context it is understood that the coordinates $x = t = \operatorname{id}$ on $\mathbb{R}$. But the whole premise of the question is why $\frac{\text{d}}{\text{d}x} \neq \frac{\text{d}}{\text{d}t}$... – Jannik Pitt Mar 25 '23 at 12:59
  • 1
    @JannikPitt, re, if we are in a context in which it is understood that $x$ and $t$ both stand for the identity function, then surely it is appropriate to conclude that $\frac{\mathrm d}{\mathrm dx}$ equals $\frac{\mathrm d}{\mathrm dt}$. I would argue that your point rather indicates that, in any context where one wants to distinguish these notations, one is not regarding $x$ and $t$ as identical (and identity) functions on $\mathbb R$. – LSpice Mar 25 '23 at 18:36
12

The accepted answer is good in that it draws attention to the subtleties involved, but as far as I can tell it doesn't really settle the matter.

Joel is careful to speak of a kind of binding of $x$ by $\frac{d}{dx}$, but at the same time he mentions that $x$ remains free in $\frac{d}{dx}x^3$. So is it free or bound?

It cannot be bound in the traditional sense (and Joel says that), otherwise we'd be allowed to rename bound variables ($\alpha$-convert) and write $$ \frac{d}{dx}x^2 = \frac{d}{dt}t^2, $$ which everyone since Leibniz would simplify to $$ 2x=2t. $$ It's probably a bad idea to have a mechanism which allows us to conclude that any two free variables are equal.

On the other hand $x$ cannot be free in the traditional sense, since if we substitute say $5$ for $x$ we'd get $$ \frac{d}{d5}5^2. $$ Most people would consider this meaningless. Even if we don't consider it meaningless, I fail to see how one could arrive from there to the expected result of $10$. (Certainly if you allow substituting $5$ for $x$ in $\frac{d}{dx}x^2$ you would also allow substituting $25$ for $5^2$ in $\frac{d}{d5}5^2$ to rewrite it as $\frac{d}{d5}25.$ But the same expression results if we substitute $5$ for $x$ in $\frac{d}{dx}(20+x)$, with the expected result now being 1.)
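
The parenthetical point, that the value $25$ alone cannot determine the derivative, can be seen in a few lines. This is a minimal sketch, assuming the two expressions are read as functions of $x$ and that a central difference approximates the derivative; `derivative`, `square` and `shifted` are illustrative names.

```python
def derivative(f, h=1e-6):
    """Return a numerical approximation to f' by central differences."""
    return lambda a: (f(a + h) - f(a - h)) / (2 * h)

square = lambda x: x ** 2    # the expression x^2 read as the function x |-> x^2
shifted = lambda x: 20 + x   # the expression 20 + x read as x |-> 20 + x

# Both expressions take the value 25 at x = 5 ...
same_value = square(5) == shifted(5) == 25

# ... yet their derivatives there differ.
slope_square = derivative(square)(5)    # close to 10
slope_shifted = derivative(shifted)(5)  # close to 1
```

Substituting $5$ before differentiating throws away exactly the information that distinguishes `square` from `shifted`.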

So we conclude that $x$ is neither bound nor free in $\frac{d}{dx}x^2$. But which kind of binding is it then?

From a modern perspective it's tempting to say that $\frac{d}{dx}x^2$ is 'syntactic sugar' for $(\lambda x.x^2)' (x)$, where $f'$ denotes the derivative of a map $f:\mathbb{R}\to \mathbb{R}$ and $\lambda x.x^2$ is lambda calculus notation for the map $x\mapsto x^2$. But the expression $(\lambda x.x^2)' (x)$ has both a free $x$ (in the second parenthesis) and a bound $x$ (inside the $\lambda x.x^2$), while it's not clear which $x$ in $\frac{d x^2}{dx}$ is free/bound. So if we really want to interpret $\frac{d x^2}{dx}$ as syntactic sugar for $(\lambda x.x^2)' (x)$, there seems to be a proof missing that this notation is correct (which reminds me of Mike Shulman's question). We might also conclude what Andrej Bauer suggested elsewhere, that maybe $\frac{d f(x)}{dx}$ is broken notation that we should stop teaching.

Instead I'll argue that there is a consistent way of making sense of the notation $\frac{dy}{dx}$. It was already suggested in your question: interpret $\frac{d}{dx}$ as acting on "functions of $x$". You rightly ask what functions of $x$ are. Here's one way to answer that: interpret the variables $x$, $y$ of calculus as differentiable maps from a manifold $M$ (the state space) to $\mathbb{R}$. Call one such variable $y$ a function of $x$, if there exists $f:\mathbb{R}\to\mathbb{R}$ such that $y=f\circ x$. One can easily prove that if $y$ is a function of $x$ in this sense, then there is a unique $z:M\to \mathbb{R}$ such that $dy=z\cdot dx$ where $dx,dy$ are differential forms in the sense of modern differential geometry. (Indeed $z=f'\circ x$, which used to be called the differential coefficient of $dy$ wrt $dx$.) Denote this unique $z$ with $\frac{dy}{dx}$.
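
For a concrete instance (the particular state space and variables are illustrative assumptions, chosen only to make the definition computable), take $M=\mathbb{R}$ with state $p$, and let $x(p)=e^p$ and $y(p)=e^{2p}$. Then $y=f\circ x$ with $f(u)=u^2$, and

$$dx = e^p\,dp, \qquad dy = 2e^{2p}\,dp = (2e^p)\,dx,$$

so that $z = 2e^p = 2x = f'\circ x$, which is the familiar $\frac{dy}{dx}=2x$. Since $dx$ vanishes nowhere in this example, $z$ is the only map with $dy = z\cdot dx$.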

You might not be very happy with the manifold $M$ appearing here, since it never appeared explicitly in the old calculus. I am not very happy with it either, which is why I asked this question, and only found now that you had already asked a very similar question several years earlier. (The answers you received there unfortunately don't satisfy me.)

Toby Bartels
  • 1
    Before seeing this answer, I just added essentially this proposal as an answer to the "very similar question" you linked to. (-: I don't think it's necessary to restrict this notation to act only on "functions of $x$", however. For instance, if $z = x^2+y^2$ for another "independent" variable $y$, then ${\rm d}z = 2 x \,{\rm d}x + 2 y\,{\rm d}y$, so that $\frac{{\rm d}z}{{\rm d}x} = 2 x + 2 y \frac{{\rm d}y}{{\rm d}x}$, which makes perfect sense. – Mike Shulman Aug 13 '18 at 08:37
  • @MikeShulman Hmm, I'm not so sure: can't one run into trouble by allowing arbitrary applications of d/dx? Consider the following example: take the equation $x=1$ and differentiate both sides with respect to $x$ to arrive at $0=1$. – Michael Bächtold Aug 13 '18 at 08:43
  • In that case the problem is that ${\rm d}x = 0$, so you can't divide by it. The state-space perspective is that there is no distinguished operator to call "d/dx", it really is literally taking the differential $\rm d$ followed by dividing pointwise by the differential of $x$, ${\rm d}x$, so there is nothing to "allow" except to worry about whether ${\rm d}x = 0$. – Mike Shulman Aug 13 '18 at 08:59
  • 2
    Possibly a more faithful way to deal with the worry about whether ${\rm d}x=0$ is to consider all functions to be partial (as one generally does in calculus anyway). Then $\frac{{\rm d}y}{{\rm d}x}$ just has its domain restricted to the points of the tangent bundle where ${\rm d}x \neq 0$. Your calculation of $0=1$ is then perfectly valid as long as you keep track of domains, which are empty in this case -- the two functions $\emptyset \to \mathbb{R}$ constant at 0 and 1 are in fact equal! – Mike Shulman Aug 13 '18 at 09:02
  • 1
    @MikeShulman: I just re-read your answer on the other question and have a better picture of what's going on now. From your perspective $dy/dx$ will always be a partial function on $TX$, but only when $y$ is (locally) a function of $x$ will $dy/dx$ be the pullback of a function on $X$. So only then will $dy/dx$ be an observable quantity on the same state space as $y$ and $x$. – Michael Bächtold Aug 13 '18 at 19:14
  • Note that the requirement to avoid dividing by zero is also what stops you from substituting $5$ for $x$ in $d(x^2)/dx = 2x$. It would be fine to substitute, say, $5y$ for $x$ instead; or to substitute $5$ for $x$ in the nearly equivalent equation $d(x^2) = 2x\,dx$. – Toby Bartels Aug 20 '18 at 05:34
  • To prove the existence of $z$ such that $dy=z\,dx$, you not only need that $y$ is a function of $x$ but that $y$ is a differentiable function of $x$. This is basic fine print, of course; my real point is that one can (and people did) use language exactly in that way: treating $x$ and $y$ formally as maps on $M$, $y$ is a differentiable function of $x$ iff there exists a differentiable map $f$ such that $y = f \circ x$; or treating $x$ and $y$ informally as variable quantities, $y$ is a differentiable function of $x$ iff there exists a differentiable map $f$ such that $y = f(x)$. – Toby Bartels Aug 21 '18 at 22:15
  • [I edited the answer yesterday to require $x$ and $y$ to be differentiable maps on $M$, which is just a technicality to make $dx$ and $dy$ exist. But at the time, I didn't notice that $f$ has to be differentiable too to make $z$ exist, and that's more than just fine print in this context, so I made it a comment.] – Toby Bartels Aug 21 '18 at 22:18
  • @TobyBartels thanks! I also forgot the fine print that $dx$ needs to be non-zero, otherwise $z$ is not unique and we would be allowed to say things like "3 is a function of 5". One might interpret the condition $dx\neq 0$ as saying that $x$ can 'truly vary' or is an 'independent variable', and that this is a necessary condition for some other variable quantity to change with $x$. – Michael Bächtold Aug 22 '18 at 11:33
  • True, although I like to say that if $y=f(x)$, then $dy=f'(x)\,dx$, as long as $f$ and $x$ are differentiable (so $dy=z\,dx$, where $z=f'(x)$), regardless of whether $x$ is constant. But we need $dx\ne0$ for even $f$ (or $z$) to be unique (even locally). – Toby Bartels Aug 23 '18 at 13:54
  • 1
    The way I look at it, $d/dx$ is the operator that takes the derivative of the one-variable function that follows it, and the symbol $x$ is (or should be) irrelevant. So it just operates on the space of all differentiable one variable functions. I think that when we write $\frac{d}{dx} x^2$ the confusing part is the $x^2$, that should be written as $x\mapsto x^2$. Then $\frac{d}{dx} (x\mapsto x^2)$ and $\frac{d}{dt}(x\mapsto x^2)$ give the same answer, $(x\mapsto 2x)$, while $\frac{d}{dx}(t\mapsto x^2)$ and $\frac{d}{dt}(t\mapsto x^2)$ are both $0$. But I know this clashes with common usage... – Valerio Jul 07 '19 at 20:18
  • @Valerio yes, your suggestion clashes with 300 years of mathematical usage. And a question arises if we adopt it: why should we use a notation ($d/dx$) that contains an irrelevant letter ($x$), when we already have a perfectly fine notation for what you want to say? The prime notation $(x\mapsto x^2)'=(t\mapsto 2t)$ is almost as old as Leibniz $d/dx$: https://hsm.stackexchange.com/questions/6206. – Michael Bächtold Jul 07 '19 at 20:45
2

Edit: obviously some people didn't realise this answer was tongue-in-cheek. Also, I read the question differently to others, given its ambiguity, and didn't bother with the last (possibly most crucial) part of the question.

The gist of my answer was an expansion of Sam Gunningham's comment, namely that the operator "$\frac{\mathrm{d}}{\mathrm{d}x}$" is actually the restriction of a functor on the category of pointed smooth manifolds and pointed maps, to the subcategory consisting of 1-dimensional vector spaces over $\mathbb{R}$. The idea of coordinate-independence is captured in the principle of equivalence (the violation of which used to jokingly be called "evil" by some people), in that mathematics can't tell the difference between diffeomorphic manifolds, and so whatever we call this operator, $\frac{\mathrm{d}}{\mathrm{d}x}$ or $\frac{\mathrm{d}}{\mathrm{d}t}$ or what-have-you, they are all naturally isomorphic, and so indistinguishable in the 1-variable case. I do take the point, hashed out in the comments below, that in the multi-variable case things are more subtle, and I cede to Goldstern's answer.

But the punchline is that the category of manifolds can be defined in many different ways: from material sets, from structural sets, via synthetic differential geometry or via Fermat theories, so I contend there is not a single answer to (my reading of the) question.


I find the statement

"(differentiable) functions" (i.e., variable-agnostic sets of ordered pairs)

exceedingly peculiar. A differentiable function is a certain arrow in the category of smooth manifolds, and even better, it's an arrow in the category where objects are finite-dimensional $\mathbb{R}$-vector spaces $E^n$ (for all $n$) with the usual topology. The tangent bundle functor takes a smooth function $f\colon E^n \to E^m$ and returns a smooth function $df\colon TE^n \to TE^m$ (the tangent bundle of $E^n$ is diffeomorphic to $E^{2n}$, hence again a vector space). Let us say we are in the case $n=1$. We can restrict this function to the tangent space of $E^1$ at $0$ and get a smooth function $E \to E^m$. No coordinates were chosen here.

But how did you get this category of manifolds? I hear you ask. Well, I started with the category of sets and did the usual thing. But how did this category of sets turn up? Well, to give the short answer, ETCS. The longer answer is that the category of sets (or rather, a category of sets strong enough to formalise all of undergraduate calculus and in fact most of mathematics) can be defined in terms of a first order theory. (Aside, if it irks you to miss out on the more hard-core parts of ZFC, use the foundational theory SEAR-C instead - it likewise doesn't define functions as sets of ordered pairs.)

At no point did I define a function to be a set of ordered pairs, and everything is independent of choices of coordinates.

Alternatively, we just say that $d/dx$ is an operation in the Fermat theory of $C^\infty$-rings. In this sense, smooth functions can be seen as models for a theory which is far more focussed than set theory, and there is no flab, in that this theory only talks about smooth functions.

[If you are asking questions that assume $df(t)/dt$ and $df(x)/dx$ are somehow distinguishable, and bringing foundational definitions into basic calculus, then expect answers that answer with a similar level of chutzpah]

David Roberts
  • 33,851
  • 1
    I think it is quite natural that (possibly naive) foundational questions are asked in a basic calculus course. Of course it is difficult to answer them at the right level. I think that the idea "a relation is a set of ordered pairs, a function is a special kind of relation" can be understood and applied by first year students, even if they may reject it when (or if) they later take a category-theoretic viewpoint. – Goldstern Dec 05 '12 at 13:40
  • 3
    I agree with Joel that the question is more about careful use of syntax (e.g., the proper manipulation of free vs. bound variables), and maybe not so much about things like set-theoretic foundations. It's connected with a familiar kind of abuse of notation seen in calculus courses where one writes $f(x)$ for a function when one really means $\lambda x. f(x)$. Here's an illustration of the trouble one can get into: we sometimes express FTC in the form $\frac{d}{d x} \int_a^x g(t) dt = g(x)$. Now suppose you ask a student to differentiate the function $F(x) = \int_a^x g(t) - g(x) dt$. (cont.) – Todd Trimble Dec 05 '12 at 14:35
  • 1
    Do you just substitute $x$ for $t$ in the integrand and get $g(x) - g(x) = 0$? No? Then how do you explain the rule properly? This would get into an interesting discussion of how variables are treated. – Todd Trimble Dec 05 '12 at 14:36
  • Yes, Todd, I think you and I are on the same page! That kind of issue is exactly how I take the question. – Joel David Hamkins Dec 05 '12 at 15:16
  • @Todd Trimble: I do not understand at all what problem you are trying to illustrate. What rule even? [Added: okay, I guess, I see the point. Added 2: I would however insist that the only reason there is any issue at all is that somehow one wants to force a two/multi variable-problem into a one variable context. Just like there are some text-problems with animals with 2 and 4 legs or some such thing that can seem difficult when one throws them at kids that are only allowed to use one variable.] –  Dec 05 '12 at 15:18
  • @quid: I'm not really talking about a problem I would throw at young calculus students. I'm just illustrating the type of difficulty that can arise if one plays fast and loose with symbolic manipulation (such as can arise in the study of $\lambda$-calculus if one isn't careful, but theoretically it can arise in the ordinary calculus as well). – Todd Trimble Dec 05 '12 at 15:41
  • 1
    You don't discuss such issues in calculus? I usually give just such kind of examples when I discuss the fundamental theorem, since this issue lies precisely at the heart of what the students find confusing about it. – Joel David Hamkins Dec 05 '12 at 16:04
  • 1
    I really think one can find this confusing if and only if one never saw a multivariable function. If you integrate $7$ you do not write $\int_{a}^x 7(t) dt$ but $\int_{a}^x 7 dt$, while $7$ is actually the constant function not the number. So if you integrate $h(x,t)= g(x)$ subject to the second variable, named $t$, you also write just $\int_{a}^x g(x) dt$ instead of $\int_a^x h(x,t) dt$ since it is convenient. I do not know what you intend to explain, but the explanation via multivariate functions seems completely transparent and natural. –  Dec 05 '12 at 16:41
  • 1
    I don't know if this will help, quid, but you could think about this not from the point of view of a classroom, but of how to correctly implement in some computer language the formal symbolic expression of the FTC: $\frac{d}{dx} \int_a^x g(t) dt = g(x)$. Informally, the rule says "substitute $x$ for the variable $t$ named in the integrand $g(t)$", but it's not as simple as that. You have to have some sort of restriction such as "unless $x$ appears freely in the term $\lambda t. g(t)$", some sort of formal syntactic rule recognizable by the program. It's not enough to say "first rewrite (cont.) – Todd Trimble Dec 05 '12 at 17:45
  • $g(x, t)$ as $H(x)$" as you suggest, because then from the point of view of the rule, $H(x)$ is constant as a function of $t$, and the rule "substitute $x$ for $t$ in $H(x)$" would return $H(x)$, which is not correct. Instead, the program should return a message "typing error" or something like that. – Todd Trimble Dec 05 '12 at 17:48
  • (Actually, let me take back my last sentence -- the program should not answer just "typing error" -- it should be sophisticated enough to give the correct answer to "evaluate $\frac{d}{dx} \int_a^x g(t) - g(x) dt$", namely $-(x-a)g'(x)$. But firstly it should recognize the variable $x$ occurring freely in $\lambda t. g(t) - g(x)$, and not blindly apply the rule that says to substitute $x$ for $t$.) – Todd Trimble Dec 05 '12 at 18:10
  • @Todd Trimble: actually I did not precisely suggest what you suggest I suggested, but anyway, I think I understood your point better now. Thanks. I would not debate that if one is to write some computer algebra system this lambda calculus and related is useful/relevant (and indeed had some very vague understanding of this before). Still, it seems to me this has very little to do with the original question which seems more 'classroom' than 'computer algebra development' since I think for a CAS to handle a (real) function as sets of pairs will be a bit of a challenge ;) –  Dec 05 '12 at 18:41
  • 2
    @quid: I'm glad you understand me better. But as I see it, the main issue the OP has is: what is the precise role of the notation "$x$" when we write $\frac{d}{dx}$, and this is the syntactic issue that I (and I think Joel) want to address. He or she only brings in the bit about "differentiable functions as sets of ordered pairs" to say that that doesn't address the problem at hand (so let's not get deflected by that). A careful treatment of what I think is worrying OP would center on the correct handling of variables, which is an important issue made more piquant by invoking (cont.) – Todd Trimble Dec 05 '12 at 19:37
  • (cont.) the image of computers, but also comes up with human users who wish to understand the precise scope of formal notation and formal rules for manipulating it. In particular, I think the OP is after that kind of understanding, and Joel seems to believe it's a good thing to bring up in the classroom as well. – Todd Trimble Dec 05 '12 at 19:40
  • @Todd Trimble: it is not quite clear to me what OP is after but they already asked a similar question which is even more 'philosophical'. Nothing against that in principle, but if over the course of almost a year they were not able to get a more detailed understanding of the issue than the one suggested to me by this question, then I see little reason for welcoming this most vague question here. In addition all this is not really limited to this particular calculus type of thing so it seems some sort of red herring anyway to focus on it. Say let us look at polynomials: –  Dec 05 '12 at 19:52
  • let R be a ring or say the reals as we are discussing calculus, and consider the poly. ring in one var., denote it R[X] but why not R[Y] perhaps, is this now the same object or a different one? If one 'constructs' them in the typical way it is literally identical, as it's only about whether one calls (0,1,0,...) an X or a Y. But in 'everyday' usage one might want to make a distinction in order to, I don't know, say to consider a map R[X] -> R[Y] , X->Y^2 and then do something while causing less notational confusion. Or to 'later' add in addition a second variable. So, same 'problem' there. –  Dec 05 '12 at 20:08
  • Okay, I will grant that it is debatable whether it's a good question for MO; might be better for math.stackexchange. I'll also grant that the other question Jason asked at MO was a little vague, but in case he's interested, we had a semi-sophisticated discussion about similar things at the n-Category Cafe, where we discuss "the $\lambda$-theory of high school calculus" and its semantics (look up "snowglobe models" and follow links). I'll quit here by saying that if this gets reopened, I'll be looking forward to Joel's prepared answer. – Todd Trimble Dec 05 '12 at 20:57
  • 1
    Todd, well, it's really nothing much, but I posted what I had wanted to write in my answer on my blog at http://jdh.hamkins.org/the-differential-operator-ddx-binds-variables/. But I think you've got the point that I take to be key already. – Joel David Hamkins Dec 05 '12 at 22:41
  • 1
    One more down-vote and I could get the peer-pressure badge! ;-) – David Roberts Dec 05 '12 at 23:26
  • 4
    @quid: Concerning "more 'classroom' than 'computer algebra development'": I've found that calculus classrooms often include students who want to compute by explicit, precise rules, just like a computer algebra system (only less sophisticated), and who get very confused when they are told (or they merely get the impression) that the correct manipulation of symbols in mathematics depends on thinking about what the symbols mean. Such thinking is, for them, a mystery; they want (at least) that this "meaning" be unambiguously inferrable from what is written. I think that underlies this question. – Andreas Blass Dec 06 '12 at 00:59
  • 1
    I should add to my previous comment that the students I describe are not just "stupid"; some of them are quite intelligent, but their thought processes are simply different from mine. – Andreas Blass Dec 06 '12 at 01:01
  • I'm going to downvote you David, since you seem to want it. ;-) – Todd Trimble Dec 06 '12 at 03:28
  • @Todd - I'm not sure I do want it :-) I'm not going to delete this answer because there's a lot of discussion here that is potentially helpful. – David Roberts Dec 06 '12 at 04:45
  • @Andreas Blass: Is this an invitation to start a discussion about teaching calculus? :) –  Dec 06 '12 at 15:13
  • @Goldstern I've now taught a discrete maths/combinatorics course (with prescribed lecture content) for first year students that did indeed try to define functions as special relations. They didn't find it that easy... – David Roberts Jan 20 '18 at 22:58
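
Todd Trimble's example from the comments above can be checked numerically. The following is only a sketch under assumptions not in the thread: the integrand $g = \sin$, the lower limit $a = 0$, the sample point, and the helper names `integrate` and `derivative` are all hypothetical choices. It compares the value obtained by blindly substituting $x$ for $t$ with the value $-(x-a)g'(x)$ stated in the comment.

```python
import math

def integrate(h, a, b, n=2000):
    """Composite Simpson's rule for the integral of h over [a, b] (n must be even)."""
    step = (b - a) / n
    total = h(a) + h(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * h(a + i * step)
    return total * step / 3

def derivative(F, x, eps=1e-4):
    """Central-difference approximation to F'(x)."""
    return (F(x + eps) - F(x - eps)) / (2 * eps)

a = 0.0
g = math.sin        # hypothetical choice of g
g_prime = math.cos  # its derivative

def F(x):
    # x occurs free inside the integrand (lambda t: g(t) - g(x)).
    return integrate(lambda t: g(t) - g(x), a, x)

x0 = 1.3
naive = g(x0) - g(x0)              # "substitute x for t in the integrand"
stated = -(x0 - a) * g_prime(x0)   # value given in the comment
numeric = derivative(F, x0)        # what the derivative actually is, numerically
```

With these choices `numeric` agrees with `stated` and not with `naive`, which is the point about the free occurrence of $x$ under the integral sign.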
2

I repeat (a variant of) my comment, even though I agree that it is shallow and has low entertainment value.

As long as we are only looking at functions in one variable, there is only one differential operator $D$, which may be called $\frac d{dx}$ or $\frac{d}{dt}$ depending on the context.

If you look at a composite function $f \circ g$, you may introduce the notation/abbreviation $x=g(t)$, $y=f(x)$, then

  • $\frac {d}{dx} f$ or $\frac d {dx} y$ is just $D(f)$,
  • and by $\frac{d}{dt} f$ or $\frac d{dt} y$ you mean $D(f\circ g)$.

So here both $\frac {d}{dx} $ and $\frac {d}{dt} $ have a meaning, and the meaning is different.
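
A minimal numerical sketch of this reading, assuming hypothetical choices $g(t) = t^2$ and $f = \sin$ and a finite-difference stand-in for $D$ (none of which come from the answer itself):

```python
import math

def D(h, eps=1e-6):
    """The one differential operator on one-variable functions (central difference)."""
    return lambda p: (h(p + eps) - h(p - eps)) / (2 * eps)

def compose(outer, inner):
    return lambda p: outer(inner(p))

g = lambda t: t ** 2   # x = g(t), hypothetical
f = math.sin           # y = f(x), hypothetical

dy_dx = D(f)               # "d/dx y" is read as D(f)
dy_dt = D(compose(f, g))   # "d/dt y" is read as D(f o g)

t0 = 0.7
x0 = g(t0)
```

Here `dy_dx(x0)` and `dy_dt(t0)` are different numbers, related by the chain rule `dy_dt(t0) == dy_dx(x0) * D(g)(t0)` up to rounding, although both come from the same operator `D`.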

When we look at functions in, say, two variables (do they appear in first year calculus?), we implicitly introduce an (arbitrary) order of variables, say x is the first and t the second, and $\frac{\partial}{\partial x}$ is the partial derivative with respect to the first variable. This makes sense even if you treat functions as "variable-agnostic" sets of ordered pairs. (Which I do all the time, and do not find peculiar at all. Tastes differ.)

Of course, the intended meaning always depends on the context. If $f$ is a binary function, $\frac d {dt} f$ may be a variant notation for $\frac{\partial}{\partial t}f$, or it may be understood that we are really looking at a unary function $\hat f$ obtained by composing $f$ with some function $t \mapsto (x(t), y(t))$.
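
The two readings of $\frac{d}{dt} f$ for a binary $f$ can be told apart in a small sketch. Everything concrete here is an assumption made for illustration: the function $f(x,t) = x^2 t$, the curve $t \mapsto (3t, t)$, and the helper names `partial` and `D`.

```python
def partial(f, index, eps=1e-6):
    """Partial derivative of f in the argument at position `index` (variable-agnostic)."""
    def df(*args):
        up = list(args)
        down = list(args)
        up[index] += eps
        down[index] -= eps
        return (f(*up) - f(*down)) / (2 * eps)
    return df

def D(h, eps=1e-6):
    """Derivative of a one-variable function (central difference)."""
    return lambda p: (h(p + eps) - h(p - eps)) / (2 * eps)

f = lambda x, t: x * x * t   # hypothetical binary function; x first, t second

# Reading 1: d/dt f is the partial derivative in the second slot.
reading_partial = partial(f, 1)

# Reading 2: d/dt f is D applied to the unary function t -> f(x(t), t).
x_of_t = lambda t: 3 * t     # hypothetical curve
f_hat = lambda t: f(x_of_t(t), t)
reading_total = D(f_hat)
```

At $t = 2$, $x = 6$ the first reading gives about $36$ and the second about $108$, so the context really does have to say which one is meant.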

Goldstern
  • 13,948
2

I am a little late to the question but wanted to add a low-tech answer which somehow complements JDH's answer:

The operators $\frac{d}{dx}$ and $\frac{d}{dt}$ are as distinguishable as $f(x)$ and $f(t)$.

Probably this formulation is a little too vague but it should just reflect that writing $d/dx$ says how one has named the free variables. As already illustrated, one gets into notational ambiguities in cases as $\frac{d}{dt} f(t,x(t))$...

Dirk
  • 12,325
2

I believe that the origin of the problem here is the common schizophrenic mixture of Leibniz-time notation with Bourbaki-time foundations of calculus.

When trying to make sense of the traditional Leibniz notation in calculus, it is important to understand that even the notion of function in the 18th century was not what it is now. Now a function is something along the lines of "a set of pairs," "a functional relation," "a triple of a functional relation, a domain, and a co-domain," "a morphism in the category of sets." In the 18th century, to be a function was a property/predicate on variables: one variable could be a function of another, and also two variables could be functions of each other.

So, if you wish to use the notation $dy/dx$ in a non-schizophrenic way, you probably need to return to the "original" approach: you have different quantities, which are denoted by variables, and some of these variables are "functions" of others, because they are related by some identities, and then you can find expressions for differentials of variables, like $dx$ and $dy$, and also to calculate $dy/dx$. Of course, you will not be talking about any "operators" acting on anything this way.

As a work-around solution for teaching, I think of $d/dx$ as a form of a macro, or syntactic sugar, acting on expressions: if $y = x^2$, then $$ \frac{d}{dx}y =\frac{d}{dx}x^2 = 2x. $$
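
To make the "macro acting on expressions" idea concrete, here is a rough sketch in which expressions are nested tuples and `d(expr, var)` rewrites them. The tuple encoding and the names `d` and `evaluate` are assumptions of this sketch, not a standard library.

```python
# Expressions: ("const", c), ("var", name), ("add", u, v), ("mul", u, v),
# and ("pow", u, n) with n a fixed positive integer.

def d(expr, var):
    """Rewrite `expr` into the expression for its derivative with respect to `var`."""
    kind = expr[0]
    if kind == "const":
        return ("const", 0)
    if kind == "var":
        return ("const", 1 if expr[1] == var else 0)
    if kind == "add":
        return ("add", d(expr[1], var), d(expr[2], var))
    if kind == "mul":
        u, v = expr[1], expr[2]
        return ("add", ("mul", d(u, var), v), ("mul", u, d(v, var)))
    if kind == "pow":
        u, n = expr[1], expr[2]
        return ("mul", ("mul", ("const", n), ("pow", u, n - 1)), d(u, var))
    raise ValueError(f"unknown expression: {expr!r}")

def evaluate(expr, env):
    """Evaluate an expression given values for its variables."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return env[expr[1]]
    if kind == "add":
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    if kind == "mul":
        return evaluate(expr[1], env) * evaluate(expr[2], env)
    if kind == "pow":
        return evaluate(expr[1], env) ** expr[2]
    raise ValueError(f"unknown expression: {expr!r}")

x_squared = ("pow", ("var", "x"), 2)
```

In this encoding `d(x_squared, "x")` evaluates to `2 * x`, while `d(x_squared, "t")` evaluates to `0`, so the letter in the "macro" matters because it acts on the written expression and not on a function.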

  • "... if they are related by some identities" Which raises the question: which identities? Is for instance y a function of x when any of the following identities hold: $y+ye^y=x$, $x^2+y^2=1$ or $x=1$? (I'd say yes, no, no, but how to make that precise with your approach?) And sometimes we might want to assume that y is a function of x, without being given any identity. So we probably need "identity" and "function of" to not be purely syntactical notions, but really concepts internal to the mathematical foundation we are seeking. – Michael Bächtold Feb 25 '23 at 10:04
  • Identities are part of MLTT. How to make "function of.." internal was suggested by Mike Shulman in my other question. Another vague idea, that I communicated to you per mail: Internalize contexts and allow variables under contexts to be part of the foundation (I think contextual modal type theories do something similar, but I haven't seen one which includes identity types). So write $y|_A$ for "the variable y when A holds", where A is a list of identities (a context). With this define a variable y to be a constant, if for any two contexts $A,B$ we have $y|_A=y|_B$. – Michael Bächtold Feb 25 '23 at 10:07
  • Finally define $y$ to be a function of $x$, if for all contexts $A$ we have '$x|_A$ constant implies $y|_A$ constant'. – Michael Bächtold Feb 25 '23 at 10:07
  • 1
    @MichaelBächtold, regarding the first question, I think there would be no universal answer, the conventions may depend on the problem at hand. One may introduce the notion of $y$ being locally a function of $x$. One may even wish to talk about "multi-valued functions." – Alexey Muranov Feb 25 '23 at 11:31
  • 1
    @MichaelBächtold, re$\newcommand\d{\mathrm d}$, aside from historical reasons, from this perspective why does it matter which identities determine functions? For example, I think it is quite reasonable to conclude from $x^2+y^2=1$ that $\frac{\d y}{\d x}=-\frac xy$ without any worry about whether or not $y$ is a function of $x$, only about division by $0$. (E.g., we have $\frac{\d y/\d t}{\d x/\d t}=-\frac x y$ whenever $x$ and $y$ are functions of a third variable $t$ everywhere satisfying $x^2+y^2=1$, $\d x/\d t\ne0$, $y\ne0$.) – LSpice Mar 25 '23 at 18:30
  • 1
    Re, of course, as a comment elsewhere by @TobyBartels reminded me, I meant for $x$ and $y$ to be differentiable functions of $t$ satisfying the indicated identities. – LSpice Mar 25 '23 at 18:39
  • 1
    @LSpice good question. I don't have an answer. I would need to do the experiment and try to teach calculus without using the function concept, to see how far one gets and where it is really necessary. – Michael Bächtold Mar 25 '23 at 19:40
  • 1
    @LSpice Ok, maybe I do have one answer: suppose you are not given any concrete relation between x and y (say in form of an elementary equation involving x and y), then how do you know if you are allowed to talk about dy/dx? Or put differently: suppose you want to talk about some general properties of dy/dx without being given a concrete equation, what would you have to assume about the relation between x and y in order for dy/dx to make sense? One answer is: assume y is a function of x (although this is not the only setting where dy/dx is meaningful, as you point out.) – Michael Bächtold Mar 25 '23 at 19:49
  • @LSpice Or suppose you are given the concrete relation y = x+z (y being a function of two variables), would you still consider dy/dx meaningful? What would it represent? – Michael Bächtold Mar 25 '23 at 20:35
  • 1
    For 1, I do not mean to dispose of functions, only to regard them as just a particular case of relations, or of identities. For 3, I would say that $\frac{\mathrm dy}{\mathrm dx} = 1 + \frac{\mathrm dz}{\mathrm dx}$. For 2, I am surely missing the subtlety, but the relation "$y$ is a function of $x$", if it is to be used explicitly to differentiate $y$ in terms of $x$, must surely be expressed as an identity $y = f(x)$? – LSpice Mar 25 '23 at 21:05
1

Two answers: (1) Distribution theory. On the space $\mathcal D'(\mathbb R)$ of continuous linear forms on $\mathcal D(\mathbb R)=C_c^\infty(\mathbb R)$ it is easy to define the first derivative: $$ \left\langle\frac{du}{dx},\phi\right\rangle_{\mathcal D'(\mathbb R),\mathcal D(\mathbb R)}= -\left\langle u,\frac{d\phi}{dx}\right\rangle_{\mathcal D'(\mathbb R),\mathcal D(\mathbb R)}. $$ You get the ordinary derivative of a differentiable function, also $H'=\delta$ ($H$ is the Heaviside function, characteristic function of $\mathbb R_+$, $\delta$ the Dirac mass), $$ \frac{d}{dx}(\ln \vert x\vert)=\operatorname{pv}\frac{1}{x} $$ and many other classical formulas. In particular, you can define the derivative of any $L^1_\text{loc} $ function, of course not pointwise but as above.

(2) Operator theory. In $L^2(\mathbb R)$, you consider the subspace $H^1(\mathbb R)=\{u\in L^2(\mathbb R), u'\in L^2(\mathbb R)\}$, where the derivative is taken in the distribution sense. Then the operator $d/dx$ is an unbounded operator with domain $H^1(\mathbb R)$. It is even possible to prove that the operator $\frac{d}{i\,dx}$ is selfadjoint.
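
The defining identity in (1) can be tested numerically for $u = H$. This is only a sketch: the bump function `phi`, the midpoint-rule `pairing`, and the grid size are assumptions chosen for illustration. It approximates $-\langle H, \phi'\rangle$ and compares it with $\langle\delta,\phi\rangle = \phi(0)$.

```python
import math

def phi(x):
    """A smooth bump supported in (-1, 1); a hypothetical test function."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def phi_prime(x):
    """Classical derivative of phi."""
    if abs(x) >= 1:
        return 0.0
    return phi(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def heaviside(x):
    return 1.0 if x >= 0 else 0.0

def pairing(u, test, lo=-1.0, hi=1.0, n=20000):
    """Midpoint-rule approximation to the integral of u(x) * test(x) over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(
        u(lo + (i + 0.5) * step) * test(lo + (i + 0.5) * step) for i in range(n)
    )

# <H', phi> is defined as -<H, phi'>.
h_prime_on_phi = -pairing(heaviside, phi_prime)
```

Up to quadrature error, `h_prime_on_phi` equals `phi(0.0)`, which is what $H' = \delta$ says when tested against this particular $\phi$.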

LSpice
  • 11,423
Bazin
  • 15,161
  • 3
    I do not think this answers the question in any way, however it highlights a shortcoming of it. So thanks for the long comment. –  Dec 05 '12 at 11:22
  • 1
    @quid: How is "d/dx can be viewed as an operator on distributions/Sobolev spaces" not an answer in any way to "on what does d/dx operate/what are its inputs"? What are the particular shortcomings of the question you have in mind? – Martin Dec 05 '12 at 12:41
  • 4
    @user49437: Is now $d/dt$ also an operator on this space? If so, is it the same as $d/dx$ or perhaps the zero-operator or still something else. While what you quote is in the title the actual question was definitely not about giving spaces where a derivative is 'nicely' definable. One shortcoming of the question is that it does not make precise what $d/dx$ should even mean at all. There are any number of ways to define some map somewhere one might reasonably call '$d/dx$' consistent with calculus; specifically, is the context strictly single variable or not (cf Goldstern) –  Dec 05 '12 at 14:09
  • Thanks for the clarification. Well, I still think this does provide an answer to the title + the first and last sentences without "distinct from d/dt". The rest of the post is apparently homotopic to an interesting question that eludes me so far, so I'm curious to see what answer Joel David Hamkins has prepared... – Martin Dec 05 '12 at 17:09
  • @user49437: You are welcome. And, it is true what you say, but it is my understanding the point of the question is precisely and only the 'distinct from d/dt' thing. But I agree it is not quite clear what OP wants. –  Dec 05 '12 at 17:24
1

Little late to the game, but when dealing with operators, as I constantly do, I always keep in mind Bernard Friedman's comment in one of his books on differential equations and Green functions to the effect: An operator is ill-defined if you don't specify on what it operates. He goes on to talk about the boundary conditions of differential equations that make them well-defined. (This notion is also manifest in the necessity of inventing and distinguishing among partial derivatives and total / material derivatives, constants w.r.t. whatever, dependent and independent variables, Lagrangian versus Eulerian frames of reference, active and passive transformations, etc.)

Feynman stated similar sentiments when talking about the grad, or ∇ operator, in Section 2.4 of his lecture "Differential calculus of vector fields". After first talking about the beauty of mathematical abstraction in separating the del op from what it is operating on--"We leave the operators, as Jeans said, 'hungry for something to differentiate'.”--he then says, "You must always remember, of course, that ∇ is an operator. Alone, it means nothing."

In quantum mechanics, there is a similar but more heated situation. There have been skirmishes between different schools of researchers with different interpretations of the math of quantum mechanics, but in the words of Rota, "Le teorie vanno e vengono ma le formule restano." (The theories may come and go but the formulas remain.) ... because the math is fairly clear on what mathematical ops act on what mathematical functions to produce a physical observable. It's the Copenhagen probability wave function versus the de Broglie-Bohm guiding field that's the rub here for some--the interpretation of the function rather than the op.

As far as derivatives go, I can apply one term-by-term to a formal power series in one independent variable that is divergent except at the origin. Such a series is not even a function, yet the term-by-term differentiation is consistent with formal composition and with compositional and multiplicative inversion of the series, and even with formal Laplace and Legendre transforms. However, the interpretation of the derivative at a global level as the slope of some curve at some point is nonsensical. Distribution theory sheds no light on the situation. Category theory? Enlighten me. No math god or demon, no ghost of Plato gives any meaning to the differentiation aside from my specification of what it operates on and how.
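
As a small illustration of a derivative that acts on coefficients and never on a function, here is a sketch with truncated coefficient lists. The series $\sum n!\,z^n$ and $\sum (n!)^2 z^n$ are hypothetical examples (both diverge for every $z \neq 0$), and the helper names are assumptions of this sketch; it checks only the product rule, as one instance of the formal consistency mentioned above.

```python
from math import factorial

def derivative(coeffs):
    """Term-by-term derivative: sum a_n z^n  ->  sum (n + 1) a_{n+1} z^n."""
    return [n * a for n, a in enumerate(coeffs)][1:]

def multiply(p, q, order):
    """Cauchy product, keeping the coefficients of z^0 .. z^(order - 1)."""
    return [
        sum(p[i] * q[k - i] for i in range(k + 1) if i < len(p) and k - i < len(q))
        for k in range(order)
    ]

def add(p, q):
    return [a + b for a, b in zip(p, q)]

N = 8
f = [factorial(n) for n in range(N)]        # sum n! z^n
g = [factorial(n) ** 2 for n in range(N)]   # sum (n!)^2 z^n

lhs = derivative(multiply(f, g, N))                    # (f g)'
rhs = add(multiply(derivative(f), g, N - 1),
          multiply(f, derivative(g), N - 1))           # f' g + f g'
```

The lists `lhs` and `rhs` coincide exactly in integer arithmetic, even though neither series defines a function whose slope one could talk about.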

Even though divergent there are often operations on the series that can give it even a physical significance. As Heaviside said, "It's divergent! Good, then maybe I can do something with it." And, any divergent series is a derivative, in my prescription, of another divergent series. I can truncate the series and assign the usual meanings, but where to truncate it? The theory of divergent series can often give some algorithm for where to truncate it so that the truncation has some significance, but then I'm throwing in another operator and specifying how it, the derivative operator, and the divergent series interact, and that's a cottage industry, assigning meaning to the "work of the Devil".

Here's another perspective--two diff ops that are equivalent when acting on certain beasts and quite different when acting on others. The hungry ops are

$$O_1 = \sum_{n \geq 0} a_n z^n \frac{\partial_{z=0}^n}{n!} $$

and, in umbral notation with

$$(1-a.)^\beta= \sum_{k \geq 0} (-1)^k \binom{\beta}{k}a_k = \triangle_{k \geq0}^\beta a_k,$$

$$O_2 = \sum_{n \geq 0}(-1)^n (1-a.)^n z^n \frac{\partial_z^n}{n!}.$$

When acting on $z^m$ for $m= 0,1,2,...$, they give

$O_1 z^m = a_m z^m$

and

$O_2 z^m = (z - (1-a.)z)^m = (a.z)^m = a_mz^m.$

They are the same, yet acting on $z^{\alpha}$ for $\alpha > 0$ and not an integer,

$O_1 z^\alpha$ diverges

and

$O_2 z^\alpha = (z-(1-a.)z)^{\alpha} = (\triangle_{n \geq 0}^\alpha\triangle_{j=0}^n a_j)z^{\alpha} = a_{\alpha}z^{\alpha},$

a Newton-Gregory series that might or might not diverge. One can assign meaning to $O_1$ via $O_2$ where it gives a convergent result, but I'm assigning that meaning. $O_1$ isn't capable of making that choice by itself.
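
For integer exponents, the agreement of the two operators can be checked directly. In this sketch the sequence `a` is an arbitrary hypothetical choice and the function names are mine; it uses $(1-a.)^n = \sum_k (-1)^k \binom{n}{k} a_k$ and $\frac{\partial_z^n}{n!} z^m = \binom{m}{n} z^{m-n}$ to compute the coefficient of $z^m$ in $O_2 z^m$.

```python
from math import comb

def umbral_power(a, n):
    """(1 - a.)^n = sum_k (-1)^k C(n, k) a_k, reading a.^k as a_k."""
    return sum((-1) ** k * comb(n, k) * a[k] for k in range(n + 1))

def o2_coefficient(a, m):
    """The number c with O_2 z^m = c z^m, for a non-negative integer m."""
    return sum((-1) ** n * umbral_power(a, n) * comb(m, n) for n in range(m + 1))

a = [3, 1, 4, 1, 5, 9, 2, 6]   # arbitrary hypothetical sequence a_0, a_1, ...

coefficients = [o2_coefficient(a, m) for m in range(len(a))]
```

The list `coefficients` reproduces `a`, matching $O_1 z^m = a_m z^m$. The sketch says nothing about non-integer $\alpha$, where the sums no longer terminate.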

(A theoretical physicist or a Ramanujan might even see an expression in terms of the discrete integers $n$ and then operate on it with $\frac{\partial}{\partial n}$ and find a meaningful useful interpretation. Whatever works.)

Tom Copeland
  • 9,937
  • The abstraction Feynman spoke of is the de-binding while the opposite is the binding Hamkins discusses. Fertility is achieved in the tension between the two. – Tom Copeland Mar 25 '23 at 17:55
0

The question is: how to properly render $$\frac{d}{dx}: (x ↦ f(x)) ↦ (x ↦ f'(x)),$$ so as to make the tie-in with $x$ explicit.

Since $x ↦ f(x)$ is synonymous with $(λx)f(x)$, which is $f$, itself, by the $η$-rule - similarly for $x ↦ f'(x) = (λx)f'(x) = f'$, then we can equate $$\frac{d}{dx}(\_) = Dλx(\_),$$ where $D = (λf)f'$.

So, it's what we might call a "binding" operator, since it implicitly contains a nested $λ$ in it. It's understood that it's only a partial operator, since not all $f$ are differentiable at all points of interest.
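
One way to see the binding at work is to model an open expression as a function of an environment and let $\frac{d}{dx}$ close it over $x$ before applying $D$. This is a sketch only: `lam`, `d_by_d`, and the finite-difference `D` are hypothetical names and stand-ins, not part of any formal system discussed here.

```python
def D(h, eps=1e-6):
    """D = (lambda f) f', approximated by a central difference."""
    return lambda p: (h(p + eps) - h(p - eps)) / (2 * eps)

def lam(var, body):
    """Bind the free variable `var` in `body`, where body maps an environment to a number."""
    return lambda env: (lambda value: body({**env, var: value}))

def d_by_d(var, body):
    """d/d<var>(body) = D(lambda <var>. body); other free variables stay in env."""
    return lambda env: D(lam(var, body)(env))

x_squared = lambda env: env["x"] ** 2   # the open expression x^2

ddx = d_by_d("x", x_squared)({})            # x is bound, no environment needed
ddt = d_by_d("t", x_squared)({"x": 3.0})    # x stays free, so it must be supplied
```

Here `ddx(3.0)` is about `6.0`, while `ddt` is the zero function for any value of `t`, because binding `t` leaves the free `x` untouched.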

For partial derivatives, we might write: $$\frac{∂}{∂x} = λ(x,y)(Dλxf(x,y))(x), \quad \frac{∂}{∂y} = λ(x,y)(Dλyf(x,y))(y).$$ To render this accurately requires extending the syntax for $λ$-expressions to include tuple constructor and deconstructors satisfying the identities: $$H(x,y) = x, \quad I(x,y) = y, \quad (H(z),I(z)) = z.$$ Then, we can write this as: $$\frac{∂}{∂x} = λz(Dλxf(x,I(z)))(H(z)), \quad \frac{∂}{∂y} = λz(Dλyf(H(z),y))(I(z)),$$ or by reusing the earlier definition of total derivatives: $$\frac{∂}{∂x} = λz\left(\frac{d}{dx}f(x,I(z))\right)(H(z)), \quad \frac{∂}{∂y} = λz\left(\frac{d}{dy}f(H(z),y)\right)(I(z)).$$

In contrast, this can't really be considered a decisive answer, since you actually want to render the kind of distinction seen in the following example: $$f(t, u) = F(t + u, t)\quad⇔\quad F(s,t) = f(t, s - t),$$ where $$ \left(\frac{∂}{∂t}\right)_u(\_) = \left(\frac{∂}{∂s}\right)_t(\_) + \left(\frac{∂}{∂t}\right)_s(\_),\quad \left(\frac{∂}{∂u}\right)_t(\_) = \left(\frac{∂}{∂s}\right)_t(\_),$$ and $$ \left(\frac{∂}{∂t}\right)_s(\_) = \left(\frac{∂}{∂t}\right)_u(\_) - \left(\frac{∂}{∂u}\right)_t(\_),\quad \left(\frac{∂}{∂s}\right)_t(\_) = \left(\frac{∂}{∂u}\right)_t(\_), $$ in as direct of a way as possible.
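
The displayed relations can be sanity-checked numerically. The concrete $F$, the sample point, and the helper `partial` below are assumptions made for this sketch; the subscript on each partial derivative is encoded simply by which function ($f$ or $F$) is differentiated and in which slot.

```python
import math

def partial(h, index, eps=1e-6):
    """Partial derivative of h in the argument at position `index`."""
    def dh(*args):
        up = list(args)
        down = list(args)
        up[index] += eps
        down[index] -= eps
        return (h(*up) - h(*down)) / (2 * eps)
    return dh

F = lambda s, t: math.sin(s) * t + s * s   # hypothetical F(s, t)
f = lambda t, u: F(t + u, t)               # f(t, u) = F(t + u, t)

t0, u0 = 0.4, 1.1
s0 = t0 + u0

d_t_fixed_u = partial(f, 0)(t0, u0)   # (d/dt)_u
d_u_fixed_t = partial(f, 1)(t0, u0)   # (d/du)_t
d_s_fixed_t = partial(F, 0)(s0, t0)   # (d/ds)_t
d_t_fixed_s = partial(F, 1)(s0, t0)   # (d/dt)_s
```

The two quantities both written $\partial/\partial t$ come out different, and they differ by exactly the $\partial/\partial s$ term, as in the formulas above.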

I think that in this case, it would be more fruitful to treat the operators as differential operators in the sense of differential geometry on differentiable manifolds, since this seems to best fit with and fully encompass the intended usage.

Non-Analytic Usages
There are other usages, devised by analogy, that lie outside differential geometry; e.g. Differential Algebra. This can be generalized to semi-rings as in the following example.

Consider the following context-free grammar $$S → u L v | x, \quad L → λ | S L,$$ over a monoid $M$ for which $u,v,x ∈ M$, where $λ ∈ M$ denotes the identity. (Footnote: the notion of context-free languages and context-free grammar can be defined for arbitrary monoids, not just for free monoids). As an algebraic system, it can be expressed as: $$S ≥ u L v + x, \quad L ≥ 1 + S L,$$ over suitably-defined algebra $ℜ(M) ⊆ ℭ(M)$, that contains an idempotent sum $+$ (i.e. $x + x = x$) and multiplicative identity $1$ in place of $λ$. More precisely, $ℜ(M)$ and $ℭ(M)$ are, respectively, the rational and context-free subsets of $M$, with singletons $\{m\}$, for $m ∈ M$ denoted as just $m$, the identity $1 = \{λ\}$, the sum $A+B = A∪B$, for $A,B⊆M$ and $≥$ denoting inclusion of subsets.

This is a system of fixed-point inequations, in which the desired solution is the least fixed point. (Another way of saying the same is: it's a non-numeric "optimization problem").
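
For intuition, the least fixed point can be approximated by Kleene iteration from the empty sets. This sketch makes several assumptions: it specialises $M$ to the free (non-commutative) monoid of strings, picks the letters `(`, `)` and `x` for $u$, $v$ and $x$, and truncates all languages to words of bounded length so that the iteration terminates.

```python
def concat(A, B, max_len):
    """Product of two languages, truncated to words of length <= max_len."""
    return {a + b for a in A for b in B if len(a + b) <= max_len}

def least_fixed_point(max_len, u="(", v=")", x="x"):
    """Kleene iteration for S >= u L v + x and L >= 1 + S L, starting from empty sets."""
    S, L = set(), set()
    while True:
        new_S = concat(concat({u}, L, max_len), {v}, max_len) | {x}
        new_L = {""} | concat(S, L, max_len)
        if new_S == S and new_L == L:
            return S, L
        S, L = new_S, new_L

S, L = least_fixed_point(max_len=5)
```

The resulting `S` contains words such as `x`, `()` and `(x())`, and `L` contains the empty word and finite concatenations of words of `S`, within the length bound.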

If $M$ is a free commutative monoid (that is: a free object in the category of commutative monoids), then by Parikh's Theorem $ℜ(M) = ℭ(M)$, the simplest proof arising directly by way of an analogue of differential calculus. The least fixed point of $x ≥ f(x)$, written as $(μx)f(x)$, is $x = f'(f(0))^* f(0)$, where $A ↦ A^*$ is the Kleene star operator - the same as what's used with regular expressions. This applies even in the multivariate case - but with partial differential operators.

Thus, $$L ≥ 1 + S L\quad⇒\quad(μL)(1 + SL) = \left(\frac{∂}{∂L}(1 + LS)\right)^*(1 + 0S) = S^*.$$ Substituting $f(S) = (μL)(1 + SL)$, we obtain: $$S ≥ u L v + x\quad⇒\quad(μS)(1 + xf(S)y) = \left(\frac{∂}{∂S}(1 + xf(S)y)\right)^*(1 + xf(0)y).$$ For commutative Kleene algebras, $x↦x^*$ behaves as the exponential function. Therefore, $0^* = 1$, $d(A^*)/dA = A^*$. Thus, $$(μS)(1 + xf(S)y) = \left(xf'(0)y\right)^*(1 + x(f(0))y) = (xy)^*(1 + xy) = (xy)^*.$$ (In Kleene algebra $A^*(1 + f(A)) = A^*$ for any polynomial $f(A)$.) Together, this yields the least fixed point solution $L = {(xy)^*}^* = (xy)^*$ and $S = (xy)^*$, since ${A^*}^* = A^*$ in Kleene algebra.

  • 1
    Your first typing judgement says that $d/dx$ takes a function as input and returns a function, but your definition a few lines below suggests it takes a number as input and returns a function. – Michael Bächtold Jan 10 '24 at 08:12
  • You need to be more specific. This one: $\frac{d}{dx}(y) = Dλx(y)$? The $(\_)$ place-holder is not an argument in the usual sense, because it can take bound variables. So it has to be read as something at the "alpha level", so to say, rather than "beta level"; and the place holder is more akin to being of what's referred to as a "reference type" in programming languages. It is a syntactic functional, maybe formalized as such in the Magma (or generalization thereof) underlying the Term algebra. – NinjaDarth Jan 14 '24 at 23:55
  • 1
    This gets more directly to the question: what type of functional or operator is this: $λx(_)$; i.e. $y ↦ λx(y)$, when we want bound $x$'s in $y$ to be regarded as such? I think you need to fall back to term algebras, and to magmas (or whatever the generalization of magmas is called) to formalize that as a function that respects the handling of bound variables. It's just easier to skip the formalization and describe it as just a syntactic functional, instead; somewhat like Landin's "let (_) = (_) in (_)". – NinjaDarth Jan 15 '24 at 00:02
  • 1
    Yes, what I meant was your (1) $\frac{d}{dx}: (x\mapsto f(x))\mapsto (x\mapsto f'(x))$ and later (2) $\frac{d}{dx}(y)=D\lambda x(y)$. (1) suggests that $d/dx$ takes a function $f:\mathbb{R}\to\mathbb{R}$ and returns a function $f':\mathbb{R}\to\mathbb{R}$, while in (2) $d/dx$ takes a term $y:\mathbb{R}$ containing a free variable $x$ (which is not a function) and returns a function of type $\mathbb{R}\to\mathbb{R}$ (not the same as (1)). – Michael Bächtold Jan 15 '24 at 08:42
  • In your comments you observe correctly that with (2), $d/dx$ acts on syntactic expressions (I'd say at the meta level), so it stops being a mathematical operation, which I find is different from how mathematicians used to use it. – Michael Bächtold Jan 15 '24 at 08:44
  • It doesn't "stop being a mathematical operation". The syntax, itself, is a mathematical structure that can be, and is normally, formalized as a magma or as its generalization to different signatures, which are called term algebras for that signature. The elements of a free object in a term algebra are, in fact, one and the same as an abstract syntax tree over that signature, and it is here that you formalize such notions as abstract syntax tree. This is where the "syntax" or "meta level" - as you refer to it - actually lives. – NinjaDarth Jan 20 '24 at 02:16
  • There is clearly a difference between a mathematical operator like the derivative ', which acts on modern functions and your d/dx, which acts on syntactical expressions. For instance, from $y=x^2$ we cannot conclude $dy/dx=d(x^2)/dx$ inside the formal system. – Michael Bächtold Jan 21 '24 at 10:26