158

One of the many articles on the Tricki that was planned but has never been written was about making it easier to solve a problem by generalizing it (which initially seems paradoxical because if you generalize something then you are trying to prove a stronger statement). I know that I've run into this phenomenon many times, and sometimes it has been extremely striking just how much simpler the generalized problem is. But now that I try to remember any of those examples I find that I can't. It has recently occurred to me that MO could be an ideal help to the Tricki: if you want to write a Tricki article but lack a supply of good examples, then you can ask for them on MO.

I want to see whether this works by actually doing it, and this article is one that I'd particularly like to write. So if you have a good example up your sleeve (ideally, "good" means both that it illustrates the phenomenon well and that it is reasonably easy for others to understand) and are happy to share it, then I'd be grateful to hear it. I will then base an article on those examples, and I will also put a link from that article to this MO page so that if you think of a superb example then you will get the credit for it there as well as here.

Incidentally, here is the page on this idea as it is so far. It is divided into subpages, which may help you to think of examples.

Added later: In the light of Jonas's comment below (I looked, but not hard enough), perhaps the appropriate thing to do if you come up with a good example is to add it as an answer to the earlier question rather than this one. But I'd also like to leave this question here because I'm interested in the general idea of some kind of symbiosis between the Tricki and MO (even if it's mainly the Tricki benefiting from MO rather than the other way round).

Amir Sagiv
  • 3,554
gowers
  • 28,729
  • 6
    Here's a related question: http://mathoverflow.net/questions/21214/particular-problem-solved-by-solving-a-more-general-problem – Jonas Meyer Sep 26 '10 at 08:12
  • 5
    Here's another related question: http://mathoverflow.net/questions/31699/strengthening-the-induction-hypothesis – Tony Huynh Sep 26 '10 at 10:44
  • 2
    @Tony -- that's great, and will also be helpful for the Tricki article. – gowers Sep 26 '10 at 11:04
  • Cayley's tree counting theorem has several proofs that work like this. The nicest one may be the generating function one, which I added as an answer for the previous MO question. – Peter Shor Sep 26 '10 at 16:47
  • 1
    You may be interested in Tom Leinster's post here: http://golem.ph.utexas.edu/category/2010/03/a_perspective_on_higher_catego.html where he uses Buffon's noodle as a generalization of Buffon's needle as an example of a more general problem which is easier to solve. – Steven Gubkin Sep 28 '10 at 22:06
  • A general situation: a theorem plus the several easy lemmas needed to prove it has stronger content than the theorem alone, but may be easier to prove, because the list of the lemma statements is a road map to the proof of the theorem. – Pietro Majer Apr 05 '17 at 15:47
  • I just recently watched Mathologer's "best way to lace your shoes" video, where he outlines (starting at 17:17) a proof of the optimality of the 'obvious' best way by considering generalised lacings: https://www.youtube.com/watch?v=CSw3Wqoim5M#t=17m17s . – LSpice Jul 05 '20 at 20:26

48 Answers

80

Great question. Maybe the phenomenon is less surprising if one thinks that there are $\infty$ ways to generalize a question, but just a few of them make some progress possible. I think it is reasonable to say that successful generalizations must embed, consciously or not, a very deep understanding of the problem at hand. They operate through the same mechanism at work in good abstraction, by helping you forget insignificant details and focus on the heart of the matter.

An example, which is probably too grandiose to qualify as an answer, since your question seems very specific, is Fredholm theory. At the beginning of the last century integral equations were a hot topic and a ubiquitous tool to solve many concrete PDE problems. The theory of linear operators on Banach and Hilbert spaces is an outgrowth of this circle of problems. Now, generalizing from $$ u(x) + \lambda \int _a ^b k(x,s) u(s) ds = f(x) $$ to $$ (I+ \lambda K) u = f $$ makes the problem trivial, and we do it without thinking. But it must have been quite a shock in 1900.
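To see concretely what the reformulation buys, here is a sketch (my addition, under the standing assumption that $K$ is a bounded operator with $\|\lambda K\| < 1$): the operator equation is solved by a geometric series,

$$ u = (I + \lambda K)^{-1} f = \sum_{j=0}^{\infty} (-\lambda)^j K^j f, $$

which is exactly the classical method of iterated kernels, with $K^j$ the integral operator whose kernel is the $j$-fold composition of $k$ with itself. The generalization replaces a computation with kernels by one line of algebra about invertible operators.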

55

Weyl's equidistribution theorem states that for $\alpha \notin \mathbb{Q}$, the sequence of fractional parts $x_n = \{ n \alpha \}, n=1,2,\dots$ is equidistributed in the unit interval, i.e. $\lim_{N \to \infty}\frac{|\{x_n\}_{n=1}^{N} \cap (a,b)|}{N} = b-a$ for every subinterval $(a,b) \subset [0,1]$.

The fact that this sequence is dense in $[0,1]$ is a simple application of the pigeonhole principle, but the fact that it is equidistributed seems a lot harder to prove at first. However, this is very easy to establish if one generalizes the definition of equidistribution (or more precisely, the objects appearing in it). Namely, by very easy arguments one can show that $x_n$ is equidistributed in $[0,1]$ if and only if $\frac{1}{N} \sum_{n=1}^{N} f(x_n) \to \int_{0}^{1}f(x)dx$ for every continuous (or Riemann integrable) function $f$ on $[0,1]$. Now one can't resist the temptation to look for functions $f$ for which this is easy to show. The complex exponentials $f_m (x) = e^{2 \pi i m x}$, where $m \in \mathbb{Z}$, are such functions and by classical Fourier analysis (or the Stone-Weierstrass theorem) it is enough to show the above convergence for such functions. One is thus led to the Weyl criterion, which reduces the problem of showing that the sequence $\{ n \alpha \}$ is equidistributed in $[0,1]$ to a simple computation involving the sum of a geometric sequence. (Of course, Weyl's criterion is useful in studying other, more involved sequences as well)
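For completeness, here is that simple computation (for $m \neq 0$; note that $e^{2 \pi i m \alpha} \neq 1$ precisely because $\alpha$ is irrational):

$$ \left| \frac{1}{N} \sum_{n=1}^{N} e^{2 \pi i m n \alpha} \right| = \frac{1}{N} \left| \frac{e^{2 \pi i m \alpha} \left( e^{2 \pi i m N \alpha} - 1 \right)}{e^{2 \pi i m \alpha} - 1} \right| \leq \frac{2}{N \left| e^{2 \pi i m \alpha} - 1 \right|} \to 0 \quad (N \to \infty), $$

while for $m = 0$ the averages are identically $1 = \int_0^1 f_0(x) dx$. That is the entire verification of Weyl's criterion for the sequence $x_n = \{ n \alpha \}$.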

In essence, we started with a problem which concerns indicator functions of subintervals of $[0,1]$, generalized it to a problem involving a much bigger class of functions $\mathcal{F}$ and then found a nice, special subclass with which we can "capture" the whole class $\mathcal{F}$ (and in particular the functions we started with). This procedure of "generalize and then specialize" seems quite common in analysis. In this case it also has some relation to the Tricki article "Turn sets into functions".

Mark
  • 4,804
47

The canonical example to introduce this idea early to students is the Fundamental Theorem of Calculus. In order to figure out one area $\int_{a}^b f(t)dt$, you must come to grips with the generalized problem $x \mapsto \int_{a}^x f(t)dt$.

Thierry Zell
  • 4,536
  • 5
    Some of the power series/Taylor series also fall into this category: they are probably easier to evaluate as power series than at a point.

    In both situations one can easily see why the more general problem is sometimes easier. Studying the (signed) area as a function, or a series as a power series, allows one to use one extra tool: differentiation (which is actually the key for this problem).

    – Nick S Sep 28 '10 at 19:22
35

There are many examples where introducing one or more extra parameters into an integral that you want to evaluate or an identity that you want to prove makes things easier. For example, Feynman was fond of evaluating integrals by differentiating under the integral sign, and in his advanced determinant calculus, Christian Krattenthaler explicitly urges you to introduce more parameters into any determinant you are having trouble evaluating.

For the Tricki, maybe one of the simplest examples would be the evaluation of $\int_0^\infty {\sin x \over x}dx$ by considering $\int_0^\infty e^{-xt} \bigl({\sin x \over x}\bigr) dx$ and differentiating with respect to $t$.
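Sketching that evaluation (differentiation under the integral sign is justified here for $t > 0$): with $I(t) = \int_0^\infty e^{-xt} \bigl({\sin x \over x}\bigr) dx$,

$$ I'(t) = -\int_0^\infty e^{-xt} \sin x \, dx = -\frac{1}{1+t^2}, $$

so $I(t) = \frac{\pi}{2} - \arctan t$ (the constant of integration is fixed by $I(t) \to 0$ as $t \to \infty$), and letting $t \to 0^+$ gives $\int_0^\infty {\sin x \over x} dx = \frac{\pi}{2}$.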

Timothy Chow
  • 78,129
  • As a matter of fact there's already an example of this in the proto-article as it currently is. But there's definitely no harm in having another one, especially as just this special case is an interesting technique in itself. – gowers Sep 27 '10 at 13:02
  • 7
    Great examples with integrals! One that I remember reading in one of Pólya's books way back when is $$\int_{-\infty}^{\infty}\frac{dx}{(x^2+1)^2},$$ which can be evaluated by replacing 1 with $a$ and differentiating $\int_{-\infty}^{\infty}\frac{dx}{x^2+a}$ under the integral sign. – Victor Protsak Sep 29 '10 at 06:34
30

From Noam Elkies' AMS article Lattices, Linear Codes, and Invariants, Part I: Elkies has been discussing how difficult it is to find the minimal nonzero length of an element of a lattice $C$.

Sometimes an appropriate response to a difficult mathematical problem is to pose a much harder problem. Here we find the minimal nonzero length intractable, and thus ask for *all* the lengths of vectors of $C$ and their multiplicities. Equivalently, we ask for the following generating function of all the squared lengths, called the *theta function* (or *theta series*) of $C$:

$$\Theta_C(z) = \sum_{x \in C} z^{\langle x, x \rangle} = 1 + \sum_{m=1}^{\infty} N_m(C) z^m$$

where $N_m(C)$ is the number of lattice vectors of length $\sqrt{m}$.

It is hard to consider particular lengths but easier to consider the entire theta function because you give the problem more structure, and then you have access to stronger tools like Poisson summation.

Qiaochu Yuan
  • 114,941
29

I can't resist mentioning the following problem (and requesting that nobody gives away the solution any more than it is already given away by my mentioning it here).

Call a real number *repetitive* if for every $k$ you can find a string of $k$ digits that appears more than once in its decimal expansion. The problem is to prove that if a real number is repetitive then so is its square.

gowers
  • 28,729
  • 2
    This is trivial (does that give away the solution?) – Fan Zheng May 02 '18 at 00:47
  • 8
    *and requesting that nobody gives away the solution any more than it is already given away by my mentioning it here* and then I die poor, alone and without a proof that the square of a repetitive number is repetitive. – Olivier May 03 '18 at 11:41
  • 2
    hm, $\sqrt{2}$ is repetitive but 2 is not (unless we write it as $2.0\ldots$ as nobody does) – Fedor Petrov May 08 '19 at 11:13
28

The following was a conjecture for several years. Put a light bulb and a switch at every vertex of an $m\times n$ grid ($mn$ vertices in all). Each bulb can be on or off. Each switch changes the state of the bulb at its vertex and at all its neighbors. (A neighbor is a vertically or horizontally adjacent vertex.) Then, whatever the initial set of lit bulbs, it is possible to turn all the lights off by pressing some set of switches. Sutner showed in 1989 that the corresponding result is true for any graph. Caro gave a simpler proof in 1996. The proof is an elegant application of linear algebra mod 2. Looking at grids adds an extra layer of complexity that obscures the underlying theory. One reference is http://mathworld.wolfram.com/LightsOutPuzzle.html.
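The mod-2 linear algebra is easy to play with computationally. Here is a minimal sketch (my own illustration, not part of the original answer; all names are hypothetical) that decides an instance on an $m \times n$ grid by Gaussian elimination over GF(2):

```python
# Lights Out on an m x n grid: pressing a switch toggles its own bulb and
# its horizontal/vertical neighbors, so a set of presses x turns off an
# initial state b exactly when (A + I) x = b over GF(2), where A is the
# grid's adjacency matrix.

def solve_lights_out(m, n, initial):
    N = m * n
    idx = lambda r, c: r * n + c
    # Build the augmented matrix [A + I | b] over GF(2).
    rows = []
    for r in range(m):
        for c in range(n):
            row = [0] * (N + 1)
            row[idx(r, c)] = 1  # a switch toggles its own bulb...
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < m and 0 <= cc < n:
                    row[idx(rr, cc)] = 1  # ...and its neighbors
            row[N] = initial[idx(r, c)]
            rows.append(row)
    # Gauss-Jordan elimination mod 2.
    pivots, pivot_row = [], 0
    for col in range(N):
        for r in range(pivot_row, N):
            if rows[r][col]:
                rows[pivot_row], rows[r] = rows[r], rows[pivot_row]
                break
        else:
            continue  # no pivot in this column
        for r in range(N):
            if r != pivot_row and rows[r][col]:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[pivot_row])]
        pivots.append(col)
        pivot_row += 1
    # An inconsistent row means this initial state cannot be switched off.
    if any(rows[r][N] for r in range(pivot_row, N)):
        return None
    x = [0] * N
    for r, col in enumerate(pivots):
        x[col] = rows[r][N]  # free variables stay 0
    return x  # x[i] == 1 means "press switch i"

# The all-on state on a 5 x 5 grid is solvable:
print(solve_lights_out(5, 5, [1] * 25))
```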

  • 8
    I thought that the proof for arbitrary graphs assumes that the initial state has all lights on? Whereas for grid graphs it does not matter what the initial state is. For example if I have a triangle with one light on, I don't think I can turn all the lights off. – Tony Huynh Sep 27 '10 at 00:27
  • 2
    Tony is right: what the link actually says is that it is possible to go from "all lights off" to "all lights on" for any square grid.

    Another counterexample to the "any state" problem on arbitrary graphs is any trivalent graph. (Or pentavalent, or so on...)

    – dvitek Sep 27 '10 at 01:19
  • Incidentally, the one-dimensional version of this was the first problem in the qualifying round for the last Google Code Jam competition. – Per Vognsen Sep 27 '10 at 01:32
  • 16
    I guess the smallest counterexample would be a single edge with one light on, which also happens to be a 1x2 grid. – Tony Huynh Sep 27 '10 at 01:58
  • The all-lights on, arbitrary-graph version appears as a problem in my second-edition copy of Lovasz' Combinatorial Problems and Exercises, which was apparently completed in 1992. (Specifically, it's chapter 5, problem 17, part (c)). I don't have a first edition, but based on the tiny snippets Google Books lets me see, I'm betting that at least 17(a) was in the original. Does anyone have a handy copy to check? (Incidentally, the proof given is rather lovely, although not by any means obvious.) – Harrison Brown Sep 28 '10 at 22:03
  • I'll just comment that I view the proof itself as an instance of generalizing to make things easier. To prove that two matrices have the same rank prove the stronger statement that their column matroids are the same. – Tony Huynh Sep 28 '10 at 22:40
23

I like the matroid intersection theorem. Basically, this hammer often renders a problem trivial once you generalize to matroids. Also, it's nice that the hammer itself is not hard to prove. First the statement of the theorem.

Matroid Intersection Theorem. Let $M_1$ and $M_2$ be matroids on the same ground set $E$, with rank functions $r_1$ and $r_2$ respectively. Then the size of a maximum common independent set of $M_1$ and $M_2$ is

\[ \min_{A \subseteq E} \ r_1(A)+r_2(E-A). \]

I won't include a proof, but you can find one in say Volume B of Combinatorial Optimization by Schrijver. It turns out that this theorem simultaneously proves many theorems such as König's theorem and Nash-Williams' theorem on covering graphs with $k$ edge-disjoint spanning trees. Here's another application.

Rainbow Spanning Trees. Let $G=(V,E)$ be a graph with a (not necessarily proper) colouring of $E$. Suppose you are trying to decide if $G$ contains a rainbow spanning tree. This is a spanning tree such that no two edges of the tree are the same colour. One obvious necessary condition is as follows. If I choose $t$ colours and remove all the edges with these colours, then the resulting graph had better have at most $t+1$ components. Is this also sufficient? It seems that this could be rather tricky to prove directly. However, there is an easy proof using matroids.

Let $M_1$ be the matroid whose independent sets are those subsets of edges which contain at most one edge of each colour. Let $M_2$ be the cycle matroid of $G$. Then, $G$ has a rainbow spanning tree if and only if $M_1$ and $M_2$ have a common independent set of size $|V|-1$.

By the matroid intersection theorem it suffices that

\[ \min_{A \subseteq E} \ r_1(A)+r_2(E-A) \geq |V|-1. \]

Note that $r_1(A)$ is the number of colours among $A$. On the other hand, $r_2(E-A)$ is $|V|-c(E-A)$, where $c(E-A)$ is the number of components of the subgraph $(V,E-A)$. Substituting and simplifying, we conclude that $G$ has a rainbow spanning tree if and only if for all $A \subseteq E$,

\[ c(E-A) \leq r_1(A)+1, \]

as claimed.

Tony Huynh
  • 31,500
22

Sometime around 25 years ago, Dr. Jeffrey Vaaler at UT Austin gave me the following problem. He needed the result as a lemma for a paper he was working on.

Let $n$ be a square-free integer with $k$ distinct prime factors and thus $2^k$ divisors. Split the divisors into two sets of equal size: the small divisors $S$ and the large divisors $T$. The statement he was trying to prove was: $$\text{There exists a bijection}\ f: S \rightarrow T \hspace{2mm} \text{such that} \hspace{2mm} d \hspace{1mm} | \hspace{1mm} f(d) \hspace{2mm} \text{for every} \hspace{2mm} d \in S.$$ I was an undergraduate and highly motivated to demonstrate my usefulness, but I didn't really have many ideas about how to go about it. The obvious approach is by induction on $k$, but I never really got anywhere despite spending many hours on the problem.

A year later I ran into Dr. Vaaler in the hall and asked if he ever solved it. Of course, he had, by induction on $k$. He went on to explain the "trick" to making the induction work. He proved a more general result. Introduce a parameter $0 \le r \le \frac{1}{2}$ and consider $S_r$ and $T_r$, the smallest $\lfloor r \cdot 2^k \rfloor$ divisors and the largest $\lfloor r \cdot 2^k \rfloor$ divisors respectively, and instead prove the above statement with $S_r$ and $T_r$ in place of $S$ and $T$.

The lemma is then the special case with $r = \frac{1}{2}$.

This example stuck with me. How could it be easier to prove something more general? Though I understand the concept better today, it still surprises me.

  • How do you define $S$ and $T$? By letting $S$ be those divisors smaller than $\sqrt n$? – darij grinberg Nov 10 '10 at 16:21
  • There is a similar problem, by the way, with exactly the same trick: We divide the set of divisors of squarefree $n$ into two sets $S$ and $T$ of equal size, this time arbitrary (but of equal size). Then we claim that there exist at least $2^k$ pairs $\left(s,t\right)\in S\times T$ with at least one of $\frac{s}{t}$ and $\frac{t}{s}$ being a prime integer. Of course, for this problem the setting is artificial - ... – darij grinberg Nov 10 '10 at 16:26
  • ... instead of the lattice of divisors of $n$ we can just consider the lattice $\left\lbrace 0,1\right\rbrace^n$, and we claim that there are at least $2^k$ pairs $\left(s,t\right)\in S\times T$ with $s$ and $t$ adjacent. – darij grinberg Nov 10 '10 at 16:26
  • Yes, $f$ is a bijection. Edited. – I. J. Kennedy Nov 10 '10 at 17:55
  • 2
    By the way, the generalization may be stated as follows: Let A be any collection of subsets of $\{1,2,\dots,n\}$ s.t. if $U\in A$ and $V\subset U$, then $V\in A$. Then there exists a bijection $\pi:A\rightarrow A$ such that $V\cap \pi(V)=\emptyset$ for any $V\in A$. – Fedor Petrov Nov 20 '10 at 06:55
  • Re #7, the divisors of a square-free integer. What's wrong with the following proof? By induction on k > 0, starting with the obvious case of a singleton, check that the family of all subsets of a finite k-set can be partitioned into subset-related pairs (by keeping the old arrangement + fattening the old arrangement by the new point everywhere). – Isaac Gorelic Sep 12 '11 at 11:40
  • 2
    The argument above suggests a fool-proof proof in the original divisibility terms: choose and fix one of the k primes. Those divisors of the full product which are not divisible by this fixed prime are "small," and those divisible are "big." The bijection is the multiplication by this prime. – Isaac Gorelic Sep 13 '11 at 09:25
18

How about the Bohr-Mollerup theorem:

There is a unique log-convex $f:[0,\infty)\to(0,\infty)$ that satisfies $f(x+1) = (x+1)f(x)$ and $f(0)=1$ (namely $f(x) = x! = \Gamma(x+1)$).

Here's a generalization that I think is easier to prove:

If $\delta:[0,\infty)\to\mathbb{R}$ is such that $\delta^{(n)}$ decreases monotonically to zero, then there is a unique $f$ such that $f(x+1) = f(x) + \delta(x)$, $f(0) = 0$, and $f^{(n)}$ is monotonically increasing.

(To recover the original statement, take $n=1$ and $\delta(x) = \log(x+1)$.)

The benefit of this formulation is that it can be proved by induction on $n$. The inductive step is pretty routine and involves applying the $(n-1)$-case to $f'$ and $\delta'$. The basis case for $n=0$ still requires a nontrivial argument, but to me it feels much simpler and more intuitive than the case $n=1$ (for the original theorem), and it's actually easy to remember (in contrast to the direct proof of the Bohr-Mollerup theorem, which I find hard to remember). All you have to do is apply the functional equation $k$ times to get $f(x+k)-f(k) = f(x) + \sum_{j=0}^{k-1}[\delta(x+j)-\delta(j)]$ and take the limit as $k$ approaches infinity; you end up with an increasing function of $x$ that is zero whenever $x$ is an integer and is thus zero for all $x$. Thus $f(x) = \sum_{j=0}^{\infty}[\delta(j) - \delta(x+j)]$ is the unique solution. (Since $\delta$ is decreasing, we do have an increasing function, and the sum converges since $\delta(x)\to 0$ as $x\to +\infty$.)

Darsh Ranjan
  • 5,932
17

This may be a little trivial, but there are a number of identities for the Fibonacci numbers that are most easily proved by generalizing them. For example, proving

$f_{2n-1}=f_{n+1}f_{n}-f_{n-1}f_{n-2}$

requires a rather convoluted process involving a couple of lemmas unless one realizes that it is far easier to prove

$f_{m+n}=f_{m}f_{n+2}-f_{m-2}f_{n}$

and then substitute $m=n$ and replace $n$ with $n-1$.

I think a classic example of generalizing in order to prove a simple result is Galois Theory. Ruffini's attempted proofs of the unsolvability of the quintic were enormously long and tremendously complicated. However, once the machinery of Galois Theory is developed, which is rather easy, it is almost trivial to demonstrate that there exist quintic equations that are not solvable by radicals.

  • 9
    I would argue that turning the question about the quintic into the question "which polynomials can be solved by radicals" is not a simplification at all. To develop Galois theory is easy now, in retrospect, but back then it was a tremendous achievement and involved amazingly deep insights and innovative thinking. – Alex B. Sep 27 '10 at 01:21
  • 2
    The funny thing about Galois theory, as suggested in the recent Notices review of Duel at dawn http://www.ams.org/notices/201010/rtx101001297p.pdf, is that Galois was not snubbed because his ideas were perceived as incorrect, but rather because they were perceived as useless. Well, it was then and this is now, I guess... – Thierry Zell Nov 10 '10 at 15:55
17

A nice question!

An extreme case of the phenomenon you are asking about happens frequently enough to be worth mentioning (and I think that it even partially "explains" it): sometimes you can generalize a problem and then realize that the more general problem is already solved! The point is that there are usually many ways to look at or formulate the same problem; when the problem is looked at from some points of view it seems novel, but when correctly reformulated you can recognize it as a special case of a solved problem, perhaps even a classic one.

Here is an example that I admit is a bit forced, but it illustrates the idea. Show that if a triangle C can be divided by a line from a vertex to the opposite side into two triangles A and B that are similar to the original triangle, and if corresponding sides of the three triangles are c, a, and b, then $a^2 + b^2 = c^2$. This is of course completely trivial, since the areas of similar triangles are obviously proportional to the squares of corresponding sides, while the areas of A and B clearly add up to the area of C. On the other hand, a special case of this, dropping the perpendicular from the right-angle vertex of a right triangle onto the hypotenuse, gives a proof of the Pythagorean Theorem. (As you may know, this is how Einstein was supposed to have found a proof of the Pythagorean Theorem for himself.)

Dick Palais
  • 15,150
17

Doron Zeilberger wrote a very nice expository article entitled, "The method of undetermined generalization and specialization illustrated with Fred Galvin's amazing proof of the Dinitz conjecture," in which he discusses how repeated generalization and specialization can lead one to the solution of a difficult problem.

Timothy Chow
  • 78,129
14

I remember there was an exercise in linear algebra:

Find the determinant of the following matrix: \begin{pmatrix} 0 & a & a & \cdots & a\\ b & 0 & a & \cdots & a\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ b & b & \cdots & 0 & a\\ b & b & \cdots & b & 0 \end{pmatrix}

A simple solution is to generalize this to a function: $$f(x) = \det\begin{pmatrix} x & a + x & a + x & \cdots & a + x\\ b + x & x & a + x & \cdots & a + x\\ \cdots & \cdots & \cdots & \cdots & \cdots\\ b + x & b + x & \cdots & x & a + x\\ b + x & b + x & \cdots & b + x & x\end{pmatrix} $$

Namely, add $x$ to every entry of the matrix.

It is then obvious that:

  • $f$ is a linear function of $x$;
  • $f(-a)$ and $f(-b)$ are easily computed;
  • the original determinant is just $f(0)$; a worked finish is sketched just below.
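To finish (my write-up, for an $n \times n$ matrix and assuming $a \neq b$): setting $x = -a$ kills every entry above the diagonal and setting $x = -b$ kills every entry below it, so both special values are determinants of triangular matrices. Writing $f(x) = c + dx$,

$$ f(-a) = (-a)^n, \qquad f(-b) = (-b)^n \quad \Longrightarrow \quad f(0) = c = \frac{b(-a)^n - a(-b)^n}{b - a}. $$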
WhatsUp
  • 3,232
11

A very simple example of this phenomenon is also given by the following problem: prove that the maximum determinant of $n\times n$ matrices that have entries in the set $\{-1,+1\}$ is divisible by $2^{n-1}$. In fact, it is much easier to prove by induction that all matrices with coefficients in $\{-1,+1\}$ have determinant divisible by $2^{n-1}$: changing a $-1$ to a $+1$ changes the determinant by $2 \cdot 2^{n-2} k$ for some integer $k$ (by row expansion, applying the induction hypothesis to the cofactor), and one can repeat this until reaching the matrix with all entries $1$, which has determinant $0$ when $n>1$. Note that it is not clear how you could use induction to prove the weaker statement about the maximum determinant.

Actually when a statement can be proved by induction it happens quite often that the correct statement that "makes induction work" is a somewhat generalized version of the result to be proved.
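A brute-force sanity check of the stronger statement for small $n$ (my own illustration, not from the original answer; there are $2^{n^2}$ matrices, so this is only feasible for tiny $n$):

```python
# Check: every n x n matrix with entries in {-1, +1} has determinant
# divisible by 2^(n-1); exhaustive over all 2^(n*n) matrices for n = 3.
from itertools import product

def det(M):
    # Cofactor expansion along the first row (exact integer arithmetic).
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] *
               det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

n = 3
assert all(
    det([list(e[i * n:(i + 1) * n]) for i in range(n)]) % 2 ** (n - 1) == 0
    for e in product((-1, 1), repeat=n * n)
)
print("all", 2 ** (n * n), "determinants are divisible by", 2 ** (n - 1))
```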

  • 2
    I think this leads to an alternative question of finding "simpler" proofs, where in this case "simpler" may mean "not using an induction principle" . The proof involving multiplication by -1 on columns to get a matrix with one row all ones and then subtracting this row from others to get an (n-1)x(n-1) minor which is a {0,-2} matrix seems to me simpler in this respect, and gives more information. I see a study of proof refinement based on several metrics of "to refine" that could start with your example. Gerhard "Ask Me About System Design" Paseman, 2010.09.26 – Gerhard Paseman Sep 26 '10 at 18:55
  • 1
    I am particularly interested in the example you cite. It was last Sunday that I began studying the collection of all matrices with such entries. I started enumerating all matrices of size 2X2 and as I progressed to the matrices of size 3X3, there were too many. Before I study them from scratch, is there any book you can point me to with regard to such matrices? One thing I noticed was that "the product of all the matrices of even size gives the zero matrix" What do you think? Thanks. – Unknown Sep 28 '10 at 17:18
  • 2
    @Gerhard: interesting comment, I didn't know your proof, which is even more synthetic.

    @Elohemahab: enumeration of such matrices is something that, if required, should be performed in an automated way, since for $3\times 3$ matrices there are already $2^9 = 512$ of them. I don't know of any book, but they have some interest because of the Hadamard conjecture; you could have a look at the references on the page http://en.wikipedia.org/wiki/Hadamard_matrix (Hadamard matrices are the $\{1,-1\}$ matrices that maximize the absolute value of the determinant). What do you mean by product, it being non-commutative?

    – Maurizio Monge Sep 28 '10 at 19:15
  • I have started an MO on it here: http://mathoverflow.net/questions/40451/sign-matrices-1-1-square-matrices – Unknown Oct 08 '10 at 17:05
  • 1
    may we not just add the first row to each of the other rows and divide them by 2 after doing this? – Fedor Petrov Nov 19 '10 at 19:37
  • I support Fedor's comment. Your proof is unnecessarily complicated. Actually, this remains true whenever the entries are odd integers. – Denis Serre Aug 11 '15 at 16:33
11

One common example of the "generalize the problem" strategy is found in parameter differentiation. Consider

$$\int^\infty_0 \frac{\sin^2(x)}{x^2} \, dx$$

We then create a more general case of $I(a)=\int^\infty_0 \frac{\sin^2(ax)}{x^2} \, dx$ where $a>0$.

\begin{align} I'(a) & =\int^\infty_0 \frac{2\sin(ax)\cdot\cos(ax)\cdot x}{x^2}\,dx \\[10pt] & =\int^\infty_0 \frac{\sin(2ax)}{x} \, dx \end{align}

Since $\int^\infty_0 \frac{\sin(2ax)}{x} \, dx = \frac{\pi}{2}$ for $a>0$ (the Dirichlet integral), we get $I'(a)=\frac{\pi}{2}$; together with $I(0)=0$ this gives $I(a)=\frac{\pi a}{2}$, and in particular $I(1)=\frac{\pi}{2}$.

Another similar cute problem: to evaluate $\sum_{k=1}^n\frac{k^2}{2^k}$, we instead evaluate $S(x)=\sum_{k=1}^nk^2x^k.$

Edit: Just to finish the second example:

We know that $\sum_{k=1}^n x^k=\frac{x-x^{n+1}}{1-x}.$ We take the derivative of this, multiply the result by $x$, take the derivative again, and multiply by $x$ again.

We get $S(x)=\sum_{k=1}^nk^2x^k= \frac{x(1+x)-x^{n+2}-x^{n+1}(nx-n-1)^2}{(1-x)^3}$, so $S(\frac{1}{2})=6-(\frac{n^2+4n+6}{2^n})$.
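For what it's worth, both the closed form and the value at $x = \frac{1}{2}$ can be machine-checked (a verification sketch of mine using SymPy, not part of the original answer):

```python
# Verify the closed form for S(x) = sum_{k=1}^n k^2 x^k at x = 1/2.
from sympy import Rational, simplify, symbols

x, n = symbols('x n')
closed = (x*(1 + x) - x**(n + 2) - x**(n + 1)*(n*x - n - 1)**2) / (1 - x)**3
for m in range(1, 10):
    direct = sum(k**2 * Rational(1, 2)**k for k in range(1, m + 1))
    assert simplify(closed.subs({x: Rational(1, 2), n: m}) - direct) == 0
    assert direct == 6 - Rational(m**2 + 4*m + 6, 2**m)
print("closed form and value at 1/2 check out for n = 1, ..., 9")
```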

Michael Hardy
  • 11,922
  • As it happens, the first example already appears on a subpage of the page linked to in the question: http://www.tricki.org/article/To_understand_an_object_consider_treating_it_as_one_of_a_family_of_objects_and_analysing_the_family (not that it would be reasonable to have expected you to check all subpages before giving your answer). As for your second example, it's not completely obvious what method you intend for evaluating S(x). Is it antidifferentiation? I myself would use the fact that S(x)(1-x) is of the form aT(x)+bU(x) where $T(x)=\sum kx^k$ and $U(x)=\sum x^k$ and that (1-x)T(x) is ... – gowers Jun 29 '11 at 08:52
  • a multiple of U(x). But then I would have gained nothing by the generalization. – gowers Jun 29 '11 at 08:52
  • I suppose it is along the lines of anti-differentiation. I edited the problem to include the solution. – Benjamin Horowitz Jun 29 '11 at 20:27
10

Perhaps the example of the calculation of fundamental groups works in this situation. Calculating, for instance, the fundamental group of the circle uses hard ideas from complex analysis (at least in many of the traditional approaches) but if one generalises to calculating the fundamental groupoid of the circle based at two points and one uses the groupoid van Kampen theorem, (not the group vKT) then the proof reduces to quite simple algebra. (The importance of that is really that it links the original complex analytic machinery into the geometric machinery in a neat way helping one to see the 'unity' or close symbiotic relationship of the two areas, which of course have a common heritage in Poincaré.)

Tim Porter
  • 9,187
  • 1
    As a matter of interest, how hard is the groupoid vKT? I ask because I want to get a feel for whether this generalization is really making the problem easier. I sort of feel as though there is an irreducible difficulty somewhere (basically making sense of winding number) but I'd be very interested to be proved wrong. – gowers Sep 26 '10 at 10:19
  • I was under the impression that for the traditional proof, you just have to verify that the exponential map $\exp: i\mathbb{R} \to U(1)$ is continuous, has a simply connected source (which is in fact contractible), and kernel $2\pi i \mathbb{Z}$. Is there a detail I'm missing? – S. Carnahan Sep 26 '10 at 11:25
  • On second thought, I suppose one needs to expend a certain amount of effort to show that $\exp$ is a homomorphism from an additive group to a multiplicative group, and that it sends imaginary numbers to the unit circle. The second claim follows from the first, once you establish that $\exp$ sends complex conjugates to complex conjugates and negatives to reciprocals. One can prove the first claim with relatively little analysis, e.g., only the fact that the power series converges everywhere. – S. Carnahan Sep 26 '10 at 11:43
  • 3
    Don't you then want to prove that any path in U(1) lifts uniquely (given the starting point) to a path in iR? It's not hugely hard, but it's the irreducible difficulty I was referring to when asking whether the groupoid approach could avoid it. – gowers Sep 26 '10 at 12:07
  • 1
    @ Scott There are things to prove, and if someone has a good knowledge of Complex Analysis then they are quite easy to prove. My preference in teaching this area was always to give the traditional proof as a sketch proof i.e. for the idea, then to look at the vKT and to derive the result a second time, hopefully getting the point on the interrelationships between areas across. – Tim Porter Sep 26 '10 at 12:14
  • 2
    @gowers. The groupoid vKT is not really much harder than the standard form, in fact it is almost the same proof! (Subdivision is the inverse of concatenation.) Where there is a slight degree of conceptual novelty is that the idea of a pushout has to be grasped. The pushout of two very simple groupoids is then easily checked to give a groupoid with vertex group an infinite cyclic group. The link with covering spaces can be done separately. I like the treatment in Ronnie Brown's book Topology and Groupoids, perhaps since that is one that he discussed many times,but it is a question of taste. – Tim Porter Sep 26 '10 at 12:17
  • 9
    Proving the groupoid vKT is not so different from proving the group vKT. As I see it, there are two main ways of computing a fundamental group: (1) Use vKT, writing a space as union of spaces with already-known fund gps, or (2) find a simply-connected covering space and use the tie between covering spaces of X and fund gp of X. For example, these can give a student two complementary insights into the fund gp of a Klein bottle. For the circle, method (1) is unavailable; but it becomes available when you pass to groupoids. – Tom Goodwillie Sep 26 '10 at 12:43
  • 3
    I'm not sure that I agree with this example, because the simplicity comes at the expense of extra baggage (groupoids instead of groups). I wish I had a better example (i.e. one I can agree with) of the fundamental groupoid making things genuinely easier. – Daniel Moskovich Sep 26 '10 at 15:33
  • 2
    The vKT for groupoids is substantially more natural than the vKT for groups. – Harry Gindi Sep 26 '10 at 16:03
  • 6
    I would disagree that groupoids are 'extra baggage' as they are extremely useful in proving results in group theory without resort to topological concepts. For instance that a subgroup of a free group is free and related results can be done just with groupoids, without the very complicated machinery originally used and without explicit use of covering spaces etc. Perhaps you could formulate your wish as a question in some way and see what the replies throw up. I also suggest looking at Ronnie Brown's Topology and Groupoids book as he has some examples that might be of interest. – Tim Porter Sep 26 '10 at 18:54
  • Funny computation of the fundamental group of a circle (is it novel?): The circle is a topological group, so it has abelian fundamental group, which means its fundamental group and first homology group coincide. Now just do the one line of computation for the homology and you are done. – Steven Gubkin Sep 28 '10 at 18:45
  • 1
    That is a valid point, but you are bringing a whole lot of other results and machinery into the process to replace the machinery you exclude. The machinery that you introduce IS important of course, but so is the groupoid vKT, and so is covering space theory, etc. Each approach has a minimum of new stuff that is needed. In fact I believe that it is the relationships between the different approaches that are the most fascinating thing here, but also the right level of generality, which gets us back towards the original question. – Tim Porter Sep 29 '10 at 11:21
  • 3
    Ya, I totally agree. The groupoid vKT is actually the most intuitive out of all of the approaches for me personally: Pushouts of spaces give pushouts of groupoids. How could the statement get any sweeter? I really mentioned the above computation because I think it shows that there is another nice way to compute fundamental groups (other than the two Tom mentions): If you know for some reason that the fundamental group is abelian, then you can compute homology which is much easier. – Steven Gubkin Sep 29 '10 at 15:04
  • 1
    I am a bit late in this thread, but I want to know which "ideas from complex analysis" that are used in the usual calculation of $\pi_1 (S^1)$ are considered to be "hard"? As far as I see, there is nothing more than the fundamental theorem of calculus and the periodicity of exp in that. There is one more idea used, which is absolutely fundamental in algebraic topology: approximate arbitrary continuous maps by better maps (PL, smooth or like that). The groupoid version of course also employs that idea; in tom Dieck's new book it is on page 48, line 11. – Johannes Ebert Oct 24 '10 at 11:14
  • There is one thing that is substantially easier with the fundamental groupoid than with the fundamental group: the classification of coverings! The general version (cat. of coverings on X is equivalent to functor cat Fun(\Pi X; Set)) needs the same arguments, of course. Everything else follows formally from that statement. – Johannes Ebert Oct 24 '10 at 11:17
10

The free cocompletion. A lot of adjoint couples of functors are just particular cases of this construction, and often it is easier to use the general theorem than to work out a particular case by hand (for example for $i_{!}$ and $i^{!}$).

Ricky
  • 3,674
9

Here's an example in planar euclidean geometry. Consider an equilateral triangle of side $a$ and a general point in the plane distant $b$, $c$, and $d$ from the respective vertices. Then

$3(a^4 + b^4 + c^4 + d^4) = (a^2 + b^2 + c^2 + d^2)^2$.

This is an awful slog to get by planar trigonometry. Even harder to do by trig in three dimensions is the corresponding result for the regular tetrahedron. However, it's easy to get the $(n - 1)$-dimensional result for a regular $(n - 1)$-dimensional simplex of side $d_0$, with vertex distances $d_1$ ,..., $d_n$ :

$n(d_0^4 + ... + d_n^4) = (d_0^2 + ... + d_n^2)^2$.

You can do this by embedding the euclidean $(n - 1)$-dimensional space as the hyperplane of points $(x_1 ,..., x_n)$ in euclidean $n$-space such that $x_1 + ... +x_n = d_0/\sqrt2$. The vertices of the simplex can then be represented as the points $(d_0/\sqrt2)(1, 0 ,..., 0)$, ... , $(d_0/\sqrt2)(0 ,..., 0, 1)$ in the hyperplane, and the result drops out in a few lines.
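For the record, here are the promised few lines (my write-up of the computation sketched above). Let $p = (x_1 ,..., x_n)$ be the general point of the hyperplane, let $v_i = (d_0/\sqrt2)\,e_i$ be the vertices, and put $A = |p|^2 + d_0^2/2$. Then $d_i^2 = |p - v_i|^2 = A - \sqrt2\, d_0 x_i$ for $1 \leq i \leq n$, and using $\sum_i x_i = d_0/\sqrt2$ and $\sum_i x_i^2 = |p|^2 = A - d_0^2/2$,

$$ d_0^2 + \sum_{i=1}^n d_i^2 = d_0^2 + nA - d_0^2 = nA, \qquad d_0^4 + \sum_{i=1}^n d_i^4 = d_0^4 + nA^2 - 2d_0^2 A + 2d_0^2 |p|^2 = nA^2, $$

so $n(d_0^4 + \cdots + d_n^4) = n^2 A^2 = (d_0^2 + \cdots + d_n^2)^2$.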

John Bentin
  • 2,427
8

It seems people have kept on posting answers here rather than in the older thread, so I'll put this one here as well.

I remember thinking about the phenomenon you described when I first came across the tensor power trick. I can't summarise the idea any better than the quick description given in that link; the point relevant here is that, even though one might want to prove something for a single object or a single family of objects, if one can prove it for a family that includes the one of interest and that is also closed under taking 'tensor products', then that might be easier.

Here is a quick example from the book by Tao and Vu. If $A$ and $B$ are finite non-empty subsets of an abelian group $G$, then a natural argument gives the sumset inequality

$$ |2B-2B| \leq 16 \frac{|A+B|^4 |A-A|}{|A|^4}.$$

Now, it is possible to get rid of the factor of 16, but if we had only proved the inequality for $G = \mathbf{Z}/p\mathbf{Z}$, say, then we might very well have trouble doing so. Given the more general statement, one can get rid of the factor by applying the inequality to the group $G\oplus G \oplus \cdots \oplus G$ (say with $M$ copies of $G$) with sets $A \oplus \cdots \oplus A$ and $B \oplus \cdots \oplus B$ and taking $M$th roots.
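Explicitly, the step looks like this: all the relevant set sizes multiply under direct sums, so applying the inequality in $G^{\oplus M}$ gives

$$ |2B - 2B|^M \leq 16 \, \frac{|A+B|^{4M} \, |A-A|^M}{|A|^{4M}}, \qquad \text{hence} \qquad |2B - 2B| \leq 16^{1/M} \, \frac{|A+B|^4 \, |A-A|}{|A|^4}, $$

and letting $M \to \infty$ kills the constant.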

Another scenario that springs to mind is when one wants to prove a statement involving several instances of a single object $X$: it can sometimes be easier to prove the statement if one replaces some of the instances by possibly distinct objects $X_i$. For example, it seems to be easier to prove the Cauchy-Davenport inequality $|A+B| \geq \min(|A|+|B|-1,p)$ for sets $A,B \subset \mathbf{Z}/p\mathbf{Z}$ rather than its corollary $|A+A| \geq \min(2|A|-1,p)$, since one can induct on the size of $B$ (say) separately from the size of $A$. (For the particular induction proof I have in mind I guess this can be seen as an example of the 'strengthening the induction hypothesis' idea as well.)

gowers
  • 28,729
  • 1
    A similar example to Cauchy-Davenport: There's a well-known direct proof that the Ramsey number $R(k, k)$ is at most $4^k$. But I doubt there's an easy way to improve the upper bound to $\binom{2k - 2}{k-1}$ while only considering diagonal Ramsey numbers. But as soon as you introduce the off-diagonal ones, you find that the more general result that $R(m, n) \leq \binom{m+n-2}{m-1}$ has a straightforward half-page proof. Actually, something like "Introduce a new degree of freedom" may well deserve its own subpage. – Harrison Brown Sep 29 '10 at 04:33
7

The Amitsur-Levitski Theorem says that the standard non-commutative polynomial $S_{2n}$ vanishes identically over the matrix algebra ${\bf M}_n(k)$. Recall that $$S_r(X_1,\ldots,X_r)=\sum_{\sigma\in{\frak S}_r}\epsilon(\sigma)X_{\sigma(1)}\cdots X_{\sigma(r)}.$$ For instance, $S_2(X,Y)=XY-YX$ is the commutator.

The original proof was all but unreadable. When S. Rosset had the idea to embed the problem into ${\bf M}_n(k)\otimes\Lambda_2(k^{2n})$, the theorem became an easy consequence of the fact that if $N,N^2,\ldots,N^n$ are traceless ($N$ an $n\times n$ matrix), then $N^n=0$.

Denis Serre
  • 51,599
  • More details? A readable ref? I recall seeing a proof using the traceless nilpotency criterion once (one of the most outlandish tricks I've ever seen used in linear algebra), but I don't recall it proving a generalization! – darij grinberg Aug 11 '15 at 15:39
  • 1
    @darijgrinberg: The paper by C. Procesi, http://arxiv.org/pdf/1308.2421.pdf, contains proofs and references. See also Gil Kalai's blog, https://gilkalai.wordpress.com/2009/05/12/the-amitsur-levitski-theorem-for-a-non-mathematician/ – ACL Mar 04 '16 at 22:01
7

Thomas Royen has recently proven the Gaussian correlation inequality by generalizing from Gaussian distributions to gamma distributions.

Says Royen: "...it occurs frequently that a seemingly difficult special problem can be solved by answering a more general question."

Quanta Magazine has a profile of Royen, and a (very brief) discussion of the proof.

https://www.quantamagazine.org/20170328-statistician-proves-gaussian-correlation-inequality/

Royen sounds like another exception that proves some rules? (Not knowing LaTeX, almost self-publishing, etc.)

Mark S
  • 2,143
7

It may be a matter of taste or of being used to certain ways of thinking, but I find that this proof that the opposite of a category of finitary algebras isn't itself a category of finitary algebras is somewhat involved, compared to the proof of the statement that the opposite of a locally presentable category is not again locally presentable (if it's not a poset) - the latter has both a weaker hypothesis and a stronger conclusion.

(the original statement follows because a category of finitary algebras is locally finitely presentable, see here)

Peter Arndt
  • 12,033
7

The family of examples that leaps to my mind is the sub-trick of "strengthening the inductive hypothesis," which I thought I wrote a Tricki page on but now see I abandoned after doing epsilon of editing. (I may still have a rough draft, or at least notes, somewhere; I'll try to dig it up in the next week or so.) My all-time favorite example of this is Carsten Thomassen's proof that planar graphs are 5-choosable, which in fact proves the following:

Let $G$ be a planar graph whose interior is triangulated; let $v_1, v_2$ be adjacent vertices lying on the infinite face of $G$; and let $\{L_v\}$ be a family of lists associated to the vertices $v \in V(G)$, $v \neq v_1, v_2$ such that $|L_v| = 3$ if $v$ lies on the infinite face, and $|L_v| = 5$ otherwise. Furthermore, fix the colors of $v_1, v_2$ (ensuring that they are not colored the same). Then $G$ is $L$-choosable.

The above is proved by a fairly straightforward induction; the 5-choosability theorem follows immediately as a corollary.

  • The general idea, of course, is that oftentimes to inductively prove property P holds for a given object you'd really like your inductive hypothesis to be some stronger property P'; on occasion, though, it turns out that P' is a strong enough hypothesis to prove that P' holds for the original object as well. – Harrison Brown Sep 28 '10 at 22:47
7

Both methods are obviously useful, and I may have underappreciated one of them, but on the relative value of generalization versus specialization in problem solving, I offer the following opinion from Hilbert:

"If we do not succeed in solving a mathematical problem, the reason frequently consists in our failure to recognize the more general standpoint from which the problem before us appears only as a single link in a chain of related problems. After finding this standpoint, not only is this problem frequently more accessible to our investigation, but at the same time we come into possession of a method which is applicable also to related problems. The introduction of complex paths of integration by Cauchy and of the notion of the ideals in number theory by Kummer may serve as examples. This way for finding general methods is certainly the most practicable and the most certain; for he who seeks for methods without having a definite problem in mind seeks for the most part in vain.

In dealing with mathematical problems, specialization plays, as I believe, a still more important part than generalization. Perhaps in most cases where we seek in vain the answer to a question, the cause of the failure lies in the fact that problems simpler and easier than the one in hand have been either not at all or incompletely solved. All depends, then, on finding out these easier problems, and on solving them by means of devices as perfect as possible and of concepts capable of generalization. This rule is one of the most important levers for overcoming mathematical difficulties and it seems to me that it is used almost always, though perhaps unconsciously."

roy smith
  • 12,063
6

Here is an example relevant to this issue and my work with (the late) Jean-Louis Loday. I visited Strasbourg in November 1981 and gave a seminar on my work with Philip Higgins on a higher van Kampen Theorem. He told me of a conjecture of his on the cofibre of a "connected" square of maps. Looking at this I found that this conjecture could be described as a triadic Hurewicz Theorem. I also explained that in the work with Higgins, we showed that the classic relative Hurewicz Theorem could be deduced from our much more general higher van Kampen Theorem for relative homotopy groups. So it would be good to deduce a triadic Hurewicz Theorem from a van Kampen Theorem for triadic homotopy groups. Jean-Louis was then convinced that a van Kampen theorem for his cat-$n$-groups was true, and would be easier to prove than the more special result. This turned out to be the way the work went, and the theorem and this consequence, as well as others, were eventually proved, and they appeared as

R. Brown and J-L. Loday, ``Van Kampen theorems for diagrams of spaces'', Topology 26 (1987) 311-334.

R. Brown and J-L. Loday, ``Homotopical excision, and Hurewicz theorems, for $n$-cubes of spaces'', Proc. London Math. Soc. (3) 54 (1987) 176-192.

Ronnie Brown
  • 12,184
6

Here is a blog post I wrote about this. It contains three elementary examples of problems from the IMC that you can solve by generalizing.

6

Here is another of my favourite examples:

Prove that $({\mathbb R},+)$ and $({\mathbb R}[x],+)$ are isomorphic as abelian groups.

It is fairly easy to prove that they are actually isomorphic as ${\mathbb Q}$-vector spaces, which is a stronger result; other than that I don't know any way of proving this.

Nick S
  • 1,990
6

In the case of proofs by induction, the reason it may be easier to prove a stronger result can be simply that one can use a stronger induction hypothesis.

(I think of the example of proving Łoś's theorem in model theory. It says something about formulas that may have free variables. It's proved by induction on the formation of first-order formulas. Imagine trying to prove it only in the case of sentences without free variables. A weaker statement. The proof by induction doesn't work for that case.)

Michael Hardy
  • 11,922
  • 1
    Yes, this is the subject of its own MO question: http://mathoverflow.net/questions/31699/strengthening-the-induction-hypothesis – Qiaochu Yuan Jun 30 '11 at 04:37
6

I've already posted an answer on this thread, but I found another example I'd like to describe separately. Let $r > 0$ and consider the following problem, coming from compound interest or as one definition of $e^r$:

Show that $f(n) = (1 + \frac{r}{n})^n$ increases with $n$.

One "generalize the problem" strategy is to allow $n$ to be a continuous variable (probably this trick could have its own article). Now, see if you can prove that $f(n)$ still increases. If you take this mindset, it's natural to use the definition of the $n$th power for $n \in {\mathbb R}$ and write

$f(n) = e^{n \log(1 + \frac{r}{n})}$

And the problem has reduced to showing that $x \log(1 + \frac{r}{x}) = \int_0^1 \frac{r}{(1 + \frac{sr}{x})} ds$ increases with $x$, which it clearly does. (Here we've used the integral definition of the logarithm, but written in a way typically helpful for analyzing such products.)

Another problem that can be solved through allowing a discrete parameter to be continuous is to prove Stirling's approximation for $n!$ (although to make that proof very clean you can also use other labor saving tricks like Taylor expansion by integration by parts and the dominated convergence theorem).

If you ran into this problem from compound interest, or you were hoping for something more elementary which did not use such a heavy understanding of the exponential function, then you probably want to find a different proof. But finding a different proof still seems to require "generalizing the problem", but in a different way.

Another proof, goes as follows. Imagine that interest at a rate $r$ works as follows: once an amount of money is invested, the value of each unit after a time $t$ is given by $(1 + tr)$. That is, the value of the money grows linearly. Now imagine you had the opportunity to withdraw and immediately reinvest your money at a time of your choice. Having this ability would allow you to raise more money, because it would allow you to accrue interest on the interest you've already earned (hence the name "compound interest"). With this interpretation, the number $(1 + \frac{r}{n})^n$ is the value of each unit of money after time $1$ and $n$ regularly spaced compoundings.

The proof now goes as follows: if you had a choice of when these compoundings would occur, then the more compoundings the better, and the best way to allocate $n$ compoundings is to have them occur at $n$ regularly spaced time intervals. That is, we interpret $(1 + \frac{r}{n})^n = \max \prod_{i=1}^n (1 + a_i r)$ under the constraint that $0 \leq a_i \leq 1$ with $\sum a_i = 1$.

For example, it is better to have one compounding than to have none at all, because after withdrawing and reinvesting the money, now not only does the initial investment grow linearly, but also the interest you earned before the withdrawal grows linearly. For the same reason, given $a_1, \ldots, a_n$, the opportunity to compound once more during, say, $0 < t < a_1$, would allow you to increase the amount of money at all later times.

The fact that the best choice of $(a_1, \ldots, a_n)$ is to have $a_1 = a_2 = \ldots = a_n = \frac{1}{n}$ is the principle that the largest product you can obtain when the sum of positive numbers is fixed is to have all the terms equal. This is easy to check with two variables: you can either find the largest rectangle to fit inside an isosceles triangle, or otherwise just note that if $a_1 \neq a_2$, then changing to $a_1' = \frac{(a_1 + a_2)}{2} = a_2'$ gives an improvement for $(1 + a_1 r)(1+ a_2 r) < (1 + a_1' r) (1 + a_2' r)$. The case of $n$ variables actually follows from this observation.

So if you really wanted some elementary solution to the problem, this one would do. It's an interesting example because you can see that either solution involves some kind of generalization, but the two generalizations are unrelated to each other. The first one does not need to / is unable to consider these non-even partitions. The second does not need to / is unable to consider fractional $n$.

By the way, does anyone know how to prove in an elementary way (i.e. expanding) that $\prod_{i=1}^n (1 + a_ir)$ tends to $e^r = \sum \frac{r^k}{k!}$ as $\max |a_i| \to 0$ with $0 \leq a_i \leq 1$ and $\sum a_i = 1$? An easy solution goes by writing the product with the exponential function so that you get the exponential of $\sum \log(1 + a_i r) = \sum \int_0^1 \frac{a_i r}{(1 + s a_i r)} ds$.

You can then integrate by parts (i.e. Taylor expand) to obtain $\sum a_i r - \sum \int_0^1 (1-s) \frac{(a_i r)^2}{(1 + s a_i r)^2} ds$. Now, $\sum a_i r = r$ is the main term. After you take $\max |a_i|$ to be less than $.5 / |r|$, the error term is bounded in absolute value by $C \sum (a_i r)^2 \leq \max \{ |a_i| \} \cdot \sum a_i |r|^2$. I can, of course, move this question to a different thread.

EDIT: I realized later on that there is a completely elementary proof, and it is also completely obvious even though I didn't think of it. Namely, you expand $(1 + \frac{r}{n})^n$ into powers of $r$, and it is easy to see after a little algebra that each coefficient increases with $n$. I still find the other solutions interesting, but this turns out not to be a good demonstration of how generalizing can make a problem easier. By the way, the last question I had asked was answered in this thread:

A limiting product formula for the exponential function
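For the record, here is the expansion mentioned in the EDIT, spelled out (my addition): by the binomial theorem,

$$ \Bigl(1 + \frac{r}{n}\Bigr)^n = \sum_{k=0}^{n} \binom{n}{k} \frac{r^k}{n^k} = \sum_{k=0}^{n} \frac{r^k}{k!} \prod_{j=1}^{k-1} \Bigl(1 - \frac{j}{n}\Bigr), $$

and each coefficient $\frac{1}{k!} \prod_{j=1}^{k-1} (1 - \frac{j}{n})$ is nondecreasing in $n$, while increasing $n$ also contributes new nonnegative terms; so for $r > 0$ the whole sum increases with $n$.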

Phil Isett
  • 2,213
5

Evaluating the integral of a function by first determining its Fourier transform can simplify a problem in some cases. The integral of the sinc function provides a simple example:

$$\frac{\sin(\pi x)}{\pi x} = \int_{-1/2}^{1/2} \exp(2 \pi i x f) \; df = \int_{-\infty}^{\infty} Rect(f) \exp(2 \pi i x f) \; df \; ,$$

so the Fourier transform gives

$$\int_{-\infty}^{\infty} \frac{\sin(\pi x)}{\pi x} \exp(-2\pi i f x) \; dx = Rect(f)\: ,$$

and evaluating at $f=0$ gives

$$\int_{-\infty}^{\infty} \frac{\sin(\pi x)}{\pi x} \; dx = 1.$$

Similarly, you can evaluate the integral of the square of the sinc function by using the convolution theorem, i.e., by convolving the rectangle function with itself.

Tom Copeland
  • 9,937
5

In teaching calculus, I found a great example of this phenomenon: differentiation. For instance, consider the function $f(x) = x^x$ (for $x \in (0,\infty)$). I have no idea how to evaluate the limit

$$ f'(2) := \underset{x \to 2}{\lim} \frac{ x^x - 4}{x-2} $$

directly. But you can use logarithmic differentiation to find that $f'(x) = x^x (1+\ln x)$, which implies that

$$f'(2) = 4(1 + \ln 2)~. $$
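The logarithmic differentiation, for completeness:

$$ \ln f(x) = x \ln x \quad \Longrightarrow \quad \frac{f'(x)}{f(x)} = \ln x + 1 \quad \Longrightarrow \quad f'(x) = x^x (1 + \ln x). $$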

So it's easier to compute $f'(x)$ and then specialize to $x=2$ than it is to compute $f'(2)$ directly.

I think this is essentially true for all functions, that it's easier to find the function $f'(x)$ than it is to directly compute a given value $f'(a)$.

5

I see several answers to this question that rely on the fact that it is often easier to prove a stronger result in the case of proofs by induction. This phenomenon is explained in this answer: https://mathoverflow.net/a/69157/120363

A generalisation of this remark is the following: when you want to prove that an object has a property $(A)$, it is sometimes easier to prove that it has a stronger property $(B)$, because $(B)$ has more nice preservation properties than $(A)$. This is exactly what happens in the case of inductions, since sometimes a stronger induction hypothesis passes more easily from the case $n$ to the case $n+1$.

Another wonderful (in my opinion!) example of this phenomenon is the following. Say that a quasi-ordering (i.e. a reflexive, transitive binary relation) is a well quasi-ordering (wqo) if it has no infinite strictly decreasing sequences and no infinite antichains. This is an important notion in combinatorics, complexity, etc., but it has few preservation properties. In the middle of the 20th century, there were several open and seemingly very difficult problems about this notion. For instance:

  1. Is the class of countable linear orderings well-quasi-ordered by the embeddability relation? (Known as Fraïssé's conjecture.)
  2. Is the class of countable trees well-quasi-ordered by the minor relation?

In the late 60's - 70's, a stronger notion than wqo, the notion of better quasi-ordering (bqo), was defined and studied by Nash-Williams. The definition of a bqo is quite scary, to say the least. And, for applications of the theory (to combinatorics, complexity, etc.), knowing that a particular quasi-ordering is bqo is not more useful than knowing that it is wqo. What's the point of this notion, then? The point is that the bqo property is closed under much more interesting operations than the wqo property. So in general, it is much easier to prove that a particular ordering is bqo rather than wqo. This notion was successfully used by Nash-Williams to prove conjecture 2., and a few years later by Laver to prove conjecture 1.

To give you an idea, this is the definition of bqo: a quasi-ordering $(Q, \leqslant)$ is a bqo if for every Borel mapping $f\colon [\mathbb{N}]^\infty \to Q$ with countable image (where $[\mathbb{N}]^\infty$ is the set of infinite subsets of $\mathbb{N}$ with its usual Polish topology), there exists $M \in [\mathbb{N}]^\infty$ such that $f(M) \leqslant f(M \setminus \{\min(M)\})$. It is easy to prove that every bqo is a wqo, but the definition is quite awful, isn't it?

5

Bruce Schneier has an online paper called "A Self-Study Course in Block-Cipher Cryptanalysis": http://www.schneier.com/paper-self-study.pdf containing an extensive list of algorithms to cryptanalyze as exercises. By far the easiest exercise is this one:

[Cryptanalyze] a generic cipher that is “closed” (i.e., encrypting with key A and then key B is the same as encrypting with key C, for all keys).

The solution to this exercise would be a lot less obvious had Schneier instead pointed to some particular block cipher that has this property. But because the reader is told nothing about the cipher except that it is closed, he immediately knows exactly what to attack.
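To make the point concrete, here is a toy sketch (my own illustration, not from Schneier's paper) using the simplest closed cipher there is, XOR with the key: for any closed cipher the keys behave like a group under composition, and here a single known plaintext/ciphertext pair already surrenders an equivalent key.

```python
# Toy "closed" cipher: XOR with the key. E_A(E_B(m)) == E_{A xor B}(m),
# so composing encryptions never leaves the key space.

def encrypt(key: int, msg: int) -> int:
    return key ^ msg

A, B, msg = 0x5A, 0xC3, 0x7E

# Closure: two encryptions collapse to one under C = A ^ B.
assert encrypt(A, encrypt(B, msg)) == encrypt(A ^ B, msg)

# Known-plaintext attack: one (msg, ct) pair recovers the key outright.
ct = encrypt(A, msg)
assert ct ^ msg == A
```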

dfranke
  • 223
5

A Real Algebraic Geometry Example

Semialgebraic sets are very nice: they are closed under Boolean operations (obvious) and under projections (not so obvious; this is the classical Tarski–Seidenberg theorem). Semianalytic sets are not so nice, because they are not closed under projections.

What is one to do if one wants to study them nonetheless? Shift the focus to projections of semianalytics instead, a.k.a. subanalytic sets. Those sets are closed under projection by construction, but all of a sudden, the Boolean algebra property is not so clear. But that's where Gabrielov's theorem of the complement comes in: the complement of a subanalytic set is again subanalytic. We now have a nice structure in which reside all the natural geometric operations we may want to do.

Thierry Zell
  • 4,536
4

An av-subgraph of a graph $G$ is a subgraph that includes all of the vertices of $G$, so each av-subgraph corresponds to a subset of the edges. Proving directly that the number of av-subgraphs of the complete graph on $N$ vertices is $2^{N(N-1)/2}$ is harder than proving that the number of av-subgraphs of a graph with $E$ edges is $2^E$.
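A brute-force sanity check of the count for $K_4$, added here as an illustration and assuming nothing beyond the Python standard library:

```python
# Every av-subgraph is determined by a subset of the edge set,
# so a graph with E edges has 2**E av-subgraphs; check this for K_4.
from itertools import combinations

N = 4
edges = list(combinations(range(N), 2))              # K_4: N(N-1)/2 = 6 edges
subgraphs = sum(1 for r in range(len(edges) + 1)
                for _ in combinations(edges, r))     # enumerate edge subsets
assert subgraphs == 2**len(edges) == 2**(N*(N - 1)//2) == 64
```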

LSpice
  • 11,423
4

Quantum topology is much easier for knotted tori in $\mathbb{R}^4$ than for knots in $\mathbb{R}^3$. The former is a generalization of the latter, because for any knot you can take the boundary of a tubular neighbourhood of its inclusion into $\mathbb{R}^4$, which is a knotted torus.
The reason the more general problem is easier is that the projectivization (homomorphic expansion) of the space of knotted tori in $\mathbb{R}^4$ gives rise to a space of diagrams $\mathcal{A}$ containing oriented chords. The homomorphic expansion of the space of knots, on the other hand, gives rise to a space of diagrams in which the chords are not oriented. Oriented, based trees are much simpler combinatorial objects than unoriented, unbased trees. In particular, the Drinfeld associator, which is the most painful aspect of the quantum topology of knots, vanishes in $\mathcal{A}$.
The upshot of the generalization is that the universal finite-type invariant for knotted tori in $\mathbb{R}^4$ is the Alexander polynomial, which is a homological invariant, and which is immeasurably simpler than the universal finite-type invariant for knots, the Kontsevich invariant.
In fact, a further generalization, allowing "trivalent vertices" in the knotted tori (where two tubes fuse into one), simplifies the algebra yet further and allows the proof of theorems relating the value of the Alexander polynomial of such an object to its cablings. Again, this is motivated by the algebra: we expand the class of topological objects under consideration in order to create an associated graded space which looks as much as possible like a quantized Lie (bi)algebra, from which our invariants are going to come, and which we are supposed to know how to handle.
Dror Bar-Natan discussed this, and related ideas, in a series of talks in Montpellier.

Added: This doesn't obviously solve a problem. It's a non-obvious example of a problem being easier for a more general class of objects; and you can hope to use insights gained from the easier problem to attack the harder, less general problem.

4

The LLL algorithm for factoring polynomials with integer coefficients. Previously people had been fussing with Hensel lifting and tons of other methods that (imo) were far too complicated. (For a good reference on LLL and factoring polynomials, see also Yap's excellent book and his chapter on lattice reduction.)

LLL solved the more general problem of finding short (or 'short enough') vectors in integer lattices in higher-dimensional spaces, and the problem of factoring polynomials with integer coefficients was then encoded in it. As an added bonus, the lattice reduction techniques also solved the simultaneous Diophantine approximation problem, but that somehow doesn't seem as striking as integer polynomial factorization.
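As an illustration of the encoding (a sketch, not the LLL algorithm itself): mpmath's `pslq`, an integer-relation algorithm closely related to lattice reduction, recognizes the minimal polynomial of $2^{1/3}$ from a numerical value alone.

```python
# Encode "recognize a polynomial" as "find a short integer relation":
# look for integers c_0..c_3 with c_0 + c_1*a + c_2*a^2 + c_3*a^3 = 0.
from mpmath import mp, mpf, pslq

mp.dps = 50                              # 50 digits of working precision
a = mpf(2) ** (mpf(1) / 3)               # alpha = 2^(1/3)
print(pslq([a**k for k in range(4)]))    # e.g. [-2, 0, 0, 1], i.e. a^3 - 2 = 0
```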

4

In the computation of spectral measures for self-similar groups, one has a sequence of matrices and one wants a recursion for their characteristic polynomials. Usually one obtains a recursion for some multivariable polynomial, formed by adding new parameters, and then specializes the parameters to recover the characteristic polynomial. This happens in Grigorchuk and Zuk's computation of the spectral measure of the lamplighter group.

4

Using the residue theorem to compute integrals over intervals of the real line is an example of solving a problem by considering it in a more general setting: usually, the integrand is complexified and a closed contour is built by attaching a semicircle to the interval $[-a,a]$. Then the integral over the contour is computed using the residue theorem, and the original integral is obtained as the limit of the contour integrals. This works, e.g., for $$\int_{-\infty}^{+\infty}\frac{e^{itx}}{x^2+1}dx.$$ In some cases the contour gets more complicated, to avoid branch points, as when computing $$\int_0^{\infty}\frac{dx}{x^a+1}, \quad a>1.$$ Sometimes the integral over an interval is replaced by an integral over the unit circle, e.g., for $$\int_0^{\pi}\frac{d\theta}{a+\cos \theta}, \quad a>1$$ (here one also uses the identity $\cos \theta = \frac12(z+z^{-1})$ for $z = e^{i\theta}$). Ahlfors's text on complex analysis explains this method in more detail. (Some other texts seem to have just a haphazard collection of examples following the statement and proof of the residue theorem.)

This is not so much proving a stronger result first as making the problem tractable at all by using a more general approach (replacing real functions with complex ones and computing residues instead of actually integrating over the contours).
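For the first integral above, the residue computation can even be checked symbolically (a sketch assuming SymPy): the only pole in the upper half-plane is $x=i$, and $2\pi i$ times the residue there gives $\pi e^{-t}$ for $t>0$.

```python
# Residue-theorem evaluation of the integral of exp(i*t*x)/(x^2 + 1) over R.
import sympy as sp

x = sp.symbols('x')
t = sp.symbols('t', positive=True)
integrand = sp.exp(sp.I * t * x) / (x**2 + 1)

res = sp.residue(integrand, x, sp.I)           # residue at the pole x = i
print(sp.simplify(2 * sp.pi * sp.I * res))     # pi*exp(-t)
```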

  • 1
    Your latest edit sums up my view of your post: it's a nice example, but does not quite fit with the original question. – Thierry Zell Jul 01 '11 at 15:41
3

I have some examples, but I might have more words of warning than examples.

Example 1. Let's say you have just the axioms for the real numbers, and you want to establish that $2$ is not zero, but all you know is that $1$ is not zero (perhaps because you want to divide by $2$ for whatever reason). Knowing that there is a field with two elements where $2 = 0$, you discover you have no choice but to prove that $2$ is positive, even though that is not what you were going for. (This is an example where taking a more general point of view does not give you something different to prove; instead, it limits the approaches to what you are trying to prove.)

So in order to prove 2 is positive, it suffices to prove 1 is positive. And here it somehow just makes sense to prove that the square of any nonzero number is positive -- from this point of view 1 is positive "because it's a square".

Here's another example, from real analysis. You want to prove things about the cube root function from first principles: that it is well-defined, continuous, differentiable, concave, etc. Typically one proceeds by proving general theorems about smooth, increasing functions, and then applies these general theorems to the function $x^3$ to learn about its inverse.

So while this seems like the kind of example you are after, I personally dislike it, only because it suggests this is the "right" way to go. Often, taking a more general approach does make solving the problem easier, especially if you already know the general facts which end up doing the job. But for this example, defining the cube root directly is just a bit easier than using the general intermediate value theorem, since you can define it as $f(x) = \sup \{ t : t^3 \leq x \}$ and prove directly that $f(x)^3 = x$.

Moreover, it's also a good example in which to use the implicit definition to prove things directly, just because it makes the manipulations concrete. For example, since $f(x+h)^3 = x + h$, we have

$$ f(x+h)^3 - f(x)^3 = h $$

is also equal to

$$ (f(x+h) - f(x)) \cdot \int_0^1 3 (s f(x+h) + (1-s) f(x))^2 \, ds. $$

If you look at this expression for a little while, you will be able to deduce things like monotonicity, uniform continuity on any interval $[\epsilon, \infty)$, differentiability on such intervals, concavity, and so on. The point is that even when general technology works, it can be quite instructive to use the "general methods" but execute them on explicit examples.
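The algebraic identity being used above is easy to confirm symbolically; a small check, assuming SymPy:

```python
# Check: b**3 - a**3 == (b - a) * Integral_0^1 3*(s*b + (1-s)*a)**2 ds.
import sympy as sp

a, b, s = sp.symbols('a b s')
kernel = sp.integrate(3 * (s*b + (1 - s)*a)**2, (s, 0, 1))
print(sp.expand(kernel))                                  # a**2 + a*b + b**2
assert sp.expand((b - a) * kernel - (b**3 - a**3)) == 0   # identity holds
```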

You might argue that these direct proofs are the most insightful, but one is more likely to first find a general proof (especially if the appropriate general apparatus already exists); the direct proofs tend to be found only later, precisely because they require a more direct understanding of the specific problem. I personally use the cube root of $7$ (which nobody can name) in calculus lectures to motivate the concept of continuity, so I agree that defining the cube root is a very natural use of the concept.

Differential equations provide, I think, a lot of examples. If you want to learn something about the exponential function and your point of view is that it is the unique solution to $\frac{df}{dx} = f$ (or $\frac{df}{dx} = i f$) with data $f(0) = 1$, then in order to give a differential-equations-style analysis of the basic properties (which is fun) you will often have to consider the same differential equation with more general data (and in particular prove uniqueness of the $0$ solution). In fact, you may be tempted to consider more general ODEs like $\frac{df}{dx} = F(x, f)$, because this point of view can inspire some of the techniques. It also helps greatly (say, if you want to prove that $e^{it}$ is periodic) to know what a vector field is and to have that point of view.

I think the problem is that general theorems can easily end up being forced to consider cases more complicated than the problem at hand. Even if you find a proof this way, that does not necessarily mean you should settle for it. Consider the function $f(x) = 1/k$, where $k$ is the smallest positive integer such that $kx \in {\mathbb Z}$ (and $f(x) = 0$ when no such $k$ exists, i.e., for irrational $x$). If you want to show $f$ is Riemann integrable, you can of course prove a theorem that characterizes Riemann integrable functions in terms of the measure of their sets of discontinuity. If you know this theorem off the top of your head, it is "easy" to prove $f$ is Riemann integrable. But this proof is inefficient, because it invokes a theorem that takes care of the nastiest examples of Riemann integrable functions (which this $f$ is not).

One more example: it's very easy to find a faulty proof of the chain rule if your point of view is not general enough. The first thing many people try, to analyze the difference quotients for $f(g(x))$, is to multiply and divide by $g(x+h) - g(x)$. But then you run into hairy problems where the number by which you divide may be zero. If you want to find the correct proof, you should take the point of view that the chain rule is a statement about maps between Euclidean spaces of any dimension: it just says the linearization of the composition is the composition of the linearizations. With this point of view in mind, you would not dare to divide, because dividing by $g(x+h) - g(x)$ does not even make sense in this generality.
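In coordinates, "the linearization of the composition is the composition of the linearizations" is just a statement about Jacobian matrices, which one can check symbolically; a sketch for one arbitrarily chosen pair of maps, assuming SymPy:

```python
# Chain rule for maps R^2 -> R^2: J_{f o g}(x) == J_f(g(x)) * J_g(x).
import sympy as sp

x, y, u, v = sp.symbols('x y u v')
g = sp.Matrix([x*y, x + y])                    # g : R^2 -> R^2
f = sp.Matrix([sp.sin(u), u*v])                # f : R^2 -> R^2

comp = f.subs({u: g[0], v: g[1]})              # f o g
lhs = comp.jacobian([x, y])                    # linearization of composition
rhs = f.jacobian([u, v]).subs({u: g[0], v: g[1]}) * g.jacobian([x, y])
assert sp.simplify(lhs - rhs) == sp.zeros(2, 2)
```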

Here's another: prove that $\int_{\mathbb R} e^{i \xi x} e^{- x^4 } dx$ is bounded by $\frac{C}{(1 + |\xi|)^2}$. Somehow you have to recognize that the key feature of $e^{-x^4}$ is that it is smooth enough to cancel against something very oscillatory. A similar example problem: prove the decay for large $\xi$ of $\int_{\mathbb R} \log(1 + .5 \sin(\xi x) ) e^{- x^4 } dx$. It seems this "look for a more general setting" trick really needs to be drilled into you for it to work, because it may require some imagination or experience to know what the right general setting is.

What some examples show is that finding the proof may be much more difficult, maybe impossible, without a willingness to work in this direction.

Phil Isett
  • 2,213
3

Here is a riddle which proves to be extremely hard. Imagine a finite assembly in which some people happen to be friends (friendship is a symmetric relation but not transitive, and you are not your own friend), and in which at least one pair of friends exists. Now it happens that any time two people have the same number of friends, they have no common friend. The conclusion to be proved is that at least one person has one and only one friend.

A proper generalization of the conclusion makes the riddle almost trivial.

guy
  • 21
2

When I saw the recent question Kasteleyn's formula for domino tilings generalized?, I had the idea of generalizing it by introducing an additional variable. I haven't managed to solve the generalized question yet, but while working on it, it suddenly occurred to me a couple of days later that the additional variable in fact provides an easy answer to the initial question.

Wolfgang
  • 13,193
2

The Heine–Cantor theorem: proving it for real functions on a bounded interval is messy, but proving it for continuous functions on a compact metric space is short, neat and easy.

Todd Trimble
  • 52,336
Omer
  • 1
  • 4
    mathwonk had edited this question into the text, now removed: doesn't that leave the task of proving a finite interval of the real line is compact? – Todd Trimble May 08 '19 at 10:46
  • @ToddTrimble: for a totally ordered field, the Heine-Borel property is equivalent to Dedekind completeness... so you can just choose your definition so that this task is trivial. :-) – Willie Wong Jun 01 '23 at 16:09
  • @WillieWong In case it wasn't clear: that wasn't my question, it was mathwonk's. If you look at the edit history, it should be clear what's going on. :-) – Todd Trimble Jun 01 '23 at 16:41
  • @ToddTrimble: ah, thanks for the clarification. I actually had no idea what the first clause in your comment meant until your response just now. – Willie Wong Jun 01 '23 at 17:30
1

A result from the present site illustrates this nicely. The task is to prove the very challenging inequality $$\left(\frac{x^n+1}{x^{n-1}+1}\right)^n+\left(\frac{x+1}{2}\right)^n\geqslant x^n+1$$for $x>0$, where $n$ is any natural number. The first step is to generalize it to$$\begin{equation} \left(\frac{x^a+1}{x^{a-1}+1}\right)^{a+b-1}+\left(\frac{x^b+1}{x^{b-1}+1}\right)^{a+b-1}\geqslant x^{a+b-1}+1, \end{equation}$$where $a$ and $b$ are arbitrary real numbers $\geqslant1$. Thereafter, a series of ingenious substitutions, along with yet more generalizations, enable a proof by relatively elementary mathematics.
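Not a proof, of course, but the claimed inequality is cheap to spot-check numerically before investing in the substitutions; a small sketch in Python:

```python
# Spot check: ((x^n+1)/(x^(n-1)+1))^n + ((x+1)/2)^n >= x^n + 1 for x > 0.
for n in range(1, 9):
    for x in [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]:
        lhs = ((x**n + 1) / (x**(n - 1) + 1))**n + ((x + 1) / 2)**n
        assert lhs >= x**n + 1 - 1e-9, (n, x)   # small tolerance for rounding
```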

John Bentin
  • 2,427
0

I am amazed by this idea, since to me the fundamental principle of problem solving is to make the problem easier, and I always assumed this meant making it more special. It is true that proving a theorem is easier when one ignores irrelevant facets, but these are only known after solving the problem. I find it more productive, when discovering which facets are relevant, to do various examples, gradually trying to generalize the argument. Even Deligne proved the Weil conjectures first for K3 surfaces.

roy smith
  • 12,063
  • 4
    True, but what if, when one makes the problem more special, the extra information is completely irrelevant to the problem, and moreover misleading?

    Very simple example (probably not the best): let $a,b,c >0$. Prove that

    $\sin(a) + \sin(b)+ \sin(c) \leq \frac{a^3+b^3+c^3}{abc}\,.$

    This problem has two obvious trivial generalisations, and in both of them it becomes pretty clear that it is irrelevant that the $a,b,c$ on the left and right sides of the inequality are the same; but many students could be misled by this fact onto the wrong path.

    – Nick S Sep 28 '10 at 18:59
  • I agree it is possible to find an overly special solution to a special case, but in my experience it does not happen that often and the opposite happens more. A little example is the Torelli theorem for curves of genus 4 by intersecting the tangent quadric and the osculating cubic at the unique pair of conjugate double points of the theta divisor. Then the idea of intersecting all the tangent quadrics at all double points in higher genera is not too great a stretch. I agree that there are people who more easily grapple with more general versions of a problem but I am not one of them. – roy smith Oct 31 '10 at 21:14
  • 1
    I suspect there is some confusion here between the relative ease of proving a theorem in a more general setting and actually thinking up the idea. I have always found it easier to think of a solution in a more special case, but then once the problem is understood, it is easier to separate out the crucial parts, and give them in a general setting. Mumford told me even Grothendieck worked this way. He would begin from a simple idea, and reflect on it until he had placed it in its most general possible setting. There is also the dichotomy between conjecturing a solution and proving it. – roy smith Nov 18 '10 at 16:33
0

I have had to use something similar in order to complete a proof by induction.

I had a sequence $\{a_n\}$ defined recursively by $a_{k+1}=f(a_k)$, and I needed to show a property linking $a_n$ and $a_{n+1}$, say $p(a_n,a_{n+1})$, for every $n$.

Now, say I assume the property for the pair $(a_n,a_{n+1})$ and try to show it for $(a_{n+1},a_{n+2})$: I am trying to prove $p(a_{n+1},a_{n+2}) = p(f(a_{n}),a_{n+2})$, but I can never hope to use the induction hypothesis there, since nothing links $a_n$ and $a_{n+2}$. The way I got around this was to show instead the stronger property $p(a_n,a_{n+k})$ for every $k$.

-1

THE FARMER AND THE HORSE DRINKING AT THE RIVER: a folklore problem.

A farmer together with his horse (at point A) must go home (to point B). Points A and B are on the same side of a river (a line D). The horse must be led to the river to drink (anywhere on the river is possible). PROBLEM: find X on D that minimizes their walk back home, $d(A,X)+d(X,B)$.

SOLUTION: Ask the same problem in 3D space: D is still a line, and A, B are still points.
    a) Special cases: if A (resp. B) is on the river, the solution is trivially X = A (resp. X = B).
    b) Fact: (A,D) (resp. (B,D)) defines a half-plane Pa (resp. Pb).
    Now think of Pa, Pb as the two sides of a book partly open (spine touching the table), where D is the spine of the book, and A, B lie on its left and right pages.
    Open the book flat on the table: clearly the straight line between A and B is minimal, and its intersection with D (the spine) is the desired point X.
    There is a special case where the book is closed at the start (Pa = Pb, i.e. open by an angle of zero), yet you may still open it flat; this is the original problem.
REMARK: If not careful, you may produce a wrong demonstration by missing that Pa = Pb is possible, in which case one needs to view the half-plane as a double one folded on itself. (A computational sketch of the resulting construction is given just below.)
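Unfolding the book flat is exactly the classical reflection argument in the plane; here is a minimal computational sketch (my own addition, with coordinates chosen so that D is the x-axis and A, B lie strictly above it):

```python
# Reflect B across the river D (the x-axis) to B' = (bx, -by); the optimal
# drinking point X is where the straight segment A--B' crosses the axis.

def drinking_point(ax, ay, bx, by):
    """A = (ax, ay), B = (bx, by) with ay, by > 0; returns X on the x-axis."""
    t = ay / (ay + by)                 # parameter where A--B' meets y = 0
    return (ax + t * (bx - ax), 0.0)

print(drinking_point(0.0, 3.0, 4.0, 1.0))   # (3.0, 0.0)
```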

QUESTION: Find a 4-dimensional version of this?

A small remark on the general question: the more specific the problem is, the more data you have to deal with; combinatorially this means more options for a proof, hence a larger search space. Yet a specific problem might not be 'representative' enough of the general one, which might well be harder to solve.

  • This is not an example of a generalization. Also, the argument doesn't need 3D. – Michael Jun 01 '23 at 18:03
  • @Michael: Sorry, there was an implicit part. Of course the starting problem can be solved (not so easily if you are 12 years old), but the solution looks like a trick, whereas the 3D version, with its simple deformation, flows much better. – Jérôme JEAN-CHARLES Jun 02 '23 at 19:56