90

No doubt everyone here has heard of deep learning, even if they don't know what it is or what it is good for. I myself am a former mathematician turned data scientist who is quite interested in deep learning and its applications to mathematics and symbolic reasoning. There has been a lot of progress recently, and while it is exciting to machine learning experts, the results so far are probably not useful for research mathematicians. My question is simple:

Are there areas of research mathematics where, if one had access to a fully trained state-of-the-art machine learning model (like the ones I will describe below), it would make a positive impact on that field?

While math is mostly about proofs, there is also sometimes a need for computation, intuition, and exploration. These are things that deep learning is particularly good at. Let me provide some examples:

Good intuition or guessing

Lample and Charton showed that Transformers, a now very standard type of neural network, are good at solving symbolic problems of the form $$ \mathit{expr}_1 \mapsto \mathit{expr}_2 $$ where $\mathit{expr}_1$ and $\mathit{expr}_2$ are both symbolic expressions; for example, in their paper $\mathit{expr}_1$ was an expression to integrate and $\mathit{expr}_2$ was its integral. The model wasn't perfect: it got the right answer about 93–98% of the time and did best on the types of problems it was trained on. Also, integration is a 200-year-old problem, so it is hard to outcompete a state-of-the-art CAS.

However, several things make this interesting. Symbolic integration is important, difficult, and (somewhat) easy to check: one can verify a solution by calculating its derivative and checking symbolically that the derivative is equivalent to the starting integrand. It is also an area where “intuition” and “experience” can definitely help, since a trained human integral solver can quickly guess at the right solution. Lastly, it is (relatively) easy to generate an unlimited supply of training examples through differentiation. (The paper also uses other tricks to diversify the training set.)
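
To make the "generate training data by running the easy direction" trick concrete, here is a minimal sketch with sympy. This is not the authors' actual pipeline (they sample expression trees much more carefully); the expression sampler here is a toy stand-in of my own.

```python
import random
import sympy as sp

x = sp.symbols('x')

def random_expression(depth=2):
    """Sample a small random expression in x (a toy stand-in for
    the paper's more careful expression-tree sampling)."""
    atoms = [x, x**2, sp.sin(x), sp.exp(x), sp.log(x + 1)]
    expr = random.choice(atoms)
    for _ in range(depth):
        expr = random.choice([expr + random.choice(atoms),
                              expr * random.choice(atoms)])
    return expr

def training_pair():
    """Differentiate a random expression: the derivative becomes the
    model's input, and the original expression is the target integral."""
    f = random_expression()
    return sp.diff(f, x), f

integrand, integral = training_pair()
# The pair is correct by construction, and checkably so:
assert sp.simplify(sp.diff(integral, x) - integrand) == 0
```

The final assertion is exactly the "easy to check" property from the paragraph above: differentiation certifies each (integrand, integral) pair.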

Are there similar problems in cutting edge mathematics, possibly in algebra or combinatorics or logic where one would like to reverse a symbolic operation that is easy to compute in one direction but not the other?

Neural guided search

Some problems amount to a tree or graph search, such as solving a Rubik's cube. A neural network can, given a scrambled cube, suggest the next action toward solving it. A good neural network provides a heuristic for the search and prevents the exponential blow-up of a naive brute-force approach. Indeed, a paper in Nature Machine Intelligence demonstrated training a neural network this way from scratch, with no built-in mathematical knowledge; once trained, their neural-network-guided tree search can reliably solve a scrambled cube. This is also the idea behind AlphaGo and its variants, as well as behind neural formal theorem proving, which is really exciting but not yet up to proving anything useful for research math.

Puzzle cubes and board games are not cutting-edge math, but one could imagine more interesting domains where, just as with a Rubik's cube, one has to manipulate one expression into another form through a series of simple actions, and the ability to do that reliably would be of great interest. (Note: the neural-guided tree search I've described is still a search algorithm, and much slower than a typical cube-solving algorithm, but the emerging field of program synthesis could possibly one day soon learn from scratch a computer program which solves the Rubik's cube, as well as learn programs to solve more interesting math problems.)
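
The heuristic-guided search itself is simple to sketch. Below, a hand-written `score` function stands in for the learned value network, and the domain is a toy (reach 1 by halving or decrementing) rather than a cube; none of this is from the cited papers.

```python
import heapq

def guided_search(start, is_goal, moves, score, max_nodes=10_000):
    """Best-first search where score(state) plays the role of a learned
    value network: states rated closer to solved are expanded first,
    avoiding the blow-up of blind brute force."""
    frontier = [(score(start), 0, start, [])]
    seen = {start}
    tie = 1  # unique tiebreaker so the heap never compares states
    while frontier and tie < max_nodes:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        for action, nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), tie, nxt, path + [action]))
                tie += 1
    return None

# Toy stand-in domain: reach 1 by halving (if even) or subtracting 1;
# the identity heuristic stands in for a trained network.
def toy_moves(n):
    out = [('dec', n - 1)]
    if n % 2 == 0:
        out.append(('half', n // 2))
    return out

path = guided_search(27, lambda n: n == 1, toy_moves, score=lambda n: n)
```

Swapping in cube states for integers, legal twists for `toy_moves`, and a trained network for `score` gives the shape of the approach in the paper.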

Neural information retrieval

Suppose you have a database of mathematical objects (say, integer sequences, finite groups, elliptic curves, or homotopy groups) and you want a user to be able to look up an object in this database. If that object is in the database, you would like the user to find it. If it is not, you would like the user to find similar objects. The catch is that "similar" is hard to define. Deep learning provides a good solution. Each object can be associated with an embedding, which is just a vector in, say, $\mathbb{R}^{128}$. The measure of similarity of two objects is just the inner product of their embeddings. Moreover, there are a number of self-supervised machine learning techniques for constructing these embeddings so that semantically similar objects have similar embeddings. This has already shown a lot of promise in formal theorem proving for premise selection, where one wants to pick the most relevant theorems from a library of theorems to use in the proof of another theorem.
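
The retrieval step is trivial once the embeddings exist; here is a minimal sketch with random unit vectors standing in for trained embeddings (the function names are my own).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings: in practice these come out of a trained network;
# here 5 database objects get random unit vectors in R^128.
db = rng.normal(size=(5, 128))
db /= np.linalg.norm(db, axis=1, keepdims=True)

def nearest(query, db, k=3):
    """Rank database objects by inner product with the query embedding."""
    scores = db @ query
    return np.argsort(-scores)[:k]

# A query embedding near object 2 should retrieve object 2 first.
query = db[2] + 0.05 * rng.normal(size=128)
top = nearest(query / np.linalg.norm(query), db)
```

All of the domain-specific work hides in producing `db`: the self-supervised training that puts semantically similar objects near each other.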

For example, I think such a neural database search could reasonably work for the OEIS: one could use a neural network to perform various prediction tasks on an integer sequence, and the inner layers of the trained network would then compute a vector embedding for each sequence, which can be used to search the database for related sequences.

Geometric intuition

Neural networks are pretty good at image recognition and other image tasks (like segmenting an image into parts). Are there geometry tasks that it would be useful for a deep learning agent to perform, possibly in dimensions 4 or 5, where human geometric intuition starts to fail us since we can't see in those dimensions? (It would be hard to make, say, a convolutional neural network work directly on a 4-dimensional image, but I could imagine representing a 3D surface embedded in 4D as, say, a point cloud of coordinates. This could work well with Transformer neural networks.)
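
As a sketch of the input representation I have in mind (my own toy construction, not from any paper): sample a hypersurface as an unordered array of coordinates, which is exactly the permutation-invariant format point-cloud and Transformer models consume.

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere_point_cloud(n, dim=4):
    """Sample n points uniformly on the unit (dim-1)-sphere in R^dim
    by normalizing Gaussian samples. The resulting (n, dim) array is
    the kind of set-valued input a point-cloud or Transformer model
    could consume directly."""
    pts = rng.normal(size=(n, dim))
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)

cloud = sphere_point_cloud(1024)  # 1024 points on S^3 embedded in R^4
```

Any 3D surface in 4D could be encoded the same way, given a sampler for it.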

Build your own task

Neural networks are very flexible when it comes to the choice of input and output, as well as the architecture, so don’t let the specific examples above constrain your thinking too much. All you need are the following things:

  1. A type of mathematical object. One that you care about. It should be representable in some finite way (as a formula, an initial segment of a sequence, an image, a graph, a computer program, a movie, a list of properties).
  2. A task to perform on your object. It can be well specified or fuzzy. It can be solvable, or (like integration) only partially solvable. It can be classifying your objects into finitely many buckets. It could be computing some other related object or property of the object. It could be finding the next element in a sequence. It could be coming up with a prediction or conjecture of some sort. It could even be turning that object into some 2D cartoon image, just to think outside the box.
  3. Lots of training data. Either you need to be able to synthetically generate many training examples, as in the integration example above, or, like the OEIS, have a large dataset of tens of thousands of examples (more examples are always better). Clean data is preferred, but neural networks handle messy data well. Another “data free” option is reinforcement learning, as in the Rubik’s Cube example or AlphaGo Zero, where the agent learns to explore the problem on its own (and generates data through that exploration).
  4. Patterns in the data, even if you can’t see what they are. Your task should be one where there are patterns which help the machine learning agent solve the problem. (For example, I’m not convinced that factoring large integers would be a good task.)
  5. Motivation. Why would this be useful to the field? What purpose would this trained model serve? Would it make it easier to conjecture facts, explore new areas of math, or wrap one's head around a bunch of confusing formulas? Or do you have a way to turn a learned heuristic into a proof, such as with a search algorithm (as in the Rubik’s cube example above) or by checking the solution (as in the integration example above)?
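
The "checking the solution" route in item 5 is often trivial to implement even when the forward task is hard. A sketch with sympy (the function name is my own; any model's guess plugs into `candidate`):

```python
import sympy as sp

x = sp.symbols('x')

def certify_integral(integrand, candidate):
    """Accept a model's guessed antiderivative only if its derivative
    symbolically matches the integrand, turning an unreliable
    heuristic guess into a verified answer."""
    return sp.simplify(sp.diff(candidate, x) - integrand) == 0

assert certify_integral(2*x*sp.exp(x**2), sp.exp(x**2))   # good guess: accepted
assert not certify_integral(2*x*sp.exp(x**2), sp.exp(x))  # bad guess: rejected
```

With such a certifier, a model that is right only 93–98% of the time still yields answers you can fully trust, because wrong guesses are filtered out.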
Jason Rute
  • 6,247
  • 15
    I'm one of the authors of the lean-gptf paper. :) I'm very encouraged by the progress in machine learning on formal mathematics libraries, but I think it has a long way to go to help workaday mathematicians. On the other hand, I think there are probably more focused applications of deep learning to specific areas of mathematics that could have an impact right now, or relatively soon. – Jason Rute Apr 14 '21 at 13:15
  • 3
    Does numerical solution of PDEs count? https://arxiv.org/abs/1904.07200 – Steve Huntsman Apr 14 '21 at 16:33
  • 1
    ...or other algorithmic techniques like https://arxiv.org/abs/2005.01917? – Steve Huntsman Apr 14 '21 at 17:05
  • 1
    There are tons of papers on trainable theorem provers by now - just google. I certainly need one good at IPC, please let me know here if you are aware of such. – მამუკა ჯიბლაძე Apr 14 '21 at 20:13
  • @მამუკაჯიბლაძე, yes there are a number of papers on machine learned theorem provers (see one of the answers below). Do you think that they would help you in your or others research? You mentioned wanting a theorem prover for intuitionistic propositional logic. That is an interesting use case, since for classical propositional logic, I suspect SAT solvers still reign supreme, but for less common logics it may be easier to build a solver directly from training data. – Jason Rute Apr 14 '21 at 21:56
  • @მამუკაჯიბლაძე, have a look at arxiv.org/pdf/1805.11799.pdf which trains a neural theorem prover for intuitionistic propositional logic. (Although, you may also want to check out some intuitionistic ATPs, like some of the powerful tactics in Coq that may be better suited for your use case.) – Jason Rute Apr 14 '21 at 22:08
  • 14
    "While math is mostly about proofs" Objection! :-) – Jochen Glueck Apr 14 '21 at 22:41
  • @SteveHuntsman They can also learn mappings of function spaces https://arxiv.org/abs/2010.08895 – mysatellite Apr 15 '21 at 02:43
  • Thanks for the link! Yes I do think that I could benefit from machine learned theorem provers (don't know about others). Are you aware of any implementations? I found something called Proverbot9001 but so far was unable to install it. – მამუკა ჯიბლაძე Apr 15 '21 at 04:10
  • 1
    The deep learning could help in the categorification of some fusion rings. – Sebastien Palcoux Apr 15 '21 at 06:19
  • For many of the problems successfully attacked by deep learning, there were already existing approaches that ranged from 'usable' to 'pretty good' and beyond. If deep learning is expected to have success in math, shouldn't older methods have at least partial successes in some areas, and if so what's going on there? – JCK Apr 15 '21 at 15:25
  • Are you specifically asking about 'deep learning', or any type of neural network or machine learning? Most of the answers you've received don't seem to use multiple layers. – Brady Gilg Apr 15 '21 at 15:46
  • 2
    See also https://stats.stackexchange.com/q/73801/66864 on machine learning applications in number theory – J W Apr 15 '21 at 19:41
  • 1
    The paper Deep Learning Gauss-Manin Connections Kathryn Heal, Avinash Kulkarni, Emre Can Sertöz https://arxiv.org/abs/2007.13786 uses deep learning to guide the operation of a deterministic algorithm, that can be too slow to run if the wrong path is taken – Alex J Best Apr 16 '21 at 02:30
  • 4
    Proofs are an important part of mathematics, but mathematics is not mostly about proofs, far from it. See, for example, https://terrytao.wordpress.com/career-advice/theres-more-to-mathematics-than-rigour-and-proofs/ – Hollis Williams Apr 16 '21 at 15:25
  • 1
    Several of my colleagues have suggested using deep learning methods to search for counterexamples to conjectures, but I'm not sure whether there are any success stories as of yet. – Zach H Apr 17 '21 at 10:56
  • 1
    Not exactly what you’re looking for, but I’d love to see a mathematical reading assistant that could specifically gauge which parts of a paper are novel and how significant they are. Are the results of this paper variations on known things, genuinely new, or what? Dumb text matching won’t work. Textual similarities might mean the new paper is duplicative, or not. Especially when I look at papers outside my own area, so I don’t already have a strong sense of context. Of course I could read 10 background papers to understand this new one. But if ML can help or at least say which 10... – Zach Teitler Apr 19 '21 at 02:21
  • 1
    @ZachTeitler I suspect that it is easier to make AI prove new interesting theorems than to make it read human language and infer what is and is not novel with respect to the existing literature. –  Apr 19 '21 at 04:39
  • Also: Which areas which would suffer a negative impact from state-of-the-art machine learning? They might become less fun, less fundamental, less funded.... –  Apr 19 '21 at 09:24
  • 1
    Handwritten math equations to $\rm\LaTeX$ text recognition algorithms. – Asaf Karagila Apr 25 '21 at 08:15

11 Answers

36

In the context of algebraic geometry, neural networks have become useful tools in the study of Calabi-Yau manifolds. The computation of their topological invariants, metrics and volumes is of particular interest for applications in physics (string theory). Recent contributions include


For a more general overview, see Machine-Learning Mathematical Structures

We review, for a general audience, a variety of recent experiments on extracting structure from machine-learning mathematical data that have been compiled over the years. Focusing on supervised machine-learning on labeled data from different fields ranging from geometry to representation theory, from combinatorics to number theory, we present a comparative study of the accuracies on different problems. The paradigm should be useful for conjecture formulation, finding more efficient methods of computation, as well as probing into certain hierarchy of structures in mathematics.

Carlo Beenakker
  • 177,695
  • 1
    Have these been used for any particularly interesting applications in physics? – Will Sawin Apr 14 '21 at 15:54
  • This is a great example! I was expecting only to get ideas for future research (and that is still what I’m looking for) and not already published results, so this is very motivating. Could you possibly elaborate on the task being performed here? I don’t know much about this area, but it seems that in the second example the input to the NN is a matrix of integers representing a class of manifolds and the output is an integer hodge number from 1 to 24 of that class. Is that correct? What about the first example? – Jason Rute Apr 14 '21 at 16:18
  • 1
    I found this description (page 11 and following) quite helpful. – Carlo Beenakker Apr 14 '21 at 19:33
  • 1
    @CarloBeenakker This is really general. From that paper: "We emphasize again that the computing a topological invariant of any manifold (appropriately embedded as an algebraic variety) can be cast into the form of [a simple machine learning training problem]. We turned to CY because of their being readily available [data], similar experiments should be carried out for general problems in geometry." (Although I don't know enough about this area to say if it is surprising or useful.) – Jason Rute Apr 14 '21 at 22:20
  • @CarloBeenakker Are you interested in making an answer just for that survey paper since it goes beyond CY manifolds? Or if you like I can do it. – Jason Rute Apr 14 '21 at 22:26
  • 1
    I added that overview paper to the answer box (since this is not (or not yet) community wiki I refrained from making a separate answer) – Carlo Beenakker Apr 15 '21 at 06:05
30

I have some thoughts on a level of generality that is a bit higher than the question asks for:


One obstacle that faces applications of supervised machine learning to predict properties of mathematical objects is that in many fields of mathematics, such as number theory, we tend to expect that properties of the nice mathematical objects we study will either be

  • given by explicit formulas, or
  • quasirandom and hard to predict.

If this expectation is correct, then machine learning researchers are in a bit of a bind - they will either discover nothing or something trivial. Of course, mathematicians are fallible and our expectations are certainly not always correct.

This suggests one approach: use neural networks to search for properties that are unexpectedly easy to predict. This would be a somewhat thankless task, as you would usually find nothing of interest, but when you do find something it would be a counterintuitive new phenomenon, which could hopefully, by human analysis, be turned into a new formula for something.

It is crucial in this setting to have a good understanding of existing quasirandom models of the object in question, to know what trivial bounds your prediction is trying to beat. (This has been a pitfall for some attempts to apply machine learning, or just classical statistics, to pure mathematics.)
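
One minimal version of that hygiene check (my own sketch, with a majority-class baseline standing in for the field's actual quasirandom model, which in number theory is usually subtler):

```python
import numpy as np

def beats_baseline(y_true, y_pred):
    """Declare that a model 'found structure' only if it beats the
    best constant predictor (the majority class): the simplest trivial
    bound any claimed prediction must clear."""
    y_true = np.asarray(y_true)
    _, counts = np.unique(y_true, return_counts=True)
    baseline = counts.max() / len(y_true)
    accuracy = np.mean(np.asarray(y_pred) == y_true)
    return accuracy > baseline, accuracy, baseline

# 90% of labels are 0: a model predicting 0 everywhere scores 90%
# without learning anything, so 90% accuracy here is not a finding.
y = np.array([0] * 90 + [1] * 10)
ok, acc, base = beats_baseline(y, np.zeros(100, dtype=int))
```

A serious version would replace the majority-class baseline with the best available heuristic or quasirandom model for the objects in question.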


I am particularly excited about bringing techniques from puzzles and games to pure math, as in the neural guided search section of the question. This is because game-playing networks are among the only neural networks not merely to match human performance on a task, and not merely to exceed it, but to exceed it by introducing new ideas that human practitioners were able to learn from and incorporate into their own strategies.

Since the goal of mathematics is the production of ideas, this is very heartening.

So we're looking for games where one must apply a sequence of moves, following some rules, to try to reach some result. (I don't know many examples of adversarial games in pure mathematics, so I'm describing 1-player games.)

Formal theorem proving is an example of this, already discussed in Steve Huntsman's answer.

I believe Kirby calculus is a game or puzzle of this sort, with the goal of showing two 3-manifolds or 4-manifolds are diffeomorphic by a finite series of combinatorial moves, that is used in current research (to test whether a newly constructed smooth structure on a 4-manifold is exotic or diffeomorphic to the standard structure). One challenge might be finding a good way to represent the diagrams that a network can work with efficiently.

Will Sawin
  • 135,926
  • 2
    I agree that Kirby calculus is a good candidate for machine learning, but it should be emphasized that it's only good for showing that two things are the same (often in unexpected ways). So you can show that e.g. a potential counterexample to the smooth 4d Poincare conjecture is actually standard, but you'll never be able to use it to show that it is a counterexample. – Andy Putman Apr 14 '21 at 16:53
  • What exactly is a 1-player game? Knot isomorphism? Graph isomorphism? Any decision problem in NP? – Dustin G. Mixon Apr 14 '21 at 17:11
  • @DustinG.Mixon A key feature of a game to me is that is an NP (or N-something, maybe not necessarily polynomial time) bound where a solution/witness is naturally described by a sequence of steps that transform a state. So knot isomorphism is by this standard and graph isomorphism probably isn't because you would just write the permutation rather than express it as a sequence of transpositions. – Will Sawin Apr 14 '21 at 17:17
  • 1
    The reason I want to look for problems with a sequence of steps is because I really like the AlphaZero algorithm that combines machine learning with tree search, and think it's very powerful. This algorithm, at least in its current form, relies on a sequence of choices in time, where the algorithm considers multiple choices and evaluates their effects. – Will Sawin Apr 14 '21 at 17:19
  • 3
    A possible example of an adversarial game in pure mathematics is an Ehrenfeucht–Fraïssé game. – Timothy Chow Apr 14 '21 at 17:20
  • @TimothyChow So, if I understand correctly, a highly skilled neural agent at this game would be able to predict, non-rigorously but accurately, whether two structures are elementarily equivalent. Would this be something useful to have. – Will Sawin Apr 14 '21 at 17:24
  • 2
    @WillSawin Does this capture your idea of a 1-player game? https://en.wikipedia.org/wiki/Reachability_problem – Dustin G. Mixon Apr 14 '21 at 17:33
  • @DustinG.Mixon Yes, I would say so. – Will Sawin Apr 14 '21 at 17:36
  • Concerning your comment that properties of "nice objects" in number theory are expected to be "given by explicit formulas or are quasirandom and hard to predict", into which of those categories would you put the prime number theorem and Artin's primitive root conjecture (in its quantitative form)? While primes are themselves quasirandom, those two conjectures are asymptotic formulas (which by the nature of asymptotic formulas leaves room for flexibility, e.g., $\pi(x)$ vs. ${\rm Li}(x)$ in PNT) and neither could really be called a trivial formula. – KConrad Apr 14 '21 at 17:55
  • @KConrad My claim is not that the original discovery of these formulas was trivial, but that rediscovering the prime number theorem or a quantitative form of the primitive root conjecture today would be trivial. – Will Sawin Apr 14 '21 at 17:59
  • 1
    I see. It would be very surprising to me if a deep learning algorithm could rediscover the formula for the constant appearing in the Artin primitive root conjecture or Bateman-Horn conjecture without being told to look for infinite products over primes. Of course, since these formulas are already available it might be hard to convince anyone that a computer found them again even if one of them did. – KConrad Apr 14 '21 at 18:06
  • @KConrad Sure - it's more likely that a machine learning algorithm trained on counts of prime tuples would discover a formula that roughly approximates this infinite product - if the parameters are inputed by binary digits, I would certainly expect to see the local factor at $2$ appearing, and probably approximations to the local factors at odd primes - at least those with simple divisibility rules. This wouldn't be very interesting. – Will Sawin Apr 14 '21 at 18:13
  • What I suggest is that an algorithm should start from the best human predictions, like the product formula for the constant, and then try to find additional patterns that violate human expectations. Maybe the algorithm could join you among the august ranks of those who have found counterexamples to (generalizations of) the Bateman-Horn conjecture... – Will Sawin Apr 14 '21 at 18:15
  • As for your first point, I think it is too simplistic to assume that supervised learning problems of interest have to be the type of problems for which a formula can even exist. For example it doesn't make sense to talk about a "formula" for computing a symbolic integral. Also, maybe one wants to compute a value from a representation matrix (like in the CY manifold example above). Again, a formula doesn't seem like the right type of solution. Maybe there is an algorithm, but it is quite possible that this algorithm is hard to find, and having an approximate solution would be a good start. – Jason Rute Apr 14 '21 at 22:35
  • 1
    But I agree that many problems are not suited for machine learning since they are too easy or too hard. Nonetheless, math is a diverse subject and I think there is a lot of areas of math where machine learning would be a good fit, especially if one realizes how powerful and flexible it has now become. The challenge is pairing up the folks who know about the problems with the folks who know the machine learning solutions that might help. – Jason Rute Apr 14 '21 at 22:38
  • @JasonRute I don't disagree - I only mean to speak about a wide class of problems I am familiar with in number theory, where this simple-or-quasirandom phenomenon is ubiquitous. – Will Sawin Apr 14 '21 at 23:03
  • @WillSawin Please define "quasirandom" and "explicit formula" and why a dichotomy? – Turbo Apr 15 '21 at 08:15
  • When AI produces a solution or a counterexample to the Riemann hypothesis I will start worrying. – Liviu Nicolaescu Feb 12 '24 at 12:20
25

In the category of guessing, there is the Ramanujan Machine. This project got off on the wrong foot with the research mathematics community because their initial announcement made overblown claims (such as claiming novelty for some conjectures/theorems that weren't new), but it is my belief that the general concept is sound. Along similar lines, perhaps it's time to revisit Graffiti with modern neural nets. In his paper On Conjectures of Graffiti (Ann. Discrete Math. 38 (1988), 113–118), Siemion Fajtlowicz described a computer program for generating conjectures about finite graphs. Most of its conjectures were either false, trivial, or known, but it did come up with some novel conjectures that were interesting enough for graph theorists to spend time proving them and publishing the proofs.

In the category of guided search, people have tried to use neural networks to improve the performance of SAT solvers; see for example the work on NeuroSAT. This seems like a promising area for further research.

Timothy Chow
  • 78,129
  • Well, if most of the conjectures it come up with is false then I don’t really see the point. I can hire a monkey to type on my laptop all day and maybe after a few weeks he would come up with the statement of Riemann hypothesis. – Heavensfall Apr 15 '21 at 07:23
  • 3
    I remember having heard about this project some time ago, and while I forgot much about it since that time, I am pretty sure it has nothing to do with deep learning. – Thomas Baruchel Apr 15 '21 at 07:25
  • 6
    @Heavensfall If it took only a few weeks to come up with the Riemann hypothesis then I would gladly sift through the junk to find the gold. – Timothy Chow Apr 15 '21 at 13:15
  • 3
    @ThomasBaruchel Yes, perhaps I should have been clearer that neither the Ramanujan Machine nor Graffiti uses deep learning. I was just picking two specific areas where computers have historically come up with interesting conjectures, and suggesting that possibly better algorithms would do even better. – Timothy Chow Apr 15 '21 at 13:21
21

Neural networks might help to speed up computations in the monster group $\mathbb{M}$, which is the largest finite sporadic simple group. Such a network could be in some sense a (rather large) cousin of the neural network dealing with Rubik's cube mentioned by the OP.

Elements of $\mathbb{M}$ are usually represented as words of sparse matrices in $\mbox{GL}_n(\mathbb{F}_k)$, $196882 \leq n \leq 196884, k = 2,3$. There is an effective algorithm for checking equality of two such words, see [1]. Reduction of a word to a shorter word may take several minutes on a computer, see e.g. [2] for an overview. I'm currently working on the acceleration of the reduction of such words. I plan to exploit some geometric information contained in the images of certain vectors in $\mathbb{F}_k^n$. As far as I know, nobody has used this information before. (For experts: I focus on the images of vectors called 2A-axes). Here a neural network might be better at learning how to use this information than I am.

The mathematical benefits of such a project are:

  1. Regarding computations, the monster group is the most difficult finite simple group to deal with. If we can compute in the monster group then we can compute in all finite groups, provided we have enough information about that group and enough computer memory.

  2. We can probably finish the classification of the maximal subgroups of the monster group.

References

[1] R. Wilson. Computing in the monster. In Group, Combinatorics & Geometry, Durham 2001, 327–337. World Scientific Publishing, 2003.

[2] R. A. Wilson. The Monster and black-box groups.

18

Some recent work on neural formal theorem proving (already mentioned in the question, but the examples give a sense of the state of the art):

  • 2
    Also see TacticToe (HOL4), Tactician (Coq), CoqGym/ASTactic (Coq), HOList/DeepHOL (HOL Light), Holophrasm (Metamath), GPT-f (Metamath), Lean GPT-f (Lean) for more state of the art in this area. – Jason Rute Apr 14 '21 at 13:21
  • 2
    And all of Josef Urban's work. – Jason Rute Apr 14 '21 at 13:22
  • 3
    While I think such work provides a lot of motivation and encouragement, I don't think it is currently useful to research math and I don't think it will be for a while. However, I think there are probably more specific applications of machine learning to specific areas of mathematics that could be of value right now if we find them. – Jason Rute Apr 14 '21 at 13:23
  • @JasonRute Agreed--the examples highlight the present gap between aspiration and capability. – Steve Huntsman Apr 14 '21 at 13:26
  • 1
    @JasonRute I think formalization of math that is close to research-level, like much of what Kevin Buzzard and his collaborators have been doing, is a valid and potentially important area of research math. As far as I know they don't use any neural methods, but at the rate machine learning approaches improve, isn't it reasonable to guess these things could be useful relatively soon? – Will Sawin Apr 14 '21 at 15:46
  • 2
    @WillSawin, I haven't been clear. I greatly value Buzzard's work and his predecessors. I played a small role in Hale's formalization of the Kepler Conjecture, and a much larger role in lean-gptf, a recent neural theorem proving tactic for Lean (Kevin's prover of choice). What I want to emphasize here is that we don't have to solve automatic theorem proving (a REALLY hard problem) to use AI to solve outstanding math problems right now. I also don't think lean-gptf and such will currently solve new theorems or create new insights, but neural tools focused on specific areas of math will. – Jason Rute Apr 14 '21 at 17:02
  • @JasonRute I actually think you were reasonably clear! II am just, personally, very excited by the machine learning formal theorem proving projects that I've heard of, and disappointed by other attempts to apply machine learning to mathematics that I've seen. I wanted to ask a question to develop my intuition about it even if it's not the actual topic of your MO question. – Will Sawin Apr 14 '21 at 17:31
  • 1
    @WillSawin My impression, as an interested onlooker, is that the bottleneck at the moment is on the data side rather than the algorithmic side. There is probably not enough formalized math yet for the algorithms to study. Generalizing from limited data is still something that current machine learning algorithms aren't very good at. – Timothy Chow Apr 14 '21 at 17:41
  • @TimothyChow Any idea if something like https://mathoverflow.net/a/100290/ might be within reach? – Steve Huntsman Apr 14 '21 at 17:47
  • @TimothyChow One approach to get more data would be to make the algorithm you're training do it. I'm obviously not an expert, but I don't think this is completely impossible for formal theorem proving. – Will Sawin Apr 14 '21 at 17:49
  • @SteveHuntsman If Gourevitch's conjecture is proved with machine assistance, I think it will be by more "conventional" means (searching for a certificate) and not via programs that are trained on formal proofs. – Timothy Chow Apr 14 '21 at 17:55
  • @WillSawin Perhaps you mean something like Generative Language Modeling for Automated Theorem Proving by Polu and Sutskever. Yes, this is possible in principle, but it seems to me that we (the mathematical community as a whole) are at the baby-step stage right now. – Timothy Chow Apr 14 '21 at 17:57
  • @TimothyChow In fact that was the main example I was thinking of when i mentioned I was excited by formal theorem proving projects! But I think one could try to also generate the theorems by neural network methods. – Will Sawin Apr 14 '21 at 18:03
13

This predates deep learning, but we should not forget that there has already been some progress in graph theory by recognizing patterns (OP: "patterns in the data") and forming conjectures.

  • Hansen, Pierre, and Gilles Caporossi. "AutoGraphiX: An automated system for finding conjectures in graph theory." Electronic Notes in Discrete Mathematics 5 (2000): 158-161. DOI.

"Up to now it [AutoGraphiX] has refuted 9 conjectures of Graffiti and suggested over 50 novel conjectures, 15 of which have been proved and none disproved."

Just posted (29 Apr 2021): A.Z. Wagner, "Constructions in combinatorics via neural networks." arXiv abs:

"We demonstrate how by using a reinforcement learning algorithm, the deep cross-entropy method, one can find explicit constructions and counterexamples to several open conjectures in extremal combinatorics and graph theory."

As requested, now a separate post.

Joseph O'Rourke
  • 149,182
  • 34
  • 342
  • 933
  • 4
    And not just this type of thing, here's a sample from a student in my department: https://paperswithcode.com/author/matthias-fresacher Learning Erdős-Rényi Random Graphs via Edge Detecting Queries – David Roberts Apr 15 '21 at 01:29
  • 1
    It is disconcerting that their first praise for this automated system is the way it refutes conjectures from a different automated system. –  Apr 19 '21 at 09:16
  • 3
    @MattF. And now Wagner's paper refutes a conjecture by AutoGraphiX (which was previously, independently, less efficiently refuted by a human mathematician, the authors point out). It's the circle of life! I would actually defend this - if we have automatic conjecture generators and other systems that can automatically refute them, then by combining them we should obtain automatic generation of highly plausible conjectures. – Will Sawin May 01 '21 at 21:35
  • 1
    The third example that you have just added does actually involve neural networks, and thus seems to be a better fit for the question than your other two. It seems worth posting a new answer, or editing your answer to focus more on it. – Will Sawin May 01 '21 at 21:38
13

Adam Wagner has a new paper on arXiv (29 Apr 2021), "Constructions in combinatorics via neural networks," whose Abstract is:

"We demonstrate how by using a reinforcement learning algorithm, the deep cross-entropy method, one can find explicit constructions and counterexamples to several open conjectures in extremal combinatorics and graph theory. Amongst the conjectures we refute are a question of Brualdi and Cao about maximizing permanents of pattern avoiding matrices, and several problems related to the adjacency and distance eigenvalues of graphs."

Will Sawin commented here that "Wagner's paper refutes a conjecture by AutoGraphiX" (and deserved to be in an answer of its own).

Timothy Gowers discussed this on Twitter, summarizing that "the rough idea of the program is to design a suitable reward function, usually very simple, for how good an attempted counterexample is, and let the magic of reinforcement learning do the rest".
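The loop Gowers describes can be sketched in a few lines. Below is a toy version under strong simplifying assumptions: the "deep" part is dropped, so the neural-network policy is replaced by independent per-edge probabilities (i.e., the plain cross-entropy method), and it is applied to a problem whose answer is known, maximizing the number of edges in a triangle-free graph on $n=6$ vertices, where Mantel's theorem gives the optimum $\lfloor n^2/4 \rfloor = 9$. The reward function and all parameters are illustrative choices, not Wagner's.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 6
edges = list(itertools.combinations(range(n), 2))    # 15 possible edges
triples = list(itertools.combinations(range(n), 3))  # vertex triples
eidx = {e: i for i, e in enumerate(edges)}

def reward(bits):
    # illustrative reward: edge count, heavily penalizing each triangle
    m = int(bits.sum())
    tri = sum(
        bits[eidx[(a, b)]] and bits[eidx[(a, c)]] and bits[eidx[(b, c)]]
        for a, b, c in triples
    )
    return m - 5 * tri

p = np.full(len(edges), 0.5)   # edge-inclusion probabilities (the "policy")
best = -1
for _ in range(60):
    pop = (rng.random((200, len(edges))) < p).astype(int)  # sample graphs
    scores = np.array([reward(b) for b in pop])
    elite = pop[np.argsort(scores)[-20:]]           # keep the top 10%
    p = 0.7 * elite.mean(axis=0) + 0.3 * p          # smoothed policy update
    best = max(best, int(scores.max()))

print(best)  # should approach Mantel's bound of n^2/4 = 9
```

Wagner's actual method replaces the per-edge probabilities with a neural network that emits the construction step by step, which is what lets the approach scale to the larger objects needed to refute open conjectures.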

Joseph O'Rourke
  • 149,182
  • 34
  • 342
  • 933
12

Neural networks are also just starting to be used in knot theory!

One of the largest unsolved parts of the field is determining the structure of the smooth knot concordance group. Doing so requires a firm understanding of many knot invariants, chief among which are the $\tau$ invariant of Ozsváth and Szabó and the $s$ invariant of Rasmussen. These are powerful invariants, both derived from the intrinsic geometry of the knot in question, yet they are very difficult to compute for complicated knots.

The problem of computing these invariants is very conducive to the use of neural networks: there is a large training set (all knots for which the computation has already been carried out), and the invariants are rooted in the geometry of the object. Lastly, I would be remiss not to mention that the calculation of these invariants is part of a large-scale computational strategy for finding a counterexample to the smooth 4-dimensional Poincaré conjecture, commonly regarded as the largest remaining question in low-dimensional topology.
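To give a flavor of the setup (and nothing like the actual architectures or data used in work such as Hughes'), here is a heavily hedged toy: a linear classifier predicting a braid-word quantity from a one-hot encoding. Since $\tau$ and $s$ are expensive to compute, the label below is the sign of the writhe, a trivially computable stand-in, which is exactly what makes this a sanity check rather than a research tool.

```python
import numpy as np

rng = np.random.default_rng(1)
L, gens = 10, [-3, -2, -1, 1, 2, 3]   # braid words of length 10 in B_4

# random braid words; one-hot encode each letter
words = rng.choice(gens, size=(700, L))
X = np.zeros((700, L * len(gens)))
for i, w in enumerate(words):
    for j, g in enumerate(w):
        X[i, j * len(gens) + gens.index(g)] = 1.0
y = (np.sign(words).sum(axis=1) > 0).astype(float)  # label: writhe > 0

Xtr, ytr, Xte, yte = X[:500], y[:500], X[500:], y[500:]
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(2000):                 # plain logistic regression by GD
    p = 1.0 / (1.0 + np.exp(-(Xtr @ w + b)))
    w -= 1.0 * Xtr.T @ (p - ytr) / len(ytr)
    b -= 1.0 * (p - ytr).mean()

acc = np.mean(((Xte @ w + b) > 0) == (yte > 0.5))
print(acc)  # accuracy on held-out braid words
```

The interesting research direction is precisely the opposite of this toy: train on invariants that are expensive to compute, so the trained model becomes a fast surrogate that guides which knots are worth the full calculation.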

  • Is the suggestion that a neural network could be trained to guess certain knot invariants without calculation? – Jim Conant Aug 08 '22 at 00:26
  • 3
    Yes! In fact, I think this was actually put into practice in the work of Mark C. Hughes, found here: https://arxiv.org/pdf/1610.05744.pdf – KnotEnthusiast Aug 08 '22 at 05:39
  • Neat! I am glad to see it working in practice. It looks like neural network example finding is going to become an increasingly powerful research tool in the coming years. – Jim Conant Aug 08 '22 at 05:55
  • I am intrigued by the way you phrased your comment about the smooth 4D Poincaré conjecture. Do most experts suspect that it is false? – Timothy Chow Aug 08 '22 at 21:58
  • From my experience, there is an even split among low dimensional topologists. In the subset of knot theorists, it seems more skewed towards the belief that it is false. The reason why: the Smooth 4-dimensional Poincaré conjecture can be disproved by constructing a knot that is smoothly slice in a homotopy $4$-sphere, but not smoothly slice in $S^4$. The $\tau$ invariant is known to obstruct sliceness on ALL homotopy 4-spheres, but there is no such result for the $s$ invariant. Since there are an infinite number of knots, it seems likely that there’s SOME knot out there can provide a disproof! – KnotEnthusiast Aug 09 '22 at 22:11
9

One application that we urgently need is a "smart," tailored TeX-math search engine (e.g., an interesting existing one is https://approach0.xyz/search/).

  • Collaborating with the arXiv maintainers to have the engine search through the TeX source of math papers. (This might even mean settling on some template math-TeX glossary to follow if we want to ease searchability.)
  • This would require an engine that is flexible and can even present us with search results that are close but from different areas, establishing interesting bridges.
  • I imagine the AI would also be useful here in forming a loose family of theorems and definitions by topic, thus giving us a sense of the various directions in math.
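To make the search idea concrete, here is a minimal sketch of one ingredient such an engine would need: an inverted index over LaTeX token bigrams with a small "math synonym" table, so that notationally different but equivalent macros (\dfrac vs \frac, \ge vs \geq) still match. The tokenizer and synonym table are simplifying assumptions of mine, not how approach0 actually works.

```python
import re
from collections import defaultdict

# tiny normalization table: map synonymous macros to one canonical form
SYNONYMS = {r"\dfrac": r"\frac", r"\le": r"\leq", r"\ge": r"\geq"}

def tokens(tex):
    # macros as whole tokens, everything else character by character
    toks = re.findall(r"\\[a-zA-Z]+|[^\s{}]", tex)
    return [SYNONYMS.get(t, t) for t in toks]

def bigrams(tex):
    ts = tokens(tex)
    return {(a, b) for a, b in zip(ts, ts[1:])}

class TexIndex:
    def __init__(self):
        self.post = defaultdict(set)   # bigram -> set of doc ids
        self.docs = {}
    def add(self, doc_id, tex):
        self.docs[doc_id] = tex
        for bg in bigrams(tex):
            self.post[bg].add(doc_id)
    def search(self, query):
        hits = defaultdict(int)        # rank docs by shared bigrams
        for bg in bigrams(query):
            for d in self.post[bg]:
                hits[d] += 1
        return sorted(hits, key=hits.get, reverse=True)

idx = TexIndex()
idx.add("cauchy", r"\frac{1}{2\pi i}\oint \frac{f(z)}{z-a} dz")
idx.add("amgm", r"\dfrac{x+y}{2} \ge \sqrt{xy}")
print(idx.search(r"\frac{a+b}{2} \geq \sqrt{ab}")[0])  # -> amgm
```

Even this crude index matches the AM-GM query to the \dfrac/\ge variant; the hard (and interesting) part the answer asks for, finding genuinely "close" results across different areas and notations, is where learned embeddings would come in.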

Another interesting application would be to help sharpen the various constants appearing in PDEs. I imagine a really smart AI would be great for running through a wide range of candidate functions, with the cost function based simply on how much the constant is sharpened.

Thomas Kojar
  • 4,449
  • 2
Good idea! The crawlers used in approach0.xyz are open source, so people could eventually add one for arXiv: https://github.com/approach0/a0-crawlers – Tadashi Dec 07 '21 at 02:53
  • 1
    Yes, please! In plain language search, Google and other search engines are quite good at finding relevant synonyms that you didn't type. Broadening a TeX-math search to allow for "math synonyms" or alternate notations would be just one of many useful features this technology could have. – Somatic Custard Jan 05 '23 at 23:19
  • Substitution tree indexing: https://link.springer.com/chapter/10.1007/3-540-59200-8_52

    Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval: https://paperswithcode.com/paper/evaluating-token-level-and-passage-level

    – ZENG Feb 13 '24 at 22:37
7

There have been some recent efforts to utilize neural networks to approximate solutions of different types of PDEs. The key advantage of this approach is the possibility of tackling extremely high-dimensional problems, where traditional numerical approaches involving discretization prove infeasible because the number of grid points scales exponentially with the dimension.

Deep learning approaches sidestep the curse of dimensionality by converting the PDE problem into, e.g., minimizing an energy functional (https://arxiv.org/abs/1710.00211) or solving a stochastic optimal control problem (https://arxiv.org/abs/2102.11379), and then solving the related empirical risk minimization (ERM) problem via Monte Carlo sampling, whose convergence is independent of the dimension.
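As a toy illustration of the energy-minimization route (in the spirit of the Deep Ritz method), the sketch below replaces the neural network with a small sine basis so it stays dependency-free. It solves $-u'' = f$ on $(0,1)$ with $u(0)=u(1)=0$ and $f(x)=\pi^2\sin(\pi x)$, whose exact solution is $\sin(\pi x)$, by stochastic gradient descent on the Monte Carlo estimate of the energy $E(u)=\int_0^1 \tfrac12 u'^2 - fu\,dx$.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                # ansatz u(x) = sum_k c_k sin(k pi x); exact c = (1,0,0)
c = np.zeros(K)      # trainable coefficients (the "network weights")
k = np.arange(1, K + 1)

for _ in range(2000):
    x = rng.random(256)                               # Monte Carlo sample
    s = np.sin(np.outer(x, k) * np.pi)                # basis values
    d = np.cos(np.outer(x, k) * np.pi) * (k * np.pi)  # basis derivatives
    f = np.pi**2 * np.sin(np.pi * x)
    up = d @ c                                        # u'(x) on the sample
    # gradient of the sampled energy mean(0.5*u'^2 - f*u) w.r.t. c
    grad = (d * up[:, None] - s * f[:, None]).mean(axis=0)
    c -= 0.02 * grad                                  # SGD step

print(c)  # c[0] should be close to 1, the others close to 0
```

In one dimension this is pointless compared to discretization; the whole appeal of the deep versions is that the same recipe, sample points, evaluate the network, descend the sampled loss, does not change as the dimension grows.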

  • 1
    Anil Ananthaswamy did an interesting article about this topic on Quanta Magazine: https://www.quantamagazine.org/new-neural-networks-solve-hardest-equations-faster-than-ever-20210419/ – Tadashi Apr 27 '21 at 13:47
  • https://www.quantamagazine.org/latest-neural-nets-solve-worlds-hardest-equations-faster-than-ever-before-20210419/ (Link broken by new URL.) – seldom seen Feb 13 '24 at 11:52
6

Full disclosure: This is a copy-and-paste of my answer to a question over on CrossValidated (stats Stack Exchange).

See the 2019 preprint Machine Learning meets Number Theory: The Data Science of Birch-Swinnerton-Dyer by Alessandretti, Baronchelli & He. Here is the Abstract:

Empirical analysis is often the first step towards the birth of a conjecture. This is the case of the Birch-Swinnerton-Dyer (BSD) Conjecture describing the rational points on an elliptic curve, one of the most celebrated unsolved problems in mathematics. Here we extend the original empirical approach, to the analysis of the Cremona database of quantities relevant to BSD, inspecting more than 2.5 million elliptic curves by means of the latest techniques in data science, machine-learning and topological data analysis.

Key quantities such as rank, Weierstrass coefficients, period, conductor, Tamagawa number, regulator and order of the Tate-Shafarevich group give rise to a high-dimensional point-cloud whose statistical properties we investigate. We reveal patterns and distributions in the rank versus Weierstrass coefficients, as well as the Beta distribution of the BSD ratio of the quantities. Via gradient boosted trees, machine learning is applied in finding inter-correlation amongst the various quantities. We anticipate that our approach will spark further research on the statistical properties of large datasets in Number Theory and more in general in pure Mathematics.
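For readers unfamiliar with the model class the paper applies, here is a hedged toy of gradient boosting with decision stumps; the synthetic data and every parameter below are made up for illustration and have nothing to do with the Cremona database.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((400, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2   # synthetic regression target

def fit_stump(X, r):
    # best single-feature threshold split minimizing squared error on r
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            if left.all() or (~left).all():
                continue
            lv, rv = r[left].mean(), r[~left].mean()
            err = ((r - np.where(left, lv, rv)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

pred = np.full(len(y), y.mean())
for _ in range(100):                      # boosting rounds
    resid = y - pred                      # negative gradient of squared loss
    st = fit_stump(X, resid)
    pred += 0.1 * predict_stump(st, X)    # shrinkage (learning rate) 0.1

mse = np.mean((y - pred) ** 2)
print(mse)  # training MSE shrinks as rounds accumulate
```

Real trees go deeper than stumps, which is what lets boosted models pick up the inter-correlations among quantities (rank, conductor, regulator, etc.) that the paper investigates.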

Yemon Choi
  • 25,496
J W
  • 748
  • 12
    This is, in my opinion, a great example of machine learning / statistics work which wastes effort by not incorporating pre-existing mathematical knowledge and heuristics for the topic area. More generally, when working with any dataset, it's important to understand where the data comes from and whether it consists of a biased sample from a larger population. In this case, the elliptic curves in the LMFDB are, by construction, elliptic curves of small conductor (or smallish prime conductor, or highly smooth conductor). This heavily influences the statistics. – Will Sawin Apr 17 '21 at 14:42
  • Whole bunch https://scholar.google.com/scholar?cites=13717899482370344115&as_sdt=2005&sciodt=0,5&hl=en. – Turbo Apr 17 '21 at 23:11
  • 3
    @WillSawin It sounds like a great example of machine learning being thrown at things by people without pausing to understand the proper practice of statistics (Speaking as someone who has friends and colleagues in my department who do theoretical statistics including machine learning) – Yemon Choi Apr 19 '21 at 00:32