24

What tips do you have for reading arXiv papers in mathematics?

This could be too broad if I'm not careful, so here are some restrictions to make the question appropriate for MO:

  • I'm interested in tips for reading arXiv papers with a view to understanding the latest research, both before and after peer review.
  • Answers here could benefit everyone, but let's assume the reader is, like me, at the beginning of their academic career.
  • Feel free to impose your own restrictions.

One obvious tip is to view the articles with a higher degree of skepticism than those from a peer-reviewed journal.

Another tip is: be aware of which version of the article you are reading.

David White
  • 29,779
Shaun
  • 351
  • 21
    The quality of arXiv papers in math is quite high generally. I don't think there's a reason to be enormously skeptical of most arXiv papers. Now, if we're talking about a claimed proof of a famous open problem, okay, be skeptical; but that's definitely not the norm. On the flip side, having been peer reviewed is not a guarantee of correctness for a paper. I think young researchers are often confused about this. Proof checking is one thing that happens during peer review. But equally important, the referees and the journal itself determine if the paper has important new ideas in it. – Sam Hopkins Feb 06 '24 at 18:49
  • 5
    @SamHopkins not only does this depend quite drastically on the domain of maths, I also feel that the very drastic increase in the number of daily submissions over the last 20 years has impacted the quality quite drastically too. I see more and more papers that benefited from no proof-reading by the authors (and it is very easy to hide mistakes in badly written arguments)... – Vladimir Dotsenko Feb 07 '24 at 07:26
  • 5
    The degree to which proofs are carefully checked during peer review can depend on the area. – Hollis Williams Feb 07 '24 at 15:40
  • 1
    It is increasingly common to see that the latest version of arXiv corrects errors in the journal version. I would look at the latest arXiv version of any journal published article. – juan Feb 07 '24 at 19:32

2 Answers

35

As I write this, the question already has one vote to close (but no comment as to why). So, I'll start by stating that I think concerns about arxiv are reasonable for research mathematicians, and that you're likely to get more thoughtful answers on MO vs academia.SE, because many other fields do not have the same culture of communicating via preprints that math has. Furthermore, we have had several MO threads about arxiv that are still open, e.g., 1, 2, 3, 4, 5, 6, 7, 8. I will interpret the question to be about staying on top of the volume of stuff on arxiv, concerns about correctness (since the OP mentions this), and general early-career advice regarding arxiv. In terms of "tips for reading" I'll say that reading an arxiv paper is just like reading any other math paper. I assume the OP already knows how to read math papers, e.g., the importance of the introduction, finding relevant results, checking the proof, keeping track of notation, etc.

Here are some of my personal considerations regarding arXiv:

  1. I try to check the arXiv in my area at least once a week to stay up to date on what people are doing in my area. I read the title of every paper, read the abstract of those that interest me, and download the most interesting to read fully when I've got time. At the very least, I read the introductions of the most relevant papers. About 25 papers remain visible on the "recent" page, e.g., see this. If it's right in my wheelhouse, I try to read the preprint fully the following weekend, so as not to fall badly behind.

  2. If an arxiv paper overlaps with work I'm currently doing, I might email the author to discuss, or to invite them to the local seminar in my area.

  3. You can also subscribe to arxiv email bulletins, which report when a new version of a paper is posted.

  4. For any arxiv paper that I think I'm likely to cite, I keep my eyes open for when the journal version appears, and download it, because sometimes the theorem numbers in the arxiv version do not match the published version.

  5. I hate to admit this, but I do pay attention to who is the author of the paper. If it's someone I've never heard of, making big claims, I treat the preprint with more skepticism. Before basing any of my future work on it, I make sure to read it carefully and check if it seems correct. For papers by authors who I know well, I usually give them the benefit of the doubt, and don't check things carefully until I really need the result. I basically attach the new knowledge onto my existing knowledge as something that so-and-so proved, and hope I remember that the preprint exists when I next ponder that particular branch of my memory tree.

  6. It is wise for junior researchers to come up with a file organization system that works for them. Mine is to have a giant folder called "papers" where every paper is saved like "White, Yau, 2017, arxiv v1, Smith Ideals of Operadic Algebras in Monoidal Model Categories." That way, if I can remember the author names or keywords in the title, then I can probably find the paper using the mac spotlight feature or going through the folder in alphabetic order. This also means if subsequent versions come out, they can be saved near the first version. I imagine others have FAR superior organization than I do. An old friend used to use Mendeley, and it could attach keyword tags, etc.

  7. Sometimes a paper has been on arxiv for a very long time, has a well-known author, and has tons of citations. Here is an example off the top of my head. Sometimes this happens with an author who has died or left math. Having been around a long time, with lots of citations, I think the community has essentially verified all the claims, and the paper can be cited with the same level of certainty as if it had been peer reviewed.

  8. Whenever I get a paper back from a referee, I update my bibliography to be sure I cite published versions of any arxiv preprints I had previously been citing.

  9. If you are just starting off as a research mathematician, I advise you to NOT post anything to arxiv without your PhD advisor's explicit approval. Why? Well, for one thing, first impressions matter a lot. If your preprint has errors, or fails to cite some prominent people, readers might form a negative impression of you. For another thing, you don't necessarily know what kinds of things will be viewed as impressive or worthwhile by the community. Lastly, you might not know the culture in your subfield regarding junior people getting scooped, and you might not know how easy/hard it would be for a reader to read your preprint, understand your methods, and push on to the next step of your program that you and your advisor were hoping to do. I think it's good to target putting stuff on arxiv at the same moment that it's ready to submit to a journal, and when you're certain there is little danger of getting scooped (e.g., you've already worked out the first two or three papers in the series).

  10. Having preprints on arxiv the last year of your PhD might help you when you're on the job market, to show that you really have produced good research.

  11. If a new version is posted to arxiv and you want to know what the author changed, consider using a PDF comparison tool like this one.

  12. If I notice another researcher citing the arxiv version of a paper of mine that was subsequently published, I usually email them with the publication citation so they can cite the published version instead.

  13. When one of my papers gets accepted, I try to post the last version to arxiv, so the only difference between it and the published version is the use of the journal style file. Relatedly, it's good to update the arxiv journal ref to point to the actual published version so people who see the paper on arxiv realize it did actually get published (plus when and where).

  14. When I see a paper that's very interesting to me, by a very new researcher, I sometimes write them an email to encourage them in that line of work. Like "Wow, this is so exciting, I tried to prove this myself once and couldn't. Wonderful work!" I think math research can be a lonely profession and the math community might be a bit biased towards negative feedback (like, "I knew all this 10 years ago" or "What's new here? This seems like it all follows in a straight-forward manner from X.") and I'd love to shift the culture to be more encouraging. I especially try to encourage people doing the kind of work that I'd love to see more of. Maybe that will help the author feel less alone in the pursuit of this knowledge, and know that others are interested and rooting for them.

  15. Anyone uploading papers to arxiv should know that their tex code is public. Therefore, it is wise to remove any comments to yourself that you don't want others to see. It's easy to find lines that start with %, and strip them out.
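Regarding item 15, here is a minimal Python sketch of that clean-up, under some assumptions. The function name `strip_comment_lines` and the sample text are hypothetical, made up for illustration. The sketch only drops lines whose first non-blank character is `%`. It deliberately leaves trailing comments at the end of a line of text alone (so an escaped `\%` is never touched), and it does not know about verbatim-style environments, where a line starting with `%` is real content. Treat it as a starting point and recompile to check the output before uploading.

```python
def strip_comment_lines(tex_source: str) -> str:
    """Drop every line whose first non-blank character is '%'.

    Assumption: only full-line comments are removed. Trailing comments
    after text on the same line are kept, and verbatim environments
    are not treated specially.
    """
    kept = [
        line
        for line in tex_source.splitlines(keepends=True)
        if not line.lstrip().startswith("%")
    ]
    return "".join(kept)


# Hypothetical sample input, not taken from any real paper.
sample = (
    "\\section{Intro}\n"
    "% TODO: is Lemma 2 even true?\n"
    "We gain 50\\% in speed. % inline note survives\n"
    "   % indented private remark\n"
    "\\end{document}\n"
)

cleaned = strip_comment_lines(sample)
print(cleaned)
```

Because the trailing comment on the third sample line survives, it is still worth searching the cleaned file for any remaining `%` by hand before posting.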

I have to dash off now but I might add more if I think of any.

David White
  • 29,779
  • 3
    You can compare arXiv papers by source code, which is more precise than what PDF comparers can give you. – darij grinberg Feb 06 '24 at 18:31
  • 9
    "For any arxiv paper that I think I'm likely to cite, I keep my eyes open for when the journal version appears, and download it, because sometimes the theorem numbers in the arxiv version do not match the published version." Then cite the arXiv version! Your readers will probably be reading that one too. – darij grinberg Feb 06 '24 at 18:33
  • 7
    I especially like that the text above contains useful advices. One should not "fail to cite some prominent people"! One should be aware of "junior people getting scooped"! Apparently, supporting nepotism in academia and embracing the fact that nobody cares about mathematics (rather than "career") is the way to get by for a present day mathematician. – Denis T Feb 06 '24 at 20:12
  • 6
    A propos " I think it's good to target putting stuff on arxiv at the same moment that it's ready to submit to a journal." That's reasonable advice, but I'd add "put it on the ArXiv a week before you're planning to submit it, on the chance that someone will send you some useful comments that you can incorporate (with appropriate thanks) into the submitted version." – Joe Silverman Feb 06 '24 at 21:01
  • 3
    @JoeSilverman It seems that more journals nowadays are trying to use double-blind (a.k.a. double-anonymous) refereeing (including AMS journals). Does that mean that the mathematical community should rethink the practice of posting papers to the arXiv before submitting them for publication? – Timothy Chow Feb 06 '24 at 22:01
  • 12
    @TimothyChow: I support the double-blind refereeing process (although it does have its issues). But there is no way that it will change the established practice of making your article publicly available before it has been accepted for publication. For example, say you're a graduating PhD student. Your results from your PhD lead to a paper. It can take months to years to hear back from a journal. When you give job talks, you have to be able to reference your work, and you want people to be able to look at it. – Sam Hopkins Feb 06 '24 at 22:19
  • 3
    @TimothyChow The following is my personal opinion, it doesn't reflect AMS policy, and I certainly wouldn't try to speak for the entire mathematical community. So, my view is that there are good arguments for using double-blind refereeing, and also that there are reasonable arguments against. I lean toward the view that the upside outweighs the downside, but that the overall effect is probably not large. However, after the AMS has had (say) 5 years of experience, it will be interesting to try to quantify the (hopefully positive) effect. As for posting on the ArXiv, I'll need to go to the ... – Joe Silverman Feb 07 '24 at 01:47
  • 4
    next comment, since I'm out of space. I think it's highly beneficial for people at every career stage to post their work to the ArXiv, although the timing is up to the individual. It's true that this means double-blind is less effective, since a referee can find the article's author(s) on the ArXiv. And for the many people who look at the ArXiv's daily or weekly listings in their field, it makes DB refereeing moot. But to answer the question you seem to be asking, I think that the benefits of posting one's work on the ArXiv outweigh the possibility that it will prevent DB refereeing. – Joe Silverman Feb 07 '24 at 01:52
  • 1
    Then cite the arXiv version! @darijgrinberg Are you suggesting to not cite versions that appeared in journals? Or just include the arXiv version number when the paper has not yet appeared? The latter makes sense to me, the former not. – Kimball Feb 07 '24 at 03:37
  • @Kimball: I suggest citing both, but referencing the numbering of the arXiv one if they differ. So "By [15, Theorem 3.2 in the arXiv version]". Of course, the ideal solution is to ask the author to synchronize the versions, but in my experience authors rarely bother to. – darij grinberg Feb 07 '24 at 03:55
  • @TimothyChow: Double-blind refereeing in mathematics doesn't mean total anonymity. It means that you don't rub your name in the referees' noses. This is the best you can hope for in a field that doesn't have hundreds of people. But it's useful for ensuring that the referees' first impressions are more or less objective. – darij grinberg Feb 07 '24 at 03:58
  • 2
    @darijgrinberg Thanks for clarifying what you meant. I do think it's important to cite the published version, because of citation metrics for the author and journal. If there's a mismatch between theorem numbering, I cite the journal version because it's more stable. The author could conceivably change the arxiv version after my paper is published, so in your example one would also need to state which version it's 3.2 in, which makes the in-line citations even longer. I also strive to normalize relying on the published literature more than preprints. – David White Feb 07 '24 at 14:00
  • 1
    @DenisT Sorry if I upset you in some way. When giving career advice to young people, I strive to be forthright and realistic, instead of presenting an idealized view of the math community. There are MANY examples of prominent senior people getting upset over a perception of being snubbed. I recently read a long horror story from the category theory community. And, scooping does happen. We should aim to stop bad senior people but we should also alert junior people to potential dangers (fortunately, still pretty rare). – David White Feb 07 '24 at 14:03
  • 2
    @darijgrinberg I find that due to specialization, anonymity is practically impossible to maintain even for first impressions. I even worry that people are able to reliably guess the identity of the referee based on the way the comments are worded and what kinds of things they say. – Monroe Eskew Feb 07 '24 at 14:04
  • @JoeSilverman I agree wholeheartedly with your comments. I usually wait a week or two between arxiv and submitting. Very junior people might add the step of emailing the preprint to specific math friends to get feedback. I'm glad to know math is moving more towards double blind, for equity reasons. I've published several double blind papers in applied statistics and for those I left them off arxiv till the paper was accepted. – David White Feb 07 '24 at 14:05
  • 4
    @DavidWhite If double-blind leads to less open-access (or slower speed of access), I doubt it being overall a positive trend. That's a big trade-off. – Monroe Eskew Feb 07 '24 at 14:08
  • 3
    @DavidWhite: Since sci-hub stopped getting new papers (ca. 2021), there is no reason to expect readers to have access to published versions. Thus, many will be reading the arXiv version one way or another. If the numbering differs, they will be confused. (Even more so if there is an uncorrected error in the arXiv version.) – darij grinberg Feb 07 '24 at 14:09
  • 1
    @darijgrinberg I think we'd both agree that the best solution is for authors to upload the last version of their papers to arxiv as I suggested in (13). – David White Feb 07 '24 at 15:24
  • 1
    @DavidWhite: Yes, but that's on the authors of the paper being cited. I'm talking about the steps that the authors of a paper citing it should be taking. – darij grinberg Feb 07 '24 at 16:02
  • @darijgrinberg, re, has Sci-Hub stopped getting new papers? Wiki notes that archive efforts were undertaken in case it was taken offline, but not that it is no longer adding new papers. \ Also re, I mention that it is not required to provide source, and some authors don't like to. But even more readers don't even know about the option for the papers that do, so it's always good to spread the word! – LSpice Feb 07 '24 at 21:53
  • @LSpice: Just try searching for a post-early-2021 paper on Sci-hub (by date of publication). At the moment, even the old ones are hard to access, but they are duplicated on the annas-archive.org mirrors. – darij grinberg Feb 07 '24 at 21:56
  • 1
    @SamHopkins I just submitted a paper to the American Mathematical Monthly and the instructions said that I should not post my manuscript anywhere publicly until the journal arrives at an editorial decision on my paper. I don't think they would reject a paper if they discovered it on the arXiv, but this illustrates that the tension between double-blind refereeing and posting on the arXiv is not just hypothetical. In my case, I decided to comply with the request, but as you point out, there are many reasons why early-career mathematicians in particular might face a quandary. – Timothy Chow Mar 09 '24 at 06:36
  • @timothychow of my 32 publications, 5 have been double blind review: 3 in math adjacent fields, 2 in stats/data science education. I did what you're doing, and kept it off my webpage and arxiv till accepted. I think senior people can afford to wait, and to support double blind. Maybe someday the culture around it would change and there wouldn't be pressure on junior people to have public preprints. Anyway clearly it starts with people like us with stable jobs. – David White Mar 09 '24 at 10:55
3

I think that the question has a few false premises. The arXiv is not necessarily the best source for "understanding the latest research".

First, there is room to argue that in mathematics the "latest" research is not necessarily the best place to start. In my experience, really delving into the founding papers on a subject has been much more important as a first step. Only later, when a researcher is familiar with the big picture, is it a good idea to find the latest work in the field.

Second, the arXiv is a very small amount of recent research. It might be more helpful (depending on your field) to sign up for emails from the top journals, and review the titles of the new papers published every month.

Third, again in my experience, it is very rare for somebody to be working on exactly the same thing I'm working on. The arXiv (or any journal, or conference) can be a good place to get a new problem to work on, but the chance that somebody is going to publish on the arXiv something directly related to what you are working on is going to be very small, unless you are taking a long time to do that research, you took your problem from an active research program of someone else, or a lot of people are thinking about the same thing because it is a famous problem.

Pace Nielsen
  • 18,047
  • 4
  • 72
  • 133
  • 13
    I'm surprised about your claim that the arXiv is a small amount of recent research. I would guess the large majority (not all) of new papers I am interested in appear on the arXiv, and significantly before they appear in any journal. – Sam Hopkins Feb 07 '24 at 17:55
  • 2
    @SamHopkins In the past week, there were 26 submissions under the arXiv heading "rings and algebra". Most of these are not directly in my specialized research area (noncommutative ring theory). Every month, the single journal "Communications in Algebra" publishes around 30-35 articles. A much larger percentage of these are in my area. And this is just a single journal. The total output in good journals far exceeds the arXiv (in both amount, and in quality---depending on the journal). – Pace Nielsen Feb 07 '24 at 18:02
  • 6
    Maybe it is subfield dependent then. There were 19 preprints listed or cross-listed on math.CO (combinatorics) yesterday, and I would guess that is pretty typical. But I do wonder why so many researchers in your area would not put their articles on the arXiv. – Sam Hopkins Feb 07 '24 at 18:08
  • @SamHopkins It is absolutely subfield dependent. Number theory got about 75 submissions this past week, while combinatorics got 110. By the way, some researchers in my area do put all their papers on the arXiv, others never post preprints, and others (like me) put some on and others not. Speaking just for me, there are a number of reasons I don't post all my papers. – Pace Nielsen Feb 07 '24 at 18:23
  • 1
    1. My field is friendly, and I can send papers to people most likely to be interested. 2. We have regular conferences, where I can talk about my research and get feedback. 3. I have almost never received feedback from my arXiv submissions. 4. I don't like some aspects of the arXiv. 5. I post preprints on my own webpage, under my own control. – Pace Nielsen Feb 07 '24 at 18:23
  • 1
    How do people stay up-to-date in noncommutative algebra then? Do they subscribe to journal ToCs and ask authors for preprints in case of interest? Are there newsgroups where papers are announced? Asking for a friend (I am partly working in the field myself, and would rather like to know what has been done). – darij grinberg Feb 07 '24 at 21:59
  • 7
    @SamHopkins the arXiv can feel like it covers "the large majority .. of the papers I am interested in", but it really is only a selection. The rate of journal publication even in mathematics is more than the arXiv. The numbers are not exactly comparable, but 2021 on the arXiv is about 34.4k articles https://arxiv.org/year/math/22 (and another 10.7k with cross-lists). The US alone has 40k articles published in maths in 2022 https://www.scimagojr.com/countryrank.php?area=2600&year=2022 Germany/UK/France is another O(40k) articles. – David Roberts Feb 08 '24 at 01:38
  • @darijgrinberg Journal ToCs are one tool I use, which I've found to be much more useful than daily arXiv checking (which I also do). But, honestly, to get up-to-date on a problem, it is much better to simply use MathSciNet, look through citations and the literature, talk to others who publish in the area, etc... [There are, of course, some significant counter-examples to my general claim, which is why I waste so much time checking the arXiv.] – Pace Nielsen Feb 08 '24 at 02:29
  • 3
    @PaceNielsen I am very surprised to hear that in your area not a lot goes to the arXiv! My experience in hiring committees suggests that people do use arXiv a lot to have a first approximation of candidates' research activity, and so if some fields do not use arXiv that much it really should be better known... – Vladimir Dotsenko Feb 11 '24 at 16:49