Recently I encountered a new phenomenon when I tried to submit a paper to arXiv. The paper was an erratum to another, already published, paper and will be published separately. I got a message from arXiv saying that I needed to join the erratum with the original file. I was a little surprised to receive a reply from what was obviously a human being. Although I thought the request was a bit silly, I did what was requested, submitted the joint paper (the original union the erratum), and forgot about it.
But today I got a call from another mathematician. She tried to submit a paper with the title "... II". The paper "... I" was already in the arXiv and submitted to a (very good) journal. Both papers solve similar but different problems. One of these problems is at least 40 years old. Her submission was denied: she got a request from the arXiv to submit a union of the new paper and the old paper instead. This is quite silly.
Is there now a special person in the arXiv who is making these decisions? It looks like there has been a change in how arXiv is managed.
I understand that this is not a research question, so I am making it community wiki. I post it here because several frequent MO users are affiliated with arXiv.
-
Something has changed. I sent in a version 2 of my own paper, it has altered the "Comments" but did not do anything with the revised manuscript, so there is still just version 1. I sent an email, I guess they are going through some transition. – Will Jagy May 13 '12 at 01:53
-
@Will: Which paper? 1010.3677? It looks like that has a version 2, different from version 1 and submitted this week. The replacement process used to be immediate but now it is on the same schedule as the paper announcements - is that what you're talking about? Or is there a deeper issue? – Henry Cohn May 13 '12 at 02:07
-
@Henry, thank you for letting me know. Evidently it is my home web browser that is not working properly. There is something mysterious about flushing the cache that I have occasionally managed to do on my own. Does version 2 have 13 pages? – Will Jagy May 13 '12 at 02:10
-
@Will: Yup, it does. – Henry Cohn May 13 '12 at 02:14
-
@Henry, Thank you so much. I went to Firefox "Tools" and then "Clear Recent History" for, I guess, the past hour. Now the thing has a version 2. Version 2 sucks, of course, the referee made me cut it all in half, but I guess version 1 wasn't so good in the first place. – Will Jagy May 13 '12 at 02:15
-
Mark: Care to explain why merging the paper with the erratum is not the best solution? I am just interested in knowing; I'm not claiming the opposite. – darij grinberg May 13 '12 at 02:22
-
2@Darij, it seems Mark is saying merging was not so bad in his own case, but genuinely problematic in the other case he mentions. – Will Jagy May 13 '12 at 02:30
-
7@Darij: I prefer to have papers in the arXiv correspond to published papers. Therefore I wanted to submit a new, corrected, version of the original paper, and the erratum separately since these will be two different publications. But I do not think arXiv's request was that bad either. I do think that in the second case (when they requested to join paper "... I" with paper "... II") the request is rather silly. In that spirit, they would have to request that I join all my papers into one because there are intersections (my name, for example, appears in all of them). – May 13 '12 at 02:36
-
6Sorry, but the ArXiv management has nothing to do with Mathematics, so I voted to close. – Felipe Voloch May 13 '12 at 02:40
-
62I vote that this question remain open. Certainly arXiv policy is directly related to the work of research mathematicians. – Theo Johnson-Freyd May 13 '12 at 02:44
-
22@Felipe: arXiv is undoubtedly the most important tool for mathematicians after a computer, and any changes in its management are important too. If arXiv is slowly turning into a mega-journal, we, the mathematicians, need to know about it, and that change cannot occur without an open discussion. – May 13 '12 at 02:47
-
1@Will: I completely agree about the other case. But in the case of an erratum, I would be seriously irked if I were to waste time reading a flawed paper because the paper doesn't link to the erratum. ArXiv gives an easy way to access previous versions of a paper, so nothing is lost when you replace a paper by a corrected version. – darij grinberg May 13 '12 at 03:00
-
@Darij: I have explained in a comment above that I wanted to submit a corrected version of the main paper separately. Now, the paper consists of two parts: the wrong part and the erratum correcting a statement in the first part. The erratum also contains some new information (it is called "Erratum and addendum"), so I could not just correct the initial paper. Anyway that is not the main point of my question. – May 13 '12 at 03:05
-
5I agree that this is an important question and that the best audience for it is the MathOverflow community. However, I am tempted to quote Mark and say "not research. Vote to close". Instead, I suggest that this be brought up on publishing.mathforge or meta.mathoverflow or even math.stackexchange. Either that, or get a consensus change of what is appropriate for MathOverflow. Gerhard "Ask Me About System Design" Paseman, 2012.05.12 – Gerhard Paseman May 13 '12 at 05:56
-
Actually, a better audience would be whoever can talk to the arXiv admins. Hopefully someone here will be able to redirect Mark's concerns appropriately. The question of effective contact information for arxiv I think might be an acceptable MathOverflow question. Gerhard "Ask Me About System Design" Paseman, 2012.05.12 – Gerhard Paseman May 13 '12 at 06:00
-
I sent an email to www-admin@arxiv.org with subject line Dear arXiv, which is what they recommend if you have no idea what you are doing. I pointed out that Henry Cohn had given me enough of a hint to find the update of my paper. Also gave a link to this question. It may get sent to a spam folder, but maybe somebody will read it. – Will Jagy May 13 '12 at 06:11
-
1@Mark: I agree with everything you said in your comment following mine. Please write to Eric Friedlander, president of the AMS, and have the professional organization that represents us write to Paul Ginsparg. MO, in my opinion, is just not the place for this discussion. – Felipe Voloch May 13 '12 at 13:35
-
3@Felipe: I am not sure AMS is related to the arXiv in any way. According to the arXiv Web site, it is run by the Cornell University library, http://arxiv.org/help/general. Anyway, I think both answers are very informative, and probably will lead to a resolution of the original problem with papers "...I" and "...II". – May 13 '12 at 15:06
-
1Yes, the arXiv is completely independent from both the AMS and the APS. – Greg Kuperberg May 13 '12 at 16:58
-
1I never suggested that the ArXiv was connected with the AMS but I assume that if the ArXiv people get contacted by the AMS president they will take notice. – Felipe Voloch May 13 '12 at 18:56
-
@Felipe: They (Greg) have already taken notice. – May 13 '12 at 19:05
-
1For the record, I wish this question had not been asked. I only do not vote to close as it seems 'too late'. – May 13 '12 at 23:55
-
@quid: you did not explain why you wish so. Personally I learned important new information about how arXiv operates. I did not know about the "overlap" software and that arXiv is underfunded. I think everybody would benefit from this information. – May 14 '12 at 00:42
-
3@quid: I agree -- this site is for mathematics, and not for bitching about arXiv moderation (such bitching can be directed straight to @Greg). I did just vote to close -- better late than never. – Igor Rivin May 14 '12 at 00:44
-
6@Mark Sapir: like Felipe Voloch and (later) Igor Rivin, I think it is not really on-topic and a direct communication with 'arXiv' or somebody affiliated with it would have been a better option, and the general tone was IMO a bit uncharitable. Just like if I have a problem with a journal where you are editor (I don't, just as an example) I will contact the journal or maybe you or some other editor, but would not post a question on MO just because you are likely to read it and some other people might have a similar problem or some second hand info to share. – May 14 '12 at 01:22
-
1Cont. And finally as the long comment threads show (even after removing meta comments like mine), it is rather too discussion-y for MO. – May 14 '12 at 01:25
-
3@quid: You are of course correct in general, but the author did contact the arXiv via the standard channels and got an automatic response which did not make any sense. It is not clear who is in charge of the arXiv, I learned that it is Greg only today. Now the situation seems to be under control. The problem is that arXiv is not a journal, lots of processes there are automatic and it is not always possible to reach an actual human being. As a by-product we all learned some valuable information about arXiv. So I think that although the question is certainly out of order, it was useful. – May 14 '12 at 01:28
-
11@Igor: I do not know exactly what you meant, but the only comments in this discussion which can be qualified as "bitching" came from you. – May 14 '12 at 01:30
-
3@Mark: no, the entire thread is about complaining (= bitching, check out your favorite English dictionary) about Olga's admittedly unfortunate experience, which has absolutely nothing to do with Math Overflow, no more than, as @quid correctly points out, complaining about injustices at the hands of journal editors. There are proper channels, and, unlike journals (where getting a paper rejected by Annals costs you that hoped-for pay raise), you can just put the paper on your web page. I'd rather learn about the best whiteboard markers (no, not being facetious). – Igor Rivin May 14 '12 at 02:12
-
5The thread is about management of the arXiv which is the most important tool for a mathematician. There is obviously no alternative to the arXiv. Many people read new additions to arXiv every day and very few people actually read the printed version of Annals of Mathematics. Personally when I need a paper published in a journal, I search for it in the arXiv and read it online. The only real benefit of journals for me is that a published article has been refereed. – May 14 '12 at 02:42
-
27I have opened a meta discussion http://tea.mathoverflow.net/discussion/1358/arxiv-management/ and recommend moving the debate about the appropriateness of the question there. Please vote this above the fold. – Benjamin Steinberg May 14 '12 at 02:44
-
1Voting to close as no longer relevant based on Mark's comments on the meta. – Benjamin Steinberg May 14 '12 at 03:03
-
17@Mark Umm...I feel pretty strongly on this point, I'm not "in charge of the arXiv". I'm a math moderator, and chair of the math advisory committee. That is very far from being in charge. – Greg Kuperberg May 14 '12 at 03:36
-
@Greg: Is it correct that if an arXiv contributor receives a strange automatic response from the arXiv, then he/she needs to contact you directly (as was suggested here)? – May 14 '12 at 04:19
-
I edited the title to be more specific and match the question. – Noah Snyder May 14 '12 at 05:13
-
2@Mark No, if all else fails you can contact me directly. What you should really do is contact the arXiv admins. They should do better than respond with a stock reply that doesn't fit the facts. – Greg Kuperberg May 14 '12 at 05:14
-
28It is not appropriate to use the word "bitching" in a professional setting. – Noah Snyder May 14 '12 at 05:16
-
With the new title, the question is different, and the answer was given by Joseph (and later confirmed by Greg). Thus I changed the accepted answer. – May 14 '12 at 11:15
-
7@Noah: is this a faculty meeting? – Igor Rivin May 14 '12 at 12:19
-
7No, not as formal as that, it is a casual professional setting. Department tea is the most commonly used analogy here. An internal seminar is another good analogue. – Noah Snyder May 14 '12 at 16:31
-
I think the new new title is fine, and an improvement on the original. The new new title has the property that if you only read the title you won't be misled. The arxiv does indeed have a new or newish procedure of checking for similarities, but does not have new management. – Noah Snyder May 14 '12 at 16:52
-
1I changed the title to one that better reflects the discussion and the original question. The question is about new ways the arXiv is managed. The two examples are just examples and the facts discovered during the discussion (the overlapping software, the ways to resolve "conflicts" with the arXiv staff) go far beyond these examples. – May 14 '12 at 16:53
5 Answers
I'm still the chair of the math arXiv advisory committee, which admittedly hasn't done a whole lot lately, and one of the global math moderators. No, there has not been any dramatic change in the management of the arXiv at Cornell. If anything, I wish that by now more might have changed. The arXiv has always had the bare minimum funding, sometimes less than the bare minimum. They have never had polished public relations to properly explain small changes in policy. (Actually even wealthy Internet companies sometimes stir up confusion when they make changes.)
At some informal level, they/we have always worried about duplicate submissions, and near duplicates, and errata posted as new papers. And yes there is a new text overlap tool to detect both plagiarism and self-plagiarism. There is no good, rigorous way to draw the line for any of these issues. (Just as there isn't at MathOverflow --- what exactly is an "exact duplicate" of a previous question?) Regardless, if your submission is rejected, you do have the right to "file" an appeal with the Cornell staff. If it is a plausibly sane appeal, then they should show it to the math moderators and/or the math advisory committee, more likely the former these days.
One perfectly valid consideration is to have the arXiv correspond to what is published in journals, although there are cases where strict adherence to that rule is untenable. For instance, my mother and I have a joint paper in the Annals of Mathematics that appeared twice just because the first time, the paper had TeX symbol encoding errors.
Also, I personally think that this posting is reasonable for MathOverflow. However, it would have been better with a less suspecting tone. The arXiv doesn't always make the best impression, but long-time users know that actually it has gotten better over the years. For a long time it had a reputation as a "user belligerent" web site. Even then, it was still a force for good, obviously.

-
28You could claim that your paper with your mother was so good the Annals published it twice. – Chandan Singh Dalawat May 13 '12 at 07:06
-
"arxiv has always had a bare funding" comes as a bad surprise - the most important infrastructure project and it has bad funding. How can this be? – Alexander Chervov May 13 '12 at 08:07
-
19@Greg Kuperberg : Will an author be notified and given time to appeal before an accusation of plagiarism is attached to their posting? I've seen the accusations in the daily mailings, and usually when I check the two papers in question they don't look (on a quick glance) to be instances of plagiarism. An accusation in a public forum like that could be a real disaster for someone's career. And certainly there is sometimes a bit of shared text in papers of mine (eg how many different ways can I discuss the Birman exact sequence in the "preliminaries" section?). – Susan May 13 '12 at 13:41
-
2Strictly speaking, they aren't accusations of plagiarism, but just that part of the text is very similar to another paper (which could for example be plagiarism in the other direction). In any case, it's hard to appeal the question of whether there's overlap in the texts, unless the author claims the software simply made an error and the overlap is imaginary. – Henry Cohn May 13 '12 at 14:51
-
2@Susan These are good questions and I have some answers and not others. First, Henry is right that it is not strictly an accusation of plagiarism. The software only says "text overlap with..." when you recycle your own text. It says "...by other authors" when it was someone else's text, and "...by other authors without attribution" when it looks like real plagiarism. I do not know about notification. I know that it has to be flagrant, not just bits of text, for any of these labels to appear. – Greg Kuperberg May 13 '12 at 15:08
-
5Actually, I take it back. In most cases they do not make accusations of plagiarism, but it does happen. In http://arxiv.org/abs/1108.0977, the arXiv admin note reads "substantial text passages taken verbatim from Mantegna and Stanley (ref 1). Much additional text plagiarized from other sources without attribution, including from Farmer et al, "Is Economics the Next Physical Science?" Physics Today, Sep 2005 (arXiv:physics/0506086), and from arXiv:0810.5306, arXiv:cond-mat/0301096". I assume they only do this when it is really clear cut. – Henry Cohn May 13 '12 at 15:09
-
@Greg: Thanks! But for the answer to the question about papers "...I" and "...II" - should the authors just write to "Dear arXiv" and explain the situation? – May 13 '12 at 15:09
-
@Mark Writing to the arXiv staff in a situation like that is certainly a good start. – Greg Kuperberg May 13 '12 at 15:18
-
16@Greg: It turns out that the author did write to the arXiv staff and received a truly remarkable answer. Basically they say that she posted "too many" articles recently "with similar ideas" (!!), and that a moderator (anonymous!) suggests joining the papers into one. The author, by the way, is a Distinguished Professor in Mathematics and does not have as many papers in the arXiv as, say, Saharon Shelah (or even myself).
About plagiarism (or "overlaps without references" as arXiv puts it), I agree with Susan. That is a really dangerous thing - both for the authors and for arXiv.
– May 13 '12 at 15:51
-
1@Mark I'm happy to take a look at what really happened if you and your colleague forward this e-mail to me. At the moment, however, you've presented this as an anonymous accusation against the arXiv. I can't do anything about it in that form. In general, if you like the arXiv, then anonymous, incomplete, public accusations aren't the best approach. – Greg Kuperberg May 13 '12 at 16:03
-
1@Greg: I have told the author to send a message to you. I cannot disclose her name here for obvious reasons. I hope this matter will be resolved. – May 13 '12 at 16:12
-
@Mark In my opinion, it is completely reasonable to ask someone like me what is going on in this case. I have no objection to that; on the contrary, I'd rather be answerable. However, the claim that you can't disclose her name turned out to be neither obvious nor true --- just two minutes later she decided that she didn't mind. – Greg Kuperberg May 13 '12 at 16:40
-
2(by the way, "Susan" is actually me -- I sometimes ask questions under fake names for various silly reasons, and I forgot that I hadn't yet cleared my cache). – Andy Putman May 13 '12 at 18:02
-
11@Greg : Thanks for the answer. If the software is going to leave a comment about there being shared text, is the author notified in time to withdraw the paper? Frankly, I'd rather not use the arXiv if I don't have control over things like this or don't have an opportunity to protest before the (non)accusation goes out on the daily mailings. – Andy Putman May 13 '12 at 18:13
-
4Here's a question about the text overlap notice: if an author quotes literally his own (let's suppose long) theorem from a previous paper is this enough to trigger the overlap notice?
What is bothersome about the text overlap notices is that they insinuate plagiarism, but it is not clear that they necessarily signify plagiarism (in fact, in some cases I looked at it was clear they did not).
– Dan Fox May 13 '12 at 20:21
-
@Dan and @Andy I would agree that the text overlap policy is not completely thought through. I told them that myself months ago. In particular, I asked for a better warning to the author in the case of "self-plagiarism". I don't know the current status of that issue. However, I really doubt that just quoting your own theorem is enough to trigger the label. – Greg Kuperberg May 13 '12 at 20:44
-
@Mark et al, Andy/Susan reminded me of something. The psychology grad student Yla supplemented her online survey of MO users with telephone interviews a few months ago. One thing I said, as long as MO does not tie itself more closely with "traditional" publishing, MO will have underrepresented topics, too few women on site, and so on. The point is that a variety of "market" forces and governmental programs have gone into levelling things out in real life. I cannot say I thought as far as the arXiv, but anything we can do to make MO a good way to help get published is a plus. – Will Jagy May 13 '12 at 21:28
-
7@Greg: I could not disclose the name because I did not ask for permission to do that. That was the obvious (at least to me) reason which I referred to. – May 13 '12 at 21:41
-
2If substantial overlap is found in a section of a paper entitled something like "Background" or "Introduction", I wonder if arXiv could at least include the section title along with the declaration of overlap, if this declaration is made at all? (Maybe this would make it less likely that readers would jump to the wrong conclusion...) – Patricia Hersh May 13 '12 at 21:55
-
@Dan Fox: What was going on in the cases you looked at where it did not signify plagiarism? If we set aside self-plagiarism (which, if it is not explicitly acknowledged, is dishonest but certainly not as bad as plagiarizing from other people), then the cases look pretty unambiguous to me. I've only looked carefully at a handful of them, but each time it was clear that the overlap could not possibly be a coincidence. Of course, copying a few paragraphs is not as bad as copying a whole paper, but it's still a form of plagiarism. – Henry Cohn May 13 '12 at 23:11
-
9@Henry and Greg: what I was thinking above is that people may write several papers that each review some of the same definitions and key past results in a preliminary section, and I was concerned arXiv might now be picking that up as "substantial overlap". – Patricia Hersh May 13 '12 at 23:19
-
1@Henry: I looked at this some time ago, and looking again it seems my parenthetical remark is overstated, although one can find examples of articles (in some cases by well known mathematicians) that are signaled as having "text overlap" with papers by the same mathematician, and in which it looks as if what has been copied is introductory material, definitions, and the like. Since I probably mainly looked at the least suspicious cases, my judgment could be suffering from an obvious sample bias. – Dan Fox May 14 '12 at 05:48
-
4Probably this comment deserves a discussion of its own. What is self-plagiarism and why is it wrong? Some of the obvious objections presuppose the old-fashioned model of proprietary journal publishing. It seems that copying one's own material is bad style, and boring writing, and communicates a lack of seriousness, but those things don't mean that it is immoral or unethical (though perhaps it is for other reasons). Certainly representing something as new that is not is somehow unethical, particularly when the motivation is to inflate publication and citation counts. – Dan Fox May 14 '12 at 05:51
-
2@Dan Of course you can debate how much it really matters, but it seems worth labelling it as self-overlap. Patricia Hersh's point is well-taken, but there is some truly flagrant regurgitation out there. – Greg Kuperberg May 14 '12 at 06:01
-
2That is true, Greg, but a tiny set of people doing the overlap checks when a script automagically flags "suspect" papers in The One Repository™ is probably not the solution to that—people will develop better rephrasing skills... – Mariano Suárez-Álvarez May 14 '12 at 06:48
-
3When a performer performs the same concert in different venues to different audiences, is that self-plagiarism ? What about an academic writing about the same stuff in different venues for different audiences with perhaps a different emphasis. Why should every paper be a presentation of new results ? If all such papers are aggregated from different venues into a single venue then it looks like "self-plagiarism" but that phrase is an ungenerous judgement about motives. The arXiv is a repository, not a venue. Just like YouTube has multiple "copies" of the same stuff from different venues. ... – user19172 May 14 '12 at 07:34
-
3... If publication counts don't distinguish between papers that are highly original and papers that restate things in a different way then that just shows the limitations of using publication counts as a measure of anything. – user19172 May 14 '12 at 07:36
-
5@Henry and Greg: Here's an example. When I search today on ArXiv for "text overlap" the very first example that appears is by some serious mathematicians. This 2012 paper repeats some proofs of some technical lemmas from a 2008 paper by the same authors that has never been published (i.e. in a traditional venue). Clearly the authors want to have a published source for the proofs. The problem stems from the fact that many people do not consider posting to the ArXiv to be publication, rather they regard it as a "preprint" - something not yet necessarily in its definitive form. – Dan Fox May 14 '12 at 07:39
-
7If an institutional repository is going to signal certain papers in a way that could easily be construed at a glance to indicate unethical behavior on the part of the authors, then that institution has a responsibility to enunciate clearly what it regards as acceptable and not acceptable, and what are its standards for signalling what it signals. This matters particularly because a quick review suggests that, as Henry Cohn said, most of the signalled examples are clearly plagiarism in the obvious and unpleasant sense (and flagging these seems to me quite reasonable). – Dan Fox May 14 '12 at 07:47
-
2@Dan: That's a good example. Fortunately what's going on would be clear to anyone who looked into it, but I can see how it would upset the authors. @Unknown: Recycling nontrivial amounts of text is generally suboptimal (rewriting would create a better fit for the venue and audience), which is a good reason to discourage self-plagiarism. However, there's no ethical problem provided you clearly announce what you're doing (and have legal permission if you signed over the previous copyright, I guess). The problem occurs only when you don't. – Henry Cohn May 14 '12 at 11:40
-
5@Henry Cohn: Recycling definitions is normal and definitions can be quite long. The few examples that I saw do not constitute self-plagiarism in any way. The latest is arXiv:1205.2434. I hope the arXiv administrators know what they are doing. – May 14 '12 at 14:47
-
3@Mark: 1205.2434 is a case like the one Dan mentioned (perhaps the same case), where they are taking nontrivial chunks of text from an unpublished paper (0801.1513). As they write, "Some steps in the proof of Theorem 1.3 (notably Propositions 6.2 and 6.3) already appeared in an unpublished manuscript by the authors (see [FV08c])." This is clearly fine: 0801.1513 wasn't even published, and in any case they explicitly acknowledged the overlap. – Henry Cohn May 14 '12 at 18:51
-
2@Dan: So what do you think about it? It does not look to me that anything inappropriate has been done by the authors of these papers. The fact that the arXiv robot does not distinguish "good" from "bad" is potentially very damaging. If this software is to be used, the moderators have to be involved in every case. I think it is obvious and the fact that the arXiv people have not thought about it is very surprising (to say the least). – May 17 '12 at 01:44
-
1@Mark: I gave that example to illustrate why the current manner of flagging seems to me to require some further reflection. Many of us (rightly or wrongly) interpret the flagging as insinuating plagiarism; on the other hand this example is one where there seemed to me nothing at all problematic with what the authors had done, and moreover what they had done could even be considered necessary from an old-fashioned point of view on "publication". I did not cite the example directly precisely because calling attention unnecessarily to perfectly reasonable behavior seems an error. – Dan Fox May 21 '12 at 08:24
I talked to the arXiv staff about Olga Kharlampovich's submissions and I now have some answers. The letter that Olga posted here is a form letter that doesn't fit the facts. The text overlap tool reported that the new submission substantially overlapped with the old submission. After that, as far as I know, no moderator and no advisory committee was ever contacted. Instead, an arXiv employee sent this stock response just to keep things moving. After that, I was told, her case was added to the to-do list. I was assured that as of last week, before this question was posted to MathOverflow, her submission was already slated to be reverted in her favor on Monday.
Obviously this is not satisfactory. I am one of the moderators (and not the only one) who should have seen the appeal. The e-mail said that someone like me had seen it and rejected her appeal, but apparently no such thing happened. It seems that the submitted version (which I think is now version 3) had something like 75% text overlap with the previous version (version 2) of arXiv:1111.0577. It's not so unreasonable to flag such a submission. After that it wasn't handled properly. I do not want to name names and lead people to pour opprobrium on the overworked arXiv staff. (There are only two of them who handle daily submissions.) But I want to make this story sound accountable, so I can say that some of my information came directly from Paul Ginsparg.
To go back to the title question, no there has not been any great change in arXiv management. You could certainly argue that there is insufficient management, but that's not the same thing.
People are also asking about the policy by which papers are labelled as having text overlap with other papers. A clearer statement of that policy would be useful, but that is a separate question from Olga's case.
According to an e-mail that I just saw, this morning Olga was given the option of reverting the previous arXiv paper to Part I and submitting Part II separately. Her answer, according to what I saw, was that she elected to keep it as a replacement after all. I am mentioning this so that readers who see arXiv postings this week won't think that injustice continues.
I stand by my explanation that the stock e-mail that she was sent didn't fit the facts, and that her appeal should not have been stonewalled. (In fact her appeal was soon seriously considered internally, but that was not explained.) However, in the original posting, Olga's name was withheld supposedly to protect her interests. Although I understand that anonymity is sometimes vital even in a public accusation, in this case I don't see how it helped matters.

-
10@Greg: I have explained to you why I did not disclose the name. I repeat the reason: I did not ask for permission to do so. There is nothing in it about "protecting interests". It is basic human behavior (taught in preschools, right after potty training). In your answers, you did not say what is the formal procedure to appeal arXiv's decision. The first step is clear: a message to the admins. What is the second step after an automatically generated reply is received? A message to you? Or is there an intermediate step? – May 14 '12 at 18:30
-
2If you send an appeal to the arXiv staff, then ordinarily you would get a human answer and not just an automatically generated reply. If that process doesn't look reasonable, then you can send e-mail, not necessarily to me personally, but to an appropriate moderator (see http://front.math.ucdavis.edu/categories/math) or to the advisory committee (see http://front.math.ucdavis.edu/about , although the physics committee there is not current). – Greg Kuperberg May 14 '12 at 18:46
-
Greg, you mention that the arXiv staff are overworked, and indeed, I've always imagined them to be completely rushed off their feet. (It's a bit depressing that even the arXiv can't afford enough staff.) I also assume that the moderators are very busy people with a hundred other things to do, and they probably don't receive anything like the thanks they deserve. Given how overworked everyone is, I can easily understand why things that should ideally be done would get left undone: no one has the time. (Continues)... – Tom Leinster May 14 '12 at 23:25
-
... But here's the thing: most of the arXiv problems I hear about result from the staff/mods doing too much, not too little. I'm talking about things like the case under discussion, or admins making changes to metadata against the author's will, or moderators reclassifying submissions. I'd have thought that in a situation where everyone's overworked, the default would be "don't intervene unless there's a very good reason", because intervention takes time - as does justifying it to authors, who may have valid objections. – Tom Leinster May 14 '12 at 23:26
-
@Tom Exactly as you say, the problems that you hear about. This impression is a result of selection bias. There is usually no public discussion when they intervene or don't intervene when there is a very good reason. But if they ever make a mistake and overreact, then specific people have an incentive to air their grievances in public. It's the same way with, for instance, flight attendants. (Or MathOverflow maintainers.) – Greg Kuperberg May 15 '12 at 03:51
-
@Mark What you said above was "I cannot disclose her name here for obvious reasons." It did not occur to me that the obvious reason was that you simply hadn't asked her. I indeed thought that you meant that she had some material interest to stay anonymous. – Greg Kuperberg May 15 '12 at 03:54
-
Greg, how could the 75% overlap of the new submission (from May) with version 1 (from November) be of any relevance if there is version 2 (from February)? There are two things that I'm concerned with: that you mentioned overlap with v1 and did not discuss overlap with v2 (is it because there was no or little overlap with v2?) and that the robot was looking at all at v1 in search of potential self-plagiarism when there is v2. That sounds like an obvious mistake in the code, contrary to your "not so unreasonable to flag". Indeed, ... – Sergey Melikhov May 15 '12 at 11:00
-
... is it a new arXiv policy that material can no longer be moved (and not just copied) from one arXiv preprint to another (whether this involves division of one preprint into several or not)? If so, this is a huge change of policy in my eyes: I would then seriously consider refraining from any further submissions to the arXiv, and perhaps even "withdrawing" the existing ones in favor of homepage copies or some still surviving arXiv competitors, and hopefully I'm not the only one concerned with this. Could you please address these issues more explicitly in your answers in this thread? – Sergey Melikhov May 15 '12 at 11:14
-
(I did not get whether the authors also submitted a v3 in May simultaneously with their new preprint "... II"; if so, then of course the robot should have checked the new preprint against v3, not v2 or v1.) – Sergey Melikhov May 15 '12 at 11:24
-
@Sergey I don't see what it accomplishes to interrogate me about what the three versions of arXiv:1111.0577 have in common. When I look at the three versions, they simply look like three versions of the same paper. I think that I misspoke when I said version 1 and I will edit my answer. As for your policy question, there is no policy that you can't split one paper into two papers; you certainly can. – Greg Kuperberg May 15 '12 at 19:31
-
Greg, thanks for some clarification. I'm glad that arxiv still allows splitting one paper into two. I think I was sufficiently clear on the significance of versions in this case. Unfortunately, the story of Olga Kharlampovich's new submission remains a mystery for me after all your explanations: if her new submission had a 75% text overlap with the latest version of 1111.0577 existing (either accepted or under review by the arxiv staff) at the moment of the new submission, then how could she be given the option of keeping these two nearly coincident preprints in the end? – Sergey Melikhov May 16 '12 at 08:33
-
@Sergei: The new paper is 4 pages long. The old paper is 12 pages long. 75% overlap would mean 3 pages of the new paper are contained in the old paper verbatim. That is clear nonsense because the results proved in these papers are quite different. I suspect that the program they are using was created by an undergraduate student right after the first C++ class for a Summer REU project. – May 16 '12 at 10:32
-
Mark, I didn't see the new paper and I don't know which version you mean by the "old paper" (v2 or what was intended to replace it before robots came in). It could be that you and Greg simply refer to different versions. In any case, what Greg said sounds somewhat inconsistent. If the 75% overlap is indeed an artifact, I would expect him to acknowledge the problem with their new software (and not just public relations and overworked staff), to assure us that the arxiv team is working on fixing the bug, and to actually apologize, on behalf of the arxiv management, that things went wrong. – Sergey Melikhov May 16 '12 at 12:25
-
I did not see the papers either, that info is from the authors. They finally modified the original version (v2, I guess) to include the new results. I do not think this is good, but it was simpler than arguing with a robot. I do not think Greg Kuperberg is the person who introduced the software and is responsible for running the arXiv. He is on the "advisory committee" whatever that means. He can probably "advise" the people who run arXiv but he is not in a position to issue apologies for them. Certainly MO is not a good place for this anyway. – May 16 '12 at 13:43
-
(1) Mark is absolutely right, I can and do advise the people who run the arXiv, but I can't really speak for them. (2) There isn't any "bug" in the text overlap program, which was written by Ginsparg himself. There was a "bug" in how humans made use of it. (3) Olga was asked on Monday what she preferred, by a human and not a robot, and she said that version 3 should be posted. – Greg Kuperberg May 16 '12 at 14:33
Hello, I have paper 1 in the arXiv (it is submitted to a journal), and I submitted paper 2 with completely new results (with similar formulations, and referring to paper 1). I didn't want to change paper 1 because it is submitted, people refer to it, it makes a bad impression when new revisions keep being made, and the submission date changes. The second paper was returned by the arXiv; I appealed, and this is their response:
Dear Olga Kharlampovich,
Our moderators have considered your appeal and maintain that your article is not appropriate as a new submission to arXiv. The new ideas should be incorporated into a replacement of your existing article.
In general the maintainers of arXiv choose to exercise very limited control over submissions; however, we do want arXiv to be as useful as possible for all of the various communities publishing here.
A moderator noticed that you have submitted several articles in a short period with similar ideas and content. After a discussion of your submissions among the other moderators and members of the advisory committee, we have decided to ask you to consolidate articles with similar content, or which are variations on the same theme into single articles.
This will be more efficient for the whole arXiv community, and may be beneficial to you as well. In consolidating your work you may find that you can more clearly elucidate the connections and expose the underlying principles so that your ideas will be more useful to others.
-- arXiv moderation
Let me add that the "several articles in a short period" were these Articles 1 and 2. The first one was submitted in the Fall, and the second in May. I incorporated them into the same article now, but I think this is silly. What is going to happen if we get new results on a similar topic?

-
I'll look into it. It would be helpful to know the date of this correspondence. – Greg Kuperberg May 13 '12 at 16:27
-
I find this decision by the arXiv, as set out in their response, somewhat alarming. – Yemon Choi May 13 '12 at 16:30
-
@Yemon Maybe so, but I would like to first find out what really happened. – Greg Kuperberg May 13 '12 at 16:33
-
If actual humans are involved, they should presumably be sufficiently subject-aware to know that, for example, @Olga Kharlampovich is a distinguished mathematician, and so give her the benefit of the doubt. This is particularly weird since in physics (which probably has two orders of magnitude more volume) there are plenty of papers which are identical to first order. – Igor Rivin May 13 '12 at 20:41
-
@Igor In fact, the math arXiv has had impressive growth and now receives more than 1/4 of total new submissions to the arXiv, so it is certainly not the case that physics has two orders of magnitude more volume. Also, as recounted elsewhere, Olga was quietly given the benefit of the doubt despite miscommunication with her. She is indeed distinguished, but that's not the reason that her case was reviewed last week. The ideal would be to give everyone the benefit of the doubt. – Greg Kuperberg May 15 '12 at 04:59
This is likely unrelated to the changes that Mark and Will noticed, but the other day a novel (to me) arXiv admin note (under Comments) caught my eye:

-
@Joseph: Wow! An admin actually read both papers! Is (s)he getting paid for that? And it took just 2 days to discover the overlap. Amazing! – May 13 '12 at 02:14
-
@Mark: I suspect, in this case, "admin" = "software bot." Overlap perhaps confirmed by a human admin... – Joseph O'Rourke May 13 '12 at 02:16
-
Maybe the comment is autogenerated, and the text comparison is done by a computer? – Mariano Suárez-Álvarez May 13 '12 at 02:17
-
@Mark: I believe the text overlap is checked (at least at the first stage) by machine. Paul Ginsparg (the original person behind arXiv) and his collaborators had a paper about it, see http://arxiv.org/abs/cs.DB/0702012 . – Yuji Tachikawa May 13 '12 at 02:18
-
Looks like they're screening papers for plagiarism. And they seem to be quite careful about false positives (the paper is still available despite the text overlaps, the Smarandache spam and the telltale math.GM classification). – darij grinberg May 13 '12 at 02:20
-
It's definitely automated, but I don't know how much human verification is involved. They distinguish between some or minor text overlap, text overlap, and substantial text overlap, and they mark some as "without attribution". – Henry Cohn May 13 '12 at 02:20
-
@Mariano and @Joseph: I would like to know more about this program. Are they comparing each new paper against all old papers (perhaps with the same "Tags")? What if the second paper changes notation a little or uses "centre" instead of "center"? Anyway, this seems like a positive change, but making a mathematician join two papers into one seems very strange. I guess the "bot" checks that there was a paper by the same author with a similar title and brings it to the attention of an admin. But then there must be an admin responsible for making requests. My question is: "Who is that?" – May 13 '12 at 02:26
-
I took a quick glance at the two papers and the overlap appears to be that both discuss the same result due to "E. Study." This discussion is exactly the same (word for word), and goes a little beyond a formal statement of the theorem. But I do wonder how sensitive their software is - will statements of standard theorems be flagged if the theorem has been stated the same way in another arXiv paper? – Dan Ramras May 13 '12 at 02:29
-
@Yuji: Thanks! It does not answer my question though, unless the admin thinks that the paper "... II" plagiarizes paper "... I" of the same authors. – May 13 '12 at 02:31
-
I noticed this earlier (I think in February, actually). A recent example is http://front.math.ucdavis.edu/1205.1209 – GH from MO May 13 '12 at 02:35
-
GH's example is a case in which part III of a paper has been flagged as having overlapping content with part I. In this case, it just says "text overlap with arXiv:1203.0749" (no mention of "attribution"). – Dan Ramras May 13 '12 at 02:51
-
@Mark, I found the full 13-page report at http://ecommons.library.cornell.edu/handle/1813/5743 with free download. – Will Jagy May 13 '12 at 03:11
-
@Will: Thank you, it is interesting! So I guess the process is this: the software finds out that papers "...I" and "... II" overlap, and sends this to a human being. The human being then decides that this is not good and asks the author to join the two papers. In that case I would like to know who that human being is (because (s)he makes important decisions concerning important papers). – May 13 '12 at 03:21
-
It also just occurred to me that if arXiv decides that there is an overlap when there isn't one, it may ruin careers of the authors, and (at least if the authors are in the US) result in a large lawsuit against the arXiv. So the admin that makes these decisions should be quite good at it and probably well paid. – May 13 '12 at 03:32
-
So far, no one from arXiv said a human is involved: maybe the computer automatically generates the message suggesting a combination of the papers? – Gerald Edgar May 13 '12 at 12:31
-
I am the happy owner of two articles with 'text overlap'. They are similar results, similarly structured, but the proofs are very different. The overlap is most likely in the historical discussion at the beginning. I'm not too happy about the admin note, but I'm not sure what I can do about it now. It's not like I can edit the published articles and change their introductory discussion. – Glen Wheeler Jun 08 '12 at 19:37
-
Let me add that the "several articles in a short period" were these Articles 1 and 2. The first one was submitted in the Fall, and the second in May. I incorporated them into the same article now, but I think this is silly. What is going to happen if we get new results on a similar topic?
-
Thank you for mentioning the date ranges, that's helpful. Otherwise, this "answer" in the thread should be combined with the other one. – Greg Kuperberg May 13 '12 at 16:35
-
Welcome to MathOverflow, Prof. Kharlampovich. Any question or answer you post (but not comments) should be editable by you. For matters like this, you are encouraged to edit your answer rather than submitting a new answer. In future (and because MathOverflow submissions are much shorter than arXiv submissions) I hope you will make appropriate use of the edit feature. Gerhard "I Know; Still, Please Edit" Paseman, 2012.05.13 – Gerhard Paseman May 13 '12 at 16:38
-
Since this is CW, I appended this remark to the other answer. This one should be removed. – Greg Kuperberg May 13 '12 at 21:20
-
Surely someone else has noticed the humor in asking Olga Kharlampovich to combine this "part II" answer with her "part I" answer! – Steve D May 13 '12 at 22:13
-
I hope Prof. Kharlampovich has noticed the respect with which the request for future edits on MathOverflow was made. I even put in a sympathetic signature. If she finds it humorous as well, so much the better for her. Gerhard "And Better For Us All" Paseman, 2012.05.13 – Gerhard Paseman May 14 '12 at 02:24