11

Let's say that I want to prove that a language is not regular.

The only general technique I know for doing this is the so-called "pumping lemma", which says that if $L$ is a regular language, then there exists some $n>0$ with the following property. If $w$ is a word in $L$ of length at least $n$, then we can write $w=xyz$ (here $x$, $y$, and $z$ are subwords) such that $y$ is nontrivial and $xy^{k}z$ is an element of $L$ for all $k>0$.

This lemma basically reflects the trivial fact that in any directed graph, there is some $n$ such that any path of length at least n contains a loop.

Question: are there any other general techniques for proving that a language is not regular?

Andy Putman
  • 43,430

6 Answers6

18

Let Ln be the number of words in L of length n. If sum L_n x^n is not a rational function, then L can't be regular. See the proof in the comments.

Qiaochu Yuan
  • 114,941
  • 3
    THAT. IS. AWESOME. Where can I read more about this? – Andrew Critch Oct 20 '09 at 04:29
  • 3
    Flajolet and Sedgewick's text "Analytic Combinatorics" is available online here: http://algo.inria.fr/flajolet/Publications/books.html

    The proof can be done using both definitions of a regular language: if you define a regular language in terms of a regular grammar, this is equivalent to specifying the above generating function using sums, products, and the "Kleene star" 1/(1-x). If you define a regular language in terms of recognizability by a state machine G, then you can extract the generating function by considering 1/(I-At) where A is the adjacency matrix.

    – Qiaochu Yuan Oct 20 '09 at 04:47
  • Simply awesome. Thanks for making me aware of this. – Andrew Critch Oct 20 '09 at 04:51
  • 1
    I'm curious about the "language is unambiguous" part. I would have thought that unambiguous-ness was a property of a grammar, not a language. What does "unambiguous" mean, and how do you prove it for the language of Dyck words? – Reid Barton Oct 20 '09 at 04:53
  • Hmm. According to Google there is a notion of "inherently ambiguous" language, which is a language with the property that every grammar describing it is ambiguous. But the search results I'm getting don't agree on whether there exist inherently ambiguous regular languages. – Qiaochu Yuan Oct 20 '09 at 05:04
  • Oh, it shouldn't be an issue because we can apply your second argument to a DFA which recognizes the language. – Reid Barton Oct 20 '09 at 05:09
  • According to Wikipedia, the language { x^ay^bz^cw^d | (a=d and b=c) or (a=b and b=c) } is inherently ambiguous. – Tom Church Oct 20 '09 at 21:57
  • Yes, but is it regular? I don't have much intuition for these things. – Qiaochu Yuan Oct 20 '09 at 22:09
  • No: applying Myhill-Nerode, note that x^k and x^j can be distinguished by w^k. Thus this language is not regular. (I had misunderstood your question; I don't know if there is any regular language that is inherently ambiguous.) – Tom Church Oct 21 '09 at 08:13
  • Diego de Estrada says "a regular language [is always unambiguous], because there exists a DFA that accepts it" on this page: http://mathoverflow.net/questions/563/is-the-diagonal-of-a-regular-language-always-context-free/1864#1864 – Tom Church Oct 22 '09 at 16:38
  • As to intuition for regular languages (and hence also for sofic shifts), suppose a very long word is displayed on a screen one letter at a time (let's say you can press a button to see the next letter, but no buttons for going back), if you can decide if the word is in L with a bounded amount of memory (either your memory or jotting things down and erasing on physical papers), then the language L is regular. And if you write your decision algorithm in an automata, you will only need a finite number of states because your algorithm requires a bounded amount of memory. – Yoo Nov 05 '09 at 08:53
  • For a tight discussion of disambiguation (by David Eppstein, of course!), see here: https://mathoverflow.net/a/45163/1650 . For a discussion of why a regular language has a rational generating function, see Stanley's discussion of the "transfer-matrix method''; you can find it as Theorem 4.7.2 of EC Vol I [page 242]. – Sam Nead Jun 14 '17 at 07:23
18

For necessary and sufficient conditions for a language to be regular (sometimes useful in proving nonregularity when simpler tricks like the pumping lemma fail) see the Myhill–Nerode theorem.

  • 3
    I'd say more (and do so when I teach the material). There's really no point in learning anything but Myhill-Nerode which is both more general, and natural (though I hate the name for the usual reason that it says nothing about the result). – Michael Albert Aug 14 '13 at 04:15
  • Sometime between when I left this answer and now I have come around to the same point of view. If we were only teaching about regular languages, I would drop the pumping lemma. But the pumping lemma for regular languages does still have some small use as a warmup to the pumping lemma for context-free languages, where we don't have Myhill–Nerode. – David Eppstein Jun 03 '17 at 04:42
9

Another good way to prove language L non-regular is to find a regular language A such that L∩A is non-regular.

For example, one can take A = a*b*, and prove that L∩A = {a^nb^n : n≥0}.

This method works because the intersection of two regular languages is always regular.

subshift
  • 1,110
6

The usual pumping lemma gives only a necessary condition for a language to be regular, but there are more powerful versions giving necessary and sufficient conditions, using "block pumping properties".

A. Ehrenfeucht, R. Parikh, and G. Rozenberg, Pumping lemmas for regular sets, SIAM J. Comput. 10 (1981), 536-541.

S. Varricchio, A Pumping Condition for Regular Sets, SIAM J. Comput. 26 (1997), 764–771.

A. de Luca and S. Varricchio, Finiteness and Regularity in Semigroups and Formal Languages (Monographs in Theoretical Computer Science. An EATCS Series), Springer 1999.
ISBN: 978-3-540-63771-4 (Print) 978-3-642-59849-4 (Online)

J.-E. Pin
  • 851
5

The following lemma, which I saw in some old book, can handle most textbook examples of pumping lemma usage and is usually much easier to apply when it does apply. Let $L$ be a language, and suppose there are strings $\alpha_1,\alpha_2,\ldots $ and $\beta_1,\beta_2,\ldots$ with the property that $\alpha_i\beta_j\in L$ iff $i=j$. Then $L$ is not regular (easy proof using a DFA).

Note that the $\alpha$s and $\beta$s are just strings, they don't need to be in $L$ and usually aren't.

Examples:

  1. $\alpha_k:=a^k$ and $\beta_k:=b^k$ shows $\lbrace a^kb^k : k\ge 0\rbrace$ is not regular.
  2. $\alpha_k:=a^k$ and $\beta_k:=ba^k$ shows the set of palindromes is not regular.
  3. $\alpha_k:=ba^k$ and $\beta_k:=ba^k$ shows $\lbrace ww : w\in\Sigma^*\rbrace$ is not regular.
Brendan McKay
  • 37,203
4

Some variants of the pumping lemma are complete for determining whether a language L is regular. For example, consider this two player game:

Player 1 picks a positive integer k. Player 2 picks a string w of length k. Player 1 picks a string partition w=abc, with b non-empty. Player 2 picks z so exactly one of {wz, acz} is in L. The last player to make a valid move wins.

Then L is regular iff Player 1 has a winning strategy. This comes down to Myhill-Nerode, as mentioned earlier. For a similar example, see Jaffe.

mic
  • 141
  • 1
  • 1
  • 4