The Prime That Wasn’t

» 03 August 2010 » In PHP, Tech »

I’ve seen a lot of problems solved with regular expressions, but last Friday, thanks to Chris and Sean, I found out that one can use them to determine whether a given integer is a prime number. The original articles showed the following regular expression:


You don’t apply this to the integer itself. Instead, you create a string of repeating 1s, where the numbers of 1s is defined by the number you’re testing, and apply the regex to that. If it fails to match, the number is a prime. The regex relies on backtracking and thus would not work with a DFA-based engine, such as PHP’s (now deprecated) ereg* functions. But, it works just fine with the preg_* ones, or so I thought (more on that in a bit).

So, how does it work? It may seem like a brain buster, but it’s actually pretty simple. Ignore the portion before the alternation operator (|), because it’s just a short-circuit case for checking whether the input string is empty or has a single 1. (If the string is empty, it means the number being tested is 0. If the string is a single 1, it means the number being tested is 1. Neither of these are prime.) The portion after the pipe is where the real magic happens.

The (11+?) sub-pattern matches strings 11, 111, etc. The \1+ matches whatever the sub-pattern matched, one or more times. So on the first try, the engine will match 11 and then attempt to match the same thing one or more times in the rest of the string. If it succeeds, the number is not a prime. Why? Because it just proved that the length of the string is divisible by 2 (the length of 11), therefore the original number is divisible by 2. If the overall match fails, the engine backtracks to the beginning and tries to match 111 two or more times and so on successively. If the first sub-pattern gets long enough (n/2 basically), and the overall match still fails, the number is a prime. Beautiful, isn’t it?

Coincidentally, Sean recently created a code evaluation plugin for the Phergie-based IRC bot that runs in the channel we hang out in. The plugin is a simple proxy to, but is helpful for doing quick code tests. We had fun experimenting with this regex pattern implemented in a PHP function that returns the next largest prime number after a given one. The trouble started when Sean fed it 99999999, and the code spit out 100000001. This didn’t seem right, and Wolfram Alpha confirmed; the answer we got was not a prime. (It factors into 17 * 5882353.)

A few similar tests also gave us numbers that were not prime. But, where was the problem? The PHP code was too simple to have a bug, many answers were correct, and the regex algorithm itself was sound. It was time for brute force. I quickly wrote some code to feed increasing odd integers into into regex pattern and check its answers against the normal sieve algorithm to see where it starts failing. The first number to fail was 22201; the regex said it was prime, but it’s actually a perfect square (1492). Past that, the failure rate increased.

It then dawned on me that the culprit might be the backtracking itself, specifically how it was implemented in PCRE, the library at the heart of PHP’s regular expressions. As I mention in my Regex Clinic talk, unbounded backtracking can dramatically slow down the matching and should be well-understood before attempting to write complex patterns. To manage the danger of runaway patterns, PCRE implemented the pcre.backtrack_limit setting a few years ago. In our case, backtracking is used to break up the string of 1s into successively larger chunks, and, with very long strings, the engine may run into this backtracking limit, which is set to 100000 by default. My guess was that with the 22201-character-long string, the default was not enough. Once the limit is reached, the match fails, and the number is declared prime.

I bumped the limit up to 200000, and voilà, 22201 was no longer declared prime. To fix the 100000001 match, the limit had to be raised dramatically, to around 250000000! And, the program took 14 seconds to deliver the verdict on my new i5 MacBook Pro. Needless to say, don’t use this prime determination algorithm for any sort of real-life stuff. Instead, just appreciate it for its elegance. I certainly did, and was glad that my bit of investigative work showed that abstract, pure, good ideas can still fail in the messy world of software and hardware.

Trackback URL

47 Comments on "The Prime That Wasn’t"

  1. andrei
    Andrew Mager
    03/08/2010 at 10:34 am Permalink

    Awesome post man

  2. andrei
    03/08/2010 at 12:22 pm Permalink

    I think, the methods of primality testing methods would go newer distance with this. Your PCRE slides are also epic tut for beginner to advanced. 🙂

  3. andrei
    03/08/2010 at 1:34 pm Permalink

    Thanks for sharing, it is really cool stuff! Thanks for mentioning “pcre.backtrack_limit”!

    So you discovered a very short reg exp. But do you know the correct one for email addresses? It shouldn’t be that long…
    Here it is:

  4. andrei
    Chris Shiflett
    03/08/2010 at 2:45 pm Permalink

    I think this is a more useful email pattern:

  5. andrei
    03/08/2010 at 4:19 pm Permalink

    You should add preg_last_error() to your code to check for this error.

  6. andrei
    03/08/2010 at 4:20 pm Permalink

    Ariel, yes, that is a good idea, because you’re bound to run into the limit sooner or later with larger numbers.

  7. andrei
    Łukasz Kurowski
    03/08/2010 at 6:45 pm Permalink

    Lol, awesome :d Thanks !

  8. andrei
    04/08/2010 at 1:04 am Permalink

    beautiful ….

  9. andrei
    Richard Quadling
    04/08/2010 at 1:25 am Permalink

    And that post is worth at least twice what I paid for it! Excellent.

  10. andrei
    04/08/2010 at 1:38 am Permalink

    @Richard, 0 × 2 = 0

  11. andrei
    04/08/2010 at 1:52 am Permalink

    “The \1+ matches whatever the sub-pattern matched, one or more times.”

    Hmm, correct me if I’m wrong or maybe this is a problem with egrep but:

    echo “h” | egrep ‘(h)\1+’
    yields nothing.

    egrep ‘(h)\1+’ seems to require 2 or more h’s….

  12. andrei
    04/08/2010 at 3:09 am Permalink

    @zitstif: (h) matches one h, \1+ matches one ore more instances of what was matched in the first subpattern (i.e. at least one more h).

  13. andrei
    04/08/2010 at 4:13 am Permalink

    Ive never-ending seen “+?” been used in a regex before. What does it mean? Is it equivalent to “*”?

  14. andrei
    04/08/2010 at 4:17 am Permalink

    Argh! That should have been “never seen”. Damn auto corrections…

  15. andrei
    04/08/2010 at 6:41 am Permalink

    14 seconds seems pretty slow, i ran 6 prime/non prime number tests in 14 milliseconds using C++ on a similar spec machine …is the overhead of PHP calling a C library that high ?

  16. andrei
    04/08/2010 at 6:45 am Permalink

    The +? is a lazy +, it forces the engine to match only once before trying permutations. Its an optimisation to reduce the overhead of backtracking…

    See here for examples:

  17. andrei
    04/08/2010 at 7:28 am Permalink

    Glad to see my regular expression has some use.

  18. andrei
    04/08/2010 at 7:37 am Permalink

    Thanks Emerson

  19. andrei
    04/08/2010 at 10:00 am Permalink

    Emerson, no, it’s not the overhead, it’s just how the algorithm works.

    Abigail, thanks for a great regex!

  20. andrei
    Paolo Bonzini
    04/08/2010 at 4:51 pm Permalink

    The regex is approximately O(n^2), as it has to visit at least half of the string for each length m of \1 ranging from 1 to n (the remaining part, which is at most half of the string, is the remainder n % m). For each m it will backtrack O(n/m) times, so the backtracking work will be O(n+n/2+n/3+…) which is O(n log n).

    If you use /^1?$|^(11+?)\1++$/ it will use only O(n) backtracking, but the cost will remain the same.

  21. andrei
    Artur Ejsmont
    05/08/2010 at 2:04 am Permalink

    Ok this is really cool.

    Im impressed!

    Well done.


  22. andrei
    05/08/2010 at 11:34 am Permalink
  23. andrei
    05/08/2010 at 2:43 pm Permalink


    Please give attribution. Also, she has refined this a lot if my jaded memory serves.

  24. andrei
    24/12/2010 at 1:13 am Permalink

    Nice! Thx!

  25. andrei
    15/01/2011 at 9:04 pm Permalink

    So, how does it work? It may seem like a brain buster, but it

  26. andrei
    05/04/2011 at 7:53 am Permalink

    Nice post. Few people appreciate what backtracking really means, and its limitations, when they write regexes for real world use.

    But I don’t agree with your last sentence. Neither backtracking nor the regex based prime test is abstract, pure, and certainly not a good idea. Perhaps its abstract or pure for some definition of those terms. But never a good idea. Its a bad idea on paper and as you demonstrated a bad idea in practice.

    The prime checker is a nifty little one-liner that did bring a smile to my face when I first came across it. But it is little more than a toy, and its weakness is apparent to everyone as soon as he sees it is building a huge string and backtracking on it.

  27. andrei
    Jonas Elfström
    06/04/2011 at 1:27 am Permalink

    You could mention that Abigail created this regex. The first mention I can find is in comp.lang.perl.misc back in Nov 23 1997.

  28. andrei
    Mauro Baraldi
    06/04/2011 at 7:47 am Permalink

    Hi Andrei,

    Congartulations by the post. I liked too much.
    Here is a Python solution based:


  29. andrei
    Arthur Corenzan
    06/04/2011 at 11:05 am Permalink


    Awesome job, really appreciated!
    Ruby is awesome too:

  30. andrei
    steve hermes
    06/04/2011 at 5:33 pm Permalink

    This makes programming fun !!!!

    Thanks for making it public

  31. andrei
    04/01/2012 at 3:06 am Permalink

    hello – is it just me !! can any one explain why when i type in the firefox browser “” i get a different site yet whe i type it in google its ok? could this be a bug in my system or is any one else having same probs ?


  1. KafeKafe » The Prime That Wasn’t 04/08/2010 at 12:28 pm

    [...] The Prime That Wasn’t » Andrei Zmievski. Share and Enjoy: [...]

  2. [...] fails to test primes, needs to be tested for prime prime test test. By foebea, on August 5th,…

  3. [...] The Prime That Wasn’t » Andrei Zmievski Priemgetallen vinden met regular expressions. Ayup. (tags: regex math) [...]

  4. [...] [...]

  5. [...] The Prime That Wasn’t ( [...]

  6. [...] [...]

Hi Stranger, leave a comment:


<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

Subscribe to Comments

Additional comments powered by BackType