Category > Tech

The Prime That Wasn’t

03 August 2010 » In PHP, Tech » 27 Comments

I’ve seen a lot of problems solved with regular expressions, but last Friday, thanks to Chris and Sean, I found out that one can use them to determine whether a given integer is a prime number. The original articles showed the following regular expression:

/^1?$|^(11+?)\1+$/

You don’t apply this to the integer itself. Instead, you create a string of repeating 1s, where the numbers of 1s is defined by the number you’re testing, and apply the regex to that. If it fails to match, the number is a prime. The regex relies on backtracking and thus would not work with a DFA-based engine, such as PHP’s (now deprecated) ereg* functions. But, it works just fine with the preg_* ones, or so I thought (more on that in a bit).

So, how does it work? It may seem like a brain buster, but it’s actually pretty simple. Ignore the portion before the alternation operator (|), because it’s just a short-circuit case for checking whether the input string is empty or has a single 1. (If the string is empty, it means the number being tested is 0. If the string is a single 1, it means the number being tested is 1. Neither of these are prime.) The portion after the pipe is where the real magic happens.

The (11+?) sub-pattern matches strings 11, 111, etc. The \1+ matches whatever the sub-pattern matched, one or more times. So on the first try, the engine will match 11 and then attempt to match the same thing one or more times in the rest of the string. If it succeeds, the number is not a prime. Why? Because it just proved that the length of the string is divisible by 2 (the length of 11), therefore the original number is divisible by 2. If the overall match fails, the engine backtracks to the beginning and tries to match 111 two or more times and so on successively. If the first sub-pattern gets long enough (n/2 basically), and the overall match still fails, the number is a prime. Beautiful, isn’t it?

Coincidentally, Sean recently created a code evaluation plugin for the Phergie-based IRC bot that runs in the channel we hang out in. The plugin is a simple proxy to ideone.com, but is helpful for doing quick code tests. We had fun experimenting with this regex pattern implemented in a PHP function that returns the next largest prime number after a given one. The trouble started when Sean fed it 99999999, and the code spit out 100000001. This didn’t seem right, and Wolfram Alpha confirmed; the answer we got was not a prime. (It factors into 17 * 5882353.)

A few similar tests also gave us numbers that were not prime. But, where was the problem? The PHP code was too simple to have a bug, many answers were correct, and the regex algorithm itself was sound. It was time for brute force. I quickly wrote some code to feed increasing odd integers into into regex pattern and check its answers against the normal sieve algorithm to see where it starts failing. The first number to fail was 22201; the regex said it was prime, but it’s actually a perfect square (1492). Past that, the failure rate increased.

It then dawned on me that the culprit might be the backtracking itself, specifically how it was implemented in PCRE, the library at the heart of PHP’s regular expressions. As I mention in my Regex Clinic talk, unbounded backtracking can dramatically slow down the matching and should be well-understood before attempting to write complex patterns. To manage the danger of runaway patterns, PCRE implemented the pcre.backtrack_limit setting a few years ago. In our case, backtracking is used to break up the string of 1s into successively larger chunks, and, with very long strings, the engine may run into this backtracking limit, which is set to 100000 by default. My guess was that with the 22201-character-long string, the default was not enough. Once the limit is reached, the match fails, and the number is declared prime.

I bumped the limit up to 200000, and voilà, 22201 was no longer declared prime. To fix the 100000001 match, the limit had to be raised dramatically, to around 250000000! And, the program took 14 seconds to deliver the verdict on my new i5 MacBook Pro. Needless to say, don’t use this prime determination algorithm for any sort of real-life stuff. Instead, just appreciate it for its elegance. I certainly did, and was glad that my bit of investigative work showed that abstract, pure, good ideas can still fail in the messy world of software and hardware.

Chrome Not Quite Shiny Yet

27 June 2010 » In Opinion, Tech » 7 Comments

A month ago I decided to stop using Firefox as my main browser, because I was becoming frustrated with its sluggishness and general stability issues. I wanted to give a new, modern browser a chance and Google Chrome seemed like the most fitting choice. I’ve been using it diligently and exclusively since then and wanted to share some good and not so good things that I found out after a month of daily usage. The caveat is that I’ve only used it on the Mac platform, so my findings may differ for Windows/Linux users.

  1. It’s fast and stable. Pages load quite a bit faster, if only perceptually. And while the memory usage is not that much lower than FF’s – after all, those DOM trees and images have to go somewhere – I have never had to force-quit (or even just quit) Chrome the way I had to with FF after a couple of days. This is a big win in my book.
  2. The built-in developer tools are nice, but are not quite as good as Firebug, it seems. I will have to try them out on a project before I can say this with certainty.
  3. There is a lack of integration with 3rd-party tools and websites I used with FF, mainly 1Password and Delicious. Yes, there are Chrome alpha/beta extensions for both, but I miss the full functionality the FF versions offered. Hope the vendors hurry up and bring them up to feature parity.
  4. I miss the quick search engine access that FF has in the upper right corner. That’s how I searched Amazon, Wordreference.com, and several other sites. I can’t find an equivalent in Chrome.
  5. There is no way to configure proxy settings through Chrome itself. You have to use a command-line parameter (who does that on a Mac though?), an extension, or do it through OS X’s System Preferences.
  6. It’s from Google. So it’s just what I’ve come to expect from their products – great engine and features combined with some usability problems and a lack of polish.

Overall, I’m pleased with Chrome and will continue using it as my main browser unti something better comes along.

Update: As the comments below indicate, it is possible to configure and use various search engines from the “omnibox”, aka the search bar. Right click and select Edit Search Engines.

Voting opens for my SXSW panel

19 August 2009 » In PHP, Talks, Tech » 1 Comment

I have not yet been to South by Southwest festivals. And every time I mention this to friends and colleagues, they make big eyes and say, “You have to go!”. Well, I finally cracked and submitted a panel proposal for the 2010 SXSW Interactive portion. The panel is titled “Travelog with Maps: When 1,000 Photos Aren’t Enough” and will present and discuss ways to preserve the memories and tell the stories of your trips using latest geolocation tools, social data, and a bit of code. Chris Shiflett will be my co-panelist. We decided to do this panel after trying to share our Iceland trip in real-time and discovering what is and isn’t there to support it.

If you like this idea, please vote for this panel. Thank you!

Plaxo Pulse DOA

05 November 2007 » In Tech » 3 Comments

As if the current proliferation of social networks was not enough, Plaxo has recently launched its own offering called Pulse, in the best tradition of branding-via-metonymy. First of all, “Pulse”? I am generally not in the habit of checking my friends’ vital signs several times a day, so that kind of got lost on me. Maybe they could have done better with Spasm, Borborygm, or ultimately, Omphaloskepsis, since that’s basically what social networks are.

Anyway, what I really wanted to say is that Plaxo Pulse fails. Out of the gate. Dead on arrival. Why? Well, ever since it launched I’ve been receiving notices that such-and-such has added me as friend or wants to add me as a business contact. These notices provide a link to go to Pulse site and confirm the connection. Not recognizing one of the names, I decided to clicke on the link to check it out, but all I saw was a page that said, “Not a member yet? Sign up!” Are you freaking kidding me? You expect me to sign up just to see who wanted to add me as a contact? No thanks, Pulse. You lost me at “click here”.

Web 2.0 Law #1

11 May 2007 » In Tech » No Comments

Now that Yahoo! Photos is closing down, Flickr is going to experience another surge in membership. It is already the most popular photo sharing service that matters (forget about Photobucket and such), and it has an unparalleled API. So with that in mind here’s the first of Andrei’s Web 2.0 Laws.

Law #1: As the number of mashups increases, the probability of a new mashup not using Flickr approaches 0.

Proof, what proof? This is not math or physics here, this is Web 2.0. But do go and take a look for yourself; Flickr infiltration is everywhere. And if you find a mashup that doesn’t use Flickr API, then there are only two explanations: a) the authors have thought or are thinking about using Flickr in one way or another; b) they are Amish.

I, for one, welcome our new megapixel overlords.

“PHP Eats Rails for Breakfast”

21 October 2006 » In PHP, Tech » 3 Comments

That is indeed the title of the article on Ohloh, a site that collects information on open source projects and gathers statistics by analyzing the source code of those projects. So far they’ve indexed over 3,000 projects and their conclusion seems to be that among Web scripting languages, PHP is the undisputed champion (as measured by the LOC count).

Measured purely by the number of new lines of code, PHP leaves all other web-scripting languages in the dust, and continues to grow. Quite simply, one-fifth of all open source code being written today is written in PHP.

This, combined with their observation that the relative number of developers working in PHP is not increasing and that the number of new projects being started has actually decreased, leads to an interesting premise: the trend for for PHP-based projects points towards mature code bases. Meaning that the developers prefer to work with existing applications or frameworks and increase their output through updating these applications rather than starting new ones. In general, that seems to be a good trend, if one can rely on the data gathered by Ohloh. I have not done an exhaustive examination, but there are some puzzling things in the data on PHP project itself. For example, how do they calculate “man-years” for each developer? My number is 5.5, which puts me fourth on the list behind Rasmus, Derick, and Frank. I know I’ve been working on PHP longer than that. And then, Andi and Zeev are actually below me with 5.3 and 5.2 man-years respectively. The account with the most man-years seems to be our automatic changelog committer script, even though it committed a total 0 lines of usable code. :-)

Even without Ohloh’s data, one could venture to say that the code base of projects using PHP is larger than those of Ruby or Python. A lot of that is due to the accessibility of PHP, I think, but does pure lines-of-code count give indication to the quality of said code? Of course not. I think Ohloh is providing a useful service by tracking this kind of information, but I would take its results with a few grains of salt.

Hot and .. Not So

15 July 2006 » In Funny, Tech » 4 Comments

Thought I’d throw a couple of fun links your way. First one is a project that almost won $5,000 prize at the last Mashup Camp. It presents, shall we say, an innovative approach to user validation combining so-called business with so-called pleasure. HotCaptcha gets thumbs-up from me any day of the week.

The other is a supremely strong candidate for the title of the Worst Music Video Ever. It is an inspired effort that immediately induces cringing expression on your face and fails to release you from its grip until (in my case) 3 hours later. Have fun.

Half-kingdom for the Desktop

21 June 2006 » In Tech » 1 Comment

Last week was the first time I had to do a hard shutdown on my Powerbook. It was a strange feeling. There I was, doing something that used to be a regular occurrence in the Windows world, and now it really bothered me that I had to hold down the Power button for 5 seconds and wait for my Mac to commit temporary suicide. And I was apprehensive for a good reason.

After powering back up, I discovered that my desktop (lowercase) was gone. No icons, and I could not drag anything to the background at all. I still had ~/Desktop folder but it seemed to be disconnected from the actual display. No one I asked about this has encountered the problem before, but I had a hunch it had to do with a corrupted preference file or something similar.

After much muckying around (creating a new user, making sure its desktop was operational, running Preferential Treatment over all .plist files, doing a binary sieve on the .plists to find the offending one) I narrowed it down to com.apple.finder.plist. A bit more yahoo’ing revealed that the culprit was a “secret” preference called CreateDesktop. If set to false, it tells Finder not to display the desktop. Brilliant. Somehow this got turned off during the hard shutdown and my desktop got lost at sea. Anyways, if you ever get sick of your Desktop (or want to restore it, like I did), you can change the setting via:

defaults write com.apple.finder CreateDesktop -bool <false or true>

Until later. I am going to go click on all the icons.

UPDATE: Fixed the HTML escaping of the command (the bool arguments were hidden).

Discover New Music

16 February 2006 » In Tech » 1 Comment

Inspired by the latest entry at Yahoo! Music Blog, I registered at Last.fm and started feeding my playlist information to them via the AudioScrobbler iTunes plugin. My profile there is slowly building and I am looking forward to checking out what their customized radio station will start playing for me once they know enough of my tastes. Their recommendations so far seem to be decent, but I need to listen to about 300 tracks before the music “neighbors” become available. As a side benefit, I also put the RSS feed of the last 10 tracks played on frontpage of this site in the right hand column.

I also explored Pandora, the other service mentioned in the blog. From what little time I spent with Pandora, it seems fascinating. Created by the Music Genome Project, it allows you to specify an artist or a song and create a “radio station” that plays songs that are musically similar to the specified one. What does “similar” mean? Well, the folks at Music Genome Project analyzed over 10,000 songs of different artists and broke them down into traits, such as harmony, rhythm, instrumentation, orchestration, arrangement, lyrics, singing and vocal harmony. Given a song, for example “Volcano” by Damien Rice, Pandora will create a station that features songs with mellow rock instrumentation, folk influences, mild rhythmic syncopation, acoustic sonority, repetitive melodic phrasing, and other similarities to the original, picking things like “Smile” by Mia and Jonah and “Igloo Glass” by Holopaw. You can add more songs or artists to the station and Pandora will try to pick tracks that cover the whole gamut of what you are looking for. You can also thumbs up or down the individual songs to fine tune your preferences and build up a favorites list. I would say its recommendations range from good to very good and I definitely intend to use it as a tool for discovering new music.

Recognize This

05 October 2005 » In Hacks, Tech » 8 Comments

Face recognition technology is getting really good. Yesterday I saw a link to Intel’s OpenCV library float through the mailing list at work and a note that someone wrote a PHP extension for it. “Interesting”, I thought. I hacked up a simple PHP script that would take an image and process it slightly to make detected regions more obvious. Here’s an example of the output. Not bad, huh? Then Jeremy tried another image, with some spooky results. Note that aside from the person, there are a couple more regions that the library thought was a face. If you look closer, the larger rectangle on the carpet encloses something that does have vague face-like features. Nice job, Intel.

Page 1 of 3123