Category > Hacks

NetflixQueueShuffler is Updated

02 September 2006 » In Hacks, Movies » No Comments

An eagle-eyed user, Mike Ryan, noticed that Netflix changed the structure of the Queue page and sent in a patch. Download the updated NetflixQueueShuffler.

Un-probable Sentences

12 April 2006 » In Hacks » 8 Comments

I decided to start a section on Language and Linguistics, since it’s one of my passions and I am, after all, pursuing a graduate degree in it. So, I will be posting some interesting tidbits and such from the classes, the Web, and my own experiments.
This semester’s class is Computers and Written Language. It basically deals with introductory computational linguistics. Last week we covered n-gram language models, which are statistical models of word sequences. They are called n-gram because n-1 previous words are used to predict the probability of the next one. Such models are useful for a variety of tasks, including speech recognition (“The sign says key pout” vs. “The sign says keep out”), handwriting recognition, spellchecking, document identification, etc.
The programming assignment we had from the class required us to build a trigram (n=3) model of a given corpus of text. This involves counting occurrences of each trigram and calculating the probability of the final word following the two preceding ones. For example, the probability of “see” following “want to” can be calculated as:

P(see|want to) = C(want to see) / C(want to)

That is, the probability of “see” given “want to” is the number of times we’ve seen the trigram “want to see” divided by the number of times we’ve seen the bigram “want to”, and it turns out to be low, since “want to” can be followed by many different verbs. P(tonic|gin and), on the other hand, is much higher. You also want to take sentence boundaries into account, since “I” is very likely to begin a sentence in a fiction corpus, but not so much in a financial one.
So the idea is: read corpus, tokenize, count, calculate probabilities. Probably 30-40 lines of code in a language like PHP or Ruby (which is what I used, just for fun). Once I was done though, I thought, well, I have this nice trigram model, what else can I do with it? Ah, apply it in reverse, to generate sentences!
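Here is a minimal sketch of that read, tokenize, count, calculate pipeline in Python. It is not the original Ruby code. The boundary markers, the regex-based tokenizer, and all function names are assumptions made for illustration. There is also no smoothing, so unseen contexts simply get probability 0.

```python
import re
from collections import defaultdict

# Hypothetical sentence-boundary markers (not from the original code).
START, END = "<s>", "</s>"

def tokenize(text):
    """Split text into sentences, then into lowercase word tokens."""
    sentences = []
    for chunk in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z']+", chunk.lower())
        if words:
            sentences.append(words)
    return sentences

def train(text):
    """Count every trigram and the two-word context that precedes its last word."""
    trigrams = defaultdict(int)
    contexts = defaultdict(int)
    for words in tokenize(text):
        # Padding lets the model learn which words begin and end sentences.
        padded = [START, START] + words + [END]
        for i in range(len(padded) - 2):
            w1, w2, w3 = padded[i:i + 3]
            trigrams[(w1, w2, w3)] += 1
            contexts[(w1, w2)] += 1
    return trigrams, contexts

def probability(trigrams, contexts, w1, w2, w3):
    """P(w3 | w1 w2) = C(w1 w2 w3) / C(w1 w2); 0.0 if the context was never seen."""
    seen = contexts.get((w1, w2), 0)
    return trigrams.get((w1, w2, w3), 0) / seen if seen else 0.0
```

With a toy corpus such as "I want to see. I want to go. I want to see.", calling probability(trigrams, contexts, "want", "to", "see") divides 2 occurrences of the trigram by 3 occurrences of the bigram.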
This is also a fairly simple task. Take a pair of words, then look in the list of the words that can possibly follow them, as learned from the corpus, pick a probability at random and use it to make a selection from the list. Shift the sequence, so that the last word becomes next to last and the current one becomes last, rinse, repeat. The whole process is basically a Markov chain. I added some heuristics for comma insertion, a couple of controls, and called the resulting generator furby because it reminded me of that weird little toy from a few years ago that would sit there, absorb the sounds of the outside world, and regurgitate them back in a mangled, but eerily recognizable manner.
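The generation loop can be sketched roughly like this in Python. This is not furby itself: the names, the boundary markers, and the max_words cap are assumptions for illustration, and the comma heuristics and extra controls are left out.

```python
import random
from collections import defaultdict

# Hypothetical sentence-boundary markers (not from the original code).
START, END = "<s>", "</s>"

def build_followers(sentences):
    """Map each word pair to counts of the words seen right after it."""
    followers = defaultdict(lambda: defaultdict(int))
    for words in sentences:
        padded = [START, START] + list(words) + [END]
        for w1, w2, w3 in zip(padded, padded[1:], padded[2:]):
            followers[(w1, w2)][w3] += 1
    return followers

def generate(followers, rng=random, max_words=40):
    """Walk the chain from the sentence start until END or max_words."""
    w1, w2 = START, START
    out = []
    while len(out) < max_words:
        options = followers.get((w1, w2))
        if not options:
            break  # nothing ever followed this pair (e.g. empty corpus)
        # Pick a random point in the total count mass and find the word
        # whose cumulative count covers it, so frequent followers win more often.
        r = rng.random() * sum(options.values())
        for word, count in options.items():
            r -= count
            if r < 0:
                break
        if word == END:
            break
        out.append(word)
        # Shift the window: last word becomes next to last, new word becomes last.
        w1, w2 = w2, word
    return out
```

Passing a seeded random.Random instance as rng makes a run repeatable, which helps when comparing output across corpora.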
So what kind of sentences did I obtain? Let me quote good old Chomsky first:

The notion “probability of a sentence” is an entirely useless one… — Noam Chomsky, 1969

I am not going to argue against his statement here, but I will apply it for my own purposes. You see, the sentences that furby generates are not improbable. They are… un-probable. Sometimes they are poetry, sometimes they are normal sentences you’d find in a book, but mostly they feel like someone who knows English as a second language had a hit of LSD and was asked to write down his thoughts. It’s English, with a big dollop of whoa-a-ah.
I got a few texts from the Project Gutenberg site and fed them to furby. Everything from Sherlock Holmes stories to Alice in Wonderland to Robinson Crusoe. Here are some samples of what it spit out:

“I never have had a considerable household, he murmured.” (sane)
“I remember most vividly, three smashed bicycles in a fury of misery.” (poetry)
“He put his lips tight, and I wrote to the suspicion that the things had been shattered by his eager face.” (LSD)

The cool thing is that the results are in the style of the original text. Here are a couple generated from Twain’s Huckleberry Finn:

“There was them kind of a whoop nowheres.”
“You know bout dat chile stannin mos right in the night-time, sometimes another.”

Note that these are original sentences that do not occur in the texts. It was a lot of fun just running furby over and over again and seeing what it would come up with. But why not mix two authors? I tried a couple, but the best combination seemed to be D. H. Lawrence’s Sons and Lovers and the aforementioned Huckleberry Finn. Once it sucked in this unlikely duet, furby decided to become a comedian with a streak of soft-core pornography. Here are some gems:

“She wanted him and a half a sovereign.”
“Goodness man don’t be a fine woman.”
“Her mouth to begin working, till pretty late to-night.”
“She heard him buy threepennyworth of hot-cross buns, he talked to barmaids, to almost any woman whom he felt.”
“He shoved his muzzle in the wet.”
“Joking, laughing with their shafts lying idle on the downward track.”
“As the lads enjoyed it when i realised that she was warm, on the pavement then Dawes then Clara.”
“She had never been shaved.”
“He lay pressed hard against her and the electric light vanished, and I saw the wrist and the coconut, and shook her head.”
“She could think of the body as it were, prowling abroad.”
“The three brothers sat with his finger-tips.”
“Eh, dear, if i’m a trying to get as drunk as a bubble of foam.”

Priceless.
My next goal is to feed it php-general archives and see if furby can be more intelligent than most of the postings on that list. Stay tuned.

NetflixQueueShuffler Update

12 January 2006 » In Hacks, Movies » No Comments

I upgraded to Firefox 1.5 and found out that my NetflixQueueShuffler GreaseMonkey script no longer worked. So after some digging, I fixed it up and it’s available for your downloading pleasure.

Recognize This

05 October 2005 » In Hacks, Tech » 8 Comments

Face recognition technology is getting really good. Yesterday I saw a link to Intel’s OpenCV library float through the mailing list at work and a note that someone wrote a PHP extension for it. “Interesting”, I thought. I hacked up a simple PHP script that would take an image and process it slightly to make detected regions more obvious. Here’s an example of the output. Not bad, huh? Then Jeremy tried another image, with some spooky results. Note that aside from the person, there are a couple more regions that the library thought were faces. If you look closer, the larger rectangle on the carpet encloses something that does have vague face-like features. Nice job, Intel.

Shuffle My Queue!

04 June 2005 » In Hacks, Movies » 2 Comments

This is a random world. And people seem to like randomness: witness the popularity of the iPod Shuffle. I am no different. I also watch a lot of movies: my Netflix queue has 72 DVDs in it currently. Recently I wondered whether it was possible to randomize my queue, so that the next DVD that comes in is somewhat of a surprise. After emailing Netflix customer support and getting back a completely unhelpful reply telling me that I can re-order the queue by changing the priority numbers and clicking a button, I decided that it was time to take matters into my own hands.
Input: GreaseMonkey, Javascript, and a couple of hours of hacking. Output: NetflixQueueShuffler. I know that the GreaseMonkey scripts site already has a Netflix queue randomizer, but I think that one is lame, since all it does is change the priority numbers and click the submit button for you. Mine actually re-orders the table rows visually and lets you do it a few times until you are satisfied with the randomness.
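The actual script is Javascript running against the Netflix page, and its code is not shown here. Purely to illustrate the idea of reordering the rows themselves and then renumbering them, here is a small Python sketch. It assumes the queue is a plain list of titles and uses a Fisher-Yates shuffle; the function names are made up for this example.

```python
import random

def shuffle_queue(rows, rng=random):
    """Return a new list with the rows in random order (Fisher-Yates shuffle)."""
    shuffled = list(rows)  # copy, so the original order stays available
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled

def renumber(rows):
    """Pair each row with its new priority number, starting from 1."""
    return list(enumerate(rows, start=1))
```

Because shuffle_queue returns a copy, it can be called repeatedly on the same input until the order looks random enough, and renumber is applied only to the order that is finally kept.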

Music Hunger Satisfied

28 May 2005 » In Hacks » No Comments

Well, I am happy to report that I resolved my problem of how to play YME music through my stereo. Norbert Mocsnik set me on the right track but he lamented that it was not a software-only solution. So how did I accomplish this?

  1. Install and configure ShoutCast Server on my Windows machine.
  2. Install WinAmp DSP ShoutCast plug-in.
  3. Open the mixer and select Wave under recording panel.
  4. Open WinAmp preferences, select the ShoutCast plug-in in the DSP section. A control panel will pop up.
  5. Go to Input tab and select Soundcard Input under Input Device. This means that instead of using WinAmp to play the audio, the plug-in will record the data being played through the soundcard’s output and re-broadcast it.
  6. Configure Output and Encoder tabs per DSP plug-in’s docs, and connect to ShoutCast server.
  7. Start YME and hit play.
  8. Point the XboxMediaCenter at the ShoutCast stream.

Bingo. All that is required are a couple of pieces of software and a full-duplex soundcard capable of recording its own playback (basically, a loopback mechanism). While it’s not a one-button solution, it works well enough and now I can enjoy my custom Yahoo! radio station in the full glory of 5.1 audio.

mod_auth_sqlite

19 May 2003 » In Hacks » No Comments

I wrote a module for Apache 1.3.x to perform user and group authentication from an SQLite database. You can check it out here.