Wednesday, May 06, 2009

Exploiting spammers to make computers smarter

Googlers Rich Gossweiler, Maryam Kamvar, and Shumeet Baluja had a fun paper at WWW 2009, "What's Up CAPTCHA? A CAPTCHA Based On Image Orientation" (PDF), that asks people to prove they are human by rotating images to their upright orientation, rather than by deciphering distorted text as most CAPTCHAs do.

Some brief excerpts from their paper:
We present a novel CAPTCHA which requires users to adjust randomly rotated images to their upright orientation ... Rotating images to their upright orientation is a difficult task for computers .... [Our] system ... results in a 84% human success rate and .009% bot success rate.

The main advantages of our CAPTCHA technique over traditional text recognition techniques are that it is language-independent, does not require text-entry (e.g. for mobile devices), and employs another domain for CAPTCHA generation beyond character obfuscation.

The paper goes on to say that "no algorithm has yet been developed to successfully rotate the set of images used in our CAPTCHA system." The key word there is "yet." As soon as there is a strong incentive for people to develop better algorithms for this problem, better algorithms will be developed.

But, as Luis von Ahn insightfully pointed out in a recent interview in New Scientist, it is a perfectly fine outcome if spammers find a way to break this new image-based CAPTCHA technique. By doing so, they are helping us make computers smarter.

From the New Scientist article:
"If [the spammers] are really able to write a programme to read distorted text, great – they have solved an AI problem," says von Ahn. The criminal underworld has created a kind of X prize for OCR.

Security groups ... [then] can ... switch for an alternative CAPTCHA system -- based on images, for example -- presenting the eager spamming community with a new AI problem to crack ... Image orientation is difficult for computers. But if [image-based] CAPTCHA becomes common, it won't be long before spammers turn their attention to cracking the problem, with potential fringe benefits to cameras and image editing software.

Speech recognition CAPTCHAs are already being used, and image labelling ones could follow, says von Ahn. AI researchers are already working in both these areas, but they could soon be joined by spammers also helping advance the technology.

Perhaps it is time to start designing CAPTCHAs in a different way -- pick problems that need solving and make them into targets to be solved by resourceful criminals.

4 comments:

Anonymous said...

Another way to view CAPTCHAs is as a Mechanical Turk. That's what reCAPTCHA does, helping CMU transcribe scanned text that OCR software fails on. It's a win-win: the site host gets a powerful CAPTCHA program, while CMU gets free labor for text scanning. I wonder what other tasks lend themselves to this kind of labor.

On a less honorable side, I've read that some spammers offer porn to help break CAPTCHAs. People are offered the chance to view online porn for free, in return for solving CAPTCHAs. Once the CAPTCHA is solved, the scammer can log in and spam the site.

Justin Mason said...

However, spammers don't publish their source code, or their research. Is that really advancing science?

Unknown said...

luis von ahn must have somebody on the inside.

is this not a futuristic spam-ring crime thriller waiting to happen?

Neil said...

The idea that spammers trying to break CAPTCHAs are unintentionally helping to solve hard AI problems is certainly an attractive one. Is it true, however?

Despite the widespread success that some spammers have had in breaking text-based CAPTCHAs, can anyone point to a single piece of research that draws upon the techniques employed by spammers to advance AI? There is certainly an arms race going on between CAPTCHA makers and spammers, but I wonder whether it feeds back very much useful work to the rest of the field.