If you don't yet know how captchas work, you may be surprised to learn that many of them rely on unpaid labor.
The problem of bots on the Internet is almost as old as the network itself. Ideally, every user we meet online would be a real person, but unfortunately a good number of them are bots, programmed for all kinds of tasks such as posting spam.
Anyone who runs or has ever run a website has had a bot problem. At Omicrono we suffer from it too, although at least there are now technologies advanced enough to block these fake users. For many years, one of those technologies was the captcha.
In search of the solution against spammer bots
CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. The name says it all, right?
It was created in the early 2000s by the team led by Luis von Ahn, when the bot problem was becoming more apparent. The basic idea of a captcha is to distort an image containing words, letters or numbers so that an OCR (optical character recognition) program cannot read them, but an ordinary person can.
Thus began a frantic race between spammers and the industry to see who would come out on top. Captcha was not a perfect system, of course, as became clear when spammers started hiring hundreds of people, mainly in China, for laughable amounts of money just to solve captchas one after another.
Still, it worked well enough to knock out many bots. The irony was not lost on von Ahn that so many people were deciphering words that had no meaning or purpose, and he set out to fix this.
How captcha works
Thus reCAPTCHA was born, a new project built on the same idea but with a different foundation. Instead of applying filters to randomly arranged letters and stretching them, reCAPTCHA takes its words from a huge database of scanned pages that have some kind of problem: printing defects, words printed so long ago that the ink has faded, words on torn or wet paper, or rare words that were not in any dictionary.
OCR programs struggle in all of those cases, but we humans can make out the words with a little effort and some context, so the plan was to have users solve these problems themselves and thereby complete the scan of the book, newspaper or pamphlet.
If you look closely, reCAPTCHA always shows two words: one is already known to the database, and the other could not be recognized by OCR software. When we fill in a reCAPTCHA, the system actually checks only that we have typed the known word correctly.
For the unknown word, it stores what we typed, and once enough people have typed the same thing, it saves that reading to the database. In practice, then, we only need to type one of the two words correctly to pass the test, but to take advantage of that we would have to know which of the two is the unknown one (although in some cases it is easy to tell).
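The two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not reCAPTCHA's actual code: the class name, the word identifiers and the agreement threshold of 3 are all assumptions, since the article only says that "enough people" must agree.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching answers are needed before an
# unknown word is accepted. The real value is not given in the article.
AGREEMENT_THRESHOLD = 3


class TwoWordCaptcha:
    """Toy model of the known-word / unknown-word scheme."""

    def __init__(self):
        self.votes = defaultdict(Counter)  # unknown word id -> answer counts
        self.resolved = {}                 # unknown word id -> accepted text

    def submit(self, control_text, control_answer, unknown_id, unknown_answer):
        """Return True if the user passes the test."""
        # Only the control word, whose text is already known, decides pass/fail.
        if control_answer.strip().lower() != control_text.lower():
            return False

        # The answer for the unknown word is never graded, only recorded as a vote.
        guess = unknown_answer.strip().lower()
        if guess and unknown_id not in self.resolved:
            self.votes[unknown_id][guess] += 1
            # Once enough users agree, store the reading for that word.
            if self.votes[unknown_id][guess] >= AGREEMENT_THRESHOLD:
                self.resolved[unknown_id] = guess
        return True


if __name__ == "__main__":
    captcha = TwoWordCaptcha()
    for _ in range(AGREEMENT_THRESHOLD):
        captcha.submit("morning", "morning", "scan-17", "overlooks")
    print(captcha.resolved)
```

Note that a user who fails the control word contributes no vote at all, which keeps bots that guess at random from polluting the transcriptions.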
Free internet work
What a crazy idea, getting users to do your work for free, right? Curiously enough, plenty of people were interested in the technology, starting with The New York Times, which had an archive of old newspapers that it could not search because computers were unable to read them.
Shortly afterwards, the reCAPTCHA project caught the attention of Google, and the rest is history. The Internet giant had a gigantic task ahead of it: scanning and digitizing every book in existence for the largest online catalog on the net, Google Books.
The company had run into the same problem: it had access to a huge amount of material, but it had no way of automatically deciphering creased pages, faded ink and coffee stains. So reCAPTCHA was one of the most obvious acquisitions in its history.
That is how the entire Internet helped Google digitize books for free, while Google offered a service against bots at the same time. That lasted until two years ago, when it became clear that spammers had acquired the technology and computing power needed to pass these tests without trouble.
noCAPTCHA, the captcha of the future?
Then noCAPTCHA was born, a new version that no longer asks us to type words but instead relies on details such as our browsing, our cookies and our behavior. Based on that data, Google may conclude that we are real users, and then we only have to tick a box to show that we are not a bot.
But if Google considers us suspicious, it can present us with a challenge. Initially these were word-based, as in reCAPTCHA, but lately the challenges involve finding objects in a photograph, such as traffic signs. These challenges are no coincidence, considering that Google is working on AIs capable of analyzing photographs and finding objects in them.
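The decision described above, a simple checkbox for visitors who look human and an image challenge for suspicious ones, can be sketched as a score with a cutoff. Google does not publish how it weighs its signals, so everything here is hypothetical: the signal names, the point values and the threshold are invented purely to illustrate the flow.

```python
from dataclasses import dataclass


@dataclass
class VisitorSignals:
    """Hypothetical signals; the real ones are not public."""
    has_prior_cookies: bool
    mouse_movements: int
    seconds_on_page: float


def risk_score(signals: VisitorSignals) -> int:
    """Return a score from 0 (looks human) to 10 (looks like a bot)."""
    score = 10
    if signals.has_prior_cookies:
        score -= 4  # an existing browsing history is a human-like trait
    if signals.mouse_movements >= 5:
        score -= 3  # bots often submit forms without moving the pointer
    if signals.seconds_on_page >= 2.0:
        score -= 3  # instant submissions are suspicious
    return score


def choose_check(signals: VisitorSignals, threshold: int = 5) -> str:
    """Pick the test to show: a plain checkbox or an image challenge."""
    if risk_score(signals) >= threshold:
        return "image_challenge"
    return "checkbox_only"


if __name__ == "__main__":
    regular_user = VisitorSignals(True, 20, 8.0)
    scripted_bot = VisitorSignals(False, 0, 0.1)
    print(choose_check(regular_user))
    print(choose_check(scripted_bot))
```

A real system would presumably combine far more signals and learn the weights from data rather than hard-coding them, but the overall shape (score first, challenge only when in doubt) matches what the article describes.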
That is the story of captchas: a method for fighting spam that is far from perfect, but that has managed to clean up our conversations at least a little.