Proposal for an Accessible Captcha – Standards Schmandards

You were probably subjected to a captcha the last time you registered for a free e-mail account somewhere. Most likely you were presented with a funny-looking image in which you were supposed to find a sequence of numbers or letters, which you then had to copy into a text field to prove that you are a human and not a machine.

Imagine if you could not see the image. What would you write in the text field? The phrase “you inaccessible idiots” springs to mind…

Captchas are often used to prevent automated mass registration of accounts on online services such as web mail, site forums and auction sites. A problem with captchas is that they tend to be inaccessible to all users who cannot see the test image. In the article Inaccessibility of Visually-Oriented Anti-Robot Tests, Matt May of the W3C takes a look at some of the issues that face internet users today.

This article is also available in Japanese.

In this article we will take a look at some of these issues and propose a solution for a captcha that improves accessibility.

Requirements for a Captcha

I propose the following requirements for a decent captcha:

The test should be accessible for as many users as possible regardless of culture, age and impairment.
The server software has to be able to create a test instance efficiently. It should be possible to create a test instance automatically, without human intervention.
The server software has to be able to calculate the test result easily.
User privacy must not be invaded. No one would like to send personal details just to register on a discussion forum.
It should be difficult for a non-human to deduce the answer to the test.

We will come back to these requirements in a short while.

A Proposed Solution

Some of the solutions proposed in the W3C article are:

Use logic puzzles. This solution does not satisfy requirement 2. It would require someone to manually create these tests. Also, they would probably be difficult if you have a cognitive impairment.
Use credit card validation. Brilliant idea. Everyone would be happy to submit their credit card number to unknown sites on the internet.
Live operators. Fails on cost efficiency. Although it would be one of the best ways to provide support for your users, a lot of sites do not have the resources to provide this kind of service.
Federated identity systems. Well, these do not exist yet, so we cannot use them.
Sound output. If it were possible to create a sound file that is difficult for a machine to decode, this might be the best option to use together with a regular captcha image. Let’s move forward with this solution.

First of all I suggest that you thoroughly evaluate why you need a captcha. Is it really necessary? A lot of sites implement them because it “looks professional”. If you can avoid captchas, please do so.

My proposed solution is based on an ordinary captcha image in combination with an audio based test. Instead of providing the same information in the audio file as in the test image (like the Hotmail registration form) we will use a separate test. This test is based on the ability of the human brain to understand the meaning of information as well as discard unnecessary details.

The captcha would look like this:

On the main registration form a regular captcha is presented just like before. Users that can see the image may use this test. A link informs users that there is an alternative test.
Clicking the link leads to the audio based test form. This form provides access to an audio file and three input fields. The audio file contains three numbers that the user has to enter into the fields.

The Structure of the Audio Clip

In order to make it hard for machines to parse the numbers from the audio clip we use a combination of methods.

First, the text in the audio file varies between test instances. This has no effect on a real user (who will only listen to a test instance once) but requires a computer to be trained on many different samples. Test instances could look like this:

“Here are the three numbers: 12, 51 and 9.”
“First number is 12. Second number is 51. The third number is 9.”
“12, 51, 9. Those were the three numbers.”

Since there may be numbers in the surrounding text (e.g. “Here are the three numbers”), it is difficult for a computer to work out which of the numbers should be used.

Second, if these instances are spoken with different voices it becomes even harder to create a speech parser.
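The two methods above can be sketched in a few lines of Python. This is only an illustration and not the sample application mentioned in this article: the templates are modelled on the three example sentences, while the voice names, the number range 1 to 99 and the function name make_test_instance are assumptions made for the sketch. Turning the text into audio is left to whatever speech engine is available.

```python
import random

# Phrasing templates modelled on the three example sentences in the article.
TEMPLATES = [
    "Here are the three numbers: {a}, {b} and {c}.",
    "First number is {a}. Second number is {b}. The third number is {c}.",
    "{a}, {b}, {c}. Those were the three numbers.",
]

# Hypothetical voice names; the real list depends on the installed speech engine.
VOICES = ["voice-1", "voice-2", "voice-3"]


def make_test_instance(rng):
    """Return one test instance: the numbers, the text to speak and a voice."""
    # Assumed range: 1 to 99, so every number is short and easy to say.
    numbers = [rng.randint(1, 99) for _ in range(3)]
    a, b, c = numbers
    text = rng.choice(TEMPLATES).format(a=a, b=b, c=c)
    return {"numbers": numbers, "text": text, "voice": rng.choice(VOICES)}


if __name__ == "__main__":
    # SystemRandom draws from the operating system, so instances are hard to predict.
    print(make_test_instance(random.SystemRandom()))
```

Each call picks new numbers, a phrasing and a voice, so two visitors are unlikely to hear the same clip.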

Enough talk. How would it look in real life? Click here to see/listen to a sample test.

Why This May Work

It is easy to create a large number of test instances automatically, which satisfies requirement 2 above. See my sample application in the references section below.

Using a limited domain of a language (numbers) we make it easier for users that do not use English as their primary language. Numbers are typically one of the first things a person learns in a foreign language.

For a human it is easy to extract the three separate numbers from the information in the audio file. This is harder for a computer, which has difficulty knowing whether there are three or four numbers in the string “Here are the three numbers: 12, 51 and 9.”
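To make the three-or-four problem concrete, here is a small Python sketch of a naive attacker. It assumes the hard part, turning speech into a perfect transcript, has already succeeded; the function name naive_extract_numbers and the small word list are invented for this illustration.

```python
import re

# Small vocabulary of spelled-out numbers, enough for this illustration.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def naive_extract_numbers(transcript):
    """Collect every token that looks like a number, as digits or as a word."""
    found = []
    for token in re.findall(r"[a-z]+|\d+", transcript.lower()):
        if token.isdigit():
            found.append(int(token))
        elif token in NUMBER_WORDS:
            found.append(NUMBER_WORDS[token])
    return found


transcript = "Here are the three numbers: 12, 51 and 9."
print(naive_extract_numbers(transcript))  # [3, 12, 51, 9]
```

The program finds four candidates for three input fields. It would need extra rules for every phrasing to decide that "three" belongs to the surrounding sentence and not to the answer.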

Evaluating the Solution

Let’s evaluate this approach to an accessible captcha by looking at the personas available at Dive Into Accessibility. If you do serious accessibility work it will do you good to use these to evaluate your work from time to time.

Jackie is a blind 19 year old woman using Jaws to read web pages. She will have no trouble using the audio based test as long as the input form is readable by Jaws.

Michael is 27 years old and colorblind. He is using a slow modem connection to browse the web without images. If you make sure that the first registration form contains a text notifying the user that images are required for this form he can turn images on for this page. Also, please make sure the captcha image does not rely on colors.

Bill has suffered a stroke and is using the keyboard to navigate the web. He should have no problems using the image based captcha test as long as the tab order is set properly on the input form.

Lillian is 54 and has poor vision. Her native language is Cantonese and she has trouble with advanced English. She uses Internet Explorer to browse the web with JavaScript turned off. Lillian will be able to use the image based captcha provided that the image is big enough. If she used the audio version she probably knows enough English to understand numbers. As long as the audio test page is not opened with JavaScript, she can access the page and the audio file without problems.

Marcus has been blind since birth and uses Lynx with a Braille reader to browse the web. He hates Jaws as he wants to listen to the radio while browsing. If the registration form informs Marcus that he has to listen to an audio clip to complete the registration, Marcus can use the audio based captcha.

It looks like this can work. So, let’s go through the requirements for a captcha we discussed earlier:

The test should be accessible for as many users as possible regardless of culture, age and impairment. Evaluating the test with the personas above it looks like this is ok. I have also done an unscientific test using three relatives with varying computer literacy (Hi Mom!). None of the test subjects had difficulties passing the test.
The server software has to be able to create a test instance efficiently. It should be possible to create a test instance automatically, without human intervention. Using the Microsoft Speech SDK I have created a sample dotnet application that can create test instances easily. With this application it is easy to generate thousands of audio files quickly. For details, see this screenshot of the captcha audio generator. If you want to use open source products, it would be easy to do the same in Java using FreeTTS (check out the Alan voice for numbers).
The server software has to be able to calculate the test result easily. Generating test instances with my captcha audio generator also creates a table linking audio files to the numbers the user is supposed to enter. It would be easy to use your programming language of choice to validate user input with this data.
User privacy must not be invaded. No one would like to send personal details just to register on a discussion forum. Nothing personal submitted here.
It should be difficult for a non-human to deduce the answer to the test. I have tried acquiring text from the audio files with Dragon dictate. It had severe difficulties interpreting the computer voices. Even after the audio has been converted to text, a lot more processing is required to know what goes into the separate fields.
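As a rough illustration of the third requirement, checking an answer against the generated table could look like the Python sketch below. The table layout, the file names and the function name check_answer are assumptions for this example; the actual format produced by the captcha audio generator may differ.

```python
# Hypothetical answer table, as a generator might export it:
# audio file name -> the three numbers spoken in that file.
ANSWERS = {
    "test_0001.wav": (12, 51, 9),
    "test_0002.wav": (7, 30, 64),
}


def check_answer(audio_file, fields):
    """Return True if the three submitted fields match the stored numbers."""
    expected = ANSWERS.get(audio_file)
    if expected is None or len(fields) != len(expected):
        return False
    try:
        # Tolerate stray spaces around the numbers the user typed.
        submitted = tuple(int(value.strip()) for value in fields)
    except ValueError:
        # Non-numeric input can never be a correct answer.
        return False
    return submitted == expected


print(check_answer("test_0001.wav", ["12", " 51", "9"]))  # True
print(check_answer("test_0001.wav", ["12", "9", "51"]))   # False
```

The order of the fields matters here, which matches the form described earlier: one input field per spoken number.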

To further complicate things it is possible to mix other sounds into the audio files.
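One simple way to picture this is adding random noise to the samples of a clip. The Python sketch below assumes the audio is available as a list of 16-bit PCM sample values; the function name mix_noise and the choice of uniform noise are illustrative only, and how much noise a speech recognizer tolerates compared with a human listener would have to be tested.

```python
import random


def mix_noise(samples, noise_level, rng):
    """Add uniform random noise to 16-bit PCM samples, clipping to the valid range."""
    mixed = []
    for sample in samples:
        noisy = sample + rng.randint(-noise_level, noise_level)
        # Keep the result inside the signed 16-bit range to avoid wrap-around.
        mixed.append(max(-32768, min(32767, noisy)))
    return mixed


if __name__ == "__main__":
    clip = [0, 1000, -1000, 32767, -32768]  # stand-in for real audio samples
    print(mix_noise(clip, 500, random.Random()))
```

Other sounds, such as background music or a second voice, could be mixed in the same way by adding their samples instead of random values.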

What do you think? Any suggestions or ideas? If you think the Captcha Generator application is useful, do not hesitate to develop it further.

References

Inaccessibility of Visually-Oriented Anti-Robot Tests by Matt May.
FreeTTS is an open source speech synthesizer written in Java.
I wrote a captcha audio generator to make sure that a large number of test instances could be generated easily. Download the Visual Studio dotnet 2003 project files for more details. The download also includes a compiled version. Please note that you have to have the dotnet framework and the Microsoft Speech SDK installed for it to work.
Dive Into Accessibility personas by Mark Pilgrim.
For an advanced text-to-speech SDK see AT&T Natural voices SDK.
An alternative to pure speech is to have a synthetic voice singing the numbers. This would be virtually impossible to decode in software. If Vocaloid were scriptable, this would be possible to do.
The Festvox text-to-speech platform.