Proposal for an Accessible Captcha

You were probably subjected to a captcha the last time you registered for a free e-mail account somewhere. Most likely you were presented with a funny-looking image in which you were supposed to find a sequence of numbers or letters, which you then had to copy into a text field to prove that you are a human and not a machine.

Imagine if you could not see the image. What would you write in the text field? The phrase “you inaccessible idiots” springs to mind…

Captchas are often used to prevent automated mass registration of accounts on online services such as web mail, site forums and auction sites. A problem with captchas is that they tend to be inaccessible to all users who cannot see the test image. In the article Inaccessibility of Visually-Oriented Anti-Robot Tests, Matt May from W3C takes a look at some of the issues that face internet users today.

This article is also available in Japanese.

In this article we will take a look at some of these issues and propose a solution for a captcha that improves accessibility.

Requirements for a Captcha

I propose the following requirements for a decent captcha:

  1. The test should be accessible for as many users as possible regardless of culture, age and impairment.
  2. The server software has to be able to create a test instance efficiently. It should be possible to create a test instance automatically, without human intervention.
  3. The server software has to be able to calculate the test result easily.
  4. User privacy must not be invaded. No one would like to send personal details just to register on a discussion forum.
  5. It should be difficult for a non-human to deduce the answer to the test.

We will come back to these requirements in a short while.

A Proposed Solution

Some of the solutions proposed in the W3C article are:

  1. Use logic puzzles. This solution does not satisfy requirement 2. It would require someone to manually create these tests. Also, they would probably be difficult if you have a cognitive impairment.
  2. Use credit card validation. Brilliant idea. Everyone would be happy to submit their credit card number to unknown sites on the internet.
  3. Live operators. Fails on cost efficiency. Although it would be one of the best ways to provide support for your users, a lot of sites do not have the resources to provide this kind of service.
  4. Federated identity systems. Well, this does not exist yet so we can not use it.
  5. Sound output. If it were possible to create a sound file that is difficult for a machine to decode, this may be the best option to use together with a regular captcha image. Let’s move forward with this solution.

First of all I suggest that you thoroughly evaluate why you need a captcha. Is it really necessary? A lot of sites implement them because it “looks professional”. If you can avoid captchas, please do so.

My proposed solution is based on an ordinary captcha image in combination with an audio based test. Instead of providing the same information in the audio file as in the test image (like the Hotmail registration form) we will use a separate test. This test is based on the ability of the human brain to understand the meaning of information as well as discard unnecessary details.

The captcha would look like this:

  1. On the main registration form a regular captcha is presented just like before. Users that can see the image may use this test. A link informs users that there is an alternative test.
  2. Clicking the link leads to the audio based test form. This form provides access to an audio file and three input fields. The audio file contains three numbers that the user has to enter into the fields.

The Structure of the Audio Clip

In order to make it hard for machines to parse the numbers from the audio clip we use a combination of methods.

First, the text in the audio file varies between test instances. This has no effect on a real user (who will only listen to a test instance once) but requires a computer to be trained on many different samples. Test instances could look like this:

  • “Here are the three numbers: 12, 51 and 9.”
  • “First number is 12. Second number is 51. The third number is 9.”
  • “12, 51, 9. Those were the three numbers.”

Since there may be numbers in the surrounding text (e.g. “Here are the three numbers”) it is difficult for a computer to understand which of the numbers should be used.

Second, if these instances are spoken with different voices it becomes even harder to create a speech parser.
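As a rough sketch of how such varied instances could be produced (this is not the sample application mentioned later, just an illustration), the Python snippet below picks three random numbers, a phrasing template and a voice. The template list mirrors the three sample sentences above; the voice names, the 1 to 99 range and the function name `make_test_instance` are assumptions made for the example. The resulting text would still have to be handed to a speech synthesizer.

```python
import random

# Phrasing templates modelled on the three sample sentences above.
TEMPLATES = [
    "Here are the three numbers: {a}, {b} and {c}.",
    "First number is {a}. Second number is {b}. The third number is {c}.",
    "{a}, {b}, {c}. Those were the three numbers.",
]

# Hypothetical voice names; a real list depends on the installed speech engine.
VOICES = ["voice-1", "voice-2", "voice-3"]


def make_test_instance(rng):
    """Return (text to synthesize, voice to use, expected answer)."""
    # The 1..99 range is an assumption chosen to keep the numbers simple.
    numbers = [rng.randint(1, 99) for _ in range(3)]
    template = rng.choice(TEMPLATES)
    text = template.format(a=numbers[0], b=numbers[1], c=numbers[2])
    voice = rng.choice(VOICES)
    return text, voice, numbers


text, voice, answer = make_test_instance(random.Random())
print(voice, "|", text, "|", answer)
```

Keeping the expected answer next to the generated text is what later makes the result easy to check on the server.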

Enough talk. How would it look in real life? Click here to see/listen to a sample test.

Why This May Work

It is easy to automatically create a massive number of test instances, thereby satisfying requirement 2 above. See my sample application in the references section below.

Using a limited domain of a language (numbers) we make it easier for users that do not use English as their primary language. Numbers are typically one of the first things a person learns in a foreign language.

For a human it is easy to extract the three separate numbers from the information in the audio file. This is harder for a computer, which has difficulty knowing whether there are three or four numbers in the string “Here are the three numbers: 12, 51 and 9.”
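To illustrate the point, here is a small Python sketch of a naive parser that simply collects every number it finds in a transcript. It assumes the speech has already been converted to text and, as a simplification, that the answer values appear as digits while the surrounding sentence spells its number out. The word list and the function name `naive_extract` are made up for the example.

```python
import re

# Small, deliberately incomplete vocabulary of spelled-out numbers.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def naive_extract(transcript):
    """Collect every number in the transcript, digits or number word alike."""
    found = []
    for token in re.findall(r"[a-z]+|\d+", transcript.lower()):
        if token.isdigit():
            found.append(int(token))
        elif token in NUMBER_WORDS:
            found.append(NUMBER_WORDS[token])
    return found


print(naive_extract("Here are the three numbers: 12, 51 and 9."))
# [3, 12, 51, 9]
print(naive_extract("12, 51, 9. Those were the three numbers."))
# [12, 51, 9, 3]
```

In both cases the parser ends up with four candidates, and the extra one sits in a different position each time. Deciding which to drop requires some understanding of the sentence, which is the extra work this captcha tries to force on a machine.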

Evaluating the Solution

Let’s evaluate this approach to an accessible captcha by looking at the personas available at Dive Into Accessibility. If you do serious accessibility work it will do you good to use these to evaluate your work from time to time.

Jackie is a blind 19 year old woman using Jaws to read web pages. She will have no trouble using the audio based test as long as the input form is readable by Jaws.

Michael is 27 years old and colorblind. He is using a slow modem connection to browse the web without images. If the first registration form contains text notifying the user that images are required for this form, he can turn images on for this page. Also, please make sure the captcha image does not rely on colors.

Bill has suffered a stroke and is using the keyboard to navigate the web. He should have no problems using the image based captcha test as long as the tab order is set properly on the input form.

Lillian is 54 and has poor vision. Her native language is Cantonese and she has trouble with advanced English. She uses Internet Explorer to browse the web with javascript turned off. Lillian will be able to use the image based captcha provided that the image is big enough. If she uses the audio version she probably knows enough English to understand numbers. By not using javascript to open the audio test page she can access the page and audio file without problems.

Marcus has been blind since birth and uses Lynx with a Braille reader to browse the web. He hates Jaws as he wants to listen to radio while browsing. If the registration form informs Marcus that he has to listen to an audio clip to complete the registration, Marcus can use the audio based captcha.

It looks like this can work. So, let’s go through the requirements for a captcha we discussed earlier:

  1. The test should be accessible for as many users as possible regardless of culture, age and impairment. Evaluating the test with the personas above it looks like this is ok. I have also done an unscientific test using three relatives with varying computer literacy (Hi Mom!). None of the test subjects had difficulties passing the test.
  2. The server software has to be able to create a test instance efficiently. It should be possible to create a test instance automatically, without human intervention. Using the Microsoft Speech SDK I have created a sample dotnet application that can create test instances easily. With this application it is easy to generate thousands of audio files quickly. For details, see this screenshot of the captcha audio generator. If you want to use open source products, it would be easy to do the same in Java using FreeTTS (check out the Alan voice for numbers).
  3. The server software has to be able to calculate the test result easily. Generating test instances with my captcha audio generator also creates a table linking audio files to the numbers the user is supposed to enter. It would be easy to use your programming language of choice to validate user input with this data.
  4. User privacy must not be invaded. No one would like to send personal details just to register on a discussion forum. Nothing personal submitted here.
  5. It should be difficult for a non-human to deduce the answer to the test. I have tried acquiring text from the audio files with Dragon Dictate. It had severe difficulties interpreting the computer voices. It also requires a lot more processing after converting it to text to know what goes into the separate fields.
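The validation mentioned under requirement 3 could look roughly like the Python sketch below. The lookup table, the file names and the function name `is_correct` are hypothetical; the actual table format produced by the generator is not shown here, so treat this as one possible shape rather than a description of it.

```python
# Hypothetical lookup table of the kind a generator could emit:
# audio file name -> the three numbers spoken in that file.
ANSWERS = {
    "clip-0001.wav": (12, 51, 9),
    "clip-0002.wav": (7, 30, 44),
}


def is_correct(clip_name, fields):
    """Compare the three submitted form fields with the stored answer."""
    expected = ANSWERS.get(clip_name)
    if expected is None or len(fields) != len(expected):
        return False
    try:
        submitted = tuple(int(value.strip()) for value in fields)
    except ValueError:
        # Non-numeric input is simply a failed test.
        return False
    return submitted == expected


print(is_correct("clip-0001.wav", ["12", " 51", "9"]))  # True
print(is_correct("clip-0001.wav", ["12", "51", "3"]))   # False
```

In a real deployment the file name shown to the client should not reveal the answer, and each clip should probably be accepted only once.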

To further complicate things it is possible to mix other sounds into the audio files.
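A minimal sketch of such mixing, assuming the audio is available as a list of signed 16-bit PCM sample values: random noise is added to every sample and the result is clamped to the valid range. The function name `mix_noise` and the `noise_level` parameter are made up for this example, and I have not tested how much noise speech recognition tolerates. Too much noise will of course also make the clip harder for human listeners.

```python
import random


def mix_noise(samples, noise_level, rng):
    """Add uniform random noise to signed 16-bit PCM samples.

    noise_level is the maximum noise amplitude (an illustrative parameter).
    """
    mixed = []
    for sample in samples:
        noisy = sample + rng.randint(-noise_level, noise_level)
        # Clamp to the signed 16-bit range to avoid wrap-around distortion.
        mixed.append(max(-32768, min(32767, noisy)))
    return mixed


noisy_clip = mix_noise([0, 1000, -1000, 32767], 500, random.Random())
print(noisy_clip)
```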

What do you think? Any suggestions or ideas? If you think the Captcha Generator application is useful, do not hesitate to develop it further.

References


  1. Inaccessibility of Visually-Oriented Anti-Robot Tests by Matt May.
  2. FreeTTS is an open source speech synthesizer written in Java.
  3. I wrote a captcha audio generator to make sure it was easy to generate a large number of test instances. Download the Visual Studio dotnet 2003 project files for more details. The download also includes a compiled version. Please note that you have to have the dotnet framework and the Microsoft Speech SDK installed for it to work.
  4. Dive Into Accessibility personas by Mark Pilgrim.
  5. For an advanced text-to-speech SDK see AT&T Natural voices SDK.
  6. An alternative to pure speech is to have a synthetic voice singing the numbers. This would be virtually impossible to decode in software. If Vocaloid were scriptable, this might be possible to do.
  7. The Festvox text-to-speech platform.


  1. nerkles says at 2005-01-02 19:01:

    Thanks for your thoroughness on this. I plan to integrate this kind of approach in sites I work on going forward.

  2. Tom says at 2005-01-03 00:01:

    Very interesting, I’ve seen these things on a lot of sites and I often think they are overkill.

  3. grant says at 2005-01-05 04:01:

    you’ve just favoured the blind over the deaf, and added a requirement for audio playback on the web device.. interesting idea, but doesn’t sound like the ideal solution.

  4. Pete says at 2005-01-05 09:01:

    Grant, most deaf people can make use of the image based captcha which I suggest is used in combination with the audio based version. If you are both deaf and visually impaired I agree that this isn’t an ideal solution. However, this method will increase accessibility.

    Requiring audio playback on the client will likely not be a problem for visually impaired users as the majority of them use some sort of audio support already.

  5. Brett Taylor says at 2005-01-06 00:01:

    I’ve seen a Captcha that does both visual and audio. It’s my bank:

    It’s probably the most annoying captcha in my life, because I have to use it every time I log into my Internet Banking.

    Sure you won’t be able to sign in, but the captcha works fine in both picture and audio formats. :)

    Yes, the audio sample Kiwibank offers is probably relatively easy to parse, but hey, accessibility…

  6. Pete says at 2005-01-06 12:01:

    Brett, the Kiwi bank captcha is similar to the one Hotmail uses. It reads the letters and numbers in the image. My experience is that many users (like Lillian above) have difficulties understanding English characters spelled out. In that sense the limited domain of numbers may be easier.

    I understand that the Kiwi bank captcha is annoying if you have to use it every time you log in. Wouldn’t it be safer to use some sort of device to increase security instead of relying on a captcha? My bank provides customers with a digipass which of course is impossible to use if you are blind.

  7. tom sherman says at 2005-01-08 21:01:


    Great post. I’ve written some items in my blog related to CAPTCHAs as they are employed in blogs. Right now, in fact, there’s a pretty contentious debate going on between Jay Allen and Anil Dash (of Movable Type’s Six Apart) and James Seng, who wrote the SCode CAPTCHA plugin for MT. They don’t seem to like each other. Whatever.

    Anyway, I think CAPTCHAs are the solution for ending comment spam on blogs, but their implementation needs to be improved. We need the audio CAPTCHA in combination with the visual CAPTCHA. I’m not sure James will do this development, but I’d like to see it happen. I certainly think it’s a better solution than the arms race of maintaining a blacklist.

  8. Pete says at 2005-01-09 02:01:

    Tom, it would be a fairly simple extension to Movable Type to do the combination captcha I suggest above. Rendering of the audio captcha does not have to happen in real time. Using my application above you could upload 10,000 or so generated audio files to your blog and have a simple script selecting the appropriate file to play. Scripts for image based captchas are widely available.

  9. solo says at 2005-01-21 20:01:

    My contribution to a CAPTCHA Like:

    The description: link Description

    The test form: Test form

    I tested it using Fangs. I hope I can translate it someday. Sorry for the inconvenience of using only French.

  10. Isaac Lin says at 2005-02-01 18:02:

    With a constrained vocabulary, natural language recognition systems are quite good, so I believe the audio clip in the form you propose won’t stop automated processing.

  11. Simon says at 2005-02-01 21:02:

    Why don’t we have Turing (esque) tests, not captchas? The website would select a question from a list, then provide a random, but correct, answer.

    Something like:

    • “what is the month mentioned above?”
    • “Which is the biggest number in the list above?”
    • …etc.

    This way the test is fully accessible and, given some care constructing the logic, as foolproof as captchas; because we all know that captchas are foolproof don’t we ;)

  12. Pete says at 2005-02-03 21:02:

    Simon, the problem with logic tests is that a) you have to create tests manually and b) they tend to be hard for users with cognitive disabilities (which is a much larger group than you may think).

    Isaac, I agree that natural language recognition is getting better. However, my idea was that you also have to implement some semantic understanding if you want to break the audio based captcha (you can not just extract the numbers from “12, 51, 9. Those were the 3 numbers” as it is only the first three numbers that are relevant). I do not think there will be a captcha that is both accessible and completely unbreakable.

    The porn site solution to captchas will always be an option.

  13. tom sherman says at 2005-03-08 17:03:

    By the way, Jeff Barr has written a WordPress Comment Verification extension that follows Simon’s idea (comment #11) above. From an accessibility standpoint, it’s better than an image CAPTCHA, although it can be goofy to look at the first time you fill it out. :)

  14. tom sherman says at 2005-03-08 17:03:

    Pete, although I appreciate the link to Cory Doctorow’s highly speculative little blog entry on porn sites circumventing image CAPTCHAs, it’s just that: speculation. It’s been bounced around in the blogosphere echo chamber a heckuvalot, mentioned in the Six Apart Guide to Fighting Comment Spam, etc., etc., but it’s still a damn blog entry. I’m sick of it.

    Spam is all about resources and picking the low-hanging fruit. Circumventing CAPTCHAs ain’t easy. Why bother when there are ripe apples to be picked?!

  15. Pete says at 2005-03-08 18:03:

    Tom, thank you for your comment. I have also been a bit sceptical about the porn site technique. Anyone know of a documented case where this has actually happened?

    Regarding Jeff’s WordPress implementation I am quite sure that it will work for smaller sites and blogs where spammers can’t be bothered to adjust their software. But if you have a large site Jeff’s solution isn’t good enough as it would take a spammer a few seconds to configure the software to pass the test.

  16. Dave says at 2005-03-09 14:03:

    I appreciate this attempt to work up a solution and the discussion that this and the paper has engendered. Mark’s Lynx scenario, however, does not take into account that the Lynx user probably is not using a multi-channel sound card or cannot easily just “listen” to the audio clip. The Dive Into Accessibility link gave me a page not found (404) error. A full accessibility case study would include someone without a sound card.

  17. Darrel says at 2005-03-18 18:03:

    “Spam is all about resources and picking the low-hanging fruit. Circumventing CAPTCHAs ain’t easy. Why bother when there are ripe apples to be picked?!”

    I agree. And, as such, most captchas to prevent spamming are simply overkill.

    The best one I’ve seen is just a simple plain text one:

    enter this in the field below to prove you are a human: 24

    It’s text, easy to read, easy to screen-read, yet likely complex enough to discourage most automated spamming tools.

  18. Chris says at 2005-03-26 05:03:

    You’ll not likely see too many blind people without some sort of sound card. Screen readers being what they are, they all have facilities for using software synthesizers. From my perspective, it’s heaps easier to haul my laptop out on a plane to write some code or fire up my instant messaging client o’ choice now that I use the in-built software synthesizer that comes with JAWS than it would have been to deal with hardware–and, though I don’t know *VAST* amounts of other blind people, the ones that I do know also, at the very least, *HAVE* a multi-channel sound card, or have a card for which multi-channel emulation is implemented through WDM. I’m just wishing the audio CAPTCHAs were more prevalent on sites such as Yahoo!, as simply joining a group right now is either requiring me to wade through eight levels of customer support, which is tedious, or enlist the help of a sighted individual, which I can’t do without becoming ill and wanting to do bad things to the contingent of both blind and sighted people who tend to see that sort of thing as “no big deal” and, consequently, slow down any attempt to implement new standards to approximately 1/128 the speed of bureaucracy.

  19. nobody says at 2005-04-22 02:04:

    If you’re deaf and blind, the ‘net might not be the thing for you any more than driving a car would be. It just wouldn’t work.

    I’d make a page usable, with alt texts for example, but I’m not going to cater to every single possible problem a human being could conceivably have, that’s ridiculous. If you’re that broken, you should be in some kind of managed care, not emailing your other deaf-dumb-blind buddies who can’t see/hear/understand the email anyway.

    What’s next, a keyboard that consists of nothing but four giant keys for people with bad cases of Downs? Please.

  20. Beoran says at 2005-05-09 09:05:

    Nobody above, ever heard of a Braille rule and a Braille keyboard? People who are both deaf and blind can still feel their way around the world, and communicate using the sense of touch. And computers and the internet are a big help to these people. Your comments are disgraceful.

  21. Isaac Lin says at 2005-06-06 16:06:

    By “natural language recognition”, as opposed to speech recognition, I am referring to semantic understanding of a speech phrase. A number of automated assistance systems need to understand what a customer is asking for and provide automatic replies. The general case is still quite difficult, but when the questions are in a known domain, then this problem can be solved pretty well.

  22. Rick Huby says at 2005-07-27 14:07:

    Excellent article, really gives me a lot of opportunity to provide a solid and accessible solution to our clients. As with everything to do with Accessibility you are never going to satisfy EVERY eventuality, but like others mentioned, a person who uses an audio browser is likely to have audio capabilities in their computer.

    Your article also made me think about the fact that spambots are not going to be attacking ALL sites and that really evaluating which sites will need protection and which won’t will help cut out such messing about for ALL web users.

  23. Doug de la Torre says at 2005-08-30 00:08:

    Captchas are a nice alternative for public ‘collaboration’, especially when you want real feedback without turning users away by requiring logins (and lots of annoying info like address, phone, etc).

    Glad to see you are exploring alternatives to make them accessible to everyone.

  24. José Moya says at 2005-09-29 23:09:

    I want to ask you a question I haven’t seen in the previous comments. Don’t you think opening the sound file IN THE SAME WINDOW (thus closing the form) is a bit nasty? When I heard the numbers I thought: “Damn! I’ll forget these numbers before I can click “back” to fill the form”.

  25. Mardeg says at 2005-10-25 04:10:

    I think that combining methods can be done more creatively, such as displaying a numbered list of images/words and a numbered list of instructions, each telling you a different sequence of words to enter, then having the audio file tell you which instruction to follow.

    As for visually impaired people not being able to see images and browsers with images turned off, this is easily solved: instead of actually showing the images, convert them into CSS shrunken text logos that have the illusion of images. For that matter, the list of instructions themselves can be done like this as well. My link shows one example of the CSS method at work.

    Most of all, don’t limit yourself to one set format, having many different systems combining differently each time varying the complexity will increase the time needed by bots to recognise which system/combination is in use, hopefully beyond the expiry point.

  26. alikon says at 2005-11-04 12:11:

    Is it possible to have the source code in PHP for the audio captcha?


  27. Pete says at 2005-11-04 19:11:

    alikon: The source code to create the audio files is written in C-sharp and is available from the references section above.

  28. Sophie says at 2005-11-29 19:11:

    Thanks for this, but doesn’t it fail on the first requirement ?

    Culture… What about people who don’t speak english ? The audio treatment makes the captcha a little hard to understand. Would you be able to generate captchas according to the language attribute of the page or form ?

  29. u24 says at 2005-12-09 03:12:

    re: tom sherman “Circumventing CAPTCHAs ain’t easy.”

    for circumvention, see: breaking captchas without OCR

    for OCR attacks, see ocr research team pwtcha captcha decoder

  30. RichGC says at 2005-12-28 18:12:

    I recently added audio to a captcha test and I thought I’d throw in my experience from a technical point of view.

    You can use TTS or recording to make the base numbers, for example, 1,2,3,11,12,20,30.

    Then convert them into mp3 WITHOUT any ID tags (I used cdex software); this allows you to join the files without any issues.

    Then any web scripting language can generate a temp file on the fly that gets cleaned up after a set time.

    For your test it would be like:

    EXPIRECODE.mp3 = 70.mp3 + pause.mp3 + 30.mp3 + 3.mp3 + pause.mp3 + 12.mp3

    Also a note on playback: do not bother trying to use embed or object to call a player, as no solution works across platforms and browsers. Instead you can use Flash to play the mp3 in a loop from within the webpage (most people have it) and provide a download link for those who do not.
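    As an illustration only (this is not RichGC’s actual script), the Python sketch below works out which base clips would be joined for a given test. It assumes that the example above encodes the numbers 70, 33 and 12, that base recordings exist for 1 to 19 and for the tens, and that each file is named after its number; the function name `clip_sequence` is made up. The joining of the mp3 data itself is not shown.

```python
def clip_sequence(numbers):
    """List the base clips to join, in order, for numbers from 1 to 99."""
    clips = []
    for index, number in enumerate(numbers):
        if not 1 <= number <= 99:
            raise ValueError("only 1 to 99 are covered by this sketch")
        if index > 0:
            # Separate consecutive numbers with a short silence.
            clips.append("pause.mp3")
        tens, units = divmod(number, 10)
        if number < 20 or units == 0:
            # Assumed to exist as a single base recording.
            clips.append(f"{number}.mp3")
        else:
            # Compose e.g. 33 from the "30" and "3" recordings.
            clips.append(f"{tens * 10}.mp3")
            clips.append(f"{units}.mp3")
    return clips


print(clip_sequence([70, 33, 12]))
# ['70.mp3', 'pause.mp3', '30.mp3', '3.mp3', 'pause.mp3', '12.mp3']
```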

  31. Peter says at 2006-01-04 17:01:

    Sophie: It would be very difficult to cater for all possible visitor languages. The idea about numbers is that they are from a limited domain of the English language. Numbers are also one of the first things you learn when adopting a new language and the idea is that simple numbers are likely to be understood. Also, if your main content isn’t localized I guess translating the audio captcha isn’t a priority.

    Rich: Great idea about the Flash player/download combination.

Peter Krantz, (remove giraffe).