Methods for measuring text readability

Content is often overlooked when working with a web site to increase accessibility. Even if your web site follows all W3C recommendations, it will still be inaccessible if your content is difficult to comprehend. Let’s have a look at how you can measure text readability in English, Spanish, Swedish, Danish and French to get a feel for the readability of your content.

In this article we will give a brief overview of formulas for measuring readability and look at an online tool for measuring the readability of your texts.

As previously mentioned here and elsewhere, you can do a lot to increase accessibility by working with your content.

The technical aspects of accessibility are often easier than the content aspects. If you are reading this you probably already know how to create an accessible web site by using the W3C recommendations. You also know how to use the W3C validator to test your templates before you put them onto your live site. Content, on the other hand, varies over time and is often created by a large number of people, each with their own personal style of writing. It is also more difficult to test using an automated tool.

Measuring readability of a text

There are a number of methods to measure the readability of a text. Most of them are based on multiple correlation analysis, where researchers have selected a number of text properties (such as words per sentence, average number of syllables per word, etc.) and then asked test subjects to grade the readability of various texts on a scale. By looking at the text properties of these texts it is possible to estimate how much a property such as “words per sentence” influences readability.
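
To make the two text properties mentioned above concrete, here is a minimal Python sketch of how they might be computed. The function name text_properties is hypothetical, and the syllable count is a crude vowel-group heuristic that is only roughly right for English. It is an assumption for illustration, not the method used by any particular readability tool.

```python
import re

def text_properties(text):
    """Return (words per sentence, syllables per word) for a text.

    Syllables are approximated by counting groups of consecutive
    vowels, with a minimum of one syllable per word.
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # A word is a run of letters, optionally with inner apostrophes.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", word.lower())))
        for word in words
    )
    return len(words) / len(sentences), syllables / len(words)

words_per_sentence, syllables_per_word = text_properties(
    "The cat sat. The dog ran away."
)
```

For the sample text this gives 3.5 words per sentence and 8 syllables spread over 7 words. A readability formula then combines such averages with fixed weights.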

Some important facts about readability measurement methods:

  1. Readability index formulas only work for a specific language.
  2. Readability does not equal understandability.
  3. Readability scoring is not an exact science. For example, it does not consider the layout of the text (paragraphs) or the actual content (are the words from a specific domain?).

Having said that, we can move on to testing our content.

The readability index calculator

I have created a simple tool where you can calculate a readability index score for a text of your choice. To see a comparison, please try it with a legal document such as a EULA (end user license agreement), which is typically difficult to read. The calculator uses the following formulas:

After you are done testing your own texts, come back here and read more on what you can do to increase readability of your content.

What you can do to improve readability

If you have a large web site with many editors you should make sure all of them have a basic understanding of how to write for the web. Implementing a publishing policy and making sure it is used will ensure that your visitors get a consistent style when visiting your web site. The policy could include the following guidelines:

  1. Explain abbreviations and acronyms the first time they are used (do not rely on markup alone).
  2. Provide a subset of your content in basic English or the corresponding basic version of your language. Sweden has an organization that provides training in easy-to-read Swedish. Your country/language may have similar institutions.
  3. Try to keep sentences short.
  4. Avoid symbolic language (metaphors).
  5. Avoid complicated words. Make sure you are writing from your user’s point of view. Use their terminology instead of your own.
  6. Write for the web (see Steve Krug’s book Don’t Make Me Think). Writing for the web differs from writing e.g. a scientific report. For more information see the references section below.

References and more information

Comments

  1. Dan says at 2005-09-23 09:09:

    I can’t believe how often content is neglected when working with accessibility. Thank you for a great article!

  2. otto says at 2005-09-25 19:09:

    captchas being used here ?

  3. Matteo Balocco says at 2005-09-28 11:09:

    For your information, you could also add the Gulpease Index formula for Italian texts.

  4. Pete says at 2005-09-28 14:09:

    Matteo, thank you. I will try to add it soon.

  5. Baldo says at 2005-09-28 14:09:

    From the roberto-ricci site, the output value is not always between 0 and 100 (I’ve also reached 108).

    Isn’t there any other Italian program to test text readability?

  6. Adam Van Den Hoven says at 2005-09-28 18:09:

    I think that this sort of thing would be generally useful. Would you be willing to share your source code? I’d like to adapt it for our content management system so that I can provide a mechanism for our clients to check the readability of their content.

  7. Pete says at 2005-09-29 17:09:

    Adam, absolutely! The code is based on the code found at I Love Jack Daniels(!). Formulas for the other languages are documented at the University of Texas link above. If you want my specific PHP implementation, I would be happy to send it in an e-mail to you. Contact me at pete giraffe standards-schmandards.com (substitute Giraffe with the you-know-what character).

  8. José Moya says at 2005-09-30 00:09:

    Hi! I’m a Spanish teacher and I’m glad to find this good link to a method of measuring readability.

    In exchange, here’s a (non-perfect) algorithm to count syllables in Spanish (it will not work with some vowel combinations or with some foreign words adopted by Spanish).

    …too many years since I studied autosegmental phonemics!

    syllable = [beginning] + middle + [ending]

    beginning = {hard [+ liquid] or double_consonant or non-hard or “t” [+ “r”]}
    double_consonant = {“ch”, “qu”, “gu”, “rr”}
    hard = {b, c, d, f, g, j, k, p}
    non-hard = {nasal, liquid, “s”, “v”, “w”, “x”, “y”, “z”, “h”}
    nasal = {“n”, “m”, “ñ”}
    liquid = {“l”, “r”}

    middle = [{i, u}] + {a, e, i, o, u} + [{i, u}]
    ending = {liquid, “x”, “c”, “f”, “g”, “j”, “k”, “p”, “t” or “p” or “z”} or {[nasal or “b” or “d”] + [“s”]}

    Special at end of word:

    • “y” is a valid vowel when ending a word (convert “y” at end of word into “i” before parsing)
    • “z” is valid as a second ending consonant (i.e. after “n”) when at the end of a word.
    • “h” has no sound in Spanish; it can be used to determine the beginning of a word, but it can also occur at the absolute end of a word (delete “h” at end of word before parsing).

    Hope you manage to understand it. There are some exceptions: {u/i}+{a/e/o} can be two syllables or one, depending on the word and the presence of an acute accent: there’s always a syllable break between (acute) í or (acute) ú and a, e or o.

  9. José Moya says at 2005-09-30 00:09:

    P.S. After visiting UTexas, I made the counts by hand (with a bit of help from my word processor). Your test calculates the right score for my number of syllables, while UTexas calculates a lower number of syllables (thus a higher score).

  10. José Moya says at 2005-09-30 15:09:

    P.S.2 here is a simpler algorithm:

    1. replace [qg]u[ei] with [gq][ei]
    2. replace ü with u (güe->gue,not ge)
    3. replace -y at end of word with -i
    4. Delete h
    5. delete [iu] (without acute) before or after [aeiou]
    6. Substitute [áéíóú] with [aeiou]
    7. count [aeiou].
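
The seven steps above can be sketched in Python roughly as follows. This is a literal reading of the list, not the calculator's actual code, and the function name count_syllables_es is hypothetical. Two details are assumptions: step 5 is done in two passes so that a pair such as "ui" loses only one letter, and accented neighbours are not matched in step 5, so a word like canción is over-counted.

```python
import re
import unicodedata

# Step 6 lookup table: acute-accented vowels to plain vowels.
_ACUTE = str.maketrans("áéíóú", "aeiou")

def count_syllables_es(word):
    """Approximate the number of syllables in one Spanish word."""
    # Normalise so that accented letters are single characters.
    w = unicodedata.normalize("NFC", word.lower())
    w = re.sub(r"([qg])u([ei])", r"\1\2", w)  # 1. que -> qe, gui -> gi
    w = w.replace("ü", "u")                   # 2. güe -> gue, not ge
    w = re.sub(r"y$", "i", w)                 # 3. final y acts as a vowel
    w = w.replace("h", "")                    # 4. h is silent
    # 5. drop unaccented i/u next to a plain vowel, first the ones
    #    before a vowel, then the ones after a vowel
    w = re.sub(r"[iu](?=[aeiou])", "", w)
    w = re.sub(r"(?<=[aeiou])[iu]", "", w)
    w = w.translate(_ACUTE)                   # 6. strip acute accents
    return len(re.findall(r"[aeiou]", w))     # 7. count vowels

counts = {w: count_syllables_es(w) for w in ["guerra", "pingüino", "día", "hoy"]}
```

With this reading, guerra counts as 2 syllables, pingüino as 3, día as 2 and hoy as 1.
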
  11. Pete says at 2005-10-01 12:10:

    José: Thank you for your information. I will have a look at it and see if I can improve the Fernandez Huerta calculation.

  12. Chris Lloyd says at 2005-10-02 06:10:

    If you just put in “Hello” it says that the level is 8 and the score is around 35. Also, I get a lower score if I put in “My name is Chris” as opposed to “My name is Chris Lloyd”.

  13. Pete says at 2005-10-02 21:10:

    Chris: Scoring of short texts is inaccurate. The longer the text is, the better the measurement becomes. If you look at the formulas you will see that average word count and syllable count impact the score a lot. So, measuring “Hello” does not really make sense.
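
A small worked example shows why a one-word text scores so oddly. It assumes the English score is the standard Flesch Reading Ease formula, which is an assumption since the calculator's source is not shown here, and it takes hand-counted values for "Hello" (one sentence, one word, two syllables). The function name is illustrative.

```python
def flesch_reading_ease(sentences, words, syllables):
    """Standard English Flesch Reading Ease from raw counts."""
    asl = words / sentences   # average sentence length in words
    asw = syllables / words   # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw

# "Hello": 1 sentence, 1 word, 2 syllables (counted by hand).
hello_score = flesch_reading_ease(sentences=1, words=1, syllables=2)
print(round(hello_score, 2))  # 36.62
```

With only one word, the averages are just that word's own values, so a single two-syllable word is enough to pull the score down to the mid-thirties.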

  14. Martin Kliehm says at 2005-10-04 16:10:

    The readability for German texts can be measured by several formulas:

    Flesch Reading Ease: FRE = 206.835 – (1.015 x ASL) – (84.6 x ASW)

    where ASL is the average sentence length and ASW is the average number of syllables per word. Scores are between 0 and 100, though you have to be careful when interpreting the scores because they are not made for German (and can even result in scores above 100).

    Or another:

    AVI: AVI = 180 – (words / sentences) – 58.5 x (syllables / words)

    Also gives scores between 0 (difficult) and 100 (easy).
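
The two formulas quoted in the comment above can be written as a short Python sketch. The function name german_scores and the sample counts are hypothetical. The syllable, word and sentence counts are assumed to come from elsewhere.

```python
def german_scores(sentences, words, syllables):
    """Return the FRE and AVI scores for the given raw counts."""
    asl = words / sentences   # average sentence length in words
    asw = syllables / words   # average syllables per word
    return {
        "FRE": 206.835 - 1.015 * asl - 84.6 * asw,
        "AVI": 180 - asl - 58.5 * asw,
    }

# One sentence of four one-syllable words (illustrative counts only).
scores = german_scores(sentences=1, words=4, syllables=4)
```

Neither result is clamped, so very simple input like this lands above 100 (about 118.2 for FRE), as the comment notes can happen.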

  15. Martin Kliehm says at 2005-10-04 17:10:

    Even better might be one developed for German texts like the “Wiener Sachtextformel”, which results in scores approximately corresponding with the reader’s recommended age, so 4 would be very easy and 15 very hard:

    WSTF = 0.1935 MS + 0.1672 ASL + 0.1297 IW – 0.0327 OS – 0.875

    MS is the percentage of words with 3 or more syllables, ASL is the average sentence length (number of words per sentence), IW is the percentage of words with 6 or more characters, OS is the percentage of words with only one syllable.
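
As a rough sketch, the Wiener Sachtextformel as quoted above translates directly into Python. The function and parameter names are hypothetical, and the four inputs are assumed to be computed beforehand (percentages given as numbers from 0 to 100).

```python
def wiener_sachtextformel(ms, asl, iw, one_syllable):
    """WSTF score from the four text properties in the comment above.

    ms:           percentage of words with 3 or more syllables
    asl:          average sentence length in words
    iw:           percentage of words with 6 or more characters
    one_syllable: percentage of words with only one syllable (OS)
    """
    return (0.1935 * ms + 0.1672 * asl + 0.1297 * iw
            - 0.0327 * one_syllable - 0.875)

# Illustrative values only, not measured from a real text.
score = wiener_sachtextformel(ms=10, asl=15, iw=30, one_syllable=50)
```

With these made-up inputs the score is about 5.8, which sits near the easy end of the 4 to 15 range described above.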

  16. franz says at 2006-01-08 07:01:

    thanks you had help me a lot

  17. Dr.Ahmed says at 2006-03-03 19:03:

    I want some information about Measuring text readability Dr.Ahmed

  18. zip says at 2006-03-28 10:03:

    wow this is a great tool, maybe making it into a validator type thing? where u can put in a site and it will measure the readability

Peter Krantz, peter.krantz@giraffe.gmail.com (remove giraffe).