Take a look at sentence #1 below, which most Readability indicators (such as the Flesch-Kincaid Grade Level [1],[2],[3])¹ rate as having 'High' Readability:
# | Sentence | Flesch-Kincaid Grade Level |
1 | The horse raced past the barn fell. | 0.0 |
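The 'High' Readability reported by Readability indicators has simple causes (see [4]): the sentence is short, and every word in it has a single syllable. The Flesch-Kincaid Grade Level depends only on the average number of words per sentence and the average number of syllables per word, so it cannot register syntactic complexity.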
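To make the arithmetic concrete, here is a minimal Python sketch of the Flesch-Kincaid Grade Level formula (0.39 × words per sentence + 11.8 × syllables per word − 15.59). The function names, the rough syllable-counting heuristic, and the choice to floor negative scores at zero are assumptions made for illustration; they are not taken from any particular Readability tool, and real tools count syllables in different ways.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count vowel groups, then drop common silent endings."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    groups = len(re.findall(r"[aeiouy]+", w))
    # A trailing "e" is usually silent ("horse"), except in "-le" ("table").
    if w.endswith("e") and not w.endswith("le"):
        groups -= 1
    # A trailing "ed" is usually silent ("raced") unless it follows t or d ("tormented").
    elif w.endswith("ed") and len(w) > 3 and w[-3] not in "td":
        groups -= 1
    return max(groups, 1)


def flesch_kincaid_grade(text: str, floor_at_zero: bool = True) -> float:
    """Flesch-Kincaid Grade Level of `text`.

    Flooring negative results at 0.0 is an assumption, made here so that
    very short sentences report 0.0 rather than a negative grade.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    grade = (0.39 * len(words) / len(sentences)
             + 11.8 * syllables / len(words)
             - 15.59)
    return max(grade, 0.0) if floor_at_zero else grade


if __name__ == "__main__":
    print(flesch_kincaid_grade("The horse raced past the barn fell."))  # 0.0
```

Sentence #1 has seven words and seven syllables, so the raw score is 0.39 × 7 + 11.8 × 1 − 15.59 ≈ −1.06, which is floored to 0.0. Nothing in the formula looks at the structure of the sentence. Scores for longer sentences may differ slightly from those reported by other tools, because syllable counting is not standardized.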
However, sentence #1 is a well-known garden-path sentence that has been studied by cognitive psychologists for decades. It is known to place a very heavy load on Working Memory. The reader discovers that he or she was 'led up the garden path' only on reaching the verb 'fell'; at that point, the sentence has to be reprocessed from scratch. Even university students who participated in such studies were confused when faced with sentences such as #1.
Now take a look at sentence #2, where the addition of grammatical 'function words' has made the sentence much easier to comprehend (although an individual with a low Working-Memory span may still find sentence #2 uncomfortable to process). If the sentence is read aloud, notice also how much easier it is to comprehend when a pause is inserted at the point indicated (after 'barn'). Some psycholinguists believe that pauses at the right points (and changes in intonation around such points) help the listener work out the structure of a clause and assimilate its essence before moving on to complete the sentence. An intonation cue is an additional aid that Text-To-Speech software may use to change intonation around an Object Gap [8],[9],[10],[11].
# | Sentence | Flesch-Kincaid Grade Level |
2 | The horse that was raced [intonation cue] past the barn [pause] fell. | 0.0 |
While few authors use sentences such as sentence #1, take a look at sentence #3 below, which is more common, especially when people write the way they speak (e.g., in novels). Here again, the addition of a few grammatical 'function words' (and the 'pause'), as in sentence #4, greatly improves readability.
# | Sentence | Flesch-Kincaid Grade Level |
3 | The defendant claimed that the boy the dog bit tormented him for months. | 4.8 |
4 | The defendant claimed that the boy whom the dog bit [pause] tormented him² for months. | 5.0 |
Clearly, most authors would like their ideas to reach the widest audience, including individuals with low Working-Memory spans. However, the tools used by most authors (e.g., word processors) do not report anything out of the ordinary in sentences #1 and #3³. In the absence of the right diagnostic tools, many authors rely on Readability indicators (such as the Flesch-Kincaid Grade Level indicator shown above) to judge the readability of passages. In fact, many governments have mandated that content intended for a mass audience should be at a Flesch-Kincaid Grade Level of around 8 (but see [4] to understand how the Grade Level of a passage is assessed by expert teachers). However, most Readability indicators work on the passage as a whole, and not on individual sentences; as a result, structures such as #1 or #3 above may be overlooked during the editing process and end up being published.
There is now a vast library of content in electronic form that is 'read aloud' every day by commuters (among others) using Text-To-Speech software (TTS), or translated into other languages using Machine Translation software (MT). Both categories of software can benefit from the insertion of grammatical 'function words' and structural cues. This can be verified easily for sentences #2 and #4 above. The insertion of grammatical 'function words' and other structural cues improves the output of Text-To-Speech software significantly (try substituting a comma in place of 'pause'). Machine Translation software can also produce better translations when complex structures are simplified before they are translated. However, only a radical rewrite can help when sentences are extremely complex; technology can help only marginally (and not at all, if a sophisticated parser cannot parse such sentences reliably).
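The comma substitution suggested above can be sketched in a few lines of Python. The sketch assumes that cues are written exactly as in sentences #2 and #4, as the bracketed markers '[pause]' and '[intonation cue]'. The function names are hypothetical. The second function emits SSML, a W3C markup that many Text-To-Speech engines accept; the 300 ms break length is an arbitrary illustrative value, and the intonation cue is simply dropped because its mapping to speech markup is not specified here.

```python
import re
from xml.sax.saxutils import escape


def cues_to_plain_text(annotated: str) -> str:
    """Replace '[pause]' with a comma and drop '[intonation cue]' markers."""
    text = re.sub(r"\s*\[pause\]", ",", annotated)
    text = re.sub(r"\s*\[intonation cue\]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def cues_to_ssml(annotated: str, pause_ms: int = 300) -> str:
    """Render '[pause]' as an SSML <break/>; the break length is an assumption."""
    text = escape(annotated)  # protect &, < and > before adding markup
    text = re.sub(r"\s*\[intonation cue\]", "", text)
    text = re.sub(r"\s*\[pause\]\s*", f' <break time="{pause_ms}ms"/> ', text)
    return f"<speak>{text.strip()}</speak>"


if __name__ == "__main__":
    s2 = "The horse that was raced [intonation cue] past the barn [pause] fell."
    print(cues_to_plain_text(s2))
    print(cues_to_ssml(s2))
```

Feeding the plain-text result for sentence #2 ("The horse that was raced past the barn, fell.") to a Text-To-Speech engine is a quick way to hear the effect of the pause.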
There is a need for a technological solution that automatically identifies such complex grammatical structures in various languages, and indicates where grammatical 'function words', 'pauses', and 'intonation cues' may be inserted to reduce syntactic complexity.