Decades ago, psycholinguistics researchers established that certain syntactic structures impose a high 'cognitive load' on readers (see, e.g., [1]).
However, even today, automated tools to detect and correct most such complex structures are not available (even for English sentences). Even the most sophisticated parsers have difficulty parsing complex 'real world' sentences reliably. Statistical parsers now dominate the world of parsing because of their 'wide coverage', but they have significant weaknesses: human languages are extremely complex, and the training data used by statistical parsers cannot capture all the variations of language use. Chomsky pointed this out decades ago, arguing that there is no substitute for a meticulous analysis of the rules of grammar.
Some examples of unexpected behavior by statistical parsers are provided below; we focus on statistical parsers since they are the dominant class of parsers today.
Marking clause boundaries in a complex sentence is a fairly difficult problem for most statistical parsers, and if clause boundaries are marked incorrectly, downstream applications such as Text-To-Speech and Machine Translation are severely affected. Let us consider sentence #3 below, in which the parser has to decide whether the verb 'believed' takes a Noun Phrase argument (i.e. 'the report') or a clause argument (i.e. 'the report was true').
For #3, most parsers will find that 'the report was true' looks like a complete clause (i.e. it appears to have a subject and a predicate) and decide that the verb 'believed' takes this clause as its argument (the parser may also check that 'The performer believed the report' is not a sentential subject). The argument of the verb 'believed' is shown in square brackets in #4 below.
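One way to see which analysis a given statistical parser actually chose is to inspect its labeled output directly. The fragment below is our illustrative sketch, not part of any cited system: it uses spaCy (assuming the 'en_core_web_sm' model is installed) purely as an example of a dependency parser, and its output will vary by model and version.

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes this model is installed
doc = nlp("The performer believed the report was true.")

verb = next(t for t in doc if t.lemma_ == "believe")
for child in verb.children:
    # 'ccomp' marks a clausal complement; 'dobj' a plain NP object.
    print(child.dep_, "->", " ".join(w.text for w in child.subtree))
```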
So far so good. Now let us check the reasoning applied by the parser to #5 below (an awkward construction used merely to make a point). This sentence has the same verb 'believed', but in this case the parser should behave differently, i.e. it should realize that there are two clauses joined by 'and'.
However, for some statistical parsers, coordination of the Noun Phrases 'the reporter' and 'the report' trumps other cues. As a result, the parser marks 'the reporter and the report was true' as a clause, and marks this flawed clause as the argument of the verb 'believed' (as shown in square brackets in #6).
Clearly, the above problem would not have occurred if the parser had a rule for 'number agreement' (i.e. that the predicate should be 'were true' if the subject has the plural feature). However, most statistical parsers have a minimal set of rules.
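For concreteness, here is a minimal sketch of such an agreement check, written against spaCy v3's dependency output. The function name and the simplifying assumption that any coordinated subject carries the plural feature are ours; whether a given parser produces the flawed attachment in the first place varies by model.

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # assumes this model is installed

def agreement_violations(doc):
    """Yield (subject, verb) pairs whose number features clash."""
    for tok in doc:
        if tok.dep_ != "nsubj":
            continue
        verb = tok.head
        # A subject with a 'conj' child is coordinated, hence plural here.
        subj_plural = (any(c.dep_ == "conj" for c in tok.children)
                       or "Plur" in tok.morph.get("Number"))
        if subj_plural and "Sing" in verb.morph.get("Number"):
            yield tok, verb

doc = nlp("The performer believed the reporter and the report was true.")
for subj, verb in agreement_violations(doc):
    print(f"possible agreement error: '{subj.text} ... {verb.text}'")
```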
One expects statistical parsers to exhibit different behavior for different verbs, in view of their differing patterns of use. Let us consider the double-object verb 'sent', which can be either in active voice (in #7, the performer does the 'sending') or in passive voice (in #8 and #9, some undefined entity does the 'sending' to the performer). Ideally, #8 would have been written as shown in #9 (see [2] for a set of psycholinguistics experiments conducted using such sentences).
Oddly, if we run the parser on #8, it once again chooses to mark 'the report was pleased' as a clause and then assigns this clause as the argument of the verb 'sent' (the clause argument is shown in square brackets in #10).
The point being made is that the output of statistical parsers can frequently be defective. Some parsers may not make the mistakes shown above, but they will make equally severe mistakes on other kinds of structures. This is why it is difficult for a downstream application to take the output of a parser at face value. The above parsing problems arise because statistical parsers (the dominant approach) do not systematically look for defects in the parse-trees that they generate. Rule-based parsers, on the other hand, which have extensive rules for grammar checking, are out of favor because they do not scale up to handle 'real world' sentences. Clearly, there is a need for a hybrid approach that combines the strengths of the two, as sketched below.
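To make the idea concrete, a hybrid of this kind can be quite light-weight: run the statistical parser as usual, then run rule-based sanity checks over the resulting tree and flag parses that fail. The following sketch is ours, not a description of any existing system; it continues the earlier fragments, reusing 'nlp' and 'agreement_violations' from above.

```python
def agreement_check(doc):
    # Wrap the earlier agreement_violations() as a named grammar check.
    for subj, verb in agreement_violations(doc):
        yield f"number agreement: '{subj.text} ... {verb.text}'"

def checked_parse(text, checks):
    """Parse statistically, then vet the tree with rule-based checks."""
    doc = nlp(text)
    problems = [msg for check in checks for msg in check(doc)]
    return doc, problems   # callers may reject, re-rank, or merely flag

doc, problems = checked_parse(
    "The performer believed the reporter and the report was true.",
    [agreement_check])
if problems:
    print("parse is suspect:", problems)
```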
A defective parse-tree could result, say, in the insertion of cues (such as 'who was' or 'pause') at the wrong sites in the sentence (known as 'False Positives'). Downstream applications such as Text-To-Speech and Machine Translation are particularly sensitive to False Positives, as the wrong insertion of the cue 'that was' will wreak havoc with their output. False Negatives (e.g. failing to insert a required syntactic cue), on the other hand, do little harm, since the sentence is left unchanged; however, the parser then adds no value. Text-To-Speech software would therefore prefer False Positives to be minimized, even if False Negatives are high.
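This preference can be made concrete at evaluation time: scoring cue insertion with an F-beta measure using beta < 1 weights precision (few False Positives) above recall (few False Negatives). A self-contained sketch, with invented counts purely for illustration:

```python
def f_beta(tp, fp, fn, beta=0.5):
    """F-beta over cue insertions; beta < 1 favours precision,
    i.e. it penalises False Positives more than False Negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Two hypothetical cue-insertion systems scored on the same text:
print(f_beta(tp=8, fp=0, fn=4))    # cautious: misses cues, none wrong
print(f_beta(tp=12, fp=4, fn=0))   # aggressive: four wrongly placed cues
```

With beta = 0.5, the cautious system scores higher (about 0.91 versus 0.79), matching the stated preference of Text-To-Speech software.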
Given the limitations of Deep Parsers and the additional resources that they require, TTS programs instead use fast, shallow parsing techniques to determine where to insert cues such as 'pauses' and changes in intonation. This approach works for simple sentences but is far from ideal for more complex ones: the absence of simple cues, such as punctuation, makes it very difficult for a Shallow Parser to determine where to insert the 'pauses' and changes in intonation.
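The dependence on punctuation is easy to demonstrate with a toy version of such a heuristic (the '<pause>' marker and the rule itself are illustrative, not the behavior of any particular TTS engine):

```python
import re

def insert_pauses(text):
    """Shallow heuristic: anchor a pause marker on punctuation only."""
    return re.sub(r"([,;:])\s*", r"\1 <pause> ", text)

print(insert_pauses("When the performer arrived, the crowd cheered."))
print(insert_pauses("When the performer arrived the crowd cheered"))
# The second call returns its input unchanged: with no punctuation,
# the heuristic has no cue for the clause boundary after 'arrived'.
```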