Deep Parsing

Parsing natural language sentences is a very complex task. Decades after Chomsky published the Government and Binding Theory [1][2][3][4][5][6][7][8], most natural language parsers are still unable to parse 'real world' sentences 'satisfactorily'.

Some applications need only limited information about certain parts of a sentence, and need it fast. For example, search engines can produce better results if they are aware of the "Named Entities" in a sentence that can answer "who" or "where" questions (for example, Noun Phrases that indicate a name or a place). This has resulted in the emergence of high-speed Shallow parsers that focus on identifying only certain components (say, Noun Phrases) of a sentence.

However, the question arises whether one can get the parts right without being aware of the whole. Natural languages are very complex, and simple rules result in high error rates (and, at times, hopelessly wrong results). Applications that use Shallow parsers trade this higher error rate for processing speed. In the following sentence, consider what one can figure out about the 'sender' or the 'receiver' of the flowers if information about the whole tree is unavailable (the sentence contains an embedded clause with a passive in a 'reduced' form).

  • The florist sent the flowers was very pleased.[9]

Deep parsers, on the other hand, tend to be slower because they try to derive the parse-tree for the entire sentence. This may involve generating multiple "candidates" and then pruning defective parse-trees (most parsers identify the "best-fit" candidate in the final pass). However, even Deep parsers are far from perfect - few can reliably handle the complex sentences found in the "real world".
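As a rough illustration of the generate-then-prune idea described above, the following Python sketch enumerates candidate analyses of the example sentence and discards those that fail to cover it. The grammar, the lexicon, and the category names (such as RRC for a reduced relative clause) are toy assumptions invented for this sketch; they are not taken from any real parser, and production Deep parsers use far richer grammars and scoring.

```python
# Toy grammar (an assumption for illustration only).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "RRC"]],
    "RRC": [["Vpart", "NP"]],           # reduced relative: passive participle + NP
    "VP": [["V", "NP"], ["Cop", "AdjP"]],
    "AdjP": [["Adv", "Adj"]],
}

# Toy lexicon: "sent" is ambiguous between a finite verb and a passive participle.
LEXICON = {
    "the": {"Det"}, "florist": {"N"}, "flowers": {"N"},
    "sent": {"V", "Vpart"}, "was": {"Cop"},
    "very": {"Adv"}, "pleased": {"Adj"},
}


def parse(symbol, tokens, start):
    """Yield (tree, end) for every way `symbol` can span tokens[start:end]."""
    if symbol not in GRAMMAR:  # lexical category: match a single word
        if start < len(tokens) and symbol in LEXICON.get(tokens[start], ()):
            yield (symbol, tokens[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, tokens, start, [])


def expand(symbol, rhs, tokens, pos, children):
    """Match the remaining right-hand-side symbols left to right."""
    if not rhs:
        tree = (symbol, *children)
        yield tree, pos
        return
    for child, nxt in parse(rhs[0], tokens, pos):
        yield from expand(symbol, rhs[1:], tokens, nxt, children + [child])


tokens = "the florist sent the flowers was very pleased".split()

# Step 1: generate every candidate S analysis starting at the first word.
candidates = list(parse("S", tokens, 0))

# Step 2: prune candidates that leave part of the sentence unparsed.
complete = [tree for tree, end in candidates if end == len(tokens)]

print(len(candidates), "candidates,", len(complete), "covering the whole sentence")
```

Under this toy grammar there are two candidates. The one that treats "sent" as the main verb stops after "the flowers" and is pruned; the one that treats "sent the flowers" as a reduced relative clause inside the subject Noun Phrase is the only candidate that covers the whole sentence.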

Further, many downstream applications need much more information than most parsers provide. For instance, information on Gaps and Trace Chains is crucial for a sophisticated Machine Translation application. Without this information it is difficult to determine the semantic subject ('who' sent the flowers - an undefined entity in this case) and the semantic object ('what' was sent, and 'who' was the recipient) of the embedded clause in the sentence below. Hence, it is not surprising that many Machine Translation programs do a poor job of translating this sentence.

  • The florist sent the flowers was very pleased.[9]

Simple rules cannot be used to determine that 'the florist' is not the semantic subject of the embedded clause (it happens to be a semantic object in this case). This is because there is no 'scaffolding' in the above sentence to signal the onset of an embedded clause (or that it is in Passive Voice). Compare the unreduced form, "The florist who was sent the flowers was very pleased", where "who was" provides exactly that signal.
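The way a simple rule goes wrong on this sentence can be sketched in a few lines of Python. The part-of-speech tags are supplied by hand, and both the chunking rule (a determiner followed by a noun is a Noun Phrase) and the role rule (the Noun Phrase just before the first verb is its subject, the one just after is its object) are deliberately naive assumptions made for illustration; they do not describe any particular Shallow parser.

```python
# Hand-tagged input (an assumption: a real system would run a POS tagger first).
TAGGED = [
    ("The", "DET"), ("florist", "NOUN"), ("sent", "VERB"),
    ("the", "DET"), ("flowers", "NOUN"), ("was", "AUX"),
    ("very", "ADV"), ("pleased", "ADJ"),
]


def noun_phrases(tagged):
    """Return (start, end, text) for each DET NOUN sequence."""
    chunks = []
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "DET" and tagged[i + 1][1] == "NOUN":
            chunks.append((i, i + 2, f"{tagged[i][0]} {tagged[i + 1][0]}"))
            i += 2
        else:
            i += 1
    return chunks


def naive_roles(tagged):
    """Naive rule: NP right before the first VERB is its subject,
    NP right after it is its object."""
    verb_idx = next(i for i, (_, tag) in enumerate(tagged) if tag == "VERB")
    roles = {"verb": tagged[verb_idx][0], "subject": None, "object": None}
    for start, end, text in noun_phrases(tagged):
        if end == verb_idx:
            roles["subject"] = text
        elif start == verb_idx + 1:
            roles["object"] = text
    return roles


roles = naive_roles(TAGGED)
print(roles)
```

The rule labels "The florist" as the subject of "sent", that is, as the sender. In the intended reading the florist is the recipient and the sender is unspecified, which is exactly the kind of error that information about the whole tree is needed to avoid.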


References

1. [Chomsky1980a] Chomsky N. On Binding. Linguistic Inquiry. 1980;11:1-46.
2. [Chomsky1980b] Chomsky N. Rules and Representations. New York: Columbia University Press; 1980.
3. [Chomsky1981] Chomsky N. Lectures on Government and Binding: The Pisa lectures. Seventh 1993 ed. Berlin; New York: Mouton de Gruyter; 1981.
4. [Chomsky1981b] Chomsky N. Principles and Parameters in syntactic theory. In: Hornstein N., Lightfoot D., editors. Explanations in Linguistics. London: Longman; 1981.
5. [Chomsky1982] Chomsky N. Some consequences of the Theory of Government and Binding. Linguistic Inquiry Monograph 6. Cambridge, MA: MIT Press; 1982.
6. [Chomsky1986] Chomsky N. Barriers. Linguistic Inquiry Monograph 13. Cambridge, MA: MIT Press; 1986.
7. [Chomsky1995] Chomsky N. The Minimalist Program. Cambridge, MA: MIT Press; 1995.
8. [Chomsky2000] Chomsky N. New horizons in the study of language and mind. Cambridge, UK: Cambridge University Press; 2000.
9. [RaynerCarlsonFrazier1983] Rayner K., Carlson M., Frazier L. The Interaction of Syntax and Semantics during Sentence Processing: Eye Movements in the Analysis of Semantically Biased Sentences. Journal of Verbal Learning and Verbal Behavior. 1983;22:358-374.