
Semantic Information

Chomsky's famous example (sentence 1) continues to be a challenge in parser design. To reject sentence 1, a parser requires a fair amount of knowledge about concepts in the world (e.g., 'idea' is an abstract concept that cannot be 'colored'; the verb 'sleep' requires an animate grammatical subject). However, most parser designers choose a divide-and-conquer approach, leaving 'reasoning-about-the-world' to some other AI module.

  • 1. Colorless green ideas sleep furiously.
  • While sentence 1 may seem an extreme case, the following sentences, tested by Rayner et al. [1], are quite common:

  • 2. The spy saw the cop with binoculars but the cop didn't see him.
  • 3. The spy saw the cop with the revolver but the cop didn't see him.
  • In sentence 2, the parser needs to figure out whether the prepositional phrase 'with-binoculars' modifies the verb (i.e. 'saw-with-binoculars') or the noun (i.e. 'the-cop-with-binoculars'). A human knows that the concept 'binoculars' is closely related to the concept 'to see' (an instrument used to view/see), and would reason that the preposition 'with' in sentence 2 probably has a 'verb-modifier' sense. In sentence 3, however, a human would rule out the 'instrumental' sense and know that 'with' can only have a 'noun-modifier' sense (i.e. 'the-cop-who-has-the-revolver'). Clearly, knowledge of the world helps humans parse sentences.
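The two readings can be made concrete by enumerating the candidate bracketings for a verb + object + prepositional-phrase sequence. This is a minimal sketch; the function name and the bracketing notation are illustrative, not taken from any real parser:

```python
def pp_attachment_readings(verb, obj, prep, pp_obj):
    """Enumerate the two bracketings of a verb + object + PP sequence."""
    # Verb-modifier reading: the PP attaches to the verb phrase.
    verb_mod = f"({verb} ({obj}) ({prep} {pp_obj}))"
    # Noun-modifier reading: the PP attaches inside the object NP.
    noun_mod = f"({verb} ({obj} ({prep} {pp_obj})))"
    return {"verb-modifier": verb_mod, "noun-modifier": noun_mod}

readings = pp_attachment_readings("saw", "the cop", "with", "binoculars")
# Two structurally valid parses of the same word sequence:
#   (saw (the cop) (with binoculars))   -- saw-with-binoculars
#   (saw (the cop (with binoculars)))   -- the-cop-with-binoculars
```

Both bracketings are grammatical; only world knowledge tells a reader which one sentence 2 or sentence 3 intends.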

    However, semantic information about tens of thousands of nouns is not available to a parser; hence most parsers parse such sentences using some simple strategy that they hope works correctly most of the time. For instance, they may adopt the 'instrumental' sense for the preposition 'with' if the training data used by the parser indicates that this is the most likely case given the local context. Alternatively, they may use a strategy such as the 'Minimal Attachment Strategy' [2][3], which would result in the parser choosing the 'verb-modifier' sense over the 'noun-modifier' sense in both sentences. Either strategy could result in the parser getting sentence 2 right but sentence 3 wrong. The downstream application, e.g. a machine-translation system, would then construe sentence 3 as equivalent to 'the spy saw-with-the-revolver' (the problem may escape detection if both senses of 'with' are translated to the same word in the target language). Note that the prepositional phrase 'with-the-revolver' could have an 'instrumental' sense if the verb in sentence 3 had been 'hurt' instead (in which case the parser would have produced the correct parse-tree). The easiest strategy for a parser is a blanket 'noun-modifier' rule, which may work satisfactorily for common NP-PP sequences (e.g. with 'of', 'from', 'in', 'to'), but will get sentence 2 wrong.
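The trade-off between the two blanket rules can be sketched as trivial classifiers scored against hand-labelled attachments for sentences 2 and 3. The gold labels follow the discussion above; the function names are illustrative:

```python
# Gold attachments for sentences 2 and 3, per the discussion above.
gold = {
    ("saw", "the cop", "with", "binoculars"): "verb-modifier",    # sentence 2
    ("saw", "the cop", "with", "the revolver"): "noun-modifier",  # sentence 3
}

def minimal_attachment(verb, obj, prep, pp_obj):
    # Always builds the structurally simpler tree: attach the PP to the verb.
    return "verb-modifier"

def noun_attachment(verb, obj, prep, pp_obj):
    # Always attaches the PP to the preceding noun phrase.
    return "noun-modifier"

# Each blanket rule gets exactly one of the two sentences right.
for strategy in (minimal_attachment, noun_attachment):
    correct = sum(strategy(*s) == label for s, label in gold.items())
```

No attachment rule that ignores the lexical content of the PP can get both sentences right, which is the point of the example pair.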

    The WordNet project [4][5] has documented thousands of words and their relations. A parser might use a strategy that relates the noun 'binoculars' to the verb 'to see' using the information provided by WordNet. However, it may not be easy for the parser to rule out a relation between 'revolver' and 'to see', especially since WordNet does not claim to document all possible relations. Further, a strategy that uses WordNet will create a new set of 'errors of commission'. There are also tens of thousands of nouns in English, and many more relations and concepts that are not captured by WordNet; indeed, a philosopher would argue that it is impossible to document all such relations and concepts, as there may be a vast number of uniquely individual ones.
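A relation-based strategy of the kind described might look like the following sketch, with a tiny hand-coded 'instrument-of' table standing in for a real WordNet lookup. The table, function name, and the assumption that verbs arrive lemmatized ('saw' → 'see') are all illustrative:

```python
# Hand-coded stand-in for a WordNet-style instrument relation (illustrative).
INSTRUMENT_OF = {
    "binoculars": {"see", "watch", "view"},
    "revolver": {"shoot", "hurt", "kill"},
}

def attach_with_relations(verb, noun, prep, pp_noun):
    """Prefer the verb-modifier reading only when the PP noun is a
    plausible instrument of the (lemmatized) verb."""
    if prep == "with" and verb in INSTRUMENT_OF.get(pp_noun, set()):
        return "verb-modifier"   # e.g. 'saw with binoculars'
    return "noun-modifier"       # e.g. 'the cop with the revolver'
```

The sketch gets both example sentences right, but only because the table happens to cover these two nouns; any noun or relation missing from the table silently falls back to the default, which is exactly the coverage problem described above.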

    Now let us examine the following partially ambiguous sentence, tested by Eberhard et al. [6] using eye-tracking equipment:

  • 4. Put the apple on the towel in the box.
  • If there was a single apple on the table, subjects initially assumed that the towel was the destination for the apple until they reached the second prepositional phrase 'in the box'. When there were two apples, only one of which was on a towel, subjects faced no ambiguity and knew that the destination was the box (i.e. they read 'put the-apple-that-is-on-the-towel in the box'). Clearly, a parser that does not have access to other cues (visual, contextual, discourse, etc.) cannot resolve the ambiguity inherent in the sentence, and cannot be faulted for assuming the towel to be the destination.

    As seen above, a correct parse of such sentences requires basic knowledge of the world. Alternatively, a parser could generate multiple candidate parse-trees (a number that can grow very large when each clause offers multiple choices) and leave it to a 'reasoning-about-the-world' AI module to choose among them. However, many applications do not want the additional complexity of choosing the right parse-tree from many candidates; hence most parsers use a simple strategy to select one parse-tree and hope for a modest error rate. Unfortunately, sentences such as sentence 4 are not that rare in the 'real world', which is why parsers frequently get a part of the parse-tree wrong.
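The 'very large number' of candidates can be made concrete. A standard combinatorial observation is that the number of ways a chain of prepositional phrases can attach grows with the Catalan numbers, so enumerating every candidate parse-tree quickly becomes impractical. A minimal sketch of that growth:

```python
from math import comb

def catalan(n):
    """n-th Catalan number: counts the binary bracketings of a sequence,
    and (by the standard observation) the attachment structures of a
    chain of n prepositional phrases."""
    return comb(2 * n, n) // (n + 1)

# Growth of candidate attachment structures as PPs are added:
#   n = 2 PPs (as in sentence 4) is still small, but the count
#   explodes as the chain lengthens.
growth = [catalan(n) for n in range(1, 9)]
```

Even a modest chain of eight PPs already yields over a thousand bracketings, which is why 'generate all candidates and let another module decide' is rarely a practical design.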


    References