Natural language understanding (NLU) is one of the major research areas in artificial intelligence. However, even after decades of work, the first step in NLU, namely the parsing of text, continues to present a major stumbling block. The following example shows the difficulty faced by a parser when it encounters two similar-looking sentences that have very different underlying structures:
(1) John is eager to please.
(2) John is easy to please.
Most parsers will produce identical structures for the above two sentences, because they cannot distinguish between 'eager' and 'easy'.
Some linguistic theories, e.g., the Minimalist Theory (see, e.g., [3], [4]), hold that parsing can be compared to assembling a jigsaw puzzle containing odd-shaped pieces that fit together only in very specific ways. The Lexicon holds all the words that people learn, along with their usage, exceptions and specific features (i.e., the odd-shaped pieces).
Nouns in the Lexicon have a bundle of features including animacy, abstractness, gender, number, etc.
Verbs have a bundle of features such as number, gender, etc., in addition to selectional restrictions and theta roles that constrain their arguments. For instance, selectional restrictions on the verb 'sleep' may specify that it can only have an animate, non-abstract subject. Theta roles associated with each argument of the verb specify the role played by that argument (e.g., an Agent role for the semantic subject of the transitive verb 'kick').
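The feature bundles and selectional restrictions described above can be sketched in code. The following is a minimal illustrative sketch, not an implementation from the literature; the class and attribute names (Noun, Verb, check_subject, subject_restrictions) are hypothetical, and the theta role chosen for 'sleep' is an assumption:

```python
from dataclasses import dataclass, field

@dataclass
class Noun:
    """A lexical entry for a noun, carrying a (partial) feature bundle."""
    form: str
    animate: bool
    abstract: bool
    # gender, number, etc. omitted for brevity

@dataclass
class Verb:
    """A lexical entry for a verb, with theta roles and selectional restrictions."""
    form: str
    theta_roles: tuple  # roles assigned to the verb's arguments (assumed labels)
    subject_restrictions: dict = field(default_factory=dict)

def check_subject(verb: Verb, noun: Noun) -> bool:
    """True if the noun's features satisfy the verb's selectional
    restrictions on its subject."""
    return all(getattr(noun, feat) == val
               for feat, val in verb.subject_restrictions.items())

# 'sleep' requires an animate, non-abstract subject (per the example in the text).
sleep = Verb("sleep", theta_roles=("Experiencer",),  # role label is an assumption
             subject_restrictions={"animate": True, "abstract": False})
dog = Noun("dog", animate=True, abstract=False)
idea = Noun("idea", animate=False, abstract=True)

print(check_subject(sleep, dog))   # an animate, concrete subject is accepted
print(check_subject(sleep, idea))  # an abstract, inanimate subject is rejected
```

In this picture, parsing succeeds only when every such check passes, which is what makes the lexicon's entries behave like odd-shaped puzzle pieces.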
However, the Lexicon of each language contains tens of thousands of words, so it may be impossible to comprehensively capture all the different features of every entry.