
Automated Syntactic Analysis of the Sanskrit Classics

Unlike in most natural languages, the constituent terms of a sentence in Sanskrit are not self-evident at first glance. This is because the underlying terms undergo several transformations before they reach their final 'printed form', where they are frequently fused with adjacent terms. We use the term 'printed form' (see note 1) to distinguish it from the 'Phonetic Form' of the Government & Binding Theory framework [1][2][3][4][5][6][7][8]. The rules governing these transformations, known as 'sandhi' or 'euphonic combination' rules, were described by Panini in the Ashtadhyayi (c. 500 BCE). The reader must roll back these transformations to arrive at the underlying terms (below, the Paninian sutra applied at each step is shown within curly braces). This roll-back may require considerable effort, especially when a chain of rule-based transformations is involved, as in the following example (read the transformations from right to left):

syAjjanArdana ⇐ syAj janArdana ⇐{8.4.40}=== syAd janArdana ⇐{8.2.39}=== syAt janArdana

In the above example, the underlying form is 'syAt janArdana', which undergoes transformations in two stages to reach its final 'printed form'. Further analysis shows that 'syAt' is the Potential-3rd Person-Singular conjugated form of the irregular verb 'as'-2P ('to be'). The reader must also work out where to split a fused combination, as there may be several valid ways to split it, especially when the components are not as obvious to the learner as they are to the expert. See, e.g., [9] for an English translation of the key rules abridged by Varadaraja from the Siddhanta Kaumudi (the Siddhanta Kaumudi, by Bhattoji Dikshitar, contains rules extracted from the Ashtadhyayi, arranged in an alternate order that novices found easier to assimilate).
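The two-stage derivation above can be sketched as string rewrites over the ASCII transliteration used in this document. This is a minimal, hypothetical illustration, not this software's implementation: it models only simplified versions of sutras 8.2.39 and 8.4.40, ignores later rules such as 8.4.55 (which would devoice a final stop before a voiceless consonant), and uses a hypothetical three-word `LEXICON`. Splitting a fused form is done by analysis-by-synthesis: every pair of lexicon words is combined using the modelled rules, and the pairs that reproduce the printed form are kept.

```python
# Hypothetical sketch: simplified forms of only two Paninian sutras,
# written over the ASCII transliteration used in this document.

# 8.2.39 (simplified): a word-final voiceless stop is replaced by the
# corresponding voiced unaspirated stop (T/D are the retroflex stops).
VOICING = {"k": "g", "c": "j", "T": "D", "t": "d", "p": "b"}

# 8.4.40 (simplified): a final dental (or s) becomes the corresponding
# palatal (or z) when the next word begins with a palatal.
DENTAL_TO_PALATAL = {"s": "z", "t": "c", "d": "j", "n": "J"}
PALATALS = {"c", "j", "J", "z"}

# Hypothetical toy lexicon; a real system needs a full morphological lexicon.
LEXICON = ["syAt", "janArdana", "ca"]


def rule_8_2_39(word):
    """Voice a word-final voiceless stop, e.g. syAt -> syAd."""
    if word and word[-1] in VOICING:
        return word[:-1] + VOICING[word[-1]]
    return word


def rule_8_4_40(left, right):
    """Palatalise the last sound of `left` before a palatal, e.g. syAd -> syAj."""
    if left and right and left[-1] in DENTAL_TO_PALATAL and right[0] in PALATALS:
        return left[:-1] + DENTAL_TO_PALATAL[left[-1]], right
    return left, right


def derive(left, right):
    """Return the (form, sutra) chain from the underlying pair to the fused form."""
    chain = [(f"{left} {right}", None)]
    new_left = rule_8_2_39(left)
    if new_left != left:
        left = new_left
        chain.append((f"{left} {right}", "8.2.39"))
    new_left, right = rule_8_4_40(left, right)
    if new_left != left:
        left = new_left
        chain.append((f"{left} {right}", "8.4.40"))
    chain.append((left + right, None))  # the printed form fuses the two words
    return chain


def show(chain):
    """Render a derivation right-to-left, in the notation used above."""
    parts = [chain[-1][0]]
    for i in range(len(chain) - 1, 0, -1):
        sutra = chain[i][1]
        arrow = f" ⇐{{{sutra}}}=== " if sutra else " ⇐ "
        parts.append(arrow + chain[i - 1][0])
    return "".join(parts)


def split(fused, lexicon):
    """Analysis by synthesis: keep the word pairs whose derivation yields `fused`."""
    return [(first, second) for first in lexicon for second in lexicon
            if derive(first, second)[-1][0] == fused]


print(show(derive("syAt", "janArdana")))
print(split("syAjjanArdana", LEXICON))
```

Here `show(derive("syAt", "janArdana"))` reproduces the right-to-left chain given earlier, and `split` recovers `("syAt", "janArdana")` from the fused form. A realistic splitter would need the complete set of sandhi rules and a full lexicon, and would usually return several candidate splits for the reader (or a later stage) to choose from.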

The above 'sandhi analysis' is only the first stage; it gives us the underlying terms in the sentence. Next, we must work out the clause structure of the sentence. A major difficulty for the reader is that Sanskrit (especially in the classics) has a relatively 'free' word order, i.e., the constituent terms of the Subject, Object(s), Adjunct(s) and Verb phrases of a clause can appear in different parts of a sentence. In addition, many terms may be elided (i.e., a 'hidden' Subject or Verb must be 'read in' to understand the clause or sentence). Furthermore, many people assume that overt Case marking in Sanskrit makes it easy to identify the Subject and the Object(s) of the clauses within a sentence. This is not so, because many distinct declensions and conjugations share the same surface form, leaving the reader to work out which Case/Number/Gender or Tense/Aspect/Mood applies to a particular declined or conjugated term in a given context. For example, a declined form ending in 'e' may be a Nominative-Plural-Masculine term (of a pronominal stem), a Nominative-Dual-Feminine term, an Accusative-Dual-Feminine term, a Nominative-Dual-Neuter term, an Accusative-Dual-Neuter term, or a Locative-Singular-Masculine or -Neuter nominal term, among others. Systematically eliminating the declined and conjugated readings that do not fit the context leads to the identification of the Subject, Object(s), Adjunct(s), and Verb phrases of a clause.
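The elimination process just described can be sketched as a small constraint filter. In this hypothetical Python example (not this software's method), the candidate readings for the clause 'narAH phale khAdanti' ('the men eat two fruits') are hand-written and deliberately incomplete; note that 'phale' ends in 'e' and is ambiguous between Nominative/Accusative-Dual-Neuter and Locative-Singular-Neuter. The sketch enumerates every combination of readings and keeps only those that satisfy three simplifying assumptions: exactly one finite verb, exactly one overt nominative Subject agreeing with the verb in number, and at least one accusative Object for the transitive verb 'khAd' ('to eat').

```python
from itertools import product
from typing import NamedTuple


class Reading(NamedTuple):
    case: str    # "Nom", "Acc", "Loc", "Voc", or "Verb" for a finite verb
    number: str  # "Sg", "Du" or "Pl"
    gender: str  # "Masc", "Fem", "Neut"; empty for verbs


# Hypothetical, hand-written and deliberately incomplete candidate readings.
CANDIDATES = {
    "narAH": [Reading("Nom", "Pl", "Masc"), Reading("Voc", "Pl", "Masc")],
    "phale": [Reading("Nom", "Du", "Neut"), Reading("Acc", "Du", "Neut"),
              Reading("Loc", "Sg", "Neut")],
    "khAdanti": [Reading("Verb", "Pl", "")],  # Present-3rd Person-Plural
}


def consistent(readings):
    """Simplified checks for one active-voice clause with a transitive verb."""
    verbs = [r for r in readings if r.case == "Verb"]
    subjects = [r for r in readings if r.case == "Nom"]
    objects = [r for r in readings if r.case == "Acc"]
    if len(verbs) != 1 or len(subjects) != 1:  # assumes no elided Subject
        return False
    if subjects[0].number != verbs[0].number:  # Subject-Verb number agreement
        return False
    return len(objects) >= 1  # a transitive verb needs an Object


def analyse(words):
    """Enumerate every combination of readings and keep the consistent ones."""
    options = [CANDIDATES[w] for w in words]
    return [dict(zip(words, combo))
            for combo in product(*options) if consistent(combo)]


for analysis in analyse(["narAH", "phale", "khAdanti"]):
    for word, reading in analysis.items():
        print(word, reading)
```

Only one of the six combinations survives: 'narAH' as the Nominative-Plural-Masculine Subject and 'phale' as the Accusative-Dual-Neuter Object. A real analyser must also allow for elided Subjects, participles, passive constructions and much more, which is where most of the difficulty lies.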

An analysis of the kind discussed above is essential for determining the syntactic structure of each sentence (and, thereafter, its meaning). While eliminating the many alternative forms may become second nature to the expert, it requires a great deal of practice, as well as access to worked syntactic analyses in the early stages of learning the language.

Every serious student of the Sanskrit classics over the past 2,500 years and more must have carried out such analyses in order to understand the structure of the sentences in each text. However, such an analysis is not only time-consuming but also very tedious to present, because every word in a sentence requires a line showing its root or base word and the specific transformations it may have undergone. This becomes even more tedious when the Paninian rules applicable to each sandhi in a sentence must also be listed. The sample analysis of Stanza 1.39 of the Srimad Bhagavad Gita illustrates this: it shows the key details required to give the reader a good idea of the internal structure of the stanza (note that a correct interpretation of this stanza also requires an appreciation of its two participles and their respective arguments).
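One way to keep such per-word lines compact, while still letting the sandhi details be shown or hidden on demand, is to store each word's analysis as a record and render it with a display flag. The Python sketch below is hypothetical: the `WordAnalysis` class, its field names and the output layout are illustrative assumptions, not this software's data format. It reuses the 'syAj janArdana' example from earlier; the bare form 'janArdana' of a masculine a-stem can only be Vocative-Singular.

```python
from dataclasses import dataclass, field


@dataclass
class WordAnalysis:
    """Hypothetical record for one word of a sentence."""
    printed: str     # the word's segment in the printed form
    underlying: str  # the underlying term after rolling back sandhi
    base: str        # root or base word
    grammar: str     # e.g. "Potential-3rd Person-Singular"
    sutras: list = field(default_factory=list)  # sandhi sutras, in order applied

    def line(self, show_sandhi=False):
        """Render one display line; sandhi details appear only on request."""
        text = f"{self.underlying} [{self.base}]: {self.grammar}"
        if show_sandhi and self.sutras:
            rules = ", ".join(self.sutras)
            text += f" (printed as '{self.printed}' via {{{rules}}})"
        return text


words = [
    WordAnalysis("syAj", "syAt", "as-2P", "Potential-3rd Person-Singular",
                 ["8.2.39", "8.4.40"]),
    WordAnalysis("janArdana", "janArdana", "janArdana",
                 "Vocative-Singular-Masculine"),
]
for w in words:
    print(w.line(show_sandhi=True))
```

Keeping the sutra list in the record, rather than in the rendered text, means a reader (or reading device) can switch the Paninian detail on for study and off for fluent reading without recomputing the analysis.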

Modern technology can make Sanskrit more accessible by allowing such details to be shown or hidden on demand in a browser or hand-held reading device. Today's student could benefit greatly from such digital analyses, which may make the learning curve a little less steep. However, this presupposes that such analyses are available, in digital form, for all the classics: the Mahabharata contains roughly 100,000 stanzas, the Ramayana roughly 24,000, and there are many other classics besides, far too many to analyse by hand. The software described here takes a step towards providing such grammatical analyses automatically, ideally with few enough errors that they can be corrected manually. A rule-based system is particularly well suited to such syntactic analysis, since the underlying transformations are themselves codified as Paninian rules.

In the next section, we discuss the results of applying this automated syntactic analysis to the stanzas of the Srimad Bhagavad Gita, one of the timeless classics of ancient India.

  • 1. The term 'printed form' may require a few words of explanation, given the oral tradition through which Sanskrit texts have been transmitted for millennia. Our rationale for this term is that it conveys the fact that the 'phonological form' of the sentence has been modified in some manner for the convenience of the 'reader', e.g., by the insertion of the 'avagraha' symbol (indicating the elision of 'a'), the spaces used to delineate 'words', and the long pauses and full stops indicated by 'avasAna' (delineating sentences). In other words, we mean the form actually printed and read by the reader (and by the software), rather than the 'phonological form' heard by the listener.

References

  1. [Chomsky1980a] Chomsky N. On Binding. Linguistic Inquiry. 1980;11:1-46.
  2. [Chomsky1980b] Chomsky N. Rules and Representations. New York: Columbia University Press; 1980.
  3. [Chomsky1981] Chomsky N. Lectures on Government and Binding: The Pisa Lectures. Seventh ed., 1993. Berlin; New York: Mouton de Gruyter; 1981.
  4. [Chomsky1981b] Chomsky N. Principles and Parameters in Syntactic Theory. In: Hornstein N., Lightfoot D., editors. Explanation in Linguistics. London: Longman; 1981.
  5. [Chomsky1982] Chomsky N. Some Concepts and Consequences of the Theory of Government and Binding. Linguistic Inquiry Monograph 6. Cambridge, MA: MIT Press; 1982.
  6. [Chomsky1986] Chomsky N. Barriers. Linguistic Inquiry Monograph 13. Cambridge, MA: MIT Press; 1986.
  7. [Chomsky1995] Chomsky N. The Minimalist Program. Cambridge, MA: MIT Press; 1995.
  8. [Chomsky2000] Chomsky N. New Horizons in the Study of Language and Mind. Cambridge, UK: Cambridge University Press; 2000.
  9. [jrb1995] Ballantyne J. Laghu Kaumudi of Varadaraja: A Sanskrit Grammar. New Delhi: Motilal Banarsidass Publishers; 1995.