This section presents an automated syntactic analysis of the Srimad Bhagavad Gita, one of the timeless epics of ancient India. This epic has been analysed in great depth by experts for over 2,500 years, and our presentation largely follows the traditional form used by Sanskrit grammarians in their grammatical analyses (see, e.g., [1]). The only additions are the identification of clauses and their arguments, and a listing of the specific Paninian sandhi rules applicable to each stanza. Unlike the grammatical analyses usually produced by experts, however, the software does not at present split compound words ( समास ) into their constituents. Splitting compounds involves resolving ambiguity, which often requires 'knowledge of the world' and is outside the scope and capabilities of our parser. Instead, such words are usually defined simply as nominals in our lexicon.¹ The derivation of participial and other derived forms is also not shown in this analysis. Finally, even when all the components are parsed correctly, putting them back together (frequently in a different, canonical word order) to form a reasonable translation of the entire sentence proved very problematic in a number of cases, and is not presented in this document. This hard task is probably better left to the superior creativity and understanding of the human reader.
We also show each stanza in its 'printed form' and then show the effect of rolling back specific Paninian sandhi rules to arrive at the underlying terms. The clauses are then identified for each stanza, and the declined, conjugated, or indeclinable form is presented for each term, to help in identifying the Subject, Verb, Object(s) and Adjunct(s) of each clause. The declined and conjugated forms were checked against [1] and [2], while the clause structure was checked partially against [2] (clauses are not identified in [KAL2015]; for chapters 1-6, clauses are identified in [MM2015]; from chapter 7 onwards, we will identify clauses from the Srimad Bhagavadgita Sankarabhashya commentary). For the Paninian sandhi rules, we checked whether the results of rolling back the rules matched the underlying terms in [1], since we have not come across any document that contains a comprehensive list of the rules applicable to each stanza of this text.
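The idea of rolling back a sandhi rule can be illustrated with a minimal sketch. It is not the parser described here: the rule table, the function name `roll_back`, and the tiny lexicon are hypothetical, and the strings use a Harvard-Kyoto-style ASCII transliteration (A stands for long ā) rather than Devanagari. Only three well-known vowel sandhi rules are included, and a candidate split is kept only if both halves are found in the lexicon.

```python
from typing import List, Set, Tuple

# Hypothetical toy rule table: (fused surface vowel, end of left word,
# start of right word, label). Sutra numbers refer to the Ashtadhyayi.
RULES = [
    ("A",  "a", "a", "savarna-dirgha (6.1.101)"),  # a + a -> A
    ("e",  "a", "i", "guna (6.1.87)"),             # a + i -> e
    ("ai", "a", "e", "vrddhi (6.1.88)"),           # a + e -> ai
]

def roll_back(surface: str, lexicon: Set[str]) -> List[Tuple[str, str, str]]:
    """Return every (left, right, rule) split whose two halves are both known words."""
    results = []
    for fused, left_tail, right_head, label in RULES:
        start = surface.find(fused)
        while start != -1:
            # Undo the rule at this position and rebuild the two underlying terms.
            left = surface[:start] + left_tail
            right = right_head + surface[start + len(fused):]
            if left in lexicon and right in lexicon:
                results.append((left, right, label))
            start = surface.find(fused, start + 1)
    return results

# Hypothetical lexicon of underlying terms.
LEXICON = {"ca", "eva", "iva", "na", "asti"}

print(roll_back("caiva", LEXICON))  # ca + eva
print(roll_back("nAsti", LEXICON))  # na + asti
```

The lexicon check is what keeps the number of candidates manageable in this sketch. A real system also has to handle consonant and visarga sandhi, multiple sandhi sites in one printed string, and splits that are lexically valid but grammatically wrong.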
Our rule-based Sanskrit parser is currently at an exploratory stage, with the focus on identifying the critical rules for this 'free word-order' language. It is built within the Government & Binding Theory framework [3][4][5][6][7][8][9][10]. However, most of the constraints can only be tested or applied at a late stage of parsing, when all the words have been seen by the parser and the word order is close to being ascertained. In many cases, several constraints cannot be applied because of incomplete sentence fragments and other complications. The parser is also not optimised for processing speed at present. During this exploratory stage, parsing is split into two stages: the first analyses euphonic combinations (sandhi), and the second performs the grammatical analysis, assuming that the euphonic combinations were analysed correctly.
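To illustrate why constraints can often be applied only after every word has been seen, the following sketch assigns grammatical roles from case marking and number agreement alone, ignoring position. It is a toy illustration under stated assumptions, not the Government & Binding implementation described above: the lexicon, the feature names, and the function `assign_roles` are all hypothetical, and the words are given in a Harvard-Kyoto-style ASCII transliteration after sandhi has already been undone.

```python
from itertools import permutations

# Hypothetical toy lexicon: inflected form -> morphological features.
LEXICON = {
    "rAmaH":    {"cat": "noun", "case": "nom", "num": "sg"},  # 'Rama' (nominative)
    "grAmam":   {"cat": "noun", "case": "acc", "num": "sg"},  # 'village' (accusative)
    "gacchati": {"cat": "verb", "person": 3, "num": "sg"},    # 'goes'
}

def assign_roles(words):
    """Assign verb, subject and object using features only, never position."""
    # Every word is looked up first; roles are decided once the whole clause is known.
    feats = [(w, LEXICON[w]) for w in words]
    verb = next(w for w, f in feats if f["cat"] == "verb")
    verb_num = LEXICON[verb]["num"]
    # Subject: a nominative that agrees with the verb in number.
    subject = next(w for w, f in feats
                   if f.get("case") == "nom" and f["num"] == verb_num)
    # Object: an accusative, if the clause has one.
    obj = next((w for w, f in feats if f.get("case") == "acc"), None)
    return {"verb": verb, "subject": subject, "object": obj}

# All six orderings of the three words receive the same analysis.
for order in permutations(["rAmaH", "grAmam", "gacchati"]):
    print(" ".join(order), "->", assign_roles(list(order)))
```

The sketch sidesteps the hard cases: forms that are ambiguous between cases, sentences with several clauses, and adjuncts that do not enter into any agreement relation.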
It must be noted that the grammatical analysis of non-prose texts (such as the Srimad Bhagavad Gita) presents the highest level of difficulty because of the problem of 'free' word-order.² We have nevertheless undertaken this exercise in order to develop a highly nuanced, bottom-up understanding of 'free' word-order in Classical Sanskrit. In this chapter, we present the results of this grammatical analysis of the Srimad Bhagavad Gita, from which it may be observed that our parser successfully handles extremely complex issues. Our Sanskrit parser has evolved rapidly to a very advanced level after encountering several unique challenges in each chapter of this text. This is an ongoing project, and the results of processing each chapter will be uploaded to the website soon after that chapter is completed.
Work remains to be done on the resolution of certain kinds of ambiguous terms, such as the short-form pronouns मे, नः, नौ, ते, वाम्, वः, each of which can stand for more than one declined form of the same base word (e.g., Dative, Ablative, Genitive, Accusative). Some of these ambiguities can be resolved by adding specific Features in the lexicon, but this would make the software harder to scale in the longer term. The attachment of adjunctive components (or of any component that does not need to satisfy a syntactic Agreement condition) will remain problematic for some time, as a heuristic that attaches such elements to one clause or another will have, at best, a 50% chance of success when a sentence contains multiple clauses. The correct attachment of such components requires 'knowledge of the world' and is best left to human readers (i.e. it is well beyond the scope of this parser).
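The ambiguity of the short-form pronouns can be made concrete with a small lookup table. The sketch below is an assumption-laden illustration, not the parser's lexicon: the table `ENCLITICS` follows the readings commonly given in standard grammars for these enclitic forms (accusative, dative and genitive), so it may record fewer readings than the lexicon used in this work, and it does not model the fact that ते is also a form of the demonstrative pronoun. The keys use a Harvard-Kyoto-style ASCII transliteration, and the names `readings` and `narrow` are hypothetical.

```python
# Hypothetical table: enclitic form -> (base word, number, possible cases).
ENCLITICS = {
    "me":  ("asmad",  "sg", ("dat", "gen")),
    "nau": ("asmad",  "du", ("acc", "dat", "gen")),
    "naH": ("asmad",  "pl", ("acc", "dat", "gen")),
    "te":  ("yuSmad", "sg", ("dat", "gen")),
    "vAm": ("yuSmad", "du", ("acc", "dat", "gen")),
    "vaH": ("yuSmad", "pl", ("acc", "dat", "gen")),
}

def readings(form):
    """List every (base word, case, number) reading of an enclitic form."""
    stem, number, cases = ENCLITICS[form]
    return [(stem, case, number) for case in cases]

def narrow(form, allowed_cases):
    """Keep only the readings whose case is licensed by the syntactic context."""
    return [r for r in readings(form) if r[1] in allowed_cases]

print(readings("me"))           # two candidate readings
print(narrow("vaH", {"acc"}))   # context that licenses only an accusative
```

In this sketch the context is passed in as a set of allowed cases. Deciding what that set should be for a given clause is the difficult part, and it is where lexicon Features or human judgement come in.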
Clearly, it would not be appropriate to extrapolate from the results of such a small sample obtained from a system that is at the exploratory stage. Nevertheless, the early results of processing this text are encouraging, and suggest that it may be possible to automate the analysis of some texts with a reasonably small margin of error (requiring minimal human input to correct). This conjecture will be revisited after completing the analysis of all 18 chapters of this text, as well as of other texts.