Sandhi Statistics of Chapter 1

On this page, we will present the results of an automated sandhi analysis of Chapter 1 of the Srimad Bhagavad Gita. All the sandhi sutras applicable to Chapter 1, as well as one example of each sutra, are listed on the next page.

The goal of 'sandhi analysis' is to identify the chains of Paninian sutras that transform 'underlying terms' into their respective 'printed forms' (as shown in the example below). The reader of the text sees the 'printed form' and tries to derive the 'underlying terms' by rolling back the sandhi sutras that transformed them. This transformation frequently happens in multiple stages, where each sutra in a chain performs a specific operation on its input and produces an output that is used by the next sutra in the chain. It is also important to study in greater depth the contexts (and sutras) that lead to ambiguity while analyzing the 'printed form' (i.e. while trying to roll back the sandhi sutras that transformed the underlying terms into this 'printed form'). A grammatical analysis of a Sanskrit sentence or stanza must analyze every word in it, because Paninian sandhi sutras transform (or choose not to transform) underlying terms (declensions, conjugations, and indeclinables) into 'single words' (including the single words 'sas', 'tatas', and 'paNNavAnakagomukhAs' in the example below) as well as 'combination words' (the other words in the example below). In light of this, we will use the broader term 'sandhi analysis' instead of 'sandhi splitting'.

A sample sandhi analysis of Stanza 1.13 is presented below, showing all the chains of sandhi sutras (shown in square brackets in the example) that were applied to transform the 'underlying terms' to the 'printed form':


BG 1.13

ततः शङ्खाश्च भेर्यश्च पणवानकगोमुखाः
सहसैवाभ्यहन्यन्त स शब्दस्तुमुलोऽभवत्

Printed: tataHa shaNkhAshcha bheryashcha paNNavAnakagomukhAHa , sahasEvAbhyahanyanta sa shabdastumulo'bhavat .

shabdastumulo'bhavat ... [8.4.56, 8.2.39] vA'vasAne
shabdastumulo'bhavat ... [6.1.109] eN padAntAdati
abhavat ... [8.4.56, 8.2.39] vA'vasAne
tataHa sh ... [8.3.15, 1.3.2, 8.2.66] kharavasAnayo visarjanIyaHa
sa shabdastumulo ... [6.1.132] etattado sulopo'korannyasamAse hali
shabdas ... [8.3.34, 8.3.15, 1.3.2, 8.2.66] visarjanIyasya saHa
tataHa shaNkhAshcha ... [8.3.36, 8.3.15, 1.3.2, 8.2.66] vA shari
shaNkhAshcha ... [8.4.40, 8.3.34, 8.3.15, 1.3.2, 8.2.66] stoHa shchunA shchuHa
bheryashcha ... [8.4.40, 8.3.34, 8.3.15, 1.3.2, 8.2.66] stoHa shchunA shchuHa
saHa shabdaHa ... [8.3.36, 8.3.15, 1.3.2, 8.2.66] vA shari
sahasEvAbhyahanyanta ... [6.1.88, 1.1.1] vRuddhirechi
tumulo abhavat ... [6.1.113, 6.1.109, 6.1.87, 1.3.2, 8.2.66] ato roraplutAdaplute
shaNkhAHa ch ... [8.3.15, 1.3.2, 8.2.66] kharavasAnayo visarjanIyaHa
bheryaHa ch ... [8.3.15, 1.3.2, 8.2.66] kharavasAnayo visarjanIyaHa
saHa sh ... [8.3.15, 1.3.2, 8.2.66] kharavasAnayo visarjanIyaHa
shabdaHa t ... [8.3.15, 1.3.2, 8.2.66] kharavasAnayo visarjanIyaHa

Underlying: tatas shaNkhAs cha bheryas cha paNNavAnakagomukhAs , sahasA eva abhyahanyanta sas shabdas tumulas abhavat .
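To illustrate how one such chain can be read, the sketch below replays the chain listed above for 'shaNkhAshcha' ([8.4.40, 8.3.34, 8.3.15, 1.3.2, 8.2.66]) in the forward direction, from the underlying terms 'shaNkhAs' and 'cha' to the printed form. It is a toy string-rewriting model in the transliteration used on this page, not the analyzer's actual implementation. The function names, the order of application (the listed numbers read from right to left), and the heavily simplified rule conditions are all assumptions made for this example.

```python
# Toy forward derivation of 'shaNkhAs' + 'cha' -> 'shaNkhAshcha'.
# All names and rule conditions are simplified assumptions for illustration.

# Partial list of voiceless ('khar') initials, enough for this example only.
KHAR_INITIALS = ("k", "ch", "t", "p", "s")

def s_to_ru(left, right):
    """8.2.66: a word-final 's' is replaced by 'ru'."""
    return (left[:-1] + "ru", right) if left.endswith("s") else (left, right)

def drop_marker(left, right):
    """1.3.2: the marker vowel 'u' of 'ru' is dropped, leaving 'r'."""
    return (left[:-1], right) if left.endswith("ru") else (left, right)

def r_to_visarga(left, right):
    """8.3.15: final 'r' becomes visarga ('Ha') before a khar sound or a pause."""
    if left.endswith("r") and (right == "" or right.startswith(KHAR_INITIALS)):
        return left[:-1] + "Ha", right
    return left, right

def visarga_to_s(left, right):
    """8.3.34: visarga becomes 's' before a khar sound (exceptions ignored)."""
    if left.endswith("Ha") and right.startswith(KHAR_INITIALS):
        return left[:-2] + "s", right
    return left, right

def s_to_palatal(left, right):
    """8.4.40: 's' in contact with a palatal becomes 'sh' (only 'ch'/'sh' handled)."""
    if left.endswith("s") and right.startswith(("ch", "sh")):
        return left[:-1] + "sh", right
    return left, right

CHAIN = [
    ("8.2.66", s_to_ru),
    ("1.3.2", drop_marker),
    ("8.3.15", r_to_visarga),
    ("8.3.34", visarga_to_s),
    ("8.4.40", s_to_palatal),
]

def derive(left, right):
    """Apply each sutra in turn; return the intermediate forms and the printed form."""
    steps = [left]
    for _number, rule in CHAIN:
        left, right = rule(left, right)
        steps.append(left)
    return steps, left + right

steps, printed = derive("shaNkhAs", "cha")
print(" -> ".join(steps))
print(printed)
```

The intermediate form 'shaNkhAHa' corresponds to the shorter chain [8.3.15, 1.3.2, 8.2.66] listed above for 'shaNkhAHa ch'. Rolling back a printed form means running such a chain in reverse, which is where more than one underlying candidate can arise.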


Presented below are the results of an automated sandhi analysis of Chapter 1 of the Srimad Bhagavad Gita.

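| # | Description | Input | Output: Correct | Output: False Negative | Output: Total | Output: % | Output: False Positive |
|---|---|---|---|---|---|---|---|
| A | Combination words (two or more terms) | 108 | 264 | 1 | 265 | 46.6 | 1 |
| B | Changed single-word terms | 144 | 141 | 3 | 144 | 25.5 | 2 |
| C | Unchanged Vocatives / Special terms | 27 | 27 | | 27 | 4.6 | |
| D | Unchanged non-special single-word terms | 133 | 133 | | 133 | 23.4 | 1 |
| | TOTAL | 412 | 565 | 4 | 569 | 100.0 | 4 |
| | Errors | | | 0.7% | | | 0.7% |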
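The totals and error rates in the table above can be re-derived from the per-row counts. The short check below is only a sketch: it assumes that blank cells mean zero and that both error rates are taken relative to the TOTAL column (569 expected terms), which the table does not state explicitly. The dictionary keys are names chosen for this example.

```python
# Per-row counts copied from the table above; blank cells are read as 0 (assumption).
rows = {
    "A": {"input": 108, "correct": 264, "false_neg": 1, "false_pos": 1},
    "B": {"input": 144, "correct": 141, "false_neg": 3, "false_pos": 2},
    "C": {"input": 27, "correct": 27, "false_neg": 0, "false_pos": 0},
    "D": {"input": 133, "correct": 133, "false_neg": 0, "false_pos": 1},
}

def column_total(key):
    """Sum one column over rows A to D."""
    return sum(row[key] for row in rows.values())

total_input = column_total("input")
total_correct = column_total("correct")
total_false_neg = column_total("false_neg")
total_false_pos = column_total("false_pos")

# Expected terms = terms found correctly + terms that were missed.
expected_terms = total_correct + total_false_neg

# Assumed denominator for both rates: the number of expected terms.
false_neg_rate = 100 * total_false_neg / expected_terms
false_pos_rate = 100 * total_false_pos / expected_terms

print(total_input, total_correct, total_false_neg, expected_terms)
print(f"False Negatives: {false_neg_rate:.1f}%  False Positives: {false_pos_rate:.1f}%")
```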

The columns labelled 'False Negative' and 'False Positive' may require a few words of explanation. 'False Negative' refers to a term that was expected in the output but was not found (i.e. a term that was missed by the sandhi analyzer, such as the term 'asmAt' in Stanza 1.39 below). 'False Positive' refers to a term that was found in the output but was not expected (i.e. an unexpected term created by the sandhi analyzer, such as the term 'asmAn' in Stanza 1.39 below). In this example, the missing term and the extra term 'cancel' each other out in some sense in row A (i.e. the software generates 'asmAn' instead of the expected 'asmAt'). However, the analysis gets a little more involved when the False Positive and the False Negative belong to different categories (as can be seen in rows B and D, where the software assumed an unchanged underlying single-word term 'rathopastha', whereas it was actually the changed underlying single-word term 'rathopasthe').

As will be noted from the table above, the number of errors is small (False Negatives < 1% and False Positives < 1%) for this Chapter. Most of the errors are due to inherent ambiguity, and some of these errors can be rectified by a syntactic parser that is run as the next step in the grammatical analysis. Three of the four errors involve 'single-word' terms.

The following is a summary of the False Negatives and False Positives discussed in the table above.

| Stanza | False Negatives | False Positives |
|---|---|---|
| 1.6 | sarve | sarvas |
| 1.11 | sarve | sarvas |
| 1.39 | asmAt | asmAn |
| 1.47 | rathopasthe | rathopastha |
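The two error types can be computed mechanically by comparing the expected underlying terms with the terms the analyzer produced. The sketch below does this with a multiset difference for the three constituents of 'pApAdasmAnnivartitum' (Stanza 1.39). The function name and the idea of scoring one combination word in isolation are assumptions for illustration, not a description of how the statistics on this page were actually compiled.

```python
from collections import Counter

def score_terms(expected, produced):
    """Return (false_negatives, false_positives) as sorted lists of terms.

    A false negative is an expected term missing from the output;
    a false positive is an output term that was not expected.
    """
    exp, got = Counter(expected), Counter(produced)
    false_negatives = sorted((exp - got).elements())
    false_positives = sorted((got - exp).elements())
    return false_negatives, false_positives

# Constituents of 'pApAdasmAnnivartitum' in Stanza 1.39.
expected = ["pApAt", "asmAt", "nivartitum"]
produced = ["pApAt", "asmAn", "nivartitum"]

print(score_terms(expected, produced))
```

Using a multiset rather than a set keeps the count right when the same term occurs more than once in a stanza.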

NOTE: We can confirm that all the above errors (from the 'sandhi analysis' stage) were indeed rectified by the syntactic parser when it was supplied with the alternatives in each case (e.g. 'sarvas/sarve', 'asmAn/asmAt', and 'rathopastha/rathopasthe'). Note that the syntactic parser is run after the 'sandhi analyzer' identifies the underlying words for each stanza (i.e. the input to the syntactic parser is the output of the 'sandhi analyzer'). The output of the 'sandhi analysis' stage was edited by hand to mark all the above ambiguous terms (in future, this needs to be done programmatically by the 'sandhi analysis' software). The syntactic parser uses syntactic constraints (e.g. feature agreement between subject and verb) to figure out the most appropriate option when it processes a term marked as ambiguous, such as 'sarvas/sarve' (see Parsing Statistics for Chapter 1 for the results of the syntactic parser). The task of the 'sandhi analyzer' is limited to figuring out which Paninian sandhi sutras may have been applied to which underlying terms in order to result in a specific 'combination term' seen in its input.


A. From (A) in the table above, we see that there were 108 combination words, each of which was split into two or more terms. These combination words were split into 265 underlying terms, of which 264 were correct. The one incorrect term ('asmAn') was split from the sandhi 'pApAdasmAnnivartitum' (see Stanza 1.39 below). In this case, it was not clear to the sandhi analyzer whether the 8.4.45-yaro'nunAsike'nunAsiko vA sutra had transformed an underlying 'asmAt' into 'asmAn', or whether the underlying term was an unchanged 'asmAn' in the first place. Here, it assumed an unchanged underlying term of 'asmAn', which turned out to be the wrong decision. Please note that the term 'asmAn' is a valid declined form in itself. This appears to be a case of syntactic ambiguity. The term 'combination words' includes two kinds of combinations of terms: some constituents of a combination may be fused with their adjacent terms, while other constituents may have undergone no transformation and are simply attached together. For instance, in the combination 'pApAdasmAnnivartitum', 'nivartitum' is unchanged whereas the other two constituents ('pApAt' and 'asmAt') have undergone some transformation. However, one could even argue here that 'nivartitum' has been rejected by the 8.3.23-mo'nusvAraHa sutra since the word is not succeeded by a consonant (i.e. some could even consider such cases as demonstrating a rejection rule rather than a selection rule). Another example of a combination word in which the constituents are merely attached together is 'yeSHAmarthe' in Stanza 1.33.

Stanza 1.39

कथं न ज्ञेयमस्माभिः पापादस्मान्निवर्तितुम् ,

कुलक्षयकृतं दोषं प्रपश्यद्भिर्जनार्दन .
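The ambiguity in Stanza 1.39 can be made concrete by rolling back 8.4.45 for a single constituent. In the sketch below, a constituent ending in 'n' that is followed by a nasal-initial term yields two candidates: the unchanged form and a form ending in 't'. The function name, the partial list of nasal initials, and the tiny lexicon are hypothetical; the roll-back handles only the 't' to 'n' case mentioned in the text.

```python
# Partial list of nasal initials in this page's transliteration (assumption).
NASAL_INITIALS = ("n", "m")

# Hypothetical mini-lexicon of valid declined forms, for illustration only.
LEXICON = {"asmAn", "asmAt"}

def rollback_8_4_45(term, next_term):
    """List underlying candidates for `term` when 8.4.45 may have applied.

    8.4.45 is optional, so the printed form itself is always a candidate.
    """
    candidates = [term]
    if term.endswith("n") and next_term.startswith(NASAL_INITIALS):
        candidates.append(term[:-1] + "t")
    return candidates

candidates = rollback_8_4_45("asmAn", "nivartitum")
valid = [c for c in candidates if c in LEXICON]
print(candidates, valid)
```

Because both candidates are valid declined forms, a lexicon check alone cannot settle the choice; this is the situation the analyzer faced when it kept 'asmAn'.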


B. Next, we see from (B) in the table above that 144 single words were changed to their underlying declined/conjugated terms. Of these, 3 terms were wrong (all in very similar contexts) due to ambiguity in the printed form. As discussed previously, it is sometimes possible to obtain the same 'printed form' from the application of different rules to different underlying terms. In one scenario, we have the 'printed form' with the context 'a_e/u' (e.g. 'sarva eva' and 'rathopastha upAvishat' in Stanzas 1.6, 1.11, and 1.47, with the underlying forms 'sarve eva' and 'rathopasthe upAvishat'), where the 6.1.78-echo'yavAyAvaHa sutra had been applied (with the 8.3.19-lopaHa shAkalyasya elision sutra). In the other scenario, we have other stanzas with the similar context 'a_u' (e.g. 'dhRutarASHTra uvAcha' and 'saNjaya uvAcha' in Stanzas 1.1 and 1.2, with the underlying forms 'dhRutarASHTras uvAcha' and 'saNjayas uvAcha'), where the 8.3.17-bho bhago agho apUrvasya yo'shi sutra had been applied (with the 8.3.19-lopaHa shAkalyasya elision sutra). It is not possible for the sandhi analyzer to determine which of these two scenarios is applicable, given the similar contexts in the 'printed form' in both scenarios. In this case of ambiguity between the two scenarios, the sandhi analyzer made the wrong choice in the following stanzas:

Stanza 1.6

युधामन्युश्च विक्रान्त उत्तमौजाश्च वीर्यवान् ,

सौभद्रो द्रौपदेयाश्च सर्व एव महारथाः .

Stanza 1.11

अयनेषु च सर्वेषु यथाभागमवस्थिताः

भीष्ममेवाभिरक्षन्तु भवन्तः सर्व एव हि

Stanza 1.47

सञ्जय उवाच .

एवमुक्त्वाऽर्जुनः संख्ये रथोपस्थ उपाविशत् ,

विसृज्य सशरं चापं शोकसंविग्नमानसः .
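The two scenarios described in (B) can be sketched as a candidate generator: for a printed term ending in short 'a' before a vowel-initial term, it lists the unchanged form together with the two underlying forms that the sutras named above could have reduced to the same printed form. The function name, the vowel list, and the return format are assumptions for this example; the sutra labels simply repeat those given in the text.

```python
# Vowel initials in this page's transliteration (partial list, assumption).
VOWEL_INITIALS = ("a", "A", "i", "I", "u", "U", "e", "E", "o", "O")

def rollback_final_a(term, next_term):
    """Return (candidate, sutras) pairs for a printed term in an 'a_vowel' context."""
    unchanged = [(term, [])]
    # 'Ha' is the visarga in this transliteration, not a final short 'a'.
    if not term.endswith("a") or term.endswith("Ha"):
        return unchanged
    if not next_term.startswith(VOWEL_INITIALS):
        return unchanged
    return unchanged + [
        (term[:-1] + "e", ["6.1.78", "8.3.19"]),  # e.g. sarve eva -> sarva eva
        (term + "s", ["8.3.17", "8.3.19"]),       # e.g. saNjayas uvAcha -> saNjaya uvAcha
    ]

for printed, following in [("sarva", "eva"), ("rathopastha", "upAvishat"), ("saNjaya", "uvAcha")]:
    print(printed, [candidate for candidate, _ in rollback_final_a(printed, following)])
```

All three printed contexts produce the same three-way choice, which is why the printed form alone does not tell the analyzer whether 'sarve', 'sarvas', or an unchanged form was intended.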

It must be noted that there is a fair chance that a parser could resolve some of the above errors if these terms are marked as being ambiguous (this feature in our parser is undergoing testing at present). For instance, Noun phrases usually require the component terms to be in agreement with the Head on Number, Gender, and Case; hence the Masculine Singular Nominative declension 'sarvas (NOM-S-Masc., VOC-S-Masc.)' ('all') should be excluded by the parser (occurring only occasionally in sentences such as 'All is well'), as also the Dual forms 'sarve (NOM-D-Fem., VOC-D-Fem., NOM-D-Neut., VOC-D-Neut.)', in favour of the plural forms 'sarve (NOM-P-Masc., VOC-P-Masc.)' (such as the structure 'A, B, C, D, E, all are mighty warriors' listed above). However, this requires the parser to identify all the components of the Noun Phrase correctly in the first place; this is not trivial given the occurrence of appositional structures, other ambiguous terms, and other complications that we have observed. The choice between 'rathopastha (VOC-S-Neut.)' ('chariot seat') and 'rathopasthe (LOC-S-Neut., NOM-D-Neut., VOC-D-Neut.)' is also difficult for the parser because most of their forms have adjunctive roles in the clause, and can only be checked at the semantic level (i.e. a syntactic parser can only exclude the NOM-D-Neut. form with certainty).
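The agreement check described above can be sketched as a filter over the alternatives of a term marked as ambiguous. The feature sets for 'sarvas' and 'sarve' are the ones listed in the preceding paragraph; the head features (NOM-P-Masc., for 'mighty warriors' in Stanza 1.6), the function name, and the 'a/b' marking format handling are assumptions for illustration and do not describe the actual parser.

```python
# (Case, Number, Gender) analyses for each alternative, as listed in the text.
ANALYSES = {
    "sarvas": {("NOM", "S", "Masc"), ("VOC", "S", "Masc")},
    "sarve": {
        ("NOM", "D", "Fem"), ("VOC", "D", "Fem"),
        ("NOM", "D", "Neut"), ("VOC", "D", "Neut"),
        ("NOM", "P", "Masc"), ("VOC", "P", "Masc"),
    },
}

def resolve(marked_term, head_features):
    """Keep the alternatives that can agree with the head in case, number, and gender."""
    alternatives = marked_term.split("/")
    return [alt for alt in alternatives if head_features in ANALYSES.get(alt, set())]

# Assumed features of the head of the noun phrase in Stanza 1.6.
head = ("NOM", "P", "Masc")
print(resolve("sarvas/sarve", head))
```

As the paragraph above notes, this only works once the parser has correctly identified the head and the other components of the noun phrase, which is the harder part of the problem.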


C. Next, we see from (C) in the table above that the sandhi analyzer correctly avoided transforming the Unchanged Vocatives and special terms that should not have been transformed (i.e. were already displayed in their underlying form, for e.g., सञ्जय in Stanza 1.1, and राजा in Stanza 1.2). Owing to ambiguity, it is not easy to classify words in this category correctly, as will be shown in analyses of later chapters of the text (currently under preparation). It is also important to note here that the presence or absence of punctuation symbols (there are 96 punctuation symbols in Chapter 1, i.e. one per every long pause) plays an important role in sandhi analysis in general. There are also Paninian sutras based on other prosodic cues in the text, such as accent markers, and syllable structure. For example, there are rules preventing accentless terms (for e.g., short-form pronouns) from being at certain sites in a stanza or a 'pada'. Most of the verses in the Srimad Bhagavad Gita follow the anuSHTubh metre with 4 'padas', each containing 8 syllables (where 'stubh' means a 'stop' or a 'pause'). Some of these additional prosodic cues may be incorporated in our parser in a future revision of the software, for use with some texts.

Stanza 1.1

धृतराष्ट्र उवाच .

धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः ,

मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय .

Stanza 1.2

सञ्जय उवाच .

दृष्ट्वा तु पाण्डवानीकं व्यूढं दुर्योधनस्तदा ,

आचार्यमुपसङ्गम्य राजा वचनमब्रवीत् .


D. Finally, we see from (D) in the table above that the remaining terms were non-special single-word terms that did not require transformations (i.e. they were not transformed by sandhi rules and were already in their underlying forms). These include nominals ( द्रुपदपुत्रेण , तव , शिष्येण , धीमता , चमूम् ), verbs ( दृष्ट्वा , उवाच ), and indeclinables ( तु ), as seen in the examples from Stanzas 1.2 and 1.3 below:

Stanza 1.2

सञ्जय उवाच .

दृष्ट्वा तु पाण्डवानीकं व्यूढं दुर्योधनस्तदा ,

आचार्यमुपसङ्गम्य राजा वचनमब्रवीत् .

Stanza 1.3

पश्यैतां पाण्डुपुत्राणामाचार्य महतीं चमूम् ,

व्यूढां द्रुपदपुत्रेण तव शिष्येण धीमता .


It will be noted from the above discussion that the analysis of sandhis should extend beyond combination words (two or more terms), because sandhi sutras transform (or choose not to transform) every word in a sentence. In fact, 3 out of the 4 errors shown above pertain to single-word terms (i.e. not components of 'combination words'). A mistake at the stage of sandhi analysis will lead to a wrong syntactic analysis (or failed parsing constraints) by the parser (e.g. if the VOC-S saNjaya is wrongly marked as NOM-S saNjayas). It is also important to correctly identify the chain of sandhi sutras that transforms each underlying term to its 'printed form' in multiple stages.

Interestingly, the ambiguity discussed in (B) above indicates to us another reason why Panini may have labelled the 8.3.19-lopaHa shAkalyasya sutra as a view that was not shared by all his grammarian predecessors, unlike the 8.3.22-hali sarveSHAm elision sutra that was accepted by all (however, in later chapters, we will observe that this sutra is also occasionally associated with cases of syntactic ambiguity).

Another point to note is that a very high accuracy should be expected from the sandhi analyzer when a text is in consonance with all the Paninian sandhi rules, and certain sandhi sutras that result in inherent syntactic ambiguity (in some contexts) are used infrequently. The high accuracy of the above sandhi analysis shows that the pitfalls of syntactic ambiguity can be avoided by using the large Sanskrit lexicon and 'free word order' to good advantage. Some may argue that errors arising from inherent syntactic ambiguity, of the kind discussed above, may be viewed kindly (and even treated as partially correct) because the sandhi analysis merely reflects ambiguity that is present in the text. Others (including this author) may argue that the reader is only concerned with a correct grammatical analysis, and that syntactic ambiguity at the sandhi analysis stage is a technical issue that a parser should try to resolve with more contextual cues. In other words, the division of responsibilities between a 'sandhi analyzer' and a 'syntactic parser' is artificial and irrelevant, and an advanced parser should use other cues, such as those an expert reader would employ, that may help in resolving ambiguity.

As mentioned in a preceding paragraph, our parser does indeed rectify all 4 errors found in the 'sandhi analysis' stage for this Chapter, as discussed in Parsing Analysis of Chapter 1. Please note that this rectification is contingent upon the clause structure being correctly identified by the parser (as was fortunately the case while parsing the affected stanzas in this Chapter); identification of clauses crucially involves the identification and insertion of elided verbs. The insertion of the 'copula verb' (the parser always inserts 'as:2P:to be:VerbPresent', although 'bhU:1P:to be:VerbPresent' or 'vid:4A:to be:VerbPresent' may be better choices in some cases) is done fairly extensively, as certain distinguishing features are present in most contexts where it is applicable. The parser was also successful in inserting an elided transitive verb from a preceding stanza, suitably conjugated for the form required in the sentence. For example, in Stanza 1.16, 'dadhmO' and 'dadhmatuHa' are inserted in the two clauses (from the root verb 'dhmA:1P:to sound:VerbPerfect' found in the immediately preceding stanza, Stanza 1.15). However, such insertions are done under very 'tight' conditions as they involve a high risk, being done mechanically without a 'real-world' understanding of the semantic contexts of the two stanzas. Given the above experience in Chapter 1, it is considered likely that the combination of the 'parser' and the 'sandhi analyzer' may be able to bring the error rate in 'sandhi analysis' down to a low level. However, the parser must be extremely sensitive to the complexities seen in the input data, and this must scale up to other texts. We will revisit this conjecture after processing all 18 chapters of this text, as well as several other texts.