Our goal for the parsing analysis of the text was to analyze each stanza in order to identify the correct declined/conjugated form of each term, and to associate it with its respective Clause either as an argument (e.g., the NOM subject or ACC object of a verb in the active voice) or as an adjunct/non-argument (e.g., LOC, VOC). The declensions and conjugations should also show the base/root and other relevant details of each declension/conjugation. This is a very challenging problem, as will become clear from a review of our analysis of each stanza.
A sample parsing analysis for Stanza 1.39 is shown below:
A.2:
katham:Indeclinable na:Indeclinable jnyeyam:NOM-S:jnyeya:Neut.:Noun:potential_participle_passive_yat_9U_jnA asmAbhiHa:INS-P:asmad:Masc.:Pronoun:Link_gov_jnyeyam

A.1:
pApAt:ABL-S:pApa:Neut.:Noun asmAt:ABL-S:idam:Masc.:Pronoun nivartitum:-:ni-vRut:1:A:VerbInfinitive kulakSHayakRutam:ACC-S:kula-kSHaya-kRuta(kulakSHayakRuta):Neut.:Noun:past_participle_passive_kta_8P_kRu:Link_subj_doSHam doSHam:ACC-S:doSHa:Neut.:Noun:Link_gov_prapashyadbhiHa prapashyadbhiHa:INS-P:prapashyat:Masc.:Adj:present_participle_shatRu_1P_pra-dRush janArdana:VOC-S:janArdana:Masc.:Noun
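The colon-delimited records in the sample above can be read mechanically. The following is a minimal Python sketch, assuming the nominal field layout inferred from the sample (surface form, case-number tag, base, gender, part of speech, then any trailing derivation or clause-link details); the parser's actual internal record format may differ, and verb/infinitive records (such as 'nivartitum' above) use a different layout that this sketch does not handle.

```python
# Sketch: reading a colon-delimited nominal term record into a dictionary.
# Field layout is inferred from the sample output, not from the parser's
# documentation, and is assumed here for illustration only.

def parse_term(record: str) -> dict:
    fields = record.split(":")
    term = {
        "surface": fields[0],
        # A bare 'Indeclinable' record has only two fields.
        "tag": fields[1] if len(fields) > 1 else "Indeclinable",
    }
    if len(fields) > 2:
        term["base"] = fields[2]
    if len(fields) > 3:
        term["gender"] = fields[3]
    if len(fields) > 4:
        term["pos"] = fields[4]
    # Remaining fields carry derivation details or clause links,
    # e.g. 'Link_gov_jnyeyam'.
    term["extras"] = fields[5:]
    return term

print(parse_term("pApAt:ABL-S:pApa:Neut.:Noun"))
print(parse_term("katham:Indeclinable"))
```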
In our presentation of the results, we largely follow the method used by Sanskrit grammarians for several thousand years (see, e.g., [KAL2015] [1] and [MM2015] [2]). However, we are unable to provide an analysis of 'compound words' ('samAsa'), as this is beyond the capabilities of a syntactic parser. The parser needs to choose one from amongst several alternative forms for each term (e.g., 'jnyeyam' above could be any of 'Adj/Noun:jnyeya:NOM-S-Neut.:that which ought to be known', 'Adj/Noun:jnyeya:ACC-S-Masc.:one who ought to be known', or 'Adj/Noun:jnyeya:ACC-S-Neut.:that which ought to be known', i.e. it could be the subject of the clause, the object, or a Predicative Adjective). The parsing analysis of each stanza shows the clauses (e.g., A.1 and A.2), as well as the selected declension/conjugation/indeclinable form for each term. Please note that the sample analysis above does not show the insertion by the parser of the elided 'copula verb' ('as:2:P:to be:VerbPresent') in clause A.1; this insertion is crucial for the analysis, but may cause some confusion for the reader as it is not present in the input provided to the parser.
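As a hypothetical illustration of the kind of choice the parser faces, the alternative analyses of 'jnyeyam' listed above can be represented as candidates and filtered by the case required by the clause. The candidate tuples come from the alternatives given in the text, but the selection rule below is invented for illustration and is not the parser's actual rule set.

```python
# Sketch: rule-based disambiguation among candidate analyses of an
# ambiguous form. The candidates are the alternatives for 'jnyeyam'
# quoted in the text; the case-preference rule is a hypothetical stand-in
# for the parser's (much more nuanced) rules.

CANDIDATES = [
    # (base, case-number, gender, meaning)
    ("jnyeya", "NOM-S", "Neut.", "that which ought to be known"),
    ("jnyeya", "ACC-S", "Masc.", "one who ought to be known"),
    ("jnyeya", "ACC-S", "Neut.", "that which ought to be known"),
]

def choose(candidates, expected_case):
    """Pick the first candidate whose case matches the clause's open slot."""
    for base, case_number, gender, meaning in candidates:
        if case_number.startswith(expected_case):
            return (base, case_number, gender, meaning)
    return None

# In clause A.2, 'jnyeyam' serves as the NOM subject of the elided copula:
print(choose(CANDIDATES, "NOM"))
```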
Although our parser is still at an exploratory stage, it performed very well on the stanzas of Chapter 4 of the text. However, we do not claim that this high level of accuracy can be replicated in further chapters, as significant variation has been noted in the syntactic complexity of the stanzas, and a number of nuances will need to be handled. In general, the accuracy of the parser will be lower on liturgical and other non-prose texts, where the word order is partially determined by the metre of the verse, making it extraordinarily difficult for the parser to make the correct choices in the absence of sufficient cues. It must be kept in mind that the insertion of elided verbs is crucial if the parser is to make the right choices, but it is far from trivial to insert these elided terms correctly.
It must also be pointed out that the input used by the parser was the output produced by the 'sandhi analyzer' (i.e. terms are expected to be in their underlying forms, prior to the operation of the sandhi sutras), but with the three defects observed in the sandhi analysis stage marked as 'unresolved' (e.g., the input to the parser was changed to 'se/saHa' in Stanza 4.3, and this was treated by the parser as a term that was not resolved by the 'sandhi analyzer'). The parser was additionally tasked with identifying the correct alternative for each such 'unresolved' term (e.g., choosing 'saHa' instead of 'se' in this stanza, if justified). Happily, the parser chose the correct alternative for all of these defects. Needless to say, this manual intervention is a temporary measure, as the 'sandhi analysis' software will eventually mark the alternatives for 'unresolved' terms automatically. We prefer to keep the 'sandhi analyzer' and the parser independent of each other at present, in order to be able to measure the defects in each independently.
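The handling of an 'unresolved' term such as 'se/saHa' can be sketched as follows. The '/'-delimited marking follows the convention described above; the lexicon check is a hypothetical stub, since the parser's actual selection criteria (and lexicon) are far richer.

```python
# Sketch: expanding an 'unresolved' sandhi token (e.g. 'se/saHa' in
# Stanza 4.3) into its alternatives and picking one. The validity check
# is a hypothetical placeholder for a real lexicon lookup.

def expand_unresolved(token: str) -> list[str]:
    """Return the alternative readings of an unresolved token."""
    return token.split("/") if "/" in token else [token]

def resolve(token: str, is_valid_form) -> str:
    """Choose the first alternative the (stubbed) lexicon recognises."""
    for alt in expand_unresolved(token):
        if is_valid_form(alt):
            return alt
    return token  # leave unresolved if no alternative is recognised

# Hypothetical lexicon check: only 'saHa' is a recognised form here.
print(resolve("se/saHa", lambda w: w == "saHa"))
```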
In conclusion, it is safe to say that parsing a text of this complexity is extraordinarily difficult given its 'free' word order. Despite this difficulty, however, we believe that a 'bottom-up' analysis of such texts is necessary in order to obtain a highly nuanced understanding of 'free' word order in Sanskrit. We will show a number of instances where the assignment made by the parser does not coincide with that of one or the other of our chosen experts ([KAL2015] and [MM2015]). We have also come across a few instances (in later chapters of the text) where the parser's assignment is at variance with those of both experts, but where the parser has probably made the correct assignment (this is an advantage of a rule-based system, provided the rules are highly nuanced).
Defective assignments of Case/Number/Clause have been highlighted in the respective stanzas. The standards used for identifying defects were [KAL2015] and [MM2015]. As will be seen in the table below, most of the defects identified as per [KAL2015] are questionable when checked against [MM2015]. It is possible that some of these 'defects' are printing errors in [KAL2015].
| Description | Count | Notes |
|---|---|---|
| Input terms | 518 | Terms produced by the sandhi analysis of the input stanzas. |
| Additional elided verb terms | 33 | The parser needs to insert elided verbs in a large number of clauses. |
| False Negative elided verb terms | 1 | The parser failed to create these elided verbs. See Table 5. |
| False Negative elided non-verb terms | 0 | The parser failed to create these elided terms. See Table 5. |
| Total Expected terms | 552 | This count is used as the denominator for computing the percentage of defects. |
| Total Defects (see Table 2 below) | 1+9+4=14 | 2.5 percent (14/552) |
From the above analysis, we can see that the number of defects in the assignment of terms by the parser is quite small (i.e. 2.5 percent). The parser necessarily needs to insert elided terms (largely verbs, but occasionally also nominals) in order to work out the clause structure, as the correct assignment of terms is contingent upon identifying the clauses. In this process, some errors can result from the erroneous inclusion or exclusion of an elided term. However, we expect the parser to perform better while processing prose, as the complexity of non-prose texts is very high due to the constraints imposed by metre (i.e. 'free' word order is expected to be less of a problem in prose than in non-prose text).
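The figures in the table above can be recomputed directly; a minimal sketch, with the counts taken from the table:

```python
# Sketch: recomputing the totals and the defect rate reported above.
input_terms = 518
elided_verbs_inserted = 33
false_negative_elided_verbs = 1
false_negative_elided_nonverbs = 0

total_expected = (input_terms + elided_verbs_inserted
                  + false_negative_elided_verbs
                  + false_negative_elided_nonverbs)

defects = 1 + 9 + 4  # totals of Tables 3, 4 and 5
rate = 100 * defects / total_expected

print(total_expected, defects, round(rate, 1))
```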
| Description | Count | Defects | Notes |
|---|---|---|---|
| Lexicon definition defects (*dd) | 0 | 0 | Incorrect definitions in the parser's lexicon are marked '*dd' in the tables below. |
| Short-form pronoun defects (*spd) | 0 | 0 | Short-form pronouns are very difficult to handle reliably (marked '*spd' below). Their correct identification may be beyond the capability of a syntactic parser, as this task may require the parser to have an understanding of the 'real world'. |
| Declension/conjugation assignment defects (*ad) | 1 | 0 | Incorrect declension/conjugation assignments by the parser are marked '*ad' below. |
| Gender assignment defects (*gd) | 2 | 1 | Incorrect gender in declension assignments by the parser is marked '*gd' below. Selecting the correct gender (where there are multiple genders) is a semantic issue beyond the capabilities of a syntactic parser. |
| Clause assignment defects (*cd) | 9 | 9 | Incorrect clause assignments by the parser are marked '*cd' below. Clause assignment is marked N.A. for [KAL2015], as it is not marked there. Only significant differences are shown. |
| Hidden Clause defects (*hd) | 4 | 4 | Sometimes a higher-level understanding is required to handle concepts such as metaphors, which require the creation of additional Hidden Clauses (and elided verbs). These semantic defects of the parser are marked '*hd' in the tables below. Clause assignment is marked N.A. for [KAL2015], as it is not marked there. Only significant differences are shown. |
| Input defects (*id) | 0 | 0 | Incorrect input leading to defective assignments by the parser is marked '*id' below. |
| Total Defects | 16 | 14 | See Tables 3, 4 and 5 below. |
The following table lists those terms which have been assigned the wrong Conjugation (Verb, Number, Person) or Declension (Case, Number, Gender) in the Gloss.
| # | Stanza | Clause | Word | Parser | [KAL2015] | [MM2015] | Type | Comment | #Defects |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.7 | A.1 | abhyutthAnam | Neut. | Neut. | Masc. | *gd | Not a defect | 0 |
| 2 | 4.23 | A.1 | pravilIyate | Intransitive Present | Intransitive Present | Passive Present | *ad | Not a defect | 0 |
| 3 | 4.32 | A.1 | brahmaNNaHa | Neuter | Masculine | Masculine | *gd | *Defect* | 1 |
| TOTAL DEFECTS | | | | | | | | | 1 |
The following table lists those terms that are assigned to the wrong clause by the parser. Note that information on clauses can only be ascertained from [MM2015] ([KAL2015] does not provide such details).
| # | Stanza | Clause | Word | Parser | [KAL2015] | [MM2015] | Type | Comment | #Defects |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.1 | B.1 | avyayam | B.1 | N.A. | B.3 | *cd | *Defect* | 1 |
| 2 | 4.3 | A.2 | bhaktaHa | A.2 | N.A. | A.1 | *cd | *Defect* | 1 |
| 3 | 4.5 | B.2 | sarvANNi | B.2 | N.A. | B.1 | *cd | *Defect* | 1 |
| 4 | 4.22 | A.1 | siddhO | A.1 | N.A. | A.2 | *cd | *Defect* | 1 |
| 5 | 4.22 | A.1 | asiddhO | A.1 | N.A. | A.2 | *cd | *Defect* | 1 |
| 6 | 4.22 | A.1 | cha | A.1 | N.A. | A.2 | *cd | *Defect* | 1 |
| 7 | 4.32 | A.1 | mukhe | A.1 | N.A. | A.4 | *cd | *Defect* | 1 |
| 8 | 4.39 | A.3 | tatparaHa | A.3 | N.A. | A.1 | *cd | *Defect* | 1 |
| 9 | 4.39 | A.3 | saNyatendriyaHa | A.3 | N.A. | A.1 | *cd | *Defect* | 1 |
| TOTAL DEFECTS | | | | | | | | | 9 |
The following table lists those terms that can be classified as Hidden Clauses that the parser should have created, but did not detect. Terms shown in square brackets are elided terms (such as [asti] below) that should have been inserted by the parser in the Hidden Clause. Note that information on clauses can only be ascertained from [MM2015] ([KAL2015] does not provide such details).
*** Note the False Negative [verb:asti] shown in Stanza 4.24, where the parser did not create a Hidden Clause where one was required. In the case of Stanza 4.6, the parser created the elided copula verb, but with the wrong Number/Person.
| # | Stanza | Clause | Word | Parser | [KAL2015] | [MM2015] | Type | Comment | #Defects |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 4.6 | A.3 | [verb:staHa] | [staHa] | N.A. | [asmi] | *hd | *Defect* | 1 |
| 2 | 4.24 | A.2 | brahma | A.2 | N.A. | A.3 | *hd | *Defect* | 1 |
| 3 | 4.24 | A.2 | arpaNNam | A.2 | N.A. | A.3 | *hd | *Defect* | 1 |
| 4 | 4.24 | A.2 | [verb:asti] | N.A. | N.A. | A.3[asti] | *hd | *Defect* | 1 |
| TOTAL DEFECTS | | | | | | | | | 4 |