
Parsing statistics

Our goal for a parsing analysis of the text was to analyze each stanza in order to identify the correct declined/conjugated form of each term, and to associate each term with its Clause, either as an argument (e.g., the NOM subject or ACC object of a verb in the active voice) or as an adjunct/non-argument (e.g., LOC, VOC). The analysis of each declension/conjugation should also show the base/root and other relevant details. This is a very challenging problem, as will become clear from a review of our analysis of each stanza.

A sample parsing analysis for Stanza 1.39 is shown below:


A: katham na jnyeyam asti asmAbhiHa pApAt asmAn/asmAt nivartitum kulakSHayakRutam doSHam prapashyadbhiHa janArdana

A.2:

  • katham:Indeclinable
  • na:Indeclinable
  • jnyeyam:NOM-S:jnyeya:Neut.:Noun:potential_participle_passive_yat_9U_jnA
  • asmAbhiHa:INS-P:asmad:Masc.:Pronoun:Link_gov_jnyeyam

A.1:

  • pApAt:ABL-S:pApa:Neut.:Noun
  • asmAt:ABL-S:idam:Masc.:Pronoun
  • nivartitum:-:ni-vRut:1:A:VerbInfinitive
  • kulakSHayakRutam:ACC-S:kula-kSHaya-kRuta (kulakSHayakRuta) :Neut.:Noun:past_participle_passive_kta_8P_kRu:Link_subj_doSHam
  • doSHam:ACC-S:doSHa:Neut.:Noun:Link_gov_prapashyadbhiHa
  • prapashyadbhiHa:INS-P:prapashyat:Masc.:Adj:present_participle_shatRu_1P_pra-dRush
  • janArdana:VOC-S:janArdana:Masc.:Noun
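The gloss lines above follow a regular, colon-separated layout. As a minimal sketch, assuming the field order seen in this one sample (word, Case-Number, base, gender, part of speech, then optional participle details and Link_* tags; verbs carry a root, class and voice instead), the lines can be read into structured records as follows. The `Gloss` class and `parse_gloss` function are hypothetical names for illustration, not part of the parser described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Gloss:
    word: str
    pos: str                           # "Noun", "Pronoun", "Adj", "Indeclinable", "VerbInfinitive", ...
    case_number: Optional[str] = None  # e.g. "NOM-S", "INS-P"; None for indeclinables and verbs
    base: Optional[str] = None         # base (nominals) or root (verbs), e.g. "jnyeya", "ni-vRut"
    gender: Optional[str] = None       # "Masc", "Fem", "Neut" (trailing '.' stripped)
    extras: List[str] = field(default_factory=list)  # participle details, Link_* tags, verb class/voice


def parse_gloss(line: str) -> Gloss:
    """Parse one colon-separated gloss line (layout inferred from the Stanza 1.39 sample only)."""
    word, *rest = [part.strip() for part in line.split(":")]
    if rest == ["Indeclinable"]:
        # e.g. "katham:Indeclinable"
        return Gloss(word=word, pos="Indeclinable")
    if rest and rest[-1].startswith("Verb"):
        # e.g. "nivartitum:-:ni-vRut:1:A:VerbInfinitive": root, then class and voice as extras
        return Gloss(word=word, pos=rest[-1], base=rest[1], extras=rest[2:-1])
    # e.g. "jnyeyam:NOM-S:jnyeya:Neut.:Noun:potential_participle_passive_yat_9U_jnA"
    case_number, base, gender, pos, *extras = rest
    return Gloss(word, pos, case_number, base, gender.rstrip("."), extras)


g = parse_gloss("asmAbhiHa:INS-P:asmad:Masc.:Pronoun:Link_gov_jnyeyam")
print(g.case_number, g.base, g.gender, g.extras)  # INS-P asmad Masc ['Link_gov_jnyeyam']
```

Keeping the Link_* tags as opaque extras avoids guessing at their full vocabulary; a fuller reader would need the parser's own specification of these tags.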

In presenting the results, we largely follow the method used by Sanskrit grammarians for more than two thousand years (see, e.g., [KAL2015] [1] and [MM2015] [2]). However, we are unable to provide an analysis of 'compound words' ('samAsa'), as this is beyond the capabilities of a syntactic parser. The parser needs to choose one of several alternative forms for each term. For example, 'jnyeyam' above could be 'Adj/Noun:jnyeya:NOM-S-Neut.:that which ought to be known', 'Adj/Noun:jnyeya:ACC-S-Masc.:one who ought to be known', or 'Adj/Noun:jnyeya:ACC-S-Neut.:that which ought to be known'; i.e. it could be the subject of the clause, its object, or a Predicative Adjective (see note 1). The parsing analysis of each stanza shows the clauses (e.g., A.1 and A.2), as well as the selected declension/conjugation/indeclinable form for each term. Please note that the sample analysis above does not show the elided 'copula verb' ('as:2:P:to be:VerbPresent') that the parser inserts in clause A.1. This insertion is crucial for the analysis, but it is omitted here because the term is not present in the input provided to the parser, and showing it may confuse the reader.
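To make the choice among alternative forms concrete, here is a toy sketch of candidate filtering. The three readings of 'jnyeyam' are the ones listed in the text. The constraint used to pick one (clause A.2 has an INS agent, 'asmAbhiHa', and a passive participle, so only a NOM slot remains open) is a hypothetical rule for illustration, not the parser's actual logic; `Reading`, `filter_readings` and `agrees` are invented names.

```python
from typing import Iterable, List, NamedTuple, Set


class Reading(NamedTuple):
    case: str    # "NOM", "ACC", ...
    number: str  # "S", "D", "P"
    gender: str  # "Masc", "Fem", "Neut"
    sense: str


# The three readings of 'jnyeyam' listed in the text.
JNYEYAM = [
    Reading("NOM", "S", "Neut", "that which ought to be known"),
    Reading("ACC", "S", "Masc", "one who ought to be known"),
    Reading("ACC", "S", "Neut", "that which ought to be known"),
]


def filter_readings(readings: Iterable[Reading], open_cases: Set[str]) -> List[Reading]:
    """Keep only the readings whose case fills a slot the clause still needs."""
    return [r for r in readings if r.case in open_cases]


def agrees(a: Reading, b: Reading) -> bool:
    """Case, number and gender concord, e.g. between a participle and the noun it qualifies."""
    return (a.case, a.number, a.gender) == (b.case, b.number, b.gender)


# Hypothetical constraint: only a NOM slot is still open in clause A.2.
print(filter_readings(JNYEYAM, {"NOM"}))
```

With an ACC slot open instead, two readings would survive, which is exactly the kind of residual ambiguity the text attributes to gender selection being a semantic rather than syntactic question.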


    Although our parser is still at an exploratory stage, it performed very well on the stanzas of Chapter 5 of the text. However, we do not claim that this level of accuracy can be replicated in later chapters: the syntactic complexity varies significantly from stanza to stanza, and a number of nuances remain to be handled. In general, the parser's accuracy will be lower on liturgical and non-prose texts, where the word order is partly determined by the metre of the verse; in the absence of sufficient cues, it is extraordinarily difficult for the parser to make the correct choices. It must be kept in mind that the insertion of elided verbs is crucial for the parser to make the right choices, but inserting these elided terms correctly is far from trivial.
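A deliberately naive version of elided-verb insertion is sketched below, assuming a clause is a list of (word, part-of-speech) pairs and that a clause with no finite verb receives the copula [asti] ('as:2:P:to be:VerbPresent', as in the text). The rule, the helper names and the "VerbGerund" tag are hypothetical. As the paragraph above warns, real insertion is far from trivial: this rule ignores, for instance, whether the missing verb should instead be shared with a neighbouring clause.

```python
from typing import List, Tuple

Term = Tuple[str, str]  # (word, part of speech), e.g. ("nivartitum", "VerbInfinitive")

# Non-finite forms; "VerbGerund" is an assumed tag name for gerunds.
NON_FINITE = {"VerbInfinitive", "VerbGerund"}
ELIDED_COPULA: Term = ("[asti]", "VerbPresent")  # gloss in the text: as:2:P:to be:VerbPresent


def has_finite_verb(clause: List[Term]) -> bool:
    """True if any term is tagged as a verb form other than a non-finite one."""
    return any(pos.startswith("Verb") and pos not in NON_FINITE for _, pos in clause)


def insert_elided_copula(clause: List[Term]) -> List[Term]:
    """Return a copy of the clause, with [asti] appended if it has no finite verb."""
    if has_finite_verb(clause):
        return list(clause)
    return list(clause) + [ELIDED_COPULA]


# Clause A.1 of Stanza 1.39: only an infinitive, so the copula is inserted.
clause_a1 = [("pApAt", "Noun"), ("asmAt", "Pronoun"), ("nivartitum", "VerbInfinitive"),
             ("kulakSHayakRutam", "Noun"), ("doSHam", "Noun"),
             ("prapashyadbhiHa", "Adj"), ("janArdana", "Noun")]
print(insert_elided_copula(clause_a1)[-1])  # ('[asti]', 'VerbPresent')
```

Returning a copy rather than mutating the clause keeps the original input intact, which matters if the inserted term later turns out to be an erroneous inclusion.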

    It must also be pointed out that the parser's input was the output produced by the 'sandhi analyzer' (i.e. terms are expected to be in their underlying forms, prior to the operation of the sandhi sutras). Any defects observed in the sandhi analysis stage were marked as 'unresolved', and semantic defects were rectified (e.g., the input to the parser was changed to 'tat param' for the semantic defect in Stanza 5.16). We prefer to keep the 'sandhi analyzer' and the parser independent of each other at present, so that the defects of each can be measured independently.

    In conclusion, it is safe to say that parsing a text of this complexity is extraordinarily difficult because of its 'free' word order. Despite this difficulty, we believe that a 'bottom-up' analysis of such texts is necessary in order to obtain a highly nuanced understanding of 'free' word order in Sanskrit. We will show a number of instances where the parser's assignment does not coincide with that of one or the other of our chosen experts ([KAL2015] and [MM2015]). We have also come across a few instances (in later chapters of the text) where the parser's assignment is at variance with those of both experts, but where the parser has probably made the correct assignment (an advantage of a rule-based system, provided the rules are highly nuanced).

    Defective assignments of Case/Number/Clause have been highlighted in the respective stanzas. The standards used for identifying defects were [KAL2015] and [MM2015]. As Table 3 below shows, most of the defects identified as per [KAL2015] are questionable when checked against [MM2015]. It is possible that some of these 'defects' are printing errors in [KAL2015].

    Table 1: Count of Expected terms in parser output

    Description | Count | Notes
    ---|---|---
    Input terms | 356 | The terms produced when the sandhi analyzer processes the input stanzas.
    Additional elided verb terms | 23 | The parser needs to insert elided verbs in a large number of clauses.
    False Negative elided verb terms | 0 | Elided verbs that the parser failed to create. See Table 5.
    False Negative elided non-verb terms | 1 | Elided non-verb terms that the parser failed to create. See Table 5.
    Total Expected terms | 380 | Used as the denominator when computing the percentage of defects.
    Total Defects (see Table 2 below) | 0 + 4 + 1 = 5 | 1.3 percent (5/380)
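The Total Expected terms in Table 1 is the sum of the four preceding rows (356 + 23 + 0 + 1 = 380), and the defect total is the sum of the TOTAL DEFECTS rows of Tables 3, 4 and 5. The arithmetic can be checked with a few lines; the constant and function names are illustrative only.

```python
# Rows of Table 1 that make up the denominator.
INPUT_TERMS = 356            # terms produced by the sandhi analyzer
ELIDED_VERBS_INSERTED = 23   # elided verbs the parser had to insert
FN_ELIDED_VERBS = 0          # elided verbs the parser failed to create
FN_ELIDED_NON_VERBS = 1      # elided non-verb terms the parser failed to create

# TOTAL DEFECTS row of each detail table.
DEFECTS_BY_TABLE = {"Table 3": 0, "Table 4": 4, "Table 5": 1}


def defect_rate(defects: int, expected: int) -> float:
    """Percentage of defective terms among all expected terms."""
    return 100.0 * defects / expected


expected = INPUT_TERMS + ELIDED_VERBS_INSERTED + FN_ELIDED_VERBS + FN_ELIDED_NON_VERBS
defects = sum(DEFECTS_BY_TABLE.values())
print(f"{defects}/{expected} = {defect_rate(defects, expected):.1f} percent")  # 5/380 = 1.3 percent
```

Counting the false-negative elided terms in the denominator treats them as terms the parser should have produced, so a missed insertion both adds a defect and enlarges the base.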

    From the above analysis, we can see that the number of defects in the parser's assignment of terms is quite small (1.3 percent). The parser necessarily needs to insert elided terms (largely verbs, but occasionally nominals) in order to work out the clause structure, as the correct assignment of terms is contingent upon identifying the clauses. In this process, some errors can result from the erroneous inclusion or exclusion of an elided term. We expect the parser to perform better on prose, since the constraints imposed by metre make non-prose texts considerably more complex (i.e. 'free' word order is expected to be less of a problem in prose than in verse).


    Table 2: Summary of defects

    Description | Correct | Defective | Notes
    ---|---|---|---
    *dd | 0 | 0 | Incorrect definitions in the parser's lexicon are marked '*dd' in the tables below.
    *spd | 0 | 0 | Short-form pronouns are very difficult to handle reliably (marked '*spd' below). Their correct identification may be beyond the capability of a syntactic parser, as this task may require the parser to have an understanding of the 'real world'.
    *ad | 0 | 0 | Incorrect declension/conjugation assignments by the parser are marked '*ad' below.
    *gd | 0 | 0 | Incorrect gender in declension assignments by the parser is marked '*gd' below. Selecting the correct gender (where a term has multiple genders) is a semantic issue that is beyond the capabilities of a syntactic parser.
    *cd | 4 | 4 | Incorrect clause assignments by the parser are marked '*cd' below. Clause assignment is marked as N.A. for [KAL2015], as that source does not mark clauses. Only significant differences are shown.
    *hd | 1 | 1 | Sometimes a higher-level understanding is required to handle concepts such as metaphors, which require the creation of additional Hidden Clauses (and elided verbs). These semantic defects of the parser are marked '*hd' in the tables below. Clause assignment is marked as N.A. for [KAL2015], as that source does not mark clauses. Only significant differences are shown.
    *id | 0 | 0 | Incorrect input leading to defective assignments by the parser is marked '*id' below.
    Total Defects | 5 | 5 | See Tables 3, 4 and 5 below.

    Notes:

    • We have not marked defects in the identification of participles (e.g., kta, yat, shatRu) or compound words (samAsa), as too many such terms are left unmarked in [MM2015] (although they are assumed to be present in the translations).
    • Further, we have not marked defects in the identification of the Arguments of participles, since these are not marked in [KAL2015], and occasionally unmarked in [MM2015].
    • The internal structure of clauses is difficult to compare, as the parser treats non-finite verbs (Gerunds and Infinitives) as separate clauses with their own arguments, while [MM2015] includes them as a part of a larger finite verb clause.

    The following table lists those terms whose assigned Conjugation (Verb, Number, Person) or Declension (Case, Number, Gender) in the Gloss differs from that given by at least one of the two experts.

    Table 3: Defective assignment of Declension/Conjugation/Indeclinable Glosses to terms by the Parser

    # | Stanza | Clause | Word | Parser | [KAL2015] | [MM2015] | Type | Comment | #Defects
    ---|---|---|---|---|---|---|---|---|---
    1 | 5.16 | A.2 | tat | NOM-S | ACC-S | NOM-S | *ad | Not a defect | 0
    2 | 5.16 | A.2 | ajnyAnam | NOM-S | ACC-S | NOM-S | *ad | Not a defect | 0
    3 | 5.16 | A.2 | nAshitam | NOM-S | ACC-S | NOM-S | *ad | Not a defect | 0
    4 | 5.16 | A.1 | Adityavat | NOM-S | ACC-S | NOM-S | *ad | Not a defect | 0
    5 | 5.16 | A.1 | jnyAnam | NOM-S | ACC-S | NOM-S | *ad | Not a defect | 0
    6 | 5.21 | A.1 | yat | ACC-S | ACC-S | NOM-S | *ad | Not a defect | 0
    7 | 5.21 | A.1 | sukham | ACC-S | ACC-S | NOM-S | *ad | Not a defect | 0
    TOTAL DEFECTS | | | | | | | | | 0

    The following table lists those terms that are assigned to the wrong clause by the parser. Note that information on clauses can only be ascertained from [MM2015] ([KAL2015] does not provide such details).

    Table 4: Defective assignment of Clauses to terms by the Parser

    # | Stanza | Clause | Word | Parser | [KAL2015] | [MM2015] | Type | Comment | #Defects
    ---|---|---|---|---|---|---|---|---|---
    1 | 5.10 | A.1 | karmANNi | A.1 | N.A. | A.3 | *cd | *Defect* | 1
    2 | 5.11 | A.2 | Atmashuddhaye | A.2 | N.A. | A.1 | *cd | *Defect* | 1
    3 | 5.14 | A.2 | na | A.2 | N.A. | A.1 | *cd | *Defect* | 1
    4 | 5.14 | A.2 | karmaphalasaNyogam | A.2 | N.A. | A.1 | *cd | *Defect* | 1
    TOTAL DEFECTS | | | | | | | | | 4

    The following table lists Hidden Clauses that the parser should have created but did not detect. Terms shown in square brackets (such as [vibhuHa] below) are elided terms that the parser should have inserted in the Hidden Clause. Note that information on clauses can only be ascertained from [MM2015] ([KAL2015] does not provide such details).

    Note: the False Negative noun [vibhuHa] shown for Stanza 5.15 arises because the parser did not create a 'shared' subject for clauses A.1 and A.4.

    Table 5: Defective identification of Hidden Clauses by the Parser

    # | Stanza | Clause | Word | Parser | [KAL2015] | [MM2015] | Type | Comment | #Defects
    ---|---|---|---|---|---|---|---|---|---
    1 | 5.15 | A.1 | False Negative [vibhuHa] | A.4 | N.A. | A.1, A.4 | *hd | *Defect* | 1
    TOTAL DEFECTS | | | | | | | | | 1
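The defect counts in Tables 3 to 5 are consistent with a simple rule: an assignment counts as a defect only if it matches none of the experts that provide one (N.A. entries are ignored, and a term that belongs to two clauses must be assigned to both). This rule is our inference from the table rows, not a procedure stated in the source; and since the parser is sometimes probably right even when it disagrees with both experts, a check like this could at most pre-screen rows for manual review. The function names are hypothetical.

```python
from typing import FrozenSet, Optional


def parse_cell(value: str) -> Optional[FrozenSet[str]]:
    """'N.A.' -> None (not marked by that source); 'A.1,A.4' -> {'A.1', 'A.4'}."""
    value = value.strip()
    if value in {"N.A.", "N.A"}:
        return None
    return frozenset(part.strip() for part in value.split(","))


def is_defect(parser: str, *experts: str) -> bool:
    """Defect if the parser's assignment matches none of the experts who give one."""
    mine = parse_cell(parser)
    given = [e for e in map(parse_cell, experts) if e is not None]
    return bool(given) and all(mine != e for e in given)


# (parser, [KAL2015], [MM2015]) for each row of Tables 3, 4 and 5.
ROWS = (
    [("NOM-S", "ACC-S", "NOM-S")] * 5                  # Table 3, Stanza 5.16
    + [("ACC-S", "ACC-S", "NOM-S")] * 2                # Table 3, Stanza 5.21
    + [("A.1", "N.A.", "A.3"), ("A.2", "N.A.", "A.1"),
       ("A.2", "N.A.", "A.1"), ("A.2", "N.A.", "A.1")]  # Table 4
    + [("A.4", "N.A.", "A.1,A.4")]                     # Table 5, [vibhuHa]
)
print(sum(is_defect(*row) for row in ROWS))  # 5
```

Comparing sets rather than single labels is what lets the same rule cover the Table 5 row, where the parser placed [vibhuHa] in A.4 only while [MM2015] shares it between A.1 and A.4.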

    1. [KAL2015] frequently does not distinguish between Adjectives and Nouns, hence we will not make that distinction when assignments to either are made by the parser. Note that, for the parser, this is usually a question of the definition of the term in the lexicon.

    References

    1. [KAL2015] Kalavade L., Kalavade P. Gitavyakaranam Panniniyapraveshaya. Chinmaya International Foundation; 2015.
    2. [MM2015] Michika M. Grammatical Analysis of the Bhagavad Gita Chapters 1 to 6. Arsha Avinash Foundation, Coimbatore; 2015.