Vakya
The ‘Vakya’ is one of the principles of Purva Mimansa. It deals with the syntactic completeness of a sentence. Along with the other principles discussed in the earlier chapter, this principle is equally necessary for judging the importance of a sentence for inclusion in the summary. In this chapter, the concept of Vakya is elaborated in detail from the point of view of its importance in summarization. The subsequent section discusses how the classical Sanskrit theories view the concept of Vakya, which in turn is the most important element of Text Summarization.
समभिव्याहारं वाक्यम् ।
This means that putting together, or pronouncing, the component words such as subject, object, verb etc. in a systematic way forms a Vakya, i.e. a sentence.
With this concept, the classical Sanskrit theories present two different views on the linguistic phenomenon of Vakya. One view is the ‘Akhandapaksha’, which is a holistic view. The second view is the analytical one, known as ‘Khandapaksha’. These two approaches are complementary to each other. Although both of them are diverse and novel, they elaborate various interesting features of language. ‘Akhandapaksha’ followers believe that words have no meaning-bearing capability by themselves; only in the context of a sentence do they acquire such a capability. Ancient Sanskrit grammarians like Bhartrihari and Audumbarayana followed this view [Subrahmanyam, 1971].
‘Khandapaksha’ followers suggest that individual words are real entities with their own associated meanings [Joshi, 1968]. According to them, sentences derive their meaning as the sum of the meanings of their constituent words. Major present-day linguistic theories such as GB-theory [Chomsky, 1980], LFG [Bresnan, 1984] and GPSG [Gazdar et. al., 1985] are similar to this view. These views of sentence understanding, i.e. sentence interpretation, consider words as independent elements of spoken or written language. When certain thoughts are created in the mind, sentences are formed by arranging sets of words to represent those thoughts. These groups of words follow certain rules or criteria, and the sentence is then interpreted in terms of the meanings of its integral words. Sanskrit grammarians such as Panini, Kaatyaayana and Patanjali adopted this view for sentence analysis [Joshi, 1968].
The ‘Akhandapaksha’ view has its own difficulties. As explained in [Subrahmanyam, 1971], the Akhandapaksha is presented in the classical linguistic literature with a strong philosophical and metaphysical orientation. It is more concerned with the pragmatic and semantic aspects of language, but it ignores the syntactic aspects of a sentence altogether. By adopting the Akhandapaksha view, one cannot handle ‘wh’-type questions whose answers can be a single word. It is also unable to explain the inter-relationship between causative sentences and general sentences. Hence, this view is difficult to use when accuracy of interpretation is required.
In the next section, the various views on Vakya are discussed because they lay the foundation for the selection of a proper grammar for Text Summarization, in which sentence interpretation is a crucial task.
एकतिङ् वाक्यम् ।
Patanjali defines Shabda as an element that, when uttered, conveys the intent of the speaker. These grammarians do not consider a word a mere utterance; for them, a word is a meaning-conveying unit of a sentence. Bhartrhari, in his Sphota theory, treats the sentence itself as a single indivisible unit of meaning, the Vakya sphota [Coward et. al., 1990; Arjunwadkar, 2008].
In the task of Text Summarization, syntactically complete and sound sentences contribute to a knowledge-rich representation. Hence, the Vakya principle is discussed in detail and further used to contribute to summary generation.
These concepts were also considered salient in Panini's Ashtadhyayi [Joshi, 1968; Arjunwadkar, 2008]. The ‘Khandapaksha’ focuses on these concepts while analyzing sentences.
4.2.1 Aakanksha
The Mimansa School used the principle of Aakanksha to explain the constituent structure of a sentence and to elaborate the semantic interpretation from sentence structure [Jha, 1964; Sarkar, 2003]. With these interpretations, this theory tries to understand the Vakyabodha, i.e. the meaning of the sentence. To arrive at the correct meaningful sentence, Aakanksha is used along with ‘Yogyata’ (competency) and ‘Sannidhi’ (proximity). Mimansakas state that to achieve complete Shabdabodha, i.e. to get the proper sentence meaning, the words should be inter-related through the concepts of ‘Aakanksha’, ‘Yogyata’ and ‘Sannidhi’ [Sharma, 2004]. As the concept of Aakanksha is the most significant from the summarization point of view, it is dealt with first.
When a reader reads a sentence or a hearer hears one, they expect mutual relations among the words in order to interpret the sentence completely. A word is said to be in expectancy, i.e. in Aakanksha, for another word if, without the latter, it is unable to produce knowledge of their interconnected utterance [Sharma, 2004]. For example, a verb such as ‘watch’ has expectancy for an object: without the object to be seen, the word ‘watch’ cannot convey its full meaning. In short, Aakanksha declares that words are unable to convey the complete meaning of a sentence without the presence of the other words.
Consider the following sentence:

Shambhavi Samvitam pashyati.

Here, the verb Pashyati (sees) alone cannot convey the meaning. Similarly, the remaining two words cannot convey the meaning individually. However, their suitable combination can. A single word is unable to form a sentence because it cannot give the complete meaning of the sentence [Padasya (P 8.1.16); Shastri, 2008].
When the above three words come together in a sentence, they have no expectation of other words to complete the meaning. Some other words could be added to enrich the meaning of the sentence with extra information, but for the complete formation of the sentence the above three words are sufficient. On the other hand, a group of words like ‘elephant cat go stand’ is not complete, as there is no Aakanksha among its terms.
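The notion can be made concrete computationally. The sketch below is a rough, illustrative Aakanksha-style check, not a Mimansa-derived algorithm: a verb is treated as ‘expecting’ a nominal subject among its dependents, and verbs whose expectancy is unmet are flagged. spaCy is used purely as a convenient dependency parser, the en_core_web_sm model is assumed to be installed, and the English rendering of the example sentence is likewise an assumption.

```python
# A rough Aakanksha-style expectancy check (illustrative assumption,
# not a Mimansa-derived algorithm): every verb is expected to have a
# nominal subject among its dependents.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English dependency model

def unmet_expectancies(sentence):
    """Return verbs whose expected subject argument is missing."""
    doc = nlp(sentence)
    gaps = []
    for token in doc:
        if token.pos_ == "VERB":
            deps = {child.dep_ for child in token.children}
            if not deps & {"nsubj", "nsubjpass"}:
                gaps.append(token.text)
    return gaps

print(unmet_expectancies("Shambhavi sees Samvit."))  # expected: []
print(unmet_expectancies("elephant cat go stand"))   # parse unreliable here
```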
Aakanksha exists not only within a sentence but also across different sentences. The Mimansa School considers a sentence incomplete if, even after analysis, it remains in mutual expectancy with other sentences [Sarkar, 2003]. The Aakanksha principle can be addressed very well if the Vakya principle of Mimansa is combined with the ‘Prakarana’ principle (which was discussed in detail in the 4th chapter). When there is no requirement, expectation or Aakanksha of outside words, the sentences are complete in their meaning and can be treated as distinct or more important than other sentences.
Words do not intrinsically show Aakanksha (desire); they are said to have Aakanksha in a figurative sense. From this point it becomes evident that when tasks like opinion generation or Taatparya (gist or summary) generation are to be carried out, they have to be performed on a sentence or group of sentences through proper grammatical analysis. This brings syntactic completeness as well as pragmatic completeness [Searle, 1975] into the realm of Aakanksha.
4.2.2 Yogyata

Yogyata is the competency, i.e. the mutual semantic compatibility, of the words in a sentence: even when words stand in Aakanksha, the sentence is meaningful only if their senses do not contradict one another. As noted later in this chapter, Yogyata is closely related to the semantic and pragmatic levels of analysis.

4.2.3 Sannidhi
The word Sannidhi gives the notion of proximity. It is one of the characteristics of sequencing that has an impact on the process of deriving the sentence meaning. Even if a sentence satisfies Aakanksha and Yogyata, if its words are not in close proximity to each other, the sentence seems ill-formed. The characteristic principle of Sannidhi is explained by a traditional verse, according to which the presence of the padas, i.e. words, in a sentence in appropriate sequence and without unnecessary delay or undue gap of time is called Asatti or Sannidhi.
The Sannidhi principle concentrates on sentence length and ensures that the words are in close proximity to each other. When words are pronounced or written at long intervals, or separated by many intervening words, it becomes difficult to determine the interrelations among them. Most of the time, a relative order among words in close proximity conveys the context of the topic [Arjunwadkar, 2008].
Primarily, Sannidhi concerns the relationship among the words in a sequence; it also concerns the orderly utterance of words, which contributes to completing the sentence and obtaining its meaning. Alternatively, one can say that Sannidhi focuses on the syntax of the language. As Sanskrit is a relatively free word-order language, the concept of Sannidhi is not very substantial in Sanskrit [Sharma, 2004; Arjunwadkar, 2008]. In English, however, it plays a significant role because English is a fixed word-order language. The Sannidhi principle therefore calls for an analysis of English grammar formalisms; modeling Sannidhi for English is relatively straightforward and is discussed further in this chapter.
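As an illustration of how such proximity could be quantified for English, the following minimal sketch computes the average distance between each word and its head in a dependency parse; the measure is an assumption of this sketch, not a formula from the Mimansa literature.

```python
# Mean head-dependent distance as a crude Sannidhi (proximity) score:
# well-formed sentences tend to keep mutually expectant words close.
def mean_dependency_distance(heads):
    """heads[i] is the 1-based index of the head of word i+1 (0 = root)."""
    dists = [abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0]
    return sum(dists) / len(dists)

# 'She is singing even better': the other words hang off 'singing' (word 3).
print(mean_dependency_distance([3, 3, 0, 3, 3]))  # -> 1.5
```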
Thus ‘Aakanksha’, ‘Yogyata’ and ‘Sannidhi’ together contribute to identifying meaningful, complete sentences. Identification of such sentences is a crucial task for Text Summarization.
From the above discussion, it becomes quite evident that for the development of applications where syntax and semantics are equally important, like Text Summarization, the grammar selected for sentence analysis should properly address the issues of ‘Aakanksha’, ‘Yogyata’ and ‘Sannidhi’.
Few English corpora happen to be annotated with dependency structures. Nevertheless, dependency structures have been applied successfully to English [Rambow et. al., 2002; Hall, 2008].
The two approaches mentioned here, i.e. simple PSG and dependency structures, are not the only options available for annotating an English corpus. Other options, such as LFG, GPSG and HPSG-like complex phrase structure grammar models, are available. However, only phrase structure and Dependency Grammars are discussed here, because corpora can be annotated with these two models both manually and automatically. HPSG parsers exist, but not many corpora have been parsed with them. In addition, these parsers are not robust enough and do not have sufficiently wide coverage to serve as a basis for corpus annotation [Hall, 2008].
Since 1957, the work of Chomsky [Chomsky, 1959] on constituency-based representations has influenced the field of linguistics. As the field progressed, sophisticated models with advanced syntactic and semantic analysis, along with a diverse lexicon, were developed. These models then became a popular choice over the traditional constituency grammars. One such grammatical formalism is Dependency Grammar [Kruijff, 2002]. In this section, dependency structures and constituency structures are compared in order to decide the ideal choice for syntactic structure representation. Further, the dependency framework is elaborated in detail.
Consider the sentence ‘They killed the man with a gun.’ Figures 4.1 and 4.2 give the constituency-based phrase structure representation and the dependency representation for this sentence.
Figure 4.1 Phrase structure grammar representation
Figure 4.2 Dependency grammar representation
Phrase structure trees start with the highest constituent and analyze it into phrases such as noun phrase (NP), verb phrase (VP) and prepositional phrase (PP), giving a hierarchical phrase analysis. Dependency representations, on the other hand, concentrate on the words themselves and give the dependencies among the constituent words, labeled with categories such as verb (V), noun (N), preposition (P) and determiner (D) [Matthews, 1981].
Secondly, from figures 4.1 and 4.2 it can be observed that the parse tree given by the PSG (phrase structure grammar) contains 12 nodes, while the dependency parse tree contains only 7. This shows that, in terms of nodes and the corresponding paths to be traversed among them, dependency structures are minimal. This economy is achieved by the word dominance and hierarchy followed in the dependency framework. For example, in the sentence structure in figure 4.1, the preposition ‘with’ constitutes a prepositional phrase with the noun phrase ‘a gun’. In the dependency tree in figure 4.2, by contrast, the preposition dominates the noun ‘gun’, which in turn dominates the article ‘a’.
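The node counts can be checked with a small sketch. The exact shape of the PSG tree below is one plausible transcription of figure 4.1, assumed for illustration; the dependency side needs only one entry per word.

```python
# Node counts for the two representations of
# 'They killed the man with a gun'.
# PSG tree (a plausible transcription of figure 4.1): 12 nodes.
psg = ("S", [("NP", []),                               # They
             ("VP", [("V", []),                        # killed
                     ("NP", [("D", []), ("N", [])]),   # the man
                     ("PP", [("P", []),                 # with
                             ("NP", [("D", []), ("N", [])])])])])  # a gun

def count_nodes(tree):
    label, children = tree
    return 1 + sum(count_nodes(child) for child in children)

# Dependency tree: exactly one node per word -> 7 nodes.
dep_heads = {"They": "killed", "killed": None, "the": "man",
             "man": "killed", "with": "killed", "a": "gun", "gun": "with"}

print(count_nodes(psg), len(dep_heads))  # -> 12 7
```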
One more point in favor of Dependency Grammars is that they are closer to the way the human brain processes a sentence when a hearer or reader interprets its meaning. A person does not wait for the complete sentence to arrive; instead, as each word is heard, he or she tries to relate it to the words already processed. As the dependency tree in figure 4.2 shows, Dependency Grammar likewise does not wait for phrases to appear; it works at the word level and finds the relations between words [Hall, 2008]. A further important fact about dependency structures is that grammatical functions can be integrated more easily into the dependency framework [Miller, 1992]. With these advantages established, the subsequent sections discuss the actual dependency formalism.
The roots of Dependency Grammar are found in the early Paninian grammar. The representation developed over the intervening period, and this grammatical tradition finally culminated in the work of Tesniere in 1959, which became the basis of the modern Dependency Grammar formalisms [Melcuk, 2003]. Tesniere explains the basic relations in terms of heads, known as governors or regents, and their dependents.
These modern formalisms, which are syntactic in nature, represent the connections between lexical elements. These connections, known as dependencies, are binary and asymmetrical in nature. Miller [Miller, 1992] explains very well the various criteria for deciding these connections between heads and dependents. The connections between the head and the dependent are a good blend of syntacto-semantic properties. The following sections discuss the syntacto-semantic nature of the Dependency Grammar formalism and thereby show how it is close to the concepts of the Khandapaksha view.
In applications such as machine translation, this representation is often used as a part of preprocessing by various groups [Marneffe and Manning, 2008]. The Stanford typed dependency structures represent the grammatical relationships among the words in a very simple manner. The representation can be easily understood and used effectively for extracting textual relations with little linguistic skill. The Stanford representation gives the inter-relationships among the words of a sentence in a uniform manner, with dependency relations typed on the edges.
There are various kinds of dependency relations given by the Stanford parsers, as described in the Stanford typed dependencies manual [Marneffe and Manning, 2008]. The basic typed dependency framework applies the dependencies to the tree nodes of a sentence, based on the dependencies defined in the second section of the manual. In the basic dependencies there are no crossing dependencies; such a structure is also known as a projective dependency structure. Except for the head, or root, every word in a sentence depends on exactly one other word in the sentence.
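Typed dependencies of this kind can also be obtained programmatically. The sketch below uses stanza, the Stanford NLP group's Python library, as a convenient stand-in for the Java parser discussed in the text; the library and its English models are assumed to be installed, and the printed output is indicative only.

```python
# Printing Stanford-style typed dependencies with stanza.
import stanza

nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")
doc = nlp("Shambhavi walked.")
for sent in doc.sentences:
    for word in sent.words:
        # word.head == 0 marks the root of the projective tree
        head = sent.words[word.head - 1].text if word.head > 0 else "ROOT"
        print(f"{word.deprel}({head}-{word.head}, {word.text}-{word.id})")
# e.g. nsubj(walked-2, Shambhavi-1), root(ROOT-0, walked-2), ...
```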
Initially, the Stanford dependency parser parses the given input sentences and identifies the POS (part-of-speech) tag for every word. Various POS taggers are available for processing text [Asmussen, 2014]. Among these, an open, configurable POS tagger is considered more feasible. For an open-source POS tagger, the tagger software and its knowledge base are both freely available and configurable. Open-source licensing and free availability are major contributors to achieving satisfactory precision. Along with these basic requirements, a tagger should also satisfy certain other requirements, listed below, in order to give satisfactory performance. It should be developed in a platform-independent, widely used programming language, with a web-based solution. One should be able to execute the tagger on a standalone PC, so that a large amount of text can be processed quickly without any remote service. Moreover, the tagger should be maintained and documented by a community rather than relying on a single developer.
Most of the taggers available for English are thoroughly listed on Stanford University's NLP site. A study of these POS taggers shows the Stanford POS tagger to be the most suitable for our application. The Stanford POS tagger is freely available for several languages, such as English and Chinese. An added advantage of this framework is that it is language-independent, i.e. it can be trained on any dataset of annotated text. Furthermore, it is well documented and maintained, and all of its revised versions are available on the website. All these factors make this POS tagger an ideal choice for our task [Marneffe and Manning, 2008].
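As a quick illustration of the tagging step, the sketch below uses NLTK's built-in Penn Treebank tagger as a freely available stand-in for the Stanford POS tagger; the example output is indicative only.

```python
# POS tagging a sentence (NLTK's perceptron tagger as a stand-in).
import nltk
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("Yashodhan sells the petroleum products.")
print(nltk.pos_tag(tokens))
# e.g. [('Yashodhan', 'NNP'), ('sells', 'VBZ'), ('the', 'DT'),
#       ('petroleum', 'NN'), ('products', 'NNS'), ('.', '.')]
```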
Once the POS tags are assigned, the Stanford parser further analyzes the sentence grammatically and establishes connections between governor (head) words and their dependents. The dependents are the words modifying the governor; the main verb typically serves as the root of a sentence. This kind of parser executes a linear-time search along the words of the given sentence and constructs a parse tree. The subsequent section discusses the various denotations used by the dependency parser in this process.
4.5 Denotations
In this section, a generic explanation of the denotations used by Dependency Grammar is given. Basically, there are three different linguistic categories with which the complete dependency parse of a sentence is shown [Otero, 2008]. These are:
• Lexical words: The lexical categories such as noun, verb and adjective are considered sets of properties which are semantically compatible with one another. The constituent words, in the form in which they appear in sentences, are known as lexical words, e.g. ‘went’, ‘cut’, ‘goes’, ‘Shambhavi’.
• Syntactic dependencies: The lexical words are related to each other by binary syntactic dependencies such as ‘nsubj’ (nominal subject), ‘agent’, ‘dobj’ (direct object) and ‘iobj’ (indirect object). A binary dependency relation accepts two lexical elements as arguments and gives a finer, more restricted relation between them. Consider the binary dependency ‘nsubj’; two λ-expressions, λx and λy, are associated with it (a toy encoding of this is sketched after this list). Take the sentence ‘Shambhavi walked’ and the ‘nsubj’ dependency for it. ‘Shambhavi’ and ‘walked’ are the denotations of the lexical elements ‘Shambhavi’ and ‘walk’ respectively. The binary relation between the two is given as nsubj(walk, Shambhavi); this is the restricted representation between the lexical elements ‘Shambhavi’ and ‘walked’. The Dependency Grammar representation assigns ‘walked’ the head role and ‘Shambhavi’ the dependent role. The properties of the lexical words are not elaborated further here, because our aim is to focus on the combinational operations involved in a dependency.
• Lexico-syntactic associations: The third category is the combination of lexical words and their associated dependencies. These lexico-syntactic dependencies are patterns consisting of a POS tag and a dependency along with a lexical word. Lexical words denote sets of words, while lexico-syntactic associations give a specific operation carried out on any two words from those sets [Otero, 2008]. Thus, the ‘nsubj’ dependency can be represented as ‘noun + subject + verb’. Such a pattern, which is more specific in terms of lexical elements and dependencies, is known as a lexico-syntactic pattern.
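The following toy sketch renders a binary dependency as a curried two-argument operation, in the spirit of the λ-expressions mentioned above; the encoding is this sketch's own, not a standard formalism.

```python
# A binary dependency as a curried operation: λx.λy. name(x, y).
def dependency(name):
    return lambda head: lambda dep: (name, head, dep)

nsubj = dependency("nsubj")
print(nsubj("walk")("Shambhavi"))   # -> ('nsubj', 'walk', 'Shambhavi')
```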
With these concepts, a partial parse is maintained at every step in a stack, which holds all the words currently being handled. In parallel, a buffer holds the words yet to be processed. Transitions are applied until the buffer is empty, completing the dependency graph. Initially, all the words are in the buffer and the stack holds only the root node. Then the left and right transitions are performed. LEFT-ARC marks the second item on the stack as a dependent of the top item and removes the second item. RIGHT-ARC marks the top item as a dependent of the second item and removes the top item. The SHIFT operation removes a word from the buffer and pushes it onto the stack. Any projective dependency structure can be generated by these three types of transitions. For every typed dependency, the respective transition also specifies the type of relationship existing between the head and the dependent being described. Figure (4.3) shows the dependency parse of a short sentence, ‘She is singing even better’. The Stanford dependency graphs shown in the following section were drawn using the coreNLP linguistic analysis tool available at http://nlp.stanford.edu:8080/corenlp/process.
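The transition system can be made concrete with a minimal arc-standard sketch. The hand-written transition sequence below for ‘She is singing even better’ is one possibility chosen for illustration; a real parser selects transitions with a trained classifier.

```python
# A minimal arc-standard transition parser (illustrative sketch).
def parse(words, transitions):
    """Apply SHIFT / LEFT-ARC / RIGHT-ARC and collect (head, dep) arcs."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action in transitions:
        if action == "SHIFT":        # move the next buffer word to the stack
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":   # second stack item depends on the top
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT-ARC":  # top stack item depends on the second
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
    return arcs

words = ["She", "is", "singing", "even", "better"]
seq = ["SHIFT", "SHIFT", "SHIFT",    # She, is, singing on the stack
       "LEFT-ARC", "LEFT-ARC",       # is <- singing, She <- singing
       "SHIFT", "RIGHT-ARC",         # even <- singing
       "SHIFT", "RIGHT-ARC",         # better <- singing
       "RIGHT-ARC"]                  # singing <- ROOT
print(parse(words, seq))
```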
Figure 4.3 Stanford parse of a small sentence
In figure (4.3) above, ‘better’ modifies the verb ‘singing’, which is at the centre of the sentence's dependency parse. The word ‘she’ is the nominal subject of the action described by the verb ‘sing’. The words ‘even’ and ‘better’ are labeled as adverbial modifiers of the verb ‘sing’. Such dependencies give the relation of one word to another. Similarly, consider the sentence ‘Yashodhan, based in Alandi, buys and sells the petroleum products’. For this sentence, the Stanford Dependencies (SD) representation is shown in figure (4.4) below:
Figure 4.4 Stanford Dependencies for ‘Yashodhan, based in Alandi, buys and sells the petroleum products.’
In this example too, the verb ‘buys’ is at the centre of the parse, shown by ‘root(ROOT-0, buys-7)’. The word ‘Yashodhan’ bears the ‘nsubj’ dependency, meaning that it is the nominal subject of the action described by the verb ‘buys’. Similarly, the word ‘products’ is the object of the verb ‘buys’. In this way, the dependency relations for each word are made available through Dependency Grammar [Marneffe and Manning, 2008]. Thus, the Stanford Dependency Grammar formalism is able to give the POS tags as well as the relations among the tagged words.
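The relations quoted above can be processed directly as triples. The sketch below uses only the relations named in the text; the token index of ‘products’ is inferred from the word order, and the full parse would contain further relations.

```python
# Reading the typed-dependency triples of the 'Yashodhan' sentence.
deps = [("root",  ("ROOT", 0), ("buys", 7)),
        ("nsubj", ("buys", 7), ("Yashodhan", 1)),
        ("dobj",  ("buys", 7), ("products", 12))]   # index 12 inferred

root = next(dep for rel, head, dep in deps if rel == "root")
args = {rel: dep[0] for rel, head, dep in deps if head == ("buys", 7)}
print(root)   # ('buys', 7) -- the verb at the centre of the parse
print(args)   # {'nsubj': 'Yashodhan', 'dobj': 'products'}
```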
As shown in figures (4.1) and (4.2), in a constituent phrase structure the root node of the tree is the starting grammatical non-terminal construct, which is generally the entire sentence. In Dependency Grammar, by contrast, the verb of the sentence is at the root node of the analysis. This verb centrality will be especially useful for Text Summarization, as discussed further in the sixth and seventh chapters, where the use of Karakas for Text Summarization is presented. With this verb centrality, the notion of Dependency Grammar is closer to the views of Khandapaksha.
These syntactic trees and the associated syntactic functions are close to the terminology of ‘Aakanksha’, ‘Sannidhi’ and ‘Yogyata’. The dependency graph not only gives information about which words are mutually dependent, but also about how they depend on each other: the dependencies are labeled, and the labels carry the relation information.
When Dependency Grammar finds the dependencies between head and dependent words, along with their associated POS tags, it comes close to the concept of Aakanksha. As discussed in section (4.2.1), with the notion of Aakanksha the Vakya principle forms the interconnections among the words by considering the verb as the central element. Only through these interconnections is the meaning of the sentence obtained. Aakanksha declares that words are unable to convey the complete meaning of a sentence without the presence of the other words [Sharma, 2004].
As discussed in section (4.2.3), Sannidhi focuses on the utterance, or presence, of words, which contributes to completing the sentence and obtaining its meaning. Alternatively, we can say that Sannidhi concerns the syntax of the language [Sharma, 2004; Arjunwadkar, 2008]. With this view, if the placement of POS tags is used as a feature for analyzing sentence structure, then it is possible to incorporate the notion of Sannidhi for English text.
In the proposed work, Dependency Grammar is used in two parts. In the first part, while performing surface-level analysis, we combine a statistical, Mimansa-feature-based study with Dependency Grammar based syntactic POS tagging; with this, we incorporate the Sannidhi principle of Vakya. In this approach, we first extract the relevant Mimansa features for every sentence, and then obtain the POS tag of every word in the sentence using the Stanford POS tagger. Further, the placement of the words according to their POS tags, and the repetition of particular POS tags, are obtained for all the sentences in the given document. The sequential placement and repetition of POS tags enable the system to observe how the words are placed in a sentence and to observe the proximity among them. The placement and repetition of POS tags differ between well-formed and ill-formed sentences. To make the system understand such structures, a neural network is used. The network is trained on the placement sequence and frequency of the important POS tags in a sentence, so that the system can learn the structure of important sentences in terms of the POS tags of their constituent words.
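A minimal sketch of this feature pipeline is given below. NLTK's tagger stands in for the Stanford tagger and scikit-learn's MLPClassifier stands in for the neural network; the tag set, padding length and toy training data are assumptions of the sketch, not the final configuration of the proposed system.

```python
# POS placement (sequence) and repetition (frequency) features
# feeding a small neural network. Assumes NLTK tagger data is installed.
from collections import Counter
import nltk
from sklearn.neural_network import MLPClassifier

TAGS = ["NN", "NNS", "NNP", "VB", "VBZ", "VBD", "JJ", "IN", "DT", "PRP"]
MAXLEN = 20   # assumed maximum sentence length for the placement vector

def features(sentence):
    tags = [t for _, t in nltk.pos_tag(nltk.word_tokenize(sentence))]
    placement = [TAGS.index(t) + 1 if t in TAGS else 0 for t in tags[:MAXLEN]]
    placement += [0] * (MAXLEN - len(placement))   # pad to a fixed length
    counts = Counter(tags)
    repetition = [counts[t] for t in TAGS]         # POS-tag repetition
    return placement + repetition

# toy data: 1 = well-formed/important, 0 = ill-formed
X = [features("Shambhavi sees Samvit."), features("elephant cat go stand")]
y = [1, 0]
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)
print(clf.predict([features("Yashodhan sells petroleum products.")]))
```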
In the second part, the concept of Aakanksha, i.e. the dependency relations among words together with their associated labels, is used to incorporate the syntacto-semantic nature of the Sanskrit language through the concept of Karaka. A detailed discussion of this is given in the sixth and seventh chapters.
Our aim in the proposed approach is to develop an efficient hybrid Text Summarization system that is rich in syntacto-semantic concepts. We do not intend to incorporate the pragmatic level of language analysis in the proposed model. Hence, as the concept of Yogyata is closely related to pragmatic analysis, it has not been incorporated in the proposed model. It could be studied or incorporated as part of applications involving discourse analysis, word sense disambiguation, etc.
4.7 Conclusion
From the above discussion, it becomes evident that ‘syntactic dependency’ is an important dependency formalism whose views are close to those of the Vakya principle of ‘Purva Mimansa’, which is intended to be used in the proposed framework. In dependency structures there is a one-to-one association among the words of a sentence, i.e. for every word in a sentence there is exactly one node in the dependency tree. Thus, in these word grammars, each word has an associated dependency, i.e. a relation that connects it with another word or element in the structure. Compared with constituency structures, where there is a one-to-many relationship among sentence elements, dependency structures are minimal from the linguistic-analysis point of view and rich in expressing the relationships among words. With its associated framework, the grammatical formalism of Dependency Grammar comes closer to the concepts of ‘Aakanksha’, ‘Sannidhi’ and ‘Yogyata’. Therefore, for the task of summarization, where sentence-interpretation characteristics like ‘Aakanksha’, ‘Sannidhi’ and ‘Yogyata’ are essentially required, the dependency formalism becomes an ideal choice. With all these considerations, we choose the Stanford Dependency framework, a syntactic Dependency Grammar structure, as the tool for linguistic analysis in carrying out the task of identifying important sentences for Text Summarization.