Yogyata Sannidhi Aakaanksha


Chapter 4

Vakya

A meaningful sentence is an integral entity of any textual document.

If a sentence is not correctly constructed, its meaning can be distorted, leading the hearer or reader to interpret it wrongly. A sentence is not merely a group of words. It is a broader concept that requires meaningful words, such as the subject, object and action, to be arranged systematically so that the sentence is complete. The concept of the sentence (sentential syntax) is one of the most important problems of logic and language, and it is governed by the rules of grammar. The grammar of a language rests on its own logical background, and on this logical basis diverse philosophical theories about the meaning of sentences developed in the Indian tradition [Kunjunni, 1963]. In linguistic study, a sentence can be approached in various ways that bring its analysis closer to the way humans interpret or generate language. Cognitive scientists have proposed that Natural Language Processing is cognitively sounder if Sanskrit is used as its logical base [Briggs, 1985; Saxena and Agrwal, 2013]. If this property of Sanskrit is used in Text Summarization, it can lead to summaries that are cognitively closer to human ones.

The ‘Vakya’ is one of the principles of Purva Mimansa. It deals with the syntactic completeness of a sentence. Along with the other principles discussed in the earlier chapter, this principle is needed to judge the importance of a sentence for inclusion in the summary. In this chapter, the concept of Vakya is elaborated in detail from the point of view of its importance in the summary. The subsequent sections discuss how classical Sanskrit theories view the concept of Vakya, which in turn is the most important element of Text Summarization.

4.1 Two Approaches to the Study of Language

Laugakshi Bhaskara [Sarkar, 2003] comments on ‘Vakya’ as:

समभिव्यवहारम् वाक्यम् ।

This means that uttering the component words, such as the subject, objects and verb, together in a systematic way forms a Vakya, i.e. a sentence. Building on this concept, classical Sanskrit theories present two different views of the linguistic phenomenon of Vakya. The first is ‘Akhandapaksha’, a holistic view. The second is analytical and is known as ‘Khandapaksha’. The two approaches are complementary: although they differ considerably, each brings out interesting features of language. Followers of ‘Akhandapaksha’ believe that words have no meaning-bearing capability by themselves; they acquire it only in the context of a sentence. Ancient Sanskrit grammarians such as Bhartrihari and Audumbarayana followed this view [Subrahmanyam, 1971].

Followers of ‘Khandapaksha’ hold that individual words are real entities with their own associated meanings [Joshi, 1968]. According to them, a sentence derives its meaning as the sum of the meanings of its constituent words. Major present-day linguistic theories such as GB theory [Chomsky, 1980], LFG [Bresnan, 1984] and GPSG [Gazdar et al., 1985] are similar to this view. This view of sentence interpretation treats words as independent elements of spoken or written language. When a thought arises in the mind, a sentence is formed to represent it by arranging a set of words. These words follow certain rules, and the sentence is then interpreted in terms of the meanings of its constituent words. Sanskrit grammarians such as Panini, Kaatyaayana and Patanjali adopted this view for sentence analysis [Joshi, 1968].

The ‘Akhandapaksha’ view has its own difficulties. As [Subrahmanyam, 1971] points out, Akhandapaksha is presented in classical linguistic literature with a strongly philosophical and metaphysical orientation. It is concerned mainly with the pragmatic and semantic aspects of language and ignores the syntactic aspects of a sentence altogether. Under the Akhandapaksha view, one cannot handle ‘wh’ questions whose answers can be a single word. Nor can it explain the relationship between a causative sentence and the corresponding general sentence. This makes the view difficult to use when accurate interpretation is required.

The next section discusses and lists the various views on Vakya, because they lay the foundation for selecting a proper grammar for Text Summarization, in which sentence interpretation is a crucial task.

4.1.1 Views on ‘Vakya’

Sage Katyayana defined a sentence as a collection of words that contains at least one finite verb as an integral element. The Sutra (aphorism) that defines this characteristic is:

एकतिङ् वाक्यम् ।

It means that at least one finite verb is mandatory for a sentence to be complete [Coward et al., 1990; Bhattacharyya, 1998]. If a single word can count as a Vakya under this view, then ‘Likhati’ (writes) should be considered a complete sentence. For a reader, however, ‘Likhati’ alone does not convey a complete sentence, whereas the group of words ‘Shambhavi Likhati’ clearly makes sense. Consider another example:
Pasya, Shambhavi Likhati,
(Look, Shambhavi writes).
Here the sentence has two verbs, yet it does not form two different sentences. Thus Eka Vakyata, i.e. single sentence-ness, depends on the utterance as a whole to which attention is given. The primary issue here is the relation between the overall Artha (meaning) of the sentence and its constituent linguistic elements (‘Shabda’), i.e. words.

Patanjali defines Shabda as an element that, when uttered, conveys the speaker's point. These grammarians do not consider a word a mere utterance; for them a word is a meaning-conveying unit of a sentence. Bhartrihari, in his Sphota theory, describes the Vakya sphota as follows [Coward et al., 1990; Arjunwadkar 2008]:

‘A complete utterance of the sentence is the unit of language’

A sentence that is apparently complete (self-contained) is called a Vakya. The Mimansakas state that when the meaning of a word or a collection of words is obtained from the whole sentence, that whole is called the Vakya [Sandal, 1980: xix]. Similarly, a sentence that is sound, i.e. complete in its syntactic arguments, is known as a Vakya [Jha, 1964]. A complete sentence is capable of conveying a satisfactory sense. According to the ‘Sloka Vartika’, one can derive a special sense from a sentence by examining its structure [Sarkar, 2003].

In Text Summarization, syntactically complete and sound sentences can contribute to a knowledge-rich representation. Hence the Vakya principle is discussed in detail here and later used in summary generation.

4.2 Analytical View

The Khandapaksha, the analytical perspective, is closer to the views of the predominant computational linguistic theories. Khandapaksha concentrates on words and their meanings. It expects words to satisfy certain proximity constraints while allowing a relatively free word order. To achieve this, sentence analysis proceeds from the surface, i.e. the utterance level, down to the root, i.e. the individual word. The Khandapaksha view, which the Mimansakas follow in their Vakya principle, does not stop at finding the stem but goes on to find the inter-relations among the words. The three important factors used to find these mutual relations are:

1. ‘Aakanksha’
2. ‘Yogyata’
3. ‘Sannidhi’

These concepts were also considered salient in Panini's Ashtadhyayi [Joshi, 1968; Arjunwadkar 2008], and the ‘Khandapaksha’ focuses on them while analyzing sentences.

The Mimansa School used the principle of Aakanksha to explain the constituent structure of a sentence and to derive its semantic interpretation from that structure [Jha, 1964; Sarkar, 2003]. Through these interpretations, the theory seeks to establish the Vakyabodha, i.e. the meaning of the sentence. To arrive at a correct, meaningful sentence, Aakanksha is used along with ‘Yogyata’ (competency) and ‘Sannidhi’ (proximity). The Mimansakas state that to achieve complete Shabdabodha, i.e. the proper sentence meaning, the words must be inter-related through ‘Aakanksha’, ‘Yogyata’ and ‘Sannidhi’ [Sharma, 2004]. Since Aakanksha is the most significant of the three from the summarization point of view, it is dealt with first.

4.2.1 Aakanksha (Expectancy)

When a reader reads or a hearer hears a sentence, they look for the mutual relations among the words in order to interpret the sentence completely. A word is said to be in expectancy, i.e. in Aakanksha, for another word if, without the latter, it cannot produce knowledge of their interconnected utterance [Sharma, 2004]. For example, a verb such as ‘watch’ has expectancy for an object: without the thing to be seen, ‘watch’ cannot convey its full meaning. In short, Aakanksha states that words cannot convey the complete meaning of a sentence without the presence of other words. Consider the following sentence:

‘Shambhavi Samvitam pashyati’

‘Shambhavi sees Samvit.’

Here the verb Pashyati (sees) alone cannot convey the meaning. Similarly, neither of the other two words can convey the meaning individually, but a suitable combination of the three can. A single word cannot form a sentence because it cannot give the complete meaning of the sentence [Padasya (P 8.1.16); Shastri, 2008].

Shambhavi Samvitam pashyati.

When these three words occur together in a sentence, they have no expectancy for any further word to complete the meaning. Other words can be added to enrich the sentence with extra information, but these three words alone are sufficient for a complete sentence. On the other hand, a group of words like ‘elephant cat go stand’ is not a complete sentence, because there is no Aakanksha among its terms.
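The notion of expectancy can be illustrated with a small Python sketch. It assumes a hypothetical, hand-written lexicon (`LEXICON`, `Entry`) in which each word records the role it can fill and the roles it expects; a word group counts as complete when it contains a verb and every expectancy is met. This only illustrates the idea under those assumptions; it is not a model of Sanskrit morphology or of the method used in this work.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Entry:
    pos: str                 # coarse part of speech ("noun" or "verb")
    fills: Optional[str]     # role this word can satisfy for a verb, if any
    expects: FrozenSet[str]  # roles this word still needs: its Aakanksha


# Hypothetical toy lexicon for the example sentence; not a real resource.
LEXICON = {
    "shambhavi": Entry("noun", "agent", frozenset()),    # nominative form
    "samvitam": Entry("noun", "object", frozenset()),    # accusative form
    "pashyati": Entry("verb", None, frozenset({"agent", "object"})),  # 'sees'
}


def open_expectancies(words):
    """Roles that some word in the group expects but no word supplies."""
    entries = [LEXICON[w.lower()] for w in words]
    demanded = set().union(*(e.expects for e in entries))
    supplied = {e.fills for e in entries if e.fills is not None}
    return demanded - supplied


def is_complete(words):
    """Complete if a verb is present and no expectancy is left unsatisfied."""
    has_verb = any(LEXICON[w.lower()].pos == "verb" for w in words)
    return has_verb and not open_expectancies(words)


print(open_expectancies(["Pashyati"]))   # both 'agent' and 'object' are open
print(is_complete(["Shambhavi", "Samvitam", "pashyati"]))   # True
```

A lone ‘Pashyati’ leaves both roles open, and ‘Shambhavi Samvitam’ without the verb fails the verb requirement, mirroring the two points made in the text.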

Utthita-Aakanksha, a part of the concept of Aakanksha, means ‘aroused or potential expectancy’. The Advaitic School of philosophy used this concept. It holds that, besides the Aakanksha between words that expect each other to complete their meaning, there is also a potential expectancy for other supporting words. For instance, consider the sentence ‘Go to the market’. Which market one has to go to is not stated explicitly; the market could be qualified as a vegetable market, fruit market, timber market and so on. Although such words are not necessary to complete the meaning, they are useful during sentence analysis. In this case the word ‘market’ is said to have Utthita Aakanksha for its adjectives. Similarly, there can be Aakanksha for adverbs or other adjuncts [Joshi, 1968].

Aakanksha exists not only within a sentence but also across sentences. The Mimansa School considers a sentence incomplete if, even after analysis, it remains in mutual expectancy with another sentence [Sarkar, 2003]. The Aakanksha principle can be addressed well if the Vakya principle of Mimansa is combined with the ‘Prakarana Principle’, discussed in detail in an earlier chapter. When a sentence has no requirement, expectation or Aakanksha for words outside itself, it is complete in its meaning and can be treated as distinct from, and more important than, the other sentences.
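Inter-sentence Aakanksha could be approximated, purely as an assumption, by checking whether a sentence opens with a word that points back to an earlier sentence. The cue list `BACKWARD_CUES` and the functions below are hypothetical; the sketch only shows how sentences free of outside expectancy might be preferred when candidate sentences for a summary are ordered.

```python
import re

# Hypothetical cue words: when a sentence opens with one of these, it usually
# points back to an earlier sentence for its meaning (a rough proxy only).
BACKWARD_CUES = {
    "this", "that", "these", "those", "it", "he", "she", "they",
    "such", "however", "therefore", "thus", "also",
}


def expects_outside_context(sentence: str) -> bool:
    """Heuristic stand-in for inter-sentence Aakanksha."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return bool(tokens) and tokens[0].lower() in BACKWARD_CUES


def self_contained_first(sentences):
    """Stable reordering: sentences with no outside expectancy come first."""
    # False sorts before True, and sorted() keeps ties in original order.
    return sorted(sentences, key=expects_outside_context)


doc = [
    "However, the demand rose.",
    "Yashodhan sells petroleum products.",
    "It is based in Alandi.",
]
print(self_contained_first(doc))
# ['Yashodhan sells petroleum products.', 'However, the demand rose.', 'It is based in Alandi.']
```

A cue list is deliberately crude; a fuller treatment would resolve anaphora and discourse relations rather than inspect only the first word.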

Words do not intrinsically show Aakanksha (desire); they are said to have Aakanksha in a figurative (representative) sense. It follows that tasks such as opinion generation or Taatparya (gist or summary) generation must be carried out on a sentence or group of sentences through proper grammatical analysis. This brings both syntactic completeness and pragmatic completeness [Searle, 1975] within the realm of Aakanksha.

4.2.2 Yogyata

Khandapaksha further enriches the analytical view with the concept of Yogyata, which asks whether the words used in a sentence are mutually relevant enough to carry a full sense. Yogyata ensures the logical compatibility of the words in a sentence with each other. It is evaluative in nature: it judges the meaning of the sentence on the basis of the compatibility and consistency among the words. Consider the sentence ‘the table wrote a good letter’: a table is not logically compatible with writing, so the sentence is rejected as ill-formed or meaningless. Similarly, a combination like ‘triangular circle’ is void by definition, illogical and unreal. Although such combinations can form grammatically proper sentences, they do not yield valid knowledge. Such inconceivable associations of words make sentences nonsensical [Sharma, 2004; Arjunwadkar 2008]. From this overview, the concept of Yogyata is closer to the pragmatic analysis of a sentence than to a syntactic or semantic view.
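Yogyata can be sketched as a check of selectional restrictions. The feature table `FEATURES` and the restriction table `RESTRICTIONS` below are hypothetical; under these assumptions ‘the table wrote a good letter’ is flagged because ‘table’ lacks the feature that the agent of ‘write’ requires.

```python
# Hypothetical semantic features for a few nouns (illustration only).
FEATURES = {
    "shambhavi": {"animate", "human"},
    "table": {"physical"},
    "letter": {"physical", "document"},
}

# Hypothetical selectional restrictions: the feature each slot requires.
RESTRICTIONS = {
    "write": {"agent": "human", "object": "document"},
}


def yogyata_violations(verb, args):
    """Return (role, filler) pairs whose filler lacks the required feature."""
    required = RESTRICTIONS.get(verb, {})
    violations = []
    for role, filler in args.items():
        need = required.get(role)
        if need is not None and need not in FEATURES.get(filler, set()):
            violations.append((role, filler))
    return violations


# 'the table wrote a good letter'
print(yogyata_violations("write", {"agent": "table", "object": "letter"}))
# [('agent', 'table')]
```

A verb with no recorded restrictions accepts any filler here, so the check only rejects combinations it has explicit knowledge about.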

4.2.3 Sannidhi (Proximity)

The word Sannidhi conveys the notion of proximity. It is a property of word sequencing that affects how the sentence meaning is derived. Even if a sentence satisfies Aakanksha and Yogyata, it seems ill-formed if its words are not in close proximity to each other. The principle of Sannidhi is explained by the following verse, which states that the presence of the padas, i.e. words, of a sentence in an appropriate sequence, without unnecessary delay or undue gap of time, is called Asatti or Sannidhi.

आसत्तिश्चाव्यवधानेन पदजन्यपदार्थोपस्थितिः। [Dharmaraja Adhwarindra, 1963]

The Sannidhi principle concentrates on sentence length and ensures that the words are in close proximity to each other. When words are pronounced at long intervals, or written with many intervening words, it becomes difficult to determine the interrelations among them. Most of the time, the context of the topic is conveyed by following a relative order among words that are close to each other [Arjunwadkar, 2008].
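Sannidhi can be approximated by measuring the linear distance between each word and its head. The head-list input format and the threshold `max_gap=3` are assumptions made for illustration, not values taken from this work.

```python
def sannidhi_gaps(heads, max_gap=3):
    """Flag head-dependent pairs that lie far apart in the word sequence.

    heads[i] is the 1-based position of the head of word i+1 (0 = ROOT).
    Returns (dependent, head, distance) for every arc longer than max_gap.
    """
    far = []
    for dep, head in enumerate(heads, start=1):
        if head == 0:
            continue  # the root word has no head inside the sentence
        distance = abs(dep - head)
        if distance > max_gap:
            far.append((dep, head, distance))
    return far


# 'Shambhavi sees Samvit': both nouns attach to the verb at position 2.
print(sannidhi_gaps([2, 0, 2]))         # []
# An artificial head list in which word 1 is four words away from its head.
print(sannidhi_gaps([5, 5, 5, 5, 0]))   # [(1, 5, 4)]
```

For a fixed-order language such as English, long arcs of this kind are one simple signal that related words have been separated.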

Primarily, Sannidhi governs the relationship within the word sequence; it also concerns the uninterrupted utterance of the words that together complete the sentence and yield its meaning. In other words, Sannidhi focuses on the syntax of the language. Because Sanskrit has a relatively free word order, the concept of Sannidhi is not very significant in Sanskrit [Sharma 2004; Arjunwadkar, 2008]. In English, however, it plays a significant role, because English is a fixed-order language. Applying the Sannidhi principle to English therefore requires a deeper analysis of English grammar formalisms; with such a formalism in place, modeling Sannidhi for English is straightforward, as discussed later in this chapter. Thus ‘Aakanksha’, ‘Yogyata’ and ‘Sannidhi’ together help identify meaningful, complete sentences, and identifying such sentences is a crucial task in Text Summarization.

From the above discussion, it is evident that for applications in which syntax and semantics are equally important, such as Text Summarization, the grammar selected for sentence analysis should properly address ‘Aakanksha’, ‘Yogyata’ and ‘Sannidhi’.

4.3 Vakya and Dependency Grammar

As discussed earlier, to develop Text Summarization we seek a grammar that is close to the concepts of ‘Aakanksha’, ‘Yogyata’ and ‘Sannidhi’. For English, there are two major types of syntactic annotation, i.e. grammar formalisms: phrase structure and dependency representation. Generally, a Phrase Structure Grammar (PSG) is applied to fixed-order languages with clear constituency structures, like English. Dependency structure, on the other hand, is closer to syntacto-semantic roles and is more suitable for languages with greater freedom of word order. This does not mean, however, that fixed-order languages like English should not be annotated with dependency structures; dependency structures have in fact been applied successfully to English [Rambow et al., 2002; Hall, 2008].

The two approaches mentioned here, simple PSG and dependency structures, are not the only options for annotating an English corpus. Other options include LFG and more complex phrase structure grammar models such as GPSG and HPSG. However, only phrase structure and Dependency Grammars are discussed here, because corpora can be annotated with these two models both manually and automatically. HPSG parsers exist, but few corpora have been parsed with them. In addition, these parsers are not robust enough and lack the wide coverage needed to serve as a basis for corpus annotation [Hall, 2008].

4.3.1 Dependency versus Constituency

Since 1957, Chomsky's constituency-based representations [Chomsky, 1959] have influenced the field of linguistics. As the field progressed, more sophisticated models with advanced syntactic and semantic analysis and richer lexicons were developed, and these became popular alternatives to the traditional constituency grammars. One such grammatical formalism is Dependency Grammar [Kruijff, 2002]. In this section, dependency and constituency structures are compared to decide which is the better choice for representing syntactic structure, and the dependency framework is then elaborated in detail.

Consider the sentence ‘They killed the man with the gun.’ Figures (4.1) and (4.2) give its constituency-based phrase structure representation and its dependency representation respectively.

Figure 4.1 Phrase structure grammar representations

Figure 4.2 Dependency Grammar representation

Phrase structure trees start with the highest constituent and analyze it into phrases such as the Noun Phrase (NP), Verb Phrase (VP) and Prepositional Phrase (PP), giving a hierarchical phrase analysis. Dependency representations, on the other hand, concentrate on the words themselves rather than on phrases, and give the dependencies among the constituent words, such as verb (V), noun (N), preposition (P) and determiner (D) [Matthews, 1981].

Secondly, from figures (4.1) and (4.2) it can be observed that the parse tree given by the PSG (phrase structure grammar) contains 12 nodes, while the dependency parse tree contains only 7 nodes. Hence, in terms of nodes and of the paths to be traversed among them, dependency structures are minimal. This economy comes from the word dominance and hierarchy followed in the dependency framework. For example, in sentence structure (a) in figure (4.1), the preposition ‘with’ forms a prepositional phrase together with the noun phrase ‘the gun’. In the dependency tree (b) in figure (4.2), by contrast, the preposition dominates the noun ‘gun’, which in turn dominates the determiner ‘the’.
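The node counts can be reproduced with a short sketch. The bracketing below is an assumption about how figure (4.1) is drawn (the PP attached to the VP and ‘They’ placed directly under an NP); under that assumption the phrase structure tree has 12 labeled nodes, while the dependency tree has one node per word.

```python
# Phrase-structure tree as nested tuples (label, child, ...); words are strings.
# The bracketing (PP attached to the VP, 'They' directly under NP) is assumed.
PSG = ("S",
       ("NP", "They"),
       ("VP",
        ("V", "killed"),
        ("NP", ("D", "the"), ("N", "man")),
        ("PP", ("P", "with"),
               ("NP", ("D", "the"), ("N", "gun")))))


def count_labeled_nodes(tree):
    """Count labeled nodes; the words (leaves) themselves are not counted."""
    if isinstance(tree, str):
        return 0
    _label, *children = tree
    return 1 + sum(count_labeled_nodes(child) for child in children)


# Dependency tree: one node per word, mapped to its head (None marks the root).
# The two occurrences of 'the' are kept apart with position suffixes.
DEP = {
    "They-1": "killed-2", "killed-2": None, "the-3": "man-4",
    "man-4": "killed-2", "with-5": "killed-2", "the-6": "gun-7",
    "gun-7": "with-5",
}

print(count_labeled_nodes(PSG), len(DEP))   # 12 7
```

If the words were counted as leaf nodes of the phrase structure tree as well, the gap between the two representations would be larger still.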

Dependency Grammar is also argued to be closer to the way the human brain processes a sentence when a hearer hears or a reader reads it. The listener or reader does not wait for the complete sentence to arrive; as each word is heard, he or she tries to relate it to the other words. As the dependency tree in figure (4.2) shows, Dependency Grammar likewise does not wait for phrases to be completed; it works at the word level and finds the relations between words [Hall, 2008]. Another important property of dependency structure is that grammatical functions can be integrated into the dependency framework more easily [Miller, 1992]. Given these advantages, the subsequent sections discuss the dependency formalism itself.

4.3.2 Dependency Grammar Formalism

The roots of Dependency Grammar are found in the early Paninian grammar. The representation developed in the meantime, and this grammatical tradition culminated in the work of Tesniere in 1959, which became the basis of modern Dependency Grammar formalisms [Melcuk, 2003]. Tesniere explains the basic relations in terms of heads, known as governors (regents), and their dependents.

These modern formalisms, which are syntactic in nature, represent the connections between lexical elements. These connections are known as dependencies, and they are binary and asymmetric. Miller [Miller, 1992] explains in detail the various criteria for deciding these connections between heads and dependents. The connections between a head and its dependents blend syntactic and semantic properties. The following sections discuss the syntacto-semantic nature of the Dependency Grammar formalism and thereby show how close it is to the concepts of the Khandapaksha view.

4.4 Syntactic Dependencies

Identifying dependencies and their direction is often a challenging task. When dependency structures are formed using Dependency Grammar, much attention is paid to the grouping of words. Various heuristic methods, basic constituency tests and etymological rules are used to group the words and to determine the direction of the dependencies between them. Attention is also paid to distribution, because it is the basis of syntactic dependency [Melcuk, 2003; Otero, 2008]. The process is explained below.

4.4.1 Syntactic Functions

Traditionally, Dependency Grammars treat grammatical relations as primary in syntactic structure. They provide an inventory of functions (subject, object, determiner, predicative, attribute, etc.). When a dependency tree is formed, these grammatical relations appear as labels on the corresponding edges. The inventory of functions and their designations may vary from one Dependency Grammar framework to another. The next section briefly describes the most commonly used syntactic dependency framework, the Stanford Dependencies produced by the Stanford parser.

4.4.2 Stanford Dependencies

The Stanford Dependencies were first introduced in 2005 as an output representation of the Stanford Parser. They have been used to extract relations among textual units, and in other domains such as opinion and sentiment analysis. In machine translation, various groups use them as part of preprocessing [Marneffe and Manning, 2008]. The Stanford typed dependencies represent the grammatical relationships among words in a very simple manner, so the representation can be understood and used effectively for extracting textual relations by users with limited linguistic expertise. The Stanford representation gives the relationships among the words of a sentence uniformly, with the dependency relation types written on the edges.

The Stanford parser provides various kinds of dependency relations, as described in the Stanford typed dependencies manual [Marneffe and Manning, 2008]. The basic typed dependency framework assigns dependencies over the tree nodes of a sentence, using the dependencies defined in the second section of the manual. Basic dependencies contain no crossing arcs, i.e. they form a projective dependency structure. Every word in the sentence except the head of the whole sentence, the root, depends on exactly one other word in the sentence.
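The two properties of basic dependencies mentioned above, a single head for every non-root word and the absence of crossing arcs, can be checked with a sketch like the following. The input encoding (a list of 1-based head positions with 0 for ROOT) is an assumption, and the attachments used for ‘She is singing even better’ follow the description of figure (4.3) in section 4.5.

```python
def is_well_formed(heads):
    """heads[i] is the head of word i+1 (0 = ROOT). One head per word is
    built into the list, so check the range, a single root and no cycles."""
    n = len(heads)
    if any(h < 0 or h > n or h == i for i, h in enumerate(heads, start=1)):
        return False
    if heads.count(0) != 1:
        return False
    for start in range(1, n + 1):
        seen, node = set(), start
        while node != 0:              # climb from the word towards ROOT
            if node in seen:
                return False          # a cycle never reaches ROOT
            seen.add(node)
            node = heads[node - 1]
    return True


def is_projective(heads):
    """True if no two arcs cross when drawn above the sentence."""
    arcs = [(min(dep, h), max(dep, h)) for dep, h in enumerate(heads, start=1)]
    for left1, right1 in arcs:
        for left2, right2 in arcs:
            if left1 < left2 < right1 < right2:
                return False
    return True


# 'She is singing even better', with every word attached to 'singing' (word 3).
print(is_well_formed([3, 3, 0, 3, 3]), is_projective([3, 3, 0, 3, 3]))  # True True
```

A head list such as `[3, 4, 0, 3]` is still a well-formed tree but not projective, because the arcs 1–3 and 2–4 cross.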

Initially, the Stanford dependency parser takes the input sentences and identifies the POS (parts of speech) tag of every word. Various POS taggers are available for processing text [Asmussen, 2014]. Among them, an open, configurable POS tagger is considered more feasible: for an open-source POS tagger, both the software and its knowledge base are freely available and configurable. Being open source and freely available is a major contributor to achieving satisfactory precision. Beyond these basic requirements, a tagger should satisfy the further requirements listed below to give satisfactory performance. It should be developed in a widely used, platform-independent programming language and offer a web-based solution. It should be executable on a standalone PC, so that a large amount of text can be processed quickly without any remote service. Moreover, it should be well maintained and documented by a community rather than relying on a single developer.

Most of the taggers available for English are listed on Stanford University’s NLP site³. On studying these POS taggers, the Stanford POS tagger was found to be the most suitable for our application. The Stanford POS tagger is freely available for several languages, such as English and Chinese. An added advantage is that it is a language-independent model, i.e. it can be trained on any dataset of annotated text. Furthermore, it is well documented and maintained, and all of its revised versions are available on the website. All of this makes the tagger an ideal choice for our task [Marneffe and Manning, 2008].

Once the POS tags are assigned, the Stanford parser analyzes the sentence grammatically and establishes connections between head (governor) words and their dependents. Dependents are the words that modify a head; the head of the whole sentence is its root. This kind of parser makes a linear-time pass over the words of the sentence and constructs a parse tree. The subsequent section discusses the various denotations the dependency parser uses in this process.

4.5 Denotations

This section gives a generic explanation of the denotations used in Dependency Grammar. Basically, the complete dependency parse of a sentence is expressed through three linguistic categories [Otero, 2008]:

• Lexical words: lexical categories such as noun, verb and adjective are treated as sets of properties that are semantically compatible with one another. The constituent words, in the form in which they appear in a sentence, are known as lexical words, e.g. ‘Went’, ‘cut’, ‘goes’, ‘Shambhavi’.
• Syntactic dependencies: lexical words are related to each other by binary syntactic dependencies such as ‘nsubj’ (nominal subject), ‘agent’, ‘dobj’ (direct object) and ‘iobj’ (indirect object). A binary dependency relation takes two lexical elements as arguments and yields a finer, more restricted relation between them. Two λ-expressions, λx and λy, are associated with a binary dependency such as ‘nsubj’. Consider the sentence ‘Shambhavi walked’ and its ‘nsubj’ dependency. ‘Shambhavi’ and ‘walked’ denote ‘Shambhavi’ and ‘walk’ respectively, and the binary relation between them is written nsubj(walk, Shambhavi); this is the restricted relation between the lexical elements ‘Shambhavi’ and ‘walked’. The Dependency Grammar representation assigns ‘walked’ the head role and ‘Shambhavi’ the dependent role. The properties of the lexical words are not elaborated further here, because our aim is to focus on the combinatorial operations involved in a dependency.
• Lexico-syntactic associations: the third category combines lexical words with their associated dependencies. Lexico-syntactic associations are patterns made up of a POS tag and a dependency together with a lexical word. Whereas lexical words denote sets of words, lexico-syntactic associations specify an operation carried out on any two words from those sets [Otero, 2008]. Thus the ‘nsubj’ dependency can be represented as ‘noun + subject + verb’. Such a pattern, being more specific in terms of lexical elements and dependencies, is known as a lexico-syntactic pattern.
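The difference between a restricted relation such as nsubj(walk, Shambhavi) and a lexico-syntactic pattern such as ‘noun + subject + verb’ can be shown with two small record types. The class names and the lookup tables `COARSE` and `ROLE` are hypothetical and serve only to spell out the example.

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str    # the word as it appears, e.g. 'walked'
    lemma: str   # its base form, e.g. 'walk'
    pos: str     # Penn Treebank POS tag, e.g. 'VBD'


# Hypothetical lookup tables used only to spell out the pattern in words.
COARSE = {"NNP": "noun", "NN": "noun", "VBD": "verb", "VBZ": "verb"}
ROLE = {"nsubj": "subject", "dobj": "object"}


class Dependency(NamedTuple):
    rel: str          # typed relation, e.g. 'nsubj'
    head: Token
    dependent: Token

    def relation(self) -> str:
        """Restricted relation between two lexical elements: rel(head, dep)."""
        return f"{self.rel}({self.head.lemma}, {self.dependent.lemma})"

    def pattern(self) -> str:
        """Lexico-syntactic pattern: dependent class + role + head class."""
        return (f"{COARSE[self.dependent.pos]} + {ROLE[self.rel]} + "
                f"{COARSE[self.head.pos]}")


walked = Token("walked", "walk", "VBD")
shambhavi = Token("Shambhavi", "Shambhavi", "NNP")
d = Dependency("nsubj", walked, shambhavi)
print(d.relation())   # nsubj(walk, Shambhavi)
print(d.pattern())    # noun + subject + verb
```

The relation names two particular words, while the pattern abstracts over any noun and verb linked by the same dependency.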

With these concepts, the parser maintains a partial parse at every step: a stack holds the words currently being handled, and a buffer holds the words yet to be processed. Transitions are applied until the buffer is empty and only the root remains on the stack, at which point the dependency graph is complete. Initially all the words are in the buffer and the stack contains only the ROOT node. Three transitions are then used. LEFT-ARC marks the second item on the stack as a dependent of the first (top) item and removes the second item from the stack. RIGHT-ARC marks the first (top) item as a dependent of the second item and removes the top item. SHIFT removes a word from the buffer and pushes it onto the stack. Any projective dependency tree can be generated by these three types of transitions. In the typed (labeled) setting, each arc transition also specifies the type of the relationship between the head and the dependent it connects. Figure (4.3) shows the dependency parse of a short sentence, ‘She is singing even better’. The Stanford dependency graphs shown in the following section were drawn using the CoreNLP linguistic analysis tool available at http://nlp.stanford.edu:8080/corenlp/process.
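The stack-and-buffer procedure can be sketched as an unlabeled arc-standard system driven by a static oracle that reads the gold heads. This is an illustration only: a trained parser chooses each transition with a learned classifier instead of looking at gold heads, and a labeled variant would attach a relation type to every arc. The head list for ‘She is singing even better’ follows the description of figure (4.3), with both ‘even’ and ‘better’ attached to ‘singing’.

```python
def arc_standard_oracle(heads):
    """Rebuild a projective tree with arc-standard transitions.

    heads[i] is the gold head of word i+1 (0 = ROOT). Returns the transition
    sequence and the set of (head, dependent) arcs it creates.
    """
    gold = {dep: head for dep, head in enumerate(heads, start=1)}
    stack, buffer = [0], list(range(1, len(heads) + 1))  # ROOT starts on stack
    arcs, transitions = set(), []

    def has_all_dependents(word):
        # True once every gold dependent of `word` has been attached to it.
        return all((word, d) in arcs for d, h in gold.items() if h == word)

    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, second = stack[-1], stack[-2]
            if second != 0 and gold[second] == top:
                arcs.add((top, second))      # LEFT-ARC: second depends on top
                del stack[-2]
                transitions.append("LEFT-ARC")
                continue
            if gold[top] == second and has_all_dependents(top):
                arcs.add((second, top))      # RIGHT-ARC: top depends on second
                stack.pop()
                transitions.append("RIGHT-ARC")
                continue
        if not buffer:
            raise ValueError("no valid transition: tree is not projective")
        stack.append(buffer.pop(0))          # SHIFT
        transitions.append("SHIFT")
    return transitions, arcs


# 'She is singing even better' (words 1-5); heads as described for figure 4.3.
steps, arcs = arc_standard_oracle([3, 3, 0, 3, 3])
print(steps)
```

For this input the oracle produces SHIFT, SHIFT, SHIFT, LEFT-ARC, LEFT-ARC, SHIFT, RIGHT-ARC, SHIFT, RIGHT-ARC, RIGHT-ARC: exactly two transitions per word, which is why parsers of this kind run in linear time.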

Figure 4.3 Stanford parse of a small sentence

In figure (4.3), ‘better’ modifies the verb ‘singing’, which is at the centre of the dependency parse. The word ‘she’ is the nominal subject of the action described by the verb ‘sing’, and the words ‘even’ and ‘better’ are labeled as adverbial modifiers of ‘sing’. Each such dependency relates one word to another. Similarly, consider the sentence ‘Yashodhan, based in Alandi, buys and sells the petroleum products’. Its Stanford Dependencies (SD) representation is shown in figure (4.4) below:

Figure 4.4 Stanford typed dependencies

These dependencies map onto a directed graph in which the nodes are the words and the edges are labeled with the grammatical relations. The graphical representation of the above sentence is given in Figure (4.5).

Figure 4.5 Stanford Dependencies for the ‘Yashodhan, based in Alandi, buys and
distributes Petroleum products.’

In this example too, the verb ‘buys’ is at the centre of the parse, as shown by ‘root(ROOT-0, buys-7)’. The word ‘Yashodhan’ is in an ‘nsubj’ dependency, which means it is the nominal subject of the action described by the verb ‘buys’. Similarly, the word ‘products’ is the object of ‘buys’. In this way Dependency Grammar makes the dependency relations of every word available [Marneffe and Manning, 2008]. Thus the Stanford Dependency formalism gives the POS tags as well as the relations among the tagged words.
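Typed dependencies in the textual form rel(head-i, dep-j) can be read back and queried for the root verb and its core arguments. The sketch below lists only the three relations discussed in the text, as an illustrative subset rather than the parser's full output; the regular expression and function names are assumptions, not part of any Stanford tool.

```python
import re

# Textual form used by the Stanford tools: rel(head-index, dependent-index).
# Only the three relations discussed in the text are listed here; this is an
# illustrative subset, not the full parser output for the sentence.
SD_LINES = [
    "root(ROOT-0, buys-7)",
    "nsubj(buys-7, Yashodhan-1)",
    "dobj(buys-7, products-12)",
]

LINE = re.compile(r"^([\w:]+)\((.+)-(\d+), (.+)-(\d+)\)$")


def parse(line):
    """Split 'rel(head-i, dep-j)' into (rel, (head, i), (dep, j))."""
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"not a typed dependency: {line!r}")
    rel, head, hi, dep, di = match.groups()
    return rel, (head, int(hi)), (dep, int(di))


def core_arguments(lines):
    """Return the root verb and the subject/object attached to it."""
    triples = [parse(line) for line in lines]
    root = next(dep for rel, _head, dep in triples if rel == "root")
    args = {rel: dep[0] for rel, head, dep in triples
            if head == root and rel in ("nsubj", "dobj")}
    return root[0], args


print(core_arguments(SD_LINES))
# ('buys', {'nsubj': 'Yashodhan', 'dobj': 'products'})
```

Matching on the (word, index) pair rather than the word alone keeps repeated words in a sentence apart.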

4.6 Dependency Grammar and ‘Vakya’

As shown in figures (4.1) and (4.2), in a constituency (phrase structure) tree the
root node is the start non-terminal, which generally spans the entire sentence. In
Dependency Grammar, by contrast, the verb of the sentence is at the root node. This verb
centrality is more useful to us for Text Summarization, as will be discussed further in
the sixth and seventh chapters, where we discuss the use of Karakas for Text
Summarization. Because of this verb centrality, Dependency Grammar is closer to the views
of the Khandapaksha.
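The difference in what sits at the root can be made concrete with two hand-written analyses of one short sentence. In this sketch, the sentence ‘She sings even better’ is an assumption loosely modelled on Figure 4.3, not a reproduction of that figure. Both trees are written by hand rather than produced by a parser. The dependencies follow the text in attaching ‘even’ and ‘better’ directly to the verb.

```python
# Hand-written constituency tree as nested tuples: (label, child, child, ...); leaves are words.
constituency = ("S",
                ("NP", ("PRP", "She")),
                ("VP", ("VBZ", "sings"),
                       ("ADVP", ("RB", "even"), ("RBR", "better"))))

# Hand-written dependency analysis as (relation, head, dependent); "ROOT" is the artificial root.
dependency = [
    ("root", "ROOT", "sings"),
    ("nsubj", "sings", "She"),
    ("advmod", "sings", "even"),
    ("advmod", "sings", "better"),
]

constituency_root = constituency[0]  # a non-terminal spanning the whole sentence
dependency_root = next(dep for rel, head, dep in dependency if head == "ROOT")  # the main verb
print(constituency_root, dependency_root)  # S sings
```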

These syntactic trees and their associated syntactic functions are close to the
notions of ‘Aakanksha’, ‘Sannidhi’ and ‘Yogyata’. The dependency graph does more than
identify which words are mutually dependent. It also shows how they depend on each other,
because every dependency is labeled with its relation.

When Dependency Grammar identifies the dependencies between head (governor) words
and their dependent words, along with their associated POS tags, it comes close to the
concept of Aakanksha. As discussed in section (4.2.1), through the notion of Aakanksha
the Vakya principle forms the interconnections among the words by treating the verb as
the central element. Only through these interconnections is the meaning of the sentence
obtained. Aakanksha holds that words cannot convey the complete meaning of a sentence
without the presence of other words [Sharma, 2004].
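One way to illustrate this mutual expectancy on a dependency analysis is to check that no word stands alone. Every word should have exactly one head, and every word should be connected through the verb to the root. The sketch below is a toy illustration of that reading, not a method proposed in the thesis. It assumes unique word strings and uses a hand-written analysis of the assumed sentence ‘She sings even better’. The function name `check_tree` is hypothetical.

```python
from collections import defaultdict, deque

def check_tree(words, deps):
    """True if every word has exactly one head and is reachable from ROOT.

    words: the sentence tokens (assumed unique here for simplicity).
    deps: (relation, head, dependent) triples with "ROOT" as the artificial root.
    """
    heads = defaultdict(list)
    children = defaultdict(list)
    for rel, head, dep in deps:
        heads[dep].append(head)
        children[head].append(dep)
    single_head = all(len(heads[w]) == 1 for w in words)

    # Breadth-first walk from ROOT; the visited set also guards against cycles.
    reached, queue = set(), deque(["ROOT"])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in reached:
                reached.add(child)
                queue.append(child)
    return single_head and reached == set(words)

words = ["She", "sings", "even", "better"]
deps = [
    ("root", "ROOT", "sings"),
    ("nsubj", "sings", "She"),
    ("advmod", "sings", "even"),
    ("advmod", "sings", "better"),
]
print(check_tree(words, deps))                  # True
print(check_tree(words, deps[:1] + deps[2:]))   # False: "She" is left without a head
```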

As discussed in section (4.2.3), Sannidhi focuses on the utterance or presence of
words in proximity, which contributes to completing the sentence and obtaining its
meaning. Alternatively, we can say that Sannidhi identifies the syntax of the language
[Sharma, 2004; Arjunwadkar, 2008]. With this view, if the placement of POS tags is used
as a feature to analyze the sentence structure, then it is possible to incorporate the
notion of Sannidhi for English text.
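The placement and repetition of POS tags can be computed directly from a tagged sentence. The sketch below is a minimal illustration under stated assumptions. The input is a list of `(word, tag)` pairs, such as a POS tagger produces. The Penn Treebank tags for the example sentence were assigned by hand and are not actual Stanford tagger output. The function name `pos_placement_features` is hypothetical.

```python
from collections import Counter

def pos_placement_features(tagged):
    """Placement and repetition of POS tags in one sentence.

    tagged: list of (word, tag) pairs.
    Returns the tag sequence (placement), the count of each tag (repetition),
    and the position at which each tag first occurs.
    """
    sequence = [tag for _, tag in tagged]
    counts = Counter(sequence)
    first_position = {}
    for position, tag in enumerate(sequence):
        first_position.setdefault(tag, position)
    return sequence, counts, first_position

# Hand-assigned Penn Treebank tags (an assumption, not real tagger output).
tagged = [("Yashodhan", "NNP"), (",", ","), ("based", "VBN"), ("in", "IN"),
          ("Alandi", "NNP"), (",", ","), ("buys", "VBZ"), ("and", "CC"),
          ("sells", "VBZ"), ("the", "DT"), ("petroleum", "NN"), ("products", "NNS")]
sequence, counts, first_position = pos_placement_features(tagged)
print(sequence)
print(counts["VBZ"], first_position["VBZ"])  # 2 6
```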

In the proposed work, Dependency Grammar is used in two parts. The first part is the
surface level analysis, in which the statistical feature based approach is combined with
syntactic structure. In this part, the statistical Mimansa features are combined with
syntactic POS tagging based on Dependency Grammar. Through this, we have tried to
incorporate the Sannidhi principle of Vakya. In this approach, we first extract the
relevant Mimansa features for every sentence. We then obtain the POS tag of every word
in the sentence using the Stanford POS tagger. Next, the placement of the words according
to their POS tags and the repetition of each POS tag are obtained for all the sentences
in the given document. The sequential placement and repetition of POS tags will enable
the system to observe how the words are placed in a sentence and how close they are to
one another. The placement and repetition of POS tags differ between well-formed and
ill-formed sentences. To make the system learn such structures, a Neural Network will be
used. This network is trained on the placement sequence and frequency of the important
POS tags in a sentence. Thereby, the system will be able to recognize the structure of
important sentences in terms of the POS tags of their constituent words.
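The learning step can be illustrated with the smallest possible network, a single sigmoid neuron (equivalent to logistic regression). The network here is deliberately minimal, and the thesis does not specify its architecture. The sketch also rests on several assumptions. The tag inventory in `TAGS`, the feature encoding (relative frequency plus relative first position of each tag), and the four toy tag sequences labeled well-formed or ill-formed are all hypothetical illustration data, not results from the proposed system.

```python
import math

# Hypothetical inventory of "important" POS tags.
TAGS = ["NN", "NNP", "VBZ", "DT", "IN"]

def vectorize(tags):
    """Fixed-length vector: relative frequency of each tag (repetition), then its
    relative first position (placement), using 1.0 when the tag is absent."""
    n = len(tags)
    freq = [tags.count(t) / n for t in TAGS]
    first = [tags.index(t) / n if t in tags else 1.0 for t in TAGS]
    return freq + first

def predict(w, b, x):
    """Sigmoid output of the neuron: estimated probability that x is well formed."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, epochs=5000, lr=0.5):
    """Batch gradient descent on the log loss of a single sigmoid neuron."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            error = predict(w, b, x) - y  # derivative of the log loss w.r.t. z
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)
    return w, b

# Toy training data (hypothetical): tag sequences labeled by hand.
well_formed = [["NNP", "VBZ", "DT", "NN"], ["DT", "NN", "VBZ", "IN", "NNP"]]
ill_formed = [["DT", "DT", "IN", "IN"], ["IN", "DT", "IN", "NN"]]
xs = [vectorize(t) for t in well_formed + ill_formed]
ys = [1] * len(well_formed) + [0] * len(ill_formed)
w, b = train(xs, ys)
for tags in well_formed + ill_formed:
    print(tags, round(predict(w, b, vectorize(tags)), 3))
```

A real system would use far more sentences and probably a hidden layer. The single neuron is used here only to keep the sketch short and easy to check by hand.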

In the second part, the concept of Aakanksha, i.e. the dependency relations among the
words with their associated labels, will be used together with the concept of Karaka to
incorporate the syntacto-semantic nature of the Sanskrit language. This is discussed in
detail in the sixth and seventh chapters.

Our aim in the proposed approach is to develop an efficient hybrid Text
Summarization system that is rich in syntacto-semantic concepts. We do not intend to
incorporate the pragmatic level of language analysis in the proposed model. Because the
concept of Yogyata is closely related to pragmatic analysis, it has not been incorporated
in the proposed model. It can be studied or incorporated as part of applications such as
discourse analysis and word sense disambiguation.

4.7 Conclusion

From the above discussion, it becomes evident that syntactic dependency is an
important grammatical formalism. Its views are close to those of the Vakya principle of
‘Purva Mimansa’, which is intended to be used in the proposed framework. In dependency
structures there is a one-to-one correspondence between the words of the sentence and
the nodes of the structure, i.e. for every word in a sentence there is exactly one node
in the dependency tree. Thus, in these word grammars, each word has an associated
dependency, i.e. a relation that connects it with another word or element in the
structure. When this is compared with constituency structures, it can easily be observed
that constituency structures have a one-to-many correspondence, since a single word is
covered by several nodes (its pre-terminal and the phrase nodes above it). Thus, from the
point of view of linguistic analysis, dependency structures are minimal and yet rich in
the relationships they give among the words. With its associated framework, the
grammatical formalism of Dependency Grammar comes closer to the concepts of
‘Aakanksha’, ‘Sannidhi’ and ‘Yogyata’. Therefore, for the task of summarization, where
sentence interpretation characteristics such as ‘Aakanksha’, ‘Sannidhi’ and ‘Yogyata’ are
essential, the dependency formalism is an ideal choice. With all these considerations,
we choose the Stanford Dependency framework, which is a syntactic Dependency Grammar
structure, as the tool for linguistic analysis in identifying the important sentences for
Text Summarization.
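The one-to-one versus one-to-many contrast can be checked by counting nodes in two hand-written analyses of the same short sentence. In this sketch, the sentence ‘She sings even better’ and both trees are assumptions written by hand, not parser output. The helpers `count_nodes` and `leaves` are hypothetical. The constituency tree needs 12 nodes for 4 words, while the dependency analysis has exactly one node per word, plus the artificial ROOT.

```python
# Hand-written constituency tree: (label, child, child, ...); leaves are words.
constituency = ("S",
                ("NP", ("PRP", "She")),
                ("VP", ("VBZ", "sings"),
                       ("ADVP", ("RB", "even"), ("RBR", "better"))))

# Hand-written dependency analysis: (relation, head, dependent).
dependency = [("root", "ROOT", "sings"), ("nsubj", "sings", "She"),
              ("advmod", "sings", "even"), ("advmod", "sings", "better")]

def count_nodes(tree):
    """Count every node of a nested-tuple tree, word leaves included."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1:])

def leaves(tree):
    """The words of the sentence, read left to right from the tree."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]

words = leaves(constituency)
dependency_nodes = {dep for _, _, dep in dependency}  # one node per word
print(len(words), count_nodes(constituency), len(dependency_nodes))  # 4 12 4
```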
