153 Sample-Chapter PDF
This chapter discusses some basics of the English language which are relevant
to the understanding of language analysis. Every language defines a basic
alphabet, words, word categories and language formation rules called grammar
rules. The word categories are made according to their role as parts of speech. From
the language analysis point of view, the style of a language must be concretely
defined in order to design a working parser for that language. Though there is no hard
and fast rule for naming the formal categories, it is customary to give the various
parts of speech their traditional names. The grammatical categories (like
noun, verb, etc.) which are taught in English literature are very informal and are
not defined as precisely as a formal grammar requires. In addition to this, there are many
more distinctions that have to be made in a real parser.
Hence, it is evident that for language processing by computer, the grammar
writer should very clearly understand the basic word categories of the language,
the types of words and other constituents of the language, and the way in which
they interact with each other.
In linguistic analysis, Chomsky did pioneering work in the 1960s. He formally
defined various grammars, types of grammars, and the features and characteristics of
grammars. These are described in detail later in this chapter. As a result of
Chomsky’s work on transformational generative grammar, a vast amount of fairly
descriptive linguistic analysis has been carried out, and a large repository
of terminology has grown up which augments the informal set of old-fashioned
terms. Now let us describe the elementary terminology of English grammar.
The well-accepted English grammar terminology defines the following word categories:
2.2 SENTENCE
(vii) Conjunction
(viii) Interjection
Adjectives can have degrees. The degree indicates how much of the quality the
adjective expresses. There are three degrees: positive, comparative and
superlative. The positive degree is the simple form of the adjective. The
comparative degree is used to indicate comparison between concepts, and
the superlative degree denotes the highest degree of the quality, e.g., strong, stronger, strongest.
Basic English Concepts 27
Article: The words a, an and the are called articles. They come before a noun.
A and an are indefinite articles because they usually leave indefinite the person
or thing spoken of, as in a doctor, an orange.
“The” is called the definite article because it normally points to some particular
person or thing.
Pronoun: A word that is used instead of a noun is called a pronoun. Pronouns
can be of various types. Personal pronouns, such as I, we, he, she, it, they and you,
indicate grammatical person. There are three persons: 1st person, 2nd
person and 3rd person.
Verb: A word that tells or asserts something about a person or thing. For
example, Harry laughs, the clock strikes. Verbs can be of two types.
Types of verbs: Transitive and intransitive verbs. A transitive verb is a verb
which denotes an action that passes over from the subject to an object. An
intransitive verb is a verb which denotes an action that does not pass over
to an object, or which expresses a state or being. For example, in “He ran a long
distance”, the verb ran is used intransitively.
Most transitive verbs take a single object. But transitive verbs such as give,
ask, offer, promise, tell, etc. take two objects after them: an indirect object, which
denotes the person to whom something is given or for whom something is done,
and a direct object, which is usually the name of something. For example,
His father gave him (indirect) a watch (direct).
He told me (indirect) a secret (direct).
Most verbs can be used both as transitive and as intransitive verbs. It is therefore
better to say that a verb is used transitively or intransitively rather than that it is
transitive or intransitive.
Some verbs, e.g., come, go, fall, die, sleep, lie, denote actions which cannot be
done to anything; they can therefore never be used transitively.
Voice is the form of a verb which shows whether whatever is denoted by the subject
does something or has something done to it. Active and passive are two methods
of framing an English sentence. They use different forms of the verb. In the active voice,
the verb form shows that the person or thing denoted by the subject does something,
that is, is the doer of the action,
e.g., Ram helps Hari.
The active voice is so called because the person denoted by the subject
acts.
A verb is in the passive voice when its form shows that something is done to
the person or thing denoted by the subject, e.g., Hari is helped by Ram.
The passive voice is so called because the person or thing denoted by the
subject is not active but passive, that is, suffers or receives some action.
28 Natural Language Processing
2.4 TENSES
Tense is the concept which indicates ‘time’. In literature, time is demarcated
into three divisions:
(i) The time which is presently going on (present).
(ii) The time which is before the present, or the time which has passed (past).
(iii) The time which will come after the present, or the time which has not yet
arrived (future).
To represent these three timing categories, language incorporates the concept
of ‘tenses’. The tense of a verb shows the time of an action or an event.
Corresponding to the three categories there are three tenses: present tense,
past tense and future tense. In English, different verb forms represent these
tenses. A verb that refers to present time is said to be in the present tense, a
verb that refers to past time is said to be in the past tense, and a verb that
refers to future time is said to be in the future tense.
Consider the following examples:
(i) I write this letter to please you.
(ii) I wrote the letter in his very presence.
(iii) I shall write another letter tomorrow.
While performing language analysis, these tense forms of the verb are used
to find the time of the event. However, there are many variations of these verb
forms in the English language. Sometimes a past tense may refer to present time, and
a present tense may express future time. For example,
I wish I knew the answer. (This sentence is equivalent to saying “I am
sorry I don’t know the answer.” It is past tense, present time.)
Let’s wait till he comes. (Present tense, future time.)
Below we give the chief tenses (active voice, indicative mood) of the verb to
love.
Present tense
              Singular number      Plural number
1st person    I love               We love
2nd person    You love             You love
3rd person    He loves             They love
Past tense
              Singular number      Plural number
1st person    I loved              We loved
2nd person    You loved            You loved
3rd person    He loved             They loved
Future tense
              Singular number      Plural number
1st person    I shall/will love    We shall/will love
2nd person    You will love        You will love
3rd person    He will love         They will love
In the English language, each tense has four forms. For the present tense these
are simple present, present continuous, present perfect and present perfect continuous.
See the following sentences:
1. I love (Simple present)
2. I am loving (Present continuous)
3. I have loved (Present perfect)
4. I have been loving (Present perfect continuous)
The verbs in all of these sentences refer to the present time, and are therefore said to
be in the present tense. In sentence 1, however, the verb shows that the action is
mentioned simply, without anything being said about the completeness or
incompleteness of the action.
In sentence 2, the verb shows that the action is mentioned as incomplete or continuous,
that is, as still going on. In sentence 3, the verb shows that the action is mentioned
as finished, complete or perfect at the time of speaking.
The tense of the verb in sentence 4 is said to be present perfect continuous because
the verb shows that the action has been going on continuously and is not completed at the
present moment.
Thus, we see that the tense of a verb shows not only the time of an action or
event, but also the state of an action referred to.
Just as the present tense has four forms, the past tense also has the following four
forms:
1. I loved (Simple past)
2. I was loving (Past continuous)
3. I had loved (Past perfect)
4. I had been loving (Past perfect continuous)
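The four forms of the present and past tenses follow a regular pattern, which can be sketched in code. The following is only a minimal illustration: the function names are hypothetical, only the first person singular ("I") is covered, and the spelling rules handle only simple regular verbs such as love or walk.

```python
# Hypothetical helpers: build the four forms of a tense for the subject "I".
# Only simple regular verbs are handled (no consonant doubling, no irregulars).

def present_participle(base: str) -> str:
    """Form the -ing participle (love -> loving, walk -> walking)."""
    if base.endswith("e") and not base.endswith("ee"):
        return base[:-1] + "ing"
    return base + "ing"


def past_participle(base: str) -> str:
    """Form the -ed participle (love -> loved, walk -> walked)."""
    return base + "d" if base.endswith("e") else base + "ed"


def first_person_forms(base: str, tense: str) -> dict:
    """Return the simple, continuous, perfect and perfect continuous forms."""
    ing = present_participle(base)
    ed = past_participle(base)
    if tense == "present":
        return {
            "simple": f"I {base}",
            "continuous": f"I am {ing}",
            "perfect": f"I have {ed}",
            "perfect continuous": f"I have been {ing}",
        }
    if tense == "past":
        return {
            "simple": f"I {ed}",
            "continuous": f"I was {ing}",
            "perfect": f"I had {ed}",
            "perfect continuous": f"I had been {ing}",
        }
    raise ValueError(f"unsupported tense: {tense}")
```

Note that the perfect and perfect continuous forms differ between the two tenses only in the auxiliary (have versus had).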
According to English sentence formation rules, a verb agrees with its subject
in number and person. There are different verb forms corresponding to different
numbers and persons. This requirement of matching number
and person is used in language analysis to find out whether a sentence is
syntactically valid or not.
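As an illustration of how agreement can be used as a validity check, the sketch below compares the person and number of a pronoun subject with the present tense form of the verb to love. The table names and the function agrees are assumptions made for this example; a real parser would take these features from its lexicon.

```python
# Hypothetical feature tables for a tiny agreement check (present tense only).

# Person and number of each subject pronoun.
SUBJECT_FEATURES = {
    "i": (1, "singular"),
    "we": (1, "plural"),
    "you": (2, "singular"),  # "you" is also plural; the verb form is the same
    "he": (3, "singular"),
    "she": (3, "singular"),
    "it": (3, "singular"),
    "they": (3, "plural"),
}

# Present tense form of "to love" required by each (person, number) pair.
PRESENT_FORMS = {
    (1, "singular"): "love",
    (2, "singular"): "love",
    (3, "singular"): "loves",
    (1, "plural"): "love",
    (2, "plural"): "love",
    (3, "plural"): "love",
}


def agrees(subject: str, verb: str) -> bool:
    """Return True if the verb form matches the subject in person and number."""
    features = SUBJECT_FEATURES.get(subject.lower())
    if features is None:
        raise KeyError(f"unknown subject: {subject}")
    return PRESENT_FORMS[features] == verb.lower()
```

With these tables, "He loves" is accepted, while "They loves" is rejected as a mismatch in number.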
Besides the main verbs in the English language, there are certain verbs which are
known as auxiliary verbs. The verbs be (am, is, was, etc.), have and do, when used
with ordinary verbs to make tenses, passive forms, questions and negatives, are
called auxiliary verbs. The verbs can, could, may, might, will, would, shall, should,
must, and ought are called modal verbs. They are used before ordinary verbs and
express meanings such as permission, possibility, certainty and necessity. Need
and dare can sometimes be used like modal verbs.
Present continuous
Active               Passive
I am loving          I am being loved
You are loving       You are being loved
He is loving         He is being loved
We are loving        We are being loved
They are loving      They are being loved
Present perfect
Active               Passive
I have loved         I have been loved
You have loved       You have been loved
He has loved         He has been loved
They have loved      They have been loved
Simple past
Active               Passive
I loved              I was loved
You loved            You were loved
He loved             He was loved
They loved           They were loved
Past continuous
Active               Passive
I was loving         I was being loved
You were loving      You were being loved
He was loving        He was being loved
They were loving     They were being loved
Past perfect
Active               Passive
I had loved          I had been loved
You had loved        You had been loved
He had loved         He had been loved
They had loved       They had been loved
(iii) Non-finites
                         Active           Passive
Present infinitive       to love          to be loved
Continuous infinitive    to be loving     -
Perfect infinitive       to have loved    to have been loved
Present participle       loving           being loved
Perfect participle       having loved     having been loved
2.5 ADVERB
Words which modify the meaning of a verb, an adjective, or another adverb, and
thereby tell the quality of the action, are known as adverbs, e.g., quickly, very, and quite are
adverbs in the following sentences:
(i) Rama runs quickly.
(ii) This is a very sweet mango.
(iii) Govind reads quite clearly.
Besides these, there are many cue phrases, like however and anyway, which mark
a change of theme in the discourse. These have special significance in
linguistic analysis: they are used to analyze the theme of the discourse.
We all know that a dictionary is something that provides definitions of words. From
the computer storage viewpoint, the way definitions are stored differs in some respects.
This definition of a word from the viewpoint of storage in a computer database is
important for linguistic analysis, and it is this definition that we will describe in this
chapter.
Let us discuss these categories in a little more detail from the lexicon storage point of
view.
Articles
This category contains only three words: a, an and the. The dictionary definition of ART looks like:
A (ART A), AN (ART AN), THE (ART THE)
Nouns
Nouns are classified as animate or inanimate, and further into
singular and plural. Inanimate nouns are further classified into categories
like place, conveyance, time, objects, etc., and animate nouns are further classified
into male and female categories. Some example entries are as follows:
RAM (NOUN RAM ANIMATE MALE SINGULAR)
BOY (NOUN BOY ANIMATE MALE SINGULAR)
CAR (NOUN CAR CONVEYANCE SINGULAR)
RESTAURANT (NOUN RESTAURANT PLACE SINGULAR)
SUNRISE (NOUN SUNRISE TIME SINGULAR)
Pronouns
Pronouns have the largest number of categories. The first criterion for
classification is person, which gives the categories first person,
second person and third person. Further criteria are number, gender and role.
Some examples of pronouns are as follows:
HE (PRONOUN HE THIRD PERSON MALE SINGULAR NOMINATIVE)
THEY (PRONOUN THEY THIRD PERSON MALE FEMALE NEUTER
PLURAL NOMINATIVE)
YOU (PRONOUN YOU SECOND PERSON MALE FEMALE NOMINATIVE
ACCUSATIVE SINGULAR PLURAL)
I (PRONOUN I FIRST PERSON MALE FEMALE SINGULAR NOMINATIVE)
Numbers
Numbers can also appear in a sentence, and a peculiar feature of them is
that they have two representations, one in figures and the other in words.
Example dictionary entries for number words may be:
SIX (NUMBER SIX)
TWENTY (NUMBER TWENTY)
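The dictionary entries shown above for articles, nouns, pronouns and numbers can be held in a simple in-memory table. The following is a minimal sketch, assuming each entry is stored as a tuple whose first element is the category, whose second element is the word, and whose remaining elements are features. The names LEXICON, lookup, category and has_feature are hypothetical.

```python
# A small lexicon keyed by word. Each entry mirrors the notation used above:
# (CATEGORY, WORD, feature, feature, ...).
LEXICON = {
    "THE": ("ART", "THE"),
    "RAM": ("NOUN", "RAM", "ANIMATE", "MALE", "SINGULAR"),
    "CAR": ("NOUN", "CAR", "CONVEYANCE", "SINGULAR"),
    "HE": ("PRONOUN", "HE", "THIRD PERSON", "MALE", "SINGULAR", "NOMINATIVE"),
    "SIX": ("NUMBER", "SIX"),
}


def lookup(word: str):
    """Return the entry for a word, or None if the word is not stored."""
    return LEXICON.get(word.upper())


def category(word: str):
    """Return the word category (ART, NOUN, ...) or None if unknown."""
    entry = lookup(word)
    return entry[0] if entry else None


def has_feature(word: str, feature: str) -> bool:
    """Check whether a stored word carries the given feature."""
    entry = lookup(word)
    return entry is not None and feature in entry[2:]
```

A parser could call category("car") to obtain the token type and has_feature("he", "SINGULAR") when checking agreement.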
Structure of Dictionary
The dictionary should be structured so that a definition can be retrieved as quickly as
possible, i.e., the search time should be reduced to a minimum. One possible method
to reduce the search time is discussed below:
(i) Break up the whole dictionary according to the first letter of each word.
This way we will have 26 sublists of the dictionary.
(ii) If we have just 1000 words in the dictionary, there will be on average about 40
words per list (1000/26 ≈ 38), requiring less time for searching.
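The first-letter scheme can be sketched as follows. This is only one possible arrangement; the function names build_sublists and contains are assumptions made for the example.

```python
from collections import defaultdict


def build_sublists(words):
    """Group words into sublists keyed by their first letter."""
    sublists = defaultdict(list)
    for word in words:
        if word:  # skip empty strings
            sublists[word[0].upper()].append(word.upper())
    for bucket in sublists.values():
        bucket.sort()
    return dict(sublists)


def contains(sublists, word):
    """Search only the sublist for the word's first letter."""
    if not word:
        return False
    bucket = sublists.get(word[0].upper(), [])
    return word.upper() in bucket
```

A lookup of "car" then scans only the words filed under C rather than the whole dictionary.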
The lexicon serves the purpose of providing tokens to the parser. The words, along
with their definitions, remain stored in the dictionary. The dictionary specifies for
each word its part of speech, any non-default values for its features, and
presumably something about its meaning. However, in English, as in all other
languages, individual words can often be given different prefixes and suffixes.
For example, the word “love” can appear in different guises, such as “loves”, “loved”,
“loving”, “unloving”, etc. All of these words have one basic word and various
other derived forms. From the computer storage point of view, it would be wasteful if
the dictionary had to include all of these. A better approach is to have the
lexicon use explicit knowledge of the structure of words (their morphology) and
have it figure out when a word is simply a variant of one that is already in the
dictionary. However, how these variations of words are recognized needs to be
specified. Care must be taken in handling these patterns, e.g., an
error results from the following decomposition:
“Kiss” → “kis” + “s”
To some degree, such mistakes can be prevented by installing more stringent checks
on which endings are allowed in which circumstances. For example, a singular
noun ending in “s” will never form its plural by adding “s”, but rather by adding
“es”. It also helps to ensure first that a word is not already in the dictionary
before attempting to remove its endings, and that the end product, after the lexicon
has removed all the supposed endings, is itself in the dictionary.
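The two safeguards just described (look the word up before stripping anything, and accept a stripped form only if it is itself in the dictionary) can be sketched as below. The word list, the suffix rules and the function find_root are hypothetical, and prefixes such as “un” are not handled.

```python
# A tiny root dictionary and suffix rules, both chosen only for illustration.
DICTIONARY = {"kiss", "love", "boy", "walk"}

# Each rule is (suffix to remove, letters to restore after removal).
# ("ing", "e") restores a dropped final "e": loving -> lov -> love.
RULES = [("s", ""), ("es", ""), ("d", ""), ("ed", ""), ("ing", ""), ("ing", "e")]


def find_root(word, dictionary=DICTIONARY):
    """Return (root, suffix) for a known word or variant, else None."""
    word = word.lower()
    # Safeguard 1: a word already in the dictionary is never stripped,
    # so "kiss" is not broken into "kis" + "s".
    if word in dictionary:
        return (word, "")
    for suffix, restore in RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + restore
            # Safeguard 2: the end product must itself be in the dictionary.
            if stem in dictionary:
                return (stem, suffix)
    return None
```

Here find_root("kisses") gives the root “kiss” with the ending “es”, while the non-word “kis” is rejected because no stripped form of it is in the dictionary.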
If such a procedure is stored in the lexicon, then the dictionary can be made
reasonably compact, as the morphological unit will take care of standard
examples. Furthermore, having default values for features will mean that for the
root words, like singular nouns, the dictionary need not even indicate that the
word is singular, since this is the default case. Such routines are called
38 Natural Language Processing
name of the concept, information about the deep case structure of the concept
and default values for those cases. For example, the case structure of a concept
like “drink” might include the cases agent, object and instrument. The cases are
meant to account for the fact that the concept “drink” includes an agent who
performs the action, an object that is drunk and sometimes an instrument that
is used to aid in drinking. So, given the event description “Jatin drank a can of
beer”, the agent of the action is Jatin, the object is beer, and the instrument is
the can.
The default values associated with the cases are meant to be used as a
tool for rejecting aberrant interpretations of text. Whereas the text “Jatin has a
coke. He drank.” is interpreted to mean “Jatin drank a coke”, the text “Jatin
has a kite. He drank.” is not interpreted to mean “Jatin drank a kite”, since the
default value for the object of a drinking event is ‘liquid’ and a kite is not a type of
‘liquid’.
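The use of default values to reject an aberrant reading can be sketched as follows. The layout here, a tuple of concept name, case names and default values, is an assumption made for illustration and is not necessarily the template structure used in this book. The type table IS_A, including the type assigned to “kite”, is likewise hypothetical.

```python
# Hypothetical type table: maps a word to the kind of thing it names.
IS_A = {"coke": "liquid", "beer": "liquid", "kite": "toy"}

# Assumed layout: (concept name, case names, default type for some cases).
DRINK = ("drink", ("agent", "object", "instrument"), {"object": "liquid"})


def fits_default(template, case, filler, is_a=IS_A):
    """Return True if the filler is acceptable for the given case."""
    _name, cases, defaults = template
    if case not in cases:
        return False  # the concept has no such case
    expected = defaults.get(case)
    if expected is None:
        return True  # no default is recorded, so nothing to check
    return is_a.get(filler) == expected
```

With this check, “coke” is accepted as the object of a drinking event, while “kite” is rejected because it is not a type of liquid.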
The case structure and the default values associated with
each case are stored in a 3-tuple. These are sometimes called templates for the
event/state concept.
The structure of the template is: