
NLP Techmax NLP

The document outlines the course contents for a Natural Language Processing module. It includes 6 units covering topics like word level analysis, syntax analysis, semantic analysis, pragmatics, and applications of NLP. Unit 1 provides an introduction to NLP including its history, generic systems, levels of processing, and challenges. Unit 2 discusses word level analysis through morphology, lemmatization, and n-gram language models. Unit 3 covers part-of-speech tagging and syntactic analysis using techniques like hidden Markov models and conditional random fields. The document lists the main topics and number of hours for each unit over a total of 52 hours for the course.

Uploaded by

Sankalp Rane
Copyright
© © All Rights Reserved

COURSE CONTENTS

Module No. 1, Unit: Introduction
Topics: History of NLP, Generic NLP system, levels of NLP, Knowledge in language processing, Ambiguity in Natural language, stages in NLP, challenges of NLP, Applications of NLP (Refer Chapter 1)

Module No. 2, Unit: Word Level Analysis, Hrs: 10
Topics: Morphology analysis - survey of English Morphology, Inflectional morphology & Derivational morphology, Lemmatization, Regular expression, finite automata, finite state transducers (FST), Morphological parsing with FST, Lexicon free FST Porter stemmer. N-Grams - N-gram language model, N-gram for spelling correction. (Refer Chapter 2)

Module No. 3, Unit: Syntax analysis, Hrs: 10
Topics: Part-Of-Speech tagging (POS) - Tag set for English (Penn Treebank), Rule based POS tagging, Stochastic POS tagging, Issues - Multiple tags & words, Unknown words. Introduction to CFG, Sequence labeling: Hidden Markov Model (HMM), Maximum Entropy, and Conditional Random Field (CRF). (Refer Chapter 3)

Module No. 4, Unit: Semantic Analysis, Hrs: 10
Topics: Lexical Semantics, Attachment for fragment of English - sentences, noun phrases, Verb phrases, prepositional phrases, Relations among lexemes & their senses - Homonymy, Polysemy, Synonymy, Hyponymy, WordNet, Robust Word Sense Disambiguation (WSD), Dictionary based approach (Refer Chapter 4)

Module No. 5, Unit: Pragmatics, Hrs: 8
Topics: Discourse - reference resolution, reference phenomenon, syntactic & semantic constraints on co reference (Refer Chapter 5)

Module No. 6, Unit: Applications (preferably for Indian Regional Languages), Hrs: 10
Topics: Machine translation, Information retrieval, Question answers system, categorization, summarization, sentiment analysis, Named Entity Recognition (Refer Chapter 6)

Total Hrs: 52
Natural Language Processing (MU)

MODULE I

Chapter 1 : Introduction

History of NLP, Generic NLP system, levels of NLP, Knowledge in language processing, Ambiguity in Natural language, stages in NLP, challenges of NLP, Applications of NLP

1.1 What is Natural Language Processing ?
1.2 History of NLP
1.3 Generic NLP System
1.4 Levels of NLP
1.5 Knowledge in Language Processing
1.6 Ambiguity in Natural Language
1.7 Stages in NLP
1.8 Challenges of NLP
1.9 Applications of NLP

MODULE II

Chapter 2 : Word Level Analysis

Morphology analysis - survey of English Morphology, Inflectional morphology & Derivational morphology, Lemmatization, Regular expression, finite automata, finite state transducers (FST), Morphological parsing with FST, Lexicon free FST Porter stemmer. N-Grams - N-gram language model, N-gram for spelling correction.

2.1 Morphology Analysis
2.2 Survey of English Morphology
2.3 Inflectional Morphology and Derivational Morphology
2.4 Lemmatization
2.4.1 Difference between Stemming and Lemmatization
2.5 Regular Expression
2.6 Finite Automata
2.7 Finite State Transducers (FST)
2.8 Morphological Parsing With FST
2.8.1 Lexicon and Morphotactics
2.9 Lexicon Free FST Porter Stemmer
2.10 N-Grams - N-Gram Language Model
2.11 N-Gram for Spelling Correction

MODULE III

Chapter 3 : Syntax Analysis

Part-Of-Speech tagging (POS) - Tag set for English (Penn Treebank), Rule based POS tagging, Stochastic POS tagging, Issues - Multiple tags & words, Unknown words. Introduction to CFG, Sequence labelling: Hidden Markov Model (HMM), Maximum Entropy, and Conditional Random Field (CRF).

3.1 Part-Of-Speech Tagging (POS)
3.1.1 Part of Speech Categories
3.1.2 Part-Of-Speech Tagging
3.1.3 Methods for Part-Of-Speech (POS) Tagging
3.2 Other Issues
3.2.1 Multiple Tags and Multiple Words
3.2.2 Unknown Words
3.3 Introduction to CFG
3.3.1 Constituency
3.3.2 Context-Free Grammars
3.3.3 Parsing
3.3.3(A) Top-down parsing
3.3.3(B) Bottom-up parsing
3.3.4 Coordination
3.3.5 Agreement
3.3.6 The Verb Phrase and Subcategorization
3.4 Sequence Labeling
3.4.1 Sequence Labeling as Classification
3.4.2 Hidden Markov Model
3.4.3 Conditional Random Fields
3.4.4 Maximum Entropy

MODULE IV

Chapter 4 : Semantic Analysis

Lexical Semantics, Attachment for fragment of English - sentences, noun phrases, Verb phrases, prepositional phrases, Relations among lexemes & their senses - Homonymy, Polysemy, Synonymy, Hyponymy, WordNet, Robust Word Sense Disambiguation (WSD), Dictionary based approach

4.1 Lexical Semantics
4.1.1 Semantic Analysis
4.1.2 Lexical Semantics
4.1.3 Elements of Lexical Semantic Analysis
4.2 Attachment for Fragment of English
4.2.1 Semantic Attachment
4.2.2 Strategy for Semantic Attachments
4.2.3 Attachments for a Fragment of English
4.3 Relations Among Lexemes and Their Senses
4.3.1 Senses
4.3.2 Relations Between Words/Senses
4.4 WordNet
4.4.1 WordNet and Synsets
4.5 Word-sense Disambiguation (WSD)
4.5.1 Word-sense Disambiguation (WSD)
4.5.2 WSD Methods
4.5.3 WSD Evaluation
4.5.4 Difficulties in WSD
4.5.5 Applications of WSD
4.6 Dictionary (Knowledge) Based Approach

MODULE V

Chapter 5 : Pragmatics

Discourse - reference resolution, reference phenomenon, syntactic & semantic constraints on co reference

5.1 Discourse - Reference Resolution
5.1.1 Concept of Coherence
5.1.2 Discourse Structure
5.1.3 Discourse Segmentation
5.2 Coreference Resolution
5.3 Syntactic and Semantic Constraints on Coreference
5.3.1 Syntactic Constraints Reference
5.3.2 Selectional Restrictions

UNIT VI

Chapter 6 : Applications (Preferably For Indian Regional Languages)

Machine translation, Information retrieval, Question answers system, categorization, summarization, Sentiment analysis, Named Entity Recognition.

6.1 Machine Translation
6.1.1 Modern Machine Translation
6.1.2 Approaches
6.1.3 Different Types of Machine Translation
6.1.4 The Benefits and Uses of Machine Translation
6.1.5 Difference between Rule-Based MT vs. Statistical MT
6.2 Information Retrieval
6.2.1 Types of IR Models
6.2.2 Difference between Data Retrieval and Information Retrieval
6.3 Question Answers System
6.4 Categorization
6.5 Summarization
6.6 Sentiment Analysis
6.7 Named Entity Recognition (NER)
6.7.1 Types of Named Entity Recognition
6.7.2 Challenges in Named Entity Recognition

Module 1: Introduction

Syllabus
History of NLP, Generic NLP system, levels of NLP, Knowledge in language processing, Ambiguity in Natural language, stages in NLP, challenges of NLP, Applications of NLP

TOPICS

1.1 What is Natural Language Processing ?
1.2 History of NLP
1.3 Generic NLP System
1.4 Levels of NLP
1.5 Knowledge in Language Processing
1.6 Ambiguity in Natural Language
1.7 Stages in NLP
1.8 Challenges of NLP
1.9 Applications of NLP

1.1 What is Natural Language Processing ?


Humans use language as a primary means of communication, and by using this tool they express their emotions and ideas.

Language is used to shape thoughts; it has structure and it carries meaning. By using our language, we naturally learn new concepts and hardly realise how we process this natural language.

Natural Language Processing is the process of computer analysis of input provided in human language, and the conversion of this input into a useful form of representation.

Natural language processing is concerned with the development of computational models of aspects of human language processing. The following are the two main reasons for such developments:

1. Develop automatic tools for natural language processing
2. Gain a better understanding of human communication

When we build computational models with human language processing abilities, we need to incorporate how humans collect, store and process language. This also requires a knowledge of the world and of language.

The input and output of NLP is text or speech.
1.2 History of NLP

Machine Translation (1940-1960)

Work on natural language processing started in the 1940s. In 1948, the first NLP application was developed at Birkbeck College, London. Later, in the 1950s, there were contradictory opinions between linguistics and computer science. Chomsky came up with his first book, Syntactic Structures, and claimed that language is generative in nature. In 1957, Chomsky also introduced the idea of Generative Grammar, which was a rule-based description of syntactic structures.

Flavoured with Artificial Intelligence (1960-1980)

The years 1960 to 1980 witnessed developments like the Augmented Transition Network (ATN), which extends a finite state machine with recursion and registers so that it can recognize more than the regular languages. In 1968, linguist Charles J. Fillmore developed Case Grammar. Case grammar uses the English language to express the association between nouns and verbs by using prepositions. Here, a case role links certain types of verbs and objects. For example, "Meena broke the table with the stone". In this example, case grammar identifies Meena as an agent, table as a theme, and stone as an instrument. SHRDLU and LUNAR were the key systems in the years 1960 to 1980.

Terry Winograd wrote a program named SHRDLU in 1968-70. This program lets users communicate with the computer and move objects. It can handle orders such as "pick up the green ball" and likewise answer questions like "What is inside the black box?". SHRDLU's key importance is that it shows that syntax, semantics, and reasoning about the world can be combined to produce a system that understands a natural language.

Another system is LUNAR. It is the typical example of a natural language database interface system. LUNAR used ATNs and Woods' Procedural Semantics. It was capable of translating elaborate natural language expressions into database queries and handled 78% of requests without mistakes.

1980 - Current

Until 1980, NLP was based on complex sets of hand-written rules. Machine learning algorithms for language processing were introduced after 1980. NLP started growing faster at the beginning of the 1990s and achieved good processing accuracy, especially for English grammar. Electronic text became available around 1990, which gave a decent resource for training and examining natural language programs. Other factors include the availability of computers with fast CPUs and more memory. The key factor behind the progress of natural language processing was the Internet. In the 1990s, probabilistic and data-driven models became quite standard. After that, in the 2000s, a huge amount of spoken and textual data became available.

1.3 Generic NLP System


The Generic NLP systems are ELIZA, SysTran, TAUM METEO, SHRDLU, and LUNAR.

Generic NLP System

1. ELIZA

2. SysTran

3. TAUM METEO

4. SHRDLU

5. LUNAR

Fig. 1.3.1: Generic NLP system

1. ELIZA: ELIZA is an early natural language understanding program created by Joseph Weizenbaum. It mimics human conversation with the user by using syntactic patterns. This system demonstrates communication between humans and machines.

2. SysTran (System Translation): In 1969, the SysTran machine translation system was developed. This system was developed for Russian-English translation. It provided the first online machine translation service, named Babel Fish. Babel Fish was used by the AltaVista search engine to handle translation requests from users.

3. TAUM METEO: TAUM METEO is a natural language generation system. This system was used in Canada for generating weather reports. This system accepts daily weather reports in English and French.

4. SHRDLU: Terry Winograd wrote a program named SHRDLU in 1968-70. This program lets users communicate with the computer and move objects. It can handle orders such as "pick up the green ball" and likewise answer questions like "What is inside the black box?". SHRDLU's key importance is that it shows that syntax, semantics, and reasoning about the world can be combined to produce a system that understands a natural language.

5. LUNAR: LUNAR is the typical example of a natural language database interface system. It was an early question answering system that answered questions related to moon rock. It was capable of translating elaborate natural language expressions into database queries and handled 78% of requests without mistakes.
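
The idea behind ELIZA, mimicking conversation through syntactic patterns, can be illustrated with a very small Python sketch. The rules, the default reply and the function name `respond` below are assumptions made for illustration; they are not Weizenbaum's original script.

```python
import re

# Hypothetical rule table: (pattern, response template) pairs.
# These rules are illustrative only, not the original ELIZA script.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
    (re.compile(r"\bmy (\w+)", re.IGNORECASE), "Tell me more about your {0}."),
]

DEFAULT_REPLY = "Please go on."


def respond(utterance):
    """Return the reply of the first rule whose pattern matches the input."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Reuse the matched fragment of the user's own words in the reply.
            return template.format(match.group(1))
    return DEFAULT_REPLY


if __name__ == "__main__":
    for line in ["I am tired of studying.", "My exam is tomorrow", "Hello"]:
        print(line, "->", respond(line))
```

The program has no understanding of the input; it only matches surface patterns and echoes part of the user's sentence, which is why such a system can appear conversational without any knowledge of meaning.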

1.4 Levels of NLP

There are seven interdependent levels used to understand and extract meaning from a text or spoken words. In order to understand natural languages, it is important to differentiate amongst them:

Levels of NLP

1. Phonology level

2. Morphological level

3. Lexical level

4. Syntactic level

5. Semantic level

6. Discourse level

7. Pragmatic level

Fig. 1.4.1: Levels of NLP

1. Phonology level

This level basically deals with pronunciation. As English spelling is only partially phonemic, the written sentence "John inputs the data" does not show pronunciation very clearly; for example, the h in John is silent and the two a's in data correspond to very different sounds.

2. Morphological level

Morphology deals with the smallest parts of words that convey meaning, including suffixes and prefixes. It studies how words are built from smaller meaning-bearing units called morphemes. For example, the word 'dog' has a single morpheme, while the word 'rats' has two morphemes: 'rat' and 's', where the morpheme 's' marks the plural.
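
The 'dog' / 'rats' example can be sketched in Python as follows. The stem list, the suffix list and the function name `segment` are hypothetical and cover only regular noun plurals; a real morphological analyser would use a full lexicon and spelling rules.

```python
from typing import List

# Hypothetical mini-lexicon of noun stems (an assumption for this sketch).
STEMS = {"dog", "rat", "cat", "book"}

# Only the regular plural suffix is handled here.
SUFFIXES = ("s",)


def segment(word: str) -> List[str]:
    """Split a word into [stem] or [stem, suffix] using the toy lexicon."""
    word = word.lower()
    if word in STEMS:
        return [word]
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in STEMS:
            return [stem, suffix]
    # Unknown word: treat it as a single morpheme.
    return [word]


if __name__ == "__main__":
    print(segment("dog"))   # one morpheme
    print(segment("rats"))  # stem plus plural suffix
```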

3. Lexical level

The lexical level deals with the study of words with respect to their lexical meaning and Part-Of-Speech (POS). This level uses a lexicon, that is, a collection of individual lexemes. A lexeme is a basic unit of lexical meaning: an abstract unit of morphological analysis that represents the set of forms or "senses" taken by a single morpheme. For example, "duck" can take the form of a noun or a verb, but its POS and lexical meaning can only be derived in context with the other words used in the phrase/sentence.

4. Syntactic level

The syntactic level deals with the grammar and structure of sentences. It studies the proper relationships between words. The POS tagging output of the lexical analysis can be used at the syntactic level to group words into phrase and clause brackets. Syntactic analysis, also referred to as "parsing", allows the extraction of phrases which convey more meaning than just the individual words by themselves, such as in a noun phrase.

5. Semantic level

This level deals with the meaning of words and sentences. There are two approaches at the semantic level:
1) syntax-driven semantic analysis,
2) semantic grammar.
It is a study of the meaning of words that are associated with grammatical structure. For example, from the statement "John inputs the data" we can understand that John is an Agent.

6. Discourse level

This level deals with the structure of different kinds of text. There are two types of discourse processing:
1) anaphora resolution,
2) discourse/text structure recognition.
In anaphora resolution, words such as pronouns are replaced by the entities they refer to. Discourse structure recognition determines the purpose of sentences in the text, which enhances the meaningful representation of the text.

7. Pragmatic level

This level deals with the use of real world knowledge and an understanding of how this influences the meaning of what is being communicated. By analysing the contextual dimension of the documents and queries, a more detailed representation is derived.
1.5 Knowledge in Language Processing
Language is used for communication, and knowledge is the content interpreted in it. Here, we consider text as language and content as knowledge.

Language is a medium used for expression. It is the outer form of the content it expresses; the same content can be expressed in various languages.

The question over the years has been: can language be separated from its content? If yes, then how can the content itself be represented? Usually, the meaning of one language is written in the same language but by using a different set of words. Therefore, to process a language means to process its content.

Computers are not able to understand natural language, so methods are developed for mapping its content into a formal language. The knowledge representation tool represents the whole body of knowledge, which is modified through the generation of new words to include new ideas and situations.

Language processing has different levels, and each level of processing involves different types of knowledge. The various levels of processing and the type of knowledge each involves are explained below:

1. Phonetic and Phonological Knowledge

Phonetics deals with the sounds of speech, while phonology is the study of the combination of sounds into organized units of speech, the formation of syllables and of larger units. Phonetic and phonological knowledge are essential for speech based systems, as they deal with how words are related to the sounds that realize them.

2. Morphological Knowledge
Morphology concerns word formation.

It is a study of the patterns of formation of words by the combination of sounds into minimal distinctive units of meaning called morphemes.

Morphological knowledge concerns how words are constructed from morphemes.

3. Syntactic Knowledge

Syntax deals with how words combine to form phrases, phrases combine to form clauses, and clauses join to make sentences.

Syntactic analysis concerns sentence formation.

It deals with how words can be put together to form correct sentences.

It also determines what structural role each word plays in the sentence and what phrases are subparts of what other phrases.

4. Semantic Knowledge

It concerns the meanings of words and sentences.

This is the study of context independent meaning, that is, the meaning a sentence has no matter in which context it is used.

Defining the meaning of a sentence is very difficult due to the ambiguities involved.

5. Pragmatic Knowledge

Pragmatics is the extension of meaning, or semantics.

Pragmatics deals with the contextual aspects of meaning in particular situations.

It concerns how sentences are used in different situations and how use affects the interpretation of the sentence.

6. Discourse Knowledge

Discourse concerns connected sentences.

It is a study of chunks of language which are bigger than a single sentence.

Discourse knowledge concerns inter-sentential links, that is, how the immediately preceding sentences affect the interpretation of the next sentence.

Discourse knowledge is important for interpreting pronouns and temporal aspects of the information conveyed.

7. World Knowledge

World knowledge is nothing but the everyday knowledge about the world that all speakers share.

It includes general knowledge about the structure of the world and what each language user must know about the other user's beliefs and goals.

This is essential to make language understanding much better.
1.6 Ambiguity in Natural Language

Natural language has a very rich form and structure. It is very ambiguous. Ambiguity means not having a well defined solution. Any sentence in a language with a large-enough grammar can have another interpretation.

There are various forms of ambiguity related to natural language, and they are:

1. Lexical Ambiguity

2. Syntactic Ambiguity

3. Semantic Ambiguity

4. Metonymy Ambiguity

1. Lexical Ambiguity

When a word has multiple meanings, it is known as lexical ambiguity.

For example:
The word back can be a noun or an adjective.

Noun: back stage

Adjective: back door

2. Syntactic Ambiguity

Syntactic ambiguity means that a sentence can be parsed in multiple syntactic forms, that is, in different ways.

For example:
I saw the girl on the beach with my binoculars.

In this sentence, confusion in meaning is created. The phrase "with my binoculars" could modify the verb, saw, or the noun, girl.
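
To make the idea of "multiple parses" concrete, the following Python sketch counts the parse trees of a sentence under a small grammar using the CKY algorithm. The grammar, the lexicon and the function name `count_parses` are assumptions made for this illustration and are not taken from the text.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an assumption made for this sketch).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP attaches to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"], "my": ["Det"],
    "girl": ["N"], "beach": ["N"], "binoculars": ["N"],
    "on": ["P"], "with": ["P"],
}


def count_parses(sentence):
    """Count the distinct parse trees rooted in S using the CKY algorithm."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(i, j)][X] = number of ways category X derives words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1)][tag] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)]["S"]


if __name__ == "__main__":
    print(count_parses("I saw the girl with my binoculars"))
    print(count_parses("I saw the girl on the beach with my binoculars"))
```

Under this toy grammar the shorter sentence has two parses, one for each attachment of "with my binoculars". The full sentence has more, because "on the beach" adds further attachment choices.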

3. Semantic Ambiguity

Semantic ambiguity is related to the interpretation of the sentence.

For example:
I saw the girl on the beach with my binoculars.
The sentence can mean that I saw a girl through my binoculars, or that the girl had my binoculars with her.

4. Metonymy Ambiguity

Metonymy is the most difficult ambiguity. It deals with phrases in which the literal meaning is different from the figurative meaning.

For example:
"Nokia is screaming for new management".

Here it doesn't mean that the company is literally screaming.

1.7 Stages in NLP

There are five stages in natural language processing. Fig. 1.7.1 shows the stages: lexical analysis, syntactic analysis, semantic analysis, discourse integration, and pragmatic analysis.

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Discourse Integration

Pragmatic Analysis

Fig. 1.7.1: Stages of NLP

1. Lexical Analysis

Lexical analysis is the first stage in NLP. It is also known as morphological analysis.

At this stage the structure of the words is identified and analysed.

The lexicon of a language means the collection of words and phrases in that language.

Lexical analysis divides the whole portion of text into paragraphs, sentences, and words.
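
The division into paragraphs, sentences, and words can be sketched with Python regular expressions. The splitting rules and function names below are simplifying assumptions; they ignore abbreviations, numbers with decimal points and similar cases that a real tokenizer must handle.

```python
import re


def split_paragraphs(text):
    """Paragraphs are assumed to be separated by one or more blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph):
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def split_words(sentence):
    """Words are runs of letters, digits or apostrophes; punctuation is dropped."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


if __name__ == "__main__":
    text = "Meena is a girl. She goes to school.\n\nThe school is big."
    for paragraph in split_paragraphs(text):
        for sentence in split_sentences(paragraph):
            print(split_words(sentence))
```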

2. Syntactic Analysis (Parsing)

It involves analysing the words in the sentence for grammar and ordering the words in a way that shows the relationship among them.

A sentence such as "The school goes to girl" is rejected by an English syntactic analyser.

3. Semantic Analysis

Semantic analysis draws the exact meaning, or the dictionary meaning, from the text.

The text is checked for meaningfulness. It is done by mapping syntactic structures and objects in the task domain.

The semantic analyser rejects a sentence such as "hot ice-cream".

4. Discourse Integration

The meaning of any sentence depends upon the meaning of the sentence just before it. Furthermore, it also influences the meaning of the immediately following sentence.

For example: "Meena is a girl, she goes to school". Here "she" is a dependency pointing to Meena.
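
The dependency from "she" back to "Meena" can be illustrated with a deliberately naive Python sketch: each pronoun is linked to the most recently mentioned entity of a compatible gender. The name table, the pronoun table and the function name `resolve_pronouns` are hypothetical; real discourse integration needs much richer syntactic and semantic knowledge.

```python
# Hypothetical gender table for known names (an assumption for this sketch).
KNOWN_ENTITIES = {"Meena": "female", "John": "male", "Mary": "female"}
PRONOUN_GENDER = {"she": "female", "her": "female", "he": "male", "him": "male"}


def resolve_pronouns(text):
    """Link each pronoun to the most recent earlier entity of the same gender."""
    links = []
    seen = []  # entities in order of mention
    for token in text.replace(",", " ").replace(".", " ").split():
        if token in KNOWN_ENTITIES:
            seen.append(token)
        elif token.lower() in PRONOUN_GENDER:
            gender = PRONOUN_GENDER[token.lower()]
            for entity in reversed(seen):
                if KNOWN_ENTITIES[entity] == gender:
                    links.append((token, entity))
                    break
    return links


if __name__ == "__main__":
    print(resolve_pronouns("Meena is a girl, she goes to school."))
```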

5. Pragmatic Analysis

During this stage, what was said is re-interpreted based on what it truly meant. It involves deriving those aspects of language which necessitate real world knowledge.

For example, in "John saw Mary in a garden with a cat", we can't say whether John is with the cat or Mary is with the cat.


1.8 Challenges of NLP

NLP is a powerful tool with enormous benefits; nonetheless, there are still numerous Natural Language Processing challenges:

1. Contextual words and phrases and homonyms

The same words and phrases can have diverse meanings according to the context of a sentence, and many words have the exact same pronunciation but completely different meanings.

For example:
I ran to the store because we ran out of milk.

Can I run something past you really quick?

The house is looking really run down.

In the above three sentences, the meaning of "run" is different according to the context.

Homonyms are two or more words that have the same pronunciation but different meanings, for example, their and there, right and write. These create problems in question answering and speech-to-text applications.

2. Synonyms

Synonyms can cause issues similar to contextual understanding, since we use many different words to express the same idea.

Additionally, some of these words may convey exactly the same meaning, while others differ in degree, and different people use synonyms to denote slightly different meanings within their personal vocabulary.

For example, small, little, tiny, and minute have nearly the same meaning.

3. Irony and sarcasm

Irony and sarcasm present problems for machine learning models since they usually use words and phrases that, strictly by definition, may be positive or negative, but truly mean the opposite.

Models can be trained with certain cues that frequently accompany ironic or sarcastic phrases, like "yeah right", "whatever", etc., and with word embeddings (where words that have the same meaning have a similar representation), but it's still a complicated process.

4. Ambiguity

Ambiguity in NLP refers to sentences and phrases that potentially have two or more possible interpretations.
There is lexical, syntactic and semantic ambiguity.

5. Errors in text or speech

Misspelled or misused words can generate problems for text analysis. Autocorrect and grammar correction applications can handle common mistakes, but do not always understand the writer's intention.

With spoken language, it is difficult for the machine to understand mispronunciations, different accents, stammers, etc.
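
A minimal sketch of how an autocorrect step might handle a common misspelling is shown below, using `difflib.get_close_matches` from the Python standard library. The vocabulary, the similarity cutoff and the function name `suggest` are assumptions chosen for illustration; the sketch compares spellings only and, as noted above, knows nothing about the writer's intention.

```python
import difflib

# Hypothetical vocabulary; a real corrector would use a large word list.
VOCABULARY = ["language", "natural", "processing", "machine", "translation"]


def suggest(word, vocabulary=VOCABULARY):
    """Return the closest vocabulary word, or the word itself if nothing is close."""
    if word in vocabulary:
        return word
    # cutoff=0.8 is an arbitrary similarity threshold chosen for this sketch.
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else word


if __name__ == "__main__":
    for typed in ["langauge", "procesing", "xyz"]:
        print(typed, "->", suggest(typed))
```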

6. Idioms and slang

Informal phrases, expressions, idioms, and culture-specific lingo present a number of problems for NLP, especially for models intended for comprehensive use.

Unlike formal language, idioms may have no dictionary definition at all, and these expressions may even have different meanings in different geographic areas.

Furthermore, cultural slang is continuously morphing and increasing, so new words arise every day.


7. Domain specific language
Different businesses and industries often use very different language.
An NLP model needed for healthcare would be very different from one used
to process legal documents.


8. Low-resource languages

Artificial intelligence, machine learning and NLP applications have mostly been built
for the most common, widely used languages.
Translation systems for these languages have become remarkably precise. However,
many languages, especially those spoken by people with less access to technology, often
go overlooked and under-processed.

For example, by some estimates there are over 3,000 languages in Africa alone. There
simply is not enough data on many of these languages.

1.9 Applications of NLP

The first application area of Natural Language Processing was machine translation. Machine
translation requires complete linguistic analysis of the natural language sentences as well as
linguistic generation of an output sentence. There has been vast progress in the NLP field.
Some applications of NLP are as follows:

1. Machine translation
2. Speech recognition
3. Speech synthesis
4. Information retrieval
5. Information extraction
6. Question answering
7. Text summarization
8. Sentiment Analysis


1. Machine translation

In machine translation, text in one human language is translated into another human
language automatically.
For performing the translation, it is important to have knowledge of the words and
phrases and the grammar of the two languages involved in the translation, the semantics
of the languages, and knowledge of the world.
2. Speech recognition

Speech recognition is the process where acoustic speech signals are mapped to a
set of words.

This is difficult because of the wide variation in the pronunciation of words, homonyms
(for example, sea and see) and acoustic ambiguities (for example, "in the rest" and
"interest").


3. Speech synthesis

Automatic production of speech is known as speech synthesis. It means speaking a
sentence in natural language.

The speech synthesis system reads mails on your telephone or reads storybooks for you.
For generating the utterances text processing is required, so NLP is an important
component in a speech synthesis system.


4. Information retrieval

In Information retrieval the relevant documents related to the user's queries are
identified. In Information retrieval, indexing, query modification, word sense
disambiguation, and knowledge bases are used for enhancing the performance.
For example, WordNet and the Longman Dictionary of Contemporary English (LDOCE) are
some useful lexical resources for Information retrieval research.
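Indexing and querying can be sketched with a small inverted index, which maps each term to the documents that contain it. This is a minimal illustration under simplifying assumptions (whitespace tokenization, ranking by the number of matching query terms); the documents and the names `tokenize`, `build_index` and `search` are hypothetical.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if w]


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: term -> set of ids of documents containing the term."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Rank documents by how many query terms they contain (ties by id)."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy collection invented for this example.
DOCS = {
    "d1": "Machine translation converts text between languages.",
    "d2": "Speech recognition maps speech signals to words.",
    "d3": "Speech synthesis produces speech from text.",
}

index = build_index(DOCS)
print(search(index, "speech text"))  # ['d3', 'd1', 'd2']
```

The output is a ranked list of whole documents, not answers or facts. Query modification and word sense disambiguation, mentioned above, would act on the query terms before the lookup and are not modelled here.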

5. Information extraction
Information extraction is one of the most significant applications of NLP. It is used for
extracting structured information from unstructured or semi-structured machine-readable
documents. An information extraction system captures and outputs factual
information contained within a document.
Like an information retrieval system, an information extraction system also responds to a
user's information need. Unlike the information retrieval system, the information required
is not expressed as a keyword query. Instead, it is specified as predefined database
schemas or templates.
An information retrieval system identifies a subset of documents in a large repository or
text database, for example a subset of the resources in a library. An information
extraction system identifies a subset of information within a document that fits the
predefined template.
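The idea of a predefined template can be sketched with a regular expression whose named groups act as the slots to be filled. This is a deliberately simple, pattern-based illustration, not how a full information extraction system is built; the "founding event" template, the example sentences and the function name `extract_foundings` are all invented for the example.

```python
import re

# Hypothetical template with three slots: organization, founder, year.
# Each named group in the pattern corresponds to one slot of the template.
FOUNDING_PATTERN = re.compile(
    r"(?P<organization>[A-Z][\w&]*(?: [A-Z][\w&]*)*) was founded by "
    r"(?P<founder>[A-Z]\w*(?: [A-Z]\w*)*) in (?P<year>\d{4})"
)


def extract_foundings(text: str) -> list[dict[str, str]]:
    """Return one filled template (a dict of slot values) per match in text."""
    return [match.groupdict() for match in FOUNDING_PATTERN.finditer(text)]


# Invented text, used only to exercise the pattern.
TEXT = (
    "The report notes that Acme Labs was founded by Jane Doe in 1999. "
    "Later, Beta Corp was founded by John Roe in 2005."
)

for record in extract_foundings(TEXT):
    print(record)
```

Here the output is a set of structured records taken from inside a document, in contrast to information retrieval, which returns the documents themselves.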
6. Question answering

Given a question and a set of documents, the question answering system tries to find
the correct answer, or the part of the text where the answer appears.

Unlike an information retrieval system, which returns a full document that is relevant to
the user's query, a question answering system returns the answer itself. The question
answering system uses an information extraction system for identifying the entities in
the text.
A question answering system needs more NLP than an information retrieval system or
an information extraction system. It needs precise analysis of the question and of portions
of text, as well as semantic and background knowledge, to answer certain types of
questions.
7. Text summarization

Text summarization means creating a short, correct summary of a longer text document.
Automatic text summarization helps us get the relevant information in less time.
NLP has an important role in developing automatic text summarization.
Text summarization involves syntactic, semantic, and discourse-level
processing of text.
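A very rough idea of automatic summarization can be given with a frequency-based extractive sketch: sentences whose words occur often in the document are assumed to be the most representative. This works only at the word level and ignores the syntactic, semantic and discourse-level processing mentioned above; the stop-word list and the names `split_sentences` and `summarize` are assumptions made for illustration.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "it", "for",
             "on", "that", "this", "are", "as", "with"}


def content_words(text: str) -> list[str]:
    """Lowercased words of the text with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def summarize(text: str, max_sentences: int = 1) -> str:
    """Keep the sentences with the highest average word frequency."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)


TEXT = ("NLP helps computers process language. "
        "Summarization is one NLP task. "
        "The weather was pleasant yesterday.")
print(summarize(TEXT, max_sentences=1))
```

With this input the off-topic weather sentence scores lowest, because none of its words recur elsewhere in the text.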
8. Sentiment Analysis

Sentiment analysis is also referred to as opinion mining. It is used on the web to
analyse the attitude, behaviour, and emotional state of the sender.
This application is implemented through a combination of NLP and statistics, by
assigning values to the text such as positive, negative, or neutral, and then recognizing
the mood of the context, for example happy, sad, angry, etc.
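Assigning positive, negative or neutral values to text can be sketched with a tiny word-list approach. This is a minimal illustration, assuming hand-picked word lists; practical systems combine much larger lexicons or trained statistical models. The word sets and the function name `sentiment` are hypothetical.

```python
# Hypothetical miniature lexicons, chosen only for this example.
POSITIVE = {"good", "great", "happy", "love", "excellent"}
NEGATIVE = {"bad", "sad", "angry", "hate", "terrible"}


def sentiment(text: str) -> str:
    """Label text by counting positive and negative lexicon words."""
    tokens = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this great phone!"))       # positive
print(sentiment("The service was terrible."))      # negative
print(sentiment("The parcel arrived on Monday."))  # neutral
```

A word-counting approach like this labels a sarcastic remark such as "Yeah right, great service" as positive, which is exactly the irony and sarcasm challenge discussed earlier in the chapter.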
Review Questions
Q.1 What is NLP? What are the applications of NLP?
Q.2 Explain generic NLP system.
Q.3 What are the levels of NLP?
Q.4 Explain the stages of NLP.
Q.5 Explain the challenges in NLP.
Q.6 List and explain applications of NLP.
Q.7 Explain the knowledge level.
