NLP Techmax NLP
(ReferChapter-2)
3. Syntax analysis: Part-Of-Speech (POS) tagging - Tag set for English (Penn Treebank), Rule based POS tagging, Stochastic POS tagging, Issues - Multiple tags & words, Unknown words. Introduction to CFG, Sequence labeling: Hidden Markov Model (HMM), Maximum Entropy, and Conditional Random Field (CRF). (Refer Chapter-3) 10
Lexical Semantics, Attachment for fragment of English 10
MODULE I
Chapter 1 : Introduction
1-12
MODULE II
Chapter 2: Word Level Analysis
2-1 to 2-24
Morphology analysis - survey of English Morphology, Inflectional morphology & Derivational morphology, Lemmatization, Regular expression, finite automata, finite state transducers (FST), Morphological parsing with FST, Lexicon free FST Porter stemmer. N-Grams - N-gram language model, N-gram for spelling correction.
2.1 Morphology Analysis
2.2 Survey of English Morphology
2.3 Inflectional Morphology and Derivational Morphology
2.4 Lemmatization
2.4.1 Difference between Stemming and Lemmatization
2.5 Regular Expression
Natural Language Processing(MU) Table of Contents
MODULE III
Chapter 3: Syntax Analysis
3-1 to 3-26
Part-Of-Speech (POS) tagging - Tag set for English (Penn Treebank), Rule based POS tagging, Stochastic POS tagging, Issues - Multiple tags & words, Unknown words. Introduction to CFG, Sequence labeling: Hidden Markov Model (HMM), Maximum Entropy, and Conditional Random Field (CRF).
3.2 Other Issues
3.2.1 Multiple Tags and Multiple Words
3.2.2 Unknown Words
3.3 Introduction to CFG
3.3.1 Constituency
3.3.2 Context-Free Grammars
3.3.3(A) Top-down parsing
3.3.3(B) Bottom-up parsing
3.3.4 Coordination
3.3.5 Agreement
4.3.2 Relations Between Words/Senses
4.4 WordNet
MODULE V
5-1 to 5-14
Chapter 5: Pragmatics
5.1 Discourse - Reference Resolution
5.1.3 Discourse Segmentation.
5.3.1
UNIT VI
6-1 to 6-26
Chapter 6: Applications (Preferably For Indian Regional Languages)
6.1.2 Approaches
6.1.3 Different Types of Machine Translation
6.1.4 The Benefits and Uses of Machine Translation
6.1.5 Difference between Rule-Based MT vs. Statistical MT
Introduction
Module 1
Syllabus
History of NLP, Generic NLP system, Levels of NLP, Knowledge in language processing, Ambiguity
in Natural language, Stages in NLP, Challenges of NLP, Applications of NLP
TOPICS
1.1 What is Natural Language Processing?
1.2 History of NLP
1.3 Generic NLP System
1.4 Levels of NLP
1.5 Knowledge in Language Processing
1.6 Ambiguity in Natural Language
1.7 Stages in NLP
1.8 Challenges of NLP
Language is used to shape thoughts; it has structure and it also carries meaning. By using our language, we naturally learn new concepts and hardly realise how we process this natural language.
Natural Language Processing is the process of computer analysis of input provided in human language, and conversion of this input into a useful form of representation.
Natural language processing is concerned with the development of computational models of aspects of human language processing. The following are the two main reasons for such developments:
table as a theme, and stone as an instrument. SHRDLU and LUNAR were the key systems in the years 1960 to 1980.
Terry Winograd wrote a program named SHRDLU in 1968-70. This program lets users communicate with the computer and move objects. It can handle orders such as "pick up the green ball" and answer questions like "What is inside the black box?". SHRDLU's key importance is that it shows that syntax, semantics, and reasoning about the world can be combined to produce a system that understands a natural language.
Another system is LUNAR. It is the typical example of a natural language database interface system. LUNAR used ATNs and Woods' Procedural Semantics. It was capable of translating elaborate natural language expressions into database queries and handled 78% of requests without mistakes.
1980 - Current
NLP was based on complex sets of hand-written rules until 1980. Machine learning algorithms for language processing were introduced after 1980. NLP started growing faster at the beginning of the 1990s and achieved good processing accuracy, especially for English grammar. Electronic text, introduced in the 1990s, provided a decent resource for training and examining natural language programs. Other factors include the availability of computers with fast CPUs and more memory. The key factor behind the progress of natural language processing was the Internet. In the 1990s, probabilistic and data-driven models became quite standard. After that, in the 2000s, a huge amount of spoken and textual data became available.
1. ELIZA
2. SysTran
3. TAUM METEO
4. SHRDLU
5. LUNAR
1. ELIZA: ELIZA is an early natural language understanding program created by Joseph Weizenbaum. It mimics human conversation with the user by using syntactic patterns. This system demonstrates communication between humans and machines.
3. TAUM METEO: TAUM METEO is a natural language generation system. This system was used in Canada for generating weather reports. This system accepts daily weather reports in English and French.
4. SHRDLU: Terry Winograd wrote a program named SHRDLU in 1968-70. This program lets users communicate with the computer and move objects. It can handle orders such as "pick up the green ball" and answer questions like "What is inside the black box?". SHRDLU's key importance is that it shows that syntax, semantics, and reasoning about the world can be combined to produce a system that understands a natural language.
5. LUNAR: LUNAR is the typical example of a natural language database interface system. It was an early question answering system that answered questions related to moon rock. It was capable of translating elaborate natural language expressions into database queries and handled 78% of requests without mistakes.
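The pattern-based conversation style attributed to ELIZA above can be illustrated with a very small sketch. The rule table, the reply templates and the function name `respond` are assumptions made for illustration only; they are not Weizenbaum's original script.

```python
import re

# Hypothetical rule table: (pattern, reply template).
# These rules are illustrative assumptions, not ELIZA's original script.
RULES = [
    (re.compile(r"\bi need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE), "Tell me more about your {0}."),
]
DEFAULT_REPLY = "Please go on."


def respond(utterance: str) -> str:
    """Return the reply of the first rule whose pattern matches the input."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            # Reuse the matched fragment of the user's own words in the reply.
            return template.format(*match.groups())
    return DEFAULT_REPLY


print(respond("I need a holiday."))
print(respond("The weather is nice."))
```

The program has no understanding of the input; it only matches surface patterns and echoes parts of the user's sentence back.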
1.4 Levels of NLP
1. Phonology level
2. Morphological level
3. Lexical level
4. Syntactic level
5. Semantic level
6. Discourse level
7. Pragmatic level
1. Phonology level
This level basically deals with pronunciation. As English spelling is only partially phonemic, the written sentence "John inputs the data" does not show the sounds very clearly; for example, the h in John is silent and the two a's in data correspond to very different sounds.
2. Morphological level
3. Lexical level
The lexical level deals with the study at the level of words with respect to their lexical meaning and Part-Of-Speech (POS). This level uses a lexicon, which is a collection of individual lexemes. A lexeme is a basic, abstract unit of lexical meaning. Lexical meaning can only be derived in context with the other words used in the phrase/sentence.
4. Syntactic level
5. Semantic level
It is a study of the meaning of words that are associated with grammatical structure. The semantic level includes:
1) Syntax-driven semantic analysis,
2) Semantic grammar.
6. Discourse level
This level deals with the structure of different kinds of text. There are two types of discourse processing:
1) Anaphora resolution,
2) Discourse/text structure recognition.
In anaphora resolution, words such as pronouns are replaced. Discourse structure recognition determines the purpose of sentences in the text, which enhances the meaningful representation of the text.
7. Pragmatic level
2. Morphological Knowledge
Morphology concerns word formation.
It is a study of the patterns of formation of words by the combination of sounds into
3. Syntactic Knowledge
Syntax deals with how words combine to form phrases, phrases combine to form clauses and clauses join to make sentences.
It also determines what structural role each word plays in the sentence and what
4. Semantic Knowledge
This is the study of context independent meaning, that is, the meaning a sentence has no matter in which context it is used.
Defining the meaning of a sentence is very difficult due to the ambiguities involved.
5. Pragmatic Knowledge
Pragmatics is the extension of the meaning or semantics.
Pragmatics deals with the contextual aspects of meaning in particular situations.
It concerns how sentences are used in different situations and how use affects the
6. Discourse Knowledge
Discourse concerns connected sentences.
7. World Knowledge
World knowledge is nothing but everyday knowledge about the world that all speakers share.
It includes general knowledge about the structure of the world and what each language user must know about the other user's beliefs and goals.
This is essential to make language understanding much better.
1.6 Ambiguity in Natural Language
Natural language has a very rich form and structure. It is very ambiguous. Ambiguity means not having a well defined solution. Any sentence in a language with a large-enough grammar can have another interpretation.
There are various forms of ambiguity related to natural language and they are:
1. Lexical Ambiguity
2. Syntactic Ambiguity
3. Semantic Ambiguity
4. Metonymy Ambiguity
1. Lexical Ambiguity
When words have multiple meanings, it is known as lexical ambiguity.
For example:
the word back can be a noun or an adjective.
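A minimal sketch of how lexical ambiguity shows up in a lexicon: each word maps to a set of possible parts of speech, and a word is ambiguous when that set has more than one member. The toy `LEXICON` and the function name `ambiguous_words` are assumptions made for illustration.

```python
# Toy lexicon: word -> set of possible part-of-speech tags.
# The entries are illustrative assumptions, not a real tag set.
LEXICON = {
    "back": {"noun", "adjective"},
    "the": {"determiner"},
    "door": {"noun"},
}


def ambiguous_words(sentence: str) -> list[str]:
    """Return the words of the sentence that have more than one possible tag."""
    return [w for w in sentence.lower().split() if len(LEXICON.get(w, ())) > 1]


for word in ambiguous_words("the back door"):
    print(word, sorted(LEXICON[word]))
```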
2. Syntactic Ambiguity
Syntactic ambiguity means sentences are parsed in multiple syntactical forms, or a sentence can be parsed in different ways.
For example:
I saw the girl on the beach with my binoculars.
In this sentence, confusion in meaning is created. The phrase with my binoculars could modify either the verb saw or the noun girl.
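Syntactic ambiguity can be made concrete by counting how many parse trees a grammar assigns to a sentence. The sketch below uses a CYK-style chart over a tiny grammar in Chomsky normal form; the grammar, the lexicon and the function name `count_parses` are assumptions written for this example only.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form: (parent, left child, right child).
# Both the rules and the lexicon are illustrative assumptions.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # prepositional phrase attached to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # prepositional phrase attached to the noun phrase
    ("PP", "P", "NP"),
]
LEXICAL_RULES = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"], "my": ["Det"],
    "girl": ["N"], "beach": ["N"], "binoculars": ["N"],
    "with": ["P"], "on": ["P"],
}


def count_parses(sentence: str) -> int:
    """Count the distinct parse trees rooted in S for the sentence."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[(i, j)][X] = number of trees with root X covering words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICAL_RULES.get(word, []):
            chart[(i, i + 1)][tag] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # every split point of the span
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)]["S"]


print(count_parses("I saw the girl with my binoculars."))  # prints 2
```

With this toy grammar the shortened sentence has two parses: one where with my binoculars attaches to the verb phrase and one where it attaches to the girl.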
3. Semantic Ambiguity
For example:
I saw the girl on the beach with my binoculars.
The sentence means either that I saw a girl through my binoculars or that the girl had my binoculars.
4. Metonymy Ambiguity
Metonymy is the most difficult ambiguity. It deals with phrases in which the literal meaning is different from the figurative assertion.
For example:
"Nokia is screaming for new management".
analysis.
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Discourse Integration
Pragmatic Analysis
1. Lexical Analysis
Lexical Analysis is the first stage in NLP. It is also known as morphological analysis.
words.
4. Discourse Integration
The meaning of any sentence depends upon the meaning of the sentence just before it. Furthermore, it also brings about the meaning of the immediately following sentence.
For example: "Meena is a girl, she goes to school"; here "she" is a dependency pointing to Meena.
5. Pragmatic Analysis
During this stage, what was said is re-interpreted as what it truly meant. It involves deriving those aspects of language which necessitate real world knowledge.
For example, in "John saw Mary in a garden with a cat", we can't say whether John is with the cat or Mary is with the cat.
1.8 Challenges of NLP
NLP is a powerful tool with enormous benefits; nonetheless, there are still numerous natural language processing challenges.
meanings.
For example:
I ran to the store because we ran out of milk.
In the above three sentences the meaning of run is different according to the context.
Homonyms are two or more words that have the same pronunciation but different meanings. For example, their and there, right and write. This creates problems in question answering and speech-to-text applications.
2. Synonyms
Synonyms can cause issues with contextual understanding, since we use many different words to express the same idea.
Additionally, some of these words may convey exactly the same meaning, while others may differ in level of complexity, and different people use synonyms to denote slightly different meanings within their personal vocabulary.
For example, small, little, tiny, minute have the same meaning.
3. Irony and sarcasm
Irony and sarcasm present problems for machine learning models since they usually use
words and phrases that, strictly by definition, may be positive or negative, but truly
sarcastic phrases, like yeah right, whatever, etc., and word embeddings (where words
that have the same meaning have a similar representation), but it's still a complicated
process.
4. Ambiguity
Ambiguity in NLP refers to sentences and phrases that potentially have two or more possible interpretations.
There is lexical, syntactic and semantic ambiguity.
Misspelled or misused words can create problems for text analysis. Autocorrect and grammar correction applications can handle common mistakes, but do not always understand the writer's intention.
In spoken language it is difficult for the machine to understand mispronunciations.
The first application area of Natural Language Processing is machine translation. Machine translation requires complete linguistic analysis of the natural language sentences as well as linguistic generation of an output sentence. Vast progress has happened in the NLP field. Some applications of NLP are as follows:
1. Machine translation
2. Speech recognition
3. Speech synthesis
4. Information retrieval
5. Information extraction
6. Question answering
7. Text summarization
8. Sentiment Analysis
1. Machine translation
In machine translation, the translation of text from one human language to another human language is performed automatically.
For performing the translation, it is important to have knowledge of the words and phrases, the grammar of the two languages that are involved in the translation, the semantics of the languages and knowledge of the world.
2. Speech recognition
Speech recognition is the process where the acoustic speech signals are mapped to the set of words.
As there is wide variation in the pronunciation of the word, homonym for example, sea
5. Information extraction
Information extraction is one of the most significant applications of NLP. It is used for extracting structured information from unstructured or semi-structured machine readable documents. An information extraction system captures and outputs factual information contained within a document.
Like an information retrieval system, an information extraction system also responds to a user's information need. Unlike the information retrieval system, the information required is not expressed as a keyword query. Instead, it is stated as predefined database schemas or templates.
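The idea of stating the information need as a template rather than a keyword query can be sketched as follows. The template slots (`author`, `period`, `program`), the regular expression and the function name `fill_template` are assumptions chosen for illustration; a practical extraction system would rely on much richer linguistic analysis.

```python
import re
from typing import Optional

# Hypothetical template with three slots: author, period, program.
# The surface pattern below is an illustrative assumption.
PROGRAM_TEMPLATE = re.compile(
    r"(?P<author>[A-Z][a-z]+ [A-Z][a-z]+) in (?P<period>\d{4}-\d{2}) "
    r"wrote a program named (?P<program>[A-Z]+)"
)


def fill_template(text: str) -> Optional[dict]:
    """Fill the template slots from unstructured text, or return None."""
    match = PROGRAM_TEMPLATE.search(text)
    return match.groupdict() if match else None


record = fill_template("Terry Winograd in 1968-70 wrote a program named SHRDLU.")
print(record)
```

The output is a structured record whose fields could be stored directly in a database table matching the template.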
The information retrieval system identifies a subset of documents in a large repository or text database. For example, consider a library scenario: a subset of resources
The question answering system tries to find the correct answer, or the part of the text where the answer appears, for the given question and a set of documents.
Text summarization means creating a short, correct summary of longer text documents.
Automatic text summarization assists us with appropriate information in less time.
NLP has an important role in developing automatic text summarization.
Text summarization involves syntactic, semantic, and discourse level processing of text.
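As a rough illustration of automatic summarization, the sketch below picks the sentences whose words are most frequent in the document. This is a simple frequency heuristic swapped in for brevity; it does not perform the syntactic, semantic or discourse level processing mentioned above. The stop word list and the function name `summarize` are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop word list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "is", "in", "to", "and", "for", "it", "this", "are", "was"}


def content_words(text: str) -> list[str]:
    """Lowercase the text and keep only words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 1) -> str:
    """Return the highest scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        # Average document frequency of the sentence's content words.
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


document = "NLP systems process text. Text summarization shortens text. Cats sleep."
print(summarize(document, max_sentences=1))
```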
8. Sentiment Analysis