Unit - 5 Natural Language Processing
Natural Language
Refers to the languages spoken by people, e.g. English or
Hindi, as opposed to artificial languages such as C++ and
Java.
Natural Language Processing
Applications that deal with natural language in one way or
another
[Computational Linguistics]
Doing linguistics on computers
More on the linguistic side than NLP, but closely related.
Why Natural Language Processing?
[Figure: NLP in relation to other fields. Labels: Computers; Databases, Algorithms, Networking, Artificial Intelligence; Machine Translation, Information Retrieval, Language Analysis; Semantics, Parsing]
Linguistic Levels of Analysis
Speech
Written Language
Phonology: Sounds / Letters / Pronunciation
Morphology: the structure of words
Syntax: how sequences of words are structured
Semantics: the meaning of those structured sequences
Interaction between levels
Issues in Syntax
Learning?
Assume a large amount of annotated data = training
Assume a new text not annotated = test
Learn from previous experience to classify new data
Decision trees, memory based learning, neural networks
Machine Learning
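As a minimal sketch of the training/test idea above, the following uses memory-based learning in its simplest form: store the annotated examples and give a new text the label of the most similar stored example. The tiny training set, the labels, and the word-overlap similarity are illustrative assumptions, not part of the original slides.

```python
from collections import Counter

# Hypothetical annotated data (text, label) standing in for the training set.
TRAIN = [
    ("the striker scored a late goal", "sports"),
    ("the team won the final match", "sports"),
    ("the bank raised interest rates", "finance"),
    ("shares fell as the market closed", "finance"),
]

def bag_of_words(text):
    """Represent a text as a multiset of lower-cased words."""
    return Counter(text.lower().split())

def overlap(a, b):
    """Count the word tokens shared by two bags of words."""
    return sum((a & b).values())

def classify(train, text):
    """Return the label of the stored example most similar to `text`."""
    query = bag_of_words(text)
    best_example = max(train, key=lambda pair: overlap(bag_of_words(pair[0]), query))
    return best_example[1]

print(classify(TRAIN, "interest rates fell"))
```

Nothing is generalised at training time here; all the work happens when a new text arrives, which is what distinguishes memory-based learning from decision trees or neural networks.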
Issues in Information Extraction
Extract information
Detect new patterns
Issues in Information Retrieval
General Model
A huge collection of texts
A query
Task: find documents that are relevant to the given
query
How? Create an index, like the index in a book
Examples: Google, Yahoo, AltaVista, etc.
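The index mentioned above can be sketched as an inverted index: a map from each word to the set of documents that contain it. The three sample documents and the rule that a query matches only documents containing all of its words are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical document collection: id -> text.
DOCS = {
    1: "The plant needs water and light.",
    2: "The power plant was shut down.",
    3: "Water the garden every day.",
}

def build_index(docs):
    """Map every word to the set of document ids in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            term = word.strip(".,;:!?")
            if term:
                index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(DOCS)
print(search(INDEX, "plant"))        # documents 1 and 2
print(search(INDEX, "plant water"))  # document 1 only
```

Building the index once means a query is answered by a few set lookups instead of a scan of the whole collection.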
Issues in Information Retrieval (contd.)
Indexing by meaning
Search for plant (= living organism)
Should not retrieve texts with plant
(= industrial plant)
But should retrieve documents including “flora” or
other related terms
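One way to picture retrieval by meaning is to index documents under word senses rather than surface words. In the sketch below, the sense names, the related terms, and the context cue words are all hypothetical; a real system would draw them from a lexical resource and a proper disambiguation step.

```python
# Hypothetical sense inventory: sense -> surface terms that may express it.
SENSE_TERMS = {
    "plant/organism": {"plant", "flora", "vegetation"},
    "plant/factory": {"plant", "factory"},
}

# Hypothetical context cues used to decide which sense a document means.
SENSE_CUES = {
    "plant/organism": {"leaves", "garden", "species", "grow"},
    "plant/factory": {"workers", "production", "industrial", "power"},
}

DOCS = {
    1: "the plant has green leaves",
    2: "the plant employs two hundred workers",
    3: "the flora of the region includes rare species",
}

def senses_in(text):
    """Return the senses that a text expresses, judged by term plus cue."""
    words = set(text.lower().split())
    return {
        sense
        for sense, terms in SENSE_TERMS.items()
        if words & terms and words & SENSE_CUES[sense]
    }

def search_by_sense(docs, sense):
    """Return ids of documents that use the requested sense."""
    return [doc_id for doc_id, text in docs.items() if sense in senses_in(text)]

print(search_by_sense(DOCS, "plant/organism"))
```

With these assumptions the query for the living-organism sense returns the "flora" document and skips the industrial one, even though the latter contains the word "plant".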
Issues in Machine Translation
Morphological Analysis
Syntactic Analysis
Semantic Analysis
Discourse Integration
Pragmatic Analysis
Morphological Analysis
Individual words are analyzed into their components and
nonword tokens such as punctuation are separated from
words
Syntactic Analysis
Linear sequences of words are transformed into structures
that show how the words relate to each other
Semantic Analysis
Structures created by the syntactic analyzer are assigned
meanings
Discourse Integration
The meaning of an individual sentence may depend on
the sentences that precede it and may influence the
meanings of the sentences that follow it
Pragmatic Analysis
The structure representing what was said is
reinterpreted to determine what was actually meant.
Morphological Analysis
Example:
I want to print Bill’s .init file.
The words “want”, “print”, and “file” can each function as
more than one syntactic category
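The first part of morphological analysis, pulling words apart and separating non-word tokens, can be sketched with a small tokenizer. The regular expression below is an assumption chosen to fit the example sentence: it splits the possessive suffix from its noun, keeps a dotted name such as .init as one token, and emits other punctuation separately.

```python
import re

# Alternatives are tried left to right at each position:
#   \.\w+     a dotted name such as .init
#   \w+       an ordinary word
#   ['’]s\b   a possessive suffix (straight or curly apostrophe)
#   [^\w\s]   any other single punctuation mark
TOKEN = re.compile(r"\.\w+|\w+|['’]s\b|[^\w\s]")

def tokenize(sentence):
    """Split a sentence into words, possessive suffixes and punctuation."""
    return TOKEN.findall(sentence)

print(tokenize("I want to print Bill's .init file."))
```

This only produces the tokens. Deciding that "print" is a verb here rather than a noun is left to the later stages, because the word on its own is ambiguous.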
Syntactic Analysis
Exploits the results of morphological analysis to
build a structural description of the sentence.
S → NP VP
NP → the NP1
NP → PRO
NP → PN
NP → NP1
NP1 → ADJS N
ADJS → ε | ADJ ADJS
VP → V
VP → V NP
N → file | printer
PN → Bill
PRO → I
ADJ → short | long | fast
V → printed | created | want
A parse tree simply records the rules and how
they are matched
The parsing process takes the rules of the
grammar and compares them against the
input sentence.
Each rule that matches adds something to the
complete structure
[Figure: parse tree, top levels only. S branches into NP and VP; NP into PN; VP into V and NP]
Top-Down Parsing
Begin with the start symbol and apply the grammar
rules forward until the symbols at the terminals of the
tree correspond to the components of the sentence
being parsed
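Top-down parsing as described above can be sketched as a recursive-descent parser with backtracking. The grammar dictionary is one reading of the rules listed under Syntactic Analysis, with the empty list standing for the empty expansion of ADJS; the function names and the tuple-based tree format are assumptions made for this example.

```python
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["the", "NP1"], ["PRO"], ["PN"], ["NP1"]],
    "NP1":  [["ADJS", "N"]],
    "ADJS": [[], ["ADJ", "ADJS"]],
    "VP":   [["V"], ["V", "NP"]],
    "N":    [["file"], ["printer"]],
    "PN":   [["Bill"]],
    "PRO":  [["I"]],
    "ADJ":  [["short"], ["long"], ["fast"]],
    "V":    [["printed"], ["created"], ["want"]],
}

def expand(symbol, words, pos):
    """Yield (tree, next_pos) for each way `symbol` can cover words[pos:next_pos]."""
    if symbol not in GRAMMAR:  # terminal: must equal the next input word
        if pos < len(words) and words[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:  # apply each rule for this symbol forward
        for children, end in expand_sequence(rhs, words, pos):
            yield (symbol, children), end

def expand_sequence(symbols, words, pos):
    """Yield (trees, next_pos) for each way the symbol sequence can match."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in expand(symbols[0], words, pos):
        for rest, end in expand_sequence(symbols[1:], words, mid):
            yield [tree] + rest, end

def parse(sentence):
    """Return the first parse tree that covers the whole sentence, or None."""
    words = sentence.split()
    for tree, end in expand("S", words, 0):
        if end == len(words):
            return tree
    return None

print(parse("Bill printed the file"))
```

The generators give backtracking for free: when one rule choice fails to cover the sentence, the next alternative is tried. This approach would loop forever on a left-recursive rule, which this grammar does not contain.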
Bottom-Up Parsing
Begin with the sentence to be parsed and apply the
grammar rules backward until a single tree is produced whose
terminals are the words of the sentence and whose top node
is the start symbol
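Bottom-up parsing can be sketched as a shift-reduce recognizer with backtracking: words are shifted onto a stack, and whenever the top of the stack matches the right-hand side of a rule it may be replaced by the left-hand side. To keep the sketch short it uses a simplified subset of the grammar without adjectives or empty expansions, which is an assumption of this example.

```python
# Simplified rule set (an assumption): (left-hand side, right-hand side).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("the", "N")),
    ("NP", ("PN",)),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
    ("N", ("file",)),
    ("N", ("printer",)),
    ("PN", ("Bill",)),
    ("V", ("printed",)),
    ("V", ("created",)),
]

def reduces_to_s(stack, remaining):
    """Return True if the stack plus remaining words can be reduced to S."""
    if stack == ("S",) and not remaining:
        return True
    # Reduce: apply a rule backward when its right-hand side tops the stack.
    for lhs, rhs in RULES:
        n = len(rhs)
        if stack[-n:] == rhs:
            if reduces_to_s(stack[:-n] + (lhs,), remaining):
                return True
    # Shift: move the next input word onto the stack.
    if remaining:
        return reduces_to_s(stack + (remaining[0],), remaining[1:])
    return False

def recognize(sentence):
    """Report whether the sentence is accepted by the simplified grammar."""
    return reduces_to_s((), tuple(sentence.split()))

print(recognize("Bill printed the file"))
```

The search may reduce too early (for example turning "printed" into a complete VP before seeing its object) and then has to back up and try shifting instead, which is the kind of repeated work a chart is meant to avoid.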
Two paths
1. Have the students who missed the exam take it today.
2. Have the students who missed the exam taken it today?
Parser
Chart Parsers
Provide a way of avoiding backup by storing
intermediate constituents
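The idea of storing intermediate constituents can be sketched with a table keyed by (symbol, start position) that records where each constituent can end, so that nothing is analysed twice. This is a memoised top-down recognizer in the spirit of a chart parser, not a full Earley or CKY implementation; the grammar is one reading of the rules under Syntactic Analysis and the function names are assumptions.

```python
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["the", "NP1"], ["PRO"], ["PN"], ["NP1"]],
    "NP1":  [["ADJS", "N"]],
    "ADJS": [[], ["ADJ", "ADJS"]],
    "VP":   [["V"], ["V", "NP"]],
    "N":    [["file"], ["printer"]],
    "PN":   [["Bill"]],
    "PRO":  [["I"]],
    "ADJ":  [["short"], ["long"], ["fast"]],
    "V":    [["printed"], ["created"], ["want"]],
}

def chart_recognize(words, start_symbol="S"):
    """Return (accepted, chart) for a list of words."""
    chart = {}  # (symbol, start) -> frozenset of positions where it can end

    def ends(symbol, start):
        key = (symbol, start)
        if key in chart:  # constituent already worked out: reuse it
            return chart[key]
        if symbol not in GRAMMAR:  # terminal: must equal the next word
            matched = start < len(words) and words[start] == symbol
            result = frozenset({start + 1}) if matched else frozenset()
        else:
            found = set()
            for rhs in GRAMMAR[symbol]:
                positions = {start}
                for part in rhs:  # advance through the rule's symbols
                    positions = {e for p in positions for e in ends(part, p)}
                found |= positions
            result = frozenset(found)
        chart[key] = result
        return result

    accepted = len(words) in ends(start_symbol, 0)
    return accepted, chart

accepted, chart = chart_recognize("Bill printed the file".split())
print(accepted, chart[("NP", 2)])
```

In the two "Have the students ..." sentences, a table like this would let the noun phrase "the students who missed the exam" be analysed once and then reused on whichever path the parser ends up taking.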
Lexical Processing
Look up the individual words in a dictionary and
extract their meaning
E.g. diamond:
A geometrical shape
A baseball field
A valuable gemstone
The process of determining the correct meaning of an
individual word is called word sense disambiguation
or lexical disambiguation.
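A very simple form of word sense disambiguation picks the sense whose typical context words overlap most with the sentence. The sketch below applies that idea to the three senses of "diamond"; the cue words attached to each sense are hypothetical, and a real system would take them from dictionary definitions or annotated text.

```python
# Hypothetical cue words for each sense of "diamond".
SENSES = {
    "geometrical shape": {"shape", "sides", "angles", "rhombus", "draw"},
    "baseball field": {"baseball", "field", "pitcher", "base", "game", "players"},
    "valuable gemstone": {"ring", "carat", "jewel", "gem", "necklace", "cut"},
}

def disambiguate(sentence):
    """Return the best-matching sense of 'diamond', or None if no cue matches."""
    context = set(sentence.lower().split())
    best = max(SENSES, key=lambda sense: len(SENSES[sense] & context))
    if not SENSES[best] & context:
        return None
    return best

print(disambiguate("the players ran onto the diamond before the game"))
```

Returning None when no cue word appears avoids silently guessing the first sense in the table.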
Semantic grammars
Case grammars
Conceptual Parsing
Compositional semantic interpretation / Montague
Analysis
Semantic Grammars