Natural Language Processing
SCHOOL: School of Engineering
COURSE TYPE: L-T-P
NAME: Natural Language Processing with Python
COURSE CREDIT: 04 [4-0-0]
DEPARTMENT: Computer Science
CATEGORY:
CODE: XXXXXX
SEMESTER: 4th
THEORY
Learning objectives: On completion of the course, students will be able to extract information
from text automatically using concepts and methods from natural language processing (NLP), and
to develop speech-based applications that use speech analysis (phonetics, speech recognition, and
synthesis) and that can analyze the syntax, semantics, and pragmatics of a statement written in a
natural language.
Prerequisite: Students should have basic prior knowledge of subjects such as Design and Analysis
of Algorithms, Formal Language and Automata, and Compiler Design.
Course content/Syllabus:
Module                                    No. of lectures/Contact hours   Weightage (%)
Module-I: Introduction to NLP             15                              25
Module-II: Semantics and Word Vectors     6                               15
Module-III: N‐Gram Language Model         4                               10
Module-IV: Text Representation            5                               10
Module-V: Parsing                         6                               15
Module-VI: Applications of NLP            12                              25
SYLLABUS OUTLINE:
Module-I: Introduction to NLP [15L]
Introduction to NLP, Origins and challenges of NLP, Sentence Segmentation, Tokenization,
Part-of-Speech (POS) tagging, Stemming, Lemmatization, Stop Words.
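Illustrative example (Module-I): the preprocessing steps listed above can be sketched with the Python standard library alone. This is a minimal sketch, not a prescribed course implementation: the regular expressions, the tiny STOP_WORDS set, and the function names are assumptions made for illustration, and production toolkits handle many cases (abbreviations, hyphenation, Unicode) that this sketch ignores.

```python
import re

# Illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}

def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lowercase word tokenization; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", sentence.lower())

def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

text = "NLP is fun. The parser reads the tokens of a sentence!"
sentences = split_sentences(text)
tokens = [remove_stop_words(tokenize(s)) for s in sentences]
print(sentences)
print(tokens)
```
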
Module-II: Semantics and Word Vectors [6L]
Overview of semantics, Word vectors, Word embeddings, Representation of words and phrases.
Module-III: N‐Gram Language Model [4L]
Introduction to N‐Gram, N‐Gram probability estimation and perplexity, Smoothing techniques.
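Illustrative example (Module-III): a minimal sketch of bigram probability estimation with add-one (Laplace) smoothing, followed by perplexity. The toy corpus, the boundary markers `<s>` and `</s>`, and all function names are assumptions for illustration; add-one smoothing is only one of several possible smoothing techniques, and the sketch assumes every test word occurs in the training vocabulary.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count bigrams and their left contexts over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        contexts.update(toks[:-1])            # every token that has a successor
        bigrams.update(zip(toks, toks[1:]))
    # Tokens the model can predict: all words plus the end marker.
    vocab = {t for sent in corpus for t in sent} | {"</s>"}
    return contexts, bigrams, vocab

def prob(prev, word, contexts, bigrams, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def perplexity(sent, contexts, bigrams, vocab):
    """exp of the average negative log-probability per predicted token."""
    toks = ["<s>"] + sent + ["</s>"]
    log_p = sum(math.log(prob(p, w, contexts, bigrams, vocab))
                for p, w in zip(toks, toks[1:]))
    return math.exp(-log_p / (len(toks) - 1))

corpus = [["i", "like", "nlp"], ["i", "like", "python"]]
model = train_bigram(corpus)
print(prob("like", "nlp", *model))            # (1 + 1) / (2 + 5)
print(perplexity(["i", "like", "nlp"], *model))
print(perplexity(["nlp", "like", "i"], *model))
```

A word order seen in training receives a lower perplexity than a scrambled one, which is the behaviour perplexity is meant to capture.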
Module-IV: Text Representation [5L]
Bag‐of‐words: TF‐IDF, Count vectors, Vector Space Model, Latent Semantic Analysis,
Word embeddings, Word2Vec, GloVe, fastText, Sentence embedding technique: Doc2Vec.
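Illustrative example (Module-IV): a minimal sketch of TF-IDF weighting over a bag-of-words representation, with cosine similarity in the resulting vector space. The weighting variant used here (relative term frequency multiplied by log(N/df)) is one common choice and is an assumption; libraries often apply different smoothing and normalization. The toy documents and function names are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "dog", "barked"]]
vectors = tf_idf(docs)
print(vectors[0])
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

A term that appears in every document ("the") gets weight zero, so it contributes nothing to similarity.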
Module-V: Parsing [6L]
Syntax Parsing, Grammar formalisms and treebanks, Parsing with Context Free Grammars,
Features and Unification, Statistical parsing and probabilistic CFGs.
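Illustrative example (Module-V): a minimal sketch of parsing with a context-free grammar, using a CKY-style recognizer over a grammar in Chomsky normal form. The toy grammar, lexicon, and function name are hypothetical; the sketch only decides whether a sentence is derivable and does not build parse trees or use rule probabilities.

```python
from itertools import product

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
BINARY = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}}
LEXICON = {"the": {"Det"}, "dog": {"N"}, "cat": {"N"}, "chased": {"V"}}

def cky_recognize(words, start="S"):
    """Return True if the grammar derives `words` from the start symbol."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that span words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point between i and j
                for left, right in product(table[i][k], table[k][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n]

print(cky_recognize("the dog chased the cat".split()))
print(cky_recognize("dog the chased".split()))
```
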
Module-VI: Applications of NLP [12L]
Information Extraction, Introduction to Named Entity Recognition and Relation Extraction,
Question Answering, Text Summarization, Dialog System, Machine Translation.
Pedagogy for Course Delivery: Hybrid Mode (Offline Class/Presentation/Video/MOODLE/NPTEL)
List of Professional Skill Development Activities (PSDA): NA
Continuous assessment: Quiz/assessment/presentation/problem solving etc.
Text & Reference books:
Text Books:
1. Tanveer Siddiqui and U.S. Tiwary, “Natural Language Processing and Information Retrieval”,
Oxford University Press, 2008.
2. Anne Kao and Stephen R. Poteet (Eds.), “Natural Language Processing and Text Mining”,
Springer-Verlag London Limited, 2007.
Reference Books:
1. Daniel Jurafsky and James H. Martin, “Speech and Language Processing: An Introduction to
Natural Language Processing, Computational Linguistics and Speech Recognition”, 2nd Edition,
Prentice Hall, 2008.
2. James Allen, “Natural Language Understanding”, 2nd Edition, Benjamin/Cummings
Publishing Company, 1995.
3. Gerald J. Kowalski and Mark T. Maybury, “Information Storage and Retrieval Systems”,
Kluwer Academic Publishers, 2000.
CO-PO Mapping
PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1 3 2 - - - - - - - - - 1
CO2 - 2 - - - - - - - - -
CO3 2 2 - - - - - - - - - 1
CO4 - 3 3 2 - - - - - - - 2
CO5 1 - 3 3 3 - - - - - - 1
CO6 1 - 3 2 2 1 - - - - - 1
Avg 1.75 2.25 3 2.33 2.5 1 - - - - - 1.5
Highly Correlated: 3
Moderately Correlated: 2
Slightly Correlated: 1
Course learning outcomes (CO):
1XXXXX. CO1: To be able to understand the fundamental concepts and techniques of natural language
processing. (BT2)
1XXXXX. CO2: To be able to distinguish among the various techniques, taking into account the
assumptions, strengths, and weaknesses of each. (BT2)
1XXXXX. CO3: To be able to understand appropriate descriptions, visualizations, and statistics to
communicate the problems and their solutions. (BT2)
1XXXXX. CO4: To be able to analyze large volumes of text data generated from a range of real-world
applications. (BT4)
1XXXXX. CO5: To be able to apply machine learning algorithms to natural language processing. (BT5)
1XXXXX. CO6: To be able to develop various applications that use text and speech analysis. (BT6)