NLP Key Points
A chatbot is a computer program that can learn over time how to best interact with
humans. It can answer questions, troubleshoot customer problems, evaluate and
qualify prospects, generate sales leads, and increase sales on an e-commerce site.
Tokenization - After the text is segmented into sentences, each sentence is further
divided into tokens. A token is any word, number, or special character occurring in
a sentence. Under tokenization, every word, number, and special character is treated
as a separate token.
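As a rough illustration of tokenization, the sketch below splits a sentence with a regular expression. The function name tokenize and the pattern are assumptions made for this example, not a standard tokenizer; real toolkits handle many more cases (contractions, URLs, and so on).

```python
import re

def tokenize(sentence):
    # Runs of letters, digits, or underscores become one token each;
    # every other non-space character becomes its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)

print(tokenize("Hello, world! 42"))
# ['Hello', ',', 'world', '!', '42']
```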
Removing Stop words, Special Characters and Numbers - In this step, tokens that carry
little meaning on their own are removed from the token list: stop words (for example
"the", "is", "and"), special characters, and numbers.
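A minimal sketch of this filtering step is shown below. The STOP_WORDS set is a tiny hypothetical list chosen for the example; real stop word lists are much longer, and whether numbers should be dropped depends on the task.

```python
# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in"}

def remove_noise(tokens):
    # Keep a token only if it is purely alphabetic (this drops numbers and
    # special characters) and is not a stop word (compared in lower case).
    return [t for t in tokens if t.isalpha() and t.lower() not in STOP_WORDS]

print(remove_noise(["The", "price", "is", "42", "dollars", "!"]))
# ['price', 'dollars']
```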
Converting text to a common case - After the stop words are removed, the whole text
is converted to a single case, preferably lower case. This ensures that the machine,
which is case-sensitive, does not treat the same word as different words just
because of differences in case.
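The effect of case conversion can be seen in a small sketch. The helper name to_lower is an assumption for this example; it simply applies Python's built-in str.lower to each token.

```python
def to_lower(tokens):
    # Convert every token to lower case so spelling variants collapse.
    return [t.lower() for t in tokens]

tokens = ["Apple", "APPLE", "apple"]
print(len(set(tokens)))            # 3 distinct strings before conversion
print(len(set(to_lower(tokens))))  # 1 distinct string after conversion
```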
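Stemming - In this step, the remaining words are reduced to their root words. In other
words, stemming is the process in which the affixes of words are removed and the
words are converted to their base form. The resulting stem is not always a
meaningful word; for example, "studies" may be reduced to "studi".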
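To make the idea concrete, here is a deliberately naive suffix-stripping sketch. It is not the Porter stemmer or any library's algorithm; the SUFFIXES tuple and the minimum stem length of 3 are assumptions chosen for illustration.

```python
# Hypothetical suffix list, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    # Strip the first matching suffix, as long as at least 3 characters remain.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

for w in ["healing", "studies", "caring"]:
    print(w, "->", naive_stem(w))
# healing -> heal
# studies -> studi
# caring -> car
```

Note that "studi" is not a real word and "car" is a different word from "care"; blindly removing affixes can produce stems like these.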
Lemmatization - Like stemming, lemmatization removes affixes, but the word we get
after affix removal (known as the lemma) is a meaningful one. With this, we have
normalized our text to tokens, which are the simplest forms of the words present in
the corpus. Now it is time to convert the tokens into numbers. For this, we would
use the Bag of Words algorithm.
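The sketch below illustrates both ideas under simplifying assumptions. Lemmatization is imitated with a small hypothetical lookup table (LEMMA_TABLE); real lemmatizers rely on a full dictionary and grammatical information. Bag of Words is shown as plain word counts over a shared vocabulary; the function names are made up for this example.

```python
from collections import Counter

# Hypothetical lookup table standing in for a real lemmatizer's dictionary.
LEMMA_TABLE = {"studies": "study", "caring": "care", "mice": "mouse"}

def lemmatize(word):
    # Return the lemma if the word is known, otherwise the word unchanged.
    return LEMMA_TABLE.get(word, word)

def bag_of_words(documents):
    # The vocabulary is the sorted set of all tokens across the documents.
    vocabulary = sorted({tok for doc in documents for tok in doc})
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        # One count per vocabulary term; terms missing from the doc count as 0.
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

docs = [["studies", "caring", "study"], ["care", "mice"]]
lemmas = [[lemmatize(t) for t in doc] for doc in docs]
vocabulary, vectors = bag_of_words(lemmas)
print(vocabulary)  # ['care', 'mouse', 'study']
print(vectors)     # [[1, 0, 2], [1, 1, 0]]
```

Each document becomes a list of numbers of the same length as the vocabulary, which is the numeric form a machine can work with.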