What Is Natural Language Processing?
What Is Natural Language Processing?
Natural language processing strives to build machines that understand and respond to
text or voice data—and respond with text or speech of their own—in much the same
way humans do.
NLP drives computer programs that translate text from one language to another,
respond to spoken commands, and summarize large volumes of text rapidly—even in
real time. There’s a good chance you’ve interacted with NLP in the form of voice-
operated GPS systems, digital assistants, speech-to-text dictation software, customer
service chatbots, and other consumer conveniences. But NLP also plays a growing role
in enterprise solutions that help streamline business operations, increase employee
productivity, and simplify mission-critical business processes.
NLP tasks
Human language is filled with ambiguities that make it incredibly difficult to write
software that accurately determines the intended meaning of text or voice data.
Homonyms, homophones, sarcasm, idioms, metaphors, grammar and usage
exceptions, variations in sentence structure—these just a few of the irregularities of
human language that take humans years to learn, but that programmers must teach
natural language-driven applications to recognize and understand accurately from the
start, if those applications are going to be useful.
Several NLP tasks break down human text and voice data in ways that help the
computer make sense of what it's ingesting. Some of these tasks include the following:
See the blog post “NLP vs. NLU vs. NLG: the differences between three natural
language processing concepts” for a deeper look into how these concepts relate.
The Python programing language provides a wide range of tools and libraries for
attacking specific NLP tasks. Many of these are found in the Natural Language Toolkit,
or NLTK, an open source collection of libraries, programs, and education resources for
building NLP programs.
The NLTK includes libraries for many of the NLP tasks listed above, plus libraries for
subtasks, such as sentence parsing, word segmentation, stemming and lemmatization
(methods of trimming words down to their roots), and tokenization (for breaking
phrases, sentences, paragraphs and passages into tokens that help the computer better
understand the text). It also includes libraries for implementing capabilities such as
semantic reasoning, the ability to reach logical conclusions based on facts extracted
from text.
Enter statistical NLP, which combines computer algorithms with machine learning
and deep learning models to automatically extract, classify, and label elements of text
and voice data and then assign a statistical likelihood to each possible meaning of those
elements. Today, deep learning models and learning techniques based on convolutional
neural networks (CNNs) and recurrent neural networks (RNNs) enable NLP systems
that 'learn' as they work and extract ever more accurate meaning from huge volumes of
raw, unstructured, and unlabeled text and voice data sets.
For a deeper dive into the nuances between these technologies and their learning
approaches, see “AI vs. Machine Learning vs. Deep Learning vs. Neural Networks:
What’s the Difference?”
Spam detection: You may not think of spam detection as an NLP solution, but
the best spam detection technologies use NLP's text classification capabilities
to scan emails for language that often indicates spam or phishing. These
indicators can include overuse of financial terms, characteristic bad grammar,
threatening language, inappropriate urgency, misspelled company names, and
more. Spam detection is one of a handful of NLP problems that experts
consider 'mostly solved' (although you may argue that this doesn’t match your
email experience).
Machine translation: Google Translate is an example of widely available NLP
technology at work. Truly useful machine translation involves more than
replacing words in one language with words of another. Effective translation
has to capture accurately the meaning and tone of the input language and
translate it to text with the same meaning and desired impact in the output
language. Machine translation tools are making good progress in terms of
accuracy. A great way to test any machine translation tool is to translate text to
one language and then back to the original. An oft-cited classic example: Not
long ago, translating “The spirit is willing but the flesh is weak” from English to
Russian and back yielded “The vodka is good but the meat is rotten.” Today,
the result is “The spirit desires, but the flesh is weak,” which isn’t perfect, but
inspires much more confidence in the English-to-Russian translation.
Virtual agents and chatbots: Virtual agents such as Apple's Siri and
Amazon's Alexa use speech recognition to recognize patterns in voice
commands and natural language generation to respond with appropriate action
or helpful comments. Chatbots perform the same magic in response to typed
text entries. The best of these also learn to recognize contextual clues about
human requests and use them to provide even better responses or options over
time. The next enhancement for these applications is question answering, the
ability to respond to our questions—anticipated or not—with relevant and
helpful answers in their own words.
Social media sentiment analysis: NLP has become an essential business tool
for uncovering hidden data insights from social media channels. Sentiment
analysis can analyze language used in social media posts, responses, reviews,
and more to extract attitudes and emotions in response to products,
promotions, and events–information companies can use in product designs,
advertising campaigns, and more.
Text summarization: Text summarization uses NLP techniques to digest huge
volumes of digital text and create summaries and synopses for indexes,
research databases, or busy readers who don't have time to read full text. The
best text summarization applications use semantic reasoning and natural
language generation (NLG) to add useful context and conclusions to
summaries.