
Natural Language Processing

Bachelor of Technology
Computer Science and Engineering

September 2019
CONTENTS

1. Abstract

2. Introduction

3. Brief History

4. Methodology Used In Natural Language Processing

5. Difficulties in NLP

6. Current Study

7. An Application of NLP

8. Conclusion

9. References
Abstract
NLP is a natural enabler of the intelligent, intuitive applications that most of us use every day and is
transforming the way that humans and computers communicate with each other. NLP techniques incorporate a
variety of methods to enable a machine to understand what’s being said or written in human communication—
not just single words—in a comprehensive way. These techniques draw on linguistics, semantics, statistics,
and machine learning to extract entities and relationships and to resolve ambiguities in language. [1]
Powered by these technologies, future natural language processing systems will be able to capitalize on this
potential for human-like understanding of speech and text across a variety of applications.

This paper gives the basic idea about natural language processing and briefly discusses its history. It then goes
on to explain the different methodologies used in NLP and the challenges it faces. The current focus of
research is then discussed, and an application of NLP is described. The paper concludes by summarizing
the above information and offering a glimpse of the future of NLP.

Introduction
Natural Language Processing, usually shortened to NLP, is a branch of artificial intelligence that deals with the
interaction between computers and humans using natural language. The ultimate objective of NLP is to read,
decipher, understand, and make sense of human language in a manner that is valuable.
Most NLP techniques rely on machine learning to derive meaning from human languages.
By utilizing NLP, developers can organize and structure knowledge to perform tasks such as automatic
summarization, translation, named entity recognition, relationship extraction, sentiment analysis, speech
recognition, and topic segmentation.[1]

Natural Language Processing is the driving force behind the following common applications:
1. Language translation applications such as Google Translate
2. Word processors and writing tools such as Microsoft Word and Grammarly that employ NLP to check the
grammatical accuracy of text.
3. Interactive Voice Response (IVR) applications used in call centers to respond to users' requests.
4. Personal assistant applications such as OK Google, Siri, Cortana, and Alexa.

Brief History

Proposals for mechanical translators of languages pre-date the invention of the digital computer. However, the
first major breakthrough in NLP came in 1950, when Alan Turing wrote a paper describing a test for a "thinking"
machine. He stated that if a machine could be part of a conversation through the use of a teleprinter, and it
imitated a human so completely there were no noticeable differences, then the machine could be considered
capable of thinking. Shortly after this, in 1952, the Hodgkin-Huxley model showed how the brain uses neurons
in forming an electrical network. [3]

Until the 1980s, the majority of NLP systems used complex, “handwritten” rules. But in the late 1980s, a
revolution in NLP came about. This was the result of both the steady increase of computational power, and the
shift to Machine Learning algorithms. While some of the early Machine Learning algorithms (decision trees
are a good example) produced systems similar to the old handwritten rules, research has increasingly
focused on statistical models.[4] These statistical models are capable of making soft, probabilistic
decisions. Throughout the 1980s, IBM was responsible for the development of several successful, complicated
statistical models.

In the 1990s, the popularity of statistical models for natural language processing rose dramatically.
Purely statistical NLP methods became remarkably valuable in keeping pace with the tremendous flow
of online text, and N-gram models became useful for recognizing and tracking clumps of linguistic data numerically.[3]
In 1997, LSTM recurrent neural network (RNN) models were introduced, and they found their niche in 2007 for voice
and text processing. Currently, neural network models are considered the cutting edge of research and development
in text understanding and speech generation.

Methodology Used In Natural Language Processing

NLP entails applying algorithms to identify and extract natural language rules so that unstructured
language data is converted into a form that computers can understand. When text has been provided, the
computer utilizes algorithms to extract the meaning associated with every sentence and to collect the
essential data from it. Sometimes the computer may fail to understand the meaning of a sentence well,
leading to unclear results. Syntactic and semantic analysis are the two main techniques used in natural
language processing.

1. Syntax

Syntax is the arrangement of words in a sentence to make grammatical sense. NLP uses syntax to assess the
meaning of a language based on grammatical rules.[2] Some syntactic techniques that can be used, several of
which are illustrated in the sketch after this list:

- Lemmatization: reducing the various inflected forms of a word to a single form for easy analysis.
- Morphological segmentation: dividing words into individual units called morphemes.
- Word segmentation: dividing a large piece of continuous text into distinct units.
- Part-of-speech tagging: identifying the part of speech of every word.
- Parsing: undertaking grammatical analysis of the provided sentence.
- Sentence breaking: placing sentence boundaries in a large piece of text.
- Stemming: cutting inflected words down to their root form.
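
As a concrete illustration, here is a minimal sketch of several of these techniques using NLTK, a toolkit named later in this paper; the library choice and example sentence are assumptions for demonstration, not prescriptions:

import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the tokenizer, tagger, and lexicon this demo needs.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("wordnet")

text = "The striped bats were hanging on their feet. They flew away quickly."

# Sentence breaking: place sentence boundaries in the raw text.
sentences = nltk.sent_tokenize(text)

# Word segmentation: split one sentence into distinct tokens.
tokens = nltk.word_tokenize(sentences[0])

# Part-of-speech tagging: identify the part of speech of every word.
print(nltk.pos_tag(tokens))  # e.g. [('The', 'DT'), ('striped', 'JJ'), ('bats', 'NNS'), ...]

# Stemming: cut inflected words down to a crude root form.
print(PorterStemmer().stem("hanging"))  # 'hang'

# Lemmatization: reduce inflected forms to a dictionary form (a POS hint helps).
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("feet"))       # 'foot'
print(lemmatizer.lemmatize("flew", "v"))  # 'fly'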

2. Semantics

Semantics refers to the meaning that is conveyed by a text. Semantic analysis is one of the difficult aspects of
Natural Language Processing that has not been fully resolved yet.
It involves applying computer algorithms to understand the meaning and interpretation of words and how
sentences are structured.[2]
Here are some techniques in semantic analysis, with a brief sketch of the first one after the list:
- Named entity recognition (NER): determining the parts of a text that can be identified and categorized into preset groups, such as names of people and names of places.
- Word sense disambiguation: giving meaning to a word based on its context.
- Natural language generation: using databases to derive semantic intentions and convert them into human language.
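
A minimal NER sketch with NLTK's built-in chunker (again an assumed toolchain; the sentence is invented for illustration):

import nltk

# One-time downloads for the tokenizer, tagger, and named-entity chunker.
for pkg in ("punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"):
    nltk.download(pkg)

sentence = "Barack Obama visited Paris last year."
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# ne_chunk groups tagged tokens into entities such as PERSON and GPE (place).
tree = nltk.ne_chunk(tagged)
for subtree in tree:
    if hasattr(subtree, "label"):  # entity chunks are subtrees; plain tokens are tuples
        entity = " ".join(word for word, tag in subtree.leaves())
        print(subtree.label(), "->", entity)
# Typical output: PERSON -> Barack Obama, GPE -> Paris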

Difficulties in NLP

NLP is characterized as a difficult problem in computer science, and it has not yet been wholly perfected.
For example, semantic analysis can still be a challenge. Abstract use of language is also typically tricky
for programs to understand; for instance, NLP does not pick up sarcasm easily. Such cases require
understanding not only the words being used but also the context in which they are being used.

There can be different levels of ambiguity:

1. Lexical ambiguity − It occurs at a very primitive level, such as the word level. For example, should the word
"board" be treated as a noun or a verb?
2. Syntax-level ambiguity − A sentence can be parsed in different ways. For example, "He lifted the beetle with
red cap." − Did he use the cap to lift the beetle, or did he lift a beetle that had a red cap?
3. Referential ambiguity − It arises when referring to something using pronouns. For example: Rima went to Gauri.
She said, "I am tired." − Exactly who is tired? One input can have different meanings, and many inputs can mean
the same thing.
As another example, a sentence can change meaning depending on which word the speaker stresses. NLP
is also challenged by the fact that language, and the way people use it, is continually changing. Human language
is rarely precise or plainly spoken. To understand human language is to understand not only the words, but the
concepts and how they are linked together to create meaning. Despite language being one of the easiest things for
the human mind to learn, the ambiguity of language is what makes natural language processing a difficult
problem for computers to master.
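
To make lexical ambiguity concrete, the sketch below applies the classic Lesk algorithm, as shipped with NLTK, to the "board" example above (the tool choice and sentences are assumptions for illustration):

import nltk
from nltk.wsd import lesk

nltk.download("punkt")
nltk.download("wordnet")

# "board" is lexically ambiguous: a plank of wood, a committee, to get on a ship...
sent1 = nltk.word_tokenize("She nailed the wooden board to the fence.")
sent2 = nltk.word_tokenize("The board of directors approved the merger.")

# Lesk picks the WordNet sense whose definition best overlaps the context words.
print(lesk(sent1, "board", "n"))  # a Synset for one noun sense of "board"
print(lesk(sent2, "board", "n"))  # often a different Synset, e.g. the committee sense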

Current Study

Current approaches to NLP are based on deep learning, a type of AI that examines and uses patterns in data to
improve a program's understanding. Deep learning models require massive amounts of labeled data to train on
and identify relevant correlations, and assembling this kind of big data set is currently one of the main hurdles
for NLP.
Earlier approaches to NLP were more rules-based: simpler machine learning algorithms were told what words
and phrases to look for in text and were given specific responses when those phrases appeared. Deep learning,
by contrast, is a more flexible, intuitive approach in which algorithms learn to identify speakers' intent from
many examples, much as a child learns human language.
Three tools commonly used for NLP are NLTK, Gensim, and Intel NLP Architect. NLTK, the Natural
Language Toolkit, is an open-source collection of Python modules with data sets and tutorials. Gensim is a
Python library for topic modeling and document indexing. Intel NLP Architect is another Python library, for
deep learning topologies and techniques.[7]
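
As a small taste of one of these tools, here is a hedged topic-modeling sketch with Gensim; the toy corpus is invented purely for illustration:

from gensim import corpora
from gensim.models import LdaModel

# Four tiny pre-tokenized "documents" about two rough themes: speech and text.
docs = [
    ["speech", "recognition", "audio", "signal"],
    ["translation", "grammar", "text", "language"],
    ["audio", "noise", "signal", "speech"],
    ["parsing", "grammar", "language", "text"],
]

# Map each token to an integer id, then build bag-of-words vectors.
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Fit a small LDA model that tries to recover two latent topics.
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
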
An Application of NLP

NLP is everywhere, even if we don't realize it. Does your email application automatically warn you when you
try to send an email without the attachment that you referenced in the text of the email? That is natural
language processing at work.
Sentiment analysis is an important application of natural language processing.
The goal of sentiment analysis is to identify sentiment among several posts or even in the same post where
emotion is not always explicitly expressed. Companies use natural language processing applications, such as
sentiment analysis, to identify opinions and sentiment online and to understand what customers think
about their products and services (e.g., "I love the new iPhone" and, a few lines later, "But sometimes it doesn't
work well," where the person is still talking about the iPhone), as well as overall indicators of their reputation.[5]
Beyond determining simple polarity, sentiment analysis understands sentiment in context to help you better
understand what’s behind an expressed opinion, which can be extremely relevant in understanding and driving
purchasing decisions.[5]
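
A minimal sentiment-analysis sketch using the VADER analyzer bundled with NLTK (an assumed tool; the paper describes the task, not a specific library), applied to the iPhone example above:

import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")
sia = SentimentIntensityAnalyzer()

# Opinion can shift within a single review, as in the iPhone example.
for sentence in ["I love the new iPhone.",
                 "But sometimes it doesn't work well."]:
    scores = sia.polarity_scores(sentence)
    # 'compound' is a normalized score: > 0 leans positive, < 0 leans negative.
    print(sentence, "->", scores["compound"])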

Conclusion

While NLP is a relatively recent area of research and application compared to other information technology
approaches, there have been sufficient successes to date to suggest that NLP-based information access
technologies will continue to be a major area of research and development in information systems now and far
into the future.
State-of-the-art Natural Language Processing techniques are applied to speech technologies, specifically to
Text-To-Speech (TTS) synthesis and Automatic Speech Recognition (ASR). In TTS, the importance of NLP is
reflected in processing the input text to be synthesized: the naturalness of the speech utterances produced by the
signal-processing modules is tightly bound to the performance of the preceding text-processing modules. In ASR,
the use of NLP is largely complementary.[6]
ASR simplifies the recognition task by assuming that the input speech utterances are produced according to a
predefined set of grammatical rules; its capabilities can nevertheless be enhanced through the use of NLP, aiming
at more natural interfaces with a certain degree of knowledge.[6] Major approaches have accordingly been proposed
for language model adaptation in order to profit from this specific knowledge.
