Application of Computational Linguistics


Computational linguistics is the scientific study of human language from a computational point
of view. Computational linguists provide computer models of different types of linguistic
phenomena. Computer oriented studies have evolved into a hybrid type called computational
linguistics. As an interdisciplinary field, computational linguistics has a history of nearly
half a century. The ultimate goal of computational linguistics is to explain the basic
techniques used to create computer models for the generation and understanding of natural
language.

Analysis of application of computational linguistics
10. Applications
As indicated at the outset, applications of computational linguistics techniques range from
those minimally dependent on linguistic structure and meaning, such as document retrieval
and clustering, to those that attain some level of competence in comprehending and using
language, such as dialogue agents that provide help and information in limited domains like
personal scheduling, flight booking, or help desks, and intelligent tutoring systems. In the
following we enumerate some of these applications. In several cases (especially machine
translation) we have already provided considerable detail, but the intent here is to provide a
bird's eye view of the state of the art, rather than technical elucidations.
With the advent of ubiquitous computing, it has become increasingly difficult to provide a
systematic categorization of NLP applications: Keyword-based retrieval of documents (or
snippets) and database access are integrated into some dialogue agents and many voice-based
services; animated dialogue agents interact with users both in tutoring systems and games;
chatbot techniques are incorporated into various useful or entertaining agents as backends;
and language-enabled robots, though distinctive in combining vision and action with
language, are gradually being equipped with web access, QA abilities, tutorial functions, and
no doubt eventually with collaborative problem solving abilities. Thus the application
categories in the subsections that follow, rather than being mutually exclusive, are ever more
intertwined in practice.

10.1 Machine translation (again)


One of the oldest MT systems is SYSTRAN, which was developed as a rule-based system
beginning in the 1960s, and has been extensively used by US and European government
agencies, and also in Yahoo! Babel Fish and (until 2007) in Google Translate. In 2010, it was
hybridized with statistical MT techniques. As mentioned, Google Translate currently uses
phrase-based MT, with English serving as an interlingua for the majority of language pairs.
Microsoft's Bing Translator employs dependency structure analysis together with statistical
MT. Other very comprehensive translation systems include Asia Online and WorldLingo.
Many systems for small language groups exist as well, for instance for translating between
Punjabi and Hindi (the Direct MT system), or between a few European languages (e.g.,
OpenLogos, IdiomaX, and GramTrans).
Translations remain error-prone, but their quality is usually sufficient for readers to grasp the
general drift of the source contents. No more than that may be required in many cases, such
as international web browsing (an application scarcely anticipated in decades of MT
research). Also, MT applications on hand-held devices, designed to aid international
travellers, can be sufficiently accurate for limited purposes such as asking directions or
emergency help, interacting with transportation personnel, or making purchases or
reservations. When high-quality translations are required, automatic methods can be used as
an aid to human translators, but subtle issues may still absorb a large portion of a translator's
time.

10.2 Document retrieval and clustering applications


Information retrieval has long been a central theme of information science, covering retrieval
of both structured data such as are found in relational databases as well as unstructured text
documents (e.g., Salton 1989). Retrieval criteria for the two types of data are not unrelated,
since both structured and unstructured data often require content-directed retrieval. For
example, while users of an employee database may wish at times to retrieve employee
records by the unique name or ID of employees, at other times they may wish to retrieve all
employees in a certain employment category, perhaps with further restrictions such as falling
into a certain salary bracket. This is accomplished with the use of “inverted files” that
essentially index entities under their attributes and values rather than their identifiers. In the
same way, text documents might be retrieved via some unique label, or they might instead be
retrieved in accord with their relevance to a certain query or topic header. The simplest
notion of relevance is that the documents should contain the terms (words or short phrases)
of the query. However, terms that are distinctive for a document should be given more
weight. Therefore a standard measure of relevance, given a particular query term, is the
tf-idf (term frequency–inverse document frequency) for the term, which increases (e.g.,
logarithmically) with the frequency of occurrences of the term in the document but is
discounted to the extent that it occurs frequently in the set of documents as a whole.
Summing the tf-idf's of the query terms yields a simple measure of document relevance.
Shortcomings of this method are first, that it underrates term co-occurrences if each term
occurs commonly in the document collection (for instance, for the query “rods and cones of
the eye”, co-occurrences of rods, cones, and eye may well characterize relevant documents,
even though all three terms occur quite commonly in non-physiological contexts), and
second, that relevant documents might have few occurrences of the query terms, while
containing many semantically related terms. Some of the vector methods mentioned in
connection with document clustering can be used to alleviate these shortcomings. We may
reduce the dimensionality of the term-based vector space using LSA, obtaining a much
smaller “concept space” in which many terms that tend to co-occur in documents will have
been merged into the same dimension (concept). Thus sharing of concepts, rather than
sharing of specific terms, becomes the basis for measuring relevance.
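
The tf-idf relevance measure described above can be illustrated with a minimal sketch. The
whitespace tokenizer, the particular weighting (1 + log of the term count, times the log of
the inverse document frequency), the function names, and the three sample documents are all
assumptions made for illustration; many other tf-idf variants are in use.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()

def tf_idf_scores(query, documents):
    """Return one relevance score per document: the sum of tf-idf over the query terms."""
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(doc_tokens)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))
    scores = []
    for tokens in doc_tokens:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term] == 0:
                continue  # the term does not occur in this document
            tf = 1.0 + math.log(counts[term])   # grows logarithmically with frequency
            idf = math.log(n_docs / df[term])   # zero for a term found in every document
            score += tf * idf
        scores.append(score)
    return scores

# Invented sample documents (an assumption for illustration only).
docs = [
    "rods and cones are photoreceptor cells in the retina of the eye",
    "the traffic cones and steel rods were stacked in the yard",
    "the eye of the storm passed over the coast",
]
scores = tf_idf_scores("rods cones eye", docs)
best = max(range(len(docs)), key=lambda i: scores[i])  # index of the top-scoring document
```

In this toy collection each query term occurs in two of the three documents, so every term
receives the same discount and the ranking is decided purely by how many query terms a
document contains. This is a small-scale version of the co-occurrence shortcoming noted above.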
Document clustering is useful when large numbers of documents need to be organized for
easy access to topically related items, for instance in collections of patent descriptions,
medical histories or abstracts, legal precedents, or captioned images, often in hierarchical
fashion. Clustering is also useful in exploratory data analysis (e.g., in exploring token
occurrences in an unknown language), and indirectly supports various NLP applications
because of its utility in improving language models, for instance in providing word clusters
to be used for backing off from specific words in cases of data sparsity.
Clustering is widely used in other areas, such as biological and medical research and
epidemiology, market research and grouping and recommendation of shopping items,
educational research, social network analysis, geological analysis, and many others.
Document retrieval and clustering often serve as preliminary steps in information extraction
(IE) or text mining, two overlapping areas concerned with extracting useful knowledge from
documents, such as the main features of named entities (category, roles in relation to other
entities, location, dates, etc.) or of particular types of events, or inferring rule-like
correlations between relational terms (e.g., that purchasing of one type of product correlates
with purchasing another).
We will not attempt to survey IE/text mining applications comprehensively, but the next two
subsections, on summarization and sentiment analysis, are subareas of particular interest here
because of their emphasis on the semantic content of texts.

10.3 Knowledge extraction and summarization


Extracting knowledge or producing summaries from unstructured text are ever more
important applications, in view of the deluge of documents issuing forth from news media,
organizations of every sort, and individuals. This unceasing stream of information makes it
difficult to gain an overview of the items relevant to some particular purpose, such as basic
data about individuals, organizations and consumer products, or the particulars of accidents,
earthquakes, crimes, company take-overs, product maintenance and repair activities, medical
research results, and so on.
One commonly used method in both knowledge extraction and certain types of “rote”
summarization relies on the use of extraction patterns; these are designed to match the kinds
of conventional linguistic patterns typically used by authors to express the information of
interest. For example, text corpora or newswire might be mined for information about
companies, by keying in on known company names and terms such as “Corp.”, “.com”,
“headquartered at”, and “annual revenue of”, as well as parts of speech and dependency
relations, and matching regular-expression patterns against local text segments containing
key phrases or positioned close to them. As another example, summarization of earthquake
reports might extract expected information such as the epicenter of the quake, its magnitude
on the Richter scale, the time and duration of the event, affected population centers, extent of
death tolls, injuries, and property damage, consequences such as fires and tsunamis, etc.
Extraction patterns can usually be thought of as targeting particular attributes in
predetermined attribute-value frames (e.g., a frame for company information or a frame for
facts about an earthquake), and the filled-in frames may themselves be regarded as
summaries, or may be used to generate natural-language summaries. Early systems of this
type were FRUMP (DeJong 1982) and JASPER (Andersen et al. 1992). Among the hundreds
of more modern extraction systems, a particularly successful one in competitions has been
SRI's “Fastus” (Hobbs et al. 1997).
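
As a rough sketch of the pattern-based approach (and not a reconstruction of FRUMP, JASPER,
or Fastus), the following fills a small earthquake frame using regular expressions. The slot
names, the patterns, and the sample report are invented for illustration; a realistic system
would need many more patterns, plus the part-of-speech and dependency information mentioned
above.

```python
import re

# Hypothetical extraction patterns, each targeting one slot of an earthquake frame.
PATTERNS = {
    "magnitude": re.compile(r"magnitude[- ](\d+(?:\.\d+)?)", re.IGNORECASE),
    "epicenter": re.compile(
        r"epicenter (?:was )?(?:located )?(?:near|in|at) ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
    ),
    "deaths": re.compile(
        r"(\d[\d,]*) (?:people )?(?:were )?(?:killed|dead|died)", re.IGNORECASE
    ),
}

def fill_frame(text):
    """Return an attribute-value frame; slots with no matching pattern stay None."""
    frame = {}
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        frame[slot] = match.group(1) if match else None
    return frame

# Invented report text (an assumption for illustration only).
report = (
    "A magnitude 6.4 earthquake struck early on Tuesday. "
    "The epicenter was located near Port Vila. "
    "Officials said 12 people were killed and hundreds injured."
)
frame = fill_frame(report)
```

The filled frame can be read as a rote summary of the report, or passed to a template-based
generator to produce a natural-language summary.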
Note that whether a pattern-based system is viewed as a knowledge extraction system or
summarization system depends on the text it is applied to. If all the information of interest is
bundled together in a single, extended text segment (as in the case of earthquake reports),
then the knowledge extracted can be viewed as a summary of the segment. If instead the
information is selectively extracted from miscellaneous sentences scattered through large
text collections, with most of the material being ignored as irrelevant to the purposes of
extraction, then we would view the activity of the system as information extraction rather
than summarization.
When a document to be summarized cannot be assumed to fall into some predictable
category, with the content structured and expressed in a stereotyped way, summarization is
usually performed by selecting and combining “central sentences” from the document. A
sentence is central to the extent that many other sentences in the document are similar to it,
in terms of shared word content or some more sophisticated similarity measure such as one
based on the tf-idf metric for terms, or a cosine metric in a dimensionality-reduced vector
space (thus it is as if we were treating individual sentences as documents, and finding a few
sentences whose “relevance” to the remaining sentences is maximal). However, simply
returning a sequence of central sentences will not in general yield an adequate summary. For
example, such sentences may contain unresolved pronouns or other referring expressions,
whose referents may need to be sought in non-central sentences. Also, central “sentences”
may actually be clauses embedded in lengthier sentences that contain unimportant
supplementary information. Heuristic techniques need to be applied to identify and excise the
extra material, and extracted clauses need to be fluently and coherently combined. In other
cases, complex descriptions should be more simply and abstractly paraphrased. For example,
an appropriate condensation of a sentence such as “The tornado carried off the roof of a local
farmhouse, and reduced its walls and contents to rubble” might be “The tornado destroyed a
local farmhouse.” But while some of these issues are partially addressed in current systems,
human-like summarization will require much deeper understanding than is currently
attainable. Another difficulty in this area (even more so than in machine translation) is the
evaluation of summaries. Even human judgments differ greatly, depending, for instance, on
the sensitivity of the evaluator to grammatical flaws, versus inadequacies in content.
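
The selection of central sentences can be sketched as follows, assuming a plain bag-of-words
representation and cosine similarity in place of the tf-idf or dimensionality-reduced
measures mentioned above. The function names and the sample sentences are illustrative
assumptions.

```python
import math
import re
from collections import Counter

def bag_of_words(sentence):
    """Count lowercase word tokens in a sentence."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def central_sentences(sentences, k=1):
    """Return the k sentences most similar to all the others, in document order."""
    vectors = [bag_of_words(s) for s in sentences]
    centrality = [
        sum(cosine(v, w) for j, w in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]
    ranked = sorted(range(len(sentences)), key=lambda i: centrality[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

# Invented sample text (an assumption for illustration only).
sentences = [
    "The tornado destroyed a local farmhouse.",
    "The tornado carried off the roof of the farmhouse.",
    "The farmhouse walls were reduced to rubble by the tornado.",
    "Residents are now waiting for insurance assessors.",
]
summary = central_sentences(sentences, k=1)
```

Because raw word counts are used, function words such as "the" weigh heavily in the
similarity scores, which is one reason practical systems prefer tf-idf weighting or a reduced
vector space. The sketch also does nothing about unresolved pronouns or superfluous clauses.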

10.4 Sentiment analysis


Sentiment analysis refers to the detection of positive or negative attitudes (or more specific
attitudes such as belief or contempt) on the part of authors of articles or blogs towards
commercial products, films, organizations, persons, ideologies, etc. This has become a very
active area of applied computational linguistics, because of its potential importance for
product marketing and ranking, social network analysis, political and intelligence analysis,
classification of personality types or disorders based on writing samples, and other areas. The
techniques used are typically based on sentiment lexicons that classify the affective polarity
of vocabulary items, and on supervised machine learning applied to texts from which word
and phrasal features have been extracted and that have been hand-labeled as expressing
positive or negative attitudes towards some theme. Instead of manual labeling, existing data
can sometimes be used to provide a priori classification information. For example, average
numerical ratings of consumer products or movies produced by bloggers may be used to
learn to classify unrated materials belonging to the same or similar genres. In fact, affective
lexical categories and contrast relations may be learnable from such data; for example,
frequent occurrences of phrases such as great movie or pretty good movie or terrible movie in
blogs concerning movies with high, medium, and low average ratings may well suggest
that great, pretty good, and terrible belong to a contrast spectrum ranging from a very
positive to a very negative polarity. Such terminological knowledge can in turn boost the
coverage of generic sentiment lexicons. However, sentiment analysis based on lexical and
phrasal features has obvious limitations, such as obliviousness to sarcasm and irony (“This
is the most subtle and sensitive movie since The Texas Chainsaw Massacre”), quotation of
opinions contrasting with the author's (“According to the ads, Siri is the greatest app since
iTunes, but in fact …”), and lack of understanding of entailments (“You'll be much better off
buying a pair of woolen undies for the winter than purchasing this item”). Thus researchers
are attempting to integrate knowledge-based and semantic analysis with superficial word-
and phrase-based sentiment analysis.
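
A lexicon-based classifier of the superficial kind just described can be sketched in a few
lines. The lexicon entries and their weights are invented for illustration and do not come
from any published sentiment lexicon.

```python
import re

# Toy sentiment lexicon; the entries and weights are illustrative assumptions.
LEXICON = {
    "great": 2, "good": 1, "subtle": 1, "sensitive": 1,
    "terrible": -2, "boring": -1, "bad": -1,
}

def polarity(text):
    """Sum the lexicon weights of the words in the text (unknown words count as 0)."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)

def label(text):
    """Map the summed polarity to a coarse sentiment label."""
    score = polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

plain = label("A great movie with a good cast")
sarcastic = label(
    "This is the most subtle and sensitive movie since The Texas Chainsaw Massacre"
)
```

Here `sarcastic` comes out as "positive", since the scorer sees only the words "subtle" and
"sensitive". This is the obliviousness to sarcasm and irony noted above.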

10.5 Chatbots and companionable dialogue agents


Current chatbots are the descendants of Weizenbaum's ELIZA (see section 1.2), and are
typically used (often with an animated “talking head” character) for entertainment, or to
engage the interest of visitors to the websites of certain “dotcoms”. They may be equipped
with large hand-crafted scripts (keyword-indexed input-response schemas) that enable them
to answer simple inquiries about the company and their products, with some ability to
respond to miscellaneous topics and to exchange greetings and pleasantries. A less benign
application is the use of chatbots posing as visitors to social network sites, or interactive
game sites, with the aim of soliciting private information from unwitting human participants,
or recommending websites or products to them. As a result, many social networking sites
have joined other bot-targeted sites in using CAPTCHAS to foil bot entry.
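
The keyword-indexed input-response schemas mentioned above can be sketched as an ordered list
of patterns paired with canned responses. The script entries and the default reply are
hypothetical; actual chatbot scripts are far larger and usually hand-tuned.

```python
import re

# Hypothetical script: each entry pairs a keyword pattern with a response template.
SCRIPT = [
    (re.compile(r"\b(hello|hi|good morning)\b", re.I), "Hello! How can I help you today?"),
    (re.compile(r"\b(price|cost)\b", re.I), "Our pricing is listed on the Products page."),
    (re.compile(r"\bi feel (\w+)", re.I), "Why do you feel {0}?"),
]
DEFAULT = "Tell me more."

def respond(utterance):
    """Return the response of the first schema whose pattern matches the input."""
    for pattern, template in SCRIPT:
        match = pattern.search(utterance)
        if match:
            # Captured words are echoed back where the template has a placeholder.
            return template.format(*match.groups())
    return DEFAULT  # fallback for inputs that no schema anticipates
```

For example, `respond("I feel tired")` echoes the captured word back in a question, much as
ELIZA did, while any input that matches no keyword receives the default reply.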
Companionable dialogue agents (also called relational agents) have so far relied rather
heavily on chatbot techniques, i.e., authored input patterns and corresponding outputs. But
the goal is to transcend these techniques, creating agents (often with talking heads or other
animated characters) with personality traits and capable of showing emotion and empathy;
they should have semantic and episodic memory, learning about the user over the long term
and providing services to the user. Those services might include, besides companionship and
support: advice in some areas of life, health and fitness, schedule maintenance, reminders,
question answering, tutoring (e.g., in languages), game playing, and internet services. Yorick
Wilks has suggested that ideally such characters would resemble “Victorian companions”,
with such characteristics as politeness, discretion, modesty, cheerfulness, and well-
informedness (Wilks 2010).
However, such goals are far from being achieved, as speech recognition, language
understanding, reasoning and learning are not nearly far enough advanced. As a noteworthy
example of the state of the art, we might mention the HWYD (“How Was Your Day”)
system of Pulman et al. (2010), which won a best demonstration prize at an autonomous
agents conference. The natural language processing in this system is relatively sophisticated.
Shallow syntactic and semantic processing is used to find instantiations of some 30 “event
templates”, such as ones for “argument at work between X and Y,” or “meeting
with X about Y”. The interpretation process includes reference and ellipsis resolution, relying
on an information state representation maintained by the dialogue manager. Goals generated
by the dialogue manager lead to responses via planning, which involves instantiation and
sequencing of response paradigms. The authors report the system's ability to maintain
consistent dialogues extending over 20 minutes.
Systems of a rather different sort, aimed at clinically well-founded health counseling, have
been under development as well. For example, the systems described in (Bickmore et al.
2011) rely on an extensive, carefully engineered formalization of clinically proven
counseling strategies and knowledge, expressed within a description logic (OWL) and a
goal-directed task description language. Such systems have proved to perform in a way
comparable to human counselors. However, though dialogues are plan-driven, they
ultimately consist of scripted system utterances paired with multiple-choice lists of responses
offered to the client.
Thus companionable systems remain very constrained in the dialogue themes they can
handle, their understanding of language, and their ability to bring extensive general
knowledge to a conversation, let alone to use such knowledge inferentially.

10.6 Virtual worlds, games, and interactive fiction


Text-based adventure (quest) games, such as Dungeons and Dragons, Hunt the Wumpus (in
its original version), and Advent began to be developed in the early and middle 1970s, and
typically featured textual descriptions of the setting and challenges confronting the player,
and allowed for simple command-line input from the player to select available actions (such
as “open box”, “take sword” or “read note”). While the descriptions of the settings (often
accompanied by pictures) could be quite elaborate, much as in adventure fiction, the input
options available to the player were, and have largely remained, restricted to simple
utterances of the sort that can be anticipated or collected in pre-release testing by the game
programmers, and for which responses can be manually prepared. Certainly more flexible
use of NL (“fend off the gremlin with the sword!”, “If I give you the gold, will you open the
gate for me?”) would enliven the interaction between player and the game world and the
characters in it. In the 1980s and 90s text-based games declined in favor of games based
primarily on graphics and animation, though an online interactive fiction community grew
over the years that drove the evolution of effective interactive fiction development software.
A highly touted program (in the year 2000) was Emily Short's ‘Galatea’, which enabled
dialogue with an animated sculpture. However, this is still an elaborately scripted program,
allowing only for inputs that can be heuristically mapped to one of various preprogrammed
responses. Many games in this genre also make use of chatbot-like input-output response
patterns in order to gain a measure of robustness for unanticipated user inputs.
The most popular PC video games in the 1990s and beyond were Robyn and Rand Miller's
Myst, a first-person adventure game, and Maxis Software's The Sims, a life-simulation game.
Myst, though relying on messages in books and journals, was largely nonverbal, and The
Sims' chief developer, Will Wright, finessed the problem of natural language dialogue by
having the inhabitants of SimCity babble in Simlish, a nonsense language incorporating
elements of Ukrainian, French and Tagalog.
Commercial adventure games and visual novels continue to rely on scripted dialogue trees—
essentially branching alternative directions in which the dialogue can be expected to turn,
with ELIZA-like technology supporting the alternatives. More sophisticated approaches to
interaction between users and virtual characters are under development in various research
laboratories, for example at the Center for Human Modeling and Simulation at the University
of Pennsylvania, and the USC-affiliated Institute for Creative Technologies. While the
dialogues in these scenarios are still based on carefully designed scripts, the interpretation of
the user's spoken utterances exploits an array of well-founded techniques in speech
recognition, dialogue management, and reasoning. Ongoing research can be tracked at
venues such as IVA (Intelligent Virtual Agents), AIIDE (AI and Interactive Digital
Entertainment), and AAMAS (Autonomous Agents and Multiagent Systems).
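
A scripted dialogue tree of the kind used in commercial games can be represented as a table
of nodes, each holding a character line and a menu of player choices. The scene, node names,
and fallback message below are invented for illustration.

```python
# Hypothetical dialogue tree: each node has a line spoken by the character and
# a mapping from permitted player inputs to follow-up nodes.
TREE = {
    "gate": {
        "line": "Halt! What is your business at the gate?",
        "choices": {"offer gold": "bribe", "draw sword": "fight"},
    },
    "bribe": {"line": "Very well, the gate is open.", "choices": {}},
    "fight": {"line": "Guards! Seize the intruder!", "choices": {}},
}

def play(tree, start, inputs):
    """Walk the tree along the given player inputs and return the character's lines."""
    node = start
    transcript = [tree[node]["line"]]
    for choice in inputs:
        options = tree[node]["choices"]
        if choice not in options:
            # Unanticipated input: stay at the current node.
            transcript.append("I do not understand.")
            continue
        node = options[choice]
        transcript.append(tree[node]["line"])
    return transcript

transcript = play(TREE, "gate", ["sing a song", "offer gold"])
```

Only inputs listed under a node's choices advance the dialogue, which is why such games often
add chatbot-like fallback patterns for unanticipated inputs.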

10.7 Natural language user interfaces


The topic of NL user interfaces subsumes a considerable variety of NL applications, ranging
from text-based systems minimally dependent on understanding to systems with significant
comprehension and inference capabilities in text- or speech-based interactions. The
following subsections briefly survey a range of traditional and current application areas.
Text-based question answering
Text-based QA is practical to the extent that the types of questions being asked can be
expected to have ready-made answers tucked away somewhere in the text corpora being
accessed by the QA system. This has become much more feasible in this age of burgeoning
internet content than a few decades ago, though questions still need to be straightforward,
factual ones (e.g., “Who killed President Lincoln?”) rather than ones requiring inference
(e.g., “In what century did Catherine the Great live?”, let alone “Approximately how many 8-
foot 2-by-4s do I need to build a 4-foot high, 15-foot long picket fence?”).
Text-based QA begins with question classification (e.g., yes-no questions, who-questions,
what-questions, when-questions, etc.), followed by information retrieval for the identified
type of question, followed by narrowing of the search to paragraphs and finally sentences
that may contain the answer to the question. The successive narrowing typically employs
word and other feature matching, and ultimately dependency and role matching, and perhaps
limited textual inference to verify answer candidates. Textual inference may, for instance,
use WordNet hypernym knowledge to try to establish that a given candidate answer sentence
supports the truth of the declarative version of the question. Since the chosen sentence(s)
may contain irrelevant material and anaphors, it remains to extract the relevant material
(which may also include supporting context) and generate a well-formed, appropriate
answer. Many early text-based QA systems up to 1976 are discussed in Bourne & Hahn
2003. Later surveys (e.g., Maybury 2004) have tended to include the full spectrum of QA
methods, but TREC conference proceedings (https://trec.nist.gov/) feature numerous papers
on implemented systems for text-based QA.
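
The first stage of this pipeline, question classification, can be sketched with a handful of
rules keyed on the opening words of the question. The rule set and the answer-type labels are
assumptions made for illustration; implemented systems typically use much finer-grained type
inventories and learned classifiers.

```python
import re

# Hypothetical rules mapping a question's opening words to an expected answer type.
RULES = [
    (r"^(who|whom|whose)\b", "PERSON"),
    (r"^when\b|^in what (year|century)\b", "TIME"),
    (r"^where\b", "LOCATION"),
    (r"^how (many|much)\b", "QUANTITY"),
    (r"^(is|are|was|were|do|does|did|can|could|will)\b", "YES_NO"),
]

def classify(question):
    """Return the answer type of the first matching rule, or "OTHER" if none matches."""
    q = question.strip().lower()
    for pattern, answer_type in RULES:
        if re.search(pattern, q):
            return answer_type
    return "OTHER"
```

The returned type would then steer retrieval and candidate filtering, for instance by
preferring sentences that mention a person when the type is PERSON.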
In open-domain QA, many questions are concerned with properties of named entities, such
as birth date, birth place, occupation, and other personal attributes of well-known present and
historical individuals, locations, ownership, and products of various companies, facts about
consumer products, geographical facts, and so on. For answering such questions, it makes
sense to pre-assemble the relevant factoids into a large knowledge base, using knowledge
acquisition methods like those in section 8. Examples of systems containing an abundance of
factoids about named entities are several developed at the University of Washington, storing
factoids as text fragments, and various systems that map harvested factoids into RDF
(Resource Description Framework) triples (see references in Other Internet Resources).
Some of these systems obtain their knowledge not only from open information extraction and
targeted relation extraction, but also from such sources as Wikipedia “infoboxes” and
(controlled) crowdsourcing. Here we are also stretching the notion of question answering,
since several of the mentioned systems require the use of key words or query patterns for
retrieval of factoids.
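
Retrieval from a store of factoid triples by query pattern can be sketched as follows. This
is a toy in-memory store and not an RDF implementation; the identifiers and predicate names
are hypothetical.

```python
# Toy store of (subject, predicate, object) factoids; the naming scheme is an assumption.
TRIPLES = {
    ("Abraham_Lincoln", "birthPlace", "Kentucky"),
    ("Abraham_Lincoln", "occupation", "Lawyer"),
    ("Catherine_the_Great", "birthPlace", "Stettin"),
}

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

lincoln_birth = query(subject="Abraham_Lincoln", predicate="birthPlace")
all_birthplaces = query(predicate="birthPlace")
```

A question such as "Where was Lincoln born?" still has to be mapped to the pattern
`(Abraham_Lincoln, birthPlace, ?)` before the store can be consulted, which is the sense in
which such systems stretch the notion of question answering.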
From a general user perspective, it is unclear how much added benefit can be derived from
such constructed KBs, given the remarkable ability of Google and other search engines to
provide rapid answers even to such questions as “Which European countries are
landlocked?” (typed without quotes—with quotes, Google finds the top answer using True
Knowledge), or “How many Supreme Court justices did Kennedy appoint?” Nonetheless,
both Google and Microsoft have recently launched vast “knowledge graphs” featuring
thousands of relations among hundreds of millions of entities. The purpose is to provide
direct answers (rather than merely retrieved web page snippets) to query terms and natural
language questions, and to make inferences about the likely intent of users, such as
purchasing some type of item or service.
Database front-ends
Natural-language front ends for databases have long been considered an attractive
application of NLP technology, beginning with such systems as LUNAR (Woods et al. 1972)
and REL (Thompson et al. 1969; Thompson & Thompson 1975). The attractiveness lies in
the fact that retrieval and manipulation of information from a relational (or other uniformly
structured) database can be assumed to be handled by an existing db query language and
process. This feature sharply limits the kinds of natural language questions to be expected
from a user, such as questions aimed at retrieving objects or tuples of objects satisfying given
relational constraints, or providing summary or extremal properties (longest rivers, lowest
costs, and the like) about them. It also greatly simplifies the interpretive process and
question-answering, since the target logical forms—formal db queries—have a known,
precise syntax and are executed automatically by the db management system, leaving only
the work of displaying the computed results in some appropriate linguistic, tabular or
graphical form.
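
The division of labor described here, in which the front end only has to produce a formal
query and the database system does the rest, can be sketched with Python's standard `sqlite3`
module. The table, the river names and lengths, and the two question templates are invented
for illustration, and the template matching stands in for the much richer parsing used by
systems such as LUNAR.

```python
import re
import sqlite3

def build_db():
    """Create an in-memory database with a small table of made-up rivers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE river (name TEXT, length_km INTEGER)")
    conn.executemany(
        "INSERT INTO river VALUES (?, ?)",
        [("Alpha", 300), ("Beta", 1200), ("Gamma", 750)],
    )
    return conn

# Hypothetical question templates, each paired with a parameterized SQL query.
TEMPLATES = [
    (re.compile(r"what is the longest river", re.I),
     "SELECT name FROM river ORDER BY length_km DESC LIMIT 1"),
    (re.compile(r"which rivers are longer than (\d+) km", re.I),
     "SELECT name FROM river WHERE length_km > ? ORDER BY name"),
]

def answer(conn, question):
    """Translate a question to SQL via the first matching template and run it."""
    for pattern, sql in TEMPLATES:
        match = pattern.search(question)
        if match:
            params = tuple(int(g) for g in match.groups())
            return [row[0] for row in conn.execute(sql, params)]
    return None  # the question is outside the coverage of the templates

conn = build_db()
longest = answer(conn, "What is the longest river?")
long_rivers = answer(conn, "Which rivers are longer than 500 km?")
```

The `None` result for unmatched questions illustrates, in miniature, the coverage and
reliability problem that has limited the commercial impact of such front ends.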
Numerous systems have been built since then, aimed at applications such as navy data on
ships and their deployment (LADDER: Hendrix et al. 1978), land-use planning (Damerau
1981), geographic QA (CHAT-80: Pereira & Warren 1982), retrieval of company records and
product records for insurance companies, oil companies, manufacturers, retailers, banks, etc.
(INTELLECT: Harris 1984), compilation of statistical data concerning customers, services,
assets, etc., of a company (Cercone et al. 1993), and many more (e.g., see Androutsopoulos
& Ritchie 2000). However, the commercial impact of such systems has remained scant,
because they have generally lacked the reliability and some of the functionalities of
traditional db access.
Inferential (knowledge-based) question answering
We have noted certain limited inferential capabilities in text-based QA systems and NL front
ends for databases, such as the ability to confirm entailment relations between candidate
answers and questions, using simple sorts of semantic relations among the terms involved,
and the ability to sort or categorize data sets from databases and compute averages or even
create statistical charts.
However, such limited, specialized inference methods fall far short of the kind of general
reasoning based on symbolic knowledge that has long been the goal in AI question
answering. One of the earliest efforts to create a truly inferential QA system was the
ENGLAW project of L. Stephen Coles (Coles 1972). ENGLAW was intended as a prototype
of a kind of system that might be used by scientists and engineers to obtain information
about physical laws. It featured a KB of axioms (in first-order logic) for 128 important
physical laws, manually coded with the aid of a reference text. Questions (such as “In the
Peltier Effect, does the heat developed depend on the direction of the electric current?”) were
rendered into logic via a transformational grammar parser, and productions (aided by various
Lisp functions) that map phrase patterns to logical expressions. The system was not
developed to the point of practical usefulness, but its integration of reasoning and NLP
technologies and its methods of selectively retrieving axioms for inferential QA were
noteworthy contributions.
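
To indicate what inference over a symbolic knowledge base involves, here is a minimal
forward-chaining sketch restricted to one-place predicates and single-premise rules. It is
far simpler than the first-order axioms of ENGLAW and is not based on that system; the facts,
rules, and function names are illustrative assumptions.

```python
# Facts are (predicate, individual) pairs; a rule (p, q) reads "for all x, p(x) implies q(x)".
FACTS = {("man", "socrates")}
RULES = [("man", "human"), ("human", "mortal")]

def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, individual in list(known):
                if predicate == premise and (conclusion, individual) not in known:
                    known.add((conclusion, individual))
                    changed = True
    return known

def ask(predicate, individual, facts=FACTS, rules=RULES):
    """Answer a yes-no question by checking the deductive closure of the facts."""
    return (predicate, individual) in forward_chain(facts, rules)
```

With these definitions `ask("mortal", "socrates")` succeeds through two rule applications,
although no such fact is stored explicitly. A `False` result means only that the answer is
not derivable from the knowledge base.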
An example of a later larger-scale system aimed at practical goals was BBN's JANUS system
(Ayuso et al. 1990). This was intended for naval battle management applications, and could
answer questions about the locations, readiness, speed and other attributes of ships, allowing
for change with the passage of time. It mapped English queries to a very expressive initial
representation language with an “intension” operator to relate formulas to times and possible
worlds, and this was in turn mapped into the NIKL description logic, which proved adequate
for the majority of inferences needed for the targeted kinds of QA.
Jumping forward in time, we take note of the web-based Wolfram|Alpha (or WolframAlpha)
answer engine, developed by Wolfram Research and consisting of 15 million lines of
Mathematica code grounded in curated data bases, models, and algorithms for thousands of
different domains. (Mathematica is a mathematically oriented high-level programming
language developed by the British scientist Stephen Wolfram.) The system is tilted primarily
towards quantitative questions (e.g., “What is the GDP of France?”, or “What is the surface
area of the Moon?”) and often provides charts and graphics along with more direct answers.
The interpretation of English queries into functions applied to various known objects is
accomplished with the pattern matching and symbol manipulation capabilities of
Mathematica. However, the comprehension of English is not particularly robust at the time
of writing. For example “How old was Lincoln when he died?”, “At what age did Lincoln
die?” and other variants were not understood, though in many cases of misunderstanding,
Wolfram|Alpha displays enough retrieved information to allow inference of an answer. A
related shortcoming is that Wolfram|Alpha's quantitative skills are not supplemented with
significant qualitative reasoning skills. For example, “Was Socrates a man?” (again, at the
time of writing) prompts display of summary information about Socrates, including an
image, but no direct answer to the question. Still, Wolfram|Alpha's quantitative abilities are
not only interesting in stand-alone mode, but also useful as augmentations of search engines
(such as Microsoft Bing) and of voice-based personal assistants such as Apple's Siri (see
below).
Another QA system enjoying wide recognition because of its televised victory in the
Jeopardy! quiz show is IBM's “Watson” (Ferrucci 2012; Ferrucci et al. 2010; Baker 2011).
Like Wolfram|Alpha, this is in a sense a brute-force program, consisting of about a million
lines of code in Java, C++, Prolog and other languages, created by a core team of 20
researchers and software engineers over the course of three years. The program runs 3000
processes in parallel on ninety IBM Power 750 servers, and has access to 200 million pages
of content from sources such as Wordnet, Wikipedia (and its structured derivatives YAGO
and DBpedia), thesauri, newswire articles, and literary texts, amounting to several terabytes
of human knowledge. (This translates into roughly 10^10 clausal chunks, a number likely to
be around 2 orders of magnitude greater than the number of basic facts over which any one
human being disposes.)
Rather than relying on any single method of linguistic or semantic analysis, or method of
judging relevance of retrieved passages and textual “nuggets” therein, Watson applies
multiple methods to the questions and candidate answers, including methods of question
classification, focal entity detection, parsing, chunking, lexical analysis, logical form
computation, referent determination, relation detection, temporal analysis, and special
methods for question-answer pairs involving puns, anagrams, and other twists common in
Jeopardy!. Different question analyses are used separately to retrieve relevant documents,
and to derive, analyze and score potential answers from passages and sentences in those
documents. In general, numerous candidate answers to a question are produced, and their
analyses provide hundreds of features whose weights for obtaining ranked answers with
corresponding confidence levels are learned by ML methods applied to a corpus of past
Jeopardy! questions and answers (or officially, answers and questions, according to the
peculiar conceit of the Jeopardy! protocol). Watson's wagers are based on the confidence
levels of its potential answers and a complex regression model.
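The step of turning per-candidate features into ranked answers with confidence levels can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Watson's actual model: the three feature names, the weights, the bias, the candidate labels and all feature values are invented for the example, and a simple logistic combination stands in for whatever learned models the real system uses over its hundreds of features.

```python
import math

# Hypothetical feature weights and bias. In a real system these would be
# learned from past question-answer pairs rather than set by hand.
WEIGHTS = {"type_match": 2.0, "passage_support": 1.5, "term_overlap": 1.0}
BIAS = -3.0


def confidence(features: dict) -> float:
    """Logistic combination of weighted feature values into a score in (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates: dict) -> list:
    """Return (answer, confidence) pairs sorted from most to least confident."""
    scored = [(answer, confidence(feats)) for answer, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented candidates and feature values, for illustration only.
candidates = {
    "candidate_1": {"type_match": 1.0, "passage_support": 0.9, "term_overlap": 0.8},
    "candidate_2": {"type_match": 1.0, "passage_support": 0.4, "term_overlap": 0.5},
    "candidate_3": {"type_match": 0.0, "passage_support": 0.6, "term_overlap": 0.7},
}
ranked = rank(candidates)
for answer, score in ranked:
    print(f"{answer}: {score:.3f}")
```

The top-ranked confidence is the kind of quantity on which a decision to answer, or the size of a wager, could then be based.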
How well does Watson fit under our heading of inferential, knowledge-based QA? Does it
actually understand the questions and the answers it produces? Despite its impressive
performance against Jeopardy! champions, Watson reasons and understands English in only
very restricted senses. The program exploits the fact that the target of a Jeopardy! question is
usually a named entity, such as Jimmy Carter, Islamabad, or Black Hole of Calcutta, though
other types of phrases are occasionally targeted. Watson is likely to find multiple sentences
that mention a particular entity of the desired type, and whose syntactic and semantic
features are close to the features of the question, thereby making the named entity a plausible
answer without real understanding of the question. For example, a “recent history” question
asking for the president under whom the US gave full recognition to Communist China
(Ferrucci 2012) might well zero in on such sentences as
Although he was the president who restored full diplomatic relations with China in 1978,
Jimmy Carter has never visited that country … (New York Times, June 27, 1981)
or
Exchanges between the two countries' nuclear scientists had begun soon after President
Jimmy Carter officially recognized China in 1978. (New York Times, Feb. 2, 2001)
While the links between such sentences and the correct answer are indirect (e.g., dependent
on resolving he and who to Jimmy Carter, and associating restored diplomatic
relations with recognized, and Communist China with China), correct analysis of those links
is not a requirement for success—it is sufficient for the cluster of sentences favoring the
answer Jimmy Carter (in virtue of their word and phrasal content and numerous other
features) to provide a larger net weight to that answer than any competing clusters. This type
of statistical evidence combination based on stored texts seems unlikely to provide a path to
the kind of understanding that even first-graders betray in answering simple commonsense
questions, such as “How do people keep from getting wet when it rains?”, or “If you eat a
cookie, what happens to the cookie?” At the same time, vast data banks utilized in the
manner of Watson can make up for inferential weakness in various applications, and IBM is
actively redeveloping Watson as a resource for physicians, one that should be able to provide
diagnostic and treatment possibilities that even specialists may not have at their fingertips. In
sum, however, the goal of open-domain QA based on genuine understanding and knowledge-
based reasoning remains largely unrealized.
Voice-based web services and assistants
Voice-based services, especially on mobile devices, are a rapidly expanding applications
area. Services range from organizers (for grocery lists, meeting schedules, reminders, contact
lists, etc.), to in-car “infotainment” (routing, traffic conditions, hazard warnings, iTunes
selection, finding nearby restaurants and other venues, etc.), to enabling use of other
miscellaneous apps such as email dictation, dialing contacts, financial transactions,
reservations and placement of orders, Wikipedia access, help-desk services, health advising,
and general question answering. Some of these services (such as dialing and iTunes
selection) fall into the category of hands-free controls, and such controls are becoming
increasingly important in transport (including driverless or pilotless vehicles), logistics
(deployment of resources), and manufacturing. Also, chatbot technology and companionable
dialogue agents (as discussed in section 10.5) are serving as general backends to more
specific voice-based services.
The key technology in these services is of course speech recognition, whose accuracy and
adaptability have been gradually increasing. The least expensive, narrowly targeted systems
(e.g., simple organizers) exploit strong expectations about user inputs to recognize, interpret
and respond to those inputs; as such they resemble menu-driven systems. More versatile
systems, such as car talkers that can handle routing, musical requests, searches for venues,
etc., rely on more advanced dialogue management capabilities. These allow for topic
switches and potentially for the attentional state of the user (e.g., delaying answering a
driver's question if the driver needs to attend to a turn). The greatest current “buzz”
surrounds advanced voice-based assistants, notably iPhone's Siri (followed by Android's Iris,
True Knowledge's Evi, Google Now, and others). While previous voice control and dictation
systems, like Android's Vlingo, featured many of the same functionalities, Siri adds
personality and improved dialogue handling and service integration—users feel that they are
interacting with a lively synthetic character rather than an app. Besides Nuance SR
technology, Siri incorporates complex techniques that were to some extent pushed forward
by the CALO (Cognitive Assistant that Learns and Organizes) project carried out by SRI
International and multiple universities from 2003–2008 (Ambite et al. 2006; CALO
[see Other Internet Resources]). These techniques include aspects of NLU, ML, goal-
directed and uncertain inference, ontologies, planning, and service delegation. But while
delegation to web services, including Wolfram|Alpha QA, or chatbot technology provides
considerable robustness, and there is significant reasoning about schedules, purchasing and
other targeted services, general understanding is still very shallow, as users soon discover.
An anecdotal example of a serious misunderstanding is “Call me an ambulance” eliciting the
response “From now on I will call you ‘an ambulance’”. However, the strong interest and
demand in the user community generated by these early (somewhat) intelligent, quite
versatile assistants is likely to intensify and accelerate research towards ever more life-like
virtual agents, with ever more understanding and common sense.

10.8 Collaborative problem solvers and intelligent tutors


We discuss collaborative problem solving systems (also referred to as “mixed-initiative” or
“task-oriented” dialogue systems) and tutorial dialogue systems (i.e., tutorial systems in
which dialogue plays a pivotal role) under a common heading because both depend on rather
deep representations or models of the domains they are aimed at as well as the mental state
of the users they interact with.
However, we should immediately note that collaborative problem solving systems typically
deal with much less predictable domain situations and user inputs than tutorial systems, and
accordingly the former place much greater emphasis on flexible dialogue handling than the
latter. For example, collaborators in emergency evacuation (Ferguson and Allen 1998, 2007)
need to deal with a dynamically changing domain, at the same time handling the many
dialogue states that may occur, depending on the participants' shared and private beliefs,
goals, plans and intentions at any given point. By contrast, in a domain such as physics
tutoring (e.g., Jordan et al. 2006; Litman and Silliman 2004), the learner can be guided
through a network of learning goals with authored instructions, and corresponding to those
goals, finite-state dialogue models can be designed that classify student inputs at each point
in a dialogue and generate a prepared response likely to be appropriate for that input.
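A finite-state dialogue model of this kind can be sketched as follows. The example is hypothetical and is not drawn from any of the cited tutoring systems: the two states, the prompts, the keyword lists and the prepared responses are invented, and a crude keyword test stands in for a real classifier of student inputs.

```python
# Hypothetical finite-state tutorial dialogue. Each state has a prompt,
# keyword lists that classify the student's input, and one prepared
# (response, next_state) pair per input class.
DIALOGUE = {
    "ask_force": {
        "prompt": "Ignoring air resistance, what force acts on a ball in free fall?",
        "classes": {"correct": ("gravity", "weight")},
        "transitions": {
            "correct": ("Right, gravity is the only force.", "ask_accel"),
            "other": ("Think about what pulls the ball toward the Earth.", "ask_force"),
        },
    },
    "ask_accel": {
        "prompt": "Is the ball's acceleration constant or changing?",
        "classes": {"correct": ("constant",)},
        "transitions": {
            "correct": ("Yes, it stays constant near the Earth's surface.", "done"),
            "other": ("Recall that the net force on the ball does not change.", "ask_accel"),
        },
    },
}


def classify(state: str, student_input: str) -> str:
    """Assign the input to the first class with a matching keyword, else 'other'."""
    text = student_input.lower()
    for label, keywords in DIALOGUE[state]["classes"].items():
        if any(keyword in text for keyword in keywords):
            return label
    return "other"


def step(state: str, student_input: str) -> tuple:
    """Return the prepared response and the next state for this input."""
    return DIALOGUE[state]["transitions"][classify(state, student_input)]


print(step("ask_force", "I think it's gravity"))
print(step("ask_force", "friction"))
```

Because every state anticipates only a small set of input classes, the model is easy to author, but a keyword test like this one would misclassify an answer such as "not constant", which is one reason real systems need more careful input classification.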
It is therefore not surprising that tutorial dialogue systems are closer to commercial
practicality, with demonstrated learning benefits relative to conventional instruction in
various evaluations, than collaborative problem solving systems for realistic applications.
Tutorial dialogue systems have been built for numerous domains and potential clienteles,
ranging from K-12 subjects to computer literacy and novice programming, qualitative and
quantitative physics, circuit analysis, operation of machinery, cardiovascular physiology, fire
damage control on ships, negotiation skills, and more (e.g., see Boyer et al. 2009; Pon-Barry
et al. 2006). Among the most successful tutorial systems are reading tutors (e.g., Mostow and
Beck 2007; Cole et al. 2007), since the materials presented to the learner (in a “scaffolded”
manner) are relatively straightforward to design in this case, and the responses of the learner,
especially when they consist primarily of reading presented text aloud, are relatively easy to
evaluate. For the more ambitious goal of fostering reading comprehension, the central
problem is to design dialogues so as to make the learner's contributions predictable, while
also making the interaction educationally effective (e.g., Aist and Mostow 2009).
Some tutoring systems, especially ones aimed at children, use animated characters to
heighten the learner's sense of engagement. Such enhancements are in fact essential for
systems aimed at learners with disabilities like deafness (where mouth and tongue
movements of the virtual agent observed by the learner can help with articulation), autism, or
aphasia (Massaro et al. 2012; Cole et al. 2007). As well, if tutoring is aimed specifically at
training interpersonal skills, implementation of life-like characters (virtual humans) becomes
an indispensable part of system development (e.g., Core et al. 2006; Campbell et al. 2011).
Modeling the user's state of mind in tutoring systems is primarily a matter of determining
which of the targeted concepts and skills have, or have not yet, been acquired by the user,
and diagnosing misunderstandings that are likely to have occurred, given the session
transcript so far. Some recent experimental systems can also adapt their strategies to the
user's apparent mood, such as frustration or boredom, as might be revealed by the user's
inputs, tone of voice, or even facial expressions or gestures analyzed via computer vision.
Other prototype systems can be viewed as striving towards more general mental modeling,
by incorporating ideas and techniques from task-oriented dialogue systems concerning
dialogue states, dialogue acts, and deeper language understanding (e.g., Callaway et al.
2007).
In task-oriented dialogue systems, as already noted, dialogue modeling is much more
challenging, since such systems are expected not only to contribute to solving the domain
problem at hand, but to understand the user's utterances, beliefs, and intentions, and to hold
their own in a human-like, mixed-initiative dialogue. This requires domain models, general
incremental collaborative planning methods, dialogue management that models rational
communicative interaction, and thorough language understanding (especially intention
recognition) in the chosen domain. Prototype systems have been successfully built for
domains such as route planning, air travel planning, driver and pedestrian guidance, control
and operation of external devices, emergency evacuation, and medication advising (e.g.,
Allen et al. 2006; Rich and Sidner 1998; Bühler and Minker 2011; Ferguson and Allen 1998,
2007), and these hold very significant practical promise. However, systems that can deal
with a variety of reasonably complex problems, especially ones requiring broad
commonsense knowledge about human cognition and behavior, still seem out of reach at this
time.

10.9 Language-enabled robots


As noted at the beginning of section 10, robots are beginning to be equipped with web
services, question answering abilities, chatbot techniques (for fall-back and entertainment),
tutoring functions, and so on. The transfer of such technologies to robots has been slow,
primarily because of the very difficult challenges involved in just equipping a robot with the
hardware and software needed for basic visual perception, speech recognition, exploratory
and goal-directed navigation (in the case of mobile robots), and object manipulation.
However, the keen public interest in intelligent robots and their enormous economic potential
(for household help, eldercare, medicine, education, entertainment, agriculture, industry,
search and rescue, military missions, space exploration, and so on) will surely continue to
energize the drive towards greater robotic intelligence and linguistic competence.
A good sense of the state of the art and difficulties in human-robot dialogue can be gained
from (Scheutz et al. 2011). Some of the dialogue examples presented there, concerning boxes
and blocks, are reminiscent of Winograd's SHRDLU, but they also exhibit the challenges
involved in real interaction, such as the changing scenery as the robot moves, speech
recognition errors, disfluent and complex multi-clause utterances, perspective-dependent
utterances (“Is the red box to the left of the blue box?”), and deixis (“Go down there”). In
addition, all of this must be integrated with physical action planned so as to fulfill the
instructions as understood by the robot. While the ability of recent robots to handle these
difficulties to some degree is encouraging, many open problems remain, such as the
problems of speech recognition in the presence of noise, better, broader linguistic coverage,
parsing, and dialogue handling, adaptation to novel problems, mental modeling of the
interlocutor and other humans in the environment, and greater general knowledge about the
world and the ability to use it for inference and planning (both at the domain level and the
dialogue level).
While task-oriented robot dialogues involve all these challenges, we should note that some
potentially useful interactions with “talking” robots require little in the way of linguistic
skills. For example, the RUBI robot described in (Movellan et al. 2009) displayed objects on
its screen-equipped “chest” to toddlers, asking them to touch and name the objects. This
resulted in improved word learning by the toddlers, despite the simplicity of the interaction.
Another example of a very successful talking robot with no real linguistic skills was the
“museum tour guide” RHINO (Burgard et al. 1999). Unlike RUBI it was able to navigate
among unpredictably moving humans, and kept its audience engaged with its prerecorded
messages and with a display of its current goals on a screen. In the same way, numerous
humanoid robots (for example, Honda's Asimo) under past and present development across
the world still understand very little language and rely mostly on scripted output. No doubt
their utility and appeal will continue to grow, thanks to technologies like those mentioned
above—games, companionable agent systems, voice-based apps, tutors, and so on; and these
developments will also fuel progress on the deeper aspects of perception, motion,
manipulation, and meaningful dialogue.
