
A Short Introduction to Artificial Intelligence: Methods, Success Stories, and Current Limitations

Clemens Heitzinger and Stefan Woltran

C. Heitzinger, Center for Artificial Intelligence and Machine Learning, Institute of Information Systems Engineering, TU Wien, Wien, Austria; e-mail: clemens.heitzinger@tuwien.ac.at
S. Woltran (✉), Center for Artificial Intelligence and Machine Learning, Institute of Logic and Computation, TU Wien, Wien, Austria; e-mail: stefan.woltran@tuwien.ac.at

© The Author(s) 2024. In H. Werthner et al. (eds.), Introduction to Digital Humanism, https://doi.org/10.1007/978-3-031-45304-5_9

Abstract This chapter gives an overview of the most important methods in artificial intelligence (AI). The methods of symbolic AI are rooted in logic, and finding possible solutions by search is a central aspect. The main challenge is the combinatorial explosion in search, but the focus on the satisfiability problem of propositional logic (SAT) since the 1990s and the accompanying algorithmic improvements have made it possible to solve problems on the scale needed in industrial applications. In machine learning (ML), self-learning algorithms extract information from data and represent the solutions in convenient forms. ML broadly consists of supervised learning, unsupervised learning, and reinforcement learning. Successes of the 2010s and early 2020s, such as mastering Go, chess, and many computer games, as well as large language models such as ChatGPT, are due to huge computational resources and algorithmic advances in ML. Finally, we reflect on current developments and draw conclusions.

1 Introduction

Dartmouth College, 1956, USA. Renowned scientists from various disciplines, including Claude Shannon, the founder of information theory; Herbert Simon, who later won the Nobel Prize for Economics; and the computer scientists Marvin Minsky and John McCarthy, met to explore the potential of the emerging computer technology. The term “artificial intelligence” had already been coined the year before in the course of planning the meeting, and now the following idea was formulated: “If computers manage to perform tasks such as calculating ballistic trajectories better than any human just by applying simple calculation rules, it should be possible to simulate or even generate human thinking by working through simple logical rules.” In fact, in the 1960s, the first computer programs were equipped with logical methods that could create a mathematical proof (the “Logic Theorist”) or beat humans at games like chess. The euphoria of those days fizzled out relatively quickly, however, and we will discuss the reasons in more detail in Sect. 2.1.
One disappointment resulted from the fact that while the explicit specification of rules (“symbolic AI”) works well in areas such as proving mathematical statements or planning a sequence of concrete steps to reach a specified goal, other supposedly simpler cognitive tasks, such as recognizing objects in a picture or understanding language, turned out to be extremely difficult, if not impossible, to specify in this way. For tasks of this kind, a different approach proved more effective, one that had existed in theory since the late 1940s but led to breakthroughs only in the twenty-first century, once the necessary huge data sets became available (see Sect. 2.2). Here, the computer is not given rules whose processing leads to the solution of the problem; instead, solutions are learned from data by self-learning algorithms. This approach, of course, requires large amounts of data and computing power.
Understanding and distinguishing between these two methods is central to grasping the limitations of current AI research, as well as the resulting problems; we will discuss this in more detail in Sect. 3. From a digital humanism perspective, we consider it paramount to start from an understanding of the existing methods when discussing the dangers, but also the opportunities, that arise from the pervasiveness and availability of AI systems in various areas of life today. We will therefore not address issues such as the treatment of AIs with consciousness, implications of the so-called singularity (Walsh, 2017), or transhumanistic visions. For space reasons, we also omit topics from the field of robotics (“embodied AI”) as well as their implications (e.g., autonomous weapon systems). For other aspects such as bias, trustworthiness, or AI ethics, we refer to the corresponding chapters in this book.

2 Methods of AI

2.1 Symbolic AI

Symbolic AI refers to those methods that are based on explicitly describing the problems or the necessary solution steps to the computer. Logic or related formal languages are used to describe the problems; actually finding possible solutions (“search”) is a central aspect of symbolic AI. It should already be pointed out at this stage that in this model, the “explainability” of a solution is conceptually easy to obtain (however, for larger specifications, the explanations tend to become incomprehensible for humans). Furthermore, the correctness of a solution is generally definite and not subject to probabilities. The “intelligent” behavior here results simply from computing power.
Let’s consider an example: board games like chess are defined by clear rules that tell the players the possibilities of their moves. Assume we are in a game situation where I can mate black in two moves, i.e., there is a move for me so that no matter what the other player decides, I have a move that mates the opponent; this is called a winning strategy. To find such a strategy, I simply let the computer try all possible moves on my part. For each such move, I let the computer calculate all possible choices of the opponent and my possible answers to them. If we assume in a simplified way that there are 10 moves to choose from in each situation, we have 10³ = 1000 operations to perform. If we want to calculate one more turn ahead, it is already 10⁵ = 100,000, and so on. It is clear that this cannot be carried on arbitrarily, since the problem of the “combinatorial explosion” comes to bear. In chess programs, this is solved by so-called board evaluations (with which move do I have the best possible position after three rounds, e.g., guaranteed more pieces on the board than the opponent?). Mediocre players can already be beaten with such a lookahead, reasonable computing power, and simple board evaluations; for grandmasters, however, it took until 1997, when Deep Blue was able to defeat the then world chess champion Garry Kasparov.
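To make this search idea concrete, here is a minimal sketch of depth-limited minimax search in Python. The game interface (legal_moves, apply, evaluate) is hypothetical, and a real chess engine would add alpha-beta pruning and far more refined board evaluations; the sketch only illustrates the exhaustive lookahead described above.

    # Minimal depth-limited minimax sketch; the state interface is hypothetical.
    def minimax(state, depth, maximizing):
        """Best achievable evaluation from `state`, looking `depth` plies ahead."""
        moves = state.legal_moves()
        if depth == 0 or not moves:
            return state.evaluate()  # board evaluation, e.g., material balance
        if maximizing:
            # my turn: choose the move that maximizes my evaluation
            return max(minimax(state.apply(m), depth - 1, False) for m in moves)
        # opponent's turn: assume the opponent minimizes my evaluation
        return min(minimax(state.apply(m), depth - 1, True) for m in moves)

With about 10 legal moves per position, the number of positions examined grows by a factor of 10 with every additional ply; this is exactly the combinatorial explosion discussed above.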
The Power of Propositional Logic It is important to emphasize that for problems where the computational effort increases exponentially with the problem size, symbolic methods have a scalability problem. This is true in many areas: finding models for logical formulas, creating an optimal shift schedule for workers, designing timetables, computing routes in a traffic network, or building expert systems of different kinds. Since it was clear that progress in the computing power of chips could not keep up with exponential growth, symbolic AI methods were not considered to have much potential for solving problems on the scale needed in industrial applications. However, the tide turned in the 1990s, when Kautz and Selman (1992) proposed to reduce problems of this type to a single problem that is as easy to handle as possible (but still has to deal with the combinatorial explosion) and to use search methods that are as efficient as possible for this one problem. This problem is the satisfiability problem of propositional logic (SAT).
In this logic, atomic propositions (which can be true or false) are combined via connectives. The truth value of the compound formula is then given by the truth assignments to the atomic propositions and the semantics of the connectives. Let us consider a simple example with the atomic propositions “ai” (standing for “one should study artificial intelligence”) and “dh” (standing for “one should study digital humanism”). The state of an agent might be represented by the following formula:

(ai OR dh) AND NOT(ai AND dh)

stating the fact that one should study AI or digital humanism or both (the part “ai OR dh”), but at the same time, maybe due to time constraints, one should not study both at the same time (the part “NOT(ai AND dh)”). We have four possible assignments to
the atomic propositions: setting both ai and dh to true; setting ai to true and dh to false; setting ai to false and dh to true; and, finally, setting both to false. Without giving an exact definition of the semantics of the connectives “AND,” “OR,” and “NOT,” it should be quite intuitive that only two of the assignments make the entire formula true, namely, those stating that one should study either AI or digital humanism. The formula is thus satisfiable. Suppose now we add the knowledge that one should study AI whenever studying digital humanism and, likewise, one should study digital humanism whenever studying AI. Formally, this leads to the formula

(ai OR dh) AND NOT(ai AND dh) AND (dh -> ai) AND (ai -> dh).

This formula is unsatisfiable: whatever truth assignment is chosen for the atomic propositions, the formula does not evaluate to true. What makes the SAT problem computationally challenging is the fact that the number of possible assignments to be checked grows exponentially in the number of atomic propositions present in the formula.
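As a small illustration (not how modern SAT solvers work internally), the two example formulas can be checked by brute force in Python, enumerating all truth assignments; it is precisely this exponential enumeration that clever solvers manage to avoid in practice:

    from itertools import product

    def satisfiable(formula, num_vars):
        """Check all 2^num_vars truth assignments; exponential in general."""
        return any(formula(*assignment)
                   for assignment in product([False, True], repeat=num_vars))

    f1 = lambda ai, dh: (ai or dh) and not (ai and dh)
    # add dh -> ai and ai -> dh, written as (NOT dh OR ai) and (NOT ai OR dh)
    f2 = lambda ai, dh: f1(ai, dh) and (not dh or ai) and (not ai or dh)

    print(satisfiable(f1, 2))  # True: e.g., ai = True, dh = False
    print(satisfiable(f2, 2))  # False: no assignment satisfies the formula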
However, it turned out that by using clever search heuristics, exploiting shortcuts in the search space, and using highly optimized data structures, certain formulas with millions of variables can be solved, while other, randomly generated formulas cannot (Ganesh & Vardi, 2020). Fortunately, the formulas that can be solved well are often those found in the “wild.” This is partly explained by the fact that they have certain structural properties that are exploited by the search procedure: if one reduces, e.g., routing problems in traffic networks to such formulas, then the formulas have “good” properties, because in the real world, traffic networks have, e.g., a maximum node degree¹ of 10 and are not arbitrary graphs. This led in the past years to a success story of SAT-based methods in many areas, especially in the verification of specifications in hardware and software.

¹ The degree of a node in a graph is the number of nodes which are directly connected to that node.
Since these applications are often no longer attributed to AI, here is an example where SAT has actually led to the solution of an open problem in mathematics, namely, the problem of Pythagorean triples: the question here is whether the natural numbers can be divided into two parts in such a way that neither of the two parts contains a triple (a, b, c) with a² + b² = c². For the numbers 1 to 10, this is still possible, because I only have to avoid putting the numbers 3, 4, and 5 (and likewise 6, 8, and 10) into the same pot. If we have to divide the numbers from 1 to 15, more caution is already needed, since now, e.g., 5, 12, and 13 must not end up in the same pot either, but it still works. The question is now as follows: Is this division always possible, no matter how big the range of numbers is? The SAT solver said no. The numbers from 1 to 7825 can no longer be divided in this way! We refer to Heule et al. (2016) for further details on this project.
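For small ranges of numbers, the property can again be checked by brute force; the following sketch tries all 2^n divisions of the numbers 1 to n into two pots. Its cost doubles with every additional number, which is why reaching n = 7825 required a SAT solver and massive computation (Heule et al., 2016):

    from itertools import product

    def triples(n):
        """All Pythagorean triples (a, b, c) with a^2 + b^2 = c^2 and c <= n."""
        return [(a, b, c) for a in range(1, n + 1)
                          for b in range(a, n + 1)
                          for c in range(b, n + 1)
                          if a * a + b * b == c * c]

    def divisible(n):
        """Can 1..n be split into two pots so that no pot contains a triple?"""
        ts = triples(n)
        for pots in product([0, 1], repeat=n):      # pots[i - 1] = pot of i
            if all(len({pots[a-1], pots[b-1], pots[c-1]}) > 1 for a, b, c in ts):
                return True
        return False

    print(divisible(10))  # True: keep 3, 4, 5 apart and 6, 8, 10 apart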
The Limits of Propositional Logic We have thus seen that (propositional) logic can be used to solve problems that exceed human capabilities. In fact, the pioneers of symbolic AI considered logic a central vehicle to describe and simulate human thinking. However, apart from the problem of combinatorial explosion outlined above, another obstacle has arisen here. Human thinking does not always follow (classical) logical steps; we have to deal with uncertainties, process contradictory information, or even revise conclusions once made. In fact, in classical logic, it is already immensely complex to represent plausible facts like “if I put block A on B, the position of all other blocks remains unchanged”; see Hayes (1973). Against this background, in the 1970s and 1980s, symbolic AI was centrally concerned with other types of logic systems that allow formalizations of “common-sense reasoning.” The numerous varieties cannot be enumerated here comprehensively, but it should not remain unmentioned that they are today often subsumed under the term “knowledge representation and reasoning” (van Harmelen et al., 2008) and offer a rich portfolio of methods that may prove relevant in future AI applications, in particular when it comes to explainability.

2.2 Machine Learning

General Considerations The defining characteristic of algorithms in machine learning (ML) is that they are self-learning, meaning that the algorithm improves itself, or learns, using data. Classical chess programs, for example, were explicitly programmed using rules that describe the advantage or disadvantage a player has in terms of points: taking a rook is worth about five points, dominating the center of the board is advantageous, and so on. Self-learning algorithms, by contrast, draw their own conclusions by watching many chess games; there is no programmer who tunes built-in rules. Hence, in ML, the availability of larger and larger data sets makes time-consuming and error-prone fine-tuning of internal rules or parameters of the algorithm superfluous.

In other words, the machine learns, while the human designs the learning algorithm. It was already recognized at the Dartmouth Workshop in 1956 that self-improvement and self-learning are central notions of intelligence.
In the modern view of ML, the data that are used for learning are supposed to be
drawn from a probability distribution. Therefore, any learning is stochastic by
nature, which gives rise to fundamental considerations. Because the number of
data samples is always finite, although it may be huge, we may never observe
samples that are important, or we may observe samples that are not representative.
The first issue means our learning result can only be probably correct. The second
issue means that our learning results can only be approximately correct. Therefore,
the best learning results are “probably approximately correct” (PAC) statements
about the quality of a learned result.
To illustrate these considerations, let us consider the example of searching for black swans. The black swan (Cygnus atratus) lives in southeastern and southwestern Australia. We must sample the whole space, but if the number of samples is insufficient, we may never encounter a black swan. This is the first issue. The second issue is that the first black swans that we encounter may have an uncharacteristically light color, misleading us in our approximation of their color.
ML is a large field, but it broadly consists of supervised learning, unsupervised
learning, and reinforcement learning. We consider these three large subfields in turn
and mention some important applications of ML.
Supervised Learning Supervised learning (SL) is concerned with finding functions that correctly classify or predict an output value given an input value. These functions are called classifiers or predictors and are chosen from a predefined class of functions, parameterized by parameters to be learned. For example, in image recognition, the inputs are the pixels of an image, and the output may be whether an object that belongs to a certain class of objects (cats, dogs, etc.) is visible or whether the image satisfies a certain property. In SL, the learning algorithm uses training data that consist of inputs and outputs, hence the name. The outputs are often called labels; e.g., an input sample may be a photo of a dog, and the corresponding output may be the label “dog.” In classification tasks, the set of all outputs is finite, whereas in prediction (regression) tasks, the set of all outputs is infinite (e.g., the real numbers).
Many algorithms have been developed for SL, and we mention some of the most
important ones: artificial neural networks (ANN), decision trees, random forests,
ensemble learning, k-nearest neighbor, Bayesian networks, hidden Markov models,
and support vector machines.
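As a minimal illustration of supervised learning, here is a k-nearest-neighbor classifier sketched in Python: “training” merely stores the labeled samples, and a new input is assigned the majority label among its k closest training points. The toy data are invented for the example:

    import math
    from collections import Counter

    def knn_classify(train, x, k=3):
        """Classify x by majority vote among the k nearest labeled samples."""
        nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    train = [((1.0, 1.1), "cat"), ((0.9, 1.0), "cat"),
             ((3.0, 3.2), "dog"), ((3.1, 2.9), "dog")]
    print(knn_classify(train, (1.2, 0.9)))  # prints "cat"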
Without doubt, nowadays the most prominent approach to SL is the use of ANNs as classifiers/predictors (Heitzinger, 2022, Chapter 13). ANNs are functions that are arranged in layers, where linear functions alternate with pointwise applied nonlinear functions, the so-called activation functions (see Fig. 1). ANNs have a long history, having already been discussed at the Dartmouth Workshop in 1956. A first breakthrough was the backpropagation algorithm (which is automatic backward differentiation), because it enabled the efficient training of ANNs.
Why are ANNs so successful? Although classification is a discrete problem,
ANNs are differentiable functions, and, as such, they have gradients, which are the
directions of fastest change of a function. Knowing this direction is extremely useful
for solving optimization problems, as the gradient provides a useful search direction.
For training in SL, it is hence expedient to use the gradient of the classifier/predictor
in order to solve the error minimization problem. In ANNs, calculating the gradient is surprisingly fast thanks to the backpropagation algorithm, taking only about twice as long as evaluating the ANN itself.
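The following sketch puts these pieces together: a tiny two-layer ANN, trained by gradient descent with hand-coded backpropagation on the toy XOR problem. The layer sizes and learning rate are arbitrary choices made for this illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)  # inputs
    T = np.array([[0], [1], [1], [0]], float)              # target labels (XOR)

    W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)  # hidden layer parameters
    W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)  # output layer parameters
    sigmoid = lambda z: 1 / (1 + np.exp(-z))

    for step in range(5000):
        H = np.tanh(X @ W1 + b1)                   # forward pass, hidden layer
        Y = sigmoid(H @ W2 + b2)                   # forward pass, output layer
        dZ2 = 2 * (Y - T) / len(X) * Y * (1 - Y)   # backpropagate the error
        dZ1 = (dZ2 @ W2.T) * (1 - H ** 2)
        for p, g in ((W1, X.T @ dZ1), (b1, dZ1.sum(0)),
                     (W2, H.T @ dZ2), (b2, dZ2.sum(0))):
            p -= 0.5 * g                           # gradient-descent step

    print(Y.round(2).ravel())  # approximately [0, 1, 1, 0]

Frameworks such as PyTorch compute these gradients automatically; the hand-coded backward pass above is just the backpropagation algorithm spelled out for two layers.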
ANNs are very flexible data structures, and many different ones have been
employed, since the number of layers and their sizes can or must be adjusted to
the SL problem at hand.
If the number of layers is small but the sizes of the layers become large, any continuous function can be approximated arbitrarily well; this is the famous universal approximation property of ANNs. However, this property of wide ANNs is misleading as a guide to practice: increasing the number of layers helps image recognition and many other applications, resulting in deep, not wide, ANNs. This is the main observation behind deep learning, which is learning using deep ANNs.

Fig. 1 This schematic diagram shows how an ANN works. On the left-hand side, a vector of real
numbers is the input to the ANN. In the hidden layers, whose number is the depth of the network,
the input vector is transformed until in the final output layer, the output vector is calculated. In each
hidden layer, the previous vector is multiplied by a matrix (the weights), another vector (the bias) is added, and a nonlinear function (the activation function) is applied element-wise. All the weight matrices and bias vectors must be adjusted such that the ANN solves the given classification/prediction problem. The arrows indicate how one parameter influences other parameters. In this example, the output vector consists of three numbers; the largest of the three signifies one of the three classes if the network is used as a classifier. (Figure from Heitzinger (2022, Chapter 13))

A recent breakthrough development is the transformer, a kind of ANN that uses the so-called attention mechanism to learn relationships between words across long distances in a text. Transformers originated in machine translation (Vaswani et al., 2017), yielding the best and fastest machine translation at that time. They were adapted for use in InstructGPT, ChatGPT, and GPT-4 and are a milestone in natural language processing.
The attention mechanism solves two main challenges in natural language
processing, both for translation and for text generation. The first challenge is that
natural language presupposes a lot of background knowledge—or a world model or
common sense—in order to make sense of ambiguities in natural language. The
second challenge is the use of pronouns and other relationships between words,
sometimes over large distances in a text. The attention mechanism addresses both
challenges surprisingly well and can learn the grammar of most natural languages.
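A sketch of the core operation, scaled dot-product attention, shows its simplicity; in self-attention, every position queries every other position, which is what lets the model relate words over arbitrary distances (the shapes and random data below are for illustration only):

    import numpy as np

    def attention(Q, K, V):
        """Scaled dot-product attention for (sequence_length, d) arrays."""
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise similarity scores
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)        # softmax over all positions
        return w @ V                              # weighted mixture of values

    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 16))     # 5 token embeddings of dimension 16
    print(attention(x, x, x).shape)  # self-attention output: (5, 16)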
Unsupervised Learning In contrast to SL, there are no outputs in unsupervised
learning (UL). In UL, the learning task is to find patterns in input samples from
untagged or unlabeled data. Often, the input samples are to be grouped into classes
according to their features, or relationships between the samples are to be found.
These relationships are often expressed as graphs or by special kinds of ANNs
such as autoencoders.
Common approaches in UL are clustering methods, anomaly detection methods, and learning latent variable models. Clustering methods include hierarchical clustering, k-means, and mixture models. An example of a clustering problem is taking a set of patients and clustering them into groups or clusters according to the similarities of their current states. Anomaly detection methods include local outlier factors and isolation forests. Latent variables can be learned by the expectation-maximization algorithm, the method of moments, and blind signal separation techniques.
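As a small sketch of unsupervised learning, the classic k-means method can be written in a few lines: it alternates between assigning each (unlabeled) sample to its nearest centroid and moving each centroid to the mean of its assigned samples. The synthetic data form two well-separated clouds:

    import numpy as np

    def kmeans(X, k, steps=20, seed=0):
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), k, replace=False)]
        for _ in range(steps):
            # assign every sample to its nearest centroid
            labels = ((X[:, None] - centroids) ** 2).sum(-1).argmin(1)
            # move every centroid to the mean of its assigned samples
            centroids = np.array([X[labels == j].mean(0) for j in range(k)])
        return labels

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
    print(np.bincount(kmeans(X, 2)))  # roughly [50, 50]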
Reinforcement Learning Reinforcement learning (RL) is the subfield of machine
learning that is concerned with finding optimal policies to control environments in
time-dependent settings (Sutton & Barto, 2018). In each time step, the actions of the
agent influence the environment (and possibly the agent itself), and the new state of
the environment and a reward are then communicated to the agent. The learning task
is to find optimal policies that maximize the expected value of the return, i.e., the
discounted sum of all future rewards that the agent will receive while following a
policy.
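The return can be written as G = r₁ + γr₂ + γ²r₃ + …, where the discount factor γ lies between 0 and 1. A minimal sketch, with an invented reward sequence, makes the definition concrete:

    def discounted_return(rewards, gamma=0.99):
        """G = r_1 + gamma*r_2 + gamma^2*r_3 + ..., computed backwards."""
        g = 0.0
        for r in reversed(rewards):  # G_t = r_{t+1} + gamma * G_{t+1}
            g = r + gamma * g
        return g

    # a game lost after ten moves: every reward is 0 except the final -1
    print(discounted_return([0] * 9 + [-1]))  # -0.99**9, about -0.914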
RL is a very general concept that includes random environments as well as
policies whose actions are random. It encompasses all board games such as Go,
chess, and backgammon, where the rewards are non-zero typically only at the end of
the game. The agent receives a reward of +1 for winning, a reward of -1 for losing,
and a reward of 0 for a draw. Other applications occur in robotics, user interactions at
websites, finance, autonomous driving, medicine (Böck et al., 2022), etc.
Reinforcement learning problems are particularly hard when much time passes between taking an action and receiving the rewards due to this action or a combination of actions; this is the so-called credit assignment problem.
In deep reinforcement learning, deep neural networks are used to represent the policies and the so-called action-value functions. In this context, deep neural networks serve as powerful function approximators in infinite state (and action) spaces. In distributional reinforcement learning, an extension of the classic approach, not only is the expected value of the return maximized, but the whole probability distribution of the return is calculated. This makes it possible to know the risk associated with an action that may be taken in a given state.
An early success of RL in the 1990s was mastering the board game of backgammon, a game with a huge search tree and a large random component (Tesauro, 1995). From the 2010s until today, reinforcement learning has been the field that enabled a string of milestones in the history of AI. The series of publications (Silver et al., 2016, 2017, 2018) showed, with progressively simpler but at the same time more powerful algorithms, that Go, chess, shogi, and a large collection of Atari 2600 games can be mastered by self-learning algorithms. Quite impressively, a single algorithm, AlphaZero, can learn to play these games at superhuman level. It also learns starting from zero knowledge (tabula rasa), hence the “Zero” in its name.
In the following years, more complicated computer games were solved by similar approaches. Computer games and card games such as poker pose their own challenges, as they contain a considerable amount of hidden information, while all state information is observable by the agent in board games such as chess and Go.
RL is also the reason that InstructGPT (Ouyang et al., 2022), a precursor, and ChatGPT/GPT-4 (OpenAI, 2023) work so well. A generative pre-trained transformer (GPT), having been trained on vast amounts of text, can generate beautiful text, but it is hard to make it give helpful answers.
The final, but crucial, step in training InstructGPT and ChatGPT is reinforcement learning from human feedback (RLHF) (Ouyang et al., 2022), where four answers to a prompt are ordered by human raters and these orderings are used to train the reward model for the final RL training step. This RL training step aims to align the language model with the needs of the user.
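One common way to turn such orderings into a reward model (following Ouyang et al., 2022) is a pairwise ranking loss: whenever answer A is ranked above answer B, the model is penalized by -log sigmoid(r(A) - r(B)), pushing the score of A above that of B. The sketch below applies this loss to made-up scores; in practice, r is itself a large neural network:

    import numpy as np

    def ranking_loss(scores_in_preference_order):
        """Mean pairwise loss -log sigmoid(r_i - r_j) over ranked pairs i < j."""
        r = np.asarray(scores_in_preference_order, float)
        return np.mean([np.log1p(np.exp(-(r[i] - r[j])))
                        for i in range(len(r)) for j in range(i + 1, len(r))])

    # reward-model scores for four answers, ordered best to worst by a human
    print(ranking_loss([2.0, 1.1, 0.3, -0.8]))  # small loss: order respected
    print(ranking_loss([-0.8, 0.3, 1.1, 2.0]))  # large loss: order violated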
The needs of the user are essentially the 3H (OpenAI, 2023). The first H is for
honest; the language model should give honest/correct answers. (Truthful or correct
would be better names, as the inner belief of the language model is unknown.) The
second H is for helpful; the answers should be helpful and useful. The third H is for
harmless; the system should not give any answers that may cause harm. Unfortunately, these three goals are very difficult to achieve in practice and can even be contradictory: if we ask a language model how to rob a bank, it cannot be helpful and harmless at the same time. Much of the ongoing research on AI safety is concerned with satisfying the 3H requirements.
Applications of ML Because its algorithms are very versatile, ML has found many applications, and its range of applications is still expanding. Due to the speed with which ML algorithms can be applied, and because they have reached near-human or superhuman capabilities in many areas, they have become practically important across industries.
Applications include bioinformatics, computer vision, data mining, earth sciences, email filtering, natural language processing (grammar checkers, handwriting recognition, machine translation, optical character recognition, speech recognition, text-to-speech synthesis), pattern recognition (facial recognition systems), recommendation systems, and search engines.

2.3 Combination of Methods

Human intelligence evidently relies on different kinds of cognitive tasks, and the separation into fast and slow thinking (Kahneman, 2011) is a popular way to classify them. Fast thinking refers to automatic, intuitive, and unconscious tasks (e.g., pattern recognition), while slow thinking describes conscious tasks such as planning, deduction, and deliberation. Machine learning is the natural method to simulate fast thinking, while symbolic approaches are better suited to problems related to slow thinking. Consequently, the combination of both approaches is seen as the holy grail for next-level AI systems.
In recent years, the term neuro-symbolic AI has been established to name this line of research. However, it comes in many different flavors, and we shall just list a few of them. First, the famous AlphaGo system is often mentioned as a prototypical system in this context: the symbolic part is Monte Carlo tree search to traverse the search space (recall our consideration of chess in Sect. 2.1), but the board evaluation is done via ML techniques (in contrast to Deep Blue, where the board evaluation was explicitly coded and designed by experts). A second branch are neuro-symbolic architectures where the neural nets are generated from symbolic rules (for instance, graph neural networks, GNNs). Finally, approaches like DeepProbLog offer a weak coupling between the neural and the symbolic part: essentially, deep neural networks are treated as predicates that can be incorporated, with an approximate probability distribution over the network output, into logical rules, and semantic constraints can be utilized for guided gradient-based learning of the network. However, it has to be mentioned that such hybrid architectures do not immediately lead to human-like cognitive capabilities, let alone consciousness or self-awareness.

3 Reflections
3.1 AI4Good

Through the lens of digital humanism, one might ask where AI provides us with a valuable tool to support human efforts toward solutions to vexing problems. Such applications are often termed “AI for Good,” and there are indeed many examples where we benefit from AI. They range from medicine (treatments, diagnosis, early detection, cancer screening, drug design, etc.) to the identification of hate speech or fake news (cf. the chapter by Prem and Krenn) and tools for people with disabilities. A more subtle domain is climate change: while AI techniques can be used to save energy, control herbicide application, and much more, AI itself requires a considerable amount of energy (in particular in the training phase). For a thorough discussion of this important topic, we refer to Rolnick et al. (2023).

3.2 Is ChatGPT a Tipping Point?

ChatGPT is without doubt a major milestone in the history of AI. It is the first system
that can interact in a truly helpful manner with users, as demonstrated by its scores
on many academic tests (Eloundou et al., 2023). It shows surprising powers of
reasoning, considering that it is a system for generating text. Its knowledge is
encyclopedic, since during learning, the GPT part has ingested vast amounts of
text, probably a good portion of all existing knowledge.
Interestingly enough, ChatGPT’s creativity is closely coupled to its so-called temperature parameter and can therefore be adjusted easily. During text generation, the next token (roughly, a syllable) is chosen from an ordered list of likely continuations. At a low temperature, only the first tokens on the list have a chance of being selected, but at a higher temperature, tokens further down the list also stand a chance. Thus, a higher temperature during text generation increases creativity. Again, the Dartmouth Workshop turned out to be prescient, since creativity in the context of intelligence was a major topic discussed there.
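A sketch of this sampling step, with invented scores (logits) for four candidate tokens, shows the effect: dividing the scores by a low temperature concentrates the softmax distribution on the top of the list, while a high temperature flattens it:

    import numpy as np

    def next_token_distribution(logits, temperature=1.0):
        """Softmax with temperature over the candidate tokens' scores."""
        z = np.asarray(logits, float) / temperature
        p = np.exp(z - z.max())
        return p / p.sum()

    logits = [4.0, 3.5, 2.0, 0.5]  # invented scores for four candidate tokens
    print(next_token_distribution(logits, 0.2).round(3))  # mass on the top token
    print(next_token_distribution(logits, 2.0).round(3))  # much flatter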
ChatGPT can also be used to solve mathematical or logical problems. As a program for generating text, however, it is no substitute for specialized programs such as computer algebra systems and SAT solvers. But it is straightforward to couple such specialized programs to ChatGPT: ChatGPT can be trained to become proficient in using programming languages, and therefore it can generate input for those coupled programs. We expect that such interfaces will become more and more refined and will gain importance, supplementing ChatGPT’s capabilities with procedural domain-expert knowledge.
Therefore, we predict that ChatGPT will revolutionize human-computer interfaces (HCIs). The reason is that it can act as a knowledgeable translator of the user’s intention to a vast array of specialized programs, reducing training time for human users and resulting in interactive usage with lots of guidance and easily accessible documentation.
Its potential for revolutionizing HCIs in this sense may well turn out to be its true power and a tipping point, but its effects on society should not be underrated, and critical reflection across different disciplines is needed.²

² See https://dighum.ec.tuwien.ac.at/statement-of-the-digital-humanism-initiative-on-chatgpt/.

3.3 Pressing Issues

Media often associate the danger of AI with robot apocalypses of various kinds. However, we see the main issue in the ever-growing use of data (see Sect. 2.2) to train more and more advanced AI models. This leads to several problems related to copyright, privacy and personalized systems, and low-cost workers for labeling data and training the models. Due to limited space, we will not discuss other important issues here, such as education and the impact of AI on the working world.
In fact, ChatGPT has been fueled with sources such as books, (news) articles, websites, posts, and even comments from social networks to perform its core function as a dialogue system. No one has asked us whether we agree that our research papers, blog articles, or comments in social media shall be used to train such an AI. While copyright issues of this kind have always been pressing on the Web, the use of private data becomes even more problematic with personal AI assistants. We know that social media platforms are designed to keep users on the platform (and thus to present more advertisements) using data about their personal preferences, browser history, and so on. As side effects, we have seen so-called filter bubbles, echo chambers, etc., leading to political polarization and undermining democratic processes in the long run.
All these effects have the potential to be multiplied when AI assistants start to use knowledge about their users to give the answers they want to hear, supporting them in radical views, etc. We should have learned our lessons and be extremely careful in feeding AI systems with personal data! Finally, it should not be overlooked that AI often relies on hidden human labor (often in the Global South) that can be damaging and exploitative; for instance, these workers have to label hate speech, violence in pictures and movies, and even child pornographic content. For ChatGPT, it has been revealed that the fine-tuning of the system in order to avoid toxic answers was delegated to outsourced Kenyan laborers earning less than $2 per hour.³ For the first time, this led to some media echo in this respect, thus raising awareness among a broader public. However, this particular problem is not a new one and seems inherent to the way advanced AI systems are built today (Casilli, 2021).

³ https://time.com/6247678/openai-chatgpt-kenya-workers/

4 Conclusions

There are three main factors that have resulted in the current state of the art of AI. The first is the availability of huge data sets and databases for learning purposes, due in part to the Internet. This includes all modalities, e.g., labeled images for supervised learning and large collections of high-quality texts for machine translation. The second is the availability of huge computational resources for learning, in particular graphics cards (GPUs) and clusters of GPUs for calculations with ANNs. The third factor is algorithmic advancements and software tools. While ANNs have appeared throughout the history of AI, new structures and new gradient-descent algorithms have been and still are instrumental to applications. Other examples are the advancements in RL algorithms and SAT solvers.
A division of AI into symbolic AI on the one hand and machine learning or non-symbolic AI on the other hand can also be viewed as a division of all methods employed in AI into discrete and continuous methods. Here, continuous methods are methods that use the real numbers or vectors thereof, while discrete methods do not and often focus on logic and symbolic knowledge. Furthermore, many problems in AI can be formulated as (stochastic) optimization problems; for example, in supervised learning, an error is to be minimized, and in reinforcement learning, optimal policies are sought.
Among optimization problems, continuous optimization problems can be solved
much more efficiently than discrete optimization problems due to the availability of
the gradient, which indicates a useful search direction and which is the basis of the
fundamental gradient-descent and gradient-ascent algorithms. Thus, the formulation
of learning problems as problems in continuous optimization has turned out to be
tremendously fruitful. An example is image classification, a problem in supervised
learning, which is discrete by its very nature: the question whether an image shows a
dog has a discrete answer. Using ANNs and a softmax output, this discrete problem
is translated into a continuous one, and training the ANN benefits from gradient
descent.
Since the Dartmouth Workshop, AI has seen tremendous, albeit nonlinear,
progress. Throughout the history of AI, we have witnessed AI algorithms becoming able to replicate more and more capabilities that were previously unique to human minds, in many cases surpassing human capabilities. ChatGPT is the most recent example, revolutionizing how AI deals with natural language. It is remarkable that it can compose poems much better than nearly all humans. Also, systems such as AlphaZero and ChatGPT took many people, including AI researchers, by surprise.
We expect these developments and the quest for superhuman capabilities to
continue. The recent breakthroughs will see some consolidation in the sense that
learning algorithms will become more efficient and better understood. At the same
time, many open questions and challenges remain, and the three driving factors of AI
discussed at the beginning of this section will remain active.
Research will continue at a fast pace, and more and more human capabilities will be matched and surpassed. The defining characteristic of humans has always been that we are the smartest entities and the best problem-solvers. This defining characteristic is eroding. It will be up to us to improve the human condition and to answer the philosophical question of what makes us human; it will not be our capabilities alone.
Discussion Questions for Students and Their Teachers
1. Which are, in your opinion, the major opportunities and positive effects of AI
technology?
2. Provide a list of cognitive tasks humans are capable of performing, and discuss which AI method would be the one to solve each of them.
3. Which are, in your opinion, the major risks of AI technology?
4. Which types of questions can be answered well by large language models such as
ChatGPT? Which cannot be answered well?
5. For which types of questions and in which areas do you trust the answers of large
language models such as ChatGPT?
6. What do you expect to use computers for in 5 years’ time for which you are not
using them nowadays? In 10 years’ time?
7. In their book Why Machines Will Never Rule the World, Jobst Landgrebe and Barry Smith argue that human intelligence is a capability of a complex dynamic system that cannot be modeled mathematically in a way that allows it to operate inside a computer (see also the interview at https://www.digitaltrends.com/computing/why-ai-will-never-rule-the-world/). Find arguments for and against their claim.
8. For a provocative article on machine learning and its limits, see Darwiche (2018).
Discuss this article in the light of recent developments.
Learning Resources for Students
1. Marcus, G., and Davis, E. (2019) Rebooting AI—Building Artificial Intelligence
We Can Trust. Pantheon.
This is a popular science book by a psychologist and a computer scientist; it
offers an analysis of the current state of the art and discusses the need for robust,
trustworthy AI systems.
2. Russell, S.J., and Norvig, P. (2021) Artificial Intelligence, a Modern Approach.
4th edition. Pearson.
This is a standard textbook on artificial intelligence, which comprises 7 parts (artificial intelligence; problem-solving; knowledge, reasoning, and planning; uncertain knowledge and reasoning; machine learning; communicating, perceiving, and acting; conclusions) on more than one thousand pages. The two authors, highly accomplished researchers, provide comprehensive treatments of all major strands of AI.

References

Böck, M., Malle, J., Pasterk, D., Kukina, H., Hasani, R., & Heitzinger, C. (2022). Superhuman
performance on sepsis MIMIC-III data by distributional reinforcement learning. PLoS One,
17(11), e0275358. https://doi.org/10.1371/journal.pone.0275358
Casilli, A. (2021). Waiting for robots: The ever-elusive myth of automation and the global
exploitation of digital labor. Sociologias, 23(57), 112–133.
Darwiche, A. (2018). Human-level intelligence or animal-like abilities? Communications of the
ACM, 61(10), 56–67.
Eloundou T., et al. (2023). GPTs are GPTs: An early look at the labor market impact potential of
large language models. arXiv:2303.10130.
Ganesh, V., & Vardi, M. Y. (2020). On the unreasonable effectiveness of SAT solvers. In Beyond
the worst-case analysis of algorithms (pp. 547–566). Columbia University Press.
Hayes, P. (1973). The frame problem and related problems in artificial intelligence. University of
Edinburgh.
Heitzinger, C. (2022). Algorithms with Julia (1st ed.). Springer.
Heule, M. J. H., Kullmann, O., & Marek, V. W. (2016). Solving and verifying the Boolean Pythagorean triples problem via cube-and-conquer. In Proceedings of SAT 2016 (pp. 228–245).
Kahneman, D. (2011). Thinking, fast and slow. Farrar, Straus and Giroux.
Kautz, H. A., & Selman, B. (1992). Planning as satisfiability. In Proceedings of ECAI 1992 (pp. 359–363).
OpenAI. (2023). GPT-4 technical report. arXiv:2303.08774.
Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback.
arXiv:2203.02155.
Rolnick, D., Donti, P. L., Kaack, L. H., Kochanski, K., Lacoste, A., Sankaran, K., Ross, A. S.,
Milojevic-Dupont, N., Jaques, N., Waldman-Brown, A., Luccioni, A. S., Maharaj, T., Sherwin,
E. D., Mukkavilli, S. K., Kording, K. P., Gomes, C. P., Ng, A. Y., Hassabis, D., Platt, J. C.,
Creutzig, F., Chayes, J. T., & Bengio, Y. (2023). Tackling climate change with machine
learning. ACM Computing Surveys, 55(2), 42.1–42.96.
Silver, D., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.
Silver, D., et al. (2017). Mastering the game of Go without human knowledge. Nature, 550, 354–359.
Silver, D., et al. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362, 1140–1144.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). MIT Press.
Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM,
38(3). https://doi.org/10.1145/203330.203343
van Harmelen, F., Lifschitz, V., & Porter, B. W. (Eds.). (2008). Handbook of knowledge representation. Elsevier.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. arXiv:1706.03762.
Walsh, T. (2017). The singularity may never be near. AI Magazine, 38(3), 58–62.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons license and
indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative
Commons license, unless indicated otherwise in a credit line to the material. If material is not
included in the chapter's Creative Commons license and your intended use is not permitted by
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder.
