From Machine Learning to Explainable AI

Andreas Holzinger
Holzinger Group, HCI-KDD, Institute for Medical Informatics, Statistics & Documentation
Medical University Graz, Austria
and
Institute of Interactive Systems & Data Science, Graz University of Technology, Austria
a.holzinger@hci.kdd.org

Keynote Talk at the IEEE DISA 2018 Conference, Košice, August 23, 2018

Abstract—The success of statistical machine learning (ML) methods made the field of Artificial Intelligence (AI) popular again after the last AI winter. Meanwhile, deep learning approaches even exceed human performance in particular tasks. However, besides needing big quality data, much computational power and engineering effort, such approaches have further disadvantages: they are becoming increasingly opaque, and even if we understand the underlying mathematical principles of such models, they still lack explicit declarative knowledge. For example, words are mapped to high-dimensional vectors, making them unintelligible to humans. What we need in the future are context-adaptive procedures, i.e. systems that construct contextual explanatory models for classes of real-world phenomena. This is the goal of explainable AI, which is not a new field; rather, the problem of explainability is as old as AI itself. While the rule-based approaches of early AI were comprehensible "glass-box" approaches, at least in narrow domains, their weakness lay in dealing with the uncertainties of the real world. One step further may lie in linking probabilistic learning methods with large knowledge representations (ontologies) and logical approaches, thus making results re-traceable, explainable and comprehensible on demand.

I. INTRODUCTION

This talk is divided into seven parts: 1) I will start by explaining the HCI-KDD approach towards integrative machine learning (ML); 2) I will continue by discussing the importance of understanding intelligence and 3) show very briefly our application domain, health, where it becomes clear why 4) dealing with uncertainty is important. A quick journey through some successful applications of 5) automatic machine learning (aML) will bring us to their limitations and let us understand that sometimes a human-in-the-loop can be beneficial. The discussion of 6) interactive machine learning (iML) will lead directly to the topic of 7) explainable AI; I will finish the talk by outlining some future directions.

II. WHAT IS THE HCI-KDD APPROACH?

ML is a very practical field. Algorithm development is at the core; however, successful ML requires a concerted effort of various experts with diverse backgrounds. Such a field needs an integrated approach: Integrative Machine Learning [1] is based on the idea of combining the best of the two worlds dealing with understanding intelligence, which is manifested in the HCI-KDD approach [2, 3, 4]: Human-Computer Interaction (HCI), rooted in cognitive science and particularly dealing with human intelligence, and Knowledge Discovery/Data Mining (KDD), rooted in computer science and particularly dealing with artificial intelligence. This approach fosters a complete machine learning pipeline beyond algorithm development. It includes knowledge extraction, ranging from issues of data preprocessing, data mapping and data fusion of heterogeneous and high-dimensional data sets, up to the visualization of the results in a dimension accessible to a human end-user, making data interactively accessible and manipulable. Thematically, this Machine Learning & Knowledge Extraction (MAKE) pipeline encompasses seven sections ([5], [6]):

Section 1: Data: data preprocessing, integration, mapping, fusion. This starts with understanding the physical aspects of raw data and fostering a deep understanding of the data ecosystem, particularly within an application domain. The quality of the data is of utmost importance.

Section 2: Learning: algorithms. The core section deals with all aspects of learning algorithms: the design, development, experimentation, testing and evaluation of algorithms generally, and their application to application domains specifically.

Section 3: Visualization: data visualization, visual analysis. At the end of the pipeline there is a human, who is limited to perceiving information in dimensions ≤ 3. It is a hard task to map the results, gained in arbitrarily high-dimensional spaces, down to the lower dimensions, ultimately to R^2.

Section 4: Privacy: Data Protection, Safety & Security. With worldwide increasing demands from data protection laws and regulations (e.g., the new European Union data protection directives), privacy-aware machine learning becomes a necessity, not an add-on. New approaches, e.g., federated learning and glass-box approaches, will be important in the future. However, all these topics need a strong focus on usability, acceptance and also social issues.

Section 5: Network Science: Graph-Based Data Mining. Graph theory provides powerful tools to map data structures and to find novel connections between data objects, and the inferred graphs can be further analyzed by using graph-theoretical, statistical and ML techniques.

Section 6: Topology: Topology-Based Data Mining. The
most popular techniques of computational topology include homology and persistence, and their combination with ML approaches would have enormous potential for solving many practical problems.

Section 7: Entropy: Entropy-Based Data Mining. Entropy can be used as a measure of uncertainty in data and thus provides a bridge to theoretical and practical aspects of information science (e.g., the Kullback-Leibler divergence as a distance measure between probability distributions).

I will explain our HCI-KDD logo in more detail later on, but before that we shall talk briefly about understanding intelligence.

III. UNDERSTANDING INTELLIGENCE

"Solve intelligence - then solve everything else" ... if I said this, my students would not believe it; therefore it is not I who says this, but it is the official motto of Google Deepmind (see e.g. the talk by Demis Hassabis from May 22, 2015).

Now let me explain our HCI-KDD logo1 in more detail: Augmenting human intelligence (left) with artificial intelligence (right) means mapping results from high-dimensional spaces into the lower dimensions [3]. The logo shall indicate the connection between Cognitive Science and Computer Science: Cognitive Science studies the principles of human intelligence and human learning [7]. Our natural surroundings are in R^3, and humans are excellent at perceiving patterns in data sets with dimensions ≤ 3. In fact, it is amazing how humans learn and extract so much knowledge even from little or incomplete data [8]. This is a strong motivator for the concept of interactive Machine Learning (iML), i.e., using the experience, knowledge, even the intuition of humans to help solve problems which would otherwise remain computationally intractable. However, in most application domains, e.g., in health informatics, we are challenged with data of arbitrarily high dimensions [9]. Within such data, relevant structural and/or temporal patterns ("knowledge") are often hidden, and this knowledge is difficult to extract, hence not directly accessible to a human. There is a need to bring the results from these high dimensions into the lower dimensions for the human end user2 - the "customer" of ML/AI.

Computer Science studies the principles of computational learning from data to understand artificial intelligence [10]. Computational learning has been of general interest for a very long time, but we are far away from solving intelligence: facts are not knowledge and descriptions are not insight, and new approaches are needed. A challenge is to interactively discover unknown patterns within high-dimensional data sets. Computational geometry and algebraic topology may be of great help here [11]. For example, let us define M as a hidden parameter space and R^D as an observation space; let f : M → R^D be a continuous embedding, let X ⊂ M be a finite set of data points, and let Y = f(X) ⊂ R^D be the image of these points under the mapping f. Consequently, we may refer to X as the hidden data and Y as the observed data. If we suppose that M, f and X are unknown but Y is known, the question remains whether we can identify M [12]. Such questions are studied in Section 6 of the MAKE approach [13].

Consequently, to reach a level of usable intelligence, we need (1) to learn from prior data, (2) to extract knowledge, (3) to generalize, (4) to fight the curse of dimensionality, (5) to disentangle the underlying explanatory factors of the data [14], and (6) to understand the data in the context of an application domain. One grand challenge still remains open: to make sense of the data in the context of the application domain. The quality of data and appropriate features matter most, and previous work has shown that the best-performing methods typically combine multiple low-level features with high-level context [15].

We compare a DQN agent (Deep Q-learning; Q-learning is a kind of model-free reinforcement learning [16]) with the best reinforcement learning methods, where we normalize the performance of the DQN agent with respect to a professional human games tester (that is, the 100% level) and random play (that is, the 0% level). It can be seen that DQN outperforms competing methods in almost all the games, and performs at a level that is broadly comparable with or superior to a professional human games tester (operationalized as a level of 75% or above) in the majority of games. However, humans are still much better in certain games, and the question why remains open [7].

Robotics is often seen as the most feared threat of AI to mankind3. In this talk, I want to emphasize that humanoid AI ≠ human-level AI. The achievement of human-level machine intelligence has been the basic objective since the early days of AI. It was actually started by Alan Turing with his question Can machines think? [17]. Exaggerated expectations led to a bitter AI winter, and recently we have been feeling a real AI spring [18].

IV. APPLICATION AREA: HEALTH

Why is the application domain health complex? In medicine we have two different worlds: there is the science of medicine (mathematics, physics, physiology, biology, chemistry, etc.) at the bench, and there is clinical medicine, focusing on the patient at the bedside. The main problem is that there is a (big) gap between those two (and some say not only a big gap, but an ocean between those two worlds). How can we bridge this gap? Our central hypothesis is: information may bridge this gap. Not data. Not knowledge. It is the quality of information that both sides need for making decisions [19]. Optimally designed workflows, integrating sophisticated AI/ML with appropriate visualization [20] directly into the workplaces of medical professionals, may therefore be a great help for future medicine.

In the medical domain the number one problem is the bad quality of data, along with the heterogeneity of data [21]. Consequently, data integration, data fusion and data mapping

1 https://hci-kdd.org
2 Although this can range from tablet computers to large wall displays, the representation is always limited to R^2.
3 Intelligence does not need a metal body to be a threat.
is one of the most important issues of data science, and the combination of various data would enable new kinds of information, hence novel insights.

However, most of the data are in arbitrarily high dimensions; therefore we are always confronted with the curse of dimensionality [22].

A further problem is complexity of all sorts: clinical practice, organization and information management are interdependent and built around multiple self-adjusting and interacting systems. Here we are always confronted with unpredictability, non-linearity and non-homogeneity in time [23].

All these lead us to the fourth main problem: uncertainty.

V. DEALING WITH UNCERTAINTY: STATISTICAL LEARNING FROM BIG DATA

We now briefly rehearse the very basics of what makes current ML so successful. The principles are fascinatingly simple: Bayesian learning, optimization and prediction (inverse probability), which go back to Thomas Bayes and Richard Price [24]. I emphasize here that it was Pierre Simon de Laplace who did the pioneering work and delivered us the foundations of statistical machine learning [25], [26].

Let us consider n data contained in a set D:

D = x_{1:n} = {x_1, x_2, ..., x_n}

and let us write down the expression for the likelihood:

p(D|θ)    (1)

Next we specify a prior:

p(θ)    (2)

Finally we can compute the posterior:

p(θ|D) = p(D|θ) p(θ) / p(D)    (3)

The inverse probability allows us to learn from data, infer unknowns, and make predictions [27]. Most fascinating is the simplicity of this approach; we can add probabilities (sum rule):

p(x) = Σ_y p(x, y)    (4)

By introducing "repeated adding" (adding multiple times), we can write (product rule):

p(x, y) = p(x|y) p(y)    (5)

Laplace (in 1773!) showed that we can write:

p(x|y) p(y) = p(y|x) p(x)    (6)

and introduced a third operation (division):

p(x|y) p(y) / p(y) = p(y|x) p(x) / p(y)    (7)

now we can reduce this fraction by p(y), and we receive what is today called Bayes' rule (actually it was Laplace)4:

p(x|y) = p(y|x) p(x) / p(y)    (8)

And this is now the basis for machine learning, applied in all sorts of advanced techniques, based on adding and repeated adding, which is what our Von Neumann machines can do well. However, it is not that easy in high-dimensional spaces, and a grand challenge is: how to add efficiently.

In the following, H denotes the hypothesis space; decision making, for example, is searching for an optimal solution in an arbitrarily high-dimensional search space. However, in medicine we do not always need the optimal solution; often a good solution in a short time is better, because time is a very critical aspect!

If we denote the data by d and a hypothesis by h, with H = {H_1, H_2, ..., H_n}, then ∀(h, d):

P(h|d) = P(d|h) P(h) / P(d)    (9)

P(h|d) = P(d|h) P(h) / Σ_{h′∈H} P(d|h′) P(h′)    (10)

J = ∫ f(θ) p(θ|D) dθ    (11)

These astonishingly simple fundamentals led to the current success in automatic machine learning.

VI. AUTOMATIC MACHINE LEARNING (aML)

The ML community today is concentrating on automatic machine learning (aML) approaches, with the grand goal of bringing humans out of the loop [28], resulting in fully autonomous solutions. Maybe the best real-world example today is autonomous driving [29].

This automatic machine learning (aML) works well when having large amounts of training data [30]. That means the often debated "big data" issue is not bad; instead, the large amount of data is beneficial for automatic approaches (and I emphasize again the need for good quality data!).

However, sometimes we do not have large amounts of data, and/or we are confronted with rare events and/or hard problems. The health domain is a representative example of a domain with many such complex data problems [31, 32]. In such domains the application of fully automatic approaches ("press the button and wait for the results") seems elusive in the near future.

Again, a good example are Gaussian processes, where aML approaches (e.g., kernel machines [33]) struggle on function extrapolation problems which are astonishingly trivial for human learners [34].

A famous example was given by [35], who considered the problem of building high-level, class-specific feature detectors from unlabeled data, for detecting a cat automatically on the

4 To be completely correct, it should be called the Bayes-Price-Laplace (BPL) rule.
basis of 10 million 200×200-pixel internet images. They trained a deep autoencoder on a cluster with 1,000 machines (16,000 cores) for three days. The results impressively demonstrated that it is possible to train an image detector without labelling the images - but at what price.

A more recent example is the work by [36], who presented fully automated classification of skin lesions using dermatological images. They trained deep convolutional neural networks with a data set of 129,450 clinical images. They used a GoogleNet Inception v3 network, pretrained on approximately 1.28 million images (1,000 object categories) from the 2014 ImageNet Large Scale Visual Recognition Challenge. The authors tested the performance against 21 board-certified dermatologists on biopsy-proven clinical images with two critical binary classification use cases: keratinocyte carcinomas versus benign seborrheic keratoses, and malignant melanomas versus benign nevi. The automatic ML achieves performance on par with the 21 medical doctors, demonstrating that AI is actually capable of classifying skin cancer with a level of competence comparable to dermatologists.

Despite their impressive results, such automatic approaches have some limitations:
1) they are very data intensive and often need millions of training samples of the highest quality, which is hard to achieve;
2) they are non-convex, difficult to set up, difficult to train and to optimize, very error prone and sensitive to adversarial examples, and consequently need a lot of engineering effort;
3) they are affected by catastrophic forgetting, so when a problem changes only slightly, they lose the learned parameters; this calls urgently for transfer learning and multi-task learning;
4) they are very resource intensive, needing much computational power and storage;
5) they are bad at dealing with uncertainties; and
6) most of all, such approaches are considered to be black-box approaches: they lack transparency, therefore do not foster trust and acceptance, and legal aspects make such opaque models extremely difficult to use in certain situations.

So, sometimes we (still) need a human-in-the-loop. Sometimes we do not have the big data from which aML algorithms benefit; sometimes we have only small amounts of data (see e.g. [37]), or we have rare events or even no training samples, or we deal with NP-hard problems, e.g. subspace clustering, protein folding, or k-anonymization, to name three. This leads us directly to interactive machine learning.

VII. INTERACTIVE MACHINE LEARNING (iML)

However, we cannot compare car driving with the complexity of the biomedical domain. The main problem for automatic solutions is the extremely poor quality of data in this domain. Biomedical data sets are full of uncertainty, incompleteness etc.; they can contain missing, wrong, noisy, dirty and unwanted data. Moreover, many problems are computationally hard. All these constraints make the application of fully automated approaches difficult or even impossible. Also, the quality of results from automatic approaches might be questionable. Consequently, the integration of the knowledge, intuition and experience of a domain expert can sometimes be indispensable, and the interaction of a domain expert with the data would greatly enhance the whole ML pipeline. Hence, interactive machine learning (iML) puts the "human-in-the-algorithmic-loop" to enable what neither a human nor a computer could do on their own.

We define iML approaches as algorithms in a multi-agent hybrid system that can interact with both computational agents and human agents5 and can optimize their learning behaviour through these interactions [39].

Why should the integration of human intelligence be beneficial? One strength of humans is that they, even little children, can make inferences from little data (zero-shot learning). Their greatest strength is that they are able to recognize context. There is evidence that humans sometimes even outperform ML algorithms. Humans can provide almost instantaneous interpretations of complex patterns, for example in diagnostic radiologic imaging: a promising technique to fill the semantic gap is to adopt an expert-in-the-loop approach, integrating the physician's high-level expert knowledge into the retrieval process by acquiring his/her relevance judgments regarding a set of initial retrieval results [40].

Consequently, iML approaches, by integrating a human into the loop (e.g. a human kernel [41], or the involvement of a human directly in the machine-learning algorithm [39]), thereby making use of human cognitive abilities, seem to be promising. iML approaches can be of particular interest for solving problems in health informatics, where we lack big data sets and deal with complex data and/or rare events, where traditional learning algorithms suffer due to insufficient training samples. Here the doctor-in-the-loop can help, where human expertise and long-term experience can assist in solving problems which otherwise would remain NP-hard.

A recent experimental work [42] demonstrates the usefulness on the Traveling Salesman Problem (TSP), which appears in a number of practical problems; e.g., the native folded three-dimensional conformation of a protein in its lowest free energy state, and both 2D and 3D folding processes as a free energy minimization problem, belong to a large set of computational problems assumed to be conditionally intractable [43]. As the TSP is about finding the shortest path through a set of points, it is an intransigent mathematical problem, and many heuristics have been developed in the past to find approximate solutions [44]. There is evidence that the inclusion of a human can be useful in numerous other problems in different application domains, see e.g., [45, 46]. However, for clarification, iML means the integration of a human into the algorithmic loop, i.e., opening the black-box approach into a glass box. Other definitions also speak of a human-in-the-loop, but it is what

5 In Active Learning such agents are referred to as so-called "oracles" [38].
we would call classic supervised approaches [47], or, in a totally different meaning, putting the human into physical feedback loops [48].

In such cases the inclusion of a "doctor-into-the-loop" [49] can play a significant role in support of solving hard problems (see the examples below), particularly in combination with a large number of human agents (crowdsourcing). From the theory of human problem solving it is known that, for example, medical doctors can often make diagnoses with great reliability - but without being able to explain their rules explicitly. Here iML could help to equip algorithms with such "instinctive" knowledge and learn from it. The importance of iML also becomes apparent when the use of automated solutions is difficult due to the incompleteness of ontologies [50].

In the following I provide three examples where the human-in-the-loop is beneficial.

iML-Example 1: Subspace Clustering

Clustering is a descriptive task to identify homogeneous groups of data objects based on the dimensions. Clustering of large high-dimensional gene expression data sets has widespread application in -omics [51]. Unfortunately, the underlying structure of these natural data sets is often fuzzy, and the computational identification of data clusters generally requires domain expert knowledge about e.g. cluster number and geometry. The high dimensionality of data is a huge problem in health informatics because of the curse of dimensionality: with increasing dimensionality the volume of the space increases so fast that the available data becomes sparse; hence, it becomes impossible to find reliable clusters. Also, the concept of distance becomes less precise as the number of dimensions grows, since the distances between any two points in a given data set converge. Last but not least, different clusters might be found in different subspaces, so a global filtering of attributes is not sufficient. Given that large number of attributes, it is likely that some attributes are correlated; therefore clusters might exist in arbitrarily oriented affine subspaces. Moreover, high-dimensional data often include irrelevant features, which can obscure the relevant ones and thus increase the danger of modeling artifacts (i.e. undesired outcomes or errors which can be misleading or confusing) [52]. The integration of a human-in-the-loop can be of help here [53].

iML-Example 2: Protein Folding

Proteins6 are very important for all life sciences. In protein structure prediction there is much interest in using amino acid interaction preferences to align (thread) a protein sequence to a known structural motif. The protein alignment decision problem (does there exist an alignment (threading) with a score less than or equal to K?) is NP-complete, and the related problem of finding the globally optimal protein threading is NP-hard. Therefore, no polynomial-time algorithm is possible (unless P = NP). Consequently, the protein folding problem is NP-complete [55].

6 In my talk I provide an example from protein conformation, i.e. the x-ray structure of Avian Pancreatic Polypeptide (APP), which is a medium-size protein of 36 amino acids [54].

Many such problems (still) require an expert-in-the-loop, e.g., genome annotation, image analysis, knowledge-base population and protein structure. In some cases, humans are needed in vast quantities (e.g. in cancer research), whereas in others we need just a few very specialized experts in certain fields (e.g., in the case of rare diseases). Crowdsourcing encompasses an emerging collection of approaches for harnessing such distributed human intelligence. Recently, the bioinformatics community has begun to apply crowdsourcing in a variety of contexts, yet few resources are available that describe how these human-powered systems work and how to use them effectively in scientific domains. Generally, there are large-volume micro-tasks and highly difficult mega-tasks [56]. A good example of such an approach is Foldit, an experimental game which takes advantage of crowdsourcing for category discovery of new protein structures [57]. Crowdsourcing and collective intelligence (putting many experts into the loop) would generally offer much potential to foster translational medicine (bridging biomedical sciences and clinical applications) by providing platforms upon which interdisciplinary workforces can communicate and collaborate [58].

iML-Example 3: k-anonymization of patient data

Privacy-preserving machine learning is an important issue, fostered by anonymization, in which a record is released only if it is indistinguishable from k other entities in the data. k-anonymity is highly dependent on spatial locality in order to implement the technique effectively in a statistically robust way, and in high dimensions data becomes sparse; hence, the concept of spatial locality is not easy to define. Consequently, it becomes difficult to anonymize the data without an unacceptably high amount of information loss [59]. The problem of k-anonymization is thus on the one hand NP-hard; on the other hand, the quality of the result obtained can be measured by the given factors: k-anonymity means that attributes are suppressed or generalized until each row in a database is identical with at least k − 1 other rows [60], [61]; l-diversity, as an extension of the k-anonymity model, reduces the granularity of data representation by generalization and suppression so that any given record maps onto at least k other records in the data [62]; t-closeness is a refinement of l-diversity, reducing the granularity of a data representation and treating the values of an attribute distinctly by taking into account the distribution of data values for that attribute [63]; and delta-presence links the quality of anonymization to the risk posed by inadequate anonymization [64]. These measures, however, say nothing about the actual security of the data, i.e., re-identification by an attacker. For this purpose certain assumptions about the background knowledge of the hypothetical adversary must be made. With regard to the particular demographic and cultural clinical environment, this is best done by a human agent. Thus, the problem of (k-)anonymization represents a natural application domain for iML.
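The k-anonymity criterion above (a row may be released only if it agrees with at least k − 1 other rows on its quasi-identifiers) can be sketched in a few lines of Python. This is a generic illustration only, not the system discussed in the talk; the toy table, the choice of quasi-identifiers and the `generalize_age` coarsening step are hypothetical:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """True if every combination of quasi-identifier values
    occurs at least k times in the released table."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())

def generalize_age(row):
    """Hypothetical generalization step: coarsen an exact age
    to a decade bucket such as '30-39' (information loss!)."""
    decade = (row["age"] // 10) * 10
    return {**row, "age": f"{decade}-{decade + 9}"}

records = [
    {"age": 34, "zip": "8010", "diagnosis": "A"},
    {"age": 36, "zip": "8010", "diagnosis": "B"},
    {"age": 31, "zip": "8010", "diagnosis": "A"},
]

# Exact ages make every row unique -> not 2-anonymous
print(is_k_anonymous(records, ("age", "zip"), 2))   # False

# After generalization all rows share ('30-39', '8010') -> 2-anonymous
released = [generalize_age(r) for r in records]
print(is_k_anonymous(released, ("age", "zip"), 2))  # True
```

A real anonymizer must additionally search for the generalization that satisfies k-anonymity with minimal information loss; that search is the NP-hard part, and judging which generalizations are acceptable for a given clinical environment is exactly where the human agent comes in.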
Humans are very capable in the explorative learning of In terms of integer linear programming the TSP is formu-
patterns from relatively few samples, whilst classic supervised lated as follows [69].
ML needs large sets of data and long processing time. In The cities, as the nodes, are in the set N of numbers
the biomedical domain often large sets of training data are 1, . . . , n; the edges are L = {(i, j) : i, j ∈ N , i = j}
missing, e.g., with rare diseases or with malfunctions of There are considered several variables: xij as in equa-
humans or machines. Moreover, in clinical medicine time is tion (12), the cost between cities i and j denoted with cij .
a crucial factor - where a medical doctor needs the results
quasi in real-time, or at least in a very short time (less than 5 
1 , the path goes from city i to city j
minutes), for example, in emergency medicine or intensive xij = (12)
0 otherwise
care. Rare diseases are often life threatening and require
a rapid intervention - the lack of much data makes aML- The Traveling Salesman Problem is formulated to optimize,
approaches nearly impossible. An example for such a rare dis- more precisely to minimize the objective function illustrated
ease with only few available data sets is CADASIL (Cerebral in equation (13).
Autosomal Dominant Arteriopathy with Subcortical Infarcts n
 n

and Leukoencephalopathy), a disease, which is prevalent in min cij xij (13)
5 per 100,000 persons and is therefore the most frequent i=1 i=j,j=1
monogenic inherited apoplectic stroke in Germany. The TSP constraints follow.
Particularly in the patient admission, human agents have
• The first condition, equation (14) is that each node i is
the advantage to perceive the total situation at a glance. This
visited only once.
aptitude results from the ability of transfer learning, where knowledge can be transferred from one situation to another situation, in which model parameters, i.e. learned features or contextual knowledge, are transferred.

The examples mentioned so far demonstrate that the application of iML approaches in "real-world" situations can sometimes be advantageous. These examples demonstrate that human experience can help to drastically reduce a search space of exponentially many possibilities by heuristic selection of samples, thereby helping to solve NP-hard problems efficiently - or at least to optimize them acceptably for a human end-user.

In a recent work [42] we focused on the Traveling Salesman Problem (TSP), because it appears in a number of practical problems in health informatics; e.g., the native folded three-dimensional conformation of a protein is its lowest free-energy state, and both two- and three-dimensional folding processes, viewed as free-energy minimization problems, belong to a large set of computational problems assumed to be very hard (conditionally intractable) [43].

The TSP is basically about finding the shortest path through a set of points, returning to the origin. As it is an intransigent mathematical problem, many heuristics have been developed in the past to find approximate solutions [44].

The TSP is one of the best known and most studied Combinatorial Optimization Problems. Problems connected to the TSP were mentioned as early as the late eighteenth century [65]. During the past century, the TSP has become a traditional example of a difficult problem and a common testing problem for new methodologies and algorithms in Optimization. It now has many variants, solution approaches and applications [66]. For example, it models the construction of evolutionary trees in computational biology [67] and DNA sequencing in genetics [68] - to provide only a few examples.

The problem is NP-hard, meaning that no polynomial-time algorithm is known for solving it to optimality. For a given number of n cities there are (n - 1)! different tours. With binary variables x_{ij} indicating whether the link (i, j) ∈ L between nodes of N belongs to the tour, two conditions must hold:

• The first condition, equation (14), ensures that every node is incident to exactly two edges of the tour:

\sum_{i \in N, (i,j) \in L} x_{ij} + \sum_{j \in N, (i,j) \in L} x_{ji} = 2    (14)

• The second condition, equation (15), ensures that no subtours S are allowed:

\sum_{i,j \in L, (i,j) \in S} x_{ij} \le |S| - 1, \quad \forall S \subset N : 2 \le |S| \le n - 2    (15)

For the symmetric TSP the condition c_{ij} = c_{ji} holds. For the metric version the triangle inequality holds: c_{ik} + c_{kj} \ge c_{ij} for all nodes i, j, k.

We implemented the Travelling Snakesman^7 in C#, which is part of the .NET framework. The choice was made because it is supported by the game engine Unity [70].

^7 https://hci-kdd.org/project/iml

If you are now smiling, or even skeptical, that we use a game for our experiments, I would like to emphasize that Google is making enormous progress through the use of such games, see e.g. [7].

Gamification [71] is very powerful, and we could also prove the concept of interactive machine learning with the human-in-the-loop through gamification experiments. Psychological research indicates that human intuition is based on distinct behavioral and cognitive strategies that developed evolutionarily over millions of years. For improving ML, we need to identify the concrete mechanisms, and we argue that this can be done best by observing crowd behaviors and decisions in gamified situations (see also a classic example for the usefulness of games in [72]).

VIII. TOWARDS EXPLAINABLE AI

Very interesting was the recent success in mastering the game of Go without human knowledge [73]. When Google DeepMind won, the (human) Go player exclaimed: "Why did it make this move ...". The main problem
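Since exhaustive search over all (n - 1)! tours quickly becomes infeasible, simple construction heuristics are a common starting point. A sketch of the classic nearest-neighbor heuristic (illustrative only; this is not the ant-colony approach of [42], and the point set is invented):

```python
import math

def tour_length(points, order):
    """Total length of a closed tour visiting `order` and returning to start."""
    return sum(math.dist(points[order[k]], points[order[(k + 1) % len(order)]])
               for k in range(len(order)))

def nearest_neighbor_tour(points, start=0):
    """Greedy construction: always move to the closest unvisited city."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = order[-1]
        nxt = min(unvisited, key=lambda c: math.dist(points[last], points[c]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

points = [(0, 0), (0, 1), (2, 0), (2, 1), (1, 3)]
order = nearest_neighbor_tour(points)
```

The heuristic runs in O(n^2) but offers no optimality guarantee; it merely illustrates how a cheap construction replaces the factorial search.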
is that the best performing methods are "black boxes" and cannot "explain" why they came up with a certain decision. Unfortunately, the "no free lunch" theorem [74] always holds true, and also in explainable AI we have a trade-off between prediction performance and explainability: the best performing models are the least transparent. DARPA last year initiated a funding initiative with the goal to create a suite of ML techniques that produce explainable models, while at the same time maintaining a high level of performance [75].

At first we should make ourselves aware of what is understandable to a human; understanding is not only recognizing, perceiving and reproducing, or simply a "re-presentation" of facts, but the intellectual understanding of the context in which these facts appear. Rather, understanding can be seen as a bridge between perceiving and reasoning. From capturing the context, without doubt an important indicator of intelligence, the current state-of-the-art AI is still many miles away. On the other hand, humans are able to instantaneously capture the context and make very good generalizations from very few data points - at least in the lower dimensions [76].

I present an example^8 of an explanation interface from our own work: a heatmap for visualizing molecule properties. The central view is an interactive table view (1), i.e. the columns are molecules grouped by similarities using a hierarchical cluster algorithm. The dendrogram (2) shows the grouping, which means the end user can decide which molecules form a group. The rows are characteristics of the molecules, e.g. efficacy, chemical and other numerical measurements. The values per molecule are color-coded, so one can quickly see how groups differ. Exploration is done through interaction and drill-down, among other things to get structure explanations. In (3) we see the settings to configure the heatmap; between the left sidebar and the heatmap we find the legend for the values (4), on the right the names of the molecules (5), the controls (6) and a tree map (7) [51].

^8 The image can be seen here: https://gi.de/informatiklexikon/explainable-ai-ex-ai

IX. FUTURE OUTLOOK

One possibility of how we might bridge the gap between artificial inference and human understanding is a combination of deep learning technologies with ontological approaches [77]. A good current example is Deep Tensor [78], which is a deep neural network suited for data sets with meaningful graph-like properties, and it is beneficial for us that the domains of biology, chemistry, medicine, drug design, etc. offer many such data sets (see [79] for an overview). Here the interactions between various entities (mutations, genes, drugs, diseases, etc.) can be encoded via graphs. If we now consider a Deep Tensor network that learns to identify biological interaction paths that lead to a certain disease, we would be able to automatically identify, and make understandable, the inference factors that significantly influenced the classification results. These influence factors can further be used to filter a knowledge graph [80] constructed from publicly available medical research corpora (large ontologies). In addition, the resulting interaction paths are further constrained by known logical limitations of the domain (in this example: biology). As a result, the classification can be made re-traceable, hence explained by a human expert, as an annotated interaction path, with annotations on each edge linking to specific medical texts that provide supporting evidence [81].

A framework for unsupervised learning of a hierarchical reconfigurable image template was presented by [82]. This AND-OR Template (AOT) for visual objects shows three interesting elements: 1) a hierarchical composition as AND nodes, 2) the deformation and articulation of parts as geometric OR nodes, and 3) multiple ways of composition as structural OR nodes. The terminal nodes are hybrid image templates (HIT) [83] that are fully generative down to the pixels. Both structures and parameters of this model can be learned unsupervised from images using an information projection principle; this is an awesome technique known for a long time [84], [85]. The learning algorithm itself consists of two steps: 1) a recursive block pursuit procedure to learn the hierarchical dictionary of the primitives, and 2) a graph compression procedure to minimize the model structure for generalizability.

A good example is the hierarchical generative model presented by Lin et al. (2009), where objects are broken into their constituent parts (Yann LeCun always underlines in his talks that "our world is compositional", a view also often emphasized by Alan Yuille, Jason Eisner and Stuart A. Geman) and the variability of configurations and relationships between these parts is modeled by stochastic attribute graph grammars. These are embedded in an AND-OR graph for each compositional object category. This combines the power of a stochastic context-free grammar to express the variability of part configurations with a Markov random field representing the pictorial spatial relationships between these parts. As a generative model, different object instances of a category can be realized as a traversal through the AND-OR graph in order to get a valid configuration. The inference is tied to the structure of the model and follows a probabilistic formulation consisting of bottom-up detection steps for the parts, which in turn recursively activate the grammar rules for top-down verification and searches for missing parts [86].

Coming to the conclusion, I want to emphasize that computational approaches can find patterns in arbitrarily high-dimensional spaces that no human would be able to see. Consequently, we need an augmentation of human intelligence with artificial intelligence - and also vice versa. In particular problem-solving situations, to date only human experts are able to understand the context. Therefore we need solutions for effectively mapping results from high-dimensional spaces into lower dimensions, to make them not only perceivable and manipulable for humans; but the rising challenge is that the best performing methods are the least transparent. Our best methods are not re-traceable, thus not understandable, hence it is not possible to explain why a decision has been made. However, current trends in privacy make transparent "glass box" solutions mandatory.
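The column grouping behind such a heatmap can be sketched with plain single-linkage agglomerative clustering; the molecule names and measurement values below are invented for illustration:

```python
# Minimal sketch of the grouping step behind the heatmap: molecules
# (columns) are merged bottom-up by similarity, as a dendrogram would show.
def agglomerate(items, features, threshold):
    """Single-linkage clustering: repeatedly merge the two closest clusters
    until the closest pair is farther apart than `threshold`."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    clusters = [[name] for name in items]
    while len(clusters) > 1:
        pairs = [(min(dist(features[a], features[b])
                      for a in ca for b in cb), i, j)
                 for i, ca in enumerate(clusters)
                 for j, cb in enumerate(clusters) if i < j]
        d, i, j = min(pairs)
        if d > threshold:          # no pair is similar enough -> stop merging
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Per-molecule characteristics, e.g. (efficacy, toxicity) - toy values.
features = {"mol-A": (0.9, 0.1), "mol-B": (0.85, 0.15),
            "mol-C": (0.2, 0.8), "mol-D": (0.25, 0.75)}
groups = agglomerate(list(features), features, threshold=0.3)
```

The resulting groups correspond to the cut the end user makes in the dendrogram; a real system would of course cluster on many more measurements.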
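The interaction-path filtering described in Sect. IX can be illustrated as follows. All entities and evidence identifiers here are made up, and a real system would query a large medical ontology rather than a small dictionary:

```python
# Toy knowledge graph: directed edges annotated with supporting literature.
# Entities and citation IDs are invented purely for illustration.
knowledge_graph = {
    ("geneA", "proteinX"): "PMID:0000001",
    ("proteinX", "pathwayY"): "PMID:0000002",
    ("pathwayY", "diseaseZ"): "PMID:0000003",
}

def explain_path(path, kg):
    """Keep a model-suggested interaction path only if every hop is backed
    by the knowledge graph; return the path annotated with evidence."""
    annotated = []
    for src, dst in zip(path, path[1:]):
        evidence = kg.get((src, dst))
        if evidence is None:      # hop not supported -> reject whole path
            return None
        annotated.append((src, dst, evidence))
    return annotated

suggested = ["geneA", "proteinX", "pathwayY", "diseaseZ"]
spurious = ["geneA", "diseaseZ"]
```

A path that survives the filter comes back with one piece of supporting evidence per edge, which is what makes the classification re-traceable for a human expert.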
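The generative traversal of an AND-OR graph, where each walk yields one valid object configuration, can be sketched as follows (the toy category and part names are mine, not from [82] or [86]):

```python
import random

# Toy AND-OR graph for a "face" category. AND nodes compose all of their
# children; OR nodes select exactly one alternative.
graph = {
    "face":  ("AND", ["eyes", "mouth"]),
    "eyes":  ("OR",  ["open-eyes", "closed-eyes"]),
    "mouth": ("OR",  ["smile", "neutral"]),
}

def sample_configuration(node, rng):
    """One top-down traversal = one valid object instance."""
    if node not in graph:                   # terminal template (leaf)
        return [node]
    kind, children = graph[node]
    if kind == "AND":
        parts = []
        for child in children:              # AND: expand every child
            parts.extend(sample_configuration(child, rng))
        return parts
    return sample_configuration(rng.choice(children), rng)  # OR: pick one

rng = random.Random(0)
config = sample_configuration("face", rng)
```

Repeated sampling enumerates the category's variability: here four distinct face configurations, exactly the combinatorics that the stochastic grammar formalizes.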
If you ask me now what the most interesting topics towards reaching context adaptivity are, I would recommend three major directions:
1) Multi-task learning, to help reduce catastrophic forgetting;
2) Transfer learning, which is not easy: learning to perform a task by exploiting knowledge acquired when solving previous tasks - a solution to this problem would have major impact on AI research generally and on ML specifically;
3) Multi-agent hybrid systems, making use of collective intelligence and crowd-sourcing by integrating a human-in-the-loop, and fostering client-side machine learning (federated learning) to ensure privacy, data protection, safety and security.

I would like to close this talk with a citation attributed to Albert Einstein (which surely is not from Albert Einstein): "Computers are incredibly fast, accurate and stupid; humans are incredibly slow, inaccurate and brilliant; together they are powerful beyond imagination."

Thank you very much!

ABOUT THE KEYNOTE SPEAKER

Andreas Holzinger is lead of the Holzinger Group, HCI-KDD, Institute for Medical Informatics, Statistics & Documentation at the Medical University Graz, and Associate Professor of Applied Computer Science at the Faculty of Computer Science and Biomedical Engineering at Graz University of Technology. He serves as consultant for the Canadian, US, UK, Swiss, French, Italian and Dutch governments, for the German Excellence Initiative, and as national expert in the European Commission. Andreas obtained a Ph.D. in Cognitive Science from Graz University in 1998 and his Habilitation in Computer Science from TU Graz in 2003. Andreas was Visiting Professor for Machine Learning & Knowledge Extraction in Verona, at RWTH Aachen, University College London and Middlesex University London. Since 2016 Andreas is Visiting Professor for Machine Learning in Health Informatics at the Faculty of Informatics at Vienna University of Technology. He founded the Network HCI-KDD to foster a synergistic combination of methodologies of two areas that offer ideal conditions toward unraveling problems in understanding intelligence: Human-Computer Interaction (HCI) & Knowledge Discovery/Data Mining (KDD), with the goal of supporting human intelligence with artificial intelligence. Andreas is Associate Editor of Springer/Nature Knowledge and Information Systems (KAIS), Section Editor for Machine Learning of Springer/Nature BMC Medical Informatics and Decision Making (MIDM), and Editor-in-Chief of Machine Learning & Knowledge Extraction (MAKE). He is organizer of the IFIP Cross-Domain Conference Machine Learning & Knowledge Extraction (CD-MAKE), Austrian representative for Artificial Intelligence in the IFIP TC 12, and member of IFIP WG 12.9 Computational Intelligence, the ACM, IEEE, GI, the Austrian Computer Society, and the Association for the Advancement of Artificial Intelligence (AAAI). Since 2003 Andreas has participated in leading positions in 30+ R&D multi-national projects, budget 4+ MEUR, 300+ publications, 9600+ citations, h-index = 45.

REFERENCES

[1] Andreas Holzinger and Igor Jurisica. Knowledge discovery and data mining in biomedical informatics: The future is in integrative, interactive machine learning solutions. In Andreas Holzinger and Igor Jurisica, editors, Lecture Notes in Computer Science LNCS 8401, pages 1–18. Springer, Heidelberg, 2014. URL http://dx.doi.org/10.1007/978-3-662-43968-5_1.
[2] Andreas Holzinger. On knowledge discovery and interactive intelligent visualization of biomedical data - challenges in human-computer interaction & biomedical informatics. In Markus Helfert, Chiara Francalanci, and Joaquim Filipe, editors, DATA 2012, International Conference on Data Technologies and Applications, pages 5–16, 2012. URL https://online.tugraz.at/tug_online/voe_main2.getVollText?pDocumentNr=258208&pCurrPk=64857.
[3] Andreas Holzinger. Human-computer interaction & knowledge discovery (HCI-KDD): What is the benefit of bringing those two fields to work together? In Alfredo Cuzzocrea, Christian Kittl, Dimitris E. Simos, Edgar Weippl, and Lida Xu, editors, Multidisciplinary Research and Practice for Information Systems, Springer Lecture Notes in Computer Science LNCS 8127, pages 319–328. Springer, Heidelberg, Berlin, New York, 2013. doi: 10.1007/978-3-642-40511-2_22.
[4] Andreas Holzinger. Trends in interactive knowledge discovery for personalized medicine: Cognitive science meets machine learning. IEEE Intelligent Informatics Bulletin, 15(1):6–14, 2014.
[5] Andreas Holzinger. Introduction to machine learning and knowledge extraction (MAKE). Machine Learning and Knowledge Extraction, 1(1):1–20, 2017. doi: 10.3390/make1010001.
[6] Andreas Holzinger, Peter Kieseberg, A Min Tjoa, and Edgar Weippl. Machine Learning and Knowledge Extraction: IFIP TC 5, WG 8.4, 8.9, 12.9 International Cross-Domain Conference, CD-MAKE 2017, Lecture Notes in Computer Science LNCS 10410. Springer Nature, Cham, 2017. doi: 10.1007/978-3-319-66808-6.
[7] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015. doi: 10.1038/nature14236.
[8] Joshua B. Tenenbaum, Charles Kemp, Thomas L. Griffiths, and Noah D. Goodman. How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022):1279–1285, 2011. doi: 10.1126/science.1192788.
[9] Sangkyun Lee and Andreas Holzinger. Knowledge discovery from complex high dimensional data. In Stefan
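Direction 3) above mentions client-side (federated) machine learning. The core idea, sharing only model updates and never the raw data, can be sketched in a few lines (a toy mean model, not a full federated learning system):

```python
# Minimal federated-averaging sketch: each client fits a local model on its
# private data and shares only the model parameter, never the raw data.
def local_update(private_data):
    """Client-side step: return a model parameter (here: the local mean)."""
    return sum(private_data) / len(private_data)

def federated_average(client_updates, client_sizes):
    """Server-side step: size-weighted average of the client parameters."""
    total = sum(client_sizes)
    return sum(u * s for u, s in zip(client_updates, client_sizes)) / total

clients = [[1.0, 2.0, 3.0], [10.0, 14.0], [4.0]]   # private, never pooled
updates = [local_update(d) for d in clients]
sizes = [len(d) for d in clients]
global_model = federated_average(updates, sizes)
```

For this mean model, the size-weighted average of the client updates equals the mean over the pooled data - without the data ever being pooled, which is what provides the privacy benefit.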
Michaelis, Nico Piatkowski, and Marco Stolpe, editors, Solving Large Scale Learning Tasks. Challenges and Algorithms, Lecture Notes in Artificial Intelligence LNAI 9580, pages 148–167. Springer, Cham, 2016. URL http://dx.doi.org/10.1007/978-3-319-41706-6_7.
[10] Michael I. Jordan and Tom M. Mitchell. Machine learning: Trends, perspectives, and prospects. Science, 349(6245):255–260, 2015. URL http://dx.doi.org/10.1126/science.aaa8415.
[11] Tamal K. Dey, Herbert Edelsbrunner, and Sumanta Guha. Computational topology. Contemporary Mathematics, 223:109–144, 1999. doi: 10.1090/conm/223/03135.
[12] Vin De Silva, Dmitriy Morozov, and Mikael Vejdemo-Johansson. Persistent cohomology and circular coordinates. Discrete and Computational Geometry, 45(4):737–759, 2011. doi: 10.1007/s00454-011-9344-x.
[13] Massimo Ferri. Why topology for machine learning and knowledge extraction? Machine Learning and Knowledge Extraction (MAKE), 1(1):6, 2018. doi: 10.3390/make1010006.
[14] Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013. URL http://dx.doi.org/10.1109/TPAMI.2013.50.
[15] Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 580–587. IEEE, 2014. doi: 10.1109/CVPR.2014.81.
[16] Christopher J.C.H. Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3-4):279–292, 1992. doi: 10.1007/BF00992698.
[17] Alan M. Turing. Computing machinery and intelligence. Mind, 59(236):433–460, 1950.
[18] Lotfi A. Zadeh. Toward human level machine intelligence - is it achievable? The need for a paradigm shift. IEEE Computational Intelligence Magazine, 3(3):11–22, 2008. doi: 10.1109/MCI.2008.926583.
[19] Andreas Holzinger and Klaus-Martin Simonic. Information Quality in e-Health. Lecture Notes in Computer Science LNCS 7058. Springer, Heidelberg, Berlin, New York, 2011. doi: 10.1007/978-3-642-25364-5.
[20] Cagatay Turkay, Fleur Jeanquartier, Andreas Holzinger, and Helwig Hauser. On computationally-enhanced visual analysis of heterogeneous data and its application in biomedical informatics. In Andreas Holzinger and Igor Jurisica, editors, Interactive Knowledge Discovery and Data Mining: State-of-the-Art and Future Challenges in Biomedical Informatics, Lecture Notes in Computer Science LNCS 8401, pages 117–140. Springer, Berlin, Heidelberg, 2014. doi: 10.1007/978-3-662-43968-5_7.
[21] Andreas Holzinger, Christof Stocker, and Matthias Dehmer. Big complex biomedical data: Towards a taxonomy of data. In Mohammad S. Obaidat and Joaquim Filipe, editors, Communications in Computer and Information Science CCIS 455, pages 3–18. Springer, Berlin Heidelberg, 2014. doi: 10.1007/978-3-662-44791-8_1.
[22] Jerome H. Friedman. On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery, 1(1):55–77, 1997. doi: 10.1023/A:1009778005914.
[23] Paul E. Plsek and Trisha Greenhalgh. Complexity science: The challenge of complexity in health care. BMJ British Medical Journal, 323(7313):625–628, 2001.
[24] Thomas Bayes. An essay towards solving a problem in the doctrine of chances (communicated by Richard Price). Philosophical Transactions, 53:370–418, 1763.
[25] Pierre-Simon Laplace. Mémoire sur les probabilités. Mémoires de l'Académie Royale des Sciences de Paris, 1778:227–332, 1781. URL http://www.cs.xu.edu/math/Sources/Laplace/memoir_probabilities.pdf.
[26] Pierre-Simon Laplace. Philosophical Essay on Probabilities: Translated 1995 from the fifth French edition of 1825 with notes by Andrew I. Dale. Springer Science, New York, 1825.
[27] Zoubin Ghahramani. Probabilistic machine learning and artificial intelligence. Nature, 521(7553):452–459, 2015. doi: 10.1038/nature14541. URL http://dx.doi.org/10.1038/nature14541.
[28] Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1):148–175, 2016. doi: 10.1109/JPROC.2015.2494218.
[29] Jesse Levinson, Jake Askeland, Jan Becker, Jennifer Dolson, David Held, Soeren Kammel, J. Zico Kolter, Dirk Langer, Oliver Pink, and Vaughan Pratt. Towards fully autonomous driving: Systems and algorithms. In Intelligent Vehicles Symposium (IV), 2011 IEEE, pages 163–168. IEEE, 2011.
[30] Soeren Sonnenburg, Gunnar Raetsch, Christin Schaefer, and Bernhard Schoelkopf. Large scale multiple kernel learning. Journal of Machine Learning Research, 7(7):1531–1565, 2006. URL http://www.jmlr.org/papers/v7/sonnenburg06a.html.
[31] Andreas Holzinger, Matthias Dehmer, and Igor Jurisica. Knowledge discovery and interactive data mining in bioinformatics - state-of-the-art, future challenges and research directions. BMC Bioinformatics, 15(S6):I1, 2014. doi: 10.1186/1471-2105-15-S6-I1.
[32] Andreas Holzinger. Biomedical Informatics: Computational Sciences meets Life Sciences. BoD, Norderstedt, 2012. URL http://www.bod.de/index.php?id=1132&objk_id=859299.
[33] Thomas Hofmann, Bernhard Schoelkopf, and Alexander J. Smola. Kernel methods in machine learning. The Annals of Statistics, 36(3):1171–1220, 2008.
[34] Thomas L. Griffiths, Chris Lucas, Joseph Williams, and Michael L. Kalish. Modeling human function learning with Gaussian processes. In Daphne Koller, Dale Schuurmans, Yoshua Bengio, and Leon Bottou, editors, Advances in Neural Information Processing Systems (NIPS 2008), volume 21, pages 553–560. NIPS, 2009.
[35] Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, and Andrew Y. Ng. Building high-level features using large scale unsupervised learning. arXiv:1112.6209, 2011.
[36] Andre Esteva, Brett Kuprel, Roberto A. Novoa, Justin Ko, Susan M. Swetter, Helen M. Blau, and Sebastian Thrun. Dermatologist-level classification of skin cancer with deep neural networks. Nature, 542(7639):115–118, 2017. doi: 10.1038/nature21056.
[37] Andreas Holzinger, Bernd Malle, Peter Kieseberg, Peter M. Roth, Heimo Müller, Robert Reihs, and Kurt Zatloukal. Towards the augmented pathologist: Challenges of explainable-AI in digital pathology. arXiv:1712.06657, 2017.
[38] Burr Settles. From theories to queries: Active learning in practice. In Isabelle Guyon, Gavin Cawley, Gideon Dror, Vincent Lemaire, and Alexander Statnikov, editors, Active Learning and Experimental Design Workshop 2010, volume 16, pages 1–18. JMLR Proceedings, Sardinia, 2011.
[39] Andreas Holzinger. Interactive machine learning for health informatics: When do we need the human-in-the-loop? Springer Brain Informatics (BRIN), 3(2):119–131, 2016. doi: 10.1007/s40708-016-0042-6. URL http://dx.doi.org/10.1007/s40708-016-0042-6.
[40] Ceyhun B. Akgül, Daniel L. Rubin, Sandy Napel, Christopher F. Beaulieu, Hayit Greenspan, and Burak Acar. Content-based image retrieval in radiology: Current status and future directions. Journal of Digital Imaging, 24(2):208–222, 2011. doi: 10.1007/s10278-010-9290-9.
[41] Andrew G. Wilson, Christoph Dann, Chris Lucas, and Eric P. Xing. The human kernel. In Corinna Cortes, Neil D. Lawrence, Daniel D. Lee, Masashi Sugiyama, and Roman Garnett, editors, Advances in Neural Information Processing Systems, NIPS 2015, volume 28, pages 2836–2844, 2015.
[42] Andreas Holzinger, Markus Plass, Katharina Holzinger, Gloria Cerasela Crisan, Camelia-M. Pintea, and Vasile Palade. Towards interactive machine learning (iML): Applying ant colony algorithms to solve the traveling salesman problem with the human-in-the-loop approach. In Springer Lecture Notes in Computer Science LNCS 9817, pages 81–95. Springer, Heidelberg, Berlin, New York, 2016. doi: 10.1007/978-3-319-45507-5_6.
[43] Pierluigi Crescenzi, Deborah Goldman, Christos Papadimitriou, Antonio Piccolboni, and Mihalis Yannakakis. On the complexity of protein folding. Journal of Computational Biology, 5(3):423–465, 1998. doi: 10.1016/S0092-8240(05)80170-3.
[44] J. N. Macgregor and T. Ormerod. Human performance on the traveling salesman problem. Perception & Psychophysics, 58(4):527–539, 1996. doi: 10.3758/bf03213088.
[45] Francesco Napolitano, Giancarlo Raiconi, Roberto Tagliaferri, Angelo Ciaramella, Antonino Staiano, and Gennaro Miele. Clustering and visualization approaches for human cell cycle gene expression data analysis. International Journal of Approximate Reasoning, 47(1):70–84, 2008. doi: 10.1016/j.ijar.2007.03.013.
[46] Roberto Amato, Angelo Ciaramella, N. Deniskina, Carmine Del Mondo, Diego di Bernardo, Ciro Donalek, Giuseppe Longo, Giuseppe Mangano, Gennaro Miele, and Giancarlo Raiconi. A multi-step approach to time series analysis and gene expression clustering. Bioinformatics, 22(5):589–596, 2006. doi: 10.1093/bioinformatics/btk026.
[47] Chi-Ren Shyu, Carla E. Brodley, Avinash C. Kak, Akio Kosaka, Alex M. Aisen, and Lynn S. Broderick. ASSERT: A physician-in-the-loop content-based retrieval system for HRCT image databases. Computer Vision and Image Understanding, 75(1-2):111–132, 1999. doi: 10.1006/cviu.1999.0768.
[48] Gunar Schirner, Deniz Erdogmus, Kaushik Chowdhury, and Taskin Padir. The future of human-in-the-loop cyber-physical systems. Computer, 46(1):36–45, 2013.
[49] Peter Kieseberg, Johannes Schantl, Peter Frühwirt, Edgar Weippl, and Andreas Holzinger. Witnesses for the doctor in the loop. In Yike Guo, Karl Friston, Faisal Aldo, Sean Hill, and Hanchuan Peng, editors, Brain Informatics and Health, Lecture Notes in Artificial Intelligence LNAI 9250, pages 369–378. Springer, Heidelberg, Berlin, 2015.
[50] Martin Atzmueller, Joachim Baumeister, and Frank Puppe. Introspective subgroup analysis for interactive knowledge refinement. In Geoff Sutcliffe and Randy Goebel, editors, FLAIRS Nineteenth International Florida Artificial Intelligence Research Society Conference, pages 402–407. AAAI Press, 2006.
[51] Werner Sturm, Tobias Schreck, Andreas Holzinger, and Torsten Ullrich. Discovering medical knowledge using visual analytics - a survey on methods for systems biology and omics data. In Katja Bühler, Lars Linsen, and Nigel W. John, editors, Eurographics Workshop on Visual Computing for Biology and Medicine (2015), pages 71–81. Eurographics EG, 2015. doi: 10.2312/vcbm.20151210.
[52] Emmanuel Müller, Ira Assent, Ralph Krieger, Timm Jansen, and Thomas Seidl. Morpheus: Interactive exploration of subspace clustering. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD 08, pages 1089–1092. ACM, 2008. doi: 10.1145/1401890.1402026.
[53] Michael Hund, Werner Sturm, Tobias Schreck, Torsten Ullrich, Daniel Keim, Ljiljana Majnaric, and Andreas Holzinger. Analysis of patient groups and immunization results based on subspace clustering. In Yike Guo, Karl Friston, Faisal Aldo, Sean Hill, and Hanchuan Peng, editors, Brain Informatics and Health, Lecture Notes in Artificial Intelligence LNAI 9250, volume 9250, pages
358–368. Springer International Publishing, Cham, 2015. doi: 10.1007/978-3-319-23344-4_35.
[54] Henrik Bohr and Søren Brunak. A travelling salesman approach to protein conformation. Complex Systems, 3(9):9–28, 1989.
[55] R. H. Lathrop. The protein threading problem with sequence amino-acid interaction preferences is NP-complete. Protein Engineering, 7(9):1059–1068, 1994. doi: 10.1093/protein/7.9.1059.
[56] Benjamin M. Good and Andrew I. Su. Crowdsourcing for bioinformatics. Bioinformatics, 29(16):1925–1933, 2013. doi: 10.1093/bioinformatics/btt333. URL http://bioinformatics.oxfordjournals.org/content/29/16/1925.abstract.
[57] Seth Cooper, Firas Khatib, Adrien Treuille, Janos Barbero, Jeehyung Lee, Michael Beenen, Andrew Leaver-Fay, David Baker, and Zoran Popovic. Predicting protein structures with a multiplayer online game. Nature, 466(7307):756–760, 2010. doi: 10.1038/nature09304.
[58] Eleanor Jane Budge, Sandra Maria Tsoti, Daniel James Howgate, Shivan Sivakumar, and Morteza Jalali. Collective intelligence for translational medicine: Crowdsourcing insights and innovation from an interdisciplinary biomedical research community. Annals of Medicine, 47(7), 2015. doi: 10.3109/07853890.2015.1091945.
[59] Charu C. Aggarwal. On k-anonymity and the curse of dimensionality. In Proceedings of the 31st International Conference on Very Large Data Bases VLDB, pages 901–909, 2005.
[60] Pierangela Samarati and Latanya Sweeney. Generalizing data to provide anonymity when disclosing information. In Alberto O. Mendelzon and Jan Paredaens, editors, PODS '98 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, page 188. ACM, 1998. doi: 10.1145/275487.275508.
[61] Latanya Sweeney. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):571–588, 2002. doi: 10.1142/S0218488502001648. URL http://www.worldscientific.com/doi/abs/10.1142/S0218488502001648.
[62] Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam. l-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):1–52, 2007. doi: 10.1145/1217299.1217302.
[63] Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. t-closeness: Privacy beyond k-anonymity and l-diversity. In IEEE 23rd International Conference on Data Engineering, ICDE 2007, pages 106–115. IEEE, 2007. doi: 10.1109/ICDE.2007.367856.
[64] M. E. Nergiz and C. Clifton. delta-presence without complete world knowledge. IEEE Transactions on Knowledge and Data Engineering, 22(6):868–883, 2010. doi: 10.1109/tkde.2009.125.
[65] Gilbert Laporte. The traveling salesman problem: An overview of exact and approximate algorithms. European Journal of Operational Research, 59(2):231–247, 1992. doi: 10.1016/0377-2217(92)90138-Y.
[66] David L. Applegate, Robert E. Bixby, Vašek Chvátal, and William J. Cook. The Traveling Salesman Problem: A Computational Study. Princeton University Press, 2006.
[67] Chantal Korostensky and Gaston H. Gonnet. Using traveling salesman problem algorithms for evolutionary tree construction. Bioinformatics, 16(7):619–627, 2000. doi: 10.1093/bioinformatics/16.7.619.
[68] Richard M. Karp. Mapping the genome: Some combinatorial problems arising in molecular biology. In Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing (STOC 1993), pages 278–285. ACM, 1993. doi: 10.1145/167088.167170.
[69] Christos H. Papadimitriou and Kenneth Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Dover, Mineola, New York, 1982.
[70] Sue Blackman. Beginning 3D Game Development with Unity 4: All-in-One, Multi-Platform Game Development. Second Edition. Apress, New York, 2013.
[71] Andreas Holzinger, Markus Plass, and Michael D. Kickmeier-Rust. Interactive machine learning (iML): A challenge for game-based approaches. In Isabelle Guyon, Evelyne Viegas, Sergio Escalera, Ben Hamner, and Balázs Kégl, editors, Challenges in Machine Learning: Gaming and Education. NIPS Workshops, 2016.
[72] Martin Ebner and Andreas Holzinger. Successful implementation of user-centered game based learning in higher education: An example from civil engineering. Computers and Education, 49(3):873–890, 2007. doi: 10.1016/j.compedu.2005.11.026.
[73] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, 2017. doi: 10.1038/nature24270.
[74] David H. Wolpert and William G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997. doi: 10.1109/4235.585893.
[75] David Gunning. Explainable Artificial Intelligence (XAI): Technical Report Defense Advanced Research Projects Agency DARPA-BAA-16-53. DARPA, Arlington, USA, 2016.
[76] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015. doi: 10.1126/science.aab3050.
[77] Andreas Holzinger, Peter Kieseberg, Edgar Weippl, and A Min Tjoa. Current advances, trends and challenges of machine learning and knowledge extraction: From
machine learning to explainable AI. In Springer Lecture Notes in Computer Science LNCS 11015. Springer, Cham, 2018.
[78] Koji Maruhashi, Masaru Todoriki, Takuya Ohwa, Keisuke Goto, Yu Hasegawa, Hiroya Inakoshi, and Hirokazu Anai. Learning multi-way relations via tensor decomposition with neural networks. In The Thirty-Second AAAI Conference on Artificial Intelligence AAAI-18, pages 3770–3777, 2018. URL https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17010/16600.
[79] Andreas Holzinger. Biomedical Informatics: Discovering Knowledge in Big Data. Springer, New York, 2014. doi: 10.1007/978-3-319-04528-3.
[80] Heiko Paulheim. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3):489–508, 2017. doi: 10.3233/SW-160218.
[81] Randy Goebel, Ajay Chander, Katharina Holzinger, Freddy Lecue, Zeynep Akata, Simone Stumpf, Peter Kieseberg, and Andreas Holzinger. Explainable AI: The new 42? In Springer Lecture Notes in Computer Science LNCS 11015. Springer, Cham, 2018.
[82] Zhangzhang Si and Song-Chun Zhu. Learning AND-OR templates for object recognition and detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(9):2189–2205, 2013. doi: 10.1109/TPAMI.2013.35.
[83] Zhangzhang Si and Song-Chun Zhu. Learning hybrid image templates (HIT) by information projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(7):1354–1367, 2012. doi: 10.1109/TPAMI.2011.227.
[84] Jerome H. Friedman and John W. Tukey. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, 23(9):881–890, 1974. doi: 10.1109/T-C.1974.224051.
[85] Peter J. Huber. Projection pursuit. The Annals of Statistics, 13(2):435–475, 1985. URL http://www.jstor.org/stable/2241175.
[86] Liang Lin, Tianfu Wu, Jake Porway, and Zijian Xu. A stochastic graph grammar for compositional object representation and recognition. Pattern Recognition, 42(7):1297–1307, 2009. doi: 10.1016/j.patcog.2008.10.033.