The impact of the General Data Protection Regulation (GDPR) on artificial intelligence
STUDY
Panel for the Future of Science and Technology
AUTHOR
The study was led by Professor Giovanni Sartor, European University Institute of Florence, at the request of the
Panel for the Future of Science and Technology (STOA) and managed by the Scientific Foresight Unit, within
the Directorate-General for Parliamentary Research Services (EPRS) of the Secretariat of the European
Parliament. It was co-authored by Professor Sartor and Dr Francesca Lagioia, European University Institute of
Florence, working under his supervision.
ADMINISTRATOR RESPONSIBLE
Mihalis Kritikos, Scientific Foresight Unit (STOA)
To contact the publisher, please e-mail stoa@ep.europa.eu
LINGUISTIC VERSION
Original: EN
Manuscript completed in June 2020.
PE 641.530
ISBN: 978-92-846-6771-0
doi: 10.2861/293
QA-QA-02-20-399-EN-N
Executive summary
AI and big data
In the last decade, AI has gone through rapid development. It has acquired a solid scientific basis
and has produced many successful applications. It provides opportunities for economic, social, and
cultural development; energy sustainability; better health care; and the spread of knowledge. These
opportunities are accompanied by serious risks, including unemployment, inequality,
discrimination, social exclusion, surveillance, and manipulation.
AI has made an impressive leap forward since it began to focus on the application of
machine learning to massive volumes of data. Machine learning systems discover correlations between
data and build corresponding models, which link possible inputs to presumably correct responses
(predictions). In machine learning applications, AI systems learn to make predictions after being
trained on vast sets of examples. Thus, AI has become hungry for data, and this hunger has spurred
data collection, in a self-reinforcing spiral: the development of AI systems based on machine
learning presupposes and fosters the creation of vast data sets, i.e., big data. The integration of AI
and big data can deliver many benefits for economic, scientific and social progress. However, it
also contributes to risks for individuals and for the whole of society, such as pervasive surveillance
and influence on citizens' behaviour, polarisation and fragmentation in the public sphere.
AI and personal data
Many AI applications process personal data. On the one hand, personal data may contribute to the
data sets used to train machine learning systems, namely, to build their algorithmic models. On the
other hand, such models can be applied to personal data, to make inferences concerning particular
individuals.
Thanks to AI, all kinds of personal data can be used to analyse, forecast and influence human
behaviour, an opportunity that transforms such data, and the outcomes of their processing, into
valuable commodities. In particular, AI enables automated decision-making even in domains that
require complex choices, based on multiple factors and non-predefined criteria. In many cases,
automated predictions and decisions are not only cheaper, but also more precise and impartial than
human ones, as AI systems can avoid the typical fallacies of human psychology and can be subject
to rigorous controls. However, algorithmic decisions may also be mistaken or discriminatory,
reproducing human biases and introducing new ones. Even when automated assessments of
individuals are fair and accurate, they are not unproblematic: they may negatively affect the
individuals concerned, who are subject to pervasive surveillance, persistent evaluation, insistent
influence, and possible manipulation.
The AI-based processing of vast masses of data on individuals and their interactions has social
significance: it provides opportunities for social knowledge and better governance, but it risks
leading to the extremes of 'surveillance capitalism' and 'surveillance state'.
A normative framework
It must be ensured that the development and deployment of AI tools takes place in a socio-technical
framework – inclusive of technologies, human skills, organisational structures, and norms – where
individual interests and the social good are preserved and enhanced.
To provide regulatory support for the creation of such a framework, ethical and legal principles are
needed, together with sectoral regulations. The ethical principles include autonomy, prevention of
harm, fairness and explicability; the legal ones include the rights and social values enshrined in the
EU Charter, in the EU treaties, as well as in national constitutions. The sectoral regulations involved
include first of all data protection law, consumer protection law, and competition law, but also other
domains of the law, such as labour law, administrative law, civil liability etc. The pervasive impact of
AI on European society is reflected in the multiplicity of the legal issues it raises.
To ensure adequate protection of citizens against the risks resulting from the misuses of AI, beside
regulation and public enforcement, the countervailing power of civil society is also needed to detect
abuses, inform the public, and activate enforcement. AI-based citizen-empowering technologies
can play an important role in this regard, by enabling citizens not only to protect themselves from
unwanted surveillance and 'nudging', but also to detect unlawful practices, identify instances of
unfair treatment, and distinguish fake and untrustworthy information.
AI is compatible with the GDPR
AI is not explicitly mentioned in the GDPR, but many provisions in the GDPR are relevant to AI, and
some are indeed challenged by the new ways of processing personal data that are enabled by AI.
There is indeed a tension between the traditional data protection principles – purpose limitation,
data minimisation, the special treatment of 'sensitive data', the limitation on automated decisions –
and the full deployment of the power of AI and big data. The latter entails the collection of vast
quantities of data concerning individuals and their social relations and processing such data for
purposes that were not fully determined at the time of collection. However, there are ways to
interpret, apply, and develop the data protection principles that are consistent with the beneficial
uses of AI and big data.
The requirement of purpose limitation can be understood in a way that is compatible with AI and
big data, through a flexible application of the idea of compatibility, which allows for the reuse of
personal data when this is not incompatible with the purposes for which the data were originally
collected. Moreover, reuse for statistical purposes is assumed to be compatible, and thus would in
general be admissible (unless it involves unacceptable risks for the data subject).
The principle of data minimisation can also be understood in such a way as to allow for beneficial
applications of AI. Minimisation may require, in some contexts, reducing the 'personality' of the
available data, rather than the amount of such data, i.e., it may require reducing, through measures
such as pseudonymisation, the ease with which the data can be connected to individuals. The
possibility of re-identification should not entail that all re-identifiable data are considered personal
data to be minimised. Rather the re-identification of data subjects should be considered as creation
of new personal data, which should be subject to all applicable rules. Re-identification should
indeed be strictly prohibited unless all conditions for the lawful collection of personal data are met,
and it should be compatible with the purposes for which the data were originally collected and
subsequently anonymised.
The information requirements established by the GDPR can be met with regard to AI-based
processing, even though the complexity of AI application has to be taken into account. The
information made available to data subjects should enable them to understand the purpose of each
AI-based processing and its limits, even without going into unnecessary technical details.
The GDPR allows for inferences based on personal data, provided that appropriate safeguards are
adopted. Automated decisions based on profiling are in principle prohibited, but there are ample exceptions (contract, law or
consent). Uncertainties exist concerning the extent to which an individual explanation should be
provided to the data subject. It is also uncertain to what extent reasonableness criteria may apply to
automated decisions.
The GDPR provisions on preventive measures, and in particular those concerning privacy by design
and by default, do not hinder the development of AI systems, if correctly designed and
implemented, even though they may entail some additional costs. It needs to be clarified which AI
applications present high risks and therefore require a preventive data protection assessment, and
possibly the preventive involvement of data protection authorities.
Finally, the possibility of using personal data for statistical purposes opens opportunities for the
processing of personal data in ways that do not involve the inference of new personal data.
Statistical processing requires security measures that are proportionate to the risks for the data
subject, and which should include at least pseudonymisation.
The GDPR prescriptions are often vague and open-ended
The GDPR allows for the development of AI and big data applications that successfully balance data
protection and other social and economic interests, but it provides limited guidance on how to
achieve this goal. It indeed abounds in vague clauses and open standards, the application of which
often requires balancing competing interests. In the case of AI/big data applications, the
uncertainties are aggravated by the novelty of the technologies, their complexity and the broad
scope of their individual and social effects.
It is true that the principles of risk-prevention and accountability potentially direct the processing of
personal data toward a 'positive sum' game, in which the advantages of the processing, when
constrained by appropriate risk-mitigation measures, outweigh its possible disadvantages.
Moreover, these principles enable experimentation and learning, avoiding the over- and under-
inclusiveness issues involved in the applications of strict rules. However, by requiring controllers to
rely on such principles, the GDPR offloads the task of establishing how to manage risk and find
optimal solutions onto controllers, a task that may be challenging as well as costly. The stiff penalties
for non-compliance, when combined with the uncertainty on the requirements for compliance, may
constitute a novel risk, which, rather than incentivising the adoption of adequate compliance
measures, may prevent small companies from engaging in new ventures.
Thus, the successful application of the GDPR to AI applications depends heavily on what guidance data
protection bodies and other competent authorities will provide to controllers and data subjects.
Appropriate guidance would diminish the cost of legal uncertainty and would direct companies –
in particular small ones that mostly need such advice – to efficient and data protection-compliant
solutions.
Some policy indications
The study concludes with the following indications on AI and the processing of personal data.
• The GDPR generally provides meaningful indications for data protection in the context of AI
applications.
• The GDPR can be interpreted and applied in such a way that it does not substantially hinder
the application of AI to personal data, and that it does not place EU companies at a
disadvantage by comparison with non-European competitors.
• Thus, the GDPR does not require major changes in order to address AI applications.
• However, a number of AI-related data-protection issues do not have an explicit answer in
the GDPR. This may lead to uncertainties and costs, and may needlessly hamper the
development of AI applications.
• Controllers and data subjects should be provided with guidance on how AI can be applied
to personal data consistently with the GDPR, and on the available technologies for doing so.
Such guidance can prevent costs linked to legal uncertainty, while enhancing compliance.
• Providing guidance requires a multilevel approach, which involves data protection
authorities, civil society, representative bodies, specialised agencies, and all stakeholders.
• A broad debate is needed involving not only political and administrative authorities, but
also civil society and academia. This debate needs to address the issues of determining what
• It needs to be ensured that the right to opt out of profiling and data transfers can easily be
exercised, through appropriate user interfaces. The same applies to the right to be
forgotten.
• Normative and technological requirements concerning AI by design and by default need to
be specified.
• The possibility of repurposing data for AI applications that do not involve profiling –
scientific and statistical ones – needs to be broad, as long as appropriate precautions are in
place to prevent abuse.
• Strong measures need to be adopted against companies and public authorities that
intentionally abuse the trust of data subjects by using their data against their interests.
• Collective enforcement in the data protection domain should be enabled and facilitated.
In conclusion, controllers engaging in AI-based processing should endorse the values of the GDPR
and adopt a responsible and risk-oriented approach. This can be done in ways that are compatible
with the available technology and economic profitability (or the sustainable achievement of public
interests, in the case of processing by public authorities). However, given the complexity of the
matter and the gaps, vagueness and ambiguities present in the GDPR, controllers should not be left
alone in this exercise. Institutions need to promote a broad societal debate on AI applications, and
should provide high-level indications. Data protection authorities need to actively engage in a
dialogue with all stakeholders, including controllers, processors, and civil society, in order to
develop appropriate responses, based on shared values and effective technologies. Consistent
application of data protection principles, when combined with the ability to efficiently use AI
technology, can contribute to the success of AI applications, by generating trust and preventing
risks.
Table of Contents
1. Introduction............................................................................................................................................................ 1
2. AI and personal data............................................................................................................................................ 2
2.3.7. The general problem of social sorting and differential treatment ............................................... 27
2.4. AI, legal values and norms.......................................................................................................................30
3.5.5. Article 21 (1) and (2): Objecting to profiling and direct marketing .............................................. 58
3.5.6. Article 21 (2). Objecting to processing for research and statistical purposes............................. 58
3.6.4. Article 22(4) GDPR: Automated decision-making and sensitive data .......................................... 62
3.6.5. A right to explanation? ......................................................................................................................... 62
3.6.6. What rights to information and explanation? .................................................................................. 64
3.7. AI and privacy by design ..........................................................................................................................66
4.2.3. Consent.................................................................................................................................................... 74
4.2.4. AI and transparency............................................................................................................................... 74
Table of figures
Figure 1 – Hypes and winters of AI _________________________________________________ 5
Figure 2 – General AI: The singularity _______________________________________________ 6
Figure 3 – Efficiency gains from AI _________________________________________________ 7
Figure 4 – Basic structure of expert systems__________________________________________ 9
Figure 5 – Kinds of learning _____________________________________________________ 10
Figure 6 – Supervised learning ___________________________________________________ 11
Figure 7 – Training set and decision tree for bail decisions _____________________________ 12
Figure 8 – Multilayered (deep) neural network for face recognition ______________________ 14
Figure 9 – Number of connected devices ___________________________________________ 17
Figure 10 – Data collected in a minute of online activity worldwide______________________ 17
Figure 11 – Growth of global data ________________________________________________ 18
Figure 12 – The Cambridge Analytica case__________________________________________ 24
Figure 13 – The connection between identified and de-identified data ___________________ 37
1. Introduction
This study aims to provide a comprehensive assessment of the interactions between artificial
intelligence (AI) and data protection, focusing on the 2016 EU General Data Protection Regulation
(GDPR).
Artificial intelligence systems are populating the human and social world in multiple varieties:
industrial robots in factories, service robots in houses and healthcare facilities, autonomous vehicles
and unmanned aircraft in transportation, autonomous electronic agents in e-commerce and
finance, autonomous weapons in the military, intelligent communicating devices embedded in
every environment. AI has come to be one of the most powerful drivers of social transformation: it
is changing the economy, affecting politics, and reshaping citizens' lives and interactions.
Developing appropriate policies and regulations for AI is a priority for Europe, since AI increases
opportunities and risks in ways that are of the greatest social and legal importance. AI may enhance
human abilities, improve security and efficiency, and enable the universal provision of knowledge
and skills. On the other hand, it may increase opportunities for control, manipulation, and
discrimination; disrupt social interactions; and expose humans to harm resulting from technological
failures or disregard for individual rights and social values.
A number of concrete ethical and legal issues have already emerged in connection with AI in several
domains, such as civil liability, insurance, data protection, safety, contracts and crimes. Such issues
acquire greater significance as more and more intelligent systems leave the controlled and limited
environments of laboratories and factories and share the same physical and virtual spaces with
humans (internet services, roads, skies, trading on the stock exchange, other markets, etc.).
Data protection is at the forefront of the relationship between AI and the law, as many AI
applications involve the massive processing of personal data, including the targeting and
personalised treatment of individuals on the basis of such data. This explains why data protection
has been the area of the law that has most engaged with AI, although other domains of the law are
involved as well, such as consumer protection law, competition law, antidiscrimination law, and
labour law.
This study will adopt an interdisciplinary perspective. Artificial intelligence technologies will be
examined and assessed on the basis of the most recent scientific and technological research, and their
social impacts will be considered by taking account of an array of approaches, from sociology to
economics and psychology. A normative perspective will be provided by works in sociology and
ethics, and in particular information, computer, and machine ethics. Legal aspects will be analysed
by reference to the principles and rules of European law, as well as to their application in national
contexts. The report will focus on data protection and the GDPR, though it will also consider how
data protection shares with other domains of the law the task of addressing the opportunities and
risks that come with AI.
2.1.1. A definition of AI
The broadest definition of artificial intelligence (AI) characterises it as the attempt to build machines
that 'perform functions that require intelligence when performed by people.' 1 A more elaborate
notion has been provided by the High Level Expert Group on AI (AI HLEG), set up by the EU
Commission:
Artificial intelligence (AI) systems are software (and possibly also hardware) systems
designed by humans that, given a complex goal, act in the physical or digital dimension
by perceiving their environment through data acquisition, interpreting the collected
structured or unstructured data, reasoning on the knowledge, or processing the
information, derived from this data and deciding the best action(s) to take to achieve
the given goal. AI systems can either use symbolic rules or learn a numeric model, and
they can also adapt their behaviour by analysing how the environment is affected by
their previous actions. 2
This definition can be accepted with the proviso that most AI systems only perform a fraction of the
activities listed in the definition: pattern recognition (e.g., recognising images of plants or animals,
human faces or attitudes), language processing (e.g., understanding spoken languages, translating
from one language into another, fighting spam, or answering queries), practical suggestions (e.g.,
recommending purchases, purveying information, performing logistic planning, or optimising
industrial processes), etc. On the other hand, some systems may combine many such capacities, as
in the example of self-driving vehicles or military and care robots.
The High-Level Expert Group characterises the scope of research in AI as follows:
As a scientific discipline, AI includes several approaches and techniques, such as
machine learning (of which deep learning and reinforcement learning are specific
examples), machine reasoning (which includes planning, scheduling, knowledge
representation and reasoning, search, and optimization), and robotics (which includes
control, perception, sensors and actuators, as well as the integration of all other
techniques into cyber-physical systems).
To this definition, we could possibly also add communication, and particularly the
understanding and generation of language, as well as the domains of perception and vision.
1 Kurzweil (1990, 14), Russell and Norvig (2016, Section 1.1).
2 AI-HLEG (2019).
3 Russell and Norvig (2016).
4 Harel (2004).
5 According to Russell and Norvig (2016, 693), 'an agent is learning if it improves its performance on future tasks after making observations about the world'.
to detect sounds, capture syntactic structures, retrieve relevant knowledge, make inferences,
generate answers, etc.
In a system that is capable of learning, the most important component will not be the learned
algorithmic model, i.e., the algorithms that directly execute the tasks assigned to the system (e.g.,
making classifications, forecasts, or decisions) but rather the learning algorithms that modify the
algorithmic model so that it better performs its function. For instance, in a classifier system that
recognises images through a neural network, the crucial element is the learning algorithm (the
trainer) that modifies the internal structure of the algorithmic model (the trained neural network),
adjusting its internal connections and weights, so that it correctly classifies the
objects in its domain (e.g., animals, sounds, faces, attitudes, etc.).
6 See Mayer-Schoenberger and Cukier (2013, 15).
7 Hildebrandt (2014).
8 Nilsson (2010).
9 Bostrom (2014).
10 Bostrom (2014). This possibility was anticipated by Turing ([1951] 1966).
11 Parkin (2015).
12 See Kurzweil (2005) and Tegmark (2017).
The risks related to the emergence of an 'artificial general intelligence' should not be
underestimated: this is, on the contrary, a very serious problem that will pose challenges in the
future. In fact, as much as scientists may disagree on whether and when 'artificial general
intelligence' will come into existence, most of them believe that this objective will be achieved
before the end of this century. 13 In any case, it is too early to approach 'artificial general intelligence'
at a policy level, since it lies decades ahead, and a broader experience with advanced AI is needed
before we can understand both the extent and proximity of this risk, and the best ways to address
it.
Conversely, 'artificial specialised intelligence' is already with us, and is quickly transforming
economic, political, and social arrangements, as well as interactions between individuals and even
their private lives. The increase in economic efficiency is already a reality (see Figure 3), but AI provides
further opportunities: economic, social, and cultural development; energy sustainability; better
health care; and the spread of knowledge. In the very recent White Paper by the European
Commission, 14 it is indeed affirmed that AI
will change our lives by improving healthcare (e.g. making diagnosis more precise,
enabling better prevention of diseases), increasing the efficiency of farming,
contributing to climate change mitigation and adaptation, improving the efficiency of
production systems through predictive maintenance, increasing the security of
Europeans, and in many other ways that we can only begin to imagine.
13 A poll among leading AI scientists can be found in Bostrom (2014).
14 White Paper 'On artificial intelligence - A European approach to excellence and trust', Brussels, 19.2.2020, COM(2020) 65 final.
15 Floridi et al (2018, 690).
16 See Cath et al (2017).
17 For a recent review of documents on AI ethics and policy, see Jobin (2019).
that the EU is competitive with the US and China. The policy framework setting out measures to
align efforts at European, national and regional level should aim to mobilise resources
to achieve an 'ecosystem of excellence' along the entire value chain, starting in research
and innovation, and to create the right incentives to accelerate the adoption of
solutions based on AI, including by small and medium-sized enterprises (SMEs)
On the other hand, the deployment of AI technologies should be consistent with the EU
fundamental rights and social values. This requires measures to create an 'ecosystem of trust,' which
should provide citizens with 'the confidence to take up AI applications' and 'companies and public
organisations with the legal certainty to innovate using AI'. This ecosystem
must ensure compliance with EU rules, including the rules protecting fundamental
rights and consumers' rights, in particular for AI systems operated in the EU that pose a
high risk.
It is important to stress that the two objectives of excellence in research, innovation and
implementation, and of consistency with individual rights and social values are compatible, but
distinct. On the one hand the most advanced AI applications could be deployed to the detriment of
citizens' rights and social values; on the other hand, the effective protection of citizens from the risks
resulting from abuses of AI does not in itself provide the incentives that are needed to stimulate
research and innovation and promote beneficial uses. This report will argue that the GDPR can
contribute to addressing abuses of AI, and that it can be implemented in ways that do not hinder its
beneficial uses. It will not address the industrial and other policies that are needed to ensure the EU's
competitiveness in the AI domain.
18 Van Harmelen et al (2008).
The theoretical results in knowledge representation and reasoning were not matched by disruptive,
game-changing applications. Expert systems – i.e., computer systems including vast domain-
specific knowledge bases, e.g., in medicine, law, or engineering, coupled with inferential engines –
gave rise to high expectations about their ability to reason and answer users' queries. Unfortunately,
such systems were often unsuccessful or only limitedly successful: they could only provide
incomplete answers, were unable to address the peculiarities of individual cases, and required
persistent and costly efforts to broaden and update their knowledge bases. In particular, expert-
system developers had to face the so-called knowledge representation bottleneck: in order to build a
successful application, the required information – including tacit and common-sense knowledge –
had to be represented in advance using formalised languages. This proved to be very difficult and
in many cases impractical or impossible.
In general, only in some restricted domains have logical models led to successful applications. In
the legal domain, for example, logical models of great theoretical interest have been developed –
dealing, for example, with arguments,19 norms, and precedents 20 – and some expert systems have
been successful in legal and administrative practice, in particular in dealing with tax and social
security regulations. However, these studies and applications have not fundamentally transformed
the legal system and the application of the law.
AI has made an impressive leap forward since it began to focus on the application of machine
learning to massive amounts of data. This has led to a number of successful applications in many
sectors – ranging from automated translation to industrial optimisation, marketing, robotic visions,
movement control, etc. – and some of these applications already have substantial economic and
social impacts. In machine learning approaches, machines are provided with learning methods,
rather than, or in addition to, formalised knowledge. Using such methods, they can automatically
learn how to effectively accomplish their tasks by extracting/inferring relevant information from
their input data. As noted, and as Alan Turing already theorised in the 1950s, a machine that is able
to learn will achieve its goals in ways that are not anticipated by its creators and trainers, and in some
cases without them knowing the details of its inner workings.21
Even though the great success of machine learning has overshadowed the techniques for explicit
and formalised knowledge representation, the latter remain highly significant. In fact, in many
domains the explicit logical modelling of knowledge and reasoning can be complementary to
machine learning. Logical models can explain the functioning of machine learning systems, check
and govern their behaviour according to normative standards (including ethical principles and legal
norms), validate their results, and develop the logical implications of such results according to
conceptual knowledge and scientific theories. In the AI community the need to combine logical
modelling and machine learning is generally recognised, though different views exist on how to
19 Prakken and Sartor (2015).
20 Ashley (2017).
21 Turing ([1951] 1996).
achieve this goal, and on the aspects to be covered by the two approaches (for a recent discussion of the
limits of machine learning, see Marcus and Davis 2019).
Supervised learning is currently the most popular approach. In this case the machine learns through
'supervision' or 'teaching': it is given in advance a training set, i.e., a large set of (probably) correct
answers to the system's task. More exactly the system is provided with a set of pairs, each linking the
description of a case to the correct response for that case. Here are some examples: in systems
designed to recognise objects (e.g. animals) in pictures, each picture in the training set is tagged
with the name of the kind of object it contains (e.g., cat, dog, rabbit, etc.); in systems for automated
translation, each (fragment of) a document in the source language is linked to its translation in the
target language; in systems for personnel selection, the description of each past applicant (age,
experience, studies, etc.) is linked to whether the application was successful (or to an indicator of
work performance for appointed candidates); in clinical decision support systems, each patient's
symptoms and diagnostic tests are linked to the patient's pathologies; in recommendation systems,
each consumer's features and behaviour are linked to the objects purchased; in systems for assessing
loan applications, each record of a previous application is linked to whether the application was
accepted (or, for successful applications, to the compliant or non-compliant behaviour of the
borrower). As these examples show, the training of a system does not always require a human
teacher tasked with providing correct answers to the system. In many cases, the training set can be
a by-product of human activities (purchasing, hiring, lending, tagging, etc.), as it is obtained by
recording the human choices pertaining to such activities. In some cases the training set can even
be gathered 'from the wild', consisting of data available on the open web. For instance,
manually tagged images or faces, available on social networks, can be scraped and used for training
automated classifiers.
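By way of illustration, the following Python sketch (with purely hypothetical data, using the pandas library) shows how such a training set can be assembled as a by-product of recorded human choices – here, past hiring decisions – each pair linking a case description to the recorded outcome:

# A minimal sketch (hypothetical data): a training set obtained as a by-product
# of past hiring, each pair linking an applicant's description to the recorded choice.
import pandas as pd

past_hiring = pd.DataFrame([
    {"age": 29, "years_experience": 4, "degree": 1, "hired": 1},
    {"age": 45, "years_experience": 15, "degree": 0, "hired": 1},
    {"age": 23, "years_experience": 0, "degree": 1, "hired": 0},
    {"age": 38, "years_experience": 9, "degree": 0, "hired": 0},
])

X = past_hiring[["age", "years_experience", "degree"]]  # the case descriptions
y = past_hiring["hired"]                                # the recorded human choices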
The learning algorithm of the system (its trainer) uses the training set to build an algorithmic model:
a neural network, a decision tree, a set of rules, etc. The algorithmic model is meant to capture the
relevant knowledge originally embedded in the training set, namely the correlations between cases
and responses. This model is then used, by a predicting algorithm, to provide hopefully correct
responses to new cases, by mimicking the correlations in the training set. If the examples in the
training set that come closest to a new case (with regard to relevant features) are linked to a certain
answer, the same answer will be proposed for the new case. For instance, if the pictures that are most
similar to a new input were tagged as cats, the new input will also be tagged in the same way;
if the past applicants whose characteristics best match those of the new applicant were linked to
rejection, the system will propose to reject the new applicant too; if the past workers who come
closest to the new applicant performed well (or poorly), the system will predict that the new
applicant will perform likewise.
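The following Python sketch (hypothetical data, using the scikit-learn library) illustrates this division of labour: a learning algorithm builds a model from the training pairs, and a predicting algorithm answers a new case by mimicking the responses given to the most similar past cases:

# A minimal sketch (hypothetical data): training pairs, a learning algorithm,
# and a prediction for a new case based on the closest past examples.
from sklearn.neighbors import KNeighborsClassifier

# training set: (income in thousands, years employed, open debts) -> recorded decision
X = [[45, 6, 1], [18, 1, 3], [60, 10, 0], [25, 2, 2], [52, 8, 1], [20, 1, 4]]
y = ["accepted", "rejected", "accepted", "rejected", "accepted", "rejected"]

learner = KNeighborsClassifier(n_neighbors=3)   # the learning algorithm
model = learner.fit(X, y)                       # the learned algorithmic model

new_case = [[30, 3, 2]]                         # a new applicant, not in the training set
print(model.predict(new_case))                  # answer copied from the closest past cases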
The answers by learning systems are usually called 'predictions'. However, the context of the
system's use often determines whether its proposals are to be interpreted as forecasts, or rather as
suggestions to the system's user. For instance, a system's 'prediction' that a person's application for
bail or parole will be accepted can be viewed by the defendant (and his or her lawyer) as a prediction
of what the judge will do, and by the judge as a suggestion guiding her decision (assuming that she
prefers not to depart from previous practice). The same applies to a system's prediction that a loan
or a social entitlement will be granted.
There is also an important distinction to be drawn concerning whether the 'correct' answers in a
training set are provided by the past choices of human 'experts' or rather by the factual
consequences of such choices. Compare, for instance, a system whose training set consists of past
loan applications linked to the corresponding lending decisions, and a system whose training set
consists of successful applications linked to the outcome of the loan (repayment or non-payment).
Similarly, compare a system whose training set consists of parole applications linked to judges'
decisions on such application with a system whose training set consists of judicial decisions on
parole applications linked to the subsequent behaviour of the applicant. In the first case, the system
will learn to predict the decisions that human decision-makers (bank managers, or judges) would
have made under the same circumstances. In the second case, the system will predict how a certain
choice would affect the goals being pursued (preventing non-payments, preventing recidivism). In
the first case the system would reproduce the virtues – accuracy, impartiality, fairness – but also the
vices – carelessness, partiality, unfairness – of the humans it is imitating. In the second case it would
more objectively approximate the intended outcomes.
As a simple example of supervised learning, Figure 7 shows a (very small) training set concerning
bail decisions along with the decision tree that can be learned on the basis of that training set. The
decision tree captures the information in the training set through a combination of tests, to be
performed sequentially. The first test concerns whether the defendant was involved in a drug
related offence. If the answer is positive, we have reached the bottom of the tree with the conclusion
that bail is denied. If the answer is negative, we move to the second test, on whether the defendant
used a weapon, and so on. Notice that the decision tree does not include information concerning
the kind of injury, since all outcomes can be explained without reference to that information. This
shows how the system's model does not merely replicate the training set; it involves generalisation:
it assumes that certain combinations of predictors are sufficient to determine the outcomes, other
predictors being irrelevant.
In this example we can distinguish the elements in Figure 6. The table in Figure 7 is the training set.
The software that constructs the decision tree is the learning algorithm. The decision tree itself, as
shown in Figure 7, is the algorithmic model, which codes the logic of the human decisions in the
training set. The software that processes new cases, using the decision tree, and makes predictions
based on the features of such cases, is the predicting algorithm. In this example, as noted above,
the decision tree reflects the attitudes of the decision-makers whose decisions are in the training
set: it reproduces their virtues and biases.
For instance, according to the decision tree, the fact that the accusation concerns a drug-related offence
is sufficient for bail to be denied. We may wonder whether this is a fair criterion for assessing bail
requests. Note also that the decision tree (the algorithmic model) also provides answers for cases
that do not fit exactly any example in the training set. For instance, no example in the training set
concerns a drug-related offence with no weapon and no previous record. However, the decision
tree provides an answer also for this case: there should be no bail, as this is what happens in all drug-
related cases in the training set.
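The following Python sketch (using the scikit-learn library, with a hypothetical training set echoing the bail example) shows the learning algorithm building a decision tree, the learned tests being printed, and the model generalising to a combination of predictors that does not appear in the training set:

# A minimal sketch (hypothetical data) of the bail example: a decision tree is
# learned from a tiny training set and then applied to an unseen combination.
from sklearn.tree import DecisionTreeClassifier, export_text

features = ["drug_offence", "weapon", "previous_record", "injury"]
X = [  # past cases (1 = yes, 0 = no)
    [1, 1, 0, 1], [1, 0, 1, 0],                              # drug-related cases: bail denied
    [0, 1, 1, 1], [0, 1, 1, 0],                              # weapon and previous record: bail denied
    [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1],  # remaining cases: bail granted
]
y = ["no bail", "no bail", "no bail", "no bail", "bail", "bail", "bail", "bail"]

learner = DecisionTreeClassifier()                 # the learning algorithm
tree = learner.fit(X, y)                           # the learned algorithmic model
print(export_text(tree, feature_names=features))   # the tests actually learned (injury is never used)

# a combination absent from the training set: drug offence, no weapon, no record
print(tree.predict([[1, 0, 0, 0]]))                # generalises to 'no bail'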
As another simplified example of supervised machine learning, consider the training set and the
rules in Figure 7. In this case too, the learning algorithm, as applied to this very small set of past
decisions, delivers questionable generalisations, such as the prediction that young age would always
lead to a rejection of the loan application and that middle age would always lead to acceptance.
Usually, in order to give reliable predictions, a training set must include a vast number of examples,
each described through a large set of predictors.
Reinforcement learning is similar to supervised learning, as both involve training by way of examples.
However, in the case of reinforcement learning the system learns from the outcomes of its own
actions, namely, through the rewards or penalties (e.g., points gained or lost) that are linked to the
outcomes of such actions. For instance, in the case of a system learning how to play a game, rewards
may be linked to victories and penalties to defeats; in a system learning to make investments,
rewards may be linked to financial gains and penalties to losses; in a system learning to target ads
effectively, rewards may be linked to users' clicks, etc. In all these cases, the system observes the
outcomes of its actions, and it self-administers the corresponding rewards or penalties. Being
geared towards maximising its score (its utility), the system will learn to achieve outcomes leading
to rewards (victories, gains, clicks), and to prevent outcomes leading to penalties. With regard to
reinforcement learning too, we can distinguish the learner (the algorithm that learns how to act
successfully, based on the outcomes of previous actions by the system) and the learned model (the
output of the learner, which determines the system's new actions).
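The following Python sketch (a deliberately simplified, simulated example) illustrates this idea for the advertising case: the system repeatedly chooses one of three ads, observes whether a simulated user clicks, self-administers the corresponding reward, and gradually comes to prefer the most effective ad:

# A minimal sketch (simulated data): reinforcement learning by trial and error.
import random

click_prob = [0.05, 0.20, 0.10]     # the hidden environment: true click rate of each ad
value = [0.0, 0.0, 0.0]             # the learned model: estimated value of each ad
counts = [0, 0, 0]

for step in range(5000):
    if random.random() < 0.1:                   # occasionally explore a random ad
        ad = random.randrange(3)
    else:                                       # otherwise exploit what has been learned
        ad = value.index(max(value))
    reward = 1 if random.random() < click_prob[ad] else 0   # observed outcome (click or not)
    counts[ad] += 1
    value[ad] += (reward - value[ad]) / counts[ad]          # update the estimate for that ad

print(value)    # the estimates should roughly approach the true click rates, favouring the second ad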
In unsupervised learning, finally, AI systems learn without receiving external instructions, either in
advance or as feedback, about what is right or wrong. The techniques for unsupervised learning are
used, in particular, for clustering, i.e., for grouping sets of items that present relevant similarities
or connections (e.g., documents that pertain to the same topic, people sharing relevant
characteristics, or terms playing the same conceptual roles in texts). For instance, in a set of cases
concerning bail or parole, we may observe that injuries are usually connected with drugs (not with
weapons as expected), or that people having a prior record are those associated with weapons.
These clusters might turn out to be informative for grounding bail or parole policies.
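The following Python sketch (hypothetical data, using the scikit-learn library) illustrates clustering: a handful of bail or parole records are grouped by similarity, without any 'correct' answer being provided to the system:

# A minimal sketch (hypothetical data): unsupervised grouping of similar cases.
from sklearn.cluster import KMeans

# case features: drug offence, weapon, injury, previous record (1 = yes, 0 = no)
cases = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(cases)
print(clusters)   # e.g. drug/injury cases fall into one cluster, weapon/record cases into the other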
In the case of the neural network, the learning algorithm modifies the network until it achieves the
desired performance level, while the outcome of the learning – the algorithmic model – is the network
in its final configuration.
As previously noted, the learning algorithm is able to modify the neural network (the weights in
connections and neurons) so that the network is able to provide the most appropriate answers.
Under the supervised learning approach, the trained network will reproduce the behaviour in the
training set; under the reinforcement learning approach, the network will adopt the behaviour that
maximises its score (e.g. the reward points linked to gains in investments or to victories in games).
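As a minimal illustration, the following Python sketch (hypothetical data, using the scikit-learn library) trains a small neural network: the learning algorithm repeatedly adjusts the network's connections and weights until it reproduces the answers contained in the training set:

# A minimal sketch (hypothetical data): training a small neural network classifier.
from sklearn.neural_network import MLPClassifier

X = [[1, 1, 0, 1], [1, 0, 1, 0], [0, 1, 1, 1], [0, 1, 1, 0],
     [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
y = ["no bail", "no bail", "no bail", "no bail", "bail", "bail", "bail", "bail"]

network = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs", max_iter=5000, random_state=0)
network.fit(X, y)                        # training adjusts the connections and weights
print(network.predict([[1, 0, 0, 0]]))   # the trained model then answers a new case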
2.2.5. Explicability
Different machine learning approaches differ in their ability to provide explanations. For instance,
the outcome of a decision tree can be explained through the sequence of tests leading to that
outcome. In our example, if bail is refused after testing No for Drug, Yes for Weapons and Yes for
Previous record, an explanation is provided by a corresponding rule: if No Drug and Weapons and
Previous Record, then No Bail.
Unlike a decision tree, a neural network does not provide explanations of its outcomes. It is possible
to determine how a certain output has resulted from the network's activation, and how that
activation, in response to a given input, was determined by the connections between neurons (and
by the weights assigned to such connections as a result of the network's training) and by the
mathematical functions governing each neuron. However, this information does not show a
rationale that is meaningful to humans: it does not tell us why a certain response was given.
Many approaches exist to provide explanations of the behaviour of neural networks and other
opaque systems (also called 'black boxes'). Some of these approaches look into the system to be
explained, and build explanations accordingly (e.g., looking at the outcomes of the network's
different layers, as in the example in Figure 8). Other approaches build explanations on the basis of
the network's external behaviour: they only consider the relation between the inputs provided to
the network and the outcomes it delivers, and build arguments or other explanations accordingly.
However, advances in the human-understandable explanation of neural networks have so far
been quite limited. 22 Unfortunately, in many domains, the systems whose functioning is less
explicable provide higher performance. Thus, comparative advantages in performance and in
explicability may have to be balanced, in order to determine what approach should be adopted in
22 Guidotti et al (2018).
a machine learning system. The best balance also depends on the domain in which the system is
used and on the importance of the interests that are affected. When public action is involved and
key human interests are at stake (e.g., as in judicial decisions) explanation is paramount.
Even when a system can only be viewed as a black box, however, some critical analyses of its
behaviour are still possible. Through sensitivity analysis – i.e., by systematically checking whether
the output changes if the value of certain input features is modified, leaving all other features
unchanged – we can understand what features determine the system's output. For instance, by
checking whether the prediction of a system meant to assess creditworthiness changes if we modify
the place of birth or residence of the applicant, we can determine whether this input feature is
relevant to the system's output. Consequently, we may wonder whether the system unduly
discriminates against people depending on their ethnicity or social status, which may be linked to place of
birth or residence.
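The following Python sketch (a hypothetical model and data set) illustrates such a sensitivity analysis: a single input feature – here, a coded district of residence – is varied while all other features are kept fixed, in order to see whether it influences the score produced by a black-box creditworthiness model:

# A minimal sketch (hypothetical model and data): probing a black box one feature at a time.
from sklearn.ensemble import RandomForestClassifier

# hypothetical training data: income, years employed, district of residence (coded 0/1)
X = [[45, 6, 0], [18, 1, 1], [60, 10, 0], [25, 2, 1], [52, 8, 0], [20, 1, 1]]
y = [1, 0, 1, 0, 1, 0]                                    # 1 = creditworthy, 0 = not
black_box = RandomForestClassifier(random_state=0).fit(X, y)

applicant = [30, 3, 0]
baseline = black_box.predict_proba([applicant])[0][1]     # score for the original case
for district in (0, 1):                                   # vary only the district feature
    probe = applicant.copy()
    probe[2] = district
    score = black_box.predict_proba([probe])[0][1]
    print(district, round(score - baseline, 3))           # a large shift flags a suspect feature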
past patients, linking their characteristics and medical tests to subsequent medical conditions and
treatments.
As a result of the need to learn by analysing vast amounts of data, AI has become hungry for data,
and this hunger has spurred data collection, in a self-reinforcing spiral. 23 Thus, the development of
AI systems based on machine learning presupposes and fosters the creation of vast data sets, i.e.,
big data 24.
The collection of data is facilitated by the availability of electronic data as a by-product of using any
kind of ICT system. Indeed, a massive digitisation has preceded most AI applications, resulting
from the fact that data flows are produced in all domains where computing is deployed. For
instance, huge amounts of data are collected every second by computers that execute economic
transactions (as in e-commerce); by sensors monitoring and providing input to physical objects
(e.g., vehicles or smart home devices); by the workflows generated by economic and governmental
activities (e.g., banking, transportation, or taxation); by surveillance devices (e.g. traffic cameras,
or access control systems); and by systems supporting non-market activities (e.g. internet access,
searching, or social networking).
In recent years, these data flows have been integrated into a global interconnected data-processing
infrastructure, centred on, but not limited to, the Internet. This infrastructure constitutes a universal
medium for communicating, accessing data, and delivering any kind of private and public services.
It enables citizens to shop, use banking and other services, pay taxes, get government benefits and
entitlements, access information and knowledge, and build social connections. Algorithms – often
powered by AI – mediate citizens' access to content and services, selecting information and
opportunities for them, while at the same time recording any activity. Today, this global
interconnected data-processing infrastructure seems to include about 30 billion devices –
computers, smart phones, industrial machines, cameras, etc. – which generate masses of electronic
data (see Figure 9).
23 Cristianini (2016).
24 Mayer-Schönberger and Cukier (2013).
Figure 10 provides a comparative overview of what takes place online every minute.
AI's hunger for data concerns any kind of information: from meteorological data, to environmental
data, to data concerning industrial processes. Figure 11 gives an idea of the growth of data creation.
25 Kurzweil (2012).
26 McAfee and Brynjolfsson (2019).
27 Licklider (1960).
28 McAfee and Brynjolfsson (2019), Mindell (2015).
29 Brynjolfsson and McAfee (2011).
30 Bhuta et al (2015).
31 Sunstein (2007).
when used to capture users by exposing them to information they may like, or which accords with
their preferences, thereby exploiting their confirmation biases.32
Just as AI can be misused by economic actors, it can also be misused by the public sector.
Governments have many opportunities to use AI for legitimate political and administrative purposes
(e.g., efficiency, cost savings, improved services), but they may also employ it to anticipate and
control citizens' behaviour in ways that restrict individual liberties and interfere with the democratic
process.
32 Pariser (2011).
33 Kahneman (2011).
34 Kahneman (2011, Ch. 21), Kleinberg et al (2019).
35 Kleinberg et al (2019).
In other cases, a training set may be biased against a certain group, since the achievement of the
outcome being predicted (e.g., job performance) is approximated through a proxy that has a
disparate impact on that group. Assume, for instance, that the future performance of employees
(the target of interest in job hiring) is only measured by the number of hours worked in the office.
This outcome criterion will lead to past hiring of women – who usually work for fewer hours than
men, having to cope with heavier family burdens – being considered less successful than the hiring
of men; based on this correlation (as measured on the basis of the biased proxy), the system will
predict a poorer performance of female applicants.
In other cases, mistakes and discrimination may derive from biases embedded in the predictors
used by the machine-learning system. A system may perform unfairly, since it uses a favourable predictor
(input feature) that only applies to members of a certain group (e.g., the fact of having attended a
socially selective higher-education institution). Unfairness may also result from taking biased human
judgements as predictors (e.g., recommendation letters).
Finally, unfairness may derive from a data set that does not reflect the statistical composition of the
population. Assume, for instance, that in applications for bail or parole, previous criminal record plays
a role, and that members of a certain group are subject to stricter controls, so that their criminal
activity is more often detected and acted upon. This would entail that members of that group will
generally receive a less favourable assessment than members of other groups who have behaved in
the same way.
Members of a certain group may also suffer prejudice when that group is only represented by a very
small subset of the training set, since this will reduce the accuracy of predictions for that group (e.g.,
consider the case of a firm that has appointed few women in the past and which uses its records of
past hiring as its training set).
It has also been observed that it is difficult to challenge the unfairness of automated decision-
making. Challenges raised by the individuals concerned, even when justified, may be disregarded
or rejected because they interfere with the system's operation, giving rise to additional costs and
uncertainties. In fact, the predictions of machine-learning systems are based on statistical
correlations, against which it may be difficult to argue on the basis of individual circumstances, even
when exceptions would be justified. Here is the perspective of Cathy O'Neil, a machine-learning
expert who has become a critic of the abuses of automation:
An algorithm processes a slew of statistics and comes up with a probability that a
certain person might be a bad hire, a risky borrower, a terrorist, or a miserable teacher.
That probability is distilled into a score, which can turn someone's life upside down.
And yet when the person fights back, 'suggestive' countervailing evidence simply won't
cut it. The case must be ironclad. The human victims of WMDs, we'll see time and again,
are held to a far higher standard of evidence than the algorithms themselves.36
These criticisms have been countered by observing that algorithmic systems, even when based on
machine learning, are more controllable than human decision-makers, their faults can be identified
with precision, and they can be improved and engineered to prevent unfair outcomes.
[W]ith appropriate requirements in place, the use of algorithms will make it possible to
more easily examine and interrogate the entire decision process, thereby making it far
easier to know whether discrimination has occurred. By forcing a new level of
specificity, the use of algorithms also highlights, and makes transparent, central trade-offs
among competing values. Algorithms are not only a threat to be regulated; with
the right safeguards in place, they have the potential to be a positive force for equity.37
36 O'Neil (2016).
In conclusion, it seems that the issues that have just been presented should not lead us to exclude
categorically the use of automated decision-making. The alternative to automated decision-making
is not perfect decisions but human decisions with all their flaws: a biased algorithmic system can still
be fairer than an even more biased human decision-maker. In many cases, the best solution consists
in integrating human and automated judgements, by enabling the affected individuals to request a
human review of an automated decision as well as by favouring transparency and developing
methods and technologies that enable human experts to analyse and review automated decision-
making. In fact, AI systems have demonstrated an ability to act successfully also in domains
traditionally entrusted to the trained intuition and analysis of humans, such as medical diagnosis,
financial investment, the granting of loans, etc. The future challenge will consist in finding the best
combination between human and automated intelligence, taking into account the capacities and
the limitations of both.
37 Kleinberg, Ludwig, Mullainathan, and Sunstein (2018, 113).
the system's indication may provide the basis for preventive therapies and tests, or rather for a rise
in the insurance premium.
The information so inferred may also be conditional, that is, it may consist in the propensity to react
in a certain way to given inputs. For instance, it may consist in the propensity to respond to a
therapy with an improved medical condition, or in the propensity to respond to a certain kind of ad or
to a certain price variation with a certain purchasing behaviour, or in the propensity to respond to a
certain kind of message with a change in mood or preference (e.g., relative to political choices).
When that is the case, profiling potentially leads to influence and manipulation.
Assume, for instance, that the system connects certain values for input features (e.g., having a certain age,
gender, social status, personality type, etc.) to the propensity to react to a certain message (e.g., a
targeted ad) with a certain response (e.g., buying a certain product). Assume also that the system is
told that a particular individual has these values (he is a young male, working class, extrovert, etc.).
Then the system would know that, by administering that message to that individual, it can
probably induce the desired response.
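Purely as an illustration of this mechanism, the following sketch (in Python) combines hypothetical, hand-picked feature weights, standing in for a learned model, with a simple decision rule on whether to send a targeted message; none of the feature names, weights or thresholds is drawn from an actual system.

    import math

    # Illustrative only: hand-picked weights stand in for a model learned from data.
    def response_propensity(person):
        """Estimate the probability that 'person' responds to a given targeted message."""
        score = 0.0
        score += 0.8 if person["age_group"] == "young" else 0.0
        score += 0.5 if person["personality"] == "extrovert" else 0.0
        score += 0.3 if person["status"] == "working_class" else 0.0
        return 1 / (1 + math.exp(-(score - 1.0)))  # squash the score into a probability

    def should_target(person, threshold=0.5):
        """Send the message only to individuals deemed likely enough to respond."""
        return response_propensity(person) >= threshold

    individual = {"age_group": "young", "personality": "extrovert", "status": "working_class"}
    print(should_target(individual))  # True: the message is expected to induce the response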
The notion of profiling just presented corresponds to this more elaborate definition:
Profiling is a technique of (partly) automated processing of personal and/or non-
personal data, aimed at producing knowledge by inferring correlations from data in the
form of profiles that can subsequently be applied as a basis for decision-making. A
profile is a set of correlated data that represents a (individual or collective) subject.
Constructing profiles is the process of discovering unknown patterns between data in
large data sets that can be used to create profiles. Applying profiles is the process of
identifying and representing a specific individual or group as fitting a profile and of
taking some form of decision based on this identification and representation. 38
The notion of profiling in the GDPR only covers assessments or decisions concerning individuals,
based on personal data, excluding the mere construction of group profiles:
'profiling'[…] consists of any form of automated processing of personal data evaluating
the personal aspects relating to a natural person, in particular to analyse or predict
aspects concerning the data subject's performance at work, economic situation, health,
personal preferences or interests, reliability or behaviour, location or movements,
where it produces legal effects concerning him or her or similarly significantly affects
him or her.
Even when an automated assessment and decision-making system – a profile-based system – is
unbiased, and meant to serve beneficial purposes, it may negatively affect the individuals
concerned. Those who are subject to pervasive surveillance, persistent assessments and insistent
influence come under heavy psychological pressure that affects their personal autonomy, and they
are susceptible to deception, manipulation and exploitation in multiple ways.
38 Bosco et al (2015); see also Hildebrandt (2009).
First of all, people registered as voters in the USA were invited to take a detailed
personality/political test (about 120 questions), available online. The individuals taking the test
would be rewarded with a small amount of money (from two to five dollars). They were told that
their data would only be used for academic research.
About 320 000 voters took the test. In order to receive the reward, each individual taking the test
had to provide access to his or her Facebook page (step 1). This allowed the system to connect each
individual's answers to the information included in his or her Facebook page.
When accessing a test taker's page, Cambridge Analytica collected not only the Facebook page of
test takers, but also the Facebook pages of their friends, between 30 and 50 million people
altogether (step 2). Facebook data was also collected from other sources.
After this data collection phase, Cambridge Analytica had at its disposal two sets of personal data
to be processed (step 3): the data about the test takers, consisting in the information on their
Facebook pages, paired with their answers to the questionnaire, and the data about their friends,
consisting only in the information on their Facebook pages.
Cambridge Analytica used the data about test-takers as a training set for building a model to profile
their friends and other people. More precisely, the data about the test-takers constituted a vast
training set, where the information on an individual's Facebook pages (likes, posts, links, etc.)
provided values for predictors (features) and the answers to the questionnaire (and psychological
and political attitudes expressed by such answers) provided values for the targets. Thanks to its
machine learning algorithms, Cambridge Analytica could use this data to build a model correlating
the information in people's Facebook pages to predictions about psychology and political
preferences. At this point Cambridge Analytica engaged in massive profiling, namely, in expanding
the data available on the people who did not take the test (their Facebook data, and any further
data that was available on them), with the predictions provided by the model. For instance, if test-
takers having a certain pattern of Facebook likes and posts were classified as having a neurotic
personality, the same assessment could be extended also to non-test-takers having similar patterns
in their Facebook data.
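The pipeline described above can be pictured with a deliberately schematic sketch (in Python); the feature names, labels and 1-nearest-neighbour rule are invented for illustration and are not a reconstruction of the system actually used.

    # Schematic sketch: labels obtained from test-takers (questionnaire answers) are
    # extended to non-test-takers whose Facebook-style features look similar.
    test_takers = [
        ({"likes_sport": 1, "likes_politics": 0, "posts_per_day": 5}, "extrovert"),
        ({"likes_sport": 0, "likes_politics": 1, "posts_per_day": 1}, "neurotic"),
    ]

    def similarity(a, b):
        """Crude similarity: number of features with identical values."""
        return sum(1 for k in a if a[k] == b.get(k))

    def predict(profile):
        """Assign the label of the most similar test-taker (1-nearest neighbour)."""
        features, label = max(test_takers, key=lambda pair: similarity(pair[0], profile))
        return label

    friend = {"likes_sport": 0, "likes_politics": 1, "posts_per_day": 1}
    print(predict(friend))  # 'neurotic': the test-takers' label is extended to a similar profile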
Finally (step 4), based on this personality/political profiling, potential voters who were likely to
change their voting behaviour if prodded with appropriate messages were identified (in US states in
which a small change could make a difference). These voters were targeted with personalised
political ads and with other messages that could trigger the desired change in voting behaviour,
possibly building upon their emotions and prejudice and without making them aware of the
purpose of such messages.39
39 On the problems related to disinformation and propaganda, see Bayer et al (2019).
40 Varian (2010, 2014).
41 Pentland (2015, 28).
42 Zuboff (2019); see also Cohen (2019), who prefers to speak of 'informational capitalism'.
43 Polanyi ([1944] 2001).
In government too, AI and big data can bring great advantages, supporting efficiency in managing
public activities, coordinating citizens' behaviour, and preventing social harms. However, they may
also enable new kinds of influence and control, underpinned by purposes and values that may
conflict with the requirements of democratic citizenship. A paradigmatic example is that of the
Chinese social credit system, which collects data about citizens and assigns to those citizens scores
that quantify their social value and reputation. This system is based on the aggregation and analysis
of personal information. The collected data cover financial aspects (e.g., timely compliance with
contractual obligations), political engagement (e.g., participation in political movements and
demonstrations), involvement in civil and criminal proceedings (past and present) and social action
(e.g., participation in social networks, interpersonal relationships, etc.). On the basis of these data items,
citizens may be assigned positive or negative points, which contribute to their social score. A
citizen's overall score determines his or her access to services and social opportunities, such as
universities, housing, transportation, jobs, financing, etc. The system's purported objective is to
promote mutual trust and civic virtues. One may wonder whether opportunism and conformism
may rather be promoted, to the detriment of individual autonomy and genuine moral and social
motivations.
44 Zuboff (2019, 507).
45 Cristianini and Scantamburlo (2019).
46 Balkin (2008, 3).
Thus, the perspective of an integration or symbiosis between humans and intelligent machines, while
opening bright prospects, does not entail that all applications of AI should be accepted as long as
they meet technological and fairness standards. It has been argued that, following this approach,
What is achieved is resignation – the normalization of massive data capture, a one-way
transfer to technology companies, and the application of automated, predictive
solutions to each and every societal problem. 47
Indeed, in some cases and domains, AI and big data applications – even when accurate and unbiased –
may have individual and social costs that outweigh their advantages. To address these cases, we
need to go beyond requiring unbiasedness and fairness, and ask further questions, which may
challenge the very admissibility of the AI applications at stake.
Which systems really deserve to be built? Which problems most need to be tackled?
Who is best placed to build them? And who decides? We need genuine accountability
mechanisms, external to companies and accessible to populations. Any A.I. system that
is integrated into people's lives must be capable of contest, account, and redress to
citizens and representatives of the public interest. 48
Consider, for instance, systems that are able to recognise sexual orientation, or criminal tendencies
from the faces of persons. Should we just ask whether these systems provide reliable
assessments, or should we rather ask whether they should be built at all? Should we 'ban them, or at
least ensure they are only licensed for socially productive uses?' 49 The same may concern extremely
intrusive ways to monitor, analyse, punish or reward the behaviour of workers by online platforms
for transportation (e.g. Uber) or other services. Similarly, some AI-based financial applications, even
when inclusive, may have a negative impact on their addressees, e.g., pushing them into perpetual
debt. 50
47 Powles and Nissenbaum (2018).
48 Powles and Nissenbaum (2018).
49 Pasquale (2019).
50 Pasquale (2019).
a certain financial history, combined with data on residence or internet use, can lead to a prediction
concerning financial reliability and possibly to a credit score.
A new dynamic of stereotyping and differentiation takes place. On the one hand, the individuals
whose data support the same prediction will be considered and treated in the same way. On the
other hand, the individuals whose data support different predictions will be considered and treated
differently.
This equalisation and differentiation, depending on the domains in which it is used and on the
purposes that it is meant to serve, may affect positively or negatively the individuals concerned but
also broader social arrangements.
Consider for instance the use of machine learning technologies to detect or anticipate health issues.
When used to direct patients to therapies or preventive measures that are most suited to their
particular conditions, these AI applications are certainly beneficial, and the benefits outweigh – at
least when accompanied by corresponding security measures – whatever risks may be linked
to the abuse of patients' data. The benefits, moreover, concern in principle all data subjects whose
data are processed for this purpose, since each patient has an interest in a more effective and
personalised treatment. Processing of health-related data may also be justified on grounds of public
health (Article 9 (2)(h)), and in particular for the purpose of 'monitoring epidemics and their spread'
(Recital 46). This provision has become hugely relevant in the context of the Coronavirus disease
2019 (COVID-19) epidemic. In particular, a vast debate has been raised by the development of
applications for tracing contacts, in order to monitor the diffusion of the infection in a timely manner. 51 AI is
being applied in the context of the epidemic in multiple ways, e.g., to assess the symptoms of
individuals and to anticipate the evolution of the epidemic. Such processing should be viewed as
legitimate as long as it effectively contributes to limiting the diffusion and the harmfulness of the
epidemic, assuming that the privacy and data protection risks are proportionate to the expected
benefit, and that appropriate mitigation measures are applied.
The use of the predictions based on health data in the context of insurance deserves a much less
favourable assessment. In this case there would be some gainers, namely the insured individuals
getting a better deal based on their favourable health prospects, but also some losers, namely those
getting a worse deal because of their unfavourable prospects. Thus, individuals who already are
disadvantaged because of their medical conditions would suffer further disadvantage, being
excluded from insurance or being subject to less favourable conditions. Insurance companies
having the ability (based on the data) to distinguish the risks concerning different applicants would
have a competitive advantage, being able to provide better conditions to less risky applicants, so
that insurers would be pressured to collect as much personal data as possible.
Even less commendable would be the use of health predictions in the context of recruiting, which
would involve burdening less healthy people with unemployment or with harsher work conditions.
Competition between companies would also be affected, and pressure for collecting health data
would grow.
Let us finally consider the domain of targeted advertising. In principle, there seems to be nothing
wrong in providing consumers with ads that match their interests, helping them to navigate the huge
set of options that are available online. However, personalised advertising involves the massive
collection of personal data, which is used in the interests of advertisers and intermediaries, possibly
against the interests of data subjects. Such data provide indeed new opportunities for influence and
51 See the European Data Protection Board Guidelines 04/2020 on the use of location data and contact-tracing tools in the context of the Covid-19 outbreak.
control; they can be used to deliver deceitful or aggressive messages, or generally messages that
bypass rationality by appealing to weaknesses and emotions.
Rather than predominantly stimulating the development and exercise of conscious and
deliberate reason, today's networked information flows […] employ a radical
behaviorist approach to human psychology to mobilize and reinforce patterns of
motivation, cognition, and behavior that operate on automatic, near-instinctual levels
and that may be manipulated instrumentally.52
Thus, people may be induced to purchase goods they do not need, to overspend, to engage in risky
financial transactions, to indulge in their weaknesses (e.g. gambling or drug addiction). The
opportunity for undue influence is amplified by the use of psychographic techniques that enable
psychological attitudes to be inferred from behaviour, and thus disclose opportunities for
manipulation. 53
Even outside of the domain of aggressive or misleading advertising, we may wonder what real
benefits to consumers and to society may be delivered by practices such as price discrimination,
namely, the policy of providing different prices and different conditions to different consumers,
depending on predictions of their readiness to pay. Economists have observed that this practice may
not only harm consumers but also affect the functioning of markets.
Because AI and big data enable firms to assess how much each individual values
different products and is therefore willing to pay, they give these firms the power to
price discriminate, to charge more to those customers who value the product more or
who have fewer options. Price discrimination not only is unfair, but it also undermines
the efficiency of the economy: standard economic theory is based on the absence of
discriminatory pricing. 54
The practice of price discrimination shows how individuals may be deprived of access to some
opportunities when they are provided with a personalised informational environment engineered by
third parties, i.e., with informational cocoons where they are presented with data and choices that
are selected by others, according to their priorities.
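A purely hypothetical sketch (in Python) of the mechanism just described: the same product is offered at different prices depending on a predicted willingness to pay. The features, weights and price cap are invented and do not reflect any actual pricing system.

    BASE_PRICE = 100.0

    def predicted_willingness_to_pay(profile):
        """Stand-in for a learned model of how much this consumer would accept to pay."""
        multiplier = 1.2 if profile["few_alternatives"] else 1.0
        multiplier *= 1.1 if profile["past_spending"] == "high" else 0.95
        return BASE_PRICE * multiplier

    def personalised_price(profile):
        """Charge up to the predicted willingness to pay, capped at 150% of the base price."""
        return round(min(predicted_willingness_to_pay(profile), 1.5 * BASE_PRICE), 2)

    print(personalised_price({"few_alternatives": True, "past_spending": "high"}))   # 132.0
    print(personalised_price({"few_alternatives": False, "past_spending": "low"}))   # 95.0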
Similar patterns characterise the political domain, where targeted ads and messages can enable
political parties to selectively appeal to individuals having different political preferences and
psychological attitudes, without them knowing what messages are addressed to other voters, in
order to direct such individuals towards the desired voting behaviour, possibly against their best
judgement. In this case too, it may be wondered whether personalisation really contributes to the
formation of considered political opinions, or whether it rather hinders it. After the Cambridge
Analytica case, some internet companies have recognised how microtargeted political advertising
may negatively affect the formation of political opinion, and have consequently adopted some
remedial measures. Some have refused to transmit paid political ads (Twitter), others have restricted
the factors used for targeting, only allowing general features such as age, gender, or postal code,
to the exclusion of other aspects, such as political affiliation or public voter records (Google).
In conclusion, we may say that AI enables new kinds of algorithmically mediated differentiation
between individuals, which need to be strictly scrutinised. While in the pre-AI era differential
treatments could be based on the information extracted through individual interactions (the typical
job interview) and human assessments, or on a few data points whose meaning was predetermined,
in the AI era differential treatments can be based on vast amounts of data enabling probabilistic
52 Cohen (2019).
53 Burr and Cristianini (2019).
54 Stiglitz (2019, 115).
predictions, which may trigger algorithmically predetermined responses. The impacts of such
practices can go beyond the individuals concerned, and affect important social institutions, in the
economic as well as in the political sphere.
The GDPR, as we shall see in the following section, provides some constraints: the need for a legal
basis for any processing of personal data, obligations concerning information and transparency,
limitations on profiling and automated decision-making, requirements on anonymisation and
pseudonymisation, etc. These constraints, however, need to be coupled with strong public
oversight, possibly leading to the ban of socially obnoxious forms of differential treatment, or to
effective measures that prevent abuses. The decision on what forms of algorithmic differentiations
to allow is a highly political one, which should be entrusted to technical authorities only under the
direction of politically responsible bodies, such as, in particular, parliamentary assemblies. It is a
decision that concerns what society we want to live in, under what arrangement of powers and
opportunities.
both benefits and costs, and ensuring that individuals and groups are free from unfair
bias, discrimination and stigmatisation. The procedural dimension entails the ability to
contest and seek effective redress against decisions made by AI systems and by the
humans operating them.
- Explicability: algorithmic processes need to be transparent, the capabilities and purpose
of AI systems openly communicated, and decisions explainable to those affected both
directly and indirectly.
According to the High-Level Expert Group, in order to implement and achieve trustworthy AI, seven
requirements should be met, building on the principles mentioned above:
- Human agency and oversight, including fundamental rights;
- Technical robustness and safety, including resilience to attack and security, fall back
plan and general safety, accuracy, reliability and reproducibility;
- Privacy and data governance, including respect for privacy, quality and integrity of data,
and access to data;
- Transparency, including traceability, explainability and communication;
- Diversity, non-discrimination and fairness, including the avoidance of unfair bias,
accessibility and universal design, and stakeholder participation;
- Societal and environmental wellbeing, including sustainability and environmental
friendliness, social impact, society and democracy;
- Accountability, including auditability, minimisation and reporting of negative impact,
trade-offs and redress.
Implementation of these requirements should occur throughout an AI system's entire life cycle as
required by specific applications.
A recent comparative analysis of documents on the ethics of AI has noted a global convergence
around the values of transparency, non-maleficence, responsibility, and privacy, while dignity,
solidarity and sustainability are less often mentioned.55 However, substantial differences exist on
how to balance competing requirements, i.e., on how to address cases in which some of the
values just mentioned are affected, but at the same time economic, administrative, political or
military advantages are also obtained.
55 Jobin et al (2019).
56 For a review of the impacts of ICTs on rights and values, see Sartor (2017), De Hert and Gutwirth (2009).
to health care (Article 35), right to access to services of general economic interest (Article 36),
consumer protection (Article 38), right to good administration (Article 41), right to an effective
remedy and to a fair trial (Article 47). Besides individual rights, social values are also at stake, such as
democracy, peace, welfare, competition, social dialogue, efficiency, advancement in science, art and
culture, cooperation, civility, and security.
Given the huge breadth of its impacts on citizens' individual and social lives, AI falls under the scope
of different sectorial legal regimes. These regimes include especially, though not exclusively, data
protection law, consumer protection law, and competition law. As has been observed by the
European Data Protection Supervisor (EDPS) in Opinion 8/18 on the legislative package 'A New Deal
for Consumers,' there is synergy between the three regimes. Consumer and data protection law
share the common goals of correcting imbalances of informational and market power, and, along
with competition law, they contribute to ensuring that people are treated fairly. Other domains of
the law are also involved in AI: labour law relative to the new forms of control over workers enabled
by AI; administrative law relative to the opportunities and risks in using AI to support administrative
decision-making; civil liability law relative to harm caused by AI-driven systems and machines;
contract law relative to the use of AI in preparing, executing and performing agreements; laws on
political propaganda and elections relative to the use of AI in political campaigns; military law on
the use of AI in armed conflicts; etc.
57 Floridi et al (2018).
58 Pasquale (2015).
59 On fiduciary obligations related to the use of AI, see Balkin (2017).
have an interest in not being misled or manipulated by AI systems, but they also have an interest in
being able to trust such systems, knowing that the controllers of those systems will not profit from
the people's exposure (possibly resulting from personal data). Reasonable trust is needed so that
individuals do not waste their limited and costly cognitive capacities in trying to fend off AI systems'
attempts to mislead and manipulate them.
Finally, citizens have an indirect interest in fair algorithmic competition, i.e., in not being subject to
market-power abuses resulting from exclusive control over masses of data and technologies. This is
of direct concern to competitors, but the lack of competition may negatively affect consumers, too,
by depriving them of valuable options and restricting their sphere of action. Moreover, the lack of
competition enables the leading companies to obtain huge financial resources, which they can use
to further increase their market power (e.g., by preventively buying potential competitors), or to
promote their interests by influencing public opinion and politics.
60 Galbraith (1983).
61 Lippi et al (2020).
62 https://claudette.eui.eu/
63 Contissa et al (2018), Lippi et al (2019).
64 Ruggeri, Pedreschi, and Turini (2010).
to protect users from manipulation and fraud, provide them with awareness of fake and
untrustworthy information, and facilitate their escape from 'filter bubbles' (the unwanted
filtering/pushing of information).
It may be worth considering how the public could support and incentivise the creation and
distribution of AI tools to the benefit of data subjects and citizens. Such tools would provide new
opportunities for research, development, and entrepreneurship. They would contribute to reducing
unfair and unlawful market behaviour and favour the development of legal and ethical business
models. Finally, citizen-empowering technologies would support the involvement of civil society in
monitoring and assessing the behaviour of public and private actors and of the technologies
deployed by the latter, encouraging active citizenship, as a complement to the regulatory and law-
enforcement activity of public bodies.
3. AI in the GDPR
In this section, the provisions of the GDPR are analysed one by one to determine the extent to which
their application is challenged by AI, as well as the extent to which they may influence the
development of AI applications.
Re-identification
The first issue concerns identifiability. AI, and more generally methods for computational
statistics, increase the identifiability of apparently anonymous data, since they enable
non-identified data (including data that have been anonymised or pseudonymised) to be connected to
the individuals concerned.
[N]umerous supposedly anonymous datasets have recently been released and
reidentified. In 2016, journalists reidentified politicians in an anonymized browsing
history dataset of 3 million German citizens, uncovering their medical information and
their sexual preferences. A few months before, the Australian Department of Health
publicly released de-identified medical records for 10% of the population only for
researchers to reidentify them 6 weeks later. Before that, studies had shown that de-
identified hospital discharge data could be reidentified using basic demographic
attributes and that diagnostic codes, year of birth, gender, and ethnicity could uniquely
identify patients in genomic studies data. Finally, researchers were able to uniquely
identify individuals in anonymized taxi trajectories in NYC, bike sharing trips in
London, subway data in Riga, and mobile phone and credit card datasets. 66
The re-identification of data subjects is usually based on statistical correlations between non-
identified data and personal data concerning the same individuals.
65 Regulation (EU) 2018/1807 of the European Parliament and of the Council of 14 November 2018 on a framework for the free flow of non-personal data in the European Union.
66 Rocher et al (2019).
Figure 13 illustrates a connection between an identified and a de-identified data set that enabled
the re-identification of the health record of the governor of Massachusetts. This result was obtained
by searching for de-identified data that matched the Governor's date of birth, ZIP code and gender.67
Another classic example is provided by the Netflix prize database case, in which anonymised movie
ratings could be re-identified by linking them to non-anonymous ratings in IMDb (Internet Movie
Database). In fact, knowing only two non-anonymous reviews by an IMDb user, it was possible to
identify the reviews by the same user in the anonymous database. Similarly, it has been shown that
an anonymous user of an online service can be re-identified by that service, if the service knows that
the user has installed four apps on his or her device, and the service has access to the whole list of
apps installed by each user. 68
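The linkage mechanism underlying these examples can be illustrated with a minimal sketch (in Python) that joins a named data set and a 'de-identified' one on quasi-identifiers such as date of birth, ZIP code and sex; all records below are invented and do not correspond to any real individual or case.

    # Invented records; no real data. A unique match on the quasi-identifiers
    # (date of birth, ZIP code, sex) re-identifies the "anonymised" record.
    voter_roll = [  # identified data set
        {"name": "J. Doe", "dob": "1945-07-31", "zip": "02138", "sex": "M"},
        {"name": "A. Roe", "dob": "1962-01-15", "zip": "02139", "sex": "F"},
    ]
    health_records = [  # de-identified data set (no names)
        {"dob": "1945-07-31", "zip": "02138", "sex": "M", "diagnosis": "hypertension"},
    ]

    def reidentify(record, identified, keys=("dob", "zip", "sex")):
        """Return the names of all identified persons whose quasi-identifiers match."""
        return [p["name"] for p in identified if all(p[k] == record[k] for k in keys)]

    for r in health_records:
        print(reidentify(r, voter_roll))  # ['J. Doe']: a unique match re-identifies the record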
Re-identification can be viewed as a specific kind of inference of personal data: through re-
identification, a personal identifier is associated with previously non-identified data items, which, as a
consequence, become personal data. Note that for an item to be linked to a person, it is not
necessary that the data subject be identified with absolute certainty; a degree of probability may be
sufficient to enable a differential treatment of the same individual (e.g., the sending of targeted
advertising).
Thanks to AI and big data, the identifiability of data subjects has vastly increased. The personal
nature of a data item is no longer a feature of that item separately considered. It has rather become
a contextual feature. As shown above, an apparently anonymous data item becomes personal in the
context of further personal data that enable re-identification. For instance, the identifiability of the
Netflix movie reviewers supervened on the availability of their named reviews on IMDb. As it has
been argued, 'in any "reasonable" setting there is a piece of information that is in itself innocent, yet
in conjunction with even a modified (noisy) version of the data yields a privacy breach.' 69
67 Sweeney (2000).
68 Achara et al (2015).
69 Dwork and Naor (2010, 93).
This possibility can be addressed in two ways, neither of which is foolproof. The first consists in
ensuring that data is de-identified in ways that make it more difficult to re-identify the data subject;
the second consists in implementing security processes and measures for the release of data that
contribute to this outcome.70
70 Rubinstein and Hartzog (2016).
71 Joined Cases C-141 & 372/12, YS, M and S v. Minister voor Immigratie, Integratie en Asiel, 2014 E.C.R. I-2081, ¶ 48.
72 Case C-434/16, Peter Nowak v. Data Protection Commissioner, 34.
73 Opinion 4/2007.
74 Opinion 216/679, adopted on 3 October 2017, revised on 6 February 2018.
75 Opinion 216/679, adopted on 3 October 2017, revised on 6 February 2018.
messages that are most likely to trigger the desired purchasing behaviour. The same model can be
extended to politics, with regard to messages that may trigger desired voting behaviour.
76 Wachter and Mittelstadt (2019).
Legal scholars have argued that data subjects should be granted a general right to 'reasonable
inference', namely, the right that any assessment or decision affecting them is obtained through
automated inferences that are reasonable, respecting both ethical and epistemic standards.
Accordingly, data subjects should be entitled to challenge the inferences (e.g. credit scores) made by
an AI system, and not only the decisions based on such inferences (e.g., the granting of loans). It has
been argued that for an inference to be reasonable it should satisfy the following criteria: 77
(a) Acceptability: the input data (the predictors) for the inference should be normatively
acceptable as a basis for inferences concerning individuals (e.g., to the exclusion of
prohibited features, such as sexual orientation);
(b) Relevance: the inferred information (the target) should be relevant to the purpose of the
decision and normatively acceptable in that connection (e.g., ethnicity should not be
inferred for the purpose of giving a loan).
(c) Reliability: both input data, including the training set, and the methods to process them
should be accurate and statistically reliable (see Section 2.3.3).
Controllers, conversely, should be prohibited from basing their assessments or decisions on unreasonable
inferences, and they should also have the obligation to demonstrate the reasonableness of their
inferences.
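By way of a hypothetical sketch (in Python) of the acceptability criterion only, a controller-side check might screen predictors against a list of features that are not normatively acceptable before an inference is drawn; the list of prohibited features and the applicant data are invented for illustration.

    # Hypothetical list of predictors deemed normatively unacceptable for this purpose.
    PROHIBITED_PREDICTORS = {"sexual_orientation", "ethnicity", "religion"}

    def acceptable_predictors(features):
        """Keep only features that may serve as a basis for inferences about the individual."""
        rejected = sorted(set(features) & PROHIBITED_PREDICTORS)
        if rejected:
            print("Excluded predictors:", rejected)
        return {k: v for k, v in features.items() if k not in PROHIBITED_PREDICTORS}

    applicant = {"income": 42000, "payment_history": "regular", "ethnicity": "not to be used"}
    print(acceptable_predictors(applicant))  # only 'income' and 'payment_history' remain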
The idea that unreasonable automated inferences should be prohibited only applies to inferences
meant to lead to assessments and decisions affecting the data subject. It should not apply to
inquiries that are motivated by merely cognitive purposes, such as those pertaining to scientific
research.
77 Wachter and Mittelstadt (2019).
78 See Cate et al (2014).
and anticipate the involved risks. Moreover, even if data subjects possessed such skills, still they
would not have the time and energy to go through the details of each privacy policy. On the other
hand, a refusal to consent may imply the impossibility of using (or limitations in the use of) services
that are important or even necessary to the data subjects.
The second criticism is that consent, when targeted on specific purposes, does not include (and
therefore precludes, when considered a necessary basis of the processing) future, often unknown,
uses of the data, even when such uses are socially beneficial. Thus, the requirement of consent can
'interfere with future benefits and hinder valuable new discoveries', as exemplified in 'myriad
examples', including 'examining health records and lab results for medical research, analysing
billions of Internet search records to map flu outbreaks and identify dangerous drug interactions,
searching financial records to detect and prevent money laundering, and tracking vehicles and
pedestrians to aid in infrastructure planning.'79
These criticisms of consent have been countered by observing that it is possible to implement the
principles of consent and purpose limitation in ways that are both meaningful to the data subject
and consistent with allowing for future beneficial uses of the data. 80
Firstly, it has been argued that notices should focus on the most important issues, and that they should
be user-friendly and direct. In particular, simple and clear information should be given on how to
opt in or opt out relative to critical processing operations, such as those involving the tracking of users or the
transmission of data to third parties. An interesting example is provided by the new California Consumer
Privacy Act, which requires companies to include in their website a link with the words 'do not sell
my data' (or a corresponding logo-button) to enable users to exclude transmission of their data to
third parties. Further opt-out or opt-in buttons could be presented to all users, to provide ways to
express their preferences relative to tracking, profiling, etc.
Secondly, the GDPR allows data that were collected for certain purposes to be processed for
further purposes, as long as the latter are compatible with the original ones (see Section
3.3.4).
In conclusion, it seems that, as we shall see in the following, the concepts of consent and purpose
limitation can be interpreted in ways that are consistent with both the protection of the data subject
and the need to enable beneficial uses of AI. However, AI and big data raise three key issues
concerning consent: specificity, granularity, and freedom.
Specificity
The first issue pertains to the specificity of consent: does consent to the processing for a certain
purpose also cover further AI-based processing, typically for data analytics and profiling? – e.g., can
data on sales be used to analyse consumer preferences and send targeted advertising? This seems
to be ruled out, since consent needs to be specific, so that it cannot extend beyond what is explicitly
indicated. However, the fact that the data subject has only consented to processing for a certain
purpose (e.g., client management) does not necessarily rule out that the data can be processed for
a further legitimate purpose (e.g., business analytics): the further processing is permissible when it
is covered by a legal basis, and it is not incompatible with the purpose for which the data were
collected.
The requirement of specificity is attenuated for scientific research as stated in Recital (33), which
allows consent to be given not only for specific research projects, but also for areas of scientific
research.
79 Cate et al (2014, 9).
80 Cavoukian (2015), Calo (2012).
It is often not possible to fully identify the purpose of personal data processing for
scientific research purposes at the time of data collection. Therefore, data subjects
should be allowed to give their consent to certain areas of scientific research when in
keeping with recognised ethical standards for scientific research. Data subjects should
have the opportunity to give their consent only to certain areas of research or parts of
research projects to the extent allowed by the intended purpose.
Granularity
The second issue pertains to the granularity of consent. For instance, is a general consent to any
kind of analytics and profiling sufficient to authorise the AI-based sending of targeted commercial
or political advertising? Recital (43) addresses granularity as follows:
Consent is presumed not to be freely given if it does not allow separate consent to be
given to different personal data processing operations despite it being appropriate in
the individual case.
This has two implications for AI applications. First, it seems that the data subject should not be
required to jointly consent to essentially different kinds of AI-based processing (e.g., to economic
and political ads). Second, the use of a service should not in principle be dependent on an
agreement to be subject to profiling practices. Consent to profiling must be separate from access to
the service. 81
Freedom
The third issue pertains to the freedom of consent: can consent to profiling be considered freely
given? This issue is addressed in Recital (42), which excludes the freedom of consent when 'the data
subject has no genuine or free choice or is unable to refuse or withdraw consent without detriment.'
According to Recital (43), consent is not free under situations of 'clear imbalance:'
In order to ensure that consent is freely given, consent should not provide a valid legal
ground for the processing of personal data in a specific case where there is a clear
imbalance between the data subject and the controller.
Situations of imbalance are prevalent in the typical contexts in which AI and data analytics are
applied to personal data. Such situations exist in the private sector, especially when a party enjoys
market dominance (as is the case for leading platforms), or a position of private power (as is the case
for employers relative to their employees). They also exist between public authorities and the
individuals who are subject to the powers of such authorities. In all these cases, consent cannot
provide a sufficient legal basis, unless it can be shown that there are no risks of 'deception,
intimidation, coercion or significant negative consequence if [the data subject] does not consent.'82
Finally, consent should be invalid when refusal or withdrawal of consent is linked to a detriment that
is unrelated to the availability of the personal data for which consent was refused (e.g., patients
are told that in order to obtain a medical treatment they must consent to their medical data being
used for purposes that are not needed for that treatment). This also applies to cases in which consent
is required by the provider of a service, even though the processing is not necessary for performing
the service.
if the performance of a contract, including the provision of a service, is dependent on
the consent despite such consent not being necessary for such performance.
81 Article 29 Working Party, Guidelines on consent under Regulation 2016/679, WP259.
82 Article 29 Working Party, Guidelines on consent under Regulation 2016/679, WP259, 7.
This typically is the case when the closing of a contract for a service is conditioned on the user's
consent to being profiled, the profiling not being needed to provide the service to the individual
user.
Transparency
The idea of transparency is specified in Recital 58, which focuses on conciseness, accessibility and
understandability.
The principle of transparency requires that any information addressed to the public or
to the data subject be concise, easily accessible and easy to understand, and that clear
and plain language and, additionally, where appropriate, visualisation be used.
As we shall clarify in what follows, this idea is related to, but distinct from, the idea of transparent and
explainable AI. In fact, the latter idea involves building a 'scientific' model of the functioning of an AI
system, rather than providing sufficient information to lay people, relative to issues that are
relevant to them.
Informational fairness
Two different concepts of fairness can be distinguished in the GDPR. The first, which we may call
'informational fairness', is strictly connected to the idea of transparency. It requires that data subjects
are not deceived or misled concerning the processing of their data, as is explicated in Recital (60):
The principles of fair and transparent processing require that the data subject be
informed of the existence of the processing operation and its purposes. The controller
should provide the data subject with any further information necessary to ensure fair
and transparent processing taking into account the specific circumstances and context
in which the personal data are processed.
The same recital explicitly requires that information is provided on profiling:
Furthermore, the data subject should be informed of the existence of profiling and the
consequences of such profiling.
Informational fairness is also linked to accountability, since it presumes that the information to be
provided makes it possible to check for compliance. Informational fairness raises specific issues in
connection with AI and big data, because of the complexity of the processing involved in AI-
applications, the uncertainty of its outcome, and the multiplicity of its purposes. The new dimension
of the principle pertains to the explicability of automated decisions, an idea that is explicitly affirmed
in the GDPR, as we shall see in the following section. Arguably, the idea of transparency as
explicability can be extended to automated inferences, even when a specific decision has not yet
been adopted.
A specific aspect of transparency in the context of machine learning concerns access to data, in
particular to the system's training set. Access to data may be needed to identify possible causes of
unfairness resulting from inadequate or biased data or training algorithms. This is particularly
important when the learned algorithmic model is opaque, so that possible flaws cannot be detected
through its inspection.
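As a minimal sketch (in Python) of the kind of check that such access makes possible, one might compare a model's error rates across the groups represented in the data; the groups, labels and predictions below are invented.

    # Invented (group, true_label, model_prediction) triples.
    examples = [
        ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
        ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 0), ("group_b", 0, 0),
    ]

    def error_rate(group):
        """Share of examples in 'group' for which the prediction differs from the true label."""
        rows = [(y, p) for g, y, p in examples if g == group]
        return sum(1 for y, p in rows if y != p) / len(rows)

    for g in ("group_a", "group_b"):
        print(g, error_rate(g))  # a large gap between groups would call for closer scrutiny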
Substantive fairness
Recital (71) points to a different dimension of fairness, i.e. what we may call substantive fairness,
which concerns the fairness of the content of an automated inference or decision, under a
combination of criteria, which may be summarised by referring to the aforementioned standards of
acceptability, relevance and reliability (see Section 3.1.2):
In order to ensure fair and transparent processing in respect of the data subject, taking
into account the specific circumstances and context in which the personal data are
processed, the controller should use appropriate mathematical or statistical procedures
for the profiling, implement technical and organisational measures appropriate to
ensure, in particular, that factors which result in inaccuracies in personal data are
corrected and the risk of errors is minimised, secure personal data in a manner that
takes account of the potential risks involved for the interests and rights of the data
subject and that prevents, inter alia, discriminatory effects on natural persons on the
basis of racial or ethnic origin, political opinion, religion or beliefs, trade union
membership, genetic or health status or sexual orientation, or that result in measures
having such an effect.
AI and repurposing
A tension exists between the use of AI and big data technologies and the purpose limitation
requirement. These technologies enable the useful reuse of personal data for new purposes that are
different from those for which the data were originally collected. For instance, data collected for the
purpose of contract management can be processed to learn consumers' preferences and send
targeted advertising; 'likes' that are meant to express and communicate one's opinion may be used
to detect psychological attitudes, political or commercial preferences, etc.
To establish whether the repurposing of data is legitimate, we need to determine whether a new
purpose is 'compatible' or 'not incompatible' with the purpose for which the data were originally
collected. According to the Article 29 WP, the relevant criteria are (a) the distance between the new
purpose and the original purpose, (b) the alignment of the new purpose with the data subjects'
expectations, the nature of the data and their impact on the data subjects' interests, and (c) the
safeguards adopted by the controller to ensure fair processing and prevent undue impacts.83
Though all these criteria are relevant to the issue of compatibility, they do not provide a definite
answer to the typical issues pertaining to the reuse of personal data in AI applications. To what
extent can the repurposing of personal data for analytics and AI be compatible with the purpose of
the original collection? Should the data subjects be informed that their data is being repurposed?
To address such issues, we need to distinguish what is at stake in the inclusion of a person's data in
a training set from the application of a trained model to a particular individual.
83 Opinion 03/2013 on purpose limitation.
With regard to the use of a person's data in a training set, it seems that since the person is not directly
affected by the use of her personal data, the distance between the new purpose and the original
purpose should not be a primary concern, nor should the data subject's expectations. However,
we need to consider the risk that the data are misused, against the interest of the data subject (the
risk is particularly serious for data on health or other sensitive conditions), as well as the possibility
of mitigating this risk through anonymisation or pseudonymisation. Adequate security measures
are also a key precondition for the legitimate use of personal data in a training set.
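A minimal sketch (in Python, using only the standard library) of pseudonymisation before data enter a training set: direct identifiers are replaced with a keyed hash, so that records can still be linked to one another but cannot be attributed to a person without the separately stored key. The record fields and the key management shown are, of course, invented for illustration.

    import hashlib
    import hmac

    # Hypothetical key: in practice it would be generated and stored separately
    # from the training data, under strict access control.
    SECRET_KEY = b"keep-this-key-away-from-the-training-data"

    def pseudonymise(record):
        """Replace direct identifiers with a keyed hash before inclusion in a training set."""
        token = hmac.new(SECRET_KEY, record["patient_id"].encode(), hashlib.sha256).hexdigest()
        cleaned = {k: v for k, v in record.items() if k not in ("patient_id", "name")}
        cleaned["pseudonym"] = token
        return cleaned

    print(pseudonymise({"patient_id": "12345", "name": "J. Doe", "age": 54, "diagnosis": "flu"}))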
Different considerations pertain to the use of personal data as input to algorithmic models that
provide inferences concerning the data subject. This case clearly falls within the domain of profiling,
as the inference directly affects the individuals concerned. Therefore, the criteria indicated by the
Article 29 WP have to be rigorously applied.
Obviously, the two uses of personal data may be connected in practice: personal data (for instance
data outlining an individual's clinical history, or the history of his or her online purchases) can be
processed to learn an algorithmic model, but they can also be used as inputs for the same or other
algorithmic models (e.g., to predict additional health issues, or further purchases).
Since the data subject is not individually affected by statistical processing, the proportionality
assessment, as far as data protection is concerned, concerns the comparison between the
(legitimate) interest in obtaining the statistical results, and the risks of the data being misused for
non-statistical purposes.
It is true that the results of statistical processing can affect the collective interests of the data subjects
who share the factors that are correlated to certain inferences (e.g., the individuals whose lifestyle
and activities are correlated to certain pathologies, certain psychological attitudes, or certain market
preferences or political views). The availability of this correlation exposes all members of the group
– as soon as their membership in the group is known – to such inferences. However, as long as the
correlation is not meant to be applied to particular individuals, on the basis of data concerning such
individuals (data determining their belonging to the group), statistical processing remains outside of
data protection. On the contrary, the information used to ascribe a person to a group and the
person's ascription to that group are personal data, and so are the consequently inferred data
concerning that person. This idea is expressed at footnote 5 of the 2017 Council of Europe
Guidelines on the protection of individuals with regard to the processing of personal data in a world
of big data:
personal data are also any information used to single out people from data sets, to take
decisions affecting them on the basis of group profiling information.
Thus, neither in the GDPR nor in the Guidelines can we yet find an explicit endorsement of group
privacy as an aspect of data protection. On the other hand, the need to take into account group privacy
has been advocated by many scholars.84 However, as we shall see in the following, a preventive risk-
management approach can contribute to the protection of group privacy also in the context of the
GDPR.
84 On the Guidelines, see Mantelero (2017).
[Personal data should be] kept in a form which permits identification of data subjects
for no longer than is necessary for the purposes for which the personal data are
processed.
Longer storage is however allowed for archiving, research, or statistical purposes.
[P]ersonal data may be stored for longer periods insofar as the personal data will be
processed solely for archiving purposes in the public interest, scientific or historical
research purposes or statistical purposes in accordance with Article 89(1) subject to
implementation of the appropriate technical and organisational measures required by
this Regulation in order to safeguard the rights and freedoms of the data subject
('storage limitation');
There is an undeniable tension between the AI-based processing of large sets of personal data and
the principle of storage limitation. This tension can be limited to the extent that the data are used
for statistical purposes, and appropriate measures are adopted at national level, as discussed above
in Section 3.2.3.
do not apply to the AI-based processing that is subsequent to or independent of such aims in the
specific case at hand.
For instance, the necessity of using personal data for performing or entering a particular contract
does not cover the subsequent use of such data for purposes of business analytics. Similarly, this
legal basis does not cover the subsequent use of contract data as input to a predictive-decisional
model concerning the data subject, even when the data are used for offering a different contract to
the same person. Assume, for instance, that the data subject's health data are necessary for
performing an insurance contract with the data subject. This necessity would not cover the use
of the same data for offering a new contract to the same data subject, unless the data subject has
requested to be considered for a new contract, i.e., unless the data are necessary 'in order to take
steps at the request of the data subject prior to entering into a contract' (Article 6(b)).
85 On legitimate interest, see Kamara and De Hert (2019).
outlaw it based on this assessment, should be adopted on the basis of a wide debate, and
according to the determination, or at least the directions, of politically responsible bodies.
The issue of the admissibility of processing personal data for new and different purposes has
become crucial in the era of AI and big data, when vast and diverse masses of data are available and
artificial intelligence or statistical methods are then deployed to discover correlations and identify
possible causal links. As noted above, this may lead to the discovery of unexpected connections
based on the combination of disparate sets of data (e.g., connections between lifestyle preferences
in social networks and health conditions, between consumer behaviour and market trends, between
internet queries and the spread of diseases, between internet likes and political preferences, etc.).
The results of these analyses (e.g., correlations discovered between consumers' data and their
preferences, spending capacities and purchasing propensities, etc.) can then be used to assess or
influence individual behaviour (e.g., by sending targeted advertisements).
Repurposing is key in the domain of big data and AI, since the construction of big data sets often
involves merging data that had been separately collected for different purposes, and processing
such data to address issues that were not contemplated at the time of collection. A key issue for the
future of the GDPR pertains to the extent to which the compatibility test will enable us to draw a
sensible distinction between admissible and inadmissible reuses of the data for the purposes of
analytics.
Recital (50) does not help us much in addressing this issue, since it seems to indicate that no legal
basis is required for compatible repurposing: 'where the processing is compatible with the purposes
for which the personal data were initially collected […] no legal basis separate from that which
allowed the collection of the personal data is required.' Moreover, Recital (50) seems to presume
that all processing for statistical purposes is admissible, by affirming that 'further processing for …
statistical purposes should be considered to be compatible lawful processing operations.' This
presumption has been limited by the Article 29 WP, which has argued that compatibility must be
checked also in the case of statistical processing.
In conclusion, it seems that two requirements are needed for repurposing to be permissible: (a) the
new processing must be compatible with the purpose for which the data were collected, and (b) the
new processing must have a legal basis (which may be, but is not necessarily, the same as that of the original
processing). Following Recital (50), it seems that statistical processing should be presumed to be
compatible, unless reasons for incompatibility appear to exist.
By applying these criteria to the AI-based reuse of data, we must distinguish whether the data are
reused for statistical purposes or rather for profiling. Reuse for a merely statistical purpose should in
general be acceptable, since it does not affect the data subject individually, and thus it should be
compatible with the original processing. If the statistical processing is directed towards a
permissible goal, such as security or market research, it can also rely on the legal basis of Article
6(1)(f), i.e., on its necessity for achieving purposes pertaining to legitimate interests.
Different would be the case for profiling. In such a case, the compatibility assessment is much more
uncertain. It should lead to a negative outcome whenever AI-based predictions or decisions may
affect the data subject in a way that negatively reverberates on the original purpose of the
processing. Consider, for instance, the case in which a person's data collected for medical purposes
are input into an algorithmic model that determines an insurance price for that person.
It has been argued that the possibility to repurpose personal data for statistical processing is very
important for the European economy, since European companies need to extract information on
markets and social trends – as US and Asian companies do – in order to be competitive. 86 The use of
personal data for merely statistical purposes should enable companies to obtain the information
86 On statistical uses and big data, see Mayer-Schonberger and Padova (2016).
they need without interfering with the data subjects' rights. In fact, as we noted above, according to
Recital (162) the processing remains statistical only as long as the result of the processing
is not personal data, but aggregate data, and that this result or the personal data are
not used in support of measures or decisions regarding any particular natural person.
[The obligation to provide information to the data subject does not apply when] the
provision of such information proves impossible or would involve a disproportionate
effort, in particular for processing for archiving purposes in the public interest, scientific
or historical research purposes or statistical purposes, subject to the conditions and
safeguards referred to in Article 89(1) or in so far as the obligation referred to in
paragraph 1 of this Article is likely to render impossible or seriously impair the
achievement of the objectives of that processing. In such cases the controller shall take
appropriate measures to protect the data subject's rights and freedoms and legitimate
interests, including making the information publicly available.
This limitation only applies when the data have not been collected from the data subject. It is hard
to understand why this is the case. In fact, the reasons that justify an exception to the information obligation when the data were not obtained from the data subject should also justify the same exception when the data were collected from him or her.
87 Floridi et al. (2018).
88 Guidotti et al. (2019).
89 Miller (2019); Mittelstadt and Wachter (2019).
from third parties, and your search and booking history) to send you promotional
messages, marketing, advertising and other information that we think may be of
interest to you.
The data subject would benefit from more precise and relevant information, especially when
important decisions are at stake. In particular, with regard to complex AI systems, the possibility of
providing modular information should be explored, i.e., providing bullet points that laypeople can
understand, with links to access more detailed information possibly covering technical aspects.
However, the information provided to the general public is unlikely to be sufficient for identifying potential problems, dysfunctions or unfairness. That would presuppose access to the algorithmic model, or at least the possibility of subjecting it to extensive testing, and, in the case of machine learning approaches, access to the system's training set.
It has been argued that it would be important to enable citizens to engage in 'black box tinkering', i.e., in a limited reverse-engineering exercise that consists in submitting test cases to a system and analysing the system's responses so as to detect faults and biases. 90 This approach, which involves a distributed and non-systematic attempt at sensitivity analysis, has the advantage of democratising controls, but it is likely to have limited success, given the complexity of AI applications and the limitations on access to them.
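As a purely illustrative Python sketch (the scoring function below is a hypothetical stand-in for an opaque, remotely hosted model, and the attributes are invented), such tinkering can amount to submitting paired test cases that differ in a single attribute and observing how the output shifts:

def score(applicant):
    # Hypothetical stand-in for an opaque scoring service that can only be queried.
    return 0.4 + 0.3 * (applicant["income"] > 40000) - 0.2 * (applicant["gender"] == "F")

base_case = {"income": 45000, "gender": "M", "age": 35}

# Vary one attribute at a time and compare the scores of the paired test cases.
for attribute, alternative in [("gender", "F"), ("income", 30000)]:
    variant = dict(base_case, **{attribute: alternative})
    delta = score(variant) - score(base_case)
    print(f"Changing {attribute} to {alternative!r} shifts the score by {delta:+.2f}")

Detecting, say, a gender effect in this way presupposes the ability to submit chosen inputs and observe the corresponding outputs, which is precisely the kind of access that is often lacking.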
90 Perel and Elkin-Koren (2017).
The scope of the right to access, or the ways of implementing it, is limited by the requirement that it

should not adversely affect the rights or freedoms of others, including trade secrets or
intellectual property and in particular the copyright protecting the software.
This limitation, however, should not entail a complete denial of the right to information:
[T]he result of these considerations should not be a refusal to provide all information to
the data subject. Where the controller processes a large quantity of information
concerning the data subject, the controller should be able to request that, before the
information is delivered, the data subject specify the information or processing
activities to which the request relates.
There has been a wide discussion on whether Article 15 should be read as granting data subjects
the right to obtain an individualised explanation of automated assessments and decisions.91
Unfortunately, the formulation of Article 15 is very ambiguous, and that ambiguity is reflected in
Recital 63. In particular it is not specified whether the obligation to provide information on the 'logic
involved' only concerns providing general information on the methods adopted in the system, or
rather specific information on how these methods were applied to the data subject (i.e., an
individual explanation, as we shall see in Section 3.6.5).
91 Wachter et al. (2016); Edwards and Veale (2019).
1. The data subject has grounds relating to his or her particular situation that support
the request.
2. The processing is based on the legal basis of Article 6(1)(e), i.e., necessity of the processing for performing a task in the public interest or in the exercise of official authority, or on the legal basis of Article 6(1)(f), i.e., necessity of the processing for the purposes of the legitimate interests pursued by the controller or by a third party.
3. The controller fails to demonstrate compelling legitimate grounds for the processing
which override the interests, rights and freedoms of the data subject.
If all these conditions are satisfied, the controller has the obligation to terminate the processing.
The right to object is particularly significant with regard to profiling, since it seems that only in very special cases will the controller have overriding compelling legitimate grounds for continuing to profile a data subject who objects to the profiling on personal grounds.
The right to object does not apply to a processing that is based on the data subject's consent, since
in this case the data subject can impede the continuation of the processing just by withdrawing
consent (according to Article 7 (3) GDPR).
The GDPR, in regulating the right to object, explicitly refers to profiling and introduces special norms concerning direct marketing and statistical processing. Such provisions are relevant to AI, given that profiling and statistics are indeed key applications of AI to personal data.
3.5.5. Article 21 (1) and (2): Objecting to profiling and direct marketing
Article 21 (1) specifies that the right to object also applies to profiling:
The data subject shall have the right to object, on grounds relating to his or her
particular situation, at any time to processing of personal data concerning him or her
which is based on point (e) or (f) of Article 6(1), including profiling based on those
provisions.
Profiling in the context of direct marketing is addressed in Article 21(2), which recognises an unconditional right to object:
Where personal data are processed for direct marketing purposes, the data subject
shall have the right to object at any time to processing of personal data concerning him
or her for such marketing, which includes profiling to the extent that it is related to such
direct marketing.
This means that the data subject does not need to invoke specific grounds when objecting to
processing for direct marketing purposes, and that such purposes cannot be 'compelling legitimate
grounds for the processing which override the interests, rights and freedoms of the data subject'.
Given the importance of profiling for marketing purposes, the unconditional right to object to such
processing is particularly significant for the self-protection of data subjects. Controllers should be
required to provide easy, intuitive and standardised ways to facilitate the exercise of this right.
cannot consist in personal data). The right to object does not apply when the processing is carried
out for reasons of public interest (it therefore applies, a contrario, when the processing is aimed at
private commercial purposes):
Where personal data are processed for scientific or historical research purposes or
statistical purposes pursuant to Article 89(1), the data subject, on grounds relating to
his or her particular situation, shall have the right to object to processing of personal
data concerning him or her, unless the processing is necessary for the performance of
a task carried out for reasons of public interest.
A further limitation is introduced by Article 17(3)(d), which limits the right to erasure when its
exercise would make it impossible or would seriously undercut the ability to achieve the objectives
of the processing for archiving, research or statistical purposes. This limitation would probably find
limited application to big data, since the exclusion of a single record from the processing would
likely have little impact on the system's training or, at any rate, on the definition of its algorithmic
model.
92 Mendoza and Bygrave (2017).
93 Article 29, WP251/2017 last revised 2018, 19.
who are responsible for the decision, deliberate on the merit of each case, and autonomously decide
whether to accept or reject the system's suggestions.94
The third condition requires that the automated processing determining the decision includes
profiling. A different interpretation could be suggested by the comma that separates 'processing' and 'including profiling' in Article 22(1), which seems to indicate that profiling is only an optional component of the kind of automated decisions that are in principle prohibited by Article 22(1). However, the first interpretation (the necessity of profiling) is confirmed by Recital (71), according to which the processing at stake in the regulation of automated decisions must include profiling:
Such processing includes 'profiling' that consists of any form of automated processing
of personal data evaluating the personal aspects relating to a natural person, in
particular to analyse or predict aspects concerning the data subject's performance at
work, economic situation, health, personal preferences or interests, reliability or
behaviour, location or movements.
The fourth condition requires that the decision
produces legal effects concerning [the data subject] or similarly significantly affects him
or her.
Recital (71) mentions the following examples of decisions having significant effects: the 'automatic
refusal of an online credit application or e-recruiting practices'.95 It has been argued that such effects
cannot be merely emotional, and that usually they are not caused by targeted advertising, unless
'advertising involves blatantly unfair discrimination in the form of web-lining and the discrimination
has non-trivial economic consequences (e.g., the data subject must pay a substantially higher price
for goods or services than other persons).' 96
Many decisions made today by AI systems fall under the scope of Article 22(1), as AI algorithms are
increasingly deployed in recruitment, lending, access to insurance, health services, social security,
education, etc. The use of AI makes it more likely that a decision will be based 'solely' on automated
processing. This is due to the fact that humans may not have access to all the information that is
used by AI systems, and may not have the ability to analyse and review the way in which this
information is used. It may be impossible, or it may take an excessive effort to carry out an effective
review – unless the system has been effectively engineered for transparency, which in some cases
may be beyond the state of the art. Thus, especially when a large-scale opaque system is deployed,
humans are likely to merely execute the automated suggestions by AI, even when they are formally
in charge. Moreover, human intervention may be prevented by the costs-and-incentives structure in place: humans are unlikely to substantially review automated decisions when the cost of engaging in the review – from an individual or an institutional perspective – exceeds the significance of the decision (from the decision-maker's perspective).
94 Article 29, WP251/2017 last revised 2018, 21-22.
95 For an analysis of legal effects and of similarly relevant effects, see Article 29, WP251/2017 last revised 2018,
96 Mendoza and Bygrave (2017, 89).
b) is authorised by Union or Member State law to which the controller is subject, and
which also lays down suitable measures to safeguard the data subject's rights and
freedoms and legitimate interests; or
c) is based on the data subject's explicit consent.
Based on the broad exception of item (a), automated decision-making is enabled in key areas such
as recruitment and lending. However, for the exception to apply, decisions based solely on
automated processing must be 'necessary.' Such necessity may depend on the high number of cases
to be examined (e.g., a very high number of applications for a job). The necessity of using AI in
decision-making may also be connected to AI capacities to outperform human judgement. In this
connection we may wonder whether human involvement will still contribute to a stronger
protection of data subjects, or whether the better performance of machines – even with regard to
the political and legal values at stake, e.g., ensuring 'fair equality of opportunity' for all applicants to
a position 97 – will make human intervention redundant or dysfunctional. Outside of the domain of
contract and legal authorisation, consent may provide a basis for automated decision-making
according to Article 22(2)(c). However, the conditions for valid consent do not always obtain, even in
cases when automated decision-making seems appropriate. Consider for instance the case in which
an NGO uses an automated method for classifying (profiling) applicants to determine their need and
consequently allocate certain benefits to them. In such a case, it is very doubtful that an applicant's
consent may be viewed as free (as not consenting would entail being excluded from the benefit),
but the system seems socially acceptable and beneficial even so.
97 Rawls ([1971] 1999, 63).
98 Article 29, WP251/2017 last revised 2018, 32.
99 Article 29, WP251/2017 last revised 2018, 17.
machine learning, this should apply not only to the data concerning the person involved in a
particular decision, but also to the data in a training set, where the biases built into the training set
may affect the learned algorithmic model, and hence the accuracy of the system's inferences.
Other measures pertain to the interaction with the data subjects, such as the right to obtain human
intervention and the right to challenge a decision. For instance, a link could be provided to 'an
appeals process at the point the automated decision is delivered to the data subject, with agreed
time scales for the review and a named contact point for any queries.'100 An appeals process is most
significant with regard to AI applications, and especially when these applications are 'opaque', i.e.,
they are unable to provide human-understandable explanations and justifications.
100 Article 29, WP251/2017 last revised 2018, 32.
101 Wachter et al. (2016).
102 Edwards and Veale (2019).
interests and resources and coordinate their activities. The same may also apply to the right to an
explanation, which is likely to remain underused by the data subjects, given that they may lack a
sufficient understanding of technologies and applicable normative standards. Moreover, even when
an explanation reveals potential defects, the data subjects may be unable to obtain a new, more
satisfactory decision.
103 Guidelines of the European Data Protection Board of 3 October 2017 on Automated individual decision-making and Profiling, p. 25.
104 Directive 2011/83/EU of the European Parliament and of the Council of 25 October 2011 on consumer rights, as amended by Directive 2019/2161/EU of the European Parliament and of the Council of 27 November 2019 amending Council Directive 93/13/EEC and Directives 98/6/EC, 2005/29/EC and 2011/83/EU of the European Parliament and of the Council as regards the better enforcement and modernisation of Union consumer protection rules.
6. specific information on what data have been collected about the data subject and used
for profiling him or her;
7. specific information on what values for the features of the data subject determined the
outcome concerning him or her;
8. specific information on what data have been inferred about the data subject;
9. specific information on the inference process through which certain values for the
features of the data subject have determined a certain outcome concerning him or her.
In this list, items (1) to (5) concern ex-ante information, to be provided before the data are collected or otherwise processed, while items (6) to (9) concern information to be provided ex post.
With regard to the ex-ante information, it is clear that the controller is required to provide the
information under (1) and (2). Information under (3) may also be required, when the adopted
technology makes a relevant difference (e.g., it may be inappropriate or lead to errors and biases).
Information under (4) should also be provided, as a minimal account of the 'logic' of the processing,
at least relative to the categories into which the input factors can be classified. This idea is explicitly
adopted in the California Consumer Privacy Act, which at Section 1798.100 (b) requires controllers
to 'inform consumers as to the categories of personal information to be collected.' We may wonder whether some information under (5) should also be provided, as an aspect of the information about the 'logic' of the processing, though it may not be easy to determine in the abstract (without reference to a specific case) the importance of a certain input factor.
With regard to the ex-post information, all data under (6) should be provided, as they are the object of the right to access. Information about (7) should also be provided, if we assume that there is a right to an individualised explanation. An individualised explanation may also require information about (8), when the intermediate conclusions reached by the system play a decisive role. Finally, information about (9) might also be provided, though information on (7) and (8) should generally be sufficient to provide adequate individualised explanations.
The information above needs to be complemented with further information in the case of decisions by public authorities, in which case a reference to the norms being applied and the powers being exercised is also needed, based on the principles governing the required justification of administrative acts.
Given the variety of ways in which automated decision-making can take place, it is hard to specify
in precise and general terms what information should be provided. What information the controller
may be reasonably required to deliver will indeed depend on the importance of the decision, on the
space of discretion that is being used, and on technological feasibility. However, it seems that data
subjects who did not obtain the decision they hoped for should be provided with the specific
information that most matters to them, namely, with the information on what values for their
features determined in their case an unfavourable outcome. The relevant causal factors could
possibly be identified by looking at the non-normal values that may explain the outcome. Consider
for instance the case of a person having an average income and an ongoing mortgage to repay, whose application for an additional mortgage is rejected. Assume both of the following hypotheticals: (a) if the person had had a much higher income, her application would have been accepted, regardless of her ongoing mortgage, and (b) if she had had no ongoing mortgage, her application would have been accepted, given her average income. Under such circumstances, we would say that it was the previous mortgage, rather than the average income, that was the key reason or cause explaining why the mortgage application was rejected, since it is what explains the departure from the standard outcome for such a case. 105
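A minimal Python sketch of this reasoning (with a hypothetical decision rule standing in for the lender's actual model) would flag as the key cause the feature whose value departs from the normal profile for such cases and whose normalisation would have flipped the outcome:

def approve(applicant):
    # Hypothetical decision rule standing in for the lender's actual model.
    return applicant["income"] >= 80000 or not applicant["ongoing_mortgage"]

applicant = {"income": 40000, "ongoing_mortgage": True}        # average income, ongoing mortgage
normal_profile = {"income": 40000, "ongoing_mortgage": False}  # the 'standard' case

assert not approve(applicant)  # the application is rejected

# A feature explains the rejection if it departs from the normal profile and
# resetting it to the normal value would have led to approval.
for feature in applicant:
    if applicant[feature] != normal_profile[feature]:
        counterfactual = dict(applicant, **{feature: normal_profile[feature]})
        if approve(counterfactual):
            print(f"Key cause of the rejection: {feature}")

On this hypothetical rule the sketch singles out the ongoing mortgage rather than the average income, mirroring the abnormality-based explanation described above.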
105 On the connection between causal explanations and (ab)normality, see Halpern and Hitchcock (2013).
106 Edwards and Veale (2019).
107 Edwards and Veale (2019).
108 Citron (2008).
109 Citron and Pasquale (2014).
The GDPR also contains a number of provisions that contribute to preventing the misuse of AI, in
particular, in connection with the idea of 'privacy by design and by default', namely, with preventive
technological and organisational measures.110
A serious issue pertaining to risk-prevention and mitigation measures concerns whether the same measures should be required of all controllers engaging in similar processing, or whether a differentiated approach is needed, one that takes into account the size of controllers and their financial and technical capacity to adopt the most effective precautions. More precisely, should the same standards be applied both to the Internet giants, which have huge assets and powerful technologies and profit from monopolistic rents, and to small start-ups, which are trying to develop innovative solutions with scanty resources? Possibly a solution to this issue can be found by considering that risk-prevention and mitigation measures are the object of best-effort obligations, whose stringency is scalable, depending not only on the seriousness of the risk, but also on the capacity of the addressee of the obligation. Thus, more stringent risk-prevention measures may be required to the extent that the controller both causes a more serious social risk, by processing a larger quantity of personal data on a larger set of individuals, and has a superior ability to respond to the risk in effective and financially sustainable ways.
110 Edwards and Veale (2019).
111 AI Now (2018) report.
112 Edwards and Veale (2019, 80).
113 Edwards and Veale (2019, 80).
114 Mayer-Schönberger and Padova (2016).
personal data are not used in support of measures or decisions regarding any particular
natural person.
As emerges from this characterisation, the meaning of statistical purpose in the GDPR is not narrowly defined and may be construed as including not only uses in the public interest, but also uses by private companies for commercial goals.115
3.8.2. Article 5(1)(b) GDPR: Repurposing for research and statistical processing
According to Article 5(1)(b) repurposing data for statistical purposes is in principle admissible, as it
will 'not be considered to be incompatible with the initial purposes.' Similarly, under Article 5(1)(e), data retention limits are relaxed with regard to processing for research and statistical purposes. However, processing for research and statistical purposes requires appropriate safeguards, including in particular pseudonymisation. On the other hand, EU or national law may provide for derogations from the data subjects' rights, when needed to achieve scientific or statistical purposes.
115 Mayer-Schönberger and Padova (2016, 326-7).
GDPR is making small, but noteworthy steps towards enabling Big Data in Europe. It is
a peculiar kind of Big Data, though, that European policymakers are facilitating: one that
emphasizes reuse and permits some retention of personal data, but that at the same
time remains very cautious when collecting data.116
The facilitations for scientific and statistical processing, however, may extend beyond reuse and retention: these kinds of processing may also be justified by legitimate interests according to Article 6(1)(f), as long as the processing is done in such a way as to duly respect the data subjects' data protection interests, including their interests in not being subject to risks because of unauthorised uses of their data.
A difficult issue concerns whether access to the data sets of personal information supporting
statistical inferences (e.g., to predict consumer preferences, or market trends) should be limited to
the companies or public bodies who have collected the data. On the one hand, allowing, or even
requiring, that the original controllers do not make the data accessible to third parties, may affect
competition and prevent beneficial uses of the data. On the other hand, requiring the original
controllers to make their data sets available to third parties would cause additional data protection
risks.
116 Mayer-Schönberger and Padova (2016, 331).
4.2.2. Profiling
Profiling is at the core of the application of AI to personal data: it consists in inferring new personal
data (expanding a person's profile) on the basis of the available personal data. Profiling provides the
necessary precondition for automated decision-making, as specifically regulated in the GDPR. A key
issue is the extent to which the law may govern and constrain such inferences, and the extent of the
data subject's rights in relation to them. This aspect is also not clearly worked out in the GDPR.
Nor is it clear to what extent the data subject may have a right to reasonable automated inferences, even when these inferences provide a basis for making assessments or decisions.
4.2.3. Consent
The requirements of specificity, granularity and freedom of consent are difficult to realise in
connection with AI applications. Thus, in general, consent will be insufficient to support an AI
application, unless it appears that the application pursues a legitimate interest and does not unduly
sacrifice the data subject's rights and interests under Article 6 (1)(f). There are, however, cases in
which consent by the data subject would be the decisive criterion by which to determine whether
his or her interests have been sufficiently taken into consideration by the controller (e.g., consent to
profiling in the interest of the data subject).
There is uncertainty as to what is meant by the logic and consequences of an automated decision.
With regard to complex AI processing, there is a conflict between the need for the information to be
concise and understandable on the one hand, and the need for it to be precise and in-depth on the
other.
Ex-post information is addressed by Article 15(1), which reiterates the information requirements of Articles 13 and 14. It remains to be determined whether the controller is required
to provide the data subject with only general information or also with an individualised explanation.
over the representativeness of training sets, over the reasonableness of the inferences (including
the logical and statistical methods adopted) and over the absence of unfairness and discrimination.
Appropriate security measures, such as encryption or pseudonymisation, should also prevent
unauthorised uses of the data (Article 32 (1)). High risk processing operations are subject to
mandatory data protection impact assessment (Article 35(1)), a requirement that applies in particular to
the 'systematic and extensive evaluation of personal aspects' for the purpose of automated
decision-making including profiling (Article 35 (3)(a)). Article 37 requires that a data protection
officer be designated when a 'regular and systematic monitoring of data subjects on a large scale' is
envisaged. Articles 40-43, on codes of conduct and certification, although not specifically
addressing AI, identify procedures for anticipating and countering risks, and incentivise the
adoption of preventive measures that are highly significant to AI.
117 Zarsky (2017); Hildebrandt (2015).
The requirement that consent be specific and purpose limitation be respected should be linked to
a flexible application of the idea of compatibility, which allows for the reuse of personal data when
this is not incompatible with the purpose for which the data were collected. As noted above, the
legal basis laid down in Article 6(1)(f), namely, that the processing should serve a legitimate interest
that is not outweighed by the interests of the data subjects, in combination with a compatibility
assessment of the new uses, may provide sufficient grounds on which to make reuse permissible.
Moreover, as noted above, reuse for statistical purposes is assumed to be compatible, and thus
would in general be admissible (unless it involves unacceptable risks for the data subject).
Even the principle of data minimisation can be understood in such a way as to enable a beneficial application of AI. This may involve, in some contexts, reducing the 'personality' of the data, namely the ease with which they can be connected to the individuals concerned, through measures such as pseudonymisation, rather than focusing on the amount of personal data to be preserved. This also applies to re-identification: the possibility of re-identification should not exclude the processing of data which can be re-identified; rather, re-identification should be viewed as the creation of new personal data, which is subject to all applicable rules, is strictly prohibited unless all conditions for the lawful collection of personal data are met, and is also subject to the compatibility test.
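As a minimal Python sketch of what reducing the 'personality' of data can mean in practice (assuming a keyed hash, with a hypothetical key held separately from the data set), direct identifiers can be replaced by pseudonyms, so that records remain linkable for analysis and model training, while re-identification requires access to the key:

import hashlib
import hmac

# Hypothetical secret, to be stored separately from the pseudonymised data set.
PSEUDONYM_KEY = b"controller-held-secret"

def pseudonymise(identifier: str) -> str:
    # Keyed hash: the same identifier always maps to the same pseudonym, so records
    # can still be linked, but the mapping cannot be reversed without the key.
    return hmac.new(PSEUDONYM_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

record = {"email": "jane.doe@example.com", "postcode": "1010", "purchases": 7}
pseudonymised = {**record, "email": pseudonymise(record["email"])}
print(pseudonymised)

Re-identification would then consist in mapping a pseudonym back to the individual concerned, which, as argued above, should be treated as the creation of new personal data.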
The information requirements established by the GDPR can also be met with regard to AI-based
processing, even though the complexity of AI systems represents a difficult challenge. The
information concerning AI-based applications should enable the data subjects to understand the
purpose of the processing and its limits, without going into technical details.
The GDPR allows for inferences based on personal data, including profiling, but only under certain
conditions and so long as the appropriate safeguards are adopted.
The GDPR does not exclude automated decision-making, as it provides for ample exceptions –
contract, law or consent – to the general prohibition set forth in Article 22(1). Uncertainties exist
concerning the extent to which an individual explanation should be provided to the data subject.
Uncertainties also exist about the extent to which reasonableness criteria may apply to automated
decisions.
The GDPR provisions on preventive measures, and in particular those concerning privacy by design and by default, should also not hinder the development of AI applications, if correctly designed and
implemented, although they may entail some additional costs.
Finally, the possibility of using the data for statistical purposes – with appropriate security measures,
proportionate to the risks, which should include at least pseudonymisation – opens wide spaces for
the processing of personal data in ways that do not involve the inference of personal data.
automated decision-making (Article 22 (2)), and the appropriateness of the technical and
organisational measures for data protection by design and by default (Article 25).
In various cases, the interpretation of undefined GDPR standards requires balancing competing
interests: it requires determining whether a certain processing activity and the measures adopted are justified on balance, i.e., whether the controller's interests in processing the data and
in (not) adopting certain measures are outweighed by the data subjects' interests in not being
subject to the processing or in being protected by additional or stricter measures. These
assessments depend on both (a) uncertain normative judgements on the comparative importance
of the impacts on the interests at stake and (b) uncertain forecasts concerning potential future risks.
In the case of AI and big data applications, the uncertainties involved in applying indeterminate concepts and balancing competing interests are aggravated by the novelty of the technologies, their complexity, and the broad scope of their individual and social effects.
It is true that the principles of risk-prevention and accountability potentially direct the processing of
personal data toward being a 'positive sum' game (where the advantages of the processing, when
constrained by appropriate risk-mitigation measures, outweigh its possible disadvantages), and
enable experimentation and learning, avoiding the over- and under-inclusiveness issues involved in
the applications of strict rules. On the other hand, by requiring controllers to apply these principles,
the GDPR offloads the task of establishing how to manage risk and find optimal solutions onto
controllers, a task which may be both challenging and costly. The stiff penalties for non-compliance,
when combined with the uncertainty as to what is required for compliance, may constitute a novel
risk, which, rather than incentivising the adoption of adequate compliance measures, may prevent
small companies from engaging in new ventures.
No easy solution is available in the hyper-complex and rapidly evolving domain of AI technologies:
rules may fail to enable opportunities and counter risks, but the private implementation of open
standards, in the absence of adequate legal guidance, may also be unsatisfactory:
[Giving] appropriate content to the law often requires effort, whether in analysing a
problem, resolving value conflicts, or acquiring empirical knowledge. […] [I]ndividuals
contemplating behavior that may be subject to the law will find it more costly to comply
with standards, because it generally is more difficult to predict the outcome of a future
inquiry (by the adjudicator, into the law's content) than to examine the result of a past
inquiry. They must either spend more to be guided properly or act without as much
guidance as under rules. 118
Thus, the way in which the GDPR will affect successful applications of AI and big data in Europe will
also depend on what guidance data protection bodies – and more generally the legal system – will
be able to provide to controllers and data subjects. This would diminish the cost of legal uncertainty
and would direct companies – in particular small ones, which most need advice – to efficient and
data protection-compliant solutions. Appropriate mechanisms may need to be devised, such as an
obligation to notify data protection authorities when new applications based on profiling are
introduced, but also the possibility of asking for preventive, non-binding indications on whether and
how such applications should be developed, and with what safeguards.
118 Kaplow (1992, 621).
authorities needs to be complemented by the support of civil society. As collective interests, power
relations, and societal arrangements are at stake, a broad public debate and the involvement of
representative institutions are also needed.
Collective enforcement is also a key issue that is not answered by the GDPR, which still relies on
individual action by the concerned data subjects. An important improvement toward effective
protection could consist in enabling collective actions for injunctions and compensation. It has
indeed been observed that US courts have been unable so far to deal satisfactorily with privacy
harms, since on the one hand they rely on old-fashioned theories requiring compensable harms to
be concrete, actual and directly caused by the defendant, and on the other hand they are unable to
address a very high number of similar claims, each having a small monetary value. 119 In Europe, data
protection authorities can provide an alternative and easier avenue to enforcement, but
nevertheless, the damaged parties have to rely on the judiciary to obtain compensation for privacy harms, which also include non-material harm (Article 82). Thus, effective protection is dependent
on the data subject's ability to engage in lawsuits. The possibility for multiple data subjects to merge
similar claims, so as to share costs and engage more effectively with the law, is necessary to make legal
remedies available to data subjects.
The Court of Justice has recently denied that a consumer can combine his or her individual data
protection claim with claims concerning other consumers involved in similar cases. 120 In particular,
it has affirmed that Max Schrems could exercise, in the courts of his domicile, only his individual
claim against Facebook for data protection violations. He could not bring, before the same court,
claims for similar violations that had been assigned to him by other data subjects. Perhaps the
proposed directive on collective redress for consumers,121 currently under interinstitutional
negotiation 122, could present an opportunity to enable collective actions in the context of data
protection.
119 Cohen (2019, Ch. 5).
120 Judgment in Case C-498/16 Maximilian Schrems v Facebook Ireland Limited, of 25 January 2018.
121 Proposal for a directive of the European Parliament and of the Council on representative actions for the protection of the collective interests of consumers, COM(2018) 184 final.
122 See European Parliament Legislative train schedule, Area of Justice and Fundamental Rights, Representative actions for the protection of the collective interests of consumers - a New deal for consumers, at https://www.europarl.europa.eu/legislative-train/theme-area-of-justice-and-fundamental-rights/file-representative-actions-for-consumers
• That said, a number of AI-related data protection issues are not explicitly answered
in the GDPR, which may lead to uncertainties and costs, and may needlessly hamper
the development of AI applications.
• Controllers and data subjects should be provided with guidance on how AI can be
applied to personal data consistently with the GDPR, and on the available
technologies for doing so. This can prevent costs linked to legal uncertainty, while
enhancing compliance.
• Providing adequate guidance requires a multilevel approach, which involves civil
society, representative bodies, specialised agencies, and all stakeholders.
• A broad debate is needed, involving not only political and administrative authorities,
but also civil society and academia. This debate needs to address the issues of
determining what standards should apply to AI processing of personal data,
particularly to ensure the acceptability, fairness and reasonableness of decisions on
individuals.
• The political debate should also address what applications are to be barred
unconditionally, and which may instead be admitted only under specific
circumstances. Legally binding rules are needed to this effect, since the GDPR is
focused on individual entitlements and does not take the broader social impacts of
mass processing into account.
• Discussion of a large set of realistic examples is needed to clarify which AI applications
are on balance socially acceptable, under what circumstances and with what
constraints. The debate on AI can also provide an opportunity to reconsider in depth,
more precisely and concretely, some basic ideas of European law and ethics, such as
acceptable and practicable ideas of fairness and non-discrimination.
• Political authorities, such as the European Parliament, the European Commission and
the Council could provide general open-ended soft law indications about the values
at stake and ways to achieve them.
• Data protection authorities, and in particular the Data Protection Board, should
provide controllers with guidance on the many issues for which no precise answer can
be found in the GDPR, which could also take the form of soft law instruments
designed with a dual legal and technical competence.
• National Data Protection Authorities should also provide guidance, in particular when
contacted for advice by controllers, or in response to data subjects' queries.
• The fundamental data protection principles – especially purpose limitation and
minimisation – should be interpreted in such a way that they do not exclude the use
of personal data for machine learning purposes. They should not preclude forming
training sets and building algorithmic models, whenever the resulting AI systems are
socially beneficial, and compliant with data protection rights.
• The use of personal data in a training set, for the purpose of learning general
correlations and connections, should be distinguished from their use for individual
profiling, which is about making assessments of individuals.
• The inference of new personal data, as is done in profiling, should be considered as
creation of new personal data, when providing an input for making assessments and
decisions. The same should apply to the re-identification of anonymous or
5. References
AI-HLEG, High-Level Expert Group on Artificial Intelligence (2019). A definition of AI: Main capabilities and
scientific disciplines.
AI-HLEG, High-Level Expert Group on Artificial Intelligence (2019). Ethics guidelines for trustworthy AI.
Ashley, K. D. (2017). Artificial Intelligence and Legal Analytics. Cambridge University Press.
Balkin, J. M. (2008). The constitution in the national surveillance state. Minnesota Law Review 93, 1–
25.
Balkin, J. M. (2017). The three laws of robotics in the age of big data. Ohio State Journal Law Journal 78,
1217–241.
Barocas, S. and A. D. Selbst (2016). Big data's disparate impact. California Law Review 104, 671–732.
Bayer, J., Bitiukova, N., Bard, P., Szakacs, J., Alemanno, A., and Uszkiewicz, E. (2019). Disinformation and
propaganda – impact on the functioning of the rule of law in the EU and its member states. Study, Policy
Department for Citizens' Rights and Constitutional Affairs, European Parliament.
Bhuta, N., S. Beck, R. Geiss, C. Kress, and H. Y. Liu (2015). Autonomous Weapons Systems: Law, Ethics, Policy. Cambridge University Press.
Bostrom, N. (2014). Superintelligence. Oxford University Press.
Bosco, F., Creemers, N., Ferraris, V., Guagnin, D., & Koops, B. J. (2015). Profiling technologies and
fundamental rights and values: regulatory challenges and perspectives from European Data Protection
Authorities. In Reforming European data protection law (pp. 3-33). Springer, Dordrecht.
Brynjolfsson, E. and A. McAfee (2011). Race Against the Machine. Digital Frontier Press.
Burr, C. and Cristianini, N. (2019). Can machines read our minds? Minds and Machines 29:461–494.
Calo, M. R. (2012). Against notice skepticism in privacy (and elsewhere). Notre Dame Law Review, 87:1027–
72.
Cate, F. H., P. Cullen, and V. Mayer-Schönberger (2014). Data Protection Principles for the 21st Century:
Revising the 1980 OECD Guidelines. Oxford Internet Institute.
Cath, C., Wachter, S., Mittelstadt, B., Taddeo, M., and Floridi, L. (2018). Artificial intelligence and the 'good
society': the US, EU, and UK approach. Science and Engineering Ethics 24:505–528.
Cohen, J. D. (2019). Between Truth and Power. The Legal Constructions of Informational Capitalism. Oxford
University Press.
Cristianini, N. (2016a, 23 November). Intelligence rethought: AIs know us, but don't think like us. New
Scientist .
Cristianini, N. (2016b, 26 October). The road to artificial intelligence: A case of data over theory. New
Scientist.
Cristianini, N. and T. Scantamburlo (2019). On social machines for algorithmic regulation. AI and
Society.
De Hert, P. and Gutwirth, S. (2009). Data protection in the case law of Strasbourg and Luxemburg:
Constitutionalisation in action. In Gutwirth, S., Poullet, Y., De Hert, P., de Terwangne, C., and Nouwt,
S., editors, Reinventing Data Protection? 3–44. Springer.
Edwards, L. and Veale, M. (2019). Slave to the algorithm? Why a 'right to an explanation' is probably not
the remedy you are looking for. Duke Law and Technology Review, 16-84.
Floridi, L., J. Cowls, M. Beltrametti, R. Chatila, P. Chazerand, V. Dignum, C. Luetge, R. Madelin, U. Pagallo, F. Rossi, B. Schafer, P. Valcke, and E. Vayena (2018). AI4People – an ethical framework for a good AI society: Opportunities, risks, principles, and recommendations. Minds and Machines 28, 689–707.
Galbraith, J. K. ([1952]1956). American Capitalism: The Concept of Countervailing Power. Houghton
Mifflin.
Guidotti, R., A. Monreale, F. Turini, D. Pedreschi, and F. Giannotti (2018). A survey of methods for explaining black box models. ACM Computing Surveys 51(5), Article 93, 1–42.
Halpern, J. Y. and Hitchcock, C. (2013). Graded causation and defaults. The British Journal for the Philosophy
of Science, 1–45.
Harel, D. and Y. Feldman (2004). Algorithmics: The Spirit of Computing. Addison-Wesley.
Hildebrandt, M. (2009). Profiling and AML. In Rannenberg, K., Royer, D., and Deuker, A., editors, The Future
of Identity in the Information Society. Challenges and Opportunities. Springer.
Hildebrandt, M. (2014). Location data, purpose binding and contextual integrity: What's the message? In
Floridi, L., editor, The protection of information and the right to privacy, 31–62. Springer.
Hildebrandt, M. (2015). Smart Technologies and the End(s) of Law: Novel Entanglements of Law and Technology. Edward Elgar.
Jobin, A., Ienca, M., and Vayena, E. (2019). Artificial intelligence: the global landscape of ethics guidelines.
Nature Machine Intelligence, 1: 389–399.
Kahneman, D. (2011). Thinking: fast and slow. Allen Lane.
Kamara, I. and De Hert, P. (2019). Understanding the balancing act behind the legitimate interest of the
controller ground: A pragmatic approach. In Seligner, E., Polonetsky, J., and Tene, O., editors, The
Cambridge Handbook of Consumer Privacy. Cambridge University Press.
Kaplow, L. (1992). Rules versus standards: An economic analysis. Duke Law Journal 42, 557–629.
Kleinberg, J., J. Ludwig, S. Mullainathan, and C. R. Sunstein (2018). Discrimination in the age of algorithms. Journal of Legal Analysis 10, 113–174.
Kurzweil, R. (1990). The Age of Intelligent Machines. MIT Press.
Kurzweil, R. (2012). How to Create a Mind. Viking.
Licklider, J. C. R. (1960). Man-computer symbiosis. IRE Transactions on Human Factors in Electronics
HFE-1 (March), 4–11.
Lippi, M., P. Palka, G. Contissa, F. Lagioia, H.-W. Micklitz, Y. Panagis, G. Sartor, and P. Torroni (2019).
Claudette: an automated detector of potentially unfair clauses in online terms of service. Artificial
Intelligence and Law.
Lippi, M., Contissa, G., Jablonowska, A., Lagioia, F., Micklitz, H.-W., Palka, P., Sartor, G., and Torroni, P. (2020). The force awakens: Artificial intelligence for consumer law. Journal of Artificial Intelligence Research 67, 169–190.
Mantelero, A. (2017). Regulating Big Data. The guidelines of the Council of Europe in the context of
the European data protection framework. Computer Law and Security Review 33, 584–602.
Mayer-Schönberger, V. and K. Cukier (2013). Big Data. Harcourt.
Mayer-Schönberger, V. and Y. Padova (2016). Regime change? Enabling Big Data through Europe's new data protection regulation. Columbia Science and Technology Law Review 17, 315–35.
McAfee, A. and E. Brynjolfsson (2019). Machine, Platform, Crowd. Norton.
Marcus, G. and Davis, E. (2019). Rebooting AI: building artificial intelligence we can trust. Pantheon Books.
Mindell, D. A. (2015). Our Robots, Ourselves: Robotics and the Myths of Autonomy. Penguin.
Nilsson, N. (2010). The Quest for Artificial Intelligence. Cambridge University Press.
O'Neil, C. (2016). Weapons of math destruction: how big data increases inequality and threatens democracy. Crown Business.
Pariser, E. (2011). The Filter Bubble. Penguin.
Parkin, S. (14 June 2015). Science fiction no more? Channel 4's Humans and our rogue AI obsessions. The Guardian.
Pasquale, F. (2019). The second wave of algorithmic accountability. Law and Political Economy.
Pentland, A. (2015). Social Physics: How Social Networks Can Make Us Smarter. Penguin.
Polanyi, K. ([1944] 2001). The Great Transformation. Beacon Press.
Powles, J. and Nissenbaum, H. (2018). The seductive diversion of 'solving' bias in artificial intelligence.
Medium.
Prakken, H. and G. Sartor (2015). Law and logic: A review from an argumentation perspective. Artificial
Intelligence 227, 214–45.
Rawls, J. ([1971] 1999). A Theory of Justice. Oxford University Press.
Ruggeri, S., D. Pedreschi, and F. Turini (2010). Integrating induction and deduction for finding
evidence of discrimination. Artificial Intelligence and Law 18, 1–43.
Russell, S. J. and P. Norvig (2016). Artificial Intelligence. A Modern Approach (3 ed.). Prentice Hall.
Sartor, G. (2017). Human rights and information technologies. In R. Brownsword, E. Scotford, and K.
Yeung (Eds.), The Oxford Handbook on the Law and Regulation of Technology, pp. 424–450. Oxford
University Press.
Stiglitz, J. (2019). People, Power, and Profits. Progressive Capitalism for an Age of Discontent. Norton.
Sunstein, C. R. (2007). Republic.com 2.0. Princeton University Press.
Turing, A. M. ([1951] 1996). Intelligent machinery, a heretical theory. Philosophia Mathematica 4, 256–
60.
van Harmelen, F., V. Lifschitz, and B. Porter (2008). Handbook of Knowledge Representation. Elsevier.
Varian, H. R. (2010). Computer mediated transactions. American Economic Review 100(2), 1–10.
Varian, H. R. (2014). Beyond Big Data. Business Economics (49), 27–31.
Wachter, S. and B. Mittelstadt (2017). A right to reasonable inferences: Re-thinking data protection
law in the age of Big Data and AI. Columbia Business Law Review, 1–130.
Wachter, S., B. Mittelstadt, and L. Floridi (2016). Why a right to explanation of automated decision-
making does not exist in the General Data Protection Regulation. International Data Privacy Law 7,
76–99.
Yeung, K. (2018). 'Hypernudge': Big data as a mode of regulation by design. Information, Communication and Society 20, 118–36.
Zarsky, T. Z. (2017). Incompatible: The GDPR in the age of Big Data. Seton Hall Law Review, 47:995–1020.
Zuboff, S. (2019). The Age of Surveillance Capitalism. Hachette.
This study addresses the relation between the EU
General Data Protection Regulation (GDPR) and artificial
intelligence (AI). It considers challenges and
opportunities for individuals and society, and the ways
in which risks can be countered and opportunities
enabled through law and technology.
The study discusses the tensions and proximities
between AI and data protection principles, such as in
particular purpose limitation and data minimisation. It
makes a thorough analysis of automated decision-
making, considering the extent to which it is admissible,
the safeguard measures to be adopted, and whether
data subjects have a right to individual explanations.
The study then considers the extent to which the GDPR
provides for a preventive risk-based approach, focused
on data protection by design and by default.