Practical Privacy-Preserving Protocols for Criminal
Investigations
Florian Kerschbaum, Andreas Schaad, Debmalya Biswas
SAP Research
Karlsruhe, Germany
Email: firstname.lastname@sap.com
Abstract—Social Network Analysis (SNA) is now a commonly
used tool in criminal investigations, but evidence gathering and
analysis are often restricted by data privacy laws. We consider
the case where multiple investigators want to collaborate but
do not yet have sufficient evidence that justifies a plaintext
data exchange. We propose a practical solution that allows
an investigator to expand his current view without actually
exchanging sensitive private information. The investigator gets
a partially anonymized view of the entire social network, while
preserving his known view.
I. INTRODUCTION
In federated states or organizations of states, such as the
European Union or the United States, a common approach
to organized crime is necessary. For this purpose, federal
law enforcement agencies, such as Europol or the FBI, have
been established. Nevertheless, data privacy laws or simply
data governance concerns restrict supplying institutions from
sharing their data unless there is hard corroborating evidence
on a case and subject under investigation. In particular, in
the European Union [1], data privacy is regarded as a high
social and political value, and the dilemma of how to generate
evidence without violating privacy laws is evident.
A common tool for the criminal investigator is social
network analysis. It graphically depicts the suspects and their
connections to other people or artifacts, such as telephone
numbers or bank accounts, and allows the computation of
certain metrics. Not all the facts composing the entire picture
of a case may be known to one investigator. In particular, in
pan-European organized crime, local police forces may only
be aware of a partial view of the picture, as the case studies
in the framework of the R4eGov project suggest [2].
This necessitates data exchange between the institutions, but
European data privacy laws prohibit data exchange without
reasonable cause or in excessive amounts. Therefore, we
propose a solution where a local investigator can track his
subjects or visualize the entire network, but without revealing
sensitive or private details. This allows the investigator to still
use SNA and profit from its achievements without breaking
individual privacy rights or guidelines of other institutions.
Privacy-Preserving SNA has been suggested in the literature
before, but we found the solutions to be insufficient for the
requirements of our scenario. In [7], a fully anonymized version
of the social network is computed. The investigator can then
no longer track his suspect and cannot gain additional
information or collect evidence about him.
We propose the protocol “Compute Entire Network” that
computes the entire social network from distributed sources.
The protocol does not reveal any personally identifying information. However, the protocol does allow the investigators to
keep track of their own inputs, as well as the sources of inputs
shared by other investigators. Given this, the investigators
can perform analyses on their subjects, requesting additional
information from the other investigators as deemed necessary
by their analyses.
The remainder of the paper is structured as follows: The next
section reviews related work. This is followed by a description
of the protocol in Section III, which is divided into
building blocks, protocol description, and analysis. The final
section presents the conclusions of this work.
II. RELATED WORKS
SNA has been used for criminal investigations for a long
time [4], [5], [6]. Recent research [6] suggests using graphical
tools and investigates the impact of SNA. We can conclude
that SNA is a widely accepted tool in criminal investigations.
Privacy-Preserving SNA was first proposed by Frikken and Golle [7]. They
compute an anonymized graph of the social network, such
that no one should be able to track their position in the
graph. They allow for certain modifications of the correctness
of the anonymized graph in order to prevent tracking of
one’s position. For example, they may bound the number of incoming
connections or apply similar restrictions. While this provides
strong privacy guarantees, it does not match the requirements
of our scenario. An investigator intends to add
information to his present view of the social network. It is
therefore unacceptable to anonymize his view; instead, the goal
is to augment it with additional information about the entire
network.
In a previous paper, Kerschbaum and Schaad [14]
showed how certain association metrics, such as betweenness
and closeness, can be computed without revealing personally
identifying information and without revealing the entire social
network. In this work, we show how to compute the entire
social network in an anonymous fashion.
Privacy-Preserving SNA can be seen as a special case of
secure multi-party computation (SMC), which can compute any
distributed function privately. SMC was suggested in [8]
for the two-party case. The first multi-party solutions were
suggested in [9] for the computational setting and in [12]
for the information-theoretic setting. Efficient constructions have
been identified for different secret sharing schemes [10], [11].
Nevertheless, as stated in [7], a straightforward application of
these techniques would result in an impractical protocol.
III. VISUALIZING AND ANALYZING THE ENTIRE NETWORK
A. Building Blocks
A social network can be represented as a graph $G = (V, E)$
consisting of a set $V$ of (connected) vertices and a set $E$ of
edges between the vertices. Each vertex $v \in V$ represents
a person or other artifact, such as a telephone number or
company, and is associated with some personally identifying,
unique information (such as the name or the telephone number).
An edge $e \in E$ connects a pair of vertices $v_1$ and $v_2$ and
is undirected, in line with our application.
The graph $G$ may be distributed among $n$ data sources, such
that each party $X_i$ holds a part of the graph $G_i = (V_i, E_i)$.
The combination of the data sources results in the entire graph
$$V = \bigcup_{i=1}^{n} V_i \qquad E = \bigcup_{i=1}^{n} E_i$$
The parts may overlap, such that the intersection
may not be the empty set.
$$\bigcap_{i=1}^{n} V_i \neq \emptyset \qquad \bigcap_{i=1}^{n} E_i \neq \emptyset$$
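The distributed graph described above can be illustrated with plain sets. The following is a minimal sketch, assuming each local part is given as a pair of Python sets; the helper names `edge`, `combine`, and `overlap` are hypothetical and not part of the paper.

```python
def edge(v, w):
    """Undirected edge: the order of the two endpoints does not matter."""
    return frozenset((v, w))

def combine(parts):
    """Union the local parts (V_i, E_i) into the entire graph (V, E)."""
    vertices, edges = set(), set()
    for v_i, e_i in parts:
        vertices |= v_i
        edges |= e_i
    return vertices, edges

def overlap(parts):
    """Vertices and edges known to every data source (may be empty)."""
    common_vertices = set.intersection(*(v_i for v_i, _ in parts))
    common_edges = set.intersection(*(e_i for _, e_i in parts))
    return common_vertices, common_edges

# Two overlapping local views (made-up identifiers).
g1 = ({"a", "b", "c"}, {edge("a", "b"), edge("b", "c")})
g2 = ({"b", "c", "d"}, {edge("c", "b"), edge("c", "d")})
V, E = combine([g1, g2])
```

Representing an edge as a `frozenset` makes `edge("b", "c")` and `edge("c", "b")` equal, which matches the undirected setting.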
In the protocols, we use a commutative encryption scheme.
In a commutative encryption scheme the order of encryption
(with different keys) does not matter. We denote the encryption
with Alice’s key as $E_A()$ and with Bob’s key as $E_B()$. Then,
in a commutative encryption scheme, it holds that
$$E_A(E_B(x)) = E_B(E_A(x))$$
Because we compare ciphertexts, the encryption scheme must be
deterministic and therefore cannot be semantically secure, but it
may be a secret-key scheme. A candidate
encryption scheme with all these properties is Pohlig-Hellman
encryption [13].
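As an illustration of the commutativity property, the following is a minimal sketch of exponentiation-based encryption in the style of Pohlig-Hellman, where a value is encrypted as x^k mod p. The modulus, the hash-based encoding of identifiers, and all function names are assumptions made for this example; the parameters are toy-sized and not suitable for real use.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1 (an assumption for illustration;
# a real deployment would need a much larger, carefully chosen prime).
P = 2**127 - 1

def keygen():
    """Pick a secret exponent that is invertible modulo P - 1."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # k in [2, P - 2]
        if math.gcd(k, P - 1) == 1:
            return k

def encode(identifier):
    """Hypothetical encoding: hash an identifier string into [1, P - 1]."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (P - 1)

def encrypt(key, x):
    """Deterministic encryption: x^key mod P."""
    return pow(x, key, P)

def decrypt(key, c):
    """Invert encrypt() using the inverse exponent modulo P - 1."""
    return pow(c, pow(key, -1, P - 1), P)

alice, bob = keygen(), keygen()
x = encode("+49 721 555 0100")  # made-up telephone number
both_ab = encrypt(alice, encrypt(bob, x))
both_ba = encrypt(bob, encrypt(alice, x))
```

Here `both_ab` and `both_ba` are equal because exponents multiply, and encrypting the same input twice under the same key gives the same ciphertext, which is what allows ciphertexts to be compared.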
B. Protocol to Compute the Entire Network
The goal of the protocol is to compute the entire graph G,
but without revealing the identifiers of the vertices. The algorithm must maintain the partial view of the graph, such that
each investigator can track his information in the assembled
graph. The algorithm may reveal the source of each vertex or
edge, the size of each part of the graph and the overlapping
sets, such as the vertices or edges already known to the local
investigator.
The source information can be used by the investigator
to selectively request additional information from other institutions that can enhance the case. Therefore, this protocol
can be used to combine data sources selectively, such that
personally identifying information is only revealed when there
is a reasonable cause.
We refer to the collaborating investigators as participants in
the protocol. Before the protocol begins, each participant $X_i$
holds a key for encryption $E_i()$ in the commutative, secret-key
encryption scheme and another key for $E'_i()$. $E'_i()$ should be in
the same (commutative) encryption scheme.
The “Compute Entire Network” protocol proceeds as follows:
1) Each participant $X_i$ prepares, for each of his edges $e_j = (v, v') \in E_i$, a tuple
$$\langle i, E'_i(v), E'_i(v'), E_i(v), E_i(v') \rangle$$
2) Each participant $X_i$ sends his tuples to the next participant $X_{i+1}$. Participant $X_n$ sends to participant $X_1$.
3) Each participant $X_{i+1}$ encrypts the last two fields of
each received tuple with his own key.
$$\langle i, E'_i(v), E'_i(v'), E_{i+1}(E_i(v)), E_{i+1}(E_i(v')) \rangle$$
4) All participants repeat steps 2 and 3 until the last two fields
of every tuple are encrypted under all $n$ keys. Since the
originator's key is already applied in step 1, this takes $n - 1$
rounds of steps 2 and 3 in total.
5) Each participant $X_i$ keeps a copy of the received tuples
and forwards them to participant $X_{i+1}$. They repeat this
step $n - 1$ times.
After the “Compute Entire Network” protocol, each participant $X_i$
holds a set $T$ of tuples representing all edges
$E$ plus some potential duplicates. Let $E^n(v)$ denote the
encryption of $v$ with all keys $E_i()$, $i = 1, \ldots, n$. Note that due
to the commutative encryption the order of encryption does
not matter. A tuple after the protocol therefore looks like this:
$$\langle i, E'_i(v), E'_i(v'), E^n(v), E^n(v') \rangle$$
The local investigator can now
• remove duplicates, since duplicates have the same two
last fields, resulting in the set $E$.
• build an anonymized graph $G$ from $E$ with the
pseudonyms $E^n(v)$ for $v$ and, for example, visualize it.
• track his input by identifying the tuples with (his) $i$ in
the first field and deanonymizing those pseudonyms by
decrypting the second and third field.
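The protocol and the three post-processing steps above can be sketched as a single-process simulation. This is an illustration under stated assumptions, not the authors' implementation: the toy prime, the hash-based `encode`, and all function names are hypothetical; each batch of tuples visits the other n - 1 participants once so that every key is applied exactly once; and an edge is treated as an unordered pair when removing duplicates.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # toy prime modulus, for illustration only

def keygen():
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def encode(identifier):
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (P - 1)

def enc(key, x):
    return pow(x, key, P)

def dec(key, c):
    return pow(c, pow(key, -1, P - 1), P)

def compute_entire_network(local_edges):
    """local_edges[i] lists the (v, v') identifier pairs of participant i."""
    n = len(local_edges)
    keys = [keygen() for _ in range(n)]          # keys for E_i
    private_keys = [keygen() for _ in range(n)]  # keys for E'_i
    # Step 1: one tuple per local edge.
    batches = []
    for i, edges in enumerate(local_edges):
        batch = []
        for v, w in edges:
            a, b = encode(v), encode(w)
            batch.append((i, enc(private_keys[i], a), enc(private_keys[i], b),
                          enc(keys[i], a), enc(keys[i], b)))
        batches.append(batch)
    # Steps 2-4: each batch travels around the ring; every other
    # participant adds his key to the last two fields.
    for i in range(n):
        for hop in range(1, n):
            j = (i + hop) % n
            batches[i] = [(src, pa, pb, enc(keys[j], ca), enc(keys[j], cb))
                          for src, pa, pb, ca, cb in batches[i]]
    # Step 5: after circulating copies, everyone holds all tuples.
    return [t for batch in batches for t in batch], private_keys

def anonymized_edges(tuples):
    """Remove duplicates; an edge is the unordered pair of pseudonyms."""
    return {frozenset(t[3:5]) for t in tuples}

def own_view(tuples, i, private_key, known_identifiers):
    """Map pseudonyms to identifiers for the tuples contributed by i."""
    decode = {encode(v): v for v in known_identifiers}
    view = {}
    for src, pa, pb, ca, cb in tuples:
        if src == i:
            view[ca] = decode[dec(private_key, pa)]
            view[cb] = decode[dec(private_key, pb)]
    return view

local = [[("alice", "bob"), ("bob", "carol")],
         [("bob", "carol"), ("carol", "dave")],
         [("dave", "erin")]]
tuples, private_keys = compute_entire_network(local)
edges = anonymized_edges(tuples)
view0 = own_view(tuples, 0, private_keys[0], ["alice", "bob", "carol"])
```

In this made-up example, participant 0 sees four anonymized edges over five pseudonyms and can label only the three pseudonyms that stem from his own input.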
The investigator ends up with an anonymous view of the
entire social network with a partial non-anonymous view
of his present knowledge. He can then perform SNA (e.g.
metrics computation) on his suspects without having learnt
personally identifiable information from his collaborators. He may request
additional information from some collaborators, since he can
now target individual collaborators that are likely to have
useful information. This protects the privacy of suspects, since
it reduces the overall data exchange to reasonable amounts that
are likely to contribute positively to the case. The information
about likely innocent people remains protected.
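As an example of metrics computation on the anonymized graph, the following minimal sketch computes the closeness of a vertex by breadth-first search over pseudonyms. The function names and the sample pseudonyms are hypothetical, and the normalization used (reachable vertices divided by the sum of shortest-path lengths) is one common convention, chosen here as an assumption.

```python
from collections import defaultdict, deque

def adjacency(edges):
    """Build an undirected adjacency map from pairs of pseudonyms."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def closeness(adj, source):
    """(number of reachable vertices) / (sum of shortest-path lengths)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# A path of three pseudonyms: p1 - p2 - p3.
adj = adjacency([("p1", "p2"), ("p2", "p3")])
center = closeness(adj, "p2")
end = closeness(adj, "p1")
```

The metric depends only on the structure of the graph, so it can be computed on pseudonyms without knowing any identifier.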
Note that the protocol actually allows all the participants
to compute their own local anonymized graphs of the entire
network simultaneously. Thus, the protocol can be regularly
scheduled as a batch process by all the participants to maintain
their own local copies of $G$. To further improve efficiency, the
protocol can be run in a change-oriented fashion. That is,
after an initial run of the protocol, for any subsequent runs, it
is sufficient if each participant only processes any newly added
edges (after the last run) at its site. Of course, this holds only
as long as the set of participants does not change.
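The change-oriented mode can be sketched as a small bookkeeping function. This is only an illustration: `run_protocol` is a hypothetical callable standing in for one execution of the protocol on the given edges, and the sketch assumes that the participants reuse their keys between runs so that pseudonyms from earlier runs remain comparable.

```python
def incremental_run(all_tuples, processed, current_edges, run_protocol):
    """Submit only edges added since the last run and merge the result.

    all_tuples:    set of tuples obtained from earlier runs
    processed:     set of local edges already submitted
    current_edges: set of local edges held now
    run_protocol:  hypothetical callable; takes the new local edges and
                   returns the tuples produced by this run
    """
    delta = current_edges - processed
    new_tuples = run_protocol(sorted(delta))
    return all_tuples | set(new_tuples), processed | delta

# Stand-in for the real protocol, used only to show the bookkeeping.
submitted = []
def fake_protocol(edges):
    submitted.append(list(edges))
    return [("tuple-for", e) for e in edges]

tuples, done = incremental_run(set(), set(), {("a", "b")}, fake_protocol)
tuples, done = incremental_run(tuples, done,
                               {("a", "b"), ("b", "c")}, fake_protocol)
```

In the second call only the edge added after the first run is submitted; the tuple set simply grows.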
C. Analysis
The protocol operates in the semi-honest model [3]. We
strongly argue that this is appropriate for our application, since
we are concerned with cooperating police organizations and
officers whose main concern is protecting the privacy of the
suspects and maintaining practical data governance. That is, the
organizations are inclined to follow the protocol, since their
objective is not only the outcome of the collaboration, but
also the process of data privacy protection. Since interest in
collaboration can be assumed, the organizations could simply
exchange data by bypassing the protocol, if they were not
interested in data protection.
The main security goal of the protocol is to not leak
identifiable information, that is, the labels of the vertices. We
state the following theorem:
Theorem 1. The identifier of any vertex $v$ known only to
participants $X_i \in X \mid v \in V_i$ is inaccessible to any participant
$X_j \notin X$.
Proof Sketch: The identifier of $v$ only leaves the systems
of a participant $X_i$ encrypted under $E_i()$. Therefore, the
security of the identifier is based on the security of the
encryption.
Privacy Model Comparison: The paper by Frikken and Golle
[7] presents a privacy definition for SNA. Our paper takes
a different stance on privacy than their paper. Their main
concern is that an attacker is able to track his position in
the network and thereby possibly identify some other vertices
in the network. They assume that one may have partial information
about the network and exploit this knowledge to gain
additional information. By contrast, we assume that one
completely divulges one’s partial information. The threat is
that an attacker might learn additional identifying information
accidentally revealed during the protocol. Unlike [7],
where no quantification of the previous knowledge was given,
we prove that absolutely no private (that is, identifying)
information is leaked. This implies that the attack with
previous knowledge does not apply in our framework, since
the attacker would simply reveal that information and then
the revealed information would be part of the result of the
computation. Obviously, the result is not protected by our (or
for that matter any) protocol.
Computational Complexity Comparison: The paper [14] by
Kerschbaum and Schaad gives a protocol to compute the SNA
metrics (betweenness and closeness) in the same setting as the
current paper. Let the number of participants be $p$ and $|V|$ be
the number of vertices in the entire network $G$. Given this,
the computational complexity of their protocol is $O(p|V|^3)$.
In contrast, our protocol has a computational complexity
of $O(p^2|V|)$. As the number of vertices is expected to be
much larger than the number of participants, the “Compute
Entire Network” protocol is much more suitable for practical
implementations.
IV. CONCLUSION
Social Network Analysis is becoming an important tool
for investigators, but all the necessary information is often
distributed over a number of sites. Privacy legislation and data
governance concerns prohibit freely sharing the information.
We have presented a practical protocol that allows selective
disclosure of information for Social Network Analysis. An
investigator can use the protocol to compute the entire network
in an anonymized fashion, while preserving his local information.
The protocol has low computational complexity and
allows the investigators to perform many analyses. It preserves
the privacy of personally identifiable information of suspects
and limits the data exchange to prevent misuse.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers
for their helpful suggestions. The developments presented in
this paper were partly funded by the European Commission
through the ICT program under Framework 7 grant 213531 to
the SecureSCM project.
REFERENCES
[1] Directive 95-46-EC on the protection of individuals with regard to the
processing of personal data and on the free movement of such data,
Available at http://ec.europa.eu/justice_home/fsj/privacy, 1995.
[2] T. Van Cangh and A. Boujraf, The Eurojust-Europol Case Study,
Available at http://www.r4egov.eu, 2007.
[3] O. Goldreich, Secure Multi-party Computation, Available at
http://www.wisdom.weizmann.ac.il/~oded/pp.html, 2002.
[4] W. R. Harper and D. H. Harris, The application of link analysis to police
intelligence, Human Factors, 17(2): 157-164, 1975.
[5] M. K. Sparrow, The application of network analysis to criminal intelligence: An assessment of the prospects, Social Networks, 13: 251-274,
1991.
[6] J. Xu and H. Chen, Criminal Network Analysis and Visualization,
Communications of the ACM, 48(6): 100-107, 2005.
[7] K. Frikken and P. Golle, Private Social Network Analysis: How to
Assemble Pieces of a Graph Privately, in proceedings of the 5th ACM
Workshop on Privacy in the Electronic Society (WPES), pp. 89-98, 2006.
[8] A. C. Yao, Protocols for secure computations, in proceedings of the
23rd Annual IEEE Symposium on Foundations of Computer Science
(FOCS), pp. 160-164, 1982.
[9] O. Goldreich and S. Micali and A. Wigderson, How to play any mental
game, in proceedings of the 19th Annual ACM Symposium on Theory
of Computing (STOC), pp. 218-229, 1987.
[10] R. Cramer and I. Damgard and U. Maurer, General Secure Multi-party
Computation from any Linear Secret-Sharing Scheme, in Advances in
Cryptology - EUROCRYPT, Springer LNCS vol. 1807, pp. 316-334,
2000.
[11] R. Cramer and I. Damgard and J. Nielsen, Multiparty Computation
from Threshold Homomorphic Encryption, in Advances in Cryptology - EUROCRYPT, Springer LNCS vol. 2045, pp. 280-300, 2001.
[12] M. Ben-Or and S. Goldwasser and A. Wigderson, Completeness Theorems for Non-cryptographic Fault-tolerant Distributed Computation, in proceedings
of the 20th Annual ACM Symposium on Theory of Computing (STOC),
pp. 1-10, 1988.
[13] S. Pohlig and M. Hellman, An improved algorithm for computing logarithms over GF(p) and its cryptographic significance, IEEE Transactions
on Information Theory, 24(1): 106-110, 1978.
[14] F. Kerschbaum and A. Schaad, Privacy-preserving Social Network
Analysis for Criminal Investigations, in proceedings of the 7th ACM
Workshop on Privacy in the Electronic Society (WPES), pp. 9-14, 2008.