Small Secure Sketch For Point-Set Difference
such that X. The set X is randomly perturbed so as to reduce the influence of the underlying distribution. Although this approach seems feasible, there are many loose ends. Our construction has many similarities with this approach, and indeed can be viewed as a method that realizes it.
In the rest of this paper, we first give a secure sketch for set difference that can handle multi-sets (Section 4). We then give a secure sketch for 0-1 noises for points in the one-dimensional Z_n (Section 6), and extend it to the 2-dimensional Z_n × Z_n (Section 7). Although similar ideas can be extended to higher dimensions, it might not be practical due to larger constant factors in the entropy loss.
Contributions and Results.
1. We give an approach to handle the combination of 0-1 noises and replacement noises, where the points are from a finite field Z_n, and are well-separated. The total size of the sketch and the entropy loss depend on the choices of the sketch for set difference and the sketch that handles the 0-1 noise. If the encoder and the decoder have polynomial (with respect to s, t, log n) computing time, the entropy loss is at most 1.5s + 1 + 2t(1 + log n), which is in O(s + t log n), and the size of the sketch is in O(s log n). If the encoder is able to perform an exhaustive search over 2^O(s) sequences, then both the size of the sketch and the entropy loss are in O(s + t log n).
2. We give an extension of the above to a 2-dimensional universe, Z_n × Z_n. If the encoder and the decoder have polynomial computing time (with respect to s, t, log n), the entropy loss is at most 4s + 4 + 2t(1 + 2 log n), while the size of the sketch is in O(s log n). When the encoder can do an exhaustive search over 2^O(s) sequences, then both the size of the sketch and the entropy loss are in O(s + t log n).
3. We give a scheme for set difference that handles multi-sets. The scheme has a simple and yet very efficient decoding algorithm, which amounts to solving a linear system with 2t equations, and root-finding for polynomials of degree at most t. Hence, the number of arithmetic operations in Z_n is bounded by a polynomial in s and t. The size of the sketch and the entropy loss are at most 2t(1 + log n). This construction is very similar to the set reconciliation technique in [10].
2 Related Works
Recently, a few new cryptographic primitives for noisy inputs have been proposed. The fuzzy commitment scheme [9] is one of the earliest formal approaches to error tolerance. The fuzzy commitment scheme uses an error-correcting code to handle Hamming distance. The notions of secure sketch and fuzzy extractor are introduced in [6], which gives constructions for Hamming distance, set difference, and edit distance. Under their framework, a reliable key is extracted from noisy data by reconstructing the original data with a given sketch, and then applying a normal extractor (such as a pair-wise independent hash function) to the data. The issue of reusability of sketches is addressed in [3]. It is shown that a sketch scheme that is provably secure may be insecure when multiple sketches of the same biometric data are obtained.
The set difference metric is first considered in [8], which gives a fuzzy vault scheme. Later, [6] proposed three constructions. The entropy loss of all these schemes is roughly the same. They differ in the sizes of the sketches, decoding efficiency, and also the degree of ease in practical implementation. The BCH-based scheme [6] has small sketches and achieves sublinear (with respect to n, the size of the universe) decoding by a careful reworking of the standard BCH decoding algorithm. None of these schemes can handle multi-sets. The set reconciliation protocol presented in [10] is designed for two parties to jointly discover the union of their data, with as little communication cost as possible. Although the problem settings are different, the techniques for handling set difference are similar and can be employed to obtain a secure sketch.
Another line of research yields constructions of approximate message authentication codes ([7, 2, 11, 5]), which can authenticate images that are corrupted by certain levels of noise common to images (such as white noise and compression). There are also attempts at refining the extraction of biometric features so that the features are invariant to permissible noises [12]. Unfortunately, the reliability of such systems is not high.
3 Preliminaries
We follow the definitions of entropy and secure sketch in [6]. We also give the closeness and distance functions considered in this paper. A summary of notations is given in Appendix A.
Entropy. Let H_∞(A) = −log(max_a Pr[A = a]). For two random variables A and B, the average min-entropy of A given B is defined as H̃_∞(A|B) = −log(E_{b←B}[2^{−H_∞(A|B=b)}]). This definition is useful in the analysis, since for any ℓ-bit string B, we have H̃_∞(A|B) ≥ H_∞(A) − ℓ.
Secure sketch. Let M be a set with a closeness relation C ⊆ M × M. When (X, Y) ∈ C, we say that Y is close to X, or (X, Y) is a close pair. The closeness can be determined by a distance function dist : M × M → R^≥0 and a threshold δ. That is, (X, Y) ∈ C iff dist(X, Y) ≤ δ.

Definition 1. A sketch scheme is a tuple (M, C, Enc, Dec), where Enc : M → {0, 1}* is an encoder and Dec : M × {0, 1}* → M is a decoder such that for all (X, Y) ∈ C, Dec(Y, Enc(X)) = X. The string Enc(X) is the sketch. The entropy loss of the scheme is at most m if for all random variables X over M, H_∞(X) − H̃_∞(X | Enc(X)) ≤ m.
Closeness and Distance Functions. In this paper, M could be the collection of subsets or multi-sets of a universe U. The universe could be Z_n or Z_n × Z_n, where n is a prime. We consider two types of noise: the 0-1 noise, and the replacement noise that replaces certain elements in a set by randomly chosen elements. Here is a list of the closeness and distance functions considered in this paper.
1. C_{s,t}: The closeness determined by set difference, which caters for the replacement noise. A pair (X, Y) ∈ C_{s,t} if |X| = |Y| = s and |X ∩ Y| ≥ s − t.
2. ZDist: Distance function defined in Z_n, which caters for the 0-1 noise.

   ZDist(x, y) = 0 if y = x or y = x + 1; ∞ otherwise. (1)

   We define ZDist(n − 1, 0) = ∞. We can generalize this function to 2 or higher dimensions. For example, in Z_n × Z_n, we have

   ZDist((x_1, x_2), (y_1, y_2)) = 0 if ZDist(x_1, y_1) = 0 and ZDist(x_2, y_2) = 0; ∞ otherwise. (2)
Note that ZDist is not symmetric, that is, it is not necessary that ZDist(x, y) = ZDist(y, x).
3. ZPS_{s,t}: Closeness for point-sets of size s. Suppose X = {x_1, ..., x_s} and Y = {y_1, ..., y_s}; we have (X, Y) ∈ ZPS_{s,t} if there exists a 1-1 correspondence f on {1, ..., s} such that |{i | ZDist(x_{f(i)}, y_i) = 0}| ≥ s − t.
Well-separated point-sets. A point-set X is well-separated if for any distinct x, y ∈ X, we have ZDist(x, y) > 0. We will discuss this in detail in Section 6.3.
Almost k-wise independent random variables. A sample space on n-bit strings is k-wise independent if the probability distribution induced on every k bit locations in a randomly chosen string from the sample space is uniform. Alon et al. [1] considered almost k-wise independence with small sample size, and gave several constructions.
Definition 2 (Almost k-wise independence [1]). Let S_n be a sample space and X = x_1 ⋯ x_n be chosen uniformly from S_n. S_n is almost k-wise independent with statistical difference ε if, for any k positions i_1 < i_2 < ... < i_k, we have

Σ_{α ∈ {0,1}^k} |Pr[x_{i_1} x_{i_2} ⋯ x_{i_k} = α] − 2^{−k}| ≤ ε. (3)
If we choose ε = 2^{−k}, the probability Pr[x_{i_1} x_{i_2} ⋯ x_{i_k} = α] in (3) is non-zero for every α. To see this, let X′ = x_{i_1} x_{i_2} ⋯ x_{i_k}; then Pr[X′ = α] > 0, since otherwise Σ_{β ≠ α} |Pr[X′ = β] − 2^{−k}| + |Pr[X′ = α] − 2^{−k}| > 2^{−k}, which is a contradiction. Hence, we can always find such an X′ given any i_1, ..., i_k and any α. Furthermore, the number of bits required to describe the sample is (2 + o(1))(log log n + 3k/2 + log k), which is in O(k + log log n).
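The statistical-difference quantity in Definition 2 can be computed by brute force for tiny sample spaces. The sketch below (names are ours) returns the worst value of the sum in (3) over all k-subsets of positions; the uniform space over all strings gives 0, while a biased space gives a strictly positive value:

```python
# Brute-force evaluation of the statistical difference in Definition 2.
from itertools import combinations, product

def stat_diff(space, nbits, k):
    """Worst-case sum over alpha of |Pr[pattern = alpha] - 2^-k|,
    taken over all choices of k bit positions."""
    worst = 0.0
    for pos in combinations(range(nbits), k):
        counts = {}
        for x in space:
            pat = "".join(x[i] for i in pos)
            counts[pat] = counts.get(pat, 0) + 1
        total = sum(abs(counts.get("".join(a), 0) / len(space) - 2 ** -k)
                    for a in product("01", repeat=k))
        worst = max(worst, total)
    return worst

# The uniform space over all 3-bit strings is exactly 2-wise independent:
full = ["".join(b) for b in product("01", repeat=3)]
assert stat_diff(full, 3, 2) == 0.0
# A biased 4-element space has a positive statistical difference:
assert stat_diff(["000", "011", "101", "111"], 3, 2) > 0.0
```

The point of the constructions in [1] is that spaces with small statistical difference can be far smaller than the 2^n uniform space; the checker above only verifies the property, it does not build such a space.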
4 Secure Sketch for Set Difference

In this section, we give a secure sketch for set difference, that is, with respect to the closeness relation C_{s,t}. Our scheme can handle the case where X is a multi-set. The size of the sketch is at most 2t(1 + log n). In addition, there exists a simple and yet efficient decoding algorithm: we just need to solve a linear system with 2t equations and unknowns, and find the roots of two degree-t polynomials.

To handle a special case, we assume that X does not contain any element in {0, 1, ..., 2t − 1}; we will discuss how to remove this assumption at the end of this section. Our construction is similar to the set reconciliation protocol in [10], but the problem settings are different.
The encoder Enc_s. Given X = {x_1, ..., x_s}, the encoder does the following.

1. Construct a monic polynomial p(x) = Π_{i=1}^{s} (x − x_i) of degree s.
2. Publish P = ⟨p(0), p(1), ..., p(2t − 1)⟩.
The decoder Dec_s. Given P = ⟨p(0), p(1), ..., p(2t − 1)⟩ and Y = {y_1, ..., y_s}, the decoder follows the steps below.

1. Construct a polynomial q(x) = Π_{i=1}^{s} (x − y_i) of degree s.
2. Compute q(0), q(1), ..., q(2t − 1).
3. Let p*(x) = x^t + Σ_{j=0}^{t−1} a_j x^j and q*(x) = x^t + Σ_{j=0}^{t−1} b_j x^j be monic polynomials of degree t. Construct the following system of linear equations with the a_j's and b_j's as unknowns:

   q(i)p*(i) = p(i)q*(i), for 0 ≤ i ≤ 2t − 1. (4)

4. Solve the system for the a_j's and b_j's, which determine p*(x) and q*(x).
5. Find the roots of p*(x) and q*(x); let X′ and Y′ be the sets of their roots respectively.
6. Output X̃ = (Y ∪ X′) − Y′.
The correctness of this scheme is straightforward. When there are exactly t replacement errors, we can take p*(x) = Π_{u ∈ X−Y} (x − u) and q*(x) = Π_{v ∈ Y−X} (x − v); then q(x)p*(x) = p(x)q*(x) holds identically, so this pair satisfies the system (4), and the roots of p*(x) and q*(x) are exactly X − Y and Y − X. Hence X′ = X − Y and Y′ = Y − X, and the decoder outputs (Y ∪ X′) − Y′ = X.
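The encoder and decoder above fit in a few dozen lines. The following toy sketch works over Z_n for a small prime n, with exactly t replacement errors and X, Y avoiding {0, ..., 2t − 1}; root finding is by exhaustive search over Z_n, which is only practical for tiny n, and all helper names are ours:

```python
# Toy version of Enc_s / Dec_s from Section 4 over Z_n, n a small prime.
n = 97          # illustrative small prime modulus
s, t = 5, 2     # set size and number of replacement errors

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists (low degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % n
    return out

def poly_from_roots(roots):
    """Monic polynomial prod(x - r) mod n."""
    c = [1]
    for r in roots:
        c = poly_mul(c, [(-r) % n, 1])
    return c

def poly_eval(c, x):
    y = 0
    for coef in reversed(c):   # Horner's rule
        y = (y * x + coef) % n
    return y

def enc(X):
    """The sketch: p evaluated at 0, ..., 2t - 1."""
    p = poly_from_roots(X)
    return [poly_eval(p, i) for i in range(2 * t)]

def solve_mod(A, b):
    """Gaussian elimination mod the prime n."""
    m = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = next(r for r in range(col, m) if M[r][col])
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], n - 2, n)       # Fermat inverse
        M[col] = [(v * inv) % n for v in M[col]]
        for r in range(m):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(vr - f * vc) % n for vr, vc in zip(M[r], M[col])]
    return [M[i][m] for i in range(m)]

def dec(P, Y):
    qc = poly_from_roots(Y)
    # Equations q(i)p*(i) = p(i)q*(i) for i = 0..2t-1, with unknowns
    # a_0..a_{t-1}, b_0..b_{t-1} (the non-leading coefficients).
    A, rhs = [], []
    for i in range(2 * t):
        qi = poly_eval(qc, i)
        A.append([(qi * pow(i, j, n)) % n for j in range(t)] +
                 [(-P[i] * pow(i, j, n)) % n for j in range(t)])
        rhs.append(((P[i] - qi) * pow(i, t, n)) % n)
    sol = solve_mod(A, rhs)
    pstar, qstar = sol[:t] + [1], sol[t:] + [1]
    roots = lambda c: {x for x in range(n) if poly_eval(c, x) == 0}
    Xp, Yp = roots(pstar), roots(qstar)   # X - Y and Y - X
    return sorted((set(Y) | Xp) - Yp)

X = [10, 20, 30, 40, 50]                  # avoids {0, ..., 2t - 1}
Y = [10, 20, 30, 77, 88]                  # exactly t = 2 replacements
assert dec(enc(X), Y) == sorted(X)
```

With exactly t errors the system has a unique solution, since any two monic solutions (p*, q*) agree as rational functions at the 2t evaluation points and gcd(p*, q*) = 1.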
Each x ∈ Z_n is rendered with one of two colors:

color(x) = white if x ≡ 1 (mod 2); black if x ≡ 0 (mod 2). (5)
We call a many-to-one function H a quantizer if for all x, H(x) = x if x is black, and H(x) is either x − 1 or x + 1 if x is white. Fig. 2 depicts such a function. For X = {x_1, ..., x_s}, we denote H(X) = {H(x_1), ..., H(x_s)}.
Given X = {x_1, x_2, ..., x_s}, our goal is to find a quantizer such that each point in X will be consistently quantized, even under the influence of 0-1 noises. In particular, we require that

(1) ∀x ∈ X, H(x) = H(x + 1).
Fig. 2. A quantizer H. The arrows indicate how the points are quantized (or rounded) to even numbers.
Fig. 3. Recovering x from H(x). There are three scenarios for H when H(x) = b.
To reconstruct X given H(X), we need to add more constraints on H. Suppose we know that H(x) = b for some b, as illustrated in Fig. 3. In scenarios (b) and (c), x must be b and b − 1 respectively. However, when (a) happens, x can be either b or b − 1.

To resolve the ambiguity, we add one but not both of the following two constraints:

(2a) ∀x ∈ X, H(x + 2) = x + 3 if x is white, or
(2b) ∀x ∈ X, H(x − 1) = x − 2 if x is black.

Note that the constraint will not eliminate scenario (a). Nevertheless, even if scenario (a) happens, we can conclude that x = b if (2a) is imposed, and that x = b − 1 if (2b) is imposed.
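The constraint bookkeeping can be made concrete as follows. This toy sketch (names, defaults, and the choice of constraint (2a) are ours) builds the restricted bits for a well-separated X away from the wrap-around at n − 1, quantizes a noisy point, and recovers x:

```python
# Toy 1-D quantizer under constraints (1) and (2a); helper names ours.
n = 31  # small odd modulus for illustration

def build_h(X):
    """h[i] for odd i: h[i] = 1 means H(i) = i + 1, else H(i) = i - 1.
    Unrestricted positions default to 0 (any value would do).
    Assumes X is well-separated, so restrictions never conflict."""
    h = {i: 0 for i in range(1, n, 2)}
    for x in X:
        if x % 2 == 0:       # x black: (1) forces H(x + 1) = x
            h[x + 1] = 0
        else:                # x white: (1) forces H(x) = x + 1,
            h[x] = 1
            h[x + 2] = 1     # and (2a) forces H(x + 2) = x + 3
    return h

def quantize(h, y):
    return y if y % 2 == 0 else (y + 1 if h[y] else y - 1)

def recover(h, b):
    # b = H(y) is black. For black x = b, (1) forces h[b + 1] = 0;
    # for white x = b - 1, (2a) forces h[b + 1] = 1.
    return b - 1 if h[b + 1] else b

X = [4, 9, 14, 21]           # well-separated points
h = build_h(X)
for x in X:
    for y in (x, x + 1):     # the two possible 0-1 noise outcomes
        assert recover(h, quantize(h, y)) == x
```

The recovery rule here is one concrete reading of the scenario analysis above: with (2a) imposed, inspecting the single bit h[b + 1] resolves the ambiguity between b and b − 1.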
6.2 Short Descriptions of H (Sketch P_H)
A quantizer H can be conveniently represented by a sequence ⟨h_1, h_3, h_5, ..., h_{n−2}⟩ where h_i = 0 if H(i) = i − 1; otherwise, h_i = 1. Publishing such a sequence requires ⌊n/2⌋ bits, which is undesirable when n is large.
is undesirable when n is large.
For any given X, there would be many quantizers that satisfy the constraints in Section
6.1. The constraints restrict the values of h
i
for certain indices is. Let W be the set of these
indices. For any other j , W, h
j
can be either 0 or 1.
The rst constraint 1 restricts s of such h
i
s. For constraints 2a and 2b, we choose 2a if
white is the minority color in X, and 2b otherwise. In this way, the number of restricted h
i
s
is at most 0.5s. Hence, we have [W[ 1.5s. Note that we need to publish an additional 1 bit
to indicate which constraint, either 2a or 2b, we have used.
We give two constructions of short descriptions. The first one is simple and efficient but requires |W| log n bits. The second one requires space that is near-optimal, but it requires exhaustive search during encoding. In either case, the entropy loss is small, and there exist efficient decoding algorithms. Let k = |W|, and W = {w_1, ..., w_k}.
Construction I: Construct a polynomial f(x) of degree k − 1 as follows.

1. Randomly choose d_1, ..., d_k ∈ Z_n such that for 1 ≤ i ≤ k, d_i ≡ h_{w_i} (mod 2).
2. Find a degree k − 1 polynomial f such that f(w_i) = d_i for 1 ≤ i ≤ k.
The sketch P_H is the k coefficients of f. Hence, the size of the sketch is k log n. The size of this sketch could be reduced further to about k log(n/2), since we can work over a smaller finite field. This is possible because the number of h_i's is only ⌊n/2⌋; thus any finite field of size greater than n/2 is sufficient.

During decoding, the quantizer H to be used is the one represented by h_i = f(i) mod 2 for all odd i.
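Construction I can be sketched directly with Lagrange interpolation over a small prime field (all helper names are ours, and the field size is an illustrative choice):

```python
# Sketch of Construction I: hide the restricted bits h_{w_i} in the
# parities of k random field elements and publish the interpolant f.
import random

n = 97  # a small prime field (illustrative)

def interpolate(points):
    """Coefficients (low degree first) of the unique degree k-1
    polynomial through the given (x, y) points, mod n (Lagrange)."""
    k = len(points)
    coeffs = [0] * k
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                new = [0] * (len(basis) + 1)
                for d, c in enumerate(basis):   # multiply basis by (x - xj)
                    new[d] = (new[d] - xj * c) % n
                    new[d + 1] = (new[d + 1] + c) % n
                basis = new
                denom = (denom * (xi - xj)) % n
        scale = (yi * pow(denom, n - 2, n)) % n
        for d, c in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * c) % n
    return coeffs

def encode_PH(W, hbits):
    # d_i is random subject to d_i = h_{w_i} (mod 2)
    pts = [(w, random.randrange(0, n - 1, 2) + hb) for w, hb in zip(W, hbits)]
    return interpolate(pts)

def restricted_bit(f, i):
    """During decoding, h_i = f(i) mod 2."""
    y = 0
    for c in reversed(f):
        y = (y * i + c) % n
    return y % 2

W, hbits = [3, 11, 19], [1, 0, 1]
f = encode_PH(W, hbits)
assert [restricted_bit(f, w) for w in W] == hbits
```

The randomness of the d_i's is what hides the unrestricted positions: at any odd i outside W, f(i) mod 2 is determined by the random choices rather than by X.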
Construction II: Firstly we construct an almost k-wise independent sample space on n bits [1], with ε = 2^{−k}. Given W and ⟨h_{w_1}, ..., h_{w_k}⟩, we uniformly choose a sample σ = σ_1 ⋯ σ_n in the sample space such that σ_{w_i} = h_{w_i} for all i. The representation of σ is then the description of H, which is the sketch P_H. Note that the size of P_H is ℓ = (2 + o(1))(log log n + 3k/2 + log k), which is in O(s + log log n). However, we do not have an efficient way to find such a sample, except by exhaustive search in the sample space of size 2^ℓ.

For Construction I, counting the extra bit that indicates which of (2a) and (2b) is used, the entropy loss is at most

H_∞(X) − H̃_∞(X | P_H) ≤ k + 1.
For Construction II, the entropy loss is simply bounded by the size of P_H plus the same additional bit. That is,

H_∞(X) − H̃_∞(X | P_H) ≤ |P_H| + 1 ≤ (2 + o(1))(log log n + 3k/2 + log k) + 1.

Since k ≤ 1.5s, we have the claimed bounds.
Recall from Section 4 that P_S introduces entropy loss at most 2t(1 + log n); the total entropy loss of the concatenation of P_H and P_S is bounded by the following.
Theorem 1. When t ≤ n/4, the entropy loss due to P_H and P_S is at most h + 1 + 2t(1 + log n), where h = 1.5s if P_H is computed using Construction I, and h = (2 + o(1))(log log n + 9s/4 + log(1.5s)), which is in O(s + log log n), if P_H is computed as in Construction II.
6.3 On the Assumption that Points are Well-Separated
If the points are not well-separated, the error tolerance of our scheme would be affected. For example, consider the points x_3, x_4 and x_5 in Fig. 2. If the 0-1 noise shifts x_5 from 7 to 8, it will be considered as a replacement error. If the noise happens to leave x_5 unchanged, then the scheme still works. In addition, if X contains duplicated elements, our scheme would work fine because the sketch in Section 4 can handle multi-sets. In other words, our scheme has a guaranteed error tolerance when there are no two points x, x′ ∈ X such that x − x′ = 1.
During the reduction from white noise to 0-1 noise as in Section 5, we can choose a step size that is larger than 2, so that given the same white noise, the points are less likely to be shifted. In other words, the 0-1 noise is reduced on average. Note that this introduces more entropy loss. Moreover, such a trade-off depends on the distribution of the input biometric data. Therefore, we will not discuss it further in this paper.
In sum, with the requirement of well-separation, our scheme can tolerate the claimed noise in the worst case. Without the requirement, the scheme could still perform well on average. Hence, it is better to include those points that violate the well-separation requirement, instead of removing them. To include those points, we have to handle multi-sets.
7 0-1 Noise with Replacement when M = P(Z_n × Z_n)

We now extend our construction on 0-1 noise with replacement to U = Z_n × Z_n. Similar to the one-dimensional case, the sketch is the concatenation of P_H and P_S. We will only discuss the sketch P_H.
We assume that X is always well-separated. That is, for any distinct (u_1, v_1), (u_2, v_2) ∈ X, |u_1 − u_2| ≥ 2 and |v_1 − v_2| ≥ 2.
7.1 Quantizer H

The elements in U are rendered with 4 different colors as below.

color(u, v) = black if u ≡ v ≡ 0 (mod 2); white if u ≡ 0 and v ≡ 1 (mod 2); red if u ≡ 1 and v ≡ 0 (mod 2); green if u ≡ v ≡ 1 (mod 2). (6)

The 0-1 noise either leaves a point x untouched, or shifts it to one of the 3 adjacent points. Let Neighbour(u, v) be the set of 4 points that (u, v) may be shifted to by the 0-1 noise, namely (u, v), (u + 1, v), (u, v + 1), (u + 1, v + 1).

In Z_n × Z_n, a quantizer H maps each point (u, v) to a black point (u′, v′) where |u − u′| ≤ 1 and |v − v′| ≤ 1.
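The colouring (6) and the neighbourhood structure can be stated in a few lines (function names are ours). Note that the four points in any Neighbour(u, v) always carry the four distinct colours:

```python
# The 2-D colouring of (6) and the 0-1 noise neighbourhood; names ours.
def color(u, v):
    return {(0, 0): "black", (0, 1): "white",
            (1, 0): "red",   (1, 1): "green"}[(u % 2, v % 2)]

def neighbour(u, v):
    """The 4 points that (u, v) may be shifted to by the 2-D 0-1 noise."""
    return [(u, v), (u + 1, v), (u, v + 1), (u + 1, v + 1)]

assert color(2, 4) == "black" and color(3, 3) == "green"
# Every neighbourhood contains exactly one point of each colour:
assert len({color(*p) for p in neighbour(2, 4)}) == 4
```

This one-point-per-colour property is what makes the per-colour functions H_w, H_r, H_g of Appendix B well-defined on Neighbour(x).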
A Summary of Notations

Entropy: H_∞(A) = −log(max_a Pr[A = a]); H̃_∞(A|B) = −log(E_{b←B}[2^{−H_∞(A|B=b)}]).

C_{s,t}: The closeness relation defined by set difference. A pair (X, Y) ∈ C_{s,t} if |X| = |Y| = s and |X ∩ Y| ≥ s − t.

ZDist: Distance function defined in Z_n, which caters for the 0-1 noise. For x, y ∈ Z_n, ZDist(x, y) is defined to be 0 if 0 ≤ y − x ≤ 1, and infinity otherwise.

ZPS_{s,t}: Closeness for point-sets of size s. For two point-sets X and Y, Y is close to X if at least s − t elements in Y are close to a matching point in X under the 0-1 noise.
B Detailed Construction of P_H

The quantizer H can be defined in terms of 3 functions H_w, H_r, and H_g:

H(x) = H_w(x) if color(x) = white; H_r(x) if color(x) = red; H_g(x) if color(x) = green.
Each function maps its input to one of its neighbouring black points. For convenience, for any x ∈ X, let x_b, x_w, x_r, and x_g be the black, white, red and green points in Neighbour(x) respectively. Hence, the first constraint (1) is equivalent to the following. For all x ∈ X,

(1a) H(x_b) = x_b; (1b) H_w(x_w) = x_b; (1c) H_r(x_r) = x_b; (1d) H_g(x_g) = x_b.
As mentioned in Section 7, there are two possibilities for H_r(x) and H_w(x), and once they are fixed, there are also two possibilities for H_g(x). Hence, to describe H in a straightforward manner using a binary sequence, we only need m = (1/4 + 1/4 + 1/4)n^2 bits. Let h = ⟨h_1, ..., h_m⟩ be such a sequence.

For any x ∈ X, to satisfy constraint (1b), we need to restrict the value of at most one bit in h. The same holds for (1c) and (1d). We do not need to consider (1a) since it is implicit. Therefore, for s points, we need to restrict 3s bits in h.
Similar to the 1-dimensional case, there will be ambiguities when we try to recover X from an H that satisfies only (1). One of the worst (most ambiguous) scenarios of H is as illustrated by the solid arrows in Fig. 5. In this case, if H(x) = x_1 for some x ∈ X, any of the four points x_1, x_2, x_3 and x_4 could be x.

Fig. 5. Ambiguity resolving in 2-D. The solid arrows show how H quantizes each point, and the dashed arrows show how the ambiguities can be resolved.
To resolve the ambiguity while keeping the size of the sketch small, we impose more constraints on H. First, we find the most and the least frequently occurring colors in X. Without loss of generality, we assume that they are black and green respectively. Then we require that the following constraints are satisfied by H. For all (u, v) ∈ X,

(2a) H_r(u + 2, v) = (u + 3, v) if (u, v) is red;
(2b) H_w(u, v + 2) = (u, v + 3) if (u, v) is white;
(2c) both (2a) and (2b) if (u, v) is green.
With the above additional constraints, we can resolve the ambiguity by the following rules. If H is as shown by the solid arrows in the figure, we declare that x = x_1. If H_r(x_5) = x_6 (the horizontal dashed arrow), and the rest follow solid arrows, we declare that x = x_2. If H_w(x_7) = x_8 (the vertical dashed arrow), and the rest follow solid arrows, we declare that x = x_4. If both H_r(x_5) = x_6 and H_w(x_7) = x_8, we declare that x = x_3.
In this case, each red and white point would restrict one more bit by constraints (2a) and (2b) respectively, and each green point would restrict two more bits by constraint (2c). Since green is the least frequently occurring color, the number of green points is at most s/4. Similarly, the number of black points is at least s/4. In the worst case (in terms of the number of restricted bits), there are s/4 green points and s/2 red and white points. Therefore, the total number of bits restricted by these constraints is at most (1/2 + 2/4)s = s.

Thus, 3s + s = 4s bits are restricted among the (3/4)n^2 bits of h. By applying the constructions in Section 6.2, we obtain the results presented in Section 7.