Semantic Private Information Retrieval
From MDS-Coded Databases
Sajani Vithana¹, Karim Banawan², and Sennur Ulukus¹
¹Department of Electrical and Computer Engineering, University of Maryland
²Electrical Engineering Department, Faculty of Engineering, Alexandria University
Abstract—We investigate the problem of semantic private
information retrieval (PIR) from coded databases, where a
user wishes to download one message out of M independent
messages, without revealing its identity to the databases. These
messages are coded using an (N, K) MDS code and stored in N
non-colluding databases. The M messages are allowed to have
different semantics, e.g., different sizes and different probabilities
of retrieval. We characterize the exact capacity of semantic PIR
with coded databases, and provide an achievable scheme with
non-uniform subpacketization. We show that the retrieval rate of
semantic PIR with coded databases outperforms that of classical
PIR with coded databases when the effects of zero padding
shorter messages are taken into account.
I. INTRODUCTION
The classical private information retrieval (PIR) problem
refers to a setting, where a user downloads a required message
from a system of non-colluding replicated databases containing a number of messages while not revealing the identity of
the downloaded message to the databases. This problem was
first introduced by Chor et al. in [1]. The information-theoretic
characterization of the classical problem is presented in [2]. In
[2], the performance metric is the retrieval rate, which is the
ratio between the desired message bits and the total download.
The supremum of all achievable retrieval rates is called the PIR
capacity. The PIR capacities of many variants of the problem
have been studied (see for instance [3]–[43]).
The most closely related works to ours are PIR from MDS-coded databases, which is studied extensively in [44]–[53],
and the semantic PIR problem from replicated databases in
[54]. In semantic PIR, the messages exhibit different semantics. These semantics include message sizes and prior
probabilities. This is motivated in practice by the fact that
files in storage systems have different sizes (e.g., databases
may simultaneously store text files and video files with vastly
different message sizes) and popularity levels (e.g., databases
may simultaneously store trending and stale files). The work of
[54] characterizes the capacity of semantic PIR from replicated
databases. On the other hand, MDS-coded storage systems
provide an increased level of reliability for information stored
in systems of databases without incurring the excessive storage
cost of direct replication. The capacity of the MDS-coded PIR
problem is characterized in [44] to be a function of the MDS
code parameters. Due to the relevance of MDS coding and
semantic heterogeneity in practice, it is desirable to provide
a viable PIR scheme that efficiently optimizes the PIR rate
corresponding to the storage code structure and the given
message semantics.
In this paper, we aim at characterizing the capacity of
semantic PIR from N non-colluding coded databases using
an arbitrary (N, K) MDS storage code. In this problem,
there are M messages with different semantics, i.e., each
message has a different message size and a different popularity level. More specifically, the storage system possesses
messages (W1 , . . . , WM ) in matrix form, with K columns
and Li , i ∈ {1, . . . , M } rows. Each row is mapped to the
content of the databases via an (N, K) MDS storage code
and then distributed to a system of N non-colluding databases.
Furthermore, the messages have arbitrary probabilities of retrieval (pi , i ∈ {1, . . . , M }) to reflect the popularity levels. We
investigate the interplay between the storage code parameters,
non-uniform subpacketization, and its effect on the PIR rate,
i.e., for a given (N, K) MDS code, how can we design the
retrieval parameters to exploit the heterogeneity of the message
semantics to maximize the retrieval rate?
We characterize the capacity of semantic PIR from $(N, K)$ MDS-coded databases as
$$C = \left(\frac{L_1}{E[L]} + \frac{K}{N}\frac{L_2}{E[L]} + \dots + \left(\frac{K}{N}\right)^{M-1}\frac{L_M}{E[L]}\right)^{-1},$$
where the expected number of rows E[L] is with respect to
the retrieval probability distribution. We provide an achievable
scheme, which is an extension to the scheme introduced
in [44]. The main difference of our scheme compared to
its counterpart in [44] is that our achievable scheme uses
non-uniform subpacketization that is parameterized by the
message sizes and the storage code parameters. Specifically,
a single application of the scheme results in a different number
of useful downloads (which correspond to the number of rows
in the required message) for different message requirements
of the user. The converse is extended from [44] with the
incorporation of the heterogeneity of message sizes. Compared
to the achievable MDS-coded PIR rate with zero padding, i.e.,
when all messages are zero-padded to match the size of the
largest message, the semantic PIR capacity expression shows
a strict rate gain when message semantics are not identical.
II. PROBLEM FORMULATION
We consider an (N, K) MDS coded distributed storage
system containing M independent messages. The messages
are allowed to have different lengths and different prior
probabilities of retrieval. The prior probability distribution
(pi , i ∈ {1, . . . , M }) is known by the databases and the
user. Each message $W_i$, $i \in \{1, \dots, M\}$, is represented as
a matrix in $\mathbb{F}_q^{L_i \times K}$, where the $L_i \times K$ elements of $W_i$ are
chosen uniformly and independently from $\mathbb{F}_q$. Without loss
of generality, we assume that the messages are ordered with
respect to their sizes, such that $L_1 \geq L_2 \geq \dots \geq L_M$. The
message sizes can be expressed in $q$-ary symbols as,
$$H(W_i) = K L_i, \quad i = 1, \dots, M. \tag{1}$$
The generator matrix $H$ of the $(N, K)$ code is a matrix in $\mathbb{F}_q^{K \times N}$,
which is represented as $H = [h_1, h_2, \dots, h_N]$, where
$h_i \in \mathbb{F}_q^{K}$, $i \in [N]$. The MDS property implies that any set of up to $K$ columns of $H$ is linearly independent.
Let the $j$th row of $W_i$ be denoted by $x_j^{[i]}$, $j = 1, \dots, L_i$ and
$i = 1, \dots, M$. The $n$th database stores a projection of this row
as $x_j^{[i]} h_n$, $n = 1, \dots, N$.
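The storage model can be illustrated with a minimal sketch. It assumes, purely for illustration, a prime field with $q = 7$ and a Vandermonde generator matrix with evaluation points $1, \dots, 5$ (one standard way to obtain an MDS code; the paper allows any $(N, K)$ MDS code). The helper names `encode_row`, `solve_mod` and `decode_row` are hypothetical.

```python
from itertools import combinations

Q = 7          # prime field size (assumed for illustration only)
N, K = 5, 3    # (N, K) MDS code parameters

# Vandermonde generator matrix H (K x N): column n is [1, a_n, a_n^2] with
# distinct evaluation points a_n = n + 1, so any K columns are independent.
H = [[pow(n + 1, k, Q) for n in range(N)] for k in range(K)]

def encode_row(x):
    """Return the N coded symbols x * h_n (mod Q) stored for one message row x."""
    return [sum(x[k] * H[k][n] for k in range(K)) % Q for n in range(N)]

def solve_mod(A, b):
    """Solve the square system A z = b over GF(Q) by Gauss-Jordan elimination."""
    size = len(A)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(size):
        # A pivot always exists here because any K columns of H are independent.
        pivot = next(r for r in range(col, size) if aug[r][col] % Q != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, Q)          # modular inverse (Python 3.8+)
        aug[col] = [(v * inv) % Q for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                factor = aug[r][col]
                aug[r] = [(vr - factor * vc) % Q for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][size] for r in range(size)]

def decode_row(symbols, dbs):
    """Recover x from the coded symbols held by the K databases listed in dbs."""
    # Database n contributes the equation sum_k x_k * H[k][n] = symbols[n].
    A = [[H[k][n] for k in range(K)] for n in dbs]
    b = [symbols[n] for n in dbs]
    return solve_mod(A, b)

row = [3, 5, 1]                      # one row x_j^{[i]} of a message
stored = encode_row(row)             # what the N databases hold for this row
recovered = [decode_row(stored, dbs) for dbs in combinations(range(N), K)]
```

Every one of the $\binom{5}{3} = 10$ choices of three databases returns the original row, which is the property the retrieval scheme relies on when it collects the same row from $K$ different databases.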
In order to retrieve a message $W_i$, the user sends query
$Q_n^{[i]}$ to the $n$th database, $n = 1, \dots, N$. The objective is to
retrieve $W_i$ without revealing the index $i$ to any database. The
queries and messages are independent of each other. Once
the databases receive the queries from the user, they generate
answer strings denoted by $A_n^{[i]}$, $n = 1, \dots, N$, to send back to
the user. These answer strings are deterministic functions of
the stored coded messages and the received query.
An achievable PIR scheme satisfies the following constraints:
Correctness: The user should be able to perfectly decode
the desired message using the received answer strings. Thus,
$$H(W_i \mid A_1^{[i]}, \dots, A_N^{[i]}, Q_1^{[i]}, \dots, Q_N^{[i]}) = 0, \quad i = 1, \dots, M. \tag{2}$$
Privacy: The queries sent to the databases by the user
should not leak any information on the required message
index. Thus, the joint PMFs of queries, answers and messages
need to satisfy the following for $n \in [N]$, $j \in [M]$, $i \neq j$,
$$(Q_n^{[i]}, A_n^{[i]}, W_1, \dots, W_M) \sim (Q_n^{[j]}, A_n^{[j]}, W_1, \dots, W_M). \tag{3}$$
An achievable semantic PIR scheme for a coded distributed
system is a scheme that satisfies the correctness and privacy
conditions in (2) and (3). The achievable rate of the semantic
PIR scheme with coded databases is defined as,
$$R = \frac{E[H(W_i)]}{E[D]} = \frac{K E[L]}{E[D]}, \tag{4}$$
where $E[L] = \sum_{i=1}^{M} p_i L_i$ is the expected number of rows
in a message and $E[D]$ is the expected number of total bits
downloaded. The expectation $E[\cdot]$ is with respect to the a priori
probability distribution. The capacity of semantic PIR from
coded distributed databases is the supremum of the expected
retrieval rates over all achievable schemes.
III. MAIN RESULTS AND DISCUSSIONS
In this section, we present the capacity of semantic PIR
with MDS-coded databases, which depends on the number of
messages and their semantics as well as the MDS storage code.
Theorem 1 The capacity of semantic PIR with an $(N, K)$
MDS-coded distributed storage system containing $M$ messages ($W_i$, $i \in \{1, \dots, M\}$) with $L_i$ rows in each of their
matrix representations (arranged in decreasing order as $L_1 \geq L_2 \geq \dots \geq L_M$), and prior probabilities $p_i$, is
$$C = \left(\frac{L_1}{E[L]} + \frac{K}{N}\frac{L_2}{E[L]} + \dots + \left(\frac{K}{N}\right)^{M-1}\frac{L_M}{E[L]}\right)^{-1}, \tag{5}$$
where $E[L] = \sum_{i=1}^{M} p_i L_i$.
The achievability proof of Theorem 1 is presented in Section IV and the converse proof is presented in Section V.
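As a numerical illustration of Theorem 1, the following sketch evaluates the capacity expression with exact rational arithmetic. The function name `semantic_pir_capacity` and its argument layout are assumptions made for this example, not part of the paper.

```python
from fractions import Fraction

def semantic_pir_capacity(lengths, probs, n, k):
    """Evaluate the capacity expression of Theorem 1.

    lengths: row counts L_1 >= ... >= L_M; probs: retrieval probabilities p_i.
    """
    assert all(a >= b for a, b in zip(lengths, lengths[1:])), "need L1 >= ... >= LM"
    expected_len = sum(Fraction(p) * l for p, l in zip(probs, lengths))   # E[L]
    ratio = Fraction(k, n)
    # L_1 + (K/N) L_2 + ... + (K/N)^(M-1) L_M
    weighted = sum(ratio ** i * l for i, l in enumerate(lengths))
    return expected_len / weighted

# (5, 3) code, two messages with 100 and 50 rows, equally likely to be requested.
capacity = semantic_pir_capacity([100, 50], [Fraction(1, 2), Fraction(1, 2)], n=5, k=3)
# Equal-length messages: the expression no longer depends on the probabilities.
equal_case = semantic_pir_capacity([10, 10], [Fraction(1, 3), Fraction(2, 3)], n=5, k=3)
```

For equal message lengths the value reduces to $\left(1 + \frac{K}{N} + \dots + \left(\frac{K}{N}\right)^{M-1}\right)^{-1}$, which is independent of the prior probabilities.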
Corollary 1 For messages with arbitrary lengths and for any
given (N, K) MDS code, the semantic PIR capacity with
coded databases outperforms the achievable PIR rate with
zero-padding, when message lengths are not identical.
Proof: Let $R_c$ and $C_c$ be the achievable rate and the capacity
of classical coded PIR, respectively. For a setting with an
$(N, K)$ MDS code and $M$ messages with $L_1 \geq L_2 \geq \dots \geq L_M$,
the classical coded PIR scheme zero-pads messages
$W_2, \dots, W_M$ such that all messages contain $L_1$ rows. Therefore,
the download cost in classical coded PIR is $D = \frac{L_1}{C_c}$.
However, since the messages do not all have the same number of rows,
the effective retrieval rate of classical coded PIR is,
$$R_c = \left(\frac{L_1}{E[L]} + \frac{K}{N}\frac{L_1}{E[L]} + \dots + \left(\frac{K}{N}\right)^{M-1}\frac{L_1}{E[L]}\right)^{-1}. \tag{6}$$
Since $L_1 \geq \dots \geq L_M$, every term in (6) is at least as large as the
corresponding term in (5), and strictly larger for every $i$ with $L_i < L_1$.
Hence, comparison of (6) and (5) shows that
semantic coded PIR outperforms classical coded PIR in terms
of the retrieval rate.
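The comparison in Corollary 1 can be checked numerically. The sketch below evaluates (5) and (6) for an assumed setting (a $(5, 3)$ code, $L_1 = 100$, $L_2 = 50$, $p_1 = 1/4$, $p_2 = 3/4$); the function names are hypothetical.

```python
from fractions import Fraction

def capacity(lengths, probs, n, k):
    """Semantic PIR capacity, equation (5)."""
    e_len = sum(p * l for p, l in zip(probs, lengths))
    return e_len / sum(Fraction(k, n) ** i * l for i, l in enumerate(lengths))

def zero_padded_rate(lengths, probs, n, k):
    """Effective rate of classical coded PIR with zero padding, equation (6)."""
    e_len = sum(p * l for p, l in zip(probs, lengths))
    longest = max(lengths)   # every message is padded to L_1 rows
    return e_len / sum(Fraction(k, n) ** i * longest for i in range(len(lengths)))

lengths = [100, 50]
probs = [Fraction(1, 4), Fraction(3, 4)]
semantic = capacity(lengths, probs, n=5, k=3)
padded = zero_padded_rate(lengths, probs, n=5, k=3)
gain = semantic / padded     # ratio of the two denominators, independent of probs
```

In this instance the gain factor is $160/130 = 16/13$, and it depends only on the message lengths and the code parameters, not on the prior probabilities.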
IV. ACHIEVABILITY PROOF
In this section, we present a scheme that achieves the
capacity expression in Theorem 1.
A. Achievable Scheme
This scheme is an extension to the scheme in [44]. All
queries of this scheme are in blocks of ℓ-sums of coded message bits for ℓ ∈ {1, . . . , M }. Our proposed scheme optimizes
the number of queries in each block of ℓ-sums, based on the
message lengths $\{L_i\}_{i=1}^{M}$. This scheme utilizes non-uniform
subpacketization, as opposed to uniform subpacketization used
in [44]. The general achievability scheme is as follows:
1) Message index assignment, permutation of row indices and calculation of retrieval parameters: Assign
message indices in the descending order of the message
sizes, i.e., $L_1 \geq L_2 \geq \dots \geq L_M$. Permute the rows
of all messages independently from each other using
random permutations, privately from the databases. Calculate the retrieval parameters $\upsilon_1, \dots, \upsilon_M$ using (14).
Assume that the desired message is $W_j$ in the sequel.
2) Singletons: Download $\upsilon_j$ different coded bits corresponding to message $W_j$ from the $n$th database, for
$n \in [N]$. From each database, download $\upsilon_i$ coded bits
of $W_i$, $i \in [M]$, $i \neq j$, such that there exist $K$ coded
bits that are downloaded from $K$ different databases that
correspond to the same row of $W_i$ (i.e., for some row $t$,
$x_t^{[i]} h_{n_1}, \dots, x_t^{[i]} h_{n_K}$ for $n_1, \dots, n_K \in [N]$ with $n_a \neq n_b$
for all $a \neq b$, $a, b \in \{1, \dots, K\}$). This is required to decode the
side information. Hence, for $W_i$, $i \neq j$, there are $N\upsilon_i$
coded bits corresponding to $\frac{N\upsilon_i}{K}$ different rows.
3) Sums of two elements: There are two types of blocks
in this step. The first block is the sums involving bits
of the desired message, $W_j$, and the other block is
the sums that do not have any bits from $W_j$. In the
first block, make use of the side information (singletons
corresponding to $W_i$, $i \neq j$) downloaded in the previous
step. Consider a 2-sum corresponding to coded bits
of $W_j$, $W_i$, $i \neq j$. For a given database, download
$\left(\frac{N}{K} - 1\right) \min\{\upsilon_i, \upsilon_j\}$ 2-sums utilizing all the side information bits of $W_i$ from other databases. Each 2-sum
contains a coded bit corresponding to a new row of $W_j$
and a coded bit of $W_i$ which corresponds to a row that
was already decoded from $K$ different databases (which
do not include this database) in the previous step.
The second block of 2-sums contains coded bits corresponding to $W_{i_1}, W_{i_2}$, $i_1 \neq i_2 \neq j$. Download
$\left(\frac{N}{K} - 1\right) \min\{\upsilon_{i_1}, \upsilon_{i_2}\}$ 2-sums from each
database of the form $(x_{r_a}^{[i_1]} + x_{r_b}^{[i_2]}) h_n$, $n \in [N]$, where $r_a$
and $r_b$ are new rows of $W_{i_1}$ and $W_{i_2}$. Each pair of rows
$(r_a, r_b)$ must be downloaded from $K$ different databases
for correct decoding of $x_{r_a}^{[i_1]} + x_{r_b}^{[i_2]}$. Thus, the second
block contains $N\left(\frac{N}{K} - 1\right) \min\{\upsilon_{i_1}, \upsilon_{i_2}\}$ bits, which
corresponds to $\frac{N\left(\frac{N}{K} - 1\right) \min\{\upsilon_{i_1}, \upsilon_{i_2}\}}{K}$ different rows.
4) Sums of $\ell$ elements: There are two types of blocks
similar to sums of two. The first block contains queries
of the form $(x_r^{[j]} + x_{r_1}^{[i_1]} + \dots + x_{r_{\ell-1}}^{[i_{\ell-1}]}) h_n$ for $i_1 \neq \dots \neq i_{\ell-1} \neq j$,
where $x_r^{[j]}$ is a new row of $W_j$
and $x_{r_1}^{[i_1]} + \dots + x_{r_{\ell-1}}^{[i_{\ell-1}]}$ is an already decoded $(\ell - 1)$-sum
from the second block in the previous step.
For a given database and given $i_1, \dots, i_{\ell-1}$, there are
$\left(\frac{N}{K} - 1\right)^{\ell-1} \upsilon_{\min\{j, i_1, \dots, i_{\ell-1}\}}$ such $\ell$-sums.
The second block of $\ell$-sums contains queries of the
form $(x_{t_1}^{[i_1]} + \dots + x_{t_\ell}^{[i_\ell]}) h_n$ for $i_1 \neq \dots \neq i_\ell \neq j$,
where $(t_1, \dots, t_\ell)$ are new rows of $W_{i_1}, \dots, W_{i_\ell}$
which are repeated at $K$ different databases for
the correct decoding of $x_{t_1}^{[i_1]} + \dots + x_{t_\ell}^{[i_\ell]}$. There are
$\left(\frac{N}{K} - 1\right)^{\ell-1} \upsilon_{\min\{i_1, \dots, i_\ell\}}$ such $\ell$-sums in a given
database for a given $\ell$-tuple $(i_1, \dots, i_\ell)$. Therefore,
there are $N\left(\frac{N}{K} - 1\right)^{\ell-1} \upsilon_{\min\{i_1, \dots, i_\ell\}}$ $\ell$-sums in total,
belonging to $\frac{N\left(\frac{N}{K} - 1\right)^{\ell-1} \upsilon_{\min\{i_1, \dots, i_\ell\}}}{K}$ different $\ell$-tuples
of rows of $(W_{i_1}, \dots, W_{i_\ell})$.
5) Repeat the process up to sums of $M$ elements.
6) Query Repetition: To decode each row of $W_j$, the user
has to repeat the above process $K$ times, while shifting
the queries that contain rows of $W_j$ to the neighboring
database and choosing new sets of rows of $W_i$, $i \in \{1, \dots, M\}$, $i \neq j$, in each repetition. The $K$ different
linear combinations of the elements of each row of $W_j$
resulting from this process make it possible to recover
all $L_j \times K$ elements of $W_j$.
B. Rate of Semantic PIR Scheme with Coded Databases
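$E[D]$ needs to be constant for any desired message to
guarantee user privacy. Therefore, it suffices to calculate
$E[D]$ when retrieving $W_j$ for some $j$. Within one round
of queries, there are $\sum_{i=1}^{M} N\upsilon_i$ singletons and
$\sum_{i=\ell}^{M} N\left(\frac{N}{K} - 1\right)^{\ell-1}\binom{i-1}{\ell-1}\upsilon_i$ sums of $\ell$ elements,
\begin{align}
\frac{E[D]}{K} &= \sum_{i=1}^{M} N\upsilon_i + \sum_{\ell=2}^{M}\sum_{i=\ell}^{M} N\left(\frac{N}{K} - 1\right)^{\ell-1}\binom{i-1}{\ell-1}\upsilon_i \tag{7}\\
&= N\left[\sum_{i=1}^{M}\upsilon_i + \sum_{\ell=2}^{M}\sum_{i=2}^{\ell}\binom{\ell-1}{i-1}\left(\frac{N}{K} - 1\right)^{i-1}\upsilon_\ell\right] \tag{8}\\
&= K\left[\frac{N}{K}\upsilon_1 + \sum_{\ell=2}^{M}\left(\frac{N}{K}\right)^{\ell}\upsilon_\ell\right] = K\sum_{\ell=1}^{M}\left(\frac{N}{K}\right)^{\ell}\upsilon_\ell. \tag{9}
\end{align}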
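The step from (7) to (9) can be sanity-checked by brute force. The sketch below assumes, as the binomial coefficient $\binom{i-1}{\ell-1}$ in (7) indicates, that every $\ell$-sum is counted with the parameter $\upsilon_i$ of the largest message index $i$ it contains (the shortest message in the sum). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def downloads_by_enumeration(upsilon, n, k):
    """Count one round of downloads by listing every non-empty set of messages."""
    gamma = Fraction(n, k) - 1
    total = Fraction(0)
    indices = range(1, len(upsilon) + 1)
    for size in range(1, len(upsilon) + 1):
        for subset in combinations(indices, size):
            # N databases, (N/K - 1)^(size - 1) * upsilon_i sums each,
            # where i is the largest message index in the subset.
            total += n * gamma ** (size - 1) * upsilon[max(subset) - 1]
    return total

def downloads_closed_form(upsilon, n, k):
    """Right-hand side of (9): K * sum_l (N/K)^l * upsilon_l."""
    return k * sum(Fraction(n, k) ** l * u for l, u in enumerate(upsilon, start=1))

two_messages = downloads_by_enumeration([8, 3], n=5, k=3)
three_messages = downloads_by_enumeration([4, 2, 1], n=5, k=3)
```

With $\upsilon = (8, 3)$ and a $(5, 3)$ code, one round costs 65 downloads (40 singletons of $W_1$, 15 singletons of $W_2$ and 10 two-sums), in agreement with the closed form.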
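For $E[L]$, we calculate the total number of useful bits (bits
of $W_j$). Considering a single round of queries, there are $N\upsilon_j$
rows of $W_j$ retrieved in terms of singletons. The number of
rows retrieved when $W_j$ is the shortest message in an $\ell$-sum
containing a row of $W_j$ is $N\left(\frac{N}{K} - 1\right)^{\ell-1}\binom{j-1}{\ell-1}\upsilon_j$. The number of rows retrieved when $W_i$, $i \neq j$, is the shortest message
in an $\ell$-sum containing a row of $W_j$ is $N\left(\frac{N}{K} - 1\right)^{\ell-1}\binom{i-2}{\ell-2}\upsilon_i$.
Let $U_j$ denote the total number of useful bits retrieved.
Therefore, $\frac{U_j}{K}$ can be calculated by counting the number of
different rows of $W_j$ that correspond to all coded bits as,
\begin{align}
\frac{U_j}{K} &= N\upsilon_j + \sum_{\ell=2}^{j} N\left(\frac{N}{K} - 1\right)^{\ell-1}\binom{j-1}{\ell-1}\upsilon_j
+ \sum_{\ell=2}^{j}\sum_{i=j+1}^{M} N\left(\frac{N}{K} - 1\right)^{\ell-1}\binom{i-2}{\ell-2}\upsilon_i \notag\\
&\quad + \sum_{\ell=j+1}^{M}\sum_{i=\ell}^{M} N\left(\frac{N}{K} - 1\right)^{\ell-1}\binom{i-2}{\ell-2}\upsilon_i \tag{10}\\
&= N\upsilon_j\sum_{\ell=0}^{j-1}\binom{j-1}{\ell}\gamma^{\ell} + N\upsilon_{j+1}\gamma\sum_{\ell=0}^{j-1}\binom{j-1}{\ell}\gamma^{\ell}
+ N\upsilon_{j+2}\gamma\sum_{\ell=0}^{j}\binom{j}{\ell}\gamma^{\ell} + \dots + N\upsilon_M\gamma\sum_{\ell=0}^{M-2}\binom{M-2}{\ell}\gamma^{\ell} \tag{11}\\
&= K\left[\left(\frac{N}{K}\right)^{j}\upsilon_j + \gamma\sum_{i=j+1}^{M}\left(\frac{N}{K}\right)^{i-1}\upsilon_i\right], \tag{12}
\end{align}
where $\gamma = \frac{N}{K} - 1$. Thus, the subpacketization for $W_j$ can be
defined as $\frac{U_j}{K}$, which represents the number of rows of $W_j$
that can be retrieved by a single use of the scheme.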
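The closed form (12) can be compared against a direct enumeration of all sums that contain a row of $W_j$. This is a minimal sketch with hypothetical function names; as in (10), each sum is weighted by the parameter $\upsilon$ of its largest message index.

```python
from fractions import Fraction
from itertools import combinations

def useful_rows_by_enumeration(upsilon, j, n, k):
    """Rows of W_j collected in one round: singletons plus every sum containing W_j."""
    gamma = Fraction(n, k) - 1
    others = [i for i in range(1, len(upsilon) + 1) if i != j]
    total = Fraction(0)
    for extra in range(len(others) + 1):          # extra = l - 1 companion messages
        for subset in combinations(others, extra):
            shortest = max(subset + (j,))         # largest index = shortest message
            total += n * gamma ** extra * upsilon[shortest - 1]
    return total

def useful_rows_closed_form(upsilon, j, n, k):
    """Right-hand side of (12)."""
    ratio = Fraction(n, k)
    gamma = ratio - 1
    tail = sum(ratio ** (i - 1) * upsilon[i - 1] for i in range(j + 1, len(upsilon) + 1))
    return k * (ratio ** j * upsilon[j - 1] + gamma * tail)

rows_w1 = useful_rows_by_enumeration([8, 3], j=1, n=5, k=3)
rows_w2 = useful_rows_by_enumeration([8, 3], j=2, n=5, k=3)
```

For $\upsilon = (8, 3)$ and a $(5, 3)$ code this gives subpacketizations of 50 rows for $W_1$ and 25 rows for $W_2$.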
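Since the total number of rows of each $W_j$, $j \in \{1, \dots, M\}$, has
to be a multiple of its own subpacketization by a common factor,
$$L_j = \alpha\frac{U_j}{K}, \quad j = 1, \dots, M \tag{13}$$
for some $\alpha \in \mathbb{N}$. Solving (12), (13) for the retrieval parameters
$\upsilon_1, \dots, \upsilon_M$ with $\gamma = \frac{N}{K} - 1$ leads to:
$$\begin{bmatrix}\upsilon_1\\ \upsilon_2\\ \vdots\\ \upsilon_M\end{bmatrix}
= \frac{1}{K\alpha}
\begin{bmatrix}
\frac{K}{N} & -\left(\frac{K}{N}\right)^{2}\gamma & \dots & -\left(\frac{K}{N}\right)^{M}\gamma\\
0 & \left(\frac{K}{N}\right)^{2} & \dots & -\left(\frac{K}{N}\right)^{M}\gamma\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \dots & \left(\frac{K}{N}\right)^{M}
\end{bmatrix}
\begin{bmatrix}L_1\\ L_2\\ \vdots\\ L_M\end{bmatrix} \tag{14}$$
In order for the values of $\upsilon_i$, $i \in \{1, \dots, M\}$, to be integers,
this scheme requires all $L_i$'s to be multiples of $N^M$. Here,
$\alpha$ should be chosen to be the greatest common divisor (gcd)
of the elements of the vector resulting from multiplying the
matrix and the vector on the right side of (14). This allows
the shortest subpacketization levels for all messages.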
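The computation of $\alpha$ and $\upsilon_1, \dots, \upsilon_M$ from (14) can be sketched as follows. This is an illustrative implementation under the assumption that the scaled vector (matrix times length vector, divided by $K$) must be integral before the gcd is taken; the function name `retrieval_parameters` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def retrieval_parameters(lengths, n, k):
    """Return (alpha, [upsilon_1, ..., upsilon_M]) following the structure of (14)."""
    ratio = Fraction(k, n)
    gamma = Fraction(n, k) - 1
    m = len(lengths)
    scaled = []
    for i in range(1, m + 1):
        # Row i of (14): (K/N)^i L_i - gamma * sum_{t > i} (K/N)^t L_t, divided by K.
        tail = sum(ratio ** t * lengths[t - 1] for t in range(i + 1, m + 1))
        scaled.append((ratio ** i * lengths[i - 1] - gamma * tail) / k)
    if any(v.denominator != 1 for v in scaled):
        raise ValueError("message lengths do not yield integer retrieval parameters")
    alpha = reduce(gcd, (v.numerator for v in scaled))
    return alpha, [v.numerator // alpha for v in scaled]

alpha, upsilon = retrieval_parameters([100, 50], n=5, k=3)
```

For $L_1 = 100$, $L_2 = 50$ and a $(5, 3)$ code, the scaled vector is $(16, 6)$, so $\alpha = 2$ and $(\upsilon_1, \upsilon_2) = (8, 3)$.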
The total and useful numbers of bits downloaded ((9) and
(12) respectively) are both within one subpacketization level.
These downloads are repeated α times to download the entire
message; see also (13). Thus, the achievable rate is given by,
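\begin{align}
R &= \frac{K E[L]}{E[D]} = \frac{K\sum_{i=1}^{M} p_i L_i}{\alpha K^{2}\sum_{i=1}^{M}\frac{N^{i}}{K^{i}}\upsilon_i} \tag{15}\\
&= \frac{E[L]}{\alpha K\frac{1}{K\alpha}\sum_{i=1}^{M}\frac{N^{i}}{K^{i}}\left[\frac{K^{i}}{N^{i}}L_i - \left(\frac{N}{K} - 1\right)\sum_{t=i+1}^{M}\frac{K^{t}}{N^{t}}L_t\right]} \tag{16}\\
&= \frac{E[L]}{\sum_{i=1}^{M}\left[L_i - \left(\frac{N}{K} - 1\right)\sum_{t=i+1}^{M}\frac{K^{t-i}}{N^{t-i}}L_t\right]} \tag{17}\\
&= \frac{E[L]}{L_1 + \frac{K}{N}L_2 + \sum_{i=3}^{M} L_i\left[1 - \left(1 - \frac{K^{i-1}}{N^{i-1}}\right)\right]} \tag{18}\\
&= \left(\frac{L_1}{E[L]} + \frac{K}{N}\frac{L_2}{E[L]} + \dots + \left(\frac{K}{N}\right)^{M-1}\frac{L_M}{E[L]}\right)^{-1}. \tag{19}
\end{align}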
C. A Representative Example for the Proposed Scheme
In this example, we consider a $(5, 3)$ code with $M = 2$,
$L_1 = 100$ and $L_2 = 50$. First, the row indices of $W_i \in \mathbb{F}_q^{L_i \times K}$
for $i = 1, 2$ are independently and uniformly permuted. The
rows of the first and the second messages after permutations
are denoted by $x_r^{[i]}$ for $r \in \{1, \dots, L_i\}$ and $i = 1, 2$. Messages
are indexed such that the first message is the longer. The
calculation of $\upsilon_i$, $i = 1, 2$ is as follows:
$$\begin{bmatrix}\upsilon_1\\ \upsilon_2\end{bmatrix} = \frac{1}{3\alpha}\begin{bmatrix}\frac{3}{5} & -\frac{6}{25}\\ 0 & \frac{9}{25}\end{bmatrix}\begin{bmatrix}L_1\\ L_2\end{bmatrix} \tag{20}$$
where $\alpha = \gcd\left\{\frac{L_1}{5} - \frac{2L_2}{25}, \frac{3L_2}{25}\right\} = \gcd\{16, 6\} = 2$, $\upsilon_1 = 8$ and
$\upsilon_2 = 3$. The subpacketization levels of $W_1$ and $W_2$ in terms
of the number of rows are $\frac{U_1}{3} = \frac{100}{2} = 50$ and $\frac{U_2}{3} = \frac{50}{2} = 25$,
respectively. Assume that $W_1$ is the desired message in
the sequel. Table I shows the queries sent to the databases to
retrieve $W_1$.
First, download $\upsilon_1 = 8$ different coded bits of $W_1$ from each
database as singletons. There are 40 coded bits of this form
in total, corresponding to 40 different rows of $W_1$. Download
$\upsilon_2 = 3$ coded bits of $W_2$ from each database such that three
coded bits corresponding to the same row of $W_2$ are downloaded
from three different databases for correct decoding of the
downloaded rows of $W_2$. There are 15 different coded bits
of this form corresponding to 5 different rows of $W_2$.
Next, download $\left(\frac{N}{K} - 1\right)\upsilon_2 = 2$ sums of the form $(x_{r_1}^{[1]} + x_{r_2}^{[2]}) h_n$
from the $n$th database for $n = 1, \dots, N$. Each $x_{r_1}^{[1]}$
corresponds to a new row of $W_1$ and each $x_{r_2}^{[2]}$ corresponds to
an already decoded row of $W_2$ in the previous step.
Finally, we repeat the above queries two extra times with
the queries that include rows of $W_1$ shifted to the neighboring
database and by choosing new rows of $W_2$ in each repetition.
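The side-information pattern of the first repetition in Table I can be checked mechanically. The dictionaries below transcribe, for each database, the row indices of $W_2$ downloaded as singletons and the row indices of $W_2$ used inside the 2-sums; the variable names are hypothetical.

```python
from collections import Counter

N, K = 5, 3

# First repetition of Table I: rows of W_2 requested as singletons, per database.
singletons = {1: [1, 2, 4], 2: [1, 3, 4], 3: [1, 3, 5], 4: [2, 3, 5], 5: [2, 4, 5]}
# Rows of W_2 that appear as side information inside the 2-sums, per database.
side_info = {1: [3, 5], 2: [2, 5], 3: [2, 4], 4: [1, 4], 5: [1, 3]}

# Each row of W_2 must be collected from exactly K databases to be decodable.
copies = Counter(row for rows in singletons.values() for row in rows)
decodable = all(count == K for count in copies.values())

# A database may only receive side information about rows it did not serve itself.
fresh = all(not set(side_info[db]) & set(singletons[db]) for db in singletons)

# Each decoded row is reused at the remaining N - K databases.
reuse = Counter(row for rows in side_info.values() for row in rows)
fully_used = all(count == N - K for count in reuse.values())
```

All three conditions hold for the transcribed pattern: five rows of $W_2$ are each collected from three databases and each is reused as side information at the other two.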
The rate achieved by this scheme when downloading $W_1$ is
$R_1 = \frac{3 \times 50}{3 \times 65} = \frac{10}{13}$, and the rate achieved by this scheme when
downloading $W_2$ is $R_2 = \frac{3 \times 25}{3 \times 65} = \frac{5}{13}$. Therefore, the average
rate $R$ achieved by the scheme is,
$$R = \frac{K(p_1 L_1 + p_2 L_2)}{p_1 D + p_2 D} = p_1 R_1 + p_2 R_2 = \frac{10}{13}p_1 + \frac{5}{13}p_2. \tag{21}$$
This matches the capacity expression in Theorem 1.
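The arithmetic of this example can be reproduced with a short script. It is a sketch that hard-codes the example's parameters and compares the average rate with the capacity expression of Theorem 1; the helper names `average_rate` and `capacity` are hypothetical.

```python
from fractions import Fraction

N, K = 5, 3
L = [100, 50]
upsilon = [8, 3]
alpha = 2
gamma = Fraction(N, K) - 1

# Per database and per round: 8 singletons of W_1, 3 of W_2 and 2 two-sums.
per_database = upsilon[0] + upsilon[1] + gamma * upsilon[1]
downloads = K * N * per_database                  # K rounds, N databases
useful = [K * Fraction(l, alpha) for l in L]      # useful symbols per subpacket
rates = [u / downloads for u in useful]           # R_1 and R_2

def average_rate(p1):
    """Average rate p1 * R_1 + (1 - p1) * R_2 as in (21)."""
    return p1 * rates[0] + (1 - p1) * rates[1]

def capacity(p1):
    """Capacity expression of Theorem 1 for this example."""
    expected_len = p1 * L[0] + (1 - p1) * L[1]
    return expected_len / (L[0] + Fraction(K, N) * L[1])
```

Evaluating both functions at several values of $p_1$ returns identical fractions, for instance $15/26$ at $p_1 = 1/2$.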
V. CONVERSE PROOF
In this section, we provide an upper bound on the rate of
semantic PIR from MDS-coded databases.
First, note that the answer strings generated by any set of
K databases are independent from each other, i.e.,
$$H(A_\Omega^{[m]} \mid Q_\Omega^{[m]}, W_\Delta) = \sum_{n \in \Omega} H(A_n^{[m]} \mid Q_n^{[m]}, W_\Delta), \tag{22}$$
where Ω ⊂ [N ] such that |Ω| = K, and W∆ is any subset of
messages. The proof can be found in [44, Lemma 1].
We begin the derivation of the upper bound on the rate from
the expression for E[D]. To satisfy the privacy constraint, E[D]
is the same for any desired message. Without loss of generality,
assume that the user required message is W1 ,
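\begin{align}
E[D] &= \sum_{i=1}^{N} H(A_i^{[1]}) \tag{23}\\
&\geq H(A_{1:N}^{[1]} \mid Q_{1:N}^{[1]}) \tag{24}\\
&= I(W_{1:M}; A_{1:N}^{[1]} \mid Q_{1:N}^{[1]}) \tag{25}\\
&= H(W_1) + I(W_{2:M}; A_{1:N}^{[1]}, Q_{1:N}^{[1]} \mid W_1) \tag{26}\\
&\geq K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K} I(W_{2:M}; A_\Omega^{[1]}, Q_\Omega^{[1]} \mid W_1) \tag{27}\\
&= K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K} H(A_\Omega^{[1]} \mid Q_\Omega^{[1]}, W_1) \tag{28}\\
&= K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K}\sum_{n \in \Omega} H(A_n^{[1]} \mid Q_n^{[1]}, W_1) \tag{29}\\
&= K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K}\sum_{n \in \Omega} H(A_n^{[2]} \mid Q_n^{[2]}, W_1) \tag{30}
\end{align}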
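| Repetition | Database 1 | Database 2 | Database 3 | Database 4 | Database 5 |
|---|---|---|---|---|---|
| 1 | $x^{[1]}_{1}h_1, \dots, x^{[1]}_{8}h_1$ | $x^{[1]}_{9}h_2, \dots, x^{[1]}_{16}h_2$ | $x^{[1]}_{17}h_3, \dots, x^{[1]}_{24}h_3$ | $x^{[1]}_{25}h_4, \dots, x^{[1]}_{32}h_4$ | $x^{[1]}_{33}h_5, \dots, x^{[1]}_{40}h_5$ |
| | $x^{[2]}_{1}h_1$ | $x^{[2]}_{1}h_2$ | $x^{[2]}_{1}h_3$ | $x^{[2]}_{2}h_4$ | $x^{[2]}_{2}h_5$ |
| | $x^{[2]}_{2}h_1$ | $x^{[2]}_{3}h_2$ | $x^{[2]}_{3}h_3$ | $x^{[2]}_{3}h_4$ | $x^{[2]}_{4}h_5$ |
| | $x^{[2]}_{4}h_1$ | $x^{[2]}_{4}h_2$ | $x^{[2]}_{5}h_3$ | $x^{[2]}_{5}h_4$ | $x^{[2]}_{5}h_5$ |
| | $(x^{[1]}_{41} + x^{[2]}_{3})h_1$ | $(x^{[1]}_{43} + x^{[2]}_{2})h_2$ | $(x^{[1]}_{45} + x^{[2]}_{2})h_3$ | $(x^{[1]}_{47} + x^{[2]}_{1})h_4$ | $(x^{[1]}_{49} + x^{[2]}_{1})h_5$ |
| | $(x^{[1]}_{42} + x^{[2]}_{5})h_1$ | $(x^{[1]}_{44} + x^{[2]}_{5})h_2$ | $(x^{[1]}_{46} + x^{[2]}_{4})h_3$ | $(x^{[1]}_{48} + x^{[2]}_{4})h_4$ | $(x^{[1]}_{50} + x^{[2]}_{3})h_5$ |
| 2 | $x^{[1]}_{33}h_1, \dots, x^{[1]}_{40}h_1$ | $x^{[1]}_{1}h_2, \dots, x^{[1]}_{8}h_2$ | $x^{[1]}_{9}h_3, \dots, x^{[1]}_{16}h_3$ | $x^{[1]}_{17}h_4, \dots, x^{[1]}_{24}h_4$ | $x^{[1]}_{25}h_5, \dots, x^{[1]}_{32}h_5$ |
| | $x^{[2]}_{6}h_1$ | $x^{[2]}_{6}h_2$ | $x^{[2]}_{6}h_3$ | $x^{[2]}_{7}h_4$ | $x^{[2]}_{7}h_5$ |
| | $x^{[2]}_{7}h_1$ | $x^{[2]}_{8}h_2$ | $x^{[2]}_{8}h_3$ | $x^{[2]}_{8}h_4$ | $x^{[2]}_{9}h_5$ |
| | $x^{[2]}_{9}h_1$ | $x^{[2]}_{9}h_2$ | $x^{[2]}_{10}h_3$ | $x^{[2]}_{10}h_4$ | $x^{[2]}_{10}h_5$ |
| | $(x^{[1]}_{49} + x^{[2]}_{8})h_1$ | $(x^{[1]}_{41} + x^{[2]}_{7})h_2$ | $(x^{[1]}_{43} + x^{[2]}_{7})h_3$ | $(x^{[1]}_{45} + x^{[2]}_{6})h_4$ | $(x^{[1]}_{47} + x^{[2]}_{6})h_5$ |
| | $(x^{[1]}_{50} + x^{[2]}_{10})h_1$ | $(x^{[1]}_{42} + x^{[2]}_{10})h_2$ | $(x^{[1]}_{44} + x^{[2]}_{9})h_3$ | $(x^{[1]}_{46} + x^{[2]}_{9})h_4$ | $(x^{[1]}_{48} + x^{[2]}_{8})h_5$ |
| 3 | $x^{[1]}_{25}h_1, \dots, x^{[1]}_{32}h_1$ | $x^{[1]}_{33}h_2, \dots, x^{[1]}_{40}h_2$ | $x^{[1]}_{1}h_3, \dots, x^{[1]}_{8}h_3$ | $x^{[1]}_{9}h_4, \dots, x^{[1]}_{16}h_4$ | $x^{[1]}_{17}h_5, \dots, x^{[1]}_{24}h_5$ |
| | $x^{[2]}_{11}h_1$ | $x^{[2]}_{11}h_2$ | $x^{[2]}_{11}h_3$ | $x^{[2]}_{12}h_4$ | $x^{[2]}_{12}h_5$ |
| | $x^{[2]}_{12}h_1$ | $x^{[2]}_{13}h_2$ | $x^{[2]}_{13}h_3$ | $x^{[2]}_{13}h_4$ | $x^{[2]}_{14}h_5$ |
| | $x^{[2]}_{14}h_1$ | $x^{[2]}_{14}h_2$ | $x^{[2]}_{15}h_3$ | $x^{[2]}_{15}h_4$ | $x^{[2]}_{15}h_5$ |
| | $(x^{[1]}_{47} + x^{[2]}_{13})h_1$ | $(x^{[1]}_{49} + x^{[2]}_{12})h_2$ | $(x^{[1]}_{41} + x^{[2]}_{12})h_3$ | $(x^{[1]}_{43} + x^{[2]}_{11})h_4$ | $(x^{[1]}_{45} + x^{[2]}_{11})h_5$ |
| | $(x^{[1]}_{48} + x^{[2]}_{15})h_1$ | $(x^{[1]}_{50} + x^{[2]}_{15})h_2$ | $(x^{[1]}_{42} + x^{[2]}_{14})h_3$ | $(x^{[1]}_{44} + x^{[2]}_{14})h_4$ | $(x^{[1]}_{46} + x^{[2]}_{13})h_5$ |

TABLE I
THE QUERY TABLE FOR THE RETRIEVAL OF $W_1$.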
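\begin{align}
&= K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K} H(A_\Omega^{[2]} \mid Q_\Omega^{[2]}, W_1) \tag{31}\\
&\geq K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K} H(A_\Omega^{[2]} \mid Q_{1:N}^{[2]}, W_1) \tag{32}\\
&\geq K L_1 + \frac{K}{N} H(A_{1:N}^{[2]} \mid Q_{1:N}^{[2]}, W_1) \tag{33}\\
&= K L_1 + \frac{K}{N} I(W_{2:M}; A_{1:N}^{[2]}, Q_{1:N}^{[2]} \mid W_1) \tag{34}\\
&= K L_1 + \frac{K}{N} I(W_{2:M}; W_2, A_{1:N}^{[2]}, Q_{1:N}^{[2]} \mid W_1) \tag{35}\\
&= K L_1 + \frac{K}{N}\left(K L_2 + I(W_{3:M}; A_{1:N}^{[2]}, Q_{1:N}^{[2]} \mid W_{1:2})\right) \tag{36}\\
&= K L_1 + \frac{K}{N} K L_2 + \frac{K}{N} I(W_{3:M}; A_{1:N}^{[2]}, Q_{1:N}^{[2]} \mid W_{1:2}), \tag{37}
\end{align}
where (29), (30), and (33) follow from (22), the privacy
constraint in (3), and Han's inequality
$\frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K} H(A_\Omega^{[m]} \mid W_\Delta, Q_{1:N}^{[m]}) \geq \frac{K}{N} H(A_{1:N}^{[m]} \mid W_\Delta, Q_{1:N}^{[m]})$,
respectively. The recursive application
of (26)-(37) on the last two terms in (37) gives
$$R = \frac{K E[L]}{E[D]} \leq \frac{K E[L]}{K\left(L_1 + \frac{K}{N}L_2 + \dots + \frac{K^{M-1}}{N^{M-1}}L_M\right)} \tag{38}$$
leading to the capacity expression in Theorem 1.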
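As a worked instance of the bound, consider the setting of the example in Section IV-C, i.e., a $(5, 3)$ code with $M = 2$, $L_1 = 100$ and $L_2 = 50$. With only two messages the set $W_{3:M}$ is empty, so the last term in (37) vanishes and no further recursion is needed:

```latex
\begin{align*}
E[D] &\geq K L_1 + \frac{K}{N} K L_2
      = 3 \cdot 100 + \frac{3}{5} \cdot 3 \cdot 50 = 390, \\
R = \frac{K E[L]}{E[D]} &\leq \frac{3\,(100 p_1 + 50 p_2)}{390}
      = \frac{10}{13} p_1 + \frac{5}{13} p_2 .
\end{align*}
```

This coincides with the rate in (21). It is also consistent with the download count of the scheme, which uses $\alpha = 2$ subpackets of $3 \times 65 = 195$ downloaded symbols each.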
VI. DISCUSSION
In this paper, we presented a capacity-achieving scheme
for semantic PIR from MDS-coded databases. An alternative
description of the scheme in Section IV is as follows. Each
MDS coded database contains Li coded bits representing Wi
with L1 ≥ L2 ≥ · · · ≥ LM . Initially, consider the first LM
coded bits of all M messages and perform equal-length PIR as
described in [44]. Then, consider the next LM −1 − LM coded
bits of all messages except WM and perform equal-length PIR
with the remaining M − 1 messages. Continue this process
until the last L1 − L2 coded bits of W1 and perform PIR
with a single message. This process yields the same outcome
described in the scheme in Section IV.
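The equivalence of the two descriptions can be checked at the level of download cost. The sketch below assumes that each layer of the alternative description is served at the equal-length MDS-coded PIR capacity $\left(1 + \frac{K}{N} + \dots + \left(\frac{K}{N}\right)^{m-1}\right)^{-1}$ for $m$ messages; the function names are hypothetical.

```python
from fractions import Fraction

def equal_length_download(rows, num_messages, n, k):
    """Symbols downloaded to retrieve `rows` rows (K symbols each) among
    num_messages equal-length messages, at the equal-length capacity."""
    inverse_capacity = sum(Fraction(k, n) ** s for s in range(num_messages))
    return k * rows * inverse_capacity

def layered_download(lengths, n, k):
    """Total download of the layered description for L_1 >= ... >= L_M."""
    padded = list(lengths) + [0]
    total = Fraction(0)
    for m in range(len(lengths), 0, -1):
        # Layer shared by messages W_1, ..., W_m: L_m - L_{m+1} rows of each.
        layer_rows = padded[m - 1] - padded[m]
        total += equal_length_download(layer_rows, m, n, k)
    return total

def semantic_download(lengths, n, k):
    """K * (L_1 + (K/N) L_2 + ... + (K/N)^(M-1) L_M), the denominator in (38)."""
    return k * sum(Fraction(k, n) ** i * l for i, l in enumerate(lengths))

layered = layered_download([100, 50], n=5, k=3)
```

For $L_1 = 100$, $L_2 = 50$ and a $(5, 3)$ code, the two layers cost $240 + 150 = 390$ symbols, which equals $K\left(L_1 + \frac{K}{N}L_2\right)$.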
Nevertheless, the description of the scheme in Section IV
provides a systematic method of calculating the subpacketization based on the message sizes $\{L_i\}_{i=1}^{M}$, such that the scheme
described on a single subpacket is applied repeatedly throughout the retrieval process in the same way. This is in contrast to
the alternative description, where the subpacketization changes
from one block to another (blocks of sizes LM and Li − Li+1
bits for i = 1, . . . , M − 1). In summary, the description of the
scheme in Section IV uses a fixed subpacketization, while that
of the alternative scheme uses irregular subpacketization.
REFERENCES
[1] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan. Private information
retrieval. Journal of the ACM, 45(6):965–981, November 1998.
[2] H. Sun and S. A. Jafar. The capacity of private information retrieval.
IEEE Trans. on Info. Theory, 63(7):4075–4088, July 2017.
[3] H. Sun and S. A. Jafar. The capacity of robust private information
retrieval with colluding databases. IEEE Trans. on Info. Theory,
64(4):2361–2370, April 2018.
[4] R. Tajeddine, O. W. Gnilke, D. Karpuk, R. Freij-Hollanti, C. Hollanti,
and S. El Rouayheb. Private information retrieval schemes for coded
data with arbitrary collusion patterns. In IEEE ISIT, June 2017.
[5] R. Bitar and S. El Rouayheb. Staircase-PIR: Universally robust private
information retrieval. In IEEE ITW, pages 1–5, November 2018.
[6] H. Sun and S. A. Jafar. The capacity of symmetric private information
retrieval. IEEE Transactions on Information Theory, 65(1):322–329,
January 2019.
[7] Q. Wang, H. Sun, and M. Skoglund. Symmetric private information
retrieval with mismatched coded messages and randomness. In IEEE
ISIT, pages 365–369, July 2019.
[8] T. Guo, R. Zhou, and C. Tian. On the information leakage in private
information retrieval systems. Available at arXiv: 1909.11605.
[9] K. Banawan and S. Ulukus. Multi-message private information retrieval:
Capacity results and near-optimal schemes. IEEE Trans. on Info. Theory,
64(10):6842–6862, October 2018.
[10] K. Banawan and S. Ulukus. The capacity of private information retrieval
from Byzantine and colluding databases. IEEE Trans. on Info. Theory,
65(2):1206–1219, February 2019.
[11] R. Tandon. The capacity of cache aided private information retrieval.
In Allerton Conference, October 2017.
[12] Y.-P. Wei, K. Banawan, and S. Ulukus. Fundamental limits of cache-aided private information retrieval with unknown and uncoded prefetching. IEEE Trans. on Info. Theory, 65(5):3215–3232, May 2019.
[13] Y.-P. Wei, K. Banawan, and S. Ulukus. Cache-aided private information
retrieval with partially known uncoded prefetching: Fundamental limits.
IEEE JSAC, 36(6):1126–1139, June 2018.
[14] S. Kumar, A. G. i Amat, E. Rosnes, and L. Senigagliesi. Private
information retrieval from a cellular network with caching at the edge.
IEEE Trans. on Communications, 67(7):4900–4912, July 2019.
[15] S. Kadhe, B. Garcia, A. Heidarzadeh, S. El Rouayheb, and A. Sprintson.
Private information retrieval with side information. IEEE Trans. on Info.
Theory, 66(4):2032–2043, April 2020.
[16] Z. Chen, Z. Wang, and S. Jafar. The capacity of T -private information
retrieval with private side information. Available at arXiv:1709.03022.
[17] Y.-P. Wei, K. Banawan, and S. Ulukus. The capacity of private
information retrieval with partially known private side information. IEEE
Trans. on Info. Theory, 65(12):8222–8231, December 2019.
[18] S. P. Shariatpanahi, M. J. Siavoshani, and M. A. Maddah-Ali. Multi-message private information retrieval with private side information. In
IEEE ITW, pages 1–5, November 2018.
[19] S. Li and M. Gastpar. Converse for multi-server single-message PIR
with side information. Available at arXiv:1809.09861.
[20] H. Sun and S. A. Jafar. The capacity of private computation. IEEE
Trans. on Info. Theory, 65(6):3880–3897, June 2019.
[21] M. Mirmohseni and M. A. Maddah-Ali. Private function retrieval. In
IWCIT, pages 1–6, April 2018.
[22] Z. Chen, Z. Wang, and S. Jafar. The asymptotic capacity of private
search. In IEEE ISIT, June 2018.
[23] M. A. Attia, D. Kumar, and R. Tandon. The capacity of private information retrieval from uncoded storage constrained databases. Available
at arXiv:1805.04104v2.
[24] C. Tian, H. Sun, and J. Chen. Capacity-achieving private information
retrieval codes with optimal message size and upload cost. IEEE Trans.
on Info. Theory, 65(11):7613–7627, Nov 2019.
[25] Y.-P. Wei and S. Ulukus. The capacity of private information retrieval
with private side information under storage constraints. IEEE Trans. on
Info. Theory, 66(4):2023–2031, April 2020.
[26] K. Banawan, B. Arasli, and S. Ulukus. Improved storage for efficient
private information retrieval. In IEEE ITW, August 2019.
[27] C. Tian. On the storage cost of private information retrieval. Available
at arXiv:1910.11973.
[28] Y.-P. Wei, B. Arasli, K. Banawan, and S. Ulukus. The capacity of private
information retrieval from decentralized uncoded caching databases.
Information, 10, December 2019.
[29] K. Banawan, B. Arasli, Y. P. Wei, and S. Ulukus. The capacity of private
information retrieval from heterogeneous uncoded caching databases.
IEEE Trans. on Info. Theory, 66(6):3407–3416, 2020.
[30] N. Raviv and I. Tamo. Private information retrieval in graph based
replication systems. In IEEE ISIT, June 2018.
[31] K. Banawan and S. Ulukus. Private information retrieval from non-replicated databases. In IEEE ISIT, pages 1272–1276, July 2019.
[32] K. Banawan and S. Ulukus. Private information retrieval through
wiretap channel II: Privacy meets security. IEEE Trans. on Info. Theory,
66(7):4129–4149, 2020.
[33] H. Sun and S. A. Jafar. Optimal download cost of private information
retrieval for arbitrary message length. IEEE Trans. on Info. Forensics
and Security, 12(12):2920–2932, December 2017.
[34] Q. Wang, H. Sun, and M. Skoglund. The capacity of private information
retrieval with eavesdroppers. IEEE Trans. on Info. Theory, 65(5):3198–
3214, May 2019.
[35] H. Yang, W. Shin, and J. Lee. Private information retrieval for secure
distributed storage systems. IEEE Trans. on Info. Forensics and Security,
13(12):2953–2964, December 2018.
[36] Z. Jia, H. Sun, and S. Jafar. Cross subspace alignment and the asymptotic
capacity of X-secure T -private information retrieval. IEEE Trans. on
Info. Theory, 65(9):5783–5798, September 2019.
[37] R. Zhou, C. Tian, H. Sun, and T. Liu. Capacity-achieving private
information retrieval codes from MDS-coded databases with minimum
message size. Available at arXiv: 1903.08229.
[38] K. Banawan and S. Ulukus. Asymmetry hurts: Private information
retrieval under asymmetric-traffic constraints. IEEE Trans. on Info.
Theory, 65(11):7628–7645, November 2019.
[39] K. Banawan and S. Ulukus. Noisy private information retrieval: On
separability of channel coding and information retrieval. IEEE Trans.
on Info. Theory, 65(12):8232–8249, December 2019.
[40] R. G. L. D’Oliveira and S. El Rouayheb. One-shot PIR: Refinement and
lifting. IEEE Trans. on Info. Theory, 66(4):2443–2455, April 2020.
[41] R. Tajeddine, A. Wachter-Zeh, and C. Hollanti. Private information
retrieval over random linear networks. Available at arXiv:1810.08941.
[42] Z. Wang, K. Banawan, and S. Ulukus. Private set intersection: A multi-message symmetric private information retrieval perspective. Available
at arXiv: 1912.13501.
[43] Z. Wang, K. Banawan, and S. Ulukus. Multi-party private set intersection: An information-theoretic approach. Available at arXiv:
2008.07504.
[44] K. Banawan and S. Ulukus. The capacity of private information retrieval
from coded databases. IEEE Trans. on Info. Theory, 64(3):1945–1956,
March 2018.
[45] R. Freij-Hollanti, O. Gnilke, C. Hollanti, and D. Karpuk. Private
information retrieval from coded databases with colluding servers. SIAM
Journal on Applied Algebra and Geometry, 1(1):647–664, 2017.
[46] H. Sun and S. A. Jafar. Private information retrieval from MDS coded
data with colluding servers: Settling a conjecture by Freij-Hollanti et al.
IEEE Trans. on Info. Theory, 64(2):1000–1022, February 2018.
[47] Y. Zhang and G. Ge. A general private information retrieval scheme
for MDS coded databases with colluding servers. Designs, Codes and
Cryptography, 87(11), November 2019.
[48] Y. Zhang and G. Ge. Multi-file private information retrieval from MDS
coded databases with colluding servers. Available at arXiv: 1705.03186.
[49] R. Tandon, M. Abdul-Wahid, F. Almoualem, and D. Kumar. PIR from
storage constrained databases - coded caching meets PIR. IEEE ICC,
May 2018.
[50] T. Chan, S. Ho, and H. Yamamoto. Private information retrieval for
coded storage. In IEEE ISIT, June 2015.
[51] R. Tajeddine, O. W. Gnilke, and S. El Rouayheb. Private information
retrieval from MDS coded data in distributed storage systems. IEEE Trans.
on Info. Theory, 64(11):7081–7093, November 2018.
[52] R. Tajeddine and S. El Rouayheb. Robust private information retrieval
on coded data. In IEEE ISIT, June 2017.
[53] R. Tajeddine, O. W. Gnilke, D. Karpuk, R. Freij-Hollanti, and C. Hollanti. Private information retrieval from coded storage systems with
colluding, Byzantine, and unresponsive servers. IEEE Trans. on Info.
Theory, 65(6):3898–3906, June 2019.
[54] S. Vithana, K. Banawan, and S. Ulukus. Semantic private information
retrieval. Available at arXiv: 2003.13667.