
Semantic Private Information Retrieval From MDS-Coded Databases

2021

We investigate the problem of semantic private information retrieval (PIR) from coded databases, where a user wishes to download one of $M$ independent messages without revealing its identity to the databases. These messages are encoded using an $(N, K)$ MDS code and stored in $N$ non-colluding databases. The $M$ messages are allowed to have different semantics, e.g., different sizes and different probabilities of retrieval. We characterize the exact capacity of semantic PIR with coded databases and provide an achievable scheme with non-uniform subpacketization. We show that the retrieval rate of semantic PIR with coded databases outperforms that of classical PIR with coded databases when the effects of zero padding shorter messages are taken into account.

Sajani Vithana$^1$, Karim Banawan$^2$, and Sennur Ulukus$^1$

$^1$Department of Electrical and Computer Engineering, University of Maryland

$^2$Electrical Engineering Department, Faculty of Engineering, Alexandria University

I. INTRODUCTION

The classical private information retrieval (PIR) problem refers to a setting where a user downloads a desired message from a system of non-colluding replicated databases storing a number of messages, without revealing the identity of the downloaded message to the databases. This problem was first introduced by Chor et al. in [1]. The information-theoretic characterization of the classical problem is presented in [2], where the performance metric is the retrieval rate, i.e., the ratio between the number of desired message bits and the total number of downloaded bits. The supremum of all achievable retrieval rates is called the PIR capacity. The PIR capacity of many variants of the problem has been studied (see, for instance, [3]–[43]). The works most closely related to ours are PIR from MDS-coded databases, which is studied extensively in [44]–[53], and semantic PIR from replicated databases [54].
In semantic PIR, the messages exhibit different semantics, namely different message sizes and different prior probabilities of retrieval. This is motivated in practice by the fact that files in storage systems have different sizes (e.g., databases may simultaneously store text files and video files of vastly different sizes) and different popularity levels (e.g., databases may simultaneously store trending and stale files). The work in [54] characterizes the capacity of semantic PIR from replicated databases. On the other hand, MDS-coded storage systems provide an increased level of reliability for information stored across databases without incurring the excessive storage cost of direct replication. The capacity of the MDS-coded PIR problem is characterized in [44] as a function of the MDS code parameters. Given the practical relevance of both MDS coding and semantic heterogeneity, it is desirable to design a PIR scheme whose rate is optimized for both the storage code structure and the given message semantics.

In this paper, we characterize the capacity of semantic PIR from $N$ non-colluding databases that store data encoded with an arbitrary $(N, K)$ MDS code. There are $M$ messages with different semantics, i.e., each message has its own size and popularity level. More specifically, the storage system holds the messages $(W_1, \dots, W_M)$ in matrix form, where $W_i$ has $K$ columns and $L_i$ rows, $i \in \{1, \dots, M\}$. Each row is encoded with an $(N, K)$ MDS storage code and distributed to the $N$ non-colluding databases. Furthermore, the messages have arbitrary retrieval probabilities $p_i$, $i \in \{1, \dots, M\}$, which reflect their popularity levels.
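As an illustration of the storage model only (the paper allows any $(N, K)$ MDS code, not this particular one), the following minimal Python sketch builds an MDS code over a prime field $\mathbb{F}_p$ from a Vandermonde generator matrix $H = [h_1, \dots, h_N]$, lets database $n$ store the projection $x h_n$ of a row $x$, and recovers $x$ from any $K$ databases. The prime $p = 257$, the evaluation points $1, \dots, N$, and all function names are assumptions chosen for the example.

```python
# Minimal sketch of an (N, K) MDS (Reed-Solomon-style) storage code over F_P.
# Assumptions for illustration: P = 257 and evaluation points a_n = 1, ..., N.
P = 257  # prime field size (illustrative choice)


def generator_matrix(n_db, k):
    """Vandermonde generator H (k x n_db): column n is (1, a_n, ..., a_n^(k-1))."""
    points = list(range(1, n_db + 1))
    return [[pow(a, r, P) for a in points] for r in range(k)]


def encode_row(x, H):
    """Database n stores the projection x . h_n (mod P) of the row x."""
    k, n_db = len(H), len(H[0])
    return [sum(x[r] * H[r][n] for r in range(k)) % P for n in range(n_db)]


def solve_mod_p(A, b):
    """Solve the square system A y = b over F_P by Gauss-Jordan elimination."""
    k = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(k):
        piv = next(r for r in range(col, k) if M[r][col] % P != 0)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], P - 2, P)  # inverse via Fermat, since P is prime
        M[col] = [v * inv % P for v in M[col]]
        for r in range(k):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [(vr - f * vc) % P for vr, vc in zip(M[r], M[col])]
    return [M[r][k] for r in range(k)]


def decode_row(stored, dbs, H):
    """Recover the length-k row from the projections held by the k databases in dbs."""
    # Database n contributes the equation sum_r x[r] * H[r][n] = stored[n].
    A = [[H[r][n] for r in range(len(H))] for n in dbs]
    return solve_mod_p(A, [stored[n] for n in dbs])


if __name__ == "__main__":
    H = generator_matrix(5, 3)         # (N, K) = (5, 3)
    row = [17, 42, 99]                 # one row of a message
    stored = encode_row(row, H)        # contents of databases 1..5
    print(decode_row(stored, [0, 2, 4], H))  # databases 1, 3, 5 -> [17, 42, 99]
```

Any $N$ distinct evaluation points in $\mathbb{F}_p$ would work equally well, because every $K \times K$ submatrix of a Vandermonde matrix with distinct points is invertible; this is exactly the MDS property used throughout the paper.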
We investigate the interplay between the storage code parameters, non-uniform subpacketization, and the PIR rate, i.e., for a given $(N, K)$ MDS code, how can we design the retrieval parameters to exploit the heterogeneity of the message semantics and maximize the retrieval rate? We characterize the capacity of semantic PIR from $(N, K)$ MDS-coded databases as
$$C = \left(\frac{L_1}{E[L]} + \frac{K}{N}\frac{L_2}{E[L]} + \dots + \left(\frac{K}{N}\right)^{M-1}\frac{L_M}{E[L]}\right)^{-1},$$
where the expected number of rows $E[L]$ is taken with respect to the retrieval probability distribution. We provide an achievable scheme that extends the scheme introduced in [44]. The main difference from its counterpart in [44] is that our scheme uses non-uniform subpacketization, parameterized by the message sizes and the storage code parameters. Specifically, a single application of the scheme yields a different number of useful downloads (corresponding to the number of rows of the desired message) depending on which message the user requires. The converse extends that of [44] by incorporating the heterogeneity of the message sizes. Compared to the achievable MDS-coded PIR rate with zero padding, i.e., when all messages are zero-padded to the size of the largest message, the semantic PIR capacity shows a strict rate gain whenever the message sizes are not all identical.

II. PROBLEM FORMULATION

We consider an $(N, K)$ MDS-coded distributed storage system containing $M$ independent messages. The messages may have different lengths and different prior probabilities of retrieval. The prior distribution $(p_i, i \in \{1, \dots, M\})$ is known to both the databases and the user. Each message $W_i$, $i \in \{1, \dots, M\}$, is represented as a matrix in $\mathbb{F}_q^{L_i \times K}$, whose $L_i \times K$ elements are chosen uniformly and independently from $\mathbb{F}_q$. Without loss of generality, we assume that the messages are ordered by size, such that $L_1 \geq L_2 \geq \dots \geq L_M$.
The message sizes, in $q$-ary symbols, are
$$H(W_i) = K L_i, \quad i = 1, \dots, M. \tag{1}$$
The generator matrix of the $(N, K)$ code is $H \in \mathbb{F}_q^{K \times N}$, represented as $H = [h_1, h_2, \dots, h_N]$, where $h_n \in \mathbb{F}_q^{K}$, $n \in [N]$. The MDS property implies that any set of up to $K$ columns of $H$ is linearly independent. Let the $j$th row of $W_i$ be denoted by $x_j^{[i]}$, $j = 1, \dots, L_i$, $i = 1, \dots, M$. The $n$th database stores the projection $x_j^{[i]} h_n$ of each row, $n = 1, \dots, N$.

To retrieve message $W_i$, the user sends query $Q_n^{[i]}$ to the $n$th database, $n = 1, \dots, N$. The objective is to retrieve $W_i$ without revealing the index $i$ to any database. The queries and messages are independent of each other. Upon receiving the queries, the databases generate answer strings $A_n^{[i]}$, $n = 1, \dots, N$, and send them back to the user. These answer strings are deterministic functions of the stored coded messages and the received query. An achievable PIR scheme satisfies the following constraints:

Correctness: The user should be able to decode the desired message perfectly from the received answer strings. Thus,
$$H\left(W_i \mid A_1^{[i]}, \dots, A_N^{[i]}, Q_1^{[i]}, \dots, Q_N^{[i]}\right) = 0, \quad i = 1, \dots, M. \tag{2}$$

Privacy: The queries sent to the databases should not leak any information about the desired message index. Thus, the joint distributions of queries, answers, and messages must satisfy, for all $n \in [N]$ and $i, j \in [M]$ with $i \neq j$,
$$\left(Q_n^{[i]}, A_n^{[i]}, W_1, \dots, W_M\right) \sim \left(Q_n^{[j]}, A_n^{[j]}, W_1, \dots, W_M\right). \tag{3}$$

An achievable semantic PIR scheme for a coded distributed storage system is a scheme that satisfies the correctness and privacy conditions in (2) and (3). The achievable rate of a semantic PIR scheme with coded databases is defined as
$$R = \frac{E[H(W_i)]}{E[D]} = \frac{K E[L]}{E[D]}, \tag{4}$$
where $E[L] = \sum_{i=1}^{M} p_i L_i$ is the expected number of rows in a message and $E[D]$ is the expected total number of downloaded bits.
The expectation $E[\cdot]$ is taken with respect to the prior distribution $(p_i)$. The capacity of semantic PIR from coded distributed databases is the supremum of the expected retrieval rates over all achievable schemes.

III. MAIN RESULTS AND DISCUSSIONS

In this section, we present the capacity of semantic PIR with MDS-coded databases, which depends on the number of messages and their semantics as well as on the MDS storage code.

Theorem 1: The capacity of semantic PIR with an $(N, K)$ MDS-coded distributed storage system containing $M$ messages $W_i$, $i \in \{1, \dots, M\}$, with $L_i$ rows in their matrix representations (arranged in decreasing order $L_1 \geq L_2 \geq \dots \geq L_M$) and prior probabilities $p_i$, is
$$C = \left(\frac{L_1}{E[L]} + \frac{K}{N}\frac{L_2}{E[L]} + \dots + \left(\frac{K}{N}\right)^{M-1}\frac{L_M}{E[L]}\right)^{-1}, \tag{5}$$
where $E[L] = \sum_{i=1}^{M} p_i L_i$.

The achievability proof of Theorem 1 is presented in Section IV and the converse proof in Section V.

Corollary 1: For messages with arbitrary lengths and any given $(N, K)$ MDS code, the semantic PIR capacity with coded databases exceeds the achievable PIR rate with zero padding whenever the message lengths are not all identical.

Proof: Let $R_c$ and $C_c$ be the achievable rate and the capacity of classical coded PIR, respectively, where $C_c = \left(1 + \frac{K}{N} + \dots + \left(\frac{K}{N}\right)^{M-1}\right)^{-1}$ [44]. For a setting with an $(N, K)$ MDS code and $M$ messages with $L_1 \geq L_2 \geq \dots \geq L_M$, the classical coded PIR scheme zero-pads messages $W_2, \dots, W_M$ so that all messages contain $L_1$ rows. Therefore, the download cost of classical coded PIR is $D = \frac{K L_1}{C_c}$. However, since the messages do not all have $L_1$ rows, the effective retrieval rate of classical coded PIR is
$$R_c = \frac{K E[L]}{D} = \left(\frac{L_1}{E[L]} + \frac{K}{N}\frac{L_1}{E[L]} + \dots + \left(\frac{K}{N}\right)^{M-1}\frac{L_1}{E[L]}\right)^{-1}. \tag{6}$$
Since $L_1 \geq \dots \geq L_M$, a term-by-term comparison of (6) and (5) shows that semantic coded PIR outperforms classical coded PIR in terms of the retrieval rate, with strict inequality if $L_i < L_1$ for some $i$. ∎

IV. ACHIEVABILITY PROOF

In this section, we present a scheme that achieves the capacity expression in Theorem 1.

A. Achievable Scheme

This scheme extends the scheme in [44]. All queries of this scheme are organized in blocks of $\ell$-sums of coded message bits, $\ell \in \{1, \dots, M\}$. Our proposed scheme optimizes the number of queries in each block of $\ell$-sums based on the message lengths $\{L_i\}_{i=1}^{M}$. This scheme uses non-uniform subpacketization, as opposed to the uniform subpacketization used in [44]. The general achievable scheme is as follows:

1) Message index assignment, permutation of row indices, and calculation of retrieval parameters: Assign message indices in descending order of message size, i.e., $L_1 \geq L_2 \geq \dots \geq L_M$. Permute the rows of all messages independently of each other using random permutations, privately from the databases. Calculate the retrieval parameters $\upsilon_1, \dots, \upsilon_M$ using (14). Assume that the desired message is $W_j$ in the sequel.

2) Singletons: Download $\upsilon_j$ different coded bits of the desired message $W_j$ from the $n$th database, for each $n \in [N]$. From each database, download $\upsilon_i$ coded bits of $W_i$, $i \in [M]$, $i \neq j$, such that each downloaded row of $W_i$ is covered by $K$ coded bits from $K$ different databases (i.e., for some row $t$, the bits $x_t^{[i]} h_{n_1}, \dots, x_t^{[i]} h_{n_K}$ with distinct $n_1, \dots, n_K \in [N]$). This is required to decode the side information. Hence, for each $W_i$, $i \neq j$, there are $N\upsilon_i$ coded bits corresponding to $\frac{N\upsilon_i}{K}$ different rows.

3) Sums of two elements: There are two types of blocks in this step. The first block consists of sums involving bits of the desired message $W_j$; the second block consists of sums that contain no bits of $W_j$. The first block makes use of the side information (the singletons of $W_i$, $i \neq j$) downloaded in the previous step. Consider 2-sums of coded bits of $W_j$ and $W_i$, $i \neq j$. From a given database, download $\left(\frac{N}{K} - 1\right)\min\{\upsilon_i, \upsilon_j\}$ such 2-sums, utilizing all the side information bits of $W_i$ obtained from the other databases. Each 2-sum contains a coded bit of a new row of $W_j$ and a coded bit of $W_i$ belonging to a row that was already decoded in the previous step from $K$ other databases (not including this database).

The second block of 2-sums contains coded bits of $W_{i_1}$ and $W_{i_2}$, where $i_1 \neq i_2$ and $i_1, i_2 \neq j$. Download $\left(\frac{N}{K} - 1\right)\min\{\upsilon_{i_1}, \upsilon_{i_2}\}$ 2-sums from each database, of the form $\left(x_{r_a}^{[i_1]} + x_{r_b}^{[i_2]}\right) h_n$, $n \in [N]$, where $r_a$ and $r_b$ are new rows of $W_{i_1}$ and $W_{i_2}$. Each pair of rows $(r_a, r_b)$ must be downloaded from $K$ different databases to correctly decode $x_{r_a}^{[i_1]} + x_{r_b}^{[i_2]}$. Thus, the second block contains $N\left(\frac{N}{K} - 1\right)\min\{\upsilon_{i_1}, \upsilon_{i_2}\}$ bits, which correspond to $\frac{N}{K}\left(\frac{N}{K} - 1\right)\min\{\upsilon_{i_1}, \upsilon_{i_2}\}$ different pairs of rows.

4) Sums of $\ell$ elements: There are two types of blocks, as for the sums of two elements. The first block contains queries of the form $\left(x_r^{[j]} + x_{r_1}^{[i_1]} + \dots + x_{r_{\ell-1}}^{[i_{\ell-1}]}\right) h_n$ for distinct indices $i_1, \dots, i_{\ell-1}$, all different from $j$, where $x_r^{[j]}$ is a new row of $W_j$ and $x_{r_1}^{[i_1]} + \dots + x_{r_{\ell-1}}^{[i_{\ell-1}]}$ is an $(\ell-1)$-sum already decoded from the second block of the previous step. For a given database and given $i_1, \dots, i_{\ell-1}$, there are $\left(\frac{N}{K} - 1\right)^{\ell-1}\min\{\upsilon_j, \upsilon_{i_1}, \dots, \upsilon_{i_{\ell-1}}\}$ such $\ell$-sums.

The second block of $\ell$-sums contains queries of the form $\left(x_{t_1}^{[i_1]} + \dots + x_{t_\ell}^{[i_\ell]}\right) h_n$ for distinct indices $i_1, \dots, i_\ell$, all different from $j$, where $t_1, \dots, t_\ell$ are new rows of $W_{i_1}, \dots, W_{i_\ell}$ that are repeated at $K$ different databases so that $x_{t_1}^{[i_1]} + \dots + x_{t_\ell}^{[i_\ell]}$ can be correctly decoded. There are $\left(\frac{N}{K} - 1\right)^{\ell-1}\min\{\upsilon_{i_1}, \dots, \upsilon_{i_\ell}\}$ such $\ell$-sums in a given database for a given $\ell$-tuple $(i_1, \dots, i_\ell)$. Therefore, there are $N\left(\frac{N}{K} - 1\right)^{\ell-1}\min\{\upsilon_{i_1}, \dots, \upsilon_{i_\ell}\}$ $\ell$-sums in total, belonging to $\frac{N}{K}\left(\frac{N}{K} - 1\right)^{\ell-1}\min\{\upsilon_{i_1}, \dots, \upsilon_{i_\ell}\}$ different $\ell$-tuples of rows of $(W_{i_1}, \dots, W_{i_\ell})$.

5) Repeat the process up to sums of $M$ elements.

6) Query repetition: To decode each row of $W_j$, the user repeats the above process $K$ times, each time shifting the queries that contain rows of $W_j$ to the neighboring database and choosing new sets of rows of $W_i$, $i \in \{1, \dots, M\}$, $i \neq j$. This yields $K$ different linear combinations $x h_n$ of the elements of each row $x$ of $W_j$, from $K$ different databases; by the MDS property, they determine the row, so all $L_j \times K$ elements of $W_j$ can be recovered.

B. Rate of Semantic PIR Scheme with Coded Databases

To guarantee user privacy, $E[D]$ must be the same for every desired message. Therefore, it suffices to calculate $E[D]$ when retrieving $W_j$ for some $j$. Since $\upsilon_1 \geq \upsilon_2 \geq \dots \geq \upsilon_M$ (indeed, (14) gives $K\alpha(\upsilon_i - \upsilon_{i+1}) = \left(\frac{K}{N}\right)^{i}(L_i - L_{i+1}) \geq 0$), the minimum in each block above is attained by the message with the largest index. Within one round of queries (one of the $K$ repetitions in step 6), there are $\sum_{i=1}^{M} N\upsilon_i$ singletons and $\sum_{i=\ell}^{M} N\binom{i-1}{\ell-1}\left(\frac{N}{K} - 1\right)^{\ell-1}\upsilon_i$ sums of $\ell$ elements. Hence, for one application of the scheme,
$$\frac{E[D]}{K} = \sum_{i=1}^{M} N\upsilon_i + \sum_{\ell=2}^{M}\sum_{i=\ell}^{M} N\binom{i-1}{\ell-1}\left(\frac{N}{K} - 1\right)^{\ell-1}\upsilon_i \tag{7}$$
$$= N\left[\sum_{i=1}^{M}\upsilon_i + \sum_{\ell=2}^{M}\sum_{i=2}^{\ell}\binom{\ell-1}{i-1}\left(\frac{N}{K} - 1\right)^{i-1}\upsilon_\ell\right] \tag{8}$$
$$= K\left[\frac{N}{K}\upsilon_1 + \sum_{\ell=2}^{M}\left(\frac{N}{K}\right)^{\ell}\upsilon_\ell\right] = K\sum_{\ell=1}^{M}\left(\frac{N}{K}\right)^{\ell}\upsilon_\ell, \tag{9}$$
where (9) follows from the binomial theorem, $\sum_{i=2}^{\ell}\binom{\ell-1}{i-1}\left(\frac{N}{K} - 1\right)^{i-1} = \left(\frac{N}{K}\right)^{\ell-1} - 1$.

To compute the useful download, we count the retrieved bits of $W_j$. Considering a single round of queries, $N\upsilon_j$ rows of $W_j$ are retrieved as singletons. The number of rows retrieved from $\ell$-sums containing a row of $W_j$ in which $W_j$ is the shortest message is $N\binom{j-1}{\ell-1}\left(\frac{N}{K} - 1\right)^{\ell-1}\upsilon_j$. The number of rows retrieved from $\ell$-sums containing a row of $W_j$ in which $W_i$, $i \neq j$, is the shortest message is $N\binom{i-2}{\ell-2}\left(\frac{N}{K} - 1\right)^{\ell-1}\upsilon_i$. Let $U_j$ denote the total number of useful bits retrieved. Then $\frac{U_j}{K}$ is obtained by counting the different rows of $W_j$ that correspond to all downloaded coded bits:
$$\frac{U_j}{K} = N\upsilon_j + \sum_{\ell=2}^{j} N\binom{j-1}{\ell-1}\left(\frac{N}{K} - 1\right)^{\ell-1}\upsilon_j + \sum_{\ell=2}^{j}\sum_{i=j+1}^{M} N\binom{i-2}{\ell-2}\left(\frac{N}{K} - 1\right)^{\ell-1}\upsilon_i + \sum_{\ell=j+1}^{M}\sum_{i=\ell}^{M} N\binom{i-2}{\ell-2}\left(\frac{N}{K} - 1\right)^{\ell-1}\upsilon_i \tag{10}$$
$$= N\upsilon_j\sum_{\ell=0}^{j-1}\binom{j-1}{\ell}\gamma^{\ell} + N\upsilon_{j+1}\gamma\sum_{\ell=0}^{j-1}\binom{j-1}{\ell}\gamma^{\ell} + N\upsilon_{j+2}\gamma\sum_{\ell=0}^{j}\binom{j}{\ell}\gamma^{\ell} + \dots + N\upsilon_M\gamma\sum_{\ell=0}^{M-2}\binom{M-2}{\ell}\gamma^{\ell} \tag{11}$$
$$= K\left[\left(\frac{N}{K}\right)^{j}\upsilon_j + \gamma\sum_{i=j+1}^{M}\left(\frac{N}{K}\right)^{i-1}\upsilon_i\right], \tag{12}$$
where $\gamma = \frac{N}{K} - 1$, and (12) follows from the binomial theorem, $\sum_{\ell=0}^{m}\binom{m}{\ell}\gamma^{\ell} = \left(\frac{N}{K}\right)^{m}$.
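The double sums in (7) and (10) are easy to miscount, so the following Python sketch (the helper names are hypothetical) evaluates them term by term and compares them with the closed forms (9) and (12), using exact rational arithmetic. Messages are indexed from 1 as in the text, and the retrieval parameters are passed in as a list $(\upsilon_1, \dots, \upsilon_M)$.

```python
from fractions import Fraction
from math import comb


def round_download_direct(ups, N, K):
    """Per-round download E[D]/K, counted term by term as in (7)."""
    M = len(ups)
    g = Fraction(N, K) - 1                 # gamma = N/K - 1
    v = [None] + list(ups)                 # 1-based: v[i] = upsilon_i
    total = sum(N * v[i] for i in range(1, M + 1))   # singletons
    for l in range(2, M + 1):              # blocks of ell-sums
        total += sum(N * comb(i - 1, l - 1) * g ** (l - 1) * v[i]
                     for i in range(l, M + 1))
    return total


def round_download_closed(ups, N, K):
    """Closed form (9): K * sum_l (N/K)^l * upsilon_l."""
    r = Fraction(N, K)
    return K * sum(r ** l * u for l, u in enumerate(ups, start=1))


def useful_rows_direct(j, ups, N, K):
    """Rows of W_j retrieved per round, U_j/K, counted as in (10)."""
    M = len(ups)
    g = Fraction(N, K) - 1
    v = [None] + list(ups)
    rows = N * v[j]
    rows += sum(N * comb(j - 1, l - 1) * g ** (l - 1) * v[j] for l in range(2, j + 1))
    rows += sum(N * comb(i - 2, l - 2) * g ** (l - 1) * v[i]
                for l in range(2, j + 1) for i in range(j + 1, M + 1))
    rows += sum(N * comb(i - 2, l - 2) * g ** (l - 1) * v[i]
                for l in range(j + 1, M + 1) for i in range(l, M + 1))
    return rows


def useful_rows_closed(j, ups, N, K):
    """Closed form (12)."""
    M = len(ups)
    r, g = Fraction(N, K), Fraction(N, K) - 1
    v = [None] + list(ups)
    return K * (r ** j * v[j] + g * sum(r ** (i - 1) * v[i] for i in range(j + 1, M + 1)))


# Parameters of the example in Section IV-C: (N, K) = (5, 3), upsilon = (8, 3).
print(round_download_direct([8, 3], 5, 3), round_download_closed([8, 3], 5, 3))  # 65 65
print(useful_rows_closed(1, [8, 3], 5, 3), useful_rows_closed(2, [8, 3], 5, 3))  # 50 25
```

For the example of Section IV-C, this reproduces a per-round download of 65 coded bits and subpacketizations of 50 and 25 rows for $W_1$ and $W_2$. The identities hold for any list of retrieval parameters, which is why the sketch checks them before $\upsilon_i$ is fixed by (14).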
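By (12), the subpacketization of $W_j$ can be defined as $\frac{U_j}{K}$, i.e., the number of rows of $W_j$ that can be retrieved by a single application of the scheme. Since the number of rows of each message must be an integer multiple of its own subpacketization, with the same multiplier for all messages,
$$L_j = \alpha\frac{U_j}{K}, \quad j = 1, \dots, M, \tag{13}$$
for some $\alpha \in \mathbb{N}$. Solving (12) and (13) for the retrieval parameters $\upsilon_1, \dots, \upsilon_M$, with $\gamma = \frac{N}{K} - 1$, leads to
$$K\alpha\begin{bmatrix}\upsilon_1\\ \upsilon_2\\ \vdots\\ \upsilon_M\end{bmatrix} = \begin{bmatrix}\frac{K}{N} & -\left(\frac{K}{N}\right)^{2}\gamma & \cdots & -\left(\frac{K}{N}\right)^{M}\gamma\\ 0 & \left(\frac{K}{N}\right)^{2} & \cdots & -\left(\frac{K}{N}\right)^{M}\gamma\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \left(\frac{K}{N}\right)^{M}\end{bmatrix}\begin{bmatrix}L_1\\ L_2\\ \vdots\\ L_M\end{bmatrix}. \tag{14}$$
To ensure that the values of $\upsilon_i$, $i \in \{1, \dots, M\}$, are integers, it suffices that all $L_i$ are multiples of $N^M$. Here, $\alpha$ should be chosen as the greatest common divisor (gcd) of the entries of the vector obtained by multiplying the matrix and the vector on the right side of (14), divided by $K$. This choice yields the smallest subpacketization levels for all messages. The total and useful downloads ((9) and (12), respectively) both correspond to one subpacketization level; these downloads are repeated $\alpha$ times to download the entire message (see (13)). Thus, the achievable rate is
$$R = \frac{K E[L]}{E[D]} = \frac{K\sum_{i=1}^{M} p_i L_i}{\alpha K^{2}\sum_{i=1}^{M}\left(\frac{N}{K}\right)^{i}\upsilon_i} \tag{15}$$
$$= \frac{E[L]}{\alpha K\sum_{i=1}^{M}\left(\frac{N}{K}\right)^{i}\frac{1}{K\alpha}\left[\left(\frac{K}{N}\right)^{i}L_i - \left(\frac{N}{K} - 1\right)\sum_{t=i+1}^{M}\left(\frac{K}{N}\right)^{t}L_t\right]} \tag{16}$$
$$= \frac{E[L]}{\sum_{i=1}^{M}\left[L_i - \left(\frac{N}{K} - 1\right)\sum_{t=i+1}^{M}\left(\frac{K}{N}\right)^{t-i}L_t\right]} \tag{17}$$
$$= \frac{E[L]}{L_1 + \frac{K}{N}L_2 + \sum_{i=3}^{M}L_i\left[1 - \left(1 - \left(\frac{K}{N}\right)^{i-1}\right)\right]} \tag{18}$$
$$= \left(\frac{L_1}{E[L]} + \frac{K}{N}\frac{L_2}{E[L]} + \dots + \left(\frac{K}{N}\right)^{M-1}\frac{L_M}{E[L]}\right)^{-1}, \tag{19}$$
where (16) follows by substituting the solution of (14), and (18) follows by collecting the coefficient of each $L_i$ in (17) and using $\left(\frac{N}{K} - 1\right)\sum_{s=1}^{i-1}\left(\frac{K}{N}\right)^{s} = 1 - \left(\frac{K}{N}\right)^{i-1}$.

C. A Representative Example for the Proposed Scheme

In this example, we consider a $(5, 3)$ MDS code with $M = 2$, $L_1 = 100$, and $L_2 = 50$. First, the row indices of $W_i \in \mathbb{F}_q^{L_i \times K}$, $i = 1, 2$, are independently and uniformly permuted. The rows of the first and second messages after permutation are denoted by $x_r^{[i]}$, $r \in \{1, \dots, L_i\}$, $i = 1, 2$. Messages are indexed such that the first message is the longer one.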
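For the parameters just introduced, the following Python sketch (helper names are hypothetical) solves the upper-triangular system (14) with exact fractions, sets $\alpha$ to the gcd of the entries of the right-hand side of (14) divided by $K$ (the largest $\alpha$ for which every $\upsilon_i$ is an integer), and then evaluates the scheme rate (15), the capacity (5), and the zero-padded rate (6). The uniform prior $p = (1/2, 1/2)$ in the usage lines is an assumption for illustration only.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def retrieval_parameters(L, N, K):
    """Solve (14) for alpha and upsilon_1..upsilon_M (L sorted in decreasing order)."""
    M = len(L)
    r, g = Fraction(K, N), Fraction(N, K) - 1   # K/N and gamma = N/K - 1
    # Row j of (14) divided by K gives alpha * upsilon_j.
    rhs = [(r ** j * L[j - 1] - g * sum(r ** t * L[t - 1] for t in range(j + 1, M + 1))) / K
           for j in range(1, M + 1)]
    if any(x.denominator != 1 or x <= 0 for x in rhs):
        raise ValueError("these L_i do not give positive integer retrieval parameters")
    alpha = reduce(gcd, (int(x) for x in rhs))
    return alpha, [int(x) // alpha for x in rhs]


def expected_rows(L, p):
    return sum(Fraction(pi) * Li for pi, Li in zip(p, L))


def scheme_rate(L, p, N, K):
    """Rate (15): K E[L] / (alpha K^2 sum_i (N/K)^i upsilon_i)."""
    alpha, ups = retrieval_parameters(L, N, K)
    ED = alpha * K * K * sum(Fraction(N, K) ** i * u for i, u in enumerate(ups, start=1))
    return K * expected_rows(L, p) / ED


def capacity(L, p, N, K):
    """Capacity (5) of semantic PIR from (N, K) MDS-coded databases."""
    return expected_rows(L, p) / sum(Fraction(K, N) ** (i - 1) * Li
                                     for i, Li in enumerate(L, start=1))


def zero_padded_rate(L, p, N, K):
    """Effective rate (6) of classical coded PIR after zero padding to L_1 rows."""
    return expected_rows(L, p) / (L[0] * sum(Fraction(K, N) ** i for i in range(len(L))))


L, p = [100, 50], [Fraction(1, 2), Fraction(1, 2)]   # p is an assumed prior
print(retrieval_parameters(L, 5, 3))                  # (2, [8, 3])
print(scheme_rate(L, p, 5, 3), capacity(L, p, 5, 3), zero_padded_rate(L, p, 5, 3))
# 15/26 15/26 15/32
```

Exact `Fraction` arithmetic is used so that the equality of (15) and (5) is checked without floating-point error; the output also illustrates the strict gain of Corollary 1 over zero padding.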
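The retrieval parameters $\upsilon_i$, $i = 1, 2$, are calculated as follows:
$$3\alpha\begin{bmatrix}\upsilon_1\\ \upsilon_2\end{bmatrix} = \begin{bmatrix}\frac{3}{5} & -\frac{6}{25}\\ 0 & \frac{9}{25}\end{bmatrix}\begin{bmatrix}L_1\\ L_2\end{bmatrix}, \tag{20}$$
where $\alpha = \gcd\left\{\frac{L_1}{5} - \frac{2L_2}{25}, \frac{3L_2}{25}\right\} = \gcd\{16, 6\} = 2$, which gives $\upsilon_1 = 8$ and $\upsilon_2 = 3$. The subpacketization levels of $W_1$ and $W_2$, in terms of the number of rows, are $\frac{U_1}{3} = \frac{100}{2} = 50$ and $\frac{U_2}{3} = \frac{50}{2} = 25$, respectively.

Assume that $W_1$ is the desired message in the sequel. Table I shows the queries sent to the databases to retrieve $W_1$. First, download $\upsilon_1 = 8$ different coded bits of $W_1$ from each database as singletons. There are 40 coded bits of this form in total, corresponding to 40 different rows of $W_1$. Download $\upsilon_2 = 3$ coded bits of $W_2$ from each database such that each downloaded row of $W_2$ is covered by three coded bits from three different databases, so that it can be decoded. There are 15 coded bits of this form, corresponding to 5 different rows of $W_2$. Next, download $\left(\frac{N}{K} - 1\right)\upsilon_2 = 2$ sums of the form $\left(x_{r_1}^{[1]} + x_{r_2}^{[2]}\right) h_n$ from the $n$th database, $n = 1, \dots, N$. Each $x_{r_1}^{[1]}$ corresponds to a new row of $W_1$, and each $x_{r_2}^{[2]}$ corresponds to a row of $W_2$ already decoded in the previous step. Finally, repeat the above queries two more times, with the queries that include rows of $W_1$ shifted to the neighboring database and with new rows of $W_2$ chosen in each repetition.

The rate achieved by this scheme when downloading $W_1$ is $R_1 = \frac{3 \times 50}{3 \times 65} = \frac{10}{13}$, and the rate achieved when downloading $W_2$ is $R_2 = \frac{3 \times 25}{3 \times 65} = \frac{5}{13}$. Therefore, the average rate achieved by the scheme is
$$R = \frac{K(p_1 L_1 + p_2 L_2)}{p_1 D + p_2 D} = p_1 R_1 + p_2 R_2 = \frac{10}{13}p_1 + \frac{5}{13}p_2. \tag{21}$$
This matches the capacity expression in Theorem 1: with $L_1 = 100$, $L_2 = 50$, and $\frac{K}{N} = \frac{3}{5}$, (5) gives $C = \frac{100p_1 + 50p_2}{130}$.

TABLE I
THE QUERY TABLE FOR THE RETRIEVAL OF $W_1$.

| Queries | Database 1 | Database 2 | Database 3 | Database 4 | Database 5 |
|---|---|---|---|---|---|
| Round 1, singletons of $W_1$ | $x_{1}^{[1]}h_1, \dots, x_{8}^{[1]}h_1$ | $x_{9}^{[1]}h_2, \dots, x_{16}^{[1]}h_2$ | $x_{17}^{[1]}h_3, \dots, x_{24}^{[1]}h_3$ | $x_{25}^{[1]}h_4, \dots, x_{32}^{[1]}h_4$ | $x_{33}^{[1]}h_5, \dots, x_{40}^{[1]}h_5$ |
| Round 1, singletons of $W_2$ | $x_{1}^{[2]}h_1, x_{2}^{[2]}h_1, x_{4}^{[2]}h_1$ | $x_{1}^{[2]}h_2, x_{3}^{[2]}h_2, x_{4}^{[2]}h_2$ | $x_{1}^{[2]}h_3, x_{3}^{[2]}h_3, x_{5}^{[2]}h_3$ | $x_{2}^{[2]}h_4, x_{3}^{[2]}h_4, x_{5}^{[2]}h_4$ | $x_{2}^{[2]}h_5, x_{4}^{[2]}h_5, x_{5}^{[2]}h_5$ |
| Round 1, 2-sums | $(x_{41}^{[1]}+x_{3}^{[2]})h_1, (x_{42}^{[1]}+x_{5}^{[2]})h_1$ | $(x_{43}^{[1]}+x_{2}^{[2]})h_2, (x_{44}^{[1]}+x_{5}^{[2]})h_2$ | $(x_{45}^{[1]}+x_{2}^{[2]})h_3, (x_{46}^{[1]}+x_{4}^{[2]})h_3$ | $(x_{47}^{[1]}+x_{1}^{[2]})h_4, (x_{48}^{[1]}+x_{4}^{[2]})h_4$ | $(x_{49}^{[1]}+x_{1}^{[2]})h_5, (x_{50}^{[1]}+x_{3}^{[2]})h_5$ |
| Round 2, singletons of $W_1$ | $x_{33}^{[1]}h_1, \dots, x_{40}^{[1]}h_1$ | $x_{1}^{[1]}h_2, \dots, x_{8}^{[1]}h_2$ | $x_{9}^{[1]}h_3, \dots, x_{16}^{[1]}h_3$ | $x_{17}^{[1]}h_4, \dots, x_{24}^{[1]}h_4$ | $x_{25}^{[1]}h_5, \dots, x_{32}^{[1]}h_5$ |
| Round 2, singletons of $W_2$ | $x_{6}^{[2]}h_1, x_{7}^{[2]}h_1, x_{9}^{[2]}h_1$ | $x_{6}^{[2]}h_2, x_{8}^{[2]}h_2, x_{9}^{[2]}h_2$ | $x_{6}^{[2]}h_3, x_{8}^{[2]}h_3, x_{10}^{[2]}h_3$ | $x_{7}^{[2]}h_4, x_{8}^{[2]}h_4, x_{10}^{[2]}h_4$ | $x_{7}^{[2]}h_5, x_{9}^{[2]}h_5, x_{10}^{[2]}h_5$ |
| Round 2, 2-sums | $(x_{49}^{[1]}+x_{8}^{[2]})h_1, (x_{50}^{[1]}+x_{10}^{[2]})h_1$ | $(x_{41}^{[1]}+x_{7}^{[2]})h_2, (x_{42}^{[1]}+x_{10}^{[2]})h_2$ | $(x_{43}^{[1]}+x_{7}^{[2]})h_3, (x_{44}^{[1]}+x_{9}^{[2]})h_3$ | $(x_{45}^{[1]}+x_{6}^{[2]})h_4, (x_{46}^{[1]}+x_{9}^{[2]})h_4$ | $(x_{47}^{[1]}+x_{6}^{[2]})h_5, (x_{48}^{[1]}+x_{8}^{[2]})h_5$ |
| Round 3, singletons of $W_1$ | $x_{25}^{[1]}h_1, \dots, x_{32}^{[1]}h_1$ | $x_{33}^{[1]}h_2, \dots, x_{40}^{[1]}h_2$ | $x_{1}^{[1]}h_3, \dots, x_{8}^{[1]}h_3$ | $x_{9}^{[1]}h_4, \dots, x_{16}^{[1]}h_4$ | $x_{17}^{[1]}h_5, \dots, x_{24}^{[1]}h_5$ |
| Round 3, singletons of $W_2$ | $x_{11}^{[2]}h_1, x_{12}^{[2]}h_1, x_{14}^{[2]}h_1$ | $x_{11}^{[2]}h_2, x_{13}^{[2]}h_2, x_{14}^{[2]}h_2$ | $x_{11}^{[2]}h_3, x_{13}^{[2]}h_3, x_{15}^{[2]}h_3$ | $x_{12}^{[2]}h_4, x_{13}^{[2]}h_4, x_{15}^{[2]}h_4$ | $x_{12}^{[2]}h_5, x_{14}^{[2]}h_5, x_{15}^{[2]}h_5$ |
| Round 3, 2-sums | $(x_{47}^{[1]}+x_{13}^{[2]})h_1, (x_{48}^{[1]}+x_{15}^{[2]})h_1$ | $(x_{49}^{[1]}+x_{12}^{[2]})h_2, (x_{50}^{[1]}+x_{15}^{[2]})h_2$ | $(x_{41}^{[1]}+x_{12}^{[2]})h_3, (x_{42}^{[1]}+x_{14}^{[2]})h_3$ | $(x_{43}^{[1]}+x_{11}^{[2]})h_4, (x_{44}^{[1]}+x_{14}^{[2]})h_4$ | $(x_{45}^{[1]}+x_{11}^{[2]})h_5, (x_{46}^{[1]}+x_{13}^{[2]})h_5$ |

V. CONVERSE PROOF

In this section, we provide an upper bound on the rate of semantic PIR from MDS-coded databases. First, note that the answer strings generated by any set of $K$ databases are independent of each other, in the sense that
$$H\left(A_\Omega^{[m]} \mid Q_\Omega^{[m]}, W_\Delta\right) = \sum_{n \in \Omega} H\left(A_n^{[m]} \mid Q_n^{[m]}, W_\Delta\right), \tag{22}$$
where $\Omega \subset [N]$ with $|\Omega| = K$, and $W_\Delta$ is any subset of messages. The proof can be found in [44, Lemma 1].

We derive the upper bound on the rate starting from the expression for $E[D]$. To satisfy the privacy constraint, $E[D]$ must be the same for any desired message. Without loss of generality, assume that the user requires $W_1$. Then,
$$E[D] = \sum_{i=1}^{N} H\left(A_i^{[1]}\right) \tag{23}$$
$$\geq H\left(A_{1:N}^{[1]} \mid Q_{1:N}^{[1]}\right) \tag{24}$$
$$= I\left(W_{1:M}; A_{1:N}^{[1]} \mid Q_{1:N}^{[1]}\right) \tag{25}$$
$$= H(W_1) + I\left(W_{2:M}; A_{1:N}^{[1]}, Q_{1:N}^{[1]} \mid W_1\right) \tag{26}$$
$$\geq K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K} I\left(W_{2:M}; A_\Omega^{[1]}, Q_\Omega^{[1]} \mid W_1\right) \tag{27}$$
$$= K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K} H\left(A_\Omega^{[1]} \mid Q_\Omega^{[1]}, W_1\right) \tag{28}$$
$$= K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K}\sum_{n \in \Omega} H\left(A_n^{[1]} \mid Q_n^{[1]}, W_1\right) \tag{29}$$
$$= K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K}\sum_{n \in \Omega} H\left(A_n^{[2]} \mid Q_n^{[2]}, W_1\right) \tag{30}$$
$$= K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K} H\left(A_\Omega^{[2]} \mid Q_\Omega^{[2]}, W_1\right) \tag{31}$$
$$\geq K L_1 + \frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K} H\left(A_\Omega^{[2]} \mid Q_{1:N}^{[2]}, W_1\right) \tag{32}$$
$$\geq K L_1 + \frac{K}{N} H\left(A_{1:N}^{[2]} \mid Q_{1:N}^{[2]}, W_1\right) \tag{33}$$
$$= K L_1 + \frac{K}{N} I\left(W_{2:M}; A_{1:N}^{[2]}, Q_{1:N}^{[2]} \mid W_1\right) \tag{34}$$
$$= K L_1 + \frac{K}{N} I\left(W_{2:M}; W_2, A_{1:N}^{[2]}, Q_{1:N}^{[2]} \mid W_1\right) \tag{35}$$
$$= K L_1 + \frac{K}{N}\left(K L_2 + I\left(W_{3:M}; A_{1:N}^{[2]}, Q_{1:N}^{[2]} \mid W_{1:2}\right)\right) \tag{36}$$
$$= K L_1 + \frac{K}{N} K L_2 + \frac{K}{N} I\left(W_{3:M}; A_{1:N}^{[2]}, Q_{1:N}^{[2]} \mid W_{1:2}\right), \tag{37}$$
where (26) follows from the chain rule, the correctness constraint (2), and the independence of queries and messages; (27) holds because the answers and queries of all databases are at least as informative as those of any $K$ databases; (28) and (34) use the fact that the answers are deterministic functions of the queries and messages; (29) and (31) follow from (22); (30) follows from the privacy constraint in (3); (32) holds because conditioning reduces entropy; (33) follows from Han's inequality,
$$\frac{1}{\binom{N}{K}}\sum_{\Omega:|\Omega|=K} H\left(A_\Omega^{[m]} \mid W_\Delta, Q_{1:N}^{[m]}\right) \geq \frac{K}{N} H\left(A_{1:N}^{[m]} \mid W_\Delta, Q_{1:N}^{[m]}\right);$$
and (35) follows from the correctness constraint for $W_2$. Applying the steps (26)–(37) recursively to the mutual information term in (37) gives
$$R = \frac{K E[L]}{E[D]} \leq \frac{K E[L]}{K\left(L_1 + \frac{K}{N}L_2 + \dots + \left(\frac{K}{N}\right)^{M-1}L_M\right)}, \tag{38}$$
which leads to the capacity expression in Theorem 1.

VI. DISCUSSION

In this paper, we presented a capacity-achieving scheme for semantic PIR from MDS-coded databases. An alternative description of the scheme in Section IV is as follows. Each MDS-coded database contains $L_i$ coded bits representing $W_i$, with $L_1 \geq L_2 \geq \dots \geq L_M$. First, consider the first $L_M$ coded bits of all $M$ messages and perform equal-length PIR as described in [44]. Then, consider the next $L_{M-1} - L_M$ coded bits of all messages except $W_M$ and perform equal-length PIR with the remaining $M - 1$ messages. Continue this process until the last $L_1 - L_2$ coded bits of $W_1$, for which PIR is performed with a single message.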
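The layered description above can be checked numerically. The following Python sketch assumes that each layer of $L_k - L_{k+1}$ rows (with $L_{M+1} = 0$) shared by the $k$ longest messages is served by the equal-length scheme of [44] at its capacity $\left(1 + \frac{K}{N} + \dots + \left(\frac{K}{N}\right)^{k-1}\right)^{-1}$, adds up the layer downloads, and compares the total with the download $K\sum_{i=1}^{M}\left(\frac{K}{N}\right)^{i-1}L_i$ implied by Theorem 1. The function names are illustrative.

```python
from fractions import Fraction


def classical_capacity(k, N, K):
    """Capacity of equal-length MDS-coded PIR with k messages, as in [44]."""
    return 1 / sum(Fraction(K, N) ** i for i in range(k))


def layered_download(L, N, K):
    """Total download (q-ary symbols) of the layer-by-layer description."""
    M = len(L)
    padded = list(L) + [0]                 # L_{M+1} = 0
    total = Fraction(0)
    for k in range(M, 0, -1):              # layer shared by the k longest messages
        rows = padded[k - 1] - padded[k]   # L_k - L_{k+1} rows in this layer
        total += K * rows / classical_capacity(k, N, K)
    return total


def semantic_download(L, N, K):
    """Download implied by (5): K * sum_i (K/N)^(i-1) * L_i."""
    return K * sum(Fraction(K, N) ** (i - 1) * Li for i, Li in enumerate(L, start=1))


print(layered_download([100, 50], 5, 3), semantic_download([100, 50], 5, 3))  # 390 390
```

For the example of Section IV-C, both totals equal 390, which is also $\alpha = 2$ times the per-subpacket download $3 \times 65$ of the fixed-subpacketization scheme.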
This layered process yields the same outcome as the scheme described in Section IV. Nevertheless, the description in Section IV provides a systematic method for calculating the subpacketization from the message sizes $\{L_i\}_{i=1}^{M}$, such that the scheme designed for a single subpacket is applied repeatedly, in the same way, throughout the retrieval process. In contrast, in the alternative description, the subpacketization changes from one block to another (blocks of sizes $L_M$ and $L_i - L_{i+1}$ bits for $i = 1, \dots, M - 1$). In summary, the scheme in Section IV uses a fixed subpacketization, while the alternative description uses an irregular subpacketization.

REFERENCES

[1] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan. Private information retrieval. Journal of the ACM, 45(6):965–981, November 1998.
[2] H. Sun and S. A. Jafar. The capacity of private information retrieval. IEEE Trans. on Info. Theory, 63(7):4075–4088, July 2017.
[3] H. Sun and S. A. Jafar. The capacity of robust private information retrieval with colluding databases. IEEE Trans. on Info. Theory, 64(4):2361–2370, April 2018.
[4] R. Tajeddine, O. W. Gnilke, D. Karpuk, R. Freij-Hollanti, C. Hollanti, and S. El Rouayheb. Private information retrieval schemes for coded data with arbitrary collusion patterns. In IEEE ISIT, June 2017.
[5] R. Bitar and S. El Rouayheb. Staircase-PIR: Universally robust private information retrieval. In IEEE ITW, pages 1–5, November 2018.
[6] H. Sun and S. A. Jafar. The capacity of symmetric private information retrieval. IEEE Trans. on Info. Theory, 65(1):322–329, January 2019.
[7] Q. Wang, H. Sun, and M. Skoglund. Symmetric private information retrieval with mismatched coded messages and randomness. In IEEE ISIT, pages 365–369, July 2019.
[8] T. Guo, R. Zhou, and C. Tian. On the information leakage in private information retrieval systems. Available at arXiv:1909.11605.
[9] K. Banawan and S. Ulukus. Multi-message private information retrieval: Capacity results and near-optimal schemes. IEEE Trans. on Info. Theory, 64(10):6842–6862, October 2018.
[10] K. Banawan and S. Ulukus. The capacity of private information retrieval from Byzantine and colluding databases. IEEE Trans. on Info. Theory, 65(2):1206–1219, February 2019.
[11] R. Tandon. The capacity of cache aided private information retrieval. In Allerton Conference, October 2017.
[12] Y.-P. Wei, K. Banawan, and S. Ulukus. Fundamental limits of cache-aided private information retrieval with unknown and uncoded prefetching. IEEE Trans. on Info. Theory, 65(5):3215–3232, May 2019.
[13] Y.-P. Wei, K. Banawan, and S. Ulukus. Cache-aided private information retrieval with partially known uncoded prefetching: Fundamental limits. IEEE JSAC, 36(6):1126–1139, June 2018.
[14] S. Kumar, A. G. i Amat, E. Rosnes, and L. Senigagliesi. Private information retrieval from a cellular network with caching at the edge. IEEE Trans. on Communications, 67(7):4900–4912, July 2019.
[15] S. Kadhe, B. Garcia, A. Heidarzadeh, S. El Rouayheb, and A. Sprintson. Private information retrieval with side information. IEEE Trans. on Info. Theory, 66(4):2032–2043, April 2020.
[16] Z. Chen, Z. Wang, and S. Jafar. The capacity of T-private information retrieval with private side information. Available at arXiv:1709.03022.
[17] Y.-P. Wei, K. Banawan, and S. Ulukus. The capacity of private information retrieval with partially known private side information. IEEE Trans. on Info. Theory, 65(12):8222–8231, December 2019.
[18] S. P. Shariatpanahi, M. J. Siavoshani, and M. A. Maddah-Ali. Multi-message private information retrieval with private side information. In IEEE ITW, pages 1–5, November 2018.
[19] S. Li and M. Gastpar. Converse for multi-server single-message PIR with side information. Available at arXiv:1809.09861.
[20] H. Sun and S. A. Jafar. The capacity of private computation. IEEE Trans. on Info. Theory, 65(6):3880–3897, June 2019.
[21] M. Mirmohseni and M. A. Maddah-Ali. Private function retrieval. In IWCIT, pages 1–6, April 2018.
[22] Z. Chen, Z. Wang, and S. Jafar. The asymptotic capacity of private search. In IEEE ISIT, June 2018.
[23] M. A. Attia, D. Kumar, and R. Tandon. The capacity of private information retrieval from uncoded storage constrained databases. Available at arXiv:1805.04104v2.
[24] C. Tian, H. Sun, and J. Chen. Capacity-achieving private information retrieval codes with optimal message size and upload cost. IEEE Trans. on Info. Theory, 65(11):7613–7627, November 2019.
[25] Y.-P. Wei and S. Ulukus. The capacity of private information retrieval with private side information under storage constraints. IEEE Trans. on Info. Theory, 66(4):2023–2031, April 2020.
[26] K. Banawan, B. Arasli, and S. Ulukus. Improved storage for efficient private information retrieval. In IEEE ITW, August 2019.
[27] C. Tian. On the storage cost of private information retrieval. Available at arXiv:1910.11973.
[28] Y.-P. Wei, B. Arasli, K. Banawan, and S. Ulukus. The capacity of private information retrieval from decentralized uncoded caching databases. Information, 10, December 2019.
[29] K. Banawan, B. Arasli, Y. P. Wei, and S. Ulukus. The capacity of private information retrieval from heterogeneous uncoded caching databases. IEEE Trans. on Info. Theory, 66(6):3407–3416, 2020.
[30] N. Raviv and I. Tamo. Private information retrieval in graph based replication systems. In IEEE ISIT, June 2018.
[31] K. Banawan and S. Ulukus. Private information retrieval from non-replicated databases. In IEEE ISIT, pages 1272–1276, July 2019.
[32] K. Banawan and S. Ulukus. Private information retrieval through wiretap channel II: Privacy meets security. IEEE Trans. on Info. Theory, 66(7):4129–4149, 2020.
[33] H. Sun and S. A. Jafar. Optimal download cost of private information retrieval for arbitrary message length. IEEE Trans. on Info. Forensics and Security, 12(12):2920–2932, December 2017.
[34] Q. Wang, H. Sun, and M. Skoglund. The capacity of private information retrieval with eavesdroppers. IEEE Trans. on Info. Theory, 65(5):3198–3214, May 2019.
[35] H. Yang, W. Shin, and J. Lee. Private information retrieval for secure distributed storage systems. IEEE Trans. on Info. Forensics and Security, 13(12):2953–2964, December 2018.
[36] Z. Jia, H. Sun, and S. Jafar. Cross subspace alignment and the asymptotic capacity of X-secure T-private information retrieval. IEEE Trans. on Info. Theory, 65(9):5783–5798, September 2019.
[37] R. Zhou, C. Tian, H. Sun, and T. Liu. Capacity-achieving private information retrieval codes from MDS-coded databases with minimum message size. Available at arXiv:1903.08229.
[38] K. Banawan and S. Ulukus. Asymmetry hurts: Private information retrieval under asymmetric-traffic constraints. IEEE Trans. on Info. Theory, 65(11):7628–7645, November 2019.
[39] K. Banawan and S. Ulukus. Noisy private information retrieval: On separability of channel coding and information retrieval. IEEE Trans. on Info. Theory, 65(12):8232–8249, December 2019.
[40] R. G. L. D'Oliveira and S. El Rouayheb. One-shot PIR: Refinement and lifting. IEEE Trans. on Info. Theory, 66(4):2443–2455, April 2020.
[41] R. Tajeddine, A. Wachter-Zeh, and C. Hollanti. Private information retrieval over random linear networks. Available at arXiv:1810.08941.
[42] Z. Wang, K. Banawan, and S. Ulukus. Private set intersection: A multi-message symmetric private information retrieval perspective. Available at arXiv:1912.13501.
[43] Z. Wang, K. Banawan, and S. Ulukus. Multi-party private set intersection: An information-theoretic approach. Available at arXiv:2008.07504.
[44] K. Banawan and S. Ulukus. The capacity of private information retrieval from coded databases. IEEE Trans. on Info. Theory, 64(3):1945–1956, March 2018.
[45] R. Freij-Hollanti, O. Gnilke, C. Hollanti, and D. Karpuk. Private information retrieval from coded databases with colluding servers. SIAM Journal on Applied Algebra and Geometry, 1(1):647–664, 2017.
[46] H. Sun and S. A. Jafar. Private information retrieval from MDS coded data with colluding servers: Settling a conjecture by Freij-Hollanti et al. IEEE Trans. on Info. Theory, 64(2):1000–1022, February 2018.
[47] Y. Zhang and G. Ge. A general private information retrieval scheme for MDS coded databases with colluding servers. Designs, Codes and Cryptography, 87(11), November 2019.
[48] Y. Zhang and G. Ge. Multi-file private information retrieval from MDS coded databases with colluding servers. Available at arXiv:1705.03186.
[49] R. Tandon, M. Abdul-Wahid, F. Almoualem, and D. Kumar. PIR from storage constrained databases - coded caching meets PIR. In IEEE ICC, May 2018.
[50] T. Chan, S. Ho, and H. Yamamoto. Private information retrieval for coded storage. In IEEE ISIT, June 2015.
[51] R. Tajeddine, O. W. Gnilke, and S. El Rouayheb. Private information retrieval from MDS coded data in distributed storage systems. IEEE Trans. on Info. Theory, 64(11):7081–7093, November 2018.
[52] R. Tajeddine and S. El Rouayheb. Robust private information retrieval on coded data. In IEEE ISIT, June 2017.
[53] R. Tajeddine, O. W. Gnilke, D. Karpuk, R. Freij-Hollanti, and C. Hollanti. Private information retrieval from coded storage systems with colluding, Byzantine, and unresponsive servers. IEEE Trans. on Info. Theory, 65(6):3898–3906, June 2019.
[54] S. Vithana, K. Banawan, and S. Ulukus. Semantic private information retrieval. Available at arXiv:2003.13667.