Download Cost of Private Updating

Abstract

We consider the problem of privately updating a message out of $K$ messages from $N$ replicated and non-colluding databases. In this problem, a user has an outdated version $\hat{W}_\theta$ of the desired message, of length $L$ bits, that differs from the current version $W_\theta$ in at most $f$ bits. The user needs to retrieve $W_\theta$ correctly using a private information retrieval (PIR) scheme with the least number of downloaded bits, without leaking any information about the message index $\theta$ to any individual database. To that end, we propose a novel achievable scheme based on \emph{syndrome decoding}. Specifically, the user downloads the syndrome corresponding to $W_\theta$, according to a linear block code with carefully designed parameters, using the optimal PIR scheme for messages with a length constraint. We derive lower and upper bounds on the optimal download cost that match if the term $\log_2\left(\sum_{i=0}^f \binom{L}{i}\right)$ is an integer. Our results imply that there is a significant reduction in the download cost if $f < \frac{L}{2}$, compared with downloading $W_\theta$ directly using classical PIR approaches that do not take the correlation between $W_\theta$ and $\hat{W}_\theta$ into consideration.


arXiv [cs.IT], Feb.

Bryttany Herren, Ahmed Arafa, and Karim Banawan
Electrical and Computer Engineering Department, University of North Carolina at Charlotte, USA
Department of Electrical Engineering, Alexandria University, Egypt

I. INTRODUCTION

The problem of private information retrieval (PIR), introduced by Chor et al. in [1], seeks the most efficient way for a user to privately retrieve a single message out of a set of K messages stored on N fully replicated and non-communicating databases. PIR schemes download a mixture of all K messages, with the least number of overhead downloaded bits, such that no single database can infer the identity of the desired message. The user accomplishes this task by sending a query to each database. The databases respond truthfully to the submitted queries with answer strings, and the user reconstructs the desired message by jointly decoding the returned answer strings. Recently, the PIR problem has received growing interest from the information and coding theory communities. The classical PIR problem was reformulated using information-theoretic measures in the seminal work of Sun and Jafar [2]. There, the performance metric of a PIR scheme is the retrieval rate, i.e., the ratio of the number of desired message bits to the total number of downloaded bits. The supremum of this ratio is the PIR capacity, C. Sun and Jafar characterized the capacity of the classical PIR model as

$$C = \left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K-1}}\right)^{-1}. \tag{1}$$

Following [2], the capacity (or its reciprocal, the normalized download cost) of many variations of the problem has been investigated; see, e.g., [3]–[17].

In all these works, the user is assumed to have no information about the desired message prior to retrieval. Thus, the queries are designed independently of the message contents. This is not always the case in practice. To see this, consider the following classical motivating example of PIR: in the stock market, investors need to privately retrieve some of the stock records, since showing interest in a specific record may undesirably affect its value. PIR is a natural solution to this problem.
Now, consider the case when an investor has already retrieved a specific stock record some time ago, but this record has since changed. The investor needs to update the record on his/her side. A trivial solution to this problem is to re-apply the original PIR scheme. Nevertheless, this solution overlooks the fact that stock records are correlated in time. Another example arises in the context of private federated submodel learning [18], in which a user needs to retrieve the up-to-date desired submodel without leaking any information about its identity. The weights of each submodel are usually correlated in time, as in the stock market example. In both examples, it is interesting to investigate whether the investor (user) can exploit the correlation between the outdated record (submodel) and its up-to-date counterpart to drive down the download cost. In this work, we focus our attention on a specific type of correlation, in which the up-to-date message is a distorted version of the outdated message according to a Hamming distortion measure.

The works most closely related to this problem are the PIR problems with side information, e.g., [19]–[25]. In all these works, the user has side information in the form of a subset of undesired messages, which is utilized to assist in privately retrieving the desired message. This is different from our setting, in which the user possesses side information in the form of an outdated version of the desired message. Furthermore, these works differ from each other in whether or not the privacy of the side information should be maintained. This is also different from our problem, in which the identities of the desired message and the side information are the same; therefore, the privacy constraint in our problem is modified to reflect this fact.

In this paper, we introduce the problem of private updating for a message out of a K-message library from N replicated and non-colluding databases.
In this problem, the user has an outdated version of the desired message, $\hat{W}_\theta$, and wishes to update it to its up-to-date version $W_\theta$. Furthermore, the user knows the maximum Hamming distance f between the up-to-date message and its outdated counterpart, i.e., the user possesses $\hat{W}_\theta$, which differs in at most f bits from the desired up-to-date message $W_\theta$. Based on $\hat{W}_\theta$ and f, the user needs to design a query set to reliably and privately decode $W_\theta$ with the least number of downloaded bits. Equivalently, the user needs to privately retrieve an auxiliary message that corresponds to the flipped bit positions in the desired message. Similar to the works [26], [27], we assume that the databases can map the original library of messages into a more appropriate form that can assist the user in the retrieval process. We aim at characterizing the optimal download cost needed to update $\hat{W}_\theta$ to $W_\theta$ without disclosing the desired message index θ to any of the databases.

To that end, we propose a novel achievable scheme that is based on the syndrome decoding idea introduced in [28], adapted to our setting to exploit the correlation between $W_\theta$ and $\hat{W}_\theta$. That is, syndrome decoding is used to compress the desired message based on the user's side information (i.e., the outdated message $\hat{W}_\theta$). More specifically, the databases apply a linear transformation to the stored library of messages using the parity-check matrix of a linear block code with carefully chosen parameters. The existence of such a code can be readily inferred from the Gilbert-Varshamov and Hamming bounds [29]. This transformation, in effect, maps the messages into their corresponding syndromes. Thus, the problem is reduced to retrieving the auxiliary messages (i.e., the syndrome representations), each of which comprises

$$\lceil \bar{L} \rceil = \left\lceil \log_2\left(\sum_{i=0}^{f} \binom{L}{i}\right) \right\rceil \le L$$

bits, where L is the original message length.
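A toy version of this compression step is sketched below in Python. It is only an illustration under stated assumptions, not the authors' implementation: it uses the [3, 1, 3] repetition code (L = 3, f = 1) from the first example in Section V, with a parity-check matrix H of our own choosing (any valid H works, and it may differ from the one in (44)); it finds the coset by brute force, which is feasible only for tiny L; and it leaves out the PIR layer entirely. The exhaustive check confirms that the 2-bit syndrome of the current message, combined with any outdated copy within Hamming distance f, identifies the current message uniquely.

```python
from itertools import combinations

# Assumed parity-check matrix for the [3, 1, 3] repetition code (G = [1 1 1]).
# Any 2 x 3 binary H of rank 2 with G H^T = 0 would do; this choice is ours.
H = [[1, 1, 0],
     [1, 0, 1]]

def syndrome(word, H):
    """s = word * H^T over GF(2), returned as a tuple of bits."""
    return tuple(sum(h * w for h, w in zip(row, word)) % 2 for row in H)

def bits(x, L):
    """The L-bit binary expansion of integer x, most significant bit first."""
    return [(x >> (L - 1 - i)) & 1 for i in range(L)]

def decode(s, w_hat, H):
    """Return the word in the coset of s closest in Hamming distance to w_hat."""
    L = len(w_hat)
    coset = [w for w in (bits(x, L) for x in range(2 ** L)) if syndrome(w, H) == s]
    return min(coset, key=lambda w: sum(a != b for a, b in zip(w, w_hat)))

# Exhaustive check for L = 3, f = 1: every current message w and every outdated
# copy w_hat obtained by flipping at most f bits of w.
L, f = 3, 1
ok = True
for x in range(2 ** L):
    w = bits(x, L)
    for r in range(f + 1):
        for positions in combinations(range(L), r):
            w_hat = w[:]
            for p in positions:
                w_hat[p] ^= 1
            s = syndrome(w, H)              # what the user privately retrieves
            ok = ok and decode(s, w_hat, H) == w
print("all updates recovered:", ok, "| syndrome bits:", len(H), "instead of", L)
```

In a practical decoder, one would typically compute $\hat{W}_\theta H^T + s_\theta$, which equals the syndrome of the flip pattern, and look up the corresponding flip pattern of weight at most f in a precomputed table instead of enumerating the coset.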
This enables us to directly apply the PIR scheme in [30], which is optimal under message length constraints, to the auxiliary messages of length $\lceil \bar{L} \rceil$. We confirm the validity of our proposed scheme by deriving a matching converse proof. Our converse proof is inspired by the converse proofs of the PIR problem with side information in [19], [20], the main difference being that the side information in our case is the outdated message $\hat{W}_\theta$ rather than cached messages. Consequently, we show that the optimal download cost, $\bar{D}_L$, is bounded by $\lceil \bar{L}/C \rceil \le \bar{D}_L \le \lceil \lceil \bar{L} \rceil / C \rceil$. Our achievable scheme is optimal if $\bar{L}$ is an integer; otherwise, the gap between the upper and lower bounds is at most 2 bits. This justifies the efficacy of using syndromes as a message-mixing technique in our setting. Furthermore, our results show that performing direct PIR on the original library of messages is strictly suboptimal as long as the maximum Hamming distance satisfies $f < L/2$.

II. SYSTEM MODEL

We consider a classical PIR problem with K independent, uncoded messages $W_1, \ldots, W_K$, each consisting of L independent and uniformly distributed bits. We have

$$H(W_i) = L, \quad 1 \le i \le K, \tag{2}$$

$$H(W_1, \ldots, W_K) = H(W_1) + \cdots + H(W_K). \tag{3}$$

The K messages are stored on N replicated and non-communicating databases. The user (retriever) has a local copy of one of the messages, whose index θ ∈ [K] is known to the user but not to the databases. However, this locally stored message is outdated, and the user wishes to update it so that it is consistent with the copies in the databases, without revealing to any of the databases what the message index is. This setting defines the private updating problem.

Since each message is a string of L bits, the problem can be formulated as privately determining which subset of the message bits needs to be flipped in order to fully update it. To model this, we use $\hat{W}_\theta$ to represent the locally stored outdated message, $\bar{W}_\theta$ to represent the subset of bit indices that need to be flipped, and f to represent the maximum Hamming distance between $W_\theta$ and $\hat{W}_\theta$. Therefore, in order to update message θ, the user needs to flip at most f bits, i.e., $\bar{W}_\theta$ takes a value out of $\sum_{i=0}^{f}\binom{L}{i}$ choices. We assume that these choices are uniformly distributed and realized independently of $\hat{W}_\theta$. Based on this model, the following holds:

$$H(W_\theta) = H(\hat{W}_\theta) = L, \tag{4}$$

$$H(\bar{W}_\theta) = \log_2\left(\sum_{i=0}^{f}\binom{L}{i}\right) \triangleq \bar{L}, \tag{5}$$

$$H(W_\theta \mid \hat{W}_\theta) = H(\bar{W}_\theta \mid \hat{W}_\theta) = \bar{L}, \tag{6}$$

$$H(\bar{W}_\theta \mid \hat{W}_\theta, W_\theta) = 0, \tag{7}$$

$$|\bar{W}_\theta| \le f \le L, \tag{8}$$

where |·| denotes cardinality. For the purposes of this paper, we assume that the maximum Hamming distance f between the outdated and updated message is known to the user. In order to retrieve $W_\theta$, the user sends a set of queries $Q_1^{[\hat{W}_\theta, f]}, \ldots, Q_N^{[\hat{W}_\theta, f]}$ to the N databases to efficiently obtain $\bar{W}_\theta$.
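As a quick check of (5), the Python sketch below enumerates every admissible flip pattern $\bar{W}_\theta$ for the small parameter sets used in the examples of Sections V and VI, confirms that their number is $\sum_{i=0}^{f}\binom{L}{i}$, and evaluates $\bar{L}$ and $\lceil \bar{L} \rceil$. The function names are illustrative only. $\lceil \bar{L} \rceil$ is computed with integer arithmetic, using the identity $\lceil \log_2 V \rceil$ = bit length of V − 1 (for V ≥ 1), to avoid floating-point trouble when V is a power of two; the final assertion checks the f < L/2 threshold of Remark 1 for small L only.

```python
import math
from itertools import combinations

def flip_patterns(L, f):
    """All index subsets of {0, ..., L-1} with at most f elements (values of W̄_θ)."""
    return [frozenset(c) for r in range(f + 1) for c in combinations(range(L), r)]

def num_patterns(L, f):
    """sum_{i=0}^{f} C(L, i): the number of values W̄_θ can take."""
    return sum(math.comb(L, i) for i in range(f + 1))

def l_bar(L, f):
    """L̄ = H(W̄_θ) = log2 of the number of patterns, as in (5)."""
    return math.log2(num_patterns(L, f))

def ceil_l_bar(L, f):
    """⌈L̄⌉ computed exactly with integers: ceil(log2 V) == (V - 1).bit_length()."""
    return (num_patterns(L, f) - 1).bit_length()

for L, f in [(3, 1), (5, 1), (8, 1)]:
    assert len(flip_patterns(L, f)) == num_patterns(L, f)
    print(f"L={L}, f={f}: {num_patterns(L, f)} patterns, "
          f"L_bar={l_bar(L, f):.3f}, ceil={ceil_l_bar(L, f)}")

# Remark 1: ⌈L̄⌉ < L exactly when f < L/2 (checked here for L up to 24).
assert all((ceil_l_bar(L, f) < L) == (2 * f < L)
           for L in range(1, 25) for f in range(L + 1))
```

For (L, f) = (3, 1), (5, 1), and (8, 1), this gives $\lceil \bar{L} \rceil$ = 2, 3, and 4, matching the syndrome lengths used in the examples.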
The queries are generated according to $\hat{W}_\theta$ and f, and are jointly independent of the realizations of the messages $W_{[K]\setminus\{\theta\}}$ and of $\bar{W}_\theta$, given $\hat{W}_\theta$. Therefore, we have

$$I\left(W_{[K]\setminus\{\theta\}}, \bar{W}_\theta ; Q_{1:N}^{[\hat{W}_\theta, f]} \mid \hat{W}_\theta\right) = 0. \tag{9}$$

Here, [K] denotes the set {1, 2, …, K}, and $x_{\mathcal{S}}$ denotes the collection $\{x_i, i \in \mathcal{S}\}$; moreover, the assumption that θ is unknown to the databases holds if message θ has been previously obtained in a private manner.

Upon receiving the query $Q_n^{[\hat{W}_\theta, f]}$, the n-th database replies with an answer string $A_n^{[\hat{W}_\theta, f]}$, which is a function of $Q_n^{[\hat{W}_\theta, f]}$ and all K stored messages. Therefore, ∀θ ∈ [K], ∀n ∈ [N], we have

$$H\left(A_n^{[\hat{W}_\theta, f]} \mid Q_n^{[\hat{W}_\theta, f]}, W_{1:K}\right) = 0. \tag{10}$$

To ensure that individual databases do not learn which message is being updated, we need to satisfy the following privacy constraint, ∀n ∈ [N], ∀k ∈ [K]:

$$\left(Q_n^{[\hat{W}_1, f]}, A_n^{[\hat{W}_1, f]}, \hat{W}_1, W_{1:K}\right) \sim \left(Q_n^{[\hat{W}_k, f]}, A_n^{[\hat{W}_k, f]}, \hat{W}_k, W_{1:K}\right), \tag{11}$$

where ∼ denotes statistical equivalence. After receiving the answer strings $A_{1:N}^{[\hat{W}_\theta, f]}$ from all N databases, the user needs to decode the desired message $W_\theta$ with no uncertainty, satisfying the following correctness constraint:

$$H\left(W_\theta \mid A_{1:N}^{[\hat{W}_\theta, f]}, Q_{1:N}^{[\hat{W}_\theta, f]}, \hat{W}_\theta\right) = 0. \tag{12}$$

For fixed N, K, and f, a pair $(\bar{D}, L)$ is achievable if there exists a private updating scheme for messages of length L bits satisfying the privacy constraint (11) and the correctness constraint (12). In this pair, $\bar{D}$ represents the expected number of bits downloaded from the N databases via the answer strings $A_{1:N}^{[\hat{W}_\theta, f]}$, i.e.,

$$\bar{D} = \sum_{n=1}^{N} H\left(A_n^{[\hat{W}_\theta, f]}\right). \tag{13}$$

Our goal is to characterize the optimal download cost $\bar{D}_L$ for fixed but arbitrary N, K, and f, that is, to solve for

$$\bar{D}_L = \min\left\{\bar{D} : (\bar{D}, L) \text{ is achievable}\right\}. \tag{14}$$

Clearly, the user can ignore its outdated message $\hat{W}_\theta$ and re-download the whole new message $W_\theta$ using standard PIR schemes [2]. In the next section, however, we show that the user can exploit $\hat{W}_\theta$ to do strictly better.

Fig. 1: Download cost of private updating with L = 32 bits, N = 2 databases, and K = 10 messages.

III. MAIN RESULT

We present our main result in the following theorem:

Theorem 1

In the private updating problem, we have

$$\left\lceil \frac{\bar{L}}{C} \right\rceil \le \bar{D}_L \le \left\lceil \frac{\lceil \bar{L} \rceil}{C} \right\rceil, \tag{15}$$

with C and $\bar{L}$ defined in (1) and (5), respectively.

Fig. 1 shows the efficiency of our result by plotting the upper and lower bounds on the download cost for the private updating problem with L = 32 bits, N = 2 databases, and K = 10 messages.

We show the first inequality in (15) by presenting a converse proof for Theorem 1 in Section IV, which is based on arguments similar to those used in cache-aided PIR settings [20]. The second inequality in (15) is shown by a novel achievable scheme for Theorem 1 in Section V, which is based on distributed source coding [28]. We now make some remarks.

Remark 1

From (5) and (8), it follows that $\lceil \bar{L} \rceil = L$ for all values of $f \ge L/2$, and that $\lceil \bar{L} \rceil < L$ for all values of $f < L/2$ (this can be readily shown using the binomial theorem; details are omitted). This means that there is a Hamming distance threshold of L/2 beyond which there is no advantage to using a private updating strategy, and below which there will always be some savings in download cost (see Fig. 1).

Remark 2

If L and f are such that $\bar{L} = \lceil \bar{L} \rceil$, then the two bounds in Theorem 1 match. We will see that this holds if there exists a perfect code, i.e., a code that attains the Hamming bound with equality [29], according to which the queries are designed (cf. Section V). Otherwise, if $\bar{L} < \lceil \bar{L} \rceil$, one can show that the two bounds are within 2 bits of each other for N ≥ 2 databases (see [30, Section 7.2]).

IV. PROOF OF MAIN RESULT: CONVERSE

In this section, we show that $\lceil \bar{L}/C \rceil$ serves as a general lower bound on the download cost in (13). To do so, we prove two useful lemmas, previously used in the cache-aided PIR setting of [20], for the case of our private updating problem. The two lemmas are then combined to prove the general lower bound. The key difference between our lemmas and those in [20] is that, rather than a set of cached messages, the user is given an outdated message $\hat{W}_\theta$, which requires careful handling of the correlation between $W_\theta$ and $\hat{W}_\theta$. Without loss of generality, we relabel the messages such that θ = 1.

Lemma 1 (Interference lower bound)

In the private updating problem, the interference from undesired messages within the answer strings, $\bar{D} - \bar{L}$, satisfies

$$\bar{D} - \bar{L} \ge I\left(W_{2:K}; Q_{1:N}^{[\hat{W}_1,f]}, A_{1:N}^{[\hat{W}_1,f]} \mid W_1, \hat{W}_1\right). \tag{16}$$

Proof:

We start with the right-hand side of (16):

$$
\begin{aligned}
&I(W_{2:K}; Q_{1:N}^{[\hat{W}_1,f]}, A_{1:N}^{[\hat{W}_1,f]} \mid W_1, \hat{W}_1) \\
&= I(W_{2:K}; Q_{1:N}^{[\hat{W}_1,f]}, A_{1:N}^{[\hat{W}_1,f]}, W_1 \mid \hat{W}_1) - I(W_{2:K}; W_1 \mid \hat{W}_1) && (17)\\
&= I(W_{2:K}; Q_{1:N}^{[\hat{W}_1,f]}, A_{1:N}^{[\hat{W}_1,f]} \mid \hat{W}_1) + I(W_{2:K}; W_1 \mid Q_{1:N}^{[\hat{W}_1,f]}, A_{1:N}^{[\hat{W}_1,f]}, \hat{W}_1) && (18)\\
&\overset{(12)}{=} I(W_{2:K}; Q_{1:N}^{[\hat{W}_1,f]}, A_{1:N}^{[\hat{W}_1,f]} \mid \hat{W}_1) && (19)\\
&\overset{(9)}{=} I(W_{2:K}; A_{1:N}^{[\hat{W}_1,f]} \mid Q_{1:N}^{[\hat{W}_1,f]}, \hat{W}_1) && (20)\\
&= H(A_{1:N}^{[\hat{W}_1,f]} \mid Q_{1:N}^{[\hat{W}_1,f]}, \hat{W}_1) - H(A_{1:N}^{[\hat{W}_1,f]} \mid Q_{1:N}^{[\hat{W}_1,f]}, W_{2:K}, \hat{W}_1) && (21)\\
&\overset{(12)}{=} H(A_{1:N}^{[\hat{W}_1,f]} \mid Q_{1:N}^{[\hat{W}_1,f]}, \hat{W}_1) - H(A_{1:N}^{[\hat{W}_1,f]}, W_1 \mid Q_{1:N}^{[\hat{W}_1,f]}, W_{2:K}, \hat{W}_1) && (22)\\
&\le H(A_{1:N}^{[\hat{W}_1,f]} \mid Q_{1:N}^{[\hat{W}_1,f]}, \hat{W}_1) - H(W_1 \mid Q_{1:N}^{[\hat{W}_1,f]}, W_{2:K}, \hat{W}_1) && (23)\\
&\overset{(9)}{=} H(A_{1:N}^{[\hat{W}_1,f]} \mid Q_{1:N}^{[\hat{W}_1,f]}, \hat{W}_1) - H(W_1 \mid W_{2:K}, \hat{W}_1) && (24)\\
&\overset{(13),(6)}{\le} \bar{D} - \bar{L}, && (25)
\end{aligned}
$$

where (18) uses the chain rule together with $I(W_{2:K}; W_1 \mid \hat{W}_1) = 0$, which holds because the messages are independent and $\hat{W}_1$ depends only on $W_1$ and $\bar{W}_1$. This concludes the proof. ∎

Note that if privacy were not a constraint, then $\bar{D} = \bar{L}$ and the interference from undesired messages would be nonexistent. However, when the privacy constraint is present, $\bar{D} - \bar{L}$ characterizes the number of bits that are downloaded and used as side information to preserve privacy from the databases in a given scheme.

Lemma 2 (Induction lemma)

For all k ∈ {2, …, K}, the mutual information term in Lemma 1 can be inductively lower bounded as

$$I\left(W_{k:K}; Q_{1:N}^{[\hat{W}_{k-1},f]}, A_{1:N}^{[\hat{W}_{k-1},f]} \mid W_{1:k-1}, \hat{W}_{k-1}\right) \ge \frac{1}{N} I\left(W_{k+1:K}; Q_{1:N}^{[\hat{W}_k,f]}, A_{1:N}^{[\hat{W}_k,f]} \mid W_{1:k}, \hat{W}_k\right) + \frac{\bar{L}}{N}. \tag{26}$$

Proof:

We start with the left-hand side of (26):

$$
\begin{aligned}
&I(W_{k:K}; Q_{1:N}^{[\hat{W}_{k-1},f]}, A_{1:N}^{[\hat{W}_{k-1},f]} \mid W_{1:k-1}, \hat{W}_{k-1}) \\
&\ge \frac{1}{N}\sum_{n=1}^{N} I(W_{k:K}; Q_n^{[\hat{W}_{k-1},f]}, A_n^{[\hat{W}_{k-1},f]} \mid W_{1:k-1}, \hat{W}_{k-1}) && (27)\\
&\overset{(11)}{=} \frac{1}{N}\sum_{n=1}^{N} I(W_{k:K}; Q_n^{[\hat{W}_k,f]}, A_n^{[\hat{W}_k,f]} \mid W_{1:k-1}, \hat{W}_k) && (28)\\
&\overset{(9)}{=} \frac{1}{N}\sum_{n=1}^{N} I(W_{k:K}; A_n^{[\hat{W}_k,f]} \mid W_{1:k-1}, \hat{W}_k, Q_n^{[\hat{W}_k,f]}) && (29)\\
&\overset{(10)}{=} \frac{1}{N}\sum_{n=1}^{N} H(A_n^{[\hat{W}_k,f]} \mid W_{1:k-1}, \hat{W}_k, Q_n^{[\hat{W}_k,f]}) && (30)\\
&\ge \frac{1}{N}\sum_{n=1}^{N} H(A_n^{[\hat{W}_k,f]} \mid W_{1:k-1}, \hat{W}_k, Q_{1:N}^{[\hat{W}_k,f]}, A_{1:n-1}^{[\hat{W}_k,f]}) && (31)\\
&\overset{(10)}{=} \frac{1}{N}\sum_{n=1}^{N} I(W_{k:K}; A_n^{[\hat{W}_k,f]} \mid W_{1:k-1}, \hat{W}_k, Q_{1:N}^{[\hat{W}_k,f]}, A_{1:n-1}^{[\hat{W}_k,f]}) && (32)\\
&= \frac{1}{N} I(W_{k:K}; A_{1:N}^{[\hat{W}_k,f]} \mid W_{1:k-1}, \hat{W}_k, Q_{1:N}^{[\hat{W}_k,f]}) && (33)\\
&\overset{(9)}{=} \frac{1}{N} I(W_{k:K}; Q_{1:N}^{[\hat{W}_k,f]}, A_{1:N}^{[\hat{W}_k,f]} \mid W_{1:k-1}, \hat{W}_k) && (34)\\
&\overset{(12)}{=} \frac{1}{N} I(W_{k:K}; W_k, Q_{1:N}^{[\hat{W}_k,f]}, A_{1:N}^{[\hat{W}_k,f]} \mid W_{1:k-1}, \hat{W}_k) && (35)\\
&= \frac{1}{N} I(W_{k:K}; Q_{1:N}^{[\hat{W}_k,f]}, A_{1:N}^{[\hat{W}_k,f]} \mid W_{1:k}, \hat{W}_k) + \frac{1}{N} I(W_{k:K}; W_k \mid W_{1:k-1}, \hat{W}_k) && (36)\\
&\overset{(6)}{=} \frac{1}{N} I(W_{k+1:K}; Q_{1:N}^{[\hat{W}_k,f]}, A_{1:N}^{[\hat{W}_k,f]} \mid W_{1:k}, \hat{W}_k) + \frac{\bar{L}}{N}. && (37)
\end{aligned}
$$

This concludes the proof. ∎
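The way Lemma 2 telescopes into Lemma 1 can be checked numerically. The Python sketch below (hypothetical helper names; exact rational arithmetic via the standard `fractions` module) applies (26) for k = K, …, 2, starting from the vanishing term that involves no remaining messages, adds $\bar{L}$ as in (16), and confirms that the result equals $\bar{L}/C$ from (1). It then evaluates both sides of (15), including the Fig. 1 setting. It computes bounds only and does not implement any PIR scheme; when $\bar{L}$ is irrational, the lower bound is evaluated in floating point, which is adequate for these small sizes.

```python
import math
from fractions import Fraction

def inverse_capacity(N, K):
    """1/C = 1 + 1/N + ... + 1/N^(K-1), from (1), as an exact fraction."""
    return sum(Fraction(1, N ** k) for k in range(K))

def unrolled_lower_bound(l_bar, N, K):
    """Value of the bound obtained by chaining (26) for k = K, ..., 2 into (16)."""
    term = Fraction(0)                  # I(W_{K+1:K}; ...) = 0: no messages left
    for _ in range(K - 1):              # one pass per application of (26)
        term = term / N + Fraction(l_bar) / N
    return Fraction(l_bar) + term       # (16): D̄ ≥ L̄ + I(W_{2:K}; ... | W_1, Ŵ_1)

def theorem1_bounds(L, f, N, K):
    """(lower, upper) bounds of (15) on the optimal download cost."""
    V = sum(math.comb(L, i) for i in range(f + 1))
    inv_c = inverse_capacity(N, K)
    if V & (V - 1) == 0:                # V is a power of two, so L̄ is an integer
        lower = math.ceil((V.bit_length() - 1) * inv_c)
    else:                               # L̄ is irrational; use a float estimate
        lower = math.ceil(math.log2(V) * inv_c)
    upper = math.ceil((V - 1).bit_length() * inv_c)   # ⌈L̄⌉ computed exactly
    return lower, upper

# Fig. 1 setting: L = 32 bits, N = 2 databases, K = 10 messages.
plain_pir = math.ceil(32 * inverse_capacity(2, 10))   # re-download W_θ in full
for f in range(0, 17, 4):
    print(f, theorem1_bounds(32, f, 2, 10), "vs. direct PIR:", plain_pir)
```

For the two examples of Section V, this gives (lower, upper) = (3, 3) for L = 3 and (4, 5) for L = 5, with N = K = 2 and f = 1.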

We now apply Lemma 2 recursively to the bound in Lemma 1 to obtain the general lower bound.

Lemma 3

The optimal private updating download cost satisfies the following lower bound:

$$\bar{D}_L \ge \left\lceil \bar{L}\left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K-1}}\right)\right\rceil = \left\lceil \frac{\bar{L}}{C} \right\rceil. \tag{38}$$

Proof:

Any private updating scheme's download cost satisfies the following series of inequalities:

$$
\begin{aligned}
\bar{D} &\overset{(16)}{\ge} \bar{L} + I(W_{2:K}; Q_{1:N}^{[\hat{W}_1,f]}, A_{1:N}^{[\hat{W}_1,f]} \mid W_1, \hat{W}_1) && (39)\\
&\overset{(26)}{\ge} \bar{L} + \frac{\bar{L}}{N} + \frac{1}{N} I(W_{3:K}; Q_{1:N}^{[\hat{W}_2,f]}, A_{1:N}^{[\hat{W}_2,f]} \mid W_{1:2}, \hat{W}_2) && (40)\\
&\overset{(26)}{\ge} \bar{L} + \frac{\bar{L}}{N} + \frac{\bar{L}}{N^2} + \frac{1}{N^2} I(W_{4:K}; Q_{1:N}^{[\hat{W}_3,f]}, A_{1:N}^{[\hat{W}_3,f]} \mid W_{1:3}, \hat{W}_3) && (41)\\
&\overset{(26)}{\ge} \cdots && (42)\\
&\overset{(26)}{\ge} \bar{L}\left(1 + \frac{1}{N} + \cdots + \frac{1}{N^{K-1}}\right) \overset{(1)}{=} \frac{\bar{L}}{C}. && (43)
\end{aligned}
$$

Since (43) lower bounds the download cost $\bar{D}$ of any private updating scheme, it also lower bounds the download cost of the optimal private updating scheme, $\bar{D}_L$. Finally, since $\bar{D}_L$ is an integer, we take the ceiling of (43) to get (38). ∎

This shows that the first inequality of (15) holds, and concludes the converse proof.

V. PROOF OF THE MAIN RESULT: ACHIEVABILITY

Our achievable scheme makes use of the correlation between $W_\theta$ and $\hat{W}_\theta$, through the knowledge of their maximum Hamming distance f, in order to reduce the download cost. This approach is related to the problem tackled in [28] (without privacy constraints), in which a source is compressed given that it is correlated with some side information that is available only at the decoder. In our case, the retrieving user represents the decoder, with side information $\hat{W}_\theta$. By the Slepian-Wolf coding theorem [31], one can noiselessly compress the source $W_\theta$ at the rate $H(W_\theta \mid \hat{W}_\theta) = \bar{L}$. The compressed source is treated as a new message to be downloaded using a PIR scheme, as opposed to downloading the whole message $W_\theta$. Such a scheme, however, has a message length constraint (unlike most PIR works in the literature). For that reason, we leverage tools from the PIR scheme with arbitrary message length in [30] to accomplish our task. The details are illustrated in the motivating examples below.

A. Example: N = 2, K = 2, L = 3, and f = 1

In this example, we have $\bar{L} = \log_2(1 + 3) = 2$ and $C = 2/3$. We show that $\bar{D} = \lceil \lceil \bar{L} \rceil / C \rceil = 3$ bits is achievable. We start by constructing a [3, 1, 3] linear block code, which in this case is a repetition code with generator matrix G and parity-check matrix H such that

$$G = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}, \quad H \in \{0,1\}^{2 \times 3}, \quad G H^T = 0. \tag{44}$$

Note that such a code is capable of correcting at most f = 1 error. The syndromes associated with this code are $s \in \{00, 01, 10, 11\}$. Observe that the length of s is exactly $\lceil \bar{L} \rceil$. Instead of requesting $W_\theta$, the user retrieves the index of the coset in which $W_\theta$ resides in the code's standard array, that is, its corresponding syndrome

$$s_\theta = W_\theta H^T. \tag{45}$$

The user then compares $\hat{W}_\theta$ to all the words in that coset and decodes $W_\theta$ as the one closest in Hamming distance. This is guaranteed to yield the unique correct message [28]. Therefore, the syndrome $s_\theta$ efficiently represents the flipped bits' indices $\bar{W}_\theta$, and one is able to reduce the effective message length from L = 3 to $\lceil \bar{L} \rceil = 2$ by dealing with the syndrome $s_\theta$ instead of $W_\theta$.

Let $W_1 = [a_1, a_2, a_3]$ and $W_2 = [b_1, b_2, b_3]$. The syndromes (the new messages) are given by

$$s_1 = W_1 H^T \triangleq \begin{bmatrix} \bar{a}_1 & \bar{a}_2 \end{bmatrix}, \tag{46}$$

$$s_2 = W_2 H^T \triangleq \begin{bmatrix} \bar{b}_1 & \bar{b}_2 \end{bmatrix}, \tag{47}$$

where each syndrome bit is the modulo-2 sum of two message bits. Assume θ = 1. Since $\lceil \bar{L} \rceil = N^{K-1}$, we can apply a non-symmetric PIR scheme [30] as follows to decode $s_1$:

Database 1: $\bar{a}_1$, $\bar{b}_1$
Database 2: $\bar{a}_2 + \bar{b}_1$

This has a download cost of $\bar{D} = 3$ bits, which is optimal in this case since it meets the converse bound.

The repetition code used in this example is a perfect code. While this makes $\bar{L}$ an integer and meets the converse bound, perfect codes are scarce. In the next example, we show how the proposed scheme performs with non-perfect codes.

B. Example: N = 2, K = 2, L = 5, and f = 1

In this example, we have $\bar{L} = \log_2(1 + 5) \approx 2.585$ and $C = 2/3$. We show that $\bar{D} = \lceil \lceil \bar{L} \rceil / C \rceil = 5$ bits is achievable. As in the previous example, we start by constructing a [5, 2, 3] linear block code. Unlike the previous example, however, this is not a repetition code; it is characterized by a generator matrix G and a parity-check matrix H with

$$G \in \{0,1\}^{2 \times 5}, \quad H \in \{0,1\}^{3 \times 5}, \quad G H^T = 0. \tag{48}$$

The syndromes s have length $\lceil \bar{L} \rceil = 3$. Specifically, with $W_1 = [a_1, \ldots, a_5]$ and $W_2 = [b_1, \ldots, b_5]$,

$$s_1 = W_1 H^T \triangleq \begin{bmatrix} \bar{a}_1 & \bar{a}_2 & \bar{a}_3 \end{bmatrix}, \tag{49}$$

$$s_2 = W_2 H^T \triangleq \begin{bmatrix} \bar{b}_1 & \bar{b}_2 & \bar{b}_3 \end{bmatrix}, \tag{50}$$

where, for the H used here, each of the first two syndrome bits is the modulo-2 sum of three message bits and the third is the modulo-2 sum of two message bits.

Since $\lceil \bar{L} \rceil = N^{K-1} + 1$, we follow the methodology in [30]: we privately download $N^{K-1} = 2$ bits ($\bar{a}_1$ and $\bar{a}_2$) using the non-symmetric PIR scheme of the previous example, and then privately download the remaining bit ($\bar{a}_3$) using the scheme in [32]. In this case, the technique in [32] works as follows: the user requests a random linear combination of $[\bar{a}_3 \ \bar{b}_3]$ from database 1 using a random binary vector $h = [h_1 \ h_2]$, and the same from database 2 yet with $h' = h + e_\theta$, where $e_i$ is the i-th standard basis vector. The full PIR scheme is as follows:

Database 1: $\bar{a}_1$, $\bar{b}_1$, $h_1\bar{a}_3 + h_2\bar{b}_3$
Database 2: $\bar{a}_2 + \bar{b}_1$, $(h_1 + 1)\bar{a}_3 + h_2\bar{b}_3$

This has a download cost of $\bar{D} = 5$ bits, which is 1 bit away from the converse bound, $\lceil \bar{L}/C \rceil = 4$, since the code used is not perfect.

C. The General Scheme

For general N, K, L, and f, we construct an $[L, L - \lceil \bar{L} \rceil, 2f + 1]$ linear block code. From the Gilbert-Varshamov bound [29], we know that such a code exists if

$$2^{\lceil \bar{L} \rceil} \le \sum_{j=0}^{2f} \binom{L}{j}. \tag{51}$$

In addition, such a code must satisfy the Hamming bound [29]:

$$\sum_{j=0}^{f} \binom{L}{j} \le 2^{\lceil \bar{L} \rceil}. \tag{52}$$

By the definition of $\bar{L}$ in (5), both (51) and (52) are satisfied, and so the code exists and is able to correct f bit flips.

Next, we map each message to its corresponding syndrome with respect to the constructed code, which is of length $L - (L - \lceil \bar{L} \rceil) = \lceil \bar{L} \rceil$. The user then retrieves the syndrome $s_\theta$ according to a PIR scheme with N databases, K messages, and message length $\lceil \bar{L} \rceil$. By [30, Theorem 1], a download cost of $\lceil \lceil \bar{L} \rceil / C \rceil$ is achievable in this case. Finally, correctness is guaranteed since querying for the syndrome $s_\theta$ allows the user to decode $W_\theta$ as the unique word in the syndrome's coset with the least Hamming distance from $\hat{W}_\theta$ [28].

This shows that the second inequality of (15) holds, and concludes the achievability proof.

VI. CONCLUSIONS AND DISCUSSIONS

In this work, a novel private updating problem has been introduced, in which a user's outdated message is to be privately updated by querying a set of replicated and non-colluding databases that have the up-to-date version. Under a Hamming distortion measure between the outdated and the up-to-date messages, a syndrome decoding technique is leveraged to compress the number of bits that need to be downloaded in order to correctly update the message. This has been combined with PIR schemes with message length constraints to guarantee privacy. The proposed private updating scheme has been shown to be optimal when the system parameters enable the construction of a perfect code on which the syndrome decoding technique is based. In other cases, the achievable download cost has been shown to be within at most 2 bits of a derived converse bound.

The model of this paper assumes that the Hamming distortion between $W_\theta$ and $\hat{W}_\theta$ is upper bounded by f. If, instead, the Hamming distortion is known to be exactly f′, then the download cost can be reduced, and using codes to map the messages into syndromes may be unnecessary. To see this, consider an example with N = 2, K = 2, L = 8, and f = 1. In this case, $\lceil \bar{L} \rceil = 4$, and by Theorem 1, a download cost of 6 bits is achievable. Let us now set f′ = 1, i.e., the user knows that $W_\theta$ and $\hat{W}_\theta$ differ in exactly 1 bit. Assuming θ = 1, define $\bar{a}_1 \triangleq a_1 + a_2 + a_3 + a_4$, $\bar{a}_2 \triangleq a_1 + a_2 + a_5 + a_6$, and $\bar{a}_3 \triangleq a_1 + a_3 + a_5 + a_7$, where the $a_i$'s represent the bits of the desired message $W_1$. The user then forms similar combinations using the bits of the outdated message $\hat{W}_1$ to get $\hat{a}_1$, $\hat{a}_2$, and $\hat{a}_3$. Now observe that possessing the new message $[\bar{a}_1, \bar{a}_2, \bar{a}_3]$ is sufficient to determine the position of the flipped bit by comparing it to $[\hat{a}_1, \hat{a}_2, \hat{a}_3]$. For instance, if $\bar{a}_i = \hat{a}_i, \forall i$, then $\hat{W}_1(8)$ needs to be flipped. If, on the other hand, $\bar{a}_i \neq \hat{a}_i, \forall i$, then $\hat{W}_1(1)$ needs to be flipped.
While if $\bar{a}_1 \neq \hat{a}_1$ and $\bar{a}_i = \hat{a}_i$ for i = 2, 3, then $\hat{W}_1(4)$ needs to be flipped, and so on. Therefore, the effective message length is reduced to 3 (as opposed to $\lceil \bar{L} \rceil = 4$), and a download cost of 5 bits is achievable.

The above procedure can also be carried out using a bisection search approach. The user can first retrieve $a_1 + a_2 + a_3 + a_4$ and compare it to the sum of the outdated message's first 4 bits. If they are equal, then the error must lie in the last 4 bits of $\hat{W}_1$. Assuming this is the case, the user downloads $a_5 + a_6$ and compares it to $\hat{W}_1(5) + \hat{W}_1(6)$. If they too are equal, then the error must lie in the last 2 bits of $\hat{W}_1$. Assuming this is the case as well, the user finally downloads $a_7$ and compares it to $\hat{W}_1(7)$. If they are equal, then $\hat{W}_1(8)$ needs to be flipped. We see that this bisection approach has an effective message length of $\log_2(L) = 3$ bits. However, since the structure of each query depends on the answers to the previous queries, a multi-round PIR scheme needs to be devised in this case [33]. It would be interesting to extend the results of this paper to the case of known distortion f′ (and, more generally, to other notions of correlation between $W_\theta$ and $\hat{W}_\theta$), which may be relevant in certain applications.

REFERENCES

[1] B. Chor, E. Kushilevitz, O. Goldreich, and M. Sudan, “Private information retrieval,” J. ACM, vol. 45, pp. 965–981, Nov. 1998.
[2] H. Sun and S. A. Jafar, “The capacity of private information retrieval,” IEEE Trans. Inf. Theory, vol. 63, pp. 4075–4088, July 2017.
[3] K. Banawan and S. Ulukus, “The capacity of private information retrieval from coded databases,” IEEE Trans. Inf. Theory, March 2018.
[4] H. Sun and S. A. Jafar, “The capacity of symmetric private information retrieval,” IEEE Trans. Inf. Theory, vol. 65, pp. 322–329, January 2019.
[5] K. Banawan and S. Ulukus, “Multi-message private information retrieval: Capacity results and near-optimal schemes,” IEEE Trans. Inf. Theory, vol. 64, pp. 6842–6862, October 2018.
[6] R. Tajeddine, O. W. Gnilke, D. Karpuk, R. Freij-Hollanti, C. Hollanti, and S. E. Rouayheb, “Private information retrieval schemes for coded data with arbitrary collusion patterns,” in Proc. IEEE ISIT, June 2017.
[7] Q. Wang and M. Skoglund, “On PIR and symmetric PIR from colluding databases with adversaries and eavesdroppers,” IEEE Trans. Inf. Theory, vol. 65, pp. 3183–3197, May 2019.
[8] C. Tian, H. Sun, and J. Chen, “Capacity-achieving private information retrieval codes with optimal message size and upload cost,” IEEE Trans. Inf. Theory, vol. 65, pp. 7613–7627, November 2019.
[9] T. Guo, R. Zhou, and C. Tian, “On the information leakage in private information retrieval systems,” IEEE Trans. Inf. Forensics Security, vol. 15, pp. 2999–3012, March 2020.
[10] K. Banawan and S. Ulukus, “The capacity of private information retrieval from Byzantine and colluding databases,” IEEE Trans. Inf. Theory, vol. 65, pp. 1206–1219, February 2019.
[11] M. A. Attia, D. Kumar, and R. Tandon, “The capacity of private information retrieval from uncoded storage constrained databases,” IEEE Trans. Inf. Theory, vol. 66, pp. 6617–6634, November 2020.
[12] H. Sun and S. A. Jafar, “The capacity of private computation,” IEEE Trans. Inf. Theory, vol. 65, pp. 3880–3897, June 2019.
[13] S. Kumar, A. G. i Amat, E. Rosnes, and L. Senigagliesi, “Private information retrieval from a cellular network with caching at the edge,” IEEE Trans. Commun., vol. 67, pp. 4900–4912, July 2019.
[14] N. Raviv, I. Tamo, and E. Yaakobi, “Private information retrieval in graph-based replication systems,” IEEE Trans. Inf. Theory, vol. 66, pp. 3590–3602, June 2020.
[15] X. Yao, N. Liu, and W. Kang, “The capacity of multi-round private information retrieval from Byzantine databases,” in Proc. IEEE ISIT, July 2019.
[16] I. Samy, R. Tandon, and L. Lazos, “On the capacity of leaky private information retrieval,” in Proc. IEEE ISIT, July 2019.
[17] R. G. L. D’Oliveira and S. El Rouayheb, “One-shot PIR: Refinement and lifting,” IEEE Trans. Inf. Theory, vol. 66, pp. 2443–2455, April 2020.
[18] Z. Jia and S. Jafar, “X-secure T-private federated submodel learning,” [Online]. Available: arXiv:2010.01059.
[19] Z. Chen, Z. Wang, and S. A. Jafar, “The capacity of T-private information retrieval with private side information,” IEEE Trans. Inf. Theory, vol. 66, pp. 4761–4773, August 2020.
[20] Y.-P. Wei, K. Banawan, and S. Ulukus, “The capacity of private information retrieval with partially known private side information,” IEEE Trans. Inf. Theory, vol. 65, pp. 8222–8231, December 2019.
[21] Y.-P. Wei and S. Ulukus, “The capacity of private information retrieval with private side information under storage constraints,” IEEE Trans. Inf. Theory, 2019. Early Access.
[22] S. P. Shariatpanahi, M. J. Siavoshani, and M. A. Maddah-Ali, “Multi-message private information retrieval with private side information,” in Proc. IEEE ITW, November 2018.
[23] A. Heidarzadeh, B. Garcia, S. Kadhe, S. E. Rouayheb, and A. Sprintson, “On the capacity of single-server multi-message private information retrieval with side information,” in Proc. Allerton, October 2018.
[24] S. Li and M. Gastpar, “Single-server multi-message private information retrieval with side information,” in Proc. Allerton, October 2018.
[25] S. Kadhe, B. Garcia, A. Heidarzadeh, S. El Rouayheb, and A. Sprintson, “Private information retrieval with side information,” IEEE Trans. Inf. Theory, vol. 66, pp. 2032–2043, April 2020.
[26] Z. Chen, Z. Wang, and S. A. Jafar, “The asymptotic capacity of private search,” IEEE Trans. Inf. Theory, vol. 66, pp. 4709–4721, August 2020.
[27] Z. Wang, K. Banawan, and S. Ulukus, “Private set intersection: A multi-message symmetric private information retrieval perspective,” [Online]. Available: arXiv:1912.13501.
[28] S. S. Pradhan and K. Ramchandran, “Distributed source coding using syndromes (DISCUS): design and construction,” IEEE Trans. Inf. Theory, vol. 49, pp. 626–643, March 2003.
[29] R. E. Blahut, Algebraic Codes for Data Transmission. Cambridge University Press, 2003.
[30] H. Sun and S. A. Jafar, “Optimal download cost of private information retrieval for arbitrary message length,” IEEE Trans. Inf. Forensics Security, vol. 12, pp. 2920–2932, December 2017.
[31] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Inf. Theory, vol. IT-19, pp. 471–480, July 1973.
[32] N. B. Shah, K. V. Rashmi, and K. Ramchandran, “One extra bit of download ensures perfectly private information retrieval,” in Proc. IEEE ISIT, June 2014.
[33] H. Sun and S. A. Jafar, “Multiround private information retrieval: Capacity and storage overhead,”