Capacity-Achieving Private Information Retrieval Schemes from Uncoded Storage Constrained Servers with Low Sub-packetization

Abstract

This paper investigates reducing sub-packetization of capacity-achieving schemes for uncoded Storage Constrained Private Information Retrieval (SC-PIR) systems. In the SC-PIR system, a user aims to retrieve one out of K files from N servers while revealing nothing about its identity to any individual server, in which the K files are stored at the N servers in an uncoded form and each server can store up to \mu K equivalent files, where \mu is the normalized storage capacity of each server. We first prove that there exists a capacity-achieving SC-PIR scheme for a given storage design if and only if all the packets are stored exactly at M\triangleq \mu N servers for \mu such that M=\mu N\in\{2,3,\ldots,N\}. Then, the optimal sub-packetization for capacity-achieving linear SC-PIR schemes is characterized as the solution to an optimization problem, which is typically hard to solve because of involving indicator functions. Moreover, a new notion of array called Storage Design Array (SDA) is introduced for the SC-PIR system. With any given SDA, an associated capacity-achieving SC-PIR scheme is constructed. Next, the SC-PIR schemes that have equal-size packets are investigated. Furthermore, the optimal equal-size sub-packetization among all capacity-achieving linear SC-PIR schemes characterized by Woolsey et al. is proved to be \frac{N(M-1)}{\gcd(N,M)}. Finally, by allowing unequal size of packets, a greedy SDA construction is proposed, where the sub-packetization of the associated SC-PIR scheme is upper bounded by \frac{N(M-1)}{\gcd(N,M)}. Among all capacity-achieving linear SC-PIR schemes, the sub-packetization is optimal when \min\{M,N-M\}|N or M=N, and within a multiplicative gap \frac{\min\{M,N-M\}}{\gcd(N,M)} of the optimal one otherwise. In particular, for the case N=d\cdot M\pm1 where d\geq 2, another SDA is constructed to obtain lower sub-packetization.


arXiv [cs.IT], Feb

Capacity-Achieving Private Information Retrieval Schemes from Uncoded Storage Constrained Servers with Low Sub-packetization

Jinbao Zhu, Qifa Yan, Xiaohu Tang, and Ying Miao

Abstract

This paper investigates reducing sub-packetization of capacity-achieving schemes for uncoded Storage Constrained Private Information Retrieval (SC-PIR) systems. In the SC-PIR system, a user aims to download one out of $K$ files from $N$ servers while revealing nothing about the identity of the requested file to any individual server, in which the $K$ files are stored at the $N$ servers in an uncoded form and each server can store up to $\mu K$ equivalent files, where $\mu$ is the normalized storage capacity of each server. We first prove that there exists a capacity-achieving SC-PIR scheme for a given storage design if and only if all the packets are stored exactly at $M \triangleq \mu N$ servers for $\mu$ such that $M = \mu N \in \{2, 3, \ldots, N\}$. Then, the optimal sub-packetization for capacity-achieving linear SC-PIR schemes is characterized as the solution to an optimization problem, which is typically hard to solve since it involves non-continuous indicator functions. Moreover, a new notion of array called Storage Design Array (SDA) is introduced for the SC-PIR system. With any given SDA, an associated capacity-achieving SC-PIR scheme is constructed. Next, the SC-PIR schemes that have equal-size packets are investigated. Furthermore, the optimal equal-size sub-packetization among all capacity-achieving linear SC-PIR schemes characterized by Woolsey et al. is proved to be $\frac{N(M-1)}{\gcd(N,M)}$, which is achieved by a construction of SDA. Finally, by allowing unequal size of packets, a greedy SDA construction is proposed, where the sub-packetization of the associated SC-PIR scheme is upper bounded by $\frac{N(M-1)}{\gcd(N,M)}$. Among all capacity-achieving linear SC-PIR schemes, the sub-packetization is optimal when $\min\{M, N-M\} \mid N$ or $M = N$, and within a multiplicative gap $\frac{\min\{M, N-M\}}{\gcd(N,M)}$ of the optimal one in general. In particular, for the special case $N = d \cdot M \pm 1$ where the positive integer $d \geq 2$, we propose another SDA construction to obtain lower sub-packetization.

Index Terms

Private information retrieval, uncoded, storage constrained servers, sub-packetization, capacity-achieving, storage design array.

I. INTRODUCTION

Along with the rapid advancement of Distributed Storage Systems (DSSs), protecting the download privacy of a user against public servers is of vital importance. The problem of Private Information Retrieval (PIR) was first introduced by Chor et al. in [9] and has subsequently attracted remarkable attention within the computer science community [9], [11], [18], [35]. In the classical framework, a user wishes to retrieve one out of $K$ files from $N$ servers, each of which stores the whole library of $K$ files, while ensuring that no server can learn any information about the file index being requested. To this end, the user sends a query string to each server. Then the server responds truthfully with an answer string depending on the received query and the contents stored. Finally, the user correctly decodes the requested file from the answers. Note that to prevent each server from obtaining information about which file is being requested, the query distribution has to be marginally independent of the desired file index.

A trivial strategy is to download all the $K$ files in the library no matter which file is requested by the user, but this results in impractical communication cost, especially in a modern DSS, which typically maintains a large number of files. In the seminal work [9], where each file is of one bit size, the communication cost was measured by the sum of the upload cost (the total size of query strings) and the download cost (the total size of answer strings). In the sense of information-theoretic security, which assures privacy even if the servers have unbounded computational power, it was shown in [9] that the naive strategy is the only feasible solution for a single server, whereas low communication cost can be attained by replicating the files at multiple non-colluding servers. To improve the efficiency, single-server PIR has been widely studied in the sense of computational security, whose privacy is guaranteed by computationally hard problems, for example, problems related to the so-called $\Phi$-hiding number-theoretic assumption [6], [12], trapdoor permutations [15], [4], or quadratic/composite residuosity [13], [8]. These works improve the efficiency at the cost of a non-zero probability of disclosing information relevant to the identity of the requested file.

J. Zhu, Q. Yan and X. Tang are with the Information Security and National Computing Grid Laboratory, Southwest Jiaotong University, Chengdu 611756, China (email: [email protected], [email protected], [email protected]). Y. Miao is with the Faculty of Engineering, Information and Systems, University of Tsukuba, Tennodai 1-1-1, Tsukuba 305-8573, Japan (e-mail: [email protected]).

Instead of retrieving a single bit, Shannon theory allows the file size to be arbitrarily large, and therefore the upload cost can be neglected compared to the download cost since it does not scale with file size [20], [7], [22], [3], [25]. Then, the communication efficiency is usually measured by the retrieval rate, defined as the number of bits that the user can privately retrieve per bit of downloaded data across all random realizations of queries. In particular, the supremum of retrieval rates over all achievable schemes is called the capacity. To implement a PIR scheme, the files typically need to be partitioned into non-overlapping packets. The number of packets is referred to as sub-packetization in the literature. The sub-packetization reflects the complexity of the scheme in practice and is preferred to be as small as possible. This is because any practical scheme requires each packet to include some header information for the user to decode [27], and the header overhead may be non-negligible when there are a large number of packets. This problem has already been noticed in other applications, for example coded caching [32], [33], [34], [27], [21].

In 2014, Shah et al.
revisited the PIR problem and reported an interesting scheme achieving the PIR rate $1 - \frac{1}{N}$ and requiring sub-packetization $N - 1$ [20]. Later, in the influential work by Sun and Jafar [22], the exact PIR capacity was characterized as $\left(1 + \frac{1}{N} + \ldots + \frac{1}{N^{K-1}}\right)^{-1}$ for any $N$ and $K$. However, to achieve the capacity, the smallest sub-packetization of the PIR schemes proposed in [22] is $N^K$, which increases exponentially with the number of files $K$ and thus is impractical even for a moderate number of files. Soon afterwards, the sub-packetization was decreased to $N^{K-1}$ in [23], which was proved to be optimal under the assumption that the download cost is identical over all random realizations of queries. In a very recent work [28], Tian et al. introduced a new capacity-achieving scheme, which incurs different download costs for distinct realizations of queries. As a result, the sub-packetization was decreased to $N - 1$, which is independent of $K$ and was shown to be optimal among all the capacity-achieving PIR schemes.

A common assumption in the aforementioned results is that each server has sufficiently large storage capacity to store all the files in the library, i.e., a repetition code is used to store the files across servers. Though repetition coding offers simplicity in designing PIR schemes and high immunity against server failures, it suffers from extremely large storage cost. The storage cost in a PIR system has been widely investigated in terms of the coding structures in the storage design, such as specific Maximum Distance Separable (MDS) codes [3], [25], [37], uncoded storage [26], [1], [31], and other more complicated coding techniques [20], [10], [5], [19], [36], [14], [2]. Moreover, the tradeoff between the storage cost and retrieval rate was considered without any explicit constraints on the storage codes [24], [29], [30].

As the first step toward exactly characterizing the tradeoff between storage cost and retrieval rate, Tandon et al.
formulated the problem of uncoded Storage Constrained PIR (SC-PIR) in [26], [1]. In this setup, each server can store up to $\mu K L$ symbols by some storage design, where $\frac{1}{N} \leq \mu \leq 1$ is the normalized storage and $L$ is the number of symbols of each file. The capacity of SC-PIR was proved in [1] to be $\left(1 + \frac{1}{M} + \ldots + \frac{1}{M^{K-1}}\right)^{-1}$ for $M \triangleq \mu N \in \{1, \ldots, N\}$. However, the capacity-achieving SC-PIR scheme in [26], [1] has sub-packetization $\binom{N}{M} M^K$. Thus the problem of high sub-packetization shows up again in this SC-PIR model. Recently, Woolsey et al. [31] proposed a general construction of SC-PIR schemes by establishing the connection between storage design and Storage Full PIR (SF-PIR, i.e., the case that each server can store all the files). Then, the sub-packetization to achieve the capacity of the SC-PIR system was reduced to $N M^{K-1}$ in [31], which also increases exponentially with $K$.

In this paper, we are interested in characterizing the optimal sub-packetization to achieve the capacity of SC-PIR systems. Note from the previous works [1], [31] that linear schemes are sufficient to achieve the capacity of SC-PIR. Additionally, it was proved in [1] that, for any $\mu$ with $\frac{1}{N} \leq \mu \leq 1$, the capacity of the SC-PIR system can be achieved by the memory-sharing technique between the discrete points such that $M \in \{1, 2, \ldots, N\}$, where $M = 1$ is a trivial case since the user has to download all the contents stored at the $N$ servers to assure privacy. Therefore, the problem comes down to the case $M \in \{2, 3, \ldots, N\}$ for linear SC-PIR schemes, which is the focus of this paper.
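To make the growth rates quoted above concrete, the following minimal sketch evaluates the capacity expression and the sub-packetization levels mentioned in the text, together with the value $\frac{N(M-1)}{\gcd(N,M)}$ stated in the abstract, for one choice of $(N, M, K)$. The function names `capacity` and `subpacketization_overview` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction
from math import comb, gcd


def capacity(M: int, K: int) -> Fraction:
    """Capacity expression (1 + 1/M + ... + 1/M^(K-1))^(-1) for integer M >= 1."""
    return 1 / sum(Fraction(1, M**k) for k in range(K))


def subpacketization_overview(N: int, M: int, K: int) -> dict:
    """Sub-packetization levels quoted in the text for the same (N, M, K)."""
    return {
        "scheme of [26], [1]": comb(N, M) * M**K,
        "scheme of [31]": N * M ** (K - 1),
        # N / gcd(N, M) is an integer, so the floor division below is exact.
        "N(M-1)/gcd(N,M)": N * (M - 1) // gcd(N, M),
    }
```

For example, `subpacketization_overview(4, 2, 3)` compares the three levels for four servers, $M = 2$ and three files; the first two entries grow exponentially in `K`, while the last does not depend on `K`.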
The contributions of this paper are:

1) We prove that there exists a capacity-achieving SC-PIR scheme for a given storage design if and only if all the packets are stored exactly at $M$ servers in the storage phase.

2) We characterize the optimal sub-packetization of capacity-achieving linear SC-PIR schemes by an optimization problem. Consequently, a general construction of capacity-achieving linear SC-PIR schemes with optimal sub-packetization can be obtained based on the optimal solution of this optimization problem.

3) Storage Design Array (SDA) is introduced to obtain feasible solutions to the optimization problem. Any given SDA is associated with a practical capacity-achieving linear SC-PIR scheme with low sub-packetization.

4) We prove that the optimal equal-size sub-packetization is $\frac{N(M-1)}{\gcd(N,M)}$ among all the classes of capacity-achieving linear SC-PIR schemes characterized by Woolsey et al. [31].

5) In order to further decrease the sub-packetization of capacity-achieving SC-PIR schemes, we investigate the problem under a more general assumption, i.e., the sizes of the packets are allowed to be unequal. In particular, a greedy algorithm is proposed to construct SDA for any positive integers $N, M$ such that $2 \leq M \leq N$. The sub-packetization of the associated SC-PIR scheme is shown to be optimal among all capacity-achieving linear SC-PIR schemes when $\min\{M, N-M\} \mid N$ or $M = N$. In the other cases, the sub-packetization is within a multiplicative gap $\frac{\min\{M, N-M\}}{\gcd(N,M)}$ compared to its lower bound. Moreover, for the case $N = d \cdot M \pm 1$ where the integer $d \geq 2$, we propose another construction of SDA to achieve lower sub-packetization compared to the greedy SDA.

The rest of this paper is organized as follows. In Section II, we introduce the system model and problem formulation. In Section III, we establish an information-theoretic lower bound on sub-packetization of capacity-achieving linear SC-PIR schemes.
In Section IV, we characterize a generic construction of capacity-achieving linear SC-PIR schemes with optimal sub-packetization. In Section V, we introduce SDA to construct capacity-achieving SC-PIR schemes with low sub-packetization. In Section VI, we present the results under the assumption of equal-size packets. Section VII proposes two SDA constructions and proves the optimality of the resultant sub-packetization. Finally, the paper is concluded in Section VIII.

The following notation is used throughout this paper.

• For any integers $n, m, s, N$ with $n \leq m$, $[n:m]$ and $([n:m]+s)_N$ respectively denote the sets $\{n, n+1, \ldots, m\}$ and $\{i + s \pmod{N} : n \leq i \leq m\}$;

• For a finite set $\mathcal{S}$, $|\mathcal{S}|$ denotes its cardinality;

• Denote by $A_{1:m}$ a vector $(A_1, \ldots, A_m)$, and define $A_\Gamma$ as $(A_{\gamma_1}, \ldots, A_{\gamma_k})$ for any index set $\Gamma = \{\gamma_1, \ldots, \gamma_k\} \subseteq [1:m]$ with $\gamma_1 < \ldots < \gamma_k$ or any index vector $\Gamma = (\gamma_1, \ldots, \gamma_k)$;

• Define $\mathbb{1}(x)$ as a function of a logical variable $x$, i.e., $\mathbb{1}(x) = 1$ if $x$ is true and $\mathbb{1}(x) = 0$ otherwise.

II. SYSTEM MODEL

Let $\mathbb{F}_q$ be the finite field for a prime power $q$. Consider a non-colluding PIR system with $K$ files $W_1, \ldots, W_K \in \mathbb{F}_q^{L \times 1}$ stored across $N$ servers in an uncoded fashion. Each file comprises $L$ i.i.d. uniform symbols over $\mathbb{F}_q$, i.e.,

$$H(W_1) = \ldots = H(W_K) = L, \tag{1}$$

$$H(W_1, \ldots, W_K) = \sum_{k=1}^{K} H(W_k), \tag{2}$$

where the entropy function $H(\cdot)$ is measured with logarithm base $q$. Let $Z_n$ ($n \in [1:N]$) be the contents stored at server $n$, which are subject to the storage capacity of server $n$. The storage constraint for each server is then

$$H(Z_n) \leq \mu K L, \quad \forall\, n \in [1:N], \tag{3}$$

where $\mu$ is the normalized storage capacity. Notice that, when $\mu < \frac{1}{N}$, the total storage capacity of the $N$ servers is insufficient to store all the $K$ files. For $\mu = 1$, each server can store all the $K$ files. Thus, we are interested in the case $\frac{1}{N} \leq \mu \leq 1$.

The system operates in the following two phases:

Storage Phase:

Each file $W_k$ is partitioned into $F$ disjoint packets, which it is convenient to label as $W_{k,1}, W_{k,2}, \ldots, W_{k,F}$, where $W_{k,i}$ is the $i$-th packet of file $W_k$. By convention, we call $F$ the sub-packetization. Then, for any $k \in [1:K]$,

$$W_k = \{W_{k,i} : i \in [1:F]\}, \tag{4}$$

$$H(W_k) = \sum_{i=1}^{F} H(W_{k,i}). \tag{5}$$

Clearly, each of these packets must be stored at at least one server because of the constraint of reliable decoding. In particular, all the files are partitioned and stored in the same manner, i.e.,

$$H(W_{1,i}) = H(W_{2,i}) = \ldots = H(W_{K,i}), \quad \forall\, i \in [1:F], \tag{6}$$

$$Z_n = \{W_{k,i} : k \in [1:K],\ i \in \mathcal{Z}_n\}, \quad \forall\, n \in [1:N], \tag{7}$$

where $\mathcal{Z}_n$ is a subset of $[1:F]$ such that $Z_n$ satisfies (3). In other words, $\mathcal{Z}_n \subseteq [1:F]$ consists of the indices of packets stored at server $n$.

Retrieval Phase:

A user selects an index $\theta \in [1:K]$ privately and wishes to retrieve the file $W_\theta$ from the system without disclosing any information about $\theta$ to any individual server. For this purpose, the user generates $N$ queries $Q^{[\theta]}_{1:N}$ and sends $Q^{[\theta]}_n$ to server $n \in [1:N]$. The queries are generated independently of the file realizations, i.e.,

$$I(Q^{[\theta]}_{1:N}; W_{1:K}) = 0, \quad \forall\, \theta \in [1:K], \tag{8}$$

where $I(\cdot\,;\cdot)$ is the mutual information function. Upon receiving the query $Q^{[\theta]}_n$, server $n$ responds with an answer $A^{[\theta]}_n$, which is determined by the received query and its stored contents. Thus, by the data processing inequality,

$$H(A^{[\theta]}_n \mid Q^{[\theta]}_n, Z_n) = H(A^{[\theta]}_n \mid Q^{[\theta]}_n, W_{1:K}) = 0, \quad \forall\, n \in [1:N]. \tag{9}$$

Finally, from all the answers $A^{[\theta]}_{1:N}$ collected from the $N$ servers, the user must be able to decode the desired file $W_\theta$ correctly, i.e.,

$$H(W_\theta \mid A^{[\theta]}_{1:N}, Q^{[\theta]}_{1:N}) = 0, \quad \forall\, \theta \in [1:K]. \tag{10}$$

To ensure privacy, the strategies for retrieving any two files $W_\theta$ and $W_{\theta'}$ must be indistinguishable from the viewpoint of any individual server, i.e.,

$$(Q^{[\theta]}_n, A^{[\theta]}_n, Z_n) \sim (Q^{[\theta']}_n, A^{[\theta']}_n, Z_n), \quad \forall\, \theta, \theta' \in [1:K],\ \forall\, n \in [1:N], \tag{11}$$

where $X \sim Y$ means that the random variables $X$ and $Y$ are identically distributed. Equivalently, the desired index $\theta$ must be hidden from all the information available to each server, i.e.,

$$I(Q^{[\theta]}_n, A^{[\theta]}_n, Z_n; \theta) = 0, \quad \forall\, n \in [1:N]. \tag{12}$$

Throughout this paper, we refer to this system as a $(\mu, N, K)$ Storage Constrained PIR (SC-PIR) system. If $\mu = 1$, the system is also referred to as an $(N, K)$ Storage Full PIR (SF-PIR) system.

In order to measure the performance of SC-PIR systems, the following two quantities are considered:

1. The sub-packetization $F$, which reflects the complexity of the SC-PIR scheme in practical applications, and thus is preferred to be as small as possible.

2. The retrieval rate $R$, which is the number of desired bits that the user can retrieve privately per bit of downloaded data, defined as

$$R \triangleq \frac{H(W_\theta)}{\sum_{n=1}^{N} H(A^{[\theta]}_n)} = \frac{L}{D}, \tag{13}$$

where $D \triangleq \sum_{n=1}^{N} H(A^{[\theta]}_n)$ is the average download cost from the $N$ servers over random queries. Obviously, $R$ and $D$ are independent of $\theta$ by (1) and (11).

A retrieval rate $R$ is said to be achievable if there exists a design of both storage and retrieval phases satisfying (3)–(12) such that its retrieval rate is greater than or equal to $R$. The capacity of the SC-PIR system, denoted by $C^*$, is the supremum over all the achievable rates, i.e., $C^* = \sup\{R : R \text{ is achievable}\}$.

Footnote 1 (on the assumption that all the files are partitioned and stored in the same manner): To the best of our knowledge, all the previous storage constrained PIR schemes satisfy this assumption [26], [1], [31], which is also a popular storage manner in coded caching [16], [34], [27], [21], [17].

Define the total normalized storage capacity as $M \triangleq \mu N \in [1, N]$. For the case $M \in [1:N]$, the capacity of SC-PIR is exactly characterized in [1] as

$$C^* = \left(1 + \frac{1}{M} + \ldots + \frac{1}{M^{K-1}}\right)^{-1}. \tag{14}$$

Generally, for other $M \in [1, N]$ (or equivalently $\mu \in [\frac{1}{N}, 1]$), the capacity can be achieved by the memory-sharing technique between the integer points $\lceil M \rceil$ and $\lfloor M \rfloor$ (see [1, Claim 1 & Theorem 2]). Thus, in the sequel, we concentrate our discussion on the case $M \in [2:N]$, since it is straightforward to prove that the optimal sub-packetization is $F = N$ for the case $M = 1$.

Moreover, the existing works [26], [1], [31] have shown that linear SC-PIR schemes can achieve the capacity.

Definition 1 (Linear SC-PIR Scheme). For a given scheme of the $(\mu, N, K)$ SC-PIR system, let $\ell_n$ be the answer length of query $Q^{[\theta]}_n$. It is said to be a linear SC-PIR scheme if the answers $A^{[\theta]}_n$ ($n \in [1:N]$) are formed by

$$A^{[\theta]}_n = LC^{[\theta]}_n(Z_n) = \left(LC^{[\theta]}_{n,1}(Z_n), \ldots, LC^{[\theta]}_{n,\ell_n}(Z_n)\right), \quad \forall\, n \in [1:N]$$

with each entry $LC^{[\theta]}_{n,j}(Z_n)$ ($j \in [1:\ell_n]$) given by a linear combination of the packets stored at server $n$, i.e.,

$$LC^{[\theta]}_{n,j}(Z_n) = \sum_{k \in [1:K]} \sum_{i \in \mathcal{Z}_n} \beta^{[\theta]}_{n,k,i,j} \cdot W_{k,i}, \tag{15}$$

where $\beta^{[\theta]}_{n,k,i,j} \in \mathbb{F}_q$ is the coefficient of packet $W_{k,i}$ in the $j$-th entry of $A^{[\theta]}_n$ and is determined completely by the received query $Q^{[\theta]}_n$. Here, it is implicitly assumed that each of the packets $\{W_{k,i} : k \in [1:K], i \in [1:F]\}$ is represented by a vector over $\mathbb{F}_q$. If the packets have different dimensions, then the additions are performed by padding the vectors with zeros to the largest dimension.

The objective of the paper is to design $(\mu, N, K)$ linear SC-PIR schemes achieving the SC-PIR capacity with the minimum sub-packetization for the case $M \in [2:N]$.

III. A LOWER BOUND ON SUB-PACKETIZATION OF CAPACITY-ACHIEVING LINEAR SC-PIR SCHEMES

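To simplify our notation in the following discussion, denote by $W_{k,\mathcal{S}}$ the set of packets of file $W_k$ that are exclusively stored by servers in $\mathcal{S}$, $\mathcal{S} \subseteq [1:N]$, i.e.,

$$W_{k,\mathcal{S}} \triangleq \left\{ W_{k,i} : i \in \left( \bigcap_{n \in \mathcal{S}} \mathcal{Z}_n \right) \Big\backslash \left( \bigcup_{m \in [1:N] \backslash \mathcal{S}} \mathcal{Z}_m \right) \right\}, \quad \forall\, k \in [1:K]. \tag{16}$$

Obviously, $W_{k,\emptyset} = \emptyset$ and $H(W_{k,\emptyset}) = 0$ due to the constraint of reliable decoding. Then, file $W_k$ and the storage contents at server $n$ can be respectively rewritten as

$$W_k = \bigcup_{\mathcal{S} \subseteq [1:N]} W_{k,\mathcal{S}}, \quad \forall\, k \in [1:K] \tag{17}$$

and

$$Z_n = \bigcup_{k \in [1:K]} \ \bigcup_{\mathcal{S} \subseteq [1:N],\, n \in \mathcal{S}} W_{k,\mathcal{S}}, \quad \forall\, n \in [1:N]. \tag{18}$$

Notice from (6) and (7) that both the entropy of the random variable $W_{k,\mathcal{S}}$ and the size of the set $W_{k,\mathcal{S}}$ are irrespective of $k$. Thus, for all $\mathcal{S} \subseteq [1:N]$, we can set $H(W_{k,\mathcal{S}}) \triangleq \alpha_{\mathcal{S}} L$ and $F_{\mathcal{S}} \triangleq |W_{k,\mathcal{S}}|$, where $\alpha_{\mathcal{S}} \in [0, 1]$. In other words, $\alpha_{\mathcal{S}}$ is the normalized file size of $W_{k,\mathcal{S}}$ and $F_{\mathcal{S}}$ is the number of packets in $W_{k,\mathcal{S}}$. By (1), (3), (17) and (18), the file size, storage size, and sub-packetization $F$ are respectively constrained as

$$\sum_{\mathcal{S} \subseteq [1:N]} \alpha_{\mathcal{S}} = 1, \tag{19}$$

$$\sum_{\mathcal{S} \subseteq [1:N],\, n \in \mathcal{S}} \alpha_{\mathcal{S}} \leq \mu, \quad \forall\, n \in [1:N], \tag{20}$$

and

$$F = \sum_{\mathcal{S} \subseteq [1:N]} F_{\mathcal{S}}. \tag{21}$$

Footnote 2 (on the answer length $\ell_n$ in Definition 1): Throughout this paper, the "length" is counted by the number of packets; thus "answer length" refers to the number of packets in the answer.

In the following, we establish an information-theoretic lower bound on the sub-packetization of any capacity-achieving $(\mu, N, K)$ linear SC-PIR scheme with $M = \mu N \in [2:N]$, which is characterized by the following optimization problem.

Definition 2.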

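Given any positive integers $N$ and $M = \mu N \in [2:N]$, Problem 1 is defined as

$$\{\alpha^*_{\mathcal{S}}\}_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M} = \arg\min \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M} \mathbb{1}(\alpha_{\mathcal{S}} > 0)$$

$$\text{s.t.} \quad \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M,\, n \in \mathcal{S}} \alpha_{\mathcal{S}} = \mu, \quad \forall\, n \in [1:N], \tag{22}$$

$$0 \leq \alpha_{\mathcal{S}} \leq 1, \quad \forall\, \mathcal{S} \subseteq [1:N],\ |\mathcal{S}| = M, \tag{23}$$

where $\{\alpha^*_{\mathcal{S}}\}_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M}$ is called the optimal solution to Problem 1 and $\eta^* = \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M} \mathbb{1}(\alpha^*_{\mathcal{S}} > 0)$ is called the optimal value of Problem 1. In addition, any parameters $\{\alpha_{\mathcal{S}}\}_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M}$ satisfying (22) and (23) are called a feasible solution to Problem 1.

A. Necessary Conditions of Capacity-Achieving Linear SC-PIR Schemes

In this subsection, we derive five necessary conditions (Lemmas 1 and 2 below) for capacity-achieving linear SC-PIR schemes, whose proofs are deferred to the Appendix.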
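As an illustration of Problem 1, the following minimal sketch checks the constraints (22) and (23) for a candidate assignment and evaluates the objective. It does not solve the optimization problem. The names `is_feasible`, `objective`, and the three example assignments are hypothetical and chosen only for this example; exact rational arithmetic is used so that the equality in (22) can be tested without rounding.

```python
from fractions import Fraction
from itertools import combinations


def is_feasible(alpha: dict, N: int, M: int) -> bool:
    """Check (22) and (23) for alpha: frozenset(S) -> alpha_S.

    Subsets missing from the dict are treated as alpha_S = 0.
    """
    mu = Fraction(M, N)
    servers = set(range(1, N + 1))
    for S, a in alpha.items():
        # Only size-M subsets of [1:N] with 0 <= alpha_S <= 1 are allowed.
        if len(S) != M or not S <= servers or not 0 <= a <= 1:
            return False
    # (22): the weights of the subsets containing server n sum to mu.
    return all(
        sum(a for S, a in alpha.items() if n in S) == mu for n in servers
    )


def objective(alpha: dict) -> int:
    """Objective of Problem 1: the number of subsets with alpha_S > 0."""
    return sum(1 for a in alpha.values() if a > 0)


# Three candidate assignments for N = 4, M = 2 (so mu = 1/2).
two_blocks = {frozenset({1, 2}): Fraction(1, 2), frozenset({3, 4}): Fraction(1, 2)}
uniform = {frozenset(S): Fraction(1, 6) for S in combinations(range(1, 5), 2)}
overloaded = {frozenset({1, 2}): Fraction(1, 2), frozenset({1, 3}): Fraction(1, 2)}
```

Both `two_blocks` and `uniform` satisfy (22) and (23), but the former uses two subsets while the latter uses all six; `overloaded` violates (22) because server 1 would carry total weight 1.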

Lemma 1. Given any $(\mu, N, K)$ SC-PIR system with $M = \mu N \in [2:N]$ and $\{\alpha_{\mathcal{S}} : \alpha_{\mathcal{S}} \in [0, 1], \mathcal{S} \subseteq [1:N]\}$, the storage design of any capacity-achieving SC-PIR scheme must satisfy:

P1. All the packets must be stored exactly at $M$ servers, i.e., $\alpha_{\mathcal{S}} = 0$ for all $\mathcal{S} \subseteq [1:N]$ with $|\mathcal{S}| \neq M$;

P2. The storage capacity at all servers must be used up, i.e., $\sum_{\mathcal{S} \subseteq [1:N],\, n \in \mathcal{S}} \alpha_{\mathcal{S}} = \mu$ for all $n \in [1:N]$.

Remark 1. Given any parameters $\{\alpha_{\mathcal{S}} : \alpha_{\mathcal{S}} \in [0, 1], \mathcal{S} \subseteq [1:N]\}$, P1 along with P2 is equivalent to the constraints (22) and (23) of Problem 1.

For any $\mathcal{K} \subseteq [1:K]$, $\mathcal{S} \subseteq [1:N]$, denote $W_{\mathcal{K},\mathcal{S}} \triangleq \bigcup_{k \in \mathcal{K}} W_{k,\mathcal{S}}$. Given $\theta \in [1:K]$, let $\widetilde{LC}^{[\theta]}_n(Z_n)$ be the answer of server $n$ when receiving the query realization $\widetilde{Q}^{[\theta]}_n$. Let $\widetilde{LC}^{[\theta]}_n(W_{\mathcal{K},\mathcal{S}})$ be the part of $\widetilde{LC}^{[\theta]}_n(Z_n)$ involving the linear combinations of packets in $W_{\mathcal{K},\mathcal{S}}$, i.e.,

$$\widetilde{LC}^{[\theta]}_n(W_{\mathcal{K},\mathcal{S}}) \triangleq \left( \widetilde{LC}^{[\theta]}_{n,1}(W_{\mathcal{K},\mathcal{S}}), \ldots, \widetilde{LC}^{[\theta]}_{n,\tilde{\ell}_n}(W_{\mathcal{K},\mathcal{S}}) \right), \quad \forall\, n \in [1:N], \tag{24}$$

where $\tilde{\ell}_n$ is the answer length for the query realization $\widetilde{Q}^{[\theta]}_n$, and $\widetilde{LC}^{[\theta]}_{n,j}(W_{\mathcal{K},\mathcal{S}})$ is given by

$$\widetilde{LC}^{[\theta]}_{n,j}(W_{\mathcal{K},\mathcal{S}}) = \sum_{k \in \mathcal{K},\, W_{k,i} \in W_{\mathcal{K},\mathcal{S}}} \tilde{\beta}^{[\theta]}_{n,k,i,j} \cdot W_{k,i}, \quad \forall\, j \in [1:\tilde{\ell}_n], \tag{25}$$

in which the coefficient $\tilde{\beta}^{[\theta]}_{n,k,i,j}$ is the realization of $\beta^{[\theta]}_{n,k,i,j}$ in (15) when the query realization $\widetilde{Q}^{[\theta]}_n$ is received by server $n$.

Lemma 2. Given any $(\mu, N, K)$ SC-PIR system with $M = \mu N \in [2:N]$, let $\mathcal{S} \subseteq [1:N]$ and $\theta, \theta' \in [1:K]$ such that $|\mathcal{S}| = M$, $\theta \neq \theta'$. For every realization of queries $\widetilde{Q}^{[\theta]}_{1:N}$ with positive probability, the retrieval phase for any capacity-achieving linear SC-PIR scheme must satisfy:

P3. (Independence of the retrieved data) The $M$ random variables

$$\widetilde{LC}^{[\theta]}_n\left(W_{\theta,\mathcal{S}}\right), \quad \forall\, n \in \mathcal{S} \tag{26}$$

are independent of each other;

P4. (Independence of the requested data) The $M$ random variables

$$\widetilde{LC}^{[\theta]}_n\left(W_{[1:K] \backslash \{\theta'\},\mathcal{S}}\right), \quad \forall\, n \in \mathcal{S} \tag{27}$$

are independent of each other;

P5. (Identical information for the residuals) The $M$ random variables

$$\widetilde{LC}^{[\theta]}_n\left(W_{[1:K] \backslash \{\theta, \theta'\},\mathcal{S}}\right), \quad \forall\, n \in \mathcal{S} \tag{28}$$

are deterministic functions of each other.

B. Lower Bound on Sub-packetization of Capacity-Achieving Linear SC-PIR Schemes

Lemma 3.

Given any capacity-achieving $(\mu, N, K)$ linear SC-PIR scheme with $M = \mu N \in [2:N]$ and $\{\alpha_{\mathcal{S}} : \alpha_{\mathcal{S}} \in [0, 1], \mathcal{S} \subseteq [1:N]\}$,

$$\begin{cases} F_{\mathcal{S}} \geq M - 1, & \text{if } \alpha_{\mathcal{S}} > 0,\ \mathcal{S} \subseteq [1:N],\ |\mathcal{S}| = M \\ F_{\mathcal{S}} = 0, & \text{otherwise.} \end{cases}$$

Proof: It is clear that $F_{\mathcal{S}} = 0$ if $\alpha_{\mathcal{S}} = 0$. By Lemma 1, we just need to prove $F_{\mathcal{S}} \geq M - 1$ if $\alpha_{\mathcal{S}} > 0$, $\mathcal{S} \subseteq [1:N]$, $|\mathcal{S}| = M$.

Let $W_{\theta^*}$ and $\widetilde{Q}^{[\theta^*]}_{1:N}$ be the desired file of the user and a realization of queries with positive probability, respectively. For any $\mathcal{S}$ with $|W_{\theta^*,\mathcal{S}}| > 0$, recall from (16) that the packets in $W_{\theta^*,\mathcal{S}}$ are exclusively stored at servers in $\mathcal{S}$. Thus, conditioned on the realization of queries $\widetilde{Q}^{[\theta^*]}_{1:N}$, to ensure that the user can correctly decode $W_{\theta^*}$, there must be a server $n \in \mathcal{S}$ such that the coefficients of the packets $W_{\theta^*,\mathcal{S}}$ in $\widetilde{LC}^{[\theta^*]}_n(Z_n)$ are not all zeros, i.e.,

$$H\left(\widetilde{LC}^{[\theta^*]}_n(W_{\theta^*,\mathcal{S}})\right) > 0 \quad \text{for some } n \in \mathcal{S}.$$

Note that the random queries for retrieving distinct files at a given server have the identical distribution by the privacy constraint (11). Thus, the following observation holds:

Observation: For any realization of queries $\widetilde{Q}^{[\theta^*]}_{1:N}$ with positive probability, the query $\widetilde{Q}^{[\theta^*]}_n$, sent to server $n$ for retrieving file $W_{\theta^*}$, can also be sent to the same server $n$ for retrieving any distinct file $\theta \neq \theta^*$ in another realization of queries $\widetilde{Q}^{[\theta]}_{1:N}$ with positive probability, where $\widetilde{Q}^{[\theta]}_n = \widetilde{Q}^{[\theta^*]}_n$. As a result, for the two realizations of queries $\widetilde{Q}^{[\theta^*]}_{1:N}$ and $\widetilde{Q}^{[\theta]}_{1:N}$, server $n$ will respond with the same answer, i.e.,

$$\widetilde{LC}^{[\theta^*]}_n(Z_n) = \left( \widetilde{LC}^{[\theta^*]}_{n,1}(Z_n), \ldots, \widetilde{LC}^{[\theta^*]}_{n,\tilde{\ell}_n}(Z_n) \right) = \left( \widetilde{LC}^{[\theta]}_{n,1}(Z_n), \ldots, \widetilde{LC}^{[\theta]}_{n,\tilde{\ell}_n}(Z_n) \right) = \widetilde{LC}^{[\theta]}_n(Z_n).$$

That is, for any $\theta \in [1:K] \backslash \{\theta^*\}$, there exists another realization of queries $\widetilde{Q}^{[\theta]}_{1:N}$ with positive probability such that server $n$ will respond with the same answer $\widetilde{LC}^{[\theta]}_n(Z_n) = \widetilde{LC}^{[\theta^*]}_n(Z_n)$, where the terms involving $W_{\theta^*,\mathcal{S}}$ are identical, i.e.,

$$H\left(\widetilde{LC}^{[\theta]}_n(W_{\theta^*,\mathcal{S}})\right) = H\left(\widetilde{LC}^{[\theta^*]}_n(W_{\theta^*,\mathcal{S}})\right) > 0.$$

Then, for any $\theta' \in [1:K] \backslash \{\theta, \theta^*\}$,

$$H\left(\widetilde{LC}^{[\theta]}_n(W_{[1:K] \backslash \{\theta,\theta'\},\mathcal{S}})\right) \overset{(a)}{\geq} H\left(\widetilde{LC}^{[\theta]}_n(W_{[1:K] \backslash \{\theta,\theta'\},\mathcal{S}}) \,\Big|\, W_{[1:K] \backslash \{\theta,\theta',\theta^*\},\mathcal{S}}\right) \overset{(b)}{=} H\left(\widetilde{LC}^{[\theta]}_n(W_{\theta^*,\mathcal{S}})\right) > 0, \tag{29}$$

where (a) holds because conditioning reduces entropy; (b) follows from the linearity of (25).

Assume that the number of packets in $W_{\theta,\mathcal{S}}$ is less than $M - 1$, i.e., $F_{\mathcal{S}} < M - 1$. According to (25) and (26), the $M$ random variables $\widetilde{LC}^{[\theta]}_{n'}(W_{\theta,\mathcal{S}})$ ($n' \in \mathcal{S}$), which consist of linear combinations of the $F_{\mathcal{S}}$ packets in $W_{\theta,\mathcal{S}}$, are independent of each other. Thus, $F_{\mathcal{S}} < M - 1$ implies that there must exist two distinct servers $i, j \in \mathcal{S}$ such that

$$\widetilde{LC}^{[\theta]}_i(W_{\theta,\mathcal{S}}) = \widetilde{LC}^{[\theta]}_j(W_{\theta,\mathcal{S}}) = \mathbf{0}, \quad i, j \in \mathcal{S},\ i \neq j. \tag{30}$$

However, we have

$$\begin{aligned} 0 &\overset{(a)}{=} I\left(\widetilde{LC}^{[\theta]}_i(W_{[1:K] \backslash \{\theta'\},\mathcal{S}});\ \widetilde{LC}^{[\theta]}_j(W_{[1:K] \backslash \{\theta'\},\mathcal{S}})\right) \\ &\overset{(b)}{=} I\left(\widetilde{LC}^{[\theta]}_i(W_{[1:K] \backslash \{\theta',\theta\},\mathcal{S}}) + \widetilde{LC}^{[\theta]}_i(W_{\theta,\mathcal{S}});\ \widetilde{LC}^{[\theta]}_j(W_{[1:K] \backslash \{\theta',\theta\},\mathcal{S}}) + \widetilde{LC}^{[\theta]}_j(W_{\theta,\mathcal{S}})\right) \\ &\overset{(c)}{=} I\left(\widetilde{LC}^{[\theta]}_i(W_{[1:K] \backslash \{\theta',\theta\},\mathcal{S}});\ \widetilde{LC}^{[\theta]}_j(W_{[1:K] \backslash \{\theta',\theta\},\mathcal{S}})\right) \\ &\overset{(d)}{=} H\left(\widetilde{LC}^{[\theta]}_i(W_{[1:K] \backslash \{\theta',\theta\},\mathcal{S}})\right) \overset{(e)}{>} 0, \end{aligned}$$

which is a contradiction. Here (a) follows by (27); (b) follows from the linearity of (25) again; (c) is due to (30); (d) is because of (28); (e) holds since $H\left(\widetilde{LC}^{[\theta]}_i(W_{[1:K] \backslash \{\theta,\theta'\},\mathcal{S}})\right) = H\left(\widetilde{LC}^{[\theta]}_n(W_{[1:K] \backslash \{\theta,\theta'\},\mathcal{S}})\right) > 0$ by (28) and (29). Thus, $F_{\mathcal{S}} \geq M - 1$ and the proof is completed.

Now, we are ready to characterize a lower bound on the sub-packetization among all capacity-achieving linear SC-PIR schemes.

Theorem 1.

For any given $(\mu, N, K)$ SC-PIR system with $M = \mu N \in [2:N]$, the sub-packetization of any capacity-achieving linear SC-PIR scheme is lower bounded by

$$F \geq \eta^* \cdot (M - 1), \tag{31}$$

where $\eta^*$ is the optimal value of Problem 1.

Proof: Given any capacity-achieving linear SC-PIR scheme with $\{\alpha_{\mathcal{S}} : \alpha_{\mathcal{S}} \in [0, 1], \mathcal{S} \subseteq [1:N]\}$, the sub-packetization satisfies

$$F \overset{(a)}{=} \sum_{\mathcal{S} \subseteq [1:N]} F_{\mathcal{S}} \overset{(b)}{\geq} \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M} \mathbb{1}(\alpha_{\mathcal{S}} > 0) \cdot (M - 1) \overset{(c)}{\geq} \eta^* \cdot (M - 1),$$

where (a) follows by (21); (b) holds by Lemma 3; (c) is due to Lemma 1 and Remark 1, by which $\{\alpha_{\mathcal{S}} : \alpha_{\mathcal{S}} \in [0, 1], \mathcal{S} \subseteq [1:N]\}$ of any capacity-achieving SC-PIR scheme must satisfy (22) and (23).

Definition 3. The sub-packetization $F$ of a capacity-achieving linear SC-PIR scheme is said to be optimal if it achieves the equality in (31).

IV. A GENERIC CAPACITY-ACHIEVING LINEAR SC-PIR SCHEME WITH OPTIMAL SUB-PACKETIZATION

In this section, we present a generic construction of capacity-achieving linear SC-PIR schemes with optimal sub-packetization.
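As a small worked instance of the bound (31) that such a construction has to meet, consider $N = 6$ and $M = 3$. The step $M \eta^* \geq N$ below is a simple counting argument added here for illustration and is not taken from the paper: by (22) with $\mu > 0$, every server must belong to at least one subset with $\alpha_{\mathcal{S}} > 0$, and each such subset contains exactly $M$ servers.

```latex
\begin{align*}
&N = 6,\quad M = 3,\quad \mu = M/N = \tfrac{1}{2}, \\
&\alpha_{\{1,2,3\}} = \alpha_{\{4,5,6\}} = \tfrac{1}{2}, \qquad
  \alpha_{\mathcal{S}} = 0 \ \text{otherwise}, \\
&\sum_{\mathcal{S} \ni n} \alpha_{\mathcal{S}} = \tfrac{1}{2} = \mu
  \quad \forall\, n \in [1:6]
  \quad\Rightarrow\quad \text{(22) and (23) hold with } \eta = 2, \\
&M \eta^* \geq N
  \quad\Rightarrow\quad \eta^* \geq \lceil N/M \rceil = 2
  \quad\Rightarrow\quad \eta^* = 2, \\
&F \geq \eta^* (M - 1) = 2 \cdot 2 = 4 = \frac{N(M-1)}{\gcd(N,M)}.
\end{align*}
```

In this instance the lower bound coincides with the value $\frac{N(M-1)}{\gcd(N,M)}$ quoted in the abstract, which is consistent with the stated optimality condition $\min\{M, N-M\} \mid N$.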

A. SF-SC-PIR Schemes Based on Transformation From SF-PIR Schemes to SC-PIR Schemes

We first introduce a class of SC-PIR schemes in Algorithm 1, which are constructed by a transformation from SF-PIR schemes to SC-PIR schemes, where we term the resultant schemes SF-SC-PIR schemes for convenience. Actually, the transformation was first characterized in [31].
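The storage side of this transformation, as listed in the Storage procedure of Algorithm 1, can be sketched as follows for a single file. This is only an illustration under stated assumptions: the function name `sf_sc_storage` and the packet labels `(S, j)` are hypothetical, each chunk $W_{k,\mathcal{S}}$ is split into equal-size packets (Algorithm 1 only requires $F_{\mathrm{SF}}$ disjoint packets), and the retrieval procedure is not modeled.

```python
from fractions import Fraction


def sf_sc_storage(alpha: dict, N: int, F_SF: int):
    """Storage procedure of Algorithm 1 for a single file.

    alpha maps frozenset(S) -> alpha_S. Returns (packets, placement), where
    packets maps a label (S, j) to its normalized size and placement maps
    server n to the labels it stores.
    """
    # Keep only subsets with alpha_S > 0 and split each chunk W_{k,S}
    # into F_SF packets (equal sizes are an assumption of this sketch).
    packets = {
        (S, j): Fraction(a) / F_SF
        for S, a in alpha.items() if a > 0
        for j in range(1, F_SF + 1)
    }
    # Server n stores every packet whose subset S contains n.
    placement = {
        n: [label for label in packets if n in label[0]]
        for n in range(1, N + 1)
    }
    return packets, placement


# Illustrative input: N = 6, M = 3, two disjoint subsets, F_SF = 2.
alpha = {
    frozenset({1, 2, 3}): Fraction(1, 2),
    frozenset({4, 5, 6}): Fraction(1, 2),
}
packets, placement = sf_sc_storage(alpha, N=6, F_SF=2)
sub_packetization = len(packets)  # number of active subsets times F_SF
load = {n: sum(packets[p] for p in placement[n]) for n in placement}
```

Here `sub_packetization` counts the packets per file and `load[n]` is the normalized storage used at server `n`, which can be compared with $\mu = M/N$. The choice `F_SF = 2` is illustrative; it matches the value $M - 1$ that the $N - 1$ sub-packetization reported for [28] would give with $M = 3$ servers.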

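Theorem 2. For any positive integers $N, K, M$ with $M \in [2:N]$, given any feasible solution $\{\alpha_{\mathcal{S}}\}_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M}$ to Problem 1 and any capacity-achieving $(M, K)$ linear SF-PIR scheme with sub-packetization $F_{\mathrm{SF}}$, the $(\mu = M/N, N, K)$ linear SF-SC-PIR scheme obtained in Algorithm 1 is capacity-achieving with sub-packetization $F = \eta \cdot F_{\mathrm{SF}}$, where $\eta = \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M} \mathbb{1}(\alpha_{\mathcal{S}} > 0)$.

Algorithm 1 Capacity-Achieving SF-SC-PIR Schemes

Input: A feasible solution $\{\alpha_{\mathcal{S}}\}_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M}$ to Problem 1 and a capacity-achieving $(M, K)$ SF-PIR scheme with sub-packetization $F_{\mathrm{SF}}$

Output: Capacity-achieving SF-SC-PIR scheme with sub-packetization $F = \eta \cdot F_{\mathrm{SF}}$, where $\eta = \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M} \mathbb{1}(\alpha_{\mathcal{S}} > 0)$.

 1: procedure Storage
 2:   for $k \in [1:K]$ do
 3:     Divide $W_k$ into $\{W_{k,\mathcal{S}} : \mathcal{S} \subseteq [1:N], |\mathcal{S}| = M, \alpha_{\mathcal{S}} > 0\}$ such that $H(W_{k,\mathcal{S}}) = \alpha_{\mathcal{S}} L$
 4:     Further divide each $W_{k,\mathcal{S}}$ into $F_{\mathrm{SF}}$ disjoint packets for $\mathcal{S} \subseteq [1:N], |\mathcal{S}| = M, \alpha_{\mathcal{S}} > 0$
 5:   end for
 6:   for $n \in [1:N]$ do
 7:     $Z_n \leftarrow \{W_{k,\mathcal{S}} : k \in [1:K], \mathcal{S} \subseteq [1:N], |\mathcal{S}| = M, \alpha_{\mathcal{S}} > 0, n \in \mathcal{S}\}$
 8:   end for
 9: end procedure
10: procedure Retrieval
11:   for each $\mathcal{S} \subseteq [1:N], |\mathcal{S}| = M, \alpha_{\mathcal{S}} > 0$ do
12:     Employ the $(M, K)$ capacity-achieving SF-PIR scheme independently to retrieve $W_{\theta,\mathcal{S}}$ privately from the $M$ servers in $\mathcal{S}$ and the $K$ packet sets in $\{W_{k,\mathcal{S}} : k \in [1:K]\}$
13:   end for
14:   The user combines $W_{\theta,\mathcal{S}}$ ($\mathcal{S} \subseteq [1:N], |\mathcal{S}| = M, \alpha_{\mathcal{S}} > 0$) to recover $W_\theta$
15: end procedure

Footnote 3 (on the SF-PIR building block in Algorithm 1): It is not necessary to use different SF-PIR schemes as building blocks since the sub-packetization of the resultant SC-PIR scheme can be further reduced by adopting the identical SF-PIR scheme with minimum sub-packetization.

Proof: Obviously, the output SF-SC-PIR scheme of Algorithm 1 is linear if the input SF-PIR scheme is linear. Furthermore, by Lines 2-5, the sub-packetization of the output scheme is $F = \eta \cdot F_{\mathrm{SF}}$. Consequently, we prove the theorem by showing that the storage design in Algorithm 1 is achievable and the SF-SC-PIR scheme is capacity-achieving while satisfying the constraints of correctness and privacy.

In Line 3, each file can be partitioned into $W_k = \{W_{k,\mathcal{S}} : \mathcal{S} \subseteq [1:N], |\mathcal{S}| = M, \alpha_{\mathcal{S}} > 0\}$ since the feasible solution $\{\alpha_{\mathcal{S}}\}_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M}$ that satisfies (22) and (23) has

$$\begin{aligned} \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M,\, \alpha_{\mathcal{S}} > 0} \alpha_{\mathcal{S}} &= \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M} \alpha_{\mathcal{S}} \qquad (32) \\ &= \frac{1}{M} \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M} \alpha_{\mathcal{S}} \sum_{n \in [1:N]} \mathbb{1}(n \in \mathcal{S}) \\ &= \frac{1}{M} \sum_{n \in [1:N]} \ \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M} \alpha_{\mathcal{S}} \cdot \mathbb{1}(n \in \mathcal{S}) \\ &= \frac{1}{M} \sum_{n \in [1:N]} \ \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M,\, n \in \mathcal{S}} \alpha_{\mathcal{S}} \\ &\overset{(a)}{=} \frac{\mu N}{M} = 1, \end{aligned}$$

where (a) is due to (22). In Line 7, the storage content $Z_n$ at each server is

$$Z_n = \bigcup_{k \in [1:K]} \ \bigcup_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M,\, \alpha_{\mathcal{S}} > 0,\, n \in \mathcal{S}} W_{k,\mathcal{S}}, \quad \forall\, n \in [1:N].$$

Since all the random variables $W_{k,\mathcal{S}}$ are independent of each other, by applying $H(W_{k,\mathcal{S}}) = \alpha_{\mathcal{S}} L$ and (22), we get

$$H(Z_n) = K L \sum_{\mathcal{S} \subseteq [1:N],\, |\mathcal{S}| = M,\, \alpha_{\mathcal{S}} > 0,\, n \in \mathcal{S}} \alpha_{\mathcal{S}} = \mu K L,$$

which satisfies the storage constraint (3). Thus, the storage design in Algorithm 1 is achievable.

Then, we prove the scheme in Algorithm 1 to be capacity-achieving. By Lines 4 and 7, for any $\mathcal{S} \subseteq [1:N]$, $|\mathcal{S}| = M$, $\alpha_{\mathcal{S}} > 0$, each $W_{k,\mathcal{S}}$ ($k \in [1:K]$) is partitioned into $F_{\mathrm{SF}}$ packets and is stored at the $M$ servers in $\mathcal{S}$. Thus, in Lines 11-13, the non-zero $W_{\theta,\mathcal{S}}$ can be retrieved from the servers in $\mathcal{S}$ by employing the capacity-achieving $(M, K)$ SF-PIR scheme independently.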
Then,according to (13), the download cost for retrieving W θ, S is D S = H ( W θ, S ) C SF = (cid:18) M + . . . + 1 M K − (cid:19) α S L, where C SF = (1 + 1 /M + . . . + 1 /M K − ) − is the capacity of ( M, K ) SF-PIR scheme. Therefore, the rate for retrieving W θ is R = LD = L P S⊆ [1: N ] |S| = M,α S > D S = (cid:18) M + . . . + 1 M K − (cid:19) − , which achieves the capacity of SC-PIR in (14).The user can recover W θ by combining all non-zero W θ, S , where privacy is guaranteed because the SF-PIR scheme satisfyingthe constraint of privacy is independently employed to download the desired packet sets.We know from Theorem 2 that any parameters { α S : α S ∈ [0 , , S ⊆ [1 : N ] } satisfying (22) and (23) result in a storagedesign of a capacity-achieving SC-PIR scheme. Thus, we have the following corollary according to Remark 1. Corollary 1.

Given any parameters $\{\alpha_{\mathcal{S}} : \alpha_{\mathcal{S}}\in[0,1], \mathcal{S}\subseteq[1:N]\}$, P1 and P2 are necessary and sufficient conditions for the storage design of a capacity-achieving SC-PIR scheme.

B. Capacity-Achieving Linear SC-PIR Schemes with Optimal Sub-packetization

From Algorithm 1, we can construct a capacity-achieving SC-PIR scheme by using any feasible solution to Problem 1 and any capacity-achieving SF-PIR scheme as a building block. Such capacity-achieving $(M,K)$ SF-PIR schemes have been found in [22], [23], [28]. If the SF-PIR scheme with sub-packetization $M^{K-1}$ [23] is employed, then we obtain a capacity-achieving SC-PIR scheme with sub-packetization $\eta\cdot M^{K-1}$, which has identical download cost across all random realizations of queries. If instead the scheme with sub-packetization $M-1$ in [28] is adopted, the asymmetry of download cost over the realizations of queries is inherited by the resultant SC-PIR scheme, and the sub-packetization is reduced to $\eta\cdot(M-1)$. In particular, when any optimal solution to Problem 1 is further employed, a capacity-achieving SF-SC-PIR scheme with sub-packetization $\eta^*\cdot(M-1)$ can be obtained.

For the sake of completeness, we summarize the scheme of [28] in Algorithm 2, where $\mathcal{Q}$ is defined as

$$\mathcal{Q}\triangleq\left\{(q_1,\ldots,q_K)\in[0:M-1]^K\right\}.$$

Note that the dummy packets in Line 2 are not stored by the servers at all. Let $\Delta=\sum_{i\in[1:K]\setminus\{\theta\}}W_{i,q_i}$. Then the user can decode file $W_\theta=(W_{\theta,0},\ldots,W_{\theta,M-2})$ from the answers $(A^{[\theta]}_0,\ldots,A^{[\theta]}_{M-1})$ in Line 5 because

$$(A^{[\theta]}_0,\ldots,A^{[\theta]}_{M-1})=(\Delta+W_{\theta,q_\theta},\ldots,\Delta+W_{\theta,M-2},\ \Delta+0,\ \Delta+W_{\theta,0},\ldots,\Delta+W_{\theta,q_\theta-1}).$$

The following result is immediate by Theorems 1 and 2.

Theorem 3. For any positive integers $N, K, M$ with $M\in[2:N]$, given any optimal solution $\{\alpha^*_{\mathcal{S}}\}_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M}$ to Problem 1 and the capacity-achieving $(M,K)$ linear SF-PIR scheme in Algorithm 2, Algorithm 1 outputs a capacity-achieving $(\mu=M/N,N,K)$ linear SC-PIR scheme with sub-packetization $F^*=\eta^*\cdot(M-1)$, where $\eta^*=\sum_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M}\mathbb{1}(\alpha^*_{\mathcal{S}}>0)$. In particular, the sub-packetization $F^*$ is optimal among all capacity-achieving linear SC-PIR schemes.

V. STORAGE DESIGN ARRAY

According to Theorem 3, in order to design capacity-achieving linear SC-PIR schemes with optimal sub-packetization, it is crucial to solve the optimization problem in Problem 1. However, this is not easy because of the indicator functions involved. Thus, in this section, we construct concrete capacity-achieving linear schemes with low sub-packetization by finding sub-optimal solutions to Problem 1. For clarity, we first introduce the Storage Design Array (SDA) to construct feasible solutions of Problem 1.

Algorithm 2 Capacity-Achieving $(M,K)$ Linear SF-PIR Scheme with Sub-packetization $M-1$
1: Relabel the indices of the $M$ servers as $0,1,\ldots,M-1$.
2: For each $k\in[1:K]$, file $W_k$ is uniformly partitioned into $M-1$ disjoint packets $W_{k,0},W_{k,1},\ldots,W_{k,M-2}$. For ease of exposition, each file $W_k$ is appended with a dummy packet $W_{k,M-1}\triangleq 0$, i.e., $W_k=(W_{k,0},\ldots,W_{k,M-2},0)$, $\forall\,k\in[1:K]$.
3: Select a vector from the set $\mathcal{Q}$ independently and uniformly: $q=(q_1,\ldots,q_{\theta-1},q_\theta,q_{\theta+1},\ldots,q_K)$.
4: Query Phase: Based on the vector $q$, the user constructs a query sent to server $m$ as $Q^{[\theta]}_m=(q_1,\ldots,q_{\theta-1},(q_\theta+m)_M,q_{\theta+1},\ldots,q_K)$, $\forall\,m\in[0:M-1]$.
5: Answer Phase: After receiving the query $Q^{[\theta]}_m$, the answer at server $m$ is
   $$A^{[\theta]}_m=\begin{cases}\text{NULL}, & \text{if } Q^{[\theta]}_m=(M-1,\ldots,M-1)\\ \sum_{i\in[1:K]\setminus\{\theta\}}W_{i,q_i}+W_{\theta,(q_\theta+m)_M}, & \text{else}\end{cases},\quad\forall\,m\in[0:M-1],$$
   where the value NULL indicates that the server keeps silent.
6: Decoding Phase: Decode file $W_\theta=(W_{\theta,0},\ldots,W_{\theta,M-2})$ from the answers $A^{[\theta]}_0,\ldots,A^{[\theta]}_{M-1}$.
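The query, answer and decoding phases of Algorithm 2 can be exercised on toy data with the following minimal sketch. It rests on assumptions that are not part of the source: packets are modelled as Python integers and packet addition as bitwise XOR (addition over an extension of GF(2)), file and server indices are 0-based, and the function name `sf_pir_round` is hypothetical.

```python
import random


def sf_pir_round(files, theta, M, rng):
    """Run one retrieval of files[theta] from M replicated servers.

    Each file is a list of M - 1 integer packets; addition is XOR (an assumption).
    Returns the recovered packets and the list of server answers (None = NULL).
    """
    K = len(files)
    padded = [list(f) + [0] for f in files]      # append the dummy packet W_{k,M-1} = 0
    q = [rng.randrange(M) for _ in range(K)]     # uniform vector from [0:M-1]^K

    answers = []
    for m in range(M):
        query = list(q)
        query[theta] = (q[theta] + m) % M        # only the theta-th coordinate is shifted
        if all(x == M - 1 for x in query):
            answers.append(None)                 # server keeps silent
        else:
            acc = 0
            for i in range(K):
                acc ^= padded[i][query[i]]
            answers.append(acc)

    # The server whose shifted index hits the dummy packet returns Delta itself
    # (or stays silent, in which case Delta = 0).
    m_star = (M - 1 - q[theta]) % M
    delta = 0 if answers[m_star] is None else answers[m_star]

    recovered = [0] * (M - 1)
    for m in range(M):
        j = (q[theta] + m) % M
        if j != M - 1:
            recovered[j] = answers[m] ^ delta
    return recovered, answers


# Hypothetical toy data: K = 3 files, M = 3 servers, M - 1 = 2 packets per file.
demo_files = [[5, 9], [12, 7], [1, 14]]
demo_recovered, demo_answers = sf_pir_round(demo_files, theta=1, M=3, rng=random.Random(0))
```

Each server sees a single vector in $[0:M-1]^K$; the sketch checks only correctness of decoding and does not attempt to verify privacy.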

Definition 4 (Storage Design Array (SDA)). For any positive integers $N, M$ with $M\in[1:N]$, an $(N,M)$ storage design array is an array of size $N\times\frac{N}{\gcd(N,M)}$ with each entry being either "$\ast$" or "NULL" that satisfies
S1. Each column has $M$ "$\ast$"s;
S2. Each row has $\frac{M}{\gcd(N,M)}$ "$\ast$"s.

Definition 5 (Number of Distinct Columns of SDA). Let $P=[p_{i,j}]_{N\times\frac{N}{\gcd(N,M)}}$ be an $(N,M)$ SDA. For each $j\in\left[1:\frac{N}{\gcd(N,M)}\right]$, let $\mathcal{S}_j$ be the set of row indices corresponding to "$\ast$"s in column $j$, i.e.,

$$\mathcal{S}_j\triangleq\{i\in[1:N] : p_{i,j}=\ast\},\quad\forall\,j\in\left[1:\frac{N}{\gcd(N,M)}\right].\tag{33}$$

We denote by $\eta_P$ the number of distinct columns in $P$, i.e.,

$$\eta_P\triangleq\left|\left\{\mathcal{S}_j : j\in\left[1:\frac{N}{\gcd(N,M)}\right]\right\}\right|.\tag{34}$$

Let $\{\mathcal{S}_{i_l}\}_{l=1}^{\eta_P}$ be the $\eta_P$ distinct sets in $\{\mathcal{S}_j\}_{j=1}^{N/\gcd(N,M)}$ and let $s_l$ be the number of times that $\mathcal{S}_{i_l}$ appears in $\{\mathcal{S}_j\}_{j=1}^{N/\gcd(N,M)}$ for $l\in[1:\eta_P]$, i.e.,

$$s_l\triangleq\left|\left\{j\in\left[1:\frac{N}{\gcd(N,M)}\right] : \mathcal{S}_j=\mathcal{S}_{i_l}\right\}\right|,\quad\forall\,l\in[1:\eta_P].\tag{35}$$

Example 1. An $(N=9,M=4)$ SDA $P$ and another $(N=11,M=5)$ SDA $P'$ can be written as follows, respectively (blank entries are NULL).

$$P=\begin{bmatrix}
\ast&\ast&\ast&\ast&&&&&\\
\ast&\ast&\ast&\ast&&&&&\\
\ast&\ast&\ast&\ast&&&&&\\
\ast&\ast&\ast&\ast&&&&&\\
&&&&\ast&\ast&\ast&\ast&\\
&&&&\ast&\ast&\ast&&\ast\\
&&&&\ast&\ast&&\ast&\ast\\
&&&&\ast&&\ast&\ast&\ast\\
&&&&&\ast&\ast&\ast&\ast
\end{bmatrix}_{9\times 9},\quad
P'=\begin{bmatrix}
\ast&\ast&\ast&\ast&\ast&&&&&&\\
\ast&\ast&\ast&\ast&\ast&&&&&&\\
\ast&\ast&\ast&\ast&\ast&&&&&&\\
\ast&\ast&\ast&\ast&\ast&&&&&&\\
\ast&\ast&\ast&\ast&\ast&&&&&&\\
&&&&&\ast&\ast&\ast&\ast&\ast&\\
&&&&&\ast&\ast&\ast&\ast&&\ast\\
&&&&&\ast&\ast&\ast&&\ast&\ast\\
&&&&&\ast&\ast&&\ast&\ast&\ast\\
&&&&&\ast&&\ast&\ast&\ast&\ast\\
&&&&&&\ast&\ast&\ast&\ast&\ast
\end{bmatrix}_{11\times 11}.$$

In fact, the SDAs $P$ and $P'$ are constructed by the greedy Algorithm 3, which will be illustrated in Section VII-A in detail. There are $\eta_P=6$ distinct columns in $P$ (i.e., columns 1, 5, 6, 7, 8 and 9) with $\{s_1=4, s_2=1, s_3=1, s_4=1, s_5=1, s_6=1\}$ and $\eta_{P'}=7$ distinct columns in $P'$ (i.e., columns 1, 6, 7, 8, 9, 10 and 11) with $\{s_1=5, s_2=1, s_3=1, s_4=1, s_5=1, s_6=1, s_7=1\}$.

Lemma 4. Given $N$ and $M\in[1:N]$, any $(N,M)$ SDA $P$ is associated to a set of parameters $\{\alpha_{\mathcal{S}}\}_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M}$ that is a feasible solution to Problem 1.

Proof: Notice from S1 and (33) that $|\mathcal{S}_{i_l}|=M$ for all $l\in[1:\eta_P]$. Thus, we can obtain a set of parameters $\{\alpha_{\mathcal{S}}\}_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M}$,

$$\alpha_{\mathcal{S}}=\begin{cases}\dfrac{s_l\cdot\gcd(N,M)}{N}, & \text{if }\mathcal{S}=\mathcal{S}_{i_l}\text{ for some }l\in[1:\eta_P]\\[4pt] 0, & \text{otherwise}\end{cases},\tag{36}$$

where $s_l$ is defined in (35). Then, for any $n\in[1:N]$,

$$\begin{aligned}
\sum_{\substack{\mathcal{S}\subseteq[1:N]\\ |\mathcal{S}|=M,\ n\in\mathcal{S}}}\alpha_{\mathcal{S}}
&=\sum_{\substack{l\in[1:\eta_P]\\ n\in\mathcal{S}_{i_l}}}\alpha_{\mathcal{S}_{i_l}}
=\frac{\gcd(N,M)}{N}\cdot\sum_{\substack{l\in[1:\eta_P]\\ n\in\mathcal{S}_{i_l}}}s_l\\
&\overset{(a)}{=}\frac{\gcd(N,M)}{N}\cdot\sum_{j\in\left[1:\frac{N}{\gcd(N,M)}\right]}\mathbb{1}(p_{n,j}=\ast)\\
&\overset{(b)}{=}\frac{\gcd(N,M)}{N}\cdot\frac{M}{\gcd(N,M)}=\mu,
\end{aligned}$$

where $(a)$ follows from (33) and (35), and $(b)$ is due to S2. That is, the parameters $\{\alpha_{\mathcal{S}}\}_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M}$ satisfy (22) and (23), and thus are feasible for Problem 1.

Taking the feasible solution $\{\alpha_{\mathcal{S}}\}_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M}$ and the $(M,K)$ SF-PIR scheme in Algorithm 2 as inputs of Algorithm 1, one obtains a storage design by Lines 1-9 in Algorithm 1 and a capacity-achieving SF-SC-PIR scheme with sub-packetization $\eta_P\cdot(M-1)$ by Theorem 2, where $\eta_P=\sum_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M}\mathbb{1}(\alpha_{\mathcal{S}}>0)$ by (36).

Theorem 4. Given any positive integers $N, K, M$ with $M\in[2:N]$ and any $(N,M)$ SDA $P$, there is a capacity-achieving $(\mu=M/N,N,K)$ linear SC-PIR scheme with sub-packetization $\eta_P\cdot(M-1)$.

Example 2. For the $(N=9,M=4)$ SDA $P$ in Example 1, set $\alpha_{\{1,2,3,4\}}=\frac{4}{9}$, $\alpha_{\{5,6,7,8\}}=\alpha_{\{5,6,7,9\}}=\alpha_{\{5,6,8,9\}}=\alpha_{\{5,7,8,9\}}=\alpha_{\{6,7,8,9\}}=\frac{1}{9}$, and all the other $\alpha_{\mathcal{S}}$ to be zero. It is easy to see that $\{\alpha_{\mathcal{S}}\}_{\mathcal{S}\subseteq[1:9],\,|\mathcal{S}|=4}$ is a feasible solution of Problem 1 with $\sum_{\mathcal{S}\subseteq[1:9],\,|\mathcal{S}|=4}\mathbb{1}(\alpha_{\mathcal{S}}>0)=\eta_P=6$. Then, we can generate a capacity-achieving $(\mu=4/9,N=9,K)$ linear SC-PIR scheme with sub-packetization $6\cdot 3=18$. Similarly, the $(N=11,M=5)$ SDA $P'$ in Example 1 is associated to a capacity-achieving $(\mu=5/11,N=11,K)$ linear SC-PIR scheme with sub-packetization $7\cdot 4=28$.

VI. EQUAL-SIZE CAPACITY-ACHIEVING LINEAR SC-PIR SCHEMES

Recall that the setup in Section II allows us to partition each file into unequal-size packets. The equal-size partition of the files is one of the most important cases in practice, and it is also the one considered by the previous capacity-achieving SC-PIR schemes [26], [1], [31] and SF-PIR schemes [22], [23], [28] in the storage phase. Thus, we first focus on capacity-achieving SC-PIR/SF-PIR schemes with small sub-packetization by imposing the assumption of equal-size packets, i.e.,

$$H(W_{k,1})=H(W_{k,2})=\ldots=H(W_{k,F})=\frac{L}{F},\quad\forall\,k\in[1:K].\tag{37}$$

For simplicity, we will refer to the sub-packetization of a scheme satisfying (37) as equal-size sub-packetization. In particular, we characterize the optimal equal-size sub-packetization of all capacity-achieving linear SF-SC-PIR schemes in the following theorem.

Theorem 5. Given any $(\mu,N,K)$ SC-PIR system with $M=\mu N\in[2:N]$, the optimal equal-size sub-packetization of all capacity-achieving linear SF-SC-PIR schemes is given by $\frac{N(M-1)}{\gcd(N,M)}$.

The theorem will be proved by constructing an SDA-based SF-SC-PIR scheme with equal-size sub-packetization and showing the optimality of its sub-packetization separately.
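The value in Theorem 5 and the cyclic column sets $\mathcal{S}_j=\big([0:M-1]+(j-1)\cdot M\big)_N+1$ from (40), which are used in the construction of Subsection A, can be checked numerically. The sketch below is only illustrative: the function names are hypothetical, and the SDA check simply restates conditions S1 and S2 of Definition 4 in terms of column sets.

```python
from math import gcd


def cyclic_sda_columns(N, M):
    """Column sets S_j = ([0:M-1] + (j-1)*M) mod N + 1 for j = 1, ..., N/gcd(N, M)."""
    g = gcd(N, M)
    return [
        frozenset(((i + (j - 1) * M) % N) + 1 for i in range(M))
        for j in range(1, N // g + 1)
    ]


def is_sda(columns, N, M):
    """Check S1 (M stars per column) and S2 (M/gcd stars per row) on column sets."""
    g = gcd(N, M)
    if len(columns) != N // g:
        return False
    if any(len(S) != M for S in columns):
        return False
    return all(sum(n in S for S in columns) == M // g for n in range(1, N + 1))


def equal_size_subpacketization(N, M):
    """N(M-1)/gcd(N, M), the value stated in Theorem 5."""
    return N * (M - 1) // gcd(N, M)


cols = cyclic_sda_columns(12, 5)
num_distinct = len(set(cols))               # all 12 columns are distinct here
F_equal = equal_size_subpacketization(12, 5)
```

For $(N,M)=(12,5)$ the number of distinct columns times $M-1$ reproduces the value $48$ given by the formula.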

A. SDA-Based SF-SC-PIR Schemes with Equal-size Sub-packetization

In this subsection, we construct an $(N,M)$ SDA $P$ with all columns being distinct, i.e., $\eta_P=\frac{N}{\gcd(N,M)}$. Later, it will be shown that such an SDA $P$ is associated to a capacity-achieving SF-SC-PIR scheme with equal-size sub-packetization $\frac{N(M-1)}{\gcd(N,M)}$. Before that, a simple example is presented.

Example 3. For $N=12$ and $M=5$, an SDA with all columns being distinct is given by (blank entries are NULL)

$$P=\begin{bmatrix}
\ast&&\ast&&\ast&&&\ast&&\ast&&\\
\ast&&\ast&&&\ast&&\ast&&\ast&&\\
\ast&&\ast&&&\ast&&\ast&&&\ast&\\
\ast&&&\ast&&\ast&&\ast&&&\ast&\\
\ast&&&\ast&&\ast&&&\ast&&\ast&\\
&\ast&&\ast&&\ast&&&\ast&&\ast&\\
&\ast&&\ast&&&\ast&&\ast&&\ast&\\
&\ast&&\ast&&&\ast&&\ast&&&\ast\\
&\ast&&&\ast&&\ast&&\ast&&&\ast\\
&\ast&&&\ast&&\ast&&&\ast&&\ast\\
&&\ast&&\ast&&\ast&&&\ast&&\ast\\
&&\ast&&\ast&&&\ast&&\ast&&\ast
\end{bmatrix}_{12\times 12}.$$

Then it corresponds to a set of non-zero and equal-size parameters

$$\begin{aligned}
&\alpha_{\{1,2,3,4,5\}}=\alpha_{\{6,7,8,9,10\}}=\alpha_{\{1,2,3,11,12\}}=\alpha_{\{4,5,6,7,8\}}=\alpha_{\{1,9,10,11,12\}}=\alpha_{\{2,3,4,5,6\}}\\
&=\alpha_{\{7,8,9,10,11\}}=\alpha_{\{1,2,3,4,12\}}=\alpha_{\{5,6,7,8,9\}}=\alpha_{\{1,2,10,11,12\}}=\alpha_{\{3,4,5,6,7\}}=\alpha_{\{8,9,10,11,12\}}=\frac{1}{12}.
\end{aligned}\tag{38}$$

By employing these parameters and the $(M=5,K)$ SF-PIR scheme in Algorithm 2 as inputs of Algorithm 1, a capacity-achieving SF-SC-PIR scheme is obtained, where each packet has equal size $\frac{L}{48}$ and the sub-packetization is $48$.

Formally, an $(N,M)$ SDA $P=[p_{i,j}]_{N\times\frac{N}{\gcd(N,M)}}$ satisfying $\eta_P=\frac{N}{\gcd(N,M)}$ is constructed as

$$p_{i,j}=\begin{cases}\ast, & \text{if } i\in\mathcal{S}_j\\ \text{NULL}, & \text{if } i\notin\mathcal{S}_j\end{cases},\tag{39}$$

where

$$\mathcal{S}_j\triangleq\big([0:M-1]+(j-1)\cdot M\big)_N+1,\quad\forall\,j\in\left[1:\frac{N}{\gcd(N,M)}\right].\tag{40}$$

It is easy to check that the array $P$ is an $(N,M)$ SDA satisfying $\eta_P=\frac{N}{\gcd(N,M)}$, by the following three facts about the sets $\mathcal{S}_j$, $j\in\left[1:\frac{N}{\gcd(N,M)}\right]$:
F1. Each set $\mathcal{S}_j$ is of size $M$, i.e., $|\mathcal{S}_j|=M$ for all $j\in\left[1:\frac{N}{\gcd(N,M)}\right]$;
F2. All the sets in (40) are distinct, i.e., $\mathcal{S}_i\neq\mathcal{S}_j$ for any $i\neq j\in\left[1:\frac{N}{\gcd(N,M)}\right]$;
F3. Any given $n\in[1:N]$ occurs in exactly $\frac{M}{\gcd(N,M)}$ different sets in (40), i.e., $\left|\left\{j\in\left[1:\frac{N}{\gcd(N,M)}\right] : n\in\mathcal{S}_j\right\}\right|=\frac{M}{\gcd(N,M)}$ for all $n\in[1:N]$.

By (36), $\alpha_{\mathcal{S}}=\frac{\gcd(N,M)}{N}$ if $\mathcal{S}=\mathcal{S}_j$ for some $j\in\left[1:\frac{N}{\gcd(N,M)}\right]$, and $\alpha_{\mathcal{S}}=0$ otherwise. Accordingly, all the non-zero $\{W_{k,\mathcal{S}} : k\in[1:K], \mathcal{S}\subseteq[1:N], |\mathcal{S}|=M, \alpha_{\mathcal{S}}>0\}$ are of equal size and each is partitioned into $M-1$ equal-size packets by Algorithms 1-2. Therefore, the associated capacity-achieving SF-SC-PIR scheme has equal-size sub-packetization $\frac{N(M-1)}{\gcd(N,M)}$.

B. Optimality of Equal-Size Sub-packetization

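Recall from Algorithm 1 that any feasible solution $\{\alpha_{\mathcal{S}}\}_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M}$ to Problem 1 can support a capacity-achieving $(\mu=M/N,N,K)$ linear SF-SC-PIR scheme by employing any specific capacity-achieving $(M,K)$ linear SF-PIR scheme as a building block. According to Line 4 in Algorithm 1 and (37), each $W_{k,\mathcal{S}}$ with $\alpha_{\mathcal{S}}>0$ is partitioned into $F_{\mathrm{SF}}$ equal-size disjoint packets. Thus, to design a linear SF-SC-PIR scheme with equal-size sub-packetization, it is necessary that all non-zero $\alpha_{\mathcal{S}}$ take the same constant value.

Lemma 5. Given any $(\mu,N,K)$ SC-PIR system with $M=\mu N\in[2:N]$ and $\{\alpha_{\mathcal{S}}\}_{\mathcal{S}\subseteq[1:N]}$, the storage design of any capacity-achieving linear SF-SC-PIR scheme with equal-size sub-packetization must satisfy:
P6. The equal-size partition storage is adopted, i.e., all the non-zero $\alpha_{\mathcal{S}}$ have the same value.

From Theorem 1, the equal-size sub-packetization of any SF-SC-PIR scheme is no less than $\eta^*_e\cdot(M-1)$, where $\eta^*_e$ is the optimal value of the following problem by Lemmas 1 and 5.

Definition 6. Given any positive integers $N$ and $M=\mu N\in[2:N]$, Problem 2 is defined as

$$\begin{aligned}
\left(\{\alpha^*_{\mathcal{S}}\}_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M},\Delta^*\right)=\arg\min\ &\sum_{\substack{\mathcal{S}\subseteq[1:N]\\ |\mathcal{S}|=M}}\mathbb{1}(\alpha_{\mathcal{S}}>0)\\
\text{s.t.}\ &\sum_{\substack{\mathcal{S}\subseteq[1:N]\\ |\mathcal{S}|=M,\ n\in\mathcal{S}}}\alpha_{\mathcal{S}}=\mu,\quad\forall\,n\in[1:N] &(41)\\
&\alpha_{\mathcal{S}}\in\{\Delta,0\},\quad\forall\,\mathcal{S}\subseteq[1:N],\ |\mathcal{S}|=M &(42)\\
&0\le\Delta\le 1 &(43)
\end{aligned}$$

where $\left(\{\alpha^*_{\mathcal{S}}\}_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M},\Delta^*\right)$ is called the optimal solution to Problem 2 and $\eta^*_e=\sum_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M}\mathbb{1}(\alpha^*_{\mathcal{S}}>0)$ is called the optimal value of Problem 2.

Thus, we just need to prove that the optimal value $\eta^*_e$ of Problem 2 satisfies $\eta^*_e\ge\frac{N}{\gcd(N,M)}$. From Problem 2,

$$1\overset{(a)}{=}\sum_{\substack{\mathcal{S}\subseteq[1:N]\\ |\mathcal{S}|=M}}\alpha^*_{\mathcal{S}}=\Delta^*\cdot\sum_{\substack{\mathcal{S}\subseteq[1:N]\\ |\mathcal{S}|=M}}\mathbb{1}(\alpha^*_{\mathcal{S}}>0)=\Delta^*\cdot\eta^*_e,$$

where $(a)$ follows by (32) and (41)-(43). Thus, we have $\Delta^*=\frac{1}{\eta^*_e}$. By (41), the storage constraint of any server $n$ is

$$\mu=\sum_{\substack{\mathcal{S}\subseteq[1:N]\\ |\mathcal{S}|=M,\ n\in\mathcal{S}}}\alpha^*_{\mathcal{S}}=\Delta^*\cdot\sum_{\substack{\mathcal{S}\subseteq[1:N]\\ |\mathcal{S}|=M,\ n\in\mathcal{S}}}\mathbb{1}(\alpha^*_{\mathcal{S}}>0)=\frac{1}{\eta^*_e}\,\nu.$$

Note that $\nu=\sum_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M,\,n\in\mathcal{S}}\mathbb{1}(\alpha^*_{\mathcal{S}}>0)$ is an integer. Then, the above equation indicates that $\nu=\mu\cdot\eta^*_e=\frac{M}{N}\cdot\eta^*_e$ is an integer. Since $\frac{M}{\gcd(N,M)}$ and $\frac{N}{\gcd(N,M)}$ are coprime, $\frac{N}{\gcd(N,M)}$ must divide $\eta^*_e$. Therefore, $\eta^*_e$ must be greater than or equal to $\frac{N}{\gcd(N,M)}$, which completes the proof of Theorem 5.

VII. CAPACITY-ACHIEVING LINEAR SC-PIR SCHEMES WITH LOWER SUB-PACKETIZATION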

The sub-packetization reflects the implementation complexity of a scheme. Specifically, in a PIR system, low sub-packetization yields low complexity [28]. In order to further reduce sub-packetization, we allow unequal-size packets in this section. Notably, unequal-size packets are usually unavoidable in such SC-PIR schemes [26], [1], since the memory-sharing technique typically results in schemes with unequal-size packets [1], [16]. In particular, memory-sharing is often used to achieve the capacity for any storage $M\in[1,N]$ by resorting to the discrete points with $M=1,2,\ldots,N$.

Next, we first present an example to illustrate that allowing unequal-size packets can further decrease the sub-packetization of capacity-achieving SC-PIR schemes.

Example 4. An $(N=12,M=5)$ SDA can also be constructed in the form of (blank entries are NULL)

$$P=\begin{bmatrix}
\ast&\ast&\ast&\ast&\ast&&&&&&&\\
\ast&\ast&\ast&\ast&\ast&&&&&&&\\
\ast&\ast&\ast&\ast&\ast&&&&&&&\\
\ast&\ast&\ast&\ast&\ast&&&&&&&\\
\ast&\ast&\ast&\ast&\ast&&&&&&&\\
&&&&&\ast&\ast&\ast&\ast&\ast&&\\
&&&&&\ast&\ast&\ast&\ast&&\ast&\\
&&&&&\ast&\ast&\ast&\ast&&&\ast\\
&&&&&\ast&\ast&&&\ast&\ast&\ast\\
&&&&&\ast&\ast&&&\ast&\ast&\ast\\
&&&&&&&\ast&\ast&\ast&\ast&\ast\\
&&&&&&&\ast&\ast&\ast&\ast&\ast
\end{bmatrix}_{12\times 12}.$$

There are $\eta_P=6$ distinct columns in $P$, i.e., columns 1, 6, 8, 10, 11 and 12. By Theorem 4, the SDA can be used for constructing a capacity-achieving $(\mu=5/12,N=12,K)$ linear SC-PIR scheme with sub-packetization $6\cdot 4=24$, which is smaller than 48, the optimal equal-size sub-packetization as illustrated in Example 3.

Based on Theorem 4, we wish to construct an SDA $P$ with $\eta_P$ as low as possible for further reducing sub-packetization.

A. Greedy Construction of Storage Design Arrays

In this subsection, we propose a greedy construction of an $(N,M)$ SDA $P$ for any $N$ and $M\in[1:N]$. For convenience, for any positive integers $n, m$, we use $[\ast]_{n\times m}$ to denote an array of size $n\times m$ with all the entries being "$\ast$"s.

Clearly, when $\gcd(N,M)>1$, an $(N,M)$ SDA $P$ can be obtained as

$$P=\left.\begin{bmatrix}P'\\ \vdots\\ P'\end{bmatrix}_{N\times\frac{N}{\gcd(N,M)}}\right\}\gcd(N,M),\tag{44}$$

where $P'$ is an $\left(\frac{N}{\gcd(N,M)},\frac{M}{\gcd(N,M)}\right)$ SDA of size $\frac{N}{\gcd(N,M)}\times\frac{N}{\gcd(N,M)}$. That is, $P$ is generated by repeating $P'$ $\gcd(N,M)$ times. Hereafter, we only concentrate on the construction of $(N,M)$ SDAs for $\gcd(N,M)=1$.

Notice that we aim to construct an $(N,M)$ SDA $P$ with $\eta_P$ as small as possible for $\gcd(N,M)=1$. Intuitively, by the definition of $\eta_P$ in (34), the columns of the SDA should be repeated as much as possible to reduce $\eta_P$, i.e., $s_l$ should be as large as possible for every $l\in[1:\eta_P]$. However, $s_l$ is upper bounded by $\min\{M,N-M\}$ since
• $s_l\le M$ follows from S2 directly.
• If $s_l>N-M$, i.e., some column is repeated more than $N-M$ times, assume without loss of generality that the first $M$ rows and first $m$ ($m>N-M$) columns of $P$ form the array $[\ast]_{M\times m}$. Then by S1, the first $m$ entries of the $(M+1)$-th row are NULL, and thus the $(M+1)$-th row has at most $N-m<M$ "$\ast$"s, contradicting S2.

Based on the above fact that each column in an SDA is repeated at most $\min\{M,N-M\}$ times, Algorithm 3 is proposed to recursively construct an $(N,M)$ SDA for $\gcd(N,M)=1$ as follows:
1) Case 1 (Lines 6-7): If $N-M\ge M$ (i.e., $N\ge 2M$), $\min\{M,N-M\}=M$. Thus, greedily generate $[\ast]_{M\times M}$ and then proceed to an $(N-M,M)$ SDA.
2) Case 2 (Lines 9-10): If $N-M<M$ (i.e., $N<2M$), $\min\{M,N-M\}=N-M$. Hence, greedily generate $[\ast]_{M\times(N-M)}$ and $[\ast]_{(N-M)\times M}$, and then proceed to an $(M,2M-N)$ SDA.
The two cases above are recursively carried out until $N=1$, i.e., $P=[\ast]_{1\times 1}$.

Algorithm 3 Greedy SDA Algorithm (G-SDA)
Input: Positive integers $(N,M)$ with $1\le M\le N$ and $\gcd(N,M)=1$;
Output: An $(N,M)$ array $P$ of size $N\times N$;
 1: Procedure GreedySDA$(N,M)$
 2: if $N=1$ then
 3:   $P=[\ast]_{1\times 1}$;
 4: else
 5:   if $N\ge 2M$ then
 6:     $P'=$ GreedySDA$(N-M,M)$;
 7:     $P=\begin{bmatrix}[\ast]_{M\times M} & \\ & P'\end{bmatrix}_{N\times N}$;
 8:   else
 9:     $P'=$ GreedySDA$(M,2M-N)$;
10:     $P=\begin{bmatrix}[\ast]_{M\times(N-M)} & P'\\ & [\ast]_{(N-M)\times M}\end{bmatrix}_{N\times N}$;
11:   end if
12: end if
13: end Procedure

Example 5. The SDA in Example 4 is in fact constructed by the G-SDA algorithm with input parameters $(N=12,M=5)$. The recursive process is illustrated in (45), where $P_i$ ($1\le i\le 6$) is the output array in the $i$-th recursive step, with its input parameters specified in the brackets.

$$\begin{aligned}
&P_1\ (12,5)=\begin{bmatrix}[\ast]_{5\times 5} & \\ & P_2\end{bmatrix}_{12\times 12}
\longrightarrow
P_2\ (7,5)=\begin{bmatrix}[\ast]_{5\times 2} & P_3\\ & [\ast]_{2\times 5}\end{bmatrix}_{7\times 7}
\longrightarrow
P_3\ (5,3)=\begin{bmatrix}[\ast]_{3\times 2} & P_4\\ & [\ast]_{2\times 3}\end{bmatrix}_{5\times 5}\\
&\longrightarrow
P_4\ (3,1)=\begin{bmatrix}\ast & \\ & P_5\end{bmatrix}_{3\times 3}
\longrightarrow
P_5\ (2,1)=\begin{bmatrix}\ast & \\ & P_6\end{bmatrix}_{2\times 2}
\longrightarrow
P_6\ (1,1)=[\ast]_{1\times 1}
\end{aligned}\tag{45}$$

Theorem 6. Given any positive integers $N, M$ with $1\le M\le N$, there exists an $(N,M)$ SDA $P$ with $\eta_P=\eta\left(\frac{N}{\gcd(N,M)},\frac{M}{\gcd(N,M)}\right)$, where $\eta(N,M)$ is recursively defined for any $N, M$ with $1\le M\le N$ and $\gcd(N,M)=1$ by

$$\eta(N,M)=\begin{cases}1, & \text{if } N=1\\ 1+\eta(N-M,M), & \text{if } N>1,\ N\ge 2M\\ 1+\eta(M,2M-N), & \text{if } N>1,\ N<2M\end{cases}.\tag{46}$$
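The recursion (46) and the greedy construction of Algorithm 3 can be cross-checked with a short script. The following is a minimal sketch under stated assumptions: arrays are represented as lists of booleans (True for "$\ast$", False for NULL), indices are 0-based, and the helper names are hypothetical. It requires $1\le M\le N$ and $\gcd(N,M)=1$, as in the algorithm's input.

```python
def eta(N, M):
    """Number of distinct columns according to recursion (46)."""
    if N == 1:
        return 1
    if N >= 2 * M:
        return 1 + eta(N - M, M)
    return 1 + eta(M, 2 * M - N)


def greedy_sda(N, M):
    """Build the N x N array of Algorithm 3 (True = star, False = NULL)."""
    if N == 1:
        return [[True]]
    P = [[False] * N for _ in range(N)]
    if N >= 2 * M:
        sub = greedy_sda(N - M, M)
        for i in range(M):                   # top-left block [*]_{M x M}
            for j in range(M):
                P[i][j] = True
        for i in range(N - M):               # bottom-right block P'
            for j in range(N - M):
                P[M + i][M + j] = sub[i][j]
    else:
        sub = greedy_sda(M, 2 * M - N)
        for i in range(M):
            for j in range(N - M):           # top-left block [*]_{M x (N-M)}
                P[i][j] = True
            for j in range(M):               # top-right block P'
                P[i][N - M + j] = sub[i][j]
        for i in range(M, N):                # bottom-right block [*]_{(N-M) x M}
            for j in range(N - M, N):
                P[i][j] = True
    return P


def count_distinct_columns(P):
    return len({tuple(row[j] for row in P) for j in range(len(P[0]))})


P = greedy_sda(12, 5)
distinct = count_distinct_columns(P)
```

For $(N,M)=(12,5)$ the script reproduces the six distinct columns of Example 4.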

Proof: For any $N, M$ with $1\le M\le N$, the SDA $P$ in (44) has $\eta_P=\eta_{P'}$, where $P'$ is the array output by Algorithm 3 with input parameters $\left(\frac{N}{\gcd(N,M)},\frac{M}{\gcd(N,M)}\right)$. Thus, it is sufficient to prove that the output array of Algorithm 3 is an SDA $P$ with $\eta_P=\eta(N,M)$ for any $N, M$ with $1\le M\le N$ and $\gcd(N,M)=1$.

First of all, for any input parameters $(N,M)$ with $1\le M\le N$ and $\gcd(N,M)=1$, we observe two facts from Lines 6 and 9 of Algorithm 3: (1) during each recursive call, the input parameters $(N,M)$ always maintain the property that $1\le M\le N$ and $\gcd(N,M)=1$; (2) $N$ strictly decreases and thus eventually reaches $1$. Then, the recursive procedure terminates at Line 3, i.e., $N=1$. It is also easy to observe that the recursion of Algorithm 3 is invoked $\eta(N,M)$ times.

Secondly, it is easy to verify from Lines 7 and 10 that $P$ satisfies S1 and S2 with parameters $(N,M)$ if and only if $P'$ satisfies them with parameters $(N-M,M)$ (if $N\ge 2M$) or $(M,2M-N)$ (if $N<2M$). So, we can easily prove by induction that the output $P$ is an SDA with $\eta_P$ satisfying (46).

The corollary below follows from Theorems 4 and 6.

Corollary 2. For any positive integers $N, K, M$ with $M\in[2:N]$, there exists a capacity-achieving $(\mu=M/N,N,K)$ linear SC-PIR scheme with sub-packetization $\eta\left(\frac{N}{\gcd(N,M)},\frac{M}{\gcd(N,M)}\right)\cdot(M-1)$.

Remark 2. Here, we show that the SDA $P$ constructed in (44) can further decrease the sub-packetization of capacity-achieving SC-PIR schemes compared to the optimal equal-size sub-packetization $\frac{N(M-1)}{\gcd(N,M)}$ in Theorem 5. Since the $(N,M)$ SDA $P$ has $\frac{N}{\gcd(N,M)}$ columns for any $M\in[2:N]$, $\eta_P=\eta\left(\frac{N}{\gcd(N,M)},\frac{M}{\gcd(N,M)}\right)\le\frac{N}{\gcd(N,M)}$. It is easy to prove from (46) that the equality (i.e., $\eta_P=\frac{N}{\gcd(N,M)}$) holds if and only if $\frac{M}{\gcd(N,M)}=1$ or $\frac{N-M}{\gcd(N,M)}=1$. In the other cases, i.e., $\frac{M}{\gcd(N,M)}\neq 1$ and $\frac{N-M}{\gcd(N,M)}\neq 1$, we have $\eta_P<\frac{N}{\gcd(N,M)}$, and thus such SDAs can be used for generating capacity-achieving SC-PIR schemes with sub-packetization strictly smaller than the optimal equal-size sub-packetization $\frac{N(M-1)}{\gcd(N,M)}$ (e.g., Example 4). In addition, when $\frac{M}{\gcd(N,M)}=1$ or $\frac{N-M}{\gcd(N,M)}=1$, it will be shown in Theorem 8 that the associated capacity-achieving SC-PIR scheme has the optimal sub-packetization $\frac{N(M-1)}{\gcd(N,M)}$.

B. Improved Construction of Storage Design Arrays

Recall that Algorithm 3 always greedily repeats columns in the current recursive step $l$, which may lead to many $s_l=1$ in the later steps and thus result in a large $\eta_P$. In particular, when $N=2M+1$, it generates an SDA with $\{s_1=M, s_2=\ldots=s_{M+2}=1\}$. In principle, in order to minimize $\eta_P$, the repeated columns should be designed from a global perspective. For this case, by decreasing $s_1$ to $M-1$ and increasing some $s_l$ from $1$ to $2$, we are able to present an improved construction that decreases the sub-packetization.

Before that, it is worth pointing out the following simple property of SDAs.

Lemma 6.

For any ( N, M ) SDA P = [ p i,j ] N × N gcd( N,M ) , its opposite array P = [ p i,j ] N × N gcd( N,M ) deﬁned by p i,j = ( ∗ , if p i,j = NULLNULL , if p i,j = ∗ is an ( N, N − M ) SDA. Moreover, the number of distinct columns in P and P are equal, i.e., η P = η P . Firstly, given any positive integer M ≥ , we construct an ( N, M ) = (2 M + 1 , M ) SDA Q M of size (2 M + 1) × (2 M + 1) as • If M is even, Q M =  [ ∗ ] M × ( M − diag ( ∗ ) × ...diag ( ∗ ) × [ ∗ ] ( M +1) × M diag ( ∗ ) M × M diag ( ∗ ) M × M [ ∗ ] M ×  (2 M + 1) × (2 M + 1)  M blocks diag ( ∗ ) × ; • If M is odd, Q M =  [ ∗ ] M × ( M − diag ( ∗ ) × ... [ ∗ ] M +32 × ( M − diag ( ∗ ) × diag ( ∗ ) M − × M − diag ( ∗ ) M − × M − [ ∗ ] M − ×  (2 M + 1) × (2 M + 1)  M +12 blocks diag ( ∗ ) × , where diag( ∗ ) n × n denotes an n × n array with the entries in diagonal being “ ∗ ”s and the rest entries being “ NULL ”s. It iseasily checked that Q M is a (2 M + 1 , M ) SDA with η Q M = (cid:6) M (cid:7) + 3 . Example . When M = 4 and M = 5 , Q and Q are the following forms, respectively. Q =  ∗ ∗ ∗ ∗∗ ∗ ∗ ∗∗ ∗ ∗ ∗∗ ∗ ∗ ∗∗ ∗ ∗ ∗∗ ∗ ∗ ∗∗ ∗ ∗ ∗∗ ∗ ∗ ∗∗ ∗ ∗ ∗  × , Q =  ∗ ∗ ∗ ∗ ∗∗ ∗ ∗ ∗ ∗∗ ∗ ∗ ∗ ∗∗ ∗ ∗ ∗ ∗∗ ∗ ∗ ∗ ∗∗ ∗ ∗ ∗ ∗∗ ∗ ∗ ∗ ∗∗ ∗ ∗ ∗ ∗∗ ∗ ∗ ∗ ∗∗ ∗ ∗ ∗ ∗∗ ∗ ∗ ∗ ∗  × . Obviously, η Q = 5 with { s = 3 , s = 2 , s = 2 , s = 1 , s = 1 } and η Q = 6 with { s = 4 , s = 2 , s = 2 , s = 1 , s =1 , s = 1 } . Compared to P with { s = 4 , s = 1 , s = 1 , s = 1 , s = 1 , s = 1 } (resp. P ′ with { s = 5 , s = 1 , s = 1 , s =1 , s = 1 , s = 1 , s = 1 } ) in Example 1, the distribution of repeated columns s l is more ﬂexible than the greedy algorithm,which leads to a smaller number of distinct columns. Notice that Q M − is a (2 M − , M − SDA with η Q M − = (cid:4) M (cid:5) + 3 for any M ≥ . By Lemma 6, we can obtain a (2 M − , M ) SDA Q M − with η Q M − = η Q M − = (cid:4) M (cid:5) + 3 for any M ≥ . 
Next, based on $Q_M$ and $\overline{Q}_{M-1}$, for any given positive integers $M, d$ such that $M\ge 3$, $d\ge 2$, we can construct a class of $(N,M)$ SDAs with $N=dM\pm 1$ as

$$P=\begin{bmatrix}\overbrace{\begin{matrix}[\ast]_{M\times M} & & \\ & \ddots & \\ & & [\ast]_{M\times M}\end{matrix}}^{d-2} & \\ & P'\end{bmatrix}_{N\times N},\tag{47}$$

where

$$P'=\begin{cases}Q_M, & \text{if } N=dM+1\\ \overline{Q}_{M-1}, & \text{if } N=dM-1\end{cases}.$$

The following result is straightforward.

Theorem 7. Given any positive integer $N=dM\pm 1$ for integers $M\in[3:N]$ and $d\ge 2$, there is an $(N,M)$ SDA $P$ with

$$\eta_P=\begin{cases}d+\left\lceil\frac{M}{2}\right\rceil+1, & \text{if } N=dM+1\\[2pt] d+\left\lfloor\frac{M}{2}\right\rfloor+1, & \text{if } N=dM-1\end{cases}.$$

Remark 3. Notice that $\gcd(N,M)=1$ when $N=dM\pm 1$ for any integer $d\ge 2$. Let $(N,M)$ be the input parameters of the G-SDA algorithm. Then the number of distinct columns of the output SDA is equal to $\eta(N,M)$ in (46) at $N=dM\pm 1$, i.e.,

$$\eta(N,M)=\begin{cases}d+M, & \text{if } N=dM+1\\ d+M-1, & \text{if } N=dM-1\end{cases}.$$

Thus, the SDA presented in (47) decreases the number of distinct columns and further reduces the required sub-packetization of capacity-achieving SC-PIR schemes.

Finally, we obtain the following corollary from Theorems 4 and 7.

Corollary 3. Given any positive integers $N, K, M$ with $M\in[3:N]$ and $N=dM\pm 1$ for some integer $d\ge 2$, there exists a capacity-achieving $(\mu=M/N,N,K)$ linear SC-PIR scheme with sub-packetization $\left(d+\left\lceil\frac{M}{2}\right\rceil+1\right)\cdot(M-1)$ if $N=dM+1$, or $\left(d+\left\lfloor\frac{M}{2}\right\rfloor+1\right)\cdot(M-1)$ if $N=dM-1$.

In general, it is difficult to extend the above improvement to all the parameters $N$ and $M\in[1:N]$, since the modification of $s_l$ becomes very complicated and sometimes even infeasible under the SDA constraints S1 and S2 if there are many different values of $s_l$.

C. Optimality on Sub-packetization of Greedy SDA

In this subsection, we first establish a lower bound on the optimal value of Problem 1. Next, we discuss the optimality of the sub-packetization of the SC-PIR scheme associated to the proposed SDA with respect to this bound.
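The two quantities compared in this subsection can be evaluated side by side for small parameters: the number of distinct columns $\eta\left(\frac{N}{\gcd(N,M)},\frac{M}{\gcd(N,M)}\right)$ of the greedy SDA, and the lower bound $\max\{\lceil N/M\rceil,\lceil N/(N-M)\rceil\}$ on $\eta^*$ stated in Lemma 7. The sketch below is illustrative only; the function names are hypothetical, and it assumes $1\le M<N$ so that $N-M>0$.

```python
from fractions import Fraction
from math import gcd


def eta_greedy(N, M):
    """Recursion (46); assumes 1 <= M <= N and gcd(N, M) = 1."""
    if N == 1:
        return 1
    if N >= 2 * M:
        return 1 + eta_greedy(N - M, M)
    return 1 + eta_greedy(M, 2 * M - N)


def eta_lower_bound(N, M):
    """max{ceil(N/M), ceil(N/(N-M))}, using integer ceiling division."""
    return max(-(-N // M), -(-N // (N - M)))


def compare(N, M):
    """Return (greedy eta, lower bound on eta*, gap bound min{M, N-M}/gcd)."""
    g = gcd(N, M)
    eta_g = eta_greedy(N // g, M // g)
    lb = eta_lower_bound(N, M)
    gap_bound = Fraction(min(M, N - M), g)
    return eta_g, lb, gap_bound


result_12_5 = compare(12, 5)   # greedy eta = 6, lower bound = 3, gap bound = 5
result_6_3 = compare(6, 3)     # greedy eta = 2, lower bound = 2, gap bound = 1
```

In the second instance $\min\{M,N-M\}=3$ divides $N=6$, and the greedy value meets the lower bound.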

Lemma 7. For any positive integers $N, M$ with $1\le M<N$, the optimal value of Problem 1 satisfies

$$\eta^*\ge\max\left\{\left\lceil\frac{N}{M}\right\rceil,\left\lceil\frac{N}{N-M}\right\rceil\right\}.\tag{48}$$

Proof: Suppose that $\{\alpha^*_{\mathcal{S}}\}_{\mathcal{S}\subseteq[1:N],\,|\mathcal{S}|=M}$ is the optimal solution to Problem 1. Let $\{\mathcal{S}_i\}_{i=1}^{\eta^*}$ be the indices of the non-zero elements in the solution, i.e., $\alpha^*_{\mathcal{S}}>0$ if and only if $\mathcal{S}\in\{\mathcal{S}_i\}_{i=1}^{\eta^*}$. By (32), the optimal solution satisfying (22) and (23) implies the following constraint

$$\sum_{\substack{\mathcal{S}\subseteq[1:N]\\ |\mathcal{S}|=M}}\alpha^*_{\mathcal{S}}=\sum_{i\in[1:\eta^*]}\alpha^*_{\mathcal{S}_i}=1.\tag{49}$$

Hence, there must be an index $j\in[1:\eta^*]$ such that

$$\alpha^*_{\mathcal{S}_j}\ge\frac{1}{\eta^*}.\tag{50}$$

Assume that $\eta^*<\left\lceil\frac{N}{M}\right\rceil$. Then, $\frac{N}{M}-\eta^*>0$ since $\eta^*$ is an integer. Accordingly, there exists a positive number $c=\frac{N}{M}-\eta^*>0$. By (50),

$$\alpha^*_{\mathcal{S}_j}\ge\frac{1}{\frac{N}{M}-c}>\frac{M}{N}=\mu.$$

Thus, for any $n\in\mathcal{S}_j$,

$$\sum_{\substack{\mathcal{S}\subseteq[1:N]\\ |\mathcal{S}|=M,\ n\in\mathcal{S}}}\alpha^*_{\mathcal{S}}\ge\alpha^*_{\mathcal{S}_j}>\mu,\tag{51}$$

which contradicts (22). Thus, $\eta^*\ge\left\lceil\frac{N}{M}\right\rceil$.

Assume that $\eta^*<\left\lceil\frac{N}{N-M}\right\rceil$. Similarly, we have $\eta^*=\frac{N}{N-M}-c$ for some $c>0$. Then,

$$\alpha^*_{\mathcal{S}_j}\ge\frac{1}{\frac{N}{N-M}-c}>\frac{N-M}{N}=1-\mu.$$

Since $M<N$, there exists $n\in[1:N]\setminus\mathcal{S}_j$ such that

$$\sum_{\substack{\mathcal{S}\subseteq[1:N]\\ |\mathcal{S}|=M,\ n\in\mathcal{S}}}\alpha^*_{\mathcal{S}}=\sum_{\substack{\mathcal{S}\subseteq[1:N],\ |\mathcal{S}|=M\\ \mathcal{S}\neq\mathcal{S}_j,\ n\in\mathcal{S}}}\alpha^*_{\mathcal{S}}\overset{(a)}{\le}1-\alpha^*_{\mathcal{S}_j}<\mu,$$

where $(a)$ is due to (49), which again contradicts (22). That is, $\eta^*\ge\left\lceil\frac{N}{N-M}\right\rceil$.

Recall from Theorem 6 and Corollary 2 that the capacity-achieving linear SC-PIR scheme associated to the SDA in (44) has sub-packetization

$$F^{(N,M)}_{\text{G-SDA}}\triangleq\eta\left(\frac{N}{\gcd(N,M)},\frac{M}{\gcd(N,M)}\right)\cdot(M-1).\tag{52}$$

In general, it is difficult to directly compare the sub-packetization $F^{(N,M)}_{\text{G-SDA}}$ and the optimal value $F^*=\eta^*\cdot(M-1)$ in Theorem 3, since $\eta\left(\frac{N}{\gcd(N,M)},\frac{M}{\gcd(N,M)}\right)$ is defined in (46) in a recursive form. In the following theorem, we obtain a multiplicative gap by relaxing $\eta\left(\frac{N}{\gcd(N,M)},\frac{M}{\gcd(N,M)}\right)$ to an upper bound and $\eta^*$ to a lower bound. In addition, this theorem also characterizes two special optimal cases.

Theorem 8. Given any $(\mu,N,K)$ SC-PIR system with $M=\mu N\in[2:N]$, the sub-packetization $F^{(N,M)}_{\text{G-SDA}}$ is within a multiplicative gap $\frac{\min\{M,N-M\}}{\gcd(N,M)}$ of the optimal sub-packetization of capacity-achieving linear SC-PIR schemes. In particular, in the cases $\min\{M,N-M\}\,|\,N$ or $M=N$, the sub-packetization $F^{(N,M)}_{\text{G-SDA}}=\frac{N(M-1)}{\gcd(N,M)}$ is optimal.

Proof: From Theorem 3, the optimal sub-packetization is $F^*=\eta^*\cdot(M-1)$. In the case $M=N$, since $\eta^*\ge 1$, $F^{(N,M)}_{\text{G-SDA}}=M-1\le F^*$ is optimal. When $M\in[2:N-1]$,

$$1\le\frac{F^{(N,M)}_{\text{G-SDA}}}{F^*}\overset{(a)}{=}\frac{\eta\left(\frac{N}{\gcd(N,M)},\frac{M}{\gcd(N,M)}\right)}{\eta^*}\tag{53}$$

$$\overset{(b)}{\le}\frac{\frac{N}{\gcd(N,M)}}{\max\left\{\left\lceil\frac{N}{M}\right\rceil,\left\lceil\frac{N}{N-M}\right\rceil\right\}}\le\frac{\min\{M,N-M\}}{\gcd(N,M)},\tag{54}$$

where $(a)$ follows by (52); $(b)$ is due to the lower bound in (48) and the upper bound $\eta\left(\frac{N}{\gcd(N,M)},\frac{M}{\gcd(N,M)}\right)\le\frac{N}{\gcd(N,M)}$ by Remark 2.

In particular, in the case $\min\{M,N-M\}\,|\,N$, we have $\gcd(N,M)=\min\{M,N-M\}$. Thus, $F^{(N,M)}_{\text{G-SDA}}=F^*$ by (54), i.e., $F^{(N,M)}_{\text{G-SDA}}=\frac{N}{\gcd(N,M)}\cdot(M-1)$ is the optimal sub-packetization in this case.

Remark 4. The SC-PIR problem directly reduces to the problem of replicated PIR by setting $M=N$. When $M=N$, our SC-PIR scheme associated to the SDA in (44) achieves the optimal sub-packetization $F^*=M-1=N-1$, which is the same as that in [28].

VIII. CONCLUSION

In this paper, we investigated the sub-packetization of uncoded Storage Constrained PIR (SC-PIR). We first characterized the optimal sub-packetization of capacity-achieving SC-PIR schemes by an optimization problem, which is hard to solve due to the non-continuous indicator functions involved. We introduced the Storage Design Array (SDA) to construct practical SC-PIR schemes with low sub-packetization. It turns out that, for the SC-PIR system with $N$ servers and a total normalized storage capacity $M\in\{2,\ldots,N\}$, the equal-size sub-packetization $\frac{N(M-1)}{\gcd(N,M)}$ is optimal among all capacity-achieving linear SC-PIR schemes characterized by Woolsey et al. Finally, by allowing unequal-size packets, two constructions of SDAs were proposed to further decrease the sub-packetization. The resultant sub-packetizations were shown to be optimal in the cases $\min\{M,N-M\}\,|\,N$ or $M=N$, and within a multiplicative gap of $\frac{\min\{M,N-M\}}{\gcd(N,M)}$ compared to a lower bound on the optimal sub-packetization otherwise.

APPENDIX

In this section, we prove the five necessary conditions P1–P5 for any capacity-achieving linear SC-PIR scheme. To this end, we first refine the converse proof of the SC-PIR capacity given in [1], and then identify the necessary properties of capacity-achieving SC-PIR schemes by requiring the inequalities in the refined proof to hold with equality. The obtained properties are then specialized to linear SC-PIR schemes to complete the proof. A similar approach was used in [28] and [37] for the setups of replicated PIR and MDS-coded PIR, respectively.

We start by proving two useful lemmas.

A. Preliminary Lemmas

Lemma 8.

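For any $i \in [1:N]$, $\mathcal{K} \subseteq [1:K]$, and $\mathcal{N} \subseteq [1:N]$,
$$H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta]}\big) = H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big), \quad \forall\, \theta \in [1:K]. \tag{55}$$

Proof: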

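Notice that
\begin{align*}
0 &\le H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta]}\big) - H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) \\
&= I\big(A_i^{[\theta]}; Q_{[1:N]\setminus\{i\}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta]}\big) \\
&\le I\big(A_i^{[\theta]}, W_{[1:K]\setminus\mathcal{K}}; Q_{[1:N]\setminus\{i\}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta]}\big) \\
&= I\big(W_{[1:K]\setminus\mathcal{K}}; Q_{[1:N]\setminus\{i\}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta]}\big) + I\big(A_i^{[\theta]}; Q_{[1:N]\setminus\{i\}}^{[\theta]} \mid W_{[1:K]}, Z_{\mathcal{N}}, Q_i^{[\theta]}\big) \\
&\overset{(a)}{=} I\big(W_{[1:K]\setminus\mathcal{K}}; Q_{[1:N]\setminus\{i\}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta]}\big) \\
&\le I\big(W_{[1:K]}, Z_{\mathcal{N}}; Q_{[1:N]\setminus\{i\}}^{[\theta]} \mid Q_i^{[\theta]}\big) \\
&\le I\big(W_{[1:K]}, Z_{\mathcal{N}}; Q_{1:N}^{[\theta]}\big) \\
&\overset{(b)}{=} I\big(W_{[1:K]}; Q_{1:N}^{[\theta]}\big) \overset{(c)}{=} 0,
\end{align*}
where $(a)$ holds because $A_i^{[\theta]}$ is a function of $W_{[1:K]}$ and $Q_i^{[\theta]}$ by (9); $(b)$ follows from the fact that $Z_{\mathcal{N}}$ is a function of the files $W_{[1:K]}$ by (7); and $(c)$ follows from the independence of files and queries by (8).

Lemma 9.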

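For any $i \in [1:N]$, $\mathcal{K} \subseteq [1:K]$, and $\mathcal{N} \subseteq [1:N]$,
$$H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta]}\big) = H\big(A_i^{[\theta']} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta']}\big), \quad \forall\, \theta, \theta' \in [1:K]. \tag{56}$$

Proof: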

For any $\theta \in [1:K]$,
\begin{align*}
0 &\le I\big(Q_i^{[\theta]}, A_i^{[\theta]}, W_{\mathcal{K}}, Z_{\mathcal{N}}; \theta\big) \le I\big(Q_i^{[\theta]}, A_i^{[\theta]}, W_{[1:K]}, Z_{\mathcal{N}}; \theta\big) \\
&\overset{(a)}{=} I\big(Q_i^{[\theta]}, W_{[1:K]}; \theta\big) \\
&= I\big(Q_i^{[\theta]}; \theta\big) + I\big(W_{[1:K]}; \theta \mid Q_i^{[\theta]}\big) \\
&\overset{(b)}{=} I\big(Q_i^{[\theta]}; \theta\big) \\
&\le I\big(Q_i^{[\theta]}, A_i^{[\theta]}, Z_i; \theta\big) \overset{(c)}{=} 0,
\end{align*}
where $(a)$ holds because $A_i^{[\theta]}$ and $Z_{\mathcal{N}}$ are determined by $Q_i^{[\theta]}$ and $W_{[1:K]}$; $(b)$ holds because the files are independent of the desired file index and the query; and $(c)$ follows from the privacy constraint (12).

Notice that $I\big(Q_i^{[\theta]}, A_i^{[\theta]}, W_{\mathcal{K}}, Z_{\mathcal{N}}; \theta\big) = 0$ indicates that $\big(Q_i^{[\theta]}, A_i^{[\theta]}, W_{\mathcal{K}}, Z_{\mathcal{N}}\big)$ and $\theta$ are independent of each other. Therefore, (56) holds.

Next, we present some properties of any SC-PIR scheme.

B. Properties of SC-PIR Schemes

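For $n \in [0:N-1]$ and $k \in [0:K-1]$, define
$$T(n,k) \triangleq \frac{1}{NK\binom{K-1}{k}\binom{N-1}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]\setminus\mathcal{K}} \sum_{i \in [1:N]\setminus\mathcal{N}} H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta]}\big) \tag{57}$$
and
$$T(n,K) \triangleq 0, \quad \forall\, n \in [0:N-1]. \tag{58}$$
In addition,
$$\lambda_n \triangleq \frac{1}{K\binom{N}{n}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]} H(W_\theta \mid Z_{\mathcal{N}}), \quad \forall\, n \in [0:N-1]. \tag{59}$$

Lemma 10.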

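For each $n \in [0:N-1]$ and $k \in [0:K-1]$, any SC-PIR scheme must satisfy the following inductive relationship:
$$T(n,k) \ge \frac{1}{N-n}\left[\sum_{n'=n}^{N-1} T(n', k+1) + \lambda_n\right]. \tag{60}$$
Moreover, to establish the equality in (60), for every realization of queries $\widetilde{Q}_{1:N}^{[\theta]}$ with positive probability,

• For any $\mathcal{K} \subseteq [1:K]$ of size $k$, $\mathcal{N} \subseteq [1:N]$ of size $n$, and $\theta \in [1:K]\setminus\mathcal{K}$,
$$\sum_{i \in [1:N]\setminus\mathcal{N}} H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) = H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big); \tag{61}$$

• For any $\mathcal{K} \subseteq [1:K]$, $\theta \in \mathcal{K}$, $\mathcal{N} \subseteq [1:N]$, $\mathcal{N}' \subseteq [1:N]\setminus\mathcal{N}$ and $i \in [1:N]\setminus(\mathcal{N} \cup \mathcal{N}')$ such that $|\mathcal{K}| = k+1$, $|\mathcal{N}| = n$, $|\mathcal{N}'| = n'-n$ with $n' \in [n+1:N-1]$,
$$I\big(A_i^{[\theta]}; Z_{\mathcal{N}'} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, A_{\mathcal{N}'}^{[\theta]}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) = 0. \tag{62}$$

Proof: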

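For any $n \in [0:N-1]$ and $k \in [0:K-1]$, $T(n,k)$ in (57) can be lower bounded as
\begin{align*}
T(n,k) &= \frac{1}{NK\binom{K-1}{k}\binom{N-1}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]\setminus\mathcal{K}} \sum_{i \in [1:N]\setminus\mathcal{N}} H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta]}\big) \\
&\overset{(a)}{=} \frac{1}{NK\binom{K-1}{k}\binom{N-1}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]\setminus\mathcal{K}} \sum_{i \in [1:N]\setminus\mathcal{N}} H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) \\
&\overset{(b)}{\ge} \frac{1}{NK\binom{K-1}{k}\binom{N-1}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]\setminus\mathcal{K}} H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) \tag{63} \\
&\overset{(c)}{=} \frac{1}{NK\binom{K-1}{k}\binom{N-1}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]\setminus\mathcal{K}} \Big[ H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) + H\big(W_\theta \mid A_{[1:N]\setminus\mathcal{N}}^{[\theta]}, W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) \Big] \\
&= \frac{1}{NK\binom{K-1}{k}\binom{N-1}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]\setminus\mathcal{K}} H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]}, W_\theta \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) \\
&\overset{(d)}{=} \frac{1}{NK\binom{K-1}{k}\binom{N-1}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]\setminus\mathcal{K}} H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K} \cup \{\theta\}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) \\
&\quad + \frac{1}{NK\binom{K-1}{k}\binom{N-1}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]\setminus\mathcal{K}} H(W_\theta \mid Z_{\mathcal{N}}) \\
&\overset{(e)}{=} \underbrace{\frac{1}{NK\binom{K-1}{k}\binom{N-1}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k+1}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in \mathcal{K}} H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big)}_{\triangleq\, \widetilde{T}(n,k)} \\
&\quad + \frac{1}{N-n} \cdot \frac{1}{K\binom{K-1}{k}\binom{N}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]\setminus\mathcal{K}} H(W_\theta \mid Z_{\mathcal{N}}) \\
&\overset{(f)}{=} \widetilde{T}(n,k) + \frac{1}{N-n} \cdot \frac{1}{K\binom{K-1}{k}\binom{N}{n}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]} \sum_{\substack{\mathcal{K} \subseteq [1:K]\setminus\{\theta\} \\ |\mathcal{K}| = k}} H(W_\theta \mid Z_{\mathcal{N}}) \\
&\overset{(g)}{=} \widetilde{T}(n,k) + \frac{1}{N-n} \cdot \frac{1}{K\binom{N}{n}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} \sum_{\theta \in [1:K]} H(W_\theta \mid Z_{\mathcal{N}}) \\
&\overset{(h)}{=} \widetilde{T}(n,k) + \frac{1}{N-n} \cdot \lambda_n, \tag{64}
\end{align*}
where $(a)$ follows by (55); $(b)$ follows from the independence bound on entropy; $(c)$ follows from (9), in which $A_{\mathcal{N}}^{[\theta]}$ is a deterministic function of $Z_{\mathcal{N}}$ and $Q_{\mathcal{N}}^{[\theta]}$, so that with $\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big)$ the file $W_\theta$ can be decoded by (10), i.e., $H\big(W_\theta \mid A_{[1:N]\setminus\mathcal{N}}^{[\theta]}, W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) = 0$; $(d)$ is due to $H\big(W_\theta \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) = H(W_\theta \mid Z_{\mathcal{N}})$ for $\theta \notin \mathcal{K}$ by the independence of the files (2) and the fact that queries are independent of the files (8); $(e)$ and $(f)$ follow by changing the summation indices; $(g)$ follows from $\sum_{\mathcal{K} \subseteq [1:K]\setminus\{\theta\},\, |\mathcal{K}| = k} H(W_\theta \mid Z_{\mathcal{N}}) = \binom{K-1}{k} H(W_\theta \mid Z_{\mathcal{N}})$; $(h)$ follows from the definition of $\lambda_n$ in (59).

Notice that when $n \in [0:N-1]$ and $k = K-1$, all the files $W_{[1:K]}$ are present in the conditions of each entropy function in $\widetilde{T}(n,k)$. Since the answers $A_{[1:N]\setminus\mathcal{N}}^{[\theta]}$ are a function of $Q_{[1:N]\setminus\mathcal{N}}^{[\theta]}$ and $W_{[1:K]}$ by (9), we have $\widetilde{T}(n, K-1) = 0$. Thus, by (64), for $n \in [0:N-1]$ and $k = K-1$,
$$T(n, K-1) \ge \frac{1}{N-n} \cdot \lambda_n. \tag{65}$$
This proves (60) for $n \in [0:N-1]$ and $k = K-1$.

We proceed to prove the other cases by deriving a lower bound on $\widetilde{T}(n,k)$. Let $\mathcal{P}_N$ be the set of all permutations of $[1:N]$, and let $\sigma \triangleq (\sigma_1, \sigma_2, \ldots, \sigma_N) \in \mathcal{P}_N$ denote a permutation of $[1:N]$. For any $n \in [0:N-1]$ and $k \in [0:K-2]$, we further lower bound $\widetilde{T}(n,k)$ as follows.
\begin{align*}
\widetilde{T}(n,k) &= \frac{1}{NK\binom{K-1}{k}\binom{N-1}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k+1}} \sum_{\theta \in \mathcal{K}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) \\
&\overset{(a)}{=} \frac{1}{NK\binom{K-1}{k}\binom{N-1}{n}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k+1}} \sum_{\theta \in \mathcal{K}} \frac{1}{n!(N-n)!} \sum_{\sigma \in \mathcal{P}_N} H\big(A_{\sigma_{[n+1:N]}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\sigma_{[1:n]}}, Q_{1:N}^{[\theta]}\big) \\
&\overset{(b)}{=} \frac{1}{N!\,K\binom{K-1}{k}(N-n)} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k+1}} \sum_{\theta \in \mathcal{K}} \sum_{\sigma \in \mathcal{P}_N} \sum_{n'=n}^{N-1} H\big(A_{\sigma_{n'+1}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\sigma_{[1:n]}}, A_{\sigma_{[n+1:n']}}^{[\theta]}, Q_{1:N}^{[\theta]}\big) \\
&\overset{(c)}{\ge} \frac{1}{N!\,K\binom{K-1}{k}(N-n)} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k+1}} \sum_{\theta \in \mathcal{K}} \sum_{\sigma \in \mathcal{P}_N} \sum_{n'=n}^{N-1} H\big(A_{\sigma_{n'+1}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\sigma_{[1:n]}}, Z_{\sigma_{[n+1:n']}}, A_{\sigma_{[n+1:n']}}^{[\theta]}, Q_{1:N}^{[\theta]}\big) \tag{66} \\
&\overset{(d)}{=} \frac{1}{N!\,K\binom{K-1}{k}(N-n)} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k+1}} \sum_{\theta \in \mathcal{K}} \sum_{\sigma \in \mathcal{P}_N} \sum_{n'=n}^{N-1} H\big(A_{\sigma_{n'+1}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\sigma_{[1:n]}}, Z_{\sigma_{[n+1:n']}}, Q_{1:N}^{[\theta]}\big) \\
&\overset{(e)}{=} \frac{1}{N!\,K\binom{K-1}{k}(N-n)} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k+1}} \sum_{\theta \in \mathcal{K}} \sum_{n'=n}^{N-1} \sum_{\sigma \in \mathcal{P}_N} H\big(A_{\sigma_{n'+1}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\sigma_{[1:n']}}, Q_{\sigma_{n'+1}}^{[\theta]}\big) \\
&\overset{(f)}{=} \frac{1}{N!\,K\binom{K-1}{k}(N-n)} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k+1}} \sum_{\theta \in \mathcal{K}} \sum_{n'=n}^{N-1} n'!(N-n'-1)! \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n'}} \sum_{i \in [1:N]\setminus\mathcal{N}} H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta]}\big) \\
&\overset{(g)}{=} \frac{1}{N!\,K\binom{K-1}{k}(N-n)} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k+1}} \sum_{\theta \in \mathcal{K}} \sum_{n'=n}^{N-1} n'!(N-n'-1)! \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n'}} \sum_{i \in [1:N]\setminus\mathcal{N}} \sum_{\theta' \in [1:K]\setminus\mathcal{K}} \frac{H\big(A_i^{[\theta']} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta']}\big)}{K-k-1} \\
&\overset{(h)}{=} \frac{1}{N!\,K\binom{K-1}{k}(N-n)} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k+1}} \sum_{n'=n}^{N-1} n'!(N-n'-1)! \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n'}} \sum_{i \in [1:N]\setminus\mathcal{N}} \sum_{\theta' \in [1:K]\setminus\mathcal{K}} \frac{k+1}{K-k-1} H\big(A_i^{[\theta']} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta']}\big) \\
&= \frac{1}{N-n} \sum_{n'=n}^{N-1} \frac{1}{NK\binom{K-1}{k+1}\binom{N-1}{n'}} \sum_{\substack{\mathcal{K} \subseteq [1:K] \\ |\mathcal{K}| = k+1}} \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n'}} \sum_{\theta' \in [1:K]\setminus\mathcal{K}} \sum_{i \in [1:N]\setminus\mathcal{N}} H\big(A_i^{[\theta']} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta']}\big) \\
&= \frac{1}{N-n} \sum_{n'=n}^{N-1} T(n', k+1), \tag{67}
\end{align*}
where $(a)$ holds because for each $\mathcal{N} \subseteq [1:N]$ with $|\mathcal{N}| = n$, there are exactly $n!(N-n)!$ permutations $\sigma \in \mathcal{P}_N$ satisfying $H\big(A_{\sigma_{[n+1:N]}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\sigma_{[1:n]}}, Q_{1:N}^{[\theta]}\big) = H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big)$, namely the permutations that act separately on $\mathcal{N}$ and on $[1:N]\setminus\mathcal{N}$; hence,
$$\sum_{\sigma \in \mathcal{P}_N} H\big(A_{\sigma_{[n+1:N]}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\sigma_{[1:n]}}, Q_{1:N}^{[\theta]}\big) = n!(N-n)! \cdot \sum_{\substack{\mathcal{N} \subseteq [1:N] \\ |\mathcal{N}| = n}} H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big);$$
$(b)$ follows from the chain rule of entropy; $(c)$ holds because conditioning reduces entropy; $(d)$ holds because the answers $A_{\sigma_{[n+1:n']}}^{[\theta]}$ are a function of the stored contents $Z_{\sigma_{[n+1:n']}}$ and the queries $Q_{\sigma_{[n+1:n']}}^{[\theta]}$ by (9); $(e)$ follows by (55); $(f)$ follows from arguments similar to $(a)$; $(g)$ follows from (56); $(h)$ is due to $\sum_{\theta \in \mathcal{K}} H\big(A_i^{[\theta']} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta']}\big) = (k+1) H\big(A_i^{[\theta']} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_i^{[\theta']}\big)$.

We obtain (60) by substituting (67) into (64).

Notably, to establish the equality in (60), the inequalities in (63) and (66) have to hold with equality.

• The equality in (63) indicates that for any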

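$\mathcal{K} \subseteq [1:K]$ of size $k$, $\mathcal{N} \subseteq [1:N]$ of size $n$, and $\theta \in [1:K]\setminus\mathcal{K}$,
$$\sum_{i \in [1:N]\setminus\mathcal{N}} H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) = H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big).$$
That is, the following difference vanishes:
\begin{align*}
&\sum_{i \in [1:N]\setminus\mathcal{N}} H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) - H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]}\big) \\
&= \sum_{\widetilde{Q}_{1:N}^{[\theta]}} \Pr\big(Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) \Bigg[ \sum_{i \in [1:N]\setminus\mathcal{N}} H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) - H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) \Bigg]. \tag{68}
\end{align*}
Moreover, for each realization of queries $\widetilde{Q}_{1:N}^{[\theta]}$ with positive probability,
$$\sum_{i \in [1:N]\setminus\mathcal{N}} H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) \ge H\big(A_{[1:N]\setminus\mathcal{N}}^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big).$$
That is, the terms in the square brackets of (68) are nonnegative. Accordingly, (61) holds for all realizations of queries $\widetilde{Q}_{1:N}^{[\theta]}$ with positive probability.

• Similarly, the equality in (66) indicates that for any $\mathcal{K} \subseteq [1:K]$, $\theta \in \mathcal{K}$, $\mathcal{N} \subseteq [1:N]$, $\mathcal{N}' \subseteq [1:N]\setminus\mathcal{N}$ and $i \in [1:N]\setminus(\mathcal{N} \cup \mathcal{N}')$ such that $|\mathcal{K}| = k+1$, $|\mathcal{N}| = n$, $|\mathcal{N}'| = n'-n$ with $n' \in [n+1:N-1]$,
\begin{align*}
0 &= H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, A_{\mathcal{N}'}^{[\theta]}, Q_{1:N}^{[\theta]}\big) - H\big(A_i^{[\theta]} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, Z_{\mathcal{N}'}, A_{\mathcal{N}'}^{[\theta]}, Q_{1:N}^{[\theta]}\big) \\
&= I\big(A_i^{[\theta]}; Z_{\mathcal{N}'} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, A_{\mathcal{N}'}^{[\theta]}, Q_{1:N}^{[\theta]}\big) \\
&= \sum_{\widetilde{Q}_{1:N}^{[\theta]}} \Pr\big(Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big)\, I\big(A_i^{[\theta]}; Z_{\mathcal{N}'} \mid W_{\mathcal{K}}, Z_{\mathcal{N}}, A_{\mathcal{N}'}^{[\theta]}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big). \tag{69}
\end{align*}
Notice that the mutual information terms in (69) are nonnegative. Consequently, they have to be zero for all realizations of queries $\widetilde{Q}_{1:N}^{[\theta]}$ with positive probability, i.e., (62) holds.

This completes the proof of the lemma.

Define the coefficients $\widetilde{\alpha}_\ell$ and the function $\widetilde{D}(\ell)$ as follows:
$$\widetilde{\alpha}_\ell \triangleq \frac{1}{\binom{N}{\ell}} \sum_{\substack{\mathcal{S} \subseteq [1:N] \\ |\mathcal{S}| = \ell}} \alpha_{\mathcal{S}}, \quad \forall\, \ell \in [0:N], \tag{70}$$
$$\widetilde{D}(\ell) \triangleq 1 + \frac{1}{\ell} + \ldots + \frac{1}{\ell^{K-1}}, \quad \forall\, \ell \in [1:N+1], \tag{71}$$
with the boundary conditions $\widetilde{\alpha}_{N+1} \triangleq 0$ and $\widetilde{D}(0) \triangleq NK$.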
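As a quick numerical illustration of the function $\widetilde{D}(\ell)$ in (71), the following minimal sketch evaluates it in exact rational arithmetic and checks that it is strictly decreasing on $[0:N+1]$, a property the later proofs rely on. The function names `d_tilde` and `is_strictly_decreasing` and the parameter values $N = 4$, $K = 3$ are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction


def d_tilde(ell: int, K: int, N: int) -> Fraction:
    """D~(ell) as in (71), with the boundary value D~(0) = N*K."""
    if ell == 0:
        return Fraction(N * K)
    # 1 + 1/ell + ... + 1/ell^(K-1), computed exactly
    return sum((Fraction(1, ell**t) for t in range(K)), Fraction(0))


def is_strictly_decreasing(K: int, N: int) -> bool:
    """Check D~(j+1) - D~(j) < 0 for every j in [0:N] (assumes K >= 2, N >= 2)."""
    return all(d_tilde(j + 1, K, N) < d_tilde(j, K, N) for j in range(N + 1))


# Hypothetical parameters chosen only for illustration.
N, K = 4, 3
print([str(d_tilde(ell, K, N)) for ell in range(N + 2)])
print(is_strictly_decreasing(K, N))
```

Exact fractions are used instead of floats so that comparisons between neighbouring values of $\widetilde{D}$ are not affected by rounding.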

We next use $\widetilde{\alpha}_\ell$ to write the file size constraint in (19) as
$$\sum_{\ell=1}^{N} \binom{N}{\ell} \widetilde{\alpha}_\ell = 1. \tag{72}$$

Lemma 11.

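The download cost $D$ for any SC-PIR scheme has the following lower bound:
$$D \ge L \cdot \sum_{\ell=1}^{N} \binom{N}{\ell} \widetilde{D}(\ell)\, \widetilde{\alpha}_\ell. \tag{73}$$
Moreover, given any $\mathcal{S} \subseteq [1:N]$ and $\theta, \theta' \in [1:K]$ such that $|\mathcal{S}| = M \in [2:N]$ and $\theta \ne \theta'$, if the equality in (73) holds, then for every realization of queries $\widetilde{Q}_{1:N}^{[\theta]}$ with positive probability, any SC-PIR scheme must satisfy

P3′. The answers at the servers in $\mathcal{S}$ are independent of each other conditioned on $W_{[1:K]\setminus\{\theta\}}$ and $Z_{[1:N]\setminus\mathcal{S}}$, i.e.,
$$\sum_{i \in \mathcal{S}} H\big(A_i^{[\theta]} \mid W_{[1:K]\setminus\{\theta\}}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) = H\big(A_{\mathcal{S}}^{[\theta]} \mid W_{[1:K]\setminus\{\theta\}}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big).$$

P4′. The answers at the servers in $\mathcal{S}$ are independent of each other conditioned on $W_{\theta'}$ and $Z_{[1:N]\setminus\mathcal{S}}$, i.e.,
$$\sum_{i \in \mathcal{S}} H\big(A_i^{[\theta]} \mid W_{\theta'}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) = H\big(A_{\mathcal{S}}^{[\theta]} \mid W_{\theta'}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big).$$

P5′. The answer at server $i$ is independent of the contents stored at server $j$ conditioned on $W_\theta$, $W_{\theta'}$, $Z_{[1:N]\setminus\mathcal{S}}$ and $A_j^{[\theta]}$ for any $i, j \in \mathcal{S}$, i.e.,
$$I\big(A_i^{[\theta]}; Z_j \mid W_\theta, W_{\theta'}, Z_{[1:N]\setminus\mathcal{S}}, A_j^{[\theta]}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) = 0, \quad \forall\, i, j \in \mathcal{S}.$$

Proof: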

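We have the following two boundary conditions on $T(n,k)$ in (57) and $\lambda_n$ in (59):
$$T(0,0) = \frac{1}{NK} \sum_{\theta \in [1:K]} \sum_{i \in [1:N]} H\big(A_i^{[\theta]} \mid Q_i^{[\theta]}\big) \overset{(a)}{=} \frac{1}{N} \sum_{i \in [1:N]} H\big(A_i^{[\theta]} \mid Q_i^{[\theta]}\big), \tag{74}$$
where $(a)$ follows from (56) by setting $\mathcal{N} = \emptyset$ and $\mathcal{K} = \emptyset$, and
$$\lambda_0 = \frac{1}{K} \sum_{\theta \in [1:K]} H(W_\theta) \overset{(a)}{=} L, \tag{75}$$
where $(a)$ is due to (1). Therefore,
\begin{align*}
D &\overset{(a)}{=} \sum_{i=1}^{N} H\big(A_i^{[\theta]}\big) \ge \sum_{i=1}^{N} H\big(A_i^{[\theta]} \mid Q_i^{[\theta]}\big) \overset{(b)}{=} N \cdot T(0,0) \overset{(c)}{\ge} \lambda_0 + \sum_{n_1=0}^{N-1} T(n_1, 1) \\
&\overset{(d)}{\ge} \lambda_0 + \sum_{n_1=0}^{N-1} \frac{\lambda_{n_1}}{N-n_1} + \sum_{n_1=0}^{N-1} \sum_{n_2=n_1}^{N-1} \frac{\lambda_{n_2}}{(N-n_1)(N-n_2)} + \sum_{n_1=0}^{N-1} \sum_{n_2=n_1}^{N-1} \sum_{n_3=n_2}^{N-1} \frac{\lambda_{n_3}}{(N-n_1)(N-n_2)(N-n_3)} \\
&\quad + \ldots + \sum_{n_1=0}^{N-1} \sum_{n_2=n_1}^{N-1} \cdots \sum_{n_{K-1}=n_{K-2}}^{N-1} \frac{\lambda_{n_{K-1}} + \sum_{n_K=n_{K-1}}^{N-1} T(n_K, K)}{(N-n_1) \times (N-n_2) \times \ldots \times (N-n_{K-1})} \\
&\overset{(e)}{=} L + \sum_{n_1=0}^{N-1} \frac{\lambda_{n_1}}{N-n_1} + \sum_{n_1=0}^{N-1} \sum_{n_2=n_1}^{N-1} \frac{\lambda_{n_2}}{(N-n_1)(N-n_2)} + \sum_{n_1=0}^{N-1} \sum_{n_2=n_1}^{N-1} \sum_{n_3=n_2}^{N-1} \frac{\lambda_{n_3}}{(N-n_1)(N-n_2)(N-n_3)} \\
&\quad + \ldots + \sum_{n_1=0}^{N-1} \sum_{n_2=n_1}^{N-1} \cdots \sum_{n_{K-1}=n_{K-2}}^{N-1} \frac{\lambda_{n_{K-1}}}{(N-n_1) \times (N-n_2) \times \ldots \times (N-n_{K-1})} \\
&\overset{(f)}{=} L + \sum_{n_1=1}^{N} \frac{\lambda_{N-n_1}}{n_1} + \sum_{n_1=1}^{N} \sum_{n_2=1}^{n_1} \frac{\lambda_{N-n_2}}{n_1 n_2} + \sum_{n_1=1}^{N} \sum_{n_2=1}^{n_1} \sum_{n_3=1}^{n_2} \frac{\lambda_{N-n_3}}{n_1 n_2 n_3} + \ldots + \sum_{n_1=1}^{N} \cdots \sum_{n_{K-1}=1}^{n_{K-2}} \frac{\lambda_{N-n_{K-1}}}{n_1 \times \ldots \times n_{K-1}} \\
&\overset{(g)}{=} L + \sum_{n_1=1}^{N} \frac{\lambda_{N-n_1}}{n_1} + \sum_{n_1=1}^{N} \sum_{n_2=n_1}^{N} \frac{\lambda_{N-n_1}}{n_1 n_2} + \sum_{n_1=1}^{N} \sum_{n_2=n_1}^{N} \sum_{n_3=n_2}^{N} \frac{\lambda_{N-n_1}}{n_1 n_2 n_3} + \ldots + \sum_{n_1=1}^{N} \cdots \sum_{n_{K-1}=n_{K-2}}^{N} \frac{\lambda_{N-n_1}}{n_1 \times \ldots \times n_{K-1}} \\
&= L + \sum_{n_1=1}^{N} \underbrace{\Bigg( \frac{1}{n_1} + \sum_{n_2=n_1}^{N} \frac{1}{n_1 n_2} + \sum_{n_2=n_1}^{N} \sum_{n_3=n_2}^{N} \frac{1}{n_1 n_2 n_3} + \ldots + \sum_{n_2=n_1}^{N} \cdots \sum_{n_{K-1}=n_{K-2}}^{N} \frac{1}{n_1 \times \ldots \times n_{K-1}} \Bigg)}_{\triangleq\, S(n_1, K)} \cdot \lambda_{N-n_1} \\
&\overset{(h)}{=} L + \sum_{n_1=1}^{N} \sum_{\ell=1}^{n_1} \binom{n_1}{\ell} S(n_1, K)\, \widetilde{\alpha}_\ell L \\
&= L \cdot \Bigg( 1 + \sum_{\ell=1}^{N} \widetilde{\alpha}_\ell \underbrace{\sum_{n_1=\ell}^{N} \binom{n_1}{\ell} S(n_1, K)}_{\triangleq\, \alpha(\ell, K)} \Bigg) \\
&\overset{(i)}{=} L \cdot \Bigg( 1 + \sum_{\ell=1}^{N} \binom{N}{\ell} \big( \widetilde{D}(\ell) - 1 \big) \widetilde{\alpha}_\ell \Bigg) \\
&\overset{(j)}{=} L \cdot \sum_{\ell=1}^{N} \binom{N}{\ell} \widetilde{D}(\ell)\, \widetilde{\alpha}_\ell,
\end{align*}
where $(a)$ follows from (13); $(b)$ follows from (74); $(c)$ and $(d)$ follow by applying (60) $K$ times recursively; $(e)$ follows from (75) and (58); $(f)$ and $(g)$ follow by changing the summation indices; $(h)$ holds because $\lambda_n = \sum_{\ell=1}^{N-n} \binom{N-n}{\ell} \widetilde{\alpha}_\ell L$ by the result in [1, Eq. (54)]; $(i)$ follows from the definition of $\widetilde{D}(\ell)$ in (71) and $\alpha(\ell, K) = \binom{N}{\ell} \big(\widetilde{D}(\ell) - 1\big)$ by the result in [1, Eq. (60)]; $(j)$ follows from (72).

To establish the equality in (73), all the inequalities in the steps applying (60) must hold with equality. This implies that the equality in (60) holds for all $n \in [0:N-1]$ and $k \in [1:K-1]$. Thus, by Lemma 10, (61) and (62) hold for any $n \in [0:N-1]$ and $k \in [1:K-1]$.

Accordingly, for any $\theta, \theta' \in [1:K]$ and $\mathcal{S} \subseteq [1:N]$ such that $\theta \ne \theta'$ and $|\mathcal{S}| = N-n = M$, we have

• Let $\mathcal{K} = [1:K]\setminus\{\theta\}$ and $\mathcal{N} = [1:N]\setminus\mathcal{S}$; then $\mathcal{K}$ is of size $k = K-1$ and $\mathcal{N}$ is of size $n \in [0:N-1]$ (i.e., $M \in [1:N]$). Thus, by (61),
$$\sum_{i \in \mathcal{S}} H\big(A_i^{[\theta]} \mid W_{[1:K]\setminus\{\theta\}}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) = H\big(A_{\mathcal{S}}^{[\theta]} \mid W_{[1:K]\setminus\{\theta\}}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big).$$

• Let $\mathcal{K} = \{\theta'\}$ and $\mathcal{N} = [1:N]\setminus\mathcal{S}$; then $\mathcal{K}$ is of size $k = 1$ and $\mathcal{N}$ is of size $n \in [0:N-1]$. Thus, by (61) again,
$$\sum_{i \in \mathcal{S}} H\big(A_i^{[\theta]} \mid W_{\theta'}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) = H\big(A_{\mathcal{S}}^{[\theta]} \mid W_{\theta'}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big).$$

• For $n \in [0:N-2]$ (i.e., $M \in [2:N]$), let $k = 1$ and $n' = n+1$; then $\mathcal{K} = \{\theta, \theta'\}$ is of size $k+1$, $\mathcal{N} = [1:N]\setminus\mathcal{S}$ is of size $n$, and $\mathcal{N}' = \{j\}$ is of size $n'-n = 1$ for $j \in \mathcal{S}$. Thus, given any $i \in \mathcal{S}\setminus\{j\}$, by (62),
$$I\big(A_i^{[\theta]}; Z_j \mid W_\theta, W_{\theta'}, Z_{[1:N]\setminus\mathcal{S}}, A_j^{[\theta]}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]}\big) = 0.$$

To conclude, P3′-P5′ must hold if the equality in (73) holds.

C. Proof of Lemma 1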

For fixed $j \in [0:N]$, the total storage of the $N$ servers (20) is constrained as
\begin{align*}
\mu N &\ge \sum_{n \in [1:N]} \sum_{\substack{\mathcal{S} \subseteq [1:N] \\ n \in \mathcal{S}}} \alpha_{\mathcal{S}} \tag{76} \\
&= \sum_{\ell=1}^{N} \ell \binom{N}{\ell} \widetilde{\alpha}_\ell = j + \sum_{\ell \in [1:N]\setminus\{j, j+1\}} \binom{N}{\ell} (\ell - j)\, \widetilde{\alpha}_\ell + \binom{N}{j+1} \widetilde{\alpha}_{j+1}, \tag{77}
\end{align*}
where the last two equalities follow from (70) and (72), respectively. Hence,
\begin{align*}
\frac{D}{L} &\overset{(a)}{\ge} \sum_{\ell=1}^{N} \binom{N}{\ell} \widetilde{D}(\ell)\, \widetilde{\alpha}_\ell \tag{78} \\
&\overset{(b)}{=} \widetilde{D}(j) + \sum_{\ell \in [1:N]\setminus\{j, j+1\}} \binom{N}{\ell} \big( \widetilde{D}(\ell) - \widetilde{D}(j) \big) \widetilde{\alpha}_\ell + \binom{N}{j+1} \big( \widetilde{D}(j+1) - \widetilde{D}(j) \big) \widetilde{\alpha}_{j+1} \\
&\overset{(c)}{\ge} \widetilde{D}(j) + \sum_{\ell \in [1:N]\setminus\{j, j+1\}} \binom{N}{\ell} \big( \widetilde{D}(\ell) - \widetilde{D}(j) \big) \widetilde{\alpha}_\ell + \big( \widetilde{D}(j+1) - \widetilde{D}(j) \big) \Bigg( \mu N - j - \sum_{\ell \in [1:N]\setminus\{j, j+1\}} \binom{N}{\ell} (\ell - j)\, \widetilde{\alpha}_\ell \Bigg) \tag{79} \\
&= (\mu N - j)\, \widetilde{D}(j+1) - (\mu N - j - 1)\, \widetilde{D}(j) + \sum_{\ell \in [1:N]\setminus\{j, j+1\}} \binom{N}{\ell} \Big( \widetilde{D}(\ell) + (\ell - j - 1)\, \widetilde{D}(j) - (\ell - j)\, \widetilde{D}(j+1) \Big) \widetilde{\alpha}_\ell \\
&\overset{(d)}{\ge} (\mu N - j)\, \widetilde{D}(j+1) - (\mu N - j - 1)\, \widetilde{D}(j) \tag{80} \\
&= (M - j)\, \widetilde{D}(j+1) - (M - j - 1)\, \widetilde{D}(j), \tag{81}
\end{align*}
where $(a)$ follows by (73); $(b)$ is due to (72); $(c)$ follows from (77) and the fact that $\widetilde{D}(j+1) - \widetilde{D}(j)$ is negative for all $j \in [0:N]$; $(d)$ is because $\widetilde{\alpha}_\ell \ge 0$ and $\widetilde{D}(\ell) + (\ell - j - 1)\, \widetilde{D}(j) - (\ell - j)\, \widetilde{D}(j+1) \ge 0$ for all $\ell \in [1:N]\setminus\{j, j+1\}$ by [1, Lemma 5].

Therefore, by (13) and (81), we have
$$R \le \Big( (M - j)\, \widetilde{D}(j+1) - (M - j - 1)\, \widetilde{D}(j) \Big)^{-1}, \quad \forall\, j \in [0:N]. \tag{82}$$
In particular, for $j = M$ and $j = M-1$, (82) results in
$$R \le \big( \widetilde{D}(M) \big)^{-1} = \left( 1 + \frac{1}{M} + \ldots + \frac{1}{M^{K-1}} \right)^{-1}, \tag{83}$$
which coincides with the capacity of the SC-PIR system (see (14)).

Thus, for any capacity-achieving SC-PIR scheme, the inequality in (83) must hold with equality, which implies that the inequalities in (78), (79) and (80) hold with equality for both $j = M$ and $j = M-1$.
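As a numerical sanity check of the bound (81), the sketch below evaluates its right-hand side for every $j \in [0:N]$ and reports which indices give the largest (tightest) lower bound on $D/L$. Under the illustrative, assumed parameters $N = 4$, $K = 3$, $M = 2$, the maximum equals $\widetilde{D}(M)$ and is attained at $j = M-1$ and $j = M$, in line with (83). The helper names `d_tilde`, `download_bound` and `tightest_indices` are hypothetical.

```python
from fractions import Fraction


def d_tilde(ell: int, K: int, N: int) -> Fraction:
    """D~(ell) as in (71), with the boundary value D~(0) = N*K."""
    if ell == 0:
        return Fraction(N * K)
    return sum((Fraction(1, ell**t) for t in range(K)), Fraction(0))


def download_bound(j: int, M: int, K: int, N: int) -> Fraction:
    """Right-hand side of (81): (M-j) D~(j+1) - (M-j-1) D~(j)."""
    return (M - j) * d_tilde(j + 1, K, N) - (M - j - 1) * d_tilde(j, K, N)


def tightest_indices(M: int, K: int, N: int):
    """Return the largest bound over j in [0:N] and the indices j attaining it."""
    bounds = {j: download_bound(j, M, K, N) for j in range(N + 1)}
    best = max(bounds.values())
    return best, sorted(j for j, b in bounds.items() if b == best)


# Hypothetical parameters chosen only for illustration.
N, K, M = 4, 3, 2
best, where = tightest_indices(M, K, N)
print(best, where, best == d_tilde(M, K, N))
```

For indices $j$ far from $M$ the right-hand side of (81) can be non-positive, in which case it carries no information about the rate; the sketch therefore compares the download bounds themselves rather than their inverses.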
Therefore,

• It is easy to prove that $\widetilde{D}(\ell) + (\ell - j - 1)\, \widetilde{D}(j) - (\ell - j)\, \widetilde{D}(j+1) > 0$ for all $\ell \in [1:N]\setminus\{j, j+1\}$ by [1, Lemma 5]. Moreover, we also have $\widetilde{\alpha}_\ell \ge 0$. Thus, the equality in (80) for $j = M$ and $j = M-1$ indicates that $\widetilde{\alpha}_\ell = 0$ holds for all $\ell \in ([1:N]\setminus\{M, M+1\}) \cup ([1:N]\setminus\{M-1, M\}) = [1:N]\setminus\{M\}$. Accordingly, by (70), we have $\alpha_{\mathcal{S}} = 0$ for all $\mathcal{S} \subseteq [1:N]$ with $|\mathcal{S}| \ne M$.

• Since $\widetilde{D}(j+1) - \widetilde{D}(j) < 0$ for all $j \in [0:N]$, the equality in (79) indicates that the equality in (76) also holds, i.e., the equalities in (20) hold for all $n \in [1:N]$.

As a result, P1-P2 must be satisfied in the storage design phase of any capacity-achieving SC-PIR scheme. In addition, the equality in (78) indicates $D = L \cdot \sum_{\ell=1}^{N} \binom{N}{\ell} \widetilde{D}(\ell)\, \widetilde{\alpha}_\ell$, thus we obtain the following corollary by Lemma 11.

Corollary 4.

Any capacity-achieving SC-PIR scheme must satisfy P3′-P5′.

D. Proof of Lemma 2

By Corollary 4, P3′-P5′ hold for any capacity-achieving SC-PIR scheme. Next, we prove Lemma 2 by specializing P3′-P5′ to linear SC-PIR schemes.

According to Lemma 1, in a capacity-achieving SC-PIR scheme, the servers only store packets that are each placed at exactly $M$ different servers, thus the stored contents (18) at server $n$ simplify to
$$Z_n = \bigcup_{k \in [1:K]} \bigcup_{\substack{\mathcal{S} \subseteq [1:N] \\ |\mathcal{S}| = M,\, n \in \mathcal{S}}} W_{k,\mathcal{S}}, \quad \forall\, n \in [1:N]. \tag{84}$$
For any realization of queries $\widetilde{Q}_{1:N}^{[\theta]}$ with positive probability, the answer $A_n^{[\theta]}$ is a deterministic function of the corresponding query realization $\widetilde{Q}_n^{[\theta]}$ and the stored contents $Z_n$ by (9). Therefore, by the stored contents in (84), $A_n^{[\theta]}$ is merely a function of $W_{\theta,\mathcal{S}}$ and $\widetilde{Q}_n^{[\theta]}$ conditioned on the files $W_{[1:K]\setminus\{\theta\}}$ and $Z_{[1:N]\setminus\mathcal{S}}$. For a linear SC-PIR scheme, it is exactly $\widetilde{LC}_n^{[\theta]}(W_{\theta,\mathcal{S}})$ conditioned on $W_{[1:K]\setminus\{\theta\}}$ and $Z_{[1:N]\setminus\mathcal{S}}$. Therefore,
\begin{align*}
&\sum_{n \in \mathcal{S}} H\big( A_n^{[\theta]} \,\big|\, W_{[1:K]\setminus\{\theta\}}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]} \big) \\
&= \sum_{n \in \mathcal{S}} H\big( \widetilde{LC}_n^{[\theta]}(W_{\theta,\mathcal{S}}) \,\big|\, W_{[1:K]\setminus\{\theta\}}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]} \big) \\
&\overset{(a)}{=} \sum_{n \in \mathcal{S}} H\big( \widetilde{LC}_n^{[\theta]}(W_{\theta,\mathcal{S}}) \big), \tag{85}
\end{align*}
where $(a)$ follows from the independence of all the packets (5), and the fact that queries are independent of files by (8).

Similarly,
\begin{align*}
&H\big( A_{\mathcal{S}}^{[\theta]} \,\big|\, W_{[1:K]\setminus\{\theta\}}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]} \big) \\
&= H\big( \widetilde{LC}_n^{[\theta]}(W_{\theta,\mathcal{S}}) : n \in \mathcal{S} \,\big|\, W_{[1:K]\setminus\{\theta\}}, Z_{[1:N]\setminus\mathcal{S}}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]} \big) \\
&= H\big( \widetilde{LC}_n^{[\theta]}(W_{\theta,\mathcal{S}}) : n \in \mathcal{S} \big). \tag{86}
\end{align*}
Substituting (85) and (86) into the two sides of P3′, we obtain
$$\sum_{n \in \mathcal{S}} H\big( \widetilde{LC}_n^{[\theta]}(W_{\theta,\mathcal{S}}) \big) = H\big( \widetilde{LC}_n^{[\theta]}(W_{\theta,\mathcal{S}}) : n \in \mathcal{S} \big).$$
Hence the random variables $\big\{ \widetilde{LC}_n^{[\theta]}(W_{\theta,\mathcal{S}}) : n \in \mathcal{S} \big\}$ are independent of each other, i.e., P3 holds.

By an argument similar to (85), P4′ can be rewritten as
$$\sum_{n \in \mathcal{S}} H\big( \widetilde{LC}_n^{[\theta]}(W_{[1:K]\setminus\{\theta'\},\mathcal{S}}) \big) = H\big( \widetilde{LC}_n^{[\theta]}(W_{[1:K]\setminus\{\theta'\},\mathcal{S}}) : n \in \mathcal{S} \big).$$
Therefore, $\widetilde{LC}_n^{[\theta]}\big(W_{[1:K]\setminus\{\theta'\},\mathcal{S}}\big)$, $\forall\, n \in \mathcal{S}$, are also independent of each other, i.e., P4 holds.

Let $n, n' \in \mathcal{S}$ and $\theta' \ne \theta$. By P5′,
\begin{align*}
0 &= I\big( A_n^{[\theta]}; Z_{n'} \,\big|\, W_\theta, W_{\theta'}, Z_{[1:N]\setminus\mathcal{S}}, A_{n'}^{[\theta]}, Q_{1:N}^{[\theta]} = \widetilde{Q}_{1:N}^{[\theta]} \big) \\
&\overset{(a)}{=} I\big( \widetilde{LC}_n^{[\theta]}(W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}}); W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}} \,\big|\, \widetilde{LC}_{n'}^{[\theta]}(W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}}) \big) \\
&= H\big( \widetilde{LC}_n^{[\theta]}(W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}}) \,\big|\, \widetilde{LC}_{n'}^{[\theta]}(W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}}) \big) \\
&\quad - H\big( \widetilde{LC}_n^{[\theta]}(W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}}) \,\big|\, W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}}, \widetilde{LC}_{n'}^{[\theta]}(W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}}) \big) \\
&\overset{(b)}{=} H\big( \widetilde{LC}_n^{[\theta]}(W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}}) \,\big|\, \widetilde{LC}_{n'}^{[\theta]}(W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}}) \big),
\end{align*}
where $(a)$ follows from an argument similar to (85) again; $(b)$ is due to the fact that $\widetilde{LC}_n^{[\theta]}(W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}})$ is a function of the packets in $W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}}$ by (24) and (25). Thus, $\widetilde{LC}_n^{[\theta]}\big(W_{[1:K]\setminus\{\theta,\theta'\},\mathcal{S}}\big)$, $\forall\, n \in \mathcal{S}$, are deterministic functions of each other, i.e., P5 holds.

As a result, P3-P5 are necessary conditions of any capacity-achieving linear SC-PIR scheme.

REFERENCES

[1] M.A. Attia, D. Kumar, and R. Tandon, “The capacity of private information retrieval from uncoded storage constrained databases,”

IEEE Trans. Inf. Theory, vol. 66, no. 11, pp. 6617-6634, Nov. 2020.
[2] K. Banawan, B. Arasli, and S. Ulukus, “Improved storage for efficient private information retrieval,” Visby, Sweden, 2019, pp. 1-5.
[3] K. Banawan and S. Ulukus, “The capacity of private information retrieval from coded databases,” IEEE Trans. Inf. Theory, vol. 64, no. 3, pp. 1945-1956, Mar. 2018.
[4] M. Bellare, S. Halevi, A. Sahai, and S. Vadhan, “Many-to-one trapdoor functions and their relations to public-key cryptosystems,” CRYPTO ’98, LNCS, 1998.
[5] S.R. Blackburn, T. Etzion, and M.B. Paterson, “PIR schemes with small download complexity and low storage requirements,” IEEE Trans. Inf. Theory, doi: 10.1109/TIT.2019.2942311.
[6] C. Cachin, S. Micali, and M. Stadler, “Computationally private information retrieval with polylogarithmic communication,” in Proc. Int. Conf. Theory Appl. Cryptographic Techn. Advances Cryptology, 1999, pp. 402-414.
[7] T.H. Chan, S.-W. Ho, and H. Yamamoto, “Private information retrieval for coded storage,” in Proc. IEEE ISIT, Nov. 2015, pp. 2842-2846.
[8] Y.-C. Chang, “Single database private information retrieval with logarithmic communication,” in Information Security and Privacy (ACISP), pp. 50-61, 2004.
[9] B. Chor, O. Goldreich, E. Kushilevitz, and M. Sudan, “Private information retrieval,” in Proc. 36th Annu. Symp. Found. Comput. Sci., 1995, pp. 41-50.
[10] A. Fazeli, A. Vardy, and E. Yaakobi, “Codes for distributed PIR with low storage overhead,” in Proc. IEEE ISIT, 2015, pp. 2852-2856.
[11] W. Gasarch, “A survey on private information retrieval,” Bull. EATCS, vol. 82, pp. 72-107, 2004.
[12] C. Gentry and Z. Ramzan, “Single-database private information retrieval with constant communication rate,” in Proc. 31th Int. Colloq. Automata, Lang. Program., 2005, pp. 803-815.
[13] J. Katz and L. Trevisan, “On the efficiency of local decoding procedures for error-correcting codes,” in Proc. 32nd Annu. ACM Symp. Theory Comput., 2000, pp. 80-86.
[14] S. Kumar, H.-Y. Lin, E. Rosnes, and A.G. i Amat, “Achieving maximum distance separable private information retrieval capacity with linear codes,” IEEE Trans. Inf. Theory, vol. 65, no. 7, pp. 4243-4273, Jul. 2019.
[15] E. Kushilevitz and R. Ostrovsky, “One-way trapdoor permutations are sufficient for non-trivial single-server private information retrieval,” in Proc. EUROCRYPT, 2000, pp. 104-121.
[16] M.A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEE Trans. Inf. Theory, vol. 60, no. 5, pp. 2856-2867, May 2014.
[17] M.A. Maddah-Ali and U. Niesen, “Decentralized coded caching attains order-optimal memory-rate tradeoff,” IEEE/ACM Trans. Netw., vol. 23, no. 4, pp. 1029-1040, Aug. 2015.
[18] R. Ostrovsky and W.E. Skeith III, “A survey of single-database private information retrieval: techniques and applications,” in Proc. Int. Workshop Public Key Cryptograph., 2007, pp. 393-411.
[19] S. Rao and A. Vardy, “Lower bound on the redundancy of PIR codes,” [Online]. Available: https://arxiv.org/abs/1605.01869, 2016.
[20] N.B. Shah, K.V. Rashmi, and K. Ramchandran, “One extra bit of download ensures perfectly private information retrieval,” in Proc. IEEE ISIT, 2014, pp. 856-860.
[21] K. Shanmugam, A.M. Tulino, and A.G. Dimakis, “Coded caching with linear subpacketization is possible using Ruzsa-Szemerédi graphs,” in Proc. IEEE ISIT, 2017, pp. 1237-1241.
[22] H. Sun and S.A. Jafar, “The capacity of private information retrieval,” IEEE Trans. Inf. Theory, vol. 63, no. 7, pp. 4075-4088, Jul. 2017.
[23] H. Sun and S.A. Jafar, “Optimal download cost of private information retrieval for arbitrary message length,” IEEE Trans. Inf. Forensics Security, vol. 12, no. 12, pp. 2920-2932, Dec. 2017.
[24] H. Sun and S.A. Jafar, “Multiround private information retrieval: capacity and storage overhead,” IEEE Trans. Inf. Theory, vol. 64, no. 8, pp. 5743-5754, Aug. 2018.
[25] R. Tajeddine, O.W. Gnilke, and S. El Rouayheb, “Private information retrieval from MDS coded data in distributed storage systems,” IEEE Trans. Inf. Theory, vol. 64, no. 11, pp. 7081-7093, Nov. 2018.
[26] R. Tandon, M. Abdul-Wahid, F. Almoualem, and D. Kumar, “PIR from storage constrained databases-coded caching meets PIR,” IEEE, 2018, pp. 1-7.
[27] L. Tang and A. Ramamoorthy, “Coded caching schemes with reduced subpacketization from linear block codes,” IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 3099-3120, April 2018.
[28] C. Tian, H. Sun, and J. Chen, “Capacity-achieving private information retrieval codes with optimal message size and upload cost,” IEEE Trans. Inf. Theory, vol. 65, no. 11, pp. 7613-7627, Nov. 2019.
[29] C. Tian, H. Sun, and J. Chen, “A Shannon-theoretic approach to the storage-retrieval tradeoff in PIR systems,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Vail, CO, 2018, pp. 1904-1908.
[30] C. Tian, “On the storage cost of private information retrieval,” IEEE Trans. Inf. Theory, vol. 66, no. 12, pp. 7539-7549, Dec. 2020.
[31] N. Woolsey, R.-R. Chen, and M. Ji, “A new design of private information retrieval for storage constrained databases,” in Proc. IEEE ISIT, 2019, pp. 1052-1056.
[32] Q. Yan and D. Tuninetti, “Fundamental limits of caching for demand privacy against colluding users,” to appear in IEEE J. Sel. Areas Inf. Theory.
… IEEE Trans. Inf. Theory, vol. 63, no. 9, pp. 5821-5833, Sep. 2017.
[35] S. Yekhanin, “Private information retrieval,” Commun. ACM, vol. 53, no. 4, pp. 68-73, 2010.
[36] Y. Zhang, X. Wang, H. Wei, and G. Ge, “On private information retrieval array codes,” IEEE Trans. Inf. Theory, vol. 65, no. 9, pp. 5565-5573, Sept. 2019.
[37] J. Zhu, Q. Yan, C. Qi, and X. Tang, “A new capacity-achieving private information retrieval scheme with (almost) optimal file length for coded servers,”