An Overflow Problem in Network Coding for Secure Cloud Storage

Abstract

In this paper we define the overflow problem of a network coding storage system in which the encoding parameter and the storage parameter are mismatched. Through analyses and experiments, we first show the impact of the overflow problem in a network coding scheme, which not only wastes storage space but also degrades coding efficiency. To avoid the overflow problem, we then develop the network coding based secure storage (NCSS) scheme. By considering both security and storage requirements in the encoding procedures and the distributed architecture, the NCSS scheme improves the performance of a cloud storage system in terms of both storage cost and coding processing time. We analyze the maximum allowable amount of stored encoded data under the perfect secrecy criterion, and provide design guidelines for a secure cloud storage system that enhance coding efficiency and achieve the minimal storage cost.


Yu-Jia Chen, Member, IEEE, and Li-Chun Wang, Fellow, IEEE

National Chiao Tung University, Taiwan
Email: [email protected] and [email protected]


I. INTRODUCTION

Network coding is an attractive solution for secure cloud storage because it achieves unconditional security. As long as part of the network coded data is protected, a non-collusive eavesdropper cannot decode any symbol, even with huge computing power and infinite time [1]. In principle, network coding simply mixes the data from different network nodes based on well-designed linear combination rules. Hence, another advantage of network coding is that it incurs almost no bandwidth expansion.

A secure cloud storage system using network coding is illustrated in Fig. 1. The original file is split into smaller chunks of symbols, each of which is encoded by a Vandermonde matrix [2]. Different subsets of the encoded data are stored in two cloud databases. A legitimate user with access to both cloud databases can recover the entire original file. However, an eavesdropper with access to only one of the two cloud databases is unable to decode any of the original symbols [3]. In summary, the network coding storage system consists of three procedures: splitting, encoding, and distributing to storage nodes.

July 21, 2018 DRAFT

Fig. 1. An example of secure cloud storage using network coding. (Diagram labels: source, split, encode, original file, original symbols, encoded data, decode, assemble.)

Nevertheless, a secure network coding storage system may encounter a practical design issue when the encoding parameters (such as the size of the encoding matrix) are not jointly designed with the storage parameters (such as the storage size per node). A mismatch between the encoding and storage parameters can cause bandwidth expansion and redundant computation cost. We coin the term overflow problem for this issue in secure network coding storage systems, because the mismatch of encoding and storage parameters extends the length of the coded data when it is written in digits.

Table I shows an example of the overflow problem for binary digits, where $\mathbf{A}$ is the encoding matrix for network coding, and $b_i$ and $c_i$ are the original data and the network coded data, respectively. Assume that $c_1$ and $c_2$ are stored in the first database, and $c_3$ is stored in the second database. We can see that the bit length of the coded $c_i$ is larger than that of $b_i$ for $i = 1 \sim 3$. Also, the minimum bit length required for storing $c_1$ and $c_2$ is three in the first database, but in the second database the minimum bit length for storing $c_3$ is two.

TABLE I
EXAMPLE OF THE OVERFLOW PROBLEM FOR BINARY DIGITS

A        b             c
         (0, , )^T     (2, , )^T = (10, , )^T

To overcome the overflow problem, we propose a systematic design methodology to calculate the important system parameters of a network-coded cloud storage system. The major objective of the proposed scheme is to provide a correct mapping between the encoding parameters and the storage parameters. The contributions of this work are as follows.

• Formulate the overflow problem of a network-coded cloud storage system, and perform experiments to show the impact of the overflow problem, which not only wastes storage space but also degrades computational efficiency.
• Propose a network coding based secure storage (NCSS) scheme to solve the overflow problem. To the best of our knowledge, the overflow problem for a network-coded storage system has not yet been investigated in the literature.
• Derive the upper bound of the amount of encoded data that can be stored in cloud databases to achieve the unconditional security level (i.e., perfect secrecy). Based on the derived upper bound, we present the analysis of storage cost minimization in the proposed NCSS scheme subject to different security levels.
• Finally, based on the experimental results, we suggest design guidelines for determining important system parameters (e.g., the size of the encoding matrix) to accelerate the network coding process.

The rest of this paper is organized as follows. Section II describes related works. In Section III, we formulate the overflow problem in cloud storage using network coding. In Section IV, we present the NCSS scheme. In Section V we give the security analysis of the proposed scheme, and in Section VI we address storage cost minimization. Section VII shows the experimental results. Finally, we give our concluding remarks in Section VIII.

II. BACKGROUND AND RELATED WORK

Network coding can be viewed as a generalized store-and-forward network routing principle. Messages from different source nodes are combined and regenerated at the intermediate nodes according to algebraic encoding. Besides the well-known advantages of throughput enhancement [4]–[6] and data robustness [7], recent studies on network coding focus on reliability and security enhancement.

A. Network Coding for Data Recovery

Network coding can improve the efficiency of the data recovery process when storage nodes fail in distributed storage systems. It was proved that the data recovery problem of distributed storage systems is equivalent to the multicasting problem of network routing [8]. The authors of [9] designed a cooperative network coding recovery mechanism for multiple node failures. A proxy-based multiple cloud storage system with the feature of fault tolerance was built based on the network coding storage scheme [10]. A network coding method called Regenerating Code was proposed to improve the repair process of distributed storage systems [11]. Different from erasure coding, the repaired data fragments are mixed in intermediate nodes, thereby reducing the repair bandwidth. The authors of [12] applied network coding to optimize the reliability performance of frequently accessed data in cloud storage systems.

B. Network Coding for Data Security

Another research area of network coding is preventing data from being eavesdropped during transmission. The information-theoretic security problem for an untrusted channel was first discussed in [13]. A network coding system was built to prove that a wiretapper cannot obtain any information from the transmitted message [14]. A weaker type of security was investigated in [15], where a node can decode packets only after receiving sufficient linearly independent encoded data. The construction of a secure linear network code for a wiretap network was presented in [16].

The secrecy capacity of a network-coded cloud storage system was investigated in [17], [18], where the secrecy capacity is defined as the maximum amount of data that can be securely stored under the perfect secrecy condition. The perfect secrecy condition ensures that the eavesdropper cannot obtain any information about the source data. The secrecy capacity for nodes with different storage capacities was derived in [19]. A coding scheme that can achieve the storage upper bound of secrecy capacity was proposed in [20]. The maximum data size that can be stored under the perfect secrecy condition for any number of eavesdropped nodes was determined in [21]. The authors of [22] considered how to achieve information-theoretic secrecy when an eavesdropper can access some data in the storage nodes.

For secure storage over multiple clouds, similar to this work, the authors of [3] proposed a security protection scheme to ensure that no symbols can be decoded by an eavesdropper, which is weaker than perfect secrecy. In [23], a link eavesdropping problem in a network-coded cloud storage system was investigated. A publicly verifiable protocol for network coded cloud storage was proposed in [24].

C. Objective of This Paper

Different from the previous works focusing on the security and reliability enhancement of network coding, in this paper we focus on storage efficiency and perfect security when applying network coding in multiple untrusted clouds. We define the overflow problem that arises when network coding is applied in a cloud storage system, which has not been discussed previously. The overflow problem produces extra digits in the encoded data during the encoding process, thereby increasing storage and computation cost. To overcome the overflow problem, we develop a systematic design methodology for calculating the encoding and storage parameters of a network-coded cloud storage system. Based on the proposed method, we further solve the storage cost optimization problem under the perfect secrecy constraint. The ultimate goal of this paper is to demonstrate that the performance of a network-coded cloud storage system can be improved by jointly designing the encoding and storage parameters.

III. SYSTEM MODEL AND PROBLEM STATEMENT

Now we describe the coding scheme adopted in this paper and give the formal definition of the overflow problem.

A. Coding Scheme

Consider the original data vector $\mathbf{b} = (b_1, \ldots, b_n)^T$ with base $d$, where the elements $b_i$ are independent random integers uniformly distributed over $\{0, \ldots, d-1\}$. We use the terms original data and plaintext data interchangeably in this paper. The goal of a cloud user is to securely store $\mathbf{b}$ in multiple cloud databases. To achieve this goal, we adopt the same network coding scheme as [3], in which the input data are mapped to encoded symbols by a linear transformation.

Denote $\mathbf{A}$ as an $n \times n$ Vandermonde matrix, where $[A_{i,j}] = (a_i^{\,j-1})$. $\mathbf{A}$ is used as the encoding matrix, where all the coefficients $a_i$ are distinct nonzero elements of a finite field $F_q$, $q = 2^k > n$. Note that $\mathbf{A}$ can be an $(n+m) \times n$ matrix, where the amount of redundancy $m$ depends on the reliability requirement of the storage system.

A cloud user encodes the data as $\mathbf{c} = (c_1, \ldots, c_n)^T = \mathbf{A}\mathbf{b}$ and splits the encoded data into $p$ parts. We assume the cloud user can arbitrarily store any piece of the encoded data in any cloud database. Let $\tilde{\mathbf{c}}_i$ ($i = 1, \ldots, p$) be the encoded data vector stored in the $i$-th cloud database. A legitimate user can collect $\tilde{\mathbf{c}}_i$ from the cloud databases and obtain the original data by computing $\mathbf{A}^{-1}\mathbf{c}$.

B. Security Model

Assume that an eavesdropper has infinite computing power, but can access only one cloud database. It is also assumed that the eavesdropper has full information about the encoding and decoding schemes, including knowledge of the encoding matrix. The objective of the eavesdropper is to guess the original data. Although we consider only one eavesdropper in this paper, our result can be extended to the case of multiple eavesdroppers.

In the considered cloud storage system, every cloud database can support a different security level [25]. Denote $P_{e_i}$ as the probability that the $i$-th cloud database can resist attacks. Also, the cloud user specifies a security requirement $P_u$, which represents the maximum probability that an eavesdropper can guess the original data. Next, we show how to solve the overflow problem subject to the security requirement when distributing encoded symbols to multiple cloud databases.

C. Overflow Problem

Although it was proved that the aforementioned network coding scheme can help prevent eavesdroppers from obtaining information about the original data [1], the overflow problem arises from the interaction between the encoding process and the storage process. Specifically, if the encoding parameter and the storage parameter are mismatched, the length of the encoded data in digital format may become larger than the length of the original data in digital format. As a result, storage space is wasted due to redundant encoded data. Now we formally state this problem by introducing the following definitions.

TABLE II
EXAMPLE OF THE DEFINITIONS FOR OVERFLOW PROBLEM

          A    b            c             c̃_1 = (c_1, c_2)    c̃_2 = (c_3)    Strictly non-overflow    α-bounded non-overflow
Case 1         (0, , )^T    (1, 1, 1)^T   (1, 1)               (1)             Yes                      Yes
Case 2         (0, , )^T    (10, , )^T    (10, )                               No                       Yes

Definition 1 (Strictly Non-overflow): Let $l_d(a)$ be the number of digits that represents $a$ in base $d$. A piece of encoded data $\mathbf{c} = (c_1, \ldots, c_n)^T$ is considered to be strictly non-overflow if and only if $l_d(c_i) \le l_d(b_i)$ for every $i$. Thus, the length of the encoded data does not exceed that of the plaintext data.

Definition 2 ($\alpha$-bounded Non-overflow): Let $|\tilde{\mathbf{c}}_i|$ denote the number of elements in $\tilde{\mathbf{c}}_i$. A piece of encoded data $\mathbf{c} = (c_1, \ldots, c_n)^T$ is considered to be $\alpha$-bounded non-overflow if and only if
$$\sum_{j=1}^{|\tilde{\mathbf{c}}_i|} l_d(c_j) \le |\tilde{\mathbf{c}}_i|\, \alpha\, l_d(b_i), \quad \text{for } 1 \le i \le p.$$

Assume the encoded data are randomly stored in the cloud databases. Hence, the increased cost of storage or computation resources caused by data extension can be measured by the extension degree $\alpha = l_d(c_i) / l_d(b_i)$ of the encoded data in the cloud database. Table II shows the coding results for two different overflow cases, where $d = 2$ and $p = 2$. In case 1 there are no redundant digits after the encoding process, but in case 2 the extension degree is bounded by $\alpha$.

TABLE III
NOTATIONS IN THIS PAPER

Notation                     Description
$\mathbf{b}$                 Original data array
$d$                          Base of $b_i$
$l_d(a)$                     Number of digits that represents $a$ in base $d$
$\mathbf{A}$                 Encoding matrix
$k$                          Power of the Galois Field size used for $\mathbf{A}$, i.e., GF($2^k$)
$n$                          Matrix size of $\mathbf{A}$
$p$                          Total number of cloud databases
$\mathbf{c}$                 Encoded data vector
$\tilde{\mathbf{c}}_i$       Encoded data vector stored in the $i$-th cloud database
$|\tilde{\mathbf{c}}_i|$     Number of elements in $\tilde{\mathbf{c}}_i$
$s_i$                        Number of digits in $b_i$
$\mathbf{b}'$                Regrouped data array
$r$                          Size of $\mathbf{b}'$
$P_{e_i}$                    Probability that the $i$-th cloud database can resist attacks
$P_g$                        Probability that an eavesdropper can guess the original data
$P_u$                        Security requirement: maximum probability that an eavesdropper can guess the original data

IV. NETWORK CODING BASED SECURE STORAGE (NCSS) SCHEME

In this section, we analyze the overflow problem of a network-coded cloud storage system. We first give the criteria for choosing the proper length of the data element to be encoded. Next, we present the data distribution method for achieving the required security level. Finally, we describe the system design methods of the NCSS scheme. Table III summarizes the notations used in this paper.


A. Coding Analysis

The following theorem can help calculate the encoding parameters to avoid the overflow problem of the secure network coding storage system.

Theorem 1: Let $s_i$ be the number of digits in $b_i$. Then, the system is strictly non-overflow if $s_i = s = k \log_d 2$.

Proof: First, we assume that $s_i < k \log_d 2$. Then, we have
$$k \log_d 2 = \log_d 2^k. \tag{1}$$
Because the coding process is manipulated with integers, we have $s_i \le \log_d(2^k - 1)$. Since $c_i$ is distributed over $\{0, \ldots, 2^k - 1\}$, the maximum number of digits used to represent an encoded element is $l_d(c_i)_{\max} = \log_d(2^k - 1)$. Furthermore, the number of digits in $b_i$ can be represented as $l_d(b_i)$. Thus, we have
$$s_i = l_d(b_i) \le \log_d(2^k - 1) = l_d(c_i)_{\max}. \tag{2}$$
As a result, the overflow problem occurs because the length of the encoded data may be larger than the length of the original data. Second, we assume that $s_i > k \log_d 2$. Taking exponentiation with base $d$ on both sides, we have $d^{s_i} > d^{\log_d 2^k} = 2^k$ from (1). Since an $s_i$-digit element $b_i$ can take values up to $d^{s_i} - 1 \ge 2^k$, this contradicts the fact that the maximum value of $b_i$ is $2^k - 1$. Hence, it follows that $s_i = s = k \log_d 2$.

Theorem 2: The system is $\alpha$-bounded non-overflow if $s_i \ge \frac{1}{\alpha} \log_d(2^k - 1)$ for every $i$.

Proof: Since $s_i = l_d(b_i)$ and $l_d(c_i)_{\max} = \log_d(2^k - 1)$, we have
$$\sum_{j=1}^{|\tilde{\mathbf{c}}_i|} l_d(c_j) \le |\tilde{\mathbf{c}}_i| \log_d(2^k - 1) = \alpha \cdot \frac{1}{\alpha} |\tilde{\mathbf{c}}_i| \log_d(2^k - 1) \le \alpha |\tilde{\mathbf{c}}_i| s_i = \alpha |\tilde{\mathbf{c}}_i|\, l_d(b_i). \tag{3}$$

Theorems 1 and 2 give the criteria for selecting the length of the plaintext data element. Next, we relate the security requirement to the amount of stored encoded data.
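As an illustrative sketch (not the authors' implementation), the digit-length criteria of Theorems 1 and 2 can be evaluated with integer arithmetic. The function names below are hypothetical, the base is assumed to satisfy $d \ge 2$, and $\alpha$ is assumed to be a positive integer.

```python
def num_digits(value, d):
    """l_d(value): number of base-d digits needed to write a non-negative integer."""
    digits = 1
    while value >= d:
        value //= d
        digits += 1
    return digits


def strict_digit_length(k, d):
    """Theorem 1: s = k * log_d(2), i.e. the integer s with d**s == 2**k.

    Returns None when k * log_d(2) is not an integer.
    """
    target = 2 ** k
    s, power = 0, 1
    while power < target:
        power *= d
        s += 1
    return s if power == target else None


def bounded_min_digit_length(k, d, alpha):
    """Theorem 2: smallest integer s with s >= (1 / alpha) * log_d(2**k - 1)."""
    s = 1
    # alpha * s >= log_d(2**k - 1) is equivalent to d**(alpha * s) >= 2**k - 1.
    while d ** (alpha * s) < 2 ** k - 1:
        s += 1
    return s


if __name__ == "__main__":
    k, d = 3, 2
    s = strict_digit_length(k, d)
    # Every encoded symbol lies in {0, ..., 2**k - 1}; compare its longest
    # base-d representation with the plaintext element length s.
    longest = max(num_digits(c, d) for c in range(2 ** k))
    print("s =", s, "longest encoded symbol =", longest)
    print("alpha-bounded minimum s =", bounded_min_digit_length(8, 2, 5))
```

With $d = 2$ the strict criterion reduces to $s = k$, which matches the $d = 2$, $k = 3$, $s = 3$ setting used in the example of Table IV.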

Theorem 3: The system satisfies the security requirement $P_u$ if
$$\sum_{j=1}^{|\tilde{\mathbf{c}}_i|} l_d(\tilde{c}_i(j)) \le \sum_{t=1}^{n} l_d(c_t) + \log_d P_u - \log_d(1 - P_{e_i}), \quad \text{for } 1 \le i \le p.$$

Proof: Recall that an eavesdropper can access only one cloud database. Hence, the probability $P_g$ that an eavesdropper can guess the original data is the product of the invasion probability of the cloud database and the probability of guessing the remaining encoded digits. It follows that
$$P_g = (1 - P_{e_i})\, d^{-\left(\sum_{t=1}^{n} l_d(c_t) - \sum_{j=1}^{|\tilde{\mathbf{c}}_i|} l_d(\tilde{c}_i(j))\right)} \le (1 - P_{e_i})\, d^{\log_d P_u - \log_d(1 - P_{e_i})} = P_u. \tag{4}$$

B. System Design

The proposed NCSS scheme can be divided into three steps. First, a dynamic-length alphabet representation of the network coded data is adopted based on Theorem 1 and Theorem 2. Second, the original data are preprocessed and regrouped before the encoding process. Third, the regrouped data are encoded and distributed to the corresponding cloud databases.

Figure 2 shows the system flow of the proposed NCSS scheme. Assume that a cloud user wants to store a single-digit data array $\mathbf{b} = (b_1, \ldots, b_m)^T$ with base $d$ in the $p$ cloud databases. We first choose a power $k$ for the field characteristics according to the following condition.

Condition 1: $2^k \ge d$

The field size must be larger than the maximal value of the data array element, $d - 1$. Otherwise, some data elements cannot be represented in the field. After that, a proper length of the data elements $s_i$ can be decided according to Theorem 1 and Theorem 2. This step is called dynamic length alphabet representation. We then regroup $\mathbf{b}$ to $\mathbf{b}' = (b_1 \ldots b_{s_1},\; b_{s_1+1} \ldots b_{s_1+s_2},\; \cdots,\; b_{\hat{s}_{r-1}+1} \ldots b_{\hat{s}_r})$ based on the value of $s_i$, where $\hat{s}_r \triangleq \sum_{i=1}^{r} s_i$. Next, we generate an $n \times n$ encoding matrix $\mathbf{A}$ that satisfies Condition 2.

Fig. 2. System flow of NCSS scheme. (Flowchart steps: input data b with base d; decide k for GF(2^k) by Condition 1; dynamic length alphabet representation, deciding s_i by Theorem 1 for the strictly non-overflow scheme or by Theorem 2 for the α-bounded non-overflow scheme; regroup b and get b′; generate A of dimension n by Condition 2; encoding c = Ab′; generate c̃ by Theorem 3 with security requirement P_u; distributedly store c̃ to cloud databases.)
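The flow in Fig. 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the field GF($2^3$) is built with the reduction polynomial $x^3 + x + 1$, the Vandermonde evaluation points are $a_i = 1, \ldots, n$, the input bits and probabilities are made up, and all function names are hypothetical. Only the strictly non-overflow case ($s = k$ for $d = 2$) is shown.

```python
import math

K = 3            # power of the field size, GF(2^K)
POLY = 0b1011    # x^3 + x + 1, an assumed irreducible polynomial for GF(2^3)


def gf_mul(x, y):
    """Multiply two GF(2^K) elements: carry-less product reduced by POLY."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if x >> K:                      # degree K reached, reduce
            x ^= POLY
    return result


def gf_pow(x, e):
    """Raise a field element to a non-negative integer power."""
    result = 1
    for _ in range(e):
        result = gf_mul(result, x)
    return result


def regroup(digits, s, d=2):
    """Dynamic-length alphabet step: pack every s base-d digits into one symbol."""
    symbols = []
    for start in range(0, len(digits), s):
        value = 0
        for digit in digits[start:start + s]:
            value = value * d + digit
        symbols.append(value)
    return symbols


def vandermonde(n):
    """n x n matrix with A[i][j] = a_i^j for a_i = 1..n (Condition 2: n < 2^K)."""
    assert n < 2 ** K
    return [[gf_pow(a, j) for j in range(n)] for a in range(1, n + 1)]


def encode(A, b):
    """c = A b over GF(2^K); addition in the field is XOR."""
    c = []
    for row in A:
        acc = 0
        for coeff, symbol in zip(row, b):
            acc ^= gf_mul(coeff, symbol)
        c.append(acc)
    return c


def decode(A, c):
    """Recover b from c by Gauss-Jordan elimination over GF(2^K)."""
    n = len(A)
    M = [row[:] + [ci] for row, ci in zip(A, c)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col])
        M[col], M[pivot] = M[pivot], M[col]
        inv = gf_pow(M[col][col], 2 ** K - 2)   # x^(q-2) is the inverse of x
        M[col] = [gf_mul(inv, v) for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                factor = M[r][col]
                M[r] = [v ^ gf_mul(factor, w) for v, w in zip(M[r], M[col])]
    return [row[n] for row in M]


def max_digits_for_database(total_digits, p_resist, p_user, d=2):
    """Theorem 3: largest digit count storable in a database with resistance p_resist."""
    bound = total_digits + math.log(p_user, d) - math.log(1 - p_resist, d)
    return max(0, min(total_digits, math.floor(bound)))


bits = [0, 0, 1, 1, 0, 1, 1, 1, 0]      # hypothetical input, d = 2
b_prime = regroup(bits, s=K)            # s = K by Theorem 1 when d = 2
A = vandermonde(len(b_prime))           # n = r here, so n <= r holds
c = encode(A, b_prime)
print("b' =", b_prime, "c =", c, "decoded =", decode(A, c))
print("digits allowed in one database:",
      max_digits_for_database(len(c) * K, p_resist=0.9, p_user=2 ** -4))
```

Because every encoded symbol is again an element of GF($2^K$), it fits in the same $K$ bits as the regrouped plaintext symbol, which is the strictly non-overflow property the regrouping step is designed to guarantee.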

Condition 2: $n < 2^k$ and $n \le r$

Since matrix $\mathbf{A}$ is constructed from $n$ distinct elements of the Galois Field, we have $n < 2^k$. In addition, the matrix multiplication cannot be performed if the size of the encoding matrix is larger than the size of the regrouped data array. We then encode $\mathbf{b}'$ with $\mathbf{A}$ and obtain the encoded data array $\mathbf{c} = (c_1, \ldots, c_n)^T$. Then, $\mathbf{c}$ is regrouped into $\tilde{\mathbf{c}}$ by Theorem 3, which specifies the maximum amount of encoded data that can be stored in a cloud database according to the user's security requirement. Finally, the elements of $\tilde{\mathbf{c}}$ are distributed to the corresponding $p$ cloud databases.

Table IV shows an example of the proposed NCSS scheme in the strictly non-overflow case. We assume that the original data $\mathbf{b}$ is a nine-element binary array and that the encoded data are stored in two cloud databases with resistance probabilities $P_{e_1}$ and $P_{e_2}$ and security requirement $P_u$. Theorem 3 then gives the maximal numbers of digits that can be stored in the first and the second cloud database.

TABLE IV
EXAMPLE OF ADOPTING NCSS SCHEME IN STORING ENCODED DATA TO TWO CLOUD DATABASES

b                       d    k    s    b′            r    n    A    c            c̃
(0, , , , , , , , 1)    2    3    3    (001, , )                    (010, , )

V. SECURITY ANALYSIS

In this section, we analyze the proposed NCSS scheme in terms of security level and storage cost. First, we discuss the issue of enhancing the security level from a system design aspect. Then we derive the upper bound of the data size that can be stored in the cloud under the constraint of perfect secrecy.

To begin with, from (4) we know that the lower bound of the security requirement $P_u$ is
$$(1 - P_{e_i})\, d^{-\left(\sum_{t=1}^{n} l_d(c_t) - \sum_{j=1}^{|\tilde{\mathbf{c}}_i|} l_d(\tilde{c}_i(j))\right)} \le P_u.$$
Since $l_d(c_t)$ is proportional to the size of the Galois Field, it is anticipated that a larger encoding matrix size $n$ and a larger power $k$ of the field characteristics result in a higher security level for the network coding storage system. However, increasing these encoding parameters results in extra coding complexity. Next, we show that the security level can be enhanced to perfect secrecy by storing a certain amount of encoded data in the local machine. The notion of perfect secrecy means that an eavesdropper can get no information about the original message.

Definition 3 (Perfect Secrecy Criterion [26]): Let $S$ denote the random variable associated with the secret data fragments and $E$ denote the random variable associated with the encoded fragments observed by the eavesdropper. Perfect secrecy requires
$$H(S \mid E) = H(S),$$
where $H(X)$ represents the entropy of a random variable $X$.

In the worst case, an eavesdropper can access the encoded data of all the cloud databases. The following theorem specifies the maximal amount of encoded data fragments that can be stored in the cloud, while keeping the rest of the data in a local machine to ensure perfect secrecy.

Theorem 4: Assume that $w$-digit secret information is encoded with $(n - w)$-digit data $\mathbf{b}$. For both strictly non-overflow and $\alpha$-bounded non-overflow schemes, a cloud user can store at most $\sum_{j=1}^{n} l_d(c_j) - w$ digits of encoded data in the cloud under the perfect secrecy criterion.

Proof: Let $\mathbf{e}^{(h)}$ represent a subset containing any $h$ components of vector $\mathbf{e}$. We use $\mathbf{e}_{i:j}$ to denote the subvector formed from the $i$-th to the $j$-th position of vector $\mathbf{e}$. The set of rows from the $i$-th to the $j$-th position of matrix $\mathbf{D}$ is represented as $\mathbf{D}_{i:j}$. In addition, $b_i$ are independent random variables uniformly distributed over $F_q$ with entropy $H(b_i) = H(b)$.

For simplicity, and without loss of generality, assume that $t$ contiguous components of the encoded data $\mathbf{c}_{p+1:p+t}$ are stored in the clouds. Then we can obtain
$$\begin{aligned}
H(\mathbf{b}^{(w)}) &= H(\mathbf{b}^{(w)} \mid \mathbf{c}_{p+1:p+t}) - H(\mathbf{b}^{(w)} \mid \mathbf{c}) && (5) \\
&= I(\mathbf{b}^{(w)}; \mathbf{c}) - I(\mathbf{b}^{(w)}; \mathbf{c}_{p+1:p+t}) && (6) \\
&= H(\mathbf{c}) - H(\mathbf{c}_{p+1:p+t}) - H(\mathbf{c} \mid \mathbf{b}^{(w)}) + H(\mathbf{c}_{p+1:p+t} \mid \mathbf{b}^{(w)}) && (7) \\
&\le H(\mathbf{c}) - H(\mathbf{c}_{p+1:p+t}). && (8)
\end{aligned}$$
In the above equations, (5) holds because of the perfect secrecy criterion and due to the fact that the secret information can be reconstructed if the entire codeword is given. In (8), we have $H(\mathbf{c}_{p+1:p+t} \mid \mathbf{b}^{(w)}) - H(\mathbf{c} \mid \mathbf{b}^{(w)}) \le 0$ since $H(\mathbf{c} \mid \mathbf{b}^{(w)}) - H(\mathbf{c}_{p+1:p+t} \mid \mathbf{b}^{(w)}) = H(\mathbf{c}_{p+t+1:n} \mid \mathbf{b}^{(w)}, \mathbf{c}_{p+1:p+t}) \ge 0$. Since $b_i$ are i.i.d. random variables, it follows that
$$H(\mathbf{b}^{(w)}) = H\left(b_{seq(1)}, b_{seq(2)}, \ldots, b_{seq(w)}\right) = wH(b), \tag{9}$$
where $seq(j)$ is the $j$-th element of a random integer sequence within the range $1$ to $n$. Because the encoded data vector $\mathbf{c}$ contains the entire information of $\mathbf{b}$ at most, we can obtain
$$H(\mathbf{c}) \le nH(b). \tag{10}$$
Moreover, the $n \times n$ Vandermonde matrix $\mathbf{A}$ is nonsingular [2]. Thus the eavesdropper can apply Gaussian elimination to obtain the reduced row echelon form of the submatrix $\mathbf{S}$, whose elements are $[S_{i,j}] = [A_{i,j}]$ for $p + 1 \le i, j \le p + t$. The Eavesdropper Reduced Matrix $\mathbf{M}$ can be obtained as
$$\mathbf{M}_{p+1:p+t} = \left[\begin{array}{cc|c|cc}
m_{p+1,1} & \cdots & & \cdots & m_{p+1,n} \\
\vdots & \cdots & \mathbf{I}_t & \cdots & \vdots \\
m_{p+t,1} & \cdots & & \cdots & m_{p+t,n}
\end{array}\right], \tag{11}$$
where the other elements of $\mathbf{M}$ are the same as those of $\mathbf{A}$. Hence, the eavesdropper has $t$ equations to solve for $n$ unknown elements. It implies that
$$H(\mathbf{c}_{p+1:p+t}) = tH(b). \tag{12}$$
Substituting (9), (12) and (10) into (8), we obtain
$$tH(b) \le nH(b) - wH(b). \tag{13}$$
The above equation shows that we can store at most $n - w$ components of encoded data in the clouds under the perfect secrecy criterion. For the strictly non-overflow scheme, we have only one digit in each component of encoded data. Thus, we can store at most $\sum_{j=1}^{n} l_d(c_j) - w$ digits of encoded data in the clouds, while keeping the remaining $w$ digits in the local machines. However, we may have multiple digits in each component of encoded data for the $\alpha$-bounded non-overflow scheme. Let $\mathbf{e}^{(\tilde{h})}$ represent a subset containing any $h$ fragmentary components of vector $\mathbf{e}$. With at least $n$ unknown digits, knowing $\mathbf{c}^{(\tilde{w})}$ cannot help solve $\mathbf{b}$. As a result, it follows that
$$I\left(\mathbf{c}^{(\tilde{w})}; \mathbf{b}\right) = 0. \tag{14}$$
Note that we still have $t$ equations to solve for $n$ unknown elements. That is,
$$H(\mathbf{b}^{(w)} \mid \mathbf{c}_{p+1:p+t}, \mathbf{c}^{(\tilde{w})}) = H(\mathbf{b}^{(w)} \mid \mathbf{c}_{p+1:p+t}). \tag{15}$$
Finally, we obtain
$$I\left(\mathbf{c}_{p+1:p+t}, \mathbf{c}^{(\tilde{w})}; \mathbf{b}^{(w)}\right) = I\left(\mathbf{c}_{p+1:p+t}; \mathbf{b}^{(w)}\right). \tag{16}$$
Consequently, we can select $w$ digits of encoded data from $w$ different components, i.e., select one digit from each component. These $w$-digit encoded data can be stored in the local machines, while the remaining $\sum_{j=1}^{n} l_d(c_j) - w$ digits are stored in the clouds.
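The component-count bound in Theorem 4 can be checked by brute force on a toy instance. The sketch below makes simplifying assumptions that are not part of the paper: it works over the prime field GF(5) instead of GF($2^k$) so that plain modular arithmetic suffices, it uses a $3 \times 3$ Vandermonde matrix with evaluation points 1, 2, 3, and its function names are hypothetical. It enumerates all inputs and computes the mutual information between $w$ secret symbols of $\mathbf{b}$ and the encoded components seen by the eavesdropper.

```python
from collections import Counter
from itertools import product
from math import log2

Q = 5                                                         # assumed small prime field
N = 3
A = [[pow(a, j, Q) for j in range(N)] for a in (1, 2, 3)]     # Vandermonde rows


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a list of equally likely outcomes."""
    counts = Counter(samples)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def leakage(secret_idx, observed_rows):
    """I(S; E) where S = b[secret_idx], E = c[observed_rows], b uniform on GF(Q)^N."""
    secrets, observations, joint = [], [], []
    for b in product(range(Q), repeat=N):
        c = [sum(A[i][j] * b[j] for j in range(N)) % Q for i in range(N)]
        s = tuple(b[i] for i in secret_idx)
        e = tuple(c[i] for i in observed_rows)
        secrets.append(s)
        observations.append(e)
        joint.append((s, e))
    return entropy(secrets) + entropy(observations) - entropy(joint)


if __name__ == "__main__":
    # w = 1 secret symbol, t = 2 = n - w components in the cloud.
    print("t = n - w :", leakage((0,), (1, 2)))
    # w = 2 secret symbols, t = 2 > n - w components in the cloud.
    print("t > n - w :", leakage((0, 1), (1, 2)))
```

In this toy setting the first quantity is zero up to floating-point rounding, so $H(S \mid E) = H(S)$ holds, while exceeding $n - w$ stored components leaks a full symbol's worth of information.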

VI. STORAGE MINIMIZATION

We are motivated to analyze the amount of stored encrypted data with the security requirement in terms of the probability that an eavesdropper can correctly guess the original data. This is because only a certain amount of encoded data fragments can be stored in the local machines to enhance the security level, as shown in the previous section. As the required security level increases, the amount of encoded data stored at the local site should increase.

A. Solving Storage Minimization Problem

Consider a cloud user who keeps encoded data of length $l$ in each encoding operation and stores the remaining encoded data in $p$ cloud databases, as shown in Fig. 3. We assume all the cloud databases have the same capability of preventing attacks (i.e., $P_{e_i} = P_e$) and the security requirement is $P_u$, which specifies the maximum probability that an eavesdropper can guess the original message. In addition to the encoded data, the encoding matrix is stored at the local site.

The storage cost at the local site is a function of the encoding matrix size $n$ and the amount of encoded data stored at a local machine for every encoding operation, denoted by $l$. Let $m$ denote the length of the original message and $\alpha$ represent the number of encoding operations. Subject to a given security requirement $P_u$, the storage cost minimization problem is expressed as
$$\begin{aligned}
\text{minimize} \quad & f(n, l) = n^2 s + \alpha l \\
\text{subject to} \quad & (1 - P_e)^p\, d^{-\alpha l} \le P_u \\
& n < 2^k \\
& l \le n \\
& \alpha n s = m \\
& n, l \in \mathbb{Z}^+,
\end{aligned} \tag{17}$$
where $s$ is defined in Theorem 1. Note that an eavesdropper can guess the original message only if he/she can invade all the cloud databases and guess the encoded data in the local machine. It is observed that the optimization problem is nonconvex even if we relax the nonconvex constraints $n, l \in \mathbb{Z}^+$. The complete algorithm for solving this optimization problem is given in the Appendix.
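For small parameter ranges, the storage minimization problem can be explored by exhaustive search. The sketch below is not the algorithm from the Appendix; it is a brute-force illustration that assumes the cost $n^2 s + \alpha l$, the security constraint $(1 - P_e)^p d^{-\alpha l} \le P_u$, the bounds $n < 2^k$ and $l \le n$, and an integer number of encoding operations $\alpha = m / (ns)$. The function name and all numeric values are hypothetical.

```python
import math


def minimize_storage(m, k, s, p, p_resist, p_user, d=2):
    """Exhaustively search (n, l) for the smallest local storage cost.

    Returns (cost, n, l), or None when no feasible pair exists.
    """
    log_target = math.log(p_user, d)                 # log_d(P_u)
    log_break_all = p * math.log(1 - p_resist, d)    # log_d((1 - P_e)^p)
    best = None
    for n in range(1, 2 ** k):                       # n < 2^k
        if m % (n * s):                              # alpha * n * s = m, alpha integer
            continue
        alpha = m // (n * s)
        for l in range(1, n + 1):                    # l <= n (assumed bound)
            # Security constraint checked in the log domain to avoid underflow.
            if log_break_all - alpha * l <= log_target:
                cost = n * n * s + alpha * l
                if best is None or cost < best[0]:
                    best = (cost, n, l)
                break                                # a larger l only adds cost
    return best


if __name__ == "__main__":
    # Hypothetical setting: 1024-bit message, GF(2^8), s = 8, three clouds.
    print(minimize_storage(m=1024, k=8, s=8, p=3, p_resist=0.5, p_user=2 ** -10))
```

Because the cost grows with $l$ for a fixed $n$, the inner loop can stop at the first feasible $l$; the outer loop still has to visit every admissible $n$, which reflects the nonconvex, integer nature of the problem.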

Fig. 3. Illustration of a user keeping a certain amount of encoded data at the local site to enhance security protection.

B. Discussions

Figure 4 shows the optimal parameter setting for the encoding matrix size $n$ versus the original message length $m$ for $d = 2$ and $p = 3$, with fixed $P_e$ and $P_u$. As the message length increases, the size of the encoding matrix increases. A smaller encoding matrix size is preferred if the Galois field size is large. Due to the integer constraints in the optimization problem, the encoding matrix size increases as a step-like function.

Figure 5 shows the storage cost $f(n, l)$ versus message length $m$ for $d = 2$ and $p = 3$, with fixed $P_e$. Intuitively, we need more storage space for a lower $P_u$. However, the storage costs for various $P_u$ are the same when $m$ exceeds a certain threshold. This is because the considered system is in the case of the lower bound cost (i.e., $l = 1$). Notably, a larger $k$ can yield a smaller lower bound cost when $m$ is sufficiently large. In a general setting $k \in [8, 16]$ [27]. For message lengths below this point, it is suggested that the value of $k$ is set to $k = 8$; otherwise, $k = 16$.

VII. EXPERIMENTAL RESULTS

Since the encoding process is performed on local machines, processing delay may be a performance bottleneck. Thus, it is important to investigate the impact of the system design parameters on the delay performance when considering a secure network coding storage scheme. We performed experiments on a commodity computer with an Intel Core i5 processor running at 2.4 GHz, 8 GB of RAM, and a 5400 RPM Hitachi 500 GB Serial ATA drive with an 8 MB buffer.

Fig. 4. Optimal parameter setting for encoding matrix size versus message length under different Galois Field sizes k. (Curves: k = 8 and k = 16.)

Fig. 5. Storage cost versus message length for different Galois Field sizes k and security requirement P_u. (Curves: k = 8 and k = 16, each for two values of P_u.)

Figure 6 shows the multiplication processing time of the network coding storage system with different sizes of the Galois Field. Although the complexity of the network coding is $O(n^2)$ modular multiplications, our result shows that the field size has only a slight impact on the processing time, which supports our design methodology of selecting $k$. Specifically, it indicates that the security level can be enhanced significantly by selecting an appropriate $k$ at only a very small computational cost.

Figure 7 compares the processing time of the strictly non-overflow and the $\alpha$-bounded non-overflow schemes for a 2 MB file with $p = 2$, where $\alpha = 5$. The processing time is longer for a smaller $n$ or $k$ since the number of encoding operations increases. As a result, the system spends more time in I/O operations and fetching data between the kernel and user space [7]. Compared to the strictly non-overflow scheme, the $\alpha$-bounded non-overflow scheme requires more computation cost. The processing time of the $\alpha$-bounded non-overflow scheme is more than 11 times and 22 times that of the strictly non-overflow scheme when $k = 16$ and $8$, respectively. Finally, the best performance is achieved for sufficiently large $n$ for both non-overflow schemes. Because increasing $n$ results in a larger cost than increasing $k$, we suggest fixing $n = 100$ and adjusting $k$ to meet the security requirements.

Figure 8 compares the processing time of the strictly non-overflow and the $\alpha$-bounded non-overflow schemes versus the power of the Galois Field characteristic $k$. As shown in the figure, the strictly non-overflow scheme is preferable to the $\alpha$-bounded non-overflow scheme. It is noteworthy that $k$ has a negligible effect on the processing time of the strictly non-overflow scheme while it has a great impact on that of the $\alpha$-bounded non-overflow scheme.

VIII. CONCLUSIONS

In this paper, we investigated the overflow problem in a network coding cloud storage system. When the overflow problem occurs, it not only requires more storage space but also increases the encoding processing time. We developed the network coding based secure storage (NCSS) scheme. A systematic approach for the optimal encoding and storage parameters was provided to solve the overflow problem and minimize the storage cost. We also derived an analytical upper bound for the maximal allowable stored data in the cloud nodes under the perfect secrecy criterion. Our experimental results demonstrated that encoding efficiency in terms of processing time can be improved by jointly designing the encoding and the storage system parameters. More importantly, we suggested the key design guidelines for secure network coding storage systems to optimize the performance tradeoff among security requirement, storage cost per node, and encoding processing time. In future research, we will further incorporate the factors of user budgets and file recovery into the secure network coding distributed storage system.

Fig. 6. Processing time versus the multiplication times for different Galois Fields k.

Fig. 7. Comparison of processing time between the strictly non-overflow and the α-bounded non-overflow schemes versus matrix size n with p = 2.

Fig. 8. Comparison of processing time between the strictly non-overflow and the α-bounded non-overflow schemes versus power of Galois Field characteristic k with p = 2.

REFERENCES

[1] P. F. Oliveira, L. Lima, T. T. V. Vinhoza, J. Barros, and M. Medard, “Trusted storage over untrusted networks,” IEEE Global Communication Conference, 2010.
[2] A. Klinger, “The Vandermonde matrix,” The American Mathematical Monthly, 1967.
[3] P. F. Oliveira, L. Lima, T. T. Vinhoza, J. Barros, and M. Medard, “Coding for trusted storage in untrusted networks,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 6, pp. 1890–1899, 2012.
[4] W. Qiao, J. Li, and J. Ren, “An efficient error-detection and error-correction scheme for network coding,” IEEE Global Telecommunications Conference, pp. 1–5, 2011.
[5] D. Zeng, S. Guo, Y. Xiang, and H. Jin, “On the throughput of two-way relay networks using network coding,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 1, pp. 191–199, 2014.
[6] Y. Wu and S.-Y. Kung, “Distributed utility maximization for network coding based multicasting: A shortest path approach,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 1475–1488, 2006.
[7] C. Fragouli and J. L. Boudec, “Network coding: An instant primer,” ACM SIGCOMM Computer, vol. 36, no. 1, pp. 63–68, 2006.
[8] A. Dimakis, P. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Transactions on Information Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
[9] Y. Hu, Y. Xu, X. Wang, C. Zhan, and P. Li, “Cooperative recovery of distributed storage systems from multiple losses with network coding,” IEEE Journal on Selected Areas in Communications, vol. 28, pp. 268–276, 2010.
[10] Y. Hu, H. Chen, P. Lee, and Y. Tang, “NCCloud: Applying network coding for the storage repair in a cloud-of-clouds,” in Proc. of the 10th USENIX Conf. on File and Storage Tech, vol. 1, 2012.
[11] D. S. Papailiopoulos, J. Luo, A. G. Dimakis, C. Huang, and J. Li, “Simple regenerating codes: Network coding for cloud storage,” IEEE International Conference on Computer Communications, pp. 2801–2805, 2012.
[12] Y. Lu, J. Hao, X.-J. Liu, and S.-T. Xia, “Network coding for data-retrieving in cloud storage systems,” pp. 51–55, 2015.
[13] L. Ozarow and A. Wyner, “Wire-tap channel II,” Advances in Cryptology, pp. 33–50, 1985.
[14] N. Cai and R. Yeung, “Secure network coding,” IEEE International Symposium on Information Theory, p. 323, 2002.
[15] K. Bhattad and K. Narayanan, “Weakly secure network coding,” Workshop on Network Coding, Theory, and Application, 2005.
[16] N. Cai and R. W. Yeung, “Secure network coding on a wiretap network,” IEEE Transactions on Information Theory, vol. 57, no. 1, pp. 424–435, 2011.
[17] S. Pawar, S. El Rouayheb, and K. Ramchandran, “On secure distributed data storage under repair dynamics,” IEEE International Symposium on Information Theory Proceedings, pp. 2543–2547, 2010.
[18] N. B. Shah, K. Rashmi, and P. V. Kumar, “Information-theoretically secure regenerating codes for distributed storage,” IEEE Global Telecommunications Conference, pp. 1–5, 2011.
[19] T. Ernvall, S. El Rouayheb, C. Hollanti, and H. V. Poor, “Capacity and security of heterogeneous distributed storage systems,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 12, pp. 2701–2709, 2013.
[20] A. S. Rawat, N. Silberstein, O. O. Koyluoglu, and S. Vishwanath, “Secure distributed storage systems: Local repair with minimum bandwidth regeneration,” International Symposium on Communications, Control and Signal Processing, pp. 5–8, 2014.
[21] S. Goparaju, S. E. Rouayheb, R. Calderbank, and H. V. Poor, “Data secrecy in distributed storage systems under exact repair,” International Symposium on Network Coding, pp. 1–6, 2013.
[22] N. Shah, K. Rashmi, and P. Kumar, “Information-theoretically secure regenerating codes for distributed storage,” IEEE Global Communication Conference, 2011.
[23] Y.-J. Chen, L.-C. Wang, and C.-H. Liao, “Eavesdropping prevention for network coding encrypted cloud storage systems,” IEEE Trans. Parallel Distrib. Syst., vol. 27, pp. 2261–2273, 2016.
[24] F. Chen, T. Xiang, Y. Yang, and S. S. Chow, “Secure cloud storage meets with secure network coding,” IEEE Transactions on Computers, vol. 65, no. 6, pp. 1936–1948, 2016.
[25] M. Barua, X. Liang, R. Lu, and X. Shen, “ESPAC: Enabling security and patient-centric access control for eHealth in cloud computing,” International Journal of Security and Networks, vol. 6, no. 2, pp. 67–76, 2011.
[26] J. L. Massey, “An introduction to contemporary cryptology,” Proceedings of the IEEE, vol. 76, no. 5, pp. 533–549, 1988.
[27] G. Angelopoulos, M. Médard, and A. P. Chandrakasan, “Energy-aware hardware implementation of network coding,” International Conference on Research in Networking, pp. 137–144, 2011.

APPENDIX

Here we first show that the original storage cost minimization (17) is not convex even when the integer constraint is relaxed. Then we give the algorithm for solving the optimization problem by minimizing over separated variables.

Theorem 5. The objective function of the original storage cost minimization (17) is not convex.

Proof: We consider the case of the strictly non-overflow scheme. Substituting $s_i = s = k \log_2 d$ into (17), the original storage cost minimization is equivalent to

$$
\begin{aligned}
\text{minimize} \quad & \tilde{f}(n, l) = k \log_2 d \; n^{2} + \frac{m \log_2 d}{k}\, n^{-1} l \\
\text{subject to} \quad & -\frac{nk}{m \log_2 d} \log_d\!\big(P_u (1 - P_c)^{p}\big) \le l < n, \\
& n < 2^{k}, \\
& n, l \in \mathbb{Z}^{+}.
\end{aligned}
\tag{18}
$$

Then we prove the theorem by showing that the Hessian matrix of the objective function is not positive semidefinite. The Hessian matrix of $\tilde{f}(n, l)$ is

$$
H(\tilde{f}) = \begin{bmatrix} 2a + 2bln^{-3} & -bn^{-2} \\ -bn^{-2} & 0 \end{bmatrix},
$$

where $a = k \log_2 d > 0$ and $b = \frac{m \log_2 d}{k} > 0$. Then, we solve the characteristic equation

$$
\det\big(H(\tilde{f}) - \lambda I\big) = \lambda^{2} - (2a + 2bln^{-3})\lambda - b^{2} n^{-4} = 0.
$$

We can obtain

$$
\lambda = \frac{(2a + 2bln^{-3}) \pm \sqrt{(2a + 2bln^{-3})^{2} + 4 b^{2} n^{-4}}}{2}.
$$

The product of the two eigenvalues is $-b^{2} n^{-4} < 0$, so one eigenvalue of $H(\tilde{f})$ is negative and $H(\tilde{f})$ is not positive semidefinite. Thus $\tilde{f}$ is not convex.

We are now ready to solve the equivalent optimization problem (18) by minimizing over separated variables. Define $\tilde{f}^{*}(n, l) \triangleq \min_{n \in B,\, l \in A} \tilde{f}$ and $l^{*} \triangleq \arg\min_{l \in A} \tilde{f}(n, l)$, where $A = \{x \mid x \in \mathbb{Z}^{+},\ -\frac{nk}{m \log_2 d} \log_d\!\big(P_u (1 - P_c)^{p}\big) \le x < n\}$ and $B = \{x \mid x \in \mathbb{Z}^{+},\ x < 2^{k}\}$. We first minimize over n:

$$
\tilde{f}^{*}(n, l) = \min_{n \in B} \Big\{ x \;\Big|\; x = \min_{l \in A} \tilde{f}(n, l) \Big\}.
$$

Since, for fixed n, $\tilde{f}(n, l)$ is a linear function of the single integer variable l with the positive coefficient $b n^{-1}$, its minimum over A is attained at the smallest element of A, and we obtain

$$
l^{*} = \min\{A\} = \left\lceil -\frac{nk}{m \log_2 d} \log_d\!\big(P_u (1 - P_c)^{p}\big) \right\rceil.
$$

As a result, we can solve the optimization problem iteratively as:
• Step 0: Initiate $C = \emptyset$ and $B = \{x \mid x \in \mathbb{Z}^{+},\ x < 2^{k}\}$.
• Step 1: Select $n \in B$ and set $l = \left\lceil -\frac{nk}{m \log_2 d} \log_d\!\big(P_u (1 - P_c)^{p}\big) \right\rceil$.
• Step 2: Calculate $c = \tilde{f}(n, l)$.
• Step 3: Set $C = C \cup \{c\}$ and $B = B - \{n\}$.
• Step 4: Repeat Steps 1 to 3 until $B = \emptyset$.
• Step 5: Obtain $\tilde{f}^{*}(n, l) = \min\{C\}$.
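The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the objective has the form $a n^{2} + b\,l/n$ with $a = k \log_2 d$ and $b = m \log_2 d / k$. The function names, the argument `x_sec` (standing for the argument of $\log_d$ in the secrecy constraint), the caller-supplied candidate set `n_candidates` (playing the role of B), and the rule of skipping any n whose set A is empty are all introduced here for illustration. The helper `hessian_eigenvalues` evaluates the closed-form eigenvalues from the proof, so the sign argument can be checked numerically.

```python
import math


def storage_cost(n, l, k, m, d=2):
    """Assumed objective: a*n^2 + b*l/n, a = k*log2(d), b = m*log2(d)/k."""
    a = k * math.log2(d)
    b = m * math.log2(d) / k
    return a * n ** 2 + b * l / n


def hessian_eigenvalues(n, l, k, m, d=2):
    """Eigenvalues of [[2a + 2bl/n^3, -b/n^2], [-b/n^2, 0]] in closed form."""
    a = k * math.log2(d)
    b = m * math.log2(d) / k
    trace = 2 * a + 2 * b * l / n ** 3
    root = math.sqrt(trace ** 2 + 4 * b ** 2 / n ** 4)
    return (trace - root) / 2, (trace + root) / 2


def min_local_rows(n, k, m, x_sec, d=2):
    """l* = ceil(-(n*k / (m*log2 d)) * log_d(x_sec)), kept at least 1."""
    log_d_x = math.log2(x_sec) / math.log2(d)
    bound = -(n * k) / (m * math.log2(d)) * log_d_x
    return max(1, math.ceil(bound))


def minimize_storage(n_candidates, k, m, x_sec, d=2):
    """Steps 0 to 5: evaluate the cost at (n, l*(n)) and keep the minimum."""
    best = None
    for n in n_candidates:                  # Step 1: pick n from B
        l = min_local_rows(n, k, m, x_sec, d)
        if l >= n:                          # assumption: A is empty, skip n
            continue
        c = storage_cost(n, l, k, m, d)     # Step 2
        if best is None or c < best[0]:     # Steps 3 and 5 folded together
            best = (c, n, l)
    return best                             # (cost, n, l) or None
```

For instance, `minimize_storage(range(1, 2 ** 8), k=8, m=1024, x_sec=2 ** -20)` scans every candidate n once, using hypothetical parameter values chosen only for illustration. Keeping a running minimum instead of materializing the set C gives the same result as Step 5 while using constant memory.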

Yu-Jia Chen received the B.S. and Ph.D. degrees in electrical engineering from National Chiao Tung University, Taiwan, in 2010 and 2016, respectively. He is currently a postdoctoral fellow at National Chiao Tung University. His research interests include network coding for secure storage in cloud datacenters, software defined networks (SDN), and sensor-assisted applications for mobile cloud computing. Yu-Jia Chen has published 15 conference papers and 3 journal papers. He holds two US patents and three ROC patents.

Li-Chun Wang (M’96–SM’06–F’11) received the B.S. degree from National Chiao Tung University, Taiwan, R.O.C., in 1986, the M.S. degree from National Taiwan University in 1988, and the M.Sc. and Ph.D. degrees from the Georgia Institute of Technology, Atlanta, in 1995 and 1996, respectively, all in electrical engineering.

From 1990 to 1992, he was with the Telecommunications Laboratories of Chunghwa Telecom Co. In 1995, he was affiliated with Bell Northern Research of Northern Telecom, Inc., Richardson, TX. From 1996 to 2000, he was with AT&T Laboratories, where he was a Senior Technical Staff Member in the Wireless Communications Research Department. In August 2000, he joined the Department of Electrical and Computer Engineering of National Chiao Tung University in Taiwan, and he is the current Chairman of the same department. His current research interests are in the areas of radio resource management and cross-layer optimization techniques for wireless systems, heterogeneous wireless network design, and cloud computing for mobile applications.

Dr. Wang won the Distinguished Research Award of the National Science Council, Taiwan, in 2012, and was elected to the IEEE Fellow grade in 2011 for his contributions to cellular architectures and radio resource management in wireless networks. He was a co-recipient (with Gordon L. Stuber and Chin-Tau Lea) of the 1997 IEEE Jack Neubauer Best Paper Award for his paper “Architecture Design, Frequency Planning, and Performance Analysis for a Microcell/Macrocell Overlaying System,” IEEE Transactions on Vehicular Technology, vol. 46, no. 4, pp. 836–848, 1997. He has published over 200 journal and international conference papers. He served as an Associate Editor for the IEEE Trans. on Wireless Communications from 2001 to 2005, and as the Guest Editor of the Special Issues on “Mobile Computing and Networking” for IEEE Journal on Selected Areas in Communications in 2005, “Radio Resource Management and Protocol Engineering in Future Broadband Networks” for IEEE Wireless Communications Magazine in 2006, and “Networking Challenges in Cloud Computing Systems and Applications” for IEEE Journal on Selected Areas in Communications in 2013. He holds 10 US patents.