Low Complexity Secure Code (LCSC) Design for Big Data in Cloud Storage Systems

Abstract

In the era of big data, reducing the computational complexity of servers in data centers will be an important goal. We propose Low Complexity Secure Codes (LCSCs) that are specifically designed to provide information-theoretic security in distributed cloud storage systems. Unlike traditional coding schemes, which are designed for error correction, these codes are designed only to provide security with low decoding complexity. These sparse codes provide (asymptotic) perfect secrecy similar to the Shannon cipher. The simultaneous promise of low decoding complexity and perfect secrecy makes these codes very desirable for cloud storage systems with large amounts of data. The design is particularly suitable for large archival data such as movies and pictures. The complexity of these codes is compared with that of traditional encryption techniques.

arXiv [cs.IT]

Mohsen Karimzadeh Kiskani†, Hamid R. Sadjadpour†, Mohammad Reza Rahimi‡ and Fred Etemadieh‡

Index Terms: Distributed Cloud Storage Systems, Information Theoretic Security, Big Data

I. INTRODUCTION

In the era of big data, it is becoming increasingly inefficient to apply traditional cryptographic algorithms to store data securely. The traditional methods of providing security require significant computational power and are not well optimized for big data applications.

For instance, the Hypertext Transfer Protocol Secure (HTTPS) protocol, which is the backbone of internet security, uses the Transport Layer Security (TLS) protocol on top of the Transmission Control Protocol / Internet Protocol (TCP/IP) stack for secure and private data transfer. TLS is a protocol suite that relies on a number of sub-protocols to guarantee security. Many of these sub-protocols consume a lot of CPU power and are complex processes that are not optimized for big data applications. For instance, TLS uses public-key cryptography to exchange keys between the communicating parties through the TLS handshake protocol. One of the well-known key-exchange algorithms used in the TLS handshake protocol is the RSA algorithm. With the typical modular exponentiation algorithms used to implement RSA, public-key operations take quadratic time, private-key operations take cubic time, and key generation takes quartic time with respect to the number of bits in the modulus. After the key exchange, TLS uses the TLS record protocol and algorithms like the Advanced Encryption Standard (AES), which is the algorithm currently adopted by the U.S. National Institute of Standards and Technology (NIST) for the encryption of electronic data, to form block ciphers with a fixed block size of 128 bits. Each 128 bits of data therefore needs to undergo AES calculations to be transformed into ciphertext.

Despite all the effort to enhance the performance of such algorithms, today's secure cryptographic protocols are not well suited to big data applications, as they need to perform a significant number of computations. The resulting CPU processing time and power force cloud service providers to spend significant resources to maintain their secure cloud services.

On the other hand, most current cryptographic algorithms are based on computational security paradigms. In computational security, it is inherently assumed that the adversary (the man-in-the-middle) is unable to perform complex computations and does not have infinite processing time for cryptanalysis. Such algorithms are vulnerable to attacks over time and may be broken by novel deciphering algorithms. For instance, the Data Encryption Standard (DES), which prior to AES was the official Federal Information Processing Standard (FIPS) in the U.S., is no longer considered secure.

With a focus on reducing the decryption complexity, we propose a completely new security paradigm for archival data in distributed cloud systems. Reduced decryption complexity avoids a significant amount of processing power spent on security, which saves significant resources for cloud service providers. Further, we prove that the proposed solution is information-theoretically secure as opposed to computationally secure. Information-theoretic secrecy guarantees that our solution is secure regardless of the computational power of the adversary and the cryptanalysis time.

Our method is based on random sparse codes that are specifically designed for security purposes. The sparsity of these codes makes the decryption, which is done in the cloud, a low-complexity operation. Further, the specific design of our codes allows us to achieve security using the randomness of the encryption operations. Our method is optimized to work with a large number of files that need to be archived. We prove that a larger number of files results in a more secure solution. Our method asymptotically achieves the perfect secrecy defined by Shannon in [1]. When the number of files is not very large, our method is still capable of achieving a level of obfuscation that is desirable for many applications, such as storing private images and videos.

The rest of the paper is organized as follows. Section II is dedicated to related work on the security of distributed storage systems and on the use of codes for security purposes. The assumptions and problem formulation are described in Section III. In Section IV we prove that at least one dense encoding scheme exists that results in a secure solution, and in Section V we examine the security of our approach. The complexity analysis and simulation results are provided in Section VI, and the paper is concluded in Section VII.

† M. K. Kiskani and H. R. Sadjadpour are with the Department of Electrical Engineering, University of California, Santa Cruz. Email: {mohsen, hamid}@soe.ucsc.edu.
‡ M. R. Rahimi and Fred Etemadieh are with Futurewei Technologies, Santa Clara, CA. Email: {reza.rahimi, fred.etemadieh}@huawei.com.

II. RELATED WORKS

Secure transmission of data is usually implemented at higher layers of the network. Recently, there has been significant interest in physical layer security. Physical layer security assumes that all receivers, both legitimate and eavesdropping, possess the same complete knowledge of the transmission technique. The basic setting consists of a transmitter (Alice) who wants to transmit data to a legitimate receiver (Bob) while an eavesdropper (Eve) tries to listen to this communication and obtain the information. In this scenario, Alice can adopt any kind of encoding, modulation or even randomization before transmission, but both Bob and Eve are aware of the transmission technique being used by Alice. Therefore, if there is no noise or channel impairment, an eavesdropper can perfectly decode the message. Wyner proved in 1975 [2] that in the case of noisy wiretap channels, Alice can encode the message so that it reveals no information to Eve. An important parameter of the wiretap channel is the secrecy capacity, defined as the highest rate at which data can be transmitted over a wiretap channel such that Eve cannot decode any information, i.e., her probability of error stays close to 0.5.

After Wyner's paper, many researchers started to study the wiretap channel capacity for different scenarios. Achieving that capacity was another research objective in many publications. Many researchers [3]–[10] studied the use of error correcting codes for wiretap channels and other types of networks.

In this paper, we deviate from this common approach and investigate a completely different problem. Suppose Alice has some data that she wants to store in a cloud storage system. However, Alice does not want the cloud storage system to be able to access this data. Therefore, if the stored data is accessed by Eve, no information can be obtained from it. Further, when the information stored by Alice in the cloud is wiretapped by Eve during transmission, no useful information can be obtained by her. Note that we also assume that no encryption technique is used and only coding schemes are utilized to protect the data. There is some prior work that attempted to use existing coding schemes to address this issue.

Some papers use fountain codes [11] for content retrieval. The advantages of coding in caching and storage systems have been shown in our previous works [12]–[19]. The application of fountain codes in distributed storage systems was also studied in [20]. Other types of erasure codes have been extensively used in storage systems. Maximum Distance Separable (MDS) codes are widely used in storage systems [21], [22] due to their repair capabilities. However, certain requirements must be met to secure the applications that use these codes. The authors of [23] studied the security of distributed storage systems with MDS codes. Pawar et al. [24] studied the secrecy capacity of MDS codes. The authors of [25], [26] also proposed security measures for MDS coded storage systems. Shah et al. [27] proposed information-theoretically secure regenerating codes for distributed storage systems. Rawat et al. [28] used Gabidulin codes on top of MDS codes to propose optimal locally repairable and secure codes for distributed storage systems. Unlike all of the references [20]–[28], this paper studies the use of sparse vectors to design codes that provide security for distributed storage systems. We will show that these codes can be effectively used to attain asymptotic perfect secrecy.

Kumar et al. [29] have proposed a construction for repairable and secure fountain codes. Reference [29] achieves security by concatenating Gabidulin codes with Repairable Fountain Codes (RFC). Their specific design allows the use of Locally Repairable Fountain Codes (LRFC) for secure repair of lost data. Unlike [29], which focuses on the security of the repair links using concatenated codes, the current paper provides security for the data storage by using only sparse vectors, without any additional code, such that perfect secrecy is achieved.

Network coding schemes have been shown to be very efficient from a security point of view. Cai and Yeung [30] showed that network coding can be used to achieve perfect secrecy. Bhattad et al. [31] studied the problem of "weakly secure" network coding schemes in which, even without perfect secrecy, no meaningful information can be extracted from the network. Subsequent to [31], Kadhe et al. studied the problem of weakly secure storage systems in [32], [33]. Yan et al. also proposed algorithms [34], [35] to achieve weak security and studied weakly secure data exchange with generalized Reed-Solomon codes. In our method, when using sparse vectors to design codes for cloud storage systems, the messages are encoded by combining them with each other to create the ciphertext. Hence, the ciphertext will not be independent of the message and the Shannon criterion may not hold. It may therefore be intuitive to think that these codes can only achieve weak security as opposed to perfect security. We will show that our code construction results in asymptotic perfect secrecy.

As far as we know, all previous works in the literature take existing codes that were originally designed for error correction and use or modify them for security purposes. In this paper, we pose the following questions. "Can we design a code specifically for the security of stored data in cloud storage systems?" More specifically, since we will face the problem of large quantities of data being generated in the cloud, "can we design a code that has significantly lower computational complexity for decoding as compared to commonly used encryption techniques?"

The immediate benefit of such an approach would be a significant reduction in the number of servers needed to maintain contents securely in the cloud, because of the reduced computational complexity of recovering the data. Note that the new code design may not have any error correcting capabilities. This means that commonly used codes such as MDS codes [31] can be used alongside it to protect the data against errors. Another benefit of this approach is that even the cloud storage provider is unable to access the contents, which provides a level of privacy for users who store their data in the cloud. Clearly, if the cloud storage provider is unable to access the contents, any eavesdropper who is listening to the communication between the user (Alice) and the cloud (Bob) cannot obtain any useful information.

Our final goal is to demonstrate that perfect secrecy as defined by Shannon in [1] can be achieved without any need to store keys. Note that the original Shannon cipher system [1] is not practical, since for each bit of information we need to store one bit of key, which practically doubles the storage capacity requirement. We demonstrate that without generating any key, it is feasible to achieve perfect secrecy asymptotically. Since in this work we generate the equivalent of a key by combining contents together (the encoding function), it is clear that this encoding function cannot be shared by Alice with the cloud or Eve. This is one fundamental difference between this work and the common wiretap channel problem. One can consider the encoding function as a private key that is generated by Alice to secure the contents and is not shared with anyone.

III. PROBLEM FORMULATION

Assume that a user wants to store files $f_1, f_2, \dots, f_m$ on a cloud system, where each file has $Q$ bits, i.e., $f_i \in \mathbb{F}_2^Q$. We assume that these are archival data; the extension of this work to non-archival data remains as future work. The $m \times 1$ vector that represents all files is denoted by $\mathbf{f} = [f_1\ f_2\ \dots\ f_m]^T$.

A. Encoding

The user encodes these files using an encoding matrix $A$ of size $l \times m$ (where $l \ge m$) and creates an encoded vector of $l$ coded files as $\mathbf{b} = A\mathbf{f}$. Assume that the elements of $A$ belong to the Galois field $\mathbb{F}_2$. The user has a storage space of size $h \ll l$, saves $h$ of these encoded contents locally, and uploads the rest to the cloud. These $h$ encoded contents act similarly to the key in traditional cryptography. Let $\mathbf{c}$ be a vector of size $(l-h) \times 1$ containing all the encoded contents stored on the cloud and $\mathbf{u}$ be a vector of size $h \times 1$ containing all the encoded contents saved in the user's storage, such that

$$\mathbf{b} = \begin{bmatrix} \mathbf{c} \\ \mathbf{u} \end{bmatrix}. \tag{1}$$

B. File Retrieval

To retrieve a content, we need to solve linear equations in the Galois field $\mathbb{F}_2$. The cloud holds $l-h$ coded files in $\mathbf{c}$ and the user stores $h$ coded files in $\mathbf{u}$ locally. When the legitimate user wants to download any of the contents, the application run by the user should solve the linear equation

$$A\mathbf{f} = \mathbf{b} \tag{2}$$

in $\mathbb{F}_2$ to be able to respond to the download request. (We assume that a piece of encoding software running on the user's side does all of this processing. Each row of the vector $\mathbf{b}$ contains $Q$ bits; we use the vector representation for simplicity of presentation.) Since $l > m$, the system in (2) is overdetermined, so there are many $m \times l$ matrices $D$ that recover $\mathbf{f}$ from $\mathbf{b}$. Let the decoding matrix $D$ be one of them. This matrix can be split into two smaller matrices, $D_c$ of size $m \times (l-h)$ and $D_u$ of size $m \times h$, denoting the cloud and user decoding matrices, respectively. Therefore, the $m \times l$ decoding matrix $D$ can be used to retrieve the files as

$$\mathbf{f} = D\mathbf{b} = \begin{bmatrix} D_c & D_u \end{bmatrix} \begin{bmatrix} \mathbf{c} \\ \mathbf{u} \end{bmatrix} = D_c \mathbf{c} + D_u \mathbf{u}. \tag{3}$$

Notice that in equation (3), the first term is computed from the cloud data and the second term from the user's stored data. It is important to notice that the cloud only gets the matrix $D_c$ and therefore is not capable of decoding the data on its own. In the subsequent sections we will show that the user's contribution $D_u \mathbf{u}$ acts similarly to the key and is crucial to secure decoding.

C. Low Complexity Secure Code (LCSC) Design
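As a brief aside before the code design, the encoding step $\mathbf{b} = A\mathbf{f}$ of Section III-A and the split retrieval $\mathbf{f} = D_c\mathbf{c} + D_u\mathbf{u}$ of Section III-B can be illustrated with a small sketch. It is only an illustration under stated assumptions: files are toy 16-bit integers, $D$ is drawn uniformly at random rather than from the sparse construction, and the helper names (`solve_gf2`, `random_code`, `encode`, `partial_decode`) are hypothetical and do not come from the paper.

```python
import random
from functools import reduce
from operator import xor

def solve_gf2(rows, rhs, ncols):
    """Return one bitmask x with rows * x = rhs over GF(2), or None if unsolvable.

    Each entry of `rows` is an int whose bit j is the coefficient of x_j.
    """
    rows, rhs = list(rows), list(rhs)
    pivots, r = [], 0
    for col in range(ncols):
        if r == len(rows):
            break
        p = next((i for i in range(r, len(rows)) if (rows[i] >> col) & 1), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        rhs[r], rhs[p] = rhs[p], rhs[r]
        for i in range(len(rows)):  # Gauss-Jordan: clear this column elsewhere
            if i != r and (rows[i] >> col) & 1:
                rows[i] ^= rows[r]
                rhs[i] ^= rhs[r]
        pivots.append(col)
        r += 1
    if any(rhs[i] for i in range(r, len(rows))):
        return None  # a zero row with a non-zero right-hand side
    # Free variables are set to 0; each pivot variable equals its right-hand side.
    return reduce(xor, (1 << col for i, col in enumerate(pivots) if rhs[i]), 0)

def random_code(m, l, rng):
    """Draw a random m x l matrix D (assumption: dense, not sigma-sparse)
    and one l x m matrix A, stored column-wise, with D A = I over GF(2)."""
    while True:
        d_rows = [rng.getrandbits(l) for _ in range(m)]
        a_cols = [solve_gf2(d_rows, [int(r == i) for r in range(m)], l)
                  for i in range(m)]
        if None not in a_cols:  # every D a_i = e_i was solvable
            return d_rows, a_cols

def encode(files, a_cols, l):
    """b_j is the XOR of all files f_i with A[j][i] = 1, i.e. b = A f."""
    return [reduce(xor, (f for f, a in zip(files, a_cols) if (a >> j) & 1), 0)
            for j in range(l)]

def partial_decode(d_rows, coded, offset):
    """Apply columns offset, offset+1, ... of D to a list of coded files."""
    return [reduce(xor, (b for j, b in enumerate(coded) if (d >> (offset + j)) & 1), 0)
            for d in d_rows]

m, l, h, Q = 4, 7, 2, 16                # toy sizes chosen for illustration
rng = random.Random(0)
files = [rng.getrandbits(Q) for _ in range(m)]
d_rows, a_cols = random_code(m, l, rng)
b = encode(files, a_cols, l)
c, u = b[: l - h], b[l - h:]            # cloud part and locally kept part
f_c = partial_decode(d_rows, c, 0)      # D_c c, computed on the cloud
f_u = partial_decode(d_rows, u, l - h)  # D_u u, computed by the user
recovered = [x ^ y for x, y in zip(f_c, f_u)]
```

In this sketch the cloud-side term `f_c` alone does not equal the files; only the XOR with the user-side term `f_u` restores them, which mirrors the role of $D_u\mathbf{u}$ in equation (3).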

Our goal is to propose an encoding strategy such that $A$ is full rank with high probability and the rows of $A$ are fairly dense. Further, we ideally want a relatively sparse decoding matrix $D$. We use the density of the encoding matrix $A$ for security purposes and the sparsity of the decoding matrix $D$ for low-complexity decoding. In other words, we want the columns of the decoding matrix $D$ to be sparse so that a few cloud operations are enough to retrieve a content, while the rows of $A$ are dense so that a relatively large number of files are encoded together to enhance the security of the system.

From the definition of the decoding matrix $D$ we have

$$DA = I_m, \tag{4}$$

where $I_m$ is the $m \times m$ identity matrix. If $\mathbf{a}_i$ is the $i$th column of $A$, then equation (4) results in the following linear equation in $\mathbb{F}_2$:

$$D\mathbf{a}_i = \mathbf{e}_i, \tag{5}$$

where $\mathbf{e}_i$ is the vector of all zeros except for a one at the $i$th position. The Hamming weights of the columns $\mathbf{a}_i$ define the density of the encoding matrix $A$, and we compute the density of $A$ from them.

In this paper, we start with a sparse decoding matrix $D$ and show that an encoding matrix $A$ exists such that the number of ones in each row of $A$ is $\Theta(m)$. To prove this, let

$$A = [\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_m], \tag{6}$$

where each $\mathbf{a}_i$ is an $l \times 1$ column vector. We will show that an encoding matrix $A$ exists such that each $\mathbf{a}_i$ has a Hamming weight of $\Theta(l)$. Notice that $\mathbf{a}_i$ is a solution to the linear equation (5) in $\mathbb{F}_2$.

Definition 1. A random vector $\mathbf{w} = (\omega_1, \omega_2, \dots, \omega_m)^T \in \mathbb{F}_2^m$ is called $\sigma$-sparse if all of its elements are independent of each other and

$$P[\omega_i = 1] = \tfrac{1}{2}(1 - \sigma), \qquad P[\omega_i = 0] = \tfrac{1}{2}(1 + \sigma). \tag{7}$$

Assume that $l$ independent $\sigma$-sparse vectors $\mathbf{d}_1, \mathbf{d}_2, \dots, \mathbf{d}_l$ are the columns of the decoding matrix $D$. In other words, let

$$D = [\mathbf{d}_1, \mathbf{d}_2, \dots, \mathbf{d}_l]. \tag{8}$$

In the next section, we compute the density of the encoding matrix $A$.

IV. EXISTENCE OF A DENSE ENCODING MATRIX
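Before the density is computed, Definition 1 can be made concrete with a short sketch that samples $\sigma$-sparse vectors and compares their empirical Hamming weight with the expected value $\tfrac{1}{2}m(1-\sigma)$ that follows directly from equation (7). The function names and the sample sizes are illustrative assumptions, not part of the paper.

```python
import random

def sigma_sparse_vector(m, sigma, rng):
    """Sample m independent bits with P[bit = 1] = (1 - sigma) / 2."""
    p_one = 0.5 * (1.0 - sigma)
    return [1 if rng.random() < p_one else 0 for _ in range(m)]

def expected_weight(m, sigma):
    """Expected Hamming weight of a sigma-sparse vector of length m."""
    return 0.5 * m * (1.0 - sigma)

def empirical_weight(m, sigma, samples, rng):
    """Average Hamming weight over `samples` independent draws."""
    total = sum(sum(sigma_sparse_vector(m, sigma, rng)) for _ in range(samples))
    return total / samples

rng = random.Random(1)
# sigma = 0 gives uniform bits; sigma close to 1 gives very sparse vectors.
average = empirical_weight(128, 15 / 16, 2000, rng)
```

With $m = 128$ and $\sigma = 15/16$ the expected weight is 4, so each column of $D$ touches only a handful of positions on average.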

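In order to compute the density of the encoding matrix $A$, we first prove the following useful lemmas.

Lemma 1. For a vector $\mathbf{x} \in \mathbb{F}_2^l$ with Hamming weight $k$, we have

$$P[D\mathbf{x} = \mathbf{e}_i \mid \mathrm{wt}(\mathbf{x}) = k] = 2^{-m}\left(1 - \sigma^k\right)\left(1 + \sigma^k\right)^{m-1}. \tag{9}$$

Proof. Since the Hamming weight of $\mathbf{x}$ is equal to $k$, $k$ vectors from the set $\mathbf{d}_1, \mathbf{d}_2, \dots, \mathbf{d}_l$ are added together to create $\mathbf{e}_i$. Denote these vectors by $\mathbf{d}_{e_1}, \mathbf{d}_{e_2}, \dots, \mathbf{d}_{e_k}$, and let $d^{e_j}_i$ denote the $i$th element of $\mathbf{d}_{e_j}$. Since the vectors $\mathbf{d}_{e_1}, \dots, \mathbf{d}_{e_k}$ are independent and their elements are also mutually independent, using binary summation over $\mathbb{F}_2$ we have

$$P[D\mathbf{x} = \mathbf{e}_i \mid \mathrm{wt}(\mathbf{x}) = k] = P\Big[\sum_{j=1}^{k} d^{e_j}_i = 1\Big] \prod_{\substack{l'=1 \\ l' \neq i}}^{m} P\Big[\sum_{j=1}^{k} d^{e_j}_{l'} = 0\Big]. \tag{10}$$

We can easily prove that

$$P\Big[\sum_{j=1}^{k} d^{e_j}_i = 1\Big] = \tfrac{1}{2}\left(1 - \sigma^k\right) \tag{11}$$

by induction on $k$. Equation (7) shows that it is valid for the base case $k = 1$. Assume that it is valid for $k - 1$. We have

$$\begin{aligned} P\Big[\sum_{j=1}^{k} d^{e_j}_i = 1\Big] &= P[d^{e_k}_i = 1]\, P\Big[\sum_{j=1}^{k-1} d^{e_j}_i = 0\Big] + P[d^{e_k}_i = 0]\, P\Big[\sum_{j=1}^{k-1} d^{e_j}_i = 1\Big] \\ &= \tfrac{1}{2}(1 - \sigma)\,\tfrac{1}{2}\left(1 + \sigma^{k-1}\right) + \tfrac{1}{2}(1 + \sigma)\,\tfrac{1}{2}\left(1 - \sigma^{k-1}\right) \\ &= \tfrac{1}{2}\left(1 - \sigma^k\right). \end{aligned} \tag{12}$$

Similarly, using induction on $k$ and equation (7), we can also prove that

$$P\Big[\sum_{j=1}^{k} d^{e_j}_{l'} = 0\Big] = \tfrac{1}{2}\left(1 + \sigma^k\right). \tag{13}$$

Hence, equation (10) can be simplified to

$$P[D\mathbf{x} = \mathbf{e}_i \mid \mathrm{wt}(\mathbf{x}) = k] = 2^{-m}\left(1 - \sigma^k\right)\left(1 + \sigma^k\right)^{m-1}.$$

Definition 2. Let $\mathcal{F}_l \subseteq \mathbb{F}_2^l$ be the subset of all vectors in $\mathbb{F}_2^l$ with a Hamming weight of at least $l/2$.

We will prove that with probability close to one a solution of equation (5) belongs to $\mathcal{F}_l$. To find bounds on the probability that $\mathbf{e}_i$ is spanned by $\mathbf{d}_1, \mathbf{d}_2, \dots, \mathbf{d}_l$, we define a new random variable $Y_i$ and an indicator function $\mathbf{1}_i(\mathbf{x})$ as follows.

Definition 3. Let $Y_i$ denote the number of vectors $\mathbf{x} \in \mathcal{F}_l$ such that $D\mathbf{x} = \mathbf{e}_i$.

Definition 4. Let $\mathbf{1}_i(\mathbf{x})$ be an indicator function which is equal to 1 if $D\mathbf{x} = \mathbf{e}_i$ and equal to 0 otherwise.

Lemma 2. If $D = [\mathbf{d}_1\ \mathbf{d}_2\ \dots\ \mathbf{d}_l]$, then the average number of vectors $\mathbf{x} \in \mathcal{F}_l$ such that $D\mathbf{x} = \mathbf{e}_i$ is equal to

$$E[Y_i] = 2^{-m} \sum_{j=l/2}^{l} \binom{l}{j} \left(1 - \sigma^j\right)\left(1 + \sigma^j\right)^{m-1}. \tag{14}$$

Proof. We have $Y_i = \sum_{\mathbf{x} \in \mathcal{F}_l} \mathbf{1}_i(\mathbf{x})$ and therefore $E[Y_i] = \sum_{\mathbf{x} \in \mathcal{F}_l} P[D\mathbf{x} = \mathbf{e}_i]$. Using the result of Lemma 1 and summing over all Hamming weights from $l/2$ to $l$ proves the lemma.

Lemma 3. We have $E[Y_i^2] \le E[Y_i] + E[Y_i]^2$.

Proof. Since $Y_i = \sum_{\mathbf{x} \in \mathcal{F}_l} \mathbf{1}_i(\mathbf{x})$, we have

$$\begin{aligned} E[Y_i^2] &= E\Big[\sum_{\mathbf{x}_1 \in \mathcal{F}_l} \sum_{\mathbf{x}_2 \in \mathcal{F}_l} \mathbf{1}_i(\mathbf{x}_1)\,\mathbf{1}_i(\mathbf{x}_2)\Big] = \sum_{\mathbf{x}_1 \in \mathcal{F}_l} \sum_{\mathbf{x}_2 \in \mathcal{F}_l} E[\mathbf{1}_i(\mathbf{x}_1)\,\mathbf{1}_i(\mathbf{x}_2)] \\ &= \sum_{\mathbf{x} \in \mathcal{F}_l} P[D\mathbf{x} = \mathbf{e}_i] + \sum_{\mathbf{x}_1 \in \mathcal{F}_l} \sum_{\substack{\mathbf{x}_2 \in \mathcal{F}_l \\ \mathbf{x}_2 \neq \mathbf{x}_1}} P[D\mathbf{x}_1 = \mathbf{e}_i]\, P[D\mathbf{x}_2 = \mathbf{e}_i] \\ &= E[Y_i] + \left(E[Y_i]\right)^2 - \sum_{\mathbf{x} \in \mathcal{F}_l} \left(P[D\mathbf{x} = \mathbf{e}_i]\right)^2 \\ &\le E[Y_i] + E[Y_i]^2. \end{aligned}$$

Lemma 4. The probability that $\mathbf{e}_i$ is the summation of at least $l/2$ vectors in $\mathbf{d}_1, \mathbf{d}_2, \dots, \mathbf{d}_l$ is lower bounded by

$$P[\exists\, \mathbf{x} \in \mathcal{F}_l \text{ s.t. } D\mathbf{x} = \mathbf{e}_i] \ge \frac{1}{1 + \frac{1}{E[Y_i]}}. \tag{15}$$

Proof. Since $Y_i$ is a non-negative integer random variable, from the second moment method in probability theory we have

$$P[Y_i > 0] \ge \frac{E[Y_i]^2}{E[Y_i^2]}. \tag{16}$$

Hence, using Lemma 3 we have

$$P[\exists\, \mathbf{x} \in \mathcal{F}_l \text{ s.t. } D\mathbf{x} = \mathbf{e}_i] = P[Y_i > 0] \ge \frac{E[Y_i]^2}{E[Y_i^2]} \ge \frac{1}{1 + \frac{1}{E[Y_i]}}.$$

Lemma 5. For any $0 < \alpha < 1$, we have

$$\frac{1}{l+1}\, 2^{lH(\alpha)} \le \binom{l}{\alpha l} \le 2^{lH(\alpha)}, \tag{17}$$

where $H(\alpha)$ denotes the entropy, i.e., $H(\alpha) = -\alpha \log_2(\alpha) - (1 - \alpha)\log_2(1 - \alpha)$.

Proof. The proof can be found in the appendix of [36].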
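The closed forms used above can be checked numerically. The sketch below is only a sanity check with illustrative function names: it iterates the recursion from the induction step of Lemma 1 to obtain the probability that the sum of $k$ independent $\sigma$-sparse bits is odd, for comparison with $\tfrac{1}{2}(1-\sigma^k)$, and it evaluates both sides of the binomial bound in Lemma 5.

```python
import math

def parity_odd_probability(sigma, k):
    """P[sum of k independent sigma-sparse bits = 1 over GF(2)],
    obtained by iterating the induction step of Lemma 1."""
    p_one = 0.5 * (1.0 - sigma)
    odd = 0.0  # the empty sum is 0 with certainty
    for _ in range(k):
        odd = p_one * (1.0 - odd) + (1.0 - p_one) * odd
    return odd

def lemma1_probability(m, sigma, k):
    """Closed form 2^-m (1 - sigma^k)(1 + sigma^k)^(m-1) from Lemma 1."""
    return 2.0 ** (-m) * (1.0 - sigma ** k) * (1.0 + sigma ** k) ** (m - 1)

def binary_entropy(alpha):
    """H(alpha) in bits for 0 < alpha < 1."""
    return -alpha * math.log2(alpha) - (1.0 - alpha) * math.log2(1.0 - alpha)

def lemma5_bounds(l, j):
    """Return (lower bound, exact binomial coefficient, upper bound) for C(l, j)."""
    upper = 2.0 ** (l * binary_entropy(j / l))
    return upper / (l + 1), math.comb(l, j), upper

closed_form = 0.5 * (1.0 - 0.8 ** 5)
iterated = parity_odd_probability(0.8, 5)
```

Multiplying the odd-parity probability for row $i$ by the even-parity probability for each of the other $m-1$ rows reproduces `lemma1_probability`, which is the product structure of equation (10).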

Theorem 1. Let the vectors $\mathbf{d}_1, \mathbf{d}_2, \dots, \mathbf{d}_l$ be $\sigma$-sparse random vectors belonging to $\mathbb{F}_2^m$ such that their average Hamming weight is asymptotically non-zero, i.e.,

$$\lim_{m \to \infty} E[\mathrm{wt}(\mathbf{d}_i)] = \tfrac{1}{2} \lim_{m \to \infty} m(1 - \sigma) = c > 0. \tag{18}$$

If $l = m(1 + \epsilon)$, where $\epsilon > 0$ is an arbitrary constant with respect to $m$, then with a probability close to one, at least one solution of equation (5) belongs to $\mathcal{F}_l$ for large $m$.

Proof. Consider a base vector $\mathbf{e}_i$ for $i = 1, 2, \dots, m$. Using Lemmas 2 and 5 we have

$$\begin{aligned} E[Y_i] &\ge 2^{-m} \binom{l}{l/2} \left(1 - \sigma^{l/2}\right)\left(1 + \sigma^{l/2}\right)^{m-1} \\ &\ge \frac{2^{l-m}}{l+1} \left(1 - \sigma^{l/2}\right)\left(1 + \sigma^{l/2}\right)^{m-1} \\ &\ge \frac{2^{l-m}}{l+1} \left(1 - \sigma^{l/2}\right). \end{aligned}$$

Since $\lim_{m \to \infty} E[\mathrm{wt}(\mathbf{d}_i)] = c > 0$, we have

$$\lim_{m \to \infty} \sigma^{l/2} = \lim_{m \to \infty} \left(1 - \frac{2c}{m}\right)^{\frac{m(1+\epsilon)}{2}} = e^{-c(1+\epsilon)}.$$

Hence, for large $m$ we have

$$E[Y_i] \ge \frac{2^{m\epsilon}}{m(1+\epsilon) + 1} \left(1 - e^{-c(1+\epsilon)}\right).$$

Therefore,

$$\lim_{m \to \infty} \frac{1}{E[Y_i]} \le \lim_{m \to \infty} \frac{m(1+\epsilon) + 1}{2^{m\epsilon}\left(1 - e^{-c(1+\epsilon)}\right)} = 0.$$

Using Lemma 4 we have

$$\lim_{m \to \infty} P[\exists\, \mathbf{x} \in \mathcal{F}_l \text{ s.t. } D\mathbf{x} = \mathbf{e}_i] \ge \lim_{m \to \infty} \frac{1}{1 + \frac{1}{E[Y_i]}} = 1.$$

This proves that with a probability approaching one, when $m \to \infty$ and $l = m(1 + \epsilon)$, at least one solution to equation (5) belongs to $\mathcal{F}_l$.

The above argument proves that if $l = m(1 + \epsilon)$ for some $\epsilon > 0$, then at least one of the solutions of equation (5) has a Hamming weight of at least $l/2$. Hence, for any $\sigma$-sparse decoding matrix $D$, at least one encoding matrix $A$ exists such that all of its columns have a Hamming weight of at least $l/2$. Therefore, the average Hamming weight of the rows of $A$ is at least $m/2$, which means that at least one encoding matrix exists that on average has dense rows.

V. SECURITY
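As a numerical aside on Theorem 1 before the security analysis, the finite-$m$ bound from its proof can be evaluated directly. The sketch below plugs the lower bound $E[Y_i] \ge \frac{2^{l-m}}{l+1}(1-\sigma^{l/2})$ into Lemma 4. The function name and the parameter values are illustrative assumptions; $\sigma$ is set to $1 - 2c/m$ so that the expected column weight of $D$ equals $c$, and $l$ is rounded to an integer.

```python
def theorem1_probability_bound(m, eps, c):
    """Lower bound on P[some solution of D a_i = e_i has weight >= l/2].

    Assumes 2c < m so that sigma = 1 - 2c/m lies in (0, 1).
    """
    l = round(m * (1 + eps))
    sigma = 1.0 - 2.0 * c / m
    # Lower bound on E[Y_i] from the proof of Theorem 1.
    expected_lower = 2.0 ** (l - m) / (l + 1) * (1.0 - sigma ** (l / 2))
    # Lemma 4 is increasing in E[Y_i], so a lower bound on E[Y_i] still
    # yields a valid lower bound on the probability.
    return 1.0 / (1.0 + 1.0 / expected_lower)

bounds = {m: theorem1_probability_bound(m, 0.1, 2.0) for m in (50, 200, 400)}
```

For a fixed $\epsilon$ the factor $2^{m\epsilon}$ dominates the polynomial term $m(1+\epsilon)+1$, so the bound moves toward one as $m$ grows, which is the asymptotic statement of the theorem.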

In this section, we use the results of Section IV to achieve perfect secrecy. Let $A$ be an $l \times m$ dense encoding matrix which has a sparse decoding matrix $D$. As proved in Section IV, the rows of this matrix have on average $m/2$ non-zero elements. Therefore, we can prove the following lemma.

Lemma 6. If the number of non-zero elements of an encoding vector increases with the number of files $m$, then the asymptotic distribution of the bits of the encoded files tends to uniform.

Proof. This proof is given as Lemma 4 in [16].

We now use equation (3) to analyze the security of our approach. Let $\mathbf{f}_c = D_c \mathbf{c}$ and $\mathbf{f}_u = D_u \mathbf{u}$ denote the contributions of the cloud and the user to data retrieval. Then, using modulo-two addition, we can rearrange equation (3) as

$$\mathbf{f}_c = \mathbf{f} + \mathbf{f}_u. \tag{19}$$

The cloud sends $\mathbf{f}_c$ to the user, and this information is subject to eavesdropping. Equation (19) is similar to the Shannon cipher problem [1]. In the Shannon cipher, an encoding function $e : \mathcal{M} \times \mathcal{K} \to \mathcal{C}$ maps a message $M \in \mathcal{M}$ and a key $K \in \mathcal{K}$ to a codeword $C \in \mathcal{C}$. In this problem, the message $\mathbf{f}$ is encoded with the key $\mathbf{f}_u$, and the ciphertext $\mathbf{f}_c$ is created and transmitted through the channel. We want to examine the criteria for achieving perfect secrecy using the above cipher. The following theorem provides the necessary and sufficient condition to obtain perfect secrecy in the Shannon cipher system.

Theorem 2. If $|\mathcal{M}| = |\mathcal{K}| = |\mathcal{C}|$, a coding scheme achieves perfect secrecy if and only if
• For each pair $(M, C) \in \mathcal{M} \times \mathcal{C}$, there exists a unique key $K \in \mathcal{K}$ such that $C = e(M, K)$.
• The key $K$ is uniformly distributed in $\mathcal{K}$.

Proof. The proof can be found in Section 3.1 of [37].

We will use Theorem 2 to prove that our approach can achieve asymptotic perfect secrecy.
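The two conditions of Theorem 2 can be illustrated for the XOR cipher $\mathbf{f}_c = \mathbf{f} + \mathbf{f}_u$ by exhaustive enumeration over short bit strings. The sketch below is a toy check under the assumption that the key is exactly uniform over all $Q$-bit strings, which the scheme itself only approaches asymptotically; the function names are illustrative.

```python
from collections import Counter

def ciphertext_counts(message, q_bits):
    """Count ciphertexts f_c = f XOR f_u as the key f_u ranges over all q_bits-bit strings."""
    return Counter(message ^ key for key in range(1 << q_bits))

def unique_key(message, ciphertext):
    """The only key that maps `message` to `ciphertext` under XOR."""
    return message ^ ciphertext

def xor_cipher_is_perfect(q_bits):
    """Check that, for every message, a uniform key yields a uniform ciphertext."""
    size = 1 << q_bits
    for message in range(size):
        counts = ciphertext_counts(message, q_bits)
        if len(counts) != size or set(counts.values()) != {1}:
            return False
    return True

perfect = xor_cipher_is_perfect(4)
```

Every ciphertext appears exactly once for every message, so an eavesdropper who sees only $\mathbf{f}_c$ learns nothing about $\mathbf{f}$ as long as $\mathbf{f}_u$ is uniform and unknown.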

Theorem 3. If $h$ grows with $m$ such that $m < h$, then the proposed encoding scheme provides asymptotic perfect secrecy against any eavesdropper wiretapping the communication between the cloud and the user.

Proof. We formulated this problem as a Shannon cipher system with $M = \mathbf{f}$, $K = \mathbf{f}_u$, and $C = \mathbf{f}_c$. The condition $m < h$ ensures that a unique key exists for each requested message. Therefore, for any pair $(M, C) \in \mathcal{M} \times \mathcal{C}$, a unique key $K \in \mathcal{K}$ exists such that $C = M + K$. Further, we are guaranteed to have $|\mathcal{M}| = |\mathcal{K}| = |\mathcal{C}|$. Notice that the key $K = \mathbf{f}_u$ belongs to the set of all possible bit strings with $Q$ bits. Lemma 6 proves that each encoded file is uniformly distributed among all $Q$-bit strings. Hence each key, which is a unique summation of such encoded files, is uniformly distributed among the set of all $Q$-bit strings. In other words, regardless of the distribution of the bits in the files, $\mathbf{f}_u$ can be any bit string with equal probability for large values of $m$. Therefore, the conditions in Theorem 2 are met and perfect secrecy is achieved.

VI. COMPLEXITY ANALYSIS AND SIMULATIONS

In this section, we compare the complexity of our proposed algorithm with the complexity of the AES algorithm, which is the standard cryptographic algorithm adopted by NIST in the U.S. and is a part of the TLS and HTTPS protocols. AES is a block cipher with a block length of 128 bits. AES encryption consists of 10 rounds of processing for a 128-bit key. The rounds are built from four steps: a byte substitution step (SubBytes), a row-wise permutation step (ShiftRows), a column-wise mixing step (MixColumns), and the addition of the round key (AddRoundKey). The order in which these four steps are executed is different for encryption and decryption. In the AddRoundKey step the input text is XORed with the key, and the same thing happens during decryption. The goal of the ShiftRows and MixColumns steps is to scramble the byte order inside each 128-bit block. All of these steps require many operations, which results in a large number of sequential operations for both encryption and decryption. Some of these operations are summarized in Table I.

Table I compares the number of operations in our proposed algorithm with the number of operations in the AES algorithm to justify the significant improvement of our approach over AES in terms of computational complexity. As can be seen from this table, at least 658 byte XOR operations, i.e., 5264 bit XOR operations, are performed on a 128-bit block. Therefore, AES decryption requires 5264/128 = 41.125 XOR operations per bit. Notice that all of these steps in the AES algorithm are done in sequential order, which induces significant delays, while our technique requires a one-time XOR operation at the data center.

Sparse decoding in our method allows us to perform the decoding with $\tfrac{m}{2}(1 - \sigma)$ XOR operations. Therefore, for $m = 128$, our approach requires $64(1 - \sigma)$ XOR operations for decoding. Table I shows that our proposed algorithm reduces the number of per-bit XOR operations from 41.125 to 4 when $\sigma = 15/16$. It also does not require any other operations, and the proposed decoding operation does not inflict significant delay on the system.

TABLE I
COMPLEXITY COMPARISON

Algorithm                                           No. of operations
--------------------------------------------------  --------------------------
KeyExpansion step in AES algorithm                  40 byte row shifts
                                                    50 byte table look-ups
                                                    50 byte XOR operations
AddRoundKey step in AES algorithm                   16 byte XOR operations
SubBytes step in AES algorithm                      16 byte table look-ups
ShiftRows step in AES algorithm                     12 byte row shifts
MixColumns step in AES algorithm                    48 byte XOR operations
                                                    64 byte table look-ups
Total byte XOR operations in AES decryption         658 byte XOR operations
Total bit XOR operations in AES decryption          5264 bit XOR operations
Per-bit XOR operations in AES                       41.125 XOR operations
Per-bit XOR operations in our sparse algorithm
  with σ = 15/16 = 0.9375                           4 XOR operations
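The per-bit figures quoted in this comparison follow from simple arithmetic, which the sketch below reproduces. The byte-level XOR count of 658 is taken from Table I as given; the constant and function names are illustrative, and the decoding count uses the expression $\tfrac{m}{2}(1-\sigma)$ stated in the text.

```python
AES_BYTE_XORS = 658      # total byte XOR operations in AES decryption (Table I)
AES_BLOCK_BITS = 128     # AES block length in bits

def aes_per_bit_xors():
    """Bit-level XOR operations per bit of an AES block."""
    return AES_BYTE_XORS * 8 / AES_BLOCK_BITS

def sparse_decoding_xors(m, sigma):
    """XOR operations for sparse decoding, (m / 2) * (1 - sigma)."""
    return m * (1.0 - sigma) / 2.0

aes_cost = aes_per_bit_xors()                    # 658 * 8 / 128
sparse_cost = sparse_decoding_xors(128, 15 / 16) # 64 * (1 / 16)
ratio = aes_cost / sparse_cost
```

Under these numbers the sparse scheme needs roughly a tenth of the XOR operations counted for AES, before accounting for the table look-ups and row shifts that AES additionally performs.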

[Figure 1: average density of the pseudo-inverse matrix (vertical axis) versus the number of files m (horizontal axis, 10 to 100), with one curve for each of the legend values 0.5, 0.6, 0.7, 0.8 and 0.9.]

Fig. 1. Average density of the encoding matrix versus the number of files for different sparse decoding matrices.

Figure 1 shows our simulation results. In this figure, we have plotted the average density of the encoding matrix for different numbers of files $m$ and different values of the sparsity index $\sigma$. The case of $\sigma = 0.[\dots]$ is equivalent to random uniform decoding matrices, which are not sparse. As we can see from this plot, when $\sigma = 0.[\dots]$ the average Hamming weight of the encoding matrix is equal to $m/2$ for different values of $m$. This is intuitively expected due to the symmetry and non-sparsity of the problem. When $\sigma$ becomes larger than $[\dots]$, the decoding matrix $D$ becomes sparser, which results in better decoding complexity and a lower number of XOR operations. As can be seen from Figure 1, the average density of the encoding matrix becomes less than $m/2$ in this case. However, the simulation results confirm our theoretical result that regardless of the value of $\sigma$, i.e., regardless of the degree of sparsity of the decoding matrix, when $m$ goes to infinity the average Hamming weight of the encoding matrix approaches the dense value of $m/2$. Figure 1 shows that for $\sigma = 0.[\dots]$ and $m > [\dots]$, or when $\sigma = 0.[\dots]$ and $m > [\dots]$, or when $\sigma = 0.[\dots]$ and $m > [\dots]$, the average density of the encoding matrix is close to $m/2$. Further, as can be seen from this plot, when $\sigma = 0.[\dots]$ the average density of the encoding matrix slowly approaches $m/2$ as $m$ goes to infinity.

VII. CONCLUSIONS

In this paper, we have proposed a radically different approach for providing secure, low-complexity storage solutions for cloud systems. Our method is specifically suited for big data applications in which a large number of archival files needs to be stored on the cloud. We proved that our method is capable of achieving asymptotic perfect secrecy, and through simulations and numerical studies we have shown that it is computationally more efficient than today's cryptographic algorithms, which are not well optimized to handle a very large number of files.

We do not claim that this approach can replace encryption for all applications. For example, the decoding instruction that is exchanged between the user and the cloud can be encrypted before transmission. However, this approach can be a good alternative for archival data stored in cloud storage systems. Future work will focus on studying the security properties of this approach for finite values of $m$. Extension of this work to other types of data is also desirable. This work was mainly focused on the decoding properties of these codes, but it may be useful to investigate secure codes with low encoding complexity.

ACKNOWLEDGEMENT

The authors would like to acknowledge the generous support of Huawei Technologies in funding this research. This research is supported by Huawei Technology North America through award number TETF-9402566.