On the Effectiveness of Polynomial Realization of Reed-Solomon Codes for Storage Systems
Kyumars Sheykh Esmaili∗
Technicolor Research Lab, Paris
[email protected]
Anwitaman Datta
Nanyang Technological University
[email protected]
Abstract
There are different ways to realize Reed-Solomon (RS) codes. While in the storage community using generator matrices to implement RS codes is more popular, in the coding theory community the generator polynomials are typically used to realize RS codes. Prominent exceptions include HDFS-RAID, an extension of the Apache Hadoop file system that uses generator-polynomial-based erasure codes. In this paper we evaluate the performance of an implementation of the polynomial realization of Reed-Solomon codes, along with our optimized version of it, against that of a widely-used library (Jerasure) that implements the main matrix realization alternatives. Our experimental study shows that despite significant performance gains yielded by our optimizations, the polynomial implementations' performance is consistently inferior to that of the matrix realization alternatives in general, and that of Cauchy bit matrices in particular.
1 Introduction

In the past few years, erasure codes, most prominently Reed-Solomon (RS) codes, have been increasingly embraced by distributed storage systems (e.g., Facebook's HDFS-RAID [18], Microsoft Azure [8], and the Google File System (GFS) [5]) as an alternative to replication, since they provide high fault tolerance for low overheads.

RS codes are defined over Galois Fields of size 2^w, represented by GF(2^w). In the RS coding scheme RS(k, m), an object consisting of k elements (a.k.a. symbols) from the GF is encoded into n = m + k blocks (where n ≤ 2^w) in such a way that the original object can be recreated from any subset of size k of the n encoded pieces.

∗The bulk of this work was done while the author was a Research Fellow at Nanyang Technological University, Singapore.
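All of the constructions discussed in this paper rest on Galois Field arithmetic: addition is bitwise XOR, and multiplication is carried out modulo a primitive polynomial. As a concrete illustration (a minimal sketch of our own, not code from any of the systems cited here), multiplication in the small field GF(2^4) with the primitive polynomial x^4 + x + 1 can be written as a shift-and-reduce loop:

```c
#include <stdint.h>

/* Multiply two elements of GF(2^4) using the shift-and-reduce
 * ("Russian peasant") method. The field is defined by the primitive
 * polynomial x^4 + x + 1 (0x13); addition in a field of
 * characteristic 2 is plain XOR. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;       /* add a for this set bit of b */
        b >>= 1;
        a <<= 1;                 /* multiply a by x */
        if (a & 0x10) a ^= 0x13; /* reduce when the degree reaches 4 */
    }
    return r;
}
```

For example, gf_mul(2, 9) evaluates to 1, so 9 is the multiplicative inverse of the primitive root 2 in this field. Production libraries such as Jerasure replace this loop with precomputed lookup tables, as noted in Section 5.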
There are two prominent ways to build systematic¹ Reed-Solomon codes. While in the storage community the generator matrices (e.g., the Cauchy matrix) have been the dominant realization of RS codes, in the coding theory community, on the other hand, the generator polynomials are the common means to realize RS codes [22]. Among the well-known storage systems that use the polynomial realization is HDFS-RAID [7], an erasure-code-supporting extension of the Apache Hadoop distributed file system (HDFS), developed at Facebook. It has subsequently been used in a number of research prototypes [20, 4, 3].

Our goal in this paper is to empirically investigate the effectiveness of the polynomial realization of RS codes and compare its performance against a state-of-the-art implementation of the matrix realization. To this end, we make the following contributions:

• describe the polynomial realization of RS codes and highlight its distinguishing properties,
• build a C mirror of an open-source Java implementation of the polynomial realization of RS codes,
• explore several techniques to optimize upon the existing polynomial realization,
• conduct a thorough experimental study to investigate the effectiveness of the polynomial realization and compare its performance against Jerasure, an open-source and widely-used library for the matrix realization.

All our source code along with a manual can be obtained from an anonymized repository [10]. We plan to release our implementation as an open-source library.

Our experimental study shows that the polynomial implementations' performance is consistently inferior to that of the matrix realization alternatives in general, and that of Cauchy RS codes in particular. This is despite significant performance gains resulting from a range of optimizations that we have devised.

¹A code is systematic if its encoded output contains all the original k data elements.

The rest of this paper is organized as follows.
We first, in Section 2, briefly explain the matrix realization of Reed-Solomon codes and the two well-known matrix construction methods. Next, a more detailed explanation of the polynomial realization of RS codes along with its important properties is given in Section 3. Then, after describing the implementation and optimization details in Section 4, the experimental results are presented in Section 5. Finally, the paper is concluded in Section 6.

2 Matrix Realization

In this section we first give an overview of the matrix realization of RS codes, and then briefly introduce two types of matrices (Vandermonde-based and Cauchy) that are commonly used in storage systems.
This realization uses a generator matrix, $G_{n \times k}$, whose top $k$ rows form an identity matrix $I_{k \times k}$. One essential property of the generator matrix is that every subset of size $k$ of its rows constitutes an invertible matrix.

To encode $k$ elements of data, the data vector is multiplied by $G$, resulting in a codeword composed of the original data vector $d$ and a parity vector $p$ of size $m = n - k$:

$$
\begin{pmatrix}
1 & \cdots & 0 \\
\vdots & \ddots & \vdots \\
0 & \cdots & 1 \\
g_{0,0} & \cdots & g_{0,k-1} \\
\vdots & & \vdots \\
g_{m-1,0} & \cdots & g_{m-1,k-1}
\end{pmatrix}
\times
\begin{pmatrix} d_0 \\ \vdots \\ d_{k-1} \end{pmatrix}
=
\begin{pmatrix} d_0 \\ \vdots \\ d_{k-1} \\ p_0 \\ \vdots \\ p_{m-1} \end{pmatrix}
$$

Decoding (i.e., recreating the erased data elements) is performed in four steps: (i) the rows that correspond to the erased indexes are removed from the generator matrix, (ii) from the surviving rows, $k$ are selected to build a matrix of size $k \times k$, (iii) this matrix is inverted, and (iv) multiplying the inverted matrix by the corresponding vector of data and parity values regenerates the erased elements.

2.1 Vandermonde-based RS Codes

The original RS code [19] is constructed in the following manner. Given a vector of $k$ data elements, the polynomial $P(x)$ is defined as:

$$P(x) = d_0 + d_1 x + \dots + d_{k-1} x^{k-1}$$

The complete code space $C$ is constructed by choosing $x$ over all possible values in GF(2^w), yielding a system of $2^w$ linear equations, each with $k$ variables:

$$
\begin{aligned}
P(0) &= d_0 \\
P(\alpha) &= d_0 + d_1\alpha + d_2\alpha^2 + \dots + d_{k-1}\alpha^{k-1} \\
P(\alpha^2) &= d_0 + d_1\alpha^2 + d_2\alpha^4 + \dots + d_{k-1}\alpha^{2(k-1)} \\
&\;\;\vdots \\
P(\alpha^{2^w-2}) &= d_0 + d_1\alpha^{2^w-2} + d_2\alpha^{2(2^w-2)} + \dots + d_{k-1}\alpha^{(2^w-2)(k-1)}
\end{aligned}
$$

Any $k + m$ of the above expressions can be used to construct an RS(k, m) code that can recover from up to $m$ erasures. This is due to the fact that the determinant of the resulting matrices reduces to that of a Vandermonde matrix, which is always non-singular and invertible.

It must be emphasized that none of the matrices built from the above expressions results in a systematic code, and therefore they are of little use in storage applications. It is, however, possible to transform non-systematic Vandermonde generator matrices into systematic matrices.
Details of the transformation method along with numerical examples can be found in [13, 14].

2.2 Cauchy RS Codes

Cauchy Reed-Solomon (CRS) coding [2] modifies the scheme in the previous section in two ways. First, instead of using a Vandermonde matrix, CRS coding employs an $m \times k$ Cauchy matrix, which is defined as follows. Let $X = \{x_0, \dots, x_{m-1}\}$ and $Y = \{y_0, \dots, y_{k-1}\}$ be such that each $x_i$ and $y_j$ is a distinct element of GF(2^w) and $X \cap Y = \emptyset$. Then the Cauchy matrix defined by $X$ and $Y$ has $1/(x_i + y_j)$ in element $(i, j)$. The generator matrix in this case is composed of the identity matrix in the first $k$ rows and the Cauchy matrix in the remaining $m$ rows. It also has the desired property that all of its $k \times k$ submatrices are invertible.

The second modification of CRS is to use projections that convert the operations over GF(2^w) into XORs. An important property of these projections is that the multiplication operations, an expensive aspect of the computations, can be converted into bitwise ANDs.

It is worth noting that not all Cauchy matrices are equally efficient [17]. There has been some work [17, 11] on generating "good" Cauchy matrices, although for a limited set of parameter values. For numerical examples of CRS encoding and decoding, see [16, 17].
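Constructing the Cauchy parity rows is mechanical. The sketch below (our own illustration over GF(2^4), not Jerasure's code; the brute-force inverse is only sensible for so small a field) builds the $m \times k$ matrix whose $(i, j)$ element is $1/(x_i + y_j)$, where + is XOR:

```c
#include <stdint.h>

/* GF(2^4) multiply, primitive polynomial x^4 + x + 1 (0x13). */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x10) a ^= 0x13;
    }
    return r;
}

/* Brute-force multiplicative inverse; fine for a 16-element field. */
static uint8_t gf_inv(uint8_t a) {
    for (uint8_t c = 1; c < 16; c++)
        if (gf_mul(a, c) == 1) return c;
    return 0; /* 0 has no inverse */
}

/* Fill the m-by-k Cauchy matrix M (row-major): M[i][j] = 1/(x_i + y_j),
 * where + is XOR. X and Y must be disjoint sets of distinct elements. */
static void cauchy_matrix(const uint8_t *X, int m,
                          const uint8_t *Y, int k, uint8_t *M) {
    for (int i = 0; i < m; i++)
        for (int j = 0; j < k; j++)
            M[i * k + j] = gf_inv(X[i] ^ Y[j]);
}
```

Every square submatrix of a Cauchy matrix is itself a Cauchy matrix and hence invertible, which is what gives the resulting generator matrix the all-submatrices-invertible property mentioned above.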
3 Polynomial Realization

Reed-Solomon (RS) codes, in a broader view, are a subset of BCH codes (themselves a class of cyclic error-correcting codes) whose construction methodology uses generator polynomials instead of generator matrices. While in the matrix view a codeword is a sequence of values, in the polynomial view it is a sequence of coefficients. The codeword spaces of the two views are, however, equivalent through Fourier transformations.

In this section, we first explain the concept of the generator polynomial and the way it is built, and then describe how the encoding and decoding operations are performed for BCH codes.
3.1 Generator Polynomial

BCH codes use a generator polynomial, $g(x)$, which consists of $m + 1$ coefficients and is built as a product of $m$ linear factors:

$$g(x) = \sum_{i=0}^{m} g_i x^i = (x + \alpha^0)(x + \alpha^1)\cdots(x + \alpha^{m-1})$$

For example, for RS(k = 4, m = 3) in GF(2^4), where the primitive root is 2, the generator polynomial is:

$$g(x) = (x + 1)(x + 2)(x + 4) = x^3 + 7x^2 + 14x + 8$$

3.2 Encoding

In this realization, the data elements are also represented as the $k$ coefficients of a polynomial $d(x)$ of degree $k - 1$. To encode $d$, it is first multiplied by $x^m$ and then divided by the generator polynomial $g(x)$. The coefficients of the remainder polynomial $p(x)$ are the output parity elements:

$$d(x) \times x^m \equiv p(x) \mod g(x) \quad (1)$$

or

$$\sum_{i=m}^{m+k-1} d_i x^i \equiv \sum_{i=0}^{m-1} p_i x^i \mod \sum_{i=0}^{m} g_i x^i \quad (2)$$

Given the generator polynomial in the example above, a data vector $(d_0, d_1, d_2, d_3)$ is encoded as

$$d_0 x^3 + d_1 x^4 + d_2 x^5 + d_3 x^6 \equiv p_0 + p_1 x + p_2 x^2 \mod x^3 + 7x^2 + 14x + 8$$

and hence the parity elements are $(p_0, p_1, p_2)$. An extensive set of numerical examples can be found in [21].
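The generator-polynomial construction and the division-based encoding can be sketched as follows (our own illustration in GF(2^4) matching the running RS(k=4, m=3) example; the function names are ours, not HDFS-RAID's):

```c
#include <stdint.h>
#include <string.h>

/* GF(2^4) multiply, primitive polynomial x^4 + x + 1 (0x13). */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x10) a ^= 0x13;
    }
    return r;
}

/* Build g(x) = (x + a^0)(x + a^1)...(x + a^(m-1)) for primitive root
 * a = 2; g has m+1 coefficients, stored low-degree first. */
static void build_generator(int m, uint8_t *g) {
    uint8_t root = 1;              /* a^0 */
    g[0] = 1;                      /* start from the constant polynomial 1 */
    for (int i = 1; i <= m; i++) g[i] = 0;
    for (int i = 0; i < m; i++) {
        /* multiply the current g(x) by (x + root), top coefficient down */
        for (int j = i + 1; j > 0; j--)
            g[j] = g[j - 1] ^ gf_mul(g[j], root);
        g[0] = gf_mul(g[0], root);
        root = gf_mul(root, 2);    /* next power of the primitive root */
    }
}

/* Encode: the parities are the coefficients of d(x) * x^m mod g(x).
 * Assumes n = k + m <= 15; data and parity are low-degree first. */
static void poly_encode(const uint8_t *data, int k, int m,
                        const uint8_t *g, uint8_t *parity) {
    uint8_t rem[16] = {0};
    memcpy(rem + m, data, (size_t)k);      /* d(x) * x^m */
    for (int i = k + m - 1; i >= m; i--) { /* long division by g(x) */
        uint8_t coef = rem[i];
        if (!coef) continue;
        for (int j = 0; j <= m; j++)       /* g is monic: rem[i] becomes 0 */
            rem[i - m + j] ^= gf_mul(coef, g[j]);
    }
    memcpy(parity, rem, (size_t)m);
}
```

Because $g(x)$ is monic, each division step cancels the leading coefficient exactly; the resulting codeword $d(x) \cdot x^m + p(x)$ is divisible by $g(x)$ and therefore vanishes at all $m$ roots $\alpha^0, \dots, \alpha^{m-1}$.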
One useful property of the encoding operation in the polynomial realization is that the handling of updates is very straightforward. In fact, as shown in [3], in the case of data updates, encoding the relevant data diff-blocks generates the parity diff-blocks. This is in contrast to the matrix realization, in which the corresponding coefficients must be extracted from the generator matrix [23, 12, 1].
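This update property is a direct consequence of the linearity of the remainder map: encoding the XOR-diff of two data vectors yields exactly the XOR-diff of their parities. A self-contained sketch in GF(2^4) (repeating the hypothetical helpers from the encoding sketch so the fragment stands alone; this is not HDFS-RAID's actual code):

```c
#include <stdint.h>
#include <string.h>

/* GF(2^4) multiply, primitive polynomial x^4 + x + 1 (0x13). */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x10) a ^= 0x13;
    }
    return r;
}

/* g(x) = (x + 1)(x + 2)(x + 4) = x^3 + 7x^2 + 14x + 8, low-degree first. */
static const uint8_t G_POLY[4] = {8, 14, 7, 1};

/* Parities = coefficients of d(x) * x^m mod g(x); assumes k + m <= 15. */
static void poly_encode(const uint8_t *data, int k, int m,
                        const uint8_t *g, uint8_t *parity) {
    uint8_t rem[16] = {0};
    memcpy(rem + m, data, (size_t)k);
    for (int i = k + m - 1; i >= m; i--) {
        uint8_t coef = rem[i];
        if (!coef) continue;
        for (int j = 0; j <= m; j++)
            rem[i - m + j] ^= gf_mul(coef, g[j]);
    }
    memcpy(parity, rem, (size_t)m);
}

/* Update: encode only the data diff and XOR the resulting parity diff
 * into the stored parities, instead of re-encoding the whole object. */
static void poly_update(const uint8_t *old_data, const uint8_t *new_data,
                        int k, int m, const uint8_t *g, uint8_t *parity) {
    uint8_t diff[16], pdiff[16];
    for (int i = 0; i < k; i++) diff[i] = old_data[i] ^ new_data[i];
    poly_encode(diff, k, m, g, pdiff);
    for (int i = 0; i < m; i++) parity[i] ^= pdiff[i];
}
```

Since erased or unchanged positions contribute a zero diff, only the touched blocks need to be read to keep the parities consistent, which is the property [3] exploits.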
3.3 Decoding

In the polynomial realization of RS codes, decoding is carried out in two steps:

• Step 1: the error-evaluator polynomials are computed. The general specification of these polynomials is as follows:

$$D(x) = p(x) + \Big(\sum_{i=0,\ i \notin E}^{k-1} d_i x^i\Big) \times x^m$$

where $E$ denotes the set of erased indexes, so the sum runs over the surviving data elements. Given the equivalency

$$p(x) + d(x) \times x^m \equiv 0 \mod g(x)$$

it can be inferred that

$$D(x) \equiv \sum_{i \in E} d_i x^i \mod g(x)$$

in which the $d_i$, $i \in E$, denote the erased data elements (to be regenerated). To perform the decoding, a system of equations is built from as many as $m$ instances of the above formula, one for each root of $g(x)$. (This step has some commonalities with one of the steps of the parity-check matrix computations explained in [14].)

For example, when the first three elements of the data vector from the previous example are erased, $D(x)$ is computed for the first three powers of the GF's primitive root:

$$
\begin{aligned}
D(\alpha^0) &= d_0 + d_1 + d_2 \\
D(\alpha^1) &= d_0 + d_1\alpha + d_2\alpha^2 \\
D(\alpha^2) &= d_0 + d_1\alpha^2 + d_2\alpha^4
\end{aligned}
$$

where each left-hand side is evaluated from the surviving elements ($p_0$, $p_1$, $p_2$, and $d_3$).

• Step 2: the outcome of Step 1 is a system of equations whose corresponding matrix is of Vandermonde type (note that the size of the matrix is not fixed; it is determined by the number of erasures). To continue with the example from above:

$$
\begin{pmatrix} D(\alpha^0) \\ D(\alpha^1) \\ D(\alpha^2) \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1 \\ 1 & \alpha & \alpha^2 \\ 1 & \alpha^2 & \alpha^4 \end{pmatrix}
\times
\begin{pmatrix} d_0 \\ d_1 \\ d_2 \end{pmatrix}
$$

Solving this system of equations regenerates the erased data blocks $d_0$, $d_1$, and $d_2$.

There are two notable remarks regarding this decoding procedure. First, the procedure is symmetric, meaning that it can be applied to the original data vector to generate the parity elements, which is basically what the encoding functionality does. An immediate implication of this symmetry is that implementing the decoding procedure is sufficient to provide polynomial RS coding.
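Step 2 reduces to inverting a small Vandermonde-type matrix over the GF and applying it; once inverted, every survivors' vector can be decoded with a cheap matrix-vector product, which is also the idea behind the Opt3 optimization described in Section 4. A minimal sketch (our own illustration in GF(2^4); the function names are hypothetical, not HDFS-RAID's or Jerasure's):

```c
#include <stdint.h>

/* GF(2^4) multiply, primitive polynomial x^4 + x + 1 (0x13). */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x10) a ^= 0x13;
    }
    return r;
}

/* Brute-force inverse; acceptable for a 16-element field. */
static uint8_t gf_inv(uint8_t a) {
    for (uint8_t c = 1; c < 16; c++)
        if (gf_mul(a, c) == 1) return c;
    return 0; /* 0 has no inverse */
}

/* Invert an e-by-e matrix over GF(2^4) via Gauss-Jordan elimination.
 * m and inv are row-major; returns 0 on success, -1 if singular. */
static int gf_matrix_invert(const uint8_t *m, uint8_t *inv, int e) {
    uint8_t a[8][16]; /* augmented [M | I]; supports e <= 8 */
    for (int i = 0; i < e; i++)
        for (int j = 0; j < e; j++) {
            a[i][j] = m[i * e + j];
            a[i][e + j] = (uint8_t)(i == j);
        }
    for (int col = 0; col < e; col++) {
        int piv = -1;
        for (int r = col; r < e; r++)
            if (a[r][col]) { piv = r; break; }
        if (piv < 0) return -1;
        for (int j = 0; j < 2 * e; j++) { /* move pivot row into place */
            uint8_t t = a[col][j]; a[col][j] = a[piv][j]; a[piv][j] = t;
        }
        uint8_t s = gf_inv(a[col][col]); /* scale pivot to 1 */
        for (int j = 0; j < 2 * e; j++) a[col][j] = gf_mul(a[col][j], s);
        for (int r = 0; r < e; r++) {    /* cancel the column elsewhere */
            if (r == col || !a[r][col]) continue;
            uint8_t f = a[r][col];
            for (int j = 0; j < 2 * e; j++)
                a[r][j] ^= gf_mul(f, a[col][j]);
        }
    }
    for (int i = 0; i < e; i++)
        for (int j = 0; j < e; j++)
            inv[i * e + j] = a[i][e + j];
    return 0;
}

/* out = M * v over GF(2^4); addition is XOR. */
static void gf_matvec(const uint8_t *m, const uint8_t *v,
                      uint8_t *out, int e) {
    for (int i = 0; i < e; i++) {
        out[i] = 0;
        for (int j = 0; j < e; j++)
            out[i] ^= gf_mul(m[i * e + j], v[j]);
    }
}
```

With the 3-erasure example above, the matrix rows are the powers of $\alpha^0$, $\alpha^1$, $\alpha^2$ (i.e., 1, 2, 4 with $\alpha^4 = 3$ in this field), and multiplying the inverted matrix by the vector of $D(\alpha^j)$ values recovers $d_0$, $d_1$, $d_2$.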
We have, in fact, exploited this property in our implementation, where we solely focus on optimizing the decoding functionality and use the same method for both encoding and decoding.

Second, while in the matrix-based RS codes decoding uses exactly k surviving elements (data and parity), the procedure above can use more than k surviving elements and, by doing so, speed up the decoding process. In network-critical storage systems, however, the cost of fetching extra data blocks may offset the computational gains. This tradeoff has been explored in [4].

4 Implementation and Optimizations

Jerasure [16] is a widely-used, open-source erasure coding library which implements the matrix realization of RS codes (both the Vandermonde-based and Cauchy variants). It is written in C and has been shown to be highly efficient [15].

For the polynomial realization, to the best of our knowledge, there is no open-source implementation in C. There is, however, a Java implementation developed within HDFS-RAID [7], which is built upon GF(2^8). We have ported this implementation to C and improved its performance through a number of optimizations, listed below:

• Opt1: in the first step of the decoding process (Section 3.3), HDFS-RAID computes the values of the error-evaluator polynomials (the D(α^i)'s) in an iterative fashion and independently for each vector of surviving elements. However, since the coefficients of these polynomials (i.e., the different powers of the primitive root) are the same for all survivors' vectors, we factor them out and pre-compute them beforehand. Then, for each vector of survivors, we just multiply the vector by these common factors.

• Opt2: in the same stage (Step 1), while the HDFS-RAID implementation always considers all m + k primitive powers for computing the D(α^i)'s, our implementation excludes the erased indexes and hence avoids doing zero-result multiplications.
• Opt3: in the second step of the decoding process, HDFS-RAID employs the Gaussian elimination method to solve the system of equations, once for each vector of polynomial values (computed in Step 1). But since the same Vandermonde matrix is shared by all vectors, in our implementation we pre-compute and invert this matrix only once. Later, in each iteration, we just multiply the inverted matrix by the vector of polynomial values.

• Opt4: while decoding a large number of survivors' vectors, our implementation uses region-level multiplication and XORing. This optimization is inspired by Jerasure, and our implementation uses a slightly modified version of one of Jerasure's utility methods.

All of these optimizations are aimed at the decoding functionality, since we exploit the symmetry property of the polynomial realization of RS codes and use the same implementation to encode data as well. In terms of effectiveness, based on our development-phase tests, the first and third optimizations have the largest impact. Furthermore, the effectiveness of the first three optimizations grows as the number of erasures increases.

On top of reducing the computational cost, our optimizations also reduce the memory consumption of HDFS-RAID through: (i) the use of arrays of type char instead of int to represent each GF(2^8) element, and (ii) the use of pipelining (e.g., in Opt4), which requires fewer temporary storage allocations. As a result, the overall memory consumption is decreased significantly (by up to 80%), to a level comparable with that of Jerasure.

Lastly, we would like to note that all the source code (including our optimization code, HDFS-RAID's implementation of RS coding in C and Java, the Jerasure library, and the experimental utility code) along with a manual can be obtained from [10].

5 Experimental Study

In this section, we first explain the important details of our experimental setup and then present the results and analysis.
Figure 2: Decoding Time for Different Erasure Sizes in (k=10, m=4) (x-axis: number of erasures; y-axis: time in seconds)
In our evaluation study, we have examined the following five methods:

• OrigCRS: the original Cauchy RS,
• GoodCRS: CRS with "good" Cauchy matrices,
• VanderRS: the Vandermonde-based RS,
• PolyRS: our re-implementation (in C) of HDFS-RAID's implementation of polynomial RS,
• OptPolyRS: the optimized version of the above implementation.

For the first three methods, we use Jerasure's implementations. In all cases, data and parity elements are defined over GF(2^8), and the multiplication and division tables are pre-computed and maintained in memory.

Similar to [15], we focus on the computational cost of the encoding and decoding (i.e., recovering from exactly m erasures) operations by measuring their completion times. Also, in order to minimize the impact of I/O activities, we generate random data and store it in appropriate structures in memory prior to running the encoding and decoding procedures.

The coding scheme used in our experiments is RS(k=10, m=4), a popular scheme used by both Facebook [18] and Windows Azure [8]. For space purposes, here we only report the results of experiments that were run on a 64-bit Debian 7.0 machine.

In our experiments, we varied a number of crucial parameters:

• Block Size. A block is a coarser-grain collection of GF elements (data or parity). We vary the block size parameter from 1MB to 4MB. In our scheme of (10,4), this means that the total size of data and parity changes from 14MB to 56MB. The results of this experiment are depicted in Figure 1. For all the remaining experiments, the block size is set to 4MB.

• Erasure Size. In this experiment we vary the erasure size from its minimum, 1, to its maximum, 4. The results are shown in Figure 2. In all other experiments the erasure size is at its maximum.

• Coding Parameters. We vary both the data size, k, and the parity size, m. Note that changing either of the two parameters changes the storage overhead (i.e., the m/k ratio) of the coding scheme, although in opposing directions.
The results are summarized in Figure 3 and Figure 4, respectively.

Based on the above results, the notable patterns are the following:

• Matrix-based implementations consistently, in all parameter combinations and for both encoding and decoding, outperform the polynomial ones. Furthermore, they generally have lower growth rates (slopes) as well.

• In real-world scenarios, single erasures (per stripe) are by far the most common type of failure in storage systems [18]. As such, based on the results presented in Figure 2, matrix-based realizations have a significant advantage (up to 10 times faster). In network-critical configurations, the differences will be even higher (as explained in Section 4).

• The decoding of the matrix methods is more effective for higher storage overheads.

• The optimization gains in OptPolyRS increase with the number of erasures (in line with our development-phase tests, as mentioned in Section 4).

• Data encoding in PolyRS is considerably slow for low storage overheads; e.g., it requires more than 250% of our optimized version's time in (10,2). The gap, however, narrows as the storage overhead grows (to around 10% in (10,6)).

• GoodCRS is more effective (compared to OrigCRS) in encoding than in decoding.
Figure 1: Impact of Varying the Block Size in (k=10, m=4). Panel (a): encoding time; panel (b): decoding time (for m erasures). X-axis: block size (1MB to 4MB); y-axis: time (seconds); series: GoodCRS, OrigCRS, VanderRS, OptPolyRS, PolyRS.
Figure 3: Impact of Varying the Data Size for m=4. Panel (a): encoding time; panel (b): decoding time. X-axis: data size (k); y-axis: time (seconds); series: GoodCRS, OrigCRS, VanderRS, OptPolyRS, PolyRS.
Figure 4: Impact of Varying the Parity Size for k=10. Panel (c): encoding time; panel (d): decoding time. X-axis: parity size (m); y-axis: time (seconds).
6 Conclusions

We evaluated the performance of an implementation of polynomial-based Reed-Solomon codes against that of a state-of-the-art implementation of the two main matrix-based alternatives. Based on our experimental study, the polynomial implementation's performance is consistently inferior to that of the matrix alternatives in general, and that of Cauchy Reed-Solomon in particular. This is despite significant performance gains resulting from a range of optimizations that we have devised.

One important conclusion to draw from these results is that HDFS-RAID's RS coding performance can be greatly improved, either by adopting some of the optimizations described in this paper, or by using Cauchy-matrix RS codes.

We see three directions in which to extend the work reported here. Firstly, since one of the main factors behind CRS's high efficiency is its use of bit matrices and multiplication-free computations, and given the high number of multiplications in the polynomial realization, it would be interesting to see what impact the adoption of bit matrices would have on it.

Secondly, a recent paper [9] has demonstrated that it is possible to multiply regions of bytes by constants in a Galois Field very fast; it is not quite as fast as XOR, but the speed is limited by how quickly the L3 cache can be populated. Another way to extend our current work is to integrate this new multiplication method and measure its impact.

Lastly, as the methodology and proofs in [6] show, a generator polynomial can be transformed into a generator matrix. Examining the effectiveness of such matrices for RS coding is another avenue for future work.

References
[1] Aguilera, M. K., Janakiraman, R., and Xu, L. Using Erasure Codes Efficiently for Storage in a Distributed System. In Proceedings of the International Conference on Dependable Systems and Networks (DSN) (2005), pp. 336–345.
[2] Bloemer, J., Kalfane, M., Karp, R., Karpinski, M., Luby, M., and Zuckerman, D. An XOR-Based Erasure-Resilient Coding Scheme. Tech. Rep. TR-95-048, International Computer Science Institute, 1995.
[3] Esmaili, K. S., Chiniah, A., and Datta, A. Efficient Updates in Cross-Object Erasure-Coded Storage Systems. In Proceedings of the Workshop on Distributed Storage Systems and Coding for BigData (2013).
[4] Esmaili, K. S., Pamies-Juarez, L., and Datta, A. CORE: Cross-Object Redundancy for Efficient Data Repair in Storage Systems. In Proceedings of the IEEE International Conference on Big Data (2013).
[5] Ford, D., Labelle, F., Popovici, F. I., Stokely, M., Truong, V.-A., Barroso, L., Grimes, C., and Quinlan, S. Availability in Globally Distributed Storage Systems. In OSDI (2010), pp. 61–74.
[6] Hall, J. I. Notes on Coding Theory. Citeseer, 2003.
[7] HDFS-RAID. http://wiki.apache.org/hadoop/HDFS-RAID.
[8] Huang, C., Simitci, H., Xu, Y., Ogus, A., Calder, B., Gopalan, P., Li, J., Yekhanin, S., et al. Erasure Coding in Windows Azure Storage. In Proceedings of USENIX ATC'12 (2012).
[9] Plank, J. S., Greenan, K. M., and Miller, E. L. Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions. In Proceedings of USENIX FAST'13 (2013).
[10] Kyumars Sheykh Esmaili. Source codes and manual. http://sands.sce.ntu.edu.sg/CodingForNetworkedStorage/Reed-Solomon-Comparison.
[11] Li, X., Zheng, Q., Qian, H., Zheng, D., and Li, J. Toward Optimizing Cauchy Matrix for Cauchy Reed-Solomon Code. IEEE Communications Letters 13, 8 (2009), 603–605.
[12] Peter, K., and Reinefeld, A. Consistency and Fault Tolerance for Erasure-Coded Distributed Storage Systems. In Proceedings of the 5th International Workshop on Data-Intensive Distributed Computing (2012), pp. 23–32.
[13] Plank, J. S., et al. A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-Like Systems. Software Practice and Experience 27, 9 (1997), 995–1012.
[14] Plank, J. S., and Huang, C. Tutorial: Erasure Coding for Storage Applications. Slides presented at FAST-2013: 11th USENIX Conference on File and Storage Technologies, 2013.
[15] Plank, J. S., Luo, J., Schuman, C. D., Xu, L., Wilcox-O'Hearn, Z., et al. A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries for Storage. In FAST (2009), vol. 9, pp. 253–265.
[16] Plank, J. S., Simmerman, S., and Schuman, C. D. Jerasure: A Library in C/C++ Facilitating Erasure Coding for Storage Applications, Version 1.2. University of Tennessee, Tech. Rep. CS-08-627 (2008).
[17] Plank, J. S., and Xu, L. Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Network Storage Applications. In Fifth IEEE International Symposium on Network Computing and Applications (NCA 2006) (2006), IEEE, pp. 173–180.
[18] Rashmi, K., Shah, N., Gu, D., Kuang, H., Borthakur, D., and Ramchandran, K. A Solution to the Network Challenges of Data Recovery in Erasure-Coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster. In Proceedings of USENIX HotStorage'13 (2013).
[19] Reed, I. S., and Solomon, G. Polynomial Codes over Certain Finite Fields. Journal of the Society for Industrial and Applied Mathematics 8, 2 (1960), 300–304.
[20] Sathiamoorthy, M., Asteris, M., Papailiopoulos, D. S., Dimakis, A. G., Vadali, R., Chen, S., and Borthakur, D. XORing Elephants: Novel Erasure Codes for Big Data. PVLDB 6, 5 (2013), 325–336.
[21] University of New Brunswick.
[22] Wicker, S. B., and Bhargava, V. K. Reed-Solomon Codes and Their Applications. Wiley-IEEE Press, 1999.
[23] Zhang, F., Huang, J.,