Effective high compression of ECG signals at low level distortion
Laura Rebollo-Neira
Mathematics Department, Aston University, B4 7ET Birmingham, UK
* [email protected]

ABSTRACT
An effective method for compression of ECG signals, which falls within the transform lossy compression category, is proposed. The transformation is realized by a fast wavelet transform. The effectiveness of the approach, in relation to the simplicity and speed of its implementation, is a consequence of the efficient storage of the outputs of the algorithm, which is realized in compressed Hierarchical Data Format. The compression performance is tested on the MIT-BIH Arrhythmia database, producing compression results which largely improve upon recently reported benchmarks on the same database. For a distortion corresponding to a percentage root-mean-square difference (PRD) of 0.53, in mean value, the achieved average compression ratio is 23.17 with a quality score of 43.93. For a mean value of PRD up to 1.71 the compression ratio increases up to 62.5. The compression of a 30 min record is realized in an average time of 0.14 s. The insignificant delay of the compression process, together with the high compression ratio achieved at low level distortion and the negligible time for the signal recovery, uphold the suitability of the technique for supporting distant clinical health care.
Introduction
The electrocardiogram, frequently called ECG, is a routine diagnostic test to assess the electrical and muscular functions of the heart. A trained person looking at an ECG record can, for instance, interpret the rate and rhythm of heartbeats; estimate the size of the heart, the health of its muscles and its electrical systems; check for effects or side effects of medications on the heart; or check heart abnormalities caused by other health conditions. At the present time, ambulatory ECG monitoring serves to detect and characterize abnormal cardiac functions during long hours of ordinary daily activities. Thereby the validated diagnostic role of ECG recording has been extended beyond the bedside.

The broad use of ECG records, in particular as a means of supporting clinical health care from a distance, enhances the significance of dedicated techniques for compressing this type of data. Compression of ECG signals may be realized without any loss in the signal reconstruction, which is referred to as lossless compression, or allowing some distortion which does not change the clinical information of the data. The latter is called lossy compression. This procedure can enclose an ECG signal within a file significantly smaller than that containing the uncompressed record.

The literature concerning both lossless and lossy compression of ECG records is vast. It includes emerging methodologies based on compressed sensing. This work focusses on lossy compression with good performance at low distortion recovery. Even if the approach falls within the standard transform compression category, it achieves stunning results. Fresh benchmarks on the MIT-BIH Arrhythmia database are produced for values of PRD as in recent publications 11, 12, 14, 15.

The transformation step applies a Discrete Wavelet Transform (DWT). It is recommended to use the fast Cohen-Daubechies-Feauveau 9/7 (CDF 9/7) DWT, but other possibilities could also be applied. Techniques for ECG signal compression using a wavelet transform have been reported in numerous publications; for a review paper with extensive references see the prospective review cited below. The main difference introduced by our proposal lies in the compression method, in particular in what we refer to as the Organization and Storage stage. One of the findings of this work is the appreciation that remarkable compression results are achievable even prescinding from the typical entropy coding step for saving the outputs of the algorithm. High compression is attained in a straightforward manner by saving in the Hierarchical Data Format (HDF). More precisely, in the compressed HDF5 version, which is supported by a number of commercial and non-commercial software platforms including MATLAB, Octave, Mathematica, and Python. HDF5 also implements a high-level Application Programming Interface (API) with C, C++, Fortran 90, and Java interfaces. As will be illustrated here, if implemented in software, adding to the algorithm an entropy coding process may improve compression further, but at the expense of processing time. Either way, the compression results for distortion corresponding to mean PRD in the range [0.53, 1.71] are shown to significantly improve recently reported benchmarks 11, 12, 14, 15 on the MIT-BIH Arrhythmia database.

Method
Before describing the method let us introduce the notational convention. R denotes the set of real numbers. Boldface lower case letters are used to represent one-dimensional arrays and standard mathematical fonts to indicate their components, e.g. c ∈ R^N is an array of N real components c(i), i = 1, …, N, or equivalently c = (c(1), …, c(N)). Within the algorithms, operations on components are indicated with a dot, e.g. c.² = (c(1)², …, c(N)²) and |c.| = (|c(1)|, …, |c(N)|). Moreover, t = cumsum(|c.|) is a vector of components t(n) = ∑_{i=1}^{n} |c(i)|, n = 1, …, N.

The proposed compression algorithm consists of three distinctive steps.

1) Approximation Step.
Applies a DWT to the signal, keeping the largest coefficients to produce an approximation of the signal up to the target quality.

2) Quantization Step. Uses a scalar quantizer to convert the wavelet coefficients into multiples of integer numbers.

3) Organization and Storage Step. Organizes the outputs of steps 1) and 2) for economizing storage space.

At the Approximation Step a DWT is applied to convert the signal f ∈ R^N into the vector w ∈ R^N whose components are the wavelet coefficients (w(1), …, w(N)). For deciding on the number of nonzero coefficients to be involved in the approximation we consider two possibilities:

a) The wavelet coefficients (w(1), …, w(N)) are sorted in ascending order of their absolute value, (w(γ_1), …, w(γ_N)), with |w(γ_1)| ≤ ⋯ ≤ |w(γ_N)|. The cumulative sums t(n) = ∑_{i=1}^{n} |w(γ_i)|, n = 1, …, N, are calculated to find all the values n such that t(n) ≥ tol. Denoting by k the number of values n for which t(n) < tol, the indices γ_i, i = k+1, …, N, give the coefficients w(γ_i), i = k+1, …, N, of largest absolute value. Algorithm 1 summarizes the procedure.

b) After the quantization step the nonzero coefficients and their corresponding indices are gathered together.

Algorithm 1
Selection of the largest wavelet coefficients

Procedure [c, ℓ] = SLWC(w, tol)
Input: Array w of wavelet coefficients. Parameter tol for the approximation error.
Output: Indices ℓ_i, i = 1, …, K, of the selected wavelet coefficients. Array c with the selected wavelet coefficients.

{Sort in ascending order the absolute values of the components of w}
[w↑, γ] = sort(|w.|)
t = cumsum(w↑)
Γ = set of the values of n such that t(n) ≥ tol, n = 1, …, N
ℓ = γ(Γ)
c = w(ℓ)

At the Quantization Step the selected wavelet coefficients c = (c(1), …, c(K)), with K = N − k and c(i − k) = w(γ_i), i = k+1, …, N, are transformed into integers by a mid-tread uniform quantizer as follows:

c_Δ(i) = ⌊ c(i)/Δ + 1/2 ⌋, i = 1, …, K,   (1)

where ⌊x⌋ indicates the largest integer smaller than or equal to x and Δ is the quantization parameter. After quantization, the coefficients and indices are further reduced by the elimination of those coefficients which are mapped to zero by the quantizer. The above mentioned option b) follows from this process; it comes into effect by skipping Algorithm 1. The signs of the coefficients are encoded separately using a binary alphabet (1 for + and 0 for −) in an array (s(1), …, s(K)).

Since the indices ℓ_i, i = 1, …, K, are large numbers, in order to store them in an effective manner at the Organization and Storage Step we proceed as follows. The indices are re-ordered in ascending order, ℓ_i → ℓ̃_i, i = 1, …, K, which guarantees that ℓ̃_i < ℓ̃_{i+1}. This induces a re-ordering of the coefficients, c_Δ → c̃_Δ, and of the corresponding signs, s → s̃. The re-ordered indices are stored as smaller positive numbers by taking differences between consecutive values. Defining δ(i) = ℓ̃_i − ℓ̃_{i−1}, i = 2, …, K, the array δ̃ = (ℓ̃_1, δ(2), …, δ(K)) stores the indices ℓ̃_1, …, ℓ̃_K with unique recovery. The size of the signal, N, the quantization parameter Δ, and the arrays c̃_Δ, s̃, and δ̃ are saved in HDF5 format. The HDF5 library operates using a chunked storage mechanism.
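Before turning to the storage format, the selection, quantization, and sign/difference encoding just described can be sketched in Python. This is a minimal illustration with hypothetical coefficient values, not the author's MATLAB implementation; the decoder simply inverts each step so that the quantized approximation is recovered exactly.

```python
import math
from itertools import accumulate

def slwc(w, tol):
    """Algorithm 1 sketch: sort |w| in ascending order and keep the
    positions n whose cumulative sum t(n) reaches tol, i.e. the
    coefficients of largest magnitude."""
    gamma = sorted(range(len(w)), key=lambda i: abs(w[i]))
    t = list(accumulate(abs(w[i]) for i in gamma))
    ell = [gamma[n] for n in range(len(w)) if t[n] >= tol]
    return [w[i] for i in ell], ell

def encode(c, ell, step):
    """Eq. (1) mid-tread quantizer, zero elimination, and the
    sign/index-difference encoding of the Organization and Storage step."""
    c_q = [math.floor(x / step + 0.5) for x in c]           # Eq. (1)
    kept = [(i, q) for i, q in zip(ell, c_q) if q != 0]     # drop zero-mapped
    kept.sort()                                             # ascending indices
    idx = [i for i, _ in kept]
    delta = [idx[0]] + [b - a for a, b in zip(idx, idx[1:])]  # index gaps
    mags = [abs(q) for _, q in kept]                        # unsigned values
    signs = [1 if q > 0 else 0 for _, q in kept]            # 1 for +, 0 for -
    return delta, mags, signs

def decode(delta, mags, signs, step, n):
    """Invert the encoding: rebuild indices by cumulative sums, restore
    signed magnitudes, and fill the full coefficient array of length n."""
    idx = list(accumulate(delta))
    w_r = [0.0] * n
    for i, m, s in zip(idx, mags, signs):
        w_r[i] = (2 * s - 1) * step * m
    return w_r
```

In the full method `w_r` would then be passed to the inverse DWT; here the roundtrip only demonstrates that the index differences and binary signs carry enough information for unique recovery.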
The data array is split into equally sized chunks, each of which is stored separately in the file. Compression is applied to each individual chunk using gzip. The gzip method is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding. Within MATLAB all this is implemented simply by using the function save to store the data. Algorithm 2 outlines a pseudo code of the above described compression procedure.

Algorithm 2
Compression Procedure
Input:
Array f with the N-dimensional signal. Quantization parameter Δ. Variable case (‘a’ for implementation of approach a) and ‘b’ for approach b)). Parameter tol for the approximation error.

Output: Array c̃_Δ with the quantized re-ordered unsigned wavelet coefficients. Array δ̃ with the differences of the re-ordered indices. Array s̃ with the signs of the re-ordered wavelet coefficients. All these arrays saved in HDF5 format.

{Approximation Stage}
{Perform a DWT of f with decomposition level lv}
N = number of components of f
w = DWT(f, lv)
if case = ‘a’ then {Apply Algorithm 1}
  [c, ℓ] = SLWC(w, tol)
end if
if case = ‘b’ then
  c = w
  ℓ = (1, …, N)
end if
{Quantization}
Quantize the coefficients c to obtain c_Δ from (1)
Eliminate the zero coefficients and the corresponding indices
{Organization and Storage}
{Sort the elements of ℓ in ascending order}
[ℓ̃, γ̃] = sort(ℓ)
{Set δ̃(1) = ℓ̃_1 and take the differences of the re-ordered indices}
δ̃(i) = ℓ̃_i − ℓ̃_{i−1}, i = 2, …, K
c̃_Δ = |c_Δ(γ̃).|
s̃ = (sign(c_Δ(γ̃)) + 1)./2
Save N and the arrays c̃_Δ, δ̃ and s̃ in HDF5 format

The fast wavelet transform has computational complexity O(N). Thus, if approach a) is applied, the computational complexity of Algorithm 2 is dominated by the sort operation in Algorithm 1, with average computational complexity O(N log N). Otherwise the complexity is just O(N), because the number K of indices of nonzero coefficients to be sorted is in general much less than N. Nevertheless, as seen in Tables 5-8, in either case the compression of a 30 min record is achieved on a MATLAB platform in an average time less than 0.2 s. While compression performance can be improved further by adding an entropy coding step before saving the arrays, if implemented in software such a step slows the process.

When selecting the number of wavelet coefficients for the approximation by method a), the parameter tol is fixed as follows. Assuming that the target PRD before quantization is PRD₀, we set tol = PRD₀ ‖f‖/100, where PRD₀ is fixed as 70%-80% of the required PRD. The quantization parameter is then tuned to achieve the required PRD.

Signal Recovery
At the Decoding Stage the signal is recovered by the following steps.

• Read the number N, the quantization parameter Δ, and the arrays c̃_Δ, δ̃, and s̃ from the compressed file.
• Recover the magnitude of the coefficients from their quantized version as c̃_r = Δ c̃_Δ.   (2)
• Recover the indices ℓ̃ from the array δ̃ as ℓ̃_1 = δ̃(1) and ℓ̃_i = δ̃(i) + ℓ̃_{i−1}, i = 2, …, K.
• Recover the signs of the wavelet coefficients as s̃_r = 2 s̃ − 1.
• Complete the full array of wavelet coefficients as w_r(i) = 0, i = 1, …, N, and w_r(ℓ̃) = s̃_r . c̃_r.
• Invert the wavelet transform to recover the approximated signal f_r.

As shown in Tables 5-7, the recovery process runs about 3 times faster than the compression procedure, which is already very fast.

Results
We present here four numerical tests with different purposes. Except for the comparison in Test II, all the other tests use the full MIT-BIH Arrhythmia database, which contains 48 ECG records, each consisting of N = 650000 samples. The distortion of the recovered signal is assessed by the PRD,

PRD = ‖f − f_r‖ / ‖f‖ × 100,   (3)

where f is the original signal, f_r is the signal reconstructed from the compressed file and ‖·‖ indicates the 2-norm. Since the PRD strongly depends on the baseline of the signal, the PRDN, as defined below, is also reported,

PRDN = ‖f − f_r‖ / ‖f − f̄‖ × 100,   (4)

where f̄ indicates the mean value of f.

When fixing a value of PRD, the compression performance is assessed by the Compression Ratio (CR), given by

CR = Size of the uncompressed file / Size of the compressed file.   (5)

The quality score (QS), reflecting the tradeoff between compression performance and reconstruction quality, is the ratio

QS = CR / PRD.   (6)

Since the PRD is a global quantity, in order to detect possible local changes in the visual quality of the recovered signal we define the local PRD as follows. Each signal is partitioned into Q segments f_q, q = 1, …, Q, of L samples. The local PRD with respect to every segment in the partition, indicated as prd(q), q = 1, …, Q, is calculated as

prd(q) = ‖f_q − f_r^q‖ / ‖f_q‖ × 100,   (7)

where f_r^q is the recovered portion of the signal corresponding to the segment q. For each record the mean value of prd and the corresponding standard deviation (std) are calculated as

mean prd = (1/Q) ∑_{q=1}^{Q} prd(q)   (8)

and

std = sqrt( (1/(Q−1)) ∑_{q=1}^{Q} (prd(q) − mean prd)² ).   (9)

The mean value of prd with respect to all the records in the database is a double average.

When comparing two approaches on a database we reproduce the same mean value of PRD. The quantification of the relative gain in CR of one particular approach, say approach 1, in relation to another, say approach 2, is given by the quantity Gain = (CR₁ − CR₂)/CR₂ × 100. The gain in QS has the equivalent definition.
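The metrics above can be computed directly from their definitions; the following is a minimal Python sketch (the function names are ours, not part of the author's MATLAB code; f and f_r stand for any original/recovered pair):

```python
import math

def norm(x):
    """2-norm of a sequence."""
    return math.sqrt(sum(v * v for v in x))

def prd(f, f_r):
    """Eq. (3): percentage root-mean-square difference."""
    return 100.0 * norm([a - b for a, b in zip(f, f_r)]) / norm(f)

def prdn(f, f_r):
    """Eq. (4): PRD with the mean of f removed in the denominator."""
    mean = sum(f) / len(f)
    return 100.0 * norm([a - b for a, b in zip(f, f_r)]) / norm([a - mean for a in f])

def quality_score(cr, prd_value):
    """Eq. (6): tradeoff between compression ratio and distortion."""
    return cr / prd_value

def local_prd(f, f_r, L):
    """Eq. (7): PRD computed per segment of L samples."""
    return [prd(f[q:q + L], f_r[q:q + L]) for q in range(0, len(f), L)]
```

For instance, quality_score(23.17, 0.53) gives about 43.7, slightly below the mean QS of 43.93 reported in Table 2, which suggests the tabulated mean averages QS record by record rather than dividing the mean CR by the mean PRD.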
Numerical Test I
We start the tests by implementing the proposed approach using wavelet transforms corresponding to different wavelet families at different levels of decomposition. The comparison between different wavelet transforms is realized using approach b), because within this option each value of PRD is uniquely determined by the quantization parameter Δ. Thus, the difference in CR is only due to the particular wavelet basis and the concomitant decomposition level. Table 1 shows the average CR (indicated as CR_b) and corresponding standard deviation (std) with respect to the whole data set and for three different values of PRD. For each PRD value CR_b is obtained by means of the following wavelet bases: db5 (Daubechies), coif4 (Coiflets), sym4 (Symlets) and cdf97 (Cohen-Daubechies-Feauveau). Each basis is applied at three different decomposition levels (lv).

As observed in Table 1, on the whole the best CR is achieved with the biorthogonal basis cdf97 for lv=4. In what follows all the results are given using this basis at decomposition level lv=4. Next we produce the CR for every record in the database for a mean value PRD of 0.53.

Table 1. Comparison of CRs for three values of PRD when the proposed approach is implemented using different wavelets at decomposition levels 3, 4, and 5.

Family   lv    Δ    PRD    CR     std     Δ    PRD    CR     std     Δ    PRD    CR     std
–         3   47   0.65   24.03   5.44   36   0.53   20.25   4.54   29   0.45   –      4.10
–         5   53   0.65   23.64   5.75   39   0.53   19.42   4.50   30   0.45   15.84  3.98
–         3   47   0.65   24.66   5.39   36   0.53   20.94   4.66   28   0.45   17.80  3.93
cdf97     4   52   0.65   –       –      –    –      –       –      –    –      –      –
Table 2. Compression results with approach a), cdf97 DWT, lv=4, Δ = 35, and mean PRD = 0.53.

Rec    prd   std   PRD   CR_a   QS_a   PRDN
100   0.52  0.02  0.52  28.65  55.01  12.99
101   0.51  0.08  0.52  28.32  54.92   9.56
102   0.52  0.03  0.52  29.15  55.89  13.36
103   0.52  0.04  0.52  26.32  50.86   7.88
104   0.52  0.12  0.53  21.23  40.08  10.37
105   0.52  0.06  0.53  20.07  38.08   6.39
106   0.51  0.07  0.52  20.46  39.49   6.97
107   0.54  0.05  0.54  14.30  26.66   3.10
108   0.52  0.09  0.52  22.52  43.11   8.51
109   0.52  0.05  0.52  23.80  45.42   5.16
111   0.52  0.04  0.52  26.71  51.55  10.01
112   0.54  0.06  0.55  28.11  51.45  10.52
113   0.52  0.02  0.52  22.89  43.93   6.29
114   0.51  0.04  0.52  31.85  61.80  14.94
115   0.53  0.03  0.53  22.02  41.68   6.75
116   0.58  0.04  0.58  12.84  22.05   3.71
117   0.54  0.03  0.54  33.70  62.86   9.59
118   0.61  0.07  0.62  12.11  19.69   6.16
119   0.55  0.02  0.55  18.01  32.67   4.42
121   0.53  0.06  0.53  38.74  73.18   7.59
122   0.55  0.02  0.55  21.36  38.58   6.49
123   0.54  0.03  0.54  28.08  52.05   8.05
124   0.54  0.05  0.54  26.03  48.21   5.07
200   0.52  0.07  0.53  16.51  31.19   7.00
201   0.51  0.05  0.52  37.62  72.79  13.23
202   0.51  0.05  0.51  30.41  59.57   8.51
203   0.54  0.08  0.54  13.64  25.11   5.46
205   0.52  0.03  0.52  30.27  57.77  12.83
207   0.50  0.11  0.52  30.31  58.72   7.23
208   0.52  0.07  0.53  15.98  30.38   5.43
209   0.52  0.07  0.53  16.43  31.08   9.79
210   0.51  0.09  0.51  26.30  51.08   9.80
212   0.54  0.08  0.54  13.28  24.37   8.18
213   0.54  0.03  0.54  13.60  25.09   3.99
214   0.52  0.05  0.52  21.45  41.45   5.48
215   0.54  0.06  0.54  15.10  27.95   9.53
217   0.52  0.03  0.52  18.13  34.83   4.22
219   0.54  0.03  0.54  18.69  34.44   4.50
220   0.54  0.03  0.54  24.21  44.77   7.79
221   0.51  0.04  0.51  24.05  46.93   8.46
222   0.51  0.07  0.52  24.48  47.17  13.88
223   0.54  0.03  0.54  22.24  41.46   6.10
228   0.52  0.08  0.52  19.23  36.91   7.51
230   0.52  0.06  0.52  21.36  41.04   7.28
231   0.52  0.04  0.52  27.10  51.81   9.56
232   0.51  0.07  0.51  34.34  66.73  15.50
233   0.53  0.05  0.53  15.74  29.59   4.89
234   0.52  0.03  0.52  24.47  47.10   7.65
mean  0.53  0.05  0.53  23.17  43.93   8.08
std   0.02  0.02  0.02   6.67  13.23   3.06

Table 2 shows the results obtained by approach a), where the CR and QS produced by this method are indicated as CR_a and QS_a, respectively.
The PRD values for each of the records listed in the first column of Table 2 are given in the fourth column. The second and third columns show the values of prd and the corresponding std for each record. The CR is given in the fifth column and the corresponding QS in the sixth column. The mean value CR obtained by method b) for the same mean value PRD = 0.53 is denoted CR_b. Table 3 compares CR_a for different values of the parameter PRD₀ in method a).

Table 3.
Comparison of the CR achieving PRD = 0.53 with method a) of the proposed approach for different values of the parameter PRD₀.

PRD₀   –    –    –    –    –    –
Δ      39   39   39   37   32   21
Numerical Test II
Here comparisons are carried out with respect to results produced by the set partitioning in hierarchical trees (SPIHT) approach. Thus for this test we use the data set described in that publication. It consists of 10-min long segments from records 100, 101, 102, 103, 107, 108, 109, 111, 115, 117, 118, and 111. As indicated in the footnote on p. 853 of that reference, the given values of PRD correspond to the subtraction of a baseline equal to 1024. This has generated confusion in the literature, as often the values of PRD in Table III of that work are unfairly reproduced for comparison with values of PRD obtained without subtraction of the 1024 baseline. The values of PRD with and without subtraction of that baseline, indicated as PRD_B and PRD respectively, are given in Table 4. As seen in this table, for the same approximation there is an enormous difference between the two metrics. A fair comparison with respect to the results of that work should either involve the figures in the second row of Table 4 or explicitly specify that a 1024 baseline has been subtracted.

Table 4.
Comparison with the results of Table III in the SPIHT publication. The 4th row shows the CRs resulting from method b) of the proposed approach without entropy coding and the 5th row the results of adding a Huffman coding step before saving the compressed data in HDF5 format. The last two rows show the quantization parameters Δ which produce the required values of PRD_B and PRD.

Numerical Test III

This numerical test aims at comparing our results with recently reported benchmarks on the full MIT-BIH Arrhythmia database for mean values of PRD in the range [0.53, 1.71]. To the best of our knowledge the highest CRs reported so far in this range are those reproduced in Tables 5-7. Table 5 compares our results against the results in Table III of the first of these publications, and Table 6 against Table 1 of the second. In both cases we reproduce the identical mean value of PRD; the differences are in the values of CR and QS. All the Gains given in Table 5 are relative to the results of the first publication, while those given in Table 6 and Table 7 are relative to the second and third, respectively.

Table 5.
Comparison between the average performance of the proposed method and a previously reported method, for the same mean value of PRD.

PRD               1.71    1.47    1.18    1.05    0.91    0.80
Gain in CR_a (%)    62      68      67      61      69      84
Gain in CR_b (%)    26      31      33      28      37      41
t_c (s) a)        0.13    0.14    0.14    0.14    0.14    0.14
t_c (s) b)        0.10    0.11    0.11    0.11    0.11    0.11
t_r (s)           0.04    0.05    0.05    0.05    0.05    0.05
PRD₀ a)          1.303   0.966   0.830   0.830   0.635   0.627
Δ a)               119     126      93      67      71      51
Δ b)               177     147     113      98      82      69

As already remarked, when comparing results from different publications care should be taken to make sure that the comparison is actually on the identical database, without any difference in baseline. From the information given in the papers producing the results we are comparing with (the relation between the values of PRD and PRDN) we can be certain that we are working on the same database. The parameters for reproducing the required PRD with methods a) and b) are given in the last 3 rows of Tables 5-7. The previous 3 rows in each table give, in seconds, the average time to compress (t_c) and recover (t_r) a record. As can be observed, the compression times of approaches a) and b) are very similar. The given times were obtained as the average of 10 independent runs. Notice that the CRs in these tables do not include the additional entropy coding step.

Table 6. Comparison between the average performance of the proposed method and a second previously reported method, for the same mean value of PRD.

PRD               1.71    1.47    1.29    1.14    1.03    0.94
Gain in CR_a (%)    48      60      64      76      80      89
Gain in CR_b (%)     7      18      21      33      36      44
t_c (s) a)        0.14    0.14    0.14    0.14    0.14    0.14
t_c (s) b)        0.10    0.11    0.11    0.11    0.11    0.11
t_r (s)           0.05    0.05    0.05    0.05    0.05    0.05
PRD₀ a)          1.303   0.966   0.971   0.750   0.804   0.690
Δ a)               119     126      91      96      68      69
Δ b)               177     147     126     108      96      85

Table 7.
Comparison between the average compression performance of the proposed method and a third previously reported method, for the same mean value of PRD.

PRD               1.31    1.02    0.67    0.48    0.31    0.23
Gain in CR_a (%)   188     176     146     114      70      49
t_c (s) a)        0.14    0.14    0.14    0.14    0.15    0.16
t_c (s) b)        0.11    0.11    0.11    0.11    0.13    0.14
t_r (s)           0.05    0.05    0.05    0.05    0.05    0.06
PRD₀ a)          1.000   0.794   0.562   0.380   0.224   0.193
Δ a)                90      67      36      33      16       9
Δ b)               129      95      54      33      16      10

Fig. 1 gives the plot of CR vs PRD for the approaches being compared in this section.
Figure 1.
CR vs PRD corresponding to the proposed approach, method b) with lv=4 (blue line), and the three approaches being compared (green, yellow, and red lines).

Numerical Test IV
Finally we would like to highlight the following two features of the proposed compression algorithm.

1) One of the distinctive features stems from the significance of saving the outputs of the algorithm directly in compressed HDF5 format. In order to highlight this, we compare the size of the file saved in this way against the size of the file obtained by applying a commonly used entropy coding process, Huffman coding, before saving the data in HDF5 format. The implementation of Huffman coding is realized, as in Table 4, by the off-the-shelf MATLAB function Huff06. In Table 8, CR_a and CR_b indicate, as before, the CR obtained when the outputs of methods a) and b) are directly saved in HDF5 format. CR_Huffa and CR_Huffb indicate the CR when Huffman coding is applied on the outputs of a) and b) before saving the data in HDF5 format. The rows right below the CRs give the corresponding compression times.

2) The other distinctive feature of the method is the significance of the proposed Organization and Storage step. In order to illustrate this, we compare the results obtained by method b) with those obtained using the conventional Run-Length (RL) algorithm instead of storing the indices of nonzero coefficients as proposed in this work. The CR corresponding to RL in HDF5 format is indicated in Table 8 as CR_RL. When Huffman coding is applied on RL before saving the outputs in compressed HDF5 format, the CR is indicated as CR_HuffRL.

Discussion
We notice that, while the results in Table 1 show some differences in CR when different wavelets are used for the DWT, it is clear from the table that the selection of the wavelet family is not the crucial factor for the success of the technique. The same is true for the decomposition level. That said, since the best results correspond to the cdf97 family at decomposition level 4, we have realized the other numerical tests with that wavelet basis.

We chose to produce full results for a mean value PRD of 0.53 (c.f. Table 2) as this value represents a good compromise between compression performance and high visual similarity between the recovered signal and the raw data. Indeed, the quality of recovered signals giving rise to a mean value PRD of 0.53 has been illustrated in the literature in relation to the high performance of automatic QRS complex detection. However, the compression ratio of that method is low; for the same mean value of PRD our CR is 5 times larger: 4.5 vs 23.17 (Table 2).

Table 8. Comparison of different storage methods. CR_a and CR_b are the CRs from approaches a) and b) when the outputs are saved directly in HDF5 format. CR_Huffa and CR_Huffb are the corresponding values when the Huffman coding step is applied before saving the data in HDF5 format. CR_RL gives the CR if the outputs of method b) are stored using the RL algorithm and the arrays are saved in HDF5 format. CR_HuffRL is the corresponding CR if Huffman coding is applied before saving the arrays in HDF5 format.

PRD              1.0    0.9    0.8    0.7    0.6    0.5    0.4    0.3    0.2
t_c (s) a)      0.13   0.13   0.13   0.14   0.14   0.14   0.15   1.15   1.15
t_c (s) Huffa    4.3    4.5    5.0    5.4    5.5    6.2    8.3   10.22   15.4
Δ a)              71     64     50     45     35     30     24     15    8.5
PRD₀ a)        0.750  0.675  0.640  0.550  0.484  0.400  0.300  0.230  0.150
t_c (s) b)      0.11   0.11   0.11   0.11   0.11   0.11   0.12   0.12   0.12
t_c (s) Huffb    4.1    4.0    4.5    5.2    5.4    6.7    8.1   10.7    15.5
Δ b)              92     81     69     58     47     36     25   15.5    8.5
t_c (s) RL      0.12   0.12   0.12   0.13   0.13   0.13   0.13   0.14   0.16
t_c (s) HuffRL   4.5    4.9    6.2    6.4    7.1    7.4    9.0   12.8    19.5

As observed in Table 2, the mean value of the local quantity prd is equivalent to the global value (PRD). Nevertheless the prd may differ for some of the segments in a record. Fig. 2 plots the prd for record 101 partitioned into Q =
325 segments of length L = 2000 samples. We single out the segment q⋆ of maximum distortion with respect to the prd as

q⋆ = arg max_{q=1,…,Q} prd(q).   (10)

The left graphs of Fig. 3 and Fig. 4 correspond to the segments of maximum prd with respect to all the records in the database.

Figure 2. Values of prd for the Q = 325 segments of length L = 2000 in record 101.

Figure 3.
The upper waveforms in both graphs are the raw data. The lower waveforms are the corresponding approximations, which have been shifted down for visual convenience. The bottom lines represent the absolute value of the difference between the raw data and the approximation. The left graph corresponds to segment 25 in record 101 and the right graph to segment 120 in the same record.
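Locating the worst segment displayed in these figures amounts to evaluating Eq. (10) over the local prd values. A minimal Python sketch (the helper names are ours; in the paper the comparison is made against the wavelet-recovered signal):

```python
import math

def prd(f, f_r):
    """Percentage root-mean-square difference of Eq. (3)."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(f, f_r)))
    return 100.0 * num / math.sqrt(sum(a * a for a in f))

def worst_segment(f, f_r, L):
    """Eq. (10): index q* of the segment of length L with the largest
    local prd. Segments are numbered q = 1, ..., Q as in the text."""
    prds = [prd(f[i:i + L], f_r[i:i + L]) for i in range(0, len(f), L)]
    q_star = max(range(len(prds)), key=prds.__getitem__)
    return q_star + 1, prds[q_star]
```

With N = 650000 and L = 2000 this yields the Q = 325 local values plotted in Fig. 2 and the segment highlighted in Figs. 3 and 4.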
Figure 4.
Same description as in Fig. 3 but for record 213, segments 175 (left graph) and 51 (right graph).

It is worth commenting that the difference in the results between approaches a) and b) is a consequence of the fact that the concomitant parameters are set to approximate the whole database at a fixed mean value of PRD. In that sense, approach a) provides some flexibility (there are two parameters to be fixed to match the required PRD), whereas for approach b) the only parameter (Δ) is completely determined by the required PRD. As observed in Table 3, when the parameter PRD₀ is set much smaller than the target PRD the approximation is only influenced by the quantization parameter Δ, and methods a) and b) coincide. Contrarily, when PRD₀ is set too close to the target PRD the quantization parameter needs to be significantly reduced, which affects the compression results. For the target values of PRD considered here, PRD₀ is conveniently fixed as 70%-80% of the required PRD.

The strategy of saving in compressed HDF5 format has also proved competitive for storing sparse representations of melodic music and X-Ray medical images. In this case the strategy is even more efficient, because the approximation is realized using a basis and on the whole signal, which intensifies the efficiency of the storage approach.

Conclusions
An effective and efficient method for compressing ECG signals has been proposed. The proposal was tested on the MIT-BIH Arrhythmia database, giving rise to benchmarks which improve upon recently reported results. The main features of the method are its simplicity and the high compression it achieves at low level distortion.

Note:
The MATLAB codes for implementing the whole approach have been made available on a dedicated website.

Acknowledgments
Thanks are due to K. Skretting for making available the Huff06 MATLAB function, which has been used for entropy coding, and to P. Getreuer for the waveletcdf97 MATLAB function, which has been used for implementation of the CDF 9/7 wavelet transform.

Competing interests
The author declares no competing interests.

Author contribution statement
The author is the only contributor to the paper.
Funding statement:
There is no funding associated with this work.
Ethics statement:
There are no ethics issues related to this work.
Data availability:
The data used in this paper are available on https://physionet.org/physiobank/database/mitdb/
We have also placed the data, together with the software for implementing the proposed approach, on a dedicated website.
References

1. C. M. Gibson et al., "Diagnostic and prognostic value of ambulatory ECG (Holter) monitoring in patients with coronary heart disease: a review", J Thromb Thrombolysis.
2. S. Mittal, C. Movsowitz, and J. S. Steinberg, "Ambulatory external electrocardiographic monitoring: focus on atrial fibrillation", J Am Coll Cardiol, 1741–1749 (2011).
3. J. S. Steinberg et al., "2017 ISHNE-HRS expert consensus statement on ambulatory ECG and external cardiac monitoring/telemetry", Heart Rhythm, e55–e96 (2017).
4. S. M. S. Jalaleddine, C. G. Hutchens, R. D. Strattan, and W. A. Coberly, "ECG data compression techniques – a unified approach", IEEE Transactions on Biomedical Engineering.
5. N. Sriraam and C. Eswaran, "An Adaptive Error Modeling Scheme for the Lossless Compression of EEG Signals", IEEE Transactions on Information Technology in Biomedicine, 587–594 (2008).
6. K. Srinivasan, J. Dauwels, and M. R. Reddy, "A two-dimensional approach for lossless EEG compression", Biomedical Signal Processing and Control, 387–394 (2011).
7. S. K. Mukhopadhyay, S. Mitra, and M. Mitra, "A lossless ECG data compression technique using ASCII character encoding", Computers & Electrical Engineering, 486–497 (2011).
8. B. Hejrati, A. Fathi, and F. Abdali-Mohammadi, "Efficient lossless multi-channel EEG compression based on channel clustering", Biomedical Signal Processing and Control, 295–300 (2017).
9. S.-G. Miaou, H.-L. Yen, and C.-L. Lin, "Wavelet-based ECG compression using dynamic vector quantization with tree codevectors in single codebook", IEEE Transactions on Biomedical Engineering, 671–680 (2002).
10. C. Ku, K. Hung, and T. Wu, "Wavelet-based ECG data compression system with linear quality control scheme", IEEE Transactions on Biomedical Engineering, 1399–1409 (2010).
11. S. J. Lee, J. Kim, and M. Lee, "A Real-Time ECG Data Compression and Transmission Algorithm for an e-Health Device", IEEE Transactions on Biomedical Engineering.
12. J. L. Ma, T. T. Zhang, and M. C. Dong, "A Novel ECG Data Compression Method Using Adaptive Fourier Decomposition With Security Guarantee in e-Health Applications", IEEE Journal of Biomedical and Health Informatics, 986–994 (2015).
13. A. Fathi and F. Faraji-kheirabadi, "ECG compression method based on adaptive quantization of main wavelet packet subbands", Signal, Image and Video Processing, 1433–1440 (2016).
14. C. Tan, L. Zhang, and H. Wu, "A Novel Blaschke Unwinding Adaptive Fourier Decomposition based Signal Compression Algorithm with Application on ECG Signals", IEEE Journal of Biomedical and Health Informatics.
15. M. Elgendi, A. Mohamed, and R. Ward, "Efficient ECG Compression and QRS Detection for E-Health Applications", Scientific Reports, doi:10.1038/s41598-017-00540-x (2017).
16. H. Mamaghanian, N. Khaled, D. Atienza, and P. Vandergheynst, "Compressed Sensing for Real-Time Energy-Efficient ECG Compression on Wireless Body Sensor Nodes", IEEE Transactions on Biomedical Engineering, 2456–2466 (2011).
17. Z. Zhang, T.-P. Jung, S. Makeig, and B. D. Rao, "Compressed Sensing for Energy-Efficient Wireless Telemonitoring of Noninvasive Fetal ECG via Block Sparse Bayesian Learning", IEEE Transactions on Biomedical Engineering, 300–309 (2013).
18. L. F. Polanía, R. E. Carrillo, M. Blanco-Velasco, and K. E. Barner, "Exploiting Prior Knowledge in Compressed Sensing Wireless ECG Systems", IEEE Journal of Biomedical and Health Informatics.
19. L. F. Polanía and R. I. Plaza, "Compressed Sensing ECG using Restricted Boltzmann Machines", Biomedical Signal Processing and Control, 237–245 (2018).
20. A. Cohen, I. Daubechies, and J. C. Feauveau, "Biorthogonal bases of compactly supported wavelets", Communications on Pure and Applied Mathematics.
21. M. S. Manikandan and S. Dandapat, "Wavelet-based electrocardiogram signal compression methods and their performances: A prospective review", Biomedical Signal Processing and Control.
22. J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression", IEEE Transactions on Information Theory, 337–343 (1977).
23. D. Huffman, "A Method for the Construction of Minimum-Redundancy Codes", Proceedings of the IRE, 1098–1101 (1952).
24. https://physionet.org/physiobank/database/mitdb/ (Accessed Jan 2, 2019).
25. Z. Lu, D. Y. Kim, and W. A. Pearlman, "Wavelet compression of ECG signals by the set partitioning in hierarchical trees algorithm", IEEE Transactions on Biomedical Engineering, 849–856 (2000).
26. M. Blanco-Velasco, F. Cruz-Roldán, and J. I. Godino-Llorente, "On the use of PRD and CR parameters for ECG compression", Medical Engineering and Physics, 798–802 (2005).
27. G. B. Moody and R. G. Mark, "The impact of the MIT-BIH Arrhythmia Database", IEEE Eng in Med and Biol.
28. ∼karlsk/proj99 (Accessed Jan 2, 2019).
29. D. Salomon, Data Compression, Springer-Verlag London, 2007.
30. L. Rebollo-Neira and I. Sanches, "Simple scheme for compressing sparse representation of melodic music", Electronics Letters (2017), DOI:10.1049/el.2017.3908.
31. L. Rebollo-Neira, "A competitive scheme for storing sparse representation of X-Ray medical images", PLOS ONE, https://doi.org/10.1371/journal.pone.0201455 (2018).