Time series compression: a survey
Giacomo Chiarot, Claudio Silvestri
Abstract —The presence of smart objects is increasingly widespread and their ecosystem, also known as the Internet of Things, is relevant in many different application scenarios. The huge amount of temporally annotated data produced by these smart devices demands efficient techniques for the transfer and storage of time series data. Compression techniques play an important role toward this goal and, although standard compression methods can be used with some benefit, several techniques specifically address the case of time series, exploiting their peculiarities to achieve more effective compression and more accurate decompression in the case of lossy compression techniques. This paper provides a state-of-the-art survey of the principal time series compression techniques, proposing a taxonomy that classifies them according to their overall approach and their characteristics. Furthermore, we analyze the performance of the selected algorithms by discussing and comparing the experimental results that were provided in the original articles. The goal of this paper is to provide a comprehensive and homogeneous reconstruction of the state of the art, which is currently fragmented across many papers that use different notations and where the proposed methods are not organized according to a classification.
Index Terms —Time series, Data Compaction and Compression, Internet of Things, State-of-the-art Survey, Sensor Data, Approximation, and Data Abstraction.
1 Introduction

Time series are relevant in several different contexts, and the Internet of Things (IoT) ecosystem is among the most pervasive ones. IoT devices can indeed be found in many applications, ranging from health care (smart wearables) to industrial ones (smart grids) [2], producing very large amounts of data. For instance, a single Boeing 787 flight can produce about half a terabyte of data from sensors [26]. In those scenarios, characterized by high data rates and volumes, time series compression techniques are a sensible choice to increase the efficiency of collection, storage, and analysis of sensor data. In particular, the need to include in the analysis information related to both the recent and the past history of the data stream leads to considering data compression as a solution to optimize space without losing the most important information. A direct application of time series compression, for example, can be seen in Time Series Management Systems (or Time Series Databases), in which compression is one of the most significant steps [15].

• Both authors are with the Department of Environmental Sciences, Informatics, and Statistics of Ca' Foscari University of Venice. E-mail: [email protected], [email protected].
• Claudio Silvestri is also with the European Center For Living Technology, hosted by Ca' Foscari University of Venice.

There exists an extensive literature on data compression algorithms, both on general-purpose ones for finite-size data and on domain-specific ones, for example for images and for video and audio data streams. This survey aims at providing an overview of the state of the art in time series compression research, specifically focusing on general-purpose data compression techniques that are either developed for time series or work well with time series.

The algorithms we chose to summarize are able to deal with the continuous growth of time series over time and are suitable for generic domains (as in the different applications in the IoT). Furthermore, these algorithms take advantage of the peculiarities of time series produced by sensors, such as:

• Redundancy: some segments of a time series can frequently appear inside the same or other related time series;
• Approximability: sensors in some cases produce time series that can be approximated by functions;
• Predictability: some time series can be predictable, for example using deep neural network techniques.

Fig. 1: Visual classification of time series compression algorithms

The main contribution of this survey is to present a reasoned summary of the state of the art in time series compression algorithms, which is currently fragmented among several sub-domains ranging from databases to IoT sensor management. Moreover, we propose a taxonomy of time series compression techniques based on their approach (dictionary-based, functional approximation, autoencoders, sequential, others) and their properties (adaptiveness, lossless reconstruction, symmetry, tunability), anticipated in visual form in Figure 1 and discussed in Section 3, that will guide the description of the selected approaches. Finally, we recapitulate the results of the performance measurements reported in the described studies.

The article is organized as follows. Section 2 provides some definitions regarding time series, compression, and quality indices. Section 3 describes compression algorithms and is structured according to the proposed taxonomy.
Section 4 summarizes the experimental results found in the studies that originally presented the approaches we describe. A summary and conclusions are presented in Section 5.

2 Background
This section gives a formal definition of time series, compression, and quality indices.
Time series are defined as a collection of data, sorted in ascending order according to the timestamp t_i associated with each element. They are divided into:

• Univariate Time Series (UTS): elements inside the collection are real values;
• Multivariate Time Series (MTS): elements inside the collection are arrays of real values, in which each position in the array is associated with a time series feature.

For instance, the temporal evolution of the average daily price of a commodity, as the one represented in the plot in Figure 2, can be modeled as a univariate time series, whereas the summaries of daily exchanges for a stock (including opening price, closing price, volume of trades, and other information) can be modeled as a multivariate time series.

Fig. 2: Example of a UTS representing stock fluctuations

Using a formal notation, time series can be written as:
$$ TS = [(t_1, x_1), \ldots, (t_n, x_n)], \quad x_i \in \mathbb{R}^m \qquad (1) $$

where n is the number of elements inside a time series and m is the vector dimension of a multivariate time series. For univariate time series, m = 1. Given k in [1, n], we write TS[k] to indicate the k-th element (t_k, x_k) of the time series TS.

A time series can be divided into segments, defined as portions of the time series without any missing elements and with ordering preserved:
$$ TS_{[i,j]} = [(t_i, x_i), \ldots, (t_j, x_j)] \qquad (2) $$

where for all k in [i, j], TS[k] = TS_{[i,j]}[k - i + 1].

Data compression, also known as source coding, is defined in [27] as "the process of converting an input data stream (the source stream or the original raw data) into another data stream (the output, the bitstream, or the compressed stream) that has a smaller size".
This process can take advantage of the Simplicity Power (SP) theory, formulated in [39], according to which the goal of compression is to remove redundancy while keeping a high descriptive power.

The decompression process, complementary to the compression one and also indicated as source decoding, tries to reconstruct the original data stream from its compressed representation.

Compression algorithms can be described by combining the following classes:

• Non-adaptive - adaptive: a non-adaptive algorithm does not need a training phase to work efficiently with a particular dataset or domain, since its operations and parameters are fixed, while an adaptive one does;
• Lossy - lossless: an algorithm is lossy if the decoder does not return a result that is identical to the original data, and lossless if the decoder result is identical to the original data;
• Symmetric - non-symmetric: a symmetric algorithm uses the same algorithm as encoder and decoder, working in opposite directions, whereas a non-symmetric one uses two different algorithms.

In the particular case of time series compression, a compression algorithm (encoder) takes in input one time series
TS of size s and returns its compressed representation TS' of size s', where s' < s and the size is defined as the number of bits needed to store the time series: E(TS) = TS'. From the compressed representation TS', using a decoder, it is possible to reconstruct the original time series: D(TS') = TS^s. If TS = TS^s then the algorithm is lossless, otherwise it is lossy.

Section 3 presents the most relevant categories of compression techniques and their implementations.

To measure the performance of a compression encoder for time series, three characteristics are considered: compression ratio, speed, and accuracy.
Compression ratio
This metric measures the effectiveness of a compression technique and it is defined as:

$$ \rho = \frac{s'}{s} \qquad (3) $$

where s' is the size of the compressed representation and s the size of the original time series. Its inverse, 1/ρ, is named compression factor. An index used for the same purpose is the compression gain, defined as:

$$ c_g = 100 \log_e \frac{1}{\rho} \qquad (4) $$

Speed

The unit of measure for speed is cycles per byte (CPB), defined as the average number of machine cycles needed to compress one byte.
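To make the indices above concrete, the following short Python sketch (illustrative only, with made-up sizes; it is not code from any of the surveyed works) computes ρ, the compression factor, and the gain of Equations (3)-(4):

import math

def compression_metrics(original_bits: int, compressed_bits: int):
    """Compression ratio, factor, and gain as defined in Eqs. (3)-(4)."""
    rho = compressed_bits / original_bits          # compression ratio (Eq. 3)
    factor = 1.0 / rho                             # compression factor
    gain = 100.0 * math.log(1.0 / rho)             # compression gain (Eq. 4), natural log
    return rho, factor, gain

# Example: a 4000-bit series compressed down to 1000 bits.
rho, factor, gain = compression_metrics(4000, 1000)
print(f"rho={rho:.2f}  factor={factor:.1f}  gain={gain:.1f}")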
Accuracy, also called distortion, measures the fidelity of the reconstructed time series with respect to the original one. It is possible to use different metrics to determine fidelity [28]:

• Mean Squared Error: $MSE = \frac{1}{n}\sum_{i=1}^{n}(x_i - \tilde{x}_i)^2$, where $\tilde{x}_i$ is the reconstructed value;
• Root Mean Squared Error: $RMSE = \sqrt{MSE}$;
• Signal to Noise Ratio: $SNR = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i^2}{MSE}$;
• Peak Signal to Noise Ratio: $PSNR = \frac{x_{peak}^2}{MSE}$, where $x_{peak}$ is the maximum value in the original time series.
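A similarly minimal sketch of these distortion metrics, assuming NumPy arrays for the original and reconstructed series (names and data are illustrative, not taken from the surveyed works):

import numpy as np

def distortion_metrics(original: np.ndarray, reconstructed: np.ndarray):
    """MSE, RMSE, SNR and PSNR between an original and a reconstructed UTS."""
    err = original - reconstructed
    mse = float(np.mean(err ** 2))
    rmse = float(np.sqrt(mse))
    snr = float(np.mean(original ** 2) / mse)     # signal power over noise power
    psnr = float(np.max(original) ** 2 / mse)     # squared peak value over noise power
    return {"MSE": mse, "RMSE": rmse, "SNR": snr, "PSNR": psnr}

# Example with a slightly noisy reconstruction of a sine wave.
t = np.linspace(0, 1, 100)
x = np.sin(2 * np.pi * t)
x_rec = x + np.random.normal(scale=0.01, size=x.shape)
print(distortion_metrics(x, x_rec))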
3 Compression Algorithms

In this section we present the most relevant time series compression algorithms by describing, in short summaries, their principles and peculiarities. We also provide pseudo-code for each approach, focusing more on style homogeneity among the pseudo-codes of the different methods than on a faithful reproduction of the original pseudo-code proposed by the authors. For a more detailed description we refer the reader to the complete details available in the original articles. Below is the full list of algorithms described in this section, divided by approach:

1) Dictionary-Based (DB):
   1.1. TRISTAN;
   1.2. CONRAD;
   1.3. A-LZSS;
   1.4. D-LZW.
2) Functional Approximation (FA):
   2.1. Piecewise Polynomial Approximation (PPA);
   2.2. Chebyshev Polynomial Transform (CPT);
   2.3. Discrete Wavelet Transform (DWT).
3) Autoencoders:
   3.1. Recurrent Neural Network Autoencoder (RNNA).
4) Sequential Algorithms (SA):
   4.1. Delta encoding, Run-length and Huffman (DRH);
   4.2. Sprintz;
   4.3. Run-Length Binary Encoding (RLBE);
   4.4. RAKE.
5) Others:
   5.1. Major Extrema Extractor (MEE);
   5.2. Segment Merging (SM);
   5.3. Continuous Hidden Markov Chain (CHMC).
This approach is based on the principle that time series share some common segments. These segments can be extracted into atoms, such that a time series segment can be represented as a sequence of these atoms. Atoms are then collected into a dictionary that associates each atom with a univocal key, used both in the representation of time series and to search their content efficiently. The choice of the atom length should guarantee a low decompression error and, at the same time, maximize the compression factor. Listing 1 shows, at a high level, how the training phase works: the createDictionary function computes a dictionary of segments given a dataset composed of time series and a threshold value th.

Listing 1: Training phase

createDictionary(Stream S, Threshold th, int segmentLength) {
    Dictionary d;
    Segment s;
    while (S.isNotEmpty()) {
        s.append(S.read());
        if (s.length == segmentLength) {
            if (d.find(s, th)) {
                d.merge(s);
            } else {
                d.add(s);
            }
            s.empty();
        }
    }
    return d;
}

The find function searches the dictionary for a segment that is similar to the segment s, with a distance lower than the threshold th; a possible distance index is the MSE. If a match is found, the algorithm merges the segment s with the matched one in order to achieve generalization.

After the segment dictionary is created, the compression phase takes in input one time series, and each segment is replaced with its key in the dictionary. If some segments are not present in the dictionary, they are either left uncompressed or added to the dictionary as new entries. Listing 2 shows how the compression phase works: the compress function takes in input a time series to compress, the dictionary created during the training phase, and a threshold value, and returns the compressed time series as a list of indices and segments.

Listing 2: Compression phase

compress(Stream S, Dictionary d, Threshold th, int segmentLength) {
    Segment s;
    while (S.isNotEmpty()) {
        s.append(S.read());
        if (s.length == segmentLength) {
            if (d.find(s, th)) {
                send(d.getIndex(s));
            } else {
                send(s);
            }
            s.empty();
        }
    }
}

The compression achieved by this technique can be either lossy or lossless, depending on the implementation. The main challenges for this architecture are to:

• maximize the speed of searching time series segments in the dictionary;
• make the time series segments stored in the dictionary as general as possible, in order to minimize the distance in the compression phase.

One implementation of the dictionary-based architecture is TRISTAN [22], an algorithm divided into two phases: the learning phase and the compression phase.
Learning phase
The dictionary used in this implementation can be created by domain experts that add typical patterns, or learned from a training set. To learn a dictionary from a training set T = [t_1, ..., t_n] of n segments, the following minimization problem has to be solved:

$$ D = \arg\min_{D} \sum_{i=1}^{n} \| w_i D - t_i \| $$

under the constraint:

$$ \| w_i \| \leq sp \qquad (5) $$

where D is the obtained dictionary, sp is a fixed parameter representing sparsity, w_i is the compressed representation of segment i, and w_i D is the reconstructed segment. The meaning of this formulation is that the solution to be found is a dictionary that minimizes the distance between original and reconstructed segments.

The problem shown in Equation 5 is NP-hard [22], thus an approximate result is computed. A technique for approximating this result is shown in [20].

Compression phase
Once the dictionary is built, the compression phase consists in finding w such that:

$$ s = w \cdot D \qquad (6) $$

where D is the dictionary, s is a segment, w is a vector in {0, 1}^k, and k is the length of the compressed representation. The element of w in position i is 1 if the atom a_i of D is used to reconstruct the original segment, 0 otherwise.

Finding a solution for Equation 6 is an NP-hard problem and, for this reason, the matching pursuit method [21] is used to approximate the original problem:

$$ w = \arg\min_{w} \| w D - s \| $$

under the constraint:

$$ \| w \| \leq sp \qquad (7) $$

where sp is a fixed parameter representing sparsity.
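To give an intuition of how such a sparse code can be computed, the following NumPy sketch performs a greedy, matching-pursuit-style selection of binary dictionary atoms. It is a simplified illustration under our own assumptions (toy dictionary, whole-atom subtraction instead of projection), not the TRISTAN implementation:

import numpy as np

def greedy_sparse_code(segment, dictionary, sparsity):
    """Greedily pick at most `sparsity` dictionary atoms (rows of `dictionary`)
    whose combination approximates `segment`, in the spirit of matching pursuit."""
    residual = segment.astype(float).copy()
    w = np.zeros(dictionary.shape[0], dtype=np.uint8)   # binary code, one flag per atom
    for _ in range(sparsity):
        scores = dictionary @ residual                   # correlation of each atom with residual
        best = int(np.argmax(np.abs(scores)))
        if w[best]:                                      # atom already used: stop early
            break
        w[best] = 1
        residual -= dictionary[best]
    return w

def reconstruct(w, dictionary):
    """Reconstruction as in Equation (8): s = w D."""
    return w @ dictionary

# Toy example: 4 atoms of length 6, a segment close to atom 0 plus atom 2.
D = np.array([[1, 1, 0, 0, 0, 0],
              [0, 0, 1, 1, 0, 0],
              [0, 0, 0, 0, 1, 1],
              [0, 1, 0, 1, 0, 1]], dtype=float)
s = np.array([1.1, 0.9, 0.1, 0.0, 1.0, 1.05])
w = greedy_sparse_code(s, D, sparsity=2)
print(w, reconstruct(w, D))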
Reconstruction phase

Having a dictionary D and a compressed representation w of a segment s, it is possible to compute the reconstructed segment as:

$$ s = w D \qquad (8) $$

The CONRAD implementation extends the idea presented in TRISTAN [16]. The main difference is that it adds autocorrelation information to obtain better performance in terms of compression ratio and accuracy.

Correlation between two time series
TS_A and TS_B is measured with the Pearson correlation coefficient:

$$ r = \frac{\sum_{i}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i}^{n} (y_i - \bar{y})^2}} \qquad (9) $$

where x_i is an element of TS_A, y_i is an element of TS_B, and x̄, ȳ are the mean values of the corresponding time series. This coefficient can also be applied to segments that have different ranges of values, and r lies in [-1, 1], where 1 expresses the maximum linear correlation, -1 the maximum linear negative correlation, and 0 no linear correlation.

Time series are divided into segments and time windows are set. For each window, the correlation is computed between each pair of segments belonging to it, and the results are stored in a correlation matrix M in R^{n x n}, where n is the number of segments.
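A correlation matrix of this kind can be computed directly with NumPy; the sketch below is a generic illustration (segments and window are made up), not CONRAD's code:

import numpy as np

def correlation_matrix(segments):
    """Pairwise Pearson correlation (Eq. 9) between equally long segments."""
    X = np.asarray(segments, dtype=float)    # shape: (n_segments, segment_length)
    return np.corrcoef(X)                    # n x n matrix, entries in [-1, 1]

# Example: three segments inside one time window.
segments = [
    [1.0, 2.0, 3.0, 4.0],      # increasing
    [2.1, 4.0, 6.2, 7.9],      # strongly correlated with the first
    [5.0, 3.0, 4.0, 1.0],      # mostly anti-correlated
]
M = correlation_matrix(segments)
print(np.round(M, 2))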
Compression phase

During this phase, segments are sorted from the least correlated to the most correlated, using the correlation matrix M. The metric used to measure how much one segment is correlated with all the others is the absolute sum of its correlations, computed as the sum of the corresponding row of M.

Knowing the correlation information, the dictionary is used only to represent segments that are not correlated with the others, as in the TRISTAN implementation, while the other segments are represented solely using correlation information.

Reconstruction phase
The reconstruction phase starts with the segments represented with dictionary atoms, while the others are reconstructed by looking at the segments to which they are correlated. This process is very similar to the one proposed in TRISTAN, with the sole difference that segments represented using the dictionary are managed differently from those that are not.
Accelerometer LZSS (A-LZSS) is an algorithm built on top of the LZSS algorithm [35] for searching matches [24]. A-LZSS uses Huffman codes, generated offline using frequency distributions. In particular, this technique considers blocks of size s = 1 to compute frequencies and build the code: an element of the time series will be replaced by a variable number of bits. Larger blocks can also be considered and, in general, having larger blocks gives better compression performance at the cost of larger Huffman code tables. The implementation of this technique is shown below.

A-LZSS algorithm

compress(Stream S, int minM, Dictionary d, int Ln, int Dn) {
    foreach (s in S) {
        I, L = d.longestMatch(s, Ln, Dn);
        if (L < minM) {
            send(getHuffCode(s));
        } else {
            send((I, L));
            s.skip(L);
        }
    }
}

The algorithm takes in input a stream, a Huffman code dictionary, and three integer values:

• minM: the minimum match length, below which a match is not worth encoding;
• Ln: determines the lookahead distance;
• Dn: determines the dictionary atom length.

The longestMatch function returns the index I of the found match and the length L of the match. If the length of the match is too small, then the Huffman code representation of s is sent, otherwise the index and the length of the match are sent and the next L elements are skipped. This implementation uses a brute-force approach, with complexity O(2^{Dn} · Ln), but it is possible to improve over it by using hashing techniques.

The core of the D-LZW technique is the creation of a very large dictionary that grows over time: once the dictionary is created, if a buffer block is found inside the dictionary it is replaced by the corresponding index, otherwise the new block is inserted in the dictionary as a new entry [18]. Adding new blocks guarantees lossless compression, but has the drawback of producing dictionaries that can grow too large. This makes the technique suitable only for particular scenarios (i.e., input streams composed of words/characters, or streams whose values are quantized).

Another drawback of this technique is the way in which the dictionary is constructed: elements are simply appended to the dictionary in order to preserve the indexing of previous blocks. For a simple implementation of the dictionary, the complexity of each search is O(n), where n is the size of the dictionary. This complexity can be improved by using more efficient data structures.

The main idea behind functional approximation is that a time series can be represented as a function of time. Since finding a single function that approximates the whole time series is infeasible, due to the presence of new values that cannot be handled, the time series is divided into segments and an approximating function is found for each of them. Exploring all the possible functions f : T → X is not feasible, thus implementations consider only one family of functions and try to find the parameters that best approximate each segment. This makes the compression lossy. A point of strength is that this approach does not depend on the data domain, so no training phase is required, since the regression algorithm considers only single segments in isolation.
The Piecewise Polynomial Approximation (PPA) technique divides a time series into several segments of fixed or variable length and tries to find the best polynomials that approximate the segments. Although the compression is lossy, a maximum deviation from the original data can be fixed a priori to enforce a given reconstruction accuracy.

The implementation of this algorithm is described in [6], where the authors apply a greedy approach and three different online regression algorithms for approximating constant functions, straight lines, and polynomials. These online algorithms are:

• the PMR-Midrange algorithm, which approximates using constant functions [17];
• the optimal approximation algorithm, described in [5], which uses linear regression;
• the randomized algorithm presented in [29], which approximates using polynomials.

The algorithm used in [6] for approximating a time series segment is shown below. It takes in input a time series S, a maximum polynomial degree ρ, and an error threshold ε.

PPA algorithm

compress(int ρ, Stream S, float ε) {
    Segment s;
    while (S.isNotEmpty()) {
        Polynomial p;
        int l = 0;
        foreach (k in [0 : ρ]) {
            float currErr;
            Polynomial currP;
            bool continueSearch = true;
            int i = 0;
            int currL;
            // try to cover the longest possible prefix of the current segment
            while (continueSearch && i < len(s)) {
                currErr, currP = approx(k, s.getPrefix(i));
                if (currErr < ε) {
                    i += 1;
                    currL = i;
                } else {
                    continueSearch = false;
                }
            }
            // then keep reading new elements while the error stays below ε
            while (continueSearch && S.isNotEmpty()) {
                s.append(S.read());
                currErr, currP = approx(k, s);
                if (currErr < ε) {
                    currL = len(s);
                } else {
                    currL = len(s) - 1;
                    continueSearch = false;
                }
            }
            if (currErr < ε && currL > l) {
                p = currP;
                l = currL;
            }
        }
        if (len(s) > 0) {
            send(p, l);
            s.removePrefix(l);
        } else {
            throw "Error: exceeded error threshold value";
        }
    }
}

This algorithm repeatedly finds the polynomial of degree between 0 and a fixed maximum that can approximate the longest segment within the threshold error, yielding the maximum local compression factor. After a prefix of the stream has been selected and compressed into a polynomial, the algorithm analyzes the following stream segment.
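The same greedy idea can be sketched compactly in Python with NumPy's polynomial fitting. This is an illustrative re-implementation under simplifying assumptions (a single maximum degree is tried and the maximum absolute deviation is used as the error), not the code of [6]:

import numpy as np

def fit_ok(values, degree, eps):
    """Fit a polynomial of the given degree and check the max deviation."""
    x = np.arange(len(values))
    coeffs = np.polyfit(x, values, deg=min(degree, len(values) - 1))
    max_dev = np.max(np.abs(np.polyval(coeffs, x) - values))
    return (max_dev <= eps), coeffs

def ppa_like_compress(series, max_degree=3, eps=0.1):
    """Greedily cover the series with (coefficients, length) pairs."""
    out, start = [], 0
    while start < len(series):
        best = (None, 1)                       # fall back to a single point
        length = 2
        while start + length <= len(series):
            ok, coeffs = fit_ok(series[start:start + length], max_degree, eps)
            if not ok:
                break
            best = (coeffs, length)
            length += 1
        if best[0] is None:                    # degenerate one-point "segment"
            best = (np.array([series[start]]), 1)
        out.append(best)
        start += best[1]
    return out

# Example: a noisy parabola compressed into a few polynomial pieces.
t = np.linspace(-1, 1, 60)
y = t ** 2 + np.random.normal(scale=0.01, size=t.size)
pieces = ppa_like_compress(y, max_degree=2, eps=0.05)
print(len(pieces), "segments")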
Another implementation of polynomial compression can be found in [11]. In this article the authors show how a time series can be compressed into a sequence of finite Chebyshev polynomials. The principle is very similar to the one shown in Subsection 3.2.1, but it is based on a different type of polynomial. Chebyshev polynomials are of two types, T_n(x) and U_n(x), defined as [19]:

$$ T_n(x) = \frac{n}{2} \sum_{k=0}^{\lfloor n/2 \rfloor} \frac{(-1)^k (n-k-1)!}{k!\,(n-2k)!} (2x)^{n-2k}, \quad |x| < 1 \qquad (10) $$

$$ U_n(x) = \sum_{k=0}^{\lfloor n/2 \rfloor} \frac{(-1)^k (n-k)!}{k!\,(n-2k)!} (2x)^{n-2k}, \quad |x| < 1 \qquad (11) $$

where n ≥ 0 is the polynomial degree.

The discrete wavelet transform uses wavelet functions to transform time series. Wavelets are functions that, similarly to a wave, start from zero and end with zero after some oscillations. An application of this technique can be found in [1]. The transformation can be written as:

$$ \Psi_{m,n}(t) = \frac{1}{\sqrt{a^m}} \, \Psi\!\left(\frac{t - n b\, a^m}{a^m}\right) \qquad (12) $$

where a > 1, b > 0, and m, n are integers. To recover the transformed signal, the following formula can be applied:

$$ x(t) = \sum_{m \in \mathbb{Z}} \sum_{n \in \mathbb{Z}} \langle x, \Psi_{m,n} \rangle \cdot \Psi_{m,n}(t) \qquad (13) $$

An autoencoder is a particular neural network that is trained to give as output the same values passed as input. Its architecture is composed of two symmetric parts: encoder and decoder. Given an input of dimension n, the encoder gives as output a vector with dimensionality m < n, called the code, while the decoder reconstructs the original input from the code, as shown in Figure 3 [9].

Fig. 3: Simple autoencoder architecture

RNNA compression algorithms exploit recurrent neural networks [30] to achieve a compressed representation of a time series. Figure 4 shows the general unrolled structure of a recurrent neural network encoder and decoder. The encoder takes as input the time series elements, which are combined with hidden states. Each hidden state is computed starting from the new input and the previous state. The last hidden state of the encoder is passed as the first hidden state of the decoder, which applies the same mechanism, with the only difference that each hidden state also provides an output. The output provided by each state is the reconstruction of the corresponding time series element and is passed to the next state.

Fig. 4: General structure of a RNN encoder and decoder

The new hidden state h_t is obtained by applying:

$$ h_t = \phi(W x_t + U h_{t-1}) \qquad (14) $$

where φ is a logistic sigmoid function or the hyperbolic tangent.

One application of this technique is shown in [13], in which also Long Short-Term Memory [12] is considered. This implementation compresses time series segments of different lengths using autoencoders. The compression achieved is lossy, and a maximal loss threshold ε can be enforced. The training set is preprocessed considering the temporal variation of the data, applying:

$$ \Delta(L) = \sum_{t \in L} |x_t - x_{t-1}| \qquad (15) $$

where L is the local time window. The value obtained from Equation 15 is then used to partition the time series, such that each segment has a total variation close to a predetermined value τ.

Listing 3 shows an implementation of the RNN autoencoder approach, with error threshold ε.

Listing 3: RNN compression algorithm

compress(Stream S, float ε, RAE a) {
    Segment s;
    while (S.isNotEmpty()) {
        Segment aux = s;
        Element e = S.read();
        aux.append(e);
        if (getError(aux, a.decode(a.encode(aux))) < ε) {
            s = aux;
        } else {
            send(a.encode(s));
            s.empty();
            s.append(e);
        }
    }
}
where:

• RAE a is the recurrent autoencoder trained on a training set, composed of an encoder and a decoder;
• getError computes the reconstruction error between the original and the reconstructed segment;
• ε is the error threshold value.

The sequential algorithms architecture is characterized by combining sequentially several simple compression techniques. Some of the most used techniques are:

• Huffman coding;
• Delta encoding;
• Run-length encoding;
• Fibonacci binary encoding.

These techniques, summarized below, are the building blocks of the methods presented in the following subsections.
Huffman coding
Huffman coding is the basis of many compression techniques, since it is one of the necessary steps, as for the algorithm shown in Subsection 3.4.4. The encoder creates a dictionary that associates each symbol with a binary representation and replaces each symbol of the original data with the corresponding representation. The compression algorithm is shown in Listing 4 [28], where the createPriorityList function creates a list of elements ordered from the least frequent to the most frequent, the addTree function adds to the tree a father node with its children, and createDictionary assigns 0 and 1 respectively to left and right arcs and builds the dictionary by assigning to each symbol the sequence of 0s and 1s found on the path from the root to its leaf.
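A runnable counterpart of the dictionary-construction step (compare with Listing 4 below) can be written in a few lines of Python with heapq; this is a standard textbook construction, not taken from [28]:

import heapq
from collections import Counter

def huffman_dictionary(symbols):
    """Build a symbol -> bit-string dictionary from symbol frequencies."""
    freq = Counter(symbols)
    # Each heap item: (subtree frequency, tie breaker, {symbol: code so far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                        # degenerate single-symbol stream
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # least frequent subtree
        f2, _, right = heapq.heappop(heap)    # second least frequent subtree
        merged = {s: "0" + c for s, c in left.items()}      # left arc -> 0
        merged.update({s: "1" + c for s, c in right.items()})  # right arc -> 1
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def huffman_compress(symbols, codes):
    return "".join(codes[s] for s in symbols)

data = [3, 3, 3, 7, 7, 1, 3]
codes = huffman_dictionary(data)
print(codes, huffman_compress(data, codes))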
Delta encoding

This technique encodes a target file with respect to one or more reference files [36]. In the particular case of time series, each element at time t is encoded as Δ(x_t, x_{t-1}).

Listing 4: Huffman code compression

createDictionary(StreamPrefix S) {
    Tree T = new Tree();
    PriorityList L = createPriorityList(S);
    while (L.size() > 1) {
        Node n = new Node();
        Element el1 = L.extractMin();
        Element el2 = L.extractMin();
        n.frequency = el1.frequency + el2.frequency;
        T.addTree(n, (el1, el2));
        L.add(n);
    }
    return L.toDictionary();
}

compress(Stream S, int prefixLen) {
    StreamPrefix s = S.prefix(prefixLen);
    Dictionary D = createDictionary(s);
    while (S.isNotEmpty()) {
        Element e = S.read();
        send(D[e]);
    }
}

Run-length
In this technique, each run (a sequence in which the same value is repeated consecutively) is substituted with the pair (v_t, o), where v_t is the value at time t and o is the number of consecutive occurrences [10].
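Combining the last two building blocks, the following Python sketch delta-encodes a series and then run-length encodes the resulting deltas (an illustrative pairing based on the definitions above, not a specific published implementation):

from itertools import groupby

def delta_encode(values):
    """Replace each element with its difference from the previous one."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def run_length_encode(values):
    """Collapse runs of identical values into (value, count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

series = [10, 11, 12, 13, 13, 13, 13, 12, 11, 10]
deltas = delta_encode(series)            # [10, 1, 1, 1, 0, 0, 0, -1, -1, -1]
print(run_length_encode(deltas))         # [(10, 1), (1, 3), (0, 3), (-1, 3)]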
Fibonacci binary encoding

This encoding technique is based on the Fibonacci sequence, defined as:

$$ F(N) = \begin{cases} F(N-1) + F(N-2) & \text{if } N > 1 \\ N & \text{if } N = 1,\ N = 0 \end{cases} \qquad (16) $$

Having F(N) = a_0 ... a_p, where a_j is the bit at position j of the binary representation of F(N), the Fibonacci binary coding is defined as:

$$ FE(N) = \sum_{i=0}^{j} a_i \cdot F(i) \qquad (17) $$

where a_i in {0, 1} is the i-th bit of the binary representation [37].

The DRH technique combines three well-known compression techniques [23]: delta encoding, run-length encoding, and Huffman coding. Since these techniques are all lossless, the compression provided by DRH is lossless too, in case no quantization is applied. The following pseudocode describes the compression algorithm, where Q is the quantization level.

DRH node-level compression

compress(Stream S, float Q) {
    Element lastValue = null;
    Element lastDelta = null;
    int counter = 0;
    foreach (s in S) {
        if (lastValue != null && lastDelta != null) {
            float delta = (int) (lastValue - s) / Q;
            lastValue = s;
            if (delta == lastDelta) {
                counter += 1;
            } else {
                encoded = huffmanEncode(lastDelta);
                send((encoded, counter));
                lastDelta = delta;
                counter = 0;
            }
        } else {
            lastValue = s;
            lastDelta = 0;
        }
    }
}

The decompression algorithm is the inverse of the compression one: once the data is received, it is decoded using the Huffman code and reconstructed using the repetition counters. Since this kind of algorithm is not computationally expensive, the compression phase can also be performed by low-resource computational units, such as sensor nodes.

The Sprintz algorithm [3] is designed for the IoT scenario, in which energy consumption and speed are important factors. In particular, the goal is to satisfy the following requirements:

• handling of small block sizes;
• high decompression speed;
• lossless data reconstruction.

The proposed algorithm is a coder that exploits prediction to achieve better results. In particular, it is based on the following components:

• Forecasting: used to predict the difference between new samples and the previous ones, through delta encoding or the FIRE algorithm [3];
• Bit packing: packages are composed of a payload that contains the prediction errors and a header that contains the information used during reconstruction;
• Run-length encoding: if a sequence of correct forecasts occurs, the algorithm does not send anything until some error is detected, and the length of the skipped zero-error packages is added as information;
• Entropy coding: package headers and payloads are coded using Huffman coding, presented in Subsection 3.4.
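Because the next technique, RLBE, relies on Fibonacci binary encoding, a small Python sketch of a classic Fibonacci (Zeckendorf-style) code for positive integers may help; it is a standard construction shown only to illustrate Equations (16)-(17), not code from [33] or [37]:

def fib_numbers(limit):
    """Fibonacci numbers 1, 2, 3, 5, 8, ... up to `limit` (the coding variant)."""
    fibs = [1, 2]
    while fibs[-1] <= limit:
        fibs.append(fibs[-1] + fibs[-2])
    return fibs

def fibonacci_encode(n):
    """Encode a positive integer as a greedy sum of non-consecutive Fibonacci
    numbers, emitted least-significant first and terminated by an extra '1'."""
    assert n > 0
    fibs = fib_numbers(n)
    bits = [0] * len(fibs)
    remainder = n
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= remainder:
            bits[i] = 1
            remainder -= fibs[i]
    while bits and bits[-1] == 0:            # drop unused most-significant zeros
        bits.pop()
    return "".join(map(str, bits)) + "1"     # the trailing "11" marks the end

for n in (1, 2, 3, 4, 11):
    print(n, fibonacci_encode(n))
# 1 -> 11, 2 -> 011, 3 -> 0011, 4 -> 1011, 11 -> 001011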
Run-Length Binary Encoding (RLBE) is a lossless technique composed of five steps, combining delta encoding, run-length encoding, and Fibonacci coding, as shown in Figure 5 [33]. It is developed specifically for devices characterized by low memory and computational resources, such as IoT infrastructures.
The RAKE algorithm, presented in [4], exploits sparsity to achieve compression. It is a lossless compression algorithm with two phases: preprocessing and compression.

Fig. 5: RLBE compression steps
Preprocessing
In this phase, a dictionary is used to transform the original data. For this purpose, many algorithms can be used, such as the Huffman coding presented in Subsection 3.4, but since the aim is to obtain sparsity, the RAKE dictionary uses a code similar to unary coding, so that every codeword has at most one bit set to 1. This dictionary does not depend on symbol probabilities, so no learning phase is needed. Table 1 shows a simple RAKE dictionary.

TABLE 1: RAKE dictionary
symbol   code        length
-1       1           1
+1       01          2
-2       001         3
+2       0001        4
...      ...         ...
-R       0...1       2R - 1
+R       00...1      2R
0        all zeros   2R
Compression
This phase works, as suggested by the algorithm name, as a rake of n teeth. Figure 6 shows an execution of the compression phase, given a preprocessed input and n = 4.

Fig. 6: RAKE algorithm execution

The rake starts at position 0. If there is no bit set to 1 in the rake interval, then 0 is added to the code; otherwise 1 is emitted, followed by the binary representation of the relative index of the first bit set to 1 in the rake (2 bits in the example in Figure 6). After that, the rake is shifted right by n positions in the first case, or restarted right after the first found bit in the second case. In the figure the rake is initially at position 0, and the first bit set to 1 is in relative position 1 (output: 1 followed by 01); then the rake advances by 2 positions (just after the first 1 in the rake); all bits are now set to zero (output: 0), thus the rake is moved forward by 4 places; the first bit set to 1 in the rake has relative index 2 (output: 1 followed by 10), thus the rake advances by 3 places; and the process continues for the two last rake positions (output: 101 and 0 respectively).
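One possible reading of this raking step, for an already preprocessed bit string, is sketched below in Python; parameter names and the handling of the final partial window are our own simplifications, not the reference implementation of [4]:

import math

def rake_compress(bits: str, n: int = 4) -> str:
    """Scan `bits` with a rake of n teeth, emitting 0 for an all-zero window
    and 1 plus the relative index of the first set bit otherwise."""
    idx_bits = math.ceil(math.log2(n))        # bits needed to address a tooth
    out, pos = [], 0
    while pos < len(bits):
        window = bits[pos:pos + n]
        first_one = window.find("1")
        if first_one == -1:                   # no bit set: emit 0, move the whole rake
            out.append("0")
            pos += n
        else:                                 # emit 1 + index, restart after that bit
            out.append("1" + format(first_one, f"0{idx_bits}b"))
            pos += first_one + 1
    return "".join(out)

# Sparse preprocessed stream: mostly zeros with a few isolated ones.
stream = "0100000010000000"
print(rake_compress(stream, n=4))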
Decompression

The decompression phase processes the compressed bit stream, replacing each 0 with n occurrences of 0, and each 1 followed by an offset with a number of 0s equal to the offset, followed by a bit set to 1. The resulting stream is decoded on the fly using the dictionary.

In this subsection we present other time series compression algorithms that cannot be grouped in the previously described categories.
The Major Extrema Extractor (MEE) algorithm is introduced in [7] and exploits time series features (maxima and minima) to achieve compression. For this purpose, strict, left, right, and flat extrema are defined. Considering a time series TS = [(t_1, x_1), ..., (t_n, x_n)], x_i is a minimum if it follows these rules:

• strict: if x_i < x_{i-1} ∧ x_i < x_{i+1}
• left: if x_i < x_{i-1} ∧ ∃ k > i : ∀ j ∈ [i, k], x_j = x_i ∨ x_i
The Segment Merging (SM) technique, presented in [8], considers time series with regular timestamps and repeatedly replaces sequences of consecutive elements (segments) with a summary consisting of a single value and a representation error, as shown in Figure 7, where the error is omitted.

After compression, segments are represented by tuples (t, y, δ), where t is the starting time of the segment, y the constant value associated with the segment, and δ the segment error. The merging operation can be applied either to a set of elements or to a set of segments, to further compress a previously compressed time series. The case of elements is an instance of the case of segments, and it is immediate to generalize from two to more elements/segments. We limit our description to the merge of two consecutive segments, represented by the tuples (t_i, y_i, δ_i) and (t_j, y_j, δ_j), where i < j, into a new segment represented by the tuple (t, y, δ) computed as:

$$ t = t_i $$
$$ y = \frac{\Delta t_i \cdot y_i + \Delta t_j \cdot y_j}{\Delta t_i + \Delta t_j} $$
$$ \delta = \sqrt{\frac{\Delta t_i \cdot (y_i^2 + \delta_i^2) + \Delta t_j \cdot (y_j^2 + \delta_j^2)}{\Delta t_i + \Delta t_j} - y^2} $$

where Δt_x is the duration of segment x, thus Δt_i = t_j - t_i, Δt_j = t_k - t_j, and t_k is the timestamp of the segment after the one starting at t_j.

Fig. 7: Example of the segment merging technique, with increasing time interval

The sets of consecutive segments to be merged are chosen with the goal of minimizing the segment error, with constraints on the maximal acceptable error and the maximal segment duration. This compression technique is lossy, and the result of the compression phase can be considered both as the compressed representation and as the reconstruction of the original time series, without any additional computation executed by a decoder.

The idea behind the Continuous Hidden Markov Chain (CHMC) algorithm is that the data generation process follows a probabilistic model and can be described with a Markov chain [14]. This means that a system can be represented with a set of finite states S and a set of arcs A for the transition probabilities between states. Once the hidden Markov chain is found, using known techniques such as the one presented in [34], a lossy reconstruction of the original data can be obtained by following the chain probabilities.

When we apply the taxonomy described in Section 2.2 and Section 3 to the above techniques, we obtain the classification reported in Table 2, which summarizes the properties of the different implementations. To help the reader in visually grasping the membership of the techniques to the different parts of the taxonomy, we report the same classification graphically in Figure 1 using Venn diagrams.

TABLE 2: Compression algorithm classification

Dictionary-based techniques
           Non adaptive   Lossless   Symmetric   min ρ   max ε
TRISTAN    -              -          ✓           ✓       ✓
CONRAD     -              -          ✓           -       ✓
A-LZSS     -              ✓          ✓           -       NA
D-LZW      -              ✓          ✓           ✓       NA

Function approximation
           Non adaptive   Lossless   Symmetric   min ρ   max ε
PPA        ✓              -          -           -       ✓
CPT        ✓              -          -           -       ✓
DWT        ✓              -          -           -       ✓

Autoencoders
           Non adaptive   Lossless   Symmetric   min ρ   max ε
RNNA       -              -          ✓           -       ✓

Sequential algorithms
           Non adaptive   Lossless   Symmetric   min ρ   max ε
DRH        -              ✓          ✓           -       NA
SPRINTZ    ✓              ✓          ✓           -       NA
RLBE       ✓              ✓          ✓           -       NA
RAKE       ✓              ✓          ✓           -       NA

Others
           Non adaptive   Lossless   Symmetric   min ρ   max ε
MEE        ✓              -          -           ✓       -
SM         ✓              -          -           ✓       -
CHMC       -              -          ✓           -       -
4 Experimental Results
In this section we discuss the performance of the techniques presented in Section 3, as reported by their authors. To ensure a homogeneous presentation of the different techniques, and for copyright reasons, we do not include figures from the works we are describing. Instead, we redraw the figures using the same graphical style for all of them, maintaining a faithful reproduction of the represented values.

The experiments presented in those studies were based on the following datasets:

• ICDM challenge [38] – Traffic flows;
• RTE France – Electrical power consumption;
• ACSF1 – Power consumption;
• BAFU – Hydrogeological data;
• GSATM – Dynamic mixtures of carbon monoxide (CO) and humid synthetic air in a gas chamber;
• PigAirwayPressure – Vital signs;
• SonyAIBORobotSurface2 – X-axis of robot movements;
• SMN [32] – Seismic measures in Nevada;
• HAS [31] – Human activity from smartphone sensors;
• PAMAP [25] – Wearable devices motion and heart frequency measurements;
• MSRC-12 – Microsoft Kinect gestures;
• UCI gas dataset – Gas concentration measurements during chemical experiments;
• AMPDs (http://ampds.org) – Maximum consumption of water, electricity, and natural gas in houses.

To foster the comparison of the different algorithms, we report synoptically the accuracy and compression ratio obtained in the experiments described by their authors.
Accuracy
The creation of the dictionary depends on the dictionary sparsity parameter, which is directly related to its size, and the accuracy can be influenced by this value, as shown in Equation 5. Experimental results are shown in Figure 8, where sparsity is related to RMSE.

Fig. 8: RMSE results depending on sparsity for TRISTAN [22]
Compression ratio
The sparsity parameter also affects thecompression ratio. To assess this dependency, the authorsexecute experiments on two different datasets both containone measurements per minute with daily segments: RTEand ICDM, using dictionaries consisting respectively of 16and 131 atoms.The resulting compression ratios are ρ = 0 . for RTEand ρ = 0 . for ICDM, thus higher sparsity values (largerdictionaries) allow for better compression ratios. Accuracy
The authors of CONRAD measure accuracy usingMSE and the best reported result for their algorithms are0.03 for the BAFU dataset, 0.11 for the GSATM dataset and0.04 for the ACSF1 dataset. As for TRISTAN, accuracy de-pends on time series segments redundancy and correlation.
Compression ratio
The best reported results for CONRADcompression ratio are ρ = 0 . for the ACSF1 dataset, ρ =0 . for the BAFU dataset and ρ = 0 . for the GSATMdataset.The compression ratio is affected by the parametersused during training and encoding: error threshold, sparsityand segment length. Their effects are depicted in the threeplots in Figure 9, that represent the compression factor ( ρ )obtained for different values of the parameters.
Fig. 9: Compression factor (ρ) with varying parameters for CONRAD [16]: (a) varying error threshold, (b) varying dictionary atoms, (c) varying segment length

The accuracy and compression ratio of PPA are affected by the maximum polynomial degree and the maximum allowed deviation, the parameters of the method described in Subsection 3.2.
Accuracy
Errors are bound by the maximum deviationparameter and also the maximum polynomial degree canaffect accuracy: deviation is always under the thresholdand using higher degree polynomial yield more precisereconstructions.Figure 10 shows the relation between maximum devia-tion and MSE.Fig. 10: Relation between MSE and maximum deviation forPPA [13]
Compression ratio
Both the maximum deviation and poly-nomial degree parameters affect the compression ratio, aswe can observe in Figure 11 where the compression factor( ρ ) is reported for different values of parameters. The bestcompression ratios, ρ = 0 . for REDD and ρ = 0 . forOffice, are achieved for the highest degree polynomials.As expected, also for increasing maximum deviation thecompression factor is monotonically increasing. Despite the absence of experimental results in the originalpaper, some considerations can be done. The algorithm hasthree parameters: • Block size; • Maximum deviation; • Number of quantization bits.Similarly to PPA, in CPT it is possible to obtain highercompression ratios at the cost of higher MSE by increasingthe maximum deviation parameter. The same holds for theblock size: larger blocks corresponds to better compressionratios and higher decompression errors [11]. Similarly, wecan suppose that using less quantization bits would entail aless accurate decompression and a better compression ratio.
As for CPT, experimental results are not provided, butalso DWT performances follow the same rules as for PPA, (a) Varying maximum polynomial degree(b) Varying maximum deviation Fig. 11: PPA [6] compression factor ( ρ ) for varying parame-terssince compression ratio and threshold errors are strictlycorrelated. The threshold error can be fixed a priori in orderto have an higher compression ratio or a higher accuracy. Accuracy
One of the parameters of the RNNA based methodis a threshold for the maximal allowed deviation whosevalue directly affects accuracy. For the experiments on uni-variate time series presented in [13] the authors report,considering the optimal value for deviation parameter, aRMSE value in the range [0 . , . for different datasets.In the same experiments, RMSE is in the range [0 . , . for multivariate time series datasets. Thus, there is no signif-icant difference between univariante and multivariate timeseries. Compression ratio
Similarly, the maximal allowed devia-tion affect the compression ratio. According to the resultsreported by the authors, also in this case there is not asignificant difference between univariate and multivariate time series: ρ is in the range [0 . , . for the univariatetime series dataset and in the range [0 . , . for themultivariate one. DRH, as most of the following algorithms, is a losslessalgorithm thus the only relevant performance metric is thecompression ratio.
Compression ratio
This parameter highly depends on the dataset: the run-length algorithm combined with delta encoding achieves good results if the time series is composed of long segments that are constant, or always increasing or decreasing by the same value. Experimental results are obtained on temperature, pressure, and light measurement datasets. For this type of data DRH appears to be appropriate, since values are not highly variable and have relatively long intervals with constant values. The resulting compression factors are reported in Table 3 [23].

TABLE 3: DRH compression factors (1/ρ)

Indoor temperature     92.2
Outdoor temperature    91.17
Pressure               82.55
Light                  83.53
Compression ratio
This algorithm is tested over the datasetsof the UCR time series archive, which contains data com-ing from different domains. In the boxplots in Figure 12,compression factors are shown for 16 and 8-bit data andcompared with other lossless compression algorithms. Fromthis graph it is possible to see that all algorithms achievebetter (higher) compression factors on the 8-bit dataset. Thiscould be due to the fact that 16-bit data are more likely tohave higher differences between consecutive values.Another interesting result is that the FIRE forecastingtechnique improves this compression, especially if com-bined to Huffman coding, when compared with Delta cod-ing [3].
Compression ratio
The author of this algorithm in [33]report compression ratio and processing time of RLBE withrespect to those of other algorithms as shown in Figure 13where ellipses represent the inter-quartile range of com-pression ratios and processing times. RLBE achieves goodcompression ratios with a 10% variability and with lowprocessing time.
Compression ratio
The authors of RAKE in [4] run experi-ments on sparse time series with different sparsity ratios p and different algorithms. The corresponding compressionfactors are reported in Table 4. We can notice that thecompression factor is highly dependent on sparsity andlower sparsity values corresponds to higher compressionfactors (better compression). Thus the RAKE algorithm ismore suitable for sparse time series. Fig. 12: Sprintz compression factors versus other losslesscompression algorithms [3]Fig. 13: RLBE compression ratio versus other lossless com-pression algorithms [33]
The compression ratio is one of the parameter of the MEEalgorithm thus the only relevant performance metric is theaccuracy.
Accuracy
Despite the fact that MEE is a lossy compression algorithm, the authors have not provided any information about accuracy. Since the accuracy depends on the compression ratio, the authors recommend choosing a value that is not smaller than the percentage of non-extremal points, in order to obtain an accurate reconstruction of the compressed data [7].

TABLE 4: Compression factor results for RAKE versus other lossless compression algorithms

                                    p
Algorithm   0.002   0.005   0.01   0.05   0.1   0.15   0.2   0.25
OPT         48.0    22.0    12.4   3.5    2.1   1.6    1.4   1.2
RAKE        47.4    21.5    12.0   3.5    2.1   1.6    1.4   1.2
RAR         22.1    12.1    7.4    2.6    1.7   1.4    1.2   1.2
GZIP        21.4    11.8    7.4    2.6    1.8   1.4    1.3   1.2
BZIP2       26.0    14.2    8.7    2.6    1.7   1.3    1.2   1.1
PE          29.5    11.9    5.9    1.2    0.6   0.4    0.3   0.2
RLE         35.7    15.7    8.4    2.1    1.2   0.9    0.8   0.6
Since the SM algorithm is used for visualization purposes, a first evaluation can be a qualitative observation. In Figure 7 we can see how a light sensor time series is compressed considering different time series segment lengths. For this algorithm, compression ratio and error are strictly correlated, as shown in Figure 14, where FWC and CB-m correspond to different versions of the algorithm and different parameters [8]. We can observe that compression ratio values close to 1 correspond to error values close to zero.

Fig. 14: Correlation between compression ratio and error on a temperature dataset [8]

The dependence between compression ratio and error changes for different datasets. Knowing the function that correlates compression ratio and error for a given dataset, it is possible to set the compression ratio that corresponds to the desired maximum compression error.

For the CHMC algorithm, the authors have not provided any quantitative evaluation of compression performance: their assessment is qualitative and mainly focused on the compression and reconstruction of humanoid robot motions after a learning phase [14].

5 Conclusions
The amount of time series data produced in several contexts requires specialized time series compression algorithms to store them efficiently. In this paper we provide an analysis of the most relevant ones, we propose a taxonomy to classify them, and we report synoptically the accuracy and compression ratio published by their authors for different datasets. Even if this is a meta-analysis, it informs the reader about the suitability of a technique for specific contexts, for example when a fixed compression ratio or accuracy is required, or for datasets having particular characteristics. In particular,
Dictionary-Based methods are more effective with datasets having high redundancy and sparsity, and can be divided into lossy and lossless. The latter ensure a faithful reconstruction of the time series, whereas the former obtain a better compression ratio and allow for choosing a trade-off between accuracy and compression.
Functional approximation methods are preferable for smooth time series, such as those containing measures of environmental conditions, for which they can achieve low compression ratios with high accuracy.
Autoencoder accuracy and compression ratio are strongly related to the capacity of the RNN to find patterns in the training set. One limitation of this technique is that, if the compressor receives in input time series that are not similar to the ones in the training set, the compression ratio can be worse than expected. The
Sequential algorithms described in this survey are alllossless and computationally efficient, so they work well inthe IoT context, in which devices with low computationalcapabilities are often be used. Finally, the last algorithms weconsidered ( SM and MEE ) are mostly used for visualizationpurposes. R EFERENCES [1] Abo-Zahhad, M.: ECG Signal Compression Using Discrete WaveletTransform. In: J.T. Olkkonen (ed.) Discrete Wavelet Transforms -Theory and Applications. InTech (2011). DOI 10 . , 241–261 (2019). DOI 10 . . comnet . . . (3), 1–23 (2018). DOI 10 . . . . l i nfty norm. IEEE Transactions on Signal Process-ing (8), 3111–3124 (2006). DOI 10 . . . (2), 193–218 (2015). DOI 10 . (2), 255–270 (2011). DOI 10 . . . . deeplearningbook . org[10] Hardi, S.M., Angga, B., Lydia, M.S., Jaya, I., Tarigan, J.T.: Compara-tive Analysis Run-Length Encoding Algorithm and Fibonacci CodeAlgorithm on Image Compression. Journal of Physics: ConferenceSeries , 012107 (2019). DOI 10 . . nasa . gov/search . jsp?R=20120010460[12] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neuralcomputation , 1735–80 (1997). DOI 10 . . . . . . . . (11), 2581–2600 (2017). DOI 10 . . . . . . . . . . . . (1), 343 (2017). DOI 10 . , 19–60 (2009). DOI 10 . . (12), 3397–3415 (1993). DOI 10 . . . . . . . . . . . . . , 183–188 (2015). DOI 10 . . procs . . . (3), 423–434 (1991). DOI 10 . , 132306 (2020). DOI 10 . . physd . . [31] Shoaib, M., Bosch, S., Incel, O., Scholten, H., Havinga, P.: Fu-sion of smartphone motion sensors for physical activity recogni-tion. Sensors (Basel, Switzerland) , 10146–10176 (2014). DOI10 . . . . (4), 928–951 (1982). DOI10 . . . CEUR Workshop Proceedings , vol. 567, pp. 72–83. CEUR-WS.org(2010). URL http://ceur-ws . org/Vol-567/paper14 . pdf[38] Wojnarski, M., Gora, P., Szczuka, M., Nguyen, H.S., Swietlicka, J.,Zeinalipour, D.: Ieee icdm 2010 contest: Tomtom traffic predictionfor intelligent gps navigation. In: 2010 IEEE International Con-ference on Data Mining Workshops, pp. 1372–1376 (2010). DOI10 . . . cs.AI/0307013 (2003) Giacomo Chiarot received his M.S. degree inComputer Science from Ca’ Foscari University ofVenice in 2019. Currently, he is a full-time Ph.D.Candidate in Computer Science at Ca’ FoscariUniversity of Venice. His current research in-clude time series, compression algorithms, andmachine learning.