A New Approach of Data Pre-processing for Data Compression in Smart Grids

Yifei Sun⋆, Hang Zou⋆, Samson Lasaulce⋆, Michel Kieffer⋆, Lucas Saludjian†
⋆ L2S, CNRS-CentraleSupelec-Univ. Paris Sud, Gif-sur-Yvette, France
† RTE, France
Abstract—The conventional approach to pre-processing data for compression is to apply transforms such as the Fourier, the Karhunen-Loève, or wavelet transforms. One drawback of such an approach is that it is independent of the use made of the compressed data, which may induce significant optimality losses when measured in terms of final utility (instead of in terms of distortion). We therefore revisit this paradigm by tailoring the data pre-processing operation to the utility function of the decision-making entity that uses the compressed (and therefore noisy) data. More specifically, the utility function consists of an Lp-norm, which is very relevant in the area of smart grids. Both a linear and a non-linear use-oriented transform are designed and compared with conventional data pre-processing techniques, showing that the impact of compression noise can be significantly reduced.
Index Terms—Compression, Learning, Neural Networks, Pre-processing, Transform.
I. INTRODUCTION
Smart grids are, in particular, electricity networks equipped with many sensors to monitor the state of the grid (e.g., the voltage, the current, the phase, the temperature), which generates very large volumes of data to be compressed and transmitted through the grid. This strongly motivates research in the area of compression of smart grid measurements (see, e.g., [1]). Signals measured in smart grids are quite specific and make classical compression techniques unsuited. This is why researchers have had to adapt existing schemes to the context of smart grids. Despite that, the mathematical tools used to design good compression schemes are quite classical. For instance, in [2]-[3] the discrete wavelet transform (DWT) is at the core of the proposed design. In [4] and [5]-[6], the singular value decomposition and principal component analysis (PCA) are respectively exploited. More advanced and more specific designs have been proposed, e.g., in [7], where the authors propose a real-time data compression scheme based on an algorithm combining exception compression with swing door trending compression. In [8], the authors propose to exploit priors on the consumers' behavior to improve data compression. When inspecting all the quoted works and the underlying literature, it can be seen that the main objective is either to decrease the data rate required to reach a given compression quality (that is, to minimize the distortion) or to improve as much as possible the compression quality for a fixed rate. Recently, the authors of [9], [10] have questioned this point of view.
Indeed, if one assumes that the decision-making entity has to make a binary decision, having high-quality data measurements at its disposal might be useless and, worse than that, it may involve costly system overdimensioning (in terms of storage space, bandwidth, processing capability, etc.). Rather, the new paradigm suggested in [9], [10] is to consider the final use made by the decision-maker of the compressed data. When this use can be modeled as the maximization of a given utility function, it becomes possible to tailor the compressor to the final use. In [9], [10], the authors focus on the quantization stage of the compressor, and the data use consists in choosing a power control vector based on a compressed version of the channel state. In the present paper, we develop for the first time this new paradigm for the problem of data pre-processing and consider as the final use the Lp-norm minimization problem, which is fundamental in smart grids since it encompasses key problems such as the peak power, total consumed energy, and Joule losses minimization problems.

The paper is structured as follows. In Section II we formulate the problem to be solved. In particular, we define the final utility function to be maximized and introduce the proposed optimality loss measure. In Section III we provide an approximation of the solution of the problem of Section II. This approximation is very useful to derive the best linear use-oriented pre-processing technique in Section IV. In Section V, we exploit a neural network to design a non-linear use-oriented pre-processing technique and thereby assess the benefits of having a non-linear transform instead of a linear one. Numerical results are provided in Section VI and conclusions are given in Section VII.

II. PROBLEM FORMULATION
In this paper, we assume that the use of the decision-making entity (e.g., a home flexible consumption power scheduler, an electric vehicle battery charging controller, a hot-spot temperature distribution transformer controller, a transmission network flow controller) using the compressed data can be modeled as the maximization of a certain utility function. The considered utility function expresses as follows:

$$u(x; \ell) = -\|x + \ell\|_p \quad (1)$$

where:
• the Lp-norm of a generic vector $v$ of size $n$ is defined by $\|v\|_p = (|v_1|^p + \cdots + |v_n|^p)^{1/p}$;
• $x = [x_1, x_2, \ldots, x_P]^T \in \mathbb{R}_+^P$ corresponds to the decision to be taken by the decision-maker;
• $\ell = [\ell_1, \ell_2, \ldots, \ell_P]^T$ corresponds to the parameters of the function to be maximized and precisely corresponds to the data to be measured and compressed. This means that the decision-maker has in fact only access to a compressed version of $\ell$, which will be denoted by $\widehat{\ell}$.

In the case of a home consumption power scheduler, $x$ would represent the flexible consumption vector to be chosen and $\ell$ would represent the non-controllable or non-flexible part of the consumption, only a compressed version of which would be available to the scheduler. With $p = 1$, the $x$ maximizing $u$ minimizes the total energy consumption. For $p = 2$, it minimizes Joule losses (whenever $x$ and $\ell$ are interpreted as currents). For $p \to \infty$, one minimizes the peak power. If one assumes that $x_i$ is a power and one has to fulfill a given energy need $E$, the optimization problem (OP) of interest associated with the task to be executed by the decision-maker becomes:

$$\begin{aligned} \text{maximize } & u(x; \ell) \\ \text{s.t. } & \sum_{j=1}^{P} x_j - E = 0, \\ & x_j \geq 0, \ j = 1, \ldots, P, \end{aligned} \quad (2)$$

for any $E > 0$. Let $x^\star(\ell)$ denote one solution of this OP; we consider $\ell \in \mathbb{R}^P$ for the following reason. In order to reduce the volume of data to transmit, a lossy compression technique is implemented.
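The role of the exponent $p$ in (1) can be checked with a short numerical sketch; the vectors below are hypothetical toy values, not taken from the paper:

```python
import numpy as np

def utility(x, l, p):
    """u(x; l) = -||x + l||_p, the use-oriented objective of Eq. (1)."""
    if np.isinf(p):
        return -np.max(np.abs(x + l))        # p -> infinity: peak power
    return -np.sum(np.abs(x + l) ** p) ** (1.0 / p)

# toy flexible consumption x and non-flexible load l (hypothetical values)
x = np.array([1.0, 2.0, 0.5])
l = np.array([0.5, 1.0, 2.0])

print(utility(x, l, 1))        # p = 1: minus the total consumption, -7.0
print(utility(x, l, np.inf))   # p = inf: minus the peak, -3.0
```

With $p = 1$ the utility is minus the total consumed energy, and with $p \to \infty$ it is minus the peak, as stated above.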
Thus the exact value of $\ell$ is usually unavailable when making a decision. For this reason, when solving (2), assume that only an approximate value $\widehat{\ell} \in \mathbb{R}^P$ of $\ell$ is available. Replacing $\ell$ by $\widehat{\ell}$ in (2), one obtains $x^\star(\widehat{\ell})$. The resulting utility becomes

$$u\big(x^\star(\widehat{\ell}); \ell\big) = -\|x^\star(\widehat{\ell}) + \ell\|_p. \quad (3)$$

Due to the approximate value $\widehat{\ell}$ obtained by lossy compression, $u\big(x^\star(\widehat{\ell}); \ell\big) \leq u\big(x^\star(\ell); \ell\big)$, and a utility loss is incurred. Our objective is to design a source coding scheme which minimizes this utility loss.

Fig. 1. Coding scheme: precoding, quantization, transmission, decoding.
Fig. 1 illustrates this coding scheme. The precoding scheme can be considered as a function $g: \mathbb{R}_+^P \to \mathbb{R}^N$, $\ell \mapsto \theta$, where $N \leq P$. By the precoding function $g(\cdot)$, the encoded parameters $\theta \in \mathbb{R}^N$ are obtained. Because of the bit constraint, a quantization step follows the precoding. Assume that a quantization function $q(\cdot)$ is available; one thus obtains the quantized encoded parameters $\widehat{\theta} = q(\theta)$, and $\widehat{\theta}$ is transmitted losslessly. The reconstructed signal is obtained from the quantized encoded parameters through a decoding function $h: \mathbb{R}^N \to \mathbb{R}^P$, $\widehat{\theta} \mapsto \widehat{\ell} = h(\widehat{\theta})$. Our aim is to find the precoding and decoding functions that minimize the average utility loss

$$\mathbb{E}_\ell\Big[\big|u(x^\star(\ell); \ell) - u\big(x^\star(\widehat{\ell}); \ell\big)\big|\Big]. \quad (4)$$

Therefore, the goal is to choose the best data pre-processing schemes $g$ and $h$ to minimize the above optimality loss. Note that the conventional paradigm consisting in minimizing distortion is readily obtained as a special instance by choosing $u$ to be the identity function.
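The coding chain of Fig. 1 can be sketched end to end. Here $g$, $q$, and $h$ are deliberately simple placeholders (a random linear precoder and a uniform quantizer, both assumptions for illustration), only meant to show how $\widehat{\ell}$ is produced:

```python
import numpy as np

# Hedged sketch of the coding chain of Fig. 1: precoding g, quantization q,
# decoding h. B is a random placeholder precoder, not an optimized one.
rng = np.random.default_rng(0)
P, N = 8, 2
B = rng.standard_normal((N, P)) / np.sqrt(P)

g = lambda l: B @ l                               # precoding  g: R^P -> R^N
q = lambda t, step=0.1: step * np.round(t / step)  # uniform quantizer (assumed)
h = lambda t: B.T @ t                              # decoding   h: R^N -> R^P

l = rng.standard_normal(P)       # one realization of the data vector
l_hat = h(q(g(l)))               # reconstructed parameters, the noisy l-hat
print(l_hat.shape)               # (8,)
```

The utility loss (4) is then evaluated by feeding both $\ell$ and $\widehat{\ell}$ to the decision problem (2).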
III. LINEAR APPROXIMATION OF THE OPTIMAL DECISION

A. Solution of the OP for fixed data
For a given realization of the random vector $\ell \in \mathbb{R}^P$, we want to find $x^\star(\ell)$ defined in (2). Without loss of generality, we assume

$$\ell_1 \leq \ell_2 \leq \cdots \leq \ell_z < 0 \leq \ell_{z+1} \leq \cdots \leq \ell_P. \quad (5)$$

When $p \in \mathbb{N}^+$, $x^\star(\ell) \in \arg\min_x \big(\sum_{j=1}^P |x_j + \ell_j|^p\big)^{1/p}$, which is equivalent to $x^\star(\ell) \in \arg\min_x \sum_{j=1}^P |x_j + \ell_j|^p$. Two cases can be distinguished: $0 < E < \sum_{j=1}^z |\ell_j|$ or $E > \sum_{j=1}^z |\ell_j|$.

In the first case, a solution of this OP should satisfy

$$\sum_{j=1}^z x_j^\star = E, \quad x_{z+1}^\star = \cdots = x_P^\star = 0 \quad (6)$$

and

$$x_j^\star + \ell_j \leq 0, \ \forall j \in \{1, 2, \ldots, z\}. \quad (7)$$

Indeed, regarding (6), if $\exists j \in \{z+1, \ldots, P\}$ with $x_j^\star > 0$, then necessarily $\exists i \in \{1, \ldots, z\}$ with $x_i^\star + \ell_i < 0$. Replacing $x_j^\star$ and $x_i^\star$ respectively by $x_j^\dagger = x_j^\star - \min\big(x_j^\star, -x_i^\star - \ell_i\big)$ and $x_i^\dagger = x_i^\star + \min\big(x_j^\star, -x_i^\star - \ell_i\big)$ yields a p-norm smaller than the original one, which implies that $x_j^\star$ and $x_i^\star$ cannot be part of a solution. The reason for (7) is similar: if $\exists j \in \{1, 2, \ldots, z\}$ with $x_j^\star + \ell_j > 0$, then necessarily $\exists i \in \{1, \ldots, z\}$ with $x_i^\star + \ell_i < 0$, and a better result is obtained by replacing $x_j^\star$ and $x_i^\star$ with $x_j^\dagger = x_j^\star - \min\big(x_j^\star + \ell_j, -x_i^\star - \ell_i\big)$ and $x_i^\dagger = x_i^\star + \min\big(x_j^\star + \ell_j, -x_i^\star - \ell_i\big)$ respectively.

Since the OP is easily proved to be convex, we can use the KKT conditions, with Lagrangian function

$$L = \sum_{j=1}^z (-x_j - \ell_j)^p + \delta\Big(E - \sum_{j=1}^z x_j\Big) - \sum_{j=1}^z \lambda_j x_j \quad (8)$$

$$\frac{\partial L}{\partial x_j} = -p(-x_j - \ell_j)^{p-1} - \delta - \lambda_j \quad (9)$$

$$\begin{cases} -p\big(-x_j^\star - \ell_j\big)^{p-1} - \delta^\star - \lambda_j^\star = 0 \\ E - \sum_{j=1}^P x_j^\star = 0 \\ \lambda_j^\star \geq 0, \ x_j^\star \geq 0, \ \lambda_j^\star x_j^\star = 0. \end{cases} \quad (10)$$

Getting rid of the slack variables $\{\lambda_j^\star\}$, we have

$$-p\big(-x_j^\star - \ell_j\big)^{p-1} = \delta^\star \quad (11)$$

$$x_j^\star = \Bigg(-\Big(\frac{-\delta^\star}{p}\Big)^{\frac{1}{p-1}} - \ell_j\Bigg)^+ \quad (12)$$

where $(a)^+ = \max(a, 0)$. This is a water-filling solution.
We define the water level $\mu = -\big(\frac{-\delta^\star}{p}\big)^{\frac{1}{p-1}}$, so that

$$x_j^\star = (\mu - \ell_j)^+ \quad (13)$$

and, according to (5), $\mu$ is obtained as

$$\mu = \frac{1}{n^\star}\Big(E + \sum_{j=1}^{n^\star} \ell_j\Big) \quad (14)$$

where $n^\star$ indicates the number of indices that should be loaded:

$$n^\star = \max\Big\{n \leq z : (n-1)\ell_n - \sum_{j=1}^{n-1} \ell_j < E\Big\}. \quad (15)$$

In the second case, a solution should satisfy

$$x_j^\star + \ell_j \geq 0, \ \forall j \in \{1, \ldots, P\}; \quad (16)$$

the reason is similar to that of (7). We thus get a new Lagrangian function

$$L = \sum_{j=1}^P (x_j + \ell_j)^p + \delta\Big(E - \sum_{j=1}^P x_j\Big) - \sum_{j=1}^P \lambda_j x_j \quad (17)$$

and again obtain a water-filling solution in a way similar to the first case. When $p \to +\infty$, this OP can also be solved by a water-filling solution obtained in a similar way as for $p \in \mathbb{N}^+$. Due to space limitations, the discussion is omitted here.
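A minimal sketch of the water-filling solution of Eqs. (13)-(15) for the first case ($\ell$ sorted in ascending order), exercised on a toy load vector:

```python
import numpy as np

def waterfill(l, E):
    """Water-filling solution of Eqs. (13)-(15): l sorted ascending,
    0 < E < sum of |l_j| over the negative entries (first case only)."""
    l = np.asarray(l, dtype=float)
    z = int(np.sum(l < 0))                  # number of negative entries
    n_star = 1
    for n in range(2, z + 1):               # Eq. (15): largest admissible n
        if (n - 1) * l[n - 1] - np.sum(l[:n - 1]) < E:
            n_star = n
    mu = (E + np.sum(l[:n_star])) / n_star  # water level, Eq. (14)
    return np.maximum(mu - l, 0.0)          # Eq. (13)

l = np.array([-4.0, -3.0, -1.0, 2.0])       # toy data, sorted as in (5)
x = waterfill(l, E=3.0)
print(x, x.sum())                           # [2. 1. 0. 0.], budget E = 3 met
```

Note how the two loaded components are equalized at the water level $\mu = -2$, i.e., $x_j^\star + \ell_j = \mu$ for $j \leq n^\star$.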
B. Linear approximation of the water-filling operator around fixed data

The water-filling solution is widely used in communication systems [11]-[12]. One has the solution $x^\star(\ell)$, but it is hard to obtain an explicit expression for it. In order to facilitate operations on the water-filling solution, such as computing its gradient, our aim in this section is to find a linear approximation of this solution. Its Taylor series expansion can be expressed as

$$x^\star(\ell + d\ell) = x^\star(\ell) + (\nabla_\ell x^\star(\ell))^T d\ell + o(d\ell) = x^\star(\ell) + H\, d\ell + o(d\ell) \quad (18)$$

where $H$ is the Jacobian matrix. From the discussion of the minimization of $\|x + \ell\|_p$ for different $p$, $x^\star(\ell) = [x_1^\star, x_2^\star, \ldots, x_P^\star]^T$ is a function of $\ell$. We consider the water-filling solution in a linear region where the nonzero/zero elements remain nonzero/zero; $n^\star$ is thus fixed. According to the admissible values of $n^\star$ ($n^\star = 1, \ldots, P$), $\mathbb{R}^P$ can be divided into several regions. For $I \subseteq \{1, 2, \ldots, P\}$, $I \neq \emptyset$,

$$M_I = \{\ell \mid x_k^\star(\ell) > 0, \ k \in I; \ x_j^\star(\ell) = 0, \ j \notin I\}$$

where $I$ ranges over the power set of $\{1, 2, \ldots, P\}$ excluding the empty set. $\mathbb{R}^P$ is thus divided into $2^P - 1$ regions:

$$\forall i \neq j, \ M_i \cap M_j = \emptyset, \quad \bigcup_{i=1}^{2^P - 1} M_i = \mathbb{R}^P.$$

In each region, we have a linear approximation of the water-filling operator. This means that $H$ differs from one region to another; $H$ is thus a function of $\ell$, denoted by $H(\ell)$. Consider one type of these regions ($H(\ell)$ can be obtained in the same way in the other regions) where we assume that the last $(P - n^\star)$ elements of $\ell$ do not affect the output, $x_{n^\star+1}^\star = x_{n^\star+2}^\star = \cdots = x_P^\star = 0$. We focus on the influence of the first $n^\star$ elements of $\ell$ and consider all admissible values of these elements such that $n^\star$ is fixed and

$$\ell_1, \ldots, \ell_{n^\star} < \ell_{n^\star+1} \leq \cdots \leq \ell_P. \quad (19)$$

Thus from Equations (13) and (14), we have

$$x_j^\star = \begin{cases} \dfrac{\sum_{i=1}^{n^\star} \ell_i + E}{n^\star} - \ell_j & j \leq n^\star \\ 0 & j > n^\star. \end{cases} \quad (20)$$

By calculating the gradient of $x^\star(\ell)$, we have

$$H(\ell) = \begin{pmatrix} \frac{1}{n^\star} - 1 & \frac{1}{n^\star} & \cdots & \frac{1}{n^\star} & 0 & \cdots & 0 \\ \frac{1}{n^\star} & \frac{1}{n^\star} - 1 & \cdots & \frac{1}{n^\star} & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots & \vdots & & \vdots \\ \frac{1}{n^\star} & \cdots & \cdots & \frac{1}{n^\star} - 1 & 0 & \cdots & 0 \\ 0 & \cdots & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & 0 & 0 & \cdots & 0 \end{pmatrix} \quad (21)$$
and the constant term is easily obtained as

$$b(\ell) = \Big[\frac{E}{n^\star}, \ldots, \frac{E}{n^\star}, 0, \ldots, 0\Big]^T. \quad (22)$$

For the same reason as for $H(\ell)$, the constant term depends on $\ell$ and is denoted by $b(\ell)$.
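Within one region $M_I$ the water-filling map is in fact affine, so Eqs. (21)-(22) can be checked directly. The sketch below builds $H(\ell)$ and $b(\ell)$ for a hypothetical region with $n^\star = 2$ loaded indices:

```python
import numpy as np

def linearization(n_star, P, E):
    """H(l) and b(l) of Eqs. (21)-(22): inside one region M_I the
    water-filling map is affine, x*(l) = H l + b."""
    H = np.zeros((P, P))
    # top-left n* x n* block: (1/n*) 11^T - I, all other entries zero
    H[:n_star, :n_star] = np.ones((n_star, n_star)) / n_star - np.eye(n_star)
    b = np.zeros(P)
    b[:n_star] = E / n_star                 # first n* entries equal E/n*
    return H, b

# region with n* = 2 loaded indices out of P = 4, budget E = 3 (toy values)
H, b = linearization(2, 4, 3.0)
l = np.array([-4.0, -3.0, -1.0, 2.0])
print(H @ l + b)                            # [2. 1. 0. 0.]
```

The result coincides with the water-filling solution of Eq. (13) for this $\ell$, confirming the affine form $x^\star(\ell) = H(\ell)\ell + b(\ell)$ inside the region.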
IV. BEST LINEAR TRANSFORMATION

A linear transformation is a classic approximation that projects the signal onto a chosen basis, the signal being represented by a few of the vectors in this basis. The Karhunen-Loève Transform (KLT) has been proved to be an optimal approximation in the sense of the mean-square error [13]. The encoding and decoding schemes can be considered as functions $g(\cdot)$ and $h(\cdot)$. For simplicity, we consider a linear approximation without quantizer, with $\theta = g(\ell) = B\ell$ and $\widehat{\ell} = h(\theta) = B^T B \ell$, where $B \in \mathbb{R}^{N \times P}$. (4) becomes

$$\mathbb{E}_\ell\Big[\big|u(x^\star(\ell), \ell) - u\big(x^\star(B^T B \ell), \ell\big)\big|\Big]. \quad (23)$$

In the case of (23), we search for a linear mapping from $\mathbb{R}_+^P$ to $\mathbb{R}^P$ with maximum rank $N$. With a given set of realizations of $\ell$, $\mathcal{L} = \{\ell^{(1)}, \ell^{(2)}, \ldots, \ell^{(T)}\}$, the loss function is equivalent to

$$\begin{aligned} \Gamma(B) &= \frac{1}{T}\sum_{i=1}^T \Big(u\big(x^\star(\ell^{(i)}), \ell^{(i)}\big) - u\big(x^\star(\widehat{\ell}^{(i)}), \ell^{(i)}\big)\Big) \\ &= \frac{1}{T}\sum_{i=1}^T \Big(u\big(x^\star(\ell^{(i)}), \ell^{(i)}\big) + \|x^\star(\widehat{\ell}^{(i)}) + \ell^{(i)}\|_p\Big) \\ &= \frac{1}{T}\sum_{i=1}^T \Big(u\big(x^\star(\ell^{(i)}), \ell^{(i)}\big) + \|(H_i B^T B + I)\ell^{(i)} + b_i\|_p\Big) \end{aligned} \quad (24)$$

where $H_i$ and $b_i$ denote the linear approximation parameters $H(\widehat{\ell}^{(i)})$ and $b(\widehat{\ell}^{(i)})$ of the water-filling operator, assumed independent of $B$ for simplicity. We consider a gradient descent method, given in Algorithm 1, to search for an optimal $B^\ast$.
Algorithm 1 — Gradient descent search for $B^\ast$

Require: initial matrix $B$ (KL basis); learning rate $\epsilon$.
while the maximum number of iterations is not reached and the optimality loss decreases by more than a fixed threshold do
  Compute gradient: $G \leftarrow \nabla_B \Gamma(B)$
  Apply update: $B \leftarrow B - \epsilon G$
end while

Even if our problem is not convex, we can find a local optimum, as verified experimentally. When $p \in \mathbb{N}^+$, after some calculation, we get the gradient

$$\nabla_B \Gamma(B) = \frac{1}{T}\sum_{i=1}^T C_i\Big(B\ell^{(i)}\beta_i^T H_i + B H_i^T \beta_i \ell^{(i)T}\Big) \quad (25)$$

where

$$C_i = 2\Big(u\big(x^\star(\ell^{(i)}), \ell^{(i)}\big) - u\big(x^\star(\widehat{\ell}^{(i)}), \ell^{(i)}\big)\Big)\,\|x^\star(\widehat{\ell}^{(i)}) + \ell^{(i)}\|_p^{1-p} \quad (26)$$

$$\beta_i = \underbrace{\big(x^\star(\widehat{\ell}^{(i)}) + \ell^{(i)}\big) \odot \cdots \odot \big(x^\star(\widehat{\ell}^{(i)}) + \ell^{(i)}\big)}_{p-1} \quad (27)$$

When $p \to +\infty$, the gradient is obtained as

$$\nabla_B \Gamma(B) = \frac{1}{T}\sum_{i=1}^T D_i\Big(B\ell^{(i)} s_{k(i)}^T H_i + B H_i^T s_{k(i)} \ell^{(i)T}\Big) \quad (28)$$

where

$$D_i = 2\Big(u\big(x^\star(\ell^{(i)}), \ell^{(i)}\big) - u\big(x^\star(\widehat{\ell}^{(i)}), \ell^{(i)}\big)\Big) \quad (29)$$

$$s_{k(i)} = \big[0_{1 \times (k(i)-1)}, \ 1, \ 0_{1 \times (P-k(i))}\big]^T \quad (30)$$

with $k(i)$ the index of the element of $x^\star(\widehat{\ell}^{(i)}) + \ell^{(i)}$ having the maximum value.
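Algorithm 1 can be sketched as follows. For brevity this version uses a finite-difference gradient instead of the closed form (25), and it is exercised on a toy quadratic surrogate for $\Gamma$ (a stand-in, not the actual utility loss):

```python
import numpy as np

def best_linear_B(loss, B0, eps=0.01, max_iter=400, tol=1e-4):
    """Sketch of Algorithm 1: plain gradient descent on Gamma(B), starting
    from B0 (the KL basis in the paper). The gradient is estimated by
    central finite differences instead of the closed form (25)."""
    B = B0.copy()
    prev = loss(B)
    for _ in range(max_iter):
        G = np.zeros_like(B)
        for idx in np.ndindex(B.shape):           # numeric gradient of Gamma
            dB = np.zeros_like(B); dB[idx] = 1e-6
            G[idx] = (loss(B + dB) - loss(B - dB)) / 2e-6
        B -= eps * G                              # update step of Algorithm 1
        cur = loss(B)
        if prev - cur < tol:                      # stopping rule of Algorithm 1
            break
        prev = cur
    return B

# toy quadratic surrogate for Gamma (hypothetical), minimized at B = 0
B = best_linear_B(lambda B: np.sum(B ** 2), np.ones((2, 4)))
print(np.abs(B).max() < 0.1)
```

In the paper the analytic gradients (25) and (28) would replace the finite-difference loop, which is only used here to keep the sketch short.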
V. BEST NONLINEAR TRANSFORMATION

The approximation error depends on the signal. Linear and nonlinear approximations have similar performance if the signal is uniformly regular. However, for nonuniformly regular signals, nonlinear approximation performs better [13]. We thus also consider the case where the function $g(\cdot)$ is nonlinear. Because of the quantizer, we still consider $h(\cdot)$ as a linear function.

Fig. 2. Structure of the NN
In this case, we propose to solve (4) by using an autoencoder [14]. An autoencoder is a type of unsupervised learning neural network. This network is trained to copy its input $\ell$ to its output $\widetilde{\ell}$ in order to efficiently encode the data. Autoencoders are typically used for dimensionality reduction or feature learning. The network can be considered as the combination of two parts. One part, for encoding, is trained to ignore irrelevant information, so that the dimension of the input can be reduced. The other part, for decoding, is trained to reconstruct, from the reduced encoded parameters $\widetilde{\theta}$, an output that minimizes the loss function (usually the mean-square error). Fig. 2 illustrates the structure of the autoencoder that we used. On the reduction side, the network encodes the input $\ell$ into $\widetilde{\theta}$, whose dimension $N$ is much smaller than that of the original input. The reconstruction side generates an output $\widetilde{\ell}$ from $\widetilde{\theta}$.

Denote by $W_{i,j}^{(\gamma)}$ the weight between neuron $j$ in the $\gamma$-th layer and neuron $i$ in the $(\gamma+1)$-th layer, by $o_j^{(\gamma)}$ the output of the $j$-th neuron in the $\gamma$-th layer, and by $b_i^{(\gamma)}$ the bias term for neuron $i$ (indicated by 1 in Fig. 2). The basic model of the autoencoder is given by

$$o_i^{(\gamma+1)} = f\Big(b_i^{(\gamma)} + \sum_{j=1}^{N_\gamma} W_{i,j}^{(\gamma)} o_j^{(\gamma)}\Big)$$

where $N_\gamma$ indicates the number of neurons in the $\gamma$-th layer and $f(\cdot)$ is the activation function. After some matrix manipulations, we get the expressions of $\widetilde{\theta}$ and $\widetilde{\ell}$:

$$\widetilde{\theta} = f(W_1 \zeta) \quad (31)$$

$$\widetilde{\ell} = W_2 \eta \quad (32)$$

where $\zeta = \binom{\ell}{1}$ and $\eta = \binom{\widetilde{\theta}}{1}$, their last entries accounting for the bias terms, and

$$W_1 = \begin{pmatrix} W_{1,1}^{(1)} & \cdots & W_{1,P}^{(1)} & b_1^{(1)} \\ \vdots & \ddots & \vdots & \vdots \\ W_{N,1}^{(1)} & \cdots & W_{N,P}^{(1)} & b_N^{(1)} \end{pmatrix}, \quad W_2 = \begin{pmatrix} W_{1,1}^{(2)} & \cdots & W_{1,N}^{(2)} & b_1^{(2)} \\ \vdots & \ddots & \vdots & \vdots \\ W_{P,1}^{(2)} & \cdots & W_{P,N}^{(2)} & b_P^{(2)} \end{pmatrix}.$$

Instead of the mean-square error, in our autoencoder we define a loss function depending on the utility loss:

$$\|x^\star(\widetilde{\ell}) + \ell\|_p - \|x^\star(\ell) + \ell\|_p. \quad (33)$$
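The forward pass of Eqs. (31)-(32) can be sketched as follows, with random weights and the sigmoid activation of Section VI; the names `W1` and `W2` follow the matrices defined above:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def forward(l, W1, W2):
    """One-hidden-layer autoencoder of Eqs. (31)-(32): the last column of
    W1 / W2 carries the bias terms, so inputs are augmented with a 1."""
    zeta = np.append(l, 1.0)          # zeta = [l; 1]
    theta = sigmoid(W1 @ zeta)        # encoded parameters, Eq. (31)
    eta = np.append(theta, 1.0)       # eta = [theta; 1]
    return theta, W2 @ eta            # linear reconstruction, Eq. (32)

P, N = 48, 4                          # dimensions used in Section VI
rng = np.random.default_rng(0)        # random (untrained) toy weights
W1 = rng.standard_normal((N, P + 1)) * 0.1
W2 = rng.standard_normal((P, N + 1)) * 0.1
theta, l_tilde = forward(rng.standard_normal(P), W1, W2)
print(theta.shape, l_tilde.shape)     # (4,) (48,)
```

Training would then adjust `W1` and `W2` by backpropagation through the loss (33) rather than the usual mean-square error.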
VI. NUMERICAL PERFORMANCE ANALYSIS

In this section, we present some simulation results. These results illustrate the performance of the KLT and of the proposed approaches: the linear and the nonlinear transformations. Our dataset comes from Ausgrid [15]. We consider one user's daily energy consumption as a random vector $\ell$, and choose the energy consumption of one user during one year as the dataset. The data are measured every half hour, and the energy consumption is transmitted once a day; thus $P = 48$. The rank limit is $N = 4$. The other parameters are set as $p = \infty$ or $20$, and $E = 50$. For the linear transformation, we take the KL basis as the initial value of $B$ and then execute Alg. 1. We choose a one-hidden-layer neural network for the nonlinear transformation. Since the rank limit is 4, the hidden layer has 4 neurons, whose outputs are exactly the encoded information. Since the sigmoid function restricts its output to between 0 and 1, which simplifies the quantization, we choose it as the activation function.
Fig. 3. When $p \to \infty$, the optimality loss induced by the compression noise can be reduced in typical scenarios from about 26.17% (KLT performance) to about 10.18% (new approach, linear) and 6.74% (new approach, non-linear).
Fig. 4. When $p = 20$, the impact of compression noise (10.56% for the KLT, 3.68% for the linear and 1.96% for the non-linear new approach) is generally less than for $p \to \infty$. A careful design allows one to obtain very small optimality losses.

Figs. 3-4 illustrate that, for different values of $p$, both the linear and the nonlinear transformations efficiently reduce the optimality loss and perform better than the KLT. Compared to the linear transformation, the nonlinear one is better. As introduced in Section V, nonlinear approximation is more suitable for nonuniformly regular data; with our nonlinear transformation, the data that we used, or other data with similar characteristics, can take advantage of nonlinear approximation. A dead-zone uniform quantizer follows the precoding process, and the quantized coefficients are transmitted losslessly. We thus obtain Figs. 5-6, which illustrate the optimality loss as a function of the bit limitation of the quantizer. Even with only one bit, the linear and nonlinear transformations yield a great improvement over the conventional approach (KLT) in the sense of the optimality loss.
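A possible sketch of such a dead-zone uniform quantizer; the step size and the mid-cell reconstruction rule are assumptions, since the paper does not specify them:

```python
import numpy as np

def dead_zone_quantize(t, step, bits):
    """Dead-zone uniform quantizer sketch: values in (-step, step) map to 0,
    the rest to uniform cells, with indices clipped to the bit budget and a
    mid-cell reconstruction (assumed design choices)."""
    idx = np.sign(t) * np.floor(np.abs(t) / step)
    idx = np.clip(idx, -(2 ** (bits - 1) - 1), 2 ** (bits - 1) - 1)
    return idx * step + np.sign(idx) * step / 2

t = np.array([0.04, -0.26, 0.73])     # toy encoded coefficients
print(dead_zone_quantize(t, step=0.1, bits=4))   # [ 0.   -0.25  0.75]
```

The small coefficient falls in the dead zone and is zeroed, matching the intuition that only a few significant coefficients need to be spent bits on.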
Fig. 5. For $p \to \infty$, this curve exhibits the rate-optimality loss tradeoff. For rates above 6 bits/sample, improving the compression quality has no impact on the final utility function maximization.
Fig. 6. For $p = 20$, the previous observation holds.

In the low bit rate region, the optimality loss decreases significantly as the bit limitation increases, and the curves flatten in the high bit rate region. The results show that, with our method, even only 4 bits allocated to each encoded coefficient (16 bits for a realization of $\ell$) are sufficient to ensure an appropriate optimality loss. Even if more bits are allocated, the improvement of the optimality loss is not significant.

VII. CONCLUSION
In this paper, a new perspective on data pre-processing (or precoding) is taken. We present a use-oriented formulation of the problem and derive two use-oriented pre-processing schemes (namely, a linear and a non-linear scheme). These two new schemes are compared with conventional data pre-processing schemes such as the KLT. Using real smart grid data, numerical experiments show that, in the sense of the optimality loss, the linear and nonlinear transformations perform much better than conventional schemes such as the KLT. For instance, the optimality loss induced by the compression noise can be reduced in typical scenarios from about 26% to about 7%.

REFERENCES

[1] M. P. Tcheou, L. Lovisolo, M. V. Ribeiro, E. A. B. da Silva, M. A. M. Rodrigues, J. M. T. Romano, and P. S. R. Diniz, "The compression of electric signal waveforms for smart grids: State of the art and future trends,"
IEEE Transactions on Smart Grid, vol. 5, no. 1, pp. 291–302, Jan. 2014.
[2] S. Santoso, E. J. Powers, and W. M. Grady, "Power quality disturbance data compression using wavelet transform methods," IEEE Transactions on Power Delivery, vol. 12, no. 3, pp. 1250–1257, July 1997.
[3] J. Ning, J. Wang, W. Gao, and C. Liu, "A wavelet-based data compression technique for smart grid," IEEE Transactions on Smart Grid, vol. 2, no. 1, pp. 212–218, March 2011.
[4] J. C. S. de Souza, T. M. L. Assis, and B. C. Pal, "Data compression in smart distribution systems via singular value decomposition," IEEE Transactions on Smart Grid, vol. 8, no. 1, pp. 275–284, Jan. 2017.
[5] R. Mehra, N. Bhatt, F. Kazi, and N. M. Singh, "Analysis of PCA based compression and denoising of smart grid data under normal and fault conditions," Jan. 2013, pp. 1–6.
[6] S. Das and P. S. N. Rao, "Principal component analysis based compression scheme for power system steady state operational data," in ISGT2011-India, Dec. 2011, pp. 95–100.
[7] F. Zhang, L. Cheng, X. Li, Y. Sun, W. Gao, and W. Zhao, "Application of a real-time data compression and adapted protocol technique for WAMS," July 2015.
[8] Y. Wang, Q. Chen, C. Kang, Q. Xia, Y. Tan, Z. Zeng, and M. Luo, "Residential smart meter data compression and pattern extraction via non-negative K-SVD," July 2016, pp. 1–5.
[9] C. Zhang, N. Khalfet, S. Lasaulce, V. Varma, and S. Tarbouriech, "Payoff-oriented quantization and application to power control," in 2017 15th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Paris, 2017, pp. 1–6.
[10] H. Zou, C. Zhang, S. Lasaulce, L. Saludjian, and P. Panciatici, "Decision-oriented communications: Application to energy-efficient resource allocation," Oct. 2018, pp. 1–6.
[11] G. Scutari, D. P. Palomar, and S. Barbarossa, "Simultaneous iterative water-filling for Gaussian frequency-selective interference channels," July 2006, pp. 600–604.
[12] G. Scutari, D. P. Palomar, F. Facchinei, and J. Pang, "Convex optimization, game theory, and variational inequality theory," IEEE Signal Processing Magazine, vol. 27, no. 3, pp. 35–49, May 2010.
[13] S. Mallat, "Chapter 9 - Approximations in bases," in A Wavelet Tour of Signal Processing (Third Edition), pp. 435–480, Academic Press, Boston, 2009.
[14] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.