RG Smoothing Algorithm Which Makes Data Compression
Anna Sinelnikova
Uppsala University, Sweden

I would like to present a new method of smoothing of one-dimensional curves in an arbitrary number of dimensions. The basic idea is borrowed from renormalization group (RG) theory, which was applied to biological macromolecules in [1]. In general, RG theory is used for rescaling of a system: you combine the elements into blocks and then treat the blocks the same way as you treated the elements. Two important features follow from this and are inherited by our smoothing algorithm:

• reduction of the number of elements
• recursive implementation

These two points distinguish RG smoothing from other methods and make it unique.

The first feature means that the smoothing reduces the amount of data. This property seems very natural if we think about what any smoothing method actually does. Obviously, it makes the curve or surface smoother. In a more formal language we can say that smoothing decreases the differential curvature of the curve or surface.

But let us now think about the problem from a higher level: from the perspective of the amount of information. There is some information, which we call data, from which we want to extract the information about the "ideal data", which can be, for example, a signal cleaned up from noise. In any case the point is that the "useless" information is removed and only the important part remains. So any smoothing leads to the loss of at least some information. The question is whether it actually leads to a reduction of the amount of binary data. In our algorithm it does.

The recursive implementation makes the algorithm simple. There is only a small instruction which is repeated over and over again. In the simplest form, which I describe in this article, the RG method has only one tuning parameter: the number of recursive steps. This opens great opportunities for building hardware for the algorithm, because each recursive loop is simple and strictly defined.

METHOD
The RG smoothing algorithm consists of two crucial steps:

1. construction of a chain from the data points;
2. a recursive rescaling procedure for that chain.
RG theory is used in the second step, so I will start by explaining what the rescaling procedure for a given chain is.

RESCALING PROCEDURE
We assume that we have a chain of rigid segments and want to apply RG theory to it. As the theory tells us, we should combine elements into blocks. Exactly this process I will call rescaling.

The simplest way of rescaling the chain one can think of is to connect every other site, as shown with the red line in the left part of Figure 1, where it is applied to the original chain consisting of dark bold arrows. We could also connect every third site, or even more, if our chain were long enough. The parameter responsible for this choice I will call the scaling parameter s. More strictly, it can be defined like this: the scaling parameter s is equal to the number of old segments combined into a new one. In the case shown in Figure 1, s = 2: each new segment connects two old segments.

The second part of the "definition" of RG says that we should treat the blocks in the same way as the elements. And this is what recursive stands for. Now we should treat the new chain in the same way as the original one and rescale it again. This is how we come to the yellow chain (or, in our case, just one segment) in Figure 1.

You can see that the number of elements is reduced during the procedure: from 7 segments in the original chain we come to one segment of the yellow chain, through 3 segments of the red chain. We could continue the rescaling procedure if the original chain were longer.

Here it is important to notice that we will lose one segment of the chain every time the number of segments is odd. During the whole procedure this will unavoidably happen if the initial number of monomers was not a power of 2, as one can easily check.

RG theory is a complicated mathematical concept with a lot of nuances which we cannot cover in this article. What is important for us is that RG theory for chains requires the scaling parameter s to be less than 2, or even very close to 1. The details can be found in the book [2].
The scaling parameter should therefore lie in the interval 1 < s ≤ 2, and thus can no longer be an integer. The case of a non-integer scaling parameter is more complicated and allows more than one solution. I will use the solution which I think is the best in our case. It is presented in the right plot of Figure 1 for scaling parameter s = 1 + 1/3 = 4/3.

The idea is that every time we should connect s old segments. When s was equal to 2 we connected 2 old segments, which means that the new chain consists of every second vertex of the old chain. Now, with non-integer s, we will not always hit a vertex, but sometimes hit a segment itself. To explain this idea better I will use vectors.

Let us denote the original bold dark arrows in Figure 1 as \vec t_1, \vec t_2, \ldots. At the first step of rescaling we want to obtain the red chain, whose segments we will mark as \vec t^{(1)}_1, \vec t^{(1)}_2, \ldots. So, for our choice of s = 1 + 1/3, we can write down, according to its definition:

    \vec t^{(1)}_1 = \vec t_1 + \frac{1}{3}\vec t_2
    \vec t^{(1)}_2 = \frac{2}{3}\vec t_2 + \frac{2}{3}\vec t_3        (1)
    \vec t^{(1)}_3 = \frac{1}{3}\vec t_3 + \vec t_4
    ...

And the same for the second recursive step, but with respect to the new red chain:

    \vec t^{(2)}_1 = \vec t^{(1)}_1 + \frac{1}{3}\vec t^{(1)}_2
    \vec t^{(2)}_2 = \frac{2}{3}\vec t^{(1)}_2 + \frac{2}{3}\vec t^{(1)}_3        (2)
    \vec t^{(2)}_3 = \frac{1}{3}\vec t^{(1)}_3 + \vec t^{(1)}_4
    ...

The new chain is marked with yellow color in the same Figure 1. The maximum number of scaling steps in this case is 3, as one can see in the plot. Notice that we lose the right end again, even after the first step.
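As a concrete illustration, here is a minimal Python sketch of one rescaling step (the function name rescale, the numpy-based vertex representation, and the resampling formulation are my own illustrative choices, not fixed by the method). It uses the fact that summing fractional old segments, as in eqs. (1) and (2), is the same as resampling the polyline at the parameter values 0, s, 2s, ...:

    import numpy as np

    def rescale(points, s):
        """One RG rescaling step: each new segment is the sum of s
        consecutive (possibly fractional) old segments, as in eqs. (1)-(2).
        points: (N+1, d) array of chain vertices; returns the new vertices."""
        n_seg = len(points) - 1                          # N old segments
        m_seg = int(np.floor(n_seg / s + 1e-9))          # new segments (round-off guard)
        u = np.minimum(np.arange(m_seg + 1) * s, n_seg)  # resampling parameters
        i = np.minimum(u.astype(int), n_seg - 1)         # old segment index
        frac = (u - i)[:, None]                          # position inside that segment
        return points[i] + frac * (points[i + 1] - points[i])

For s = 2 this reduces to keeping every other vertex; for s = 4/3 and a chain of 4 segments it reproduces eqs. (1). Note that for generic s the right end of the chain can be cut off, exactly as described above.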
Figure 1: Demonstration of the renormalization procedure. The dark bold arrows connect the original data points. The thinner red lines are the result of the first rescaling step. The thinnest yellow line is the result of the second iteration step. Left figure: the scaling parameter is equal to 2; right figure: the scaling parameter is equal to 4/3.

There are five important points we should draw as conclusions:

1. the procedure works in Euclidean space with an arbitrary number of dimensions;
2. the process is recursive, i.e. it consists of repeated steps;
3. in each step we reduce the number of data points;
4. we can lose the endings of the chain during the procedure;
5. the procedure does not treat the two ends of the chain in the same way.

The first three items in the list are advantages. We have not mentioned the first advantage before, but indeed, we have never referred to the number of dimensions of the space we consider. We worked in the plane, i.e. in 2D space, but obviously the same algorithm will work in 3D space. And since equations (1) and (2) operate only with vectors, the only requirement on the space is to have a positive-definite norm.

Items (4) and (5) in the list are downsides. Losing the ends obviously leads to loss of information, which can be crucial for some applications. Item (5) is a bad thing because the direction of moving along the chain was introduced artificially; there should be no distinction between the two directions. We will fix both disadvantages in the final algorithm, which I present in the next section.
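To illustrate the first point, here is a quick check with the rescale sketch above (the random-walk test data are my own invented example, not from the article): the identical code handles chains in any number of dimensions, since it only adds and scales vertex vectors.

    import numpy as np

    rng = np.random.default_rng(0)
    # A hypothetical 3D trajectory with 7 segments (8 vertices):
    walk3d = np.cumsum(rng.normal(size=(8, 3)), axis=0)
    print(rescale(walk3d, 2.0).shape)   # (4, 3): 7 segments -> 3, as in Figure 1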
CONSTRUCTING THE CHAIN
Now let us come back to the original problem of smoothing. Assume that we have a set of data points which we want to operate with. It can be a signal with some noise, or a trajectory in two- or three-dimensional space. To use the method I have just described, we first should construct a chain from our data. This amounts to enumerating all the data points and connecting them in increasing or decreasing order. The points can be enumerated in different ways; importantly, the algorithm we present will work even if there are self-crossings and loops in the final chain. However, the enumeration can affect the efficiency of the smoothing.
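For a noisy signal sampled on an x grid, the natural enumeration is simply the x order. A small sketch (the test signal here is my own invented example):

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 20.0, 101)                      # regular x grid
    y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # "ideal data" plus noise
    chain = np.column_stack([x, y])                      # (101, 2) chain vertices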
ALGORITHM
Basically we do exactly what is written at the beginning of the previous section, but with an additional trick. Let me make a small remark about notation: from now on I will call the recursive procedure an iterative one, because during the smoothing we approach the solution more and more accurately with every step.

As we know by now, during the procedure we can lose the last data points, and thus the information about the end of the data set. We want to avoid this.

As one could have noted from Figure 1, for s = 4/3 we took a chain with N = 4 segments instead of N = 7. It happens because N/s = 4/(4/3) = 3 is an integer. On the one hand, we want s always to satisfy this condition, which makes s depend on the iteration step. On the other hand, we have the RG requirement that s be close to 1.

Thus we will require the new chain to be one segment shorter than the old one. In other words, let us decrease the number of data points by one in each iteration step. This gives the equation for the scaling parameter s as a function of the iteration step number p:

    s^{opt}_p = \frac{N_p}{N_{p+1}} = \frac{N_p}{N_p - 1},        (3)

where N_p is the number of segments at the p-th iteration step. This is the smallest s which provides an integer number of segments in the new chain without rounding.

By choosing the scaling parameter this way we solve both problems from the previous section:

• no cut-off of the data set;
• the procedure treats the beginning and the end of the data set in the same way.

In addition, the scaling parameter is no longer a free parameter of the method:

• the scaling parameter s is strictly defined at every step, not a degree of freedom of the algorithm.

Summarizing this section, we can present the final set of instructions for RG smoothing (a sketch in code follows below):

1. enumerate the data points;
2. construct the chain by connecting the points in that order;
3. calculate the scaling parameter s using eq. (3);
4. rescale the chain, i.e. do transformations similar to eq. (1) and eq. (2) but for the chosen s;
5. repeat steps (3) and (4) if needed.

The only freedom left (after fixing the chain) is the number of iteration steps. So the number of repetitions (item 5 in the algorithm) is the only tuning parameter of the method.
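Putting the pieces together, here is a minimal sketch of the whole loop, reusing numpy and the rescale sketch from the rescaling section (the name rg_smooth is mine, for illustration only):

    def rg_smooth(points, n_steps):
        """RG smoothing: each iteration uses the optimal scaling parameter
        s = N_p / (N_p - 1) of eq. (3), so the chain loses exactly one
        point per step and both endpoints are preserved."""
        pts = np.asarray(points, dtype=float)
        for _ in range(n_steps):
            n_seg = len(pts) - 1
            if n_seg < 2:          # a single segment cannot be rescaled further
                break
            pts = rescale(pts, n_seg / (n_seg - 1))
        return pts

With s chosen this way, the resampling points j*s always include both ends of the chain, so nothing is cut off and the two directions along the chain are treated symmetrically.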
EXAMPLE
In this part I will show how the algorithm works on an arbitrary scatter-plot example. I generated 101 data points on a regular x grid. The result of smoothing with the RG method is shown in Figure 2 with dark curves. "Step" is the number of iterations we did, so for the first plot we applied our algorithm only once.

The reduction of the number of data points can be expressed as a compression ratio, which is marked as "c.r." in the figure and defined as

    c.r. = \left(1 - \frac{N_p + 1}{N + 1}\right) \times 100\%,

where N + 1 is the number of data points in the original data, and N_p + 1 is the number of points after the p-th step of smoothing.
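In code this bookkeeping is a one-liner (my own helper, for illustration):

    def compression_ratio(n_points_orig, n_points_now):
        """c.r.: relative reduction of the point count, in percent."""
        return (1.0 - n_points_now / n_points_orig) * 100.0

    # E.g. if 5 of the original 101 points remain:
    print(compression_ratio(101, 5))   # ~95.05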
Figure 2:
Smoothing example.

If you take a look at the plots with 50 and 80 steps in the figure, you can notice the different treatment of the end points. Since we fix the first and the last point, they are included in the final chain without any corrections. In other words, the errors in the first and the last points remain untouched.

Another feature worth noticing can be seen in the last two plots, for 80 and 95 steps. In the last plot the x coordinates of our new smoothed points are 0, 5, 10, 15, 20. It means that the equidistance property of the original data points is retained. Indeed, if we look at our transformations, eqs. (1) and (2), it is obvious that we can write those transformations for the x components and the y components independently. The algorithm performs rescaling with parameter s along each basis vector independently.
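A quick check with the sketches above (continuing my earlier invented test signal) confirms this:

    smoothed = rg_smooth(chain, 96)      # one point lost per step: 101 -> 5 points
    print(np.diff(smoothed[:, 0]))       # equal spacings: the x grid stays regular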
CONCLUSION

A novel algorithm for smoothing was presented. The idea behind the method is quite simple; nevertheless, it is based on the solid mathematical formulation of renormalization group theory. From that theory the algorithm inherited not only its name (RG smoothing), but also two very important features: data compression and a recursive nature.

This leads to many advantages of the RG smoothing algorithm, and here I list only some of them, together with possible applications:

• can be used for flow-data analysis, where the question of data reduction is crucial;
• can be used for trajectory smoothing, because it can handle non-regular grids and self-crossings;
• does not depend on the origin and characteristics of the noise;
• the algorithm is identical for any number of dimensions;
• can be used for image contour compression;
• can be implemented directly in hardware, since the algorithm is iterative and each step is simple;
• has only one tuning parameter: the number of iterative steps;
• the algorithm is non-parametric: the number of parameters does not grow with the size of the system;
• if the grid was regular initially, this property is conserved;
• can be used for contour recognition.

The downside which still should be improved is the handling of the ends of the chain.

FOR FEEDBACK
This is a preprint and work in progress. Since my own background is in physics, I am interested in feedback and potential collaboration on the article itself with people who are knowledgeable in fields such as signal processing, pattern recognition, etc. If you have any ideas about possible applications and improvements, you are very welcome to contact me.

E-mail: [email protected]
REFERENCES

[1] A. Sinelnikova, A. J. Niemi, M. Ulybyshev, Phys. Rev. E 97.

[2] A. Yu. Grosberg, A. R. Khokhlov, Statistical Physics of Macromolecules (AIP Series in Polymers and Complex Materials, Woodbury, 1994).

This work was supported by the Swedish Research Council and the Knut and Alice Wallenberg Foundation through the Wallenberg Academy Fellow grant of J. Nilsson.