RG Smoothing Algorithm Which Makes Data Compression
Anna Sinelnikova
Uppsala University, Sweden

I would like to present a new method of smoothing of one-dimensional curves in an arbitrary number of dimensions. The basic idea is borrowed from renormalization group (RG) theory, which was applied to biological macromolecules in [1]. In general, RG theory is used for rescaling of a system: you combine the elements into blocks and then treat the blocks the same way as you treated the elements. Two important features follow from this and are inherited by our smoothing algorithm:

• reduction of the number of elements
• recursive implementation

These two points distinguish RG smoothing from other methods and make it unique.

The first feature means that the smoothing reduces the amount of data. This property seems very natural if we think about what any smoothing method actually does. Obviously, it makes the curve or surface smoother. In a more formal language we can say that smoothing decreases the differential curvature of the curve or surface.

But let us now think about the problem from a higher level: from the perspective of the amount of information. There is some information, which we call data, from which we want to extract the information about the "ideal data", which can be, for example, a signal cleaned up from noise. In any case the point is that the "useless" information is removed and only the important part remains. So any smoothing leads to the loss of at least some information. The question is whether it actually leads to a reduction of the amount of binary data. In our algorithm it does.

The recursive implementation makes the algorithm simple. There is only a small instruction which is repeated over and over again. In the simplest form, which I describe in this article, the RG method has only one tuning parameter: the number of recursive steps. This opens great opportunities for building hardware for the algorithm, because each recursive loop is simple and strictly defined.

METHOD
The RG smoothing algorithm consists of two crucial steps:

1. construction of a chain from the data points;
2. a recursive rescaling procedure for that chain.
RG theory is used in the second step, so I will start by explaining what the rescaling procedure for a given chain is.

RESCALING PROCEDURE
We assume that we have a chain of rigid segments and want to apply RG theory to it. As the theory tells us, we should combine elements into blocks. Exactly this process I will call rescaling.

The simplest way of rescaling the chain one can think of is to connect every other site, as shown with the red line in the left part of Figure 1, where it is applied to the original chain consisting of dark bold arrows. We could also connect every third site, or even more, if our chain were long enough. The parameter responsible for this choice I will call the scaling parameter s. More strictly, it can be defined like this: the scaling parameter s is equal to the number of old segments combined into a new one. In the case shown in Figure 1, s = 2: each new segment connects two old segments.

The second part of the "definition" of RG says that we should treat the blocks in the same way as the elements. And this is what recursive stands for. Now we should treat the new chain in the same way as the original one and rescale it again. This is how we come to the yellow chain (or, in our case, just one segment) in Figure 1.

You can see that the number of elements is reduced during the procedure: from 7 segments in the original chain we come to one segment of the yellow chain, through 3 segments of the red chain. We could continue the rescaling procedure if the original chain were longer.

Here it is important to notice that we will lose one segment of the chain every time the number of segments is odd. During the whole procedure this will unavoidably happen if the initial number of monomers was not a power of 2, as one can easily check.

RG theory is a complicated mathematical concept with a lot of nuances which we cannot cover in this article. What is important for us is that RG theory for chains requires the scaling parameter s to be less than 2, or even very close to 1. The details can be found in the book [2].
The scaling parameter should therefore lie in the interval 1 < s ≤ 2, and thus can no longer be an integer. The case of a non-integer scaling parameter is more complicated and allows more than one solution. I will use the solution which I think is the best in our case. It is presented in the right plot of Figure 1 for scaling parameter s = 1 + 1/3 = 4/3.

The idea is that every time we should connect s old segments. When s was equal to 2 we connected 2 old segments, which means that the new chain consists of every second vertex of the old chain. Now, with non-integer s, we will not always hit a vertex, but sometimes hit a segment itself. To explain this idea better I will use vectors.

Let us denote the original bold dark arrows in Figure 1 as \vec t_1, \vec t_2, \ldots. At the first step of rescaling we want to obtain the red chain, whose segments we will mark as \vec t^{(1)}_1, \vec t^{(1)}_2, \ldots. So, for our choice of s = 1 + 1/3, we can write down, according to its definition:

    \vec t^{(1)}_1 = \vec t_1 + \frac{1}{3}\vec t_2
    \vec t^{(1)}_2 = \frac{2}{3}\vec t_2 + \frac{2}{3}\vec t_3        (1)
    \vec t^{(1)}_3 = \frac{1}{3}\vec t_3 + \vec t_4
    ...

And the same for the second recursive step, but with respect to the new red chain:

    \vec t^{(2)}_1 = \vec t^{(1)}_1 + \frac{1}{3}\vec t^{(1)}_2
    \vec t^{(2)}_2 = \frac{2}{3}\vec t^{(1)}_2 + \frac{2}{3}\vec t^{(1)}_3        (2)
    \vec t^{(2)}_3 = \frac{1}{3}\vec t^{(1)}_3 + \vec t^{(1)}_4
    ...

The new chain is marked with yellow color in the same Figure 1. The maximum number of scaling steps in this case is 3, as one can see in the plot. Notice that we lose the right end again, even after the first step.
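As a concrete illustration, here is a minimal Python sketch of one rescaling step (the function name rescale, the numpy-based vertex representation, and the resampling formulation are my own illustrative choices, not fixed by the method). It uses the fact that summing fractional old segments, as in eqs. (1) and (2), is the same as resampling the polyline at the parameter values 0, s, 2s, ...:

    import numpy as np

    def rescale(points, s):
        """One RG rescaling step: each new segment is the sum of s
        consecutive (possibly fractional) old segments, as in eqs. (1)-(2).
        points: (N+1, d) array of chain vertices; returns the new vertices."""
        n_seg = len(points) - 1                          # N old segments
        m_seg = int(np.floor(n_seg / s + 1e-9))          # new segments (round-off guard)
        u = np.minimum(np.arange(m_seg + 1) * s, n_seg)  # resampling parameters
        i = np.minimum(u.astype(int), n_seg - 1)         # old segment index
        frac = (u - i)[:, None]                          # position inside that segment
        return points[i] + frac * (points[i + 1] - points[i])

For s = 2 this reduces to keeping every other vertex; for s = 4/3 and a chain of 4 segments it reproduces eqs. (1). Note that for generic s the right end of the chain can be cut off, exactly as described above.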
Figure 1: Demonstration of the renormalization procedure. The dark bold arrows connect the original data points. The thinner red lines are the result of the first rescaling step. The thinnest yellow line is the result of the second iteration step. Left figure: the scaling parameter is equal to 2; right figure: the scaling parameter is equal to 4/3.

There are five important points we should draw as conclusions:

1. the procedure works in Euclidean space with an arbitrary number of dimensions;
2. the process is recursive, i.e. it consists of repeated steps;
3. in each step we reduce the number of data points;
4. we can lose the endings of the chain during the procedure;
5. the procedure does not treat the two ends of the chain in the same way.

The first three items in the list are advantages. We have not mentioned the first advantage before, but indeed, we have never referred to the number of dimensions of the space we consider. We worked in the plane, i.e. in 2D space, but obviously the same algorithm will work in 3D space. And since equations (1) and (2) operate only with vectors, the only requirement on the space is to have a positive-definite norm.

Items (4) and (5) in the list are downsides. Losing the ends obviously leads to loss of information, which can be crucial for some applications. Item (5) is a bad thing because the direction of moving along the chain was introduced artificially; there should be no distinction between the two directions. We will fix both disadvantages in the final algorithm, which I present in the next section.
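To illustrate the first point, here is a quick check with the rescale sketch above (the random-walk test data are my own invented example, not from the article): the identical code handles chains in any number of dimensions, since it only adds and scales vertex vectors.

    import numpy as np

    rng = np.random.default_rng(0)
    # A hypothetical 3D trajectory with 7 segments (8 vertices):
    walk3d = np.cumsum(rng.normal(size=(8, 3)), axis=0)
    print(rescale(walk3d, 2.0).shape)   # (4, 3): 7 segments -> 3, as in Figure 1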
CONSTRUCTING THE CHAIN
Now let us come back to the original problem of smoothing. Assume that we have a set of data points which we want to operate with. It can be a signal with some noise, or a trajectory in two- or three-dimensional space. To use the method I have just described, we first should construct a chain from our data. This amounts to enumerating all the data points and connecting them in increasing or decreasing order. The points can be enumerated in different ways; importantly, the algorithm we present will work even if there are self-crossings and loops in the final chain. However, the enumeration can affect the efficiency of the smoothing.
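For a noisy signal sampled on an x grid, the natural enumeration is simply the x order. A small sketch (the test signal here is my own invented example):

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0.0, 20.0, 101)                      # regular x grid
    y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # "ideal data" plus noise
    chain = np.column_stack([x, y])                      # (101, 2) chain vertices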
ALGORITHM
Basically we do exactly what is written at the beginning of the previous section, but with an additional trick. Let me make a small remark about notation: from now on I will call the recursive procedure an iterative one, because during the smoothing we approach the solution more and more accurately with every step.

As we know by now, during the procedure we can lose the last data points, and thus the information about the end of the data set. We want to avoid this.

As one could have noted from Figure 1, for s = 4/3 we took a chain with N = 4 segments instead of N = 7. It happens because N/s = 4/(4/3) = 3 is an integer. On the one hand, we want s always to satisfy this condition, which makes s depend on the iteration step. On the other hand, we have the RG requirement that s be close to 1.

Thus we will require the new chain to be one segment shorter than the old one. In other words, let us decrease the number of data points by one in each iteration step. This gives the equation for the scaling parameter s as a function of the iteration step number p:

    s^{opt}_p = \frac{N_p}{N_{p+1}} = \frac{N_p}{N_p - 1},        (3)

where N_p is the number of segments at the p-th iteration step. This is the smallest s which provides an integer number of segments in the new chain without rounding.

By choosing the scaling parameter this way we solve both problems from the previous section:

• no cut-off of the data set;
• the procedure treats the beginning and the end of the data set in the same way.

In addition, the scaling parameter is no longer a free parameter of the method:

• the scaling parameter s is strictly defined at every step, not a degree of freedom of the algorithm.

Summarizing this section, we can present the final set of instructions for RG smoothing (a sketch in code follows below):

1. enumerate the data points;
2. construct the chain by connecting the points in that order;
3. calculate the scaling parameter s using eq. (3);
4. rescale the chain, i.e. do transformations similar to eq. (1) and eq. (2) but for the chosen s;
5. repeat steps (3) and (4) if needed.

The only freedom left (after fixing the chain) is the number of iteration steps. So the number of repetitions (item 5 in the algorithm) is the only tuning parameter of the method.
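Putting the pieces together, here is a minimal sketch of the whole loop, reusing numpy and the rescale sketch from the rescaling section (the name rg_smooth is mine, for illustration only):

    def rg_smooth(points, n_steps):
        """RG smoothing: each iteration uses the optimal scaling parameter
        s = N_p / (N_p - 1) of eq. (3), so the chain loses exactly one
        point per step and both endpoints are preserved."""
        pts = np.asarray(points, dtype=float)
        for _ in range(n_steps):
            n_seg = len(pts) - 1
            if n_seg < 2:          # a single segment cannot be rescaled further
                break
            pts = rescale(pts, n_seg / (n_seg - 1))
        return pts

With s chosen this way, the resampling points j*s always include both ends of the chain, so nothing is cut off and the two directions along the chain are treated symmetrically.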
EXAMPLE
In this part I will show how the algorithm works on an arbitrary scatter-plot example. I generated 101 data points on a regular x grid. The result of smoothing with the RG method is shown in Figure 2 with dark curves. "Step" is the number of iterations we did, so for the first plot we applied our algorithm only once.

The reduction of the number of data points can be expressed as a compression ratio, which is marked as "c.r." in the figure and defined as

    c.r. = \left(1 - \frac{N_p + 1}{N + 1}\right) \times 100\%,

where N + 1 is the number of data points in the original data, and N_p + 1 is the number of points after the p-th step of smoothing.
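In code this bookkeeping is a one-liner (my own helper, for illustration):

    def compression_ratio(n_points_orig, n_points_now):
        """c.r.: relative reduction of the point count, in percent."""
        return (1.0 - n_points_now / n_points_orig) * 100.0

    # E.g. if 5 of the original 101 points remain:
    print(compression_ratio(101, 5))   # ~95.05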
Figure 2:
Smoothing example.

If you take a look at the plots with 50 and 80 steps in the figure, you can notice the different treatment of the end points. Since we fix the first and the last point, they are included in the final chain without any corrections. In other words, the errors in the first and the last points remain untouched.

Another feature worth noticing can be seen in the last two plots, for 80 and 95 steps. In the last plot the x coordinates of our new smoothed points are 0, 5, 10, 15, 20. It means that the equidistance property of the original data points is retained. Indeed, if we look at our transformations, eqs. (1) and (2), it is obvious that we can write those transformations for the x components and the y components independently. The algorithm performs rescaling with parameter s along each basis vector independently.
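A quick check with the sketches above (continuing my earlier invented test signal) confirms this:

    smoothed = rg_smooth(chain, 96)      # one point lost per step: 101 -> 5 points
    print(np.diff(smoothed[:, 0]))       # equal spacings: the x grid stays regular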
CONCLUSION

A novel algorithm for smoothing was presented. The idea behind the method is quite simple; nevertheless, it is based on the solid mathematical formulation of renormalization group theory. From that theory the algorithm inherited not only its name (RG smoothing), but also two very important features: data compression and a recursive nature.

This leads to many advantages of the RG smoothing algorithm, and here I list only some of them, together with possible applications:

• can be used for flow-data analysis, where the question of data reduction is crucial;
• can be used for trajectory smoothing, because it can handle non-regular grids and self-crossings;
• does not depend on the origin and characteristics of the noise;
• the algorithm is identical for any number of dimensions;
• can be used for image contour compression;
• can be implemented directly in hardware, since the algorithm is iterative and each step is simple;
• has only one tuning parameter: the number of iterative steps;
• the algorithm is non-parametric: the number of parameters does not grow with the size of the system;
• if the grid was regular initially, this property is conserved;
• can be used for contour recognition.

The downside which still should be improved is the handling of the ends of the chain.

FOR FEEDBACK
This is a preprint and work in progress. Since my own background is in physics, I am interested in feedback and potential collaboration on the article itself with people who are knowledgeable in fields such as signal processing, pattern recognition, etc. If you have any ideas about possible applications and improvements, you are very welcome to contact me.

E-mail: [email protected]
REFERENCES

[1] A. Sinelnikova, A. J. Niemi, M. Ulybyshev, Phys. Rev. E 97.

[2] A. Yu. Grosberg, A. R. Khokhlov, Statistical Physics of Macromolecules (AIP Series in Polymers and Complex Materials, Woodbury, 1994).

This work was supported by the Swedish Research Council and the Knut and Alice Wallenberg Foundation through the Wallenberg Academy Fellow grant of J. Nilsson.