A study for Image compression using Re-Pair algorithm
Pasquale De Luca, Vincenzo Maria Russiello, Raffaele Ciro Sannino, Lorenzo Valente
Pasquale De Luca a,∗, Vincenzo Maria Russiello b, Raffaele Ciro Sannino b, Lorenzo Valente b
a University of Naples "Parthenope", Department of Science and Technologies, Centro Direzionale C4, Napoli I-80143
b University of Salerno, Department of Computer Science, via Giovanni Paolo II, Fisciano I-84084
Abstract
Compression is an important topic in computer science, because it allows us to store more data on our storage devices. There are several techniques to compress a file. This manuscript describes the most important algorithm for compressing images, JPEG, and compares it with another method, giving good reasons not to use that method on images. For text, the best-known encoding technique is Huffman encoding, which is explained in detail. This manuscript shows how to apply a text compression method to images, describing in particular the method and the reason for choosing one image format over the others. The method studied and analyzed in this manuscript is the Re-Pair algorithm, which is designed purely for compressing grammar-based text. Finally, the good results of this application are shown.
Keywords:
Image Compression, Re-Pair, Compression, C++ Compression, BMP
∗ Corresponding Author. Email address: [email protected] (Pasquale De Luca)
Preprint submitted to Journal of Artificial Intelligence and Soft Computing Research, February 14, 2019
1. Introduction
This manuscript studies how a particular compression algorithm can be executed and used on images. Besides the canonical image compression algorithms such as JPEG [14] and PNG, this work aims to show how to apply the Re-Pair algorithm, which was designed purely for text compression, to images. We searched the literature for similar or related work but found very little: existing papers and works mainly apply Re-Pair to text. We show how the latter algorithm can be executed on images by using methods and techniques that generate a good input for Re-Pair, an algorithm designed for text structured according to canonical grammar rules. We first give a brief analysis of the problem of image compression using several canonical algorithms. We then describe the method used to apply this technique efficiently, so that it can be reused in future work. Particular importance is given to comparing the compression methods and formats for images, in order to reveal which one is the best choice for obtaining a very efficient result from the Re-Pair algorithm.
2. Methods
In this section we show the method used to apply the Re-Pair algorithm to images, in particular the techniques used to convert an image from any graphical format to the BitMap (BMP) format, and we explain why this format was chosen. Compression algorithms can be classified into two groups: lossless and lossy [1].
Lossless compression algorithms allow the original data to be perfectly reconstructed from the compressed data; in short, an image can be compressed without any loss of graphical information or data. Lossless algorithms such as Huffman coding [4], which belongs to the entropy-encoding subfamily, are among the most widely used compression methods, and many compression algorithms are based on them, in particular JPEG [2]. With Huffman coding, an image can be compressed by opening it in binary mode, reading each single byte as an ASCII symbol, and then applying Huffman encoding to generate a compressed version of the raw image. Other algorithms that belong to the lossless family are:
• PNG;
• TIFF;
• TGA;
• BPG.
Among the other methods, the most important are RLE [3] and Chain Codes [5]; the latter is used on monochromatic images.
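The byte-level Huffman coding described above can be sketched as follows. This is a minimal illustration that assumes each byte of the raw file is one symbol. `huffman_codes` and `encoded_bits` are hypothetical helper names, not part of any library. The sketch only builds the code table and counts the output bits; it does not write a bit stream or store the table alongside the data.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Builds a Huffman code table (byte -> bit string), treating every byte as one symbol.
std::map<unsigned char, std::string> huffman_codes(const std::string& data) {
    std::array<std::uint64_t, 256> freq{};
    for (unsigned char c : data) ++freq[c];

    // Tree nodes live in a vector; leaves have left == right == -1.
    struct Node { std::uint64_t freq; int symbol; int left; int right; };
    std::vector<Node> nodes;
    using Item = std::pair<std::uint64_t, int>;  // (subtree frequency, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (int s = 0; s < 256; ++s) {
        if (freq[s] == 0) continue;
        nodes.push_back({freq[s], s, -1, -1});
        heap.push({freq[s], static_cast<int>(nodes.size()) - 1});
    }

    std::map<unsigned char, std::string> codes;
    if (nodes.empty()) return codes;
    if (nodes.size() == 1) {  // only one distinct byte: give it a 1-bit code
        codes[static_cast<unsigned char>(nodes[0].symbol)] = "0";
        return codes;
    }
    // Repeatedly merge the two least frequent subtrees.
    while (heap.size() > 1) {
        auto [fa, a] = heap.top(); heap.pop();
        auto [fb, b] = heap.top(); heap.pop();
        nodes.push_back({fa + fb, -1, a, b});
        heap.push({fa + fb, static_cast<int>(nodes.size()) - 1});
    }
    // Walk the tree from the root: left edge = '0', right edge = '1'.
    std::vector<std::pair<int, std::string>> stack;
    stack.push_back({heap.top().second, ""});
    while (!stack.empty()) {
        auto [idx, prefix] = stack.back();
        stack.pop_back();
        const Node& n = nodes[idx];
        if (n.left < 0) {
            codes[static_cast<unsigned char>(n.symbol)] = prefix;
        } else {
            stack.push_back({n.left, prefix + "0"});
            stack.push_back({n.right, prefix + "1"});
        }
    }
    return codes;
}

// Number of bits needed to encode `data` with a table built by huffman_codes().
std::uint64_t encoded_bits(const std::string& data,
                           const std::map<unsigned char, std::string>& codes) {
    std::uint64_t bits = 0;
    for (unsigned char c : data) {
        auto it = codes.find(c);
        assert(it != codes.end() && "byte missing from code table");
        bits += it->second.size();
    }
    return bits;
}
```

For example, for the input `"aaaabbc"` the frequent byte `a` receives a 1-bit code, while `b` and `c` receive 2-bit codes. The input therefore needs 10 bits instead of the 56 bits of the raw bytes.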
Lossy methods aim to compress an image by discarding part of its information. They start from reducing the color space, as in the chroma subsampling method, and include transform coding, which uses an important transform, the Discrete Cosine Transform (DCT) [6]. The following figure shows various applications of a DCT to an image:
Figure 1: DCT compression step
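The coefficient truncation behind Figure 1 can be made concrete with a minimal sketch. It assumes a 1-D orthonormal DCT-II over a short block of samples, whereas real codecs such as JPEG apply a 2-D DCT to 8×8 blocks. Here `keep` is the number of low-frequency coefficients that are retained, and the others are set to zero. How the coefficient counts in Figure 1 map onto this parameter is not specified in the text. The names `dct`, `idct` and `compress_block` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Orthonormal 1-D DCT-II: X_k = s_k * sum_i x_i * cos(pi * (i + 0.5) * k / n).
std::vector<double> dct(const std::vector<double>& x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<double> c(n, 0.0);
    for (std::size_t k = 0; k < n; ++k) {
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            sum += x[i] * std::cos(pi * (i + 0.5) * k / n);
        c[k] = sum * std::sqrt((k == 0 ? 1.0 : 2.0) / n);  // s_0 = sqrt(1/n), s_k = sqrt(2/n)
    }
    return c;
}

// Inverse of dct() (orthonormal DCT-III).
std::vector<double> idct(const std::vector<double>& c) {
    const std::size_t n = c.size();
    const double pi = std::acos(-1.0);
    std::vector<double> x(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = 0; k < n; ++k)
            sum += c[k] * std::sqrt((k == 0 ? 1.0 : 2.0) / n) * std::cos(pi * (i + 0.5) * k / n);
        x[i] = sum;
    }
    return x;
}

// Keeps only the first `keep` (lowest-frequency) coefficients, then reconstructs the block.
std::vector<double> compress_block(const std::vector<double>& x, std::size_t keep) {
    assert(keep <= x.size());
    std::vector<double> c = dct(x);
    for (std::size_t k = keep; k < c.size(); ++k) c[k] = 0.0;  // discard high frequencies
    return idct(c);
}
```

Keeping all coefficients reconstructs the block exactly, up to rounding. Keeping only the first (DC) coefficient replaces every sample with the block mean, which is the most lossy case.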
Figure 1 shows the differences between the compressed images. The first image is the raw one, the second was compressed with 8 compression coefficients, and the last with 64 compression coefficients. More coefficients increase the compression percentage and, consequently, the loss of information. Over time these two families have been combined, but with only minimal gains in the final result [7]. All the methods explained so far were developed specifically for images; next, we introduce how to configure and use the Re-Pair algorithm on images [8].
2.3. Re-Pair on Images
Re-Pair is an efficient grammar compressor that operates by recursively replacing high-frequency character pairs with new grammar symbols. The main approach is to replace the most frequently occurring pairs first and, for each pair, to add a dictionary entry mapping the new symbol to the replaced pair [10]. Briefly, the steps of the algorithm are as follows.
The first step computes the sequence array. This is an array structure in which each entry consists of a symbol value and two pointers, used to create doubly linked lists between occurrences of identical pairs.
The second step builds an active pairs table. This is a hash table from a pair of symbols to a pair record, which collects information about that specific pair.
The Re-Pair algorithm starts by counting the pairs in the input, using a flag to indicate that a pair has been seen once; if the pair is encountered again, a pair record is created.
The Re-Pair algorithm works mainly on text, in particular on grammatically structured text; for this reason it is worth recalling the definition of a Context-Free Grammar (CFG). By definition, a context-free grammar is a collection of rules of the form
$A \rightarrow x_1 x_2 x_3 \ldots x_k$
where $x_1, x_2, x_3, \ldots, x_k$ are either terminal symbols (letters in the alphabet) or symbols that appear on the left-hand side of some rule.
To execute the Re-Pair algorithm on an image, the image must first be converted to BMP format. The common formats, such as PNG, JPEG and other already compressed formats, do not allow Re-Pair to perform its compression. A BMP (Bitmap Image File), by contrast, is a simple format with little or no compression, so its size on disk is larger than that of the same image in other formats. We tried to execute the Re-Pair algorithm on JPEG and PNG images, but the results were very poor, as shown in the following table:
File name   Image type   Input size (Kb)   Output size (Kb)
hello       JPEG         67                72
test1       JPEG         192               388
test1       PNG          86                96
test2       PNG          155               227
Table 1: Execution of the Re-Pair algorithm on different image formats
Table 1 makes clear that already compressed images cannot be used: for every file, the output is larger than the input, because the compressed data lack the repeated symbol pairs that Re-Pair relies on.
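The Re-Pair procedure described earlier in this section can be sketched as follows. This is a deliberately naive version: it recounts all pairs on every iteration, instead of maintaining the sequence array, doubly linked lists and active pairs table used by the efficient algorithm [10]. Each rule it produces has the CFG form $A \rightarrow x_1 x_2$ (that is, $k = 2$). Terminals are assumed to be byte values 0 to 255, passed as unsigned values, and nonterminals are numbered from 256. `Rule`, `repair` and `expand` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One grammar rule: symbol -> left right.
struct Rule { int symbol; int left; int right; };

// Naive Re-Pair: while some adjacent pair occurs at least twice, replace the most
// frequent pair with a fresh nonterminal. Returns the final sequence and the rules.
std::pair<std::vector<int>, std::vector<Rule>> repair(std::vector<int> seq) {
    std::vector<Rule> rules;
    int next_symbol = 256;  // terminals are 0..255
    while (true) {
        // Count non-overlapping occurrences of each adjacent pair ("aaa" holds one "aa").
        std::map<std::pair<int, int>, std::pair<int, std::size_t>> counts;  // pair -> (count, next free index)
        for (std::size_t i = 0; i + 1 < seq.size(); ++i) {
            auto& [count, next_free] = counts[{seq[i], seq[i + 1]}];
            if (count == 0 || i >= next_free) { ++count; next_free = i + 2; }
        }
        // Pick the most frequent pair (ties: smallest pair in map order).
        std::pair<int, int> best{};
        int best_count = 1;
        for (const auto& [p, info] : counts)
            if (info.first > best_count) { best = p; best_count = info.first; }
        if (best_count < 2) break;  // no pair occurs twice: stop

        const int nt = next_symbol++;
        rules.push_back({nt, best.first, best.second});  // new rule: nt -> best pair
        // Replace occurrences left to right, without overlaps.
        std::vector<int> out;
        out.reserve(seq.size());
        for (std::size_t i = 0; i < seq.size();) {
            if (i + 1 < seq.size() && seq[i] == best.first && seq[i + 1] == best.second) {
                out.push_back(nt);
                i += 2;
            } else {
                out.push_back(seq[i]);
                ++i;
            }
        }
        seq = std::move(out);
    }
    return {seq, rules};
}

// Expands a symbol back into terminals; used to check that the grammar is lossless.
void expand(int sym, const std::vector<Rule>& rules, std::vector<int>& out) {
    if (sym < 256) { out.push_back(sym); return; }
    const Rule& r = rules[static_cast<std::size_t>(sym - 256)];
    assert(r.symbol == sym);
    expand(r.left, rules, out);
    expand(r.right, rules, out);
}
```

On the byte values of `abracadabra`, the sketch creates three rules (for `ab`, then `ra`, then the pair of the two new symbols). It shrinks the 11-symbol input to a 5-symbol sequence, and expanding that sequence restores the original bytes.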
The main step needed to apply this compression technique is to convert every image file to BMP format. BMP is a simple format with a low compression rate, which allows the compression algorithm to be efficient, since the redundancy in bitmap pictures favors better compression [11].
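The text does not say how the conversion to BMP is performed. What makes BMP a simple format is that the raw pixel array sits after a fixed header. A minimal sketch of reading that header from the file's bytes follows; it assumes the common 14-byte file header followed by a 40-byte BITMAPINFOHEADER, and an uncompressed image. `BmpInfo`, `read_le`, `parse_bmp_header` and `row_stride` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Header fields of an uncompressed BMP (14-byte file header + 40-byte BITMAPINFOHEADER).
struct BmpInfo {
    std::uint32_t pixel_offset;    // byte offset of the pixel array
    std::int32_t width;
    std::int32_t height;           // > 0 means rows are stored bottom-up
    std::uint16_t bits_per_pixel;  // e.g. 24 for BGR images
    std::uint32_t compression;     // 0 (BI_RGB) means uncompressed
};

// Reads an n-byte little-endian unsigned integer starting at `pos`.
std::uint32_t read_le(const std::string& b, std::size_t pos, int n) {
    std::uint32_t v = 0;
    for (int i = n - 1; i >= 0; --i)
        v = (v << 8) | static_cast<unsigned char>(b[pos + static_cast<std::size_t>(i)]);
    return v;
}

// Parses the header of a BMP file whose bytes were read in binary mode.
std::optional<BmpInfo> parse_bmp_header(const std::string& bytes) {
    if (bytes.size() < 54 || bytes[0] != 'B' || bytes[1] != 'M') return std::nullopt;
    BmpInfo info{};
    info.pixel_offset = read_le(bytes, 10, 4);
    info.width = static_cast<std::int32_t>(read_le(bytes, 18, 4));
    info.height = static_cast<std::int32_t>(read_le(bytes, 22, 4));
    info.bits_per_pixel = static_cast<std::uint16_t>(read_le(bytes, 28, 2));
    info.compression = read_le(bytes, 30, 4);
    return info;
}

// Each pixel row is padded to a multiple of 4 bytes.
std::size_t row_stride(const BmpInfo& info) {
    assert(info.width > 0);
    const auto bits = static_cast<std::size_t>(info.bits_per_pixel) * static_cast<std::size_t>(info.width);
    return (bits + 31) / 32 * 4;
}
```

For a 24-bit image that is 3 pixels wide, each row holds 9 bytes of color data padded to a 12-byte stride. Everything from `pixel_offset` onward is the uncompressed, redundant data that later steps feed to Re-Pair.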
The next step is optional, but to obtain efficient compression with good results we recommend converting the bitmap image to an ASCII file. This can be done with a simple method in any programming language: open the image file in binary [12] mode and read it in chunks of one octet (8 bits). Each octet is then converted to a decimal number, which corresponds to an ASCII [13] code, as the following table shows:
Letter   ASCII code   Binary
a        97           01100001
b        98           01100010
c        99           01100011
d        100          01100100
Table 2: ASCII Table example
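The octet-to-ASCII conversion described above can be sketched in a few lines. This is a minimal sketch; the helper names are hypothetical. It reads a binary stream one octet per character and reproduces the columns of Table 2 using `std::bitset`.

```cpp
#include <bitset>
#include <cassert>
#include <istream>
#include <iterator>
#include <string>

// Reads every octet of a stream opened in binary mode into a string (one char per byte).
std::string read_octets(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}

// Binary form of one octet, as in the "Binary" column of Table 2 ('a' -> "01100001").
std::string octet_to_binary(char c) {
    return std::bitset<8>(static_cast<unsigned char>(c)).to_string();
}

// Decimal value of one octet, as in the "ASCII code" column ('a' -> 97).
int octet_to_decimal(char c) {
    return static_cast<int>(static_cast<unsigned char>(c));
}

// Converts an 8-bit binary string to its ASCII character by casting the decimal value.
char binary_to_ascii(const std::string& bits) {
    assert(bits.size() == 8);
    const unsigned long decimal = std::bitset<8>(bits).to_ulong();  // "01100001" -> 97
    return static_cast<char>(decimal);                              // 97 -> 'a'
}
```

In use, a file would be opened with `std::ifstream file("image.bmp", std::ios::binary)`, where the file name is only a placeholder, and `read_octets(file)` would return the text that is passed to Re-Pair. Octets above 127 are not printable ASCII, but they are still valid input symbols for a byte-oriented compressor.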
Table 2 shows the binary code of the first four lowercase letters of the ASCII table, each represented by an octet of bits. Once an octet is transformed into a decimal number, the corresponding ASCII character is obtained by casting the value; Table 2 therefore shows the results of these conversions.
3. Results
This section briefly describes the results obtained after converting the images with the previous method. Our technique reads the BMP image in zig-zag order, which increases the chance of an efficient result. The following table shows the results of the Re-Pair algorithm executed on BMP images:
Name        Input size (Mb)   Output size (Mb)
hello       4.4               1.5
Ray         1.1               0.4
Lena        2.1               0.9
binary ex   1.0               0.3
Table 3: Re-Pair algorithm on BMP images
Table 3 shows that the size reduction ranges from about 57% (Lena) to 70% (binary ex), and it is highest on monochromatic images. In these images, the redundancy of the chroma pixels generates identical characters during the conversion to ASCII, and these are the best input for the Re-Pair algorithm.
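The text does not define the zig-zag reading mentioned above. One possible interpretation is the JPEG-style diagonal scan, applied here to the whole pixel grid, and the minimal sketch below assumes it; a row-by-row "snake" order would be another reasonable reading. `zigzag_order` and `zigzag_read` are hypothetical names, and the grid is assumed to hold one sample per pixel in row-major order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the (row, col) positions of a rows x cols grid in JPEG-style zig-zag order.
std::vector<std::pair<std::size_t, std::size_t>> zigzag_order(std::size_t rows, std::size_t cols) {
    std::vector<std::pair<std::size_t, std::size_t>> order;
    if (rows == 0 || cols == 0) return order;
    order.reserve(rows * cols);
    for (std::size_t d = 0; d + 1 < rows + cols; ++d) {  // anti-diagonal: row + col == d
        const std::size_t r_min = d >= cols ? d - cols + 1 : 0;
        const std::size_t r_max = std::min(d, rows - 1);
        if (d % 2 == 0) {  // even diagonals run bottom-left -> top-right
            for (std::size_t r = r_max + 1; r-- > r_min;) order.emplace_back(r, d - r);
        } else {           // odd diagonals run top-right -> bottom-left
            for (std::size_t r = r_min; r <= r_max; ++r) order.emplace_back(r, d - r);
        }
    }
    return order;
}

// Reads a row-major grid of samples (one byte per pixel) in zig-zag order.
std::vector<unsigned char> zigzag_read(const std::vector<unsigned char>& pixels,
                                       std::size_t rows, std::size_t cols) {
    assert(pixels.size() == rows * cols);
    std::vector<unsigned char> out;
    out.reserve(pixels.size());
    for (const auto& [r, c] : zigzag_order(rows, cols)) out.push_back(pixels[r * cols + c]);
    return out;
}
```

On a 3×3 grid this visits the row-major indices 0, 1, 3, 6, 4, 2, 5, 7, 8, the same order JPEG uses inside a block. Neighbouring pixels, which are often similar, therefore stay adjacent in the text handed to Re-Pair.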
4. Discussion
The previous section shows good results: this work proposes a method to compress and hide image files using a grammar compression method. Converting an image file to BMP and then to ASCII allows Re-Pair to achieve good final results. There is, however, a minor problem with linear reading of the file: given the structure of the algorithm, a linear reading can hinder the detection of pairs in the text. A new technique to add to these steps is to split the image into three different images, one for each colour channel (RGB). Re-Pair would then be applied to these files using a Discrete Cosine Transform reading, so as to obtain a better classification that accommodates the redundancy in the starting file.
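The proposed channel split can be sketched for 24-bit bitmaps. This is a minimal sketch; it assumes the pixel bytes have already been extracted from the file with the row padding removed. 24-bit BMP stores each pixel as blue, green, red, so the sketch reorders the three planes to R, G, B. `split_bgr` is a hypothetical name, and the DCT-based reading proposed above is not implemented here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Splits interleaved 24-bit BMP pixels (B, G, R per pixel) into three planes: [0]=R, [1]=G, [2]=B.
std::array<std::vector<unsigned char>, 3> split_bgr(const std::vector<unsigned char>& pixels) {
    assert(pixels.size() % 3 == 0);
    std::array<std::vector<unsigned char>, 3> planes;
    for (auto& p : planes) p.reserve(pixels.size() / 3);
    for (std::size_t i = 0; i < pixels.size(); i += 3) {
        planes[2].push_back(pixels[i]);      // blue
        planes[1].push_back(pixels[i + 1]);  // green
        planes[0].push_back(pixels[i + 2]);  // red
    }
    return planes;
}
```

Each plane could then be written to its own file and compressed separately. Within a single channel, neighbouring values tend to repeat more than in the interleaved B, G, R stream, which is the redundancy this proposal aims to exploit.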