Przemysław Skibiński
University of Wrocław
Publications
Featured research published by Przemysław Skibiński.
advances in databases and information systems | 2007
Przemysław Skibiński; Jakub Swacha
This paper describes a new XML compression scheme that offers both high compression ratios and short query response times. Its core is a fully reversible transform that substitutes every word in an XML document using a semi-dynamic dictionary, effectively encodes dictionary indices as well as numbers, dates, and times found in the document, and groups data from the same structural context into individual containers. The results of the conducted tests show that the proposed scheme attains compression ratios rivaling the best available algorithms, together with fast compression, decompression, and query processing.
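The semi-dynamic dictionary idea can be sketched as follows. This is an illustrative reconstruction, not the paper's actual transform: the function names and the escape-byte framing (`\x01`…`\x02`) are assumptions made for the sketch.

```python
import re

def build_dictionary(text, min_len=3, min_count=2):
    """Collect frequent words into a semi-dynamic dictionary; more
    frequent words get smaller (hence shorter-to-encode) indices."""
    counts = {}
    for word in re.findall(r"[A-Za-z]+", text):
        if len(word) >= min_len:
            counts[word] = counts.get(word, 0) + 1
    frequent = [w for w, c in counts.items() if c >= min_count]
    frequent.sort(key=lambda w: -counts[w])
    return {w: i for i, w in enumerate(frequent)}

def transform(text, dictionary):
    """Replace each dictionary word with a short index token.
    Fully reversible given the same dictionary."""
    def repl(m):
        w = m.group(0)
        return "\x01%d\x02" % dictionary[w] if w in dictionary else w
    return re.sub(r"[A-Za-z]+", repl, text)

def inverse(coded, dictionary):
    """Undo the transform by mapping index tokens back to words."""
    inv = {i: w for w, i in dictionary.items()}
    return re.sub(r"\x01(\d+)\x02", lambda m: inv[int(m.group(1))], coded)
```

A real transform would encode indices as compact binary codes rather than ASCII digits, but the round-trip property is the same.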
data compression conference | 2004
Przemysław Skibiński; Szymon Grabowski
This paper presents a PPM variation which combines traditional character-based processing with string matching. Such an approach can effectively handle repetitive data and can be used with practically any algorithm from the PPM family. The algorithm, inspired by its predecessors, PPM* and PPMZ, searches for matching sequences in arbitrarily long, variable-length, deterministic contexts. The experimental results show that the proposed technique may be very useful, especially in combination with relatively low order (up to 8) models, where the compression gains are often significant and the additional memory requirements are moderate.
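The core matching step can be illustrated with a toy predictor: find the longest suffix of the already-processed text that also occurs earlier, and predict the symbol that followed that earlier occurrence. This is a didactic linear scan under assumed names, not the paper's data structure (real PPM* implementations use tries or suffix structures).

```python
def predict_from_match(history, min_order=2):
    """PPM*-style prediction sketch: locate the longest earlier
    occurrence of the current (deterministic) context and return
    the symbol that followed it, or None if no context repeats."""
    n = len(history)
    for order in range(n - 1, min_order - 1, -1):
        ctx = history[-order:]
        pos = history.find(ctx)      # earliest occurrence of this context
        if pos != n - order:         # it occurs before the current suffix
            return history[pos + order]
    return None
```

On `"abcabcab"` the longest repeating suffix is `"abcab"`, whose earlier occurrence was followed by `"c"`, so the sketch predicts `"c"`.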
international conference on experience of designing and applications of cad systems in microelectronics | 2007
Przemysław Skibiński; Szymon Grabowski; Jakub Swacha
The main drawback of the XML format seems to be its verbosity, a key problem especially in the case of large documents. Efficient encoding of XML therefore constitutes an important research issue. In this work, we describe a preprocessing transform meant to be used with popular LZ77-style compressors. We show experimentally that our transform, albeit quite simple, leads to better compression ratios than existing XML-aware compressors. Moreover, it offers high decoding speed, which is often a top priority.
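One simple preprocessing step of this kind is to map verbose tag names to single-byte codes before handing the text to an LZ77-style back-end (zlib stands in for gzip here). This is a minimal sketch under assumed names, not the paper's transform; a real transform would avoid code bytes that can occur in the input instead of the bare control bytes used below.

```python
import re
import zlib

def xml_tag_transform(xml):
    """Map each distinct XML tag name to a one-byte code so the
    LZ77 back-end sees short, regular tokens instead of long names."""
    tags = sorted(set(re.findall(r"</?([A-Za-z][\w.-]*)", xml)))
    assert len(tags) < 32, "sketch supports a small tag set only"
    code = {t: chr(1 + i) for i, t in enumerate(tags)}
    coded = re.sub(r"(</?)([A-Za-z][\w.-]*)",
                   lambda m: m.group(1) + code[m.group(2)], xml)
    return coded, tags

def xml_tag_inverse(coded, tags):
    """Undo the mapping; safe here because control bytes never
    occur in clean XML text (a real scheme would escape them)."""
    for i, t in enumerate(tags):
        coded = coded.replace(chr(1 + i), t)
    return coded
```

After the transform, `zlib.compress(coded.encode())` plays the role of the LZ77-style compressor; decoding is just decompression followed by the cheap inverse mapping, which is why decoding speed stays high.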
Information Sciences | 2006
Przemysław Skibiński
In the following paper we propose a modification of Prediction by Partial Matching (PPM), a lossless data compression algorithm, which extends the alphabet used in the PPM method to long repeated strings. Usually the PPM algorithm's alphabet consists of only 256 characters. We show, on the basis of the Calgary corpus [T.C. Bell, J. Cleary, I.H. Witten, Text Compression, Advanced Reference Series, Prentice Hall, Englewood Cliffs, New Jersey, 1990], that for ordinary files such a modification improves the compression performance at lower orders (not greater than 10). However, for some kinds of files, this modification gives much better compression performance than any known lossless data compression algorithm.
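The alphabet-extension idea can be sketched by promoting frequently repeated long substrings to single "super-symbols" alongside the 256 byte values, then tokenizing the input over this extended alphabet. The selection heuristic and fixed substring length below are illustrative assumptions, not the paper's method.

```python
from collections import Counter

def frequent_long_strings(data, length=8, min_count=4, limit=16):
    """Find fixed-length substrings repeated often enough to be
    promoted to single symbols of an extended alphabet."""
    counts = Counter(data[i:i + length]
                     for i in range(len(data) - length + 1))
    return [s for s, c in counts.most_common() if c >= min_count][:limit]

def tokenize(data, strings):
    """Greedy left-to-right tokenization: prefer a super-symbol
    match, otherwise emit a single character."""
    out, i = [], 0
    while i < len(data):
        for s in strings:
            if data.startswith(s, i):
                out.append(s)
                i += len(s)
                break
        else:
            out.append(data[i])
            i += 1
    return out
```

Feeding such tokens to a PPM model means one prediction step covers a whole repeated string, which is where the gains on repetitive files come from.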
conference on current trends in theory and practice of informatics | 2008
Przemysław Skibiński; Jakub Swacha; Szymon Grabowski
Contemporary XML documents can be tens of megabytes long, and reducing their size, thus allowing them to be transferred faster, is a significant advantage for their users. In this paper, we describe a new XML compression scheme which outperforms the previous state-of-the-art algorithm, SCMPPM, by over 9% on average in compression ratio, while offering the practical feature of streamlined decompression and being almost twice as fast in decompression. Applying the scheme can significantly reduce transmission time and bandwidth usage for XML documents published on the Web. The proposed scheme is based on a semi-dynamic dictionary of the most frequent words in the document (in both the markup and the content), automatic detection and compact encoding of numbers and specific patterns (such as dates or IP addresses), and a back-end PPM coding variant tailored to efficiently handle long matching sequences. Moreover, we show that the compression ratio can be improved by an additional 9% at the price of a significant slow-down.
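The "compact encoding of numbers" component can be illustrated with a length-prefixed binary integer encoding: a decimal like `1234567890` takes 10 ASCII bytes but only 5 bytes here. This is a sketch of the general idea under assumed framing, not the paper's exact format; a real transform must also escape leading zeros, signs, and the prefix byte itself.

```python
def encode_int(n):
    """Encode a non-negative integer as one length byte followed by
    its big-endian bytes; shorter than decimal ASCII for large values."""
    body = n.to_bytes((n.bit_length() + 7) // 8 or 1, "big")
    return bytes([len(body)]) + body

def decode_int(buf):
    """Decode the value and report how many bytes were consumed."""
    k = buf[0]
    return int.from_bytes(buf[1:1 + k], "big"), 1 + k
```

Dates and IP addresses get analogous treatment: once a regular pattern is recognized, its fields can be packed into a few fixed bytes instead of their textual form.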
data compression conference | 2005
Przemysław Skibiński
Summary form only given. The basic idea of preprocessing is to transform the text into an intermediate form which can be used as input to any existing general-purpose compressor and compressed more efficiently. Dictionary-based preprocessing is based on the notion of replacing whole words with shorter codes. We present a dictionary-based preprocessing technique and its implementation called TWRT (two-level word replacing transformation). Our preprocessor uses several dictionaries and classifies files into various kinds. The first-level dictionaries (small dictionaries) are specific to some kind of data (e.g., a programming language, references). The second-level dictionaries (large dictionaries) are specific to natural languages (e.g., English, Russian, French). On the Calgary corpus, TWRT improves the compression performance of bzip2 by over 7% and of PPMonstr by about 6% on average. Even for PAQ6, currently the top compressor, the gain is a significant 5%. On multilingual text files, TWRT improves the compression performance of bzip2, PPMonstr, and PAQ6 by about 8%. Moreover, TWRT improves compression speed with PAQ6 and, on larger files, with PPMonstr.
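The two-level lookup can be sketched as: try the small domain dictionary first, fall back to the large language dictionary, and keep the two code spaces apart with distinct escape bytes so the replacement is reversible. The escape bytes and function names are assumptions for the sketch, not TWRT's actual encoding.

```python
def two_level_replace(words, small_dict, large_dict):
    """Replace each word via the small (domain) dictionary when
    possible, else the large (language) dictionary, else keep it."""
    out = []
    for w in words:
        if w in small_dict:
            out.append("\x01%d" % small_dict[w])
        elif w in large_dict:
            out.append("\x02%d" % large_dict[w])
        else:
            out.append(w)
    return out

def two_level_inverse(tokens, small_dict, large_dict):
    """Invert the replacement; the escape byte selects the dictionary."""
    inv1 = {i: w for w, i in small_dict.items()}
    inv2 = {i: w for w, i in large_dict.items()}
    return [inv1[int(t[1:])] if t.startswith("\x01")
            else inv2[int(t[1:])] if t.startswith("\x02")
            else t for t in tokens]
```

Because the small dictionary is tried first, domain-specific words get the shortest codes, which is the point of the two-level design.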
web information systems engineering | 2009
Przemysław Skibiński
The verbosity of the Hypertext Markup Language (HTML) remains one of its main weaknesses. This problem can be addressed with the aid of HTML-specialized compression algorithms. In this work, we describe a visually lossless HTML transform that, combined with generally used compression algorithms, makes it possible to attain high compression ratios. Its core is a transform featuring substitution of words in an HTML document using a static English dictionary, and effective encoding of dictionary indexes, numbers, and specific patterns. Visually lossless compression means that the HTML document layout may be modified, but the document displayed in a browser will be rendered exactly as the original. The experimental results show that the proposed transform improves the HTML compression efficiency of general-purpose compressors on average by 21% in the case of gzip, achieving comparable processing speed. Moreover, we show that the compression ratio of gzip can be improved by up to 32% at the price of higher memory requirements and much slower processing.
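One ingredient of a visually lossless transform is whitespace normalization: browsers collapse whitespace runs outside `<pre>` anyway, so the bytes can be rewritten freely there without changing what is displayed. This sketch shows only that ingredient, under assumed names; the paper's transform additionally applies the static dictionary and pattern encodings, and a real tool would also handle `<textarea>`, CSS, and attributes.

```python
import re

def visually_lossless_ws(html):
    """Collapse whitespace runs to a single space outside <pre>
    blocks; rendering is unchanged, but the byte stream becomes
    more regular and thus compresses better."""
    parts = re.split(r"(<pre>.*?</pre>)", html, flags=re.S | re.I)
    for i in range(0, len(parts), 2):   # even indices lie outside <pre>
        parts[i] = re.sub(r"\s+", " ", parts[i])
    return "".join(parts)
```

The layout of the source changes (so the transform is not byte-lossless), yet the browser output is identical, matching the "visually lossless" contract described above.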
Software - Practice and Experience | 2008
Przemysław Skibiński; Szymon Grabowski; Jakub Swacha
Information Technology and Libraries | 2009
Przemysław Skibiński; Jakub Swacha