Marek Smieja
Jagiellonian University
Publications
Featured research published by Marek Smieja.
IEEE Transactions on Information Theory | 2012
Marek Smieja; Jacek Tabor
Suppose that we are given two sources $S_1$, $S_2$ which both send us information from the data space $X$. We assume that information coming from $S_1$ and $S_2$ is lossy-coded with the same maximal error but with different alphabets $P_1$ and $P_2$, respectively. Consider a new source $S$ which sends a signal produced by $S_1$ with probability $a_1$ and by $S_2$ with probability $a_2 = 1 - a_1$. We provide a simple greedy algorithm which constructs a coding alphabet $P$ that encodes data from $S$ with the same maximal error as the single sources and whose entropy $h(S;P)$ satisfies $$h(S;P) \le a_1 h(S_1;P_1) + a_2 h(S_2;P_2) + 1.$$ The basic role in the proof of this formula is played by a new, equivalent definition of entropy based on measures instead of partitions. As a consequence, we decompose the entropy dimension of a mixture of sources as the convex combination of the entropy dimensions of the single sources. For probability measures on $\mathbb{R}^N$, this allows us to link the upper local dimension at a point with the upper entropy dimension of a measure via an improved version of the Young estimate.
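The extra bit in the bound has a simple interpretation: one bit suffices to tell the decoder which source's code is in use. In the special case where the alphabets are disjoint and the combined alphabet is $P = P_1 \cup P_2$, this is exactly the grouping property of Shannon entropy (a sketch of the intuition only, not the paper's proof, which must handle overlapping alphabets via the greedy construction): $$h(S;P) = a_1 h(S_1;P_1) + a_2 h(S_2;P_2) + H_b(a_1), \qquad H_b(a_1) = -a_1 \log_2 a_1 - a_2 \log_2 a_2 \le 1.$$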
IEEE International Conference on Data Science and Advanced Analytics | 2015
Marek Smieja; Jacek Tabor
The Gaussian mixture model is very useful in many practical problems. Nevertheless, it cannot be directly generalized to non-Euclidean spaces. To overcome this problem, we present a spherical Gaussian-based clustering approach for partitioning data sets with respect to an arbitrary dissimilarity measure. The proposed method combines spherical Cross-Entropy Clustering with a generalized Ward's approach. The algorithm finds the optimal number of clusters by automatically removing groups which carry no information. Moreover, it is scale invariant and allows for spherically shaped clusters of arbitrary sizes. In order to graphically represent and interpret the results, the notion of a Voronoi diagram is generalized to non-Euclidean spaces and applied to the introduced clustering method.
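To make the cost function concrete, below is a minimal sketch of a spherical cross-entropy clustering criterion of the kind the abstract describes. The function name, the Euclidean special case, and the weight threshold `eps` are illustrative assumptions, not the paper's exact algorithm (which works with an arbitrary dissimilarity measure):

```python
import numpy as np

def spherical_cec_cost(X, labels, eps=0.05):
    """Spherical cross-entropy clustering cost of a hard partition.

    Hedged sketch: each cluster i with weight p_i and spherical
    variance s_i^2 contributes
        p_i * (-ln p_i + N/2 * ln(2*pi*e*s_i^2)),
    and, as in CEC-style methods, a cluster whose weight drops below
    `eps` is treated as carrying no information and flagged for removal.
    """
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    n, N = X.shape
    cost, keep = 0.0, []
    for k in np.unique(labels):
        Xk = X[labels == k]
        p = len(Xk) / n
        if p < eps:                      # negligible cluster -> remove
            continue
        m = Xk.mean(axis=0)
        s2 = max(((Xk - m) ** 2).sum() / (len(Xk) * N), 1e-12)
        cost += p * (-np.log(p) + 0.5 * N * np.log(2 * np.pi * np.e * s2))
        keep.append(k)
    return cost, keep

# Usage: a tiny satellite group falls below the weight threshold.
X = np.vstack([np.random.randn(200, 2), np.random.randn(5, 2) + 8])
labels = np.array([0] * 200 + [1] * 5)
cost, keep = spherical_cec_cost(X, labels)   # keep == [0]; cluster 1 removed
```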
Science and Information Conference | 2014
Marek Smieja; Jacek Tabor
The Rényi entropy dimension describes the rate of growth of the coding cost in lossy data compression when the cost of coding depends exponentially on the code length. In this paper we generalize Csiszár's estimate of the Rényi entropy dimension of a mixture of measures to the case of a general probability metric space. This result determines the cost of encoding information which comes from combined sources, assuming its exponential growth. Our proof relies on an equivalent definition of the Rényi entropy in weighted form, which makes the entropy of a mixture of measures easier to compute.
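For reference, a standard form of the Rényi entropy of order $\alpha \ne 1$ of a measure $\mu$ with respect to a countable partition $\mathcal{P}$, and the resulting entropy dimension, can be written as follows (the paper's weighted formulation may differ in detail): $$H_\alpha(\mu;\mathcal{P}) = \frac{1}{1-\alpha}\log_2 \sum_{A \in \mathcal{P}} \mu(A)^\alpha, \qquad \dim_\alpha(\mu) = \lim_{\varepsilon \to 0} \frac{\inf\{H_\alpha(\mu;\mathcal{P}) : \operatorname{diam}(\mathcal{P}) \le \varepsilon\}}{\log_2(1/\varepsilon)},$$ when the limit exists; the upper and lower dimensions are obtained from $\limsup$ and $\liminf$.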
International Joint Conference on Neural Networks | 2016
Marek Smieja; Szymon Nakoneczny; Jacek Tabor
We introduce Sparse Entropy Clustering (SEC), which uses a minimum-entropy criterion to split high-dimensional binary vectors into groups. The idea is based on the analogy between clustering and data compression: every group is represented by a single encoder which provides its optimal compression. Following the Minimum Description Length principle, the clustering criterion includes the cost of encoding the elements within clusters as well as the cost of cluster identification. The proposed model is adapted to the sparse structure of the data: instead of encoding all coordinates, only the non-zero ones are stored, which significantly reduces the computational cost of data processing. Our theoretical and experimental analysis shows that SEC works well with imbalanced data, minimizes the average entropy within clusters, and is able to select the correct number of clusters.
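The two-part code the abstract refers to can be written down directly. The sketch below is a hedged illustration of an MDL-style clustering cost for binary data, with per-cluster Bernoulli encoders and a $-\log_2$ cluster-identification term; it is not the exact SEC objective, and `mdl_cost` is a hypothetical helper name:

```python
import numpy as np

def mdl_cost(X, labels):
    """Two-part MDL cost (in bits) of a hard clustering of binary data.

    Each cluster is modelled by per-coordinate Bernoulli rates (its
    "encoder"), and every point additionally pays -log2 of its
    cluster's empirical prior to identify which encoder was used.
    """
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    n = X.shape[0]
    total = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        nk = Xk.shape[0]
        p = np.clip(Xk.mean(axis=0), 1e-12, 1 - 1e-12)   # Bernoulli rates
        within = -(Xk * np.log2(p) + (1 - Xk) * np.log2(1 - p)).sum()
        ident = -nk * np.log2(nk / n)    # cost of cluster identification
        total += within + ident
    return total
```

Searching over assignments for the lowest total cost then selects the number of clusters automatically: an unnecessary cluster inflates the identification term without reducing the within-cluster encoding cost.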
arXiv: Learning | 2016
Lukasz Struski; Marek Smieja; Jacek Tabor
arXiv: Information Theory | 2012
Marek Smieja; Jacek Tabor
arXiv: Information Theory | 2012
Marek Smieja; Jacek Tabor
arXiv: Computer Vision and Pattern Recognition | 2018
Bartosz Zieliński; Lukasz Struski; Marek Smieja; Jacek Tabor
Archive | 2017
Lukasz Struski; Marek Smieja; Jacek Tabor
Archive | 2017
Marek Smieja; Krzysztof Hajto; Jacek Tabor