
Publications


Featured research published by Marek Smieja.


IEEE Transactions on Information Theory | 2012

Entropy of the Mixture of Sources and Entropy Dimension

Marek Smieja; Jacek Tabor

Suppose that we are given two sources <i>S</i><sub>1</sub> and <i>S</i><sub>2</sub> which both send us information from the data space <i>X</i>. We assume that we lossy-code the information coming from <i>S</i><sub>1</sub> and <i>S</i><sub>2</sub> with the same maximal error but with different alphabets <i>P</i><sub>1</sub> and <i>P</i><sub>2</sub>, respectively. Consider a new source <i>S</i> which sends a signal produced by source <i>S</i><sub>1</sub> with probability <i>a</i><sub>1</sub> and by source <i>S</i><sub>2</sub> with probability <i>a</i><sub>2</sub> = 1 - <i>a</i><sub>1</sub>. We provide a simple greedy algorithm which constructs a coding alphabet <i>P</i> that encodes data from <i>S</i> with the same maximal error as the single sources, such that the entropy <i>h</i>(<i>S</i>;<i>P</i>) satisfies: <i>h</i>(<i>S</i>;<i>P</i>) ≤ <i>a</i><sub>1</sub><i>h</i>(<i>S</i><sub>1</sub>;<i>P</i><sub>1</sub>) + <i>a</i><sub>2</sub><i>h</i>(<i>S</i><sub>2</sub>;<i>P</i><sub>2</sub>) + 1. In the proof of this formula, the basic role is played by a new equivalent definition of entropy based on measures instead of partitions. As a consequence, we decompose the entropy dimension of a mixture of sources as a convex combination of the entropy dimensions of the single sources. In the case of probability measures in ℝ<sup>N</sup>, this allows us to link the upper local dimension at a point with the upper entropy dimension of a measure via an improved version of the Young estimate.
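The "+1" overhead in the bound accounts for encoding which source produced each signal. In the simplest discrete setting, where both sources are coded with one common partition, the bound reduces to the classical concavity estimate for Shannon entropy and can be checked numerically (a toy sketch under that simplification, not the paper's greedy construction):

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Two sources over the same four-symbol alphabet.
p1 = [0.5, 0.25, 0.25, 0.0]
p2 = [0.1, 0.1, 0.4, 0.4]
a1, a2 = 0.3, 0.7

# Mixture source: emit from p1 with probability a1, from p2 with a2.
mix = [a1 * x + a2 * y for x, y in zip(p1, p2)]

# h(S) <= a1*h(S1) + a2*h(S2) + h(a1, a2), and the binary entropy
# h(a1, a2) is at most 1 bit -- hence the "+1" in the general bound.
assert entropy(mix) <= a1 * entropy(p1) + a2 * entropy(p2) + entropy([a1, a2])
assert entropy([a1, a2]) <= 1.0
```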


IEEE International Conference on Data Science and Advanced Analytics | 2015

Spherical Wards clustering and generalized Voronoi diagrams

Marek Smieja; Jacek Tabor

The Gaussian mixture model is very useful in many practical problems; nevertheless, it cannot be directly generalized to non-Euclidean spaces. To overcome this problem we present a spherical Gaussian-based clustering approach for partitioning data sets with respect to an arbitrary dissimilarity measure. The proposed method combines spherical Cross-Entropy Clustering with a generalized Wards approach. The algorithm finds the optimal number of clusters by automatically removing groups which carry no information. Moreover, it is scale invariant and allows spherically-shaped clusters of arbitrary sizes to form. In order to graphically represent and interpret the results, the notion of a Voronoi diagram is generalized to non-Euclidean spaces and applied to the introduced clustering method.
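The Ward-style merging step can be sketched with a spherical Gaussian cross-entropy cost of the form used in Cross-Entropy Clustering (an illustrative reconstruction, not the paper's exact criterion; `spherical_cost` and `merge_delta` are hypothetical names). A merge is favoured when it decreases the total cost:

```python
import math

def spherical_cost(points, n_total):
    """Cross-entropy cost of one cluster under a spherical Gaussian model
    (illustrative form; the paper's exact criterion may differ)."""
    n, dim = len(points), len(points[0])
    p = n / n_total
    mean = [sum(x[d] for x in points) / n for d in range(dim)]
    var = sum((x[d] - mean[d]) ** 2 for x in points for d in range(dim)) / (n * dim)
    return p * (-math.log(p) + dim / 2 * math.log(2 * math.pi * math.e * var))

def merge_delta(c1, c2, n_total):
    """Change in total cost caused by merging clusters c1 and c2."""
    return (spherical_cost(c1 + c2, n_total)
            - spherical_cost(c1, n_total) - spherical_cost(c2, n_total))

c1 = [(0.0, 0.0), (1.0, 0.0)]
c2 = [(0.0, 0.5), (1.0, 0.5)]      # close to c1
c3 = [(10.0, 10.0), (11.0, 10.0)]  # far from c1
n = 6
# A Ward-style step merges the pair with the smallest cost increase:
assert merge_delta(c1, c2, n) < merge_delta(c1, c3, n)
```

Because the cost weights each cluster by its probability, a cluster whose contribution never pays for its identification cost can be removed outright, which is how the optimal number of clusters emerges.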


Science and Information Conference | 2014

Rényi entropy dimension of the mixture of measures

Marek Smieja; Jacek Tabor

The Rényi entropy dimension describes the rate of growth of the coding cost in lossy data compression when the code length depends exponentially on the cost of coding. In this paper we generalize Csiszár's estimate of the Rényi entropy dimension of a mixture of measures to the case of a general probability metric space. This result determines the cost of encoding information which comes from combined sources, assuming its exponential growth. Our proof relies on an equivalent definition of the Rényi entropy in weighted form, which makes it convenient to calculate the entropy of a mixture of measures.
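For reference, the discrete form of the Rényi entropy illustrates the exponential weighting of probabilities (the paper works with general probability metric spaces; this is only the textbook discrete case):

```python
import math

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (bits) of a discrete distribution;
    alpha = 1 recovers the Shannon entropy as a limit."""
    if alpha == 1.0:
        return -sum(x * math.log2(x) for x in p if x > 0)
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)

p = [0.5, 0.25, 0.125, 0.125]
# H_alpha is non-increasing in alpha and tends to Shannon entropy as alpha -> 1.
assert renyi_entropy(p, 0.5) >= renyi_entropy(p, 1.0) >= renyi_entropy(p, 2.0)
assert abs(renyi_entropy(p, 1.0001) - renyi_entropy(p, 1.0)) < 1e-2
```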


International Joint Conference on Neural Networks | 2016

Fast Entropy Clustering of sparse high dimensional binary data.

Marek Smieja; Szymon Nakoneczny; Jacek Tabor

We introduce Sparse Entropy Clustering (SEC), which uses a minimum-entropy criterion to split high-dimensional binary vectors into groups. The idea is based on the analogy between clustering and data compression: every group is represented by a single encoder which provides its optimal compression. Following the Minimum Description Length principle, the clustering criterion function includes the cost of encoding the elements within clusters as well as the cost of cluster identification. The proposed model is adapted to the sparse structure of the data: instead of encoding all coordinates, only the non-zero ones are remembered, which significantly reduces the computational cost of data processing. Our theoretical and experimental analysis shows that SEC works well with imbalanced data, minimizes the average entropy within clusters, and is able to select the correct number of clusters.
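An MDL-style criterion of this kind can be sketched for small dense binary data as follows (a simplified toy version: it codes every coordinate with a per-cluster Bernoulli model rather than only the non-zero ones, and `mdl_cost` is an illustrative name, not the paper's notation):

```python
import math

def h2(p):
    """Binary entropy (bits) of a Bernoulli parameter."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mdl_cost(clusters):
    """Average bits per vector: cluster-identification cost plus the cost of
    coding each coordinate with a per-cluster Bernoulli model."""
    n = sum(len(c) for c in clusters)
    cost = 0.0
    for c in clusters:
        w = len(c) / n                  # probability of the cluster
        cost += w * -math.log2(w)       # cost of identifying the cluster
        for j in range(len(c[0])):
            q = sum(v[j] for v in c) / len(c)
            cost += w * h2(q)           # coding cost of coordinate j
    return cost

pure = [[(1, 0), (1, 0)], [(0, 1), (0, 1)]]
mixed = [[(1, 0), (1, 0), (0, 1), (0, 1)]]
# Splitting homogeneous groups yields a shorter total description:
assert mdl_cost(pure) < mdl_cost(mixed)
```

The identification term penalizes spurious splits, so minimizing the total cost trades compression within clusters against the number of clusters, which is how the criterion selects the cluster count.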


arXiv: Learning | 2016

Incomplete data representation for SVM classification.

Lukasz Struski; Marek Smieja; Jacek Tabor


arXiv: Information Theory | 2012

Partition Reduction for Lossy Data Compression Problem

Marek Smieja; Jacek Tabor


arXiv: Information Theory | 2012

Weighted Approach to Rényi Entropy

Marek Smieja; Jacek Tabor


arXiv: Computer Vision and Pattern Recognition | 2018

Cascade context encoder for improved inpainting.

Bartosz Zieliński; Lukasz Struski; Marek Smieja; Jacek Tabor


Archive | 2017

Pointed subspace approach to incomplete data.

Lukasz Struski; Marek Smieja; Jacek Tabor


Archive | 2017

Efficient mixture model for clustering of sparse high dimensional binary data.

Marek Smieja; Krzysztof Hajto; Jacek Tabor

Collaboration


Dive into Marek Smieja's collaborations.

Top Co-Authors

Jacek Tabor

Jagiellonian University
