
Publications


Featured research published by Marek Smieja.


IEEE Transactions on Information Theory | 2012

Entropy of the Mixture of Sources and Entropy Dimension

Marek Smieja; Jacek Tabor

Suppose that we are given two sources <i>S</i><sub>1</sub> and <i>S</i><sub>2</sub> which both send us information from the data space <i>X</i>. We assume that we lossy-code the information coming from <i>S</i><sub>1</sub> and <i>S</i><sub>2</sub> with the same maximal error but with different alphabets <i>P</i><sub>1</sub> and <i>P</i><sub>2</sub>, respectively. Consider a new source <i>S</i> which sends a signal produced by source <i>S</i><sub>1</sub> with probability <i>a</i><sub>1</sub> and by source <i>S</i><sub>2</sub> with probability <i>a</i><sub>2</sub> = 1 - <i>a</i><sub>1</sub>. We provide a simple greedy algorithm which constructs a coding alphabet <i>P</i> that encodes data from <i>S</i> with the same maximal error as the single sources, such that the entropy <i>h</i>(<i>S</i>;<i>P</i>) satisfies: <i>h</i>(<i>S</i>;<i>P</i>) ≤ <i>a</i><sub>1</sub><i>h</i>(<i>S</i><sub>1</sub>;<i>P</i><sub>1</sub>) + <i>a</i><sub>2</sub><i>h</i>(<i>S</i><sub>2</sub>;<i>P</i><sub>2</sub>) + 1. In the proof of this formula, the basic role is played by a new equivalent definition of entropy based on measures instead of partitions. As a consequence, we decompose the entropy dimension of a mixture of sources as a convex combination of the entropy dimensions of the single sources. In the case of probability measures in ℝ<sup>N</sup>, this allows us to link the upper local dimension at a point with the upper entropy dimension of a measure via an improved version of the Young estimate.
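The "+1" overhead in the bound accounts for encoding which source produced each signal. In the simplest discrete setting, where both sources are coded with one common partition, the bound reduces to the classical concavity estimate for Shannon entropy and can be checked numerically (a toy sketch under that simplification, not the paper's greedy construction):

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Two sources over the same four-symbol alphabet.
p1 = [0.5, 0.25, 0.25, 0.0]
p2 = [0.1, 0.1, 0.4, 0.4]
a1, a2 = 0.3, 0.7

# Mixture source: emit from p1 with probability a1, from p2 with a2.
mix = [a1 * x + a2 * y for x, y in zip(p1, p2)]

# h(S) <= a1*h(S1) + a2*h(S2) + h(a1, a2), and the binary entropy
# h(a1, a2) is at most 1 bit -- hence the "+1" in the general bound.
assert entropy(mix) <= a1 * entropy(p1) + a2 * entropy(p2) + entropy([a1, a2])
assert entropy([a1, a2]) <= 1.0
```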


IEEE International Conference on Data Science and Advanced Analytics | 2015

Spherical Wards clustering and generalized Voronoi diagrams

Marek Smieja; Jacek Tabor

The Gaussian mixture model is very useful in many practical problems; nevertheless, it cannot be directly generalized to non-Euclidean spaces. To overcome this problem we present a spherical Gaussian-based clustering approach for partitioning data sets with respect to an arbitrary dissimilarity measure. The proposed method combines spherical Cross-Entropy Clustering with a generalized Wards approach. The algorithm finds the optimal number of clusters by automatically removing groups which carry no information. Moreover, it is scale invariant and allows spherically-shaped clusters of arbitrary sizes to form. In order to graphically represent and interpret the results, the notion of a Voronoi diagram is generalized to non-Euclidean spaces and applied to the introduced clustering method.
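The Ward-style merging step can be sketched with a spherical Gaussian cross-entropy cost of the form used in Cross-Entropy Clustering (an illustrative reconstruction, not the paper's exact criterion; `spherical_cost` and `merge_delta` are hypothetical names). A merge is favoured when it decreases the total cost:

```python
import math

def spherical_cost(points, n_total):
    """Cross-entropy cost of one cluster under a spherical Gaussian model
    (illustrative form; the paper's exact criterion may differ)."""
    n, dim = len(points), len(points[0])
    p = n / n_total
    mean = [sum(x[d] for x in points) / n for d in range(dim)]
    var = sum((x[d] - mean[d]) ** 2 for x in points for d in range(dim)) / (n * dim)
    return p * (-math.log(p) + dim / 2 * math.log(2 * math.pi * math.e * var))

def merge_delta(c1, c2, n_total):
    """Change in total cost caused by merging clusters c1 and c2."""
    return (spherical_cost(c1 + c2, n_total)
            - spherical_cost(c1, n_total) - spherical_cost(c2, n_total))

c1 = [(0.0, 0.0), (1.0, 0.0)]
c2 = [(0.0, 0.5), (1.0, 0.5)]      # close to c1
c3 = [(10.0, 10.0), (11.0, 10.0)]  # far from c1
n = 6
# A Ward-style step merges the pair with the smallest cost increase:
assert merge_delta(c1, c2, n) < merge_delta(c1, c3, n)
```

Because the cost weights each cluster by its probability, a cluster whose contribution never pays for its identification cost can be removed outright, which is how the optimal number of clusters emerges.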


Science and Information Conference | 2014

Rényi entropy dimension of the mixture of measures

Marek Smieja; Jacek Tabor

The Rényi entropy dimension describes the rate of growth of the coding cost in lossy data compression when the code length depends exponentially on the cost of coding. In this paper we generalize Csiszár's estimate of the Rényi entropy dimension of a mixture of measures to the case of a general probability metric space. This result determines the cost of encoding information which comes from combined sources, assuming its exponential growth. Our proof relies on an equivalent definition of the Rényi entropy in weighted form, which makes it convenient to calculate the entropy of a mixture of measures.
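For reference, the discrete form of the Rényi entropy illustrates the exponential weighting of probabilities (the paper works with general probability metric spaces; this is only the textbook discrete case):

```python
import math

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (bits) of a discrete distribution;
    alpha = 1 recovers the Shannon entropy as a limit."""
    if alpha == 1.0:
        return -sum(x * math.log2(x) for x in p if x > 0)
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)

p = [0.5, 0.25, 0.125, 0.125]
# H_alpha is non-increasing in alpha and tends to Shannon entropy as alpha -> 1.
assert renyi_entropy(p, 0.5) >= renyi_entropy(p, 1.0) >= renyi_entropy(p, 2.0)
assert abs(renyi_entropy(p, 1.0001) - renyi_entropy(p, 1.0)) < 1e-2
```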


International Joint Conference on Neural Networks | 2016

Fast Entropy Clustering of sparse high dimensional binary data.

Marek Smieja; Szymon Nakoneczny; Jacek Tabor

We introduce Sparse Entropy Clustering (SEC), which uses a minimum-entropy criterion to split high-dimensional binary vectors into groups. The idea is based on the analogy between clustering and data compression: every group is represented by a single encoder which provides its optimal compression. Following the Minimum Description Length principle, the clustering criterion function includes the cost of encoding the elements within clusters as well as the cost of cluster identification. The proposed model is adapted to the sparse structure of the data: instead of encoding all coordinates, only the non-zero ones are remembered, which significantly reduces the computational cost of data processing. Our theoretical and experimental analysis shows that SEC works well with imbalanced data, minimizes the average entropy within clusters, and is able to select the correct number of clusters.
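An MDL-style criterion of this kind can be sketched for small dense binary data as follows (a simplified toy version: it codes every coordinate with a per-cluster Bernoulli model rather than only the non-zero ones, and `mdl_cost` is an illustrative name, not the paper's notation):

```python
import math

def h2(p):
    """Binary entropy (bits) of a Bernoulli parameter."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mdl_cost(clusters):
    """Average bits per vector: cluster-identification cost plus the cost of
    coding each coordinate with a per-cluster Bernoulli model."""
    n = sum(len(c) for c in clusters)
    cost = 0.0
    for c in clusters:
        w = len(c) / n                  # probability of the cluster
        cost += w * -math.log2(w)       # cost of identifying the cluster
        for j in range(len(c[0])):
            q = sum(v[j] for v in c) / len(c)
            cost += w * h2(q)           # coding cost of coordinate j
    return cost

pure = [[(1, 0), (1, 0)], [(0, 1), (0, 1)]]
mixed = [[(1, 0), (1, 0), (0, 1), (0, 1)]]
# Splitting homogeneous groups yields a shorter total description:
assert mdl_cost(pure) < mdl_cost(mixed)
```

The identification term penalizes spurious splits, so minimizing the total cost trades compression within clusters against the number of clusters, which is how the criterion selects the cluster count.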


arXiv: Learning | 2016

Incomplete data representation for SVM classification.

Lukasz Struski; Marek Smieja; Jacek Tabor


arXiv: Information Theory | 2012

Partition Reduction for Lossy Data Compression Problem

Marek Smieja; Jacek Tabor


arXiv: Information Theory | 2012

Weighted Approach to Rényi Entropy

Marek Smieja; Jacek Tabor


arXiv: Computer Vision and Pattern Recognition | 2018

Cascade context encoder for improved inpainting.

Bartosz Zieliński; Lukasz Struski; Marek Smieja; Jacek Tabor


Archive | 2017

Pointed subspace approach to incomplete data.

Lukasz Struski; Marek Smieja; Jacek Tabor


Archive | 2017

Efficient mixture model for clustering of sparse high dimensional binary data.

Marek Smieja; Krzysztof Hajto; Jacek Tabor

Collaboration


Dive into Marek Smieja's collaborations.

Top Co-Authors

Jacek Tabor

Jagiellonian University
