Amichai Painsky
Tel Aviv University
Publications
Featured research published by Amichai Painsky.
IEEE Transactions on Information Theory | 2016
Amichai Painsky; Saharon Rosset; Meir Feder
Independent component analysis (ICA) is a statistical method for transforming an observable multidimensional random vector into components that are as statistically independent as possible from each other. Usually the ICA framework assumes a model according to which the observations are generated (such as a linear transformation with additive noise). ICA over finite fields is a special case of ICA in which both the observations and the independent components are over a finite alphabet. In this work we consider a generalization of this framework in which an observation vector is decomposed into its independent components (as much as possible) with no prior assumption on the way it was generated. This generalization is also known as Barlow's minimal redundancy representation problem and is considered an open problem. We propose several theorems and show that this NP-hard problem can be accurately solved with a branch-and-bound search tree algorithm, or tightly approximated with a series of linear programs. Our contribution provides the first efficient and constructive set of solutions to Barlow's problem. The minimal redundancy representation (also known as a factorial code) has many applications, mainly in the fields of neural networks and deep learning. Binary ICA (BICA) is also shown to have applications in several domains, including medical diagnosis, multi-cluster assignment, network tomography and internet resource management. In this work we show that this formulation further applies to multiple disciplines in source coding, such as predictive coding, distributed source coding and coding of large alphabet sources.
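The redundancy that a minimal redundancy (factorial) representation removes can be made concrete with a small sketch. The snippet below (illustrative helper names, not the authors' code) estimates the multi-information sum_i H(X_i) - H(X), the quantity that a factorial code drives toward zero:

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Empirical Shannon entropy (bits) of a list of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in Counter(samples).values())

def total_correlation(vectors):
    """sum_i H(X_i) - H(X): the redundancy a factorial code eliminates."""
    marginal_sum = sum(entropy(list(dim)) for dim in zip(*vectors))
    return marginal_sum - entropy([tuple(v) for v in vectors])

# Two perfectly correlated bits carry one bit of redundancy.
xs = [(0, 0), (1, 1), (0, 0), (1, 1)]
print(round(total_correlation(xs), 6))  # → 1.0
```

A representation achieving zero total correlation is exactly a factorial code; the paper's contribution is finding an invertible transformation that minimizes this quantity without a generative model.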
international symposium on information theory | 2014
Amichai Painsky; Saharon Rosset; Meir Feder
Independent component analysis (ICA) is a statistical method for transforming an observed multidimensional random vector into components that are as statistically independent as possible from each other. Usually the ICA framework assumes a model according to which the observations are generated (a generative function with additive noise). Binary ICA (BICA) is a special case of ICA in which both the observations and the independent components are over the binary field GF(2). In this work we introduce a generalized BICA framework in which an observation vector is decomposed into its independent components (as much as possible) with no prior assumption on the way it was generated. We propose several theorems and show that this NP-hard problem can be accurately solved with a branch-and-bound search tree algorithm, or tightly approximated with a series of linear programs. BICA has been shown to have applications in many domains, including medical diagnosis, multi-cluster assignment, network tomography and internet resource management. We suggest that BICA also applies to source coding; we argue that instead of generating statistically independent prediction errors, as in predictive coding, an improved encoder should assemble a vector of observations and apply the generalized BICA to it. This is shown to achieve improved performance at the cost of introducing some time delay (working in batch).
international workshop on machine learning for signal processing | 2016
Amichai Painsky; Saharon Rosset; Meir Feder
Independent Component Analysis (ICA) is a statistical method for transforming an observable multi-dimensional random vector into components that are as statistically independent as possible from each other. The binary ICA (BICA) is a special case of ICA in which both the observations and the independent components are over a binary alphabet. The BICA problem has received a significant amount of attention in the past decade, mostly in the form of algorithmic approaches and heuristic solutions. However, BICA still suffers from a substantial lack of theoretical bounds and efficiency guarantees. In this work we address these concerns, as we introduce novel lower bounds and theoretical properties for the BICA problem, both under linear and non-linear transformations. In addition, we present simple algorithms which apply our methodology and achieve favorable merits, both in terms of their accuracy, and their practically optimal computational complexity.
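The linear variant of the BICA problem studied above can be illustrated with a toy sketch (our own illustrative code, under the assumption of an invertible XOR transform): applying an invertible GF(2) matrix preserves the joint entropy but can shrink the sum of marginal entropies, which is the objective the bounds concern.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Empirical Shannon entropy (bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())

def marginal_entropy_sum(vectors):
    """Sum of per-coordinate entropies: the linear BICA objective."""
    return sum(entropy(list(dim)) for dim in zip(*vectors))

def xor_transform(vectors, A):
    """y = A x over GF(2); A must be invertible so no information is lost."""
    return [tuple(sum(a * x for a, x in zip(row, v)) % 2 for row in A)
            for v in vectors]

# Two identical bits: a redundant representation.
xs = [(0, 0), (1, 1)] * 50
A = [[1, 0], [1, 1]]       # invertible over GF(2)
ys = xor_transform(xs, A)  # second component becomes the constant 0
print(marginal_entropy_sum(xs), marginal_entropy_sum(ys))
```

Here the transform cuts the marginal entropy sum from 2 bits to 1 bit while the data remain perfectly recoverable; the paper's lower bounds characterize how far such reductions can go.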
Journal of Computer Science and Technology | 2014
Amichai Painsky; Saharon Rosset
The availability of large microarray data has led to a growing interest in biclustering methods in the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem (also known as projected clustering) for gene expression, in which each row can only be a member of a single bicluster while columns can participate in multiple clusters. This type of biclustering may be adequate, for example, for clustering groups of cancer patients where each patient (row) is expected to be carrying only a single type of cancer, while each cancer type is associated with multiple (and possibly overlapping) genes (columns). We present a novel method to identify these exclusive row biclusters in the spirit of the optimal set cover problem. We present our algorithmic solution as a combination of existing biclustering algorithms and combinatorial auction techniques. Furthermore, we devise an approach for tuning the threshold of our algorithm based on comparison with a null model, inspired by the Gap statistic approach. We demonstrate our approach on both synthetic and real world gene expression data and show its power in identifying large-span non-overlapping row submatrices, while considering their unique nature.
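The exclusivity constraint described above amounts to selecting disjoint row sets from candidate biclusters, as in winner determination for a combinatorial auction. A minimal greedy sketch (an illustrative stand-in for the paper's solver; names and the density heuristic are ours):

```python
def exclusive_rows(candidates):
    """Greedily pick disjoint row sets by score density.

    candidates: list of (row_set, score) pairs, e.g. produced by an
    underlying biclustering algorithm. Each row may be 'won' only once.
    """
    chosen, taken = [], set()
    order = sorted(candidates, key=lambda c: c[1] / len(c[0]), reverse=True)
    for rows, score in order:
        if taken.isdisjoint(rows):   # exclusivity: no row reused
            chosen.append((rows, score))
            taken |= rows
    return chosen

cands = [({1, 2, 3}, 9.0), ({3, 4}, 5.0), ({4, 5}, 4.0)]
print(exclusive_rows(cands))  # keeps {1,2,3} and {4,5}; {3,4} conflicts
```

The actual method combines this packing view with a Gap-statistic-style null model for tuning the acceptance threshold, which the sketch omits.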
international conference on data mining | 2012
Amichai Painsky; Saharon Rosset
The availability of large microarray data has led to a growing interest in biclustering methods in the past decade. Several algorithms have been proposed to identify subsets of genes and conditions according to different similarity measures and under varying constraints. In this paper we focus on the exclusive row biclustering problem (also known as projected clustering) for gene expression data sets, in which each row can only be a member of a single bicluster while columns can participate in multiple clusters. This type of biclustering may be adequate, for example, for clustering groups of cancer patients where each patient (row) is expected to be carrying only a single type of cancer, while each cancer type is associated with multiple (and possibly overlapping) genes (columns). In this paper we present a novel method to identify these exclusive row biclusters through a combination of existing biclustering algorithms and combinatorial auction techniques. We devise an approach for tuning the threshold for our algorithm based on comparison to a null model in the spirit of the Gap statistic approach. We demonstrate our approach on both synthetic and real-world gene expression data and show its power in identifying large-span non-overlapping row submatrices, while considering their unique nature. The Gap statistic approach succeeds in identifying appropriate thresholds in all our examples.
IEEE Transactions on Information Theory | 2017
Amichai Painsky; Saharon Rosset; Meir Feder
Large alphabet source coding is a basic and well-studied problem in data compression. It has many applications, such as compression of natural language text, speech, and images. The classic perception of most commonly used methods is that a source is best described over an alphabet, which is at least as large as the observed alphabet. In this paper, we challenge this approach and introduce a conceptual framework in which a large alphabet source is decomposed into “as statistically independent as possible” components. This decomposition allows us to apply entropy encoding to each component separately, while benefiting from their reduced alphabet size. We show that in many cases, such decomposition results in a sum of marginal entropies which is only slightly greater than the entropy of the source. Our suggested algorithm, based on a generalization of the binary independent component analysis, is applicable for a variety of large alphabet source coding setups. This includes the classical lossless compression, universal compression, and high-dimensional vector quantization. In each of these setups, our suggested approach outperforms most commonly used methods. Moreover, our proposed framework is significantly easier to implement in most of these cases.
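The coding argument above rests on a simple inequality: coding each component separately costs the sum of marginal entropies, which always upper-bounds the joint entropy, and a good decomposition makes the gap small. A toy illustration (our own sketch; the distribution is made up, and we use the plain binary expansion rather than the paper's learned decomposition):

```python
from math import log2

def H(p):
    """Shannon entropy (bits) of a probability vector."""
    return -sum(q * log2(q) for q in p if q > 0)

# Toy large-alphabet source on {0, ..., 7}, written as 3 binary components.
p = [0.30, 0.20, 0.15, 0.10, 0.10, 0.06, 0.05, 0.04]
marginals = []
for b in range(3):
    p1 = sum(p[s] for s in range(8) if (s >> b) & 1)  # P(bit b = 1)
    marginals.append(H([1 - p1, p1]))

# Joint entropy vs. cost of entropy-coding each bit stream on its own.
print(round(H(p), 3), round(sum(marginals), 3))  # → 2.706 2.707
```

For this particular distribution the naive binary split is already nearly factorial; in general the gap can be large, and the generalized BICA searches for a relabeling that shrinks it.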
international conference on data mining | 2016
Amichai Painsky; Saharon Rosset
Ensemble methods are considered among the state-of-the-art predictive modeling approaches. Applied to modern big data, these methods often require a large number of sub-learners, where the complexity of each learner typically grows with the size of the dataset. This phenomenon results in an increasing demand for storage space, which may be very costly. This problem mostly manifests in a subscriber-based environment, where a user-specific ensemble needs to be stored on a personal device with strict storage limitations (such as a cellular device). In this work we introduce a novel method for lossless compression of tree-based ensemble methods, focusing on Random Forests. Our suggested method is based on probabilistic modeling of the ensemble's trees, followed by model clustering via Bregman divergence. This allows us to find a minimal set of models that provides an accurate description of the trees, and at the same time is small enough to store and maintain. Our compression scheme demonstrates high compression rates on a variety of modern datasets. Importantly, our scheme enables predictions from the compressed format and a perfect reconstruction of the original ensemble.
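The clustering step can be sketched in a few lines (an illustrative simplification, not the paper's scheme: we model each tree by a single class-probability profile and use KL divergence, one member of the Bregman family):

```python
from math import log

def kl(p, q):
    """KL divergence (nats) between probability vectors, a Bregman divergence."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def assign(trees, centroids):
    """Map each tree's class-probability profile to its nearest centroid.

    Storing only the centroids plus one index per tree is the source of
    the compression; a residual stream would restore losslessness.
    """
    return [min(range(len(centroids)), key=lambda j: kl(t, centroids[j]))
            for t in trees]

trees = [(0.9, 0.1), (0.8, 0.2), (0.2, 0.8), (0.1, 0.9)]
centroids = [(0.85, 0.15), (0.15, 0.85)]
print(assign(trees, centroids))  # → [0, 0, 1, 1]
```

Using a Bregman divergence here is the natural choice because cluster means are then the divergence-minimizing representatives, which keeps the centroid update step exact.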
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2016
Amichai Painsky; Saharon Rosset
In this paper we present an algorithmic approach for fitting isotonic models under convex, yet non-differentiable, loss functions. It is a generalization of the greedy non-regret approach proposed by Luss and Rosset (2014) for differentiable loss functions, taking into account the required subgradient extensions. We prove that our suggested algorithm solves the isotonic modeling problem while maintaining favorable computational and statistical properties. As our suggested algorithm may be used for any non-differentiable loss function, we focus our interest on isotonic modeling for either regression or two-class classification with appropriate log-likelihood loss and lasso penalty on the fitted values. This combination allows us to maintain the non-parametric nature of isotonic modeling, while controlling model complexity through regularization. We demonstrate the efficiency and usefulness of this approach on both synthetic and real world data. An implementation of our suggested solution is publicly available from the first author's website (https://sites.google.com/site/amichaipainsky/software).
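For orientation, the differentiable baseline that this work generalizes can be sketched with the classical pool-adjacent-violators algorithm (PAVA) for squared loss (a standard textbook routine, not the paper's subgradient method):

```python
def pava(y):
    """Pool Adjacent Violators: least-squares isotonic fit to sequence y."""
    blocks = [[v, 1] for v in y]  # [block mean, block weight]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0]:  # monotonicity violated: pool
            m0, w0 = blocks[i]
            m1, w1 = blocks.pop(i + 1)
            blocks[i] = [(m0 * w0 + m1 * w1) / (w0 + w1), w0 + w1]
            i = max(i - 1, 0)                # re-check the previous pair
        else:
            i += 1
    return [m for m, w in blocks for _ in range(w)]

print(pava([1, 3, 2, 4]))  # → [1, 2.5, 2.5, 4]
```

The paper's contribution is handling losses where the block-merge step has no closed-form mean, e.g. lasso-penalized log-likelihood, by working with subgradients instead of gradients.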
international symposium on information theory | 2013
Amichai Painsky; Saharon Rosset; Meir Feder
Memoryless processes hold many theoretical and practical advantages. They are easy to describe, analyze, store and encrypt. They can also be seen as the essence of a family of regression processes, or as an innovation process triggering a dynamic system. The Gram-Schmidt procedure suggests a linear sequential method of whitening (decorrelating) any stochastic process. Applied to a Gaussian process, memorylessness (that is, statistical independence) is guaranteed. It is not clear, however, how to sequentially construct a memoryless process from a non-Gaussian process. In this paper we present a non-linear sequential method to generate a memoryless process from any given Markov process under varying objectives and constraints. We differentiate between lossless and lossy methods, closed form and algorithmic solutions, and discuss the properties and uniqueness of our suggested methods.
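The linear Gram-Schmidt baseline mentioned above can be sketched as a one-step innovations filter (our own illustrative code): regress each sample on its predecessor and keep the residual. The residuals are exactly decorrelated, which suffices for independence only in the Gaussian case, motivating the paper's non-linear construction.

```python
def innovations(x):
    """Whiten a sequence by removing the best linear predictor from the
    previous sample (a one-step Gram-Schmidt / innovations step)."""
    n = len(x) - 1
    prev, curr = x[:-1], x[1:]
    mp, mc = sum(prev) / n, sum(curr) / n
    a = (sum((p - mp) * (c - mc) for p, c in zip(prev, curr))
         / sum((p - mp) ** 2 for p in prev))  # least-squares AR(1) slope
    return [c - mc - a * (p - mp) for p, c in zip(prev, curr)]

x = [1, 2, 1, 3, 2, 4, 3, 5]
print([round(e, 3) for e in innovations(x)])
```

By the least-squares orthogonality condition, the residual stream is uncorrelated with the predictor; for non-Gaussian Markov processes it can still carry higher-order dependence, which the paper's non-linear method removes.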
IEEE Transactions on Pattern Analysis and Machine Intelligence | 2017
Amichai Painsky; Saharon Rosset