Amir Ingber
Stanford University
Publications
Featured research published by Amir Ingber.
IEEE Transactions on Information Theory | 2015
Amir Ingber; Thomas A. Courtade; Tsachy Weissman
The problem of performing similarity queries on compressed data is considered. We focus on the quadratic similarity measure, and study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on the compressed data. For a Gaussian source, we show that the queries can be answered reliably if and only if the compression rate exceeds a given threshold, the identification rate, which we explicitly characterize. Moreover, when compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively. For a general source, we prove that, as with classical compression, the Gaussian source requires the largest compression rate among sources with a given variance. Moreover, a robust scheme is described that attains this maximal rate for any source distribution.
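In symbols, and with notation assumed here for illustration rather than taken from the paper (T for the rate-R compressor, D for the similarity threshold, R_ID for the identification rate, E_ID for the query error exponent), the setup and the abstract's two main claims read:

```latex
% Illustrative formalization; T, D, R_ID and E_ID are assumed names.
T : \mathbb{R}^n \to \{1, \dots, 2^{nR}\}, \qquad
\text{query: is } \tfrac{1}{n}\|x - y\|^2 \le D\,?
\qquad
\text{reliable queries possible} \iff R > R_{\mathrm{ID}}(D), \qquad
P_{\mathrm{error}} \doteq 2^{-n E_{\mathrm{ID}}(R, D)} \quad \text{for } R > R_{\mathrm{ID}}(D).
```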
IEEE Transactions on Information Theory | 2016
Albert No; Amir Ingber; Tsachy Weissman
We investigate the second order asymptotics (source dispersion) of the successive refinement problem. Similar to the classical definition of a successively refinable source, we say that a source is strongly successively refinable if successive refinement coding can achieve the second order optimum rate (including the dispersion terms) at both decoders. We establish a sufficient condition for strong successive refinability. We show that any discrete source under Hamming distortion and the Gaussian source under quadratic distortion are strongly successively refinable. We also demonstrate how successive refinement ideas can be used in point-to-point lossy compression problems in order to reduce complexity. We give two examples, the binary-Hamming and Gaussian-quadratic cases, in which a layered code construction results in a low complexity scheme that attains optimal performance. For example, when the number of layers grows with the block length n, we show how to design an O(n log n) algorithm that asymptotically achieves the rate-distortion bound.
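For readers unfamiliar with the terminology, the “second order optimum rate (including the dispersion terms)” refers to the standard dispersion approximation of lossy compression; the following sketch uses notation assumed here rather than quoted from the paper:

```latex
% Minimal rate of a blocklength-n code with excess-distortion
% probability at most \epsilon at distortion level D:
R(n, D, \epsilon) = R(D) + \sqrt{\frac{V(D)}{n}}\, Q^{-1}(\epsilon)
                  + O\!\left(\frac{\log n}{n}\right)
```

Here R(D) is the rate-distortion function, V(D) the source dispersion, and Q^{-1} the inverse Gaussian tail function; strong successive refinability means both decoders attain their respective versions of this expression simultaneously.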
Data Compression Conference | 2013
Amir Ingber; Thomas A. Courtade; Tsachy Weissman
The problem of performing similarity queries on compressed data is considered. We study the fundamental tradeoff between compression rate, sequence length, and reliability of queries performed on compressed data. For a Gaussian source and quadratic similarity criterion, we show that queries can be answered reliably if and only if the compression rate exceeds a given threshold - the identification rate - which we explicitly characterize. When compression is performed at a rate greater than the identification rate, responses to queries on the compressed data can be made exponentially reliable. We give a complete characterization of this exponent, which is analogous to the error and excess-distortion exponents in channel and source coding, respectively. For a general source, we prove that the identification rate is at most that of a Gaussian source with the same variance. Therefore, as with classical compression, the Gaussian source requires the largest compression rate. Moreover, a scheme is described that attains this maximal rate for any source distribution.
International Symposium on Information Theory | 2012
Amir Ingber; Ram Zamir
We revisit the setting of a Gaussian channel without power constraints, proposed by Poltyrev, where the codewords are points in Euclidean space and their density is considered instead of the communication rate. We refine the expurgation technique (proposed by Poltyrev for the derivation of the error exponent) to the finite-dimensional case and obtain a finite-dimensional achievability bound. While the expurgation exponent improves upon the random coding exponent only for certain rates (below a rate known as δ_ex), we show that for finite dimensions the expurgation technique is useful for a broader range of rates. In addition, we present a precise asymptotic analysis of the expurgation bound and find the sub-exponential terms, which turn out to be non-negligible.
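As a brief reminder of the setting, hedged since normalization conventions vary across papers: Poltyrev's analogue of rate is the normalized log density (NLD) δ of the infinite constellation, and the analogue of capacity, for noise variance σ² per dimension, is

```latex
\delta^{*} = \frac{1}{2} \ln \frac{1}{2 \pi e \sigma^{2}}
```

so that vanishing error probability is achievable if and only if δ < δ*; the rate δ_ex mentioned above marks where, asymptotically, the expurgation exponent stops improving on random coding.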
International Symposium on Information Theory | 2012
Da Wang; Amir Ingber; Yuval Kochman
We consider a discrete memoryless joint source-channel setting. In this setting, if a source sequence is reconstructed with distortion below some threshold, we declare a success event. We prove that for any joint source-channel scheme, if this threshold is lower (better) than the optimum average distortion, then the success probability approaches zero as the block length increases. Furthermore, we show that this probability decays exponentially, and we evaluate the optimal exponent. Surprisingly, the best exponential behavior is attainable by a separation-based scheme.
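The threshold condition can be restated in standard separation-theorem terms (a hedged paraphrase, with notation assumed and one channel use per source symbol taken for simplicity): if R(D) is the source's rate-distortion function and C the channel capacity, the optimum average distortion D* satisfies R(D*) = C, so demanding D < D* is an above-capacity requirement:

```latex
D < D^{*} \iff R(D) > C, \qquad
P\!\left( d(S^n, \hat{S}^n) \le D \right) \doteq 2^{-n E(D)}, \quad E(D) > 0,
```

with E(D) the optimal success exponent that, per the abstract, a separation-based scheme attains.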
Conference on Information Sciences and Systems | 2012
Amir Ingber; Da Wang; Yuval Kochman
We revisit old and recent “dispersion” theorems for lossless and lossy source coding, channel coding, and joint source-channel coding, showing that they can be proven in a general unified framework. In all of these cases, an error (or excess-distortion) event occurs, to the level of interest in the asymptotic analysis, precisely when some function of the empirical distribution crosses a threshold. Thus, second order analysis for general functions of distributions is studied, and a technical result that enables proving the various dispersion theorems is derived.
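The unifying step is essentially the delta method; a minimal sketch under assumed notation: let the error event be {f(P̂_n) > γ}, where P̂_n is the empirical distribution of n i.i.d. samples from P. A first-order Taylor expansion of f around P, combined with the central limit theorem, gives

```latex
f(\hat{P}_n) \approx f(P) + \sum_{x} \frac{\partial f}{\partial P(x)}
  \left( \hat{P}_n(x) - P(x) \right), \qquad
\sqrt{n}\left( f(\hat{P}_n) - f(P) \right) \xrightarrow{\;d\;} \mathcal{N}(0, V),
```

where V is the variance of the derivative term evaluated at a single sample; the familiar √(V/n)·Q^{-1}(ε) dispersion terms then follow by solving the threshold-crossing probability for the rate.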
Allerton Conference on Communication, Control, and Computing | 2013
Idoia Ochoa; Amir Ingber; Tsachy Weissman
The generation of new databases and the amount of data on existing ones are growing exponentially. As a result, executing queries on large databases is becoming a time-consuming and challenging task. With this in mind, we study the problem of compressing sequences in a database so that similarity queries can be performed efficiently on the compressed database. The fundamental limits of this problem characterize the tradeoff between compression rate and the reliability of the queries performed on the compressed data. While those asymptotic limits have been studied and characterized in past work, how to approach these limits in practice has remained largely unexplored. In this paper, we propose an approach to this task, based in part on existing lossy compression algorithms. Specifically, we consider queries of the form: “which sequences in the database are similar to a given sequence y?”. For the case where similarity between sequences is measured via Hamming distortion, we construct schemes whose performance is close to the fundamental limits. Furthermore, we test our scheme on a sample database of real DNA sequences, and show significant compression while still allowing highly reliable query answers.
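To make the flavor of such a scheme concrete, here is a minimal, hypothetical Python sketch (not the construction from the paper): each database sequence x is stored as a lossy reconstruction x_hat with a known distortion bound delta, and a query is answered conservatively via the triangle inequality, which rules out false negatives by design:

```python
def hamming(a, b):
    """Normalized Hamming distance between two equal-length sequences."""
    assert len(a) == len(b)
    return sum(s != t for s, t in zip(a, b)) / len(a)

def query(x_hat, delta, y, threshold):
    """Answer "is d(x, y) <= threshold?" using only the lossy
    reconstruction x_hat, where d(x, x_hat) <= delta is known.

    Triangle inequality: d(x, y) >= d(x_hat, y) - delta, so a "no"
    answer is always safe (no false negatives)."""
    if hamming(x_hat, y) > threshold + delta:
        return False   # provably dissimilar
    return True        # "maybe similar": candidate for an exact check

# Example: a compressed entry stored with distortion bound 0.1
print(query("ACGTACGT", 0.1, "ACGTACGA", 0.05))
```

The rate at which x_hat is stored controls how often the conservative test is inconclusive, which mirrors the rate-reliability tradeoff whose fundamental limits the paper targets.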
IEEE Transactions on Information Theory | 2016
Fabian Steiner; Steffen Dempfle; Amir Ingber; Tsachy Weissman
We study the problem of compression for the purpose of similarity identification, where similarity is measured by the mean square Euclidean distance between vectors. While the asymptotic fundamental limits of the problem, the minimal compression rate and the error exponent, were found in a previous work, in this paper we focus on the nonasymptotic domain and on practical, implementable schemes. We first present a finite blocklength achievability bound based on shape-gain quantization: the gain (amplitude) of the vector is compressed via scalar quantization, and the shape (the projection on the unit sphere) is quantized using a spherical code. The results are numerically evaluated, and they converge to the asymptotic values, as predicted by the error exponent. We then give a nonasymptotic lower bound on the performance of any compression scheme, and compare it to the upper (achievability) bound. For a practical implementation of such a scheme, we use wrapped spherical codes, studied by Hamkins and Zeger, with the Leech lattice as an example of an underlying lattice. As a side result, we obtain a bound on the covering angle of any wrapped spherical code, as a function of the covering radius of the underlying lattice.
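The shape-gain split itself is simple to illustrate. Below is a hypothetical Python sketch in which a random codebook of unit vectors stands in for the structured wrapped spherical codes used in the paper; all names and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_shape_codebook(n, num_points):
    """Random unit vectors: a stand-in for a wrapped spherical code."""
    c = rng.standard_normal((num_points, n))
    return c / np.linalg.norm(c, axis=1, keepdims=True)

def shape_gain_quantize(x, gain_levels, shape_codebook):
    """Split x into gain (norm) and shape (direction); quantize separately."""
    gain = np.linalg.norm(x)
    shape = x / gain
    g_idx = np.argmin(np.abs(gain_levels - gain))   # scalar quantizer for the gain
    s_idx = np.argmax(shape_codebook @ shape)       # max correlation = min angle
    return g_idx, s_idx

def reconstruct(g_idx, s_idx, gain_levels, shape_codebook):
    return gain_levels[g_idx] * shape_codebook[s_idx]

# Example: quantize a 24-dimensional vector with 5 gain bits + 12 shape bits
x = rng.standard_normal(24)
levels = np.linspace(0.5, 10.0, 32)
codebook = make_shape_codebook(24, 4096)
x_hat = reconstruct(*shape_gain_quantize(x, levels, codebook), levels, codebook)
print(np.linalg.norm(x - x_hat) ** 2 / 24)   # per-dimension squared error
```

A structured code, such as the Leech-lattice-based wrapped codes referenced above, replaces the brute-force nearest-neighbor search with an efficient lattice decoding step.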
International Symposium on Information Theory | 2014
Amir Ingber; Tsachy Weissman
We study the problem of compressing a source for the goal of answering similarity queries from the compressed data. Unlike classical compression, here there is no requirement that the source be reproduced from the compressed form. For discrete memoryless sources and an arbitrary similarity measure, we fully characterize the minimal compression rate that allows query answers that are reliable, in the sense of having a vanishing false-positive probability, when false negatives are not allowed. The result is partially based on a previous work by Ahlswede et al. [1], and the inherently typical subset lemma plays a key role in the converse proof. We then discuss the performance attainable by schemes that use lossy source codes as a building block, and show that such schemes are, in general, suboptimal. Finally, we discuss the problem of computing the fundamental limit, and present numerical results.
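Spelled out, the reliability criterion described here can be written as follows (notation assumed for illustration): with similarity threshold D and query answer A ∈ {no, maybe},

```latex
P\big(A = \text{no} \mid d(X^n, Y^n) \le D\big) = 0
  \quad \text{(false negatives not allowed)},
\qquad
P\big(A = \text{maybe} \mid d(X^n, Y^n) > D\big) \to 0
  \quad \text{(vanishing false-positive probability)},
```

and the identification rate is the minimal compression rate for which both can hold.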
International Symposium on Information Theory | 2013
Amir Ingber; Thomas A. Courtade; Tsachy Weissman
In this paper, we consider the problem of determining whether sequences X and Y, generated i.i.d. according to P_X × P_Y, are equal, given access only to the pair (Y, T(X)), where T(X) is a rate-R compressed version of X. In general, the rate R may not be sufficiently large to reliably determine whether X = Y. We precisely characterize this reliability, i.e., the exponential rate at which an error is made, as a function of R. Interestingly, the exponent turns out to be related to the Bhattacharyya distance between the distributions P_X and P_Y. In addition, the scheme achieving this exponent is universal, i.e., it does not depend on P_X or P_Y.
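For reference, the Bhattacharyya distance between the two distributions is

```latex
B(P_X, P_Y) = -\log \sum_{x} \sqrt{P_X(x)\, P_Y(x)},
```

which is zero if and only if P_X = P_Y and grows as the distributions separate; the exact dependence of the exponent on R and B(P_X, P_Y) is characterized in the paper.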