Thomas M. Cover
Stanford University
Publications
Featured research published by Thomas M. Cover.
IEEE Transactions on Information Theory | 1967
Thomas M. Cover; Peter E. Hart
The nearest neighbor decision rule assigns to an unclassified sample point the classification of the nearest of a set of previously classified points. This rule is independent of the underlying joint distribution on the sample points and their classifications, and hence the probability of error R of such a rule must be at least as great as the Bayes probability of error R^{\ast}, the minimum probability of error over all decision rules taking the underlying probability structure into account. However, in a large sample analysis, we will show in the M-category case that R^{\ast} \leq R \leq R^{\ast}(2 - MR^{\ast}/(M-1)), where these bounds are the tightest possible, for all suitably smooth underlying distributions. Thus for any number of categories, the probability of error of the nearest neighbor rule is bounded above by twice the Bayes probability of error. In this sense, it may be said that half the classification information in an infinite sample set is contained in the nearest neighbor.
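As a quick numerical illustration of the bound above, the sketch below evaluates R^{\ast}(2 - MR^{\ast}/(M-1)) for a few Bayes risks and category counts and checks that it always lies between R^{\ast} and 2R^{\ast}. The helper name is ours, not from the paper.

```python
# Sketch: evaluating the Cover-Hart bound R* <= R <= R*(2 - M R*/(M-1)).

def nn_asymptotic_upper_bound(bayes_risk: float, num_classes: int) -> float:
    """Upper bound on the large-sample nearest neighbor risk R,
    given the Bayes risk R* and the number of categories M."""
    r, m = bayes_risk, num_classes
    return r * (2.0 - m * r / (m - 1.0))

if __name__ == "__main__":
    for r_star in (0.05, 0.10, 0.25):
        for m in (2, 5, 10):
            bound = nn_asymptotic_upper_bound(r_star, m)
            # The bound always lies between R* and 2 R* for a valid Bayes risk.
            assert r_star <= bound <= 2.0 * r_star
            print(f"M={m:2d}  R*={r_star:.2f}  NN risk upper bound = {bound:.4f}")
```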
IEEE Transactions on Information Theory | 1979
Thomas M. Cover; Abbas El Gamal
A relay channel consists of an input x_{1} , a relay output y_{1} , a channel output y , and a relay sender x_{2} (whose transmission is allowed to depend on the past symbols y_{1} ). The dependence of the received symbols upon the inputs is given by p(y,y_{1}|x_{1},x_{2}) . The channel is assumed to be memoryless. In this paper the following capacity theorems are proved. 1) If y is a degraded form of y_{1} , then C = \max_{p(x_{1},x_{2})} \min \{I(X_{1},X_{2};Y), I(X_{1};Y_{1}|X_{2})\} . 2) If y_{1} is a degraded form of y , then C = \max_{p(x_{1})} \max_{x_{2}} I(X_{1};Y|x_{2}) . 3) If p(y,y_{1}|x_{1},x_{2}) is an arbitrary relay channel with feedback from (y,y_{1}) to both x_{1} and x_{2} , then C = \max_{p(x_{1},x_{2})} \min \{I(X_{1},X_{2};Y), I(X_{1};Y,Y_{1}|X_{2})\} . 4) For a general relay channel, C \leq \max_{p(x_{1},x_{2})} \min \{I(X_{1},X_{2};Y), I(X_{1};Y,Y_{1}|X_{2})\} . Superposition block Markov encoding is used to show achievability of C , and converses are established. The capacities of the Gaussian relay channel and certain discrete relay channels are evaluated. Finally, an achievable lower bound to the capacity of the general relay channel is established.
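The max-min expression appearing in 1) and 4) can be evaluated numerically. The sketch below assumes a hypothetical binary relay channel (the relay observes x_1 through a BSC(0.05), the destination observes x_1 xor x_2 through a BSC(0.2)) and runs a coarse brute-force search over p(x_1,x_2); it illustrates the cut-set quantity only and does not reproduce the paper's examples.

```python
import itertools
import math

def entropy_from_joint(joint, keep, given=()):
    """H(keep | given) in bits, where joint maps (x1, x2, y, y1) tuples to
    probabilities and keep/given are tuples of coordinate indices."""
    num, den = {}, {}
    for k, p in joint.items():
        nk = tuple(k[i] for i in keep) + tuple(k[i] for i in given)
        dk = tuple(k[i] for i in given)
        num[nk] = num.get(nk, 0.0) + p
        den[dk] = den.get(dk, 0.0) + p
    return sum(-p * math.log2(p / den[k[len(keep):]])
               for k, p in num.items() if p > 0)

# Hypothetical binary relay channel (not from the paper).
def p_channel(y, y1, x1, x2, eps_relay=0.05, eps_dest=0.2):
    p1 = 1 - eps_relay if y1 == x1 else eps_relay
    p2 = 1 - eps_dest if y == (x1 ^ x2) else eps_dest
    return p1 * p2

def cutset_value(p_x1x2):
    """min{ I(X1,X2;Y), I(X1;Y,Y1|X2) } for one joint input pmf.
    Coordinate indices: 0 = x1, 1 = x2, 2 = y, 3 = y1."""
    joint = {(x1, x2, y, y1): p_x1x2[(x1, x2)] * p_channel(y, y1, x1, x2)
             for x1, x2, y, y1 in itertools.product((0, 1), repeat=4)}
    i_a = entropy_from_joint(joint, (2,)) - entropy_from_joint(joint, (2,), (0, 1))
    i_b = entropy_from_joint(joint, (2, 3), (1,)) - entropy_from_joint(joint, (2, 3), (0, 1))
    return min(i_a, i_b)

# Coarse brute-force maximization over p(x1, x2), approximating the bound in 4).
best = 0.0
steps = [i / 10 for i in range(11)]
for a, b, c in itertools.product(steps, repeat=3):
    rest = 1.0 - a - b - c
    if rest < 0:
        continue
    best = max(best, cutset_value({(0, 0): a, (0, 1): b, (1, 0): c, (1, 1): rest}))
print(f"approximate cut-set upper bound: {best:.3f} bits per channel use")
```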
IEEE Transactions on Electronic Computers | 1965
Thomas M. Cover
This paper develops the separating capacities of families of nonlinear decision surfaces by a direct application of a theorem in classical combinatorial geometry. It is shown that a family of surfaces having d degrees of freedom has a natural separating capacity of 2d pattern vectors, thus extending and unifying results of Winder and others on the pattern-separating capacity of hyperplanes. Applying these ideas to the vertices of a binary n-cube yields bounds on the number of spherically, quadratically, and, in general, nonlinearly separable Boolean functions of n variables. It is shown that the set of all surfaces which separate a dichotomy of an infinite, random, separable set of pattern vectors can be characterized, on the average, by a subset of only 2d extreme pattern vectors. In addition, the problem of generalizing the classifications on a labeled set of pattern points to the classification of a new point is defined, and it is found that the probability of ambiguous generalization is large unless the number of training patterns exceeds the capacity of the set of separating surfaces.
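The 2d capacity follows from counting separable dichotomies. A minimal sketch of that count and the resulting probability of separability for points in general position is given below; the fraction of separable dichotomies crosses 1/2 exactly at n = 2d.

```python
from math import comb

def num_separable_dichotomies(n: int, d: int) -> int:
    """Cover's counting function C(n, d): the number of dichotomies of n
    points in general position separable by a surface with d degrees of
    freedom."""
    return 2 * sum(comb(n - 1, k) for k in range(d))

def prob_separable(n: int, d: int) -> float:
    """Fraction of the 2^n dichotomies of n points that are separable."""
    return num_separable_dichotomies(n, d) / 2 ** n

if __name__ == "__main__":
    d = 5
    for n in (d, 2 * d - 1, 2 * d, 2 * d + 1, 4 * d):
        print(f"n={n:2d} patterns, d={d} dof:  P(separable) = {prob_separable(n, d):.3f}")
    # At n = 2d the probability is exactly 1/2: the capacity of the family.
```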
IEEE Transactions on Information Theory | 1982
Abbas El Gamal; Thomas M. Cover
Consider a sequence of independent identically distributed (i.i.d.) random variables X_{1},X_{2}, \cdots, X_{n} and a distortion measure d(X_{i},\hat{X}_{i}) on the estimates \hat{X}_{i} of X_{i} . Two descriptions i(X)\in \{1,2, \cdots ,2^{nR_{1}}\} and j(X)\in \{1,2, \cdots,2^{nR_{2}}\} are given of the sequence X=(X_{1}, X_{2}, \cdots ,X_{n}) . From these two descriptions, three estimates \hat{X}_{1}(i(X)), \hat{X}_{2}(j(X)) , and \hat{X}_{0}(i(X),j(X)) are formed, with resulting expected distortions E\frac{1}{n} \sum^{n}_{k=1} d(X_{k}, \hat{X}_{mk})=D_{m}, m=0,1,2. We find that the distortion constraints D_{0}, D_{1}, D_{2} are achievable if there exists a probability mass distribution p(x)p(\hat{x}_{1},\hat{x}_{2},\hat{x}_{0}|x) with Ed(X,\hat{X}_{m})\leq D_{m} such that R_{1}>I(X;\hat{X}_{1}), R_{2}>I(X;\hat{X}_{2}), and R_{1}+R_{2}>I(X;\hat{X}_{1},\hat{X}_{2},\hat{X}_{0})+I(\hat{X}_{1};\hat{X}_{2}), where I(\cdot) denotes Shannon mutual information. These rates are shown to be optimal for deterministic distortion measures.
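A small numeric check of the marginal-rate conditions, assuming a hypothetical Bernoulli(1/2) source with each description produced by a binary symmetric test channel under Hamming distortion; it evaluates only I(X;\hat{X}_{1}) and I(X;\hat{X}_{2}), not the full region.

```python
import math
from itertools import product

def mutual_information(p_xy):
    """I(X;Y) in bits from a joint pmf given as {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in p_xy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in p_xy.items() if p > 0)

# Hypothetical single-letter test channel: each description is the fair bit X
# passed through a BSC(q), giving expected Hamming distortion q.
def description_joint(q):
    return {(x, xh): 0.5 * ((1 - q) if x == xh else q)
            for x, xh in product((0, 1), repeat=2)}

for q1, q2 in [(0.1, 0.1), (0.25, 0.05)]:
    i1 = mutual_information(description_joint(q1))  # need R1 > I(X; Xhat1)
    i2 = mutual_information(description_joint(q2))  # need R2 > I(X; Xhat2)
    print(f"D1={q1}, D2={q2}:  I(X;Xhat1)={i1:.3f} bits,  I(X;Xhat2)={i2:.3f} bits")
```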
IEEE Transactions on Information Theory | 1991
William H. R. Equitz; Thomas M. Cover
The successive refinement of information consists of first approximating data using a few bits of information, then iteratively improving the approximation as more and more information is supplied. The goal is to achieve an optimal description at each stage; in general, an ongoing description that is rate-distortion optimal whenever it is interrupted is sought. It is shown that the necessary and sufficient condition for optimal successive refinement is that the solutions of the rate-distortion problem can be written as a Markov chain. In particular, all finite alphabet signals with Hamming distortion satisfy these requirements. It is also shown that the same is true for Gaussian signals with squared error distortion and for Laplacian signals with absolute error distortion. A simple counterexample with absolute error distortion and a symmetric source distribution which shows that successive refinement is not always achievable is presented.
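For the Gaussian squared-error case mentioned above, successive refinability can be seen directly from R(D) = (1/2)\log(\sigma^{2}/D): the two-stage rate equals the one-shot rate. A minimal sketch:

```python
import math

def gaussian_rate(var, d):
    """Rate-distortion function of a N(0, var) source with squared error,
    in bits per symbol: R(D) = 0.5 * log2(var / D) for 0 < D <= var."""
    return 0.5 * math.log2(var / d)

var, d1, d2 = 1.0, 0.25, 0.01            # coarse then fine distortion targets
stage1 = gaussian_rate(var, d1)          # bits for the first approximation
stage2 = gaussian_rate(d1, d2)           # refinement increment 0.5*log2(D1/D2)
print(f"two-stage total: {stage1 + stage2:.4f} bits")
print(f"direct R(D2):    {gaussian_rate(var, d2):.4f} bits")
# The totals coincide: the Gaussian/squared-error pair is successively refinable.
```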
IEEE Transactions on Information Theory | 1991
Amir Dembo; Thomas M. Cover; Joy A. Thomas
The role of inequalities in information theory is reviewed, and the relationship of these inequalities to inequalities in other branches of mathematics is developed. The simple inequalities for differential entropy are applied to the standard multivariate normal to furnish new and simpler proofs of the major determinant inequalities in classical mathematics. The authors discuss differential entropy inequalities for random subsets of samples. These inequalities, when specialized to multivariate normal variables, provide the determinant inequalities that are presented. The authors focus on the entropy power inequality (including the related Brunn-Minkowski, Young's, and Fisher information inequalities) and address various uncertainty principles and their interrelations.
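One of the determinant inequalities referred to above is Hadamard's, which follows from subadditivity of differential entropy applied to a multivariate normal. The sketch below (assuming NumPy is available) checks both forms on a random covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_entropy(K):
    """Differential entropy (nats) of N(0, K): 0.5 * log((2*pi*e)^n det K)."""
    n = K.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))

# Random positive definite covariance matrix (illustrative).
A = rng.standard_normal((4, 4))
K = A @ A.T + 4 * np.eye(4)

# Subadditivity h(X_1,...,X_n) <= sum_i h(X_i), specialized to the
# multivariate normal, is exactly Hadamard's inequality det K <= prod_i K_ii.
joint = gaussian_entropy(K)
sum_marginals = sum(gaussian_entropy(K[i:i + 1, i:i + 1]) for i in range(4))
assert joint <= sum_marginals + 1e-12
assert np.linalg.det(K) <= np.prod(np.diag(K)) + 1e-12
print(f"h(X) = {joint:.4f} nats  <=  sum h(X_i) = {sum_marginals:.4f} nats")
```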
IEEE Transactions on Information Theory | 1973
Thomas M. Cover
Let S be a given subset of binary n-sequences. We provide an explicit scheme for calculating the index of any sequence in S according to its position in the lexicographic ordering of S . A simple inverse algorithm is also given. Particularly nice formulas arise when S is the set of all n-sequences of weight k and also when S is the set of all sequences having a given empirical Markov property. Schalkwijk and Lynch have investigated the former case. The envisioned use of this indexing scheme is to transmit or store the index rather than the sequence, thus resulting in a data compression of (\log |S|)/n .
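A minimal sketch of such an indexing scheme for the weight-k case, ranking and unranking a sequence by summing binomial coefficients; the function names are ours, and the code follows the lexicographic-ordering idea described above.

```python
from math import comb

def index_of(bits):
    """Lexicographic index (0-based) of a binary sequence within the set of
    all sequences of the same length and weight (ordering with 0 < 1)."""
    n, r, idx = len(bits), sum(bits), 0
    for j, b in enumerate(bits):
        if b == 1:
            idx += comb(n - j - 1, r)   # sequences with a 0 here come first
            r -= 1
    return idx

def sequence_of(idx, n, k):
    """Inverse map: the weight-k length-n sequence with lexicographic index idx."""
    bits, r = [], k
    for j in range(n):
        skipped = comb(n - j - 1, r)    # sequences that put a 0 at position j
        if idx < skipped:
            bits.append(0)
        else:
            bits.append(1)
            idx -= skipped
            r -= 1
    return bits

x = [0, 1, 1, 0, 1, 0, 0, 1]            # n = 8, weight k = 4
i = index_of(x)
assert sequence_of(i, 8, 4) == x
print(f"index {i} out of C(8,4) = {comb(8, 4)} sequences; "
      f"store about {comb(8, 4).bit_length()} bits instead of 8")
```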
IEEE Transactions on Information Theory | 1991
Andrew R. Barron; Thomas M. Cover
The authors introduce an index of resolvability that is proved to bound the rate of convergence of minimum complexity density estimators as well as the information-theoretic redundancy of the corresponding total description length. The results on the index of resolvability demonstrate the statistical effectiveness of the minimum description-length principle as a method of inference. The minimum complexity estimator converges to the true density nearly as fast as an estimator based on prior knowledge of the true subclass of densities. Interpretations and basic properties of minimum complexity estimators are discussed. Some regression and classification problems that can be examined within the minimum description-length framework are considered.
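As a loose illustration of the two-part description-length idea behind minimum complexity estimation (a toy sketch, not the estimator analyzed in the paper), the code below picks the Bernoulli model on a 16-point grid that minimizes model bits plus data bits.

```python
import math

def two_part_length(data, q, model_bits):
    """Total description length in bits: L(model) + L(data | model),
    for i.i.d. binary data under a Bernoulli(q) model."""
    n1 = sum(data)
    n0 = len(data) - n1
    bits = float(model_bits)
    if n1:
        bits += n1 * math.log2(1.0 / q)
    if n0:
        bits += n0 * math.log2(1.0 / (1.0 - q))
    return bits

data = [1] * 13 + [0] * 3                # 16 samples from an unknown coin
grid = [k / 16 for k in range(1, 16)]    # candidate models, ~4 bits to name one
best_q = min(grid, key=lambda q: two_part_length(data, q, model_bits=4))
print(f"minimum two-part description length at q = {best_q}")
```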
IEEE Transactions on Information Theory | 1980
Thomas M. Cover; Abbas El Gamal; Masoud Salehi
Let \{(U_{i},V_{i})\}_{i=1}^{n} be a source of independent identically distributed (i.i.d.) discrete random variables with joint probability mass function p(u,v) and common part w=f(u)=g(v) in the sense of Witsenhausen, Gacs, and Korner. It is shown that such a source can be sent with arbitrarily small probability of error over a multiple access channel (MAC) \{\mathcal{X}_{1} \times \mathcal{X}_{2},\mathcal{Y},p(y|x_{1},x_{2})\}, with allowed codes \{x_{1}(u), x_{2}(v)\}, if there exist probability mass functions p(s), p(x_{1}|s,u), p(x_{2}|s,v) such that the source entropies H(U|V), H(V|U), H(U,V|W), and H(U,V) are each bounded by the corresponding conditional mutual information between the channel inputs and the output, where p(s,u,v,x_{1},x_{2},y)=p(s)p(u,v)p(x_{1}|u,s)p(x_{2}|v,s)p(y|x_{1},x_{2}). This region includes the multiple access channel region and the Slepian-Wolf data compression region as special cases.
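The source-side quantities H(U|V), H(V|U), H(U,V|W), and H(U,V) appearing in the sufficient conditions can be computed for a toy source with a common part; the channel-side mutual informations are not reproduced here. The source below is illustrative, not from the paper.

```python
import math
from itertools import product

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# Toy source with a common part: U = 2W + A, V = 2W + B, where W and A are
# fair bits and B equals A with probability 0.9. Then W = U//2 = V//2, so
# w = f(u) = g(v) is a common part. (Illustrative only.)
p_uv = {}
for w, a, b in product((0, 1), repeat=3):
    p = 0.5 * 0.5 * (0.9 if a == b else 0.1)
    key = (2 * w + a, 2 * w + b)
    p_uv[key] = p_uv.get(key, 0.0) + p

H_uv = entropy(p_uv)
H_u = entropy({u: sum(p for (uu, v), p in p_uv.items() if uu == u) for u in range(4)})
H_v = entropy({v: sum(p for (u, vv), p in p_uv.items() if vv == v) for v in range(4)})
H_w = 1.0                                    # W is a fair bit by construction
print(f"H(U,V)   = {H_uv:.3f} bits")
print(f"H(U|V)   = {H_uv - H_v:.3f} bits")
print(f"H(V|U)   = {H_uv - H_u:.3f} bits")
print(f"H(U,V|W) = {H_uv - H_w:.3f} bits")   # valid since W is a function of (U,V)
```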
IEEE Transactions on Information Theory | 1975
Thomas M. Cover
If \{(X_i, Y_i)\}_{i=1}^{\infty} is a sequence of independent identically distributed discrete random pairs with (X_i, Y_i) \sim p(x,y) , Slepian and Wolf have shown that the X process and the Y process can be separately described to a common receiver at rates R_X and R_Y bits per symbol if R_X + R_Y > H(X,Y), R_X > H(X|Y), R_Y > H(Y|X) . A simpler proof of this result will be given. As a consequence it is established that the Slepian-Wolf theorem is true without change for arbitrary ergodic processes \{(X_i,Y_i)\}_{i=1}^{\infty} and countably infinite alphabets. The extension to an arbitrary number of processes is immediate.
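For a doubly symmetric binary source (X a fair bit and Y equal to X flipped with probability p), the Slepian-Wolf constraints take a closed form; a minimal sketch, with the source chosen purely for illustration:

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# X ~ Bernoulli(1/2), Y = X xor N with N ~ Bernoulli(p).
p = 0.1
H_xy = 1.0 + h2(p)        # sum-rate constraint: R_X + R_Y > H(X,Y)
H_x_given_y = h2(p)       # R_X > H(X|Y)
H_y_given_x = h2(p)       # R_Y > H(Y|X)
print(f"Slepian-Wolf region: R_X + R_Y > {H_xy:.3f}, "
      f"R_X > {H_x_given_y:.3f}, R_Y > {H_y_given_x:.3f} bits/symbol")
# Corner point: describe Y at full rate H(Y) = 1 and X at only H(X|Y) = h2(p).
```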