Andrew R. Barron
Yale University
Publications
Featured research published by Andrew R. Barron.
IEEE Transactions on Information Theory | 1993
Andrew R. Barron
Approximation properties of a class of artificial neural networks are established. It is shown that feedforward networks with one layer of sigmoidal nonlinearities achieve integrated squared error of order O(1/n), where n is the number of nodes. The approximated function is assumed to have a bound on the first moment of the magnitude distribution of the Fourier transform. The nonlinear parameters associated with the sigmoidal nodes, as well as the parameters of linear combination, are adjusted in the approximation. In contrast, it is shown that for series expansions with n terms, in which only the parameters of linear combination are adjusted, the integrated squared approximation error cannot be made smaller than order 1/n^(2/d) uniformly for functions satisfying the same smoothness assumption, where d is the dimension of the input to the function. For the class of functions examined, the approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings.
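The O(1/n) rate can be glimpsed in a small experiment (a generic curve-fitting sketch, not the constructive argument of the paper): fit one-hidden-layer sigmoidal networks of increasing width n to a smooth target and track the squared error on a dense grid. The sketch assumes NumPy and scikit-learn are available; the target function, grid, and optimizer settings are arbitrary illustrative choices.

import numpy as np
from sklearn.neural_network import MLPRegressor  # assumed available; any one-hidden-layer fitter would do

X = np.linspace(-3, 3, 2000).reshape(-1, 1)           # dense grid standing in for the input distribution
y = np.sin(2 * X[:, 0]) * np.exp(-X[:, 0] ** 2 / 4)   # smooth target with a well-behaved Fourier transform

for n in (2, 4, 8, 16, 32):
    net = MLPRegressor(hidden_layer_sizes=(n,), activation="logistic",
                       solver="lbfgs", max_iter=5000, random_state=0)
    net.fit(X, y)                                      # both the sigmoidal and the linear parameters are adjusted
    mse = np.mean((net.predict(X) - y) ** 2)           # empirical stand-in for integrated squared error
    print(f"n = {n:2d} nodes   squared error ~ {mse:.2e}")

If the optimizer does its job, the printed errors shrink as n grows, loosely tracking the 1/n rate described above.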
IEEE Transactions on Information Theory | 1998
Andrew R. Barron; Jorma Rissanen; Bin Yu
We review the principles of minimum description length and stochastic complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon's basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples.
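A minimal sketch of the Gaussian linear regression example: the widely used two-part approximation to the description length, (n/2) log(RSS/n) + (k/2) log n, can drive model-order selection. The criterion below is that approximation, not the paper's exact normalized maximized likelihood code length, and the data are simulated for illustration.

import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-1, 1, n)
y = 1.0 - 2.0 * x + 0.5 * x ** 2 + rng.normal(0.0, 0.3, n)   # simulated data, true polynomial order 2

def description_length(order):
    """Two-part code length (nats) for a polynomial model of the given order.
    Constants common to all models are dropped, so values may be negative."""
    X = np.vander(x, order + 1, increasing=True)              # columns 1, x, ..., x^order
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = order + 1                                             # number of fitted coefficients
    return 0.5 * n * np.log(rss / n) + 0.5 * k * np.log(n)

lengths = {order: description_length(order) for order in range(8)}
for order, dl in lengths.items():
    print(f"order {order}: description length ~ {dl:8.2f}")
print("selected order:", min(lengths, key=lengths.get))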
IEEE Transactions on Information Theory | 1991
Andrew R. Barron; Thomas M. Cover
The authors introduce an index of resolvability that is proved to bound the rate of convergence of minimum complexity density estimators as well as the information-theoretic redundancy of the corresponding total description length. The results on the index of resolvability demonstrate the statistical effectiveness of the minimum description-length principle as a method of inference. The minimum complexity estimator converges to the true density nearly as fast as an estimator based on prior knowledge of the true subclass of densities. Interpretations and basic properties of minimum complexity estimators are discussed. Some regression and classification problems that can be examined within the minimum description-length framework are considered.
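The index of resolvability can be read off numerically in a toy setting: R_n(p) = min over candidates q of [L(q)/n + D(p||q)], where L(q) is the code length assigned to candidate q. The sketch below invents a finite family of Bernoulli densities with a uniform code over the list; the grid, the code lengths, and the true parameter are illustrative assumptions, not taken from the paper.

import numpy as np

candidates = np.linspace(0.05, 0.95, 19)                          # candidate Bernoulli success probabilities
codelength = np.full_like(candidates, np.log(len(candidates)))    # uniform code over the list, in nats

def kl_bernoulli(p, q):
    # relative entropy D(Bernoulli(p) || Bernoulli(q)) in nats; q may be an array
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

p_true = 0.33                                                     # true density, deliberately not in the list
for n in (10, 100, 1000, 10000):
    resolvability = np.min(codelength / n + kl_bernoulli(p_true, candidates))
    print(f"n = {n:6d}   R_n ~ {resolvability:.5f} nats")

As n grows, R_n falls toward the divergence from the truth to the nearest candidate, mirroring how the index trades description cost against approximation error.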
conference on learning theory | 1991
Andrew R. Barron
For a common class of artificial neural networks, the mean integrated squared error between the estimated network and a target function f is shown to be bounded by O(C_f^2/n) + O((nd/N) log N), where n is the number of nodes, d is the input dimension of the function, N is the number of training observations, and C_f is the first absolute moment of the Fourier magnitude distribution of f. The two contributions to this total risk are the approximation error and the estimation error. Approximation error refers to the distance between the target function and the closest neural network function of a given architecture, and estimation error refers to the distance between this ideal network function and an estimated network function. With n ~ C_f (N/(d log N))^(1/2) nodes, the order of the bound on the mean integrated squared error is optimized to be O(C_f ((d/N) log N)^(1/2)). The bound demonstrates surprisingly favorable properties of network estimation compared to traditional series and nonparametric curve estimation techniques in the case that d is moderately large. Similar bounds are obtained when the number of nodes n is not preselected as a function of C_f (which is generally not known a priori), but rather the number of nodes is optimized from the observed data by the use of a complexity regularization or minimum description length criterion. The analysis involves Fourier techniques for the approximation error, metric entropy considerations for the estimation error, and a calculation of the index of resolvability of minimum complexity estimation of the family of networks.
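The quoted choice of n comes from balancing the approximation and estimation terms of the bound. Treating constants and the log factor loosely (a routine calculation, not quoted from the paper):

\[
  \min_{n}\;\frac{C_f^{2}}{n} + \frac{nd}{N}\log N
  \quad\text{is attained at}\quad
  n^{*} = C_f\sqrt{\frac{N}{d\log N}},
\]
\[
  \text{giving}\qquad
  \frac{C_f^{2}}{n^{*}} + \frac{n^{*}d}{N}\log N
  = 2\,C_f\sqrt{\frac{d\log N}{N}}
  = O\!\left(C_f\left(\tfrac{d}{N}\log N\right)^{1/2}\right).
\]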
IEEE Transactions on Information Theory | 1990
Bertrand Clarke; Andrew R. Barron
In the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. The authors examine the relative entropy distance D_n between the true density and the Bayesian density and show that the asymptotic distance is (d/2) log n + c, where d is the dimension of the parameter vector. Therefore, the relative entropy rate D_n/n converges to zero at rate (log n)/n. The constant c, which the authors explicitly identify, depends only on the prior density function and the Fisher information matrix evaluated at the true parameter value. Consequences are given for density estimation, universal data compression, composite hypothesis testing, and stock-market portfolio selection.
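A small numerical check of this asymptotic formula is sketched below for a hypothetical one-parameter Bernoulli(theta) family with a uniform prior, where d = 1, the Fisher information is I(theta) = 1/(theta(1-theta)), the prior density is 1, and the mixture probability of a sequence with k successes is k!(n-k)!/(n+1)!. The family and the prior are illustrative choices, not taken from the paper.

import math

theta = 0.3
for n in (10, 100, 1000):
    exact = 0.0
    for k in range(n + 1):
        p_k = math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)        # P(k successes | theta)
        log_p_seq = k * math.log(theta) + (n - k) * math.log(1 - theta)    # log P(x^n | theta), one sequence
        log_m_seq = (math.lgamma(k + 1) + math.lgamma(n - k + 1)
                     - math.lgamma(n + 2))                                 # log mixture probability of that sequence
        exact += p_k * (log_p_seq - log_m_seq)
    asym = 0.5 * math.log(n / (2 * math.pi * math.e)) + 0.5 * math.log(1 / (theta * (1 - theta)))
    print(f"n = {n:5d}   exact D_n = {exact:.4f}   asymptotic (d/2) log n + c = {asym:.4f}   (nats)")

The two columns should agree up to a term that vanishes as n grows, which is the content of the asymptotic expansion described above.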
Journal of Statistical Planning and Inference | 1994
Bertrand Clarke; Andrew R. Barron
We provide a rigorous proof that Jeffreys' prior asymptotically maximizes Shannon's mutual information between a sample of size n and the parameter. This was conjectured by Bernardo (1979) and, despite the absence of a proof, forms the basis of the reference prior method in Bayesian statistical analysis. Our proof rests on an examination of large sample decision theoretic properties associated with the relative entropy, or Kullback-Leibler distance, between probability density functions for independent and identically distributed random variables. For smooth finite-dimensional parametric families we derive an asymptotic expression for the minimax risk and for the related maximin risk. As a result, we show that, among continuous positive priors, Jeffreys' prior uniquely achieves the asymptotic maximin value. In the discrete parameter case we show that, asymptotically, the Bayes risk reduces to the entropy of the prior, so that the reference prior is seen to be the maximum entropy prior. We identify the physical significance of the risks by giving two information-theoretic interpretations in terms of probabilistic coding.
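The maximin property can be illustrated numerically for a hypothetical Bernoulli family: compare the Shannon mutual information between the parameter and a sample of size n under Jeffreys' prior (Beta(1/2, 1/2) here) and under a uniform prior; the Jeffreys value is expected to come out slightly larger. The grid discretization of the parameter, the choice n = 100, and the SciPy dependency are assumptions made for this sketch.

import numpy as np
from scipy.stats import binom   # assumed available

n = 100
theta = np.linspace(0.0005, 0.9995, 2000)               # interior grid for the parameter
lik = binom.pmf(np.arange(n + 1)[:, None], n, theta)    # P(K = k | theta), shape (n+1, grid size)

def mutual_information(weights):
    """I(Theta; K) in nats for a prior given by grid weights; K (the success count) is sufficient for X^n."""
    w = weights / weights.sum()
    marginal = lik @ w                                   # m(k) = sum over theta of w(theta) P(k | theta)
    ratio = np.where(lik > 0, lik / marginal[:, None], 1.0)
    return float(np.sum(w * np.sum(lik * np.log(ratio), axis=0)))

uniform = np.ones_like(theta)
jeffreys = 1.0 / np.sqrt(theta * (1.0 - theta))          # proportional to the Beta(1/2, 1/2) density
print(f"I(Theta; X^n), uniform prior:  {mutual_information(uniform):.4f} nats")
print(f"I(Theta; X^n), Jeffreys prior: {mutual_information(jeffreys):.4f} nats")

With these settings the Jeffreys value should exceed the uniform one by roughly log(pi) - 1, about 0.14 nats, in line with the asymptotic expression for the Bayes risk.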
Annals of Statistics | 2008
Andrew R. Barron; Albert Cohen; Wolfgang Dahmen; Ronald A. DeVore
IEEE Transactions on Information Theory | 2006
Gilbert Leung; Andrew R. Barron
international symposium on information theory | 1997
Qun Xie; Andrew R. Barron
Nonparametric Functional Estimation and Related Topics | 1991
Andrew R. Barron