Publication


Featured research published by Andrew R. Barron.


IEEE Transactions on Information Theory | 1993

Universal approximation bounds for superpositions of a sigmoidal function

Andrew R. Barron

Approximation properties of a class of artificial neural networks are established. It is shown that feedforward networks with one layer of sigmoidal nonlinearities achieve integrated squared error of order O(1/n), where n is the number of nodes. The approximated function is assumed to have a bound on the first moment of the magnitude distribution of the Fourier transform. The nonlinear parameters associated with the sigmoidal nodes, as well as the parameters of linear combination, are adjusted in the approximation. In contrast, it is shown that for series expansions with n terms, in which only the parameters of linear combination are adjusted, the integrated squared approximation error cannot be made smaller than order 1/n^(2/d) uniformly for functions satisfying the same smoothness assumption, where d is the dimension of the input to the function. For the class of functions examined, the approximation rate and the parsimony of the parameterization of the networks are shown to be advantageous in high-dimensional settings.
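
For orientation, the central bound can be stated as follows; this is a paraphrase under the usual assumptions (f defined on a ball B_r of radius r, mu an arbitrary probability measure on B_r, and the constant as typically quoted), not the paper's exact wording:

C_f = \int_{\mathbb{R}^d} |\omega|\,|\hat f(\omega)|\, d\omega < \infty
\quad\Longrightarrow\quad
\min_{f_n}\ \int_{B_r} \big(f(x) - f_n(x)\big)^2\, \mu(dx) \;\le\; \frac{(2 r C_f)^2}{n},

where the minimum is over one-hidden-layer sigmoidal networks f_n with n nodes.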


IEEE Transactions on Information Theory | 1998

The minimum description length principle in coding and modeling

Andrew R. Barron; Jorma Rissanen; Bin Yu

We review the principles of minimum description length and stochastic complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon's basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples.
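
As a concrete anchor for the coding notions above, the normalized maximum likelihood code length for a d-parameter model class with maximum-likelihood estimator \hat\theta can be written in its standard form (a paraphrase under the usual regularity conditions, not the paper's notation):

L_{\mathrm{NML}}(x^n) \;=\; -\log p\big(x^n \mid \hat\theta(x^n)\big) \;+\; \log \sum_{y^n} p\big(y^n \mid \hat\theta(y^n)\big),
\qquad
\log \sum_{y^n} p\big(y^n \mid \hat\theta(y^n)\big) \;=\; \frac{d}{2}\log\frac{n}{2\pi} \;+\; \log \int \sqrt{\det I(\theta)}\, d\theta \;+\; o(1),

with I(\theta) the Fisher information. The mixture and predictive codes mentioned above attain the same leading terms, which is the sense in which each achieves the stochastic complexity.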


IEEE Transactions on Information Theory | 1991

Minimum complexity density estimation

Andrew R. Barron; Thomas M. Cover

The authors introduce an index of resolvability that is proved to bound the rate of convergence of minimum complexity density estimators as well as the information-theoretic redundancy of the corresponding total description length. The results on the index of resolvability demonstrate the statistical effectiveness of the minimum description-length principle as a method of inference. The minimum complexity estimator converges to the true density nearly as fast as an estimator based on prior knowledge of the true subclass of densities. Interpretations and basic properties of minimum complexity estimators are discussed. Some regression and classification problems that can be examined within the minimum description-length framework are considered.
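
For reference, the index of resolvability takes the following standard form, with L_n(q) the description length assigned to a candidate density q in the list \Gamma_n and D the Kullback-Leibler divergence (a paraphrase of the definition, not the paper's exact statement):

R_n(p) \;=\; \min_{q \in \Gamma_n} \left\{ \frac{L_n(q)}{n} \;+\; D(p \,\|\, q) \right\},

and the squared Hellinger risk of the minimum complexity estimator is bounded by a constant multiple of R_n(p), which is the convergence-rate statement summarized above.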


Conference on Learning Theory | 1991

Approximation and estimation bounds for artificial neural networks

Andrew R. Barron

For a common class of artificial neural networks, the mean integrated squared error between the estimated network and a target function f is shown to be bounded by

O\left(\frac{C_f^2}{n}\right) + O\left(\frac{nd}{N}\log N\right),

where n is the number of nodes, d is the input dimension of the function, N is the number of training observations, and C_f is the first absolute moment of the Fourier magnitude distribution of f. The two contributions to this total risk are the approximation error and the estimation error. Approximation error refers to the distance between the target function and the closest neural network function of a given architecture, and estimation error refers to the distance between this ideal network function and an estimated network function. With n ~ C_f (N/(d log N))^{1/2} nodes, the order of the bound on the mean integrated squared error is optimized to be O(C_f ((d/N) log N)^{1/2}). The bound demonstrates surprisingly favorable properties of network estimation compared to traditional series and nonparametric curve estimation techniques in the case that d is moderately large. Similar bounds are obtained when the number of nodes n is not preselected as a function of C_f (which is generally not known a priori), but rather the number of nodes is optimized from the observed data by the use of a complexity regularization or minimum description length criterion. The analysis involves Fourier techniques for the approximation error, metric entropy considerations for the estimation error, and a calculation of the index of resolvability of minimum complexity estimation of the family of networks.
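
As a quick consistency check on the quoted rate (elementary calculus applied to the displayed bound, not an argument taken from the paper), minimizing C_f^2/n + (nd/N) log N over the number of nodes n gives

-\frac{C_f^2}{n^2} + \frac{d}{N}\log N = 0
\;\;\Longrightarrow\;\;
n^{*} = C_f\left(\frac{N}{d\log N}\right)^{1/2},
\qquad
\frac{C_f^2}{n^{*}} + \frac{n^{*}d}{N}\log N = 2\,C_f\left(\frac{d}{N}\log N\right)^{1/2},

which matches the O(C_f ((d/N) log N)^{1/2}) order stated above.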


IEEE Transactions on Information Theory | 1990

Information-theoretic asymptotics of Bayes methods

Bertrand Clarke; Andrew R. Barron

In the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. The authors examine the relative entropy distance D_n between the true density and the Bayesian density and show that the asymptotic distance is (d/2)(log n) + c, where d is the dimension of the parameter vector. Therefore, the relative entropy rate D_n/n converges to zero at rate (log n)/n. The constant c, which the authors explicitly identify, depends only on the prior density function and the Fisher information matrix evaluated at the true parameter value. Consequences are given for density estimation, universal data compression, composite hypothesis testing, and stock-market portfolio selection.
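
For reference, the constant c identified in this work is usually written out as follows (the standard form of the expansion, with w the prior density and I(\theta) the Fisher information at the true parameter; a paraphrase rather than a quotation):

D_n \;=\; \frac{d}{2}\log\frac{n}{2\pi e} \;+\; \frac{1}{2}\log\det I(\theta) \;+\; \log\frac{1}{w(\theta)} \;+\; o(1).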


Journal of Statistical Planning and Inference | 1994

Jeffreys prior is asymptotically least favorable under entropy risk

Bertrand Clarke; Andrew R. Barron

We provide a rigorous proof that Jeffreys’ prior asymptotically maximizes Shannon’s mutual information between a sample of size n and the parameter. This was conjectured by Bernardo (1979) and, despite the absence of a proof, forms the basis of the reference prior method in Bayesian statistical analysis. Our proof rests on an examination of large sample decision theoretic properties associated with the relative entropy or the Kullback-Leibler distance between probability density functions for independent and identically distributed random variables. For smooth finite-dimensional parametric families we derive an asymptotic expression for the minimax risk and for the related maximin risk. As a result, we show that, among continuous positive priors, Jeffreys’ prior uniquely achieves the asymptotic maximin value. In the discrete parameter case we show that, asymptotically, the Bayes risk reduces to the entropy of the prior so that the reference prior is seen to be the maximum entropy prior. We identify the physical significance of the risks by giving two information-theoretic interpretations in terms of probabilistic coding.
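
The asymptotic maximin value referred to here has a standard closed form (stated from the usual formulation over a compact parameter set \Theta with Fisher information I(\theta); a paraphrase, not the paper's statement):

\sup_{w} I(\Theta; X^n) \;=\; \frac{d}{2}\log\frac{n}{2\pi e} \;+\; \log \int_{\Theta} \sqrt{\det I(\theta)}\, d\theta \;+\; o(1),
\qquad
w_J(\theta) \;=\; \frac{\sqrt{\det I(\theta)}}{\int_{\Theta}\sqrt{\det I(\theta')}\, d\theta'},

with the supremum of the mutual information asymptotically attained by Jeffreys’ prior w_J.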



Annals of Statistics | 2008

Approximation and learning by greedy algorithms

Andrew R. Barron; Albert Cohen; Wolfgang Dahmen; Ronald A. DeVore


IEEE Transactions on Information Theory | 2006

Information Theory and Mixing Least-Squares Regressions

Gilbert Leung; Andrew R. Barron



International Symposium on Information Theory | 1997

Asymptotic minimax regret for data compression, gambling and prediction

Qun Xie; Andrew R. Barron



Nonparametric Functional Estimation and Related Topics | 1991

Complexity Regularization with Application to Artificial Neural Networks

Andrew R. Barron


Collaboration


Dive into Andrew R. Barron's collaboration.

Top Co-Authors

Lidija Jakobek (Josip Juraj Strossmayer University of Osijek)
Gerald H. L. Cheang (National Institute of Education)
Bin Yu (University of California)