Publication


Featured research published by Gábor Lugosi.


Archive | 2001

Combinatorial methods in density estimation

Luc Devroye; Gábor Lugosi

1. Introduction.- 1.1. References.- 2. Concentration Inequalities.- 2.1. Hoeffding's Inequality.- 2.2. An Inequality for the Expected Maximal Deviation.- 2.3. The Bounded Difference Inequality.- 2.4. Examples.- 2.5. Bibliographic Remarks.- 2.6. Exercises.- 2.7. References.- 3. Uniform Deviation Inequalities.- 3.1. The Vapnik-Chervonenkis Inequality.- 3.2. Covering Numbers and Chaining.- 3.3. Example: The Dvoretzky-Kiefer-Wolfowitz Theorem.- 3.4. Bibliographic Remarks.- 3.5. Exercises.- 3.6. References.- 4. Combinatorial Tools.- 4.1. Shatter Coefficients.- 4.2. Vapnik-Chervonenkis Dimension and Shatter Coefficients.- 4.3. Vapnik-Chervonenkis Dimension and Covering Numbers.- 4.4. Examples.- 4.5. Bibliographic Remarks.- 4.6. Exercises.- 4.7. References.- 5. Total Variation.- 5.1. Density Estimation.- 5.2. The Total Variation.- 5.3. Invariance.- 5.4. Mappings.- 5.5. Convolutions.- 5.6. Normalization.- 5.7. The Lebesgue Density Theorem.- 5.8. LeCam's Inequality.- 5.9. Bibliographic Remarks.- 5.10. Exercises.- 5.11. References.- 6. Choosing a Density Estimate.- 6.1. Choosing Between Two Densities.- 6.2. Examples.- 6.3. Is the Factor of Three Necessary?.- 6.4. Maximum Likelihood Does Not Work.- 6.5. L2 Distances Are To Be Avoided.- 6.6. Selection from k Densities.- 6.7. Examples Continued.- 6.8. Selection from an Infinite Class.- 6.9. Bibliographic Remarks.- 6.10. Exercises.- 6.11. References.- 7. Skeleton Estimates.- 7.1. Kolmogorov Entropy.- 7.2. Skeleton Estimates.- 7.3. Robustness.- 7.4. Finite Mixtures.- 7.5. Monotone Densities on the Hypercube.- 7.6. How To Make Gigantic Totally Bounded Classes.- 7.7. Bibliographic Remarks.- 7.8. Exercises.- 7.9. References.- 8. The Minimum Distance Estimate: Examples.- 8.1. Problem Formulation.- 8.2. Series Estimates.- 8.3. Parametric Estimates: Exponential Families.- 8.4. Neural Network Estimates.- 8.5. Mixture Classes, Radial Basis Function Networks.- 8.6. Bibliographic Remarks.- 8.7. Exercises.- 8.8. References.- 9. The Kernel Density Estimate.- 9.1. Approximating Functions by Convolutions.- 9.2. Definition of the Kernel Estimate.- 9.3. Consistency of the Kernel Estimate.- 9.4. Concentration.- 9.5. Choosing the Bandwidth.- 9.6. Choosing the Kernel.- 9.7. Rates of Convergence.- 9.8. Uniform Rate of Convergence.- 9.9. Shrinkage, and the Combination of Density Estimates.- 9.10. Bibliographic Remarks.- 9.11. Exercises.- 9.12. References.- 10. Additive Estimates and Data Splitting.- 10.1. Data Splitting.- 10.2. Additive Estimates.- 10.3. Histogram Estimates.- 10.4. Bibliographic Remarks.- 10.5. Exercises.- 10.6. References.- 11. Bandwidth Selection for Kernel Estimates.- 11.1. The Kernel Estimate with Riemann Kernel.- 11.2. General Kernels, Kernel Complexity.- 11.3. Kernel Complexity: Univariate Examples.- 11.4. Kernel Complexity: Multivariate Kernels.- 11.5. Asymptotic Optimality.- 11.6. Bibliographic Remarks.- 11.7. Exercises.- 11.8. References.- 12. Multiparameter Kernel Estimates.- 12.1. Multivariate Kernel Estimates-Product Kernels.- 12.2. Multivariate Kernel Estimates-Ellipsoidal Kernels.- 12.3. Variable Kernel Estimates.- 12.4. Tree-Structured Partitions.- 12.5. Changepoints and Bump Hunting.- 12.6. Bibliographic Remarks.- 12.7. Exercises.- 12.8. References.- 13. Wavelet Estimates.- 13.1. Definitions.- 13.2. Smoothing.- 13.3. Thresholding.- 13.4. Soft Thresholding.- 13.5. Bibliographic Remarks.- 13.6. Exercises.- 13.7. References.- 14. The Transformed Kernel Estimate.- 14.1. The Transformed Kernel Estimate.- 14.2. Box-Cox Transformations.- 14.3. Piecewise Linear Transformations.- 14.4. Bibliographic Remarks.- 14.5. Exercises.- 14.6. References.- 15. Minimax Theory.- 15.1. Estimating a Density from One Data Point.- 15.2. The General Minimax Problem.- 15.3. Rich Classes.- 15.4. Assouad's Lemma.- 15.5. Example: The Class of Convex Densities.- 15.6. Additional Examples.- 15.7. Tuning the Parameters of Variable Kernel Estimates.- 15.8. Sufficient Statistics.- 15.9. Bibliographic Remarks.- 15.10. Exercises.- 15.11. References.- 16. Choosing the Kernel Order.- 16.1. Introduction.- 16.2. Standard Kernel Estimate: Riemann Kernels.- 16.3. Standard Kernel Estimates: General Kernels.- 16.4. An Infinite Family of Kernels.- 16.5. Bibliographic Remarks.- 16.6. Exercises.- 16.7. References.- 17. Bandwidth Choice with Superkernels.- 17.1. Superkernels.- 17.2. The Trapezoidal Kernel.- 17.3. Bandwidth Selection.- 17.4. Bibliographic Remarks.- 17.5. Exercises.- 17.6. References.- Author Index.


Lecture Notes in Computer Science | 2004

Introduction to Statistical Learning Theory

Olivier Bousquet; Stéphane Boucheron; Gábor Lugosi

The goal of statistical learning theory is to study, in a statistical framework, the properties of learning algorithms. In particular, most results take the form of so-called error bounds. This tutorial introduces the techniques that are used to obtain such results.
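
As an illustration of the kind of error bound meant here (the standard finite-class bound obtained from Hoeffding's inequality and a union bound, stated for orientation rather than quoted from the tutorial): if $\mathcal{F}$ is a finite class of classifiers, $R(f)$ the true risk, and $\widehat{R}_n(f)$ the empirical risk on $n$ i.i.d. samples, then with probability at least $1-\delta$,

\[
\sup_{f \in \mathcal{F}} \bigl| R(f) - \widehat{R}_n(f) \bigr|
\;\le\; \sqrt{\frac{\log \lvert \mathcal{F} \rvert + \log(2/\delta)}{2n}} .
\]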


Machine Learning | 2002

Model Selection and Error Estimation

Peter L. Bartlett; Stéphane Boucheron; Gábor Lugosi

We study model selection strategies based on penalized empirical loss minimization. We point out a tight relationship between error estimation and data-based complexity penalization: any good error estimate may be converted into a data-based penalty function and the performance of the estimate is governed by the quality of the error estimate. We consider several penalty functions, involving error estimates on independent test data, empirical VC dimension, empirical VC entropy, and margin-based quantities. We also consider the maximal difference between the error on the first half of the training data and the second half, and the expected maximal discrepancy, a closely related capacity estimate that can be calculated by Monte Carlo integration. Maximal discrepancy penalty functions are appealing for pattern classification problems, since their computation is equivalent to empirical risk minimization over the training data with some labels flipped.
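
A minimal sketch of the maximal-discrepancy penalty described above, under two assumptions we introduce for illustration: the class is a toy family of one-dimensional threshold classifiers, and the "first half" and "second half" are simply the order of the sample. The helper names (erm_threshold, maximal_discrepancy) are ours, not the paper's.

```python
# Hedged sketch of the maximal-discrepancy penalty for a toy class of 1-D threshold
# classifiers f_{t,s}(x) = s * sign(x - t), using the label-flipping equivalence:
#   max_f [err_2(f) - err_1(f)] = 1 - 2 * min_f (empirical error with 2nd-half labels flipped),
# so computing the penalty costs exactly one extra empirical risk minimization.
import numpy as np

def erm_threshold(x, y):
    """Brute-force ERM over threshold classifiers; returns (threshold, sign, min error)."""
    best = (0.0, 1, 1.0)
    for t in np.concatenate(([-np.inf], np.sort(x))):
        for s in (-1, 1):
            err = np.mean(s * np.where(x > t, 1, -1) != y)
            if err < best[2]:
                best = (t, s, err)
    return best

def maximal_discrepancy(x, y):
    """Max over the class of (error on 2nd half - error on 1st half), via label flipping."""
    n = len(y) // 2 * 2               # use an even number of points
    z = y[:n].copy()
    z[n // 2:] *= -1                  # flip the labels of the second half
    _, _, min_err = erm_threshold(x[:n], z)
    return 1.0 - 2.0 * min_err

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = np.where(x + 0.3 * rng.normal(size=200) > 0, 1, -1)
t, s, train_err = erm_threshold(x, y)
print("training error:", train_err, "discrepancy penalty:", maximal_discrepancy(x, y))
```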


Random Structures and Algorithms | 2000

A sharp concentration inequality with application

Stéphane Boucheron; Gábor Lugosi; Pascal Massart

We present a new general concentration-of-measure inequality and illustrate its power by applications in random combinatorics. The results find direct applications in some problems of learning theory.


IEEE Transactions on Information Theory | 1998

Learning pattern classification-a survey

Sanjeev R. Kulkarni; Gábor Lugosi; Santosh S. Venkatesh

Classical and recent results in statistical pattern recognition and learning theory are reviewed in a two-class pattern classification setting. This basic model best illustrates intuition and analysis techniques while still containing the essential features and serving as a prototype for many applications. Topics discussed include nearest neighbor, kernel, and histogram methods, Vapnik-Chervonenkis theory, and neural networks. The presentation and the large (though nonexhaustive) list of references are geared to provide a useful overview of this field for both specialists and nonspecialists.
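
As a concrete instance of one of the rules surveyed here, a minimal k-nearest-neighbor classifier; this is our own plain-NumPy illustration, not code from the survey.

```python
# Minimal k-nearest-neighbor classification rule (one of the methods surveyed above).
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict by majority vote among the k nearest training points (Euclidean distance)."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)     # distances to all training points
        nearest = y_train[np.argsort(d)[:k]]        # labels of the k closest points
        values, counts = np.unique(nearest, return_counts=True)
        preds.append(values[np.argmax(counts)])     # majority vote
    return np.array(preds)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(knn_predict(X, y, rng.normal(size=(5, 2)), k=5))
```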


Annals of Probability | 2005

Moment inequalities for functions of independent random variables

Stéphane Boucheron; Olivier Bousquet; Gábor Lugosi; Pascal Massart

A general method for obtaining moment inequalities for functions of independent random variables is presented. It is a generalization of the entropy method which has been used to derive concentration inequalities for such functions [Boucheron, Lugosi and Massart, Ann. Probab. 31 (2003) 1583-1614], and is based on a generalized tensorization inequality due to Latala and Oleszkiewicz [Lecture Notes in Math. 1745 (2000) 147-168]. The new inequalities prove to be a versatile tool in a wide range of applications. We illustrate the power of the method by showing how it can be used to effortlessly re-derive classical inequalities, including Rosenthal and Kahane-Khinchine-type inequalities for sums of independent random variables, moment inequalities for suprema of empirical processes, and moment inequalities for Rademacher chaos and U-statistics. Some of these corollaries are apparently new. In particular, we generalize Talagrand's exponential inequality for Rademacher chaos of order 2 to any order. We also discuss applications to other complex functions of independent random variables, such as suprema of Boolean polynomials, which include, as special cases, subgraph counting problems in random graphs.
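
For orientation, the second-moment special case that this framework extends is the classical Efron-Stein inequality (stated here for reference; the paper's contribution is the analogous bounds for higher moments): if $Z = f(X_1,\dots,X_n)$ with independent $X_i$, and $Z_i'$ is obtained by replacing $X_i$ with an independent copy $X_i'$, then

\[
\operatorname{Var}(Z) \;\le\; \frac{1}{2}\, \mathbb{E}\sum_{i=1}^{n} \bigl( Z - Z_i' \bigr)^{2},
\qquad
Z_i' = f(X_1,\dots,X_{i-1},X_i',X_{i+1},\dots,X_n).
\]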


international symposium on information theory | 1994

Rates of convergence in the source coding theorem, in empirical quantizer design, and in universal lossy source coding

Tamás Linder; Gábor Lugosi; Kenneth Zeger

Rate of convergence results are established for vector quantization. Convergence rates are given for an increasing vector dimension and/or an increasing training set size. In particular, the following results are shown for memoryless real-valued sources with bounded support at transmission rate R. (1) If a vector quantizer with fixed dimension k is designed to minimize the empirical mean-square error (MSE) with respect to m training vectors, then its MSE for the true source converges in expectation and almost surely to the minimum possible MSE as O(√(log m / m)). (2) The MSE of an optimal k-dimensional vector quantizer for the true source converges, as the dimension grows, to the distortion-rate function D(R) as O(√(log k / k)). (3) There exists a fixed-rate universal lossy source coding scheme whose per-letter MSE on n real-valued source samples converges in expectation and almost surely to the distortion-rate function D(R) as O(√(log log n / log n)). (4) Consider a training set of n real-valued source samples blocked into vectors of dimension k, and a k-dimensional vector quantizer designed to minimize the empirical MSE with respect to the m = ⌊n/k⌋ training vectors. Then the per-letter MSE of this quantizer for the true source converges in expectation and almost surely to the distortion-rate function D(R) as O(√(log log n / log n)), if one chooses k = ⌊(1/R)(1 − ε) log n⌋ for any ε ∈ (0, 1).
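
The empirical design step underlying these results, fitting a dimension-k quantizer at rate R to training vectors by (approximately) minimizing the empirical MSE, can be sketched with plain Lloyd/k-means iterations. The code below is our own illustration under that reading, not the paper's construction; names and parameter choices are ours.

```python
# Sketch of empirical quantizer design: a fixed-rate quantizer of dimension k at rate R
# bits per sample has N = 2**(k*R) codevectors, fit by (approximately) minimizing the
# empirical mean squared error on the training vectors via Lloyd iterations.
import numpy as np

def design_quantizer(train, rate, iters=50, seed=0):
    """train: (m, k) training vectors; returns an (N, k) codebook."""
    m, k = train.shape
    N = 2 ** (k * rate)                                   # codebook size at rate R
    rng = np.random.default_rng(seed)
    codebook = train[rng.choice(m, size=N, replace=False)]
    for _ in range(iters):                                # Lloyd iterations
        d = ((train[:, None, :] - codebook[None]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)                         # nearest-codeword partition
        for j in range(N):                                # centroid update per cell
            if np.any(assign == j):
                codebook[j] = train[assign == j].mean(axis=0)
    return codebook

def per_letter_mse(codebook, data):
    d = ((data[:, None, :] - codebook[None]) ** 2).sum(axis=2)
    return d.min(axis=1).mean() / data.shape[1]           # distortion per source letter

rng = np.random.default_rng(1)
k, R = 2, 2
train = rng.uniform(-1, 1, size=(1000, k))                # bounded-support memoryless source
test = rng.uniform(-1, 1, size=(5000, k))
codebook = design_quantizer(train, R)
print("per-letter MSE on fresh data:", per_letter_mse(codebook, test))
```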


Machine Learning | 2003

Potential-Based Algorithms in On-Line Prediction and Game Theory

Nicolò Cesa-Bianchi; Gábor Lugosi

In this paper we show that several known algorithms for sequential prediction problems (including Weighted Majority and the quasi-additive family of Grove, Littlestone, and Schuurmans), for playing iterated games (including Freund and Schapire's Hedge and MW, as well as the Λ-strategies of Hart and Mas-Colell), and for boosting (including AdaBoost) are special cases of a general decision strategy based on the notion of potential. By analyzing this strategy we derive known performance bounds, as well as new bounds, as simple corollaries of a single general theorem. Besides offering a new and unified view on a large family of algorithms, we establish a connection between potential-based analysis in learning and its counterparts independently developed in game theory. By exploiting this connection, we show that certain learning problems are instances of more general game-theoretic problems. In particular, we describe a notion of generalized regret and show its applications in learning theory.
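
As one concrete member of this potential-based family, here is a minimal sketch of the exponentially weighted (Hedge-style) forecaster; the learning-rate tuning and the toy loss sequence are our own choices, not taken from the paper.

```python
# Minimal exponentially weighted ("Hedge"-style) forecaster over N experts.
import numpy as np

def hedge(loss_matrix, eta):
    """loss_matrix: (T, N) losses in [0, 1].  Returns (forecaster loss, best expert loss)."""
    T, N = loss_matrix.shape
    log_w = np.zeros(N)                      # log-weights, initially uniform
    total = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                         # current distribution over experts
        total += p @ loss_matrix[t]          # expected loss of the randomized forecaster
        log_w -= eta * loss_matrix[t]        # multiplicative (exponential) weight update
    return total, loss_matrix.sum(axis=0).min()

rng = np.random.default_rng(2)
losses = rng.uniform(size=(1000, 10))
losses[:, 3] *= 0.3                          # expert 3 is better on average
alg, best = hedge(losses, eta=np.sqrt(8 * np.log(10) / 1000))
print("Hedge cumulative loss:", round(alg, 1), " best expert:", round(best, 1))
```

With this standard tuning the forecaster's regret against the best expert is of order √(T log N), one instance of the bounds that the potential-based analysis recovers.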


IEEE Transactions on Pattern Analysis and Machine Intelligence | 1993

Fast nearest-neighbor search in dissimilarity spaces

András Faragó; Tamás Linder; Gábor Lugosi

A fast nearest-neighbor algorithm is presented. It works in general spaces in which the known cell techniques cannot be implemented for various reasons, such as the absence of coordinate structure or high dimensionality. The central idea has already appeared several times in the literature with extensive computer simulation results. An exact probabilistic analysis of this family of algorithms that proves its O(1) asymptotic average complexity, measured in the number of dissimilarity calculations, is presented.
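
The complexity measure above counts dissimilarity evaluations. The sketch below is only the brute-force baseline in an arbitrary dissimilarity space, with an explicit evaluation counter to make that cost model concrete; it is not the paper's constant-average-cost algorithm, and the example dissimilarity is our own.

```python
# Brute-force nearest-neighbor search in a general dissimilarity space, counting the
# number of dissimilarity evaluations (the cost model used in the abstract above).
import numpy as np

def nearest_neighbor(query, data, dissim, counter):
    """Return the index of the data point minimizing dissim(query, x)."""
    best_i, best_d = -1, np.inf
    for i, x in enumerate(data):
        counter[0] += 1                      # one dissimilarity calculation
        d = dissim(query, x)
        if d < best_d:
            best_i, best_d = i, d
    return best_i

dissim = lambda a, b: np.sum(np.abs(a - b) ** 0.5)   # an example non-Euclidean dissimilarity
rng = np.random.default_rng(3)
data = rng.normal(size=(500, 8))
calls = [0]
idx = nearest_neighbor(rng.normal(size=8), data, dissim, calls)
print("nearest index:", idx, " dissimilarity calls:", calls[0])
```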


IEEE Transactions on Information Theory | 2008

On the Performance of Clustering in Hilbert Spaces

Gérard Biau; Luc Devroye; Gábor Lugosi

Based on randomly drawn vectors in a separable Hilbert space, one may construct a k-means clustering scheme by minimizing an empirical squared error. We investigate the risk of such a clustering scheme, defined as the expected squared distance of a random vector X from the set of cluster centers. Our main result states that, for an almost surely bounded X, the expected excess clustering risk is O(√(1/n)). Since clustering in high- (or even infinite-) dimensional spaces may lead to severe computational problems, we examine the properties of a dimension reduction strategy for clustering based on Johnson-Lindenstrauss-type random projections. Our results reflect a tradeoff between accuracy and computational complexity when one uses k-means clustering after random projection of the data to a low-dimensional space. We argue that random projections work better than other simplistic dimension reduction schemes.
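
A minimal sketch of the dimension-reduction strategy discussed above, assuming a Gaussian (Johnson-Lindenstrauss-type) random projection followed by plain Lloyd iterations; this is our own illustration of the idea, not the paper's procedure, and the dimensions and cluster count are arbitrary.

```python
# Random projection to a low-dimensional space, then k-means (Lloyd iterations).
import numpy as np

def random_projection(X, target_dim, seed=0):
    """Project rows of X with a scaled Gaussian matrix (Johnson-Lindenstrauss style)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(X.shape[1], target_dim)) / np.sqrt(target_dim)
    return X @ P

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd iterations; returns centers and the empirical clustering risk."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)            # assign each point to its nearest center
        for j in range(k):                   # recompute centers as cluster means
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    d = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2)
    return centers, d.min(axis=1).mean()     # empirical squared-distance risk

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 500))             # high-dimensional data
Z = random_projection(X, target_dim=20)      # reduce dimension before clustering
_, risk = kmeans(Z, k=10)
print("empirical clustering risk after projection:", round(risk, 3))
```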

Collaboration


Dive into Gábor Lugosi's collaborations.

Top Co-Authors

László Györfi

Budapest University of Technology and Economics

Kenneth Zeger

University of California

Gilles Stoltz

École Normale Supérieure
