Publications


Featured research published by Alexander J. Smola.


Neural Computation | 1998

Nonlinear component analysis as a kernel eigenvalue problem

Bernhard Schölkopf; Alexander J. Smola; Klaus-Robert Müller

A new method for performing a nonlinear form of principal component analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible five-pixel products in 16×16 images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
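
The computation sketched in the abstract reduces to an eigendecomposition of the centered kernel matrix. Below is a minimal, illustrative NumPy sketch of that recipe, not code from the paper; the polynomial degree and the toy data are arbitrary choices.

```python
import numpy as np

def kernel_pca(X, n_components=2, degree=5):
    """Nonlinear PCA via a polynomial kernel k(x, y) = (x . y)^degree."""
    n = X.shape[0]
    K = (X @ X.T) ** degree                                # Gram matrix
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n     # center in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)                  # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]         # keep the leading components
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    return Kc @ alphas                                     # projections of the training data

X = np.random.RandomState(0).randn(100, 3)
print(kernel_pca(X).shape)  # (100, 2)
```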


Statistics and Computing | 2004

A tutorial on support vector regression

Alexander J. Smola; Bernhard Schölkopf

In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
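
As a usage illustration only (the tutorial itself is library-agnostic), scikit-learn's SVR exposes the two quantities discussed throughout: the width epsilon of the insensitive tube and the regularization constant C.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# epsilon sets the tube width, C the penalty on points outside the tube
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale").fit(X, y)
print("number of support vectors:", len(model.support_))
```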


Neural Computation | 2001

Estimating the Support of a High-Dimensional Distribution

Bernhard Schölkopf; John Platt; John Shawe-Taylor; Alexander J. Smola; Robert C. Williamson

Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a simple subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
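
For illustration (the paper predates the library), scikit-learn's OneClassSVM implements this single-class estimator; its nu parameter plays the role of the a priori specified fraction of points allowed to fall outside S.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X_train = 0.3 * rng.randn(200, 2)                        # data from the "normal" region
X_test = np.vstack([0.3 * rng.randn(20, 2),
                    rng.uniform(-4, 4, size=(20, 2))])   # mix of inliers and outliers

clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(X_train)
pred = clf.predict(X_test)                               # +1 inside the estimated support, -1 outside
print((pred == -1).sum(), "test points flagged as outside S")
```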


Neural Computation | 2000

New Support Vector Algorithms

Bernhard Schölkopf; Alexander J. Smola; Robert C. Williamson; Peter L. Bartlett

We propose a new class of support vector algorithms for regression and classification. In these algorithms, a parameter ν lets one effectively control the number of support vectors. While this can be useful in its own right, the parameterization has the additional benefit of enabling us to eliminate one of the other free parameters of the algorithm: the accuracy parameter ε in the regression case, and the regularization constant C in the classification case. We describe the algorithms, give some theoretical results concerning the meaning and the choice of ν, and report experimental results.
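
A brief, hedged usage sketch (not the authors' code): scikit-learn's NuSVC and NuSVR implement these ν-parameterized variants, with ν replacing C in classification and ε in regression.

```python
import numpy as np
from sklearn.svm import NuSVC, NuSVR

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y_cls = (X[:, 0] + X[:, 1] > 0).astype(int)
y_reg = np.sin(X[:, 0]) + 0.1 * rng.randn(200)

# nu upper-bounds the fraction of margin errors and lower-bounds the fraction of support vectors
clf = NuSVC(nu=0.2, kernel="rbf", gamma="scale").fit(X, y_cls)
reg = NuSVR(nu=0.2, C=1.0, kernel="rbf", gamma="scale").fit(X, y_reg)
print(len(clf.support_), len(reg.support_))
```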


International Conference on Artificial Neural Networks | 1997

Kernel principal component analysis

Bernhard Schölkopf; Alexander J. Smola; Klaus-Robert Müller

A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
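
For reference, the kernel eigenvalue problem the title refers to can be written as follows (standard kernel PCA notation, not quoted from the paper), where K is the centered Gram matrix with K_ij = k(x_i, x_j) and the second identity gives the projection of a point x onto the k-th nonlinear component:

```latex
N\lambda\,\alpha = K\alpha,
\qquad
\langle v^{(k)}, \Phi(x)\rangle \;=\; \sum_{i=1}^{N} \alpha_i^{(k)}\, k(x_i, x).
```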


IEEE Transactions on Neural Networks | 1999

Input space versus feature space in kernel-based methods

Bernhard Schölkopf; Sebastian Mika; Christopher J. C. Burges; Phil Knirsch; Klaus-Robert Müller; Gunnar Rätsch; Alexander J. Smola

This paper collects some ideas targeted at advancing our understanding of the feature spaces associated with support vector (SV) kernel functions. We first discuss the geometry of feature space. In particular, we review what is known about the shape of the image of input space under the feature space map, and how this influences the capacity of SV methods. Following this, we describe how the metric governing the intrinsic geometry of the mapped surface can be computed in terms of the kernel, using the example of the class of inhomogeneous polynomial kernels, which are often used in SV pattern recognition. We then discuss the connection between feature space and input space by dealing with the question of how one can, given some vector in feature space, find a preimage (exact or approximate) in input space. We describe algorithms to tackle this issue, and show their utility in two applications of kernel methods. First, we use it to reduce the computational complexity of SV decision functions; second, we combine it with the Kernel PCA algorithm, thereby constructing a nonlinear statistical denoising technique which is shown to perform well on real-world data.
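
One concrete instance of the preimage step, stated here as a standard sketch rather than the paper's exact derivation: for a Gaussian kernel k(x, z) = exp(-||x - z||^2 / (2σ^2)), an approximate preimage z of an expansion Ψ = Σ_i γ_i Φ(x_i) can be sought by the fixed-point iteration

```latex
z_{t+1} \;=\;
\frac{\sum_i \gamma_i \exp\!\big(-\lVert z_t - x_i\rVert^2 / (2\sigma^2)\big)\, x_i}
     {\sum_i \gamma_i \exp\!\big(-\lVert z_t - x_i\rVert^2 / (2\sigma^2)\big)}.
```

Iterations of this kind underlie RBF-kernel preimage computations such as the kernel PCA denoising application mentioned in the abstract.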


European Conference on Computational Learning Theory | 2001

A Generalized Representer Theorem

Bernhard Schölkopf; Ralf Herbrich; Alexander J. Smola

Wahba's classical representer theorem states that the solutions of certain risk minimization problems involving an empirical risk term and a quadratic regularizer can be written as expansions in terms of the training examples. We generalize the theorem to a larger class of regularizers and empirical risk terms, and give a self-contained proof utilizing the feature space associated with a kernel. The result shows that a wide range of problems have optimal solutions that live in the finite-dimensional span of the training examples mapped into feature space, thus enabling us to carry out kernel algorithms independent of the (potentially infinite) dimensionality of the feature space.
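
In symbols (a standard paraphrase, not a quotation from the paper):

```latex
% Generalized representer theorem (paraphrased): for an arbitrary loss c and a
% strictly monotonically increasing regularizer Omega, every minimizer of
\min_{f \in \mathcal{H}} \;
  c\big((x_1, y_1, f(x_1)), \ldots, (x_m, y_m, f(x_m))\big)
  + \Omega\big(\lVert f \rVert_{\mathcal{H}}\big)
% admits a representation of the form
f(\cdot) \;=\; \sum_{i=1}^{m} \alpha_i\, k(x_i, \cdot).
```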


IEEE Transactions on Signal Processing | 2004

Online learning with kernels

Jyrki Kivinen; Alexander J. Smola; Robert C. Williamson

Kernel-based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting, where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper, we consider online learning in a reproducing kernel Hilbert space. By considering classical stochastic gradient descent within a feature space and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst-case loss bounds and, moreover, show the convergence of the hypothesis to the minimizer of the regularized risk functional. We present some experimental results that support the theory and illustrate the power of the new algorithms for online novelty detection.
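
A minimal sketch of the kind of algorithm described, assuming a Gaussian kernel and squared loss; it is illustrative only and omits the hinge-loss and novelty-detection variants as well as the truncation of old expansion terms.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    return np.exp(-gamma * np.sum((x - y) ** 2))

def online_kernel_regression(stream, eta=0.1, lam=0.01, gamma=1.0):
    """Stochastic gradient descent on the regularized squared loss in an RKHS.

    The hypothesis is stored as an expansion f(x) = sum_i alpha_i * k(c_i, x);
    each step shrinks old coefficients by (1 - eta*lam) and appends a new term.
    """
    centers, alphas = [], []
    for x, y in stream:
        f_x = sum(a * rbf(c, x, gamma) for c, a in zip(centers, alphas))
        alphas = [(1 - eta * lam) * a for a in alphas]   # regularization shrinkage
        alphas.append(-eta * (f_x - y))                  # gradient step for 0.5*(f(x)-y)^2
        centers.append(x)
    return centers, alphas

rng = np.random.RandomState(0)
stream = [(x, np.sin(x[0])) for x in rng.uniform(-3, 3, size=(200, 1))]
centers, alphas = online_kernel_regression(stream)
```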


Annals of Statistics | 2008

Kernel methods in machine learning

Thomas Hofmann; Bernhard Schölkopf; Alexander J. Smola

We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of functions has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data. We cover a wide range of methods, ranging from binary classifiers to sophisticated methods for estimation with structured data.
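
The two identities at the heart of this formulation (standard RKHS facts rather than results of the paper) are the kernel-as-inner-product and reproducing properties:

```latex
k(x, x') = \langle \Phi(x), \Phi(x') \rangle_{\mathcal{H}},
\qquad
f(x) = \langle f, k(x, \cdot) \rangle_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H}.
```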


International Conference on Artificial Neural Networks | 1997

Predicting Time Series with Support Vector Machines

Klaus-Robert Müller; Alexander J. Smola; Gunnar Rätsch; Bernhard Schölkopf; Jens Kohlmorgen; Vladimir Vapnik

Support Vector Machines are used for time series prediction and compared to radial basis function networks. We make use of two different cost functions for Support Vectors: training with (i) an ε-insensitive loss and (ii) Huber's robust loss function, and discuss how to choose the regularization parameters in these models. Two applications are considered: data from (a) a noisy (normal and uniform noise) Mackey-Glass equation and (b) the Santa Fe competition (set D). In both cases Support Vector Machines show an excellent performance. In case (b) the Support Vector approach improves the best known result on the benchmark by a factor of 29%.
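
Not the paper's exact setup, but a hedged sketch of the general recipe it describes: embed the series with a sliding window and regress the next value with an ε-insensitive SVR (here via scikit-learn, on a toy sine series in place of the Mackey-Glass or Santa Fe data).

```python
import numpy as np
from sklearn.svm import SVR

def embed(series, order):
    """Turn a 1-D series into (window, next value) pairs for regression."""
    X = np.array([series[i:i + order] for i in range(len(series) - order)])
    y = series[order:]
    return X, y

t = np.arange(0.0, 60.0, 0.1)
series = np.sin(t) + 0.05 * np.random.RandomState(0).randn(len(t))

X, y = embed(series, order=6)
split = int(0.8 * len(X))
model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X[:split], y[:split])
rmse = np.sqrt(np.mean((model.predict(X[split:]) - y[split:]) ** 2))
print("one-step-ahead test RMSE:", rmse)
```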

Collaboration


Dive into Alexander J. Smola's collaborations.

Top Co-Authors

Arthur Gretton
University College London

Robert C. Williamson
Australian National University

Le Song
Georgia Institute of Technology