Publications


Featured research published by Lorenzo Rosasco.


arXiv: Machine Learning | 2012

Kernels for Vector-Valued Functions: A Review

Mauricio A. Álvarez; Lorenzo Rosasco; Neil D. Lawrence

Kernel methods are among the most popular techniques in machine learning. From a regularization perspective they play a central role in regularization theory as they provide a natural choice for the hypotheses space and the regularization functional through the notion of reproducing kernel Hilbert spaces. From a probabilistic perspective they are the key in the context of Gaussian processes, where the kernel function is known as the covariance function. Traditionally, kernel methods have been used in supervised learning problems with scalar outputs and indeed there has been a considerable amount of work devoted to designing and learning kernels. More recently there has been an increasing interest in methods that deal with multiple outputs, motivated partially by frameworks like multitask learning. In this monograph, we review different methods to design or learn valid kernel functions for multiple outputs, paying particular attention to the connection between probabilistic and functional methods.

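A common construction reviewed in this line of work is the separable kernel, in which a scalar kernel on the inputs is multiplied by a matrix that encodes correlations between the outputs. The sketch below is illustrative only: the Gaussian kernel width and the output-coupling matrix B are arbitrary choices, not values from the monograph.

    import numpy as np

    def scalar_gaussian_kernel(X1, X2, sigma=1.0):
        # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def separable_kernel(X1, X2, B, sigma=1.0):
        # Matrix-valued kernel K(x, x') = k(x, x') * B, assembled as a
        # block matrix of shape (n1 * T, n2 * T) for T outputs.
        return np.kron(scalar_gaussian_kernel(X1, X2, sigma), B)

    # Toy usage: 5 inputs in R^3, 2 coupled outputs.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 3))
    B = np.array([[1.0, 0.5], [0.5, 1.0]])   # assumed output-similarity matrix
    K = separable_kernel(X, X, B)
    print(K.shape)  # (10, 10)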

Neural Computation | 2004

Are loss functions all the same?

Lorenzo Rosasco; Ernesto De Vito; Andrea Caponnetto; Michele Piana; Alessandro Verri

In this letter, we investigate the impact of choosing different loss functions from the viewpoint of statistical learning theory. We introduce a convexity assumption, which is met by all loss functions commonly used in the literature, and study how the bound on the estimation error changes with the loss. We also derive a general result on the minimizer of the expected risk for a convex loss function in the case of classification. The main outcome of our analysis is that for classification, the hinge loss appears to be the loss of choice. Other things being equal, the hinge loss leads to a convergence rate practically indistinguishable from the logistic loss rate and much better than the square loss rate. Furthermore, if the hypothesis space is sufficiently rich, the bounds obtained for the hinge loss are not loosened by the thresholding stage.

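For context, the convex losses compared in this kind of analysis can all be written as functions of the margin y f(x) for labels y in {-1, +1}. A minimal sketch, with an arbitrary grid of margins:

    import numpy as np

    def hinge_loss(margin):
        return np.maximum(0.0, 1.0 - margin)

    def logistic_loss(margin):
        return np.log(1.0 + np.exp(-margin))

    def square_loss(margin):
        # square loss written in margin form: (1 - y f(x))^2
        return (1.0 - margin) ** 2

    margins = np.linspace(-2, 3, 6)
    for m in margins:
        print(f"margin {m:+.1f}: hinge {hinge_loss(m):.3f}, "
              f"logistic {logistic_loss(m):.3f}, square {square_loss(m):.3f}")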

Foundations of Computational Mathematics | 2005

Model Selection for Regularized Least-Squares Algorithm in Learning Theory

E. De Vito; Andrea Caponnetto; Lorenzo Rosasco

We investigate the problem of model selection for learning algorithms depending on a continuous parameter. We propose a model selection procedure based on a worst-case analysis and on a data-independent choice of the parameter. For the regularized least-squares algorithm we bound the generalization error of the solution by a quantity depending on a few known constants and we show that the corresponding model selection procedure reduces to solving a bias-variance problem. Under suitable smoothness conditions on the regression function, we estimate the optimal parameter as a function of the number of data and we prove that this choice ensures consistency of the algorithm.

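The regularized least-squares algorithm analyzed here admits a closed-form solution in a reproducing kernel Hilbert space, with the regularization parameter chosen as a function of the number of examples. In the sketch below the Gaussian kernel and the schedule lambda = n**(-0.5) are placeholders, not the choice derived in the paper.

    import numpy as np

    def gaussian_kernel(X1, X2, sigma=1.0):
        d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def rls_fit(X, y, lam, sigma=1.0):
        # Solve (K + n * lam * I) c = y; the estimator is f(x) = sum_i c_i k(x, x_i).
        n = X.shape[0]
        K = gaussian_kernel(X, X, sigma)
        return np.linalg.solve(K + n * lam * np.eye(n), y)

    def rls_predict(X_train, X_test, c, sigma=1.0):
        return gaussian_kernel(X_test, X_train, sigma) @ c

    # Toy usage on noisy samples of a sine function.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(50, 1))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)
    lam = X.shape[0] ** (-0.5)            # placeholder data-independent choice
    c = rls_fit(X, y, lam)
    print(rls_predict(X, X[:5], c))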

Journal of Complexity | 2007

On regularization algorithms in learning theory

Frank Bauer; Sergei V. Pereverzev; Lorenzo Rosasco

In this paper we discuss a relation between learning theory and regularization of linear ill-posed inverse problems. It is well known that Tikhonov regularization can be profitably used in the context of supervised learning, where it usually goes under the name of the regularized least-squares algorithm. Moreover, the gradient descent algorithm, an analog of the Landweber regularization scheme, was recently studied. In this paper we show that a notion of regularization defined according to what is usually done for ill-posed inverse problems allows one to derive learning algorithms which are consistent and provide a fast convergence rate. It turns out that for priors expressed in terms of variable Hilbert scales in reproducing kernel Hilbert spaces our results for Tikhonov regularization match those in Smale and Zhou [Learning theory estimates via integral operators and their approximations, submitted for publication, 2005] and improve the results for Landweber iterations obtained in Yao et al. [On early stopping in gradient descent learning, Constructive Approximation (2005), submitted for publication]. The remarkable fact is that our analysis shows that the same properties are shared by a large class of learning algorithms which are essentially all the linear regularization schemes. The concept of operator monotone functions turns out to be an important tool for the analysis.

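The Landweber-type scheme referred to above is plain gradient descent on the empirical squared risk, with early stopping acting as the regularization parameter. A hedged sketch, with the step size and iteration count chosen arbitrarily:

    import numpy as np

    def landweber_kernel_regression(K, y, eta=0.1, n_iter=200):
        # Gradient descent on the empirical squared risk over the RKHS:
        # c_{t+1} = c_t + (eta / n) * (y - K c_t).
        # The number of iterations plays the role of the regularization parameter.
        n = K.shape[0]
        c = np.zeros_like(y, dtype=float)
        for _ in range(n_iter):
            c = c + (eta / n) * (y - K @ c)
        return c

    # Toy usage with a linear kernel on random data.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((30, 4))
    y = X @ np.array([1.0, -2.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(30)
    K = X @ X.T
    c = landweber_kernel_regression(K, y, eta=0.1, n_iter=200)
    print(np.round(X.T @ c, 2))   # approximately recovers the linear weights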

Neural Computation | 2008

Spectral algorithms for supervised learning

L. Lo Gerfo; Lorenzo Rosasco; Francesca Odone; E. De Vito; Alessandro Verri

We discuss how a large class of regularization methods, collectively known as spectral regularization and originally designed for solving ill-posed inverse problems, gives rise to regularized learning algorithms. All of these algorithms are consistent kernel methods that can be easily implemented. The intuition behind their derivation is that the same principle allowing for the numerical stabilization of a matrix inversion problem is crucial to avoid overfitting. The various methods have a common derivation but different computational and theoretical properties. We describe examples of such algorithms, analyze their classification performance on several data sets and discuss their applicability to real-world problems.

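One member of this class is spectral cut-off, which replaces the filter 1/s of unregularized least squares with a truncated version that discards the small eigenvalues of the kernel matrix. A minimal sketch, with the kernel and the threshold chosen arbitrarily:

    import numpy as np

    def spectral_cutoff_fit(K, y, threshold=1e-3):
        # Eigendecompose K / n and invert only the eigenvalues above the threshold.
        n = K.shape[0]
        evals, evecs = np.linalg.eigh(K / n)
        inv = np.zeros_like(evals)
        keep = evals >= threshold
        inv[keep] = 1.0 / evals[keep]
        # Coefficients c such that f(x) = sum_i c_i k(x, x_i).
        return evecs @ (inv * (evecs.T @ y)) / n

    # Toy binary classification with a Gaussian kernel.
    rng = np.random.default_rng(2)
    X = rng.uniform(-1, 1, size=(40, 1))
    y = np.sign(np.sin(3 * X[:, 0]))
    K = np.exp(-(X - X.T) ** 2 / 0.5)
    c = spectral_cutoff_fit(K, y, threshold=1e-3)
    print(np.mean(np.sign(K @ c) == y))   # training accuracy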

European Conference on Machine Learning | 2010

Solving structured sparsity regularization with proximal methods

Sofia Mosci; Lorenzo Rosasco; Matteo Santoro; Alessandro Verri; Silvia Villa

Proximal methods have recently been shown to provide effective optimization procedures to solve the variational problems defining the l1 regularization algorithms. The goal of the paper is twofold. First we discuss how proximal methods can be applied to solve a large class of machine learning algorithms which can be seen as extensions of l1 regularization, namely structured sparsity regularization. For all these algorithms, it is possible to derive an optimization procedure which corresponds to an iterative projection algorithm. Second, we discuss the effect of a preconditioning of the optimization procedure achieved by adding a strictly convex functional to the objective function. Structured sparsity algorithms are usually based on minimizing a convex (not strictly convex) objective function and this might lead to undesired unstable behavior. We show that by perturbing the objective function by a small strictly convex term we often reduce substantially the number of required computations without affecting the prediction performance of the obtained solution.

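An iterative thresholding procedure of this kind, specialized to group-lasso-style structured sparsity, alternates a gradient step on the data term with a group soft-thresholding (proximal) step. The sketch below is a generic proximal gradient loop with assumed groups, step size, and regularization weight, not the paper's exact algorithm.

    import numpy as np

    def group_soft_threshold(w, groups, tau):
        # Proximal operator of tau * sum_g ||w_g||_2: shrink each group's norm.
        out = np.zeros_like(w)
        for g in groups:
            norm = np.linalg.norm(w[g])
            if norm > tau:
                out[g] = (1.0 - tau / norm) * w[g]
        return out

    def proximal_gradient_group_lasso(X, y, groups, lam, n_iter=500):
        n, d = X.shape
        step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)   # 1 / Lipschitz constant of the gradient
        w = np.zeros(d)
        for _ in range(n_iter):
            grad = X.T @ (X @ w - y) / n
            w = group_soft_threshold(w - step * grad, groups, step * lam)
        return w

    # Toy usage: two groups of features, only the first group is active.
    rng = np.random.default_rng(3)
    X = rng.standard_normal((100, 6))
    w_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0])
    y = X @ w_true + 0.05 * rng.standard_normal(100)
    groups = [np.arange(0, 3), np.arange(3, 6)]    # assumed group structure
    print(np.round(proximal_gradient_group_lasso(X, y, groups, lam=0.1), 2))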

Foundations of Computational Mathematics | 2010

Mathematics of the Neural Response

Steve Smale; Lorenzo Rosasco; Jake V. Bouvrie; Andrea Caponnetto; Tomaso Poggio

We propose a natural image representation, the neural response, motivated by the neuroscience of the visual cortex. The inner product defined by the neural response leads to a similarity measure between functions which we call the derived kernel. Based on a hierarchical architecture, we give a recursive definition of the neural response and associated derived kernel. The derived kernel can be used in a variety of application domains such as classification of images, strings of text and genomics data.

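A single layer of this recursive construction can be illustrated on 1-D signals: the neural response of a signal at a template is the maximum similarity between the template and the signal's sub-windows, and the derived kernel is the normalized inner product of two neural responses. Everything below (signals in place of images, random templates, the window size) is an illustrative assumption, not the paper's setup.

    import numpy as np

    def normalized_inner(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def patches(signal, size):
        # All contiguous sub-windows of a 1-D signal; these play the role of
        # the transformations (restrictions) in the construction.
        return [signal[i:i + size] for i in range(len(signal) - size + 1)]

    def neural_response(signal, templates, size):
        # N(signal)(t) = max over sub-windows p of the base similarity K(p, t).
        return np.array([max(normalized_inner(p, t) for p in patches(signal, size))
                         for t in templates])

    def derived_kernel(s1, s2, templates, size):
        # Derived kernel: normalized inner product of the two neural responses.
        return normalized_inner(neural_response(s1, templates, size),
                                neural_response(s2, templates, size))

    # Toy usage with random 1-D signals and random templates.
    rng = np.random.default_rng(4)
    templates = [rng.standard_normal(4) for _ in range(8)]
    a, b = rng.standard_normal(16), rng.standard_normal(16)
    print(derived_kernel(a, b, templates, size=4))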

Foundations of Computational Mathematics | 2010

Adaptive Kernel Methods Using the Balancing Principle

E. De Vito; S. Pereverzyev; Lorenzo Rosasco

The regularization parameter choice is a fundamental problem in learning theory, since the performance of most supervised algorithms crucially depends on the choice of one or more such parameters. In particular, a main theoretical issue regards the amount of prior knowledge needed to choose the regularization parameter in order to obtain good learning rates. In this paper we present a parameter choice strategy, called the balancing principle, to choose the regularization parameter without knowledge of the regularity of the target function. Such a choice adaptively achieves the best error rate. Our main result applies to regularization algorithms in reproducing kernel Hilbert spaces with the square loss, though we also study how a similar principle can be used in other situations. As a straightforward corollary we can immediately derive adaptive parameter choices for various kernel methods recently studied. Numerical experiments with the proposed parameter choice rules are also presented.

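In its Lepskii-style form, the balancing principle computes estimators over a grid of regularization parameters together with a bound on the estimation error, and keeps the largest parameter whose estimator stays within the error tubes of all smaller ones. The sketch below is schematic: the constant 4, the error model, the ridge estimators, and the norm used are all placeholder assumptions.

    import numpy as np

    def balancing_principle(estimators, lambdas, est_error, C=4.0):
        # estimators: list of parameter vectors, ordered by increasing lambda.
        # est_error:  assumed bound on the estimation error as a function of lambda.
        chosen = 0
        for i in range(len(lambdas)):
            ok = all(np.linalg.norm(estimators[i] - estimators[j])
                     <= C * est_error(lambdas[j]) for j in range(i))
            if ok:
                chosen = i
            else:
                break
        return lambdas[chosen], estimators[chosen]

    # Toy usage with ridge regression and a 1 / (sqrt(n) * lambda) error model.
    rng = np.random.default_rng(5)
    X = rng.standard_normal((60, 5))
    y = X @ np.array([1.0, 0.0, -1.0, 0.0, 0.5]) + 0.1 * rng.standard_normal(60)
    grid = np.logspace(-4, 1, 12)
    fits = [np.linalg.solve(X.T @ X + 60 * lam * np.eye(5), X.T @ y) for lam in grid]
    lam_bal, w_bal = balancing_principle(fits, grid, lambda lam: 1.0 / (np.sqrt(60) * lam))
    print(lam_bal, np.round(w_bal, 2))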

BMC Genomics | 2009

The l1-l2 regularization framework unmasks the hypoxia signature hidden in the transcriptome of a set of heterogeneous neuroblastoma cell lines

Paolo Fardin; Annalisa Barla; Sofia Mosci; Lorenzo Rosasco; Alessandro Verri; Luigi Varesio

Background: Gene expression signatures are clusters of genes discriminating different statuses of the cells, and their definition is critical for understanding the molecular bases of diseases. The identification of a gene signature is complicated by the high dimensional nature of the data and by the genetic heterogeneity of the responding cells. The l1-l2 regularization is an embedded feature selection technique that fulfills all the desirable properties of a variable selection algorithm and has the potential to generate a specific signature even in biologically complex settings. We studied the application of this algorithm to detect the signature characterizing the transcriptional response of neuroblastoma tumor cell lines to hypoxia, a condition of low oxygen tension that occurs in the tumor microenvironment.

Results: We determined the gene expression profile of 9 neuroblastoma cell lines cultured under normoxic and hypoxic conditions. We studied a heterogeneous set of neuroblastoma cell lines to mimic the in vivo situation and to test the robustness and validity of the l1-l2 regularization with double optimization. Analysis by hierarchical, spectral, and k-means clustering or a supervised approach based on t-test analysis divided the cell lines on the basis of genetic differences. However, the disturbance of this strong transcriptional response completely masked the detection of the more subtle response to hypoxia. Different results were obtained when we applied the l1-l2 regularization framework. The algorithm distinguished the normoxic and hypoxic statuses, defining signatures comprising 3 to 38 probesets, with a leave-one-out error of 17%. A consensus hypoxia signature was established by setting the frequency score at 50% and the correlation parameter ε equal to 100. This signature is composed of 11 probesets representing 8 well-characterized genes known to be modulated by hypoxia.

Conclusion: We demonstrate that l1-l2 regularization outperforms more conventional approaches, allowing the identification and definition of a gene expression signature under complex experimental conditions. The l1-l2 regularization and the cross validation generate an unbiased and objective output with a low classification error. We feel that the application of this algorithm to tumor biology will be instrumental in analyzing gene expression signatures hidden in the transcriptome that, like hypoxia, may be a major determinant of the course of the disease.

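The l1-l2 penalty couples an l1 term, which selects variables, with an l2 term, which stabilizes the selection among correlated probesets. As a rough analogue of the workflow (not the double-optimization procedure used in the paper), an elastic-net fit evaluated by leave-one-out can be sketched as follows; the synthetic data, alpha, and l1_ratio are placeholders.

    import numpy as np
    from sklearn.linear_model import ElasticNet
    from sklearn.model_selection import LeaveOneOut

    # Toy stand-in for expression data: 18 samples (9 lines x 2 conditions), 200 probesets.
    rng = np.random.default_rng(6)
    X = rng.standard_normal((18, 200))
    y = np.array([1, -1] * 9)                 # +1 hypoxia, -1 normoxia (toy labels)
    X[y == 1, :5] += 1.5                      # 5 "hypoxia-modulated" probesets

    errors, selected = [], []
    for train, test in LeaveOneOut().split(X):
        model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X[train], y[train])
        errors.append(np.sign(model.predict(X[test]))[0] != y[test][0])
        selected.append(np.flatnonzero(model.coef_))

    print("leave-one-out error:", np.mean(errors))
    print("probesets selected in >50% of splits:",
          np.flatnonzero(np.bincount(np.concatenate(selected), minlength=200) > 9))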

Machine Learning | 2012

Multi-output learning via spectral filtering

Luca Baldassarre; Lorenzo Rosasco; Annalisa Barla; Alessandro Verri

In this paper we study a class of regularized kernel methods for multi-output learning which are based on filtering the spectrum of the kernel matrix. The considered methods include Tikhonov regularization as a special case, as well as interesting alternatives such as vector-valued extensions of L2 boosting and other iterative schemes. Computational properties are discussed for various examples of kernels for vector-valued functions, and the benefits of iterative techniques are illustrated. Generalizing previous results for the scalar case, we show a finite-sample bound for the excess risk of the obtained estimator, which allows us to prove consistency both for regression and multi-category classification. Finally, we present some promising results of the proposed algorithms on artificial and real data.

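With a separable kernel K(x, x') = k(x, x') B, the vector-valued Tikhonov estimator in this family reduces to a linear system over the Kronecker product of the scalar kernel matrix and B; the other filters mentioned in the abstract replace the matrix inverse with a different spectral function. A minimal Tikhonov instance, with all parameters chosen arbitrarily:

    import numpy as np

    def multi_output_tikhonov(K_scalar, B, Y, lam):
        # Vector-valued Tikhonov with separable kernel K(x, x') = k(x, x') B:
        # solve (np.kron(K, B) + n * lam * I) c = Y flattened row-wise, Y of shape (n, T).
        n, T = Y.shape
        K_big = np.kron(K_scalar, B)
        c = np.linalg.solve(K_big + n * lam * np.eye(n * T), Y.reshape(-1))
        return c.reshape(n, T)

    # Toy usage: 2 correlated outputs driven by the same latent function.
    rng = np.random.default_rng(7)
    X = rng.uniform(-1, 1, size=(30, 1))
    latent = np.sin(3 * X[:, 0])
    Y = np.column_stack([latent, 0.7 * latent]) + 0.05 * rng.standard_normal((30, 2))
    K = np.exp(-(X - X.T) ** 2 / 0.5)
    B = np.array([[1.0, 0.8], [0.8, 1.0]])     # assumed output-coupling matrix
    C = multi_output_tikhonov(K, B, Y, lam=1e-2)
    print(C.shape)  # (30, 2)
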
Collaboration


Dive into Lorenzo Rosasco's collaborations.

Top Co-Authors

Tomaso Poggio, Massachusetts Institute of Technology
Carlo Ciliberto, Istituto Italiano di Tecnologia
Alessandro Verri, Queen Mary University of London
Lorenzo Natale, Istituto Italiano di Tecnologia
Giulia Pasquale, Istituto Italiano di Tecnologia
Alessandro Rudi, École Normale Supérieure