Massimiliano Pontil | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Massimiliano Pontil is active.

Explore More

Publication

Featured researches published by Massimiliano Pontil.

Advances in Computational Mathematics | 2000

Regularization Networks and Support Vector Machines

Theodoros Evgeniou; Massimiliano Pontil; Tomaso Poggio

Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular, the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapniks theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.

Machine Learning | 2008

Convex multi-task feature learning

Andreas Argyriou; Theodoros Evgeniou; Massimiliano Pontil

Abstract We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the well-known single-task 1-norm regularization. It is based on a novel non-convex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns task-specific functions and in the latter step it learns common-across-tasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select—not learn—a few common variables across the tasks.

knowledge discovery and data mining | 2004

Regularized multi--task learning

Theodoros Evgeniou; Massimiliano Pontil

Past empirical work has shown that learning multiple related tasks from data simultaneously can be advantageous in terms of predictive performance relative to learning these tasks independently. In this paper we present an approach to multi--task learning based on the minimization of regularization functionals similar to existing ones, such as the one for Support Vector Machines (SVMs), that have been successfully used in the past for single--task learning. Our approach allows to model the relation between tasks in terms of a novel kernel function that uses a task--coupling parameter. We implement an instance of the proposed approach similar to SVMs and test it empirically using simulated as well as real data. The experimental results show that the proposed method performs better than existing multi--task learning methods and largely outperforms single--task learning using SVMs.

IEEE Transactions on Pattern Analysis and Machine Intelligence | 1998

Support vector machines for 3D object recognition

Massimiliano Pontil; Alessandro Verri

Support vector machines (SVMs) have been recently proposed as a new technique for pattern recognition. Intuitively, given a set of points which belong to either of two classes, a linear SVM finds the hyperplane leaving the largest possible fraction of points of the same class on the same side, while maximizing the distance of either class from the hyperplane. The hyperplane is determined by a subset of the points of the two classes, named support vectors, and has a number of interesting theoretical properties. In this paper, we use linear SVMs for 3D object recognition. We illustrate the potential of SVMs on a database of 7200 images of 100 different objects. The proposed system does not require feature extraction and performs recognition on images regarded as points of a space of high dimension without estimating pose. The excellent recognition rates achieved in all the performed experiments indicate that SVMs are well-suited for aspect-based recognition.

Bioinformatics | 2012

PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments

David Jones; Daniel W. A. Buchan; Domenico Cozzetto; Massimiliano Pontil

MOTIVATION The accurate prediction of residue-residue contacts, critical for maintaining the native fold of a protein, remains an open problem in the field of structural bioinformatics. Interest in this long-standing problem has increased recently with algorithmic improvements and the rapid growth in the sizes of sequence families. Progress could have major impacts in both structure and function prediction to name but two benefits. Sequence-based contact predictions are usually made by identifying correlated mutations within multiple sequence alignments (MSAs), most commonly through the information-theoretic approach of calculating mutual information between pairs of sites in proteins. These predictions are often inaccurate because the true covariation signal in the MSA is often masked by biases from many ancillary indirect-coupling or phylogenetic effects. Here we present a novel method, PSICOV, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction. Our method builds on work which had previously demonstrated corrections for phylogenetic and entropic correlation noise and allows accurate discrimination of direct from indirectly coupled mutation correlations in the MSA. RESULTS PSICOV displays a mean precision substantially better than the best performing normalized mutual information approach and Bayesian networks. For 118 out of 150 targets, the L/5 (i.e. top-L/5 predictions for a protein of length L) precision for long-range contacts (sequence separation >23) was ≥ 0.5, which represents an improvement sufficient to be of significant benefit in protein structure prediction or model quality assessment. AVAILABILITY The PSICOV source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/PSICOV.

PLOS ONE | 2012

A tale of many cities: universal patterns in human urban mobility.

Anastasios Noulas; Salvatore Scellato; Renaud Lambiotte; Massimiliano Pontil; Cecilia Mascolo

The advent of geographic online social networks such as Foursquare, where users voluntarily signal their current location, opens the door to powerful studies on human movement. In particular the fine granularity of the location data, with GPS accuracy down to 10 meters, and the worldwide scale of Foursquare adoption are unprecedented. In this paper we study urban mobility patterns of people in several metropolitan cities around the globe by analyzing a large set of Foursquare users. Surprisingly, while there are variations in human movement in different cities, our analysis shows that those are predominantly due to different distributions of places across different urban environments. Moreover, a universal law for human mobility is identified, which isolates as a key component the rank-distance, factoring in the number of places between origin and destination, rather than pure physical distance, as considered in some previous works. Building on our findings, we also show how a rank-based movement model accurately captures real human movements in different cities.

Neural Computation | 2005

On Learning Vector-Valued Functions

Charles A. Micchelli; Massimiliano Pontil

In this letter, we provide a study of learning in a Hilbert space of vector-valued functions. We motivate the need for extending learning theory of scalar-valued functions by practical considerations and establish some basic results for learning vector-valued functions that should prove useful in applications. Specifically, we allow an output space Y to be a Hilbert space, and we consider a reproducing kernel Hilbert space of functions whose values lie in Y. In this setting, we derive the form of the minimal norm interpolant to a finite set of data and apply it to study some regularization functionals that are important in learning theory. We consider specific examples of such functionals corresponding to multiple-output regularization networks and support vector machines, for both regression and classification. Finally, we provide classes of operator-valued kernels of the dot product and translation-invariant type.

computer vision and pattern recognition | 2001

Component-based face detection

B. Heiselet; Thomas Serre; Massimiliano Pontil; Tomaso Poggio

We present a component-based, trainable system for detecting frontal and near-frontal views of faces in still gray images. The system consists of a two-level hierarchy of Support Vector Machine (SVM) classifiers. On the first level, component classifiers independently detect components Of a face. On the second level, a single classifier checks if the geometrical configuration of the detected components in the image matches a geometrical model of a face. We propose a method for automatically learning components by using 3-D head models, This approach has the advantage that no manual interaction is required for choosing and extracting components. Experiments show that the component-based system is significantly more robust against rotations in depth than a comparable system trained on whole face patterns.

Neural Computation | 1998

Properties of support vector machines

Massimiliano Pontil; Alessandro Verri

Support vector machines (SVMs) perform pattern recognition between two point classes by finding a decision surface determined by certain points of the training set, termed support vectors (SV). This surface, which in some feature space of possibly infinite dimension can be regarded as a hyperplane, is obtained from the solution of a problem of quadratic programming that depends on a regularization parameter. In this article, we study some mathematical properties of support vectors and show that the decision surface can be written as the sum of two orthogonal terms, the first depending on only the margin vectors (which are SVs lying on the margin), the second proportional to the regularization parameter. For almost all values of the parameter, this enables us to predict how the decision surface varies for small parameter changes. In the special but important case of feature space of finite dimension m, we also show that there are at most m + 1 margin vectors and observe that m + 1 SVs are usually sufficient to determine the decision surface fully. For relatively small m, this latter result leads to a consistent reduction of the SV number.

Annals of Statistics | 2011

Oracle Inequalities and Optimal Inference under Group Sparsity

Karim Lounici; Massimiliano Pontil; Sara van de Geer; Alexandre B. Tsybakov

We consider the problem of estimating a sparse linear regression vector s* under a gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern, namely the set of variables is partitioned into prescribed groups, only few of which are relevant in the estimation process. This group sparsity assumption suggests us to consider the Group Lasso method as a means to estimate s*. We establish oracle inequalities for the prediction and l2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger coherence condition, we derive bounds for the estimation error for mixed (2,p)-norms with 1=p=8. When p=8, this result implies that a threshold version of the Group Lasso estimator selects the sparsity pattern of s* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, for all estimators over a class of group sparse vectors. Furthermore, we establish lower bounds for the prediction and l2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation properties as compared to the Lasso.

Explore More