Publication


Featured research published by Mikhail Belkin.


Neural Computation | 2003

Laplacian Eigenmaps for dimensionality reduction and data representation

Mikhail Belkin; Partha Niyogi

One of the central problems in machine learning and pattern recognition is to develop appropriate representations for complex data. We consider the problem of constructing a representation for data lying on a low-dimensional manifold embedded in a high-dimensional space. Drawing on the correspondence between the graph Laplacian, the Laplace-Beltrami operator on the manifold, and the connections to the heat equation, we propose a geometrically motivated algorithm for representing the high-dimensional data. The algorithm provides a computationally efficient approach to nonlinear dimensionality reduction that has locality-preserving properties and a natural connection to clustering. Some potential applications and illustrative examples are discussed.
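
The algorithm itself is short: build a neighborhood graph, weight edges with the heat kernel, and embed using the bottom eigenvectors of the generalized eigenproblem L y = lambda D y. A minimal dense sketch (the neighborhood size k and heat parameter t below are illustrative choices; a sparse eigensolver would be preferred at scale):

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def laplacian_eigenmaps(X, n_components=2, k=10, t=1.0):
    """Embed points X (n x D) into n_components dimensions.

    Steps follow the paper: kNN adjacency graph, heat-kernel weights
    W_ij = exp(-||xi - xj||^2 / t), then the generalized eigenproblem
    L y = lambda D y with L = D - W.
    """
    n = X.shape[0]
    d2 = cdist(X, X, "sqeuclidean")
    # Symmetric kNN adjacency: connect i and j if either is among
    # the other's k nearest neighbors.
    idx = np.argsort(d2, axis=1)[:, 1 : k + 1]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, idx.ravel()] = np.exp(-d2[rows, idx.ravel()] / t)
    W = np.maximum(W, W.T)
    D = np.diag(W.sum(axis=1))
    L = D - W
    # Smallest generalized eigenvectors; skip the constant one.
    vals, vecs = eigh(L, D)
    return vecs[:, 1 : n_components + 1]
```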


Machine Learning | 2004

Semi-Supervised Learning on Riemannian Manifolds

Mikhail Belkin; Partha Niyogi

We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on the submanifold in question rather than the total ambient space. Using the Laplace-Beltrami operator one produces a basis (the Laplacian Eigenmaps) for a Hilbert space of square integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once such a basis is obtained, training can be performed using the labeled data set. Our algorithm models the manifold using the adjacency graph for the data and approximates the Laplace-Beltrami operator by the graph Laplacian. We provide details of the algorithm, its theoretical justification, and several practical applications for image, speech, and text classification.
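
In code, the framework reduces to a least-squares fit over the smoothest eigenvectors of a graph Laplacian built from all points, labeled and unlabeled alike. A sketch, assuming eigenvectors such as those returned by the laplacian_eigenmaps sketch above and binary labels in {-1, +1} (the basis size p is an illustrative choice):

```python
import numpy as np

def manifold_ssl_classify(eigvecs, labeled_idx, y_labeled, p=20):
    """Classify a partially labeled set given graph-Laplacian
    eigenvectors computed from all (labeled + unlabeled) points.

    Following the paper's recipe: least-squares fit of the labels in
    the span of the p smoothest eigenvectors, then thresholding.
    """
    E = eigvecs[:, :p]                 # basis restricted to all points
    a, *_ = np.linalg.lstsq(E[labeled_idx], y_labeled, rcond=None)
    f = E @ a                          # extends to unlabeled points
    return np.sign(f)                  # binary labels in {-1, +1}
```

The key design point mirrors the abstract: the basis is recovered from unlabeled examples alone, so labels enter only in the final fit.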


Conference on Learning Theory | 2004

Regularization and semi-supervised learning on large graphs

Mikhail Belkin; Irina Matveeva; Partha Niyogi

We consider the problem of labeling a partially labeled graph. This setting may arise in a number of situations, from survey sampling to information retrieval to pattern recognition in manifold settings. It is also of potential practical importance when data is abundant but labeling is expensive or requires human assistance.


International Conference on Machine Learning | 2005

Beyond the point cloud: from transductive to semi-supervised learning

Vikas Sindhwani; Partha Niyogi; Mikhail Belkin

Due to its occurrence in engineering domains and implications for natural learning, the problem of utilizing unlabeled data is attracting increasing attention in machine learning. A large body of recent literature has focused on the transductive setting where labels of unlabeled examples are estimated by learning a function defined only over the point cloud data. In a truly semi-supervised setting, however, a learning machine has access to labeled and unlabeled examples and must make predictions on data points never encountered before. In this paper, we show how to turn transductive and standard supervised learning algorithms into semi-supervised learners. We construct a family of data-dependent norms on Reproducing Kernel Hilbert Spaces (RKHS). These norms allow us to warp the structure of the RKHS to reflect the underlying geometry of the data. We derive explicit formulas for the corresponding new kernels. Our approach demonstrates state-of-the-art performance on a variety of classification tasks.
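
The paper's key object is a deformed kernel with a closed form. A sketch of that construction, hedged on scaling conventions: M below stands for any symmetric positive semi-definite matrix encoding the point-cloud geometry, typically a scaled graph Laplacian, and the deformed kernel takes the form K~(x, z) = K(x, z) - k_x^T (I + M G)^{-1} M k_z, where G is the Gram matrix of the original kernel on the cloud and k_x collects kernel evaluations of x against the cloud:

```python
import numpy as np

def warped_kernel(K_fn, X_cloud, M):
    """Data-dependent warped kernel (sketch of the construction).

    K_fn: original kernel function K(x, z);
    X_cloud: the n point-cloud examples (labeled + unlabeled);
    M: symmetric PSD geometry matrix, e.g. a scaled graph Laplacian.
    Returns a new kernel defined for arbitrary x, z, which is what
    makes out-of-sample (truly semi-supervised) prediction possible.
    """
    n = len(X_cloud)
    G = np.array([[K_fn(a, b) for b in X_cloud] for a in X_cloud])
    S = np.linalg.solve(np.eye(n) + M @ G, M)   # (I + MG)^{-1} M

    def K_tilde(x, z):
        kx = np.array([K_fn(x, a) for a in X_cloud])
        kz = np.array([K_fn(z, a) for a in X_cloud])
        return K_fn(x, z) - kx @ S @ kz

    return K_tilde
```

The warped kernel can then be dropped into any standard kernel method, which is how the paper turns supervised learners into semi-supervised ones.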


Annals of Statistics | 2008

Consistency of spectral clustering

Ulrike von Luxburg; Mikhail Belkin; Olivier Bousquet

Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that, for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result, we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.
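
For concreteness, here is one standard normalized variant (the k smallest eigenvectors of the symmetric normalized Laplacian, followed by k-means on row-normalized embeddings); the paper's analysis concerns the large-sample limit of exactly this kind of procedure. Details such as the row normalization follow common practice rather than the paper:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def normalized_spectral_clustering(W, k):
    """One standard normalized spectral clustering (sketch).

    W: symmetric similarity matrix; k: number of clusters.
    Embeds via the k smallest eigenvectors of
    L_sym = I - D^{-1/2} W D^{-1/2}, then runs k-means.
    """
    d = W.sum(axis=1)
    Dis = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(W)) - Dis @ W @ Dis
    _, vecs = eigh(L_sym)                 # eigenvalues ascending
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    _, labels = kmeans2(U, k, minit="++")
    return labels
```

The paper's contrast is between this normalized family, which converges under very general conditions, and the unnormalized L = D - W variant, which needs strong additional assumptions.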


Symposium on Computational Geometry | 2008

Discrete Laplace operator on meshed surfaces

Mikhail Belkin; Jian Sun; Yusu Wang

In recent years a considerable amount of work in graphics and geometric optimization used tools based on the Laplace-Beltrami operator on a surface. The applications of the Laplacian include mesh editing, surface smoothing, and shape interpolations among others. However, it has been shown [13, 24, 26] that the popular cotangent approximation schemes do not provide convergent point-wise (or even L2) estimates, while many applications rely on point-wise estimation. Existence of such schemes has been an open question [13]. In this paper we propose the first algorithm for approximating the Laplace operator of a surface from a mesh with point-wise convergence guarantees applicable to arbitrary meshed surfaces. We show that for a sufficiently fine mesh over an arbitrary surface, our mesh Laplacian is close to the Laplace-Beltrami operator on the surface at every point of the surface. Moreover, the proposed algorithm is simple and easily implementable. Experimental evidence shows that our algorithm exhibits convergence empirically and compares favorably with cotangent-based methods in providing accurate approximation of the Laplace operator for various meshes.
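
The operator in question is a Gaussian-weighted, face-area-normalized sum over mesh vertices. A direct quadratic-time sketch, with the normalization constant and sign convention to be taken as illustrative rather than the paper's exact statement:

```python
import numpy as np

def mesh_laplacian(verts, faces, f, h):
    """Heat-kernel mesh Laplacian in the spirit of the paper (sketch).

    verts: (n, 3) vertex positions; faces: (m, 3) vertex indices;
    f: (n,) function values at vertices; h: heat parameter.
    For each vertex w it accumulates, over faces t and their vertices p,
    (Area(t)/3) * exp(-|p - w|^2 / (4h)) * (f(p) - f(w)),
    i.e. a Gaussian-weighted, area-normalized difference sum.
    """
    n = len(verts)
    # Face areas via the cross product.
    e1 = verts[faces[:, 1]] - verts[faces[:, 0]]
    e2 = verts[faces[:, 2]] - verts[faces[:, 0]]
    area = 0.5 * np.linalg.norm(np.cross(e1, e2), axis=1)
    Lf = np.zeros(n)
    for w in range(n):
        d2 = ((verts[faces] - verts[w]) ** 2).sum(axis=2)   # (m, 3)
        contrib = np.exp(-d2 / (4.0 * h)) * (f[faces] - f[w])
        Lf[w] = (area / 3.0 * contrib.sum(axis=1)).sum()
    return Lf / (4.0 * np.pi * h ** 2)
```

Unlike cotangent schemes, the weights here depend only on extrinsic distances and face areas, which is what allows the pointwise convergence argument to go through for arbitrary meshes.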


Foundations of Computer Science | 2010

Polynomial Learning of Distribution Families

Mikhail Belkin; Kaushik Sinha

The question of polynomial learnability of probability distributions, particularly Gaussian mixture distributions, has recently received significant attention in theoretical computer science and machine learning. However, despite major progress, the general question of polynomial learnability of Gaussian mixture distributions remained open. The current work resolves the question of polynomial learnability for Gaussian mixtures in high dimension with an arbitrary fixed number of components. Specifically, we show that parameters of a Gaussian mixture distribution with a fixed number of components can be learned using a sample whose size is polynomial in dimension and all other parameters. The result on learning Gaussian mixtures relies on an analysis of distributions belonging to what we call “polynomial families” in low dimension. These families are characterized by their moments being polynomial in parameters and include almost all common probability distributions as well as their mixtures and products. Using tools from real algebraic geometry, we show that parameters of any distribution belonging to such a family can be learned in polynomial time and using a polynomial number of sample points. The result on learning polynomial families is quite general and is of independent interest. To estimate parameters of a Gaussian mixture distribution in high dimensions, we provide a deterministic algorithm for dimensionality reduction. This allows us to reduce learning a high-dimensional mixture to a polynomial number of parameter estimations in low dimension. Combining this reduction with the results on polynomial families yields our result on learning arbitrary Gaussian mixtures in high dimensions.
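
The full high-dimensional algorithm is beyond a snippet, but the "polynomial family" idea can be seen in one dimension: moments are polynomials in the parameters, so finitely many empirical moments determine them. A toy instance for an equal-weight mixture of two Gaussians with a shared variance (three unknowns, three moment equations; this illustrates moment-based recovery only, not the paper's algorithm):

```python
import numpy as np
from scipy.optimize import fsolve

def moments_fit_mixture(x):
    """Toy 1-D moment-matching for 0.5*N(mu1, s2) + 0.5*N(mu2, s2).

    The first three mixture moments are polynomials in (mu1, mu2, s2),
    which is the defining property of a 'polynomial family'.
    """
    m = [np.mean(x ** k) for k in (1, 2, 3)]

    def eqs(p):
        mu1, mu2, s2 = p
        return (
            0.5 * (mu1 + mu2) - m[0],
            0.5 * (mu1 ** 2 + mu2 ** 2) + s2 - m[1],
            0.5 * (mu1 ** 3 + mu2 ** 3) + 1.5 * s2 * (mu1 + mu2) - m[2],
        )

    return fsolve(eqs, x0=(np.min(x), np.max(x), np.var(x)))
```

On a sample drawn from such a mixture, the recovered triple approaches the true parameters as the sample grows, which is the one-dimensional shadow of the paper's polynomial-sample guarantee.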


Conference on Learning Theory | 2005

Towards a theoretical foundation for Laplacian-based manifold methods

Mikhail Belkin; Partha Niyogi

In recent years manifold methods have attracted a considerable amount of attention in machine learning. However, most algorithms in that class may be termed “manifold-motivated” as they lack any explicit theoretical guarantees. In this paper we take a step towards closing the gap between theory and practice for a class of Laplacian-based manifold methods. We show that under certain conditions the graph Laplacian of a point cloud converges to the Laplace-Beltrami operator on the underlying manifold. Theorem 1 contains the first result showing convergence of a random graph Laplacian to the manifold Laplacian in the machine learning context.
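
The convergence statement can be checked numerically on a manifold where the answer is known. On the unit circle, the Laplace-Beltrami operator sends sin(theta) to sin(theta) up to sign, so the heat-kernel graph Laplacian of a random sample should reproduce it up to a single scale factor (the theorem pins that constant down analytically; the sketch below simply fits it):

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n, t = 2000, 0.01
theta = rng.uniform(0.0, 2.0 * np.pi, n)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # points on S^1
f = np.sin(theta)

W = np.exp(-cdist(X, X, "sqeuclidean") / (4.0 * t))   # heat-kernel weights
Lf = W.sum(axis=1) * f - W @ f                        # (D - W) f

target = np.sin(theta)            # Laplace-Beltrami of sin on the circle
c = (Lf @ target) / (target @ target)                 # best scalar fit
rel = np.linalg.norm(Lf - c * target) / np.linalg.norm(c * target)
# Relative error should be small and shrink as n grows and t is
# decreased appropriately, matching the convergence theorem.
print(f"relative error after fitting one scale: {rel:.3f}")
```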


International Conference on Acoustics, Speech, and Signal Processing | 2004

Tikhonov regularization and semi-supervised learning on large graphs

Mikhail Belkin; Irina Matveeva; Partha Niyogi

We consider the problem of labeling a partially labeled graph. This setting may arise in a number of situations, from survey sampling to information retrieval to pattern recognition in manifold settings. It is of particular practical importance when data is abundant but labeling is expensive or requires human assistance. Our approach develops a framework for regularization on such graphs parallel to Tikhonov regularization on continuous spaces. The algorithms are very simple and involve solving a single, usually sparse, system of linear equations. Using the notion of algorithmic stability, we derive bounds on the generalization error and relate it to the structural invariants of the graph.
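
Setting the gradient of the regularized objective to zero yields the single sparse linear system the abstract refers to. A sketch, assuming a precomputed sparse graph Laplacian L and omitting the paper's additional centering constraint:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

def graph_tikhonov(L, labeled_idx, y_labeled, gamma=1.0):
    """Tikhonov-style regularization on a graph (sketch).

    Minimizes sum over labeled i of (f_i - y_i)^2 + gamma * f^T L f,
    whose optimality condition is the sparse linear system
    (S + gamma * L) f = S y, with S the diagonal 0/1 indicator of
    the labeled vertices.
    """
    n = L.shape[0]
    s = np.zeros(n)
    s[labeled_idx] = 1.0
    y = np.zeros(n)
    y[labeled_idx] = y_labeled
    f = spsolve((diags(s) + gamma * L).tocsr(), s * y)
    return f
```

Since L is sparse, the solve scales to large graphs, which is the practical point of the paper.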


Annals of Statistics | 2009

Data spectroscopy: Eigenspaces of convolution operators and clustering

Tao Shi; Mikhail Belkin; Bin Yu

This paper focuses on obtaining clustering information about a distribution from its i.i.d. samples. We develop theoretical results to understand and use clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function with a sufficiently fast tail decay. In particular, we provide population analyses to gain insights into which eigenvectors should be used and when the clustering information for the distribution can be recovered from the sample. We learn that a fixed number of top eigenvectors might at the same time contain redundant clustering information and miss relevant clustering information. We use this insight to design the Data Spectroscopic clustering (DaSpec) algorithm that utilizes properly selected eigenvectors to determine the number of clusters automatically and to group the data accordingly. Our findings extend the intuitions underlying existing spectral techniques such as spectral clustering and Kernel Principal Components Analysis, and provide new understanding of their usability and modes of failure. Simulation studies and experiments on real world data are conducted to show the potential of our algorithm. In particular, DaSpec is found to handle unbalanced groups and recover clusters of different shapes better than the competing methods.
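
A simplified reading of the DaSpec recipe: among the top eigenvectors of a Gaussian kernel matrix, those with essentially no sign change each indicate one cluster, their count estimates the number of clusters, and points are assigned by largest eigenvector magnitude. The thresholds and selection rule below are illustrative rather than the paper's exact procedure:

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def daspec(X, omega=1.0, eps=0.05, top=20):
    """Sketch of the Data Spectroscopic clustering (DaSpec) idea."""
    K = np.exp(-cdist(X, X, "sqeuclidean") / (2.0 * omega ** 2))
    _, vecs = eigh(K)
    vecs = vecs[:, ::-1][:, :top]        # top eigenvectors first
    flagged = []
    for v in vecs.T:
        # Ignore near-zero entries, then test for a sign change.
        big = v[np.abs(v) > eps * np.abs(v).max()]
        if big.size and (np.all(big > 0) or np.all(big < 0)):
            flagged.append(np.abs(v))
    if not flagged:
        return np.zeros(len(X), dtype=int)   # degenerate fallback
    # Number of clusters = number of flagged eigenvectors; assign
    # each point to the flagged eigenvector largest in magnitude there.
    return np.stack(flagged, axis=1).argmax(axis=1)
```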

Collaboration


Dive into Mikhail Belkin's collaborations.

Top Co-Authors

Yusu Wang

Ohio State University

Simon Dennis

University of Newcastle
