
Publications


Featured research published by Andriy Mnih.


International Conference on Machine Learning | 2007

Restricted Boltzmann machines for collaborative filtering

Ruslan Salakhutdinov; Andriy Mnih; Geoffrey E. Hinton

Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBMs), can be used to model tabular data, such as users' ratings of movies. We present efficient learning and inference procedures for this class of models and demonstrate that RBMs can be successfully applied to the Netflix data set, containing over 100 million user/movie ratings. We also show that RBMs slightly outperform carefully tuned SVD models. When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix's own system.
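
As a rough illustration of the underlying machinery, here is a minimal CD-1 training sketch for a binary RBM in NumPy. This is not the paper's model: the paper uses conditional multinomial ("softmax") visible units, one per rated movie, and handles missing ratings; all sizes and data below are toy placeholders.

```python
# Minimal sketch of contrastive-divergence (CD-1) training for a binary RBM.
# The paper's collaborative-filtering RBM uses softmax visible units per movie;
# this simplification uses plain binary visibles on toy data.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr=0.01):
    """One CD-1 update on a batch of visible vectors v0 (batch x n_vis)."""
    # Positive phase: hidden probabilities given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one step of Gibbs sampling.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    p_h1 = sigmoid(p_v1 @ W + b_hid)
    # Approximate log-likelihood gradient: data statistics minus model statistics.
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)

n_vis, n_hid = 20, 8
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_vis, b_hid = np.zeros(n_vis), np.zeros(n_hid)
v = (rng.random((32, n_vis)) < 0.3).astype(float)  # toy binarized "ratings"
for _ in range(100):
    cd1_step(v, W, b_vis, b_hid)
```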


International Conference on Machine Learning | 2008

Bayesian probabilistic matrix factorization using Markov chain Monte Carlo

Ruslan Salakhutdinov; Andriy Mnih

Low-rank matrix approximation methods provide one of the simplest and most effective approaches to collaborative filtering. Such models are usually fitted to data by finding a MAP estimate of the model parameters, a procedure that can be performed efficiently even on very large datasets. However, unless the regularization parameters are tuned carefully, this approach is prone to overfitting because it finds a single point estimate of the parameters. In this paper we present a fully Bayesian treatment of the Probabilistic Matrix Factorization (PMF) model in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters. We show that Bayesian PMF models can be efficiently trained using Markov chain Monte Carlo methods by applying them to the Netflix dataset, which consists of over 100 million movie ratings. The resulting models achieve significantly higher prediction accuracy than PMF models trained using MAP estimation.
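
For illustration, a minimal sketch of the Gibbs conditional for the factor matrices, assuming fixed Gaussian priors with precision `lam` and rating precision `alpha`. The paper goes further: it places Normal-Wishart priors on the prior means and covariances and samples those hyperparameters too. All names, sizes, and data here are toy assumptions.

```python
# Minimal sketch of one Gibbs half-sweep for Bayesian matrix factorization
# with fixed Gaussian priors (the paper also samples the hyperparameters).
import numpy as np

rng = np.random.default_rng(0)

def sample_factors(R, mask, U, V, alpha=2.0, lam=1.0):
    """Sample each row of U from its Gaussian conditional posterior.

    R: ratings (n_users x n_items), mask: 1 where a rating is observed,
    V: the other factor matrix, held fixed during this half-sweep.
    """
    d = U.shape[1]
    for i in range(U.shape[0]):
        obs = mask[i].astype(bool)         # items rated by user i
        Vo = V[obs]                        # their factor vectors
        precision = lam * np.eye(d) + alpha * Vo.T @ Vo
        cov = np.linalg.inv(precision)
        mean = cov @ (alpha * Vo.T @ R[i, obs])
        U[i] = rng.multivariate_normal(mean, cov)

# Toy run: alternate U | V and V | U; averaging U @ V.T over the sampled
# pairs approximates the posterior-mean prediction.
n_users, n_items, d = 30, 40, 5
R = rng.random((n_users, n_items)) * 5
mask = (rng.random(R.shape) < 0.2).astype(float)
U = rng.standard_normal((n_users, d))
V = rng.standard_normal((n_items, d))
for _ in range(10):
    sample_factors(R, mask, U, V)
    sample_factors(R.T, mask.T, V, U)
```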


International Conference on Machine Learning | 2007

Three new graphical models for statistical language modelling

Andriy Mnih; Geoffrey E. Hinton

The supremacy of n-gram models in statistical language modelling has recently been challenged by parametric models that use distributed representations to counteract the difficulties caused by data sparsity. We propose three new probabilistic language models that define the distribution of the next word in a sequence given several preceding words by using distributed representations of those words. We show how real-valued distributed representations for words can be learned at the same time as learning a large set of stochastic binary hidden features that are used to predict the distributed representation of the next word from previous distributed representations. Adding connections from the previous states of the binary hidden features improves performance, as does adding direct connections between the real-valued distributed representations. One of our models significantly outperforms the very best n-gram models.
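
A minimal sketch in the spirit of the log-bilinear idea described here: predict a real-valued representation of the next word as a learned linear function of the context words' representations, then score every vocabulary word by similarity to that prediction. This omits the stochastic binary hidden features and the direct connections the abstract mentions; all sizes are toy values.

```python
# Minimal log-bilinear-style next-word predictor (toy sizes, no training loop).
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, context = 50, 16, 3

R = 0.01 * rng.standard_normal((vocab, dim))               # word representations
C = [0.01 * rng.standard_normal((dim, dim)) for _ in range(context)]
b = np.zeros(vocab)                                        # per-word biases

def next_word_probs(context_ids):
    """P(next word | context): combine context representations linearly,
    then score every vocabulary word against the predicted representation."""
    r_hat = sum(C[k] @ R[w] for k, w in enumerate(context_ids))
    scores = R @ r_hat + b
    scores -= scores.max()                                 # numerical stability
    p = np.exp(scores)
    return p / p.sum()

p = next_word_probs([3, 17, 42])
print(p.shape, p.sum())                                    # (50,) 1.0
```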


International Symposium on Neural Networks | 2005

Learning nonlinear constraints with contrastive backpropagation

Andriy Mnih; Geoffrey E. Hinton

Certain datasets can be efficiently modelled in terms of constraints that are usually satisfied but sometimes are strongly violated. We propose using energy-based density models (EBMs) implementing products of frequently approximately satisfied nonlinear constraints for modelling such datasets. We demonstrate the feasibility of this approach by training an EBM using contrastive backpropagation on a dataset of idealized trajectories of two balls bouncing in a box and showing that the model learns an accurate and efficient representation of the dataset, taking advantage of the approximate independence between subsets of variables.
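
A minimal sketch of the contrastive training idea: define the energy as a sum of penalties on nonlinear constraint outputs, lower it at data points, and raise it at negative samples. The paper obtains negative samples with proper MCMC-style dynamics; this sketch substitutes a few steps of noisy gradient descent on the energy, and the data and sizes are toy placeholders, not the bouncing-balls trajectories.

```python
# Minimal sketch of contrastive training for an energy-based model whose
# energy is a sum of squared nonlinear constraint violations c(x) = tanh(W x).
import numpy as np

rng = np.random.default_rng(0)
dim, n_constraints = 4, 10
W = 0.1 * rng.standard_normal((n_constraints, dim))

def energy(x, W):
    return np.sum(np.tanh(W @ x) ** 2)

def grad_energy_x(x, W):
    c = np.tanh(W @ x)
    return W.T @ (2 * c * (1 - c ** 2))

def grad_energy_W(x, W):
    c = np.tanh(W @ x)
    return np.outer(2 * c * (1 - c ** 2), x)

def contrastive_step(x_data, W, lr=0.01, steps=5, eps=0.1):
    # Negative sample: start at the data, run a few noisy descent steps on
    # the energy (a crude stand-in for the paper's sampling dynamics).
    x_neg = x_data.copy()
    for _ in range(steps):
        x_neg -= eps * grad_energy_x(x_neg, W) + 0.01 * rng.standard_normal(dim)
    # Lower the energy at the data point, raise it at the negative sample.
    W -= lr * (grad_energy_W(x_data, W) - grad_energy_W(x_neg, W))

for _ in range(200):
    x = rng.standard_normal(dim)       # stand-in for one data vector
    contrastive_step(x, W)
print("energy at a fresh point:", energy(rng.standard_normal(dim), W))
```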


Neural Information Processing Systems | 2007

Probabilistic Matrix Factorization

Andriy Mnih; Ruslan Salakhutdinov


Neural Information Processing Systems | 2008

A Scalable Hierarchical Distributed Language Model

Andriy Mnih; Geoffrey E. Hinton


International Conference on Machine Learning | 2012

A fast and simple algorithm for training neural probabilistic language models

Andriy Mnih; Yee Whye Teh


International Conference on Machine Learning | 2014

Neural Variational Inference and Learning in Belief Networks

Andriy Mnih; Karol Gregor


Neural Information Processing Systems | 2013

Learning word embeddings efficiently with noise-contrastive estimation

Andriy Mnih; Koray Kavukcuoglu


International Conference on Learning Representations | 2017

The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables

Chris J. Maddison; Andriy Mnih; Yee Whye Teh
