Publication


Featured research published by Michael I. Jordan.


Neural Computation | 1991

Adaptive mixtures of local experts

Robert A. Jacobs; Michael I. Jordan; Steven J. Nowlan; Geoffrey E. Hinton

We present a new supervised learning procedure for systems composed of many separate networks, each of which learns to handle a subset of the complete set of training cases. The new procedure can be viewed either as a modular version of a multilayer supervised network, or as an associative version of competitive learning. It therefore provides a new link between these two apparently different approaches. We demonstrate that the learning procedure divides up a vowel discrimination task into appropriate subtasks, each of which can be solved by a very simple expert network.
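
A minimal sketch of the gating idea in NumPy, with hypothetical linear experts and a softmax gating network rather than the paper's vowel-discrimination setup; only the forward pass is shown, whereas the paper also trains each expert to specialize on the cases the gate assigns to it.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy setup: 3 linear experts and a linear gating network over 2-D inputs.
n_experts, d_in, d_out = 3, 2, 1
W_experts = rng.normal(size=(n_experts, d_out, d_in))  # each expert's weights
W_gate = rng.normal(size=(n_experts, d_in))            # gating network weights

def mixture_forward(x):
    """Blend expert outputs with softmax gating coefficients."""
    gate = softmax(W_gate @ x)                      # (n_experts,) mixing weights
    outputs = np.stack([W @ x for W in W_experts])  # (n_experts, d_out)
    return (gate[:, None] * outputs).sum(axis=0), gate

x = rng.normal(size=d_in)
y_hat, gate = mixture_forward(x)
print("gating weights:", gate, "blended prediction:", y_hat)
```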


Journal of the American Statistical Association | 2006

Hierarchical Dirichlet Processes

Yee Whye Teh; Michael I. Jordan; Matthew J. Beal; David M. Blei

We consider problems involving groups of data where each observation within a group is a draw from a mixture model and where it is desirable to share mixture components between groups. We assume that the number of mixture components is unknown a priori and is to be inferred from the data. In this setting it is natural to consider sets of Dirichlet processes, one for each group, where the well-known clustering property of the Dirichlet process provides a nonparametric prior for the number of mixture components within each group. Given our desire to tie the mixture models in the various groups, we consider a hierarchical model, specifically one in which the base measure for the child Dirichlet processes is itself distributed according to a Dirichlet process. Such a base measure being discrete, the child Dirichlet processes necessarily share atoms. Thus, as desired, the mixture models in the different groups necessarily share mixture components. We discuss representations of hierarchical Dirichlet processes in terms of a stick-breaking process, and a generalization of the Chinese restaurant process that we refer to as the “Chinese restaurant franchise.” We present Markov chain Monte Carlo algorithms for posterior inference in hierarchical Dirichlet process mixtures and describe applications to problems in information retrieval and text modeling.
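
The clustering behaviour the hierarchy builds on can be illustrated with the standard stick-breaking construction of a single Dirichlet process. This is a toy, truncated NumPy sketch of a draw G ~ DP(alpha, H), not the full hierarchical model or the Chinese restaurant franchise.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, n_atoms, base_sampler):
    """Truncated draw G ~ DP(alpha, H): weights by stick breaking, atoms from H."""
    betas = rng.beta(1.0, alpha, size=n_atoms)                      # stick proportions
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    weights = betas * remaining                                     # pi_k
    atoms = base_sampler(n_atoms)                                   # theta_k ~ H
    return weights, atoms

# Base measure H = N(0, 1); alpha controls how quickly the weights decay,
# and hence how many mixture components carry appreciable mass.
weights, atoms = stick_breaking(alpha=2.0, n_atoms=50,
                                base_sampler=lambda n: rng.normal(size=n))
print("largest mixture weights:", np.sort(weights)[::-1][:5])
```

In the hierarchical version described in the paper, the base measure is itself a draw from a Dirichlet process, so the child processes of the different groups share the same discrete set of atoms.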


Foundations and Trends® in Machine Learning | 2008

Graphical Models, Exponential Families, and Variational Inference

Martin J. Wainwright; Michael I. Jordan

The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fields, including bioinformatics, communication theory, statistical physics, combinatorial optimization, signal and image processing, information retrieval and statistical machine learning. Many problems that arise in specific instances — including the key problems of computing marginals and modes of probability distributions — are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant function and the entropy for exponential families, we develop general variational representations of the problems of computing likelihoods, marginal probabilities and most probable configurations. We describe how a wide variety of algorithms — among them sum-product, cluster variational methods, expectation-propagation, mean field methods, max-product and linear programming relaxation, as well as conic programming relaxations — can all be understood in terms of exact or approximate forms of these variational representations. The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
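
The monograph's central variational representation can be stated compactly: for an exponential family with sufficient statistics phi(x), natural parameter theta, and cumulant (log-partition) function A, conjugate duality between A and the entropy gives

```latex
% Exponential family: p_\theta(x) = \exp\{\langle \theta, \phi(x)\rangle - A(\theta)\}
% Variational representation of the cumulant function via conjugate duality:
\[
  A(\theta) \;=\; \sup_{\mu \in \mathcal{M}}
    \bigl\{ \langle \theta, \mu \rangle - A^{*}(\mu) \bigr\},
  \qquad
  \mathcal{M} \;=\; \bigl\{ \mu : \mu = \mathbb{E}_{p}[\phi(X)] \text{ for some density } p \bigr\},
\]
\[
  A^{*}(\mu) \;=\; -H\bigl(p_{\theta(\mu)}\bigr)
  \quad \text{for } \mu \text{ in the interior of } \mathcal{M}.
\]
```

Mean-field methods restrict the supremum to a tractable inner approximation of the mean-parameter set, while sum-product and related methods pair an outer approximation with an entropy surrogate such as the Bethe entropy.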


Nature Neuroscience | 2002

Optimal feedback control as a theory of motor coordination

Emanuel Todorov; Michael I. Jordan

A central problem in motor control is understanding how the many biomechanical degrees of freedom are coordinated to achieve a common goal. An especially puzzling aspect of coordination is that behavioral goals are achieved reliably and repeatedly with movements rarely reproducible in their detail. Existing theoretical frameworks emphasize either goal achievement or the richness of motor variability, but fail to reconcile the two. Here we propose an alternative theory based on stochastic optimal feedback control. We show that the optimal strategy in the face of uncertainty is to allow variability in redundant (task-irrelevant) dimensions. This strategy does not enforce a desired trajectory, but uses feedback more intelligently, correcting only those deviations that interfere with task goals. From this framework, task-constrained variability, goal-directed corrections, motor synergies, controlled parameters, simplifying rules and discrete coordination modes emerge naturally. We present experimental results from a range of motor tasks to support this theory.
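
A minimal NumPy sketch of the control machinery behind the theory: a finite-horizon, deterministic linear-quadratic regulator solved by the backward Riccati recursion, giving a time-varying feedback law u_t = -K_t x_t. The paper's model additionally includes sensory noise, delays, and signal-dependent motor noise, none of which appear here.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, Q_T, horizon):
    """Backward Riccati recursion for x_{t+1} = A x_t + B u_t with
    cost sum_t (x'Qx + u'Ru) plus terminal cost x'Q_T x."""
    P = Q_T
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # feedback gain
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]          # gains[t] is the gain to apply at time t

# Toy 1-D point mass: state = (position, velocity), control = force.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.01])        # penalize position error, lightly penalize velocity
R = np.array([[0.001]])         # cheap control
gains = finite_horizon_lqr(A, B, Q, R, Q_T=np.diag([10.0, 1.0]), horizon=50)

x = np.array([1.0, 0.0])        # start 1 unit away from the target
for K in gains:
    u = -K @ x                  # feedback corrects only task-relevant error
    x = A @ x + B @ u
print("final state:", x)
```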


Book | 1999

Learning in graphical models

Michael I. Jordan

Part 1, Inference: introduction to inference for Bayesian networks (Robert Cowell); advanced inference in Bayesian networks (Robert Cowell); inference in Bayesian networks using nested junction trees (Uffe Kjærulff); bucket elimination: a unifying framework for probabilistic inference (R. Dechter); an introduction to variational methods for graphical models (Michael I. Jordan et al.); improving the mean field approximation via the use of mixture distributions (Tommi S. Jaakkola and Michael I. Jordan); introduction to Monte Carlo methods (D.J.C. MacKay); suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation (Radford M. Neal).

Part 2, Independence: chain graphs and symmetric associations (Thomas S. Richardson); the multiinformation function as a tool for measuring stochastic dependence (M. Studený and J. Vejnarová).

Part 3, Foundations for learning: a tutorial on learning with Bayesian networks (David Heckerman); a view of the EM algorithm that justifies incremental, sparse and other variants (Radford M. Neal and Geoffrey E. Hinton).

Part 4, Learning from data: latent variable models (Christopher M. Bishop); stochastic algorithms for exploratory data analysis: data clustering and data visualization (Joachim M. Buhmann); learning Bayesian networks with local structure (Nir Friedman and Moises Goldszmidt); asymptotic model selection for directed networks with hidden variables (Dan Geiger et al.); a hierarchical community of experts (Geoffrey E. Hinton et al.); an information-theoretic analysis of hard and soft assignment methods for clustering (Michael J. Kearns et al.); learning hybrid Bayesian networks from data (Stefano Monti and Gregory F. Cooper); a mean field learning algorithm for unsupervised neural networks (Lawrence Saul and Michael Jordan); edge exclusion tests for graphical Gaussian models (Peter W.F. Smith and Joe Whittaker); hepatitis B: a case study in MCMC (D.J. Spiegelhalter et al.); prediction with Gaussian processes: from linear regression to linear prediction and beyond (C.K.I. Williams).


Journal of Machine Learning Research | 2003

Matching words and pictures

Kobus Barnard; Pinar Duygulu; David A. Forsyth; Nando de Freitas; David M. Blei; Michael I. Jordan

We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation) and corresponding to particular image regions (region naming). Auto-annotation might help organize and access large collections of images. Region naming is a model of object recognition as a process of translating image regions to words, much as one might translate from one language to another. Learning the relationships between image regions and semantic correlates (words) is an interesting example of multi-modal data mining, particularly because it is typically hard to apply data mining techniques to collections of images. We develop a number of models for the joint distribution of image regions and words, including several which explicitly learn the correspondence between regions and words. We study multi-modal and correspondence extensions to Hofmann's hierarchical clustering/aspect model, a translation model adapted from statistical machine translation (Brown et al.), and a multi-modal extension to mixture of latent Dirichlet allocation (MoM-LDA). All models are assessed using a large collection of annotated images of real scenes. We study in depth the difficult problem of measuring performance. For the annotation task, we look at prediction performance on held-out data. We present three alternative measures, oriented toward different types of task. Measuring the performance of correspondence methods is harder, because one must determine whether a word has been placed on the right region of an image. We can use annotation performance as a proxy measure, but accurate measurement requires hand-labeled data, and thus must occur on a smaller scale. We show results using both an annotation proxy, and manually labeled data.
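
A toy sketch in the spirit of the translation-style models discussed, not the paper's actual estimators: assuming image regions have already been quantized into discrete blob tokens, a word-given-blob table estimated from co-occurrence counts supports both region naming and auto-annotation.

```python
from collections import Counter, defaultdict

# Hypothetical training data: each image is (blob tokens, annotation words).
images = [
    (["b1", "b2"], ["sky", "field"]),
    (["b1", "b3"], ["sky", "sea"]),
    (["b2", "b3"], ["field", "lake"]),
]

# Count how often each word co-occurs with each blob token.
counts = defaultdict(Counter)
for blobs, words in images:
    for b in blobs:
        counts[b].update(words)

def p_word_given_blob(blob):
    """Region naming: a normalized word distribution for one blob token."""
    c = counts[blob]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

def annotate(blobs, top_k=2):
    """Auto-annotation: average the word distributions of an image's blobs."""
    scores = Counter()
    for b in blobs:
        for w, p in p_word_given_blob(b).items():
            scores[w] += p / len(blobs)
    return scores.most_common(top_k)

print(p_word_given_blob("b1"))     # words for a single region
print(annotate(["b1", "b2"]))      # words for a whole image
```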


Machine Learning | 1999

An Introduction to Variational Methods for Graphical Models

Michael I. Jordan; Zoubin Ghahramani; Tommi S. Jaakkola; Lawrence K. Saul

This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, which exploit laws of large numbers to transform the original graphical model into a simplified graphical model in which inference is efficient. Inference in the simplified model provides bounds on probabilities of interest in the original model. We describe a general framework for generating variational transformations based on convex duality. Finally, we return to the examples and demonstrate how variational algorithms can be formulated in each case.
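
The bounds referred to here come from convex duality or, in the simplest cases, Jensen's inequality: introducing a tractable distribution q over the hidden variables z yields a lower bound on the log probability of the evidence x (generic form, in standard notation rather than the paper's):

```latex
\[
  \log p(x)
  \;=\; \log \sum_{z} q(z)\,\frac{p(x, z)}{q(z)}
  \;\ge\; \sum_{z} q(z) \log \frac{p(x, z)}{q(z)},
\]
```

with equality when q(z) equals the true posterior p(z | x); variational inference maximizes the right-hand side over a tractable family of distributions q.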


Machine Learning | 2003

An Introduction to MCMC for Machine Learning

Christophe Andrieu; Nando de Freitas; Arnaud Doucet; Michael I. Jordan

The purpose of this introductory paper is threefold. First, it introduces the Monte Carlo method with emphasis on probabilistic machine learning. Second, it reviews the main building blocks of modern Markov chain Monte Carlo simulation, thereby providing an introduction to the remaining papers of this special issue. Lastly, it discusses interesting new research horizons.
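
One of the building blocks the paper reviews is the Metropolis-Hastings algorithm. A minimal random-walk version for an unnormalized one-dimensional target might look like the following sketch (toy target and step size chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    """Unnormalized log density: a two-component Gaussian mixture."""
    return np.logaddexp(-0.5 * (x - 2.0) ** 2, -0.5 * (x + 2.0) ** 2)

def metropolis_hastings(log_p, x0, n_samples, step=1.0):
    """Random-walk Metropolis: propose x' ~ N(x, step^2), accept with
    probability min(1, p(x') / p(x))."""
    samples = np.empty(n_samples)
    x, log_px = x0, log_p(x0)
    for i in range(n_samples):
        prop = x + step * rng.normal()
        log_pp = log_p(prop)
        if np.log(rng.uniform()) < log_pp - log_px:   # accept/reject step
            x, log_px = prop, log_pp
        samples[i] = x
    return samples

samples = metropolis_hastings(log_target, x0=0.0, n_samples=5000)
print("estimated mean of the target:", samples.mean())
```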


International Conference on Machine Learning | 2004

Multiple kernel learning, conic duality, and the SMO algorithm

Francis R. Bach; Gert R. G. Lanckriet; Michael I. Jordan

While classical kernel-based classifiers are based on a single kernel, in practice it is often desirable to base classifiers on combinations of multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for the support vector machine (SVM), and showed that the optimization of the coefficients of such a combination reduces to a convex optimization problem known as a quadratically-constrained quadratic program (QCQP). Unfortunately, current convex optimization toolboxes can solve this problem only for a small number of kernels and a small number of data points; moreover, the sequential minimal optimization (SMO) techniques that are essential in large-scale implementations of the SVM cannot be applied because the cost function is non-differentiable. We propose a novel dual formulation of the QCQP as a second-order cone programming problem, and show how to exploit the technique of Moreau-Yosida regularization to yield a formulation to which SMO techniques can be applied. We present experimental results that show that our SMO-based algorithm is significantly more efficient than the general-purpose interior point methods available in current optimization toolboxes.
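
For intuition only: a fixed conic (nonnegative-weighted) combination of base kernel matrices can be passed to an off-the-shelf SVM as a precomputed kernel, as in the sketch below. The paper's contribution is to learn the combination weights jointly with the SVM via an SMO-style algorithm, which this sketch does not attempt; it assumes scikit-learn is available and uses arbitrary fixed weights.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

rng = np.random.default_rng(0)

# Toy two-class data.
X = rng.normal(size=(200, 5))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

# A conic (nonnegative) combination of three base kernels with fixed weights eta.
def combined_kernel(A, B, eta=(0.5, 0.3, 0.2)):
    return (eta[0] * linear_kernel(A, B)
            + eta[1] * rbf_kernel(A, B, gamma=0.5)
            + eta[2] * polynomial_kernel(A, B, degree=2))

K_train = combined_kernel(X_train, X_train)
K_test = combined_kernel(X_test, X_train)   # rows: test points, cols: training points

clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)
print("test accuracy:", clf.score(K_test, y_test))
```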


Cognitive Science | 1992

Forward models: Supervised learning with a distal teacher

Michael I. Jordan; David E. Rumelhart

Internal models of the environment have an important role to play in adaptive systems, in general, and are of particular importance for the supervised learning paradigm. In this article we demonstrate that certain classical problems associated with the notion of the “teacher” in supervised learning can be solved by judicious use of learned internal models as components of the adaptive system. In particular, we show how supervised learning algorithms can be utilized in cases in which an unknown dynamical system intervenes between actions and desired outcomes. Our approach applies to any supervised learning algorithm that is capable of learning in multilayer networks.
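
A minimal sketch of the distal-teacher idea, with PyTorch networks standing in for the paper's: first fit a forward model of the unknown action-to-outcome mapping from samples, then train the controller by backpropagating distal (outcome-space) errors through the frozen forward model. The environment here is a hypothetical toy function.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Unknown environment mapping actions to distal outcomes; the learner can only
# sample it, not differentiate through it.
def environment(action):
    return torch.sin(action) + 0.5 * action

forward_model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
controller = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

# Stage 1: learn the forward model from (action, outcome) samples.
opt_f = torch.optim.Adam(forward_model.parameters(), lr=1e-2)
for _ in range(500):
    a = torch.rand(64, 1) * 4 - 2
    loss = ((forward_model(a) - environment(a)) ** 2).mean()
    opt_f.zero_grad(); loss.backward(); opt_f.step()

# Stage 2: train the controller through the frozen forward model.
for p in forward_model.parameters():
    p.requires_grad_(False)
opt_c = torch.optim.Adam(controller.parameters(), lr=1e-2)
for _ in range(500):
    target = torch.rand(64, 1) * 2 - 1            # desired distal outcomes
    action = controller(target)
    predicted = forward_model(action)             # distal error backpropagates
    loss = ((predicted - target) ** 2).mean()     # through the internal model
    opt_c.zero_grad(); loss.backward(); opt_c.step()

# Check: actual outcomes should now lie close to the requested targets.
test_targets = torch.linspace(-1, 1, 5).unsqueeze(1)
print(environment(controller(test_targets)).detach().squeeze())
```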

Collaboration


Dive into Michael I. Jordan's collaborations.

Top Co-Authors

Tommi S. Jaakkola

Massachusetts Institute of Technology

Francis R. Bach

École Normale Supérieure
