Publications


Featured research published by Sham M. Kakade.


Neural Networks | 2002

Opponent interactions between serotonin and dopamine

Nathaniel D. Daw; Sham M. Kakade; Peter Dayan

Anatomical and pharmacological evidence suggests that the dorsal raphe serotonin system and the ventral tegmental and substantia nigra dopamine system may act as mutual opponents. In the light of the temporal difference model of the involvement of the dopamine system in reward learning, we consider three aspects of motivational opponency involving dopamine and serotonin. We suggest that a tonic serotonergic signal reports the long-run average reward rate as part of an average-case reinforcement learning model; that a tonic dopaminergic signal reports the long-run average punishment rate in a similar context; and finally speculate that a phasic serotonin signal might report an ongoing prediction error for future punishment.
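
A minimal sketch of the average-reward temporal-difference account may help fix ideas: the running estimate r_bar of the long-run reward rate plays the role of the tonic opponent signal, while the phasic error delta plays the role of the dopamine signal. The toy environment and all constants below are illustrative, not taken from the paper.

```python
import random

# Average-reward TD(0): values are learned relative to a running estimate
# of the long-run reward rate, the quantity proposed for the tonic signal.
n_states = 5
V = [0.0] * n_states          # state values
r_bar = 0.0                   # tonic signal: long-run average reward rate
alpha, eta = 0.1, 0.01        # learning rates for V and r_bar

def step(s):
    """Hypothetical environment: a uniform random walk, one rewarding state."""
    s_next = random.randrange(n_states)
    return s_next, (1.0 if s_next == 0 else 0.0)

s = 0
for _ in range(20000):
    s_next, r = step(s)
    # Phasic error: reward measured against the tonic average-rate baseline.
    delta = r - r_bar + V[s_next] - V[s]
    V[s] += alpha * delta
    r_bar += eta * delta      # tonic signal tracks the average reward rate
    s = s_next

print(f"estimated average reward rate: {r_bar:.3f}")   # ~0.2 for this walk
```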


International Conference on Machine Learning | 2006

Cover trees for nearest neighbor

Alina Beygelzimer; Sham M. Kakade; John Langford

We present a tree data structure for fast nearest neighbor operations in general n-point metric spaces (where the data set consists of n points). The data structure requires O(n) space regardless of the metric's structure, yet maintains all performance properties of a navigating net (Krauthgamer & Lee, 2004b). If the point set has a bounded expansion constant c, which is a measure of the intrinsic dimensionality as defined in (Karger & Ruhl, 2002), the cover tree data structure can be constructed in O(c^6 n log n) time. Furthermore, nearest neighbor queries require time only logarithmic in n, in particular O(c^12 log n) time. Our experimental results show speedups over brute-force search varying between one and several orders of magnitude on natural machine learning datasets.
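
For illustration, a simplified cover tree in Python follows, mirroring the paper's recursive insertion and pruned nearest-neighbor descent. The fixed top and bottom levels and the Euclidean metric are simplifying assumptions made here for brevity.

```python
import math

def dist(a, b):
    """Euclidean metric; any metric obeying the triangle inequality works."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    def __init__(self, point):
        self.point = point
        self.children = {}                 # level -> children one level down

    def kids(self, level):
        return self.children.get(level, [])

class CoverTree:
    """Invariant: a child attached at level i lies within 2**i of its parent;
    insertion only attaches a point after it fails to nest any deeper."""

    def __init__(self, points, top=20, bottom=-20):
        pts = iter(points)
        self.root = Node(next(pts))
        self.top, self.bottom = top, bottom  # assumes 2**bottom <= d <= 2**top
        for p in pts:
            self._insert(p, [self.root], self.top)

    def _insert(self, p, cover, level):
        if level < self.bottom:
            return False
        # Implicit representation: every node is also its own child below.
        cands = cover + [c for q in cover for c in q.kids(level)]
        near = [q for q in cands if dist(p, q.point) <= 2 ** level]
        if not near:
            return False                   # p cannot live under this cover set
        if self._insert(p, near, level - 1):
            return True
        parent = min(cover, key=lambda q: dist(p, q.point))
        if dist(p, parent.point) <= 2 ** level:
            parent.children.setdefault(level, []).append(Node(p))
            return True
        return False

    def nearest(self, p):
        cover, best = [self.root], self.root
        for level in range(self.top, self.bottom - 1, -1):
            cands = cover + [c for q in cover for c in q.kids(level)]
            best = min(cands, key=lambda q: dist(p, q.point))
            # Prune: a node farther than best + 2**level cannot have a
            # descendant closer than the current best.
            bound = dist(p, best.point) + 2 ** level
            cover = [q for q in cands if dist(p, q.point) <= bound]
        return best.point

tree = CoverTree([(0, 0), (1, 1), (5, 5), (6, 5), (0, 9)])
print(tree.nearest((5.4, 5.0)))            # -> (5, 5)
```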


IEEE Transactions on Information Theory | 2012

Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting

Niranjan Srinivas; Andreas Krause; Sham M. Kakade; Matthias W. Seeger

Many applications require optimizing an unknown, noisy function that is expensive to evaluate. We formalize this task as a multi-armed bandit problem, where the payoff function is either sampled from a Gaussian process (GP) or has low RKHS norm. We resolve the important open problem of deriving regret bounds for this setting, which imply novel convergence rates for GP optimization. We analyze GP-UCB, an intuitive upper-confidence-based algorithm, and bound its cumulative regret in terms of maximal information gain, establishing a novel connection between GP optimization and experimental design. Moreover, by bounding the latter in terms of operator spectra, we obtain explicit sublinear regret bounds for many commonly used covariance functions. In some important cases, our bounds have surprisingly weak dependence on the dimensionality. In our experiments on real sensor data, GP-UCB compares favorably with other heuristic GP optimization approaches.
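
A minimal GP-UCB loop might look as follows; the RBF kernel, the confidence schedule beta, and the toy objective are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def rbf(A, B, ls=0.2):
    """RBF kernel matrix between row-stacked inputs A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-2):
    """GP posterior mean and variance at candidate points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv = np.linalg.inv(K)
    mu = Ks.T @ Kinv @ y
    var = rbf(Xs, Xs).diagonal() - np.einsum('ij,jk,ki->i', Ks.T, Kinv, Ks)
    return mu, np.maximum(var, 1e-12)

# Unknown, expensive-to-evaluate objective, observed with Gaussian noise.
f = lambda x: np.sin(3 * x) + 0.5 * np.cos(7 * x)
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 200)[:, None]         # discretized decision set

X, y = [[0.5]], [f(0.5) + 0.1 * rng.standard_normal()]
for t in range(1, 30):
    mu, var = gp_posterior(np.array(X), np.array(y), grid)
    beta = 2.0 * np.log(len(grid) * (t + 1) ** 2)  # confidence width grows slowly
    ucb = mu + np.sqrt(beta * var)
    x_next = grid[np.argmax(ucb)]              # optimism under uncertainty
    X.append(list(x_next))
    y.append(f(x_next[0]) + 0.1 * rng.standard_normal())

print("best observed x:", X[int(np.argmax(y))])
```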


International Conference on Machine Learning | 2009

Multi-view clustering via canonical correlation analysis

Kamalika Chaudhuri; Sham M. Kakade; Karen Livescu; Karthik Sridharan

Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). Under the assumption that the views are uncorrelated given the cluster label, we show that the separation conditions required for the algorithm to be successful are significantly weaker than prior results in the literature. We provide results for mixtures of Gaussians and mixtures of log-concave distributions. We also provide empirical support from audio-visual speaker clustering (where we desire the clusters to correspond to speaker ID) and from hierarchical Wikipedia document clustering (where one view is the words in the document and the other is the link structure).
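
A sketch of the pipeline on synthetic two-view data, using scikit-learn's CCA and k-means; the data-generating process and all parameters are illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
k, d, n = 3, 50, 600

# Two views that are independent given the cluster label: each cluster has
# its own mean in each view, plus view-specific isotropic noise.
means1, means2 = rng.normal(size=(k, d)), rng.normal(size=(k, d))
labels = rng.integers(k, size=n)
view1 = means1[labels] + 0.8 * rng.normal(size=(n, d))
view2 = means2[labels] + 0.8 * rng.normal(size=(n, d))

# CCA keeps only directions correlated across the views; view-specific
# noise is uncorrelated between views and is largely projected away.
cca = CCA(n_components=k - 1).fit(view1, view2)
z1, _ = cca.transform(view1, view2)

pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(z1)
print("agreement with true labels (ARI):", adjusted_rand_score(labels, pred))
```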


Journal of Computer and System Sciences | 2012

A spectral algorithm for learning Hidden Markov Models

Daniel J. Hsu; Sham M. Kakade; Tong Zhang

Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard (under cryptographic assumptions), and practitioners typically resort to search heuristics which suffer from the usual local optima issues. We prove that under a natural separation condition (bounds on the smallest singular value of the HMM parameters), there is an efficient and provably correct algorithm for learning HMMs. The sample complexity of the algorithm does not explicitly depend on the number of distinct (discrete) observations; it depends on this quantity only implicitly, through spectral properties of the underlying HMM. This makes the algorithm particularly applicable to settings with a large number of observations, such as those in natural language processing, where the observation space is sometimes the entire vocabulary of a language. The algorithm is also simple, employing only a singular value decomposition and matrix multiplications.
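
A compact sketch of the procedure: estimate the low-order moments P_1, P_{2,1}, and P_{3,x,1} from i.i.d. observation triples, take an SVD of P_{2,1}, and form one observable operator B_x per symbol; joint probabilities are then products of these operators. The toy HMM below, including its uniform initial distribution, is an illustrative assumption.

```python
import numpy as np

def spectral_hmm(triples, n_obs, m):
    """Learn observable operators from i.i.d. triples (x1, x2, x3);
    m is the number of hidden states, n_obs the alphabet size."""
    P1 = np.zeros(n_obs)
    P21 = np.zeros((n_obs, n_obs))           # P21[i, j] = Pr[x2=i, x1=j]
    P3x1 = np.zeros((n_obs, n_obs, n_obs))   # P3x1[x, i, j] = Pr[x3=i, x2=x, x1=j]
    for x1, x2, x3 in triples:
        P1[x1] += 1
        P21[x2, x1] += 1
        P3x1[x2, x3, x1] += 1
    N = len(triples)
    P1, P21, P3x1 = P1 / N, P21 / N, P3x1 / N

    U = np.linalg.svd(P21)[0][:, :m]         # top-m left singular vectors
    pinv = np.linalg.pinv(U.T @ P21)
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    B = [U.T @ P3x1[x] @ pinv for x in range(n_obs)]
    return b1, binf, B

def seq_prob(seq, b1, binf, B):
    """Pr[x_1 ... x_t] = binf^T B_{x_t} ... B_{x_1} b1."""
    b = b1
    for x in seq:
        b = B[x] @ b
    return float(binf @ b)

# Toy 2-state, 3-symbol HMM: T[i, j] = Pr[h'=i | h=j], O[x, j] = Pr[x | h=j].
rng = np.random.default_rng(0)
T = np.array([[0.9, 0.2], [0.1, 0.8]])
O = np.array([[0.8, 0.1], [0.1, 0.2], [0.1, 0.7]])

def sample_triple():
    h, xs = rng.integers(2), []
    for _ in range(3):
        xs.append(rng.choice(3, p=O[:, h]))
        h = rng.choice(2, p=T[:, h])
    return tuple(xs)

b1, binf, B = spectral_hmm([sample_triple() for _ in range(100000)], 3, 2)

# Compare against the exact probability from the forward algorithm.
a = 0.5 * O[0]                               # uniform initial hidden state
for x in (0, 2):
    a = O[x] * (T @ a)
print("spectral:", seq_prob([0, 0, 2], b1, binf, B), "exact:", a.sum())
```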


Nature Neuroscience | 2000

Learning and selective attention

Peter Dayan; Sham M. Kakade; P. Read Montague

Selective attention involves the differential processing of different stimuli, and has widespread psychological and neural consequences. Although computational modeling should offer a powerful way of linking observable phenomena at different levels, most work has focused on the relatively narrow issue of constraints on processing resources. By contrast, we consider statistical and informational aspects of selective attention, divorced from resource constraints, which are evident in animal conditioning experiments involving uncertain predictions and unreliable stimuli. Neuromodulatory systems and limbic structures are known to underlie attentional effects in such tasks.
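
One concrete instance of this statistical view (our reading, with illustrative constants) is a Kalman-filter model of conditioning: the reward is a noisy linear function of the stimuli, the associative weights drift over time, and the learning rate devoted to each stimulus tracks its posterior uncertainty, so unreliable or rarely seen stimuli command larger updates.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2                        # two stimuli, e.g. a light and a tone
w_true = np.array([1.0, 0.0])
w_hat = np.zeros(d)          # estimated associative weights
P = np.eye(d)                # posterior covariance: per-stimulus uncertainty
tau2, noise2 = 0.01, 0.1     # weight-drift and reward-noise variances (toy)

for t in range(200):
    x = rng.integers(0, 2, size=d).astype(float)  # which stimuli are present
    r = w_true @ x + np.sqrt(noise2) * rng.standard_normal()
    P = P + tau2 * np.eye(d)                      # drift: uncertainty grows
    k = P @ x / (x @ P @ x + noise2)              # Kalman gain ~ "attention"
    w_hat = w_hat + k * (r - w_hat @ x)           # uncertainty-weighted update
    P = P - np.outer(k, x) @ P                    # presented stimuli become
                                                  # less uncertain
print("learned weights:", np.round(w_hat, 2))
```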


Neural Networks | 2002

Dopamine: generalization and bonuses

Sham M. Kakade; Peter Dayan

In the temporal difference model of primate dopamine neurons, their phasic activity reports a prediction error for future reward. This model is supported by a wealth of experimental data. However, in certain circumstances, the activity of the dopamine cells seems anomalous under the model, as they respond in particular ways to stimuli that are not obviously related to predictions of reward. In this paper, we address two important sets of anomalies, those having to do with generalization and novelty. Generalization responses are treated as the natural consequence of partial information; novelty responses are treated by the suggestion that dopamine cells multiplex information about reward bonuses, including exploration bonuses and shaping bonuses. We interpret this additional role for dopamine in terms of its mechanistic attentional and psychomotor effects, which serve the computational role of guiding exploration.
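
The bonus mechanism is easy to sketch: adding a novelty bonus to the reward of rarely visited states inflates the prediction error (the modeled phasic dopamine signal) for novel stimuli, and the effect decays with familiarity. All constants below are toy choices.

```python
import random

n_states, gamma, alpha = 10, 0.95, 0.1
V = [0.0] * n_states
visits = [0] * n_states

def bonus(s):
    """Novelty bonus: large for unfamiliar states, fading with experience."""
    return 1.0 / (1 + visits[s])

s = 0
for _ in range(5000):
    s_next = random.randrange(n_states)           # toy random-walk dynamics
    r = 1.0 if s_next == n_states - 1 else 0.0
    visits[s_next] += 1
    # The bonus enters the phasic error exactly like reward does.
    delta = (r + bonus(s_next)) + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    s = s_next

print("values:", [round(v, 2) for v in V])
```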


Neural Information Processing Systems | 2012

A Spectral Algorithm for Latent Dirichlet Allocation

Anima Anandkumar; Yi-Kai Liu; Daniel J. Hsu; Dean P. Foster; Sham M. Kakade

Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third-order moments, which may be estimated from documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments.
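
The core computational step can be illustrated on a synthetic orthogonal tensor of the form the whitened third moment takes, T = sum_i w_i v_i (x) v_i (x) v_i: tensor power iterations with deflation recover the weight/vector pairs. The construction below is a toy stand-in for moments that would be estimated from trigram counts.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4
V = np.linalg.qr(rng.normal(size=(k, k)))[0]   # orthonormal "topic" directions
w = np.array([0.4, 0.3, 0.2, 0.1])             # mixing weights

# T = sum_i w_i v_i (x) v_i (x) v_i -- an orthogonally decomposable 3-tensor.
T = np.einsum('i,ai,bi,ci->abc', w, V, V, V)

def power_iteration(T, n_starts=10, n_iters=100):
    """Return one robust eigenpair (eigenvalue, eigenvector) of T."""
    best = None
    for _ in range(n_starts):
        u = rng.normal(size=T.shape[0])
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            u = np.einsum('abc,b,c->a', T, u, u)   # the map u -> T(I, u, u)
            u /= np.linalg.norm(u)
        lam = np.einsum('abc,a,b,c->', T, u, u, u) # eigenvalue T(u, u, u)
        if best is None or lam > best[0]:
            best = (lam, u)
    return best

for _ in range(k):
    lam, u = power_iteration(T)
    print(f"recovered weight: {lam:.3f}")
    T = T - lam * np.einsum('a,b,c->abc', u, u, u) # deflate and repeat
```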


Conference on Innovations in Theoretical Computer Science | 2013

Learning mixtures of spherical Gaussians: moment methods and spectral decompositions

Daniel J. Hsu; Sham M. Kakade

This work provides a computationally efficient and statistically consistent moment-based estimator for mixtures of spherical Gaussians. Under the condition that component means are in general position, a simple spectral decomposition technique yields consistent parameter estimates from low-order observable moments, without additional minimum separation assumptions needed by previous computationally efficient estimation procedures. Thus computational and information-theoretic barriers to efficient estimation in mixture models are precluded when the mixture components have means in general position and spherical covariances. Some connections are made to estimation problems related to independent component analysis.
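
A minimal sketch of the first moment-based steps under a common spherical covariance (synthetic data, no finite-sample error handling): the smallest eigenvalue of the covariance identifies sigma^2, and subtracting sigma^2 I from the raw second moment exposes the span of the component means.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, n, sigma = 20, 3, 100000, 0.5
means = 2.0 * rng.normal(size=(k, d))            # general position w.h.p.
w = np.array([0.5, 0.3, 0.2])
X = means[rng.choice(k, size=n, p=w)] + sigma * rng.normal(size=(n, d))

# cov(x) = sigma^2 I + sum_i w_i (mu_i - mu_bar)(mu_i - mu_bar)^T, and the
# second term has rank at most k - 1 < d, so the smallest eigenvalue of
# the covariance is sigma^2.
sigma2_hat = np.linalg.eigvalsh(np.cov(X, rowvar=False))[0]
print(f"sigma^2: true {sigma**2:.3f}, estimated {sigma2_hat:.3f}")

# M2 = E[x x^T] - sigma^2 I = sum_i w_i mu_i mu_i^T: its top-k eigenvectors
# span the component means.
M2 = (X.T @ X) / n - sigma2_hat * np.eye(d)
span = np.linalg.eigh(M2)[1][:, -k:]
resid = means - (means @ span) @ span.T          # project true means onto span
print("residual of true means outside the span:", np.linalg.norm(resid))
```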


IEEE Transactions on Information Theory | 2011

Robust Matrix Decomposition With Sparse Corruptions

Daniel J. Hsu; Sham M. Kakade; Tong Zhang

Suppose a given observation matrix can be decomposed as the sum of a low-rank matrix and a sparse matrix, and the goal is to recover these individual components from the observed sum. Such additive decompositions have applications in a variety of numerical problems including system identification, latent variable graphical modeling, and principal components analysis. We study conditions under which recovering such a decomposition is possible via a combination of ℓ1 norm and trace norm minimization. We are specifically interested in the question of how many sparse corruptions are allowed so that convex programming can still achieve accurate recovery, and we obtain stronger recovery guarantees than previous studies. Moreover, we do not assume that the spatial pattern of corruptions is random, which stands in contrast to related analyses under such assumptions via matrix completion.
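
A minimal sketch of the convex program, solved with an augmented-Lagrangian iteration that alternates singular-value thresholding (the prox of the trace norm) with entrywise soft thresholding (the prox of the ℓ1 norm). The parameters and the deterministic corruption pattern below are illustrative; this is a standard principal-component-pursuit-style solver, not the paper's analysis.

```python
import numpy as np

def shrink(A, tau):
    """Entrywise soft threshold: the prox operator of the l1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svt(A, tau):
    """Singular-value threshold: the prox operator of the trace norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def decompose(M, lam, iters=60, rho=1.5):
    """min ||L||_* + lam * ||S||_1  s.t.  L + S = M, via an inexact
    augmented Lagrangian with dual variable Y and penalty mu."""
    mu = 1.25 / np.linalg.norm(M, 2)
    L, S, Y = (np.zeros_like(M) for _ in range(3))
    for _ in range(iters):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        Y = Y + mu * (M - L - S)
        mu *= rho                    # gradually stiffen the constraint
    return L, S

# Synthetic test: low-rank plus sparse, with a deterministic, spread-out
# corruption pattern -- the guarantees do not require random support.
rng = np.random.default_rng(0)
n, r = 100, 5
L0 = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
S0 = np.zeros((n, n))
S0.flat[::13] = 5.0                  # every 13th entry is corrupted
L, S = decompose(L0 + S0, lam=1.0 / np.sqrt(n))
print("relative error on L:", np.linalg.norm(L - L0) / np.linalg.norm(L0))
```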

Collaboration


Dive into Sham M. Kakade's collaborations.

Top Co-Authors

Dean P. Foster

University of Pennsylvania

Chi Jin

University of California
