Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Daniel J. Hsu is active.

Publication


Featured research published by Daniel J. Hsu.


Journal of Computer and System Sciences | 2012

A spectral algorithm for learning Hidden Markov Models

Daniel J. Hsu; Sham M. Kakade; Tong Zhang

Hidden Markov Models (HMMs) are one of the most fundamental and widely used statistical tools for modeling discrete time series. In general, learning HMMs from data is computationally hard (under cryptographic assumptions), and practitioners typically resort to search heuristics, which suffer from the usual local-optima issues. We prove that under a natural separation condition (bounds on the smallest singular value of the HMM parameters), there is an efficient and provably correct algorithm for learning HMMs. The sample complexity of the algorithm does not explicitly depend on the number of distinct (discrete) observations; it depends on this quantity implicitly through spectral properties of the underlying HMM. This makes the algorithm particularly applicable to settings with a large number of observations, such as natural language processing, where the observation space can be the entire vocabulary of a language. The algorithm is also simple, employing only a singular value decomposition and matrix multiplications.
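A condensed sketch of the observable-operator construction the abstract alludes to is shown below. The moment estimates P1, P21 and P3x1 are assumed to have been computed from data beforehand; variable names are illustrative rather than taken from the paper.

```python
import numpy as np

def learn_observable_hmm(P1, P21, P3x1, m):
    """Spectral HMM learning from low-order moments (a sketch).

    P1   : (n,)       estimated unigram probabilities, P1[i] = Pr[x1 = i]
    P21  : (n, n)     estimated bigram matrix, P21[i, j] = Pr[x2 = i, x1 = j]
    P3x1 : (n, n, n)  estimated trigram slices, P3x1[x][i, j] = Pr[x3 = i, x2 = x, x1 = j]
    m    : number of hidden states
    """
    # Project observations onto the top-m left singular vectors of the bigram matrix.
    U, _, _ = np.linalg.svd(P21)
    U = U[:, :m]

    UP21_pinv = np.linalg.pinv(U.T @ P21)     # (U^T P21)^+
    b1 = U.T @ P1                             # initial vector
    binf = np.linalg.pinv(P21.T @ U) @ P1     # normalization vector
    # One observable operator per observation symbol x.
    B = [U.T @ P3x1[x] @ UP21_pinv for x in range(P21.shape[0])]
    return b1, binf, B

def sequence_probability(seq, b1, binf, B):
    """Pr[x_1, ..., x_t] = binf^T B_{x_t} ... B_{x_1} b_1."""
    state = b1
    for x in seq:
        state = B[x] @ state
    return float(binf @ state)
```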


International Conference on Machine Learning | 2008

Hierarchical sampling for active learning

Sanjoy Dasgupta; Daniel J. Hsu

We present an active learning scheme that exploits cluster structure in data.
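The abstract gives little detail, but the cluster-exploiting idea can be illustrated with a deliberately simplified sketch: query a handful of labels inside each cluster and, when the sampled labels look nearly pure, propagate the majority label to the rest of the cluster. This uses a flat k-means clustering rather than the hierarchical clustering in the paper, and query_label is a hypothetical labeling oracle.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_guided_labels(X, query_label, n_clusters=10, queries_per_cluster=5,
                          purity_threshold=0.9, seed=0):
    """Spend label queries cluster by cluster (a simplified sketch).

    Query a few labels inside each cluster; if the sampled labels are
    (nearly) pure, propagate the majority label to the rest of the cluster.
    `query_label(i)` is a hypothetical oracle returning the label of point i.
    """
    rng = np.random.default_rng(seed)
    assign = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    labels = np.full(len(X), -1)              # -1 marks "not yet labeled"
    for c in range(n_clusters):
        members = np.where(assign == c)[0]
        queried = rng.choice(members, size=min(queries_per_cluster, len(members)),
                             replace=False)
        answers = np.array([query_label(i) for i in queried])
        labels[queried] = answers
        values, counts = np.unique(answers, return_counts=True)
        majority, frac = values[counts.argmax()], counts.max() / len(answers)
        if frac >= purity_threshold:
            # Cluster looks pure: label the remaining members without new queries.
            unlabeled = members[labels[members] == -1]
            labels[unlabeled] = majority
    return labels
```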


Neural Information Processing Systems | 2012

A Spectral Algorithm for Latent Dirichlet Allocation

Anima Anandkumar; Yi-Kai Liu; Daniel J. Hsu; Dean P. Foster; Sham M. Kakade

Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only the words are observed and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third-order moments, which may be estimated with documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments.
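The "orthogonal tensor decomposition" mentioned above can be carried out with a tensor power method. The sketch below assumes the third-order moment tensor has already been estimated and whitened into a symmetric, approximately orthogonally decomposable form; it illustrates only the decomposition step, not the full LDA moment construction.

```python
import numpy as np

def tensor_power_decomposition(T, k, n_iters=100, seed=0):
    """Decompose a symmetric, (approximately) orthogonally decomposable 3-tensor.

    T is a (d, d, d) array assumed close to sum_i lambda_i v_i (x) v_i (x) v_i
    with orthonormal v_i; the k leading (eigenvalue, eigenvector) pairs are
    extracted by power iteration with deflation.
    """
    rng = np.random.default_rng(seed)
    T = T.copy()
    d = T.shape[0]
    eigvals, eigvecs = [], []
    for _ in range(k):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            # Tensor-vector contraction T(I, v, v).
            v = np.einsum('ijk,j,k->i', T, v, v)
            v /= np.linalg.norm(v)
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)
        eigvals.append(lam)
        eigvecs.append(v)
        # Deflate: remove the recovered rank-one component.
        T = T - lam * np.einsum('i,j,k->ijk', v, v, v)
    return np.array(eigvals), np.array(eigvecs)
```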


Conference on Innovations in Theoretical Computer Science | 2013

Learning mixtures of spherical Gaussians: moment methods and spectral decompositions

Daniel J. Hsu; Sham M. Kakade

This work provides a computationally efficient and statistically consistent moment-based estimator for mixtures of spherical Gaussians. Under the condition that component means are in general position, a simple spectral decomposition technique yields consistent parameter estimates from low-order observable moments, without additional minimum separation assumptions needed by previous computationally efficient estimation procedures. Thus computational and information-theoretic barriers to efficient estimation in mixture models are precluded when the mixture components have means in general position and spherical covariances. Some connections are made to estimation problems related to independent component analysis.
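One concrete piece of the moment approach can be sketched for the special case of a shared spherical covariance σ²I with fewer components than dimensions: the population covariance is Σᵢ wᵢ(μᵢ − μ̄)(μᵢ − μ̄)ᵀ + σ²I, so σ² appears as its smallest eigenvalue and can be subtracted off to "denoise" the second moment. The snippet below illustrates only this step and is not the paper's full estimator.

```python
import numpy as np

def denoised_second_moment(X):
    """Estimate sigma^2 and a denoised second moment for a spherical GMM.

    Simplifying assumptions (not the paper's general setting): every component
    has covariance sigma^2 * I and there are fewer components than dimensions,
    so sigma^2 shows up as the smallest eigenvalue of the covariance.
    """
    n, d = X.shape
    cov = np.cov(X, rowvar=False, bias=True)      # E[(x - mu)(x - mu)^T]
    sigma2 = np.linalg.eigvalsh(cov).min()        # smallest eigenvalue ~ sigma^2
    M2 = (X.T @ X) / n - sigma2 * np.eye(d)       # ~ sum_i w_i mu_i mu_i^T
    return sigma2, M2
```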


IEEE Transactions on Information Theory | 2011

Robust Matrix Decomposition With Sparse Corruptions

Daniel J. Hsu; Sham M. Kakade; Tong Zhang

Suppose a given observation matrix can be decomposed as the sum of a low-rank matrix and a sparse matrix, and the goal is to recover these individual components from the observed sum. Such additive decompositions have applications in a variety of numerical problems including system identification, latent variable graphical modeling, and principal components analysis. We study conditions under which recovering such a decomposition is possible via a combination of ℓ1 norm and trace norm minimization. We are specifically interested in the question of how many sparse corruptions are allowed so that convex programming can still achieve accurate recovery, and we obtain stronger recovery guarantees than previous studies. Moreover, we do not assume that the spatial pattern of corruptions is random, which stands in contrast to related analyses under such assumptions via matrix completion.
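One standard way to attack an ℓ1-plus-trace-norm objective like the one described above is to alternate soft-thresholding steps on the sparse part with singular-value thresholding on the low-rank part. The sketch below is such a generic first-order solver, not the specific formulation or guarantees analyzed in the paper, and its penalty weights are illustrative.

```python
import numpy as np

def soft_threshold(X, tau):
    """Entrywise soft-thresholding: the prox operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular-value soft-thresholding: the prox operator of the trace norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def sparse_plus_low_rank(Y, lam_rank=1.0, lam_sparse=0.1, step=0.5, n_iters=200):
    """Split Y ~ L + S with L low-rank and S sparse via alternating prox steps."""
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(n_iters):
        R = Y - L - S                                     # current residual
        L = svd_threshold(L + step * R, step * lam_rank)
        S = soft_threshold(S + step * R, step * lam_sparse)
    return L, S
```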


SIAM Journal on Optimization | 2013

Stochastic Convex Optimization with Bandit Feedback

Alekh Agarwal; Dean P. Foster; Daniel J. Hsu; Sham M. Kakade; Alexander Rakhlin

This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set X under a stochastic bandit feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value f(x) at any query point x ∈ X. We demonstrate a generalization of the ellipsoid algorithm that incurs O(poly(d)·√T) regret. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of the scaling with T.
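The feedback model can be made concrete with a toy one-dimensional example: the learner only sees noisy values f(x) + noise at chosen query points. The sketch below uses a naive repeated-sampling trisection search rather than the ellipsoid-based algorithm from the paper, purely to illustrate optimizing a convex function from bandit feedback.

```python
import numpy as np

def noisy_value(f, x, sigma, rng):
    """Bandit feedback: observe f(x) plus zero-mean Gaussian noise."""
    return f(x) + sigma * rng.normal()

def stochastic_trisection(f, lo, hi, sigma=0.1, rounds=30, reps=200, seed=0):
    """Minimize a 1-d convex f on [lo, hi] using only noisy point evaluations.

    Each round, estimate f at the two interior trisection points by averaging
    repeated noisy queries, then discard the third of the interval next to the
    larger estimate (valid for convex f once the noise is averaged down).
    """
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        x1 = lo + (hi - lo) / 3.0
        x2 = hi - (hi - lo) / 3.0
        m1 = np.mean([noisy_value(f, x1, sigma, rng) for _ in range(reps)])
        m2 = np.mean([noisy_value(f, x2, sigma, rng) for _ in range(reps)])
        if m1 < m2:
            hi = x2
        else:
            lo = x1
    return (lo + hi) / 2.0

# Example: minimize (x - 0.3)^2 on [0, 1] from noisy feedback only.
print(stochastic_trisection(lambda x: (x - 0.3) ** 2, 0.0, 1.0))  # ~0.3
```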


International Joint Conference on Natural Language Processing | 2015

Model-based Word Embeddings from Decompositions of Count Matrices

Karl Stratos; Michael Collins; Daniel J. Hsu

This work develops a new statistical understanding of word embeddings induced from transformed count data. Using the class of hidden Markov models (HMMs) underlying Brown clustering as a generative model, we demonstrate how canonical correlation analysis (CCA) and certain count transformations permit efficient and effective recovery of model parameters with lexical semantics. We further show in experiments that these techniques empirically outperform existing spectral methods on word similarity and analogy tasks, and are also competitive with other popular methods such as word2vec and GloVe.
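A minimal sketch of the count-transformation-plus-decomposition pipeline: build a word/context co-occurrence count matrix, apply a square-root transform, and take scaled singular vectors as embeddings. The paper derives CCA-style scalings from an underlying HMM; only the simpler sqrt-then-SVD variant is shown here, and the corpus format is an assumption.

```python
import numpy as np
from collections import Counter

def count_embeddings(corpus, dim=50, window=1):
    """Word embeddings from an SVD of a transformed co-occurrence count matrix.

    corpus: list of tokenized sentences (an assumed input format).
    Only the square-root count transform is illustrated.
    """
    vocab = {w: i for i, w in enumerate(sorted({w for sent in corpus for w in sent}))}
    pair_counts = Counter()
    for sent in corpus:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(vocab[w], vocab[sent[j]])] += 1
    n = len(vocab)
    C = np.zeros((n, n))
    for (i, j), c in pair_counts.items():
        C[i, j] = c
    U, s, _ = np.linalg.svd(np.sqrt(C), full_matrices=False)
    dim = min(dim, n)
    return vocab, U[:, :dim] * s[:dim]     # one row per word, scaled by singular values
```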


arXiv: Learning | 2011

Scaling Up Machine Learning: Parallel Online Learning

Daniel J. Hsu; Nikos Karampatziakis; John Langford; Alexander J. Smola

In this work we study parallelization of online learning, a core primitive in machine learning. In a parallel environment all known approaches for parallel online learning lead to delayed updates, where the model is updated using out-of-date information. In the worst case, or when examples are temporally correlated, delay can have a very adverse effect on the learning algorithm. Here, we analyze and present preliminary empirical results on a set of learning architectures based on a feature sharding approach that present various tradeoffs between delay, degree of parallelism, representation power and empirical performance.
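The feature-sharding idea can be illustrated with a toy, single-process sketch in which each "worker" owns a block of coordinates, contributes a partial dot product to the prediction, and updates only its own block. The communication delay that the paper is concerned with is not simulated; all names below are illustrative.

```python
import numpy as np

class FeatureShardedLearner:
    """Online linear learner whose weights are split across feature shards.

    A toy, single-process sketch of feature sharding: each "worker" owns one
    block of coordinates, contributes a partial dot product to the prediction,
    and updates only its own block (delay is not simulated here).
    """

    def __init__(self, dim, n_shards=4, lr=0.1):
        self.shards = np.array_split(np.arange(dim), n_shards)
        self.w = np.zeros(dim)
        self.lr = lr

    def predict(self, x):
        # Sum the partial dot products computed shard by shard.
        return sum(self.w[idx] @ x[idx] for idx in self.shards)

    def update(self, x, y):
        # Squared-loss SGD step; each shard touches only its own coordinates.
        err = self.predict(x) - y
        for idx in self.shards:
            self.w[idx] -= self.lr * err * x[idx]
        return err ** 2
```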


Physical Review D | 2016

Do dark matter halos explain lensing peaks?

José Manuel Zorrilla Matilla; Zoltan Haiman; Daniel J. Hsu; Arushi Gupta; Andrea Petri

We have investigated a recently proposed halo-based model, Camelus, for predicting weak-lensing peak counts, and compared its results over a collection of 162 cosmologies with those from N-body simulations. While counts from both models agree for peaks with S/N > 1 […]


IEEE European Symposium on Security and Privacy | 2017

FairTest: Discovering Unwarranted Associations in Data-Driven Applications

Florian Tramèr; Vaggelis Atlidakis; Roxana Geambasu; Daniel J. Hsu; Jean-Pierre Hubaux; Mathias Humbert; Ari Juels; Huang Lin


Collaboration


Dive into Daniel J. Hsu's collaborations.

Top Co-Authors

Dean P. Foster

University of Pennsylvania
