Alekh Agarwal
Microsoft
Publication
Featured research published by Alekh Agarwal.
IEEE Transactions on Automatic Control | 2012
John C. Duchi; Alekh Agarwal; Martin J. Wainwright
The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. It arises in various application domains, including distributed tracking and localization, multi-agent coordination, estimation in sensor networks, and large-scale machine learning. We develop and analyze distributed algorithms based on dual subgradient averaging, and we provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis allows us to clearly separate the convergence of the optimization algorithm itself from the effects of communication that depend on the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network, and we confirm the sharpness of this prediction both by theoretical lower bounds and by simulations for various networks. Our approach includes the cases of deterministic optimization and communication, as well as problems with stochastic optimization and/or communication.
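To make the dual subgradient averaging update concrete, here is a minimal Python sketch, not the authors' implementation, for a hypothetical consensus problem in which each node of a ring network holds a scalar function |x - b_i|; the mixing matrix, step-size schedule, and problem sizes are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of distributed dual subgradient averaging on a ring network.
# Each node i privately holds f_i(x) = |x - b_i|; the network jointly minimizes
# the average (1/n) * sum_i f_i(x). Values below (n, T, step sizes) are illustrative.

n, T = 20, 2000
rng = np.random.default_rng(0)
b = rng.normal(size=n)                      # local data held at each node

# Doubly stochastic mixing matrix for a ring: each node averages with its two neighbours.
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i - 1) % n] = 0.25
    P[i, (i + 1) % n] = 0.25

z = np.zeros(n)                             # dual (averaged subgradient) variables
x = np.zeros(n)                             # primal iterates, one per node
x_avg = np.zeros(n)                         # running averages reported by each node

for t in range(1, T + 1):
    g = np.sign(x - b)                      # local subgradients of |x_i - b_i|
    z = P @ z + g                           # mix neighbours' duals, add new subgradient
    alpha = 1.0 / np.sqrt(t)                # decaying step size (illustrative schedule)
    x = -alpha * z                          # proximal step with psi(x) = x^2 / 2
    x_avg += (x - x_avg) / t

# Every node's running average should approach the median of b, the minimizer
# of (1/n) * sum_i |x - b_i|.
print("node estimates:", np.round(x_avg[:5], 3), "...")
print("true minimizer (median of b):", round(float(np.median(b)), 3))
```

On a ring the spectral gap of the mixing matrix shrinks as the number of nodes grows, which is why the analysis above predicts that larger rings need more iterations.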
conference on decision and control | 2012
Alekh Agarwal; John C. Duchi
We analyze the convergence of gradient-based optimization algorithms that base their updates on delayed stochastic gradient information. The main application of our results is to gradient-based distributed optimization algorithms in which a master node performs parameter updates while worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. We take motivation from statistical problems where the size of the data is so large that it cannot fit on one computer; with the advent of huge datasets in biology, astronomy, and the internet, such problems are now common. Our main contribution is to show that for smooth stochastic problems, the delays are asymptotically negligible and we can achieve order-optimal convergence results. We show n-node architectures whose optimization error on stochastic problems, in spite of asynchronous delays, scales asymptotically as O(1/√(nT)) after T iterations. This rate is known to be optimal for a distributed system with n nodes even in the absence of delays. We additionally complement our theoretical results with numerical experiments on a logistic regression task.
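The following minimal Python simulation, an illustration rather than the paper's experiment, applies stochastic gradients that arrive with a fixed delay tau, mimicking a master node that keeps updating while workers compute gradients on stale parameters; the logistic-regression setup, delay, batch size, and step-size schedule are assumptions made for the example.

```python
import numpy as np

# Minimal simulation of a master node applying stochastic gradients that arrive
# with a fixed delay tau, as when workers compute gradients in parallel on stale
# parameters. Problem: logistic regression on synthetic data (illustrative only).

rng = np.random.default_rng(1)
d, N, tau, T = 10, 5000, 4, 4000            # dimension, samples, delay, iterations
w_true = rng.normal(size=d)
X = rng.normal(size=(N, d))
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def stochastic_grad(w, batch):
    """Mini-batch gradient of the logistic loss at parameters w."""
    Xb, yb = X[batch], y[batch]
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    return Xb.T @ (p - yb) / len(batch)

w = np.zeros(d)
history = [w.copy()] * (tau + 1)            # parameters the workers actually saw

for t in range(1, T + 1):
    stale_w = history[0]                    # this gradient was computed tau steps ago
    batch = rng.integers(0, N, size=32)
    g = stochastic_grad(stale_w, batch)
    eta = 0.5 / np.sqrt(t)                  # illustrative step-size schedule
    w -= eta * g                            # master update with the delayed gradient
    history = history[1:] + [w.copy()]

loss = np.mean(np.log1p(np.exp(-(2 * y - 1) * (X @ w))))
print(f"final average logistic loss with delay tau={tau}: {loss:.4f}")
```

Increasing tau in this simulation mimics more workers sharing a single master; the paper's point is that the effect of such delays becomes negligible asymptotically.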
Annals of Statistics | 2012
Alekh Agarwal; Sahand Negahban; Martin J. Wainwright
Many statistical M-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite gradient methods for solving such problems, working within a high-dimensional framework that allows the data dimension d to grow with (and possibly exceed) the sample size n. This high-dimensional structure precludes the usual global assumptions, namely strong convexity and smoothness conditions, that underlie much of classical optimization analysis. We define appropriately restricted versions of these conditions, and show that they are satisfied with high probability for various statistical models. Under these conditions, our theory guarantees that projected gradient descent has a globally geometric rate of convergence up to the statistical precision of the model, meaning the typical distance between the true unknown parameter θ* and an optimal solution θ̂.
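As an illustration of the composite gradient method for one M-estimator in this class, the sketch below runs proximal (soft-thresholding) gradient steps for the Lasso on synthetic data; the dimensions, step size, and regularization level are illustrative choices, not values taken from the paper.

```python
import numpy as np

# Minimal sketch of a composite gradient (proximal gradient) method for one
# M-estimator in this class: the Lasso, i.e. least-squares loss plus an l1
# regularizer. Dimensions, step size, and lambda below are illustrative.

rng = np.random.default_rng(2)
n, d, s = 100, 400, 5                       # sample size n < dimension d, sparsity s
theta_star = np.zeros(d)
theta_star[:s] = rng.normal(size=s)         # true sparse parameter
X = rng.normal(size=(n, d)) / np.sqrt(n)
y = X @ theta_star + 0.1 * rng.normal(size=n)

lam = 0.1 * np.sqrt(np.log(d) / n)          # regularization level
L = np.linalg.norm(X, 2) ** 2               # Lipschitz constant of the quadratic loss
theta = np.zeros(d)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (coordinate-wise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

for _ in range(500):
    grad = X.T @ (X @ theta - y)            # gradient of 0.5 * ||y - X theta||^2
    theta = soft_threshold(theta - grad / L, lam / L)

# Under restricted strong convexity and smoothness, such iterates contract
# geometrically until they reach the statistical precision of the model.
print("estimation error ||theta - theta*||:",
      round(float(np.linalg.norm(theta - theta_star)), 4))
```

Tracking the distance between successive iterates and the final solution across iterations shows the geometric phase flattening out once the statistical precision is reached.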
IEEE Transactions on Information Theory | 2012
Alekh Agarwal; Peter L. Bartlett; Pradeep Ravikumar; Martin J. Wainwright
SIAM Journal on Optimization | 2016
Alekh Agarwal; Animashree Anandkumar; Prateek Jain; Praneeth Netrapalli
SIAM Journal on Optimization | 2013
Alekh Agarwal; Dean P. Foster; Daniel J. Hsu; Sham M. Kakade; Alexander Rakhlin
IEEE Transactions on Information Theory | 2013
Alekh Agarwal; John C. Duchi
international conference on machine learning | 2008
Pradeep D. Ravikumar; Alekh Agarwal; Martin J. Wainwright
allerton conference on communication, control, and computing | 2012
John C. Duchi; Alekh Agarwal; Martin J. Wainwright
allerton conference on communication, control, and computing | 2011
John C. Duchi; Alekh Agarwal; Mikael Johansson; Michael I. Jordan