Andre Wibisono
University of California, Berkeley
Publications
Featured research published by Andre Wibisono.
Proceedings of the National Academy of Sciences of the United States of America | 2016
Andre Wibisono; Ashia C. Wilson; Michael I. Jordan
Significance: Optimization problems arise naturally in statistical machine learning and other fields concerned with data analysis. The rapid growth in the scale and complexity of modern datasets has led to a focus on gradient-based methods and also on the class of accelerated methods, first proposed by Nesterov in 1983. Accelerated methods achieve faster convergence rates than gradient methods and indeed, under certain conditions, they achieve optimal rates. However, accelerated methods are not descent methods and remain a conceptual mystery. We propose a variational, continuous-time framework for understanding accelerated methods. We provide a systematic methodology for converting accelerated higher-order methods from continuous time to discrete time. Our work illuminates a class of dynamics that may be useful for designing better algorithms for optimization.

Abstract: Accelerated gradient methods play a central role in optimization, achieving optimal rates in many settings. Although many generalizations and extensions of Nesterov’s original acceleration method have been proposed, it is not yet clear what is the natural scope of the acceleration concept. In this paper, we study accelerated methods from a continuous-time perspective. We show that there is a Lagrangian functional that we call the Bregman Lagrangian, which generates a large class of accelerated methods in continuous time, including (but not limited to) accelerated gradient descent, its non-Euclidean extension, and accelerated higher-order gradient methods. We show that the continuous-time limit of all of these methods corresponds to traveling the same curve in spacetime at different speeds. From this perspective, Nesterov’s technique and many of its generalizations can be viewed as a systematic way to go from the continuous-time curves generated by the Bregman Lagrangian to a family of discrete-time accelerated algorithms.
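For reference, a sketch of the paper's central object, the Bregman Lagrangian, as we recall it from the paper (h is a convex distance-generating function, D_h its Bregman divergence, and α_t, β_t, γ_t are time-dependent scaling functions; see the paper for the precise conditions):

```latex
% Bregman Lagrangian: x is position, v is velocity.
\mathcal{L}(x, v, t)
  = e^{\gamma_t + \alpha_t}
    \Bigl( D_h\bigl(x + e^{-\alpha_t} v,\, x\bigr) - e^{\beta_t} f(x) \Bigr),
\qquad
D_h(y, x) = h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle .
```

Under the paper's "ideal scaling" conditions on α_t, β_t, γ_t, solutions of the corresponding Euler-Lagrange equation satisfy f(x_t) − f(x*) = O(e^{−β_t}); a particular choice of the scaling functions recovers the familiar O(1/t²) rate of accelerated gradient descent.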
IEEE Transactions on Information Theory | 2015
John C. Duchi; Michael I. Jordan; Martin J. Wainwright; Andre Wibisono
We consider derivative-free algorithms for stochastic and nonstochastic convex optimization problems that use only function values rather than gradients. Focusing on nonasymptotic bounds on convergence rates, we show that if pairs of function values are available, algorithms for d-dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most √d in convergence rate over traditional stochastic gradient methods. We establish such results for both smooth and nonsmooth cases, sharpening previous analyses that suggested a worse dimension dependence, and extend our results to the case of multiple (m ≥ 2) evaluations. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, establishing the sharpness of our achievable results up to constant (sometimes logarithmic) factors.
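To illustrate the kind of scheme the paper analyzes, here is a minimal two-point zeroth-order gradient method in Python; the function names and the step-size/smoothing schedules are placeholders of ours, not the paper's tuned choices:

```python
import numpy as np

def two_point_gradient_estimate(f, x, delta, rng):
    """Estimate grad f(x) from one pair of function values.

    Samples a direction u uniformly on the unit sphere and uses the
    symmetric finite difference (f(x + delta*u) - f(x - delta*u)) / (2*delta).
    The factor d makes the estimate (nearly) unbiased for the smoothed gradient.
    """
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                    # uniform direction on the sphere
    diff = (f(x + delta * u) - f(x - delta * u)) / (2.0 * delta)
    return d * diff * u

def zeroth_order_sgd(f, x0, steps=1000, eta=0.1, delta=1e-3, seed=0):
    """Stochastic gradient descent driven only by function evaluations."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for t in range(1, steps + 1):
        g = two_point_gradient_estimate(f, x, delta, rng)
        x -= (eta / np.sqrt(t)) * g           # standard 1/sqrt(t) decay
    return x

# Toy usage: minimize a simple quadratic without ever computing its gradient.
if __name__ == "__main__":
    f = lambda x: 0.5 * np.sum(x ** 2)
    x_star = zeroth_order_sgd(f, x0=np.ones(10))
    print(f(x_star))  # should be close to 0
```

The d-dimensional random direction makes the estimator's variance grow with the dimension, which is the source of the √d factor in the rates above.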
International Symposium on Information Theory | 2017
Andre Wibisono; Varun Jog; Po-Ling Loh
We study the relationship between information- and estimation-theoretic quantities in time-evolving systems. We focus on the Fokker-Planck channel defined by a general stochastic differential equation, and show that the time derivatives of entropy, KL divergence, and mutual information are characterized by estimation-theoretic quantities involving an appropriate generalization of the Fisher information. Our results vastly extend De Bruijn’s identity and the classical I-MMSE relation.
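For context, the classical special case that the paper extends: for the additive Gaussian noise (heat flow) channel Y_t = X + √t Z, with Z standard Gaussian independent of X, De Bruijn's identity reads

```latex
% De Bruijn's identity: differential entropy along the heat flow
% grows at half the Fisher information of the output density p_{Y_t}.
\frac{d}{dt}\, h(Y_t) = \tfrac{1}{2}\, J(Y_t),
\qquad
J(Y) = \mathbb{E}\Bigl[\bigl\| \nabla \log p_Y(Y) \bigr\|^2\Bigr].
```

The paper's results replace the heat flow by a general Fokker-Planck channel and entropy by KL divergence and mutual information, with a suitably generalized Fisher information on the right-hand side.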
Neural Information Processing Systems | 2013
Tamara Broderick; Nicholas Boyd; Andre Wibisono; Ashia C. Wilson; Michael I. Jordan
arXiv: Statistics Theory | 2013
Christopher J. Hillar; Andre Wibisono
Symposium on the Theory of Computing | 2012
Jacob D. Abernethy; Rafael M. Frongillo; Andre Wibisono
Science & Engineering Faculty | 2013
Jacob D. Abernethy; Peter L. Bartlett; Rafael M. Frongillo; Andre Wibisono
Neural Information Processing Systems | 2012
Andre Wibisono; Martin J. Wainwright; Michael I. Jordan; John C. Duchi
arXiv: Optimization and Control | 2015
Andre Wibisono; Ashia C. Wilson
arXiv: Functional Analysis | 2012
Christopher J. Hillar; Shaowei Lin; Andre Wibisono