ArXiv | 2021

A Control-Theoretic Perspective on Optimal High-Order Optimization

 
 

Abstract


We provide a control-theoretic perspective on optimal tensor algorithms for minimizing a convex function in a finite-dimensional Euclidean space. Given a function $\varPhi : \mathbb{R}^d \rightarrow \mathbb{R}$ that is convex and twice continuously differentiable, we study a closed-loop control system that is governed by the operators $\nabla \varPhi$ and $\nabla^2 \varPhi$ together with a feedback control law $\lambda(\cdot)$ satisfying the algebraic equation $(\lambda(t))^p \Vert \nabla \varPhi(x(t)) \Vert^{p-1} = \theta$ for some $\theta \in (0, 1)$. Our first contribution is to prove the existence and uniqueness of a local solution to this system via the Banach fixed-point theorem. We present a simple yet nontrivial Lyapunov function that allows us to establish the existence and uniqueness of a global solution under certain regularity conditions and to analyze the convergence properties of trajectories.
The rate of convergence is $O(1/t^{(3p+1)/2})$ in terms of the objective function gap and $O(1/t^{3p})$ in terms of the squared gradient norm. Our second contribution is to provide two algorithmic frameworks obtained from discretization of our continuous-time system, one of which generalizes the large-step A-HPE framework of Monteiro and Svaiter (SIAM J Optim 23(2):1092–1125, 2013) and the other of which leads to a new optimal $p$-th order tensor algorithm. While our discrete-time analysis can be seen as a simplification and generalization of Monteiro and Svaiter (2013), it is largely motivated by the aforementioned continuous-time analysis, demonstrating the fundamental role that the feedback control plays in optimal acceleration and the clear advantage that the continuous-time perspective brings to algorithmic design.
A highlight of our analysis is a proof that all of the $p$-th order optimal tensor algorithms we discuss minimize the squared gradient norm at a rate of $O(k^{-3p})$, which complements the recent analysis in Gasnikov et al. (in: COLT, PMLR, pp 1374–1391, 2019), Jiang et al. (in: COLT, PMLR, pp 1799–1801, 2019) and Bubeck et al. (in: COLT, PMLR, pp 492–507, 2019).
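Because the feedback law above is algebraic, the control at any point can be solved for in closed form: $\lambda = (\theta / \Vert \nabla \varPhi(x) \Vert^{p-1})^{1/p}$, which requires $\nabla \varPhi(x) \neq 0$ when $p \geq 2$. The minimal Python sketch below is an illustration under our own assumptions, not code from the paper: it evaluates this law for a gradient supplied as a plain sequence of floats and tabulates the continuous-time rate exponents stated in the abstract. The function names `feedback_lambda` and `rate_exponents` are hypothetical, and the sketch does not simulate the closed-loop dynamics themselves.

```python
import math
from typing import Sequence, Tuple


def feedback_lambda(grad: Sequence[float], p: int, theta: float) -> float:
    """Return lambda > 0 solving lambda**p * ||grad||**(p - 1) == theta.

    Hypothetical helper: `grad` stands for the gradient of Phi at x(t).
    """
    if not isinstance(p, int) or p < 1:
        raise ValueError("p must be a positive integer (the order of the method)")
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie in the open interval (0, 1)")
    if p == 1:
        # ||grad||**0 == 1, so the law reduces to lambda == theta.
        return theta
    g_norm = math.sqrt(sum(g * g for g in grad))  # Euclidean norm of the gradient
    if g_norm == 0.0:
        # At a stationary point the equation has no finite solution for p >= 2.
        raise ValueError("gradient is zero: lambda is unbounded for p >= 2")
    return (theta / g_norm ** (p - 1)) ** (1.0 / p)


def rate_exponents(p: int) -> Tuple[float, int]:
    """Exponents r in the O(1/t**r) continuous-time rates from the abstract.

    Returns (objective-gap exponent, squared-gradient-norm exponent).
    """
    return (3 * p + 1) / 2, 3 * p


if __name__ == "__main__":
    # Example: Phi(x) = 0.5 * ||x||^2, so grad Phi(x) = x; take x = (3, 4).
    lam = feedback_lambda([3.0, 4.0], p=2, theta=0.5)
    print(f"lambda = {lam:.4f}")  # equals (0.5 / 5) ** 0.5
    for order in (1, 2, 3):
        print(order, rate_exponents(order))
```

For $p = 1$ the objective-gap exponent is 2, the familiar rate of Nesterov's accelerated gradient method, and the law reduces to the constant $\lambda = \theta$. For $p \geq 2$ the formula makes $\lambda(t)$ increase as $\Vert \nabla \varPhi(x(t)) \Vert$ decreases.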

Volume abs/1912.07168
DOI 10.1007/s10107-021-01721-3
Language English
Journal ArXiv

Full Text