Sublinear classical and quantum algorithms for general matrix games
Tongyang Li *1,2
Chunhao Wang*, Shouvanik Chakrabarti, Xiaodi Wu
Joint Center for Quantum Information and Computer Science, Department of Computer Science, and Institute for Advanced Computer Studies, University of Maryland
Center for Theoretical Physics, Massachusetts Institute of Technology
Department of Computer Science and Engineering, Pennsylvania State University
Department of Computer Science, University of Texas at [email protected], [email protected], {shouv,xwu}@cs.umd.edu
Abstract
We investigate sublinear classical and quantum algorithms for matrix games, a fundamental problem in optimization and machine learning, with provable guarantees. Given a matrix $A \in \mathbb{R}^{n \times d}$, sublinear algorithms for the matrix game $\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x$ were previously known only for two special cases: (1) $\mathcal{Y}$ being the $\ell_1$-norm unit ball, and (2) $\mathcal{X}$ being either the $\ell_1$- or the $\ell_2$-norm unit ball. We give a sublinear classical algorithm that can interpolate smoothly between these two cases: for any fixed $q \in (1,2]$, we solve the matrix game where $\mathcal{X}$ is an $\ell_q$-norm unit ball within additive error $\epsilon$ in time $\tilde{O}((n+d)/\epsilon^2)$. We also provide a corresponding sublinear quantum algorithm that solves the same task in time $\tilde{O}((\sqrt{n}+\sqrt{d})\,\mathrm{poly}(1/\epsilon))$ with a quadratic improvement in both $n$ and $d$. Both our classical and quantum algorithms are optimal in the dimension parameters $n$ and $d$ up to poly-logarithmic factors. Finally, we propose sublinear classical and quantum algorithms for the approximate Carathéodory problem and the $\ell_q$-margin support vector machines as applications.
Introduction
Motivations.
Minimax games between two parties, i.e., $\min_x \max_y f(x,y)$, are a basic model in game theory and have ubiquitous connections and applications to economics, optimization and machine learning, theoretical computer science, etc. Among minimax games, one of the most fundamental cases is the bilinear minimax game, also known as the matrix game, with the following form:
$$\min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} y^\top A x, \quad \text{where } A \in \mathbb{R}^{n \times d},\ \mathcal{X} \subset \mathbb{R}^d,\ \mathcal{Y} \subset \mathbb{R}^n. \tag{1}$$
Matrix games are fundamental in algorithm design due to their equivalence to linear programs (Dantzig 1998), and also in machine learning because they contain classification (Novikoff 1963; Minsky and Papert 1988) as a special case, and many other important problems.
For many common domains $\mathcal{X}$ and $\mathcal{Y}$, matrix games can be solved efficiently within approximation error $\epsilon$, i.e., to output $x' \in \mathcal{X}$ and $y' \in \mathcal{Y}$ such that $(y')^\top A x'$ is $\epsilon$-close to the optimum in (1). For some specific choices of $\mathcal{X}$ and $\mathcal{Y}$, the matrix game can even be solved in sublinear time in the size $nd$ of $A$. When $\mathcal{X}$ and $\mathcal{Y}$ are both $\ell_1$-norm unit balls, Grigoriadis and Khachiyan (1995) can solve the matrix game in time $O((n+d)\log(n+d)/\epsilon^2)$. When $\mathcal{X}$ is the $\ell_2$-norm unit ball in $\mathbb{R}^d$ and $\mathcal{Y}$ is the $\ell_1$-norm unit ball in $\mathbb{R}^n$, Clarkson, Hazan, and Woodruff (2012) can solve the matrix game in time $O((n+d)\log n/\epsilon^2)$.
As far as we know, the $\ell_1$-$\ell_1$ and $\ell_2$-$\ell_1$ matrix games are the only two cases where sublinear algorithms are known. However, there is general interest in solving matrix games with general norms. For instance, matrix games are closely related to the Carathéodory problem for finding a sparse linear combination in the convex hull of given data points, where all the $\ell_p$-metrics with $p \ge 2$ have been well-studied (Barman 2015; Mirrokni et al. 2017; Combettes and Pokutta 2019).
In addition, matrix games are common in machine learning, especially in support vector machines (SVMs), and general $\ell_p$-margin SVMs have also been considered in previous literature; see e.g. the book by Deng, Tian, and Zhang (2012). In all, it is a natural question to investigate sublinear algorithms for general matrix games. In addition, quantum computing has been rapidly advancing and current technology has reached "quantum supremacy" for some specific tasks (Arute et al. 2019); since previous works have given sublinear quantum algorithms for $\ell_1$-$\ell_1$ matrix games (Li, Chakrabarti, and Wu 2019; Apeldoorn and Gilyén 2019) and $\ell_2$-$\ell_1$ matrix games (Li, Chakrabarti, and Wu 2019) with running time $(\sqrt{n}+\sqrt{d})\,\mathrm{poly}(1/\epsilon)$, it is also natural to explore sublinear quantum algorithms for general matrix games.
Contributions.
We conduct a systematic study of $\ell_q$-$\ell_1$ matrix games for any $q \in (1,2]$, which correspond to $\ell_q$-margin SVMs and the $\ell_p$-Carathéodory problem for any $p \ge 2$. We use the following entry-wise input model, the standard assumption in the sublinear algorithms of Grigoriadis and Khachiyan (1995) and Clarkson, Hazan, and Woodruff (2012):
Input model:
Given any $i \in [n]$ and $j \in [d]$, the $j$-th entry of $A_i$ can be recovered in $O(1)$ time.
Quantumly, we consider an almost identical oracle:
Quantum input model:
Given any $i \in [n]$ and $j \in [d]$, the $j$-th entry of $A_i$ can be recovered in $O(1)$ time coherently.
The only difference is that coherent queries are allowed, which give quantum algorithms the ability to query different locations in superposition; this is the standard quantization of the classical input model and has been commonly adopted in previous works (Li, Chakrabarti, and Wu 2019; Apeldoorn and Gilyén 2019).
Theorem 1 (Main Theorem). Let $q \in (1,2]$, and define $p \ge 2$ such that $\frac{1}{p}+\frac{1}{q}=1$. Consider the $\ell_q$-$\ell_1$ matrix game:
$$\sigma := \max_{x \in B_q^d} \min_{\mathbf{p} \in \Delta_n} \mathbf{p}^\top A x, \tag{2}$$
where $B_q^d$ is the $\ell_q$-unit ball in $\mathbb{R}^d$ and $\Delta_n$ is the $\ell_1$-simplex in $\mathbb{R}^n$. Then we can find an $\bar{x} \in B_q^d$ such that
$$\min_{i \in [n]} A_i \bar{x} \ge \sigma - \epsilon \tag{3}$$
with success probability at least $2/3$, using
• $O\big(\frac{(n+d)(p+\log n)}{\epsilon^2}\big)$ classical queries (Theorem 2); or
• $\tilde{O}\big(\frac{p^2\sqrt{n}}{\epsilon^4}+\frac{p^{3.5}\sqrt{d}}{\epsilon^7}\big)$ quantum queries (Theorem 3).
When $p = \Omega(\log d/\epsilon)$, the above bounds can be improved (by Lemma 1) to, respectively,
• $O\big(\frac{(n+d)(\frac{\log d}{\epsilon}+\log n)}{\epsilon^2}\big)$ queries to the classical input model;
• $\tilde{O}\big(\frac{\sqrt{n}}{\epsilon^6}+\frac{\sqrt{d}}{\epsilon^{10.5}}\big)$ queries to the quantum input model.
Both results are optimal in $n$ and $d$ up to poly-log factors, as we show $\Omega(n+d)$ and $\Omega(\sqrt{n}+\sqrt{d})$ classical and quantum lower bounds respectively when $\epsilon = \Theta(1)$ (Theorem 4).
Footnotes to Theorem 1: Throughout the paper, we use the bold font $\mathbf{p}$ to denote a vector and the math font $p$ to denote a real number. $\bar{x} \in B_q^d$ is the standard objective quantity under the $\ell_q$-norm. Also note that once we have the $\bar{x}$ in (3), any $\mathbf{p} \in \Delta_n$ satisfies $\mathbf{p}^\top A \bar{x} \ge \sigma - \epsilon$. Here $\tilde{O}$ omits poly-logarithmic factors.
Conceptually, our classical and quantum algorithms for general matrix games enjoy quite a few nice properties. On the one hand, they can be directly applied to
• Convex geometry: We give the first sublinear classical and quantum algorithms for the approximate Carathéodory problem (Corollary 1), improving the previous linear-time algorithms of Mirrokni et al. (2017) and Combettes and Pokutta (2019);
• Supervised learning: We provide the first sublinear algorithms for general $\ell_q$-margin support vector machines (SVMs) (Corollary 2).
On the other hand, our quantum algorithm is friendly for near-term applications. It uses the standard quantum input model and does not need any sophisticated quantum data structures. It is classical-quantum hybrid, where the quantum part is isolated into pieces of state preparation connected by classical processing. Its output is completely classical.
Technique-wise, we are deeply inspired by Clarkson, Hazan, and Woodruff (2012), which serves as the starting point of our algorithm design. At a high level, Clarkson et al.'s algorithm follows a primal-dual framework where the primal part applies ($\ell_2$-norm) online gradient descent (OGD) by Zinkevich (2003), and the dual part applies multiplicative weight updates (MWU) by $\ell_2$-sampling. The choice of the $\ell_2$-norm metric greatly facilitates the design and analysis of the algorithms for both parts. However, it is conceivable that more sophisticated design and analysis will be required to handle general $\ell_q$-$\ell_1$ matrix games.
Classically, our main technical contribution is to expand the primal-dual approach of Clarkson, Hazan, and Woodruff (2012) to work for more general metrics for the $\ell_q$-$\ell_1$ matrix game. Specifically, in the primal we replace OGD by a generalized $p$-norm OGD due to Shalev-Shwartz (2012), and in the dual we replace the $\ell_2$-sampling by $\ell_q$-sampling.
We conduct a careful algorithm design and analysis to ensure that this strategy only incurs an $O(p/\epsilon^2)$ overhead in the number of iterations, and the error of the $\ell_q$-$\ell_1$ matrix game is still bounded by $\epsilon$ as in (3). In a nutshell, our algorithm can be viewed as an interpolation between the $\ell_2$-$\ell_1$ matrix game (Clarkson, Hazan, and Woodruff 2012) and the $\ell_1$-$\ell_1$ matrix game (Grigoriadis and Khachiyan 1995): when $q$ is close to 2 the algorithm is more similar to Clarkson, Hazan, and Woodruff (2012), whereas when $q$ is close to 1, $p$ is large and the $p$-norm GD becomes closer to the normalized exponentiated gradient (Shalev-Shwartz 2012), which is exactly the update rule in Grigoriadis and Khachiyan (1995).
Quantumly, our main contribution is the systematic improvement of the previous quantum algorithm for $\ell_2$-$\ell_1$ matrix games by Li, Chakrabarti, and Wu (2019). They achieved a quantum speedup of $\tilde{O}(\sqrt{n}+\sqrt{d})$ for solving $\ell_2$-$\ell_1$ matrix games by leveraging quantum amplitude amplification and observing that $\ell_2$-sampling can be readily accomplished by quantum state preparation, as quantum states are $\ell_2$ unit vectors. For general $\ell_q$-$\ell_1$ matrix games ($q \in (1,2]$), we likewise upgrade both primal and dual parts as in our classical algorithm: specifically, in the primal, we apply the $p$-norm OGD in $\tilde{O}(\sqrt{d})$ time, whereas in the dual, we apply the multiplicative weight update via an $\ell_q$-sampling in $\tilde{O}(\sqrt{n})$ time. To that end, we contribute the following technical improvements, which may be of independent interest:
• In our algorithm, we cannot directly leverage quantum state preparation in the $\ell_q$ metric because it corresponds to $\ell_2$-normalized vectors. Instead, we propose Algorithm 2 for quantum $\ell_q$-sampling with $O(\sqrt{n})$ oracle calls, which works with states whose amplitudes follow the $\ell_q$-norm proportion. Measuring such states is equivalent to performing $\ell_q$-sampling.
• When $p = q = 2$, we improve the $\epsilon$-dependence from the $1/\epsilon^8$ in the prior art by Li, Chakrabarti, and Wu (2019) to $1/\epsilon^7$. This is achieved by deriving a better upper bound on the entries of the vectors in the $p$-norm OGD (i.e., $y_{t,j}$ as in Eq. (38)); see the supplementary material (Eqs. (38)-(40)) for details.
• In our lower bounds, although the hard cases are motivated by Li, Chakrabarti, and Wu (2019), the matrix game values are much more complicated in the $\ell_q$-$\ell_1$ case. In the supplementary material, we identify two functions $f_1$ and $f_2$ that not only separate the game values of two specifically-constructed $\ell_q$-$\ell_1$ matrix games but also have monotone and nonnegative properties, which are crucial factors in our proof.
These improvements together result in Theorem 1.
Related work.
Matrix games were probably first studied as zero-sum games by Neumann (1928). The seminal work (Nemirovski and Yudin 1983) proposed the mirror descent method and gave an algorithm for solving matrix games in time $\tilde{O}(nd/\epsilon^2)$. This was later improved to $\tilde{O}(nd/\epsilon)$ by the prox-method due to Nemirovski (2004) and the dual extrapolation method due to Nesterov (2007). To further improve the cost, there have been two main focuses:
• Sampling-based methods: They focus on achieving sublinear cost in $nd$, the size of the matrix $A$. Grigoriadis and Khachiyan (1995) and Clarkson, Hazan, and Woodruff (2012) mentioned above are seminal examples; these sublinear algorithms can also be used to solve semidefinite programs (Garber and Hazan 2011), SVMs (Hazan, Koren, and Srebro 2011), etc.
• Variance-reduced methods: They focus on the cost in $1/\epsilon$, in particular its decoupling from $nd$. Palaniappan and Bach (2016) showed how to apply the standard SVRG (Johnson and Zhang 2013) technique for solving $\ell_2$-$\ell_2$ matrix games; this idea can also be extended to smooth functions using general Bregman divergences (Shi, Zhang, and Yu 2017). Variance-reduced methods for solving matrix games culminate in Carmon et al. (2019), where they show how to solve $\ell_1$-$\ell_1$ and $\ell_2$-$\ell_1$ matrix games in time $\tilde{O}\big(\mathrm{nnz}(A) + \sqrt{\mathrm{nnz}(A)\cdot(n+d)}/\epsilon\big)$, where $\mathrm{nnz}(A)$ is the number of nonzero elements in $A$.
There have been relatively few quantum results for solving matrix games. Kapoor, Wiebe, and Svore (2016) solved the $\ell_2$-$\ell_1$ matrix game with cost $\tilde{O}(\sqrt{n}\,d/\epsilon^2)$ using an unusual input model where the representation of a data point in $\mathbb{R}^d$ is the concatenation of $d$ floating point numbers.
More recently, Apeldoorn and Gilyén (2019) were able to solve the $\ell_1$-$\ell_1$ matrix game with cost $\tilde{O}(\sqrt{n}/\epsilon^3+\sqrt{d}/\epsilon^3)$ using the standard input model above, and Li, Chakrabarti, and Wu (2019) solved the $\ell_2$-$\ell_1$ matrix game with cost $\tilde{O}(\sqrt{n}/\epsilon^4+\sqrt{d}/\epsilon^8)$ also using the standard input model.
Preliminaries and Notations
To facilitate the reading of this paper, we introduce necessarydefinitions and notations here.
Preliminaries for quantum computing.
Quantum mechanics can be formulated in terms of linear algebra. For the space $\mathbb{C}^d$, we denote $\{\vec{e}_0,\ldots,\vec{e}_{d-1}\}$ as its computational basis, where $\vec{e}_i = (0,\ldots,1,\ldots,0)^\top$ and the 1 only appears in the $(i+1)$-th coordinate. These basis vectors can be written in the Dirac notation: $\vec{e}_i := |i\rangle$ (called a "ket"), and $\vec{e}_i^\top := \langle i|$ (called a "bra"). A $d$-dimensional quantum state is a unit vector in $\mathbb{C}^d$: i.e., $|v\rangle = (v_0,\ldots,v_{d-1})^\top$ such that $\sum_{i=0}^{d-1}|v_i|^2 = 1$.
The tensor product of quantum states is their Kronecker product: if $|u\rangle \in \mathbb{C}^{d_1}$ and $|v\rangle \in \mathbb{C}^{d_2}$, then
$$|u\rangle \otimes |v\rangle := (u_0 v_0, u_0 v_1, \ldots, u_{d_1-1} v_{d_2-1})^\top, \tag{4}$$
which is a vector in $\mathbb{C}^{d_1} \otimes \mathbb{C}^{d_2}$.
Quantum access to an input matrix, also known as a quantum oracle, is reversible and allows access to coordinates of the matrix in superposition; this is the essence of quantum speedups. In particular, to access entries of a matrix $A \in \mathbb{R}^{n \times d}$, we exploit a quantum oracle $O_A$, which is a unitary transformation on $\mathbb{C}^n \otimes \mathbb{C}^d \otimes \mathbb{C}^{d_{\mathrm{acc}}}$ ($d_{\mathrm{acc}}$ being the dimension of a floating-point register) such that
$$O_A(|i\rangle \otimes |j\rangle \otimes |z\rangle) = |i\rangle \otimes |j\rangle \otimes |z \oplus A_{ij}\rangle \tag{5}$$
for any $i \in [n]$, $j \in [d]$, and $z \in \mathbb{C}^{d_{\mathrm{acc}}}$. Intuitively, $O_A$ reads the entry $A_{ij}$ and stores it in the third register as a floating-point number. However, to ensure that $O_A$ is a unitary transformation, $O_A$ applies the XOR operation ($\oplus$) on the third register. This is a natural generalization of classical reversible computation, when each entry of $A$ can be recovered in $O(1)$ time. Subsequently, a common assumption is that a single query to $O_A$ takes $O(1)$ cost.
Interpolation for large $p$.
If $p$ is large, we prove the following lemma showing that we can restrict without loss of generality to cases where $p$ such that $\frac{1}{p}+\frac{1}{q}=1$ is $O(\log d/\epsilon)$, since in this case the $\ell_q$-$\ell_1$ matrix game is $\epsilon$-close to the $\ell_1$-$\ell_1$ matrix game in the following sense:
Lemma 1. An $\ell_q$-$\ell_1$ matrix game where $p$ such that $\frac{1}{p}+\frac{1}{q}=1$ is greater than $\log d/\epsilon$ can be solved using an algorithm for solving $\ell_1$-$\ell_1$ games. This introduces an error $O(\epsilon)$ in the objective value.
Proof. Assume without loss of generality that $\epsilon \le 1/2$. Let $p \ge \log d/\epsilon \ge \log d/(-\log(1-\epsilon))$. It can be easily verified that $B_1^d \subset B_q^d \subset B_1^d + (1-d^{-1/p})B_q^d$. Since $p \ge \log d/(-\log(1-\epsilon))$ gives $d^{-1/p} \ge 1-\epsilon$, we obtain $B_q^d \subset B_1^d + \epsilon B_q^d$. Consider applying an algorithm to solve an $\ell_1$-$\ell_1$ matrix game instead of the $\ell_q$-$\ell_1$ matrix game as required in (2). Let the optimal solution to (2) be $x^* \in B_q^d$, $\mathbf{p}^* \in \Delta_n$. By the previous analysis, there is a point $x \in B_1^d$ such that $\|x - x^*\|_q \le \epsilon$. Thus the solution $x, \mathbf{p}^*$ has an error at most $O(\epsilon)$ from the true objective, and the algorithm for solving $\ell_1$-$\ell_1$ games finds a solution at least as good as this.
Notations.
Throughout the paper, we denote $p, q > 1$ to be two real numbers such that $\frac{1}{p}+\frac{1}{q}=1$; $p \in [2,+\infty)$ and $q \in (1,2]$. For any $s > 0$, we use $B_s^d$ to denote the $d$-dimensional unit ball in $\ell_s$-norm, i.e., $B_s^d := \{x : \sum_{i \in [d]} |x_i|^s \le 1\}$; we use $\Delta_n$ to denote the $n$-dimensional unit simplex $\{\mathbf{p} \in \mathbb{R}^n : \mathbf{p}_i \ge 0, \sum_i \mathbf{p}_i = 1\}$, and use $\mathbf{1}_n$ to denote the $n$-dimensional all-one vector. We denote $A \in \mathbb{R}^{n \times d}$ to be the matrix whose $i$-th row is $A_i^\top$ for all $i \in [n]$. We define $\mathrm{sgn} : \mathbb{R} \to \{-1,0,1\}$ such that $\mathrm{sgn}(x) = -1$ if $x < 0$, $\mathrm{sgn}(x) = 1$ if $x > 0$, and $\mathrm{sgn}(0) = 0$.
A Sublinear Classical Algorithm for General Matrix Games
For any $q \in (1,2]$, we consider the $\ell_q$-$\ell_1$ matrix game:
$$\sigma := \max_{x \in B_q^d} \min_{\mathbf{p} \in \Delta_n} \mathbf{p}^\top A x. \tag{6}$$
The goal is to find an $\bar{x}$ that approximates the equilibrium of the matrix game within additive error $\epsilon$:
$$\min_{i \in [n]} A_i \bar{x} \ge \sigma - \epsilon. \tag{7}$$
Throughout the paper, we assume $A_1,\ldots,A_n \in B_p^d$, i.e., all the $n$ data points are normalized to have $\ell_p$-norm at most 1.
Algorithm 1:
A sublinear algorithm for $\ell_q$-$\ell_1$ games.
Input: $\epsilon > 0$; $p \in [2,+\infty)$, $q \in (1,2]$ such that $\frac{1}{p}+\frac{1}{q}=1$; $A \in \mathbb{R}^{n \times d}$ with $A_i \in B_p^d$ $\forall i \in [n]$.
Output: $\bar{x}$ that satisfies (7).
1 Let $T = \big\lceil \frac{895\log n + 4p}{\epsilon^2} \big\rceil$, $y_1 = \mathbf{0}_d$, $\eta = \sqrt{\frac{11\log n}{12T}}$, $w_1 = \mathbf{1}_n$;
2 for $t = 1$ to $T$ do
3   $\mathbf{p}_t \leftarrow \frac{w_t}{\|w_t\|_1}$, $x_t \leftarrow \frac{y_t}{\max\{1,\|y_t\|_q\}}$;
4   Choose $i_t \in [n]$ by $i_t \leftarrow i$ with probability $\mathbf{p}_t(i)$;
5   Define $y_{t+1}$ where for any $j \in [d]$, $y_{t+1,j} \leftarrow y_{t,j} + \sqrt{\frac{q-1}{T}}\,\frac{\mathrm{sgn}(A_{i_t,j})\,|A_{i_t,j}|^{p-1}}{\|A_{i_t}\|_p^{p-2}}$;
6   Choose $j_t \in [d]$ by $j_t \leftarrow j$ with probability $\frac{x_t(j)^q}{\|x_t\|_q^q}$;
7   for $i = 1$ to $n$ do
8     $\tilde{v}_t(i) \leftarrow A_i(j_t)\,\|x_t\|_q^q / x_t(j_t)^{q-1}$;
9     $v_t(i) \leftarrow \mathrm{clip}(\tilde{v}_t(i), \frac{1}{\eta})$ where $\mathrm{clip}(v, M) := \min\{M, \max\{-M, v\}\}$ $\forall v, M \in \mathbb{R}$;
10    $w_{t+1}(i) \leftarrow w_t(i)\,(1 - \eta v_t(i) + \eta^2 v_t(i)^2)$;
11 Return $\bar{x} = \frac{1}{T}\sum_{t=1}^T x_t$.
Theorem 2.
The output of Algorithm 1 satisfies (7) with probability at least $2/3$, and its total running time is $O\big(\frac{(n+d)(p+\log n)}{\epsilon^2}\big)$, where $p \ge 2$ is such that $\frac{1}{p}+\frac{1}{q}=1$.
Our sublinear algorithm follows the primal-dual approach of Algorithm 1 of Clarkson, Hazan, and Woodruff (2012), which solves $\ell_2$-$\ell_1$ matrix games. Here for $\ell_q$-$\ell_1$ matrix games, the solution vector $x$ now lies in $B_q^d$. Hence, the most natural adaptations are to use $\ell_q$-sampling instead of $\ell_2$-sampling in the primal updates, and to use a $p$-norm OGD by Shalev-Shwartz (2012) which generalizes the online gradient descent by Zinkevich (2003) in $\ell_2$-norm. In the following, we use various technical tools to show these natural adaptations actually work.
Proposition 1 (Shalev-Shwartz 2012, Corollary 2.18). Consider a set of vectors $u_1,\ldots,u_T \in \mathbb{R}^d$ such that $\|u_i\|_p \le 1$. Set $\iota = \sqrt{\frac{q-1}{T}}$. Let $x_1 \leftarrow \mathbf{0}_d$, $\tilde{x}_{t+1,i} \leftarrow x_{t,i} + \iota\,\frac{\mathrm{sgn}(u_{t,i})\,|u_{t,i}|^{p-1}}{\|u_t\|_p^{p-2}}$ for all $i \in [d]$, and $x_{t+1} \leftarrow \frac{\tilde{x}_{t+1}}{\max\{1,\|\tilde{x}_{t+1}\|_q\}}$. Then
$$\max_{x \in B_q^d} \sum_{t=1}^T u_t^\top x - \sum_{t=1}^T u_t^\top x_t \le \sqrt{\frac{T}{q-1}}. \tag{8}$$
The analysis of Algorithm 1 uses the following lemma, adapted from the variance multiplicative weight lemma and martingale tail bounds in Clarkson, Hazan, and Woodruff (2012):
Lemma 2 (Section 2 of Clarkson, Hazan, and Woodruff 2012). In Algorithm 1, the parameters $\mathbf{p}_t$ in Line 3 and $v_t$ in Line 9 satisfy
$$\sum_{t \in [T]} \mathbf{p}_t^\top v_t \le \min_{i \in [n]} \sum_{t \in [T]} v_t(i) + \eta \sum_{t \in [T]} \mathbf{p}_t^\top v_t^2 + \frac{\log n}{\eta} \tag{9}$$
where $v_t^2$ is defined as $(v_t^2)_i := (v_t)_i^2$ for all $i \in [n]$, as long as the update rule of $w_t$ is as in Line 10 and $\mathrm{Var}[v_t(i)] \le 1$ for all $t \in [T]$ and $i \in [n]$. Furthermore, with probability at least $1 - O(1/n)$,
$$\max_{i \in [n]} \sum_{t \in [T]} \big[v_t(i) - A_i x_t\big] \le 4\eta T; \tag{10}$$
$$\Big|\sum_{t \in [T]} A_{i_t} x_t - \sum_{t \in [T]} \mathbf{p}_t^\top v_t\Big| \le 10\eta T, \tag{11}$$
and with probability at least $5/7$, $\sum_{t \in [T]} \mathbf{p}_t^\top v_t^2 \le 7T$.
We also need to prove the following inequality on different moments of random variables.
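As a reading aid for the primal and dual updates analyzed above, here is a minimal NumPy sketch of Algorithm 1. It is an illustration under stated assumptions, not the authors' implementation: the function name `sublinear_lq_l1_game` and the optional `T` override are hypothetical, the weights are renormalized every round purely for numerical stability (this leaves $\mathbf{p}_t$ unchanged), the estimate is set to zero when $x_t = 0$ (where $A_i x_t = 0$ anyway), $x_t(j_t)^{q-1}$ is read as $\mathrm{sgn}(x_t(j_t))\,|x_t(j_t)|^{q-1}$, and $n \ge 2$ is assumed so that $\eta > 0$.

```python
import numpy as np


def sublinear_lq_l1_game(A, q, eps, rng=None, T=None):
    """Illustrative sketch of Algorithm 1; rows of A are assumed to lie in the l_p unit ball."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    p = q / (q - 1.0)                                   # conjugate exponent: 1/p + 1/q = 1
    if T is None:                                       # Line 1 (T can be overridden for experiments)
        T = int(np.ceil((895 * np.log(n) + 4 * p) / eps ** 2))
    eta = np.sqrt(11 * np.log(n) / (12 * T))
    step = np.sqrt((q - 1.0) / T)
    y = np.zeros(d)
    w = np.ones(n)
    x_sum = np.zeros(d)
    for _ in range(T):
        w = w / w.sum()                                 # Line 3; renormalizing is an added safeguard
        x = y / max(1.0, np.linalg.norm(y, ord=q))
        x_sum += x
        i_t = rng.choice(n, p=w)                        # Line 4: sample a row from p_t
        row = A[i_t]
        row_norm = np.linalg.norm(row, ord=p)
        if row_norm > 0:                                # Line 5: p-norm OGD step
            y = y + step * np.sign(row) * np.abs(row) ** (p - 1) / row_norm ** (p - 2)
        x_q = np.sum(np.abs(x) ** q)                    # ||x_t||_q^q
        if x_q > 0:                                     # Lines 6-8: l_q-sampling estimate of A x_t
            j_t = rng.choice(d, p=np.abs(x) ** q / x_q)
            v_tilde = A[:, j_t] * x_q * np.sign(x[j_t]) / np.abs(x[j_t]) ** (q - 1)
        else:                                           # x_t = 0, so A_i x_t = 0 for every i
            v_tilde = np.zeros(n)
        v = np.clip(v_tilde, -1.0 / eta, 1.0 / eta)     # Line 9
        w = w * (1 - eta * v + eta ** 2 * v ** 2)       # Line 10
    return x_sum / T                                    # Line 11
```

Each iteration touches one row and one column of $A$, which is where the $O(n+d)$ per-iteration cost in Theorem 2 comes from. The returned vector is an average of points in $B_q^d$, so it lies in $B_q^d$ by convexity.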
Lemma 3.
Suppose that $X$ is a random variable on $\mathbb{R}$, and $p \ge 2$. If $\mathbb{E}[|X|^p] \le 1$, then $\mathbb{E}[X^2] \le 1$.
Proof. Denote the probability density of $X$ as $\mu$. Then $\int_{-\infty}^{+\infty} |x|^p \,\mathrm{d}\mu_x = \mathbb{E}[|X|^p] \le 1$. By Hölder's inequality, we have
$$1 \ge \Big(\int_{-\infty}^{+\infty} |x|^p \,\mathrm{d}\mu_x\Big)^{2/p} \Big(\int_{-\infty}^{+\infty} 1 \,\mathrm{d}\mu_x\Big)^{1-2/p} \ge \int_{-\infty}^{+\infty} |x|^2 \cdot 1^{1-2/p} \,\mathrm{d}\mu_x = \int_{-\infty}^{+\infty} x^2 \,\mathrm{d}\mu_x, \tag{12}$$
hence the result follows.
Now we are ready to prove our main theorem.
Proof of Theorem 2.
First, $\tilde{v}_t(i)$ is an unbiased estimator of $A_i x_t$ as
$$\mathbb{E}[\tilde{v}_t(i)] = \sum_{j_t=1}^d \frac{x_t(j_t)^q}{\|x_t\|_q^q} \cdot \frac{A_i(j_t)\,\|x_t\|_q^q}{x_t(j_t)^{q-1}} = A_i x_t. \tag{13}$$
Furthermore,
$$\mathbb{E}[|\tilde{v}_t(i)|^p] = \sum_{j_t=1}^d \frac{x_t(j_t)^q}{\|x_t\|_q^q} \cdot \frac{|A_i(j_t)|^p\,\|x_t\|_q^{pq}}{x_t(j_t)^{p(q-1)}} = \|A_i\|_p^p\,\|x_t\|_q^p \le 1, \tag{14}$$
where the second equality follows from the identities $q = p(q-1)$ and $p = q(p-1)$, and the last inequality follows from the assumption that $A_i \in B_p^d$ $\forall i \in [n]$ together with $x_t \in B_q^d$. By Lemma 3, $\mathbb{E}[\tilde{v}_t(i)^2] \le 1$. Because the clip function in Line 9 only makes the variance smaller, this means that the conditions of Lemma 2 are satisfied and we hence have (9), rewritten below:
$$\sum_{t \in [T]} \mathbf{p}_t^\top v_t \le \min_{i \in [n]} \sum_{t \in [T]} v_t(i) + \eta \sum_{t \in [T]} \mathbf{p}_t^\top v_t^2 + \frac{\log n}{\eta}. \tag{15}$$
Footnote to Lemma 2: The proof follows from the proofs of Lemmas 2.3, 2.4, 2.5, and 2.6 in Section 2 and Appendix B of Clarkson, Hazan, and Woodruff (2012), with only small modifications to fit our new parameter choices. For instance, the original statement requires that $\eta \ge \sqrt{\frac{\log n}{T}}$, but the proofs actually work for $\eta \ge \sqrt{\frac{11\log n}{12T}}$.
Furthermore, Lemma 2 implies that with probability $5/7 - O(1/n)$ we have
$$\sum_{t \in [T]} A_{i_t} x_t \le \min_{i \in [n]} \sum_{t \in [T]} v_t(i) + 17\eta T + \frac{\log n}{\eta}. \tag{16}$$
Moreover, (10) gives $\sum_{t \in [T]} \big[v_t(i) - A_i x_t\big] \le 4\eta T$, and hence $\min_{i \in [n]} \sum_{t \in [T]} v_t(i) \le 4\eta T + \min_{i \in [n]} \sum_{t \in [T]} A_i x_t$. Plugging this into (16), we have
$$\sum_{t \in [T]} A_{i_t} x_t \le \sum_{t \in [T]} \mathbf{p}_t^\top v_t + 10\eta T \le \min_{i \in [n]} \sum_{t \in [T]} A_i x_t + 21\eta T + \frac{\log n}{\eta}, \tag{17}$$
with probability $(5/7 - O(1/n)) \cdot (1 - O(1/n)) \ge 2/3$.
On the other hand, by taking $u_t = A_{i_t}$ in Proposition 1,
$$T\sigma \le \max_{x \in B_q^d} \sum_{t=1}^T A_{i_t} x \le \sum_{t=1}^T A_{i_t} x_t + \sqrt{Tp}, \tag{18}$$
since $\frac{1}{q-1} = \frac{p}{q} \le p$. Combining (17) and (18), we have
$$\min_{i \in [n]} \sum_{t \in [T]} A_i x_t \ge T\sigma - \sqrt{Tp} - 21\eta T - \frac{\log n}{\eta}. \tag{19}$$
Consequently, the return $\bar{x} = \frac{1}{T}\sum_{t=1}^T x_t$ of Algorithm 1 in Line 11 satisfies
$$\min_{i \in [n]} A_i \bar{x} \ge \sigma - \sqrt{\frac{p}{T}} - 21\eta - \frac{\log n}{\eta T}. \tag{20}$$
To prove (7), it remains to show that $\sqrt{\frac{p}{T}} + 21\eta + \frac{\log n}{\eta T} \le \epsilon$, which is equivalent to $\sqrt{p} + 21\sqrt{\frac{11\log n}{12}} + \sqrt{\frac{12\log n}{11}} \le \sqrt{T}\epsilon$ by the definition of $\eta$. This is true because the AM-GM inequality implies that the LHS is at most
$$\sqrt{2(\sqrt{p})^2 + 2\Big(21\sqrt{\tfrac{11\log n}{12}} + \sqrt{\tfrac{12\log n}{11}}\Big)^2} \le \sqrt{2p + 895\log n} \le \sqrt{T}\epsilon.$$
Lemma 1 combined with Theorem 2 yields the classical result in Theorem 1.
A Sublinear Quantum Algorithm for General Matrix Games
In this section, we give a quantum algorithm for solving the general $\ell_q$-$\ell_1$ matrix games. It closely follows our classical algorithm because they both use a primal-dual approach, where the primal part is composed of $p$-norm online gradient descent and the dual part is composed of multiplicative weight updates. However, we adopt quantum techniques to achieve a speedup on both.
The intuition behind the quantum algorithm and the quantum speedup is that we measure quantum states to obtain random samples. These quantum states can be efficiently prepared (with cost $\tilde{O}(\sqrt{n})$ and $\tilde{O}(\sqrt{d})$). Mathematically, a quantum state can be represented by an $\ell_2$-normalized complex vector $\psi$ in the sense that measuring this quantum state yields outcome $i$ with probability $|\psi_i|^2$ (thus for every probability distribution there is a quantum state corresponding to it). Let us denote the quantum state for sampling from $w$ by $|w\rangle$ and the quantum state for sampling from $x$ by $|x\rangle$ (different from the notation in Algorithm 3). If we can maintain $|w\rangle$ and $|x\rangle$ in each iteration, then there is no need for classical updates, and preparing $|w\rangle$ and $|x\rangle$ becomes the bottleneck of the quantum algorithm.
The source of our quantum speedup is an important subroutine, Algorithm 2, which is designed to prepare states for $\ell_q$-sampling. It uses standard Grover-based techniques to prepare states, but we carefully keep track of the normalizing factor to facilitate $\ell_q$-sampling. We show (in Proposition 2 in the supplementary material) that preparing $|w\rangle$ costs $\tilde{O}(\sqrt{n})$ and preparing $|x\rangle$ costs $\tilde{O}(\sqrt{d})$. In the following, we give the high-level ideas of Algorithm 2.
1. We first create a quantum state corresponding to the uniform distribution, which is easy using Hadamard gates.
2. For each entry, we create a state with the desired amplitude associated with 0, and an undesired amplitude associated with 1 (the unitarity of quantum operations necessitates the existence of this undesired term).
3. Finally we use a technique called amplitude amplification to amplify the portion of the state corresponding to 0 for each entry, to get a state with only the desired amplitudes.
Algorithm 2: Prepare an $\ell_q$-pure state given an oracle to its coefficients.
1 Apply the minimum finding algorithm (Dürr and Høyer 1996) to find $a_{\|q\|} := \max_{i \in [n]} |a_i|^{q/2}$ in $O(\sqrt{n})$ time;
2 Prepare the uniform superposition $\frac{1}{\sqrt{n}}\sum_{i \in [n]} |i\rangle$;
3 Perform the following unitary transformations:
$$\frac{1}{\sqrt{n}}\sum_{i \in [n]} |i\rangle \xrightarrow{O_a} \frac{1}{\sqrt{n}}\sum_{i \in [n]} |i\rangle|a_i\rangle \mapsto \frac{1}{\sqrt{n}}\sum_{i \in [n]} |i\rangle|a_i\rangle\Big(\frac{a_i^{q/2}}{a_{\|q\|}}|0\rangle + \sqrt{1 - \frac{|a_i|^q}{a_{\|q\|}^2}}\,|1\rangle\Big) \xrightarrow{O_a^{-1}} \frac{1}{\sqrt{n}}\sum_{i \in [n]} |i\rangle|0\rangle\Big(\frac{a_i^{q/2}}{a_{\|q\|}}|0\rangle + \sqrt{1 - \frac{|a_i|^q}{a_{\|q\|}^2}}\,|1\rangle\Big);$$
4 Discard the second register above and rewrite the state as
$$\frac{\|a\|_q^{q/2}}{\sqrt{n}\,a_{\|q\|}} \cdot \Big(\frac{1}{\|a\|_q^{q/2}}\sum_{i \in [n]} a_i^{q/2}|i\rangle\Big)|0\rangle + |a^\perp\rangle|1\rangle, \tag{21}$$
where $|a^\perp\rangle := \frac{1}{\sqrt{n}}\sum_{i \in [n]} \sqrt{1 - \frac{|a_i|^q}{a_{\|q\|}^2}}\,|i\rangle$;
5 Apply amplitude amplification (Brassard et al. 2002) for the state in (21) conditioned on the second register being 0. Return the output.
The details of our quantum algorithm for solving the general $\ell_q$-$\ell_1$ matrix games, Algorithm 3, are rather technical. To simplify the presentation, we postpone its pseudocode (Algorithm 3) to the supplementary material and highlight how it is different from Algorithm 1 in the following.
• For the primal part, we prepare a quantum state $|y_t\rangle$ for the $p$-norm OGD and measure it (in Line 7) to obtain a sample $j_t \in [d]$. The subtlety here is that we need to perform $\ell_q$-sampling on the vector $y_t$; this is different from the $\ell_2$-sampling in Li, Chakrabarti, and Wu (2019), which uses the fact that pure quantum states are $\ell_2$-normalized. To this end, we design Algorithm 2 for $\ell_q$-quantum state sampling, which may be of independent interest; this algorithm is built upon a clever use of quantum amplitude amplification, the technique behind the Grover search (Grover 1996). Note that sampling according to $y_t$ is equivalent to sampling according to $x_t$ in Algorithm 1, because $x_t(j)^q/\|x_t\|_q^q = y_t(j)^q/\|y_t\|_q^q$. Moreover, it suffices to replace $\|x_t\|_q^q/x_t(j_t)^{q-1}$ with $\|y_t\|_q^q/\big(y_t(j_t)^{q-1}\max\{1,\|y_t\|_q\}\big)$ in Line 8 of Algorithm 3. Similar to preparing $|\mathbf{p}_t\rangle$, we use $\tilde{O}(\sqrt{d})$ queries to $O_A$ to prepare $y_t$, while classically we need to compute all the entries of $y_t$, which takes $O(d)$ queries.
• For the dual part, we prepare the multiplicative weight vector as a quantum state $|\mathbf{p}_t\rangle$ and measure it (in Line 3) to obtain a sample $i_t \in [n]$. This adaptation enables us to achieve the $\tilde{O}(\sqrt{n})$ dependence by using quantum amplitude amplification in the quantum state preparation: in Line 8, we implement the oracle $O_t$, and in Line 9 we use $\tilde{O}(\sqrt{n})$ queries to $O_t$ to prepare the state $|\mathbf{p}_{t+1}\rangle$ for the next iteration. In contrast, classically we need to compute all the entries of $w_{t+1}$ to obtain the probability distribution $\mathbf{p}_{t+1}$ for the next iteration, which takes $O(n)$ queries.
In general, Algorithm 3 can be viewed as a template for achieving quantum speedups for online mirror descent methods: in this work, we focus on the general matrix games where the primal and dual are in the special relationship of $\ell_p$ and $\ell_q$ norms, but in principle it may be applicable to other dualities in online learning.
We summarize the main quantum result as the following theorem, which states the correctness and time complexity of Algorithm 3. The relevant technical proofs are deferred to the supplementary material.
Theorem 3.
Algorithm 3 returns a succinct classical representation of a vector $\bar{x} \in \mathbb{R}^d$ such that
$$A_i \bar{x} \ge \max_{x \in B_q^d} \min_{i' \in [n]} A_{i'} x - \epsilon \quad \forall i \in [n], \tag{22}$$
with probability at least $2/3$, and its total running time is $\tilde{O}\big(\frac{p^2\sqrt{n}}{\epsilon^4}+\frac{p^{3.5}\sqrt{d}}{\epsilon^7}\big)$. We can also assume $p = O(\log d/\epsilon)$ (Lemma 1), which results in running time $\tilde{O}\big(\frac{\sqrt{n}}{\epsilon^6}+\frac{\sqrt{d}}{\epsilon^{10.5}}\big)$.
Moreover, Algorithm 3 enjoys the following features:
• Simple quantum input: Algorithm 3 uses the standard quantum input model and does not need any sophisticated quantum data structures, such as the quantum random access memory (QRAM) used in some other quantum machine learning applications, to achieve speedups.
• Hybrid classical-quantum feature: Algorithm 3 is also highly classical-quantum hybrid: the quantum part is isolated into pieces of state preparation connected by classical processing. In addition, it only has $O\big(\frac{\log n + p}{\epsilon^2}\big)$ iterations, which implies that the corresponding quantum circuit is shallow and can potentially be implemented even on near-term quantum machines (Preskill 2018).
• Classical output: The output of Theorem 3 is completely classical. Compared to quantum algorithms whose output is a quantum state and may incur overheads (Aaronson 2015), Algorithm 3 guarantees minimal overheads and can be directly used for classical applications.
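To illustrate the $\ell_q$-sampling states that Algorithm 2 prepares, the following NumPy sketch computes the relevant amplitudes classically. It is only a numerical illustration under stated assumptions, not a quantum implementation: the function name `lq_sampling_state` is hypothetical, $a_i^{q/2}$ is read as $\mathrm{sgn}(a_i)\,|a_i|^{q/2}$, the input is assumed to be nonzero, and the returned round count $\lceil \pi/(4\arcsin(\cdot))\rceil$ is a rough textbook estimate for amplitude amplification rather than a quantity taken from the paper.

```python
import numpy as np


def lq_sampling_state(a, q):
    """Classically compute the amplitudes appearing in Eq. (21) for a nonzero vector a."""
    a = np.asarray(a, dtype=float)
    n = a.size
    powers = np.abs(a) ** (q / 2.0)
    a_max = powers.max()                         # a_{||q||} := max_i |a_i|^{q/2}
    flagged = np.sign(a) * powers / a_max        # amplitude on |i>|0>, up to the 1/sqrt(n) factor
    norm = np.linalg.norm(flagged)               # equals ||a||_q^{q/2} / a_{||q||}
    success_amp = norm / np.sqrt(n)              # weight of the flag-0 branch in Eq. (21)
    target = flagged / norm                      # state left after amplitude amplification
    # Rough estimate of the number of amplification rounds (an assumption of this sketch).
    rounds = int(np.ceil(np.pi / (4.0 * np.arcsin(min(1.0, success_amp)))))
    return target, success_amp, rounds
```

Squaring the entries of `target` gives the measurement distribution $|a_i|^q/\|a\|_q^q$, i.e. $\ell_q$-sampling. Since the largest entry of `flagged` has modulus 1, `success_amp` is at least $1/\sqrt{n}$, which is consistent with the $O(\sqrt{n})$ cost quoted for Algorithm 2.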
Applications
We give two applications that generically follow from our classical and quantum $\ell_q$-$\ell_1$ matrix game solvers.

Approximate Carathéodory problem

The exact Carathéodory theorem is a fundamental result in linear algebra and convex geometry: every point $u \in \mathbb{R}^d$ in the convex hull of a vertex set $S \subset \mathbb{R}^d$ can be expressed as a convex combination of $d+1$ vertices in $S$. Recently, a breakthrough result by Barman (2015) shows that if $S \subset \mathbb{B}_p^d$, i.e., $S$ is in the $\ell_p$-norm unit ball, then there exists a point $u'$ such that $\|u - u'\|_p \le \epsilon$ and $u'$ is a convex combination of $O(p/\epsilon^2)$ vertices in $S$. The follow-up work by Mirrokni et al. (2017) proved a matching lower bound $\Omega(p/\epsilon^2)$, and Combettes and Pokutta (2019) give better bounds under stronger assumptions on $S$ or $u$.

Currently, the best-known time complexity of solving the approximate Carathéodory problem is $O(ndp/\epsilon^2)$ by Theorem 3.5 of Mirrokni et al. (2017). We give classical and quantum sublinear algorithms:

Corollary 1.
Suppose that $S \subset \mathbb{B}_p^d$, $|S| = n$, and $u$ is in the convex hull of $S$. Then we can find a convex combination $\sum_{i=1}^k x_i v_i$ such that $v_i \in S$ for all $i \in [k]$, $k = O((p + \log n)/\epsilon^2)$, and $\big\|\sum_{i=1}^k x_i v_i - u\big\|_p \le \epsilon$, using a classical algorithm with running time $O\big(\frac{(n+d)(p+\log n)}{\epsilon^2}\big)$ or a quantum algorithm with running time $\tilde{O}\big(\frac{p^2\sqrt{n}}{\epsilon^4} + \frac{p^{3.5}\sqrt{d}}{\epsilon^7}\big)$. We can also assume $p = O(\log d/\epsilon)$ (Lemma 1), which results in running time $O\big(\frac{(n+d)(\log d/\epsilon+\log n)}{\epsilon^2}\big)$ and $\tilde{O}\big(\frac{\sqrt{n}}{\epsilon^6} + \frac{\sqrt{d}}{\epsilon^{10.5}}\big)$, respectively.

Footnote (on the succinct classical representation returned by Algorithm 3): The algorithm stores $T = \tilde{O}(p/\epsilon^2)$ real numbers classically: $i_1, \ldots, i_T$ obtained from Line 3 and $\widetilde{\|y_1\|_q}, \ldots, \widetilde{\|y_T\|_q}$ obtained from Line 6. After that, each coordinate of $\bar{x}$ can be computed in time $\tilde{O}(p/\epsilon^2)$.

Proof. We denote the matrix $V := (v_1; v_2; \cdots; v_n)$ where $v_i$ is the $i$th element in $S$. Note that the approximate Carathéodory problem can be formulated as $\min_{p \in \Delta_n} \|V^\top p - u\|_p$ (here the variable $p \in \Delta_n$ is a probability vector, while the subscript $p$ is the norm index). In addition, by Hölder's inequality, $\|y\|_p = \max_{x : \|x\|_q \le 1} y^\top x$; therefore, we obtain the following minimax matrix game:
$$\min_{p \in \Delta_n} \max_{x \in \mathbb{B}_q^d} (p^\top V - u^\top) x. \tag{23}$$
We denote $U = (u; u; \cdots; u) \in \mathbb{R}^{n \times d}$, i.e., all the $n$ rows of $U$ are $u$. Since $p^\top U = u^\top$ for every $p \in \Delta_n$, we have $(p^\top V - u^\top) x = 2 p^\top \frac{V - U}{2} x$. Furthermore, since $u, v_i \in \mathbb{B}_p^d$ for all $i \in [n]$, each row of $\frac{V - U}{2}$ is also in $\mathbb{B}_p^d$. Finally, by Sion's theorem (Sion 1958) we can switch the order of the min and max in (23). In all, to solve the approximate Carathéodory problem with precision $\epsilon$, it suffices to solve the maximin game
$$\max_{x \in \mathbb{B}_q^d} \min_{p \in \Delta_n} p^\top \frac{V - U}{2} x \tag{24}$$
with precision $\epsilon/2$.
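The rescaling step of this reduction can be sanity-checked numerically. The following is a minimal sketch in plain Python, with hypothetical helper names and randomly generated data: it verifies that $(p^\top V - u^\top)x = 2p^\top \frac{V-U}{2}x$ for a probability vector (named `prob` in the code to avoid the clash with the norm index `p`) and that every row of $\frac{V-U}{2}$ stays in the $\ell_p$ unit ball.

```python
import random

def lp_norm(v, p):
    """The l_p norm of a list of floats."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def random_point_in_ball(d, p, rng):
    """A random point of l_p norm at most 1 (direction and radius drawn separately)."""
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    radius = rng.random()
    norm = lp_norm(v, p)
    return [radius * x / norm for x in v]

def random_simplex_point(n, rng):
    """A random probability vector of length n."""
    w = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def check_reduction(n=6, d=4, p=3.0, seed=0):
    rng = random.Random(seed)
    V = [random_point_in_ball(d, p, rng) for _ in range(n)]
    mix = random_simplex_point(n, rng)
    # u is a convex combination of the rows of V, hence in their convex hull.
    u = [sum(mix[i] * V[i][j] for i in range(n)) for j in range(d)]
    # M = (V - U) / 2, where every row of U equals u.
    M = [[(V[i][j] - u[j]) / 2.0 for j in range(d)] for i in range(n)]
    prob = random_simplex_point(n, rng)
    x = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    lhs = sum((sum(prob[i] * V[i][j] for i in range(n)) - u[j]) * x[j]
              for j in range(d))
    rhs = 2.0 * sum(prob[i] * M[i][j] * x[j]
                    for i in range(n) for j in range(d))
    max_row_norm = max(lp_norm(row, p) for row in M)
    return lhs, rhs, max_row_norm
```

Calling `check_reduction()` returns the two sides of the identity and the largest row norm of the rescaled matrix; the two sides should agree up to floating-point error and the row norm should not exceed 1.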
The game (24) is exactly of the form (6), so the result follows from Theorem 2 and Theorem 3.

Compared to Mirrokni et al. (2017), we pay a $\log n$ overhead in the cardinality of the convex combination, but in time complexity the dominating term $nd$ is significantly improved to $n + d$. We also give the first sublinear quantum algorithm. Note that, as Mirrokni et al. (2017) pointed out, the approximate Carathéodory problem has wide applications in machine learning and optimization, including support vector machines (SVMs), rounding in polytopes, submodular function minimization, etc. We elaborate on the details of SVMs below, and leave out the details of other applications as the reductions are direct.

$\ell_q$-margin support vector machine (SVM)

When we solve the $\ell_q$-$\ell_1$ matrix game in Algorithm 1, we apply $\ell_q$-sampling, where $j_t = j$ with probability $x(j)^q / \|x\|_q^q$ for any $j \in [d]$. The key reason for the success of Algorithm 1 is that the expectation of the random variable $A_i(j_t) \|x_t\|_q^q / x_t(j_t)^{q-1}$ in Line 8 is $A_i x_t$, i.e., the estimator is unbiased. If we consider alternative random variables, we can potentially solve a maximin game in $\ell_q$-$\ell_1$ norm with respect to some nonlinear functions of the matrix. A specific problem of significant interest is the $\ell_q$-margin support vector machine (SVM), where we are given $n$ data points $X_1, \ldots, X_n$ in $\mathbb{R}^d$ and a label vector $y \in \{1, -1\}^n$. The goal is to find a separating hyperplane $w \in \mathbb{R}^d$ of these data points with the largest margin under the $\ell_q$-norm loss, i.e.,
$$\sigma_{\mathrm{SVM}} := \max_{w \in \mathbb{R}^d} \min_{i \in [n]} 2 y_i \cdot X_i^\top w - \|w\|_q^q. \tag{25}$$
Without loss of generality, we assume $y_i = 1$ for all $i \in [n]$; otherwise we take $X_i \leftarrow y_i \cdot X_i$.
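The unbiasedness of the $\ell_q$-sampling estimator can be checked by computing its expectation exactly. The sketch below is illustrative only: the function names are hypothetical, and the use of $\mathrm{sgn}(x_j)|x_j|^{q-1}$ in the denominator is an assumption made here so that vectors with negative entries are also handled.

```python
import math

def lq_sampling_distribution(x, q):
    """Probability of picking index j: |x_j|^q / ||x||_q^q."""
    weights = [abs(v) ** q for v in x]
    total = sum(weights)
    return [w / total for w in weights]

def estimator(a_row, x, q, j):
    """Single-sample estimate a_j * ||x||_q^q / (sgn(x_j) |x_j|^(q-1))."""
    norm_qq = sum(abs(v) ** q for v in x)
    return a_row[j] * norm_qq / math.copysign(abs(x[j]) ** (q - 1), x[j])

def expected_estimate(a_row, x, q):
    """Exact expectation of the estimator under l_q-sampling."""
    probs = lq_sampling_distribution(x, q)
    # Indices with x_j = 0 are never sampled, so they are skipped.
    return sum(probs[j] * estimator(a_row, x, q, j)
               for j in range(len(x)) if probs[j] > 0.0)
```

For any row `a_row` and vector `x`, `expected_estimate(a_row, x, q)` should equal the inner product of `a_row` and `x` up to floating-point error, because the factors $|x_j|^q$ and $\|x\|_q^q$ cancel term by term.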
In this case, the random variable $2 X_i(j) \|w\|_q^q / w(j)^{q-1} - \|w\|_q^q$ is unbiased under $\ell_q$-sampling on $j$:
$$\mathbb{E}\Big[\frac{2 X_i(j) \|w\|_q^q}{w(j)^{q-1}} - \|w\|_q^q\Big] = 2 X_i^\top w - \|w\|_q^q. \tag{26}$$
Note that $\sigma_{\mathrm{SVM}} \ge 0$ since $2 X_i^\top w - \|w\|_q^q = 0$ for all $i \in [n]$ when $w = 0$. For the case $\sigma_{\mathrm{SVM}} > 0$ and taking $0 < \epsilon < \sigma_{\mathrm{SVM}}$, similar to Theorem 2 and Theorem 3 we have:
Corollary 2.
To return a vector $\bar{w} \in \mathbb{R}^d$ such that with probability at least $2/3$,
$$\min_{i \in [n]} 2 X_i \bar{w} - \|\bar{w}\|_q^q \ge \sigma_{\mathrm{SVM}} - \epsilon > 0, \tag{27}$$
there is a classical algorithm that achieves this in $O\big(\frac{(n+d)(p+\log n)}{\epsilon^2}\big)$ time and a quantum algorithm that achieves this in $\tilde{O}\big(\frac{p^2\sqrt{n}}{\epsilon^4} + \frac{p^{3.5}\sqrt{d}}{\epsilon^7}\big)$ time. We can also assume $p = O(\log d/\epsilon)$ (Lemma 1), which results in running time $O\big(\frac{(n+d)(\log d/\epsilon+\log n)}{\epsilon^2}\big)$ and $\tilde{O}\big(\frac{\sqrt{n}}{\epsilon^6} + \frac{\sqrt{d}}{\epsilon^{10.5}}\big)$, respectively.

Notice that classical sublinear algorithms for $\ell_2$-SVMs have been given (Clarkson, Hazan, and Woodruff 2012; Hazan, Koren, and Srebro 2011), and there is also a sublinear quantum algorithm for $\ell_2$-SVMs in Li, Chakrabarti, and Wu (2019). We essentially generalize their results to the $\ell_q$-norm cases based on our new general matrix game solvers in Theorem 2 and Theorem 3.

Classical and Quantum Lower Bounds
For both our classical and quantum algorithms for general matrix games, we can prove matching classical and quantum lower bounds in $n$ and $d$ for constant $\epsilon$:

Theorem 4.
Assume < (cid:15) < . . Then to return an ¯ x ∈ B dq satisfying A j ¯ x ≥ max x ∈ B dq min i ∈ [ n ] A i x − (cid:15) ∀ j ∈ [ n ] , (28) with probability at least / , we need Ω( n + d ) classicalqueries or Ω( √ n + √ d ) quantum queries. Due to the space limitation, we postpone the proof detailsof Theorem 4 to the supplementary material.
Conclusions
We give sublinear algorithms for solving general $\ell_q$-$\ell_1$ matrix games for any $q \in (1, 2]$. Our classical and quantum algorithms run in time $O\big(\frac{(n+d)(p+\log n)}{\epsilon^2}\big)$ and $\tilde{O}\big(\frac{p^2\sqrt{n}}{\epsilon^4} + \frac{p^{3.5}\sqrt{d}}{\epsilon^7}\big)$, respectively; both bounds are tight up to poly-logarithmic factors in $n$ and $d$. Our results can be applied to solve the approximate Carathéodory problem and the $\ell_q$-margin SVMs.

Our paper raises a couple of natural open questions for future work. For instance:
• Can we give sublinear algorithms for $\ell_p$-$\ell_1$ matrix games where $p > 2$? Technically, this will probably require a $q$th moment multiplicative weight lemma to replace Lemma 2.
• Can we give quantum algorithms that achieve speedups of variance-reduced methods for solving matrix games, such as the state-of-the-art result in Carmon et al. (2019)?

Ethics Statement

This work is purely theoretical. Researchers working on learning theory and quantum computing may benefit from our results. In the long term, once fault-tolerant quantum computers have been built, our results may find practical applications in matrix game scenarios arising in the real world. As far as we are aware, our work does not have immediate negative ethical impact.
Acknowledgements
TL thanks Adrian Vladu for many helpful discussions, as well as Yair Carmon for the discussions about his paper (Carmon et al. 2019). TL was supported by an IBM PhD Fellowship, a QISE-NET Triplet Award (NSF grant DMR-1747426), the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Quantum Algorithms Teams program, ARO contract W911NF-17-1-0433, and NSF grant PHY-1818914. CW was supported by Scott Aaronson's Vannevar Bush Faculty Fellowship. SC and XW were partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Quantum Algorithms Teams program, and were also partially supported by the U.S. National Science Foundation grants CCF-1755800, CCF-1816695, and CCF-1942837 (CAREER).
References
Aaronson, S. 2015. Read the fine print.
Nature Physics arXiv:1904.03180
Arute, F.; et al. 2019. Quantum supremacy using a programmable superconducting processor. Nature. arXiv:1910.11333
Barman, S. 2015. Approximating Nash equilibria and dense bipartite subgraphs via an approximate version of Carathéodory theorem. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing, 361–369. arXiv:1406.2296
Bennett, C. H.; Bernstein, E.; Brassard, G.; and Vazirani, U. 1997. Strengths and weaknesses of quantum computing. SIAM Journal on Computing. arXiv:quant-ph/9701001
Brassard, G.; Høyer, P.; Mosca, M.; and Tapp, A. 2002. Quantum amplitude amplification and estimation. Contemporary Mathematics. arXiv:quant-ph/0005055
Carmon, Y.; Jin, Y.; Sidford, A.; and Tian, K. 2019. Variance reduction for matrix games. In Advances in Neural Information Processing Systems, 11377–11388. arXiv:1907.02056
Clarkson, K. L.; Hazan, E.; and Woodruff, D. P. 2012. Sublinear optimization for machine learning. Journal of the ACM (JACM). arXiv:1010.4408
Combettes, C. W.; and Pokutta, S. 2019. Revisiting the Approximate Carathéodory Problem via the Frank-Wolfe Algorithm. arXiv:1911.04415
Dantzig, G. B. 1998. Linear programming and extensions, volume 48. Princeton University Press.
Deng, N.; Tian, Y.; and Zhang, C. 2012. Support vector machines: optimization based theory, algorithms, and extensions. CRC Press.
Dürr, C.; and Høyer, P. 1996. A quantum algorithm for finding the minimum. arXiv:quant-ph/9607014
Garber, D.; and Hazan, E. 2011. Approximating semidefinite programs in sublinear time. In Advances in Neural Information Processing Systems, 1080–1088.
Grigoriadis, M. D.; and Khachiyan, L. G. 1995. A sublinear-time randomized approximation algorithm for matrix games. Operations Research Letters.
Proceedings of the Twenty-eighth Annual ACM Symposium on Theory of Computing, 212–219. ACM. arXiv:quant-ph/9605043
Hazan, E.; Koren, T.; and Srebro, N. 2011. Beating SGD: Learning SVMs in sublinear time. In Advances in Neural Information Processing Systems, 1233–1241.
Johnson, R.; and Zhang, T. 2013. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems, 315–323.
Kapoor, A.; Wiebe, N.; and Svore, K. 2016. Quantum perceptron models. In Proceedings of the 30th Conference on Neural Information Processing Systems, 3999–4007. arXiv:1602.04799
Li, T.; Chakrabarti, S.; and Wu, X. 2019. Sublinear quantum algorithms for training linear and kernel-based classifiers. In Proceedings of the 36th International Conference on Machine Learning, 3815–3824. arXiv:1904.02276
Minsky, M.; and Papert, S. A. 1988. Perceptrons: An introduction to computational geometry. MIT Press.
Mirrokni, V.; Leme, R. P.; Vladu, A.; and Wong, S. C.-w. 2017. Tight bounds for approximate Carathéodory and beyond. In Proceedings of the 34th International Conference on Machine Learning, 2440–2448. arXiv:1512.08602
Nemirovski, A. 2004. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization.
Mathematical Programming
Mathematische Annalen
Proceedings of the Symposium on the Mathematical Theory of Automata, volume 12, 615–622.
Palaniappan, B.; and Bach, F. 2016. Stochastic variance reduction methods for saddle-point problems. In Advances in Neural Information Processing Systems, 1416–1424. arXiv:1605.06398
Preskill, J. 2018. Quantum Computing in the NISQ era and beyond. Quantum 2: 79. ISSN 2521-327X. arXiv:1801.00862
Shalev-Shwartz, S. 2012. Online learning and online convex optimization. Foundations and Trends® in Machine Learning.
Advances in Neural Information Processing Systems, 6031–6041.
Sion, M. 1958. On general minimax theorems. Pacific Journal of Mathematics.
Proceedings of the 20th International Conference on Machine Learning, 928–936.

Sublinear Quantum Algorithm for General Matrix Games: Proof Details
We first give the details of our quantum algorithm.
Algorithm 3:
A sublinear quantum algorithm for (cid:96) q - (cid:96) matrix games. Input: (cid:15) > ; p ∈ [2 , + ∞ ) , q ∈ (1 , such that p + q = 1 ; A ∈ R n × d with A i ∈ B dp ∀ i ∈ [ n ] . Output: ¯ x that satisfies (7). Let T = (cid:100) n +4 p(cid:15) (cid:101) , y = d , η = (cid:113)
11 log n T , w = n , | p (cid:105) = √ n (cid:80) i ∈ [ n ] | i (cid:105) ; for t = 1 to T do Measure the state | p t (cid:105) in the computational basis and denote the output as i t ∈ [ n ] ; For each i ∈ [ t ] , estimate (cid:107) A i t (cid:107) pp by Lemma 4 with precision δ = η . Output := (cid:94) (cid:107) A i t (cid:107) pp ; Define y t +1 by y t +1 ,j ← y t + (cid:113) q − T sgn( A it,j ) | A it,j | p − (cid:94) (cid:107) A it (cid:107) p − p for all j ∈ [ d ] ; Apply Lemma 4 (cid:100) log T (cid:101) times to estimate (cid:107) y t (cid:107) qq with precision δ = η , and take the median of the (cid:100) log T (cid:101) outputs,denoted by (cid:103) (cid:107) y t (cid:107) qq ; Choose j t ∈ [ d ] by j t = j with probability y t ( j ) q / (cid:107) y t (cid:107) qq , which is achieved by applying Algorithm 2 to prepare thequantum state | y t (cid:105) and measure in the computational basis; For all i ∈ [ n ] , denote ˜ v t ( i ) = A i ( j t ) (cid:103) (cid:107) y t (cid:107) qq / (cid:0) y t ( j t ) q − max { , (cid:103) (cid:107) y t (cid:107) q } (cid:1) , v t ( i ) = clip(˜ v t ( i ) , /η ) , and u t +1 ( i ) = u t ( i )(1 − ηv t ( i ) + η v t ( i ) ) . Prepare an oracle O t such that O t | i (cid:105)| (cid:105) = | i (cid:105)| u t +1 ( i ) (cid:105) for all i ∈ [ n ] , using t queries to O A and ˜ O ( t ) additional arithmetic computations; Prepare | p t +1 (cid:105) = (cid:107) u t +1 (cid:107) (cid:80) i ∈ [ n ] u t +1 ( i ) | i (cid:105) using Algorithm 2 (with q = 2 therein) and O t ; Return ¯ x = T (cid:80) Tt =1 y t max { , (cid:103) (cid:107) y t (cid:107) q } .We need the following lemma to estimate the norm of a vector: Lemma 4 (Li, Chakrabarti, and Wu 2019, Lemma 6) . Given a function F : [ d ] → [0 , with a quantum oracle O F : | i (cid:105) | (cid:105) (cid:55)→| i (cid:105) | F ( i ) (cid:105) for all i ∈ [ d ] , let m = d (cid:80) di =1 F ( i ) . 
Then for any δ < , there is a quantum algorithm that uses O ( √ d/δ ) queries to O F and returns an ˜ m such that | m − ˜ m | ≤ δm with probability at least / . We use the procedure below for preparing a quantum state given an oracle to a power of its coefficients:
Proposition 2.
Assume that a ∈ C n , and we are given a unitary oracle O a such that O | i (cid:105)| (cid:105) = | i (cid:105)| a i (cid:105) for all i ∈ [ n ] . ThenAlgorithm 2 takes O ( √ n ) calls to O a for preparing the quantum state (cid:107) a (cid:107) q/ q (cid:80) i ∈ [ n ] a q/ i | i (cid:105) with success probability − O (1 /n ) .Proof. Note that Algorithm 2 of Li, Chakrabarti, and Wu (2019) had given a quantum algorithm for preparing an (cid:96) -norm purestate given an oracle to its coefficients, and Algorithm 2 essentially generalize this result to the (cid:96) q -norm case by replacing all a i by a q/ i as in Algorithm 2. Note that the coefficient in (21) satisfies (cid:107) a (cid:107) q/ q √ na (cid:107) q (cid:107) ≥ √ n . As a result, applying amplitude amplificationfor O ( √ n ) times indeed promises that we obtain 0 in the second system with success probability − O (1 /n ) , i.e., the state (cid:107) a (cid:107) q/ q (cid:80) i ∈ [ n ] a q/ i | i (cid:105) is prepared.We need the following lemma. Lemma 5.
For all $i \in [n]$, define
$$\tilde{v}_{t,\mathrm{approx}}(i) := \frac{A_i(j_t)\,\widetilde{\|y_t\|_q^q}}{y_t(j_t)^{q-1} \max\{1, \widetilde{\|y_t\|_q}\}}, \qquad \tilde{v}_{t,\mathrm{true}}(i) := \frac{A_i(j_t)\,\|y_t\|_q^q}{y_t(j_t)^{q-1} \max\{1, \|y_t\|_q\}}, \tag{29}$$
where $\widetilde{\|y_t\|_q^q}$ and $\|y_t\|_q^q$ satisfy
$$\Big|\widetilde{\|y_t\|_q^q} - \|y_t\|_q^q\Big| \le \delta \|y_t\|_q^q \tag{30}$$
with probability at least $1 - o(1)$. Also assume that $\tilde{v}_{t,\mathrm{approx}}(i), \tilde{v}_{t,\mathrm{true}}(i) \le 1/\eta$. Then it holds that
$$|\tilde{v}_{t,\mathrm{approx}}(i) - \tilde{v}_{t,\mathrm{true}}(i)| \le \delta/\eta \quad \forall i \in [n], \tag{31}$$
with probability at least $1 - o(1)$.

Footnote (on the definition of $y_{t+1}$ in Algorithm 3): Here we do not write down the whole vector $y_{t+1}$, but we construct any query to its entries in $O(1)$ time.

Proof. First note that
$$|\tilde{v}_{t,\mathrm{approx}}(i) - \tilde{v}_{t,\mathrm{true}}(i)| = \tilde{v}_{t,\mathrm{true}}(i) \Big|\frac{\tilde{v}_{t,\mathrm{approx}}(i)}{\tilde{v}_{t,\mathrm{true}}(i)} - 1\Big| \le \frac{1}{\eta} \Big|\frac{\tilde{v}_{t,\mathrm{approx}}(i)}{\tilde{v}_{t,\mathrm{true}}(i)} - 1\Big|. \tag{32}$$
When $\|y_t\|_q \ge 1$, we have $\frac{\tilde{v}_{t,\mathrm{approx}}(i)}{\tilde{v}_{t,\mathrm{true}}(i)} = \frac{\widetilde{\|y_t\|_q}^{\,q-1}}{\|y_t\|_q^{q-1}}$, and when $\|y_t\|_q \le 1$, we have $\frac{\tilde{v}_{t,\mathrm{approx}}(i)}{\tilde{v}_{t,\mathrm{true}}(i)} = \frac{\widetilde{\|y_t\|_q^q}}{\|y_t\|_q^q}$. By assumption, with probability at least $1 - o(1)$, it holds that $\Big|\frac{\widetilde{\|y_t\|_q^q}}{\|y_t\|_q^q} - 1\Big| \le \delta$. Since $1 \le \frac{\widetilde{\|y_t\|_q}^{\,q-1}}{\|y_t\|_q^{q-1}} \le \frac{\widetilde{\|y_t\|_q^q}}{\|y_t\|_q^q}$ when $\widetilde{\|y_t\|_q} \ge \|y_t\|_q$, and $1 \ge \frac{\widetilde{\|y_t\|_q}^{\,q-1}}{\|y_t\|_q^{q-1}} \ge \frac{\widetilde{\|y_t\|_q^q}}{\|y_t\|_q^q}$ when $\widetilde{\|y_t\|_q} < \|y_t\|_q$, it also holds that $\Big|\frac{\widetilde{\|y_t\|_q}^{\,q-1}}{\|y_t\|_q^{q-1}} - 1\Big| \le \delta$. Putting this into (32), we have the desired inequality.

Now, we are ready to prove the main quantum result.

Proof of Theorem 3.
First note that in Line 4, we use an estimation (cid:94) (cid:107) A i t (cid:107) pp of (cid:107) A i t (cid:107) pp with relative error at most δ . Then inLine 5, (cid:94) (cid:107) A i t (cid:107) p − p is an estimation of (cid:107) A i t (cid:107) p − p with relative error at most δ because p ≥ and (cid:94) (cid:107) A i t (cid:107) p − p = ( (cid:94) (cid:107) A i t (cid:107) pp ) ( p − /p .Hence, y t +1 has a relative error of at most δ compared to its true value defined by y t + (cid:114) q − T sgn( A i t ,j ) | A i t ,j | p − (cid:107) A i t (cid:107) p − p . (33)Consider Line 6. The estimate (cid:103) (cid:107) y t (cid:107) qq is the median of (cid:100) log T (cid:101) executions of Lemma 4. It implies that, with failure probabilityis at most − (2 / T = 1 − T , (30) holds. Since there are T iterations in total, the probability that (30) holds is at least − T · O (1 /T ) = 1 − o (1) . Also consider (29). It is easy to see that ˜ v t, approx ( i ) , ˜ v t, true ( i ) ≤ /η because of Line 8. Therefore,the conditions of Lemma 5 hold and its result follows.As δ = η , by Lemma 5 and Lemma 2, we have that with probability at least / − O (1 /n ) , (cid:88) t ∈ [ T ] A i t x t ≤ (cid:88) t ∈ [ T ] p (cid:62) t v t + 11 ηT ≤ min i ∈ [ n ] (cid:88) t ∈ [ T ] v t ( i ) + 21 ηT + log nη . (34)Moreover, by Lemma 5 and Eq. (10), we have min i ∈ [ n ] (cid:80) t ∈ [ T ] v t ( i ) ≤ ηT + ηT + min i ∈ [ n ] (cid:80) t ∈ [ T ] A i x t . Plugging this into(34), we have (cid:88) t ∈ [ T ] A i t x t ≤ (cid:88) t ∈ [ T ] p (cid:62) t v t + 11 ηT ≤ min i ∈ [ n ] (cid:88) t ∈ [ T ] A i x t + 26 ηT + log nη (35)with probability (5 / − O (1 /n )) · (1 − O (1 /n )) ≥ / .Similar to the proof of Theorem 2, we have min i ∈ [ n ] A i ¯ x ≥ σ − (cid:114) pT − η − log nηT . 
(36)By the choices of p and η in Algorithm 3, the desired error bound for (22) holds because (cid:32)(cid:114) pT + 26 η + log nηT (cid:33) ≤ (cid:18) pT (cid:19) + 2 (cid:18) η + log nηT (cid:19) ≤ p + 1346 log nT ≤ (cid:15) , (37)where the first inequality follows from the AM-GM inequality and the last inequality follows from the choice of T in Algorithm 3.Now, we analyze the time complexity. In Line 4 of Algorithm 3, the number of queries to O A for Lemma 4 is O ( √ d/δ ) =˜ O ( p √ d/(cid:15) ) . In Line 5, we have y t,j = (cid:114) q − T t (cid:88) τ =1 sgn( A i τ ,j ) | A i τ ,j | p − (cid:94) (cid:107) A i τ (cid:107) pp − . (38)n oracle for y t can be implemented with ˜ O ( p/(cid:15) ) queries to O A . To estimate (cid:107) y t (cid:107) q , we first need to normalize y t . The summandin (38) is in the range [ − , ; to see this, note that | A i τ ,j | p − (cid:107) A i τ (cid:107) p − p ≤ | A i τ ,j | p − ( | A i τ ,j | p ) ( p − /p = | A i τ ,j | ≤ . (39)Therefore, y t,j = ˜ O ( √ pq/(cid:15) ) = ˜ O ( √ p/(cid:15) ) . Since the precision is δ = η = ˜Θ( (cid:15) /p ) , the cost for amplitude estimation is ˜ O ( p √ d/(cid:15) ) . Finally, there are T = ˜ O ( p/(cid:15) ) iterations in total. The total complexity in Line 5 is ˜ O (cid:16) p(cid:15) (cid:17) · ˜ O (cid:18) √ p(cid:15) (cid:19) · ˜ O (cid:32) p √ d(cid:15) (cid:33) · ˜ O (cid:16) p(cid:15) (cid:17) = ˜ O (cid:32) p . √ d(cid:15) (cid:33) . (40)For Line 6, we need to prepare the state | y t (cid:105) . To simulate a query to an coefficient of y t , we need ˜ O ( p/(cid:15) ) queries to O A . Thequery complexity for Algorithm 2 is O ( √ d ) , and there are T = ˜ O ( p/(cid:15) ) iterations in total. 
The total complexity in Line 6 is ˜ O (cid:16) p(cid:15) (cid:17) · O ( √ d ) · ˜ O (cid:16) p(cid:15) (cid:17) = ˜ O (cid:32) p √ d(cid:15) (cid:33) , (41)which is dominated by (40).For Line 8, to implement one query to O t , we need t queries to O A with ˜ O ( t ) additional arithmetic computations. For Line 9,to prepare the state | p t +1 (cid:105) , we need O ( √ n ) queries to O t , which can be implemented by O ( √ nt ) queries to O A by Line 8 and ˜ O ( √ nt ) additional arithmetic computations. Therefore, the total complexity for Line 9 is T (cid:88) t =1 ˜ O ( √ nt ) = ˜ O ( √ nT ) = ˜ O (cid:18) p √ n(cid:15) (cid:19) . (42)The time complexity of this algorithm is established by (40) and (42).Finally, ¯ x has a succinct classical representation: using i , . . . , i τ obtained from Line 3 and (cid:103) (cid:107) y (cid:107) q , . . . , (cid:93) (cid:107) y T (cid:107) q obtained fromLine 6, a coordinate of ¯ x can be restored in time T = ˜ O ( p/(cid:15) ) . Classical and Quantum Lower Bounds
Recall that the input of the general matrix game is a matrix A ∈ R n × d such that A i ∈ B dp for all i ∈ [ n ] ( A i being the i th row of A ), and the goal is to approximately solve σ := max x ∈ B dq min p ∈ ∆ n p (cid:62) Ax, (43)where p ∈ [2 , + ∞ ) , q ∈ (1 , , and p + q = 1 . Classically, we are given an oracle that inputs i ∈ [ n ] , j ∈ [ d ] and outputs A ij ;our sublinear classical algorithm in Theorem 2 solves the general matrix game (43) in O ( ( n + d )( p +log n ) (cid:15) ) time. Quantumly, weare given the quantum oracle O A such that O A | i (cid:105)| j (cid:105)| (cid:105) = | i (cid:105)| j (cid:105)| A ij (cid:105) ∀ i ∈ [ n ] , j ∈ [ d ] ; our quantum algorithm in Theorem 3solves the general matrix game (43) in ˜ O ( p √ n(cid:15) + p . √ d(cid:15) ) time. We prove matching classical and quantum lower bounds in n and d for constant (cid:15) and p : Theorem 5.
Assume < (cid:15) < . . Then to return an ¯ x ∈ B dq satisfying A j ¯ x ≥ max x ∈ B dq min i ∈ [ n ] A i x − (cid:15) ∀ j ∈ [ n ] (44) with probability at least / , we need Ω( n + d ) classical queries or Ω( √ n + √ d ) quantum queries. The proof of Theorem 5 is inspired by Li, Chakrabarti, and Wu (2019), but for the (cid:96) q - (cid:96) matrix game the construction isdifferent and the analysis is more intricate as seen below. Proof.
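Assume we are given the promise that $A$ is from one of the two cases below:
1. There exists an $l \in \{2, \ldots, d\}$ such that $A_{11} = -\frac{1}{2^{1/p}}$, $A_{1l} = \frac{1}{2^{1/p}}$; $A_{21} = A_{2l} = \frac{1}{2^{1/p}}$; there exists a unique $k \in \{3, \ldots, n\}$ such that $A_{k1} = 1$, $A_{kl} = 0$; $A_{ij} = \frac{1}{2^{1/p}}$ for all $i \in \{3, \ldots, n\} \setminus \{k\}$, $j \in \{1, l\}$; and $A_{ij} = 0$ for all $i \in [n]$, $j \notin \{1, l\}$.
2. There exists an $l \in \{2, \ldots, d\}$ such that $A_{11} = -\frac{1}{2^{1/p}}$, $A_{1l} = \frac{1}{2^{1/p}}$; $A_{21} = A_{2l} = \frac{1}{2^{1/p}}$; $A_{ij} = \frac{1}{2^{1/p}}$ for all $i \in \{3, \ldots, n\}$, $j \in \{1, l\}$; and $A_{ij} = 0$ for all $i \in [n]$, $j \notin \{1, l\}$.

Notice that the only difference between these two cases is a row where the first entry is 1 and the $l$th entry is 0. They have the following pictures, respectively, where the two nonzero columns are columns $1$ and $l$ and the row $(1, 0, \ldots, 0)$ in Case 1 is row $k$.

Case 1:
$$A = \begin{pmatrix}
-\frac{1}{2^{1/p}} & 0 & \cdots & 0 & \frac{1}{2^{1/p}} & 0 & \cdots & 0 \\
\frac{1}{2^{1/p}} & 0 & \cdots & 0 & \frac{1}{2^{1/p}} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{1}{2^{1/p}} & 0 & \cdots & 0 & \frac{1}{2^{1/p}} & 0 & \cdots & 0 \\
1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\frac{1}{2^{1/p}} & 0 & \cdots & 0 & \frac{1}{2^{1/p}} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{1}{2^{1/p}} & 0 & \cdots & 0 & \frac{1}{2^{1/p}} & 0 & \cdots & 0
\end{pmatrix}; \tag{45}$$
Case 2:
$$A = \begin{pmatrix}
-\frac{1}{2^{1/p}} & 0 & \cdots & 0 & \frac{1}{2^{1/p}} & 0 & \cdots & 0 \\
\frac{1}{2^{1/p}} & 0 & \cdots & 0 & \frac{1}{2^{1/p}} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{1}{2^{1/p}} & 0 & \cdots & 0 & \frac{1}{2^{1/p}} & 0 & \cdots & 0
\end{pmatrix}. \tag{46}$$
We denote the maximin value in (43) of these cases as $\sigma_1$ and $\sigma_2$, respectively. We have:

• $\sigma_2 = \frac{1}{2^{1/p}}$.
On the one hand, consider $\bar{x} = \vec{e}_l \in \mathbb{B}_q^d$ (the vector in $\mathbb{R}^d$ with the $l$th coordinate being 1 and all other coordinates being 0). Then $A_i \bar{x} = \frac{1}{2^{1/p}}$ for all $i \in [n]$, and hence $\sigma_2 \ge \min_{i \in [n]} A_i \bar{x} = \frac{1}{2^{1/p}}$. On the other hand, for any $x = (x_1, \ldots, x_d) \in \mathbb{B}_q^d$, we have
$$\min_{i \in [n]} A_i x = \min\Big\{-\frac{1}{2^{1/p}} x_1 + \frac{1}{2^{1/p}} x_l,\ \frac{1}{2^{1/p}} x_1 + \frac{1}{2^{1/p}} x_l\Big\} \le \frac{1}{2^{1/p}} x_l \le \frac{1}{2^{1/p}}, \tag{47}$$
where the first inequality comes from the fact that $\min\{a, b\} \le \frac{a+b}{2}$ for all $a, b \in \mathbb{R}$ and the second inequality comes from the fact that $x \in \mathbb{B}_q^d$ and $|x_l| \le 1$. As a result, $\sigma_2 = \max_{x \in \mathbb{B}_q^d} \min_{i \in [n]} A_i x \le \frac{1}{2^{1/p}}$. In conclusion, we have $\sigma_2 = \frac{1}{2^{1/p}}$.

• $\sigma_1 = \frac{1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}}$.
On the one hand, consider $\bar{x} = \frac{1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}} \vec{e}_1 + \frac{2^{1-1/q} + 1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}} \vec{e}_l \in \mathbb{R}^d$. It can be seen that $\bar{x} \in \mathbb{B}_q^d$; moreover, since $\frac{1}{p} + \frac{1}{q} = 1$,
$$A_1 \bar{x} = -\frac{1}{2^{1/p}} \cdot \frac{1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}} + \frac{1}{2^{1/p}} \cdot \frac{2^{1-1/q} + 1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}} = \frac{1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}};$$
$$A_i \bar{x} = \frac{1}{2^{1/p}} \cdot \frac{1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}} + \frac{1}{2^{1/p}} \cdot \frac{2^{1-1/q} + 1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}} > \frac{1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}} \quad \forall i \in [n] \setminus \{1, k\};$$
$$A_k \bar{x} = 1 \cdot \frac{1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}} + 0 \cdot \frac{2^{1-1/q} + 1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}} = \frac{1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}}.$$
In all, $\sigma_1 \ge \min_{i \in [n]} A_i \bar{x} = \frac{1}{(1 + (2^{1-1/q} + 1)^q)^{1/q}}$.
On the other hand, for any $x = (x_1, \ldots, x_d) \in \mathbb{B}_q^d$, we have
$$\min_{i \in [n]} A_i x = \min\Big\{-\frac{1}{2^{1/p}} x_1 + \frac{1}{2^{1/p}} x_l,\ \frac{1}{2^{1/p}} x_1 + \frac{1}{2^{1/p}} x_l,\ x_1\Big\}.$$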
(48)If x ≤ − /q +1) q ) /q , then (48) implies that min i ∈ [ n ] A i x ≤ − /q +1) q ) /q ; if x ≥ − /q +1) q ) /q , then x l ≤ (1 − x q ) /q = 2 − /q + 1(1 + (2 − /q + 1) q ) /q , (49)nd hence by (48) we have min i ∈ [ n ] A i x ≤ − /p x + 12 /p x l ≤ − /p · − /q + 1) q ) /q + 12 /p · − /q + 1(1 + (2 − /q + 1) q ) /q = 1(1 + (2 − /q + 1) q ) /q . (50)In all, we always have min i ∈ [ n ] A i x ≤ − /q +1) q ) /q . As a result, σ = max x ∈ B dq min i ∈ [ n ] A i x ≤ − /q +1) q ) /q . Inconclusion, we have σ = − /q +1) q ) /q .Now, we prove that an ¯ x ∈ B dq satisfying (44) would simultaneously reveal whether A is from Case 1 or Case 2 aswell as the value of l ∈ { , . . . , d } , by the following algorithm:1. Check if one of ¯ x , . . . , ¯ x d is at least − . · /p ; if there exists an l (cid:48) ∈ { , . . . , d } such that ¯ x l (cid:48) ≥ − . · /p , return‘Case 2’ and l = l (cid:48) ;2. Otherwise, return ‘Case 1’ and l = arg max i ∈{ ,...,d } ¯ x i .We first prove that the classification of A between Case 1 and Case 2 is correct. On the one hand, assume that A comes fromCase 1. If we wrongly classified A as from Case 2, we would have ¯ x l (cid:48) ≥ − . · /p and ¯ x ≤ (1 − (1 − . · /p ) q ) /q .We denote f ( q ) := 1 − (1 − . · − /q ) q − (cid:16) − /q + 1) q ) /q − . (cid:17) q . (51)It can shown that f is a decreasing function on [1 , ; furthermore, f (2) > . See Figure 1. As a result, f ( q ) > , which Figure 1: The plot of f , where the x -axis represents q and the y -axis represents f ( q ) . implies min i ∈ [ n ] A i ¯ x = min (cid:110) − /p ¯ x + 12 /p ¯ x l , /p ¯ x + 12 /p ¯ x l , ¯ x (cid:111) ≤ ¯ x < σ − (cid:15). (52)However, this contradicts with (44). Therefore, for this case we must make the correct classification that A comes from Case 1.On the other hand, assume that A comes from Case 2. If we wrongly classified A as from Case 1, we would have ¯ x l ≤ max i ∈{ ,...,d } ¯ x i < − . 
· /p ; this would imply min i ∈ [ n ] A i ¯ x = min (cid:110) − /p ¯ x + 12 /p ¯ x l , /p ¯ x + 12 /p ¯ x l (cid:111) ≤ /p ¯ x l < σ − (cid:15), (53)which contradicts with (44). Therefore, for this case we must make the correct classification that A comes from Case 2. In all,our classification is always correct.t remains to prove that the value of l is correct. If A is from Case 1, we have σ − (cid:15) ≤ min i ∈ [ n ] A i ¯ x = min (cid:110) − /p ¯ x + 12 /p ¯ x l , /p ¯ x + 12 /p ¯ x l , ¯ x (cid:111) ; (54)as a result, ¯ x ≥ σ − (cid:15) and − /p ¯ x + /p ¯ x l > σ − (cid:15) , which imply ¯ x l > (2 /p + 1)( σ − (cid:15) ) > (2 − /q + 1) (cid:16) − /q + 1) q ) /q − . (cid:17) . (55)We denote f ( q ) := 2(2 − /q + 1) q (cid:16) − /q + 1) q ) /q − . (cid:17) q . (56)It can shown that f is an increasing function on [1 , ; furthermore, f (1) > . See Figure 2. As a result, f ( q ) > , which Figure 2: The plot of f , where the x -axis represents q and the y -axis represents f ( q ) . implies | ¯ x l | q > / . Therefore, ¯ x l must be the largest among ¯ x , . . . , ¯ x d (otherwise l (cid:48) = arg max i ∈{ ,...,d } ¯ x i and l (cid:54) = l (cid:48) wouldimply (cid:107) ¯ x (cid:107) qq = (cid:80) i ∈ [ d ] | ¯ x i | qq ≥ | ¯ x l | q + | ¯ x l (cid:48) | q ≥ | ¯ x l | q > , contradiction). Therefore, Line 2 of the algorithm correctly returnsthe value of l .If A is from Case 2, we have σ − (cid:15) ≤ min i ∈ [ n ] A i ¯ x = min (cid:110) − /p ¯ x + 12 /p ¯ x l , /p ¯ x + 12 /p ¯ x l (cid:111) ≤ /p ¯ x l , (57)and hence ¯ x l ≥ /p ( σ − (cid:15) ) ≥ /p ( /p − . > − . · /p . Since p + q = 1 , we have − . · /p = 1 − . · − /q > − /q , (58)where the last inequality comes from the fact that − /q ≤ − / = √ < . for any q ∈ [1 , . Therefore, − . · /p ) q > , and only one coordinate of ¯ x could be at least − . · /p and we must have l = l (cid:48) . 
Therefore, Line 1 of the algorithmcorrectly returns the value of l .In all, we have proved that an (cid:15) -approximate solution ¯ x ∈ B dq for (44) would simultaneously reveal whether A is from Case 1or Case 2 as well as the value of l ∈ { , . . . , d } . As a result:• Classically: On the one hand, notice that distinguishing these two cases requires n − classical queries to the entries of A forsearching the position of k ; therefore, it gives an Ω( n ) classical query lower bound for returning an ¯ x that satisfies (44). Onthe other hand, finding the value of l is also a search problem on the entries of A , which requires d − √ d ) queries.These observations complete the proof of the classical lower bound in Theorem 5. Quantumly: On the one hand, notice that distinguishing these two cases requires Ω( √ n −
2) = Ω( √ n ) quantum queries to O A for searching the position of k because of the quantum lower bound for search (Bennett et al. 1997); therefore, it gives an Ω( √ n ) quantum lower bound on queries to O A for returning an ¯ x that satisfies (44). On the other hand, finding the value of l is also a search problem on the entries of A , which requires Ω( √ d −