Semidefinite relaxation of multi-marginal optimal transport for strictly correlated electrons in second quantization
SSEMIDEFINITE RELAXATION OF MULTI-MARGINALOPTIMAL TRANSPORT FOR STRICTLY CORRELATEDELECTRONS IN SECOND QUANTIZATION
YUEHAW KHOO ∗ , LIN LIN † , MICHAEL LINDSEY ‡ , AND
LEXING YING § Abstract.
We consider the strictly correlated electron (SCE) limit of the fermionic quantummany-body problem in the second-quantized formalism. This limit gives rise to a multi-marginaloptimal transport (MMOT) problem. Here the marginal state space for our MMOT problem isthe binary set { , } , and the number of marginals is the number L of sites in the model. Thecosts of storing and computing the exact solution of the MMOT problem both scale exponentiallywith respect to L . We propose an efficient convex relaxation which can be solved by semidefiniteprogramming (SDP). In particular, the semidefinite constraint is only of size 2 L × L . Moreover,the SDP-based method yields an approximation of the dual potential needed to the perform self-consistent field iteration in the so-called Kohn-Sham SCE framework, which, once converged, yields alower bound for the total energy of the system. We demonstrate the effectiveness of our methods onspinless and spinful Hubbard-type models. Numerical results indicate that our relaxation methodsyield tight lower bounds for the optimal cost, in the sense that the error due to the semidefiniterelaxation is much smaller than the intrinsic modeling error of the Kohn-Sham SCE method. Wealso describe how our relaxation methods generalize to arbitrary MMOT problems with pairwise costfunctions.
1. Introduction.
For ground-state electronic structure calculations, the successof the widely-used Kohn-Sham density functional theory (DFT) [13, 15] hinges on theaccuracy of the approximate exchange-correlation functionals. Although tremendousprogress has been made in the construction of approximate functionals [33, 2, 17, 32],these approximations are mostly derived by fitting known results for the uniformelectron gas, single atoms, small molecules, and perfect crystal systems. Such func-tionals often perform well when the underlying quantum systems are ‘weakly corre-lated,’ i.e., when the single-particle energy is significantly more important than theelectron-electron interaction energy. In order to extend the capability of DFT to thetreatment of strongly correlated quantum systems, one recent direction of functionaldevelopment considers the limit in which the electron-electron interaction energy isinfinitely large compared to other components of the total energy. The resulting limitis known as strictly correlated electron (SCE) [38, 37, 5, 22, 10] limit. The SCE limitprovides an alternative route to derive exchange-correlation energy functionals. Thestudy of Kohn-Sham DFT with SCE-based functionals is still in its infancy, but suchapproaches have already been used to treat strongly correlated model systems andsimple chemical systems (see e.g. [29, 7, 12]).A system of N interacting electrons in a d -dimensional space can be describedusing either the first-quantized or the second-quantized representation. In the first-quantized representation, the number of electrons N is fixed, and the electronic wave-function is an anti-symmetric function in (cid:86) N L ( R d ; C ), which is a subset of thetensor product space (cid:78) N L ( R d ; C ). Here C corresponds to the spin degree of free- ∗ Department of Mathematics, Stanford University, Stanford, CA 94305, USA. Email: [email protected] † Department of Mathematics, University of California, Berkeley, and Computational Re-search Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA. Email: [email protected] ‡ Department of Mathematics, University of California, Berkeley, CA 94720, USA. Email: [email protected] § Department of Mathematics and Institute for Computational and Mathematical Engineering,Stanford University, Stanford, CA 94305, USA. Email: [email protected] a r X i v : . [ m a t h . O C ] M a y om. In first quantization, the anti-symmetry condition needs to be treated explicitly.By contrast, in the second-quantized formalism, one chooses a basis for a subspaceof L ( R d ; C ). In practice, the basis is of some finite size L , corresponding to a dis-cretized model with L sites that encode both spatial and spin degrees of freedom. Theelectronic wavefunction is an element of the Fock space F ∼ = C L . The Fock space con-tains wavefunctions of all possible electron numbers, and finding wavefunctions of thedesired electron number is achieved by constraining to a subspace of the Fock space.In the second-quantized representation, the anti-symmetry constraint is in some sensebaked into the Hamiltonian operator instead of the wavefunction, and this perspectiveoften simplifies book-keeping efforts. Due to the inherent computational difficulty ofstudying strongly correlated systems such as high-temperature superconductors, it isoften necessary to introduce simplified Hamiltonians such as in Hubbard-type models.These model problems are formulated directly in the second-quantized formalism viaspecification of an appropriate Hamiltonian.To the extent of our knowledge, all existing works on SCE treat electrons in thefirst-quantized representation with (essentially) a real space basis. In this paper weaim at studying the SCE limit in the second-quantized setting. Note that gener-ally Kohn-Sham-type theories in the second-quantized representation are known as‘site occupation functional theory’ (SOFT) or ‘lattice density functional theory’ inthe physics literature [36, 20, 6, 39, 8]. A crucial assumption of this paper is thatthe electron-electron interaction takes the form (cid:80) Lp,q =1 v pq ˆ n p ˆ n q , which we call thegeneralized Coulomb interaction. (The meaning of the symbols will be explained inSection 2.) We remark that the form of the generalized Coulomb interaction is morerestrictive than the general form (cid:80) Lp,q,r,s =1 v pqrs ˆ a † p ˆ a † q ˆ a s ˆ a r appearing in the quantumchemistry literature, to which our formulation does not yet apply. Assuming a gen-eralized Coulomb interaction, we demonstrate that the corresponding SCE problemcan be formulated as a multi-marginal optimal transport (MMOT) problem over clas-sical probability measures on the binary hypercube { , } L . The cost function in thisproblem is of pairwise form. Hence the objective function in the Kantorovich formula-tion of the MMOT can be written in terms of only the 2-marginals of the probabilitymeasure. In order to solve the MMOT problem directly, even the storage cost of theexact solution scales as 2 L , and the computational cost also scales exponentially withrespect to L . Thus a direct approach becomes impractical even when the number ofsites becomes moderately large. Contribution:
Based on the recent work of Khoo and Ying [14], we propose a convex relaxationapproach by imposing certain necessary constraints satisfied by the 2-marginals. Therelaxed problem can be solved efficiently via semidefinite programming (SDP). Whilethe 2-marginal formulation provides a lower bound to the optimal cost of the MMOTproblem, we also propose a tighter lower bound obtained via an SDP involving the 3-marginals. The computational cost for solving these relaxed problems is polynomialwith respect to L , and, in particular, the semidefinite constraint is only enforcedon a matrix of size 2 L × L . Numerical results for spinless and spinful Hubbard-type systems demonstrate that the 2-marginal and 3-marginal relaxation schemesare already quite tight, especially when compared to the modeling error due to theKohn-Sham SCE formulation itself.By solving the dual problems for our SDPs, we can obtain the Kantorovich dualpotentials, which yield the SCE potential needed for carrying out the self-consistentfield iteration (SCF) in the Kohn-Sham SCE formalism. To this end we need to show hat the dual problem satisfies strong duality and moreover that the dual optimizer isactually attained. We show that a straightforward formulation of the primal SDP doesnot have any strictly feasible point, and hence Slater’s condition cannot be directlyapplied to establish strong duality (see, e.g., [4]). By a careful study of the structureof the dual problem, we prove that the strong duality and dual attainment conditionsare indeed satisfied. We also explain how the SDP relaxations introduced in thispaper can be applied to arbitrary MMOT problems with pairwise cost functions. Wecomment that the justification of the strong duality and dual attainment conditionsholds in this more general setting as well. Related work:
In the first-quantized formulation, for a fixed real-space discretization the com-putational cost of the direct solution of the SCE problem scales exponentially withrespect to the number of electrons N . This curse of dimensionality is a serious obstaclefor SCE-based approaches to the quantum many-body problem. Notable exceptionsto the unfavorable computational scaling are the cases of strictly one-dimensionalsystems (i.e., d = 1) and spherically symmetric systems (for any d ) [37], for whichsemi-analytic solutions exist.In [3], the Sinkhorn scaling approach is applied to an entropically regularizedMMOT problem. This method requires the marginalization of a probability measureon a product space of size that is exponential in the number of electrons N . Thus thecomplexity of this method also scales exponentially with respect to N . Meanwhile, amethod based on the Kantorovich dual of the MMOT problem was proposed in [5, 28].However, there are exponentially many constraints in the dual problem. Furthermore,[5] assumes a Monge solution to the MMOT problem, but it is unknown whether theMMOT problem with pairwise Coulomb cost has a Monge solution for d = 2 , Organization:
In Section 2, we describe the Hamiltonians under consideration andderive an appropriate formulation of Kohn-Sham DFT based on the SCE functional,which is in turn defined in terms of a MMOT problem. In Section 3, we solve theMMOT problem by introducing a convex relaxation of the set of representable 2-marginals, and we prove strong duality for the relaxed problem. In Section 4, a tighter ower bound is obtained by considering a convex relaxation of the set of representable3-marginals. In Section 5, we comment on how a general MMOT problem withpairwise cost can be solved by directly applying the methods introduced in Sections 3and 4. We demonstrate the effectiveness of the proposed methods through numericalexperiments in Section 6, and we discuss conclusions and future directions in Section 7.
2. Preliminaries.2.1. Density functional theory in second quantization.
Our goal is tocompute the ground-state energy of a fermionic system with L states. With someabuse of terminology, we will refer to fermions simply as electrons. Also for simplicitywe use a single index for all of the states, as opposed to using separate site and spinindices in the case of spinful systems. Double indexing for spinful fermionic systemscan be recovered simply by rearranging indices, e.g., by associating odd state indiceswith spin-up components and even state indices with spin-down components.In the second-quantized formulation, the state space is called the Fock space,denoted by F . The occupation number basis set for the Fock space is {| s , . . . , s L (cid:105)} s i ∈{ , } ,i =1 ,...,L , which is an orthonormal basis set satisfying (cid:104) s i , . . . , s i L | s j , . . . , s j L (cid:105) = δ i j · · · δ i L j L . (2.1)A state | ψ (cid:105) ∈ F will be written as a linear combination of occupation number basiselements as follows: | ψ (cid:105) = (cid:88) s ,...,s L ∈{ , } ψ ( s , . . . , s L ) | s , . . . , s L (cid:105) , ψ ( s , . . . , s L ) ∈ C . (2.2)Hence the state vector | ψ (cid:105) can be identified with a vector ψ ∈ C L , and F is isomorphicto C L . We call | ψ (cid:105) normalized if the following condition is satisfied: (cid:104) ψ | ψ (cid:105) = (cid:88) s ,...,s L ∈{ , } | ψ ( s , . . . , s L ) | = 1 . (2.3)We also refer to | (cid:105) = | , . . . , (cid:105) as the vacuum state.The fermionic creation and annihilation operators are respectively defined asˆ a † p | s , . . . , s L (cid:105) = ( − (cid:80) p − q =1 s q (1 − s p ) | s , . . . , − s p , . . . , s L (cid:105) , ˆ a p | s , . . . , s L (cid:105) = ( − (cid:80) p − q =1 s q s p | s , . . . , − s p , . . . , s L (cid:105) , p = 1 , . . . , L. (2.4)The number operator defined as ˆ n p := ˆ a † p ˆ a p satisfiesˆ n p | s , . . . , s L (cid:105) = s p | s , . . . , s L (cid:105) , p = 1 , . . . , L. (2.5)The Hamiltonian operator is assumed to take the following form:ˆ H = L (cid:88) p,q =1 t pq ˆ a † p ˆ a q + L (cid:88) p =1 w p ˆ n p + L (cid:88) p,q =1 v pq ˆ n p ˆ n q . (2.6)Here t ∈ C L × L is a Hermitian matrix, which is often interpreted as the ‘hopping’ termarising from the kinetic energy contribution to the Hamiltonian. w is an on-site term, hich can be viewed as an external potential. v ∈ C L × L is also a Hermitian matrix,which may be viewed as representing the electron-electron Coulomb interaction. Notethat ˆ n p = ˆ a † p ˆ a p = ˆ n p ˆ n p , hence without loss of generality we can assume the diagonalentries t pp = v pp = 0 by absorbing, if necessary, such contributions into the on-sitepotential w . Following the spirit of Kohn-Sham DFT, one could think of t, v as fixedmatrices, and of the external potential w as a contribution that may change dependingon the system (in the context of DFT, w represents the electron-nuclei interaction andis therefore ‘external’ to the electrons). We remark that the restriction of the formof the two-body interaction (cid:80) Lp,q =1 v pq ˆ n p ˆ n q is crucial for the purpose of this paper.In particular, we do not consider the more general form (cid:80) Lp,q,r,s =1 v pqrs ˆ a † p ˆ a † q ˆ a s ˆ a r asis done in the quantum chemistry literature when a general basis set (such as theGaussian basis set) is used to discretize a quantum many-body Hamiltonian in thecontinuous space. In the discussion below, for simplicity we will omit the index rangeof our sums as long as the meaning is clear.The exact ground state energy E can be obtained by the following minimizationproblem: E = inf | ψ (cid:105)∈F : (cid:104) ψ | ψ (cid:105) =1 (cid:104) ψ | ˆ H − µ ˆ N | ψ (cid:105) . (2.7)Here the minimizer | ψ (cid:105) is the many-body ground state wavefunction, and ˆ N := (cid:80) p ˆ n p is the total number operator. µ , which is called the chemical potential, is a Lagrangemultiplier chosen so that the ground state wavefunction | ψ (cid:105) has a number of electronsequal to a pre-specified integer N ∈ { , , . . . , L } , i.e., such that (cid:104) ψ | ˆ N | ψ (cid:105) = N. (2.8)It is clear that µ ˆ N is an on-site potential, and without loss of generality we absorb µ into w , and hence write ˆ H − µ ˆ N as ˆ H in the discussion below.The electron density ρ ∈ R L is defined as ρ p = (cid:104) ψ | ˆ n p | ψ (cid:105) = (cid:88) s ,...,s L | ψ ( s , . . . , s L ) | s p , p = 1 , . . . , L, (2.9)which satisfies (cid:80) p ρ p = N . Note that (cid:104) ψ | (cid:88) p w p ˆ n p | ψ (cid:105) = (cid:88) p w p ρ p =: W [ ρ ] . (2.10)Then we follow the Levy-Lieb constrained minimization approach [18, 19] and rewritethe ground state minimization problem (2.7) as follows: E = inf ρ ∈J N (cid:40)(cid:88) p ρ p w p + (cid:32) inf | ψ (cid:105)(cid:55)→ ρ, | ψ (cid:105)∈F (cid:104) ψ | (cid:88) pq t pq ˆ a † p ˆ a q + (cid:88) pq v pq ˆ n p ˆ n q | ψ (cid:105) (cid:33)(cid:41) = inf ρ ∈J N { W [ ρ ] + F LL [ ρ ] } , (2.11)where F LL [ ρ ] := inf | ψ (cid:105)(cid:55)→ ρ, | ψ (cid:105)∈F (cid:104) ψ | (cid:88) pq t pq ˆ a † p ˆ a q + (cid:88) pq v pq ˆ n p ˆ n q | ψ (cid:105) . (2.12) ere the notation ψ (cid:55)→ ρ indicates that the corresponding infimum is taken over states | ψ (cid:105) that yield the density ρ in the sense of Eq. (2.9), and the domain J N of ρ is definedby J N := (cid:40) ρ ∈ R L (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ρ ≥ , (cid:88) p ρ p = N (cid:41) . (2.13)Note that the external potential w is only coupled with ρ and is singled out in theconstrained minimization. It is easy to see that for any ρ ∈ J N , the set {| ψ (cid:105) ∈ F : | ψ (cid:105) (cid:55)→ ρ } is non-empty, as we may simply choose | ψ (cid:105) = (cid:88) p √ ρ p | s ( p )1 , . . . , s ( p ) L (cid:105) , s ( p ) q = δ pq . Therefore the constrained minimization problem (2.11) is in fact defined over a nonemptyset for all ρ ∈ J N .The functional F LL [ ρ ], which is called the Levy-Lieb functional, is a universalfunctional in the sense that it depends only on the hopping term t and the interactionterm v , hence in particular is independent of the potential w . Once the functional F LL [ ρ ] is known, E can be obtained by minimization with respect to a single vector ρ using standard optimization algorithms, or via the self-consistent field (SCF) iterationto be detailed below. The construction above is called the ‘site occupation functionaltheory’ (SOFT) or ‘lattice density functional theory’ in the physics literature [36, 20,6, 39, 8]. To our knowledge, SOFT or lattice DFT often imposes an additional sparsitypattern on the v matrix for the electron-electron interaction, so that the Hamiltonianbecomes a Hubbard-type model. Using the fact that the infimum of asum is greater than the sum of infimums, we can lower-bound the ground state energyin the following way: F LL [ ρ ] ≥ inf | ψ (cid:105)(cid:55)→ ρ (cid:104) ψ | (cid:88) pq t pq ˆ a † p ˆ a q | ψ (cid:105) + inf | ψ (cid:105)(cid:55)→ ρ (cid:104) ψ | (cid:88) pq v pq ˆ n p ˆ n q | ψ (cid:105) =: T [ ρ ]+ E sce [ ρ ] , (2.14)where the functionals T [ ρ ] and E sce [ ρ ] are defined via the last equality in the mannersuggested by the notation. The first of these quantities is called the kinetic energy, andthe second the strictly correlated electron (SCE) energy. The SCE approximation isobtained by treating T [ ρ ] + E sce [ ρ ] as an approximation for the Levy-Lieb functional.Though in general it is only a lower-bound for the Levy-Lieb functional, this boundis expected to become tight in the limit of infinitely strong interaction. We do notprove this fact in this paper (though we demonstrate it numerically below), but wenonetheless refer to this approximation as the SCE limit by analogy to the literatureon SCE in first quantization [38, 37].Due to the inequality in Eq. (2.14), we have in general the following lower boundfor the total energy, which we shall call the Kohn-Sham SCE energy: E ≥ E KS-SCE := inf ρ ∈J N { W [ ρ ] + T [ ρ ] + E sce [ ρ ] } . (2.15)The advantage of the preceding manipulations is that now each term in this infimumcan be computed. Specifically, W [ ρ ] is trivial to compute, T [ ρ ] is defined in terms of anon-interacting many-body problem (i.e., a problem with Hamiltonian only quadratic n the creation and annihilation operators), for which an exact solution can be ob-tained via the diagonalization of t [30]. Finally, as we shall see below the SCE term(and its gradient) can be computed in terms of a MMOT problem (and its dual). Thusin principle, it would be possible to take gradient descent approach for computing theinfimum in the definition (2.15) of E KS-SCE . In practice, to compute the Kohn-Sham SCE energy we will instead adopt the self-consistent field (SCF) iteration asis common practice in Kohn-Sham DFT. It can be readily checked that E sce [ ρ ] isconvex with respect to ρ . By the convexity of W [ ρ ], T [ ρ ], and E sce [ ρ ], the expressionin Eq. (2.15) admits a minimizer, which is unique unless the functional fails to bestrictly convex. We assume that the solution is unique and E sce [ ρ ] is differentiable forsimplicity, and we derive nonlinear fixed-point equations satisfied by the minimizer asfollows.For suitable ρ , define the SCE potential via v sce [ ρ ] = ∇ ρ E sce [ ρ ] , . (2.16)and we will discuss how to compute this gradient later. Now assume that the (unique)infimum in Eq. (2.15) is obtained at ρ (cid:63) , which is then in particular a critical point ofthe expression W [ ρ ] + T [ ρ ] + E sce [ ρ ] . (2.17)But then ρ (cid:63) is also a critical point of the expression obtained by replacing E sce [ ρ ]with its expansion up to first order about ρ (cid:63) , which is (modulo a constant term thatdoes not affect criticality) G [ ρ ] := W [ ρ ] + T [ ρ ] + v sce [ ρ (cid:63) ] · ρ = T [ ρ ] + ( w + v sce [ ρ (cid:63) ]) · ρ. (2.18)Hence · means the inner product, and we are motivated to try to minimize G [ ρ ] over ρ ∈ J N . But we can write G [ ρ ] = inf | ψ (cid:105)(cid:55)→ ρ (cid:104) ψ | (cid:88) pq h pq [ ρ (cid:63) ]ˆ a † p ˆ a q | ψ (cid:105) , where h [ ρ ] := t + diag( w + v sce [ ρ ]) . Here diag( · ) is a diagonal matrix. Theninf ρ ∈J N G [ ρ ] = inf | ψ (cid:105)∈F : (cid:104) ψ | ψ (cid:105) =1 , (cid:104) ψ | ˆ N | ψ (cid:105) = N (cid:104) ψ | (cid:88) pq h pq [ ρ (cid:63) ]ˆ a † p ˆ a q | ψ (cid:105) . The latter infimum is a ground-state problem for a non-interacting Hamiltonian andis obtained [30] at a so-called Slater determinant of the form | ψ (cid:105) = ˆ c † · · · ˆ c † N | (cid:105) . (2.19)Here the c † k are ‘canonically transformed’ creation operators defined byˆ c † k = (cid:88) p ˆ a † p ϕ pk , (2.20) here Φ = [ ϕ · · · ϕ N ] = [ ϕ pk ] ∈ C L × N is a matrix whose columns are the N lowesteigenvectors of h [ ρ (cid:63) ]. We assume the eigenvectors form an orthonormal set, i.e. Φ ∗ Φ = I N . Moreover, one may directly compute that the electron density of | ψ (cid:105) as definedin Eq. (2.19) is given by ρ p = (cid:104) ψ | ˆ n p | ψ (cid:105) = N (cid:88) k =1 | ϕ pk | , (2.21)i.e., ρ = diag(ΦΦ ∗ ). Hence the optimizer ρ (cid:63) of Eq. (2.15) solves the Kohn-Sham SCEequations: ( t + diag( w + v sce [ ρ ])) ϕ k = ε k ϕ k , k = 1 , . . . , N.ρ = diag(ΦΦ ∗ ) . (2.22)Here ( ε k , ϕ k ) are understood to be the N lowest (orthonormal) eigenpairs of thematrix in the first line of Eq. (2.22).Eq. (2.22) is a nonlinear eigenvalue problem and should be solved self-consistently.The standard iterative procedure for this task works as follows. (1) For the k -thiterate ρ ( k ) , form the matrix h [ ρ ( k ) ], and compute Φ ( k ) by solving the correspondingeigenproblem. (2) Define ρ ( k +1) := diag(Φ ( k ) Φ ( k ) ∗ ). (3) Iterate until convergence,possibly using mixing schemes [1, 34, 21] to ensure or accelerate convergence.Once self-consistency is reached, the total energy can be recovered by the relation E KS-SCE = N (cid:88) k =1 ε k − v sce [ ρ (cid:63) ] · ρ (cid:63) + E sce [ ρ (cid:63) ] , (2.23)as can be observed by adding back to G [ ρ (cid:63) ] the constant term discarded betweenequations (2.17) and (2.18). The problem is then reduced to thecomputation of E sce [ ρ ] and its gradient v sce [ ρ ]. To this end, let us rewrite E sce [ ρ ] = inf | ψ (cid:105)(cid:55)→ ρ (cid:104) ψ | (cid:88) pq v pq ˆ n p ˆ n q | ψ (cid:105) = inf | ψ (cid:105)(cid:55)→ ρ (cid:88) s ,...,s L (cid:88) pq v pq s p s q | ψ ( s , . . . , s L ) | = inf µ ∈ Π( ρ ) (cid:88) s ,...,s L (cid:88) pq v pq s p s q µ ( s , . . . , s L ) , (2.24)where Π( ρ ) is the space of joint probability mass functions on { , } L . The 1-marginals µ (1) p are defined in terms of µ via µ (1) p ( s p ) := (cid:88) s ,...,s L \{ s p } µ ( s , . . . , s L ) , (2.25)and they satisfy µ (1) p ( s ) = (1 − ρ p ) δ s + ρ p δ s , s = 0 , . (2.26) onsidering the µ (1) p alternately as vectors, we also write (by some abuse of notation) µ (1) p = [1 − ρ p , ρ p ] (cid:62) . (2.27)Note that the last line of Eq. (2.24) is obtained by considering | ψ ( s , . . . , s L ) | as aclassical probability density µ ( s , . . . , s L ) ∈ Π( ρ ). (The marginal condition derivesfrom the condition | Ψ (cid:105) (cid:55)→ ρ .)Define the cost function C : { , } L → R by C ( s , . . . , s L ) := (cid:88) pq v pq s p s q . (2.28)Then our SCE energy may be written E sce [ ρ ] = inf µ ∈ Π( ρ ) (cid:88) s ,...,s L C ( s , . . . , s L ) µ ( s , . . . , s L ) = inf µ ∈ Π( ρ ) (cid:104) C, µ (cid:105) , (2.29)where the angle bracket notation is introduced to indicate the suggested inner product,i.e., the inner product of L ( { , } L ). This is precisely the form of a MMOT problem,namely, minimization of a linear functional of a joint probability measure subject toconstraints on all of the marginals of the measure [31]. Note that the dimension ofthe feasible space for this problem is exponential in L , rendering infeasible any directapproach based on the formulation as a general MMOT, at least for L of moderatesize.Nonetheless, we remark that in this exact formulation, ∇ ρ E sce [ ρ ] is the derivativeof the optimal value of a convex optimization problem (in particular, a linear program)with respect to a variation of it constraints. This quantity can be obtained in termsof the variables dual to the varied constraints [4]. In the setting of MMOT, these dualvariables are known as the Kantorovich potentials [40]. We will discuss the dualitytheory of our SDP relaxations in detail later on.Despite the fact that it is possible to formulate our problem as a general MMOTproblem, doing so loses the important structure of our pairwise cost. To wit, recallthat the diagonal entries of v are set to zero, C can be written C ( s , . . . , s L ) = (cid:88) p (cid:54) = q v pq s p s q =: (cid:88) p (cid:54) = q C pq ( s p , s q ) . Hence the sum can be taken over p (cid:54) = q . Accordingly, the objective function of (2.24)can be written as E sce [ ρ ] = inf µ ∈ Π( ρ ) (cid:88) p (cid:54) = q (cid:104) C pq , µ (2) pq (cid:105) , (2.30)where angle brackets are now used to indicate the suggested inner product, i.e., thatof L ( { , } ), and where the 2-marginals µ (2) pq are defined implicitly in terms of µ bymarginalizing out all components other than p, q , i.e., by µ (2) pq ( s p , s q ) := (cid:88) s ,...,s L \{ s p ,s q } µ ( s , . . . , s L ) . (2.31)Later we also identify µ (2) pq with the 2 × µ (2) pq = (cid:34) µ (2) pq (0 , µ (2) pq (0 , µ (2) pq (1 , µ (2) pq (1 , (cid:35) , (2.32) nd we do likewise for C pq . Using the matrix notation (and the symmetry of C pq ), itfollows that E sce [ ρ ] = inf µ ∈ Π( ρ ) (cid:88) p (cid:54) = q Tr[ C pq µ (2) pq ] , (2.33)where ‘Tr’ indicates the matrix trace.At first glance, it might seem that one may achieve a significant reduction ofcomplexity by directly changing the optimization variable in Eq. (2.30) from µ to { µ (2) pq } Lp,q =1 . However, extra constraints would then need to be enforced in order torelate the different 2-marginals; i.e., the two-marginals must be jointly representable in the sense that all of them could simultaneously be yielded from a single jointprobability measure on { , } L .
3. Convex relaxation.
In this section, we show that a relaxation of the rep-resentability condition implicit in Eq. (2.30) allows us to formulate a tractable opti-mization problem in terms of the { µ (2) pq } Lp,q =1 alone. In fact, this optimization problemwill be a semidefinite program (SDP). We now derive certain necessary constraints satisfied by2-marginals { µ (2) pq } Lp,q =1 that are obtained from a probability measure µ on { , } L .In the following we adopt the notation s = ( s , . . . , s L ) ∈ { , } L . Then for any such s , let e s : { , } L → R be the Dirac probability mass function on { , } L localized at s , i.e., e s ( s (cid:48) ) = δ s , s (cid:48) . Note that we can also write e s as an L -tensor, i.e., an element of R × ×···× , via e s = e s ⊗ · · · ⊗ e s L , where we adopt the (zero-indexing) convention e = [1 , (cid:62) , e = [0 , (cid:62) .Any probability measure on { , } L can be written as a convex combination of the e s since they are the extreme points of the set of probability measures; in particularwe can write a probability density µ ∈ Π( ρ ) as µ = (cid:88) s a s e s , where (cid:88) s a s = 1 , a s ≥ . (3.1)From the definitions of the 1- and 2-marginals (2.25), (2.31), it follows that µ (1) p = (cid:88) s a s e s p , µ (2) pq = (cid:88) s a s e s p ⊗ e s q = (cid:88) s a s e s p e (cid:62) s q . (3.2)Now define M = M ( { a s } ) = (cid:88) s a s e s ... e s L (cid:2) e (cid:62) s · · · e (cid:62) s L (cid:3) , (3.3) hen by Eq. (3.2), M is the matrix of 2 × M pq given by M pq = (cid:40) diag( µ (1) p ) , p = q,µ (2) pq , p (cid:54) = q. (3.4)Accordingly we write M = ( M pq ) ∈ R (2 L ) × (2 L ) . Then let C = ( C pq ) ∈ R (2 L ) × (2 L ) bethe matrix of the 2 × C pq defined above, which specifies the pairwise cost oneach pair of marginals . Observe that the value of the objective function of Eq. (2.33)can in fact be rewritten as (cid:88) p (cid:54) = q Tr[ C pq µ (2) pq ] = Tr[ CM ] . Then the MMOT problem Eq. (2.33) can be equivalently rephrased as E sce [ ρ ] = minimize M ∈ R (2 L ) × (2 L ) , { a s } s ∈{ , } L Tr( CM )subject to M = (cid:88) s a s e s ... e s L (cid:2) e (cid:62) s · · · e (cid:62) s L (cid:3) , (3.5) M pp = diag( µ (1) p ) for all p = 1 , . . . , L, (cid:88) s a s = 1 , a s ≥ s ∈ { , } L . Note that in our application to SCE, we have fixed µ (1) p = (cid:20) − ρ p ρ p (cid:21) in advance, i.e., µ (1) p is not an optimization variable.At this point, our reformulation of the problem has not alleviated its exponen-tial complexity; indeed, note that { a s } s ∈{ , } L is a vector of size 2 L . However, thereformulation does suggest a way to reduce the complexity by accepting some approx-imation. In fact, we will omit { a s } s ∈{ , } L entirely from the optimization, retainingonly M as an optimization variable and enforcing several necessary constraints on M that are satisfied by the solution of the exact problem.First, note from the constraint (3.5) that M is both entry-wise nonnegative (writ-ten M ≥
0) and positive semidefinite (written M (cid:23) local con-sistency constraints on M . Indeed, with ∈ R denoting the vector of all ones, wecan write µ (2) pq = µ (1) p , p (cid:54) = q, (3.6)from which it follows that M pq = (cid:20) − ρ p ρ p (cid:21) , p, q = 1 , . . . , L. (3.7) Without loss of generality, one can assume C pp = 0.11 hen we obtain the relaxation E sce [ ρ ] ≥ E sdpsce [ ρ ] := minimize M ∈ R (2 L ) × (2 L ) Tr( CM ) (3.8)subject to M (cid:23) ,M pq ≥ p, q = 1 , . . . , L ( p (cid:54) = q ) ,M pq = µ (1) p for all p, q = 1 , . . . , L ( p (cid:54) = q ) ,M pp = diag( µ (1) p ) for all p = 1 , . . . , L. Again, µ (1) p is not an optimization variable. It is actually helpful to reformulate theprimal 2-marginal SDP (3.8) as E sdpsce [ ρ ] = minimize M ∈ R (2 L ) × (2 L ) Tr( CM ) (3.9)subject to M (cid:23) , (3.10) M pq ≥ p, q = 1 , . . . , L ( p < q ) , (3.11) M pq = µ (1) p for all p, q = 1 , . . . , L ( p < q ) , (3.12) M (cid:62) pq = µ (1) q for all p, q = 1 , . . . , L ( p < q ) , (3.13) M pp = diag( µ (1) p ) for all p = 1 , . . . , L. (3.14)Note that this formulation is equivalent to (3.8), given the symmetry of M (implicitin the notation M (cid:23) primal problem for short.Note that the optimal value of the primal problem is in fact attained because theconstraints (3.10)-(3.14) define a compact feasible set.Reflecting back on the derivation, we caution that replacing E sce [ ρ ] with E sdpsce [ ρ ]comes at a price. Since we only enforce certain necessary conditions on M , the 2-marginals that we recover from M may not in fact be the 2-marginals of a jointprobability measure on { , } L . Thus E sdpsce [ ρ ] should in general only be expected tobe a lower-bound to E sce [ ρ ], though we will see that the error is often small in practice. As detailed in Section 2.2.1, in order to implement theSCF for Kohn-Sham SCE it is necessary to compute ∇ ρ E sce [ ρ ]. After replacing thedensity functional E sce [ ρ ] with the efficient approximation E sdpsce [ ρ ], the same derivationmotivates us to compute ∇ ρ E sdpsce [ ρ ]. This quantity can obtained by examining theconvex duality of our primal 2-marginal SDP.We let Y (cid:23) Z pq ≥ φ pq be dual to (3.12), ψ pq be dual to (3.13), and finally let X p be dual to(3.14). Note that Z pq ∈ R × and φ pq , ψ pq ∈ R for each p < q , and X p ∈ R × foreach p .Then our formal Lagrangian is of the form L ( M, Y, { Z pq , φ pq , ψ pq } p Y M ) (3.15) (cid:88) p Lemma 3.1. The primal and dual problems (3.9) and (3.19) , respectively, havethe same (finite) optimal value. However, in order to compute the SCE potential, we actually require not only thatthe duality gap is zero, but also that the supremum in the dual problem is attained .One might hope to verify Slater’s condition [4], which provides a standard methodfor verifying both strong duality and such ‘dual attainment’ simultaneously.The trouble is that Slater’s condition requires the existence of a feasible interior point M , i.e., a point M satisfying M (cid:31) M pq > p (cid:54) = q . This scenariois in fact impossible since for example the vector (cid:2) (cid:62) − (cid:62) · · · (cid:3) (cid:62) ∈ R L (3.22)lies in the null space of any feasible M , hence M (cid:31) never holds for feasible M .Instead of using Slater’s condition, we will prove dual attainment via a very carefulstudy of the structure of the dual problem. Theorem 3.2. The optimal value of the dual 2-marginal SDP (3.19) is attained.By Lemma 3.1, this optimal value is equal to the optimal value of the primal 2-marginalSDP (3.9) .Proof . Without loss of generality we assume0 < ρ p < , p = 1 , . . . , L. (3.23)To see why this assumption can be made, observe that if ρ p ∈ { , } for some p , thenattainment for the dual problem (3.19) can be reduced to attainment for a strictlysmaller dual 2-marginal SDP. We leave further details of such a reduction to thereader.) Also, for later reference, we let F ( Y, { φ pq , ψ pq } p 1. Then we claim that f ( Y ) = f (cid:0) Y + P i v (cid:62) + vP (cid:62) i (cid:1) (3.25) or any Y , v ∈ R L , and any i = 1 , . . . , L − 1. To prove this, write v = (cid:2) v (cid:62) · · · v (cid:62) L (cid:3) (cid:62) , where v q ∈ R for q = 1 , . . . , L . Then observe that, via the discussion of Kantorovichduality following the statement (3.19) of the dual problem, we can in fact write f ( Y ) = − L (cid:88) p =1 Tr (cid:104) Y pp diag( µ (1) p ) (cid:105) + 2 (cid:88) p Y PP (cid:62) Y Q P (cid:62) Y P (cid:19) . We aim to choose B such that R (cid:62) ( P B + B (cid:62) P (cid:62) ) R = (cid:18) P (cid:62) P BQ P (cid:62) P BP (cid:19) + (cid:18) Q (cid:62) B (cid:62) P (cid:62) P P (cid:62) B (cid:62) P (cid:62) P (cid:19) cancels ˆ Y on all but the top-left block. Using Q (cid:62) P = 0 (and P (cid:62) Q = 0), one canreadily check that such a choice is given by − B = ( P (cid:62) P ) − ˆ Y ( Q (cid:62) Q ) − Q (cid:62) + 12 ( P (cid:62) P ) − ˆ Y ( P (cid:62) P ) − P (cid:62) . By the identity (3.26), it follows that we can further restrict the feasible set by inter-secting with S (cid:48) = (cid:26) Y : R (cid:62) Y R = (cid:18) ∗ 00 0 (cid:19) (cid:23) , f ( Y ) ≥ d (cid:27) . (3.27)In fact S (cid:48) is compact, and the proof is complete pending the proof of this claim, towhich we now turn.Observe that for ( Y, { φ pq , ψ pq } p Y M ) . for ( Y, { φ pq , ψ pq } p Y M ) . In fact M can be written M = Q (cid:102) M Q (cid:62) , where (cid:102) M (cid:31) 0. This can be verified directlyby taking (cid:102) M = (cid:101) ρ ... (cid:101) ρ L (cid:2)(cid:101) ρ · · · (cid:101) ρ L (cid:3) + diag (cid:0)(cid:2) − (cid:101) ρ · · · − (cid:101) ρ L (cid:3)(cid:1) , with (cid:101) ρ p = 1 − ρ p , p = 1 , . . . , L. Note that (cid:102) M (cid:31) f ( Y ) ≤ Tr( CM ) − Tr( Q (cid:62) Y Q (cid:102) M ) . Now since (cid:102) M (cid:31) 0, there exists a scalar K > Y (cid:23) Q (cid:62) Y Q (cid:54)(cid:22) K ,then f ( Y ) < d . But Q (cid:62) Y Q is the upper-left block of R (cid:62) Y R , so it follows from thedefinition (3.27) of S (cid:48) that that S (cid:48) ⊂ (cid:26) Y : R (cid:62) Y R = (cid:18) A 00 0 (cid:19) , (cid:22) A (cid:22) K (cid:27) , from which it follows that S (cid:48) is compact, and the proof is complete. Remark 3.3. Note that the proof of Theorem 3.2 guarantees that the domainof the dual problem (3.19) can be restricted to Y of the form Y = Q (cid:101) Y Q (cid:62) , yieldinga ‘reduced’ dual problem in which (cid:101) Y replaces Y as an optimization variable. In fact,one can also verify directly that any M feasible for the primal problem (3.9) satisfies M P = 0 , hence the domain of the primal problem can be restricted to M of the form M = Q (cid:102) M Q (cid:62) , likewise yielding a reduced primal problem.But despite this apparent symmetry, the latter observation need not imply the for-mer in a more general SDP setting, and the arguments given in the proof of Theorem3.2, which use more of the specific structure of our problem, do appear to be necessaryto the proof of dual attainment for this problem.Moreover, observe with caution that the dual of such a reduced primal problem is not the reduced dual problem! . Tighter lower bound via 3-marginals. In this section, we further tightenthe convex relaxation proposed in Section 3 with a formulation that additionallyinvolves the 3-marginals.One defines the 3-marginals µ (3) pqr (for p, q, r, distinct) induced by a probabilitymeasure µ on { , } L via µ (3) pqr ( s p , s q , s r ) := (cid:88) s ,...,s L \{ s p ,s q ,s r } µ ( s , . . . , s L ) . (4.1)There is no 3-marginal analog known to us of the semidefinite constraint that canbe enforced using the 2-marginals. However, we can nonetheless use the 3-marginalsto enforce additional necessary local consistency constraints. Indeed, the 2-marginalscan themselves be written in terms of the 3-marginals via µ (2) pq ( s p , s q ) = (cid:88) s r µ (3) pqr ( s p , s q , s r ) . (4.2)Accordingly, we will include K = { K pqr } for distinct p, q, r as optimization vari-ables for the 3-marginals. Note that based on Eq. (4.1) we can enforce that K is symmetric , by which we mean that K pqr ( s p , s q , s r ) = K σ ( p ) σ ( q ) σ ( r ) ( s σ ( p ) , s σ ( q ) , s σ ( r ) )for any permutation σ on the letters { p, q, r } . If we were to extend K pqr by zerosto p, q, r not distinct, then we could think of K ∈ R (2 L ) × (2 L ) × (2 L ) as a symmetric3-tensor, with ( p, q, r )-th 2 × × K pqr . In principle the impositionof symmetry removes some redundancy in the specification of K .Then we arrive at the following :minimize M ∈ R (2 L ) × (2 L ) , K ∈ R (2 L ) × (2 L ) × (2 L ) Tr( CM ) (4.3)subject to M (cid:23) ,M pq ≥ p (cid:54) = q,M pq = µ (1) p for p (cid:54) = q,M pp = diag( µ (1) p ) for all p,K ≥ , K symmetric ,M pq ( s p , s q ) = (cid:88) s r K pqr ( s p , s q , s r ) for p, q, r distinct . Note that the blocks K pqr for p, q, r not distinct are superfluous and can be discardedin an efficient optimization.For simplicity, we omit discussion of the duality of (4.3). Since only linear con-straints have been added, most of the interesting features from the mathematicalviewpoint have already been discussed above. Indeed, as in Section 3.2, we may de-rive the dual of the 3-marginal problem (4.3), and we may certify as in Section 3.3that the 3-marginal problem satisfies strong duality and dual attainment. 5. General MMOT with pairwise cost. As has been suggested both explic-itly and via the notation, almost all of our discussion of relaxation methods for MMOTcan be applied to general MMOT problems with pairwise cost functions. The maincaveat is that specific references to the fact that the 1-marginal state space has twoelements should be suitably generalized. For clarity, we now recapitulate our methods or the general MMOT problem with pairwise cost. The reader interested in generalMMOT should still see the earlier sections for derivations, discussions, and proofs.Here we only summarize the methods.We will consider a problem with L marginals, written µ (1) p for p = 1 , . . . , L . Thesequantities are fixed in advance and never varied in the following discussion. We let N p be the size of the state space of the p -th marginal, so µ (1) p is a probability vectorof length N p . Note that the marginals need not all have the same state space, i.e., N p can depend on p . We write the p -th state space as X p := { , . . . , N p } . Then thejoint state space is given by X := (cid:81) Lp =1 X p , and we write Pr p for the p -th projection X → X p . Suppose that we are given a pairwise cost function C pq ∈ R N p × N q for eachpair p (cid:54) = q of marginals. (Without loss of generality we assume C pp = 0.) Then weconsider the problemmin µ ∈P ( X ) (cid:88) ( s ,...,s L ) ∈X L (cid:88) p,q =1 C pq ( s p , s q ) µ ( s , . . . , s L ) , s.t. (Pr p ) µ = µ (1) p , p = 1 , . . . , L. (5.1)Here µ : X → R can be thought of as an L -tensor whose p -th index ranges from1 , . . . , N p . Again, the objective function of such a MMOT problem can be rephrasedin terms of the 2-marginals:min µ ∈P ( X ) L (cid:88) p (cid:54) = q Tr( C pq µ (2) pq ) , s.t. (Pr p ) µ = µ (1) p , p = 1 , . . . , L, (5.2)where the 2-marginals µ (2) pq are here implicitly defined in terms of the optimizationvariable µ .Then we introduce the minimize M ∈ R N tot × N tot Tr( CM ) (5.3)subject to M (cid:23) ,M pq ≥ p, q = 1 , . . . , L ( p (cid:54) = q ) ,M pq N q = µ (1) p for all p, q = 1 , . . . , L ( p (cid:54) = q ) ,M pp = diag( µ (1) p ) for all p = 1 , . . . , L. Here N tot := (cid:80) Lp =1 N p and k denotes the vector of ones of length k . The dual of(5.3) is given bymaximize Y, { φ pq ,ψ pq } p 6. Numerical results. In this section, we numerically demonstrate the effec-tiveness of the proposed methods on model problems of strongly correlated fermionicsystems. Here we consider a 1D spinless Hubbard-like model defined by the Hamiltonian of Eq. (2.6), in which we take t pq = (cid:40) | q − p | = 1 , v , with next-nearest neighbor (NN) interaction, v pq = U/ | q − p | = 1 ,U/ 40 if | q − p | = 2 , v pq = U/ | q − p | = 1 ,U/ 20 if | q − p | = 2 ,U/ 200 if | q − p | = 3 , numerically exact and hence we consider the case to be not representative. We do not have a proof yetto explain why our convex relaxation scheme can be numerically exact.We will compare the Kohn-Sham SCE energies yielded by our methods with oneanother, as well as with the exact ground-state energy (2.7), which is computed viaexact diagonalization (ED) in the OpenFermion [27] software package. The MMOTproblems arising in Kohn-Sham SCE and their SDP relaxations are solved in MATLAB with the CVX software package [11].We refer to the exact self-consistent Kohn-Sham SCE solution obtained by solvingthe original linear programming (LP) problem for MMOT as the ‘LP’ solution. Hencethe tightness of the Kohn-Sham SCE lower bound (2.15) itself can be evaluated bycomparing the exact energy with the LP energy, while the tightness of our SDP relaxations of the relevant MMOT problems (which, in turn, yield lower bounds forthe Kohn-Sham SCE energy) can be evaluated by comparing the LP energy with the - and 3-marginal SDP energies. We refer to these two sources of error, respectively,as the ‘Kohn-Sham SCE model error’ and the ‘error due to relaxation.’In Figs. 6.1(a) and 6.2(a), we plot E/U with respect to U for v as in Eqs.(6.2) and (6.3), respectively. In these experiments, L = 14 and N = 9. The energydifferences of the Kohn-Sham SCE solutions from the exact energy are plotted inFigs. 6.1(b) and 6.2(b). It is confirmed numerically that the LP energy lower-boundsthe exact energy, and in turn the SDP energies lower-bound the LP energy. Whilethe 3-marginal SDP lower bound is noticeably tighter than the 2-marginal SDP lowerbound, the error due to relaxation is dominated by the Kohn-Sham SCE model errorin both cases.Since the effective potential is of interest in Kohn-Sham DFT, in Fig. 6.3 we plotthe SCE potential (2.16) at self-consistency in the case of v as in Eq. (6.3). It canbe seen that the 3-marginal SDP performs better than the 2-marginal SDP in thisregard, as one might expect. (However, note carefully that although it is guaranteed apriori that the 3-marginal SDP provides a lower bound on the energy that is at leastas tight as that of the 2-marginal SDP, no such comparison is theoretically guaranteedin advance for the effective potential.)To study the scaling of energy in the thermodynamic limit L → ∞ , in Fig 6.4(a),we plot E/U as a function of L by fixing U = 5 and a filling factor of N/L = 2 / U -6-4-2024 E / U LPSDP 2-marginalExactSDP 3-marginal (a) U E / U Exact-LPExact-SDP 2-marginalExact-SDP 3-marginal (b) Fig. 6.1: Spinless 1D fermionic lattice model with v as in Eq. (6.2), L = 14, N = 9. (a) E/U as a function of U . (b) Difference between the exact energy and the Kohn-ShamSCE energies obtained from the unrelaxed LP and the SDP relaxations. We consider a 2D generalized Hubbardtype model defined by the Hamiltonianˆ H = − L − (cid:88) i,j =1 (cid:88) σ ∈{↑ , ↓} (cid:16) ˆ a † i +1 ,j ; σ ˆ a i,j ; σ + ˆ a † i,j +1; σ ˆ a i,j ; σ + h.c. (cid:17) + U L (cid:88) i,j =1 ˆ n i,j ; ↑ ˆ n i,j ; ↓ + V L − (cid:88) i,j =1 (ˆ n i +1 ,j ˆ n i,j + ˆ n i,j +1 ˆ n i,j ) . (6.4)Here ˆ n i,j := ˆ n i,j ; ↑ + ˆ n i,j ; ↓ . As discussed in section 2, although the creation andannihilation operators in Eq. (6.4) involve two spatial indices and one spin index, one U -4-2024 E / U LPSDP 2-marginalSDP 3-marginalExact (a) U E / U Exact-LPExact-SDP 2-marginalExact-SDP 3-marginal (b) Fig. 6.2: Spinless 1D fermionic lattice model with v as in Eq. (6.3), L = 14, N = 9. (a) E/U as a function of U . (b) Difference between the exact energy and the Kohn-ShamSCE energies obtained from the unrelaxed LP and the SDP relaxations. Site E sc e [] LPSDP 2-marginalSDP 3-marginal Fig. 6.3: The effective potential for the spinless 1D fermionic lattice model with v as in Eq. (6.3), U = 5, L = 14, N = 9. The relative (cid:96) errors for the 2- and 3-marginal formulations (compared to the unrelaxed LP formulation) are 1 . × − and 2 . × − , respectively.may of course order the operators with a single index by defining b ( j − L + i = a i,j ; ↑ , b ( j − L + i + L = a i,j ; ↓ . The new creation operators are fixed as the Hermitian adjoints of these new annihila-tion operators. The term associated with U is the on-site electron-electron interaction,while V specifies the nearest-neighbor electron-electron interaction. In the standardHubbard model, we have V = 0. (However, in the case V = 0, the MMOT problemarising in the SCE framework becomes a trivial problem, since the interaction termsassociated with different sites are decoupled.) Fig. 6.5 shows the energies for the gen-eralized Hubbard model on a 3 × V = 0 . U and U ranging from 1 . . 0. The number N of electrons is set to be 12. Here energies are obtained from theexact solution, the exact Kohn-Sham SCE solution obtained by linear programming(LP), and the approximate Kohn-Sham SCE solution obtained via the 2-marginalSDP relaxation. We find that the Kohn-Sham SCE formulation becomes asymptot-ically accurate when U becomes large. Furthermore, the error due to relaxation ismuch smaller than the Kohn-Sham SCE model error. Fig. 6.5(b) further shows that L E / U LPSDP 2-marginalSDP 3-marginalExact (a) 12 14 16 18 20 22 24 L T i m e ( s e c ) LPSDP 2-marginalSDP 3-marginal (b) Fig. 6.4: Spinless 1D fermionic lattice model with v as in Eq. (6.3), U = 5, N/L = 2 / E/U as a function of L . (b) Running time as a function of L . ExactSDP 2-marginalLP (a) Exact - SDP 2-marginalLP - SDP 2-marginal (b) Fig. 6.5: Spinful 3 × N = 12.the energy difference between the LP and 2-marginal SDP solutions is approximatelyconstant with respect to the on-site interaction strength U . 7. Conclusion. In this paper, we have considered the strictly correlated elec-tron (SCE) limit of a fermionic quantum many-body system in the second-quantizedformalism. To the extent of our knowledge, the setup of the SCE problem in this set-ting has not appeared in the literature. Mathematically, the SCE limit requires thesolution of a multi-marginal optimal transport problem over certain classical proba-bility measures. We propose a relaxation that enforces constraints on the 2-marginalsof these measures, and the relaxed problem can be solved efficiently via semi-definiteprogramming (SDP). We prove that the SDP problem satisfies strong duality andmoreover that the dual solution is attained, despite the fact that the primal problemdoes not possess a strictly feasible point. We consider a tighter relaxation involvingthe 3-marginals and discuss how our methods can be applied to completely generalmulti-marginal optimal transport problems with pairwise costs.The relaxed formulation is not exact and provides only a lower bound to the SCEenergy. Hence it is meaningful to compare the error due to relaxation with the Kohn-Sham SCE model error, i.e., the disparity between the Kohn-Sham SCE energy and he exact energy of the solution to the quantum many-body problem. Our numericalresults for various fermionic lattice model problems indicate that the former can bemuch smaller than the latter, hence our convex relaxation scheme can be consideredto be effective. On the other hand, as indicated in, e.g., [23], Kohn-Sham SCE is onlythe zero-th order approximation to the quantum many-body ground state energy inthe limit of large interaction. Hence the SCE functional and SCE potential should beconsidered more properly as an “ingredient” for designing more accurate exchange-correlation functionals. From such a perspective, just as the exact formulation of SCEis only a model, it may even be appropriate to consider the relaxed SCE formulationas a model itself. It can capture certain strong correlation effects and can be solvedefficiently.One immediate extension of the current work is to include finite-temperatureeffects via entropic regularization. In fact, entropic regularization may be relevantfor another reason as well. During our numerical studies, we observed that the self-consistent iteration for Kohn-Sham SCE ( not the convex optimization problem solvedwithin each iteration) can be difficult to converge. The convergence behavior may de-pend sensitively on the filling factor, the lattice size, and the form of the interaction.Such difficulty can arise for both the exact SCE formulation solved via linear pro-gramming and the relaxed formulations solved by SDP. Preliminary results show thatentropic regularization can help make the loop easier to converge. We are not awareof any reports of such issues in the literature, and we plan to study such behaviormore systematically in future work. Acknowledgments:. This work was partially supported by the Department ofEnergy under Grant No. de-sc0017867, No. DE-AC02-05CH11231, by the Air ForceOffice of Scientific Research under award number FA9550-18-1-0095 (L.L.), and bythe National Science Foundation Graduate Research Fellowship Program under grantDGE-1106400 (M.L.). The work of Y.K. and L.Y. is partially supported by the U.S.Department of Energy, Office of Science, Office of Advanced Scientific ComputingResearch, Scientific Discovery through Advanced Computing (SciDAC) program, andthe National Science Foundation under award DMS-1818449. We thank Kieron Burke,Gero Friesecke, Paola Gori-Giorgi, and Michael Seidl for helpful discussions. REFERENCES[1] D. G. Anderson , Iterative procedures for nonlinear integral equations , J. Assoc. Comput.Mach., 12 (1965), pp. 547–560.[2] A. D. Becke , Density-functional exchange-energy approximation with correct asymptotic be-havior , Phys. Rev. A, 38 (1988), pp. 3098–3100.[3] J.-D. Benamou, G. Carlier, and L. Nenna , A numerical method to solve multi-marginaloptimal transport problems with Coulomb cost , in Splitting Methods in Communication,Imaging, Science, and Engineering, Springer, 2016, pp. 577–601.[4] Stephen Boyd and Lieven Vandenberghe , Convex optimization , Cambridge Univ. Pr., 2004.[5] Giuseppe Buttazzo, Luigi De Pascale, and Paola Gori-Giorgi , Optimal-transport formu-lation of electronic density-functional theory , Phys. Rev. A, 85 (2012), p. 062502.[6] Klaus Capelle and Vivaldo L. Campo , Density functionals and model Hamiltonians: Pillarsof many-particle physics , Phys. Rep., 528 (2013), pp. 91–159.[7] Huajie Chen, Gero Friesecke, and Christian B Mendl , Numerical methods for a kohn-sham density functional model based on optimal transport , J. Chem. Theory Comput., 10(2014), pp. 4360–4368.[8] J. P. Coe , Lattice density-functional theory for quantum chemistry , Phys. Rev. B, 99 (2019),p. 165118.[9] A John Coleman , Structure of fermion density matrices , Rev. Mod. Phys., 35 (1963), p. 668.2510] Codina Cotar, Gero Friesecke, and Claudia Kl¨uppelberg , Density functional theory andoptimal transportation with coulomb cost , Commun. Pure Appl. Math., 66 (2013), pp. 548–599.[11] M. Grant and S. Boyd , CVX: Matlab software for disciplined convex programming , 2013.[12] Juri Grossi, Derk P Kooi, Klaas JH Giesbertz, Michael Seidl, Aron J Cohen, PaulaMori-S´anchez, and Paola Gori-Giorgi , Fermionic statistics in the strongly correlatedlimit of density functional theory , J. Chem. Theory Comput., 13 (2017), pp. 6089–6100.[13] P. Hohenberg and W. Kohn , Inhomogeneous electron gas , Phys. Rev., 136 (1964), pp. B864–B871.[14] Y. Khoo and L. Ying , Convex relaxation approaches for strictly correlated density functionaltheory , arXiv:1808.04496, (2018).[15] W. Kohn and L. Sham , Self-consistent equations including exchange and correlation effects ,Phys. Rev., 140 (1965), pp. A1133–A1138.[16] H. Komiya , Elementary proof for Sion’s minimax theorem , Kodai Math. J., 11 (1988), pp. 5–7.[17] C. Lee, W. Yang, and R. G. Parr , Development of the Colle-Salvetti correlation-energyformula into a functional of the electron density , Phys. Rev. B, 37 (1988), pp. 785–789.[18] M. Levy , Universal variational functionals of electron densities, first-order density matrices,and natural spin-orbitals and solution of the v-representability problem , Proc. Natl. Acad.Sci., 76 (1979), pp. 6062–6065.[19] E. H. Lieb , Density functional for Coulomb systems , Int J. Quantum Chem., 24 (1983), p. 243.[20] N. A. Lima, L. N. Oliveira, and K. Capelle , Density-functional study of the Mott gap inthe Hubbard model , Europhys. Lett., 60 (2002), pp. 601–607.[21] L. Lin and C. Yang , Elliptic preconditioner for accelerating self consistent field iteration inKohn-Sham density functional theory , SIAM J. Sci. Comp., 35 (2013), pp. S277–S298.[22] F. Malet and P. Gori-Giorgi , Strong Correlation in Kohn-Sham Density Functional Theory ,Phys. Rev. Lett., 109 (2012), p. 246402.[23] Francesc Malet, Andr´e Mirtschink, Klaas JH Giesbertz, Lucas O Wagner, and PaolaGori-Giorgi , Exchange–correlation functionals from the strong interaction limit of DFT:applications to model chemical systems , Phys. Chem. Chem. Phys., 16 (2014), pp. 14551–14558.[24] David Mazziotti , Realization of quantum chemistry without wave functions through first-ordersemidefinite programming , Phys. Rev. Lett., 93 (2004), p. 213001.[25] , Structure of fermionic density matrices: Complete N-representability conditions , Phys.Rev. Lett., 108 (2012), p. 263002.[26] David A Mazziotti , Contracted Schr¨odinger equation: Determining quantum energies andtwo-particle density matrices without wave functions , Phys. Rev. A, 57 (1998), p. 4219.[27] J. R. McClean et al. , OpenFermion: the electronic structure package for quantum computers ,arXiv:1710.07629, (2017).[28] C. Mendl and L. Lin , Kantorovich dual solution for strictly correlated electrons in atoms andmolecules , Phys. Rev. B, 87 (2013), p. 125106.[29] Christian B Mendl, Francesc Malet, and Paola Gori-Giorgi , Wigner localization inquantum dots from kohn-sham density functional theory without symmetry breaking , Phys.Rev. B, 89 (2014), p. 125106.[30] J. W. Negele and H. Orland , Quantum many-particle systems , Westview, 1988.[31] B. Pass , Multi-marginal optimal transport: theory and applications , ESAIM: MathematicalModelling and Numerical Analysis, 49 (2015), pp. 1771–1790.[32] J. P. Perdew, K. Burke, and M. Ernzerhof , Generalized gradient approximation madesimple , Phys. Rev. Lett., 77 (1996), pp. 3865–3868.[33] J. P. Perdew and A. Zunger , Self-interaction correction to density-functional approximationsfor many-electron systems , Phys. Rev. B, 23 (1981), pp. 5048–5079.[34] P. Pulay , Convergence acceleration of iterative sequences: The case of SCF iteration , Chem.Phys. Lett., 73 (1980), pp. 393–398.[35] R. T. Rockafellar , Convex analysis , Princeton University Press, 1970.[36] K. Sch¨onhammer, O. Gunnarsson, and R. M. Noack , Density-functional theory on a lattice:Comparison with exact numerical results for a model with strongly correlated electrons ,Phys. Rev. B, 52 (1995), p. 2504.[37] M. Seidl, P. Gori-Giorgi, and A. Savin , Strictly correlated electrons in density-functionaltheory: A general formulation with applications to spherical densities , Phys. Rev. A, 75(2007), p. 042511.[38] Michael Seidl, John P Perdew, and Mel Levy , Strictly correlated electrons in density-functional theory , Phys. Rev. A, 59 (1999), p. 51.[39]
r [ φ rq (1) − φ rq (0)] + 2 (cid:88) p