[PDF] A BDF2-Approach for the Non-linear Fokker-Planck Equation

Abstract

We prove convergence of a variational formulation of the BDF2 method applied to the non-linear Fokker-Planck equation. Our approach is inspired by the JKO-method and exploits the differential structure of the underlying L 2 -Wasserstein space. The technique presented here extends and strengthens the results of our own recent work on the BDF2 method for general metric gradient flows in the special case of the non-linear Fokker-Planck equation: firstly, we do not require uniform semi-convexity of the augmented energy functional; secondly, we prove strong instead of merely weak convergence of the time-discrete approximations; thirdly, we directly prove without using the abstract theory of curves of maximal slope that the obtained limit curve is a weak solution of the non-linear Fokker-Planck equation.

Full PDF

aa r X i v : . [ m a t h . NA ] J a n A BDF2-APPROACH FOR THE NON-LINEAR FOKKER-PLANCK EQUATION

SIMON PLAZOTTA

Abstract.

We prove convergence of a variational formulation of the BDF2 method applied to the non-linear Fokker-Planck equation. Our approach is inspired by the JKO-method and exploits the diﬀerentialstructure of the underlying L -Wasserstein space. The technique presented here extends and strengthensthe results of our own recent work on the BDF2 method for general metric gradient ﬂows in the specialcase of the non-linear Fokker-Planck equation: ﬁrstly, we do not require uniform semi-convexity of theaugmented energy functional; secondly, we prove strong instead of merely weak convergence of the time-discrete approximations; thirdly, we directly prove without using the abstract theory of curves of maximalslope that the obtained limit curve is a weak solution of the non-linear Fokker-Planck equation. Introduction

This article is concerned with the proof of well-posedness and convergence of a formally higher-order semi-discretization in time, inspired by the Backward Diﬀerentiation Formula 2 (BDF2), applied to the non-linearFokker-Planck equation with no-ﬂux boundary condition : ∂ t ρ = ∆( ρ m ) + div ( ρ ∇ V ) + div ( ρ ∇ ( W ∗ ρ )) in (0 , ∞ ) × Ω , n · D ρ = 0 , on (0 , ∞ ) × ∂ Ω , ρ (0 , x ) = ρ ( x ) in Ω . (1.1)We consider (1.1) as an evolutionary equation in the space of probability measures P (Ω) with ﬁnite secondmoment (i.e M ( µ ) := R Ω k x k d µ ( x ) < ∞ ), where Ω = R d or Ω ⊂ R d is an open and bounded domain withLipschitz-continuous boundary ∂ Ω and normal derivative n . Indeed, if (1.1) is initialized with ρ ∈ P (Ω)then there exists a weak solution ρ : [0 , ∞ ) × Ω → R ≥ such that ρ (0) = ρ and ρ ( t ) ∈ P (Ω) for each t > L -Wasserstein space ( P (Ω) , W ), see [2, 18, 30, 33, 36, 37]. The L -Wasserstein distance W betweentwo measures µ and ν in P (Ω) is deﬁned by W ( µ, ν ) := min p ∈ Γ( µ,ν ) Z Ω k x − y k d p ( x, y ) , (1.2)where Γ( µ, ν ) := { p ∈ P (Ω × Ω) : ( π ) p = µ, ( π ) p = ν } is the set of all transport plans from ρ to ν . Note, the minimizers p ∈ Γ( µ, ν ) of W ( µ, ν ) are called the optimal transport plans . The correspondingenergy functional F : P (Ω) → R ∪ {∞} for (1.1) is given by: F ( µ ) := (R Ω ρ log( ρ ) + V ρ + ( W ∗ ρ ) ρ d x if m = 1 , R Ω 1 m − ρ m + V ρ + ( W ∗ ρ ) ρ d x if m > , (1.3)provided that µ = ρ L d and the integrals on the right-hand side are well-deﬁned otherwise we set F ( µ ) = ∞ .In the framework of L -Wasserstein gradient ﬂows, existence of solutions has been shown via the JKO-scheme, named after the authors of [18]. This scheme is a variational formulation of the Implicit Euler methodgiven as follows: for ﬁxed time step size τ ∈ (0 , τ ∗ ) construct inductively, starting from ρ , a sequence ofprobability measures ( ρ kτ ) k ∈ N as the minimizer of an augmented energy functional: ρ kτ ∈ argmin ρ ∈ P (Ω) τ W ( ρ, ρ k − τ ) + F ( ρ ) . (1.4) Date : January 30, 2018.2010

Mathematics Subject Classiﬁcation.

Key words and phrases. gradient ﬂow, second order scheme, BDF2, minimizing movements, non-linear diﬀusion equations.This research has been supported by the German Research Foundation (DFG), SFB TRR 109. The authors would like to thankDaniel Matthes for helpful discussions and remarks.

It is known that the thus obtained discrete gradient approximation converge to a solution of the non-linearFokker-Planck equation (1.1) as τ tends to zero.Note, this scheme has been similarly applied to a variety of PDEs and systems of PDEs with gradientﬂow structure in the L -Wasserstein or in a L -Wasserstein-like space: non-local Fokker-Planck equations[9, 11, 35]; Fokker-Planck equations on manifolds [13, 34]; fourth order ﬂuid and quantum models [15, 16, 25];chemotaxis systems [4, 5, 39]; Poisson-Nernst-Planck equations [20]; multi-component ﬂuid systems [22];Cahn-Hilliard equations [24]; degenerate cross-diﬀusion systems [29, 40].Besides the theoretical use to construct solution for (1.1), this particular discretization (1.4) providesalso a structure preserving numerical scheme. The approximate solution inherits automatically positivity,mass conservation and energy dissipation. Diﬀerent approaches to actually compute the minimizers of (1.4)have been investigated: particle schemes [6, 8, 7, 38]; evolving diﬀeomorphisms [8, 10]; Lagrangian schemes[3, 12, 14, 19, 26, 28]; entropic regularization [31]. However, it turns out that the application of these schemesto gradient ﬂows in L -Wasserstein space is intricate, since computing the L -Wasserstein distance and itsgradient is diﬃcult in dimension two or more.We proposed in our own recent work [27] a diﬀerent variational formulation of a semi-discretization intime, i.e., of the Backward Diﬀerentiation Formula 2 (BDF2) method. In this context the BDF2 methodreads as follows: for each suﬃciently small time step τ ∈ (0 , τ ∗ ), let a pair of initial data ( ρ − τ , ρ τ ) be giventhat approximate ρ . Then, deﬁne inductively the discrete solution ( ρ kτ ) k ∈ N as the minimizers of the followingaugmented energy functional, ρ kτ ∈ argmin ρ ∈ P (Ω) τ W ( ρ k − τ , ρ ) − τ W ( ρ k − τ , ρ ) + F ( ρ ) . (1.5)Similar to the JKO-scheme the BDF2 method is structure preserving in the sense that the discrete solutioninherits automatically positivity, mass preservation and is almost energy dissipating (see lemma A.1). Weremark that recently also other variational formulations of formally higher-order time discretizations havebeen investigated, namely Runge-Kutta methods [21, 23].Our main contribution in this work is to improve the convergence result of [27] from weak to strongconvergence of the discrete solution ( ρ kτ ) k ∈ N . Also in contrast to [27], our approach is independent of theuniform semi-convexity of the augmented energy functional on the right-hand side of (1.5). More in thespirit of the original works on the linear Fokker-Planck equation of Kinderlehrer et al. [18], we solely utilizethe diﬀerential structure of both the L -Wasserstein space and of the augmented energy functional.Note, the BDF2 method and the techniques presented here have two further possible applications. Firstly,PDEs with gradient ﬂow structure such that the energy function F do not possesses any uniform semi-convexity property – like the Hele-Shaw equation seen as L -Wasserstein gradient ﬂow – are not coveredin [27]. However, as long as the subdiﬀerential calculus in the L -Wasserstein space is applicable to F ourmethod is feasible. With this technique at hand on can compute from (1.5) the discrete Euler-Lagrangeequations for the discrete approximation by variations along solutions of the continuity equation (likwisetheorem 4.1). Hence, passing to the limit as τ tends to zero could yield directly a distributional solutionfor the aforementioned class of PDEs without using the abstract theory of curves of steepest descent for λ -contractive gradient ﬂows. Secondly, the formally higher-order approximation in time is expected toimprove the performance of numerical simulations due to the better resolution of the solution with respectto a coarser time grid. In conclusion, the BDF2 method provides a structure preserving numerical scheme of formally higher-order approximation in time with a strong notion of convergence.

Our main results concerning the well-posedness and the limit behavior as τ ց interpolated solution ρ τ , which is deﬁned as the piecewise constant interpolation in time of the discrete solution ( ρ kτ ) k ∈ N obtainedby the BDF2 method (1.5), ρ τ (0) = ρ τ , ρ τ ( t ) = ρ kτ for t ∈ (( k − τ, kτ ] and k ∈ N , is stated in the following theorem. The threshold τ ∗ is speciﬁed in (3.2). BDF2-APPROACH FOR THE NON-LINEAR FOKKER-PLANCK EQUATION 3

Theorem 1.1.

Let Ω ⊂ R d be either an open and bounded domain with Lipschitz continuous boundary ∂ Ω orlet Ω = R d . Further, assume m ≥ and that V and W satisfy Assumption 2.1. Given a vanishing sequence ( τ n ) n ∈ N of step sizes τ n ∈ (0 , τ ∗ ) and initial data ( ρ − τ n , ρ τ n ) satisfying Assumption 2.2, then the followinghold:(i) Existence of the discrete solutions.

For each step size τ ∈ (0 , τ ∗ ) there exists a sequence ( ρ kτ ) k ∈ N obtained by the BDF2 scheme (1.5) , which satisﬁes the step size independent bounds (3.6) on the kineticenergy, on the internal energy, and on the second moments.(ii) Narrow convergence in P (Ω) . There exists a (non-relabelled) subsequence ( τ n ) n ∈ N and a limitcurve ρ ∗ ∈ AC (0 , ∞ ; ( P (Ω) , W )) such that for any t ≥ : ρ τ n ( t ) ⇀ ρ ∗ ( t ) narrowly in the space P (Ω) as n → ∞ . (iii) Step size independent L (0 , T ; BV (Ω)) -estimate. For each ﬁxed time horizon

T > there existsa non-negative constant C , depending only on m, V, W , and T such that for each τ ∈ (0 , τ ∗ ) : k ( ρ τ ) m k L (0 ,T ; BV (Ω)) ≤ C. (iv) Strong convergence in L m (0 , T ; L m (Ω)) . With the notations from (ii), there exists a further(non-relabelled) subsequence ( τ n ) n ∈ N such that for all T > :(a) In the case of an open and bounded set Ω ⊂ R d with Lipschitz-continuous boundary ∂ Ω , we have : ρ τ n → ρ ∗ in L m (0 , T ; L m (Ω)) as n → ∞ . (b) In the case of the entire space, i.e., Ω = R d , we have for every open and bounded set e Ω ⋐ R d : ρ τ n → ρ ∗ in L m (0 , T ; L m ( e Ω)) as n → ∞ . (v) Solution of the non-linear Fokker-Planck equation.

The limit curve ρ ∗ from (ii) satisﬁes thenon-linear Fokker-Planck equation with no-ﬂux boundary condition (1.1) in the distributional sense,i.e., we have for each test function ψ ∈ C ∞ c ([0 , ∞ ) × Ω) : Z ∞ Z Ω − ∆ ψ ρ m ∗ + h∇ ψ, ∇ V i ρ ∗ + h∇ ψ, ∇ W ∗ ρ ∗ i ρ ∗ d x d t = Z ∞ Z Ω ∂ t ψ ρ ∗ d x d t + Z Ω ψ (0) ρ d x. The plan of the paper is as follows. In section 2 we recall the basic notation of the theoretical framework ofthe gradient ﬂow formulation of the non-linear Fokker-Planck equation, of our particular time-discretizationand of BV (Ω)-spaces. Section 3 is concerned with basic properties of the augmented energy functional andof the approximation obtained by that scheme. In Section 4 we derive the discrete Euler-Lagrange equationsby means of a variation of the augmented energy functional along solutions to the continuity equation. Fromthese discrete Euler-Lagrange equations we derive BV (Ω)-regularity estimates. In Section 5 we complete theproof of the main theorem and prove the convergence of the approximation to the distributional solution ofthe non-linear Fokker-Planck equation (1.1).2. Setup and Assumptions

Gradient Flow Framework of the Non-linear Fokker-Planck equation.

Throughout the restof the paper Ω ⊂ R d is either equal to R d or some open and bounded domain with Lipschitz-continuousboundary ∂ Ω. By P (Ω) we will denote the set of probability measures on Ω. We say a sequence of measures( µ n ) n ∈ N converges narrowly to µ ∈ P (Ω) if and only iflim n →∞ Z Ω ψ d µ n ( x ) = Z Ω ψ d µ ( x ) for all ψ ∈ C b (Ω) , i.e., narrow convergence is equal to weak ∗ - convergence , which is induced by the pairing of the continuousand bounded functions C b (Ω) with the corresponding dual space of ﬁnite Borel measures M f (Ω).A curve µ : [0 , ∞ ) → P (Ω) is said to be L -absolutely continuous , we write µ ∈ AC (0 , ∞ ; ( P (Ω) , W )),if there exists a function A ∈ L loc (0 , ∞ ) such that W ( µ ( s ) , µ ( t )) ≤ Z ts A ( r ) d r for all 0 ≤ s ≤ t. SIMON PLAZOTTA

The corresponding energy functional F of the non-linear Fokker-Planck equation (1.1), deﬁned in (1.3),is the sum of three parts: internal energy U m ; external energy V ; interaction energy W . The internal energy U m is given byfor m = 1 : H ( µ ) := U ( µ ) := Z Ω ρ ( x ) log( ρ ( x )) d x, or for m > U m ( µ ) := 1 m − Z Ω ρ ( x ) m d x, where the measure µ is absolutely continuous with respect to the Lebesgue measure L d with density ρ , i.e., µ = ρ L d . For measures ρ which are singular with respect to the Lebesgue measure, we set U m ( ρ ) = ∞ . Thisconvention makes the internal energy U m lower semi-continuous with respect to narrow convergence, see [1].Therefore, by a slight abuse of notation, we shall always identify an absolutely continuous measure µ withits corresponding density ρ . The according proper domains of the H and U m are given by K := (cid:8) ρ ∈ P (Ω) | ρ log( ρ ) ∈ L (Ω) (cid:9) , K m := (cid:8) ρ ∈ P (Ω) | ρ m ∈ L (Ω) (cid:9) . Further, the external energy V and the interaction energy W are deﬁned via V ( ρ ) := Z Ω V ρ d x, W ( ρ ) := 12 Z Ω ( W ∗ ρ ) ρ d x := 12 Z Ω W ( x − y ) ρ ( y ) ρ ( x ) d x d y. For the rest of the paper our assumption on the external potential V and on the interaction kernel W readsas follows: Assumption 2.1.

Let the external potential V ∈ C (Ω) and the symmetric interactionkernel W ∈ C (cid:0) R d (cid:1) be bounded as follows: | V ( x ) | , | W ( x ) | , k∇ V ( x ) k , k∇ W ( x ) k ≤ d (cid:16) k x k (cid:17) . Note, this standard assumption guarantees that all integrals with respect to any measure ρ ∈ P (Ω) andwith integrands V , W , ∇ V , or ∇ W are well-deﬁned and ﬁnite. Further, the functionals V and W arecontinuous with respect to narrow convergence by this assumption [2].2.2. Discretization.

Similarly to [27], the

BDF2 penalization

Ψ : (0 , ∞ ) × ( P (Ω)) → R ∪ {∞} of theoriginal energy functional F is deﬁned byΨ( τ, η, ν ; · ) : P (Ω) → R ∪ {∞} ; Ψ( τ, η, ν ; ρ ) := 1 τ W ( ν, ρ ) − τ W ( η, ρ ) + F ( ρ ) . With this notation, given a time step size τ ∈ (0 , τ ∗ ) and a pair of initial data ( ρ − τ , ρ τ ), the discrete solution ( ρ kτ ) k ∈ N for F on ( P (Ω) , W ) deﬁned in (1.5) is equivalently deﬁned by the recursive formula ρ kτ ∈ argmin ρ ∈ P (Ω) Ψ( τ, ρ k − τ , ρ k − τ ; ρ ) for k ∈ N . (2.1)In the rest of the paper we approximate ρ for a given time step size τ ∈ (0 , τ ∗ ) by a pair of initial data( ρ − τ , ρ τ ) as follows: Assumption 2.2.

There are non-negative constants d , d such that for all τ ∈ (0 , τ ∗ ):(I1) W ( ρ − τ , ρ τ ) ≤ d τ and W ( ρ τ , ρ ) ≤ d τ .(I2) U m ( ρ − τ ) ≤ d and U m ( ρ τ ) ≤ d . Functions of Bounded Variation.

We recall the basic deﬁnitions and properties of functions ofbounded variation, following [17]. A function ρ ∈ L (Ω) is called a function of bounded variation if and onlyif V ( ρ, Ω) := sup (cid:26)Z Ω ρ ( x ) div ξ ( x ) d x | ξ ∈ C ∞ c (Ω , R d ) , k ξ k ∞ ≤ (cid:27) < ∞ . The set of all functions of bounded variation is denoted by BV (Ω) and can be equipped with the norm: k ρ k BV (Ω) = k ρ k L (Ω) + V ( ρ, Ω) . For open sets Ω ⊂ R d the set BV (Ω) is a Banach space and the norm is lower semi-continuous with respect tothe weak convergence in L (Ω). In case that Ω is an open and bounded set in R d with Lipschitz-continuousboundary ∂ Ω, sets of functions uniformly bounded in the BV (Ω)-norm are relatively compact in L (Ω), see[17, Theorem 1.19] for the statement and the proof. BDF2-APPROACH FOR THE NON-LINEAR FOKKER-PLANCK EQUATION 5 Well-posedness and Basic Properties of the BDF2 Scheme

Lower Bounds and Lower Semi-Continuity.

We establish the following two basic properties of theBDF2 penalization Ψ, which will be essential for the solvability of problem (2.1): Ψ( τ, η, ν ; · ) is boundedfrom below and lower semi-continuous. Lemma 3.1 (Lower Bound) . There exist a non-negative constant d such that the BDF2 penalization Ψ satisﬁes for each τ > and for all ρ, η, ν ∈ P (Ω) : Ψ( τ, η, ν ; ρ ) ≥ (cid:18) τ − d − d (cid:19) M ( ρ ) − τ M ( ν ) − τ M ( η ) − d − d . (3.1) Remark 3.2.

Without loss of generality we assume that τ ∗ < (12 d + 8 d ) − , (3.2)such that ρ Ψ( τ, η, ν ; ρ ) is bounded from below by a constant. Proof.

Without loss of generality we can assume ρ is an absolutely continuous measure with density ρ .Observe that H is not bounded from below by a constant on P (Ω). However, we derive from the Carlemanestimate a lower bound of H in terms of the second moment M , see [18], i.e., there exist non-negativeconstants d ≥ γ ∈ ( dd +2 ,

1) such that U m ( ρ ) ≥ H ( ρ ) ≥ − d (1 + M ( ρ )) γ ≥ − d (1 + M ( ρ )) . (3.3)Since the external potential V and the interaction kernel W grow at most quadratically at inﬁnity, thecorresponding energies can be estimated from below in terms of the second moment M by V ( ρ ) + W ( ρ ) ≥ − d Z Ω (1 + k x k ) d ρ ( x ) − d Z Ω (1 + k x − y k ) d ρ ( x ) d ρ ( y ) = − d (1 + M ( ρ )) . From the elementary inequality k x k − k y k ≤ k x − y k ≤ k x k + 6 k y k and from the deﬁnition of W it follows immediately M ( ρ ) − M ( ν ) ≤ W ( ρ, ν ) ≤ M ( ρ ) + 6 M ( ν ) for all ρ, ν ∈ P (Ω) . (3.4)Combining all three inequalities, we can deduce the following lower bound:Ψ( τ, η, ν ; ρ ) ≥ τ M ( ρ ) − τ M ( ν ) − τ M ( ρ ) − τ M ( η ) − d (1 + M ( ρ )) − d (1 + M ( ρ )) , which is equivalent to the desired inequality (3.1). (cid:3) Lemma 3.3 (Lower semi-continuity) . For each τ > and for all η, ν ∈ P (Ω) the BDF2 penalization Ψ( τ, η, ν ; · ) is lower semi-continuous with respect to narrow convergence.Proof. Due to the lower semi-continuity with respect to narrow convergence of the internal energy U m , theexternal potential V , and the interaction energy W , the energy F is also lower semi-continuous with respectto narrow convergence as sum of lower semi-continuous functions.Thus it remains to prove the lower semi-continuity of the auxiliary functional A : P (Ω) → R , deﬁnedvia A ( ρ ) := 4 W ( ν, ρ ) − W ( η, ρ ) . First, we simplify the auxiliary functional A . Let p ∈ Γ( ρ, ν ) and p ∈ Γ( ρ, η ) be two optimal transportplans. Further, introduce the special three-plan p ∈ Γ( ρ, ν, η ) := { p ∈ P (Ω × Ω × Ω) : ( π ) p = ρ, ( π ) p = ν, ( π ) p = η } such that p has marginal with respect to the x - and y -components equals to p and themarginal with respect to the x - and z -components is equal to p , i.e., ( π (1 , ) p = p and ( π (1 , ) p = p .The existence of such a three-plan is guaranteed by the gluing lemma, see [2, Lemma 5.3.2]. Then we canrewrite the auxiliary functional A as A ( ρ ) = Z Ω k x − y k d p ( x, y ) − Z Ω k x − z k d p ( x, z ) = Z Ω k x − y k − k x − z k d p ( x, y, z ) . (3.5)Now, let ( ρ n ) n ∈ N be a narrowly converging sequence with limit ρ ∗ ∈ P (Ω). Since ( ρ n ) n ∈ N is narrowlyconverging to ρ ∗ , the sequences ( p n ) n ∈ N and ( p n ) n ∈ N are relatively compact in P (Ω ) with respect to narrowconvergence and any limit point is an optimal transport plan, see [2, Proposition 7.1.3]. Thus we can extract SIMON PLAZOTTA a non-relabelled subsequence such that ( p n ) n ∈ N and ( p n ) n ∈ N converge narrowly to an optimal transportplan p ∗ ∈ Γ( ρ ∗ , ν ) and to an optimal transport plan p ∗ ∈ Γ( ρ ∗ , η ), respectively. By the same argument,the sequence ( p n ) n ∈ N of three-plans is relatively compact in P (Ω ) with respect to narrow convergence.Therefore we can extract a further non-relabelled subsequence such that ( p n ) n ∈ N narrowly converges to somethree-plan p ∗ ∈ Γ( ρ ∗ , ν, η ). Taking marginals is continuous with respect to narrow convergence, so we have( π (1 , ) p ∗ = p ∗ and ( π (1 , ) p ∗ = p ∗ , i.e., this limit three-plan p ∗ is admissible in (3.5).Next, we want to apply the lower semi-continuity result [2, Lemma 5.1.7] to the alternative representationof A . The uniform integrability of the negative part of the integrand in (3.5) with respect to ( p n ) n ∈ N in thesense of [2] follows by the elementary inequality4 k x − y k − k x − z k ≥ k x k − k y k − k z k ≥ − (cid:16) k y k + k z k (cid:17) . Thus the lower bound on 4 k x − y k − k x − z k is independent of x . Since the second moments of ν and η are ﬁnite that diﬀerence is uniform integrable with respect to the family ( p n ) n ∈ N . Hence, we can invoke [2,Lemma 5.1.7] to conclude Z Ω k x − y k − k x − z k d p ∗ ( x, y, z ) ≤ lim inf n →∞ Z Ω k x − y k − k x − z k d p n ( x, y, z ) . Therefore the auxiliary function ρ

7→ A ( ρ ) = 4 W ( ν, ρ ) − W ( η, ρ ) is lower semi-continuous with respect tonarrow convergence. (cid:3) Existence of Minimizer.

Recall that the well-posedness of a single step of the BDF2 scheme isequivalent to the existence of a minimizer in (2.1). The augmented energy functional Ψ shares no uniformsemi-convexity as in the case of [27], so we cannot exploit the convexity to ensure the existence of a minimizer.Nevertheless, a standard technique from the calculus of variations yields the existence of a minimizer.

Theorem 3.4 (Existence of a minimizer) . For each τ ∈ (0 , τ ∗ ) and for all η, ν ∈ P (Ω) , there exists anabsolutely continuous minimizer ρ ∗ ∈ K m of the map ρ Ψ( τ, η, ν ; ρ ) .Proof. Take a minimizing sequence ( ρ n ) n ∈ N for the BDF2 penalization ρ Ψ( τ, η, ν ; ρ ). To extract aconvergent subsequence, we use the auxiliary inequality (3.1). Since τ < τ ∗ , the pre-factor of the secondmoment M ( ρ ) in (3.1) is positive. Hence, the second moment ( M ( ρ n )) n ∈ N of the minimizing sequence( ρ n ) n ∈ N is bounded. Also the internal energy U m ( ρ n ) of the minimizing sequence is bounded, since U m ( ρ n ) ≤ Ψ( τ, η, ν ; ρ n ) + 14 τ W ( η, ρ n ) − V ( ρ n ) − W ( ρ n ) ≤ sup n ∈ N [Ψ( τ, η, ν ; ρ n ) + C (1 + M ( ρ n ))] < ∞ . Due to the super-linear growth of ρ ρ log( ρ ) and of ρ ρ m , we can apply the Dunford-Pettis Theorem tothe densities ( ρ n ) n ∈ N and we can extract a non-relabelled subsequence ( ρ n ) n ∈ N converging weakly in L (Ω).Since C b (Ω) ⊂ L ∞ (Ω) = ( L (Ω)) ∗ , in this case we can deduce from the weak convergence in L (Ω) of thesequence of densities the narrow convergence of the corresponding measures. Summarized, the sequence( ρ n ) n ∈ N also converges narrowly to an absolutely continuous measure ρ ∗ ∈ P (Ω) with density ρ ∗ . By thelower semi-continuity of the L m (Ω)-norm with respect to narrow convergence it follows ρ ∗ ∈ K m .To prove that ρ ∗ is indeed a minimizer we use the lower semi-continuity of the BDF2 penalization Ψ,proven in Lemma 3.3, to concludeΨ( τ, η, ν ; ρ ∗ ) ≤ lim inf n →∞ Ψ( τ, η, ν ; ρ n ) = inf ρ ∈ P (Ω) Ψ( τ, η, ν ; ρ ) . Indeed, the limit measure with density ρ ∗ is a minimizer of the BDF2 penalization Ψ( τ, η, ν ; · ). (cid:3) Step size independent estimates.

By the previous theorem, the sequence ( ρ kτ ) k ∈ N given by the BDF2method is well-deﬁned for τ ∈ (0 , τ ∗ ). Next, we deduce three step size independent bounds: on the kineticenergy, on the internal energy, and on the second moment . We want to emphasize that these estimates areintrinsic properties of the scheme, which do not rely on any uniform semi-convexity of the augmented energyfunctional Ψ. The original proof of those estimates can be found in [27] and for the sake of the completenesswe recall a proof in Appendix A adapted to the L -Wasserstein formalism. BDF2-APPROACH FOR THE NON-LINEAR FOKKER-PLANCK EQUATION 7

Theorem 3.5 (Classical estimates) . Fix a time horizon

T > . There exists a constant C , depending onlyon d to d and T , such that the corresponding discrete solutions ( ρ kτ ) k ∈ N satisfy N X k =0 τ W ( ρ k − τ , ρ kτ ) ≤ C, |U m ( ρ Nτ ) | ≤ C, M ( ρ Nτ ) ≤ C, (3.6) for all τ ∈ (0 , τ ∗ ) and for all N ∈ N with N τ ≤ T .Proof. The proof of this theorem is given in Appendix A. (cid:3)

Narrow Convergence.

We are able to prove our ﬁrst weak convergence results. The step size inde-pendent bounds (3.6) and the Arzel`a-Ascoli theorem, which can be found in [2, Proposition 3.3.1], guaranteethe narrow convergence of the interpolated solution ρ τ . Theorem 3.6 (Narrow convergence in P (Ω)) . Given a vanishing sequence ( τ n ) n ∈ N of step sizes τ n ∈ (0 , τ ∗ ) . Then, there exists a (non-relabelled) subsequence ( τ n ) n ∈ N and a L -absolutely continuous limit curve ρ ∗ ∈ AC (0 , ∞ ; ( P (Ω) , W )) such that for any t ≥ : ρ τ n ( t ) ⇀ ρ ∗ ( t ) narrowly in the space P (Ω) as n → ∞ . Proof.

Fix

T > A n ∈ L (0 , T ), also called discrete derivative, as A n ( t ) := W ( ρ k − τ n , ρ kτ n ) τ n for t ∈ (( k − τ n , kτ n ] and k ∈ N . Using the step size independent bounds (3.6) we obtain for N T = max { N | N τ n ≤ T } : Z T A n ( t ) d t ≤ N T X k =1 Z kτ n ( k − τ n W ( ρ k − τ n , ρ kτ n ) τ n ! d t = N T X k =1 W ( ρ k − τ n , ρ kτ n ) τ n ≤ C. Indeed, A n ∈ L (0 , T ) and the L (0 , T )-norm of A n is uniformly bounded independently of the step size τ n .Therefore, the sequence A n possesses a non-relabelled subsequence weakly convergent in L (0 , T ) with limit A ∈ L (0 , T ). To derive an uniform H¨older-estimate for ρ τ n , choose 0 ≤ s ≤ t ≤ T arbitrary and deﬁne k t = max { k ∈ N | kτ n ≤ t } , then W ( ρ τ n ( s ) , ρ τ n ( t )) ≤ k t X k = k s +1 W ( ρ k − τ n , ρ kτ n ) = k t X k = k s +1 Z kτ n ( k − τ n W ( ρ k − τ n , ρ kτ n ) τ n d t ≤ Z t ( s − τ n ) + A n ( t ) d t. (3.7)Taking the limit n → ∞ yields, together with the weak convergence in L (0 , T ) of A n to A ,lim sup n →∞ W ( ρ τ n ( s ) , ρ τ n ( t )) ≤ Z ts A ( r ) d r. Moreover, the second moments of the discrete solutions ( ρ kτ n ) k ∈ N are uniformly bounded independently of thestep size τ n and therefore the interpolated solutions ρ τ n is uniformly contained in a set K which is compactwith respect to narrow convergence. Hence, we can apply the Arzel`a-Ascoli Theorem [2, Proposition 3.3.1]yielding the existence of a non-relabelled subsequence and a limit curve ρ ∗ : [0 , T ] → P (Ω) such that ρ τ n ( t ) converges narrowly to ρ ∗ ( t ) for each ﬁxed t ∈ [0 , T ]. Additionally, the limit curve ρ ∗ is L -absolutelycontinuous with modulus of continuity A ∈ L (0 , T ). A further diagonal argument in T → ∞ yields thenarrow convergence on for any t ≥ ρ ∗ ∈ AC (0 , ∞ ; ( P (Ω) , W )). (cid:3) Discrete Euler Lagrange Equation and Improved Regularity

In theorem 4.1, we derive the discrete Euler-Lagrange equations for the weak formulation of the non-linearFokker-Planck equation (1.1). The key idea is the JKO-method introduced in [18], i.e., we determine the ﬁrstvariation of the augmented energy functional Ψ( τ, ρ k − τ , ρ k − τ ; · ) in the space ( P (Ω) , W ) along solutions tothe continuity equation ∂ s ρ s + div( ξ ρ s ) = 0 , ρ = ρ kτ , (4.1) SIMON PLAZOTTA for an arbitrary smooth vector ﬁeld ξ ∈ C ∞ c (Ω , R d ). The solution ρ s is explicitly given by the push-forwardof ρ kτ under the ﬂow Φ s , i.e., ρ s = (Φ s ) ρ kτ , such that the ﬂow Φ s satisﬁes the initial value problem:dd s Φ s ( x ) = ξ (Φ s ( x )) , Φ ( x ) = x. Note that the ﬂow Φ s exists and each Φ s is a diﬀeomorphism on Ω. Additionally, we can calculate thederivative of det(D Φ s ) and we have an explicit representations of the perturbed density ρ s , i.e.,dd s [det(D Φ s ( x ))] s =0 = tr(D ξ ◦ Φ ) = div( ξ ) , and det(D Φ s ) ρ s ◦ Φ s = ρ kτ . (4.2) Theorem 4.1 (Discrete Euler-Lagrange equations) . The discrete solution (cid:0) ρ kτ (cid:1) k ∈ N obtained by the BDF2method satisﬁes for each k ∈ N and for all vector ﬁelds ξ ∈ C ∞ c (Ω , R d )0 = Z Ω − div( ξ ) ( ρ kτ ) m + h ξ, ∇ V i ρ kτ + h ξ, ∇ W ∗ ρ kτ i ρ kτ d x + 2 τ Z Ω h ξ ( x ) , x − y i d p kτ ( x, y ) − τ Z Ω h ξ ( x ) , x − z i d q kτ ( x, z ) , (4.3) where p kτ ∈ Γ( ρ kτ , ρ k − τ ) and q kτ ∈ Γ( ρ kτ , ρ k − τ ) are optimal transport plans.Proof. Fix ρ kτ , ρ k − τ , ρ k − τ and ξ ∈ C ∞ c (Ω , R d ). We consider the perturbation ρ s of ρ kτ as the solution of thecontinuity equation with velocity ﬁeld ξ starting at ρ kτ , i.e., ρ s is the solution of (4.1). To actually computethe ﬁrst variation of the heat energy H we use exact value of the derivative of det(D Φ s ) and the explicitrepresentation of the perturbed density ρ s , given in (4.2), to obtain as limit of the diﬀerence quotientdd s [ H ( ρ s )] s =0 = lim s → s (cid:0) H ( ρ s ) − H ( ρ kτ ) (cid:1) = − lim s → Z Ω s log(det(D Φ s ( x ))) ρ kτ ( x ) d x = − Z Ω div( ξ ) ρ kτ d x. Similarly, we can compute the ﬁrst variations of the internal energy U m for m >

1, the external potential V , and the interaction energy W . The ﬁrst variation of the energy F along the solution to the continuityequation amounts todd s [ F ( ρ s )] s =0 = Z Ω − div( ξ ) ( ρ kτ ) m + h ξ, ∇ V i ρ kτ + h ξ, ∇ W ∗ ρ kτ i ρ kτ d x. (4.4)The diﬀerentiability of the quadratic L -Wasserstein distance W along the solution ρ s of the continuityequation is more technical, for the proof we refer to [2, 36]. Since ρ k − τ , ρ k − τ , ρ kτ , ρ s are all absolutely continuousmeasures, Theorem 8.13 from [36] is applicable and we can conclude:dd s h W ( ρ k − τ , ρ s ) − W ( ρ k − τ , ρ s ) i s =0 = 8 Z Ω h ξ ( x ) , x − y i d p kτ ( x, y ) − Z Ω h ξ ( x ) , x − z i d q kτ ( x, z ) , (4.5)where p kτ ∈ Γ( ρ kτ , ρ k − τ ) and q kτ ∈ Γ( ρ kτ , ρ k − τ ) are optimal transport plans. Since ρ kτ is a minimizer of theBDF2 penalization Ψ( τ, ρ k − τ , ρ k − τ ; · ) and since s Ψ( τ, ρ k − τ , ρ k − τ ; ρ s ) is diﬀerentiable at s = 0,0 = dd s h Ψ( τ, ρ k − τ , ρ k − τ ; ρ s ) i s =0 = 14 τ dd s h W ( ρ k − τ , ρ s ) − W ( ρ k − τ , ρ s ) i s =0 + dd s [ F ( ρ s )] s =0 = 2 τ Z Ω h ξ ( x ) , x − y i d p kτ ( x, y ) − τ Z Ω h ξ ( x ) , x − z i d q kτ ( x, z )+ Z Ω − div( ξ ) ( ρ kτ ) m + h ξ, ∇ V i ρ kτ + h ξ, ∇ W ∗ ρ kτ i ρ kτ d x. Indeed, we have the desired equality (4.3). (cid:3)

The already obtained regularity results for the interpolated solution ρ τ are not suﬃcient to pass to thelimit in the ﬁrst term of the discrete Euler-Lagrange equation (4.3). Nevertheless, the following bounds inthe BV (Ω)-norm of ( ρ kτ ) m are suﬃcient to obtain the desired regularity results. These estimates can bederived from the discrete Euler-Lagrange equation quite naturally. BDF2-APPROACH FOR THE NON-LINEAR FOKKER-PLANCK EQUATION 9

Proposition 4.2 (Step size independent BV (Ω)-estimate) . Fix a time horizon

T > . There exists aconstant C , depending only on d to d and T , such that the corresponding discrete solutions ( ρ kτ ) k ∈ N satisfyfor all τ ∈ (0 , τ ∗ ) and for all k ∈ N with kτ ≤ T : (cid:13)(cid:13) ( ρ kτ ) m (cid:13)(cid:13) BV (Ω) ≤ C (cid:18) W ( ρ kτ , ρ k − τ ) τ + W ( ρ kτ , ρ k − τ ) τ (cid:19) . (4.6) Proof.

The L (Ω)-norm of ( ρ kτ ) m is equal to ( m − U m evaluated at ρ kτ . Hence, we can bound the ﬁrstterm in the deﬁnition of the BV (Ω)-norm uniformly by the classical estimates (3.6). In order to estimatethe variation of ( ρ kτ ) m we estimate the term inside the supremum of the deﬁnition of V (( ρ kτ ) m , Ω). Thus let ξ ∈ C ∞ c (Ω , R d ) with k ξ k ∞ ≤

1, then we can use the discrete Euler-Lagrange equations (4.3) to substitute Z Ω ( ρ kτ ) m div( ξ ) d x = Z Ω h ξ ( x ) , ∇ V i ρ kτ ( x ) + h ξ ( x ) , ∇ W ∗ ρ kτ i ρ kτ ( x ) d x + 2 τ Z Ω h ξ ( x ) , x − y i d p kτ ( x, y ) − τ Z Ω h ξ ( x ) , x − z i d q kτ ( x, z ) . (4.7)By Assumption 2.1 we have quadratic growth bounds for ∇ V and ∇ W , so using the step size independentbounds on the second moment (3.6), we can estimate the ﬁrst terms in (4.7) as follows: Z Ω h ξ ( x ) , ∇ V i ρ kτ ( x ) + h ξ ( x ) , ∇ W ∗ ρ kτ i ρ kτ ( x ) d x ≤ d k ξ k ∞ (1 + M ( ρ kτ )) ≤ d (1 + C ) . The second integral on the right-hand side of (4.7) can be estimated using Jensen’s inequality (cid:12)(cid:12)(cid:12)(cid:12)Z Ω h ξ ( x ) , x − y i d p kτ ( x, y ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ k ξ k ∞ (cid:18)Z Ω k x − y k d p kτ ( x, y ) (cid:19) / ≤ W ( ρ kτ , ρ k − τ ) , and similar for the third integral of the right-hand side of (4.7). Hence, we have the following upper boundfor the variation of ( ρ kτ ) m : V (( ρ kτ ) m , Ω) ≤ C (cid:18) W ( ρ kτ , ρ k − τ ) τ + W ( ρ kτ , ρ k − τ ) τ (cid:19) . In conclusion, the discrete solution ( ρ kτ ) k ∈ N satisﬁes the desired bound (4.6). (cid:3) Theorem 4.3 (Step size independent L (0 , T ; BV (Ω))-estimate) . Fix a time horizon

T > . There exists aconstant C , depending only on d to d and T , such that the corresponding interpolated solution ρ τ satisﬁesfor each τ ∈ (0 , τ ∗ ) : k ( ρ τ ) m k L (0 ,T ; BV (Ω)) ≤ C. (4.8) Proof.

We use the classical estimates on the kinetic energy (3.6) and the result from Proposition 4.2 toestimate the L (0 , T ; BV (Ω))-norm of ( ρ τ ) m . Let N T := max { N ∈ N | N τ ≤ T } , then we have k ( ρ τ ) m k L (0 ,T ; BV (Ω)) ≤ N T +1 X k =1 Z kτ ( k − τ (cid:13)(cid:13) ( ρ kτ ) m (cid:13)(cid:13) BV (Ω) d t ≤ C N T +1 X k =1 τ (cid:18) W ( ρ kτ , ρ k − τ ) τ + W ( ρ kτ , ρ k − τ ) τ (cid:19) . By the triangle inequality W ( ρ kτ , ρ k − τ ) ≤ W ( ρ kτ , ρ k − τ ) + W ( ρ k − τ , ρ k − τ ) in combination with a Cauchy typeinequality we obtain k ( ρ τ ) m k L (0 ,T ; BV (Ω)) ≤ C N T +1 X k =1 (cid:20) τ + W ( ρ kτ , ρ k − τ ) τ + W ( ρ k − τ , ρ k − τ ) τ (cid:21) ≤ C ( T + τ ) + C N T +1 X k =0 W ( ρ kτ , ρ k − τ ) τ . Finally, we can conclude, under the step size independent bounds on the kinetic energy (3.6), the desiredestimate (4.8) for some universal constant C , which only depends on d to d and T , but not on the stepsize τ ∈ (0 , τ ∗ ). (cid:3) Convergence

In this section we prove our main theorem, the strong convergence of the approximation ρ τ to the solutionof the non-linear Fokker-Planck equation. The convergence in the strong L m (0 , T ; L m (Ω))-topology followsby the improved L (0 , T ; BV (Ω))-estimates (4.8) and by a general version of the Aubin-Lions Theorem [32,Theorem 2], which is recalled in Appendix B. Theorem 5.1 (Strong convergence in L m (0 , T ; L m (Ω))) . Under the same assumptions as in Theorem 3.6and given the vanishing sequence ( τ n ) n ∈ N of step sizes τ n ∈ (0 , τ ∗ ) and the limit curve ρ ∗ therein, then thereexists a further (non-relabelled) subsequence ( τ n ) n ∈ N such that for all T > :(a) In the case of an open and bounded set Ω ⊂ R d with Lipschitz-continuous boundary ∂ Ω , we have: ρ τ n ( t ) → ρ ∗ ( t ) in L m (0 , T ; L m (Ω)) as n → ∞ . (b) In the case of the entire space, i.e., Ω = R d , we have for every open and bounded set e Ω ⋐ R d : ρ τ n ( t ) → ρ ∗ ( t ) in L m (0 , T ; L m ( e Ω)) as n → ∞ . Proof of Theorem 5.1 for Ω is open and bounded with Lipschitz-continuous boundary ∂ Ω . Fix

T >

0. In or-der to prove the strong convergence result we use the Aubin-Lions Theorem B.1 with the underlying Banachspace X = L m (Ω). We consider the functional A : L m (Ω) → R , deﬁned via A ( ρ ) := ( k ρ m k BV (Ω) if ρ ∈ P (Ω) and ρ m ∈ BV (Ω) , + ∞ else . Using the remark in the introductory section about functions of bounded variations it follows that thefunctional A is measurable, lower semi-continuous with respect to the L m (Ω)-topology, and has compactsublevels. Next, we choose as pseudo-distance g = W on L m (Ω). The L -Wasserstein distance is lowersemi-continuous with respect to the L m (Ω)-topology and clearly compatible with A .Next, we verify the assumption (B.1) on ( ρ τ n ) n ∈ N of Theorem B.1. By the L (0 , T ; BV (Ω))-estimates ofTheorem 4.3 it is clear, that the sequence ( ρ τ n ) n ∈ N is tight with respect to A , since we have:sup n ∈ N Z T (cid:13)(cid:13) ( ρ τ n ( t )) m (cid:13)(cid:13) BV (Ω) d t = sup n ∈ N (cid:13)(cid:13) ( ρ τ n ) m (cid:13)(cid:13) L (0 ,T ; BV (Ω)) ≤ C < ∞ . For the proof of the relaxed averaged weak integral equicontinuity condition of ( ρ τ n ) n ∈ N with respect to W ,we use the auxiliary function A n and the estimate (3.7) from the proof of weak convergence results to obtain: Z T − t W ( ρ τ n ( s + t ) , ρ τ n ( s )) d s ≤ Z T − t Z s + t ( s − τ n ) + A n ( r ) d r d s ≤ ( t + τ n ) Z T A n ( r ) d r. Indeed, using the weak convergence in L of A n to some A ∈ L loc (0 , ∞ ) it followslim inf h ց lim sup n →∞ h Z h Z T − t W ( ρ τ n ( s + t ) , ρ τ n ( s )) d s d t ≤ lim inf h ց lim sup n →∞ h Z h ( t + τ n ) d t Z T A n ( t ) d t = 0 . Therefore, we can conclude that there exists a non-relabeled subsequence ( τ n ) n ∈ N such that ρ τ n converges in M (0 , T ; L m (Ω)) to some curve ρ + . Due to the uniform bounds in L ∞ (0 , T ; L m (Ω)), we obtain by a dominatedconvergence argument also convergence in L m (0 , T ; L m (Ω)) as desired. Moreover, the limit curves ρ + and ρ ∗ have to coincide, since ρ τ n converges also in measure to ρ + and ρ ∗ , so both limits have to be equal. (cid:3) In the case of Ω = R d we have to alter the proof given above, since the embedding of BV ( R d ) into L ( R d )is not compact anymore. So we restrict ourself to the open and bounded sets e Ω = B R (0). This subset isclearly open and bounded with Lipschitz-continuous boundary ∂ e Ω, so the embedding of BV ( e Ω) into L ( e Ω)is compact again.

Proof of Theorem 5.1 for

Ω = R d . Fix

T >

0. Without loss of generality we can assume e Ω = B R (0),since every open and bounded subset ˆΩ ⋐ R d is contained in a ball with radius R and convergence in L m (0 , T ; L m ( B R (0))) implies convergence in L m (0 , T ; L m ( ˆΩ)). BDF2-APPROACH FOR THE NON-LINEAR FOKKER-PLANCK EQUATION 11

As before, we want to use the Aubin-Lions Theorem B.1 for the Banach space L m ( e Ω) equipped with thenatural topology induced by the L m ( e Ω)-norm applied to ( ρ τ n (cid:12)(cid:12) e Ω ) n ∈ N , the restriction of the density ρ τ n tothe subspace e Ω. In this case we consider the functional e A : L m ( e Ω) → R , deﬁned via e A ( ρ ) := ( k ρ m k BV ( e Ω) if ρ ∈ M f ( e Ω) and ρ m ∈ BV ( e Ω) , + ∞ else . Now, the functional e A is measurable, lower semi-continuous with respect to the L m ( e Ω) topology, and hascompact sublevels. Since e A ( ρ | e Ω ) ≤ A ( ρ ), we obtain by the same calculations as above the tightness of( ρ τ n (cid:12)(cid:12) e Ω ) n ∈ N with respect to e A .Since the measure ρ | e Ω does not have unit mass anymore, we cannot consider the L -Wasserstein distance W as pseudo-distance anymore. However, we can use the following pseudo-distance e g : e g ( ρ, ν ) := inf { W ( e ρ, e ν ) | e ρ ∈ Σ( ρ ) , e ν ∈ Σ( ν ) } , Σ( ρ ) := (cid:8)e ρ ∈ P ( R d ) | e ρ | Ω = ρ, M ( e ρ ) ≤ C (cid:9) , where C is the constant from the classical estimates (3.6) for the speciﬁc T . Since Σ( ρ ) and Σ( ν ) are compactsets with respect to the narrow topology, the inﬁmum is attained at some pair e ρ ∗ , e ν ∗ . The pseudo-distance e g is compatible with e A , i.e., if ρ m , ν m ∈ BV ( e Ω) and e g ( ρ, ν ) = 0 then ρ = ν a.e. on e Ω. The lower semi-continuity of the pseudo-distance e g with respect to the L m ( e Ω)-topology can be proven as follows. Choose toconvergent sequences ρ n → ρ and ν n → ν in L m ( e Ω) with sup n e g ( ρ n , ν n ) < ∞ . By the remark from above,there exists e ρ n , e ν n such that e g ( ρ n , ν n ) = W ( e ρ n , e ν n ). Since the second moments are by deﬁnition of Σ( ρ )uniformly bounded, we can extract a non-relabeled convergent subsequence which converges narrowly to e ρ ∈ Σ( ρ ) , e ν ∈ Σ( ν ). By the lower semi-continuity of W with respect to narrow convergence, we get in theend e g ( ρ, ν ) ≤ W ( e ρ, e ν ) ≤ lim inf n →∞ W ( e ρ n , e ν n ) = lim inf n →∞ W ( ρ n , ν n ) . Therefore, the pseudo-distance e g is lower semi-continuous with respect to the L m ( e Ω)-topology. Thus, e g satisﬁes the assumptions of theorem B.1. Further, one has e g ( ρ | e Ω , ν | e Ω ) ≤ W ( ρ, ν ). Thus we derive, usingthe same proof as above, the equicontinuity of ( ρ τ n (cid:12)(cid:12) e Ω ) n ∈ N with respect to the pseudo-distance e g .Hence, we can conclude that there exists a non-relabeled subsequence of ρ τ n (cid:12)(cid:12) e Ω which converges in M (0 , T ; L m ( e Ω)) to some limit ρ + . As before, we use the uniform bounds in L ∞ (0 , T ; L m ( e Ω)), to obtainthe strong convergence in L m (0 , T ; L m ( e Ω)) by a dominated convergence argument. Moreover, the limitcurves ρ + and ρ ∗ | e Ω have to coincide on e Ω, since ρ τ n (cid:12)(cid:12) e Ω converges also in measure on e Ω to ρ + and ρ ∗ | e Ω ,so both limits have to be equal on e Ω. Two diagonal arguments in T → ∞ and R → ∞ yield the desiredconvergence result. (cid:3) To complete the proof of the main theorem 1.1, we have to validate that ρ ∗ is indeed a solution to (1.1)in the sense of distributions. Theorem 5.2 (Solution of the non-linear Fokker-Planck equation) . Under the same assumptions as inTheorem 5.1, consider the vanishing sequence ( τ n ) n ∈ N of step sizes τ ∈ (0 , τ ∗ ) and the limit curve ρ ∗ deﬁnedthere. The limit curve ρ ∗ is a solution to the non-linear Fokker-Planck equation with no-ﬂux boundarycondition (1.1) in the sense of distributions, i.e., we have for each test function ψ ∈ C ∞ c ([0 , ∞ ) × Ω) : Z ∞ Z Ω − ∆ ψ ρ m ∗ + h∇ ψ, ∇ V i ρ ∗ + h∇ ψ, ∇ W ∗ ρ ∗ i ρ ∗ d x d t = Z ∞ Z Ω ∂ t ψ ρ ∗ d x d t + Z Ω ψ (0) ρ d x. Proof.

For simplicity we drop the index n and write τ and τ ց

0. Fix ψ ∈ C ∞ c ([0 , ∞ ) × Ω) and let be

T > e Ω ⊂ Ω be open and bounded such that supp ψ ⊂ [0 , T ] × e Ω. Further, deﬁne the piecewise constantinterpolation ψ of ψ by ψ (0) = ψ (0) , ψ ( t ) = ψ ( kτ ) for t ∈ (( k − τ, kτ ] and k ∈ N . For each k ∈ N insert the smooth function x

7→ ∇ ψ (( k − τ, x ) in the discrete Euler-Lagrange equation(4.3) for the vector ﬁeld ξ ∈ C ∞ c (Ω). Summing the resulting equations from k = 1 to N T + 1 and multiplying with τ yields:0 = Z T Z e Ω − ∆ ψ ( ρ τ ) m + h∇ ψ, ∇ V i ρ τ + h∇ ψ, ∇ W ∗ ρ τ i ρ τ d x d t + N T X k =1 (cid:20) Z e Ω h∇ ψ (( k − τ, x ) , x − y i d p kτ ( x, y ) − Z e Ω h∇ ψ (( k − τ, x ) , x − z i d q kτ ( x, z ) (cid:21) =: I + I . Due to the strong convergence in L m (0 , T ; L m ( e Ω)) of ρ τ to ρ ∗ and due to the uniform convergence of ψ to ψ lim τ ց I = Z T Z e Ω − ∆ ψ ρ m ∗ + h∇ ψ, ∇ V i ρ ∗ + h∇ ψ, ∇ W ∗ ρ ∗ i ρ ∗ d x d t. To rewrite I , we use, as in [18], the second order Taylor expansion for a time independent function ψ , toobtain (cid:12)(cid:12)(cid:12)(cid:12)Z e Ω ψ ( y ) ρ k − τ ( y ) d y − Z e Ω ψ ( x ) ρ kτ ( x ) d x − Z e Ω h∇ ψ ( x ) , y − x i d p kτ ( x, y ) (cid:12)(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12)(cid:12)Z e Ω ψ ( y ) − ψ ( x ) − h∇ ψ ( x ) , y − x i d p kτ ( x, y ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ k Hess ψ k ∞ Z e Ω k x − y k d p kτ ( x, y )= 12 k Hess ψ k ∞ W ( ρ kτ , ρ k − τ ) . Replacing the time independent function ψ with ψ (( k − τ ) yields as approximation of I I = N T X k =1 (cid:20)Z e Ω ( 32 ρ kτ − ρ k − τ + 12 ρ k − τ ) ψ (( k − τ ) d x + O ( W ( ρ kτ , ρ k − τ )) + O ( W ( ρ kτ , ρ k − τ )) (cid:21) . We rearrange the sum of the ﬁrst term in I as follows N T X k =0 Z e Ω ( 32 ρ kτ − ρ k − τ + 12 ρ k − τ ) ψ (( k − τ ) d x = N T X k =0 Z e Ω ( 32 ψ (( k − τ ) − ψ ( kτ ) + 12 ψ (( k + 1) τ )) ρ kτ d x − Z e Ω (2 ψ (0) − ψ ( τ )) ρ τ − ψ (0) ρ − τ d x. Finally, use the fundamental theorem of calculus and the classical estimate (3.6) to bound the second termin I , to obtain I = − Z T Z e Ω ( 32 ∂ t ψ ( t ) − ∂ t ψ ( t + τ )) ρ τ d x d t − Z e Ω (2 ψ (0) − ψ ( τ )) ρ τ − ψ (0) ρ − τ d x + O ( τ ) . Indeed, combining the narrow convergence of ρ τ with the uniform convergence of ∂ t ψ ( t + τ ) to ∂ t ψ ( t ) andwith the narrow convergence of the initial data ( ρ τ , ρ − τ ) to ρ , the limit of I is given by:lim τ ց I = − Z T Z e Ω ∂ t ψ ( t ) ρ ∗ d x d t − Z e Ω ψ (0) ρ d x. Finally, we can conclude that for arbitrary test functions ψ ∈ C ∞ c ([0 , ∞ ) × Ω) the limit curve ρ ∗ satisﬁes : Z ∞ Z Ω − ∆ ψρ m ∗ + h∇ ψ, ∇ V i ρ ∗ + h∇ ψ, ∇ W ∗ ρ ∗ i ρ ∗ d x d t = Z ∞ Z Ω ∂ t ψ ρ ∗ d x d t + Z Ω ψ (0) ρ d x. This yields that ρ ∗ is a distributional solution to the non-linear Fokker-Planck equation (1.1). (cid:3) BDF2-APPROACH FOR THE NON-LINEAR FOKKER-PLANCK EQUATION 13

Appendix A. Proof of the Classical Estimates

In this part we ﬁll the technical gap in the proof of the step size independent bounds of Theorem 3.5. Theﬁrst result is an auxiliary inequality which will be used to derive the step size independent bounds. Despitethe auxiliary character of this inequality, we want to emphasize that this property is of interest by itself,since we can give a precise estimate of the energy decay of the BDF2 scheme in every step.

Lemma A.1 (Almost energy diminishing) . For each time step size τ ∈ (0 , τ ∗ ) the discrete solution ( ρ kτ ) k ∈ N satisﬁes F ( ρ kτ ) + 12 τ W ( ρ k − τ , ρ kτ ) ≤ F ( ρ k − τ ) + 14 τ W ( ρ k − τ , ρ k − τ ) (A.1) at each step k = 1 , , . . . .Proof. Since ρ kτ is a minimizer of ρ Ψ( τ, ρ k − τ , ρ k − τ ; ρ ), it satisﬁesΨ( τ, ρ k − τ , ρ k − τ ; ρ kτ ) ≤ Ψ( τ, ρ k − τ , ρ k − τ ; ρ )for all ρ ∈ P (Ω). For the speciﬁc choice ρ = ρ kτ , we obtain1 τ W ( ρ k − τ , ρ kτ ) − τ W ( ρ k − τ , ρ kτ ) + F ( ρ kτ ) ≤ − τ W ( ρ k − τ , ρ k − τ ) + F ( ρ k − τ ) . (A.2)By the triangle inequality and the binomial formula, W ( ρ k − τ , ρ kτ ) ≤ W ( ρ k − τ , ρ k − τ ) + 2 W ( ρ k − τ , ρ kτ ) . (A.3)Substitute this in the left-hand side of (A.2). This yields (A.1) (cid:3) Theorem A.2 (Classical estimates) . Fix a time horizon

T > . There exists a constant C , depending onlyon d to d and T , such that the corresponding discrete solutions ( ρ kτ ) k ∈ N satisfy N X k =0 τ W ( ρ k − τ , ρ kτ ) ≤ C, |U m ( ρ Nτ ) | ≤ C, M ( ρ Nτ ) ≤ C, for all τ ∈ (0 , τ ∗ ) and for all N ∈ N with N τ ≤ T .Proof of Theorem 3.5. Sum up inequalities (A.1) for k = 1 to K = N to obtain after cancellation: F ( ρ Nτ ) + 14 τ N X k =1 W ( ρ k − τ , ρ kτ ) ≤ F ( ρ τ ) + 14 τ W ( ρ − τ , ρ τ ) . (A.4)Next, we want to prove the auxiliary inequality M ( ρ kτ ) − M ( ρ k − τ ) ≤ W ( ρ k − τ , ρ kτ ) M ( ρ kτ ) . (A.5)Without loss of generality we assume M ( ρ k − τ ) ≥ M ( ρ kτ ), otherwise the equality is always true. We use thebinomial formula to obtain M ( ρ kτ ) − M ( ρ k − τ ) = ( M ( ρ kτ ) + M ( ρ k − τ ))( M ( ρ kτ ) − M ( ρ k − τ )) ≤ M ( ρ kτ )( M ( ρ kτ ) − M ( ρ k − τ ))Let δ be the discrete probability measure localized at x = 0, then we have by the triangle inequality M ( ρ ) = W ( ρ, δ ) ≤ W ( ρ, ν ) + W ( ν, δ ) = W ( ρ, ν ) + M ( ν ) . This yields (A.5). A Cauchy type inequality with (A.5) yields M ( ρ Nτ ) − M ( ρ τ ) = N X k =1 h M ( ρ kτ ) − M ( ρ k − τ ) i ≤ N X k =1 W ( ρ k − τ , ρ kτ ) M ( ρ kτ ) ≤ τ ∗ N X k =1 W ( ρ k − τ , ρ kτ ) τ + 4 ττ ∗ N X k =1 M ( ρ kτ ) . Substitute (A.4) into this inequality: M ( ρ Nτ ) ≤ M ( ρ τ ) + τ ∗ (cid:18) F ( ρ τ ) + 14 τ W ( ρ − τ , ρ τ ) − F ( ρ Nτ ) (cid:19) + 4 ττ ∗ N X k =1 M ( ρ kτ ) . The ﬁrst term of the right-hand side is estimated by M ( ρ τ ) ≤ W ( ρ τ , ρ ) + M ( ρ ) ≤ d √ τ + 2 M ( ρ ) . (A.6)The energy F evaluated at ρ τ is estimated using Assumptions 2.1 and 2.2 and estimate (A.6): F ( ρ τ ) = U m ( ρ τ ) + V ( ρ τ ) + W ( ρ τ ) ≤ d + 32 d (1 + M ( ρ τ )) ≤ d + 32 d (1 + 2 d √ τ + 2 M ( ρ )) . (A.7)A lower bound of the energy F evaluated at ρ Nτ is derived by the same way as in the prove of Lemma 3.1,i.e., there exist constants d and γ ∈ ( dd +1 ,

1) such that F ( ρ Nτ ) ≥ − d (1 + M ( ρ Nτ )) γ − d (1 + M ( ρ Nτ )) ≥ − ( d + 32 d )(2 + M ( ρ Nτ )) . Hence, there is a universal constant C , not depending on the step size τ , such that M ( ρ Nτ ) ≤ C + τ ∗ ( d + 32 d ) M ( ρ Nτ ) + 4 ττ ∗ N X k =1 M ( ρ kτ ) . We rearrange terms and use the deﬁnition of (3.2) to arrive at the time-discrete Gronwall inequality M ( ρ Nτ ) ≤ C + 8 ττ ∗ N X k =1 M ( ρ kτ ) . By induction on N we obtain M ( ρ Nτ ) ≤ C (cid:18) ττ ∗ (cid:19) N ≤ ˆ C exp (cid:18) N ττ ∗ (cid:19) ≤ ˆ C exp (cid:18) Tτ ∗ (cid:19) . So the second moments M ( ρ Nτ ) of the discrete solution are uniformly bounded independent of the step size τ ∈ (0 , τ ∗ ) and for all N ∈ N with N τ < T .The remaining estimates can be derived from this. An upper bound for the energy F ( ρ Nτ ) follows from(A.4) and (A.7) combined with Assumption 2.1. The lower bound on F ( ρ Nτ ) follows by the lower bounds on U m , V , and W in terms of the second moment. Hence, the boundedness of F ( ρ Nτ ) , V ( ρ Nτ ), and W ( ρ Nτ ) yieldsthe boundedness of U m ( ρ Nτ ). The upper bound for the kinetic energy follows from the lower bound for theenergy F ( ρ Nτ ), (A.4), and (A.7) combined with Assumption 2.1. (cid:3) Appendix B. Auxiliary theorems

The following theorem is an extension of the Aubin-Lions Theorem for Banach-spaces proven in [32]. Thistheorem is the main tool to prove the convergence of the interpolated solution.

Theorem B.1 (Extension of the Aubin-Lions Theorem) . Let X be a separable Banach space, A : X → R ∪ {∞} be measurable, lower semi-continuous and with compact sublevels in X , and g : X × X → R ∪ {∞} be lower semi-continuous and such that g ( u, v ) = 0 for u, v ∈ D ( A ) implies u = v . Let ( u n ) n ∈ N be a sequenceof measurable functions u n : (0 , T ) → X such that sup n ∈ N Z T A ( u n ( t )) d t < ∞ , lim h ց lim sup n →∞ h Z h Z T − t g ( u n ( s + t ) , u n ( s )) d s d t = 0 . (B.1) Then, ( u n ) n ∈ N possesses a subsequence converging in M (0 , T ; X ) to a limit curve u ∗ : [0 , T ] → X , i.e., lim n →∞ L ( { t ∈ (0 , T ) | k u n ( t ) − u ∗ ( t ) k X ≥ σ } ) = 0 for all σ > . Remark B.2.

Note we replaced the usual weak integral equicontinuity conditionlim h ց sup n ∈ N Z T − h g ( u n ( t + h ) , u n ( t )) d t = 0 , given in the original version of the theorem. In the proof it is suﬃcient to have the relaxed averaged weakintegral equicontinuity, given in the theorem above. BDF2-APPROACH FOR THE NON-LINEAR FOKKER-PLANCK EQUATION 15

References [1] L. Ambrosio, N. Fusco, and D. Pallara.

Functions of bounded variation and free discontinuity problems ,volume 254. Clarendon Press Oxford, 2000.[2] L. Ambrosio, N. Gigli, and G. Savar´e.

Gradient ﬂows in metric spaces and in the space of probabilitymeasures . Lectures in Mathematics ETH Z¨urich. Birkh¨auser Verlag, Basel, second edition, 2008.[3] J.-D. Benamou, G. Carlier, Q. M´erigot, and E. Oudet. Discretization of functionals involving themonge–amp`ere operator.

Numerische Mathematik , 134(3):611–636, 2016.[4] A. Blanchet, V. Calvez, and J. A. Carrillo. Convergence of the mass-transport steepest descent schemefor the subcritical Patlak–Keller–Segel model.

SIAM Journal on Numerical Analysis , 46(2):691–721,2008.[5] A. Blanchet and P. Lauren¸cot. The parabolic-parabolic Keller-Segel system with critical diﬀusion as agradient ﬂow in R d , d ≥ Comm. Partial Diﬀerential Equations , 38(4):658–686, 2013.[6] V. Calvez and T. Gallou¨et. Blow-up phenomena for gradient ﬂows of discrete homogeneous functionals. arXiv preprint arXiv:1603.05380 , 2016.[7] J. Carrillo, F. Patacchini, P. Sternberg, and G. Wolansky. Convergence of a particle method for diﬀusivegradient ﬂows in one dimension.

SIAM Journal on Mathematical Analysis , 48(6):3708–3741, 2016.[8] J. Carrillo, H. Ranetbauer, and M. Wolfram. Numerical simulations of nonlinear convectionaggregationequataions by evolving diﬀeomorphisms. preprint , 2015.[9] J. A. Carrillo, M. DiFrancesco, A. Figalli, T. Laurent, and D. Slep˘cev. Global-in-time weak measuresolutions and ﬁnite-time aggregation for nonlocal interaction equations.

Duke Math. J. , 156(2):229–271,02 2011.[10] J. A. Carrillo and J. S. Moll. Numerical simulation of diﬀusive and aggregation phenomena in nonlinearcontinuity equations by evolving diﬀeomorphisms.

SIAM Journal on Scientiﬁc Computing , 31(6):4305–4329, 2009.[11] M. Di Francesco, A. Esposito, and S. Fagioli. Nonlinear degenerate cross-diﬀusion systems with nonlocalinteraction.

Nonlinear Analysis , 169:94–117, 2018.[12] B. D¨uring, D. Matthes, and J. P. Miliˇsic. A gradient ﬂow scheme for nonlinear fourth order equations.

Discrete Contin. Dyn. Syst. Ser. B , 14(3):935–959, 2010.[13] M. Erbar et al. The heat equation on manifolds as a gradient ﬂow in the Wasserstein space. In

Annalesde l’Institut Henri Poincar´e, Probabilit´es et Statistiques , volume 46, pages 1–23. Institut Henri Poincar´e,2010.[14] T. Gallou¨et and Q. M´erigot. A Lagrangian scheme for the incompressible euler equation using optimaltransport. arXiv preprint arXiv:1605.00568 , 2016.[15] L. Giacomelli and F. Otto. Variatonal formulation for the lubrication approximation of the Hele-Shawﬂow.

Calculus of Variations and Partial Diﬀerential Equations , 13(3):377–403, 2001.[16] U. Gianazza, G. Savar´e, and G. Toscani. The Wasserstein gradient ﬂow of the Fisher information andthe quantum drift-diﬀusion equation.

Arch. Ration. Mech. Anal. , 194(1):133–220, 2009.[17] E. Giusti.

Minimal surfaces and functions of bounded variation , volume 80. Birkhauser Verlag, 1984.[18] R. Jordan, D. Kinderlehrer, and F. Otto. The variational formulation of the Fokker-Planck equation.

SIAM J. Math. Anal. , 29(1):1–17, 1998.[19] O. Junge, D. Matthes, and H. Osberger. A fully discrete variational scheme for solving nonlinearfokker-planck equations in higher space dimensions. arXiv preprint arXiv:1509.07721 , 2015.[20] D. Kinderlehrer, L. Monsaingeon, and X. Xu. A Wasserstein gradient ﬂow approach to Poisson-Nernst-Planck equations.

ESAIM: Control, Optimisation and Calculus of Variations , 2016.[21] L. Laguzet. High order variational numerical schemes with application to Nash -MFG vaccinationgames. working paper or preprint, 2016.[22] P. Lauren¸cot and B.-V. Matioc. A gradient ﬂow approach to a thin ﬁlm approximation of the Muskatproblem.

Calc. Var. Partial Diﬀerential Equations , 47(1-2):319–341, 2013.[23] G. Legendre and G. Turinici. Second-order in time schemes for gradient ﬂows in Wasserstein andgeodesic metric spaces.

Comptes Rendus Mathematique , 355(3):345 – 353, 2017.[24] S. Lisini, D. Matthes, and G. Savar´e. Cahn-Hilliard and thin ﬁlm equations with nonlinear mobility asgradient ﬂows in weighted-Wasserstein metrics.

J. Diﬀerential Equations , 253(2):814–850, 2012. [25] D. Matthes, R. J. McCann, and G. Savar´e. A family of nonlinear fourth order equations of gradientﬂow type.

Comm. Partial Diﬀerential Equations , 34(10-12):1352–1397, 2009.[26] D. Matthes and H. Osberger. Convergence of a variational Lagrangian scheme for a nonlinear driftdiﬀusion equation.

ESAIM: Mathematical Modelling and Numerical Analysis , 48(3):697–726, 2014.[27] D. Matthes and S. Plazotta. A variational formulation of the BDF2 method for metric gradient ﬂows. arXiv preprint arXiv:1711.02935 , 2017.[28] D. Matthes and B. S¨ollner. Convergent Lagrangian discretization for drift-diﬀusion with nonlocal ag-gregation. In

Innovative Algorithms and Analysis , pages 313–351. Springer, 2017.[29] D. Matthes and J. Zinsl. Existence of solutions for a class of fourth order cross-diﬀusion systems ofgradient ﬂow type.

Nonlinear Analysis , 159:316–338, 2017.[30] F. Otto. The geometry of dissipative evolution equations: the porous medium equation.

Comm. PartialDiﬀerential Equations , 26(1-2):101–174, 2001.[31] G. Peyr´e. Entropic approximation of Wasserstein gradient ﬂows.

SIAM Journal on Imaging Sciences ,8(4):2323–2351, 2015.[32] R. Rossi and G. Savar´e. Tightness, integral equicontinuity and compactness for evolution problemsin Banach spaces.

Annali della Scuola Normale Superiore di Pisa-Classe di Scienze-Serie V , 2(2):395,2003.[33] F. Santambrogio.

Optimal transport for applied mathematicians . Springer, 2015.[34] K.-T. Sturm. Convex functionals of probability measures and nonlinear diﬀusions on manifolds.

Journalde math´ematiques pures et appliqu´ees , 84(2):149–168, 2005.[35] C. M. Topaz, A. L. Bertozzi, and M. A. Lewis. A nonlocal continuum model for biological aggregation.

Bulletin of mathematical biology , 68(7):1601–1623, 2006.[36] C. Villani.

Topics in optimal transportation , volume 58 of

Graduate Studies in Mathematics . AmericanMathematical Society, Providence, 2003.[37] C. Villani.

Optimal transport: Old and New , volume 338. Springer Science & Business Media, 2008.[38] M. Westdickenberg and J. Wilkening. Variational particle schemes for the porous medium equation andfor the system of isentropic euler equations.

ESAIM: Mathematical Modelling and Numerical Analysis ,44(1):133–166, 2010.[39] J. Zinsl and D. Matthes. Exponential convergence to equilibrium in a coupled gradient ﬂow systemmodeling chemotaxis.

Analysis & PDE , 8(2):425–466, 2015.[40] J. Zinsl and D. Matthes. Transport distances and geodesic convexity for systems of degenerate diﬀusionequations.

Calculus of Variations and Partial Diﬀerential Equations , 54(4):3397–3438, 2015.

Zentrum f¨ur Mathematik, Technische Universit¨at M¨unchen, 85747 Garching, Germany

E-mail address ::