[PDF] The structure of global conservation laws in Galerkin plasma models

Abstract

Plasmas are highly nonlinear and multi-scale, motivating a hierarchy of models to understand and describe their behavior. However, there is a scarcity of plasma models of lower fidelity than magnetohydrodynamics (MHD). Galerkin models, obtained by projection of the MHD equations onto a truncated modal basis, can furnish this gap in the lower levels of the model hierarchy. In the present work, we develop low-dimensional Galerkin plasma models which preserve global conservation laws by construction. This additional model structure enables physics-constrained machine learning algorithms that can discover these types of low-dimensional plasma models directly from data. This formulation relies on an energy-based inner product which takes into account all of the dynamic variables. The theoretical results here build a bridge to the extensive Galerkin literature in fluid mechanics, and facilitate the development of physics-constrained reduced-order models from plasma data.

Full PDF

TThe structure of global conservation lawsin Galerkin plasma models

Alan A. Kaptanoglu

Department of Physics, University of Washington, Seattle, WA, 98195, USA

Kyle D. Morgan

Department of Aeronautics and Astronautics, University of Washington, Seattle, WA, 98195, USA

Chris J. Hansen

Department of Aeronautics and Astronautics, University of Washington, Seattle, WA, 98195, USA andDepartment of Applied Physics and Applied Mathematics,Columbia University, New York, NY, 10027, USA

Steven L. Brunton

Department of Mechanical Engineering, University of Washington, Seattle, WA, 98195, USA

Plasmas are highly nonlinear and multi-scale, motivating a hierarchy of models to understandand describe their behavior. However, there is a scarcity of plasma models of lower ﬁdelity thanmagnetohydrodynamics (MHD). Galerkin models, obtained by projection of the MHD equationsonto a truncated modal basis, can furnish this gap in the lower levels of the model hierarchy.In the present work, we develop low-dimensional Galerkin plasma models which preserve globalconservation laws by construction. This additional model structure enables physics-constrainedmachine learning algorithms that can discover these types of low-dimensional plasma models directlyfrom data. This formulation relies on an energy-based inner product which takes into account allof the dynamic variables. The theoretical results here build a bridge to the extensive Galerkinliterature in ﬂuid mechanics, and facilitate the development of physics-constrained reduced-ordermodels from plasma data.

I. INTRODUCTION

There are a tremendous number of known plasma mod-els of varying model complexity, from magnetohydro-dynamics (MHD) to the Klimontovich equations, but alarge gap exists in the lower rungs of this hierarchy be-tween simple circuit models and the many MHD variants.This is a valuable place for improvement because higherﬁdelity models often require computationally intensiveand high-dimensional simulations [1–3], obfuscating thedynamics and precluding model-based real-time control.Additionally, many high-dimensional nonlinear systemstend to evolve on low-dimensional attractors [4], deﬁnedby spatio-temporal coherent structures that character-ize the dominant behavior of the system. A number ofstudies in the plasma physics community indicate thatthe vast majority of the total plasma energy can be ex-plained by fewer than ten low-dimensional modes, acrossa large range of parameter regimes, geometry, and degreeof nonlinearity [5–13]. In these cases, the evolution ofonly a few coherent structures, obtained from systematicmodel-reduction techniques [14, 15], can closely approx-imate the full evolution of the high-dimensional physicalsystem.In the present work and our companion paper [16], weprovide a theoretical framework for physics-constrained,low-dimensional plasma models which furnish this im-portant gap in the hierarchy of plasma models. This is a breakthrough in principled and interpretable model re-duction, oﬀering an alternative to deep learning methodswhich often require vast quantities of data and produceopaque models. The present work focuses on the impor-tant technical details for constructing and constrainingthese models, while the accompanying publication con-tains an overview of our high-level contributions and ini-tial results on 3D plasma simulations.

II. LOW-DIMENSIONAL MODELS

Although there are many ways to obtain low-dimensional models, Galerkin methods and their exten-sions have seen remarkable success in ﬂuid mechanics;careful development of a dimensionalized inner productenabled the extension of the proper orthogonal decompo-sition (POD) from incompressible to compressible ﬂuidﬂows [17]. It is also common in ﬂuid mechanics to obtainnonlinear reduced-order models by Galerkin projection ofthe Navier-Stokes equations onto POD modes, making itpossible to enforce known symmetries and conservationlaws, such as conservation of energy [18–21].The present work adapts and extends these innovationsfor plasmas, enabling a wealth of advanced modeling andcontrol machinery. The POD is already used extensivelyfor interpreting plasma physics data across a range of pa-rameter regimes [22–26]. For clarity of presentation androbust connection with the Galerkin literature in ﬂuid a r X i v : . [ phy s i c s . p l a s m - ph ] J a n mechanics, we present results for MHD models which areat most quadratic in nonlinearity. This includes idealMHD, incompressible Hall-MHD, or compressible Hall-MHD with a slowly time-varying density, which togetherdescribe the dynamics of a fairly broad class of space andlaboratory plasmas [27–32]. A. An MHD energy inner product

Traditional use of the POD on the MHD ﬁelds (ve-locity, magnetic, and temperature) requires separate de-compositions for u , B , and T , or an arbitrary choice ofdimensionalization. However, separate decompositionsof the ﬁelds obfuscates the interpretability and increasesthe complexity of a low-dimensional model, and choosingthe units of the combined matrix of measurement datacan have a signiﬁcant impact on the performance andenergy spectrum of the resulting POD basis. Inspiredby the inner product deﬁned for compressible ﬂuids [17],we introduce an inner product for compressible magne-tohydrodynamic ﬂuids by using the conﬁguration vector q ( x , t ) = [ B u , B , B T ]. Here B u = √ ρµ u , B T = 2 (cid:112) ρµ k b T /m i ( γ − , (1)where u is the ﬂuid velocity, ρ is the mass density, k b is Boltzmann’s constant, µ is the permeability of freespace, T is the plasma temperature, m i is the ion mass, γ is the adiabatic index, and p = 2 ρT /m i is the plasmapressure. B u and B T are deﬁned so that the followingscaled inner product yields the total energy W,W = 12 µ (cid:104) q , q (cid:105) = (cid:90) (cid:32) ρu + B µ + pγ − (cid:33) d x . (2) B. Review of the POD

For the POD, measurements at time t k are arrangedin a vector q k ∈ R D , called a snapshot, where the di-mension D is the product of the number of spatial lo-cations and the number of variables measured at eachpoint. The data is sampled at times t , t , ..., t M , ar-ranged in a matrix X ∈ R D × M , and the average in time¯ q is subtracted oﬀ. The singular value decomposition(SVD) provides a low-rank approximation of the subse-quent data matrix X = time −−−−−−−−−−−−−−−−−−−−−−→  q ( t ) q ( t ) · · · q ( t M ) q ( t ) q ( t ) · · · q ( t M )... ... . . . ... q D ( t ) q D ( t ) · · · q D ( t M ) (cid:121) s t a t e = U Σ V ∗ , (3) where U ∈ R D × D and V ∈ R M × M are unitary matrices,and Σ ∈ R D × M is a diagonal matrix containing non-negative and decreasing entries s jj called the singularvalues of X . V ∗ denotes the complex-conjugate trans-pose of V . The singular values indicate the relativeimportance of the corresponding columns of U and V for describing the spatio-temporal structure of X . Al-though varying terminology is used in diﬀerent ﬁelds (inthe plasma physics community this method is often re-ferred to as the biorthogonal decomposition, or BOD),in practice the SVD, BOD, and POD are synonymous.It is often possible to discard small values of Σ , result-ing in a truncated matrix Σ r ∈ R r × r . With the ﬁrst r (cid:28) min( D, M ) columns of U and V , denoted U r and V r , the matrix X can be approximated as X ≈ U r Σ r V ∗ r . (4)The truncation rank r is typically chosen to balance ac-curacy and complexity [33]. The computational com-plexity of the SVD is O ( DM + M ) [34], although thereare randomized singular value decompositions [35–37]for extremely large problems that can be as fast as O ( DM log( r )). This computational speed enables theSVD to be computed online, for updating a model in areal-time control application, or oﬄine, for the decompo-sition of very large data, an examination of the physics,or development of a more generic model describing thedynamics of an amalgam of discharges. A well-deﬁnedSVD requires that the measurements in X have the samephysical dimensions. With a dimensionalized measure-ment vector q , the matrix X ∗ X may be computed via X ∗ X ≈ (cid:104) q ( t k ) , q ( t m ) (cid:105) . (5)When the number of snapshots is far fewer than the num-ber of measurements, M (cid:28) D , we can use the method ofsnapshots. Substitution of Eq. 4 into X ∗ X produces X ∗ XV r ≈ V r Σ r , (6)an eigenvalue equation for V r ; therefore we can obtain V r by diagonalizing X ∗ X ∈ R M × M instead of computing theSVD of X . The chronos are the temporal SVD modes,i.e. the columns of V r , denoted v j . The topos are thespatial modes forming the columns of U r , denoted χ .In the present work, we scale the normalized matrix ofchronos, a , through a jk = v jk / (cid:80) rj =1 max k | v jk | . Finally,the reconstruction can be written q ( x i , t k ) ≈ ¯ q ( x i )+ r (cid:88) j =1 χ j ( x i ) a j ( t k ) , (7)We have absorbed the normalization of a jk and the sin-gular values into the deﬁnition of χ j ( x i ). By construc-tion (cid:104) χ i , χ j (cid:105) ∝ δ ij . Note that, in principle, we could haveexpanded q in any set of modes, although orthonormalmodes are preferred because this property facilitates theanalysis in Section III. The advantage of the POD basis isthat the modes are ordered by energy content; a trunca-tion of the system still captures the vast majority of thedynamics. A separate POD of each of the MHD ﬁeldswould lead to three sets of POD modes with indepen-dent time dynamics and mixed orthogonality properties.In contrast, our approach captures all the ﬁelds simulta-neously, resulting in a single set of modes a i ( t ) in Eq. (7).An example of this decomposition is illustrated inFig. 1 for a 3D plasma simulation that is dominated byharmonics of a driving frequency. In general, examiningthe structure and symmetry in the spatial and tempo-ral POD modes can inform physical understanding. Forinstance, in Fig. 1, the short-wavelength structures ex-hibited in the 3D spatial modes derive both from dis-persive whistler waves via the Hall term and the smallcharacteristic scale associated with the driving actuator.The steep fall-oﬀ in the singular values also indicates thatmodels of only the ﬁrst few modes would be enough toaccurately forecast and control the dominant dynamics. C. POD-Galerkin models

While we have obtained a useful modal decompositionof the evolved ﬁelds, we have yet to derive a model forthe subsequent temporal evolution of the modes. Now wesubstitute Eq. (7) into a quadratically nonlinear MHDmodel, such as ideal MHD, incompressible Hall-MHD, orcompressible Hall-MHD with a slowly time-varying den-sity. Utilizing the orthonormality of the χ i produces:˙ a i ( t ) = C i + r (cid:88) j =1 L ij a j + r (cid:88) j,k =1 Q ijk a j a k , (8) C i = (cid:104) C + L ( ¯ q )+ Q ( ¯ q , ¯ q ) , χ i (cid:105) ,L ij = (cid:104) L ( χ j )+ Q ( ¯ q , χ j )+ Q ( χ j , ¯ q ) , χ i (cid:105) ,Q ijk = (cid:104) Q ( χ j , χ k ) , χ i (cid:105) . The inner products integrate out the spatial dependence,and the model is quadratic in the temporal POD modes a i ( t ). In contrast to Eq. (8), a Galerkin model based onseparate POD expansions for each ﬁeld would involve sig-niﬁcantly more complicated nonlinear terms from mixingand a lack of orthonormality (cid:104) χ u i , χ B j (cid:105) (cid:54) = δ ij between the diﬀerent POD modes associated with each ﬁeld. III. CONSTRAINTS ON MODEL STRUCTURE

Local and global MHD conservation laws are retainedin this low-dimensional basis. Vanishing ∇· B and theorthonormality of the temporal POD modes produce ∇· χ Bi = 0 , ∀ i. (9)In other words, the orthonormality of the temporalmodes guarantees that the local divergence constraintis satisﬁed by each of the χ Bi by construction. In con-trast, global energy conservation will produce strong con-straints on the structure of the coeﬃcients in Eq. 8. A. Global conservation of energy

For an examination of the global conservation laws, weconsider isothermal Hall-MHD with the assumption thatthe density is slowly-varying in time. This model reducesto ideal MHD and incompressible resistive or Hall MHDin the appropriate limits, and produces (Galtier [39] Eq.3.22) ∂W∂t = − (cid:90) (cid:20) ˜ ν ( ∇× u ) + ηµ ( ∇× B ) +43 ˜ ν ( ∇· u ) (cid:21) d x (10) − (cid:73) (cid:34)(cid:18) ρu + p (cid:19) u + P −

43 ˜ ν ( ∇· u ) u − ˜ ν u × ( ∇× u ) (cid:35) · ˆ n dS. Here ˆ n is a unit normal vector to the boundary, and P = 1 µ E × B = u e µ · ( B I − BB ) − ηµ ( ∇× B ) × B , (11)is the Poynting vector ( E is the electric ﬁeld), which isoften a known function of space and time. Omission ofthe Hall term changes u e to u . Even with a Hall-MHDmodel that evolves the temperature, the electron diamag-netic contribution to P does not change the energy bal-ance if Dirichlet boundary conditions are used for ρ and T . To simplify, we assume that u · ˆ n = u × ˆ n =0, J · ˆ n =0,and B · ˆ n =0 at the wall, consistent with the simulationsused in the accompanying work [16]. Moreover, we nowassume that we have a steady-state, deﬁne a constant a ( t )=1, and substitute Eq. (7) into Eq. (10).0 ≈ ∂W∂t = (cid:73) ηµ (( ∇× B ) × B ) · ˆ n dS − (cid:90) (cid:34) νµ ( ∇× B u − ∇ ρ ρ × B u ) + ηµ ( ∇× B ) + 43 νµ ( ∇· B u − ∇ ρ ρ · B u ) (cid:35) d x , (12)= W C + r (cid:88) i =1 W Li a i + r (cid:88) i,j =1 W Q ij a i a j = W C + r (cid:88) i =1 a i ( W Li + r (cid:88) j =1 W Qij a j )= W C + r (cid:88) i =1 r (cid:88) j =0 W Qij a i a j = r (cid:88) i =0 r (cid:88) j =0 W Qij a i a j , tω (a) Pairwise correlations of POD amplitudes (b) Spatial POD modes(c) Time evolutions a ( t ) and Fourier transforms ˜ a ( ω ) N o r m a li ze d s i n g u l a r v a l u e s Mode index10 − − − − − a a a a a a a a a a a a a a χ χ χ χ χ χ χ B x B y B z B ux B uy B uz a a a a a a a ˜ a ˜ a ˜ a ˜ a ˜ a ˜ a ˜ a FIG. 1: The ﬁrst seven POD modes for an isothermal, compressible Hall-MHD simulation of the HIT-SI device [38]:(a) Feature space trajectories of every mode pair and the singular values; (b) 3D spatial modes visualized in the Z = 0 midplane and normalized to ±

1; (c) Normalized temporal modes and corresponding Fourier transformsproduce harmonics of the driving frequency, labeled 1-5 in the Fourier space.where we have padded the matrix in the last step so that W Q i =0, W Qi = W Li for i ∈{ ,...,r } , and W Q = W C . Itimmediately follows from Eq. (12) that W Qij is an antisymmetric matrix. Writing out the coeﬃcients we have0= W Q00 = ηµ (cid:73) (cid:2) ( ∇× ¯ B ) × ¯ B (cid:3) · ˆ n dS − (cid:90) (cid:20) ν ( ∇× ¯ B u −∇ ρ ρ × ¯ B u ) + ηµ ( ∇× ¯ B ) +43 ν ( ∇· ¯ B u −∇ ρ ρ · ¯ B u ) (cid:21) d x , (13)0= W Q i = ηµ (cid:73) (cid:104) ( ∇× ¯ B ) × χ Bi +( ∇× χ Bi ) × ¯ B (cid:105) · ˆ n dS − (cid:90) (cid:20) ν ( ∇× ¯ B u −∇ ρ ρ × ¯ B u ) · ( ∇× χ B u i −∇ ρ ρ × χ B u i )+ ηµ ( ∇× ¯ B ) · ( ∇× χ Bi )+43 ν ( ∇· ¯ B u −∇ ρ ρ · ¯ B u ) · ( ∇· χ B u i −∇ ρ ρ · χ B u i ) (cid:21) d x ,W Q ij = ηµ (cid:73) (cid:104) ( ∇× χ Bi ) × χ Bj (cid:105) · ˆ n dS − (cid:90) (cid:20) ν ( ∇× χ B u i −∇ ρ ρ × χ B u i ) · ( ∇× χ B u j −∇ ρ ρ × χ B u j )+ ηµ ( ∇× χ Bi ) · ( ∇× χ Bj )+43 ν ( ∇· χ B u i −∇ ρ ρ · χ B u i ) · ( ∇· χ B u j −∇ ρ ρ · χ B u j ) (cid:21) d x . With some algebra, we can compute a i ˙ a i for i ∈{ ,...,r } , a i ˙ a i = r (cid:88) i,j =1 a i ∂a j ∂t (cid:90) χ i χ j d x = (cid:90) ∂q ∂t d x = ∂W∂t (14) In index notation a i ˙ a i = a i C i + a i L ij a j + a i Q ijk a j a k for i,j,k ∈{ ,...,r } . First, note that W Q i =0 produces C i =0for all i ∈{ ,...,r } . In other words, there are no constantterms in the Galerkin model; data-driven methods canimplement this constraint by simply searching for modelsthat do not have constant terms. This a physical conse-quence of our assumption that ¯ q is steady-state becausenonzero constant terms in the Galerkin model would im-ply the possibility of unbounded growth in the energynorm. The anti-symmetry of W Qij for i,j ∈{ ,...,r } con-strains the quadratic structure of a T · a , a T L a ≈ . (15)This physical interpretation is also clear; if the plasma issteady-state but has ﬁnite dissipation, the input power,here manifested through a purely quadratic Poynting ﬂux P ∝ η J × B , must be balancing these losses. Finally, be-cause of the boundary conditions there are no cubic terms in the energy at all, meaning a T Q aa =0 . (16)This is the condition for a system to have energy-preserving quadratic nonlinearities; this conclusion doesnot rely on any assumption of steady-state and energy-preserving structure in other quadratic nonlinearities iswell-studied in ﬂuid mechanics [20, 40]. B. Global conservation of cross-helicity

An analogous derivation can be done to further con-strain the model-building for systems which conservecross-helicity, although this is not appropriate for thesimulation results in the accompanying work [16]. Con-sider the local form of cross-helicity H c = u · B . UsingGaltier [39] Eq. (3.36), ∂H c ∂t = −∇· (cid:32) u γp ( γ − ρ (cid:33) B + u × ( u × B ) − d i √ ρµ u × (cid:0) ( ∇× B ) × B (cid:1) − η u × ( ∇× B )  (17)+ ν ∇· (cid:18) B × ( ∇× u )+ 43 ( ∇· u ) B (cid:19) − d i √ ρµ ( ∇× u ) · (cid:0) ( ∇× B ) × B (cid:1) − ( η + ν )( ∇× B ) · ( ∇× u ) . Consider again the simplifying case J · ˆ n =0, B · ˆ n =0, and u · ˆ n = u × ˆ n =0. Then the integral form is0 ≈ (cid:90) ∂H c ∂t d x = (cid:90) (cid:34) ν ∇ ρρ · (cid:18) B × ( ∇× u )+ 43 ( ∇· u ) B (cid:19) − d i √ ρµ ( ∇× u ) · (cid:0) ( ∇× B ) × B (cid:1) − ( η + ν )( ∇× B ) · ( ∇× u ) (cid:35) d x . (18)Substituting in Eq. (7) produces terms up to cubic in thetemporal POD modes, (cid:90) ∂H c ∂t d x = ∂∂t ( a i a j ) (cid:90) √ ρµ χ B u i · χ Bj d x (19)0= A ij ∂∂t ( a i a j )=  A ij C j a i A ij L jk a i a k A ij Q jkl a i a k a l Note that if the system is energy-preserving, C j =0 forall j, so the ﬁrst equality is already satisﬁed. The secondinequality gives A ij L jk anti-symmetric under swapping i and k , and energy-preservation gave anti-symmetry un-der swapping j and k (Eq. 15). The most straightforwardsolution is L jk =0 for all j , k ; this solution is precisely theideal limit corresponding to η = ν =0. Since A ij is notsymmetric by construction, this constraint can also ap-ply to systems which conserve cross-helicity despite ﬁnitedissipation.Lastly, A ij Q jkl , containing only the contribution fromthe Hall-term, exhibits the same structure as (and is compatible with) our constraint on the energy-preservingnonlinearities in Eq. (16). The simplest solution is A ij Q jkl =0 for all i,k,l , since this corresponds to standardMHD without the Hall term. Like the analysis of the lin-ear terms, this constraint indicates that if the Hall-termshave this special energy-preserving structure, nonzeroHall contributions can still conserve cross-helicity. En-forcing other invariants of Hall-MHD may require alter-native formulations to the one presented here, since de-rived ﬁelds like the vector potential are involved. C. Conservation laws with velocity units

In closer analogy to ﬂuid dynamics [17], we could havealternatively used q =[ u , u A ,u s ], u s = 4 Tm i ( γ − , u A = B √ µ ρ , (20)12 (cid:104) q , q (cid:105) = 12 (cid:90) (cid:16) u + u A + u s (cid:17) d x . (21)We have deﬁned a scaled plasma sound speed, u s . If ρ is uniform ρ (cid:104) q , q (cid:105) / W . The isothermal and time-independent density assumptions allow us to derive an-other quadratic model in q , for which a POD-Galerkinmodel is readily available (the form is identical to Eq. 8but the coeﬃcients have changed). Once again, assume u · ˆ n = u × ˆ n =0, and B · ˆ n =0 on the boundary, so that (cid:90) ρ dq dt d x = ∂W∂t . (22)This is equivalent to Eq. (14) in the particular case oftime-independent density. Without this assumption, anextra term appears, proportional to (cid:82) u ·∇ ( u + u A ) d x .Although from dimensional analysis this term is poten-tially very large, this may not be the case for many lab-oratory devices with strong anisotropy introduced by alarge external magnetic ﬁeld. For instance, steady-statetoroidal plasmas with large closed ﬂux surfaces wouldexpect u ·∇ u A and u ·∇ u to be small, as the ﬂuid ve-locity is primarily along ﬁeld lines and gradients in boththe magnetic and velocity ﬁelds are primarily across ﬁeldlines. For this reason, in certain devices the use of q =[ u , u A ,u s ] could be a useful alternative to the formu-lation used in the main body of this work. It is possi-ble that, in these units, the structure of the nonlinear-ities in the associated POD-Galerkin model may provemore amenable to analysis. Now that we have illustratedhow global conservation laws manifest as structure inGalerkin models, we could compute these coeﬃcients andevolve the subsequent model. However, For an explicitcalculation of the model coeﬃcients, the ﬁrst and sec-ond order spatial derivatives for the MHD ﬁelds must bewell-approximated in the region of experimental interest.In some cases, high-resolution diagnostics on experimen-tal devices can resolve these quantities in a particularregion of the plasma, but even if this data is available,computing these inner products and evaluating the non-linear terms in the model is resource-intensive. Fortu-nately, there are hyper-reduction techniques from ﬂuiddynamics [41], such as the discrete empirical interpola-tion method (DEIM) [42], QDEIM [43], missing-point es-timation (MPE) [44] and gappy POD [45, 46], which canenable eﬃcient computations. Instead of using hyper-reduction, one can use emerging and increasingly sophis-ticated machine learning methods to discover Galerkinmodels from data. In the following section, we deriveconstraints on machine learning methods that guaranteethe model structure we derived from global conservationlaws in Sec. III. IV. GLOBAL CONSERVATION LAWS INMACHINE LEARNING MODEL DISCOVERY

Increasingly, machine learning techniques are allowingscientists to extract a system’s governing equations ofmotion directly from data. Here we show how these algo- rithms can directly incorporate global conservation lawsduring the search for low-dimensional models in plasmadatasets. We use the sparse identiﬁcation of nonlineardynamics (SINDy) algorithm [47] to identify nonlinearreduced-order models for plasmas in the accompanyingwork [16].

A. The constrained SINDy method

The goal of SINDy is to identify a low-dimensionalmodel for a ( t ), the vector of POD amplitudes, as a sparselinear combination of elements from a library of candi-date terms Θ = (cid:2) θ ( a ) θ ( a ) ··· θ p ( a ) (cid:3) : ddt a = f ( a ) ≈ Θ ( a ) Ξ . (23)To address this combinatorically hard problem, it lever-ages sparse regression techniques, optimizing for thesparsest set of equations that produces an accurate ﬁtof the data. The SINDy optimization problem ismin Ξ || ˙ a − Θ ( a ) Ξ || + λR ( Ξ ) , (24)subject to D Ξ [:]= d , where R ( Ξ ) is some regularizer, like the L or L norm, which promotes sparsity in Ξ . Here a , ˙ a ∈ R M × r , Θ ( a ) ∈ R M × N , Ξ ∈ R N × r , D ∈ R N c × rN , Ξ [:] ∈ R rN , d ∈ R N c , where N is the number of candidateterms, N c is the number of constraints, and Ξ [:]= (cid:2) ξ a ··· ξ a r ··· ξ a N ··· ξ a r N (cid:3) . B. Derivation of the SINDy constraints

In Sec. III, we derived model constraints from globalconservation laws; our goal here is to rewrite these con-straints to be compatible with the notation in Eq. 24.The conclusions for the global conservation of energywere: 1) no constant terms, 2) an anti-symmetryconstraint on the linear part of the coeﬃcient ma-trix Ξ , and 3) a more complicated energy-preservingstructure in the quadratic coeﬃcients. Consider aquadratic library in a set of r modes, ordered as Θ ( a )=[ a ,...,a r ,a a ,...,a r − a r ,a ,...,a r ]. Note that this ar-rangement of the polynomials in Θ diﬀers from Loiseauet al. [40], so the indexing and subscripts are also dif-ferent here. First we will consider the constraint on thelinear part of the Galerkin model in Eq. 8, or equivalentlythat the quadratic term a T L a ≈

0. We can rewrite thisin the SINDy notation as0= (cid:2) a ··· a r (cid:3) ξ a ··· ξ a r ... . . . ... ξ a r r ··· ξ a r r  a ... a r  . (25)We conclude ξ a j i = − ξ a i j for i,j ∈{ ,...,r } and identify ξ a j i by accessing the ( i − r + j index in Ξ [:]. Note we areonly accessing the ﬁrst r elements of Ξ [:]. For mod-els limited to linear and quadratic polynomials, N =( r +3 r ) / N L =( r + r ) / rN − N L = r ( r +2 r − / D Ξ [:]= d ,we can write this out explicitly for r =3,  ··· ··· ··· ··· ··· ···  ξ a ξ a ξ a ξ a ...  =   . (26)The boundary conditions u · ˆ n =0, J · ˆ n =0, B · ˆ n =0 guar-anteed that the quadratic nonlinearities were energy-preserving, and thus that cubic terms in Eq. 10 vanish.In this case, regardless of the steady-state assumption,we have that r (cid:88) i,j,k =0 Q ijk a i a j a k ≈ . (27)This constraint is signiﬁcantly more involved to reformat.Written in SINDy notation, this is equivalent to0= (cid:2) a ··· a r (cid:3) ξ a r +1 ξ a r +2 ··· ξ a N ... ... ... ... ξ a r r +1 ξ a r r +2 ··· ξ a r N  a a ... a r − a r a ... a r  . (28)Expand this all out and group the like terms, i.e. termswhich look like a i , a i a j or a i a j a k , i,j,k ∈{ ,...,r } , i (cid:54) = j (cid:54) = k . All of the like terms can be straightforwardlyshown to be linearly independent, so we can considerthree constraints separately for the three types of terms.The number of each of these respective terms is (cid:0) r (cid:1) = r ,2 (cid:0) r (cid:1) = r ( r − (cid:0) r (cid:1) = r ( r − r − /

6, for a total of r ( r +1)( r +2) / N Q constraints. With both constraints,we have rN − N L − N Q = r ( r − r +5) / N c = N L + N Q constraints. Further considering thequadratic case, we ﬁnd that coeﬃcients which adorn a i must vanish, ξ a i N − r + i =0. Now deﬁne˜ ξ ijk = ξ a i r + j (2 r − j − k − . (29)The second type of constraint, with i (cid:54) = j , produces ξ a i N − r + j = (cid:40) ˜ ξ jij ij, (30) while the third type of constraint produces˜ ξ ijk + ˜ ξ jik + ˜ ξ kij =0 . (31)This relation is equivalent to the energy-preserving con-ditions in Schlegel et al. [20], but the indexing is notstraightforward, even after fully expanding Eq. 28. Thisequation is an arbitrary r generalization to the r =3 con-straint used in Loiseau et al. [40]. For the speciﬁc casewhere the plasma system is Hamiltonian (for instancein ideal [48], Hall [49], and extended [50] MHD withoutdissipation) and the measurements are assumed to besuﬃcient to represent the Hamiltonian, one could alter-natively use formulations of SINDy to directly discoverthe Hamiltonian [51] and subsequently derive the equa-tions of motion. Lastly, cross-helicity constraints canbe straightforwardly implemented from these results. Ifthe constraint on the quadratic terms in Ξ is written D jk Ξ k =0, then the quadratic cross-helicity constraintcan be written D jk A kl Ξ l =0. V. CONCLUSION

A hierarchy of models with varying ﬁdelity is essen-tial for understanding and controlling plasmas. Thepresent work, along with the accompanying work [16],provides a principled lower level on this hierarchy − low-dimensional and interpretable plasma models which canbe used for physical discovery, forecasting, stability anal-ysis, and real-time control. We have illustrated howGalerkin plasma models retain the global conservationlaws of MHD, and how machine learning methods likeSINDy can use these constraints directly in an optimiza-tion procedure for discovering such models from data.This principled enforcement of global conservation lawsis critical for the success of future low-dimensional plasmamodels. VI. ACKNOWLEDGEMENTS

The authors would like to extend their gratitude toDr. Uri Shumlak for his input on this work. This workand the companion work were supported by the ArmyResearch Oﬃce (ARO W911NF-19-1-0045) and the AirForce Oﬃce of Scientiﬁc Research (AFOSR FA9550-18-1-0200). Simulations were supported by the U.S. Depart-ment of Energy under award numbers DE-SC0016256and DE-AR0001098 and used resources of the NationalEnergy Research Scientiﬁc Computing Center, supportedby the Oﬃce of Science of the U.S. Department of Energyunder Contract No. DE-AC02–05CH11231. [1] J. Candy and R. E. Waltz, Anomalous transport scal-ing in the DIII-D tokamak matched by supercomputersimulation, Phys. Rev. Lett. , 045001 (2003).[2] O. Ohia, J. Egedal, V. S. Lukin, W. Daughton, and A. Le,Demonstration of anisotropic ﬂuid closure capturing thekinetic structure of magnetic reconnection, Phys. Rev.Lett. , 115004 (2012).[3] D. Groˇselj, A. Mallet, N. F. Loureiro, and F. Jenko, Fullykinetic simulation of 3D kinetic Alfv´en turbulence, Phys.Rev. Lett. , 105101 (2018).[4] K. Taira, S. L. Brunton, S. Dawson, C. W. Rowley,T. Colonius, B. J. McKeon, O. T. Schmidt, S. Gordeyev,V. Theoﬁlis, and L. S. Ukeiley, Modal analysis of ﬂuidﬂows: An overview, AIAA Journal , 4013 (2017).[5] I. Furno, Fast transient transport phenomena measuredby soft x-ray emission in TCV tokamak plasmas , Tech.Rep. (2001).[6] R. Jim´enez-G´omez, E. Ascas´ıbar, T. Estrada, I. Garc´ıa-Cort´es, B. Van Milligen, A. L´opez-Fraguas, I. Pastor,and D. L´opez-Bruna, Analysis of magnetohydrodynamicinstabilities in TJ-II plasmas, Fusion science and tech-nology , 20 (2007).[7] B. P. van Milligen, E. S´anchez, A. Alonso, M. A. Pe-drosa, C. Hidalgo, A. M. de Aguilera, and A. L. Fraguas,The use of the biorthogonal decomposition for the iden-tiﬁcation of zonal ﬂows at TJ-II, Plasma Physics andControlled Fusion , 025005 (2014).[8] M. Pandya, Low edge safety factor disruptions in theCompact Toroidal Hybrid: Operation in the low-q regime,passive disruption avoidance and the nature of MHD pre-cursors , Ph.D. thesis, Auburn University (2016).[9] B. Victor, C. Akcay, C. Hansen, T. Jarboe, B. Nelson,and K. Morgan, Development of validation metrics usingbiorthogonal decomposition for the comparison of mag-netic ﬁeld measurements, Plasma Physics and ControlledFusion , 045010 (2015).[10] E. Strait, J. King, J. Hanson, and N. Logan, Spatial andtemporal analysis of DIII-D 3D magnetic diagnostic data,Review of Scientiﬁc Instruments , 11D423 (2016).[11] P. J. Byrne, Study of External Kink Modes in ShapedHBT-EP Plasmas , Ph.D. thesis, Columbia University(2017).[12] S. Gu, B. Wan, Y. Sun, N. Chu, Y. Liu, T. Shi, H. Wang,M. Jia, and K. He, A new criterion for controlling edgelocalized modes based on a multi-mode plasma response,Nuclear Fusion , 126042 (2019).[13] A. A. Kaptanoglu, K. D. Morgan, C. J. Hansen, andS. L. Brunton, Characterizing magnetized plasmas withdynamic mode decomposition, Physics of Plasmas ,032108 (2020).[14] P. Benner, V. Mehrmann, and D. C. Sorensen, Dimensionreduction of large-scale systems , Vol. 45 (Springer, 2005).[15] P. Benner, M. Ohlberger, A. Cohen, and K. Willcox,

Model reduction and approximation: theory and algo-rithms (SIAM, 2017).[16] A. A. Kaptanoglu, K. D. Morgan, C. J. Hansen, and S. L.Brunton, Physics-constrained, low-dimensional modelsfor mhd: First-principles and data-driven approaches,arXiv preprint arXiv:2004.10389 (2020).[17] C. W. Rowley, T. Colonius, and R. M. Murray, Modelreduction for compressible ﬂows using POD and Galerkin projection, Physica D: Nonlinear Phenomena , 115(2004).[18] B. R. Noack, M. Schlegel, M. Morzynski, and G. Tad-mor,

Galerkin method for nonlinear dynamics (Springer,2011).[19] M. J. Balajewicz, E. H. Dowell, and B. R. Noack, Low-dimensional modelling of high-Reynolds-number shearﬂows incorporating constraints from the Navier–Stokesequation, Journal of Fluid Mechanics , 285 (2013).[20] M. Schlegel and B. R. Noack, On long-term boundednessof Galerkin models, Journal of Fluid Mechanics , 325(2015).[21] K. Carlberg, M. Barone, and H. Antil, Galerkin v. least-squares Petrov–Galerkin projection in nonlinear modelreduction, Journal of Computational Physics , 693(2017).[22] T. Dudok de Wit, A.-L. Pecquet, J.-C. Vallet, andR. Lima, The biorthogonal decomposition as a tool forinvestigating ﬂuctuations in plasmas, Physics of Plasmas , 3288 (1994).[23] J. Levesque, N. Rath, D. Shiraki, S. Angelini, J. Bialek,P. Byrne, B. DeBono, P. Hughes, M. Mauel, G. Navratil, et al. , Multimode observations and 3D magnetic controlof the boundary of a tokamak plasma, Nuclear Fusion , 073037 (2013).[24] C. Galperti, C. Marchetto, E. Alessi, D. Minelli,M. Mosconi, F. Belli, L. Boncagni, A. Botrugno, P. Bu-ratti, B. Esposito, et al. , Development of real-time MHDmarkers based on biorthogonal decomposition of signalsfrom Mirnov coils, Plasma Physics and Controlled Fusion , 114012 (2014).[25] B. P. Van Milligen, E. S´anchez, A. Alonso, M. Pedrosa,C. Hidalgo, A. M. De Aguilera, and A. L. Fraguas, Theuse of the biorthogonal decomposition for the identiﬁca-tion of zonal ﬂows at TJ-II, Plasma Physics and Con-trolled Fusion , 025005 (2014).[26] C. Hansen, B. Victor, K. Morgan, T. Jarboe, A. Hos-sack, G. Marklin, B. Nelson, and D. Sutherland, Nu-merical studies and metric development for validationof magnetohydrodynamic models on the HIT-SI exper-iment, Physics of Plasmas , 056105 (2015).[27] D. D. Schnack, D. C. Barnes, D. P. Brennan, C. C.Hegna, E. Held, C. C. Kim, S. E. Kruger, A. Y. Pankin,and C. R. Sovinec, Computational modeling of fully ion-ized magnetized plasmas using the ﬂuid approximation,Physics of Plasmas , 058103 (2006).[28] Z. W. Ma and A. Bhattacharjee, Hall magnetohydrody-namic reconnection: The geospace environment model-ing challenge, Journal of Geophysical Research: SpacePhysics , 3773.[29] V. Krishan and S. M. Mahajan, Magnetic ﬂuctuationsand Hall magnetohydrodynamic turbulence in the solarwind, Journal of Geophysical Research: Space Physics .[30] F. Ebrahimi, B. Lefebvre, C. B. Forest, and A. Bhat-tacharjee, Global Hall-MHD simulations of magnetoro-tational instability in a plasma Couette ﬂow experiment,Physics of Plasmas , 062904 (2011).[31] N. M. Ferraro, Calculations of two-ﬂuid linear response tonon-axisymmetric ﬁelds in tokamaks, Physics of Plasmas , 056105 (2012). [32] A. A. Kaptanoglu, T. E. Benedett, K. D. Morgan, C. J.Hansen, and T. R. Jarboe, Two-temperature eﬀects inHall-MHD simulations of the HIT-SI experiment, Physicsof Plasmas , 072505 (2020).[33] S. L. Brunton and J. N. Kutz, Data-driven science andengineering: Machine learning, dynamical systems, andcontrol (Cambridge University Press, 2019).[34] G. H. Golub et al. , Matrix computations, The Johns Hop-kins (1996).[35] A. Frieze, R. Kannan, and S. Vempala, Fast Monte-Carloalgorithms for ﬁnding low-rank approximations, Journalof the ACM (JACM) , 1025 (2004).[36] E. Liberty, F. Woolfe, P.-G. Martinsson, V. Rokhlin,and M. Tygert, Randomized algorithms for the low-rankapproximation of matrices, Proceedings of the NationalAcademy of Sciences , 20167 (2007).[37] F. Woolfe, E. Liberty, V. Rokhlin, and M. Tygert, A fastrandomized algorithm for the approximation of matrices,Applied and Computational Harmonic Analysis , 335(2008).[38] T. Jarboe, W. Hamp, G. Marklin, B. Nelson, R. O’Neill,A. Redd, P. Sieck, R. Smith, and J. Wrobel, Spheromakformation by steady inductive helicity injection, Physicalreview letters , 115003 (2006).[39] S. Galtier, Introduction to modern magnetohydrodynam-ics (Cambridge University Press, 2016).[40] J.-C. Loiseau, B. R. Noack, and S. L. Brunton, Sparsereduced-order modeling: sensor-based dynamics to full-state estimation, Journal of Fluid Mechanics , 459(2018).[41] P. Benner, S. Gugercin, and K. Willcox, A survey ofprojection-based model reduction methods for paramet-ric dynamical systems, SIAM review , 483 (2015).[42] S. Chaturantabut and D. C. Sorensen, Discrete empiricalinterpolation for nonlinear model reduction, in Proceed-ings of the 48th IEEE Conference on Decision and Con-trol (CDC) held jointly with 2009 28th Chinese Control Conference (IEEE, 2009) pp. 4316–4321.[43] Z. Drmac and S. Gugercin, A new selection operator forthe discrete empirical interpolation method—improveda priori error bound and extensions, SIAM Journal onScientiﬁc Computing , A631 (2016).[44] P. Astrid, S. Weiland, K. Willcox, and T. Backx, Missingpoint estimation in models described by proper orthog-onal decomposition, IEEE Transactions on AutomaticControl , 2237 (2008).[45] K. Willcox, Unsteady ﬂow sensing and estimation via thegappy proper orthogonal decomposition, Computers &Fluids , 208 (2006).[46] K. Carlberg, C. Farhat, J. Cortial, and D. Amsallem, TheGNAT method for nonlinear model reduction: eﬀectiveimplementation and application to computational ﬂuiddynamics and turbulent ﬂows, Journal of ComputationalPhysics , 623 (2013).[47] S. L. Brunton, J. L. Proctor, and J. N. Kutz, Discover-ing governing equations from data by sparse identiﬁca-tion of nonlinear dynamical systems, Proceedings of theNational Academy of Sciences , 3932 (2016).[48] P. J. Morrison and J. M. Greene, Noncanonical hamil-tonian density formulation of hydrodynamics and idealmagnetohydrodynamics, Physical Review Letters , 790(1980).[49] Z. Yoshida and E. Hameiri, Canonical hamiltonian me-chanics of hall magnetohydrodynamics and its limitto ideal magnetohydrodynamics, Journal of Physics A:Mathematical and Theoretical , 335502 (2013).[50] H. M. Abdelhamid, Y. Kawazura, and Z. Yoshida, Hamil-tonian formalism of extended magnetohydrodynamics,Journal of Physics A: Mathematical and Theoretical ,235502 (2015).[51] H. K. Chu and M. Hayashibe, Discovering interpretabledynamics by sparsity promotion on energy and the la-grangian, IEEE Robotics and Automation Letters5