[PDF] Quasioptical modeling of wave beams with and without mode conversion: I. Basic theory

Abstract

This work opens a series of papers where we develop a general quasioptical theory for mode-converting electromagnetic beams in plasma and implement it in a numerical algorithm. Here, the basic theory is introduced. We consider a general quasimonochromatic multi-component wave in a weakly inhomogeneous linear medium with no sources. For any given dispersion operator that governs the wave field, we explicitly calculate the approximate operator that governs the wave envelope ψ to the second order in the geometrical-optics parameter. Then, we further simplify this envelope operator by assuming that the gradient of ψ transverse to the local group velocity is much larger than the corresponding parallel gradient. This leads to a parabolic differential equation for ψ ("quasioptical equation") in the basis of the geometrical-optics polarization vectors. Scalar and mode-converting vector beams are described on the same footing. We also explain how to apply this model to electromagnetic waves in general. In the next papers of this series, we report successful quasioptical modeling of radiofrequency wave beams in magnetized plasma based on this theory.

Full PDF

aa r X i v : . [ phy s i c s . p l a s m - ph ] M a y Quasioptical modeling of wave beams with and without mode conversion:I. Basic theory

I. Y. Dodin, D. E. Ruiz, K. Yanagihara, Y. Zhou, and S. Kubo Princeton Plasma Physics Laboratory, Princeton, New Jersey 08543, USA Sandia National Laboratories, P.O. Box 5800, Albuquerque, New Mexico 87185, USA Nagoya University, 464-8601, Nagoya, Aichi, Japan National Institute for Fusion Science, National Institutes of Natural Sciences, 509-5292, Toki, Gifu, Japan

This work opens a series of papers where we develop a general quasioptical theory for mode-converting electromagnetic beams in plasma and implement it in a numerical algorithm. Here, thebasic theory is introduced. We consider a general quasimonochromatic multi-component wave ina weakly inhomogeneous linear medium with no sources. For any given dispersion operator thatgoverns the wave ﬁeld, we explicitly calculate the approximate operator that governs the waveenvelope ψ to the second order in the geometrical-optics parameter. Then, we further simplify thisenvelope operator by assuming that the gradient of ψ transverse to the local group velocity is muchlarger than the corresponding parallel gradient. This leads to a parabolic diﬀerential equation for ψ (“quasioptical equation”) in the basis of the geometrical-optics polarization vectors. Scalar andmode-converting vector beams are described on the same footing. We also explain how to apply thismodel to electromagnetic waves in general. In the next papers of this series, we report successfulquasioptical modeling of radiofrequency wave beams in magnetized plasma based on this theory. I. INTRODUCTIONA. Motivation

Describing the propagation of waves in inhomogeneousmedia is a classic problem with a long history [1–3]. It isparticularly important in fusion research, where quasis-tationary beams of electromagnetic (EM) radiation arecommonly used for many purposes and need to be mod-eled with ﬁdelity [4]. Full-wave modeling, which involvessolving the complete Maxwell’s equations, can be imprac-tical at short (cm and mm) wavelengths, especially whenmulti-dimensional simulations with complex geometriesare required, such as those of tokamak and stellaratorplasmas. Hence, reduced methods have been widely usedin practice. These methods are rooted in geometrical op-tics (GO) [1] and include conventional ray tracing [5–8],complex ray tracing [9, 10], beam tracing [11–14], andvariations of thereof [15]. There are also other “qua-sioptical” models, such as in Refs. [16–19], that resolvethe evolution of the beam transverse structure withoutadopting any particular ansatz for the intensity proﬁle.Still, they assume that only one branch of the dispersionrelation is excited in each given case [20], i.e., mode con-version does not occur [21]. As a result, today’s simula-tions of mode-converting beams mainly rely on full-wavecodes [19] and thus have to compromise the ﬁdelity byreducing the number of the dimensions resolved [8].However, simulating mode conversion does not actu-ally require the full-wave approach; in fact, it can bedone within the quasioptical approach too, if the latteris generalized properly. Developing the correspondingalgorithms would facilitate not only fusion applications.For example, advanced quasioptical models could ﬁnd usein optics and general relativity, where mode-convertingbeams are also possible and have been attracting much attention lately [22, 23]. Hence, it is potentially beneﬁ-cial to approach the problem of quasioptical modeling asa general-physics problem, without restricting it to fusionapplications.

B. General idea

Mode conversion can be described quite generally, par-ticularly without restricting it to narrow regions in space,within “extended geometrical optics” (XGO), which wasdeveloped recently [24–28]. XGO is a theory that calcu-lates the leading-order correction U to the GO dispersionoperator of a general vector wave and shows [26] thatthis correction is analogous to (and a generalization of)the Stern–Gerlach Hamiltonian of a quantum spin-1 / U is responsible for two eﬀects si-multaneously: (i) it modiﬁes the ray equations just likespin–orbital interactions aﬀect the electron motion, and(ii) it also governs mode conversion, which appears as adirect analog of spin up–down transitions [29]. An ex-amination of the quasioptical algorithms such as thosein Refs. [12, 16] shows that they already involve calcu-lations of terms similar to U . Hence, adding the mode-conversion capability to quasioptical codes should not beburdensome and is not expected to slow down the codesconsiderably. However, formulating the correspondingtheory is easier to do using the abstract quantumlike for-malism of XGO. One only needs to upgrade the existingXGO by adding diﬀraction, for which it also helps to in-troduce more general coordinates with curved metric [30].In this series of papers (Papers I-III), we propose suchan upgrade of XGO and apply it to numerical simula-tions. Our goal is to develop a general modular frame-work which later could be applied to a broad variety ofproblems both in plasma physics and beyond. Withinthis framework to be presented, one can assume generaldispersion, diﬀraction, and polarization eﬀects, includingmode conversion of not just two but arbitrarily many res-onant waves. Also importantly, our formulation below isnot restricted to EM waves, since we do not specify thedispersion operator in the governing wave equation.Our series of papers is organized as follows. In Paper I,we introduce the basic theory of waves that diﬀract andmode-convert simultaneously. In Paper II [31] and Pa-per III [32], we apply this theory to perform quasiopticalmodeling of radiofrequency-wave beams in magnetizedplasma as an example. In particular, we consider appli-cations to mode conversion caused by magnetic shear inedge plasma [29], which, for example, is a known problem[33, 34] in the Large Helical Device [35, 36]. C. Outline

In this ﬁrst paper of our series, we consider an ar-bitrary quasimonochromatic multi-component wave in aweakly inhomogeneous linear medium. Supposing that b D is some dispersion operator governing the wave dynam-ics, we simplify b D and obtain an approximate operatorthat governs the wave envelope ψ . Then, we derive aparabolic diﬀerential equation for ψ (“quasioptical equa-tion”) by assuming that the gradient of the wave envelopetransverse to the local group velocity is much larger thanthe corresponding parallel gradient. The resulting the-ory applies to both scalar and mode-converting vectorbeams. At the end of the paper, we also discuss how thismodel can be applied to EM waves in particular. How-ever, readers who are mainly interested in simulationsas opposed to the general theory are encouraged to pro-ceed straight to Paper II, where our key equations areoverviewed in a simpliﬁed form and without derivations.This paper is organized as follows. In Sec. II, we intro-duce the general problem. In Sec. III, we formalize theconcept of the envelope dispersion operator. In Sec. IV,we derive an approximation for the envelope dispersionoperator for scalar waves, and we also derive its quasiop-tical approximation. In Sec. V, we extend this model tovector waves. In Sec. VI, we explain how to apply theresulting theory to EM waves in particular. In Sec. VII,we present our main conclusions. In Appendix A, wesummarize some of our notations. In Appendix B, someauxiliary calculations are presented. Our paper also con-tains Supplementary Material [37]. There, we overviewthe Weyl calculus on a curved conﬁguration space, whichis used in this work. II. GENERAL PROBLEM

Consider a wave propagating on an n -dimensionalconﬁguration space M n with coordinates x ≡{ x , x , . . . , x n − } and some general metric tensor g ( x ).For simplicity, assume that M n is diﬀeomorphic to R n , i.e., the n -dimensional Euclidean space or pseudo-Euclidean space with the same metric signature as M n .(For more information on why the diﬀeomorphism with R n is needed, see Sec. IV A 2.) Suppose that the waveﬁeld Ψ ≡ Ψ( x ), which may have multiple components, isgoverned by a linear equation with no source terms, b D Ψ = 0 , (1)where b D is a diﬀerential or, most generally, integral dis-persion operator. (For vector waves, b D is a matrix whoseelements are operators; see Sec. V.) We shall assume thatthe GO parameter ǫ is small; namely, ǫ . = λ/L ≪ . = denotes deﬁnitions), where λ is the char-acteristic wave period, or wavelength, and L is the leastcharacteristic scale among those of the wave envelope andof the medium, including the metric [38]. Below, we pro-pose a systematic reduction of Eq. (1) using the smallnessof ǫ and eventually obtain a quasioptical model based onthis equation. The idea of quasioptical modeling will beformalized later (Secs. IV E and V F).As a side note, we emphasize that the theory to bedeveloped does not describe wave transformations nearcutoﬀs, where the GO parameter (2) is not small. It ispossible to waive this limitation, but formulating suchgeneralized theory is left to future publications. III. ENVELOPE DISPERSION OPERATOR

As the ﬁrst step, let us introduce a unitary variabletransformation Ψ = b U ψ, b U . = e iθ ( x ) . (3)(Other b U may also be justiﬁed in some cases, e.g., fordealing with caustics or quasiperiodic media [39], but weshall not consider this possibility in the present work.)The phase θ , which we call the “reference phase”, servesas a gauge potential. It is a real function such that k ( x ) . = ∇ θ ( x ) (4)is the wave vector identical or close to that predicted bythe GO approximation. This implies that the envelope ψ and also k are slow functions [and that the wavelengthentering Eq. (2) is λ ≈ π/k ]. Then, Eq. (1) becomes b D ψ = 0 , (5)where the “envelope dispersion operator” is b D . = b U † b D b U (the dagger denotes the adjoint, as usual), or more ex-plicitly, b D = e − iθ b D e iθ . (6)The reference phase θ is treated as a prescribed func-tion. As will become clear later, knowing θ per se is notactually needed for our purposes; instead, it is k thatmatters. The latter can be calculated on some “refer-ence rays” using the conventional ray equations [1]d x α d τ = ∂H∂k α , d k α d τ = − ∂H∂x α (7)( τ is any parameter along the ray), which also lead to H ( x , k ( x )) = const . (8)Note that, in general, θ and k are deﬁned uniquely only tothe leading order, so there exists some freedom in choos-ing reference rays and their Hamiltonian H . This meansthat more than one b D is possible. Still, envelope equa-tions that (slightly) diﬀer in the choice of k are equivalentin the sense that the total ﬁeld Ψ that they describe isthe same by construction. We shall discuss this in moredetail in Secs. V E and V G.Also note that our approach equally applies to station-ary and nonstationary waves. In the case of a stationarywave, we assume that x is a coordinate in physical space(“spatial problem”), and the wave frequency ω serves as aconstant parameter. In the case of a nonstationary wave,we assume that x is a coordinate in spacetime (“space-time problem”), and then ω is a part of k . In space-time problems, we assume coordinates such that x = ct ,where c is the speed of light, t is time, and the metricsignature is ( − , + , + , . . . ); then, k = − ω/c (in case ofthe Minkowski metric, this implies that k = ω/c ), so ω = − ∂ t θ , as usual. In other respects, spatial and space-time problems are described on the same footing and willbe distinguished only in Sec. VI.Having deﬁned this terminology, we shall now discusshow b D can be expanded in ǫ asymptotically for any b D . IV. SCALAR WAVES

Our asymptotic theory of mode-converting beams isbased on the well-known phase-space methods in quan-tum and classical wave theory. (The key papers include,but are not limited to, Refs. [40–46]; for recent overviews,see Refs. [1, 24].) In order to present our framework andnotation in a self-contained manner, we restate some ba-sics in Sec. IV A and Supplementary Material [37]. Alsonote that our calculations in Sec. IV B are similar in spiritto those in Ref. [45], except we perform a higher-order ex-pansion and assume a more general metric. We shall alsocomment throughout the paper on how our equations re-produce other known results in the corresponding limits.

A. Weyl calculus

1. Basic deﬁnitions

Until Sec. V, we shall assume that Ψ is a scalar func-tion. Any given operator b A acting on it maps Ψ to a new scalar function b A Ψ that can be expressed as follows:( b A Ψ)( x ) . = Z d n x ′ p g ⋄ ( x ′ ) A ( x , x ′ )Ψ( x ′ ) . (9)Here the integral is taken over R n (and so are all integralsbelow, up to dimension), g ⋄ . = | det g | , and A is some ker-nel function that determines b A . Consider also a family ofall unitary operators b A u , which is a subset of all possible b A . For a given Ψ, all image functions b A u Ψ are mutuallyequivalent up to an isomorphism, so their family { b A u Ψ } can be viewed as a single object, a “state vector” | Ψ i ,which belongs to a Hilbert space H n with inner product h Ψ | Φ i . = Z d n x p g ⋄ ( x ) Ψ ∗ ( x )Φ( x ) . (10)Then, Eq. (9) can be viewed as the “ x representation” of b A , while the operator itself can be understood more gen-erally as a transformation of | Ψ i , i.e., of the whole family { b A u Ψ } . Using this invariant notation, one can formulatea machinery, called the Weyl calculus [47], that allows ef-ﬁcient asymptotic approximation of operators using thesmallness of ǫ . Below, we overview the key theorems ofthe Weyl calculus that are used in our paper. Readerswho are interested in details and proofs of these theoremscan ﬁnd them in the Supplementary Material [37].

2. Coordinate and momentum operators

We start by deﬁning the coordinate and momentum(wave-vector) operators b x = { b x , b x , . . . , b x n − } , (11) b p = { b p , b p , . . . , b p n − } (12)such that the x representations of b x µ and b p µ be as follows: b x µ Ψ = x µ Ψ , (13) b p µ Ψ = − ig − / ⋄ ∂ µ ( g / ⋄ Ψ) . (14)Here ∂ µ . = ∂/∂x µ , x µ and p µ are the corresponding eigen-values, and the factors g ± / ⋄ are introduced to keep b p self-adjoint under the inner product (10) [48]. (Thiswould not be the case if M n were not diﬀeomorphic to R n [49].) Since b p ν = − i∂ ν − iq ν ( x ) , q ν . = 14 ∂ ν (ln g ⋄ ) , (15)and q ν commutes with b x µ , one arrives at the usual com-mutation relation [ b x µ , b p ν ] = iδ µν .Let us consider the eigenvectors | x i and | p i of the co-ordinate and momentum operators, which are deﬁned as b x | x i = x | x i , b p | p i = p | p i . (16)Since the operators are self-adjoint, these eigenvectorscan be chosen as mutually orthogonal, and we shall as-sume the following normalization: h x | x i = G ( x , x ) δ ( x − x ) , (17) h p | p i = ¯ G ( p , p ) δ ( p − p ) . (18)Here we introduced G ( x , x ) . = [ g ⋄ ( x ) g ⋄ ( x )] − / , (19)¯ G ( p , p ) . = [¯ g ⋄ ( p )¯ g ⋄ ( p )] − / . (20)The function ¯ g ⋄ can be chosen arbitrarily as long as it iskept positive. It plays a role similar to that of g ⋄ in theWeyl calculus, but note that this is just a normalizationfactor, and we introduced it only to maintain the sym-metry between b x and b p . One can show then [37] that ouroriginal function Ψ and the envelope ψ can be expressedthrough the corresponding state vectors asΨ( x ) = h x | Ψ i , ψ ( x ) = h x | ψ i . (21)The p representations of these state vectors are intro-duced similarly as h p | Ψ i and h p | ψ i . One can also show[37] that h x | b A | x i = A ( x , x ), and h x | p i = h p | x i ∗ = exp( i p · x ) , (2 π ) n/ [ g ⋄ ( x )¯ g ⋄ ( p )] / . (22)Here, p · x . = p µ x µ , and summation over repeated indicesis assumed here and further.

3. Wigner–Weyl transform

The set of all eigenvalues of the coordinate and mo-mentum operators form a 2 n -dimensional “phase space” z ≡ ( x , p ). For every given z , we introduce the so-called Wigner operator b ∆ z , which is self-adjoint and deﬁned as b ∆ z . = Z d n s G ( x , s ) | x − s / i h x + s / | e − i p · s , (23) G ( x , s ) . = [ g ⋄ ( x − s / g ⋄ ( x + s / / . (24)(Note that G = 1 if the x space is Euclidean or pseudo-Euclidean.) Using b ∆ z , we deﬁne the Wigner–Weyl trans-form W z : b A A , which maps a given operator b A on H n to a function A (“Weyl symbol”) on the z space.Speciﬁcally, the Weyl symbol of a given operator b A is A ( x , p ) . = tr( b ∆ z b A ) (“tr” stands for trace); i.e., A ( x , p ) . = Z d n s G ( x , s ) h x + s / | b A | x − s / i e − i p · s . (25)As can be seen easily from this deﬁnition, if a givenoperator is self-adjoint, then its Weyl symbol is real. Thisalso leads to the following corollary. Consider splitting a given b A as b A = b A H + i b A A , where the subscripts denotethe Hermitian and anti-Hermitian parts, b A H . = 12 ( b A + b A † ) , b A A . = 12 i ( b A − b A † ) . (26)Both b A H and b A A (not to be confused with i b A A ) areself-adjoint by deﬁnition. Thus, the corresponding Weylimages A H and A A are real.We also deﬁne the inverse Wigner–Weyl transform W − : A b A , which maps a given function A to thecorresponding operator b A via b A = 1(2 π ) n Z d n x d n p A ( x , p ) b ∆ z . (27)The direct and inverse transforms set the “Weyl cor-respondence” between operators and functions on the( x , p ) space, b A ⇔ A ( x , p ). As can be checked by a directcalculation, for any function f , one has f ( b x ) ⇔ f ( x ) , f ( b p ) ⇔ f ( p ) . (28)However, the Weyl symbols of operators that cannot berepresented as f ( b x ) + f ( b p ) are generally more compli-cated. In particular, one can show that [37] f ( b x ) b p α ⇔ p α f ( x ) + i ∂ α f ( x ) , (29) b p α f ( b x ) ⇔ p α f ( x ) − i ∂ α f ( x ) , (30)and also [37] b p α f ( b x ) b p β ⇔ p α p β f ( x ) + 14 ∂ αβ f ( x )+ i p α ∂ β f ( x ) − i p β ∂ α f ( x ) . (31)Overall, the Weyl symbol of an operator that is any givencombination f ( b x , b p ) of b x and b p approaches f ( x , p ) in theGO limit, when [ b x µ , b p ν ] is negligible. However, in general, f ( b x , b p ) does not map simply to f ( x , p ). B. Approximate b D Using the notation introduced in Sec. IV A, one canexpress Eq. (1) in the following invariant form: b D | Ψ i = 0 . (32)Accordingly, Eq. (5) becomes b D | ψ i = 0 , (33)where the invariant form of the envelope dispersion op-erator [Eq. (6)] is as follows: b D = e − iθ ( b x ) b D e iθ ( b x ) . (34)We shall now approximate b D in three steps: (i) we mapthe right-hand side of Eq. (34) onto a function, or symbol,using the Wigner–Weyl transform; (ii) we approximatethis symbol using the smallness of ǫ [Eq. (2)]; and (iii)we produce an operator out of the approximated symbolusing the inverse Wigner–Weyl transform.The ﬁrst two steps can be done by expressing the Weyl symbol of b D as D ( x , p ) = e − iθ ( x ) ⋆ D ( x , p ) ⋆ e iθ ( x ) , where D is the Weyl symbol of b D and ⋆ is the Moyal product[37]. By expanding ⋆ in the GO parameter, one can thenobtain an approximate symbol of b D formally. But herewe adopt a diﬀerent approach, which is more transparent.We start by expressing b D through D explicitly, b D = 1(2 π ) n Z d n x d n p d n s G ( x , s ) | x − s / i e − i p · s D ( x , p ) h x + s / | , (35)where G is a metric factor given by Eq. (24). Using the fact that θ ( b x ) | x ± s / i = θ ( x ± s / | x ± s / i , one can rewriteEq. (34) as follows: b D = 1(2 π ) n Z d n x d n p d n s G ( x , s ) e i [ θ ( x + s / − θ ( x − s / − p · s ] | x − s / i D ( x , p ) h x + s / | . (36)Consider a formal Taylor expansion of the reference phase θ in s : θ ( x ± s /

2) = θ ( x ) ± s α ∂θ ( x ) ∂x α + 12 s α s β ∂ θ ( x ) ∂x α ∂x β ± s α s β s γ ∂ θ ( x ) ∂x α ∂x β ∂x γ + . . . (37)This gives θ ( x + s / − θ ( x − s /

2) = s α ∂θ ( x ) ∂x α + s α s β s γ ∂ θ ( x ) ∂x α ∂x β ∂x γ + . . . = k ( x ) · s + s α s β s γ ∂ k γ ( x ) ∂x α ∂x β + . . . , (38)so e i [ θ ( x + s / − θ ( x − s / − p · s ] = (cid:20) i s α s β s γ ∂ k γ ( x ) ∂x α ∂x β + . . . (cid:21) e i [ k ( x ) − p ] · s = (cid:20) ∂ k γ ( x ) ∂x α ∂x β ∂ ∂p α ∂p β ∂p γ + . . . (cid:21) e i [ k ( x ) − p ] · s . (39)This leads to the following expression for the envelope dispersion operator: b D = 1(2 π ) n Z d n x d n p d n s G ( x , s ) (cid:26)(cid:20) ∂ k γ ( x ) ∂x α ∂x β ∂ ∂p α ∂p β ∂p γ + . . . (cid:21) e i [ k ( x ) − p ] · s (cid:27) | x − s / i D ( x , p ) h x + s / | = 1(2 π ) n Z d n x d n p d n s G ( x , s ) e i [ k ( x ) − p ] · s | x − s / i (cid:20) D ( x , p ) − ∂ k γ ( x ) ∂x α ∂x β ∂ D ( x , p ) ∂p α ∂p β ∂p γ + . . . (cid:21) h x + s / | = 1(2 π ) n Z d n x d n p d n s G ( x , s ) e − i p · s | x − s / i D ′ (x , k ( x ) + p) h x + s / | , (40)where we integrated by parts and introduced an “eﬀective dispersion function” D ′ ( x , p ) . = D ( x , p ) − ∂ k γ ( x ) ∂x α ∂x β ∂ D ( x , p ) ∂p α ∂p β ∂p γ + . . . (41)The ﬁrst term on the right-hand side of Eq. (41) is O (1), because at ǫ →

0, it becomes the GO dispersion function,which is assumed order-one [50]. (At nonzero ǫ , the Weyl symbol D generally does not coincide with the dispersionfunction that governs the wave in a homogeneous medium; see Sec. IV A 3.) The second term can be estimated byassuming ∂/∂x ∼ /L and ∂/∂p ∼ /k , since we are interested in D ′ ( x , p ) at p close to k . Then, ∂ k γ ∂x α ∂x β ∂ D∂p α ∂p β ∂p γ ∼ kL Dk ∼ ǫ D, (42)meaning that the second term in Eq. (41) is roughly ǫ times smaller than the ﬁrst one. This fact will be used below.The inverse Wigner–Weyl transform will turn the coordinate p into the operator b p . The latter will act on thewave envelope [Eq. (33)], which is considered slow in the coordinate representation, so b p | ψ i = O ( ǫ ). In this sense, p = O ( ǫ ), so by Taylor-expanding D ′ in p , we are eﬀectively expanding b D in ǫ . It is suﬃcient for our purposes toadopt the second-order expansion, b D ≈ π ) n Z d n x d n p d n s G ( x , s ) e − i p · s | x − s / i h D ′ ( x ) + p µ V µ ( x ) + 12 p µ p ν Θ µν ( x ) i h x + s / | , (43)where we introduced D ′ ( x ) . = D ′ (x , k ( x ) ) and V µ ( x ) . = ∂D ′ (x , k ( x ) ) ∂k µ , (44)Θ µν ( x ) . = ∂ D ′ (x , k ( x ) ) ∂k µ ∂k ν . (45)By properties of the inverse Wigner–Weyl transform W − (Sec. IV A 3), b D can be written as follows: b D ≈ W − [ D ′ ( x )] + W − [ p µ V µ ( x )]+ 12 W − [ p µ p ν Θ µν ( x )] . (46)The inverse Wigner–Weyl transforms here can be calcu-lated using Eqs. (28)-(31) and Θ µν = Θ νµ . This leads to b D ≈ D ′ ( b x ) + 12 [ b p µ V µ ( b x ) + V µ ( b x ) b p µ ]+ 12 b p µ Θ µν ( b x ) b p ν −

18 ( ∂ µν Θ µν )( b x ) . (47)One can further simplify Eq. (47) as follows. Supposethat the medium, including the metric, is characterizedby some parameters P . Within the accuracy of the theorydeveloped in this paper, we shall neglect terms involving ∂ P and ( ∂ P )( ∂ψ ) while generally retaining terms suchas ∂ ψ (as to be explained Sec. IV E 2), where ∂ denotesa generic spatial derivative. Hence, we can ignore the dif-ference between D and D ′ , and we can also omit ∂ µν Θ µν in Eq. (47), so b D can be expressed as follows: b D ≈ D ( b x ) + 12 [ b p µ V µ ( b x ) + V µ ( b x ) b p µ ] + 12 b p µ Θ µν ( b x ) b p ν . (48)Also, V µ and Θ µν can be calculated through the zeroth- ǫ limit of D , denoted D , which is just the dispersionfunction of the homogeneous medium: V µ ( x ) ≈ ∂D (x , k ( x ) ) ∂k µ , Θ µν ( x ) ≈ ∂ D (x , k ( x ) ) ∂k µ ∂k ν . Importantly, D ( x , k ) can depend only on k and on localparameters of the medium (but not on their derivatives).As a gradient of a scalar [Eq. (4)], k is a true covector;hence, D ( x , k ) is a true scalar. This makes V µ a vectorand Θ µν a tensor within the assumed accuracy.As a side remark, note that vectors like V µ , which aredenoted with upper indices, belong to the space tangent to M n . (Such vectors must not be confused with multi-component ﬁelds denoted with Latin indices in Sec. V.)Similarly, covectors like k µ , which are denoted with lowerindices, belong to the space cotangent to M n . For anygiven X , one does not have to distinguish X µ and X µ only if the metric is Euclidean. In the case of a generalmetric g µν , one has X µ = g µν X ν , X µ = g µν X ν . (49)where the matrix g µν is the inverse of g µν . Similar rulesapply to manipulating tensors, as usual. C. Approximate envelope equation

Let us derive the x representation of Eq. (48), whichwe need to obtain an explicit form of Eq. (5). From the x representation of b x and b p given by Eqs. (13) and (14),terms like b p f ( b x ) (for any f ) must be interpreted as[ b p µ f ( b x )] ψ = − ig − / ⋄ ∂ µ [ g / ⋄ f ( x ) ψ ] , (50)where g and ψ are also functions of x . Then, b D ψ ≈ Dψ − i V µ ∂ µ ψ − i V µ ; µ ψ − g / ⋄ ∂ µ [Θ µν ∂ ν ( g / ⋄ ψ )] , (51)where all terms are functions of x , and ; µ denotes thecovariant derivative with respect to x µ , so X µ ; µ (for anygiven vector X µ ) is the divergence; namely, X µ ; µ . = 1 √ g ⋄ ∂∂x µ ( √ g ⋄ X µ ) . (52)Except for Dψ , all terms in Eq. (51) are covariant(i.e., invariant with respect to coordinate transforma-tions) within the accuracy of our theory. Hence, thesymbol D is also covariant within the accuracy of ourtheory [this includes not just the zeroth-order disper-sion function D but also the leading-order correction D − D = O ( ∂ P ) that may appear in inhomogeneousmedium], even though this may not be obvious directlyfrom the deﬁnition of the Weyl transform (Sec. IV A).To emphasize this fact, and also to ensure consistency ofnotation with the vector-wave theory to be discussed inthe later sections, we denote this covariant function as D ,and we also adopt the notation D H . = Re D , D A . = Im D . (53)Then, the approximate form of the envelope equation (5)can be summarized as follows:( b D H + i b D A ) ψ = 0 . (54)Here, b D H and b D A are, respectively, the Hermitian andanti-Hermitian parts of the approximated b D , namely, b D H ψ = D H ψ − iV µ ψ ,µ − i V µ ; µ ψ − g / ⋄ [ D | µνH ( g / ⋄ ψ ) ,ν ] ,µ , (55) b D A ψ = D A ψ − i V µA ψ ,µ − i V µA ; µ ψ − g / ⋄ [ D | µνA ( g / ⋄ ψ ) ,ν ] ,µ (56)[from now on, we do not emphasize the approximate na-ture of Eq. (51), so ≈ will be replaced with =], and V µ . = V µH = D | µH , (57)where we assume a standard notation f ,µ . = ∂f∂x µ , f ,µν . = ∂ f∂x µ ∂x ν . (58)Note that ,µ is the “full” derivative in the sense that, forany given f (x , k ( x ) ) , it applies both to the ﬁrst and thesecond argument of f ; namely, the chain rule leads to f ,µ (x , k ( x ) ) = f | µ + f | ν k ν | µ . (59)The true partial derivatives are introduced as follows: f | µ . = ∂f ( x , k ) ∂x µ , f | µν . = ∂ f ( x , k ) ∂x µ ∂x ν , (60) f | µ . = ∂f ( x , k ) ∂k µ , f | µν . = ∂ f ( x , k ) ∂k µ ∂k ν . (61)The derivatives ,µ and | µ are equivalent for functions thatdepend only on x . In particular, k ν ≡ ∂ ν θ ( x ) satisﬁes k ν | µ = k ν,µ = k µ,ν = θ ,µν = θ ,νµ . (62) D. Leading-order approximation:geometrical optics

For the envelope approximation to hold, i.e., for ψ ( x )to remain a slow function, k ( x ) must be chosen suchthat D (x , k ( x ) ) ψ . O ( ǫ ). [The other terms in Eq. (5) are automatically small on the account of Eq. (2).] Letus adopt the usual GO ordering [51] D H = O (1) , D A = O ( ǫ ) , (63)or more rigorously, D A . O ( ǫ ); i.e., D A much smallerthan O ( ǫ ) is also allowed [52]. Then, one can deﬁne k ( x )such that D H (x , k ( x ) ) = 0 . (64)This can be achieved by calculating k as discussed inSec. III with the ray Hamiltonian H = D H and the initialcondition [at any chosen x (0) and k (0) ] such that D H (x (0) , k (0) ) = 0 . (65)To the ﬁrst order in ǫ , Eq. (5) is V µ ψ ,µ + 12 V µ ; µ ψ = D A ψ, (66)and as a corollary, J µ ; µ = − D A | ψ | , J µ . = − V µ | ψ | . (67)At zero D A , the dynamics is conservative, and Eq. (67)reﬂects conservation of the wave action, or quanta [53].Accordingly, D A determines the dissipation rate, J µ canbe identiﬁed (at least, up to a constant factor) as theaction ﬂux density, and V µ is proportional to the groupvelocity (in space or in spacetime, depending on the prob-lem). Equations (64)-(67) coincide with the equations ofthe traditional GO theory [1, 53]. Below, we extend themby retaining the second-order terms neglected in Eq. (66). E. Quasioptical model

1. Ray-based coordinates

Let us start by introducing ray-based coordinates asfollows [54]. Suppose multiple rays launched with dif-ferent x (0) within the beam. Their trajectories can befound using the ray equations as discussed in Sec. IV D;this determines the path ζ along each ray as a functionof the initial coordinate. We treat ζ as the longitudinalcoordinate along the wave beam, and each of its isosur-faces is considered as the transverse space M n − ⊥ ( ζ ). Thecoordinates on this space, ̺ ≡ { ̺ , ̺ , . . . , ̺ n − } , can beintroduced arbitrarily (yet they will be speciﬁed below).Then, we deﬁne n − e σ via e σ . = ∂ x /∂̺ σ (68)and another vector e such that it is linearly-independentfrom all e σ . Hence, any d x can be decomposed as follows:d x = e d ζ + e σ d ̺ σ , (69)and summation over repeated indices σ (and ˜ σ ) is hence-forth assumed from 1 to n − { e µ } . (We treatvectors and covectors on the same footing; namely, X · Y = X µ Y µ = X µ Y µ , and X = Y equally means X µ = Y µ and X µ = Y µ .) By deﬁnition, e µ · e ν = δ µν , (70)so we adopt e . = ∇ ζ . Then, the general form of e is e = ± α e + β , (71)where the ﬁrst sign is determined by the metric signature, α . = | e · e | − / , β . = e σ β σ , and β σ are arbitrary coeﬃ-cients. This leads to the following metric representationin the coordinates { ζ, ̺ } : g = ± α + β ⊺ hβ ( hβ ) ⊺ hβ h ! , (72)Here, h is the matrix with elements h σ ˜ σ . = e σ · e ˜ σ , and ⊺ denotes transposition. By using a known theorem forthe determinant of a block matrix [55], one obtains g ⋄ = α h ⋄ , h ⋄ . = | det h | . (73)Let us also introduce the transverse projection of d ̺ ,d ̺ ⊥ . = ( ∓ α e e )d ̺ , d ̺ σ ⊥ = d ̺ σ + β σ d ζ, (74)where is a unit matrix and e e is a dyad formed outof e . Then, one obtains from Eq. (69) thatd x · d x = ± α (d ζ ) + h σ ˜ σ d ̺ σ ⊥ d ̺ ˜ σ ⊥ , (75)so h σ ˜ σ serves as the transverse metric. Below, we assume β = 0, so ̺ σ ⊥ and ̺ σ do not need to be distinguished andwe can deﬁne the inner product on M n − ⊥ as follows: h ψ | φ i ⊥ . = Z d n − ̺ p h ⋄ ( ζ, ̺ ) ψ ∗ ( ζ, ̺ ) φ ( ζ, ̺ ) . (76)We also choose transverse coordinates speciﬁcally suchthat ̺ = const. In other words, the coordinate ̺ of agiven point in space is the initial location of the ray thatarrives at this point from the initial transverse surface M n − ⊥ (0). Then, the transverse group velocity is zero, V σ ≡ e σ · d x d τ = d ̺ σ d τ = 0 , (77)where we used Eqs. (69) and (70). This simpliﬁes ourequations below. A diﬀerent deﬁnition of ̺ is assumedin Papers II and III, leading to straightforward modiﬁ-cations of the equations, which are not discussed below.

2. Quasioptical equation

Suppose a wave beam such that its longitudinal scale L k , which is deﬁned via ψ ,ζ ∼ ψ/L k , is much larger than its perpendicular scale L ⊥ , which is deﬁned via ψ ,σ ∼ ψ/L ⊥ and may or may not be the beam globalwidth. (We assume the notation ψ ,ζ . = ∂ψ/∂ζ and ψ ,σ . = ∂ψ/∂̺ σ .) We introduce two GO parameters, ǫ k . = λ/L k , ǫ ⊥ . = λ/L ⊥ , ǫ k ≪ ǫ ⊥ (78)[so the original parameter (2) is ǫ = ǫ ⊥ ], and we shallneglect terms smaller than O ( ǫ k ǫ ⊥ ) from now on. Also,we require that either α is chosen as independent of ̺ or, more generally, the relative variation of α across thewave beam is small enough such that α ,σ is negligible.We also assume( D H ) | µ . O ( ǫ k ) , g αβ,µ . O ( ǫ k ) , (79)where the derivatives can be taken over the longitudi-nal or transverse coordinates. (Having g αβ,µ ∼ /L k isindeed typical in typical applications [31].)Under these assumptions, Eq. (55) can be approxi-mated as follows [assuming also Eq. (64)]: b D H ψ = − iV ψ ,ζ − i √ g ⋄ ( √ g ⋄ V ) ,ζ ψ + b G ψ. (80)Here, V . = V , and we used that V µ ψ ,µ = V ψ ,ζ due toEq. (77). The operator b G is self-adjoint under the innerproduct (76) and given by b G ψ . = − h / ⋄ [ D | σ ˜ σH ( h / ⋄ ψ ) , ˜ σ ] ,σ . (81)(We kept only the transverse derivatives in this term be-cause the longitudinal ones are beyond the assumed ac-curacy.) Similarly, Eq. (56) becomes b D A ψ = D A − i V σA ψ ,σ − i √ h ⋄ ( p h ⋄ V σA ) ,σ ψ, (82)where the higher-order terms are neglected because D A is assumed small [Eq. (63)]. Although the last term inEq. (82) is negligible within the assumed accuracy, it isretained anyway to make b D A (not to be confused with i b D A ) exactly self-adjoint under the inner product (76).Then, Eq. (5) becomes iV ψ ,ζ + i √ g ⋄ ( √ g ⋄ V ) ,ζ ψ − b G ψ = i b D A ψ, (83)and, as a corollary,1 α dd ζ h ψ | αV | ψ i ⊥ = 2 h ψ | b D A | ψ i ⊥ , (84)where we used the fact that b G and b D A are self-adjoint.In particular, for zero D A , Eq. (84) predicts conservationof the wave-action ﬂux through the beam cross section, h ψ | αV | ψ i ⊥ = const . (85)Equation (84) indicates that b D A cannot be much largerthan ∂/∂ζ , so the correct scaling to assume for the dissi-pation term is D A . O ( ǫ k ). Then, Eq. (83) leads to the“quasioptical scaling” ǫ k ∼ ǫ ⊥ . (86)Hence, the diﬀerence between b D A and D A in Eq. (82)is . O ( ǫ ), so it is beyond the accuracy of our theory,which is O ( ǫ ). For this reason, we henceforth adopt [56] b D A = D A , (87)which will also simplify our notation. [Also, as a re-minder, ∂ µ g αβ . O ( ǫ k ) ∼ ǫ ⊥ , and the parameters of themedium vary similarly.] This leads o iV ψ ,ζ + i √ g ⋄ ( √ g ⋄ V ) ,ζ ψ − b G ψ = i D A ψ. (88)Equation (88) is a general quasioptical equation for ascalar-wave beam in an inhomogeneous medium. Sinceit is a parabolic equation (it contains only the ﬁrst-orderderivative with respect to ζ ), Eq. (88) is much easier tosolve than Eq. (5) with the original expressions for b D H and b D A given by Eqs. (55) and (56).

3. Simpliﬁed equations

A typical metric of interest for quasioptical modelinghas the form g µν = δ µν + R µνσ ̺ σ with R µνσ = O ( L − k )[31]. This satisﬁes the assumed scaling (79) and corre-sponds to ln g ⋄ = O ( L ⊥ /L k ). Then,(ln g ⋄ ) ,ζ ∼ L ⊥ /L k = O ( ǫ / k ) , (89)which is negligible. [In contrast, (ln g ⋄ ) ,σ = O ( ǫ k ).] Thederivatives of g ⋄ in Eq. (81) are negligible too. Hence,the metric factors in the quasioptical equation (88) canbe replaced with unity. (However, the eﬀect of the metricon the dispersion matrix may remain important in thiscase; see Sec. VI.) Then, Eq. (88) becomes iV ψ ,ζ + i V ,ζ ψ + 12 (cid:16) D | σ ˜ σH ψ , ˜ σ (cid:17) ,σ = i D A ψ. (90)Using the variable transformation φ . = √ V ψ , this equa-tion can be further simpliﬁed as follows: iφ ,ζ + 12 (Σ σ ˜ σ φ , ˜ σ ) ,σ = i Υ φ, (91)which is just a dissipative Schr¨odinger equation withΣ σ ˜ σ . = D | σ ˜ σH /V, Υ . = D A /V. (92)[In Eq. (91), we used that V ,σ φ , ˜ σ is negligible comparedto V φ , ˜ σσ = O ( ǫ ⊥ ).] Also, Eq. (84) becomesdd ζ h φ | φ i ⊥ = 2 h φ | Υ | φ i ⊥ . (93) Note that Σ can also be expressed alternatively as fol-lows. Suppose one ﬁnds the group velocity and constructsa ray-based metric (as described in Sec. IV E 1) in thevicinity of a given x . Then, Eq. (64) can be consid-ered as an equation for the local k as a function of thetransverse wave-vector components k σ (and x ) in thisprescribed metric; i.e., D H (x , k σ , k ( x , k σ ) ) = 0 . (94)By diﬀerentiating this with respect to k σ , we obtain k | σ = − D | σH / D | ζH . By diﬀerentiating the latter formulaonce again, now with respect to k ˜ σ , we also obtain k | σ ˜ σ = − D | ζH (cid:16) D | ζζH k | σ k | ˜ σ + D | ζ ˜ σH k | σ + D | ζσH k | ˜ σ + D | σ ˜ σH (cid:17) . (95)Recall that D | σH = V σ and V σ = 0 [Eq. (77)]. Then, k | σ = 0, and by comparing Eq. (95) with Eq. (92) andusing Eq. (57), one ﬁnds that k | σ ˜ σ = − Σ σ ˜ σ . Thus,Eq. (91) can be rewritten as iφ ,ζ −

12 ( k | σ ˜ σ φ , ˜ σ ) ,σ = i Υ φ. (96)In the case of a spacetime problem, k | σ ˜ σ can also beexpressed through the wave frequency in the laboratoryframe. Then, the familiar Schr¨odinger equation for thewave envelope [57] can be reproduced. We do not discussit further for this subject is not essential to our paper. V. VECTOR WAVESA. Hilbert space for vector waves

Now, let us generalize the above results to vectorwaves. Suppose an m -component wave ﬁeld Ψ ≡ Ψ( x ), soΨ =  Ψ Ψ ... Ψ m  , | Ψ i =  | Ψ i| Ψ i ... | Ψ m i  . (97)Here, | Ψ i is a vector on some Hilbert space H nm for whichwe assume the following general inner product: h Ψ | Φ i m . = Z d n x p g ⋄ ( x ) γ ab ( x )Ψ a ∗ ( x )Φ b ( x ) . (98)Here, the Latin indices that characterize the ﬁeld com-ponents span from 1 to m . (This is in contrast with theGreek indices introduced earlier, which characterize thecomponents of x and span from 0 to n − γ ( x ) can be any m × m symmetric matrix. It canbe considered as an additional metric. This metric doesnot have to be the same as g , as seen already from thefact that m and n do not have to be the same. (For0example, Ref. [27] describes a six-dimensional ﬁeld ona three-dimensional spacetime; also see Sec. VI B.) Thismeans that the space to which Ψ( x ) belongs is not neces-sarily the space tangent to M n . That said, having γ = g is possible as a special case; see Secs. VI A and VI C.Using γ as a metric, we introduce the standard rulesfor manipulating the Latin indices,Ψ a . = γ ab Ψ b , Ψ a = γ ab Ψ b , (99)where γ ab are elements of γ − . In particular, in the x representation, one hasΨ † ( x ) . = h Ψ | x i = (Ψ ∗ , Ψ ∗ , . . . , Ψ ∗ m ) , (100)and we shall also use the following local dot product:Ψ · Φ . = Ψ † ( x )Φ( x ) = Ψ ∗ a ( x )Φ a ( x ) . (101)Under this dot product, a matrix A with mixed-index el-ements A ab is self-adjoint [( A Ψ) · Φ = Ψ · ( A Φ)] if the ma-trix with the corresponding lower-index elements, A ab , isHermitian. The diﬀerence between Hermitian and self-adjoint matrices can be ignored if the metric γ is Eu-clidean or pseudo-Euclidean. B. Operators on H nm Any operator b A on H nm can be understood as an m × m matrix with elements b A ab which are operators on H n . Forany given b A ab , we introduce b A ab . = γ ac b A cb , which is alsoan operator on H n . Correspondingly, two types of Weylimages can be deﬁned, namely, W z [ b A ab ] and W z [ b A ab ]. Be-low, we assume the notation A ab . = W z [ b A ab ] , A ab . = γ ac A cb . (102)Also, A will be an index-free notation of the matrix withelements A ab , and A will be an index-free notation of thematrix with elements A ab . Importantly, A ab should notbe confused with W z [ b A ab ], because the Weyl symbols donot transform as tensors, W z [ b A ab ] = γ ac W z [ b A cb ] , (103)except for those which are independent of p . Symbols W z [ b A ab ] will not be used below, so no special notation isintroduced for them.Note that by Eq. (98), the following equality holds forany Φ, Ψ, and b A on H nm : h Ψ | b A Φ i m = Z d n x √ g ⋄ γ ab Ψ a ∗ ( b A bc Φ c )= Z d n x √ g ⋄ Ψ a ∗ ( b A ab Φ b )= h Ψ a | b A ab Φ b i , (104) where we invoked the deﬁnition of the inner product on H n [Eq. (10)]. By deﬁnition of the adjoint operator b A † , h Ψ | b A Φ i m = h b A † Ψ | Φ i m = Z d n x √ g ⋄ γ cb [( b A † ) ca Ψ a ] ∗ Φ b = Z d n x √ g ⋄ [( b A † ) ba Ψ a ] ∗ Φ b . (105)In order to express b A † through b A , let us represent b A ab in terms of the corresponding Weyl image A ab and theWigner operator b ∆ z on H n , with z ≡ ( x , p ) (Sec. IV A 3).Then, Eq. (105) leads to h Ψ | b A Φ i m = Z d n z ′ (2 π ) n A ab ( z ′ ) h Ψ a | b ∆ z ′ Φ b i = Z d n z ′ (2 π ) n h A ∗ ab ( z ′ ) b ∆ z ′ Ψ a | Φ b i , (106)where we used that b ∆ z ′ is self-adjoint on H n . By com-paring this with Eq. (105), one ﬁnds that, on one hand,( b A † ) ba = Z d n z ′ (2 π ) n A ∗ ab ( z ′ ) b ∆ z ′ . (107)On the other hand, by Eq. (27) for the inverse Wigner–Weyl transform, one has( b A † ) ba = Z d n z ′ (2 π ) n W z ′ [( b A † ) ba ] b ∆ z ′ . (108)Hence, the Weyl image of b A † is simply the conjugatetranspose of the Weyl image of b A in the sense that W z [( b A † ) ab ] = A ∗ ba ( z ) . (109)In other words, for operators with both indices lowered ,the operations W z and † commute.Equation (109) implies that, if b A is self-adjoint on H nm ,then A is a Hermitian matrix, and thus the correspond-ing matrix A is self-adjoint (Sec. V A). In particular,this means that the Hermitian and anti-Hermitian parts[Eq. (26)] of an operator are determined by, respectively,the Hermitian and anti-Hermitian parts of its lower-indexWeyl symbol. We summarize this as follows:( A H ) ab = 12 γ ac ( A cb + A ∗ bc ) , (110)( A A ) ab = 12 i γ ac ( A cb − A ∗ bc ) . (111)For lower-index matrices, the Hermitian and anti-Hermitian parts are deﬁned as usual:( A H ) ab = 12 ( A ab + A ∗ ba ) , (112)( A A ) ab = 12 i ( A ab − A ∗ ba ) . (113)1 C. Weyl expansion of the dispersion operator

Now, let us consider the dispersion operator b D in par-ticular. According to the above deﬁnition, it is viewedas an m × m matrix with elements b D ab . By lowering theindex in the envelope equation (1), one can write thisequation as follows: b D ab Ψ b = 0 , a = 1 , , . . . m, (114)where b D ab . = γ ac b D cb . Equivalently, this can be writtenas the envelope equation b D ab ψ b = 0 , b D ab . = e − iθ b D ab e iθ . (115)Then, the approximate operators b D ab can be expressedjust like b D for scalar waves (Sec. IV B), b D ab ≈ D ab + 12 [ b p µ ( V µ ) ab + ( V µ ) ab b p µ ] + 12 b p µ (Θ µν ) ab b p ν . (116)Here, the coeﬃcients are x -dependent matrices; speciﬁ-cally, D ab ( x ) . = D ab (x , k ( x ) ) , and[ V µ ( x )] ab . = ∂D ab (x , k ( x ) ) ∂k µ , (117)[Θ µν ( x )] ab . = ∂ D ab (x , k ( x ) ) ∂k µ ∂k ν . (118)One can also multiply Eq. (115) by γ − to raise the ﬁrstindex. Then, one obtains b D ab ψ b = 0 , b D ab . = γ ac b D cb . (119)Note that if b D is self-adjoint, then D ab is Hermitian;then, b D ab is self-adjoint too.Unlike in scalar waves (Sec. IV C), D ab ( x , k ) is not acovariant object, i.e., in this case, not a true tensor. (Forexample, see Sec. VI.) It is convenient to split it as D ab ( x , k ) = D ab ( x , k ) + C ab ( x , k ) , (120)where D is a true tensor, and C = O ( ∂ P ) is a smallremainder. [Here ∂ P means the same as in Sec. IV C, so C = O ( ǫ ) under the GO ordering, and C = O ( ǫ k ) underthe quasioptical ordering.] The splitting Eq. (120) is notunique and is a matter of convenience; for example, onecan deﬁne D ( x , k ) simply as the zeroth-order dispersiontensor. Hence, we can redeﬁne[ V µ ( x )] ab . = ∂ D ab (x , k ( x ) ) ∂k µ , (121)[Θ µν ( x )] ab . = ∂ D ab (x , k ( x ) ) ∂k µ ∂k ν , (122)which is equivalent to the above deﬁnitions within the ac-curacy of our theory. Then, assuming the same orderingsas in the scalar-wave case (also see below), the following approximation is enough both for the ﬁrst-order theoryand for the quasioptical theory: b D ≈ D H + b D , (123) b D = C H + iD A + b D H + b D H . (124)Here, D ≡ D ( x ), and D = γ − D , so D ab ( x ) = γ ac D cb ( x );similar conventions are assumed for C and D . Accord-ingly, the matrices D H and D A are self-adjoint, and soare the operators b D H and b D H , which are given by b D H . = 12 γ − ( b p µ V µH + V µH b p µ ) , (125) b D H . = 12 γ − b p µ Θ µνH b p ν . (126)Using [ γ − , b p µ ] = i ( γ − ) ,µ and ( γ − γ ) ,µ = 0 to obtain( γ − ) ,µ γ = − γ − γ ,µ ≡ − ℓ µ , (127)one can also express these as b D H = 12 [ b p µ V µH ( x ) + V µH ( x ) b p µ − iℓ µ V µH ] , (128) b D H ≈ b p µ Θ µνH ( x ) b p ν . (129)(The term ∝ ℓ µ must be kept in b D H but a similar termin b D H can be neglected.) Also, V µH ( x ) . = γ − V µH ( x ) , Θ µνH ( x ) . = γ − Θ µνH ( x ) , (130)and since γ is k -independent, this leads to[ V µH ( x )] ab . = ∂ ( D H ) ab (x , k ( x ) ) ∂k µ , (131)[Θ µνH ( x )] ab . = ∂ ( D H ) ab (x , k ( x ) ) ∂k µ ∂k ν . (132)If b D H is neglected, the envelope equation (119) be-comes a Dirac-type equation similar to those consideredin the context of XGO [24–28] and also, for instance, inRefs. [58, 59]. We shall also discuss this limit in Sec. V E. D. Active and passive modes

1. Basis from the eigenvectors of D H Let us assume the GO ordering as in Eq. (63), exceptthat D H and D A are now matrices. Then, b D = O ( ǫ ), sothe envelope equation [Eq. (5)] acquires the form D H ( x ) ψ = O ( ǫ ) . (133)Hence, it is convenient to express ψ through the eigen-vectors of D H ( x ). Let us denote these eigenvectors as η s ( x ) and the corresponding eigenvalues as Λ s ( x ), so D H η s = Λ s η s . (134)2[Note that η s ( x ) = η s (x , k ( x ) ) , where η s ( x , p ) are theeigenvectors of D H ( x , p ). Likewise, Λ s ( x ) = Λ s (x , k ( x ) ) ,where Λ s ( x , p ) are the eigenvalues of D H ( x , p ).] Since D H is self-adjoint, it has m eigenvectors η s that forma complete basis { η s } . Let us also introduce the corre-sponding dual basis { η s } as usual, i.e., such that η s · η s ′ = δ ss ′ . (135)Then, we can represent ψ as ψ = η s a s , a s = η s · ψ ≡ ( η s ) ∗ a ψ a , (136)where a s serve as the components of ψ in the basis { η s } .[Remember that the dot product (101) includes conju-gation.] Since D H is self-adjoint, it is always possibleto make the basis { η s } orthonormal, and this choice isassumed below. Then, η s · η s ′ = δ ss ′ ; thus η s = η s , i.e.,( η s ) a = ( η s ) a = γ ab ( η s ) b . (137)Also, for any two ﬁelds ψ = η s a s and φ = η s b s , the innerproduct (98) can be written as follows: h ψ | φ i m = Z d n x p g ⋄ ( x ) a ∗ s ( x ) b s ( x ) , (138)where a s . = δ ss ′ a s ′ . The matrix δ with elements δ ss ′ serves as a Euclidean metric for manipulating the modeindices s and s ′ . Those should not to be confused withthe coordinate indices denoted with Greek letters andalso with other Latin indices that are manipulated bythe metric γ [Eq. (99)].

2. Amplitude vectors

The physical meaning of the expansion coeﬃcients a s is understood as follows. Consider multiplying Eq. (133)by η s from the left. That gives Λ s a s = O ( ǫ ), where nosummation over s is assumed. This shows that, for agiven s , there are two possibilities: either a s is small orΛ s is small. In the ﬁrst case, the polarization η s doesnot correspond to a propagating wave mode per se ; thesmall nonzero projection of ψ on η s is only due to the factthat the wave ﬁeld is not strictly sinusoidal in inhomo-geneous medium. We call such “modes” passive. In thesecond case, the local k ( x ) approximately satisﬁes the lo-cal dispersion relation Λ s (x , k ( x ) ) ≈

0. Then, a s can beunderstood as the local scalar amplitude of an actual GOmode, so that a s = O (1) is allowed. We call such modesactive, and mode conversion occurs when more than oneactive mode exists. In other words, for a given boundaryor initial conditions, active modes are those that are ex-cited resonantly, while passive modes are those that arenonresonant and, thus, adiabatically isolated. (However,this does not mean that the passive-modes amplitudesare simply negligible; see below.)Assuming that there are N ≥ s = 1 , , . . . , N and the remaining ¯ N = m − N modes with s = ( N + 1) , ( N + 2) , . . . , m are passive. Let us alsoadopt the notation¯ a s = a s + N , ¯ η s = η s + N , ¯Λ s = Λ s + N (139)( s = 1 , , . . . , ¯ N ) for the passive-mode amplitudes, po-larizations, and eigenvalues. Then, it is convenient tointroduce the following “amplitude vectors” a . =  a ... a N  , ¯ a . =  ¯ a ...¯ a ¯ N  (140)and the corresponding row vectors that are dual to theamplitude vectors under the Euclidean complex dot prod-uct; namely, a † = ( a ∗ , a ∗ , . . . , a ∗ N ) , ¯ a † = (¯ a ∗ , ¯ a ∗ , . . . , ¯ a ∗ ¯ N ) , (141) a s . = δ ss ′ a s ′ , ¯ a s . = δ ss ′ ¯ a s ′ . (142)Below, we seek to derive an approximate envelope equa-tion in terms of these amplitude vectors.

3. Polarization matrices

Before we proceed, let us introduce the following no-tation. First, consider the “polarization matrices”Ξ = ( η , η , . . . , η N ) , ¯Ξ = (¯ η , ¯ η , . . . , ¯ η ¯ N ) . (143)These are non-square matrices that have active- andpassive-mode polarizations as their columns; namely,Ξ as = ( η s ) a , ¯Ξ as = (¯ η s ) a . (144)Using these, Eq. (136) can be rewritten compactly as ψ = Ξ a + ¯Ξ¯ a. (145)Also consider the auxiliary polarization matricesΞ + . =  η ∗ ... η N ∗  , ¯Ξ + . =  ¯ η ∗ ...¯ η ¯ N ∗  , (146)which have the η s ∗ as their rows,(Ξ + ) sa = ( η s ) ∗ a , (¯Ξ + ) sa = (¯ η s ) ∗ a . (147)Since( ψ † ) a = ( η s ) ∗ a a s ∗ +(¯ η s ) ∗ a ¯ a s ∗ = ( η s ) ∗ a a ∗ s +(¯ η s ) ∗ a ¯ a ∗ s , (148)one obtains ψ † = a † Ξ + + ¯ a † ¯Ξ + . (149)3However, note that in general, + does not mean to rep-resent the adjoint in the common sense [cf. Eq. (154)].Next, notice that(Ξ + Ξ) ss ′ = ( η s ) ∗ a ( η s ′ ) a = η s · η s ′ = δ ss ′ , (150)where the indices s and s ′ span from 1 to N . Analogousformulas apply to ¯Ξ. Then,(¯Ξ + Ξ) ss ′ = (¯ η s ) ∗ a ( η s ′ ) a = ¯ η s · η s ′ = 0 , (151)and similarly for Ξ + ¯Ξ. In other words, one hasΞ + Ξ = , ¯Ξ + ¯Ξ = , ¯Ξ + Ξ = , Ξ + ¯Ξ = , (152)where is the unit square matrix and is the zero squarematrix, correspondingly. [The dimensions of and arediﬀerent in diﬀerent equations.] Hence, if one multipliesEq. (145) by Ξ + and, separately, by ¯Ξ + , one obtainsconcise formulas for the amplitude vectors a and ¯ a , a = Ξ + ψ, ¯ a = ¯Ξ + ψ. (153)Another property that we shall use later on is(Ξ + ) sa = δ ss ′ ( η s ′ ) ∗ a = δ ss ′ ( η s ′ ) b ∗ γ ab = ( δ − Ξ H γ ) sa , and similarly for ¯Ξ + . Here, H denotes the conjugatetranspose (Ξ H . = Ξ ⊺ ∗ ), and δ − is the Kronecker matrixwith upper-index elements δ ss ′ . Hence,Ξ + = δ − Ξ H γ, ¯Ξ + = δ − ¯Ξ H γ. (154)The factor δ − can be omitted if one ignores the diﬀer-ence between upper and lower indices s and s ′ that referto the mode number. If γ is Euclidean, one can also ig-nore the diﬀerence between upper and lower coordinateindices; then, Ξ + = Ξ H .As a side remark, note that Ξ and ¯Ξ are generally non-square, so Π . = ΞΞ + and ¯Π . = ¯Ξ¯Ξ + are not unit matricesbut rather projectors . Indeed, due to Eqs. (152), one hasΠ = Π and ¯Π = ¯Π. Also, by applying Π and ¯Π toEq. (145), one obtainsΞ a = Π ψ, ¯Ξ a = ¯Π ψ. (155)Thus, Π ψ is the projection of ψ on the active-mode space,and ¯Π ψ is the projection of ψ on the passive-mode space.

4. Equation for the active modes

Using the eigendecomposition theorem and Eq. (152),one obtains D H = η s Λ s η s = ΞΛΞ + + ¯Ξ ¯Λ¯Ξ + , (156)where Λ and ¯Λ are the diagonal eigenvalue matrices,Λ . = diag { Λ , Λ , . . . , Λ N } = O ( ǫ ) , (157)¯Λ . = diag { ¯Λ , ¯Λ , . . . , ¯Λ ¯ N } = O (1) . (158) These and Eq. (152) also lead to D H Ξ = ΞΛ = O ( ǫ ) , D H ¯Ξ = ¯Ξ¯Λ = O (1) . (159)Hence, the envelope equation (5) can be written as0 = D H Ξ a + D H ¯Ξ¯ a + b D Ξ a + b D ¯Ξ¯ a = ΞΛ a + ¯Ξ ¯Λ¯ a + b D Ξ a + b D ¯Ξ¯ a, (160)where we used Eq. (159). Let us multiply this by Ξ + and,separately, by ¯Ξ + . Then, due to Eq. (152), one obtainsΛ a + Ξ + b D Ξ a + Ξ + b D ¯Ξ¯ a = 0 , (161)¯Λ¯ a + ¯Ξ + b D Ξ a + ¯Ξ + b D ¯Ξ¯ a = 0 . (162)Let us also use Eq. (162) to express ¯ a through a andsubstitute the result into Eq. (161). Since ¯ a and b D areboth of order ǫ , it is suﬃcient to solve Eq. (162) for ¯ a approximately to the leading order,¯ a ≈ − ¯Λ − ¯Ξ + b D Ξ a. (163)Then, Eq. (161) becomes an equation for just the N -dimensional active-mode amplitude vector,(Λ + Ξ + b D Ξ − Ξ + b D ¯Ξ ¯Λ − ¯Ξ + b D Ξ) a = 0 . (164)As a reminder, this equation is valid up to O ( ǫ ). E. Leading-order approximation:extended geometrical optics

Upon substituting Eqs. (124) into Eq. (164), we obtainthe following equation to lowest (ﬁrst) order in ǫ :(Λ + i Γ + b K ) a = 0 , (165)where we introduced Γ . = Ξ + D A Ξ , (166) b K . = Ξ + ( b D H + C H )Ξ . (167)As shown in Appendix B 1, b Ka = − iV µ a ,µ − i V µ ; µ a − U a, (168) V µ . = Ξ + V µH Ξ ≈ Λ | µ , (169) U = δ − (Ξ H ,µ V µH Ξ) A − Ξ + C H Ξ , (170)where Ξ is considered as a function of x . [As a reminder,Ξ H . = Ξ ⊺ ∗ is the conjugate transpose of Ξ; V µH is givenby Eq. (121); and δ − is a unit matrix that only raisesthe mode index.] Alternatively, Ξ can be considered as afunction of ( x , k ). Then, as shown in Appendix B 2, U ≈ δ − (Λ | µ Ξ H γ Ξ | µ − Λ | µ Ξ H γ Ξ | µ + Ξ H | µ D H Ξ | µ ) A − Ξ + C H Ξ , (171)4where the partial derivatives | µ and | µ are deﬁned inSec. IV C. The term U is the Stern–Gerlach Hamilto-nian mentioned in the introduction (Sec. I), except hereit is generalized to an arbitrary metric. This term causespolarization-driven bending of the ray trajectories, whichis missed in traditional GO; also, it causes mode con-version, if more than one active mode is present [24–28].These eﬀects are is discussed in further detail in Sec. V G.More explicitly, Eq. (165) can be written asΛ a − iV µ a ,µ − i V µ ; µ a = ( U − i Γ) a. (172)This generalizes the XGO equation derived in Refs. [24–29] to dissipative waves and curved coordinates. Sincethe vector a belongs to the space where the metric isEuclidean by deﬁnition (Sec. V D 2), the elements of thisvector are covariant, i.e., invariant with respect to co-ordinate transformations in the physical space. Sameapplies to the eigenvalues comprising Λ; thus, the wholeleft-hand-side of Eq. (172) is covariant too. This meansthat the operator on the right-hand side is also covariant,which includes its Hermitian and anti-Hermitian partsseparately. Hence, U and Γ are covariant.Also note that Eq. (172) is similar to Eq. (66) for scalarwaves and has a similar corollary, J µ ; µ = − a † Γ a, J µ . = − a † V µ a. (173)Using Eqs. (100), (130), (145), (149), (155), and (169)one obtains − J µ = a † Ξ + V µH Ξ a = (Π ψ ) † V µH (Π ψ ) ≈ ψ † V µH ψ = ψ ∗ V µH ψ, (174)so Eq. (173) can be viewed as a generalization of Eq. (67).At zero Γ, the dynamics is conservative, and Eq. (173)reﬂects conservation of the wave action, or quanta [53].Accordingly, Γ determines the dissipation rate, J µ canbe identiﬁed (up to a constant factor) as the action ﬂuxdensity summed over all active modes (for example, seeSec. VI B), and the elements of the diagonal matrix Λ | µ are proportional to the group velocities of the correspond-ing active modes. We shall call them the group velocities(without “proportional to”) for brevity. We also empha-size that this model applies even when the group veloc-ities of the active modes are very diﬀerent, unlike thequasioptical model discussed in Sec. V F.For scalar waves studied in Sec. IV, we had Λ = D H and we deﬁned k ( x ) such that this term be zero[Eq. (64)]. Now, Λ is a diagonal matrix with N ≥ k ( x ). Hence, therecan be more than one natural way to deﬁne k ( x ). Oneaesthetically pleasing and convenient [29] option is torequire that Λ or Λ − U be traceless . This amountsto choosing the reference-ray Hamiltonian in Sec. III as H = tr Λ /N or H = tr (Λ − U ) /N , correspondingly. [Ar-bitrary constant factors can be introduced instead of N , for that only redeﬁnes τ in Eq. (7).] Another option isto adopt Λ s = 0 for some single s ≤ N , since all activemodes have close wave vectors anyway; then H = Λ s .As mentioned in Sec. III, all such choices of k ( x ) areequally justiﬁed. Although they lead to slightly diﬀer-ent envelope equations, those equations are equivalentin the sense that they all describe the same total ﬁeldby construction. (One might call this a gauge freedom.)However, for the case N = 1, choosing H = Λ − U ispreferable, as discussed in Sec. V G 2. F. Quasioptical model

1. Quasioptical equation

Suppose that a wave propagates largely as a singlebeam. This implies that all active modes have group ve-locities close to their average group velocity V avr , whichcan be deﬁned via N V µ avr . = tr Λ | µ . Then, V µ = V µ avr + ∆ V µ , ∆ V µ . = V µ − V µ avr , (175)where ∆ V µ are matrices with eigenvalues much smallerthan V avr . Let us assume a ray-based coordinate systemaligned with V avr with the inner product on the trans-verse space M n − ⊥ deﬁned similarly to Eq. (76) [cf. alsoEq. (138)], namely, h a | b i ⊥ N . = Z d n − ̺ p h ⋄ ( ζ, ̺ ) a ∗ s ( ζ, ̺ ) b s ( ζ, ̺ ) . (176)Let us also adopt the quasioptical ordering as we did forscalar waves in Sec. IV E. In particular, we allow ∆ V µ = O ( ǫ ⊥ ). Then, Eq. (164) becomes(Λ + i Γ + b K + b G ) a = 0 . (177)Here, b K is deﬁned as in Eq. (167) and can be approxi-mated as follows: b Ka = − iV a ,ζ − i √ g ⋄ ( √ g ⋄ V ) ,ζ a + ∆ b Ka, (178)∆ b Ka . = − U a − i ∆ V σ a ,σ − i √ h ⋄ ( p h ⋄ ∆ V σ ) ,σ a, (179)∆ V σ = Ξ + D | σH Ξ , (180)where V . = V (we used the fact that V σ avr = 0 by deﬁ-nition of the ray-based coordinates), U = O ( ǫ k ) is givenby Eq. (171), and Γ is given by Eq. (166). Finally, b G . = Ξ + b D H Ξ − Ξ + b D H ¯Ξ ¯Λ − ¯Ξ + b D H Ξ . (181)As shown in Appendix B 3, b G can also be simpliﬁed as b G a ≈ − h / ⋄ [Λ | σ ˜ σ ( h / ⋄ a ) , ˜ σ ] ,σ , (182)5so it is also self-adjoint under the inner product (176).In summary then, the quasioptical equation for vectorwaves can be written as follows:Λ a + i Γ a − iV a ,ζ − i √ g ⋄ ( √ g ⋄ V ) ,ζ a + b G a + ∆ b Ka = 0 . (183)Equation (183) is the main result of our paper. LikeEq. (172), this equation is covariant within the accuracyof our theory. As a reminder, a is generally a vector(dim a = N ≥ D H ; Γ is given byEq. (166); V = N − tr Λ | ζ is a scalar; | ζ . = ∂/∂k ; b G isgiven by Eq. (182); and ∆ b K is given by Eq. (179). There, U is given by Eq. (171), and ∆ V is given by Eq. (180).The derivatives of h ⋄ could be neglected within the as-sumed accuracy, but a purist might want to retain themin order to keep ∆ b K and b G precisely self-adjoint underthe inner product (176).Like in Sec. IV E 2, we require the α parameter of theray-based coordinates to be deﬁned as ̺ -independent.Then, Eq. (183) has the following corollary:1 α dd ζ h a | αV | a i ⊥ N = 2 h a | Γ | a i ⊥ N , (184)which is similar to Eq. (84). For the case of zero Γ,Eq. (184) predicts conservation of the wave-action ﬂuxthrough the beam cross section, h ψ | αV | ψ i ⊥ N = const . (185)Unlike in Eq. (85), this is the ﬂux of all active modescombined, and the ﬂuxes of individual active modes maynot be conserved separately.

2. Simpliﬁed equations

If the metric is nearly Euclidean (or pseudo-Euclidean)as in Sec. IV E 3, we can drop the metric factors andrewrite Eq. (183) as follows:Λ a + i Γ a − iV a ,ζ − i V ,ζ a −

12 (Λ | σ ˜ σ a , ˜ σ ) ,σ − U a − i ∆ V σ a ,σ − i V σ,σ a = 0 . (186)Using the variable transformation φ . = √ V a , this equa-tion can be further simpliﬁed as iφ ,ζ = b χφ + i Υ φ. (187)Here, b χ is an operator self-adjoint under the inner prod-uct (176) and given by b χφ ≈ Qφ −

12 (Σ σ ˜ σ φ , ˜ σ ) ,σ − ∆ V σ V iφ ,σ − i (cid:18) ∆ V σ V (cid:19) ,σ φ. Also, Q , Σ, and Υ are self-adjoint matrices given by Q . = (Λ − U ) /V, Σ σ ˜ σ . = Λ | σ ˜ σ /V, Υ . = Γ /V. (188)Accordingly, Eq. (184) becomesdd ζ h φ | φ i ⊥ N = 2 h φ | Υ | φ i ⊥ N , (189)so h φ | φ i ⊥ N is conserved if Υ is zero. G. Summary and discussion

The quasioptical model proposed above is equallyapplicable to scalar beams, single-mode vector beams( N = 1), and multi-mode vector beams ( N >

1. Scalar beams

In the case of a scalar beam (Sec. IV), one has Ξ = 1, soΛ = D H , Γ = D A , U = 0 , ∆ V σ = 0 . (190)Then, the equations from Sec. IV E are reproduced. Inparticular, Λ is made zero by the choice of k [Eq. (64)].This implies that the ray Hamiltonian is H = Λ, whichleads to the following ray equations:d x α d τ = ∂ Λ ∂k α , d k α d τ = − ∂ Λ ∂x α . (191)

2. Single-mode vector beams

In the case of a single-mode vector beam ( N = 1), onehas Ξ = η , where η is the polarization vector. Then,Λ = η † D H η, Γ = η † D A η, (192)so they are scalars. Also, ∆ V σ = 0, but U is generally anonzero scalar function. There are two natural ways todeﬁne k in this case. One is to require that Λ = 0, i.e.,to adopt H = Λ. In this case, U is left in the envelopeequation and can intensify the envelope inhomogeneity inthe direction perpendicular to the group velocity. Thismay eventually result in the violation of the envelopeapproximation. (For multi-mode beams, this is less ofa concern because they typically split into single-modebeams before the issue becomes signiﬁcant.) Hence, itis potentially advantageous to deﬁne k by requiring Λ − U = 0, i.e., by adopting H = Λ − U . In this case, theamplitude equation is identical to that of a scalar wavewhile the eﬀect of U is absorbed in the reference phase,leading to modiﬁed ray equations,d x α d τ = ∂ (Λ − U ) ∂k α , d k α d τ = − ∂ (Λ − U ) ∂x α . (193)6The eﬀect of U on the ray trajectories is known as the(spin) Hall eﬀect of light in optics [22, 23, 60–63]. Inplasma physics, this eﬀect is typically neglected. (Toour knowledge, all existing ray-tracing codes ignore U entirely.)Finally, note that the ray dynamics can also be rep-resented in an alternative form. Since Λ is a scalar, onecan rewrite Eq. (171) for U ( x , k ) as follows: U = U − Λ | µ Im ( η † η | µ ) + Λ | µ Im ( η † η | µ ) ≈ U − ˙ x µ A ( x ) µ − ˙ k µ A µ ( k ) . (194)Since there is only one mode, there is no mode index tomanipulate, so the “metric” factor δ has been dropped.Likewise, the anti-Hermitian parts are simply the imagi-nary parts in this case. We also substituted the GO rayequations (191), where we replaced d / d τ with dots, andadopted U . = Im ( η H | µ D H η | µ ) − η H C H η, (195) A ( x ) µ . = Im ( η † η | µ ) , A µ ( k ) . = Im ( η † η | µ ) . (196)Then, the phase-space Lagrangian of a ray, L [ x , k ] = k µ ˙ x µ − H , has a non-canonical structure, namely, L = (cid:2) k µ + A ( x ) µ (cid:3) ˙ x µ + A µ ( k ) ˙ k µ − (Λ − U ) . (197)The equations originating from a special case of this non-canonical phase-space Lagrangian were used, for exam-ple, in Ref. [61] to study the Hall eﬀect for light propa-gating in non-birefringent material. A comparison of theresulting non-canonical ray equations with the canonicalray equations (193) can be found in Ref. [27]. Also no-tably, the terms A ( x ) µ and A µ ( k ) are known as the Berryconnection and are linked to the Berry phase, or the ge-ometrical phase of light [27, 46, 61].

3. Multi-mode vector beams

In the case of multiple active modes (

N > a is an N -dimensional vector, and the coeﬃcients in the amplitudeequation are matrices (except for V , which is a scalar).In particular, the matrices Γ, ∆ V σ , and U are generallynondiagonal, so they can cause mode conversion. In thesimplest case, when both the transverse gradients anddissipation are negligible, this process is most transpar-ently described by Eq. (187), which then becomes iφ ,ζ = Qφ, (198)with Q being a self-adjoint matrix. The modes decou-ple when Q is close to diagonal. More generally, thedynamics governed by Eq. (198) is similar to that of an N -level quantum system with a Hamiltonian Q (and tothe dynamics of N coupled classical oscillators whose pa-rameters do not change too rapidly with ζ [64]). Alter-natively, it can be mapped to a precession equation for a real ( N − N = 2, when the spinis three-dimensional. A detailed analysis of mode con-version for this case can be found in Ref. [29].As a reminder, the model presented here (Sec. V F) re-lies on the assumption that the group velocities of the ac-tive modes are close to each other. Otherwise, beam split-ting occurs rapidly, and the assumption that a ,ζ ≪ a ,σ does not hold. The quasioptical description is inapplica-ble to such beams; however, the ﬁrst-order XGO modeldescribed in Sec. V E can be used. VI. ELECTROMAGNETIC WAVESA. Covariant formulation

Here, we shall explain how the above theory applies toEM waves. We start with Maxwell’s equation [65]1 √ g ⋄ ∂ α ( √ g ⋄ F αβ ) = − πc J β , (199)where F αβ is the EM tensor, namely, F αβ = g αµ g βν F µν , F αβ . = ∂ α A β − ∂ β A α , (200) A is the vector potential, and J is the current density.This equation can also be represented as follows: b D ( A )vac A = − πc J. (201)Here b D ( A )vac is the vacuum dispersion operator,[ b D ( A )vac ] αβ A β = 1 √ g ⋄ ∂ λ (cid:8) √ g ⋄ g αµ g λν × [ ∂ ν ( g µβ A β ) − ∂ µ ( g νβ A β )] (cid:9) , (202)which is self-adjoint under the inner product (98). Byassuming J = c b ϑA , were b ϑ is some linear operator, onecan cast the equation for A in the form (1), namely,[ b D ( A )vac + 4 π b ϑ ] A = 0 . (203)Hence, the theory developed in the previous sectionsreadily applies, speciﬁcally, with the dimension of theconﬁguration space being n = 4, the dimension ofthe vector ﬁeld A being m = 4 (so m = n ), and γ = g . The corresponding Weyl symbols are calcu-lated as follows. It can be shown straightforwardly that[ b D ( A )vac ] αβ . = g αγ [ b D ( A )vac ] γβ is approximately given by[ b D ( A )vac ] αβ ≈ b p α b p β − b p µ g µν g αβ b p ν + i ( q α b p β − q β b p α ) + i [( ℓ β ) µα − ( ℓ α ) µβ ] b p µ . (204)Here, we omitted second-order derivatives of g , whichare negligible within the accuracy of our theory. Also, q is deﬁned as in Eq. (15), and ℓ µ . = g − g ,µ , which is in7agreement with Eq. (127) because g = γ . Finally, in theform introduced in Sec. V C, the above result is[ D ( A ) ] αβ ≈ p α p β − g αβ g µν p µ p ν + 4 π ( ϑ ) αβ ,C ( A ) αβ ≈ i ( q α k β − q β k γ ) + i [( ℓ β ) µα − ( ℓ α ) µβ ] k µ + 4 π (∆ ϑ ) αβ , where ( ϑ ) αβ is the GO limit of ϑ , (∆ ϑ ) αβ is the remain-ing part of ϑ , and ϑ itself is the symbol of b ϑ .As a side remark, these equations can be used to calcu-late the Hall eﬀect of light in gravitational ﬁelds, i.e., thedeviation of vacuum light rays from geodesics in curvedspacetime. (Special cases of this eﬀect are discussed inRefs. [23, 62, 63].) B. Non-covariant formulation

As a special case, suppose a metric of the form g = (cid:18) − h (cid:19) (205)with time-independent spatial metric h . Then, the aboveequations can be simpliﬁed as follows. Let us assume theWeyl gauge ( A = 0). Then, it is suﬃcient to considerjust the spatial part of the four-dimensional Eq. (203)and replace the vector potential with the electric ﬁeld E a . = F a = − i b p A a . (206)(As usual, b p . = − ic − ∂ t , which is a self-adjoint operator.Assuming we work with wave ﬁelds that have zero timeaverage, b p can also be considered reversible.) Unlike inthe previous sections, we now use the Latin indices todenote spatial components, and we shall adopt the boldfont for spatial vectors and matrices when using index-free notation. Speciﬁcally, one obtains b D ( E ) E = 0 , (207) b D ( E ) . = ( b p ) − [ b D ( A )vac + 4 π b ϑ ]( b p ) − , (208)where we also multiplied the equation by ( b p ) − . Thishas the form (1) with n = 4, m = 3, and γ = h .Let us introduce the conductivity operator b σ via J = b σ E and notice that 4 π b σ = ic b p b χ by deﬁnition of thesusceptibility operator b χ . Then, 4 π b ϑ = b p b χ b p , so b D ( E ) can be expressed as follows: b D ( E ) . = ( b p ) − b D ( A )vac ( b p ) − + b χ , (209)or equivalently, b D ( E ) = ( b p ) − b D ( A )vac + b χ . The correspond-ing Weyl image is D ( E ) ab = ( p ) − [ D ( A )vac ] ab + χ ab , or D ( E ) ab = ( p ) − (cid:2) p a p b − p c p d h cd h ab + C ( A ) ab (cid:3) + ε ab , (210)where ε ab . = h ab + ( χ ) ab serves as the dielectric tensorand χ is the GO limit of χ . Using the fact that the dispersion operator is deﬁnedonly up to a constant factor, let us introduce an addi-tional factor 1 / (16 π ). Then, Eq. (210) becomes D ( E ) ab = D ( E ) ab + C ( E ) ab , (211) D ( E ) ab = 116 πp ( p a p b − p c p d h cd h ab ) + ε ab π , (212) C ( E ) ab . = 116 πp C ( A ) ab . (213)The quantity C ( A ) ab was derived in Sec. VI A, and the vec-tor q [Eq. (15)] and the matrices ℓ a [Eq. (127)] can bewritten as q a = 14 ∂ a ln | det h | , ℓ a = h − ∂ a h , (214)so they both can be of order ǫ k ; hence, C = O ( ǫ k ). Notethat such C is generally non-negligible even for near-Euclidean metrics such as those discussed in Sec. IV E 3.From Eq. (174), we obtain J α = − E a ∗ E b ∂∂k α [ D ( E ) H ( t, x , ω, k )] ab , (215)becomes as the true action-ﬂux density [53]. As a re-minder, the action density I . = J can be written as I ≈ E a ∗ E b πω ∂∂ω (cid:8) ω [ ε H ( t, x , ω, k )] ab (cid:9) , (216)where we used [ D ( E ) H ] ab E b ≈

0. Using the latter andFaraday’s law B ≈ c k × E /ω for the magnetic ﬁeld B ,one can also rewrite I in another common form [4], I ≈ E a ∗ E b πω ∂∂ω (cid:8) ω [ ε H ( t, x , ω, k )] ab (cid:9) + B ∗ a B a πω . (217)One can also show [53] that the spatial component of theaction ﬂux density (215) can be expressed as J ≈ S ω − E a ∗ E b π ∂∂ k [ ε H ( t, x , ω, k )] ab , (218)where S is the (time- or space-averaged) Poynting vector, S = c π Re ( E × B ∗ ) . (219)The GO equation (173) can be written as ∂ I ∂t + 1 √ h ⋄ ∂∂x a ( p h ⋄ J a ) = − E a ∗ E b π [ ε A ( t, x , ω, k )] ab . The corresponding density of the wave energy is ω I , andthe density of the energy ﬂux is ω J , as ﬂows from thevariational principle [53].8 C. Stationary waves

For stationary waves with ﬁxed frequency, one has b p = − ω/c and ω can be treated as a constant parame-ter. Such waves can be studied on the three-dimensionalconﬁguration space, namely, the physical space x . In thiscase, n = m = 3 and γ = h . If there is no spatial dis-persion, then b χ = χ ( b x ), where the dependence on ω isassumed but not emphasized. Assuming that the under-lying space is ﬂat (which is always the case in laboratoryapplications), the symbol χ ab in this case is identical tothe susceptibility of the homogeneous medium. (Otherformulations of the susceptibility have also been proposedin this case, e.g., for light in nondispersive dielectric me-dia [27] and for waves in cold plasmas [24, 25, 66].)These properties can also be extended, approximately,to media with weak spatial dispersion, i.e., such that χ ab = χ (0) ab ( x ) + χ ( k ) ab ( x , p ) , χ ( k ) ab ≪ χ (0) ab . (220)For example, in plasma, χ ( k ) ab is proportional to the tem-perature, and for EM waves, thermal eﬀects often canbe considered as small perturbations [4]. Although theWeyl symbol χ ( k ) ab is, strictly speaking, diﬀerent from thecorresponding part of the susceptibility in homogeneousmedium, the diﬀerence is of order ǫ k χ ( k ) ab , so it is muchsmaller than ǫ k and thus can be neglected. In otherwords, one can simply replace the small χ ( k ) ab with itsGO limit, which is usually well known [4].In principle, our general method is also applicable towaves in media with strong spatial dispersion. How-ever, b χ may be hard to calculate in that case un-less a medium is homogeneous, and no extrapolation ofa known homogeneous-medium model is guaranteed toyield the true general b χ . D. Waves in ﬂat space

Let us also discuss the special case when the conﬁgu-ration space is ﬂat and Eq. (220) applies, as in typicallaboratory applications. Although the ray-based metric h is still non-Euclidean in this case (unless the referencerays are straight), one can use the ﬂatness of the con-ﬁguration space in the following sense. In the Euclideanlaboratory coordinates ˜ x , the spatial metric has the form˜ h ab = δ ab and ˜ C vanishes. (Here and further, the tildesdenote that the corresponding quantities are evaluated inthe laboratory coordinates. Note that Papers II and IIIassume a diﬀerent notation; there, tilded are quantitiesin the ray-based coordinates, and non-tilded are those inthe laboratory coordinates.) This simpliﬁes the expres-sion for the dispersion matrix and makes D H a true ten-sor (modulo the insigniﬁcant corrections caused by weakspatial dispersion discussed in Sec. VI C). Then, since U and Γ are frame-invariant (Sec. V E), one can calculate them in the laboratory coordinates using U = ˜ U ≈ (˜Λ | µ ˜Ξ H ˜Ξ | µ − ˜Λ | µ ˜Ξ H ˜Ξ | µ + ˜Ξ H | µ ˜ D H ˜Ξ | µ ) A , Γ = ˜Γ = ˜Ξ H ˜ D A ˜Ξ , where both δ − and ˜ γ are omitted for simplicity, sincethey are unit matrices. This also extends to the quasiop-tical model. Since calculations in the laboratory coordi-nates are generally easier than the corresponding calcula-tions in the ray-based coordinates, this simpliﬁes applica-tions of our theory to numerical simulations, as discussedfurther in Paper II.Speciﬁc applications of this theory, including those towaves in fusion plasmas, will be discussed in future pub-lications. Here, we only note that our formulation re-lies on the GO ordering and thus is inapplicable to wavetransformations near caustics, for example, as in the O–Xconversion caused by the density gradient in dense mag-netized plasmas [67]. In contrast, the O–X conversioncaused by the magnetic shear in dilute plasmas [29] canbe modeled naturally, as also demonstrated explicitly inPaper III. Any adiabatic transformations of waves alongcontinuous dispersion curves can also be modeled natu-rally, speciﬁcally, within the single-mode approximationsummarized in Sec. V G 2. (In special cases like the X–B transformation [4], when the group velocity becomeszero on a ray while k remains large, one may want toreplace the coordinate ζ with τ , where d τ = d ζ/V , toremove the singularity in the amplitude equation.) In allthese cases, our theory is expected to yield results thatagree with full-wave modeling up to local errors of or-der O ( ǫ k ), because this is the accuracy within which thepassive-mode contribution is calculated (Sec. V D 4). VII. CONCLUSIONS

In summary, we propose a quasioptical theory of mode-converting wave beams in inhomogeneous media such asplasma. This includes the following. For any given dis-persion operator b D that governs the original wave ﬁeld Ψ,we explicitly calculate the approximate operator b D thatgoverns the wave envelope ψ to the second order in theGO parameter ǫ . Then, we further simplify this envelopeoperator by assuming that the gradient of ψ transverseto the local group velocity is much larger than the cor-responding parallel gradient. This leads to a parabolicdiﬀerential equation for ψ (“quasioptical equation”) inthe basis of the GO polarization vectors [Eq. (136)]. Ourmain results can be found in Sec. V F, which includes thegeneral quasioptical equation (183) and its special case(187). Scalar and mode-converting vector beams are de-scribed on the same footing (Sec. V G). We also explainhow to apply this model to EM waves considered as aspecial case (Sec. VI). In the follow-up papers, we reportsuccessful quasioptical modeling of radiofrequency wavebeams in magnetized plasma based on this theory.9 VIII. ACKNOWLEDGMENTS

The ﬁrst author (IYD) acknowledges the support andhospitality of the National Institute of Fusion Science,Japan. The authors are also thankful to M. A. Oanceafor stimulating discussions regarding the XGO general-ization to non-Euclidean metrics. The work was sup-ported by JSPS KAKENHI Grant Number JP17H03514.The work was also supported by the U.S. DOE throughContract No. DE-AC02-09CH11466 and by the Labo-ratory Directed Research and Development program atSandia National Laboratories. Sandia National Labora-tories is a multimission laboratory managed and oper-ated by National Technology and Engineering Solutionsof Sandia, LLC., a wholly owned subsidiary of HoneywellInternational, Inc., for the U.S. DOE National NuclearSecurity Administration under contract DE-NA-0003525.This paper describes objective technical results and anal-ysis. Any subjective views or opinions that might beexpressed in the paper do not necessarily represent theviews of the U.S. Department of Energy or the UnitedStates Government.

Appendix A: Summary of selected notations

Here, we present a summary of selected notations usedin the main text.

1. Basic symbols • . = denotes a deﬁnition. • b denotes an operator. • † denotes either a dual vector [Eqs. (100) and (141)]or an adjoint operator. • ⊺ denotes the matrix transpose. • H = ⊺ ∗ denotes the conjugate matrix transpose. • + is used only in Ξ + and ¯Ξ + deﬁned in Eqs. (146). • H and A denote the Hermitian and anti-Hermitianparts of an operator [Eqs. (26)] or those of a matrix(Sec. V B). When applied to a scalar, H and A de-note the real and imaginary parts, correspondingly. • The underline notation is explained in Sec. V Band is used for matrices with lower indices only. • W z and W − denote the Wigner–Weyl transformand its inverse (Sec. IV A 3), respectively.

2. Index manipulation

On the n -dimensional conﬁguration space M n , we as-sume a general metric tensor g with components g µν . For(co)vector ﬁelds on the space (co)tangent to M n , the in-dices are manipulated as usual, namely, via X µ = g µν X ν , µ, ν = 0 , , . . . , ( n − . (A1)(Summation over repeated indices is always assumed un-less speciﬁed otherwise.) For vector waves, we also in-troduce an additional vector space that may or may notbe the same as the space tangent to M n (Sec. V A). Onthat space, an additional metric γ is introduced, and theindices are manipulated as follows:Ψ a = γ ab Ψ b , a, b = 1 , , . . . , m. (A2)In special cases, Eqs. (A1) and (A2) can be equivalent,but that is not a generic situation.We also introduce the Euclidean metric δ on the spaceof amplitude vectors a (Sec. V D 2), a s = δ ss ′ a s ′ . (A3)For active modes, which are of our primary interest, themode indices s and s ′ range from 1 to N . The distinctionbetween a s and a s is made only for aesthetic reasons(consistency of notation) and can be neglected otherwise.

3. Derivatives

For spatial derivatives, we use the notation that is stan-dard, for example, in general relativity [68]. In particular, X µ ; µ is the divergence; namely, X µ ; µ = 1 √ g ⋄ ( √ g ⋄ X µ ) ,µ . (A4)Here, g ⋄ . = | det g | , and f ,µ ≡ ∂ µ f . = ∂f∂x µ , f ,µν . = ∂ f∂x µ ∂x ν . (A5)For functions of the form f ( x , k ), we also introduce thefollowing partial derivatives: f | µ . = ∂f ( x , k ) ∂x µ , f | µν . = ∂ f ( x , k ) ∂x µ ∂x ν , (A6) f | µ . = ∂f ( x , k ) ∂k µ , f | µν . = ∂ f ( x , k ) ∂k µ ∂k ν . (A7)Since k = ∇ θ ( x ), one has f ,µ (x , k ( x ) ) = f | µ + f | ν k ν | µ , (A8) k ν | µ = k ν,µ = k µ,ν = θ ,µν = θ ,νµ . (A9)0

4. Inner products

The inner product for scalar waves on M n is h Ψ | Φ i = Z d n x p g ⋄ ( x ) Ψ ∗ ( x )Φ( x ) , (A10)and for m -dimensional vector waves on M n , h Ψ | Φ i m . = Z d n x p g ⋄ ( x ) γ ab ( x )Ψ a ∗ ( x )Φ b ( x ) . (A11)In particular, in the x representation,Ψ † ( x ) . = h Ψ | x i = (Ψ ∗ , Ψ ∗ , . . . , Ψ ∗ m ) . (A12)We also use the dot productΨ · Φ . = Ψ † ( x )Φ( x ) = Ψ ∗ a ( x )Φ a ( x ) . (A13) The inner product on the transverse space M n − ⊥ isdeﬁned for scalar ﬁelds as h ψ | φ i ⊥ . = Z d n − ̺ p h ⋄ ( ζ, ̺ ) ψ ∗ ( ζ, ̺ ) φ ( ζ, ̺ ) (A14)and for the N -dimensional “active” parts of the vectorﬁelds (Sec. V D 2) as h a | b i ⊥ N . = Z d n − ̺ p h ⋄ ( ζ, ̺ ) a ∗ s ( ζ, ̺ ) b s ( ζ, ̺ ) . (A15) Appendix B: Auxiliary calculations

Here we present some auxiliary calculations whose results are used in Sec. V.

1. Calculation of c K and U ( x ) We start with Eq. (167) for b K . This equation can be rewritten as follows: b Ka = 12 Ξ + ( b p µ V µH + V µH b p µ − iℓ µ V µH )Ξ a + Ξ + C H Ξ= − i g / ⋄ [Ξ + ( g / ⋄ V µH Ξ a ) ,µ + Ξ + V µH ( g / ⋄ Ξ a ) ,µ ] − i + ℓ µ V µH Ξ a + Ξ + C H Ξ= − ig / ⋄ ( g / ⋄ ) ,µ Ξ + V µH Ξ a − i Ξ + V µH Ξ a ,µ − i + ( V µH Ξ) ,µ + Ξ + V µH Ξ ,µ ] a − i + ℓ µ V µH Ξ a + Ξ + C H Ξ= − i √ g ⋄ ) ,µ V µ a − iV µ a ,µ − i V µ,µ − (Ξ + ) ,µ V µH Ξ + Ξ + V µH Ξ ,µ + Ξ + ℓ µ V µH Ξ] a + Ξ + C H Ξ= − iV µ a ,µ − i V µ ; µ a − i + V µH Ξ ,µ − (Ξ + ) ,µ V µH Ξ + Ξ + ℓ µ V µH Ξ] a + Ξ + C H Ξ , (B1)where we used Eq. (52) and introduced V µ . = Ξ + V µH Ξ = (Ξ + D H Ξ) | µ − Ξ + | µ D H Ξ − Ξ + D H Ξ | µ = Λ | µ − Ξ + | µ ΞΛ − ΛΞ + Ξ | µ ≈ Λ | µ , (B2)where, at the end, we used Λ ∼ O ( ǫ ) (Sec. V D 4), so that the last two terms in Eq. (B2) can be neglected.From Eq. (154), we have (Ξ + ) ,µ = δ − Ξ H ,µ γ + Ξ + ℓ µ . Using this and also Eq. (130), one obtains b Ka = − iV µ a ,µ − i V µ ; µ a − U a, U = U − Ξ + C H Ξ , (B3) U = i δ − (Ξ H V µH Ξ ,µ − Ξ H ,µ V µH Ξ) = δ − (Ξ H ,µ V µH Ξ) A , (B4)where the subscript A denotes the anti-Hermitian part, as usual.1

3. Calculation of b G Consider Eq. (181) for b G derived under the assumption of the quasioptical ordering. Here, we simplify that equationand derive the explicit formula (182). First, note thatΞ + b D H Ξ a ≈

12 Ξ + b p µ Θ µνH b p ν Ξ a ≈ − h / ⋄ [Ξ + D | σ ˜ σH Ξ( h / ⋄ a ) , ˜ σ ] ,σ , (B11)where we used Eq. (129). Also, using Eq. (128) for b D H , one obtainsΞ + b D H ¯Ξ¯Λ − ¯Ξ + b D H Ξ a ≈ − (Ξ + V σH ¯Ξ¯Λ − ¯Ξ + V ˜ σH Ξ) a ,σ ˜ σ = − [Ξ + V ( σH ¯Ξ¯Λ − ¯Ξ + V ˜ σ ) H Ξ] a ,σ ˜ σ ≈ − h − / ⋄ [Ξ + V ( σH ¯Ξ¯Λ − ¯Ξ + V ˜ σ ) H Ξ( h / ⋄ a ) , ˜ σ ] ,σ . (B12)Here, in the ﬁrst line, we substituted Eq. (128) for b D H and also Eq. (14) for b p µ . Then, we only kept the termsinvolving derivatives of a along the perpendicular direction of the wave beam. The parenthesis in the indices denotesymmetrization; namely, for any A and B , A ( σ B ˜ σ ) = 12 ( A σ B ˜ σ + A ˜ σ B σ ) . (B13)2The metric factor h ⋄ has been added to keep the operator self-adjoint under the inner product (176). (Accountingfor the inhomogeneity of h in b G is beyond the accuracy of our theory. However, keeping b G self-adjoint is convenientand physically meaningful, for it is known that the exact operator is not responsible for dissipation.) Hence, b G ≈ − h / ⋄ [Φ σ ˜ σ ( h / ⋄ a ) , ˜ σ ] ,σ , Φ σ ˜ σ = Ξ + D | σ ˜ σH Ξ − + D ( | σH ¯Ξ¯Λ − ¯Ξ + D | ˜ σ ) H Ξ . (B14)In order to simplify the expression for Φ σ ˜ σ , note that¯Ξ + D | ˜ σH Ξ = (¯Ξ + D H Ξ) | ˜ σ − ¯Ξ + | ˜ σ D H Ξ − ¯Ξ + D H Ξ | ˜ σ = (¯Ξ + ΞΛ) | ˜ σ − ¯Ξ + | ˜ σ ΞΛ − ¯Λ¯Ξ + Ξ | ˜ σ ≈ ¯Λ¯Ξ + | ˜ σ Ξ , (B15)where we used ¯Ξ + Ξ = (and thus ¯Ξ + Ξ | ˜ σ = − ¯Ξ + | ˜ σ Ξ) and neglected terms proportional to Λ = O ( ǫ k ). Hence,Ξ + D | σH ¯Ξ¯Λ − ¯Ξ + D | ˜ σH Ξ = Ξ + ¯Ξ | σ ¯Λ ¯Λ − ¯Λ¯Ξ + | ˜ σ Ξ = Ξ + ¯Ξ | σ ¯Λ¯Ξ + | ˜ σ Ξ . (B16)Using Eqs. (152) for Ξ and ¯Ξ like we did above, Eq. (156) for D H , and the fact that Λ = O ( ǫ ), we ﬁnally obtainΦ σ ˜ σ = Ξ + (ΞΛΞ + + ¯Ξ¯Λ¯Ξ + ) | σ ˜ σ Ξ − + ¯Ξ ( | σ ¯Λ¯Ξ + | ˜ σ ) Ξ= Ξ + (Ξ | σ ΛΞ + + ΞΛ | σ Ξ + + ΞΛΞ + | σ + ¯Ξ | σ ¯Λ¯Ξ + + ¯Ξ ¯Λ | σ ¯Ξ + + ¯Ξ¯Λ¯Ξ + | σ ) | ˜ σ Ξ − + ¯Ξ ( | σ ¯Λ¯Ξ + | ˜ σ ) Ξ ≈ Ξ + (Ξ | σ Λ | ˜ σ Ξ + + Ξ | ˜ σ Λ | σ Ξ + + ΞΛ | σ ˜ σ Ξ + + ΞΛ | σ Ξ + | ˜ σ + ΞΛ | ˜ σ Ξ + | σ + ¯Ξ | σ ¯Λ¯Ξ + | ˜ σ + ¯Ξ | ˜ σ ¯Λ¯Ξ + | σ )Ξ − + ¯Ξ ( | σ ¯Λ¯Ξ + | ˜ σ ) Ξ= Λ | σ ˜ σ + Ξ + Ξ | σ Λ | ˜ σ + Ξ + Ξ | ˜ σ Λ | σ − Λ | σ Ξ + Ξ | ˜ σ − Λ | ˜ σ Ξ + Ξ | σ ≈ Λ | σ ˜ σ + [Ξ + Ξ ( | σ , V ˜ σ ) ] , (B17)where we substituted Λ | σ ≈ V σ [Eq. (B2)]. The quasioptical approximation implies that V ˜ σ is close to a scalar matrix,so the commutator [Ξ + Ξ ( | σ , V ˜ σ ) ] can be neglected. Hence, Φ σ ˜ σ ≈ Λ | σ ˜ σ . [1] E. R. Tracy, A. J. Brizard, A. S. Richardson, andA. N. Kaufman, Ray Tracing and Beyond: Phase SpaceMethods in Plasma Wave Theory (Cambridge UniversityPress, New York, 2014).[2] Yu. A. Kravtsov and Yu. I. Orlov,

Geometrical opticsof inhomogeneous media (Springer-Verlag, New York,1990).[3] G. B. Whitham,

Linear and Nonlinear Waves (Wiley,New York, 1974).[4] T. H. Stix,

Waves in Plasmas (AIP, New York, 1992).[5] P. T. Bonoli,

Review of recent experimental and modelingprogress in the lower hybrid range of frequencies at ITERrelevant parameters , Phys. Plasmas Ray-tracing code TRAVIS for ECR heating, EC current driveand ECE diagnostic , Comput. Phys. Commun. , 165(2014).[8] T. I. Tsujimura, S. Kubo, H. Takahashi, R. Makino,R. Seki, Y. Yoshimura, H. Igami, T. Shimozuma, K. Ida,C. Suzuki, M. Emoto, M. Yokoyama, T. Kobayashi,C. Moon, K. Nagaoka, M. Osakabe, S. Kobayashi, S. Ito,Y. Mizuno, K. Okada, A. Ejiri, T. Mutoh, and the LHDExperiment Group,

Development and application of aray-tracing code integrating with 3D equilibrium map-ping in LHD ECH experiments , Nucl. Fusion , 123019(2015). [9] E. Mazzucato, Propagation of a Gaussian beam in a non-homogeneous plasma , Phys. Fluids B , 1855 (1989).[10] D. Farina, A quasi-optical beam-tracing code for electroncyclotron absorption and current drive: GRAY , FusionSci. Tech. , 154 (2007).[11] G. V. Pereverzev, Use of the multidimensional WKBmethod to describe propagation of lower hybrid waves intokamak plasma , Nucl. Fusion , 1091 (1992).[12] G. V. Pereverzev, Beam tracing in inhomogeneousanisotropic plasmas , Phys. Plasmas , 3529 (1998).[13] E. Poli, A.G. Peeters, and G.V. Pereverzev, TORBEAM,a beam tracing code for electron-cyclotron waves in toka-mak plasmas , Comput. Phys. Commun. , 90 (2001).[14] E. Poli, A. Bock, M. Lochbrunner, O. Maj, M. Re-ich, A. Snicker, A. Stegmeir, F. Volpe, N. Bertelli,R. Bilato, G. D. Conway, D. Farina, F. Felici, L. Fig-ini, R. Fischer, C. Galperti, T. Happel, Y. R. Lin-Liu,N. B. Marushchenko, U. Mszanowski, F. M. Poli, J. Sto-ber, E. Westerhof, R. Zille, A. G. Peeters, and G. V.Pereverzev,

TORBEAM 2.0, a paraxial beam tracing codefor electron-cyclotron beams in fusion plasmas for ex-tended physics applications , Comput. Phys. Commun. , 36 (2018).[15] It is not a purpose of this paper to provide a compre-hensive list of existing approaches to reduced modelingof EM waves in plasmas, so the above reference list is notintended as exhaustive. [16] A. A. Balakin, M. A. Balakina, G. V. Permitin, andA. I. Smirnov, Quasi-optical description of wave beamsin smoothly inhomogeneous anisotropic media , J. Phys.D: Appl. Phys. , 4285 (2007).[17] A. A. Balakin, M. A. Balakina, G. V. Permitin, and A. I.Smirnov, Scalar equation for wave beams in a magnetizedplasma , Plasma Phys. Rep. , 302 (2007).[18] For applications and comparison with other codes, seealso A. A. Balakin, M. A. Balakina, and E. Wester-hof, ECRH power deposition from a quasi-optical pointof view , Nucl. Fusion , 065003 (2008); O. Maj, A. A.Balakin, and E. Poli, Eﬀects of aberration on paraxialwave beams beam tracing versus quasi-optical solutions ,Plasma Phys. Control. Fusion , 085006 (2010).[19] There have been works on quasioptical modeling forsystems where wave beams are naturally aligned withan axis of symmetry. For example, see C. K. Phillips,F. W. Perkins, and D. Q. Hwang, Parabolic approxima-tion method for fast magnetosonic wave propagation intokamaks , Phys. Fluids , 1608 (1986). However, themethods used there do not easily extend to beam mod-eling in more general settings.[20] This does not extend to ray-tracing codes. For example,a ray-tracing code raycon [1] captures mode conversionby using an approximate yet somewhat generic analyticsolution that applies in the limit of a narrow region of res-onant coupling. A related ray-tracing model with modeconversion was also proposed in Refs. [24, 25].[21] Mode conversion is understood here as the exchangeof action between near-resonant but non-intersectingbranches of the dispersion relation [1]. Mathematically,this process is similar to quantum tunneling. Adiabatictransformations of a wave mode along continuous disper-sion curves, such as the X–B transformation [4], are notmode conversion in the true sense of the term and canalso be simulated using already existing reduced models.[22] K. Y. Bliokh, F. J. Rodr´ıguez-Fortu˜no, F. Nori, and A. V.Zayats, Spin-orbit interactions of light , Nat. Photonics ,796 (2015).[23] M. A. Oancea, C. F. Paganini, J. Joudioux, and L. An-dersson, An overview of the gravitational spin Hall eﬀect ,arXiv:1904.09963.[24] D. E. Ruiz,

Geometric theory of waves and its applica-tions to plasma physics , Ph.D. Thesis, Princeton Univer-sity (2017), arXiv:1708.05423.[25] D. E. Ruiz and I. Y. Dodin,

Extending geometrical optics:A Lagrangian theory for vector waves , Phys. Plasmas ,055704 (2017).[26] D. E. Ruiz and I. Y. Dodin, Lagrangian geometrical opticsof nonadiabatic vector waves and spin particles , Phys.Lett. A , 2337 (2015).[27] D. E. Ruiz and I. Y. Dodin,

First-principles variationalformulation of polarization eﬀects in geometrical optics ,Phys. Rev. A , 043805 (2015).[28] D. E. Ruiz, C. L. Ellison, and I. Y. Dodin, Relativisticponderomotive Hamiltonian of a Dirac particle in a vac-uum laser ﬁeld , Phys. Rev. A , 062124 (2015).[29] I. Y. Dodin, D. E. Ruiz, and S. Kubo, Mode conversionin cold low-density plasma with a sheared magnetic ﬁeld ,Phys. Plasmas , 122116 (2017).[30] This also facilitates the aforementioned applications torelated problems in general relativity [23], as will be dis-cussed in future publications.[31] K. Yanagihara, I. Y. Dodin, and S. Kubo, Quasiopti- cal modeling of wave beams with and without mode con-version: II. Numerical simulations of single-mode beams ,arXiv:1903.01357.[32] K. Yanagihara, I. Y. Dodin, and S. Kubo,

Quasiopti-cal modeling of wave beams with and without mode con-version: III. Numerical simulations of mode-convertingbeams , arXiv:1903.01364.[33] T. Notake, S. Kubo, T. Shimozuma, H. Idei,Y. Yoshimura, S. Inagaki, K. Ohkubo, S. Kobayashi,Y. Mizuno, S. Ito, Y. Takita, T. Watari, K. Nari-hara, T. Morisaki, I. Yamada, Y. Nagayama, K. Tanaka,S. Sakakibara, R. Kumazawa, T. Seki, K. Saito, T. Mu-toh, A. Shimizu, A. Komori, and the LHD ExperimentGroup,

Optimization of incident wave polarization forECRH in LHD , Plasma Phys. Control. Fusion , 531(2005).[34] S. Kubo, H. Igami, T. I. Tsujimura, T. Shimozuma,H. Takahashi, Y. Yoshimura, M. Nishiura, R. Makino,and T. Mutoh, Plasma interface of the EC waves to theLHD peripheral region , AIP Conf. Proc. , 090006(2015).[35] A. Iiyoshi, A. Komori, A. Ejiri, M. Emoto, H. Funaba,M. Goto, K. Ida, H. Idei, S. Inagaki, S. Kado, O. Kaneko,K. Kawahata, T. Kobuchi, S. Kubo, R. Kumazawa,S. Masuzaki, T. Minami, J. Miyazawa, T. Morisaki,S. Morita, S. Murakami, S. Muto, T. Mutoh, Y. Na-gayama, Y. Nakamura, H. Nakanishi, K. Narihara,K. Nishimura, N. Noda, S. Ohdachi, N. Ohyabu, Y. Oka,M. Osakabe, T. Ozaki, B. J. Peterson, A. Sagara,S. Sakakibara, R. Sakamoto, H. Sasao, M. Sasao, K. Sato,M. Sato, T. Seki, T. Shimozuma, M. Shoji, H. Suzuki,Y. Takeiri, K. Tanaka, K. Toi, T. Tokuzawa, K. Tsumori,K. Tsuzuki, K. Y. Watanabe, T. Watari, H. Yamada,I. Yamada, S. Yamaguchi, M. Yokoyama, R. Akiyama,H. Chikaraishi, K. Haba, S. Hamaguchi, M. Iima, S. Ima-gawa, N. Inoue, K. Iwamoto, S. Kitagawa, J. Ko-daira, Y. Kubota, R. Maekawa, T. Mito, T. Nagasaka,A. Nishimura, C. Takahashi, K. Takahata, Y. Takita,H. Tamura, T. Tsuzuki, S. Yamada, K. Yamauchi,N. Yanagi, H. Yonezu, Y. Hamada, K. Matsuoka, K. Mu-rai, K. Ohkubo, I. Ohtake, M. Okamoto, S. Satoh,T. Satow, S. Sudo, S. Tanahashi, K. Yamazaki, M. Fu-jiwara, and O. Motojima,

Overview of the Large HelicalDevice project , Nucl. Fusion , 1245 (1999).[36] M. Osakabe, Y. Takeiri, T. Morisaki, G. Moto-jima, K. Ogawa, M. Isobe, M. Tanaka, S. Murakami,A. Shimizu, K. Nagaoka, H. Takahashi, K. Nagasaki,H. Takahashi, T. Fujita, Y. Oya, M. Sakamoto, Y. Ueda,T. Akiyama, H. Kasahara, S Sakakibara, R. Sakamoto,M. Tokitani, H. Yamada, M. Yokoyama, Y. Yoshimura,and LHD Experiment Group, Current status of Large He-lical Device and its prospect for deuterium experiment ,Fusion Sci. Technol. , 1 (2017).[37] A placeholder for a link to the SupplementaryMaterial. [38] For a more rigorous deﬁnition of the GO approximation,see, for instance, Ref. [2] or I. Y. Dodin, A. I. Zhmogi-nov, and D. E. Ruiz,

Variational principles for dissipative(sub)systems, with applications to the theory of linear dis-persion and geometrical optics , Phys. Lett. A , 1411(2017).[39] D. E. Ruiz and I. Y. Dodin,

Ponderomotive dynamics ofwaves in quasiperiodically modulated media , Phys. Rev.A , 032114 (2017). [40] H. Weyl, The Theory of Groups and Quantum Mechanics (Dover, New York, 1931).[41] E. Wigner,

On the quantum correction for the thermody-namic equilibrium , Phys. Rev. , 749 (1932).[42] J. E. Moyal, Quantum mechanics as a statistical theory ,Proc. Cambridge Philosoph. Soc. , 99 (1949).[43] R. G. Littlejohn, The semiclassical evolution of wavepackets , Phys. Rep. , 193 (1986).[44] L. Friedland, G. Goldner, and A. N. Kaufman,

Four-dimensional eikonal theory of linear mode conversion ,Phys. Rev. Lett. , 1392 (1987).[45] S. W. McDonald, Phase-space representations of waveequations with applications to the eikonal approximationfor short-wavelength waves , Phys. Rep. , 337 (1988).[46] R. G. Littlejohn and W. G. Flynn,

Geometric phases inthe asymptotic theory of coupled wave equations , Phys.Rev. A , 5239 (1991).[47] For Euclidean coordinates, a detailed discussion of theWeyl calculus can be found in Refs. [1, 24]. A generaliza-tion to curved metric was considered also in C. Gneiting,T. Fischer, and K. Hornberger, Quantum phase-spacerepresentation for curved conﬁguration spaces , Phys.Rev. A , 062117 (2013) and, to some extent, Ref. [64].[48] B. S. DeWitt, Point transformations in quantum mechan-ics , Phys. Rev. , 653 (1952).[49] J. M. Domingos and M. H. Caldeira, Self-adjointness ofmomentum operators in generalized coordinates , Found.Phys. , 147 (1984).[50] Eventually (Sec. V E), we shall choose k ( x ) such that D (x , k ( x ) ) be small. But that will characterize k ( x )rather than the function D itself, which is consideredorder-one when evaluated on general ( x , p ).[51] Reduced codes typically assume that the anti-Hermitianpart of the dielectric tensor is small. That is not alwaysthe case; for example, see [M. D. Tokman, E. Westerhof,and M. A. Gavrilova, Wave power ﬂux and ray-tracing inregions of resonant absorption , Plasma Phys. ControlledFusion , 91 (2000)] and references therein. However,this subject will not be discussed below.[52] Strictly speaking, Eq. (1) can always be multiplied by aconstant matrix to ensure that the new dispersion opera-tor satisﬁes the ordering (63) at a given location. Hence,the only nontrivial requirement that we actually imposehere is that such D A does not grow too much as the wavepropagates.[53] I. Y. Dodin and N. J. Fisch, Axiomatic geometrical op-tics, Abraham-Minkowski controversy, and photon prop-erties derived classically , Phys. Rev. A , 053834(2012).[54] We use an approach similar to the 3 + 1 approach ingeneral relativity. For example, see I. Y. Dodin and N. J. Fisch, Vlasov equation and collisionless hydrodynamicsadapted to curved spacetime , Phys. Plasmas , 112118(2010), including the Supplementary Material there.[55] For example, see P. D. Powell, Calculating determinantsof block matrices , arXiv:1112.4379.[56] In principle, b D A produces eﬀects qualitatively diﬀerentfrom those produced by b D H , so one might argue that theaccuracy of its approximation does not necessarily needto equal that of b D H . Besides, D A is often more abruptthan D H , so retaining V σA in Eq. (82) may in fact leadto a better model in principle. However, in such cases,the Weyl symbol D of the original dispersion operator istypically known only to the zeroth order (Sec. VI C), andthus so is D A . Hence, retaining ﬁrst-order corrections thedamping coeﬃcient does not seem warranted.[57] For example, see the derivation of the linear part of thenonlinear Schr¨odinger equation for a wave in a homo-geneous medium in V. I. Karpman, Non-linear Wavesin Dispersive Media (Pergamon Press, New York, 1974),Sec. 27.[58] T. J. Bridges and S. Reich,

Multi-symplectic integrators:numerical schemes for Hamiltonian PDEs that conservesymplecticity , Phys. Lett. A , 184 (2001).[59] L. Friedland and A. N. Kaufman,

Congruent reduction ingeometric optics and mode conversion , Phys. Fluids ,3050 (1987).[60] M. Onoda, S. Murakami, and N. Nagaosa, Hall eﬀect oflight , Phys. Rev. Lett. , 083901 (2004).[61] K. Y. Bliokh, A. Niv, V. Kleiner, and E. Hasman, Ge-ometrodynamics of spinning light , Nature Phot. , 748(2008).[62] P. Gosselin, A. B´erard, and H. Mohrbach, Spin Hall eﬀectof photons in a static gravitational ﬁeld , Phys. Rev. D ,084035 (2007).[63] C. Duval and T. Sch¨ucker, Gravitational birefringence oflight in Robertson–Walker cosmologies , Phys. Rev. D ,043517 (2017).[64] I. Y. Dodin, Geometric view on noneikonal waves , Phys.Lett. A , 1598 (2014).[65] L. D. Landau and E. M. Lifshitz,

The Classical Theoryof Fields (Pergamon Press, New York, 1971).[66] L. Friedland,