Adjoint methods for stellarator shape optimization and sensitivity analysis

Abstract

The design of a stellarator with acceptable confinement properties requires optimization of the magnetic field in the non-convex, high-dimensional spaces describing their geometry. Another major challenge facing the stellarator program is the sensitive dependence of confinement properties on electro-magnetic coil shapes, necessitating the construction of the coils under tight tolerances. In this Thesis, we address these challenges with the application of adjoint methods and shape sensitivity analysis. Adjoint methods enable the efficient computation of the gradient of a function that depends on the solution to a system of equations, such as linear or nonlinear PDEs. This enables gradient-based optimization in high-dimensional spaces and efficient sensitivity analysis. We present the first applications of adjoint methods for stellarator shape optimization. The first example we discuss is the optimization of coil shapes based on the generalization of a continuous current potential model. Understanding the sensitivity of coil metrics to perturbations of the winding surface allows us to understand features of configurations that enable simpler coils. We next consider solutions of the drift-kinetic equation. An adjoint drift-kinetic equation is derived based on the self-adjointness property of the Fokker-Planck collision operator, allowing us to compute the sensitivity of neoclassical quantities to perturbations of the magnetic field strength. Finally, we consider functions that depend on solutions of the MHD equilibrium equations. We generalize the self-adjointness property of the MHD force operator to include perturbations of the rotational transform and the currents outside the confinement region. This self-adjointness property is applied to develop an adjoint method for computing the derivatives of such functions with respect to perturbations of coil shapes or the plasma boundary.


ABSTRACT

Title of dissertation:

ADJOINT METHODS FOR STELLARATOR SHAPE OPTIMIZATION AND SENSITIVITY ANALYSIS

Elizabeth Joy Paul, Doctor of Philosophy, 2020

Dissertation directed by:

Professor William Dorland
Department of Physics

Stellarators are a class of device for the magnetic confinement of plasmas without toroidal symmetry. As the confining magnetic field is produced by clever shaping of external electro-magnetic coils rather than through internal plasma currents, stellarators enjoy enhanced stability properties over their two-dimensional counterpart, the tokamak. However, the design of a stellarator with acceptable confinement properties requires numerical optimization of the magnetic field in the non-convex, high-dimensional spaces describing their geometry. Another major challenge facing the stellarator program is the sensitive dependence of confinement properties on electro-magnetic coil shapes, necessitating the construction of the coils under tight tolerances. In this Thesis, we address these challenges with the application of adjoint methods and shape sensitivity analysis.

Adjoint methods enable the efficient computation of the gradient of a function that depends on the solution to a system of equations, such as linear or nonlinear PDEs. Rather than perform a finite-difference step with respect to each parameter, one additional adjoint PDE is solved to compute the derivative with respect to any parameter. This enables gradient-based optimization in high-dimensional spaces and efficient sensitivity analysis. We present the first applications of adjoint methods for stellarator shape optimization.

The first example we discuss is the optimization of coil shapes based on the generalization of a continuous current potential model. We optimize the geometry of the coil-winding surface using an adjoint-based method, producing coil shapes that can be more easily constructed. Understanding the sensitivity of coil metrics to perturbations of the winding surface allows us to gain intuition about features of configurations that enable simpler coils. We next consider solutions of the drift-kinetic equation, a kinetic model for collisional transport in curved magnetic fields. An adjoint drift-kinetic equation is derived based on the self-adjointness property of the Fokker-Planck collision operator. This adjoint method allows us to understand the sensitivity of neoclassical quantities, such as the radial collisional transport and self-driven plasma current, to perturbations of the magnetic field strength. Finally, we consider functions that depend on solutions of the magneto-hydrodynamic (MHD) equilibrium equations. We generalize the well-known self-adjointness property of the MHD force operator to include perturbations of the rotational transform and the currents outside the confinement region. This self-adjointness property is applied to develop an adjoint method for computing the derivatives of such functions with respect to perturbations of coil shapes or the plasma boundary. We present a method of solution for the adjoint equations based on a variational principle used in MHD stability analysis.
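The cost argument in the abstract (one extra adjoint solve instead of one finite-difference step per parameter) can be illustrated on a small linear problem. The sketch below is not taken from the Thesis: the matrices `A0` and `dA`, the vectors `b` and `c`, and all function names are hypothetical stand-ins, with a dense linear system $A(p)\,x = b$ playing the role of the PDE and $f = c^T x$ playing the role of the objective. Under these assumptions, solving $A^T \lambda = c$ once gives every derivative as $\partial f/\partial p_i = -\lambda^T (\partial A/\partial p_i)\, x$.

```python
import numpy as np

def solve_state(p, A0, dA, b):
    """Forward problem: A(p) x = b with A(p) = A0 + sum_i p[i] * dA[i]."""
    A = A0 + np.tensordot(p, dA, axes=1)
    return A, np.linalg.solve(A, b)

def objective(p, A0, dA, b, c):
    """Scalar objective f(p) = c . x(p)."""
    _, x = solve_state(p, A0, dA, b)
    return c @ x

def adjoint_gradient(p, A0, dA, b, c):
    """All derivatives from one forward solve and one adjoint solve."""
    A, x = solve_state(p, A0, dA, b)
    lam = np.linalg.solve(A.T, c)  # adjoint problem: A^T lam = c
    # df/dp_i = -lam^T (dA/dp_i) x, evaluated for every i at once
    return -np.einsum("j,ijk,k->i", lam, dA, x)

def finite_difference_gradient(p, A0, dA, b, c, h=1e-6):
    """Reference gradient: two forward solves per parameter."""
    grad = np.empty_like(p)
    for i in range(p.size):
        step = np.zeros_like(p)
        step[i] = h
        f_plus = objective(p + step, A0, dA, b, c)
        f_minus = objective(p - step, A0, dA, b, c)
        grad[i] = (f_plus - f_minus) / (2 * h)
    return grad

# Hypothetical test problem: 20 unknowns, 50 parameters.
rng = np.random.default_rng(0)
n, n_params = 20, 50
A0 = 20.0 * np.eye(n) + rng.standard_normal((n, n))
dA = 0.1 * rng.standard_normal((n_params, n, n))
b = rng.standard_normal(n)
c = rng.standard_normal(n)
p = 0.1 * rng.standard_normal(n_params)

g_adj = adjoint_gradient(p, A0, dA, b, c)           # 2 linear solves in total
g_fd = finite_difference_gradient(p, A0, dA, b, c)  # 2 * n_params linear solves
print("max difference:", np.max(np.abs(g_adj - g_fd)))
```

In this toy setting the adjoint gradient costs two linear solves regardless of the number of parameters, while the centered finite-difference reference costs two solves per parameter.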

ADJOINT METHODS FOR STELLARATOR SHAPE OPTIMIZATION AND SENSITIVITY ANALYSIS

by

Elizabeth Joy Paul

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2020

Advisory Committee:
Professor William Dorland, Chair/Advisor
Dr. Matthew Landreman, Co-Advisor
Professor Thomas M. Antonsen, Jr.
Professor Adil Hassam
Professor Ricardo Nochetto

In an effort to promote open science, all data and the associated post-processing scripts used to produce the figures in this Thesis have been preserved in a Zenodo archive with citeable DOI 10.5281/zenodo.3745635.

Acknowledgments

I owe many thanks to the individuals who have made my graduate career fruitful and enjoyable. Most importantly, I would like to thank my advisors, Bill Dorland and Matt Landreman, who guided me toward interesting and important physics problems and made the completion of this Thesis possible. Bill, your positive outlook on life and constant curiosity are an inspiration to me. I walk away from every interaction with you with a smile on my face and a new interesting idea in my head. Matt, thank you for your generosity and meticulous attention to detail. From deriving the drift-kinetic equation on the board to providing detailed comments on every manuscript, I could never thank you enough for your investment in my graduate career. As an incoming graduate student I took a bit of a leap of faith when I decided to come to Maryland, and I could not have asked for a better pair of (award-winning!) advisors. Thank you for believing in me and supporting my career at every step of the way.

Many thanks go to the other members of the dissertation committee. To Tom Antonsen, for giving me the opportunity to teach plasma physics and contributing to our games of “dungeons and plasmas” with your top-secret notes. I feel honored to be able to work with a great mind such as yours. I hope we can continue to collaborate and spread the good news about ALPO. To Adil Hassam, for never ceasing to ask thought-provoking questions during group meeting. Your math methods course laid the perfect foundation for plasma physics research. To Ricardo Nochetto, for introducing our group to the methods of shape optimization. I appreciate the time you took in making the mathematical literature accessible to us physicists. Our interactions have contributed to much of the work in this Thesis.
Thank you all for agreeing to serve on my committee.

I would also like to give a special acknowledgement to Ian Abel, who introduced our group to adjoint methods which formed the basis for this Thesis work.

This work was supported by the ARCS Foundation and the US Department of Energy FES grants DE-FG02-93ER-54197 and DE-FC02-08ER-54964. The computations presented in this Thesis have used resources at the National Energy Research Scientific Computing Center (NERSC).

Publication List

1. L. M. Imbert-Gerard, E. J. Paul, and A. Wright, “An introduction to symmetries and stellarators,” in preparation (2019).
2. E. J. Paul, T. Antonsen, Jr., M. Landreman, and W. A. Cooper, “Adjoint approach to calculating shape gradients for 3D magnetic confinement equilibria,” Journal of Plasma Physics 86, 905860103 (2020).
3. E. J. Paul, I. G. Abel, M. Landreman, and W. Dorland, “An adjoint method for neoclassical stellarator optimization,” Journal of Plasma Physics 85, 795850501 (2019).
4. T. Antonsen, Jr., E. J. Paul, and M. Landreman, “Adjoint approach to calculating shape gradients for 3D magnetic confinement equilibria,” Journal of Plasma Physics 85, 905850207 (2019).
5. M. Landreman and E. J. Paul, “Computing local sensitivity and tolerances for stellarator physics properties using shape gradients,” Nuclear Fusion 58, 076023 (2018).
6. E. J. Paul, M. Landreman, A. Bader, and W. Dorland, “An adjoint method for gradient-based optimization of stellarator coil shapes,” Nuclear Fusion 58, 076015 (2018).
7. E. J. Paul, M. Landreman, F. M. Poli, D. A. Spong, H. M. Smith, and W. Dorland, “Rotation and neoclassical ripple transport in ITER,” Nuclear Fusion 57, 116044 (2017).

Table of Contents

… f and the adjoint method
3.5 Winding surface optimization results
3.5.1 Trends with optimization parameters
3.5.2 Optimal W7-X winding surface
3.5.3 Optimal HSX winding surface
3.6 Local winding surface sensitivity
3.7 Metrics for configuration optimization
3.8 Conclusions
… β
5.5.2 Rotational transform
5.5.3 Vacuum magnetic well
5.5.4 Ripple on magnetic axis
5.5.5 Effective ripple in the 1/ν regime
5.5.6 Departure from quasi-symmetry
5.5.7 Neoclassical figures of merit
5.6 Conclusions
… m = 0, n = 0 mode
6.3.2 n = 0, m ≠ 0 modes
6.3.3 m = 0, n ≠ 0 modes
6.3.4 m ≠ 0, n ≠ 0 modes
6.4 Tokamak shape gradient
6.5 Conclusions

A Toroidal coordinate systems

A.1 Toroidal coordinates
A.2 Flux coordinates
A.3 Magnetic coordinates
A.4 Boozer coordinates

B Justification for current potential
C Adjoint derivative at fixed $J_{\max}$
…
F.0.1 DKES trajectories
F.0.2 Full trajectories

G Symmetry of the sensitivity function
G.0.1 Symmetry of $S_R$ implied by Fourier derivatives
G.0.2 Symmetry of Fourier derivatives

H Derivatives at ambipolarity
I Derivation of generalized MHD self-adjointness relation
J Alternate derivation of fixed-boundary adjoint relation
K Interpretation of the displacement vector
L Details of axis ripple calculation
M Details of effective ripple in the 1/ν regime calculation
N Details of departure from quasi-symmetry calculation
O Details of neoclassical figures of merit calculation

P Linearized equilibrium energy functional and coefficient matrices
P.1 Further simplification of energy functional
P.2 Explicit forms of coefficient matrices
P.3 Invertibility of $A_{\alpha\alpha}$

Q Constraint on bulk force perturbation
R Near-axis expansion of screw pinch equilibria

Chapter 1

Introduction

This Chapter aims to motivate and place in context the work of this Thesis. We begin with an introduction to the stellarator concept of toroidal confinement in Section 1.1, including the necessity of optimization of the magnetic field. We then discuss important properties of a stellarator device in Section 1.2. To put stellarator optimization in perspective, we briefly discuss the relevant history in Section 1.3. We then, in Section 1.4, provide a detailed introduction to stellarator optimization, including typical assumptions, numerical methods, and associated challenges. We conclude with an overview of this Thesis in Section 1.5.

Throughout this Chapter, we use terminology related to magnetic field geometry and toroidal coordinate systems, which are introduced in Appendix A.

The fusion community must face several significant scientific challenges to demonstrate a viable magnetic fusion reactor. A large fraction of the present research in magnetic fusion is dedicated to the tokamak, a concept that relies on a large plasma current for confinement. Driving such a current requires a significant amount of recirculated power and necessitates either pulsed operation or non-inductive current drive, both of which are disadvantageous for a fusion reactor. This large current makes tokamaks susceptible to current-driven instabilities that can limit plasma performance. These instabilities, such as tearing and kink instabilities, can result in catastrophic terminations of the discharge (Chapter 7.9 in [235]). Runaway electrons formed due to disruptions can be accelerated by the inductive electric field, possibly causing damage to plasma-facing components and applying large electro-magnetic forces to the vacuum vessel. The effect of runaway electrons will be much more harmful in large reactor-scale tokamaks due to the exponential dependence of the density of relativistic electrons on the plasma current [104]. Thus in a reactor, disruptions must be mitigated by active feedback and operation within a safe margin of stability limits. However, such control will be difficult when alpha particles provide a significant fraction of the heating power [93].

Remarkably, Lyman Spitzer predicted these possible difficulties of tokamak confinement in 1952 [210], before the first toroidal confinement experiment,

“... a large induced current is open to the two practical objectives that it cannot be sustained in a steady equilibrium and that the rapid generation of such a current is likely to lead to plasma oscillations.”

Figure 1.1: A schematic image of a tokamak (a) and stellarator (b). The electro-magnetic coils are shown in blue, and the plasma domain is shown in green. Magnetic field lines lying on the outermost magnetic surface are shown in black.

These observations led to the development of the stellarator concept. In contrast to the tokamak, a stellarator generates a poloidal magnetic field through clever shaping by external currents rather than internal plasma currents. A small amount of current in the plasma is self-driven due to pressure gradients, though this is typically not large enough to result in significant MHD modes. There is some experimental evidence that stellarator configurations may be able to operate above the linear MHD stability pressure threshold [234] rather than being terminated by a disruption. The Large Helical Device (LHD) has operated up to a volume-averaged $\beta$ of 5% without any disruptive MHD phenomena, though the heat transport increases due to low-$n$ mode activity [201]. Here $\beta = p/(B^2/(2\mu_0))$ is the ratio of the plasma pressure, $p$, to the magnetic pressure, and $n$ is the toroidal mode number. Similarly, high-beta discharges in the Wendelstein 7-Advanced Stellarator (W7-AS) have shown saturation of low-$n$ and interchange modes at a low level that merely slowly degrades confinement [234]. Stellarators can also operate at higher density than tokamaks due to the absence of the Greenwald limit [72]. While in tokamaks, the limits on the density and pressure due to the Greenwald and MHD stability limits set hard boundaries on the operating points, in a stellarator much softer limits exist. Performance at high beta is often instead limited by equilibrium properties, such as magnetic field stochasticity near the edge.
For example, if the Shafranov shift becomes comparable to the minor radius of the plasma, this can lead to loss of magnetic surfaces [212]. The ability to operate at high beta is critical for an economical fusion reactor: in the temperature range of 10-20 keV, the fusion power density scales as $P \sim \beta^2 B^4$ [208]. See Figure 1.1 for schematics of a tokamak and stellarator configuration.

Despite these clear advantages, much care must be taken to design a stellarator with acceptable confinement properties. Due to its continuous toroidal symmetry, the tokamak enjoys confinement of collisionless single-particle trajectories and the existence of closed, nested magnetic surfaces. However, in the general three-dimensional field of a stellarator, these properties are not always present. The trajectories of energetic ions, such as the alpha particles produced in a fusion reaction, may therefore be lost, resulting in damage to material surfaces. Stellarators can experience enhanced neoclassical transport, the collisional transport of thermal particles due to the magnetic field geometry, leading to increased transport of heat and particles, especially at low collisionality (Figure 1.2). The presence of large magnetic islands or chaotic regions in a three-dimensional field can also severely limit performance by locally flattening the temperature profile.

Figure 1.2: The neoclassical diffusion coefficient, $D^*$, as a function of the normalized collisionality, $\nu^* = \nu R/(\iota v)$, where $\nu$ is the collision frequency, $\iota$ is the rotational transform, $v$ is the speed, and $R$ is the major radius. An axisymmetric field exhibits a low-collisionality regime in which $D^* \sim \nu$, while a stellarator exhibits $D^* \sim 1/\nu$. Thus the neoclassical transport in a general three-dimensional field can be especially deleterious at low collisionality. Figure reproduced from [101] with permission.

However, none of these challenges appear to be showstoppers for stellarator confinement. The success of modern stellarators can be attributed to the ability to design the magnetic field with numerical optimization. While tokamak optimization is also possible [107], it is much more difficult as confinement properties become very sensitive to the current density and pressure profiles. These profiles can be determined with multi-scale modeling on turbulent and transport time scales, which is very computationally intensive.
On the other hand, the physical properties of stellarators are relatively insensitive to these profiles, as they primarily rely on the externally produced magnetic field for confinement [27]. Given the ability to numerically optimize the magnetic field of a stellarator, in Section 1.2, we discuss the properties one should consider in a design.

We now outline the desired physical properties of a stellarator and standard proxy functions applied during their design. We will reserve any discussion of coils, the external currents that produce the magnetic field, until Section 1.4.3.

Figure 1.3: A Poincaré surface computed from the NCSX coil shapes [236]. To produce this Figure, magnetic field lines are integrated toroidally around the device. Each time they hit a plane at constant toroidal angle, a point is plotted with color indicating the field line. A general 3D field contains regions of chaotic field lines and magnetic island chains along with a volume of nested toroidal magnetic surfaces. Figure adapted from [121].
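The construction described in the caption of Figure 1.3 (follow field lines toroidally and record a point at each crossing of a plane of constant toroidal angle) can be sketched with a toy model. The example below does not use the NCSX coil field: it assumes a hypothetical field-line Hamiltonian in flux-like coordinates $(\theta, \psi)$ with a linear rotational transform profile `IOTA0 + SHEAR * psi` and two resonant perturbation harmonics listed in `MODES`. All names and parameter values are illustrative assumptions.

```python
import numpy as np

# Assumed model parameters (illustrative only, not from any real device).
IOTA0, SHEAR = 0.3, 0.4                # iota(psi) = IOTA0 + SHEAR * psi
MODES = [(2, 1, 2e-3), (3, 1, 2e-3)]   # (m, n, amplitude) of each perturbation

def field_line_rhs(phi, y, modes):
    """Field-line equations with the toroidal angle phi acting as 'time'.

    d(theta)/d(phi) = iota(psi)
    d(psi)/d(phi)   = sum over modes of amp * m * sin(m * theta - n * phi)
    """
    theta, psi = y
    dtheta = IOTA0 + SHEAR * psi
    dpsi = np.zeros_like(psi)
    for m, n, amp in modes:
        dpsi += amp * m * np.sin(m * theta - n * phi)
    return np.array([dtheta, dpsi])

def one_transit(y, modes, steps=64):
    """Integrate all field lines once around the torus (phi: 0 to 2 pi) with RK4."""
    h = 2.0 * np.pi / steps
    phi = 0.0
    for _ in range(steps):
        k1 = field_line_rhs(phi, y, modes)
        k2 = field_line_rhs(phi + h / 2, y + h / 2 * k1, modes)
        k3 = field_line_rhs(phi + h / 2, y + h / 2 * k2, modes)
        k4 = field_line_rhs(phi + h, y + h * k3, modes)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        phi += h
    return y

def poincare_points(psi0, modes=MODES, n_transits=200):
    """Record (theta, psi) each time the lines return to the plane phi = 0."""
    y = np.array([np.zeros_like(psi0), psi0])
    pts = np.empty((n_transits, 2, psi0.size))
    for t in range(n_transits):
        y = one_transit(y, modes)
        pts[t, 0] = np.mod(y[0], 2.0 * np.pi)  # poloidal angle on the plane
        pts[t, 1] = y[1]                       # flux label on the plane
    return pts

# One field line per starting flux label; each gets its own column (color).
psi0 = np.linspace(0.0, 0.8, 12)
pts = poincare_points(psi0)
print(pts.shape)
```

Plotting `pts[:, 0, k]` against `pts[:, 1, k]` for each field line `k`, one color per line, gives the Poincaré plot for this model. With `modes=[]` the flux label is conserved and every line traces a nested surface; the perturbations open island chains near the resonances $\iota = n/m$.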

Equilibrium properties

The operating space of stellarators is often restricted due to MHD equilibrium properties rather than stability limits. For example, when $\beta \sim \epsilon\iota^2/2$, where $\epsilon$ is the inverse aspect ratio and $\iota$ is the rotational transform, the Shafranov shift becomes comparable to the minor radius, which may result in flux-surface break-up [97, 212]. There is a tendency of the edge magnetic field to become stochastic at large beta [201], so a design should try to maximize the volume of continuously nested flux surfaces [119]. One should also minimize the island width at low-order rational surfaces, which can be estimated using analytic expressions [38, 147], assuming the magnetic field is close to having perfect magnetic surfaces. Such islands can also be minimized by controlling the rotational transform, either by maintaining low magnetic shear and eliminating low-order rational surfaces altogether or by taking advantage of large magnetic shear, as the magnetic island width scales as $1/\sqrt{\iota'(\psi)}$ [26]. See Figure 1.3 for a visualization of magnetic surfaces, magnetic islands, and chaotic field lines in the NCSX stellarator.

Pressure-driven currents

There are several sources of self-driven plasma current [97]: the parallel bootstrap current arises due to collisions between trapped and passing particles in the presence of density and temperature gradients, and the parallel Pfirsch-Schlüter and perpendicular diamagnetic currents occur due to equilibrium pressure gradients. The bootstrap current can cause shifts in the rotational transform toward low-order rational values, which must especially be avoided in low-shear devices. Control of the edge rotational transform is also vital for designs with an island divertor [75]. In the presence of reduced bootstrap current, the magnetic field structure becomes less sensitive to changes in beta. For these reasons, the Wendelstein 7-X (W7-X) configuration was designed for minimal bootstrap current [86]. Often optimization is performed with a low-collisionality semi-analytic bootstrap current model [205]. Bootstrap current optimization will be described further in Chapter 4. The Pfirsch-Schlüter current does not provide any net current and therefore does not shift the rotational transform. However, it can give rise to a Shafranov shift and thus affect the equilibrium beta limit [232]. The Pfirsch-Schlüter current can be reduced by minimizing the magnitude of the geodesic curvature. The net diamagnetic current will only be non-zero in the presence of another source of net current; thus, the reduction of the bootstrap current will automatically reduce the diamagnetic current.

While the presence of self-driven current can give rise to unfavorable shifts in the rotational transform, there are situations in which significant bootstrap current may be desirable. If the bootstrap current provides a source of rotational transform in addition to the external coils, the coil complexity may be reduced and a more compact device may be possible. Plasma current can also provide island healing [95], reducing the width of islands in comparison with those in the vacuum configuration. For these reasons, the National Compact Stellarator Experiment (NCSX) was designed to be quasi-axisymmetric with a significant fraction of rotational transform provided by the plasma current [114].

Energetic-particle confinement

A successful stellarator reactor must confine energetic alpha particles for at least their slowing-down time such that their energy can be deposited with the thermal population. Prompt losses of fast particles should especially be avoided because they can lead to damage to material surfaces. Collisional diffusion and deflection are minimal at energies near the birth energy of 3.5 MeV […]

$$J = \oint dl \, v_{\|}, \tag{1.1}$$

is a conserved quantity, where $v_{\|}$ is the velocity parallel to the magnetic field and $l$ measures length along a field line. For trapped particles, the integral is taken along a closed trajectory between bounce points. For passing particles, it is taken along a field line until it comes infinitesimally close to its starting point. If $J$ is constant on a magnetic surface, then the collisionless trajectories will experience no net radial drift, a property known as omnigeneity [39]. Thus several properties involving $J$, such as its variation within a flux surface, have been considered during the design process [58, 213]. There is evidence that targeting quasi-symmetry (defined shortly) near the half-radius may also improve energetic particle confinement [105].

Quasi-symmetry

Quasi-symmetric magnetic fields are a subset of omnigeneous magnetic fields. A quasi-symmetric magnetic field possesses a symmetry direction of the magnetic field strength when expressed in Boozer coordinates (Appendix A.4),

$$B(\psi, \vartheta_B, \varphi_B) = B(\psi, M\vartheta_B - N\varphi_B), \tag{1.2}$$

for fixed integers $M$ and $N$. If $M = 0$, the contours of the magnetic field strength close poloidally, known as quasi-poloidal symmetry. If $N = 0$, the contours of the magnetic field strength close toroidally, known as quasi-axisymmetry. If both $M$ and $N$ are non-zero, known as quasi-helical symmetry, the contours of the field strength close both toroidally and poloidally.

This symmetry implies guiding center confinement [24] and neoclassical properties that are comparable to those of an equivalent tokamak [97], including the ability to rotate in the direction of quasi-symmetry [100]. A quasi-symmetric field is omnigeneous, though the converse is not necessarily true. Quasi-symmetry is typically targeted by minimizing the symmetry-breaking Fourier harmonics of the magnetic field strength.

Neoclassical transport

Stellarators experience enhanced neoclassical transport at low collisionality in comparison with tokamaks (Figure 1.2). Neoclassical transport is typically the dominant transport channel in classical (unoptimized) stellarators. It is common to employ the effective ripple ($\epsilon_{\text{eff}}$) proxy, which quantifies the geometric dependence of the radial fluxes in the low-collisionality $1/\nu$ regime [168]. A discussion of $\epsilon_{\text{eff}}$ and neoclassical diffusion in the $1/\nu$ regime is given in Chapter 5 and Appendix M. Neoclassical optimization will be discussed in more depth in Chapter 4. A review of neoclassical optimization strategies is given in [165].

Stability

Although stellarators may be able to operate above linear MHD stability limits, it is desirable to design a stellarator with an increased beta limit to reduce enhanced transport caused by MHD modes. It is common to employ the magnetic well [85] (discussed in Chapter 5) or Mercier criterion [157] as proxies for the stability of low-$n$ interchange modes. One can also try to increase magnetic shear, the radial derivative of the rotational transform $\iota'(\psi)$, to improve large-$n$ ballooning stability and Mercier stability [95]. It appears that stellarators can also be designed with reduced microturbulence, though turbulence optimization has yet to be demonstrated experimentally. Some proxies have been proposed, such as reducing the overlap between bad curvature and trapping regions [239] or increasing nonlinear energy transfer between unstable and damped modes [96].

Figure 1.4: A diagram of the figure-eight stellarator design from Lyman Spitzer’s 1951 Project Matterhorn report. Figure reproduced from [209].

Lyman Spitzer’s first stellarator concept used a simple figure-eight design (Figure 1.4), which produced rotational transform by “twisting the torus out of the plane” [211]. Spitzer and his team experimentally demonstrated that external shaping could produce rotational transform in a vacuum field with the Model A, B, and C series stellarators at Princeton [215]. Results from the Model B1 demonstrated confinement of energetic electrons for several milliseconds, much longer than would be possible with a purely toroidal field. However, the observed diffusion of thermal particles was much larger than that predicted from Bohm scaling [46]. The Model C, using a racetrack configuration with helically wound coils, was able to demonstrate the existence of nested magnetic surfaces [207]. Nonetheless, the Model C experienced poor confinement with Bohm-like diffusion [241].
These early stellarator experiments operated until the late 1960s when promising results from the Soviet T-3 tokamak became available, and it was decided that Princeton's Model C would be converted to a tokamak [1].

Meanwhile, the Wendelstein line of stellarators was active at IPP Garching, initially adopting Princeton's racetrack design. Experiments on WII-A provided insight into the benefits of low magnetic shear and accurate construction of the coil system for avoiding magnetic islands [19]. The performance continued, however, to be limited by neoclassical transport at low collisionality and low equilibrium pressure limits due to the Shafranov shift [108].

A significant breakthrough in the stellarator program came with the design of W7-AS, which aimed to improve confinement with equilibrium optimization. To demonstrate the stellarator optimization concept, W7-AS was partially optimized for minimal geodesic curvature. Such an objective was predicted to minimize radial magnetic drifts and pressure-driven parallel currents. For the first time, the magnetic field shaping was supplied by non-planar, modular coils (Figure 1.5) that provided the freedom to tailor the magnetic field more carefully than helical coils. The experiment operated from 1988 to 2002, demonstrating the improved equilibrium and stability properties and reduction of neoclassical transport enabled through equilibrium optimization [108, 117].

Figure 1.5: The modular field (MF) coils, toroidal field (TF) coils, and flux surfaces of the W7-AS stellarator. Figure reproduced from [108] with permission.

The success of W7-AS paved the way for the W7-X experiment [233], which was fully optimized for nested magnetic surfaces, fast-particle confinement, reduced parallel currents, minimal neoclassical transport at low collisionality, and MHD stability up to an average $\beta$ of 5% [15].
The early optimization efforts of the Wendelstein team benefited greatly from the discovery that guiding center confinement could be achieved with a quasi-symmetric [24] magnetic field. Nührenberg and Zille of the Wendelstein team then demonstrated that quasi-symmetric equilibria could be obtained from numerical optimization of MHD equilibria [175]. The W7-X configuration was designed based on one of their quasi-helical configurations, modified to achieve the objectives outlined above. The resulting configuration was quasi-isodynamic, a quasi-omnigenous magnetic field with poloidally closed contours of the magnetic field strength [98, 176]. Experiments from the initial campaigns of W7-X have demonstrated the success of the stellarator equilibrium optimization concept, confirming the desired magnetic topology to within a tolerance of $10^{-5}$ [188]. High-beta operation will not be demonstrated until an actively-cooled divertor is installed for the next operating campaign. However, there is initial evidence that recent high-performance shots could not have been achieved without neoclassical optimization [237].

Figure 1.6: Modular field coils (silver), toroidal field coils (bronze), and magnetic surfaces of the W7-X stellarator. Figure reproduced from [223] with permission.

W7-X was not, however, the first experimental demonstration of a fully optimized stellarator. The Helically Symmetric eXperiment (HSX) was designed to have quasi-helical symmetry, Mercier stability, and low magnetic shear [8] using the equilibrium optimization tools developed by the Wendelstein team [6]. HSX has demonstrated a reduction of electron thermal diffusivity [35] due to the decrease in neoclassical transport and a reduction of flow damping in the symmetry direction [77]. The inward-shifted configuration of LHD was partially optimized for reduced neoclassical transport and energetic particle confinement [163], though its ideal MHD stability is worsened in comparison with the standard configuration.
Experiments have demonstrated higher electron temperatures and improved energetic ion confinement in the inward-shifted configuration as compared with the standard configuration [164].

There continues to be an effort toward advanced stellarator designs. Construction has commenced for the Chinese First Quasi-symmetric Stellarator (CFQS) [206], which will be the first quasi-axisymmetric device in operation. The quasi-axisymmetric NCSX [242] was designed and partially constructed at the Princeton Plasma Physics Laboratory (PPPL), but its funding was terminated before its completion. As the field of stellarator optimization has developed, several other stellarator equilibria have been optimized to be quasi-symmetric [12, 57, 70, 106, 134, 135, 167] and quasi-omnigenous [122, 159].

Historically, stellarator optimization has largely used a two-stage approach. In the first step, the magnetic field in the confinement region is optimized to obtain the desirable physics properties. The magnetic field must satisfy the MHD equilibrium equations; thus this task amounts to optimization in the space of free parameters that describe the MHD equilibrium. Often a fixed-boundary MHD calculation is performed, in which an outer flux surface is prescribed, as opposed to a free-boundary calculation, in which the currents in the vacuum region are prescribed. As a second step, the currents in the vacuum region are optimized to be consistent with the boundary obtained in the first step.

As numerical MHD equilibrium calculations form the foundation of stellarator optimization, these will be described in Section 1.4.1. The two stages of the optimization process are described in Sections 1.4.2 and 1.4.3. We will conclude with a discussion of the present challenges associated with the design of stellarators and how this Thesis will address them in Section 1.4.4.

The MHD equilibrium equations,

$$\mathbf{J} \times \mathbf{B} = \nabla p \tag{1.3a}$$
$$\nabla \times \mathbf{B} = \mu_0 \mathbf{J} \tag{1.3b}$$
$$\nabla \cdot \mathbf{B} = 0, \tag{1.3c}$$

describe the steady-state behavior of the magnetic field in strongly magnetized plasmas. Many assumptions are made in arriving at (1.3), such as small plasma resistivity, low frequency in comparison with the cyclotron and collision frequencies, and small electron inertia. In practice, these equations describe the long-wavelength, low-frequency behavior of magnetic fusion plasma very well [64].

Finding solutions to (1.3) is non-trivial in a general three-dimensional field, as well-posedness requires a set of constraints to be satisfied on every closed field line unless the pressure profile is locally flattened ([84], Section 10.3 in [121]). An alternative is to rely on the assumption that there exists a set of continuously nested toroidal magnetic surfaces, $\Gamma(\psi)$, labeled by the toroidal flux label, $\psi$. Although magnetic surfaces are not guaranteed to exist in general three-dimensional geometry, any stellarator configuration of physical interest will possess a large region of continuously nested surfaces, and making this assumption will allow for tractable MHD equilibrium calculations.

Under the assumption of continuously nested toroidal magnetic surfaces, solutions of (1.3) can be shown to be stationary points of an energy functional [133],

$$W[\mathbf{B}] = \int_{V_P} d^3x \left( \frac{B^2}{2\mu_0} - p \right), \tag{1.4}$$

where $V_P$ is the volume of the confinement region bounded by a magnetic surface $S_P$. Variations of $W$ are computed at prescribed and fixed pressure ($p(\psi)$), rotational transform ($\iota(\psi)$), and toroidal flux label on $S_P$ ($\psi_0$) ([97], Section 11.1 in [121]). Solutions to (1.3) under these assumptions can be computed efficiently and robustly using gradient-descent methods to obtain local minima of $W[\mathbf{B}]$.
This approach is implemented in the VMEC [111] and NSTAB [69] codes.

Sometimes another function of flux is prescribed instead of the rotational transform, such as the net toroidal current inside a constant-$\psi$ surface,

$$I_T(\psi) = \int_{S_T(\psi)} d^2x \, \mathbf{J} \cdot \hat{\mathbf{n}}, \tag{1.5}$$

where $S_T(\psi)$ is a surface at constant toroidal angle bounded by $\Gamma(\psi)$ (Figure A.2) and $\hat{\mathbf{n}}$ is the unit normal. This choice of flux function is more common in the context of optimization, as $I_T(\psi)$ can be chosen to vanish for a vacuum field or to be consistent with a bootstrap current model at finite pressure [206, 214].

We can consider (1.3) to be an equation determining the magnetic field $\mathbf{B}$, as the current density is computed from Ampere's law (1.3b) and the pressure is given as a function of flux, $p(\psi)$. The MHD equilibrium equations are solved with a Dirichlet boundary condition,

$$\mathbf{B} \cdot \hat{\mathbf{n}} \big|_{S_P} = 0. \tag{1.6}$$

In the fixed-boundary approach, $S_P$ is given and fixed during the equilibrium calculation. The relevant equations for a fixed-boundary calculation are summarized in Table 1.1.

Table 1.1: Summary of fixed-boundary equilibrium PDE.

| PDE | BC | Given |
|---|---|---|
| $(\nabla \times \mathbf{B}) \times \mathbf{B} = \mu_0 \nabla p(\psi)$ | $\mathbf{B} \cdot \hat{\mathbf{n}} = 0$ on $S_P$ | $p(\psi)$, $\psi_0$, & $S_P$ |
| $\nabla \cdot \mathbf{B} = 0$ | | $\iota(\psi)$ or $I_T(\psi)$ |

In the free-boundary approach, the current density, $\mathbf{J}_C$, in the vacuum region, $\mathbb{R}^3 \setminus V_P$, is prescribed instead of $S_P$. The magnetic field due to this current is computed from the Biot-Savart law,

$$\mathbf{B}_C(\mathbf{x}) = \frac{\mu_0}{4\pi} \int_{\mathbb{R}^3 \setminus V_P} d^3x' \, \frac{\mathbf{J}_C(\mathbf{x}') \times (\mathbf{x} - \mathbf{x}')}{|\mathbf{x} - \mathbf{x}'|^3}. \tag{1.7}$$

For a given $S_P$, the plasma current, $\mathbf{J}_P$, is computed from (1.3). The magnetic field due to the plasma current, $\mathbf{B}_P$, can similarly be computed from the Biot-Savart law or more efficiently with the application of the virtual casing principle [143]. The total magnetic field must be tangent to the boundary,

$$(\mathbf{B}_P + \mathbf{B}_C) \cdot \hat{\mathbf{n}} \big|_{S_P} = 0. \tag{1.8}$$

Furthermore, the total pressure must be continuous across $S_P$,

$$\left[\!\left[ \frac{B^2}{2\mu_0} + p \right]\!\right]_{S_P} = 0, \tag{1.9}$$

to ensure force balance.

In the free-boundary approach, $S_P$ is varied until (1.8) and (1.9) are satisfied. These conditions can also be obtained from a variational principle similar to (1.4) including the vacuum region [14]. The free-boundary equilibrium problem is summarized in Table 1.2. Figure 1.7 shows the geometry of equilibrium calculations.

Table 1.2: Summary of free-boundary equilibrium PDEs. The magnetic field due to the plasma current, $\mathbf{B}_P$, is computed from the Biot-Savart law (1.7) or the virtual casing principle. The magnetic field due to the coil current, $\mathbf{B}_C$, is computed from the Biot-Savart law.

| PDE | BC | Given |
|---|---|---|
| $(\nabla \times \mathbf{B}) \times \mathbf{B} = \mu_0 \nabla p(\psi)$ | $\mathbf{B} \cdot \hat{\mathbf{n}} = 0$ on $S_P$ | $p(\psi)$, $\psi_0$, & $\mathbf{J}_C$ |
| $\nabla \cdot \mathbf{B} = 0$ | $S_P$ s.t. $(\mathbf{B}_P + \mathbf{B}_C) \cdot \hat{\mathbf{n}} = 0$ on $S_P$ and $\left[\!\left[ B^2/(2\mu_0) + p \right]\!\right]_{S_P} = 0$ | $\iota(\psi)$ or $I_T(\psi)$ |

Figure 1.7: An equilibrium is computed with a fixed plasma boundary, $S_P$, or prescribed external currents, $\mathbf{J}_C$. We assume the existence of a set of closed, nested toroidal surfaces, $\Gamma(\psi)$.

Due to its efficiency and robustness, equilibrium optimization has primarily relied on this variational approach. There are several alternative approaches to obtaining numerical solutions to (1.3) in a three-dimensional field. For example, sometimes the pressure is assumed to be piece-wise constant [120], or the magnetic field is taken to resistively relax to an equilibrium [90, 115]. For a review of other 3D equilibrium models, see Chapter 11 in [121].

The goal of stellarator optimization is ultimately to obtain the currents in the vacuum region needed to produce a stellarator configuration with desired physical properties. In this sense, it is logical to optimize the coils directly based on a free-boundary equilibrium. However, fixed-boundary optimization has been predominantly used for several practical reasons.
Free-boundary equilibrium calculations tend to be more expensive, as they require iterations between an equilibrium solve and vacuum field calculations. This iterative scheme will not always converge in practice, hence the historical use of the more robust fixed-boundary method. It has also been suggested that fixed-boundary optimization may yield better equilibrium properties, as the model assumes the existence of at least one magnetic surface. With this approach, considerations of the physics properties of a configuration are largely decoupled from engineering considerations of the coils. As a second step, the electro-magnetic coils are designed, as described in Section 1.4.3.

The fixed-boundary optimization problem is,

$$\min_{S_P} f(S_P, \mathbf{B}(S_P)), \tag{1.10}$$

where $\mathbf{B}$ is seen as a function of $S_P$ through the fixed-boundary equations (Table 1.1). Here, the objective function, $f$, quantifies physics or engineering properties of an equilibrium, such as those outlined in Section 1.2. It is common to consider several objectives during an optimization, taking the objective function to be a sum of squares,

$$f(S_P, \mathbf{B}(S_P)) = \sum_i \frac{\left( f_i(S_P, \mathbf{B}(S_P)) - f_i^{\text{target}} \right)^2}{\sigma_i^2}. \tag{1.11}$$

Here $f_i^{\text{target}}$ is the target value for objective $i$ and the $\sigma_i$ parameters quantify the relative weighting of the objectives.

Sometimes additional equality or inequality constraints are imposed,

$$g(S_P, \mathbf{B}(S_P)) = 0 \tag{1.12a}$$
$$h(S_P, \mathbf{B}(S_P)) \le 0. \tag{1.12b}$$

For example, the rotational transform might be constrained to be equal to a target value, or a maximum plasma volume may be imposed. Depending on the choice of optimization method, a local or global minimum will be sought. We will delay discussion of specific optimization algorithms until Section 1.4.4.
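The sum-of-squares objective (1.11) can be sketched in a few lines of Python. This is a minimal illustration only, assuming the individual objective values $f_i$ have already been evaluated for a given boundary; the function name and the numbers in the usage example are hypothetical and are not taken from any equilibrium optimization code.

```python
def sum_of_squares_objective(values, targets, sigmas):
    """Evaluate f = sum_i ((f_i - f_i^target) / sigma_i)^2, as in (1.11).

    values  : evaluated objectives f_i (assumed precomputed for one boundary)
    targets : target values f_i^target
    sigmas  : weights sigma_i; a smaller sigma_i makes objective i count more
    """
    if not (len(values) == len(targets) == len(sigmas)):
        raise ValueError("values, targets, and sigmas must have equal length")
    return sum(((v - t) / s) ** 2 for v, t, s in zip(values, targets, sigmas))


# Hypothetical example with two objectives that miss their targets by the
# same amount (0.5) but carry different weights.
f_total = sum_of_squares_objective(
    values=[1.5, 0.5], targets=[1.0, 0.0], sigmas=[0.5, 0.25]
)
print(f_total)
```

Halving $\sigma_i$ quadruples the contribution of objective $i$, which is how the relative weighting enters the optimization.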
The fixed-boundary optimization method is implemented in the STELLOPT [197, 213] and ROSE [59] codes.

1.4.3 Coil optimization

Once a target plasma boundary, $S_P$, and equilibrium magnetic field, $\mathbf{B}$, are identified from equilibrium optimization, electro-magnetic coils that are consistent with this equilibrium must be identified. The total magnetic field, $\mathbf{B}$, can be decomposed into that which results from the target equilibrium plasma current, $\mathbf{B}_P$, and that which results from the coil currents, $\mathbf{B}_C$, computed from the Biot-Savart law. If the two are consistent, then the following relation will be satisfied,

$$0 = \mathbf{B}_P(\mathbf{x}) \cdot \hat{\mathbf{n}}(\mathbf{x}) + \frac{\mu_0}{4\pi} \int_{\mathbb{R}^3 \setminus V_P} d^3x' \, \frac{\mathbf{J}_C(\mathbf{x}') \times (\mathbf{x} - \mathbf{x}') \cdot \hat{\mathbf{n}}(\mathbf{x})}{|\mathbf{x} - \mathbf{x}'|^3}, \tag{1.13}$$

for all $\mathbf{x} \in S_P$. In other words, the coils must be consistent with the last magnetic surface of the target equilibrium.

We note that the above is in the form of an integral equation of the first kind,

$$g(t) = \int_a^b ds \, K(t, s) f(s), \tag{1.14}$$

where $g(t)$ is given in some domain $t \in [c, d]$, $K(t, s)$ is a known kernel function, and $f(s)$ must be inferred. It is well-known that such problems are ill-posed [131], in the sense that small changes in the prescribed data, $g(t)$, result in large changes in the solution, $f(s)$, and a unique solution may not exist.

Thus finding a solution for $\mathbf{J}_C$ in (1.13) is not well-posed. In some ways, this is advantageous, as there may be many possible coil arrangements that provide the desired plasma configuration, and the one with the most favorable engineering properties can be chosen. However, one must be careful when obtaining numerical solutions to this problem so that noise in the prescribed data is not amplified. A classical technique for such problems is Tikhonov regularization [225], in which (1.14) is replaced by the optimization problem,

$$\min_{f(s)} \left[ \int_c^d dt \left( \int_a^b ds \, K(t, s) f(s) - g(t) \right)^2 + \lambda \int_a^b ds \, \big( f(s) \big)^2 \right]. \tag{1.15}$$

When $\lambda = 0$, the above is equivalent to (1.14). In order for the problem to be well-posed, additional information about the nature of the solution is provided. In (1.15), the assumption is made that the norm of the solution will be small. The regularization parameter, $\lambda$, describes the trade-off between obtaining a solution of (1.14) and satisfying the expected or desired behavior of the solution. The regularized problem now has a unique solution and depends continuously on $g(t)$ for all $\lambda > 0$. Applied to the coil design problem (1.13), this regularization takes the form,

$$\min_{\mathbf{J}_C} \left( \int_{S_P} d^2x \, \Big( \big( \mathbf{B}_P + \mathbf{B}_C \big) \cdot \hat{\mathbf{n}} \Big)^2 + \lambda \int_{\mathbb{R}^3 \setminus V_P} d^3x \, F(\mathbf{J}_C) \right), \tag{1.16}$$

where $\mathbf{B}_C$ is the magnetic field due to $\mathbf{J}_C$ computed from the Biot-Savart law (1.7) and $F(\mathbf{J}_C)$ is some function of the coil currents that characterizes desired engineering properties.

Coil properties

Given the freedom inherent in designing stellarator coils, we now outline some desired properties for a set of stellarator coils.

• Physics objectives - Our primary interest is to find a coil set consistent with our target fixed-boundary equilibrium. This objective is typically quantified by the error in obtaining the last magnetic surface, as in (1.13). In practice, some physics metrics depend very sensitively on coil perturbations, so other critical physics properties of the equilibrium can be included in the coil optimization, such as the magnetic ripple on axis (a measure of quasi-symmetry) or the rotational transform [56].

• Manufacturability - Coil shapes have a minimum allowable radius of curvature due to their finite build, and overly-complex coils may be difficult to manufacture without excessive cost [220]. There are many metrics suggested for quantifying complexity, such as length [243], torsion [118], and curvature [32].

• Stresses - Complex support structures must be built to maintain coil locations and shapes under their large electro-magnetic, thermal, and gravitational stresses. As coils tend to become more circular and planar under electro-magnetic stresses [129], it is advantageous to minimize curvature and non-planarity when possible.

• Access to the plasma chamber - There should be sufficient distance between coils to allow for diagnostic ports and ease of machine assembly and maintenance. Coils with relatively straight sections on the outboard side may particularly provide improved access [32].

• Coil-plasma separation - In a reactor, coils should be designed sufficiently far from the plasma boundary to allow space for neutron shielding, a blanket, the first wall, coil casing, and the vacuum vessel. Increased coil-plasma distance can also reduce the magnetic field ripple due to the finite number of coils. The minimum coil-plasma distance effectively sets the required size of a reactor, as $\approx$ […].

Current potential methods

The first stellarator coil design code, NESCOIL [158], assumes that all currents in the vacuum region lie on a closed toroidal surface called the winding surface, $S_C$. This method was used to design the modular coils of W7-AS [108], W7-X [15], and HSX [5] and was later generalized to include regularization in the REGCOIL [136] code. In the limit of a large number of coils, we can describe a set of discrete coils by a continuous current density on $S_C$,

$$\mathbf{J} = \delta(b(\mathbf{x})) \, \mathbf{J}_C(\theta, \phi). \tag{1.17}$$

Here $b(\mathbf{x})$ is the signed-distance function [179],

$$b(\mathbf{x}) = \begin{cases} -d(\mathbf{x}, S_C) & \mathbf{x} \in V_C \\ 0 & \mathbf{x} \in S_C \\ d(\mathbf{x}, S_C) & \mathbf{x} \notin V_C. \end{cases} \tag{1.18}$$

The volume enclosed by $S_C$ is $V_C$ and $d(\mathbf{x}, S_C)$ is the shortest distance from $\mathbf{x}$ to any point on $S_C$. The signed distance function is also discussed in Section 2.1. The surface current $\mathbf{J}_C$ is a function of the two angles, $\theta$ and $\phi$, parameterizing the position on $S_C$. As a consequence of Ampere's law (Appendix B), the continuous surface current can be written as,

$$\mathbf{J}_C = \hat{\mathbf{n}} \times \nabla \Phi. \tag{1.19}$$

We can note that current will flow along the contours of $\Phi$, as $\mathbf{J}_C \cdot \nabla \Phi = 0$. In this way, once $\Phi$ is computed, the coil shapes can be chosen to be a set of the contours of $\Phi$. As we will see in Chapter 3, it is possible to construct an objective function that is a convex function of $\Phi$, possessing a unique global minimum that can be obtained through linear least-squares. Thus current potential methods are particularly robust and efficient, though based on some severe assumptions. Coil complexity can be approximated from the properties of the current potential. In REGCOIL, this is done with the norm of the current density,

$$\chi_J^2 = \int_{S_C} d^2x \, |\mathbf{J}_C|^2, \tag{1.20}$$

as large values of $\chi_J^2$ indicate small coil-coil spacing. An example REGCOIL calculation is shown in Figure 1.8.

Filamentary methods

Other coil design codes instead assume that all currents in the vacuum region are confined to filamentary lines, $\{C_k\}$, taken to be the center of each winding pack. This assumption is again an idealization, as stellarator coils have a finite build consisting of several layers, each with several turns of the conducting material. However, the filamentary method is more realistic than current potential methods, as it accounts for the ripple due to the finite nature of coils. The lines and the current through each are optimized to minimize some objective function that includes the normal field error on $S_P$ in addition to engineering objectives, which serve as a form of regularization. For example, the FOCUS code [243] uses the coil length as a form of regularization, and the COILOPT code [216] includes the coil-plasma separation, coil-coil separation, and the coil curvature. These optimization problems are generally nonlinear and non-convex so that the resulting local minimum will depend on the initial guess. For this reason, a current potential solution can be used to initialize the optimization with filamentary methods.

Figure 1.8: An example of a REGCOIL calculation for the W7-X standard configuration equilibrium. The winding surface is taken to be a surface uniformly offset from $S_P$ by 0.5 m. (a) The current potential and the uniformly-spaced contours taken for the coil set. (b) The coil set computed from the contours on the winding surface. (c) The 5 unique coils in one half period and the plasma surface.

Although there have arguably been significant successes in optimized stellarator design, there is still room for improvement in the algorithms and numerical methods. Specifically, we aim to address several major challenges that arise in the optimization of stellarator configurations.

1. Coil complexity - In the standard two-step approach, coil design is decoupled from equilibrium optimization. While this may allow for improved physics properties, the resulting equilibrium may require overly-complex coils that cannot be manufactured economically or are not consistent with engineering constraints. As was stated in the 2018 report of the National Stellarator Coordinating Committee [73],

   "The highest priority for technology is to better integrate the engineering design with the physics design at the earliest possible stage."

   For this reason, it is favorable to include coil complexity metrics in equilibrium optimization. As an example, one approach is to compute the properties of the current potential (Section 1.4.3) on a winding surface that is uniformly offset from the plasma surface [59] during fixed-boundary optimization. It has also been proposed that properties of the optimal filamentary coils for a given plasma boundary be included in equilibrium optimization [118]. Alternatively, the coils can be directly optimized with a free-boundary method. This approach was implemented in the late stages of the NCSX design [119, 217] and in the QPS (Quasi-Poloidally Symmetric Stellarator) design [218], resulting in simultaneous attainment of engineering feasibility and desired plasma properties. Another tactic to reduce coil complexity is replacing non-planar modular coils by permanent magnets [103, 246].

2. Non-convexity - The optimization problems that arise in stellarator design are often non-convex (except for the current potential methods described in Section 1.4.3). While convex optimization problems can be solved in polynomial time (Chapter 1 in [29]), obtaining the global optimum of a non-convex optimization problem is generally NP-hard. As global optima are difficult to locate, it is common to apply algorithms that instead converge to local optima. Such methods are sensitive to the initial conditions and tend to get "stuck" in small local minima or saddle points. For this reason, it is very valuable to have initial configurations that are close to the desired configuration. One approach is to begin with an analytic construction of an equilibrium close to quasi-symmetry or omnigeneity by employing an expansion about the magnetic axis [139, 142, 193].

   Gradient information is invaluable for obtaining the local minimum of an objective function. While there are some algorithms for derivative-free local optimization, they typically are only effective for small problems (Chapter 9 in [170]). Gradient information is also useful for global optimization; for example, with a multi-start approach, many local optimization problems are solved to approximately obtain the global minimum. As considerations of the gradient will be central to this Thesis, we will discuss this topic further in Chapter 2.

   In Figure 1.9 we show a benchmark of several optimization algorithms on the Rosenbrock function,

   $$f\left(\{x_i\}_{i=1}^{N}\right) = \sum_{i=1}^{N-1} \left[ 100 \left( x_{i+1} - x_i^2 \right)^2 + (x_i - 1)^2 \right], \tag{1.21}$$

   with $N = 2$, a non-convex function with a long, thin valley that is often used to benchmark optimization algorithms. We can note that the gradient-based BFGS method converges rather directly toward the optimum. In contrast, the gradient-free particle swarm method takes a scattered trajectory and requires many additional function evaluations.

3. High-dimensionality - Often, the optimization problems that arise in stellarator design require navigation through the high-dimensional spaces that describe the outer boundary of the plasma or coil shapes. While such shapes are infinite-dimensional in reality, often they are parameterized with Fourier series, and only a finite number of modes are retained during the optimization. The number of parameters used in practice to describe such shapes is typically $O(10^2)$ [242]. We show a benchmark of the $N$-dimensional Rosenbrock function (1.21) in Figure 1.10, noting that the number of function evaluations required to obtain the optimum scales poorly with $N$ for the gradient-free methods and the finite-difference gradient-based methods. As computing the gradient with a finite-difference method requires $O(N)$ function evaluations, the associated cost is reduced significantly if analytic derivatives are available. Stellarator equilibrium optimization has historically proceeded with gradient-free methods, such as genetic algorithms [161] and the Brent algorithm [59], or gradient-based methods with finite-difference gradient calculations [213]. Recently, gradient-based optimization of coil shapes has begun to take advantage of analytic gradient and Hessian calculations [243, 244]. However, for many functions of interest, it is not so simple to compute the analytic derivative, as the objective function may depend on the solution to a system of equations. For such objectives, analytic derivatives can be computed with an adjoint method. This topic will be discussed in detail in Chapter 2 and throughout the Thesis.

4. Tight engineering tolerances - Once an optimal design is identified, engineering and metrology coil tolerances must be determined from the allowable deviations of physics parameters. In the NCSX design, it was determined that coil tolerances of $\approx$ […]

   "[…] schedule stretch-out which has a large management overhead cost."

   One approach to address this challenge is to optimize the expected value of an objective function over a distribution of possible deviations, known as stochastic optimization. This technique has been shown to increase the tolerances of an optimized coil set [150, 151]. There has also been a recent development of tools for the efficient evaluation of tolerance information to avoid costly parameter scans or Monte Carlo sampling methods [31, 88]. The eigenvectors of the Hessian matrix illuminate the most sensitive perturbation directions at a local minimum [243, 245], and in this Thesis, we will discuss the shape gradient approach [138].

Figure 1.9: […] $(x_1, x_2) = (10, 10)$ and converges to the optimum at $(1, 1)$ in 58 function evaluations, using an analytic gradient to obtain the descent direction. The particle swarm optimization is initialized with a swarm of 20 particles at $(10, 10)$ and converges to the optimum at $(1, 1)$ in 3400 evaluations. The gradient-based method converges more directly toward a minimum, while the gradient-free method converges in a scattered way requiring excessive function evaluations. For (a), the optimization was terminated when the maximum of the absolute value of the gradient elements was less than $10^{-[\dots]}$, and for (b), the optimization was terminated when the relative change in the objective function over the previous 20 iterations was less than $10^{-[\dots]}$.

Figure 1.10: The number of function evaluations required for convergence to the minimum of the $N$-dimensional Rosenbrock function (1.21) as a function of the dimension. Results are shown for the gradient-based BFGS algorithm with finite-difference and analytic gradients and the gradient-free particle swarm method. We note that the gradient-free and finite-difference gradient-based methods scale poorly with the dimension. Knowledge of analytic gradients reduces the associated cost by several orders of magnitude in comparison. The cost reduction provided by analytic derivatives increases with increasing dimension. For the BFGS algorithm the optimization was terminated when the maximum of the absolute value of the gradient elements was less than $10^{-[\dots]}$, and for the particle swarm algorithm the optimization was terminated when the relative change in the objective function over the previous 20 iterations was less than $10^{-[\dots]}$.

This Thesis aims to address each of the challenges outlined in the previous Section. The focus will be on adjoint methods, which allow for efficient analytic gradient calculations. With such gradient information available, we can navigate through high-dimensional, non-convex spaces that arise in stellarator design with gradient-based methods, addressing challenges 2 and 3. Derivatives obtained from the adjoint method can also be used to analyze local sensitivity to perturbations using the shape gradient, addressing challenge 4. Specific applications of the adjoint method described in this Thesis will enable efficient free-boundary coil optimization or coupled coil-plasma optimization, addressing challenge 1.

We begin in Chapter 2 with an introduction to some mathematical fundamentals that lay the groundwork for this Thesis, including an overview of shape optimization and adjoint methods. Chapter 3 describes an adjoint method for the optimization of the coil winding surface for minimal coil complexity. Chapter 4 describes an adjoint method for the optimization of several neoclassical figures of merit local to a magnetic surface, including radial fluxes and the bootstrap current. Chapter 5 describes an adjoint method for the optimization of functions which depend on MHD equilibrium solutions, such as those that arise in fixed and free-boundary optimization. The adjoint method discussed in Chapter 5 requires the solution of linearized MHD equilibrium equations, which are discussed in Chapter 6. In Chapter 7, we summarize and discuss ongoing and future research related to this Thesis.

Chapter 2 Mathematical fundamentals

The design of a stellarator requires optimizing in the space of shapes: equilibrium design involves optimization of the shape of the plasma boundary, $S_P$, and coil design involves optimization of the shapes of filamentary coils or toroidal winding surfaces. The mathematical field of shape optimization has developed to study such problems, contributing to the design of aerodynamic car bodies [180] and airplane wings with increased lift [162]. In this Section, we briefly outline several concepts from this field. We refer to several fundamental textbooks [40, 52, 91, 191] and a Ph.D. thesis with a gentler introduction [47].

Consider some functional, $f$, which depends on the shape of some domain, $\Gamma$. In order to compute the derivative of $f$, we must first identify a deformation field, $\delta\mathbf{x}$, which describes the change of the shape. If the shape begins in a state $\Gamma$, the shape deformed in the direction $\delta\mathbf{x}$ by magnitude $\epsilon$ is $\Gamma_\epsilon = \{\mathbf{x} + \epsilon\,\delta\mathbf{x}(\mathbf{x}) : \mathbf{x} \in \Gamma\}$. In this way, we can define the shape derivative of $f$ as,

$$ \delta f(\Gamma;\delta\mathbf{x}) \equiv \lim_{\epsilon\to 0} \frac{f(\Gamma_\epsilon) - f(\Gamma)}{\epsilon}. \tag{2.1} $$

This is a functional derivative in the direction $\delta\mathbf{x}$ (a Gateaux functional derivative).

We can prove some useful properties of the shape derivative for specific choices of functional,

$$ J(\Gamma) = \int_{\Gamma} d^3x\, j(\Gamma) \tag{2.2a} $$

$$ J(\Gamma) = \int_{\partial\Gamma} d^2x\, j(\Gamma), \tag{2.2b} $$

volume and surface integrals.

For volume-integrated functionals, the shape derivative can be evaluated by noting that the Jacobian of the transformation $\mathbf{x} \in \Gamma \to \mathbf{x} \in \Gamma_\epsilon$ is given by $\mathbf{I} + \epsilon\nabla\delta\mathbf{x}$, where $\mathbf{I}$ is the identity tensor. This allows us to relate the volume integral over $\Gamma_\epsilon$ to a volume integral over $\Gamma$,

$$ \delta J(\Gamma;\delta\mathbf{x}) = \lim_{\epsilon\to 0}\frac{1}{\epsilon}\left(\int_{\Gamma_\epsilon} d^3x\, j(\Gamma_\epsilon) - \int_{\Gamma} d^3x\, j(\Gamma)\right) = \lim_{\epsilon\to 0}\frac{1}{\epsilon}\int_{\Gamma} d^3x \left[\det\left(\mathbf{I} + \epsilon\nabla\delta\mathbf{x}\right) j(\Gamma_\epsilon)\big|_{\mathbf{x}+\epsilon\delta\mathbf{x}} - j(\Gamma)\right]. \tag{2.3} $$

Noting that $j(\Gamma_\epsilon)|_{\mathbf{x}+\epsilon\delta\mathbf{x}} = j(\Gamma)|_{\mathbf{x}} + \epsilon\,\delta j(\Gamma;\delta\mathbf{x}) + \epsilon\,\delta\mathbf{x}\cdot\nabla j(\Gamma) + O(\epsilon^2)$ we have,

$$ \delta J(\Gamma;\delta\mathbf{x}) = \int_{\Gamma} d^3x \left(\delta j(\Gamma;\delta\mathbf{x}) + \delta\mathbf{x}\cdot\nabla j(\Gamma) + \frac{d}{d\epsilon}\Big(\det\left(\mathbf{I} + \epsilon\nabla\delta\mathbf{x}\right)\Big)\bigg|_{\epsilon=0} j(\Gamma)\right). \tag{2.4} $$

The derivative of the determinant of a matrix can be computed from Jacobi's formula, $d/dt\,\big(\det(A(t))\big) = \det(A(t))\,\mathrm{tr}\big(A(t)^{-1}A'(t)\big)$, which at $\epsilon = 0$ gives $\mathrm{tr}(\nabla\delta\mathbf{x}) = \nabla\cdot\delta\mathbf{x}$, so that

$$ \delta J(\Gamma;\delta\mathbf{x}) = \int_{\Gamma} d^3x \left[\delta j(\Gamma;\delta\mathbf{x}) + \delta\mathbf{x}\cdot\nabla j(\Gamma) + \left(\nabla\cdot\delta\mathbf{x}\right) j(\Gamma)\right]. \tag{2.5} $$

From the divergence theorem, we arrive at the following form for the shape derivative of volume-integrated functionals,

$$ \delta J(\Gamma;\delta\mathbf{x}) = \int_{\Gamma} d^3x\, \delta j(\Gamma;\delta\mathbf{x}) + \int_{\partial\Gamma} d^2x\, \delta\mathbf{x}\cdot\hat{\mathbf{n}}\, j(\Gamma). \tag{2.6} $$

The first term accounts for the Eulerian change to $j$ while the second term accounts for the motion of the boundary. In fluid mechanics, this relation is sometimes referred to as the Reynolds transport theorem (Chapter 2 in [145]), which describes the time derivative of integrated quantities associated with a moving fluid. A physical picture of this result is given in Figure 2.1.

We can now use (2.6) to obtain the shape derivative of the surface-integrated functional (2.2b). To do so, we recall that the normal vector can be expressed as $\hat{\mathbf{n}} = \nabla b|_{\partial\Gamma}$, where $b$ is the signed distance function [179],

$$ b(\mathbf{x}) = \begin{cases} -d(\mathbf{x},\partial\Gamma) & \mathbf{x} \in \Gamma \\ 0 & \mathbf{x} \in \partial\Gamma \\ d(\mathbf{x},\partial\Gamma) & \mathbf{x} \notin \Gamma, \end{cases} \tag{2.7} $$

and $d(\mathbf{x},\partial\Gamma)$ is the shortest distance from $\mathbf{x}$ to any point on $\partial\Gamma$. This can be seen by noting that $\hat{\mathbf{n}}$ points outward, in the direction of increasing $b(\mathbf{x})$, and the shortest path between a point near $\partial\Gamma$ and $\partial\Gamma$ will be along the normal direction. As $b(\mathbf{x})$ measures Euclidean distance, $\nabla b$ has unit length.

We can now apply the divergence theorem to write (2.2b) as

$$ J(\Gamma) = \int_{\Gamma} d^3x\, \nabla\cdot\big(j(\Gamma)\nabla b(\Gamma)\big). \tag{2.8} $$

We apply the transport theorem for volume-integrated functionals (2.6) to obtain,

$$ \delta J(\Gamma;\delta\mathbf{x}) = \int_{\partial\Gamma} d^2x \left[\delta\mathbf{x}\cdot\hat{\mathbf{n}}\left(\hat{\mathbf{n}}\cdot\nabla j + j\nabla^2 b\right) + j\,\nabla b\cdot\nabla\delta b(\Gamma;\delta\mathbf{x}) + \delta j(\Gamma;\delta\mathbf{x})\right]. \tag{2.9} $$

Figure 2.1: (a) An unperturbed volume, $\Gamma$. (b) The normal perturbation field of magnitude $\epsilon\,\delta\mathbf{x}\cdot\hat{\mathbf{n}}$ (black) and the perturbed volume, $\Gamma_\epsilon$ (green). We can see that the linear change in volume associated with the perturbation field is $\delta V = \int_{\partial\Gamma} d^2x\, \delta\mathbf{x}\cdot\hat{\mathbf{n}}$.

We can interchange shape and spatial derivatives to see that $\nabla b\cdot\nabla\delta b = \tfrac{1}{2}\delta\left(\nabla b\cdot\nabla b\right) = 0$, as $\nabla b$ will remain a unit vector. We can also recognize that the mean curvature, $H$, is related to the normal vector by $H = \tfrac{1}{2}\nabla_{\partial\Gamma}\cdot\hat{\mathbf{n}}$, where $\nabla_{\partial\Gamma}\cdot\mathbf{f} = \nabla\cdot\mathbf{f} - \hat{\mathbf{n}}\cdot(\nabla\mathbf{f})\cdot\hat{\mathbf{n}}$ is the tangential divergence operator. (Sometimes $H$ is defined with the opposite sign.) Since $\hat{\mathbf{n}}\cdot(\nabla\hat{\mathbf{n}})\cdot\hat{\mathbf{n}} = 0$ for a unit vector, $\nabla^2 b = \nabla\cdot\hat{\mathbf{n}} = 2H$ on the boundary. For surface-integrated functionals we therefore obtain the following shape derivative,

$$ \delta J(\Gamma;\delta\mathbf{x}) = \int_{\partial\Gamma} d^2x \left[\delta j(\Gamma;\delta\mathbf{x}) + \left(\hat{\mathbf{n}}\cdot\nabla j + 2Hj\right)\delta\mathbf{x}\cdot\hat{\mathbf{n}}\right]. \tag{2.10} $$

The first term accounts for the Eulerian change to $j$, while the second and third terms account for the motion of the boundary. As one would expect, an outward perturbation of a surface with large mean curvature leads to a large change in the area. See Figure 2.2 for a physical picture.

We can already see from (2.6) and (2.10) that the shape derivatives of volume and surface-integrated functionals involve integrals over the boundary. It may appear that to understand the form of these shape derivatives, we will need to specify the structure of the integrands $j(\Gamma)$ in (2.2a) and (2.2b). However, we can make a more general statement about shape derivatives of any form. The Hadamard-Zolesio structure theorem [52, 87] states that the shape derivative of a general functional of the domain $\Gamma$ with sufficient smoothness can be expressed as,

$$ \delta J(\Gamma;\delta\mathbf{x}) = \int_{\partial\Gamma} d^2x\, \delta\mathbf{x}\cdot\hat{\mathbf{n}}\, \mathcal{G}, \tag{2.11} $$

where $\mathcal{G}$ is called the shape gradient.
This is an example of the Riesz representation theorem, which (roughly) states that any linear functional can be expressed as an inner product with an element of the appropriate space (Chapter 4 in [199]). The shape derivative is a linear functional of the normal perturbation to the boundary, $\delta\mathbf{x}\cdot\hat{\mathbf{n}}$, and can be expressed as a surface integral with the shape gradient. This form is especially powerful for computation, as the deformation field only needs to be defined on the boundary, and the derivative can be written in terms of a surface integral rather than a volume integral. Intuitively, linear changes to a functional only depend on normal perturbations of the boundary. If the shape gradient can be determined, then for any possible deformation field, $\delta\mathbf{x}$, the corresponding change to the functional, $\delta J(\Gamma;\delta\mathbf{x})$, is known. We can think of $\mathcal{G}$ as being a measure of the local sensitivity: regions of increased $|\mathcal{G}|$ correspond to regions of increased sensitivity of $J(\Gamma)$ with respect to normal perturbations.

Footnote: Under the assumption of sufficient smoothness, spatial and shape derivatives can be shown to commute by noting that $\mathbf{x}$ and $\Gamma$ are independent variables (Chapter 6 in [40]).

Figure 2.2: … $\partial\Gamma$, shown as the blue and red lines, with curvatures $\kappa_1$ and $\kappa_2$, respectively. The unperturbed surface area element bounded by the principal directions is given by $dA = l_1 l_2$. Upon a normal displacement of magnitude $\epsilon\,\delta\mathbf{x}\cdot\hat{\mathbf{n}}$, the new area element is given by $(dA)_\epsilon = l_1 l_2\,(1 + \kappa_1\epsilon\,\delta\mathbf{x}\cdot\hat{\mathbf{n}})(1 + \kappa_2\epsilon\,\delta\mathbf{x}\cdot\hat{\mathbf{n}})$, so the linear change in the area element is $\delta A = (dA)\,2H\,\delta\mathbf{x}\cdot\hat{\mathbf{n}}$, where $H = (\kappa_1 + \kappa_2)/2$ is the mean curvature.

For stellarator optimization, we are also interested in functionals which depend on the shape of a set of filamentary lines, $C = \{C_k\}$. We expect that perturbations of the coils in the tangential direction will not result in a linear change to the functional.
We can, therefore, write the shape derivative in a form analogous to the structure theorem (2.11) by the Riesz representation theorem,

$$ \delta f(C;\delta\mathbf{x}_{C_k}) = \sum_k \oint_{C_k} dl\; \delta\mathbf{x}_{C_k}\times\hat{\mathbf{t}}\cdot\mathbf{G}_k, \tag{2.12} $$

where $\hat{\mathbf{t}}$ is the tangent vector, integration is taken along each coil, and the sum is taken over all coils. As a curve has two independent directions perpendicular to the tangent vector, the shape gradient is now a vector, $\mathbf{G}_k$. Its direction indicates the direction of perturbation which leads to the largest increase in the functional, and its magnitude indicates the level of sensitivity to a given perturbation.

To motivate this form of the coil shape gradient, we consider the example of the magnetic field computed from the Biot-Savart law applied to a set of filamentary coils $\{C_k\}$,

$$ \mathbf{B}(\mathbf{x},C) = \frac{\mu_0}{4\pi}\sum_k I_{C_k}\oint_{C_k} dl\; \frac{\hat{\mathbf{t}}(l)\times\left(\mathbf{x} - \mathbf{x}_k(l)\right)}{\left|\mathbf{x} - \mathbf{x}_k(l)\right|^3}, \tag{2.13} $$

where $\mathbf{x}_k$ is the position along the $k$th coil and $\hat{\mathbf{t}} = \mathbf{x}_k'(l)$ is the unit tangent vector. The shape derivative of the magnetic field can now be computed with respect to a coil perturbation field $\delta\mathbf{x}$ by considering the perturbation of a general closed line integral $Q_L(C) = \oint_C dl\, Q(C)$ [9, 138],

$$ \delta Q_L(C;\delta\mathbf{x}) = \oint_C dl \left(\delta\mathbf{x}\cdot\Big(-\boldsymbol{\kappa}\,Q + \left(\mathbf{I} - \hat{\mathbf{t}}\hat{\mathbf{t}}\right)\cdot\nabla Q\Big) + \delta Q(C;\delta\mathbf{x})\right), \tag{2.14} $$

where $\boldsymbol{\kappa}(l) = \hat{\mathbf{t}}'(l)$ is the curvature vector.

Upon application of this identity and integration by parts, we obtain,

$$ \delta\mathbf{B}(\mathbf{x},C;\delta\mathbf{x}_k) = \frac{\mu_0}{4\pi}\sum_k I_{C_k}\oint_{C_k} dl\; \delta\mathbf{x}_k\times\hat{\mathbf{t}}(l)\cdot\left(-\frac{\mathbf{I}}{\left|\mathbf{x} - \mathbf{x}_k(l)\right|^3} + \frac{3\left(\mathbf{x} - \mathbf{x}_k(l)\right)\left(\mathbf{x} - \mathbf{x}_k(l)\right)}{\left|\mathbf{x} - \mathbf{x}_k(l)\right|^5}\right), \tag{2.15} $$

where $\mathbf{I}$ is the identity tensor. Thus the shape derivative of a figure of merit that depends on the vacuum magnetic field through the Biot-Savart law can be expressed in the coil shape gradient form (2.12). In Chapter 5 we will show explicit examples of other figures of merit that can be expressed in this form.
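As a concrete illustration of the Biot-Savart integral (2.13), a filamentary coil can be discretized into straight segments and the line integral replaced by a midpoint sum. The following is a minimal sketch under stated assumptions: a single planar circular coil is used, and the function names `circular_coil` and `biot_savart`, the radius, the current, and the resolution are all chosen for illustration and do not come from this Thesis. A circular loop is convenient because its on-axis field is known in closed form, $B_z = \mu_0 I R^2 / \big(2(R^2 + z^2)^{3/2}\big)$, which gives a simple check.

```python
import numpy as np

MU0 = 4e-7 * np.pi  # vacuum permeability [T m / A]


def circular_coil(radius, n_points):
    """Points on a circle in the z = 0 plane, ordered counterclockwise (illustrative geometry)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return np.column_stack([radius * np.cos(theta),
                            radius * np.sin(theta),
                            np.zeros(n_points)])


def biot_savart(x, coil_points, current):
    """Approximate (2.13) for one closed coil using straight segments and a midpoint rule."""
    nxt = np.roll(coil_points, -1, axis=0)      # next point, wrapping to close the curve
    dl = nxt - coil_points                      # segment vectors, i.e. t_hat * dl
    mid = 0.5 * (nxt + coil_points)             # segment midpoints, x_k(l)
    r = x - mid                                 # x - x_k(l)
    r3 = np.linalg.norm(r, axis=1) ** 3
    integrand = np.cross(dl, r) / r3[:, None]
    return MU0 * current / (4.0 * np.pi) * integrand.sum(axis=0)


def on_axis_field(radius, current, z):
    """Closed-form on-axis field of a circular current loop, used only as a check."""
    return MU0 * current * radius**2 / (2.0 * (radius**2 + z**2) ** 1.5)
```

For example, `biot_savart(np.array([0.0, 0.0, 0.4]), circular_coil(1.5, 400), 1.0e4)` should agree with `on_axis_field(1.5, 1.0e4, 0.4)` up to the discretization error of the polygonal approximation. The same discretized sum is the natural starting point for evaluating (2.15) numerically.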
In practice, it may be convenient to describe a shape by a set of parameters, $\Omega$. We can relate the shape derivative and shape gradient defined in the previous Section to derivatives with respect to such parameters.

Suppose that we have a surface described by a set of parameters, $\Omega$. For example, in the context of stellarator equilibrium calculations, the plasma boundary is often described by a set of Fourier coefficients of the cylindrical coordinates, $\{R^c_{m,n}, Z^s_{m,n}\}$,

$$ R = \sum_{m,n} R^c_{m,n}\cos(m\theta - nN_P\phi) \tag{2.16a} $$

$$ Z = \sum_{m,n} Z^s_{m,n}\sin(m\theta - nN_P\phi). \tag{2.16b} $$

Here $\theta$ is a poloidal angle, $\phi$ is a toroidal angle, and the configuration is assumed to possess stellarator symmetry, which implies that $R(-\theta,-\phi) = R(\theta,\phi)$ and $Z(-\theta,-\phi) = -Z(\theta,\phi)$ [53]. The number of periods is $N_P$, representing the discrete rotational symmetry of the equilibrium (Section 12 in [121]). This is the representation of the boundary shape used in the VMEC code [111].

In this case, we can compute the shape derivative corresponding to perturbations of each parameter, $\delta\mathbf{x} = \big(\partial\mathbf{x}(\Omega)/\partial\Omega_i\big)\delta\Omega_i$,

$$ \delta J(\Gamma(\Omega);\delta\mathbf{x}) = \frac{\partial J(\Gamma(\Omega))}{\partial\Omega_i}\delta\Omega_i, \tag{2.17} $$

by expressing our functional as a function of the parameters. We apply the structure theorem (2.11) to obtain the following expression,

$$ \frac{\partial J(\Gamma(\Omega))}{\partial\Omega_i} = \int_{\partial\Gamma} d^2x\; \frac{\partial\mathbf{x}(\Omega)}{\partial\Omega_i}\cdot\hat{\mathbf{n}}\,\mathcal{G}. \tag{2.18} $$

Given $\partial J(\Gamma(\Omega))/\partial\Omega_i$ and $\partial\mathbf{x}(\Omega)/\partial\Omega_i$, we can consider this to be a linear system for $\mathcal{G}$. For numerical calculation, the above can be discretized using a collocation method or by expanding $\mathcal{G}$ in a set of basis functions. Often the linear system is not square, in which case an SVD or QR decomposition can be used.

Now suppose that our coils are described by a set of parameters, $\Omega$.
For example, the Cartesian components of the filamentary line can be described by a Fourier series,

$$ x_k = \sum_m X^k_{cm}\cos(m\theta) + X^k_{sm}\sin(m\theta) \tag{2.19a} $$

$$ y_k = \sum_m Y^k_{cm}\cos(m\theta) + Y^k_{sm}\sin(m\theta) \tag{2.19b} $$

$$ z_k = \sum_m Z^k_{cm}\cos(m\theta) + Z^k_{sm}\sin(m\theta), \tag{2.19c} $$

where $\theta \in [0, 2\pi]$ is an angle parameterizing each curve. Again we compute the shape derivative corresponding to perturbations of each parameter, $\delta\mathbf{x}_{C_k} = \big(\partial\mathbf{x}_{C_k}(\Omega)/\partial\Omega_i\big)\delta\Omega_i$,

$$ \delta f(C;\delta\mathbf{x}_{C_k}) = \frac{\partial f(\{C_k(\Omega)\})}{\partial\Omega_i}\delta\Omega_i, \tag{2.20} $$

to obtain,

$$ \frac{\partial f(C)}{\partial\Omega_i} = \sum_k\oint_{C_k} dl\; \frac{\partial\mathbf{x}_{C_k}(\Omega)}{\partial\Omega_i}\times\hat{\mathbf{t}}\cdot\mathbf{G}_k. \tag{2.21} $$

As with the case of functionals of surfaces, we can consider the above to be a linear system for $\mathbf{G}_k$ that can be solved numerically. An overview of this method and examples of its application for figures of merit relevant for stellarator optimization are provided in [138].

The shape derivatives computed in this Section are quite general, applying to any functional of surfaces, volumes, or lines. For some problems we will be able to use the expressions for the shape derivatives, (2.6) and (2.10), to obtain an explicit expression for the shape gradient. For example, if we consider the volume functional, (2.2a) with $j = 1$, then we see from (2.6) that the shape gradient will be $\mathcal{G} = 1$. If we consider the surface functional, (2.2b) with $j = 1$, then we see from (2.10) that the shape gradient will be $\mathcal{G} = 2H$. However, for many functionals, this type of explicit calculation is not possible. We are often interested in functionals which depend on solutions of a PDE, in which case we can compute the shape gradient by solving an additional PDE, known as an adjoint equation.
We describe the adjoint method in more detail in the following Section.

For other problems, it may be more convenient to compute the shape derivative from parameter derivatives, as in (2.17) and (2.20), rather than applying the transport theorems. The shape gradient can then be inferred by solving the corresponding linear systems, (2.18) and (2.21). Sometimes these parameter derivatives can be obtained analytically or with an adjoint method; otherwise, they are obtained with a finite-difference method.

As the shape gradient measures the local sensitivity of a figure of merit to perturbations of a shape, we can use it to quantify the uncertainty in a figure of merit given a distribution of small perturbations to the shape. As shown in [138], the plasma surface or coil shape gradient can be used to determine the allowable deformations of a shape given a permissible change to a figure of merit. Suppose a figure of merit $f$ has an allowable deviation $\Delta f$ (in either direction). If we define a local tolerance for the $k$th coil as,

$$ T_k(l) = \frac{w_k(l)\,\Delta f}{\sum_{k'}\oint_{C_{k'}} dl'\, w_{k'}(l')\,\left|\mathbf{G}_{k'}(l')\right|}, \tag{2.22} $$

such that the perturbation amplitude satisfies $|\delta\mathbf{x}_{C_k}(l)\times\hat{\mathbf{t}}(l)| \le T_k(l)$ along the $k$th coil, then the change of the figure of merit will be bounded by,

$$ \left|\delta f\left(C;\delta\mathbf{x}_{C_k}\right)\right| \le \sum_k\oint_{C_k} dl\, \left|\delta\mathbf{x}_{C_k}\times\hat{\mathbf{t}}\cdot\mathbf{G}_k\right| \le \sum_k\oint_{C_k} dl\; T_k\left|\mathbf{G}_k\right| = \Delta f, \tag{2.23} $$

upon application of the triangle inequality. Here $w_k(l)$ is a weight function which allows for the distribution of tolerance to be non-uniform along the coil. In identifying such a tolerance we have relied on a local approximation of the function, considering small-amplitude perturbations such that a linear approximation is valid.

Similarly, a tolerance with respect to perturbations of a surface can be defined with respect to the surface shape gradient,

$$ T = \frac{w\,\Delta f}{\int_{\partial\Gamma} d^2x\; w\,|\mathcal{G}|}, \tag{2.24} $$

where $w$ is a weight function defined on the surface $\partial\Gamma$.
For example, we could consider the tolerance of a figure of merit that depends on the position of the plasma boundary, $S_P$. If we constrain perturbations of the surface such that $|\delta\mathbf{x}\cdot\hat{\mathbf{n}}| \le T$, then we find that the corresponding change to the figure of merit is $\delta f \le \Delta f$. However, the deformation of a magnetic surface is not a quantity that can be directly experimentally controlled, requiring equilibrium reconstruction methods [89].

A more practically relevant quantity is computed from the sensitivity to perturbations of the magnetic field, $S_B$, defined through,

$$ \delta f(S_P;\delta\mathbf{x}) = \langle\mathcal{G}\rangle_\psi\,\delta V(\delta\mathbf{x}) + \int_{S_P} d^2x\; S_B\,\delta\mathbf{B}(\delta\mathbf{x})\cdot\hat{\mathbf{n}}, \tag{2.25} $$

where $\delta V$ and $\delta\mathbf{B}$ are the perturbations to the volume enclosed by $S_P$ and the magnetic field resulting from a surface displacement of $\delta\mathbf{x}$, and $\langle\ldots\rangle_\psi$ is the flux-surface average (A.10). The quantity $S_B$, which quantifies the local sensitivity to perturbations of the magnetic field, is computed from the shape gradient as,

$$ \mathbf{B}\cdot\nabla S_B = \langle\mathcal{G}\rangle_\psi - \mathcal{G}. \tag{2.26} $$

A tolerance with respect to magnetic field perturbations can then be constructed as,

$$ T_B = \frac{w\,\Delta f}{\int_{S_P} d^2x\; w\,|S_B|}, \tag{2.27} $$

for a chosen weight function $w$, such that if the normal magnetic perturbations satisfy $|\delta\mathbf{B}\cdot\hat{\mathbf{n}}| \le T_B$, then $\delta f \le \Delta f$. The tolerance with respect to magnetic perturbations can inform allowable coil deformations, location of trim coils, and position of current leads. In this way, important engineering tolerances are inferred, addressing objective 4 from Section 1.4.4.

An adjoint method is a numerical method for the efficient calculation of derivatives of an objective function that depends on the solution to some set of equations, known as the forward system.
At the heart of the adjoint method is the adjoint equation, in which the adjoint of the linearized forward operator appears in addition to an inhomogeneous term that depends on the objective function of interest.

There are other instances in which the adjoint operator may become useful. An adjoint Fokker-Planck equation is used to compute the quasilinear generation of current by RF waves [9] or to study runaway electron dynamics [148]. An adjoint gyrokinetic equation can also be used to analyze the evolution of free energy [141]. Finally, adjoint operators are used to predict and correct discretization error [78, 189] and perform efficient grid adaptation [231]. In this Chapter, we focus our attention on adjoints for efficient derivative calculations.

Adjoint methods were introduced by the optimal control theory community in the 1960s [74, 126], and were later adopted by the fluid dynamics community [190]. They have since been popularized for aeronautical design [123], car aerodynamics [180], geophysics [192], and nuclear fission reactor design [68]. Aside from the body of work associated with this Thesis, there is only one other example of the use of adjoint methods in fusion sciences: for the shape optimization of tokamak divertors based on adjoint fluid equations [47, 49, 50, 51]. We refer to several introductory articles on adjoint methods [4, 79, 192].

We begin our overview of adjoint methods with their application to objective functions that depend on the solution of finite-dimensional, discrete linear systems in Section 2.2.1. We will then generalize to objective functions that depend on the solution of infinite-dimensional, possibly nonlinear systems in Section 2.2.2. The two approaches are compared in Section 2.2.3.

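Suppose we would like to solve the optimization problem,

$$ \min_{\Omega} f(\Omega,\vec{x}), \tag{2.28} $$

where $\vec{x}$ is the solution of a linear system,

$$ \overleftrightarrow{A}(\Omega)\,\vec{x} = \vec{b}(\Omega). \tag{2.29} $$

Here $\overleftrightarrow{A}$ is an $N \times N$ matrix and $\vec{x}$ and $\vec{b}$ are $N \times 1$ column vectors. Let $\{\Omega_i\}_{i=1}^{N_\Omega}$ be a set of design parameters defining our optimization space. To minimize (2.28) with a gradient-based method, we compute the derivative with respect to $\Omega$ using the chain rule,

$$ \frac{df(\Omega,\vec{x}(\Omega))}{d\Omega} = \frac{\partial f(\Omega,\vec{x})}{\partial\Omega} + \left(\frac{\partial f(\Omega,\vec{x})}{\partial\vec{x}}\right)^T\frac{\partial\vec{x}(\Omega)}{\partial\Omega}. \tag{2.30} $$

Here $\partial f(\Omega,\vec{x})/\partial\vec{x}$ is the gradient of $f$ with respect to $\vec{x}$, a column vector. To evaluate $\partial\vec{x}(\Omega)/\partial\Omega$, we must compute linear perturbations of (2.29),

$$ \frac{\partial\overleftrightarrow{A}(\Omega)}{\partial\Omega}\vec{x}(\Omega) + \overleftrightarrow{A}(\Omega)\frac{\partial\vec{x}(\Omega)}{\partial\Omega} = \frac{\partial\vec{b}(\Omega)}{\partial\Omega}. \tag{2.31} $$

We schematically evaluate the perturbation to the solution as,

$$ \frac{\partial\vec{x}(\Omega)}{\partial\Omega} = \overleftrightarrow{A}(\Omega)^{-1}\left(\frac{\partial\vec{b}(\Omega)}{\partial\Omega} - \frac{\partial\overleftrightarrow{A}(\Omega)}{\partial\Omega}\vec{x}(\Omega)\right). \tag{2.32} $$

Inserting the result into (2.30), we obtain

$$ \frac{df(\Omega,\vec{x}(\Omega))}{d\Omega} = \frac{\partial f(\Omega,\vec{x})}{\partial\Omega} + \left(\frac{\partial f(\Omega,\vec{x})}{\partial\vec{x}}\right)^T\left[\overleftrightarrow{A}(\Omega)^{-1}\left(\frac{\partial\vec{b}(\Omega)}{\partial\Omega} - \frac{\partial\overleftrightarrow{A}(\Omega)}{\partial\Omega}\vec{x}(\Omega)\right)\right]. \tag{2.33} $$

This approach to computing the derivative, the forward-sensitivity method, requires computing $N_\Omega + 1$ solutions to a linear system of size $N \times N$: we must solve (2.29) once for $\vec{x}$, and we must solve,

$$ \overleftrightarrow{A}(\Omega)\,\vec{y} = \frac{\partial\vec{b}(\Omega)}{\partial\Omega_i} - \frac{\partial\overleftrightarrow{A}(\Omega)}{\partial\Omega_i}\vec{x}(\Omega), \tag{2.34} $$

for $\vec{y}$ once for each $\Omega_i$.

By rearranging parentheses, (2.33) is equivalent to,

$$ \frac{df(\Omega,\vec{x}(\Omega))}{d\Omega} = \frac{\partial f(\Omega,\vec{x})}{\partial\Omega} + \left(\left(\overleftrightarrow{A}(\Omega)^T\right)^{-1}\frac{\partial f(\Omega,\vec{x})}{\partial\vec{x}}\right)^T\left(\frac{\partial\vec{b}(\Omega)}{\partial\Omega} - \frac{\partial\overleftrightarrow{A}(\Omega)}{\partial\Omega}\vec{x}(\Omega)\right), \tag{2.35} $$

where we have noted that the transpose and inverse operations can be interchanged for any invertible matrix.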
Thus we can see that if we compute the solution to the following adjoint equation,

$$ \overleftrightarrow{A}(\Omega)^T\,\vec{z} = \frac{\partial f(\Omega,\vec{x})}{\partial\vec{x}}, \tag{2.36} $$

then we can compute the derivative of the objective function in a more convenient way,

$$ \frac{df(\Omega,\vec{x}(\Omega))}{d\Omega} = \frac{\partial f(\Omega,\vec{x})}{\partial\Omega} + \vec{z}^{\,T}\left(\frac{\partial\vec{b}(\Omega)}{\partial\Omega} - \frac{\partial\overleftrightarrow{A}(\Omega)}{\partial\Omega}\vec{x}(\Omega)\right). \tag{2.37} $$

This method for computing the derivative, known as the adjoint method, only requires two solutions of a linear system of size $N \times N$: (2.29) and (2.36). In general, the partial derivatives of $\vec{b}(\Omega)$ and $\overleftrightarrow{A}(\Omega)$ can be computed analytically. In this way, no approximations are made in obtaining (2.37). The power of this approach becomes apparent in high-dimensional spaces: the adjoint method requires only two solutions of such linear systems, while the forward-sensitivity method requires $N_\Omega + 1$ solutions. Approximating the derivative with a finite-difference method also requires at least $N_\Omega + 1$ solutions, depending on the size of the stencil.

The approach presented in this Section can be understood as a linear algebra trick. We want to solve a linear system for many right-hand sides, as in (2.34). Moreover, we are only interested in a specific inner product with these solutions, (2.33). As we are allowed to interchange the transpose and inverse operations, we arrive at the adjoint form (2.36). If the partial derivatives of $\overleftrightarrow{A}(\Omega)$ and $\vec{b}(\Omega)$ can be computed analytically, and the adjoint equation is solved exactly, then no approximations are made here. In this sense, we can consider the adjoint-based derivative to be the exact analytic derivative. In practice, there may be a small amount of error introduced due to the finite tolerance of the linear solve.

Computational complexity comparison
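Before counting flops, it may help to see the three approaches side by side on a small problem. The sketch below is a minimal NumPy illustration and not code from this Thesis: the affine parameterization $\overleftrightarrow{A}(\Omega) = A_0 + \sum_i \Omega_i A_i$, $\vec{b}(\Omega) = b_0 + \sum_i \Omega_i b_i$, the objective $f = \tfrac{1}{2}\vec{x}\cdot\vec{x} + \vec{c}\cdot\Omega$, and all function names are assumptions made for the example. It evaluates the gradient with the forward-sensitivity formula (2.33), the adjoint formula (2.37), and central finite differences, so the number of linear solves used by each can be read off from the code.

```python
import numpy as np


def build_problem(n=6, n_omega=4, seed=0):
    """Hypothetical affine system: A(W) = A0 + sum_i W_i dA[i], b(W) = b0 + sum_i W_i db[i]."""
    rng = np.random.default_rng(seed)
    A0 = n * np.eye(n) + 0.3 * rng.standard_normal((n, n))  # well-conditioned base matrix
    dA = 0.1 * rng.standard_normal((n_omega, n, n))         # dA/dOmega_i, constant here
    b0 = rng.standard_normal(n)
    db = rng.standard_normal((n_omega, n))                  # db/dOmega_i, constant here
    c = rng.standard_normal(n_omega)                        # explicit df/dOmega
    return A0, dA, b0, db, c


def assemble(omega, prob):
    A0, dA, b0, db, _ = prob
    return A0 + np.tensordot(omega, dA, axes=1), b0 + omega @ db


def objective(omega, prob):
    """f(Omega, x(Omega)) = 0.5 x.x + c.Omega, with x from the forward solve (2.29)."""
    A, b = assemble(omega, prob)
    x = np.linalg.solve(A, b)
    return 0.5 * x @ x + prob[4] @ omega


def grad_forward(omega, prob):
    """Forward sensitivity, (2.33)-(2.34): N_Omega + 1 linear solves."""
    _, dA, _, db, c = prob
    A, b = assemble(omega, prob)
    x = np.linalg.solve(A, b)
    grad = np.empty(len(omega))
    for i in range(len(omega)):
        y = np.linalg.solve(A, db[i] - dA[i] @ x)  # one solve per parameter
        grad[i] = c[i] + x @ y                     # df/dx = x for this objective
    return grad


def grad_adjoint(omega, prob):
    """Adjoint method, (2.36)-(2.37): two linear solves for any N_Omega."""
    _, dA, _, db, c = prob
    A, b = assemble(omega, prob)
    x = np.linalg.solve(A, b)        # forward solve (2.29)
    z = np.linalg.solve(A.T, x)      # adjoint solve (2.36), right-hand side df/dx = x
    return c + (db - dA @ x) @ z     # inner products only, no further solves


def grad_finite_difference(omega, prob, h=1e-6):
    """Central differences: 2 N_Omega forward solves, approximate."""
    grad = np.empty(len(omega))
    for i in range(len(omega)):
        step = np.zeros(len(omega))
        step[i] = h
        grad[i] = (objective(omega + step, prob) - objective(omega - step, prob)) / (2 * h)
    return grad
```

With `prob = build_problem()` and, say, `omega = np.array([0.3, -0.2, 0.5, 0.1])`, the forward-sensitivity and adjoint gradients should agree to round-off, while the finite-difference gradient agrees only to the accuracy permitted by the step size. A production code would typically factorize the matrix once and reuse the factors for the transposed solve rather than calling a dense solver twice.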

We now compare the computational complexity of the forward-sensitivity method, the finite-difference method, and the adjoint method for computing the derivative. Here we will ignore any cost associated with constructing $\overleftrightarrow{A}(\Omega)$, $\vec{b}(\Omega)$, or their derivatives. For some matrix types (e.g. sparse) the number of required operations may be reduced from what is given here, but we simply try to estimate the relative costs. The flop counts for matrix computations can be found in standard references such as [226].

Forward sensitivity | Finite difference | Adjoint
$4N_\Omega N^2 + \tfrac{2}{3}N^3$ | $\tfrac{2}{3}N_\Omega N^3$ | $2N_\Omega N^2 + \tfrac{2}{3}N^3$

Table 2.1: Approximate flop counts for the forward-sensitivity, finite-difference, and adjoint method for calculation of the derivative.

For both the forward and adjoint sensitivity methods, we must form the right-hand side of (2.34) for each $\Omega_i$, each of which requires a matrix-vector product and a vector-vector sum for a combined cost of $\approx 2N^2 + N$ flops. The forward-sensitivity method requires solving (2.34) $N_\Omega$ times. For example, an LU factorization method can be used, which requires $\approx \tfrac{2}{3}N^3$ flops. Once the factorization is known, solving the system (2.31) via backward substitution costs $\approx 2N^2$ flops for each $\Omega_i$. Once $\partial\vec{x}/\partial\Omega$ is obtained, $N_\Omega$ vector-vector products must be performed to obtain the derivatives of $f$ as in (2.33), each of which requires $2N$ flops. Thus the composite number of flops is $\approx 4N_\Omega N^2 + \tfrac{2}{3}N^3$. With a finite-difference method, the total cost of computing $\partial\vec{x}/\partial\Omega$ requires at least $\approx \tfrac{2}{3}N_\Omega N^3$ flops, assuming that the linear solve is the most expensive step and a one-sided stencil is used.

Alternatively, the adjoint method for computing the derivative requires two linear solves. If an LU factorization method is used, then the matrix factorization $\overleftrightarrow{A} = \overleftrightarrow{L}\overleftrightarrow{U}$ can be reused to solve the adjoint system (2.36), as $\overleftrightarrow{A}^T = \overleftrightarrow{U}^T\overleftrightarrow{L}^T$ where $\overleftrightarrow{U}^T$ is lower-triangular and $\overleftrightarrow{L}^T$ is upper-triangular. Thus the cost of computing the two solutions requires $\approx \tfrac{2}{3}N^3 + 4N^2$ flops.
Once the adjoint solution is obtained, $N_\Omega$ matrix-vector products and vector-vector sums must be computed in (2.37), each with cost $\approx 2N^2 + N$ flops. Again, $N_\Omega$ vector-vector products are required, each of which requires $\approx 2N$ flops. Thus the total complexity is $\approx 2N_\Omega N^2 + \tfrac{2}{3}N^3$ flops, assuming large $N$. A summary of these approximate flop counts is given in Table 2.1.

We see that the adjoint method provides modest savings over the forward-sensitivity method when $N_\Omega$ is comparable to $N$. However, for many problems the assumptions made in this Section do not apply. In particular, if $\overleftrightarrow{A}$ is sparse, $\overleftrightarrow{L}$ and $\overleftrightarrow{U}$ will generally be dense, in which case the matrix-vector multiplication that appears on the right-hand side of (2.37) will be significantly cheaper than the back substitution needed to solve (2.34), and there will be a more significant savings with the application of the adjoint method over the forward-sensitivity method. For very large matrices it may be impractical to LU factorize $\overleftrightarrow{A}$. Instead, a preconditioner may be factorized, and the linear system is solved with a Krylov subspace iterative method. Again for such systems, solving the factorized system will be significantly more expensive than matrix-vector multiplication.

In comparison with finite differences, the adjoint method offers a reduction of complexity by $O(N_\Omega)$. The accuracy of the finite-difference method depends on the size of the stencil and choice of step size. While a wider stencil provides a more accurate derivative, it increases the number of required function evaluations. The step size must also be chosen carefully to avoid the introduction of noise: a large step size will introduce nonlinearity, while a small step size will introduce round-off error. For these reasons, the adjoint method is preferable over a finite-difference method.

2.2.2 Continuous approach

The adjoint method presented in the previous Section applies only to functions that depend on the solution of a linear system in a finite-dimensional space.
We now generalize this result to obtain an adjoint equation in an infinite-dimensional space. Often in optimization, we are interested in an objective function which depends on the solution of a PDE,

$$ L(\Omega,u) = 0, \tag{2.38} $$

such as the MHD equilibrium equations (1.3). Here $L$ is some linear or nonlinear operator, and $u$ is an unknown. We are optimizing with respect to a set of parameters, $\Omega$, which may generally be infinite-dimensional; for example, $\Omega$ may describe the shape of some domain. Our differential operator may depend on these parameters. We assume that $u$ is a member of some Hilbert space, $H$, which possesses an inner product structure denoted by $\langle\,\cdot\,,\cdot\,\rangle$. If this PDE is linear, then the discretized form of this problem can generally be written as (2.29), and the adjoint equation can be obtained after discretization as described in the previous Section. The method described in this Section will allow us to obtain an adjoint equation before discretization.

We can consider $u$ to depend on $\Omega$ through the solution to (2.38). We perform linear perturbations about the base state (2.38) corresponding to perturbations of $\Omega$,

$$ \delta L(\Omega,u;\delta\Omega) + \delta L\big(\Omega,u;\delta u(\Omega;\delta\Omega)\big) = 0. \tag{2.39} $$

Our objective function, $f(\Omega,u)$, is some linear or nonlinear scalar functional of $\Omega$ and $u$. Linear perturbations of $f(\Omega,u)$ can generally be written as an inner product with $\delta u$,

$$ \delta f(\Omega,u;\delta u) = \left\langle\widetilde{f},\delta u\right\rangle. \tag{2.40} $$

This is another example of the Riesz representation theorem: as $\delta f$ is a linear functional of $\delta u$, we can express it as an inner product with $\widetilde{f} \in H$.

We are interested in computing linear perturbations to $f$ such that $u(\Omega)$ satisfies the PDE. The constrained problem is expressed through the objective function, $f(\Omega,u(\Omega))$, whose derivative with respect to $\Omega$ is computed to be,

$$ \delta f(\Omega,u(\Omega);\delta\Omega) = \delta f(\Omega,u;\delta\Omega) + \left\langle\widetilde{f},\delta u(\Omega;\delta\Omega)\right\rangle, \tag{2.41} $$

and $\delta u(\Omega;\delta\Omega)$ satisfies (2.39).
This is an analogous expression to (2.33) in the discrete linear case. Computing the derivative in this way requires many solutions of a PDE: one solution of the initial base state (2.38) and one solution of (2.39) for each perturbation of the optimization parameters, $\delta\Omega$.

A more efficient method of computing these derivatives is by application of Lagrange multipliers, enforcing (2.38) as a constraint. We now define the corresponding Lagrangian as,

$$ \mathcal{L}(\Omega,\widetilde{u},\widetilde{\lambda}) = f(\Omega,\widetilde{u}) + \left\langle\widetilde{\lambda},L(\Omega,\widetilde{u})\right\rangle, \tag{2.42} $$

where $\widetilde{\lambda} \in H$ is a Lagrange multiplier. In the above expression, $\widetilde{u} \in H$ but it does not necessarily satisfy (2.38), hence the distinction by the tilde. If $\mathcal{L}$ is stationary with respect to $\widetilde{\lambda}$, then $\widetilde{u}$ is a weak solution of the PDE, indicated by $u$. If $\mathcal{L}$ is stationary with respect to $\widetilde{u}$, then $\widetilde{\lambda}$ will satisfy the weak form of an adjoint PDE, at which point we denote $\widetilde{\lambda}$ by $\lambda$. If $\mathcal{L}$ is stationary with respect to both $\widetilde{u}$ and $\widetilde{\lambda}$, or $\widetilde{u} = u$ and $\widetilde{\lambda} = \lambda$, then derivatives of $\mathcal{L}$ with respect to $\Omega$ are equal to derivatives of $f$ with respect to $\Omega$,

$$ \delta\mathcal{L}(\Omega,\widetilde{u},\widetilde{\lambda};\delta\Omega)\Big|_{\widetilde{u}=u,\,\widetilde{\lambda}=\lambda} = \delta f(\Omega,u(\Omega);\delta\Omega). \tag{2.43} $$

We will show this directly in a moment.

We now look for a stationary point of $\mathcal{L}$ with respect to $\widetilde{u}$,

$$ \delta\mathcal{L}(\Omega,\widetilde{u},\widetilde{\lambda};\delta\widetilde{u}) = \left\langle\widetilde{f},\delta\widetilde{u}\right\rangle + \left\langle\widetilde{\lambda},\delta L(\Omega,\widetilde{u};\delta\widetilde{u})\right\rangle = 0. \tag{2.44} $$

We note that $\delta L(\Omega,\widetilde{u};\delta\widetilde{u})$ is a linear functional of $\delta\widetilde{u}$, so we can write this schematically as,

$$ \delta L(\Omega,\widetilde{u};\delta\widetilde{u}) = \hat{L}(\Omega,\widetilde{u})\,\delta\widetilde{u}, \tag{2.45} $$

where $\hat{L}(\Omega,\widetilde{u})$ is a linear operator. The adjoint of an operator $A$, which we denote by $A^\dagger$, is defined by $\langle Ay,x\rangle = \langle y,A^\dagger x\rangle$ for $x,y \in H$.
Thus we can rewrite the above as,

$$ \delta\mathcal{L}(\Omega,\widetilde{u},\widetilde{\lambda};\delta\widetilde{u}) = \left\langle\widetilde{f} + \hat{L}(\Omega,\widetilde{u})^\dagger\,\widetilde{\lambda},\delta\widetilde{u}\right\rangle = 0. \tag{2.46} $$

This is a weak form of the adjoint PDE,

$$ \widetilde{f} + \hat{L}(\Omega,\widetilde{u})^\dagger\lambda = 0. \tag{2.47} $$

We indicate its solution by $\lambda$, as it corresponds with a stationary point of $\mathcal{L}$ with respect to $\widetilde{u}$. We now see that if $\widetilde{u}$ satisfies (2.38) and $\widetilde{\lambda}$ satisfies (2.47), then derivatives of $f$ with respect to $\Omega$ are equal to derivatives of $\mathcal{L}$ with respect to $\Omega$,

$$ \delta\mathcal{L}(\Omega,\widetilde{u},\widetilde{\lambda};\delta\Omega)\Big|_{\widetilde{u}=u,\,\widetilde{\lambda}=\lambda} = \delta f(\Omega,u;\delta\Omega) + \big\langle\lambda,\delta L(\Omega,u;\delta\Omega)\big\rangle = \delta f(\Omega,u;\delta\Omega) - \big\langle\lambda,\delta L\big(\Omega,u;\delta u(\Omega;\delta\Omega)\big)\big\rangle, \tag{2.48} $$

where we have used (2.39). If we now apply the adjoint condition and enforce that $\lambda$ satisfy the adjoint PDE (2.47), so that $-\langle\lambda,\hat{L}\,\delta u\rangle = -\langle\hat{L}^\dagger\lambda,\delta u\rangle = \langle\widetilde{f},\delta u\rangle$, then we indeed obtain (2.41), as desired.

The adjoint method for computing the derivative of $f$ with respect to the parameters $\Omega$ is,

$$ \delta f(\Omega,u(\Omega);\delta\Omega) = \delta\mathcal{L}(\Omega,\widetilde{u},\widetilde{\lambda};\delta\Omega)\Big|_{\widetilde{u}=u,\,\widetilde{\lambda}=\lambda} = \delta f(\Omega,u;\delta\Omega) + \big\langle\lambda,\delta L(\Omega,u;\delta\Omega)\big\rangle. \tag{2.49} $$

This is the continuous analogue of (2.37). The first term corresponds with the explicit dependence of $f$ on $\Omega$, while the second term corresponds with the dependence through $u$. Note that, if (2.38) is satisfied, then we can choose $\lambda$ to be whatever we would like, as the second term in the Lagrangian functional (2.42) will always vanish. For some problems, other choices for $\lambda$ may be convenient, although (2.49) will no longer hold. In Chapter 5, a slightly different choice for the adjoint variable will be made. Rather than being a stationary point, boundary terms remain in the expression for $\delta\mathcal{L}(\Omega,u,\lambda;\delta u)$ (see (5.42)-(5.43) and (5.52)-(5.53)).

In practice, the infinite-dimensional optimization space may be approximated by a discrete set of parameters, $\Omega = \{\Omega_i\}_{i=1}^{N_\Omega}$.
Thus with the solution of only two PDEs, the forward (2.38) and adjoint (2.47) problems, we obtain the derivative of our objective function with respect to an arbitrary number of parameters. An alternative is the forward-sensitivity method, using (2.39) and (2.41), which requires $N_\Omega$ linear PDE solutions and one (possibly) nonlinear PDE solution, (2.38).

The finite-difference method requires at least $N_\Omega + 1$ (possibly) nonlinear PDE solutions, depending on the size of the stencil. Thus the adjoint method provides a significant advantage when $N_\Omega$ is large, assuming that the PDE solve is expensive in comparison with other operations, such as performing the inner products. It is not straightforward to compare the complexity of these methods as in Section 2.2.1, as the flop count will depend on the numerical methods used to solve a PDE. However, we can see that the adjoint method provides a reduction in the number of required PDE solves by $O(N_\Omega)$ over both the forward-sensitivity and finite-difference methods.

Of course, both the forward and adjoint PDEs are typically solved numerically by approximation in a finite-dimensional space. The accuracy of the derivative computed with the adjoint method will, therefore, depend on the tolerance to which the base state and adjoint PDEs are solved in addition to the discrepancy between the infinite-dimensional inner product and its finite-dimensional approximation.

We now see that there are two general strategies to the application of the adjoint method: obtaining the adjoint before discretization, the continuous adjoint approach, or obtaining the adjoint after discretization, the discrete approach. There are relative merits to each. With the discrete adjoint method, the accuracy of the derivative only depends on the tolerance to which the forward and adjoint systems are solved.
On the other hand, with the continuous method, it also depends on the discretization error of the PDE due to the difference between the infinite-dimensional inner product and its finite-dimensional approximation. The two approaches must agree in the limit of infinite resolution. In practice, the difference between the two is relatively small, though it has been suggested that the discrepancy between the continuous and discrete gradients may become important near a local minimum [47], where the gradient obtained from the continuous approach may not be a descent direction of the discretized problem.

The continuous approach offers the advantage that the adjoint equation can be derived independently of the choice of discretization; thus, if the adjoint equation has a significantly different structure from the forward equation, a distinct discretization scheme can be applied. It also may offer further insight into the structure of the adjoint equations and their boundary conditions. For this reason, the continuous approach may be preferable in the presence of shocks or singularities [79], as we demonstrate in Chapter 6. For both approaches, the resulting adjoint equation is linear. Implementation of the discrete method is sometimes more straightforward, as the adjoint and forward operators have the same eigenvalues, so the same numerical linear algebra methods can typically be used to solve both problems. As we will see in Chapter 4, if an LU factorization method is used to solve the linear system, then the factorization of the matrix or its preconditioner can be reused to solve the discrete adjoint problem. There is not a clear consensus in the literature as to which approach is preferable, and the choice usually depends on the application of interest.

With an adjoint method, optimization within a high-dimensional space is no longer a significant challenge.
An adjoint-based derivative provides a reduction of computational complexity over finite differences by a factor of approximately the optimization dimension, $N_\Omega$, as summarized in Table 2.1. Given that the cost of computing the gradient becomes comparable to the cost of the forward solve, we can easily take advantage of gradient-based optimization methods. For line-search gradient-based methods, each iteration reduces to a one-dimensional line search once a descent direction is identified [170]. Therefore with adjoint methods, high-dimensional, non-convex optimization becomes feasible, allowing us to address objectives 2 and 3 from Section 1.4.4.

In the following Chapters, we will demonstrate the application of shape calculus and adjoint methods for several problems arising in stellarator optimization. In Chapter 3 we describe a discrete adjoint method for the optimization of coil shapes based on the current potential method described in Section 1.4.3. With the derivatives obtained from the adjoint method, we compute a shape gradient with respect to perturbations of the coil-winding surface, allowing us to identify regions where figures of merit become sensitive to coil perturbations. In Chapter 4, we compare a continuous and discrete adjoint method for computing geometric derivatives of several neoclassical quantities. These geometric derivatives allow us to compute a sensitivity function for local magnetic field strength perturbations that is analogous to the shape gradient. In Chapter 5, we describe a continuous adjoint method for computing the shape gradient of quantities that depend on MHD equilibrium solutions. These shape gradients can be used for equilibrium optimization of the plasma boundary or coil shapes and sensitivity analysis. For this application, the adjoint equation contains singular behavior, so a distinct discretization and solution scheme are required, discussed in Chapter 6.

Chapter 3

Adjoint winding surface optimization

In this Chapter, we apply the linear adjoint approach described in Section 2.2.1 for the optimization of coil shapes. We assume that coils are confined to a winding surface using the current potential method introduced in Section 1.4.3. The application of the adjoint method will allow us to efficiently optimize in the space of the geometry of the coil-winding surface and study the sensitivity to local perturbations using the shape gradient.

The material in this Chapter has been adapted from [185] with permission.

In the traditional stellarator optimization method, coils are designed to produce a target outer plasma boundary. The plasma boundary is separately optimized for various physics quantities, including magnetohydrodynamic (MHD) stability, neoclassical confinement, and profiles of rotational transform and pressure [175]. The coil shapes are then optimized such that one of the magnetic surfaces approximately matches the desired plasma surface. In general, the desired plasma configuration cannot be produced exactly due to engineering constraints on the coil complexity. Additional difficulty is introduced by the ill-posedness of solving Laplace's equation numerically in the vacuum region for a prescribed normal magnetic field on the plasma boundary [25, 158].

In addition to the minimization of the magnetic field error, several factors should be considered in the design of coil shapes. The winding surface upon which the currents lie should be sufficiently separated from the plasma surface to allow for neutron shielding to protect the coils, the vacuum vessel, and a divertor system. In a reactor, the coil-plasma distance is closely tied to the tritium-breeding ratio and overall cost of electricity, as it determines the allowable blanket thickness. The coil-plasma distance was targeted in the ARIES-CS study to reduce machine size [60]. In practice, the minimum feasible coil-plasma separation is a function of the desired plasma shape. Concave regions (such as the bean-shaped W7-X cross-section) are especially challenging to produce [137] and require the winding surface to be near the plasma surface. While decreasing the inter-coil spacing minimizes ripple fields, increasing coil-coil spacing allows adequate space for removal of blanket modules, heat transport plumbing, diagnostics, and support structures. The curvature of a coil should be below a certain threshold to allow for the finite thickness of the conducting material and to avoid prohibitively high manufacturing costs.
The length of each coil should also be considered, as the expense will grow with the amount of conducting material that needs to be produced. For these reasons, identifying coils with suitable engineering properties can impact the size and cost of a stellarator device.

Most coil design codes have assumed the coils to lie on a closed toroidal winding surface enclosing the desired plasma surface. In NESCOIL [158], the currents on this surface are determined by minimizing the integral-squared normal magnetic field on the target plasma surface. The current density is computed using a stream function approach, where the current potential on the winding surface is decomposed in Fourier harmonics. The optimization takes the form of a least-squares problem that can be solved with the solution of a single linear system. The coil filament shapes are then obtained from the contours of the current potential. Because it is guaranteed to find a global minimum, NESCOIL is often used in the preliminary stages of the design process [57, 135, 212]. NESCOIL was used for the initial coil configuration studies for NCSX [194], and the W7-X coils were designed using an extension of NESCOIL, which modified the winding surface geometry for quality of magnetic surfaces and engineering properties of the coils [15]. However, the inversion of the Biot-Savart integral by NESCOIL is fundamentally ill-posed, resulting in solutions with amplified noise. The REGCOIL [136] approach addresses this problem with Tikhonov regularization. Here the surface-average-squared current density, corresponding to the squared-inverse distance between coils, is added to the objective function. With the addition of this regularization term, REGCOIL can simultaneously increase the minimum coil-coil distances and improve the reconstruction of the desired plasma surface over NESCOIL solutions.
In this Chapter, we build on the REGCOIL method to optimize the current distribution in three dimensions. The current distribution on a single winding surface is computed with REGCOIL, and the winding surface geometry is optimized to reproduce the plasma surface with fidelity and improve the engineering properties of the coil shapes.

Other nonlinear coil optimization tools exist which evolve discrete coil shapes rather than continuous surface current distributions. Drevlak's ONSET code [154] optimizes coils within limiting inner and outer coil surfaces. The COILOPT [216, 218] code, developed for the design of the NCSX coil set [242], optimizes coil filaments on a winding surface which is allowed to vary. COILOPT++ [32] improved upon COILOPT by defining coils using splines, which enables one to straighten modular coils to improve access to the plasma. The need for a winding surface was eliminated with the FOCUS [243] code, which represents coils as three-dimensional space curves. The FOCUS approach employs analytic differentiation for gradient-based optimization, as we do in this Chapter. As the design of optimal coils is central to the development of an economical stellarator, it is important to have several approaches. The current potential method could have several advantages, including the possible implementation of adjoint methods. Furthermore, the complexity of the nonlinear optimization is reduced over other approaches, as the current distribution on the winding surface is efficiently and robustly computed by solving a linear system. By optimizing the winding surface, it is possible to gain insight into what features of plasma surfaces require coils to be close to the plasma, and what features allow coils to be placed farther away [137].

Parallels can be drawn between the design of stellarator coils and the design of magnetic resonance imaging (MRI) coils.
MRI gradient coils which lie on a cylindrical winding surface must provide a specified spatial variation in the magnetic field within a region of interest. This inverse problem is often solved with a linear least-squares system by minimizing the squared departure from the desired field at specified points with respect to the current in differential surface elements [228]. This method is comparable to the NESCOIL [158] approach for stellarator coil design. Gradient coil design was improved by the addition of a regularization term related to the integral-squared current density [63] or the integral-squared curvature [62], comparable to the REGCOIL approach. The adjoint method has been applied to compute the sensitivity of an objective function with respect to the current potential on the MRI winding surface. Here the Biot-Savart law is written in terms of a matrix equation using the least-squares finite element method, and the adjoint of this matrix is inverted to compute the derivatives [124]. As the adjoint formalism has proven fruitful in this field, we anticipate that it could have similar applications in the closely-related field of stellarator coil design.

In the Sections that follow, we present a new method for the design of the coil-winding surface using adjoint-based optimization. An adjoint solve is performed to obtain gradients of several figures of merit, the integral-squared normal magnetic field on the plasma surface and root-mean-squared current density on the winding surface, with respect to the Fourier components describing the coil surface. A brief overview of the REGCOIL approach is given in Section 3.2. The optimization method and objective function are described in Section 3.3. The adjoint method for computing gradients of the objective function is outlined in Section 3.4. Optimization results for the W7-X and HSX winding surfaces are presented in Section 3.5.
In Section 3.6 we demonstrate a method for computing local sensitivity of figures of merit to perturbations of the winding surface using the shape gradient. We discuss properties of optimized winding surface configurations in Section 3.7. In Section 3.8 we summarize our results and conclude.

First, we review the problem of determining coil shapes once the plasma boundary and coil-winding surface have been specified. Given the winding surface geometry, our task is to obtain the surface current density, $\mathbf{J}$. The divergence-free surface current density can be related to a scalar current potential $\Phi$, the stream function for $\mathbf{J}$,

$$\mathbf{J} = \hat{\mathbf{n}} \times \nabla \Phi. \tag{3.1}$$

Here $\hat{\mathbf{n}}$ is the unit normal on the winding surface. The current potential $\Phi$ can be decomposed into single-valued and secular terms,

$$\Phi(\theta, \phi) = \Phi_{\text{sv}}(\theta, \phi) + \frac{G\phi}{2\pi} + \frac{I\theta}{2\pi}. \tag{3.2}$$

Here $\phi$ is the cylindrical azimuthal angle and $\theta$ is a poloidal angle. The quantities $G$ and $I$ are the currents linking the surface poloidally and toroidally, respectively. The single-valued term ($\Phi_{\text{sv}}$) is determined by solving the REGCOIL system. It is chosen to minimize the primary objective function,

$$\chi^2 = \chi^2_B + \lambda \chi^2_J. \tag{3.3}$$

Here $\chi^2_B$ is the surface-integrated-squared normal magnetic field on the desired plasma surface,

$$\chi^2_B = \int_{S_P} d^2x \, (\mathbf{B} \cdot \hat{\mathbf{n}})^2. \tag{3.4}$$

The normal component of the magnetic field on the plasma surface, $\mathbf{B} \cdot \hat{\mathbf{n}}$, includes contributions from currents in the plasma, current density $\mathbf{J}$ on the winding surface, and currents in other external coils. The quantity $\chi^2_J$ is the surface-integrated-squared current density on the winding surface,

$$\chi^2_J = \int_{S_{\text{coil}}} d^2x \, |\mathbf{J}|^2. \tag{3.5}$$

As discussed in Section 1.4.3, minimization of $\chi^2_B$ by itself ($\lambda = 0$) is fundamentally ill-posed, as very different coil shapes can provide almost identical normal field on the plasma surface. (Oppositely directed currents cancel in the Biot-Savart integral.) The addition of $\chi^2_J$ to the objective function is a form of Tikhonov regularization. As we will show, minimization of $\chi^2_J$ also simplifies coil shapes. While the NESCOIL formulation relies on Fourier series truncation for regularization, the formulation in REGCOIL allows for finer control of regularization while improving engineering properties of the coil set.
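The ill-posedness described above, and the effect of the regularization term in (3.3), can be illustrated with a two-parameter least-squares problem. The sketch below is not the REGCOIL system: the matrix `rows`, the data `target`, and the helper names are hypothetical, chosen so that the two columns are nearly identical, mimicking two current distributions that produce almost the same normal field. It solves the normal equations $(M^T M + \lambda I)\,x = M^T t$ directly.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


def regularized_fit(rows, target, lam):
    """Minimize sum_i (row_i . x - target_i)^2 + lam * |x|^2 for two unknowns."""
    # Normal equations: (M^T M + lam * I) x = M^T t
    a11 = sum(r[0] * r[0] for r in rows) + lam
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows) + lam
    b1 = sum(r[0] * t for r, t in zip(rows, target))
    b2 = sum(r[1] * t for r, t in zip(rows, target))
    return solve_2x2(a11, a12, a12, a22, b1, b2)


# Hypothetical data: two nearly parallel columns and a slightly noisy target.
rows = [(1.0, 1.0), (1.0, 1.001), (1.0, 0.999)]
target = [1.0, 1.01, 1.0]

x_unreg = regularized_fit(rows, target, 0.0)   # lambda = 0: ill-conditioned
x_reg = regularized_fit(rows, target, 1e-3)    # small Tikhonov term

print("lambda = 0    :", x_unreg)
print("lambda = 1e-3 :", x_reg)
```

Without regularization the two amplitudes are large and of opposite sign (roughly $-4$ and $5$), cancelling each other to fit the noise. With a small $\lambda$ they settle at nearly equal values close to $0.5$, at the cost of a slightly larger residual.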
The regularization parameter $\lambda$ can be chosen to obtain a target maximum current density $J_{\max}$, corresponding to a minimum tolerable inter-coil spacing. A 1D nonlinear root finding algorithm is typically used for this process.

The single-valued part of the current potential $\Phi_{\text{sv}}$ is represented using a finite Fourier series,

$$\Phi_{\text{sv}}(\theta, \phi) = \sum_{m,n} \Phi_{m,n} \sin(m\theta - n N_P \phi), \tag{3.6}$$

where $N_P$ is the number of periods. Only a sine series is needed if stellarator symmetry is imposed on the current density ($J(-\theta, -\phi) = J(\theta, \phi)$). As the minimization of $\chi^2$ with respect to $\Phi_{m,n}$ is a linear least-squares problem, it can be solved via the normal equations to obtain a unique solution. The Fourier amplitudes $\Phi_{m,n}$ are determined by the minimization of $\chi^2$,

$$\frac{\partial \chi^2}{\partial \Phi_{m,n}} = \frac{\partial \chi^2_B}{\partial \Phi_{m,n}} + \lambda \frac{\partial \chi^2_J}{\partial \Phi_{m,n}} = 0, \tag{3.7}$$

which takes the form of a linear system,

$$\sum_{m,n} A_{m',n';m,n} \Phi_{m,n} = b_{m',n'}. \tag{3.8}$$

We will use the notation $\overleftrightarrow{\boldsymbol{A}} \, \overrightarrow{\boldsymbol{\Phi}} = \overrightarrow{\boldsymbol{b}}$. Throughout, bold-faced type with a right-facing arrow will denote a vector in the space of basis functions for $\Phi_{\text{sv}}$ unless otherwise noted. For additional details see [136].

We use REGCOIL to compute the distribution of current on a fixed, two-dimensional winding surface. To design coil shapes in three-dimensional space, we modify the winding surface geometry by minimizing an objective function (3.10). This objective function quantifies fundamental physics and engineering properties and is easy to calculate from the REGCOIL solution. Optimal coil geometries are obtained by nonlinear, constrained optimization.

The cylindrical components of the winding surface are decomposed in Fourier harmonics,

$$R = \sum_{m,n} R^c_{m,n} \cos(m\theta + n N_p \phi), \tag{3.9a}$$

$$Z = \sum_{m,n} Z^s_{m,n} \sin(m\theta + n N_p \phi), \tag{3.9b}$$

where stellarator symmetry of the winding surface is assumed ($R(-\theta, -\phi) = R(\theta, \phi)$ and $Z(-\theta, -\phi) = -Z(\theta, \phi)$).
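As an illustration of the representation (3.9), the following sketch evaluates $R$ and $Z$ from a set of Fourier coefficients and checks the stellarator-symmetry relations numerically. The coefficient values, the dictionary layout keyed by $(m, n)$, the choice of five periods, and the function name are assumptions made for this example; they are not REGCOIL data structures.

```python
import math


def surface_point(rc, zs, theta, phi, nfp):
    """Evaluate (R, Z) from cosine/sine Fourier coefficients keyed by (m, n)."""
    R = sum(c * math.cos(m * theta + n * nfp * phi) for (m, n), c in rc.items())
    Z = sum(c * math.sin(m * theta + n * nfp * phi) for (m, n), c in zs.items())
    return R, Z


# Hypothetical coefficients for a simple rotating-ellipse-like surface.
nfp = 5
rc = {(0, 0): 5.5, (1, 0): 1.0, (1, 1): 0.2}
zs = {(1, 0): 1.0, (1, 1): -0.2}

theta, phi = 0.7, 0.3
R, Z = surface_point(rc, zs, theta, phi, nfp)
R_m, Z_m = surface_point(rc, zs, -theta, -phi, nfp)

# Stellarator symmetry: R is even and Z is odd under (theta, phi) -> (-theta, -phi).
print(abs(R - R_m), abs(Z + Z_m))
```

Because only cosine terms appear in $R$ and only sine terms in $Z$, the symmetry holds for any choice of coefficients, so the optimizer can vary them freely without breaking it.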
We take the Fourier components of the winding surface, $\Omega = \{R^c_{m,n}, Z^s_{m,n}\}$, as our optimization parameters and assume that the desired plasma surface is held fixed. Throughout, $\Omega$ displayed with a subscript index will refer to a single Fourier component, while in the absence of a subscript, it refers to the set of Fourier components. For a given winding surface geometry, $\Omega$, and desired plasma surface, the current potential $\Phi(\Omega)$ can be determined by solving the REGCOIL system to obtain a solution which both reproduces the desired plasma surface with fidelity and maximizes coil-coil distance, as described in Section 3.2.

We define an objective function, $f$, which will be minimized with respect to $\Omega$,

$$f(\Omega, \overrightarrow{\boldsymbol{\Phi}}(\Omega)) = \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}}(\Omega)) - \alpha_V V_{\text{coil}}^{1/3}(\Omega) + \alpha_S S_p(\Omega) + \alpha_J \|\mathbf{J}\|_2(\Omega, \overrightarrow{\boldsymbol{\Phi}}(\Omega)). \tag{3.10}$$

The coefficients $\alpha_V$, $\alpha_S$, and $\alpha_J$ are positive constants that weigh the relative importance of the terms in $f$. We take $\chi^2_B$ (3.4) as our proxy for the desired physics properties of the plasma surface. The normal magnetic field depends on $\overrightarrow{\boldsymbol{\Phi}}$, the single-valued current potential on the surface, and $\Omega$, the geometric properties of the coil-winding surface. The quantity $V_{\text{coil}}$ is the total volume enclosed by the coil-winding surface,

$$V_{\text{coil}} = \int_{V_{\text{coil}}} d^3x. \tag{3.11}$$

The adjoint method and winding-surface optimization tools are implemented in the main branch of the REGCOIL code https://github.com/landreman/regcoil.

We use $V_{\text{coil}}^{1/3}$ as a proxy for the coil-plasma separation. Our objective function decreases with increasing $V_{\text{coil}}$, as we desire a winding surface which allows for increased coil-plasma separation. This minimizes coil ripple and provides increased access for neutral beams and diagnostics. We recognize that increasing $V_{\text{coil}}$ implies increased coil length and experiment size, which may not always be desired.

The quantity $S_p$ is a measure of the spectral width of the Fourier series describing the coil-winding surface [110],

$$S_p = \sum_{m,n} m^p \left( (R^c_{m,n})^2 + (Z^s_{m,n})^2 \right). \tag{3.12}$$

Smaller values of $S_p$ correspond to Fourier spectra which decay rapidly with increasing $m$. We take advantage of the non-uniqueness of the representation in (3.9) to obtain surface parameterizations which are more efficient. As $\chi^2_B$, $\|\mathbf{J}\|_2$, and $V_{\text{coil}}$ are coordinate-independent, these terms remain unchanged if the surface is reparameterized ($\theta$ is redefined). Minimization of $S_p$ removes this zero-gradient direction in parameter space. We use a typical value of $p = 2$. One could also remove the redundancy in the definition of $\theta$ by using the unique and spectrally condensed representation of Hirshman and Breslau [109] or by solving the nonlinear constraint equation of Hirshman and Meier [110] once the optimal surface has been obtained.

The quantity $\|\mathbf{J}\|_2 = \sqrt{\chi^2_J / A_{\text{coil}}}$ is the 2-norm of the current density, where $A_{\text{coil}}$ is the winding surface area,

$$A_{\text{coil}} = \int_{S_{\text{coil}}} d^2x. \tag{3.13}$$

Although we are using a current potential approach rather than directly optimizing coil shapes, including $\|\mathbf{J}\|_2$ in the objective function allows us to obtain coils with good engineering properties. Derivatives of coil-specific metrics (such as curvature) could be computed from the current potential if desired. For example, consider $N$ contours beginning at equally-spaced toroidal angles $\phi_i$ and $\theta = 0$.
The $i$th contour is defined by functions $\theta_i(s)$ and $\phi_i(s)$ for parameter $s$, where $\partial \Phi / \partial s = 0$. The derivatives of coil metrics which depend on $\mathbf{x}(\theta_i(s), \phi_i(s))$ could be computed with the adjoint method which will be described in Section 3.4. As the direct targeting of coil metrics introduces additional arbitrary weights in the objective function and the solution to another adjoint equation must be obtained to compute its gradient, we instead include $\|\mathbf{J}\|_2$ in our objective function.

To demonstrate this correlation between $\|\mathbf{J}\|_2$ and coil shape complexity, we compute the coil set on the actual W7-X winding surface using REGCOIL. The regularization parameter $\lambda$ is varied to achieve several values of $\|\mathbf{J}\|_2$. Coil shapes are obtained from the contours of $\Phi$. In Figure 3.1, two of the W7-X non-planar coils computed in this way are shown, and the corresponding coil metrics are given in Table 3.1. (These correspond to the two leftmost coils in Figure 3.5.) We consider the average and maximum length $l$, toroidal extent $\Delta\phi$, curvature $\kappa$, and the minimum coil-coil distance $d^{\min}_{\text{coil-coil}}$. The average, maximum, and minimum are taken over the set of 5 unique coils. The coil shapes become more complex as $\|\mathbf{J}\|_2$ increases, quantified by increasing $\kappa$ and $\Delta\phi$ and decreasing $d^{\min}_{\text{coil-coil}}$. Here the curvature, $\kappa$, of a three-dimensional parameterized curve, $\mathbf{x}(t)$, is

$$\kappa = \frac{|\mathbf{x}'(t) \times \mathbf{x}''(t)|}{|\mathbf{x}'(t)|^3}. \tag{3.14}$$

Figure 3.1 (panels (a), (b), (c)): Two non-planar W7-X coils (corresponding to the two leftmost coils in Figure 3.5) computed with REGCOIL using the actual W7-X winding surface. The regularization parameter $\lambda$ is chosen to achieve the shown values of $\|\mathbf{J}\|_2$. As $\|\mathbf{J}\|_2$ increases, the average length, toroidal extent, and curvature increase. Figure adapted from [185] with permission.

We have compared coil shapes on a single winding surface, finding them to become simpler as $\|\mathbf{J}\|_2$ decreases. As $\|\mathbf{J}\|_2 = (\chi^2_J / A_{\text{coil}})^{1/2}$, we would find similar trends with $\chi^2_J$. We have chosen to include $\|\mathbf{J}\|_2$ in the objective function as it is normalized by $A_{\text{coil}}$, so it is a more useful quantity for comparison of coil shapes on different winding surfaces.

To minimize $f$, the relative weights in (3.10) ($\alpha_V$, $\alpha_S$, and $\alpha_J$) are chosen such that each of the terms in the objective function has a similar magnitude, though much tuning of these parameters is required to obtain results which simultaneously improve the physics properties (decrease $\chi^2_B$) and engineering properties (increase $V_{\text{coil}}$ and $d^{\min}_{\text{coil-coil}}$, decrease $\kappa$ and $\Delta\phi$). Minimization of $f$ is performed subject to the inequality constraint $d_{\min} \ge d^{\text{target}}_{\min}$. Here $d_{\min}$ is the minimum distance between the coil-winding surface and the plasma surface,

$$d_{\min} = \min_{\theta,\phi} \left( d_{\text{coil-plasma}} \right) = \min_{\theta,\phi} \left( \min_{\theta_p,\phi_p} |\mathbf{x}_C - \mathbf{x}_P| \right), \tag{3.15}$$

and $d^{\text{target}}_{\min}$ is the minimum tolerable coil-plasma separation. The quantities $\theta_p$ and $\phi_p$ are poloidal and toroidal angles on the plasma surface, $\mathbf{x}_P$ and $\mathbf{x}_C$ are the position vectors on the plasma and winding surface, and $d_{\text{coil-plasma}}$ is the coil-plasma distance as a function of $\theta$ and $\phi$.

Table 3.1: Comparison of metrics for coils computed with REGCOIL using the actual W7-X winding surface. Average and max are evaluated for the set of 5 unique coils. The regularization parameter $\lambda$ is varied to achieve these values of $\|\mathbf{J}\|_2$. Table adapted from [185] with permission.

    $\|\mathbf{J}\|_2$ [MA/m]                2.20     2.70                           3.20
    $J_{\max}$ [MA/m]                        4.55     9.50                           29.1
    $\chi^2_B$ [T$^2$ m$^2$]                 1.89     $5.\ldots \times 10^{-\ldots}$   $\ldots \times 10^{-\ldots}$
    Average $l$ [m]                          8.03     9.18                           9.81
    Max $l$ [m]                              8.26     10.5                           11.8
    Average $\Delta\phi$ [rad.]              0.146    0.222                          0.253
    Max $\Delta\phi$ [rad.]                  0.161    0.282                          0.372
    Average $\kappa$ [m$^{-1}$]              1.04     1.29                           1.32
    Max $\kappa$ [m$^{-1}$]                  2.54     20.3                           56.1
    $d^{\min}_{\text{coil-coil}}$ [m]        0.353    0.182                          0.0758

The maximum current density $J_{\max}$ is also constrained,

$$J_{\max} = \max_{\theta,\phi} J. \tag{3.16}$$

This roughly corresponds to a fixed minimum coil-coil spacing. This constraint is enforced by fixing $J_{\max}$ to obtain the regularization parameter $\lambda$ in the REGCOIL solve, so we avoid the need for an equality constraint or the inclusion of $J_{\max}$ in the objective function. Rather, $\overrightarrow{\boldsymbol{\Phi}}(\Omega)$ is determined such that $J_{\max}$ is fixed. The inequality-constrained nonlinear optimization is performed using the NLOPT [125] software package with a conservative convex separable quadratic approximation (CCSAQ) [224]. While there are several gradient-based inequality-constrained algorithms available, we choose to use CCSAQ as it is relatively insensitive to the bound constraints imposed on the optimization parameters. We recognize that there are many possible combinations of constraints, objective functions, and regularization conditions that could be used. For example, $\|\mathbf{J}\|_2$ could be fixed to determine $\lambda$ while $J_{\max}$ could be included in the objective function. We found that the formulation we have presented produces the best coil shapes.

f and the adjoint method

We must compute derivatives of $f$ with respect to the geometric parameters $\Omega$ in order to use gradient-based optimization methods. The spectral width $S_p$ and the volume $V_{\text{coil}}$ are explicit functions of $\Omega$, so their analytic derivatives can be obtained. On the other hand, $\chi^2_B$ and $\|\mathbf{J}\|_2$ depend both explicitly on coil geometry and on $\overrightarrow{\boldsymbol{\Phi}}(\Omega)$. One approach to obtain the derivatives of these quantities could be to solve the REGCOIL linear system $N_\Omega + 1$ times, taking a finite-difference step in each Fourier coefficient. However, if $N_\Omega$ is large, the computational cost of this method could be prohibitively expensive. Instead, we will apply the adjoint method to compute derivatives.
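This technique will be demonstrated below.

The derivative of $\chi^2_B$ can be computed using the chain rule,

$$\frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}}(\Omega))}{\partial \Omega_{m,n}} = \frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}})}{\partial \Omega_{m,n}} + \frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}})}{\partial \overrightarrow{\boldsymbol{\Phi}}} \cdot \frac{\partial \overrightarrow{\boldsymbol{\Phi}}(\Omega)}{\partial \Omega_{m,n}}, \tag{3.17}$$

where $\overrightarrow{\boldsymbol{\Phi}}(\Omega)$ is understood to vary with $\Omega$ such that (3.8) is satisfied. The dot product is a contraction over the current potential basis functions, $\{\Phi_{m,n}\}$. We can compute $\partial \overrightarrow{\boldsymbol{\Phi}}(\Omega) / \partial \Omega_{m,n}$ by differentiating the linear system (3.8) with respect to $\Omega_{m,n}$,

$$\frac{\partial \overleftrightarrow{\boldsymbol{A}}(\Omega)}{\partial \Omega_{m,n}} \overrightarrow{\boldsymbol{\Phi}} + \overleftrightarrow{\boldsymbol{A}} \frac{\partial \overrightarrow{\boldsymbol{\Phi}}(\Omega)}{\partial \Omega_{m,n}} = \frac{\partial \overrightarrow{\boldsymbol{b}}(\Omega)}{\partial \Omega_{m,n}}, \tag{3.18}$$

and formally solving this equation to obtain,

$$\frac{\partial \overrightarrow{\boldsymbol{\Phi}}(\Omega)}{\partial \Omega_{m,n}} = \overleftrightarrow{\boldsymbol{A}}^{-1} \left( \frac{\partial \overrightarrow{\boldsymbol{b}}(\Omega)}{\partial \Omega_{m,n}} - \frac{\partial \overleftrightarrow{\boldsymbol{A}}(\Omega)}{\partial \Omega_{m,n}} \overrightarrow{\boldsymbol{\Phi}} \right). \tag{3.19}$$

Equation (3.19) is inserted into (3.17),

$$\frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}}(\Omega))}{\partial \Omega_{m,n}} = \frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}})}{\partial \Omega_{m,n}} + \frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}})}{\partial \overrightarrow{\boldsymbol{\Phi}}} \cdot \left[ \overleftrightarrow{\boldsymbol{A}}^{-1} \left( \frac{\partial \overrightarrow{\boldsymbol{b}}(\Omega)}{\partial \Omega_{m,n}} - \frac{\partial \overleftrightarrow{\boldsymbol{A}}(\Omega)}{\partial \Omega_{m,n}} \overrightarrow{\boldsymbol{\Phi}} \right) \right]. \tag{3.20}$$

This expression could be evaluated by solving the linear system (3.18) for $\partial \overrightarrow{\boldsymbol{\Phi}} / \partial \Omega_{m,n}$ and performing the inner product with $\partial \chi^2_B / \partial \overrightarrow{\boldsymbol{\Phi}}$. However, the computational cost of this method scales similarly to that of finite differencing, as described in Section 2.2.1. Instead, we can exploit the adjoint property of the operator to obtain,

$$\frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}}(\Omega))}{\partial \Omega_{m,n}} = \frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}})}{\partial \Omega_{m,n}} + \left[ \left( \overleftrightarrow{\boldsymbol{A}}^{-1} \right)^T \frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}})}{\partial \overrightarrow{\boldsymbol{\Phi}}} \right] \cdot \left( \frac{\partial \overrightarrow{\boldsymbol{b}}(\Omega)}{\partial \Omega_{m,n}} - \frac{\partial \overleftrightarrow{\boldsymbol{A}}(\Omega)}{\partial \Omega_{m,n}} \overrightarrow{\boldsymbol{\Phi}} \right). \tag{3.21}$$

For any invertible matrix, $\left( \overleftrightarrow{\boldsymbol{A}}^{-1} \right)^T = \left( \overleftrightarrow{\boldsymbol{A}}^T \right)^{-1}$. Hence we can instead solve a linear system involving the matrix $\overleftrightarrow{\boldsymbol{A}}^T$ to compute an adjoint variable $\overrightarrow{\boldsymbol{q}}$, defined as the solution of

$$\overleftrightarrow{\boldsymbol{A}}^T \overrightarrow{\boldsymbol{q}} = \frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}})}{\partial \overrightarrow{\boldsymbol{\Phi}}}. \tag{3.22}$$

Rather than compute a finite-difference derivative for each $\Omega_{m,n}$ or solve a linear system to compute each $\partial \overrightarrow{\boldsymbol{\Phi}} / \partial \Omega_{m,n}$ as in (3.19), we solve two linear systems: the forward (3.8) and adjoint (3.22).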
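The chain of steps (3.17) to (3.22) can be checked on a small system. The sketch below is a toy, not the REGCOIL implementation: the $3 \times 3$ matrices, the linear dependence of the matrix and right-hand side on two parameters, and the objective $g = \overrightarrow{\boldsymbol{\Phi}} \cdot \overrightarrow{\boldsymbol{\Phi}}$ (which has no explicit dependence on $\Omega$, so the first term on the right of (3.21) vanishes) are all hypothetical. It computes the gradient with one forward solve and one transposed solve, then compares against central finite differences.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy model: A(omega) = A0 + sum_k omega_k dA[k], b(omega) = b0 + sum_k omega_k db[k].
A0 = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
dA = [[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]],
      [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 1.0]]]
b0 = [1.0, 2.0, 3.0]
db = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]


def assemble(omega):
    n = len(b0)
    A = [[A0[i][j] + sum(w * dA[k][i][j] for k, w in enumerate(omega))
          for j in range(n)] for i in range(n)]
    b = [b0[i] + sum(w * db[k][i] for k, w in enumerate(omega)) for i in range(n)]
    return A, b


def objective(omega):
    A, b = assemble(omega)
    phi = solve(A, b)
    return dot(phi, phi)


def adjoint_gradient(omega):
    A, b = assemble(omega)
    phi = solve(A, b)                                   # forward solve, cf. (3.8)
    A_T = [list(col) for col in zip(*A)]
    q = solve(A_T, [2.0 * p for p in phi])              # adjoint solve, cf. (3.22)
    grad = []
    for k in range(len(omega)):
        dA_phi = matvec(dA[k], phi)
        rhs = [db[k][i] - dA_phi[i] for i in range(len(phi))]
        grad.append(dot(q, rhs))                        # contraction in (3.21)
    return grad


def finite_difference_gradient(omega, h=1e-6):
    grad = []
    for k in range(len(omega)):
        up, dn = list(omega), list(omega)
        up[k] += h
        dn[k] -= h
        grad.append((objective(up) - objective(dn)) / (2.0 * h))
    return grad


omega = [0.3, -0.2]
g_adj = adjoint_gradient(omega)
g_fd = finite_difference_gradient(omega)
print(g_adj, g_fd)
```

The adjoint gradient uses two linear solves regardless of the number of parameters, whereas the central-difference check uses two solves per parameter.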
The adjoint equation is similar to the forward equation ($\overleftrightarrow{\boldsymbol{A}}^T$ has the same dimensions and eigenspectrum as $\overleftrightarrow{\boldsymbol{A}}$), so the same computational tools can be used to solve the adjoint problem. We then perform an inner product with $\overrightarrow{\boldsymbol{q}}$ to obtain the derivatives with respect to each $\Omega_{m,n}$,

$$\frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}}(\Omega))}{\partial \Omega_{m,n}} = \frac{\partial \chi^2_B(\Omega, \overrightarrow{\boldsymbol{\Phi}})}{\partial \Omega_{m,n}} + \overrightarrow{\boldsymbol{q}} \cdot \left( \frac{\partial \overrightarrow{\boldsymbol{b}}(\Omega)}{\partial \Omega_{m,n}} - \frac{\partial \overleftrightarrow{\boldsymbol{A}}(\Omega)}{\partial \Omega_{m,n}} \overrightarrow{\boldsymbol{\Phi}} \right). \tag{3.23}$$

The derivatives $\partial \overrightarrow{\boldsymbol{b}} / \partial \Omega_{m,n}$, $\partial \overleftrightarrow{\boldsymbol{A}} / \partial \Omega_{m,n}$, $\partial \chi^2_B / \partial \Omega_{m,n}$, and $\partial \chi^2_B / \partial \overrightarrow{\boldsymbol{\Phi}}$ can be computed analytically. In the above discussion, the regularization parameter $\lambda$ has been assumed to be fixed. A similar method can be used if a $\lambda$ search is performed to obtain a target $J_{\max}$ (see Appendix C). The same method is used to compute derivatives of $\|\mathbf{J}\|_2$.

We note that adjoint methods provide the most significant reduction in computational cost when the linear solve is expensive. For the REGCOIL system, this is not the case, as the cost of constructing $\overleftrightarrow{\boldsymbol{A}}$ and $\overrightarrow{\boldsymbol{b}}$ exceeds that of the solve. We have implemented OpenMP multithreading for the construction of $\partial \overleftrightarrow{\boldsymbol{A}} / \partial \Omega$ and $\partial \overrightarrow{\boldsymbol{b}} / \partial \Omega$ such that the cost of computing the gradients via the adjoint method is cheaper than computing finite-difference derivatives serially.

The constraint functions, $d_{\min}$ and $J_{\max}$, must also be differentiated with respect to $\Omega_{m,n}$. As $d_{\min}$ is defined in terms of the minimum function, we approximate it using the smooth log-sum-exponent function [29],

$$d_{\min,\text{lse}} = -\frac{1}{q} \log \left( \frac{\int_{S_C} d^2x_C \int_{S_P} d^2x_P \, \exp\left( -q |\mathbf{x}_C - \mathbf{x}_P| \right)}{\int_{S_C} d^2x_C \int_{S_P} d^2x_P} \right). \tag{3.24}$$

This function can be analytically differentiated with respect to $\Omega_{m,n}$. As $q$ approaches infinity, $d_{\min,\text{lse}}$ approaches $d_{\min}$. For $q$ very large, the function obtains very sharp gradients. A typical value of $q = 10^{\ldots}$ m$^{-1}$ was used. The log-sum-exponent function is also used to approximate $J_{\max}$, as described in Appendix C.
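A discrete analogue of (3.24) can be sketched as follows, assuming both surfaces are sampled at points with equal area weights so that the surface integrals reduce to plain averages. The point sets and the function name are hypothetical. The minimum distance is subtracted inside the exponent before it is evaluated, an algebraically equivalent rearrangement that avoids underflow for large $q$.

```python
import math


def smooth_min_distance(coil_pts, plasma_pts, q):
    """Log-sum-exponent approximation to the minimum coil-plasma distance.

    Assumes equal quadrature weights on both point sets (an assumption of
    this sketch), so the ratio of integrals in (3.24) becomes a mean.
    """
    d = [math.dist(xc, xp) for xc in coil_pts for xp in plasma_pts]
    d0 = min(d)  # shift for numerical stability; does not change the value
    mean_exp = sum(math.exp(-q * (di - d0)) for di in d) / len(d)
    return d0 - math.log(mean_exp) / q


# Hypothetical sample points; the true minimum distance is 0.5.
plasma_pts = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]
coil_pts = [(1.5, 0.0, 0.0), (0.0, 2.0, 0.0), (-3.0, 0.0, 0.0)]

for q in (10.0, 100.0, 1000.0):
    print(q, smooth_min_distance(coil_pts, plasma_pts, q))
```

For equal weights the result satisfies $d_{\min} \le d_{\min,\text{lse}} \le d_{\min} + \log(N)/q$, with $N$ the number of point pairs, which makes the approach to $d_{\min}$ as $q \to \infty$ explicit. The same bound shows the trade-off noted in the text: a larger $q$ tightens the approximation but sharpens the gradients.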
Beginning with the actual W7-X winding surface, we perform scans over the coefficients $\alpha_V$ and $\alpha_S$ in the objective function (3.10). The plasma surface was obtained from a fixed-boundary VMEC solution that predated the coil design and is free from modular coil ripple. The constraint target is set to be the minimum coil-plasma distance on the initial winding surface, $d^{\text{target}}_{\min} = 0.37$ m. The cross-sections of the optimized surfaces in the poloidal plane are shown in Figures 3.2 and 3.3 along with the last-closed flux surface (red), a constant offset surface at $d^{\text{target}}_{\min}$ (black solid), and the initial winding surface (black dashed).

We perform a scan over $\alpha_S$ with $\alpha_V = \alpha_J = 0$. For optimal values of $\alpha_S$, the addition of the spectral width term should simply reparameterize the surface, eliminating the zero-gradient direction in parameter space. Thus we expect that when $\chi^2_B$ is the only other term in the objective function, the winding surface should collapse to a constant offset surface. When $\alpha_S$ is too large, the surface shape changes to favor a condensed Fourier series. When $\alpha_S$ is too small, the optimization may terminate prematurely in a local minimum due to the non-uniqueness of the representation. Indeed we find that with increasing $\alpha_S$, the winding surface approaches a torus with a circular cross-section, which has a minimal Fourier spectrum. At moderately small values of $\alpha_S$ ($\sim 0.3$) the surface approaches a constant offset surface at $d^{\text{target}}_{\min}$, as $\chi^2_B$ is dominant in the objective function. For very small values of $\alpha_S$ ($\sim \ldots$) […] $\alpha_S = 0.\ldots$ […]

A scan over $\alpha_V$ is performed at fixed $\alpha_S = 0.\ldots$ and $\alpha_J = 0$ such that the spectral width does not greatly increase. As $\alpha_V$ increases, $d_{\text{coil-plasma}}$ increases significantly on the outboard side while it remains fixed in the inboard concave regions. This trend is not surprising, as concave plasma shapes have been shown to be inefficient to produce with coils [137]. Interestingly, the winding surface obtains a somewhat pointed shape at the triangle cross-section ($\phi = 0.\ldots\,\pi/N_p$), becoming elongated at the tip of the triangle and "pinching" toward the plasma surface at the edges.

We now include nonzero $\alpha_J$ and attempt a comprehensive optimization. The $J_{\max}$ constraint is selected such that the metrics ($l$, $\kappa$, and $\Delta\phi$) of the coils computed on the initial surface roughly match those of the actual non-planar coil set. The coil-plasma distance constraint $d^{\text{target}}_{\min}$ is set to be the minimum $d_{\text{coil-plasma}}$ on the initial winding surface. Parameters $\alpha_V = 0.\ldots$, $\alpha_S = 0.24$, and $\alpha_J = 1.\ldots \times 10^{-\ldots}$ were used in the objective function. Optimization was performed over 118 Fourier coefficients ($|n| \le \ldots$, $m \le \ldots$) and the objective function was evaluated a total of 5165 times to reach the optimum ($1.\ldots \times 10^{\ldots}$ linear solves rather than $6.\ldots \times 10^{\ldots}$ required for finite-difference derivatives). The optimal surface and coil set are shown in Figures 3.4 and 3.5, and the corresponding metrics are shown in Table 3.2. We find a solution which increases $V_{\text{coil}}$ by 22% and decreases $\chi^2_B$ by 52% over the initial winding surface. (Note that it is numerically impossible to obtain a current distribution that exactly reproduces the plasma surface, so $\chi^2_B$ is nonzero when computed from the REGCOIL solution on the initial winding surface.) In addition, the optimized coil set features a smaller average and maximum $\Delta\phi$ and $\kappa$ and larger $d^{\min}_{\text{coil-coil}}$. The length of the coils increases to accommodate the increase in $V_{\text{coil}}$. Again we find that the increase in $V_{\text{coil}}$ is most pronounced in the outboard convex regions while $d_{\text{coil-plasma}}$ is maintained in the concave regions of the bean-shaped cross-sections. The "pinching" feature of the winding surface is again present in the triangle cross-section ($\phi = 0.\ldots\,\pi/N_p$).

It should be noted that the decrease in $d_{\text{coil-plasma}}$ at the bottom and top of the bean cross-section ($\phi = 0$) might interfere with the current W7-X divertor baffles. However, the increase in volume on the outboard side would allow for increased flexibility for the neutral beam injection duct [200]. We have performed this optimization to show that a winding surface could be constructed that increases $V_{\text{coil}}$ (and thus the average $d_{\text{coil-plasma}}$), improves coil shapes, and decreases $\chi^2_B$. If further engineering considerations were necessary, these could be implemented.
The surface we have obtained is optimal with respect to the engineering considerations and constraints we have imposed, which differ from those of the W7-X team [15]. Thus a direct comparison between our method and that of [15] cannot be made based on these results.

Figure 3.2: Optimized winding surfaces obtained with $\alpha_V = \alpha_J = 0$ and the values of $\alpha_S$ shown. The actual W7-X winding surface is used as the initial surface in the optimization (black dashed). As $\alpha_S$ increases, the magnitude of the spectral-width term in the objective function increases, and the winding surface approaches a cylindrical torus with a minimal Fourier spectrum. For moderately small values of $\alpha_S$, the winding surface approaches a uniform offset surface from the plasma surface (black solid). Figure adapted from [185] with permission.

Figure 3.3: Optimized winding surfaces obtained with $\alpha_S = 0.[\ldots]$, $\alpha_J = 0$, and the values of $\alpha_V$ shown. The actual W7-X winding surface is used as the initial surface in the optimization (black dashed). As $\alpha_V$ increases, $d_{\text{coil-plasma}}$ increases on the outboard side while it remains fixed in the concave region. Figure adapted from [185] with permission.

Figure 3.4: The actual W7-X coil-winding surface and plasma surface are shown with our optimized winding surface. In comparison with the actual surface, the optimized surface reduces $\chi^2_B$ by 52% and increases $V_{\text{coil}}$ by 22%. Figure adapted from [185] with permission.

Figure 3.5: Comparison of the coil sets computed with REGCOIL using (a, b) the actual W7-X winding surface (dark blue) and the optimized surface (light blue). Figure reproduced from [185] with permission.

Table 3.2: Comparison of metrics of the actual W7-X winding surface and our optimized surface. We also show metrics of the coil set computed on the winding surfaces using REGCOIL and the metrics for the actual W7-X nonplanar coils. Regularization in REGCOIL is chosen such that the coil metrics computed on the initial surface roughly match those of the actual coil set. Coil complexity improves from the initial to the final surface (decreased average and max $\Delta\phi$ and $\kappa$, increased $d^{\min}_{\text{coil-coil}}$). The average and max $l$ increase to allow for the increase in $V_{\text{coil}}$. Table adapted from [185] with permission.

Metric                                  Initial    Optimized    Actual coil set
$\chi^2_B$ [T$^2$ m$^2$]                0.115      0.0711       -
$V_{\text{coil}}$ [m$^3$]               156        190          -
$\|J\|_2$ [MA/m]                        2.21       2.16         -
$J_{\max}$ [MA/m]                       7.70       7.70         -
Average $l$ [m]                         8.51       8.95         8.69
Max $l$ [m]                             8.84       9.14         8.74
Average $\Delta\phi$ [rad]              0.190      0.179        0.198
Max $\Delta\phi$ [rad]                  0.222      0.197        0.208
Average $\kappa$ [m$^{-1}$]             1.21       1.10         1.20
Max $\kappa$ [m$^{-1}$]                 9.01       4.84         2.59
$d^{\min}_{\text{coil-coil}}$ [m]       0.223      0.271        0.261

Figure 3.6: The actual HSX coil-winding surface and plasma surface are shown with our optimized winding surface. In comparison with the actual surface, the optimized surface has decreased $\chi^2_B$ by 4% and increased $V_{\text{coil}}$ by 18%. Figure adapted from [185] with permission.

We perform the same procedure for the optimization of the HSX winding surface. Parameters $\alpha_V = 3.[\ldots] \times 10^{-[\ldots]}$, $\alpha_S = 0$, and $\alpha_J = 3 \times 10^{-[\ldots]}$ were used in the objective function. We found that the spectral-width term was not necessary to obtain a satisfactory optimum in this case. The initial winding surface was taken to be a toroidal surface on which the actual modular coils lie. The plasma equilibrium used is a fixed-boundary VMEC solution without coil ripple. Optimization was performed over 100 Fourier coefficients ($|n| \le [\ldots]$, $m \le [\ldots]$), and the objective function was evaluated a total of 560 times to reach the optimum ($1.[\ldots] \times 10^{[\ldots]}$ linear solves rather than the $5.[\ldots] \times 10^{[\ldots]}$ required for forward-difference derivatives). The coil-plasma distance constraint was set to $d^{\text{target}}_{\min} = 0.14$ m, the minimum coil-plasma distance on the actual winding surface. The optimal surface and coil set are shown in Figures 3.6 and 3.7, and the corresponding coil metrics are shown in Table 3.3. We find a solution that increases $V_{\text{coil}}$ by 18% and decreases $\chi^2_B$ by 4% over the initial winding surface.
The coil set computed with REGCOIL using the optimized surface appears qualitatively similar to that computed with the initial surface but with increased $d_{\text{coil-plasma}}$ on the outboard side. The average and maximum $\Delta\phi$ and $\kappa$ decreased while $d^{\min}_{\text{coil-coil}}$ increased for the coil set computed on the optimal surface in comparison with that of the initial surface. As was observed in the W7-X optimization (Figure 3.4), the optimized HSX winding surface obtains a somewhat pinched shape near the triangle cross-section ($\phi = 0.5 \times 2\pi/N_p$).

Figure 3.7: The coils obtained from REGCOIL using (a, b) the actual HSX winding surface (dark blue) and the optimized surface (light blue). Figure reproduced from [185] with permission.

Table 3.3: Comparison of metrics of the actual HSX winding surface and our optimized surface. We also show metrics of the coil set computed on the winding surfaces using REGCOIL and the metrics for the actual HSX modular coils. Regularization in REGCOIL is chosen such that the coil metrics computed on the initial surface roughly match those of the actual coil set. Coil complexity improves from the initial to the final surface (decreased average and max $\Delta\phi$ and $\kappa$, increased $d^{\min}_{\text{coil-coil}}$). The average and max $l$ increase to allow for the increase in $V_{\text{coil}}$. Table adapted from [185] with permission.

Metric                                  Initial                                   Optimized                                 Actual coil set
$\chi^2_B$ [T$^2$ m$^2$]                $1.[\ldots] \times 10^{-[\ldots]}$        $[\ldots] \times 10^{-[\ldots]}$          -
$V_{\text{coil}}$ [m$^3$]               2.60                                      3.07                                      -
$\|J\|_2$ [MA/m]                        0.956                                     0.891                                     -
$J_{\max}$ [MA/m]                       1.84                                      1.84                                      -
Average $l$ [m]                         2.26                                      2.39                                      2.24
Max $l$ [m]                             2.49                                      2.46                                      2.33
Average $\Delta\phi$ [rad]              0.372                                     0.365                                     0.362
Max $\Delta\phi$ [rad]                  0.530                                     0.505                                     0.478
Average $\kappa$ [m$^{-1}$]             5.15                                      4.80                                      5.05
Max $\kappa$ [m$^{-1}$]                 33.4                                      25.8                                      11.7
$d^{\min}_{\text{coil-coil}}$ [m]       0.0850                                    0.0853                                    0.0930

3.6 Local winding surface sensitivity

With the adjoint method we have computed derivatives of the objective function with respect to Fourier components of the winding surface, $\partial f/\partial\Omega$. While this representation of derivatives is convenient for gradient-based optimization, the sensitivity to local displacements of the surface is obscured. Alternatively, it is possible to represent the sensitivity of $f$ with respect to normal displacements of surface area elements of a given winding surface $S_C$,
\[ \delta f(S_C; \delta\mathbf{x}) = \int_{S_C} d^2x \, G \, \delta\mathbf{x}\cdot\hat{\mathbf{n}}. \tag{3.25} \]
The shape gradient and shape derivatives are described in detail in Section 2.1. As both $\chi^2_B$ and $\|J\|_2$ are defined in terms of surface integrals over the winding surface, it can be shown that the shape derivative of these functions can be written in the Hadamard form [171]. The shape gradients $G_{\chi^2_B}$ and $G_{\|J\|_2}$ can be computed from the Fourier derivatives ($\partial\chi^2_B/\partial\Omega$ and $\partial\|J\|_2/\partial\Omega$) using a singular value decomposition method [138]. Here the perturbations $\delta f$ and $\delta\mathbf{x}$ are written in terms of the Fourier derivatives, and $G$ is also represented as a finite Fourier series,
\[ \frac{\partial f(\Omega)}{\partial\Omega_{m,n}} = \int_{S_C} d^2x \left[ \sum_{m',n'} G_{m',n'} \cos(m'\theta + n' N_p \phi) \right] \frac{\partial\mathbf{x}(\Omega)}{\partial\Omega_{m,n}}\cdot\hat{\mathbf{n}}. \tag{3.26} \]
After discretizing in $\theta$ and $\phi$, (3.26) takes the form of a (generally not square) matrix equation, which can be solved using the Moore-Penrose pseudoinverse to obtain the coefficients $G_{m',n'}$.

We compute $G_{\chi^2_B}$ and $G_{\|J\|_2}$ (Figure 3.9) at fixed $\lambda$. These quantities are computed on the actual W7-X winding surface and on a surface uniformly offset from the plasma surface with $d_{\text{coil-plasma}} = 0.61$ m (the area-averaged $d_{\text{coil-plasma}}$ over the actual surface). We consider surfaces that are equidistant from the plasma surface on average, as $G$ scales inversely with $A_{\text{coil}}$. The poloidal cross-sections of these surfaces are shown in Figure 3.8. For each surface $\lambda$ is chosen to achieve $J_{\max} = 7.$[…] $G_{\chi^2_B}$, indicating that $d_{\text{coil-plasma}}$ should decrease at that location in order that $\chi^2_B$ decreases. This corresponds to locations on the plasma surface with significant concavity (Figure 3.11b). The maximum $G_{\chi^2_B}$ occurs at $\phi = 0.15 \times 2\pi/N_p$ on both surfaces (Figure 3.4). In comparison with this region, the magnitude of $G_{\chi^2_B}$ is relatively small over the majority of the area of the surfaces shown, demonstrating that engineering tolerances might be more relaxed in these locations. There is also a region of negative $G_{\chi^2_B}$ near $\phi = \pi/N_p$ and $\theta = 0$. This is the "tip" of the triangle-shaped cross-section, where $d_{\text{coil-plasma}}$ was increased over the course of the optimization (Figures 3.2, 3.3, and 3.4). We find that $G_{\chi^2_B}$ computed on the actual winding surface has similar trends to that computed on the surface uniformly offset from the plasma. This indicates that the shape gradient depends on the specific geometry of the winding surface. We have computed $G_{\chi^2_B}$ for several other winding surfaces with varying $d_{\text{coil-plasma}}$. Regardless of the winding surface chosen, we observe increased sensitivity in the concave regions.

Figure 3.8: The cross-sections of the two winding surfaces used to compute $G_{\chi^2_B}$ and $G_{\|J\|_2}$ are shown in the poloidal plane. Figure adapted from [185] with permission.

The quantity $G_{\|J\|_2}$ roughly quantifies how coil complexity changes with normal displacements of the coil surface. In view of Figure 3.10, the locations of large $G_{\|J\|_2}$ overlap with areas of increased $J$. On the actual winding surface, the maximum of $G_{\|J\|_2}$ occurs near the location of the closest approach between coils (the two rightmost coils in Figure 3.5(a)). The shape gradients $G_{\|J\|_2}$ and $G_{\chi^2_B}$ have very similar trends. The concave regions of the plasma surface are difficult to produce with external coils, resulting in increased coil complexity and $J$.
Therefore, $\|J\|_2$ is most sensitive to displacements of the coil-winding surface in these regions.

We recognize several ways that the shape gradient technique could be improved to provide more relevant diagnostics for experimental design. With a winding surface representation, the shape gradient does not allow for calculation of the sensitivity to lateral coil displacements. Also, our analysis does not account for field ripple due to the finite number of coils. Although Figure 3.9 indicates that the coils should move toward the plasma to reduce the field error, the ripple fields might be significant with a filamentary model. A similar calculation could be performed using the filamentary coil sensitivity techniques presented in Section 2.1 and discussed further in Chapter 5. Finally, $\chi^2_B$ does not account for the sensitivity to resonant fields that could cause the formation of islands, though there is ongoing work toward computing the shape gradient for such a metric [76].

Sensitivity studies on NCSX similarly found that coil errors on the inboard side in regions of small $d_{\text{coil-plasma}}$ had a significant effect on flux surface quality [236]. The necessity of small $d_{\text{coil-plasma}}$ for bean-shaped plasmas has been noted in many coil optimization efforts [60, 216] and has been demonstrated by evaluating the singular value decomposition of the discretized Biot-Savart integral operator [137]. Using the shape gradient, we can identify the regions where the fidelity of the plasma surface requires tighter tolerance on coil positions.

Figure 3.9: Shape gradient for $\chi^2_B$ ((a) and (b)) and $\|J\|_2$ ((c) and (d)). These functions are computed using the W7-X plasma surface and a uniform offset winding surface from the plasma surface with $d_{\text{coil-plasma}} = 0.61$ m ((a) and (c)) and the actual winding surface ((b) and (d)). The region of increased $G_{\chi^2_B}$ corresponds with concave regions of the plasma surface (Figure 3.11b). Regions of large positive $G_{\|J\|_2}$ correspond to regions with increased $J$ (Figure 3.10). Figure adapted from [185] with permission.

Figure 3.10: Current density magnitude, $J$, computed from REGCOIL using the W7-X plasma surface and (a) a uniform offset winding surface from the plasma surface with $d_{\text{coil-plasma}} = 0.61$ m and (b) the actual winding surface. Figure adapted from [185] with permission.

The results presented here and in [137] indicate that the concave regions of the surface are both the regions where a small coil-plasma distance is required and the regions where the sensitivity to the winding surface position is highest. The regions of concavity can be determined by considering the principal curvatures of the plasma surface. Let $\hat{\mathbf{n}}(\mathbf{x})$ represent the normal vector of the plasma surface at some point $\mathbf{x}$, and let $A_n$ represent a plane that includes this normal vector. The intersection of the plane and the surface makes a curve $\mathbf{x}(l)$, which has curvature $\kappa$ at the point $\mathbf{x}$, as calculated from (3.14). The two principal curvatures $\kappa_1$ and $\kappa_2$ represent the maximum and minimum curvatures, $\kappa$, over all possible planes $A_n$. We choose the convention for the principal curvatures such that convex curves have positive curvature and concave curves have negative curvature. Therefore, small values of the second principal curvature, $\kappa_2$, represent regions on the surface where the concavity is increased.

Figure 3.11: (a) The minimum distance between the W7-X plasma surface and the optimized winding surface obtained in Section 3.5.2 and (b) the second principal curvature $\kappa_2$ are shown as a function of location on the plasma surface. Locations of large negative $\kappa_2$ coincide with regions where the optimization resulted in small $d_{\text{coil-plasma}}$.
Figure adapted from [185] with permission.

The second principal curvature for the W7-X plasma surface is shown in Figure 3.11b. Although $\kappa_2$ and the shape gradients are evaluated on different surfaces, we note that regions of high concavity (negative $\kappa_2$) coincide with regions of large, positive $G$ (Figure 3.9). The regions of high concavity also correspond to the regions where the optimization procedure tends to place the winding surface closest to the plasma (Figure 3.11). We recognize that our winding surface optimization accounts for several engineering considerations in addition to reproducing the desired plasma surface. However, for a wide range of parameters the winding surfaces we obtain feature small $d_{\text{coil-plasma}}$ in the bean-shaped cross-sections (Figures 3.2 and 3.3). Thus $\kappa_2$, which is exceedingly fast to compute, may serve as a target for optimization of the plasma configuration. By minimizing the regions of high concavity, it may be possible to find stellarator equilibria that are more amenable to coils that are positioned farther from the plasma. Any increase in the minimal distance between the plasma and the coils has implications for the size of a reactor, where $d_{\text{coil-plasma}}$ is set by the required blanket width. Similar metrics are considered in the ROSE code, such as the integrated absolute value of the Gaussian curvature and the integrated absolute value of the maximum curvature [59].

We have outlined a new method for the optimization of the stellarator coil-winding surface using a continuous current potential approach. Rather than evolving filamentary coil shapes, we use REGCOIL to obtain the current density on a winding surface and optimize the winding surface using analytic gradients of the objective function. We have shown that we can indirectly improve the coil curvature and toroidal extent by targeting the root-mean-squared current density in our objective function (Figure 3.1).
This approach offers several potential advantages over other nonlinear coil optimization tools.

1. The difficulty of the optimization is reduced by the application of the REGCOIL method, which takes the form of a linear least-squares system. The optimal coil shapes on a given winding surface can thus be efficiently and robustly computed.
2. By fixing the maximum current density to obtain the regularization in REGCOIL, we eliminate the need to implement an additional equality constraint or arbitrary weight in the objective function.
3. By using REGCOIL to compute coil shapes on a given surface, we can apply the adjoint method for computing derivatives (Section 3.4). This allows us to reduce the number of function evaluations required during the nonlinear optimization by a factor of $\approx$ […]

Chapter 4
Adjoint-based optimization of neoclassical properties

Several critical quantities for stellarator design arise from neoclassical physics, the kinetic theory of collisional transport in the presence of magnetic field gradients and curvature. This so-called neoclassical transport results from the random walk of charged particles as they exhibit guiding center motion. Due to the complicated guiding center orbits present in a 3D field, neoclassical transport is generally enhanced in a stellarator. One of the primary goals of stellarator optimization is to reduce this transport. Furthermore, the bootstrap current, driven by collisional processes, should be minimized in low-shear designs or if an island divertor system is to be used. These neoclassical properties are described by solutions of the drift-kinetic equation (DKE),
\[ \left( v_{\|}\hat{\mathbf{b}} + \mathbf{v}_d \right)\cdot\nabla f = C(f), \tag{4.1} \]
where $f$ is the distribution function, $v_{\|} = \mathbf{v}\cdot\hat{\mathbf{b}}$ is the parallel component of the velocity, $\mathbf{v}_d$ is the guiding center drift velocity, and $C$ is the collision operator. The DKE is obtained from the Fokker-Planck equation under the assumption that the plasma is strongly magnetized, such that (4.1) describes length scales much longer than the gyroradius and frequencies much smaller than the gyrofrequency. We have taken the equilibrium limit, assuming time scales longer than the gyroperiod but shorter than the transport time scale on which the profiles relax. In this Chapter we make an additional assumption of local thermodynamic equilibrium, such that $f \approx f_M$, a Maxwellian distribution (defined in Section 4.2), to lowest order. This assumption is valid in stellarator configurations, provided that the collisionless orbits are sufficiently confined and the collision frequency is not too low [33, 227]. The departure from a Maxwellian, $f_1$, is driven by gradients in $f_M$ due to variations in the density, temperature, and electrostatic potential.
The drift-kinetic equation is described in many references, including Chapter 7 in [99] and [94, 97].

In this Chapter, we will apply both the discrete and continuous adjoint methods described in Chapter 2 to efficiently compute derivatives of functions that depend on such solutions of the drift-kinetic equation. This analysis will allow us to efficiently optimize the local magnetic field for several neoclassical quantities in addition to analyzing their sensitivity to changes in the magnetic field. The material in this Chapter has been adapted from [186].

Neoclassical transport is governed by solutions of the drift-kinetic equation (DKE) (5.131), from which moments (e.g., radial fluxes and bootstrap current) are computed. The DKE local to a flux surface can be solved numerically [18, 140]. However, this four-dimensional problem is expensive to solve within an optimization loop, especially in low-collisionality regimes, for which increased pitch-angle resolution is required to resolve the collisional boundary layer.

Therefore, it is sometimes desirable to consider an analytic reduction of the DKE. Under the assumption of low collisionality, a bounce-averaged DKE can be considered [17, 34]. While bounce-averaging can significantly reduce the computational cost by decreasing the spatial dimensionality, this approach typically requires restrictions on the geometry, such as closeness to omnigeneity or a model magnetic field. Additional reduction of the DKE can be made in low-collisionality regimes, resulting in semi-analytic expressions. For example, the effective ripple, $\epsilon_{\text{eff}}$ [168], quantifies the geometric dependence of the $1/\nu$ radial transport ($\nu$ is the collision frequency) and has been widely used during optimization studies [106, 134, 242].
(The effective ripple will be discussed further in Chapter 5 and Appendix M.) The $1/\nu$ regime, though, is only relevant when $E_r$ is small enough that the typical poloidal rotation frequency is much smaller than the typical collision frequency [116], which is not always an experimentally relevant regime. A low-collisionality semi-analytic bootstrap current model [205] is also commonly adopted for stellarator design [15, 114]. However, this analytic expression is known to be ill-behaved near rational surfaces. Furthermore, numerical solutions of the DKE in the low-collisionality limit have been shown in benchmarks to differ significantly from the semi-analytic model [16, 127]. Any analytic reduction of the DKE implies additional assumptions, such as on the collisionality, the size of $E_r$, or the magnetic geometry.

Due to the limitations of bounce-averaged and semi-analytic models, there are benefits to computing neoclassical quantities using numerical solutions to the DKE without approximation. With the numerical methods currently used for stellarator optimization, this approach becomes computationally challenging within an optimization loop. Because stellarators are fully three-dimensional, optimization of their geometry requires navigation through high-dimensional spaces, such as the space of the shape of the outer boundary of the plasma or the shapes of electromagnetic coils. The number of parameters required to describe these spaces, $N$, is often quite large ($\mathcal{O}(10^{[\ldots]})$). Knowledge of the gradient of the objective function with respect to these parameters can significantly improve the convergence to a local minimum. Once a descent direction is identified, each iteration reduces to a one-dimensional line search.
Gradient-based optimization with the Levenberg-Marquardt algorithm in the STELLOPT code [218] has been widely used in the stellarator community and led to the design of NCSX [197].

Although derivative information is valuable, numerically computing the derivative of a figure of merit $f$ (for example, with finite-difference derivatives) can be prohibitively expensive, as $f$ must be evaluated $\mathcal{O}(N)$ times. For neoclassical optimization, this implies solving the DKE $\mathcal{O}(N)$ times; thus including finite-collisionality neoclassical quantities in the objective function is often impractical. In this Chapter, we describe an adjoint method for neoclassical optimization. With this method, the computation of the derivatives of $f$ with respect to $N$ parameters has a cost comparable to solving the DKE twice, thus making the inclusion of these quantities possible within an optimization loop. In this Chapter, we obtain derivatives of neoclassical figures of merit with respect to local geometric parameters on a surface rather than the outer boundary or coil shapes. However, the geometric derivatives we compute provide an important step toward adjoint-based optimization of MHD equilibria, as discussed in Section 4.5.2 and Chapter 5.

In Section 4.2, we provide an overview of the numerical solution of the DKE local to a flux surface. In Section 4.3 the adjoint neoclassical method is described. The continuous and discrete approaches for this problem are presented, and their implementation and benchmarks are discussed in Section 4.4. The adjoint method is used to compute derivatives of moments of the neoclassical distribution function with respect to local geometric quantities. The derivative information can be used to identify regions of increased sensitivity to magnetic perturbations, as discussed in Section 4.5.1. We demonstrate adjoint-based optimization in Section 4.5.2 by locally modifying the field strength on a flux surface.
A discussion of the application of this method for optimization of MHD equilibria is presented in Section 4.5.2. Finally, the adjoint method is applied to accelerate the calculation of the ambipolar electric field in Section 4.5.3.

The local drift-kinetic equation is
\[ \left( v_{\|}\hat{\mathbf{b}} + \mathbf{v}_E \right)\cdot\nabla f_{1s} - C_s(f_{1s}) = -\mathbf{v}_{ms}\cdot\nabla\psi \, \frac{\partial f_{Ms}}{\partial\psi}. \tag{4.2} \]
Here $\hat{\mathbf{b}} = \mathbf{B}/B$ is a unit vector in the direction of the magnetic field, $v_{\|} = \mathbf{v}\cdot\hat{\mathbf{b}}$ is the parallel component of the velocity, and $2\pi\psi$ is the toroidal flux. The Fokker-Planck collision operator is $C_s(f_{1s})$, linearized about a Maxwellian $f_{Ms} = n_s v_{ts}^{-3}\pi^{-3/2} e^{-v^2/v_{ts}^2}$, where $v_{ts} = \sqrt{2T_s/m_s}$ is the thermal speed, $n_s$ is the density, $T_s$ is the temperature, $m_s$ is the mass, and the subscript $s$ indicates species. In (4.2), derivatives are performed holding $W_s = m_s v^2/2 + q_s\Phi$ and $\mu = v_\perp^2/(2B)$ fixed, where $v = \sqrt{\mathbf{v}\cdot\mathbf{v}}$ is the magnitude of the velocity, $\Phi$ is the electrostatic potential, $v_\perp = \sqrt{v^2 - v_{\|}^2}$ is the perpendicular velocity, and $q_s$ is the charge. The radial magnetic drift is
\[ \mathbf{v}_{ms}\cdot\nabla\psi = \frac{m_s}{q_s B^2}\left( v_{\|}^2 + \frac{v_\perp^2}{2} \right)\hat{\mathbf{b}}\times\nabla B\cdot\nabla\psi, \tag{4.3} \]
assuming a magnetic field in MHD force balance, and $\mathbf{v}_E$ is the $\mathbf{E}\times\mathbf{B}$ velocity,
\[ \mathbf{v}_E = \frac{\mathbf{B}\times\nabla\Phi}{B^2}. \tag{4.4} \]
Throughout we assume $\Phi = \Phi(\psi)$, such that (4.2) is linear. In (4.2) we will not consider the effect of inductive electric fields, as these can be assumed to be small for stellarators without inductive current drive. We also do not consider the effects of magnetic drifts tangential to the flux surface in (4.2), as these only become important when $E_r$ is small [184]. We can assume radial locality, manifested by the absence of any radial derivatives of $f_{1s}$ in (4.2), when $\nu_* \gg \rho_*$ [33], where $\nu_* = \nu/(v_t/L) \ll 1$ is the normalized collision frequency for a macroscopic scale length $L$, and $\rho_* = v_t m/(LqB)$ is the normalized gyroradius.
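The normalization of the Maxwellian and the factor of 2 in the thermal speed are easy to get wrong when implementing moments of (4.2). The following is a minimal numerical check, a sketch in arbitrary normalized units rather than anything taken from SFINCS: the function names, the quadrature, and the values of `n`, `T`, and `m` are all illustrative assumptions. It confirms that, with $v_{ts} = \sqrt{2T_s/m_s}$, the Maxwellian integrates to $n_s$ and carries pressure $n_s T_s$.

```python
import numpy as np


def maxwellian(v, n, T, m):
    """Maxwellian f_M = n v_t^-3 pi^-3/2 exp(-v^2/v_t^2) with v_t = sqrt(2T/m)."""
    v_t = np.sqrt(2.0 * T / m)
    return n * v_t**-3 * np.pi**-1.5 * np.exp(-((v / v_t) ** 2))


def velocity_moment(weight, n, T, m, num=20001, vmax_over_vt=12.0):
    """Integrate weight(v) * f_M over all velocities.

    f_M is isotropic, so d^3v = 4 pi v^2 dv; a plain trapezoid rule on a
    truncated speed grid is used (hypothetical resolution choices).
    """
    v_t = np.sqrt(2.0 * T / m)
    v = np.linspace(0.0, vmax_over_vt * v_t, num)
    integrand = 4.0 * np.pi * v**2 * weight(v) * maxwellian(v, n, T, m)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(v))


# Illustrative parameters in normalized units (assumptions, not thesis values).
n, T, m = 2.0, 3.0, 0.5

density = velocity_moment(lambda v: np.ones_like(v), n, T, m)
pressure = velocity_moment(lambda v: m * v**2 / 3.0, n, T, m)

print(density, pressure)  # expected to be close to n and n*T
```

If the thermal speed were instead defined as $\sqrt{T_s/m_s}$, the pressure moment would come out a factor of two too small, which is a quick way to detect a mismatched convention.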
Numerical solutions to (4.2) are computed with the Stellarator Fokker-Planck Iterative Neoclassical Conservative Solver (SFINCS) [140] code, which allows for general stellarator geometry with flux surfaces. SFINCS solves (4.2) locally on a flux surface $\psi$, a four-dimensional system. The SFINCS coordinates include two angles (poloidal angle $\theta$ and toroidal angle $\phi$), speed $X_s = v/v_{ts}$, and pitch angle $\xi_s = v_{\|}/v$. Specifics about the implementation of (4.2) in the SFINCS code are described in Appendix D. We will refer to two choices of implementation: the full trajectory model and the DKES trajectory model. The full trajectory model maintains $\mu$ conservation when radial coupling (terms involving $\partial f_{1s}/\partial\psi$) is dropped. While the DKES model does not conserve $\mu$ when $E_r \neq 0$, the adjoint operator under the DKES model takes a particularly simple form, as discussed in Section 4.3.1. This model also does not introduce any unphysical constraints on the distribution function when $E_r \neq 0$, as occurs for the full trajectory model [140]. These constraints motivate the introduction of particle and heat sources, which are discussed in the following Section. We will discuss details of the implementation of the DKE in the SFINCS code, as these need to be considered in arriving at the adjoint equation. However, the adjoint neoclassical approach is quite general and could be implemented in other drift-kinetic codes with slight modification.

From solutions of (4.2), several neoclassical quantities are computed, including the flux-surface-averaged parallel flow,
\[ V_{\|,s} = \frac{\left\langle B \int d^3v \, f_{1s} v_{\|} \right\rangle_\psi}{n_s \langle B^2 \rangle_\psi^{1/2}}, \tag{4.5} \]
the radial particle flux,
\[ \Gamma_s = \left\langle \int d^3v \, (\mathbf{v}_{ms}\cdot\nabla\rho) f_{1s} \right\rangle_\psi, \tag{4.6} \]
and the radial heat flux (sometimes referred to as an energy flux),
\[ Q_s = \left\langle \int d^3v \, \frac{m_s v^2}{2} (\mathbf{v}_{ms}\cdot\nabla\rho) f_{1s} \right\rangle_\psi. \tag{4.7} \]
Here the flux-surface average of a quantity $A$ is
\[ \langle A \rangle_\psi = \frac{\int_0^{2\pi} d\theta \int_0^{2\pi} d\phi \, \sqrt{g} A}{V'(\psi)}, \tag{4.8a} \]
\[ V'(\psi) = \int_0^{2\pi} d\theta \int_0^{2\pi} d\phi \, \sqrt{g}, \tag{4.8b} \]
and $\sqrt{g} = (\nabla\psi\times\nabla\theta\cdot\nabla\phi)^{-1}$ is the Jacobian. We will also consider species-summed quantities including the bootstrap current, $J_b = \sum_s q_s n_s V_{\|,s}$, the radial current, $J_r = \sum_s q_s \Gamma_s$, and the total heat flux, $Q_{\text{tot}} = \sum_s Q_s$. Here the effective normalized radius is $\rho = \sqrt{\psi/\psi_0}$, where $2\pi\psi_0$ is the toroidal flux at the boundary.

To avoid unphysical constraints on $f_{1s}$ implied by the moment equations of (4.2) in the presence of a non-zero $E_r$ [140], particle and heat sources are added to the DKE (D.1),
\[ \mathbb{L}_{0s} f_{1s} - C_s(f_{1s}) - f_{Ms}\left( X_s^2 - \frac{5}{2} \right) S^f_{1s}(\psi) - f_{Ms}\left( X_s^2 - \frac{3}{2} \right) S^f_{2s}(\psi) = \mathbb{S}_{0s}, \tag{4.9} \]
where $S^f_{1s}(\psi)$ and $S^f_{2s}(\psi)$ are unknowns such that $S^f_{1s}$ provides a particle source and $S^f_{2s}$ provides a heat source. The collisionless trajectory operator in SFINCS coordinates is
\[ \mathbb{L}_{0s} = \dot{\mathbf{x}}\cdot\nabla + \dot{X}_s \frac{\partial}{\partial X_s} + \dot{\xi}_s \frac{\partial}{\partial \xi_s}, \tag{4.10} \]
and the inhomogeneous drive term is $\mathbb{S}_{0s} = -(\mathbf{v}_{ms}\cdot\nabla\psi)\,\partial f_{Ms}/\partial\psi$. The source functions are determined via the requirement that $\langle \int d^3v \, f_{1s} \rangle_\psi = 0$ and $\langle \int d^3v \, X_s^2 f_{1s} \rangle_\psi = 0$ (i.e., $f_{1s}$ does not provide net density or pressure). So, the following system of equations is solved,
\[ \underbrace{\begin{bmatrix} \mathbb{L}_{0s} - C_s & -f_{Ms}\left(X_s^2 - \frac{5}{2}\right) & -f_{Ms}\left(X_s^2 - \frac{3}{2}\right) \\ \mathbb{L}_{1s} & 0 & 0 \\ \mathbb{L}_{2s} & 0 & 0 \end{bmatrix}}_{\mathbb{L}_s} \underbrace{\begin{bmatrix} f_{1s} \\ S^f_{1s} \\ S^f_{2s} \end{bmatrix}}_{F_s} = \underbrace{\begin{bmatrix} \mathbb{S}_{0s} \\ 0 \\ 0 \end{bmatrix}}_{\mathbb{S}_s}. \tag{4.11} \]
The velocity-space averaging operations are denoted $\mathbb{L}_{1s} f_{1s} = \langle \int d^3v \, f_{1s} \rangle_\psi$ and $\mathbb{L}_{2s} f_{1s} = \langle \int d^3v \, f_{1s} X_s^2 \rangle_\psi$. The full multi-species system can be written as
\[ \begin{bmatrix} \mathbb{L}_1 & & \\ & \ddots & \\ & & \mathbb{L}_{N_{\text{species}}} \end{bmatrix} \begin{bmatrix} F_1 \\ \vdots \\ F_{N_{\text{species}}} \end{bmatrix} = \begin{bmatrix} \mathbb{S}_1 \\ \vdots \\ \mathbb{S}_{N_{\text{species}}} \end{bmatrix}. \tag{4.12} \]
Here the linear systems corresponding to each species as in (4.11) are coupled through the collision operator. We use the following notation to refer to the above system,
\[ \mathbb{L} F = \mathbb{S}. \tag{4.13} \]

The goal of the adjoint neoclassical approach is to efficiently compute derivatives of a moment of the distribution function, $\mathcal{R}$ (e.g., $V_{\|,s}$, $\Gamma_s$, $Q_s$, $J_b$, $J_r$, $Q_{\text{tot}}$), with respect to many parameters. Consider a set of parameters, $\Omega = \{\Omega_i\}_{i=1}^{N_\Omega}$, on which $\mathcal{R}$ depends. Computing a forward-difference derivative with respect to $\Omega$ requires $N_\Omega + 1$ solutions of (4.13). With the adjoint approach, $\partial\mathcal{R}/\partial\Omega$ can be computed with one solution of (4.13) and one solution of a linear adjoint equation of the same size as (4.13). Thus if $N_\Omega$ is very large and the solution to (4.13) is computationally expensive to obtain, the adjoint approach can reduce the cost by a factor of order $N_\Omega$. For stellarator optimization, it is desirable to compute derivatives with respect to parameters that describe the magnetic geometry. In fully three-dimensional geometry, $N_\Omega$ is $\mathcal{O}(10^{[\ldots]})$ and solving (4.13) is the most expensive part of computing $\mathcal{R}$ (rather than constructing the linear system or taking a moment of the distribution function). The discretized linear system is typically very large ($N \sim 10^{[\ldots]}$-$10^{[\ldots]}$ for the calculations shown in this Chapter) and sparse. Thus matrix-matrix products are significantly less expensive than the linear solve, which is performed with a preconditioned Krylov iterative method. Consequently, the adjoint method provides a factor of $N_\Omega \sim 10^{[\ldots]}$ savings over both the forward sensitivity and finite-difference methods, as described in Section 2.2.1. The adjoint method also allows us to avoid additional round-off or truncation error arising from finite-difference derivatives. In what follows, we consider $\Omega$ to be a set of parameters describing the magnetic geometry, which will be specified in Section 4.4.

We compute the derivatives of $\mathcal{R}$ using two approaches.
In the ﬁrst approach, we deﬁne aninner product that involves integrals over the distribution function, and an adjoint operatoris obtained with respect to this inner product. This is the continuous approach introducedin Section 2.2.2. In the second approach, we consider the DKE after discretization, deﬁningan adjoint operator with respect to the Euclidean dot product. This is the discrete approachintroduced in Section 2.2.1. While these approaches should provide identical results withindiscretization error, the advantages and drawbacks of each method will be discussed at theend of Section 4.3.2. Let F = { F s } N species s =1 be the set of unknowns computed with SFINCS before discretization,denoted by the column vector in (4.12) with F s given by (4.11). That is, F consists of a setof N species distribution functions over ( θ, φ, X s , ξ s ) and their associated source functions. Wedeﬁne an inner product between two such quantities in the following way, (cid:104) F, G (cid:105) = (cid:88) s (cid:28)(cid:90) d v f s g s f Ms (cid:29) ψ + S f s S g s + S f s S g s . (4.14)Here the superscript on S s and S s denotes the distribution function with which the sourcefunctions are associated and the sum is over species. The space of continuous functions, F ,of this form such that (cid:104) F, F (cid:105) is bounded will be denoted by H . It can be seen that (4.14)is indeed an inner product, as it satisﬁes conjugate symmetry ( (cid:104) G, F (cid:105) = (cid:104) F, G (cid:105) ∀

F, G ∈ H ),linearity ( (cid:104) F + G, H (cid:105) = (cid:104) F, H (cid:105) + (cid:104) G, H (cid:105) ∀

F, G, H ∈ H and (cid:104)

F, aG (cid:105) = a (cid:104) F, G (cid:105) ∀

$F, G \in H$, $a \in \mathbb{R}$), and positive definiteness ($\langle F, F \rangle \geq 0$ and $\langle F, F \rangle = 0$ only if $F = 0$ $\forall F \in H$) [199]. This implies that if $H$ is finite-dimensional, then for any linear operator $\mathbb{L}$ there exists a unique adjoint operator $\mathbb{L}^\dagger$ such that $\langle \mathbb{L}F, G \rangle = \langle F, \mathbb{L}^\dagger G \rangle$ for all $F, G \in H$. While here $H$ is not finite-dimensional, we will show that such an adjoint operator exists for this inner product.

Note that the norm associated with this inner product, $\|F\| = \sqrt{\langle F, F \rangle}$, is similar to the free energy norm,

$$ W = \sum_s \left\langle \int d^3v \, \frac{T_s f_s^2}{2 f_{Ms}} \right\rangle_\psi, \tag{4.15} $$

which obeys a conservation equation in gyrokinetic theory [2, 132, 141]. The choice of inner product (4.14) is advantageous, as the linearized Fokker-Planck collision operator becomes self-adjoint for species linearized about Maxwellians with the same temperature. In what follows, we assume that all included species are of the same temperature. This assumption could be lifted, with a modification to the collision operator that appears in the adjoint equation (Appendix E). This assumption is not necessary when using the discrete approach (Section 4.3.2).

Consider a moment of the distribution function $\mathcal{R} \in \{ V_{\|,s}, \Gamma_s, Q_s, J_b, J_r, Q_{tot} \}$, which can be written as an inner product with a vector $\widetilde{\mathcal{R}} \in H$,

$$ \mathcal{R} = \langle F, \widetilde{\mathcal{R}} \rangle, \tag{4.16} $$

according to (4.14). For example,

$$ \widetilde{J}_r = \begin{bmatrix} q_s \, \mathbf{v}_{ms} \cdot \nabla\psi \, f_{Ms} \\ 0 \\ 0 \end{bmatrix}_{s=1}^{N_{species}}, \tag{4.17} $$

where the column structure corresponds with that in (4.11) and (4.12).

We are interested in computing the derivative of $\mathcal{R}$ with respect to a set of parameters, $\Omega = \{\Omega_i\}_{i=1}^{N_\Omega}$, such that the DKE is satisfied. Computing such a derivative with the forward sensitivity method requires that we compute $\partial F(\Omega)/\partial\Omega_i$ from the linearized DKE,

$$ \frac{\partial \mathbb{L}(\Omega)}{\partial \Omega_i} F + \mathbb{L} \frac{\partial F(\Omega)}{\partial \Omega_i} = \frac{\partial \mathbb{S}(\Omega)}{\partial \Omega_i}, \tag{4.18} $$

for each $\Omega_i$ and evaluate the derivative using the chain rule,

$$ \frac{\partial \mathcal{R}(\Omega, F(\Omega))}{\partial \Omega_i} = \frac{\partial \mathcal{R}(\Omega, F)}{\partial \Omega_i} + \left\langle \widetilde{\mathcal{R}}, \frac{\partial F(\Omega)}{\partial \Omega_i} \right\rangle. \tag{4.19} $$

We see that the forward sensitivity method requires solutions of $N_\Omega$ linear systems of the same dimension as the DKE (4.13).

To avoid this additional computational cost, we instead apply the adjoint method by constructing the Lagrangian functional, enforcing (4.13) as a constraint,

$$ \mathcal{L}(\Omega, F, \lambda^{\mathcal{R}}) = \mathcal{R}(\Omega, F) + \left\langle \lambda^{\mathcal{R}}, \mathbb{L}F - \mathbb{S} \right\rangle. \tag{4.20} $$

Here $\lambda^{\mathcal{R}}$ is the Lagrange multiplier. We obtain the adjoint equation by finding a stationary point of $\mathcal{L}$ with respect to $F$,

$$ \delta\mathcal{L}(\Omega, F, \lambda^{\mathcal{R}}; \delta F) = \langle \delta F, \widetilde{\mathcal{R}} \rangle + \left\langle \lambda^{\mathcal{R}}, \mathbb{L}\,\delta F \right\rangle = 0. \tag{4.21} $$

We can now use the adjoint property to express the above as,

$$ \delta\mathcal{L}(\Omega, F, \lambda^{\mathcal{R}}; \delta F) = \langle \delta F, \widetilde{\mathcal{R}} + \mathbb{L}^\dagger \lambda^{\mathcal{R}} \rangle. \tag{4.22} $$

A stationary point of $\mathcal{L}$ with respect to $F$ corresponds to $\lambda^{\mathcal{R}}$ which satisfies the weak form of the adjoint equation,

$$ \mathbb{L}^\dagger \lambda^{\mathcal{R}} + \widetilde{\mathcal{R}} = 0. \tag{4.23} $$

With this adjoint variable, we can now compute derivatives of $\mathcal{R}$ with respect to any parameter by computing the corresponding perturbations of $\mathcal{L}$,

$$ \frac{\partial \mathcal{R}(\Omega, F(\Omega))}{\partial \Omega_i} = \frac{\partial \mathcal{L}(\Omega, F, \lambda^{\mathcal{R}})}{\partial \Omega_i} = \frac{\partial \mathcal{R}(\Omega, F)}{\partial \Omega_i} + \left\langle \lambda^{\mathcal{R}}, \frac{\partial \mathbb{L}(\Omega)}{\partial \Omega_i} F - \frac{\partial \mathbb{S}(\Omega)}{\partial \Omega_i} \right\rangle. \tag{4.24} $$

The first term on the right hand side accounts for the explicit dependence on $\Omega_i$, while the second accounts for the implicit dependence on $\Omega_i$ through $F$. Thus, using (4.24), the derivative with respect to $\Omega$ can be computed with the solution to two linear systems, (4.13) and (4.23). The partial derivatives on the right hand side of (4.24) can be computed analytically by considering the explicit geometric dependence of $\mathcal{R}$, $\mathbb{L}$, and $\mathbb{S}$.

When $N_\Omega$ is large, the cost of computing $\partial\mathcal{R}/\partial\Omega$ using (4.24) is dominated not by the linear solve but by constructing $\partial\mathbb{S}/\partial\Omega$ and $\partial\mathbb{L}/\partial\Omega$ and computing the inner product. Thus the cost still scales with $N_\Omega$.
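As a consistency check, the implicit term in the chain rule (4.19) can be rewritten using only the adjoint equation (4.23), the adjoint property, and the linearized DKE (4.18). The short derivation below is a worked step added for illustration; it assumes the inner product is real and symmetric, so that $\langle A, B \rangle = \langle B, A \rangle$.

```latex
\begin{align*}
\left\langle \widetilde{\mathcal{R}}, \frac{\partial F}{\partial \Omega_i} \right\rangle
  &= -\left\langle \mathbb{L}^\dagger \lambda^{\mathcal{R}}, \frac{\partial F}{\partial \Omega_i} \right\rangle
  && \text{by (4.23)} \\
  &= -\left\langle \lambda^{\mathcal{R}}, \mathbb{L} \frac{\partial F}{\partial \Omega_i} \right\rangle
  && \text{adjoint property and symmetry} \\
  &= \left\langle \lambda^{\mathcal{R}}, \frac{\partial \mathbb{L}}{\partial \Omega_i} F
     - \frac{\partial \mathbb{S}}{\partial \Omega_i} \right\rangle
  && \text{by (4.18)}
\end{align*}
```

Substituting this into (4.19) reproduces (4.24), with one inner product to evaluate per parameter $\Omega_i$.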
However, we obtain a significant savings in comparison with forward-difference derivatives, as shown in Section 4.4.

The adjoint operator for each species takes the following form,

$$ \mathbb{L}^\dagger_s = \begin{bmatrix} L^\dagger_{0s} - C_s & f_{Ms} & f_{Ms} X_s^2 \\ L^\dagger_{1s} & 0 & 0 \\ L^\dagger_{2s} & 0 & 0 \end{bmatrix}, \tag{4.25} $$

where $L^\dagger_{1s} = \tfrac{5}{2} L_{1s} - L_{2s}$ and $L^\dagger_{2s} = \tfrac{3}{2} L_{1s} - L_{2s}$. The same column structure is used as for the forward operator (4.12), $\mathbb{L}^\dagger = \{ \mathbb{L}^\dagger_s \}_{s=1}^{N_{species}}$. The quantity $L^\dagger_{0s}$ satisfies $\langle \int d^3v \, g_s L_{0s} f_s / f_{Ms} \rangle_\psi = \langle \int d^3v \, f_s L^\dagger_{0s} g_s / f_{Ms} \rangle_\psi$ and depends on which trajectory model is applied. The expression (4.25) can be verified by noting that

$$ \langle \mathbb{L}F, G \rangle = \sum_s \left[ \left\langle \int d^3v \, \frac{f_s \left( (L^\dagger_{0s} - C_s) g_s + f_{Ms} \left( S^g_{1s} + S^g_{2s} X_s^2 \right) \right)}{f_{Ms}} \right\rangle_\psi + S^f_{1s} L^\dagger_{1s} g_s + S^f_{2s} L^\dagger_{2s} g_s \right] = \langle F, \mathbb{L}^\dagger G \rangle. \tag{4.26} $$

For the DKES trajectories the adjoint operator is,

$$ L^\dagger_{0s} = -L_{0s}. \tag{4.27} $$

This anti-self-adjoint property is used in obtaining the variational principle which provides bounds on neoclassical transport coefficients in the DKES code [230]. For full trajectories it is,

$$ L^\dagger_{0s} = -L_{0s} + \frac{q_s}{T_s} \Phi'(\psi) \, \mathbf{v}_{ms} \cdot \nabla\psi. \tag{4.28} $$

The anti-self-adjoint property does not hold for this trajectory model as the $E \times B$ drift (F.9) is no longer divergenceless. Appendix F contains details on obtaining these adjoint operators.

Next, we consider the discrete adjoint approach. Let $\vec{F}$ be the set of unknowns computed with SFINCS after discretization of $F$. The linear DKE (4.13) upon discretization can then be written schematically as,

$$ \overleftrightarrow{L} \vec{F} = \vec{S}. \tag{4.29} $$

In this case, we can define an inner product as the vector dot product,

$$ \langle \vec{F}, \vec{G} \rangle = \vec{F} \cdot \vec{G}. \tag{4.30} $$

In real Euclidean space, the adjoint operator, $(\overleftrightarrow{L})^\dagger$, which satisfies,

$$ \left\langle \overleftrightarrow{L} \vec{F}, \vec{G} \right\rangle = \left\langle \vec{F}, (\overleftrightarrow{L})^\dagger \vec{G} \right\rangle \tag{4.31} $$

is simply the transpose of the matrix, $(\overleftrightarrow{L})^T$.
Again, the moments of the distribution function, $\mathcal{R}$, can be expressed as an inner product with a vector $\vec{\mathcal{R}}$,

$$ \mathcal{R} = \langle \vec{F}, \vec{\mathcal{R}} \rangle. \tag{4.32} $$

Using the discrete approach, the following adjoint equation must be solved,

$$ (\overleftrightarrow{L})^T \vec{\lambda}^{\mathcal{R}} = \vec{\mathcal{R}}. \tag{4.33} $$

The adjoint variable, $\vec{\lambda}^{\mathcal{R}}$, can again be used to compute the derivative of $\mathcal{R}$ with respect to $\Omega$,

$$ \frac{\partial \mathcal{R}(\Omega, \vec{F}(\Omega))}{\partial \Omega_i} = \frac{\partial \mathcal{R}(\Omega, \vec{F})}{\partial \Omega_i} + \left\langle \vec{\lambda}^{\mathcal{R}}, \left( \frac{\partial \vec{S}(\Omega)}{\partial \Omega_i} - \frac{\partial \overleftrightarrow{L}(\Omega)}{\partial \Omega_i} \vec{F} \right) \right\rangle. \tag{4.34} $$

As with the continuous approach, the partial derivatives on the right hand side can be computed analytically. In this way, the derivative of $\mathcal{R}$ with respect to $\Omega$ can be computed with only two linear solves, (4.29) and (4.33).

In the SFINCS implementation, the DKE is typically solved with the preconditioned GMRES algorithm. In the continuous approach, a preconditioner matrix for both the forward and adjoint operator must be $LU$-factorized. Here the preconditioner matrix is the same as the full matrix but without cross-species or speed coupling. As the adjoint matrix is sufficiently different from the forward matrix, we do not obtain convergence when the same preconditioner is used for both problems. However, in the discrete approach, the $LU$-factorization for the preconditioner of the forward matrix can be reused for the preconditioner of the adjoint matrix. (If a matrix $A$ has been factorized as $A = LU$, then $A^T = U^T L^T$, where $U^T$ is lower triangular and $L^T$ is upper triangular.) This provides a significant reduction in memory and computational cost for the discrete approach.

Furthermore, the discrete adjoint approach provides the exact derivatives for the discretized problem. With this method, the adjoint equation is obtained using the vector dot product and matrix transpose, which can be computed without any numerical approximation.
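The discrete procedure of (4.29), (4.33), and (4.34), including the reuse of the $LU$ factors for the transposed solve, can be sketched on a toy problem. Everything in the block below is an assumption made for illustration: the 3x3 `operator`, the `source`, the vector `R_VEC`, and the single scalar parameter `omega` are invented and are not SFINCS quantities, and a direct solve without pivoting stands in for preconditioned GMRES.

```python
# Toy discrete adjoint: L(omega) F = S(omega), moment R = F . R_VEC.
# The operator, source, and R_VEC are invented; they are not SFINCS quantities.

def lu_factor(a):
    """Doolittle factorization a = lower @ upper (no pivoting; fine for this toy)."""
    n = len(a)
    lower = [[float(i == j) for j in range(n)] for i in range(n)]
    upper = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lower[i][k] = upper[i][k] / upper[k][k]
            for j in range(k, n):
                upper[i][j] -= lower[i][k] * upper[k][j]
    return lower, upper

def forward_sub(low, b):
    """Solve low @ y = b for lower-triangular low."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(low[i][j] * y[j] for j in range(i))) / low[i][i]
    return y

def back_sub(up, y):
    """Solve up @ x = y for upper-triangular up."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(up[i][j] * x[j] for j in range(i + 1, n))) / up[i][i]
    return x

def transpose(a):
    return [list(col) for col in zip(*a)]

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def operator(omega):
    return [[4.0 + omega, 1.0, 0.0], [1.0, 3.0, omega ** 2], [0.5 * omega, 1.0, 5.0]]

def d_operator(omega):
    return [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0 * omega], [0.5, 0.0, 0.0]]

def source(omega):
    return [1.0, omega, 2.0]

def d_source(omega):
    return [0.0, 1.0, 0.0]

R_VEC = [1.0, -2.0, 0.5]

def solve_forward(omega):
    lower, upper = lu_factor(operator(omega))
    return lower, upper, back_sub(upper, forward_sub(lower, source(omega)))

def moment(omega):
    return dot(solve_forward(omega)[2], R_VEC)

def adjoint_derivative(omega):
    lower, upper, f = solve_forward(omega)                      # forward solve, (4.29)
    # A^T = U^T L^T: reuse the factors, with U^T lower and L^T upper triangular.
    lam = back_sub(transpose(lower), forward_sub(transpose(upper), R_VEC))  # (4.33)
    rhs = [ds - dlf for ds, dlf in zip(d_source(omega), matvec(d_operator(omega), f))]
    return dot(lam, rhs)          # (4.34); R_VEC has no explicit omega dependence

print("adjoint derivative:", adjoint_derivative(0.7))
```

With more parameters, only the last two lines of `adjoint_derivative` would be repeated per parameter; the forward and adjoint solves are shared.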
The error in the derivatives obtained by the adjoint method is therefore only limited by the tolerance to which the linear solve is performed with GMRES. On the other hand, the continuous adjoint approach relies on a continuous inner product that must ultimately be approximated numerically. Thus the continuous approach provides the exact derivatives only in the limit that the discrete approximation of the inner product exactly reproduces the continuous inner product. Therefore we expect the results of the discrete and continuous approaches to agree within discretization error, as will be demonstrated in Section 4.4.

The continuous approach can be advantageous in that an adjoint equation may be prescribed independently of the discretization scheme. Note that in the discrete approach, the adjoint operator is obtained from the matrix transpose of the discretized forward operator, which implies that the same spatial and velocity resolution parameters must be used for both the forward and adjoint solutions. In this Chapter, we will employ the same discretization parameters for both the adjoint and forward problems, but this restriction is not required for the continuous approach.

The adjoint method has been implemented in the SFINCS code using both the discrete and continuous approaches (it is available in the main branch of the SFINCS code, https://github.com/landreman/sfincs). The magnetic geometry is specified in Boozer coordinates (Appendix A.4) such that the covariant form of the magnetic field is,

$$ \mathbf{B} = I(\psi) \nabla\vartheta_B + G(\psi) \nabla\varphi_B + K(\psi, \vartheta_B, \varphi_B) \nabla\psi, \tag{4.35} $$

where $I(\psi) = \mu_0 I_T(\psi)/2\pi$ and $G(\psi) = \mu_0 I_P(\psi)/2\pi$, $I_T(\psi)$ is the toroidal current enclosed by $\psi$, and $I_P(\psi)$ is the poloidal current outside of $\psi$. The contravariant form is,

$$ \mathbf{B} = \nabla\psi \times \nabla\vartheta_B - \iota(\psi) \nabla\psi \times \nabla\varphi_B, \tag{4.36} $$

where $\iota(\psi)$ is the rotational transform. The Jacobian is obtained from dotting (4.35) with (4.36),

$$ \sqrt{g} = \frac{G(\psi) + \iota(\psi) I(\psi)}{B^2}. \tag{4.37} $$
As $K(\psi, \vartheta_B, \varphi_B)$ does not appear in any of the trajectory coefficients ((D.2) and (D.4)), in the drive term in (D.1), or in the geometric factors used to define the moments of the distribution function ((4.5), (4.6), and (4.7)), all the geometric dependence enters through $B(\psi, \vartheta_B, \varphi_B)$, $G(\psi)$, $I(\psi)$, and $\iota(\psi)$. We choose to use Boozer coordinates for these computations as it reduces the number of geometric parameters that must be considered, but the neoclassical adjoint method is not limited to this choice of coordinate system.

We approximate $B$ by a truncated Fourier series,

$$ B = \sum_{m,n} B^c_{m,n} \cos(m\vartheta_B - nN_P\varphi_B), \tag{4.38} $$

where the sum is taken over Fourier modes $m \leq m_{\max}$ and $|n| \leq n_{\max}$ and $N_P$ is the number of periods. In (4.38), we have assumed stellarator symmetry such that $B(-\vartheta_B, -\varphi_B) = B(\vartheta_B, \varphi_B)$, and $N_P$ symmetry such that $B(\vartheta_B, \varphi_B + 2\pi/N_P) = B(\vartheta_B, \varphi_B)$. Thus we compute derivatives with respect to the parameters $\Omega = \{ B^c_{m,n}, I(\psi), G(\psi), \iota(\psi) \}$. Additionally, derivatives with respect to $E_r$ are computed, which are used for efficient ambipolar solutions and computing derivatives of geometric quantities at ambipolarity (Section 4.5.3) rather than at fixed $E_r$.

To demonstrate, we compute $\partial\mathcal{R}/\partial B^c_{0,0}$ for moments of the ion distribution function using the discrete and continuous adjoint methods. A 3-mode model of the standard configuration W7-X geometry at $\rho = \sqrt{\psi/\psi_0} = 0.\ldots$ is used,

$$ B = B^c_{0,0} + B^c_{0,1} \cos(N_P\varphi_B) + B^c_{1,1} \cos(\vartheta_B - N_P\varphi_B) + B^c_{1,0} \cos(\vartheta_B), \tag{4.39} $$

where $B^c_{0,1} = 0.\ldots\,B^c_{0,0}$, $B^c_{1,1} = -\ldots\,B^c_{0,0}$, and $B^c_{1,0} = -\ldots\,B^c_{0,0}$. Electron and ion ($q_i = e$) species are included, and the derivatives are computed at the ambipolar $E_r$ with the full trajectory model. The derivatives are also computed with a forward-difference approach with varying step size $\Delta B^c_{0,0}$. In Figure 4.1 we show the fractional difference between $\partial\mathcal{R}/\partial B^c_{0,0}$ computed using the adjoint method and with forward-difference derivatives.
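The kind of step-size scan behind such a comparison can be mimicked with a function whose derivative is known exactly. The sketch below is only an analogy: `math.exp` stands in for a moment computed by a solver, and the exact derivative stands in for the adjoint result. It shows the generic behavior of a forward difference, whose fractional error first falls in proportion to the step and then grows again once floating-point rounding dominates.

```python
import math

def forward_difference(func, x, step):
    """One-sided finite-difference estimate of func'(x)."""
    return (func(x + step) - func(x)) / step

def fractional_difference(func, exact_derivative, x, step):
    """Fractional difference between the forward difference and a reference derivative."""
    estimate = forward_difference(func, x, step)
    return abs(estimate - exact_derivative) / abs(exact_derivative)

def step_scan(func=math.exp, exact_derivative=math.e, x=1.0, exponents=range(-1, -15, -1)):
    """Scan step sizes 1e-1 ... 1e-14; the defaults use d/dx exp(x) = e at x = 1."""
    return [(10.0 ** p, fractional_difference(func, exact_derivative, x, 10.0 ** p))
            for p in exponents]

for step, err in step_scan():
    print(f"step = {step:.0e}   fractional difference = {err:.3e}")
```

The step at which the error is smallest depends on the function and on the precision of each evaluation; for an iterative solver, the solver tolerance plays the role that machine rounding plays here.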
We see that at large values of $\Delta B^c_{0,0}$, the adjoint and numerical derivatives begin to differ significantly due to discretization error from the forward-difference approximation. The fractional error decreases in proportion to $\Delta B^c_{0,0}$, as expected, until the rounding error begins to dominate [203] when $\Delta B^c_{0,0}/B^c_{0,0}$ is approximately $10^{-\ldots}$, where $B^c_{0,0}$ is the value of the unperturbed mode. The discrete and continuous approaches show qualitatively similar trends. However, the minimum fractional difference is lower in the discrete approach due to the additional discretization error that arises with the continuous approach. With sufficient resolution parameters (41 $\theta$ grid points, 61 $\phi$ grid points, 85 $\xi$ basis functions, and 7 $X$ basis functions), the fractional error of the continuous approach is $\leq$ .1% and should not be significant for most applications. We find similar agreement for other derivatives and with the DKES trajectory model.

Figure 4.1: Fractional difference between derivatives with respect to $B^c_{0,0}$ computed with the adjoint method and with a forward-difference derivative with step size $\Delta B^c_{0,0}$. The full trajectory model was used with (a) the discrete and (b) the continuous adjoint approaches. Figure adapted from [186] with permission.

To demonstrate that the discrete and continuous methods indeed produce the same derivative information, we compute the fractional difference between the derivatives computed with the two methods as a function of the resolution parameters. As an example, in Figure 4.2a we show the fractional difference in $\partial Q_i/\partial\iota$, where $Q_i$ is the radial ion heat flux, as a function of the number of Legendre polynomials used for the pitch angle discretization, $N_\xi$, keeping the other resolution parameters fixed. As $N_\xi$ is increased, the fractional differences converge to a finite value, approximately $10^{-\ldots}$, due to the discretization error in the other resolution parameters. Similar resolution parameters are required for the convergence of the moment itself, $Q_i$, and its derivative computed with the continuous method, $\partial Q_i/\partial\iota$. Convergence of $Q_i$ within 5% is obtained with $N_\xi = 38$, similar to that required for the convergence of $\partial Q_i/\partial\iota$, as can be seen in Figure 4.2a.

In Figure 4.2b, we compare the cost of calculating derivatives of one moment with respect to $N_\Omega$ parameters using the continuous and discrete adjoint methods and forward-difference derivatives. All computations are performed on the Edison computer at NERSC using 48 processors, and the elapsed wall time is reported.
Here we include the cost of solving the linear system and computing diagnostics $N_\Omega + 1$ times for the forward-difference approach, and the cost of solving the forward and adjoint linear systems and computing diagnostics for the adjoint approaches. The cost of the continuous approach is slightly more than that of the discrete approach due to the cost of factorizing the adjoint preconditioner. However, at large $N_\Omega$ the cost of computing diagnostics for the adjoint approach (e.g., computing $\partial\mathbb{S}/\partial\Omega$ and $\partial\mathbb{L}/\partial\Omega$ and performing the inner product in (4.24)) dominates that of solving the adjoint linear system; thus the discrete and continuous approaches become comparable in cost. In this regime, the adjoint approach provides speed-up by a factor of approximately 50.

[Figure 4.2 panels: (a) fractional difference in $\partial Q_i/\partial\iota$ versus $N_\xi$; (b) wall clock time [s] versus $N_\Omega$ for forward difference, continuous adjoint, and discrete adjoint.]

Figure 4.2: (a) The fractional difference between $\partial Q_i/\partial\iota$ computed with the continuous and discrete approaches converges with the number of pitch angle Legendre modes, $N_\xi$. (b) Comparison of the computational cost of computing $\partial\mathcal{R}/\partial\Omega$ with forward-difference derivatives and the adjoint approach as a function of $N_\Omega$, the number of parameters in the gradient. Figure reproduced from [186] with permission.

With the adjoint method, it is possible to compute derivatives of a moment of the distribution function with respect to the Fourier amplitudes of the field strength, $\{ \partial\mathcal{R}/\partial B^c_{m,n} \}$. Rather than consider sensitivity in Fourier space, we would like to compute the sensitivity to local perturbations of the field strength. We now quantify the relationship between these two representations of sensitivity information.

Consider the Gateaux functional derivative [52] of $\mathcal{R}$ with respect to $B$,

$$ \delta\mathcal{R}(B(\mathbf{x}); \delta B) = \lim_{\epsilon \to 0} \frac{\mathcal{R}(B(\mathbf{x}) + \epsilon\,\delta B(\mathbf{x})) - \mathcal{R}(B(\mathbf{x}))}{\epsilon}. \tag{4.40} $$

Here the field strength is perturbed at fixed $I(\psi)$, $G(\psi)$, and $\iota(\psi)$. As $\delta\mathcal{R}(B(\mathbf{x}); \delta B)$ is a linear functional of $\delta B$, by the Riesz representation theorem [199], $\delta\mathcal{R}$ can be expressed as an inner product with $\delta B$ and some element of the appropriate space. The function $\delta B$ is defined on a flux surface, $\psi$; thus it is sensible to express $\delta\mathcal{R}$ in the following way,

$$ \delta\mathcal{R}(B(\mathbf{x}); \delta B) = \left\langle S_{\mathcal{R}} \, \delta B(\mathbf{x}) \right\rangle_\psi. \tag{4.41} $$

Here $\delta\mathcal{R}$ quantifies the change in the moment $\mathcal{R}$ associated with a local perturbation to the field strength, $\delta B(\mathbf{x})$. The function $S_{\mathcal{R}}$ is analogous to the shape gradient introduced in Section 2.1, which will be discussed further in Section 4.5.2.

Suppose that $B$ is stellarator symmetric and $N_P$ symmetric. If $E_r = 0$, then $S_{\mathcal{R}}$ must also possess stellarator and $N_P$ symmetry (Appendix G). However, when $E_r \neq 0$, $S_{\mathcal{R}}$ is no longer guaranteed to have stellarator symmetry.
Nonetheless, it may be desirable to ignore the stellarator-asymmetric part of $S_{\mathcal{R}}$ if an optimized stellarator-symmetric configuration is desired. For the remainder of this Chapter, we will make this assumption, though the analysis could be extended to consider the effect of breaking of stellarator symmetry. A truncated Fourier series can approximate the quantity $S_{\mathcal{R}}$ under these assumptions,

$$ S_{\mathcal{R}} = \sum_{m,n} S_{m,n} \cos(m\vartheta_B - nN_P\varphi_B), \tag{4.42} $$

where the sum is taken over $m \leq m_{\max}$ and $|n| \leq n_{\max}$. The quantity $\delta B(\mathbf{x})$ can be written in terms of perturbations to the Fourier coefficients,

$$ \delta B(\mathbf{x}) = \sum_{m,n} \delta B^c_{m,n} \cos(m\vartheta_B - nN_P\varphi_B), \tag{4.43} $$

and now $\delta\mathcal{R}$ can be written in terms of these perturbations to the Fourier coefficients,

$$ \delta\mathcal{R} = \sum_{m,n} \frac{\partial\mathcal{R}}{\partial B^c_{m,n}} \delta B^c_{m,n}. \tag{4.44} $$

In this way, (4.41) can be expressed as a linear system,

$$ \frac{\partial\mathcal{R}}{\partial B^c_{m,n}} = \sum_{m',n'} D_{m,n;m',n'} S_{m',n'}, \tag{4.45} $$

where,

$$ D_{m,n;m',n'} = V'(\psi)^{-1} \int_0^{2\pi} d\vartheta_B \int_0^{2\pi} d\varphi_B \, \sqrt{g} \cos(m\vartheta_B - nN_P\varphi_B) \cos(m'\vartheta_B - n'N_P\varphi_B). \tag{4.46} $$

If the same number of modes is used to discretize $\delta\mathcal{R}$ and $S_{\mathcal{R}}$, then the linear system is square.

In contrast to derivatives with respect to the Fourier modes of $B$, the sensitivity function, $S_{\mathcal{R}}$, is a spatially local quantity, quantifying the change in a figure of merit resulting from a local perturbation of the field strength. In this way, $S_{\mathcal{R}}$ can inform where perturbations to the magnetic field strength can be tolerated. The sensitivity function could be related directly to a local magnetic tolerance, as described in Section 2.1.3. In contrast with the work in [138], here we are considering perturbations to the field strength on any flux surface rather than at the plasma boundary. However, $S_{\mathcal{R}}$ still provides insight into where trim coils should be placed or coil displacements can be tolerated without sacrificing desired neoclassical properties.
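The linear system (4.45) with matrix (4.46) can be sketched numerically as follows. This is an illustration under stated assumptions: the field strength `field_strength`, the mode list `MODES`, the value `N_P = 5`, and the grid sizes are invented, $\sqrt{g}$ is taken proportional to $1/B^2$ as in (4.37) (the flux-function prefactor cancels against $V'(\psi)$), and a uniform-grid sum replaces the integrals.

```python
import math

N_P = 5                                      # field periods (illustrative value)
MODES = [(0, 0), (0, 1), (1, 0), (1, 1)]     # (m, n) pairs kept in the truncation

def field_strength(theta, phi):
    """Toy stellarator-symmetric B(theta, phi); the amplitudes are made up."""
    return (1.0 + 0.05 * math.cos(N_P * phi)
            - 0.04 * math.cos(theta - N_P * phi) - 0.02 * math.cos(theta))

def cos_mode(mode, theta, phi):
    m, n = mode
    return math.cos(m * theta - n * N_P * phi)

def build_d_matrix(n_theta=24, n_phi=80):
    """Uniform-grid quadrature for (4.46), with sqrt(g) proportional to 1/B^2."""
    size = len(MODES)
    d = [[0.0] * size for _ in range(size)]
    volume = 0.0                             # plays the role of V'(psi)
    for i in range(n_theta):
        theta = 2.0 * math.pi * i / n_theta
        for j in range(n_phi):
            phi = 2.0 * math.pi * j / n_phi
            weight = 1.0 / field_strength(theta, phi) ** 2
            volume += weight
            values = [cos_mode(mode, theta, phi) for mode in MODES]
            for a in range(size):
                for b in range(size):
                    d[a][b] += weight * values[a] * values[b]
    return [[entry / volume for entry in row] for row in d]

def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def sensitivity_coefficients(d_r_d_b):
    """Invert (4.45): Fourier derivatives dR/dB^c_{m,n} -> coefficients S_{m,n}."""
    return solve(build_d_matrix(), d_r_d_b)

print(sensitivity_coefficients([0.1, -0.3, 0.2, 0.05]))
```

Because $D$ is a Gram matrix of linearly independent cosines with a positive weight, it is symmetric positive definite, so the square system has a unique solution.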
The sensitivity function can also be used for gradient-based optimization in the space of the field strength on a flux surface, as demonstrated in Section 4.5.2.

We compute $S_{J_b}$ for the W7-X standard configuration at $\rho = 0.70$, shown in Figure 4.3a. We use a fixed-boundary equilibrium that preceded the coil design and does not include coil ripple, and the full equilibrium is used rather than the truncated Fourier series considered in Section 4.4. The same resolution parameters are used as in Section 4.4, and derivatives with respect to $B^c_{m,n}$ are computed for $m_{\max} = n_{\max} = 20$. The largest modes for this configuration are the helical curvature $B^c_{1,1}$, the toroidal curvature $B^c_{1,0}$, and the toroidal mirror $B^c_{0,1}$.

Figure 4.3: (a) The local magnetic sensitivity function for the bootstrap current, $S_{J_b}$, is shown for the W7-X standard configuration. Positive values indicate that increasing the field strength at a given location will increase $J_b$ through (4.41). (b) The local sensitivity function for the ion particle flux, $S_{\Gamma_i}$. (c) The magnetic field strength on the $\rho = 0.70$ surface.

We find that $S_{J_b}$ is large and negative on the inboard side, indicating that increasing the magnitude of the toroidal curvature component of $B$ would lead to an increase in $J_b$. This result is in agreement with previous analysis [155], which found that at low collisionality, the bootstrap current coefficients depend strongly on the toroidal curvature. Additionally, we note a localized region of strong sensitivity on the inboard side near the bean-shaped cross-section. Experimental [55] and numerical [75] evidence indicates that the magnitude of the bootstrap current is increased in the lower mirror-ratio configuration of W7-X, where the mirror ratio is defined as $(B_{\max} - B_{\min})/(B_{\max} + B_{\min})$. Our result appears to be consistent with these observations: we note that the localized region of strongly positive $S_{J_b}$ is near the maximum of the magnetic field strength (Figure 4.3c), indicating that increasing the mirror ratio would lead to a decrease in the magnitude of bootstrap current, as $J_b < 0$.

The local sensitivity function for the ion particle flux, $S_{\Gamma_i}$ (Figure 4.3b), is computed for the same configuration using $m_{\max} = 20$ and $n_{\max} = 20$. We find that the particle flux is more sensitive to perturbations on the outboard side in localized regions, while on the inboard side the sensitivity is relatively small in magnitude.

Optimization of the magnetic field strength

As a second demonstration of the adjoint neoclassical method, we consider optimizing in the space of the field strength on a surface, taking $\Omega = \{ B^c_{m,n} \}$. As Boozer coordinates are used, the covariant form (4.35) satisfies $(\nabla \times \mathbf{B}) \cdot \nabla\psi = 0$ and the contravariant form (4.36) satisfies $\nabla \cdot \mathbf{B} = 0$. As we will artificially modify the field strength while keeping other geometry parameters fixed, the resulting field will not necessarily satisfy both of these conditions with both the covariant and contravariant forms. While there is no guarantee that the resulting field strength will be consistent with a global equilibrium solution, it provides insight into how local changes to the field strength can impact neoclassical properties. As a second step, the outer boundary could be optimized to match the desired field strength on a single surface. In Section 4.5.2, we discuss how the derivatives computed in this Chapter could be coupled to the optimization of an MHD equilibrium.

We perform optimization with a BFGS quasi-Newton method (Chapter 6 in [170]) using an objective function $\chi = J_b^2$, implemented in the sfincs adjoint branch of the STELLOPT code. A backtracking line search is used at each iteration to find a step size that satisfies a condition of sufficient decrease of $\chi$. We use the same equilibrium as in Section 4.5.1, retaining modes $m \leq 12$ and $|n| \leq 12$, and compute derivatives with respect to these modes. Convergence to $\chi \leq 10^{-\ldots}$ was obtained within 8 BFGS iterations (28 function evaluations), as shown in Figure 4.4a. The difference in field strength between the initial and optimized configuration, $B_{opt} - B_{init}$, is shown in Figure 4.4b. As expected from the analysis in Section 4.5.1, the field strength increased on the outboard side and decreased on the inboard side in comparison with $B_{init}$. (Note that $J_b < 0$.)

Figure 4.4: (a) Convergence of $\chi = J_b^2$ for optimization over $\Omega = \{ B^c_{m,n} \}$ with an adjoint-based BFGS method. (b) The change in field strength from the initial to optimized configuration. Figure adapted from [186] with permission.

Optimization of MHD equilibria

The local sensitivity function, $S_{\mathcal{R}}$, along with $\partial\mathcal{R}/\partial I$, $\partial\mathcal{R}/\partial G$, and $\partial\mathcal{R}/\partial\iota$, can be used to determine how perturbations to the outer boundary of the plasma, $S_P$, result in perturbations to $\mathcal{R}$. This is quantified through the idea of the shape gradient, introduced in Section 2.1. The partial derivatives of $\mathcal{R}$ can be computed with the adjoint method outlined in Section 4.3, and the shape gradient can be obtained with only one additional MHD equilibrium solution through the application of another adjoint method.

Consider a figure of merit which is integrated over the toroidal confinement volume, $V_P$,

$$ f_{\mathcal{R}}(S_P) = \int_{V_P} d^3x \, w(\psi) \mathcal{R}(\psi), \tag{4.47} $$

where $w(\psi)$ is a weighting function. That is, SFINCS is run on a set of $\psi$ surfaces within $V_P$ and the volume integral is computed numerically. Here we consider $S_P$ to be the plasma boundary used for a fixed-boundary MHD equilibrium calculation. From the Hadamard-Zolesio structure theorem (Section 2.1), the perturbation to $f_{\mathcal{R}}$ resulting from normal perturbation to $S_P$ can be written in the following form,

$$ \delta f_{\mathcal{R}}(S_P; \delta\mathbf{x}) = \int_{S_P} d^2x \, (\delta\mathbf{x} \cdot \hat{\mathbf{n}}) \, \mathcal{G}, \tag{4.48} $$

under certain assumptions of smoothness [52]. This can be thought of as another instance of the Riesz representation theorem, as $\delta f_{\mathcal{R}}$ is a linear functional of $\delta\mathbf{x}$. Here $\hat{\mathbf{n}}$ is the outward unit normal on $S_P$ and $\delta\mathbf{x}$ is a vector field describing the perturbation to the surface. Intuitively, only normal perturbations to $S_P$ result in a change to $f_{\mathcal{R}}$. The shape gradient is $\mathcal{G}$, which quantifies the contribution of a local normal perturbation of the boundary to the change in $f_{\mathcal{R}}$. The shape gradient can be used for fixed-boundary optimization of equilibria or analysis of sensitivity to perturbations of magnetic surfaces. It can be computed using a second adjoint method, where a perturbed MHD force balance equation is solved with the addition of a bulk force that depends on derivatives computed from the neoclassical adjoint method.
This will be described in detail in Chapter 5. While the continuous neoclassical adjoint method described in this Chapter arises from the self-adjointness of the linearized Fokker-Planck operator, the adjoint method for MHD equilibria arises from the self-adjointness of the MHD force operator. In practice, these two adjoint methods could be coupled by first computing an MHD equilibrium solution, computing neoclassical transport and its geometric derivatives from this equilibrium with the neoclassical adjoint method, and passing these derivatives back to the equilibrium code to compute the shape gradient with the perturbed MHD adjoint method. In this way, derivatives of neoclassical quantities with respect to the shape of the outer boundary are computed with only two equilibrium solutions and two DKE solutions.

Rather than solve an additional adjoint equation, the outer boundary could be optimized by numerically computing derivatives of $\{ B^c_{m,n}(\psi), G(\psi), I(\psi) \}$ with respect to the double Fourier series describing the outer boundary shape in cylindrical coordinates, $\{ R^c_{m,n}, Z^s_{m,n} \}$, using a finite-difference method. This could be done using the STELLOPT code [197, 213] with BOOZ_XFORM [202] to perform the coordinate transformation. For example, if the rotational transform is held fixed in the VMEC equilibrium calculation [111], the derivative of a moment, $\mathcal{R}$, with respect to a boundary coefficient, $R^c_{m,n}$, can be computed as,

$$ \frac{\partial\mathcal{R}(\psi)}{\partial R^c_{m,n}(\psi)} = \sum_{m',n'} \frac{\partial\mathcal{R}(\psi)}{\partial B^c_{m',n'}(\psi)} \frac{\partial B^c_{m',n'}(\psi)}{\partial R^c_{m,n}(\psi)} + \frac{\partial\mathcal{R}(\psi)}{\partial G(\psi)} \frac{\partial G(\psi)}{\partial R^c_{m,n}(\psi)} + \frac{\partial\mathcal{R}(\psi)}{\partial I(\psi)} \frac{\partial I(\psi)}{\partial R^c_{m,n}(\psi)}, \tag{4.49} $$

where $\partial\mathcal{R}(\psi)/\partial B^c_{m,n}(\psi)$, $\partial\mathcal{R}(\psi)/\partial G(\psi)$, and $\partial\mathcal{R}(\psi)/\partial I(\psi)$ are computed with the neoclassical adjoint method and $\partial B^c_{m,n}(\psi)/\partial R^c_{m,n}(\psi)$, $\partial G(\psi)/\partial R^c_{m,n}(\psi)$, and $\partial I(\psi)/\partial R^c_{m,n}(\psi)$ are computed with finite-difference derivatives using STELLOPT.
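The bookkeeping in (4.49) can be sketched with stand-in functions. Nothing below calls VMEC, BOOZ_XFORM, STELLOPT, or SFINCS: `toy_equilibrium` is a hypothetical map from two boundary coefficients to three field-strength modes plus $G$ and $I$, `toy_moment` is a hypothetical moment, and its analytic partial derivatives play the role of the adjoint-based derivatives. The rotational transform is omitted, consistent with holding it fixed.

```python
import math

def toy_equilibrium(rc):
    """Hypothetical boundary-to-Boozer map: boundary coefficients -> (B modes, G, I)."""
    b_modes = [1.0 + 0.1 * rc[0], 0.05 * rc[0] * rc[1], -0.04 * math.sin(rc[1])]
    g_current = 2.0 + 0.3 * rc[0] ** 2
    i_current = 0.01 * rc[1]
    return b_modes, g_current, i_current

def toy_moment(b_modes, g_current, i_current):
    """Hypothetical moment R(B modes, G, I)."""
    return b_modes[0] * b_modes[1] + math.exp(b_modes[2]) * g_current - i_current ** 2

def toy_moment_gradient(b_modes, g_current, i_current):
    """Analytic partials of toy_moment, standing in for adjoint-based derivatives."""
    d_r_d_b = [b_modes[1], b_modes[0], math.exp(b_modes[2]) * g_current]
    d_r_d_g = math.exp(b_modes[2])
    d_r_d_i = -2.0 * i_current
    return d_r_d_b, d_r_d_g, d_r_d_i

def composite_moment(rc):
    return toy_moment(*toy_equilibrium(rc))

def boundary_derivative(rc, k, step=1e-6):
    """Evaluate (4.49) for boundary coefficient k: one extra equilibrium, no extra moment solve."""
    b0, g0, i0 = toy_equilibrium(rc)
    d_r_d_b, d_r_d_g, d_r_d_i = toy_moment_gradient(b0, g0, i0)
    perturbed = list(rc)
    perturbed[k] += step
    b1, g1, i1 = toy_equilibrium(perturbed)          # finite-difference equilibrium
    d_b = [(new - old) / step for old, new in zip(b0, b1)]
    return (sum(p * q for p, q in zip(d_r_d_b, d_b))
            + d_r_d_g * (g1 - g0) / step
            + d_r_d_i * (i1 - i0) / step)

print([boundary_derivative([1.2, 0.4], k) for k in range(2)])
```

The design point is that each boundary coefficient costs one additional (cheaper) equilibrium evaluation, while the (more expensive) moment derivatives are computed once and reused.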
Similarly, derivatives of $\{ B^c_{m,n}(\psi), G(\psi), I(\psi) \}$ could be computed with respect to coil parameters using a free-boundary equilibrium solution, allowing for direct optimization of neoclassical quantities with respect to coil shapes. The neoclassical calculation with SFINCS is typically significantly more expensive than the equilibrium calculation (for the geometry discussed in Section 4.5.1, fixed-boundary VMEC took 54 seconds while SFINCS took 157 seconds on 4 processors of the NERSC Edison computer). As such, combining adjoint-based with finite-difference derivatives can still result in a significant computational savings.

As stellarators are not intrinsically ambipolar, the radial electric field is not truly an independent parameter. The ambipolar $E_r$ must be obtained which satisfies the condition $J_r(E_r) = 0$. The application of adjoint-based derivatives for computing the ambipolar solution is discussed in Section 4.5.3. An adjoint method to compute derivatives with respect to geometric parameters at fixed ambipolarity is discussed in Section 4.5.3.

Accelerating ambipolar solve

A nonlinear root-finding algorithm must be used to compute the ambipolar $E_r$. This root-finding can be accelerated with derivative information, such as with a Newton-Raphson method [195]. The derivative required, $\partial J_r/\partial E_r$, can be computed with the discrete or continuous adjoint method as described in Section 4.3 with the replacement $\Omega_i \to E_r$, considering $\mathcal{R} = J_r$.

We implement three nonlinear root finding methods: Brent's method [30], the Newton-Raphson method, and a hybrid between the bisection and Newton-Raphson methods [195]. Brent's method guarantees at least linear convergence by combining quadratic interpolation with bisection and does not require derivatives. The Newton-Raphson method can provide quadratic convergence under certain assumptions but in general is not guaranteed to converge.
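A hybrid of bisection and Newton-Raphson of the kind named above can be sketched as follows. This is one common safeguarded variant written for illustration, not necessarily the implementation used with SFINCS: a Newton step is accepted only if it stays inside the current bracket, and otherwise the bracket is bisected. The function `toy_current` is an invented stand-in for $J_r(E_r)$, chosen because an unsafeguarded Newton iteration started far from its root would overshoot.

```python
import math

def hybrid_newton_bisection(func, dfunc, lo, hi, x0, tol=1e-10, max_iter=100):
    """Find a root of func in [lo, hi], starting from x0 inside the bracket.

    Returns (root, number of function evaluations at iterates).
    """
    f_lo, f_hi = func(lo), func(hi)
    if f_lo * f_hi > 0.0:
        raise ValueError("root is not bracketed by [lo, hi]")
    x = x0
    for iteration in range(1, max_iter + 1):
        fx = func(x)
        if abs(fx) < tol:
            return x, iteration
        # Shrink the bracket so that it still contains a sign change.
        if fx * f_lo < 0.0:
            hi = x
        else:
            lo, f_lo = x, fx
        slope = dfunc(x)
        candidate = x - fx / slope if slope != 0.0 else None
        if candidate is None or not (lo < candidate < hi):
            candidate = 0.5 * (lo + hi)      # fall back to bisection
        x = candidate
    raise RuntimeError("no convergence within max_iter iterations")

def toy_current(e_r):
    """Invented stand-in for J_r(E_r), with a single root at e_r = 3."""
    return math.atan(e_r - 3.0)

def toy_current_derivative(e_r):
    return 1.0 / (1.0 + (e_r - 3.0) ** 2)

root, n_iter = hybrid_newton_bisection(toy_current, toy_current_derivative,
                                       -100.0, 100.0, -10.0)
print("root:", root, "iterations:", n_iter)
```

In the setting of the text, `dfunc` would be supplied by the adjoint-based derivative $\partial J_r/\partial E_r$, so each iteration costs one forward and one adjoint solve.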
If an iterate lies near a stationary point or a poor initial guess is given, the Newton-Raphson method can fail. For this reason, we implement the hybrid method, which combines the possible quadratic convergence properties of Newton-Raphson with the guaranteed linear convergence of the bisection method. Both Brent's method and the hybrid method require the root to be bracketed and therefore may require additional function evaluations to obtain the bracket.

We compare these methods in Figure 4.5, using the W7-X standard configuration considered in Section 4.5.1 with the full trajectory model and the discrete adjoint approach, beginning with an initial guess of $E_r = -10$ kV/m with bounds at $E_r^{\min} = -100$ kV/m and $E_r^{\max} = 100$ kV/m. The root is located at $E_r = -.84$ kV/m. For this example, the hybrid and Newton methods had nearly identical convergence properties. However, the Newton method is less expensive as it does not require $J_r$ to be evaluated at the bounds of the interval. The Newton method provides a 22% savings in wall clock time over Brent's method to obtain the root within the same tolerance.

In the above discussion, we have assumed that there is only one stable root of interest. Of course, a given configuration may possess several roots, especially if the ions and electrons are in different collisionality regimes [92]. Multiple roots can be obtained by performing several root solves with different initial values and brackets, which could be trivially parallelized. Thus the adjoint method could still provide an acceleration in this more general case.

Derivatives at ambipolarity

The adjoint method described in Section 4.3 assumes that $E_r$ is held constant when computing derivatives with respect to $\Omega$. However, $E_r$ cannot truly be determined independently from geometric quantities, as the ambipolar solution should be recomputed as the geometry is altered. It is therefore desirable to compute derivatives at fixed ambipolarity (fixed $J_r = 0$) rather than at fixed $E_r$. This is performed by solving an additional adjoint equation,

$$ \mathbb{L}^\dagger \lambda^{J_r} + \widetilde{J}_r = 0, \tag{4.50} $$

in the continuous approach or,

$$ (\overleftrightarrow{L})^T \vec{\lambda}^{J_r} = \vec{J}_r, \tag{4.51} $$

in the discrete approach. Details are described in Appendix H.

Figure 4.5: The ambipolar root is obtained with Brent, Newton-Raphson, and Newton hybrid nonlinear root solvers. The derivatives obtained with the adjoint method provide better convergence properties for the Newton methods. Figure adapted from [186] with permission.

It should be noted that by computing derivatives at ambipolarity, we assume that a given moment $\mathcal{R}$ is a differentiable function of the geometry at fixed $J_r = 0$. That is, this method cannot be applied to cases in which a stable root disappears as the geometry varies. As this will occur at a stationary point of $J_r(E_r)$, this situation could be avoided within an optimization loop by computing derivatives at constant $E_r$ rather than constant $J_r$ if $|\partial J_r/\partial E_r|$ falls below a given threshold at ambipolarity.

Although an additional adjoint solve is required, this method of computing derivatives at ambipolarity is advantageous as several linear solves are typically needed to obtain the ambipolar root. A comparison of the computational cost between the adjoint method and the forward-difference method for derivatives at ambipolarity is shown in Figure 4.6a. Here the full trajectory model is used, and the results for both the discrete and continuous adjoint methods are shown. For the finite-difference derivative, the ambipolar solve is performed with Brent's method at each step in $\Omega$. As in Figure 4.2b, we find that for large $N_\Omega$, the costs of the continuous and discrete approaches are essentially the same, as the cost is no longer dominated by the linear solve. When computing the derivatives at ambipolarity, both adjoint methods decrease the cost by a factor of approximately 200 for large $N_\Omega$.

In Figure 4.6b we show a benchmark between derivatives at ambipolarity, $(\partial\mathcal{R}/\partial B^c_{0,0})_{J_r}$, computed with the discrete adjoint method and with forward-difference derivatives.
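A benchmark of this kind can be mimicked on a toy model. The sketch below rests on a standard implicit-function argument, which is an assumption here and may differ in form from the derivation in Appendix H: if $J_r(\Omega, E_r) = 0$ defines $E_r(\Omega)$, then $(\partial\mathcal{R}/\partial\Omega)_{J_r} = (\partial\mathcal{R}/\partial\Omega)_{E_r} - (\partial\mathcal{R}/\partial E_r)(\partial J_r/\partial\Omega)_{E_r}/(\partial J_r/\partial E_r)$. The functions `j_r` and `moment` are invented, and their analytic partial derivatives stand in for the derivatives that the two adjoint solves would provide.

```python
import math

def j_r(omega, e_r):
    """Invented radial current, monotonic in e_r so the ambipolar root is unique."""
    return e_r + 0.1 * e_r ** 3 - math.cos(omega)

def dj_domega(omega, e_r):
    return math.sin(omega)

def dj_de(omega, e_r):
    return 1.0 + 0.3 * e_r ** 2

def moment(omega, e_r):
    """Invented moment R(omega, e_r)."""
    return omega * e_r ** 2 + math.sin(omega) * e_r

def dmoment_domega(omega, e_r):
    return e_r ** 2 + math.cos(omega) * e_r

def dmoment_de(omega, e_r):
    return 2.0 * omega * e_r + math.sin(omega)

def ambipolar_root(omega, e_start=0.0, tol=1e-13, max_iter=50):
    """Newton-Raphson solve of j_r(omega, e_r) = 0 for e_r."""
    e_r = e_start
    for _ in range(max_iter):
        step = j_r(omega, e_r) / dj_de(omega, e_r)
        e_r -= step
        if abs(step) < tol:
            return e_r
    raise RuntimeError("ambipolar solve did not converge")

def moment_at_ambipolarity(omega):
    return moment(omega, ambipolar_root(omega))

def derivative_at_ambipolarity(omega):
    """d(moment)/d(omega) at fixed j_r = 0, from partial derivatives at fixed e_r."""
    e_r = ambipolar_root(omega)
    correction = dmoment_de(omega, e_r) * dj_domega(omega, e_r) / dj_de(omega, e_r)
    return dmoment_domega(omega, e_r) - correction

print("derivative at ambipolarity:", derivative_at_ambipolarity(0.6))
```

A finite-difference check must repeat the nonlinear root solve at every perturbed parameter value, which is the source of the extra cost and of the extra tolerance-related error in the forward-difference approach.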
For the forward-difference method, the Newton solver is used to obtain the ambipolar $E_r$ as $B^c_{0,0}$ is varied. As the forward-difference step size $\Delta B^c_{0,0}$ decreases, the fractional difference again decreases in proportion to $\Delta B^c_{0,0}$ until it reaches a minimum when $\Delta B^c_{0,0}/B^c_{0,0}$ is approximately $10^{-\ldots}$. In comparison with Figure 4.1, we see that the minimum fractional difference is slightly larger at fixed ambipolarity than at fixed $E_r$, as the tolerance parameters associated with the Newton solver introduce an additional source of error to the forward-difference approach.

Figure 4.6: (a) The cost of computing the gradient $\partial\mathcal{R}/\partial\Omega$ at ambipolarity scales with $N_\Omega$, the number of parameters in $\Omega$. (b) The fractional difference between $\partial\mathcal{R}/\partial B^c_{0,0}$ at constant ambipolarity obtained with the adjoint method and with finite-difference derivatives. Figure adapted from [186] with permission.

In Figures 4.7a and 4.7b we compare the sensitivity function for the particle flux, $S_{\Gamma_i}$, computed using derivatives at constant $E_r$ with that computed at constant $J_r$. Here derivatives are computed using the discrete adjoint method with full trajectories, and the sensitivity function is constructed as described in Section 4.5.1. The configuration and numerical parameters are the same as described in Section 4.5.1. At constant $J_r$, the large region of increased sensitivity on the outboard side that appears at constant $E_r$ remains, though the overall magnitude of the sensitivity decreases. Thus it may be important to account for the effect of the ambipolar $E_r$ when optimizing for radial transport. In Figures 4.7c and 4.7d we perform the same comparison for $S_{J_b}$, finding the derivatives at fixed $E_r$ and at fixed $J_r$ to be virtually identical.
This is to be expected, as numerical calculations of neoclassical transport coefficients for W7-X have found that the bootstrap coefficients are much less sensitive to $E_r$ than those for the radial transport (Figures 18 and 26 in [16]). Furthermore, the bootstrap current in the $1/\nu$ regime is independent of $E_r$, and the finite-collisionality correction is small for optimized stellarators, such as W7-X [102]. Therefore, the ambipolarity corrections to the derivatives are less important for $J_b$ than for the radial transport.

Figure 4.7: The sensitivity function for the ion particle flux, $S_{\Gamma_i}$, is computed at (a) constant $E_r$ and (b) constant $J_r$. Similarly, $S_{J_b}$ is computed at (c) constant $E_r$ and (d) constant $J_r$. Figure adapted from [186] with permission.

We have described a method by which moments $\mathcal{R}$ of the neoclassical distribution function can be differentiated efficiently with respect to many parameters. The adjoint approach requires defining an inner product from which the adjoint operator is obtained. We consider two choices for this inner product. One choice corresponds to computing the adjoint of the linear operator after discretization, and the other corresponds to computing it before discretization. In the case of the former, the Euclidean dot product can be used, and in the case of the latter, an inner product whose corresponding norm is similar to the free energy norm (4.14) is defined. In Section 4.4, we show that these approaches provide the same derivative information within discretization error, as expected. Both methods provide a reduction in computational cost by a factor of approximately 50 in comparison with forward-difference derivatives when differentiating with respect to many (O(10 )) parameters. In Section 4.5.3 the adjoint method is extended to compute derivatives at ambipolarity. This method provides a reduction in cost by a factor of approximately 200 over a forward-difference approach.
We have implemented this method in the SFINCS code, and similar methods could be applied to other drift kinetic solvers.

In this Chapter, we consider derivatives with respect to geometric quantities that enter the DKE through Boozer coordinates. However, the adjoint neoclassical method we have described is much more general, allowing for many possible applications. For example, derivatives of the radial fluxes with respect to the temperature and density profiles could be used to accelerate the solution of the transport equations using a Newton method [13]. The transport solution could furthermore be incorporated into the optimization loop to self-consistently evolve the macroscopic profiles in the presence of neoclassical fluxes. Rather than simply optimizing for minimal fluxes, an objective function such as the total fusion power could be considered [107], with optimization accelerated by adjoint-based derivatives.

Another application of the continuous adjoint formulation is the correction of discretization error. The same solution obtained in Section 4.3.1 can be used to quantify and correct for the error in a moment, $\mathcal{R}$, providing similar accuracy to that computed with a higher-order stencil or finer mesh without the associated cost. This method has been applied in the field of computational fluid dynamics by solving adjoint Euler equations [189, 231] and could prove useful for efficiently obtaining solutions of the DKE in low-collisionality regimes.

In Section 4.5.2, we have shown an example of adjoint-based neoclassical optimization, where the optimization space is taken to be the Fourier modes of the field strength on a surface, $\{B^c_{m,n}\}$. While optimization within this space is not necessarily consistent with a global equilibrium solution, it demonstrates the adjoint neoclassical method for efficient optimization. In Section 4.5.2, two approaches to self-consistently optimize MHD equilibria are discussed.
Further discussion and demonstration will be provided in Chapter 5.

In Appendix G we show that when $E_r = 0$ and the unperturbed geometry is stellarator symmetric, the sensitivity functions for moments of the distribution function are also stellarator symmetric. However, when $E_r \neq 0$ this is no longer true. This implies that obtaining minimal neoclassical transport in the $\sqrt{\nu}$ regime may require breaking of stellarator symmetry. In this Chapter, we have ignored the effects of stellarator symmetry-breaking, though we hope to extend this work to study these effects in the future.

Chapter 5

Adjoint shape gradient for MHD equilibria

Most stellarator optimization to date has assumed that the magnetic field satisfies the MHD equilibrium equations with either a fixed or free-boundary approach, as detailed in Section 1.4.2. If a gradient-based optimization approach is applied, derivatives of quantities that depend on the equilibrium solutions must be computed with respect to the shapes of the filamentary coils or plasma boundary. In this Chapter, we demonstrate an adjoint approach for obtaining the coil or surface shape gradient of such functions. With the shape gradient efficiently computed, shape derivatives with respect to any shape perturbation can be calculated.

The material in this Chapter has been adapted with permission from [10] and [187].

Several figures of merit quantifying confinement must be considered in the numerical optimization of stellarator MHD equilibrium. These figures of merit describing a configuration depend on the shape of the outer plasma boundary or the shape of the electro-magnetic coils. It is thus desirable to obtain derivatives with respect to these shapes for optimization of equilibria or identification of sensitivity information. These so-called shape derivatives can be computed by directly perturbing the shape, recomputing the equilibrium, and computing the resulting change to a figure of merit that depends on the equilibrium solution. However, this direct finite-difference approach requires recomputing the equilibrium for each possible perturbation of the shape. For stellarators whose geometry is described by a set of $N_\Omega \sim$ parameters, this requires $N_\Omega$ solutions to the MHD equilibrium equations. Despite this computational complexity, gradient-based optimization of stellarators has proceeded with the direct approach (e.g. [134, 196, 197]).

As the target optimized configuration can never be realized exactly, an analysis of the sensitivity to perturbations, such as errors in coil fabrication or assembly, is central to the success of a stellarator. Tight tolerances have proven to be a significant driver of the cost of stellarator experiments [130, 220]; thus an improvement to the algorithms used to conduct sensitivity studies can have a substantial impact on the field. In studies of the coil tolerances for flux surface quality of LHD [240] and NCSX [31, 236], perturbations of several distributions were manually applied to each coil. Sensitivity analysis can also be performed with analytic derivatives. Numerical derivatives with respect to tilt angle and coil translation of the CNT coils have been used to compute the sensitivity of the rotational transform on axis [88].
Analytic derivatives have recently been applied to study coil sensitivities of the CNT stellarator by considering the eigenvectors of the Hessian matrix [243]. Thus, in addition to gradient-based optimization, derivatives with respect to shape can be applied to sensitivity analysis.

The shape gradient quantifies the change in a figure of merit associated with a local perturbation to a shape. Thus, if the shape gradient can be obtained, the shape derivative with respect to any perturbation is known (more precise definitions of the shape derivative and gradient are given in Sections 2.1 and 5.2). The shape gradient representation can be computed from parameter derivatives by solving a small linear system (Section 2.1.2). However, computing parameter derivatives can often be computationally expensive, as numerical derivatives require evaluating the objective function at least $N_\Omega + 1$ times if one-sided finite-difference derivatives are used, or $2N_\Omega$ times for centered differences. As computing the objective function often involves solving a linear or nonlinear system, such as the MHD equilibrium equations, this implies solving the system of equations $\geq N_\Omega + 1$ times. Numerical derivatives also introduce additional noise, and the finite-difference step size must be chosen carefully.

Rather than use parameter derivatives, in this Chapter we will use an adjoint method to compute the shape gradient. This is sometimes termed adjoint shape sensitivity or adjoint shape optimization, which has its origins in aerodynamic engineering and computational fluid dynamics [82, 190]. As with adjoint methods for parameter derivatives, this technique only requires the solution of two linear or nonlinear systems of equations. This technique has been applied to magnetic confinement fusion for the design of tokamak divertor shapes by solving forward and adjoint fluid equations [48, 49, 50].
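The cost argument above can be made concrete with a small linear model problem. The following is a minimal sketch under stated assumptions, not an MHD calculation: the tridiagonal operator `operator(omega)`, the right-hand side `b`, and the objective weights `c` are all hypothetical. For a system $A(\Omega)u = b$ with objective $f = c \cdot u$, one adjoint solve $A^T \lambda = c$ gives every derivative through $\partial f/\partial \Omega_i = -\lambda^T (\partial A/\partial \Omega_i) u$, whereas one-sided finite differences need $N_\Omega + 1$ forward solves.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = a[i][n] - sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def operator(omega):
    """Hypothetical operator A(Omega): diagonal 4 + Omega_i, off-diagonals -1."""
    n = len(omega)
    return [[(4.0 + omega[i]) if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
             for j in range(n)] for i in range(n)]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


n_omega = 8
omega = [0.1 * i for i in range(n_omega)]
b = [1.0] * n_omega
c = [float(i + 1) for i in range(n_omega)]

# Adjoint gradient: one forward solve and one adjoint solve in total.
u = solve(operator(omega), b)
lam = solve(transpose(operator(omega)), c)
# Here dA/dOmega_i has a single nonzero entry (i, i) equal to 1,
# so -lam^T (dA/dOmega_i) u reduces to -lam[i] * u[i].
grad_adjoint = [-lam[i] * u[i] for i in range(n_omega)]

# Forward-difference gradient: N_Omega + 1 forward solves.
forward_solves = 0


def objective(params):
    global forward_solves
    forward_solves += 1
    sol = solve(operator(params), b)
    return sum(ci * si for ci, si in zip(c, sol))


step = 1e-6
f0 = objective(omega)
grad_fd = []
for i in range(n_omega):
    shifted = omega[:]
    shifted[i] += step
    grad_fd.append((objective(shifted) - f0) / step)

max_difference = max(abs(ga - gf) for ga, gf in zip(grad_adjoint, grad_fd))
print(forward_solves, max_difference)
```

The two gradients agree to within the finite-difference truncation error, while the number of solves for the adjoint route does not grow with the number of parameters.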
As stellarators require many parameters to describe their shape, adjoint shape sensitivity could significantly decrease the cost of computing the shape gradient. If one is optimizing in the space of parameters describing the boundary of the plasma or the shape of coils, the shape gradient representation obtained from the adjoint method can be converted to parameter derivatives upon multiplication with a small matrix (Section 2.1).

We begin in Section 5.2 with a brief review of shape calculus concepts in the context of MHD equilibria. In Section 5.3, the fundamental adjoint relations for perturbations to MHD equilibria are derived and discussed. These relations take a form that is similar to that of transport coefficients that are related by Onsager symmetry [177, 178]. Specifically, perturbations to the equilibrium are characterized as a set of generalized responses to a complementary set of generalized forces. The responses and forces can be thought of as being related by a matrix operator, which is symmetric. The resulting relations among forces and responses can be used to compute the shape gradient of functions of the equilibria with respect to displacements of the plasma boundary or the coil shapes. In Section 5.4, the continuous adjoint method that takes advantage of the generalized self-adjointness relations is discussed. Several applications to stellarator figures of merit will be demonstrated in Section 5.5.

Although the adjoint relations are based on the equations of linearized MHD, we perform numerical calculations in this Chapter with nonlinear MHD solutions with the addition of a small perturbation. Demonstration is performed using nonlinear stellarator MHD equilibrium codes based on a variational principle, VMEC [111] and ANIMEC [43].
We obtain expressions for the shape gradients of the volume-averaged $\beta$ (Section 5.5.1), rotational transform (Section 5.5.2), vacuum magnetic well (Section 5.5.3), magnetic ripple (Section 5.5.4), effective ripple in the $1/\nu$ neoclassical regime [168], where $\nu$ is the collision frequency (Section 5.5.5), and departure from quasi-symmetry (Section 5.5.6). Finally, we demonstrate that the adjoint method for neoclassical optimization outlined in Chapter 4 can be coupled with a linearized adjoint MHD solution to compute derivatives of several neoclassical quantities with respect to the shape of the plasma boundary (Section 5.5.7). We present calculations of the shape gradient with the adjoint approach for the volume-averaged $\beta$, rotational transform, and vacuum magnetic well figures of merit, which do not require modification to VMEC. The calculation for the magnetic ripple is computed with a minor modification of the ANIMEC code. The adjoint force balance equations needed to compute the shape gradient for the other figures of merit require the addition of a bulk force that will necessitate further modification of an equilibrium or linearized MHD code. Numerical calculations for these figures of merit will, therefore, not be presented in this Chapter.

We now review shape calculus fundamentals introduced in Chapter 2 in the context of functions that depend on MHD equilibrium quantities. Consider a functional, $F(S_P)$, that depends implicitly on the plasma boundary, $S_P$, through the solution to the fixed-boundary MHD equilibrium equations (Section 1.4.1) with boundary condition $\mathbf{B} \cdot \hat{\mathbf{n}}|_{S_P} = 0$, where $\hat{\mathbf{n}}$ is the outward unit normal on $S_P$. We define a functional integrated over the plasma volume, $V_P$,
\begin{equation}
f(S_P) = \int_{V_P} d^3x \, F(S_P), \tag{5.1}
\end{equation}
where $S_P$ is the boundary of $V_P$. Consider a vector field describing displacements of the surface, $\delta \mathbf{x}$, and a displaced surface $S_{P,\epsilon} = \{ \mathbf{x} + \epsilon \, \delta \mathbf{x} : \mathbf{x} \in S_P \}$.
The shape derivative of $F$ is defined as,
\begin{equation}
\delta F(S_P; \delta \mathbf{x}) = \lim_{\epsilon \to 0} \frac{F(S_{P,\epsilon}) - F(S_P)}{\epsilon}. \tag{5.2}
\end{equation}
The shape derivative of $f$ is defined by the same expression with $F \to f$. Under certain assumptions of smoothness of $\delta F$ with respect to $\delta \mathbf{x}$, the shape derivative of the volume-integrated quantity, $f$, can be written in the following way (Section 2.1),
\begin{equation}
\delta f(S_P; \delta \mathbf{x}) = \int_{V_P} d^3x \, \delta F(S_P; \delta \mathbf{x}) + \int_{S_P} d^2x \, \delta \mathbf{x} \cdot \hat{\mathbf{n}} \, F. \tag{5.3}
\end{equation}
The first term accounts for the Eulerian perturbation to $F$ while the second accounts for the motion of the boundary. This is referred to as the transport theorem for domain functionals and will be used throughout this Chapter to compute the shape derivatives of figures of merit of interest.

According to the Hadamard-Zolesio structure theorem [52], the shape derivative of a functional of $S_P$ (not restricted to the form of (5.1)) can be written in the following form,
\begin{equation}
\delta f(S_P; \delta \mathbf{x}) = \int_{S_P} d^2x \, \delta \mathbf{x} \cdot \hat{\mathbf{n}} \, \mathcal{G}, \tag{5.4}
\end{equation}
assuming $\delta f$ exists for all $\delta \mathbf{x}$ and is sufficiently smooth. In the above expression, $\mathcal{G}$ is the shape gradient. This is an instance of the Riesz representation theorem, which states that any linear functional can be expressed as an inner product with an element of the appropriate space [199]. As the shape derivative of $f$ is linear in $\delta \mathbf{x}$, it can be written in the form of (5.4). Intuitively, the shape derivative does not depend on tangential perturbations to the surface. The shape gradient can be computed from derivatives with respect to the set of parameters, $\Omega$, used to discretize $S_P$,
\begin{equation}
\frac{\partial f(\Omega)}{\partial \Omega_i} = \int_{S_P} d^2x \, \frac{\partial \mathbf{x}(\Omega)}{\partial \Omega_i} \cdot \hat{\mathbf{n}} \, \mathcal{G}. \tag{5.5}
\end{equation}
For example, $\Omega = \{R^c_{m,n}, Z^s_{m,n}\}$ could be assumed, where these are the Fourier coefficients (5.70) in a cosine and sine representation of the cylindrical coordinates $(R, Z)$ of $S_P$. Upon discretization of the right-hand side on a surface, the above takes the form of a linear system that can be solved for $\mathcal{G}$ [138].
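A two-dimensional analogue shows how a relation of the form (5.5) becomes a small linear system for the shape gradient. The sketch below is an illustration under stated assumptions, not the procedure of [138]: the curve is a hypothetical star-shaped boundary $r(\theta) = \sum_m \Omega_m \cos(m\theta)$, the figure of merit is the enclosed area (whose shape gradient is identically 1 by the transport theorem), and the shape gradient is expanded in the same cosine basis as the boundary. For this parameterization the surface element satisfies $dl \, (\partial \mathbf{x}/\partial \Omega_i) \cdot \hat{\mathbf{n}} = r \cos(i\theta) \, d\theta$.

```python
import math

N_THETA = 64
D_THETA = 2.0 * math.pi / N_THETA
THETA = [D_THETA * j for j in range(N_THETA)]


def radius(omega, theta):
    """Hypothetical boundary r(theta) = sum_m Omega_m cos(m theta)."""
    return sum(w * math.cos(m * theta) for m, w in enumerate(omega))


def area(omega):
    """Figure of merit: enclosed area, (1/2) * integral of r^2 dtheta."""
    return sum(0.5 * radius(omega, t) ** 2 for t in THETA) * D_THETA


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= factor * a[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = a[i][n] - sum(a[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


omega = [1.0, 0.2, 0.1]
n_par = len(omega)

# Left-hand side of (5.5): parameter derivatives from centered differences.
step = 1e-4
df_domega = []
for i in range(n_par):
    up, down = omega[:], omega[:]
    up[i] += step
    down[i] -= step
    df_domega.append((area(up) - area(down)) / (2.0 * step))

# Right-hand side of (5.5) with G(theta) = sum_k g_k cos(k theta):
# matrix[i][k] = integral of r cos(i theta) cos(k theta) dtheta.
matrix = [[sum(radius(omega, t) * math.cos(i * t) * math.cos(k * t)
               for t in THETA) * D_THETA
           for k in range(n_par)] for i in range(n_par)]

shape_gradient_modes = solve(matrix, df_domega)
print(shape_gradient_modes)
```

The recovered coefficients correspond to a constant shape gradient of 1, as expected for the area functional. The finite-difference loop is where the cost of one additional evaluation per parameter enters.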
However, this approach requires performing at least one additional equilibrium calculation for each parameter with a finite-difference approach.

The shape gradient can also be computed with respect to perturbations of currents in the vacuum region. We now consider $f$ to depend on the shape of a set of filamentary coils, $C = \{C_k\}$, through a free-boundary solution to the MHD equilibrium equations (Section 1.4.1). We consider a vector field of displacements to the coils, $\delta \mathbf{x}_C$. The shape derivative of $f$ can also be written in shape gradient form,
\begin{equation}
\delta f(C; \delta \mathbf{x}_C) = \sum_k \oint_{C_k} dl \, \delta \mathbf{x}_{C_k} \cdot \widetilde{\mathbf{G}}_k, \tag{5.6}
\end{equation}
where $\widetilde{\mathbf{G}}_k$ is the shape gradient for coil $k$, the integral over $C_k$ is the line integral along coil $k$, and the sum is taken over coils. Again, $\widetilde{\mathbf{G}}_k$ can be computed from derivatives with respect to a set of parameters describing coil shapes (5.84), analogous to (5.5). Note that we have defined the shape gradient in a slightly different way here than that introduced in Chapter 2 (2.12) (without the cross with $\hat{\mathbf{t}}$), although we will find in this Chapter that $\widetilde{\mathbf{G}}_k$ is perpendicular to $\hat{\mathbf{t}}$ for the functionals under consideration. We distinguish the shape gradient as defined in (5.6) from that defined in (2.12) with a tilde.

To avoid the cost of direct computation of the shape gradient, we apply an adjoint approach. The shape gradient is thus obtained without perturbing the plasma surface or coil shapes directly, but instead by solving an additional adjoint equation that depends on the figure of merit of interest.
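We perform the calculation with the direct approach to demonstrate that the same derivative information is computed with either method.

5.3 Adjoint relations for MHD equilibria

The goal of this Section is to generalize the well-known self-adjointness [20] of the MHD force operator,
\begin{equation}
\int_{V_P} d^3x \, \left( \boldsymbol{\xi}_1 \cdot \mathbf{F}[\boldsymbol{\xi}_2] - \boldsymbol{\xi}_2 \cdot \mathbf{F}[\boldsymbol{\xi}_1] \right) - \frac{1}{\mu_0} \int_{S_P} d^2x \, \hat{\mathbf{n}} \cdot \left( \boldsymbol{\xi}_2 \, \delta \mathbf{B}[\boldsymbol{\xi}_1] \cdot \mathbf{B} - \boldsymbol{\xi}_1 \, \delta \mathbf{B}[\boldsymbol{\xi}_2] \cdot \mathbf{B} \right) = 0, \tag{5.7}
\end{equation}
to allow for perturbations of interest for stellarator optimization. In this expression, the perturbed magnetic field is expressed in terms of the displacement vector,
\begin{equation}
\delta \mathbf{B}[\boldsymbol{\xi}_{1,2}] = \nabla \times \left( \boldsymbol{\xi}_{1,2} \times \mathbf{B} \right), \tag{5.8}
\end{equation}
which follows from the assumption that the rotational transform is fixed by the perturbation (flux-freezing). The MHD force operator,
\begin{equation}
\mathbf{F}[\boldsymbol{\xi}_{1,2}] = \frac{\left( \nabla \times \delta \mathbf{B}[\boldsymbol{\xi}_{1,2}] \right) \times \mathbf{B}}{\mu_0} + \frac{(\nabla \times \mathbf{B}) \times \delta \mathbf{B}[\boldsymbol{\xi}_{1,2}]}{\mu_0} - \nabla \left( \delta p[\boldsymbol{\xi}_{1,2}] \right), \tag{5.9}
\end{equation}
is a linearization of the MHD equilibrium equation,
\begin{equation}
\frac{(\nabla \times \mathbf{B}) \times \mathbf{B}}{\mu_0} = \nabla p, \tag{5.10}
\end{equation}
with boundary condition,
\begin{equation}
\mathbf{B} \cdot \hat{\mathbf{n}}|_{S_P} = 0, \tag{5.11}
\end{equation}
under the assumption that the magnetic field is perturbed according to (5.8) and the pressure is perturbed according to,
\begin{equation}
\delta p[\boldsymbol{\xi}_{1,2}] = -\boldsymbol{\xi}_{1,2} \cdot \nabla p - \gamma p \nabla \cdot \boldsymbol{\xi}_{1,2}, \tag{5.12}
\end{equation}
where $\gamma$ is the adiabatic index. As $\boldsymbol{\xi}$ describes the motion of field lines, modes which perturb the plasma boundary exhibit non-zero $\boldsymbol{\xi} \cdot \hat{\mathbf{n}}|_{S_P}$. The self-adjointness provides a relationship between two perturbations about an MHD equilibrium state described by (5.10)-(5.11). This relation is incredibly valuable for ideal MHD stability analysis, forming the basis for the energy principle.

As described in Section 2.2.2, when formulating a continuous adjoint approach, the adjoint of the linearized operator appearing in the forward PDE must be obtained. However, we cannot directly apply the self-adjointness relation from MHD stability theory (5.7) for the stellarator optimization problem.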
While MHD perturbations assume fixed rotational transform, stellarator optimization is often performed instead at fixed toroidal current. While the MHD self-adjointness relation allows for perturbations of the plasma boundary, we would also like to consider linearized equilibrium states corresponding to perturbations of coils in the vacuum region. We now form the appropriate generalized self-adjointness relations corresponding to fixed-boundary perturbations (applied perturbations to the plasma boundary) and free-boundary perturbations (applied perturbations to electro-magnetic coils). Even though the boundary shape changes in the former case, we refer to it as "fixed boundary" since the equilibrium code is run in fixed-boundary mode, and since the associated adjoint problem will turn out to have no boundary perturbation.

The resulting expressions will allow us to relate the "direct perturbations," those corresponding to a linearized equilibrium state associated with the direct perturbation of the plasma boundary or coil shapes, and "adjoint perturbations," with which we can compute the shape gradient efficiently. The adjoint perturbation will correspond to the change in the equilibrium when an additional bulk force acts on the plasma or the toroidal current profile is changed. For the adjoint perturbation, there is no change to the outer flux surface in the fixed-boundary case or to the coil currents in the free-boundary case. In this Section, we will show that aspects of the direct and adjoint changes are related to each other in a manner similar to Onsager symmetry.
Thus, it will be shown that by calculating the adjoint perturbation, with a judiciously chosen added force or change in the toroidal current profile, the solution to the direct problem can be determined.

We consider equilibria in which the magnetic field in the plasma can be expressed in terms of scalar functions $\psi(\mathbf{x})$, $\chi(\psi)$, $\vartheta(\mathbf{x})$, and $\varphi(\mathbf{x})$,
\begin{equation}
\mathbf{B} = \nabla \psi \times \nabla \vartheta - \nabla \chi \times \nabla \varphi = \nabla \psi \times \nabla \alpha, \tag{5.13}
\end{equation}
where $(\psi, \vartheta, \varphi)$ form any magnetic coordinate system (Appendix A.3). We will regard $\psi$ as labeling the flux surfaces and consider toroidal geometries for which,
\begin{equation}
\alpha = \vartheta - \iota(\psi) \varphi, \tag{5.14}
\end{equation}
labels field lines in a flux surface, where $\vartheta$ is a poloidal angle, $\varphi$ is a toroidal angle, and $\iota(\psi) = \chi'(\psi)$ is the rotational transform, with $\chi(\psi)$ being the poloidal flux function. With these definitions, the magnetic flux passing toroidally through a poloidally closed curve of constant $\psi$ is $2\pi\psi$, and the flux passing poloidally between the magnetic axis and the surface of constant $\psi$ is $2\pi\chi(\psi)$. Thus, we assume that good flux surfaces exist and leave aside the issues of islands and chaotic field lines. In addition to the representation of the magnetic field, we assume that MHD force balance (5.10) is satisfied with a scalar pressure, $p(\psi)$.

As mentioned, we will consider two cases, a fixed-boundary case in which the shape of the outer flux surface is prescribed, and a free-boundary case for which outside the plasma, whose surface is defined by a particular value of toroidal flux, the force balance equation (5.10) does not apply, but rather, the magnetic field is determined by Ampere's law,
\begin{equation}
\nabla \times \mathbf{B} = \mu_0 \mathbf{J}, \tag{5.15}
\end{equation}
with a given current density $\mathbf{J}_C$, representing current flowing outside the confinement region. The fixed-boundary and free-boundary equations are discussed in detail in Section 1.4.1.

From (5.10) it follows that current density stream-lines also lie in the $\psi = \text{constant}$ surfaces.
The toroidal current passing through a surface, $S_T(\psi)$ (Figure A.2), whose perimeter is a closed poloidal loop at constant $\psi$ is given by,
\begin{equation}
I_T(\psi) = \int_{S_T(\psi)} d^2x \, \hat{\mathbf{n}} \cdot \mathbf{J} = \int_{S_T(\psi)} d\psi \, d\vartheta \, \sqrt{g} \, \nabla \varphi \cdot \mathbf{J}, \tag{5.16}
\end{equation}
where $\sqrt{g}^{-1} = \nabla \psi \times \nabla \vartheta \cdot \nabla \varphi$.

Equations (5.10) and (5.13) to (5.16) describe our base equilibrium configuration. We now consider small changes in the equilibrium that are assumed to yield a second equilibrium state of the same form as (5.13), but with new functions such that $\mathbf{B}' = \nabla \psi' \times \nabla \vartheta' - \nabla \chi'(\psi') \times \nabla \varphi'$. Each of the primed variables is assumed to differ from the corresponding unprimed variables by a small amount (e.g. $\psi' = \psi + \delta\psi(\mathbf{x})$). The perturbed magnetic field can then be expressed $\mathbf{B}' = \mathbf{B} + \delta \mathbf{B}$, where,
\begin{equation}
\delta \mathbf{B} = \nabla \delta\psi \times \nabla \vartheta + \nabla \psi \times \nabla \delta\vartheta - \nabla \chi(\psi) \times \nabla \delta\varphi - \nabla \left( \iota(\psi) \delta\psi + \delta\chi(\psi) \right) \times \nabla \varphi. \tag{5.17}
\end{equation}
We write the perturbed poloidal flux as the sum of a term resulting from the perturbation of toroidal flux at fixed rotational transform, $\iota(\psi)\delta\psi$, and a term representing the perturbed rotational transform, $\delta\chi(\psi)$. Thus, we can regroup the terms in (5.17) as follows,
\begin{equation}
\delta \mathbf{B} = \nabla \times \left( \delta\psi \nabla \vartheta - \iota(\psi) \delta\psi \nabla \varphi - \delta\vartheta \nabla \psi + \delta\varphi \nabla \chi(\psi) \right) - \nabla \delta\chi(\psi) \times \nabla \varphi. \tag{5.18}
\end{equation}
The group of terms in parentheses in (5.18) corresponds to perturbations of the magnetic field allowed by ideal MHD, which is constrained by the "frozen-in law", and which preserves the rotational transform ($\delta\iota(\psi) = 0$). The last term in (5.18) allows for changes in the rotational transform ($\delta\iota(\psi) = \delta\chi'(\psi)$). Note also that the expression in parentheses in (5.18) can be written as a sum of terms parallel to $\nabla \psi$ and $\nabla \alpha$, and hence it is perpendicular to $\mathbf{B}$.
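The group of terms in parentheses in (5.18) can thus be expressed in terms of a vector potential that is perpendicular to the equilibrium magnetic field, while the last term in (5.18) can be represented in terms of a vector potential in the toroidal direction, which thus has a component parallel to the equilibrium field. We can therefore write $\delta \mathbf{B}[\boldsymbol{\xi}, \delta\chi(\psi)] = \nabla \times \delta \mathbf{A}[\boldsymbol{\xi}, \delta\chi(\psi)]$, where,
\begin{equation}
\delta \mathbf{A}[\boldsymbol{\xi}, \delta\chi(\psi)] = \boldsymbol{\xi} \times \mathbf{B} - \delta\chi(\psi) \nabla \varphi. \tag{5.19}
\end{equation}
Here, the variable $\boldsymbol{\xi}$ can be taken to be perpendicular to the applied magnetic field, as the perturbed magnetic field,
\begin{equation}
\delta \mathbf{B}[\boldsymbol{\xi}, \delta\chi(\psi)] = \nabla \times (\boldsymbol{\xi} \times \mathbf{B}) - \delta\chi'(\psi) \nabla \psi \times \nabla \varphi, \tag{5.20}
\end{equation}
does not depend on $\boldsymbol{\xi} \cdot \hat{\mathbf{b}}$. We emphasize that this departs from the typical assumption made in ideal MHD stability theory that $\nabla \cdot \boldsymbol{\xi} = 0$.

We define a vector field of the displacement of a field line, $\delta \mathbf{x}$, such that the perturbation to the field line label $\alpha = \vartheta - \iota(\psi)\varphi$ and toroidal flux satisfy,
\begin{align}
\delta\psi + \delta \mathbf{x} \cdot \nabla \psi &= 0 \tag{5.21a} \\
\delta\alpha + \delta \mathbf{x} \cdot \nabla \alpha &= 0, \tag{5.21b}
\end{align}
and $\delta \mathbf{x} \cdot \mathbf{B} = 0$. Noting that $\delta\alpha = \delta\vartheta - \iota(\psi)\delta\varphi - \left( \iota'(\psi)\delta\psi + \delta\chi'(\psi) \right)\varphi$, we find,
\begin{equation}
\delta \mathbf{x} = \boldsymbol{\xi} + \frac{\hat{\mathbf{b}} \times \nabla \delta\chi(\psi)}{B} \varphi, \tag{5.22}
\end{equation}
which follows from (5.18). As one would expect, in the limit $\delta\chi(\psi) = 0$, we recover the MHD displacement vector.

As the pressure profile is often assumed to be held fixed during a configuration optimization, we assume that the local pressure changes such that $p(\psi)$ is unchanged,
\begin{equation}
\delta p[\boldsymbol{\xi}] = -\boldsymbol{\xi} \cdot \nabla p, \tag{5.23}
\end{equation}
which follows from (5.22). We would similarly like to consider direct perturbations that fix the toroidal current. The change in toroidal current flowing through the perturbed surface is computed using (5.3) by expressing (5.16) as a volume integral,
\begin{equation}
\delta I_T(\psi) = \int_{\partial S_T(\psi)} d\vartheta \, \sqrt{g} \, \boldsymbol{\xi} \cdot \nabla \psi \, \mathbf{J} \cdot \nabla \varphi + \int_{S_T(\psi)} d\psi \, d\vartheta \, \sqrt{g} \, \delta \mathbf{J}[\boldsymbol{\xi}, \delta\chi(\psi)] \cdot \nabla \varphi, \tag{5.24}
\end{equation}
where $S_T(\psi)$ is a surface at constant toroidal angle (Figure A.2) bounded by the $\psi$ surface and $\partial S_T(\psi)$ is the boundary of such surface, a closed poloidal loop.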
The perturbed current density is $\delta \mathbf{J}[\boldsymbol{\xi}, \delta\chi(\psi)] = \nabla \times \delta \mathbf{B}[\boldsymbol{\xi}, \delta\chi(\psi)]$. Here the first term accounts for the displacement of the flux surface and the second term accounts for the change in toroidal current density.

A linearized equilibrium state satisfies,
\begin{equation}
\mathbf{F}[\boldsymbol{\xi}, \delta\chi(\psi)] + \delta \mathbf{F} = 0, \tag{5.25}
\end{equation}
where $\delta \mathbf{F}$ is an additional perturbed force to be prescribed and $\mathbf{F}[\boldsymbol{\xi}, \delta\chi(\psi)]$ is the generalized force operator,
\begin{equation}
\mathbf{F}[\boldsymbol{\xi}, \delta\chi(\psi)] = \delta \mathbf{J}[\boldsymbol{\xi}, \delta\chi(\psi)] \times \mathbf{B} + \mathbf{J} \times \delta \mathbf{B}[\boldsymbol{\xi}, \delta\chi(\psi)] - \nabla \delta p[\boldsymbol{\xi}]. \tag{5.26}
\end{equation}
We now consider two distinct perturbations of the equilibrium of the type described by (5.19), (5.20) and (5.23) to (5.26), which we denote with subscripts 1 and 2. In general, variables with subscript 1 will be associated with the direct perturbation, and those with subscript 2 will be associated with the adjoint perturbation. We then form the quantity,
\begin{equation}
U_T = \int_{V_T} d^3x \, \left( \delta \mathbf{J}_1 \cdot \delta \mathbf{A}_2 - \delta \mathbf{J}_2 \cdot \delta \mathbf{A}_1 \right) = 0, \tag{5.27}
\end{equation}
where we use the notation $\delta \mathbf{J}_{1,2} = \delta \mathbf{J}[\boldsymbol{\xi}_{1,2}, \delta\chi_{1,2}(\psi)]$ and $\delta \mathbf{A}_{1,2} = \delta \mathbf{A}[\boldsymbol{\xi}_{1,2}, \delta\chi_{1,2}(\psi)]$ and the integral is, for the time being, over all space. The above is seen to vanish by expressing $\delta \mathbf{J}_{1,2}$ in terms of $\delta \mathbf{B}_{1,2}$ using Ampere's law (5.15) and applying the divergence theorem.

We now express the volume integral in (5.27) as the sum of three terms,
\begin{equation}
U_T = U_P + U_B + U_C = 0. \tag{5.28}
\end{equation}
Here $U_P$ is the contribution from the plasma volume, integrated just up to the plasma-vacuum boundary. For this term we represent the vector potentials using (5.19),
\begin{equation}
U_P = \int_{V_P} d^3x \, \left( \delta \mathbf{J}_1 \cdot \left( \boldsymbol{\xi}_2 \times \mathbf{B} - \delta\chi_2(\psi) \nabla \varphi \right) - \delta \mathbf{J}_2 \cdot \left( \boldsymbol{\xi}_1 \times \mathbf{B} - \delta\chi_1(\psi) \nabla \varphi \right) \right). \tag{5.29}
\end{equation}
To evaluate (5.29) we use the perturbed force balance relation (5.25).

The term $U_B$ comes from integrating over a thin layer at the plasma-vacuum boundary. At the boundary, the difference between the perturbed and unperturbed current density has the character of a current sheet due to the displacement of the outermost flux surface.
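This effective current sheet causes a jump in the tangential components of the perturbation to the magnetic fields at the surface. This jump implies that care must be taken in evaluating the perturbed magnetic fields at the surface as they have different values on either side of the plasma-vacuum surface. However, the vector potential is continuous at the plasma-vacuum boundary. Thus, we write,
\begin{equation}
U_B = \int_{S_P} \frac{d^2x}{|\nabla \psi|} \, \left( \boldsymbol{\xi}_1 \cdot \nabla \psi \, \mathbf{J} \cdot \delta \mathbf{A}_2 - \boldsymbol{\xi}_2 \cdot \nabla \psi \, \mathbf{J} \cdot \delta \mathbf{A}_1 \right), \tag{5.30}
\end{equation}
where the vector potentials are expressed as in (5.19). Using this expression for the vector potentials and expressing the surface integral as an integral over the toroidal and poloidal angles gives,
\begin{equation}
U_B = \int_{S_P} d\vartheta \, d\varphi \, \sqrt{g} \, \mathbf{J} \cdot \nabla \varphi \, \left( -\boldsymbol{\xi}_1 \cdot \nabla \psi \, \delta\chi_2(\psi) + \boldsymbol{\xi}_2 \cdot \nabla \psi \, \delta\chi_1(\psi) \right). \tag{5.31}
\end{equation}
Here we note the terms in the vector potential coming from the MHD displacement cancel.

Last, the quantity $U_C$ represents the contribution from the integral over the volume outside the plasma where only the coil currents need to be included,
\begin{equation}
U_C = \int_{V_V} d^3x \, \left( \delta \mathbf{J}_{C_1} \cdot \delta \mathbf{A}_{V_2} - \delta \mathbf{J}_{C_2} \cdot \delta \mathbf{A}_{V_1} \right), \tag{5.32}
\end{equation}
where $\delta \mathbf{A}_{V_{1,2}}$ is the change in the vacuum vector potential, and $\delta \mathbf{J}_{C_{1,2}}$ is the change in the coil current density.

Combining $U_P$, $U_B$, and $U_C$ gives the following relation appropriate to the free-boundary case, $U_T = U_P + U_B + U_C = 0$, or
\begin{equation}
\int_{V_P} d^3x \, \left( \boldsymbol{\xi}_1 \cdot \mathbf{F}_2 - \boldsymbol{\xi}_2 \cdot \mathbf{F}_1 \right) + 2\pi \int_{V_P} d\psi \, \left( \delta\chi_1(\psi) \, \delta I'_{T,2}(\psi) - \delta\chi_2(\psi) \, \delta I'_{T,1}(\psi) \right) + \int_{V_V} d^3x \, \left( \delta \mathbf{J}_{C_1} \cdot \delta \mathbf{A}_{V_2} - \delta \mathbf{J}_{C_2} \cdot \delta \mathbf{A}_{V_1} \right) = 0, \tag{5.33}
\end{equation}
where we use the notation $\mathbf{F}_{1,2} = \mathbf{F}[\boldsymbol{\xi}_{1,2}, \delta\chi_{1,2}(\psi)]$. This is the generalized free-boundary adjoint relation. The steps leading to (5.33) are outlined in Appendix I.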
When the coil currents are confined to filaments, the integral over the vacuum region can be expressed in terms of changes to the coil currents, fluxes through the coils, and integrals along the coils,
\begin{equation}
\int_{V_V} d^3x \, \delta \mathbf{J}_{C_{1,2}} \cdot \delta \mathbf{A}_{V_{2,1}} = \sum_k \left( \delta\Phi_{C_{2,1},k} \, \delta I_{C_{1,2},k} + I_{C_k} \oint_{C_k} dl \, \delta \mathbf{x}_{1,2,C_k}(\mathbf{x}) \cdot \hat{\mathbf{t}} \times \delta \mathbf{B}_{2,1} \right). \tag{5.34}
\end{equation}
Here $\delta\Phi_{C_k}$ and $\delta I_{C_k}$ are the change in magnetic flux through and change in current in coil $k$, respectively, and $I_{C_k}$ is the current through the unperturbed coil. The unit tangent vector along $C_k$ is $\hat{\mathbf{t}}$, and $\delta \mathbf{x}_{C_k}$ is a vector field of perturbations to the $k$th coil. The above expression is obtained upon application of Stokes' theorem and the expression for the perturbation of a line integral (2.14).

A similar relation can be obtained in the fixed-boundary case. Here the integral over the plasma volume (5.29) can be written as a surface integral by applying the divergence theorem,
\begin{equation}
U_P = \frac{1}{\mu_0} \int_{S_P} d^2x \, \hat{\mathbf{n}} \cdot \left( \delta \mathbf{B}_1 \times \delta \mathbf{A}_2 - \delta \mathbf{B}_2 \times \delta \mathbf{A}_1 \right). \tag{5.35}
\end{equation}
Again, following steps outlined in Appendix I, this may be rewritten in the following form,
\begin{equation}
\int_{V_P} d^3x \, \left( \boldsymbol{\xi}_1 \cdot \mathbf{F}_2 - \boldsymbol{\xi}_2 \cdot \mathbf{F}_1 \right) - 2\pi \int_{V_P} d\psi \, \left( \delta I_{T,2}(\psi) \, \delta\chi'_1(\psi) - \delta I_{T,1}(\psi) \, \delta\chi'_2(\psi) \right) - \frac{1}{\mu_0} \int_{S_P} d^2x \, \hat{\mathbf{n}} \cdot \left( \boldsymbol{\xi}_2 \, \delta \mathbf{B}_1 - \boldsymbol{\xi}_1 \, \delta \mathbf{B}_2 \right) \cdot \mathbf{B} = 0. \tag{5.36}
\end{equation}
The fixed-boundary adjoint relation can also be obtained by applying the self-adjointness (5.7) of the MHD force operator (Appendix J). If the second term in (5.36) is integrated by parts in $\psi$, we see that the fixed and free-boundary adjoint relations share the terms involving the products of displacements with bulk forces and perturbed fluxes with perturbed toroidal currents.
The integral over the vacuum region in (5.33) is replaced by an integral over the plasma boundary and a boundary term from the integration by parts in $\psi$ in (5.36).

We now have two integral relations between perturbations 1 and 2, (5.33) and (5.36). They have a common form in that they each are the sum of three integrals: the first involving forces and displacements, the second involving the toroidal current and poloidal flux profiles, and the third involving the manner in which the plasma boundary is prescribed. In (5.33), the free-boundary case, the changes in coil current densities are specified. In (5.36), the fixed-boundary case, the displacement of the outer flux surface is prescribed. Equations (5.33) and (5.36) can also be viewed as the difference in sums of generalized forces and responses. For example, in (5.33) we can consider the quantities $\delta \mathbf{F}$, $\delta\chi(\psi)$, $\delta \mathbf{J}_C$ as forces and $\boldsymbol{\xi}$, $\delta I'_T(\psi)$, $\delta \mathbf{A}_V$ as responses. The fact that the sum of the products of direct forces and adjoint responses less the products of adjoint forces and direct responses vanishes is similar to the relation between forces and fluxes related by Onsager symmetry [177, 178]. In the case of Onsager symmetry, this relation follows from the self-adjoint property of the collision operator. In this case, the symmetry follows from the generalized self-adjointness relation.

We now demonstrate how these relations (5.33) and (5.36) can be used to compute the shape gradient efficiently with a continuous adjoint method.
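The structure of these relations can be illustrated in finite dimensions. The sketch below is only an analogy and assumes nothing about the MHD operator itself: a hypothetical symmetric matrix plays the role of the linear map from responses to forces, and the quantity $\xi_1 \cdot F[\xi_2] - \xi_2 \cdot F[\xi_1]$ is evaluated for two arbitrary response vectors. It vanishes for the symmetric map and does not for a map with an added non-symmetric part.

```python
import math


def apply(matrix, vec):
    """Matrix-vector product, standing in for the linear map F[xi]."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reciprocity_residual(matrix, xi_1, xi_2):
    """Evaluate xi_1 . F[xi_2] - xi_2 . F[xi_1] for F[xi] = matrix xi."""
    return dot(xi_1, apply(matrix, xi_2)) - dot(xi_2, apply(matrix, xi_1))


n = 6
# Hypothetical symmetric operator (tridiagonal, entries 2 and -1).
symmetric = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
              for j in range(n)] for i in range(n)]
# The same operator with an extra upper-diagonal term, which breaks symmetry.
skewed = [[symmetric[i][j] + (0.5 if j == i + 1 else 0.0)
           for j in range(n)] for i in range(n)]

# Two arbitrary "direct" and "adjoint" response vectors.
xi_1 = [math.sin(0.7 * (i + 1)) for i in range(n)]
xi_2 = [math.cos(1.3 * (i + 1)) for i in range(n)]

residual_symmetric = reciprocity_residual(symmetric, xi_1, xi_2)
residual_skewed = reciprocity_residual(skewed, xi_1, xi_2)
print(residual_symmetric, residual_skewed)
```

In the continuous problem the boundary and profile terms in (5.33) and (5.36) play the role of the extra contributions that must be included for the analogous residual to vanish.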

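Consider a general figure of merit which involves a volume integral over the plasma domain,
\begin{equation}
f(S_P, \mathbf{B}) = \int_{V_P} d^3x \, F(\mathbf{B}), \tag{5.37}
\end{equation}
where $F(\mathbf{B})$ depends on the plasma surface through the fixed-boundary MHD equilibrium equations (Table 1.1). We are interested in computing perturbations of $f$ such that (5.10) is satisfied. This constraint is enforced using the following Lagrangian functional,
\begin{equation}
\mathcal{L}(S_P, \mathbf{B}, \boldsymbol{\xi}_2) = f(S_P, \mathbf{B}) + \int_{V_P} d^3x \, \boldsymbol{\xi}_2 \cdot \left( \frac{(\nabla \times \mathbf{B}) \times \mathbf{B}}{\mu_0} - \nabla p \right), \tag{5.38}
\end{equation}
where $\boldsymbol{\xi}_2$ is a Lagrange multiplier and we have defined our inner product to be a volume integral over the domain. To obtain the adjoint equation that $\boldsymbol{\xi}_2$ must satisfy, we compute the functional derivative of (5.38) with respect to $\mathbf{B}$, where we note that perturbations to the magnetic field satisfy (5.20). As $\delta f\left(S_P, \mathbf{B}; \delta \mathbf{B}[\boldsymbol{\xi}_1, \delta\chi_1(\psi)]\right)$ is a linear functional of $\boldsymbol{\xi}_1 \in V_P$, $\delta\chi'_1(\psi)$, and $\boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}}|_{S_P}$, from the Riesz representation theorem, the functional derivative of $f$ with respect to $\mathbf{B}$ is expressed as,
\begin{equation}
\delta f(S_P, \mathbf{B}; \delta \mathbf{B}_1) = \int_{V_P} d^3x \, \boldsymbol{\xi}_1 \cdot \mathbf{L}_1 + \int_{V_P} d\psi \, \delta\chi'_1(\psi) L_2(\psi) + \int_{S_P} d^2x \, \boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}} \, L_3, \tag{5.39}
\end{equation}
for some quantities $\mathbf{L}_1$, $L_2$, and $L_3$. The functional derivative of $\mathcal{L}$ is now,
\begin{equation}
\delta \mathcal{L}(S_P, \mathbf{B}, \boldsymbol{\xi}_2; \delta \mathbf{B}_1) = \int_{V_P} d^3x \, \left( \boldsymbol{\xi}_1 \cdot \mathbf{L}_1 + \boldsymbol{\xi}_2 \cdot \mathbf{F}_1 \right) + \int_{V_P} d\psi \, \delta\chi'_1(\psi) L_2(\psi) + \int_{S_P} d^2x \, \boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}} \, L_3, \tag{5.40}
\end{equation}
where $\mathbf{F}_1 = \mathbf{F}[\boldsymbol{\xi}_1, \delta\chi_1(\psi)]$ is the generalized force operator associated with the direct perturbation (5.26).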
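We apply the fixed-boundary self-adjointness relation (5.36) to obtain,

$$ \delta\mathcal{L}(S_P, \mathbf{B}, \boldsymbol{\xi}_2; \delta\mathbf{B}_1) = \int_{V_P} d^3x \, \boldsymbol{\xi}_1 \cdot (\mathbf{L}_1 + \mathbf{F}_2) + \int_{V_P} d\psi \left( \delta\chi_1'(\psi) L_2(\psi) - 2\pi \delta I_{T,2} \, \delta\chi_1'(\psi) + 2\pi \delta I_{T,1}(\psi) \, \delta\chi_2'(\psi) \right) + \int_{S_P} d^2x \left[ \boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}} \left( L_3 + \frac{\mathbf{B} \cdot \delta\mathbf{B}_2}{\mu_0} \right) - \boldsymbol{\xi}_2 \cdot \hat{\mathbf{n}} \, \frac{\mathbf{B} \cdot \delta\mathbf{B}_1}{\mu_0} \right], \tag{5.41} $$

where $\mathbf{F}_2 = \mathbf{F}[\boldsymbol{\xi}_2, \delta\chi_2(\psi)]$ is the generalized bulk force associated with the adjoint perturbation (5.26), $\delta I_{T,2}(\psi)$ is the adjoint toroidal current perturbation, and $\delta\chi_2(\psi)$ is the adjoint poloidal flux perturbation.

If the direct problem is computed with fixed rotational transform, then $\delta\chi_1(\psi) = 0$, and the adjoint variable (Lagrange multiplier) is chosen to satisfy the linearized equilibrium problem,

$$ \mathbf{F}[\boldsymbol{\xi}_2, \delta\chi_2(\psi)] + \mathbf{L}_1 = 0 \tag{5.42a} $$
$$ \hat{\mathbf{n}} \cdot \boldsymbol{\xi}_2 |_{S_P} = 0 \tag{5.42b} $$
$$ \delta\chi_2'(\psi) = 0, \tag{5.42c} $$

such that the above functional derivative (5.41) vanishes, except for the final term that is already in the desired Hadamard form (5.4). If instead the direct problem is computed with fixed toroidal current, then $\delta I_{T,1}(\psi) = 0$ and the adjoint variable is chosen to satisfy,

$$ \mathbf{F}[\boldsymbol{\xi}_2, \delta\chi_2(\psi)] + \mathbf{L}_1 = 0 \tag{5.43a} $$
$$ \hat{\mathbf{n}} \cdot \boldsymbol{\xi}_2 |_{S_P} = 0 \tag{5.43b} $$
$$ \delta I_{T,2}(\psi) = \frac{L_2}{2\pi}. \tag{5.43c} $$

The shape derivative of $\mathcal{L}$ with respect to boundary perturbation $\boldsymbol{\xi}_1$ is now computed to be,

$$ \delta\mathcal{L}(S_P, \mathbf{B}, \boldsymbol{\xi}_2; \boldsymbol{\xi}_1) = \int_{S_P} d^2x \, \boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}} \, (F + L_3) + \int_{V_P} d^3x \, \boldsymbol{\xi}_1 \cdot \mathbf{L}_1 + \int_{V_P} d\psi \, \delta\chi_1'(\psi) L_2(\psi) + \delta \left( \int_{V_P} d^3x \, \boldsymbol{\xi}_2 \cdot \left( \frac{(\nabla \times \mathbf{B}) \times \mathbf{B}}{\mu_0} - \nabla p \right) \right), \tag{5.44} $$

where the first term is evaluated using the transport theorem (5.3). The notation $\delta(\ldots)$ in the final term indicates a shape derivative with respect to boundary perturbation $\boldsymbol{\xi}_1$. The above expression can be evaluated more easily by using the generalized adjoint relation (5.36) and applying the conditions placed on the adjoint state, (5.42) or (5.43),

$$ \delta\mathcal{L}(S_P, \mathbf{B}, \boldsymbol{\xi}_2; \boldsymbol{\xi}_1) = \int_{S_P} d^2x \, \hat{\mathbf{n}} \cdot \boldsymbol{\xi}_1 \left( F + L_3 + \frac{\mathbf{B} \cdot \delta\mathbf{B}_2}{\mu_0} \right). $$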
(5.45)So we identify the shape gradient to be, G = (cid:18) F + L + B · δ B µ (cid:19) S P . (5.46)Thus by solving a linearized equilibrium problem corresponding to the addition of a bulkforce for δ B [ ξ , δχ ( ψ )], we can compute the shape derivative with respect to any boundaryperturbation using the above shape gradient. We now consider free-boundary perturbations. Consider a general ﬁgure of merit whichinvolves a volume integral over the plasma domain, f ( C, B ) = (cid:90) V P d x F ( B ) , (5.47)where F ( B ) depends on the coil shapes C = { C k } through the free-boundary MHD equi-librium equations (Table 1.2). We are interested in computing perturbations of f such that(5.10) is satisﬁed, which we enforce with the Lagrangian functional, L ( C, B , ξ ) = f ( C, B ) + (cid:90) V P d x ξ · (cid:18) ( ∇ × B ) × B µ − ∇ p (cid:19) . (5.48)In this case, δf ( C, B ; δ B [ ξ , δχ ( ψ )]) is a linear functional of ξ ∈ V P , δχ ( ψ ), and theboundary perturbation ξ · ˆ n | S P resulting from a coil perturbation δ x ,C k × ˆ t . (While inthe ﬁxed-boundary case, we considered δf to be a linear functional of δχ (cid:48) ( ψ ), for the free-98oundary case it is more convenient to consider it to be a linear functional of δχ ( ψ ).) Bythe Riesz representation theorem, δf (cid:0) C, B ; δ B [ ξ , δχ ( ψ )] (cid:1) = (cid:90) V P d x ξ · L + (cid:90) V P dψ χ ( ψ ) L ( ψ ) + (cid:90) S P d x ξ · ˆ n L , (5.49)for some quantities L , L ( ψ ), and L . The functional derivative of L is now, δ L (cid:0) C, B , ξ ; δ B [ ξ , δχ ( ψ )] (cid:1) = (cid:90) V P d x ( ξ · L + ξ · F )+ (cid:90) V P dψ δχ ( ψ ) L ( ψ ) + (cid:90) S P d x ξ · ˆ n L . 
(5.50)We apply the free-boundary relation (5.33) to obtain, δ L (cid:0) C, B , ξ ; δ B [ ξ , δχ ( ψ )] (cid:1) = (cid:90) V P d x ξ · ( L + F )+ (cid:90) V P dψ (cid:16) δχ ( ψ ) L ( ψ ) − πδI (cid:48) T, ( ψ ) δχ ( ψ ) + 2 πδI (cid:48) T, ( ψ ) δχ ( ψ ) (cid:17) + (cid:88) k I C k (cid:73) C k dl (cid:0) δ x ,C k ( x ) × δ B − δ x ,C k ( x ) × δ B (cid:1) · ˆ t + (cid:90) S P d x ξ · ˆ n L , (5.51)where we have considered perturbations to currents in the vacuum region corresponding todisplacements of the ﬁlamentary coils without change to their currents. If the direct problemis computed with ﬁxed rotational transform, then δχ ( ψ ) = 0, and the adjoint variable ischosen to satisfy, F [ ξ , δχ ( ψ )] + L = 0 (5.52a) δχ ( ψ ) = 0 (5.52b) δ x ,C k × ˆ t = 0 , (5.52c)such that the above functional derivative vanishes, except for the terms involving integralsover S P or the ﬁlamentary coils. If instead the direct problem is computed with ﬁxed toroidalcurrent, then δI T, ( ψ ) = 0 and the adjoint variable is chosen to satisfy, F [ ξ , δχ ( ψ )] + L = 0 (5.53a) δI T, ( ψ ) = L π (5.53b) δ x ,C k × ˆ t = 0 . (5.53c)The shape derivative of L is now computed to be, δ L (cid:0) C, B , ξ ; δ x ,C k (cid:1) = (cid:90) V P d x ( ξ · L ) + δ (cid:32)(cid:90) V P d x ξ · (cid:18) ( ∇ × B ) × B µ − ∇ p (cid:19)(cid:33) + (cid:90) V P dψ δχ ( ψ ) L ( ψ ) + (cid:90) S P d x ξ · ˆ n ( L + F ) , (5.54)99here the notation δ ( . . . ) indicates a shape derivative with respect to coil displacement δ x ,C k . We can now simplify the above expression using the free-boundary relation (5.33)and the conditions placed on the adjoint variable, (5.52) or (5.53). We now obtain, δ L ( C, B , ξ ; δ x ,C k ) = (cid:90) S P d x ξ · ˆ n ( L + F ) + (cid:88) k I C k (cid:73) C k dl δ x ,C k × δ B · ˆ t , (5.55)where it is understood that ξ is the perturbation to the boundary arising from the coilperturbation δ x ,C k . 
The first term can equivalently be expressed in terms of displacements of the coil shapes using the virtual casing principle [143], though in this Chapter for simplicity we will consider figures of merit such that $(L_3 + F)_{S_P}$ vanishes.

Some examples of these continuous adjoint methods are discussed in the following Sections.

In this Section we will consider figures of merit which depend on the shape of the outer boundary of the plasma (Sections 5.5.1, 5.5.2, 5.5.3, and 5.5.4) and on the shape of the electromagnetic coils (Sections 5.5.2 and 5.5.3). The shape gradients of these figures of merit will be computed using both a direct method and an adjoint method, to demonstrate that the adjoint method reproduces the results of the direct method at much lower computational expense. For other figures of merit (Sections 5.5.5-5.5.7) the calculation is not possible with existing codes, but a discussion of the adjoint linearized equilibrium equations is presented.

Volume-averaged $\beta$

Consider a figure of merit, the volume-averaged $\beta$,

$$ f_\beta = \frac{f_P}{f_B}, \tag{5.56} $$

where,

$$ f_P = \int_{V_P} d^3x \, p(\psi), \tag{5.57} $$

and,

$$ f_B = \int_{V_P} d^3x \, \frac{B^2}{2\mu_0}. \tag{5.58} $$

(This definition of volume-averaged $\beta$ is the one employed in the VMEC code [111].) While $f_\beta$ is a figure of merit not often considered in stellarator shape optimization, we include this calculation to demonstrate the adjoint approach, as its shape gradient can be computed without modifications to an equilibrium code.

Surface shape gradient

We consider direct perturbations about an equilibrium with fixed rotational transform,

$$ \mathbf{F}[\boldsymbol{\xi}_1, \delta\chi_1(\psi)] = 0 \tag{5.59a} $$
$$ \boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}} |_{S_P} = \delta\mathbf{x} \cdot \hat{\mathbf{n}} |_{S_P} \tag{5.59b} $$
$$ \delta\chi_1'(\psi) = 0. \tag{5.59c} $$

The differential change in $f_P$ associated with displacement $\boldsymbol{\xi}_1$ is,

$$ \delta f_P(S_P; \boldsymbol{\xi}_1) = -\int_{V_P} d^3x \, \boldsymbol{\xi}_1 \cdot \nabla p + \int_{S_P} d^2x \, \boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}} \, p(\psi), \tag{5.60} $$

which follows from the transport theorem (5.3). The first term accounts for the change in $p$ at fixed position due to the motion of the flux surfaces, and the second term accounts for the motion of the boundary. The differential change in $f_B$ associated with $\boldsymbol{\xi}_1$ is,

$$ \delta f_B(S_P; \boldsymbol{\xi}_1) = -\frac{1}{\mu_0} \int_{V_P} d^3x \left( B^2 \nabla \cdot \boldsymbol{\xi}_1 + \boldsymbol{\xi}_1 \cdot \nabla \left( B^2 + \mu_0 p \right) \right) + \frac{1}{2\mu_0} \int_{S_P} d^2x \, \boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}} \, B^2, \tag{5.61} $$

where we have noted that the perturbation to the magnetic field strength at fixed position is given by,

$$ \delta B = -\frac{1}{B} \left( B^2 \nabla \cdot \boldsymbol{\xi}_1 + \boldsymbol{\xi}_1 \cdot \nabla \left( B^2 + \mu_0 p \right) + \delta\chi_1'(\psi) \, \mathbf{B} \cdot (\nabla\psi \times \nabla\varphi) \right). \tag{5.62} $$

The first term in (5.61) corresponds to the change in $f_B$ due to the perturbation to the field strength, while the second term accounts for the motion of the boundary. Applying the divergence theorem we obtain,

$$ \delta f_B(S_P; \boldsymbol{\xi}_1) = -\int_{V_P} d^3x \, \boldsymbol{\xi}_1 \cdot \nabla p - \frac{1}{2\mu_0} \int_{S_P} d^2x \, \boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}} \, B^2. \tag{5.63} $$

The differential change in $f_\beta$ associated with displacement $\boldsymbol{\xi}_1$ satisfies,

$$ \frac{\delta f_\beta(S_P; \boldsymbol{\xi}_1)}{f_\beta} = \int_{S_P} d^2x \, \boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}} \left( \frac{p(\psi)}{f_P} + \frac{B^2}{2\mu_0 f_B} \right) - \left( \frac{1}{f_P} - \frac{1}{f_B} \right) \int_{V_P} d^3x \, \boldsymbol{\xi}_1 \cdot \nabla p. \tag{5.64} $$

The first term on the right of (5.64) is already in the form of a shape gradient. To evaluate the second term, we turn to the adjoint problem, choosing,

$$ \mathbf{F}[\boldsymbol{\xi}_2, \delta\chi_2(\psi)] - \nabla p = 0 \tag{5.65a} $$
$$ \boldsymbol{\xi}_2 \cdot \hat{\mathbf{n}} |_{S_P} = 0 \tag{5.65b} $$
$$ \delta\chi_2'(\psi) = 0. \tag{5.65c} $$

That is, we add a bulk force corresponding to the equilibrium pressure gradient. This additional force produces a proportional change in magnetic field at the boundary and thus from (5.36), we find,

$$ \frac{\delta f_\beta(S_P; \boldsymbol{\xi}_1)}{f_\beta} = \int_{S_P} d^2x \, \boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}} \left( \frac{p(\psi)}{f_P} + \frac{B^2}{2\mu_0 f_B} + \left( \frac{1}{f_P} - \frac{1}{f_B} \right) \frac{\delta\mathbf{B}_2 \cdot \mathbf{B}}{\mu_0} \right). \tag{5.66} $$

Thus, we can obtain the shape gradient without perturbing the shape of the surface,

$$ \mathcal{G} = f_\beta \left( \frac{p(\psi)}{f_P} + \frac{B^2}{2\mu_0 f_B} + \left( \frac{1}{f_P} - \frac{1}{f_B} \right) \frac{\delta\mathbf{B}_2 \cdot \mathbf{B}}{\mu_0} \right)_{S_P}. $$
(5.67)

In practice, the adjoint magnetic field is approximated from a nonlinear equilibrium solution by adding a small perturbation to the pressure of magnitude $\Delta_P$, $p' = (1 + \Delta_P) p$. A forward-difference approximation is used to obtain,

$$ \delta\mathbf{B}_2 \approx \frac{\mathbf{B}(p + \Delta_P p) - \mathbf{B}(p)}{\Delta_P}, \tag{5.68} $$

where $\mathbf{B}(p)$ is the magnetic field evaluated with pressure $p(\psi)$.

A similar expression can be obtained for equilibria for which the rotational transform is allowed to vary, but the toroidal current is held fixed ($\delta I_{T,1} = 0$). In this case,

$$ \mathbf{F}[\boldsymbol{\xi}_2, \delta\chi_2(\psi)] - \nabla p = 0 \tag{5.69a} $$
$$ \boldsymbol{\xi}_2 \cdot \hat{\mathbf{n}} |_{S_P} = 0 \tag{5.69b} $$
$$ \delta I_{T,2}(\psi) = -I_T(\psi) \left( 1/f_P - 1/f_B \right)^{-1} \left( 1/f_B \right). \tag{5.69c} $$

The shape gradient can then be obtained from (5.67).

To demonstrate, we use the NCSX LI383 equilibrium [242]. The pressure profile was perturbed with $\Delta_P = 0.01$ to compute the adjoint field. The unperturbed and adjoint equilibria are computed with the VMEC code [111]. The shape gradient obtained with the adjoint solution, $\mathcal{G}_{\text{adjoint}}$, and that obtained with the direct approach, $\mathcal{G}_{\text{direct}}$, are shown in Figure 5.1a. Positive values of the shape gradient indicate that $f_\beta$ increases if a normal perturbation is applied at a given location, as indicated by (5.4). For the direct approach, parameter derivatives with respect to the Fourier harmonics describing the plasma boundary ($\partial f_\beta / \partial R^c_{m,n}$, $\partial f_\beta / \partial Z^s_{m,n}$), where $R^c_{m,n}$ and $Z^s_{m,n}$ are defined through,

$$ R = \sum_{m,n} R^c_{m,n} \cos(m\theta - n N_P \phi) \tag{5.70a} $$
$$ Z = \sum_{m,n} Z^s_{m,n} \sin(m\theta - n N_P \phi), \tag{5.70b} $$

are computed with a centered 4-point stencil for $m \le$

15 and | n | ≤ G residual = |G adjoint − G direct | (cid:113)(cid:82) S P d x G / (cid:82) S P d x , (5.71)is shown in Figure 5.1c, where the surface-averaged value of G residual is 1 . × − . We notethat the number of required equilibrium calculations for the direct shape gradient calculationdepends on the Fourier resolution and ﬁnite-diﬀerence stencil chosen. In this Chapter wepresent the number of function evaluations required in order for the adjoint and direct shape102radient calculations to agree within a few percent. As the Fourier resolution is increased,the results of the adjoint and direct methods converge to each other.The parameter ∆ P must be chosen carefully, as the perturbation must be large enoughthat the result is not dominated by round-oﬀ error, but small enough that nonlinear eﬀectsdo not become important. The relationship between G residual and ∆ P is shown in Figure5.1d. Here G direct is computed using the parameters reported above such that convergenceis obtained. We ﬁnd that G residual decreases as (∆ P ) until ∆ P ≈ .

5, at which point round-off error begins to dominate. This scaling is to be expected, as $\delta\mathbf{B}_2$ is computed with a forward-difference derivative with step size $\Delta_P$.

For this and the following examples, the computational cost of transforming the parameter derivatives to the shape gradient was negligible compared to the cost of computing the parameter derivatives. The direct approach used 2357 calls to VMEC while the adjoint approach only required two. It is clear that the adjoint method yields nearly identical derivative information to the direct method but at a substantially reduced computational cost.

The residual difference is nonzero due to several sources of error, including discretization error in VMEC. As a result of the assumption of nested magnetic surfaces, MHD force balance (5.10) is not satisfied exactly, but a finite force residual is introduced. Error is also introduced by computing $\delta\mathbf{B}_2$ with the addition of a small perturbation to a nonlinear equilibrium calculation rather than from a linearized MHD solution.

In Figure 5.1 we find that the shape gradient of $f_\beta$ is everywhere positive. This reflects the fact that the toroidal flux enclosed by $S_P$ is fixed. As perturbations which displace the plasma surface outward increase the surface area of a toroidal cross-section, the toroidal field must correspondingly decrease, thus increasing $f_\beta$. We find that the shape gradient is increased in regions of large field strength, as indicated by the second term in (5.67).

Consider a figure of merit, the average rotational transform in a radially localized region,

$$ f_\iota = \int_{V_P} d\psi \, \iota(\psi) w(\psi). \tag{5.72} $$

Here $w(\psi)$ is a normalized weighting function,

$$ w(\psi) = \frac{e^{-(\psi - \psi_m)^2 / \psi_w^2}}{\int_{V_P} d\psi \, e^{-(\psi - \psi_m)^2 / \psi_w^2}}, \tag{5.73} $$

and $\psi_m$ and $\psi_w$ are parameters defining the center and width of the Gaussian weighting, respectively.

Figure 5.1: (a) The shape gradient for $f_\beta$ (5.56) computed using the adjoint solution (5.67) (left) and using parameter derivatives (right).
(b) The shape gradient computed with the adjoint solution in the $\phi$-$\theta$ plane, the VMEC [111] poloidal and toroidal angles (not magnetic coordinates). (c) The fractional difference (5.71) between the shape gradient obtained with the adjoint solution and with parameter derivatives. (d) The fractional difference (5.71) depends on the scale of the perturbation added to the adjoint force balance equation, $\Delta_P$. Figure adapted from [10] with permission.

Surface shape gradient

We consider direct perturbations about an equilibrium such that the toroidal current is fixed and the rotational transform is allowed to vary,

$$ \mathbf{F}[\boldsymbol{\xi}_1, \delta\chi_1(\psi)] = 0 \tag{5.74a} $$
$$ \boldsymbol{\xi}_1 \cdot \hat{\mathbf{n}} |_{S_P} = \delta\mathbf{x} \cdot \hat{\mathbf{n}} |_{S_P} \tag{5.74b} $$
$$ \delta I_{T,1}(\psi) = 0. \tag{5.74c} $$

The differential change of $f_\iota$ associated with perturbation $\boldsymbol{\xi}_1$ is,

$$ \delta f_\iota(S_P; \boldsymbol{\xi}_1) = \int_{V_P} d\psi \, \delta\chi_1'(\psi) w(\psi). \tag{5.75} $$

For the adjoint problem, we prescribe,

$$ \mathbf{F}[\boldsymbol{\xi}_2, \delta\chi_2(\psi)] = 0 \tag{5.76a} $$
$$ \boldsymbol{\xi}_2 \cdot \hat{\mathbf{n}} |_{S_P} = 0 \tag{5.76b} $$
$$ \delta I_{T,2} = w(\psi). \tag{5.76c} $$

This additional current produces a proportional change in the magnetic field at the boundary; thus using (5.36), we obtain the following,

$$ \delta f_\iota(S_P; \boldsymbol{\xi}_1) = \frac{1}{2\pi\mu_0} \int_{S_P} d^2x \, \hat{\mathbf{n}} \cdot \boldsymbol{\xi}_1 \, \delta\mathbf{B}_2 \cdot \mathbf{B}. \tag{5.77} $$

So, we can obtain the shape gradient from the adjoint solution,

$$ \mathcal{G} = \left( \frac{\delta\mathbf{B}_2 \cdot \mathbf{B}}{2\pi\mu_0} \right)_{S_P}. \tag{5.78} $$

Note that the computation of the shape derivative of the rotational transform on a single surface, $\psi_m$, with the adjoint approach would require a delta-function current perturbation, $\delta I_{T,2} = \delta(\psi - \psi_m)$. As this type of perturbation is difficult to resolve in a numerical computation, the use of the Gaussian envelope allows the shape derivative of the rotational transform in a localized region around $\psi_m$ to be computed.

To demonstrate, we use the NCSX LI383 equilibrium. We again apply a forward-difference approximation (5.68) of the adjoint solution, characterized by amplitude $\Delta_I = 715$ A. The parameters of the weight function are taken to be $\psi_m = 0.\ldots\,\psi_0$ and $\psi_w = 0.\ldots\,\psi_0$.
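The trade-off behind the Gaussian envelope can be illustrated numerically. The following minimal sketch assumes a weight of the form $\exp(-(\psi - \psi_m)^2/\psi_w^2)$, normalized on $\psi \in [0, 1]$, and a made-up rotational transform profile; neither the profile nor the widths are taken from the NCSX calculation. As $\psi_w$ shrinks, the weighted average approaches $\iota(\psi_m)$, while the peak of the weight, and hence the sharpness of the current perturbation that must be resolved, grows.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule on a (possibly non-uniform) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def gaussian_weight(psi, psi_m, psi_w):
    """Gaussian weight normalized so that its integral over psi is one."""
    g = np.exp(-((psi - psi_m) ** 2) / psi_w**2)
    return g / trapezoid(g, psi)

def iota_profile(psi):
    """Hypothetical rotational transform profile (illustration only)."""
    return 0.4 + 0.25 * psi**2

psi = np.linspace(0.0, 1.0, 4001)  # normalized flux label
psi_m = 0.5                        # center of the weight (assumed value)

widths = [0.2, 0.1, 0.05]
errors = []
peaks = []
for psi_w in widths:
    w = gaussian_weight(psi, psi_m, psi_w)
    f_iota = trapezoid(iota_profile(psi) * w, psi)
    errors.append(abs(f_iota - iota_profile(psi_m)))
    peaks.append(float(w.max()))
```

For this profile the entries of `errors` decrease monotonically with `psi_w`, while the entries of `peaks` increase, which is the resolution cost of approaching the single-surface limit.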
The shape gradients obtained with the adjoint solution and with the direct approach are shown in Figure 5.2a. For the direct approach, the shape gradient is computed from parameter derivatives with respect to the Fourier harmonics of the boundary (2.16) using an 8-point stencil with $m \le 18$ and $|n| \le 12$. The fractional difference, $\mathcal{G}_{\text{residual}}$, between the two approaches is shown in Figure 5.2c, with a surface-averaged value of $2.\ldots \times 10^{-\ldots}$. The direct approach used 7401 calls to VMEC, while the adjoint only required two. Again, it is apparent that the adjoint method allows the same derivative information to be computed at a much lower computational cost.

We find that over much of the surface, the shape gradient is close to zero. A region of large negative shape gradient occurs in the concave region of the plasma surface with adjacent regions of large positive shape gradient. This indicates that "pinching" the surface in this region, making it more concave, would increase $\iota$ near the axis.

Figure 5.2: (a) The shape gradient for $f_\iota$ (5.72) computed using the adjoint solution (5.78) (left) and using parameter derivatives (right). (b) The shape gradient computed with the adjoint solution in the $\phi$-$\theta$ plane, the VMEC [111] poloidal and toroidal angles (not magnetic coordinates). (c) The fractional difference (5.71) between the shape gradient obtained with the adjoint solution and with parameter derivatives. Again, the results are essentially indistinguishable, as expected. Figure adapted from [10] with permission.

Coil shape gradient

The shape gradient of $f_\iota$ can also be computed with a free-boundary approach. We consider perturbations about an equilibrium with fixed toroidal current,

$$ \mathbf{F}[\boldsymbol{\xi}_1, \delta\chi_1(\psi)] = 0 \tag{5.79a} $$
$$ \delta I_{T,1}(\psi) = 0, \tag{5.79b} $$

with specified perturbation to the coil shapes, $\delta\mathbf{x}_C \times \hat{\mathbf{t}}$. We prescribe the adjoint problem,

$$ \mathbf{F}[\boldsymbol{\xi}_2, \delta\chi_2(\psi)] = 0 \tag{5.80a} $$
$$ \delta\mathbf{x}_C \times \hat{\mathbf{t}} = 0 \tag{5.80b} $$
$$ \delta I_{T,2}(\psi) = w(\psi), \tag{5.80c} $$

where $w(\psi)$ is given by (5.73). Using (5.75) and (5.33) and noting that $\delta I_{T,2}(\psi)$ vanishes at the plasma boundary and on the axis, we find,

$$ \delta f_\iota(C; \delta\mathbf{x}_C) = \frac{1}{2\pi} \int_{V_V} d^3x \, \delta\mathbf{J}_{C_1} \cdot \delta\mathbf{A}_{V_2}. \tag{5.81} $$

Using (5.34), this can be written in terms of changes in the positions of coils in the vacuum region,

$$ \delta f_\iota(C; \delta\mathbf{x}_C) = \frac{1}{2\pi} \sum_k \left( I_{C_k} \oint_{C_k} dl \, \delta\mathbf{x}_{C_k}(\mathbf{x}) \cdot \hat{\mathbf{t}} \times \delta\mathbf{B}_2 \right). \tag{5.82} $$

When computing the coil shape gradient, the current in each coil is fixed. In arriving at (5.82), we assume that $\delta I_{C,k} = 0$. The coil shape gradient is thus

$$ \widetilde{\mathcal{G}}_k = \left. \frac{I_{C_k} \hat{\mathbf{t}} \times \delta\mathbf{B}_2}{2\pi} \right|_{C_k}. \tag{5.83} $$

As anticipated, $\widetilde{\mathcal{G}}_k$ has no component in the direction tangent to the coil. The adjoint magnetic field is computed with a forward-difference approximation (5.68) with step size $\Delta_I = 5.\ldots \times 10^{\ldots}$ A. Evaluating the shape gradient requires computing the adjoint magnetic field at the unperturbed coil locations in the vacuum region. This can be performed with the DIAGNO code [71, 143], which employs the virtual casing principle.

To demonstrate, we use the NCSX stellarator LI383 equilibrium. The toroidal current profile was perturbed with $\psi_m = 0.\ldots\,\psi_0$ and $\psi_w = 0.\ldots\,\psi_0$. The shape gradient is computed for each of the three unique modular coils per half period of the C09R00 coil set [236], keeping the planar coils fixed. The result obtained with the adjoint solution, $\widetilde{\mathcal{G}}_{\text{adjoint},k}$, is shown in Figure 5.3. The shape gradient is also computed with the direct approach, $\widetilde{\mathcal{G}}_{\text{direct},k}$.
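The cost of a direct approach of this kind is set by the number of shape parameters. As a rough sketch, the functions below count the equilibrium solves needed for a finite-difference gradient with respect to the boundary harmonics $R^c_{m,n}$ and $Z^s_{m,n}$ used in the surface calculations above. The counting conventions are assumptions rather than statements from the text: a mode set with $n \ge 0$ for $m = 0$, no $Z^s_{0,0}$ harmonic, one solve per stencil point per parameter, and one solve for the unperturbed equilibrium. They are chosen because they reproduce the 7401 VMEC calls quoted above for the 8-point stencil with $m \le 18$ and $|n| \le 12$.

```python
def n_boundary_modes(m_max, n_max):
    """Count boundary harmonics R^c_{m,n} and Z^s_{m,n} (assumed convention).

    Mode pairs: n = 0..n_max for m = 0, and n = -n_max..n_max for m >= 1.
    The Z^s_{0,0} harmonic is excluded because sin(0) vanishes.
    """
    n_pairs = (n_max + 1) + m_max * (2 * n_max + 1)
    return 2 * n_pairs - 1

def direct_solves(n_params, stencil_points):
    """Equilibrium solves for finite differences plus one unperturbed solve."""
    return stencil_points * n_params + 1

ADJOINT_SOLVES = 2  # unperturbed equilibrium plus one perturbed (adjoint) solve

n_modes = n_boundary_modes(m_max=18, n_max=12)
calls = direct_solves(n_modes, stencil_points=8)
speedup = calls / ADJOINT_SOLVES
```

Under the same assumptions, the 2357 calls quoted for $f_\beta$ with a 4-point stencil and $m \le 15$ would correspond to $|n| \le 9$, although that bound is an inference from the count rather than a value read from the text.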
https://princetonuniversity.github.io/STELLOPT/VMEC%20Free%20Boundary%20Run x k = (cid:88) m X kcm cos( mθ ) + X ksm sin( mθ ) (5.84a) y k = (cid:88) m Y kcm cos( mθ ) + Y ksm sin( mθ ) (5.84b) z k = (cid:88) m Z kcm cos( mθ ) + Z ksm sin( mθ ) , (5.84c)where θ ∈ [0 , π ] parameterizes each ﬁlament and k denotes each coil shape. The numericalderivative with respect to these parameters are computed for m ≤

45 using an 8-point stencil.In Figure 5.4a the Cartesian components of the shape gradient computed with the adjointapproach, (cid:101) G l adjoint ,k , and with the direct approach, (cid:101) G l direct ,k , are shown for each coil, where l ∈ { x, y, z } . The arrows indicate the direction and magnitude of (cid:101) G k such that if a coil weredeformed in the direction of (cid:101) G k , f ι would increase according to (5.6). The direct approachused 6553 calls to VMEC, while the adjoint only required two. In Figure 5.4b the fractionaldiﬀerence between the results obtained with the two methods, (cid:101) G l residual ,k = | (cid:101) G l adjoint ,k − (cid:101) G l direct ,k | (cid:114)(cid:72) C k dl (cid:16) (cid:101) G l adjoint ,k (cid:17) / (cid:72) C k dl , (5.85)is plotted. The line-averaged values of (cid:101) G l residual are 6 . × − for coil 1, 3 . × − for coil 2,and 4 . × − for coil 3.From Figure 5.3, we see that the sensitivity of f ι to coil displacements is much higher inregions where the coils are close to the plasma surface. The shape gradient points towardthe plasma surface in the concave region of the plasma surface, while on the outboard sidethe sensitivity is signiﬁcantly lower, again indicating the “pinching” eﬀect seen in Figure 5.2. The averaged radial (normal to a ﬂux surface) curvature is an important metric for MHDstability [64], κ ψ ≡ (cid:42) κ · (cid:18) ∂ x ∂ψ (cid:19) α,l (cid:43) ψ = (cid:42) B (cid:18) ∂∂ψ (cid:0) µ p + B (cid:1)(cid:19) α,l (cid:43) ψ , (5.86)where the curvature is κ = ˆ b · ∇ ˆ b , ˆ b = B /B is a unit vector in the direction of the magneticﬁeld and l measures length along a ﬁeld line. Subscripts in the above expression ( α, l )indicate quantities held ﬁxed while computing the derivative. The ﬂux surface average of aquantity A is, (cid:104) A (cid:105) ψ = (cid:82) ∞−∞ dlB A (cid:82) ∞−∞ dlB = (cid:82) π dϑ (cid:82) π dϕ √ gAV (cid:48) ( ψ ) . 
(5.87)108igure 5.3: The coil shape gradient for f ι (5.72) computed using the adjoint solution (5.83)for each of the 3 unique coil shapes (black). The arrows indicate the direction of (cid:101) G k , and theirlength indicates the local magnitude relative to the reference arrow shown. The arrows arenot visible on this scale on the outboard side. Figure reproduced from [10] with permission.109 x y z AdjointDirect arclength [m] -0.0500.050.1 0 2 4 6 arclength [m] -0.2-0.100.10.2 0 2 4 6 arclength [m] (a) x y z arclength [m] arclength [m] arclength [m] (b) Figure 5.4: (a) The Cartesian components of the coil shape gradient for each of the 3 uniquemodular NCSX coils computed with the adjoint and direct approaches. (b) The fractionaldiﬀerence (5.85) between the shape gradient computed with the adjoint approach and thedirect approach is plotted for each Cartesian component and each of the 3 unique coils.Figure adapted from [10] with permission. 110ere V ( ψ ) is the volume enclosed by the surface labeled by ψ . The average radial curvatureappears in the ideal MHD potential energy functional for interchange modes, and it providesa stabilizing eﬀect when p (cid:48) ( ψ ) κ ψ <

0. As typically p (cid:48) ( ψ ) < κ ψ > κ ψ = − V (cid:48)(cid:48) ( ψ ) V (cid:48) ( ψ ) . (5.88)Thus, as volume increases with ﬂux, V (cid:48)(cid:48) ( ψ ) < p (cid:48) ( ψ ) V (cid:48)(cid:48) ( ψ )also appears in the Mercier criterion for ideal MHD interchange stability [157]. Known as thevacuum magnetic well, V (cid:48)(cid:48) ( ψ ) has been employed in the optimization of several stellaratorconﬁgurations (e.g. [106, 114]).We consider the following ﬁgure of merit, f W = (cid:90) V P dψ w ( ψ ) V (cid:48) ( ψ ) , (5.89)where w ( ψ ) is a radial weight function which will be chosen so that (5.89) approximates V (cid:48)(cid:48) ( ψ ). This can equivalently be written as, f W = (cid:90) V P d x w ( ψ ) . (5.90) Surface shape gradient

We consider direct perturbations about an equilibrium with ﬁxed toroidal current (5.74).The shape derivative of f W is computed upon application of the transport theorem (5.3),noting that δψ = − ξ · ∇ ψ , δf W ( S P ; ξ ) = − (cid:90) V P d x ξ · ∇ w ( ψ ) + (cid:90) S P d x ξ · ˆ n w ( ψ ) , (5.91)where we have assumed w ( ψ ) to be diﬀerentiable. We recast the ﬁrst term in (5.91) as asurface integral by applying the ﬁxed-boundary adjoint relation (5.36) and prescribing theadjoint perturbation to satisfy the following, F [ ξ , δχ ( ψ )] − ∇ w ( ψ ) = 0 (5.92a) ξ · ˆ n | S P = 0 (5.92b) δI T, ( ψ ) = 0 . (5.92c)Upon application of (5.36) we obtain the following expression for the shape gradientwhich depends on the adjoint solution, δ B , G W = (cid:18) w ( ψ ) + δ B · B µ (cid:19) S P . (5.93)In Figure 5.5 we present the computation of G W for the NCSX LI383 equilibrium [242]using the the adjoint and direct approaches. We use a weight function, w ( ψ ) = exp( − ( ψ − ψ m, ) /ψ w ) − exp( − ( ψ − ψ m, ) /ψ w ) , (5.94)111uch that f W remains smooth while it approximates V (cid:48) ( ψ m, ) − V (cid:48) ( ψ m, ) where ψ m, = 0 . ψ , ψ m, = 0 . ψ , and ψ w = 0 . ψ (Figure 5.5c). We note that f W can be interpreted asmeasuring the change in volume due to the interchange of two ﬂux tubes centered at ψ m, and ψ m, . If f W >

0, this indicates that moving a ﬂux tube radially outward will cause it toexpand and lower its potential energy.The adjoint magnetic ﬁeld is computed with a forward-diﬀerence approximation (5.68)characterized by a step size ∆ P = 400 Pa. For the direct approach, derivatives with respectto the Fourier discretization (5.70) of the boundary are computed for m ≤

20 and | n | ≤ G residual is 3 . × − . Coil shape gradient

The shape derivative of f W can also be computed with respect to a perturbation of thecoil shapes. We consider perturbations about an equilibrium with ﬁxed toroidal current, F [ ξ , δχ ( ψ )] = 0 (5.95a) δI T, ( ψ ) = 0 , (5.95b)with speciﬁed perturbation to the coils shapes, δ x C × ˆ t . We prescribe the following adjointperturbation, F [ ξ , δχ ( ψ )] − ∇ w ( ψ ) = 0 (5.96a) δ x C × ˆ t = 0 (5.96b) δI T, ( ψ ) = 0 . (5.96c)The same weight function (5.94) is applied, which decreases suﬃciently fast that we canapproximate w ( ψ ) = 0. Upon application of the free-boundary adjoint relation (5.33), weobtain the following coil shape gradient, (cid:101) G k = I C k ˆ t × δ B µ (cid:12)(cid:12)(cid:12)(cid:12) C k . (5.97)The calculation of (cid:101) G k for each of the 3 unique coil shapes from the NCSX C09R00 coilset is shown in Figure 5.6. A two-point centered-diﬀerence approximation of the adjointmagnetic ﬁeld (5.68) is applied with characteristic step size ∆ P = 3 × Pa. The adjointﬁeld is evaluated in the vacuum region using the DIAGNO code. The shape gradient isalso computed with a direct approach. The Cartesian components of each coil are Fourier-discretized (5.84), and derivatives are computed with respect to modes with m ≤

40 witha 4-point centered-diﬀerence stencil. The fractional diﬀerence between the results obtainedwith the two approaches is quantiﬁed with (5.85). The line-averaged value of (cid:101) G l residual ,k is4 . × − . The direct approach required 2917 VMEC calls while the adjoint only required112 a) Adjoint (b) Direct / -1-0.500.51 w () (c) Weight function Figure 5.5: The shape gradient for f W (5.89) is computed using the (a) adjoint and (b) directapproaches. (c) The weight function (5.94) used to compute f W . Figure reproduced from[187] with permission. 113 a) Adjoint (b) Direct Figure 5.6: The coil shape gradient for f W is calculated for each of the 3 unique NCSXcoil shapes. The arrows indicate the direction of (cid:101) G k (5.97), and their lengths indicate themagnitude scaled according to the legend. Figure reproduced from [187] with permission.three. We now consider a ﬁgure of merit which quantiﬁes the ripple near the magnetic axis[37, 58, 59]. As all physical quantities must be independent of the poloidal angle on themagnetic axis, this quantiﬁes the departure from quasi-helical or quasi-axisymmetry nearthe magnetic axis. We deﬁne the magnetic ripple to be, f R = (cid:90) V P d x (cid:101) f R , (5.98)with, (cid:102) f R ( ψ, B ) = 12 w ( ψ ) (cid:16) B − B (cid:17) (5.99a) B = (cid:82) V P d x w ( ψ ) B (cid:82) V P d x w ( ψ ) , (5.99b)and a weight function given by, w ( ψ ) = exp( − ψ /ψ w ) , (5.100)with ψ w = 0 . ψ . 114 urface shape gradient We compute perturbations about an equilibrium with ﬁxed rotational transform (5.59).Noting that the local perturbation to the ﬁeld strength is given by (5.62), the shape derivativeis computed with the transport theorem (5.3), δf R ( S P ; ξ ) = (cid:90) S P d x ξ · ˆ n (cid:102) f R + (cid:90) V P d x (cid:32) ∂ (cid:102) f R ( ψ, B ) ∂B δB + ∂ (cid:102) f R ( ψ, B ) ∂ψ δψ (cid:33) . 
(5.101)

We prescribe the following adjoint perturbation,

$$ \mathbf{F}[\boldsymbol{\xi}_2, \delta\chi_2(\psi)] - \nabla \cdot \mathbf{P} = 0 \tag{5.102a} $$
$$ \boldsymbol{\xi}_2 \cdot \hat{\mathbf{n}} |_{S_P} = 0 \tag{5.102b} $$
$$ \delta\chi_2'(\psi) = 0. \tag{5.102c} $$

The bulk force perturbation required for the adjoint problem is written as the divergence of an anisotropic pressure tensor, $\mathbf{P} = p_\perp \mathbf{I} + (p_\parallel - p_\perp) \hat{\mathbf{b}} \hat{\mathbf{b}}$, where $\mathbf{I}$ is the identity tensor. The parallel and perpendicular pressures are related by the parallel force balance condition,

$$ \frac{\partial p_\parallel(\psi, B)}{\partial B} = \frac{p_\parallel - p_\perp}{B}, \tag{5.103} $$

which follows from the requirement that $\hat{\mathbf{b}} \cdot \delta\mathbf{F} = 0$ (5.25). We take the parallel pressure to be,

$$ p_\parallel = \widetilde{f}_R. \tag{5.104} $$

Upon application of the fixed-boundary adjoint relation and the expression for the curvature in an equilibrium field,

$$ \boldsymbol{\kappa} = \frac{\nabla_\perp B}{B} + \frac{\mu_0 \nabla p}{B^2}, \tag{5.105} $$

we obtain the following shape gradient,

$$ \mathcal{G}_R = \left( p_\perp + \frac{\delta\mathbf{B}_2 \cdot \mathbf{B}}{\mu_0} \right)_{S_P}. \tag{5.106} $$

If instead the toroidal current is held fixed in the direct perturbation as in (5.74), then the required adjoint current perturbation is given by,

$$ \delta I_{T,2}(\psi) = \frac{V'(\psi)}{2\pi} \left\langle \frac{\partial \widetilde{f}_R(\psi, B)}{\partial B} \, \hat{\mathbf{b}} \cdot \nabla\varphi \times \nabla\psi \right\rangle_\psi, \tag{5.107} $$

with the shape gradient unchanged. See Appendix L for details of the calculation.

To compute the adjoint perturbation (5.102)-(5.107), we consider the addition of an anisotropic pressure tensor to the nonlinear force balance equation,

$$ \mathbf{J}' \times \mathbf{B}' = \nabla p' + \Delta_P \nabla \cdot \mathbf{P}(\psi', B'), \tag{5.108} $$

where $\mathbf{P}(\psi', B') = p_\perp(\psi', B') \mathbf{I} + \left( p_\parallel(\psi', B') - p_\perp(\psi', B') \right) \hat{\mathbf{b}}' \hat{\mathbf{b}}'$. Here primes indicate the perturbed quantities (i.e. $B' = B + \delta B$), where unprimed quantities satisfy (5.10). As in Section 5.5.3, the perturbation has a scale set by $\Delta_P$, which is chosen to be small enough that the response is linear.
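The requirement that $\Delta_P$ be small enough for a linear response, yet not so small that round-off dominates, is the usual trade-off for a forward difference of the form (5.68). The toy sketch below replaces the equilibrium solve with a hypothetical smooth scalar response (an exponential, chosen purely for illustration) and records the error of the forward difference for several step sizes. Nothing here models VMEC or ANIMEC; it only shows the expected first-order scaling and its breakdown at very small steps.

```python
import numpy as np

def forward_difference(func, x, step):
    """First-order forward-difference approximation of d(func)/dx at x."""
    return (func(x + step) - func(x)) / step

def toy_response(p):
    """Hypothetical nonlinear response standing in for an equilibrium solve."""
    return np.exp(p)

x0 = 1.0
exact = np.exp(x0)  # exact derivative of the toy response at x0

steps = np.array([1e-1, 1e-2, 1e-3, 1e-15])
errors = np.array(
    [abs(forward_difference(toy_response, x0, h) - exact) for h in steps]
)

# Ratio of errors for a tenfold reduction in step size (close to 10 for
# a first-order scheme while truncation error dominates)
ratio = errors[0] / errors[1]
```

The first three entries of `errors` shrink roughly in proportion to the step, while the last entry is larger again because the difference of two nearly equal floating-point numbers is divided by a tiny step.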
Enforcing parallel force balance from (5.108) results in the followingcondition, ∂p || ( ψ (cid:48) , B (cid:48) ) ∂B (cid:48) = p || ( ψ (cid:48) , B (cid:48) ) − p ⊥ ( ψ (cid:48) , B (cid:48) ) B (cid:48) . (5.109)If we furthermore assume that ∆ P ∇ · P is small compared with the other terms in (5.108),we can consider it to be a perturbation to the base equilibrium (5.10). In this way, we canapply the perturbed force balance equation (5.25) with δ F = − ∆ P ∇ · P ( B ), where P is nowevaluated with the equilibrium ﬁeld which satisﬁes (5.10). Thus the desired pressure tensor(5.104) can be implemented by evaluating p || with the perturbed ﬁeld such that (5.109) issatisﬁed.We have implemented the pressure tensor deﬁned by (5.103)-(5.104) in the ANIMECcode [43], which modiﬁes the VMEC variational principle to allow 3D equilibrium solutionswith anisotropic pressures to be computed. The ANIMEC code has been used to modelequilibria with energetic particle species using pressure tensors based on bi-Maxwellian [45]and slowing-down [44] distribution functions. The variational principle assumes that p || onlyvaries on a surface through B and can, therefore, be used to include the required adjointbulk force.In Figure 5.7, we present the computation of G R for the NCSX LI383 equilibrium usingthe adjoint and direct approaches. For the direct approach, derivatives with respect to theFourier discretization of the boundary (5.70) are computed for m ≤

11 and | n | ≤ P = 7 . × Pa. The directapproach required 2761 calls to VMEC while the adjoint approach required two calls. Thesurface-averaged value of G residual (5.71) is 3 . × − . /ν regime The eﬀective ripple in the 1 /ν regime [168] is a ﬁgure of merit which has proven valuablefor neoclassical optimization (e.g. [106, 134, 242]). This quantity characterizes the geometricdependence of the neoclassical particle ﬂux under the assumption of low-collisionality suchthat (cid:15) eﬀ is analogous to the helical ripple amplitude, (cid:15) h , that appears in the expression ofthe 1 /ν particle ﬂux for a classical stellarator [66]. The following expression is obtained forthe eﬀective ripple, (cid:15) / ( ψ ) = π √ V (cid:48) ( ψ ) (cid:15) (cid:90) /B min /B max dλλ (cid:90) π dα (cid:88) i ( ∂∂α ˆ K i ( α, λ )) ˆ I i ( α, λ ) . (5.110)Here λ = v ⊥ / ( v B ) is the pitch angle, B min and B max are the minimum and maximum valuesof the ﬁeld strength on a surface labeled by ψ , and (cid:15) ref is a reference aspect ratio. We have116 a) Adjoint (b) Direct(c) Weight function Figure 5.7: The shape gradient for f R (5.98) is computed using the (a) adjoint and (b) directapproaches with a weight function (5.100) shown in (c). Figure reproduced from [187] withpermission. 117eﬁned the bounce integrals, ˆ I i ( α, λ ) = (cid:73) dl v || Bv (5.111a)ˆ K i ( α, λ ) = (cid:73) dl v || Bv , (5.111b)where the notation (cid:72) dl = (cid:80) σ σ (cid:82) ϕ + ϕ − dϕ/ ˆ b · ∇ ϕ indicates integration at constant λ and α between successive bounce points where v || ( ϕ + ) = v || ( ϕ − ) = 0 and σ = sign( v || ). The sumin (5.110) is taken over wells at constant λ and α for ϕ − ,i ∈ [0 , π ).We consider an integrated ﬁgure of merit, f (cid:15) = (cid:90) V P d x w ( ψ ) (cid:15) / ( ψ ) , (5.112)where w ( ψ ) is a radial weight function. We perturb about an equilibrium with ﬁxed toroidalcurrent (5.74). 
The shape derivative of f (cid:15) is computed to be, δf (cid:15) ( S P ; ξ ) = (cid:90) V P d x (cid:0) P (cid:15) : ∇ ξ + δχ (cid:48) ( ψ ) I (cid:15) (cid:1) , (5.113)where the double dot (:) indicates contraction between dyadic tensors A and B as A : B = (cid:80) i,j A ij B ji , with, I (cid:15) = πw ( ψ )2 √ (cid:15) (cid:90) /B /B max dλλ × (cid:34) (cid:16) ∂∂α ˆ K ( α, λ, ϕ ) (cid:17) ˆ I ( α, λ, ϕ )  − ϕ B × ∇ ψ · ∇ (cid:32) | v || | vB (cid:33) + B × ∇ ψ · ∇ ϕ ∂∂B (cid:32) | v || | vB (cid:33) + 2 ∂∂α (cid:32) ∂∂α ˆ K ( α, λ, ϕ )ˆ I ( α, λ, ϕ ) (cid:33)  − ϕ B × ∇ ψ · ∇ (cid:32) | v || | v B (cid:33) + B × ∇ ψ · ∇ ϕ ∂∂B (cid:32) | v || | v B (cid:33) (cid:35) , (5.114)and P (cid:15) = p || ˆ b ˆ b + p ⊥ ( I − ˆ b ˆ b ) with, p || = − πw ( ψ )2 √ (cid:15) (cid:90) /B /B max dλλ (cid:32) (cid:16) ∂∂α ˆ K ( α, λ, ϕ ) (cid:17) ˆ I ( α, λ, ϕ ) | v || | v + 2 ∂∂α (cid:32) ∂∂α ˆ K ( α, λ, ϕ )ˆ I ( α, λ, ϕ ) (cid:33) | v || | v (cid:33) (5.115a) p ⊥ = − πw ( ψ )2 √ (cid:15) (cid:90) /B /B max dλλ (cid:32) (cid:16) ∂∂α ˆ K ( α, λ, ϕ ) (cid:17) ˆ I ( α, λ, ϕ ) (cid:32) λvB | v || | + | v || | v (cid:33) + 2 ∂∂α (cid:32) ∂∂α ˆ K ( α, λ, ϕ )ˆ I ( α, λ, ϕ ) (cid:33) (cid:32) λ | v || | B v + | v || | v (cid:33) (cid:33) . (5.115b)118erivatives are computed assuming (cid:15) ref is held constant. The bounce integrals are de-ﬁned with respect to ϕ such that ˆ I ( α, λ, ϕ ) = ˆ I i if ϕ ∈ [ ϕ − ,i , ϕ + ,i ] and ˆ I ( α, λ, ϕ ) = 0 if λB ( α, ϕ ) >

1. The same convention is used for ˆ K ( α, λ, ϕ ). We prescribe the followingadjoint perturbation, F [ ξ , δχ ( ψ )] − ∇ · P (cid:15) = 0 (5.116a) ξ · ˆ n | S P = 0 (5.116b) δI T, ( ψ ) = V (cid:48) ( ψ )2 π (cid:104)I (cid:15) (cid:105) ψ . (5.116c)The adjoint bulk force must be consistent with parallel force balance from (5.25), which isequivalent to the condition, ∇ || p || = ∇ || BB ( p || − p ⊥ ) . (5.117)This can be shown to be satisﬁed by (5.115), noting that the λ integrand vanishes at 1 /B such that there is no contribution from the parallel gradient acting on the bounds of theintegral. There is also no contribution to the parallel gradient from the bounce-integrals, as | v || | vanishes at points of non-zero gradient of ˆ I ( α, λ, ϕ ) and ˆ K ( α, λ, ϕ ).Upon application of the ﬁxed-boundary adjoint relation (5.36) and integration by parts,we obtain the following expression for the shape gradient, G (cid:15) = (cid:18) p ⊥ + δ B · B µ (cid:19) S P . (5.118)See Appendix M for details of the calculation. The approach demonstrated in this Sectioncould be extended to compute the shape gradients of other ﬁgures of merit involving bounceintegrals, such as the Γ c metric for energetic particle conﬁnement [169] or the variation ofthe parallel adiabatic invariant on a ﬂux surface [58]. Quasi-symmetry is desirable as it ensures collisionless conﬁnement of guiding centers.This property follows when the ﬁeld strength depends on a linear combination of the Boozerangles, B ( ψ, ϑ B , ϕ B ) = B ( ψ, M ϑ B − N ϕ B ) for ﬁxed integers M and N [22, 175] (Appendix5.5.6). Several stellarator conﬁgurations have been optimized to be close to quasi-symmetry(e.g., [57, 106, 149, 197]) by minimizing the amplitude of symmetry-breaking Fourier har-monics of the ﬁeld strength. 
We will consider a ﬁgure of merit that does not require a Boozercoordinate transformation; instead, we use a general set of magnetic coordinates ( ψ, ϑ, ϕ ) todeﬁne our ﬁgure of merit.In Boozer coordinates [21, 97] ( ψ, ϑ B , ϕ B ) the covariant form for the magnetic ﬁeld is, B = I ( ψ ) ∇ ϑ B + G ( ψ ) ∇ ϕ B + K ( ψ, ϑ B , ϕ B ) ∇ ψ. (5.119)Here G ( ψ ) = µ I P ( ψ ) / (2 π ), where I P ( ψ ) is the poloidal current outside the ψ surface. Thepoloidal current can be computed using Ampere’s law and expressed as an integral over a119urface labeled by ψ , S P ( ψ ), I P ( ψ ) = 1 µ (cid:90) π dϕ B · ∂ x ∂ϕ = − πµ (cid:90) S P ( ψ ) d x B · ∇ ϑ × ˆ n . (5.120)The quantity I ( ψ ) = µ I T ( ψ ) / (2 π ), where I T ( ψ ) is the toroidal current inside the ψ surface(5.16). We quantify the departure from quasi-symmetry in the following way, f QS = 12 (cid:90) V P d x w ( ψ ) (cid:0) B × ∇ ψ · ∇ B − F ( ψ ) B · ∇ B (cid:1) . (5.121)Here w ( ψ ) is a radial weight function and, F ( ψ ) = ( M/N ) G ( ψ ) + I ( ψ )( M/N ) ι ( ψ ) − . (5.122)If f QS = 0, then the ﬁeld is quasi-symmetric with mode numbers M and N [97], which can beshown using the covariant (5.13) and contravariant (5.119) representations of the magneticﬁeld assuming B = B ( ψ, M ϑ B − N ϕ B ) for ﬁxed M and N . Note that f QS quantiﬁes thesymmetry in Boozer coordinates but can be evaluated in any ﬂux coordinate system.We consider perturbation about an equilibrium with ﬁxed toroidal current (5.74). Theperturbations to the Boozer poloidal covariant component is computed using the transporttheorem (5.3), δG ( ψ ) = − π (cid:90) S P ( ψ ) d x (cid:0) ∇ · ( B × ∇ ϑ ) ξ · ˆ n + δ B × ∇ ϑ · ˆ n (cid:1) . (5.123)In arriving at (5.123) we have used the fact that spatial derivatives commute with shapederivatives. The ﬁrst term accounts for the unperturbed current density through the per-turbed boundary, and the second accounts for the perturbed current density through theunperturbed boundary. 
The contribution from the perturbation to the poloidal angle canbe shown to vanish. Upon application of (5.20) we obtain, noting that (cid:82) S P ( ψ ) d x A = V (cid:48) ( ψ ) (cid:104) A |∇ ψ |(cid:105) ψ for any quantity A , δG ( ψ ) = − V (cid:48) ( ψ )4 π (cid:42) ξ · ∇ ψ ∇ · ( B × ∇ ϑ ) − √ g ∂ x ∂ϕ · ∇ × ( ξ × B ) − δχ (cid:48) ( ψ ) √ g ∂ x ∂ϕ · ∂ x ∂ϑ (cid:43) ψ , (5.124)Applying the transport theorem (5.3), the shape derivative of f QS takes the form, δf QS ( S P ; ξ ) = 12 (cid:90) S P d x ξ · ˆ n M w ( ψ ) + 12 (cid:90) V P d x w (cid:48) ( ψ ) δψ M + (cid:90) V P d x w ( ψ ) M (cid:18) δ B · A + S · ∇ δB + B × ∇ δψ · ∇ B − δG ( ψ ) B · ∇ Bι ( ψ ) − ( N/M ) (cid:19) + (cid:90) V P d x w ( ψ ) M (cid:18) F ( ψ ) ι ( ψ ) − ( N/M ) δχ (cid:48) ( ψ ) B · ∇ B − δψF (cid:48) ( ψ ) B · ∇ B (cid:19) , (5.125)120here M = B ×∇ ψ ·∇ B − F ( ψ ) B ·∇ B , A = ∇ ψ ×∇ B − F ( ψ ) ∇ B , and S = B ×∇ ψ − F ( ψ ) B .After several steps outlined in Appendix N, the shape derivative can be written in thefollowing way, δf QS ( S P ; ξ ) = (cid:90) V P d x (cid:0) ξ · F QS + δχ (cid:48) ( ψ ) I QS (cid:1) + (cid:90) S P d x ξ · ˆ n B QS , (5.126)with, F QS = 12 ∇ ⊥ (cid:0) w ( ψ ) M (cid:1) + (cid:16) (ˆ b × ∇ ψ ) ∇ || B + F ( ψ ) ∇ ⊥ B (cid:17) w ( ψ ) B · ∇M + B × ( ∇ × ( ∇ ψ × ∇ B )) w ( ψ ) M − B ∇ ⊥ (cid:0) w ( ψ ) S · ∇M (cid:1) + κ Bw ( ψ ) S · ∇M− ∇ ψ ∇ B · ∇ × (cid:0) w ( ψ ) M B (cid:1) + 14 π (cid:32) − ∇ ⊥ (cid:18) w ( ψ ) V (cid:48) ( ψ ) (cid:104)M B · ∇ B (cid:105) ψ ( ι ( ψ ) − ( N/M )) (cid:19) ( B · ∇ ψ × ∇ ϑ )+ w ( ψ ) V (cid:48) ( ψ ) (cid:104)M B · ∇ B (cid:105) ψ ι ( ψ ) − ( N/M ) (cid:0) ∇ ψ ∇ · ( B × ∇ ϑ ) − B × ∇ × ( ∇ ψ × ∇ ϑ ) (cid:1) (cid:33) (5.127a) B QS = − w ( ψ ) M + Bw ( ψ ) S · ∇M − w ( ψ ) M∇ B × B · ∇ ψ + w ( ψ ) V (cid:48) ( ψ ) (cid:104)M B · ∇ B (cid:105) ψ π ( ι ( ψ ) − ( N/M )) ( B · ∇ ψ × ∇ ϑ ) (5.127b) I QS = − w ( ψ ) M∇ ψ × ∇ ϕ · A + w ( ψ ) ( S · ∇M ) ˆ b · ∇ ψ × ∇ ϕ + w ( ψ ) M B · ∇ Bι ( ψ ) − ( N/M )  F ( ψ ) − (cid:42) V 
(cid:48) ( ψ )4 π √ g ∂ x ∂ϕ · ∂ x ∂ϑ (cid:43) ψ  . (5.127c)In (5.127a), ∇ || = ˆ b · ∇ and ∇ ⊥ = ∇ − ˆ b ∇ || are the parallel and perpendicular gradients.We can now prescribe an adjoint perturbation which satisﬁes, F [ ξ , δχ ( ψ )] + F QS = 0 (5.128a) ξ · ˆ n | S P = 0 (5.128b) δI T, ( ψ ) = V (cid:48) ( ψ )2 π (cid:104)I QS (cid:105) ψ . (5.128c)We note that F QS satisﬁes the parallel force balance condition (ˆ b · F QS = 0) implied by(5.25). Upon application of the ﬁxed-boundary adjoint relation we obtain the following shapegradient, G QS = (cid:18) B QS + δ B · B µ (cid:19) S P . (5.129)121 .5.7 Neoclassical ﬁgures of merit In Section 5.5.5, we considered a ﬁgure of merit that quantiﬁes the geometric dependenceof the neoclassical particle ﬂux in the 1 /ν regime. In applying this model, several assump-tions are imposed, such as a small radial electric ﬁeld, E r , low collisionality, and a simpliﬁedpitch-angle scattering collision operator. In this Section, we consider a more general neo-classical ﬁgure of merit arising from a moment of the local drift kinetic equation, allowingfor optimization at ﬁnite collisionality and E r . It is assumed here that the collision time iscomparable to the bounce time but shorter than the time needed to complete a magneticdrift orbit. In Chapter 4, an adjoint method is demonstrated for obtaining derivatives ofneoclassical ﬁgures of merit with respect to local geometric quantities on a ﬂux surface. Theadjoint method described in this Section will extend these results, such that shape derivativeswith respect to the plasma boundary can be computed.Consider the following ﬁgure of merit, f NC = (cid:90) V P d x w ( ψ ) R ( ψ ) . 
(5.130)Here R ( ψ ) is a ﬂux surface averaged moment of the neoclassical distribution function, f ,which satisﬁes the local drift kinetic equation (DKE),( v || ˆ b + v E ) · ∇ f − C ( f ) = − v m · ∇ ψ ∂f M ∂ψ , (5.131)where v E = E × B /B is the E × B drift velocity, v m · ∇ ψ is the radial magnetic driftvelocity (4.3), f M is a Maxwellian (M.3), and C is the linearized Fokker-Planck operator.For example, R can be taken to be the bootstrap current, J b = (cid:88) s (cid:104) B (cid:82) d v f s v || (cid:105) ψ n s (cid:104) B (cid:105) / ψ , (5.132)where the sum is taken over species. We note that the geometric dependence that enters theDKE when written in Boozer coordinates only arises through the quantities { B, G ( ψ ) , I ( ψ ) , ι ( ψ ) } .Thus for simplicity, Boozer coordinates will be assumed throughout this Section.The perturbation to R ( ψ ) at ﬁxed toroidal current (5.74) can be written as, δ R ( ψ ) = (cid:104) S R δB (cid:105) ψ + ∂ R ( ψ ) ∂G ( ψ ) δG ( ψ ) + ∂ R ( ψ ) ∂ι ( ψ ) δχ (cid:48) ( ψ ) . (5.133)Here S R is a local sensitivity function which quantiﬁes the change to R associated with aperturbation of the ﬁeld strength δB deﬁned in the following way. Consider the perturbationto R resulting from a change in the ﬁeld strength at ﬁxed G ( ψ ), I ( ψ ), and ι ( ψ ). Thefunctional derivative of R ( ψ ) with respect to B ( x ) can be expressed as, δ R ( δB ; B ( x )) = (cid:10) S R δB ( x ) (cid:11) ψ . (5.134)This is another instance of the Riesz representation theorem: δ R is a linear functional of δB , with the inner product taken to be the ﬂux surface average. Thus S R can be thoughtof as analogous to the shape gradient (5.4). 122he quantities { S R , ∂ R ( ψ ) /∂G ( ψ ) , ∂ R ( ψ ) /∂ι ( ψ ) } can be computed with the adjointmethod described in Chapter 4 with the SFINCS code [140]. Here we consider SFINCSto be run on a set of surfaces such that (5.130) can be computed numerically. 
The deriva-tives computed by SFINCS will appear in the additional bulk force required for the adjointperturbed equilibrium. We consider perturbations of an equilibrium at ﬁxed toroidal cur-rent (5.74). The shape derivative of f NC can be computed on application of the transporttheorem (5.3), δf NC ( S P ; ξ ) = (cid:90) S P d x ξ · ˆ n w ( ψ ) R ( ψ ) + (cid:90) V P d x δψ ∂∂ψ (cid:0) w ( ψ ) R ( ψ ) (cid:1) + (cid:90) V P d x w ( ψ ) (cid:18) ∂ R ( ψ ) ∂G ( ψ ) δG ( ψ ) + ∂ R ( ψ ) ∂ι ( ψ ) δχ (cid:48) ( ψ ) + (cid:104) S R δB (cid:105) ψ (cid:19) . (5.135)After several steps outlined in Appendix O, the shape derivative is written in the followingform, δf NC ( S P ; ξ ) = (cid:90) V P d x (cid:0) ξ · F NC + δχ (cid:48) ( ψ ) I NC (cid:1) + (cid:90) S P d x ξ · ˆ n B NC , (5.136)with, F NC = −∇ ( R ( ψ ) w ( ψ )) − ∇ ψ ( ∇ × B ) · ∇ ϑ ∂ R ( ψ ) ∂G ( ψ ) w ( ψ ) B √ g (cid:104) B (cid:105) ψ + w ( ψ ) (cid:104) B (cid:105) ψ ∂ R ( ψ ) ∂G ( ψ ) B × ∇ × (cid:18) ∂ x ∂ϕ B (cid:19) + G ( ψ ) B ∇ (cid:32) w ( ψ ) (cid:104) B (cid:105) ψ ∂ R ( ψ ) ∂G ( ψ ) (cid:33) − κ w ( ψ ) S R B + B ∇ ⊥ ( w ( ψ ) S R ) (5.137a) B NC = w ( ψ ) R ( ψ ) − w ( ψ ) B (cid:104) B (cid:105) ψ ∂ R ( ψ ) ∂G ( ψ ) G ( ψ ) − w ( ψ ) S R B (5.137b) I NC = ∂ R ( ψ ) ∂G ( ψ ) w ( ψ ) B (cid:104) B (cid:105) ψ √ g ∂ x ∂ϕ · ∂ x ∂ϑ + w ( ψ ) ∂ R ( ψ ) ∂ι ( ψ ) − w ( ψ ) S R ˆ b · ∇ ψ × ∇ ϕ. (5.137c)We consider the following adjoint perturbation, F [ ξ , δχ ( ψ )] + F NC = 0 (5.138a) ξ · ˆ n | S P = 0 (5.138b) δI T, ( ψ ) = V (cid:48) ( ψ )2 π (cid:104)I NC (cid:105) ψ . (5.138c)The adjoint bulk force F NC is chosen to satisfy parallel force balance required by (5.25).Upon application of the ﬁxed-boundary adjoint relation we obtain the shape gradient, G NC = (cid:18) B NC + δ B · B µ (cid:19) S P . (5.139)123 .6 Conclusions We have obtained a relationship between 3D perturbations of MHD equilibria that isa consequence of the self-adjoint property of the MHD force operator. 
The relation allows for the efficient computation of shape gradients for either the outer plasma surface using the fixed-boundary adjoint relation (5.36) or for coil shapes using the free-boundary adjoint relation (5.33). The computation of the shape gradient of several stellarator figures of merit has been demonstrated with both the adjoint and direct approaches. The application of the adjoint relation provides an $O(N_\Omega)$ reduction in CPU hours required in comparison with the direct method of computing the shape gradient, where $N_\Omega$ is the number of parameters used to describe the shape of the outer boundary or the coils. For fully 3D geometry, $N_\Omega$ can be 10 − . Thus, the application of adjoint methods can significantly reduce the cost of computing the shape gradient for gradient-based optimization or local sensitivity analysis.

We have demonstrated that the self-adjointness relations (Section 5.3) can be implemented to efficiently compute the shape gradient of figures of merit relevant for stellarator configuration optimization. The shape gradient is obtained by solving an adjoint perturbed force balance equation that depends on the figure of merit of interest. For the volume-averaged $\beta$ and vacuum well parameter (Sections 5.5.1 and 5.5.3), the additional bulk force required for the adjoint problem is simply the gradient of a function of flux, and so it can be implemented by adding a perturbation to the pressure profile. For the magnetic ripple on axis (Section 5.5.4), the required bulk force takes the form of the divergence of a pressure tensor that only varies on a surface through the field strength. As the ANIMEC code currently treats this type of pressure tensor, this adjoint bulk force is implemented with a minor modification to the code. Computing the shape gradient of $\epsilon_{\text{eff}}^{3/2}$ with the adjoint approach also requires the addition of the divergence of a pressure tensor. 
However, this pressuretensor varies on a surface through the ﬁeld line label due to the bounce integrals that appear(5.115). Thus the variational principle used by the ANIMEC code cannot be easily extendedfor this application. Similarly, the shape gradients for the quasi-symmetry (Section 5.5.6)and neoclassical (Section 5.5.7) ﬁgures of merit require an adjoint bulk force that is not in theform of the divergence of a pressure tensor. This provides an impetus for the developmentof a ﬂexible perturbed MHD equilibrium code that could enable these calculations. Whileseveral 3D ideal MHD stability codes exist [7, 204, 219], only the CAS3D code has beenmodiﬁed in order to perform perturbed equilibrium calculations [28, 173]. A discussion ofsuch linear equilibrium calculations for adjoint-based shape gradient evaluations is presentedin Chapter 6.It should be noted that the adjoint approach we have outlined can not yield an exactanalytic shape gradient, as error is introduced through the approximation of the adjointsolution. Throughout, we have assumed the existence of magnetic surfaces as the 3D equi-librium is perturbed. Therefore a code such as VMEC or ANIMEC, which minimizes anenergy subject to the constraint that surfaces exist, is suitable. Generally VMEC solutionsdo not satisfy (5.10) exactly [174], as they do not account for the formation of islands orcurrent singularities associated with rational surfaces. Furthermore, the parameters ∆ P and∆ I introduce additional numerical noise. As demonstrated in Section 5.5.1, these parameters124ust be small enough that nonlinear eﬀects do not become important yet large enough thatround-oﬀ error does not dominate. We have demonstrated that the typical diﬀerence be-tween the shape gradient obtained with the adjoint method and that computed directly fromnumerical derivatives is (cid:46) hapter 6 Linearized equilibrium solutions

As discussed in Chapter 5, the application of the adjoint approach for computing the shape gradient of functions of MHD equilibria requires solutions of linearized MHD equilibrium equations. In the examples presented thus far, these linearized solutions were approximated by adding a small perturbation to a nonlinear MHD equilibrium, such as a perturbation to the prescribed toroidal current or pressure profiles. This approximation introduces error associated with the choice of the amplitude of the perturbation and limits the types of objective functions that can be treated. In this Chapter, we discuss an approach to compute the necessary linearized equilibrium solutions based on a variational method.
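The amplitude dependence of this approximation can be illustrated with a scalar toy problem. The sketch below is not an MHD calculation: the "equilibrium" equation $u + u^3 = p$, the function names, and the solver tolerance are all assumptions made for illustration. It compares the exact linear response $du/dp$ with the estimate obtained from two nonlinear solves separated by a finite amplitude `delta`.

```python
def solve_equilibrium(p, tol=1e-10, max_iter=100):
    """Newton iteration for the toy nonlinear 'equilibrium' u + u**3 = p."""
    u = 0.0
    for _ in range(max_iter):
        residual = u + u**3 - p
        if abs(residual) < tol:
            break
        u -= residual / (1.0 + 3.0 * u**2)
    return u


def linearized_response(p):
    """Exact linear response du/dp = 1 / (1 + 3 u^2) at the converged state."""
    u = solve_equilibrium(p, tol=1e-14)
    return 1.0 / (1.0 + 3.0 * u**2)


def finite_amplitude_response(p, delta, tol=1e-10):
    """Approximate du/dp from two nonlinear solves separated by delta."""
    return (solve_equilibrium(p + delta, tol) - solve_equilibrium(p, tol)) / delta


if __name__ == "__main__":
    p0 = 1.0
    exact = linearized_response(p0)
    # Scan the perturbation amplitude and report the error of the estimate.
    for delta in (1e-1, 1e-2, 1e-4, 1e-6, 1e-8):
        err = abs(finite_amplitude_response(p0, delta) - exact)
        print(f"delta = {delta:.0e}   |error| = {err:.3e}")
```

For large `delta` the error is set by the nonlinearity of the toy equation, while for very small `delta` it is limited by the accuracy of the two nonlinear solves. A direct linearized solve avoids having to choose the amplitude at all.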

There are several existing techniques for computing linearized ideal MHD equilibria. As will be shown directly in the following Section, a linearized equilibrium state is a stationary point of an energy functional. This energy functional is related to the potential energy that appears in ideal MHD stability analysis, $W_P[\boldsymbol{\xi}] = -\frac{1}{2}\int_{V_P} d^3x \, \boldsymbol{\xi} \cdot \mathbf{F}[\boldsymbol{\xi}]$, where $\boldsymbol{\xi}$ is the displacement vector and $\mathbf{F}[\boldsymbol{\xi}]$ is the MHD force operator (6.3). For this reason, ideal MHD stability codes can be augmented for perturbed equilibrium calculations. One approach is based on the Direct Criterion of Newcomb (DCON) code [80], which minimizes the potential energy by solving an Euler-Lagrange equation for the displacement vector. This method has been extended with the Ideal Perturbed Equilibrium Code (IPEC) [182, 183], which couples applied plasma boundary perturbations to perturbations of currents in the vacuum region. This code models axisymmetry-breaking perturbations on tokamak equilibria for the study of mode-locking [61] and neoclassical toroidal viscosity (NTV) [152]. Modification of DCON is currently underway to enable stability calculations for stellarators with stepped-pressure equilibria [81].

The Code for the Analysis of the MHD Stability of 3D Equilibria (CAS3D) has similarly been modified for perturbed MHD equilibrium calculations. To evaluate ideal MHD stability, CAS3D solves an eigenvalue problem to obtain a minimum of $W_P[\boldsymbol{\xi}]/W_K[\boldsymbol{\xi}]$, where $W_K[\boldsymbol{\xi}] = \frac{1}{2}\int_{V_P} d^3x \, \rho |\boldsymbol{\xi}|^2$ is the kinetic energy associated with the displacement vector $\boldsymbol{\xi}$ and $\rho$ is the density. As perturbed equilibria are stationary points of an energy functional similar to $W_P[\boldsymbol{\xi}]$, not $W_P[\boldsymbol{\xi}]/W_K[\boldsymbol{\xi}]$, such stability codes based on eigenvalue calculations need to be modified in order to compute perturbed equilibrium states. The CAS3D code allows the option to normalize $W_P[\boldsymbol{\xi}]$ by a modified energy functional such that perturbed equilibrium states can be computed [28, 173]. 
This technique has been used to study the effect of boundary perturbations on magnetic island width [174].

While several 3D MHD stability codes exist [7, 204, 219], they cannot be directly used to compute perturbed equilibrium states relevant for stellarator optimization problems. For stability studies, it is often sufficient to consider only symmetry-breaking modes (modes that break period symmetry or stellarator symmetry), while optimization is typically performed assuming preservation of symmetry. Furthermore, none of the existing codes enables the addition of a general bulk force perturbation as is required for our adjoint approach.

There are additional limitations that motivate us to consider the development of an independent linearized equilibrium code. The DCON and CAS3D approaches minimize their respective energy functionals assuming that the displacement vector is divergenceless. This assumption implies that $\langle \boldsymbol{\xi} \cdot \nabla \psi \rangle_\psi$ vanishes [153, 204], where $\langle \dots \rangle_\psi$ is the flux-surface average (A.10). This places a significant restriction on $\xi^\psi \equiv \boldsymbol{\xi} \cdot \nabla \psi$ that cannot generally be satisfied in addition to the Euler-Lagrange equation. Therefore, modes that are constrained by $\langle \xi^\psi \rangle_\psi = 0$ cannot be included in the Euler-Lagrange equation. In axisymmetry, this disallows the toroidal mode number $n = 0$. In stellarator geometry with discrete $N_P$-symmetry, this disallows modes where $n$ is an integer multiple of $N_P$ (sometimes called the $N = 0$ mode family [204]). This assumption is valid for stability problems, as such modes corresponding to fixed-boundary perturbations are always stable [204]. However, for stellarator optimization and tolerance calculations, these modes cannot be ignored. Rather than assume that $\nabla \cdot \boldsymbol{\xi} = 0$, for adjoint calculations it is much more convenient to assume that $\boldsymbol{\xi} \cdot \mathbf{B} = 0$, which enables the inclusion of these modes. 
Finally, the postprocessing of results diﬀers sig-niﬁcantly between stability and perturbed equilibria applications. The development of sucha 3D perturbed equilibrium code could substantially reduce the computational complexityof gradient-based optimization by enabling the application of the adjoint approach to manycritical objective functions. Such a tool would also allow for the analysis of the response ofan equilibrium to boundary perturbations without resorting to a full nonlinear calculation.This capability would improve ﬁxed-boundary optimization when an adjoint method is notavailable for sensitivity and tolerance studies.In Section 6.2, we present the proposed method to compute linearized equilibrium stateswith the addition of an arbitrary bulk force. This method is based on a variational principlesimilar to that used in the DCON code. In Section 6.3, we analyze the behavior of classes ofmodes of the displacement vector in the simpliﬁed geometry of a screw pinch. In this way,we highlight key numerical challenges and proposed solution methods. Finally, in Section6.4, we demonstrate this method for the computation of the shape gradient of a ﬁgure of This assumption is made in the original version of CAS3D [204]. There exists the option to retain theterms in the energy functional involving ∇ · ξ in a more recent version [172]. This arises from noting (cid:104)∇ · ξ (cid:105) ψ = V (cid:48) ( ψ ) − d/dψ (cid:0) V (cid:48) ( ψ ) (cid:104) ξ · ∇ ψ (cid:105) ψ (cid:1) , thus V (cid:48) ( ψ ) (cid:104) ξ · ∇ ψ (cid:105) ψ must be aconstant. As ξ · ∇ ψ must vanish at the origin due to regularity while V (cid:48) ( ψ ) is ﬁnite at the origin, thequantity V (cid:48) ( ψ ) (cid:104) ξ · ∇ ψ (cid:105) ψ = 0. We consider a base equilibrium magnetic ﬁeld satisfying MHD force balance,( ∇ × B ) × B = µ ∇ p, (6.1)with prescribed pressure p ( ψ ) and rotational transform ι ( ψ ). 
We would like to compute linearizations about this state satisfying,
$$ \mathbf{F}[\boldsymbol{\xi}] + \delta \mathbf{F} = 0, \tag{6.2} $$
where the MHD force operator is
$$ \mathbf{F}[\boldsymbol{\xi}] = \frac{\left(\nabla \times \delta \mathbf{B}[\boldsymbol{\xi}]\right) \times \mathbf{B}}{\mu_0} + \frac{(\nabla \times \mathbf{B}) \times \delta \mathbf{B}[\boldsymbol{\xi}]}{\mu_0} - \nabla \left(\delta p[\boldsymbol{\xi}]\right), \tag{6.3} $$
and $\delta \mathbf{F}$ is a bulk force perturbation. The perturbed magnetic field can be expressed in terms of the displacement vector $\boldsymbol{\xi}$,
$$ \delta \mathbf{B}[\boldsymbol{\xi}] = \nabla \times (\boldsymbol{\xi} \times \mathbf{B}), \tag{6.4} $$
under the assumption that the rotational transform $\iota(\psi)$ is preserved by the perturbation. In this Chapter, we will not consider the effect of perturbations to the rotational transform, although such effects are necessary to compute the shape gradient of certain figures of merit. Assuming the pressure profile is held fixed under the perturbation, we can also express the perturbation to the local pressure in terms of the displacement vector,
$$ \delta p[\boldsymbol{\xi}] = -\boldsymbol{\xi} \cdot \nabla p. \tag{6.5} $$
The linearized force balance equation is solved subject to a boundary condition,
$$ \boldsymbol{\xi} \cdot \hat{\mathbf{n}} \big|_{S_P} = \delta \mathbf{x} \cdot \hat{\mathbf{n}}, \tag{6.6} $$
for a prescribed boundary perturbation $\delta \mathbf{x} \cdot \hat{\mathbf{n}}$. We can express this PDE (6.2) with boundary condition (6.6) in an equivalent variational form involving the energy functional,
$$ W[\boldsymbol{\xi}] = \int_{V_P} d^3x \, \boldsymbol{\xi} \cdot \left(\mathbf{F}[\boldsymbol{\xi}] + 2 \delta \mathbf{F}\right) + \frac{1}{\mu_0} \int_{S_P} d^2x \, \hat{\mathbf{n}} \cdot \left(\boldsymbol{\xi} \, \delta \mathbf{B}[\boldsymbol{\xi}]\right) \cdot \mathbf{B}. \tag{6.7} $$
Stationary points of $W[\boldsymbol{\xi}]$ subject to the boundary condition (6.6) are equivalent to solutions of (6.2). While (6.2) is a coupled set of PDEs involving two components of the displacement vector, the application of the variational principle will allow us to arrive at an Euler-Lagrange equation that is a coupled set of ODEs for one component of the displacement vector.

We now demonstrate that stationary points of (6.7) with respect to $\boldsymbol{\xi}$ subject to the boundary condition (6.6) indeed correspond with solutions of (6.2). 
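A finite-dimensional analogue may help fix ideas. In the sketch below, which is only an illustration, a symmetric matrix `F` stands in for the self-adjoint force operator, a vector `dF` for the bulk force, and the boundary term is omitted (as for a homogeneous boundary condition). All names and the random test data are assumptions. The quadratic form `W(xi) = xi . (F xi + 2 dF)` then has gradient `2 (F xi + dF)`, so its stationary point solves the linear equation `F xi + dF = 0`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Symmetric (self-adjoint) stand-in for the force operator, made
# negative definite so that the linear system is well conditioned.
M = rng.standard_normal((n, n))
F = -(M @ M.T + n * np.eye(n))
dF = rng.standard_normal(n)  # stand-in for the bulk force perturbation


def W(xi):
    """Discrete analogue of the energy functional without boundary term."""
    return xi @ (F @ xi + 2.0 * dF)


def grad_W(xi, h=1e-6):
    """Central finite-difference gradient of W with respect to xi."""
    g = np.zeros_like(xi)
    for i in range(xi.size):
        e = np.zeros_like(xi)
        e[i] = h
        g[i] = (W(xi + e) - W(xi - e)) / (2.0 * h)
    return g


# Solution of the linear 'force balance' F xi + dF = 0.
xi_eq = np.linalg.solve(F, -dF)

if __name__ == "__main__":
    print("max |grad W| at the linear solution:", np.abs(grad_W(xi_eq)).max())
```

The symmetry of `F` is what makes the gradient equal to `2 (F xi + dF)`; for a non-symmetric matrix the gradient would involve `F + F.T` and the stationary point would no longer solve the original linear system.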
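We perform the first variation with respect to $\boldsymbol{\xi}$,
$$ \delta W[\boldsymbol{\xi}; \delta \boldsymbol{\xi}] = \int_{V_P} d^3x \left( \delta \boldsymbol{\xi} \cdot \left(\mathbf{F}[\boldsymbol{\xi}] + 2 \delta \mathbf{F}\right) + \boldsymbol{\xi} \cdot \mathbf{F}[\delta \boldsymbol{\xi}] \right) + \frac{1}{\mu_0} \int_{S_P} d^2x \, \hat{\mathbf{n}} \cdot \left( \delta \boldsymbol{\xi} \, \delta \mathbf{B}[\boldsymbol{\xi}] + \boldsymbol{\xi} \, \delta \mathbf{B}[\delta \boldsymbol{\xi}] \right) \cdot \mathbf{B}. \tag{6.8} $$
We now apply the self-adjointness of the MHD force operator (5.7), repeated here for convenience,
$$ \int_{V_P} d^3x \left( \boldsymbol{\xi}_1 \cdot \mathbf{F}[\boldsymbol{\xi}_2] - \boldsymbol{\xi}_2 \cdot \mathbf{F}[\boldsymbol{\xi}_1] \right) - \frac{1}{\mu_0} \int_{S_P} d^2x \, \hat{\mathbf{n}} \cdot \left( \boldsymbol{\xi}_2 \, \delta \mathbf{B}[\boldsymbol{\xi}_1] \cdot \mathbf{B} - \boldsymbol{\xi}_1 \, \delta \mathbf{B}[\boldsymbol{\xi}_2] \cdot \mathbf{B} \right) = 0, \tag{6.9} $$
to obtain,
$$ \delta W[\boldsymbol{\xi}; \delta \boldsymbol{\xi}] = 2 \int_{V_P} d^3x \, \delta \boldsymbol{\xi} \cdot \left(\mathbf{F}[\boldsymbol{\xi}] + \delta \mathbf{F}\right), \tag{6.10} $$
where the boundary term vanishes due to (6.6). As $\delta W[\boldsymbol{\xi}; \delta \boldsymbol{\xi}]$ must vanish for any $\delta \boldsymbol{\xi}$, we obtain (6.2) as our Euler-Lagrange equation. Thus stationary points of $W[\boldsymbol{\xi}]$ correspond with solutions of (6.2).

We can now obtain a simplified Euler-Lagrange equation from manipulations of our energy functional (6.7). A vector identity is applied in order to obtain,
$$ W[\boldsymbol{\xi}] = \int_{V_P} d^3x \left[ -\frac{\delta \mathbf{B}[\boldsymbol{\xi}] \cdot \delta \mathbf{B}[\boldsymbol{\xi}]}{\mu_0} + \boldsymbol{\xi} \cdot \mathbf{J} \times \delta \mathbf{B}[\boldsymbol{\xi}] + \boldsymbol{\xi} \cdot \nabla (\boldsymbol{\xi} \cdot \nabla p) + 2 \boldsymbol{\xi} \cdot \delta \mathbf{F} \right]. \tag{6.11} $$
The energy functional now does not depend on second derivatives of the displacement vector. This form of the energy functional is further simplified in Appendix P. We apply another vector identity to obtain,
$$ W[\boldsymbol{\xi}] = \int_{V_P} d^3x \left[ -\frac{\delta \mathbf{B}[\boldsymbol{\xi}] \cdot \delta \mathbf{B}[\boldsymbol{\xi}]}{\mu_0} + \boldsymbol{\xi} \cdot \mathbf{J} \times \delta \mathbf{B}[\boldsymbol{\xi}] - (\boldsymbol{\xi} \cdot \nabla p) \nabla \cdot \boldsymbol{\xi} + 2 \boldsymbol{\xi} \cdot \delta \mathbf{F} \right] - \int_{S_P} d^2x \, \boldsymbol{\xi} \cdot \hat{\mathbf{n}} \, \boldsymbol{\xi} \cdot \nabla p. \tag{6.12} $$
We can drop this boundary term, as variations that respect the boundary condition (6.6) will automatically make it vanish. We note that this energy functional is the same (to within overall constants) as (12) in [80] if $\gamma = 0$, though we have allowed for the inclusion of an additional bulk force.

Minimization of $W[\boldsymbol{\xi}]$ is performed upon expressing the magnetic field in a magnetic coordinate system (Appendix A.3),
$$ \mathbf{B} = \nabla \psi \times \nabla \vartheta - \iota(\psi) \nabla \psi \times \nabla \varphi. \tag{6.13} $$
From the assumption that $\boldsymbol{\xi} \cdot \mathbf{B} = 0$, in such a coordinate system, the energy functional only depends on the radial,
$$ \xi^\psi = \boldsymbol{\xi} \cdot \nabla \psi, \tag{6.14} $$
and in-surface,
$$ \xi^\alpha = \boldsymbol{\xi} \cdot \left( \nabla \vartheta - \iota(\psi) \nabla \varphi \right), \tag{6.15} $$
components of the displacement vector. Furthermore, we note that no radial derivatives of $\xi^\alpha$ appear in the energy functional, as we can express the perturbed magnetic field as,
$$ \delta \mathbf{B} = \nabla \xi^\alpha \times \nabla \psi + \nabla \times \left( \xi^\psi \left( \iota(\psi) \nabla \varphi - \nabla \vartheta \right) \right). \tag{6.16} $$
Upon further manipulations of the energy functional (Appendix P), we also note that $\xi^\alpha$ only appears under derivatives with respect to $\vartheta$ and $\varphi$ in the first three terms of the energy functional (6.11). Given certain constraints on the bulk force perturbation that can always be satisfied (Appendix Q), we are free to choose $\int_0^{2\pi} d\vartheta \int_0^{2\pi} d\varphi \, \xi^\alpha = 0$ on all surfaces. This reflects the fact that constant shifts of $\xi^\alpha$ on a surface do not change the perturbed magnetic field.

We express the radial component of the displacement vector in a Fourier series,
$$ \xi^\psi(\psi, \vartheta, \varphi) = \sum_{m,n} \left( \xi^{\psi c}_{m,n}(\psi) \cos(m\vartheta - n\varphi) + \xi^{\psi s}_{m,n}(\psi) \sin(m\vartheta - n\varphi) \right) = \boldsymbol{\Xi}_\psi \cdot \boldsymbol{\mathcal{F}}_\psi. \tag{6.17} $$
Here $\boldsymbol{\Xi}_\psi$ is interpreted as a vector of Fourier amplitudes and $\boldsymbol{\mathcal{F}}_\psi$ is a vector of the Fourier basis functions. We similarly expand $\xi^\alpha$ in a Fourier series,
$$ \xi^\alpha = \sum_{m,n;\, \max(|m|,|n|) \neq 0} \left( \xi^{\alpha c}_{m,n}(\psi) \sin(m\vartheta - n\varphi) + \xi^{\alpha s}_{m,n}(\psi) \cos(m\vartheta - n\varphi) \right) = \boldsymbol{\Xi}_\alpha \cdot \boldsymbol{\mathcal{F}}_\alpha. \tag{6.18} $$
As we are free to shift $\xi^\alpha$ by a constant on each surface, we can take the $m = 0$, $n = 0$ mode of $\xi^\alpha$ to vanish. If the equilibrium geometric quantities have a definite parity with respect to $\vartheta$ and $\varphi$ and the prescribed boundary perturbation and bulk force perturbation maintain this parity, then $\xi^\psi$ will have the same parity as the equilibrium and $\xi^\alpha$ will have the opposite parity. 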
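For example, if the equilibrium is stellarator symmetric [53] (the cylindrical coordinates satisfy $R(\psi, -\vartheta, -\varphi) = R(\psi, \vartheta, \varphi)$ and $Z(\psi, -\vartheta, -\varphi) = -Z(\psi, \vartheta, \varphi)$) and this parity is maintained by the perturbation, only the cosine series is needed for $\xi^\psi$ and the sine series is needed for $\xi^\alpha$. We will assume stellarator symmetry for the remainder of this Chapter for simplicity of the presentation.

We similarly express the bulk force perturbation in a magnetic coordinate system,
$$ \delta \mathbf{F} = \delta F_\psi \nabla \psi + \delta F_\alpha \left( \nabla \vartheta - \iota(\psi) \nabla \varphi \right). \tag{6.19} $$
This results from the parallel force balance condition (6.2), which implies that $\delta \mathbf{F} \cdot \hat{\mathbf{b}} = 0$.

The energy functional can be expressed schematically as,
$$ W[\boldsymbol{\Xi}_\psi, \boldsymbol{\Xi}_\alpha] = \int_{V_P} d\psi \left[ \boldsymbol{\Xi}'_\psi(\psi) \cdot \left( A_{\psi'\psi'} \boldsymbol{\Xi}'_\psi(\psi) \right) + \boldsymbol{\Xi}_\psi \cdot \left( A_{\psi\psi} \boldsymbol{\Xi}_\psi + A_{\psi\psi'} \boldsymbol{\Xi}'_\psi(\psi) + \mathbf{I}_\psi \right) + \boldsymbol{\Xi}_\alpha \cdot \left( A_{\alpha\alpha} \boldsymbol{\Xi}_\alpha + A_{\alpha\psi'} \boldsymbol{\Xi}'_\psi(\psi) + A_{\alpha\psi} \boldsymbol{\Xi}_\psi + \mathbf{I}_\alpha \right) \right], \tag{6.20} $$
upon integration over $\vartheta$ and $\varphi$. Explicit forms for the coefficient matrices are provided in Appendix P.

We now perform variations with respect to the in-surface component,
$$ \delta W[\boldsymbol{\Xi}_\psi, \boldsymbol{\Xi}_\alpha; \delta \boldsymbol{\Xi}_\alpha] = \int_{V_P} d\psi \, \delta \boldsymbol{\Xi}_\alpha \cdot \left[ 2 A_{\alpha\alpha} \boldsymbol{\Xi}_\alpha + A_{\alpha\psi'} \boldsymbol{\Xi}'_\psi(\psi) + A_{\alpha\psi} \boldsymbol{\Xi}_\psi + \mathbf{I}_\alpha \right], \tag{6.21} $$
where we have noted that $A_{\alpha\alpha}$ can be made symmetric due to the self-adjointness of the MHD force operator. (The explicit form given in Appendix P is evidently symmetric.) Thus the in-surface component can be expressed in terms of the radial component of the displacement vector using the corresponding Euler-Lagrange equation,
$$ 2 A_{\alpha\alpha} \boldsymbol{\Xi}_\alpha + A_{\alpha\psi'} \boldsymbol{\Xi}'_\psi(\psi) + A_{\alpha\psi} \boldsymbol{\Xi}_\psi + \mathbf{I}_\alpha = 0. \tag{6.22} $$
As shown in Appendix P, $A_{\alpha\alpha}$ is invertible, so substituting $\boldsymbol{\Xi}_\alpha = -\frac{1}{2} A^{-1}_{\alpha\alpha} \left( A_{\alpha\psi'} \boldsymbol{\Xi}'_\psi + A_{\alpha\psi} \boldsymbol{\Xi}_\psi + \mathbf{I}_\alpha \right)$ into (6.20) we find the reduced energy functional to be,
$$ W[\boldsymbol{\Xi}_\psi] = \int_{V_P} d\psi \left[ \boldsymbol{\Xi}_\psi \cdot \left( C_{\psi\psi} \boldsymbol{\Xi}_\psi + C_{\psi\psi'} \boldsymbol{\Xi}'_\psi(\psi) + \mathbf{K}_\psi \right) + \boldsymbol{\Xi}'_\psi(\psi) \cdot \left( C_{\psi'\psi'} \boldsymbol{\Xi}'_\psi(\psi) + \mathbf{K}_{\psi'} \right) - \frac{1}{4} \mathbf{I}_\alpha \cdot A^{-1}_{\alpha\alpha} \mathbf{I}_\alpha \right], \tag{6.23} $$
with,
$$ C_{\psi\psi} = A_{\psi\psi} - \frac{1}{4} A^T_{\alpha\psi} A^{-1}_{\alpha\alpha} A_{\alpha\psi} \tag{6.24a} $$
$$ C_{\psi\psi'} = A_{\psi\psi'} - \frac{1}{2} A^T_{\alpha\psi} A^{-1}_{\alpha\alpha} A_{\alpha\psi'} \tag{6.24b} $$
$$ C_{\psi'\psi'} = A_{\psi'\psi'} - \frac{1}{4} A^T_{\alpha\psi'} A^{-1}_{\alpha\alpha} A_{\alpha\psi'} \tag{6.24c} $$
$$ \mathbf{K}_\psi = \mathbf{I}_\psi - \frac{1}{2} A^T_{\alpha\psi} A^{-1}_{\alpha\alpha} \mathbf{I}_\alpha \tag{6.24d} $$
$$ \mathbf{K}_{\psi'} = -\frac{1}{2} A^T_{\alpha\psi'} A^{-1}_{\alpha\alpha} \mathbf{I}_\alpha. \tag{6.24e} $$
We now perform variations with respect to $\boldsymbol{\Xi}_\psi$,
$$ \delta W[\boldsymbol{\Xi}_\psi; \delta \boldsymbol{\Xi}_\psi] = \int_{V_P} d\psi \, \delta \boldsymbol{\Xi}_\psi \cdot \left[ 2 C_{\psi\psi} \boldsymbol{\Xi}_\psi + C_{\psi\psi'} \boldsymbol{\Xi}'_\psi(\psi) + \mathbf{K}_\psi - \frac{d}{d\psi} \left( C^T_{\psi\psi'} \boldsymbol{\Xi}_\psi + 2 C_{\psi'\psi'} \boldsymbol{\Xi}'_\psi(\psi) + \mathbf{K}_{\psi'} \right) \right], \tag{6.25} $$
to obtain the following Euler-Lagrange equation,
$$ 2 C_{\psi\psi} \boldsymbol{\Xi}_\psi + C_{\psi\psi'} \boldsymbol{\Xi}'_\psi(\psi) + \mathbf{K}_\psi - \frac{d}{d\psi} \left( C^T_{\psi\psi'} \boldsymbol{\Xi}_\psi + 2 C_{\psi'\psi'} \boldsymbol{\Xi}'_\psi(\psi) + \mathbf{K}_{\psi'} \right) = 0. \tag{6.26} $$
We define our vector of unknowns as,
$$ \overrightarrow{u} = \begin{pmatrix} \boldsymbol{\Xi}_\psi \\ C^T_{\psi\psi'} \boldsymbol{\Xi}_\psi + 2 C_{\psi'\psi'} \boldsymbol{\Xi}'_\psi(\psi) \end{pmatrix}, \tag{6.27} $$
so that our Euler-Lagrange equation takes the form $\overleftrightarrow{L}_0 \overrightarrow{u} + \overleftrightarrow{L}_1 \overrightarrow{u}'(\psi) + \overrightarrow{b} = 0$, with,
$$ \overleftrightarrow{L}_0 = \begin{pmatrix} C^T_{\psi\psi'} & -I \\ 2 C_{\psi\psi} & 0 \end{pmatrix} \tag{6.28a} $$
$$ \overleftrightarrow{L}_1 = \begin{pmatrix} 2 C_{\psi'\psi'} & 0 \\ C_{\psi\psi'} & -I \end{pmatrix} \tag{6.28b} $$
$$ \overrightarrow{b} = \begin{pmatrix} 0 \\ \mathbf{K}_\psi - \mathbf{K}'_{\psi'}(\psi) \end{pmatrix}. \tag{6.28c} $$
Currently this is an implicit system of differential equations. When $\overleftrightarrow{L}_1$ is invertible, this system can be transformed into an explicit system of ODEs. If $\det\left(C_{\psi'\psi'}\right) = 0$ at a point $\psi = \psi_s$ and $C^{-1}_{\psi'\psi'} \sim 1/(\psi - \psi_s)$ to leading order near $\psi_s$, then $\psi_s$ is a regular singular point. At such points, additional care must be taken in obtaining numerical solutions to the Euler-Lagrange equation. 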
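The reduction to a first-order system can be illustrated with a scalar toy problem. The sketch below is not the solver developed in this Chapter. It assumes constant, hypothetical scalar coefficients (`C_pp`, `C_pd`, `C_dd`, `K_p`) in place of the matrices appearing in (6.26), chosen so that the coefficient of the highest derivative never vanishes and no singular point arises. The two block matrices are read off from (6.26) and (6.27), the system is made explicit by inverting the matrix that multiplies the derivative, and the two-point boundary conditions are satisfied by linear shooting with a hand-written RK4 integrator, used here purely for brevity.

```python
import numpy as np

# Hypothetical constant scalar stand-ins for the coefficient matrices.
C_pp, C_pd, C_dd = 1.0, 0.0, 1.0   # C_{psi psi}, C_{psi psi'}, C_{psi' psi'}
K_p, dK_d = -2.0, 0.0              # K_psi and the psi-derivative of K_{psi'}

# Unknowns u = (Xi, C_pd * Xi + 2 * C_dd * Xi'); system L0 u + L1 u' + b = 0.
L0 = np.array([[C_pd, -1.0], [2.0 * C_pp, 0.0]])
L1 = np.array([[2.0 * C_dd, 0.0], [C_pd, -1.0]])
b = np.array([0.0, K_p - dK_d])


def rhs(u):
    """Explicit form u' = -L1^{-1} (L0 u + b), valid while L1 is invertible."""
    return -np.linalg.solve(L1, L0 @ u + b)


def integrate(u0, n_steps=200):
    """Classical RK4 integration of the explicit system from psi = 0 to 1."""
    h = 1.0 / n_steps
    u = np.array(u0, dtype=float)
    for _ in range(n_steps):
        k1 = rhs(u)
        k2 = rhs(u + 0.5 * h * k1)
        k3 = rhs(u + 0.5 * h * k2)
        k4 = rhs(u + h * k3)
        u = u + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return u


def solve_by_shooting(xi_edge):
    """Impose Xi(0) = 0 and Xi(1) = xi_edge using linearity in u[1](0)."""
    end0 = integrate([0.0, 0.0])[0]
    end1 = integrate([0.0, 1.0])[0]
    s = (xi_edge - end0) / (end1 - end0)  # required initial value of u[1]
    return s, integrate([0.0, s])


if __name__ == "__main__":
    s, u_end = solve_by_shooting(0.3)
    print("u[1](0) =", s, "  Xi(1) =", u_end[0])
```

With these coefficients the system reduces to `Xi'' = Xi - 1`, which has a closed-form solution and so provides a check on the assembly of the two matrices. Near a surface where the highest-derivative coefficient vanishes the inversion of `L1` fails, which is where the series treatment of the singular points would be required instead.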
In analogy with regular singular points of an uncoupled ODE, power series solutions can be constructed near $\psi_s$ using a matrix form of Frobenius analysis (Chapter 4 in [41]). As discussed in [80], for the Euler-Lagrange equation under consideration, such singular points occur when $\psi = 0$, $\iota = 0$, or $m\iota(\psi) - n = 0$ for any $m$ and $n$ included in the spectrum for $\xi^\psi$ and $\xi^\alpha$. This singular behavior is discussed in more detail in Section 6.3.

This coupled set of second-order ODEs is solved with a boundary condition of $\Xi_\psi(0) = 0$ and $\Xi_\psi(\psi_0)$ specified according to the prescribed boundary perturbation,

\[ \xi^{\psi c}_{m,n}(\psi_0) = \frac{\int_0^{2\pi} d\vartheta \int_0^{2\pi} d\phi\; \delta\mathbf{x}\cdot\nabla\psi\, \cos(m\vartheta - n\phi)}{\int_0^{2\pi} d\vartheta \int_0^{2\pi} d\phi\; \cos^2(m\vartheta - n\phi)}, \tag{6.29} \]

where $\psi_0$ is the flux label on the plasma boundary $S_P$. As $\nabla\psi$ vanishes at the origin, we require that $\Xi_\psi(0) = 0$ such that the displacement vector remains finite.

The approach presented in this Section is very similar to the DCON approach, with several important distinctions. (1) Rather than assuming $\nabla\cdot\boldsymbol{\xi} = 0$, we have assumed $\hat{\mathbf{b}}\cdot\boldsymbol{\xi} = 0$. This allows us to include $n = 0$ modes in our displacement vector in axisymmetry and $n$ that are an integer multiple of the number of periods in $N_P$ symmetry. (2) We have allowed for the inclusion of a general bulk force, provided it is consistent with the conventions we have adopted for our displacement vector ($\hat{\mathbf{b}}\cdot\boldsymbol{\xi} = 0$ and $\xi^{\alpha c}_{0,0} = 0$). (3) DCON solves an initial value problem by integrating a set of linearly-independent solutions that are regular at the axis. We instead solve a BVP. (4) Our treatment of singular surfaces differs slightly from that of DCON, as is described in Section 6.3.4.

To further analyze the behavior of the solutions to the linearized equilibrium equations, we will consider the simplified geometry of a one-dimensional screw pinch. A screw pinch is an infinite cylindrical device with field lines that lie on surfaces of constant radius $r$.
Theﬁeld lines generally have both a toroidal (ˆ z ) and poloidal ( ˆ θ ) component. We assume acylindrical coordinate system with ˆ r × ˆ θ · ˆ z = 1 where all equilibrium quantities only dependon r . The inﬁnite length of a screw pinch is approximated by a cylindrical torus with majorradius R (cid:29) B = ψ (cid:48) ( r ) (cid:32) ˆ z r + ι ( r ) ˆ θ R (cid:33) . (6.30)Here ψ ( r ) is the toroidal ﬂux label,2 πψ ( r ) = (cid:90) π dθ (cid:90) r dr (cid:48) r (cid:48) B · ˆ z , (6.31)and ι ( r ) is the rotational transform, ι ( r ) = R B · ∇ θ B · ∇ z , (6.32)the number of poloidal rotations of the ﬁeld line through a z displacement of 2 πR . We notethat θ and z/R are magnetic coordinates for this system. The MHD force balance equation(6.1) for this geometry becomes, ddr (cid:18) µ p ( r ) + 12 r (cid:0) ψ (cid:48) ( r ) (cid:1) (cid:19) + ι ( r ) ψ (cid:48) ( r ) rR ddr (cid:0) rι ( r ) ψ (cid:48) ( r ) (cid:1) = 0 , (6.33)where ι ( ψ ), p ( ψ ) and ψ ≡ ψ ( r = 1) are prescribed. The solution is obtained for r ∈ [0 , ψ ( r = 0) = 0.Due to the toroidal and poloidal symmetry of this equilibrium, each of the Fourier modesof the displacement vector decouple from each other, and we can consider each mode indepen-dently. Although the Euler-Lagrange equation is solved for ξ ψ ( ψ ), it is more straightforwardto analyze the nature of the solutions in terms of ξ r ( r ) = ξ · ∇ r . Thus we will discuss theEuler-Lagrange equation in terms of modes of ξ r , (cid:16) ξ rcm,n (cid:17) (cid:48)(cid:48) ( r ) = B ( r ) (cid:16) ξ rcm,n (cid:17) (cid:48) ( r ) + B ( r ) ξ rcm,n ( r ) + B ( r ) . (6.34)We consider a bulk force perturbation of the form, δ F = (cid:88) m,n δF m,nrc ( r ) cos (cid:18) mθ − n zR (cid:19) ˆ r + δF m,nαs ( r ) sin (cid:18) mθ − n zR (cid:19) (cid:18) r ˆ θ − ι ( r ) R ˆ z (cid:19) , (6.35)133nd a boundary condition given by, ξ r (1) = (cid:88) m,n ξ rcm,n (1) cos (cid:18) mθ − n zR (cid:19) . 
(6.36) m = 0 , n = 0 mode We begin with a discussion of the m = 0, n = 0 mode. The coeﬃcients appearing in theEuler-Lagrange equation (6.34) become, B ( r ) = R − r ι ( r )( ι ( r ) + 2 rι (cid:48) ( r )) r ( R + r ι ( r ) ) − ψ (cid:48)(cid:48) ( r ) ψ (cid:48) ( r ) (6.37a) B ( r ) = (3 R − r ι ( r ) ) ψ (cid:48) ( r ) − rR ψ (cid:48)(cid:48) ( r ) r ( R + r ι ( r ) ) ψ (cid:48) ( r ) (6.37b) B ( r ) = − µ r δF , rc ( r )(1 + r ι ( r ) /R ) ψ (cid:48) ( r ) . (6.37c)We note that the Euler-Lagrange equation exhibits regular singular behavior at r = 0. Tostudy the regular singular behavior near the axis in more detail, we expand the toroidal ﬂuxas, ψ ( r ) = ψ r + O ( r ) , (6.38)where ψ is some constant, which follows from noting that ψ ( r ) must be even in r from(6.33). From the indicial equation for the homogeneous problem with B ( r ) = 0, we ﬁndthe leading order behavior to be ξ rc , ( r ) ∼ r ± near the origin. The negative root will beexcluded given our boundary condition on the axis; thus, we expect a smooth solution forthe radial displacement vector. The leading order behavior of the inhomogeneous problemwill depend on the bulk force perturbation of interest.We ﬁrst demonstrate a perturbed equilibrium with an imposed boundary perturbationand no force perturbation, ξ rc , (1) = 1 δF , rc ( r ) = 0 . (6.39)The boundary value problem is solved with MATLAB’s bvp4c routine, which employs animplicit Runge-Kutta method with adaptive mesh reﬁnement [128]. Given that the coeﬃ-cients become singular on the axis, the axis is not included on the computational grid, andthe inner boundary condition is imposed at a point near the axis, ψ min . For the calculationsin this Chapter, we use ψ min ∼ − − − . 
(While some numerical methods for BVPs donot require the evaluation of the ODE at the boundary points, such as ﬁnite-diﬀerence orcollocation methods, our numerical method requires evaluation at the origin.)The Euler-Lagrange equation is computed for a VMEC [111] equilibrium, approximatinga screw pinch by imposing a large aspect ratio boundary, R ( ψ , θ b ) = R + a cos( θ b ) Z ( ψ , θ b ) = a sin( θ b ) , (6.40) a = 1 and R = 10 . The angle θ b ∈ [0 , π ] is used to parameterize the boundary.The proﬁles are taken to be p ( ψ ) = 10 − × (cid:0) ψ/ψ (cid:1) + 2 . × ( ψ/ψ ) and ι ( ψ ) =10 + 5 × ( ψ/ψ ) + 2 × ( ψ/ψ ) . The equilibrium ﬂux and proﬁles are presented inFigure 6.1.We compare the numerical solution of the Euler-Lagrange equation with the displacementvector computed from ﬁnite-diﬀerence calculations with the nonlinear VMEC code. Weimpose a perturbed boundary of the form, δR ( ψ , θ b ) = ∆ cos( θ b ) δZ ( ψ , θ b ) = ∆ sin( θ b ) . (6.41)We apply a two-point centered diﬀerence derivative with a step size of ∆ = 10 − . Theresulting displacement vector is computed from, ξ ψ ( ψ, ϑ ) = δR ( ψ, ϑ ) ∂ψ ( R, Z ) ∂R + δZ ( ψ, θ ) ∂ψ ( R, Z ) ∂Z , (6.42)where δR ( ψ, ϑ ) and δZ ( ψ, ϑ ) are the measured changes in the cylindrical coordinates atﬁxed ﬂux label and straight ﬁeld line poloidal angle. The result of the calculation is shownin Figure 6.2, where we observe good agreement between the ﬁnite-diﬀerence and Euler-Lagrange results with a volume-averaged error,∆ V = (cid:82) V P d x (cid:16) ξ r VMEC − ξ r Euler-Lagrange (cid:17) (cid:82) V P d x (cid:0) ξ r VMEC (cid:1) , (6.43)of 2 . × − .We next consider a perturbed equilibrium state corresponding to the addition of a bulkforce in the form of the gradient of a scalar pressure perturbation, ξ rc , (1) = 0 δF , rc ( r ) = − δp (cid:48) ( r ) . (6.44)This type of bulk force perturbation is necessary to compute the shape gradient for thevacuum magnetic well and beta ﬁgures of merit discussed in Chapter 5. 
We take δp ( r ) = p ( r ),the unperturbed pressure proﬁle. The Euler-Lagrange solution is compared with a ﬁnite-diﬀerence VMEC calculation, δp ( ψ ) = ∆ p ( ψ ) , (6.45)computed with a two-point centered-diﬀerence stencil of amplitude ∆ = 10 − . The resultingdisplacement vectors are displayed in Figure 6.3, where we again observe good agreementbetween the linearized solution and its approximation with a ﬁnite-diﬀerence derivative of thenonlinear solution. The volume-averaged fractional diﬀerence (6.43) between the solutionsis found to be 1 . × − . 135 () / R (a) p () (b) r (c) Figure 6.1: Equilibrium (a) rotational transform and (b) pressure proﬁles used for screwpinch calculations. (c) Equilibrium ﬂux computed with these proﬁles.136igure 6.2: Benchmark of screw pinch m = 0, n = 0 mode with applied boundary pertur-bation (6.39). The solution of the Euler-Lagrange equation (6.34) with coeﬃcients (6.37) iscompared with a ﬁnite-diﬀerence VMEC calculation.137igure 6.3: Benchmark of screw pinch m = 0, n = 0 mode with applied pressure pertur-bation (6.44). The solution of the Euler-Lagrange equation (6.34) with coeﬃcients (6.37) iscompared with a ﬁnite-diﬀerence VMEC calculation.138 .3.2 n = 0 , m (cid:54) = 0 modes We next consider the behavior of the n = 0, m (cid:54) = 0 modes. The coeﬃcients appearingthe Euler-Lagrange equation (6.34) are, B ( r ) = − r − ι (cid:48) ( r ) ι ( r ) − ψ (cid:48)(cid:48) ( r ) ψ (cid:48) ( r ) (6.46a) B ( r ) = m − r (6.46b) B ( r ) = − µ R mδF m, rc + δ (cid:0) F m, αs (cid:1) (cid:48) ( r ) mι ( r ) ψ (cid:48) ( r ) . (6.46c)In addition to the regular singular point on the axis, we note that the coeﬃcients becomesingular when ι ( r ) = 0. This class of equilibria is typically not of interest, so we will notconsider this type of singularity. 
Expanding the displacement vector as a power series nearthe origin, we ﬁnd the leading order behavior of the homogeneous solution to be ξ rcm, ∼ r − ± m .As ψ ( r ) ∼ r to leading order near the axis, we note that ξ ψcm, ∼ ψ ±| m | / . In order to satisfythe boundary condition at ψ = 0, the minus solution is excluded. As ξ ψcm, ( ψ ) becomes non-smooth at the origin, additional care must be taken in obtaining the numerical solution. Weﬁnd that the accuracy is improved by solving the BVP on a grid in √ ψ rather than ψ , asthe solution is expected to be a smooth function of √ ψ ( ξ ψcm, ( √ ψ ) ∼ (cid:0) √ ψ (cid:1) m ). To ensure theaccuracy of the coeﬃcients near the axis, we additionally employ a near-axis expansion of theequilibrium equations to O ( r ) (Appendix R). The incorporation of the near-axis solutionbecomes important when linearizing about equilibria computed with the VMEC code, whichexhibits poor resolution near the magnetic axis.To demonstrate this method, we perform a benchmark of the homogeneous problem withan m = 1 boundary perturbation, ξ rc , (1) = 1 δF , rc ( r ) = 0 . (6.47)The same equilibrium proﬁles are used as those in Section 6.3.1. We perform a benchmarkbetween solutions of the Euler-Lagrange equation and ﬁnite-diﬀerence approximations withVMEC equilibria. A boundary perturbation of the form, δR ( ψ , θ b ) = ∆ cos(2 θ b ) δZ ( ψ , θ b ) = ∆ sin(2 θ b ) , (6.48)is imposed. The amplitude of the perturbation is taken to be ∆ = 10 − , and the perturbedequilibrium state is computed with a two-point centered-diﬀerence stencil.The resulting displacement vector is presented in Figure 6.4. We indeed ﬁnd that thedisplacement vector has very sharp derivatives near the origin, though our numerical methodcan reproduce the solution obtained from VMEC. The volume-averaged fractional errorbetween the solutions is found to be ∆ V = 5 . × − .139igure 6.4: Benchmark of screw pinch m = 1, n = 0 mode with applied boundary pertur-bation (6.47). 
The solution of the Euler-Lagrange equation (6.34) with coeﬃcients (6.46) iscompared with a ﬁnite-diﬀerence VMEC calculation.140 .3.3 m = 0 , n (cid:54) = 0 modes We next consider the m = 0, n (cid:54) = 0 modes, for which the coeﬃcients of the Euler-Lagrangeequation take the form, B ( r ) = 1 r − ψ (cid:48)(cid:48) ( r ) ψ (cid:48) ( r ) (6.49a) B ( r ) = 3 r + n R − R ι ( r ) (cid:0) ι ( r ) + rι (cid:48) ( r ) (cid:1) − r R ι ( r ) ) rψ (cid:48) ( r ) ψ (cid:48)(cid:48) ( r ) (6.49b) B ( r ) = − µ r (cid:16) nrδF ,nrc ( r ) + rι ( r ) (cid:0) δF ,nαs (cid:1) (cid:48) ( r ) + δF ,nαs (cid:0) ι ( r ) + rι (cid:48) ( r ) (cid:1)(cid:17) nψ (cid:48) ( r ) . (6.49c)Although the ODE exhibits a regular singular point at the axis, we expect regular behaviorof the homogenous solution near the origin, as the indicial equation implies that ξ rc ,n ( r ) ∼ r . Analytic solutions

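The homogeneous benchmark of this subsection, equations (6.50)-(6.52), can be sketched in a few lines. Written in terms of $r$, the modified Bessel equation (6.51) reads $\xi'' = -\xi'/r + (1/r^2 + n^2/R^2)\,\xi$ with $\xi(0) = 0$ and $\xi(1) = 1$. The sketch below is not the bvp4c collocation solve used for the results in this Chapter: it assumes a second-order finite-difference discretization, which never evaluates the singular coefficients at $r = 0$ and so can keep the axis on the grid. The function names, the grid size, and the values of $n$ and $R$ are illustrative assumptions.

```python
import math


def bessel_i1(x, terms=30):
    """Modified Bessel function I_1(x) from its power series."""
    half = 0.5 * x
    return sum(
        half ** (2 * j + 1) / (math.factorial(j) * math.factorial(j + 1))
        for j in range(terms)
    )


def solve_m0_mode(n=1, R=10.0, N=400):
    """Finite-difference solve of xi'' + xi'/r - (1/r^2 + (n/R)^2) xi = 0.

    Boundary conditions: xi(0) = 0 (regularity on the axis), xi(1) = 1.
    Returns the grid r and the solution xi as lists of length N + 1.
    """
    k = n / R
    h = 1.0 / N
    r = [j * h for j in range(N + 1)]
    sub, diag, sup, rhs = [], [], [], []
    for j in range(1, N):  # interior points only; r = 0 is never evaluated
        sub.append(1.0 / h**2 - 1.0 / (2.0 * h * r[j]))
        diag.append(-2.0 / h**2 - (1.0 / r[j] ** 2 + k**2))
        sup.append(1.0 / h**2 + 1.0 / (2.0 * h * r[j]))
        rhs.append(0.0)
    rhs[-1] -= sup[-1] * 1.0  # move the known value xi(1) = 1 to the right side
    # Thomas algorithm for the tridiagonal system (xi(0) = 0 adds nothing).
    for i in range(1, N - 1):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    interior = [0.0] * (N - 1)
    interior[-1] = rhs[-1] / diag[-1]
    for i in range(N - 3, -1, -1):
        interior[i] = (rhs[i] - sup[i] * interior[i + 1]) / diag[i]
    return r, [0.0] + interior + [1.0]


def max_error(n=1, R=10.0, N=400):
    """Largest pointwise difference from the analytic solution (6.52)."""
    r, xi = solve_m0_mode(n, R, N)
    k = n / R
    return max(abs(x - bessel_i1(k * rr) / bessel_i1(k)) for rr, x in zip(r, xi))
```

Calling `max_error` for a few grid sizes gives a quick convergence check against (6.52); a pointwise maximum is used here rather than the volume-averaged measure (6.43).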
We can compare numerical solutions of the Euler-Lagrange equation with analytic solutions in certain limits. Assuming $\iota = 0$ and $p = 0$, we find that the equilibrium flux (6.33) satisfies $\psi(r) = \psi_0 r^2$. We consider a perturbed equilibrium problem corresponding to a boundary perturbation and no force perturbation,

\[ \xi^{rc}_{0,n}(1) = 1 \qquad \delta F^{0,n}_{rc}(r) = 0. \tag{6.50} \]

In this case, we recover the modified Bessel equation,

\[ \frac{n^2 r^2}{R^2}\left(\xi^{rc}_{0,n}\right)''\!\left(\frac{nr}{R}\right) + \frac{nr}{R}\left(\xi^{rc}_{0,n}\right)'\!\left(\frac{nr}{R}\right) - \left(1 + \frac{n^2 r^2}{R^2}\right)\xi^{rc}_{0,n}\!\left(\frac{nr}{R}\right) = 0. \tag{6.51} \]

The two solutions are $I_1(nr/R)$ and $K_1(nr/R)$, the modified Bessel functions of the first and second kind. As the solution must be finite at the origin we find,

\[ \xi^{rc}_{0,n}(r) = \frac{I_1\!\left(\frac{nr}{R}\right)}{I_1\!\left(\frac{n}{R}\right)}. \tag{6.52} \]

A comparison between the $n = 1$ Euler-Lagrange solution and analytic solution is given in Figure 6.5. The volume-averaged fractional error between the solutions is $\Delta_V$ = 1 . × − .

We now consider the inhomogeneous problem with a bulk force given by $\delta F^{0,n}_{rc}(r) = 1/(r\mu_0)$. In this case, our Euler-Lagrange equation takes the form of an inhomogeneous modified Bessel equation,

\[ \frac{n^2 r^2}{R^2}\left(\xi^{rc}_{0,n}\right)''\!\left(\frac{nr}{R}\right) + \frac{nr}{R}\left(\xi^{rc}_{0,n}\right)'\!\left(\frac{nr}{R}\right) - \left(1 + \frac{n^2 r^2}{R^2}\right)\xi^{rc}_{0,n}\!\left(\frac{nr}{R}\right) + \frac{r}{(2\psi_0)^2} = 0. \tag{6.53} \]

Figure 6.5: Benchmark of screw pinch $m = 0$, $n = 1$ mode with an applied boundary perturbation (6.50). The solution of the Euler-Lagrange equation (6.34) with coefficients (6.49) is compared with an analytic solution (6.52).

Figure 6.6: Benchmark of screw pinch $m = 0$, $n = 1$ mode with a bulk force perturbation $\delta F^{0,1}_{rc} = 1/r$.
The solution of the Euler-Lagrange equation (6.34) with coeﬃcients (6.49) iscompared with an analytic solution (6.54).The solution satisfying the BVP is given by, ξ rc ,n ( r ) = R (2 ψ ) rn I (cid:16) nR (cid:17) (cid:32) rI (cid:18) nrR (cid:19) (cid:32) − R + nK (cid:18) nR (cid:19)(cid:33) + I (cid:18) nR (cid:19) (cid:32) R − nrK (cid:18) nrR (cid:19)(cid:33) (cid:33) . (6.54)We note that xK ( x ) ∼ (cid:0) A + B log( x ) (cid:1) x for constants A and B near x = 0, so ourdisplacement vector is not smooth. We ﬁnd that the numerical solution depends very sensi-tively on the accuracy of the coeﬃcients, and it becomes useful to employ the axis expansiondescribed in Appendix R. We compare the resulting numerical and analytic Euler-Lagrangesolutions in Figure 6.6. The volume-averaged fractional error (6.43) between the numericalEuler-Lagrange solution and analytic solution is ∆ V = 6 . × − .143 .3.4 m (cid:54) = 0 , n (cid:54) = 0 modes Finally, we consider modes with m (cid:54) = 0 and n (cid:54) = 0, for which the Euler-Lagrange coeﬃ-cients take the form, B ( r ) = − r + 2 n rn r + m R + 2 mι (cid:48) ( r ) n − mι ( r ) − ψ (cid:48)(cid:48) ( r ) ψ (cid:48) ( r ) (6.55a) B ( r ) = 2 n rµ p (cid:48) ( r )( n − mι ( r )) ψ (cid:48) ( r ) + n ( − m ) + n r R + m ( m − R r + n n − mι ( r ) n r + m R (6.55b) B ( r ) = − µ n r + m R ( n − mι ( r )) ψ (cid:48) ( r ) δF m,nrc − µ mR + nr ι ( r )( n − mι ( r )) ψ (cid:48) ( r ) ( δF m,nαs ) (cid:48) ( r ) (6.55c) − µ nr (cid:0) − mnR + 2( n r + 2 m R ) ι ( r ) + ( n r + m rR ) ι (cid:48) ( r ) (cid:1) ( n r + m R )( n − mι ( r )) ( ψ (cid:48) ( r )) δF m,nαs . By expanding the solution in a power series, we note the behavior of the solution variesas ξ rcm,n ∼ r m − near the origin. Thus, as for modes with n = 0 and m (cid:54) = 0, ξ ψ will varywith fractional powers of ψ . The numerical treatment of these modes beneﬁts from accuratecalculations of the coeﬃcients with the near-axis expansion. 
In addition to the regularsingular point at r = 0, we note that there will also be a singular point on surfaces where ι ( r ) = n/m .One method to treat singular surfaces relies on a series expansion of the displacementvector within a boundary layer near the singularity. The method of Frobenius yields twoindependent solutions of the second-order ODE, ξ r series ( r ) = A ξ r, ( r ) + A ξ r, ( r ) , (6.56)near a resonant surface at r = r s . A numerical solution of the ODE, ξ r num ( r ) is integratedfrom the axis to the beginning of the boundary layer at r = r s − r b . The two constants, A and A , are ﬁxed by matching the numerical solution and its derivative at r s − r b . Theseries solution is then evaluated at the other edge of the boundary layer at r s + r b . Thenumerical solution is integrated to the plasma boundary at r = 1 using the initial conditions ξ r num ( r s + r b ) = ξ r series ( r s + r b ) and ( ξ r num ) (cid:48) ( r s + r b ) = ( ξ r series ) (cid:48) ( r s + r b ). A shooting methodis used to solve the BVP. This technique is similar to that used in the DCON [80] code.However, in DCON only one independent series solution is considered, as the other is not anelement of the required function space for the generalized Newcomb crossing criteria.While the above method can reproduce the singular behavior of the Euler-Lagrangeequation, as will be demonstrated shortly, it is not always desirable to include such singularbehavior in the Euler-Lagrange solutions. If the perturbed current density varies as ∼ / ( r − r s ) near the rational surface, this will drive inﬁnite classical transport [97], which is144nphysical. An alternative is to smooth the coeﬃcients artiﬁcially as, B smooth1 ( r ) = B ( r )sign( n − mι ( r )) n − mι ( r ) (cid:112) ( n − mι ( r )) + (cid:15) (6.57a) B smooth2 ( r ) = B ( r ) ( n − mι ( r )) ( n − mι ( r )) + (cid:15) , (6.57b)where (cid:15) (cid:28) (cid:15) →

0, the Euler-Lagrange equation remains unchanged. For small but finite $\epsilon$, the coefficients are only modified in the vicinity of $r_s$. This is similar to a technique used in the IPEC [181] code.

Analytic solution near singular surfaces

To study the solutions of the Euler-Lagrange equation with m (cid:54) = 0 and n (cid:54) = 0 further,we consider a limit in which analytic solutions can be obtained. We will take p (cid:48) ( ψ ) = 0 and ι ( r ) = ι r where ι is a constant. In this case the force-balance equation (6.33) gives us thefollowing expression for the ﬂux in terms of hypergeometric functions, ψ ( r ) = r ψ F (cid:16) ; ; ; − r ι R (cid:17) F (cid:16) ; ; ; − ι R (cid:17) . (6.58)We deﬁne a variable r s = n/ ( mι ) such that a singular surface occurs at r = r s . Thecoeﬃcients of the homogeneous problem can be expressed as, B ( r ) = 3 r − r s − r s r − rr s − r + r ι /R − R rR + r r s ι (6.59a) B ( r ) = 1 + m r + 4 r s r − r + m r s ι R + 2 R ( r + r s ) r ( r − r s )( R + r r s ι ) . (6.59b)In the limit of small shear, (cid:15) ι = ι r s /R (cid:28)

1, we can approximate the coeﬃcients as, B ( r ) = 3 r s − rr ( r − r s ) + O (cid:0) (cid:15) ι (cid:1) (6.60a) B ( r ) = m − r + O (cid:0) (cid:15) ι (cid:1) . (6.60b)In practice we choose a very small value for this expansion parameter ( (cid:15) ι ∼ − ) so thatdropping the higher order terms is a very good approximation. For the m = 2, n = 1 modesubject to a boundary perturbation, ξ rc , (1) = 1 δF , rc ( r ) = 0 , (6.61)we have the analytic solution, ξ rc , ( r ) = r Re  F (cid:16) − √

7; 3 + √ , , rr s (cid:17) F (cid:16) − √

7; 3 + √ , , r s (cid:17)  . (6.62)145e ﬁrst consider the case in which r s = 2 such that a singular surface does not appearwithin the volume. We compare the numerical solution of the Euler-Lagrange equation witha ﬁnite-diﬀerence calculation with VMEC. We impose a boundary perturbation of the form, δR ( ψ , θ b , φ ) = ∆ cos(3 θ b − φ ) (6.63a) δZ ( ψ , θ b , φ ) = ∆ sin(3 θ b − φ ) , (6.63b)where φ is the geometric toroidal angle. The perturbed ﬁeld is computed with a two-point centered-diﬀerence stencil with amplitude ∆ = 10 − . The results of the calculationsare shown in Figure 6.7. We note that the Euler-Lagrange solution agrees well with theanalytic solution, with a volume-averaged diﬀerence of ∆ V = 1 . × − , but there is asmall discrepancy between the VMEC solution and the analytic solution near the edge,with a volume-averaged diﬀerence of ∆ V = 9 . × − . One possible source of this erroris the treatment of singularities by the VMEC code. While recent results have indicatedthat VMEC equilibria can exhibit 1 /x -like behavior near rational surfaces [144, 160], thenumerical solution is not truly singular on such surfaces, and very large numerical resolutionis necessary in order to see behavior resembling a singularity. Therefore, we do not expect thedisplacement vector computed with ﬁnite-diﬀerence VMEC to agree with the Euler-Lagrangesolution. Although for this equilibrium, ι does not resonate with the harmonics of thedisplacement vector, it may resonate with other modes present in the nonlinear equilibrium.Next we consider an equilibrium with a singular surface in the volume, r s = 0 .

5. TheEuler-Lagrange equation is solved with both the power-series method, which captures thesingular nature of the solution, and the coeﬃcient smoothing method (6.57) with severalvalues of (cid:15) . Again, we compare with a ﬁnite-diﬀerence VMEC solution with a boundaryperturbation given by (6.63). With the power-series method, we ﬁnd agreement between theEuler-Lagrange and analytic solutions. As expected, the solutions with smoothed coeﬃcientsdo not reproduce the analytic expression. However, neither of these approaches approximatesthe VMEC solution well. Although the VMEC equilibrium is fairly well-resolved (701 ﬂuxsurfaces, 10 − force tolerance, m ≤ | n | ≤ r = r s . We may need to consider a revised treatment of thesingularity to match the behavior from VMEC better. We will now demonstrate the linearized equilibrium technique to compute the shapegradient of the vacuum magnetic well ﬁgure of merit discussed in Chapter 5, f W ( S P ) = (cid:90) V P d x w ( ψ ) , (6.64)with, w ( ψ ) = exp( − ( ψ − ψ m, ) /ψ w ) − exp( − ( ψ − ψ m, ) /ψ w ) , (6.65)146igure 6.7: Benchmark of screw pinch m = 2, n = 1 mode with a boundary perturbation(6.61). The solution of the Euler-Lagrange equation (6.34) with coeﬃcients (6.55) is com-pared with an analytic solution (6.62) and a ﬁnite-diﬀerence calculation from VMEC. Thisequilibrium does not contain a resonant surface within the volume.147igure 6.8: Benchmark of screw pinch m = 2, n = 1 mode with a boundary perturbation(6.61). The solution of the Euler-Lagrange equation (6.34) with coeﬃcients (6.55) is com-pared with an analytic solution (6.62) and a ﬁnite-diﬀerence calculation from VMEC. Thisequilibrium contains a resonant surface at r = 0 . ψ = 0 . ψ m, = 0 . ψ , ψ m, = 0 . ψ , and ψ w = 0 . ψ . The shape gradient of f W is obtainedwith an adjoint approach by computing a perturbed equilibrium state corresponding to theaddition of a bulk force with no displacement of the boundary, δ x · ∇ ψ = 0 δ F = −∇ w ( ψ ) . 
(6.66)The resulting perturbed ﬁeld, δ B [ ξ ], is used to compute the shape gradient, G = δ B [ ξ ] · B µ (cid:12)(cid:12)(cid:12)(cid:12) S P . (6.67)We perform this calculation for an axisymmetric conﬁguration with a plasma boundary givenby, R ( ψ , θ b ) = R + a cos( θ b ) + b cos(2 θ b ) (6.68a) Z ( ψ , θ b ) = a sin( θ b ) − b sin(2 θ b ) , (6.68b)with R = 3, a = 1, and b = 0 .

1. Owing to its toroidal symmetry, all of the toroidalmodes of the displacement vector decouple. Given the toroidal symmetry of the bulk forceperturbation, we only need to consider the n = 0 modes. Therefore, the only singular pointof the Euler-Lagrange equation is at the origin. As before, the magnetic axis is not includedon the computational grid, and the coupled BVP is solved with the bvp4c routine. Theradial displacement vector is computed retaining modes m ≤ δp ( ψ ) = ∆ w ( ψ ) . (6.69)A two-point centered-diﬀerence derivative is computed with magnitude ∆ = 10. The surface-averaged fractional diﬀerence between the Euler-Lagrange and VMEC solutions is computedto be 7 . × − . We have demonstrated a variational method for computing perturbed equilibrium statescorresponding to the addition of a bulk force or boundary perturbation. We considered thesimpliﬁed geometry of a screw pinch to demonstrate the behavior of each of the modes of thedisplacement vector. Numerical solutions of the Euler-Lagrange equation are benchmarkedwith ﬁnite-diﬀerence calculations of the nonlinear equilibrium code, VMEC, and with ana-lytic solutions in certain limits. Finally, we employed this approach to compute the shapegradient of a ﬁgure of merit of interest for stellarator optimization in toroidally symmetricgeometry. We aim to apply this approach for computing such shape gradients in stellaratorgeometry, though this task may be somewhat more challenging. In fully 3D geometry, theremay exist several singular surfaces throughout a volume due to toroidal mode coupling, eachof which needs to be treated carefully,While the Euler-Lagrange equation exhibits singular behavior at rational surfaces, theequilibria computed with the VMEC code do not appear to exhibit any singular response, as149 a) (b)(c)

Figure 6.9: The shape gradient of the vacuum magnetic well (6.64) is computed for a tokamak equilibrium with triangularity (6.68) with the solution of the Euler-Lagrange equation corresponding to the adjoint problem (6.66) and a finite-difference approximation of the adjoint problem with VMEC (6.69).

demonstrated in Section 6.3.4. If the goal is to linearize about VMEC equilibria, we may therefore prefer not to solve the Euler-Lagrange equation exactly, but instead to artificially smooth the coefficients appearing in the ODE. As an alternative, artificial viscosity could be added to the Euler-Lagrange system with the addition of a small term involving a higher-order derivative. This technique, commonly used in the fluid dynamics community [67, 156], turns a singular ODE into an ODE with a singular perturbation. It remains to be demonstrated that the shape gradients obtained from Euler-Lagrange solutions including such smoothing techniques can reproduce the expected shape gradients computed with the VMEC code.

In addition to the demonstration for three-dimensional geometry, there are several interesting extensions of the work discussed in this Chapter. As discussed in Chapter 5, there are several figures of merit for which the adjoint problem requires the addition of a perturbation to the prescribed toroidal current profile. This would necessitate generalizing this formulation to allow for perturbations to the magnetic field that vary the rotational transform profile. While the work in this Chapter has been applied to compute the shape gradient with respect to the plasma boundary, it may be possible to couple perturbations of the boundary to coil perturbations in order to compute the coil shape gradient.
This may benefit from a method similar to that used in the IPEC code, in which the virtual casing principle is applied to couple boundary perturbations to changes in the external magnetic fields.

The further development of this linear equilibrium approach would enable the shape gradient of many additional figures of merit to be computed with an adjoint method. Even if an adjoint method is not applied, the linear equilibrium approach could prove very fruitful for gradient-based, fixed-boundary optimization. Replacing a finite-difference calculation by an analytic derivative may reduce computational cost and noise associated with the finite-difference step size, enabling more efficient sensitivity and tolerance calculations for stellarator configurations.

Chapter 7

Conclusions

In this Thesis, we have aimed to address fundamental challenges (Section 1.4.4) associated with stellarator optimization using the adjoint method and shape sensitivity analysis:

1. Coil complexity
2. Non-convexity
3. High-dimensionality
4. Tight engineering tolerances.

The adjoint method allows us to efficiently compute derivatives in the context of several problems of interest for stellarator optimization. These derivatives enable navigation through high-dimensional, non-convex spaces with gradient-based methods. We demonstrate gradient-based optimization with adjoints in Chapter 3, for the design of coil shapes with minimal complexity. Computing the shape gradient of coil metrics with respect to perturbations of the winding surface allows us to gain intuition about features of configurations that enable simpler coils. We also demonstrate gradient-based optimization of the local magnetic geometry for finite-collisionality neoclassical properties in Chapter 4. While including such objective functions is typically prohibitively expensive for non-convex, high-dimensional optimization, we demonstrate convergence toward a local optimum with a minimal number of function evaluations. With this adjoint method, we also gain intuition into the sensitivity of the bootstrap current and particle fluxes to perturbations in the field strength, informing engineering tolerances. Finally, in Chapter 5 we demonstrate an adjoint method for computing the plasma surface and coil shape gradient for functions that depend on MHD equilibrium solutions. Importantly, the coil shape gradient can be used to evaluate engineering tolerances for such figures of merit (Section 2.1.3). While it has not yet been demonstrated in this Thesis, these shape gradients can also enable efficient adjoint-based optimization, either in the space of the plasma boundary or coil shapes. As discussed in Section 1.4, the direct optimization of coil shapes may result in coils that can be more feasibly engineered than those resulting from the traditional two-step optimization.

For several problems discussed in this Thesis, it is convenient to apply the discrete adjoint method (Section 2.2.1).
For the winding surface optimization problem in Chapter 3, the forward problem is solved as a discrete linear system, so the discrete adjoint operator can be obtained by simply taking the matrix transpose. A similar discrete adjoint method was applied for neoclassical optimization in Chapter 4, as the discretized form of the drift-kinetic equation takes the form of a linear system in the SFINCS code.

Physical insight into the structure of the relevant equations can inform the development of continuous adjoint methods (Section 2.2.2). For the neoclassical application, the adjoint equation was obtained based on an inner product similar to the free-energy norm from gyrokinetic theory. The self-adjointness of the linear Fokker-Planck operator with respect to this inner product enabled straightforward calculation of the adjoint operator. For the MHD application, the adjoint equation is obtained by noting the self-adjointness of the MHD force operator, generalized to allow for perturbations of the rotational transform and currents in the vacuum region. Finally, in Chapter 6, a variational method for solving the adjoint equations obtained in Chapter 5 is presented. Here we are able to borrow a variational method from MHD stability theory to efficiently compute the adjoint equilibrium problem.
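The matrix-transpose construction of the discrete adjoint can be sketched for a generic linear forward problem $A(p)\,x = b$ with objective $f = c\cdot x$. Solving $A^T\lambda = c$ once gives $df/dp = -\lambda\cdot(\partial A/\partial p)\,x$ for every parameter $p$. The 2x2 matrix, its dependence on $p$, and all names below are invented for illustration; they do not represent the winding surface or SFINCS linear systems.

```python
def solve2(M, rhs):
    """Solve a 2x2 linear system M y = rhs with Cramer's rule."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [(rhs[0] * d - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


def forward_matrix(p):
    """Hypothetical forward operator A(p)."""
    return [[2.0 + p, 1.0], [p * p, 3.0]]


def forward_matrix_derivative(p):
    """Analytic derivative dA/dp of the hypothetical operator."""
    return [[1.0, 0.0], [2.0 * p, 0.0]]


B_VEC = [1.0, 2.0]   # right-hand side b (independent of p in this sketch)
C_VEC = [0.5, -1.0]  # objective weights, f = c . x


def objective(p):
    """Solve the forward problem and evaluate f(p) = c . x(p)."""
    x = solve2(forward_matrix(p), B_VEC)
    return C_VEC[0] * x[0] + C_VEC[1] * x[1]


def adjoint_gradient(p):
    """df/dp from one forward solve and one transposed (adjoint) solve."""
    A = forward_matrix(p)
    x = solve2(A, B_VEC)
    A_T = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]  # discrete adjoint operator
    lam = solve2(A_T, C_VEC)
    dA = forward_matrix_derivative(p)
    dA_x = [dA[0][0] * x[0] + dA[0][1] * x[1], dA[1][0] * x[0] + dA[1][1] * x[1]]
    return -(lam[0] * dA_x[0] + lam[1] * dA_x[1])
```

With a single parameter the adjoint solve offers no saving over a finite difference; the benefit appears when there are many parameters, since the same $\lambda$ is reused for each of them.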

There are several natural extensions of the work presented in this Thesis.

• The advancement of the adjoint approach for functions of MHD equilibria necessitates the further development of a linearized equilibrium code, as outlined in Chapter 6. While we have demonstrated this technique for axisymmetric equilibria, we plan to extend it to 3D equilibria. In this way, adjoint methods for computing the shape gradient of the departure from quasi-symmetry (Section 5.5.6), effective ripple (Section 5.5.5), and several finite-collisionality neoclassical quantities (Section 5.5.7) could be demonstrated.

• In Chapter 3, we applied the adjoint method to compute derivatives with respect to the winding surface parameters. Similarly, we can apply the adjoint method to compute derivatives with respect to plasma surface parameters. This would allow for the identification of plasma surfaces that do not require overly-complex coils, facilitating the incorporation of coil considerations in plasma configuration optimization [36]. Similar figures of merit (without derivative information) have been used in the ROSE code [59].

We have not yet taken full advantage of derivative information for stellarator optimization problems.

• The analysis of sensitivity and tolerances presented in this Thesis is based on a local model, using a linear approximation of a function with first derivative information. A more accurate global analysis can be computed from Monte-Carlo sampling, which typically requires many function evaluations to converge. Uncertainty quantification can be accelerated through the application of a surrogate model of the design space [238] with the incorporation of the uncertainty of the data. A surrogate model is an approximation to an expensive simulation based on a small number of evaluations of the function. The number of required evaluations to build the surrogate is reduced with a gradient-enhanced Gaussian process regression model [146]; thus the availability of adjoint-based gradients would enable more accurate uncertainty quantification. In addition to sensitivity analysis, once a surrogate is constructed, it can replace the expensive model during optimization, allowing for more efficient local or global optimization.

• In particular, one type of surrogate function of interest is a neural network, which can be trained more efficiently using derivative information. Neural networks with certain choices of activation functions are differentiable, and can therefore be optimized with gradient-based optimization techniques. Gradient-based shape optimization with neural networks has proven fruitful in the field of aerodynamics [222].

• Optimization under uncertainty methods optimize the expected value of an objective function by performing a sample average over a distribution of possible deviations. These techniques can improve the robustness of the optimum by avoiding small local minima and obtaining solutions with reduced risk. This technique has proven effective for the optimization of coil shapes with increased tolerances [150, 151], using a Monte-Carlo approach. To avoid the excessive cost of a Monte-Carlo method, a linear or quadratic approximation can be made such that the expectation value and variance can be computed with derivative information [3] obtained with an adjoint method.

We look forward to the adoption of adjoint methods and shape optimization tools for many stellarator design problems.

Appendix A: Toroidal coordinate systems

In this Appendix, we briefly review coordinate systems for describing scalar and vector fields in toroidal systems. Comprehensive introductions to this topic are provided in the textbook [54], the review article [97], and the tutorial [121].

A.1 Toroidal coordinates

In this Thesis, we often want to describe surfaces of toroidal topology or the volumes enclosed by such surfaces. We can describe the position on a toroidal surface by two angles (Figure A.1). A poloidal angle, denoted by $\theta$, increases by $2\pi$ upon one rotation the short way around the torus. A toroidal angle, denoted by $\phi$, increases by $2\pi$ upon one rotation the long way around the torus.

Figure A.1: The position on a toroidal surface, $S$, is described by the toroidal and poloidal angles. Figure adapted from [121].

We will consider a volume, $V$, bounded by a toroidal surface, $S$. Suppose that we use a set of continuously nested toroidal surfaces, $\Gamma(r)$, as a radial coordinate $r$, such that the position within this volume can be expressed as $\mathbf{x}(r,\theta,\phi)$. A vector field, $\mathbf{A}$, can be expressed in the basis of the gradients of the coordinates,
$$\mathbf{A} = A_r \nabla r + A_\theta \nabla\theta + A_\phi \nabla\phi, \tag{A.1}$$
the covariant form, or the derivatives of the position vector with respect to the coordinates,
$$\mathbf{A} = A^r \frac{\partial\mathbf{x}}{\partial r} + A^\theta \frac{\partial\mathbf{x}}{\partial\theta} + A^\phi \frac{\partial\mathbf{x}}{\partial\phi}, \tag{A.2}$$
the contravariant form. The two sets of basis vectors can be related through the dual relations,
$$\frac{\partial\mathbf{x}}{\partial x_i} = \frac{\nabla x_j \times \nabla x_k}{\nabla x_i \cdot \nabla x_j \times \nabla x_k}, \tag{A.3}$$
where $(x_i, x_j, x_k) = (r,\theta,\phi)$ or cyclic permutations.

Table A.1: Summary of formulas used to describe the geometry of a non-orthogonal coordinate system $(x_1, x_2, x_3)$. In the above, $\{i,j,k\}$ is a cyclic permutation of $\{1,2,3\}$. Table adapted from [121].
Jacobian: $\sqrt{g} = \left(\frac{\partial\mathbf{x}}{\partial x_i}\times\frac{\partial\mathbf{x}}{\partial x_j}\right)\cdot\frac{\partial\mathbf{x}}{\partial x_k} = \left(\left(\nabla x_i\times\nabla x_j\right)\cdot\nabla x_k\right)^{-1}$
Differential volume: $d^3x = |\sqrt{g}|\, dx_i\, dx_j\, dx_k$
Differential length: $d\mathbf{x} = \sum_{i=1}^{3} \frac{\partial\mathbf{x}}{\partial x_i}\, dx_i$
Differential surface area (constant $x_k$): $d^2x = |\sqrt{g}|\,|\nabla x_k|\, dx_i\, dx_j$
Divergence of vector field: $\nabla\cdot\mathbf{A} = \sum_{i=1}^{3} \frac{1}{\sqrt{g}}\frac{\partial}{\partial x_i}\left(\sqrt{g}A^i\right)$
Curl of vector field: $\nabla\times\mathbf{A} = \sum_{k=1}^{3} \frac{1}{\sqrt{g}}\left(\frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}\right)\frac{\partial\mathbf{x}}{\partial x_k}$
Gradient of scalar: $\nabla q = \sum_{i=1}^{3} \frac{\partial q}{\partial x_i}\nabla x_i$
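The dual relations (A.3) and the Jacobian identity in Table A.1 can be checked numerically. The sketch below assumes a hypothetical elongated torus, $\mathbf{x} = (R\cos\phi, R\sin\phi, \kappa r\sin\theta)$ with $R = R_0 + r\cos\theta$, chosen only because it is non-orthogonal for $\kappa \neq 1$; the values of `R0` and `KAPPA` and the function names are illustrative assumptions.

```python
import numpy as np

R0, KAPPA = 3.0, 1.5  # hypothetical major radius and elongation


def tangent_vectors(r, theta, phi):
    """Rows are dx/dr, dx/dtheta, dx/dphi for the assumed elongated torus."""
    R = R0 + r * np.cos(theta)
    dx_dr = np.array([np.cos(theta) * np.cos(phi),
                      np.cos(theta) * np.sin(phi),
                      KAPPA * np.sin(theta)])
    dx_dtheta = np.array([-r * np.sin(theta) * np.cos(phi),
                          -r * np.sin(theta) * np.sin(phi),
                          KAPPA * r * np.cos(theta)])
    dx_dphi = np.array([-R * np.sin(phi), R * np.cos(phi), 0.0])
    return np.array([dx_dr, dx_dtheta, dx_dphi])


def gradient_vectors(tangents):
    """Rows are grad r, grad theta, grad phi, from grad x_i . dx/dx_j = delta_ij."""
    return np.linalg.inv(tangents.T)


def dual_basis(grads):
    """Apply (A.3): dx/dx_i = (grad x_j x grad x_k) / (grad x_i . grad x_j x grad x_k)."""
    triple = grads[0] @ np.cross(grads[1], grads[2])
    return np.array([np.cross(grads[(i + 1) % 3], grads[(i + 2) % 3]) / triple
                     for i in range(3)])


def jacobian(tangents):
    """sqrt(g) = (dx/dx_1 x dx/dx_2) . dx/dx_3, first form in Table A.1."""
    return np.cross(tangents[0], tangents[1]) @ tangents[2]
```

For this ordering of $(r,\theta,\phi)$ the Jacobian comes out negative, which is why the volume and area elements in Table A.1 carry an absolute value.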
Such a coordinate system is generally non-orthogonal, so $\partial\mathbf{x}/\partial x_i$ is not necessarily parallel to $\nabla x_i$. Several useful relations in non-orthogonal coordinate systems are summarized in Table A.1. For a more detailed discussion, refer to Chapter 2 in [54].

A.2 Flux coordinates

If magnetic surfaces exist, indicating that the magnetic field is tangent to a set of continuously nested toroidal surfaces, we can use the toroidal flux through such surfaces as a coordinate, defined as,
$$2\pi\psi \equiv \int_{S_T(\psi)} d^2x\, \mathbf{B}\cdot\hat{\mathbf{n}}. \tag{A.4}$$
In the above expression, $S_T(\psi)$ is an open surface such that $\partial S_T(\psi)$ is a loop on $\Gamma(\psi)$ that closes after one poloidal rotation (Figure A.2). The unit normal is $\hat{\mathbf{n}}$, often chosen to point in the direction of increasing $\phi$. Another choice for labeling magnetic surfaces is the poloidal flux function, $\chi$,
$$2\pi\chi \equiv \int_{S_P(\psi)} d^2x\, \mathbf{B}\cdot\hat{\mathbf{n}}, \tag{A.5}$$
where $S_P(\psi)$ is an open surface such that $\partial S_P(\psi)$ is a loop on $\Gamma(\psi)$ that closes after one toroidal rotation (Figure A.3).

Figure A.2: The plasma domain, $V_P$, is bounded by a toroidal surface, $S_P$. We make the assumption that there exists a set of toroidal magnetic surfaces, $\Gamma(\psi)$. The toroidal flux through each of these surfaces is defined by (A.4) with $S_T(\psi)$ an open surface bounded by a poloidally closed curve on $\Gamma(\psi)$, $\partial S_T(\psi)$.

Figure A.3: The poloidal flux through the magnetic surface, $\Gamma(\psi)$, is defined by (A.5) with $S_P(\psi)$ an open surface bounded by a toroidally closed curve on $\Gamma(\psi)$, $\partial S_P(\psi)$.

The rotational transform quantifies the number of poloidal turns of a field line per toroidal turn,
$$\iota \equiv \lim_{n\to\infty} \frac{\sum_{k=1}^{n} (\Delta\theta)_k}{2\pi n}. \tag{A.6}$$
Here $(\Delta\theta)_k$ is the change in poloidal angle in toroidal rotation $k$ and $n$ counts the toroidal turns. If flux surfaces exist, then the rotational transform can be computed from the derivative of the poloidal flux with respect to the toroidal flux,
$$\iota(\psi) = \chi'(\psi). \tag{A.7}$$
If a flux label, $\psi$, is used as one of the coordinates, known as a flux coordinate system, then the contravariant form for the magnetic field simplifies,
$$\mathbf{B} = B^\theta \frac{\partial\mathbf{x}}{\partial\theta} + B^\phi \frac{\partial\mathbf{x}}{\partial\phi}, \tag{A.8}$$
from the assumption that $\mathbf{B}\cdot\nabla\psi = 0$.
Given $\nabla\cdot\mathbf{B} = 0$ and using (A.3), we can express the magnetic field as,
$$\mathbf{B} = \nabla\psi\times\nabla\left(\theta - \iota(\psi)\phi + \lambda(\psi,\theta,\phi)\right), \tag{A.9}$$
where $\lambda(\psi,\theta,\phi)$ is $2\pi$-periodic in $\theta$ and $\phi$ (Section 11.1 in [121]).

In a flux-coordinate system, the flux-surface average,
$$\langle A\rangle_\psi = \frac{\int_0^{2\pi} d\theta \int_0^{2\pi} d\phi\, \sqrt{g}\,A}{V'(\psi)}, \tag{A.10}$$
appears in many calculations, where
$$V'(\psi) = \int_0^{2\pi} d\theta \int_0^{2\pi} d\phi\, \sqrt{g}, \tag{A.11}$$
is the differential volume associated with a change in flux. The flux-surface average can be equivalently defined as the average over the infinitesimal volume between flux surfaces,
$$\langle A\rangle_\psi = \lim_{\Delta V\to 0} \frac{1}{\Delta V}\left(\int_{V_P(\psi)+\Delta V} d^3x\, A - \int_{V_P(\psi)} d^3x\, A\right), \tag{A.12}$$
where $V_P(\psi)$ is the volume enclosed by a surface labeled by $\psi$ and $V_P(\psi)+\Delta V$ is the volume of a neighboring surface. The flux-surface average is discussed in more detail in Section 4.9 of [54].

A.3 Magnetic coordinates

A flux coordinate system can be defined with many choices of poloidal and toroidal angles. With some choices of these angles, the contravariant expression for the magnetic field can simplify further. Given (A.9), the definition of the poloidal and toroidal angles can be shifted to $\vartheta$ and $\varphi$ such that the magnetic field can be expressed as,
$$\mathbf{B} = \nabla\psi\times\nabla\left(\vartheta - \iota(\psi)\varphi\right). \tag{A.13}$$
Such angles define a magnetic coordinate system. For example, one choice is $\vartheta = \theta + \lambda(\psi,\theta,\phi)$ and $\varphi = \phi$. For any choice of $\varphi$, there is a corresponding choice of $\vartheta$ that defines a magnetic coordinate system. With this choice of angles, the magnetic field lines are said to be straight in the $\vartheta$-$\varphi$ plane,
$$\frac{d\vartheta(l)}{d\varphi(l)} = \frac{\mathbf{B}\cdot\nabla\vartheta}{\mathbf{B}\cdot\nabla\varphi} = \iota(\psi), \tag{A.14}$$
with a slope given by the rotational transform. Here $l$ measures length along a field line such that $df/dl = \hat{\mathbf{b}}\cdot\nabla f$ for any quantity $f$, where $\hat{\mathbf{b}} = \mathbf{B}/B$ is the unit vector in the direction of the magnetic field.

From the covariant form for the magnetic field,
$$\mathbf{B} = B_\vartheta\nabla\vartheta + B_\varphi\nabla\varphi + B_\psi\nabla\psi, \tag{A.15}$$
we can compute the net toroidal and poloidal currents enclosed by the surface labeled by $\psi$,
$$I_T(\psi) \equiv \int_{S_T(\psi)} d^2x\, \mathbf{J}\cdot\hat{\mathbf{n}} = \frac{1}{\mu_0}\oint_{\partial S_T(\psi)} d\mathbf{l}\cdot\mathbf{B} = \frac{1}{\mu_0}\int_0^{2\pi} d\vartheta\, B_\vartheta \tag{A.16a}$$
$$I_P(\psi) \equiv \int_{S_P(\psi)} d^2x\, \mathbf{J}\cdot\hat{\mathbf{n}} = \frac{1}{\mu_0}\oint_{\partial S_P(\psi)} d\mathbf{l}\cdot\mathbf{B} = \frac{1}{\mu_0}\int_0^{2\pi} d\varphi\, B_\varphi, \tag{A.16b}$$
where $S_T$ is defined in Figure A.2 and $S_P$ is defined in Figure A.3. Under the additional assumption that $\mathbf{J}\cdot\nabla\psi = 0$, which follows from MHD force balance (1.3a) with $p(\psi)$, we can write the covariant form as,
$$\mathbf{B} = I(\psi)\nabla\vartheta + G(\psi)\nabla\varphi + K(\psi,\vartheta,\varphi)\nabla\psi + \nabla H(\psi,\vartheta,\varphi), \tag{A.17}$$
where $I(\psi) = \mu_0 I_T(\psi)/(2\pi)$ and $G(\psi) = \mu_0 I_P(\psi)/(2\pi)$. See Section 2.5 in [97], Section 9.2 in [121], and Chapter 6.5 of [54] for details.

A.4 Boozer coordinates

As previously mentioned, there are many choices of magnetic coordinates corresponding to different choices of toroidal angle, $\varphi$. Suppose we begin with a system defined by $(\psi,\vartheta,\varphi)$ and want to transform to a system defined by $(\psi,\vartheta',\varphi')$. In order for the primed system to remain a magnetic coordinate system, we must have $\varphi' = \varphi + \gamma(\psi,\vartheta,\varphi)$ and $\vartheta' = \vartheta + \iota(\psi)\gamma(\psi,\vartheta,\varphi)$, where $\gamma(\psi,\vartheta,\varphi)$ is $2\pi$-periodic in $\vartheta$ and $\varphi$. To construct the Boozer coordinate system [23], we will make a particular choice for $\gamma$ to simplify the covariant form for the magnetic field (A.17). The corresponding changes to the quantities appearing in the covariant form (A.17) are
$$H' = H - \left(\iota(\psi)I(\psi) + G(\psi)\right)\gamma(\psi,\vartheta,\varphi) \tag{A.18a}$$
$$K' = K + \gamma(\psi,\vartheta,\varphi)\left(\iota(\psi)I'(\psi) + G'(\psi)\right). \tag{A.18b}$$
Boozer coordinates are defined such that $H' = 0$, or $\gamma(\psi,\vartheta,\varphi) = H(\psi,\vartheta,\varphi)/\left(\iota(\psi)I(\psi) + G(\psi)\right)$. With this choice of transformation, we will denote $\vartheta_B = \vartheta + \iota\gamma$ and $\varphi_B = \varphi + \gamma$. The covariant form becomes,
$$\mathbf{B} = I(\psi)\nabla\vartheta_B + G(\psi)\nabla\varphi_B + K(\psi,\vartheta_B,\varphi_B)\nabla\psi. \tag{A.19}$$
By dotting the covariant with the contravariant form, we obtain an expression for the Jacobian,
$$\sqrt{g} = \frac{1}{\nabla\psi\times\nabla\vartheta_B\cdot\nabla\varphi_B} = \frac{G(\psi) + \iota(\psi)I(\psi)}{B^2}. \tag{A.20}$$
We note that the Jacobian only varies on a surface through the magnetic field strength; thus each of the contravariant and covariant components of the magnetic field, except for $K(\psi,\vartheta_B,\varphi_B)$, possesses the same property. (The radial covariant component, $K(\psi,\vartheta_B,\varphi_B)$, is related to the field strength through the MHD force balance equation (1.3a).)
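For this reason, the Boozer coordinate system is extremely convenient for analyzing guiding center motion and neoclassical transport, as we will in Chapter 4.

Appendix B: Justification for current potential

In this Appendix, we justify the form for a continuous current density supported on a toroidal surface, $S_C$,
$$\mathbf{J}_C(\theta,\phi) = \hat{\mathbf{n}}\times\nabla\Phi, \tag{B.1}$$
where $\hat{\mathbf{n}}$ is the unit normal vector.

We consider an extension of $\mathbf{J}_C$ in a neighborhood of $S_C$ of width $\Delta_b$,
$$\widetilde{\mathbf{J}}_C(b,\widetilde{\theta},\widetilde{\phi}) = \mathbf{J}_C(\theta,\phi), \tag{B.2}$$
where we define extensions of $\theta$ and $\phi$ as,
$$\widetilde{\theta}(\mathbf{x}) = \theta\left(\mathbf{x} - b(\mathbf{x})\nabla b\right) \tag{B.3a}$$
$$\widetilde{\phi}(\mathbf{x}) = \phi\left(\mathbf{x} - b(\mathbf{x})\nabla b\right), \tag{B.3b}$$
or a normal projection onto $S_C$. We consider $b \in [-\Delta_b, \Delta_b]$ to be a "thickened" region of continuous current density. We impose the constraint that $\nabla\cdot\widetilde{\mathbf{J}}_C = 0$, expressed in the $(b,\widetilde{\theta},\widetilde{\phi})$ coordinate system (Table A.1),
$$\frac{1}{\sqrt{g}}\left[\frac{\partial\left(\sqrt{g}\,\widetilde{\mathbf{J}}_C\cdot\nabla b\right)}{\partial b} + \frac{\partial\left(\sqrt{g}\,\widetilde{\mathbf{J}}_C\cdot\nabla\widetilde{\theta}\right)}{\partial\widetilde{\theta}} + \frac{\partial\left(\sqrt{g}\,\widetilde{\mathbf{J}}_C\cdot\nabla\widetilde{\phi}\right)}{\partial\widetilde{\phi}}\right] = 0, \tag{B.4}$$
where $\sqrt{g} = \partial\mathbf{x}/\partial b\cdot\left(\partial\mathbf{x}/\partial\widetilde{\theta}\times\partial\mathbf{x}/\partial\widetilde{\phi}\right)$. By the definition of our extension, the first term will vanish. In the limit that $\Delta_b \to 0$, the divergence-free condition is expressed as,
$$\nabla_\Gamma\cdot\mathbf{J}_C \equiv \frac{1}{\sqrt{g}}\left(\frac{\partial\left(\sqrt{g}J^\theta\right)}{\partial\theta} + \frac{\partial\left(\sqrt{g}J^\phi\right)}{\partial\phi}\right) = 0, \tag{B.5}$$
where we have expressed the current in the contravariant basis as $\mathbf{J}_C = J^\theta\,\partial\mathbf{x}/\partial\theta + J^\phi\,\partial\mathbf{x}/\partial\phi$ and $\nabla_\Gamma\cdot$ is the surface divergence (Appendix 3 in [229]). For a continuous current density, Ampere's law (1.3b) implies that $\nabla\cdot\mathbf{J} = 0$. Thus the equivalent condition for a current supported on a surface is $\nabla_\Gamma\cdot\mathbf{J}_C = 0$ [11]. The surface divergence of a vector field tangent to a surface $\Gamma$ ($\mathbf{A}\cdot\hat{\mathbf{n}} = 0$ on $\Gamma$) defined in terms of a general continuous extension, $\widetilde{\mathbf{A}}$, in a neighborhood of $\Gamma$ is,
$$\nabla_\Gamma\cdot\mathbf{A} \equiv \left(\nabla\cdot\widetilde{\mathbf{A}}\right)\Big|_\Gamma - \hat{\mathbf{n}}\cdot\left(\nabla\widetilde{\mathbf{A}}\right)\Big|_\Gamma\cdot\hat{\mathbf{n}}. \tag{B.6}$$
In (B.2), we have defined our extension such that $\nabla b\cdot\left(\nabla\widetilde{\mathbf{J}}_C\right) = 0$ such that the second term in the above expression vanishes.

Given (B.5), we can write,
$$J^\theta = -\frac{1}{\sqrt{g}}\frac{\partial\Phi(\theta,\phi)}{\partial\phi} \tag{B.7a}$$
$$J^\phi = \frac{1}{\sqrt{g}}\frac{\partial\Phi(\theta,\phi)}{\partial\theta}, \tag{B.7b}$$
where,
$$\Phi = \int d\theta\, \sqrt{g}J^\phi. \tag{B.7c}$$
In other words,
$$\mathbf{J}_C = \hat{\mathbf{n}}\times\nabla\Phi. \tag{B.8}$$

Appendix C: Adjoint derivative at fixed $J_{\max}$

We enforce $J_{\max}$ = constant in the REGCOIL solve in order to obtain the regularization parameter $\lambda$ by requiring that the following constraint be satisfied within a given tolerance,
$$G\left(\Omega,\overrightarrow{\Phi}(\Omega,\lambda)\right) = J_{\max}\left(\Omega,\overrightarrow{\Phi}(\Omega,\lambda)\right) - J_{\max}^{\text{target}} = 0. \tag{C.1}$$
Here $J_{\max}^{\text{target}}$ is the target maximum current density and $\overrightarrow{\Phi}$ is chosen to satisfy the forward equation (3.8),
$$\overrightarrow{F}\left(\Omega,\overrightarrow{\Phi},\lambda\right) = \overleftrightarrow{A}(\Omega,\lambda)\overrightarrow{\Phi} - \overrightarrow{b}(\Omega,\lambda) = 0. \tag{C.2}$$
A log-sum-exponent function is used to approximate the maximum function, similar to that used to approximate $d_{\text{coil-plasma}}$ (3.24),
$$J_{\max} \approx J_{\max,\text{lse}} = \frac{1}{p}\log\left(\frac{\int_{S_C} d^2x\, \exp(pJ)}{A_{\text{coil}}}\right). \tag{C.3}$$
We compute the total differential of $\overrightarrow{F}$,
$$d\overrightarrow{F}(\Omega,\overrightarrow{\Phi},\lambda) = \sum_{m,n}\left(\frac{\partial\overleftrightarrow{A}(\Omega,\lambda)}{\partial\Omega_{m,n}}\overrightarrow{\Phi} - \frac{\partial\overrightarrow{b}(\Omega,\lambda)}{\partial\Omega_{m,n}}\right)d\Omega_{m,n} + \overleftrightarrow{A}\,d\overrightarrow{\Phi} + \left(\overleftrightarrow{A}_K\overrightarrow{\Phi} - \overrightarrow{b}_K\right)d\lambda = 0. \tag{C.4}$$
Here $\overleftrightarrow{A}_K = \partial\overleftrightarrow{A}/\partial\lambda$ and $\overrightarrow{b}_K = \partial\overrightarrow{b}/\partial\lambda$. We left multiply by $\overleftrightarrow{A}^{-1}$ and solve for $d\overrightarrow{\Phi}$ such that $d\overrightarrow{F}(\Omega,\overrightarrow{\Phi},\lambda) = 0$,
$$d\overrightarrow{\Phi} = -\sum_{m,n}\overleftrightarrow{A}^{-1}\left(\frac{\partial\overleftrightarrow{A}(\Omega,\lambda)}{\partial\Omega_{m,n}}\overrightarrow{\Phi} - \frac{\partial\overrightarrow{b}(\Omega,\lambda)}{\partial\Omega_{m,n}}\right)d\Omega_{m,n} - \overleftrightarrow{A}^{-1}\left(\overleftrightarrow{A}_K\overrightarrow{\Phi} - \overrightarrow{b}_K\right)d\lambda. \tag{C.5}$$
We also compute the total differential of $G$,
$$dG(\Omega,\overrightarrow{\Phi}) = \sum_{m,n}\frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\Omega_{m,n}}d\Omega_{m,n} + \frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\overrightarrow{\Phi}}\cdot d\overrightarrow{\Phi} = 0. \tag{C.6}$$
Using the form for $d\overrightarrow{\Phi}$ (C.5), we compute $d\lambda$ in terms of $d\Omega_{m,n}$,
$$d\lambda = \left(\frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\overrightarrow{\Phi}}\cdot\left[\overleftrightarrow{A}^{-1}\left(\overleftrightarrow{A}_K\overrightarrow{\Phi} - \overrightarrow{b}_K\right)\right]\right)^{-1}\times\sum_{m,n}\left[\frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\Omega_{m,n}} - \frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\overrightarrow{\Phi}}\cdot\left(\overleftrightarrow{A}^{-1}\left(\frac{\partial\overleftrightarrow{A}(\Omega,\lambda)}{\partial\Omega_{m,n}}\overrightarrow{\Phi} - \frac{\partial\overrightarrow{b}(\Omega,\lambda)}{\partial\Omega_{m,n}}\right)\right)\right]d\Omega_{m,n}. \tag{C.7}$$
Using (C.5) and (C.7), the derivative of $\overrightarrow{\Phi}$ with respect to $\Omega_{m,n}$ subject to equations (C.1) and (C.2) is given by the following expression,
$$\frac{\partial\overrightarrow{\Phi}(\Omega,\lambda(\Omega))}{\partial\Omega_{m,n}} = -\overleftrightarrow{A}^{-1}\left(\frac{\partial\overleftrightarrow{A}(\Omega,\lambda)}{\partial\Omega_{m,n}}\overrightarrow{\Phi} - \frac{\partial\overrightarrow{b}(\Omega,\lambda)}{\partial\Omega_{m,n}}\right) - \frac{\overleftrightarrow{A}^{-1}\left(\overleftrightarrow{A}_K\overrightarrow{\Phi} - \overrightarrow{b}_K\right)}{\frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\overrightarrow{\Phi}}\cdot\left[\overleftrightarrow{A}^{-1}\left(\overleftrightarrow{A}_K\overrightarrow{\Phi} - \overrightarrow{b}_K\right)\right]}\times\left[\frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\Omega_{m,n}} - \frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\overrightarrow{\Phi}}\cdot\left(\overleftrightarrow{A}^{-1}\left(\frac{\partial\overleftrightarrow{A}(\Omega,\lambda)}{\partial\Omega_{m,n}}\overrightarrow{\Phi} - \frac{\partial\overrightarrow{b}(\Omega,\lambda)}{\partial\Omega_{m,n}}\right)\right)\right]. \tag{C.8}$$
Here $\overrightarrow{\Phi}$ is understood to be a function of $\Omega$ and $\lambda$ through (C.2) and $\lambda$ is understood to be a function of $\Omega$ through (C.1).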
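The constrained derivative (C.8) can be checked against finite differences on a small example. The sketch below is not the REGCOIL implementation: it assumes a hypothetical $3\times 3$ system with one geometry parameter, $\overleftrightarrow{A} = M_0 + \Omega M_1 + \lambda \overleftrightarrow{A}_K$, and replaces $J_{\max}$ by the smooth stand-in constraint $G = \overrightarrow{\Phi}\cdot\overrightarrow{\Phi} - \text{target}$. All matrices, the Newton solve for $\lambda$, and the function names are illustrative assumptions.

```python
import numpy as np

# Hypothetical toy data (not REGCOIL output): A = M0 + omega * M1 + lam * A_K
# and b = B0 + omega * B1 + lam * B_K, with a single geometry parameter omega.
M0 = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 0.5], [0.0, 0.5, 2.0]])
M1 = np.array([[1.0, 0.0, 0.2], [0.0, 0.5, 0.0], [0.2, 0.0, 1.5]])
A_K = np.eye(3)
B0 = np.array([1.0, 2.0, -1.0])
B1 = np.array([0.5, -0.3, 0.8])
B_K = np.zeros(3)


def solve_forward(omega, lam):
    """Return A and the solution phi of the forward equation (C.2)."""
    A = M0 + omega * M1 + lam * A_K
    b = B0 + omega * B1 + lam * B_K
    return A, np.linalg.solve(A, b)


def solve_lambda(omega, target, lam=0.0, tol=1e-13, max_iter=100):
    """Newton iteration on lam for the constraint G = phi . phi - target = 0."""
    for _ in range(max_iter):
        A, phi = solve_forward(omega, lam)
        g = phi @ phi - target
        if abs(g) < tol:
            break
        dphi_dlam = -np.linalg.solve(A, A_K @ phi - B_K)
        lam -= g / (2.0 * phi @ dphi_dlam)
    return lam


def constrained_derivative(omega, target):
    """d(phi)/d(omega) at fixed G, following (C.7) and (C.8)."""
    lam = solve_lambda(omega, target)
    A, phi = solve_forward(omega, lam)
    dG_dphi = 2.0 * phi   # partial G / partial phi for the stand-in constraint
    dG_domega = 0.0       # G has no explicit omega dependence in this toy case
    u = np.linalg.solve(A, M1 @ phi - B1)    # A^{-1}(dA/dOmega phi - db/dOmega)
    w = np.linalg.solve(A, A_K @ phi - B_K)  # A^{-1}(A_K phi - b_K)
    dlam_domega = (dG_domega - dG_dphi @ u) / (dG_dphi @ w)  # from (C.7)
    return -u - w * dlam_domega                               # (C.8)
```

This direct evaluation needs one linear solve per geometry parameter (the vector `u`), which is the cost the adjoint formulation is designed to avoid.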
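We use the adjoint method to avoid solving a linear system involving the operator $\overleftrightarrow{A}$ for each $\Omega_{m,n}$,
$$\frac{\partial\overrightarrow{\Phi}(\Omega,\lambda(\Omega))}{\partial\Omega_{m,n}} = -\overleftrightarrow{A}^{-1}\left(\frac{\partial\overleftrightarrow{A}(\Omega,\lambda)}{\partial\Omega_{m,n}}\overrightarrow{\Phi} - \frac{\partial\overrightarrow{b}(\Omega,\lambda)}{\partial\Omega_{m,n}}\right) - \frac{\overleftrightarrow{A}^{-1}\left(\overleftrightarrow{A}_K\overrightarrow{\Phi} - \overrightarrow{b}_K\right)}{\frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\overrightarrow{\Phi}}\cdot\left[\overleftrightarrow{A}^{-1}\left(\overleftrightarrow{A}_K\overrightarrow{\Phi} - \overrightarrow{b}_K\right)\right]}\times\left[\frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\Omega_{m,n}} - \left[\left(\overleftrightarrow{A}^T\right)^{-1}\frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\overrightarrow{\Phi}}\right]\cdot\left(\frac{\partial\overleftrightarrow{A}(\Omega,\lambda)}{\partial\Omega_{m,n}}\overrightarrow{\Phi} - \frac{\partial\overrightarrow{b}(\Omega,\lambda)}{\partial\Omega_{m,n}}\right)\right]. \tag{C.9}$$
We introduce a new adjoint vector $\overrightarrow{\widetilde{q}}$, defined to be the solution of,
$$\overleftrightarrow{A}^T\overrightarrow{\widetilde{q}} = \frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\overrightarrow{\Phi}}. \tag{C.10}$$
Equation (C.9) is then used to compute the derivatives of $\chi^2_B$ with respect to $\Omega_{m,n}$,
$$\frac{\partial\chi^2_B\left(\Omega,\overrightarrow{\Phi}(\Omega,\lambda(\Omega))\right)}{\partial\Omega_{m,n}} = \frac{\partial\chi^2_B(\Omega,\overrightarrow{\Phi})}{\partial\Omega_{m,n}} + \frac{\partial\chi^2_B(\Omega,\overrightarrow{\Phi})}{\partial\overrightarrow{\Phi}}\cdot\frac{\partial\overrightarrow{\Phi}(\Omega,\lambda(\Omega))}{\partial\Omega_{m,n}}. \tag{C.11}$$
This result can be written in terms of both adjoint variables, $\overrightarrow{q}$ and $\overrightarrow{\widetilde{q}}$,
$$\frac{\partial\chi^2_B\left(\Omega,\overrightarrow{\Phi}(\Omega,\lambda(\Omega))\right)}{\partial\Omega_{m,n}} = \frac{\partial\chi^2_B(\Omega,\overrightarrow{\Phi})}{\partial\Omega_{m,n}} - \overrightarrow{q}\cdot\left(\frac{\partial\overleftrightarrow{A}(\Omega,\lambda)}{\partial\Omega_{m,n}}\overrightarrow{\Phi} - \frac{\partial\overrightarrow{b}(\Omega,\lambda)}{\partial\Omega_{m,n}}\right) - \frac{\overrightarrow{q}\cdot\left(\overleftrightarrow{A}_K\overrightarrow{\Phi} - \overrightarrow{b}_K\right)}{\overrightarrow{\widetilde{q}}\cdot\left(\overleftrightarrow{A}_K\overrightarrow{\Phi} - \overrightarrow{b}_K\right)}\left[\frac{\partial G(\Omega,\overrightarrow{\Phi})}{\partial\Omega_{m,n}} - \overrightarrow{\widetilde{q}}\cdot\left(\frac{\partial\overleftrightarrow{A}(\Omega,\lambda)}{\partial\Omega_{m,n}}\overrightarrow{\Phi} - \frac{\partial\overrightarrow{b}(\Omega,\lambda)}{\partial\Omega_{m,n}}\right)\right]. \tag{C.12}$$
The same method is used to compute derivatives of $\|\mathbf{J}\|_2$. So, to obtain the derivatives at fixed $J_{\max}$, we compute a solution to the two adjoint equations, (3.22) and (C.10), in addition to the forward equation, (3.8).

Appendix D: Trajectory models

In the SFINCS coordinate system, the DKE can be written in the following way,
$$\dot{\mathbf{x}}\cdot\nabla f_s + \dot{X}_s\frac{\partial f_s}{\partial X_s} + \dot{\xi}_s\frac{\partial f_s}{\partial\xi_s} - C_s(f_s) = -\left(\mathbf{v}_{\mathrm{m}s}\cdot\nabla\psi\right)\frac{\partial f_{Ms}}{\partial\psi}. \tag{D.1}$$
To obtain the trajectory coefficients ($\dot{\mathbf{x}}$, $\dot{X}_s$, and $\dot{\xi}_s$) several approximations are made.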
For example, any terms that require radial coupling ($\psi$ derivatives of $f_s$) cannot be retained, as this would necessitate solving a five-dimensional system.

Under the full trajectory model, the trajectory coefficients are chosen such that $\mu$ conservation is maintained as radial coupling is dropped,
$$\dot{\mathbf{x}} = v_\|\hat{\mathbf{b}} + \frac{\Phi'(\psi)}{B^2}\mathbf{B}\times\nabla\psi \tag{D.2a}$$
$$\dot{X}_s = -\left(\mathbf{v}_{\mathrm{m}s}\cdot\nabla\psi\right)\frac{q_s}{2T_sX_s}\Phi'(\psi) \tag{D.2b}$$
$$\dot{\xi}_s = -\frac{1-\xi_s^2}{2B\xi_s}v_\|\hat{\mathbf{b}}\cdot\nabla B + \xi_s\left(1-\xi_s^2\right)\frac{1}{2B^3}\Phi'(\psi)\,\mathbf{B}\times\nabla\psi\cdot\nabla B. \tag{D.2c}$$
Under the DKES trajectory model, the $E\times B$ velocity is taken to be divergenceless,
$$\mathbf{v}_E^{\text{DKES}} = \frac{\mathbf{B}\times\nabla\Phi}{\langle B^2\rangle_\psi}, \tag{D.3}$$
where the flux surface average of a quantity is (4.8). Under the DKES trajectory model, the trajectory coefficients are taken to be,
$$\dot{\mathbf{x}} = v_\|\hat{\mathbf{b}} + \frac{1}{\langle B^2\rangle_\psi}\Phi'(\psi)\,\mathbf{B}\times\nabla\psi \tag{D.4a}$$
$$\dot{X}_s = 0 \tag{D.4b}$$
$$\dot{\xi}_s = -\frac{1-\xi_s^2}{2B\xi_s}v_\|\hat{\mathbf{b}}\cdot\nabla B. \tag{D.4c}$$
These effective trajectories are adopted in the widely-used DKES code [113, 230].

Appendix E: Adjoint collision operator

We want to find an adjoint collision operator, $C_s^\dagger$, that satisfies the following relation,
$$\left\langle\int d^3v\, \frac{g_s C_s(f_s)}{f_{Ms}}\right\rangle_\psi = \left\langle\int d^3v\, \frac{f_s C_s^\dagger(g_s)}{f_{Ms}}\right\rangle_\psi. \tag{E.1}$$
The linearized Fokker-Planck collision operator can be written as,
$$C_s(f_s) = \sum_{s'} C^L_{ss'}(f_s, f_{s'}) = \sum_{s'}\left[C_{ss'}(f_s, f_{Ms'}) + C_{ss'}(f_{Ms}, f_{s'})\right], \tag{E.2}$$
where $s'$ sums over species. The first term on the right hand side of (E.2) is referred to as the test-particle collision operator, $C^T_{ss'}(f_s) = C_{ss'}(f_s, f_{Ms'})$, and the second the field-particle collision operator, $C^F_{ss'}(f_{s'}) = C_{ss'}(f_{Ms}, f_{s'})$.
The test and field terms satisfy the following relations [198, 221],
$$\int d^3v\, \frac{g_s C_{ss'}(f_s, f_{Ms'})}{f_{Ms}} = \int d^3v\, \frac{f_s C_{ss'}(g_s, f_{Ms'})}{f_{Ms}} \tag{E.3a}$$
$$\int d^3v\, \frac{g_s C_{ss'}(f_{Ms}, f_{s'})}{f_{Ms}} = \frac{T_{s'}}{T_s}\int d^3v\, \frac{f_{s'} C_{s's}(f_{Ms'}, g_s)}{f_{Ms'}}. \tag{E.3b}$$
For collisions between species of the same temperature, we see that $C_s(f_s)$ is self-adjoint. The adjoint operator with respect to the inner product (4.14) is thus,
$$C_s^\dagger = C_s^T + \sum_{s'}\frac{f_{Ms}}{f_{Ms'}}\frac{T_{s'}}{T_s}C^F_{s's}. \tag{E.4}$$

Appendix F: Adjoint collisionless trajectories

We want to find an adjoint operator, $L_s^\dagger$, that satisfies,
$$\left\langle\int d^3v\, \frac{g_s L_s f_s}{f_{Ms}}\right\rangle_\psi = \left\langle\int d^3v\, \frac{f_s L_s^\dagger g_s}{f_{Ms}}\right\rangle_\psi, \tag{F.1}$$
for both trajectory models, where $L_s$ is defined in (4.10) with (D.4) for the DKES trajectories model and (D.2) for the full trajectory model. Throughout we use the velocity space element in SFINCS coordinates, $d^3v = 2\pi v_{ts}^3 X_s^2\, d\xi_s\, dX_s$.

F.0.1 DKES trajectories

The operator under consideration is,
$$L_s = v_\|\hat{\mathbf{b}}\cdot\nabla + \mathbf{v}_E^{\text{DKES}}\cdot\nabla - \frac{1-\xi_s^2}{2B\xi_s}v_\|\hat{\mathbf{b}}\cdot\nabla B\frac{\partial}{\partial\xi_s}. \tag{F.2}$$
Considering the contribution of the streaming term in (F.2) to the left hand side of (F.1) we obtain,
$$\left\langle\int d^3v\, \frac{g_s v_\|\hat{\mathbf{b}}\cdot\nabla f_s}{f_{Ms}}\right\rangle_\psi = -\left\langle\int d^3v\, \frac{f_s v_\|\mathbf{B}\cdot\nabla\left(g_s/B\right)}{f_{Ms}}\right\rangle_\psi. \tag{F.3}$$
Here the identity $\langle\nabla\cdot\mathbf{Q}\rangle_\psi = \frac{1}{V'(\psi)}\frac{\partial}{\partial\psi}\left(V'(\psi)\langle\mathbf{Q}\cdot\nabla\psi\rangle_\psi\right)$ for any vector $\mathbf{Q}$ has been used. We next consider the contribution of the $E\times B$ drift term in (F.2),
$$\left\langle\int d^3v\, \frac{g_s\mathbf{v}_E^{\text{DKES}}\cdot\nabla f_s}{f_{Ms}}\right\rangle_\psi = -\left\langle\int d^3v\, \frac{f_s\mathbf{v}_E^{\text{DKES}}\cdot\nabla g_s}{f_{Ms}}\right\rangle_\psi. \tag{F.4}$$
Here we have used the identity,
$$\langle\mathbf{B}\times\nabla\psi\cdot\nabla w\rangle_\psi = 0, \tag{F.5}$$
for any $w$. We consider the contribution of the mirror-force term in (F.2),
$$\left\langle\int d^3v\, \frac{g_s\dot{\xi}_s}{f_{Ms}}\frac{\partial f_s}{\partial\xi_s}\right\rangle_\psi = -\left\langle\int d^3v\, \frac{f_s\dot{\xi}_s}{f_{Ms}}\frac{\partial g_s}{\partial\xi_s}\right\rangle_\psi - \left\langle\int d^3v\, \frac{v_\|}{B}\hat{\mathbf{b}}\cdot\nabla B\,\frac{g_s f_s}{f_{Ms}}\right\rangle_\psi. \tag{F.6}$$
Combining (F.3-F.6), we obtain
$$\left\langle\int d^3v\, \frac{g_s L_s f_s}{f_{Ms}}\right\rangle_\psi = -\left\langle\int d^3v\, \frac{f_s L_s g_s}{f_{Ms}}\right\rangle_\psi. \tag{F.7}$$
Therefore, in the DKES trajectory model we obtain (4.27).

F.0.2 Full trajectories

The operator under consideration for the full model is,
$$L_s = v_\|\hat{\mathbf{b}}\cdot\nabla + \mathbf{v}_E\cdot\nabla + \frac{\left(1+\xi_s^2\right)X_s}{2B}\mathbf{v}_E\cdot\nabla B\frac{\partial}{\partial X_s} - \frac{1-\xi_s^2}{2B\xi_s}v_\|\hat{\mathbf{b}}\cdot\nabla B\frac{\partial}{\partial\xi_s} + \frac{\xi_s\left(1-\xi_s^2\right)}{2B}\mathbf{v}_E\cdot\nabla B\frac{\partial}{\partial\xi_s}. \tag{F.8}$$
The contribution to (F.1) from the streaming term in (F.8) is identical to that in the case of the DKES trajectory model, (F.3). We next consider the contribution from the $E\times B$ drift term in (F.8),
$$\left\langle\int d^3v\, \frac{g_s\mathbf{v}_E\cdot\nabla f_s}{f_{Ms}}\right\rangle_\psi = -\left\langle\int d^3v\, \frac{f_s B^2\mathbf{v}_E\cdot\nabla\left(g_s/B^2\right)}{f_{Ms}}\right\rangle_\psi, \tag{F.9}$$
again using (F.5). The contribution from the $\dot{X}_s$ term in (F.8) is,
$$\left\langle\int d^3v\, \frac{g_s\dot{X}_s}{f_{Ms}}\frac{\partial f_s}{\partial X_s}\right\rangle_\psi = -\left\langle\int d^3v\, \frac{f_s\dot{X}_s}{f_{Ms}}\frac{\partial g_s}{\partial X_s}\right\rangle_\psi - \left\langle\int d^3v\, \frac{\left(3+2X_s^2\right)\left(1+\xi_s^2\right)g_s f_s}{2f_{Ms}B}\mathbf{v}_E\cdot\nabla B\right\rangle_\psi. \tag{F.10}$$
The contribution from the mirror term in (F.8) is the same as in the case of the DKES trajectories model (F.6). We consider the contribution from the final term in (F.8),
$$\left\langle\int d^3v\, \frac{g_s\xi_s\left(1-\xi_s^2\right)\mathbf{v}_E\cdot\nabla B}{2Bf_{Ms}}\frac{\partial f_s}{\partial\xi_s}\right\rangle_\psi = -\left\langle\int d^3v\, \frac{f_s\xi_s\left(1-\xi_s^2\right)\mathbf{v}_E\cdot\nabla B}{2Bf_{Ms}}\frac{\partial g_s}{\partial\xi_s}\right\rangle_\psi - \left\langle\int d^3v\, \frac{\left(1-3\xi_s^2\right)\mathbf{v}_E\cdot\nabla B\, f_s g_s}{2Bf_{Ms}}\right\rangle_\psi. \tag{F.11}$$
Combining (F.3), (F.9), (F.10), (F.6), and (F.11), we obtain
$$\left\langle\int d^3v\, \frac{g_s L_s f_s}{f_{Ms}}\right\rangle_\psi = -\left\langle\int d^3v\, \frac{f_s L_s g_s}{f_{Ms}}\right\rangle_\psi + \frac{\Phi'(\psi)q_s}{T_s}\left\langle\int d^3v\, \frac{\left(\mathbf{v}_{\mathrm{m}s}\cdot\nabla\psi\right)f_s g_s}{f_{Ms}}\right\rangle_\psi. \tag{F.12}$$
Therefore, under the full trajectory model we obtain (4.28).

Appendix G: Symmetry of the sensitivity function

In this Appendix we discuss several symmetry properties of the local sensitivity function, $S_{\mathcal{R}}$, defined through (4.41). The arguments that follow are similar to those in Appendix C of [138]. Throughout we will assume that $B$ is stellarator symmetric and $N_P$ symmetric. We will show that this implies $N_P$ symmetry of $S_{\mathcal{R}}$. In the limit that $E_r \to 0$, $S_{\mathcal{R}}$ also has stellarator symmetry.

G.0.1 Symmetry of $S_{\mathcal{R}}$ implied by Fourier derivatives

First we would like to show that $S_{\mathcal{R}}$ is stellarator symmetric if and only if $\partial\mathcal{R}/\partial B^s_{m,n} = 0$ for all $m$ and $n$, where we express $B$ in a Fourier series,
$$B = \sum_{m,n} B^c_{m,n}\cos(m\vartheta_B - n\varphi_B) + B^s_{m,n}\sin(m\vartheta_B - n\varphi_B). \tag{G.1}$$
The perturbation, $\delta B$, is decomposed similarly. We begin with the "if" portion of the argument. From (4.41) we have,
$$\frac{\partial\mathcal{R}}{\partial B^s_{m,n}} = V'(\psi)^{-1}\int_0^{2\pi} d\vartheta_B\int_0^{2\pi} d\varphi_B\, \sqrt{g}\,S_{\mathcal{R}}\sin(m\vartheta_B - n\varphi_B). \tag{G.2}$$
Suppose $\partial\mathcal{R}/\partial B^s_{m,n} = 0$ for all $m$ and $n$. The quantity $(\sqrt{g}S_{\mathcal{R}})$ can be represented as a Fourier series,
$$\left(\sqrt{g}S_{\mathcal{R}}\right) = \sum_{m,n} A^c_{m,n}\cos(m\vartheta_B - n\varphi_B) + A^s_{m,n}\sin(m\vartheta_B - n\varphi_B). \tag{G.3}$$
From (G.2), we see that $A^s_{m,n} = 0$ for all $m$ and $n$. Thus the quantity $(\sqrt{g}S_{\mathcal{R}})$ must be even under the transformation $(\vartheta_B,\varphi_B)\to(-\vartheta_B,-\varphi_B)$. We now note that $\sqrt{g}$ must be even from (4.37) under the assumption that $B$ is stellarator symmetric. Therefore $S_{\mathcal{R}}$ must be stellarator symmetric, assuming that $\sqrt{g}$ does not vanish anywhere, which must be the case for any well-defined coordinate transformation.

We continue with the "only if" portion of the argument. Suppose $S_{\mathcal{R}}$ is stellarator symmetric. As $\sqrt{g}$ is also stellarator symmetric, $(\sqrt{g}S_{\mathcal{R}})$ can be expressed in a Fourier series as (G.3) with $A^s_{m,n} = 0$ for all $m$ and $n$. Thus from (G.2) $\partial\mathcal{R}/\partial B^s_{m,n} = 0$ for all $m$ and $n$.

We next show that if $B$ is $N_P$ symmetric, then $S_{\mathcal{R}}$ is $N_P$ symmetric if and only if $\partial\mathcal{R}/\partial B^c_{m,n} = 0$ for all $n$ that are not integer multiples of $N_P$. We begin with the "if" portion of the argument. From (4.41),
$$\frac{\partial\mathcal{R}}{\partial B^c_{m,n}} = V'(\psi)^{-1}\int_0^{2\pi} d\vartheta_B\int_0^{2\pi} d\varphi_B\, \sqrt{g}\,S_{\mathcal{R}}\cos(m\vartheta_B - n\varphi_B). \tag{G.4}$$
Suppose $\partial\mathcal{R}/\partial B^c_{m,n} = 0$ for all $n$ which are not integer multiples of $N_P$. Here $(\sqrt{g}S_{\mathcal{R}})$ can be expressed in a Fourier series as (G.3) with $A^s_{m,n} = 0$ for all $m$ and $n$.
Inserting the Fourier series into (G.4), we find that $A^c_{m,n} = 0$ for all $n$ that are not integer multiples of $N_P$. Thus $(\sqrt{g}S_{\mathcal{R}})$ must be $N_P$ symmetric. As $\sqrt{g}$ must be $N_P$ symmetric, this implies $S_{\mathcal{R}}$ possesses the same symmetry.

Next we consider the "only if" portion of the argument. Suppose that $S_{\mathcal{R}}$ is $N_P$ symmetric. As $\sqrt{g}$ is also $N_P$ symmetric, then $(\sqrt{g}S_{\mathcal{R}})$ can be expressed in a Fourier series as (G.3) where the sum includes only $n$ that are integer multiples of $N_P$. Inserting the Fourier series into (G.4), we find that $\partial\mathcal{R}/\partial B^c_{m,n} = 0$ for all $n$ that are not integer multiples of $N_P$.

G.0.2 Symmetry of Fourier derivatives

To continue, we need to show that $\partial\mathcal{R}/\partial B^s_{m,n} = 0$ for all $m$ and $n$ and $\partial\mathcal{R}/\partial B^c_{m,n} = 0$ for all $n$ which are not integer multiples of $N_P$. We begin with the $N_P$ symmetry argument. We consider the symmetry of $f_s$ implied by (D.1). Under the transformation $\varphi_B\to\varphi_B + 2\pi/N_P$, we find that each of the trajectory coefficients remains unchanged, as well as the source term and collision operator. Therefore we can conclude that $f_s$ is $N_P$ symmetric. We can also note that each of the $\widetilde{\mathcal{R}}$ vectors is $N_P$ symmetric, as well as $\sqrt{g}$. We consider the integrand that appears in the flux surface average in (4.16),
$$D_s(\vartheta_B,\varphi_B) = \int d^3v\, \frac{f_s\widetilde{\mathcal{R}}^f_s\sqrt{g}}{f_{Ms}}. \tag{G.5}$$
Here the superscript and subscript on $\widetilde{\mathcal{R}}$ denote that we consider the unknowns corresponding to the distribution function of species $s$. We note that $D_s(\vartheta_B,\varphi_B + 2\pi/N_P) = D_s(\vartheta_B,\varphi_B)$. The quantity $\mathcal{R}$ can be expressed in terms of $D_s$ as follows,
$$\mathcal{R} = \sum_s V'(\psi)^{-1}\int_0^{2\pi} d\varphi_B\int_0^{2\pi} d\vartheta_B\, D_s. \tag{G.6}$$
Next we consider the functional derivative of $\mathcal{R}$ with respect to $B$, defined as in (4.40). The derivative with respect to $B^c_{m,n}$ can be thus defined as,
$$\frac{\partial\mathcal{R}}{\partial B^c_{m,n}} = V'(\psi)^{-1}\int_0^{2\pi} d\varphi_B\int_0^{2\pi} d\vartheta_B\left(\sum_s\frac{\delta D_s}{\delta B} - \mathcal{R}\frac{\delta\sqrt{g}}{\delta B}\right)\cos(m\vartheta_B - n\varphi_B). \tag{G.7}$$
As the functional derivative maintains the $N_P$ symmetry of $D_s$ and $\sqrt{g}$, the quantity in parentheses in (G.7) can be expressed in a Fourier series containing only $n$ that are integer multiples of $N_P$. Thus we see that the quantity $\partial\mathcal{R}/\partial B^c_{m,n} = 0$ for all $n$ that are not integer multiples of $N_P$.

Next we consider a similar argument for stellarator symmetry. We begin by considering the symmetry of $f_s$ implied by (D.1) in the case $E_r = 0$. Under the transformation $(\vartheta_B,\varphi_B,v_\|)\to(-\vartheta_B,-\varphi_B,-v_\|)$, we see that both the collisionless trajectory operator and the collision operator maintain the parity of $f_s$, while the source term is odd.
Therefore, $f_s$ must be odd under this transformation. In this case, we can write $f_s$ as,
$$f_s = f^-_{a,s}(X_s,\xi_s)f^+_{b,s}(\vartheta_B,\varphi_B) + f^+_{a,s}(X_s,\xi_s)f^-_{b,s}(\vartheta_B,\varphi_B), \tag{G.8}$$
where $f^-_{a,s}(X_s,-\xi_s) = -f^-_{a,s}(X_s,\xi_s)$, $f^+_{a,s}(X_s,-\xi_s) = f^+_{a,s}(X_s,\xi_s)$, and analogous expressions hold for $f^+_{b,s}$ and $f^-_{b,s}$.

We next note that each of the $\widetilde{\mathcal{R}}^f_s$ is odd under the transformation $(\vartheta_B,\varphi_B,v_\|)\to(-\vartheta_B,-\varphi_B,-v_\|)$. As $\sqrt{g}$ is even, we can express $\widetilde{\mathcal{R}}^f_s\sqrt{g}$ in a similar way to (G.8),
$$\widetilde{\mathcal{R}}^f_s\sqrt{g} = B^-_{a,s}(X_s,\xi_s)B^+_{b,s}(\vartheta_B,\varphi_B) + B^+_{a,s}(X_s,\xi_s)B^-_{b,s}(\vartheta_B,\varphi_B). \tag{G.9}$$
The integrand that appears in the flux surface average becomes,
$$D_s = \int d^3v\, f_{Ms}^{-1}\Big(f^-_{a,s}(X_s,\xi_s)B^-_{a,s}(X_s,\xi_s)f^+_{b,s}(\vartheta_B,\varphi_B)B^+_{b,s}(\vartheta_B,\varphi_B) + f^+_{a,s}(X_s,\xi_s)B^+_{a,s}(X_s,\xi_s)f^-_{b,s}(\vartheta_B,\varphi_B)B^-_{b,s}(\vartheta_B,\varphi_B)\Big). \tag{G.10}$$
We see that $D_s$ is even with respect to the transformation $(\vartheta_B,\varphi_B)\to(-\vartheta_B,-\varphi_B)$. The quantity $\mathcal{R}$ can be written as in (G.6) and the derivative with respect to a stellarator asymmetric mode is
$$\frac{\partial\mathcal{R}}{\partial B^s_{m,n}} = V'(\psi)^{-1}\int_0^{2\pi} d\varphi_B\int_0^{2\pi} d\vartheta_B\left(\sum_s\frac{\delta D_s}{\delta B} - \mathcal{R}\frac{\delta\sqrt{g}}{\delta B}\right)\sin(m\vartheta_B - n\varphi_B). \tag{G.11}$$
The functional derivative with respect to $B$ does not change the parity of $D_s$ or $\sqrt{g}$, thus we see that the quantity in parentheses in the above equation is even with respect to the transformation $(\vartheta_B,\varphi_B)\to(-\vartheta_B,-\varphi_B)$. Therefore, $\partial\mathcal{R}/\partial B^s_{m,n} = 0$ for all $m$ and $n$. A similar argument cannot be made if $E_r \neq 0$, as the inhomogeneous drive term in (D.1) no longer has definite parity.
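However, according to the arguments in [112], the transport coefficients do obey this symmetry property.

Appendix H: Derivatives at ambipolarity

In this Appendix, we derive an expression for derivatives of moments of the distribution function at fixed ambipolarity rather than fixed $E_r$ by determining the relationship between geometry parameters, $\Omega$, and $E_r$. We begin by assuming that the continuous adjoint approach outlined in Section 4.3.1 is used. The approach taken here is analogous to that used in Appendix C, in which an additional adjoint equation is used to compute derivatives at a fixed constraint function for optimization of stellarator coil shapes.

Consider the set of unknowns computed with SFINCS, $F$, which depends on parameters $\Omega$ and $E_r$. The total differential of $F$ satisfies,

$$\mathbb{L}\, dF(\Omega,E_r) = \left(\frac{\partial \mathbb{S}(\Omega,E_r)}{\partial E_r} - \frac{\partial \mathbb{L}(\Omega,E_r)}{\partial E_r}F\right) dE_r + \sum_{i=1}^{N_\Omega}\left(\frac{\partial \mathbb{S}(\Omega,E_r)}{\partial \Omega_i} - \frac{\partial \mathbb{L}(\Omega,E_r)}{\partial \Omega_i}F\right) d\Omega_i, \tag{H.1}$$

which follows from (4.13). Consider $J_r(\Omega,F)$, which depends on $E_r$ through $F$. The total differential of $J_r$ can be computed,

$$dJ_r(\Omega,F(\Omega,E_r)) = \sum_{i=1}^{N_\Omega}\frac{\partial J_r(\Omega,F)}{\partial\Omega_i}\, d\Omega_i + \left\langle \widetilde{J}_r,\, dF(\Omega,E_r)\right\rangle, \tag{H.2}$$

which can be written using (H.1) and the solution to (4.50),

$$dJ_r(\Omega,F(\Omega,E_r)) = \left\langle \lambda^{J_r}, \left(\frac{\partial \mathbb{L}(\Omega,E_r)}{\partial E_r}F - \frac{\partial \mathbb{S}(\Omega,E_r)}{\partial E_r}\right)\right\rangle dE_r + \sum_{i=1}^{N_\Omega}\left[\frac{\partial J_r(\Omega,F)}{\partial\Omega_i} + \left\langle \lambda^{J_r}, \left(\frac{\partial \mathbb{L}(\Omega,E_r)}{\partial \Omega_i}F - \frac{\partial \mathbb{S}(\Omega,E_r)}{\partial \Omega_i}\right)\right\rangle\right] d\Omega_i. \tag{H.3}$$

By enforcing $dJ_r(\Omega,F(\Omega,E_r)) = 0$, we obtain the relationship between $E_r$ and $\Omega$ at ambipolarity,

$$\left.\frac{\partial E_r(\Omega)}{\partial\Omega_i}\right|_{dJ_r=0} = -\left\langle \lambda^{J_r}, \left(\frac{\partial \mathbb{L}(\Omega,E_r)}{\partial E_r}F - \frac{\partial \mathbb{S}(\Omega,E_r)}{\partial E_r}\right)\right\rangle^{-1}\left[\frac{\partial J_r(\Omega,F)}{\partial\Omega_i} + \left\langle \lambda^{J_r}, \left(\frac{\partial \mathbb{L}(\Omega,E_r)}{\partial \Omega_i}F - \frac{\partial \mathbb{S}(\Omega,E_r)}{\partial \Omega_i}\right)\right\rangle\right]. \tag{H.4}$$

Consider a moment of the distribution function, $\mathcal{R}(\Omega,F(\Omega,E_r))$. The derivative with respect to $\Omega_i$ at fixed ambipolarity can thus be computed,

$$\frac{\partial \mathcal{R}(\Omega,F(\Omega,E_r(\Omega)))}{\partial\Omega_i} = \frac{\partial \mathcal{R}(\Omega,F)}{\partial\Omega_i} + \left\langle \widetilde{\mathcal{R}},\, \frac{\partial F(\Omega,E_r(\Omega))}{\partial\Omega_i}\right\rangle, \tag{H.5}$$

where $E_r$ is viewed as a function of $\Omega$ through (H.4). The first term corresponds to the explicit dependence on $\Omega_i$, while the second contains dependence through $F$. Here $\partial F(\Omega,E_r(\Omega))/\partial\Omega_i$ satisfies,

$$\mathbb{L}\,\frac{\partial F(\Omega,E_r(\Omega))}{\partial\Omega_i} = \left(\frac{\partial \mathbb{S}(\Omega,E_r)}{\partial \Omega_i} - \frac{\partial \mathbb{L}(\Omega,E_r)}{\partial \Omega_i}F\right) - \left(\frac{\partial \mathbb{S}(\Omega,E_r)}{\partial E_r} - \frac{\partial \mathbb{L}(\Omega,E_r)}{\partial E_r}F\right)\left\langle \lambda^{J_r}, \left(\frac{\partial \mathbb{L}(\Omega,E_r)}{\partial E_r}F - \frac{\partial \mathbb{S}(\Omega,E_r)}{\partial E_r}\right)\right\rangle^{-1} \times \left[\frac{\partial J_r(\Omega,F)}{\partial\Omega_i} + \left\langle \lambda^{J_r}, \left(\frac{\partial \mathbb{L}(\Omega,E_r)}{\partial \Omega_i}F - \frac{\partial \mathbb{S}(\Omega,E_r)}{\partial \Omega_i}\right)\right\rangle\right], \tag{H.6}$$

from (H.1) using (H.4). Using (H.6) and (4.23), we find

$$\frac{\partial \mathcal{R}(\Omega,F(\Omega,E_r(\Omega)))}{\partial\Omega_i} = \frac{\partial \mathcal{R}(\Omega,F)}{\partial\Omega_i} + \left\langle \lambda^{\mathcal{R}}, \left(\frac{\partial \mathbb{L}(\Omega,E_r)}{\partial \Omega_i}F - \frac{\partial \mathbb{S}(\Omega,E_r)}{\partial \Omega_i}\right)\right\rangle - \left\langle \lambda^{\mathcal{R}}, \left(\frac{\partial \mathbb{L}(\Omega,E_r)}{\partial E_r}F - \frac{\partial \mathbb{S}(\Omega,E_r)}{\partial E_r}\right)\right\rangle \times \frac{\dfrac{\partial J_r(\Omega,F)}{\partial\Omega_i} + \left\langle \lambda^{J_r}, \left(\dfrac{\partial \mathbb{L}(\Omega,E_r)}{\partial \Omega_i}F - \dfrac{\partial \mathbb{S}(\Omega,E_r)}{\partial \Omega_i}\right)\right\rangle}{\left\langle \lambda^{J_r}, \left(\dfrac{\partial \mathbb{L}(\Omega,E_r)}{\partial E_r}F - \dfrac{\partial \mathbb{S}(\Omega,E_r)}{\partial E_r}\right)\right\rangle}. \tag{H.7}$$

An analogous expression can be obtained using the discrete approach,

$$\frac{\partial \mathcal{R}\big(\Omega,\overrightarrow{F}(\Omega,E_r(\Omega))\big)}{\partial\Omega_i} = \frac{\partial \mathcal{R}\big(\Omega,\overrightarrow{F}\big)}{\partial\Omega_i} + \left\langle \overrightarrow{\lambda}^{\mathcal{R}}, \left(\frac{\partial \overrightarrow{\mathbb{S}}(\Omega,E_r)}{\partial \Omega_i} - \frac{\partial \overleftrightarrow{\mathbb{L}}(\Omega,E_r)}{\partial \Omega_i}\overrightarrow{F}\right)\right\rangle - \left\langle \overrightarrow{\lambda}^{\mathcal{R}}, \left(\frac{\partial \overrightarrow{\mathbb{S}}(\Omega,E_r)}{\partial E_r} - \frac{\partial \overleftrightarrow{\mathbb{L}}(\Omega,E_r)}{\partial E_r}\overrightarrow{F}\right)\right\rangle \times \frac{\dfrac{\partial J_r\big(\Omega,\overrightarrow{F}\big)}{\partial\Omega_i} + \left\langle \overrightarrow{\lambda}^{J_r}, \left(\dfrac{\partial \overrightarrow{\mathbb{S}}(\Omega,E_r)}{\partial \Omega_i} - \dfrac{\partial \overleftrightarrow{\mathbb{L}}(\Omega,E_r)}{\partial \Omega_i}\overrightarrow{F}\right)\right\rangle}{\left\langle \overrightarrow{\lambda}^{J_r}, \left(\dfrac{\partial \overrightarrow{\mathbb{S}}(\Omega,E_r)}{\partial E_r} - \dfrac{\partial \overleftrightarrow{\mathbb{L}}(\Omega,E_r)}{\partial E_r}\overrightarrow{F}\right)\right\rangle}, \tag{H.8}$$

where (4.51) has been used.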
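The structure of the discrete result (H.8) can be illustrated on a toy problem. The sketch below is not SFINCS: the 2x2 operator `L`, source `S`, and the vectors `j_vec` and `r_vec` are hypothetical stand-ins chosen so that the answer is known in closed form. It solves two adjoint systems with the transposed operator, forms the derivative at fixed $J_r = 0$, and compares against a finite difference in which $E_r$ is re-solved for ambipolarity at each parameter value.

```python
# Toy illustration of the discrete derivative at fixed ambipolarity, cf. (H.8).
# The operator, source, and moment vectors are hypothetical, not SFINCS quantities.

def solve2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]

def matvec(a, x):
    return [a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1]]

def dot(x, y):
    return x[0] * y[0] + x[1] * y[1]

# Forward problem L(om, er) F = S(om, er) with one geometry parameter om.
def L(om, er): return [[2.0 + om, er], [-er, 3.0]]
def S(om, er): return [1.0, om - er]
def j_vec(om): return [0.0, 1.0 + om]   # J_r = j . F (the constraint)
def r_vec(om): return [om, 2.0]         # R   = r . F (the moment of interest)

# Analytic partial derivatives of the toy operator, source, and moment vectors.
DL_DOM, DL_DER = [[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [-1.0, 0.0]]
DS_DOM, DS_DER = [0.0, 1.0], [0.0, -1.0]
DJ_DOM, DR_DOM = [0.0, 1.0], [1.0, 0.0]

def forward(om, er): return solve2(L(om, er), S(om, er))
def radial_current(om, er): return dot(j_vec(om), forward(om, er))
def moment(om, er): return dot(r_vec(om), forward(om, er))

def ambipolar_er(om, lo=0.0, hi=5.0):
    """Find er with J_r(om, er) = 0 by bisection (bracket valid near om = 1)."""
    f_lo = radial_current(om, lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = radial_current(om, mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def adjoint_derivative(om):
    """dR/d(om) at fixed J_r = 0, using two adjoint solves."""
    er = ambipolar_er(om)
    f = forward(om, er)
    lt = transpose(L(om, er))
    lam_r = solve2(lt, r_vec(om))   # adjoint variable for R
    lam_j = solve2(lt, j_vec(om))   # adjoint variable for J_r
    drive_om = [s - lf for s, lf in zip(DS_DOM, matvec(DL_DOM, f))]
    drive_er = [s - lf for s, lf in zip(DS_DER, matvec(DL_DER, f))]
    numerator = dot(DJ_DOM, f) + dot(lam_j, drive_om)
    denominator = dot(lam_j, drive_er)
    return (dot(DR_DOM, f) + dot(lam_r, drive_om)
            - dot(lam_r, drive_er) * numerator / denominator)

def moment_at_ambipolarity(om):
    return moment(om, ambipolar_er(om))

STEP = 1e-5
finite_difference = (moment_at_ambipolarity(1.0 + STEP)
                     - moment_at_ambipolarity(1.0 - STEP)) / (2.0 * STEP)
print(adjoint_derivative(1.0), finite_difference)
```

For this toy system the ambipolar solution has $F_2 = 0$ and $F_1 = 1/(2+\Omega)$, so the moment at ambipolarity is $\Omega/(2+\Omega)$ and its derivative at $\Omega = 1$ is $2/9$. The adjoint route needs only two extra linear solves regardless of the number of parameters, whereas the finite-difference route repeats the root find for every parameter.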
175 ppendix I: Derivation of generalized MHD self-adjointness relation The quantity U P = U P + U P consists of two terms, accounting for changes to the vectorpotential due to MHD perturbations, U P = (cid:90) V P d x ( δ J · ξ × B − δ J · ξ × B ) , (I.1)and changes to the rotational transform, U P = (cid:90) V P d x (cid:0) δχ ( ψ ) δ J · ∇ ϕ − δχ ( ψ ) δ J · ∇ ϕ (cid:1) . (I.2)The quantity U P can be expressed by using (5.26) and applying the divergence theorem tothe pressure gradient terms, U P = (cid:90) V P d x ξ · (cid:0) J × δ B + ∇ p ( ∇ · ξ ) − F (cid:1) − (cid:90) V P d x ξ · (cid:0) J × δ B + ∇ p ( ∇ · ξ ) − F (cid:1) . (I.3)We will deﬁne δ (cid:101) B , = ∇ × (cid:0) ξ , × B (cid:1) such that δ B , = δ (cid:101) B , − ∇ δχ , ( ψ ) × ∇ ϕ . The termsin (I.3) due to δ (cid:101) B , can be evaluated using J = J || ˆ b + ˆ b × ∇ p/B and (5.10), (cid:90) V P d x (cid:16) ξ · J × δ (cid:101) B − ξ · J × δ (cid:101) B (cid:17) = (cid:90) V P d x J || B ∇ · (cid:0) ( ξ × B ) × ( ξ × B ) (cid:1) + (cid:90) V P d x B (cid:16) ( ξ · ∇ p ) ˆ b · δ (cid:101) B − ( ξ · ∇ p ) ˆ b · δ (cid:101) B (cid:17) . (I.4)The ﬁrst term in (I.4) can be simpliﬁed using ∇ · J = 0 and noting that the perturbationcan be written as ξ , = ξ ψ , ∇ ψ + ξ ⊥ , ˆ b × ∇ ψ . Applying the identity B · δ (cid:101) B , = − B ∇ · ξ , − ξ , · ∇ B − µ ξ , · ∇ p to the second term, the following expression can be obtained, (cid:90) V P d x (cid:16) ξ · J × δ (cid:101) B − ξ · J × δ (cid:101) B (cid:17) = (cid:90) V P d x (cid:0) ( ∇ · ξ ) ξ · ∇ p − ( ∇ · ξ ) ξ · ∇ p (cid:1) . (I.5)176ence we obtain the following expression for U P , U P = (cid:90) V P d x ( − ξ · F + ξ · F ) − (cid:90) V P d x (cid:0) δχ (cid:48) ( ψ ) ξ · ∇ ψ − δχ (cid:48) ( ψ ) ξ · ∇ ψ (cid:1) J · ∇ ϕ. (I.6)We now consider U P deﬁned in (I.2). 
Applying (5.24) for the change in toroidal current,integrating by parts in ψ , and combining the expressions for U P (I.3) and U P (I.2), weobtain, U P = (cid:90) V P d x ( − ξ · F + ξ · F ) + 2 π (cid:90) V P dψ (cid:16) δχ ( ψ ) δI (cid:48) T, ( ψ ) − δχ ( ψ ) δI (cid:48) T, ( ψ ) (cid:17) − (cid:90) S P d x (cid:0) δχ ( ψ ) ξ − δχ ( ψ ) ξ (cid:1) · ˆ nJ · ∇ ϕ. (I.7)Next we combine U P (I.7) with U B (5.31) and U C (5.32) to obtain the free-boundary adjointrelation (5.33).To obtain the ﬁxed-boundary adjoint relation, the integral over the plasma volume (5.29)can be related to a surface integral by applying the divergence theorem to arrive at (5.35).Using (5.19) and applying several vector identities, U P = − µ (cid:90) S P d x ˆ n · ( ξ δ B − ξ δ B ) · B − µ (cid:90) S P d x (cid:0) δχ ( ψ ) δ B − δχ ( ψ ) δ B (cid:1) · ∇ ϕ × ˆ n . (I.8)Using (I.7) and expressing the second term in (I.8) as a perturbed current using (5.24), theﬁxed boundary adjoint relation (5.36) is obtained.177 ppendix J: Alternate derivation of ﬁxed-boundary adjoint relation The MHD force operator, F [ ξ , ] = J × (cid:16) ∇ × (cid:0) ξ , × B (cid:1)(cid:17) + ∇ × (cid:16) ∇ × (cid:0) ξ , × B (cid:1)(cid:17) × B µ + ∇ (cid:0) ξ , · ∇ p (cid:1) , (J.1)possesses the following self-adjointness property [20, 83], (cid:90) V P d x (cid:0) ξ · F [ ξ ] − ξ · F [ ξ ] (cid:1) = 1 µ (cid:90) S P d x ˆ n · (cid:16) ξ B · δ (cid:101) B − ξ B · δ (cid:101) B (cid:17) , (J.2)where δ (cid:101) B , = ∇ × (cid:0) ξ , × B (cid:1) is the perturbed ﬁeld corresponding to the MHD perturba-tions. As we consider linearized equilibrium states that preserve p ( ψ ), the perturbed pressuresatisﬁes δp ( ψ ) = − ξ · ∇ p . The force operator we adopt (J.1) is the γ → ∇ ( γp ∇ · ξ ).For perturbations described by (5.19), (5.20) and (5.23) to (5.26), the force operatorsatisﬁes, F [ ξ , ] = J × (cid:0) ∇ δχ , ( ψ ) × ∇ ϕ (cid:1) + ∇ × (cid:0) ∇ δχ , ( ψ ) × ∇ ϕ (cid:1) × B µ − δ F , . 
(J.3)Using (J.3) and several vector identities, the left hand side of (J.2) can be written as (cid:90) V P d x (cid:0) ξ · F [ ξ ] − ξ · F [ ξ ] (cid:1) = (cid:90) V P d x (cid:0) δχ (cid:48) ( ψ ) ξ − δχ (cid:48) ( ψ ) ξ (cid:1) · ∇ ψ J · ∇ ϕ − µ (cid:90) V P d x ∇ ψ × ∇ ϕ · (cid:16) δχ (cid:48) ( ψ ) δ (cid:101) B − δχ (cid:48) ( ψ ) δ (cid:101) B (cid:17) − µ (cid:90) S P d x (cid:0) ξ δχ (cid:48) ( ψ ) − ξ δχ (cid:48) ( ψ ) (cid:1) · ˆ n ( ∇ ψ × ∇ ϕ · B ) − (cid:90) V P d x ( ξ · δ F − ξ · δ F ) . (J.4)In arriving at (J.4), we use J · ∇ ψ = 0, which follow from MHD force balance (5.10). Using1785.24) to re-express the ﬁrst two terms on the right-hand side, (cid:90) V P d x (cid:0) ξ · F [ ξ ] − ξ · F [ ξ ] (cid:1) = 2 π (cid:90) V P dψ (cid:0) δI T, ( ψ ) δχ (cid:48) ( ψ ) − δI T, ( ψ ) δχ (cid:48) ( ψ ) (cid:1) − µ (cid:90) S P d x (cid:0) ξ δχ (cid:48) ( ψ ) − ξ δχ (cid:48) ( ψ ) (cid:1) · ˆ n ( ∇ ψ × ∇ ϕ · B ) − (cid:90) V P d x ( ξ · δ F − ξ · δ F ) . (J.5)Using (5.19) and (J.2) we obtain (5.36). 179 ppendix K: Interpretation of the displacement vector For MHD perturbations such that δ B = ∇ × ( ξ × B ) the displacement can be interpretedas a vector describing the motion of a ﬁeld lines. Thus a normal perturbation to the surfaceof the plasma as in (5.4) can be expressed in terms of the displacement vector, δf ( S P ; ξ ) = (cid:90) S P d x G ξ · ˆ n . (K.1)For perturbations that allow for changes in the rotational transform it remains to be shownthat a similar relation can be found.As we require that ψ remain a ﬂux surface label in the perturbed equilibrium, the La-grangian perturbation to ψ at ﬁxed position is δψ = − δ x · ∇ ψ. (K.2)The perturbed magnetic ﬁeld, B (cid:48) = B + δ B must remain tangent to ψ (cid:48) = ψ + δψ surfaces;thus to ﬁrst order in the perturbation,0 = B (cid:48) · ∇ ψ (cid:48) = B · ∇ δψ + δ B · ∇ ψ. 
(K.3)Applying the form for the perturbed ﬁeld allowing for changes in the rotational transform, δ B = ∇ × (cid:0) ξ × B − δχ ( ψ ) ∇ ϕ (cid:1) , and using several vector identities, the following conditionis obtained B · ∇ ( δ x · ∇ ψ ) = B · ∇ ( ξ · ∇ ψ ) . (K.4)This implies that δ x · ∇ ψ = ξ · ∇ ψ + F ( ψ ), where F ( ψ ) is some ﬂux function which canbe determined by requiring that the perturbation to the toroidal ﬂux as a function of ψ vanishes, δ Ψ T ( ψ ) = 0.The perturbed toroidal ﬂux through a surface labeled by ψ contains two terms, corre-sponding to the ﬂux of the unperturbed ﬁeld through the perturbed surface and the perturbedﬁeld through the unperturbed surface, δ Ψ T ( ψ ) = (cid:90) ∂S T ( ψ ) dϑ √ gδ x · ∇ ψ B · ∇ ϕ + (cid:90) S T ( ψ ) dψdϑ √ gδ B · ∇ ϕ. (K.5)Using the form for δ B , applying the divergence theorem, and noting that B · ∇ ϕ = √ g − ,the following condition is obtained, δ Ψ T ( ψ ) = (cid:90) π dϑ ( δ x · ∇ ψ − ξ · ∇ ψ ) . (K.6)By requiring that δ Ψ T ( ψ ) = 0, we ﬁnd that F ( ψ ) = 0. Thus we can express shape gradients180n the form of (K.1) even when the rotational transform is allowed to vary.181 ppendix L: Details of axis ripple calculation In this Appendix, we compute the shape derivative of the ﬁnite-pressure magnetic wellﬁgure of merit from (5.101) and show that if we impose an adjoint perturbation of the form(5.102), the shape gradient is given by (5.106).We use the expression for the perturbation to the ﬁeld strength (5.62) and δψ = − ξ · ∇ ψ with (5.101) to obtain, δf R ( S P ; ξ ) = (cid:90) S P d x ξ · ˆ n (cid:102) f R − (cid:90) V P d x ∂ (cid:102) f R ∂ψ ξ · ∇ ψ − (cid:90) V P d x ∂ (cid:102) f R ∂B B (cid:16) B ∇ · ξ + ξ · ∇ (cid:0) B + µ p (cid:1) + δχ (cid:48) ( ψ ) B · ( ∇ ψ × ∇ ϕ ) (cid:17) . 
(L.1)The third term can be integrated by parts to obtain, δf R ( S P ; ξ ) = (cid:90) S P d x ξ · ˆ n (cid:32)(cid:102) f R − ∂ (cid:102) f R ∂B B (cid:33) + (cid:90) V P d x (cid:32) ∂ (cid:102) f R ∂B∂ψ B − ∂ (cid:102) f R ∂ψ (cid:33) ξ · ∇ ψ + (cid:90) V P d x (cid:32) − ∂ (cid:102) f R ∂B B ξ · κ + B ∂ (cid:102) f R ∂B ξ · ∇ B + δχ (cid:48) ( ψ ) ∂ (cid:102) f R ∂B ˆ b · ( ∇ ϕ × ∇ ψ ) (cid:33) , (L.2)where the expression for the curvature in an equilibrium ﬁeld (5.105) has been applied.We compute one term that appears in the ﬁxed-boundary adjoint relation (5.36) usingthe prescribed adjoint bulk force perturbation (5.102a), (cid:90) V P d x ξ · F = (cid:90) V P d x (cid:32) − ∂ p || ∂B∂ψ B + ∂p || ∂ψ (cid:33) ξ · ∇ ψ + (cid:90) V P d x (cid:32) ∂p || ∂B B ξ · κ − B ∂ p || ∂B ξ · ∇ B (cid:33) , (L.3)where we have applied the parallel force balance condition (5.103). Therefore, if we impose182 || = (cid:102) f R , we obtain the following expression for the shape derivative of f R , δf R ( S P ; ξ ) = (cid:90) S P d x ξ · ˆ n (cid:32)(cid:102) f R − ∂ (cid:102) f R ∂B B (cid:33) − (cid:90) V P d x ξ · F + (cid:90) V P d x δχ (cid:48) ( ψ ) ∂ (cid:102) f R ∂B ˆ b · ( ∇ ϕ × ∇ ψ ) . (L.4)Upon application of the ﬁxed-boundary adjoint relation we obtain (5.106) with (5.102).183 ppendix M: Details of eﬀective ripple in the 1 /ν regime calculation Neoclassical transport in the 1 /ν collisionality regime is discussed in many referencesincluding [65], [42], and [116]. In this Appendix we sketch the computation of (cid:15) / originallyintroduced in [168] and compute linear perturbations of f (cid:15) (5.112), showing them to takethe form of (5.113).In the 1 /ν regime, the distribution function is ordered in the parameter ν ∗ = ν/ ( v t /L ) (cid:28)

1, where ν is the collision frequency, the thermal speed is v t = (cid:112) T /m for mass m andtemperature T , and L is a macroscopic scale length, f = f − + f + O ( ν ∗ ) . (M.1)In velocity space we use a pitch angle coordinate λ = v ⊥ / ( v B ), energy coordinate (cid:15) = v / σ = sign( v || ), where v ⊥ = (cid:113) v − v || is the perpendicular velocity and v || = v · ˆ b is theparallel velocity. We use the ﬁeld line label, α , and length along a ﬁeld line, l , to describelocation on a constant ψ surface. In the 1 /ν regime the E × B precession frequency isassumed to be small relative to the collision frequency, so the drift kinetic equation (4.2)becomes, v || ∂f ∂l = C ( f ) − v m · ∇ ψ ∂f ∂ψ , (M.2)where the Maxwellian with density n is, f = nπ − / v − t e − v /v t , (M.3)and the radial magnetic drift is, v m · ∇ ψ = ( v + v || ) m qB ∇ ψ × B · ∇ B, (M.4)for charge q . The drift kinetic equation to O ( ν − ∗ ) is, v || ∂f − ∂l = 0 . (M.5)In the trapped portion of phase space, this implies that f − = f − ( ψ, α, (cid:15), λ ), and in thepassing portion of phase space, this implies that f − = f − ( ψ, (cid:15), λ, σ ). The drift kineticequation to O ( ν ∗ ) is, v || ∂f ∂l = C ( f − ) − v m · ∇ ψ ∂f ∂ψ . (M.6)In the passing region, this implies that f − is a Maxwellian, so it can be taken to vanish.184e employ a pitch-angle scattering operator, C = 2 ν ( (cid:15) ) v || B(cid:15) ∂∂λ (cid:18) λv || ∂∂λ (cid:19) . (M.7)The parallel streaming term in (M.6) is annihilated by the bounce averaging operation,0 = (cid:104) C ( f − ) (cid:105) b − (cid:104) v m · ∇ ψ (cid:105) b ∂f ∂ψ , (M.8)where the bounce average of a quantity A is (cid:104) A (cid:105) b = τ − (cid:72) dl A/v || and the bounce time is τ = (cid:72) dl v − || . The bounce-averaged equation (M.8) can be expressed in terms of the paralleladiabatic invariant J = (cid:72) dl v || using the relation, (cid:104) v m · ∇ ψ (cid:105) b = mqτ ∂J∂α . 
(M.9)Integrating (M.8) with respect to λ we obtain, ∂f − ∂λ = m(cid:15) qλν ( (cid:15) ) ∂f ∂ψ (cid:18)(cid:73) dl v || B (cid:19) − (cid:90) λ /B max dλ (cid:48) ∂J∂α . (M.10)Here B max is the maximum value of the ﬁeld strength on the surface labeled by ψ . Wehave used the boundary condition (cid:0)(cid:72) dl v || /B (cid:1) ∂f − /∂λ | λ =1 /B max = 0, as there is no ﬂuxin pitch-angle from the passing region. The integration with respect to λ is performed toobtain, ∂f − ∂λ = − m qλν ( (cid:15) ) ∂f ∂ψ (cid:18)(cid:73) dl v || B (cid:19) − ∂∂α (cid:32)(cid:73) dl v || B (cid:33) . (M.11)The particle ﬂux from f − is obtained by multiplying (M.6) by f − ( ∂f /∂ψ ) − , integratingover velocity space, and ﬂux surface averaging, (cid:104) Γ · ∇ ψ (cid:105) ψ ≡ (cid:28)(cid:90) d v f − v m · ∇ ψ (cid:29) ψ = (cid:42)(cid:90) d v f − C ( f − ) (cid:18) ∂f ∂ψ (cid:19) − (cid:43) ψ . (M.12)The velocity space integration is performed using the velocity-space Jacobian d v = 2 π (cid:80) σ B(cid:15)/ | v || | dλd(cid:15) .Upon integration by parts in λ and applying (M.11), the following expression is obtained, (cid:104) Γ · ∇ ψ (cid:105) ψ = − √ πV (cid:48) ( ψ ) (cid:18) m q (cid:19) (cid:90) ∞ d(cid:15) (cid:18) ∂f ∂ψ (cid:19) (cid:15) / ν ( (cid:15) ) (cid:90) /B min /B max dλλ (cid:90) π dα (cid:88) i ( ∂∂α ˆ K i ( α, λ )) ˆ I i ( α, λ ) , (M.13)where the bounce integrals are deﬁned by (5.111). The sum in (M.13) is taken over trappingregions for particles with pitch angle λ on a ﬁeld line labeled by α for left bounce points ϕ − ,i ∈ [0 , π ).The parameter (cid:15) / quantiﬁes the geometric dependence of the 1 /ν particle ﬂux. It is185eﬁned in terms of the radial particle ﬂux in the following way [168], (cid:104) Γ · ∇ ψ (cid:105) ψ = − (cid:104)|∇ ψ |(cid:105) ψ (cid:18) m q (cid:19) B R (cid:15) / (cid:90) ∞ d(cid:15) (cid:18) ∂f ∂ψ (cid:19) (cid:15) / ν ( (cid:15) ) . 
(M.14)We take our normalizing length and ﬁeld values to be such that B R = (cid:15) − (cid:104)|∇ ψ |(cid:105) ψ , where (cid:15) ref is a reference aspect ratio. Comparing (M.13) with (M.14) we obtain the expressionfor (cid:15) / (5.110). The corresponding expression (29) in [168] is obtained by noting thatˆ H Nemov = − ( ∂ ˆ K/∂α ) λ / B / and ˆ I = 2 ˆ I Nemov , where ˆ H Nemov and ˆ I Nemov are given in (30)-(31) of [168].The shape derivative of f (cid:15) (5.112) is computed to be, δf (cid:15) ( S P ; ξ ) = (cid:90) V P dψ w ( ψ ) δ ( V (cid:48) ( ψ ) (cid:15) / ( ψ )) . (M.15)The perturbation to the bounce integrals is computed using the following identity for theperturbation of a line integral Q L = (cid:82) l L l dl Q due to displacement of the integration curveby vector ﬁeld δ x [9, 138], δQ L = (cid:90) l L l dl (cid:32) δ x · (cid:18) − κ Q + (cid:16) I − ˆ t ˆ t (cid:17) · ∇ Q (cid:19) + δQ (cid:33) + Q ( l L ) δl L − Q ( l ) δl , (M.16)where δQ is the perturbation to the integrand at ﬁxed position, ˆ t = x (cid:48) ( l ) is the unit tangentvector, κ = x (cid:48)(cid:48) ( l ) is the curvature, and δl L and δl are perturbations to the bounds of theintegral.We compute the perturbation to the bounce integrals to be, δ ˆ I i = (cid:73) dl  − v || vB κ · δ x − (cid:32) λv Bv || + v || B v (cid:33) ( δ x · ∇ B + δB )  (M.17a) δ ˆ K i = (cid:73) dl  − v || v B κ · δ x − (cid:32) λv || Bv + v || B v (cid:33) ( δ x · ∇ B + δB )  , (M.17b)where δB is the perturbation to the ﬁeld strength (5.62) and δ x is given by (5.22). Wenote that δ x · ˆ b = 0 such that the perpendicular projection, ( I − ˆ t ˆ t ), is not needed. Thereis no contribution due to the perturbation of the bounce points, as the integrand vanishesat these points. 
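The line-integral identity (M.16) can be checked numerically on a simple curve. The sketch below is an illustration under stated assumptions: the curve (a quarter of the unit circle), the integrand `q`, and the displacement field `v` are hypothetical, the integrand is held fixed in space ($\delta Q = 0$), and the end-point terms are taken to be $\delta l = \hat{t}\cdot\delta x$ at each end. It compares the first variation predicted by the identity with a centered finite difference of the integral over the displaced curve.

```python
import math

# Hypothetical test case: quarter of the unit circle, parametrized by arc length l.
def curve(l): return (math.cos(l), math.sin(l))
def tangent(l): return (-math.sin(l), math.cos(l))       # t = x'(l)
def curvature(l): return (-math.cos(l), -math.sin(l))    # kappa = x''(l)

def q(x, y): return x * x + y                 # integrand Q, fixed in space
def grad_q(x, y): return (2.0 * x, 1.0)

def v(x, y): return (y, x * x)                # displacement is eps * v
def jac_v(x, y): return ((0.0, 1.0), (2.0 * x, 0.0))

L_START, L_END = 0.0, math.pi / 2.0

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0

def line_integral(eps):
    """Integral of Q along the displaced curve x(l) + eps * v(x(l))."""
    def integrand(l):
        x, y = curve(l)
        tx, ty = tangent(l)
        vx, vy = v(x, y)
        (a, b), (c, d) = jac_v(x, y)
        # Derivative of the displaced curve with respect to l (chain rule).
        dx = tx + eps * (a * tx + b * ty)
        dy = ty + eps * (c * tx + d * ty)
        return q(x + eps * vx, y + eps * vy) * math.hypot(dx, dy)
    return simpson(integrand, L_START, L_END)

def first_variation():
    """First variation per unit eps from the identity, with delta Q = 0."""
    def integrand(l):
        x, y = curve(l)
        tx, ty = tangent(l)
        kx, ky = curvature(l)
        vx, vy = v(x, y)
        gx, gy = grad_q(x, y)
        v_dot_t = vx * tx + vy * ty
        # v . ( -kappa Q + (I - t t) . grad Q )
        return (-(vx * kx + vy * ky) * q(x, y)
                + (vx * gx + vy * gy) - v_dot_t * (tx * gx + ty * gy))
    def end_term(l):
        x, y = curve(l)
        tx, ty = tangent(l)
        vx, vy = v(x, y)
        return q(x, y) * (vx * tx + vy * ty)   # Q times tangential end-point shift
    return simpson(integrand, L_START, L_END) + end_term(L_END) - end_term(L_START)

EPS = 1e-5
finite_difference = (line_integral(EPS) - line_integral(-EPS)) / (2.0 * EPS)
print(first_variation(), finite_difference)
```

In the bounce-integral application the displacement is perpendicular to the field line and the integrand vanishes at the bounce points, so the tangential projection and the end-point terms exercised here drop out.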
The expressions (5.113)-(5.115) can now be obtained by writing (M.15) interms of the perturbations of the bounce integrals, using ξ · ∇ B + δB = − B (cid:16) I − ˆ b ˆ b (cid:17) : ∇ ξ − δχ (cid:48) ( ψ )ˆ b · ( ∇ ψ × ∇ ϕ ) and κ · ξ = − ˆ b ˆ b : ∇ ξ .186 ppendix N: Details of departure from quasi-symmetry calculation In this Appendix we compute the shape derivative of f QS (5.121) to obtain (5.126)-(5.127c) by expressing each term in (5.125) in the desired form. The second term in (5.125)is expressed using δψ = − ξ · ∇ ψ ,12 (cid:90) V P d x w (cid:48) ( ψ ) δψ M = − (cid:90) V P d x M ξ · ∇ w ( ψ ) . (N.1)The third term in (5.125) is computed upon application of (5.20), the divergence theorem,and noting that M = B · A , (cid:90) V P d x w ( ψ ) M δ B · A = − (cid:90) S P d x ξ · n w ( ψ ) M − (cid:90) V P d x w ( ψ ) δχ (cid:48) ( ψ ) M∇ ψ × ∇ ϕ · A + (cid:90) V P d x ξ · (cid:16) w ( ψ ) M (cid:0) B × ( ∇ × A ) (cid:1) − A w ( ψ ) B · ∇M + M∇ (cid:0) w ( ψ ) M (cid:1)(cid:17) . (N.2)The quantity A can be projected into the perpendicular direction as ξ · ˆ b = 0, noting that,ˆ b × (cid:16) A × ˆ b (cid:17) = − (ˆ b × ∇ ψ ) ∇ || B − F ( ψ ) ∇ ⊥ B. (N.3)Similarly, any terms in (N.2) involving ξ · ∇ can be expressed as ξ · ∇ ⊥ . The correspondingterms in (5.127a) are obtained using the expression for the curvature in an equilibrium ﬁeld.The fourth term in (5.125) is expressed in the following way upon application of (5.62), thedivergence theorem, and noting that S · ∇ ψ = ∇ · S = 0, (cid:90) V P d x w ( ψ ) M S · ∇ δB = (cid:90) S P d x ξ · n Bw ( ψ ) S · ∇M − (cid:90) V P d x ξ · (cid:104) B ∇ (cid:0) w ( ψ ) S · ∇M (cid:1)(cid:105) + (cid:90) V P d x w ( ψ )( S · ∇M ) (cid:16) δχ (cid:48) ( ψ )ˆ b · ( ∇ ψ × ∇ ϕ ) + B ξ · κ (cid:17) . 
(N.4)We express terms involving ξ · ∇ as ξ · ∇ ⊥ to obtain the corresponding terms in (5.127a).The ﬁfth term in (5.125) is expressed in the following way upon application of δψ = − ξ ·∇ ψ ,the divergence theorem, and several vector identities, (cid:90) V P d x w ( ψ ) M B × ∇ δψ · ∇ B = − (cid:90) S P d x ξ · ˆ n w ( ψ ) M∇ B × B · ∇ ψ − (cid:90) V P d x ξ · ∇ ψ ∇ B · ∇ × (cid:0) w ( ψ ) M B (cid:1) . (N.5)187he sixth term in (5.125) upon application of (5.124) is, − (cid:90) V P d x δG ( ψ ) w ( ψ ) M B · ∇ Bι ( ψ ) − ( N/M ) =14 π (cid:90) S P d x w ( ψ ) V (cid:48) ( ψ ) (cid:104)M B · ∇ B (cid:105) ψ ( ι ( ψ ) − ( N/M )) ( B · ∇ ψ × ∇ ϑ ) ξ · ˆ n − π (cid:90) V P d x ξ · ∇ (cid:18) w ( ψ ) V (cid:48) ( ψ ) (cid:104)M B · ∇ B (cid:105) ψ ( ι ( ψ ) − ( N/M )) (cid:19) B · ∇ ψ × ∇ ϑ + 14 π (cid:90) V P d x w ( ψ ) V (cid:48) ( ψ ) (cid:104)M B · ∇ B (cid:105) ψ ι ( ψ ) − ( N/M ) (cid:16) ξ · (cid:0) ∇ ψ ∇ · ( B × ∇ ϑ ) − B × ∇ × ( ∇ ψ × ∇ ϑ ) (cid:1)(cid:17) − π (cid:90) V P d x δχ (cid:48) ( ψ ) w ( ψ ) V (cid:48) ( ψ ) (cid:104)M B · ∇ B (cid:105) ψ √ g ( ι ( ψ ) − ( N/M )) ∂ x ∂ϕ · ∂ x ∂ϑ . (N.6)In obtaining the corresponding terms in (5.127a), terms involving ξ · ∇ are expressed as ξ · ∇ ⊥ . The seventh term in (5.125) is expressed using δψ = − ξ · ∇ ψ . Combining all terms,we obtain (5.126)-(5.127c). 188 ppendix O: Details of neoclassical ﬁgures of merit calculation In this Section we compute the shape derivative of f NC (5.130) to obtain (5.136)-(5.137c)by expressing each term in (5.135) in the desired form. Throughout Boozer coordinates willbe assumed.The second term in (5.135) is expressed using δψ = − ξ · ∇ ψ . 
The third term in (5.135)can be computed using (5.124), noting that V (cid:48) ( ψ ) / (4 π √ g ) = B / (cid:104) B (cid:105) ψ in Boozer coordi-nates and applying the divergence theorem, (cid:90) V P d x w ( ψ ) ∂ R ( ψ ) ∂G ( ψ ) δG ( ψ ) = − (cid:90) V P d x w ( ψ ) B √ g (cid:104) B (cid:105) ψ ∂ R ( ψ ) ∂G ( ψ ) ξ · ∇ ψ ( ∇ × B ) · ∇ ϑ + (cid:90) V P d x  ξ · ∇ (cid:32) ∂ R ( ψ ) ∂G ( ψ ) w ( ψ ) (cid:104) B (cid:105) ψ (cid:33) B G ( ψ ) + w ( ψ ) (cid:104) B (cid:105) ψ ∂ R ( ψ ) ∂G ( ψ ) ξ · B × ∇ × (cid:18) ∂ x ∂ϕ B (cid:19) + (cid:90) V P d x w ( ψ ) δχ (cid:48) ( ψ ) B √ g (cid:104) B (cid:105) ψ ∂ R ( ψ ) ∂G ( ψ ) ∂ x ∂ϕ · ∂ x ∂ϑ − (cid:90) S P d x w ( ψ ) B (cid:104) B (cid:105) ψ ∂ R ( ψ ) ∂G ( ψ ) G ( ψ ) ξ · ˆ n . (O.1)The ﬁfth term in (5.135) can be computed using (5.62), the divergence theorem, and theexpression for the curvature in an equilibrium ﬁeld (5.105), (cid:90) V P d x w ( ψ ) (cid:104) S R δB (cid:105) ψ = (cid:90) V P d x (cid:16) ξ · ∇ (cid:0) w ( ψ ) S R (cid:1) B − BS R w ( ψ ) ξ · κ (cid:17) − (cid:90) V P d x δχ (cid:48) ( ψ ) S R w ( ψ )ˆ b · ∇ ψ × ∇ ϕ − (cid:90) S P d x w ( ψ ) S R B ξ · ˆ n . (O.2)The resulting terms can be combined to write the shape derivative in the form of (5.136),noting that any terms involving ξ · ∇ can be expressed as ξ · ∇ ⊥ .189 ppendix P: Linearized equilibrium energy functional and coeﬃcient ma-trices P.1 Further simpliﬁcation of energy functional

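We will now further simplify the energy functional (6.11) using a magnetic coordinate system. Each of the contravariant components of the perturbed magnetic field are evaluated to be,

$$Q^{\psi} \equiv \delta\mathbf{B}[\boldsymbol{\xi}]\cdot\nabla\psi = \frac{1}{\sqrt{g}}\left(\frac{\partial\xi^{\psi}}{\partial\varphi} + \iota\frac{\partial\xi^{\psi}}{\partial\vartheta}\right) \tag{P.1a}$$

$$Q^{\vartheta} \equiv \delta\mathbf{B}[\boldsymbol{\xi}]\cdot\nabla\vartheta = \frac{1}{\sqrt{g}}\left(\frac{\partial\xi^{\alpha}}{\partial\varphi} - \frac{\partial(\xi^{\psi}\iota)}{\partial\psi}\right) \tag{P.1b}$$

$$Q^{\varphi} \equiv \delta\mathbf{B}[\boldsymbol{\xi}]\cdot\nabla\varphi = -\frac{1}{\sqrt{g}}\left(\frac{\partial\xi^{\alpha}}{\partial\vartheta} + \frac{\partial\xi^{\psi}}{\partial\psi}\right). \tag{P.1c}$$

We also express the current density in the contravariant basis as,

$$\mathbf{J} = J^{\vartheta}\frac{\partial\mathbf{x}}{\partial\vartheta} + J^{\varphi}\frac{\partial\mathbf{x}}{\partial\varphi}. \tag{P.2}$$

The first term in the energy functional is expressed as,

$$W_1 \equiv -\frac{1}{\mu_0}\int_{V_P} d^3x\, \delta\mathbf{B}[\boldsymbol{\xi}]\cdot\delta\mathbf{B}[\boldsymbol{\xi}] = -\frac{1}{\mu_0}\int_{V_P} d^3x \left[\left(Q^{\psi}\right)^2 g_{\psi\psi} + \left(Q^{\vartheta}\right)^2 g_{\vartheta\vartheta} + \left(Q^{\varphi}\right)^2 g_{\varphi\varphi} + 2Q^{\psi}Q^{\vartheta}g_{\psi\vartheta}\right], \tag{P.3}$$

where $g_{x_i x_j} = \partial\mathbf{x}/\partial x_i\cdot\partial\mathbf{x}/\partial x_j$ are the metric coefficients. Here we have assumed that $\varphi = \phi$, the geometric toroidal angle, such that $g_{\vartheta\varphi} = g_{\psi\varphi} = 0$.

The second term in the energy functional is expressed as,

$$W_2 \equiv \int_{V_P} d^3x\, \boldsymbol{\xi}\cdot\mathbf{J}\times\delta\mathbf{B}[\boldsymbol{\xi}] = \int_{V_P} d^3x\, \sqrt{g}\left(\xi^{\psi}\left(J^{\vartheta}Q^{\varphi} - J^{\varphi}Q^{\vartheta}\right) + Q^{\psi}\left(\xi^{\vartheta}J^{\varphi} - \xi^{\varphi}J^{\vartheta}\right)\right). \tag{P.4}$$

Here we can note that the radial component of MHD force balance yields $p'(\psi) = J^{\vartheta} - \iota(\psi)J^{\varphi}$, so that

$$W_2 = \int_{V_P} d^3x\, \sqrt{g}\left(\xi^{\psi}\left(J^{\vartheta}Q^{\varphi} - J^{\varphi}Q^{\vartheta}\right) + Q^{\psi}\left(\xi^{\alpha}J^{\varphi} - p'(\psi)\xi^{\varphi}\right)\right). \tag{P.5}$$

The third term in the energy functional can be expressed as,

$$W_3 \equiv \int_{V_P} d^3x\, \boldsymbol{\xi}\cdot\nabla\left(\boldsymbol{\xi}\cdot\nabla p\right) = \int_{V_P} d^3x \left[\xi^{\psi}\frac{\partial(\xi^{\psi}p'(\psi))}{\partial\psi} + p'(\psi)\left(\xi^{\alpha}\frac{\partial\xi^{\psi}}{\partial\vartheta} + \sqrt{g}\,Q^{\psi}\xi^{\varphi}\right)\right]. \tag{P.6}$$

Combining $W_2$ and $W_3$, we see that the energy functional indeed only depends on $\xi^{\alpha}$ and $\xi^{\psi}$,

$$W_2 + W_3 = \int_{V_P} d^3x \left(\sqrt{g}\,\xi^{\psi}\left(J^{\vartheta}Q^{\varphi} - J^{\varphi}Q^{\vartheta}\right) + \xi^{\alpha}\,\mathbf{J}\cdot\nabla\xi^{\psi} + \xi^{\psi}\frac{\partial(\xi^{\psi}p'(\psi))}{\partial\psi}\right).$$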
(P.7)We now can apply the divegernce theorem, noting that ∇ · J = J · ∇ ψ = 0, to obtain, W + W = (cid:90) V P d x (cid:32) ξ ψ (cid:16) J ϕ ι (cid:48) ( ψ ) ξ ψ − J · ∇ ξ α + ξ ψ p (cid:48)(cid:48) ( ψ ) (cid:17) (cid:33) . (P.8)We now see that the ﬁrst three terms of the energy functional only depend on ξ α throughits ϑ and ϕ derivatives. Furthermore, given the restriction of δF α discussed in Appendix Q,the m = 0, n = 0 mode of ξ α will not enter the variational principle.191 .2 Explicit forms of coeﬃcient matrices We can now express the linear operators that couple the Fourier components of ξ α , ξ ψ ,and ∂ξ ψ /∂ψ given the simpliﬁcations of the energy functional in the previous Section: A ψ (cid:48) ψ (cid:48) = − V (cid:48) ( ψ ) µ (cid:42) (cid:0) √ g (cid:1) (cid:0) g ϕϕ + ι ( ψ ) g ϑϑ (cid:1) F ψ F ψ (cid:43) ψ (P.9a) A ψψ = V (cid:48) ( ψ ) µ (cid:42) (cid:0) √ g (cid:1) (cid:34) − g ψψ (cid:32) ∂ F ψ ∂ϕ ∂ F ψ ∂ϕ + ι ( ψ ) ∂ F ψ ∂ϑ ∂ F ψ ∂ϑ (cid:33) (P.9b) − g ψψ ι ( ψ ) (cid:32) ∂ F ψ ∂ϑ ∂ F ψ ∂ϕ + ∂ F ψ ∂ϕ ∂ F ψ ∂ϑ (cid:33) + (cid:16) µ (cid:0) √ g (cid:1) (cid:0) J ϕ ι (cid:48) ( ψ ) + p (cid:48)(cid:48) ( ψ ) (cid:1) − g ϑϑ (cid:0) ι (cid:48) ( ψ ) (cid:1) (cid:17) F ψ F ψ + g ψϑ ι (cid:48) ( ψ ) (cid:32) ∂ F ψ ∂ϕ + ι ( ψ ) ∂ F ψ ∂ϑ (cid:33) F ψ + F ψ (cid:32) ∂ F ψ ∂ϕ + ι ( ψ ) ∂ F ψ ∂ϑ (cid:33) (cid:35)(cid:43) ψ A ψψ (cid:48) = V (cid:48) ( ψ ) µ (cid:42) ι ( ψ ) (cid:0) √ g (cid:1) (cid:34) − F ψ g ϑϑ ι (cid:48) ( ψ ) + g ψϑ (cid:32) ∂ F ψ ∂ϕ + ι ( ψ ) ∂ F ψ ∂ϑ (cid:33) (cid:35) F ψ (cid:43) ψ (P.9c) A αα = − V (cid:48) ( ψ ) µ (cid:42) (cid:0) √ g (cid:1) (cid:34) g ϑϑ ∂ F α ∂ϕ ∂ F α ∂ϕ + g ϕϕ ∂ F α ∂ϑ ∂ F α ∂ϑ (cid:35)(cid:43) ψ (P.9d) A αψ (cid:48) = 2 V (cid:48) ( ψ ) µ (cid:42) (cid:0) √ g (cid:1) (cid:34) g ϑϑ ι ∂ F α ∂ϕ − g ϕϕ ∂ F α ∂ϑ (cid:35) F ψ (cid:43) ψ (P.9e) A αψ = − V (cid:48) ( ψ ) µ (cid:42) (cid:0) √ g (cid:1) (cid:34) (cid:18) − g ϑϑ ι (cid:48) ( ψ ) ∂ F α ∂ϕ + µ (cid:0) √ g (cid:1) J · ∇ F α (cid:19) F ψ (P.9f)+ g ψϑ ∂ F α 
∂ϕ (cid:32) ∂ F ψ ∂ϕ + ι ( ψ ) ∂ F ψ ∂ϑ (cid:33) (cid:35)(cid:43) ψ I ψ = 2 V (cid:48) ( ψ ) (cid:68) F ψ δF ψ (cid:69) ψ (P.9g) I α = 2 V (cid:48) ( ψ ) (cid:104) F α δF α (cid:105) ψ , (P.9h)where (cid:104) ... (cid:105) ψ is the ﬂux-surface average (A.10). P.3 Invertibility of A αα Obtaining the Euler-Lagrange solution for ξ α requires inverting A αα . We now show thatthis matrix is, in fact, negative deﬁnite and thus invertible. For any non-zero vector Ξ α , we192an write the inner product with A αα as, Ξ α · ( A αα Ξ α ) = − µ (cid:90) π dϑ (cid:90) π dϕ (cid:34) g ϑϑ √ g (cid:18) ∂ξ α ∂ϕ (cid:19) + g ϕϕ √ g (cid:18) ∂ξ α ∂ϑ (cid:19) (cid:35) . (P.10)We note that for a well-deﬁned coordinate system, g ϑϑ > g ϕϕ >

0, and $\sqrt{g} > 0$. While either $\partial\xi^{\alpha}/\partial\varphi$ or $\partial\xi^{\alpha}/\partial\vartheta$ may vanish, they will not vanish simultaneously throughout the integrand as we have excluded the $n = 0$, $m = 0$ mode. Therefore, the integrand will only vanish at isolated points. Thus the above integral is negative definite, and $A_{\alpha\alpha}$ is invertible throughout the volume.

Appendix Q: Constraint on bulk force perturbation

As shown in Appendix P, the first three terms in the energy functional (6.11) only depend on $\xi^{\alpha}$ through its derivatives with respect to $\vartheta$ and $\varphi$. In this Appendix, we show that it is always possible to choose the in-surface component of the bulk force perturbation, $\delta F_{\alpha}$, such that the final term in the energy functional,

$$W_4 \equiv \int_{V_P} d^3x\, \xi^{\alpha}\,\delta F_{\alpha}, \tag{Q.1}$$

does not depend on $\xi^{\alpha c}_{0,0} = \frac{1}{(2\pi)^2}\int_0^{2\pi} d\vartheta \int_0^{2\pi} d\varphi\, \xi^{\alpha}$. As $\xi^{\alpha c}_{0,0}$ does not enter our variational principle, we can take it to vanish. The condition that $\xi^{\alpha c}_{0,0}$ does not enter $W_4$ is equivalent to requiring that,

$$\left\langle \delta F_{\alpha}\right\rangle_{\psi} = 0, \tag{Q.2}$$

on every surface, where $\langle\ldots\rangle_{\psi}$ is the flux-surface average (A.10). This follows from the surface-averaged in-surface component of the linearized force-balance equation (6.2),

$$\left\langle \frac{\partial\mathbf{x}}{\partial\vartheta}\cdot\mathbf{F}[\boldsymbol{\xi}]\right\rangle_{\psi} = 0. \tag{Q.3}$$

This property of the MHD force operator holds for any equilibrium field that satisfies MHD force balance (6.1). To see this we note that the flux-surface average can be defined in terms of an average over the infinitesimal volume between flux surfaces $\Delta V$ (A.12). We can now apply the self-adjointness relation (6.9) to simplify (Q.3),

$$\left\langle \frac{\partial\mathbf{x}}{\partial\vartheta}\cdot\mathbf{F}[\boldsymbol{\xi}]\right\rangle_{\psi} = \left\langle \boldsymbol{\xi}\cdot\mathbf{F}\left[\frac{\partial\mathbf{x}}{\partial\vartheta}\right]\right\rangle_{\psi} + \lim_{\Delta V\to 0}\frac{1}{\mu_0\Delta V}\left(\int_{\partial(V_P+\Delta V)} d^2x\, \hat{\mathbf{n}}\cdot\boldsymbol{\xi}\,\mathbf{B}\cdot\delta\mathbf{B}\left[\frac{\partial\mathbf{x}}{\partial\vartheta}\right] - \int_{\partial(V_P)} d^2x\, \hat{\mathbf{n}}\cdot\boldsymbol{\xi}\,\mathbf{B}\cdot\delta\mathbf{B}\left[\frac{\partial\mathbf{x}}{\partial\vartheta}\right]\right), \tag{Q.4}$$

where we have noted that $\hat{\mathbf{n}}\cdot\partial\mathbf{x}/\partial\vartheta = 0$, as $\hat{\mathbf{n}} \propto \nabla\psi$.
The quantity δ B (cid:2) ∂ x /∂ϑ (cid:3) = ∇ × (cid:0) ∂ x /∂ϑ × B (cid:1) is shown to vanish by expressing B in contravariant form and using the dualrelations (A.3) between the contravariant and covariant basis vectors. The remaining ﬂux-194urface averaged term can also be shown to vanish, (cid:42) ξ · F (cid:20) ∂ x ∂ϑ (cid:21)(cid:43) ψ = (cid:42) ξ ·  J × δ B (cid:20) ∂ x ∂ϑ (cid:21) + (cid:18) ∇ × δ B (cid:104) ∂ x ∂ϑ (cid:105)(cid:19) × B µ + ∇ (cid:18) ∂ x ∂ϑ · ∇ p (cid:19)(cid:43) ψ , (Q.5)as ∂ x ∂ϕ · ∇ ψ = 0 and δ B (cid:2) ∂ x /∂ϑ (cid:3) = 0.Therefore, we see that in order to satisfy linear force balance, δF α must be chosen tosatisfy the condition (Q.2). However, this property can always be imparted on a bulk forcearising from the adjoint formulation. Consider the ﬁxed-boundary adjoint relation (5.36)without perturbations to the rotational transform, (cid:90) V P d x ( ξ · F − ξ · F ) − µ (cid:90) S P d x ˆ n · (cid:0) ξ δ B [ ξ ] · B − ξ δ B [ ξ ] · B (cid:1) = 0 . (Q.6)As δ B [ ξ ] does not depend on ξ αc , , we can choose to deﬁne the displacement vector such that ξ αc , = 0. This is analogous to our convention that ξ · B = 0, as δ B [ ξ ] does not depend onthe parallel component of ξ . Given this convention for the displacement vector, we can notethat (cid:104) δF α, (cid:105) ψ and (cid:104) δF α, (cid:105) ψ do not enter the above adjoint relation. Therefore, we are free tochoose our bulk force such that the desired constraint (Q.2) is satisﬁed.195 ppendix R: Near-axis expansion of screw pinch equilibria The MHD force-balance equation for a screw pinch is, ddr (cid:18) µ p ( r ) + 12 r (cid:0) ψ (cid:48) ( r ) (cid:1) (cid:19) + ι ( r ) ψ (cid:48) ( r ) R r ddr (cid:0) rι ( r ) ψ (cid:48) ( r ) (cid:1) = 0 . (R.1)We note that (R.1) remains unchanged under the transformation r → − r , so ψ ( r ) must beeven in r . 
Thus near the origin we can express the flux function as,

$$\psi(r) = \frac{\psi_2 r^2}{2!} + \frac{\psi_4 r^4}{4!} + \frac{\psi_6 r^6}{6!} + O(r^8), \tag{R.2}$$

under the assumption that $\psi(0) = 0$. We similarly express the rotational transform and pressure profiles in a power series near the axis,

$$\iota(\psi(r)) = \iota_0 + \iota_1\psi(r) + \iota_2\psi(r)^2 + \iota_3\psi(r)^3 + O(\psi(r)^4) \tag{R.3a}$$

$$p(\psi(r)) = p_0 + p_1\psi(r) + p_2\psi(r)^2 + p_3\psi(r)^3 + O(\psi(r)^4). \tag{R.3b}$$

The force-balance equation to $O(r)$ becomes,

$$\mu_0 p_1\psi_2 + \frac{2\iota_0^2\psi_2^2}{R^2} + \frac{\psi_2\psi_4}{3} = 0, \tag{R.4}$$

and to $O(r^3)$ it is,

$$\mu_0 p_2\psi_2^2 + \frac{3\iota_0\iota_1\psi_2^3}{R^2} + \frac{\mu_0 p_1\psi_4}{6} + \frac{\iota_0^2\psi_2\psi_4}{R^2} + \frac{\psi_4^2}{18} + \frac{\psi_2\psi_6}{30} = 0. \tag{R.5}$$

In order to determine the power series expansion of $\psi$, we match the solution near the axis with a numerical solution for $\psi(r)$ at some chosen boundary location near the axis, $r_b$. To perform an expansion to $O(r^2)$, $\psi_2$ is chosen such that

$$\psi_2 = \frac{2\psi(r_b)}{r_b^2}. \tag{R.6}$$

To perform an expansion to $O(r^4)$, (R.4) is used to express $\psi_4$ in terms of $\psi_2$, and $\psi_2$ is chosen such that $\psi_2 r_b^2/2! + \psi_4 r_b^4/4! = \psi(r_b)$,

$$\psi_2 = \frac{-\mu_0 p_1 r_b^4 - 8\psi(r_b)}{2 r_b^2\left(\dfrac{r_b^2\iota_0^2}{R^2} - 2\right)} \tag{R.7a}$$

$$\psi_4 = -3\left(\mu_0 p_1 + \frac{2\iota_0^2\psi_2}{R^2}\right). \tag{R.7b}$$

To perform an expansion to $O(r^6)$, (R.4) and (R.5) are used to express $\psi_4$ and $\psi_6$ in terms of $\psi_2$, and $\psi_2$ is chosen such that $\psi_2 r_b^2/2! + \psi_4 r_b^4/4! + \psi_6 r_b^6/6! = \psi(r_b)$. The resulting equation for $\psi_2$ is quadratic, but only one solution is allowed in practice to ensure that $(\psi_6 r_b^6/6!)/(\psi_2 r_b^2/2! + \psi_4 r_b^4/4!) \sim r_b^4$ in the limit that $r_b \ll 1$,

$$\psi_2 = -\frac{R^2}{12\iota_0\iota_1 r_b^6}\Bigg(-24 r_b^2 + \frac{12 r_b^4\iota_0^2}{R^2} + r_b^6\left(2\mu_0 p_2 - \frac{8\iota_0^4}{R^4}\right) + r_b^2\Bigg[\left(-24 + 2\mu_0 p_2 r_b^4 + \frac{12 r_b^2\iota_0^2}{R^2} - \frac{8 r_b^4\iota_0^4}{R^4}\right)^2 + \frac{48 r_b^2\iota_0\iota_1}{R^2}\left(\mu_0 p_1 r_b^4\left(-3 + \frac{2 r_b^2\iota_0^2}{R^2}\right) - 24\psi(r_b)\right)\Bigg]^{1/2}\Bigg) \tag{R.8a}$$

$$\psi_4 = -3\left(\mu_0 p_1 + \frac{2\iota_0^2}{R^2}\psi_2\right) \tag{R.8b}$$

$$\psi_6 = 15\left(\frac{4\mu_0 p_1\iota_0^2}{R^2} - 2\mu_0 p_2\psi_2 + \frac{8\iota_0^4\psi_2}{R^4} - \frac{6\iota_0\iota_1\psi_2^2}{R^2}\right). \tag{R.8c}$$

We compare the resulting solution for $\psi$ to a numerical solution of (R.1) using MATLAB's bvp4c routine. The solution is computed for $r \in [0, 1]$ with a boundary condition of $\psi(0) = 0$ and $\psi(1) = \psi_0$. The same profiles are used as described in Section 6.3.1. The axis expansion solution is matched with the numerical solution at $r_b = 10^{-\ldots}$. In Figure R.1 we present a comparison between the numerical solution and axis expansion of $\psi(r)$. As expected, the error in the axis expansion to $O(r^p)$ scales as $\sim|r - r_b|^{p+2}$ as one moves away from $r = r_b$.

Figure R.1: (a) The axis expansion solutions to $O(r^2)$, $O(r^4)$, and $O(r^6)$ are compared with the numerical solution of $\psi(r)$ near the axis. (b) The absolute error in the expansion is shown, $\left|\sum_n \psi_n r^n/n! - \psi(r)\right|$, where $\psi(r)$ is the numerical solution. As expected, the error in the axis expansion to $O(r^p)$ scales as $|r - r_b|^{p+2}$ near $r = r_b$.

Bibliography

[1] Princeton plasma physics laboratory - timeline. Date accessed: 01/03/2019.
[2] I. Abel, G. Plunk, E. Wang, M. Barnes, S. Cowley, W. Dorland, and A. Schekochihin. Multiscale gyrokinetics for rotating tokamak plasmas: fluctuations, transport and energy flows. Reports on Progress in Physics, 76(11):116201, 2013.
[3] A. Alexanderian, N. Petra, G. Stadler, and O. Ghattas. Mean-variance risk-averse optimal control of systems governed by PDEs with random parameter fields using quadratic approximations.


Applied Physics Letters , 17(2):92, 1970.[208] C. L. Smith and S. Cowley. The path to fusion power.

Philosophical Transactionsof the Royal Society A: Mathematical, Physical and Engineering Sciences , 368(1914):1091, 2010.[209] L. Spitzer Jr. A proposed stellarator. Technical report, Princeton University, NJForrestal Research Center, 1951.[210] L. Spitzer Jr. Magnetic ﬁelds and particle orbits in a high-density stellarator. Technicalreport, Princeton University, NJ Project Matterhorn, 1952.[211] L. Spitzer Jr. The stellarator concept.

The Physics of Fluids , 1(4):253, 1958.[212] D. A. Spong and J. H. Harris. New QP / QI Symmetric Stellarator Conﬁgurations.

Plasma and Fusion Research , 5:S2039, 2010.[213] D. A. Spong, S. P. Hirshman, J. C. Whitson, D. B. Batchelor, B. A. Carreras, V. E.Lynch, and J. A. Rome. J * optimization of small aspect ratio stellarator/tokamakhybrid devices. Physics of Plasmas , 5(5):1752, 1998.[214] D. A. Spong, S. P. Hirshman, L. A. Berry, J. F. Lyon, R. H. Fowler, D. J. Strickler,M. J. Cole, B. N. Nelson, D. E. Williamson, A. S. Ware, et al. Physics issues of compactdrift optimized stellarators.

Nuclear Fusion , 41(6):711, 2001.[215] T. H. Stix. Highlights in early stellarator research at princeton.

Journal of PlasmaFusion Research Series , 1:3, 1998.[216] D. J. Strickler, L. A. Berry, and S. P. Hirshman. Designing Coils for Compact Stel-larators.

Fusion Science and Technology , 41(2):107, 2002.[217] D. J. Strickler, L. A. Berry, and S. P. Hirshman. Integrated plasma and coil optimiza-tion for compact stellarators. Technical report, 2003.[218] D. J. Strickler, S. P. Hirshman, D. A. Spong, M. J. Cole, J. F. Lyon, B. E. Nelson,D. E. Williamson, and A. S. Ware. Development of a robust quasi-poloidal compactstellarator.

Fusion Science and Technology , 45(1):15, 2004.[219] E. Strumberger and S. G¨unter. CASTOR3D: Linear stability studies for 2D and 3Dtokamak equilibria.

Nuclear Fusion , 57(1):016032, 2016.[220] R. Strykowsky, T. Brown, J. Chrzanowski, M. Cole, P. Heitzenroeder, G. Neilson,D. Rej, and M. Viol. Engineering cost & schedule lessons learned on ncsx. In , pages 1–4. IEEE, 2009.214221] H. Sugama, T.-H. Watanabe, and M. Nunami. Linearized model collision operatorsfor multiple ion species plasmas and gyrokinetic entropy balance equations.

Physics ofPlasmas , 16(11):112503, 2009.[222] G. Sun and S. Wang. A review of the artiﬁcial neural network surrogate modeling inaerodynamic design.

Proceedings of the Institution of Mechanical Engineers, Part G:Journal of Aerospace Engineering , 233(16):5863–5872, 2019.[223] T. Sunn Pedersen, A. Dinklage, Y. Turkin, R. Wolf, S. Bozhenkov, J. Geiger,G. Fuchert, H.-S. Bosch, K. Rahbarnia, H. Thomsen, et al. Key results from theﬁrst plasma operation phase and outlook for future performance in Wendelstein 7-X.

Physics of Plasmas , 24(5):055503, 2017.[224] K. Svanberg. A class of globally convergent optimization methods based on conser-vative convex separable approximations.

SIAM Journal on Optimization , 12(2):555,2002.[225] A. N. Tikhonov. On the solution of ill-posed problems and the method of regularization.In

Doklady Akademii Nauk , volume 151, pages 501–504. Russian Academy of Sciences,1963.[226] L. N. Trefethen and D. Bau III.

Numerical Linear Algebra . Society for Industrial andApplied Mathematics, 1997.[227] V. Tribaldos and J. Guasp. Neoclassical global ﬂux simulations in stellarators.

Plasmaphysics and controlled fusion , 47(3):545, 2005.[228] R. Turner. Gradient coil design : A review of methods.

Magnetic Resonance Imaging ,11:903, 1993.[229] J. G. Van Bladel.

Electromagnetic Fields , volume 19. John Wiley & Sons, 2007.[230] W. I. van Rij and S. P. Hirshman. Variational bounds for transport coeﬃcients inthree-dimensional toroidal plasmas.

Physics of Fluids B: Plasma Physics , 1(3):563,1989.[231] D. Venditti and D. Darmofal. A multilevel error estimation and grid adaptive strategyfor improving the accuracy of integral outputs. In , page 3292, 1999.[232] F. Wagner. Stellarators and optimised stellarators.

Fusion Technology , 33(2T):67,1998.[233] F. Wagner, S. B¨aumel, J. Baldzuhn, N. Basse, R. Brakel, R. Burhenn, A. Dinklage,D. Dorst, H. Ehmler, M. Endler, et al. W7-AS: One step of the Wendelstein stellaratorline.

Physics of Plasmas , 12(7):072509, 2005.215234] A. Weller, S. Sakakibara, K. Watanabe, K. Toi, J. Geiger, M. Zarnstorﬀ, S. Hudson,A. Reiman, A. Werner, C. N¨uhrenberg, et al. Signiﬁcance of MHD eﬀects in stellaratorconﬁnement.

Fusion Science and Technology , 50(2):158, 2006.[235] J. Wesson and D. J. Campbell.

Tokamaks , volume 149. Oxford University Press, 2011.[236] D. Williamson, A. Brooks, T. Brown, J. Chrzanowski, M. Cole, H.-M. Fan, K. Freuden-berg, P. Fogarty, T. Hargrove, P. Heitzenroeder, G. Lovett, P. Miller, R. Myatt, B. Nel-son, W. Reiersen, and D. Strickler. Modular coil design developments for the NationalCompact Stellarator Experiment (NCSX).

Fusion Engineering and Design , 75-79:71,2005.[237] R. Wolf, A. Alonso, S. ¨Ak¨aslompolo, J. Baldzuhn, M. Beurskens, C. Beidler, C. Bie-dermann, H.-S. Bosch, S. Bozhenkov, R. Brakel, et al. Performance of Wendelstein7-X stellarator plasmas during the ﬁrst divertor operation phase.

Physics of Plasmas ,26(8):082504, 2019.[238] X. Wu, C. Wang, and T. Kozlowski. Kriging-based surrogate models for uncertaintyquantiﬁcation and sensitivity analysis. In

Proceedings of the MC-2017, InternationalConference on Mathematics Computational Methods Applied to Nuclear Science Engi-neering , 2017.[239] P. Xanthopoulos, H. Mynick, P. Helander, Y. Turkin, G. Plunk, F. Jenko, T. G¨orler,D. Told, T. Bird, and J. Proll. Controlling turbulence in present and future stellarators.

Physical Review Letters , 113(15):155001, 2014.[240] K. Yamazaki, N. Yanagi, H. Ji, H. Kaneko, N. Ohyabu, T. Satow, S. Morimoto, J. Ya-mamoto, O. Motojima, and the LHD Design Group. Requirements for accuracy ofsuperconducting coils in the Large Helical Device.

Fusion Engineering and Design , 20:79–86, 1993.[241] S. Yoshikawa and T. Stix. Experiments on the Model C stellarator.

Nuclear Fusion ,25(9):1275, 1985.[242] M. Zarnstorﬀ, L. Berry, A. Brooks, E. Fredrickson, G. Fu, S. Hirshman, S. Hudson,L. Ku, E. Lazarus, D. Mikkelsen, et al. Physics of the compact advanced stellaratorNCSX.

Plasma Physics and Controlled Fusion , 43(12A):A237, 2001.[243] C. Zhu, S. R. Hudson, Y. Song, and Y. Wan. New method to design stellarator coilswithout the winding surface.

Nuclear Fusion , 58:016008, 2018.[244] C. Zhu, S. R. Hudson, Y. Song, and Y. Wan. Designing stellarator coils by a modiﬁedNewton method using FOCUS.

Plasma Physics and Controlled Fusion , 60(6):065008,2018. 216245] C. Zhu, D. A. Gates, S. R. Hudson, H. Liu, Y. Xu, A. Shimizu, and S. Okamura.Identiﬁcation of important error ﬁelds in stellarators using the Hessian matrix method.

Nuclear Fusion , 59(12):126007, 2019.[246] C. Zhu, M. Zarnstorﬀ, D. Gates, and A. Brooks. Designing stellarators using perpen-dicular permanent magnets. arXiv preprint arXiv:1912.05144arXiv preprint arXiv:1912.05144