Parallel finite volume simulation of the spherical shell dynamo with pseudo-vacuum magnetic boundary conditions
Liang Yin a,b, Chao Yang c,∗, Shi-Zhuang Ma b, Ying Cai d, Keke Zhang a

a State Key Laboratory of Lunar and Planetary Sciences, Macau University of Science and Technology, Macau, China
b School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049, China
c School of Mathematical Sciences, Peking University, Beijing 100871, China
d Institute of Applied Physics and Computational Mathematics, Beijing 100094, China
Abstract
In this paper, we study the parallel simulation of the magnetohydrodynamic (MHD) dynamo in a rapidly rotating spherical shell with pseudo-vacuum magnetic boundary conditions. A second-order finite volume scheme based on a collocated quasi-uniform cubed-sphere grid is applied to the spatial discretization of the MHD dynamo equations. To ensure the solenoidal condition of the magnetic field, we adopt a widely-used approach whereby a pseudo-pressure is introduced into the induction equation. The temporal integration is split by a second-order approximate factorization approach, resulting in two linear algebraic systems, both solved by a preconditioned Krylov subspace iterative method. A multi-level restricted additive Schwarz preconditioner based on domain decomposition and multigrid method is then designed to improve the efficiency and scalability. Accurate numerical solutions of two benchmark cases are obtained with our code, comparable to the existing local method results. Several large-scale tests performed on the Sunway TaihuLight supercomputer show good strong and weak scalabilities and a noticeable improvement from the multi-level preconditioner with up to 10368 processor cores.

∗ Corresponding author

Email addresses: [email protected] (Liang Yin), [email protected] (Chao Yang), [email protected] (Shi-Zhuang Ma), [email protected] (Ying Cai), [email protected] (Keke Zhang)

Keywords: Spherical shell dynamo, Pseudo-vacuum condition, Parallel simulation, Finite volume method, Cubed-sphere grid, Multilevel method
1. Introduction
The magnetic field of the Earth, as well as that of many other planets, is widely thought to be generated by the convection of the electrically conducting fluid in the outer core, which creates the so-called self-excited dynamo action (Zhang and Schubert, 2000; Christensen and Wicht, 2007; Wicht and Tilgner, 2010; Jones, 2011; Roberts and King, 2013; Moffatt and Dormy, 2019; Deguen and Lasbleis, 2020). For a variety of reasons (Aurnou et al., 2015; Vantieghem et al., 2016), it is a challenging task to fully understand the dynamics in planetary fluid cores. Starting with the pioneering work by Glatzmaier and Roberts (1995), Kageyama and Sato (1995) and Kuang and Bloxham (1997), significant progress has been made in understanding the origin and evolution of the Earth's magnetic field by means of numerical dynamo simulations (Sheyko et al., 2016; Christensen, 2018; Petitdemange, 2018; Aubert, 2019). Although a number of numerical codes for dynamo modelling have been developed independently by various groups (Matsui et al., 2016), there is still a long way to go before dynamo simulations with physically realistic parameters can be achieved, due mainly to the difficulties in extreme-scale spatial resolutions (Glatzmaier, 2002; Jones, 2011; Roberts and King, 2013; Aurnou et al., 2015). Tremendous amounts of computing resources are required for such extreme-resolution dynamo simulations, which can only be made possible with the aid of massively parallel supercomputers (Harder and Hansen, 2005; Chan et al., 2006). Innovations in the numerical algorithms and their applications on massively parallel supercomputers are likely to help extend the parameter regime in dynamo simulations towards more realistic values relevant to the planetary cores (Matsui and Okuda, 2002, 2004a; Harder and Hansen, 2005; Chan et al., 2006, 2007; Matsui et al., 2016).

As reported in Matsui et al. (2016), a majority of the existing widely-used numerical dynamo models employ global-nature spectral methods, which are based on the poloidal-toroidal decomposition and spherical harmonic expansions. The solenoidal condition of the velocity and magnetic field and the insulating boundary condition for the magnetic field can be easily dealt with in such methods. However, a significant number of global communications are usually required for the computation of nonlinear terms, which could make spectral methods less suitable for large-scale parallel computations (Harder and Hansen, 2005; Chan et al., 2006, 2007; Wicht et al., 2009). Besides, spectral methods are often hard to adapt to more complicated domains without a spherical symmetry (Iskakov et al., 2004; Vantieghem et al., 2016). In contrast, local discretization approaches, such as finite volume and finite element methods, show more potential for parallel computations, are more flexible for complicated domains, and are thus attracting increasing interest in dynamo simulations (e.g. Kageyama and Sato, 1995, 1997; Hejda and Reshetnyak, 2003, 2004; Kageyama and Yoshida, 2005; Harder and Hansen, 2005; Matsui and Okuda, 2002, 2004a,b, 2005; Chan et al., 2001a,b, 2006, 2007; Vantieghem et al., 2016; Yin et al., 2017, 2019). However, the applications of the local methods in dynamo simulations still face several difficulties, such as: (i) the solenoidal conditions of the velocity and magnetic field, (ii) the insulating boundary condition for the magnetic field, and (iii) parallel scalability.
To cope with the solenoidal constraint, most of the dynamo simulations adopt a projection-based method introduced by Tóth (2000), in which a pseudo-pressure gradient is added into the induction equation and the pseudo-pressure is interpreted as effecting a projection of the provisional magnetic field onto the solenoidal space, just like the pressure in the momentum equation. This method has been applied successfully to a large number of dynamo models (e.g. Chan et al., 2001a; Harder and Hansen, 2005; Chan et al., 2007; Vantieghem et al., 2016).

The exterior space outside the Earth's core is generally thought of as an electrically insulating medium, resulting in a non-local nature of the magnetic boundary condition, since the solution of a Laplace equation for the magnetic scalar potential in the infinite exterior domain is theoretically required. This insulating boundary condition brings substantial difficulties to dynamo simulations with local methods. A straightforward approximation is to replace the infinite exterior extent with a finite domain, since the magnetic scalar potential declines as $O(r^{-2})$ in the insulating exterior and can be approximated as zero at locations far enough from the fluid domain of interest. Based on this approximation, several dynamo models with local methods treat the insulating boundary condition via an extra numerical cost in a finite exterior domain. Examples include Chan et al. (2001a) and Chan et al. (2007), in which a weak conductivity approximation is introduced, and Matsui and Okuda (2005), where a formulation of the magnetic vector potential is used. Another approach is to introduce a mathematically equivalent boundary integral formulation proposed by Iskakov et al. (2004), in which the 3-D Laplace equation for the magnetic potential is recast as a 2-D integral equation on the boundary surface next to the insulating medium. However, such an approach usually leads to a higher computational cost due to the dense coefficient matrix and the global communication between all processors handling the boundary points. On the other hand, pseudo-vacuum boundary conditions, which prescribe the tangential component of the magnetic field on the boundary to be zero, have become a popular alternative since first adopted by Kageyama and Sato (1995) in their finite difference code. The implementation of such conditions in local methods is straightforward and no extra numerical cost or global communication is required. Though these conditions may result in numerical solutions quite different from those with the insulating condition (Harder and Hansen, 2005; Jackson et al., 2014), they are quite suitable for benchmark studies, as was done in Jackson et al. (2014) and Vantieghem et al. (2016), thus providing a convenient way to validate dynamo codes with local methods.

The parallel scalability, which is the key limiting factor for massive-scale simulations, is frequently investigated in dynamo simulations to different degrees (e.g. Harder and Hansen, 2005; Chan et al., 2006, 2007; Vantieghem et al., 2016). In particular, parallel performance benchmarks from 15 widely-used parallel dynamo models are thoroughly reported by Matsui et al. (2016), but only two codes based on local methods are included therein. While local methods show great potential for massive-scale dynamo simulations, the study of improving the parallel scalability of local codes is still of critical importance.
The temporal integration in the projection-based local models generally involves a fractional step algorithm consisting of a prediction procedure and a correction step. In practice, most of the computing time in the temporal integration is spent on the numerical solution of the pressure Poisson equation in the correction step (Harder and Hansen, 2005; Vantieghem et al., 2016; Yin et al., 2017). To improve the parallel performance of this type of code, the design of a scalable solver for the prediction equation, and especially for the pressure Poisson equation, is highly desirable.

In this work, we present a parallel finite volume solution for the convection-driven magnetohydrodynamic (MHD) dynamo in a rapidly rotating spherical shell with pseudo-vacuum magnetic boundary conditions. As a continuation of our previous work (Yin et al., 2017) on the non-magnetic convection problem, this paper inherits the approximate factorization method in temporal integration and the finite volume scheme on a collocated quasi-uniform cubed-sphere grid, and focuses on the algorithms and implementations related to the magnetic field. In addition, efforts have also been made in the design of a multi-level restricted additive Schwarz preconditioner based on domain decomposition and the multigrid method to improve the efficiency and scalability. It is worth mentioning that the adopted cubed-sphere grid can be easily extended to ellipsoidal shell domains and, in theory, other geometries which can be expressed by a similar projection relationship.

The remainder of the paper is organized as follows. We first present the governing equations for the MHD dynamo problem and the boundary conditions in Section 2. Then in Section 3, we introduce the numerical methods including the temporal integration, spatial discretization and parallel multi-level solver. The numerical results on validation and parallel performance are reported in Section 4. We conclude the paper in Section 5.

2. Mathematical model

In this work we focus our discussion on solving the convection-driven MHD dynamo problem in a rapidly rotating spherical shell with pseudo-vacuum magnetic boundary conditions. The spherical shell, with outer radius $r_o$ and inner radius $r_i$, is filled with an electrically conducting viscous incompressible fluid and rotates with a constant angular velocity $\boldsymbol\Omega = \Omega\hat{\boldsymbol z}$, where $\hat{\boldsymbol z}$ is a unit vector parallel to the axis of rotation. The incompressible fluid in the spherical shell is assumed to satisfy the Boussinesq approximation, with density $\rho$, kinematic viscosity $\nu$, thermal diffusivity $\kappa$, thermal expansion coefficient $\alpha$, magnetic diffusivity $\eta$ and magnetic permeability $\mu$. The temperatures on the inner and outer boundaries are fixed to be $T_i$ and $T_o$, respectively, and the temperature difference is denoted by $\Delta T = T_i - T_o$. Choosing the shell thickness $D = r_o - r_i$ as the fundamental length scale, $D^2/\nu$ as the time scale, $\nu/D$ as the velocity scale, $\Delta T$ as the temperature scale, $\sqrt{\rho\mu\eta\Omega}$ as the magnetic field scale and $\rho\nu\Omega$ as the pressure scale, we obtain the non-dimensional governing equations of the MHD dynamo problem

$$E\left(\frac{\partial\boldsymbol u}{\partial t} + \boldsymbol u\cdot\nabla\boldsymbol u - \nabla^2\boldsymbol u\right) + 2\hat{\boldsymbol z}\times\boldsymbol u + \nabla P = \frac{Ra}{r_o}(T - T_0)\,\boldsymbol r + \frac{1}{Pm}\,\boldsymbol B\cdot\nabla\boldsymbol B, \qquad (1)$$
$$\nabla\cdot\boldsymbol u = 0, \qquad (2)$$
$$\frac{\partial T}{\partial t} + \boldsymbol u\cdot\nabla T = \frac{1}{Pr}\nabla^2 T, \qquad (3)$$
$$\frac{\partial\boldsymbol B}{\partial t} = \nabla\times(\boldsymbol u\times\boldsymbol B) + \frac{1}{Pm}\nabla^2\boldsymbol B, \qquad (4)$$
$$\nabla\cdot\boldsymbol B = 0, \qquad (5)$$

where $\boldsymbol u$, $P$, $\boldsymbol B$, $T$, $T_0$ and $\boldsymbol r$ are the non-dimensional velocity, reduced pressure, magnetic field, temperature, reference temperature and spatial position vector, respectively.
The reference temperature $T_0$ can be expressed by
$$T_0(r) = r_i\left(\frac{r_o}{r} - 1\right), \qquad (6)$$
where the dimensionless radii are set to be $r_i = 7/13$, $r_o = 20/13$. The non-dimensional parameters $E$, $Ra$, $Pr$, $Pm$ in the above equations are the Ekman number, modified Rayleigh number, Prandtl number and magnetic Prandtl number, respectively, defined by
$$E = \frac{\nu}{\Omega D^2}, \qquad Ra = \frac{\alpha g_o \Delta T D}{\nu\Omega}, \qquad Pr = \frac{\nu}{\kappa}, \qquad Pm = \frac{\nu}{\eta}, \qquad (7)$$
where $g_o$ is the gravitational acceleration at the outer radius.

An equivalent form of the magnetic induction equation (4) is
$$\frac{\partial\boldsymbol B}{\partial t} = \nabla\times(\boldsymbol u\times\boldsymbol B) - \frac{1}{Pm}\nabla\times\nabla\times\boldsymbol B. \qquad (8)$$
Applying the divergence operator to the above equation, we obtain
$$\frac{\partial(\nabla\cdot\boldsymbol B)}{\partial t} = 0. \qquad (9)$$
This equation indicates that the magnetic field will keep the divergence-free constraint (5) for all time in the evolution, provided that the initial magnetic field is solenoidal. In numerical simulations, however, the divergence-free constraint is difficult to maintain. To overcome this difficulty, we adopt the technique of introducing a pseudo-pressure gradient into the magnetic induction equation (Tóth, 2000; Harder and Hansen, 2005; Chan et al., 2007; Vantieghem et al., 2016) and replace the induction equation (4) with the following equation
$$\frac{\partial\boldsymbol B}{\partial t} + \frac{1}{E}\nabla P_b = \nabla\times(\boldsymbol u\times\boldsymbol B) + \frac{1}{Pm}\nabla^2\boldsymbol B, \qquad (10)$$
where $P_b$ is the pseudo-pressure. Therefore, a projection method similar to the well-known treatment (Chorin, 1968; Guermond et al., 2006) of velocity fields can be applied to the magnetic field to ensure the divergence-free constraint.

Replacing the temperature $T$ with an auxiliary temperature variable $\Theta = T - T_0$, we can rewrite the non-dimensional governing equations as
$$\frac{\partial\boldsymbol u}{\partial t} - \nabla^2\boldsymbol u + \frac{2}{E}\hat{\boldsymbol z}\times\boldsymbol u - \frac{Ra}{E r_o}\Theta\,\boldsymbol r + \frac{1}{E}\nabla P = -\boldsymbol u\cdot\nabla\boldsymbol u + \frac{1}{E\,Pm}\,\boldsymbol B\cdot\nabla\boldsymbol B, \qquad (11)$$
$$\frac{\partial\Theta}{\partial t} - \frac{1}{Pr}\nabla^2\Theta - \frac{r_i r_o}{r^3}\,\boldsymbol u\cdot\boldsymbol r = -\boldsymbol u\cdot\nabla\Theta, \qquad (12)$$
$$\frac{\partial\boldsymbol B}{\partial t} - \frac{1}{Pm}\nabla^2\boldsymbol B + \frac{1}{E}\nabla P_b = (\boldsymbol B\cdot\nabla)\boldsymbol u - (\boldsymbol u\cdot\nabla)\boldsymbol B, \qquad (13)$$
$$\nabla\cdot\boldsymbol u = 0, \qquad (14)$$
$$\nabla\cdot\boldsymbol B = 0. \qquad (15)$$

To solve the above system, it is necessary to apply proper boundary conditions to the velocity, temperature and magnetic field. On the shell boundaries, we employ the no-slip condition for the velocity and the isothermal condition for the temperature,
$$\boldsymbol u = 0, \qquad \Theta = 0, \qquad r = r_i, r_o. \qquad (16)$$
For the magnetic field, we adopt the pseudo-vacuum boundary condition
$$\boldsymbol r\times\boldsymbol B = 0, \qquad r = r_i, r_o, \qquad (17)$$
on the shell boundaries, which indicates that the tangential component of $\boldsymbol B$ is zero and only the normal component exists (Kageyama and Sato, 1995; Jackson et al., 2014). The value of the normal component on the boundaries can be constrained by the solenoidal condition $\nabla\cdot\boldsymbol B = 0$.
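To make the scalings in equation (7) concrete, the short sketch below evaluates the four non-dimensional parameters from a set of dimensional properties; it is a minimal illustration of the definitions only, and the function name and any input values one might pass are illustrative, not quantities used in this paper.

```python
def dynamo_parameters(nu, kappa, alpha, eta, Omega, D, g_o, dT):
    """Non-dimensional groups of eq. (7) from dimensional properties."""
    E = nu / (Omega * D ** 2)                  # Ekman number
    Ra = alpha * g_o * dT * D / (nu * Omega)   # modified Rayleigh number
    Pr = nu / kappa                            # Prandtl number
    Pm = nu / eta                              # magnetic Prandtl number
    return E, Ra, Pr, Pm
```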
3. Numerical methods
In this section, we present the proposed numerical methods to discretize and solve the MHD dynamo equations (11)–(15) with the boundary conditions (16)–(17). Since some of the algorithms have already been introduced in a previous work that does not involve the magnetic field (Yin et al., 2017), we will focus on the treatment of the issues related to the magnetic field.
3.1. Temporal integration

A second-order approximate factorization method (Dukowicz and Dvinsky, 1992) was applied successfully to deal with the temporal integration and ensure the solenoidal condition of the velocity in the non-magnetic convection problem (Chan et al., 2006; Yin et al., 2017). In this section, we inherit the approximate factorization method and extend it to the temporal integration of the magnetic field. The dynamo governing equations (11)–(15) can be rewritten in the following operator form
$$\frac{\partial\boldsymbol u}{\partial t} - L_1(\boldsymbol u) - R_1(\Theta) + G_1(P) = \boldsymbol f_1, \qquad (18)$$
$$\frac{\partial\Theta}{\partial t} - L_2(\Theta) - R_2(\boldsymbol u) = f_2, \qquad (19)$$
$$\frac{\partial\boldsymbol B}{\partial t} - L_3(\boldsymbol B) + G_3(P_b) = \boldsymbol f_3, \qquad (20)$$
$$D(\boldsymbol u) = 0, \qquad (21)$$
$$D(\boldsymbol B) = 0, \qquad (22)$$
where $L_1$, $R_1$, $G_1$, $L_2$, $R_2$, $L_3$, $G_3$ and $D$ are linear operators defined by
$$L_1(\boldsymbol u) = \nabla^2\boldsymbol u - \frac{2}{E}(\hat{\boldsymbol z}\times\boldsymbol u), \quad R_1(\Theta) = \frac{Ra}{E r_o}(\Theta\,\boldsymbol r), \quad G_1(P) = \frac{1}{E}\nabla P,$$
$$L_2(\Theta) = \frac{1}{Pr}\nabla^2\Theta, \quad R_2(\boldsymbol u) = \frac{r_i r_o}{r^3}\,\boldsymbol u\cdot\boldsymbol r, \quad L_3(\boldsymbol B) = \frac{1}{Pm}\nabla^2\boldsymbol B,$$
$$G_3(P_b) = \frac{1}{E}\nabla P_b, \quad D(\boldsymbol u) = \nabla\cdot\boldsymbol u, \quad D(\boldsymbol B) = \nabla\cdot\boldsymbol B, \qquad (23)$$
and the right-hand sides are the nonlinear terms
$$\boldsymbol f_1 = -\boldsymbol u\cdot\nabla\boldsymbol u + \frac{1}{E\,Pm}\,\boldsymbol B\cdot\nabla\boldsymbol B, \quad f_2 = -\boldsymbol u\cdot\nabla\Theta, \quad \boldsymbol f_3 = \boldsymbol B\cdot\nabla\boldsymbol u - \boldsymbol u\cdot\nabla\boldsymbol B. \qquad (24)$$

Applying the Crank-Nicolson scheme to the linear operators and discretizing all terms spatially, the governing equations (18)–(22) can be fully discretized as
$$\frac{\boldsymbol u^{n+1} - \boldsymbol u^n}{\Delta t} - L_1\!\left(\frac{\boldsymbol u^{n+1}+\boldsymbol u^n}{2}\right) - R_1\!\left(\frac{\Theta^{n+1}+\Theta^n}{2}\right) + G_1\!\left(\frac{P^{n+1}+P^n}{2}\right) = \hat{\boldsymbol f}_1^{\,n+\frac12} + O(\Delta t^2), \qquad (25)$$
$$\frac{\Theta^{n+1} - \Theta^n}{\Delta t} - L_2\!\left(\frac{\Theta^{n+1}+\Theta^n}{2}\right) - R_2\!\left(\frac{\boldsymbol u^{n+1}+\boldsymbol u^n}{2}\right) = \hat f_2^{\,n+\frac12} + O(\Delta t^2), \qquad (26)$$
$$\frac{\boldsymbol B^{n+1} - \boldsymbol B^n}{\Delta t} - L_3\!\left(\frac{\boldsymbol B^{n+1}+\boldsymbol B^n}{2}\right) + G_3\!\left(\frac{P_b^{n+1}+P_b^n}{2}\right) = \hat{\boldsymbol f}_3^{\,n+\frac12} + O(\Delta t^2), \qquad (27)$$
$$D(\boldsymbol u^{n+1}) = D(\boldsymbol B^{n+1}) = 0, \qquad (28)$$
where $\Delta t$ is the time step size and $n$ denotes the time step number. Here $L_1$, $R_1$, $G_1$, $L_2$, $R_2$, $L_3$, $G_3$ and $D$ denote the discrete linear operators corresponding to the linear terms in equation (23), and $\hat{\boldsymbol f}_1$, $\hat f_2$ and $\hat{\boldsymbol f}_3$ are the discrete nonlinear terms corresponding to equation (24). The nonlinear terms are calculated by the second-order Adams-Bashforth formula
$$\hat{\boldsymbol f}^{\,n+\frac12} = \frac{3}{2}\boldsymbol f^n - \frac{1}{2}\boldsymbol f^{n-1} + O(\Delta t^2), \qquad (29)$$
except at the first time step, where the first-order approximation $\hat{\boldsymbol f}^{\,\frac12} = \boldsymbol f^0$ is used. The spatial discretization schemes of the linear and nonlinear terms in the above equations will be described in Section 3.2.
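The following minimal sketch illustrates the time discretization of equations (25)–(29) on a scalar model problem $\mathrm du/\mathrm dt = \lambda u + f(u)$, treating the linear part with Crank-Nicolson and the nonlinear part with second-order Adams-Bashforth; the model problem and all constants are illustrative and not part of the paper's solver.

```python
import numpy as np

# Model problem du/dt = lam*u + f(u): lam*u is treated implicitly
# (Crank-Nicolson), f(u) explicitly (second-order Adams-Bashforth).
lam = -2.0
f = lambda u: 0.1 * np.sin(u)   # stand-in for the nonlinear terms (24)

dt, nsteps = 1e-2, 1000
u = 1.0
f_old = f(u)                    # first step uses f_hat = f^0, cf. text
for n in range(nsteps):
    f_new = f(u)
    f_hat = 1.5 * f_new - 0.5 * f_old if n > 0 else f_new   # eq. (29)
    # Crank-Nicolson for the linear term, cf. eqs. (25)-(27):
    # (u^{n+1}-u^n)/dt - lam*(u^{n+1}+u^n)/2 = f_hat
    u = ((1 + 0.5 * dt * lam) * u + dt * f_hat) / (1 - 0.5 * dt * lam)
    f_old = f_new
print(u)
```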
Moving the unknown terms at time step $n+1$ to the left-hand sides and the others to the right, equations (25)–(27) can be transformed into the following form
$$\tilde{\boldsymbol u}^{n+1} - \frac{\Delta t}{2}L_1(\tilde{\boldsymbol u}^{n+1}) - \frac{\Delta t}{2}R_1(\Theta^{n+1}) = \hat{\boldsymbol u}^n + \frac{\Delta t}{2}L_1(\hat{\boldsymbol u}^n) + \frac{\Delta t}{2}R_1(\Theta^n) + \Delta t\,\hat{\boldsymbol f}_1^{\,n+\frac12} - \frac{\Delta t^2}{4}L_1 G_1(P^{n+1} - P^n) + O(\Delta t^3), \qquad (30)$$
$$\Theta^{n+1} - \frac{\Delta t}{2}L_2(\Theta^{n+1}) - \frac{\Delta t}{2}R_2(\tilde{\boldsymbol u}^{n+1}) = \Theta^n + \frac{\Delta t}{2}L_2(\Theta^n) + \frac{\Delta t}{2}R_2(\hat{\boldsymbol u}^n) + \Delta t\,\hat f_2^{\,n+\frac12} - \frac{\Delta t^2}{4}R_2 G_1(P^{n+1} - P^n) + O(\Delta t^3), \qquad (31)$$
$$\tilde{\boldsymbol B}^{n+1} - \frac{\Delta t}{2}L_3(\tilde{\boldsymbol B}^{n+1}) = \hat{\boldsymbol B}^n + \frac{\Delta t}{2}L_3(\hat{\boldsymbol B}^n) + \Delta t\,\hat{\boldsymbol f}_3^{\,n+\frac12} - \frac{\Delta t^2}{4}L_3 G_3(P_b^{n+1} - P_b^n) + O(\Delta t^3), \qquad (32)$$
where $\tilde{\boldsymbol u}^{n+1}$, $\hat{\boldsymbol u}^n$, $\tilde{\boldsymbol B}^{n+1}$ and $\hat{\boldsymbol B}^n$ are intermediate velocities and magnetic fields defined by
$$\tilde{\boldsymbol u}^{n+1} = \boldsymbol u^{n+1} + \frac{\Delta t}{2}G_1(P^{n+1}), \qquad \tilde{\boldsymbol B}^{n+1} = \boldsymbol B^{n+1} + \frac{\Delta t}{2}G_3(P_b^{n+1}), \qquad (33)$$
$$\hat{\boldsymbol u}^n = \boldsymbol u^n - \frac{\Delta t}{2}G_1(P^n), \qquad \hat{\boldsymbol B}^n = \boldsymbol B^n - \frac{\Delta t}{2}G_3(P_b^n). \qquad (34)$$
By expanding $P^{n+1}$ in a Taylor series about $P^n$, it can be observed that the pressure terms $-\frac{\Delta t^2}{4}L_1 G_1(P^{n+1}-P^n)$, $-\frac{\Delta t^2}{4}R_2 G_1(P^{n+1}-P^n)$ and $-\frac{\Delta t^2}{4}L_3 G_3(P_b^{n+1}-P_b^n)$ are all of $O(\Delta t^3)$.

Discarding these terms as well as the temporal truncation errors, we obtain the following fully discretized time stepping equations for the intermediate velocity $\tilde{\boldsymbol u}^{n+1}$, temperature $\Theta^{n+1}$ and intermediate magnetic field $\tilde{\boldsymbol B}^{n+1}$:
$$\tilde{\boldsymbol u}^{n+1} - \frac{\Delta t}{2}L_1(\tilde{\boldsymbol u}^{n+1}) - \frac{\Delta t}{2}R_1(\Theta^{n+1}) = \hat{\boldsymbol u}^n + \frac{\Delta t}{2}L_1(\hat{\boldsymbol u}^n) + \frac{\Delta t}{2}R_1(\Theta^n) + \Delta t\,\hat{\boldsymbol f}_1^{\,n+\frac12}, \qquad (35)$$
$$\Theta^{n+1} - \frac{\Delta t}{2}L_2(\Theta^{n+1}) - \frac{\Delta t}{2}R_2(\tilde{\boldsymbol u}^{n+1}) = \Theta^n + \frac{\Delta t}{2}L_2(\Theta^n) + \frac{\Delta t}{2}R_2(\hat{\boldsymbol u}^n) + \Delta t\,\hat f_2^{\,n+\frac12}, \qquad (36)$$
$$\tilde{\boldsymbol B}^{n+1} - \frac{\Delta t}{2}L_3(\tilde{\boldsymbol B}^{n+1}) = \hat{\boldsymbol B}^n + \frac{\Delta t}{2}L_3(\hat{\boldsymbol B}^n) + \Delta t\,\hat{\boldsymbol f}_3^{\,n+\frac12}. \qquad (37)$$
Applying the divergence operator to equation (33) and subtracting equation (28), we obtain two Poisson equations for the pressure $P^{n+1}$ and the pseudo-pressure $P_b^{n+1}$, respectively:
$$\frac{\Delta t}{2}DG_1(P^{n+1}) = D(\tilde{\boldsymbol u}^{n+1}), \qquad (38)$$
$$\frac{\Delta t}{2}DG_3(P_b^{n+1}) = D(\tilde{\boldsymbol B}^{n+1}). \qquad (39)$$
Boundary conditions for the pressure $P$ and pseudo-pressure $P_b$ are required to solve the above equations. The Neumann boundary condition $\boldsymbol n\cdot\nabla P = 0$ is applied to the pressure $P$, where $\boldsymbol n$ denotes the outward unit normal vector; a detailed discussion can be found in Yin et al. (2017). The pseudo-vacuum condition (17) is applied to both $\boldsymbol B^{n+1}$ and $\tilde{\boldsymbol B}^{n+1}$, and according to equation (33) we obtain $\boldsymbol n\times\nabla P_b = 0$, which indicates a Dirichlet boundary condition for $P_b$ on the boundaries.

As a result, two linear algebraic systems are obtained: the equations for the velocity, temperature and magnetic field (VTBE) (35)–(37), and the equations for the pressure and pseudo-pressure (PPBE) (38)–(39). Based on these two linear systems, a predictor-corrector procedure is adopted to obtain the required numerical solutions. The outline of the resulting semi-implicit time stepping scheme can be summarized as follows (see also the sketch after this list):

Step 1: According to the previous values $\boldsymbol u^n$, $\Theta^n$, $\boldsymbol B^n$, $P^n$ and $P_b^n$, calculate $\hat{\boldsymbol u}^n$ and $\hat{\boldsymbol B}^n$, and then solve VTBE to obtain the current solutions $\tilde{\boldsymbol u}^{n+1}$, $\Theta^{n+1}$ and $\tilde{\boldsymbol B}^{n+1}$.

Step 2: Solve PPBE to obtain $P^{n+1}$ and $P_b^{n+1}$ based on $\tilde{\boldsymbol u}^{n+1}$ and $\tilde{\boldsymbol B}^{n+1}$.

Step 3: Update the current solutions $\boldsymbol u^{n+1}$ and $\boldsymbol B^{n+1}$ according to equation (33).

In the above time stepping scheme, the intermediate variables $\hat{\boldsymbol u}^n$ and $\hat{\boldsymbol B}^n$ are calculated from equations (34), except that the initial values $\hat{\boldsymbol u}^0$ and $\hat{\boldsymbol B}^0$ are given by the first-order approximations $\hat{\boldsymbol u}^0 = \boldsymbol u^0$, $\hat{\boldsymbol B}^0 = \boldsymbol B^0$.
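A compact sketch of one predictor-corrector step (Steps 1–3) is given below; the operator container `ops` and its solver calls are placeholders standing in for the assembled VTBE/PPBE systems, not the paper's actual implementation.

```python
def advance_one_step(state, ops, dt):
    """One semi-implicit time step of the scheme (35)-(39), schematically.

    `state` holds (u, Theta, B, P, Pb); `ops` provides apply/solve routines
    for the discrete operators; both are assumed placeholder objects.
    """
    u, Theta, B, P, Pb = state

    # Intermediate fields, eq. (34)
    u_hat = u - 0.5 * dt * ops.G1(P)
    B_hat = B - 0.5 * dt * ops.G3(Pb)

    # Step 1: predictor -- solve the coupled VTBE system (35)-(37)
    u_tld, Theta_new, B_tld = ops.solve_VTBE(u_hat, Theta, B_hat)

    # Step 2: corrector -- pressure/pseudo-pressure Poisson solves (38)-(39)
    P_new = ops.solve_poisson_P(ops.D(u_tld) / (0.5 * dt))
    Pb_new = ops.solve_poisson_Pb(ops.D(B_tld) / (0.5 * dt))

    # Step 3: projection update, eq. (33)
    u_new = u_tld - 0.5 * dt * ops.G1(P_new)
    B_new = B_tld - 0.5 * dt * ops.G3(Pb_new)
    return u_new, Theta_new, B_new, P_new, Pb_new
```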
3.2. Spatial discretization

As an alternative to the traditional latitude-longitude grid, which suffers from disadvantages such as singularity and non-uniformity, the cubed-sphere grid (Sadourny, 1972; Ronchi et al., 1996), obtained by a projection of the inscribed cube, is becoming popular for problems defined on spherical geometries. Adopting the cubed-sphere grid based on the equiangular gnomonic projection (Ronchi et al., 1996), a spherical shell is divided into six identical blocks, each of which is described by a local coordinate system $(\xi, \eta, r)$, $\xi, \eta \in [-\pi/4, \pi/4]$. Being uniform in $\xi$ and $\eta$, the resulting cubed-sphere grid is quite regular and can thus be adapted well to the algorithms of domain decomposition (Toselli and Widlund, 2005) and multigrid (Saad, 2003).

A collocated arrangement, by which all the unknown variables $(\boldsymbol u, \Theta, \boldsymbol B, P, P_b)$ are located at the centers of grid cells, is employed in the spatial discretization. For each block, the numbers of grid cells in the $\xi$ and $\eta$ directions are set to the same value $N_s$, and the cell number in the $r$ direction is denoted by $N_r$. The coordinates of the unknown point with indices $(i, j, k)$, $0 \le i, j \le N_s - 1$, $0 \le k \le N_r - 1$, are
$$\xi_i = -\frac{\pi}{4} + (i + 0.5)h_s, \qquad \eta_j = -\frac{\pi}{4} + (j + 0.5)h_s, \qquad r_k = r_i + (k + 0.5)h_r, \qquad (40)$$
where $h_s = \pi/(2N_s)$ and $h_r = (r_o - r_i)/N_r$ are the grid spacings in the $\xi$ ($\eta$) and $r$ directions, respectively. The total cell number of the cubed-sphere grid is denoted by $N = N_s \times N_s \times N_r \times 6$.

[Figure 1: A cubed-sphere grid based on the equiangular gnomonic projection. (a) $(\xi, \eta)$ grid on a spherical surface; (b) an open grid obtained by shifting the six blocks outwards.]

For a vector field $\boldsymbol v$, its divergence at the center of the grid cell $(i,j,k)$ is numerically approximated by the Gauss theorem
$$\nabla\cdot\boldsymbol v\,\big|_{i,j,k} \approx \frac{1}{V_{i,j,k}}\int_V \nabla\cdot\boldsymbol v\,\mathrm dV = \frac{1}{V_{i,j,k}}\oint_S \boldsymbol v\cdot\mathrm d\boldsymbol S$$
$$\approx \frac{1}{V_{i,j,k}}\Big[(v^1\sqrt{g})_{i+\frac12,j,k}\,h_s h_r - (v^1\sqrt{g})_{i-\frac12,j,k}\,h_s h_r + (v^2\sqrt{g})_{i,j+\frac12,k}\,h_s h_r - (v^2\sqrt{g})_{i,j-\frac12,k}\,h_s h_r + (v^3\sqrt{g})_{i,j,k+\frac12}\,h_s h_s - (v^3\sqrt{g})_{i,j,k-\frac12}\,h_s h_s\Big], \qquad (41)$$
where $V_{i,j,k} \approx \sqrt{g}_{i,j,k}\,h_s h_s h_r$ is the volume of the grid cell $(i,j,k)$, $(v^1, v^2, v^3)$ are the contravariant components of $\boldsymbol v$, and $g$ is the determinant of the covariant components $g_{mn}$ of the metric tensor in the cubed-sphere grid,
$$\sqrt{g} = \sqrt{\det(g_{mn})} = \frac{r^2\sec^2\xi\,\sec^2\eta}{\big(1 + \tan^2\xi + \tan^2\eta\big)^{3/2}}. \qquad (42)$$
The spatial differential operators in equations (23) and (24) are transformed into divergence forms and then discretized according to equation (41).

Most of the spatial terms in the governing equations (18)–(22) have been discussed in our previous work (Yin et al., 2017). In this section, we focus on the new terms related to the magnetic field $\boldsymbol B$, including the Laplacian term $\nabla^2\boldsymbol B$, the divergence term $\nabla\cdot\boldsymbol B$, and the nonlinear terms $\boldsymbol B\cdot\nabla\boldsymbol B$, $\boldsymbol B\cdot\nabla\boldsymbol u$ and $\boldsymbol u\cdot\nabla\boldsymbol B$. The Laplacian term $\nabla^2\boldsymbol B$ and divergence term $\nabla\cdot\boldsymbol B$ are discretized in the same way as $\nabla^2\boldsymbol u$ and $\nabla\cdot\boldsymbol u$, respectively, while some additional effort is required for the three nonlinear terms.

To deal with the three nonlinear terms in a uniform way, we consider the generic form $\boldsymbol a\cdot\nabla\boldsymbol b$, where $\boldsymbol a$ and $\boldsymbol b$ are two arbitrary vectors satisfying the divergence-free conditions $\nabla\cdot\boldsymbol a = \nabla\cdot\boldsymbol b = 0$. The nonlinear term $\boldsymbol a\cdot\nabla\boldsymbol b$ can be rewritten in a conservative form and divided into two parts
$$\boldsymbol a\cdot\nabla\boldsymbol b = \nabla\cdot(\boldsymbol a\boldsymbol b) = \big[\nabla\cdot(\boldsymbol a\,b^k) + a^i b^j \Gamma^k_{ij}\big]\boldsymbol g_k, \qquad i, j, k = 1, 2, 3, \qquad (43)$$
where $\Gamma^k_{ij}$ $(i, j, k = 1, 2, 3)$ are the Christoffel symbols and $\boldsymbol g_k$ $(k = 1, 2, 3)$ are the covariant base vectors in the cubed-sphere coordinate system. The divergence term is discretized in the finite volume scheme
$$\nabla\cdot(\boldsymbol a\,b^k)\big|_{i,j,k} \approx \frac{1}{V_{i,j,k}}\int_V \nabla\cdot(\boldsymbol a\,b^k)\,\mathrm dV = \frac{1}{V_{i,j,k}}\oint_S (\boldsymbol a\,b^k)\cdot\mathrm d\boldsymbol S$$
$$\approx \frac{1}{V_{i,j,k}}\Big[(a^1 b^k\sqrt{g})_{i+\frac12,j,k}\,h_s h_r - (a^1 b^k\sqrt{g})_{i-\frac12,j,k}\,h_s h_r + (a^2 b^k\sqrt{g})_{i,j+\frac12,k}\,h_s h_r - (a^2 b^k\sqrt{g})_{i,j-\frac12,k}\,h_s h_r + (a^3 b^k\sqrt{g})_{i,j,k+\frac12}\,h_s h_s - (a^3 b^k\sqrt{g})_{i,j,k-\frac12}\,h_s h_s\Big]. \qquad (44)$$
The remaining term, with the vanishing Christoffel symbols dropped, can be expressed as
$$a^i b^j \Gamma^1_{ij} = a^1 b^1 \Gamma^1_{11} + a^1 b^2 \Gamma^1_{12} + a^2 b^1 \Gamma^1_{21} + a^1 b^3 \Gamma^1_{13} + a^3 b^1 \Gamma^1_{31},$$
$$a^i b^j \Gamma^2_{ij} = a^1 b^2 \Gamma^2_{12} + a^2 b^1 \Gamma^2_{21} + a^2 b^2 \Gamma^2_{22} + a^2 b^3 \Gamma^2_{23} + a^3 b^2 \Gamma^2_{32},$$
$$a^i b^j \Gamma^3_{ij} = a^1 b^1 \Gamma^3_{11} + a^1 b^2 \Gamma^3_{12} + a^2 b^1 \Gamma^3_{21} + a^2 b^2 \Gamma^3_{22}, \qquad (45)$$
and is treated as a source term. We apply the above finite volume scheme to the three nonlinear terms $\boldsymbol B\cdot\nabla\boldsymbol B$, $\boldsymbol B\cdot\nabla\boldsymbol u$ and $\boldsymbol u\cdot\nabla\boldsymbol B$.

Some special attention should be paid to the boundary condition of the magnetic field. Let $(B^1, B^2, B^3)$ denote the contravariant components of the magnetic field in the cubed-sphere grid. According to the pseudo-vacuum boundary condition (17), the tangential components of the magnetic field are zero on the boundaries, i.e. $B^1 = B^2 = 0$. The normal component $B^3$ can be constrained by the solenoidal condition $\nabla\cdot\boldsymbol B = 0$. In the cubed-sphere grid, the solenoidal condition can be expressed as
$$\nabla\cdot\boldsymbol B = \frac{\partial B^1}{\partial\xi} + \frac{\partial B^2}{\partial\eta} + \big(\Gamma^1_{11} + \Gamma^2_{21}\big)B^1 + \big(\Gamma^1_{12} + \Gamma^2_{22}\big)B^2 + \frac{1}{r^2}\frac{\partial}{\partial r}\big(r^2 B^3\big) = 0. \qquad (46)$$
Due to $B^1 = B^2 = 0$, we obtain
$$\frac{\partial}{\partial r}\big(r^2 B^3\big) = 0, \qquad (47)$$
from which the normal component $B^3$ of the magnetic field on the boundaries is calculated.
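As a concrete illustration of equations (41)–(42), the sketch below evaluates $\sqrt{g}$ for the equiangular cubed-sphere coordinates and assembles the Gauss-theorem divergence of a single cell from face fluxes; the array layout and the assumption that face fluxes are already available are simplifications of this sketch, not the paper's implementation.

```python
import numpy as np

def sqrt_g(xi, eta, r):
    """Square root of the metric determinant, eq. (42)."""
    delta2 = 1.0 + np.tan(xi) ** 2 + np.tan(eta) ** 2
    return r ** 2 / (np.cos(xi) ** 2 * np.cos(eta) ** 2 * delta2 ** 1.5)

def cell_divergence(F1, F2, F3, sg_c, i, j, k, hs, hr):
    """Finite volume divergence of eq. (41) at cell (i, j, k).

    F1, F2, F3 hold the face fluxes sqrt(g)*v^m, with entry m giving the
    lower (m - 1/2) face of cell m in each direction, so each array has
    one extra entry in its own direction (an assumed layout).
    sg_c holds cell-centered values of sqrt(g).
    """
    vol = sg_c[i, j, k] * hs * hs * hr   # V_ijk ~ sqrt(g)_ijk hs hs hr
    flux = ((F1[i + 1, j, k] - F1[i, j, k]) * hs * hr
            + (F2[i, j + 1, k] - F2[i, j, k]) * hs * hr
            + (F3[i, j, k + 1] - F3[i, j, k]) * hs * hs)
    return flux / vol
```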
3.3. Parallel multi-level solver

At each time step, there are two linear algebraic systems, i.e. VTBE and PPBE, to be solved. A Krylov subspace iterative method combined with preconditioning technology is employed to solve these linear systems in this paper. With preconditioning, a linear system, e.g. $Ax = b$, is replaced with the right-preconditioned system
$$A'x' = b, \qquad (48)$$
where $A' = AM^{-1}$, $x' = Mx$. Here the matrix $M$ is generally called the preconditioner. For any time step, $x'$ is initialized as $x'_0 = Mx_0$, where $x_0$ is usually set to be the solution of the previous time step. Then the preconditioned linear system (48) is solved by a restarted generalized minimal residual (GMRES) algorithm until the residual satisfies
$$\|A'x' - b\| \le \max\{\varepsilon_a,\ \varepsilon_r\|A'x'_0 - b\|\}, \qquad (49)$$
where $\varepsilon_a$, $\varepsilon_r$ are the absolute and relative convergence tolerances, respectively. Finally, the present time step solution $x$ is obtained by $x = M^{-1}x'$.

When solving the linear system (48) by the Krylov subspace iterative method, the convergence rate strongly depends on the condition number of the coefficient matrix $A' = AM^{-1}$ (Demmel, 1997). If $A'$ is well conditioned, that is, its condition number is sufficiently small, the iteration number of the Krylov subspace method can be dramatically reduced. This can be achieved by choosing an appropriate preconditioner $M$. A good choice of the preconditioner should also help improve the scalability of parallel computations on large-scale supercomputers; in other words, with the aid of a scalable preconditioner, the iteration number should remain at a steady level as the number of processor cores increases. It is often problem-dependent to construct an efficient and scalable preconditioner. In the present study, we design a parallel multi-level restricted additive Schwarz preconditioner based on domain decomposition and the multigrid method.

The cubed-sphere grid is divided into six identical blocks, and each block is decomposed into $p = p_1 p_2 p_3$ non-overlapping subdomains in a structured manner, where $p_1$, $p_2$, $p_3$ are the numbers of subdivisions corresponding to the three coordinate directions, respectively. Each subdomain is assigned to one processor core, and the number of processor cores corresponding to each block is $p$. Thus the total number of processor cores is $6p$, as is the total number of subdomains. For each non-overlapping subdomain $\Omega_i$, $i = 1, 2, \ldots, 6p$, we can obtain a corresponding larger overlapping subdomain $\Omega_i^\delta$ by extending $\Omega_i$ with $\delta$ layers of mesh cells, as shown in Fig. 2a. The subdomains containing one or more block interfaces are extended to the adjacent mesh cells of the neighbouring block(s). The extending parts of overlapping subdomains lead to data exchanges, i.e. communications between the corresponding processor cores. To achieve good scalability, the influence of communication time should be reduced as much as possible.

Let $N$ denote the total number of mesh cells and $d$ the number of degrees of freedom per point. Moreover, the number of mesh cells in the overlapping subdomain $\Omega_i^\delta$ is denoted by $N_i^\delta$. Then we can define a one-level restricted additive Schwarz (RAS) (Cai and Sarkis, 1999) preconditioner as
$$M^{-1} = \sum_{i=1}^{6p}\big(R_i^0\big)^T\big(A_i^\delta\big)^{-1}R_i^\delta. \qquad (50)$$

[Figure 2: A two-dimensional illustration of domain decomposition and multigrid. (a) Fine mesh, showing a non-overlapping subdomain $\Omega_i$ and its overlapping extension $\Omega_i^\delta$; (b) coarse mesh, in which the cell number in each direction is reduced by 1/2.]

The restriction operator $R_i^\delta$ is an $N_i^\delta \times N$ block matrix, and its multiplication by an $N \times 1$ vector generates a smaller $N_i^\delta \times 1$ vector by dropping the components corresponding to the mesh cells outside $\Omega_i^\delta$. The element $\big(R_i^\delta\big)_{q_1, q_2}$ of the restriction matrix, which is a $d \times d$ block submatrix, is an identity block if the integer indices $1 \le q_1 \le N_i^\delta$ and $1 \le q_2 \le N$ belong to a cell in the overlapping subdomain $\Omega_i^\delta$, or a block of zeros otherwise. As a special case, $R_i^0$ is also an $N_i^\delta \times N$ block matrix that is similarly defined, but is a restriction to the non-overlapping subdomain $\Omega_i$. The matrix $A_i^\delta$ is the restriction of the coefficient matrix $A$ to the overlapping subdomain $\Omega_i^\delta$, with size $N_i^\delta \times N_i^\delta$, and is defined as $A_i^\delta = R_i^\delta A\big(R_i^\delta\big)^T$. The matrix-vector multiplication with $\big(A_i^\delta\big)^{-1}$ amounts to solving a local linear system in subdomain $\Omega_i^\delta$ and can be computed exactly by using a sparse LU factorization. Since LU factorization is often expensive and forming an exact preconditioner is generally not necessary, the matrix-vector multiplication is usually computed approximately by a less expensive incomplete LU (ILU) factorization.
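Since the solver is built on the PETSc library (see Section 4), a one-level restricted additive Schwarz preconditioner of the form (50) can be configured along the following lines; this petsc4py sketch is our own illustration with placeholder tolerances, not the paper's actual code.

```python
from petsc4py import PETSc

def make_ras_solver(A, overlap=1, restart=30, rtol=1e-6, atol=1e-12):
    """GMRES with a one-level restricted additive Schwarz preconditioner.

    A: an assembled PETSc matrix (e.g. of VTBE or PPBE). The ILU(0)
    subdomain solve mirrors the choice described in the text; the
    tolerance values here are placeholders.
    """
    ksp = PETSc.KSP().create(A.getComm())
    ksp.setOperators(A)
    ksp.setType(PETSc.KSP.Type.GMRES)
    ksp.setGMRESRestart(restart)
    ksp.setTolerances(rtol=rtol, atol=atol)

    pc = ksp.getPC()
    pc.setType(PETSc.PC.Type.ASM)
    pc.setASMType(PETSc.PC.ASMType.RESTRICT)   # restricted variant (RAS)
    pc.setASMOverlap(overlap)                  # delta layers of overlap
    ksp.setUp()

    # ILU(0) as the inexact subdomain solver on each overlapping subdomain
    for sub in pc.getASMSubKSP():
        sub.setType(PETSc.KSP.Type.PREONLY)
        sub.getPC().setType(PETSc.PC.Type.ILU)  # zero fill-in by default
    return ksp
```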
In our previous work (Yin et al., 2017), it was found that the one-level RAS preconditioner can achieve very good parallel performance for the solution of the velocity-related equation but scales poorly for the pressure-related equation. To improve the scalability of the one-level RAS preconditioner, we employ a multi-level RAS method based on hybrid preconditioning (Mandel, 1994) and the multigrid technique.

By combining the one-level RAS preconditioner $B_f$ with a coarse-level preconditioner $B_c$ defined on a coarser mesh in a multiplicative manner, we obtain the hybrid preconditioner
$$M^{-1} = \mathrm{hybrid}(B_c, B_f) = B_c + B_f - B_f A_f B_c, \qquad (51)$$
where $B_c = I_c^f A_c^{-1} I_f^c$, and $A_f$, $A_c$ denote the coefficient matrices on the fine and coarse meshes, respectively. Here, $I_f^c$ is a restriction operator mapping a vector defined on the fine mesh to a coarse mesh vector; similarly, $I_c^f$ is a prolongation operator from the coarse mesh to the fine mesh. More precisely, to calculate the multiplication of the hybrid two-level preconditioner with a vector $x$, $y = M^{-1}x$, we first apply the coarse mesh preconditioning
$$w = \big(I_c^f A_c^{-1} I_f^c\big)x, \qquad (52)$$
and then correct the coarse solution by adding the fine mesh solution to obtain the final result
$$y = w + B_f(x - A_f w). \qquad (53)$$

For each application of the two-level preconditioner (51), a smaller linear system with the coefficient matrix $A_c$ on the coarse mesh needs to be solved during the coarse mesh preconditioning. This coarse-level linear system is solved by preconditioned GMRES with a relative tolerance $\eta_c$. The coarse-level preconditioner can be either one-level (50) or two-level (51). When a two-level preconditioner is adopted on the coarse mesh as well, another coarser mesh is required to form this preconditioner. Repeating the application of the two-level RAS preconditioner (51) on multiple mesh levels results in a multilevel hybrid RAS method.

The best choices for some of the options in the multilevel RAS preconditioner are often problem-dependent (Yang and Cai, 2011). One important option is the number of mesh levels, whose choice strongly depends on the specific circumstances. Since additional computational costs are introduced by the coarse meshes, excessive mesh levels may lead to a degradation of computational efficiency. Furthermore, the choice of the number of mesh levels is closely related to the problem size: if too many mesh levels are applied when the problem size is not large enough, the computational load on each processor core may become too small; in this situation, the influence of communication time may become significant and the scalability may deteriorate. In the present study, we choose a two-level version with a coarse-to-fine mesh ratio of 1:2 in each direction (see Fig. 2), by which an optimal efficiency is achieved at the considered spatial resolutions. If a larger resolution is required, three or more levels may be taken into consideration to achieve better performance. On the fine mesh, denoted by $N$, the preconditioner is
$$M_N^{-1} = \mathrm{hybrid}\left(I_{N/2}^N A_{N/2}^{-1} I_N^{N/2},\ \sum_{i=1}^{6p}\big((R_N)_i^0\big)^T\big((A_N)_i^\delta\big)^{-1}(R_N)_i^\delta\right), \qquad (54)$$
where the restriction operator $I_N^{N/2}$ and a piecewise constant interpolation operator $I_{N/2}^N$ are employed due to their simplicity. The linear system with $A_{N/2}$ on the coarse mesh is solved by an inner GMRES, preconditioned by a one-level RAS approach on the corresponding coarse level,
$$M_{N/2}^{-1} = \sum_{i=1}^{6p}\big((R_{N/2})_i^0\big)^T\big((A_{N/2})_i^\delta\big)^{-1}(R_{N/2})_i^\delta. \qquad (55)$$
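The action of the two-level hybrid preconditioner (51)–(53) can be written in a few lines; the sketch below uses generic callables for the transfer operators and the fine/coarse solves, all of which are placeholders for this illustration.

```python
def apply_two_level(x, A_f, ras_fine, coarse_solve, restrict, prolong):
    """y = M^{-1} x for the hybrid preconditioner of eqs. (51)-(53).

    ras_fine:     one-level RAS application B_f on the fine mesh, eq. (50)
    coarse_solve: (approximate) solve with A_c, here an inner GMRES
    restrict/prolong: grid transfer operators I_N^{N/2} and I_{N/2}^N
    """
    w = prolong(coarse_solve(restrict(x)))   # coarse correction, eq. (52)
    return w + ras_fine(x - A_f @ w)         # fine-level update, eq. (53)
```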
The choice of the subdomain solver at each level has a strong influence on the overall performance of the preconditioner, and a large number of numerical experiments are often necessary to find the proper selection. According to our tests, the ILU factorization with no fill-in, ILU(0), is chosen as the subdomain solver for both the fine level $\big((A_N)_i^\delta\big)^{-1}$ in equation (54) and the coarse level $\big((A_{N/2})_i^\delta\big)^{-1}$ in equation (55).
4. Numerical results
We build the parallel simulation code based on the Portable, Extensible Toolkit for Scientific Computation (PETSc) library (Balay et al., 2013) and carry out the numerical experiments on the Sunway TaihuLight supercomputer (Fu et al., 2016), which took the top place of the Top-500 list (TOP500, 2020) as of June 2016. The two resulting sparse linear algebraic systems, i.e. VTBE and PPBE, are solved by the GMRES algorithm with restarting parameter 30. Both the absolute and relative convergence tolerances of GMRES are prescribed, with separate values for VTBE and for PPBE.

Following the well-known benchmark study of Christensen et al. (2001), where the insulating boundary condition is considered, a shell dynamo benchmarking exercise with pseudo-vacuum boundary conditions was carried out by Jackson et al. (2014) for the first time. Under the parameter regime in their work,
$$E = 10^{-3}, \qquad Pr = 1, \qquad Ra = 100, \qquad Pm = 5, \qquad (56)$$
at least five magnetic diffusion times are required to reach the quasi-steady state from the suggested initial condition
$$\boldsymbol u = 0, \qquad \Theta = \frac{21}{\sqrt{17920\pi}}\big(1 - 3x^2 + 3x^4 - x^6\big)\sin^4(\theta)\cos(4\phi), \qquad x = 2r - r_i - r_o,$$
$$B_r = \frac{5}{8}\,\frac{9r^3 - 4[4 + 3(r_i + r_o)]r^2 + [4r_o + r_i(4 + 3r_o)]\,6r - 48\,r_i r_o}{r}\cos(\theta),$$
$$B_\theta = -\frac{15}{4}\,\frac{(r - r_i)(r - r_o)(3r - 4)}{r}\sin(\theta), \qquad B_\phi = \frac{15}{8}\sin[\pi(r - r_i)]\sin(2\theta). \qquad (57)$$

In the same spirit, Vantieghem et al. (2016) suggested a new benchmark case with the non-dimensional control parameters
$$E = 10^{-3}, \qquad Pr = 1, \qquad Ra = 100, \qquad Pm = 8, \qquad (58)$$
and the same initial condition for dynamo validations with pseudo-vacuum boundary conditions. According to their numerical results, a quasi-steady solution can be reached within less than one magnetic diffusion time. In this section, we follow both benchmark cases to validate the correctness of our finite volume code. For simplicity, we refer to the benchmark case proposed by Jackson et al. (2014) as case P5 and the case suggested by Vantieghem et al. (2016) as case P8.

The values of the magnetic energy, kinetic energy and some other quantities at the final quasi-steady state are compared with the reference solutions for the benchmark cases P5 and P8. To compare with these benchmark results, we should calculate the quantities in consistent units. For this purpose, the kinetic energy $E_{kin}$ and magnetic energy $E_{mag}$ are defined as follows:
$$E_{kin} = \frac{Pm^2}{2}\int_V \boldsymbol u^2\,\mathrm dV, \qquad (59)$$
$$E_{mag} = \frac{Pm}{2E}\int_V \boldsymbol B^2\,\mathrm dV. \qquad (60)$$
And the velocity, temperature and magnetic field are calculated by transforming from the quantities in the present non-dimensionalization:
$$T' = T, \qquad (61)$$
$$\boldsymbol u' = \boldsymbol u\,Pm, \qquad (62)$$
$$\boldsymbol B' = \boldsymbol B/\sqrt{2}. \qquad (63)$$
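A sketch of how these diagnostics might be accumulated is given below; the cell-wise mid-point quadrature and the field containers are assumptions of this illustration, not the paper's post-processing routines.

```python
import numpy as np

def benchmark_diagnostics(u, B, T, cell_vol, E, Pm):
    """Energies (59)-(60) and unit conversions (61)-(63).

    u, B: arrays of shape (ncell, 3) with the field components at cell
    centers; cell_vol: array of cell volumes. Mid-point quadrature over
    cells is a simplifying assumption of this sketch.
    """
    int_u2 = np.sum(np.sum(u * u, axis=1) * cell_vol)
    int_B2 = np.sum(np.sum(B * B, axis=1) * cell_vol)
    E_kin = 0.5 * Pm ** 2 * int_u2        # eq. (59)
    E_mag = 0.5 * (Pm / E) * int_B2       # eq. (60)
    T_p, u_p, B_p = T, u * Pm, B / np.sqrt(2.0)   # eqs. (61)-(63)
    return E_kin, E_mag, T_p, u_p, B_p
```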
Firstly, the benchmark case P5 is run with our finite volume code at three different spatial resolutions, G64 ($N_s = 64$, $N_r = 96$), G80 ($N_s = 80$, $N_r = 120$) and G96 ($N_s = 96$, $N_r = 144$), until $t = 25.6$, when the magnetic time measured in units of the magnetic diffusion time is $t_m = t/Pm = 5.12$. The time step size is the same for G64 and G80, and slightly smaller for grid G96. The time evolution of the magnetic energy $E_{mag}$ and the kinetic energy $E_{kin}$ on grid G96 is displayed in Fig. 3, which shows good agreement with the result in Jackson et al. (2014).

[Figure 3: Time evolution of the magnetic energy $E_{mag}$ and the kinetic energy $E_{kin}$ on grid G96 for the benchmark case P5. The magnetic time $t_m$ is measured in units of the magnetic diffusion time, $t_m = t/Pm$.]

From the quasi-steady solution, we calculate the final global data of the kinetic energy $E_{kin}$, magnetic energy $E_{mag}$ and drift frequency $\omega$, and the local data of $T'$, $u'_\phi$ and $B'_\theta$ at a reference point in the equatorial plane at mid-depth where $u_r = 0$ and $\partial u_r/\partial\phi > 0$. The results are reported in Table 1, where comparisons with the recommended benchmark solution obtained by using spectral methods and three other results with local methods are also provided. V232 and ZS363 are respectively the largest-resolution results obtained by the finite volume code V at the overall resolution $R = 232$ and the finite element code ZS at $R = 363$ reported in Jackson et al. (2014), and SFEMaNS refers to the finite element result reported in Matsui et al. (2016). The overall resolution $R$ is defined as the third root of the number of degrees of freedom for each scalar variable, $R = N^{1/3}$. The values in parentheses denote relative errors compared with the recommended benchmark solution obtained by spectral methods.

Table 1: Comparison with the benchmark results for the case P5. G64, G80 and G96 are our numerical results at three different spatial resolutions. SM denotes the recommended benchmark solution (Jackson et al., 2014) obtained by spectral methods. V232 and ZS363 are respectively the largest-resolution results obtained by the finite volume code V at the overall resolution R = 232 and the finite element code ZS at R = 363 reported in Jackson et al. (2014). SFEMaNS refers to the finite element result reported in Matsui et al. (2016). The values in parentheses are relative errors compared with the SM results.

Results | E_mag | E_kin | T' | u'_φ | B'_θ | ω
G64 | 79503 (0.71%) | 14884 (0.26%) | 0.42590 (0.002%) | -58.232 (0.09%) | 0.9936 (0.06%) | 3.5409 (5.56%)
G80 | 79486 (0.73%) | 14878 (0.22%) | 0.42604 (0.035%) | -58.149 (0.05%) | 0.9888 (0.42%) | 3.6781 (1.90%)
G96 | 79686 (0.48%) | 14854 (0.05%) | 0.42593 (0.009%) | -58.090 (0.15%) | 0.9901 (0.29%) | 3.7511 (0.04%)
V232 | 79012 (1.32%) | 14941 (0.64%) | 0.42630 (0.096%) | -57.932 (0.42%) | 0.9746 (1.85%) | 3.7457 (0.10%)
ZS363 | 81210 (1.42%) | 15032 (1.25%) | 0.42700 (0.261%) | -58.480 (0.52%) | 0.9951 (0.21%) | 3.7940 (1.19%)
SFEMaNS | 80578 (0.63%) | 14797 (0.33%) | 0.42553 (0.085%) | -58.280 (0.17%) | 1.0015 (0.86%) | —
SM | 80071 | 14846 | 0.42589 | -58.179 | 0.9930 | 3.7495
From Table 1, it is seen that the discrepancy is less than 1% for all quantities on all grids, except the drift frequency $\omega$ at the lower resolutions. Noticing that the overall resolutions of the grids G64, G80 and G96 are respectively 133, 166 and 200, our finite volume code produces highly accurate solutions for the benchmark case P5, comparable to and even better than the existing local results. In addition, the drift frequency $\omega$ shows a good convergence rate as the spatial resolution increases.

The benchmark case P8 is then considered to further validate the proposed methods and the implemented finite volume code. An attractive advantage of this benchmark is that a quasi-steady solution can be reached within one magnetic diffusion time, which allows a much quicker validation in contrast to the benchmark case P5. It was found by Sheyko (2014) that two different types of dynamo solutions can be obtained when changing the initial magnetic field for this benchmark problem. For initial values of the magnetic energy between 407101 and 623428, such as the suggested one (57), one can obtain a quasi-steady solution expressed in the form $(\boldsymbol u, \boldsymbol B, \Theta) = f(r, \theta, \phi - \omega t)$. For an initial magnetic energy outside this range, an oscillating dynamo solution is found (Vantieghem et al., 2016).

[Figure 4: Time evolution of the magnetic energy $E_{mag}$ on the grid G64 for the benchmark case P8 with two different initial magnetic field intensities. The magnetic time $t_m$ is measured in units of the magnetic diffusion time, $t_m = t/Pm$.]

The numerical tests of the case P8 are run on grids G64, G80 and G96 as well, until $t = 8$, when the magnetic time is $t_m = t/Pm = 1$. The time step sizes for the three grids are chosen accordingly, with a slightly smaller value on G96. The time evolution of the magnetic energy $E_{mag}$ on the grid G64 is displayed in Fig. 4, which also shows an oscillating solution obtained by decreasing the initial magnetic field to $\boldsymbol B'_{\rm initial} = \boldsymbol B_{\rm initial}/\sqrt{2}$.
We can see from the figure that the magnetic energy $E_{mag}$ reaches a constant value within one magnetic diffusion time for the quasi-steady solution, while the oscillating solution finally exhibits oscillatory behaviour. The results of the time evolution in Fig. 4 are quite consistent with the benchmark (Vantieghem et al., 2016).

We calculate the reference quantities, including $E_{mag}$, $E_{kin}$, $T'$, $u'_\phi$, $B'_\theta$ and $\omega$, from the final quasi-steady solution and summarize the comparison with the benchmark results in Table 2. The values in parentheses denote relative errors compared with the benchmark solution obtained by a pseudospectral method. It can be seen from Table 2 that our results are in good agreement with the benchmark pseudospectral solution and that the accuracy is comparable to the existing finite volume results. Besides, the relative errors of the global and local quantities become smaller as the spatial resolution increases.

Table 2: Comparison with the benchmark results for the case P8. G64, G80 and G96 are our numerical results at three different spatial resolutions. FV64 and FV128 refer to the finite volume results in Vantieghem et al. (2016) with six blocks of 64 and 128 grid points, respectively. PS denotes the suggested benchmark solution (Vantieghem et al., 2016) obtained by the pseudospectral method. The values in parentheses are relative errors compared with the PS results.

Results | E_mag | E_kin | T' | u'_φ | B'_θ | ω
G64 | 313188.0 (0.14%) | 21762.6 (0.59%) | 0.3935 (0.08%) | -80.64 (0.36%) | 2.2137 (1.44%) | 5.2215 (6.07%)
G80 | 312799.0 (0.01%) | 21710.2 (0.35%) | 0.3934 (0.05%) | -80.81 (0.15%) | 2.1977 (0.71%) | 5.3825 (3.17%)
G96 | 312619.0 (0.04%) | 21686.1 (0.24%) | 0.3933 (0.03%) | -80.88 (0.06%) | 2.1888 (0.30%) | 5.4827 (1.37%)
FV64 | 309086.0 (1.17%) | 21502.1 (0.61%) | 0.3920 (0.31%) | -81.23 (0.36%) | 2.1510 (1.43%) | 5.6453 (1.55%)
FV128 | 311950.2 (0.26%) | 21576.2 (0.27%) | 0.3925 (0.18%) | -80.74 (0.24%) | 2.1839 (0.07%) | 5.4959 (1.13%)
PS | 312754.7 | 21634.9 | 0.3932 | -80.9318 | 2.1823 | 5.5588

The spatial structure of the quasi-steady solution on the grid G96 is shown distinctly in Figs 5–7. Fig. 5 depicts the equatorial slices of the quasi-steady quantities $T'$, $B'_\theta$, $u'_\phi$ and $u'_r$; it shows good agreement with the benchmark results (Vantieghem et al., 2016). Fig. 6 gives the contours on the mid-depth spherical surface of $T'$, $u'_r$ and $B'_r$, and Fig. 7 displays the contour of $B'_r$ on the outer boundary surface. The spatial structure with four-fold azimuthal symmetry can be observed from these figures.

The parallel performance of the one-level and two-level RAS preconditioners is reported systematically in this section. In terms of the solver options, the overlap size is $\delta = 1$ and the subdomain solver is ILU(0) for the one-level RAS preconditioner. For the two-level method, the overlap size $\delta = 1$ and the subdomain solver ILU(0) are used for both the fine level and the coarse level, while the relative tolerance of the inner GMRES on the coarse level is set to be 0.1.
[Figure 5: Equatorial slices of the quasi-steady solution on the grid G96: (a) $T'$, (b) $B'_\theta$, (c) $u'_\phi$, (d) $u'_r$.]

[Figure 6: Contours on the mid-depth spherical surface of the quasi-steady solution on the grid G96: (a) $T'$, (b) $u'_r$, (c) $B'_r$. The block interfaces of the cubed-sphere grid are denoted by black lines.]

[Figure 7: Contour of $B'_r$ on the outer boundary surface of the quasi-steady solution on the grid G96.]

We apply both the one-level and the two-level preconditioner to the solution of PPBE and compare the two in terms of efficiency and performance. The VTBE is only solved by GMRES with the one-level preconditioner.

Firstly, the strong scalability of the GMRES algorithm is studied with a fixed mesh ($N_s = 144$) and a constant time step size. The strong scalability refers to the influence of the number of processor cores on the compute time for a problem with a fixed spatial resolution; in the ideal situation, the compute time is reduced proportionally as the number of processor cores increases. We double the number of processor cores from 1296 up to 10368 and average the corresponding iteration numbers and compute times of GMRES over the first ten time steps. The averaged results are summarized in Table 3.

Table 3: Strong scaling results with a fixed mesh ($N_s = 144$) and a constant time step size. The results are averaged over the first ten time steps. The averaged iteration numbers of the inner GMRES on the coarse mesh are given in parentheses when using the two-level RAS preconditioner. np denotes the number of processor cores.

np | VTBE iterations (one-level) | VTBE time (s, one-level) | PPBE iterations (one-level) | PPBE iterations (two-level) | PPBE time (s, one-level) | PPBE time (s, two-level)
1296 | 3.0 | 6.65 | 144.9 | 22.0 (13.4) | 29.69 | 15.80
2592 | 3.0 | 3.41 | 154.0 | 22.2 (16.0) | 16.90 | 10.18
5184 | 3.0 | 1.77 | 158.0 | 22.1 (14.0) | 9.60 | 5.72
10368 | 3.0 | 0.92 | 151.1 | 22.4 (16.1) | 5.24 | 4.20

It is observed from Table 3 that the averaged iteration number of GMRES for VTBE remains unchanged as the number of processor cores increases from 1296 to 10368, while the iteration number for PPBE strongly depends on the employed preconditioner. For the one-level preconditioner, the iteration number of PPBE increases mildly as the number of processor cores is doubled. With the two-level preconditioner, the iteration number of PPBE is dramatically reduced and is kept at a low level in spite of the repeated doubling of the number of processor cores. In terms of the compute time of PPBE, the two-level preconditioner is about 20%–47% faster than the one-level one, which indicates a noticeable improvement in computational efficiency. Fig. 8 displays the averaged compute times of VTBE and PPBE with respect to the number of processor cores. We can observe from the figure that the GMRES algorithm for VTBE scales very well with up to 10368 processor cores, and its strong scalability is quite close to the ideal situation. The GMRES algorithm for PPBE scales well if the number of processor cores is not too large. When using a large number of processor cores, e.g. 10368, the strong scalability, as well as the efficiency improvement from the two-level preconditioner, degrades to some extent, because the amount of computation on each processor core becomes too small.

[Figure 8: Strong scaling results displaying the averaged compute time of VTBE and PPBE with respect to the number of processor cores. The dashed line refers to the ideal situation.]
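As a quick arithmetic check on Table 3, the strong-scaling speedup and parallel efficiency relative to the 1296-core runs can be computed as below; this is our own post-processing of the reported timings, not part of the paper's code.

```python
cores = [1296, 2592, 5184, 10368]
t_vtbe = [6.65, 3.41, 1.77, 0.92]        # one-level VTBE, from Table 3
t_ppbe2 = [15.80, 10.18, 5.72, 4.20]     # two-level PPBE, from Table 3

for name, times in [("VTBE", t_vtbe), ("PPBE (two-level)", t_ppbe2)]:
    for p, t in zip(cores, times):
        speedup = times[0] / t           # relative to the 1296-core run
        efficiency = speedup / (p / cores[0])
        print(f"{name}: {p} cores, speedup {speedup:.2f}, "
              f"efficiency {100 * efficiency:.0f}%")
```

For instance, VTBE at 10368 cores yields a speedup of about 7.2 against the ideal factor of 8, i.e. roughly 90% parallel efficiency, consistent with the near-ideal behaviour noted above.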
To further investigate the performance of the proposed algorithms, we test our code in terms of the weak scalability, which usually draws more interest in practical applications. The weak scalability focuses on the variation of the compute time with respect to the increase in the number of processor cores while the computational load on each processor core is fixed; in the ideal situation, the compute time remains the same as the number of processor cores grows. In our weak scaling test, the time step size is kept constant and the mesh size assigned to each processor core is fixed to $20 \times 20 \times 20$ cells. The number of processor cores is doubled from 648 to 10368 and the corresponding spatial resolution is increased proportionally from $120 \times 120 \times 60 \times 6$ to $240 \times 240 \times 240 \times 6$. The averaged results over the first ten time steps are summarized in Table 4 and displayed in Fig. 9.

Table 4: Weak scaling results with a fixed mesh of 20 × 20 × 20 cells for each processor core and a constant time step size. The results are averaged over the first ten time steps. The averaged iteration numbers of the inner GMRES on the coarse mesh are given in parentheses when using the two-level RAS preconditioner. np denotes the number of processor cores.

np | VTBE iterations (one-level) | VTBE time (s, one-level) | PPBE iterations (one-level) | PPBE iterations (two-level) | PPBE time (s, one-level) | PPBE time (s, two-level)
648 | 3.0 | 3.36 | 103.9 | 21.8 (10.1) | 11.02 | 7.48
1296 | 3.0 | 3.37 | 114.1 | 21.7 (11.7) | 12.20 | 7.87
2592 | 3.0 | 3.37 | 162.7 | 24.8 (17.0) | 17.47 | 11.63
5184 | 4.0 | 4.78 | 225.5 | 21.8 (16.7) | 24.56 | 10.31
10368 | 4.1 | 4.83 | 240.4 | 21.4 (18.9) | 26.37 | 11.47

[Figure 9: Weak scaling results displaying the averaged compute time of VTBE and PPBE with respect to the number of processor cores. The dashed line refers to the ideal situation.]

5. Conclusions

A scalable parallel solver for the convection-driven magnetohydrodynamic dynamo problem in a rapidly rotating spherical shell with pseudo-vacuum magnetic boundary conditions is developed in this paper. A finite volume method on a collocated quasi-uniform cubed-sphere grid is employed for the spatial discretization of the spherical shell dynamo equations. In terms of the temporal integration, a second-order approximate factorization method, applied successfully to the non-magnetic thermal convection problem in our previous study (Yin et al., 2017), is extended to the dynamo governing equations, resulting in two linear algebraic systems, VTBE and PPBE, that are both solved by a preconditioned Krylov subspace iterative method. To improve the computational efficiency and parallel performance, we design a multi-level restricted additive Schwarz preconditioner based on domain decomposition and the multigrid method. We perform the simulations of two benchmark cases suggested respectively by Jackson et al. (2014) and Vantieghem et al. (2016) and obtain highly accurate numerical solutions, comparable to the existing local method results reported in Jackson et al. (2014), Vantieghem et al. (2016) and Matsui et al. (2016). Several numerical tests are carried out to investigate the computational efficiency and the parallel performance with up to 10368 processor cores on the Sunway TaihuLight supercomputer. The solver of VTBE with the one-level restricted additive Schwarz preconditioner shows very good strong and weak scalability. For the solver of PPBE, a noticeable improvement in the computational efficiency and the weak scalability by the two-level preconditioner is observed, compared to the one-level method.

To extend our code to the full dynamo problem, the implementation of the insulating boundary condition and the treatment of the singularity in the inner core should be taken into consideration in the future. Possible solutions may include an integral boundary element approach (Iskakov et al., 2004) together with a parallel fast multipole method (Benson et al., 2014) for the issue of the insulating boundary condition, and a logically rectangular grid as suggested by Calhoun et al. (2008) for the inner core problem.
Acknowledgements
This work was supported by the Science and Technology Development Fund, Macau SAR (File no. 0001/2019/A1), Macau Foundation; by the pre-research project on Civil Aerospace Technologies No. D020308 and D020303 funded by China National Space Administration; by Beijing Natural Science Foundation, Grant No. JQ18001; by National Natural Science Foundation of China, Grant No. 41174056; by the B-type Strategic Priority Program of the Chinese Academy of Sciences, Grant No. XDB41000000; and by the Hong Kong-Macau-Taiwan Cooperation Funding of Shanghai Committee of Science and Technology, No. 19590761300.
References