[PDF] Domain of Influence analysis: implications for Data Assimilation in space weather forecasting

Abstract

Solar activity, ranging from the background solar wind to energetic coronal mass ejections (CMEs), is the main driver of the conditions in the interplanetary space and in the terrestrial space environment, known as space weather. A better understanding of the Sun-Earth connection carries enormous potential to mitigate negative space weather effects with economic and social benefits. Effective space weather forecasting relies on data and models. In this paper, we discuss some of the most used space weather models, and propose suitable locations for data gathering with space weather purposes. We report on the application of \textit{Representer analysis (RA)} and \textit{Domain of Influence (DOI) analysis} to three models simulating different stages of the Sun-Earth connection: the OpenGGCM and Tsyganenko models, focusing on solar wind - magnetosphere interaction, and the PLUTO model, used to simulate CME propagation in interplanetary space. Our analysis is promising for space weather purposes for several reasons. First, we obtain quantitative information about the most useful locations of observation points, such as solar wind monitors. For example, we find that the absolute values of the DOI are extremely low in the magnetospheric plasma sheet. Since knowledge of that particular sub-system is crucial for space weather, enhanced monitoring of the region would be most beneficial. Second, we are able to better characterize the models. Although the current analysis focuses on spatial rather than temporal correlations, we find that time-independent models are less useful for Data Assimilation activities than time-dependent models. Third, we take the first steps towards the ambitious goal of identifying the most relevant heliospheric parameters for modelling CME propagation in the heliosphere, their arrival time, and their geoeffectiveness at Earth.

Full PDF

DDomain of Inﬂuence analysis: implications forData Assimilation in space weatherforecasting

Dimitrios Millas , , ∗ , Maria Elena Innocenti , ∗ , Brecht Laperre , JoachimRaeder , Stefaan Poedts , and Giovanni Lapenta Centre for mathematical Plasma Astrophysics, Department of Mathematics, KULeuven, Leuven, Belgium Institute for the Study of Earth, Oceans and Space, University of New Hampshire,Durham, NH, USA Institute of Physics, University of Maria Curie-Skłodowska, Lublin, Poland Department of Physics and Astronomy, University College London, London, UK

Correspondence*:Dimitrios [email protected] Elena [email protected]

ABSTRACT

Representer analysis (RA) and

Domain of Inﬂuence (DOI) analysis to three models simulating different stages of the Sun-Earthconnection: the OpenGGCM and Tsyganenko models, focusing on solar wind - magnetosphereinteraction, and the PLUTO model, used to simulate CME propagation in interplanetary space.Our analysis is promising for space weather purposes for several reasons. First, we obtainquantitative information about the most useful locations of observation points, such as solarwind monitors. For example, we ﬁnd that the absolute values of the DOI are extremely low inthe magnetospheric plasma sheet. Since knowledge of that particular sub-system is crucial forspace weather, enhanced monitoring of the region would be most beneﬁcial. Second, we are ableto better characterize the models. Although the current analysis focuses on spatial rather thantemporal correlations, we ﬁnd that time-independent models are less useful for Data Assimilationactivities than time-dependent models. Third, we take the ﬁrst steps towards the ambitious goalof identifying the most relevant heliospheric parameters for modelling CME propagation in theheliosphere, their arrival time, and their geoeffectiveness at Earth.

Keywords: solar wind, coronal mass ejections (CMEs), magnetohydrodynamics (MHD), numerical simulations, statistical tools,domain of inﬂuence, observations a r X i v : . [ phy s i c s . s p ace - ph ] S e p illas et al. DOI analysis for space weather

Solar activity affects the terrestrial environment with a constantly present but highly variable solar windand with higher energy, transient events, such as ﬂares and Coronal Mass Ejections (CMEs).“Space weather” (Bothmer and Daglis, 2007) is the discipline that focuses on the impact of these solardrives on the solar system and in particular on the Earth and its near space environment.Space weather events can have serious effects on the health of astronauts and on technology, withpotentially large economic costs (Eastwood et al., 2017). The importance of space weather forecasting hasgrown with societal dependence on advanced space technology, on communication and on the electricalgrid. For example, the Halloween 2003 solar storms that impacted Earth between 19th of October 2003and 7th of November 2003 caused an hour long power outage in Sweden (Pulkkinen et al., 2005), forcedairline ﬂight reroutes, and affected communication and satellite systems (Plunkett, 2005). The “greatgeomagnetic storm” of March 13-14, 1989 caused, among other disruptions, a blackout of up to nine hoursin most of Quebec Province, due to a massive failure experienced by the power grid Hydro-Quebec PowerCompany (Allen et al., 1989).In order to improve space weather forecasting, accurate models of the Sun-Earth connection are needed.Such forecasts are challenging because of the complexity of the processes involved and the large rangeof spatial and temporal scales. Commonly the heliosphere is divided into sub-systems, where each one issimulated with a different model, such that the models feed into each other (Luhmann et al., 2004; T´othet al., 2005). These models can be either physics-based or empirical. Empirical models (such as, in thesolar domain, Altschuler and Newkirk (1969); Schatten et al. (1969); Schatten (1971); Wang and Sheeley Jr(1992); Nikoli´c (2019)) usually require less computational resources, enabling faster forecasts. They canalso serve as a baseline for physics-based models (Siscoe et al., 2004). However, empirical models lackthe sophistication of more expensive ﬁrst-principles based numerical models. Recently, machine learningmethods have emerged, that can provide a new approach to space weather forecasting (Camporeale, 2019;Laperre et al., 2020). Most of these methods, while promising, must still undergo extensive validation.The technique of Data Assimilation (DA) was developed to improve model predictions by properlyinitializing models from data and by keeping a model on track during its time evolution. (Kalnay, 2003;Bouttier and Courtier, 1999; Evensen, 2009). DA methods were originally applied to atmosphere andocean models, which exhibit a large degree of inertia. The latter is also true for the solar wind, but notfor the magnetosphere-ionosphere system, which is strongly driven and dissipative. Therefore, in spaceweather forecasting, DA aims not only at initializing the models, but also at using information fromvarious observations to bring the evolution of a system as predicted from a model closer to the real systemevolution (Kalman, 1960; Bouttier and Courtier, 1999; Bishop et al., 2001; Evensen, 2009; Le Dimet andTalagrand, 1986; Innocenti et al., 2011), making up for model deﬁciencies in the terms of resolution andincomplete physical description.The quantity and quality of available data is a critical factor in the effectiveness of Data Assimilation.This is the reason why ﬁelds where data can be obtained more easily and continuously have shown earlysuccesses in DA implementations. Examples of these ﬁelds are meteorology and oceanography, and, inspace sciences, ionospheric and radiation belt physics (Bennett, 1992; Ghil and Malanotte-Rizzoli, 1991;Egbert and Bennett, 1996; Kalnay, 2003; Rigler et al., 2004; Schunk et al., 2004; Kondrashov et al., 2007).Examples of DA applications targeting speciﬁcally the interplanetary space environments are Schrijver andDeRosa (2003); Mendoza et al. (2006); Arge et al. (2010); Innocenti et al. (2011); Skandrani et al. (2014);Lang and Owens (2019); Lang and Owens (2019); Lang et al. (2020).

This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

Representer analysis (RA) and Domain of Inﬂuence analysis (DOI) (Bennett, 1992; Egbert and Bennett,1996; Echevin et al., 2000; Evensen, 2009; Skandrani et al., 2014), brieﬂy summarized in Section 2, arepowerful statistical tools used to estimate the effectiveness of DA techniques when applied to a speciﬁcmodel, without assimilating actual data. Such analysis can be used in several ways. It allows us tooptimize assimilation strategies, it may uncover model biases that can then be addressed by further modeldevelopment, and it may be used to optimize the observation systems that provide operational data forDA. For example, RA/DOI can be used to optimize locations for solar wind monitors, such as locationsproposed near the L5 Lagrangian point (Vourlidas, 2015; Lavraud et al., 2016; Pevtsov et al., 2020).In the present paper, the RA and DOI analysis is applied to three models: the OpenGGCM magnetosphere- ionosphere model (Section 3.1), two of the empirical Tsyganenko magnetosphere magnetic ﬁeld models(Section 3.2), and a solar wind simulation based on the PLUTO code (Section 3.3). These models simulatecritical sub-systems in the Sun-Earth connection with a focus on the terrestrial magnetosphere and CoronalMass Ejection propagation.The present paper provides insights into the locations of the terrestrial magnetosphere that should beprioritized (ideally, in absence of orbital constraints) for space weather forecasting and monitoring activity.We compare a time-dependent, physics based model (e.g., OpenGGCM) and time-independent, empirical(e.g., Tsyganenko) models in terms of the expected beneﬁts that DA can provide. We conclude thattime-dependent models should be preferentially chosen for DA. We take the ﬁrst steps towards the goalif understanding the main physical parameters, close to the Sun and in interplanetary space, that controlCME propagation and hence their arrival time at Earth.This manuscript is organized as follows: in Section 2 we introduce the theoretical background on RAand the DOI; Section 3 discusses the application of the method to the different models; in Section 4 wesummarize the results and discuss potential improvements and new applications.

This Section introduces the mathematical basis of RA and DOI analysis. The reader is referred to Skandraniet al. (2014) and references therein for an in depth derivation.Let us start from a system described by the state variable vector x t ∈ R n . x t is a vector containing the n state variables that describe the system at a time t . Let us assume that the evolution of the system can bedescribed as a discrete-time process controlled by an evolution law A . The state of the system then evolvesas follows: x t = A ( x t − ) + w t − , where w ∈ R n is process noise. The process noise is assumed to beGaussian and with covariance matrix Q.If a model M (for example, a simulation model) of the evolution law A is available, we can obtain,following Kalman (1960); Evensen (2009), a prior estimate ˆ x − t of the state variable x t through thesimulation model as ˆ x − t = M (ˆ x − t − ) + w t − . (1)Assume now that we have m observational values or measurements z t ∈ R m . These measurementscan be mapped to the current state x t through the so-called observation operator H ∈ R m × n , such that z t = Hx t + ν t . Here, ν is the (assumed Gaussian) measurement noise, with a covariance matrix R . Frontiers 3 illas et al.

DOI analysis for space weather

It can be then shown (Bishop et al., 2001) that a posterior estimate of the state ( ˆ x t ) can be obtained fromthe prior estimate of the state ( ˆ x − t ), obtained from Eq. 1, as follows: ˆ x t = ˆ x − t + K t (cid:0) z t − H ˆ x − t (cid:1) . (2)Here, the term (cid:0) z t − H ˆ x − t (cid:1) is called the “innovation”, and represents the difference between themeasurements and their expected values, calculated by applying the observation operator to the priorstate estimate. The Kalman gain K t is obtained by minimizing the posterior error covariance matrix. Thisis the “correction” step of the Kalman ﬁlter, where the Kalman gain is calculated and the estimate and errorcovariance matrix of the posterior state are updated. The “prediction” (forecast) phase of the ﬁlter results inthe calculation of the prior state estimate and prior error covariance matrix (used to compute the Kalmangain).The prior and posterior error covariance matrices are respectively deﬁned as P − t = E (cid:104)(cid:0) x t − ˆ x − t (cid:1) (cid:0) x t − ˆ x − t (cid:1) T (cid:105) , P t = E (cid:104) ( x t − ˆ x t ) ( x t − ˆ x t ) T (cid:105) , (3)where E is the expected value, x t is the “real”, unknown system state and (cid:15) − = (cid:0) ˆ x − t − x t (cid:1) and (cid:15) =(ˆ x t − x t ) are the prior and posterior errors, calculated as the difference between the prior ( ˆ x − t )/posterior( ˆ x t ) state and the real state, x t . Notice that, although these are the deﬁnitions of the error covariances, thisis not how they are computed in the ﬁlter, since the real state is not known.The formula for the calculation of the posterior state, Eq. 2, can be written as ˆ x t = ˆ x − t + rb (4)where r ∈ R n × m and b ∈ R m are deﬁned as r = P − H T , b = (cid:16) HP − H T + R (cid:17) − (cid:0) z − H ˆ x − (cid:1) , (5)with R the measurement noise covariance matrix. The time index t has been dropped for ease of reading.We will from now on assume that the system (and in particular, the observation operator H ) is linear.Then, each column of the matrix r , denoted as r j with j = 1 , . . . , m , is the representer associated to a givenobservation z j (remember that z is the vector with m observations), and gives a measure of the impactof that observation in “correcting” the prior state estimate. If we further assume that each observation j is located at grid point k j , and is associated to a state variable, then each column r j (now r k j , given theassumption mentioned above) contains the covariances (“cov”) between the prior errors at the observationpoint k j and at every other grid node, for all the state variables.Since the real state is not available for error covariance calculations, an ensemble of simulations can beused to estimate the prior errors instead. An ensemble (Evensen, 2009) can be generated by perturbingone or several of the sources of model errors. In this work, ensembles are generated for each model byperturbing the respective initial / boundary conditions. Once the ensemble is available, the covariances ofthe prior errors at a certain simulated time can be approximated as the ensemble covariances (“cov ens ”) ofthe prior errors. These in turn become the ensemble covariances of the simulated state variables, if oneassumes that the prior errors are unbiased. The ensemble covariance between the state variable x and y is This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather deﬁned as cov ens ( x, y ) = 1 N N (cid:88) s =1 (cid:34)(cid:32) x − s − N N (cid:88) x − s (cid:33) (cid:32) y − s − N N (cid:88) s =1 y − s (cid:33)(cid:35) (6)where N is the number of members in the ensemble, and x and y represent two state variables (notice that,for ease of reading, we indicate here two state variables as x and y, while earlier we indicated as vector x all the state variables).In a set of simulations, the prior state variables are the simulation results at a speciﬁc time. Being ableto calculate the representers associated to observations from the ensemble rather than from assimilatingobservations simpliﬁes the RA signiﬁcantly.Following the discussion of the representer term r in the calculation of the posterior state, Eq. 4 and Eq. 5,we now examine the term b . Assuming that the only observation point for observation j is at grid node k j (i.e., the row j of the matrix H has only the term k j different from zero), the element of b associated to theobservation at grid point k j , denoted as b k j , becomes, from Eq. 5: b k j = z k j − ˆ x − k j cov (cid:16) (cid:15) − k j , (cid:15) − k j (cid:17) + cov (cid:16) (cid:15) zk j , (cid:15) zk j (cid:17) , (7)where we have made use of the simpliﬁed form of the matrix H and where (cid:15) zk j is the observation errorassociated with the observation z k j .If x i is one of the state variables at grid node i , the correction to x i brought by the assimilation of themeasure z k j , following Eq. 4, Eq. 5, Eq. 7 and some straightforward manipulation based on the deﬁnitionsof covariance, variance, correlation, can be written as (Skandrani et al., 2014) ˆ x i − ˆ x − i = corr ens (cid:16) ˆ x − k j , ˆ x − i (cid:17) F (cid:16) z k j (cid:17) . (8)Here, F ( z k j ) is the modulation factor and corr ens (cid:16) ˆ x − k j , ˆ x − i (cid:17) the correlation. The correlation is computedfrom the ensemble, and is calculated between the state variable at node k j and at node i . This correlationreﬂects how a change at node k j , caused e.g. by the assimilation of the measurement z k j , will inﬂuencethe node i , and is what we call the DOI. The correlation over the ensemble is deﬁned, using the dummyvariables x and y for brevity, as: corr ens ( x, y ) = cov ens ( x, y ) (cid:112) var ( x ) var ( y ) . (9)The modulation factor F ( z k j ) in Eq. 8 depends, among other things, on the measurement z k j and onthe error associated to the measure z k j . Hence Data Assimilation has to be performed to calculate thisterm. The corr ens (cid:16) ˆ x − k j , ˆ x − i (cid:17) term reﬂects how large we can expect the area that will be affected by theassimilation of z k j to be. But to know how large the difference between the posterior and prior state, ˆ x i − ˆ x − i , will be, we need to know the modulation factor as well. Frontiers 5 illas et al.

DOI analysis for space weather

So now the DOI of the measurement z k j on the state variable at grid point i , x i , can be deﬁned as DOI ( z k j , x i ) = corr ens (cid:16) ˆ x − k j , ˆ x − i (cid:17) . (10)One can see from its deﬁnition that the DOI can be calculated before assimilation by computing theensemble correlation of the state variable value at the grid point k j with that at grid point i . Dropping the i index, i.e. examining the expected impact of measurement z k j on all the state variables x at all grid points,we obtain the more general deﬁnition of the DOI as DOI ( z k j ) = corr ens (cid:16) ˆ x − k j , ˆ x − (cid:17) . (11)We can then draw “DOI maps” that show the correlation between a ﬁeld at grid point k j , the “observationpoint”, and the other grid points.Notice that in this derivation we have assumed, for simplicity, that the measurement z and the statevariable x refer to the same ﬁeld, for example, the x component of the magnetic ﬁeld, or of the velocity.This simpliﬁes the formulation of the numerator of Eq. 7 and improves the readability of the derivation.Skandrani et al. (2014) shows examples where the DOI is calculated between different ﬁelds, e.g. magneticﬁeld and velocity.DOI analysis has the advantage that it can be calculated for all state variables and at any grid pointwithout actual assimilation, i.e. without the need for measurements z . In order to compute the DOI at a timestep t , we only require evolving the ensemble up to said time step t , and then performing the correlationover the ensemble between the state variable value at the observation point k j and at all other grid points.Because DOI values are derived from a correlation they are bounded between -1 and 1. | DOI | ∼ indicates that the ﬁeld at that speciﬁc point signiﬁcantly changes when the same ﬁeld (or a different ﬁeld,in the case of cross-correlation) varies at the observation point. | DOI | ∼ indicates the opposite, i.e.,variation at the observation point have little or no effect. Thus, DOI analysis also provides information onhow information propagates within the model, and therefore sheds light on the physical processes withinthe model. We will exploit this property in Section 3.2.We note that in this study we only focus on spatial correlations, neglecting temporal correlations. In otherwords, the following analysis (i.e. the calculation of variances, DOI, etc.) is restricted to speciﬁc instancesin time, rather than examining correlations between ﬁelds at difference times as well. The dependence ontime will be addressed in a future project.The RA is applied to “artiﬁcial” data, obtained from ensembles of simulations focused on differentprocesses of interest in the Sun-Earth connection: the interaction of the solar wind with the terrestrialmagnetosphere (via OpenGGCM and Tsyganenko simulations) and the propagation of a CME-like event inthe steady solar wind (PLUTO).OpenGGCM and the Tsyganenko Geomagnetic Field Models both simulate the interaction of the solarwind with the magnetospheric system. OpenGGCM is a physics-based magnetohydrodynamic (MHD)model, while the Tsyganenko models are semi-empirical best-ﬁt representations for the magnetic ﬁeld,based on a large number of satellite observations (Tsyganenko, 1995, 2002a; Tsyganenko and Sitnov,2005). This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

PLUTO is an MHD-modelling software used to simulate the propagation of a CME in the backgroundsolar wind. This software can be used to numerically solve the partial differential equations encountered inplasma physics problems, in conservative form, in different regimes (from hydrodynamics to relativisticMHD). The structure of the software is explained in Mignone et al. (2007, 2012). Full documentation andreferences can be found in the relevant web page: http://plutocode.ph.unito.it/ .Because the DOI analysis is an ensemble based technique the size of the ensemble and its propertiesmatter. In order to test for sufﬁcient size, we performed the DOI calculation using a limited, random subsetof the ensemble, which we gradually expanded. We found that using at 25 runs were sufﬁcient to obtain aconsistent ensemble mean, variance, and DOI. We note, however, that this may change for different choicesof simulation resolution and parameters used for the generation of the ensemble.

The OpenGGCM (Open Geospace General Circulation Model) is a MHD based model that simulates theinteraction of the solar wind with the magnetosphere-ionosphere-thermosphere system. OpenGGCM isavailable at the Community Coordinated Modeling Center at NASA/GSFC for model runs on demand (see:http://ccmc.gsfc.nasa.gov). This model has been developed and continually improved over more than twodecades. Besides numerically solving the MHD equations with high spatial resolution in a large volumecontaining the magnetosphere, the model also includes ionospheric processes and their electrodynamiccoupling with the magnetosphere.The mathematical formulation of the software is described in Raeder (2003). The latest version ofOpenGGCM, used here, is coupled with the Rice Convection Model (RCM), (Toffoletto et al., 2001), whichtreats the inner magnetosphere drift physics better than MHD and allows for more realistic simulations thatinvolve the ring current (Cramer et al., 2017). The model is both modular and efﬁciently parallelized usingthe message passing interface (MPI). It is written in Fortran and C, and uses extensive Perl scripting forpre-processing. The software runs on virtually any massively parallel supercomputer available today.OpenGGCM uses a stretched Cartesian grid (Raeder, 2003) that is quite ﬂexible. There is a minimaluseful resolution, about 150x100x100 cells, that yields the main magnetosphere features but does notresolve mesoscale structures such as FTEs or small plasmoids in the tail plasma sheet. At the other end, wehave run simulations with grids as large as ∼ (on some 20,000 cores). In terms of computational costthat is almost a 10 ratio. Here, we used a grid of 325x150x150 cells which is sufﬁcient for the purposes ofthis study and runs faster than real time on a modest number of cores.OpenGGCM has been used for numerous studies of magnetospheric phenomena such as storms (Raederet al., 2001b; Raeder and Lu, 2005; Connor et al., 2016), substorms (Raeder et al., 2001a; Ge et al.,2011; Raeder et al., 2010), magnetic reconnection (Dorelli, 2004; Raeder, 2006; Berchem et al., 1995),ﬁeld-aligned currents (Moretto et al., 2006; Vennerstrom et al., 2005; Raeder et al., 2017; Anderson et al.,2017), and magnetotail processes (Zhu et al., 2009; Zhou et al., 2012; Shi et al., 2014), to name a few.The boundary conditions require the speciﬁcation of the three components of the solar wind velocityand magnetic ﬁeld, the plasma pressure and the plasma number density at 1 AU, which are obtained fromACE observations (Stone et al., 1998) and applied for the entire duration of the simulation at the sunwardboundary. Frontiers 7 illas et al.

DOI analysis for space weather

We generate an ensemble of 50 OpenGGCM simulations by perturbing the v x component of the inputsolar wind velocity. Changing this particular parameter guarantees a direct and easy way to interpretmagnetospheric response.First, we run a reference simulation using the observed solar wind values at 1 AU starting from May8 th , 2004, 09:00 UTC (denoted as t ) until 13:00 UTC on May 8 th , 2004. We choose this period becauseit is relatively quiet: no iCME were registered in the Richardson/ Cane list of near-Earth interplanetaryCMEs (Richardson and Cane, 2010) and, as it is common during the declining phase of the solar cycle,geomagnetic activity is driven by Corotating Interaction Regions and High Speed Streams (Tsurutani et al.,2006). The study of outlier events, such as CME arrival at Earth, is left as future work.To generate the ensemble, the solar wind compression is changed in each of the “perturbed” simulations.The perturbed velocity in the x (Earth-to-Sun) direction is set, for the entire duration of each simulation, toa constant value obtained by multiplying the average observed v avgx by a random number S sampled froma normal distribution with mean µ = 1 and standard deviation σ = 0 . : v x = Sv avgx , with S ∈ N (1 , (0 . ) . (12)The time period we use to calculate the average is the duration of the reference simulation, between 9:00UTC and 13:00 UTC on May 8 th , 2004.Our choices for the generation of the ensemble are determined by the necessity to preserve both theGaussian characteristic of the model error and the physical signiﬁcance of the simulations: the solar windcompression in all ensemble members is is not too far from the reference value. With such low standarddeviation, the average of the obtained perturbed value plus/minus several sigmas are still within the typicalrange for the solar wind: the minimum and maximum values of the constant, perturbed input velocities are | v x | ∼

363 km/s and | v x | ∼

583 km/s respectively. The solar wind velocity v x is negative in the GeocentricSolar Ecliptic (GSE) coordinates used here. The v x values that we obtain in this way are not supposed tobe representative of the full range of values that v x can assume; they are used to generate an ensemble ofsimulations “slightly perturbed” with respect to our reference simulation. We note that the real distributionof solar wind velocity is far from a normal distribution, with two distinct peaks and extreme outliers, andwould not be appropriate to produce the required ensemble. We refer the reader to Fortin et al. (2014) foroptimal procedures on how to choose the variance of the ensemble.The ensemble analysis requires running 50 simulations to produce the ensemble members, plus theunperturbed reference simulation. Each run takes ∼

12 hours on 52 cores on the supercomputer Marconi-Broadwell (Cineca, Italy), for a total cost of ∼ R E . This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

Figure 1.

Reference simulation (panel (a) and (b)), ensemble mean (panel (c) and (d)) and differencebetween the ensemble mean and the reference simulation results (panel (e) and (f)) for the ensemble ofOpenGGCM magnetospheric simulations. The ensemble is generated by perturbing the v x solar windboundary condition. v x is depicted in panel (a), (c), (e), b x in (b), (d), (f). The coordinate system is GSEThe depicted time is 172 minutes after the beginning of the simulation, May 8 th v x and B x respectively, highlight the areaswhere the behavior differs most within the ensemble: the bow shock, the plasma sheet, and the neutral line.The former is a plasma discontinuity that moves back and forth in response to the changing solar wind Frontiers 9 illas et al.

DOI analysis for space weather

Mach number, and thus gets smeared out in the ensemble. The latter is a region of marginal stability in themagnetosphere that reacts in a non-linear way to solar wind changes.In order to determine if 50 ensemble members are sufﬁcient for our analysis, we have comparedcorresponding plots of the difference between the ensemble mean and the reference simulation for v x withdecreasing number (40, 30, 20) of ensemble members. We ﬁnd that, with decreasingly smaller ensembles,the plasma sheet structure seen in Fig. 1, panel (e), is only minimally affected. However, the differencesaround the bow shock become more pronounced. The velocity difference increases in the solar wind andmagnetosheath as well, and the magnetic ﬁeld structure at the magnetosphere/ solar wind interface (asshown by the magnetic ﬁeld lines, which are drawn for the average ﬁeld in panel (e) and (f) and similaranalysis) begin to change signiﬁcantly with respect to the reference simulation. By comparing the plotswith 50, 40, 30 and 20 ensemble members, we conclude that 30 is the minimum number of ensemblemembers that gives average ﬁelds compatible with the reference simulation, with our choice of perturbationto generate the ensemble.Panels (a) and (b) of Figure 1 show characteristic signatures of magnetic reconnection in the magnetotail,i.e, the X pattern and the formation of dipolarization fronts in the magnetic ﬁeld lines, and the presence ofearthward and tailward jets in v x departing from the X point. We provide a movie showing the dynamicevolution in the supplementary material (ReferenceSimVx.avi). The movie shows the solar wind b z timevariation and the subsequent occurrence of several magnetopause/ magnetotail reconnection events. The“formation” of the magnetosphere occurs during the ﬁrst ∼ minutes of the simulation and should bedisregarded.The movie MeanVx.avi, also in the supplementary material, shows how the global evolution changesin the ensemble mean: the magnetopause and magnetotail reconnection patterns are still overall visible,but smoothed out by the averaging procedure with respect to the reference simulation, since the differentensemble instances reconnect at different times and the smaller scale features of each single run areaveraged away.In Figure 2 and movies LowerVx.avi and HigherVx.avi we compare the evolution of the members ofthe ensemble generated with the lower ( | v x | ∼ km/s, panel (a)) and higher ( | v x | ∼

583 km/s, panel(b)) absolute value of the v x velocity component. The movies show that the velocity values and magneticﬁeld line patterns are signiﬁcantly different from the reference simulation and from the ensemble average,demonstrating that the perturbations are not trivial.We now discuss the RA and DOI analysis for a set of different observation points, depicted as white starsin the following ﬁgures, in the inﬂowing solar wind (a), in the magnetosheath (b), in the northern lobe (c),and in the plasma sheet (d), for the same plane and time as Figure 1. The coordinates of each of these pointsin the xz -plane are given in Table 1, the y coordinate being y/R E = 0 . Figure 3 and Figure 4 show the DOImaps for v x and b x respectively. Note that the correlations which are displayed are not cross-correlations:the correlation is done between the value of a ﬁeld at the observation point and the values of the same ﬁeldat the other grid points.Figures 3 and 4 show that the correlations are mostly ordered by the principal regions of the magnetospheresuch as the lobes, the magnetosheath, and the plasma sheet. For example, Figure 3 shows the results for the v x correlations. The DOI values for the plasma sheet are different from those in the magnetosheath and thelobes in all panels. As expected, the stronger correlations are somewhat localized around the observationpoint, for example, the strongest correlations in panel (d), where the observation point is in the plasmasheet, are in the plasma sheet itself and its immediate surroundings. However, some other observation This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

Figure 2. v x ﬁeld component, at the same time as Figure 1, for the ensemble member with the lowest(panel (a)) and highest (panel (b)) absolute value of the perturbed, inﬂowing v x velocity component in theOpenGGCM ensemble. x/R E y/R E z/R E solar wind 15 0 20magnetosheath -10 0 20northern lobe -10 0 5plasma sheet -20 0 -3 Table 1.

Coordinates of the observation points used in the DOI analysis.points have a much larger DOI, such as the ones in the solar wind and the magnetosheath. This makesphysical sense, because v x variations in those regions will propagate through much of the magnetosphere.Figure 4 shows the B x correlations. The northern and southern lobes clearly stand out, with opposite DOIvalues, and the magnetosheath stands out as well. Because the B x values have opposite signs in the twolobes, the DOI values also have opposite signs. The correlation values depend on the variability of the ﬁeldvalue at the observation point, thus the panels that exhibit lower correlations are those with observationpoint in the plasma sheet, where the intrinsic variance of both v x and B x is higher (see Figure 1) due tothe different reconnection patterns in the different members of the ensemble. For example, in Figure 3,panel (d) the velocity value at the observation point in the plasma sheet exhibits little correlation with the v x values outside of the plasma sheet and the neighboring areas. This is a consequence of the jet structurewhich is caused by internal magnetospheric dynamics rather than the solar wind driver.The temporal dynamic DOI behavior is similar: the DOI maps of v x and B x with observation point in theplasma sheet exhibit higher temporal variability than those with a observation point in the magnetosheath, ascan be seen in the DOI movies DOI bx bx MSheath.avi, DOI bx bx pSheet.avi, DOI vx vx MSheath.avi,DOI vx vx pSheet.avi and in Figure 5. The Figure shows the DOI map for v x (panel (a) and (b)) and B x (panel (c) and (d)) with observation point in the plasma sheet (panel (a) and (c)) and in the magnetosheath(panel (b) and (d)) at t + 192 minutes. All the previous ﬁgures, Fig. 1, 2, 3 and 4, were at t + 172minutes.We note that the plots with observation points in the magnetosheath are not signiﬁcantly different toearlier plots (see Figures 3, 4), except for the plasma sheet plots, which differ profoundly. Frontiers 11 illas et al.

DOI analysis for space weather

Figure 3.

DOI maps for v x , computed from the correlation of an ensemble of OpenGGCM magnetosphericsimulation, with observation points in the solar wind (panel (a)), magnetosheath (panel (b)), northern lobe(panel (c)), plasma sheet (panel (d)).To summarize and interpret the OpenGGCM results, the DOI analysis is well in line with ourunderstanding of the terrestrial magnetosphere. In the v x case, when the observation point is in thesolar wind or in the magnetosheath, the | DOI | values are very high in both the solar wind and themagnetosheath region. This is expected, because v x in the solar wind is a correlation with itself (and thus asanity test for the calculation), whereas the magnetosheath is largely driven by the interaction betweensolar wind and the bow shock, where the Rankine-Hugoniot conditions predict a positive correlation ofthe downstream velocity with the upstream velocity. When the observation points are in the solar windand magnetosheath regions, the | DOI | values in the plasma sheet are expected to be lower due to internaltransient dynamics (e.g., reconnection events, bursty bulk ﬂows) in the sheet which may be triggered bylocal plasma sheet dynamics, rather than solar wind compression. Local dynamics in the sheet is also thereason why, in Figure 3, panel (d), when the observation point is in the plasma sheet, the correlation withthe solar wind and magnetosheath regions is close to zero. Even if, in global terms, magnetic reconnectionin the plasma sheet were triggered by magnetopause dynamics, in any region of the plasma sheet v x may ﬂow sunward or anti-sunward, depending on the location of the reconnection site, and thus wouldbe uncorrelated with the velocity in the solar wind or in the magnetosheath. The lobe magnetic ﬁeld isexpected to be directly driven by the solar wind dynamic pressure, and thus by v x . As the dynamic pressure This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

Figure 4.

DOI maps for b x , computed from the correlation of an ensemble of OpenGGCM magnetosphericsimulations, with observation points in the solar wind (panel (a)), magnetosheath (panel (b)), northern lobe(panel (c)), plasma sheet (panel (d)).increases, the lobe ﬂare angle decreases, and vice versa. As the ﬂare angle decreases, the lobe ﬁeld getscompressed. Figure 4, panels (b) and (c) show that effect, as expected.Similar consideration broadly apply to the B x DOI results shown in Figure 4. There is, however, asigniﬁcant difference between panels (a) in 3 and 4. When the observation point is in the solar wind,high correlations are obtained in large parts of the magnetosphere for v x , while B x correlations are muchlower. This can be attributed to the fact that B x in the solar wind is not a major driver of magnetosphericdynamics, unlike the solar wind speed and solar wind dynamic pressure. The geo-effective component of theinterplanetary magnetic ﬁeld (IMF) is the B z component, which controls reconnection at the magnetopauseand thus the dominant energy input into the magnetosphere. The Tsyganenko models are a family of empirical, static terrestrial magnetic ﬁeld models (Tsyganenko,1987, 1989, 1995, 2002a,b; Tsyganenko et al., 2003; Tsyganenko and Sitnov, 2005). The successive modelversions (available at http://geo.phys.spbu.ru/˜tsyganenko/modeling.html ) reﬂectincreasing knowledge of the magnetospheric systems and are based on an increasing amount of datafrom all regions in the magnetosphere.

Frontiers 13 illas et al.

DOI analysis for space weather

Figure 5.

DOI maps for v x (panel (a) and (b)) and B x (panel (c) and (d)) from an ensemble of OpenGGCMmagnetospheric simulations with observation point in the plasma sheet (panel (a) and (c)) and in themagnetosheath (panel (b) and (d)), at t + 192 minutes. All previous Figures were at t + 172 minutes.The models are based on a mathematical description of the magnetosphere, which includes contributionsfrom major magnetospheric current sources such as the Chapman-Ferraro current, the ring current, the cross-tail current sheet and large-scale ﬁeld-aligned currents. Terms are added to account for the magnetopauseand for partial penetration of the IMF into the magnetosphere. The most recent versions can also takeinto account the dipole tilt, the dawn-dusk asymmetry, and allow for open magnetospheric conﬁgurations.The parameters of the models are derived from a regression to magnetic ﬁeld observations, and keyed tomagnetic indices and/or solar wind parameters. The model requires the user to specify a date and time forthe dipole orientation. The other model parameters, either an index such as the Kp, or solar wind variables,are to be given by the user. In more recent models, Tsyganenko also provides yearly input data ﬁles for hismodels. From these inputs, an approximation of the magnetosphere is created for the speciﬁed date andtime. Notice that the Tsyganenko models are static, and only provides a snapshot of the magnetosphere.However, since the parameters are time dependent the model can be used in a quasi-dynamic mode.Several versions of the Tsyganenko model have been tested over the years against observations andphysics-based, MHD models (Thomsen et al., 1996; Huang et al., 2006; Woodﬁeld et al., 2007). While theTsyganenko models do not account for the Earth’s internal magnetic ﬁeld, methods are provided to add theinternal ﬁeld model as described in the above cited literature. This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

In order to simulate the evolution of the magnetosphere with the chosen Tsyganenko model, we createsnapshots of the magnetosphere at different times. The time May 8 th , 2004, 09:00 UTC is taken as t , thesame time as the OpenGGCM simulations presented in Section 3.1. The model is “evolved” by using atime series of the required input parameters, which are obtained from the OMNIWeb database (King andPapitashvili, 2005).We use two versions of the Tsyganenko model, the T96 model (Tsyganenko, 1996) and the TA15model (Tsyganenko and Andreeva, 2015). We generate the Tsyganenko ensembles in the same way as theOpenGGCM examples, by using a distribution of v x values as described in section 3.1.Before we analyse the results of the T96 ensemble, we show the magnetospheric conﬁguration computedby the model using the original solar wind data. In Figure 6, the ﬁrst row of ﬁgures shows the results of the“reference” simulation, e.g. the simulation without any perturbed inputs, at time t + 85 min, for B x (panel(a)) and B z (panel (b)).Unlike the OpenGGCM, the T96 model cannot model reconnection, although some approximationof reconnection is included in later Tsyganenko models (Tsyganenko, 2002a,b). Also, the day-sidemagnetospheric structure is only approximated with respect to physics-based models, and bow shockand magnetosheath are not clearly distinguishable. In Figure 6, the second row shows the average of theensemble at the same time of the reference simulation, for B x (panel (c)) and B z (panel (d)). The resultsare similar to the reference simulation, as shown by the logarithm of the absolute difference between thereference and ensemble mean, e.g., Figure 6, panel (e) and (f). The only signiﬁcant difference is located atthe magnetopause, which is expected since varying the solar wind velocity changes the standoff distance.Next we analyse the DOI maps of the T96 model. Figure 7 shows the DOI maps for the B x and B z ﬁeldcomponents at time t + 85 min. Although the T96 model is parameterized by the solar wind velocity,it only models the magnetic ﬁeld in the magnetosphere. Because of this, we are only able to analyzethe DOI maps of the magnetic ﬁeld components. The observation points are placed in the northern lobe,in proximity to the current sheet, at the dayside magnetosphere, and in the southern lobe. Like in theOpenGGCM case, the DOI maps reﬂect the general regions of the magnetosphere as reproduced by theT96 model. However, the correlation only takes values of ± v x variations of the ensemble. The former is due to the fact that the model hasno intrinsic time dependence. Any variations of the solar wind affect the entire magnetosphere instantlyand in proportion to the variation. Thus, after normalization, only the sign matters, i.e., whether a givenchange at the observation point leads to a positive or negative change at a different point.Now we focus on the results of the TA15 model. Figure 8 shows the reference simulation and ensemblemean for the B x and B z ﬁelds, with superimposed ﬁeld lines, at time t + 85 min, together with thedifference between reference and ensemble mean. We observe that the reproduced dayside magnetospherestructure is improved compared to the T96 model, at the expense of unrealistically high magnetic ﬁeldvalues in the inﬂowing solar wind, and correspondingly distorted magnetic ﬁeld lines. These artiﬁcialboundary conditions in the Sunwards boundary are used to obtain an “open” magnetosphere which blendswith the inﬂowing solar wind, without seeming to form a nightside magnetosheath. Notice also the highvalues at these artiﬁcial boundary conditions in the difference plot, indicating that there is a high variabilityin their values.From the DOI maps in Figure 9 (with observation points at the same positions as Figure 7), we canconﬁrm that the modelled IMF is used to construct the internal magnetospheric solution. While in the Frontiers 15 illas et al.

DOI analysis for space weather

Figure 6.

Results from the T96 Tsyganenko model. The top row shows the reference magnetic ﬁeld ( B x and B z component) at time t + 85 min, with a negative IMF. The middle row shows the ensemble meanof the same magnetic ﬁeld components at the same time t + 85 min. We clipped the magnetic ﬁeld valuesto | nT | , in order to make variations in the tail better visible. The last row shows the logarithm of theabsolute difference between the reference simulation and the mean of the ensemble. This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

T96 model the solar wind B x and B z values were uncorrelated with the magnetospheric values, here theabsolute value (i.e. ignoring the sign) of the correlation is very high: the solar wind input strictly determinesthe inner magnetospheric solution, making the correlation practically unitary. This could be because ofthe deterministic analytical formula used to construct the magnetic ﬁeld, where everything is exactlydetermined on a global scale.Note that the correlations reported are spatial and not temporal, therefore no causality is implied. Highcorrelation between the IMF and magnetospheric ﬁelds point to the fact that, in an ensemble generated byperturbing the solar wind input, the model is built in such a way that variations in the magnetic ﬁeld arehighly correlated through the system, apparently without highlighting the boundary regions that we wereable to spot in the DOI maps for the OpenGGCM and Tsyganenko T96 simulations.A last remark on the DOI analysis applied to the Tsyganenko models is the following. The analysis helpsus understand and visualize how the different models are built, with regards to the relationship between thesolar input and the magnetospheric solution. DA analysis then proves useful here as a model investigationtool.It also highlights that caution should be used when deciding to apply DA techniques to a particularmodel, depending on the objectives of the investigation. The Tsyganenko models were built to providetime-independent, empirical-based insights into the structure of the magnetosphere at a particular instant intime. They do not aim at representing the state of the pristine solar wind, which is used only to better themagnetospheric solution (hence the somehow unrealistic solar wind patterns identiﬁed in Figure 8 and 9).Also, they do not intend to reproduce temporal dynamics in the magnetosphere. These factors result inDOI maps where the absolute value of the correlation is always either 1 or -1. When using DOI techniqueswith the purposes of identifying useful locations for satellite placement, these are not useful results: we areinterested in the value of the DOI, not in the sign . Hence, caution should be used before using empirical,time-independent models for this particular purpose: more signiﬁcant information will possibly be acquiredfrom their physics-based, time-dependent counterparts. This consideration does not intend to diminish theimportance of empirical, time-independent models for other scientiﬁc objectives such us, most importantly,quick forecasting. In this section, we study the propagation of a Coronal Mass Ejection in a solar meridional plane, whichis deﬁned by the rotation axis of the Sun and a radial vector in the equatorial plane. In all the runs ofthe ensemble, the computational domain is R (cid:12) ≤ r ≤ R (cid:12) and ≤ θ ≤ π in spherical coordinates,where R (cid:12) is the solar radius and θ is the polar angle (or colatitude), corotating with the Sun. Assumingaxisymmetry around the solar rotation axis, we may limit our analysis to 2.5D (pseudo 3D) simulations.The grid resolution is uniform in both directions, × cells, which is sufﬁcient to capture the structureof the background solar wind while keeping the computational cost and output size manageable.We simulate the background solar wind using a simple adiabatic model with effective polytropic index Γ = 1 . (Keppens and Goedbloed, 2000). We also assume a time-independent dipole backgroundmagnetic ﬁeld: Frontiers 17 illas et al.

DOI analysis for space weather B r = − B o cos θ/r , (13) B θ = − B o sin θ/r , (14)where B r , B θ are the r, θ components of the magnetic ﬁeld in spherical coordinates and B o a constant usedto scale the ﬁeld to B = 1 . G on the solar equator. We impose the density distribution ρ as a function of thelatitude θ at the inner boundary to achieve a “dead zone” of low velocity near the equator and a fast solarwind near the poles simultaneously (see Keppens and Goedbloed (2000) and Chan´e et al. (2008)). Thedifferential rotation of the Sun is also taken into account, following Chan´e et al. (2005); this is achieved byimposing a varying azimuthal velocity v φ = v φ ( θ ) at the inﬂow boundary.Once the simulation reaches a steady state, roughly after ∼ . days or t = 10 in normalized units, theradial velocity at 1 AU is ∼ km/s near the equator and ∼ km/s near the poles. This is consistent withthe large-scale bimodal solar wind structure that is typically observed during solar minimum (McComaset al., 1998)).We create two ensembles of 100 simulations each. In the ﬁrst ensemble, the velocity of the CME in eachcase is randomly selected from a Gaussian distribution with mean µ = 900 km/s and standard deviation σ = 25 km/s. The resulting values are typical of strong CME events. In the second ensemble, the spatialextent of the boundary conditions that launch the CME varies as well, along with the velocity of the CMEas described above. The half-width of this region is also randomly selected from a Gaussian distributionwith µ = 10 ◦ and σ = 0 . ◦ . All other parameters remain the same in every run.The values of the CME widths that are used here are comparable to observed events. The choice ofparameters in the second ensemble is less constrained by observations and leads to the appearance of verysmall values of variance. We thus ﬁnd large areas where the DOI ∼ , since the simulations in the ensembledo not differ signiﬁcantly. This was conﬁrmed by creating and analyzing a third ensemble, where the widthof the CME is chosen from a Gaussian with µ = 20 ◦ and σ = 2 . ◦ .Figure 10 shows the evolution of the radial velocity average over the whole ensemble. Up to t=11, i.e.,before the CME is launched, all runs are identical. The CME is initialized similar to the simpliﬁed approachof Keppens and Goedbloed (2000), such that the boundary conditions on the solar surface are modiﬁed torepresent a change of mass ﬂux. In our case, we modify the boundary conditions at R=1R (cid:12) , in a givenregion around θ = 80 ◦ . A tracer (a passive scalar only present as an advected quantity within the ﬂow,without effect on the plasma) is also injected with the CME, to facilitate monitoring its propagation. In themiddle and right panel of Fig. 10 we show the ejection of the CME and its propagation. The CME frontcan be clearly distinguished at t=12.To apply the RA technique, as explained in Section 2, we select a point of interest and perform theanalysis based on (a) the plasma density, or (b) the radial velocity. We present results at t=14, when theCME has reached a distance of ∼ R (cid:12) , for two detection points (at R = 90 and R = 150 R (cid:12) , θ = 80 ◦ ).At times earlier than t = 10 (when the solar wind reaches a steady state and the CME is injected), the DOIis zero, since the observation point is disconnected from the rest of the domain before the CME reaches it.The propagation of the CME can be monitored in the MHD simulations easily via e.g. a tracer (or theradial velocity). The DOI map, when the tracer is used as a criterion, follows closely the CME propagation This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather pattern observed in the MHD runs. However, this is of limited use, besides testing, as the tracer (in ourcase) does not represent a real physical quantity.The DOI map for the ﬁrst ensemble, where we perturb only the radial velocity of the CME, is shown inFig. 11. The regions where information from the CME front has not yet arrived have DOI=0, as shown inthe radial velocity DOI map (Fig. 11). When only one parameter is modiﬁed (ﬁrst ensemble), the densityDOI map shows a very large area of the domain saturated with correlation (cid:39)

1. This is probably due tovariations in density of the background solar wind induced by the propagation of the CME. The regions withabsolute value of the DOI (cid:39) that are located far from the CME propagation front (at small or large angles θ ) are the areas of high radial velocity in Fig. 10, where the information on the perturbation introducedwhen triggering the CME has already propagated. The density and radial velocity of all ensemble membersare modiﬁed in a similar way, hence the large | DOI | values.In the second ensemble, where the width of the CME is also modiﬁed, the DOI map of the densityshows smaller correlation values (compare especially panel (b) and (d) in the two Figures) compared to theprevious ensemble, because the differences between the runs of the ensemble are now larger (see Fig. 12).This results in smaller regions where the DOI is close to unity compared to the ﬁrst case.The last ensemble, where the CME width and its perturbation are larger compared to the second case, isshown in Fig. 13. The DOI pattern is qualitatively similar to Fig. 12, but due to the larger values in the sizeof the CME and its perturbation, the regions with high DOI values (meaning the regions affected by CMEpropagation in at least one member of the ensemble) are slightly larger as well.Additional analysis, not shown here, was carried out on subsets of the ensembles to ensure the ensemblesize is sufﬁcient. We found that in this case convergence was achieved if at least 25 ensemble memberswere used (as described in Section 2); however, this number may differ in other cases, depending on thespeciﬁcs of the ensemble.The DOI analysis applied to the simulations performed with PLUTO are indicative of the versatility ofthe method. In all PLUTO ensembles, we can monitor the inﬂuence of the CME during its propagationand the response of the system via the DOI. Moreover, we can identify certain CME components, suchas the leading edge, from the DOI maps. Differences in the response of the system due to the choice ofthe perturbation or parameters are captured as well. The resolution used here was sufﬁcient to capture theCME injection and propagation within reasonable computational cost; the typical run time for simulating amember of the ensemble was of the order of ∼ (cid:48) minutes on 28 cores.However, some limitations of the model must be considered. The axisymmetric assumption simpliﬁesthe problem and allows to reduce computation costs, but with the drawback of not accounting for thethree-dimensional CME structure. The limited angular resolution imposes a weak constraint on both theperturbed and unperturbed size of the CME that we can simulate . Runs with a higher resolution canremove this constraint at additional computational cost. Simulations in 3D will be part of future work inorder to capture the full system, where also differences in the polar direction can be examined. Finally, amore realistic model for the background magnetic ﬁeld should be used, rather than a simple static dipole.We focused mainly on calculating the DOI at different times and locations, but a similar approach can beused to estimate the arrival time of the CME, as described in Owens et al. (2020). Frontiers 19 illas et al.

DOI analysis for space weather

Figure 7.

Each row displays the DOI maps for B x and B z , from an ensemble of Tsyganenko model T96simulations with observation point in the plasma sheet, dayside magnetosphere and southern lobe tail,respectively. This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

Figure 8.

The reference and mean magnetic ﬁeld at time t + 85 min from the Tsyganenko TA15 modelare displayed in the top two rows of ﬁgures. Notice the non-realistic high magnetic ﬁeld values in theinﬂowing solar wind (at x/R E > ). We assume these unrealistic values are necessary for the model toconstruct the day-side magnetosphere.The values of the magnetic ﬁeld have been cut off in the top tworows of ﬁgures at | nT | , to make sure the variations in the tail are visible. The last row of ﬁgures showsthe logarithm of the absolute error between the reference simulation and the mean of the ensemble. Frontiers 21 illas et al.

DOI analysis for space weather

Figure 9.

DOI maps for B x and B z , computed from an ensembles of Tsyganenko TA15 simulations at t + 85 min with observation points at the same position as Figure 7. This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

Figure 10.

Injection and propagation of a CME via the average radial velocity in the ensemble (in km/s).From left (a) to right (c): relaxed solar wind (t=10), injection and propagation of the CME ( t = 11 , t = 12 ).The leading edge (front shock) is prominent in the last panel, marked by a black arrow. Frontiers 23 illas et al.

DOI analysis for space weather

Figure 11.

DOI maps for the density (ﬁrst row) and the radial velocity (second row), from an ensemble ofPLUTO simulations. The ﬁrst column has the observation point at R = 90 R (cid:12) and the second column at R = 150 R (cid:12) (marked by the black dot). In this set we only perturb the radial velocity of the CME. Thestructure of the CME can be seen quite clearly in panels (a) and (b), where the leading edge is evident andmarked by a black arrow in all panels. This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

Figure 12.

DOI maps for the density (ﬁrst row) and the radial velocity (second row), computed fromPLUTO simulation ensembles. The ﬁrst column has a detection point at R = 90 R (cid:12) and the second columnat R = 150 R (cid:12) (marked by the black dot). In this set we perturb two parameters, the radial velocity and thesize of the CME. Frontiers 25 illas et al.

DOI analysis for space weather

Figure 13.

DOI maps from ensembles of PLUTO simulations for the density (ﬁrst row) and the radialvelocity (second row). The ﬁrst column has a detection point at R = 90 R (cid:12) and the second column at R = 150 R (cid:12) . In this set we perturb two parameters, the radial velocity and the size of the CME, with alarger perturbation in the size with respect to the second data set. This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

In this paper, we apply the Representer analysis and the Domain of Inﬂuence analysis to two fundamentalcomponents of the Sun-Earth connection: the interaction between the solar wind and the terrestrialmagnetosphere, simulated with the OpenGGCM MHD code and with the empirical Tsyganenko models,and the propagation of CMEs in the background solar wind, simulated with the MHD PLUTO model.In each case an ensemble is generated by appropriately perturbing initial/ boundary conditions.Subsequently, the DOI analysis is applied over the ensemble. Localisation methods, which can be used toreduce spurious correlations in the estimated prior covariance matrix (Anderson, 2007; Bishop and Hodyss,2007; Sakov and Bertino, 2011), are not used at this stage.Primarily, the DOI analysis is a ﬁrst step in the application of Data Assimilation techniques to a model,and can be applied before assimilation itself to gain insight on the system and on the model. However, theDOI analysis can also be used to gain physical insight, and to devise optimized observation systems, asdiscussed below.Our main results are as follows.First, we have demonstrated that DOI analysis can provide useful information on the most appropriatelocations for future observation points, such as solar wind and magnetospheric monitors. Large absolutevalues of the DOI, calculated with respect to an observation point, means that observations at that locationwould provide signiﬁcant information of that ﬁeld in the speciﬁc, large | DOI | area, but less so in areaswith lower | DOI | . This can be used in two different ways. On one hand, DOI analysis can help to ﬁndobservations points that are connected to large | DOI | areas, in order to increase the amount of informationbrought in by a single new observation. On the other hand, the same information can be used for a differentobjective. Given a particular location, one can ask where observations need to be obtained to improveknowledge of that area. A useful example here is the plasma sheet in the OpenGGCM analysis, Section 3.1.Figure 3 and 4 show that | DOI | values in the plasma sheet are consistently low, notwithstanding the ﬁeldwhich is examined ( v x or B x ) and the location of the observation point. | DOI | values in the plasma sheetare low even if the observation point is in the plasma sheet itself: | DOI | values, which are of course1 at the observation point itself, quickly become smaller even a small distance away. Since the plasmasheet is a location of particular importance for space weather forecasting, or basic research for that matter,single s/c in the plasma sheet are of limited use, and rather a constellation of satellites, such as proposedin Angelopoulos et al. (1998); Raeder and Angelopoulos (1998) would be necessary.Second, we have used the DOI analysis to improve our knowledge of the models we use, and in particularto investigate whether these models are appropriate for the implementation of Data Assimilation. The DOIanalysis for the Tsyganenko models in Section 3.2 powerfully highlights the model evolution from versionT96 to version TA15. In version T96, the magnetosphere is a closed system, and solar wind conditions arenot correlated (DOI ∼ Frontiers 27 illas et al.

DOI analysis for space weather larger variability. The Tsyganenko models differ with respect to OpenGGCM in two fundamental aspects,in that they (a) empirically reconstruct the magnetospheric magnetic ﬁeld from an array of observationsand (b) that they are not time-dependent. Either of these two aspects can contribute to the unrealisticallyhigh correlations we observe. Investigations on other models, and speciﬁcally on empirical, time-dependentmodels, will possibly help disentangle the role of these two aspects. At this stage of the investigation, weadvance the hypothesis that time-dependent models may be better suited than time-independent models asbackground models for Data Assimilation techniques.Third, with this analysis we have highlighted a possible path for future, targeted improvements of globalheliospheric models used, among other things, for simulations of CME propagation in the heliosphere. Ithas long been known that one of the critical aspects of the simulation of CME arrival time is the estimationof the physical parameters to use as initial conditions in the simulations. While some parameters can beeasily estimated from remote sensing, others are more difﬁcult to determine properly and their variabilityaffects the accuracy of the forecast (Falkenberg et al., 2010). In this paper, we have shown that DOI analysiscould constitute an important stage of a model analysis effort aimed at clarifying which aspects of a modelshould be prioritized in order to obtain more accurate simulations of CME propagation.In this study, as a ﬁrst step, we show DOI maps obtained from the correlations of a single variablecalculated between the variable at the observation point and the same variable in the domain underinvestigation. As demonstrated in Skandrani et al. (2014), cross-correlations can be used to ﬁnd theinﬂuence of one variable upon another.The results of a DOI cross-correlation analysis can then be used to determine which quantities andareas in a simulation are most relevant in determining a certain observational quantity (such as the radialvelocity of a CME in the case of CME propagation simulations). This analysis can then guide modelers ondeciding which aspects of a model could be improved for more realistic results. It could help understanding,for example, if CME propagation in a model is mainly controlled by the background magnetic ﬁeldconﬁguration or by the properties of the CME itself at launch. In the ﬁrst case, modeling efforts could bedirected into accurate high resolution representation of the magnetic ﬁeld conﬁguration in the lower corona.In the second case, instead, modeling improvements could be focused on extracting better estimates ofCME launch parameters (e.g. CME density, velocity, internal magnetic ﬁeld conﬁguration with respect tothe background wind) from available observations.The spatial correlations provided by DOI can also be of particular interest in evaluating the effect ofactual measurements done at positions different from the traditional L1, such as, for example, missionsplanned for L5 or missions closer to the Sun.Future work will extend this study to include temporal and cross correlations between different ﬁeldcomponents. This will further increase our knowledge of the models used to simulate such critical spaceweather processes.The DOI analysis presented here can also be combined with an Observing System Simulation Experiment(OSSE), an approach already used in ionospheric and solar dynamo studies (Hsu et al., 2018; Dikpati, 2017)to help provide a cost-effective approach to the evaluation of the potential impact of new observations.OSSE requires that DA is already implemented and uses independently simulated “data” that are ingestedinto a different model or a different instance of the same model. The effect of DA can then be investigated,albeit with caveats, since the “data” are not real. DOI analysis would obviate the need to have DAimplemented, which can be very costly. Instead, only ensemble runs with an unmodiﬁed model are required,and can provide a measure of the usefulness of a model and the available data for a speciﬁc situation.

This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

CONFLICT OF INTEREST STATEMENT

The authors declare that the research was conducted in the absence of any commercial or ﬁnancialrelationships that could be construed as a potential conﬂict of interest.

AUTHOR CONTRIBUTIONS

MEI, BL and DM created the ensembles using OpenGGCM, Tsyganenko and PLUTO respectively, asthey appear in the main body of the article, processed the data and presented the results of the analysisand prepared the manuscript. JR provided OpenGGCM and contributed to the interpretation of themagnetosphere model results. JR, GL and SP provided support during the analysis and the preparation ofthe article.

FUNDING

ACKNOWLEDGMENTS

The PLUTO simulations were performed using allocated time on the clusters Genius and Breniac. Thecomputational resources and services used in this work were provided by the VSC (Flemish SupercomputerCenter), funded by the Research Foundation Flanders (FWO) and the Flemish Government – departmentEWI. The OpenGGCM simulations were performed on the supercomputer Marconi-Broadwell (Cineca,Italy) under a PRACE allocation. We acknowledge the NASA National Space Science Data Center, theSpace Physics Data Facility, and the ACE Principal Investigator, Edward C. Stone of the CaliforniaInstitute of Technology, for usage of ACE data. We acknowledge use of NASA/GSFC’s Space PhysicsData Facility’s OMNIWeb service, and OMNI data.

SUPPLEMENTAL DATADATA AVAILABILITY STATEMENT

The OMNIWeb database was used to get the solar wind conditions for the Tsyganenko runs. The ACEdatabase was used to get the solar wind conditions for the OpenGGCM runs. The data used in the analysisare available upon reasonable request.

Frontiers 29 illas et al.

DOI analysis for space weather

REFERENCES

Allen, J., Sauer, H., Frank, L., and Reiff, P. (1989). Effects of the march 1989 solar activity.

Eos,Transactions American Geophysical Union

70, 1479–1488Altschuler, M. D. and Newkirk, G. (1969). Magnetic ﬁelds and the structure of the solar corona.

SolarPhysics

9, 131–149Anderson, B. J., Korth, H., Welling, D. T., Merkin, V. G., Wiltberger, M. J., Raeder, J., et al. (2017).Comparison of predictive estimates of high-latitude electrodynamics with observations of global-scalebirkeland currents.

Space Weather

15, 352–373. doi:10.1002/2016sw001529Anderson, J. L. (2007). Exploring the need for localization in ensemble data assimilation using ahierarchical ensemble ﬁlter.

Physica D: Nonlinear Phenomena

Science Closure and EnablingTechnologies for Constellation Class Missions , eds. V. Angelopoulos and P. V. Panetta (University ofCalifornia, Berkeley, and NASA Goddard Space Flight Center). 14Arge, C. N., Henney, C. J., Koller, J., Compeau, C. R., Young, S., MacKenzie, D., et al. (2010). Air forcedata assimilative photospheric ﬂux transport (adapt) model. In

AIP Conference Proceedings (AIP), vol.1216, 343–346Bennett, A. F. (1992).

Inverse methods in physical oceanography (Cambridge university press)Berchem, J., Raeder, J., and Ashour-Abdalla, M. (1995). Reconnection at the magnetospheric boundary:Results from global MHD simulations. In

Physics of the Magnetopause , eds. B. U. Sonnerup and P. Song.vol. 90 of

AGU Geophysical Monograph , 205Bishop, C. H. and Hodyss, D. (2007). Flow-adaptive moderation of spurious ensemble correlations andits use in ensemble-based data assimilation.

Quarterly Journal of the Royal Meteorological Society: Ajournal of the atmospheric sciences, applied meteorology and physical oceanography

Proc of SIGGRAPH, Course

Space weather: physics and effects (Springer Science & BusinessMedia)Bouttier, F. and Courtier, P. (1999). Data assimilation concepts and methods. meteorological trainingcourse lecture series. ecmwf.

Reading, UK

Camporeale, E. (2019). The challenge of machine learning in space weather: Nowcasting and forecasting.

Space Weather

17, 1166–1207Chan´e, E., Jacobs, C., van der Holst, B., Poedts, S., and Kimpe, D. (2005). On the effect of the initialmagnetic polarity and of the background wind on the evolution of CME shocks.

A&A

Journal of Space Weather and Space Climate

6, A25.doi:10.1051/swsc/2016019Cramer, W. D., Raeder, J., Toffoletto, F., Gilson, M., and Hu, B. (2017). Plasma sheet injections into theinner magnetosphere: Two-way coupled openggcm-rcm model results.

Journal of Geophysical Research:Space Physics

This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

Dikpati, M. (2017). Ensemble Kalman Filter Data Assimilation in a Solar Dynamo Model. In

AGU FallMeeting Abstracts . vol. 2017, SM14A–05Dorelli, J. C. (2004). A new look at driven magnetic reconnection at the terrestrial subsolar magnetopause.

Journal of Geophysical Research

Risk Analysis

37, 206–218. doi:10.1111/risa.12765Echevin, V., De Mey, P., and Evensen, G. (2000). Horizontal and vertical structure of the representerfunctions for sea surface measurements in a coastal circulation model.

Journal of physical oceanography

30, 2627–2635[Dataset] Egbert, G. and Bennett, A. (1996). Data assimilation methods for ocean tides, modern approachesto data assimilation in ocean modeling, edited by: Malonotte-rizzoli, pEvensen, G. (2009).

Data assimilation: the ensemble Kalman ﬁlter (Springer Science & Business Media)Fortin, V., Abaza, M., Anctil, F., and Turcotte, R. (2014). Why should ensemble spread match the rmse ofthe ensemble mean?

Journal of Hydrometeorology

15, 1708–1713Ge, Y. S., Raeder, J., Angelopoulos, V., Gilson, M. L., and Runov, A. (2011). Interaction of dipolarizationfronts within multiple bursty bulk ﬂows in global MHD simulations of a substorm on 27 February 2009.

J. Geophys. Res.

Advances in geophysics (Elsevier), vol. 33. 141–266Hsu, C. T., Matsuo, T., and Liu, J. Y. (2018). Impact of Assimilating the FORMOSAT-3/COSMIC andFORMOSAT-7/COSMIC-2 RO Data on the Midlatitude and Low-Latitude Ionospheric Speciﬁcation.

Earth and Space Science

5, 875–890. doi:10.1029/2018EA000447Huang, C.-L., Spence, H. E., Lyon, J. G., Toffoletto, F. R., Singer, H. J., and Sazykin, S. (2006). Storm-timeconﬁguration of the inner magnetosphere: Lyon-fedder-mobarry mhd code, tsyganenko model, and goesobservations.

Journal of Geophysical Research: Space Physics

Space Weather

9, 1–15Kalman, R. E. (1960). A new approach to linear ﬁltering and prediction problems.

Journal of basicEngineering

82, 35–45Kalnay, E. (2003).

Atmospheric modeling, data assimilation and predictability (Cambridge universitypress)Keppens, R. and Goedbloed, J. P. (2000). Stellar Winds, Dead Zones, and Coronal Mass Ejections.

ApJ

Journal of Geophysical Research: Space Physics

EGU General Assembly Conference Abstracts . EGU General Assembly Conference Abstracts, 10909Lang, M. and Owens, M. J. (2019). A Variational Approach to Data Assimilation in the Solar Wind.

SpaceWeather

17, 59–83. doi:10.1029/2018SW001857Lang, M. and Owens, M. J. (2019). A variational approach to data assimilation in the solar wind.

SpaceWeather

17, 59–83

Frontiers 31 illas et al.

DOI analysis for space weather

Laperre, B., Amaya, J., and Lapenta, G. (2020). Dynamic time warping as a new evaluation for dst forecastwith machine learning. arXiv preprint arXiv:2006.04667

Lavraud, B., Liu, Y., Segura, K., He, J., Qin, G., Temmer, M., et al. (2016). A small mission concept to thesun–earth lagrangian l5 point for innovative solar, heliospheric and space weather science.

Journal ofAtmospheric and Solar-Terrestrial Physics

Tellus A: Dynamic Meteorology and Oceanography

Journal of atmospheric and solar-terrestrialphysics

66, 1243–1256McComas, D. J., Bame, S. J., Barraclough, B. L., Feldman, W. C., Funsten, H. O., Gosling, J. T., et al.(1998). Ulysses’ return to the slow solar wind.

Geophysical Research Letters

25, 1–4. doi:10.1029/97GL03444Mendoza, O. B., De Moor, B., and Bernstein, D. (2006). Data assimilation for magnetohydrodynamicssystems.

Journal of Computational and Applied Mathematics

ApJS

Earth, Planets, Space

58, 439–449Nikoli´c, L. (2019). On solutions of the pfss model with gong synoptic maps for 2006–2018.

Space Weather

17, 1293–1311. doi:10.1029/2019SW002205Owens, M., Lang, M., Barnard, L., Riley, P., Ben-Nun, M., Scott, C. J., et al. (2020). A ComputationallyEfﬁcient, Time-Dependent Model of the Solar Wind for Use as a Surrogate to Three-DimensionalNumerical Magnetohydrodynamic Simulations.

Sol. Phys.

Space Weather

18, e2020SW002448. doi:10.1029/2020SW002448. E2020SW00244810.1029/2020SW002448Plunkett, S. (2005).

The extreme solar storms of October to November 2003 . Tech. rep., NAVALRESEARCH LAB WASHINGTON DC SPACE SCIENCE DIVPulkkinen, A., Lindahl, S., Viljanen, A., and Pirjola, R. (2005). Geomagnetic storm of 29–31 october 2003:Geomagnetically induced currents and their relation to problems in the swedish high-voltage powertransmission system.

Space Weather

Space Plasma Simulation , eds.J. B¨uchner, C. T. Dum, and M. Scholer (Berlin Heidelberg New York: Springer Verlag). doi:10.1007/3-540-36530-3 11Raeder, J. (2006). Flux transfer events: 1. generation mechanism for strong southward IMF.

AnnalesGeophysicae

24, 381–392. doi:10.5194/angeo-24-381-2006

This is a provisional ﬁle, not the ﬁnal typeset article illas et al. DOI analysis for space weather

Raeder, J. and Angelopoulos, V. (1998). Using global simulations of the magnetosphere for multi-satellitemission planning and analysis,. In

Science Closure and Enabling Technologies for Constellation ClassMissions , eds. V. Angelopoulos and P. V. Panetta (University of California, Berkeley, and NASA GoddardSpace Flight Center). 78Raeder, J., Cramer, W. D., Germaschewski, K., and Jensen, J. (2017). Using OpenGGCM to Computeand Separate Magnetosphere Magnetic Perturbations Measured on Board Low Earth Orbiting Satellites.

Space Sci. Rev.

Advances inSpace Research

36, 1804–1808. doi:10.1016/j.asr.2004.05.010Raeder, J., McPherron, R. L., Frank, L. A., Paterson, W. R., Sigwarth, J. B., Lu, G., et al. (2001a). Globalsimulation of the geospace environment modeling substorm challenge event.

J. Geophys. Res.

Sol. Phys.

J. Geophys. Res.

Solar Physics

Space Weather

2, 1–9Sakov, P. and Bertino, L. (2011). Relation between two common localisation methods for the enkf.

Computational Geosciences

15, 225–237Schatten, K. H. (1971). Current sheet magnetic model for the solar coronaSchatten, K. H., Wilcox, J. M., and Ness, N. F. (1969). A model of interplanetary and coronal magneticﬁelds.

Solar Physics

6, 442–455Schrijver, C. J. and DeRosa, M. L. (2003). Photospheric and heliospheric magnetic ﬁelds.

Solar Physics

Radio Science

Journal of GeophysicalResearch: Space Physics

Journal of atmospheric and solar-terrestrial physics

66, 1481–1489Skandrani, C., Innocenti, M. E., Bettarini, L., Crespon, F., Lamouroux, J., and Lapenta, G. (2014).Flip-mhd-based model sensitivity analysis.

Nonlinear processes in geophysics

21, 539–553Stone, E. C., Frandsen, A., Mewaldt, R., Christian, E., Margolies, D., Ormes, J., et al. (1998). Theadvanced composition explorer.

Space Science Reviews

86, 1–22Thomsen, M., McComas, D., Reeves, G., and Weiss, L. (1996). An observational test of the tsyganenko(t89a) model of the magnetospheric ﬁeld.

Journal of Geophysical Research: Space Physics

Washington DC American Geophysical Union Geophysical Monograph Series

Frontiers 33 illas et al.

DOI analysis for space weather

T´oth, G., Sokolov, I. V., Gombosi, T. I., Chesney, D. R., Clauer, C. R., De Zeeuw, D. L., et al. (2005). Spaceweather modeling framework: A new tool for the space science community.

Journal of GeophysicalResearch: Space Physics

Journal ofGeophysical Research: Space Physics

Planetary and space science

35, 1347–1358Tsyganenko, N. (1989). A magnetospheric magnetic ﬁeld model with a warped tail current sheet.

Planetaryand Space Science

37, 5–20Tsyganenko, N. (2002a). A model of the near magnetosphere with a dawn-dusk asymmetry 1. mathematicalstructure.

Journal of Geophysical Research: Space Physics

International conference on substorms . vol. 389, 181Vennerstrom, S., Moretto, T., Rastaetter, L., and Raeder, J. (2005). Field-aligned currents duringnorthward interplanetary ﬁeld: Morphology and causes.

J. Geophys. Res. doi:10.1029/2004JA010802

Vourlidas, A. (2015). Mission to the sun-earth l5 lagrangian point: An optimal platform for space weatherresearch.

Space Weather

13, 197–201. doi:10.1002/2015SW001173Wang, Y.-M. and Sheeley Jr, N. (1992). On potential ﬁeld models of the solar corona.

The AstrophysicalJournal

Journal of Geophysical Research: Space Physics

Journal ofGeophysical Research: Space Physics

AnnalesGeophysicae

27, 1129–1138. doi:10.5194/angeo-27-1129-2009