arXiv [physics.space-ph]. Astrophysics and Space Science, DOI 10.1007/s•••••-•••-••••-•

Regression modeling method of space weather prediction
A.S. Parnowski
© Springer-Verlag ••••
Abstract
A regression modeling method of space weather prediction is proposed. It allows forecasting the Dst index up to 6 hours ahead with about 90% correlation. It can also be used for constructing phenomenological models of the interaction between the solar wind and the magnetosphere. With its help, two new geoeffective parameters were found: the latitudinal and longitudinal flow angles of the solar wind. It was shown that the Dst index remembers its previous values for 2000 hours.
Keywords space weather; prediction; forecasting; magnetic storms; statistics; regression
Humankind has studied space weather for more than 4000 years, starting from the first mentions of auroras in ancient Chinese literature. The term “space weather” itself has existed for almost a century. The official definition adopted by COSPAR states that “Space weather describes the physical processes induced by solar activity that have impact on our terrestrial and space environment, on ground based and space technological systems, and on human activities and health.” The first part of this definition actually covers two spatial scales of space weather: when we speak about space weather in space, e.g. in connection with spacecraft failures, we usually mean some local parameters of the environment, and when we speak about space weather on the Earth, e.g. in connection with human health, we usually mean some integral characteristics
A.S. Parnowski
Space Research Institute of NASU and NSAU
prosp. Akad. Glushkova 40, korp. 4/1, 03680 MSP, Kyiv-187, Ukraine
tel: +380933264229, fax: +380445264124
e-mail: [email protected]

like the geomagnetic indices. Since this article centers on the variations of the geomagnetic field, the latter meaning will be used. The second part of this definition indicates practical manifestations of space weather. The impact of space weather on technological systems is generally accepted (see Marubashi (1989)) due to a number of spectacular events, such as the superstorm of 1989, when Canada's power grid was disabled for 9 hours, and numerous spacecraft failures due to “killer electrons” causing arcing in electronic components, see Romanova et al. (2005). The impact on human health, however, is disputed by most specialists. Nevertheless, the latest reports (e.g. Khabarova & Dimitrova (2008), Stoupel et al. (2006)) indicate that there is indeed a strong correlation between the rate of sudden cardiac deaths and space weather.

The space weather problem is twofold. The first aspect is purely practical and aims for the prediction and, eventually, mitigation of adverse effects of space weather. Ideally, this task would be accomplished by launching a vast number of spacecraft to monitor the Sun-Earth region for large-scale structures like CMEs. Unfortunately, the resources of humankind are insufficient to produce and maintain such a large space fleet, as well as to process all the data delivered by these spacecraft. So, today we should use the resources at hand, which include a few solar wind spacecraft (ACE, WIND, SOHO, and STEREO), magnetospheric spacecraft (CLUSTER, THEMIS), and ground-based stations (Intermagnet, MAGDAS, etc.), to develop forecast techniques that will be used in the future.
Thus, we should try to predict space weather with the data we have, and we should aim for the longest possible prediction times to allow for some kind of countermeasures.

The second aspect is mostly academic and involves the study of processes in near-Earth space and, specifically, understanding the interaction between the solar wind and the magnetosphere. Naturally, improving our knowledge of the underlying physics significantly improves predictive capabilities, so fulfilling the second task will significantly help with the first one. Modern concepts of solar wind-magnetosphere interaction are mostly based on phenomenological models constructed in the 1960s. However, there are numerous problems these models cannot explain. This is largely because these models were developed at the very beginning of the space era, when data quality and quantity were immeasurably worse than today. For more than 40 years we have collected astonishing amounts of data about solar wind parameters and geomagnetic activity, and now it is time to put them to good use.
Space weather prediction is a challenging and nontrivial activity, see Li et al. (2003). The most straightforward approach to space weather prediction is studying the whole complex chain of physical processes involved in magnetospheric dynamics and conjugating them in a global model of the evolution of the magnetosphere under the influence of the solar wind. Unfortunately, this is not yet possible due to our poor understanding of the physics of the interaction between the solar wind and the magnetosphere. For this reason, different approaches should be tried.

According to Khabarova (2007), today there are several established methods of space weather prediction, listed below.

1. Morphological analysis of solar images.
This method provides the longest prediction time (up to a week). Its accuracy is unknown, since it is used for academic purposes only. Today it is purely manual and thus almost useless for practical applications.

2. Detection of large-scale perturbations in the solar wind, see e.g. Eselevich & Fainshtein (1993), Eselevich et al. (2009).
This method provides a very good prediction time (up to several days), but is capable of predicting less than 10% of the most intense storms. While it is very inaccurate when used alone, it can prove useful in combination with one of the following short-term methods.

3. Construction of empirical models, see e.g. Burton et al. (1975), Valdivia et al. (1996), O'Brien & McPherron (2000a), O'Brien & McPherron (2000b), Temerin & Li (2002), Temerin & Li (2006), Ballatore & Gonzalez (2003), Cid et al. (2005), Siscoe et al. (2005).
Fig. 1 Autocorrelation function of Dst. Horizontal lines correspond to the top and mean incidental correlation levels in absence of periodic variations. The gray sine has a period of 1/2 year and depicts seasonal variations (axes: time offset, hrs; ACF)
This method provides the shortest prediction time (up to 1 hour) with moderate accuracy.

Fig. 2 Distribution of the correlation coefficient of Dst at very large time offsets in absence of periodic variations (Monte-Carlo simulations vs. a normal distribution; axes: correlation coefficient, number of points)

The fourth group of methods includes neural networks (e.g. … et al. (1999), Watanabe et al. (2002), Wing et al. (2005), Pallocchia et al. (2006)), optimization (e.g. Zhou & Wei (1998), Balikhin et al. (2001), Harrison & Drezet (2001)), and correlation analysis (e.g. Rangarajan & Barreto (1999), Oh & Yi (2004), Wei et al. (2004), Johnson & Wing (2004), Johnson & Wing (2005)). The neural network approach provides short-term predictions up to 4 hours ahead with a correlation coefficient of 0.79 in the paper by Wing et al. (2005). Earlier implementations of this approach experienced significant difficulties predicting strong geomagnetic storms with
Kp > 5, but this approach remains one of the most popular alongside the empirical methods. The optimization approach seems to be more successful, being able to provide 8-hour predictions in the paper by Harrison & Drezet (2001). However, in the papers based upon the optimization methods, the volume of the dataset used is insufficient to correctly describe secular variations of geomagnetic activity. Correlation analysis gives interesting results, but it was used solely for developing and constraining empirical models (see Johnson & Wing (2004)). However, most of these methods have a common feature: they lead to a regression relationship at some point, so it seems natural to skip all the preliminary steps and use regression analysis directly, without unnecessary multiplication of entities. Regression analysis itself was attempted earlier by Srivastava (2005), but it was used to estimate the probability of intense/super-intense storm occurrence depending on the solar and interplanetary parameters. Srivastava (2005) was able to predict 2 of 4 super-intense and 5
Fig. 3 Dependence of the Fisher significance F of the corresponding term in equation (1) on the time offset for the 1 h autocorrelation model (axes: time offset, hrs; F)

of 5 intense CME-driven storms during the 1996-2002 period, using another 46 CME-driven storms to train his model. Hereafter we propose a new approach, named “regression modeling”, which already allows achieving accurate predictions.

The proposed method is statistical, but has some features of empirical models. It is based upon regression analysis and mathematical statistics. In its framework, the predicted Dst value is sought in the form

Dst(j + k) = Σ_i C_i x_i(j),   (1)

where j is the number of the current step (the number of hours since Jan 1, 1963), k is the prediction length, C_i are the regression coefficients, and x_i are the regressors, which are functions and combinations of input quantities already measured at the time when the prediction is made. The values of C_i are determined by the least squares method (LSM) over a large sample of solar wind and geomagnetic data (see the next chapter), with equal statistical weights for all points.

Fig. 4 Seasonal variation of Dst (axes: days of year; mean Dst, nT)

Fig. 5 Diurnal variation of Dst (axes: UT hours; mean Dst, nT)

The statistical significance of the regressors was determined by the Fisher test (F-test), see Fisher (1954), Hudson (1964). This test allows separating significant and insignificant regressors. The insignificant parameters are then rejected, and the routine is repeated until the regression contains only significant regressors. Of course, this method does not guarantee that all the significant regressors will enter the regression, but physical considerations and brute force in the form of trial and error provide the requested reliability. The regressors x_i are generally nonlinear, so from the control theory's point of view, this method is able to describe discrete dynamical systems with strong nonlinearity. This is an essential feature of the regression modeling method.

There is only one manual operation in this method: the selection of regressors to be considered. For this purpose all known models, basic physical considerations, and random choice are used. Naturally, common sense also counts: for example, it would be silly to add IMF components in GSE and GSM coordinates at the same time. If some regressors x_i appeared to be statistically significant, we also checked the significance of products of their powers Π_i x_i^(p_i), where p_i can be any real number, including zero; for practical purposes we used integer values of p_i in the range from 0 to 6. This yields a very important feature of the regression analysis: it allows checking the statistical significance of any regressor, which can be useful for verifying different physical hypotheses. In this sense we will call a parameter “geoeffective” if it appears in at least one statistically significant regressor. More details of this method can be found in the article Parnowski (2009a).

Fig. 6 Sum of terms directly describing the seasonal variation of Dst (axes: days of year; input in Dst, nT)

Fig. 7 Sum of terms directly describing the diurnal variation of Dst (axes: UT hours; input in Dst, nT)
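The selection routine described in this section (least-squares fit, F-test, rejection of insignificant regressors, refit) can be sketched as follows. This is a minimal illustration on a generic design matrix, not the author's production code; the threshold 3.84 (one of the critical values quoted in the next section, roughly the 95% level) is just one possible choice:

```python
import numpy as np

def fit_with_f_values(X, y):
    """Least-squares fit of y = X C; returns C and the Fisher
    significance F_i = (C_i / se_i)^2 of each regressor."""
    n, m = X.shape
    C, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ C
    s2 = resid @ resid / (n - m)            # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)       # covariance of the coefficients
    return C, C**2 / np.diag(cov)

def prune_insignificant(X, y, names, f_crit=3.84):
    """Repeatedly drop the least significant regressor until every
    remaining one passes the F-test, as in the routine described above."""
    names = list(names)
    while X.shape[1] > 1:
        C, F = fit_with_f_values(X, y)
        worst = int(np.argmin(F))
        if F[worst] >= f_crit:              # all regressors significant: stop
            break
        X = np.delete(X, worst, axis=1)     # reject the insignificant regressor
        del names[worst]
    return X, names
```

After pruning, candidate nonlinear products of the surviving regressors can be appended to the design matrix and the loop repeated, mirroring the trial-and-error search described above.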
The OMNI2 (2009) database was used. It contains IMF, solar wind, and geomagnetic data averaged over 1-hour intervals (49 parameters in total, starting from Jan 1, 1963). It was supplemented with provisional Dst data taken from the WDC for Geomagnetism (Kyoto). Thus a continuous 44-year Dst time series was obtained.

We estimated the geoeffectiveness of a parameter by the coefficients and statistical significances of all regressors which contain this parameter. This was done in the following way. After processing the data with the least squares method, the Fisher significance parameter F was determined for each regressor. All the F values were compared to the values 2.7055, 3.84, 5.02, 6.635, 7.879, corresponding to standard significance levels. Separate models were constructed for quiet (Dst > −50 nT) and perturbed (Dst ≤ −50 nT) conditions.

First, we determined which previous Dst values are statistically significant. For this purpose, we constructed an autoregression (see details in Parnowski (2009b))

Dst(j + k) = C + Σ_{i=1}^{N} C_i Dst(j − i + 1),   (2)

where N is the “age” of the oldest Dst value. This model alone is not sufficient to correctly predict Dst, but it sets a basis for the construction of models that are able to do so. Let us determine the maximum reasonable value of N. For this purpose, we plot the autocorrelation function (ACF) of the Dst index for k = 1 (see Fig. 1). One can see that the ACF tends to a sinusoid with a period close to half a year. This is caused by seasonal variations. This raises a question: if there were no temporal variations, what would the ACF tend to at large offsets? If the distribution of Dst were normal, the answer would be zero. However, the distribution is not normal, so the ACF can tend to some non-zero quantity. To determine this quantity we need to remove the temporal variations. For this purpose we need to calculate the ACF of a random sample with the same statistical characteristics as the Dst sample.
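The surrogate-sample idea can be sketched directly: shuffle the series to destroy all temporal structure while keeping its one-point distribution, then collect the lag correlations over many trials. This is a minimal illustration on a generic input series, not tied to the actual OMNI2 data:

```python
import numpy as np

def incidental_correlation(x, lag=1, trials=10000, seed=0):
    """Monte-Carlo estimate of the incidental (chance) lag correlation:
    each trial randomly permutes the sample, which destroys temporal
    structure but preserves the one-point distribution, then measures
    the correlation coefficient at the given lag."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    r = np.empty(trials)
    for t in range(trials):
        s = rng.permutation(x)
        r[t] = np.corrcoef(s[:-lag], s[lag:])[0, 1]
    return r.mean(), r.max()   # the 'mean' and 'top' incidental levels
```

The mean and maximum returned here play the role of the two horizontal lines in Fig. 1.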
The easiest way to get such a sample is to process the Dst sample with a permutation method, which is widely used for the determination of correlation functions, e.g. in astronomy. This method is based on a random shuffle of the sample. Applying this method many times (10000 times in our case) and calculating the correlation coefficient each time, we get the distribution of the correlation coefficient by the Monte-Carlo method. The distribution of the correlation coefficient for this sample (Fig. 2) appeared to be very close to a normal distribution with a mean of 0.008. The maximum recorded value in 10000 trials was equal to 0.015. The top and the mean values are depicted on Fig. 1 by horizontal lines. As one can see on Fig. 1, in reality the correlation coefficient exceeds this value at most times due to

Fig. 8 Temporal variation of Dst. Darker spots correspond to lower values

temporal variations. The ACF crosses the top line for the first time at ∼2000 h, which was taken as the maximum reasonable value of N. This hints that rather old Dst values can be quite significant. Besides the half-a-year periodicity, one can also notice the 27-day periodicity caused by the Carrington rotation of the Sun, which can be taken into account by adding the sunspot number R to the regression.

Let us return to equation (2). Applying the F-test, we can determine which previous Dst values are statistically significant (see Fig. 3). We did not search for statistically significant Dst values beyond this N for k = 1; the statistical significance of this oldest value is over 99.9%. At this point we already have a large number of regressors describing just the previous Dst values (autoregression), without satellite data and nonlinear

Fig. 9 Distribution of the latitudinal flow angle θV and the corresponding mean Dst values (axes: θV, °; number of points; mean Dst, nT)

terms. If we add those, the number of regressors will only increase. After determining which previous Dst values are statistically significant, we added all the solar wind parameters available in the OMNI2 database. Then, we added nonlinear terms as discussed in Section 3. After adding a new regressor, all the significances are recalculated, and some of the old regressors can become insignificant. The total number of regressors is about 150-200. Since it is very large, we will not give here any lists of regressors or coefficients even for the simplest case k = 1, but a preliminary list is given in the paper Parnowski (2008).

In this section we will demonstrate how this method can be used for the identification of geoeffective parameters. We will use four parameters as an example: DOY (day of the year), UT (universal time), and the latitudinal and longitudinal flow angles of the solar wind. On Fig. 1 one could see a clear seasonal dependence of the Dst index. This dependence was described in many articles, for example by O'Brien & McPherron (2002), Lyatsky et al. (2001), Takalo & Mursula (2001), and Cliver et al. (2000), but the reason behind it is still disputed. Most authors believe these asymmetries are caused by either of the two cusps turning to the sunlit side due to the annual rotation of the Earth with respect to the Sun. However, O'Brien & McPherron (2002) state that this mechanism would give only 17% of the observed asymmetry. Takalo & Mursula (2001) connected the diurnal variations of Dst with an uneven distribution of Dst network stations.

Fig. 10 Sum of terms describing the latitudinal flow angle (axes: θV, °; input in Dst, nT)

Fig. 11 Seasonal dependence of the latitudinal flow angle's input in Dst

Let us use this known effect to validate our method. If we select two subsamples corresponding to summer and winter in the northern hemisphere, bounded by the vernal and autumnal equinoctia, and verify the hypothesis that the difference between the corresponding average Dst values is statistically significant using a one-sided Student test, we obtain t∞ = 80. The critical values of t∞ corresponding to the 99 and 99.95% significance levels are equal to 2.334 and 3.31, respectively. For the diurnal asymmetry the Student test gives t∞ = 8. Since both values greatly exceed the critical t∞, we can be sure in the qualitative conclusions made. Figs. 4 and 5 show the histograms of seasonal and diurnal variations of the Dst index.
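The subsample comparison just described can be sketched with a pooled-variance Student t statistic; for the very large samples used here, the quoted critical values (2.334 and 3.31) apply. A minimal illustration on generic arrays, not the actual Dst subsamples:

```python
import numpy as np

def student_t(a, b):
    """Pooled-variance two-sample Student t statistic for testing whether
    mean(a) exceeds mean(b); compare against the one-sided critical value
    for the desired significance level."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled * (1.0 / na + 1.0 / nb))
```

Applied to the summer and winter Dst subsamples, a statistic far above the critical value confirms that the seasonal difference of the means is significant.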
Fig. 12 Distribution of the longitudinal flow angle and the corresponding mean Dst values. Grey columns correspond to quiet conditions (Dst > −50 nT), white columns to all data (bin size 1°; axes: φV, °; number of points; mean Dst, nT)

Taking this known geoeffective factor as an example, we demonstrate how easily one can take it into account using the regression approach. To do so, one should simply add the terms a₁(j) = sin((j − 1920)π/4383) and a₂(j) = cos((j − 1920)π/4383), where j is once again the number of hours since Jan 1, 1963, 1920 is the number of hours between the beginning of the year and the vernal equinox, and 4383 is the number of hours in half a year. The first of these terms is significant and describes the summer/winter asymmetry, and the second one (which appears statistically insignificant) describes the absent spring/autumn asymmetry. Likewise, for the diurnal asymmetry the corresponding terms will be b₁(j) = sin((j − …)π/12) and b₂(j) = cos((j − …)π/12).

The coefficient of the a₁(j) term is 30 times less than the observed difference between the mean Dst values of the summer and winter subsamples. This can be explained in the following way: there are other regressors which depend on parameters with a statistically significant summer/winter asymmetry, e.g. previous Dst values. They provide the lion's share of the summer/winter asymmetry of Dst. A good example of such a regressor is the sunspot number R, which describes the 27-day periodicity. Nevertheless, there is a small difference which cannot be expressed with these terms. Including it into the regression, we obtain these statistically significant regressors. To further illustrate this point, let us consider as an example a value X = const + A sin ωt. In the regression it will look like X_{n+1} = X_n + A[sin ω(t + Δt) − sin ωt] = X_n + A[(cos ωΔt − 1) sin ωt + cos ωt sin ωΔt]. The first term in brackets is of order (ωΔt)², and the second of order ωΔt, under the natural assumption that ωΔt ≪ 1. So, it will seem that the coefficient is AωΔt rather than A. Note that this is just an example and has nothing to do with actual regressors.

Fig. 13 Sum of terms describing the longitudinal flow angle (axes: φV, °; input in Dst, nT)

Fig. 14 Seasonal dependence of the longitudinal flow angle's input in Dst

However, if we look at the distribution of mean Dst values vs. the time of the year (Fig. 4), we see a much more complicated pattern of seasonal variations of geomagnetic activity. Among other features, there is a strong asymmetry between summer and winter on one side and spring and autumn on the other. To take it into account we introduced additional terms into our regression, which are powers of a₁(j) and their products with powers of a₂(j). The sum of regressors with the corresponding coefficients, depicted on Fig. 6, is very similar to Fig. 4. Note that Fig. 6 was obtained independently from Fig. 4. We did the same thing with the diurnal asymmetry: the distribution is plotted on Fig. 5, and the sum of regressors with the corresponding coefficients on Fig. 7. The term a₁(j) · b₁(j) is also significant and should be included in the regression. After this we obtained a joint distribution of semiannual and diurnal variations of the Dst index, plotted on Fig. 8. It contains 18 regressors. Increasing the number of regressors describing

Fig. 15 Comparison between the prediction results of O'Brien & McPherron (2000a) (top) and ours (bottom) 1 hour ahead. The following designations are used: 'Kyoto' is the official Dst index, available at the Kyoto WDC for Geomagnetism; 'AK1' is the prediction based on the model of Burton et al. (1975) with re-calculated coefficients; 'UCB' is the prediction based on the model of Fenrich & Luhmann (1998); 'AK2' is the prediction based on the model of O'Brien & McPherron (2000b); 'ACE Gaps' refers to the top line, indicating the availability of solar wind data measured by the ACE satellite

temporal variations of geomagnetic activity, we can improve the accuracy of this distribution. In particular, one could add the 11-year and 22-year solar cycles, higher powers of a_i(j) and b_i(j), etc. Thus, we demonstrated how easily one can take into account new geoeffective parameters in this method's framework.

Now let us discuss parameters whose geoeffectiveness was determined by this method, and demonstrate that they are indeed geoeffective. The latitudinal flow angle θV was mostly associated with the southern component of the IMF. The distribution of its value and the corresponding mean Dst value is plotted on Fig. 9. The distribution looks similar to a normal distribution, but it significantly differs from the normal one according to the χ² test. This manifests

Fig. 16 Error charts of the prediction results of O'Brien & McPherron (2000a) (top) and ours (bottom) 1 hour ahead. The error chart for our 9 hour prediction is plotted for reference (axes: deviation, nT; fraction of all points)

in a much larger number of points with large deviations than follows from the normal distribution. This is mostly caused by the number of points in the wing bins |θV − ⟨θV⟩| > …σ being equal to 196 points versus 11 points in the case of a normal distribution. However, most of these points were obtained in the 1960s, when the quality of measurements was much worse than today. This period includes the maximum and minimum values of θV, equal to −…° and 18.…°. Nevertheless, these points constitute only a minor fraction of all points and did not affect the linear regression routine. Assuming a normal distribution, we obtain σ = 2.925 and ⟨θV⟩ = 0.… < 0.…σ. Thus, the distribution is insignificantly shifted towards positive values.

If we ignore the wing bins in the distribution of mean Dst values against θV, which are somewhat random due to the small number of points in them, we will notice a slight, almost linear trend. If we plot the sum of terms containing θV (Fig. 10), we will notice a similar trend. If we select two subsamples, one with −… < θV < −… and the other with … < θV < 9, and verify the hypothesis that the difference between the corresponding average Dst values is statistically significant using a one-sided Student test, we obtain t∞ = 6.…; t∞ is 5.44 in the summer and only 0.059 in the winter. The former corresponds to more than 99.95% significance, while the latter to less than 10%. This could mean that there are two factors connected with the latitudinal flow angle, which work together in the summer and against each other in the winter. The physical explanation of this phenomenon, however, lies beyond the scope of this paper.

Fig. 17 Comparison between the prediction results of Cerrato et al. (2004), Burton et al. (1975), O'Brien & McPherron (2000a), Fenrich & Luhmann (1998) (top) and ours (middle) 1 hour ahead. The bottom plot (Jul 15-16, 2000) is a scaled-up version of the middle one

Fig. 18 Comparison between the prediction results of Pallocchia et al. (2006) (top) and ours (bottom) 1 hour ahead. Our 3-hour prediction is given for reference. On the top plot the black line depicts Dst from the Kyoto WDC, and the blue line the 1 hour prediction
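The harmonic regressors a_i(j), b_i(j) introduced above can be generated directly from the hour count j. In this sketch the seasonal offset (1920 h) and half-year length (4383 h) are the values quoted in the text; the diurnal phase offset is set to zero as a placeholder assumption, since the text does not fix it here:

```python
import numpy as np

def harmonic_regressors(j):
    """Seasonal (half-year period) and diurnal (24 h period) harmonic
    regressors built from j, the number of hours since Jan 1, 1963.
    1920 h is the offset of the vernal equinox from the start of the
    year; 4383 h is half a year. The diurnal offset is a placeholder."""
    j = np.asarray(j, float)
    a1 = np.sin((j - 1920.0) * np.pi / 4383.0)  # summer/winter asymmetry
    a2 = np.cos((j - 1920.0) * np.pi / 4383.0)  # spring/autumn asymmetry
    b1 = np.sin(j * np.pi / 12.0)               # diurnal sine (assumed zero offset)
    b2 = np.cos(j * np.pi / 12.0)               # diurnal cosine
    return a1, a2, b1, b2
```

Powers and cross products such as a1**2 or a1*b1 are then offered to the F-test like any other candidate regressor.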
The longitudinal flow angle φV was only occasionally used in models. However, it appeared to be even more significant than the latitudinal flow angle. Its distribution together with the corresponding mean Dst values is plotted on Fig. 12, where the white bars show the complete dataset sans rejects, and the grayed bars show the quiet-time sample with Dst > −50 nT. Like that of the latitudinal flow angle, the distribution of the longitudinal flow angle resembles a normal distribution. However, the χ² test disproves the relevant null-hypothesis. Once again, this is mostly due to the wing bins, which are mostly formed of data points corresponding to measurements in the 1960s, including the maximum and minimum values equal to −…° and 48.…°. Assuming a normal distribution, we obtain σ = 2.934 and ⟨φV⟩ = −0.… ≈ −0.…σ.

A significant trend is the most prominent feature of this figure. If we plot the sum of regressors which contain φV (Fig. 13), we see a very similar trend. Like before, we plot the distribution for the summer and winter subsamples separately (Fig. 14). We see that the trend is identical on both plots, so the corresponding effect is season-independent. The list of regressors for k = 1 containing θV and φV is given in Table 1. It contains the regressors themselves, their coefficients, and F values. Thus, we demonstrated that our method is truly capable of pointing out new geoeffective parameters and verified the geoeffectiveness of two such quantities.

Taking into account the considered parameters together with the parameters whose geoeffectiveness was beyond doubt, like previous values of Dst, the dawn-dusk electric field, the ram pressure of the solar wind, and most of the other parameters from the OMNI2 database, we constructed models for predicting Dst 1, 3, 6, 9, 12, 18, and 24 hours ahead, and 3 more models for predicting Dst 1 hour ahead for quiet and perturbed conditions and for the case when satellite data are unavailable (autoregression, see more in Parnowski (2009b)). The statistical characteristics of these models are summarized in Table 2. They include the Residual Mean Square (RMS), the Linear Correlation coefficient (LC), and the Prediction Efficiency (PE = 1 − RMS²/SD², where SD is the sample's Standard Deviation). In divided cells, the top number corresponds to the actual model, and the bottom one to the simplest possible model Dst(j + k) = Dst(j). It is noteworthy that despite good correlation for all the models, in reality only the 1-hour and 3-hour models are ready for practical use, and the 6-hour model can potentially reach this state. This is due to a significant time shift being present in the models predicting further ahead. Note that since the proposed method is statistical, there is little difference whether the “training” sample contains the period when the prediction is made or not.
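The three scores of Table 2 can be computed as follows; this assumes the conventional definition PE = 1 − RMS²/SD², consistent with the text, and is a generic sketch rather than the author's evaluation code:

```python
import numpy as np

def forecast_scores(obs, pred):
    """Residual Mean Square (RMS), Linear Correlation coefficient (LC)
    and Prediction Efficiency PE = 1 - RMS^2/SD^2, where SD is the
    standard deviation of the observed series."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    rms = np.sqrt(np.mean((obs - pred) ** 2))
    lc = np.corrcoef(obs, pred)[0, 1]
    pe = 1.0 - rms ** 2 / obs.var()
    return rms, lc, pe
```

For the trivial persistence model Dst(j + k) = Dst(j), pred is just the series shifted by k, which yields the bottom numbers of the divided cells in Table 2.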
Table 1 Regressors containing the flow angles. V(j) is the bulk flow velocity of the solar wind

i   x_i                     C_i                 F_i
1   θV(j)·V(j)              (2.… ± …)·10^−…     …
2   θV(j)                   (−… ± …)·10^−…      …
3   a(j)·θV(j)·Dst(j)       (1.… ± …)·10^−…     …
4   a(j)·θV(j)              (−… ± …)·10^−…      …
5   φV(j)                   (−… ± …)·10^−…      …
6   φV(j)                   (−… ± …)·10^−…      …
7   φV(j)                   (−… ± …)·10^−…      …
8   φV(j)                   (5.… ± …)·10^−…     …
9   a(j)·φV(j)              (−… ± …)·10^−…      …
10  a(j)·φV(j)              (5.… ± …)·10^−…     …
11  a(j)·φV(j)              (−… ± …)·10^−…      …

Our 1 h model is more precise than most empirical 1 h models. The autoregression model described by eq. (2), though, lags in the left part of the plot due to a rapid positive change of Dst at 1500 UT. The lag persists through the growth phase and the main phase, and vanishes only in the recovery phase. For this reason, the autoregression model holds little practical value and should be considered a transitional result, required to construct the full model. It is, however, possible to improve it by adding terms describing temporal variations and, for example, the number of sunspots, but then the term “autoregression” will no longer be applicable.

On Fig. 19 we present the results of prediction 3, 6 and 9 hours ahead for a number of events, kindly selected for us by V.G. Fainshtein, which are particularly hard to predict by medium-term methods such as Eselevich et al. (2009), to verify the efficiency of our method. We can see that this method's accuracy is higher for stronger storms, which are of greater interest. A huge advantage of this method is that the most resource-demanding operation, the calculation of the regression coefficients, should be performed only once for each model. The prediction itself is just the summation of a polynomial, which usually takes no more than 4-6
Fig. 19 Prediction results for some specific intervals. Each panel shows Dst from the Kyoto WDC together with our 3-, 6- and 9-hour forecasts: DOY 1996 (LC = 0.766, 0.656, 0.548), DOY 1997 (LC = 0.853, 0.555, 0.304), DOY 1999 (LC = 0.889, 0.714, 0.602), DOY 2000 (LC = 0.908, 0.767, 0.669), DOY 2000 (LC = 0.926, 0.883, 0.846), DOY 2001 (LC = 0.966, 0.873, 0.669)
Table 2 Statistical characteristics of the forecasting models

k, h   RMS, nT   LC   PE      Note
1      3.…       …    …
1      …         …    0.964   autoregression
1      3.…       …    …       Dst > −50 nT
1      …         …    …       Dst ≤ −50 nT
3      …         …    …
6      …         …    …
9      …         …    …
12     …         …    0.636   for reference
18     16.…      …    0.514   for reference
24     18.…      …    0.423   for reference

seconds on an average PC (including disk I/O), which allows for the creation of fully automated operational online space weather forecast services.
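The operational step is indeed trivial once the coefficients are fitted: each forecast is a single dot product between the stored coefficient vector and the current regressor vector. A hypothetical sketch (the names are illustrative, not the author's code):

```python
import numpy as np

def predict(C, x):
    """One forecast step: Dst(j + k) = sum_i C_i x_i(j).  The expensive
    least-squares fit produced C once; each hourly forecast is only this
    dot product, which is why the operational cost is negligible."""
    return float(np.dot(C, x))
```

An operational service would rebuild the regressor vector x(j) from the latest solar wind and Dst data each hour and call this once per prediction horizon k.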
The proposed regression approach appeared to be more than adequate for space weather forecasting. For the forecasting per se, its main advantages are quite good correlation (about 90% for the 6-hour forecast), adaptability to any samples, and very fast forecasting code (typically about 5 seconds on an average PC). For the identification of geoeffective parameters it is extremely convenient and easy to use. In particular, it allowed us to uncover 2 new geoeffective parameters: the latitudinal and the longitudinal flow angles of the solar wind.

This is just a short summary of the regression modeling method, since its full description would take much more space. Of course, this method can be used in conjunction with other methods, first of all with physical methods of detection of large-scale perturbations in the solar wind and with empirical models.

Acknowledgements
The author would like to thank Prof. O.K. Cheremnykh, Prof. V.A. Yatsenko, and Academician V.M. Kuntsevich for fruitful discussions, Prof. V.G. Fainshtein for useful remarks and for providing a list of geomagnetic events for validation of the model, and the Reviewer.
References