Confidential manuscript submitted to
Earth and Space Science
Machine Learning Approach for Solar Wind Categorization
Hui Li, Chi Wang, Cui Tu, and Fei Xu
State Key Laboratory of Space Weather, National Space Science Center, CAS, Beijing, 100190, China.
Physics Department, Nanjing University of Information Science and Technology, Nanjing, China.
University of Chinese Academy of Sciences, Beijing, 100049, China.
Key Points:
• An eight-dimensional scheme for 4-type solar wind categorization is developed based on 10 supervised machine-learning classifiers.
• The machine learning approach significantly improves the classification accuracy by ∼10% over existing manual schemes.
• Classification only depends on typical solar wind observations, such as $N_\alpha$, $N_p$, $B_T$, $V_p$, $T_p$.
Corresponding author: Hui Li, [email protected]
Abstract
Solar wind classification is conducive to understanding the physical processes ongoing at the Sun and the evolution of the solar wind in interplanetary space, and it is furthermore helpful for early warning of space weather events. With rapid developments in the field of artificial intelligence, machine learning approaches are increasingly being used for pattern recognition. In this study, a machine learning approach is developed to automatically classify the solar wind at 1 AU into four types: coronal-hole-origin, streamer-belt-origin, sector-reversal-region-origin, and ejecta. By exhaustive enumeration, an eight-dimensional scheme ($B_T$, $N_p$, $T_p$, $V_p$, $N_{\alpha p}$, $T_{\mathrm{exp}}/T_p$, $S_p$, and $M_f$) is found to perform the best among 8191 combinations of 13 solar wind parameters. Ten popular supervised machine learning models, namely k Nearest Neighbors (KNN), Support Vector Machines with linear and Radial Basis Function kernels, Decision Tree, Random Forest, Adaptive Boosting, Neural Network, Gaussian Naive Bayes, Quadratic Discriminant Analysis, and Extreme Gradient Boosting, are applied to the labeled solar wind data sets. Among them, the KNN classifier obtains the highest overall classification accuracy, 92.8%, which improves the accuracy by 9.6% over existing manual schemes. No solar wind composition measurements are needed, permitting our classification scheme to be applied to most solar wind spacecraft data. In addition, two application examples indicate that solar wind classification is helpful for the risk evaluation of predicted magnetic storms and surface charging of geosynchronous spacecraft.
In 1959, the first solar wind observation was made by the Soviet satellite Luna 1. Since then, decades of in-situ solar wind measurements have firmly established that the solar wind plasma comes from different origins, for example, coronal holes, the streamer belt, and active regions. Xu & Borovsky [2015, and references therein] showed that the solar wind can generally be classified into three major types: coronal-hole-origin plasma, streamer-belt-origin plasma, and ejecta.

Coronal-hole-origin plasma (CHOP), sometimes called the fast solar wind, originates from the open field line regions of coronal holes and typically exhibits speeds in excess of 600 km/s at 1 AU and beyond [Sheeley et al., 1976; McComas et al., 2008]. Statistically, CHOP tends to be homogeneous [Bame et al., 1977] with a high proton temperature and low plasma density [Schwenn, 2006], and is dominated by outward propagating Alfvénic waves [Luttrell & Richter, 1988]. It exhibits a statistical non-adiabatic heating of the protons between 0.3 and 1.0 AU [Hellinger et al., 2011]. In addition, field-aligned relative drifts between the alphas and protons can frequently be found in CHOP, with a speed up to the local proton Alfvén speed [Marsch et al., 1982]. Moreover, the relative fluctuations of magnetic field and solar wind velocity are large in CHOP, about 24% and 19%, respectively. However, the corresponding Fourier spectral indices are -1.56 and -1.55 [Borovsky, 2012], which are closer to the prediction of the Iroshnikov-Kraichnan theory ($f^{-3/2}$). As proposed by Li et al. [2011], this further indicates that current sheets are rare in this kind of solar wind.

Streamer-belt-origin plasma (SBOP), also known as the slow solar wind, has a typical speed less than 400 km/s. Compared to CHOP, SBOP does not exhibit much Alfvénic fluctuation [Schwenn, 1990] but is highly structured [Bame et al., 1977] with a low proton temperature and high plasma density [Schwenn, 2006]. In addition, the alpha-proton relative drift is typically absent in SBOP [Asbridge et al., 1976], and the protons are closer to adiabatic [Eyni & Steinitz, 1978]. The relative fluctuations of magnetic field and solar wind velocity are small in SBOP, only 16% and 11%, respectively. Unlike in CHOP, both of the corresponding Fourier spectral indices, -1.70 and -1.67, respectively, obey Kolmogorov's law ($f^{-5/3}$) [Borovsky, 2012]. This indicates that this kind of solar wind may contain many current sheet structures [Li et al., 2011].

Recently, it has been found that SBOP can be further divided into two subgroups according to whether there exists an interplanetary magnetic sector reversal [Antonucci et al., 2005; Schwenn, 2006].
One subgroup is referred to as streamer belt plasma (SBP), without sector reversals, and the other is referred to as sector reversal region plasma (SRRP), with one sector reversal. The origin of SBP at the Sun is still a major unsolved problem in solar physics. Two main mechanisms have been proposed. One is interchange magnetic reconnection of open field lines with closed streamer belt field lines [Fisk et al., 1999; Subramanian et al., 2010; Antiochos et al., 2011; Crooker et al., 2012]; the other is emission from the edge of a coronal hole near a streamer belt [Wang & Sheeley, 1990; Arge et al., 2003]. SRRP is suggested to be emitted from the top of the helmet streamers [Gosling et al., 1981; Suess et al., 2009; Foullon et al., 2011]. Statistically, SBP and SRRP have different characteristics in the solar wind and different subsequent effects on the geospace environment, which have been summarized by Borovsky & Denton [2013].
Another major category of solar wind plasma is the so-called ejecta (EJECT), which are associated with solar transients such as interplanetary coronal mass ejections (ICMEs) and magnetic clouds (MCs) [Richardson et al., 2000; Zhao et al., 2009]. EJECT originate from magnetic reconnection associated with the structures of streamer belts or active regions, which can impulsively emit plasma and make the magnetic field deviate from the Parker spiral [Borovsky, 2010]. The typical signatures of EJECT at 1 AU have been well summarized [see Zurbuchen & Richardson, 2006, and references therein], for example, enhanced and smoothly rotating magnetic field, low proton temperature and plasma $\beta$, extreme density decrease, enhanced density ratio between alphas and protons, abundance and charge state anomalies of heavy ion species, bidirectional strahl electron beams, cosmic ray depletion, and declining velocity. Unlike CHOP and SBOP, which expand in the two directions transverse to the radial direction from the Sun, impulsive EJECT expand in all three directions as they propagate outward [Klein & Burlaga, 1982]. Recently, Li et al. [2016] performed a statistical survey of Alfvénic fluctuations inside ICMEs, finding that only 12.6% of EJECT are Alfvénic, and that this percentage generally decays linearly as the radial distance increases. The relative fluctuations of magnetic field and solar wind velocity are medium in EJECT, 21% and 15%, respectively [Borovsky, 2012]. The Fourier spectral indices are close to -5/3 [Borovsky, 2012], and may decrease as the radial distance increases [Li et al., 2017].

The categorization of the solar wind by its origin is of great importance for solar and heliospheric physics studies. Firstly, the statistical properties of the solar wind should be established for each type to reach a more comprehensive understanding of the solar wind.
Secondly, dividing the solar wind observations at 1 AU according to their origins can lead to a better diagnosis of physical processes ongoing at the Sun [Mariani et al., 1983; Thieme et al., 1989, 1990; Matthaeus et al., 2007; Borovsky, 2008; Zastenker et al., 2014]. Thirdly, the geoeffectiveness (geomagnetic activity, specifically, magnetic storms and substorms) of solar wind from different origins differs considerably [e.g., Borovsky & Denton, 2006; Turner et al., 2009; Borovsky & Denton, 2013]. Such a categorization would be helpful for the early warning of space weather. Note that these differences are in statistical terms. For individual cases, the situations may be quite different and complicated.

Usually, solar wind classification is done manually by experienced people. In the literature, several empirical categorization methodologies in different parameter spaces have been proposed. In a one-dimensional parameter space, the solar wind was usually separated into "fast wind" or "slow wind" according to its speed, $V_p$ [Arya & Freeman, 1991; Tu & Marsch, 1995; Feldman et al., 2005; Yordanova et al., 2009]. However, such a $V_p$ scheme can only roughly divide the solar wind into CHOP and SBOP, and cannot separate out EJECT, SBP, and SRRP. Moreover, the criterion of $V_p$ is not unique. In 2014, another one-dimensional scheme based on the parameter $P_{\mathrm{type}}$, defined as $2\log S_p$ minus the logarithms of two ion charge-state ratios, was proposed by Borovsky & Denton [2014]. As the understanding of ICMEs and MCs has improved, many methodologies have been proposed to identify EJECT [see Zurbuchen & Richardson, 2006; Kunow et al., 2006, and references therein], and several catalogs of EJECT at 1 AU have been produced [e.g., Lepping et al., 2005; Jian et al., 2006; Richardson & Cane, 2010]. Recently, composition measurements have been used for solar wind classification.
An algorithm in a two-dimensional parameter space, such as $\mathrm{O}^{7+}/\mathrm{O}^{6+}$ and $V_p$, was constructed by Zurbuchen et al. [2002], Zhao et al. [2009], and von Steiger et al. [2010]. Such a two-dimensional scheme is still not able to divide SBOP into SBP and SRRP. In addition, such a scheme is not generally available for most solar wind spacecraft due to the lack of on-board ion composition instruments. Xu & Borovsky [2015] developed a three-parameter, four-plasma-type categorization scheme based on commonly used solar wind measurements, and obtained a good classification accuracy. In addition, an on-board solar wind classification algorithm was already applied on the Genesis spacecraft [Neugebauer et al., 2003; Reisenfeld et al., 2003]. Such an automatic method requires measurements of bidirectional electrons and historic solar wind classification results.

Although traditional classification has improved significantly in recent decades, there remains room for improvement in the existing empirical categorization schemes. Multi-label classification is regarded as a typical task of machine learning. Recently, the performance of machine learning classification has improved considerably with the rapid development of artificial intelligence theory and techniques. Machine learning is becoming more and more popular and powerful in big-volume data analysis in space physics, and may offer a way to improve the accuracy of solar wind classification. Camporeale et al. [2017] recently employed a machine learning technique, Gaussian Process, in a four-category solar wind classification, and obtained a median accuracy larger than 96% for all categories. However, the time resolutions of the variables they used are not uniform. For example, the temporal resolution is one day for the sunspot number and solar radio flux (10.7 cm), but one hour for the other five solar wind parameters and for the reference solar wind data sets. Camporeale et al. [2017] did not demonstrate that such a mixture of hourly averaged solar wind parameters and daily sampled parameters is reasonable.

In this work, we apply 10 popular supervised machine learning models to classify the solar wind plasma into four types (CHOP, SBP, SRRP, EJECT) based on typical solar wind observations with the same temporal resolution of 1 hour. In particular, we identify the best parameter scheme, judged by the highest classification accuracy, from 8191 combinations of 13 parameters derived from typical solar wind observations.
For conventional classifications of the solar wind plasma at 1 AU, reference solar wind data with known plasma types should first be collected. Then, empirical relationships are developed to describe the domains of different plasma types in some parameter space. Generally, human experience performs well in a two- or three-dimensional parameter space. For a higher-dimensional space, humans cannot easily derive the empirical relationships.

For supervised machine learning approaches, reference solar wind data with known plasma types are needed for training the classifier as well. The discriminant rules are then developed automatically by the machine learning classifiers. One advantage of the machine learning perspective is that the discriminant rules can be obtained easily in a multi-dimensional space. Usually, 75% (80%) of the reference solar wind data are used for training, and the remaining 25% (20%) are used for testing, especially when fewer than 10000 cases are available.
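The 75%/25% division of the reference data described above can be illustrated with a minimal sketch. It uses only the Python standard library rather than the Scikit-learn package used in this study; the function name `split_reference_data`, the seed, and the toy records are assumptions made for illustration and are not taken from the paper.

```python
import random


def split_reference_data(samples, labels, train_fraction=0.75, seed=None):
    """Randomly split labeled samples into training and testing subsets."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # random selection of the training data
    n_train = int(round(train_fraction * len(indices)))
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    train = ([samples[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([samples[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test


# Toy stand-in for hourly records (hypothetical values, two features each).
samples = [[float(i), float(i % 7)] for i in range(100)]
labels = ["CHOP" if i % 2 == 0 else "SBP" for i in range(100)]

(train_x, train_y), (test_x, test_y) = split_reference_data(
    samples, labels, train_fraction=0.75, seed=0
)
```

Every record ends up in exactly one of the two subsets, so the test accuracy is always evaluated on data the classifier has not seen during training.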
Classification is regarded as one of the typical tasks carried out by a machine learning system, and the classifier is a critically important part of the machine learning toolkit. With the rapid development of machine learning techniques, a large number of classification algorithms have been developed. In this study, we apply 10 widely used classifiers [Cady, 2017] to perform solar wind categorization, namely k-nearest neighbors (KNN), linear support vector machines (LSVM), SVM with a Gaussian radial basis function kernel (RBFSVM), Decision Trees (DT), Random Forests (RF), Adaptive Boosting (AdaBoost), Neural Network (NN), Gaussian Naive Bayes (GNB), Quadratic Discriminant Analysis (QDA), and eXtreme Gradient Boosting (XGBoost). Table 1 gives the references of these 10 classifiers for readers to get more details. All the classification algorithms are included in the Scikit-learn package, which is an open source machine learning library written in the Python programming language [Pedregosa et al., 2011]. In this work, we use the Scikit-learn package to carry out solar wind classifications. The details of the Scikit-learn package can be found at http://scikit-learn.org/stable/index.html.

Table 1. 10 machine learning classifiers used in this study.

| Classifier | Abbreviation | Reference |
|---|---|---|
| k-nearest neighbors | KNN | Denoeux [1995] |
| linear support vector machine | LSVM | Fan et al. [2008] |
| SVM with Gaussian radial basis function kernel | RBFSVM | Buhmann [2003] |
| Decision Trees | DT | Breiman et al. [1984] |
| Random Forests | RF | Ho [1995] |
| Adaptive Boosting | AdaBoost | Zhu et al. [2009] |
| Neural Network | NN | Rojas [1996] |
| Gaussian Naive Bayes | GNB | Perez et al. [2006] |
| Quadratic Discriminant Analysis | QDA | Srivastava et al. [2007] |
| Extreme Gradient Boosting | XGBoost | Chen & Guestrin [2016] |

For supervised machine learning, reference solar wind data sets with known types are needed to train the classifiers. We use the same data sets as Xu & Borovsky [2015], and the solar wind plasma is divided into four types: CHOP, SBP, SRRP, and EJECT.

The collection of reference CHOP comes from the ideal events used by Xu & Borovsky [2015]. They examined the solar wind speed $V_p$, the proton-specific entropy $S_p = T_p/N_p^{2/3}$, $\mathrm{O}^{7+}/\mathrm{O}^{6+}$, $\mathrm{C}^{6+}/\mathrm{C}^{5+}$, and the characteristics of the interplanetary magnetic field to identify CHOP. Long (multi-day) intervals of steady high-speed solar wind streams that repeat every twenty-seven days are regarded as CHOP. CHOP starts after the compression of the corotating interaction region (CIR) and ends before the onset of the trailing edge rarefaction. At the same time, they also excluded large jumps in $S_p$, $\mathrm{O}^{7+}/\mathrm{O}^{6+}$, or $\mathrm{C}^{6+}/\mathrm{C}^{5+}$ to make sure that CHOP were not contaminated with ejecta. A total of 3049 hours of CHOP identified by Xu & Borovsky [2015] are used here.

The collection of reference SBP comes from the pseudo-streamers during 2002-2008 identified by Borovsky & Denton [2013]. Looking earlier in time at the plasma upstream of the CIR, they checked the preceding interval of CHOP. If the preceding coronal hole was of the same magnetic sector as the coronal hole immediately following the CIR, and if no sector reversals occurred in the streamer-belt-origin plasma between the two successive coronal holes, then the streamer belt plasma was classified as SBP. A total of 2275 hours of SBP identified by Borovsky & Denton [2013] are used here.

The collection of reference SRRP also comes from the work of Xu & Borovsky [2015]. They examined the electron strahl observations and found some broad regions where the electron strahl dropped out around magnetic sector reversals at 1 AU. They denoted the regions just outside the strahl dropout regions, where the strahl was very weak, intermittent, and/or intermittently bidirectional, as "strahl confusion zones". The solar wind from these confusion zones is defined as SRRP. A total of 1740 hours of SRRP are used here.

The magnetic cloud collection made by Lepping et al. [2005] is used to represent EJECT here, and can be found at http://wind.gsfc.nasa.gov/mfi/mag_cloud_pub1.html. Magnetic clouds are believed to be a subset of interplanetary coronal mass ejections (ICMEs) with an enhancement of magnetic field intensity and a gradual rotation in direction. The typical properties of a magnetic cloud are a flux rope field configuration, low proton temperature, and low plasma beta [Klein & Burlaga, 1982]. In general, only about one third of ICMEs can be regarded as magnetic clouds [Bothmer & Schwenn, 1996; Richardson & Cane, 2004].
Xu & Borovsky [2015] found a dual-population structure for the collection of ICMEs identified by Richardson & Cane [2010], but a single population for the collection of magnetic clouds identified by Lepping et al. [2005]. They believed that magnetic clouds better represent ejecta from the Sun, while the collection of ICMEs probably contains some non-ejecta data. A total of 1926 hours of EJECT are used here.

After removing some data gaps, the reference data set is composed of 2881 (33.4%) 1-hr events categorized as CHOP, 2215 (25.7%) events of SBP, 1694 (19.6%) events of SRRP, and 1835 (21.3%) events of EJECT. The imbalance among these four solar wind types may affect the classification accuracy. In general, the accuracy would be relatively low when fewer reference solar wind data are used for training. The fraction of reference SRRP is the lowest. Its classification accuracy is indeed found to be lower than that of the other three types in the following section.

The solar wind parameters used in this study are from the OMNI database (http://omni.gsfc.nasa.gov/), which is primarily a 1963-to-current compilation of hourly-averaged, near-Earth solar wind magnetic field and plasma parameter data from several spacecraft in geocentric or L1 (Lagrange point) orbits. The data have been extensively cross-compared and, for some spacecraft and parameters, cross-normalized.

Table 2. List of 13 parameters used for solar wind classification.

| Parameter | Symbol |
|---|---|
| magnetic field intensity | $B_T$ |
| proton density | $N_p$ |
| proton temperature | $T_p$ |
| solar wind speed | $V_p$ |
| proton-specific entropy | $S_p$ |
| Alfvén speed | $V_A$ |
| temperature ratio | $T_{\mathrm{exp}}/T_p$ |
| alpha-to-proton number density ratio | $N_{\alpha p}$ |
| dynamic pressure | $P_d$ |
| solar wind electric field | $E_y$ |
| plasma beta value | $\beta$ |
| Alfvén Mach number | $M_A$ |
| fast magnetosonic Mach number | $M_f$ |
With the input of solar wind parameters and information on solar wind types, the classifiers can build discriminant rules automatically based on machine learning algorithms. Note that most solar wind spacecraft have no composition instrumentation. To make our classification scheme more widely applicable, the typical solar wind observations (the magnetic field intensity, $B_T$, the proton number density, $N_p$, the alpha particle number density, $N_\alpha$, the proton temperature, $T_p$, and the solar wind speed, $V_p$) and their derived quantities are used here. As listed in Table 2, a total of 13 parameters are used for solar wind classification: $B_T$, $N_p$, $T_p$, $V_p$, the proton-specific entropy, $S_p$, the Alfvén speed, $V_A = B_T/\sqrt{\mu_0 m_p N_p}$ ($\mu_0$ is the permeability of vacuum and $m_p$ is the proton mass), the temperature ratio, $T_{\mathrm{exp}}/T_p$ ($T_{\mathrm{exp}} = (V_p/258)^{3.113}$ is the velocity-dependent expected proton temperature given by Xu & Borovsky [2015] in units of eV), the alpha-to-proton number density ratio, $N_{\alpha p}$, the dynamic pressure, $P_d$, the solar wind electric field, $E_y$, the plasma beta value, $\beta$, the Alfvén Mach number, $M_A = V_p/V_A$, and the fast magnetosonic Mach number, $M_f = V_p/\sqrt{C_s^2 + V_A^2}$ ($C_s$ is the acoustic velocity). Note that this parameter list includes all the parameters used by Xu & Borovsky [2015] and four of the seven parameters used by Camporeale et al. [2017]. As mentioned, the reference solar wind data with known types are from the hourly-averaged OMNI database; thus, only parameters with a temporal resolution of one hour are considered here. The parameters with a temporal resolution of one day used by Camporeale et al. [2017], the sunspot number and solar radio flux (10.7 cm), are not considered. Among the 13 parameters, the specific combination with the highest classification accuracy will be chosen for further analysis.

Figure 1. Probability density distributions of 13 solar wind parameters calculated from the whole reference solar wind data sets. The parameters have been rescaled as follows: $X = (X - \bar{X})/\sigma_X$. The area under each curve equals 1.

Figure 1 shows the probability density distributions of the above 13 parameters calculated from the whole reference solar wind data sets. Similar probability density distributions of $V_p$, $V_A$, $S_p$, and $T_{\mathrm{exp}}/T_p$ are also shown by Camporeale et al. [2017]. Note that the parameters have been rescaled as follows: $X = (X - \bar{X})/\sigma_X$, where $\bar{X}$ represents the mean value of a parameter, and $\sigma_X$ denotes its standard deviation.
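As an illustration of how derived quantities and the rescaling $X = (X - \bar{X})/\sigma_X$ can be evaluated, a minimal sketch is given below. The input units (nT for $B_T$, cm$^{-3}$ for $N_p$, eV for $T_p$), the use of the population standard deviation, the function names, and the sample values are all assumptions made for this example; they are not specified in the text above.

```python
import math
import statistics

MU_0 = 4.0e-7 * math.pi      # permeability of vacuum, H/m
M_PROTON = 1.67262192e-27    # proton mass, kg


def alfven_speed_km_s(b_nt, n_p_cm3):
    """V_A = B_T / sqrt(mu_0 * m_p * N_p), returned in km/s.

    Assumes B_T in nT and N_p in cm^-3 (converted to SI internally).
    """
    b_tesla = b_nt * 1.0e-9
    n_m3 = n_p_cm3 * 1.0e6
    return b_tesla / math.sqrt(MU_0 * M_PROTON * n_m3) / 1.0e3


def proton_specific_entropy(t_p_ev, n_p_cm3):
    """S_p = T_p / N_p^(2/3); assumes T_p in eV and N_p in cm^-3."""
    return t_p_ev / n_p_cm3 ** (2.0 / 3.0)


def rescale(values):
    """Return (X - mean) / sigma for every value (population sigma assumed)."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mean) / sigma for v in values]


# Hypothetical hourly solar wind speeds in km/s.
speeds = [350.0, 420.0, 610.0, 700.0]
scaled_speeds = rescale(speeds)
```

After rescaling, each parameter has zero mean and unit standard deviation, so parameters with very different physical units can be compared on a common axis.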
No single probability distribution separates the four solar wind types well, which motivates classification in a multi-dimensional space. Nevertheless, some parameters help to distinguish one solar wind type from the others. For example, $B_T$ and $M_f$ help to distinguish EJECT from the others, especially from SRRP; $N_p$, $V_p$, and $N_{\alpha p}$ are useful for distinguishing between CHOP and SRRP; $T_p$ and $S_p$ help to distinguish CHOP from the others; and $V_A$ is helpful for distinguishing SRRP from the others. A natural thought is that the classification accuracy would be improved greatly by considering the above eight parameters together. Indeed, the selected eight-dimensional parameter scheme with the best classification accuracy for the KNN classifier contains 7 of the above 8 parameters, with only $V_A$ replaced by $T_{\mathrm{exp}}/T_p$.

Given 13 input features, a total of 8191 combinations exist ($2^{13} - 1$ non-empty subsets). Taking the KNN classifier as an example, the classification accuracy is calculated for all 8191 combinations of input features. The eight-dimensional scheme, the combination of $B_T$, $N_p$, $T_p$, $V_p$, $N_{\alpha p}$, $T_{\mathrm{exp}}/T_p$, $S_p$, and $M_f$, is found to perform the best, with an overall accuracy of 92.8%.
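The count of 8191 combinations can be checked with a short enumeration. The sketch below only lists the candidate feature subsets; the parameter names are plain-text stand-ins for the symbols of Table 2, and the `score` argument of `best_subset` is a hypothetical callable (for instance, a function returning the test accuracy of a classifier trained on that subset), not something defined in the paper.

```python
from itertools import combinations

# Plain-text stand-ins for the 13 parameters of Table 2.
PARAMETERS = [
    "B_T", "N_p", "T_p", "V_p", "S_p", "V_A", "T_exp/T_p",
    "N_alpha_p", "P_d", "E_y", "beta", "M_A", "M_f",
]


def all_feature_subsets(parameters):
    """Yield every non-empty subset of the parameter list."""
    for size in range(1, len(parameters) + 1):
        yield from combinations(parameters, size)


def best_subset(subsets, score):
    """Return the subset with the highest score (score is user supplied)."""
    return max(subsets, key=score)


subsets = list(all_feature_subsets(PARAMETERS))
n_total = len(subsets)                              # 2**13 - 1
n_eight = sum(1 for s in subsets if len(s) == 8)    # number of 8-parameter schemes
```

Of the 8191 candidates, 1287 are eight-parameter schemes, so the selected scheme is one of many subsets of that size that were evaluated.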
Table 3. Classification performances for 10 classifiers based on the combination of $B_T$, $N_p$, $T_p$, $V_p$, $N_{\alpha p}$, $T_{\mathrm{exp}}/T_p$, $S_p$, and $M_f$. The second to sixth columns give the classification accuracy (%). The last column gives the Hanssen and Kuipers' Discriminant, HKSS. Note that 75% of the reference solar wind data are used for training, and the remaining 25% are used for testing. 100 iterations with random selection of the training data are run and the mean accuracies are reported here.

| Classifier | CHOP | SBP | SRRP | EJECT | 4-type | HKSS |
|---|---|---|---|---|---|---|
| KNN | 99.2 | 91.1 | 83.8 | 92.9 | 92.8 | 0.902 |
| XGBoost | 99.2 | 90.9 | 83.6 | 92.8 | 92.6 | 0.898 |
| RF | 99.3 | 90.2 | 81.6 | 94.1 | 92.3 | 0.895 |
| RBFSVM | 99.1 | 89.0 | 81.1 | 94.1 | 91.9 | 0.890 |
| NN | 99.1 | 88.7 | 80.6 | 92.2 | 91.3 | 0.881 |
| DT | 98.1 | 84.8 | 77.6 | 89.0 | 88.7 | 0.846 |
| LSVM | 99.0 | 81.1 | 71.1 | 88.2 | 86.6 | 0.816 |
| QDA | 98.7 | 80.4 | 75.0 | 73.7 | 84.0 | 0.779 |
| GNB | 96.8 | 76.0 | 76.9 | 73.1 | 82.5 | 0.767 |
| AdaBoost | 97.5 | 85.1 | 45.2 | 85.6 | 81.5 | 0.737 |
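The HKSS column of Table 3 is a multi-category skill score that can be computed from a confusion matrix. The sketch below assumes the standard multi-category form of the Hanssen and Kuipers' Discriminant; the function name `hkss`, the matrix layout (rows are observed categories, columns are classified categories), and the example counts are assumptions for illustration only.

```python
def hkss(confusion):
    """Multi-category Hanssen and Kuipers' Discriminant.

    confusion[i][j] is the number of samples observed in category i
    and classified as category j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)                    # total number N
    correct = sum(confusion[i][i] for i in range(k))          # sum of n(F_i, O_i)
    observed = [sum(confusion[i]) for i in range(k)]          # N(O_i)
    classified = [sum(confusion[i][j] for i in range(k))      # N(F_j)
                  for j in range(k)]
    chance = sum(f * o for f, o in zip(classified, observed)) / n ** 2
    reference = sum(o * o for o in observed) / n ** 2
    return (correct / n - chance) / (1.0 - reference)


# Hypothetical two-category example: 50 observations per category.
example = [[40, 10],
           [20, 30]]
skill = hkss(example)
```

For two categories this expression reduces to the true positive rate minus the false positive rate, which for the example above is 40/50 - 20/50.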
The accuracies for classifying CHOP, SBP, SRRP, and EJECT are 99.2%, 91.1%, 83.8%, and 92.9%, respectively. Although this scheme is chosen from 8191 combinations of 13 variables on the basis of practical performance, it is physically meaningful. As shown in Figure 1, these parameters indeed help to distinguish some solar wind types from the others. If new variables are considered, another method of determining the variable combination may also work and greatly reduce the number of tests. For example, first identify the single variable that gives the best classification accuracy when used alone. Then, identify the second variable that gives the best classification accuracy when combined with the first. Finally, repeat the second step until the accuracy cannot be improved by adding any new variable. In fact, a set of mutually independent variables contains enough information about the classification system. Here, some combined parameters, e.g., $S_p$, $V_A$, and $T_{\mathrm{exp}}/T_p$, are used only to improve the classification accuracy. If only the mutually independent variables ($B_T$-$V_p$-$N_p$-$T_p$-$N_{\alpha p}$) are used, the classification accuracy for the 4-type solar wind declines slightly from 92.8% to 92.0%.

The classification is also done for the other 9 classifiers with the same parameter scheme. The results are listed in Table 3. Five classifiers, KNN, XGBoost, RF, RBFSVM, and NN, produce accuracies better than 90%. DT and LSVM also perform well, with overall accuracies better than 85%. The remaining classifiers, QDA, GNB, and AdaBoost, yield accuracies between 80% and 85%. It should be mentioned that the overall accuracy of the other 9 classifiers could be improved if parameter combinations tailored to each of them were used. The identification of CHOP is relatively easy. All 10 classifiers work very well, with accuracies better than 96.5% and the highest accuracy, 99.3%, given by RF.
For identifying EJECT, the accuracy decreases slightly. Only 5 classifiers yield accuracies better than 92%, and the highest accuracy, given by RBFSVM, is 94.1%. For identifying SBP, only 3 classifiers yield accuracies better than 90%, with the highest accuracy, 91.1%, given by KNN. The identification of SRRP is relatively difficult. Only 5 classifiers yield accuracies better than 80%, and the highest accuracy, given by KNN, is only 83.8%. Note that 75% of the reference solar wind data are used for training, and the remaining 25% are used for testing. To make sure that our results are independent of the choice of training data set, cross validation is necessary. Thus, we perform 100 runs with the training data set chosen randomly. The accuracy given in Table 3 is the average over the 100 tests.

Besides the classification accuracy, the Hanssen and Kuipers' Discriminant, HKSS, is also given in Table 3. The HKSS, also known as the True Skill Statistic, represents the classification accuracy relative to that of random chance. For multi-category classification, its expression can be written as follows:

$$\mathrm{HKSS} = \frac{\dfrac{1}{N}\sum_{i=1}^{K} n(F_i, O_i) - \dfrac{1}{N^2}\sum_{i=1}^{K} N(F_i)\,N(O_i)}{1 - \dfrac{1}{N^2}\sum_{i=1}^{K} \left(N(O_i)\right)^2} \qquad (1)$$

where $n(F_i, O_i)$ denotes the number of classifications in category $i$ that had observations in category $i$, $N(F_i)$ denotes the total number of classifications in category $i$, $N(O_i)$ denotes the total number of observations in category $i$, and $N$ is the total number of classifications. HKSS ranges from -1 to 1: 1 represents the perfect performance, 0 denotes no improvement over a reference classification, and ≤ […]

[…] $S_p$ is not considered, the classification accuracy has the least decrease, 0.1%, and the accuracy has the largest decrease, 2.2%, when $N_{\alpha p}$ is not considered. However, this does not imply that $S_p$ is the least important variable in solar wind classification. In fact, among the 13 variables, the highest classification accuracy from a single variable is obtained by using $S_p$ alone. For a different parameter combination, the most sensitive parameter would be different as well.

It is hard to make sure that the result of supervised machine learning is neither over-fitted nor under-fitted. Comparing the accuracy on the training and testing data sets is a good check, but not a sufficient one. Cross validation is another strategy for addressing such problems. Following the methodology of Camporeale et al. [2017], we also compare the results of 100 runs for different ratios of the training data. In general, over-fitting is especially likely in cases where training examples are rare. Thus, relatively large ratios of training data, namely 45%, 60%, 75%, and 90%, are used, and the results are shown in Figure 2. The boxes denote the first and third quartiles of the accuracy distribution. The horizontal lines and triangles represent the median and mean values, respectively. The whiskers denote the 2nd and 98th percentiles. It is clear that the mean accuracy slightly increases when the ratio of training data increases from 45% to 75%.
For the ratio of 90%, the accuracy shows no significant improvement; however, the spread of the classification accuracy increases significantly, and the lowest accuracy even decreases slightly for identifying SBP, SRRP, and EJECT. In the following text, the accuracies are all obtained by using 75% of the data for training. This is just a simple approach to judging whether over-fitting occurs. There may exist other, more robust, means of examining over-fitting or under-fitting. Camporeale et al. [2017] showed the accuracy of the Gaussian Process classification model with 10%, 15%, 20%, and 25% of the original data used for training. Similarly, the accuracy increases when more data are used for training.

Figure 2. Accuracy of the KNN classifier calculated from 100 runs with different ratios of the training data set chosen randomly. The boxes denote the first and third quartiles of the accuracy distribution. The horizontal lines and triangles represent the median and mean values, respectively. The whiskers denote the 2nd and 98th percentiles.
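The repeated-run experiment behind Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the configuration used in this study: it relies on a small pure-Python k-nearest-neighbors vote with k = 5 (the value of k is not given in the text), on synthetic two-class Gaussian data instead of the reference solar wind data, and on 20 runs per ratio to keep the example fast. All function and variable names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Classify one point by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy_for_split(x, y, train_fraction, rng, k=5):
    """Accuracy on the held-out part of one random train/test split."""
    indices = list(range(len(x)))
    rng.shuffle(indices)
    n_train = int(round(train_fraction * len(indices)))
    train, test = indices[:n_train], indices[n_train:]
    train_x = [x[i] for i in train]
    train_y = [y[i] for i in train]
    hits = sum(knn_predict(train_x, train_y, x[i], k) == y[i] for i in test)
    return hits / len(test)


def accuracy_distribution(x, y, train_fraction, runs=100, seed=0):
    """Accuracies from repeated random splits at a fixed training ratio."""
    rng = random.Random(seed)
    return [accuracy_for_split(x, y, train_fraction, rng) for _ in range(runs)]


# Synthetic, well-separated two-class data (stand-in for real solar wind data).
data_rng = random.Random(1)
x = ([[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(60)]
     + [[data_rng.gauss(6.0, 1.0), data_rng.gauss(6.0, 1.0)] for _ in range(60)])
y = ["A"] * 60 + ["B"] * 60

summary = {}
for fraction in (0.45, 0.60, 0.75, 0.90):
    acc = accuracy_distribution(x, y, fraction, runs=20)
    summary[fraction] = (statistics.fmean(acc), statistics.median(acc))
```

The mean and median stored in `summary` correspond to the triangles and horizontal lines of a box plot such as Figure 2; quartiles and percentiles could be obtained from the same accuracy lists.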
Figure 3. Receiver operating characteristic (ROC) curves for CHOP, SBP, SRRP, and EJECT. The False Positive Rate is defined as the number of false positives divided by the total number of negatives. The True Positive Rate denotes the number of true positives divided by the total number of positives. The area under the curve represents the goodness of binary classification, and unity denotes the perfect result.
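The rates defined in the caption of Figure 3 can be turned into an ROC curve by sweeping a probability threshold for one solar wind type against all others. The sketch below is a generic illustration using the trapezoidal rule for the area; the function names, the scores, and the labels are hypothetical and do not come from the paper.

```python
def roc_curve_points(scores, is_positive):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold over all scores.

    scores[i] is the predicted probability that sample i belongs to the type
    of interest; is_positive[i] is True when it actually does.
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, p in zip(scores, is_positive) if s >= threshold and p)
        fp = sum(1 for s, p in zip(scores, is_positive) if s >= threshold and not p)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical probabilities for four samples, two of which are positives.
scores = [0.9, 0.6, 0.4, 0.2]
is_positive = [True, False, True, False]
auc = area_under_curve(roc_curve_points(scores, is_positive))
```

An area of unity corresponds to a classifier that ranks every positive sample above every negative one; in this toy example one negative outranks one positive, which lowers the area.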
For binary classification, changing the probability threshold changes the accuracy in terms of true and false positives and negatives. Here, “true/false” denotes correct or incorrect classification, and “positive/negative” denotes that the solar wind is classified to be, or not to be, a given type. Thus, “true positive/false positive” denotes that the solar wind is correctly/incorrectly classified to be a given type, while “true negative/false negative” denotes that the solar wind is correctly/incorrectly classified not to be a given type. The receiver operating characteristic (ROC) curve for different threshold values gives a concise representation of this metric. The horizontal axis is the False Positive Rate (FPR), defined as the number of false positives divided by the total number of negatives. The vertical axis is the True Positive Rate (TPR), defined as the number of true positives divided by the total number of positives. A perfect classification would give FPR = 0 and TPR = 1, and the area under the ROC curve would equal unity. Figure 3 shows the ROC curves for CHOP, SBP, SRRP, and EJECT. The areas under the curves are 0.996, 0.967, 0.955, and 0.980, respectively, indicating that the classification performs well. In practice, the probability threshold can be chosen to be 0.3-0.5 to obtain optimal FPR and TPR, which is consistent with Camporeale et al. [2017].

Figure 4 shows an example of solar wind classification obtained by the KNN classifier. The shaded regions represent the time intervals of reference solar wind with known types. In general, all the solar wind types can be distinguished well. It is clear that the CHOP, SBP, and EJECT in the shaded regions are identified almost perfectly, with accuracies of nearly 100%. The classification accuracy for SRRP is not as high but still good, ∼ […] http://fluxrope.info/) given by Dr. Jinlei Zheng and Dr. Qiang Hu at University of Alabama Huntsville. This indicates that our categorization scheme may in certain cases be useful for identifying small flux ropes, but more investigation and validation are needed.

Table 4 compares the performance of various categorization schemes. The O7+/O6+-$V_p$ scheme proposed by Zhao et al. [2009] cannot distinguish SBP and SRRP, and does not work well for identifying EJECT. Its accuracy is only 63.5%. Xu & Borovsky [2015] proposed the $S_p$-$V_A$-$T_{exp}/T_p$ scheme, which significantly improves the identification of EJECT and increases the accuracy to 87.5%. In addition, such a scheme can also distinguish SBP and SRRP, with an accuracy ∼ […] $B_T$-$N_P$-$T_P$-$V_P$-$N_{\alpha p}$-$T_{exp}/T_P$-$S_p$-$M_f$ scheme, on the KNN classifier, and obtain significant improvements in classification accuracies. The improvements in accuracy for identifying CHOP, SBP, SRRP, and EJECT are 2.3%, 21.2%, 11.8%, and 5.4%, respectively. For the 4-type solar wind classification, the overall accuracy improves by 9.6%. It should be mentioned that the feature space has been optimized only for the KNN approach.
Figure 4.
An example of solar wind classification obtained by the KNN classifier. From top to bottom, the panels show the magnetic field intensity, the proton number density, the solar wind speed, the proton temperature, the proton-specific entropy, the plasma beta value, the fast magnetosonic Mach number, the dynamic pressure, and the ratio of proton and alpha number density. The units are nT, cm$^{-3}$, km/s, eV, eV cm$^2$, unity, unity, nPa, and unity, respectively. The shaded regions represent the time intervals of reference solar wind with known types.

Table 4.
Accuracies of various categorization schemes in solar wind classification. Note that 25% of the database is used for training in Camporeale et al. [2017], but the ratio is 75% in our study. 100 iterations with random selection of the training data are run and the mean accuracies are reported here.
Columns: Accuracy (%) for CHOP, SBP, SRRP, EJECT, and 4-type.
Rows (schemes): O7+/O6+-$V_p$; $S_p$-$V_A$-$T_{exp}/T_p$; $S_p$-$V_A$-$T_{exp}/T_p$; $V_p$-$\sigma_T$-SSN-$F_{10.7}$-$V_A$-$S_p$-$T_{exp}/T_p$; $V_p$-$\sigma_T$-SSN-$F_{10.7}$-$V_A$-$S_p$-$T_{exp}/T_p$; $B_T$-$N_P$-$T_P$-$V_P$-$N_{\alpha p}$-$T_{exp}/T_P$-$S_p$-$M_f$.
[table values not recoverable]

For other classifiers with some other parameter scheme used, the accuracies could be improved. Camporeale et al. [2017] proposed a classification scheme based on a Gaussian Process classifier. By using the $V_p$-$\sigma_T$-SSN-$F_{10.7}$-$V_A$-$S_p$-$T_{exp}/T_p$ scheme ($\sigma_T$ is the standard deviation of the proton temperature, SSN is the sunspot number, and $F_{10.7}$ […] Camporeale et al. [2017] used a mixture of hourly averaged solar wind parameters and daily sampled parameters to obtain “excellent” classification accuracies; however, this is not recommended in this study. Firstly, the time resolution of SSN and $F_{10.7}$ […] SSN and $F_{10.7}$ […] SSN-$F_{10.7}$ […] SSN-
$F_{10.7}$ […] SSN vs. $F_{10.7}$. At the same time, the corresponding decision boundaries for each solar wind category are too complicated to eliminate concerns about possible over-fitting. One plausible reason is that there are only 479 independent data points in the plot of SSN vs. $F_{10.7}$ […] < […] $M_f$ vs. $S_p$ is also shown in Figure 5. Although the overall accuracy given by the KNN classifier is 79.2%, much lower than that for SSN-$F_{10.7}$ […] SSN and $F_{10.7}$ […] SSN and $F_{10.7}$ […]

In the previous classification schemes in two- or three-dimensional parameter space, solar wind composition measurement indeed plays an important role in solar wind classification. However, it is still difficult to conclude that the composition measurement is thus indispensable. To show the importance of composition information in solar wind classification, we have accessed the 1-hr composition data (C6+/C5+, O7+/O6+) from the ACE satellite during 1998-2011. During this time interval, the reference solar wind data sets with data gaps removed comprise 8021 hours: CHOP (2881 hours), SBP (2215 hours), SRRP (1694 hours), and EJECT (1231 hours). Compared to the data sets without composition information, the EJECT data are reduced from 1835 hours to 1231 hours, and the CHOP, SBP, and SRRP data are the same. The overall classification accuracy obtained by using C6+/C5+ or O7+/O6+ alone is 51.0% and 65.9%, respectively, which is lower than or comparable to the accuracy of 66.7% obtained when $S_p$ is used alone.

Figure 5.
Top: Distribution of reference solar wind in the plot of SSN vs. $F_{10.7}$ […] $M_f$ vs. $S_p$. Bottom: Corresponding decision boundaries for each solar wind category. The overall accuracy given by the KNN classifier under the SSN-$F_{10.7}$ […] $M_f$-$S_p$ scheme.
The comparison of classification results with and without composition information is shown in Table 5. The classification results indeed show some minor improvements, especially when O7+/O6+ information is considered. But the improvements are not very significant, only 1.5%. Considering that most solar wind spacecraft, for example, the recent Parker Solar Probe, do not carry a composition instrument, a solar wind classification scheme without composition information is still useful.

Table 5.
Comparison of solar wind classification with/without composition information
                       CHOP   SBP    SRRP   EJECT  4-type  HKSS
Without Composition    99.3   91.4   85.1   92.5   93.1    0.903
C6+/C5+                […]
O7+/O6+                […]
C6+/C5+ & O7+/O6+      […]

The reference solar wind with known types is very important for supervised machine learning. In this study, the reference solar wind data come from work based on human experience, which may have some uncertainties, especially at the boundaries of events. A natural thought is that the central part of an event has the highest probability of being correctly labeled. In practice, if the 3-hr data points at both boundaries were deleted for each EJECT event, the classification accuracy of EJECT would improve by 2.2%. Thus, further improvement of the classification accuracy by machine learning is limited by the uncertainties of the reference solar wind data set.
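The HKSS values reported alongside the accuracies follow Equation (1). A minimal sketch of that computation from a confusion matrix is given below. It assumes Equation (1) has the standard multi-category Hanssen-Kuipers form, with the sum of correct classifications normalized by $N$ and the products of marginal totals normalized by $N^2$; the function name `hkss` and the example matrices are hypothetical and do not reproduce any result of this study.

```python
def hkss(confusion):
    """Hanssen-Kuipers skill score for a K x K confusion matrix.

    confusion[i][j] = number of samples classified as category i
    whose reference (observed) category is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)                        # N
    correct = sum(confusion[i][i] for i in range(k))              # sum of n(F_i, O_i)
    n_f = [sum(confusion[i]) for i in range(k)]                   # N(F_i): row totals
    n_o = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # N(O_j): column totals
    numerator = correct / n - sum(f * o for f, o in zip(n_f, n_o)) / n ** 2
    denominator = 1.0 - sum(o * o for o in n_o) / n ** 2
    return numerator / denominator


# Hypothetical 4-type confusion matrix (rows: classified, columns: reference).
example = [
    [95, 3, 1, 1],
    [2, 88, 8, 2],
    [1, 7, 85, 3],
    [2, 2, 6, 94],
]
print(hkss(example))
```

A purely diagonal matrix gives a score of 1, and a matrix whose entries are independent of the reference category gives 0, matching the interpretation of the score as improvement over a reference classification.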
Information on the solar wind origin may be helpful for early warnings of space weather. Firstly, the solar wind category is useful for the risk evaluation of a predicted geomagnetic storm. Turner et al. [2009] showed that the storm intensity and the occurrence rate of intense storms (Dst minimum < -100 nT) are larger for ICME-driven storms than for CIR-driven storms. The average Dst minimum during a CIR-driven storm is ∼ -74 nT, and the occurrence rate of intense storms is only 13%; for ICME-driven storms, however, these two values are -128 nT and 57%, respectively. Besides, all superstorms, with Dst minimum < -300 nT and the midday magnetopause shifting earthward of geosynchronous orbit [Li et al., 2010], are associated with ICMEs. Secondly, the classification of CHOP and EJECT is also helpful for the risk evaluation of surface charging of geosynchronous spacecraft. Borovsky & Denton [2006] and Denton et al. [2006] found that the magnitude of spacecraft potential is, on average, significantly more elevated during CIR-driven storms than during ICME-driven storms. Thirdly, McGranaghan et al. [2014] showed that SBP and SRRP produce forecastable changes in thermospheric density.

Gonzalez & Tsurutani [1987] suggested that storm intensity depends on the intensity of the southward interplanetary $B_Z$, and the threshold for intense storms is summarized to be -10 nT. Echer et al. [2008] later found that storm intensity depends on the solar wind electric field $E_Y$, and the threshold for intense storms is summarized to be 5 mV/m. If $B_Z$ ≤ -10 nT and $E_Y$ ≥ […] $B_Z$ is observed to be -11.2 nT at 00:00 on Feb-27-2003; moreover, the corresponding $E_Y$ is observed to be 5.03 mV/m. Based on our classification algorithm, the solar wind plasma is categorized as SBP, indicating a possible CIR-driven storm. Borovsky & Denton [2013] indeed identified that event as a pseudostreamer CIR. Thus, the impending storm is predicted to be a moderate storm with high probability; at the same time, the risk of dangerous spacecraft surface charging is predicted to be high. As a validation, the storm that actually occurred is identified as a moderate storm, with a Dst minimum of -60 nT. Besides, the magnitude of the spacecraft potential ($\Phi$) in geosynchronous orbit during this storm is close to 4000 V. For the second case, similar $B_Z$ and $E_Y$ […]
Table 6.
Application of the information of solar wind origin in improving space weather forecast.
Time                  $B_Z$ (nT)  $E_Y$ (mV/m)  Type   Forecast                                Dst min (nT)  $\Phi^a$ (V)
Feb-27-2003 00:00 UT  -11.2       5.03          SBP    Moderate CIR-storm, high charging risk  -60           4000
Nov-08-1998 00:00 UT  -10.8       5.08          EJECT  Intense ICME-storm, low charging risk   -149          900
a Data from the LANL/MPA instrument.

At present, we use in-situ observations at the L1 point to classify the solar wind, and can issue a space weather early warning half an hour or more in advance. There could be more utility for the present classification scheme if a solar wind monitor were placed at L5. Besides, we are still working on improving the lead time of solar wind classification by using observations of the Sun’s surface.
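The way the solar wind type refines a storm forecast in Table 6 can be written as a simple rule. The sketch below is only an illustration of that logic, not an operational forecast model: the function name `storm_outlook` is hypothetical, it assumes that both the $B_Z$ threshold (-10 nT) and the $E_Y$ threshold (5 mV/m) must be met, and it covers only the two solar wind types that appear in Table 6.

```python
def storm_outlook(bz_nt, ey_mv_per_m, wind_type):
    """Combine the B_Z / E_Y thresholds with the solar wind type.

    Returns None when the thresholds quoted in the text are not both met,
    otherwise a (storm forecast, charging risk) pair.
    """
    if bz_nt > -10.0 or ey_mv_per_m < 5.0:
        return None  # below the intense-storm thresholds: no forecast from this rule
    if wind_type == "EJECT":
        # ICME-driven storms tend to be more intense, with lower surface-charging risk.
        return ("Intense ICME-storm", "low charging risk")
    if wind_type == "SBP":
        # CIR-driven storms tend to be moderate, with higher surface-charging risk.
        return ("Moderate CIR-storm", "high charging risk")
    # Types not covered by Table 6 are left undecided in this sketch.
    return ("storm expected, driver undetermined", "charging risk undetermined")


# The two cases listed in Table 6.
print(storm_outlook(-11.2, 5.03, "SBP"))
print(storm_outlook(-10.8, 5.08, "EJECT"))
```

The two calls reproduce the Forecast column of Table 6; without the type information, both cases would receive the same forecast because their $B_Z$ and $E_Y$ values are similar.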
Solar wind categorization is conducive to understanding the solar wind origin and the physical processes ongoing at the Sun. Given the large volume of spacecraft observations, manual classification based on extensive experience is prohibitively time-consuming. Automatic classification methods are therefore much needed. Recently, with rapid developments in the field of artificial intelligence, classification by machine learning has become more and more popular and powerful in large-volume data analysis, and its performance is improving as well.

In this study, 10 popular supervised machine learning models, k-nearest neighbors (KNN), linear support vector machines (LSVM), SVM with a Gaussian radial basis function kernel (RBFSVM), Decision Trees (DT), Random Forests (RF), Adaptive Boosting (AdaBoost), Neural Network (NN), Gaussian Naive Bayes (GNB), Quadratic Discriminant Analysis (QDA), and eXtreme Gradient Boosting (XGBoost), are used to classify the solar wind at 1 AU into four plasma types: coronal-hole-origin plasma, streamer-belt-origin plasma, sector-reversal-region plasma, and ejecta.
A total of 13 parameters, each with 1-hr temporal resolution, are used for training the classifiers and searching for the best variable scheme. These parameters are the magnetic field intensity $B_T$, the proton number density $N_P$, the proton temperature $T_P$, the solar wind speed $V_P$, the proton-specific entropy $S_p$, the Alfvén speed $V_A$, the ratio of velocity-dependent expected proton temperature and proton temperature $T_{exp}/T_P$, the number density ratio of proton and alpha $N_{\alpha p}$, the dynamic pressure $P_d$, the solar wind electric field $E_y$, the plasma beta value $\beta$, the Alfvén Mach number $M_A$, and the fast magnetosonic Mach number $M_f$. Note that all the parameters can be obtained or derived from typical solar wind observations. No composition measurements are needed, allowing our algorithm to be applied to most solar wind spacecraft data.

By exhaustive enumeration, an eight-dimensional scheme ($B_T$, $N_P$, $T_P$, $V_P$, $N_{\alpha p}$, $T_{exp}/T_P$, $S_p$, and $M_f$) is found to obtain the highest classification accuracy among all 8191 combinations of the above 13 parameters. Among the 10 popular classifiers, the KNN classifier obtains the highest accuracy, 92.8%. It improves the accuracy by 9.6% over existing manual schemes. In addition, small-scale flux rope events may also be identifiable with our method, though further validation is needed. Besides, two application examples of solar wind classification are given, indicating that it is helpful for the risk evaluation of predicted magnetic storms and surface charging of geosynchronous spacecraft.

This work emphasizes the classification technique itself rather than the science of the solar wind origin. In the future, as new solar wind types and corresponding ideal events are proposed in the community, our machine learning approach will be updated accordingly, and more effort is needed to bring new understanding to the science of the solar wind origin.
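The figure of 8191 follows from counting every non-empty subset of 13 parameters, $2^{13} - 1$. The sketch below enumerates these subsets and shows the shape of an exhaustive search. The ASCII parameter labels, the function names, and in particular `toy_score` are assumptions for illustration: in this study the score would be the classification accuracy of a trained classifier, which is not reproduced here.

```python
from itertools import combinations

# The 13 candidate parameters listed in the text (ASCII labels chosen for this sketch).
PARAMETERS = [
    "B_T", "N_P", "T_P", "V_P", "S_p", "V_A", "T_exp/T_P",
    "N_alpha_p", "P_d", "E_y", "beta", "M_A", "M_f",
]


def all_schemes(names):
    """Yield every non-empty combination of the candidate parameters."""
    for size in range(1, len(names) + 1):
        yield from combinations(names, size)


def best_scheme(names, score):
    """Return the combination with the highest score.

    `score` is a hypothetical callable mapping a tuple of parameter names
    to a number, for example a mean classification accuracy.
    """
    return max(all_schemes(names), key=score)


n_schemes = sum(1 for _ in all_schemes(PARAMETERS))
print(n_schemes)  # 2**13 - 1 = 8191

# Toy stand-in score, NOT a classifier: it simply rewards overlap with a fixed target set.
TARGET = {"B_T", "N_P", "T_P", "V_P", "N_alpha_p", "T_exp/T_P", "S_p", "M_f"}


def toy_score(scheme):
    chosen = set(scheme)
    return len(chosen & TARGET) - len(chosen - TARGET)


print(best_scheme(PARAMETERS, toy_score))
```

Because each candidate scheme requires training and testing a classifier, the cost of such a search grows exponentially with the number of parameters, which is manageable for 13 parameters but would not scale to much larger feature sets.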
Acknowledgments
NNSFC grants 41874203, 41574169, 41574159, 41731070, Young Elite Scientists Sponsorship Program by CAST, 2016QNRC001, and grants from Chinese Academy of Sciences (QYZDJ-SSW-JSC028, XDA15052500). H. Li was also supported by the Youth Innovation Promotion Association of the Chinese Academy of Sciences, and in part by the Specialized Research Fund for State Key Laboratories of China.
References
Antiochos, S. K., Mikić, Z., Titov, V. S., Lionello, R., & Linker, J. A. (2011). A model for the sources of the slow solar wind. The Astrophysical Journal, 731(2), 112. https://doi.org/10.1088/0004-637X/731/2/112.
Antonucci, E., Abbo, L., & Dodero, M. A. (2005). Slow wind and magnetic topology in the solar minimum corona in 1996-1997. Astronomy & Astrophysics, 435(2), 699-711. https://doi.org/10.1051/0004-6361:20047126.
Arge, C. N., Odstrcil, D., Pizzo, V. J., & Mayer, L. R. (2003). Improved Method for Specifying Solar Wind Speed Near the Sun. Paper presented at the Solar Wind Ten, https://doi.org/10.1063/1.1618574.
Arya, S., & Freeman, J. W. (2012). Estimates of solar wind velocity gradients between 0.3 and 1 AU based on velocity probability distributions from Helios 1 at perihelion and aphelion. Journal of Geophysical Research: Space Physics, 96(A8), 14183-14187. https://doi.org/10.1029/91JA01135.
Asbridge, J. R., Bame, S. J., Feldman, W. C., & Montgomery, M. D. (1976). Helium and hydrogen velocity differences in the solar wind. Journal of Geophysical Research, 81(16), 2719-2727. https://doi.org/10.1029/JA081i016p02719.
Bame, S. J., Asbridge, J. R., Feldman, W. C., & Gosling, J. T. (1977). Evidence for a structure-free state at high solar wind speeds. Journal of Geophysical Research, 82, 1487-1492. https://doi.org/10.1029/JA082i010p01487.
Borovsky, J. E., & Denton, M. H. (2006). Differences between CME-driven storms and CIR-driven storms. Journal of Geophysical Research, 111(A7). https://doi.org/10.1029/2005JA011447.
Borovsky, J. E. (2008). Flux tube texture of the solar wind: Strands of the magnetic carpet at 1 AU? Journal of Geophysical Research: Space Physics, 113(A8). https://doi.org/10.1029/2007JA012684.
Borovsky, J. E. (2010). On the variations of the solar wind magnetic field about the Parker spiral direction. Journal of Geophysical Research: Space Physics, 115(A9), A09101. https://doi.org/10.1029/2009JA015040.
Borovsky, J. E. (2012). The velocity and magnetic field fluctuations of the solar wind at 1 AU: Statistical analysis of Fourier spectra and correlations with plasma properties. Journal of Geophysical Research: Space Physics, 117(A5), A05104. https://doi.org/10.1029/2011JA017499.
Borovsky, J. E., & Denton, M. H. (2013). The differences between storms driven by helmet streamer CIRs and storms driven by pseudostreamer CIRs. Journal of Geophysical Research: Space Physics, 118(9), 5506-5521. https://doi.org/10.1002/jgra.50524.
Borovsky, J. E., & Denton, M. H. (2014). Exploring the cross correlations and autocorrelations of the ULF indices and incorporating the ULF indices into the systems science of the solar wind-driven magnetosphere. Journal of Geophysical Research: Space Physics, 119(6), 4307-4334. https://doi.org/10.1002/2014JA019876.
Bothmer, V., & Schwenn, R. (1996). Signatures of fast CMEs in interplanetary space. Advances in Space Research, 17(4), 319-322. https://doi.org/10.1016/0273-1177(95)00593-4.
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software. ISBN 978-0-412-04841-8.
Buhmann, M. D. (2003). Radial Basis Functions: Theory and Implementations. Cambridge University Press. ISBN 978-0-521-63338-3.
Cady, F. (2017). Machine Learning Classification, in The Data Science Handbook, John Wiley & Sons, Inc., Hoboken, New Jersey. doi: 10.1002/9781119092919.ch8.
Camporeale, E., Carè, A., & Borovsky, J. E. (2017). Classification of Solar Wind With Machine Learning. Journal of Geophysical Research: Space Physics, 122(11), 10,910-10,920. https://doi.org/10.1002/2017JA024383.
Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785-794). New York, NY, USA: ACM. https://doi.org/10.1145/2939672.2939785.
Crooker, N. U., Antiochos, S. K., Zhao, X., & Neugebauer, M. (2012). Global network of slow solar wind. Journal of Geophysical Research: Space Physics, 117(A4), A04104. https://doi.org/10.1029/2011JA017236.
Denoeux, T. (1995). A K-Nearest Neighbor Classification Rule Based on Dempster-Shafer Theory. IEEE Transactions on Systems Man and Cybernetics, 25(5), 804-813. https://doi.org/10.1109/21.376493.
Denton, M. H., Borovsky, J. E., Skoug, R. M., Thomsen, M. F., Lavraud, B., Henderson, M. G., et al. (2006). Geomagnetic storms driven by ICME- and CIR-dominated solar wind. Journal of Geophysical Research, 111(A7). https://doi.org/10.1029/2005JA011436.
Echer, E., Gonzalez, W. D., Tsurutani, B. T., & Gonzalez, A. L. C. (2008). Interplanetary Conditions Causing Intense Geomagnetic Storms (Dst ≤ -100 nT) during solar cycle 23 (1996-2006). Journal of Geophysical Research, 113, A05221. https://doi.org/10.1029/2007JA012744.
Eyni, M., & Steinitz, R. (1978). Cooling of slow solar wind protons from the HELIOS 1 experiment. Journal of Geophysical Research, 83, 4387. https://doi.org/10.1029/JA083iA09p04387.
Fan, R. E., Chang, K. W., Hsieh, C. J., Wang, X. R., & Lin, C. J. (2008). LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9, 1871-1874.
Feldman, U., Landi, E., & Schwadron, N. A. (2005). On the sources of fast and slow solar wind. Journal of Geophysical Research: Space Physics, 110, A07109. https://doi.org/10.1029/2004JA010918.
Fisk, L. A., Zurbuchen, T. H., & Schwadron, N. A. (1999). On the coronal magnetic field: consequences of large-scale motions. The Astrophysical Journal, 521, 868-877. https://doi.org/10.1086/307556.
Foullon, C., Lavraud, B., Luhmann, J. G., Farrugia, C. J., Retinò, A., Simunac, K. D. C., et al. (2011). Plasmoid releases in the heliospheric current sheet and associated coronal hole boundary layer evolution. The Astrophysical Journal, 737(1), 16. https://doi.org/10.1088/0004-637X/737/1/16.
Gonzalez, W. D., & Tsurutani, B. T. (1987). Criteria of interplanetary parameters causing intense magnetic storms (Dst < -100 nT). Planetary and Space Science, 35, 1101.
Gosling, J. T., Borrini, G., Asbridge, J. R., Bame, S. J., Feldman, W. C., & Hansen, R. T. (2012). Coronal streamers in the solar wind at 1 AU. Journal of Geophysical Research: Space Physics, 86(A7), 5438-5448. https://doi.org/10.1029/JA086iA07p05438.
Hellinger, P., Matteini, L., Štverák, Š., Trávníček, P. M., & Marsch, E. (2011). Heating and cooling of protons in the fast solar wind between 0.3 and 1 AU: Helios revisited. Journal of Geophysical Research: Space Physics, 116(A9), A09105. https://doi.org/10.1029/2011JA016674.
Ho, T. K. (1995). Random Decision Forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, 14-16 August 1995, pp. 278-282.
Jian, L., Russell, C. T., Luhmann, J. G., & Skoug, R. M. (2006). Properties of Interplanetary Coronal Mass Ejections at One AU During 1995-2004. Solar Physics, 239(1-2), 393-436. https://doi.org/10.1007/s11207-006-0133-2.
Klein, L. W., & Burlaga, L. F. (1982). Interplanetary magnetic clouds at 1 AU. Journal of Geophysical Research, 87, 613-624. https://doi.org/10.1029/JA087iA02p00613.
Kunow, H., Crooker, N. U., Linker, J. A., Schwenn, R., & Von Steiger, R. (Eds.). (2006). Coronal mass ejections. Dordrecht; Norwell, MA: Springer, 484.
Lepping, R. P., Wu, C.-C., & Berdichevsky, D. B. (2005). Automatic identification of magnetic clouds and cloud-like regions at 1 AU: occurrence rate and other properties. Annales Geophysicae, 23(7), 2687-2704. https://doi.org/10.5194/angeo-23-2687-2005.
Li, G., Miao, B., Hu, Q., & Qin, G. (2011). Effect of Current Sheets on the Solar Wind Magnetic Field Power Spectrum from the Ulysses Observation: From Kraichnan to Kolmogorov Scaling. Physical Review Letters, 106(12). https://doi.org/10.1103/PhysRevLett.106.125001.
Li, H., Wang, C., & Kan, J. R. (2010). Midday magnetopause shifts earthward of geosynchronous orbit during geomagnetic superstorms with Dst ≤ -300 nT. Journal of Geophysical Research, 115, A08230. https://doi.org/10.1029/2009JA014612.
Li, H., Wang, C., He, J., Zhang, L., Richardson, J. D., Belcher, J. W., & Tu, C. (2016). Plasma heating inside interplanetary coronal mass ejections by Alfvénic fluctuations dissipation. The Astrophysical Journal Letters, 831(2), L13. https://doi.org/10.3847/2041-8205/831/2/L13.
Li, H., Wang, C., Richardson, J. D., & Tu, C. (2017). Evolution of Alfvénic Fluctuations inside an Interplanetary Coronal Mass Ejection and Their Contribution to Local Plasma Heating: Joint Observations from 1.0 to 5.4 au. The Astrophysical Journal, 851(1), L2. https://doi.org/10.3847/2041-8213/aa9c3f.
Luttrell, A. H., & Richter, A. K. (1988). The role of Alfvénic fluctuations in MHD turbulence evolution between 0.3 and 1 AU, in Proceedings of the Sixth International Solar Wind Conference, NCAR TN-306, edited by V. J. Pizzo, T. E. Holzer, and D. G. Sime, 335 pp., Boulder, Colo.
Mariani, F., Bavassano, B., & Villante, U. (1983). A statistical study of MHD discontinuities in the inner solar system-HELIOS 1 and 2. Solar Physics, 83, 349-365. https://doi.org/10.1007/BF00148285.
Marsch, E., Rosenbauer, H., Schwenn, R., Muehlhaeuser, K.-H., & Neubauer, F. M. (1982). Solar wind helium ions-Observations of the HELIOS solar probes between 0.3 and 1 AU. Journal of Geophysical Research, 87, 35-51. https://doi.org/10.1029/JA087iA01p00035.
Matthaeus, W. H., Breech, B., Dmitruk, P., Bemporad, A., Poletto, G., Velli, M., & Romoli, M. (2007). Density and Magnetic Field Signatures of Interplanetary 1/f Noise. The Astrophysical Journal Letters, 657(2), L121. https://doi.org/10.1086/513075.
McComas, D. J., Ebert, R. W., Elliott, H. A., Goldstein, B. E., Gosling, J. T., Schwadron, N. A., & Skoug, R. M. (2008). Weaker solar wind from the polar coronal holes and the whole Sun. Geophysical Research Letters, 35, L18103. https://doi.org/10.1029/2008GL034896.
McGranaghan, R., Knipp, D. J., McPherron, R. L., & Hunt, L. A. (2014). Impact of equinoctial high-speed stream structures on thermospheric responses. Space Weather, 12, 277-297. https://doi.org/10.1002/2014SW001045.
Moldwin, M. B., Ford, S., Lepping, R., Slavin, J., & Szabo, A. (2000). Small-scale magnetic flux ropes in the solar wind. Geophysical Research Letters, 27(1), 57-60. https://doi.org/10.1029/1999GL010724.
Neugebauer, M., Steinberg, J. T., Tokar, R. L., Barraclough, B. L., Dors, E. E., Wiens, R. C., et al. (2003). Genesis On-board Determination of the Solar Wind Flow Regime. In C. T. Russell (Ed.), The Genesis Mission (pp. 153-171). Dordrecht: Springer Netherlands. https://doi.org/10.1007/978-94-010-0241-7_6.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830. doi: 10.3389/fninf.2014.00014.
Perez, A., Larranaga, P., & Inza, I. (2006). Supervised classification with conditional Gaussian networks: Increasing the structure complexity from naive Bayes. International Journal of Approximate Reasoning, 43(1), 1-25.
Reisenfeld, D. B., Steinberg, J. T., Barraclough, B. L., Dors, E. E., Wiens, R. C., Neugebauer, M., et al. (2003). Comparison Of The Genesis Solar Wind Regime Algorithm Results With Solar Wind Composition Observed By ACE. AIP Conference Proceedings, 679(1), 632-635. https://doi.org/10.1063/1.1618674.
Richardson, I. G., Cliver, E. W., & Cane, H. V. (2000). Sources of geomagnetic activity over the solar cycle: Relative importance of coronal mass ejections, high-speed streams, and slow solar wind. Journal of Geophysical Research, 105. https://doi.org/10.1029/1999JA000400.
Richardson, I. G., & Cane, H. V. (2004). The fraction of interplanetary coronal mass ejections that are magnetic clouds: Evidence for a solar cycle variation. Geophysical Research Letters, 31, L18804. https://doi.org/10.1029/2004GL020958.
Richardson, I. G., & Cane, H. V. (2010). Near-Earth Interplanetary Coronal Mass Ejections During Solar Cycle 23 (1996-2009): Catalog and Summary of Properties. Solar Physics, 264(1), 189-237. https://doi.org/10.1007/s11207-010-9568-6.
Rojas, R. (1996). Neural Networks - A Systematic Introduction. Springer-Verlag, Berlin, New York.
Schwenn, R. (1990). Large scale structure of the interplanetary medium, in Physics of the Inner Heliosphere I, edited by R. Schwenn and E. Marsch, 99 pp., Springer, Berlin.
Schwenn, R. (2006). Solar Wind Sources and Their Variations Over the Solar Cycle. Space Science Reviews, 124(1-4), 51-76. https://doi.org/10.1007/s11214-006-9099-5.
Sheeley, N. R., Harvey, J. W., & Feldman, W. C. (1976). Coronal holes, solar wind streams, and recurrent geomagnetic disturbances: 1973-1976. Solar Physics, 49(2), 271-278. https://doi.org/10.1007/BF00162451.
Srivastava, S., Gupta, M. R., & Frigyik, B. A. (2007). Bayesian quadratic discriminant analysis. Journal of Machine Learning Research, 8, 1277-1305.
Subramanian, S., Madjarska, M. S., & Doyle, J. G. (2010). Coronal hole boundaries evolution at small scales: II. XRT view. Can small-scale outflows at CHBs be a source of the slow solar wind? Astronomy and Astrophysics, 516, A50. https://doi.org/10.1051/0004-6361/200913624.
Suess, S. T., Ko, Y.-K., von Steiger, R., & Moore, R. L. (2009). Quiescent current sheets in the solar wind and origins of slow wind. Journal of Geophysical Research: Space Physics, 114, A04103. https://doi.org/10.1029/2008JA013704.
Thieme, K. M., Schwenn, R., & Marsch, E. (1989). Are structures in high-speed streams signatures of coronal fine structures? Advances in Space Research, 9, 127-130. https://doi.org/10.1016/0273-1177(89)90105-1.
Thieme, K. M., Marsch, E., & Schwenn, R. (1990). Spatial structures in high-speed streams as signatures of fine structures in coronal holes. Annales Geophysicae, 8, 713-723.
Tu, C.-Y., & Marsch, E. (1995). MHD structures, waves and turbulence in the solar wind: Observations and theories. Space Science Reviews, 73(1-2), 1-210.
Turner, N. E., Cramer, W. D., Earles, S. K., & Emery, B. A. (2009). Geoefficiency and energy partitioning in CIR-driven and CME-driven storms. Journal of Atmospheric and Solar-Terrestrial Physics, 71(10-11), 1023-1031. https://doi.org/10.1016/j.jastp.2009.02.005.
von Steiger, R., Zurbuchen, T. H., & McComas, D. J. (2010). Oxygen flux in the solar wind: Ulysses observations. Geophysical Research Letters, 37(22), L22101. https://doi.org/10.1029/2010GL045389.
Wang, Y.-M., & Sheeley, N. R. (1990). Solar wind speed and coronal flux-tube expansion. The Astrophysical Journal, 355, 726-732. https://doi.org/10.1086/168805.
Xu, F., & Borovsky, J. E. (2015). A new four-plasma categorization scheme for the solar wind. Journal of Geophysical Research: Space Physics, 120(1), 70-100. https://doi.org/10.1002/2014JA020412.
Yordanova, E., Balogh, A., Noullez, A., & von Steiger, R. (2009). Turbulence and intermittency in the heliospheric magnetic field in fast and slow solar wind. Journal of Geophysical Research: Space Physics, 114(A8). https://doi.org/10.1029/2009JA014067.
Zastenker, G. N., Koloskova, I. V., Riazantseva, M. O., Yurasov, A. S., Safrankova, J., Nemecek, Z., et al. (2014). Observation of fast variations of the helium-ion abundance in the solar wind. Cosmic Research, 52(1), 25-36. https://doi.org/10.1134/S0010952514010109.
Zhao, L., Zurbuchen, T. H., & Fisk, L. A. (2009). Global distribution of the solar wind during solar cycle 23: ACE observations. Geophysical Research Letters, 36(14). https://doi.org/10.1029/2009GL039181.
Zhu, J., Zou, H., Rosset, S., & Hastie, T. (2009). Multi-class adaboost. Statistics and Its Interface, 2, 349-360.
Zurbuchen, T. H., Fisk, L. A., Gloeckler, G., & von Steiger, R. (2002). The solar wind composition throughout the solar cycle: A continuum of dynamic states. Geophysical Research Letters, 29, 1352. https://doi.org/10.1029/2001GL013946.
Zurbuchen, T. H., & Richardson, I. G. (2006). In-Situ Solar Wind and Magnetic Field Signatures of Interplanetary Coronal Mass Ejections. Space Science Reviews, 123(1-3), 31-43. https://doi.org/10.1007/s11214-006-9010-4.