Incorporating Physical Knowledge into Machine Learning for Planetary Space Physics
A. R. Azari*, J. W. Lockhart, M. W. Liemohn, and X. Jia
Climate and Space Sciences and Engineering Department, University of Michigan, Ann Arbor, MI, United States
*Now at the Space Sciences Laboratory, University of California, Berkeley, CA, United States
Sociology Department, University of Michigan, Ann Arbor, MI, United States
Correspondence*: A. R. Azari, azari at berkeley dot edu
ABSTRACT
Recent improvements in data collection volume from planetary and space physics missions have allowed the application of novel data science techniques. The Cassini mission, for example, collected over 600 gigabytes of scientific data from 2004 to 2017. This represents a surge of data on the Saturn system. In comparison, the previous mission to Saturn, Voyager, over 20 years earlier, had onboard a ∼70 kilobyte 8-track storage ability. Machine learning can help scientists work with data on this larger scale. Unlike many applications of machine learning, a primary use in planetary space physics applications is to infer behavior about the system itself. This raises three concerns: first, the performance of the machine learning model; second, the need for interpretable applications to answer scientific questions; and third, how characteristics of spacecraft data change these applications. In contrast to these concerns, uses of ‘black box’ or un-interpretable machine learning methods tend toward evaluations of performance only, either ignoring the underlying physical process or, less often, providing misleading explanations for it. The present work uses Cassini data as a case study, as these data are similar to space physics and planetary missions at Earth and other solar system objects. We build off a previous effort applying a semi-supervised physics-based classification of plasma instabilities in Saturn’s magnetic environment, or magnetosphere. We then compare this previous effort to other machine learning classifiers with varying data size access and physical information access. We show that incorporating knowledge of these orbiting spacecraft data characteristics improves the performance and interpretability of machine learning methods, which is essential for deriving scientific meaning. Building on these findings, we present a framework on incorporating physics knowledge into machine learning problems targeting semi-supervised classification for space physics data in planetary environments. These findings present a path forward for incorporating physical knowledge into space physics and planetary mission data analyses for scientific discovery.
Keywords: planetary science, automated event detection, space physics, Saturn, physics-informed machine learning, feature engineering, domain knowledge, interpretable machine learning

Accepted as of May 2020. The typeset version of this article may be found at the Frontiers Research Topic on Machine Learning in Heliophysics under: frontiersin.org/articles/10.3389/fspas.2020.00036.
CONTRIBUTION TO THE FIELD STATEMENT
With the explosion of machine learning usage across scientific fields, a struggle has emerged to derive physical meaning and interpretation from these results. This interest partially stems from a desire to then use these applications to further understanding of complex physical systems. Spacecraft missions offer never before obtained data volumes for characterizing space environments around planets. Spacecraft mission data, however, are inherently challenging to incorporate into machine learning models due to, for instance, the unique spatio-temporal nature of their sampling. Within this work we detail a case study of an interpretable automated event detection method to study Saturn’s space plasma environment. We then discuss considerations of machine learning models ranging from physics-informed to physics-blind but data-rich situations, and provide a framework for future interpretable methods in planetary space physics. We demonstrate that the incorporation of physical information improves both the performance of machine learning methods and their usefulness for scientific understanding. This contributes to the fields of space physics and planetary science a path forward for incorporating the domain knowledge of space environments.
Planetary space physics is a young field for large-scale data collection. At Saturn, for example, it was only in 2004 that the first Earth-launched object orbited the planet (Cassini) and landed on Titan (Huygens). After arriving, Cassini collected data about Saturn and its near-space environment for 13 years, resulting in 635 gigabytes (GB) of scientific data (NASA Jet Propulsion Laboratory, 2017a). To put this into perspective, the Voyager 1 mission, which flew by Saturn in 1980, had onboard ∼70 kilobytes (kB) of memory total (NASA Headquarters, 1980). The Cassini mission represents the first large-scale data collection at Saturn. This enabled the field of planetary science to apply statistics to large-scale data sizes, including machine learning, on the most detailed spatio-temporally resolved dataset of the planet and its environment.

This surge of data is not unique to Saturn science. In planetary science broadly, Mars in 2020 has 8 active missions roving along the surface and orbiting (Planetary Society, 2020). The Mars Reconnaissance Orbiter alone has already collected over 300 terabytes (TB) of data (NASA Jet Propulsion Laboratory, 2017b). It is commonly accepted that upcoming missions will face similarly drastic advances in the collection of scientific data. Traditionally, planetary science has employed core scientific methods such as remote observation and theoretical modeling. With the newly available sampled environments provided by these missions, methods in machine learning offer significant potential advantages.

Applying machine learning in planetary space physics differs from other common applications. Cassini’s data are characteristic of other planetary and space physics missions like the Magnetospheric Multiscale Mission (MMS) at Earth and the Juno mission to Jupiter. The plasma and magnetic field data collected by these missions are from orbiting spacecraft. This conflates spatial and temporal phenomena. This is a shared characteristic with the broader field of geoscience, which often represents complex systems undergoing significant spatio-temporal changes with limitations on quality and resolution (Karpatne et al., 2019).

The main use of these data in planetary science is to advance fundamental scientific theories. This requires the ability to infer meaning from applications of statistical methods. Unlike similar missions at Earth, machine learning for space physics data at Saturn has limited direct application to the prediction of space weather.
A central interest in space weather prediction is to give lead-time information for operational purposes. As a result, prediction accuracy is seen as paramount in machine learning applications for space weather. In comparison, at Saturn, machine learning applications require highly interpretable and explainable techniques to investigate scientific questions (Ebert-Uphoff et al., 2019). How to improve machine learning generally from an interpretability standpoint is itself an active research area in domain applications of machine learning (e.g. Molnar, 2019). Within this work we specifically focus on evaluating and implementing interpretable machine learning. Interpretable machine learning usually relies on domain knowledge and is therefore domain specific, but it can be extended to generally refer to models with functional forms simple enough for humans to understand how they make predictions, such as logical rules or additive factors (Rudin, 2019). Complexity depends in part on what constitutes common knowledge within a domain. Scientists are trained to interpret different models depending on their field. As a result, models will range in perceived interpretability across fields. While the final models must be relatively simple in order for humans to understand their decision process, the algorithms which produce optimal interpretable models often require solving computationally hard problems. Importantly, despite widespread myths about performance, interpretable models can often be designed to perform as well as un-interpretable or ‘black box’ models (Rudin, 2019).

In planetary science it is important to discern the workings of a model in order to understand the implications for the workings of physical systems. Interpretability is not the same as explainability: explainability refers to any attempt to explain how a model makes decisions; typically this is done afterwards and without reference to the model’s internal workings. Interpretability, however, refers to whether the inner workings of the model, its actual decision process, can be observed and understood (Rudin, 2019). Within this work we are concerned with interpretability in order to gain scientifically actionable results from applied machine learning.
The dual challenges of spatio-temporal data and interpretability are compounded for planetary orbiting spacecraft. Complications for orbiting spacecraft range from rare opportunities for observation to engineering constraints on spacecraft data transmission. A main interest in this work is to begin to ask: how can machine learning be used within these constraints to answer fundamental scientific questions?

Scientists have approached interpretable machine learning for physics in two ways. First, they have added known physical constraints and relationships into modeling. Within the space weather prediction community, such integration has shown promise in improving the performance of deep learning models over models that do not account for the physics of systems (Swiger et al., 2020). Several fields, including biology, have argued for an equal valuing of domain knowledge and machine learning techniques for that reason (see discussion within Coveney et al., 2016). These discussions have culminated in several reviews for scientific fields on the integration of machine learning for data-rich discovery (Butler et al., 2018; Bergen et al., 2019). Second, scientists have long tried to use machine learning for the discovery of physical laws (e.g. Kokar, 1986). Recently, this work has turned to deep learning tools (e.g. Raissi et al., 2019; Iten et al., 2020; Ren et al., 2018). However, as Rudin (2019) points out, explanations for the patterns deep learning tools find are often inaccurate and, at worst, totally unrelated to both the model and the world it models. These two approaches lie on a continuum between valuing increasing data and model freedom, or incorporating physical insight and model constraint.

In Figure 1, we present a diagram for considering physical theory and machine learning within the context of theoretical constraints.
The examples at one end of the continuum represent applications of traditional space physics from global theory-driven modeling, while those at the other end focus on data-driven approaches to space weather and solar flare prediction. The model-adjusted center presented below takes advantage of data, but limits or constrains the application by merging with domain understanding. Our work is in the middle of the continuum. We leverage domain knowledge about space physics, while also aiming to learn more about the physical system we study. Importantly, we use an interpretable machine learning approach so that we can be more confident in drawing physical insights from the model.

We present comparisons between a range of data sizes and physics incorporation to classify unique plasma transport events around Saturn using the Cassini dataset. As a characteristic dataset of space physics and planetary environments, this provides valuable insights toward future implementation of automated detection methods for space physics and machine learning. We focus on three primary guiding axes in this work to address implementations of machine learning. First, we address the performance and accuracy of the application. Second, we consider how to increase interpretability of machine learning applications for planetary space physics. Third, we tackle how characteristics of spacecraft data change considerations of machine learning applications. All of these issues are essential to consider in applications of machine learning to planetary and space physics data for scientific interpretation.

To investigate these questions and provide a path toward application of machine learning to planetary space physics datasets, we compare and contrast physics-based and non-physics-based machine learning applications. In Section 2, we discuss the previous development of a physics-based semi-supervised classification from Azari et al. (2018) for the Saturn system within the context of common characteristics of orbiting spacecraft data. We then provide an outline for general physics-informed machine learning for automated detection with space physics datasets in Section 3. Section 4 describes the machine learning model set up and datasets that we use to compare and contrast physics-based and non-physics-based event detection.
Section 5 details the implementation of logistic regression and random forest classification models as compared to this physics-based algorithm within the context of physics-informed, or model adjusted, machine learning. Section 6 then concludes with paths forward in applications of machine learning for scientific insight in planetary space physics.
Saturn’s near-space environment where the magnetic field exerts influence on particles, or magnetosphere, ranges from the planet’s upper atmosphere to far from the planet itself. On the dayside, the magnetosphere stretches to an average distance of 25 Saturn radii (R_S), with a dynamic range between 17 and 29 R_S (Arridge et al., 2011) (1 R_S = 60,268 km). This distance is dependent on a balance between the internal dynamics of the Saturn system and the Sun’s influence from the solar wind. Within this environment, a dense disk of neutrals and plasma sourced from a moon of Saturn, Enceladus, interacts with high-energy, less dense plasma from the outer reaches of the magnetosphere (see Figure 2).

This interaction, called interchange, is most similar to Rayleigh-Taylor instabilities and results in the injection of high-energy plasma toward the planet. In Figure 2, a system of interchange is detailed with a characteristic Cassini orbit cutting through the interchanging region. The red box in this figure is presented as an illustrative slice through the type of data obtained to characterize interchange. One of the major questions in magnetospheric studies is how mass, plasma, and magnetic flux move around planets. At the gas giant planets Saturn and Jupiter, interchange is thought to play a fundamental role in system-wide transport by bringing in energetic material to subsequently form the energetic populations of the inner magnetosphere, and by transporting plasma outwards from the moons. Until Cassini, Saturn never had a spacecraft able to develop statistics based on large-scale data sizes to study this mass transport system.
The major scientific question surrounding these interchange injections is what role they play in the magnetosphere for transport, energization, and loss of plasma. To answer this question, it is essential to understand where these events occur and their dependency on other factors in the system, such as influence from other plasma transport processes and spatio-temporal location. From Cassini’s data, several surveys of interchange had been pursued by manual classification, but these surveys disagreed on both the identification of events and the resulting conclusions (Kennelly et al., 2013; Lai et al., 2016; Chen and Hill, 2008; Chen et al., 2010; Müller et al., 2010; DeJong et al., 2010). The main science-relevant goal was to create a standardized, and automated, method to identify interchange injections. This list needed to be physically justified to allow for subsequent conclusions and comparisons.

In Section 2.1 we provide background on the Cassini dataset, and we summarize the previous development of a physics-based detection method in Section 2.2. We then provide a generalized framework in Section 3 for incorporating physical understanding into machine learning, with the development of this previous physics-based method as an example. Subsequent sections investigate comparisons of this previous physics-based effort to other automated identification methods.
Cassini has onboard multiple plasma and wave sensors which are, in various ways, sensitive to interchange injections. However, none of the previous surveys focused on high-energy ions, which are the primary particle species transported inwards during injections. In Figure 3, a series of injections is shown in the high-energy (3-220 keV) ion (H+) and magnetic field datasets. This figure shows three large injections between 0400 and 0600 UTC, followed by a smaller injection after 0700 that is most noticeable in the magnetic field data. It is evident from these examples that using different sensors onboard Cassini will result in different identification methods for interchange injections. This was a primary driver for a standardized identification method for these events. The top two panels detail the Cassini Magnetospheric Imaging Instrument: Charge Energy Mass Spectrometer (CHEMS) dataset, while the last contains the Cassini magnetometer magnetic field data (Krimigis et al., 2004; Dougherty et al., 2004).

The CHEMS instrument onboard Cassini collected multiple species of ion data and finds the intensity of incoming particles in the keV range. This datastream can be thought of as unique energy channels, each with a spacecraft position and time dependence. In Figure 3b, three unique energy channels are shown from the overall data in the top panel to illustrate the nature of these high-energy data. This type of spatio-temporal data is often a characteristic of space physics missions (see Baker et al., 2016, for a review of MMS’ data products).

When applying automated or machine learning methods, such data provide unique challenges and characteristics including: rare events (class imbalances), spatio-temporal sampling, heterogeneity in space and time, extreme high-dimensionality, and missing or uncertain data (Karpatne et al., 2019). These challenges are in addition to desired interpretability.
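As a minimal illustration of these data characteristics, a CHEMS-like datastream can be pictured as a time-indexed array of energy channels with missing readings and a severe class imbalance. All numbers below are invented stand-ins for illustration, not actual CHEMS values; only the ∼2.4% event rate echoes the labeled fraction quoted later for the Cassini training and test sets.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in for an orbiter datastream: 10,000 time samples of
# 32 energy channels; flux-like values span orders of magnitude.
n_t, n_e = 10_000, 32
intensity = rng.lognormal(mean=2.0, sigma=1.5, size=(n_t, n_e))

# Missing or uncertain data: knock out ~5% of readings at random.
missing = rng.random(intensity.shape) < 0.05
intensity[missing] = np.nan

# Rare events (class imbalance): mark ~2.4% of time samples as events.
labels = rng.random(n_t) < 0.024

print(f"{np.isnan(intensity).mean():.1%} readings missing, "
      f"{labels.mean():.1%} of samples are events")
```

Even this toy array shows why off-the-shelf classifiers struggle: most models assume complete, comparably scaled inputs and roughly balanced classes, none of which hold here.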
It is essential that an interpretable model is used to learn substantive information about this application. One common use of machine learning is to input a large number of variables and/or highly granular raw data (e.g. individual sensor readings or image pixel values) into a model, letting the algorithm sort out relationships among them. Such models are inherently ‘black boxes’ because the number and granularity of variables, not to mention complicated recursive relationships among them, makes it difficult or impossible for humans to interpret (Rudin, 2019). One solution to this issue is to reduce dimensionality to fewer, more meaningful-to-humans inputs. But at the same time, the model needs to be informative, and the inputs need to be meaningful. Incorporating domain knowledge and then letting the model determine its effectiveness in the system of study is a potential framework to consider.

For this reason, when developing a detection method to standardize, characterize, and subsequently build off the detected list, a physics-based method was chosen to address these unique challenges. This previous effort is discussed in Azari et al. (2018) and the resultant dataset is located on the University of Michigan’s Deep Blue Data hub (Azari, 2018). We build on this effort in the present work to provide a new evaluation of alternative solutions for data-driven methods.

To develop this physics-based method, the common problems in space physics data described in Karpatne et al. (2019) were considered and addressed to develop a single-dimension array (S). S was then used in a style most similar to a single-dimensional logistic regression to find the optimum value for detecting interchange events. This classification was standardized in terms of event severity, as well as physically bounded in its definition of events. As a result, it was able to be used to build up a physical understanding of the high-energy dynamics of Saturn’s magnetosphere, including: to estimate scale sizes (Azari et al., 2018) and to demonstrate the influence of tail injections as compared to the ionosphere (Azari et al., 2019). Following machine learning practices, S was designed through cross validation.
It was created to perform best at detecting events in a training dataset and then evaluated on a separate test dataset. These sets contain manually identified events and were developed from 10% of the dataset (representing 7,375/68,090 time samples). Training and test dataset selection, and limiting spatial selection, is of critical importance in spatio-temporally varying datasets. Our particular selection considerations are discussed in following sections. The training set was used to optimize the final form of S. The test dataset was used to compare performance and prevent over-fitting. The same test and training datasets are used in the following sections.

S was developed in Azari et al. (2018) to provide a single-dimension parameter which separated out the multiple dependencies of energy range and space while dealing with common challenges in space physics and planetary datasets. S is calculated from S_r by removing the radial dependence through normalization. In mathematical form, S_r can be written as:

S_r = Σ_{e=0} w (Z_{e,r} − C)     (1)

S can be thought of as a single number which describes the intensification of particle flux over a normalized background. In other words, S can be calculated as: S = (S_r − S̄_r) / σ_{S_r}, in which S̄_r is the radially dependent average and σ_{S_r} the radially dependent standard deviation. These calculations allow for S to be used across the entire radial and energy range for optimization, in units of standard deviation. The variables w and C represent weighting values which are optimized for and discussed in the following section. The notations e and r represent energy channel and radial value. Z_{e,r} represents a normalized intensity value observed by CHEMS.
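The calculation described above can be sketched numerically as follows. This is an illustrative reconstruction on synthetic data, not the authors' code: the 14-channel count, the weighting values (w = 10, C = 2, the final values reported later in the text), and the 0.9σ event threshold from Section 2.2 come from the paper, while the radial bin edges and synthetic intensities are assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumed): log-scaled intensities for 14 energy
# channels at 1,000 time samples, each tagged with a radial distance
# inside the 5-12 R_S region of interest.
n_t, n_e = 1_000, 14
log_intensity = rng.normal(5.0, 1.0, size=(n_t, n_e))
radius = rng.uniform(5.0, 12.0, size=n_t)

# Z_{e,r}: standardize each energy channel within radial bins, removing
# the radial/energy background dependence (hypothetical bin edges).
bins = np.digitize(radius, np.linspace(5.0, 12.0, 8))
Z = np.empty_like(log_intensity)
for b in np.unique(bins):
    m = bins == b
    Z[m] = (log_intensity[m] - log_intensity[m].mean(axis=0)) / log_intensity[m].std(axis=0)

# Equation (1): S_r = sum_e w * (Z_{e,r} - C), summed over the channels.
w, C = 10.0, 2.0
S_r = np.sum(w * (Z - C), axis=1)

# S = (S_r - mean) / std, with the statistics again taken per radial bin,
# so S is in units of standard deviations above the local background.
S = np.empty_like(S_r)
for b in np.unique(bins):
    m = bins == b
    S[m] = (S_r[m] - S_r[m].mean()) / S_r[m].std()

# Candidate events: samples exceeding the 0.9 sigma threshold.
events = S > 0.9
print(f"{events.sum()} candidate events out of {n_t} samples")
```

The per-bin standardization is what makes the final threshold physically meaningful: a fixed cut in units of standard deviations applies uniformly across the radial range.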
This is similar to the calculation of S from S_r. Additional details on the development of, and rationale behind, S are described in Section 3 as a specific example of a general framework for inclusion of physical information into machine learning.

The final form of S depends non-linearly on the intensity values of the CHEMS sensor and radial distance. In Figure 4 we show the dependence of the finalized S value over the test dataset for the intensity at a single energy value of 8.7 keV and over all radial distances. Within this figure the events in the test dataset are denoted with dark pink dots. From panels d and e it is evident that S disambiguates events from the underlying distributions, for example in panel b. By creating S it was possible to create a single summary statistic which separated events from a background population.

The strategies pursued in developing S are most applicable for semi-supervised event detection with space physics data. They can, however, prove a useful guide in starting to incorporate physical knowledge into other applications in heliophysics and space physics. Within the previous effort we used the model optimization process from machine learning to guide a physics-incorporated human effort. This was a solution for incorporating the computational methods employed in machine learning optimization into a human-built model. The end result was optimized in a similar fashion to machine learning models, but through manual effort to ensure physical information preservation. Moving from this effort, we now present a framework for expanding this style of integrating human effort and physical information into other applications for space physics data.

Below we provide a framework for incorporating physical understanding into machine learning.
In each strategy we discuss common issues in space physics data, using a similar phraseology as Karpatne et al. (2019). In addition to characteristics in the structure of geoscience data, we also add interpretability as a necessary condition. For space physics and planetary data, the challenges within Karpatne et al. (2019) are often compounded, and where appropriate we note potential overlap. After each strategy, we provide a walk-through of the development of S employed in Azari et al. (2018).

This framework focuses on interpretable semi-supervised event detection with space physics data from orbiters for the end goal of scientific analysis. Depending on the problem posed, certain solutions could be undesirable. For a similarly detailed discussion on creating a machine learning workflow applied to problems in space weather, see Camporeale (2019). The framework presented here can be thought of as a directed application of feature engineering for space physics problems, mostly due to requiring interpretability. In general, the strategies below provide a context for careful consideration of the nature of the domain application, which is essential for applications of machine learning models to gather scientific insights.

1. Limit to region of interest. Orbiting missions often range over many environments, and limiting focus to regions of interest can assist in automated detection by increasing the likelihood of detection of events.
Issues: heterogeneity in space and time, rare events (class imbalance)
Example: The Cassini dataset represents a wide range of sampled environments, the majority of which do not exhibit interchange. In addition, the system itself undergoes seasonal cycles, changing in time, presenting a challenge to any long-ranging spatial or temporal automated detection. The original work targeted a specific radial region between 5 and 12 R_S in the equatorial plane. This region is known from previous studies to be sensitive to interchange. Similarly, each season of Saturn was treated with a separate calculation of S, allowing for potential temporal changes in the detection of interchange.

2. Careful consideration of training and test datasets. Due to the orbiting nature of spacecraft, ensuring randomness in training and test datasets is usually not sufficient to create a representative set of data across space and time. For event studies, considerations of independence for training and test datasets, while containing prior and post-event data (at times critical for event identification), are important. This is similar to recent strides in activity recognition studies with spatio-temporal data, in which training set considerations drastically affect the accuracy of activity classification (e.g. Lockhart and Weiss, 2014a,b).
Issues: heterogeneity in space and time, spatio-temporal data, rare events, small sample sizes
Example: While the test and training sets represent 10% of the data for the worked example, the 10% was taken such that it covered the widest range of azimuthal and radial values, while still being continuous in time and containing a range of events.

3. Normalize and/or transform. Many space environments have a spatio-temporally dependent background. Normalizing separately to spatial or other variables will address these dependencies and can prove advantageous if these are not critical to the problem.
Issues: heterogeneity in space and time, spatio-temporal sampling, multi-dimensional data
Example: As seen in Figure 4b, flux values depend on radial distance and energy value. Similarly, flux exhibits log scaling, where values can range over multiple powers of 10 in the span of minutes to hours, as seen in Figure 3. To handle the wide range of values from the CHEMS sensor, each separate energy channel’s intensity was first converted into logarithmic space before being normalized by subtracting off the mean and dividing by its standard deviation. Effectively, this transforms the range of intensities to a near-normal distribution dependent on radial distance and energy value (see Z_{e,r} in equation 1). A similar treatment is performed in creating the final S from S_r. This is important due to the commonality of normalcy assumptions, in which models can assume normally distributed data on the same scale across inputs.

4. Incorporate physical calculations. Space physics data can come with hundreds if not thousands of features. While many machine learning techniques are designed for just this kind of data, they do not typically yield results that are amenable to human interpretation and scientific insight into the processes of physical systems. They express a complex array of relationships among raw measurements that do little to help humans build theory or understanding. Summary statistics, like summing over multiple variables or taking integrals, can preserve a large amount of information from the raw data for the algorithm while leaving scientists with smaller sets of relationships between more meaningful variables to interpret. For other fields rich in noisy and incomplete time-series data with a longer history of automated detection methods, summary statistic transformations have been a valuable way of handling this type of data for improved performance (e.g. Lockhart and Weiss, 2014a).
Issues: interpretability, multi-dimensional data, missing data
Example: To address missing values, building up summary statistics, for example through summing over multiple energy channels, can help. This creates a particle-pressure-like calculation (see the sum in equation 1). Particle pressure itself is not used to identify events, as the ability to tune the exact parameters was desired in the identification of injections, and developing S proved more reliable. This allows the lower 14 energy channels to contribute without removing entire timepoints from the calculation where partial data is missing, while also increasing interpretability of the end result. Only the lower 14 channels are used, as the higher energy channels also show long-duration background from earlier events drifting in the Saturn environment (see Figure 3).

5. Compare with alternate metrics. Depending on your use case, the trade-off costs between false positives and false negatives could be different from the default settings in standard machine learning tools. Investigating alternate metrics of model performance and accuracy is useful toward increasing interpretability.
Issues: interpretability, rare events (class imbalance)
Example: In the training and test datasets only 2.4% of the data exist in an event state. This proves challenging for finding optimum detection, due to the amount of false positives, and for usage in later analysis. In equation 1, scaling factors w and C are introduced. These scale factors are chosen by optimizing for the best performance of the Heidke Skill Score (HSS) (Heidke, 1926). HSS is more commonly used in weather forecasting than in machine learning penalty calculations, but has shown potential for handling rare events (see Manzato, 2005, for a discussion of HSS). In Section 5 we evaluate how HSS performs as compared to other regularization schemes (final values: w = 10, C = 2).

6. Compare definitions of events; consider grounding in physical calculations. Much of the purpose of developing an automated detection is to standardize event definitions. Developing a list of events then can become tricky.
Issues : lack of ground truth, interpretability, rare events (class imbalance)
Example : At this point in the calculation of S , there is a single number, in units of standarddeviations, for each time point. This calculation so far, takes in the flux of the lowest 14 energychannels of CHEMS before normalizing and combining these values to return a single value at eachtime. This number is higher (in the useful units of standard deviation) for higher flux intensificationsand lower for flux drop outs. The final question becomes at which S value should an event beconsidered real or false.Based on the training dataset, 0.9 standard deviations above the mean of S is the optimum parameterfor peak HSS performance. As discussed in Section 2.2 0.9 was determined through optimizingagainst the training set. Since S is in terms of standard deviations, additional higher thresholds can beimplemented to sub-classify events into more or less severe cases with a physical meaning (ranking).This allowed for the application as a definition task with a physical justification. DRAFT. Final can be found at Frontiers under frontiersin.org/articles/10.3389/fspas.2020.00036. zari et al. Machine Learning: Planetary Space Physics Investigate a range of machine learning models and datasets.
Incorporating a range of machinelearning models, from the most simple to the most complex in addition to varying datasets, can offerinsights in the nature of the underlying physical data.
Issues : interpretability
Example : In developing S , alternative feature inclusions were considered. S was settled on for itsgrounding in physical meaning. A secondary major consideration was its accuracy compared to othermachine learning applications. In the following sections we discuss additional models.As similarly discussed within Camporeale (2019), the desire to incorporate physical calculations comesfrom an interest in using machine learning for knowledge discovery. In the use cases of interest here,both the needs for accuracy and interpretability are essential. These presented strategies are designed toimprove the potential performance for semi-supervised classification problems and the interpretability forsubsequent physical understanding. Creating the final form of S was a labor intensive process to create andthen optimize. Due to S ’s non-linear dependence on the features shown in Figure 4, this was a non-trivialtask. Similarly expanding S into additional dimensions is challenging. This is where the machine learninginfrastructure offers significant advantages as compared to the previous effort. In the following Sections 4and 5 we discuss alternative solutions to identification of interchange. In the previous physics-based approach, events were defined through intensifications of H + only, allowingfor comparisons to other surveys and advancement of the understanding of events. This was a non-intuitiveapproach as common logic in application of machine learning algorithms suggests that greater data sizeswill result in additional accuracy given a well-posed problem. To explore both the potential for higheraccuracy as well as interpretability of the application, we compare the performance of two distinct machinelearning models with access to varying data set sizes. Below we discuss models we use in this comparisoneffort. Two commonly used machine learning models for supervised classification are logistic regression andrandom forest classification. 
Both are considered standard classification models when applying machinelearning and performing comparative studies (Couronn´e et al., 2018). While both models can be interpretedby humans, the additive functional form of logistic regression and the broad literature on interpreting itmake it highly interpretable. Random Forest models consist of easy to interpret logical rules, but the largenumbers and weighted combinations of those rules mean it is less interpretable (Rudin, 2019). The originalphysics-based algorithm was designed with a logistic regression method in mind, but with significantadjustment. Comparisons to this model are directly informative as a result. Logistic regression categorizesfor binary decisions by fitting a logistic form, or a sigmoid. Logistic regression is a simple, but powerful,method toward predicting categorical outcomes from complex datasets. The basis of logistic regressionis associated with progress made in the 19 th century in studying chemical reactions, before becomingpopularized in the 1940s by Berkson (1944) (see Cramer, 2002, for a review). When implemented andoptimized using domain knowledge, highly interpretable models, like logistic regression, generally performas well as less interpretable models and even deep learning approaches (Rudin, 2019). DRAFT. Final can be found at Frontiers under frontiersin.org/articles/10.3389/fspas.2020.00036. zari et al. Machine Learning: Planetary Space Physics
Random forest in comparison classifies by building up collection of decision trees trained on randomsubsets of the input variables. The predictions of all trees are then combined in an ensemble to developthe final prediction. Similar to logistic regression, the method of random forest has been built over timewith the most modern development associated with Breiman (2001). While logistic regression requiresresearchers to specify the functional form of relationships among variables, random forests add complexitytoward classification decisions, by allowing for arbitrary, unspecified non-linear dependencies betweenfeatures, also known as model inputs.The models used within this chapter are from the scikit-learn machine learning package in Python(Pedregosa et al., 2011). Within the logistic regression the L2 (least squares) regularization penalty isapplied. Within the random forest a grid search with 5-fold cross-validation is used to find the optimumdepth between 2 and 5, while the number of trees is kept at 50. These search parameters are chosento constrain the random forest within the perspective of the noisy nature of the CHEMS dataset and toprevent over fitting. Alterations to this tuning parameter scheme are not seen to alter the results in thefollowing section. Events are relatively rare in the data (2.4% of the data in the training and test datasetscorresponds with an event), and this can bias the fit of models. As such, unless otherwise noted, we use classweighting to adjust the importance of data from each class (event and non-event) inversely proportionalto its frequency so that the classes exert balanced influence during model fitting. This results in eventsweighted higher more important than non-events due to their rarity. Performance is shown in Section 5against the test dataset defined above.
To explore the performance of logistic regression and random forest, four distinct subsets of the Cassiniplasma and magnetic field data are utilized ranging in data complexity and size as follows:1. S \ C (Spacecraft) Location and Magnetic Field6 features, 68,090 time samples2. S \ C Location, Magnetic Field, and H + flux (3-220 keV)38 features, 68,090 time samples3. Low Energy H + flux (3-22 keV)14 features, 68,090 time samples4. Azari et al., 2018 ( S Value)1 feature, 68,090 time samplesThese subsets are chosen to represent additional features, complexity, and physics inclusion. All of thesesubsets should be sensitive in varying amounts toward identification of interchange injections as evidencedin Figures 3 and 4. The first two datasets are a comparison of increasing features that should assist inidentification of interchange injection. The third dataset includes less features, but is the originator mostsimilar to the derived parameter from Azari et al. (2018). The final dataset contains the single summarystatistic array of the S parameter. In the following result section, these four dataset segments are used toevaluate the two models. DRAFT. Final can be found at Frontiers under frontiersin.org/articles/10.3389/fspas.2020.00036. zari et al. Machine Learning: Planetary Space Physics
We are interested in evaluating how the former physics-based S parameter performs with other commonlyused subsets of space physics data. Our primary goal in this section is to investigate the trade off betweenthe performance of these more traditional models and their interpretability, and therefore usage for scientificanalyses. We complete this through applying supervised classification models and evaluate the ease ofinterpretability and their relative performance. In Figure 5 the ROC curve of a logistic regression for all four subsets of Cassini data is presented.ROC or receiver operating characteristics, are a common method employed for visualizing the efficacyof classification methods (see Fawcett, 2006, for a generalized review of ROC analysis). ROC curves inthis particular example are created by sweeping over a series of classification thresholds. Ideally a perfectclassifier will result in a curve that carves a path nearest to the upper left corner. Area under the curve, orAUC is presented as a metric to understand the overall performance of each logistic regression evaluation.AUC has the ideal parameters of ranging between 0 and 1, with 0.5 representative of random guessing,1 representing perfect classification, and 0 as the inverse of truth. AUC can be thought of as an averageaccuracy of a model and isn’t sensitive to class-balance and thresholds. ROC curves present the ratios oftrue positive rate (y-axis) to false positive rate (x-axis). This can be thought of as the trade off for classifiersbetween events successfully identified (y-axis), and events unsuccessfully identified (x-axis).The purple curve represents the logistic regression with only the derived physics-based S as an input .This is rather redundant with optimizing by hand as it’s a single variable space. Instead the purple curve isprovided as a benchmark against the identical performance and curves found within Azari et al. 
(2018).From this figure, this single summary statistic ( S ) outperforms all other subsets of Cassini data withan AUC approaching near 1.0 (0.97). This is evidence for the current case, that incorporating physicalinformation, even at the expense of greater dataset size improved the performance of certain machinelearning applications.Following this it is not the largest dataset that has the second best performance. Instead, the red curvewhich contains only the low energy H + intensities shows the best performance of the non physics-adjusteddatasets. The magnetic field is a useful parameter for the prediction of interchange as demonstrated inFigure 3 but the form of the logistic regression is unable to use this information successfully. This ispossibly due to the higher time resolution needed for interchange identification from magnetic field dataand any future identification work needs to focus on adjusting the magnetic field inputs and models. Thecurrent dataset is processed such that each time point in the CHEMS set is matched with a single magneticfield vector. Normally within interchange analyses, the magnetic field information is of a much higherresolution. It is likely if a study pursued solely magnetic field data of higher time resolution and processedthese data to represent pre and post event states dependent on time, the performance of the magnetic fielddata would be improved. It’s evident from Figure 4 that S exhibits non-linear behavior from the distributionof S on intensity, distance, and energy. Similarly the magnetic field values likely range over a far range dueto the background values, that the linear dependency requirements of logistic regression are unable to usethis information. Without the flux data especially (the blue curve) logistic regression is unable to predictinterchange as compared to the previous physics-based parameter. DRAFT. Final can be found at Frontiers under frontiersin.org/articles/10.3389/fspas.2020.00036. zari et al. 
Machine Learning: Planetary Space Physics
The AUC doesn’t capture the entire picture for our interest. While it shows the performance of thealgorithm, it contains information for multiple final classifications of events. The grey dots on Figure 5demonstrates the chosen cut-point for L2 regularization for class weighted events, or the final classificationdecision for an optimal trade between real events and false events. Within the previous section, the HeidkeSkill Score or HSS was discussed as the final threshold separating events from non-events (denoted as theorange dot on Figure 5). Deciding the threshold of what separates an injection event from a non-eventis critical for the implementation of statistical analysis on the results especially in this case, in whichnon-events outnumber events at a ratio of ∼ S with categories of events (Azari et al., 2018). We now move to evaluating the previous HSS optimization to the logistic regression L2 regulation forboth class weighted and non-class weighted models. In Figure 6 the final forms of the weighted andnon-weighted logistic regression for the trivial 1 dimensional array case of the S parameter are shown. Thethresholds for the final decisions and for HSS are shown as vertical lines (the orange dashed line representsHSS). Due to the extreme imbalance of non-events to events, implementing class weighting results in largeshifts between what is considered an injection event or not. We suggest that the class imbalance inherentin this problem is the main rationale between the differences of HSS and other regularizations. Betweenthe two decision points of the blue and purple vertical lines there are 46 real events, but 202 non-events.This means that if using class-weighting in logistic regression for this problem, 202 non-events would beclassified as events. Non-intuitively, for this application where the final events are used to understand theSaturn system, it’s advantageous to use a non-class weighted model, as it limits the non-events. 
Howeverthe un-class-weighted model results in removing many real events as well as can be seen in the bulk of thepink events (real events) being misclassified by the purple vertical line.The Heidke Skill Score provides an in-between choice of these by providing a higher threshold than theclass-weighted, and lower than non-class weighted. The logistic regression for the S parameter shown hereis easily intuited since the X-axis represents only one variable. The power of machine learning howeveris most advantageous in multiple dimensions. HSS has shown to be a more applicable metric for rareevents. Other skill scores, such as the True Skill Score have also shown promise in machine learningapplications to space physics (Bobra and Couvidat, 2015). Skill score metrics themselves have a long andrich history in space physics before more recent applications in machine learning with interest originatingin space weather prediction (see Morley, 2020, for an overarching review of space weather prediction).We also direct the reader to discussions of metrics for physical model and machine learning predictionof space weather (Camporeale, 2019; Liemohn et al., 2018). How can these traditional metrics for spaceapplications be integrated into the regularization schemes? Future work in machine learning applicationsshould consider shared developments between the physical sciences communities usage of skill scores andregularization of models. In Figure 7 the ROC diagram for the same subsets of data but for a random forest model are presented.In this case, unlike the logistic regression, other subsets of data can reproduce the same performance (orAUC) as the derived parameter. All curves, with the exception of the spacecraft location and magneticfield, quickly approach or slightly surpass the AUC of the physics-based parameter at 0.97, with small
DRAFT. Final can be found at Frontiers under frontiersin.org/articles/10.3389/fspas.2020.00036. zari et al. Machine Learning: Planetary Space Physics differences in the performance of the low energy H + flux (0.98) and of the combined spacecraft location,all flux, and magnetic field (0.97). The model form of random forest allows for non-linear behavior in theintensity and magnetic field data to find injection events. Increasing the features then helps in the case ofrandom forest whereas it did not for logistic regression. Similar to the logistic regression, HSS results in adifferent ratio between true positive rate and false positive rate than the random forest model cut-off pointwith the grey dots.Comparing back to logistic regression, even with a relatively complex model such as random forest, theAUC of the best ROC curves are near-identical. Given that S is an array, this is not that surprising. Inboth cases the physics-derived parameter outperforms or is effectively equivalent to all other data subsets,including those with access to a much richer information set and therefore more complex model. For theapplication of interpretability for then gathering scientific conclusions, logistic regression is advantageousas it presents a much simpler model. However, random forest, has shown ability to mimic the underlyingphysics adjustments through selection of datasets.Within these results, it’s evident that the S parameter performs as well as simplistic machine learningmodels. Given that S is also grounded in a physics-based definition dependent on solely a variableflux background, this offers advantages to subsequent usage in scientific results. However, many of theadjustments in creating S can be implemented into other space physics data, and integrated into machinelearning as evidenced here. In the description of the development of S , several challenges in geosciencedata from the framework discussed in Karpatne et al. 
(2019), and CHEMS specific solutions were presented.From the above evaluation, it is evident that applications of machine learning are useful to the task ofautomated event detection from flux data, but with diminishing interpretability. A potential solution to bothenhancing the interpretability, similar to the S based parameter, but also incorporating the advantages ofmachine learning is presented in Figure 1. Rather than consider incorporation of physics-based informationas deleterious to the implementation of machine learning, we have found that including this informationsimplifies the application, enhances the interpretability, and improves the overall performance. Planetary space physics has reached a data volume capacity at which implementation of statistics includingmachine learning is a natural extension of scientific investigation. Within this work we addressed howmachine learning can be used within the constraints of common characteristics of space physics datato investigate scientific questions. Care should be taken when applying automated methods to planetaryscience data due to the unique challenges in spatio-temporal nature. Such challenges have been broadlydiscussed for geoscience data by Karpatne et al. (2019), but until now limited attention in comparison toother fields has been given toward reviews of planetary data.Within this work we have posed three framing concerns for applications of machine learning to planetarydata. First, it’s important to consider the performance and accuracy of the application. Second, it’s necessaryto increase interpretability of machine learning applications for planetary space physics. Third, it’s essentialto consider how the underlying issue characteristics of spacecraft data changes applications of machinelearning. 
We argue that by including physics-based information into machine learning models, all threeconcerns of these applications can be addressed.For certain machine learning models the performance can be enhanced but importantly in this application,the interpretability improves along with handling of characteristic data challenges. To reach this conclusionwe presented a framework for incorporating physical information into machine learning. This framework
DRAFT. Final can be found at Frontiers under frontiersin.org/articles/10.3389/fspas.2020.00036. zari et al. Machine Learning: Planetary Space Physics targeted considerations for increasing interpretability and addressing aspects of spacecraft data into machinelearning with space physics data. In particular, it addresses challenges such as the spatio-temporal nature oforbiting spacecraft, and other common geoscience data challenges (see Karpatne et al., 2019). After whichwe then cross-compared a previous physics-based method developed using the strategies in the frameworkto less physics-informed but feature rich datasets.The physics-based semi-supervised classification method was built on high-energy flux data from theCassini spacecraft to Saturn (see Azari et al., 2018). In investigating the accuracy of machine learningapplications, we demonstrated this physics-based approach outperformed automated event detection forsimple logistic regression models. It was found that traditional regularization through L2 penalties bothunder, and overestimated ideal cutoff points for final event classification (depending on class weighting).Instead, metrics more commonly used in weather prediction, such as the Heidke Skill Score, showedpromise in class imbalance problems. This is similar to work demonstrating the applicability of True SkillScore in heliophysics applications (Bobra and Couvidat, 2015). Future work should consider building onthe rich history of prediction metrics in the space physics community for shared development between thephysical sciences usage of skill scores and in regularization of models.While logistic regression is a more interpretable model, random forest proved that with the addition ofmore and lower level variables from the Cassini mission, the model could approximate our physics-basedlogistic model successfully. 
In this case physics-informed or model adjusted machine learning, can eachthe same performance but with different levels of interpretability, thus different ability to draw furtherconclusions about implications of the results. The logistic approach provides a coefficient and thresholdfor a meaningful physical quantity, S , effectively the normalized intensification of particle flux. Therandom forest approach can provide an ‘importance’ score for S or show a large number of conjunctionrules involving it, but neither is as useful for human analysts. A forest model using a large number of rawvariables instead of a small number of more meaningful ones like S is even harder for humans to make senseof. Deep neural networks, as multi-layered webs of weighted many-to-many relationships, are even lessinformative for human analysts interested in understanding the workings of the model and physical system.Further, findings that the interpretable model performs as well or better than other approaches demonstratethat, despite the widespread myth to the contrary, there is no inherent tradeoff between performance andinterpretability (Rudin, 2019). For example, the ability to further split and define identified events basedon their flux intensity using S gives the ability to address further scientific questions as to the fundamentalmechanisms behind the interchange instability itself. The simplistic model of logistic regression whichresults in the same performance as random forest is highly advantageous for the current task.The framework and comparison presented here opens up avenues toward consideration of applyingmachine learning to answer planetary and space physics questions. In the future, cross-disciplinary workwould greatly advance the state of these applications. Particularly within the context of interpretabilitytoward scientific conclusions through physics-informed, or model adjusted machine learning. 
The inclusionof planetary science and space physics domain knowledge in application of data science allows for thepursuit of fundamental questions. We have found that incorporating physics-based information increasesthe interpretability, and improves the overall performance of machine learning applications for scientificinsight. DRAFT. Final can be found at Frontiers under frontiersin.org/articles/10.3389/fspas.2020.00036. zari et al. Machine Learning: Planetary Space Physics
CONFLICT OF INTEREST STATEMENT
The authors declare that the research was conducted in the absence of any commercial or financialrelationships that could be construed as a potential conflict of interest.
AUTHOR CONTRIBUTIONS
We use the CRediT (Contributor Roles Taxonomy) categories for providing the following contributiondescription (see Brand et al., 2015). AA led the conceptualization and implemented the research for thismanuscript including the investigation, visualization, formal analysis, and original drafting of this work.JL assisted in the conceptualization and discussions of methodology in this work along with editing themanuscript. ML provided funding acquisition, resources, supervision, and assisted in conceptualizationalong with editing the manuscript. XJ provided funding acquisition, resources, supervision, and assisted inconceptualization.
FUNDING
This material is based on work supported by the National Science Foundation Graduate Research FellowshipProgram under Grant No. DGE 1256260 and was partially funded by the Michigan Space Grant Consortiumunder NNX15AJ20H. JL received funding through an NICHD training grant to the Population StudiesCenter at the University of Michigan (T32HD007339). ML was funded by NASA grant NNX16AQ04G.
ACKNOWLEDGMENTS
We would like to thank Monica Bobra, Brian Swiger, Garrett Limon, Kristina Fedorenko, Dr. Nils Smit-Anseeuw, and Dr. Jacob Bortnik for relevant discussions related to this draft. We would also like to thankthe conference organizers of the 2019 Machine Learning in Heliophysics conference at which this workwas presented, and the American Astronomical Society Thomas Metcalf Travel Award for funding travelto this conference. This work has additionally appeared as a dissertation chapter (Azari, 2020). Figure 2’scopyright is held by Falconieri Visuals. It is altered here with permission. Figure 1 contains graphics fromJia et al. (2012) and Chen et al. (2019) which can be found in journals with copyright held by AGU. Wewould like to thank Dr. Jon Vandegriff for assistance with the CHEMS data used within this work.
DATA AVAILABILITY STATEMENT
The events analyzed for this study can be found in the Deep Blue Dataset under doi: 10.7302/Z2WM1BMN(Azari, 2018). The original datasets from the CHEMS (Krimigis et al., 2004) and MAG (Dougherty et al.,2004) instruments can be found on the NASA Planetary Data System (PDS). Details on the most recentdatasets for CHEMS and MAG can be found on the Cassini-Huygens Archive page at the PDS PlanetaryPlasma Interactions node (https://pds-ppi.igpp.ucla.edu/mission/Cassini-Huygens). Associated data notincluded in the above repositories can be obtained through contacting the corresponding author.
REFERENCES
Arridge, C. S., Andr´e, N., McAndrews, H. J., Bunce, E. J., Burger, M. H., Hansen, K. C., et al. (2011).Mapping magnetospheric equatorial regions at Saturn from Cassini Prime Mission observations. SpaceScience Reviews 164, 1–83. doi:10.1007/s11214-011-9850-4
DRAFT. Final can be found at Frontiers under frontiersin.org/articles/10.3389/fspas.2020.00036. zari et al. Machine Learning: Planetary Space Physics [Dataset] Azari, A. R. (2018). Event list for ”Interchange injections at Saturn: Statistical survey of energeticH+ sudden flux intensifications”. doi:10.7302/Z2WM1BMNAzari, A. R. (2020). A Data-Driven Understanding of Plasma Transport in Saturn’s Magnetic Environment.Ph.D. thesis, University of MichiganAzari, A. R., Jia, X., Liemohn, M. W., Hospodarsky, G. B., Provan, G., Ye, S.-Y., et al. (2019). AreSaturn’s interchange injections organized by rotational longitude? Journal of Geophysical Research:Space Physics 124, 1806–1822. doi:10.1029/2018JA026196Azari, A. R., Liemohn, M. W., Jia, X., Thomsen, M. F., Mitchell, D. G., Sergis, N., et al. (2018).Interchange injections at Saturn: Statistical survey of energetic H+ sudden flux intensifications. Journalof Geophysical Research: Space Physics 123, 4692–4711. doi:10.1029/2018JA025391Baker, D. N., Riesberg, L., Pankratz, C. K., Panneton, R. S., Giles, B. L., Wilder, F. D., et al. (2016).Magnetospheric Multiscale instrument suite operations and data system. Space Science Reviews 199,545–575. doi:10.1007/s11214-014-0128-5Bergen, K. J., Johnson, P. A., de Hoop, M. V., and Beroza, G. C. (2019). Machine learning for data-drivendiscovery in solid Earth geoscience. Science 363, eaau0323. doi:10.1126/science.aau0323Berkson, J. (1944). Application of the logistic function to bio-assay. Journal of the American StatisticalAssociation 39, 357–365. doi:10.2307/2280041Bobra, M. G. and Couvidat, S. (2015). Solar flare prediction using SDO/HMI vector magnetic field data witha machine-learning algorithm. The Astrophysical Journal 798, 135. doi:10.1088/0004-637x/798/2/135Brand, A., Allen, L., Altman, M., Hlava, M., and Scott, J. (2015). Beyond authorship: attribution,contribution, collaboration, and credit. Learned Publishing 28, 151–155. doi:10.1087/20150211Breiman, L. (2001). 
Random Forests. Machine Learning 45, 5–32. doi:10.1023/A:1010933404324Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O., and Walsh, A. (2018). Machine learning formolecular and materials science. Nature 559, 547–555. doi:10.1038/s41586-018-0337-2Camporeale, E. (2019). The challenge of machine learning in space weather: Nowcasting and forecasting.Space Weather 17, 1166–1207. doi:10.1029/2018SW002061Chen, Y. and Hill, T. W. (2008). Statistical analysis of injection/dispersion events in Saturn’s innermagnetosphere. Journal of Geophysical Research: Space Physics 113, A07215. doi:10.1029/2008JA013166Chen, Y., Hill, T. W., Rymer, A. M., and Wilson, R. J. (2010). Rate of radial transport of plasmain Saturn’s inner magnetosphere. Journal of Geophysical Research: Space Physics 115, A10211.doi:10.1029/2010JA015412Chen, Y., Manchester, W. B., Hero, A. O., Toth, G., DuFumier, B., Zhou, T., et al. (2019). IdentifyingSolar Flare Precursors Using Time Series of SDO/HMI Images and SHARP Parameters. Space Weather17, 1404–1426. doi:10.1029/2019SW002214Couronn´e, R., Probst, P., and Boulesteix, A. L. (2018). Random forest versus logistic regression: alarge-scale benchmark experiment. BMC Bioinformatics 19, 270. doi:10.1186/s12859-018-2264-5Coveney, P. V., Dougherty, E. R., and Highfield, R. R. (2016). Big data need big theory too. PhilosophicalTransactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374. doi:10.1098/rsta.2016.0153Cramer, J. S. (2002). The origins of logistic regression. Tinbergen Institute Working Paper No. 2002-119/4doi:10.2139/ssrn.360300DeJong, A. D., Burch, J. L., Goldstein, J., Coates, A. J., and Young, D. T. (2010). Low-energy electrons inSaturn’s inner magnetosphere and their role in interchange injections. Journal of Geophysical Research:Space Physics 115, A10229. doi:10.1029/2010JA015510
DRAFT. Final can be found at Frontiers under frontiersin.org/articles/10.3389/fspas.2020.00036. zari et al. Machine Learning: Planetary Space Physics
Dougherty, M. K., Kellock, S., Southwood, D. J., Balogh, A., Smith, E. J., Tsurutani, B. T., et al. (2004). The Cassini magnetic field investigation. Space Science Reviews 114, 331–383. doi:10.1007/s11214-004-1432-2
Ebert-Uphoff, I., Samarasinghe, S. M., and Barnes, E. A. (2019). Thoughtfully using artificial intelligence in Earth science. Eos 100. doi:10.1029/2019EO135235
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters 27, 861–874. doi:10.1016/j.patrec.2005.10.010
Heidke, P. (1926). Berechnung des Erfolges und der Güte der Windstärkevorhersagen im Sturmwarnungsdienst (Calculation of the success and goodness of strong wind forecasts in the storm warning service). Geografiska Annaler Stockholm 8, 301–349
Iten, R., Metger, T., Wilming, H., del Rio, L., and Renner, R. (2020). Discovering physical concepts with neural networks. Physical Review Letters 124, 010508. doi:10.1103/PhysRevLett.124.010508
Jia, X., Hansen, K. C., Gombosi, T. I., Kivelson, M. G., Tóth, G., Dezeeuw, D. L., et al. (2012). Magnetospheric configuration and dynamics of Saturn's magnetosphere: A global MHD simulation. Journal of Geophysical Research: Space Physics 117, A05225. doi:10.1029/2012JA017575
Karpatne, A., Ebert-Uphoff, I., Ravela, S., Babaie, H. A., and Kumar, V. (2019). Machine learning for the geosciences: Challenges and opportunities. IEEE Transactions on Knowledge and Data Engineering 31, 1544–1554. doi:10.1109/TKDE.2018.2861006
Kennelly, T. J., Leisner, J. S., Hospodarsky, G. B., and Gurnett, D. A. (2013). Ordering of injection events within Saturnian SLS longitude and local time. Journal of Geophysical Research: Space Physics 118, 832–838. doi:10.1002/jgra.50152
Kokar, M. M. (1986). COPER: A methodology for learning invariant functional descriptions. In Machine Learning. The Kluwer International Series in Engineering and Computer Science (Knowledge Representation, Learning and Expert Systems), eds. T. M. Mitchell, J. G. Carbonell, and R. S. Michalski (Boston, MA: Springer), 151–154. doi:10.1007/978-1-4613-2279-5_34
Krimigis, S. M., Mitchell, D. G., Hamilton, D. C., Livi, S., Dandouras, J., Jaskulek, S., et al. (2004). Magnetosphere Imaging Instrument (MIMI) on the Cassini mission to Saturn/Titan. Space Science Reviews 114, 233–329. doi:10.1007/s11214-004-1410-8
Lai, H. R., Russell, C. T., Jia, Y. D., Wei, H. Y., and Dougherty, M. K. (2016). Transport of magnetic flux and mass in Saturn's inner magnetosphere. Journal of Geophysical Research: Space Physics 121, 3050–3057. doi:10.1002/2016JA022436
Liemohn, M. W., McCollough, J. P., Jordanova, V. K., Ngwira, C. M., Morley, S. K., Cid, C., et al. (2018). Model evaluation guidelines for geomagnetic index predictions. Space Weather 16, 2079–2102. doi:10.1029/2018SW002067
Lockhart, J. W. and Weiss, G. M. (2014a). Limitations with activity recognition methodology and data sets. Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, 747. doi:10.1145/2638728.2641306
Lockhart, J. W. and Weiss, G. M. (2014b). The benefits of personalized smartphone-based activity recognition models. 2014 SIAM International Conference on Data Mining. doi:10.1137/1.9781611973440.71
Manzato, A. (2005). An odds ratio parameterization for ROC diagram and skill score indices. Weather and Forecasting 20, 918–930. doi:10.1175/WAF899.1
Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable
Morley, S. K. (2020). Challenges and opportunities in magnetospheric space weather prediction. Space Weather 18, e2018SW002108. doi:10.1029/2018SW002108
Müller, A. L., Saur, J., Krupp, N., Roussos, E., Mauk, B. H., Rymer, A. M., et al. (2010). Azimuthal plasma flow in the Kronian magnetosphere. Journal of Geophysical Research: Space Physics 115, A08203. doi:10.1029/2009JA015122
NASA Headquarters (1980). Voyager Backgrounder, Release No: 80-160
NASA Jet Propulsion Laboratory (2017a). Cassini Huygens by the Numbers
NASA Jet Propulsion Laboratory (2017b). Mars Reconnaissance Orbiter By the Numbers
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830
Planetary Society (2020). Missions to Mars
Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, 686–707. doi:10.1016/j.jcp.2018.10.045
Ren, H., Stewart, R., Song, J., Kuleshov, V., and Ermon, S. (2018). Learning with weak supervision from physics and data-driven constraints. AI Magazine 39, 27–38. doi:10.1609/aimag.v39i1.2776
Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 206–215. doi:10.1038/s42256-019-0048-x
Swiger, B. M., Liemohn, M. W., and Ganushkina, N. Y. (2020). Improvement of plasma sheet neural network accuracy with inclusion of physical information. Frontiers in Astronomy and Space Sciences, under review
Waskom, M., Botvinnik, O., Ostblom, J., Lukauskas, S., Hobson, P., Gelbart, M., et al. (2020). mwaskom/seaborn: v0.10.0 (January 2020). doi:10.5281/zenodo.3629446
FIGURE CAPTIONS
Figure 1.
Framework for incorporating physical understanding into machine learning. This figure diagrams a continuum moving from purely theory-bound approaches toward model-free ones. The subfigure at the model-bound end is from Jia et al. (2012), a magnetohydrodynamic simulation of Saturn's magnetosphere. The subfigure at the model-free end is from Chen et al. (2019), which uses deep learning feature correlations for solar flare precursor identification. This figure contains subfigures from American Geophysical Union (AGU) journals. AGU does not require permission for republication in academic works, but we point readers toward the associated AGU works for citation of the figures in Jia et al. (2012) and Chen et al. (2019).
[Figure 2 artwork: labels include Magnetopause, Bow shock, Cassini trajectory, Distance (R_S), Enceladus, Saturn, Interchange region, less dense hot plasma (H+), dense cool plasma (H2O source), and example events in Figure 3. Image © Falconieri Visuals; for display with this preprint only, reuse prohibited.]
Figure 2.
Diagram of interchange injection in the Saturn system. The illustrated orbit is an equatorial Cassini orbit from 2005. Injections are denoted by the pale orange material interspersed with the water-sourced plasma from Enceladus. Along the example orbit, the red box denotes a hypothetical segment of Cassini data discussed in Figure 3. The purpose of developing an automated event detection is to identify the pale orange material traveling toward the planet. This figure was produced in consultation with, and is used with copyright permission from, Falconieri Visuals.
Figure 3.
Series of interchange injections characterized by high-energy ions. Panel a details an energy-time spectrogram of the intensity from the Cassini CHEMS sensor. The color black denotes flux either below the colorbar limit or missing data. The three lines are placed at the energy channels used for the plot in panel b. Panel b shows the same CHEMS data, split into three characteristic energies spanning the entire CHEMS range. Panel c shows the magnetic field data in KRTP (Kronocentric body-fixed, J2000 spherical) coordinates.
Figure 4.
Distributions of the S parameter developed in Azari et al. (2018). This figure represents a subset of the multiple dependencies of S, estimated with a kernel density estimation (KDE). The data used in this figure are from the test dataset. Panels a, c, and f represent single-dimension KDEs of a CHEMS energy channel intensity, spacecraft location in radial distance, and S. Panels b, d, and e represent two-dimensional distributions. This figure was developed using the Seaborn statistics package's KDE function (Waskom et al., 2020).
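The one- and two-dimensional densities described in this caption can be sketched with a Gaussian KDE. The sketch below uses `scipy.stats.gaussian_kde` on synthetic stand-in data (the variable names and value ranges are illustrative, not the actual Cassini CHEMS quantities); Seaborn's `kdeplot` wraps an equivalent estimate for display.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic stand-ins for the quantities in Figure 4 (illustrative only):
# a log-intensity and a radial distance in planetary radii.
log_intensity = rng.normal(loc=2.0, scale=0.5, size=1000)
radial_distance = rng.uniform(5.0, 12.0, size=1000)

# One-dimensional KDE, as in panels a, c, and f.
kde_1d = gaussian_kde(log_intensity)
grid = np.linspace(0.0, 4.0, 200)
density_1d = kde_1d(grid)

# Two-dimensional KDE, as in panels b, d, and e.
kde_2d = gaussian_kde(np.vstack([log_intensity, radial_distance]))
xx, yy = np.meshgrid(np.linspace(0, 4, 50), np.linspace(5, 12, 50))
density_2d = kde_2d(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
```

The resulting `density_1d` and `density_2d` arrays are what a contour or line plot of the kind shown in Figure 4 would visualize.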
Figure 5.
Logistic regression ROC diagram for Cassini data subsets. The grey dots represent the cut-off values for the L2-regularized logistic regression. The orange dot represents the peak HSS value, used for optimization in Azari et al. (2018). The distinct curves represent separate ROC curves for each subset of data described in Section 4. The Azari et al. (2018) subset denotes the usage of S.
Figure 6.
Finalized logistic regression against the test dataset. The grey dots represent the test dataset values of non-events, and the pink dots represent events. The scatter in the dots around 0 and 1 is for aesthetic reasons and does not represent offset values. This figure contains logistic regressions performed on the physics-based parameter from Azari et al. (2018). The blue curve represents a class-weighted model and the purple curve a model without class weights. Similarly, the blue and purple dashed lines represent the finalized cut-off points for the class-weighted and un-weighted models. The orange dashed line represents the HSS optimization used within Azari et al. (2018). The x-axis is in logarithmic scale to demonstrate the range of the values; S itself spans both negative and positive values. The logarithmic presentation gives the false illusion that the blue curve does not approach zero.
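The HSS-optimized cut-off referenced in the Figure 5 and 6 captions can be sketched as a threshold sweep over a scored parameter: compute the confusion matrix at each candidate cut-off, evaluate the Heidke Skill Score (Heidke, 1926), and keep the peak. The sketch below is a minimal illustration on synthetic labels and scores, not the actual S values or the published optimization.

```python
import numpy as np

def heidke_skill_score(tp, fp, fn, tn):
    """Heidke Skill Score from confusion-matrix counts."""
    num = 2.0 * (tp * tn - fn * fp)
    den = (tp + fn) * (fn + tn) + (tp + fp) * (fp + tn)
    return num / den if den else 0.0

def best_hss_threshold(scores, labels, thresholds):
    """Sweep candidate cut-offs; return (best_threshold, best_hss)."""
    best = (None, -np.inf)
    for t in thresholds:
        pred = scores >= t
        tp = np.sum(pred & labels)
        fp = np.sum(pred & ~labels)
        fn = np.sum(~pred & labels)
        tn = np.sum(~pred & ~labels)
        hss = heidke_skill_score(tp, fp, fn, tn)
        if hss > best[1]:
            best = (t, hss)
    return best

# Synthetic scored parameter: events score higher on average than non-events,
# with a class imbalance resembling rare-event detection.
rng = np.random.default_rng(1)
labels = np.concatenate([np.ones(200, dtype=bool), np.zeros(800, dtype=bool)])
scores = np.where(labels, rng.normal(2.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000))

threshold, hss = best_hss_threshold(scores, labels, np.linspace(-3, 5, 81))
```

Each swept threshold also yields one (false positive rate, true positive rate) point, so the same loop traces the ROC curves shown in Figure 5.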
Figure 7.