Classification of exoplanets according to density
Andrzej Odrzywolek ∗ and Johann Rafelski
Department of Physics, The University of Arizona, Tucson, AZ 85721, USA
M. Smoluchowski Institute of Physics, Jagiellonian University, Cracow, Poland
∗ To whom correspondence should be addressed; E-mail: [email protected].
Abstract
Considering the probability distribution as a function of the average density $\bar\rho$ computed for 424 extrasolar planets, we identify three log-normal Gaussian population components. The two most populous components, at $\bar\rho \simeq$ […] g/cc and $\bar\rho \simeq$ […] g/cc, are the ice/gas giants and iron/rock super-Earths, respectively. A third component at $\bar\rho \simeq$ […] g/cc is consistent with brown dwarfs, i.e., objects supported by electron degeneracy. We note the presence of several planetary objects of extreme density.

The raw radius-mass data.
Our objective is to recognize statistical regularities and possible anomalies in the physical state of matter according to density (Weisskopf, 1975), addressing the databases of exoplanets (Hecht, 2016; Witze, 2015; Lissauer et al., 2014). The average density $\bar\rho$ of the planets,

$$\bar\rho = \frac{M}{\frac{4}{3}\pi R^3}, \qquad (1)$$

is closely related to the theoretical mass-radius ($M$-$R$) relation (Howard, 2013).

The data sources are the NASA Exoplanet Archive, exoplanetarchive.ipac.caltech.edu (Akeson et al., 2013), and The Extrasolar Planets Encyclopedia, exoplanet.eu (Schneider et al., 2011). Both were retrieved on 22 October 2016. The number of objects reported with both $M$ and $R$ is 510 out of 3388 and 610 out of 3533, respectively. To ensure data quality we concatenated the databases, merged duplicates, and split the result into "gold", "silver" and "bronze" subsets. The "gold" sample of 424 objects includes only exoplanets with consistent (but not necessarily identical) and unambiguous values of $M$ and $R$ in both sources; all dubious cases were reviewed in the original sources (Xie, 2014; Hadden & Lithwick, 2014; Marcy et al., 2014). The "silver" sample of 146 objects includes unconfirmed results appearing only once, and the remaining "bronze" data includes ∼ […] upper mass limits only. In this analysis only the "gold" sample plus the eight Solar System planets were used.

The raw $M$-$R$ data considered here is presented in Fig. 1. The curved (red) line shows the theoretical radius-mass relation for a pure Fe planet (Zeng & Sasselov, 2013). Solar System planets are marked by +. The resulting histogram for the base-10 logarithm of the density is shown in Fig. 2, using 32 bins chosen for visual convenience.

Three low density outliers below 0.05 g/cc (yellow bars in Fig. 2: Kepler-51 b, c, d; Masuda 2014) and three high density outliers above 50 g/cc (orange bars in Fig. 2: Kepler-128 b, c, Xie 2014; Kepler-131 c, Marcy et al. 2014) are also visible in Fig. 1, below and above the diagonal lines, respectively. These are separated from the bulk of the data and are excluded from the statistical analysis.

[Figure 1: horizontal axis Radius [km], vertical axis Mass [kg], with secondary ticks in $R_\oplus$, $R_J$, $M_\oplus$ and $M_J$; diagonal lines of constant density in g/cm³. Legend: Red - Transit; Blue - RV; Green - Other; Black - Transit & RV; + Solar System.]

Figure 1: Scatter plot in the mass-radius plane (log scale) of raw data for 432 (exo)planets. Data points are color coded according to detection method: red: transit; blue: radial velocity (RV); green: imaging, microlensing; black: both RV and transit. Diagonal lines of constant average density delimit the $\sigma$-domains identified in our analysis as belonging to the three main families of exoplanets, see text.

arXiv [astro-ph.EP]

Data analysis.
The Wolfram Mathematica 11 EstimatedDistribution command (Wolfram Research Inc., 2016) was used to process our data set of 418 exoplanets (424 less 6 outliers) plus 8 Solar System planets. The maximum log-likelihood result for the data is the continuous black dot-dashed curve shown in Fig. 2. This result suggests that the probability density for the exoplanet data $\bar\rho$ is a superposition of three log-normal Gaussian distributions.

The two biggest Gaussian components in Fig. 2 (red and green lines) can be recognized in the density distribution figure before the numerical fit. A third, smaller component (blue line) emerges during the numerical treatment. Values of the positions $\bar\rho_k$, normalizations $n_k$, and standard deviations $\sigma_k$ of the three Gaussians are shown in Fig. 2. The Pearson $\chi^2$ probability test shows a value above $P >$ […] for the 3-Gaussian probability density function shown in Fig. 2.

The envelope curve seen in Fig. 2 for the exoplanet data is thus a superposition of three dimensionless probability distributions

$$\frac{dP}{d\lg\bar\rho} = \ln(10)\,\bar\rho\,\frac{dP}{d\bar\rho}. \qquad (2)$$

A probability distribution normal in $\lg\bar\rho$ could be due to extraneous factors such as data sampling during observation, but it could also be related to scale-free planet formation mechanisms.

Proposed classification of exoplanets.
We classify the populations from left to right in Fig. 2; the names of the components are based on our intuitive expectation and prior knowledge. Since the distributions overlap, objects within a given density range may not be of the same nature.

I. Ice/gas. The first and dominant population ($P_I \simeq 80\%$), depicted as a red line and centered at $\rho_I =$ 0.[…] g/cc in Fig. 2, corresponds to the Saturn/Uranus/Jupiter planet type. Considering the full width at half maximum (FWHM), the distribution extends from $\rho \simeq$ […] g/cc to $\rho \simeq$ […] g/cc (Fig. 2, dotted horizontal segment). Members of this population are found predominantly between the red diagonal lines in Fig. 1.

II. Iron/rock. The second component ($P_e \simeq 19\%$, $\rho_e =$ 6.[…] g/cc, FWHM from […] to […] g/cc, with objects found between the green diagonal lines in Fig. 1) is shown as a green line in Fig. 2. These objects are very near Earth's average density of 5.5 g/cc. This is the so-called super-Earth population, i.e., planets with composition similar to Earth but often more massive; see e.g. Petigura et al. (2015), Fig. 6.

III. Degenerate. The third and smallest component ($P_d \simeq 1.5\%$, $\rho_d \simeq$ […] g/cc, with FWHM extending from […] to […] g/cc, cf. Fig. 1, blue bands) is shown as a blue line in Fig. 2. This density domain overlaps with that of electron degenerate matter, i.e. brown dwarfs (Burrows et al., 2001; Burrows & Liebert, 1993).

Since the three population classes overlap in density, the class membership of an individual object is to be understood in a statistical sense. For example, according to the proposed classification, the Earth, given its average density of 5.5 g/cc, is 4.4 times less likely to be an ice giant than a super-Earth. It is possible that with more abundant and precise exoplanet data, and allowing for additional information (e.g. the range of $M$ and $R$; surface composition), the classification can be made more precise. The super-Earth normalization ($P_e = 19\%$) is smaller than expected based on Solar System experience ($P_e \simeq$ […]%). This could be the result of a bias induced by the observational methods available today, which favor the measurement of $M$ and $R$ for large objects, as we note visually in Fig. 1.

Conclusions.
Understanding the mean density distribution of exoplanets offers a convenient tool to identify the new and mysterious in the Universe. Knowledge of the widths of the population distributions allows one to recognize the presence of anomalies when a larger exoplanet database becomes available. Our analysis results thus lay out the basis for the discovery of new classes of rare objects, e.g. CUDO (Rafelski et al., 2013), dark matter (Diemand et al., 2005) or strange matter (Shaw et al., 1989) contaminated exoplanets. Indeed, three small ultra-dense outliers are a tantalizing indication of mysteries that future exoplanet results may reveal.

We proposed that the extrasolar planet distribution is a superposition of three log-normal Gaussian population components, allowing the introduction of three classes of exoplanetary objects distinguished by average density. Two classes (I. ice/gas giants, 80%, and II. super-Earths, 19%) dominate the available data. Our classification in terms of density agrees with the Solar System situation, where outer and inner planets are in classes I and II, respectively. The observed relative normalization of the components, strongly favoring the ice/gas class, is probably an observational bias. This bias is also the reason why we do not divide the results seen in Fig. 1 into domains according to the $M$-$R$ ranges that the eye easily captures.

The degenerate class III includes about 1.5% of the 432 available objects. In the mathematical analysis a separate population can be assigned to it. On the other hand, the 2 x 3 outliers removed from the fit are too few to be assigned their own population class. These outliers are inconsistent with the derived probability distribution function (PDF) at the 94% confidence level in the sense of the conservative Pearson $\chi^2$ test. This inconsistency is higher (97.1%) if the low and high density outliers are considered separately. When more data becomes available it will be possible to decide if the two groups of three density outlier exoplanets are a data fluctuation or, more interestingly, a new population.

The proposed classification method employing average density statistics can be used to analyze growing data sets on other astrophysical objects: stars, including white dwarfs and neutron stars, and minor bodies of the Solar System. This analysis is currently underway.

Acknowledgment.
A.O.'s work was supported by The Kosciuszko Foundation. A.O. thanks the Department of Physics, University of Arizona, for its kind hospitality.
References
Akeson R. L., et al., 2013, PASP, 125, 989
Burrows A., Hubbard W. B., Lunine J. I., Liebert J., 2001, Rev. Mod. Phys., 73, 719
Burrows A., Liebert J., 1993, Rev. Mod. Phys., 65, 301
Diemand J., Moore B., Stadel J., 2005, Nature, 433, 389
Hadden S., Lithwick Y., 2014, ApJ, 787, 80
Hecht J., 2016, Nature, 530, 272
Howard A. W., 2013, Science, 340, 572
Lissauer J. J., Dawson R. I., Tremaine S., 2014, Nature, 513, 336
Marcy G. W., et al., 2014, ApJS, 210, 20
Masuda K., 2014, ApJ, 783, 53
Petigura E. A., Schlieder J. E., Crossfield I. J. M., Howard A. W., Deck K. M., Ciardi D. R., Sinukoff E., Allers K. N., Best W. M. J., Liu M. C., Beichman C. A., Isaacson H., Hansen B. M. S., Lépine S., 2015, ApJ, 811, 102
Rafelski J., Labun L., Birrell J., 2013, Physical Review Letters, 110, 111102
Schneider J., Dedieu C., Le Sidaner P., Savalle R., Zolotukhin I., 2011, A&A, 532, A79
Shaw G. L., Shin M., Desai M., Dalitz R. H., 1989, Nature, 337, 436
Weisskopf V. F., 1975, Science, 187, 605
Witze A., 2015, Nature, 527, 288
Wolfram Research Inc., 2016, "Wolfram|Alpha Knowledgebase"
Xie J.-W., 2014, ApJS, 210, 25
Zeng L., Sasselov D., 2013, PASP, 125, 227

[Figure 2: probability density versus average density [g/cm³]. Panel legend: Ice/gas giants ($\rho$, $n$, $\sigma$); super-Earths ($\rho$, $n$, $\sigma$); third component ($\rho$ = 29.[…] g/cm³, $n$, $\sigma$); $P >$ […]%.]

Figure 2: Distribution of the average density for 424 exoplanets and 8 Solar System planets. A histogram with 32 data bins is shown for visualization purposes only; our fit uses the density data directly. The interpretation and names of the component curves are based on the study of average density only.

Supplement
S.1 Methodology of finding best physics fit
The sample of 424 available exoplanet mean densities is large enough to exclude most probability distributions but still leaves room for a few good candidates. We find that a sum of log-normal distributions (i.e. Gaussians in the base-10 logarithm of the mean density, $\lg\bar\rho$) is the best physics characterization of the available data. However, even within the log-normal family, there is still room for a variable number of contributing components. We therefore provide here a comparative analysis of the various fits to the density data. An overview of the most important hypothetical distributions considered is presented in Table S.1.

To find the best parameters, e.g. $\bar\rho_0$, $\sigma$, for the assumed statistical distribution $P(\bar\rho; \bar\rho_0, \sigma)$, the log-likelihood

$$L(\bar\rho_0, \sigma) = \sum_{i=1}^{N} \ln P_i, \qquad P_i = P(\bar\rho_i; \bar\rho_0, \sigma),$$

is maximized. $N$ is the total number of mean density measurements $\bar\rho_i$. Since usually most probabilities satisfy $P_i < 1$, we have $\ln P_i < 0$, and the numerical value for good fits is $L = O(-N)$, where $N = 426$. A better probability density results in a less negative log-likelihood, cf. Table S.1.

To fit density data, a probability distribution must fulfill some obvious requirements, e.g. a range from zero to infinity for $\bar\rho$, i.e. the full real line for $\lg\bar\rho$. One often first attempts a superposition of several Gaussians. However, the main maximum at $\bar\rho \simeq$ […] g/cc, with a width of the same magnitude and a long tail stretching to high density, prohibits the use of a multiple Gaussian distribution in density: it produces P-values which are essentially zero, see the last line of Table S.1.

On the other hand, at first sight a histogram in $\lg\bar\rho$ showed a compact distribution shape with two visible "peaks", that is, two contributing populations. This observation allows multiple choices for each individual distribution. Therefore we must take into account simplicity of hypothesis and physical intuition to proceed. Arguments strongly favoring simple log-normal components, i.e. Gaussians in $\lg\bar\rho$, are: (i) the density distribution in $\lg\bar\rho$ is dimensionless; (ii) multiple factors playing a role in planet formation are consistent with the central limit theorem outcome; (iii) an automated brute-force fit search (symbolic regression) places the 2-component log-normal mixture distribution at the top of the list as the most likely; (iv) the P-values of typical tests, Pearson $\chi^2$ in particular, are the largest we find.

We see in Fig. S.1 that a single log-normal population indeed has a small P-value ($P >$ […]), while on introducing a second log-normal population, see Fig. S.2, the P-value rises to an acceptable level ($P >$ […]). We thus assume that both population components are described by the same log-normal functional form; for comparison we also present the log-logistic distribution results in Table S.1.

The width of the distributions we report has considerable physical significance, and it depends on the possible presence of a third distribution. We thus explore this option further. When adding a third population we had to select among similar numerical outcomes, considering three cases: A) a high density third population, a choice motivated by brown-dwarf theories, see Fig. S.3, which improves the P-value; B) a low density sub-population, see Fig. S.4, which results in a reduced P-value ($P >$ […]); and C) a narrow distribution that is allowed to characterize some of the distribution irregularities, in exploration of what one calls "over-fitting", see Fig. S.5. Case A (cf. Table S.1) has the maximum likelihood, but not the largest P-value (smallest $\chi^2$), which we find for case C. Given priors in physics we retain case A as our choice of distribution; see Fig. 2 of the main article, corresponding to Fig. S.3 of this supplement. However, we keep in mind that there could be still further distinct populations in future data. We recall that the 2 x 3 high/low density outliers are not considered in the present analysis, and these could also signal additional exoplanet populations.

| Distribution | P-value (Pearson $\chi^2$) | log-Likelihood | Figure |
|---|---|---|---|
| log-normal, 1 | 0.59% | -343.9 | Fig. S.1 |
| log-normal, 2 | 93.47% | -325.9 | Fig. S.2 |
| log-normal, 3-A | 97.15% | -322.9 | Fig. S.3 |
| log-normal, 3-B | 92.86% | -325.4 | Fig. S.4 |
| log-normal, 3-C | […] | -324.7 | Fig. S.5 |
| log-Logistic, 1 | 0.34% | -350.9 | - |
| log-Logistic, 2 | 84.33% | -329.8 | - |
| log-Logistic, 3 | 82.28% | -327.6 | - |
| normal, 3 | […] × 10^(-[…])% | - | - |

Table S.1: Probability estimates for various log-normal and other fits.

[Figure S.1: probability density versus average density [g/cm³], with fitted $\rho$, $n$, $\sigma$ and $P >$ […]% shown in the panel. Test statistics (statistic, P-value): Anderson-Darling 3.105376, 0.02422163; Baringhaus-Henze 4.502077, 0.0003196615; Cramér-von Mises 0.5348495, 0.0325268; Jarque-Bera ALM 16.75831, 0.004104904; Mardia Combined 16.75831, 0.004104904; Mardia Kurtosis […]; Pearson $\chi^2$ […].]

Figure S.1: Single log-normal population fit to the exoplanet density data, showing also the resulting P-value and distribution parameters.

[Figure S.2: probability density versus average density [g/cm³], with fitted $\rho$, $n$, $\sigma$ for two components and $P >$ […]% shown in the panel. Test statistics (statistic, P-value): Anderson-Darling 0.1324396, 0.9995; Cramér-von Mises 0.01799333, 0.9985019; Pearson $\chi^2$ […].]

Figure S.2: Double log-normal population fit to the exoplanet density data, showing also the resulting P-value and distribution parameters.
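The log-likelihood defined in Section S.1 can be illustrated with a short Python sketch. This is not the Mathematica procedure used for the paper: the function names, the toy densities, and all component parameters below are hypothetical, and the sketch assumes that each $P_i$ is evaluated as a probability density in $\lg\bar\rho$.

```python
import math


def mixture_pdf_lg(lg_rho, components):
    """Probability density in lg(rho) for a sum of Gaussians.

    components: list of (weight, center_lg, sigma_lg); weights should sum to 1.
    """
    total = 0.0
    for weight, center_lg, sigma_lg in components:
        z = (lg_rho - center_lg) / sigma_lg
        total += weight * math.exp(-0.5 * z * z) / (sigma_lg * math.sqrt(2.0 * math.pi))
    return total


def log_likelihood(densities, components):
    """Sum of ln P_i over mean densities (g/cc), with P_i evaluated in lg(rho)."""
    return sum(
        math.log(mixture_pdf_lg(math.log10(rho), components))
        for rho in densities
    )


# Hypothetical toy data and parameters, for illustration only (not fitted values).
toy_densities = [0.3, 0.6, 0.7, 0.9, 1.3, 5.0, 6.5, 8.0]
one_component = [(1.0, 0.0, 0.5)]
two_components = [(0.6, -0.15, 0.25), (0.4, 0.8, 0.15)]

print("1 component :", log_likelihood(toy_densities, one_component))
print("2 components:", log_likelihood(toy_densities, two_components))
```

As in Table S.1, the candidate with the less negative log-likelihood describes the sample better. A real fit would maximize this quantity over the weights, centers and widths rather than compare two fixed guesses.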
[Figure S.3: probability density versus average density [g/cm³], with fitted $\rho$, $n$, $\sigma$ for three components (third component at $\rho$ = 29.[…] g/cm³) and $P >$ […]% shown in the panel. Test statistics (statistic, P-value): Anderson-Darling 0.1344163, 0.9994305; Cramér-von Mises 0.02057215, 0.9964321; Pearson $\chi^2$ […].]

Figure S.3: Case A) triple log-normal population fit to the exoplanet density data, showing also the resulting P-value and distribution parameters.

[Figure S.4: probability density versus average density [g/cm³], with fitted $\rho$, $n$, $\sigma$ for three components and $P >$ […]% shown in the panel. Test statistics (statistic, P-value): Anderson-Darling 0.1211084, 0.9997822; Cramér-von Mises 0.01593697, 0.9993872; Pearson $\chi^2$ […].]

Figure S.4: Case B) triple log-normal population fit to the exoplanet density data, showing also the resulting P-value and distribution parameters.

[Figure S.5: probability density versus average density [g/cm³], with fitted $\rho$, $n$, $\sigma$ for three components and $P >$ […]% shown in the panel. Test statistics (statistic, P-value): Anderson-Darling 0.1063573, 0.999944; Cramér-von Mises 0.01231755, 0.9999386; Pearson $\chi^2$ […].]

Figure S.5: Case C) triple log-normal population fit to the exoplanet density data, showing also the resulting P-value and distribution parameters, with the third population focused on $\rho \simeq$ […] g/cc.

S.2 Data files and programs

Additional files required to reproduce our results are included separately. A list and description are provided below.

• Text file exoplanet GOLD.txt (CSV, comma separated values) with the gold sample of 424 exoplanets used in the analysis. Header:
Name, SRC, Mneg, M0, M1, M2, Limit, Mass Src, TTV, R, dRneg, dRpos, Name2, RV, Transit, RA, DEC
Column description in file:
1. Name, selected common name to identify planet
2. SRC, source of the values (NASA or exoplanet.eu)
3. Mneg, max(0, reported mass - error) [Jupiter mass]
4. M0, reported mass [Jupiter mass]
5. M1, reported mass + error [Jupiter mass]
6. M2, reported mass + 2*error [Jupiter mass]
7. Limit, 0/1 mass limit flag
8. Mass Src, method used to measure mass
9. TTV, 0/1 flag indicating use of Transit Timing Variation to measure mass
10. R, reported radius [Jupiter radius]
11. dRneg, radius error towards zero [Jupiter radius]
12. dRpos, radius error towards infinity [Jupiter radius]
13. Name2, alternative name
14. RV, Radial Velocity method flag
15. Transit, Transit method flag
16. RA, Right ascension [degrees]
17. DEC, Declination [degrees]

• Mathematica notebook file Exoplanets.nb, including the code used to obtain the results and figures.
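For readers without Mathematica, the following Python sketch indicates how a file with the column layout listed above could be turned into mean densities using Eq. (1) of the main article. It is an illustration, not the authors' code: the conversion constants are assumed nominal Jupiter values that do not come from the paper, the function names are hypothetical, and the two sample rows are invented placeholders rather than entries of the gold sample.

```python
import csv
import io
import math

# Assumed nominal conversion factors (not given in the paper).
M_JUP_G = 1.898e30   # Jupiter mass [g]
R_JUP_CM = 7.1492e9  # Jupiter equatorial radius [cm]


def mean_density(mass_mjup, radius_rjup):
    """Average density in g/cc from Eq. (1): rho = M / (4/3 * pi * R^3)."""
    mass_g = mass_mjup * M_JUP_G
    radius_cm = radius_rjup * R_JUP_CM
    return mass_g / (4.0 / 3.0 * math.pi * radius_cm ** 3)


def densities_from_csv(text):
    """Map planet name -> mean density, using the M0 and R columns."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    result = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(header, (cell.strip() for cell in row)))
        result[record["Name"]] = mean_density(float(record["M0"]), float(record["R"]))
    return result


# Hypothetical two-row sample in the documented column layout (not real data).
sample = """Name, SRC, Mneg, M0, M1, M2, Limit, Mass Src, TTV, R, dRneg, dRpos, Name2, RV, Transit, RA, DEC
Toy-1 b, NASA, 0.9, 1.0, 1.1, 1.2, 0, RV, 0, 1.0, 0.05, 0.05, , 1, 1, 10.0, -5.0
Toy-2 b, NASA, 0.01, 0.02, 0.03, 0.04, 0, RV, 0, 0.15, 0.01, 0.01, , 1, 1, 20.0, 15.0
"""

for name, rho in densities_from_csv(sample).items():
    print(f"{name}: {rho:.2f} g/cc")
```

To apply the sketch to the actual data, the contents of the gold-sample text file would be passed to `densities_from_csv` in place of `sample`. The sketch uses only the central values M0 and R and ignores the error columns.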