[PDF] Mapping stellar surfaces II: An interpretable Gaussian process model for light curves

Abstract

The use of Gaussian processes (GPs) as models for astronomical time series datasets has recently become almost ubiquitous, given their ease of use and flexibility. GPs excel in particular at marginalization over the stellar signal in cases where the variability due to starspots rotating in and out of view is treated as a nuisance, such as in exoplanet transit modeling. However, these effective models are less useful in cases where the starspot signal is of primary interest since it is not obvious how the parameters of the GP model are related to the physical properties of interest, such as the size, contrast, and latitudinal distribution of the spots. Instead, it is common practice to explicitly model the effect of individual starspots on the light curve and attempt to infer their properties via optimization or posterior inference. Unfortunately, this process is degenerate, ill-posed, and often computationally intractable when applied to stars with more than a few spots and/or to ensembles of many light curves. In this paper, we derive a closed-form expression for the mean and covariance of a Gaussian process model that describes the light curve of a rotating, evolving stellar surface conditioned on a given distribution of starspot sizes, contrasts, and latitudes. We demonstrate that this model is correctly calibrated, allowing one to robustly infer physical parameters of interest from one or more stellar light curves, including the typical radii and the mean and variance of the latitude distribution of starspots. Our GP has far-ranging implications for understanding the variability and magnetic activity of stars from both light curves and radial velocity (RV) measurements, as well as for robustly modeling correlated noise in both transiting and RV exoplanet searches. Our implementation is efficient, user-friendly, and open source, available as the Python package starry-process.

Full PDF

DDraft version February 11, 2021

Typeset using L A TEX modern style in AASTeX62

Mapping stellar surfacesII: An interpretable Gaussian process model for light curves

Rodrigo Luger,

1, 2, ∗ Daniel Foreman-Mackey, and Christina Hedges

3, 4 Center for Computational Astrophysics, Flatiron Institute, New York, NY Virtual Planetary Laboratory, University of Washington, Seattle, WA Bay Area Environmental Research Institute, Moﬀett Field, CA NASA Ames Research Center, Moﬀett Field, CA

ABSTRACTThe use of Gaussian processes (GPs) as models for astronomical time series datasetshas recently become almost ubiquitous, given their ease of use and ﬂexibility. GPsexcel in particular at marginalization over the stellar signal in cases where the vari-ability due to starspots rotating in and out of view is treated as a nuisance, suchas in exoplanet transit modeling. However, these eﬀective models are less useful incases where the starspot signal is of primary interest since it is not obvious how theparameters of the GP model are related to the physical properties of interest, suchas the size, contrast, and latitudinal distribution of the spots. Instead, it is commonpractice to explicitly model the eﬀect of individual starspots on the light curve andattempt to infer their properties via optimization or posterior inference. Unfortu-nately, this process is degenerate, ill-posed, and often computationally intractablewhen applied to stars with more than a few spots and/or to ensembles of many lightcurves. In this paper, we derive a closed-form expression for the mean and covarianceof a Gaussian process model that describes the light curve of a rotating, evolvingstellar surface conditioned on a given distribution of starspot sizes, contrasts, andlatitudes. We demonstrate that this model is correctly calibrated, allowing one torobustly infer physical parameters of interest from one or more stellar light curves,including the typical radii and the mean and variance of the latitude distribution ofstarspots. Our GP has far-ranging implications for understanding the variability andmagnetic activity of stars from both light curves and radial velocity (RV) measure-ments, as well as for robustly modeling correlated noise in both transiting and RVexoplanet searches. Our implementation is eﬃcient, user-friendly, and open source,available as the

Python package starry_process . (cid:135) Keywords: analytical mathematics — time series analysis — Gaussian processes re-gression — starspots

Features: open-source ﬁgures - ; equation unit tests: 17 passed ¸ , 0 failed Ø rluger@ﬂatironinstitute.org ∗ Flatiron Fellow a r X i v : . [ a s t r o - ph . S R ] F e b INTRODUCTIONOver the past two decades, Gaussian processes (GPs; Rasmussen & Williams 2005)have gained traction as a leading tool for modeling correlated signals in astronomicaldatasets. In particular, GPs are commonly used to model stellar variability in photo-metric time series (e.g., Brewer & Stello 2009; Aigrain et al. 2016; Luger et al. 2016;Foreman-Mackey et al. 2017; Angus et al. 2018) and radial velocity measurements(e.g., Rajpaul et al. 2015; Jones et al. 2017; Perger et al. 2020). GPs are popular mod-els for these applications because they allow marginalization over a stochastic noiseprocess speciﬁed only by a kernel describing its autocorrelation structure. There areseveral popular open source implementations that allow eﬃcient evaluation of GPs,and these have been widely demonstrated to be useful eﬀective models for the timeseries when the stochastic variability due to the star is primarily a nuisance (e.g.,Ambikasaran et al. 2015; Foreman-Mackey et al. 2017; Gilbertson et al. 2020).A major source of stellar variability in both light curves and radial velocity datasetsis the modulation induced by magnetically-driven surface features like starspots ro-tating in and out of view. While GPs excel at marginalizing over stellar rotationalvariability, they have been less useful when the goal is to make inferences about theactual source of this variability, such as the properties of starspots and the magneticprocesses that generate them.

While it is straightforward to derive posteriorconstraints on the hyperparameters of an eﬀective GP model for observa-tions of a star, it is not clear what those constraints actually tell us aboutthe stellar surface.

Speciﬁcally, in all but a few restricted cases, there is no ﬁrstprinciples relationship between the descriptive parameters of a typical GP model (see§2.1) and the physical properties of the stellar surface that is being observed. Forinstance, it may be tempting to interpret the GP amplitude hyperparameter as somemeasure of the spot contrast or the total number of spots, or the GP timescale hy-perparameter as the spot lifetime, but there are no guarantees these interpretationswill hold in general. After all, the choice of kernel is quite often ad hoc , providingan eﬀective —as opposed to interpretable —description of the physics. There are twoimportant exceptions to this: asteroseismic studies, in which the the GP hyperparam-eters can oﬀer direct insight into the behavior of complex pulsation modes and thusphysical properties of the stellar interior (e.g., Brewer & Stello 2009; Foreman-Mackeyet al. 2017); and stellar rotation period studies, in which the period hyperparameter P can usually be associated with the rotation period of the star (e.g., Angus et al.2018). For spot-induced variability, on the other hand, GPs are usually used whenthe variability itself is a nuisance parameter. For example, if the goal is to constrainthe properties of a transiting exoplanet or to search for a planetary signal in a radialvelocity dataset, a GP might be used to remove (or, better yet, to marginalize over) An exception to this is in the presence of strong diﬀerential rotation, in which case manyperiods may be present in the data, or when spots evolve coherently, which can also introduce weakperiodicities in the light curve. the stellar variability (e.g., Haywood et al. 2014; Rajpaul et al. 2015; Luger et al.2017b). In this case, the physics behind the variability is irrelevant, so an eﬀectivemodel of this sort may be suﬃcient.However, understanding the properties of stellar surfaces and starspots in particu-lar is a crucial step toward understanding stellar magnetism, which plays a fundamen-tal part in stellar interior structure and evolution. Stellar magnetic ﬁelds control thespin-down of stars over time, on which the ﬁeld of gyrochronology is founded (Barneset al. 2001; Angus et al. 2019). They aﬀect wave propagation in stellar interiors andmust be properly understood to interpret asteroseismic measurements (e.g., Fulleret al. 2015). Strong magnetic ﬁelds are also likely the driving force behind chemi-cal peculiarity in Ap/Bp stars (Turcotte 2003; Sikora et al. 2018), as well as radiusinﬂation in M dwarfs (Gough & Tayler 1966; Ireland & Browning 2018). Stellar mag-netohydrodynamics (MHD) is therefore an active area of research, with many openquestions (e.g., Miesch & Toomre 2009). Because of the nonlinearity of the MHDequations and the vast range of scales on which magnetic processes operate, there isstill signiﬁcant theoretical uncertainty concerning how dynamos operate in stars ofdiﬀerent masses, how magnetic ﬁelds aﬀect stellar rotation, and how star spots form(Yadav et al. 2015; Weber & Browning 2016). Observational constraints on starspotsand other magnetically-controlled surface features are therefore extremely valuableto understanding various problems in stellar astrophysics.Moreover, even when the stellar signal is considered a nuisance, a physically-drivenvariability model may be a better choice than an eﬀective model in some cases, par-ticularly when the signal of interest is small compared to the systematics. A speciﬁcexample of this is in transmission spectroscopy of transiting exoplanets, where thecontribution from unocculted spots and faculae to the spectrum can be an order ofmagnitude larger than that of the planet atmosphere (Rackham et al. 2018). In thiscase, failure to explicitly model the eﬀect of starspots can lead to spurious featuresin the planet spectrum. A similar situation arises in extreme precision radial veloc-ity (EPRV) searches for planets, where the stellar signal can be orders of magnitudelarger than the planetary signal. While eﬀective models of variability have often beensuccessful at disentangling the planetary and stellar contributions (e.g., Rajpaul et al.2015), these models can struggle when the (a priori unknown) orbital period of theplanet is close to an alias of the rotational period of the star (Vanderburg et al. 2016;Damasso et al. 2019; Robertson et al. 2020). In this case, a physically-driven modelof variability would likely perform better.When the goal is to learn about the stellar surface, the common approach in theliterature has not been to use GPs, but to explicitly forward model the surface. Such amodel allows one to compute a stellar light curve or spectral timeseries conditioned oncertain surface properties, a procedure that must then be inverted in order to constrainthe surface given a dataset. We discussed this approach for rotational light curves ofstars in Luger et al. (2021b) (hereafter Paper I), where we argued a unique solutionto the surface map of the star is not possible without the use of aggressive (and often ad hoc ) priors.

The degeneracies at play make it eﬀectively impossible forone to know the exact conﬁguration of starspots and other features on thesurface of a star from its rotational light curve alone.

However, it is hardly ever the case that this is actually our end goal. After all,physics can be used to predict properties of stellar surfaces at a fairly high level: i.e.,typical spot sizes, active spot latitudes, or approximate timescales on which spotsevolve (e.g., Schuessler et al. 1996; Solanki et al. 2006; Cantiello & Braithwaite 2019).We are hardly ever interested in the particular properties of a particular spot, aswe wouldn’t really know what to do with that information! Instead, we often treat(whether explicitly or not) the properties of a starspot as a draw from some parentdistribution controlling (say) the average and spread in the radii of the spots. Theparameters controlling this distribution are the ones that we can predict with physics;they are therefore also the ones we are usually interested in.Thus, if it were possible to derive robust posterior constraints on the properties ofeach of the spots on a star, we could then marginalize (integrate) over them to inferthe properties describing the distribution of all the spots as a whole. We could dothis using the forward model approach described above, by modeling the properties ofeach of the spots and computing the corresponding light curves. Then, we could solvethe “inverse” problem via a posterior sampling scheme, such as Markov Chain MonteCarlo (MCMC), while including a few hyperparameters controlling the distributionof those properties across all spots: i.e., a one-level hierarchical model. The marginalposteriors for the hyperparameters, then, would encode what we actually wish toknow. In practice, however, the degeneracies and often extreme multi-modality of thedistributions of individual spot properties would make this quite hard (and expensive)to perform. If only we could use the elegant machinery of Gaussian processes toperform this marginalization for us!

In this paper, we derive an exact, closed-form expression for the Gaus-sian approximation to the marginal likelihood of a light curve conditionedon the statistical properties of starspots, which allows us deﬁne an interpretableGaussian process for stellar light curves. Our GP analytically marginalizes over thedegenerate and often unknowable distributions of properties of individual starspots,revealing the constraints imposed on the bulk spot properties without the need toexplicitly model or sample over properties of individual spots. It inherits the speed,ease-of-use, and all other properties of traditionally-used GPs, with the added beneﬁtof direct physical interpretability of its hyperparameters.While our GP can be used to model light curves of individual stars, it is particularlyuseful for ensemble analyses of light curves of many similar stars.

As weshowed in Paper I, the joint information content of the light curves of many similarstars can be harnessed to constrain statistical properties of the surfaces of those stars,even in the presence of degeneracies that preclude knowledge about the surfaces ofindividual stars. By “similar”, we do not mean stars that look similar, but whosespot properties are drawn from the same parent distribution. The parameters of thisparent distribution are the ones we can constrain; the are also usually the physicallyinteresting ones, such as the typical spot sizes or typical active latitudes and thevariance in those quantites across the population. Ensembles may thus comprise lightcurves of stars in a narrow spectral type, metallicity, and rotation period bin, which wemight reasonably expect to have statistically similar surfaces. We encourage readersto read Paper I to better understand this and other points regarding the informationtheory behind stellar rotational light curves.The present paper is organized as follows: we present an overview of the derivationof the GP in §2 and a suite of tests on synthetic data to show the model is calibratedin §3. We discuss our results and the limitations of our model in §4 and presentstraightforward extensions of the GP, including its application to time-variable sur-faces, in §5. In §6 we summarize our results and discuss topics we will address infuture papers in this series.Most of the math behind the algorithm is presented in the Appendix, followed by aseries of supplementary ﬁgures (discussed in §3). Appendix A discusses the notationwe adopt throughout the paper and Table 2 lists the main symbols and variables, withlinks to their deﬁnitions. The algorithm developed in this paper is fully implementedin the starry_process code, which is available on GitHub and is described in moredetail in Luger et al. (2021a).Finally, we note that all of the ﬁgures in this paper were auto-generated usingthe Azure Pipelines continuous integration (CI) service, which ensures they are upto date with the latest version of the starry_process code. In particular, icons nextto each of the ﬁgures - link to the exact script used to generate them to ensure thereproducibility of our results. As in Paper I, the principal equations are accompaniedby “unit tests”: pytest -compatible test scripts associated with the principal equationsthat pass (fail) if the equation is correct (wrong), in which case a clickable ¸ ( Ø ) isshown next to the equation label. In most cases, the validity of an equation is gaugedby comparison to a numerical solution. Like the ﬁgure scripts, the equation unit testsare run on Azure Pipelines upon every commit of the code. A GAUSSIAN PROCESS FOR STARSPOTSIn this section, we provide a brief overview of Gaussian processes (§2.1) and spher-ical harmonics (§2.2), followed by an outline of the derivation of our interpretableGP (§2.3). This derivation boils down to computing the mean and covariance ofthe stellar ﬂux conditioned on certain physical properties of the star and its starspotdistribution. In §2.4 and §2.5 we derive useful extensions of the model.

For con-venience, we summarize the results of this entire section in §2.6.

Most ofthe math is folded into the Appendix for readability; readers may want to refer toAppendix A in particular for a discussion of the notation and conventions we adopt.2.1.

Brief overview of Gaussian processes

Despite whatever mystique the words “Gaussian process” may evoke, a GP is noth-ing but a Gaussian distribution in many (formally inﬁnite) dimensions. Speciﬁcally,it is a Gaussian distribution over functions spanning a continuous domain (in ourcase, the time domain). Similar to a multivariate Gaussian, which is described by a ( K × vector µµµ characterizing the mean of the process and a ( K × K ) matrix ΣΣΣ characterizing its covariance, a GP is fully speciﬁed by a mean function m ( t ) and akernel function k ( t, t (cid:48) ) . To say that a random vector-valued variable f deﬁned on a ( K × time array t is “distributed as a GP” means that we may write f ∼ N ( µµµ, ΣΣΣ) , (1)where the elements of the mean and covariance are given by µ i = m ( t i ) and Σ i,j = k ( t i , t j ) , respectively. Because of this relationship to multivariate Gaussians,GPs are easy to sample from. But, as we alluded to earlier, the real showstopperis the application of GPs to inference problems. Multivariate Gaussian distributionshave a closed-form (marginal) likelihood function, so it is easy to compute the prob-ability of one’s data conditioned on a given value of µµµ and

ΣΣΣ (i.e., the “likelihood”;see Equation 14 below). This can in turn be maximized to infer the optimal values ofthe model parameters or used in a numerical sampling scheme to compute the prob-ability of those parameters given the data (i.e., the “posterior”). Thanks to moderncomputer architectures, linear algebra packages, and GP algorithms, evaluating theGP likelihood may typically be done in a fraction of a second for a reasonably-sizeddataset (i.e., K (cid:46) datapoints).Another big advantage of GPs is their ﬂexibility. GPs are often dubbed a class of“non-parametric” models, given that nowhere in the speciﬁcation of the GP is there anexplicit functional form for f . Rather, a GP is a stochastic process whose draws canin principle take on any functional form, subject, however, to certain smoothness andcorrelation criteria of tunable strictness that are fully encoded in the covariance ΣΣΣ .In many applications, particularly when modeling stellar light curves, it is customaryto restrict the problem by assuming that the process is stationary , such that we maywrite Σ i,j = k ( t i , t j )= k ( | t i − t j | ) ≡ k (∆ t ) . (2) In this paper, we will use blackboard font (i.e., f ) to denote random variables and serif font (i.e., f ) to denote particular realizations of those variables. See Appendix A for a detailed explanation ofour notation. Given a 1-d array mean and a 2-d array cov in Python , sampling from the corresponding GP (if itexists) can be done in a single line of code by calling numpy.random.multivariate_normal(mean,cov) . l = 0 l = 1 l = 2 l = 3 l = 4 m = - l = 5 m = - m = - m = - m = - m = m = m = m = m = m = Figure 1.

The real spherical harmonics in the polar frame up to l = 5 , where dark colorscorrespond to negative intensity and bright colors to positive intensity. Rows correspondto the degree l and columns to the order m . The set of all spherical harmonics forms acomplete, orthogonal basis on the sphere. - A stationary process is one that is independent of phase (or, in this case, the actualvalue of the time t ); rather, it depends only on the diﬀerence between the phases oftwo data points. The kernel of a stationary process is therefore a one-dimensionalfunction, typically chosen from a set of standard functions with desirable smoothnessand spectral properties.The GP we derive in this paper is stationary and admits a representation as a one-dimensional kernel function. However, as we show in §2.5, the common practice ofnormalizing stellar light curves to their mean or median value breaks this stationarity.For this reason, it is more convenient to derive and present our GP covariance as a ( K × K ) matrix ΣΣΣ and our GP mean as a ( K × vector µµµ for arbitrary K insteadof as a kernel and a mean function. Note, importantly, that these representations areequivalent given the deﬁnitions above.2.2. Spherical harmonics

Before we dive into the computation of our GP, it is useful to introduce the spher-ical harmonics, a set of orthogonal functions on the surface of the sphere which wewill use to describe the intensity ﬁeld on the surface of a star (Figure 1). As wewill see below, the spherical harmonics are a particularly convenient basis in whichto describe starspot distributions , as they will allow us to compute moments of theintensity distribution analytically. Of more immediate concern, Luger et al. (2019)showed that there is a linear relationship between the spherical harmonic expansionof a stellar surface and the total disk-integrated ﬂux f (i.e., the light curve) one wouldobserve as the star rotates about a ﬁxed axis. If the stellar surface intensity is de-scribed by a spherical harmonic coeﬃcient vector y (up to a certain degree l max ), theﬂux is given by f = + AAA ( I, P, u ) y , (3)where is the ones vector and AAA is the starry design matrix, a purely linear operatorthat transforms from the spherical harmonic basis to the ﬂux basis; it is a function ofthe stellar inclination I , the stellar rotation period P , and the stellar limb darkeningcoeﬃcients u , as well as the observation times (see Appendix B for details).2.3. Computing the GP

Let f = ( (cid:102) (cid:102) · · · (cid:102) K − ) (cid:62) denote a random vector of K ﬂux measurements attimes t = ( t t · · · t K − ) (cid:62) , deﬁned in units such that a star with no spots on itwill have unit ﬂux. Conditioned on the stellar inclination I , the rotational period P , a set of limb darkening coeﬃcients u , and on certain properties of the starspots θθθ • (including their number, sizes, positions, and contrasts), we wish to compute themean µµµ ( I, P, u , θθθ • ) and covariance ΣΣΣ(

I, P, u , θθθ • ) of f . Together, these specify a mul-tidimensional Gaussian distribution, which we assume fully describes how our ﬂuxmeasurements are distributed: f ( I, P, u , θθθ • ) ∼ N (cid:16) µµµ ( I, P, u , θθθ • ) , ΣΣΣ (

I, P, u , θθθ • ) (cid:17) (4)As with any random variable, the mean and covariance may be computed from theexpectation values of f and f f (cid:62) , respectively: µµµ ( I, P, u , θθθ • ) = E (cid:104) f (cid:12)(cid:12)(cid:12) I, P, u , θθθ • (cid:105) (5) ΣΣΣ(

I, P, u , θθθ • ) = E (cid:104) f f (cid:62) (cid:12)(cid:12)(cid:12) I, P, u , θθθ • (cid:105) − µµµ ( I, P, u , θθθ • ) µµµ (cid:62) ( I, P, u , θθθ • ) . (6)Given the linear relationship between ﬂux and spherical harmonic coeﬃcients (Equa-tion 3), we may write the mean and covariance of our GP as µµµ ( P, u , θθθ • ) = + AAA ( I, P, u ) µµµ y ( θθθ • ) (7) ΣΣΣ( P, u , θθθ • ) = AAA ( I, P, u ) ΣΣΣ y ( θθθ • ) AAA (cid:62) ( I, P, u ) , (8) There are, of course, drawbacks to using this basis: in particular, the spherical harmonics aresmooth, continuous functions that struggle (at ﬁnite degree l ) to capture high resolution featuressuch as small starspots. We discuss this point at length in §4.1, where we show that our model isuseful even when applied to stars with spots smaller than the eﬀective resolution of the GP. Note, importantly, that these are not the units we observe in! See Paper I and §2.5 below. The true distribution of stellar light curves conditioned on I , P , u , and θθθ • is not Gaussian, soour assumption is formally wrong. But, as the saying goes, all models are wrong; some are useful .As we will show later, this turns out to be an extremely useful assumption. where µµµ y ( θθθ • ) = E (cid:104) y (cid:12)(cid:12)(cid:12) θθθ • (cid:105) (9) ΣΣΣ y ( θθθ • ) = E (cid:104) y y (cid:62) (cid:12)(cid:12)(cid:12) θθθ • (cid:105) − µµµ y ( θθθ • ) µµµ (cid:62) y ( θθθ • ) (10)are the mean and covariance of the distribution over spherical harmonic coeﬃcientvectors y . The bulk of the math in this paper (Appendix C) is devoted to computingthe expectations in the expressions above, which are given by the integrals E (cid:104) y (cid:12)(cid:12)(cid:12) θθθ • (cid:105) = (cid:90) y ( x ) p ( x (cid:12)(cid:12) θθθ • )d x (11) E (cid:104) y y (cid:62) (cid:12)(cid:12)(cid:12) θθθ • (cid:105) = (cid:90) y ( x ) y (cid:62) ( x ) p ( x (cid:12)(cid:12) θθθ • )d x , (12)where x is a random vector-valued variable corresponding to a particular distributionof features on the surface and p ( x (cid:12)(cid:12) θθθ • ) is its probability density function (PDF). Inthe Appendix we show that for suitable choices of θθθ • , y ( x ) , and p ( x (cid:12)(cid:12) θθθ • ) , the integralsin the expressions above have closed form solutions that may be evaluated quickly.While we present a few diﬀerent ways of specifying θθθ • , our default representation ofthe GP hyperparameters is θθθ • = ( n c µ φ σ φ r ) (cid:62) , (13)where n is the number of starspots, c is their contrast (deﬁned as the intensity diﬀer-ence between the spot and the background intensity, as a fraction of the backgroundintensity), µ φ and σ φ are the mode and standard deviation of the spot latitude dis-tribution, respectively, and r is the radius of the spots. For simplicity, the PDFs forthe spot radius, the spot contrast, and the number of spots are chosen to be deltafunctions centered at r , c , and n , respectively (Appendices C.1 and C.4), while thespot longitude is assumed to be uniformly distributed (Appendix C.3). Finally, thePDF for the latitude φ of the spots is chosen to be a Beta distribution in cos φ with(normalized) parameters a and b , which have a one-to-one correspondence to themode µ φ and standard deviation σ φ of the distribution in φ (Appendix C.2). Thisallows us to model starspot distributions with “active latitudes” of tunable width thatare symmetric about the equator. The distribution is ﬂexible enough to also modelequatorial spots and isotropically-distributed spots. Stars with multiple active lat-itudes can easily be modeled as a sum of Gaussian processes (§5.1). These choicesfor the spatial distribution of spots are based on the Sun, whose spots emerge inazimuthally-symmetric belts at roughly the same latitude in both hemispheres, thenmigrate toward the equator over the course of the 11-year cycle (Solanki et al. 2006).In this paper, we assume that the parameters θθθ • described above are the physicallyinteresting ones. That is, given a light curve f or an ensemble of M light curves ofstatistically similar stars ( f f · · · f M − ) (cid:62) , we wish to infer the statistical properties0of the starspots, encoded in the entries of the vector θθθ • . This is typically a tall order,since it requires marginalizing over all the nuisance parameters, which include thenitty-gritty details of the size, contrast, and location of every spot (and, if M > ,on every star in the ensemble). Fortunately, however, the Gaussian process we con-structed does just that. Speciﬁcally, given the mean and covariance of the process, weare able to directly evaluate the log marginal likelihood of the m th dataset conditionedon a speciﬁc value of θθθ • (as well as I , P , and u ): ln L m ( I, P, u , θθθ • ) = − r (cid:62) m ( I, P, u , θθθ • ) (cid:2) ΣΣΣ (

I, P, u , θθθ • ) + C m (cid:3) − r m ( I, P, u , θθθ • ) −

12 ln (cid:12)(cid:12)(cid:12)

ΣΣΣ (

I, P, u , θθθ • ) + C m (cid:12)(cid:12)(cid:12) − K π ) , (14)where r m ( I, P, u , θθθ • ) ≡ f m − µµµ ( I, P, u , θθθ • ) (15)is the residual vector, C m is the data covariance (which in most cases is a diagonalmatrix whose entries are the squared uncertainty corresponding to each data point inthe light curve), | · · · | denotes the determinant, and K is the number of data pointsin each light curve. In an ensemble analysis, the joint marginal likelihood of alldatasets is simply the product of the individual likelihoods, so in log space we have ln L ( I, P, u , θθθ • ) = (cid:88) m ln L m ( I, P, u , θθθ • ) . (16)The marginal likelihood may be interpreted as the probability of the data given themodel. Typically, we are interested in the reverse: the probability of the model giventhe data , i.e., the posterior probability distribution. In later sections we presenta comprehensive suite of posterior inference exercises demonstrating that our GPmodel is correctly calibrated, allowing one to eﬃciently infer statistical properties ofstarspots from light curves with minimal bias.2.4. Marginalizing over inclination

As we mentioned in the previous section, the equations for the mean and covari-ance of our GP (Equations 7 and 8, respectively) are conditioned on speciﬁc values ofthe stellar and spot properties. To obtain the posterior distribution for these param-eters, we must typically resort to numerical sampling techniques, which often scalesteeply with the number of parameters. It is therefore generally desirable to keep thetotal number of parameters small, especially when employing the GP in an ensemblesetting. In such a setting, we might have light curves from M stars, all of which we In Equation (14) we implicitly assume all stars in the ensemble are observed at the same set oftimes t . If this is not the case, the mean and covariance of the GP for each star must be computedfrom Equations (5) and (6) with the ﬂux design matrix AAA ( t m ) evaluated at the particular observationtimes t m . N = 4 M + 5 , (17)since each of the stars will have their own set of 4 stellar properties (an inclination,a period, and usually two limb darkening coeﬃcients) but will all share the same5 spot properties θθθ • (by assumption). For a reasonably sized ensemble of M =100 stars, we would have to sample over N = 405 parameters. While large, thisnumber is certainly not absurd, especially by modern standards. However, it doespose a problem when considering how complex the posterior distribution for the spotmapping problem can be. In addition to strong nonlinear degeneracies between someof the parameters (such as the contrast c and the number of spots n ), the posterioris often multimodal, especially in the stellar inclinations. While modern samplingschemes such as Hamiltonian Monte Carlo and Nested Sampling may in principle beable to deal with these issues, in practice it can be very diﬃcult to obtain convergencein a reasonable amount of time.One workaround is to ﬁx the values of the stellar parameters. This could be done,for instance, to the rotational period P , which can often be estimated with fairlygood accuracy from a periodogram. The limb darkening coeﬃcients could be ﬁxedat theoretical values, or perhaps their values could be shared among all stars (andsampled over), given the similarity assumption above.The inclination, however, is a diﬀerent matter. Absent prior information for aparticular star (such as a measurement of its projected rotational velocity v sin I orthe knowledge that it hosts a transiting planet), it is simply not possible to reliablyestimate the inclination in a pre-processing step. Any light curve statistic one mightargue should scale with inclination—such as the amplitude of the variability—is in-variably degenerate with the spot parameters θθθ • . If one knew θθθ • , then perhaps adecent point estimate of I could be obtained, but in that case the analysis wouldn’tbe needed in the ﬁrst place!Fortunately, there is a better way to reduce the number of parameters in theproblem: we can explicitly marginalize over the stellar inclination. That is, we maywrite the mean and covariance of our GP as µµµ ( P, u , θθθ • ) = E (cid:104) f (cid:12)(cid:12)(cid:12) P, u , θθθ • (cid:105) = + e I (18) ΣΣΣ( P, u , θθθ • ) = E (cid:104) f f (cid:62) (cid:12)(cid:12)(cid:12) P, u , θθθ • (cid:105) − µµµ ( P, u , θθθ • ) µµµ (cid:62) ( P, u , θθθ • )= E I − e I e (cid:62) I (19)2where we deﬁne the inclination ﬁrst moment integral e I ≡ (cid:90) AAA ( (cid:73) , P, u ) E (cid:104) y (cid:12)(cid:12)(cid:12) θθθ • (cid:105) p ( (cid:73) ) d (cid:73) (20)and the inclination second moment integral E I ≡ (cid:90) AAA ( (cid:73) , P, u ) E (cid:104) y y (cid:62) (cid:12)(cid:12)(cid:12) θθθ • (cid:105) AAA (cid:62) ( (cid:73) , P, u ) p ( (cid:73) ) d (cid:73) , (21)and (cid:73) is the random variable corresponding to the inclination. The expectationsinside the integrals in the expressions for e I and E I are given by Equations (11) and(12), respectively, and are computed in Appendix C. If we are able to perform theintegrals in those expressions, we can dramatically reduce the number of parametersin our ensemble problem. As we show in Appendix 2.4, if we assume that stellarinclinations are distributed isotropically, these integrals do in fact have closed-formsolutions.Finally, for future reference, it is useful to note that the mean of the GP is constant: µµµ ( P, u , θθθ • ) = (1 + e I ) ≡ µ , (22)since by construction our GP is longitudinally isotropic (see Appendix D.2).2.5. Normalization correction

In Paper I we discussed a subtle but important point about stellar light curves:the common procedure of normalizing light curves to their mean or median levelchanges the covariance structure of the data, since it correlates all the observationsin a nontrivial way. When normalizing a light curve by the mean, the operation weperform is ˜ f = f (cid:104) (cid:102) (cid:105) , (23)where ˜ f is the normalized, unit-mean light curve, f is the measured light curve (indetector counts), and (cid:104) (cid:102) (cid:105) is the sample mean: i.e., the average value of a given star’slight curve (which we model as a sample from our GP). This may be close to butis in general diﬀerent from the process mean, µµµ ( P, u , θθθ • ) , since the mean of a drawfrom the GP is itself normally distributed with a variance that scales with the GPvariance. When modeling normalized light curves, we must correct our expression for thecovariance matrix

ΣΣΣ of the GP. Computing the new covariance matrix is tricky, es-pecially because the normalized process is not strictly Gaussian: the distribution of In practice, the expressions derived here also work well for median-normalized light curves, sincethe distribution of the GP sample median is usually close to the distribution of the sample mean. Importantly, the sample mean and process mean will be diﬀerent even in the absence of mea-surement error! In other words, the mean ﬂux of a given star (i.e., the sample mean) will in generalbe diﬀerent from the mean ﬂux across all stars with similar surface properties (the process mean). ˜ f diverges as the samplemean approaches zero. In fact, because of these tails, the covariance of the normalizedprocess is formally inﬁnite , since the probability of drawing a sample whose mean isarbitrarily close to zero is ﬁnite.If this is all starting to sound like a bad idea, that’s because it is! A much saferapproach is to resist the temptation to normalize the light curve and instead modelthe (unknown) amplitude of the data as a multiplicative latent variable. However,this would require an extra parameter for every light curve , so the computationalsavings we achieved by marginalizing out the inclination would be gone. Fortunately,in practice, the variance of a stellar light curve is usually small compared to its mean:stellar variability amplitudes are typically at the level of a few percent or lower. Whenthis is the case, the probability of drawing a GP sample whose mean is close to zerois extremely small, and we can make use of the approximate expression derived inLuger (2021) for the covariance of a normalized Gaussian process: ˜ΣΣΣ ≈ Aµ ΣΣΣ + z (cid:16) ( A + B ) ( − q ) ( − q ) (cid:62) − A q q (cid:62) (cid:17) , ¸ (24)where z ≡ (cid:104) Σ (cid:105) µ (25)is the ratio of the average element in ΣΣΣ to the square of the mean of the Gaussianprocess, q is the ratio of the average of each row in ΣΣΣ to the average element in

ΣΣΣ , and A , B are order unity and zero scalars, respectively, given by the optimally-truncateddiverging series A ≡ i max (cid:88) i =0 (2 i + 1)!2 i i ! z i (26) B ≡ i max (cid:88) i =0 i (2 i + 1)!2 i i ! z i , (27)where i max is the largest value for which the series coeﬃcient at i max is smaller thanthe coeﬃcient at i max − . In the expressions above, it is assumed that the mean µµµ isconstant, i.e., µµµ = µ . Since our Gaussian process is azimuthally isotropic (i.e., nopreferred longitude), that is the case throughout this paper.What Equation (24) allows us to do is eﬀectively marginalize over the unknownnormalization by modeling the normalized ﬂux as a draw from a Gaussian process: ˜ f ( P, u , θθθ • ) ∼ N (cid:16) , ˜ΣΣΣ ( P, u , θθθ • ) (cid:17) . (28)This is appropriate as long as z (cid:28) , for which the true distribution of ˜ f is approxi-mately Gaussian. In practice, we recommend employing this trick only for z (cid:46) . ,4 Figure 2.

An example of a ﬂux covariance matrix

ΣΣΣ for a starry process (left) and thecorresponding covariance of the normalized process (right), computed from Equation (24).In addition to an oﬀset and an overall scaling relative to the original covariance matrix, thecovariance of the normalized process is discernibly non-stationary. - for which the error in the approximation to the covariance is less than − . In caseswhere the light curve variability exceeds about ten percent, we recommend model-ing the multiplicative amplitude in each light curve as a latent variable, as discussedabove.Figure 2 shows an example of a covariance matrix normalized according to theprocedure outlined above. The principal diﬀerence between the normalized covarianceand the original covariance is an overall scaling and a small oﬀset. However, thenormalization also results in the process becoming non-stationary: the covariancebetween two points in a light curve is now slightly dependent on their phases.2.6. Summary

As the computation of the GP relies on many interdependent equations scatteredthroughout the previous sections and the Appendix, it is useful to summarize theprocedure for the case where we marginalize over the inclination (§2.4) and the lightcurves are normalized to their means (§2.5), which is likely to be the primary use casefor our algorithm.We model the mean-normalized ﬂux ˜ f (Equation 23) as a Gaussian process: ˜ f ( P, u , θθθ • ) ∼ N (cid:16) , ˜ΣΣΣ ( P, u , θθθ • ) (cid:17) . (29)The hyperparameters of the GP are the stellar rotation period P , the vector of limbdarkening coeﬃcients u , and the vector of parameters describing the spot distribution θθθ • = ( n c µ φ σ φ r ) (cid:62) , (30)consisting of the number of spots n , their contrast c (the fractional intensity diﬀerencebetween the background and the spot), the mode µ φ and standard deviation σ φ of the5latitude distribution, and the radius of the spots r . The quantity ˜ΣΣΣ is the covarianceof the normalized process (Equation 24), which is a straightforward correction to thetrue covariance of the process, accounting for changes in scale and phase introducedby the common process of normalizing light curves to a mean of unity. It dependson the true (constant) mean µ and true covariance ΣΣΣ , given by Equations (18) and(19), respectively. Those expressions in turn depend on the inclination expectationintegrals e I (Appendix D.2) and E I (Appendix D.3). Those, in turn, depend onthe ﬁrst and second moments of the distribution of spherical harmonic coeﬃcientvectors, E (cid:2) y (cid:12)(cid:12) θθθ • (cid:3) and E (cid:2) y y (cid:62) (cid:12)(cid:12) θθθ • (cid:3) , given by Equations (C18) and (C19), respectively.To compute those, we must evaluate four nested integrals (Equations C20–C23 forthe ﬁrst moment and C24–C27 for the second moment), corresponding to integralsover the radius, latitude, longitude, and contrast distributions, respectively. Thecomputation of these integrals is discussed at length in Appendix C.While lengthy (and quite tedious), all of the computations described above relyon equations whose solutions have a closed form. Moreover, most of the terms inthe expectation vectors and matrices may be computed recursively, and many maybe pre-computed, as they do not depend on user inputs. It is therefore possible toevaluate ˜ΣΣΣ in an eﬃcient manner. In the companion paper (Luger et al. 2021a), wediscuss our implementation of the algorithm in a user-friendly

Python package.2.7.

An example

A concrete example of the GP derived above is presented in Figure 3, where weshow random samples from the process evaluated up to spherical harmonic degree l max = 15 and conditioned on diﬀerent values of the hyperparameter vector θθθ • . Eachcolumn corresponds to a diﬀerent random draw from the GP, while each row corre-sponds to a diﬀerent value of θθθ • . The images are intensity maps of the stellar surfaceseen in an equal-area Mollweide projection, in units such that a spotless star wouldhave intensity equal to 1 everywhere. Below them are the corresponding light curves(in units of parts per thousand deviation from the mean) over four rotation cycles,seen at inclinations varying from ◦ (yellow) to ◦ (dark blue), and assuming nolimb darkening (i.e., u = ). From top to bottom, the hyperparameter vectors θθθ • foreach row are ( n c µ φ σ φ r ) (cid:62) = (10 . .

10 30 . .

00 10 . (cid:62) (31a) = (10 . .

10 60 . .

00 10 . (cid:62) (31b) = (10 . .

05 0 .

00 5 .

00 30 . (cid:62) (31c) = (20 . .

10 0 .

00 40 . . (cid:62) . (31d) The exception to this is the normalization correction (§2.5), which depends on a fast-to-evaluateseries and thus adds negligible overhead to the computation. (a) rotations f l u x [ pp t ] (b) rotations f l u x [ pp t ] (c) rotations f l u x [ pp t ] (d) rotations f l u x [ pp t ] i n t e n s i t y i n t e n s i t y i n t e n s i t y i n t e n s i t y Figure 3.

Five random samples from the GP prior (columns) conditioned on four diﬀerentvalues of θθθ • (rows). The samples are shown on the surface of the star in a Mollweide pro-jection (upper panels) alongside their corresponding light curves viewed over four rotationcycles at several diﬀerent inclinations (lower panels). From top to bottom, the GP corre-sponds to a star with (a) small mid-latitude spots; (b) small circumpolar spots; (c) largeequatorial spots; and (d) small isotropic spots. See text for details. - These correspond to (a) 10 spots of radius ◦ centered at ◦ ± ◦ latitude with acontrast of ; (b) 10 spots of radius ◦ centered at ◦ ± ◦ latitude with a contrastof ; (c) 10 spots of radius ◦ centered at ◦ ± ◦ latitude with a contrast of ;and (d) 20 spots of radius ◦ centered at ◦ ± ◦ latitude (a good approximation toa perfectly isotropic distribution; see Appendix C.2) with a contrast of .The surface maps in the ﬁgure show dark, compact features of roughly the ex-pected size and contrast and at the expected latitudes. However, there are importantdiﬀerences between these maps and what we would obtain by procedurally addingdiscrete circular spots to a gridded stellar surface:1. The spots are not circular.

This is most evident in row (c), where some spotsare distinctly asymmetric.72.

There is signiﬁcant variance in contrast from one spot to another , even thoughour model implicitly treats c as constant. Within spots, the contrast is also notconstant, even though (again) our model implictly treats it as such.3. There is ringing in the background.

This is apparent to some extent in row (b),where there are small ﬂuctuations in the brightness at low latitudes where nospots are present.4.

There aren’t exactly 10 (or 20) spots in those maps.

This is most obvious inrow (c), where only a few large distinct spots, plus maybe a few smaller ones,are visible.5.

There are bright spots in addition to dark spots.

This may be the most glaringissue. We explicitly model spots as being dark, and yet there are (almost)just as many bright spots, particularly in row (d). While bright spots (such asplages) certainly exist in reality, we did not explicitly ask for them here!While these may appear to be critical shortcomings of our model, it is important tokeep in mind that a model consisting of discrete, circular, constant contrast spotsis likely just as far (or perhaps even farther) from the truth. In fact, points (1)and (2) above suggest our model is more ﬂexible than the discrete spot model andthus (potentially) better suited to modeling real stellar surfaces. Points (3), (4),and (5), on the other hand, are more concerning, since they are due, respectively, totruncation error in the spherical harmonic expansion, to an intrinsic limitation of ourGaussian approximation, and to the fact that Gaussian distributions are symmetricabout the mean: a positive deviation is just as likely as a negative deviation ofthe same magnitude. However, as we have argued before, the true power of theGP is in its applicability to inference problems. In other words, while our GP hassome undesirable features when used as a prior sampling distribution, the real testof the GP is when it is faced with data in an inference setting. As long as the datais suﬃciently informative, it does not matter that the prior has ﬁnite support forunphysical conﬁgurations, as those will be conﬁdently rejected.In §3 below, we exhaustively test the performance of our GP as an inference toolwhen used to model synthetic light curves. We will show that, despite the issuesraised above, the GP is in general unbiased and correctly estimates the posteriorvariance when used to infer the spot properties θθθ • .But before we dive into calibration tests, it is worth pausing for a moment to takeanother look at Figure 3. While we focused on the shortcomings of the GP as a prior inthe discussion above, it is important to appreciate that it even works in the ﬁrst place!A Gaussian process, after all, is a non-parameteric process describing a smooth andcontinuous function only via its covariance structure. The GP knows nothing about Even still, the model favors dark spots over bright spots because the GP mean itself is lowerthan unity, in practice making positive deviations from unity less likely than negative deviations.This is why there are usually more dark spots than bright spots in the samples shown in the ﬁgure. CALIBRATION3.1.

Why we need calibration tests

In the previous section we derived a closed form solution to the Gaussian approx-imation to the distribution of stellar surfaces (and their corresponding light curves)conditioned on a vector θθθ • of spot hyperparameters. As we mentioned, the real testof this GP is in how well it performs as a likelihood function for stellar light curves.It is not immediately obvious that the GP approach should work, because the truemarginal likelihood function p ( ˜ f (cid:12)(cid:12) P, u , θθθ • ) is certainly not Gaussian. To see why, letus generate stellar surfaces sampled from the true distribution we are trying tomodel: that is, a surface with n = 5 discrete circular spots of ﬁxed, uniform contrast c = 0 . and radius r = 20 ◦ at latitudes µ φ ± σ φ = 30 ◦ ± ◦ . Let us then expand eachsurface in spherical harmonics and visualize the distribution of coeﬃcients y . Figure 4shows the joint distribution for ﬁve of the coeﬃcients with the most non-Gaussianmarginal distributions (selected by eye). Diﬀerent slices through this distribution (inblack) are skewed, strongly peaked, non-linearly correlated, and even bimodal. Ourapproach in this paper is to model this distribution as a Gaussian (orange contours).While this may be a good approximation in certain regions of parameter space, it iscertainly a poor approximation in others. In this section, we will show that, fortu-nately, the non-Gaussianity of the distribution is not in general an issue when doinginference with our GP, as the resulting posteriors are correctly calibrated.3.2. Setup

We seek to demonstrate that our model is correctly calibrated by testing it onsynthetic data, which we generate as follows. For each of M synthetic light curvesin a given ensemble, we create a rectangular (150 × latitude-longitude grid ofsurface intensity values, all intialized to zero. We then add (cid:110) spots to this grid,each of fractional contrast (cid:99) and radius (cid:114) centered at latitude φ and longitude λ , bydecreasing the intensity at all points within an angular distance (cid:114) (measured alongthe surface of the sphere) by an amount (cid:99) . In order to compute the correspondinglight curve, we expand the surface in spherical harmonics, although at much higherdegree ( l (0)max = 30 ) than the degree we will use in the inference step ( l max = 15 )9 Symbol Description Value (cid:110) number of spots ∼ N (20 , ) (cid:99) spot contrast ∼ N (0 . , ) φ spot latitude ∼ N (30 ◦ , ◦ ) λ spot longitude ∼ U (0 ◦ , ◦ ) (cid:114) spot radius ∼ N (15 ◦ , ◦ ) (cid:73) stellar inclination ∼ sin P rotational period day u limb darkening coeﬃcients (0 0) (cid:62) σ f photometric uncertainty − K number of cadences per light curve ∆ t time baseline days M number of light curves in ensemble Table 1 . Default parameters used to generate synthetic lightcurves in the calibration tests. to minimize potential ringing eﬀects or other artefacts in the synthetic data. Forreference, the chosen degree l max = 30 is large enough to resolve features on the orderof ◦ / = 6 ◦ across, but small enough that the algorithm for computing the lightcurve is numerically stable. We compute the light curve at K points equally spacedover a baseline ∆ t using the starry algorithm (Appendix B), assuming an inclination (cid:73) ,a rotational period P , and limb darkening coeﬃcients u . Finally, we divide the lightcurve by the mean and add Gaussian noise with standard deviation σ f to emulatephoton noise. The default values/distributions of each of the parameters mentionedabove are given in Table 1. Some, like the number of spots, their contrast, etc., aredrawn from ﬁducial distributions, while others, like the photometric uncertainty, therotational period, and the limb darkening coeﬃcients, are ﬁxed across all M lightcurves in an ensemble. These ﬁxed values are not realistic, but they greatly speedup the inference step, since they allow us to invert a single covariance matrix for alllight curves when computing the log likelihood. A more principled approach would perhaps be to geneate light curves using a completely diﬀerentmodel, such as by discretizing the surface at very high resolution and computing the ﬂux via aweighted sum of the visible pixels. However, this would take orders of magnitude longer than theadopted approach and would still be subject to artefacts due to the discretization scheme. Wehave gone to great lengths in Luger et al. (2019) to show that our ﬂux computation from sphericalharmonics is both accurate and precise, so we are conﬁdent that our synthetic light curves correctlyrepresent the assumed spot distributions. In theory, we should do this in the reverse order: we should add photon noise and then normalizethe light curve to the mean, as that is the order in which those steps occur in reality. However, if wedid that, we would have to normalize σ f in our inference step, such that the variance of each of thenormalized light curves in the ensemble would be diﬀerent, requiring us to invert a diﬀerent matrixfor each light curve when computing the log likelihood (see Equation 14). This would signiﬁcantlyincrease the computational cost of our tests. Fortunately, in practice, the diﬀerence between thesetwo approaches has negligible eﬀect on our results, so we opt for the faster of the two methods.Note, of course, that when applying our GP to real data, we won’t have this choice! Figure 4.

Corner plot showing the joint distributions of select spherical harmonic coeﬃ-cients corresponding to the l = 15 expansion of × stellar surface maps drawn from acertain distribution of discrete circular spots (black points and contours). At l = 15 there are256 coeﬃcients in total; we chose ﬁve of the coeﬃcients with the most odd-looking distribu-tions to illustrate the non-Gaussianity of the process. In addition to non-linear correlations,skewness, and the existence of points of very high curvature, some of the distributions arealso multi-modal. The orange contours show slices of the Gaussian approximation to thejoint distribution; this is the approximation adopted in this paper. - i n t e n s i t y i n t e n s i t y i n t e n s i t y i n t e n s i t y i n t e n s i t y i n t e n s i t y i n t e n s i t y Figure 5.

A synthetic dataset generated by adding discrete spots to a stellar surface withparameters given by the default values listed in Table 1. The surface maps for 49 syntheticstars are shown in a Mollweide equal-area projection above their corresponding light curves.All maps and light curves are plotted on the same scale. The × on each map indicates thesub-observer latitude assumed when generating the light curve. The blue curves correspondto the exact light curve, while the black dots are the observed light curve. - Inference

We use our

Python -based implementation of the GP (Luger et al. 2021a) to per-form inference on the synthetic datasets. For simplicity, we assume we know the valueof the period P , which is ﬁxed at unity for all stars, as well as the value of the limbdarkening coeﬃcients (ﬁxed at zero for the default run). In practice these will not beknown exactly; we discuss this further in §5.3. Since we explicitly marginalize overthe inclinations of individual stars, the only quantities we must constrain are the ﬁveparameters in the spot parameter vector θθθ • (Equation 13). We experimented withthree diﬀerent methods for doing posterior inference on our synthetic datasets: No-U Turn Sampling, a variant of Hamiltonian Monte Carlo (NUTS; Duane et al. 1987;Hoﬀman & Gelman 2011), automatic diﬀerentiation variational inference (ADVI; Ku-cukelbir et al. 2016; Blei et al. 2016), and nested sampling (Skilling 2004, 2006). Weobtained the best performance using the nested sampling algorithm implemented inthe dynesty package (Speagle 2020), so that is what we will use below.Our sampling parameters are the number of spots n , their contrast c , their radius r ,and the Beta distribution parameters a and b describing their distribution in latitude.As we discuss in Appendix C.2, the parameters a and b are easier to sample in thanthe mode µ φ and standard deviation σ φ , provided we account for the Jacobian of thetransformation (Equation C71) in our log probability function, which maps a uniformprior on a and b to a uniform prior on µ φ and σ φ .We place uniform priors on all ﬁve quantities, with support in ≤ n ≤ , < c ≤ , ◦ ≤ r ≤ ◦ , ≤ a ≤ , and ≤ b ≤ . Note that while n formally represents an integer, its eﬀect on the GP is a scaling of the covariance (seeEquation C19); as such, it has support over all real numbers within the bounds listedabove. We could restrict it to integer values, but this would make sampling quitetricky. Moreover, in practice it is useful to allow for noninteger values to add someﬂexibility to the model; we discuss this in more detail in §4.4.We use Equation (14) as our log likelihood term, adding the log of the absolutevalue of Equation (C71) to enforce a uniform prior on µ φ and σ φ . As we mentionedabove, the fact that P , u , and σ f are shared among all M light curves means that ΣΣΣ + C m is the same for all of them, greatly speeding up the likelihood evaluation sincewe need only invert (or factorize) it a single time per sample. Our covariance is thecovariance of the normalized process, given by Equation (24). Since we only considerlight curves with variability limited to a few percent or less, the approximation for ˜ΣΣΣ l max = 15 asa compromise between resolution, computational speed, and numerical stability (seeLuger et al. 2021a).We use the standard implementation of the nested sampler, dynesty.NestedSampler ,with all arguments set to their default values (multi-ellipsoidal decomposition forbounds determination (Feroz et al. 2009), uniform sampling within the bounds, 500live points, and no gradients), to perform our inference step. Convergence—deﬁnedas when the estimate of the remaining evidence ∆ ln Z drops below . —is usuallyattained after , to , samples and within a couple hours on a typical machinefor most of the trials we perform.Below we describe several calibration runs : experiments where we generate anensemble of light curves from synthetic stars with given properties (§3.2) and attemptto infer their statistical spot properties.3.4. Default run

The input parameters for the default run are shown in Table 1, and the corre-sponding light curves in Figure 5. We run the nested sampler as described in theprevious section and transform the posteriors in a and b to posteriors in µ φ and σ φ via Equations (C63), (C66), and (C68). The results are shown in Figure 6, where wecorrectly infer all ﬁve parameters within − standard deviations. Posterior distri-butions for the spot radius r , the central spot latitude µ φ , and the spot contrast c arefairly tight, while the distribution for the latitudinal scatter σ φ has wider tails andthe distribution for the number of spots n is very poorly constrained. The latter, inparticular, is degenerate with the spot contrast c ; we discuss this at length in PaperI. Figure 7 shows samples from the spot latitude posterior (hyper)distribution. Sincethe parameters µ φ and σ φ characterize a distribution over spot latitudes, uncertaintyin their values translates to uncertainty in the actual shape of the spot latitude dis-tribution. Thus, the collection of blue curves in Figure 7 quantiﬁes our knowledgeof how spots are distributed on the surfaces of the stars in the dataset. These dis-tributions are again consistent with the true distribution used to generate the spots(orange curve) within less than 2 standard deviations. Even though we explicitly marginalized over inclination, we can still derive pos-terior constraints on the inclinations of the individual stars in our ensemble by com-puting the log-likelihood as a function of I conditioned on the value of θθθ • from aparticular draw from the posterior. We do this in Figure 8, where blue curves againcorrespond to samples from the posterior hyperdistribution, and orange lines indicatethe true inclination; the panels are arranged in the same order as those in Figure 5. If the results in Figure 7 seem biased, recall from Figure 6 that the mean of the latitudedistribution is consistent with the truth at − σ . That is roughly the diﬀerence between theorange curve and the average of the blue curves. As we will see, inference with a larger ensemble(Figure S23) allows us to infer the mean latitude to within about two degrees. . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n Figure 6.

Posterior distributions for the spot parameters θθθ • (radius r in degrees, latitudemode µ φ and standard deviation σ φ in degrees, fractional contrast c , and number of spots n ) for the default run (Table 1 and Figure 5). The axes span the entire prior volume, andthe orange lines and markers indicate the true (input) values. - In almost all cases, we are able to constrain the inclinations of individual stars towithin about ◦ , consistent with the truth at less than − σ .The results in Figures 6—8 are based on a single run: i.e., a single realization ofthe light curve ensemble conditioned on the properties of Table 1. To properly gaugepotential biases in our model, it is useful to perform the run under many diﬀerentrealizations of the dataset. We therefore generate 100 ensembles of light curves inexactly the same way as above and perform inference on each of them. Figure 9 showsthe marginal and joint posterior distributions for θθθ • for all 100 trials. Posteriors forindividual trials are shown as the transluscent colored curves (in the marginal plots)and as ellipses bounding the σ posterior level (in the joint posterior plots). The5 -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure 7.

Posterior distributions for the spot latitudes for the default run. Each bluecurve corresponds to the Beta distribution PDF (Equation C61) for the spot latitude withparameters drawn from the posterior in µ φ and σ φ (Figure 6); the (hyper)distribution ofblue curves quantiﬁes our beliefs about how spots are distributed on any given star. Theorange curve is the true distribution used to generate the spots (see Table 1). - inclination p r o b a b ili t y Figure 8.

Posterior distributions for the inclinations of individual stars; individual panelscorrespond directly to those in Figure 5. As in Figure 7, each blue curve is a sample fromthe hyperdistribution of inclination PDFs, conditioned on a speciﬁc value of θθθ • drawn fromthe posterior. The orange lines indicate the true inclination of each star. - black curves show the marginal distributions of all samples across all trials, and theblack contours show the corresponding σ levels in the joint posterior plots.If our model is truly unbiased, in the limit of an inﬁnite number of realizationsof the data, the expectation value of the distribution of samples across all ensembles(the mean of the black curves in the marginal posterior plots) should coincide withthe true values (orange lines). This is approximately the case with the spot size r posterior: on average, the posterior distributions are centered on the correct value.However, this is not the case for the spot latitude parameters µ φ and σ φ , for whichour posterior means are biased high. While the modes of their posteriors are very6 Figure 9.

Similar to Figure 6, but showing the posterior distribution for 100 diﬀerentsynthetic datasets, all generated from the same default input parameters (Table 1). Eachcolored histogram corresponds to a single run, with the corresponding σ contours shownas shaded ellipses for each pair of parameters. The black histograms correspond to thedistributions of all samples from each of the 100 runs, and the black curves again indicate σ contours in the joint posterior. - d e n s i t y measured (0,1) Figure 10.

Distribution of stellar inclination residuals normalized to the posterior standarddeviation for all , stars across the 100 trials in Figure 9 (blue). The standard normaldistribution is shown in orange for comparison. The inclination posteriors inferred with ourGP are largely unbiased and have the expected variance. - Figure 11.

Same as Figure 9, but showing the posterior distributions of 100 realizationsof ensembles with M = 1 , (instead of 50) light curves each. The constraints on mosts ofthe parameters are much tighter, and the polar spot degeneracy is gone. - close to the true values, the distributions have long tails extending to high latitudesand high variance, respectively.The reason for this bias has to do with the normalization degeneracy discussed inPaper I: the total spottiness of a stellar surface is not an observable in single-bandphotometry. In particular, this means spots near the poles lie almost entirely inthe null space. Applied to the problem at hand, this degeneracy makes it diﬃcultto distinguish between stars with spots concentrated exclusively at mid-latitudes (inthis case, the truth) and stars with spots centered closer to the poles but with largelatitudinal variance. The latter conﬁguration leads to many spots close to the poles,whose eﬀect on the (relative) light curve is negligible, and some spots at mid-latitudes,whose eﬀect on the light curve is similar to that of the former conﬁguration. Thus, thedata alone cannot be used to discriminate between these two scenarios, introducingthe degeneracy we see in the posterior. In fact, it is clear that in the tails of thedistribution, the mean spot latitude and the standard deviation of spot latitudes arepositively correlated. The bias we see is therefore not a shortcoming of the model, but8of the data itself. To get around this, we either need to impose stronger priors on µ φ and σ φ (§4.4), observe in multiple wavelength bands (Paper I), or simply collect moredata. As we will see below, the particular degeneracy described above is not perfect:for very large ( M ∼ , ) ensembles, high-variance polar spots can conﬁdently beruled out.The posteriors for the contrast c and the number of spots n are mostly unbiased.The contrast distribution has a bit of a tail; inspection of Figure 9 reveals that it too ispositively correlated with the mean spot latitude and therefore suﬀers from the samedegeneracy as above. And while the mean of the spot number distribution is roughlycorrect, the posterior is nearly unchanged across all runs, and equally uninformativein all of them. This is yet another manifestation of the normalization degeneracy:the total number of spots is not an observable in single-band photometry (Paper I).There is one ﬁnal distribution that is instructive to consider: the distributionof errors on the inferred stellar inclination. Figure 10 shows a histogram of stellarinclination residuals (posterior mean minus true value) normalized to the posteriorstandard deviation for all stars across the 100 trials described above. For a correctlycalibrated model, this distribution should equal the standard normal N (0 , ) in thelimit of inﬁnite trials. This, in fact, is roughly what we ﬁnd (compare to the orangehistogram in the ﬁgure). Our posterior has marginally heavier tails, meaning wetend to slightly underestimate the posterior variance, but in general it is an excellentestimator of individual stellar inclinations.Finally, Figure 11 shows the same posterior distributions as in Figure 9, but for100 runs each with M = 1 , light curves. In addition to the constraints on allparameters (except the number of spots) being much tighter, the larger amount ofdata breaks the polar spot degeneracy discussed above. Given enough light curves,the model is capable of diﬀerentiating between concentrated mid-latitude spots andhigh-latitude spots with large variance. Interestingly, the inferred radius appears tobe biased high by a small amount. This is likely due to the fact that our prescriptionfor generating the spots (§3.2) is diﬀerent from how we actually model these spots.While we generate the spots as compact circular disks expanded at high sphericalharmonic degree, we model them as sigmoids (§C.1) expanded at signiﬁcantly lowerspherical harmonic degree. Some minor disagreement is therefore to be expected inthe inferred radii. 3.5. Other runs

In this section we test the robustness of our model by changing one or more ofthe ﬁducial values listed in Table 1. Each of the runs below corresponds to a singlerealization of the ensemble dataset, and the corresponding ﬁgures are presented atthe end of the Appendix.Figures S17—S20 show the results for diﬀerent latitudinal distributions, keepingall other values in Table 1 the same. Speciﬁcally, Figure S17 corresponds to a run9with mid-latitude ( µ φ = 45 ◦ and σ φ = 5 ◦ ) spots, Figure S18 to a run with high-latitude ( µ φ = 60 ◦ and σ φ = 5 ◦ ) spots, Figure S19 to a run with equatorial ( µ φ = 0 ◦ and σ φ = 5 ◦ ) spots, and Figure S20 to a run with isotropically-distributed spots( φ ∼ cos ). The results are largely consistent with those of the default run: in all caseswe infer the correct spot radius, the mean and standard deviation of the spot latitude,and the spot contrast within − σ ; the number of spots is equally unconstrained inall runs. In Figure S18 and to a lesser extent in Figure S17, the polar spot degeneracydiscussed above is evident, particularly in the lower panels showing the latitudinaldistribution of spots. Nevertheless, the distribution peaks near the correct latitude inboth cases. Figures S19 and S20 are interesting because, while the true latitude dis-tribution is unimodal, most of the posterior samples are not. In the equatorial case,the posterior peaks at very low (but nonzero) latitudes and σ φ appears to be incon-sistent with the true value at many standard deviations; however, recall that σ φ is a local approximation the standard deviation of the PDF at the mode (Appendix C.2),which deviates from the true standard deviation (i.e., the square root of the variance,computed from the expectation of the second moment of the distribution) when thetwo modes are very closely spaced. In fact, the latitude PDF samples (lower panel inthe ﬁgure) nearly span the true distribution, to the extent that our parametrizationof the latitude distribution can approximate a zero-mean Gaussian. While the Betadistribution in cos φ can be unimodal in φ (see Equation C61 and the ﬁrst columnof Figure 16), this happens only when β = 0 , which occupies an inﬁnitesimally thinhyperplane in parameter space. In practice, the majority of the posterior mass willbe close to but not exactly at β = 0 , leading to the bimodality in the ﬁgure. Thesame argument applies to Figure S20. In both cases, the posterior approximates thetrue distribution as best it can given the constraints of the adopted PDF.Figure S21 tests the performance of the model on light curves of stars with spotsmuch smaller than the eﬀective resolution of the GP. Our expansion to l max = 15 only allows us to model spots with radii r (cid:38) ◦ (see Figure 15), so we place zeroprior mass below this value. The ﬁgure shows the results of inference on a datasetgenerated from spots with r = 3 ◦ (and an increased contrast c = 1 to enforce acomparable signal-to-noise to the other trials). On the Sun, these would correspondto spots with diameters of about , km—typical of the larger spots during solarmaximum. While the radius posterior is biased (as it must be, given our prior), thefact that it peaks at the lower bound of the prior suggests the presence of spots smallerthan the model can capture. More importantly, however, the latitudinal parametersare inferred correctly and at fairly high precision: even though or model is biasedagainst small spots, this does not aﬀect inference about their latitudes. On the otherhand, the spot contrast is wrong by many standard deviations, since the model mustcompensate for the fact that the radii are biased high with a lower contrast to matchthe variability amplitude of the light curves.0Figures S22 and S23 show results for the default run but with extreme values ofthe number of light curves in the ensemble: M = 1 and M = 1 , , respectively.These two ﬁgures underscore the power of ensemble analyses: a single light curve(Figure S22) is simply not informative enough about the properties of its spots. Onthe other hand, a very large ensemble can be extremely informative: the radius,latitude, and even the contrast are inferred correctly at high precision.Figure S24—S26 show results for the default run but with limb darkening. In allcases we assume quadratic limb darkening with ﬁducial values u = 0 . and u = 0 . for all stars. From Figure S24, in which we assume we know the limb darkening coef-ﬁcients exactly, it is clear that the presence of limb darkening signiﬁcantly degradesour ability to infer both the radii and latitudes of the spots. Limb darkening has acomplicated eﬀect on the mapping between surface features and disk-integrated ﬂux,as it reveals information about odd harmonics at the expense of introducing strongdegeneracies with the even harmonics (Paper I). In practice, this leads to higher un-certainty in the spot radii and latitudes relative to the same dataset without limbdarkening (Figure 6). Fortunately, this uncertainty can be dramatically reduced withmore data, as evident in Figure S25, which shows the results of the same run butwith M = 1 , light curves in the ensemble. The constraints on r , µ φ , and σ φ arenow much tighter and in good agreement with the truth. Finally, Figure S26 showsthe results of inference on limb-darkened light curves under the (wrong) assumptionthat limb darkening is not present ( u = ). Neglecting the eﬀect of limb darkeningcan lead to biases in the spot radius and latitude parameters. While the model stillfavors mid-latitude spots (at ∼ ◦ instead of ◦ ), the constraints are deceptivelytight and discrepant by many standard deviations. We discuss these points in moredetail in §4.4.The runs so far correspond to stars with many ( n = 20 ) spots, for which theresulting light curves are smooth due to the fact that many spots are in view at anygiven time. Figures S27 and S28 show what happens when the model is applied tostars with n = 2 and n = 1 spots, respectively. Despite large portions of the lightcurves being ﬂat (and therefore extremely non-stationary) in these scenarios, the GPdoes surprisingly well, recovering the radii and latitude parameters within − σ inboth cases. Note that in order to preserve the same signal-to-noise ratio relative tothe other runs, we gave the spots in Figure S27 a much higher contrast ( c = 0 . ).Even though the contrast is degenerate with the number of spots (which is very poorlyconstrained), the c posterior has a much heavier tail than in the other runs. Thus, inspite of the arguments in Luger et al. (2021b) about the diﬃculty in constraining c and n from single-band photometry, it is evident that the full covariance structure ofthe data encodes some information about the contrast and—to a much lesser extent—the number of spots. In Figure S28, we compensate for the smaller number of spotsby increasing the spot radius to r = 45 ◦ ± ◦ instead, showing that the model can1accurately model large spots, even in the presence of some (unmodeled) scatter intheir sizes.In Figure S29 we add variance to all the spot properties when generating the lightcurves: we add n = 20 ± spots to each star with radii r = 15 ◦ ± ◦ , contrasts c = 0 . ± . , and at latitudes φ = 30 ◦ ± ◦ . As before, we only explicitly accountfor the variance of the latitude distribution in our model. We correctly infer thelatitude parameters and the contrast, but our radii appear to be biased high. Thisis likely due to the fact that larger spots have a bigger impact on the signal, so ourinferred radius is a weighted average of all spot radii. In Appendix C.1 we derive anexpression for the moment integrals of the spot size distribution assuming a uniformdistribution between r − ∆ r and r + ∆ r (instead of a delta function at r ), which canbe used to compute the GP if one wishes to explicitly account for scatter in the spotsizes. We ﬁnd that repeating the run shown in Figure S29 while explicitly samplingover the distribution in ∆ r shifts the posterior mass to lower radii, mitigating thebias described above.Our ﬁnal run is shown in Figure S30, in which we assume we know the truenormalization of each light curve. That is, we assume that we can measure all lightcurves in units of the ﬂux we would measure if the stars had no spots on them, andwe do not normalize them (see §2.5). In practice, this would require knowledge of thebrightness (or temperature) of the unspotted photosphere, which is not an observablein single-band photometry. This value can in principle be probed, however, in multi-band photometry (e.g., Gully-Santiago et al. 2017; Guo et al. 2018), for which this runis extremely relevant. We again recover the radii and latitude parameters to within − σ , but most importantly, we also infer the correct spot contrast and the correctnumber of spots with fairly high precision. In particular, knowledge of the correctnormalization breaks the c − n degeneracy. Photometric measurements in multiplebands (even just two!) are therefore extremely useful when inferring spot properties.We discussed this point in Paper I. DISCUSSION4.1.

Small spots

One of the biggest downsides of adopting a spherical harmonic representation ofthe stellar surface as the foundation of our GP is the inherent limitation it imposes onthe resolution of surface features. In order to maximize computational eﬃciency andnumerical stability, our default approach is to model the surface using an expansionup to degree l max = 15 , which can model features only as small as ∼ ◦ / =12 ◦ across. Even on scales slightly larger than this, the presence of ringing canbe seen (see panel (b) in Figure 3, where ringing is just barely noticeable in theequatorial region of the maps). The spherical harmonic basis consists of global modes,all of which contribute to the intensity everywhere on the surface. Localized featuresrequire constructive interference of modes inside and destructive interference of modes2outside, often leading to a wave-like ringing pattern that gets worse as the size ofthe features gets smaller. Taken at face value, this might suggest that a diﬀerentbasis—such as the common choice of pixels on a grid, or perhaps a localized waveletbasis—would be better at modeling small spots. While this is probably true, it maybe quite diﬃcult to ﬁnd closed-form expressions for the expectation integrals (§C)that make the GP covariance evaluation tractable.One option is to bypass the computation of the covariance on the surface of thestar and to write down an expression for the ﬂux directly in terms of the propertiesof a starspot. Circular spots of uniform intensity can be modeled as spherical caps,which are simply segments of ellipses when projeted onto the sky. It is possible toexpress in closed form the projected area they cover—and hence their contribution tothe ﬂux f —even in the case where part of the spot is on the hemisphere facing awayfrom the observer. This was done in Luger et al. (2017a), who solved this problemin closed form (see §3.3.2 and Appendix A.3 in that paper). Computing the GP isthen a matter of integrating f and f f (cid:62) , weighted by the hyperparameter PDFs, aswe did in §2.3. However, the expression for f involves square roots and arctangents offunctions of the spot latitude and longitude (see Equation 40 in Luger et al. 2017a),so computing these integrals in closed form is likely to be very diﬃcult. Even if aclosed form solution can be found, incorporating limb darkening (which we arguedcan be extremely important) poses an even greater challenge. It is likely that severalsimpliﬁcations must be made in order to make this approach tractable. This was doneto some extent in Morris (2020b), who derived a closed form expression for the ﬂuxby ignoring certain projection eﬀects, such as the self-occultation of large spots bythe limb of the star, and neglected variations in limb darkening within spots. Sucha model could admit a closed-form solution to the GP covariance and may be betterat capturing the eﬀects of small spots, at the expense of the ability to model largerspots.In principle, small spots can be modeled under our current approach with negligibleringing by simply increasing the degree of the spherical harmonic expansion. Aswe discuss in (Luger et al. 2021a), however, the algorithm presented here becomesunstable for l (cid:38) , so doing so would require a reparametrization of the equations inthe Appendix. Importantly, however, as we showed in §3, the current implementationof the GP is suitable to modeling light curves of stars with spots smaller than thelimiting resolution of ∼ ◦ . Consider Figure S21, which shows the results of doinginference on an ensemble of light curves of stars with small ( r = 3 ◦ ) spots; these arecomparable to some of the larger spots seen on the Sun with diameters of about , km. Even though our inferred radius and contrast are wrong, the fact that the radiusposterior peaks at the lower prior bound of ◦ is strongly suggestive of the presenceof spots smaller than the resolution of the model. Moreover, the latitude mode andstandard deviation posteriors are unbiased , and we correctly infer the presence of low-3 rotations f l u x [ pp t ] i n t e n s i t y i n t e n s i t y darkbright Figure 12.

Comparison of GP samples with dark spots (top) and bright spots (center)alongside their corresponding light curves viewed at I = 75 ◦ (bottom). Without knowledgeof the correct normalization (§2.5), it is very diﬃcult to diﬀerentiate between the two fromstellar light curves. - variance, mid-latitude spots. With the above caveats in mind, our GP can thereforestill be used to model stars with small spots.4.2. Bright spots

All the calibration tests performed in the previous section assumed the stellarsurface was dominated by dark spots. We can easily model the eﬀect of bright spotsby choosing negative values for the contrast parameter, i.e. c < . The top two panelsof Figure 12 show ﬁve random samples from the GP with dark spots ( c = 0 . ) andbright spots ( c = − . ), respectively; the random seed is the same for both panels,so the maps are identical in all other respects. While the surface maps can be easilydistinguished by eye, the same is not true for the corresponding light curve samples(bottom panel), which are almost identical. There is a slight diﬀerence in amplitudebetween the two cases: surfaces with dark spots have slightly higher light curveamplitudes than surfaces with bright spots of the same contrast magnitude. However,the magnitude of the bright spots can be increased slightly to get a near-perfect matchto the dark spot light curves, meaning it may be diﬃcult (if not impossible) to tellthe diﬀerence between dark and bright spots via the GP approach.The reason for this degeneracy is rooted (once again) in the fundamental issuewith photometry: we lack any information about the correct normalization of thelight curve. Consider the dependence of the (unnormalized) GP covariance on thecontrast: it enters in a single place, via Equation (C107), as c , meaning dark andbright spot models have exactly the same covariance. These models diﬀer only in themean of the unnormalized process, since that is proportional to c (via Equation C106).However, as we argued in Paper I, the mean is not a direct observable. Instead, insingle-band photometry, we are only sensitive to the ratio of the covariance to the4square of the mean (see Equation 24). From that equation, we can deduce thatstars with dark spots (for which the light curve mean µ < ) will therefore havelarger variance than stars with bright spots ( µ > ), leading to the slight diﬀerenceseen in the ﬁgure. However, since this is strictly a multiplicative factor aﬀecting thecovariance, it is degenerate with the two other properties that scale the covariance:the magnitude of the contrast and the total number of spots. We therefore conclude that single-band photometry is largely insensitive tothe diﬀerence between bright spots and dark spots . However, it is importantto bear in mind that the degeneracy described above exists only for the

Gaussianapproximation to the likelihood function. As we argued earlier, the true likelihoodfunction is not a Gaussian; in particular, the true probability distribution has higher-order moments that we do not model here. These moments should in principle encodeinformation about the sign of the spot contrast, but they may be very diﬃcult toinfer in practice. It may be possible to distinguish between dark spots and brightspots with traditional forward models of stellar surfaces, but (as we argued earlier)a statistically rigorous ensemble analysis of stellar light curves using such forwardmodels is probably computationally intractable.We can, however, skirt this degeneracy with observations in multiple bands,which can provide limited information about the correct normalization. Recently,Morris et al. (2018) used approximately coeval

Kepler and

Spitzer light curves ofTRAPPIST-1 to argue that a bright spot model for the star is more consistent withthe data. A detailed exploration of the eﬀect of multi-band photometry on the de-generacies of the mapping problem is deferred to a future paper in this series.4.3.

Comparison to other work

Synthetic likelihoods, random ﬁelds, and approximate inference

The core idea behind the methodology presented in this paper—to compute theGaussian approximation to an intractable multidimensional distribution in order toobtain a likelihood function for inference—is not new. Although the method likelygoes by diﬀerent names in diﬀerent ﬁelds, it is a popular technique particularly inthe ﬁeld of ecological population dynamics, where it is referred to as the syntheticlikelihood (SL; Wood 2010) or Bayesian synthetic likelihood (BSL; Price et al. 2018)method. In many ecological systems, population growth is a chaotic process; obser-vations of the size of a population over time can be dominated by steep spikes anddrops in the population that occur due to sudden, random environmental pressure.While population growth can be forward modeled with ease, it is very diﬃcult to useforward models to constrain basic growth parameters in an inference setting, sincethat requires marginalization over the extremely nonlinear noice processes. As a wayaround this, Wood (2010) introduced the SL method, in which, conditioned on a set There is also a small additive term in Equation (24), but this, too, depends only on the ratioof entries in the covariance matrix to the mean, so it is of little help in breaking the degeneracy. θθθ , one computes the forward model f ( θθθ ) many times underdiﬀerent realizations of the noise, and adopts the sample mean and sample covariance(usually of a summary statistic of the data) as the mean and covariance of the Gaus-sian likelihood function p ( f | θθθ ) . Wood (2010) showed that, provided the number offorward model samples is large enough, this “synthetic” likelihood allows one to inferthe population growth parameters eﬃciently and without bias.The method presented in this paper may be thought of as a synthetic likelihoodmethod in the limit of an inﬁnite number of forward model samples. Unlike Wood(2010), whose method determines the mean and covariance of the distribution ofsome function of f conditioned on θθθ by sampling, we are able to actually computethe mean and covariance of f directly in closed form . While traditional SL methodsare inherently noisy, our method employs the exact Gaussian approximation to thelikelihood function.Our GP is also closely related to techniques commonly employed in models of thecosmic microwave background (CMB). In particular, it is a type of Gaussian randomﬁeld (GRF) on the sphere, which is frequently used to model perturbations in theCMB (Wandelt 2012). In general, however, GRFs used in cosmology are isotropic:when expressed in the spherical harmonic basis, their covariance matrix is diagonaland admits a representation as a (one-dimensional) power spectrum. Our GP, incontrast, is anisotropic in the polar coordinate (i.e., the latitude) by construction.Our method is also related to various families of approximate inference, such asvariational inference (VI), in which a multivariate Gaussian is used to approximatethe posterior distribution (e.g, Blei et al. 2016), or to approximate Bayesian compu-tation (ABC), in which an (often intractable) likelihood function is replaced with anapproximation computed from simulations from the prior (e.g., Beaumont 2019).4.3.2.

Starspots and stellar variability

The methodology developed in this paper is closely related to that in Perger et al.(2020), who studied the eﬀect of diﬀerent starspot conﬁgurations on the autocor-relation and covariance of stellar radial velocity (RV) measurements. The authorsof that study compared the performance of various commonly used quasi-periodickernels when applied to synthetic RV datasets, arguing that a new four-parameterquasi-periodic cosine kernel (QPC) can better capture the variability due to starspots.However, their study was empirical and related spot conﬁgurations to their eﬀect onthe covariance structure of the data primarily in a qualitative fashion. Their QPCkernel is a function of two interpretable hyperparameters (the rotation period and aspot timescale) as well as two amplitudes, which are not explicitly related to physicalspot properties. Our GP, in contrast, is built from the ground up such that all of itshyperparameters directly correspond to physical spot properties, allowing one to useit in starspot inference (not just marginalization) problems. While the methodology6presented here applies to photometry, it is possible to extend it to model RV datasetsas well; we discuss this in §5.Recently, Morris (2020b) used

Kepler , K2 , and TESS light curves to derive a rela-tionship between stellar age and spot coverage using an ensemble analysis similar tothat proposed here. Because of the intractability of the marginal likelihood function,that study used an approximate Bayesian computation (ABC) method to infer spotproperties from a large ensemble of stars. Morris (2020b) developed a fast, approx-imate forward model for light curves of spotted stars ( ﬂeck ; Morris 2020a), whichthey used to generate a large number of prior samples for diﬀerent values of the spotradii, contrasts, and latitude distributions. For each collection of samples generatedfrom a given set of hyperparameters, Morris (2020b) computed the distribution of the“smoothed amplitude,” the peak-to-trough diﬀerence of the (normalized, de-trended)light curve. This distribution was then compared to the distribution of observedsmoothed amplitude values among stellar clusters of diﬀerent ages within an ABCalgorithm, yielding approximate posterior distributions for the hyperparameters asa function of stellar age. While we believe the spot coverage results of that paperare predominantly driven by the prior (due to the strong degeneracy between thespot contrast and the number of spots; see §4.2 in Luger et al. 2021b), the ensembleanalysis employed in that paper is nevertheless a powerful technique to infer spotproperties. Our work builds on that of Morris (2020b) by deriving a closed formsolution to the likelihood function (as opposed to a sample-based likelihood-free in-ference algorithm) and by harnessing the covariance structure of the data when doinginference (as opposed to relying solely on the amplitude of the data).Finally, Basri & Shah (2020) recently presented a large suite of forward models oflight curves of spotted stars, which they used to discuss the (complicated) dependenceof various light curve metrics on the physical spot parameters used to generate thedata. They concluded that it is not possible to uniquely relate these metrics to theunderlying starspot conﬁguration. While we agree this is the case for individual stars,our work stresses that it is possible to circumvent many of these degeneracies withensemble analyses. Basri & Shah (2020) also concluded it is not in general possible touniquely disentangle diﬀerential rotation from spot evolution when their timescalesare comparable; nor is it possible to conﬁdently measure a rotation period when theevolution timescale is very short. However, their study relied on the eﬀect theseprocesses have on simple light curve metrics, which are almost certainly not suﬃcientstatistics of the data. Inference that takes into account the full covariance structureof the data, while considering large ensembles of light curves, could in principle breakthis degeneracy. While we do not explicitly model diﬀerential rotation in this paper,it will be the subject of a future paper in this series.74.4.

Caveats

We conclude our discussion with a list of several notes and caveats that should bekept in mind when using our algorithm and its

Python implementation.1.

The assumed latitude distribution is not Gaussian . Because we require theﬁrst and second moment integrals (Equations C72 and C79) to have closedform solutions, there are restrictions on the probability density function we canassume for the spot latitude. We ﬁnd that a Beta distribution in the cosineof the latitude is integrable in closed form and can be evaluated eﬃciently interms of recursion relations. In many cases, particularly when µ φ (cid:46) ◦ and σ φ (cid:46) ◦ , the distribution in the spot latitude is close to a bi-modal Gaussianwith mean ± µ φ and standard deviation σ φ (see Figure 16). In general, however, µ φ is formally equal to the mode (as opposed to the mean) of the distribution,and σ φ is the Laplace approximation to the local standard deviation at themode.2. The number of spots n does not have to be an integer . Samples from our GPprior will not in general have exactly n spots (see, e.g., Figure 3). This is dueto the fact that our model is only an approximation to the true distribution ofstellar surfaces conditioned on the spot properties. A corollary of this point isthat n need not be an integer, which makes it easier in practice to sample overusing modern inference techniques such as MCMC, HMC, ADVI, and nestedsampling.3. Care should be taken when modeling large-amplitude light curves . We discussedthis point at length in §2.5. Modeling a light curve that has been normalizedto its mean (or median) as a Gaussian process is conceptually a bad idea whenthe amplitude of variability is large compared to the mean. As a rule of thumb,if the amplitude of variability exceeds ∼ , we recommend not normalizingthe light curve in this way, and instead modeling the normalization amplitudeas a latent variable.4. Keep in mind the polar spot degeneracy . Even when modeling ensembles oflight curves, there are still strong degeneracies at play (Paper I). In particular,spots centered on the poles are always in the null space, so it can be diﬃcult inpractice to rule out their presence. This can be seen in Figure S18, in which themodel cannot distinguish between spots localized at ◦ and polar spots withhigh latitude variance. It may thus be advisable to adopt a prior that favorssmall values of σ φ , such as the common inverse gamma prior for the variance.Alternatively, one could place an isotropic prior on the latitude (with densityproportional to cos µ φ ) to downweight very high latitude spots.5. Limb darkening matters!

The null space is extremely sensitive to limb darkening(Paper I). It is therefore extremely important to model it correctly; otherwise,8 rotations f l u x [ pp t ] i n t e n s i t y Figure 13.

Prior samples from a sum of two starry_process models. The ﬁrst model consistsof small ( r = 10 ◦ ) circumpolar ( φ = 60 ◦ ± ◦ ) spots and the second model consists of larger( r = 20 ◦ ) equatorial ( φ = 0 ◦ ± ◦ ) spots. The sum of two starry_process models is also a starry_process , making it easy to model more complex distributions of spots. - there may be substantial bias in the inferred spot parameters. For stars withtransiting exoplanets, it may be possible to infer the limb darkening coeﬃcientsempirically, but in general we recommend modeling them as latent variableswith priors informed by theoretical models.6. Careful with the data . In general, covariances can be very hard to estimatefrom noisy data. This makes it especially important to ensure one is correctlymodeling the noise. When applying our GP to model real data, we recommendthe usual inference practices of clipping outliers, modeling a latent white noise(jitter) term, and modeling a small latent additive oﬀset term to minimize therisk of bias in the posteriors of interest.7.

Careful with the sample selection . When performing an ensemble analysis ofstellar variability, it is tempting to only analyze light curves that show variabilityin the ﬁrst place.

This is extremely dangerous , since the lack of variability couldsimply be due to low inclinations. It is extremely important to ensure the sampleselection step does not introduce bias. If there is reason to believe there aretwo distinct populations within an ensemble—say, a population of active starsand a population of quiet (spotless) stars—we strongly recommend the use ofa Gaussian mixture model. EXTENSIONS5.1.

Composite GPs

Thus far we have assumed that spots are concentrated at a single latitude (aboveand below the equator). We baked this assumption directly into our choice of distribu-tion function for the latitude (§C.2), which has exactly two modes at ± µ σ . However,it is possible (at least in principle) that certain stars could have two or more activelatitudes, in which case our GP is not an appropriate description of the stellar surface.Fortunately, Gaussian distributions (and thus also Gaussian processes) are closedunder addition, meaning that the sum of two GPs is also a GP. We can thus constructmore complex models for stellar variability by summing GPs with diﬀerent spot hy-9perparameter vectors θθθ • . The composite GP will then have a mean equal to the sumof the means of each GP, and a covariance matrix equal to the sum of the covariancematrices of each GP. One possible application of this is to model stars with multipleactive latitudes, as described above; an example of this is shown in Figure 13, wheresamples are drawn from a GP with small circumpolar spots and large equatorial spots(see caption for details). Since the composite GP inherits all of the properties of thestandard GP, it can be used to do inference under more complex priors than thosepresented here.It is also worth noting that this technique may be employed to model arbitrarydistributions for parameters like the spot radius and the number of spots. In theAppendix we present a formulation of our GP that admits a radius distribution half-width parameter ∆ r ; this generalizes our delta function distribution to a uniformdistribution between r − ∆ r and r + ∆ r . One may then compute the weighted sumof several GPs with half-widths ∆ (cid:54) = 0 and central radii r = r , r = r + 2∆ r , r = r + 4∆ r , etc., to approximate any distribution of spot radii. Similarly, onemay compute the weighted sum of several GPs with diﬀerent values of the numberof spots n to enforce any discrete distribution for that quantity. The reader shouldkeep in mind that the cost of computing the GP covariance matrix will scale linearlywith the number of GP components. In many cases, however, the computationalbottleneck is the covariance factorization step (Luger et al. 2021a), in which caseadding components to the model will result in negligible overhead.5.2. Time evolution

Another big limitation of the base algorithm is the implicit assumption that stellarsurfaces are static. Our GP hyperparameters θθθ • describe the spatial conﬁgurationof starspots, but they say nothing about their evolution in time. We know fromobservations of the Sun and of Kepler stars that temporal variability is extremelycommon: spots appear, disappear, and even migrate in latitude over time. Whileit may be possible to parametrize their evolution in a way that is general enoughto capture all the ways in which they may change over time, such an approach isbeyond the scope of the present paper. It is, however, straightforward to implementan uninformative temporal prior within the framework of our GP. To do this, we willmake two simplifying assumptions:1.

The temporal process is stationary.

This implies that there is no preferred time(or phase) and that the spatial covariance is the same at all points in time.Stars may still have active longitudes under this assumption, but there is nopreferred longitude across all stars .2.

The temporal and spatial covariances are independent.

This implies that theevolution of each spherical harmonic mode in time is independent of the evolu-tion of any of the other modes in time.0There is some tension between these assumptions and our knowledge of how stel-lar surfaces evolve. Assumption (1) excludes surfaces whose total spottiness changessigniﬁcantly or whose spots migrate in latitude, as both processes change the spatialcovariance over time. Assumption (2) ignores the correlation between spherical har-monic modes due to the migration of spots, which requires the coherent evolution ofmany modes at once. These assumptions likely limit the ability of our GP to modellight curves on very long baselines (i.e., on timescales of years) over which stellaractivity cycles take place. However, given that the use case of our algorithm is likelyto be the analysis of individual quarters of

Kepler and individual sectors of

TESS data, our assumptions are likely valid in most cases. Analyses of (say) all quarters of

Kepler data could process each quarter at a time in a hierarchical framework in whichhyperparameters like the mean spot latitude in each quarter are treated as functionsof time.The two assumptions listed above suggest a fairly straightforward form for the GPcovariance in spherical harmonics and time:

ΣΣΣ ( t ) y = K ⊗ ΣΣΣ y , (32)where K is a ( K × K ) matrix describing the covariance among the K points in time, ΣΣΣ y is the ( N × N ) matrix describing describing the covariance among the N ≡ ( l max + 1) spherical harmonic coeﬃcients (Equation 10), and ⊗ denotes the Kronecker product.The quantity ΣΣΣ ( t ) y is the ( N K × N K ) temporal-spatial covariance, whose coeﬃcient atindex ( N k + n, N k (cid:48) + n (cid:48) ) is the covariance between the spherical harmonic coeﬃcient y n ( t k ) and the spherical harmonic coeﬃcient y n (cid:48) ( t k (cid:48) ) , where the index n is related tothe spherical harmonic indices l and m via Equation (A3). Finally, because of ourassumption of stationarity, the mean of the GP is still constant and equal to the meanof the static process.The covariance matrix in Equation (32) can be sampled from to yield time-variablesurface maps, or it can be transformed into ﬂux space for sampling light curves orcomputing likelihoods. The covariance in ﬂux space is given by ΣΣΣ ( t ) = AAA ‡ ΣΣΣ ( t ) y AAA ‡(cid:62) (33)where

AAA ‡ ≡  a (cid:62) a (cid:62) . . . a (cid:62) K −  (34)is a design matrix constructed by staggering the rows of the standard design matrix AAA (see Appendix B.1). The diagonal structure of

AAA ‡ is due to the fact that eachsnapshot of the surface is only observed at a single phase.1 time [days] f l u x [ pp t ] Figure 14.

Prior sample from a time-variable starry_process , assuming a rotation period P = 1 day and a temporal evolution timescale τ = 20 days modeled with an exponentialsquared kernel. Evolving map samples are shown at the top, with the corresponding lightcurve viewed at an inclination I = 60 ◦ at the bottom. The GP hyperparameters are set totheir default values (Table 1). - In the temporally-variable version of our GP, the covariance matrix in Equa-tion (33) replaces the standard covariance

ΣΣΣ ; it can similarly be modiﬁed (§2.5) toobtain the covariance of the normalized process. While

ΣΣΣ ( t ) is ( K × K ) , it is com-puted from the contraction of a much larger ( N K × N K ) matrix, which is extremelyineﬃcient to instantiate and operate on. Fortunately, it can be shown that ΣΣΣ ( t ) = AAA ‡ ΣΣΣ ( t ) y AAA ‡(cid:62) = AAA ‡ ( K ⊗ ΣΣΣ y ) AAA ‡(cid:62) = ΣΣΣ (cid:12) K , ¸ (35)that is, the ﬂux covariance is just the elementwise (cid:12) product of the GP covariance ΣΣΣ and the temporal covariance K . This fact makes the temporal GP just as eﬃcient to evaluate as the standard GP!Armed with this algorithm for computing ΣΣΣ ( t ) , it only remains for us to decide ona structure for K . While this can in principle be any covariance matrix constructedfrom a stationary kernel, we recommend one of the common radial kernels such asthe exponential squared kernel k E (∆ t ) = σ exp (cid:18) − ∆ t τ (cid:19) (36)or the Matérn-3/2 kernel k M (∆ t ) = σ (cid:32) √ tτ (cid:33) exp (cid:32) − √ tτ (cid:33) (37)2with variance parameter σ set to unity (since the variance is already speciﬁed within ΣΣΣ ). In this case, the temporal covariance K is a function of a single parameter: thetimescale of the variability, τ . Similar to the other hyperparameters of our GP, thisparameter can be estimated in an inference setting. However, a detailed investigationof the ability of our temporal GP to accurately capture spot variability is beyond thescope of this paper, and will be revisited in the future, along with an algorithm toexplicitly model the eﬀects of diﬀerential rotation on the covariance structure.Figure 14 shows a single prior sample from the temporal GP described above,assuming a squared exponential covariance with τ = 20 days and a rotation period P = 1 day. We compute both the map samples at ﬁve diﬀerent times (top) and thelight curve sample over ten rotations (bottom). The ﬂux variability is qualitativelysimilar to that seen in temporally-variable Kepler light curves.5.3.

Marginalizing over period and limb darkening

In §2.4 we discussed the value of marginalizing over the stellar inclination whencomputing the GP covariance (and thus the likelihood). When jointly analyzingthe light curves of many stars (which we have repeatedly argued is the best wayto infer their spot properties), it can be extremely useful to minimize the numberof latent variables associated with individual stars by analytically marginalizing overthem. This can dramatically reduce the number of free parameters, turning a diﬃcultinference problem in (possibly) hundreds or thousands of dimensions into a mucheasier problem in a handful of dimensions. We showed how it is possible to marginalizeaway the dependence of our GP on the inclinations of individual stars, which is a hugestep in this direction. However, our GP remains a function of two other quantitiesthat will in general be diﬀerent for diﬀerent stars: the stellar rotation period P andthe limb darkening coeﬃcient vector u . We argued that in some cases one may beable to ﬁx the period of each star at an estimate obtained in a pre-processing step(i.e., from a periodogram) and ﬁx the limb darkening coeﬃcients at theoretical values,in which case the number of free parameters of the GP is equal to the size of the spothyperparameter vector θθθ • (ﬁve by default), and independent of the number of lightcurves in the ensemble. However, this procedure ignores any uncertainty in the periodand limb darkening coeﬃcients, which could be signiﬁcant; it is also subject to biasdue to the fact that there may be systematic errors in theoretical models for the limbdarkening coeﬃcients, particularly for low mass stars (e.g., Kervella et al. 2017). Amuch better approach would be to analytically marginalize over these two quantities.Consider Equation (B7) in the Appendix, from which the rows of the design matrix(which transforms vectors in the spherical harmonic basis to vectors representing theﬂux at a point in time) are computed. Marginalization over the inclination, whichwe demonstrated how to perform analytically in Appendix D, entails integrating overthe term R ˆ x ( − I ) , the Wigner matrix that rotates the star by the inclination angle I into the observer’s frame. Similarly, marginalization over the rotation period would3entail integration over the term R ˆ z (cid:0) πP t k (cid:1) , another Wigner matrix that rotates thestar to the correct rotational phase at time t = t k . Given an appropriate prioron the rotational angular frequency πP , it should be possible to follow the sameprocedure outlined in Appendix D to analytically compute the ﬁrst two moments ofthe distribution of light curves marginalized over the rotation period.Equation (B7) also makes clear the dependence of the GP on the limb darkeningcoﬃcients, which enter via the limb darkening operator L ( u ) . The coeﬃcients of thismatrix are linear in the limb darkening coeﬃcients u and u (see Equation B9), soit should be possible to derive closed form solutions to the relevant integrals over L ( u ) to yield the GP covariance marginalized over u . It may be possible to computethe marginalized covariance even when parametrizing the quadratic limb darkeningcoeﬃcients in terms of the uncorrelated q and q parameters from Kipping (2013),which would allow us to incorporate hard constraints on positivity and strict limbdarkening (as opposed to brightening) directly into the prior.Given the complexity of the operations involved, performing these marginalizationsis beyond the scope of the present paper. However, given the signiﬁcant computationalbeneﬁts of this marginalization, as well as the fact that the mapping problem isparticularly sensitive to the limb darkening coeﬃcients (Paper I), marginalizing over P and u will be the subject of an upcoming paper.5.4. Modeling transits and radial velocity datasets

Finally, we would like to note that the GP developed in this paper is not limitedto modeling rotational light curves of stars. We derived the covariance structureof spotted stellar surfaces in the spherical harmonic basis, which we then linearlytransformed into ﬂux space to obtain the light curve GP presented above. However,our GP may be used to model any kind of observation that is linearly related to thespherical harmonic representation. If A is the linear operator that transforms thespherical harmonic representation y to the data vector d via d = Ay , then the meanand covariance of the GP model for d are given by µµµ d = A µµµ y (38) ΣΣΣ d = A ΣΣΣ y A (cid:62) , (39)respectively, where µµµ y and ΣΣΣ y are the mean and covariance in the spherical harmonicbasis derived in this paper. In Luger et al. (2019) we showed that occultation lightcurves are also linearly related to the spherical harmonic representation of the stellarsurface, so it is straightforward to use our GP to model transits of planets acrossspotted stars, either to marginalize over the spot variability or to constrain the surfacemap of the star. In this context, A may be computed via the design_matrix() methodof a Map instance of the starry package.The formalism developed here may also be extended to model radial velocity (RV)datasets, although this requires a bit more work. The instantaneous radial velocity4shift v induced by a rotating spotted star may be approximated as v = (cid:82)(cid:82) S I V d S (cid:82)(cid:82) S I d S (40)where I is the stellar intensity at a point on the surface, V is the radial component ofthe rotational velocity vector at that point, and the integral is taken over the projecteddisk of the star. If we expand the surface intensity distribution in spherical harmonics,the integral in the denominator is just a classical starry integral, as it is just the disk-integrated intensity (i.e., the ﬂux). The numerator may also be computed followingthe starry formalism, provided we weight the surface intensity representation by thevelocity ﬁeld V . In the case of rigid body rotation, V is exactly a dipole ( l = 1 ) ﬁeld,so the quantity IV can be expressed exactly as a product of spherical harmonics. Since spherical harmonics are closed under multiplication, we may write IV as alinear combination of spherical harmonics, meaning the integral in the numerator is also a starry integral. For some linear operators A and B , we may therefore write v = B y A y (41)where v is the random variable representing the observed radial velocity time series, y is the Gaussian random variable representing the spherical harmonic coeﬃcientsdescribing the stellar surface, and the division is performed elementwise. Since v isthe ratio of two Gaussian random variables, its distribution is not Gaussian. However,we can still compute the ﬁrst two moments of the distribution of v to derive a Gaussianapproximation to it (similar to what we did in §2.5), which will yield the mean andcovariance of the Gaussian process representation of the radial velocity time series.Given the complexity of the operations described above, we defer this calculation(and the calibration of the resulting GP model) to future work. CONCLUSIONSThis paper is the second in a series devoted to the development of statistically rig-orous techniques to model stellar surfaces based on unresolved photometric and spec-troscopic measurements. Here, we presented a new Gaussian process (GP) model forstellar variability whose hyperparameters explicitly correspond to physical propertiesof the stellar surface. Our GP allows one to eﬃciently compute the likelihood func-tion for stellar light curves marginalized over nuisance parameters such as the speciﬁcsizes, positions, and contrasts of individual spots, which are generally unknowabledue to the extreme degeneracies involved in the light curve mapping problem. OurGP therefore makes it easy to do posterior inference on the real quantities of interest:parameters controlling the distribution of spot sizes, latitudes, and contrasts withina star and/or across many stars in an ensemble. Higher order eﬀects such as diﬀerential rotation and convective blueshift can be easily modeledwith a higher degree expansion of u . K ∼ , points takes about 20ms ona modern laptop. Our algorithm is implemented in the open-source, user-friendly Python package starry_process , which is pip -installable, available on GitHub, and de-scribed in Luger et al. (2021a). The algorithm is implemented in a combination of

C++ and

Python , linked using the theano package. Because our GP covariance hasan exact representation, so too do its derivatives. We therefore implement backprop-agated derivatives with respect to all input parameters for out-of-the-box usage withgradient-based inference and optimization tools such as Hamiltionain Monte Carlo(HMC), autodiﬀerentation variational inference (ADVI), and gradient-based nestedsampling.We devoted a large portion of this paper to testing the algorithm on a variety ofsynthetic datasets, showing that it is a well-calibrated and in most cases unbiasedestimator for starspot properties. Below we list our main results:1.

Our GP works best for ensemble analyses.

The light curve mapping prob-lem is extremely degenerate, as light curves contain a vanishingly small fractionof the total information about a stellar surface. However, the degenerate sur-face modes are a strong function of the observer’s viewing angle, so light curvesof stars seen at diﬀerent inclinations constrain diﬀerent components of the sur-face. We have shown that if we jointly analyze the light curves of many stars,we can break many of the degeneracies at play and uniquely infer the statisticalproperties of the spots across the ensemble. This type of analysis works bestif the stars in the ensemble are statistically similar: i.e., the properties of theirspots are all drawn from the same parent distribution, whose parameters wecan constrain.2.

Typically, an ensemble of at least M ∼ light curves is needed toplace meaningful constraints on starspot properties . This estimate isbased on ensemble analyses of light curves with K = 1 , cadence each andper-cadence precision of one part per thousand. Lower quality, shorter baselineobservations, or datasets contaminated by outliers will in general require largervalues of M for the same constraining power. The presence of strong limbdarkening also degrades the information content of light curves, in which casean ensemble of hundreds or even a thousand light curves is recommended.3. Our GP is in most cases an unbiased estimator for the spot radiusand latitude distributions.

We showed that our GP can accurately infer theangular size of spots and the mode and standard deviation of their distributionin latitude from stellar light curves. For the ﬁducial ensemble of M = 50 lightcurves described above, we are able to constrain the average spot radii to withina couple degrees and the average spot latitudes to within ◦ .64. Our GP can accurately infer stellar inclinations.

We presented twoversions of our GP: one conditioned on a speciﬁc value of the stellar inclination,and one marginalized over inclination under an isotropic prior. In both cases,we ﬁnd that we can accurately infer the inclinations of individual stars in anensemble analysis (in the latter case, a simple post-processing step can yield theinclination posterior distribution). While the inclination is not an observablefor an individual stellar light curve, the population-level constraints on thespot properties achieved by the GP can break the degeneracies involving theinclination, allowing us to usually infer it to within about ◦ and without bias.5. Our GP can be used to model small, Sun-like spots.

The algorithmpresented here is limited to a surface resolution of about ◦ , corresponding tospots about an order of magnitude larger than typical sunspots. However, wehave showed that when we apply our model to light curves of stars with small( r ∼ ◦ ) spots, we can still infer their latitudinal distribution without bias, aswell as the presence of spots below our resolution limit.6. Our GP can be extended to model time-variable surfaces.

The algo-rithm presented here was derived for static stellar surfaces, corresponding toperfectly periodic light curves. However, time variability can easily be modeledas the product of the starry_process kernel and a kernel describing the covari-ance of the process in time, such as a simple exponential squared or Matérn-3/2kernel. The hyperparameters of the temporal covariance are then strictly tiedto the timescale on which the surface evolves. More complex temporal variabil-ity, such as that induced by diﬀerential rotation, will be the subject of a futurepaper in this series.7.

Our GP can be used in exoplanet transit modeling and radial velocitydatasets.

At its core, the starry_process

GP deﬁnes a distribution over spher-ical harmonic representations of stellar surfaces. It can therefore be used as aphysically interpretable prior when modeling transits of exoplanets across spot-ted stars, either to marginalize over the stellar inhomogeneity or to explicitlyinfer spot properties. When combined with the starry package, it can also beused as a prior on the stellar surface in Doppler imaging, Doppler tomography,or even radial velocity searches for exoplanets. The latter will be the subject ofa future paper in this series.The GP presented here has far-ranging applications for stellar light curve studies.It serves as a drop-in replacement for commonly used GP kernels for stellar vari-ability, which currently do not have physically interpretable parameters other thanthe rotation period and, in some cases, a spot evolution timescale. As such, it canbe used to marginalize over stellar rotational variability signals in (say) transitingexoplanet searches, asteroseismic characterization of stars, radial velocity searches,7etc. It can also be used to learn about stellar surfaces directly: to infer spot prop-erties of main sequence stars as a function of spectral type and age, to diﬀerentiatebetween spot-dominated and plage-dominated stellar surfaces, to better understandchemically peculiar massive stars, and to better understand the spot properties oftransiting exoplanet hosts for unbiased spectroscopic characterization of their atmo-spheres (to name a few).The next papers in this series will focus on (in no particular order)•

A more rigorous treatment of time variability . This paper will focus on modelingdiﬀerential rotation, whose eﬀect on the covariance of the process can be derivedin a similar fashion to what we did here. This will enable direct inference aboutthe diﬀerential rotation rates and spot evolution timescales of stars, processeswhose eﬀects on light curves are too similar for current methodology to reliablydiscern between them.•

Explicit marginalization over the remaining stellar parameters . These in-clude the stellar rotation period and limb darkening coeﬃcients, which can bemarginalized over analytically under certain choices of prior. This will eliminateall per-star hyperparameters in the expression for the GP covariance, greatlyspeeding up inference for large ensembles of stellar light curves.•

Extension of this formalism to radial velocity datasets . As we discussed in§5.4, it is possible to extend the methodology presented here to model thecontribution of stellar surface variability to radial velocity measurements, whichcan be used (for instance) to mitigate systematics in extreme precision radialvelocity (EPRV) searches for exoplanets.•

Extension of this formalism to Doppler imaging . As we show in upcoming work,it is possible to derive an exact linear relationship between the wavelength-dependent spherical harmonic representation of a rotating star and its time-variable spectrum. This linearity makes it possible to adapt our GP formalismto the Doppler imaging problem, providing an eﬃcient marginal likelihood func-tion for rigorous inference studies.In keeping with other papers in the starry series, all ﬁgures in this paper aregenerated automatically from open-source scripts linked to in each of the captions - , and the principal equations link to associated unit tests that ensure the accuracyand reproducibility of the algorithm presented here ¸ / Ø .We would like to thank David W. Hogg, Michael Gully-Santiago, Adam Jermyn,Megan Bedell, Will Farr, Dylan Simon, and the Astronomical Data Group at theCenter for Computational Astrophysics for their help and for many thought-provokingdiscussions that made this paper possible.8 REFERENCES Aigrain, S., et al. 2016, MNRAS, 459,2408Ambikasaran, S., et al. 2015, IEEETransactions on Pattern Analysis andMachine Intelligence, 38, 252Angus, R., et al. 2018, MNRAS, 474, 2094—. 2019, AJ, 158, 173Barnes, S., et al. 2001, ApJ, 548, 1071Basri, G., & Shah, R. 2020, ApJ, 901, 14Beaumont, M. A. 2019, Annual Review ofStatistics and Its Application, 6, 379Blei, D. M., et al. 2016, arXiv e-prints,arXiv:1601.00670Brewer, B. J., & Stello, D. 2009, MNRAS,395, 2226Cantiello, M., & Braithwaite, J. 2019,ApJ, 883, 106Collado, J. R. A., et al. 1989, ComputerPhysics Communications, 52, 323Damasso, M., et al. 2019, MNRAS, 489,2555Duane, S., et al. 1987, Physics Letters B,195, 216Feroz, F., et al. 2009, MNRAS, 398, 1601Foreman-Mackey, D., et al. 2017, AJ, 154,220Fuller, J., et al. 2015, Science, 350, 423Gilbertson, C., et al. 2020, TheAstrophysical Journal, 905, 155. https://doi.org/10.3847/1538-4357/abc627Gough, D. O., & Tayler, R. J. 1966,Monthly Notices of the RoyalAstronomical Society, 133, 85.https://doi.org/10.1093/mnras/133.1.85Gully-Santiago, M. A., et al. 2017, ApJ,836, 200Guo, Z., et al. 2018, ApJ, 868, 143Haywood, R. D., et al. 2014, MNRAS,443, 2517Hoﬀman, M. D., & Gelman, A. 2011,arXiv e-prints, arXiv:1111.4246Ireland, L. G., & Browning, M. K. 2018,ApJ, 856, 132Jones, D. E., et al. 2017, arXiv e-prints,arXiv:1711.01318Kervella, P., et al. 2017, A&A, 597, A137Kipping, D. M. 2013, MNRAS, 435, 2152Kucukelbir, A., et al. 2016, arXiv e-prints,arXiv:1603.00788 Luger, R. 2021, in preparationLuger, R., et al. 2019, AJ, 157, 64—. 2016, AJ, 152, 100—. 2021a, arXiv e-prints,arXiv:2102.01774—. 2021b, arXiv e-prints,arXiv:2102.00007—. 2017a, ApJ, 851, 94—. 2017b, Nature Astronomy, 1, 0129Miesch, M. S., & Toomre, J. 2009, AnnualReview of Fluid Mechanics, 41, 317.https://doi.org/10.1146/annurev.ﬂuid.010908.165215Morris, B. 2020a, The Journal of OpenSource Software, 5, 2103Morris, B. M. 2020b, ApJ, 893, 67Morris, B. M., et al. 2018, ApJ, 857, 39Perger, M., et al. 2020, arXiv e-prints,arXiv:2012.01862Price, L. F., et al. 2018, Journal ofComputational and GraphicalStatistics, 27, 1Rackham, B. V., et al. 2018, ApJ, 853, 122Rajpaul, V., et al. 2015, MNRAS, 452,2269Rasmussen, C. E., & Williams, C. K. I.2005, Gaussian Processes for MachineLearning (Adaptive Computation andMachine Learning) (The MIT Press)Robertson, P., et al. 2020, ApJ, 897, 125Schuessler, M., et al. 1996, A&A, 314, 503Sikora, J., et al. 2018, Monthly Notices ofthe Royal Astronomical Society, 483,3127.https://doi.org/10.1093/mnras/sty2895Skilling, J. 2004, in American Institute ofPhysics Conference Series, Vol. 735,Bayesian Inference and MaximumEntropy Methods in Science andEngineering: 24th InternationalWorkshop on Bayesian Inference andMaximum Entropy Methods in Scienceand Engineering, ed. R. Fischer,R. Preuss, & U. V. Toussaint, 395–405Skilling, J. 2006, Bayesian Anal., 1, 833Solanki, S. K., et al. 2006, Reports onProgress in Physics, 69, 563Speagle, J. S. 2020, MNRAS, 493, 3132 Turcotte, S. 2003, in Astronomical Societyof the Paciﬁc Conference Series, Vol.305, Magnetic Fields in O, B and AStars: Origin and Connection toPulsation, Rotation and Mass Loss, ed.L. A. Balona, H. F. Henrichs, &R. Medupe, 199Vanderburg, A., et al. 2016, MNRAS, 459,3565 Wandelt, B. 2012, Gaussian RandomFields in CosmostatisticsWeber, M. A., & Browning, M. K. 2016,ApJ, 827, 95Wood, S. N. 2010, Nature, 466, 1102Yadav, R. K., et al. 2015, A&A, 573, A68 Table 2 . List of common variables and symbols usedthroughout this paper.

Symbol Description Reference Vector of ones — ∼ Denotes a normalized vector-valued random variable §2.5 (cid:12)

Elementwise product §5.2 ⊗ Kronecker product §5.2 a GP hyperparameter: spot latitude shape parameter Appendix C.2 a (cid:62) Row of the starry design matrix Equation (B6) A starry change of basis matrix Appendix B AAA starry design matrix Equation (B6) α GP hyperparameter: spot latitude shape parameter Appendix C.2 b GP hyperparameter: spot latitude shape parameter Appendix C.2 β GP hyperparameter: spot latitude shape parameter Appendix C.2 c GP hyperparameter: spot contrast Appendix C.4 (cid:99)

Spot contrast (random variable) Appendix C.4 Γ( · · · ) Gamma function — D u Complex Wigner rotation matrix about an axis u Appendix C.2.1 δ ( · · · ) Delta function — δ ij Kronecker delta — ∆ r GP hyperparameter: spot radius spread Appendix C.1 E (cid:2) · · · (cid:3) Expected value Equation (5) e I First moment integral of the inclination Equation (20) e r First moment integral of the radius Equation (C20) e φ First moment integral of the latitude Equation (C21) e λ First moment integral of the longitude Equation (C22) e c First moment integral of the contrast Equation (C23) E I Second moment integral of the inclination Equation (21) E r Second moment integral of the radius Equation (C24) E φ Second moment integral of the latitude Equation (C25) E λ Second moment integral of the longitude Equation (C26) E c Second moment integral of the contrast Equation (C27) f Flux vector Equation (3) f Flux vector (random variable) Equation (1) F ( · · · ) Gauss hypergeometric function — θθθ • Vector of GP hyperparameters Equation (13) I Stellar inclination — (cid:73)

Stellar inclination (random variable) §2.4 J Jacobian of the spot latitude transform Equation (C71) k (∆ t ) GP kernel function Equation (2) K Number of points in light curve — K Temporal covariance matrix §5.2 l Spherical harmonic degree Appendix A L Limb darkening operator §B.2 L Likelihood function Equation (14) Table 2 . (continued from previous page)

Symbol Deﬁnition Reference λ Spot longitude (random variable) Appendix C.3 m Spherical harmonic order Appendix A M Number of light curves in ensemble — µ Flux GP mean Equation (22) µµµ

Flux GP mean vector Equation (7) µµµ y Spherical harmonic GP mean vector Equation (9) n GP hyperparameter: number of spots Appendix C (cid:110)

Number of spots contrast (random variable) Appendix C N ( µ, σ ) Normal distribution: mean µ , variance σ — p ( · · · ) Probability, probability density — P Stellar rotation period — r GP hyperparameter: spot radius Appendix C.1 r (cid:62) Integral over unit disk of polynomial basis Appendix B (cid:114)

Spot radius (random variable) Appendix C.1 R u Real wigner rotation matrix about an axis u Appendix C.2.1 s Spherical harmonic expansion of spot Equation (C33) σ f Photometric uncertainty §3 σ φ GP hyperparameter: spot latitude standard deviation Appendix C.2

ΣΣΣ

Flux GP covariance matrix Equation (8)

ΣΣΣ ( t ) Flux GP covariance matrix w/ temporal evolution Equation (8) ˜ΣΣΣ

Flux GP covariance matrix (normalized processs) Equation (24)

ΣΣΣ y Spherical harmonic GP covariance matrix Equation (10)

ΣΣΣ ( t ) y Spherical harmonic GP covariance w/ temporal evolution Equation (10) t time — τ GP hyperparameter: timescale §5.2 u Limb darkening coeﬃcient vector Appendix B.2 u , u Linear and quadratic limb darkening coeﬃcients Appendix B.2 U Complex-to-real basis change operator (Wigner matrices) Appendix C.2.1 U ( a, b ) Uniform distribution between a and b — x Random vector of spot properties Equation (C13) y Spherical harmonic coeﬃcient vector Equation (3) y Spherical harmonic coeﬃcient vector (random variable) Equation (C14) µ φ GP hyperparameter: spot latitude mode Appendix C.2 φ Spot latitude (random variable) Appendix C.2 z GP normalization number Equation (25) A. NOTATIONUnless otherwise noted, we adopt the following conventions throughout this paper:integers are represented by italic uppercase letters (i.e., K ), scalars are represented byitalic lowercase letters (i.e., x ), column vectors are represented by boldface lowercaseletters ( x ), and matrices are represented by boldface capital letters ( X ). In general,the elements of a vector x are denoted x i and the elements of a matrix X are denoted X i,j . Importantly, we make a distinction between quantities like X i,j and X i,j : theformer is a scalar element of a matrix, while the latter is a matrix , which is itselfa component of a higher-dimensional (in this case, 4-dimensional) linear operator.Thus, lowercase bold symbols always represent vectors, and uppercase bold symbols always represent matrices.We also make an explicit distinction between numerical quantities and randomvariables. The former are typeset in serif font (as above), while the latter are typesetin blackboard font. For example, the quantity (cid:120) denotes a scalar random variable,while x denotes a particular realization of that variable. The same applies to vector-valued ( x is a realization of x ) and matrix-valued random variables ( X is a realizationof X ).Much of the math in this paper involves vectors representing coeﬃcients in thespherical harmonic basis, which are customarily indexed by two integers l and m . Wetherefore make an exception to our indexing notation for quantities in the sphericalharmonic basis: we use two indices to represent a scalar vector element, x lm , and four indices to represent a scalar matrix element, X l,l (cid:48) m,m (cid:48) . The upper indices correspondsto the spherical harmonic degree, l ∈ [0 , l max ] and l (cid:48) ∈ [0 , l max ] , while the lower indicescorrespond to the spherical harmonic order, m ∈ [ − l, l ] and m (cid:48) ∈ [ − l (cid:48) , l (cid:48) ] . Vectorelements are arranged in order of increasing l and, within each l , in order of increasing m . For example, a vector x representing a quantity in the spherical harmonic basisup to degree l max has components given by x = (cid:0) x x − x x · · · x l max − l max · · · x l max l max (cid:1) (cid:62) , (A1)while a matrix X in the same basis has components given by X =  X , , X , , X , − , X , , X , , − X , , − X , − , − X , , − X , , X , , X , − , X , , X , , X , , X , − , X , , . . .  . (A2)3For completeness, the element of a spherical harmonic vector x with degree l andorder m is at (ﬂattened) index n = l + l + m . (A3)Conversely, the element at (ﬂattened) index n has degree and order l = (cid:98)√ n (cid:99) m = n − l − l , (A4)respectively. Note, ﬁnally, that our use of upper and lower indices is purely a nota-tional convenience, and should not be confused with exponentiation or a distinctionbetween covariant and contravariant tensors. It should also not be confused with thenotation used for the complex spherical harmonics, which also uses upper and lowerindexing.For reference, Table 2 lists the principal symbols, operators, and variables usedthroughout the paper, with links to the equations and/or section in which they arepresented. B. COMPUTING THE FLUXB.1.

Basic expression

As we mentioned in §2.3, the ﬂux f is a purely linear function of the sphericalharmonic coeﬃcient vector y : f = + AAA y . (B5)Even though this is derived in detail in Luger et al. (2019), it is useful to expand onthe computation of the design matrix AAA in more detail here. Let a (cid:62) k denote the k th row of AAA , such that

AAA =  a (cid:62) K − ... a (cid:62) a (cid:62)  . (B6)The row vector a (cid:62) k encodes how the spherical harmonic coeﬃcient vector projectsonto the k th cadence in the ﬂux time series, and may be computed from a (cid:62) k = r (cid:62) A R ˆ x ( − I ) R ˆ z (cid:18) πP t k (cid:19) R ˆ x (cid:16) π (cid:17) . (B7)To understand the expression above, let us proceed from right to left, starting withthe spherical harmonic vector y , which we assume describes the surface intensity of4the star at time t = 0 in a frame where ˆ x points to the right, ˆ y points up, and ˆ z points out of the page. The quantity R ˆ x is a Wigner rotation matrix (described indetail in §C.2.1), which in this case rotates the spherical harmonic representation ofthe star by an angle π / counter-clockwise about ˆ x such that the north pole of thestar points along ˆ z . In this frame, we apply a second Wigner rotation matrix, R ˆ z , torotate the star about ˆ z counter-clockwise (i.e., eastward) by an angle πt k / P , where P is the rotation period and t k is the time at cadence t . Next, we rotate the star by a clockwise angle of I about ˆ x , where I is the stellar inclination ( I = 0 correspondingto a pole-on view and I = π / corresponding to an edge-on view). With this lastrotation, we are now in the observer’s frame. Following Luger et al. (2019), the next step is to project the representation of thestar into a more convenient basis for performing the integration over the stellar disk.The change-of-basis matrix A (see Appendix B in Luger et al. 2019) projects thestellar map into the polynomial basis (Equation 7 in Luger et al. 2019), comprisedof the sequence of monomials in Cartesian coordinates (1 x z y x xz xy yz y · · · ) where z = (cid:112) − x − y on the surface of the unit sphere. We can now compute thedisk-integrated ﬂux by integrating each of the terms in the basis over the unit disk,which is straightforward in the polynomial basis; the individual terms integrate tosimple ratios of Gamma functions. These are then assembled into the row vector r (cid:62) ,given by Equation (20) in Luger et al. (2019), which we dot into our expression (andadd one) to obtain the ﬂux at the k th cadence.B.2. With limb darkening

We must adjust our expression for the ﬂux in the presence of limb darkening. Forany polynomial limb darkening law of the form I ( µ ) I ( µ = 1) = 1 − n max (cid:88) n =1 u n (1 − µ ) n , (B8)where I is the intensity on the stellar surface, µ = z = (cid:112) − x − y is the radialcoordinate on the projected disk, and u n is a limb darkening coeﬃcient, the eﬀect oflimb darkening on the stellar map can be expressed exactly as a linear operation on thespherical harmonic coeﬃcient vector (Luger et al. 2019). This includes the popularlinear and quadractic limb darkening laws and generalizes to any limb darkening lawin the limit n max → ∞ . The linearity of the problem can be understood by noting thatall terms in Equation (B8) are strictly polynomials in x , y , and z , all of which can beexpressed exactly as sums of spherical harmonics (Luger et al. 2019). When weightingthe surface intensity by the limb darkening proﬁle, the resulting intensity is simplya product of spherical harmonics, which is itself a linear combination of spherical In principle, one last rotation could be performed about ˆ z to orient the projected disk of thestar on the plane of the sky; however, the disk-integrated ﬂux is independent of the rotation anglealong the plane of the sky (which we refer to as the obliquity ), so this step is unnecessary. n max with coeﬃcients u ,we can construct a matrix L ( u ) that transforms a spherical harmonic vector y ofdegree l max to a limb-darkened spherical harmonic vector y (cid:48) of degree l max + n max .As an example, consider a map of degree l max = 1 and the linear limb darkening law( n max = 1 ) with coeﬃcient vector u = ( u ) . The transformation matrix from y to y (cid:48) is L ( u ) = 11 − u  − u u √

00 1 − u u √ − u

00 0 0 1 − u u √ u √

00 0 0 u √  . ¸ (B9)The columns of L are constructed from the coeﬃcient vectors of each transformedspherical harmonic, which are in turn computed by multiplying each spherical har-monic by the spherical harmonic representation of the particular limb darkening law.In the presence of limb darkening, we may therefore replace our expression for the k th row of the ﬂux design matrix A (Equation B7) with a (cid:62) k = r (cid:62) A L ( u ) R ˆ x ( − I ) R ˆ z (cid:18) πP t k (cid:19) R ˆ x (cid:16) π (cid:17) (B10)for a given limb darkening coeﬃcient vector u . C. THE EXPECTATION INTEGRALSOur goal in this section is to ﬁnd closed-form solutions to the ﬁrst and secondmoments of the spherical harmonic representation of the stellar surface y , E (cid:104) y (cid:12)(cid:12)(cid:12) θθθ • (cid:105) = (cid:90) y ( x ) p ( x (cid:12)(cid:12) θθθ • )d x (C11) E (cid:104) y y (cid:62) (cid:12)(cid:12)(cid:12) θθθ • (cid:105) = (cid:90) y ( x ) y (cid:62) ( x ) p ( x (cid:12)(cid:12) θθθ • )d x , (C12)which are linearly related to the mean and the covariance of our GP (§2.3). Recallthat x is a vector of parameters describing the exact conﬁguration of features onthe surface of a star, and p ( x (cid:12)(cid:12) θθθ • ) is its probability density function conditioned onhyperparameters θθθ • , which describe the distribution of the features on the surface ofone or many stars. As we are speciﬁcally interested in modeling the eﬀect of starspotson stellar light curves, we let x = ( (cid:110) (cid:99) · · · (cid:99) (cid:110) − λ · · · λ (cid:110) − φ · · · φ (cid:110) − (cid:114) · · · (cid:114) (cid:110) − ) (cid:62) (C13)6and y ( x ) = − (cid:110) − (cid:88) i =0 (cid:99) i R ˆ y ( λ i ) R ˆ x ( φ i ) s ( (cid:114) i ) , (C14)where (cid:110) is the total number of spots, (cid:99) i is the contrast of the i th spot, λ i is itslongitude, φ i is its latitude, and (cid:114) i is its radius. The vector function s ( (cid:114) i ) returns thespherical harmonic expansion of a negative unit brightness circular spot of radius (cid:114) i at λ = φ = 0 , R ˆ x ( φ i ) is the Wigner matrix that rotates the expansion about ˆ x suchthat the spot is centered at a latitude φ i , and R ˆ y ( λ i ) is the Wigner matrix that thenrotates the expansion about ˆ y such that the spot is centered at a longitude λ i ; thesethree functions are detailed in the sections below. Equation (C14) thus provides away of converting a random variable x describing the size, brightness, and position ofspots to the corresponding representation in terms of spherical harmonics. Regardingthis equation, two things should be noted. First, we deﬁne y relative to a baselineof zero: i.e., a star with no spots on it will have y = (which is why we add unityin the expression for the ﬂux in Equation 3). Second, and more importantly, we arenot interested in any speciﬁc value of y ; rather, we would like to know its expectationvalue under the probability distribution governing the diﬀerent spot properties x , i.e., p ( x (cid:12)(cid:12) θθθ • ) .For simplicity, we assume that the total number of spots is ﬁxed to a value n , i.e., p ( (cid:110) (cid:12)(cid:12) θθθ • ) = δ ( (cid:110) − n ) , (C15)where δ is the delta function. We further asume that p ( x (cid:12)(cid:12) θθθ • ) is separable in eachof the four other spot properties, and that all of the spots are drawn from the samedistribution: p ( x (cid:12)(cid:12) θθθ • ) = n − (cid:89) i =0 p ( (cid:99) i (cid:12)(cid:12) θθθ c ) p ( λ i (cid:12)(cid:12) θθθ λ ) p ( φ i (cid:12)(cid:12) θθθ φ ) p ( (cid:114) i (cid:12)(cid:12) θθθ r ) , (C16)where θθθ • = ( n θθθ c θθθ λ θθθ φ θθθ r ) (cid:62) (C17)is the vector of hyperparameters describing the process and θθθ c , θθθ λ , θθθ φ , and θθθ r are yetto be speciﬁed. This allows us to rewrite the expectation integrals (C11) and (C12)as E (cid:104) y (cid:12)(cid:12)(cid:12) θθθ • (cid:105) = n e c (C18) E (cid:104) y y (cid:62) (cid:12)(cid:12)(cid:12) θθθ • (cid:105) = n E c (C19) When modeling a single star using the GP, this assumption is justiﬁed by deﬁnition. It is lessjustiﬁed when the GP is used to model an ensemble of stars, where each star may have a diﬀerenttotal number of spots (cid:110) . However, as we argue in the text, (cid:110) is extremely diﬃcult to constrain fromlight curves, in particular because of how degenerate it is with the spot contrast. In practice, weﬁnd that assuming that all stars in the ensemble have the same number of spots n leads to highervariance in the estimate of n , but it does not lead to noticeable bias in n or in any of the otherhyperparameters. e r ≡ (cid:90) s ( (cid:114) ) p ( (cid:114) (cid:12)(cid:12) θθθ r ) d (cid:114) (C20) e φ ≡ (cid:90) R ˆ x ( φ ) e r p ( φ (cid:12)(cid:12) θθθ φ ) d φ (C21) e λ ≡ (cid:90) R ˆ y ( λ ) e φ p ( λ (cid:12)(cid:12) θθθ λ ) d λ (C22) e c ≡ − (cid:90) (cid:99) e λ p ( (cid:99) (cid:12)(cid:12) θθθ c ) d (cid:99) (C23)and the second moment integrals E r ≡ (cid:90) s ( (cid:114) ) s (cid:62) ( (cid:114) ) p ( (cid:114) (cid:12)(cid:12) θθθ r ) d (cid:114) (C24) E φ ≡ (cid:90) R ˆ x ( φ ) E r R (cid:62) ˆ x ( φ ) p ( φ (cid:12)(cid:12) θθθ φ )d φ (C25) E λ ≡ (cid:90) R ˆ y ( λ ) E φ R (cid:62) ˆ y ( λ ) p ( λ (cid:12)(cid:12) θθθ λ )d φ (C26) E c ≡ (cid:90) (cid:99) E λ p ( (cid:99) (cid:12)(cid:12) θθθ c )d (cid:99) . (C27)In Equations (C18) and (C19), we used the fact that both the mean and the varianceof the sum of n independent, identically-distributed random variables are equal to n times the individual mean and the variance, respectively.We devote the remainder of this section to the computation of these eight integrals.C.1. The Radius Integrals

Below we compute the ﬁrst and second moments of the radius distribution ( e r , E r )under a suitable spherical harmonic expansion s ( (cid:114) ) of the spot proﬁle and a suitableprobability distribution function for the spot radius, p ( (cid:114) (cid:12)(cid:12) θθθ r ) .C.1.1. Spot proﬁle

We model the brightness b an angle ϑ away from the center of a spot of negativeunit intensity and radius (cid:114) as b ( r ; ϑ ) = 11 + exp (cid:18) r − ϑs (cid:19) − (C28)8for some (constant) shape parameter s . In the limit s → , b approaches an invertedtop-hat function with half-width equal to r , corresponding to a circular spot of uni-form intensity. For s > , each half of b is a sigmoid with half-width at half-minimumequal to r . In our implementation of the algorithm we choose s = 0 . ◦ , which is smallcompared to features of interest but not too small as to create numerical issues whencomputing model gradients (which would be undeﬁned at the spot boundary if thespot proﬁle were truly an inverted top-hat).Our goal now is to expand the function above in spherical harmonics. To that end,we note that in a frame where the spot is centered on ˆ z (i.e., at polar angle ϑ = 0 ),the brightness proﬁle is azimuthally symmetric, so the only nonzero coeﬃcients inthe spherical harmonic expansion are those with order m = 0 . The correspondingspherical harmonics are simply proportional to the Legendre polynomials in cos ϑ , soour task is simpliﬁed to ﬁnding the Legendre polynomial expansion of b . Deﬁne avector ϑϑϑ of K equally-spaced points between and π , with coeﬃcients given by ϑ k = kπK − . (C29)We wish to model the brightness evaluated at each ϑ k as a weighted combination ofLegendre polynomials, B s ( r ) = b ( r ) (C30)where b ( r ) is computed by evaluating Equation (C28) at each of the ϑ k , B is a designmatrix whose columns are the weighted Legendre polynomials, B k,l = √ l + 1 P l (cos ϑ k ) , (C31)and s ( r ) are the coeﬃcients of the expansion. These are related to the full vector ofspherical harmonic coeﬃcients describing the spot, s ( r ) , by s lm ( r ) = s l δ m, , (C32)or, in vector form, s ( r ) = III s ( r ) (C33)where III is a rectangular (cid:0) ( l max + 1) × ( l max + 1) (cid:1) identity-like matrix with compo-nents I n,l = δ n,l + l , (C34)and δ is the Kronecker delta function. To ﬁnd the coeﬃcients s ( r ) (and hence s ( r ) ),we solve the (linear) inverse problem, s ( r ) = B + b ( r ) (C35)9

90 75 60 45 30 15 0 15 30 45 60 75 90angle from spot center1.000.750.500.250.00 i n t e n s i t y r =5 r =10 r =20 r =30 r =40 Figure 15.

Intensity proﬁles for spots with diﬀerent radii r computed at spherical harmonicdegree l max = 15 . For r (cid:38) ◦ , the spherical harmonic expansion captures the spot shapeand intensity reasonably well, albeit with some ringing due to the truncated expansion. - where B + = S (cid:16) B (cid:62) B + (cid:15) I (cid:17) − B (cid:62) (C36)is the smoothed pseudo-inverse of B with small regularization parameter (cid:15) , I is theidentity matrix, and S is a diagonal smoothing matrix with coeﬃcients S k,l = exp (cid:20) − l ( l + 1)2 ξ (cid:21) δ k,l (C37)for smoothing strength ξ . For (cid:15) → and ξ → ∞ , B + is the exact pseudo-inverse of B . However, (cid:15) > is chosen for improved numerical stability and ξ > is chosen tomitigate the eﬀect of ringing in the solution. In practice, we obtain good results with (cid:15) ≈ − and ξ ≈ .Figure 15 shows the intensity proﬁle for spots of diﬀerent radii expanded to spher-ical harmonic degree l max = 15 . The average intensity within the spots is close to − and the half-widths at half-minimum are equal to the spot radii, as expected.The eﬀect of ringing due to the truncated spherical harmonic expansion is evident,although it is strongly suppressed compared to an expansion without the smoothingterm (i.e., ξ = ∞ ). However, for r (cid:46) ◦ , an expansion to l max = 15 is insuﬃcientto correctly model the spot, as can be seen from the r = 5 ◦ proﬁle (dashed curve).Expansions to higher spherical harmonic degree allow one to model spots with radiismaller than ◦ , although at increased computational cost and potential numericalstability issues; we discuss this point at length in §4.1.0 C.1.2. Probability density function

For simplicity, we will adopt a uniform probability distribution for the spot radius,characterized by a mean radius r and a half-width ∆ r : p ( (cid:114) (cid:12)(cid:12) θθθ r ) =  r r − ∆ r ≤ r ≤ r + ∆ r , (C38)where the hyperparameters of the distribution are θθθ r = ( r ∆ r ) (cid:62) . (C39)As we argue in the text, in practice it is often diﬃcult to constrain the momentsof the radius distribution above the ﬁrst (the mean). It is therefore useful to alsoconsider the limiting case of the radius distribution as ∆ r → , in which case thePDF becomes p ( (cid:114) (cid:12)(cid:12) θθθ r , ∆ r = 0) = δ ( (cid:114) − r ) , (C40)where δ is the delta function. C.1.3. First moment

The ﬁrst moment of the radius distribution is (Equation C20) e r ≡ (cid:90) s ( (cid:114) ) p ( (cid:114) (cid:12)(cid:12) θθθ r ) d (cid:114) = 12∆ r (cid:90) r +∆ rr − ∆ r s ( (cid:114) ) d (cid:114) . (C41)Using the equations from the previous section, its components may be written ( e r ) lm = δ m, r (cid:90) r +∆ rr − ∆ r ˜ s l ( (cid:114) ) d (cid:114) = δ m, r K − (cid:88) k =0 B + l,k (cid:90) r +∆ rr − ∆ r b k ( (cid:114) )d (cid:114) = δ m, K − (cid:88) k =0 B + l,k c k ( r, ∆ r ) ¸ (C42)where c k ( r, ∆ r ) = s r ln (cid:18) χ − k ( r, ∆ r )1 + χ + k ( r, ∆ r ) (cid:19) ¸ (C43)and χ − k ( r, ∆ r ) ≡ exp (cid:18) r − ∆ r − ϑ k s (cid:19) χ + k ( r, ∆ r ) ≡ exp (cid:18) r + ∆ r − ϑ k s (cid:19) . ¸ (C44)1In vector form, we may simply write e r = III B + c ( r, ∆ r ) . ¸ (C45)Note, ﬁnally, that in the limit ∆ r → , lim ∆ r → e r = III B + b ( r ) . ¸ (C46)C.1.4. Second moment

The second moment of the radius distribution is (Equation C24) E r ≡ (cid:90) s ( (cid:114) ) s (cid:62) ( (cid:114) ) p ( (cid:114) (cid:12)(cid:12) θθθ r ) d (cid:114) = 12∆ r (cid:90) r +∆ rr − ∆ r s ( (cid:114) ) s (cid:62) ( (cid:114) ) d (cid:114) . (C47)As before, its components may be written ( E (cid:114) ) l,l (cid:48) m,m (cid:48) = δ m, δ m (cid:48) , r (cid:90) r +∆ rr − ∆ r ˜ s lm ( (cid:114) ) ˜ s l (cid:48) m (cid:48) ( (cid:114) ) d (cid:114) = δ m, δ m (cid:48) , r K − (cid:88) k =0 B + l,k K − (cid:88) k (cid:48) =0 B + l (cid:48) ,k (cid:48) (cid:90) r +∆ rr − ∆ r b k ( (cid:114) ) b k (cid:48) ( (cid:114) )d (cid:114) = δ m, δ m (cid:48) , K − (cid:88) k =0 B + l,k K − (cid:88) k (cid:48) =0 B + l (cid:48) ,k (cid:48) C k,k (cid:48) ( r, ∆ r ) , ¸ (C48)where C k,k (cid:48) ( r, ∆ r ) ≡ s r  exp (cid:16) ϑ k − ϑ k (cid:48) s (cid:17) ln (cid:16) χ − k χ + k (cid:17) − ln (cid:16) χ − k (cid:48) χ + k (cid:48) (cid:17) − exp (cid:16) ϑ k − ϑ k (cid:48) s (cid:17) k (cid:54) = k (cid:48) χ + k + χ − k χ − k − ln (cid:16) χ − k χ + k (cid:17) − k = k (cid:48) . ¸ (C49)In vector form, this may be written E r = III B + C ( r, ∆ r ) B + (cid:62) III (cid:62) . ¸ (C50)Finally, in the limit ∆ r → , lim ∆ r → E r = III B + b ( r ) b (cid:62) ( r ) B + (cid:62) III (cid:62) . ¸ (C51)C.2. The Latitude Integrals

Our goal in this section is to compute the ﬁrst and second moments of the latitudedistribution ( e φ and E φ , given by Equations C21 and C25, respectively). Theseinvolve integrals over the terms in the Wigner rotation matrix for spherical harmonics,which we discuss below.2 C.2.1. Rotation matrices

The Wigner rotation matrix for real spherical harmonics up to degree l max may bewritten as the block-diagonal matrix R =  R R R . . . R l max  , (C52)where R l = U l − D l U l (C53)is the Wigner rotation matrix for a single spherical harmonic degree, U l = 1 √  . . .. . . ii − ii ii √ −

11 11 − . . .. . .  (C54)describes the transformation from complex to real spherical harmonics, and D isthe Wigner matrix for complex spherical harmonics, whose terms are given by theexpression D lm,m (cid:48) ( α , β , γ ) = exp ( − i m (cid:48) α ) d lm,m (cid:48) ( β ) exp ( − i m γ ) (C55)where α , β , and γ are the Euler angles describing the rotation in the ˆ z − ˆ y − ˆ z con-vention and i is the imaginary unit (Collado et al. 1989). The terms of the d -matrixdepend on powers of sin ( β / ) and cos ( β / ) (see Equation C15 in Luger et al. 2019),but it is convenient to use the half-angle formula to express these terms instead as d lm,m (cid:48) ( β ) = l (cid:88) i =0 c lm,m (cid:48) ,i sgn(sin β ) i (1 − cos β ) l − i (1 + cos β ) i ¸ (C56)3where c lm,m (cid:48) ,i =  ( − l − m + m (cid:48)− i (cid:112) ( l − m )! ( l + m )! ( l − m (cid:48) )! ( l + m (cid:48) )!2 l (cid:0) i − m − m (cid:48) (cid:1) ! (cid:0) i + m + m (cid:48) (cid:1) ! (cid:0) l − i − m + m (cid:48) (cid:1) ! (cid:0) l − i + m − m (cid:48) (cid:1) ! m − m (cid:48) − i even0 m − m (cid:48) − i odd ¸ (C57)C.2.2. Probability density function

The latitude integrals (Equations C21 and C25) involve rotations by an angle φ about ˆ x , which may be accomplished by choosing Euler angles α = π / , β = φ , and γ = − π / , such that R l ˆ x ( φ ) = U l − D l ˆ x ( φ ) U l (C58)with D l ˆ x ( φ ) = D l (cid:16) π , φ , − π (cid:17) . (C59)From the expressions above, it is clear that all terms in R ˆ x ( φ ) are equal to (weighted)sums of powers of (1 ± cos φ ) . Since our goal is to compute integrals of these termsmultiplied by a probability density function, it is convenient to model cos φ as a Beta-distributed variable. As we will see, this choice will allow us to analytically computethe ﬁrst two moments of the distribution of φ conditioned on θθθ φ .The Beta distribution in cos φ has hyperparameters α and β (not to be confusedwith the Euler angles α and β ) and PDF given by p (cid:0) cos φ (cid:12)(cid:12) α, β (cid:1) = Γ( α + β )Γ( α )Γ( β ) (cos φ ) α − (1 − cos φ ) β − , (C60)where Γ is the Gamma function. The implied distribution for φ may be computed bya straightforward change of variable: p (cid:0) φ (cid:12)(cid:12) α, β (cid:1) = Γ( α + β )2Γ( α )Γ( β ) (cid:12)(cid:12) sin φ (cid:12)(cid:12) (cos φ ) α − (1 − cos φ ) β − , ¸ (C61)for φ ∈ (cid:2) − π , π (cid:3) . Both α and β are restricted to (0 , ∞ ) . However, in practice itis necessary to limit the values of these parameters to a ﬁnite range to ensure thenumerical stability of the algorithm. It is also convenient to work with the log ofthese quantities because of their large dynamic range. We therefore introduce themodiﬁed parameters a ≡ ln α − K K − K b ≡ ln β − K K − K (C62)4with inverse transform α = exp ( K + ( K − K ) a ) β = exp ( K + ( K − K ) b ) , (C63)where the matrix K =  ln  (C64)deﬁnes the minimum and maximum values of ln α (top row) and ln β (bottom row)we adopt in our implementation of the algorithm. The lower limits correspond to α > and β > , which excludes distributions with unphysically sharp peaks at φ = 90 ◦ . Both a and b are restricted to the domain (0 , , and together comprise thehyperparameter vector θθθ φ = ( a b ) (cid:62) . (C65)Because of their trivial domain, these parameters are convenient to sample in whendoing inference (provided we account for their implied prior on the spot latitudes; seebelow). However, a and b do not intuitively relate to physical quantities of interest.In many cases it is more desirable to parametrize the latitude distribution in termsof a parameter µ φ controlling the central latitude and a parameter σ φ controlling thedispersion in latitude among the spots. In this case, we may choose instead θθθ φ = ( µ φ σ φ ) (cid:62) . (C65*)Over most of the parameter space in a and b , the spot latitude distribution deﬁnedabove is well approximated by a bimodal Gaussian. In particular, there exists a one-to-one relationship between a and b and the mean µ φ and standard deviation σ φ of anormal approximation to the distribution. Moreover, we ﬁnd that if we let µ φ be the mode of the PDF and σ φ be a local approximation to the variance of the PDF, therelationship has a convenient closed form.To compute µ φ , we diﬀerentiate Equation (C61) with respect to φ , set the expres-sion equal to zero, and solve for φ to obtain µ φ = 2 tan − (cid:18)(cid:113) α + β − − (cid:112) α − α − β + 4 αβ + β + 5 (cid:19) . ¸ (C66)To compute σ φ , we note that the variance of a Gaussian distribution ϕ ( φ ; µ, σ ) isthe negative reciprocal of its curvature in log space: σ = − (cid:18) d ln ϕ ( φ ; µ, σ )d φ (cid:19) − (C67)5 p r o b a b ili t y d e n s i t y a =0.74 b =0.00 = = 0 a =0.72 b =0.56 = 30 a =0.63 b =0.65 = 60 a =0.32 b =0.58 = 85 p r o b a b ili t y d e n s i t y a =0.42 b =0.00 = a =0.40 b =0.27 a =0.31 b =0.36 a =0.06 b =0.28 p r o b a b ili t y d e n s i t y a =0.28 b =0.00 = a =0.26 b =0.15 a =0.19 b =0.23 a =0.02 b =0.18

90 60 30 0 30 60 90 latitude p r o b a b ili t y d e n s i t y a =0.06 b =0.00 =

90 60 30 0 30 60 90 latitude a =0.05 b =0.02

90 60 30 0 30 60 90 latitude a =0.02 b =0.05

90 60 30 0 30 60 90 latitude a =0.00 b =0.07 Figure 16.

Probability density function for the spot latitude (blue curves) for diﬀerentvalues of the mode µ φ (columns) and local standard deviation σ φ (rows). The correspondingvalues of a and b are indicated within each panel. The bimodal normal distribution withmean µ φ and standard deviation σ φ is shown as the dashed orange curves; for mid-latitudemodes and low standard deviations, the Gaussian approximation is quite good. The shadedpanel in the lower right ( µ φ = 0 ◦ , σ φ = 0 ◦ ) corresponds to an approximately isotropicdistribution of spots over the surface of the star; in this panel only, the dashed orange curvecorresponds to a cosine distribution in φ (i.e., the exact isotropic distribution). - We therefore twice diﬀerentiate the log of Equation (C61), negate it, take the recip-rocal, and evaluate it at φ = µ φ to obtain a local approximation to the standarddeviation of the distribution at the mode: σ φ = sin µ φ (cid:115) − α + β + ( β −

1) cos µ φ + α − µ φ . ¸ (C68)For completeness, the inverse transform also has a closed form: α = 2 + 4 σ φ + (3 + 8 σ φ ) cos µ φ + 2 cos (2 µ φ ) + cos (3 µ φ )16 σ φ cos (cid:0) µ φ (cid:1) ¸ (C69) β = cos µ φ + 2 σ φ (3 + cos (2 µ φ )) − cos (3 µ φ )16 σ φ cos (cid:0) µ φ (cid:1) . ¸ (C70)For reference, Figure 16 shows the latitude PDF and the corresponding Gaussianapproximation for diﬀerent values of µ φ and σ φ . The corresponding values of a and6 b are indicated in the top left of each panel. For µ φ at intermediate latitudes andmoderate values of σ φ , the approximation is quite good. However, for µ φ very closeto the equator or to the poles, the curvature of the distribution changes signiﬁcantlyas a function of φ , so the variance is somewhat underestimated by the approximation;and for σ φ large, the distribution becomes noticeably non-Gaussian.The shaded panel at the lower left is a special case of the distribution ( µ φ = 0 ◦ , σ φ ≈ ◦ ; or equivalently, a ≈ . , b = 0 ), which is approximately isotropic inlatitude. In this panel, the orange curve instead corresponds to an isotropic (cosine)distribution in φ ; note the excellent agreement. Thus, in addition to having closed-form moments, the Beta distribution is quite ﬂexible and, via the transforms outlinedabove, intuitive in how it aﬀects the distribution of spots on the surface of a star.In the following sections we derive expressions for the moments of the distribu-tion in terms of α and β , as this is somewhat more convenient; these can easily betransformed into expressions involving either µ φ and σ φ or a and b via the equationsabove. The former parametrization is convenient when these properties are knownand can be ﬁxed; i.e., when using the GP as a restrictive prior for the light curveof a star whose spot distribution is already understood. However, for the purposesof posterior inference—that is, when trying to constrain the hyperparameters of theGP—we recommend sampling in the parameters a and b , since their domains aretrivial, with uncorrelated boundaries. Posterior constraints on these quantities mayeasily be transformed into constraints on µ φ and σ φ via the equations above. Note,importantly, that this requires us to explicitly add a Jacobian term to the likelihoodto account for the prior implied by sampling uniformly in a and b in the range (0 , .The Jacobian is given by J = ∂µ φ ∂a ∂σ φ ∂b − ∂µ φ ∂b ∂σ φ ∂a ¸ (C71) = ( K − K )( K − K ) αβ ( µ φ ) sin ( µ φ ) σ φ ( α + β − α + β −

1) cos µ φ )( α + β − β −

1) cos µ φ − α − β −

1) cos ( µ φ ) +( β −

1) cos ( µ φ )) Adding the log of the absolute value of J to the log likelihood corrects for the ad hoc prior on the latitude parameters introduced by our particular choice of parametriza-tion, enforcing instead a uniform prior on the quantities µ φ and σ φ .C.2.3. First moment

Since the Wigner matrices are block diagonal, we may evaluate the moments ofthe distribution one spherical harmonic degree at a time. To that end, let us writethe ﬁrst moment integral as e φ ≡ (cid:90) R ˆ x ( φ ) e r p ( φ (cid:12)(cid:12) θθθ φ ) d φ = (cid:0) e φ e φ e φ · · · e l max φ (cid:1) (cid:62) , ¸ (C72)7where e lφ = (cid:90) R l ˆ x ( φ ) e lr p ( φ (cid:12)(cid:12) θθθ φ ) d φ = U l − p lφ , ¸ (C73)and we deﬁne p lφ ≡ (cid:90) D l ˆ x ( φ ) ¯ e lr p ( φ (cid:12)(cid:12) θθθ φ ) d φ (C74) ¯ e lr ≡ U l e lr . (C75)The integral p lφ deﬁned above has a closed-form solution. To show this, we write theterms of p lφ as ( p lφ ) m = (cid:90) l (cid:88) µ = − l ( D l ˆ x ) m,µ ( φ ) (¯ e lr ) µ p ( φ (cid:12)(cid:12) θθθ φ ) d φ = Γ( α + β )Γ( α )Γ( β ) l (cid:88) µ = − l (¯ e lr ) µ exp (cid:20) i π m − µ ) (cid:21) l (cid:88) i =0 c lm,µ,i ( q lφ ) i ( θθθ φ ) , (C76)where ( q lφ ) i ( θθθ φ ) ≡ (cid:90) π − π sgn(sin φ ) i (cid:12)(cid:12) sin φ (cid:12)(cid:12) (cos φ ) α − (1 − cos φ ) l + β − i − (1 + cos φ ) i d φ = (cid:90) (cid:120) α − (1 − (cid:120) ) l + β − i − (1 + (cid:120) ) i d (cid:120) i even0 i odd , ¸ (C77)and in the last line we made use of the transformation (cid:120) = cos φ . The integral in theexpression above has a closed-form solution in terms of the hypergeometric function F : ( q lφ ) i ( θθθ φ ) =  Γ( α )Γ( l + β − i )Γ( l + α + β − i ) F (cid:16) − i , α ; l + α + β − i ; − (cid:17) i even0 i odd . ¸ (C78)In order to compute the integral e φ (Equation C21) we must evaluate Equation (C78)for all ≤ l ≤ l max , ≤ i ≤ l , which can be done eﬃciently via upward recursionrelations for both the gamma function and the hypergeometric function.8 C.2.4. Second moment

Similarly as before, let us write the second moment integral as E φ ≡ (cid:90) R ˆ x ( φ ) E r R (cid:62) ˆ x ( φ ) p ( φ (cid:12)(cid:12) θθθ φ ) d φ =  E l max , φ ... E , φ E , φ E , φ E l max , φ ... E , φ E , φ E , φ E l max , φ ... E , φ E , φ E , φ · · · . . . · · ·· · ·· · · E l max ,l max φ ... E ,l max φ E ,l max φ E ,l max φ  , ¸ (C79)where E l,l (cid:48) φ = (cid:90) R l ˆ x ( φ ) E l,l (cid:48) r R l (cid:48) ˆ x (cid:62) ( φ ) p ( φ (cid:12)(cid:12) θθθ φ ) d φ = U l − P l,l (cid:48) φ U l (cid:48) − (cid:62) ¸ (C80)and we deﬁne P l,l (cid:48) φ ≡ (cid:90) D l ˆ x ( φ ) ¯ E l,l (cid:48) r D l (cid:48) ˆ x (cid:62) ( φ ) p ( φ (cid:12)(cid:12) θθθ φ ) d φ (C81) ¯ E l,l (cid:48) r ≡ U l E l,l (cid:48) r U l (cid:48) (cid:62) . (C82)As before, we may express the solution to the integral P l,l (cid:48) φ in closed form. Let uswrite the terms of P l,l (cid:48) φ as ( P l,l (cid:48) φ ) m,m (cid:48) = (cid:90) l (cid:88) µ = − l l (cid:48) (cid:88) µ (cid:48) = − l (cid:48) ( D l ˆ x ) m,µ ( φ ) ( ¯ E l,l (cid:48) r ) µ,µ (cid:48) ( D l (cid:48) ˆ x ) m (cid:48) ,µ (cid:48) ( φ ) p ( φ (cid:12)(cid:12) θθθ φ ) d φ (C83) = Γ( α + β )Γ( α )Γ( β ) l (cid:80) µ = − l l (cid:48) (cid:80) µ (cid:48) = − l (cid:48) ( ¯ E l,l (cid:48) r ) µ,µ (cid:48) exp (cid:2) i π ( m − µ + m (cid:48) − µ (cid:48) ) (cid:3) l (cid:80) i =0 2 l (cid:48) (cid:80) i (cid:48) =0 c lm,µ,i c l (cid:48) m (cid:48) ,µ (cid:48) ,i (cid:48) ( Q l,l (cid:48) φ ) i,i (cid:48) ( θθθ φ ) , where, similarly to before, ( Q l,l (cid:48) φ ) i,i (cid:48) ( θθθ φ ) ≡ (cid:82) π − π sgn(sin φ ) i + i (cid:48) (cid:12)(cid:12) sin φ (cid:12)(cid:12) (cos φ ) α − (1 − cos φ ) l + l (cid:48) + β − i + i (cid:48) − (1 + cos φ ) i + i (cid:48) d φ = (cid:90) (cid:120) α − (1 − (cid:120) ) l + l (cid:48) + β − i + i (cid:48) − (1 + (cid:120) ) i + i (cid:48) d (cid:120) i + i (cid:48) even0 i + i (cid:48) odd . ¸ (C84)We may again express this integral in closed form: ( Q l,l (cid:48) φ ) i,i (cid:48) ( θθθ φ ) =  Γ( α )Γ( l + l (cid:48) + β − i + i (cid:48) )Γ( l + l (cid:48) + α + β − i + i (cid:48) ) F (cid:16) − i + i (cid:48) , α ; l + l (cid:48) + α + β − i + i (cid:48) ; − (cid:17) i + i (cid:48) even0 i + i (cid:48) odd . ¸ (C85)As before, this integral may be evaluated recursively to eﬃciently compute all of theterms in E φ .9C.3. The Longitude Integrals

In this section we will compute the ﬁrst and second moments of the longitudedistribution ( e λ and E λ , given by Equations C22 and C26, respectively). The mathhere is very similar to that in the previous section, as we are again dealing withintegrals of Wigner matrices (§C.2.1).C.3.1. Probability density function

The longitude integrals (Equations C22 and C26) involve rotations by an angle λ about ˆ y , which may be accomplished by choosing Euler angles α = 0 , β = λ , and γ = 0 , such that R l ˆ y ( λ ) = U l − D l ˆ y ( λ ) U l (C86)with D l ˆ y ( λ ) = D l (0 , λ , . (C87)Since we expect the longitudinal distribution of features on the surfaces of stars tobe (on average) isotropic, we will place a uniform prior on λ ∈ [ − π, π ) : p ( λ (cid:12)(cid:12) θθθ λ ) =  π − π ≤ λ < π . (C88)We therefore have no hyperparameters controlling the longitudinal distribution, i.e., θθθ λ = ( ) . (C89)C.3.2. First moment

As before, we will solve for the terms of the moment integrals one spherical har-monic degree at a time: e λ ≡ (cid:90) R ˆ y ( λ ) e φ p ( λ (cid:12)(cid:12) θθθ λ ) d λ = (cid:0) e λ e λ e λ · · · e l max λ (cid:1) (cid:62) , ¸ (C90)where e lλ = (cid:90) R l ˆ y ( λ ) e lφ p ( λ (cid:12)(cid:12) θθθ λ ) d λ = U l − p lλ , ¸ (C91)and we deﬁne p lλ ≡ (cid:90) D l ˆ x ( λ ) ¯ e lφ p ( λ (cid:12)(cid:12) θθθ λ ) d λ (C92) ¯ e lφ ≡ U l e lφ . (C93)0The integral p lλ deﬁned above has a closed-form solution. To show this, we write theterms of p lλ as ( p lλ ) m = (cid:90) l (cid:88) µ = − l ( D l ˆ y ) m,µ ( λ ) (¯ e lφ ) µ p ( λ (cid:12)(cid:12) θθθ λ ) d λ = l (cid:88) µ = − l (¯ e lφ ) µ l (cid:88) i =0 c lm,µ,i ( q lλ ) i , (C94)where ( q lλ ) i ≡ π (cid:90) π − π sgn(sin λ ) i (1 − cos λ ) l − i (1 + cos λ ) i d λ =  l Γ (cid:0) i (cid:1) Γ (cid:0) l + − i (cid:1) π Γ( l + 1) i even0 i odd , ¸ (C95)whose terms may easily be computed by upward recursion. Since q lI does not dependon any user inputs, it may be computed a single time as a pre-processing step foreﬃciency. C.3.3. Second moment

We write the second moment integral as E λ ≡ (cid:90) R ˆ y ( λ ) E φ R (cid:62) ˆ y ( λ ) p ( λ (cid:12)(cid:12) θθθ λ ) d λ =  E l max , λ ... E , λ E , λ E , λ E l max , λ ... E , λ E , λ E , λ E l max , λ ... E , λ E , λ E , λ · · · . . . · · ·· · ·· · · E l max ,l max λ ... E ,l max λ E ,l max λ E ,l max λ  , ¸ (C96)where E l,l (cid:48) λ = (cid:90) R l ˆ y ( λ ) E l,l (cid:48) φ R l (cid:48) ˆ y (cid:62) ( λ ) p ( λ (cid:12)(cid:12) θθθ λ ) d λ = U l − P l,l (cid:48) λ U l (cid:48) − (cid:62) ¸ (C97)and we deﬁne P l,l (cid:48) λ ≡ (cid:90) D l ˆ x ( λ ) ¯ E l,l (cid:48) φ D l (cid:48) ˆ x (cid:62) ( λ ) p ( λ (cid:12)(cid:12) θθθ λ ) d λ (C98) ¯ E l,l (cid:48) φ ≡ U l E l,l (cid:48) φ U l (cid:48) (cid:62) . (C99)1We then express the terms of P l,l (cid:48) λ as ( P l,l (cid:48) λ ) m,m (cid:48) = (cid:90) l (cid:88) µ = − l l (cid:48) (cid:88) µ (cid:48) = − l (cid:48) ( D l ˆ x ) m,µ ( λ ) ( ¯ E l,l (cid:48) φ ) µ,µ (cid:48) ( D l (cid:48) ˆ x ) m (cid:48) ,µ (cid:48) ( λ ) p ( λ (cid:12)(cid:12) θθθ λ ) d λ = l (cid:88) µ = − l l (cid:48) (cid:88) µ (cid:48) = − l (cid:48) ( ¯ E l,l (cid:48) φ ) µ,µ (cid:48) l (cid:88) i =0 2 l (cid:48) (cid:88) i (cid:48) =0 c lm,µ,i c l (cid:48) m (cid:48) ,µ (cid:48) ,i (cid:48) ( Q l,l (cid:48) λ ) i,i (cid:48) , (C100)where ( Q l,l (cid:48) λ ) i,i (cid:48) ≡ π (cid:90) π − π sgn(sin λ ) i + i (cid:48) (1 − cos λ ) l + l (cid:48) − i + i (cid:48) (1 + cos λ ) i + i (cid:48) d λ =  l + l (cid:48) Γ (cid:0) i + i (cid:48) (cid:1) Γ (cid:0) l + l (cid:48) + − i − i (cid:48) (cid:1) π Γ( l + l (cid:48) + 1) i + i (cid:48) even0 i + i (cid:48) odd , ¸ (C101)whose terms may again be computed by upward recursion in a single pre-processingstep. C.4. The Contrast Integrals

The ﬁnal integrals we must take in our computation of E (cid:2) y (cid:12)(cid:12) θθθ • (cid:3) and E (cid:2) yy (cid:62) (cid:12)(cid:12) θθθ • (cid:3) are the integrals over the spot contrast distribution, e c and E c . These are by farthe easiest, since the spot contrast is a scalar multiplier of the spherical harmoniccoeﬃcient vector, so we can pull the terms e λ and E λ out of the integrals in Equations(C23) and (C27) to write e c ≡ − e λ (cid:90) (cid:99) p ( (cid:99) (cid:12)(cid:12) θθθ c ) d (cid:99) (C102) E c ≡ E λ (cid:90) (cid:99) p ( (cid:99) (cid:12)(cid:12) θθθ c )d (cid:99) . (C103)These integrals may be computed analytically for any choice of probability densityfunction p ( (cid:99) (cid:12)(cid:12) θθθ c ) with closed-form moments. However, in practice, it is quite diﬃcultto constrain the spot contrast from light curves, let alone higher moments of itsdistribution; this is due largely to the fact that the contrast is extremely degeneratewith the total number of spots (Paper I). In our implementation of the algorithm, wetherefore choose the simplest possible probability distribution, a delta function: p ( (cid:99) (cid:12)(cid:12) θθθ c ) = δ ( (cid:99) − c ) (C104)characterized by a single parameter, the contrast of the spots: θθθ c = ( c ) (cid:62) . (C105)The moment integrals are then trivial to evaluate: e c = − c e λ (C106) E c = c E λ . (C107)2 D. INCLINATIONIn this section we will compute the ﬁrst and second moment integrals of the incli-nation distribution (Equations 20 and 21), which allow us to compute the mean andcovariance of the process that describes the ﬂux marginalized over all values of theinclination (Equations 18 and 19).D.1.

Probability density function

Similar to the latitude integrals, the process of inclining a star relative to theobserver (see Equation B7) involves rotations by an angle − (cid:73) about ˆ x , which may beaccomplished by choosing Euler angles α = π / , β = − (cid:73) , and γ = − π / , such that R l ˆ x ( − (cid:73) ) = U l − D l ˆ x ( − (cid:73) ) U l (D108)with D l ˆ x ( − (cid:73) ) = D l (cid:16) π , − (cid:73) , − π (cid:17) . (D109)Since we expect an isotropic distribution of stellar rotation axes (absent prior con-straints on individual stars), the prior probability density for the inclination I issimply p ( (cid:73) ) = sin (cid:73) (D110)for (cid:73) ∈ [0 , π ] . D.2. First moment

The expression for the ﬁrst moment is e I ≡ (cid:90) AAA ( (cid:73) , P, u ) e y p ( (cid:73) ) d (cid:73) (D111)where e y ≡ E (cid:104) y (cid:12)(cid:12)(cid:12) θθθ • (cid:105) (D112)is the ﬁrst moment of the distribution over spherical harmonic coeﬃcients (Equa-tion 11). We can use Equations (B6) and (B7) to express the element at index k (corresponding to the mean of the GP at time t = t k ) as ( e I ) k = (cid:90) a (cid:62) k ( (cid:73) ) e y p ( (cid:73) ) d (cid:73) = r (cid:62) A (cid:90) R ˆ x ( − (cid:73) ) ( e y (cid:48) ) k p ( (cid:73) ) d (cid:73) , (D113)3where we deﬁne ( e y (cid:48) ) k ≡ R ˆ z (cid:18) πP t k (cid:19) R ˆ x (cid:16) π (cid:17) e y (D114)as the expectation of y in the polar frame at time t = t k . At this point, it isconvenient to invoke the fact that our GP is longitudinally isotropic: there is nopreferred longitude on the surface of the star, or, equivalently, no preferred phase inthe light curve. The rotation about ˆ z (i.e., the rotational axis of the star) thereforecannot change the expectation of y , so ( e y (cid:48) ) k = ( e y (cid:48) ) = R ˆ x (cid:16) π (cid:17) e y ≡ e y (cid:48) . (D115)We therefore have e I ≡ e I (D116)where e I = r (cid:62) A e y (cid:48)(cid:48) (D117)and e y (cid:48)(cid:48) = (cid:90) R ˆ x ( − (cid:73) ) e y (cid:48) p ( (cid:73) ) d (cid:73) = (cid:0) e y (cid:48)(cid:48) e y (cid:48)(cid:48) e y (cid:48)(cid:48) · · · e l max y (cid:48)(cid:48) (cid:1) (cid:62) (D118)is the expectation of y in the observer’s frame, and as before we explicitly separate itout by spherical harmonic degree. As in §C.2, we may write e ly (cid:48)(cid:48) = (cid:90) R l ˆ x ( − (cid:73) ) e ly (cid:48) p ( (cid:73) ) d (cid:73) = U l − p lI , ¸ (D119)and we deﬁne p lI ≡ (cid:90) D l ˆ x ( − (cid:73) ) ¯ e ly (cid:48) p ( (cid:73) ) d (cid:73) (D120) ¯ e ly (cid:48) ≡ U l e ly (cid:48) . (D121) In the presence of limb darkening, we must include the limb darkening operator L ( u ) in Equa-tion (D113); see §B.2. p lI deﬁned above has a closed-form solution. To show this, we write theterms of p lI as ( p lI ) m = (cid:90) l (cid:88) µ = − l ( D l ˆ x ) m,µ ( − (cid:73) ) (¯ e ly (cid:48) ) µ p ( (cid:73) ) d (cid:73) = l (cid:88) µ = − l (¯ e ly (cid:48) ) µ exp (cid:20) i π m − µ ) (cid:21) l (cid:88) i =0 c lm,µ,i ( q lI ) i , ¸ (D122)where ( q lI ) i ≡ (cid:90) π ( − i (1 − cos (cid:73) ) l − i (1 + cos (cid:73) ) i sin (cid:73) d (cid:73) = ( − i (cid:90) (1 − (cid:120) ) l − i (1 + (cid:120) ) i d (cid:120) = ( − i l − i + 1 F (cid:18) , − i l − i − (cid:19) , ¸ (D123)which may easily be computed recursively. As with the longitude integrals, the vector q lI need only be computed a single time as a pre-processing step, as it does not dependon any user inputs. D.3. Second moment

The expression for the second moment is E I ≡ (cid:90) AAA ( (cid:73) , P, u ) E y AAA (cid:62) ( I, P, u ) p ( (cid:73) ) d (cid:73) (D124)where E y ≡ E (cid:104) y y (cid:62) (cid:12)(cid:12)(cid:12) θθθ • (cid:105) (D125)is the second moment of the distribution over spherical harmonic coeﬃcients (Equa-tion 12). We can use Equations (B6) and (B7) to express the element at index k, k (cid:48) (corresponding to the covariance of the GP between times t = t k and t (cid:48) = t k (cid:48) ) as ( E I ) k,k (cid:48) = (cid:90) a (cid:62) k ( (cid:73) ) E y a k (cid:48) ( (cid:73) ) p ( (cid:73) ) d (cid:73) = r (cid:62) A ( E y (cid:48)(cid:48) ) k,k (cid:48) A (cid:62) r , (D126)where ( E y (cid:48)(cid:48) ) k,k (cid:48) = (cid:90) R ˆ x ( − (cid:73) ) ( E y (cid:48) ) k,k (cid:48) R (cid:62) ˆ x ( − (cid:73) ) p ( (cid:73) ) d (cid:73) (D127)5is the expectation of y y (cid:62) in the observer’s frame at times t = t k and t (cid:48) = t k (cid:48) and ( E y (cid:48) ) k,k (cid:48) ≡ R ˆ z (cid:18) πP t k (cid:19) R ˆ x (cid:16) π (cid:17) E y R (cid:62) ˆ x (cid:16) π (cid:17) R (cid:62) ˆ z (cid:18) πP t k (cid:48) (cid:19) (D128)is the expectation of y y (cid:62) in the polar frame at times t = t k and t (cid:48) = t k (cid:48) . Therest of the computation follows what we did in §C.2, except that the number ofoperations required to compute E I is a factor of K larger than in the computationof expectations like E φ (Equation C79). That is because we must compute the integralof all terms of a matrix for each of the K elements of E I . We discuss in §D.4 belowstrategies that can drastically improve the computational scaling of marginalizingover the inclination.Let us write Equation (D127) in terms of its spherical harmonic components: ( E y (cid:48)(cid:48) ) k,k (cid:48) =  (cid:16) E l max , y (cid:48)(cid:48) (cid:17) k,k (cid:48) ... (cid:0) E , y (cid:48)(cid:48) (cid:1) k,k (cid:48) (cid:0) E , y (cid:48)(cid:48) (cid:1) k,k (cid:48) (cid:0) E , y (cid:48)(cid:48) (cid:1) k,k (cid:48) (cid:16) E l max , y (cid:48)(cid:48) (cid:17) k,k (cid:48) ... (cid:0) E , y (cid:48)(cid:48) (cid:1) k,k (cid:48) (cid:0) E , y (cid:48)(cid:48) (cid:1) k,k (cid:48) (cid:0) E , y (cid:48)(cid:48) (cid:1) k,k (cid:48) (cid:16) E l max , y (cid:48)(cid:48) (cid:17) k,k (cid:48) ... (cid:0) E , y (cid:48)(cid:48) (cid:1) k,k (cid:48) (cid:0) E , y (cid:48)(cid:48) (cid:1) k,k (cid:48) (cid:0) E , y (cid:48)(cid:48) (cid:1) k,k (cid:48) · · · . . . · · ·· · ·· · · (cid:16) E l max ,l max y (cid:48)(cid:48) (cid:17) k,k (cid:48) ... (cid:16) E ,l max y (cid:48)(cid:48) (cid:17) k,k (cid:48) (cid:16) E ,l max y (cid:48)(cid:48) (cid:17) k,k (cid:48) (cid:16) E ,l max y (cid:48)(cid:48) (cid:17) k,k (cid:48)  , ¸ (D129)where (cid:16) E l,l (cid:48) y (cid:48)(cid:48) (cid:17) k,k (cid:48) = (cid:90) R l ˆ x ( − (cid:73) ) (cid:16) E l,l (cid:48) y (cid:48) (cid:17) k,k (cid:48) R l (cid:48) ˆ x (cid:62) ( − (cid:73) ) p ( (cid:73) ) d (cid:73) = U l − (cid:16) P l,l (cid:48) I (cid:17) k,k (cid:48) U l (cid:48) − (cid:62) ¸ (D130)and we deﬁne (cid:16) P l,l (cid:48) I (cid:17) k,k (cid:48) ≡ (cid:90) D l ˆ x ( − (cid:73) ) (cid:16) ¯ E l,l (cid:48) y (cid:48) (cid:17) k,k (cid:48) D l (cid:48) ˆ x (cid:62) ( − (cid:73) ) p ( (cid:73) ) d (cid:73) (D131) (cid:16) ¯ E l,l (cid:48) y (cid:48) (cid:17) k,k (cid:48) ≡ U l (cid:16) E l,l (cid:48) y (cid:48) (cid:17) k,k (cid:48) U l (cid:48) (cid:62) . (D132)As before, we may express the solution to the integral in Equation (D131) in closedform. Let us write its terms as (cid:20)(cid:16) P l,l (cid:48) I (cid:17) k,k (cid:48) (cid:21) m,m (cid:48) = (cid:82) l (cid:80) µ = − l l (cid:80) µ (cid:48) = − l ( D l ˆ x ) m,µ ( − (cid:73) ) (cid:20)(cid:16) ¯ E l,l (cid:48) y (cid:48) (cid:17) k,k (cid:48) (cid:21) µ,µ (cid:48) ( D l (cid:48) ˆ x ) µ (cid:48) ,m (cid:48) ( − (cid:73) ) p ( (cid:73) ) d (cid:73) (D133) = l (cid:80) µ = − l l (cid:48) (cid:80) µ (cid:48) = − l (cid:48) (cid:20)(cid:16) ¯ E l,l (cid:48) y (cid:48) (cid:17) k,k (cid:48) (cid:21) µ,µ (cid:48) exp (cid:2) i π ( m − µ + m (cid:48) − µ (cid:48) ) (cid:3) l (cid:80) i =0 2 l (cid:48) (cid:80) i (cid:48) =0 c lm,µ,i c l (cid:48) m (cid:48) ,µ (cid:48) ,i (cid:48) ( Q l,l (cid:48) I ) i,i (cid:48) , ( Q l,l (cid:48) I ) i,i (cid:48) ≡ (cid:90) π ( − i + i (cid:48) (1 − cos (cid:73) ) l + l (cid:48) − i + i (cid:48) (1 + cos (cid:73) ) i + i (cid:48) sin (cid:73) d (cid:73) = ( − i + i (cid:48) (cid:90) (1 − (cid:120) ) l + l (cid:48) − i + i (cid:48) (1 + (cid:120) ) i + i (cid:48) d (cid:120) = ( − i + i (cid:48) l + l (cid:48) − i + i (cid:48) + 1 F (cid:18) , − i + i (cid:48) l + l (cid:48) − i + i (cid:48) − (cid:19) , ¸ (D134)which may again be computed recursively in a pre-processing step.D.4. Speeding up the computation

The expressions in the previous section are a bit of a nightmare, particularlybecause of the dimensionality of some of the linear operators involved. The complexityof the expressions is due to the fact that the second moment of the spherical harmonicvector projected onto the sky (Equation D127) is time-dependent: it changes as thestar rotates. Computing the second moment of the ﬂux requires computing theouter product of this tensor with itself, leading to multi-indexed quantities like thosein Equation (D133). In addition to being cumbersome to evaluate, the full secondmoment matrix E I (and hence the ﬂux covariance matrix) is costly to compute.It is helpful that Equation (D134) does not depend on any user inputs and thusmay be pre-computed, but even still we require evaluating the four nested sums inEquation (D127) O ( l ) times for each entry in the ( K × K ) matrix E I .Fortunately, the inner two sums in Equation (D127) do not depend on user inputs,so those may be pre-computed, and Equation (D127) may be cast as a straightforwardmatrix dot product. In practice we also ﬁnd it helpful to take advantage of the phaseindependence (i.e., stationarity) of the covariance of our GP, as we did in §D.2: anytwo entries ( E I ) k,k (cid:48) and ( E I ) j,j (cid:48) are the same if t k − t k (cid:48) = t j − t j (cid:48) . If the data happento be evenly sampled, such that the time diﬀerence between adjacent cadences isconstant, then we need only compute the covariance at a total of K points (as opposedto K ), as the covariance is a circulant matrix which is fully speciﬁed by a single vectorof length K .In the more general case where the data are not evenly sampled, we may stillevaluate the covariance at a ﬁxed number of points K (cid:48) < K and approximate thefull covariance matrix via interpolation. As long as the data are roughly evenlysampled, as is the case with Kepler or TESS light curves, this approximation leadsto negligible error when K (cid:48) ≈ K , aﬀording the same O ( K ) computational savings.Note that even in the case where the ﬂux is normalized (see §2.5), the non-stationarycorrection to the covariance is applied after the step where we marginalize over theinclination, so this approach is still valid.7S. SUPPLEMENTARY FIGURESFigures S17-S30 below are referenced in §3.5 discussing additional calibration runsfor our GP.8 . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S17.

Same as Figures 6 and 7, but for mid-latitude spots with µ φ = 45 ◦ . Theradius and latitude parameters are again inferred correctly. - . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S18.

Same as Figure S17, but for high-latitude spots with µ φ = 60 ◦ . The radiusand latitude parameters are again inferred correctly, although the model cannot rule out thepresence of polar spots. - . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.050.100.15 p r o b a b ili t y truthsamples Figure S19.

Same as Figure S17, but for equatorial spots with µ φ = 0 ◦ . Even though themodel favors a bimodal distribution at low latitudes, the posterior strongly supports thepresence of equatorial spots. - . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.0000.0050.0100.015 p r o b a b ili t y truthsamples Figure S20.

Same as Figure S17, but for isotropically-distributed spots. The posterioraccurately captures the cosine-like distribution of spot latitudes. - . . . . . c r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S21.

Same as Figures 6 and 7, but for high contrast ( c = 1 ) small ( r = 3 ◦ ) spots(signiﬁcantly lower than the eﬀective resolution of the model). The hatched regions in theposterior plots for the radius ( r < ◦ ) are excluded by the prior, since the model cannotcapture features that small. Despite this, the spot latitude distribution is still inferredcorrectly, although the spot contrast is oﬀ by more than σ . - . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S22.

Same as Figures 6 and 7, but for inference based on a single light curve( M = 1 ). The constraints on all of the parameters are dramatically weaker. - . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S23.

Same as Figures 6 and 7, but for inference based on one thousand light curves( M = 1 , ). The constraints on the radius and the latitude parameters are dramaticallytighter. - . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S24.

Same as Figures 6 and 7, but for stars with (assumed known) limb darkeningcoeﬃcients u = 0 . and u = 0 . . Limb darkening makes it much harder to infer thevariance of the distribution of starspot latitudes. - . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S25.

Same as Figure S24, but for an ensemble consisting of M = 1 , light curves.For a suﬃciently large ensemble, it is possible to correctly infer the spot radii and latitudesin the presence of limb darkening. - . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S26.

Same as Figure S24, but for M = 500 light curves and assuming no limbdarkening when doing inference. Neglecting limb darkening leads to signiﬁcant bias in theinferred spot radii and to a lesser extent in the mean spot latitude. - . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S27.

Same as Figures 6 and 7, but for stars with n = 2 spots with high contrast c =0 . . Our model correctly captures the increased contrast, but it is still strongly degeneratewith the number of spots. - . . . . . c

20 30 40 50 60 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S28.

Same as Figures 6 and 7, but for a single ( n = 1 ) large ( r ∼ N (45 ◦ , ◦ ) ) spoton each star. Our model correctly infers the larger radius, and infers the latitude within σ ,albeit with large uncertainty. - . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S29.

Same as Figures 6 and 7, but for stars with variance in their number of spots, n ∼ N (20 , ) , the spot radii, r ∼ N (15 ◦ , ◦ ) , and the spot contrasts, c ∼ N (0 . , . ) . - . . . . . c

12 16 20 24 28 r n

20 40 60 80 10 20 30 40 0 . . . . . c

10 20 30 40 50 n -90 -75 -60 -45 -30 -15 0 15 30 45 60 75 90latitude0.000.020.040.06 p r o b a b ili t y truthsamples Figure S30.

Same as Figures 6 and 7, but assuming the light curves are not normalized andthe true amplitude is known. Knowledge of the normalization breaks the c − n degeneracyand allows us to infer the total number of spots.degeneracyand allows us to infer the total number of spots.