[PDF] Simultaneous inference of periods and period-luminosity relations for Mira variable stars

Abstract

The Period--Luminosity relation (PLR) of Mira variable stars is an important tool to determine astronomical distances. The common approach of estimating the PLR is a two-step procedure that first estimates the Mira periods and then runs a linear regression of magnitude on log period. When the light curves are sparse and noisy, the accuracy of period estimation decreases and can suffer from aliasing effects. Some methods improve accuracy by incorporating complex model structures at the expense of significant computational costs. Another drawback of existing methods is that they only provide point estimation without proper estimation of uncertainty. To overcome these challenges, we develop a hierarchical Bayesian model that simultaneously models the quasi-periodic variations for a collection of Mira light curves while estimating their common PLR. By borrowing strengths through the PLR, our method automatically reduces the aliasing effect, improves the accuracy of period estimation, and is capable of characterizing the estimation uncertainty. We develop a scalable stochastic variational inference algorithm for computation that can effectively deal with the multimodal posterior of period. The effectiveness of the proposed method is demonstrated through simulations, and an application to observations of Miras in the Local Group galaxy M33. Without using ad-hoc period correction tricks, our method achieves a distance estimate of M33 that is consistent with published work. Our method also shows superior robustness to downsampling of the light curves.

Full PDF

SSubmitted to the Annals of Applied Statistics

SIMULTANEOUS INFERENCE OF PERIODS AND PERIOD-LUMINOSITYRELATIONS FOR MIRA VARIABLE STARS B Y S HIYUAN H E , Z HENFENG L IN , W ENLONG Y UAN ,L UCAS

M. M

ACRI AND J IANHUA

Z. H

UANG Institute of Statistics and Big Data, Renmin University of China, [email protected] Microsoft, [email protected] Department of Physics and Astronomy, Johns Hopkins University, [email protected] Department of Physics and Astronomy, Texas A&M University [email protected] Department of Statistics, Texas A&M University [email protected]

The Period–Luminosity relation (PLR) of Mira variable stars is an impor-tant tool to determine astronomical distances. The common approach of esti-mating the PLR is a two-step procedure that ﬁrst estimates the Mira periodsand then runs a linear regression of magnitude on log period. When the lightcurves are sparse and noisy, the accuracy of period estimation decreases andcan suffer from aliasing effects. Some methods improve accuracy by incorpo-rating complex model structures at the expense of signiﬁcant computationalcosts. Another drawback of existing methods is that they only provide pointestimation without proper estimation of uncertainty. To overcome these chal-lenges, we develop a hierarchical Bayesian model that simultaneously modelsthe quasi-periodic variations for a collection of Mira light curves while esti-mating their common PLR. By borrowing strengths through the PLR, ourmethod automatically reduces the aliasing effect, improves the accuracy ofperiod estimation, and is capable of characterizing the estimation uncertainty.We develop a scalable stochastic variational inference algorithm for compu-tation that can effectively deal with the multimodal posterior of period. Theeffectiveness of the proposed method is demonstrated through simulations,and an application to observations of Miras in the Local Group galaxy M33.Without using ad-hoc period correction tricks, our method achieves a dis-tance estimate of M33 that is consistent with published work. Our methodalso shows superior robustness to downsampling of the light curves.

1. Introduction.

One essential step to determine the age and composition of our Uni-verse is to calibrate its current expansion rate (commonly referred to as the “Hubble constant”or H in astronomy) using the cosmological distance ladder. Observational astronomers haveextensively relied on the Period–Luminosity relations (PLRs) of classical Cepheid variablestars to measure distances to galaxies within 150 million light years of the Milky Way, en-abling the calibration of farther-reaching techniques (such as type Ia supernovae) to obtainincreasingly more robust estimates of H (Freedman et al., 2001; Riess et al., 2011; Riesset al., 2016; Riess et al., 2019). Recent studies (Whitelock et al., 2014; Whitelock and Feast,2014; Yuan et al., 2017; Huang et al., 2018) have shown that Mira variable stars (hereafter,Miras) also exhibit tight PLRs at near-infrared (NIR) wavelengths and thus can serve as in-dependent distance indicators in lieu of Cepheids. Miras are highly-evolved stars with twosubclasses based on their atmospheric chemistry: oxygen- and carbon-rich (O- and C-rich,respectively). They constitute a class of long-period variable stars whose luminosity changesdramatically (in a quasi-periodic manner) as a function of time. One pulsation cycle of Mirasusually lasts more than one hundred days, with extreme cases beyond several years. Keywords and phrases: variational inference, hierarchical Bayesian modeling, astrostatistics, Mira variables a r X i v : . [ s t a t . A P ] J a n lllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Date [day] M agn i t ude l VI (a) One LMC O-rich Mira. M agn i t ude + O ff s e t log (period) M agn i t ude + O ff s e t llll lll l ll ll l l lll ll lll l l ll lllll l lll l ll ll ll ll ll ll l l lllll l ll l ll l lll l lll ll ll l l l l llll lll l l ll ll l ll llll ll l lll lll ll l lll ll l llll l l ll ll ll ll ll ll ll llll l ll ll llll ll lll lll ll ll ll ll lll l ll lll l ll llll lll l ll ll l l lll ll lll l l ll lllll l lll l ll ll ll ll ll ll l l lllll l ll l ll l lll l lll ll ll l l l l llll lll l l ll ll l ll llll ll l lll lll ll l lll ll l llll l l ll ll ll ll ll ll ll llll l ll ll llll ll lll lll ll ll ll ll lll l ll lll l ll llll lll l ll ll l l lll ll lll l l ll lllll l lll l ll ll ll ll ll ll l l lllll l ll l ll l lll l lll ll ll l l l l llll lll l l ll ll l ll llll ll l lll lll ll l lll ll l llll l l ll ll ll ll ll ll ll llll l ll ll llll ll lll lll ll ll ll ll lll l ll lll l ll ll J + 1.5H + 1K s (b) PLRs for LMC. Fig 1: The left panel illustrates observations of the O-rich Mira by the OGLE survey (ID:

OGLE-LMC-LPV-00082 ). The right panel shows the J -, H - and K s -band PLRs for O-richMiras in the LMC, along with the best-ﬁt quadratic polynomials.In a given system, such as the Triangulum Galaxy (also known as Messier 33; hereafter,M33), all its member stars essentially are at the same distance ( D ) from us. Compared totheir intrinsic luminosity, their apparent brightness is scaled by the same factor of /D . Asa result, the observed PLR of Miras in different galaxies located at various distances are ex-pected to follow the same functional form, up to a zero-point offset. Astronomical time-seriesimaging surveys, using modern CCD cameras on meter-class or larger telescopes, routinelyperform accurate brightness measurements (called photometry ) of millions of stars in othergalaxies. These measurements are usually reported using units of magnitude , which are in-versely proportional to the logarithm of the light ﬂux. Thus, brighter objects have smallermagnitudes. Repeated imaging of a given ﬁeld containing a large number of stars yieldstime series measurements of stellar brightness called light curves . To determine the colorsof stars, images are usually taken using ﬁlters (also called bands ) which only allow certainwavelengths to pass. As a result, several light curves in various colors are collected for eachstar; these constitute the so-called multi-band light curve data.The third phase of the OGLE survey (Udalski et al., 2008) obtained densely-sampled lightcurves of 1,663 Miras (Soszy´nski et al., 2009) in the Large Magellanic Cloud (LMC), asatellite galaxy of the Milky Way. The survey was conducted using two ﬁlters ( V and I ). TheOGLE-III I - and V -band light curves have a median of about 470 and 45 photometric mea-surements per object, respectively. Figure 1a shows the light curve obtained by this surveyfor one O-rich Mira with a period of . days. The horizontal axis represents time, whilethe vertical axis shows the magnitude. Note that as brighter objects have smaller magnitudevalues, the vertical axis is ﬂipped. The I - and V -band light curves are plotted using blackcircles and orange triangles, respectively. The dashed curves are the best sinusoidal ﬁt toeach band separately. Using additional measurements in the NIR J , H and K s bands, Yuanet al. (2017) obtained tight PLRs shown in Figure 1b. In this ﬁgure, each point representsone O-rich Mira. The vertical axis is the average magnitude of each star, while the horizontalaxis is the period of each object, using a logarithmic scale (with base ). The solid lines inFigure 1b are least-squares ﬁts to each band, using a quadratic polynomial in the logarithmof the period. The zero-order terms of the polynomials are used for astronomical distancedetermination.The OGLE survey has obtained light curves of unusual high quality thanks to decades-longobservations with a dedicated telescope. More typical astronomical time-series surveys have IMULTANEOUS INFERENCE OF PERIODS AND PLRS Date [day] M agn i t ude lllllllllllllllllllllllllllllllllllllll llll llll l IJHK s Frequency [day - ] P r o f il e Log − L i k e li hood Fig 2: In the left panel is a typical light curve for a Mira in M33. In the right panel is resultingperiodogram applying the SP method (He et al., 2016) to its I -band light curve. The verticaldashed line indicates the estimated frequency.lower data quality which hinders accurate period estimation and PLR construction. Yuanet al. (2018) analyzed Miras in M33, a galaxy in our Local Group located ∼ × fartherthan the LMC. Each object was measured in four bands ( I , J , H , and K s ) with a mediannumber of observations of 68, 5, 6, and 11, respectively. The left panel of Figure 2 showsthe light curves of one of these Miras. It is evident that both the number of data points andtheir signal-to-noise ratio are limited. This dataset is not alone in terms of the difﬁculty toaccurately recover periods. For example, the second data release of the Gaia survey (Mowlaviet al., 2018) consists of 550,737 variable stars observed in three ﬁlters each with an median ofabout 25 data points, while the second data release of the Pan-STARRS survey (Flewelling,2018), contained an average of less than 10 measurements in each of the ﬁve bands it used.When light curves are sparsely sampled and subject to a high level of noise, the task ofperiod estimation is no doubt challenging. The Lomb-Scargle method (Lomb, 1976; Scargle,1982) is the most commonly used for this purpose. For a general periodic star, the methodﬁts a simple sinusoidal curve to a single-band light curve. The ﬁtting is performed over asequence of trial frequencies (the inverse of periods), and the frequency with the minimalresidual sum of squares is selected as the estimate. However, a light curve of one Mira vari-able is not exactly periodic. From Figure 1a, it is evident that either the I - or the V -bandlight curves have additional ﬂuctuations that are not accounted for by the ﬁtted sinusoids(dashed curve). The semi-parametric (SP) model of He et al. (2016) addresses the issue byﬁtting a Gaussian Process to the residuals. The Lomb-Scargle method has also been extendedby some authors (Vanderplas and Ivezic, 2015; Long et al., 2016) via exploiting multi-bandlight curves of a periodic star. As the number of parameters usually grows with the number ofbands, Long et al. (2016) employed explicit regularization to constrain the model degree offreedom. For multi-band Mira observations, Yuan et al. (2018) enhanced the SP model andhave achieved the best performance for Mira variables to date.After all these efforts and explorations, the next bottleneck in raising estimation accuracyeven further has been encountered: existing methods are susceptible to the so-called “alias-ing effect” due to the discrete sampling of a continuous function. For illustration, we applythe SP model (He et al., 2016) to the I -band light curve shown in the left panel of Figure 2.The model output is in the right panel of Figure 2, with proﬁle log-likelihood on the verticalaxis and frequency on the horizontal axis. Notice the evaluated log-likelihood is highly mul-timodal, and the frequency with maximum log-likelihood (the primary peak marked by thevertical dashed line) is identiﬁed as the estimate. For a light curve with a weak signal, a false frequency can show stronger evidence (larger likelihood) than the true one. When this hap-pens, an incorrect period estimation is acquired. The work of Yuan et al. (2018) devised anad-hoc trick for remedy: instead of the primary peak (with the largest log-likelihood value),they selected the secondary peak (with the second largest log-likelihood value) as the ﬁnalestimate if it better ﬁts the PLR.In addition to the estimation accuracy, computational cost is another curb on the applica-tion of some existing methods. Due to the multimodal nature of the likelihood, a grid searchover the frequency domain seems inevitable to ﬁnd the optimal frequency. This is especiallytroublesome for the methods of He et al. (2016) and Yuan et al. (2018). When these methodsestimate the period of a Mira, the related Gaussian process likelihood has to be computed re-peatedly over the grid of trial frequencies. Moreover, this computationally intensive processhas to be carried out for each Mira in the dataset.In this work, we propose a novel period estimation framework to address these challenges.Based on the state-of-the-art method of He et al. (2016) and Yuan et al. (2018), a Bayesianhierarchical model is constructed for Mira variables. The model embeds the PLR as one ofits model components, and simultaneously models all Mira light curves in a dataset. Theresulting model is capable of automatically reducing the aliasing effect, thereby increas-ing estimation accuracy. Under this framework, each Mira supplies information to updatetheir common PLR. In return, the PLR component guides parameter estimation of individualMiras. In our numerical studies, this mechanism is demonstrated to be key for improvingestimation accuracy.To reduce the computational cost, we develop an efﬁcient algorithm for model inference.Given a Bayesian model, Monte Carlo Markov Chain (MCMC) is the standard tool for in-ference. However, computing MCMC will be prohibitively expensive for our model. As weare ﬁtting a collection of Mira light curves, tens of thousands of parameters will be involvedeven for a dataset of moderate size. Moreover, the multimodal posteriors will also obstructthe convergence of the Markov chain. Fortunately, advances in variational inference (VI; Jor-dan et al., 1999; Blei et al., 2017; Zhang et al., 2019) bring us the opportunity to exploit theproblem in the new framework. VI converts the Bayesian inference into an optimization prob-lem, and ﬁnds the best density in a pre-speciﬁed family to approximate the actual posteriordistribution. To further improve computational efﬁciency for large scale problems, we adoptthe stochastic variational inference framework by Hoffman et al. (2013). Using an unbiasedestimate of the natural gradient during each iteration, our algorithm is able to avoid costlynumerical integration for the frequency parameter, and avoid scanning over all samples periteration.Under the same simulation as Yuan et al. (2018), we show our approach only fails torecover the period for ∼ % of the Miras, as opposed to their ∼ %. The absolute deviationerror is remarkably reduced by 77%. Furthermore, the method of Yuan et al. (2018) costshundreds of CPU hours, which is not possible to run without a computer cluster. In contrast,our method ﬁnishes within a few hours on a personal computer.The beneﬁt of the proposed model goes beyond fast and accurate period estimation. Firstly,existing methods only deliver a point estimate for the frequency (or period) without an uncer-tainty measurement. Our work is the ﬁrst attempt to quantify and approximate the multimodalposterior uncertainty. This is achieved by allowing the variational distribution for frequencyto be any continuous function over a compact interval. An analytical solution has been de-rived for its update in our work. Secondly, the PLR is traditionally determined by a two-stepprocedure: average magnitudes and periods are estimated for individual Miras separately;then the PLR is obtained via a least-squares ﬁt. The proposed method is a more system-atic way to estimate the PLR, which is the fundamental tool for distance measurements. Byadopting Bayesian modeling, we directly harvest its estimate and uncertainty quantiﬁcationas direct model products. IMULTANEOUS INFERENCE OF PERIODS AND PLRS y ibj | s ibj ∼ N ( s ibj , σ ibj ) for i ∈ I , b ∈ B , j ∈ J ib , (1) s ibj = m ib + b Tibj β ib + h ibj ; (2) m ib | f i , α b , γ b ∼ N ( d Ti α b , /γ b ) for i ∈ I , b ∈ B ; (3) β i | Ω ∼ N ( , Ω − ) for i ∈ I ; (4) h ib ( · ) | f i ∼ GP (0 , k b ( · , · )) for i ∈ I , b ∈ B ; (5) f i ∼ Unif( f min , f max ) for i ∈ I ; (6) α b ∼ N (cid:0) ¯ α b , (1 / ¯ δ ) I (cid:1) for b ∈ B ; (7) γ b ∼ Gamma (¯ γ b ¯ r, ¯ r ) for b ∈ B ; (8) Ω ∼ Wishart ( ¯ Ω / ¯ n, ¯ n ) . (9) Fig 3: The hierarchical model for period estimation and PL relation construction.The rest of the paper is organized as follows. Our proposed method is developed in Sec-tion 2. Its computational algorithm via the stochastic variational inference is developed inSection 3. We then conduct simulation studies to compare our method with existing methodsin Section 4. Finally, in Section 5, the proposed method is applied to a real data set of M33Miras (Yuan et al., 2018). Section 6 studies the robustness of the proposed method to down-sampling. The Supplementary Material (He et al., 2020) contains proofs of theoretical resultsand additional numerical results.

2. The Hierarchical Model.

Now we elucidate our Bayesian hierarchical model forperiod estimation with multi-band Mira data. Our dataset is a collection of N Miras in agalaxy and the observations of each Mira are collected with B ﬁlters (bands) indexed by b ∈ { , , · · · , B } . For the i -th star ( i = 1 , · · · , N ) , the data collected with the b -th ﬁlteris denoted as D ib = { ( t ibj , y ibj , σ ibj ) } n ib j =1 , where y ibj is the magnitude measured throughthe b -th ﬁlter at time t ibj , and the corresponding measurement uncertainty is σ ibj . As eachMira is observed through the b -th ﬁlter, all measurements for the i -th Mira are denoted as D i = ∪ b D ib . The period of the i -th Mira will be denoted as p i , and its frequency is f i = 1 /p i .Note p i (or f i ) is one of our key quantities of interest.In the rest of the paper, our whole collection of Mira data will be denoted as D = ∪ i D i .The following index sets will be also used: I = { , · · · , N } , B = { , · · · , B } , and J ib = { , · · · , n ib } . Our model is based on the work of He et al. (2016) and Yuan et al. (2018),where each Mira is modeled separately. We jointly model the whole collection D of Miras,and simultaneously estimate the PLR. The overall model is listed in Figure 3 and its diagramplotted in Figure 4. This is a hierarchical Bayesian model, where Eqn. (1)–(2) constitutethe level-1, Eqn. (3)–(6) constitute the level-2, and Eqn. (7)–(9) constitute the level-3 of themodel. The following subsections explain the model components in details.2.1. The light curve decomposition.

Denote D ib = { ( t ibj , y ibj , σ ibj ) } n ib j =1 as the observedlight curve for the i -th Mira with the b -th ﬁlter. The magnitude y ibj is decomposed as signalplus noise terms y ibj = s ib ( t ibj ) + σ ibj (cid:15) ibj , where s ib ( · ) is the signal as a function of time,and (cid:15) ibj ’s are standard Gaussian noise. In other words, the observed light curve is noisyobservation of s ib ( t ) at sparse time points. Following He et al. (2016), this signal function isfurther decomposed as(10) s ib ( t ) = m ib + b ( f i · t ) T β ib + h ib ( f i · t ) , f i α b ¯ α b ¯ δ m ib γ b ¯ γ b ¯ r β ib Ω ¯ Ω ¯ n y ibj h ibj b ∈ B j ∈ J ib i ∈ I b ∈ B Fig 4: The hierarchical model diagram.where m ib is the average magnitude, b ( u ) = (cos(2 πu ) , sin(2 πu )) T is the sinusoidal basisvector and β ib = ( β ib , β ib ) T is the corresponding coefﬁcient, and h ib ( f i · t ) is a smoothfunction that captures the non-periodic variations in Mira magnitudes. Note the sinusoidalfunction b ( f i · t ) T β ib has a period of p i = 1 /f i . Figure 5 illustrates the decomposition (10)for the I -band light curve in Figure 1a. In the upper panel, the solid curve represents the ﬁttedsignal s ib ( t ) . The middle panel shows the ﬁtted periodic component m ib + b ( f i · t ) T β ib , andthe lower panel is for the non-periodic component m ib + h ib ( f i · t ) .With simpliﬁed notations s ibj = s ib ( t ibj ) , b ibj = b ( f i · t ibj ) and h ibj = h ib ( f i · t ibj ) ,Eqn. (10) constitutes the level-1 of the hierarchical model in Figure 3. This top level modelsthe individual observed light curve D ib separately. Our joint modeling comes in by introduc-ing the PLR model component in the next subsection.R EMARK

1. The Generalized Lomb-Scargle method (Zechmeister and Kürster, 2009)and its multi-band version only have the average magnitude and the sinusoidal componentin the decomposition (10). The additional term h ib ( · ) is introduced by He et al. (2016) andYuan et al. (2018) to account for the additional variability of Mira light curves. All of thesemethods model each Mira separately, without the hierarchical information introduced in thefollowing subsections.2.2. The Period-Luminosity relation.

For a speciﬁc band b ∈ B , the average magnitude m ib and the period p i follow the PLRfor all i ∈ I . As in Figure 1b, for each band, the PLR can be ﬁtted by a quadratic polynomialin log ( p ) . Since our model is parameterized by frequency, the following equivalent PLR(noting log p i = − log f i ) is employed,(11) m ib = α b log f i + α b log f i + α b + e ib = α Tb d i + e ib . In the above, d i = (1 , log ( f i ) , log ( f i )) T is the PLR basis vector for the i -th Mira, α b = ( α b , α b , α b ) T is the PLR coefﬁcient for the b -th band, and the PLR residual e ib is IMULTANEOUS INFERENCE OF PERIODS AND PLRS . . . I [ m ag ] llllllllllllllllllllllllllllllllllll l lllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllll lllllllllll lllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll . . . I [ m ag ] llllllllllllllllllllllllllllllllllll l lllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllll lllllllllll lllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll . . . I [ m ag ] llllllllllllllllllllllllllllllllllll l lllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllll lllllllllll lllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllllllllllllllllllll llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Fig 5: An example of Mira light curve decomposition. The solid curve in the upper panel isthe ﬁtted light curve. The solid curve in the middle panel is the exact periodic component.The lower panel is the non-periodic function.assumed to follow N (0 , /γ b ) with precision γ b . In summary, given f i , α b , γ b , the averagemagnitude is dictated by m ib | f i , α b , γ b ∼ N ( α Tb d i , /γ b ) . In addition, we place on f i a uni-form prior over a compact interval [ f min , f max ] , i.e., f i ∼ Unif( f min , f max ) . The compactinterval [ f min , f max ] can be selected as the search range of interest. For most Miras, theirfrequency lies inside the interval [1 / , / . These constitutes parts of the level-2 ofour model in Figure 3.We remark that while the level-1 of the model treats each light curveseparately, the level-2 dictates the common a-priori information shared by individual Miraparameters.In our proposed model, the coefﬁcient α b and the precision γ b are also unknown param-eters with conjugate prior α b ∼ N ( ¯ α b , (1 / ¯ δ ) I ) , γ b ∼ Gamma (¯ γ b ¯ r, ¯ r ) , where ¯ α b , ¯ γ b , ¯ δ , ¯ r are ﬁxed hyper-parameters. These become Eqn. (7)–(8), and part of level-3 of our modelin Figure 3. Notice that the level-3 parameters α b , γ b and Ω (to be introduced in the nextsubsection) govern the probability distribution of the Mira samples.R EMARK

2. Eqn. (3) plays a key role in our hierarchical model. It governs the relationbetween m ib and f i , and discourages implausible parameter values. This model component isthe key to reduce the aliasing effect and improves the period estimation accuracy. Moreover,a direct product of our proposed method is the PLR coefﬁcients α b ’s, which is one of theimportant pursuits for modern astronomical surveys.2.3. The periodic component.

For the i -th star, let β i = ( β Ti , β Ti , · · · , β TiB ) T be the col-lection of periodic coefﬁcients across all bands. This vector has multivariate Gaussian prior β i | Ω ∼ N ( , Ω − ) in Eqn. (4) of Figure 3. The precision matrix Ω is also unknown with aWishart prior Ω ∼ Wishart ( ¯ Ω / ¯ n, ¯ n ) in Eqn. (9), for ﬁxed hyper-parameters ¯ Ω and ¯ n .The multivariate Gaussian prior (4) on β i empowers us the ability to modeling the corre-lation between the periodic components across all bands. Long et al. (2016) and Yuan et al. (2018) have noticed both periodic amplitude and phase are linearly correlated across bands.The prior distribution as given in (4) on β i allows us to take this correlation into account.Nevertheless, the assumption imposed by Eqn. (4) is weaker than the explicit regularizationof Long et al. (2016).2.4. The non-periodic component.

Recall the decomposition (10) has an additional term h ib ( · ) accounting for the non-periodic variation. For each j ∈ J ib , we denote u ibj = f i · t ibj as the phase, which is the frequency f i multiplied by the time points t ibj . Note h ib ( · ) isa function of u ibj ’s. For the a speciﬁc band b ∈ B , the random function h ib ( · ) is modeledby a Gaussian process (GP, Rasmussen and Williams, 2005) with zero mean and a kernelfunction k b ( · , · ) . Under the GP assumption, the vector h ib = ( h ib ( u ib ) , · · · , h ib ( u ibn ib )) T follows a multivariate Gaussian distribution with zero mean and covariance matrix H ib = (cid:2) k b ( u ibp , u ibq ) (cid:3) n ib p,q =1 . The Eqn. (5) in Figure 3 is exactly this GP component. Note the b -th band data of all Miras share a common kernel k b ( · , · ) . In this work, we employed thefollowing kernel function, k b ( u, u (cid:48) ) = τ b · exp (cid:8) − ( u − u (cid:48) ) /τ b (cid:9) + τ b · I( u = u (cid:48) ) , where I( · ) is the indicator function, and τ b = ( τ b , τ b , τ b ) T is the kernel parameter for the b -th band. The ﬁrst term of the above kernel is the squared exponential kernel with varianceparameter τ b and length-scale parameter τ b . The second term is a nugget term accountingfor additional variability. Our adoption of the squared exponential kernel follows a commonpractice in the kernel machine ﬁeld, though other kernel functions can also be used. TheGaussian process curve ﬁtting is generally not sensitive to the choice of kernel.In this formulation, for each band, the non-period components of light curves of all Mirasare assumed to be realizations of the same Gaussian process. This allows Miras with sparseobservations or relatively low data quality to “borrow strength" from one another when ﬁttingindividual light curves. Letting all Miras share the same set of kernel parameters has the ad-ditional advantage of avoiding the intensive optimization of kernel parameters for individuallight curves, and thereby greatly speeding up computation.R EMARK

3. In the work of He et al. (2016) and Yuan et al. (2018), the term h ib ( · ) is chosen as a function of time t , and the kernel parameter is distinct for different bandsand different Miras. In contrast, our h ib ( · ) is a function of phase u = f i · t , and the kernelparameter is shared by all Miras in a given band. Our choice of using h ib ( f i · t ) makes thefunction argument of the non-periodic component consistent with the periodic component b ( f i · t ) T β ib as shown in (10). This choice is supported by two numerical results presentedin Section S.3 of the Supplementary Material (He et al., 2020).2.5. The joint likelihood.

All of the above leads to our model listed in Figure 3, whichhas a few unknown parameters. In the rest of the paper, to simplify the notation, the PLRcoefﬁcients for all bands are collectively denoted as α = { α , · · · , α B } , and all precision pa-rameters as γ = ( γ , · · · , γ B ) T . For the i -th Mira, its average magnitudes across all bands aredenoted by m i = ( m i , · · · , m iB ) T . For the whole collection of Miras, all their average mag-nitude parameters are denoted by m = { m , · · · , m N } . In addition, we set f = { f , · · · , f N } for the frequency parameters of all Miras and β = { β , · · · , β N } for all periodic coefﬁcients.Given the hierarchical model in Figure 3, Bayesian inference requires computing the pos-terior of parameters, which is proportional to the joint distribution of all parameters. For thedata D i of the i -th Mira, its related joint likelihood (of Eqn. (1) through (6)) is B (cid:89) b =1 n ib (cid:89) j =1 exp (cid:110) − σ ibj ( y ibj − m ib − b Tibj β ib − h ibj ) (cid:111) × B (cid:89) b =1 | H ib | − / exp (cid:110) − h Tib H − ib h ib (cid:111) IMULTANEOUS INFERENCE OF PERIODS AND PLRS × | Ω | / exp (cid:110) − β Ti Ω β i (cid:111) × B (cid:89) b =1 γ / b exp (cid:110) − γ b m ib − d Ti α b ) (cid:111) . The Gaussian process component h ib can be integrated out. In fact, given m ib , β ib , f i , the vec-tor y ib = ( y ib , · · · , y ibn ib ) T follows a Gaussian distribution with mean vector m ib + B ib β ib and covariance matrix Σ ib = H ib + diag ( σ ib , · · · , σ ibn ib ) , where ∈ R n ib is a vector of onesand B ib = (cid:0) b ib , · · · , b ibn ib (cid:1) T ∈ R n ib × . Consequently, the likelihood for the observation ofthe i -th Mira simpliﬁes to L i = B (cid:89) b =1 | Σ ib | − / exp (cid:110) −

12 ( y ib − m ib − B ib β ib ) T Σ − ib ( y ib − m ib − B ib β ib ) (cid:111) × | Ω | / exp (cid:110) − β Ti Ω β i (cid:111) × B (cid:89) b =1 γ / b exp (cid:110) − γ b m ib − d Ti α b ) (cid:111) . (12)Given the whole dataset D , the overall joint distribution (of Eqn. (1) through (9)) is p ( m , β , f , Ω , γ , α , D ) = (cid:16) N (cid:89) i =1 L i (cid:17) × exp (cid:110) − (cid:104) ¯ n ¯ Ω − , Ω (cid:105) + ¯ n − B −

12 log det Ω (cid:111) × B (cid:89) b =1 exp (cid:110) − ¯ rγ b + (¯ r ¯ γ b −

1) log γ b − ¯ δ (cid:107) α b − ¯ α b (cid:107) (cid:111) . (13)The right hand side is the product of the likelihoods for all observations and the priors of allrelated parameters. We know the posterior p ( m , β , f , Ω , γ , α | D ) is proportional to the abovedistribution up to an unknown constant. Traditionally, inference for this hierarchical Bayesianmodel is carried out by the Monte Carlo Markov Chain (MCMC). However, it is well knownthat the traditional MCMC lacks scalability to a large data set. To make it computationallyfeasible to apply our model to real Mira light-curve datasets, we will develop an efﬁcientalgorithm based on the stochastic variational inference in the next section.R EMARK

4. Before ﬁtting the model, we need to specify the hyper-parameters ¯ α b , ¯ δ , ¯ γ b , ¯ r , ¯ Ω , ¯ n . We can set relative small values for ¯ r , ¯ n , and large values for ¯ δ . This makes thepriors less informative. The other hyper-parameters ( ¯ Ω and ¯ δ ) can be easily speciﬁed basedon information from existing high-quality astronomical surveys. This is applicable to the ﬁrstand second order parameters ¯ α b , ¯ α b of PLRs, as the shape of PLRs is the same for Mirasin different galaxies. The only exception is the intercept terms ¯ α b for the PLR prior, as thedistance for a new dataset is unknown. Fortunately, for a new survey, we only need to set aroughly reliable value of ¯ α b . For this purpose, we can apply multi-band generalized Lomb-Scargle (MGLS) method to a small subset of the data, and then use a robust method (e.g.minimal absolute deviation) to get a ﬁtted value of ¯ α b .It is known in the literature that Bayesian hierarchical model may perform poorly whenvague priors are used. Gelman (2006) observed that choice of “non-informative" prior dis-tributions on hierarchical variance parameters may make a big impact on the inference, andimproper choice may lead to inaccurate posterior distributions. In our numerical studies, bysetting non-extreme values ¯ r = 1 , ¯ n = 1 for the variance parameters, our prior distributionsdo not have the adverse shrinkage effect pointed out in Gelman (2006) and our model showsaccurate estimation results. Moreover, we have sufﬁcient data so that the posterior distribu-tion is dominated by the information provided by the data. R EMARK

5. We also need to specify a value for the kernel parameter τ . This is achievedby applying the MGLS method to a small subset of the dataset. With the estimated frequencyfrom MGLS, we ﬁt sinusoidal model to the light curves and obtain the sinusoidal residuals.After that, we ﬁt a Gaussian process to the residual and obtain maximum likelihood estimatesof the kernel parameters.

3. Main Algorithm.

Variational inference (Jordan et al., 1999; Wainwright et al.,2008) aims at approximating the true posterior by ﬁnding an optimal density function q ( · ) among a speciﬁc distribution family Q . This q ( · ) is then employed for subsequent infer-ence. To ﬁnd the optimal approximation, the commonly used strategy is to minimize theKullback-Leibler divergence (KL divergence) between q ( · ) and the actual posterior distribu-tion p ( m , β , f , Ω , γ , α | D ) . In our context, we are trying to solve(14) min q ( · ) ∈Q (cid:90) q ( α , γ , Ω , f , m , β ) log q ( α , γ , Ω , f , m , β ) p ( m , β , f , Ω , γ , α | D ) d m d β d f d Ω d γ d α for q ( · ) among a speciﬁc distribution family Q . Equivalently, the variational inference aimsat maximizing the evidence lower bound (ELBO) L ( D ) := E q (cid:8) log p ( m , β , f , Ω , γ , α , D ) q ( m , β , f , Ω , γ , α ) (cid:9) , wherethe expectation is taken with respect to q ( m , β , f , Ω , γ , α ) . By Jensen’s inequality, it holdsthat log L ( D ) = log (cid:104) (cid:90) p ( m , β , f , Ω , γ , α , D ) d m d β d f d Ω d γ d α (cid:105) ≥ (cid:90) q ( m , β , f , Ω , γ , α ) log p ( m , β , f , Ω , γ , α , D ) q ( m , β , f , Ω , γ , α ) d m d β d f d Ω d γ d α = L ( D ) . The term log L ( D ) is the marginal likelihood for the dataset, and it is interpreted as the modelevidence. The above implies the ELBO L ( D ) acts as a lower bound for log L ( D ) .The variational inference problem (14) requires specifying the distribution family Q . Thisfamily should be large enough to approximate the true posterior accurately, while not toocomplicated for ease of computation. The mean ﬁeld family (Blei et al., 2017) is a popularchoice, for which the distribution q ( · ) factors as a product of distributions for each parameter.In this work, we build a structured mean-ﬁeld variational family Q . The densities inside Q have the following form, q ( m , β , f , Ω , γ , α ) = N (cid:89) i =1 q ( m i , β i , f i ) × q ( α ) q ( γ ) × q ( Ω )= N (cid:89) i =1 [ q ( m i , β i | f i ) q ( f i )] × B (cid:89) b =1 [ q ( α b ) q ( γ b )] × q ( Ω ) . (15)The distribution speciﬁcation for each factor of Eqn. (15) is listed in Table 1. Most of thembelong to the exponential family. Section S.1 in the Supplementary Material (He et al., 2020)provides a review on these exponential family distributions and our employed notions fortheir canonical parameters.For most parameters (e.g. α b , γ b , Ω ) of our model, their full conditional distributions be-long to the exponential distribution family. Their variational distributions are assumed to betheir conjugate counterparts. For example, from the joint distribution (13), we can ﬁnd thefull conditional for γ b is a Gamma distribution. Correspondingly, the variational distribution q ( γ b ) is assumed to be a Gamma distribution.The variational distribution q ( m i , β i , f i ) = q ( m i , β i | f i ) q ( f i ) of each Mira is constructedwith special consideration. Recall the actual posterior is multimodal in f i . To facilitate uncer-tainty quantiﬁcation of the estimated frequency, the variational distribution q ( f i ) is allowed to IMULTANEOUS INFERENCE OF PERIODS AND PLRS T ABLE Notation for the mean ﬁeld distributions with their canonical parameters

Local GlobalFactor Distribution Parameter Factor Distribution Parameter q ( m i , β i | f i ) Gaussian η θi = ( η θ i , η θ i ) q ( α b ) Gaussian η αb = ( η α b , η α b ) q ( f i ) Free Form None q ( γ b ) Gamma η γb = ( η γ b , η γ b ) q ( Ω ) Wishart η Ω = ( η Ω , η Ω ) be any continuous density on the compact interval [ f min , f max ] . Besides, for convenience, thedistribution q ( m i , β i | f i ) is assummed to be Gaussian and depends on the frequency f i . Thisconditional dependence is necessary, because the amplitude and phase of the ﬁtted sinusoidalcurve vary with f i .After specifying Q , the model inference is completely framed as an optimization problem.Our target is maximizing the evidence lower bound, i.e. max q ( · ) ∈Q L ( D ) , with Q determinedin (15). The optimal solution q ( m , β , f , Ω , γ , α ) will then be used for model inference, inplace of the actual posterior p ( m , β , f , Ω , γ , α | D ) . This optimization is typically carried outby the coordinate ascent algorithm (Blei et al., 2017), where one factor in (15) gets updatedwith the others ﬁxed during each round of iteration. Though this updating strategy improvescomputational speed over MCMC in lots of cases, it is still not fast enough for our problemat hand. For efﬁcient computation, we will exploit our hierarchical model structure under theframework of stochastic variational inference (SVI, Hoffman et al., 2013). According to theSVI scheme, the unknown parameters can be divided into two groups: local parameters andglobal parameters. The local parameters are speciﬁc to each Mira. For the i -th Mira, its ownlocal parameters include the average magnitude m i , the periodic coefﬁcient β i , and its fre-quency f i . On the other hand, the global parameters govern all Mira samples. These globalparameters include the PLR coefﬁcients α , the PLR residual precision vector γ , and theprecision matrix Ω for the sinusoidal coefﬁcients. By exploiting the model structure, SVI be-comes much more efﬁcient than the classical coordinate ascent algorithm. In fact, conditionalon the global parameters, the local parameters of each Mira can be updated independentlyof each other. On the other hand, the global parameters can be updated based on mini-batchsamples. The details to update local and global parameters will be provided in Section 3.1and Section 3.2, respectively.3.1. Update local parameters.

In this subsection, we derive analytic formulae to updatethe local variational distribution q ( m i , β i , f i ) = q ( m i , β i | f i ) q ( f i ) for each i ∈ I , while theglobal q ( α ) , q ( Ω ) and q ( γ ) are ﬁxed. For one speciﬁc i ∈ I , we need to maximize the ELBOwith respect to q ( m i , β i , f i ) . Note the ELBO can be factored as L ( D ) = E q (cid:110) log p ( m , β , f , Ω , γ , α , D ) q ( m , β , f , Ω , γ , α ) (cid:111) = N (cid:88) j =1 E q (cid:110) log p ( m j , β j , f j | Ω , γ , α , D j ) q ( m j , β j , f j ) (cid:111) + E q (cid:110) log (cid:81) Bb =1 [ p ( α b ) p ( γ b )] × p ( Ω ) (cid:81) Bb =1 [ q ( α b ) q ( γ b )] × q ( Ω ) (cid:111) = E q (cid:110) log p ( m i , β i , f i | Ω , γ , α , D i ) q ( m i , β i , f i ) (cid:111) + C . (16)In the above, the second equation uses the conditional independence between observations,and the last term C is a constant irrelevant to the update of q ( m i , β i , f i ) . According to thelast line (16), we only need to consider the data D i of the i -th Mira. This means given theglobal q ( α ) , q ( Ω ) and q ( γ ) , the local variational distribution q ( m i , β i , f i ) can be updatedindependently of the other Miras. The average magnitude m i and the sinusoidal coefﬁcient β i can be conveniently up-dated as a whole. For this purpose, we introduce a few notations. Let θ ib = ( m ib , β Tib ) T be the parameter for the b -th band. Then, we arrange θ ib ’s of all bands into a vectoras θ i = ( θ Ti , · · · , θ TiB ) T ∈ R B . Correspondingly, two index sets can be deﬁned as K = { , , , · · · , B − } and its complementary L = { , , · · · , B }\K . The index set K recordsthe locations of m ib ’s in the vector θ i , while L corresponds to the locations of β ib ’s inside θ i .Recall our target of updating the local q ( m i , β i , f i ) = q ( m i , β i | f i ) q ( f i ) with ﬁxed q ( α ) , q ( Ω ) and q ( γ ) . With the deﬁnition of θ i , it is equivalent to state the update interms of q ( θ i , f i ) = q ( θ i | f i ) q ( f i ) . According to our distribution speciﬁcation in Table 1, q ( f i ) is allowed to be any continuous density on a compact interval [ f min , f max ] , while q ( θ i | f i ) ∝ exp (cid:110)(cid:10) η θ i , θ i θ Ti (cid:11) + (cid:10) η θ i , θ i (cid:11)(cid:111) is assumed to be Gaussian with canonical param-eters η θ i ∈ R B × B and η θ i ∈ R B . Note the value of η θ i and η θ i depends on the frequency f i . With these notations, our goal boils down to ﬁnding the optimal q ( f i ) , and for each f i computing the optimal η θ i , η θ i . The optimal solutions should maximize the ELBO (16),which is now equivalently expressed as L ( D ) = E q (cid:110) log p ( θ i , f i | Ω , γ , α , D i ) q ( θ i , f i ) (cid:111) + C . (17)This equivalent ELBO involves the conditional likelihood p ( θ i , f i | Ω , γ , α , D i ) , which isproportional to the likelihood in (12). To convert the expression in a compact form, weneed to deﬁne a few more notations. According to (3) and (4) of our model, the condi-tional distribution of θ i given ( f i , α , γ , Ω ) is N ( θ , Θ − ) , with its mean vector θ andprecision matrix Θ deﬁned as follows. The sub-vectors of θ indexed by K and L are θ [ K ] = ( d Ti α , · · · , d Ti α B ) T and θ [ L ] = , respectively. In fact, θ [ K ] contains the a-priori mean of m i in (3), while θ [ L ] is the a-priori mean of β i in (4). Similarly, the sub-matrices of Θ indexed by K and L are Θ [ K , K ] = diag( γ , · · · , γ B ) , Θ [ L , L ] = Ω , and Θ [ K , L ] = (cid:0) Θ [ L , K ] (cid:1) T = , respectively. In the above, for example, Θ [ K , L ] is the submatrix of Θ with rows indexed by K , and columns indexed by L . All of the above is simply writing (3) and (4) of all bands in acompact form, i.e., θ i | ( f i , α , γ , Ω ) ∼ N ( θ , Θ − ) . In addition, for the b -th band of the i -thMira, deﬁne a basis matrix C ib = ( , B ib ) ∈ R n ib × . Then, by rewriting (12) with these newnotations, it follows that log p ( θ i , f i | Ω , γ , α , D i ) = − B (cid:88) b =1 ( y ib − C ib θ ib ) T Σ − ib ( y ib − C ib θ ib ) −

12 log det Σ ib −

12 ( θ i − θ ) T Θ ( θ i − θ ) + C (cid:48) , with some irrelevant constants C (cid:48) . The above expression will be plugged into (17) to ﬁndthe optimal solution. Our updating formulae for the local parameter is summarized by thefollowing theorem, whose proof is provided in Section S.2 of the Supplementary Material(He et al., 2020).T HEOREM D i of the i -th Mira and q ( α ) , q ( γ ) , and q ( Ω ) are ﬁxed.Then, for each f i , the optimal q ( θ i | f i ) maximizing ELBO L ( D ) has canonical parameters(18) η θ i = E q ( γ ) E q ( Ω ) ( ξ θ i ) , and η θ i = E q ( α ) ( ξ θ i ) . IMULTANEOUS INFERENCE OF PERIODS AND PLRS Algorithm 1

Updating Local Parameters

Input : f min , f max , ∆ f . The dataset D i . Output : q ( θ i , f i ) .1: f i ← f min while f i ≤ f max do

3: Update q ( θ i | f i ) by (18).4: Compute q ( f i ) by (19).5: f i ← f i + ∆ f end while

7: Normalize q ( f i ) to become a proper density. In the above, ξ θ i = − × diag (cid:0) C Ti Σ − i C i , · · · , C TiB Σ − iB C iB (cid:1) − Θ and ξ θ i = (cid:0) ( C Ti Σ − i y i ) T , · · · , ( C TiB Σ − iB y iB ) T (cid:1) T + Θ θ . Moreover, the optimal q ( f i ) is given by(19) q ( f i ) ∝ exp (cid:110) − g ( f i ) −

12 log det( − η θ i ) (cid:111) , with g ( f i ) = 12 ( η θ i ) T ( η θ i ) − η θ i + (cid:10) E q ( Ω ) E q ( γ ) Θ , E q ( α ) θ θ T (cid:11) + B (cid:88) b =1 (cid:2) y Tib Σ − ib y ib + log det Σ ib (cid:3) . This theorem develops analytical formulae to update q ( θ i , f i ) for the i -th Mira. At eachﬁxed f i , the canonical parameter for q ( θ i | f i ) is directly available via (18). Notice the basismatrix C ib , the covariance matrix Σ ib as well as the a-priori mean vector θ all implicitlydepend on the frequency f i . Besides, the optimal density q ( f i ) is computed by (19). In ourimplementation, q ( f i ) is discretized and evaluated on a dense grid of [ f min , f max ] . The grid isgaped by a small value of ∆ f . After that q ( f i ) is normalized to become a proper density. Theﬁnal algorithm for updating q ( θ i , f i ) = q ( θ i | f i ) q ( f i ) is summarized in Algorithm 1. In ournumerical studies, the computation is performed on equally spaced points on the interval [1 / , / .3.2. Update global parameters.

Now, we address updating the global q ( α ) , q ( Ω ) and q ( γ ) , with ﬁxed local variational distributions q ( m i , β i , f i ) for all i ∈ I . We ﬁrstly developthe update of q ( Ω ) in detail. The way to update q ( α ) and q ( γ ) is derived similarly.When q ( α ) , q ( γ ) and all q ( m i , β i , f i ) ’s are ﬁxed, optimizing the ELBO with respect to q ( Ω ) is equivalent to maximizing E q log (cid:104) p ( Ω | α , γ , β , m , f , D ) q ( Ω ) (cid:105) . This is because L ( D ) = E q (cid:110) log p ( m , β , f , Ω , γ , α , D ) q ( m , β , f , Ω , γ , α ) (cid:111) = E q log (cid:20) p ( Ω | α , γ , β , m , f , D ) q ( Ω ) (cid:21) + E q log (cid:20) p ( α , γ , β , m , f , D ) q ( m , β , f , Ω , γ , α ) /q ( Ω ) (cid:21) = E q log (cid:20) p ( Ω | α , γ , β , m , f , D ) q ( Ω ) (cid:21) + C (cid:48)(cid:48) , (20)where C (cid:48)(cid:48) is some constant invariant to the change of q ( Ω ) . In the above, the expectation E q ( · ) is taken with respect to q ∈ Q in (15) at the current iteration. From the joint distribu- tion (13), we can ﬁnd the full conditional for Ω as p ( Ω | α , γ , β , m , f , D ) ∝ exp (cid:110) − (cid:68) ¯ n ¯ Ω − + N (cid:88) i =1 β i β Ti , Ω (cid:69) + N + ¯ n − B −

12 log | Ω | (cid:111) . This full conditional is a Wishart distribution with canonical parameter ξ Ω = − (¯ n ¯ Ω − + (cid:80) Ni =1 β i β Ti ) / and ξ Ω = ( N + ¯ n − B − / . In addition, note q ( Ω ) ∝ exp (cid:8) (cid:104) η Ω , Ω (cid:105) + η Ω log | Ω | (cid:9) is also Wishart with canonical parameters η Ω and η Ω . It can be veriﬁed theoptimal q ( Ω ) maximizing (20) has canonical parameters(21) η Ω = E q ( ξ Ω ) = − (cid:104) ¯ n ¯ Ω − + N (cid:88) i =1 E q ( m i , β i ,f i ) (cid:0) β i β Ti (cid:1) (cid:105) , and η Ω = ξ Ω . Although it is straightforward to compute η Ω = ξ Ω = ( N + ¯ n − B − / ,the involved computation in (21) is expensive for a large dataset. In particular, the expectationwith respect to q ( m i , β i , f i ) does not have a close form expression. It can only be carried outby costly numerical integration. Moreover, this has to be done for all of the N observationsduring each algorithm iteration, which is especially troublesome when N is large.Fortunately, the intensive computation can be avoided by SVI (Hoffman et al., 2013),which only requires an unbiased estimation of natural gradient. For the canonical parameter η Ω of q ( Ω ) , its natural gradient of the ELBO is E q ( ξ Ω ) − η Ω . It is the difference be-tween the expected parameters of the full conditional and the parameters of the current q ( Ω ) .The gradient ascent algorithm would update q ( Ω ) by η Ω ← η Ω + κ t (cid:2) E q ( ξ Ω ) − η Ω (cid:3) with some step size κ t at the t -th iteration. In the context of stochastic gradient algorithm(Ghadimi and Lan, 2013), it is plausible to employ an unbiased estimation of the naturalgradient via sampling. For this purpose, at the t -th iteration, we sample a subset of indices (cid:101) I t = { i , i , · · · , i I } from I = { , · · · , N } without replacement and with equal probability.The cardinality of the set is | (cid:101) I t | = I for some ﬁxed I . This corresponds to the mini-batchstrategy, and the batch size I is usually set to be , , , , etc. Given the set (cid:101) I t and for each j ∈ (cid:101) I t , we further sample a frequency value ˜ f j from the current variational distribution q ( f j ) .Then, an unbiased estimate of the natural gradient E q ( ξ Ω ) − η Ω is − (cid:104) ¯ n ¯ Ω − + NI (cid:88) j ∈ (cid:101) I t E q ( m j , β j | ˜ f j ) (cid:16) β j β Tj (cid:17)(cid:105) − η Ω . Based on the unbiased natural gradient estimate, the update for q ( Ω ) becomes η Ω ← η Ω + κ t (cid:16) − (cid:104) ¯ n ¯ Ω − + NI (cid:88) j ∈ (cid:101) I t E q ( m j , β j | ˜ f j ) (cid:16) β j β Tj (cid:17)(cid:105) − η Ω (cid:17) , (22) η Ω ← ( N + ¯ n − B − / . (23)Notice we only need to compute with I samples in the udpate (22). In addition, the expecta-tion with respect to q ( m j , β j | ˜ f j ) is straightforward because it is a Gaussian distribution. Theoptimal value for η Ω is directly provided in (23), and η Ω keeps ﬁxed at this value during theiteration. These features make the algorithm scale easily to a large collection of observations.Under the same principle, we can ﬁnd the full conditional for γ b as q ( γ b | α , Ω , β , m , f , D ) ∝ exp (cid:8) ξ γ b γ b + ξ γ b log γ b (cid:9) , which is gamma distribution with canonical parameter ξ γ b = − ¯ r − (cid:80) Ni =1 ( m ib − d ( f i ) T α b ) / and ξ γ b = N/ γ b ¯ r − . Note d ( f ) = (1 , log ( f ) , log ( f )) T IMULTANEOUS INFERENCE OF PERIODS AND PLRS Algorithm 2

Main Algorithm for t = 1 , , · · · do

2: Set step size κ t = ( c + t ) − c .3: Sample (cid:101) I t = { i , i , · · · , i I } from I = { , , · · · , N } without replacement.4: for j ∈ (cid:101) I t do

5: Update local q ( θ j , f j ) by Algorithm 1.6: Sample ˜ f j according to q ( f j ) .7: end for

8: Update q ( Ω ) by (22) and (23).9: for b = 1 , , · · · , B do

10: Update q ( γ b ) by (24) and (25).11: Update q ( α b ) by (26) and (27).12: end for end for is the PLR basis. With the sampled subset of data and the sampled frequencies, the SVIalgorithm updates q ( γ b ) by ξ γ b ← ξ γ b + κ t (cid:16) − ¯ r − NI (cid:88) j ∈ (cid:101) I t E q ( m j , β j | ˜ f j ) E q ( α b ) (cid:104) m jb − d ( ˜ f j ) T α b (cid:105) / (cid:17) , (24) ξ γ b ← N/ γ b ¯ r − . (25)Similarly, the full conditional for α b is q ( α b | γ , Ω , β , m , f , D ) ∝ exp (cid:110) (cid:10) η α b , α b α b (cid:11) + (cid:10) η α b , α b (cid:11) (cid:111) . This full conditional is multivariate Gaussian with canonical parameter η α b = − ¯ δ I − γ b (cid:80) Ni =1 d ( f i ) d T ( f i ) , and η α b = ¯ δ ¯ α b + γ b (cid:80) Ni =1 m ib d ( f i ) . The SVI algorithm up-dates q ( α b ) by η α b ← η α b + κ t (cid:16) − ¯ δ I − γ b × NI (cid:88) j ∈ (cid:101) I t d ( ˜ f j ) d ( ˜ f j ) T (cid:17) , (26) η α b ← η α b + κ t (cid:16) ¯ δ ¯ α b + γ b × NI (cid:88) j ∈ (cid:101) I t E q ( m j , β j | ˜ f j ) ( m jb ) × d ( ˜ f j ) (cid:17) . (27)The ﬁnal algorithm is summarized in Algorithm 2. The algorithm chooses the step size κ t = ( c + t ) − c . At the start of each iteration, a small subset of samples is selected and theirlocal parameters get updated in Line 3–7 of Algorithm 2. With the sampled and updated localvariational distribution, the global parameters get updated by (22)–(27). In our numericalstudies, we set c ∈ [1000 , and c ∈ [0 . , . We also take mini-batch size I = 8 , andﬁnd that 1000 iterations are enough for convergence.

4. Simulation.

We have conducted two simulation experiments to compare the perfor-mance of our method (SVI) with some existing methods. These include the generalizedLomb-Scargle method (GLS, Zechmeister and Kürster, 2009) and its multi-band version(MGLS, Vanderplas and Ivezic, 2015), the single-band semi-parametric model (SP, He et al.,2016) and its multi-band extension (MSP, Yuan et al., 2018).The ﬁrst simulation uses the same simulated dataset used in Yuan et al. (2018), where thelight curves for 5,000 O-rich Miras were simulated in the I , J , H , and K s bands. The cadenceand sampling quality of the light curves closely matched their actual survey of M33. Giventhe observation time points, the signal light curves were generated using the Mira templateof Yuan et al. (2017). This template is trained from the well-observed OGLE LMC data set. T ABLE Performance for Simulation I

GLS MGLS SP MSP SVIADE ( × − ) 5.98 3.97 5.70 2.18 RR (%) 72.00 82.44 78.58 91.26 T ABLE SVI Conﬁdence Set

CoverageNomimal (%) 90 95 99 99.5Actual (%) 78.5 86.2 95.3 97.8

After this, the magnitude of the generated light curves were shifted by +6 . mag to accountfor the relative distance between the LMC and M33. Finally, the light curve magnitude valueswere contaminated by realistic noise. Four simulated multi-band Mira light curves are shownin the left column of Figure 6.The ﬁve methods GLS, MGSL, SP, MSP, and SVI are applied to this dataset. The single-band models (GLS and SP) are applied to the I -band dataset as it has the most data points.For our proposed method, we use maximum a posterior (MAP) as a point estimate for thefrequency. In other words, for the i -th star, its estimated frequency is ˆ f i = arg max f i q ( f i ) with the ﬁnal estimated variational distribution q ( f i ) .The accuracy of period estimation is measured by two metrics: recovery rate and absolutedeviation error. The recovery rate (RR) is the percentage of light curves with accuratelyestimated frequency. Suppose for the i -th simulated Mira, f i is its true frequency and ˆ f i is theestimated frequency. The estimation is considered to be accurate if their absolute differenceis less than a threshold value, i.e., if | f i − ˆ f i | ≤ λ . We choose λ = 2 . × − in this work,which is the average half distance to the sidelobes in the frequency spectra (He et al., 2016).Then, the recovery rate (RR) is computed as RR = N (cid:80) Ni =1 I( | f i − ˆ f i | ≤ λ ) , where I( · ) is the indicator function. The other employed metric is the absolute deviation error (ADE),computed by ADE = N (cid:80) Ni =1 | f i − ˆ f i | .For all methods, their recovery rate and ADE on this dataset are reported in Table 2. Noticethe single-band GLS method has worst performance in both metrics. By exploiting more ob-servational information, the multi-band methods (MGLS and MSP) increase the RR by 10%and reduces ADE by more than 40%, compared with their single-band counterparts (GLSand SP). It is also notable that SP (or MSP) outperforms GLS (or MGLS resp.). Recall theSP and MSP have modeled the sinusoidal residuals by Gaussian process. The improvementdemonstrates the effectiveness of modeling the quasi-periodic signal of Mira light curves.Lastly, we note our SVI method has the best performance. The proposed SVI has remarkablyestimated almost all samples accurately. It has a recovery rate of . and ADE accuracyof . × − .In the right column of Figure 6, we plot the ﬁtted variational distribution q ( f i ) for eachsimulated Mira light curve shown in the left column of the same row. The red reversed trian-gles mark the estimated frequencies ˆ f i , and the blue triangles mark the true frequencies. Theﬁrst two rows demonstrate the cases when the true frequencies are correctly identiﬁed. Thelast two rows show the cases when the MAP ˆ f i fails to estimate the true frequency. However,in these cases when the signal is weak, our proposed method still recovers the multi-modalposterior of the frequency, and the true frequency lies around one of the posterior modes.The plots of q ( f i ) illustrate the power of this method to quantify the frequency estimationuncertainty. To further assess the uncertainty quantiﬁcation from q ( f i ) , we construct a − α level conﬁdence set for each Mira in the simulated dataset. The conﬁdence set C i = { f i : q ( f i ) > c i } is constructed with some c i such that (cid:82) C i q ( f i ) d f i = 1 − α . Note each C i maybe composed of a few short intervals due to the multimodal nature of q ( f i ) . For differentvalue of α , the conﬁdence set is computed, and the actual coverage rate is evaluated overthe simulated dataset. The empirical results are reported in Table 3. For example, when thenominal coverage rate is (with α = 5% ), the actual coverage rate is only . . Itindicates the conﬁdence sets are not wide enough and have an under-coverage issue. This IMULTANEOUS INFERENCE OF PERIODS AND PLRS Date [day] M agn i t ude lllllllllllllllllllllllllll llllllllllllllllllll llllllllllllll llllllllll lll IJHK s Frequency [day - ] q ( f ) True freqEstimated freq

Date [day] M agn i t ude llllllllllllllllllllllllll lllllllllllllllll lllllllllllllllllll llllllllllllll llllllllll lll Frequency [day - ] q ( f ) . . . . Date [day] M agn i t ude llllllll ll lllllll l Frequency [day - ] q ( f ) Date [day] M agn i t ude llllllllllllllllllllllllll lllllllllllllllll lllllllll llllllll Frequency [day - ] q ( f ) Fig 6: In the left column are four simulated Mira light curves from Simulation I. In the rightcolumn are the estimated variational distributions q ( f i ) for the corresponding Mira in thesame row.problem is inherent for mean ﬁled variational inference which underestimates estimationuncertainty (Wang and Blei, 2019).Taking a closer look of the coverage rate issue, we found that, in most cases when the con-ﬁdence set misses the true frequency, the true frequency resides very close to the conﬁdenceset; it is rare that the conﬁdence set completely misses the true frequency. More precisely, let C i be the 99% conﬁdence set for the i sample. Let λ = 2 . × − be the threshold to deter-mine the recovery rate (RR) as used in the paper. We say that the set C i “entirely misses” the − . . . . . log (Period) D J [ m ag ] (a) − . . . . . log (Period) D H [ m ag ] (b) − . − . . . . . log (Period) D K s [ m ag ] (c) Fig 7: The horizontal axis is the logarithm of SVI period, and the vertical axis is the differenceof average magnitude between the results of SVI and Yuan et al. (2018). Left, center and rightpanels show J − , H and K s magnitudes, respectively. The horizontal dashed lines indicatethe average difference between the two methods.true frequency, if dist ( f i , C i ) > λ ; and it “almost covers" the true frequency, if otherwise. Inother words, the C i is considered as completely failed if the distance of the true frequency f i to the set C i is larger than λ . Out of the total 5,000 simulated samples, for 234 samples(4.68%) the 99% conﬁdence set fails to cover the corresponding the true frequencies; amongthem, for only 19 (0.38%) samples, the conﬁdence set entirely misses the true frequency, andfor 215 samples (4.3%), the conﬁdence set almost covers the true frequency.To further assess the performance of the proposed method, we conducted another, morecomprehensive simulation study where 90 datasets with distinct characteristics were gener-ated by considering 90 different combinations of sampling cadence patterns, different obser-vation noise levels, and various number of sampling points. Some of these datasets are morechallenging than others with higher level of data sparsity and noise. Our proposed methodstill delivers the best performance in terms of the two metrics (RR and ADE) for each dataset.Details of this simulation study are presented in Section S.4 of the Supplementary Material(He et al., 2020).

5. The M33 Dataset.

In this section we analyze the dataset of M33 Miras from Yuanet al. (2018). The authors collected images and photometric measurements from severalsources (Javadi et al., 2015; Macri et al., 2001; Pellerin and Macri, 2011) to generate lightcurves in four bands ( I , J , H , and K s ) with a median number of observations of 68, 5,6, and 11, respectively. Yuan et al. (2018) identiﬁed 1,781 candidate Miras and classiﬁed1,265 of them as O-rich, 88 as C-rich, and 428 as unknown, respectively. The O-rich Miraswere used to estimate distance moduli (logarithms of the distance) for M33. From the J -, H - and K s -band PLRs, they obtained values of . ± . mag, . ± . mag, and . ± . mag, respectively. Earlier works based on Cepheid variables in the same galaxyyielded estimates of its distance modulus as . ± . mag (Macri, 2001), . ± . mag (Scowcroft et al., 2009), and . ± . mag (Gieren et al., 2013). Detached eclipsingbinaries resulted in an estimate of . ± . mag (Bonanos et al., 2006), while RR Lyraevariables provided . ± . mag (Sarajedini et al., 2006).We carry out an analysis with our proposed method using the 1,265 candidate Miras clas-siﬁed as O-rich by Yuan et al. (2018). Our method ﬁts the light curves reasonably well,as can be visually assessed by plotting the MAP estimator; see Section S.5 of the Sup-plementary Material for some randomly selected examples. We obtain the frequency and IMULTANEOUS INFERENCE OF PERIODS AND PLRS

200 400 600 800 1000

SVI Period M SP P e r i od l ll l l ll ll ll ll lll llll l l ll ll lll ll l l lll ll lllll ll ll l l llll ll ll lll ll l ll l Fig 8: Comparison of the period estimatedby SVI and by Yuan et al. (2018). The redpoints are Miras with inconsistent periodsfrom the two methods. − . − . . . . −1.0−0.50.00.51.0 b I −1.0−0.50.00.51.0−1.0−0.50.00.51.0−1.0−0.50.00.51.0−1.0−0.50.00.51.0−1.0−0.50.00.51.0−1.0−0.50.00.51.0 − . − . . . . −1.0−0.50.00.51.0 b I − . − . . . . b J − . − . . . . b J − . − . . . . b H − . − . . . . b H − . − . . . . b K − . − . . . . − . − . . . . −1.0−0.50.00.51.0 b K Fig 9: Scatterplot for the sinusoidal coefﬁcientsfor M33 Miras. T ABLE J -, H - & K s -band PLR coefﬁcients for candidate M33 O-rich Miras galaxy band linear (period < d) quadratic a a σ a a a σ LMC (Yuan+17) J . ± . − . ± .

09 0 .

15 12 . ± . − . ± . − . ± .

23 0 . M33 (Yuan+18) J . ± . . . . .

25 18 . ± . . . . . . . . M33 (SVI) J . ± . − . ± .

06 0 .

22 19 . ± . − . ± . − . ± .

02 0 . LMC (Yuan+17) H . ± . − . ± .

09 0 .

16 11 . ± . − . ± . − . ± .

31 0 . M33 (Yuan+18) H . ± . . . . .

24 18 . ± . . . . . . . . M33 (SVI) H . ± . − . ± .

06 0 .

21 18 . ± . − . ± . − . ± .

02 0 . LMC (Yuan+17) K s . ± . − . ± .

07 0 .

12 11 . ± . − . ± . − . ± .

20 0 . M33 (Yuan+18) K s . ± . . . . .

21 17 . ± . . . . . . . . M33 (SVI) K s . ± . − . ± .

06 0 .

20 17 . ± . − . ± . − . ± .

02 0 . average magnitude for each Mira. The frequency of the i -th Mira is directly estimated by ˆ f i = arg max f i q ( f i ) . The uncertainty for its estimated period ˆ p i = 1 / ˆ f i is computed by σ (ˆ p i ) = (cid:82) (1 /f i ) q ( f i )d f i − (cid:2) (cid:82) (1 /f i ) q ( f i )d f i (cid:3) . The computation of ˆ f i is straightforwardand more reliable, as our model has built-in PLR to suppress aliasing frequencies. In contrast,the work of Yuan et al. (2018) involves an ad-hoc post-estimation correction. For each Mira,they test both the primary and secondary peaks in their log-likelihood. Their ﬁnal estimatedfrequency is selected as the one that best ﬁts the overall PLR.A comparison of the periods estimated by our method and the MSP method is shown inFigure 8. Most of the estimated periods from the two methods are consistent, with only . Miras (red points) deviating by more than 10%. We also compare the average

J HK s mag-nitude between our determinations and that of Yuan et al. (2018) in Figure 7a, 7b and 7c.Our determination has slighter higher average magnitude values. Nevertheless, the two re-sults are generally consistent, despite there are some outliers. The standard deviations of thedifference are 0.16, 0.13, and 0.10 mag for the J , H , and K s bands, respectively. T ABLE Derived Distance Moduli for M33 band ∆ a ∆ ¯ m ∆ A λ ∆ct ∆ µ µ LMC µ M33 J . ± . − . ± .

035 0 . ± .

008 0 . ± .

036 6 . ± .

053 18 . ± .

048 24 . ± . H . ± . − . ± .

029 0 . ± .

005 0 . ± .

040 6 . ± .

052 18 . ± .

048 24 . ± . K s . ± . − . ± .

024 0 . ± . − . ± .

032 6 . ± .

042 18 . ± .

048 24 . ± . In addition, we compute the posterior mean of the sinusoidal coefﬁcient β i for each Mira.The scatter plot of the expected β i ’s (the expectation is taken with respect to q ( β i ) ) is shownin Figure 9. Notice that the coefﬁcients for the J HK s bands have a tight correlation, and theirrelation with the I -band coefﬁcients β I , β I is also non-negligible. The correlation structurein the real data justiﬁes our model Eqn. (4) on β i . By exploiting this structure, our method isable to constrain the model degree of freedom and estimate parameters more accurately.We note the doughnut shape of the posterior joint distribution of β I , β I shown in Fig-ure 9 does not suggest model misspeciﬁcation. In fact, it is the Bayesian method in action.The posterior distribution needs not be consistent with the prior. In Bayesian statistics, it isexpected that when there is enough data, the information provided in the data will “wash outthe prior."Next, we will compare our estimated PLRs with the results from Yuan et al. (2017) andYuan et al. (2018). For each band, Yuan et al. (2018) ﬁt a quadratic PLR(28) M = a + a (log ( P ) − .

3) + a (log ( P ) − . to all Miras; and a linear PLR M = a + a (log ( P ) − . to Miras with period less than400 days . Notice the above quadratic PLR is simply an equivalent reparameterization of ourPLR in (11). In the work of Yuan et al. (2018), the PLRs are ﬁrstly ﬁtted to the LMC Miras(Yuan et al., 2017). After that, they ﬁx the coefﬁcients a and a , and only ﬁt the intercept a to the M33 Miras. Table 4 presents their results. The coefﬁcients for the LMC Miras arein the rows of LMC (Yuan+17) , and their results on the M33 Miras are in the rows named

M33 (Yuan+18) .The quadratic PLRs from our models are plotted as red curves in Figure 10. In the ﬁg-ure, each point represents one Mira with our estimated average magnitude and period. ForMiras with period less than 400 days, we also ﬁt additional linear PLRs, which are shownas black lines. Our coefﬁcients are presented in the rows named

M33 (SVI) in Table 4.For comparison, the reported coefﬁcient values have been converted to respect the equivalentparameterization (28). Notice our estimated a and a are consistent with the LMC resultsfrom Yuan et al. (2017) within the error budget.In Table 4, the column σ is the standard deviation of the PLR residuals. Smaller σ meanstighter PLRs. It is remarkable that our model achieves competitive performance as the M33result of Yuan et al. (2018), without their ad-hoc period correction tricks. This veriﬁes thequality the automatically estimated PLRs embedded in our model.Lastly, we compute the distance modulus of the M33 galaxy. Denote a LMC0 as the zero-order coefﬁcient of the LMC quadratic PLR from Yuan et al. (2017). Denote a M330 as the zero-order coefﬁcient of our M33 quadratic PLR . Then, up to a few corrections, ∆ a = a M330 − a LMC0 is equal to the difference in distance moduli of the two galaxies. These correctionsinclude: conversion to ﬂux-averaged magnitude ∆ ¯ m , correction for the different interstellarextinction ∆ A λ towards LMC and M33; and the color term bias ∆ct of the photometriccalibration. We note the ﬂux correction ∆ ¯ m is necessary because the average magnitudewas computed in units of ﬂux for the LMC Miras. This correction is carried out as follows:We ﬁt the signal light curve s ib ( t ) in (10) for each Mira over the whole observational timedomain. The signal s ib ( t ) is in the unit of magnitude and can be converted to ﬂux via s (cid:48) ib ( t ) =10 − . · s ib ( t ) . The average value l ib of s (cid:48) ib ( t ) is computed, and it is converted back to magnitude IMULTANEOUS INFERENCE OF PERIODS AND PLRS log (Period) J [ m ag ] log (Period) H [ m ag ] log (Period) K s [ m ag ] Fig 10: The PLRs of M33 Miras in the J (left), H (center) and K s (right) bands. The verticalaxes shown the average magnitude in the respective band while the horizontal axes are thelogarithm of the period. Red curves are the ﬁtted PLRs from our SVI method. Black lines areﬁtted linear PLRs for Miras with period less than 400 days.by m (cid:48) ib = − . ( l ib ) . Note some irrelevant constants have been dropped during theseconversions. Then, the correction value ∆ ¯ m for each band is the average difference m (cid:48) ib − m ib from all Miras.After making these corrections, we obtain the relative distance modulus ∆ µ = ∆ a +∆ ¯ m + ∆ A λ + ∆ct between the LMC and M33. We adopt a distance modulus µ LMC =18 . ± . for LMC (Pietrzy´nski et al., 2013). Thus, the distance modulus for M33 is µ M33 = µ LMC + ∆ µ . This procedure is applied separately to the J -, H - and K s -band dataand summarized in Table 5. From the respective PLRs, our analysis yields distance moduliof . ± . mag, . ± . mag, and . ± . mag, respectively. Theseare also consistent with earlier estimates.

6. Effect of downsampling.

In Section 5, we showed that our proposed SVI methodproduces distance estimates and uncertainties which are consistent with other methods with-out resorting to ad hoc period estimate corrections. This section studies how robust theseestimates are to downsample the light curves, e.g., how do distance estimates change as thenumber of photometric measurements and/or light curve goes down.We conducted an experiment of downsampling the actual M33 data set used in Section 5.We compared our SVI method with two other multi-band methods, MGLS and MSP, withoutusing any ad hoc period estimate corrections. To assess the reliability and quality of theestimated PLR for distance estimation in downsampling, we measured the changes in thePLR intercept and in the PLR scattering between the downsampled estimates and the fullsample estimates.In each replication of the experiment, ˜ N = 50 or ˜ N = 100 Miras are randomly drawnfrom the original N = 1 , candidate O-rich Miras. For each selected Mira, we also puta restriction ¯ n b over the number of photometric measurements for its b -th band light curve( b ∈ { I, J, H, K s } ). In particular, for the b -th band light curve of the i -th selected Mira, wesample ˜ n ib = min { n ib , ¯ n b } photometric measurements without replacement from the original n ib measurements. Six combinations of ˜ N and ¯ n b are considered as in Table 6.Each setting in Table 6 is repeated 20 times. In each replication of one setting, three multi-band methods (MGLS, MSP and SVI) are applied to estimate the period and the averagemagnitude for each down-sampled Mira. To ﬁt a quadratic PLR for MGLS and MSP, weﬁx the coefﬁcient a and a (estimated from LMC data) and only estimate the intercept T ABLE Experiment settings for downsampling the actual M33 observations

Setting ˜ N ¯ n I ¯ n J ¯ n H ¯ n K s S1 50 10 5 5 5S2 50 20 10 10 10S3 50 30 15 15 15S4 100 10 5 5 5S5 100 20 10 10 10S6 100 30 15 15 15 term a . Suppose the intercept estimated from the subsample is denoted as ˜ a and the fullsample estimate is denoted ˆ a . The ﬁrst row of Figure 11 shows the boxplot of the difference ˜ a − ˆ a . The three panels in the ﬁrst row correspond to the J , H and K s bands, respectively;and the results for MGLS, MSP and SVI are colored as red, green, and blue, respectively.The intercept difference ˜ a − ˆ a is closest to for our SVI method, and the variability isalso small. As comparison, MGLS and MSP have much larger bias and variability than SVI.Compared with MGLS, the MSP method has smaller bias in the H band but larger bias inthe J and K s bands.We also compare the scattering of the PLR residuals. Let ˜ σ be the standard deviation ofthe PLR residuals for each method in each experiment replication. Also, let ˆ σ be the standarddeviation of the PLR residuals obtained by our SVI method for the full sample. The secondrow of Figure 11 shows the boxplot of the ratio ˜ σ/ ˆ σ for the 20 replications of the threemethods. The residual scattering ratio is only slightly above for our SVI method over thesix scenarios of downsampling. As comparison, MGLS and MSP have substantially largerresidual scattering ratio. The residual scattering of MSP is smaller than that of MGLS.These results suggest that our SVI is much more robust to downsample Mira light curvesthan MGLS and MSP methods. It would be an interesting research topic to study the implica-tion of these results for telescope cadence selection and the necessary density of time domainsampling. Acknowledgements.

He’s Project 11801561 was supported by NSFC. L. Macri ac-knowledges support from the Mitchell Institute for Fundamental Physics and Astronomy atTexas A&M University. Huang’s work was partially supported by NSF grants DMS-1208952,IIS-1900990, and CCF-1956219. The authors would like to thank two anonymous reviewersfor helpful comments which led to inclusion of Section 6 and other signiﬁcant improvementsof the paper. SUPPLEMENTARY MATERIAL

Supplementary proof and results . The supplementary material contains a review of thenotations of the exponential family used in the paper. It also contains the proof of Theorem 3.1and additional numerical results.

Supplementary code and dataset . The code and datasets are released online at https://github.com/shiyuanhe/sviperiodplr . They serve to reproduce the simulationand real data analysis results in this work.().

IMULTANEOUS INFERENCE OF PERIODS AND PLRS lll lll llll l ll ll llllll ll lll l l lll l J H KS1 S2 S3 S4 S5 S6 S1 S2 S3 S4 S5 S6 S1 S2 S3 S4 S5 S6−1.0−0.50.00.51.0

Setting a ~ - a ^ method MGLS MSP SVI lll ll ll l ll ll lll llll ll ll l ll llll lllll lll l lllll ll l

J H KS1 S2 S3 S4 S5 S6 S1 S2 S3 S4 S5 S6 S1 S2 S3 S4 S5 S60.02.55.07.510.0

Setting s ~ s ^ Fig 11: Method comparison via down-sampling the real M33 observation. The ﬁrst row showsthe difference between the downsampled PLR intercept and the full sample PLR intercept.The second row shows the standard deviation ratio of the downsampled PLR residuals andthe full sample PLR residuals. The three columns correspond to the J , H and K s bands,respectively. The result of 20 replications for MGLS, MSP and SVI are shown in the red,green and blue boxplots, respectively. REFERENCES

Blei, D. M., A. Kucukelbir, and J. D. McAuliffe (2017). Variational inference: A review for statisticians.

Journalof the American Statistical Association 112 (518), 859–877.Bonanos, A. Z., K. Z. Stanek, R. P. Kudritzki, L. M. Macri, D. D. Sasselov, J. Kaluzny, P. B. Stetson, D. Bersier,F. Bresolin, T. Matheson, et al. (2006). The ﬁrst direct distance determination to a detached eclipsing binaryin m33.

The Astrophysical Journal 652 (1), 313.Flewelling, H. (2018, January). Pan-STARRS Data Release 2. In

American Astronomical Society Meeting Ab-stracts , Volume 231 of

American Astronomical Society Meeting Abstracts , pp. 436.01.Freedman, W. L., B. F. Madore, B. K. Gibson, L. Ferrarese, D. D. Kelson, S. Sakai, J. R. Mould, R. C. Kennicutt Jr,H. C. Ford, J. A. Graham, et al. (2001). Final results from the hubble space telescope key project to measurethe hubble constant.

The Astrophysical Journal 553 (1), 47.Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article bybrowne and draper).

Bayesian analysis 1 (3), 515–534.Ghadimi, S. and G. Lan (2013). Stochastic ﬁrst-and zeroth-order methods for nonconvex stochastic programming.

SIAM Journal on Optimization 23 (4), 2341–2368.Gieren, W., M. Górski, G. Pietrzy´nski, P. Konorski, K. Suchomska, D. Graczyk, B. Pilecki, F. Bresolin, R.-P.Kudritzki, J. Storm, et al. (2013). The araucaria project. a distance determination to the local group spiral m33from near-infrared photometry of cepheid variables.

The Astrophysical Journal 773 (1), 69.He, S., Z. Lin, W. Yuan, L. M. Macri, and J. Z. Huang (2020). Supplement to “simultaneous inference of periodsand period-luminosity relations for mira variable stars”.He, S., W. Yuan, J. Z. Huang, J. Long, and L. M. Macri (2016). Period estimation for sparsely sampled quasi-periodic light curves applied to miras.

The Astronomical Journal 152 (6), 164.Hoffman, M. D., D. M. Blei, C. Wang, and J. Paisley (2013). Stochastic variational inference.

The Journal ofMachine Learning Research 14 (1), 1303–1347.Huang, C. D., A. G. Riess, S. L. Hoffmann, C. Klein, J. Bloom, W. Yuan, L. M. Macri, D. O. Jones, P. A.Whitelock, S. Casertano, et al. (2018). A near-infrared period–luminosity relation for miras in ngc 4258, ananchor for a new distance ladder.

The Astrophysical Journal 857 (1), 67.Javadi, A., M. Saberi, J. T. van Loon, H. Khosroshahi, N. Golabatooni, and M. T. Mirtorabi (2015). The ukinfrared telescope m33 monitoring project–iv. variable red giant stars across the galactic disc.

Monthly Noticesof the Royal Astronomical Society 447 (4), 3973–3991.Jordan, M. I., Z. Ghahramani, T. S. Jaakkola, and L. K. Saul (1999, Nov). An introduction to variational methodsfor graphical models.

Machine Learning 37 (2), 183–233.Lomb, N. R. (1976). Least-squares frequency analysis of unequally spaced data.

Astrophysics and space sci-ence 39 (2), 447–462.Long, J. P., E. C. Chi, and R. G. Baraniuk (2016). Estimating a common period for a set of irregularly sampledfunctions with applications to periodic variable star data.

The Annals of Applied Statistics 10 (1), 165–197.Macri, L., K. Stanek, D. Sasselov, M. Krockenberger, and J. Kaluzny (2001). Direct distances to nearby galaxiesusing detached eclipsing binaries and cepheids. vi. variables in the central part of m33.

The AstronomicalJournal 121 (2), 870.Macri, L. M. M. (2001).

The extragalactic distance scale . Ph. D. thesis, Harvard University.Mowlavi, N., I. Lecoeur-Taïbi, T. Lebzelter, L. Rimoldini, D. Lorenz, M. Audard, J. De Ridder, L. Eyer, L. Guy,B. Holl, et al. (2018). Gaia data release 2-the ﬁrst gaia catalogue of long-period variable candidates.

Astronomy& Astrophysics 618 , A58.Pellerin, A. and L. M. Macri (2011). The m 33 synoptic stellar survey. i. cepheid variables.

The AstrophysicalJournal Supplement Series 193 (2), 26.Pietrzy´nski, G., W. Gieren, D. Graczyk, I. Thompson, B. Pilecki, N. Nardetto, R.-P. Kudritzki, F. Bresolin,G. Bono, P. P. Moroni, P. Konorski, M. Gorski, J. Storm, R. Smolec, and P. Karczmarek (2013, February).A precise and accurate distance to the large magellanic cloud from late-type eclipsing-binary systems. In R. deGrijs (Ed.),

Advancing the Physics of Cosmic Distances , Volume 289 of

IAU Symposium , pp. 169–172.Rasmussen, C. E. and C. K. I. Williams (2005).

Gaussian Processes for Machine Learning (Adaptive Computationand Machine Learning) . The MIT Press.Riess, A. G., S. Casertano, W. Yuan, L. M. Macri, and D. Scolnic (2019). Large magellanic cloud cepheidstandards provide a 1% foundation for the determination of the hubble constant and stronger evidence forphysics beyond λ cdm. The Astrophysical Journal 876 (1), 85.Riess, A. G., L. Macri, S. Casertano, H. Lampeitl, H. C. Ferguson, A. V. Filippenko, S. W. Jha, W. Li, andR. Chornock (2011). A 3% solution: determination of the hubble constant with the hubble space telescope andwide ﬁeld camera 3.

The Astrophysical Journal 730 (2), 119.IMULTANEOUS INFERENCE OF PERIODS AND PLRS Riess, A. G., L. M. Macri, S. L. Hoffmann, D. Scolnic, S. Casertano, A. V. Filippenko, B. E. Tucker, M. J. Reid,D. O. Jones, J. M. Silverman, R. Chornock, P. Challis, W. Yuan, P. J. Brown, and R. J. Foley (2016, July). A2.4% Determination of the Local Value of the Hubble Constant.

ApJ 826 , 56.Sarajedini, A., M. Barker, D. Geisler, P. Harding, and R. Schommer (2006). Rr lyrae variables in m33. i. evidencefor a ﬁeld halo population.

The Astronomical Journal 132 (3), 1361.Scargle, J. D. (1982, December). Studies in astronomical time series analysis. II - Statistical aspects of spectralanalysis of unevenly spaced data.

The Astrophysical Journal 263 , 835–853.Scowcroft, V., D. Bersier, J. Mould, and P. R. Wood (2009). The effect of metallicity on cepheid magnitudes andthe distance to m33.

Monthly Notices of the Royal Astronomical Society 396 (3), 1287–1296.Soszy´nski, I., A. Udalski, M. K. Szyma´nski, M. Kubiak, G. Pietrzy´nski, Ł. Wyrzykowski, O. Szewczyk,K. Ulaczyk, and R. Poleski (2009, March). The Optical Gravitational Lensing Experiment. The OGLE-IIICatalog of Variable Stars. III. RR Lyrae Stars in the Large Magellanic Cloud.

ACTA ASTRONOMICA 59 ,1–18.Udalski, A., M. K. Szyma´nski, I. Soszy´nski, and R. Poleski (2008). The optical gravitational lensing experiment.ﬁnal reductions of the ogle-iii data.

ACTA ASTRONOMICA 58 , 69–87.Vanderplas, J. T. and Ž. Ivezic (2015). Periodograms for multiband astronomical time series.

The AstrophysicalJournal 812 (1), 18.Wainwright, M. J., M. I. Jordan, et al. (2008). Graphical models, exponential families, and variational inference.

Foundations and Trends® in Machine Learning 1 (1–2), 1–305.Wang, Y. and D. M. Blei (2019). Frequentist consistency of variational bayes.

Journal of the American StatisticalAssociation 114 (527), 1147–1161.Whitelock, P., J. Menzies, M. Feast, F. Nsengiyumva, and N. Matsunaga (2014). Vizier online data catalog: Jhksphotometry of agb stars in ngc 6822 (whitelock+, 2013).

VizieR Online Data Catalog 742 .Whitelock, P. A. and M. W. Feast (2014, July). Gaia, Variable Stars and the Distance Scale. In

EAS PublicationsSeries , Volume 67 of

EAS Publications Series , pp. 263–269.Yuan, W., L. M. Macri, S. He, J. Z. Huang, S. M. Kanbur, and C.-C. Ngeow (2017). Large magellanic cloudnear-infrared synoptic survey. v. period–luminosity relations of miras.

The Astronomical Journal 154 (4), 149.Yuan, W., L. M. Macri, A. Javadi, Z. Lin, and J. Z. Huang (2018). Near-infrared mira period–luminosity relationsin m33.

The Astronomical Journal 156 (3), 112.Zechmeister, M. and M. Kürster (2009). The generalised lomb-scargle periodogram. a new formalism for theﬂoating-mean and keplerian periodograms.

Astronomy and Astrophysics 496 (2), 577–584.Zhang, C., J. Butepage, H. Kjellstrom, and S. Mandt (2019). Advances in variational inference.

IEEE Transactionson Pattern Analysis and Machine Intelligence 41 (8), 2008–2026. Supplementary Materials to “Simultaneous inference of periods andperiod-luminosity relations for Mira variable stars”

S.1. Canonical Exponential Family.

This section reviews the notation for the exponen-tial family used in the paper. The general form for the canonical exponential family is f ( x | η ) = h ( x ) exp { η T t ( x ) − A ( η ) } . where η is the canonical parameter, r ( x ) is the sufﬁcient statistics, and A ( η ) is the log-partition function. It is known that E ( t ( x )) = ∇ η A ( η ) .For a p dimensional multivariate Gaussian distribution with mean µ and precision matrix Ω , its density is exp (cid:110) −

12 ( x − µ ) T Ω ( x − µ ) + 12 log det Ω + 12 log 2 π (cid:111) = exp (cid:110) (cid:104)− Ω , xx T (cid:105) + (cid:104) Ω µ , x (cid:105) − µ T Ω µ + 12 log det Ω + 12 log 2 π (cid:111) with t ( x ) = ( xx T , x ) , η = ( η , η ) = ( − Ω , Ω µ ) and A ( η ) = µ T Ω µ − log det Ω . Inaddition, it holds that E ( x ) = −

12 ( η ) − η , Var( x ) = −

12 ( η ) − , E ( xx T ) = −

12 ( η ) − + 14 ( η ) − η ( η ) T ( η ) − . The expression for its entropy is − (cid:90) log f ( x | η ) f ( x | η )d x = 12 log det (cid:0) πe Ω − (cid:1) = −

12 log det (cid:0) − η / ( πe ) (cid:1) . For gamma distribution with shape α and rate β , its density is exp (cid:110) − βx + ( α −

1) ln x + α ln β − ln Γ( α ) (cid:111) with t ( x ) = ( x, ln x ) and canonical parameter η = ( η , η ) = ( − β, α − . We also have E ( x ) = α/β = − ( η + 1) /η .For p dimensional Wishart distribution with n ( > p − degrees of freedom and scalematrix V (cid:31) , its density is exp (cid:110) −

12 tr (cid:0) V − X (cid:1) + n − p −

12 log det X − np − n V − log Γ p ( n/ (cid:111) with canonical parameter η = ( η , η ) = ( − V − , ( n − p − / . The mean is E X = n V = − . · (2 η + p + 1) · ( η ) − . S.2. Proof of Theorem 3.1.

With ﬁxed q ( α ) , q ( Ω ) , q ( γ ) , updating q ( θ i , f i ) = q ( θ i | f i ) q ( f i ) to maximize ELBO in (17) can be equivalently expressed by maximizing (cid:90) log p ( f i , θ i | α , Ω , γ , D i ) q ( f i , θ i ) × q ( f i , θ i ) q ( α ) q ( Ω ) q ( γ ) d θ i d f i d α d Ω d γ = (cid:90) h ( f i ) q ( f i ) d f i − (cid:90) log q ( f i ) q ( f i ) d f i , (S.1) IMULTANEOUS INFERENCE OF PERIODS AND PLRS where h ( f i ) = (cid:90) log p ( f i , θ i | α , Ω , γ , D i ) q ( θ i | f i ) × q ( θ i | f i ) q ( α ) q ( Ω ) q ( γ ) d θ i d α d Ω d γ = E q ( θ i | f i ) E q ( α ) E q ( Ω ) E q ( γ ) (cid:20) log p ( θ i | f i , α , Ω , γ , D i ) q ( θ i | f i ) + log p ( f i | α , Ω , γ ) (cid:21) . (S.2)To ﬁnd the optimal q ( θ i , f i ) , we ﬁrstly ﬁnd the optimal q ( θ i | f i ) at each ﬁxed f i . Then fromequation (S.1), q ( f i ) is readily available as q ( f i ) ∝ exp( h ( f i )) .Notice log p ( f i , θ i | α , Ω , γ , D i ) ∝ − B (cid:88) b =1 ( y ib − C ib θ ib ) T Σ − ib ( y ib − C ib θ ib ) −

12 log det Σ ib −

12 ( θ i − θ ) T Θ ( θ i − θ ) , where the irrelevant constants have been dropped. Notice in the above, the basis matrix C ib ,the covariance matrix Σ ib as well as the prior mean vector θ all depend on the frequency f i .The frequency has uniform prior. The full conditional for θ i is p ( θ i | f i , α , Ω , γ , D i ) ∝ exp (cid:110) B (cid:88) b =1 (cid:28) − C Tib Σ − ib C ib , θ ib θ Tib (cid:29) + B (cid:88) b =1 (cid:10) C Tib Σ − ib y ib , θ ib (cid:11) (cid:111) × exp (cid:110) −

12 ( θ i − θ ) T Θ ( θ i − θ ) (cid:111) ∝ exp (cid:110) (cid:68) ξ θ i , θ i θ Ti (cid:69) + (cid:68) ξ θ i , θ i (cid:69) (cid:111) . This full conditional is Gaussian with canonical parameters ξ θ i = − × diag (cid:0) C Ti Σ − i C i , · · · , C TiB Σ − iB C iB (cid:1) − Θ ,ξ θ i = (cid:0) ( C Ti Σ − i y i ) T , · · · , ( C TiB Σ − iB y iB ) T (cid:1) T + Θ θ . For ﬁxed f i , q ( θ i | f i ) is imposed as a Gaussian distribution with canonical parame-ter ( η θ i , η θ i ) . Holding q ( α ) , q ( γ ) , q ( Ω ) and f i ﬁxed, the optimal q ( θ i | f i ) maximizing(S.2) is found by setting its canonical parameter as η θ i = E q ( α ) E q ( γ ) E q ( Ω ) ( ξ θ i ) and η θ i = E q ( α ) E q ( γ ) E q ( Ω ) ( ξ θ i ) . This is the updating equation (18).In order to compute q ( f i ) ∝ exp( h ( f i )) , we now need to evaluate h ( f i ) , which is h ( f i ) = E q ( θ i | f ) E q ( α ) E q ( γ ) log p ( f i , θ i | y i , α , Ω , γ ) − E q ( θ i | f ) q ( θ i | f ) . Then, with irrelevant constants dropped, we have h ( f i ) ∝ (cid:104) η θ i , E q ( θ i | f i ) ( θ i θ Ti ) (cid:105) + (cid:104) η θ i , E q ( θ i | f i ) ( θ i ) (cid:105) − (cid:104) E q ( Ω ) E q ( γ ) Θ , E q ( α ) θ θ T (cid:105)− B (cid:88) b =1 (cid:2) y Tib Σ − ib y ib + log det Σ ib (cid:3) −

12 ln det( − η θ i / ( πe )) . (S.3)The last term is the entropy of the Gaussian distribution q ( θ i | f i ) . In addition, the ﬁrst andsecond moments of θ i can be evaluated, (cid:104) η θ i , E q ( θ i | f i ) ( θ i θ Ti ) (cid:105) + (cid:104) η θ i , E q ( θ i | f i ) ( θ i ) (cid:105) = (cid:104) η θ i , −

12 ( η θ i ) − + 14 ( η θ i ) − η θ i ( η θ i ) T ( η θ i ) − (cid:105) + (cid:104) η θ i , −

12 ( η θ i ) − η θ i (cid:105) = − B −

14 ( η θ i ) T ( η θ i ) − η θ i . (S.4)Now, (S.3) and (S.4) together imply (19). S.3. Discussion of parameterization of the non-periodic component of a light curve.

In our formulation of the non-periodic component of a light curve (Section 2.4), h ib ( · ) isa function of phase u = f i · t . This differs from the work of He et al. (2016) and Yuan et al.(2018), where the term h ib ( · ) is chosen as a function of time t . This section provides somenumerical results supporting the new formulation.Our choice of the model is guided by the goal of using the same kernel length-scale pa-rameter τ b for all Miras so that the computationally expensive process of ﬁnding the optimallength-scale parameter individually for each Mira can be avoided. Equation (10) reads(S.5) s ib ( t ) = m ib + b ( f i · t ) T β ib + h ib ( f i · t ) , where b ( u ) = (cos(2 πu ) , sin(2 πu )) T . Note that the argument in the periodic component b ( f i · t ) T β ib is f i · t . Our choice of using h ib ( f i · t ) makes the function argument of the non-period component consistent with the period component. This choice is supported by twonumerical results given below. First result.

We ﬁt the Gaussian process model to the I - band light curves of OGLE MLCO-rich Miras under the two parameterizations ( h ib ( t ) versus h ib ( f i t ) ) of the non-period com-ponent. We used the kernel given in Section 2.4 of the paper and compared the optimal ﬁttedlength-scale parameters. Because these MLC light curves are densely sampled with highsignal-to-noise ratio, we can ﬁt the Gaussian process model directly to each Mira, withoutusing the hierarchical model structure developed in our work (each Mira has its own length-scale parameter). (period)10 b (period)10 b Fig S.1: The ﬁtted length-scale parameters τ b (on the log scale) versus the logarithm of theperiods for O-rich LMC Mira I -band light curves. The model ﬁtting is carried out indepen-dently to 469 OGLE LMC O-rich Miras. The left penal shows the ﬁtted τ b of the unscaledmodel h ib ( t ) . The right penal shows the ﬁtted τ b of the scaled model h ib ( f i · t ) . IMULTANEOUS INFERENCE OF PERIODS AND PLRS . . . . Estimated Frequency T r ue F r equen cy . . . . Estimated Frequency T r ue F r equen cy Fig S.2: The frequency estimation comparison of using unscaled version h ib ( t ) (the left pe-nal) versus the scaled version h ib ( f i · t ) (the right penal) in the Beyesian hierarchical model.In each penal, the horizontal axis is the estimated frequency, and the vertical axis is the truefrequency.The left and right panels of Figure S.1 shows respectively the results of using the unscaledmodel h ib ( t ) and the scaled model h ib ( f i · t ) . The vertical axis (on the log scale) is theoptimal ﬁtted τ b , and the horizontal axis is the logarithm of the period. The left panel showsthat τ b varies in a large range, invalidating the idea of letting all Miras share the same length-scale parameter, while the right panel shows much smaller scattering of τ b . The positivecorrelation between τ b and log ( p ) is evident on the left panel but much less so on the rightpanel. This result suggests that the scaled model h ib ( f i · t ) is preferred if we let all Mirasshare the same length-scale parameter in the Gaussian process model of the non-periodiccomponent. Second result.

Using the simulated data for 5,000 O-rich Miras presented in Section 4,we tested the unscaled version h ib ( t ) with the shared kernel parameter for all Miras, andkeeping the rest of procedures the same in our Bayesian hierarchical model. It turns out themodel performance deteriorates considerably. More precisely, when h ib ( t ) is used instead of h ib ( f i · t ) , the recovery rate (RR) as deﬁned in the paper drops from . to . , andthe absolute deviation error (ADE) increases from . × − to . × − .Figure S.2 depicts the estimation result. The left penal shows the result of using h ib ( t ) ,while the right penal presents the result of using h ib ( f i · t ) . In each penal, the horizontal axisis the estimated frequency, and the vertical axis is the true frequency. It is obvious that the leftpanel shows a large scattering, while on the right panel most points reside on the diagonal.This result strongly supports our use of the scaled model h ib ( f i · t ) . S.4. Additional Simulation Study.

In addition to the simulation in the main text, wehave conducted a more extensive simulation. We generated 90 datasets of O-rich Mira lightcurves, containing observations in the I and K s bands. In a given dataset, each star hasexactly n I and n K data points in the I and K s bands, respectively. The number of data points ( n K , n I ) takes one of the following ten choices: (5 , , (5 , , (5 , , (5 , , (10 , , (10 , , (10 , , (20 , , (20 , and (30 , .Each dataset chooses one of three patterns (cadence) to generate time points. The obser-vation time points are generated by the cadence of the M33 dataset (C1), the cadence of the

14 16 18 20 . . . . . . ll O b s [ s e l O b s , ] Magitude s ( m ) s ( m ) s ( m ) + s ( m ) + Fig S.3: The magnitude noise relation for Simulation II. The three noise levels (N1, N2, andN3) are plotted as solid, dashed and dotted curves, respectively.OGLE LMC dataset (C2), or simply by uniform distribution (C3). For the ﬁrst cadence C1,the K s -band data is obtained at a later time than the I -band data. For the second cadence C2,the two bands are observed roughly at the same time but have a seasonal pattern, such thatthere are no observations for a range of months each year when the Sun is too close to thegalaxy. As for the last cadence, the two bands have similar uniform coverage over the wholeobservational time interval.Given the observation time points, the light curve magnitudes are also generated from thetemplate of Yuan et al. (2017). In addition, Gaussian noise is added to the magnitudes. Ac-cording to real data from the actual survey, the noise level increases as the object becomesfainter (with larger magnitude values). A commonly-used function to ﬁt the magnitude-noiserelation is σ ( m ) = exp ( a · m c − b ) . We adopted the values a = 1 . × − , b = 5 . and c = 5 that closely match the OGLE survey. This is our ﬁrst noise setting (N1). The other twosettings increase the noise level by using σ ( m ) + 0 . (N2) and σ ( m ) + 0 . (N3), respec-tively. Figure S.3 illustrates the three noise levels. In the ﬁgure, the grey points represent theactual noise-mag relation from the OGLE survey. They are 20,000 randomly-selected obser-vation points from OGLE. For our second simulation, the signal-to-noise ratio is lower thanthat of the OGLE survey, as the curves are slightly higher than the points. Therefore, it ismore difﬁcult to recover the correct periods.In summary, we have 10 choices for the number of data points, 3 choices of samplingcadence, and 3 choices of noise level. By all possible combinations, the above procedureproduces 90 datasets. Figure S.4 illustrates a few generated light curves for the choice ofsample size n K = 30 and n I = 20 . The ﬁrst, second and third rows correspond to samplingcadences C1, C2 and C3, respectively, while the ﬁrst, second, and third columns correspondto noise levels N1, N2 and N3, respectively. The black points and blue squares denote I - and K s -band light curves, respectively.We apply the methods SP, MSP, GLS, MGLS and SVI to the simulated datasets. In thesedatasets, the single-band methods are applied to the I -band data. The same metrics as in thesimulation Section 4 of the paper, RR and ADE, are used for model comparison. Figure S.5shows the recovery rate for the 90 datasets. The horizontal axis represents the varying samplesizes ( n K , n I ) for K s - and I -band data. Its columns correspond to different noise levels (N1–3 from left to right), and its rows for various sampling cadences (C1–3 from up to bottom). IMULTANEOUS INFERENCE OF PERIODS AND PLRS Date [day] M agn i t ude ll lllll ll ll IK s

500 1500 2500

Date [day] M agn i t ude lll ll lll ll Date [day] M agn i t ude llllll llll Date [day] M agn i t ude l lll l llll l . . . Date [day] M agn i t ude llll ll ll ll

600 1000 1400 1800 . . . . Date [day] M agn i t ude llll ll ll l l . . . . Date [day] M agn i t ude l l ll l ll ll l Date [day] M agn i t ude l l l l l l lll l

500 1500 2500 3500 . . . Date [day] M agn i t ude ll l l l l l l l l Fig S.4: Randomly selected synthetic Mira data for Simulation 2. The ﬁrst, second and thirdrows correspond to sampling cadence C1, C2 and C3, respectively. The ﬁrst, second, and thirdcolumns correspond to noise level N1, N2 and N3, respectively. Black points are I -band lightcurves, and blue squares are K s -band light curves.Similarly, Figure S.6 shows the result for the ADE, with its vertical axes on the logarithmscale. It is remarkable that the proposed method SVI has the best performance, and consis-tently estimates period with higher accuracy for all of the datasets. For this simulation, thesingle-band methods SP and GLS have a similar performance. However, MGLS outperformsMSP in both metrics. This suggests the MSP might not be robust for a dataset with slightlydifferent conﬁgurations from its model setting. S.5. Visual assessment of light curve ﬁtting for Miras in M33.

The quality of lightcurve ﬁtting using Gaussian process can be visually assessed by plotting the MAP estimator.Figure S.7 plots the data and ﬁtted curves for six randomly selected Miras in M33. Thelight curves are ﬁtted by our model with the double exponential kernel. The model providesreasonable ﬁts to the data. In particular, for the sparse M33 light curves, the

J, H, K s bandsshow smaller variability beyond the sinusoidal periodical component. The I -band light curveshave larger variability beyond the sinusoidal periodical component, and the corresponding l l l ll l ll l ll l l ll l ll l ll l l ll l ll l l l l l ll l ll l ll l l ll l ll l ll l l ll l ll l l l l l ll l ll l ll l l ll l ll l ll l l ll l ll l l N1 N2 N3 C C C ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) Size (n K , n I ) RR methods l GLS MGLS MSP SP SVI

Fig S.5: The recovery rate for Simulation II.kernel parameter τ b is larger than that of the other three bands. The kernel we used is k b ( u, u (cid:48) ) = τ b · exp (cid:8) − ( u − u (cid:48) ) /τ b (cid:9) + τ b · I( u = u (cid:48) ) . The table below summaries the kernel parameter ﬁtted to each band of the real data. Thevalues of τ b (or τ b ) among different bands are close, the minor difference may not be easilyexplained. band b τ b τ b τ b I e − . e − . e − . J e − . e − . e − . H e − . e − . e − . K s e − . e − . e − . IMULTANEOUS INFERENCE OF PERIODS AND PLRS l l l ll l ll l ll l l ll l ll l ll l l ll l ll l l l l l ll l ll l ll l l ll l ll l ll l l ll l ll l l l l l ll l ll l ll l l ll l ll l ll l l ll l ll l l N1 N2 N3 C C C ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) ( , ) Size (n K , n I ) A D E methods l GLS MGLS MSP SP SVI

Fig S.6: The absolute deviation error for Simulation II. Date [day] M agn i t ude lllllllllllllllllllllllllllllllllllll llllllll l llllllllll l IJHK s p^=700.39 Date [day] M agn i t ude lllllllll lll lll lllll p^=646.84 Date [day] M agn i t ude lllllllllllllllllllllllllllllllllllll llllllll lll llllllllll p^=613.35 Date [day] M agn i t ude llllllllllllllllllllllllll lllllllllllllllll lllllllllllllllllllllllllllllllllll lllllllllllllllllllllllll llllllllllllll ll l p^=408.74 Date [day] M agn i t ude llllll ll lllll p^=248.6 Date [day] M agn i t ude lllllllllllllllllllllllllllllllllllllllllllllllllll lllllllllllllllllllllllllllllllllllll llllllllllllll llllllllll l p^=127.3p^=127.3