[PDF] Trend Filtering -- II. Denoising Astronomical Signals with Varying Degrees of Smoothness

Abstract

Trend filtering---first introduced into the astronomical literature in Paper I of this series---is a state-of-the-art statistical tool for denoising one-dimensional signals that possess varying degrees of smoothness. In this work, we demonstrate the broad utility of trend filtering to observational astronomy by discussing how it can contribute to a variety of spectroscopic and time-domain studies. The observations we discuss are (1) the Lyman- α forest of quasar spectra; (2) more general spectroscopy of quasars, galaxies, and stars; (3) stellar light curves with planetary transits; (4) eclipsing binary light curves; and (5) supernova light curves. We study the Lyman- α forest in the greatest detail---using trend filtering to map the large-scale structure of the intergalactic medium along quasar-observer lines of sight. The remaining studies share broad themes of: (1) estimating observable parameters of light curves and spectra; and (2) constructing observational spectral/light-curve templates. We also briefly discuss the utility of trend filtering as a tool for one-dimensional data reduction and compression.

Full PDF

MMNRAS , 1–15 (2020) Preprint 13 January 2020 Compiled using MNRAS L A TEX style ﬁle v3.0

Trend Filtering – II. Denoising Astronomical Signals withVarying Degrees of Smoothness

Collin A. Politsch, , , (cid:63) Jessi Cisewski-Kehe, Rupert A. C. Croft, , , and Larry Wasserman , , Department of Statistics & Data Science, Carnegie Mellon University, Pittsburgh, PA 15213 Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213 McWilliams Center for Cosmology, Carnegie Mellon University, Pittsburgh, PA 15213 Department of Statistics and Data Science, Yale University, New Haven, CT 06520 Department of Physics, Carnegie Mellon University, Pittsburgh, PA 15213 School of Physics, University of Melbourne, VIC 3010, Australia

Accepted XXX. Received YYY; in original form 2019 August 20

ABSTRACT

Trend ﬁltering—ﬁrst introduced into the astronomical literature in Paper I of thisseries—is a state-of-the-art statistical tool for denoising one-dimensional signals thatpossess varying degrees of smoothness. In this work, we demonstrate the broad utilityof trend ﬁltering to observational astronomy by discussing how it can contribute toa variety of spectroscopic and time-domain studies. The observations we discuss are(1) the Lyman- α forest of quasar spectra; (2) more general spectroscopy of quasars,galaxies, and stars; (3) stellar light curves with planetary transits; (4) eclipsing bi-nary light curves; and (5) supernova light curves. We study the Lyman- α forest in thegreatest detail—using trend ﬁltering to map the large-scale structure of the intergalac-tic medium along quasar-observer lines of sight. The remaining studies share broadthemes of: (1) estimating observable parameters of light curves and spectra; and (2)constructing observational spectral/light-curve templates. We also brieﬂy discuss theutility of trend ﬁltering as a tool for one-dimensional data reduction and compression. Key words:

Methods: statistical, techniques: spectroscopic, cosmology: observations,stars: planetary systems, stars: binaries: eclipsing, supernovae: general

Many astronomical analyses can be described by the fol-lowing problem setup. Suppose we collect noisy measure-ments of an observable quantity (e.g., ﬂux, magnitude, pho-ton counts) according to the data generating process f ( t i ) = f ( t i ) + (cid:15) i , i = , . . . , n , (1)where t , . . . , t n is an arbitrarily-spaced grid of one-dimensional inputs (e.g., times or wavelengths), (cid:15) i is meanzero noise, and f is a signal that may exhibit varying degreesof smoothness across the input domain (e.g., a smooth signalwith abrupt dips/spikes). Given the observed data sample,we then attempt to estimate (or denoise) the underlyingsignal f from the observations by applying appropriate sta-tistical methods. In Politsch et al. (2020)—hereafter referredto as Paper I—we introduced trend ﬁltering (Tibshirani & (cid:63) E-mail: [email protected]

Taylor 2011; Tibshirani 2014) into the astronomical litera-ture. When the underlying signal is spatially heterogeneous,i.e. possesses varying degrees of smoothness, trend ﬁlteringis superior to popular statistical methods such as Gaussianprocess regression, smoothing splines, kernels, LOESS, andmany others (Nemirovskii et al. 1985; Nemirovskii 1985; Tib-shirani 2014). Furthermore, the trend ﬁltering estimate canbe computed via a highly eﬃcient and scalable convex op-timization algorithm (Ramdas & Tibshirani 2016) and onlyrequires data-driven selection of a single scalar hyperparam-eter. In this paper, we directly demonstrate the broad utilityof trend ﬁltering to observational astronomy by using it tocarry out a diverse set of spectroscopic and time-domainanalyses.The outline of this paper is as follows. In Section 2, weuse trend ﬁltering to study the Lyman- α forest of quasarspectra—a series of absorption features that can be usedas a tracer of the matter density distribution along quasar-observer lines of sight. We choose to study this application in © a r X i v : . [ a s t r o - ph . I M ] J a n C. A. Politsch et al. depth and then illustrate the breadth of trend ﬁltering’s util-ity through our discussions in Section 3. The applications wediscuss in Section 3 can be grouped into two broad (and oftenintertwined) categories: (1) deriving estimates of observableparameters from trend-ﬁltered observations; and (2) usingtrend ﬁltering to construct spectral/light-curve templates ofastronomical objects/events. In Section 3.1, we discuss con-structing spectral template libraries for astronomical objectsby trend ﬁltering coadded spectroscopic observations. We il-lustrate our approach on quasar, galaxy, and stellar spectrafrom the Sloan Digital Sky Survey. Emission-line parame-ters can also be robustly estimated by ﬁtting radial basisfunctions (e.g., Gaussians) to trend-ﬁltered estimates nearemission lines. In Section 3.2, we use relaxed trend ﬁlteringto model the detrended, phase-folded light curve of a Keplerstellar system with a transiting exoplanet. We derive esti-mates and full uncertainty distributions for the transit depthand total transit duration. In Section 3.3, we use trend ﬁlter-ing to denoise a detrended, phase-folded Kepler light curveof an eclipsing binary (EB) system. We illustrate that trendﬁltering provides signiﬁcant improvements upon the popular polyfit method of Prˇsa et al. (2008) that is used to modelKepler EB light curves and derive observable parameters.In Section 3.4, we discuss using trend ﬁltering to constructlight-curve templates of supernova (SN) explosions. We il-lustrate this approach on a SN light curve obtained from theOpen Supernova Catalog (Guillochon et al. 2017). Further-more, we derive estimates and full uncertainty distributionsfor a set of observable parameters—namely, the maximumapparent magnitude, the time of maximum, and the declinerate. Finally, in Section 3.5, we brieﬂy discuss a diﬀerent,non-data-analysis application of trend ﬁltering. Speciﬁcally,we discuss the use of trend ﬁltering as a tool for fast and ﬂex-ible one-dimensional data reduction and compression. Theﬂexibility of the trend ﬁltering estimator, paired with its ef-ﬁcient speed and storage capabilities, make it a potentiallypowerful tool to include in large-scale (one-dimensional) as-tronomical data reduction and storage pipelines.We utilize the Arnold et al. (2014) glmgen R implemen-tation of the Ramdas & Tibshirani (2016) trend ﬁlteringoptimization algorithm in this work. See Paper I for imple-mentations in other programming languages. α FOREST

The Lyman- α (Ly α ) forest is the name given to the absorp-tion features seen in quasar spectra which are caused byneutral hydrogen (H I ) in the intergalactic medium betweena quasar and an observer. When emitted from an accretiondisk close to the central black hole, the light from the quasarhas a relatively smooth spectrum—a continuum—causedby the summed black-body emission of gas with diﬀerenttemperatures at diﬀerent disk radii (Osterbrock & Ferland2006). Emission lines are also seen, and their intensities andline ratios supply information on the physical conditions inthe line emitting gas. At least twenty broad emission lines,broadened by high velocities and temperatures can be mea-sured in a single AGN, along with a similar number of nar-row lines from colder gas (Marziani et al. 1996). The emit-ted spectrum therefore already consists of a superposition of components with varying degrees of smoothness. The Ly α forest arises when this spectrum is further processed withthe addition of absorption lines. Light moving towards theobserver is redshifted into resonance with the Ly α transitionof H I , and the strength of absorption features is dictated bythe densities of intergalactic material along the line of sight(Rauch 1998). The smoothness of the absorption lines variesdepending on the gas pressure, and thermal doppler broad-ening (Gnedin & Hui 1998; Peeples et al. 2010). Sharperabsorption features, metal lines, are also caused by otherintergalactic species, such as C IV , O VI and Mg II (Hellstenet al. 1998; Pieri et al. 2010). The usefulness of the Ly α forest as a cosmological probe (e.g., Palanque-Delabrouilleet al. 2015) stems from its relationship to the matter den-sity ﬁeld in the Universe, eﬀectively mapping out structurealong each quasar-observer line of sight (e.g., Croft et al.1997; Lee et al. 2014). In order to extract this informationfrom noisy spectra and separate it from other components,it is useful to have a method that can deal with the com-plexities outlined above, i.e. one that can naturally adapt tovarying degrees of smoothness without extensive tuning.The relative ﬂuctuations in the Ly α forest transmittedﬂux fraction are of primary interest since they possess amonotonic relationship with the relative distribution of theabsorbing H I . We utilize trend ﬁltering to ﬁrst denoise thespatially heterogeneous ﬂux signal in an observed Ly α for-est. Estimates for the ﬂuctuations in transmitted ﬂux dueto absorbing H I are then typically produced by couplingthe denoised Ly α forest with estimates for the quasar con-tinuum and the cosmic mean transmitted ﬂux in the Ly α forest. We take an alternative approach: directly estimatingthe mean ﬂux level—deﬁned as the product of the contin-uum and cosmic mean transmitted ﬂux—as in Croft et al.(2002). The mean ﬂux level is a very smooth, spatially ho-mogeneous function within the truncated Ly α forest rest-frame. It is therefore appropriate to use a linear smoother(see Paper I, Section 2) for this stage of estimation. Speciﬁ-cally, we use local polynomial regression (LOESS; Cleveland1979; Fan & Gijbels 1996; Loader 1999). In this section, weillustrate these methods on a mock quasar Ly α forest fromBautista et al. (2015) and a real quasar Ly α forest fromthe Baryon Oscillation Spectroscopic Survey Data Release12 (BOSS DR12; Alam et al. 2015) of the Sloan Digital SkySurvey III (SDSS-III; Eisenstein et al. 2011; Aihara et al.2011).Historically, Ly α forest analyses have typically utilizedkernel smoothers (e.g., Croft et al. 2002; Kim et al. 2004),wavelets (e.g., Theuns & Zaroubi 2000), or Gaussian pro-cesses (e.g., Palanque-Delabrouille et al. 2013). Suppose we observe a quasar located at redshift z = z . Ig-noring systematic eﬀects such as sky contamination and in-terstellar extinction for the moment, the observational DGPof the Ly α forest can be assumed to follow the model f ( λ ) = f ( λ ) + (cid:15) ( λ ) , λ ∈ Λ ( z ) , (2) = F ( λ ) · C ( λ ) · (cid:0) + δ F ( λ ) (cid:1) + (cid:15) ( λ ) , (3)where f ( λ ) is the observed ﬂux at wavelength λ , f ( λ ) is theﬂux signal, (cid:15) ( λ ) is zero mean white Gaussian noise, Λ ( z ) = MNRAS , 1–15 (2020) enoising Astronomical Signals via Trend Filtering Input Deﬁnition Range (quasar at z = z ) λ Observed wavelength Λ ( z ) ν Rest wavelength Λ rest ( z ) = Λ ( z )/( + z ) z Redshift Π ( z ) = Λ ( z )/ λ Ly α − ζ Log-wavelength (scaled) Z ( z ) = · log ( Λ ( z )) Table 1.

Various input spaces utilized for the Ly α forest analysis.Notation of functions is held constant, e.g. δ F (·) , and an alterationof the input variable implicitly indicates a change of input spaces.Logarithmic wavelengths are scaled for numerical stability of thetrend ﬁltering optimization algorithm. ( λ Ly β , λ Ly α ) · ( + z ) is the redshifted Ly α forest, C ( λ ) is theﬂux of the unabsorbed quasar continuum, F ( λ ) = f ( λ )/ C ( λ ) is the transmitted ﬂux fraction, F ( λ ) = E [ F ( λ )] is the meantransmitted ﬂux fraction (over the sky) in the Ly α forest atredshift z = λ / λ Ly α − , and δ F ( λ ) = F ( λ )/ F ( λ ) − (4)is the ﬂuctuation about the mean Ly α transmitted ﬂux atredshift z = λ / λ Ly α − . Here, δ F is the quantity we areprimarily interested in estimating since δ F ∝ δ − H I at eachﬁxed redshift, where δ H I is the density of H I . The estimationof the ﬂux signal f is viewed as an ancillary step.Although, in principle, it is preferable to study the fullspectral range Λ ( z ) we have found that, in the nonpara-metric setting, estimating the quasar continuum near thelocalized Ly α and Ly β emission peaks at the boundaries ofthe Ly α forest reduces the estimation accuracy in the inte-rior of Λ ( z ) . Therefore, in this work we limit our analysisto the truncated Ly α forest range Λ ( z ) = ( ˚A , ˚A ) · ( + z ) . (5)We simplify notation in this work by changing the inputspace of the functions introduced above by merely alteringthe input variable. For example, with respect to δ F , we main-tain the notation δ F (·) for all inputs λ , ν , z , ζ , while it isunderstood that a proper change of input spaces has takenplace. The various input spaces are deﬁned in Table 1. We use quadratic trend ﬁltering (Tibshirani & Taylor 2011;Tibshirani 2014) to estimate the ﬂux signal f of the obser-vational model (2). In both BOSS DR12 and the Bautistaet al. (2015) mock catalog, the quasar spectra are sampledon equally-spaced grids in logarithmic wavelength space with ∆ log ( λ i ) = − dex (in logarithmic angstroms). Further-more, ﬂux measurement variances are provided by the BOSSpipeline (Bolton et al. 2012), accounting for the statisticaluncertainty introduced by photon noise, CCD read noise,and sky-subtraction error. We correct the BOSS spectrumfor interstellar extinction with the Cardelli et al. (1989) ex-tinction law and the Schlaﬂy & Finkbeiner (2011) dust map.We ﬁt the trend ﬁltering estimator on the equally-spaced logarithmic grid and tune the complexity by mini-mizing Stein’s unbiased risk estimate (SURE) of the ﬁxed-input mean-squared error (see Paper I). More precisely, weﬁt the trend ﬁltering estimator in the input space Z ( z ) = · log ( Λ ( z )) , as deﬁned in Table 1, where we add the scaling to unit spacing for numerical stability of the trendﬁltering convex optimization. We utilize a modiﬁed Croft et al. (2002) approach to prop-agate the trend ﬁltering estimate for the ﬂux signal f fromSection 2.2 into an estimate for the ﬂuctuation ﬁeld δ F alonga line of sight to an observed quasar. Namely, given the trendﬁltering estimate (cid:98) f , we directly estimate the smooth meanﬂux level deﬁned as the product m = F · C and then deﬁne the δ F estimates via the transformation (cid:98) δ F : = (cid:98) f / (cid:98) m − . We carryout the estimation of m via a wide-kernel LOESS smooth ofthe trend ﬁltering estimate, with the speciﬁc bandwidth ofthe kernel selected by optimizing over a large sample of mockspectra (detailed in Section 2.4). We ﬁnd that regressing onthe ﬁtted values of the trend ﬁltering estimate—instead ofthe observational DGP (2)—signiﬁcantly improves the ac-curacy and robustness of the δ F estimates. We carry outthe LOESS estimation in the Ly α restframe Λ rest ( z ) (seeTable 1) in order to remove the eﬀect of redshifting on thesmoothness of m . The LOESS estimation of m is a fully non-parametric procedure and provides a reduction in bias overpopular parametric approaches such as low-order power lawsand principal components analyses (PCA). The sole assump-tion of our LOESS approach is that, in the restframe, themean ﬂux level m always has a ﬁxed degree of smoothness,deﬁned by an optimal ﬁxed kernel bandwidth. There is ofcourse a bias-variance tradeoﬀ here; namely, the decreasedbias comes with a modest increase in variance comparedto a parametric power law or low-dimensional PCA model.Our decision in favor of the LOESS approach directly re-ﬂects our stance that low bias is preferable to low variancein this context since statistical uncertainty due to estima-tor variability is tracked by our uncertainty quantiﬁcation(Section 2.5), while uncertainty due to modeling bias is noteasily quantiﬁable. Therefore, we can be more conﬁdent thatsigniﬁcant ﬂuctuations in the estimated δ F ﬁeld are in factreal, and not due to statistical bias in the quasar continuumestimate.To be explicit, the LOESS estimator for m is a regressionon the data set {( ν i , (cid:98) f ( ν i ; (cid:98) γ )} ni = , ν i ∈ Λ rest ( z ) , (6)which can be viewed as arising from the DGP (cid:98) f ( ν i ; (cid:98) γ ) = m ( ν i ) + ρ i , (7)where (cid:98) f is the trend ﬁltering estimate ﬁxed at the minimumSURE hyperparameter (cid:98) γ , e i = (cid:98) f ( ν i ; (cid:98) γ ) − f ( ν i ) are the errorsof the trend ﬁltering estimate, and ρ i = m ( ν i ) · δ F ( ν i ) + e i areautocorrelated ﬂuctuations about zero. The LOESS estima-tor is the natural extension of kernel regression (Nadaraya1964; Watson 1964) to higher-order local polynomials. Givena kernel function K (·) with bandwidth h > , for each i = , . . . , n , the LOESS estimator is obtained by minimiz-ing n (cid:213) j = (cid:16) (cid:98) f ( ν j ; (cid:98) γ ) − φ ν i ( ν j ; β , . . . , β d ) (cid:17) K (cid:32) | ν j − ν i | h (cid:33) , (8)and letting (cid:98) m ( ν i ) = (cid:98) β , where φ ν i (· ; β , . . . , β d ) is a d th orderpolynomial centered at ν i . Speciﬁcally, we utilize the local MNRAS000

C. A. Politsch et al. linear regression estimator (LLR; d = ) and the Epanech-nikov kernel (Epanechnikov 1967) K ( t ) = ( − t ) {| t | < } . (9)The LLR estimator is described in full detail by Algorithm 1.Given the trend ﬁltering estimate (cid:98) f and the LLR estimate (cid:98) m , the δ F estimates are then deﬁned as (cid:98) δ F ( z i ; (cid:98) γ ) = (cid:98) f ( z i ; (cid:98) γ ) (cid:98) m ( z i ) − , z , . . . , z n ∈ Π ( z ) , (10)where we deliberately express (cid:98) m as “hyperparameter-less”since γ has already been ﬁxed at the minimum SURE value (cid:98) γ and we provide the optimal LOESS bandwidth value h = ˚A—optimized over the large mock sample. Here, we havealso done a change of variables to redshift space—our desiredinput domain for studying the H I density ﬂuctuations in theIGM. Algorithm 1

LOESS (local linear) estimator for mean ﬂux level

Require:

Training Data {( ν i , (cid:98) f ( ν i ; (cid:98) γ ))} ni = , Bandwidth h = ˚A for all i do Let (cid:98) β , (cid:98) β minimize the locally weighted sum ofsquares n (cid:213) j = (cid:16) (cid:98) f ( ν j ; (cid:98) γ ) − β − β ( ν j − ν i ) (cid:17) · K (cid:32) | ν j − ν i | h (cid:33) . Let (cid:98) m ( ν i ) = (cid:98) β ( ν i ) . end forOutput: { (cid:98) m ( ν i )} ni = Although h is chosen to directly optimize (cid:98) δ F accuracy,an estimate for the quasar continuum C (·) arises intrinsically: (cid:98) C ( ν i ) = F ( ν i ) − · (cid:98) m ( ν i ) , ν i ∈ Λ rest ( z ) , (11)where precise estimates of F follow from a rich literature (e.g.Bernardi et al. 2003; McDonald et al. 2005; Faucher-Gigu`ereet al. 2008; Dall’Aglio et al. 2009; Pˆaris et al. 2011; Beckeret al. 2013). The δ F estimates could then be equivalentlyrestated as (cid:98) δ F ( z i ; (cid:98) γ ) = (cid:98) F ( z i ; (cid:98) γ ) F ( z i ) − , z , . . . , z n ∈ Π ( z ) , (12)where (cid:98) F ( z i ; (cid:98) γ ) = (cid:98) f ( z i ; (cid:98) γ )/ (cid:98) C ( z i ) . (13) We utilize a sample of 124,709 mock quasar spectra from theBautista et al. (2015) catalog to optimize the bandwidth ofthe LOESS estimator for the mean ﬂux level that intrinsi-cally removes the eﬀect of the quasar continuum. Our mockdata reduction is detailed in the appendix and the redshiftdistribution of the quasars is shown in the top panel of Fig-ure 1.For each mock quasar Ly α forest with DGP Q j , j = , Figure 1. Top : Distribution of mock quasar redshifts (datareduction detailed in the appendix). We utilize this sample of124,709 quasars to calibrate the optimal nonparametric contin-uum smoothness.

Bottom : Mean absolute deviation error curvefor selecting the optimal kernel bandwidth for the LOESS (locallinear) estimator of the mean ﬂux level, averaged over the 124,709spectra in the mock sample. The optimal choice of bandwidth is h = ˚A. . . . , 124,709, we ﬁrst compute the trend ﬁltering hyperpa-rameter value that minimizes the true ﬁxed-input mean-squared prediction error γ j = arg min γ> n n (cid:213) i = E Q j (cid:104)(cid:0) f ( ζ i ) − (cid:98) f ( ζ i ; γ ) (cid:1) (cid:12)(cid:12)(cid:12) ζ , . . . , ζ n (cid:105) . (14)Then, given the trend ﬁltering restframe ﬁtted values {( ν i , (cid:98) f ( ν i ; γ j )} ni = , we ﬁt a LOESS estimator with bandwidth h and deﬁne the error (as a function of h ) of the resulting δ F estimator as the ﬁxed-input mean absolute deviation (MAD)error R j ( h ) = n (cid:213) i E Q j (cid:104)(cid:12)(cid:12) δ F ( ν i ) − (cid:98) δ F ( ν i ; γ j , h ) (cid:12)(cid:12) (cid:12)(cid:12)(cid:12) ν , . . . , ν n (cid:105) , (15)where E Q j denotes the mathematical expectation over the MNRAS , 1–15 (2020) enoising Astronomical Signals via Trend Filtering randomness arising from the observational DGP Q j . Becausewe can repeatedly sample from each mock quasar DGP, theexpectations in (14) and (15) can be computed to an ar-bitrary precision. Here, we utilize 300 realizations of eachDGP to approximate the mathematical expectations.We then deﬁne the optimal choice of h as the minimizerof the summed error over the full sample of m = h = arg min h > m (cid:213) j = R j ( h ) . (16)The aggregate error curve is shown in the bottom panel ofFigure 1, yielding an optimal value of h = ˚A. We ﬁndthat deﬁning R ( h ) as the conditional MAD error—insteadof the conditional MSE—provides an essential boost in ro-bustness that keeps the error from being dominated by avery small proportion of worst-case estimates. More complexchoices of bandwidth, e.g. variable bandwidths that dependon the S/N ratio of the trend ﬁltering estimator, the rest-frame pixel spacing, and/or the redshift of the quasar, donot signiﬁcantly improve upon the h = ˚A restframe ﬁxedbandwidth. Given the assumed noise model (cid:15) i ∼ N ( , σ i ) , i = , . . . , n provided by the BOSS pipeline (Bolton et al. 2012), we canconstruct a pointwise variability band for (cid:98) δ F via an augmen-tation of the parametric bootstrap outlined in Algorithm 2of Paper I. Speciﬁcally, given the parametric bootstrap en-semble of trend ﬁltering estimates provided by the paramet-ric bootstrap algorithm, we ﬁt the mean ﬂux level of eachwith the LOESS estimator detailed in Algorithm 1, and thendeﬁne the (cid:98) δ F bootstrap ensemble (cid:98) δ ∗ F , b ( z i ) = (cid:98) f ∗ b ( z i ) (cid:98) m ∗ b ( z i ) − , i = , . . . , n , b = , . . . , B , (17)where, for each b = , . . . , B , (cid:98) m ∗ b is the LOESS estimate ﬁt tothe data set (cid:8) ( ν i , (cid:98) f ∗ b ( ν i )) (cid:9) ni = . Note that reﬁtting the LOESSestimator on each bootstrap realization allows us to trackthe extra variability introduced into the δ F estimates bythe uncertainty in the continuum estimate. A ( − α ) · quantile-based pointwise variability band for (cid:98) δ F is then givenby V − α ( z i ) = (cid:16)(cid:98) δ ∗ F ,α / ( z i ) , (cid:98) δ ∗ F , − α / ( z i ) (cid:17) , i = , . . . , n . (18)where (cid:98) δ ∗ F ,β ( z i ) = inf g (cid:40) g : 1 B B (cid:213) b = (cid:8)(cid:98) δ ∗ F , b ( z i ) ≤ g (cid:9) ≥ β (cid:41) , (19)for any β ∈ ( , ) . A mock quasar spectrum from Bautista et al. (2015) anda real quasar spectrum from the Baryon Oscillation Spec-troscopic Survey (BOSS; Alam et al. 2015) are displayed inFigure 2, with the results of our Ly α forest analysis overlaid. Starting with the top panel, the quadratic trend ﬁltering es-timate is ﬁt on the equally-spaced observations in logarith-mic wavelength space and then transformed to the restframewavelength space, where the LOESS estimate for the meanﬂux level—the product of the continuum and cosmic meanLy α absorption—is then ﬁt to the trend ﬁltering ﬁtted val-ues. The δ F estimates are then computed according to (10)and displayed in redshift space in the second panel, wherethey closely track the true δ F deﬁned by (4). Furthermore,the 95% bootstrap variability band deﬁned in (18) can beseen to almost fully cover the true δ F signal. The estimated δ F can be interpreted as an inversely proportional proxy forthe deviations from the mean H I density in the interveningintergalactic medium at each redshift—with negative valuesof δ F corresponding to (epoch-relative) over-densities of H I and positive values corresponding to under-densities.The third and fourth panels are the analogous plots fora BOSS quasar spectrum Ly α forest (Data Release 12, Plate= 6487, MJD = 56362, Fiber = 647). The quasar is locatedin the northern galactic cap at (RA, Dec, z ) ≈ (196.680 ◦ ,31.078 ◦ , 2.560). The analysis of signals possessing varying degrees of smooth-ness permeates many areas of astronomy. In this section, wediscuss a variety of further applications for which trend ﬁl-tering may ﬁnd use. For the sake of brevity, we discuss theseapplications in less detail than the Ly α forest analysis in Sec-tion 2. Naturally, we expect trend ﬁltering may ﬁnd manyuses in astronomy beyond those we explicitly discuss. Automated spectral classiﬁcation and redshift estimationpipelines require rich template libraries that span the fullspace of physical objects in the targeted set in order toachieve high statistical accuracy. In this section, we discussusing trend ﬁltering to construct spectral templates fromobservational spectra. We describe the procedure here forgenerating a single spectral template from a well-sampledobserved spectrum. Suppose we observe coadded ﬂux mea-surements of a targeted source at wavelengths λ , . . . , λ n ∈ Λ according to the data generating process f ( λ i ) = f ( λ i ) + (cid:15) i , i = , . . . , n , (20)where the observations have been corrected for systematiceﬀects (e.g., sky subtraction, interstellar extinction, etc.)and the (cid:15) i are mean-zero errors that arise from instrumentaluncertainty and systematic miscalibrations. After removal ofpotentially problematic pixels (e.g., near bright sky lines),let (cid:98) f ( λ ) , λ ∈ Λ denote the continuous trend ﬁltering esti-mate ﬁt to the observations. Given a conﬁdent object classi-ﬁcation and redshift estimate z (e.g., determined by visualinspection), we then deﬁne the restframe spectral template b ( λ rest ) = (cid:98) f ( λ /( z + )) , λ rest ∈ Λ /( z + ) , (21)and store it in the respective object template library.In Figure 3, we show three optical coadded spectra from MNRAS000

C. A. Politsch et al.

Figure 2.

Results of Ly α forest analysis. Top panel : Ly α forest of a mock quasar spectrum (Bautista et al. 2015) in the restframe,with the quadratic trend ﬁltering estimate shown in orange and the LOESS (local linear) estimate for the mean ﬂux level shown in blue. Second panel : The redshift-space ﬂuctuations in the Ly α transmitted ﬂux fraction, with our estimate superposed. The ﬂuctuationsinversely trace the relative under- and over-densities of H I in the intervening intergalactic medium between Earth and the quasar. Thirdand Fourth panels : Analogous plots for a real quasar Ly α forest from the twelfth data release of the Baryon Oscillation SpectroscopicSurvey (Alam et al. 2015; Plate = 6487, MJD = 56362, Fiber = 647). The quasar is located at (RA, Dec, z ) ≈ (196.680 ◦ , 31.078 ◦ , 2.560).MNRAS , 1–15 (2020) enoising Astronomical Signals via Trend Filtering Figure 3.

Optical coadded spectra collected by the Baryon Oscillation Spectroscopic Survey (Alam et al. 2015) of the Sloan Digital SkySurvey III (Eisenstein et al. 2011; Aihara et al. 2011). From top to bottom, a quasar (DR12, Plate = 7140, MJD = 56569, Fiber = 58)located at (RA, Dec, z ) ≈ (349.737 ◦ , 33.414 ◦ , . ), a galaxy (DR12, Plate = 7140, MJD = 56569, Fiber = 68) located at (RA, Dec, z ) ≈ (349.374 ◦ , 33.617 ◦ , . ), and a star (DR12, Plate = 4055, MJD = 55359, Fiber = 84) located at (RA, Dec, z ) ≈ (236.834 ◦ , 0.680 ◦ , . ). We ﬁt a quadratic trend ﬁltering estimate to each spectrum in the logarithmic wavelength space in which the observations areequally spaced, and optimize the hyperparameter by minimizing Stein’s unbiased risk estimate. Given conﬁdently determined redshifts(e.g., determined by visual inspection), the trend ﬁltering estimate for each object can be scaled to the restframe and stored as a spectraltemplate. Furthermore, emission-line parameter estimates for a spectrum can be obtained by ﬁtting Gaussian radial basis functions tothe emission lines of the trend ﬁltering estimate.MNRAS000 , 1–15 (2020) C. A. Politsch et al. the twelfth data release of the Baryon Oscillation Spectro-scopic Survey (BOSS DR12; Alam et al. 2015) of the SloanDigital Sky Survey III (SDSS-III; Eisenstein et al. 2011; Ai-hara et al. 2011). The ﬁgure is zoomed to a subinterval ofthe optical range for visual clarity and the spectra are easilyidentiﬁable as a quasar, a galaxy, and a star, respectively.We ﬁt an error-weighted quadratic trend ﬁltering estimateto each spectrum in the logarithmic-angstrom wavelengthspace in which the BOSS spectra are gridded, and tune thehyperparameter to minimize Stein’s unbiased risk estimate(see Section 3.5 of Paper I). The trend ﬁltering estimatesgive good results, adequately adapting to even the strongestemission and absorption features in each spectrum.As in the BOSS spectroscopic pipeline (Bolton et al.2012), after a spectral classiﬁcation and redshift measure-ment have been precisely determined, emission-line parame-ter estimates can then be obtained by ﬁtting Gaussian radialbasis functions to the emission lines of the spectrum “bestﬁt”—nonlinearly optimizing the amplitudes, centroids, andwidths. We propose that the trend ﬁltering estimate of aspectrum be used as the “best ﬁt” for this type of proce-dure, e.g. instead of the low-dimensional principal compo-nents models currently used by the BOSS pipeline. The rel-ative magnitudes of the emission-line parameter estimatescan then be used to determine object subclassiﬁcations.

In this section we discuss the use of trend ﬁltering for model-ing the photometric time series of an extrasolar planet tran-siting its host star. Given a phase-folded stellar light curve,corrected for stellar variability and spacecraft motion sys-tematics, trend ﬁltering can be used to automatically pro-duce fully nonparametric estimates and uncertainties for thetransit depth and total transit duration.We demonstrate our approach on long-cadence photo-metric observations of Kepler-10 (KOI-72, KIC 11904151;Borucki et al. 2010). Kepler-10 is conﬁrmed to host atleast two exoplanets: Kepler-10b (KOI-72 b, KIC 11904151b; Batalha et al. 2011) and Kepler-10c (KOI-72 c, KIC11904151 c; Fressin et al. 2011). Each planet was ﬁrst de-tected via the transit method—a measurable periodic dim-ming in the photometry caused by the planet crossing infront of the host star. Here, we use trend ﬁltering to es-timate a nonparametric transit model for Kepler-10c andderive depth and duration measurements. The results ofour analysis are displayed in Figure 4. The top panel dis-plays a sample of the Kepler-10 long-cadence (30-min. in-crement), quarter-stitched, median detrended, relative ﬂuxlight curve processed by the Kepler pipeline (Jenkins et al.2010), which we accessed from the NASA Exoplanet Archive(Akeson et al. 2013). The observed transit events of Kepler-10c are indicated by the vertical dashed lines. The middlepanel displays the light curve (with σ error bars) afterphase-folding with respect to the ∼ K -fold cross validation. The optimal relaxationhyperparameter is (cid:98) φ = . , indicating that relaxation sig-niﬁcantly improves the traditional trend ﬁltering estimate in this setting. In particular, we ﬁnd that relaxation allowsthe estimate to faithfully capture the sharp transitions cor-responding to the beginning of the ingress phase and theend of the egress phase. The relaxed trend ﬁltering estimateis overlaid on the phase-folded light curve, along with 95%variability bands. Estimates for the transit depth and totaltransit duration follow immediately from the relaxed trendﬁltering estimate, as detailed below. The estimated incep-tion and termination of the transit event are indicated bythe vertical dashed lines in the middle panel. The nonpara-metric bootstrap sampling distributions (see Algorithm 1of Paper I) of the transit depth and total transit durationmeasurements are displayed in the bottom panel. Our pointestimates for the transit depth and total transit durationfor Kepler-10c are (cid:98) δ = . ppm and (cid:98) T = . hours,respectively.The knot-selection property of trend ﬁltering is partic-ularly appealing in this setting because it provides interpre-tation as to where the transit event begins and ends. Specif-ically, we deﬁne our estimate of the inception of the transitevent (cid:98) T as the leftmost knot selected by the trend ﬁlteringestimator and we deﬁne our estimate of the termination ofthe transit event (cid:98) T as the rightmost knot . The total transitduration estimate then follows as (cid:98) T = (cid:98) T − (cid:98) T . Naturally, wedeﬁne our transit depth estimate (cid:98) δ to be the minimum ofthe relaxed trend ﬁltering estimate.It is thus far unclear to us whether trend ﬁltering canalso reliably detect the end of the ingress and beginningof the egress phases—e.g. via knot selection or examiningchanges in the estimated derivates—and therefore also pro-vide nonparametric ingress/egress duration measurements.We recommend pairing trend ﬁltering with the traditionalanalytical transit model search (e.g., Mandel & Agol 2002)for these particular measurements. That is, given the transitdepth and total transit duration measurements provided bytrend ﬁltering, an analytical planet model can be ﬁt overa parameter space that is constrained by the trend ﬁlter-ing parameter measurements. Coupling the two methods inthis way therefore also provides signiﬁcant computationalspeedups over a traditional single-stage analytical modelsearch since it greatly reduces the dimensionality of the pa-rameter space to be searched over. Furthermore, the relaxedtrend ﬁltering estimate may provide a benchmark χ statis-tic for the constrained analytical model comparison.A dedicated paper on this application of trend ﬁlteringis forthcoming. In this section we discuss how trend ﬁltering can furtherimprove the work of the Kepler Eclipsing Binary workinggroup, speciﬁcally in regard to the Eclipsing Binaries viaArtiﬁcial Intelligence (EBAI) pipeline used to characterizeKepler eclipsing binary stars via their phase-folded lightcurves (Prˇsa et al. 2008, 2011; Slawson et al. 2011; Slaw-son, Prˇsa, Welsh, Orosz, Rucker, Batalha, Doyle, Engle,Conroy, Coughlin & et al. Mat). The EBAI pipeline uti-lizes an artiﬁcial neural network (ANN) to estimate a set Note that the boundary points of the input space are not con-sidered knots. MNRAS , 1–15 (2020) enoising Astronomical Signals via Trend Filtering Figure 4.

Kepler-10c transit light curve analysis.

Top : Long-cadence (30-min.), quarter-stitched, median detrended, relative ﬂux lightcurve (

LC_DETREND ) of the conﬁrmed exoplanet host Kepler-10 (KOI-072, KIC 11904151; Batalha et al. 2011), processed by the Keplerpipeline (Jenkins et al. 2010) and obtained from the NASA Exoplanet Archive (Akeson et al. 2013). Vertical lines indicate the observedtransit events of the system’s second conﬁrmed planet Kepler-10c (KOI-072 c, KIC 11904151 c; Fressin et al. 2011).

Middle : Phase-foldedtransit light curve for Kepler-10c ( ∼ σ error bars. The error-weighted relaxed trend ﬁltering estimate,optimized by sequential K -fold cross validation, is superposed with 95% variability bands. The estimated inception and termination ofthe transit event are indicated by the vertical dashed lines. The estimated transit depth and total transit duration are (cid:98) δ = . ppmand (cid:98) T = . hours, respectively. Bottom : Bootstrap sampling distributions of the transit depth and transit duration estimates.MNRAS000

Figure 5.

Long-cadence (30-min. increment), detrended, median-normalized light curve of a Kepler eclipsing binary system (KIC 6048106;Borucki et al. 2010; Prˇsa et al. 2011). The vertical red lines mark the primary eclipses (the eclipses of the hotter star) and the verticalblue lines mark the secondary eclipses (the eclipses of the cooler star). KIC 6048106 has an orbital period of ∼ of physical parameters for each binary pair (the tempera-ture ratio, sum of fractional radii, photometric mass ratio,radial and tangential components of the eccentricity, ﬁlloutfactor, and inclination) from the observables of the phase-folded light curve (e.g., the eclipse widths, depths, and sep-arations). Prˇsa et al. (2008) outline the EBAI light curvepre-processing algorithm, which they call polyfit , that pro-vides the crucial step of taking a noisy, irregularly-spaced,phase-folded light curve (detrended for spacecraft motionand normalized by the median ﬂux) and outputting a de-noised and gridded phase-folded light curve, which is thenfed to the ANN. We propose that trend ﬁltering be used forthis pre-processing step instead of the polyfit algorithm forthe reasons detailed below.An eclipsing binary (EB) light curve is characterized byperiodic dips in the observed brightness that correspond toeclipse events along the line of sight to an observer. In par-ticular, there are two eclipses per orbital period—a primaryand a secondary eclipse. The primary eclipse occurs whenthe hotter star is eclipsed by the cooler star and produces acomparatively deep dip in observed brightness. Conversely,the secondary eclipse occurs when the cooler star is eclipsedby the hotter star and produces a comparatively shallow dipin the observed brightness. Depending on the eﬀective tem-perature ratio and orbital period of the EB, the dips mayrange from very narrow and abrupt to very wide and smooth.In Figure 5, we display a detrended, median-normalized,long-cadence (30 min. increment) light curve of a Kepler EB(KIC 6048106; Borucki et al. 2010; Prˇsa et al. 2011), withthe primary eclipses and secondary eclipses designated bydashed red and blue lines, respectively. The orbital periodof KIC 6048106 is ∼ Phase = 0 ,the purpose of the EBAI light-curve pre-processing step is tofaithfully extract the signal of the phase-folded light curveand evaluate it on a regular grid so that it can then be inputinto the EBAI ANN. We show a comparison of the poly-fit algorithm of Prˇsa et al. (2008) and our trend ﬁlteringapproach in Figure 6 on the phase-folded KIC 6048106 lightcurve. We choose a relatively high S/N light curve here in or- der to elucidate the signiﬁcant statistical bias that underlies polyfit . The polyfit algorithm ﬁts a piecewise quadraticpolynomial by weighted least-squares with four knots se-lected by a randomized computational search over the phasespace. The piecewise quadratic polynomial is forced to becontinuous, but no smoothness constraints are placed on thederivatives of the estimate at the knots. Recalling our dis-cussion in Section 2.1 of Paper I, this overly-stringent mod-eling assumption leads to signiﬁcant statistical bias in thelight-curve estimate. The bias is particularly apparent byexamining the residuals of the polyfit estimate, which wedisplay below the light curve. Moreover, recalling our dis-cussion of variable-knot regression splines in Section 3.1.1 ofPaper I, a randomized partial search over the space of feasi-ble knots inherently provides no guarantee of ﬁnding a globalsolution—leaving the algorithm susceptible to extreme fail-ure scenarios. We display a quadratic trend ﬁltering estimateof the KIC 6048106 phase-folded light curve in the bottompanel of Figure 6, with the hyperparameter chosen by K -fold cross validation. The trend ﬁltering estimate accuratelyrecovers the signal of the light curve (clearly apparent hereby examining a high S/N ratio light curve) and produces adesirable random residual scatter about zero. Since the sta-tistical bias introduced by the polyfit pre-processing stagepropagates through to the EBAI ANN as systematic bias inthe input data, we are conﬁdent that the use of trend ﬁl-tering will in turn improve the error rate of the EBAI ANNoutput-parameter estimates. In this section we demonstrate the use of trend ﬁltering forgenerating light-curve templates of supernova (SN) eventsand estimating observable parameters. We illustrate our ap-proach on SN 2016coi (ASASSN-16fp; Holoien et al. 2016)by constructing a B-band light-curve template from the well-sampled observations of Brown et al. (2014) and Prenticeet al. (2018) and deriving nonparametric estimates and fulluncertainty distributions for the maximum apparent bright-ness, the time of maximum, and the decline rate param-

MNRAS , 1–15 (2020) enoising Astronomical Signals via Trend Filtering Figure 6.

Comparison of the polyfit algorithm of Prˇsa et al. (2008) and our trend ﬁltering approach for denoising phase-folded eclipsingbinary light curves. The light curve shown in this example comes from the Kepler eclipsing binary system KIC 6048106 (Borucki et al.2010; Prˇsa et al. 2011).

Top : The polyfit algorithm ﬁts a piecewise quadratic polynomial by weighted least-squares with four knotsselected by a randomized search over the phase space. The estimate is constrained to be continuous but no constraints are enforcedon the derivatives at the knots. The overly-stringent assumed model leads to signiﬁcant statistical bias, which is readily apparent byexamining the autocorrelation in the residuals.

Bottom : Trend ﬁltering is suﬃciently ﬂexible to accurately denoise the diverse set ofsignals observed in phase-folded eclipsing binary light curves. Here, the goodness-of-ﬁt is clear by the random, mean-zero residual scatter.MNRAS000 , 1–15 (2020) C. A. Politsch et al. eter ∆ m ( B ) introduced by Phillips (1993). The improve-ment yielded by trend ﬁltering as a tool for SN light-curvetemplate generation, compared to, for example, the SNooPy cubic smoothing spline approach of the Carnegie SupernovaProject (CSP; Contreras et al. 2010; Burns et al. 2010; Burnset al. 2014) primarily corresponds to light curves with an es-pecially bright peak magnitude and fast decline rate. In suchcases, trend ﬁltering is better able to recover the abruptnessof the peak, the initial sharp decline, and the subsequentslow decay. This behavior is particularly characteristic ofType Ia SNe (e.g., Woosley et al. 2007). In cases wherethe peak is not particularly prominent, trend ﬁltering andcubic smoothing splines produce nearly identical estimates.Our procedure for generating the SN light-curve templatesrequires reasonably well-sampled observations (in particu-lar, with the initial observation occurring before maximumlight). The resulting template libraries can then be used toclassify SNe with partially-sampled light curves and deriveparameter estimates (e.g., Belokurov et al. 2004).The SN light-curve template generation procedure isanalogous to the spectral template generation procedure dis-cussed in Section 3.1, so we discuss it in less formal detailhere. Naturally, the same procedure can also be implementedfor generating ﬁxed-time spectral templates of SN events.Given a well-sampled light curve, corrected for systematiceﬀects (e.g., K -corrections and interstellar reddening), wepropose the use of quadratic trend ﬁltering to generate a“best ﬁt” to the observations. Given a conﬁdent type classi-ﬁcation, the trend ﬁltering estimate can then be stored as alight-curve template in the respective library.We show a B-band light curve for SN 2016coi (ASASSN-16fp; Holoien et al. 2016) in the top panel of Figure 7. Thelight curve is an aggregation of observations collected byBrown et al. (2014) and Prentice et al. (2018), which we ac-cessed from the Open Supernova Catalog (Guillochon et al.2017). SN 2016coi was discovered on May 27, 2016 (UT 2016-05-27.55) by the All Sky Automated Survey for SuperNovae(ASAS-SN; Shappee et al. 2014) in the galaxy UGC 11868at redshift z ≈ . . The classiﬁcation of SN 2016coi cur-rently remains intermediate between Type Ib and Type Ic(Elias-Rosa et al. 2016; Kumar et al. 2018; Prentice et al.2018; Terreran et al. 2019). We ﬁt a quadratic trend ﬁlter-ing estimate to the observations, weighted by measurementuncertainties and with the hyperparameter selected by K -fold cross validation. The trend ﬁltering estimate, along with95% nonparametric bootstrap variability bands, is overlaidin the top panel of the ﬁgure. In the bottom panels we showthe univariate nonparametric bootstrap sampling distribu-tions for the estimates of the maximum apparent magnitude,the time of maximum, and the decline rate—deﬁned as therelative change in the B-magnitude light curve from max-imum light to the magnitude 15 days after the maximum(Phillips 1993). We also show a bivariate bootstrap sam-pling distribution for maximum apparent magnitude versusdecline rate. The bimodality in the bootstrap sampling dis-tributions arises from systematic discrepancies between thetwo separate sets of B-band observations that form the lightcurve (Brown et al. 2014; Prentice et al. 2018). Although the primary focus of this paper is the use of trendﬁltering as a tool for astronomical data analysis, it also pos-sesses potential utility for large-scale reduction and com-pression of one-dimensional data sets. This owes to severalfactors: its speed, scalability, ﬂexibility, and representationas a sum of basis functions with a sparse coeﬃcient vector.Given a one-dimensional set of n observations ( t , f ( t )) , . . . , ( t n , f ( t n )) ∈ ( a , b ) × R , trend ﬁltering can quicklyprovide a ﬂexible, lower-dimensional approximation of thedata where the dimensionality is controlled by the choice ofthe hyperparameter γ . In this context γ is a subjective choicethat speciﬁes the amount of (lossy) compression desired—unrelated to the discussion in Section 3.5 of Paper I. For anygiven choice of γ , let p be the number of knots selected bythe trend ﬁltering estimator. The corresponding continuous-time representation of this lower-dimensional approximationis fully encoded by the knot locations and the sparse basisvector with p + k + nonzero entries, which can be stored eﬃ-ciently. The falling factorial basis then serves as the “dictio-nary” from which the continuous-time representation can belosslessly recovered. Gridded uncertainty measurements forthe reduced observations can also be computed and stored,though not in a sparse format, via the methods discussed inSection 3.6 of Paper I. In order to illustrate the broad utility of trend ﬁltering toastronomy, we demonstrated its promising performance on awide variety of problems across time-domain astronomy andastronomical spectroscopy. We studied the Lyman- α forestof quasar spectra with the most depth—using trend ﬁlter-ing to map the relative distribution of neutral hydrogen inthe high redshift intergalactic medium along quasar-observerlines of sight. Furthermore, we discussed how trend ﬁlter-ing can be used to (1) generate galaxy, quasar, and stel-lar spectral templates and estimate emission-line parame-ters; (2) produce nonparametric models for exoplanet tran-sit events in phase-folded stellar light curves, providing es-timates for the transit depth and total duration; (3) im-prove upon the polyfit algorithm utilized by the KeplerEclipsing Binary via Artiﬁcial Intelligence (EBAI) pipelinefor denoising phase-folded eclipsing binary light curves (asa preliminary step to estimating the physical parameters);(4) generate supernova light-curve templates and producenonparametric estimates of the maximum apparent magni-tude, the time of maximum, and the decline rate; (5) quicklyand eﬃciently compress large collections of one-dimensionaldata sets. Naturally, we expect trend ﬁltering will ﬁnd usesin astronomy beyond those that we explicitly discussed. ACKNOWLEDGEMENTS

We gratefully thank Ryan Tibshirani for his inspiration andgenerous feedback on this topic. This work was partiallysupported by NASA ATP grant NNX17AK56G, NASA ATPgrant 80NSSC18K1015, and NSF grant AST1615940.This research contains data collected by the Sloan Dig-ital Sky Survey III (SDSS-III). Funding for SDSS-III has

MNRAS , 1–15 (2020) enoising Astronomical Signals via Trend Filtering Figure 7.

SN light-curve analysis (SN 2016coi).

Top : B-band photometry of the supernova SN 2016coi (ASASSN-16 fp; Holoien et al.2016) discovered on May 27, 2016 by the All Sky Automated Survey for SuperNovae (ASAS-SN; Shappee et al. 2014) in the galaxy UGC11868 at redshift z ≈ . . We ﬁt a quadratic trend ﬁltering estimate, tuned by K -fold cross validation, and overlay 95% nonparametricbootstrap variability bands. Bottom : Univariate/bivariate nonparametric bootstrap sampling distributions of the observable parameterestimates derived from the trend ﬁltering light-curve estimate. The bimodality in the bootstrap parameter distributions arises fromsystematic discrepancies between the observations of the two separate observers (Brown et al. 2014; Prentice et al. 2018).MNRAS000 , 1–15 (2020) C. A. Politsch et al. been provided by the Alfred P. Sloan Foundation, the Par-ticipating Institutions, the National Science Foundation, andthe U.S. Department of Energy Oﬃce of Science. The SDSS-III web site is . SDSS-III is man-aged by the Astrophysical Research Consortium for the Par-ticipating Institutions of the SDSS-III Collaboration includ-ing the University of Arizona, the Brazilian ParticipationGroup, Brookhaven National Laboratory, Carnegie MellonUniversity, University of Florida, the French ParticipationGroup, the German Participation Group, Harvard Univer-sity, the Instituto de Astroﬁsica de Canarias, the MichiganState/Notre Dame/JINA Participation Group, Johns Hop-kins University, Lawrence Berkeley National Laboratory,Max Planck Institute for Astrophysics, Max Planck Insti-tute for Extraterrestrial Physics, New Mexico State Univer-sity, New York University, Ohio State University, Pennsyl-vania State University, University of Portsmouth, PrincetonUniversity, the Spanish Participation Group, University ofTokyo, University of Utah, Vanderbilt University, Univer-sity of Virginia, University of Washington, and Yale Univer-sity. This research has made use of the NASA ExoplanetArchive, which is operated by the California Institute ofTechnology, under contract with the National Aeronauticsand Space Administration under the Exoplanet ExplorationProgram. This paper includes data collected by the Keplermission. Funding for the Kepler mission is provided by theNASA Science Mission Directorate. This research containsdata that was accessed from the Open Supernova Catalog,a centralized, open repository for supernova metadata, lightcurves, and spectra. The Open Supernova Catalog web siteis https://sne.space . The original sources of the supernovalight curve studied in this paper are credited in the text.

REFERENCES

Aihara H., et al., 2011, ApJS, 193, 29Akeson R. L., et al., 2013, PASP, 125, 989Alam S., et al., 2015, ApJS, 219Arnold T. B., Sadhanala V., Tibshirani R. J., 2014, Fast algo-rithms for generalized lasso problems, https://github.com/glmgen

Batalha N. M., et al., 2011, ApJ, 729, 27Bautista J. E., et al., 2015, JCAP, 1505Becker G. D., Hewett P. C., Worseck G., Prochaska J. X., 2013,MNRAS, 430, 2067Belokurov V., Evans N. W., Le Du Y., 2004, MNRAS, 352, 233Bernardi M., et al., 2003, AJ, 125, 32Bolton A. S., et al., 2012, AJ, 144, 144Borucki W. J., et al., 2010, Science, 327, 977Brown P. J., Breeveld A. A., Holland S., Kuin P., Pritchard T.,2014, Ap&SS, 354, 89Burns C. R., et al., 2010, AJ, 141, 19Burns C. R., et al., 2014, ApJ, 789, 32Cardelli J., Clayton G., Mathis J., 1989, ApJ, 345, 245Cleveland W. S., 1979, Journal of the American Statistical Asso-ciation, 74, 829Contreras C., et al., 2010, AJ, 139, 519Croft R. A. C., Weinberg D. H., Katz N., Hernquist L., 1997,ApJ, 488, 532Croft R. A. C., et al., 2002, ApJ, 581Dall’Aglio A., Wisotzki L., Worseck G., 2009Eisenstein D. J., et al., 2011, AJ, 142, 72Elias-Rosa N., et al., 2016, The Astronomer’s Telegram Epanechnikov V., 1967, Theory of Probability & its Applications,14, 153Fan J., Gijbels I., 1996, Local Polynomial Modeling and Its Ap-plications. Chapman and HallFaucher-Gigu`ere C.-A., Lidz A., Hernquist L., Zaldarriaga M.,2008, ApJ, 688, 85Fressin F., et al., 2011, ApJS, 197, 5Gnedin N. Y., Hui L., 1998, MNRAS, 296, 44Guillochon J., Parrent J., Kelley L. Z., Margutti R., 2017, ApJ,835, 64Hellsten U., Hernquist L., Katz N., Weinberg D. H., 1998, ApJ,499, 172Holoien T. W.-S., et al., 2016, The Astronomer’s Telegram, 9086Jenkins J. M., et al., 2010, ApJ, 713, L87Kim T.-S., Viel M., Haehnelt M. G., Carswell R. F., Cristiani S.,2004, MNRAS, 347, 355Kumar B., Singh A., Srivastav S., Sahu D. K., Anupama G. C.,2018, MNRAS, 473, 3776Lee K.-G., et al., 2014, ApJL, 795, L12Loader C., 1999, Local Regression and Likelihood. Springer-VerlagMandel K., Agol E., 2002, ApJ, 580, L171Marziani P., Sulentic J. W., Dultzin-Hacyan D., Calvani M.,Moles M., 1996, ApJS, 104, 37143McDonald P., et al., 2005, ApJ, 635, 761Nadaraya E. A., 1964, Theory of Probability & its Applications,9, 141Nemirovskii A., 1985, Izv. Akad. Nauk. SSSR Tekhn. Kibernet.(in Russian), 3, 50Nemirovskii A., Polyak B., Tsybakov A., 1985, Problems of In-formation Transmission, 21Osterbrock D. E., Ferland G. J., 2006, Astrophysics of gaseousnebulae and active galactic nucleiPalanque-Delabrouille N., et al., 2013, A&A, 559, A85Palanque-Delabrouille N., et al., 2015, JCAP, 2015, 011Pˆaris I., et al., 2011, A&A, 530Peeples M. S., Weinberg D. H., Dav´e R., Fardal M. A., Katz N.,2010, MNRAS, 404, 1281Phillips M. M., 1993, ApJL, 413, L105Pieri M. M., Frank S., Weinberg D. H., Mathur S., York D. G.,2010, ApJL, 724, L69Politsch C. A., Cisewski-Kehe J., Croft R. A. C., Wasserman L.,2020, MNRAS, 492Prentice S. J., et al., 2018, MNRAS, 478, 4162Prˇsa A., Guinan E. F., Devinney E. J., DeGeorge M., BradstreetD. H., Giammarco J. M., Alcock C. R., Engle S. G., 2008,ApJ, 687, 542Prˇsa A., et al., 2011, AJ, 141, 83Ramdas A., Tibshirani R. J., 2016, Journal of Computational andGraphical Statistics, 25, 839Rauch M., 1998, ARA&A, 36, 267Schlaﬂy E., Finkbeiner D., 2011, ApJ, 737, 103Shappee B. J., et al., 2014, ApJ, 788, 48Slawson R. W., et al., 2011, AJ, 142, 160Terreran G., et al., 2019, ApJ, 883, 147Theuns T., Zaroubi S., 2000, MNRAS, 317, 989Tibshirani R. J., 2014, The Annals of Statistics, 42, 285Tibshirani R. J., Taylor J., 2011, The Annals of Statistics, 39,1335Watson G. S., 1964, Sankhy¯a: The Indian Journal of Statistics,Series A, 26, 359Woosley S. E., Kasen D., Blinnikov S., Sorokina E., 2007, ApJ,662, 487 MNRAS , 1–15 (2020) enoising Astronomical Signals via Trend Filtering APPENDIXMock quasar Lyman- α forest reduction The Bautista et al. [2015] mock quasar catalog is designedto mimic the observational data generating processes of thequasar spectra released in Data Release 11 of the BaryonOscillation Spectroscopic Survey (Alam et al. 2015). Wepool the ﬁrst three realizations of the mock catalog, i.e.

M3_0_3/000 , M3_0_3/001 , and

M3_0_3/002 and remove alldamped Ly α systems (DLAs), Lyman-limit systems (LLS),and broad absorption line quasars (BALs). We assume nometal absorption in the Ly α forest and correct estimationand subtraction of the sky. We mask all pixels with and_mask (cid:44) or or_mask (cid:44) . Finally, we retain only the spectra with ≥ pixels in the truncated Ly α forest and those with aﬁxed-input-optimal trend ﬁltering hyperparameter satisfy-ing γ < . . Spectra with γ ≥ . correspond to the verylowest S/N ratio observations, where the trend ﬁltering esti-mate typically reduces to a global power law ﬁt (zero knots).The ﬁnal mock sample contains 124,709 quasar spectra. This paper has been typeset from a TEX/L A TEX ﬁle prepared bythe author.MNRAS000