Baseline Drift Estimation for Air Quality Data Using Quantile Trend Filtering
SSubmitted to the Annals of Applied Statistics
BASELINE DRIFT ESTIMATION FOR AIR QUALITYDATA USING QUANTILE TREND FILTERING
By Halley L. Brantley ∗ , Joseph Guinness † and Eric C. Chi ∗ North Carolina State University ∗ and Cornell University † We address the problem of estimating smoothly varying base-line trends in time series data. This problem arises in a wide rangeof fields, including chemistry, macroeconomics, and medicine; how-ever, our study is motivated by the analysis of data from low costair quality sensors. Our methods extend the quantile trend filter-ing framework to enable the estimation of multiple quantile trendssimultaneously while ensuring that the quantiles do not cross. Tohandle the computational challenge posed by very long time series,we propose a parallelizable alternating direction method of moments(ADMM) algorithm. The ADMM algorthim enables the estimationof trends in a piecewise manner, both reducing the computation timeand extending the limits of the method to larger data sizes. We alsoaddress smoothing parameter selection and propose a modified crite-rion based on the extended Bayesian Information Criterion. Throughsimulation studies and our motivating application to low cost airquality sensor data, we demonstrate that our model provides betterquantile trend estimates than existing methods and improves signalclassification of low-cost air quality sensor output.
1. Introduction.
In the last decade, low cost and portable air qualitysensors have enjoyed dramatically increased usage. These sensors can pro-vide an un-calibrated measure of a variety of pollutants in near real time,but deriving meaningful information from sensor data remains a challenge(Snyder et al., 2013). The “SPod” is a low-cost sensor currently being in-vestigated by researchers at the U.S. Environmental Protection Agency todetect volatile organic compound (VOC) emissions from industrial facilities(Thoma et al., 2016). Due to changes in temperature and relative humiditythe output signal exhibits a slowly varying baseline drift on the order ofminutes to hours. Figure 1 provides an example of measurements from threeSPod sensors co-located at the border of an industrial facility. All of the sen-sors respond to the pollutant signal, which is illustrated by the three sharptransient spikes at 11:32, 14:10, and 16:03. However, the baseline drift variesfrom one sensor to another, obscuring the detection of the peaks that alertthe intrusion of pollutants. We show later that by estimating the baseline
Keywords and phrases: air quality, non-parametric quantile regression, trend estima-tion imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 a r X i v : . [ s t a t . M E ] A p r H. L. BRANTLEY ET AL.
Fig 1: Example of 3 co-located SPod PID sensor readings over a 24 hourperiod.drift in each sensor and removing it from the observed signals, peaks can bereliably detected from concordant residual signals from a collection of SPodsusing a simple data-driven thresholding strategy. Thus, accurately demixinga noisy observed time series into a slowly varying component and a tran-sient component can lead to greatly improved and simplified downstreamanalysis.While this work is motivated by the analysis of data from low cost airquality sensors, the problem of demixing nosy time series into trends andtransients is ubiquitous across many fields of study. In a wide range of appli-cations that spans chemistry (Ning, Selesnick and Duval, 2014), macroeco-nomics (Yamada, 2017), environmental science (Brantley et al., 2014), andmedical sciences (Pettersson et al., 2013; Marandi and Sabzpoushan, 2015),scalar functions of time y ( t ) are observed and assumed to be a superposi-tion of an underlying slowly varying baseline trend θ ( t ), other more rapidlyvarying components s ( t ), and noise. In practice, y ( t ) is observed at discretetime points t , . . . , t n , and we model the vector of samples y i = y ( t i ) as y = θ + s + ε , where θ i = θ ( t i ), s i = s ( t i ), and ε ∈ R n is a vector of uncorrelated noise.For notational simplicity, for the rest of the paper, we assume that the timepoints take on the values t i = i , but it is straightforward to generalize to anarbitrary grid of time points.In some applications, the slowly varying component θ is the signal ofinterest, and the transient component s is a vector of nuisance parameters.In our air quality application, the roles of θ and s are reversed; s representsthe signal of interest and θ represents a baseline drift that obscures the imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 UANTILE TREND FILTERING identification of the important transient events encoded in s .To tackle demixing problems, we introduce a scalable baseline estimationframework by building on (cid:96) -trend filtering , a relatively new nonparametricestimation framework. Our contributions are three-fold. • Kim et al. (2009) proposed using the check function as a possible ex-tension of (cid:96) -trend filtering but did not investigate it further. Here,we develop the basic (cid:96) -quantile-trend-filtering framework and extendit to model multiple quantiles simultaneously with non-crossing con-straints to ensure validity and improve trend estimates . • To reduce computation time and extend the method to long time se-ries, we develop a parallelizable ADMM algorithm. The algorithm pro-ceeds by splitting the time domain into overlapping windows, fittingthe model separately for each of the windows and reconciling estimatesfrom the overlapping intervals. • Finally, we propose a modified criterion for performing model selection.In the rest of the paper, we detail our quantile trend filtering algorithms(Section 2) as well as how to choose the smoothing parameter (Section 3).We demonstrate through simulation studies that our proposed model pro-vides better or comparable estimates of non-parametric quantile trends thanexisting methods (Section 4). We further show that quantile trend filteringis a more effective method of drift removal for low-cost air quality sensorsand results in improved signal classification compared to quantile smoothingsplines (Section 5). Finally, we discuss potential extensions of quantile trendfiltering (Section 6).
2. Baseline Trend Estimation.
Background.
Kim et al. (2009) originally proposed (cid:96) -trend filtering to estimate trends with piecewise polynomial functions, assuming that theobserved time series y consists of a trend θ plus uncorrelated noise ε , namely y = θ + ε . The estimated trend is the solution to the following convexoptimization problemminimize θ (cid:107) y − θ (cid:107) + λ (cid:107) D ( k +1) θ (cid:107) , where λ is a nonnegative regularization parameter, and the matrix D ( k +1) ∈ R ( n − k − × n is the discrete difference operator of order k + 1. To understandthe purpose of penalizing the 1-norm of D ( k +1) θ consider the differenceoperator when k = 0. imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL. D (1) = − · · · − · · · · · · − . Thus, (cid:107) D (1) θ (cid:107) = (cid:80) n − i =1 | θ i − θ i +1 | , which is known as the total varia-tion denoising penalty in one dimension in the signal processing literature(Rudin, Osher and Fatemi, 1992) or the fused lasso penalty in the statis-tics literature (Tibshirani et al., 2005). The penalty term incentivizes so-lutions which are piecewise constant. For k ≥
1, the difference operator D ( k +1) ∈ R ( n − k − × n is defined recursively as follows D ( k +1) = D (1) D ( k ) . Penalizing the 1-norm of the vector D ( k +1) θ produces estimates of θ thatare piecewise polynomials of order k .Tibshirani (2014) proved that with a judicious choice of λ the trend filter-ing estimate converges to the true underlying function at the minimax ratefor functions whose k th derivative is of bounded variation and showed thattrend filtering is locally adaptive when the time series consists of only thetrend and random noise, which is illustrated in Figure 2a. As noted earlier,in some applications, such as the air quality monitoring problem consideredin this paper, the data contain a rapidly varying signal in addition to theslowly varying trend and noise. Figure 2b shows that standard trend fil-tering is not designed to distinguish between the slowly varying trend andthe rapidly-varying signal, as the smooth component estimate θ is biasedtowards the peaks of the transient components.To account for the presence of transient components in the observed timeseries y , we propose quantile trend filtering Figure 2b. To estimate the trendin the τ th quantile, we solve the convex optimization problemminimize θ ρ τ ( y − θ ) + λ (cid:107) D ( k +1) θ (cid:107) , (1)where ρ τ ( r ) is the check function ρ τ ( r ) = n (cid:88) i =1 r i ( τ − ( r i < , (2)and ( A ) is 1 if its input A is true and 0 otherwise. Note that we do notexplicitly model s . Rather, we focus on estimating θ . We then estimate s + ε as the difference y − θ . imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 UANTILE TREND FILTERING (a) Slow Trend + Noise (b) Slow Trend + Spikes + Noise Fig 2: Examples of trend filtering solutions (red) and 15 th quantile trendingfiltering solution (blue). Standard trend filtering performs well in the no-signal case (a) but struggles to distinguish between the slowly varying trendand the rapidly-varying signal (b). The quantile trend is not affected by thesignal and provides an estimate of the baseline.Before elaborating on how we compute our proposed (cid:96) -quantile trendfiltering estimator, we discuss similarities and differences between our pro-posed estimator and existing quantile trend estimators.2.1.1. Relationship to Prior Work.
In this application, as well as thosedescribed in Ning, Selesnick and Duval (2014), Marandi and Sabzpoushan(2015), and Pettersson et al. (2013), the goal is to estimate the trend in thebaseline not the mean. We can define the trend in the baseline as the trendin a low quantile of the data. A variety of methods for estimating quantiletrends have already been proposed. Koenker and Bassett (1978) were thefirst to propose substituting the sum-of-squares term with the check function(2) to estimate a conditional quantile instead of the conditional mean. Later,Koenker, Ng and Portnoy (1994) proposed quantile trend filtering with k = 1producing quantile trends that are piecewise linear, but they did not considerextensions to higher order differences. Rather than using the (cid:96) -norm topenalize the discrete differences, Nychka et al. (1995) used the smoothingspline penalty based on the square of the (cid:96) -norm: n (cid:88) i =1 ρ τ ( y ( t i ) − θ ( t i )) + λ (cid:90) θ (cid:48)(cid:48) ( t ) dt, where θ ( t ) is a smooth function of time and λ is a tuning parameter thatcontrols the degree of smoothing. Oh, Lee and Nychka (2011) proposed analgorithm for solving the quantile smoothing spline problem by approximat- imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL. ing the check function with a differentiable function. Racine and Li (2017)propose a method for estimating quantile trends that does employ the checkfunction. In their method, the response is constrained to follow a locationscale model and the conditional quantiles are estimated by combining Gaus-sian quantile functions with a kernel smoother and solving a local-linearleast squares problem.2.2.
Quantile Trend Filtering.
We combine the ideas of quantile regres-sion and trend filtering. For a single quantile level τ , the quantile trend filter-ing problem is given in (1). As with classic quantile regression, the quantiletrend filtering problem is a linear program which can be solved by a num-ber of methods. We want to estimate multiple quantiles simultaneously andto ensure that our quantile estimates are valid by enforcing the constraintthat if τ > τ then Q ( τ ) ≥ Q ( τ ) where Q is the quantile function of y .Even if a single quantile is ultimately desired, ensuring non-crossing allowsinformation from nearby quantiles to be used to improve the estimates aswe will see in the peak detection experiments in Section 4.2. Given quantiles τ < τ < . . . < τ J , the optimization problem becomesminimize Θ ∈C J (cid:88) j =1 (cid:104) ρ τ j ( y − θ j ) + λ j (cid:107) D ( k +1) θ j (cid:107) (cid:105) (3)where Θ = (cid:0) θ θ · · · θ J (cid:1) ∈ R n × J is a matrix whose j th column cor-responds to the j th quantile signal θ j and the set C = { Θ ∈ R n × J : θ ij ≤ θ ij (cid:48) for j ≤ j (cid:48) } encodes the non-crossing quantile constraints. The additionalnon-crossing constraints are linear inequalities involving the parameters, sothe non-crossing quantile trends can still be estimated by a number of avail-able linear programming solvers. We allow for the possibility that the degreeof smoothness in the trends varies by quantile by allowing the smoothingparameter to vary with quantile as well. In the rest of this paper, we use k = 2 to produce piecewise quadratic polynomials and report numerical re-sults using the commercial solver Gurobi (Gurobi Optimization, 2018) andits R package implementation. However, we could easily substitute a freesolver such as the Rglpk package by Theussl and Hornik (2017).2.3. ADMM for Big Data.
As the size of the data increases, computa-tion time becomes prohibitive. In our application to air quality sensor data,measurements are recorded every second resulting in 86,400 observations perday. This number of observations is already too large to use with currentlyavailable R packages used for estimating quantile trends (Douglas Nychkaet al., 2017; Koenker, 2018). To our knowledge, no one has addressed the imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019
UANTILE TREND FILTERING Fig 3: Window boundaries and trends fit separately in each window. Eachwindow’s trend estimate is plotted in a different line type.problem of finding smooth quantile trends of series that are too large to beprocessed simultaneously. We propose a divide-and-conquer approach via anADMM algorithm for solving large problems in a piecewise fashion.2.3.1.
Formulation.
To decrease computation time and extend our methodto larger problems, we divide our observed series y i with i = 1 , . . . , n into W overlapping windows of observations, defining the vector of sequentialelements indexed from l w to u w as y ( w ) = { y l w , y l w +1 , . . . , y u w − , y u w } , with1 = l < l < u < l < u < l < u < · · · < u W = n. We define n w = u w − l w + 1 so that y ( w ) ∈ R n w . Figure 3 shows an exampleof 1200 observations being mapped into three equally sized overlapping win-dows of observations. While the overlapping trend estimates between l and u do not vary dramatically, the difference is more pronounced in the trendin the 5 th quantile between l and u . Thus, we need a way of enforcingestimates to be identical in the overlapping regions.Given quantiles τ < · · · < τ J , we introduce dummy variables θ ( w ) j ∈ R n w as the value of the τ j th quantile trend in window w . We then “stitch” togetherthe W quantile trend estimates into consensus over the overlapping regionsby introducing the constraint θ ( w ) ij = θ i + l w − ,j for i = 1 , . . . , n w and for all j .Let Θ ( w ) be the matrix whose j th column is θ ( w ) j . Then we can write theseconstraints more concisely as Θ ( w ) = U ( w ) Θ , where U ( w ) ∈ { , } n w × n is amatrix that selects rows of Θ corresponding to the w th window, namely U ( w ) = e T l w ... e T u w , imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL.
Fig 4: Trend fit with our ADMM algorithm with 3 windows (converged in 7iterations), compared to trend from simultaneous fit.where e i ∈ R n denotes the i th standard basis vector. Furthermore, let ι C denote the indicator function of the non-crossing quantile constraint, namely ι C ( Θ ) is zero if Θ ∈ C and infinity otherwise. Our windowed quantile trendoptimization problem can then be written asminimize W (cid:88) w =1 J (cid:88) j =1 (cid:104) ρ τ j ( y ( w ) − θ ( w ) j ) + λ j (cid:107) D ( k +1) θ ( w ) j (cid:107) (cid:105) + ι C ( Θ ( w ) ) subject to Θ ( w ) = U ( w ) Θ for w = 1 , . . . , W. (4)The solution to (4) is not identical to the solution to (3) because of doublecounting of the overlapping sections. The solutions are very close, however,and the differences are essentially immaterial concerning downstream anal-ysis. Figure 4 provides an illustration of the trends estimated using multiplewindows compared with the trends estimated using a single window; esti-mates using multiple and single windows are nearly indistinguishable.2.3.2. Algorithm.
The ADMM algorithm (Gabay and Mercier, 1975; Glowin-ski and Marroco, 1975) is described in greater detail by Boyd et al. (2011),but we briefly review how it can be used to iteratively solve the followingequality constrained optimization problem which is a more general form of(4). minimize f ( φ ) + g ( ˜ φ )subject to A φ + B˜ φ = c . (5)Recall that finding the minimizer to an equality constrained optimizationproblem is equivalent to the identifying the saddle point of the Lagrangian imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 UANTILE TREND FILTERING function associated with the problem (5). ADMM seeks the saddle point ofa related function called the augmented Lagrangian, L γ ( φ , ˜ φ , ω ) = f ( φ ) + g ( ˜ φ ) + (cid:104) ω , c − A φ − B˜ φ (cid:105) + γ (cid:107) c − A φ − B˜ φ (cid:107) , where the dual variable ω is a vector of Lagrange multipliers and γ is anonnegative tuning parameter. When γ = 0, the augmented Lagrangiancoincides with the ordinary Lagrangian.ADMM minimizes the augmented Lagrangian one block of variables at atime before updating the dual variable ω . This yields the following sequenceof updates at the ( m + 1) th ADMM iteration φ m +1 = arg min φ L γ ( φ , ˜ φ m , ω m ) ˜ φ m +1 = arg min ˜ φ L γ ( φ m +1 , ˜ φ , ω m ) ω m +1 = ω m + γ ( c − A φ m +1 − B˜ φ m +1 ) . (6)Returning to our constrained windows problem giving in (4), let Ω ( w ) denote the Lagrange multiplier matrix for the w th consensus constraint,namely Θ ( w ) = U ( w ) Θ , and let ω ( w ) j denote its j th column.The augmented Lagrangian is given by L ( Θ , { Θ ( w ) } , { Ω ( w ) } ) = W (cid:88) w =1 L w ( Θ , Θ ( w ) , Ω ( w ) ) , where L w ( Θ , Θ ( w ) , Ω ( w ) ) = J (cid:88) j =1 (cid:20) ρ τ j ( y ( w ) − θ ( w ) j ) + λ j (cid:107) D ( k +1) θ ( w ) j (cid:107) + ( θ ( w ) j − U ( w ) θ j ) T ω ( w ) j + γ (cid:107) θ ( w ) j − U ( w ) θ j (cid:107) (cid:21) + ι C ( Θ ( w ) ) , where γ is a positive tuning parameter.The ADMM algorithm alternates between updating the consensus vari-able Θ , the window variables { Θ ( w ) } , and the Lagrange multipliers { Ω ( w ) } .At the ( m + 1) th iteration, we perform the following sequence of updates Θ m +1 = arg min Θ L ( Θ , { Θ ( w ) m } , { Ω ( w ) m } ) Θ ( w ) m +1 = arg min { Θ ( w ) } L ( Θ m +1 , { Θ ( w ) } , { Ω ( w ) m } ) imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL.
Algorithm 1
ADMM algorithm for quantile trend filtering with windows
Define D = D ( k +1) . initialize:for w = 1 , . . . , W doΘ ( w )0 ← arg min Θ ( w ) ∈C (cid:80) Jj =1 ρ τ j ( y ( w ) − θ ( w ) j ) + λ (cid:107) D θ ( w ) j (cid:107) Ω ( w )0 ← m ← repeatΘ m +1 ← ψ ( { Θ ( w ) m } , { Ω ( w ) m } ) for w = 1 , . . . , W doΘ ( w ) m +1 ← arg min Θ ( w ) L w ( Θ m +1 , Θ ( w ) , Ω ( w ) m ) Ω ( w ) m +1 ← Ω ( w ) m + γ ( Θ ( w ) m +1 − U ( w ) Θ m +1 ) end for m ← m + 1 until convergence return Θ m Updating Θ:
Some algebra shows that, defining i w = i − l w +1, updatingthe consensus variable step is computed as follows. θ ij = (cid:16) θ ( w − i w − j + θ ( w ) i w j (cid:17) − γ (cid:16) ω ( w − i w − j + ω ( w ) i w j (cid:17) if l w ≤ i ≤ u w − ,θ ( w ) i w j if u w − < i ≤ l w +112 (cid:16) θ ( w ) i w j + θ ( w +1) i w +1 j (cid:17) − γ (cid:16) ω ( w ) i w j + ω ( w +1) i w +1 j (cid:17) if l w +1 < i ≤ u w , (7)The consensus update (7) is rather intuitive. We essentially average thetrend estimates in overlapping sections of the windows, subject to someadjustment by the Lagrange multipliers, and leave the trend estimates innon-overlapping sections of the windows untouched. For notational ease, wewrite the consensus update (7) compactly as Θ = ψ ( { Θ ( w ) } , { Ω ( w ) } ). Updating { Θ ( w ) } : We then estimate the trend separately in each win-dow, which can be done in parallel, while penalizing the differences in theoverlapping pieces of the trends as outlined in Algorithm 1. The use of theAugmented Lagrangian converts the problem of solving a potentially largelinear program into a solving a collection of smaller quadratic programs.The gurobi
R package (Gurobi Optimization, 2018) can solve quadraticprograms in addition to linear programs, but we can also use the free Rpackage quadprog (Weingessel and Turlach, 2013). imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019
UANTILE TREND FILTERING Algorithm 1 has the following convergence guarantees.
Proposition . Let {{ Θ ( w ) m } , Θ m } denote the m th collection of iter-ates generated by Algorithm 1. Then (i) (cid:107) Θ ( w ) m − U ( w ) Θ m (cid:107) F → and (ii) p m → p (cid:63) , where p (cid:63) is the optimal objective function value of (4) and p m isthe objective function value of (4) evaluated at {{ Θ ( w ) m } , Θ m } . The proof of Proposition 2.1 is a straightforward application of the con-vergence result presented in Section 3.2 of Boyd et al. (2011).To terminate our algorithm, we use the stopping criteria described byBoyd et al. (2011). The criteria are based on the primal and dual residuals,which represent the residuals for primal and dual feasibility, respectively.The primal residual at the m th iteration, r m primal = (cid:118)(cid:117)(cid:117)(cid:116) W (cid:88) w =1 (cid:107) Θ ( w ) m − U ( w ) Θ m (cid:107) , represents the difference between the trend values in the windows and theconsensus trend value. The dual residual at the m th iteration, r m dual = γ (cid:118)(cid:117)(cid:117)(cid:116) W (cid:88) w =1 (cid:107) Θ m − Θ m − (cid:107) , represents the change in the consensus variable from one iterate to the next.The algorithm is stopped when r m primal < (cid:15) abs √ nJ + (cid:15) rel max w (cid:104) max (cid:110) (cid:107) Θ ( w ) m (cid:107) F , (cid:107) Θ m (cid:107) F (cid:111)(cid:105) r m dual < (cid:15) abs √ nJ + (cid:15) rel (cid:118)(cid:117)(cid:117)(cid:116) W (cid:88) w =1 (cid:107) Ω ( w ) m (cid:107) . (8)The quantile trend filtering problem for a single window is a linear pro-gram with N × J parameters (number of observations by number of quan-tiles), which can be solved in computational time proportional to ( N J ) .Consequently, solving a large problem using Algorithm 1 should require lesscomputational time than solving (3), even if the sub-problems are solvedsequentially. We demonstrate the advantages of Algorithm 1 through timing imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL.
Fig 5: Timing experiments comparing quantile trend filtering with varyingnumbers of windows by data size.experiments (Figure 5). For each data size, 25 datasets were simulated usingthe peaks simulation design described below. Trends for the fifth, tenth, andfifteenth quantiles were fit simultaneously using λ j = n/ j . We usefrom one to four windows for each data size with an overlap of 500. Algo-rithm 1 was the stopped when (8) was satisfied, defining (cid:15) abs = 0 .
01 and (cid:15) rel = 0 .
3. Model Selection.
An important practical issue in baseline estima-tion is the choice of the regularization parameter λ , which controls the degreeof smoothness in θ . In this section, we introduce four methods for choos-ing λ . The first is a validation based approach; the latter three are basedon information criteria. Each of the criteria we compare is calculated for asingle quantile ( τ j ). Rather than combine results over quantiles, we allowthe value of λ to vary by quantile resulting in λ = { λ , ..., λ J } . To choosethe best value for each λ j , we first estimate all of the quantile trends using λ j = λ for all j over a grid of values for λ . We then determine the λ j thatmaximizes the criteria chosen evaluated using τ j . Finally, we re-estimate thenon-crossing trends with the optimal values for λ j . A more thorough ap-proach would involve fitting the model on a J dimensional grid of values for λ but this is computationally infeasible.3.1. Validation.
Our method can easily handle missing data by definingthe check loss function to output 0 for missing values. Specifically, we use imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019
UANTILE TREND FILTERING the following modified function ˜ ρ τ in place of the ρ τ function given in (2)˜ ρ τ ( r ) = (cid:88) i (cid:54)∈V r i ( τ − ( r i < , (9)where V is a held-out validation subset of { , . . . , n } and solve the problemminimize Θ ∈C J (cid:88) j =1 (cid:104) ˜ ρ τ j ( y − θ j ) + λ j (cid:107) D ( k +1) θ j (cid:107) (cid:105) , (10)which can be solved via Algorithm 1 with trivial modification to the quadraticprogram sub-problems. For each quantile level j , we select the λ j that min-imizes the hold-out prediction error quantified by ˘ ρ τ j ( y − ˆ θ j ( λ j )) where ˆ θ j ( λ j ) is the solution to (10) and˘ ρ τ ( r ) = (cid:88) i ∈V r i ( τ − ( r i < . Information Criteria.
Koenker, Ng and Portnoy (1994) addressedthe choice of regularization parameter by proposing the Schwarz criterionfor the selection of λ SIC( p λ , τ j ) = log (cid:20) n ρ τ j ( y − θ j ) (cid:21) + 12 n p λ log n. where p λ = (cid:80) i ( y i = (cid:98) θ i ) is the number of non-interpolated points, whichcan be thought of as active knots. Equivalently, p λ can be substituted withthe number of non-zero components in D ( k +1) ˆ θ j which we denote ν and havefound to be more numerically stable. The SIC is based on the traditionalBayesian Information Criterion (BIC) which is given by(11) BIC( ν ) = − L { ˆ θ } ) + ν log n where L is the likelihood function. If we take the approach used in Bayesianquantile regression (Yu and Moyeed, 2001), and view minimizing the checkfunction as maximizing the asymmetric Laplace likelihood, L ( y | θ ) = (cid:20) τ n (1 − τ ) σ (cid:21) n exp (cid:26) − ρ τ (cid:18) y − θ σ (cid:19)(cid:27) , we can compute the BIC asBIC( ν, τ j ) = 2 σ ρ τ j ( y − ˆ θ j ) + ν log n imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL. where ˆ θ is the estimated trend, and ν is the number of non-zero elementsof D ( k +1) ˆ θ . We can choose any σ > σ = −| − τ | produces stable estimates.Chen and Chen (2008) proposed the extended Bayesian Information Cri-teria (eBIC), specifically designed for large parameter spaces.eBIC γ ( ν ) = − L { ˆ θ } ) + ν log n + 2 γ log (cid:18) Pν (cid:19) , γ ∈ [0 , P is the total number of possible parameters and ν is the number ofnon-zero parameters included in given mod. We used this criteria with γ = 1,and P = n − k −
1. In the simulation study, we compare the performance ofthe SIC, scaled eBIC (with σ defined above), and validation methods.
4. Simulation Studies.
We conduct two simulation studies to com-pare the performance of our quantile trend filtering method and regular-ization parameter selection criteria with previously published methods. Thefirst study compares the method’s ability to estimate quantiles when the ob-served series consists of a smooth trend plus independent error, but does notcontain transient components. The second study is based on our applicationand compares the method’s ability to estimate baseline trends and enablepeak detection when the time series contains a non-negative, transient signalin addition to the trend and random component.We compare three criteria for choosing the smoothing parameter for quan-tile trend filtering: λ chosen using SIC (11) ( detrendr SIC ); λ chosen usingthe validation method with the validation set consisting of every 5th obser-vation ( detrendr valid ); and λ chosen using the proposed eBIC criterion(12) ( detrendr eBIC ). For the second study we also examine the effect ofthe non-crossing quantile constraint by estimating the quantile trends sep-arately and choosing λ using eBIC ( detrendr Xing ). We do not include detrendr Xing in the first study because the difference in quantiles is largeenough that we would not expect the non-crossing constraint to make adifference.We also compare the performance of our quantile trend filtering methodwith three previously published methods, none of which guarantee non-crossing quantiles: • npqw : The local linear quantile method (quantile-ll) described in Racineand Li (2017). Code was obtained from the author. • qsreg : Quantile smoothing splines described in Oh et al. (2004) andNychka et al. (1995) and implemented in the fields R package (Dou-glas Nychka et al., 2017). The regularization parameter was chosenusing generalized cross-validation. imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019
UANTILE TREND FILTERING Fig 6: Simulated data with true quantile trends. • rqss : Quantile trend filtering with k = 1 available in the quantreg package and described in Koenker, Ng and Portnoy (1994). The regu-larization parameter is chosen using a grid search and minimizing theSIC (11) as described in Koenker, Ng and Portnoy (1994).4.1. Estimating Quantiles.
To compare performance in estimating quan-tile trends in the absence of a signal component, three simulation designsfrom Racine and Li (2017) were considered. For all designs t = 1 , . . . , n , x ( t ) = t/n , and the response y was generated as y ( t ) = sin(2 πx ( t )) + (cid:15) ( x ( t ))The errors were simulated as independent draws from the following distri-butions: • Gaussian: (cid:15) ( x ( t )) ∼ N (cid:18) , (cid:16) x ( t ) (cid:17) (cid:19) • Beta: (cid:15) ( x ( t )) ∼ Beta (1 , − x ( t )) • Mixed normal: (cid:15) ( x ( t )) is simulated from a mixture of N ( − ,
1) and N (1 ,
1) with mixing probability x ( t ).The true quantile trends and an example simulated data set is show inFigure 6. One hundred datasets were generated of sizes 300, 500 and 1000. imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL.
Quantile trends were estimated for τ = { . , . , . , . , . } and theroot mean squared error was calculated as RMSE( τ j ) = (cid:113) n (cid:80) ni =1 (ˆ θ ij − θ ij ) ,where θ ij is the true value of the τ j th quantile of y at t = i . Figure 7 showsthe mean RMSE plus or minus twice the standard error for each method,quantile level, and sample size. In all three designs the proposed detrend methods are either better than or comparable to existing methods. Overallthe detrend eBIC performs best. In the mixed normal design, specifically,our methods have lower RMSEs for the 5 th and 95 th quantiles. The npqw method performs particularly poorly on the mixed normal design due to theviolation of the assumption that the data come from a scale-location model.4.2. Peak Detection.
The second simulation design is closely motivatedby our air quality analysis problem. We assume that the measured data canbe represented by y ( t i ) = θ ( t i ) + s ( t i ) + ε i , where t i = i for i = 1 , ..., n , θ ( t ) is the drift component that varies smoothlyover time, s ( t ) is the true signal at time t , and ε i are i.i.d. errors distributedas N (0 , . ). We generate θ ( t ) using cubic natural spline basis functionswith degrees of freedom sampled from a Poisson distribution with meanparameter equal to n/ s ( t ) is assumed to be zero withpeaks generated using the Gaussian density function. The number of peaksis sampled from a binomial distribution with size equal to n and probabil-ity equal to 0 .
005 with location parameters uniformly distributed between1 and n − n ∈ { , , , } .We compare the ability of the methods to estimate the true quantiles of y ( t i ) − s ( t i ) for τ ∈ { . , . , . } and calculate the RMSE (Figure 8).In this simulation study, our proposed method detrend eBIC method sub-stantially outperforms the others. The detrend Xing method, which is the detrend eBIC method fit without the non-crossing constraints, performssimilarly for larger quantiles and larger datasets. However, detrend Xing produces significantly worse estimates for more extreme quantiles ( τ = 0 . τ = 0 .
05 and n = 500). These results indicate thateven when a single quantile is of interest, simultaneously fitting nearbyquantiles and utilizing the non-crossing constraint can improve estimates imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 UANTILE TREND FILTERING Fig 7: RMSE by design, method, quantile and data size. Points and errorbars represent mean RMSE ± twice the standard error. imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL.
Fig 8: RMSE by method, quantile, and data size for peaks design.when data is sparse by using information from nearby quantiles. The qsreg method is comparable to the detrend eBIC method on the larger datasets,but its performance deteriorates as the data size shrinks. The npqw and rqss methods both perform poorly on this design.While minimizing RMSE is desirable in general, in our application, theprimary metric of success is accurately classifying observations y i into signalpresent or absent. To evaluate the accuracy of our method compared toother methods we define true signal as any time point when the simulatedpeak value is greater than 0.5. We compare three different quantiles for thebaseline estimation and four different thresholds for classifying the signalafter subtracting the estimated baseline from the observations. Figure 9illustrates the observations classified as signal after subtracting the baselinetrend compared to the “true signal.”To compare the resulting signal classifications, we calculate the class av-eraged accuracy (CAA), which is defined asCAA = 12 (cid:34) (cid:80) ni =1 [ δ i = 1 ∩ ˆ δ i = 1] (cid:80) ni =1 [ δ i = 1] + (cid:80) ni =1 [ δ i = 0 ∩ ˆ δ i = 0] (cid:80) nt = i =1 [ δ i = 0] (cid:35) . where δ i ∈ { , } is the vector of true signal classifications and ˆ δ i ∈ { , } isthe vector of estimated signal classifications, namely ˆ δ i = ( y i − ˆ θ i > . δ i = 0for all i .Our detrend eBIC method results in the largest CAA values (Figure 10)in addition to the smallest RMSE values (Figure 8). While qsreg was com-petitive with our method in some cases, in the majority of cases the largest imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 UANTILE TREND FILTERING CAA values for each threshold were produced using the detrend eBIC methodwith the 1 st or 5 th quantiles.Fig 9: Example signal classification using threshold. Red indicates true signal( y i − θ i > . detrend eBIC .Fig 10: Class averaged accuracy by threshold, data size, and method (1 isbest 0.5 is worst). imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL.
Fig 11: Estimated 5 th quantile trends for SPods a, b, and c, using qsreg and detrendr . SPod c contains some missing values that were interpolatedbefore the trends were estimated. qsreg is more influenced by the signalcomponent apparent on both nodes resulting in under-smoothing.
5. Analysis of Air Quality Data.
The low-cost “SPod” air qualitysensors output a time series that includes a slowly varying baseline, the sen-sor response to pollutants, and high frequency random noise. These sensorsrecord measurements every second and are used to monitor pollutant con-centrations at the perimeter of industrial facilities. Time points with highconcentrations are identified and compared with concurrent wind directionand speed. Ideally, three co-located and time aligned sensors (as shown inFigure 1) responding to a pollutant plume would result in the same signalclassification after baseline trend removal and proper threshold choice. Wefirst illustrate the difference between our detrend eBIC method, hereafterreferred to as detrendr , and qsreg using data from 13:10 to 15:10 fromFigure 1 (Section 5.1). We then compare the methods on the complete dayshown in Figure 1, estimating trends by applying the qsreg method to 2hour non-overlapping windows of the data and Algorithm 1 to the entireday. We focus on this day because data from three sensors was available.Finally, we examine an entire week of measurements from two co-locatedsensors (Section 5.2). imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019
UANTILE TREND FILTERING Short series of air quality measurements.
We compare our detrendr method with the qsreg method on a two-hour subset of one-second SPoddata (n=7200) both to facilitate visualization and because the qsreg methodcannot handle all 24 hours simultaneously. We estimate the baseline trendusing τ = { . , . , . } and compare three thresholds for classifying thesignal. The thresholds are calculated as the 90 th , 95 th , and 99 th quantilesof the de-trended series for each SPod. If there is signal present in thedataset, values above these thresholds should occur simultaneously on allthree SPods. We do not use class-averaged accuracy to compare the signalclassifications because we do not have a reference value to define as the“true” signal. Instead, we compute the variation of information (VI) whichcompares the similarity between two classifications. Given the signal classi-fications for SPods a and b, δ ( a ) i ∈ { , } and δ ( b ) i ∈ { , } , for i ∈ { , ..., n } the VI is defined as: r jk = 1 n (cid:88) i (cid:16) δ ( a ) i = j ∩ δ ( b ) i = k (cid:17) VI( a, b ) = − (cid:88) j,k r jk (cid:34) log (cid:32) r jk n (cid:80) i ( δ ( a ) i = j ) (cid:33) + log (cid:32) r jk n (cid:80) i ( δ ( b ) i = k ) (cid:33)(cid:35) where ( j, k ) ∈ { (0 , , (0 , , (1 , , (1 , } . The VI is a distance metric formeasuring similarity of classifications and will be 0 if the classifications areidentical and increase as the classifications become more different.Figure 11 shows the estimated 5 th quantile trends from each method foreach SPod. The detrendr method results in a smoother baseline estimatewhile the qsreg method absorbs more of the peaks obscuring some of thesignal. Figure 12 shows the series after subtracting the detrendr estimateof the 5 th quantile and classifying the signal using the 95 th quantile of thedetrended data. The 90 th and 99 th thresholds are also shown for comparisonin blue and orange, respectively. The largest peaks at 14:10 are easily iden-tified as signal, but good baseline estimates also enable proper classificationof the smaller peaks like the one at 15:12. The under-smoothing of the qsreg method results in less similar signal classifications and higher VI values forthe 90 th and 95 th quantile thresholds (Figure 13). However, when the 99 th threshold is used only the highest observations are classified as signal andthe baseline estimation method isn’t as important (Figure 13).5.2. Long series of air quality measurements.
Algorithm 1 was used toremove the baseline drift from the full day of data (Figure 1) consisting of86,400 observations per SPod and compared to the series detrended using imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL.
Fig 12: Rugplot showing locations of signal after baseline removal usingthe detrendr estimate of 5th quantile. Horizontal dashed lines representthe thresholds calculated using the 90 th , 95 th , and 99 th quantiles of thedetrended data. The 95 th quantile (black) was used to classify the signalshown as vertical lines at the bottom of the plot.Fig 13: Variation of Information between sensors by method (color), quantile(columns) and threshold (shape) for two hour time period. imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 UANTILE TREND FILTERING Fig 14: Variation of Information between sensors by method (color), quantile(columns) and threshold (shape) for full day.the qsreg trends estimated using non-overlapping 2 hour windows. As in theshorter illustration, the detrendr method results in generally lower VI scoresthan the qsreg method (Figure 14). The detrendr method also results inbetter correlation in the detrended series as is illustrated in (Figure 15).The Spearman correlation coefficients for SPods a and b, SPods a and c,and SPods b and c after removing the 5 th quantile trend using detrendr were 0.37, 0.75, and 0.43, compared with 0.07, 0.24, and 0.16 using qsreg .The noise variance was higher for SPod b than for SPods a and c resulting inlower correlation and higher VI values for the ab and bc metrics comparedwith ac.Finally, we estimated the quantile trends for 7 days of measurements oftwo co-located SPods. Figure 16 demonstrates the improvement in classi-fication similarity when using detrendr . Each point represents a day ofmeasurements and all points that fall below the dashed line have more sim-ilar classifications using detrendr compared to qsreg . The improvement of detrendr over qsreg is more severe at lower thresholds. This indicates that detrendr gives greater agreement on signal classification when the methodsare tuned to deliver positive classifications more frequently.
6. Conclusion and Discussion.
We have expanded the quantile trendfiltering method by implementing a non-crossing constraint, a new algorithmfor processing large series, and proposing a modified criteria for smoothingparameter selection. Furthermore, we have demonstrated the utility of quan- imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL.
Fig 15: SPod a versus SPod c before and after de-trending with qsreg and detrendr .Fig 16: Variation of Information (VI) for detrendr and qsreg by quantiletrend and threshold. Each point represents a full day of data. The dashed linerepresents y=x. In most cases detrendr results in a lower VI than qsreg . imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 UANTILE TREND FILTERING tile trend filtering in both simulations and applied settings. Our ADMM al-gorithm for large series both reduces the computing time and allows trendsto be estimated on series that cannot be estimated simultaneously whileour scaled extended BIC criterion was shown to provide better estimates ofquantile trends in series with and without a signal component.In our application to low cost air quality sensor data, we have shown thatthe baseline drift in low cost air quality sensors can be removed through esti-mating quantile trends, but the data size was too large for existing methodsto be computationally feasible. While qsreg cannot feasibly handle morethan a few hour windows of data, our new methods were able to process24 hours simultaneously and deliver signal classifications that were moreconsistent between the two sensors for a week of data (168 hours).In the future, quantile trend filtering could be extended to observationsmeasured at non-uniform spacing by incorporating the distance in covariatespacing into the differencing matrix. It could also be extended to estimatesmooth spatial trends by a similar adjustment to the differencing matrixbased on spatial distances between observations.
7. Acknowledgments.
The authors greatly appreciate Eben Thomaat the US Environmental Protection Agency for providing the SPod datasets.The authors are also thankful for the Oak Ridge Institute of Science Fel-lowship that partially supported this work. Plots were created using ggplot2(Wickham, 2016).
SUPPLEMENTARY MATERIAL
R-package for detrend routine:
R-package detrendr containing code toperform the methods described in the article. (GNU zipped tar filealso available at https://github.com/halleybrantley/detrendr)
References.
Boyd, S. , Parikh, N. , Chu, E. , Peleato, B. , Eckstein, J. et al. (2011). Distributedoptimization and statistical learning via the alternating direction method of multipliers.
Foundations and Trends R (cid:13) in Machine learning Brantley, H. , Hagler, G. , Kimbrough, E. , Williams, R. , Mukerjee, S. and
Neas, L. (2014). Mobile air monitoring data-processing strategies and effects on spatialair pollution trends.
Atmospheric measurement techniques Chen, J. and
Chen, Z. (2008). Extended Bayesian information criteria for model selectionwith large model spaces.
Biometrika Gabay, D. and
Mercier, B. (1975).
A dual algorithm for the solution of non linear vari-ational problems via finite element approximation . Institut de recherche d’informatiqueet d’automatique. imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019 H. L. BRANTLEY ET AL.
Glowinski, R. and
Marroco, A. (1975). Sur l’approximation, par ´el´ements finis d’ordreun, et la r´esolution, par p´enalisation-dualit´e d’une classe de probl`emes de Dirichletnon lin´eaires.
Revue fran¸caise d’automatique, informatique, recherche op´erationnelle.Analyse num´erique Gurobi Optimization, L. (2018). Gurobi Optimizer Reference Manual.
Kim, S.-J. , Koh, K. , Boyd, S. and
Gorinevsky, D. (2009). (cid:96) Trend Filtering.
SIAMReview Koenker, R. (2018). quantreg: Quantile Regression R package version 5.36.
Koenker, R. and
Bassett, G. (1978). Regression Quantiles.
Econometrica Koenker, R. , Ng, P. and
Portnoy, S. (1994). Quantile smoothing splines.
Biometrika Marandi, R. Z. and
Sabzpoushan, S. (2015). Qualitative modeling of the decision-making process using electrooculography.
Behavior research methods Ning, X. , Selesnick, I. W. and
Duval, L. (2014). Chromatogram baseline estimationand denoising using sparsity (BEADS).
Chemometrics and Intelligent Laboratory Sys-tems
156 - 167.
Nychka, D. , Gray, G. , Haaland, P. , Martin, D. and
O’connell, M. (1995). A non-parametric regression approach to syringe grading for quality improvement.
Journal ofthe American Statistical Association Douglas Nychka , Reinhard Furrer , John Paige and
Stephan Sain (2017). fields:Tools for spatial data . R package version 9.6.
Oh, H.-S. , Lee, T. C. M. and
Nychka, D. W. (2011). Fast Nonparametric Quantile Re-gression With Arbitrary Smoothing Methods.
Journal of Computational and GraphicalStatistics Oh, H.-S. , Nychka, D. , Brown, T. and
Charbonneau, P. (2004). Period analysis ofvariable stars by robust smoothing.
Journal of the Royal Statistical Society: Series C(Applied Statistics) Pettersson, K. , Jagadeesan, S. , Lukander, K. , Henelius, A. , Hæggstr¨om, E. and
M¨uller, K. (2013). Algorithm for automatic analysis of electro-oculographic data.
Biomedical engineering online Racine, J. S. and
Li, K. (2017). Nonparametric conditional quantile estimation: A locallyweighted quantile kernel approach.
Journal of Econometrics
Rudin, L. I. , Osher, S. and
Fatemi, E. (1992). Nonlinear total variation based noiseremoval algorithms.
Physica D: Nonlinear Phenomena
259 - 268.
Snyder, E. , Watkins, T. , Solomon, P. , Thoma, E. , Williams, R. , Hagler, G. , Sh-elow, D. , Hindin, D. , Kilaru, V. and
Preuss, P. (2013). The changing paradigm ofair pollution monitoring.
Environmental science & technology Theussl, S. and
Hornik, K. (2017). Rglpk: R/GNU Linear Programming Kit InterfaceR package version 0.6-3.
Thoma, E. D. , Brantley, H. L. , Oliver, K. D. , Whitaker, D. A. , Mukerjee, S. , Mitchell, B. , Wu, T. , Squier, B. , Escobar, E. , Cousett, T. A. et al. (2016).South Philadelphia passive sampler and sensor study.
Journal of the Air & WasteManagement Association Tibshirani, R. J. (2014). Adaptive piecewise polynomial estimation via trend filtering.
The Annals of Statistics Tibshirani, R. , Saunders, M. , Rosset, S. , Zhu, J. and
Knight, K. (2005). Sparsityand smoothness via the fused lasso.
Journal of the Royal Statistical Society: Series B(Statistical Methodology) Weingessel, A. and
Turlach, B. A. (2013). quadprog: Functions to solve QuadraticProgramming Problems. R package version 1.5-5. imsart-aoas ver. 2014/10/16 file: detrendify_arXiv.tex date: April 25, 2019
UANTILE TREND FILTERING Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis . Springer-Verlag NewYork.
Yamada, H. (2017). Estimating the trend in US real GDP using the (cid:96) trend filtering. Applied Economics Letters Yu, K. and
Moyeed, R. A. (2001). Bayesian quantile regression.
Statistics & ProbabilityLetters Department of Statistics2311 Stinson DrRaleigh, NC 27606E-mail: [email protected]
E-mail: eric [email protected]
Department of Statistics and Data Science129 Garden Ave.Ithaca, NY 14853E-mail: [email protected]@cornell.edu