Gap Filling of Biophysical Parameter Time Series with Multi-Output Gaussian Processes
Anna Mateo-Sanchis, Jordi Munoz-Mari, Manuel Campos-Taberner, Javier Garcia-Haro, Gustau Camps-Valls
GGAP FILLING OF BIOPHYSICAL PARAMETER TIME SERIESWITH MULTI-OUTPUT GAUSSIAN PROCESSES
Anna Mateo-Sanchis , Jordi Mu˜noz-Mar´ı , Manuel Campos-Taberner ,Javier Garc´ıa-Haro , Gustau Camps-Valls Image Processing Laboratory (IPL). Universitat de Val`encia, Spain. Dep. F´ısica de la Terra i Termodin`amica, Universitat de Val`encia, Spain.
ABSTRACT
In this work we evaluate multi-output (MO) Gaussian Process(GP) models based on the linear model of coregionalization(LMC) for estimation of biophysical parameter variables un-der a gap filling setup. In particular, we focus on LAI andfAPAR over rice areas. We show how this problem cannot besolved with standard single-output (SO) GP models, and howthe proposed MO-GP models are able to successfully predictthese variables even in high missing data regimes, by implic-itly performing an across-domain information transfer.
1. INTRODUCTION
Monitoring vegetation from space is of paramount impor-tance. The Leaf area index (LAI) and the Fraction ofAbsorbed Photosynthetically Active Radiation (fAPAR)are among the most important essential climate variables(ECVs) [1] for land and vegetation monitoring. LAI is a keybio-physical parameter which represents half of the total leafarea per unit of ground area [2], while fAPAR accounts forthe light absorption across an integrated plant canopy. Bothvariables are used as indicators of the state and evolution ofthe vegetation cover.LAI and fAPAR have been extensively used in many agri-cultural and remote sensing studies [3, 4], and are key in cli-mate models [5]. Both products are assimilated into physicalmodels to describe vegetation processes, such transpirationand photosynthesis, as well as machine learning models toestimate carbon, energy and heat fluxes [6].It goes without saying that the precise estimation andmodeling of the evolution through time of these parametershave deep societal, economical and environmental impli-cations. Unfortunately, LAI and fAPAR time-series oftenexhibit high rates of missing data due to the presence ofclouds and snow. Gaps within the time-series reduce theirusefulness for modeling and monitoring environmental phe-nomena [7]. In this context, gap filling and interpolation
This paper has been partially supported by the European ResearchCouncil under Consolidator Grant SEDAL ERC-2014-CoG 647423, andMINECO/ERDF under Grant CICYT TIN2015-64210-R. Published in 2018IEEE International Geoscience and Remote Sensing Symposium. with statistical and process-based approaches has become asuccessful choice in the last decade. Fusion techniques andregression trees were used in image mosaics reconstructionwithout clouds [8, 9]. Kriging and co-kriging techniques wereapplied for spatial interpolation of missing data in Landsatimages [10, 11]. Recently, [12] compared gap filling andinterpolation methods on MODIS LAI products concludingthat, in general, temporal smoothing techniques performedbetter than the rest, especially in high ( > ) missing dataregimes.In this work, we aim to explore advanced machine learn-ing approaches for interpolation. In particular we will focuson Gaussian processes (GPs) for regression [13], which hasrecently provided very good results for model inversion andbio-physical parameter estimation [14]. Noting the tight as-sociation between LAI and fAPAR, we aim to model them to-gether using multi-output Gaussian Processes (MO-GP). Wedo this under the linear model of coregionalization (LMC)framework [15]. These models take into account the relation-ships among output variables learning a cross-domain kernelfunction able to transfer information between time series. Thelearned relations are exploited to do inferences on regionswhere no training samples (gaps) are available for one of thetwo variables. In contrast, the standard procedure of usingindividual GPs for each variable cannot make reliable predic-tions on areas with missing training samples.The paper is organized as follows. Section 2 briefly re-views the standard Gaussian Processes formulation, and intro-duces the linear model of co-regionalization for multi-outputregression problems. Section 3 describes the datasets usedand the experiments and results. Section 4 draws final con-clusions and outlines future work.
2. MULTIPLE-OUTPUT GAUSSIAN PROCESSES
This section first fixes the notation and reviews the standardGP formulation, and then introduces the linear model for co-regionalization in GPs to tackle multiple-output problems. a r X i v : . [ q - b i o . Q M ] D ec .1. Gaussian Process Regression GPs are state-of-the-art statistical methods for regression andfunction approximation, and have been used with great suc-cess in biophysical variable retrieval by following statisticaland hybrid approaches [16]. We start assuming we are givena set of n pairs of measurements, { x i , y i } ni =1 , perturbed by anadditive independent noise. We consider the following model, y i = f ( x i ) + e i , e i ∼ N (0 , σ n ) , (1)where f ( x ) is an unknown latent function, x ∈ R d , and σ n represents the noise variance. Defining y = [ y , . . . , y n ] (cid:124) and f = [ f ( x ) , . . . , f ( x n )] (cid:124) , the conditional distribution of y given f becomes p ( y | f ) = N ( f , σ n I ) , where I is the n × n identity matrix. It is assumed that f follows a n -dimensionalGaussian distribution f ∼ N ( , K ) . The covariance matrix K of this distribution is determined by a squared exponen-tial (SE) kernel function with entries K ij = k ( x i , x j ) =exp( −(cid:107) x i − x j (cid:107) / (2 σ )) , encoding the similarity betweeninput points [13]. In order to make a new prediction y ∗ givenan input x ∗ we obtain the joint distribution over the trainingand test points, (cid:20) y y ∗ (cid:21) ∼ N (cid:18) , (cid:20) C n k (cid:124) ∗ k ∗ c ∗ (cid:21)(cid:19) , where C n = K + σ n I , k ∗ = [ k ( x ∗ , x ) , . . . , k ( x ∗ , x n )] (cid:124) isan n × vector and c ∗ = k ( x ∗ , x ∗ ) + σ n . Using the standardBayesian framework we obtain the distribution over y ∗ con-ditioned on the training data, which is a normal distributionwith predictive mean and variance given by µ GP ( x ∗ ) = k (cid:124) ∗ ( K + σ n I n ) − y ,σ GP ( x ∗ ) = c ∗ − k (cid:124) ∗ ( K + σ n I n ) − k ∗ . (2)One of the most interesting things about GPs is that they yieldnot only predictions µ GP ∗ for test data, but also the uncer-tainty of the mean prediction, σ GP ∗ . Model hyperparameters θ = [ σ, σ n ] determine, respectively, the width of the SE ker-nel function and the noise on the observations, and they areusually obtained by maximizing the marginal likelihood. One the problems with the standard GPR formulation is thatit applies only to scalar functions, i.e., we can predict onlyone variable. A straightforward strategy to deal with severaltarget variables is to develop as may individual GP models asvariables. While generally good performance is attained inpractice, the approach has a clear shortcoming: the obtainedmodels are independent and they do not take into account therelationships between the output variables. In order to handlethis problem, we propose a multi-output GP model based onthe linear model of coregionalization (LMC) [15], also knownas co-kriging in the field of geostatistics [17]. In the multi-output GP model we have a vector function, f : X → R D , where D is the number of outputs. Given areproducing kernel, defined as a positive definite symmetricfunction K : X × X → R N × N , where N is the number ofsamples of each output. In the following, and in order to sim-plify the equations that follows, we assume that all outputshave the same number of training samples, N . The formula-tion where each output has a different number of sources canbe straightforwardly obtained. We can express f ( x ) as f ( x ) = N (cid:88) i =1 K ( x i , x ) c i , (3)for some coefficients c i ∈ R N . These coefficients can beobtained by solving the linear system, obtaining ¯ c = ( K ( X , X ) + λN I ) − ¯ y , (4)where ¯ c , ¯ y are N D vectors obtained by concatenating thecoefficients and outputs, respectively, and K ( X , X ) is an N D × N D matrix with entries ( K ( x i , x j )) d,d (cid:48) for i, j =1 , . . . , N and d, d (cid:48) = 1 , . . . , D . The blocks of this matrix are ( K ( X i , X j )) i,j N × N matrices. Predictions are given by f ( x ∗ ) = K (cid:62) x ∗ ¯ c , (5)with K x ∗ ∈ R D × ND composed by blocks ( K ( x ∗ , x j )) d,d (cid:48) .When the training kernel matrix K ( X , X ) is block diag-onal, that is, ( K ( X i , X j )) i,j = for all i (cid:54) = j , then eachoutput is independent of the others, and we have individualGP models. The non-diagonal matrices establish relation-ships among the outputs.In the linear model of coregionalization (LMC) each out-put is expressed as a linear combination of independent latentfunctions [17], f d ( x ) = Q (cid:88) q =1 a d,q u q ( x ) , (6)where a d,q are scalar coefficients, and u q ( x ) are latent func-tions with zero mean and covariance k q ( x , x (cid:48) ) . It can beshown [15] that the full covariance (matrix) of this model canbe expressed as K ( X , X ) = Q (cid:88) q =1 B q ⊗ k q ( X , X ) , (7)where ⊗ is the Kronecker product. Here each B q ∈ R D × D is a positive definite matrix known as coregionalization ma-trix ,and it encodes the relationships between outputs.
3. DATA COLLECTION AND EXPERIMENTALRESULTS3.1. Data collection
In this study, we focus on the LAI and fAPAR biophysicalvariables from the Moderate Resolution Imaging Spectrora-
003 2003.5 2004 2004.5 2005 2005.5 2006 2006.5 2007
Years L A I Results in SO and MO models
TrainTestSOMO
Years f A P AR TrainTestSOMO
Fig. 1 . Individual (SO, green) and multi-output (MO, blue)predictions for LAI (top) and fAPAR (bottom).diometer (MODIS) products. In particular, both variableswere obtained from the Collection-5 MOD15A2 1-kilometerresolution product on a sinusoidal grid. The temporal resolu-tion of LAI/fAPAR is eight days based on a daily composi-tion, which allows to obtain 46 estimates every year. Specif-ically, we focused on inter-annual variability of rice areaslocated in the Val`encia rice district (Mediterranean coast inIberian peninsula) from 2003 to 2014. Typical rice plant phe-nology exhibits LAI values varying from zero (seeding) up to6-7 (flowering), whereas fAPAR ranges from 0 to 1.
For the experiments, we simulate an scenario where we havemissing data at the two ends of both time series. For LAI,the missing data is during the first years while for fAPAR isduring the last years. Our goal is to obtain a model able topredict the missing parts in one variable using the informationavailable from the other. The experiment was repeated withdifferent number of missing years.We compared two GP models. First, we trained individ-ual, single-output (SO-GP) models for each variable. The ra-tional behind this first experiment is to assess how well SOmodels predict the variables in the regions with gaps, i.e.,those without training samples. In the second experiment,we used MO-GP models. In this case, these models shouldbe able to make better predictions on ‘gap’ regions using theinformation inferred from the second, correlated variable.
Figure 1 shows the predictions obtained with SO and MOGPs. For improved visualization, we focus in the missingyears period only. The SO models (green line) are unable
Number missing years L A I R SOMO
Number missing years f APA R R SOMO
Fig. 2 . Evolution of the coefficient R for both LAI (left) andfAPAR (right) w.r.t. number of missing years.to make predictions where training data are missing, whilethe MO GPs (blue line) can predict efficiently the variablesin those regions, performing a sort of transfer learning acrossvariables. Table 1 shows the determination coefficient R and the rootmean square error (RMSE) in the test set for both modelsand different number of missing years (gaps). SO-GP mod-els achieves an R lower than 0.90, whereas MO-GP obtainmuch better results, close to 1. When comparing the RMSE,SO results are three times bigger than MO for both variables. Table 1 . Results in terms of R and RMSE for different gaps. LAI fAPAR
Model miss. years R RMSE R RMSE
SO-GP 1 0.914 0.553 0.894 0.089MO-GP 1 0.992 0.174 0.986 0.033SO-GP 2 0.830 0.779 0.827 0.113MO-GP 2 0.987 0.231 0.978 0.042SO-GP 3 0.737 0.968 0.744 0.138MO-GP 3 0.982 0.289 0.971 0.049SO-GP 4 0.652 1.113 0.647 0.162MO-GP 4 0.978 0.332 0.964 0.056SO-GP 5 0.559 1.253 0.549 0.183MO-GP 5 0.966 0.387 0.955 0.063
We repeated the experiments with an increasing numberof full missing years. Figure 2 shows how the coefficient R decreases as the number of missing years increases. In alltests, the coefficient R for MO is greater than SO for bothvariables, but for SO it gradually decreases, while for MO itdrops more abruptly. The decrease in the error of the MO-GPhappens when more than five years are removed, therefore nothaving a common period in both time series. In Fig. 3 we show the autocorrelation functions (ACFs) ofthe actual time series, and their SO and MO predictions. Theunction gives us a summary about the time structure cap-tured by the models. For both variables, the MO model (bluecurve) shows a closer ACF to the actual time series ACF(black curve) than SO does (green curve).
Lags [weeks] L A I au t o c o rr e l a t i on Lags [weeks] f APA R au t o c o rr e l a t i on Fig. 3 . Autocorrelation curves for both LAI (left) and fAPAR(right) w.r.t. number of weeks. This curves show how differ-ent models capture the series time structure.
4. CONCLUSIONS
We have shown in this work how estimating several biophys-ical parameters simultaneously helps to attain consistent pre-dictions and for gap filling scenarios where the missing infor-mation in one variable can be complemented by another one.Multi-output regression is also convenient because only onemodel is needed, which reduces the computational workload.We evaluated a multiple-output GP model based on lin-ear co-regionalization to solve a time series gap filling prob-lem. We used MODIS LAI and fAPAR times series of 11years. Results showed that the presented model is able tosuccessfully predict both variables simultaneously in regionswhere no training samples are available by intrinsically ex-ploiting the relationships between the considered output vari-ables, LAI and fAPAR. Future work will consider more so-phisticated methods of variable coupling and sparse GPs.
5. REFERENCES [1] GCOS, “Systematic observation requirements for satellite-based products for climate, 2011,” 2011.[2] J. M. Chen and T. A. Black, “Defining leaf area index for non-flat leaves,”
Plant, Cell & Environment , vol. 15, no. 4, pp.421–429, May 1992.[3] R. B. Myneni, R. Ramakrishna, R. Nemani, and S. W. Running,“Estimation of global leaf area index and absorbed par usingradiative transfer models,”
IEEE Transactions on Geoscienceand Remote Sensing , vol. 35, no. 6, pp. 1380–1393, Nov. 1997.[4] M. Campos-Taberner, F. J. Garc´ıa-Haro, G. Camps-Valls,G. Grau-Muedra, F. Nutini, A. Crema, and M. Boschetti, “Mul-titemporal and multiresolution leaf area index retrieval for op-erational local rice crop monitoring,”
Remote Sensing of Envi-ronment , vol. 187, no. Supplement C, pp. 102–118, Dec. 2016. [5] Piers J. Sellers, Compton J. Tucker, G. James Collatz, Si-etse O. Los, Christopher O. Justice, Donald A. Dazlich, andDavid A. Randall, “A Revised Land Surface Parameteriza-tion (SiB2) for Atmospheric GCMS. Part II: The Generation ofGlobal Fields of Terrestrial Biophysical Parameters from Satel-lite Data,”
Journal of Climate , vol. 9, no. 4, pp. 706–737, Apr.1996.[6] G. Camps-Valls, M. Jung, K. Ichii, D. Papale, G. Tramontana,P. Bodesheim, C. Schwalm, J. Zscheischler, M. Mahecha, andM. Reichstein, “Ranking drivers of global carbon and energyfluxes over land,” in
Geoscience and Remote Sensing Sym-posium (IGARSS), 2015 IEEE International , July 2015, pp.4416–4419.[7] Daniel J. Weiss, others, Peter M. Atkinson, Samir Bhatt, Bon-nie Mappin, Simon I. Hay, and Peter W. Gething, “An effec-tive approach for gap-filling continental scale remotely sensedtime-series,”
Isprs Journal of Photogrammetry and RemoteSensing , vol. 98, pp. 106–118, Dec. 2014.[8] Christine Pohl, “Tools and Methods for Fusion of Images ofDifferent Spatial Resolution,”
International Archives of Pho-togrammetry and Remote Sensing , vol. 32, pp. 3–4, July 1999.[9] C.G. Homer, R.D. Ramsey, T.C. Edwards, and A. Falconer,“Landscape covertype modeling using a multi-scene thematicmapper mosaic.,”
Photogrammetric Engineering and RemoteSensing , vol. 63, pp. 59–67, 1997.[10] C. Zhang, W. Li, and D. Travis, “Gaps-fill of SLC-off LandsatETM+ satellite image using a geostatistical approach,”
Inter-national Journal of Remote Sensing , vol. 28, no. 22, pp. 5103–5122, Nov. 2007.[11] Chuanrong Zhang, Weidong Li, and David J. Travis, “Restora-tion of clouded pixels in multispectral remotely sensed imagerywith cokriging,”
International Journal of Remote Sensing , vol.30, no. 9, pp. 2173–2195, May 2009.[12] S. Kandasamy, F. Baret, A. Verger, P. Neveux, and M. Weiss,“A comparison of methods for smoothing and gap filling timeseries of remote sensing observations – application to MODISLAI products,”
Biogeosciences , vol. 10, no. 6, pp. 4055–4071,June 2013.[13] C. E. Rasmussen and C. K. I. Williams,
Gaussian Processesfor Machine Learning , The MIT Press, New York, 2006.[14] G. Camps-Valls, J. Verrelst, J. Mu˜noz-Mar´ı, V. Laparra,F. Mateo-Jimenez, and J. Gomez-Dans, “A survey on gaus-sian processes for earth observation data analysis: A compre-hensive investigation,”
IEEE Geoscience and Remote SensingMagazine , , no. 6, June 2016.[15] Mauricio A. Alvarez, Lorenzo Rosasco, and Neil D.Lawrence, “Kernels for Vector-Valued Functions: a Re-view,” arXiv:1106.6251 [cs, math, stat] , June 2011, arXiv:1106.6251.[16] J. Verrelst, L. Alonso, G. Camps-Valls, J. Delegido, andJ. Moreno, “Retrieval of vegetation biophysical parameters us-ing gaussian process techniques,”
IEEE Transactions on Geo-science and Remote Sensing , vol. 50, no. 5/P2, pp. 1832–1843,2012, cited By 26.[17] A. Journel and C. Huijbregts,