[PDF] A method for filling gaps in solar irradiance and in solar proxy data

Abstract

Data gaps are ubiquitous in spectral irradiance data, and yet, little effort has been put into finding robust methods for filling them. We introduce a data-adaptive and nonparametric method that allows us to fill data gaps in multi-wavelength or in multichannel records. This method, which is based on the iterative singular value decomposition, uses the coherency between simultaneous measurements at different wavelengths (or between different proxies) to fill the missing data in a self-consistent way. The interpolation is improved by handling different time scales separately. Two major assets of this method are its simplicity, with few tuneable parameters, and its robustness. Two examples of missing data are given: one from solar EUV observations, and one from solar proxy data. The method is also appropriate for building a composite out of partly overlapping records.

Full PDF

aa r X i v : . [ a s t r o - ph . I M ] J u l A method for ﬁlling gaps in solar irradianceand in solar proxy data

Thierry Dudok de Wit

LPC2E, Observatoire des Sciences de l’Univers en Région Centre,3A avenue de la Recherche Scientiﬁque, 45071 Orléans cedex 2, France([email protected])

Slightly expanded version of an article to appear in Astronomy and Astrophysics (2011),Doi: 10.1051/0004-6361/201117024

Abstract

Data gaps are ubiquitous in spectral irradiance data, and yet, little effort has been put into ﬁndingrobust methods for ﬁlling them. We introduce a data-adaptive and nonparametric method that allowsus to ﬁll data gaps in multi-wavelength or in multichannel records. This method, which is based onthe iterative singular value decomposition, uses the coherency between simultaneous measurements atdifferent wavelengths (or between different proxies) to ﬁll the missing data in a self-consistent way. Theinterpolation is improved by handling different time scales separately.Two major assets of this method are its simplicity, with few tuneable parameters, and its robustness.Two examples of missing data are given: one from solar EUV observations, and one from solar proxydata. The method is also appropriate for building a composite out of partly overlapping records.

Solar and stellar irradiance records are often plagued by data gaps. The proper interpolation of these missingdata is a longstanding and notoriously delicate problem that requires a good understanding of the data(Wiener, 1964; Little and Rubin, 2002). Considerable attention has been given to this problem in ﬁelds suchas climate science (Dobesch et al., 2007) but much less so in solar physics and in astrophysics. Often, thelimited attention that is paid to data gaps contrasts with the sophistication of the analysis that is subsequentlyperformed on these data.While short gaps can easily be ﬁlled by linear or by nonlinear interpolation, data gaps whose durationexceeds the characteristic time scales are much more difﬁcult to handle. A notable exception is whenmultichannel synoptic observations of the same process are available, with gaps in some or in all of them.Spectral irradiance observations, which we shall concentrate on, precisely belong to that category. Ourexamples will be taken from the Sun, but the results can be easily extended to other types of multichannelobservations. Our method applies to any set of observations that are recorded simultaneously (i.e. the timestamps are the same for all records), are correlated with each other, and whose time intervals fully or partlyoverlap. Our main assumption is their linear correlation, in the sense that each record can be approximatedby a linear combination of the other ones.A strong linear correlation is typically observed between spectral irradiance observations made at differ-ent wavelengths or between simultaneous measurements of different proxies. These synoptic records arefrequently used to assess subtle changes in the variability of the Sun; they are often remarkably coherentin time t and in wavelength λ . As a consequence, their variability can be explained in terms of a fewcontributions only. This property is well known for the Extreme UltraViolet (EUV) (Lean et al., 1982;Amblard et al., 2008) but also for the visible range (Rabbette and Pilewskie, 2001), when measured fromspace. The same coherency is observed among different proxies for solar activity (Pap and Guhathakurta,1992; Schmahl and Kundu, 1994; Lean, 2000; Kane, 2002; Floyd et al., 2005; Dudok de Wit et al., 2009).This property is rooted in the structuring effect of the solar magnetic ﬁeld. The coherency partly breaks1own during the impulsive phase of solar ﬂares because the spectrum then considerably depends on the lo-cal conditions of the solar atmosphere. Here, however, as in many applications, we consider daily or hourlyaverages, so that the effect of short transients can be discarded.This coherency in both time and wavelength is the key to the reconstruction technique we shall introducebelow. By interpolating along two dimensions (in time and along different records), we not only improvethe quality of the reconstruction, but we also can ﬁll arbitrarily large data gaps without having to rely on thetedious bookkeeping that is required by most interpolation schemes.The nonparametric and data-adaptive method we advocate is based on the SVD or singular value decom-position (Golub and Van Loan, 2000), which is to linear algebra what the Fourier transform is to spectralanalysis. The SVD allows the extraction of the coherent part of the solar spectral irradiance, which is thenused to ﬁll the data gaps iteratively. The method is described in Section 2, and two applications are detailed.The ﬁrst one (Sec. 3) deals with solar spectral irradiance data in the EUV. In the second application (Sec.4) we consider a set of solar proxies with numerous gaps. Let I ( λ, t ) be a multichannel record that represents either the solar spectral irradiance at different wave-lengths (or in different spectral bands) or a set of solar proxies, or a combination thereof. All these quantitiesmust be sampled simultaneously; the sampling rate, however, does not need to be constant. These data areconveniently stored in a matrix I ij = [ I ( t i , λ j )] , in which columns are time series. Each column may havean arbitrarily large number of data gaps, as long as a reasonable fraction of observations are available, sayat least 20%. The method we propose exploits either the coherency in wavelength, or both the coherency in wavelengthand in time. We start with a description of the ﬁrst option, because the second one can be readily obtainedby data embedding. Let us ﬁrst assume that there are no gaps. The SVD of the data matrix then yields aseparable set of functions (hereafter called modes ) I ( t i , λ j ) = M X k =1 s k u k ( t i ) v k ( λ j ) , (1)which are orthonormal hh u k ( t ) u l ( t ) i = h v k ( λ ) v l ( λ ) i = (cid:26) if k = l if k = l . (2)The weights s ≥ s ≥ . . . ≥ s M ≥ are positive by construction. The number M of modes equals therank of the matrix, which is usually the smallest of the number N t of samples or the number N λ of records.This decomposition is unique. The SVD of the data matrix directly yields a set of three matrices I = USV T that respectively contain u ( t ) , the weights s , and v ( λ ) .A key property of the SVD is that modes with heavy weights describe salient features of the data. That is,the truncated expansion ˆ I K ( t i , λ j ) = K ≤ M X k =1 s k u k ( t i ) v k ( λ j ) (3)will capture the coherent part of the data while deferring incoherent ﬂuctuations to the remaining modes.This property has made the SVD popular in multichannel and array data processing (Dudok de Wit, 1995;Cline and Dhillon, 2006). We shall use it here to reconstruct the missing values.The performance of the reconstruction can be quantiﬁed by the mean square error e = N t X i =1 N λ X j =1 (cid:16) I ( t i , λ j ) − ˆ I K ( t i , λ j ) (cid:17) = M X k = K +1 s k , (4)2hich shows that by taking the few largest modes, the reconstruction error can be made arbitrarily small.As it turns out with spectral irradiance data, the ﬁrst few weights are often orders of magnitude heavier thanthe subsequent ones, so that excellent reconstructions can be achieved with a few modes only. We implicitlyassume here that features departing from the behaviour observed at other wavelengths are unlikely to havea solar origin (except during the impulsive phase of ﬂares), so that they can be readily discarded. This willbe illustrated below in Sec. 3.Let us now assume that some samples are missing. The data covariance matrix and the SVD then cannot becomputed anymore. This problem, however, can be circumvented by using the following iterative schemewith two embedded loops:1. Fill each gap with some adequate value (typically the temporal mean of the record).2. Compute the SVD.3. Compute the approximation ˆ I k of the data by retaining the k largest mode(s) of the SVD. Initially, k = 1 .4. Fill the gaps with ˆ I k , as deﬁned in Eq. 3. As long as these values have not converged, go back to 2.(inner loop).5. Increment the number of modes k and start again at 2. Iterate until k = K (outer loop).This method seems to have emerged independently in different contexts (Schneider, 2001; Beckers and Rixen,2003; Kondrashov and Ghil, 2006); it has mostly been used for spatio-temporal data sets, with some subtledifferences (Schneider, 2007). We refer to Schneider (2001) for discussions on optimality, convergence,etc. Three additional adaptations, however, need to be considered before the method can be applied toirradiance data. The relative variability and the average value of the solar spectral irradiance vary by orders of magnitudebetween the soft X-ray and the visible range. The SVD, however, is scaling-dependent and so a renormal-isation is required. We do so by standardising each record: ﬁrst, the time average is subtracted and then anormalisation with respect to the standard deviation σ λ j or the noise level (if known) is performed. Bothoperations are affected by the value of the missing samples, so they must be repeated at each iteration. Thisis particularly important for the offset subtraction. The renormalisation may be done only once. The solar spectral variability contains a mix of scales that are driven by different processes: 27-day vari-ations are due to solar rotation, the 11-year periodicity is caused by the solar cycle, etc. Each of theseprocesses leads to a speciﬁc spectral dependence; different scales should therefore be processed separatelywhen ﬁlling gaps. This feature considerably improves the reconstruction skill and to the best of our knowl-edge has not yet been used.The two ranges of scales that are most frequently encountered in solar studies are: below 81 days (whichcaptures solar rotation and the evolution of active regions) and above 81 days. We apply the iterative SVDprocedure described in Sec. 2.1 separately to both scales. The à trous wavelet transform (Mallat, 2008)is used to decompose the data into two records at each iteration: one with short time-scales and one withlong time-scales. Classical bandpass ﬁlters may also be used because this has no signiﬁcant impact onthe results. The wavelet transform, however, is better suited for non-stationary data. One may also want toextract additional scales, such as the 13-day periodicity associated with centre-to-limb effects of hot coronallines. This indeed results in a small but discernible improvement in the reconstruction of the EUV, at theexpense of a longer computation time. 3 .4 Coherency in time

The methodology so far only exploits the coherency between different wavelengths (or proxies), which isthe key property. One may also want, however, to make use of the temporal coherency. This is usefulwhen there are speciﬁc times at which there is no single observation, or if the number of records is small(typically N λ < ), or if each record can be considered as a smoothly varying waveform with incoherentnoise superimposed on it.The main asset of the iterative SVD reconstruction method is its straightforward extension to such a ﬁlteringin time, using the concept of embedding, which has been pioneered in the study of chaotic systems byBroomhead and King (1986). Let us expand the data matrix by appending replicates that are shifted intime, i.e. E ij = [ I ( t i , λ j ) I ( t i +1 , λ j ) I ( t i +2 , λ j ) . . . I ( t i + D − , λ j )] . (5)By applying the SVD to this embedded matrix we exploit both the coherency in wavelength and in time. Itis important (but not mandatory) that the data be regularly sampled since the method essentially computes aweighted average of each sample with its nearest neighbours. The higher the value of the embedding dimen-sion D , the more adjacent time steps are used in the reconstruction, thus leading to a stronger smoothingin time. This is equivalent to using a symmetric ﬁnite-impulse ﬁlter whose coefﬁcients are obtained data-adaptively. The particular case wherein one single record is embedded and decomposed by SVD is calledsingular spectrum analysis (SSA). In the SSA, only temporal information is used and so it is important forthe embedding dimension to exceed the value of the dominant period in the data (Ghil et al., 2002). Ourreconstruction, however, mostly relies on the strong coherency across wavelengths or proxies to ﬁll the gapsand so the conditions on the value of the embedding dimension are much less stringent. In practice, lowvalues ( D = 2 − ) already bring a signiﬁcant improvement. The main reason for keeping this dimensionas low as possible is to reduce the computational load. The three tuneable parameters of the method are: a) the number K of signiﬁcant modes, b) the number ofscales into which the data are decomposed, and c) the embedding dimension D . Only the ﬁrst one reallyaffects the outcome. A separation into two scales only (with a threshold between 50-100 days) is enoughto properly capture both short- and long-term evolutions, and embedding dimensions of D = 2 − areusually adequate for reconstructing daily averages. The determination of the optimum parameters and thevalidation of the results is made by cross-validation and will be illustrated below.The only critical question is memory and computational load. For an irradiance data set with ﬁve years ofdaily values at 100 wavelengths, and an embedding dimension of D = 5 , the size of the embedded matrix is [1822 , . The computation of the SVD at each iteration typically takes several seconds. For that reason,it may be desirable to process separately those spectral bands that evolve differently, such as the soft X-ray,the EUV and the MUV bands. The routine in Matlab R (cid:13) is available from the author. The Solar EUV Monitor (SEM) is a solar Extreme UltraViolet (EUV) spectrometer that has been operatingcontinuously on the SoHO satellite since January 1996 (Judge et al., 1998). In its ﬁrst-order mode, SEMmeasures the irradiance within an 8 nm bandpass centred about the bright 30.38 nm He II line. On June25, 1998, SoHO suffered a mission interruption, leading to the loss of several months of data. This longdata gap considerably complicates the use of SEM data for upper atmosphere model validation. The SEM,however, mostly captures chromospheric emissions, which are highly correlated with other gauges of solaractivity. Foremost among these are: • the f . or decimetric index, which is the solar radio ﬂux at 10.7 cm. This index, which is measuredfrom the ground, captures a mix of thermal and electron gyro-resonance emissions, and has beenshown to be highly correlated with the EUV ﬂux (Tapping and Detracey, 1990). • the Mg II index, which is the core-to-wing ratio of the Mg II line at 280 nm. This index is widelyused as a proxy for chromospheric activity (Viereck et al., 2001).4

000 2005 201001234567 a m p li t ude [ a . u .] p f . MgIILyman α SEMJan 03 Jul 03 Jan 0401234567 date a m p li t ude [ a . u .] Figure 1: Upper plot: four chromospheric proxies, averaged over 80 days, using a Gaussian ﬁlter. The twomajor outages are shown shaded. Bottom plot: excerpt of the same proxies, showing daily values. Thelong-term trend has been subtracted from the latter. All records have been normalised to their standarddeviation and shifted vertically for easier visualisation. • the intensity of the H I Lyman α line at 121.57 nm, which is the brightest spectral line below 200 nm(Woods et al., 2000).Together with the ﬂux from the SEM, we have four quantities that have different physical origins and yet arehighly correlated, thereby opening the prospect of ﬁlling the large gaps in the SEM data. We consider dailyaverages made from January 1, 1996 until April 29, 2011. The linear correlation between the f . indexand the other proxies improves when taking its square root, which we shall systematically do from now on.The correlation between these four proxies on both long and short time-scales is illustrated in Fig. 1.Our working hypothesis is that each of the missing samples from the SEM can be reconstructed from a linearcombination of (possibly non-simultaneous) observations of the other proxies. As we shall see shortly, thebest value of the embedding dimension is 4; let us therefore select D = 4 and ﬁrst determine the optimumnumber of modes. With four variables and an embedding dimension of 4, the total number of SVD modesis 16; their weights are displayed in Fig. 2. The ﬁrst weight surpasses all the others because the ﬁrst modeis an average of all four proxies, which is by far the most conspicuous coherent feature. The inﬂexion pointbetween the few heaviest weights and the ﬂat tail provides a convenient but visual criterion for determiningthe number of signiﬁcant modes (Dudok de Wit, 1995). According to this criterion, the best interpolationskill is for K = 5 − modes out of 16.A better validation test consists in generating a small number of synthetic gaps, reconstructing them, andthen checking how the residual error varies with the model parameters. To do so, we remove 5-10 % of thesamples from each record and then compute the normalised error ǫ K ( λ j ) = 1 σ λ j vuut N gaps X i = { gaps } (cid:16) I ( t i , λ j ) − ˆ I K ( t i , λ j ) (cid:17) , −2 −1 mode number k s k / s no r m a li s ed e rr o r ε K [ % ] sqrt(f10.7)MgIILyman α SEM

Figure 2: Upper plot: distribution of the normalised weights s k /s , for an embedding dimension of D = 4 .The total number of modes is 16. Bottom plot: variation of the reconstruction error with the number ofmodes K , for each of the four variables.where the average is computed for synthetic gaps only. This procedure is repeated ten times to obtain anestimate of the average value of the normalised error. A value of 100 % can be interpreted as an error whosestandard deviation equals the solar cycle variability of the original data. This value truly reﬂects the errormade by ﬁlling short data gaps. Note that it tends to underestimate the error for larger gaps, unless thelength distribution of the synthetic gap matches that of the original data.The evolution of the normalised error with K is illustrated in Fig. 2, which shows a broad minimum around K = 4 − , in agreement with the estimate obtained by visualisation. Note that the four minima occur atdifferent values of K . The normalised error is on average larger for the √ f . index, which suggests thatthis quantity is relatively more difﬁcult to reconstruct than the others. This is not so surprising, becauseit is the only emission from the radio band. The smallest normalised error is obtained for the SEM, with ǫ K = 4 . . This value is about half that of the estimated normalised uncertainty (Judge et al., 1998),which shows the excellent quality of the reconstruction. In practice, the optimum value of K is frequentlyfound to be one or two units higher than the value obtained by visual inspection. As Fig. 2 suggests, anoverestimation of K is preferable to an underestimation.The choice of the embedding dimension D is mostly based on physical insight. With D = 1 (i.e. noembedding) we assume that the missing samples are reconstructed from simultaneous observations only,whereas D > implies that the information contained in past and future observations is also used. Setting D > therefore involves a weighted averaging over time, which is appropriate for records whose samplesare highly correlated in time.In Fig. 3 we estimate the normalised error for different embedding dimensions, using the optimum numberof modes for each of them. The smallest error is obtained for an embedding dimension of D = 4 . Largerdimensions hardly reduce the error but do increase the computational load substantially. As expected, thehigher the value of D , the smoother the reconstruction and the more likely that ﬁne features may be missed.This is particularly evident in August 1998, when a group of rapidly evolving active regions were movingacross the solar disc. An embedding dimension of 4 properly captures their evolution, whereas a dimensionof 15 smears out all but the most pronounced peaks. 6 no r m a li s ed e rr o r ε K [ % ] Apr 98 Jul 98 Oct 98 Jan 992345 x 10 date f l u x [ pho t on s / c m / s ] Figure 3: Upper plot: variation of the reconstruction error for the SEM with the embedding dimension D . For each embedding dimension, the number K of modes that minimises the error is chosen. Bottomplot: comparison between the measured ﬂux from the SEM (dashed line) and the ﬂux reconstructed withan embedding dimension of D = 4 (thick line), and D = 15 (thin line). The Mg II index is shown forcomparison (ﬁlled curve), with arbitrary units. 7

000 20010.080.0850.090.0950.1 C a K i nde x C a K i nde x Figure 4: Reconstruction of the missing values of the Ca K index using 1 to 6 modes. The upper plot showsan excerpt at solar maximum and the bottom one at solar minimum. The observations are indicated withcrosses and the different reconstructions with continuous lines. Also shown is the Mg II index (ﬁlled line),in arbitrary units.This example illustrates a relatively simple case because only one record has gaps in it. Let us now, however,consider a more frequent case in which several of the records have large gaps. Filling these gaps by standardinterpolation schemes can become very time-consuming because of the amount of bookkeeping that isrequired to test whether gaps occur simultaneously in several records, etc. The SVD-based interpolationdoes not require any of these tests.

The Ca K index is the normalised intensity of the Ca II K-line at 393.37 nm and has been advocated asa proxy for magnetic activity, including plages, faculae, and the network. This line is measured from theground, so it cannot be observed continuously. Here we consider a record of daily observations made at theNational Solar Observatory at Sacramento Peak (Keil et al., 1998), in which about 66% of the samples aremissing. This index is known to be highly correlated with other solar indices, in particular with the Mg IIindex (Foukal et al., 2009), so that the SVD method is ideally suited for ﬁlling its gaps.To reconstruct the missing values, we consider the following set of proxies that are highly correlated withthe Ca K index: the square root of the f . index, the intensity of the H I Lyman α line, the Mg II index andthe magnetic plage strength index (MPSI) (Parker et al., 1998). The time interval ranges from Nov. 1, 1980to July 1, 2010; all proxies have data gaps except for the ﬁrst two. These gaps occur erratically and 6% ofthem exceed 10 days. In this particular example, the coherency between proxies is crucial and indeed thechoice of the embedding dimension D does not signiﬁcantly affect the results. Let us take D = 2 , whichis the value that is recommended by the reconstruction error. The maximum number of SVD modes is 10because we have ﬁve records. Out of these, three only are found to be signiﬁcant.The result of the reconstruction is illustrated in Fig. 4 for periods of high and low solar activity. Note that theresults obtained with different number of modes lead to similar temporal evolutions. The reconstruction at8

980 1990 2000 2010−0.0100.010.020.03 year C a K i nde x Figure 5: Ca K index after reconstruction and ﬁltering by wavelet transform (continuous curve) and the dif-ference with the observations (crosses). For easier visualisation, the Ca K index has been shifted downwardsby 0.08.solar maximum looks reasonable because it passes through the observations while staying highly correlatedwith the Mg II index. During solar minimum, however, the observed values of the Ca K index continueto ﬂuctuate whereas the reconstructed values and the other proxies stay almost constant. The differencebetween the observed Ca K index and the smoothly varying reconstruction varies randomly in time, whichquestions its solar origin.To further investigate the origin of this difference between the observed and reconstructed index, we ﬁlteredthe reconstructed data with the à trous wavelet transform, which allows the separation of the sharp peaksfrom the more regular reconstruction. The residuals, i.e. the difference between the ﬁltered reconstructionand the original observations, are shown in Fig. 5: they are found to be independent and their Gaussiandistribution only weakly varies with the solar cycle. This is a strong indication that the residuals are mea-surement errors rather than solar ﬂuctuations. Their standard deviation is 0.0008, which represents 20 %of the solar cycle variability of the Ca K index. Our reconstruction thereby provides a means for ﬁtting thenumerous data gaps in the Ca K index while also evaluating the conﬁdence interval of the observations.

This study shows that SVD-interpolation is a powerful technique for ﬁlling arbitrarily large gaps in multi-wavelength, multichannel or in synoptic records. We focused here on solar spectral irradiance observations,which are frequently plagued by missing data. These gaps may be distributed at random in time or inwavelength. The main tuneable parameter is the number of SVD modes that is needed to reconstruct thedata; this value may be estimated either by visualisation or by cross-validation. The method works bestwhen each record can be approximated by a linear combination of the others. Since it relies on linearcombinations only, it may be desirable to apply a nonlinear static transform beforehand to increase thelinear correlation between the records.For the method to work, the observations must be sampled simultaneously but not necessarily evenly. Non-simultaneous observations can be handled by resampling all variables to a common grid, for example byFourier decomposition (e.g. Hocke and Kämpfer, 2009), and then ﬁlling the gaps by SVD. By alternatingbetween the two, both the gaps and the interpolated values can be progressively reﬁned.This method has several applications in addition to mere interpolation. The ﬁrst one is the cross-calibrationof measurements of the same quantity by different instruments. The Mg II index, for example, is at presentmeasured by different instruments that give different amplitudes. These data sets are incomplete and onlypartly overlap, which considerably impairs their inter-comparison. The iterative SVD method is ideallysuited for ﬁlling these gaps because the records are by deﬁnition strongly correlated.9 second potential application is the stitching together of total solar irradiance (TSI) observations. MergingTSI records from several instruments is a delicate and controversial task (Fröhlich, 2002) because instru-ments disagree on the absolute value of the TSI and often do not operate simultaneously. The iterativeSVD provides a means for estimating the different offsets in a self-consistent way because it allows us toextrapolate each TSI record by assuming that its statistical properties with respect to the other records donot change in time. This property is particularly useful for checking composites that are built from differentrecords, such as the TSI, the H I Lyman α intensity, the Mg II index and the sunspot index (Clette et al.,2007). This will be detailed in a forthcoming publication. Acknowledgements

I thank the following institutes for providing the data: the Laboratory for Atmospheric and Space Physics (Universityof Colorado) for the Mg II and H I Lyman α References

Amblard, P., Moussaoui, S., Dudok de Wit, T., Aboudarham, J., Kretzschmar, M., Lilensten, J., andAuchère, F. (2008). The EUV Sun as the superposition of elementary Suns.

Astron. Astroph. , 487:L13–L16.Beckers, J. M. and Rixen, M. (2003). EOF Calculations and Data Filling from Incomplete OceanographicDatasets*.

Journal of Atmospheric and Oceanic Technology , 20:1839–1856.Broomhead, D. S. and King, G. P. (1986). Extracting qualitative dynamics from experimental data.

PhysicaD Nonlinear Phenomena , 20:217–236.Clette, F., Berghmans, D., Vanlommel, P., van der Linden, R. A. M., Koeckelenbergh, A., and Wauters,L. (2007). From the Wolf number to the International Sunspot Index: 25 years of SIDC.

Adv. SpaceResearch , 40:919–928.Cline, A. K. and Dhillon, I. S. (2006). Computation of the singular value decomposition. In Hogben, L.,editor,

Handbook of linear algebra , chapter 45, pages 1–13. CRC Press, Boca Raton.Dobesch, H., Dumolard, P., and Dyras, I., editors (2007).

Spatial Interpolation for Climate Data: The Useof GIS in Climatology and Meteorology . Geographical Information System Series. Wiley, London.Dudok de Wit, T. (1995). Enhancement of multichannel data in plasma physics by biorthogonal decompo-sition.

Plasma Physics and Controlled Fusion , 37:117–135.Dudok de Wit, T., Kretzschmar, M., Lilensten, J., and Woods, T. (2009). Finding the best proxies for thesolar UV irradiance.

Geoph. Res. Lett. , 36:10107.Floyd, L., Newmark, J., Cook, J., Herring, L., and McMullin, D. (2005). Solar EUV and UV spectralirradiances and solar indices.

Journal of Atmospheric and Solar-Terrestrial Physics , 67:3–15.Foukal, P., Bertello, L., Livingston, W. C., Pevtsov, A. A., Singh, J., Tlatov, A. G., and Ulrich, R. K.(2009). A century of solar CaII measurements and their implication for solar UV driving of climate.

Solar Physics , 255:229–238.Fröhlich, C. (2002). The total solar irradiance variations since 1978.

Adv. Space Research , 29:1409–1416.Ghil, M., Allen, M. R., Dettinger, M. D., Ide, K., Kondrashov, D., Mann, M. E., Robertson, A. W., Saunders,A., Tian, Y., Varadi, F., and Yiou, P. (2002). Advanced Spectral Methods for Climatic Time Series.

Reviews of Geophysics , 40:1003.Golub, G. H. and Van Loan, C. F. (2000).

Matrix Computations . Johns Hopkins Press, Baltimore.10ocke, K. and Kämpfer, N. (2009). Gap ﬁlling and noise reduction of unevenly sampled data by means ofthe Lomb-Scargle periodogram.

Atmospheric Chemistry & Physics , 9:4197–4206.Judge, D. L., McMullin, D. R., Ogawa, H. S., Hovestadt, D., Klecker, B., Hilchenbach, M., Mobius, E.,Canﬁeld, L. R., Vest, R. E., Watts, R., Tarrio, C., Kuehne, M., and Wurz, P. (1998). First solar EUVirradiances obtained from SOHO by the CELIAS/SEM.

Solar Physics , 177:161–173.Kane, R. P. (2002). Correlation of solar indices with solar EUV ﬂuxes.

Solar Physics , 207:17–40.Keil, S. L., Henry, T. W., and Fleck, B. (1998). NSO/AFRL/Sac Peak K-line Monitoring Program. InBalasubramaniam, K. S., Harvey, J., and Rabin, D., editors,

Synoptic Solar Physics , volume 140 of

Astronomical Society of the Paciﬁc Conference Series , pages 301–308.Kondrashov, D. and Ghil, M. (2006). Spatio-temporal ﬁlling of missing points in geophysical data sets.

Nonlinear Processes in Geophysics , 13:151–159.Lean, J. L. (2000). Short Term, Direct Indices of Solar Variability.

Space Science Reviews , 94:39–51.Lean, J. L., Livingston, W. C., Heath, D. F., Donnelly, R. F., Skumanich, A., and White, O. R. (1982). Athree-component model of the variability of the solar ultraviolet ﬂux 145-200 nm.

Journal of GeophysicalResearch , 87:10307–10317.Little, R. J. A. and Rubin, D. B. (2002).

Statistical analysis with missing data . Wiley series in probabilityand statistics. Wiley, New York, 2nd edition.Mallat, S. (2008).

A Wavelet Tour of Signal Processing: the Sparse Way . Academic Press, London, 3rdedition.Pap, J. and Guhathakurta, M. (1992). Variability of Solar UV Irradiance and Its Relation to the Variabilityin Coronal Green Line Index and Equivalent Width of He Line at 1083nm. In K. L. Harvey, editor,

TheSolar Cycle , volume 27 of

Astronomical Society of the Paciﬁc Conference Series , pages 483–490.Parker, D. G., Ulrich, R. K., and Pap, J. M. (1998). Modeling solar UV variations using Mount WilsonObservatory indices.

Solar Physics , 177:229–241.Rabbette, M. and Pilewskie, P. (2001). Multivariate analysis of solar spectral irradiance measurements.

Journal of Geophysical Research , 106:9685–9696.Schmahl, E. J. and Kundu, M. R. (1994). Solar cycle variation of the microwave spectrum and total irradi-ance.

Solar Physics , 152:167–173.Schneider, T. (2001). Analysis of Incomplete Climate Data: Estimation of Mean Values and CovarianceMatrices and Imputation of Missing Values.

Journal of Climate , 14:853–871.Schneider, T. (2007). Comment on ”Spatio-temporal ﬁlling of missing points in geophysical data sets” byD. Kondrashov and M. Ghil, Nonlin. Processes Geophys., 13, 151-159, 2006.

Nonlinear Processes inGeophysics , 14:1–2.Tapping, K. F. and Detracey, B. (1990). The origin of the 10.7 CM ﬂux.

Solar Physics , 127:321–332.Viereck, R., Puga, L., McMullin, D., Judge, D., Weber, M., and Tobiska, W. K. (2001). The Mg II index:A proxy for solar EUV.

Geoph. Res. Lett. , 28:1343–1346.Wiener, N. (1964).