arXiv [astro-ph.EP] Dec

Of ‘Cocktail Parties’ and Exoplanets
I. P. Waldmann
University College London, Gower Street, WC1E 6BT, UK [email protected]
ABSTRACT
The characterisation of ever smaller and fainter extrasolar planets requires an intricate understanding of one's data and the analysis techniques used. Correcting the raw data at the 10^-4 level of accuracy in flux is one of the central challenges. This can be difficult for instruments that do not feature a calibration plan for such high precision measurements. Here, it is not always obvious how to de-correlate the data using auxiliary information of the instrument, and it becomes paramount to know how well one can disentangle instrument systematics from one's data given nothing but the data itself. We propose a non-parametric machine learning algorithm, based on the concept of independent component analysis, to de-convolve the systematic noise and all non-Gaussian signals from the desired astrophysical signal. Such a ‘blind’ signal de-mixing is commonly known as the ‘Cocktail Party problem’ in signal processing. Given multiple simultaneous observations of the same exoplanetary eclipse, as in the case of spectrophotometry, we show that we can often disentangle systematic noise from the original light curve signal without the use of any complementary information on the instrument. In this paper, we explore these signal extraction techniques using simulated data and two data sets observed with the Hubble-NICMOS instrument. Another important application is the de-correlation of the exoplanetary signal from time-correlated stellar variability. Using data obtained by the Kepler mission we show that the desired signal can be de-convolved from the stellar noise using a single time series spanning several eclipse events. Such non-parametric techniques can provide important confirmations of the existing parametric corrections reported in the literature, and of their associated results. Additionally, they can substantially improve the precision of exoplanetary light curve analysis in the future.
Subject headings: methods: data analysis — methods: statistical — techniques: photometric — techniques: spectroscopic
1. Introduction
The field of transiting extrasolar planets, and especially the study of their atmospheres, is one of the youngest and most dynamic subjects in current astrophysics. Permanently at the edge of technical feasibility, we have come from the first radial velocity and transit detections (Mayor & Queloz 1995; Marcy et al. 1998; Charbonneau et al. 2000), via the first detections of molecular features in hot-Jupiter atmospheres (Charbonneau et al. 2002), to ever more detailed characterisations of multitudes of systems (Grillmair et al. 2008; Charbonneau et al. 2008; Snellen et al. 2010b; Bean 2011; Swain et al. 2008, 2009a,b; Tinetti et al. 2007, 2010). With over 700 exoplanets discovered (Schneider et al. 2011) and over 1200 exoplanetary candidates that await confirmation (Borucki et al. 2011), the focus of interest shifts from the detection to the characterisation of smaller and smaller targets. The governing factor of this progression is the precision at which we can control our instrument and/or stellar systematics, and hence the accuracy with which we can analyse the data.

To minimise the impact of the systematic noise components, different approaches have been proposed in the past. For space and ground-based observations, e.g. Spitzer and Hubble (e.g. Agol et al. 2010; Beaulieu et al. 2008, 2011; Charbonneau et al. 2002, 2005, 2008; Deming et al. 2007; Gillon et al. 2010; Grillmair et al. 2008; Knutson et al. 2007a,b; Sing et al. 2011; Snellen et al. 2010b; Bean et al. 2011; Swain et al. 2008, 2009a,b; Tinetti et al. 2007, 2010), instrumental systematic noise has been approximated using parametric models, often based on auxiliary information (optical state vectors) such as instrumental temperature, orbital inclination, and inter- and intra-pixel positions of the point-spread function. Using optical state vectors to de-correlate one's data is an effective technique (Swain et al. 2008).
However, for instruments that lack a calibration plan at the precision of 10^-4, the accuracy of the retrieved optical state vectors (e.g. sensor sampling rates) and the adequacy of the instrument model's definition itself become difficult to determine. Some of the recent controversy over results reported by various teams can be attributed to this circumstance (Knutson et al. 2011; Stevenson et al. 2010; Beaulieu et al. 2011; Swain et al. 2008; Gibson et al. 2011; Pont et al. 2010; Hatzes et al. 2010; Brunt et al. 2010). The situation is further complicated by brightness variability of the planet hosting star, in particular for small, very active M-stars. Hence, it is important to work towards an alternative route to quantify or remove systematic noise using non-parametric models that work as independent confirmations of existing results. Carter & Winn (2009), Thatte et al. (2010), Gibson et al. (2011) and Waldmann et al. (2011) have progressed towards non-parametric noise models and signal separation using wavelet, principal component, Gaussian process and Fourier based analyses.

In this publication, we propose a new non-parametric method to separate systematic noise from the desired lightcurve signal. Given multiple lightcurves, observed simultaneously using spectrographs for example, we can disentangle our desired astrophysical signal from other time-correlated or non-Gaussian systematic noise sources using un-supervised machine learning algorithms. We furthermore explore the de-correlation of individual time series spanning several consecutive eclipse events. The importance of this work lies with the fact that no additional knowledge of the system or the star is required, besides the observations themselves.
Such non-parametric methods provide a potential new route to de-correlate one's data in cases where the instrument does not feature an adequate calibration plan, the quality of the auxiliary information of the instrument is non-optimal, or the host star shows significant activity. Such blind de-convolution techniques provide new insight and a powerful validation of the established parametric instrumental models reported in the literature.

Here we will briefly introduce the more general theory of blind-source separation and proceed with a description of the algorithm proposed. The efficiency of said algorithm is tested with two synthetic models, two HST/NICMOS data sets available in the public domain, and one Kepler (Q2 & Q3 data release) time series featuring strong time-correlated stellar variability. Future publications will focus on in-depth applications of the proposed algorithm to specific data sets.
2. Background: the Cocktail Party Problem
In this section we will briefly describe the fundamental concepts on which the following analysis is based. Readers familiar with independent component analysis may proceed straight to section 3.

Let us consider the analogy of three people talking simultaneously in one room. The speech signals of these people are denoted by s_1(t), s_2(t) and s_3(t). In the same room are three microphones recording the observed signals x_1(t), x_2(t) and x_3(t). The observed signals can be expressed in terms of the original speech signals:

x_1(t) = a_{11} s_1(t) + a_{12} s_2(t) + a_{13} s_3(t)   (1)
x_2(t) = a_{21} s_1(t) + a_{22} s_2(t) + a_{23} s_3(t)
x_3(t) = a_{31} s_1(t) + a_{32} s_2(t) + a_{33} s_3(t)

Instead of assuming x(t) and s(t) to be proper time signals, we drop the time dependence and assume them to be random variables

x_k = a_{k1} s_1 + a_{k2} s_2 + ... + a_{kN} s_N,   for all k = 1, ..., N   (2)

where the a_{kl}, with k, l = 1, ..., N, are real weighting coefficients (in this case the square of the distance of the speakers to the microphone) and N is the maximum number of observed signals. The individual time series can also be expressed in terms of vectors, where bold lower-case letters denote column vectors and upper-case letters denote matrices:

x = As   (3)

where the rows of x comprise the individual time series, x_k, and similarly s is the signal matrix of the individual source signals, s_l. A is the ‘mixing matrix’ consisting of the weights a_{kl}.
Equation 3 is also known as the instantaneous mixing model and is often referred to as the classical ‘Cocktail Party Problem’ (Hyvärinen & Oja 2001; Hyvärinen 1999). The challenge is to estimate the mixing matrix, A, and its (pseudo)inverse, the de-mixing matrix, W,

W = A^{-1}   (4)

given the observations contained in x without any additional prior knowledge of either A or s, or, for some methods, without restrictions on A and s.

Several algorithms have been proposed to find the linear transformation of equation 3. Amongst these are principal component analysis (PCA) (Pearson 1901; Manly 1994; Jolliffe 2002; Press et al. 2007; Oja 1992), factor analysis (FA) (Jolliffe 2002; Harman 1967), projection pursuit (PP) (Friedman 1987; Huber 1985) and the more recently developed independent component analysis (ICA) (Comon 1994; Hyvärinen 1999; Hyvärinen et al. 1999; Hyvärinen & Oja 2000, 2001; Comon & Jutten 2000; Stone 2004).

The underlying difference between PCA and FA on one hand and ICA and PP on the other lies in the assumptions on the probability distribution functions (pdfs) of the signals comprising x. The former group assumes the signals to follow 1) a Gaussian distribution, whilst the latter assumes the signals to be 2) predominantly non-Gaussian, or sparse with specific signatures in the spectral domain (e.g. SMICA, Delabrouille et al. 2003). This results in significant differences in the way we estimate our signal components.

1) If the observed signals comprising x follow Gaussian distributions, we can define their statistics by the first and second statistical moments (mean and covariance) only. Algorithms such as PCA and FA find a linear transformation from the correlated observed signals, x, to a set of uncorrelated signals, s. In this case, uncorrelatedness is equivalent to mutual independence and the source signals are at their most separated (please see Appendix A for a more in-depth discussion of independence).
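The instantaneous mixing model of equation 3 can be sketched in a few lines of numpy. The source shapes, the mixing-matrix values and the random seed below are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)

# Three independent source signals s_l(t) (illustrative choices).
s = np.vstack([
    np.sin(2 * np.pi * 5 * t),       # s_1: sinusoid
    2 * ((3 * t) % 1.0) - 1.0,       # s_2: sawtooth
    rng.laplace(size=t.size),        # s_3: sparse, non-Gaussian noise
])

# Random square mixing matrix A; each observation x_k is a weighted sum of sources.
A = rng.uniform(0.5, 2.0, size=(3, 3))
x = A @ s                            # equation 3: x = A s

# The de-mixing matrix W is the inverse of A (equation 4). In a blind
# setting A is unknown and W must be estimated; here A is known, so
# applying W recovers the sources exactly.
W = np.linalg.inv(A)
s_recovered = W @ x
```

With A known this inversion is trivial; the rest of the section is about estimating W when only x is available.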
Such a linear transformation is always possible and easily achieved using, for example, a singular value decomposition (SVD) (Pearson 1901; Manly 1994; Jolliffe 2002; Press et al. 2007). An application of PCA to exoplanetary light curve de-trending is given by Thatte et al. (2010).

2) In the case of the observed signals following non-Gaussian distributions, significant information is contained in the higher statistical moments (skew & kurtosis) and it can be shown that uncorrelated signals (as produced by PCA & FA) are not necessarily mutually independent and hence not optimally separated from one another. Here uncorrelatedness is a weaker constraint than independence: independent signals are always uncorrelated, but not vice versa (see Appendix A). As a consequence, using PCA or FA algorithms may only yield a partially separated result for non-Gaussian sources.

It is important to note that most observed signals, astrophysical or stellar/instrumental noise, are predominantly non-Gaussian by nature. We can also state that most of these signals should be statistically independent from one another (e.g. an exoplanet light curve signal is independent of the systematic noise of the instrument with which it was recorded). These properties have led to a surge in ICA based analysis methods in current cosmology and extra-galactic astronomy. Here ICA is used to separate the cosmic microwave background (CMB) or signatures of distant galaxies from their galactic foregrounds (e.g. Stivoli et al. 2006; Maino et al. 2002, 2007; Wang et al. 2010). Aumont & Macías-Pérez (2007) furthermore separate instrumental noise from the desired astrophysical signal. Other applications include data-compression of sparse, large data sets to improve model fitting efficiencies (e.g. 3u et al. 2006; Delabrouille et al. 2003).
In this publication, we focus on the application of ICA to exoplanetary lightcurve analysis. Let us consider multiple time series observations of the same exoplanetary eclipse signal, either in parallel, by performing spectrophotometry with a spectrograph, or consecutive in time (as explained in section 5.3).

Without excluding the most general case, let us focus on a time-resolved spectroscopic measurement of an exoplanetary eclipse. For most observations, the signal recorded is a mixture of astrophysical signal, Gaussian (white) noise and systematic noise components originating from instrumental defects and other sources such as stellar activity and telluric fluctuations. We can therefore write the individual time series as the sum of the desired astrophysical signal, s_a, systematic (non-Gaussian) noise components, s_{sn}, and Gaussian noise, s_{wn}. We can now define the underlying linear model of our time series data to be

x(t) = a_1 s_a(t) + a_2 s_{sn,1}(t) + a_3 s_{sn,2}(t) + ... + s_{wn}(t)   (5)

or

x_k = a_{k1} s_a + Σ_{l=1}^{N_{sn}} a_{kl} s_{sn,l} + Σ_{l=1}^{N_{wn}} a_{kl} s_{wn,l}   (6)

where N_{sn} and N_{wn} are the numbers of systematic and white noise components respectively, and N = N_{sn} + N_{wn} + 1, assuming only one component is astrophysical.

The basic assumptions of ICA are that the elements comprising s, the s_l, are mutually independent random variables with probability densities p_l(s_l). We further assume that all (or at least one) of the probability densities, p_l(·), are non-Gaussian. This non-Gaussianity is key since it allows the de-mixing matrix, W, to be estimated. From the central limit theorem, we know that a convolution of any arbitrary probability distribution functions (pdfs) that feature a formal mean and variance asymptotically approaches a Gaussian distribution in the limit of large N (Riley et al. 2002). In other words, the sum of any two non-Gaussian random variables (with pdfs p_l(·) and p_{l+1}(·)) is more Gaussian than the respective original variables.
Therefore, by maximising the non-Gaussianity of the individual signals, we maximise their statistical independence (Comon 1994; Hyvärinen 1999; Hyvärinen & Oja 2000, 2001; Koldovský et al. 2006; Comon & Jutten 2000; Stone 2004).

Although several measures of non-Gaussianity exist (we refer the reader to Cichocki & Amari (2002), Hyvärinen & Oja (2000, 2001) and Comon & Jutten (2000) for detailed summaries), we here use the concept of ‘negentropy’ (Brillouin 1953). Negentropy is derived from the basic information-theoretical concept of differential entropy. In information theory, in close analogy to thermodynamics, the entropy of a system is at its maximum when all data points are at their most random. In thermodynamics we measure the distribution of particles; in information theory it is the probability distribution of a random variable. From information theory we can derive the fundamental result that a Gaussian distribution has the largest entropy among all random variables of equal variance and known mean. Hence, by minimising the entropy of a variable, we maximise its non-Gaussianity. For a random vector y, with random variables y_i, i = 1, ..., n, the entropy is given by

H(y) = − ∫ p(y) log p(y) dy   (7)

where H(y) is the differential or Shannon entropy (Shannon 1948) and p(y) is the pdf of the random vector y. H(y) is at its minimum when p(y) is at its most non-Gaussian. We can now normalise equation 7 to yield the definition of negentropy:

J(y) = H(y_gauss) − H(y)   (8)

where y_gauss is a random Gaussian vector with the same covariance matrix as y. Now y is at its most non-Gaussian when J(y) is at its maximum. It is important to note that negentropy is insensitive to multiplication by a scalar constant. This has the important consequence of introducing a sign and scaling ambiguity into the retrieved signal components of s. We will discuss the implications of this limitation in section 4.
In practice it is very difficult to calculate the negentropy of a system, and various methods have been devised to approximate J(y). The classic method is to measure the kurtosis of mean-subtracted y with unit variance. However, kurtosis is very prone to distortions by outliers in the data and hence lacks the robustness required of a measure of negentropy (Hyvärinen & Oja 2001). To overcome this limitation, more robust measures of negentropy have been devised. One can approximate negentropy by equation 9 (Hyvärinen & Oja 2001; Hyvärinen 1999; Comon & Jutten 2000; Stone 2004)

J(y) ∝ (E[G(y)] − E[G(ν)])^2   (9)

where ν is a random Gaussian variable with zero mean and unit variance and G is a non-quadratic contrast function chosen to approximate the underlying probability distribution. There are usually three types of contrast functions: G_1 as a general purpose function, G_2 optimised for super-Gaussian (leptokurtic) distributions and G_3 optimised for sub-Gaussian (platykurtic) distributions (Hyvärinen 1999; Hyvärinen & Oja 2001; Comon & Jutten 2000):

G_1(y) = (1/a_1) log[cosh(a_1 y)]   (10)
G_2(y) = −exp(−y^2/2)
G_3(y) = (1/4) y^4

The choice of contrast function is only important if one wants to optimise the performance of the ICA algorithm, as is done for the EFICA (Koldovský et al. 2006) algorithm, where choices of contrast functions are tried iteratively.
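As a rough illustration of equation 9, the three contrast functions can be coded directly. The choice a_1 = 1, the sample sizes, and the Monte Carlo estimate of E[G(ν)] are assumptions of this sketch:

```python
import numpy as np

def G1(y, a1=1.0):
    # general-purpose contrast: (1/a1) log cosh(a1 y)
    return np.log(np.cosh(a1 * y)) / a1

def G2(y):
    # suited to super-Gaussian (leptokurtic) distributions
    return -np.exp(-0.5 * y ** 2)

def G3(y):
    # kurtosis-based, suited to sub-Gaussian (platykurtic) distributions
    return 0.25 * y ** 4

def negentropy_approx(y, G=G1, n_mc=200_000, seed=1):
    """J(y) ~ (E[G(y)] - E[G(nu)])^2 for standardised y (equation 9)."""
    y = (y - y.mean()) / y.std()
    # nu: reference Gaussian with zero mean and unit variance (MC estimate)
    nu = np.random.default_rng(seed).standard_normal(n_mc)
    return (G(y).mean() - G(nu).mean()) ** 2

rng = np.random.default_rng(0)
gauss = rng.standard_normal(50_000)
laplace = rng.laplace(size=50_000)   # heavy-tailed, i.e. strongly non-Gaussian
```

For the Gaussian sample the approximation is close to zero (only sampling noise survives), while the Laplacian sample yields a clearly larger value, reflecting its non-Gaussianity.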
Finally, we can maximise the negentropy given in equation 9 by finding a unit vector w that maximises the non-Gaussianity of the projection y = w^T x, so that J(w^T x) is at its maximum. For the fixed-point FastICA algorithm, this can be achieved by the simple iteration scheme (Hyvärinen 1999; Hyvärinen & Oja 2000):

1. Choose an initial (e.g. random) weight vector w

2. Increment: w+ = E[x g(w^T x)] − E[g′(w^T x)] w

3. Normalise: w = w+ / ||w+||

4. Repeat steps 2 & 3 until converged

where g and g′ are the first and second derivatives of the contrast function G(·). The iteration scheme above estimates only one weight vector at a time; for estimating all non-Gaussian components in parallel, the iteration scheme and derivatives of G(·) are given in appendix C.1 and in the standard literature (e.g. Hyvärinen 1999; Hyvärinen & Oja 2001; Comon & Jutten 2000; Koldovský et al. 2006). For a comprehensive summary of other ICA algorithms please refer to Comon & Jutten (2000). Throughout this paper, we use a variant of the FastICA algorithm.

Projection Pursuit and Independent Component Analysis are inherently linked, as both algorithms try to represent the most non-Gaussian components in a multidimensional data set. In this sense, one can think of ICA as a variant of PP with two major differences: 1) PP only estimates one non-Gaussian component at a time, whilst ICA is the multivariate extension of PP and estimates all non-Gaussian components simultaneously; 2) as opposed to ICA, PP does not need an underlying data model and no assumptions about independent components are made. If the ICA model holds, optimising the ICA non-Gaussianity measures produces independent components; if the model does not hold, then what we get are the projection pursuit directions (Hyvärinen & Oja 2000; Stone 2004). This is an important point to make with regards to time-consecutive transit observations, which would otherwise break the underlying ICA assumptions.
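The four-step fixed-point iteration above can be sketched as follows, assuming the input has already been whitened and using g = tanh and g′ = 1 − tanh², the derivatives of the log-cosh contrast G_1 with a_1 = 1. The toy two-source mixture in the usage example is illustrative:

```python
import numpy as np

def fastica_one_unit(x, max_iter=200, tol=1e-8, seed=0):
    """One-unit fixed-point FastICA on whitened data x (n_signals x n_samples)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(x.shape[0])          # step 1: random initial weight vector
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ x                               # projection w^T x
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        # step 2: w+ = E[x g(w^T x)] - E[g'(w^T x)] w
        w_new = (x * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)           # step 3: normalise
        if abs(abs(w_new @ w) - 1.0) < tol:      # step 4: converged (up to sign)
            return w_new
        w = w_new
    return w

# Usage: one non-Gaussian (uniform) and one Gaussian source, instantaneously mixed.
rng = np.random.default_rng(1)
s = np.vstack([rng.uniform(-1, 1, 20_000), rng.standard_normal(20_000)])
x = np.array([[2.0, 1.0], [1.0, 1.5]]) @ s
x = x - x.mean(axis=1, keepdims=True)
# whiten via an eigendecomposition of the covariance matrix
d, E = np.linalg.eigh(np.cov(x))
x_w = (E / np.sqrt(d)) @ E.T @ x
w = fastica_one_unit(x_w)
y = w @ x_w    # correlates strongly (up to sign) with the uniform source s[0]
```

Because negentropy is blind to sign and scale, the recovered component matches the source only up to those ambiguities, exactly as noted in the text.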
3. The algorithm
Following from the discussion above, we can understand the signal de-mixing to be a two step process: de-correlation of the Gaussian components in the observed data using PCA, followed by the de-correlation of non-Gaussian components using the ICA and WASOBI algorithms. The de-correlation of Gaussian components to form a new, uncorrelated vector set can be understood as a pre-processing step to the ICA procedure. The algorithm proposed here consists of five main parts: 1) pre-processing of the observed data, 2) signal separation, 3) signal reconstruction, 4) lightcurve fitting and 5) post-analysis. Figure 1 lays out the individual processing steps of the algorithm.

Fig. 1.— Flowchart illustrating the algorithm. The input data is first transformed into an orthogonal set using PCA. The latent signals comprising the input data are then separated using the MULTI-COMBI algorithm, which is followed by a signal sorting step. The separated lightcurve and systematic noise components are then fitted to the original data.
Similarly to section 2, the observed signals can be expressed as a k × m dimensional matrix X, where each row constitutes an individual time series, x_k, with m data points. Multiple time series observations are needed to separate the instantaneously mixed non-Gaussian components. The process of identifying statistically independent components is greatly simplified if the input signals to any ICA algorithm have previously been whitened (also referred to as sphering). Whitening is essentially a transformation of our input data matrix X into a mean subtracted, (X − X̄), orthogonal matrix X̃, whose auto-covariance matrix, C_x̃, equals the identity matrix, C_x̃ = E[X̃X̃^T] = I. The instantaneous mixing model for the whitened data is now given by

X̃ = C_x^{-1/2}(X − X̄) = ÃS   (11)

where C_x^{-1/2} is the inverse square root of the covariance matrix C_x, and Ã is the corresponding mixing matrix of X̃. For a more detailed explanation see the Appendix.

This whitening is easily achieved by performing a principal component analysis on the data (see Appendix). This step has two distinct advantages: 1) it reduces the complexity of the estimation from the n^2 free parameters of the un-whitened mixing matrix, A, to the n(n−1)/2 free parameters of the orthogonal matrix Ã (Hyvärinen & Oja 2001); 2) using whitening by principal components, we can reduce the dimensionality of the data set by maintaining only a sub-set of eigenvectors. This reduces possible redundancies of the components comprising the data and prevents the subsequently employed ICA algorithm from over-learning on over-complete sets.

We would also like to note that any type of additional linear signal cleaning or pre-processing step, such as those described by Carter & Winn (2009) and Waldmann et al. (2011), is allowed. Linear data filtering or cleaning can be understood as multiplying equation 3 from the left with a linear transformation B to get BX = BAS. The underlying data model assumed in this paper is hence not affected.
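The whitening step of equation 11 can be sketched via an eigendecomposition of the covariance matrix (an SVD would work equally well); the optional n_keep truncation illustrates the dimensionality reduction mentioned above:

```python
import numpy as np

def pca_whiten(X, n_keep=None):
    """Whiten row-wise time series X (k x m) so the output covariance is I.

    Returns the whitened data and the whitening matrix C^{-1/2} of equation 11.
    Keeping only the n_keep leading eigenvectors reduces dimensionality and
    guards the subsequent ICA step against over-complete sets.
    """
    Xc = X - X.mean(axis=1, keepdims=True)       # mean subtraction (X - X_bar)
    C = np.cov(Xc)                               # k x k auto-covariance matrix
    d, E = np.linalg.eigh(C)                     # eigenvalues in ascending order
    order = np.argsort(d)[::-1]                  # sort by decreasing eigenvalue
    d, E = d[order], E[:, order]
    if n_keep is not None:                       # optional dimensionality reduction
        d, E = d[:n_keep], E[:, :n_keep]
    C_inv_sqrt = (E / np.sqrt(d)).T              # D^{-1/2} E^T
    X_white = C_inv_sqrt @ Xc
    return X_white, C_inv_sqrt

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5)) @ rng.standard_normal((5, 10_000))
Xw, _ = pca_whiten(X)    # covariance of Xw is the identity matrix
```

Any further linear cleaning step B commutes with this picture, as noted above, since BX = BAS leaves the linear data model intact.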
After the observed signals have successfully been whitened (X̃), we estimate the mixing matrix of the whitened signal, Ã, using the MULTI-COMBI algorithm (Tichavský et al. 2006a). MULTI-COMBI comprises two complementary algorithms, EFICA (Koldovský et al. 2006) and WASOBI (Yeredor 2000). EFICA, an asymptotically efficient variant of the FastICA algorithm (Hyvärinen 1999), is designed to separate non-Gaussian, instantaneously mixed signals. WASOBI, on the other hand, is an asymptotically efficient version of the SOBI algorithm (Belouchrani et al. 1997), and is geared towards separating Gaussian auto-regressive (AR) and time-correlated components. It uses second-order statistics and can be understood to be similar to principal component analysis. The use of both algorithms is necessary since a real life data set will always contain a mixture of both non-Gaussian and Gaussian AR processes. For a more in-depth discussion of the algorithms employed here, we refer the interested reader to appendices C.1 & C.2 and the original publications.

The EFICA and WASOBI algorithms can be shown to be asymptotically efficient, i.e. the estimators approach the Cramér-Rao lower bound (Davison 2009). In other words, the algorithms employed here can be shown to converge to the correct solution given the original source signals and in the limit of N → ∞ iterations. In reality the number of iterations is finite, and imperfect convergence results in traces of other sources remaining in the individual signals comprising S. We can hence state that equation 4 becomes

W ≃ A^{-1}   (12)

A measure of this error is the deviation of WA (or W̃Ã for the whitened case) from the identity matrix, obtained by inspecting the variance of its elements (Koldovský et al. 2006; Hyvärinen & Oja 2001). This leads to the concept of the interference-over-signal ratio (ISR) matrix.
The ISR is the standard measure in signal processing of how well a given signal has been transmitted or de-convolved from a mixture of signals. It can be understood as the inverse of the signal-to-noise ratio (SNR): the higher the ISR for a specific signal, the less well it has been separated from the original mixture. In a real case example, A is unknown and the ISR needs to be estimated. Analytic approximations to the ISRs for the EFICA and WASOBI algorithms are found in appendices C.1 & C.2.

Finally, we check the stability of the signal separation by perturbing the input matrix X̃ by a random and known matrix P to give

X̃′ = PX̃ = PÃS   (13)

We now re-run the MULTI-COMBI procedure using X̃′ as input and estimate PÃ. Knowing P, we can work backwards to obtain Ã, as we are dealing with a linear transformation. This step is repeated several times to check the convergence of the algorithm by inspecting the variation of the mean ISR values of each separation attempt and the variations in consecutive estimations of Ã directly.

Once the mixing matrix Ã is estimated, we need to identify which signals are astrophysical, which are white noise and which are systematic noise. This is done in a two step process:

1) We construct the estimated signal matrix, Ŝ, and for its individual components ŝ_l compute the Pearson correlation coefficient between ŝ_l and the first principal component of the PCA decomposition in section 3.1. For medium signal to noise (SNR) observations, the first principal component (PC), i.e. the one with the highest eigenvalue associated with it, will contain the predominant lightcurve shape. As previously discussed, the first PC is not perfectly separated from the systematic signals and hence cannot be used directly for further analysis, but it is good enough to use for lightcurve identification.
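When the true mixing matrix is known, as in a simulation, the ISR can be computed directly from the gain matrix G = WA. The normalisation below (off-diagonal power relative to the diagonal, with sources assumed already aligned in order and scale) is one common convention and an assumption of this sketch; in the blind case A is unknown and the ISR must be estimated analytically, as in the appendices:

```python
import numpy as np

def isr_matrix(W, A):
    """ISR_kl = |G_kl|^2 / |G_kk|^2 with G = W A.

    Off-diagonal entry (k, l) measures the power leaking from source l into
    estimate k, relative to the power of the correctly recovered source k.
    Perfect separation gives zero off-diagonal ISR.
    """
    G = W @ A
    return (G ** 2) / np.diag(G)[:, None] ** 2

rng = np.random.default_rng(0)
A = rng.uniform(0.5, 2.0, size=(3, 3))
# an imperfect de-mixing matrix, as left behind by a finite number of iterations
W = np.linalg.inv(A) + 0.01 * rng.standard_normal((3, 3))
ISR = isr_matrix(W, A)
off_diag = ISR[~np.eye(3, dtype=bool)]   # small residual interference terms
```

The mean of the off-diagonal entries is the kind of summary statistic tracked across the repeated, perturbed separation runs described above.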
The identified lightcurve signal is labeled Ŝ_a.

2) Once the lightcurve signal is identified, we exclude this row from Ŝ and proceed to classify the remaining signals with respect to their non-Gaussianity (i.e. systematic noise sources). Here we use the Ljung-Box portmanteau test (see Appendix and Brockwell & Davis 2006) to test the hypothesis that a given time series is statistically white (i.e. Gaussian). This test was originally designed to check the residuals of auto-regressive moving-average (ARMA) models for significant departures from Gaussianity. It is hence ideally suited to our need to identify which estimated signal components are the desired non-Gaussian ones. The identified non-Gaussian, systematic noise signals are labeled Ŝ_sn and the remaining white noise signals Ŝ_wn, to give

Ŝ_a + Ŝ_sn + Ŝ_wn ≜ Ŝ = W̃X̃   (14)

where Ŝ has dimensions k × l, and Ŝ_a, Ŝ_sn and Ŝ_wn have dimensions of k × 1, k × l_sn and k × l_wn respectively, with l = l_sn + l_wn + 1. The de-mixing matrix is given by W̃ = Ã^{-1}.

As previously mentioned, the components of Ŝ have ambiguities in scaling and sign and can be thought of as similar to the eigenvectors of a principal component analysis with missing eigenvalues. Fortunately there exist two approaches to resolving this degeneracy:

1. In the case of Ŝ_a being well separated as an individual component, we can take Ŝ_a and the de-mixing matrix W̃, and retain only the row containing the astrophysical signal component, forming the row-vector w̃_a. We then reconstruct the original data X̃ using only the separated signal component:

X̃_a = w̃_a^{-1}Ŝ_a = w̃_a^{-1}w̃_a X̃   (15)

where X̃_a is the reconstructed whitened data with all but the astrophysical signal component removed, and w̃_a^{-1} denotes the (pseudo)inverse of the row-vector w̃_a. Using equation 11, we can now calculate the un-whitened matrix X_a:

X_a = Z(X − X̄) + X̄   (16)
Z = w̃_a^{-1}w̃_a   (17)

Hence we can think of Z as a linear, optimal filter for the signal component in X.
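The Ljung-Box statistic used for this whiteness classification can be sketched as below; the number of lags and the significance level are illustrative assumptions:

```python
import numpy as np
from scipy.stats import chi2

def ljung_box_is_white(r, n_lags=20, alpha=0.01):
    """Return True if the series r is consistent with white noise.

    Q = m (m + 2) * sum_{tau=1}^{h} rho_tau^2 / (m - tau), compared against
    a chi-squared distribution with h degrees of freedom.
    """
    r = np.asarray(r, dtype=float)
    m = r.size
    r = r - r.mean()
    denom = np.dot(r, r)
    rho = np.array([np.dot(r[:-tau], r[tau:]) / denom
                    for tau in range(1, n_lags + 1)])
    Q = m * (m + 2) * np.sum(rho ** 2 / (m - np.arange(1, n_lags + 1)))
    return Q < chi2.ppf(1 - alpha, df=n_lags)

rng = np.random.default_rng(0)
white = rng.standard_normal(2000)                        # passes as white
t = np.arange(2000)
correlated = np.sin(2 * np.pi * t / 50) + 0.1 * rng.standard_normal(2000)
# the strongly autocorrelated series is flagged as non-white (systematic)
```

Components rejected by the test are sorted into Ŝ_sn; the rest go to Ŝ_wn.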
Please note that this linear filtering does not impair the scaling information, as this is re-instated in going from Ŝ_a to X_a.

2. In the case of Ŝ_a not being well separated, but other systematic noise components being so, a different, more indirect approach can be used. Here, the systematic noise components Ŝ_sn, which do not contain sign or scaling information, are simultaneously fitted to the time series data (preferably out-of-transit data), x_k. We therefore define the systematic noise model for an individual time series by M_sn,

M_sn = OŜ_sn   (18)

where O is a k × k diagonal scaling matrix of Ŝ_sn, whose elements need to be fitted iteratively as free parameters in the following section.

Having either filtered the data to obtain X_a or constructed the noise model M_sn, we can now fit the original time series, x_k, using the standard analytical lightcurve models (Mandel & Agol 2002; Seager & Mallén-Ornelas 2003) in addition to the diagonal matrix O, if necessary. For the purpose of this paper, which focuses on blind-source separation, we restrict ourselves to demonstrating the feasibility of estimating O and leave only the transit depth as a variable lightcurve parameter. We use the analytic lightcurve model by Mandel & Agol (2002) and a Nelder-Mead minimisation algorithm (Press et al. 2007). For real data applications, we advise the reader to use Markov Chain Monte Carlo methods, or similar, which have become standard in the field of exoplanets and allow the estimation of the posterior probability distributions and their associated errors (Bakos et al. 2007; Burke et al. 2007; Cameron et al. 2007; Ford 2006; Gregory 2011).

Once the model fitting stage has been completed, we are left with the fitting residual, r_k, i.e. r_k = x_k − m_k. Several tests are useful to determine how well the signals have been removed from the original time series, x_k.
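A minimal sketch of the joint fit of the transit depth and a diagonal element of O, using scipy's Nelder-Mead minimiser. A simple boxcar eclipse stands in for the Mandel & Agol (2002) model here, and the synthetic systematic component and noise levels are assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def boxcar_eclipse(t, depth, t1=0.4, t2=0.6):
    """Crude stand-in for an analytic eclipse model (Mandel & Agol 2002)."""
    flux = np.ones_like(t)
    flux[(t >= t1) & (t <= t2)] -= depth
    return flux

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500)
s_sn = np.sin(2 * np.pi * 7 * t)       # one separated systematic component S_sn
# synthetic observation: eclipse + scaled systematic + white noise
x = boxcar_eclipse(t, 0.01) + 0.004 * s_sn + 0.001 * rng.standard_normal(t.size)

def cost(p):
    depth, o = p                                 # o: diagonal element of O
    model = boxcar_eclipse(t, depth) + o * s_sn  # m_k = lightcurve + O S_sn
    return np.sum((x - model) ** 2)

res = minimize(cost, x0=[0.005, 0.0], method="Nelder-Mead")
depth_fit, o_fit = res.x
```

Because Ŝ_sn carries no scale or sign, only its fitted amplitude o is physically meaningful; the transit depth is recovered alongside it in the same minimisation.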
In the case of a full Markov Chain Monte Carlo fitting, the posterior distributions of the fitting parameters may be used to assess the impact of the remaining systematic noise in the data when compared to a simulated data set containing only white noise. Portmanteau tests on individual time series are useful to test for non-white noise signals and allow a measure of the overall performance of the algorithm (Brockwell & Davis 2006). Additionally, we can determine the Kullback-Leibler divergence (Kullback & Leibler 1951) of our residual's probability distribution function (pdf) from an idealised Gaussian case.

For the simulations and real-data examples presented in the following section, we have simply plotted the autocorrelation functions (ACFs) of the residuals obtained, to determine whether, for a given lag, these are within the 3σ confidence limit of the residual being dominated by white noise (Brockwell & Davis 2006; Davison 2009). Here the ACF is given by

ACF(k, τ) = (1/m) Σ_{t=1}^{m−τ} (r_{k,t} − r̄_k)(r_{k,t+τ} − r̄_k)   (19)

with τ = 0, 1, 2, 3, ..., m/2, where m is the number of data points in the time series, τ is the specific lag, and the 3σ confidence intervals are given by ±3/√m.
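The residual ACF test of equation 19 can be sketched as follows; the ACF is normalised here so that ACF(0) = 1, which puts the 3σ white-noise limit at ±3/√m:

```python
import numpy as np

def acf(r, max_lag):
    """Autocorrelation of residuals r for lags 0..max_lag (equation 19 form,
    normalised so that ACF(0) = 1)."""
    r = np.asarray(r, dtype=float)
    r = r - r.mean()
    var = np.dot(r, r) / r.size
    return np.array([np.dot(r[: r.size - tau], r[tau:]) / (r.size * var)
                     for tau in range(max_lag + 1)])

rng = np.random.default_rng(0)
residuals = rng.standard_normal(1000)          # white-noise-dominated residuals
rho = acf(residuals, 100)
bound = 3.0 / np.sqrt(residuals.size)          # 3-sigma white-noise confidence limit
frac_inside = np.mean(np.abs(rho[1:]) < bound)
```

For genuinely white residuals, nearly all lags fall inside the ±3/√m band; lags that repeatedly escape it flag remaining systematic structure.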
4. Simulations
In order to test the behaviour and efficiencyof the algorithm described above, we produced atoy model simulation with five observed signals:1) a secondary eclipse Mandel & Agol (2002)lightcurve; 2) a sinusoidal signal; 3) a sawtoothfunction; 4) a fourth order auto-regressive signalto simulate time-correlated signals; 5) Gaussiannoise with a full width half maximum (FWHM)of 0.01 magnitudes. The premixed signals are dis-played in figure 2. This gives us our signal matrix, S , which needs to be recovered later on. We havethen proceeded to mix the signals in figure 2 us-ing a random mixing matrix, A , to obtain our’observed signals’, X , in figure 3. For the sake ofcomparability we keep the mixing matrix A to bethe same for all simulations.We now subdivide the simulations to illustratethe two possible methods of the signal reconstruc-tion. Method X a using equation 16whilst Method M sn (equa- tion 18) simultaneously with the Mandel & Agol(2002) lightcurve. These two examples demon-strate that both techniques work equally well fora well behaved data set. In this example, we use the ’observed’ signalsin figure 3 as input to the algorithm. We how-ever do not perform a dimensionality reductionusing PCA since we are not dealing with an over-complete set in this example. The results of theseparation are shown in figure 5. Here the topthree, red lightcurves are the estimated system-atic noise components as identified by the algo-rithm. The fourth component is Gaussian noiseand the bottom is an inverse of the lightcurve sig-nal. It should again be noted here that the blind-source separation does not preserve the scaling northe signs of the signals in ˆS . However, when theoriginal data is reconstructed using only the sig-nal component, ˆS a , to obtain X a (equation 16),the scaling and sign informations are re-instated.For a well behaved data set, i.e. 
one that obeys the instantaneous mixing model and has negligible Gaussian noise in its signal components, it is therefore possible to reconstruct the lightcurve signal from the raw data as explained in section 3.3. Figure 4 shows the top lightcurve of figure 3 (blue circles), overplotted with the retrieved signal component (red crosses) and, offset below, the systematic noise component (black squares).

As a useful by-product of the algorithm, we obtain the interference-over-signal matrices (ISR, equations C7 & C11 in Appendix C) for both the EFICA and WASOBI algorithms. These give us valuable information on the efficiency with which the signals have been separated. Figure 6 shows the Hinton diagrams of the EFICA and WASOBI ISR matrices. Here, the smaller the off-diagonal elements of the matrix, the better the signal separation. In this example, the EFICA algorithm outperforms the WASOBI one, which is to be expected since all signals but one are non-Gaussian.
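The toy simulation above can be sketched end to end. This is a minimal stand-in, not the paper's actual pipeline: a simple box-shaped dip replaces the Mandel & Agol (2002) eclipse, the AR(4) coefficients and all amplitudes are illustrative assumptions, and a symmetric FastICA iteration (tanh nonlinearity) substitutes for the EFICA/WASOBI combination.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 2000
t = np.linspace(0.0, 1.0, m)

# Five toy sources (cf. figure 2); a box-shaped dip stands in for
# the Mandel & Agol (2002) eclipse light curve.
eclipse = np.where(np.abs(t - 0.5) < 0.05, -1.0, 0.0)
sine = np.sin(2.0 * np.pi * 7.0 * t)
sawtooth = 2.0 * ((5.0 * t) % 1.0) - 1.0
ar = np.zeros(m)                           # 4th-order auto-regressive signal
for i in range(4, m):
    ar[i] = (0.5 * ar[i-1] - 0.3 * ar[i-2] + 0.2 * ar[i-3]
             - 0.1 * ar[i-4] + rng.normal(0.0, 0.1))
gauss = rng.normal(0.0, 1.0, m)

S = np.vstack([eclipse, sine, sawtooth, ar, gauss])   # signal matrix S
A = rng.normal(size=(5, 5))                           # random mixing matrix A
X = A @ S                                             # 'observed' signals X

# Whiten the observations, then run a symmetric FastICA iteration.
Xc = X - X.mean(axis=1, keepdims=True)
eigval, eigvec = np.linalg.eigh(np.cov(Xc))
Z = (eigvec @ np.diag(eigval ** -0.5) @ eigvec.T) @ Xc

W = rng.normal(size=(5, 5))
for _ in range(200):
    G = np.tanh(W @ Z)
    W_new = (G @ Z.T) / m - np.diag((1.0 - G**2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W_new)
    W = U @ Vt            # symmetric decorrelation: W W^T = I

S_hat = W @ Z   # estimated sources, recovered up to sign, scale and order
```

The strongly non-Gaussian sources (eclipse, sine, sawtooth) come back cleanly; the AR(4) signal and the white noise are both Gaussian processes, so this kurtosis-based iteration cannot separate that pair, which is precisely why the paper pairs EFICA with WASOBI (time-structure based) rather than relying on non-Gaussianity alone.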
In the previous section, we have shown that in the case where the astrophysical component Ŝ_a is well separated as an individual signal, we can create
Fig. 2.— Simulated input signals before mixing. From top to bottom: 1) secondary eclipse Mandel & Agol (2002) curve, 2) sinusoidal function, 3) sawtooth function, 4) time-correlated auto-regressive function, 5) Gaussian noise. The scaling of the ordinate is identical for all subplots.
Fig. 3.— The signals, S, in figure 2 were mixed using a random mixing matrix A to obtain the 'observed signals', X, normalised to unity, shown in this diagram. The algorithm takes the lightcurves in this diagram as starting values. No further input is provided, nor are assumptions on the underlying signals made. The scaling of the ordinate is identical for all subplots.

a filter for the raw data that directly filters the lightcurve signal from the noise. However, in most real data applications, Ŝ_a is not perfectly separated, but the components of Ŝ_sn may be more so. In this case we can construct the noise model M_sn given by equation 18, and the diagonal elements of O are fitted as described in section 3.4. The starting position of the algorithm is the same as for the previous example (figure 3). The model fit of the first lightcurve in figure 3 and its residuals are shown in figure 7. The autocorrelation function for 100 lags is plotted in figure 8. All but two lags are within the 3σ confidence limit that the residual is white-noise dominated, indicating that all signals have been removed effectively.

Finally, we simulate the convergence properties of both EFICA and WASOBI under varying white noise conditions. Here we repeatedly run the algorithm until signal separation is completed and record the mean ISRs of the source separation. We performed this simulation 300 times for Gaussian noise FWHMs varying from 0.0 - 0.3 magnitudes (figure 7 has a FWHM_gauss = 0.01) and every ISR measurement reported is the mean of 10 iterations. Figure 9 summarises the results. Here, the red circles represent the mean ISR of the EFICA algorithm and the blue crosses that of WASOBI. It can clearly be seen that for this example the EFICA algorithm outperforms WASOBI and on average reaches lower ISR values. We can further note that the blind source separation is not significantly affected by the magnitude of the white noise and performs well under difficult signal-to-noise conditions.
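The Method 2 fit described above, estimating the diagonal scaling matrix O of equation 18 with a downhill simplex, can be sketched as follows. Everything here is synthetic and illustrative: the noise components, the true scalings, and the out-of-transit mask are invented for the example, and SciPy's Nelder-Mead implementation stands in for the paper's simplex fit.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m = 500
S_sn = rng.normal(size=(3, m))           # estimated systematic-noise components
o_true = np.array([0.7, -1.3, 0.4])      # true (unknown) diagonal scalings
oot = np.arange(m) < 400                 # out-of-transit mask (hypothetical)
# Raw light curve: scaled noise components plus photon noise; for
# simplicity no eclipse is injected, since only the out-of-transit
# portion enters the fit anyway.
x_raw = o_true @ S_sn + rng.normal(0.0, 0.01, m)

def cost(o):
    """Chi-squared of the scaled noise model against the
    out-of-transit portion of the raw light curve."""
    model = o @ S_sn
    return np.sum((x_raw[oot] - model[oot]) ** 2)

res = minimize(cost, x0=np.ones(3), method='Nelder-Mead')
noise_model = res.x @ S_sn               # M_sn, subtracted from the raw data
```

Because the model is linear in the diagonal of O, the simplex fit converges quickly and the recovered scalings match the injected ones to within the photon noise; the de-trended curve is then simply `x_raw - noise_model`.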
5. Application to data
The previous examples illustrated the twomethods available to correct the observed data:
Method 1: filtering the astrophysical signal out of the systematic noise, or
Method 2: fitting a systematic noise model to the raw data. Here we apply these techniques to two primary eclipse data sets of HD 189733b and XO1b recorded by the NICMOS instrument on the Hubble Space Telescope, as well as a single time-correlated time series obtained by the Kepler spacecraft.
Fig. 4.— Results of the blind-source separation. The blue circles present the first lightcurve of the raw data X, the red crosses the retrieved signal component, X_a, and the black squares the systematic noise component X_sn.
Fig. 5.— Results of the blind-source separation. The top three signals in red were identified by the algorithm to comprise the systematic noise model, Ŝ_sn. The 4th signal was correctly identified to be Gaussian noise and the bottom to be the lightcurve signal. Note that the blind-source separation preserves neither signs nor scaling of the estimated signals. The scaling of the ordinate is identical for all subplots.

ISR-EFICA max: 0.17886; ISR-WASOBI max: 0.034883

Fig. 6.— Hinton diagram of the EFICA and WASOBI interference-over-signal matrices for Example 1. The polygon areas are normalised to the highest value in the matrix (given in the bottom corners). The smaller the off-diagonal elements of the matrix, the higher the signal separation efficiency of the algorithm. In this case we can see that the EFICA algorithm performs better than the WASOBI one.
Fig. 7.— showing the raw lightcurve (first row in figure 3, blue) normalised to unity, with the model fit (red) overlaid and the fitting residuals plotted underneath (black).
Fig. 8.— showing the auto-correlation function for 100 lags (red). The 3σ confidence limits that the observed residual is normally distributed are shown in blue. All but two lags are within the confidence limits, strongly suggesting that the residual is dominated by white noise and correlations were efficiently removed.
Fig. 9.— showing the mean interference-over-signal ratios (ISRs) for both the EFICA (red circles) and WASOBI (blue crosses) algorithms for Example 1. In this example, the EFICA algorithm clearly outperforms WASOBI by reaching lower ISR values. Both algorithms are stable even under low signal-to-noise conditions.
First presented by Swain et al. (2008), this data set of the primary eclipse of HD 189733b was recorded using HST/NICMOS in the G206 grism setting spanning five consecutive orbits. The HST-pipeline calibrated data were downloaded from the MAST archive and the spectrum was extracted using both standard IRAF routines as well as a custom-built routine for optimal spectral extraction. Both extractions are in good accord with each other, but the custom-built routine was found to yield a better signal to noise and was subsequently used for all further analysis. A binning of 10 spectral channels ( ∼ µm) was used, resulting in 10 light curves across the G206 grism band. Figure 10 shows the obtained time series which serve as input to the algorithm. It can be seen that each time series is strongly affected by instrument systematics propagating from the blue side of the spectrum (bottom light curve) to the red with varying intensity and even sign. Swain et al. (2008) showed that these systematics are correlated to instrument state vectors such as orbital phase, relative positions and angles of the spectrum on the detector, instrument temperature, etc. We can hence expect that these systematics are statistically independent from the recorded astrophysical signal (the light curve) and it should therefore in principle be possible to de-correlate the signal.

http://archive.stsci.edu/
http://iraf.noao.edu/

We here demonstrate the de-trending on an individual light curve at ∼ µm (8th one down in figure 13). All time series in figure 13 were taken as input to the algorithm described above to estimate the de-mixing matrix W̃, the astrophysical signal vectors, Ŝ_a, and the systematic noise vectors, Ŝ_sn. The interference-over-signal (ISR) matrix indicated the good separation of four main components (figure 11), with the rest of the components being classified as predominantly Gaussian or weakly systematic. The existence of more than one Gaussian component (l_wn > 1) indicates that the set is overcomplete. However, since the data set is small enough, no PCA dimensionality reduction was performed. After the algorithm had identified the correct astrophysical signal, it proceeded to reconstruct the light curve using both methods.

Method 1: The astrophysical signal was filtered using equations 15 & 16. Figure 12 shows the raw light curve (blue circles) with the de-trended time series, X_a, underneath (green squares). Superimposed light curves were computed using Mandel & Agol (2002) with orbital parameters taken from Winn et al. (2007) and limb-darkening parameters from Claret (2000). It is clear that the de-trended light curve is an improvement on the raw time series but that systematics still remain in the data. This is further illustrated by plotting the autocorrelation function of the model-fit residual in figure 16 (red circles). Here, residual correlation can be observed, in particular at low lags. This is a consequence of the astrophysical signal, Ŝ_a, being well separated but, as shown in figure 11 (component 1), there remains some weak interference between Ŝ_a and other vectors, which is a consequence of equation 12 and to be expected for real data sets.

Method 2: The second method is a less direct approach. Instead of filtering for the astrophysical signal directly, we try to construct a 'systematic noise model' that is then subtracted off the raw data. Using equation 18 and a simplex downhill algorithm (Nelder & Mead 1965) we estimated the scaling matrix, O, by fitting the systematic noise vectors, Ŝ_sn, to the four out-of-transit orbits. The scaled systematic noise vectors are shown in figure 14, which combine to form the systematic noise model, M_sn, in figure 13. It should be noted that O is only a scaling matrix of the individual vectors, as the scaling information is not preserved by the independent component analysis. Hence, relative intra- and inter-orbit variations are preserved. Figure 15 shows the corrected data obtained by subtracting the systematic noise model off the raw data. Inspecting the fitting residual's autocorrelation function in figure 16 (black circles) indicates the residual to be statistically white and that a maximal de-correlation of the data has been achieved.

Originally presented by Tinetti et al. (2010), the primary eclipse of XO1b was observed using the HST/NICMOS instrument in the G141 grism setting. The HST-pipeline calibrated data were downloaded and the spectra extracted using the same settings as for section 5.1. This yielded 10

Fig. 10.— showing 'raw', extracted HST/NICMOS light-curves of HD 189733b primary eclipse. Light curves are offset for clarity.
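The Method 1 filtering step can be sketched numerically. Equations 15 & 16 are not reproduced in this excerpt, so the sketch below uses the common ICA reconstruction: zero out the noise rows of the estimated source matrix and map the remainder back through the estimated mixing matrix, which restores sign and scale. Everything is synthetic; for clarity the de-mixing matrix W is taken as the exact inverse of the true mixing matrix (standing in for the ICA estimate), and the astrophysical component is assumed to sit at index 0.

```python
import numpy as np

m = 1000
t = np.linspace(0.0, 1.0, m)
s_astro = np.where(np.abs(t - 0.5) < 0.05, -1.0, 0.0)  # toy eclipse signal
s_sys = np.sin(np.linspace(0.0, 40.0, m))              # one systematic component
S = np.vstack([s_astro, s_sys])
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S                     # observed spectral channels

W = np.linalg.inv(A)          # stands in for the ICA-estimated de-mixing matrix
S_hat = W @ X                 # estimated sources (here: exact, up to ordering)
A_hat = np.linalg.inv(W)      # estimated mixing matrix

# Keep only the astrophysical row (index 0, assumed identified) and map
# back into observation space: sign and scale are re-instated.
S_a = np.zeros_like(S_hat)
S_a[0] = S_hat[0]
X_a = A_hat @ S_a             # de-trended channels
```

Each row of `X_a` now contains only the eclipse contribution at that channel's own amplitude, which is why this route, unlike inspecting `S_hat` directly, does not suffer from the sign/scale ambiguity of blind source separation.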
ISR-EFICA max: 0.062571; ISR-WASOBI max: 0.036442

Fig. 11.— the interference-over-signal (ISR) matrix of the component separation for both the EFICA and the WASOBI algorithms. All values were normalised with the maximum ISR = 0.0626. Components 1, 3, 5 & 8 yield the lowest ISR values and correspond to the astrophysical light curve signal (comp. 1) and the three most prominent systematic noise vectors in figure 14. Other components were identified as predominantly Gaussian or weakly systematic by the pipeline.

Fig. 12.— showing the raw-data light curve (blue crosses) and the corrected light curve (green squares) offset below. In this example, we used equations 15 & 16 as light curve filter. The systematic noise components were reduced but residual systematics remain in the final light curve.
Fig. 13.— showing the same 'raw' light curve as in Figure 12 (blue crosses) and the calculated systematic noise model (red circles) offset below.

Fig. 14.— Individual systematic noise vectors, Ŝ_sn, of HD 189733b, with the appropriate scaling. Combined they form the systematic noise model in figure 13 (red circles).

Fig. 15.— showing the de-trended data obtained by subtracting the noise model from the raw data.

light curves which serve as input to the algorithm, see figure 17. Similarly to HD 189733b, the algorithm retrieved four main components: the light curve signal and three main systematic noise components. The ISR matrix is shown in figure 18. We then proceeded to de-trend the light curve at the very red end of the spectrum (first from top in figure 17) as, after visual inspection, it exhibits the most prominent systematics of the 10 time series. Light curve fits assumed limb-darkening and orbital parameters by Burke et al. (2010).
Method 1: Figure 19 shows the raw time series and the de-trended light curve using equation 16. The light curve is significantly de-trended but systematics remain in the data, as also shown by the autocorrelation function (red circles) in figure 23.
Method 2: As described in previous sections, figure 21 shows the retrieved systematic noise vectors and figure 22 features the 'raw' data with the combined systematic noise model (red) underneath. The autocorrelation function of the model fit residual is shown in figure 23 (black crosses) and shows a factor 2 improvement in the de-correlation at the lower lags.

Figure 20 compares the de-trended light curves of method 1 and method 2 and shows the residual of method 1 - method 2 (black crosses). There is little difference between both methods, indicating that the signal separation for this data set is close to its maximum, with the data being partially de-correlated. This is in contrast to the HD 189733b example, where a near perfect de-correlation was achieved, and can be attributed to the systematics being mostly wavelength invariant in the case of XO1b. In other words, systematic noise components which have a constant weighting throughout the data set cannot be de-correlated using ICA or PCA methods, which is to be expected following equations 1 - 3.

In the previous examples we have shown that spectroscopic data sets can be de-correlated effectively. We test here how well the proposed algorithm can de-correlate consecutively observed data sets. This is of particular interest in cases where no multi-channel data are available and the time series data are contaminated with time-correlated noise, be it stellar or instrumental.

Fig. 16.— showing the autocorrelation function for 100 lags of the fitting residual in figure 12 (red squares) and figure 15 (black circles). The blue lines signify 3σ limits for a Gaussian distribution. The fitting residual of figure 12 shows high amounts of residual correlation, particularly at lower lags, whilst the fitting residual of figure 15 follows a Gaussian distribution.

Fig. 17.— showing 'raw', extracted HST/NICMOS light-curves of XO1b primary eclipse. Light curves are offset for clarity.

ISR-EFICA max: 0.02956; ISR-WASOBI max: 0.089858

Fig. 18.— same as for figure 11. The light curve vector (component 1) shows residual interference with other vectors for both EFICA and WASOBI algorithms. Overall the EFICA algorithm outperforms WASOBI.

Fig. 19.— showing the raw-data light curve (blue crosses) and the corrected light curve (green squares) offset below. In this example, we used equations 15 & 16 as light curve filter. The systematic noise components were reduced but residual systematics remain in the final light curve.

Fig. 20.— showing the de-trended data using method 1 (top blue circles) and method 2 (bottom green squares) offset from each other. Both results show little difference between them, as seen in the residual of method 1 - method 2 (black crosses).

Fig. 21.— Individual systematic noise vectors, Ŝ_sn, of XO1b, with the appropriate scaling. Combined they form the systematic noise model in figure 22 (red circles).

Fig. 22.— showing the same 'raw' light curve as in Figure 19 (blue crosses) and the calculated systematic noise model using the systematic noise vectors in figure 21.

Fig. 23.— showing the autocorrelation function for 100 lags of the fitting residual for method 1 (red squares) and method 2 (black circles). The blue lines signify 3σ limits for a Gaussian distribution. The fitting residual of method 1 shows residual correlation, particularly at lower lags, whilst the fitting residual of method 2 is by a factor of two better de-correlated in the lower lags.
As opposed to the previous examples, where several time series, x_k, were observed simultaneously, here we take a single time series covering several consecutive eclipse features and cut the time series into segments spanning equal lengths over each eclipse event. Using these segments as inputs to the algorithm clearly violates the underlying assumptions of the independent component analysis, as the mixing is not instantaneous. In this case, the ICA analysis can be understood as a Projection Pursuit (PP) analysis, see section 2.2.4 and (Hyvärinen & Oja 2000; Stone 2004; Huber 1985). Here the ICA algorithm, in the absence of a working ICA data-model, will try to extract as many non-Gaussian components as possible and return the rest of the data in its original form. This is very similar to Projection Pursuit, where the data is not described by an underlying data model at all but only the most non-Gaussian component is retrieved. In other words, we can only expect to retrieve the eclipse signal component, Ŝ_a, with any degree of accuracy. As a result we will not be able to retrieve systematic noise components, Ŝ_sn, and we can only use Method 1.

We identified four main periodically recurring signals in the data set. Choosing the second strongest feature with a period of 0.040915 days, we phase folded the data and cut the time series into 10 equally sized segments. As for the previous examples, we then took these time series segments as input to the algorithm. We performed our de-correlation as for the previous examples but using Method 1 only. Figure 25 shows the ISR matrix of the separation, indicating

http://nsted.ipac.caltech.edu/applications/ETSS/Kepler_index.html
relatively poor separation of all components but two (components 4 & 9). As discussed above, this behaviour is to be expected with the breaking of the instantaneous mixing model. Nonetheless, we obtained a clear feature (component 4) in our analysis, which is overplotted (red circles) on the mean, phase-folded data (blue crosses) in figure 26. Here the de-correlated signal has a much reduced scatter compared to the mean of the phase-folded feature, which indicates that much of the unwanted stellar variability has been removed. It is also clear from this figure that we are not dealing with an exoplanetary light curve but a stellar pulsation feature. As expected, the remaining components returned from the algorithm (figure 27) are the residuals of the input data minus the component shown in figure 26. Hence we only used the component in figure 26 to re-construct the original time series. This was done by using equation 16 on each segment of the time series, followed by adding the segments back together in the order they were originally split up.

Figure 24 shows the original input data (blue crosses) with the filtered signal (red circles) overplotted on top. In the bottom plot (a zoomed-in version of the time series), it is clear that the desired feature remains in the filtered time series whilst the contribution of other time-correlated stellar noise is substantially reduced.
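The segmentation step above, turning a single time series into the multi-channel input the algorithm expects, can be sketched as follows. The period value comes from the text; the sampling rate, noise level and dip shape are illustrative assumptions, and a simple phase-folded mean is shown for comparison rather than the ICA filtering itself.

```python
import numpy as np

rng = np.random.default_rng(7)
period = 0.040915            # days; the recurring feature's period (from the text)
dt = period / 100            # sampling cadence: 100 points per period (assumption)
n_seg = 10                   # number of recurrences used, as in the text
n = n_seg * 100
t = np.arange(n) * dt

# Single time series: a recurring dip plus white noise (toy stand-in
# for the Kepler light curve with stellar variability).
phase = (t / period) % 1.0
signal = -0.01 * (np.abs(phase - 0.5) < 0.1)
series = 1.0 + signal + rng.normal(0.0, 0.002, n)

# Cut into n_seg equally sized segments, one recurrence per row; these
# rows play the role of the 'observed channels' X fed to the algorithm.
segments = series.reshape(n_seg, -1)
folded_mean = segments.mean(axis=0)   # simple phase-folded average for comparison
```

Each row now covers exactly one recurrence at the same phase grid, so stacking them mimics simultaneous observations; the paper's point is that ICA applied to such rows behaves like Projection Pursuit and still pulls out the most non-Gaussian (eclipse-like) component.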
6. Discussion
In the previous sections we have shown that for a set of simultaneously observed time series data (e.g. following an exoplanetary eclipse with a spectrograph) we can describe the data by an instantaneous mixing model (equation 3). This allows the separation of non-Gaussian, time- and spatially-correlated signals from one another. The degeneracy caused by not being able to retrieve the components' signs or amplitudes can be circumvented in two ways:
Method 1) The separated signals are used to construct a linear transformation to filter the astrophysical signal from the originally observed data and hence preserve all scaling information;
Method 2) The separated astrophysical signal is not used directly but instead all systematic noise components are combined to form a 'systematic noise model' which can then be used to correct the original observed data.
Fig. 24.— Input time series (blue crosses) with the filtered signal using Method 1 overplotted (red circles).
ISR-EFICA max: 0.092932; ISR-WASOBI max: 0.042835

Fig. 25.— the interference-over-signal (ISR) matrix of the component separation for both the EFICA and the WASOBI algorithms. All values were normalised with the maximum ISR = 0.09293. Components 4 and 9 are the best separated, with component 4 being the desired signal component.

Fig. 26.— showing the mean, phase-folded feature (blue crosses) with the ICA-filtered signal component (red circles) overplotted. The ICA-filtered signal shows a significant reduction in scatter and auto-correlative noise compared to the simply phase-folded data.
Fig. 27.— Additional components to the signal in figure 26 as calculated by the algorithm.

We have explored the efficiency of the signal de-trending on two simulated and two HST/NICMOS data sets with different types of systematic noise due to different grisms. The simulations demonstrate the two methods of de-trending the data in an idealised case and explore the efficiency of the signal separation in the presence of varying Gaussian noise in the data. In the instantaneous mixing model employed here, Gaussian noise sources are only indirectly allowed and can interfere with the effectiveness of separating non-Gaussian vectors. We tested this point by adding additional Gaussian noise components of variable amplitude to the simulations but did not observe any significant reductions in the signal separation efficiency.

We proceeded to analyse two HST/NICMOS data sets: the primary eclipses of HD 189733b and XO1b. For both data sets we find the
Method 2 to yield better results. In the case of HD 189733b, we can achieve a near perfect de-correlation of astrophysical signal and systematic noise, and no further steps in the de-correlation process are necessary. A more in-depth discussion of this data set and of HST/NICMOS systematics is beyond the scope of this publication. In the case of XO1b the de-correlation is significant but incomplete. The difference in maximum de-correlation achievable can be attributed to the systematic noise sources being strong functions of wavelength in the case of HD 189733b, whilst having almost constant weighting (a_kl in equation 2) in the case of XO1b.

Whenever systematics have constant weighting per channel observed (x_k) and/or time, it becomes very difficult for PCA- or ICA-based approaches to de-correlate the signal from the systematics. Here auxiliary information of the instrument is needed to proceed with the de-correlation process. This is very well possible in the case of dedicated instruments such as Kepler, as these have been specifically designed with such high precision and stability measurements in mind (Borucki et al. 1996; Jenkins et al. 2010). For instruments that do not feature the calibration plan required to further de-correlate with instrument state parameters, the solution is far less obvious.

We furthermore explored the de-correlation of eclipse signals observed consecutively rather than in parallel. We demonstrated, using Kepler data, that despite the formal violation of the 'instantaneous mixing model', the proposed algorithm is able to retrieve the desired signal component with good accuracy. Such an application is particularly important for treating variability of the host star, which can significantly impair the quality of the final science result (e.g. Czesla et al. 2009; Boisse et al. 2011; Aigrain et al. 2011; Ballerini et al. 2011).

It is furthermore interesting to note that pre- and post-processing steps (e.g. wavelets (Carter & Winn 2009), Fourier based techniques (Waldmann et al.
2011), de-correlation using instrument state parameters (Swain et al. 2008)) do not break the instantaneous mixing model and can be run in conjunction with ICA methods. This makes independent component analysis a very powerful and versatile tool for non-parametric de-correlation of exoplanetary data sets.
7. Conclusion
In the light of searching for and characterising ever smaller and fainter exoplanetary targets, the development of novel de-trending routines becomes increasingly critical. Based on the concepts of blind source deconvolution of instantaneously mixed signals, we have presented a first step towards non-parametric corrections and data filters that do not require additional information on the systematic noise of the instrument or stellar activity. Such algorithms have two important applications:

1) For instruments that lack a calibration plan at the accuracy of 10 − in flux variation, which is required for spectroscopy of exoplanetary atmospheres, the spectroscopic signatures become inherently entangled and dependent on the method used to correct instrument and other systematics in the data. The de-correlation of spectroscopic data was demonstrated using two HST/NICMOS data sets.

2) Detections of faint exoplanetary eclipses are often made difficult by time-correlated activity of the host star. We demonstrated, using a single Kepler time series, that much of the stellar variability can be removed in time series that span several exoplanetary eclipse events.

The algorithm proposed is a powerful tool for lightcurve de-trending, which can be used on its own or in conjunction with any other type of data filtering or cleaning technique. This becomes an invaluable advantage for data analysis when the instrument's response function is unknown or poorly characterised.

I.P.W. would like to thank Prof. E. Feigelson, the referee, Dr. G. Tinetti, Dr. F. Abdalla and Dr. S. Fossey for comments and suggestions that helped to greatly improve this paper. I.P.W. is supported by an STFC Studentship.

REFERENCES
Aigrain, S., Pont, F., Zucker, S., 2011, arXiv:1110.1034
Agol, E., Cowan, N. B., Knutson, H. A., Deming, D., et al., 2010, ApJ, 721, 1861
Aumont, J. & Macías-Pérez, J. F., 2007, MNRAS, 376, 739
Bakos, G. Á., Shporer, A., Pál, A., Torres, G., Kovács, G., et al., 2007, ApJL, 671, L173
Ballerini, P., Micela, G., Lanza, A. F., Pagano, I., 2011, A&A, submitted
Bean, J. L., 2011, Nature, 478, 41
Bean, J. L., Désert, J.-M., Kabath, P., et al., 2011, ApJ, 743, 92
Beaulieu, J. P., Carey, S., Ribas, I. & Tinetti, G., 2008, ApJ, 677, 1343
Beaulieu, J. P., Kipping, D. M., Batista, V., Tinetti, G., Ribas, I., Carey, S., et al., 2010, arXiv:0909.0185
Belouchrani, A., Abed-Meraim, K., Cardoso, J. F., Moulines, E., 1997, IEEE Trans. Signal Processing, 45, 434
Boisse, I., Bouchy, F., Hébrard, G., et al., 2011, A&A, 528, A4
Borucki, W. J., Dunham, E. W., Koch, D. G., Cochran, W. D., et al., 1996, Astrophys. & Space Science, 241, 111
Borucki, W. J., Koch, D., Basri, G., et al., 2010, Science, 327, 977
Borucki, W. J., Koch, D. G., Basri, G., et al., 2011, ApJ, 736, 19
Brillouin, L., 1953, J. of Applied Physics, 24, 1152
Brockwell, P. J., Davis, R. A., 2006, 'Time Series: Theory and Methods (second edition)', Springer Verlag, ISBN: 1-4419-0319-8
Bruntt, H., Deleuil, M., Fridlund, M., Alonso, R., et al., 2010, A&A, 519, A51
Burke, C. J., McCullough, P. R., Valenti, J. A., Johns-Krull, C. M., Janes, K. A., et al., 2007, ApJ, 671, 2115
Burke, C. J., McCullough, P. R., Bergeron, L. E., et al., 2010, ApJ, 719, 1796
Caldwell, D. A., Kolodziejczak, J. J., Van Cleve, J., et al., 2010, ApJL, 713, L92
Carter, J. A., Winn, J. N., 2009, ApJ, 704, 51
Collier Cameron, A., Wilson, D. M., West, R. G., Hebb, L., et al., 2007, MNRAS, 380, 1230
Charbonneau, D., Brown, T. M., Latham, D. W., Mayor, M., 2000, ApJ, 529, L45
Charbonneau, D., Brown, T. M., Noyes, R. W. & Gilliland, R. L., 2002, ApJ, 568, 377
Charbonneau, D., Allen, L. E., Megeath, S. T., Torres, G., Alonso, R., Brown, T. M., et al., 2005, ApJ, 626, 523
Charbonneau, D., Knutson, H. A., Barman, T., Allen, L. E., Mayor, M., et al., 2008, ApJ, 686, 1341
Cichocki, A., Amari, S., 2002, 'Adaptive Blind Signal and Image Processing', John Wiley & Sons Inc., ISBN: 0471607916
Claret, A., 2000, A&A, 363, 1081
Comon, P., 1994, Signal Processing, 36, 287
Comon, P. & Jutten, C., 2000, 'Handbook of Blind Source Separation: Independent Component Analysis and Applications', Academic Press
Czesla, S., Huber, K. F., Wolter, U., Schröter, S., Schmitt, J. H. M. M., 2009, A&A, 505, 1277
Davison, A. C., 2009, 'Statistical Models', Cambridge University Press, ISBN: 9780521734493
Delabrouille, J., Cardoso, J.-F. & Patanchon, G., 2003, MNRAS, 346, 1089
Deming, D., Richardson, L. J. & Harrington, J., 2007, MNRAS, 378, 148
Doron, E., Yeredor, A., 2004, Proc. of ICA 2004, 390
Ford, E. B., 2006, ApJ, 642, 505
Friedman, J. H., 1987, J. of the American Statistical Association, 82, 249
Gibson, N. P., Pont, F. & Aigrain, S., 2011, MNRAS, 411, 2199
Gibson, N. P., Aigrain, S., Roberts, S., et al., 2011, arXiv:1109.3251
Gillon, M., Lanotte, A. A., Barman, T., Miller, N., Demory, B.-O., et al., 2010, A&A, 511, 3
Gregory, P. C., 2011, MNRAS, 410, 94
Grillmair, C. J., Burrows, A., Charbonneau, D., Armus, L., Stauffer, J., Meadows, V., van Cleve, J., von Braun, K. & Levine, D., 2008, Nature, 456, 767
Harman, H. H., 1967, 'Modern Factor Analysis', University of Chicago Press, 2nd edition
Hatzes, A. P., Fridlund, M., Nachmani, G., Mazeh, T., et al., 2010, arXiv:1105.3372v1
Huber, P. J., 1985, The Annals of Statistics, 13, 435
Hyvärinen, A., 1999, IEEE Trans. on Neural Networks, 10(3), 626
Hyvärinen, A., Pajunen, P., 1999, Neural Networks, 12, 429
Hyvärinen, A., Oja, E., 2000, Neural Networks, 13, 411
Hyvärinen, A., Karhunen, J., Oja, E., 2001, 'Independent Component Analysis', John Wiley & Sons Inc., ISBN: 0-471-40540-X
Jenkins, J. M., Caldwell, D. A., Chandrasekaran, et al., 2010, ApJL, 713, L87
Jolliffe, I. T., 2002, Springer Verlag New York Inc., ISBN: 0-387-95442-2
Knutson, H. A., Charbonneau, D., Allen, L. E., Fortney, J. J., et al., 2007, Nature, 447, 183
Knutson, H. A., Charbonneau, D., Noyes, R. W., Brown, T. M., Gilliland, R. L., 2007, ApJ, 655, 564
Knutson, H. A., Madhusudhan, N., Cowan, N. B., Christiansen, J. L., Agol, E., Deming, D., et al., 2011, arXiv:1104.2901
Koch, D. G., Borucki, W. J., Basri, G., et al., 2010, ApJL, 713, L79
Koldovský, Z., Tichavský, P., Oja, E., IEEE Trans. on Neural Networks, 17(5), 1265
Koldovský, Z., Tichavský, P., Oja, E., 2005, Proc. of IEEE/SP 13th Workshop on Stat. Signal Processing
Kullback, S., Leibler, R. A., 1951, Annals of Mathematical Statistics, 22, 79
Lagarias, J. C., Reeds, J. A., Wright, M. H. & Wright, P. E., 1998, SIAM Journal of Optimization, 9(1), 112
Lu, H., Zhou, H., Wang, J., et al., 2006, ApJ, 131, 790
Maino, D., Farusi, A., Baccigalupi, C., et al., 2002, MNRAS, 334, 53
Maino, D., Donzelli, S., Banday, A. J., Stivoli, F., Baccigalupi, C., 2007, MNRAS, 374, 1207
Mandel, K. & Agol, E., 2002, ApJL, 580, L171
Manly, B. J. F., 1994, 'Multivariate Statistical Methods - A Primer', 2nd Ed., Chapman & Hall
Marcy, G. W., Butler, R. P., Vogt, S. S., Fischer, D., Lissauer, J. J., 1998, ApJL, 505, 147
Mayor, M., Queloz, D., 1995, Nature, 378, 355
Nadaraya, E. A., 1964, Prob. & its Applic., 10, 186
Nelder, J. A., Mead, R., 1965, Computer Journal, 7, 308
Oja, E., 1992, Neural Networks, 5, 927
Pearson, K., 1901, Phil. Mag., 2, 559
Pont, F., Aigrain, S. & Zucker, S., 2010, MNRAS, 411, 1953
Press, W. H., Teukolsky, S. A., Vetterling, W. T., Flannery, B. P., 2007, 'Numerical Recipes', Cambridge Uni. Press, ISBN: 978-0-521-88407-5
Redfield, S., Endl, M., Cochran, W. D. & Koesterke, L., 2008, ApJL, 673, L87
Riley, K. F., Hobson, M. P., Bence, S. J., 2002, 'Mathematical Methods for Physics and Engineering', Cambridge Uni. Press, ISBN: 0-521-89067-5
Schneider, J., Dedieu, C., Le Sidaner, P., Savalle, R., Zolotukhin, I., 2011, A&A, 532, 79
Seager, S. & Mallén-Ornelas, G., 2003, ApJ, 585, 1038
Shannon, C., 1948, Bell System Tech. J., 27, 379
Sing, D. K., Pont, F., Aigrain, S., et al., 2011, MNRAS, 416, 1443
Snellen, I. A. G., Albrecht, S., de Mooij, E. J. W. & Le Poole, R. S., 2008, A&A, 487, 357
Snellen, I. A. G., de Kok, R. J., de Mooij, E. J. W. & Albrecht, S., 2010a, Nature, 465, 1049
Snellen, I. A. G., de Mooij, E. J. W. & Burrows, A., 2010b, A&A, 513, 76
Stevenson, K. B., Harrington, J., Nymeyer, S., Madhusudhan, N., Seager, S., et al., 2010, Nature, 464, 1161
Stivoli, F., Baccigalupi, C., Maino, D., Stomper, R., 2006, MNRAS, 372, 615
Stone, J. V., 2004, 'Independent Component Analysis: A Tutorial Introduction', A Bradford Book
Swain, M. R., Vasisht, G. & Tinetti, G., 2008, Nature, 452, 329
Swain, M. R., Vasisht, G., Tinetti, G., Bouwman, J., Chen, P., et al., 2009, ApJL, 690, L114
Swain, M. R., Tinetti, G., Vasisht, G., Deroo, P., Griffith, C., et al., 2009, ApJ, 704, 1616
Tichavský, P., Doron, E., Yeredor, A., Gomez-Herrero, G., 2006, Proc. EUSIPCO-2006
Tichavský, P., Doron, E., Yeredor, A., Nielsen, J., 2006, Proc. EUSIPCO-2006
Tinetti, G., Vidal-Madjar, A., Liang, M.-C., Beaulieu, J.-P., Yung, Y., et al., 2007, Nature, 448, 169
Tinetti, G., Deroo, P., Swain, M. R., Griffith, C. A., Vasisht, G., et al., 2010, ApJL, 712, L139
Thatte, A., Deroo, P. & Swain, M. R., 2010, A&A, 523, 35
Waldmann, I. P., Drossart, P., Tinetti, G., Griffith, C. A., Swain, M., Deroo, P., ApJ, in press
Wang, J., Xu, H., Gu, J., An, T., 2010, arXiv:1008.3391v1
Watson, G. S., 1964, Sankhyā Series A, 26, 359
Winn, J. N., Holman, M. J., Henry, G. W., et al., 2007, AJ, 133, 1828
Yeredor, A., 2000, IEEE Sig. Proc. Letters, 7, 197
This 2-column preprint was prepared with the AAS LaTeX macros v5.2.
A. Uncorrelatedness, orthogonality and independence of Gaussian and non-Gaussian signals
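Before the formal definitions below, a small numerical sketch (not from the paper; all values illustrative) of the distinction this appendix draws: two signals can be exactly uncorrelated yet strongly dependent, because uncorrelatedness constrains only second-order moments while independence constrains all of them.

```python
import numpy as np

# Illustrative sketch: s2 is a deterministic function of s1, yet their
# covariance vanishes. Zero covariance (uncorrelatedness) is therefore a
# strictly weaker property than full statistical independence.
rng = np.random.default_rng(0)
s1 = rng.uniform(-1.0, 1.0, 200_000)   # symmetric, zero-mean signal
s2 = s1**2 - np.mean(s1**2)            # dependent on s1, re-centred to zero mean

# Sample covariance: E[s1 s2] - E[s1] E[s2] -> 0 (odd moments vanish)
cov = np.mean(s1 * s2) - np.mean(s1) * np.mean(s2)

# Independence would require E[g(s1) h(s2)] = E[g(s1)] E[h(s2)] for all
# (absolutely integrable) g, h; it fails here already for g(x) = h(x) = x**2.
lhs = np.mean(s1**2 * s2**2)
rhs = np.mean(s1**2) * np.mean(s2**2)

uncorrelated = bool(abs(cov) < 5e-3)   # True: covariance vanishes
independent = bool(abs(lhs - rhs) < 5e-3)  # False: higher moments do not factorise
```

This is exactly why ICA must go beyond the covariance matrix: for non-Gaussian signals, second-order statistics alone cannot establish independence.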
In Gaussian statistics, our probability densities are fully defined by the first and second statistical moments, i.e. their means and covariances. Two random vectors, s_l and s_{l+1}, are said to be uncorrelated when their covariance (C_{s_l, s_{l+1}}) is zero:

C_{s_l, s_{l+1}} = E[(s_l - E[s_l])(s_{l+1} - E[s_{l+1}])] = E[s_l s_{l+1}] - E[s_l] E[s_{l+1}] = 0  (A1)

where E[s_l] is the expectation value of s_l, which can in this case be approximated by the mean,

E[s_l] ≈ (1/M) Σ_{t=1}^{M} s_l(t)  (A2)

with M being the number of data points in the time series.

Furthermore, we define two random variables (s_l and s_{l+1}) to be orthogonal when both their expectation values, in addition to their covariance, are zero:

E[s_l] = E[s_{l+1}] = C_{s_l, s_{l+1}} = 0  (A3)

We can always find an affine, linear transformation from a correlated set of variables to an orthogonal one. Finally, our two random vectors s_l and s_{l+1} are independent from one another if and only if the joint probability distribution P(s_l, s_{l+1}) of both signals is factorisable into the product of their marginal pdfs, P(s_l) and P(s_{l+1}):

P(s_l, s_{l+1}) = P(s_l) P(s_{l+1})  (A4)

and they satisfy the property

E[g(s_l) h(s_{l+1})] = E[g(s_l)] E[h(s_{l+1})]  (A5)

where g(s_l) and h(s_{l+1}) are absolutely integrable functions of s_l and s_{l+1} respectively. From the definition of independence in equation A5, we recover the definition of uncorrelatedness (equation A1) in the special case where g and h are linear and the signals are defined by their covariances alone (i.e. no higher-order statistical moments) (Hyvärinen & Oja 2000; Hyvärinen et al. 2001; Riley et al. 2002). In other words, uncorrelatedness is a special case of independence. Uncorrelated Gaussian random variables are always also independent, and the definitions of uncorrelatedness, orthogonality (for zero mean) and statistical independence become identical.

B. Preprocessing
The covariance matrix of X, C_x, is given by C_x = E D E^T, where E is the matrix of eigenvectors and D the diagonal matrix of eigenvalues, D = diag(d_1, d_2, ..., d_n). Using principal component analysis (PCA), we compute E and D; the whitening matrix, the inverse square root of the covariance matrix, C_x^{-1/2}, is then given by equation B2 (Hyvärinen et al. 2001; Jolliffe 2002).

X̃ = C_x^{-1/2} (X - X̄) = Ã S  (B1)

C_x^{-1/2} = E D^{-1/2} E^T  (B2)

where W̃ ≜ Ã^{-1} is the de-mixing matrix of the whitened observed signals X̃.

C. Blind source separation
At the heart of the algorithm lies the blind source separation routine. To attain the de-mixing matrix W̃, many different types of algorithms are used in the literature. Here we use the 'Multi-COMBI' algorithm developed by Tichavský et al. (2006a), which combines a fixed-point, higher-order ICA algorithm for separating non-Gaussian sources with a second-order-statistics blind source separation (BSS) algorithm for separating auto-regressive (AR) sources. We briefly outline these algorithms and explain how they are applied to the whitened data X̃ obtained in section 3.1.

C.1. Parallel FastICA and EFICA
In section 2.2 we briefly outlined the measure of non-Gaussianity, negentropy (equation 8), used throughout this paper and stated that negentropy can be approximated via the kurtosis of the random vector y or via the use of contrast functions, equations 9 & 10. In section 2.2.3 we stated the iteration scheme to obtain a single independent component (IC) at a time. This is called a deflationary algorithm, where the computed IC is subtracted from the data before the second IC is computed. Such an iteration scheme has the property of finding the ICs in order of decreasing non-Gaussianity. However, the main drawback of a serial computation of ICs is that estimation errors in the first ICs propagate into the extraction of later ICs via the orthogonalisation step. This effect is cumulative and may significantly impair weaker ICs. This predicament can be circumvented by estimating all ICs in parallel.

Similar to the single-unit iteration, the whitened de-mixing matrix W̃ is at its most mutually independent when the projection Y = W̃^T X̃ is at its most non-Gaussian. The FastICA fixed-point iteration step is then given by

W̃^+ ← g(W̃X̃) X̃^T - diag[g′(W̃X̃) 1_N] W̃  (C1)

where W̃^+ is the unnormalised next iterate of W̃, 1_N is an N x 1 vector of ones and g(.) and g′(.) are the first and second derivatives of the nonlinear function G(.), with the common choices

g(y) = tanh(a y)  (C2)
g(y) = y exp(-y^2/2)
g(y) = y^3

This is followed by a symmetric orthogonalisation step:

W̃ ← (W̃^+ (W̃^+)^T)^{-1/2} W̃^+  (C3)

Equations C1 & C3 are iterated until the result has converged. For a full derivation we refer the reader to Hyvärinen (1999) and Hyvärinen et al. (2001). Whereas the convergence of the FastICA algorithm often depends on the non-linearity chosen by the user, the EFICA (Koldovský et al. 2006) algorithm employed here is a variant of the above iteration scheme that allows different non-linearities to be assigned adaptively to different sources. Koldovský et al.
(2006) showed that EFICA is asymptotically efficient, i.e. reaches the Cramér-Rao Lower Bound (CRLB) in the ideal case where the nonlinearity G(.) equals the score function.

To assess the degree of separation, we define the gain matrix G. For a perfectly estimated de-mixing matrix, W, the gain matrix is equal to the identity matrix:

G = WA = I  (C4)

In signal processing, the performance of blind source separation algorithms is usually measured by the interference-over-signal ratio matrix, ISR:

ISR_{kl} = G^2_{kl} / G^2_{kk},  k, l = 1, 2, ..., d  (C5)

where k and l denote the observed and estimated sources. The ISR for an individual observed signal k is given by

isr_k = Σ_{l=1, l≠k}^{d} G^2_{kl} / G^2_{kk},  k = 1, 2, ..., d  (C6)

However, the original mixing matrix, A, is not generally known for real data sets and equations C5 & C6 are only useful in the case of simulations. Tichavský et al. (2006a) have shown that the whole ISR matrix for the EFICA algorithm can be approximated by

ISR^{EF}_{kl} ≃ (1/N) γ_k (γ_l + τ^2_l) / [τ^2_l γ_k + τ^2_k (γ_l + τ^2_l)]  (C7)

γ_k = β_k - μ^2_k  (C8)
μ_k = E[ŝ_k g_k(ŝ_k)]
τ_k = |μ_k - ρ_k|
ρ_k = E[g′_k(ŝ_k)]
β_k = E[g^2_k(ŝ_k)]

where ŝ_k and ŝ_l are the k'th and l'th estimated signals of S in equation 3, g_k(.) and g′_k(.) are the first and second derivatives of G(.) for signal k and N is the number of signals estimated. It should be mentioned that, of course, the true realisation of each ISR component is unknown and a mean-ISR is computed, leading to the best 'on average' separation of the signals.
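As a concrete illustration of the machinery above, the following self-contained sketch runs the symmetric FastICA iteration of equations C1 & C3 on a toy two-source mixture and evaluates the separation with the gain-matrix measures of equations C4–C6. The sources, mixing matrix, sample size and iteration count are illustrative choices, not values from the paper, and the code is a minimal demonstration rather than EFICA itself (no adaptive choice of non-linearity).

```python
import numpy as np

# Toy mixture: two non-Gaussian sources (illustrative, not from the paper)
rng = np.random.default_rng(2)
M = 20000
S = np.vstack([np.sign(rng.standard_normal(M)) * rng.uniform(0.5, 1.5, M),
               rng.uniform(-1.0, 1.0, M)])
A = np.array([[1.0, 0.6], [0.4, 1.0]])        # 'unknown' mixing matrix
X = A @ S

# Whitening, as in Appendix B: X_tilde = C_x^{-1/2} (X - X_bar)
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(Xc))
V = E @ np.diag(d**-0.5) @ E.T                # whitening matrix C_x^{-1/2}
Xw = V @ Xc

def sym_orth(W):
    # symmetric orthogonalisation, W <- (W W^T)^{-1/2} W   (eq. C3)
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(vals**-0.5) @ vecs.T @ W

W = sym_orth(rng.standard_normal((2, 2)))
for _ in range(200):                          # fixed-point iteration (eq. C1)
    Y = W @ Xw
    gY = np.tanh(Y)                           # g(y) = tanh(a y) with a = 1
    W_plus = (gY @ Xw.T) / M - np.diag((1.0 - gY**2).mean(axis=1)) @ W
    W = sym_orth(W_plus)

# Gain matrix (eq. C4); ICA recovers sources only up to permutation, sign
# and scale, so reorder rows to put dominant entries on the diagonal first.
G = (W @ V) @ A
G = G[np.argsort(np.argmax(np.abs(G), axis=1))]
ISR = G**2 / np.diag(G)[:, None] ** 2         # ISR_kl = G_kl^2 / G_kk^2 (eq. C5)
isr = ISR.sum(axis=1) - np.diag(ISR)          # per-signal ISR over l != k (eq. C6)
```

For a well-separated mixture the off-diagonal ISR entries are small. In a simulation like this, where A is known, equations C5 & C6 can be evaluated directly; for real data one falls back on the approximation of equation C7.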
C.2. WASOBI
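The second-order statistics exploited in this section can be sketched numerically before the formal outline: for mixtures of independent AR sources, the lagged correlation matrices R_x[τ] share the single joint diagonaliser Ã across all lags (equation C10 below), while the sources' own lagged correlation matrices are diagonal up to sampling noise. The AR coefficients, mixing matrix and lags here are illustrative only, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50_000
e = rng.standard_normal((2, N))
S = np.zeros((2, N))
for n in range(1, N):                          # two independent AR(1) sources
    S[0, n] = 0.9 * S[0, n - 1] + e[0, n]
    S[1, n] = -0.5 * S[1, n - 1] + e[1, n]
A = np.array([[1.0, 0.7], [0.2, 1.0]])         # illustrative mixing matrix
X = A @ S

def lagged_corr(x, tau):
    # sample lagged correlation matrix, (1/N) sum_n x[n] x^T[n + tau]
    n = x.shape[1] - tau
    return (x[:, :n] @ x[:, tau:tau + n].T) / n

# R_x[tau] = A R_s[tau] A^T holds at every lag (eq. C10)
mix_err = max(np.abs(lagged_corr(X, t) - A @ lagged_corr(S, t) @ A.T).max()
              for t in (0, 1, 2, 3))

# The sources' correlation matrices are diagonal up to sampling noise;
# joint diagonalisation recovers this structure blindly from R_x alone.
offdiag = max(np.abs(lagged_corr(S, t) * (1.0 - np.eye(2))).max()
              for t in (0, 1, 2, 3))
```

Because the two AR poles differ, the diagonal entries of R_s[τ] decay at different rates with τ, which is precisely the time structure that makes the joint diagonaliser Ã identifiable even though both sources are Gaussian.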
Whilst EFICA is optimised for the separation of instantaneously mixed, non-Gaussian sources, second-order-statistics BSS algorithms rely on time structure in the sources' correlation functions to estimate W̃. A variety of algorithms exist in the literature; here we use a derivative of the popular SOBI algorithm (Belouchrani et al. 1997), WASOBI (Yeredor 2000; Tichavský et al. 2006a), to separate Gaussian auto-regressive (AR) sources in the input data X̃. The blind source separation follows the same linear model as in equation 3 and the mixing matrix Ã is estimated by a joint diagonalisation of the signals' autocorrelation matrices. The unknown correlation matrices of the observed signals for a given lag τ, R_x[τ], are

R_x[τ] ≜ (1/N) Σ_{n=1}^{N} x[n] x^T[n+τ],  τ = 0, ..., M-1  (C9)

R_x[τ] = Ã R_s[τ] Ã^T,  ∀τ  (C10)

where R_s[τ] ≜ E[s[n] s^T[n+τ]] are the source signals' diagonalised correlation matrices (Yeredor 2000). Hence, if the correlation matrices are diagonal, i.e. the off-diagonal components are zero, the separated signals can be said to be independent from each other. The SOBI & WASOBI algorithms estimate Ã as the joint diagonaliser of a set of correlation matrices. Similar to the EFICA code, we can define an asymptotic estimate of the ISR matrix
ISR^{WA}_{kl} ≃ (1/N) [φ_{kl} / (1 - φ_{kl} φ_{lk})] [σ^2_k R_l[0] / (σ^2_l R_k[0])]  (C11)

φ_{kl} ≜ (1/σ^2_k) Σ_{i,j=0}^{M-1} a_{il} a_{jl} R_k[i-j]  (C12)

where k and l denote the observed and the estimated sources, {R_k[τ]}_{τ=0}^{M-1} is the covariance sequence of the k-th source, σ^2_k is the variance of the source and {a_{il}}_{i=0}^{M-1} are the auto-regression coefficients of the l-th source (Tichavský et al. 2006a).

C.3. Multi-COMBI
The algorithms introduced above are highly complementary. Whilst EFICA is asymptotically efficient in separating non-Gaussian instantaneous mixtures, WASOBI is asymptotically efficient in separating Gaussian time-correlated signals. Both properties are necessary, since a real data set will generally contain components of both kinds, which would hence not be optimally de-mixed if only one type of algorithm were employed. Multi-COMBI (Tichavský et al. 2006a) uses a clustering technique in which both algorithms are run on the set of unseparated sources X̃ and their interference-over-signal matrices, ISR^{EF} and ISR^{WA}, are estimated. The signals are then clustered depending on whether their specific ISR_{kl} is lower for the EFICA or the WASOBI case. The process is repeated until all clusters are singletons, i.e. contain only one signal per cluster, and the signals are hence optimally separated.

D. Convergence check
From the Multi-COMBI algorithm we obtain the estimated signal matrix Ŝ, an overall ISR matrix, as well as the final ISR^{EF} and ISR^{WA}. Since the algorithms used here employ fixed-point convergence techniques, the problem of non-repeatability of the separation process is less severe than for neural-network-based approaches. Nonetheless, it is common sense to check the stability of the result obtained and to estimate the error on Ŝ.

In order to estimate the stability of the convergence, we perturb the unknown mixing matrix A with a random but known mixing matrix P to give a new mixing matrix A_2 = PA, and equation 3 becomes X_2 = PAS = A_2 S. This is equivalent to multiplying the whitened signal X̃ with P:

X̃_2 = P X̃ = P C_x^{-1/2} (X - X̄) = Ã_2 S  (D1)

We re-run the separation step and estimate Ã_2. Since P is known, we can reconstruct the original mixing matrix and compare it with the new result. In the scope of an automated algorithm, the sum of all terms of ISR_A is compared with the sum of ISR_{A_2} and the result is reported.

To assess the stochastic nature of the retrieval, we furthermore re-run the separation step with the same whitened signal, X̃, akin to a Monte Carlo simulation. We perform i realisations (where i = 10-100 typically) and use the de-mixing matrices W̃_i to construct mean noise models later on. This way, we propagate the signal-separation error to the model fitting in a coherent manner.

E. Signal separation
In order to identify the non-white (i.e. systematic) signals in our estimated signal matrix Ŝ, we use the Ljung-Box portmanteau test (Brockwell & Davis 2006). The test statistic, usually denoted by Q, is defined by summing the normalised autocorrelations of an individual time series, ŝ_l, over a range of lags:

Q = n(n+2) Σ_{τ=1}^{m} ρ̂^2_τ / (n - τ)  (E1)

where ρ̂_τ is the autocorrelation at lag τ, n is the number of observations in the time series and m is the number of lags tested. The hypothesis that the time series is solely white noise is rejected if Q exceeds a pre-specified quantile of the chi-squared distribution,

Q > χ^2_{1-α,h}  (E2)

where χ^2_{1-α,h} is the (1-α)-quantile of the chi-squared distribution with h degrees of freedom (Brockwell & Davis 2006). Here we take α = 0..
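A minimal sketch of the test in practice (the series, lag count and seed are illustrative, not from the paper): for white noise Q stays near the chi-squared expectation, while for a time-correlated component Q lands far out in the tail and the whiteness hypothesis is rejected.

```python
import numpy as np

def ljung_box_Q(x, m):
    """Ljung-Box Q statistic (eq. E1) over the first m lags of series x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = float(np.dot(x, x))
    # normalised autocorrelations rho_hat_tau for tau = 1, ..., m
    rho = np.array([np.dot(x[:-tau], x[tau:]) / denom for tau in range(1, m + 1)])
    return n * (n + 2) * float(np.sum(rho**2 / (n - np.arange(1, m + 1))))

rng = np.random.default_rng(5)
n = 2000
white = rng.standard_normal(n)                 # pure white noise
ar = np.zeros(n)                               # strongly autocorrelated AR(1) signal
for t in range(1, n):
    ar[t] = 0.8 * ar[t - 1] + rng.standard_normal()

Q_white = ljung_box_Q(white, 20)               # ~ chi-squared distributed, E[Q] ~ 20
Q_ar = ljung_box_Q(ar, 20)                     # far out in the chi-squared tail
```

With h = 20 degrees of freedom and, say, α = 0.05, the rejection threshold is χ^2_{0.95,20} ≈ 31.4; the AR series above lands orders of magnitude beyond it and is flagged as non-white.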