cissa(): A MATLAB Function for Signal Extraction
ccissa() : A
MATLAB
Function for Signal Extraction ∗ Juan B´ogalo a, † , Pilar Poncela a,b , Eva Senra c a Universidad Aut´onoma de Madrid, Spain b IBiDat, Univ. Carlos III de Madrid, Spain c Universidad de Alcal´a, Alcal´a de Henares, Spain
February 4, 2021
Abstract cissa() is a
MATLAB function for signal extraction by Circulant Singular Spec-trum Analysis, a procedure proposed in [2]. cissa() extracts the underlying signalsin a time series identifying their frequency of oscillation in an automated way, byjust introducing the data and the window length. This solution can be applied tostationary as well as to non-stationary and non-linear time series. Additionally, inthis paper, we solve some technical issues regarding the beginning and end of sampledata points. We also introduce novel criteria in order to reconstruct the underlyingsignals grouping some of the extracted components. The output of cissa() is theinput of the function group() to reconstruct the desired signals by further groupingthe extracted components. group() allows a novel user to create standard signalsby automated grouping options while an expert user can decide on the number ofgroups and their composition. To illustrate its versatility and performance in sev-eral fields we include 3 examples: an AM-FM synthetic signal, an example of thephysical world given by a voiced speech signal and an economic time series. Pos-sible applications include de-noising, de-seasonalizing, de-trending and extractingbusiness cycles, among others.
Keywords:
Circulant SSA, AM-FM signal,
MATLAB ∗ Financial support from the Spanish government, the contract grants MINECO/FEDER ECO2016-76818-C3-3-P, PID2019-107161GB-C32 and PID2019-108079GB-C22/AEI/10.13039/501100011033 areacknowledged. † Corresponding author. E-mail address: [email protected] (J. B´ogalo), [email protected] (P. Poncela), [email protected] (E. Senra). a r X i v : . [ s t a t . C O ] F e b Introduction
Signal extraction of the main features of a noisy time series is a key topic in manydisciplines. The interest in the extracted signals can range from just de-noising the timeseries, to de-seasonalizing or de-trending, to finding main oscillations or to the isolationof a group of frequencies of relevance to the analyst. Expert analysis, as well as theautomated and reliable processing of a large quantity of time series in real time is inhigh demand. Within the physical world, signal processing is used in voice or patternrecognition. In economics, Statistical Offices regularly produce de-seasonalized time seriesand estimations of business cycles widely used by analysts to assess, and forecast, themomentum of the economy. Circulant Singular Spectrum Analysis, CiSSA, is a non-parametric signal extraction procedure, which is able to reconstruct the original timeseries as the sum of the orthogonal components of known frequencies. CiSSA allows both,a detailed analysis of the frequency characterization of the data, as well as an automatedapplication in those cases where the frequencies of interest are known before hand.CiSSA, [2], relies on Singular Spectrum Analysis (SSA), a widely used signal extrac-tion alternative originating from the works of [3] and [5]. At the same time, but in andindependent way, research carried out in the former URSS proposed the ’Caterpillar’method, a variant of SSA, see [4]. Different SSA variants work in two stages: decom-position and reconstruction. In the decomposition stage, the original time series dataare transformed into a related trajectory matrix made of pieces of length L (the windowlength) and perform its singular value decomposition obtaining the so-called elementarymatrices. In the reconstruction stage, the elementary matrices are grouped and time seriesrecovered. The original proposal is known as Basic SSA and the method used to performthe singular value decomposition introduces the different alternative versions of SSA. Oneof the alternatives, Toeplitz SSA, developed by [10], computes the required eigenvectorsby diagonalizing the matrix of second moments associated with the trajectory matrix.Traditionally, the identification of the frequencies associated with the elementary matri-ces is made after the time series have been reconstructed. CiSSA, [2], substitutes thevariance-covariance matrix of the Toeplitz version by a related circulant matrix. CiSSAhas an immediate advantage because the properties of circulant matrices allow for theexact identification between the eigenvectors and eigenvalues obtained in terms of thefrequency of the components they represent.Regarding the software to perform SSA, [7] propose an R package to compute BasicSSA in univariate time series, while [8] extend it to multivariate time series. Both contri-butions are based on the software of the Gistat Group, Ltd. ( ).In this paper, we present several contributions not included in [2]. Firstly, we dis-cuss the reconstruction of the underlying signals both at the beginning and the end ofthe sample proposing several alternatives that best suit different applications. Secondly,2e introduce several grouping alternatives for different signal extraction problems thatrecognize automated, semi-automated and expert user options. Finally, we provide two MATLAB functions to perform CiSSA: cissa() executes the algorithm after introducingthe original data and the window length, L , producing as the outcome the exact de-composition into components of identified frequency and the power spectral density ofthe original time series; and group() allows the clustering of the identified componentsaccording to several degrees of automation that go from expert user to fully automatedbased on the needs of the analyst.The structure of the paper is as follows: In section 2, we introduce the CiSSA algo-rithm. In section 3, we describe its extensions. In section 4, we present the cissa() and group() MATLAB functions used to perform the algorithm and to form the desiredreconstructed signals. In section 5, we illustrate how CiSSA and the
MATLAB functionswork and show their performance in different fields and under alternative options. Finally,in section 6, we present our conclusions.
CiSSA is an algorithm that exactly decomposes the original time series into the sum of aset of oscillatory components at known frequencies. Its main advantage is that, becausethe components are identified by frequency, the user can group them according to theirneeds. Grouping interests can be, for instance, in the form of previously determinedfrequencies, or as a dimension reduction technique paying attention to the selection of acertain amount of explained variability, among others.In this section, we firstly present the main algorithm used to perform the frequencydecomposition and the clustering procedures.Let { x t } be a zero-mean stochastic process, t ∈ T whose realization of length T is given by x = ( x , ..., x T ) (cid:48) , where the prime denotes transpose, and let L be a posi-tive integer, called the window length, such that 1 < L < T /
2. The 2 stages in SSA(decomposition and reconstruction) are carried out in a four 4-step algorithm as follows:
We build a trajectory matrix by putting together laggedpieces of size L from the original series. The L × N trajectory matrix X , N = T − L + 1,is given by: For simplicity, we use the same notation for the stochastic process and for the observed time series.It will be clear from the context if we are referring to the population, or the sample. If it is not, we willexplicitly clarify this in the main text. = x x x ... x N x x x ... x N +1 ... ... ... ... ... x L x L +1 x L +2 ... x T . This step decomposes the trajectory matrix into ele-mentary matrices of rank 1 that will be associated with different frequencies. In order todo so, we find the second order moments of the series, build a related circulant matrix S C and calculate its eigenvalues and eigenvectors as functions of specific frequencies givenby: w k = k − L (1)for k = 1 , ..., L .In order to do so, from the following estimated autocovariances (cid:98) γ m = 1 T − m T − m (cid:88) t =1 x t x t + m , m = 0 , ..., L − , we compute the elements (cid:98) c m of the first row of a circulant matrix. (cid:98) c m = L − mL (cid:98) γ m + mL (cid:98) γ L − m , m = 0 , , ..., L − . Therefore, we can build the circulant matrix given by: S C = (cid:98) c (cid:98) c (cid:98) c ... (cid:98) c L − (cid:98) c L − (cid:98) c (cid:98) c ... (cid:98) c L − ... ... ... ... ... (cid:98) c (cid:98) c (cid:98) c ... (cid:98) c . Based on [9], the eigenvalues and eigenvectors of S C for k = 1 , ..., L are given respec-tively by: (cid:98) λ L,k = L − (cid:88) m =0 (cid:98) c m exp (cid:18) i2 πm k − L (cid:19) (2) u k = L − / ( u k, , ..., u k,L ) H (3)where H indicates the conjugate transpose of a matrix and u k,j = exp (cid:0) − i2 π ( j − k − L (cid:1) .Since the k -th eigenvalue in (2) is an estimate of the spectral density at w k given by(1), k = 1 , ..., L , the k -th eigenvalue and corresponding eigenvector is associated with thisfrequency.As a consequence, the diagonalization of S C allows us to write X as sum of elementary4atrices X k as: X = L (cid:88) k =1 X k = L (cid:88) k =1 u k w (cid:48) k , where w k = X (cid:48) u k . Given (1) and the symmetry of the powerspectral density, in this step, we group the elementary matrices by frequency.The symmetry of the power spectral density leads to (cid:98) λ k = (cid:98) λ L +2 − k , k = 2 . . . M , where M = (cid:98) L +12 (cid:99) . The corresponding eigenvectors given by (3) are complex, therefore, theyare conjugated complex by pairs, u k = u ∗ L +2 − k where v ∗ indicates the complex conjugateof a vector v . Then, u (cid:48) k X and u (cid:48) L +2 − k X correspond to the same harmonic period.To obtain the elementary matrices by frequency, we first form the groups of 2 elements B k = { k, L + 2 − k } for k = 2 , ..., M with B = { } and, B L +1 = (cid:8) L + 1 (cid:9) if L is even.Secondly, we compute the elementary matrix by frequency X B k as the sum of the twoelementary matrices X k and X L +2 − k , associated with eigenvalues (cid:98) λ k and (cid:98) λ L +2 − k andfrequency w k given by (1), k = 1 , ..., M , X B k = X k + X L +2 − k = u k u H k X + u L +2 − k u H L +2 − k X = ( u k u H k + u ∗ k u (cid:48) k ) X = 2( R u k R (cid:48) u k + I u k I (cid:48) u k ) X where R u k denotes the real part of u k and I u k its imaginary part. Notice that the matrices X B k , k = 1 , ..., M, are real.As a by-product, the algorithm also allows us to determine the share of explainedvariability at frequency w k given by sh k = 2 × (cid:98) λ k / (cid:80) (cid:98) λ k for k = 2 , ..., M and sh = (cid:98) λ / (cid:80) (cid:98) λ k for k = 1. Finally, this step transforms the matrices obtained instep 3 into M signals of the same length as the original series for frequencies w k given by(1), k = 1 , ..., M , denominated elementary reconstructed components by frequency.To transform X B k = ( (cid:101) x i,j ), k = 1 , ..., M , into a time series (cid:101) x ( B k ) t , we use the algoritmcalled diagonal averaging derived by [11]. This consists of averaging the elements of X B k over its antidiagonals, i.e., the hankelization of this matrix with the operator H ( · ): (cid:101) x ( B k ) t = H ( X B k ) = t (cid:80) ti =1 (cid:101) x i,t − i +1 , ≤ t < L L (cid:80) Li =1 (cid:101) x i,t − i +1 , L ≤ t ≤ N T − t +1 (cid:80) T − N +1 i = L − N +1 (cid:101) x i,t − i +1 , N < t ≤ T. We have created cissa() , a
MATLAB function, to implement the CiSSA algorithm5hat provides the M elementary reconstructed components by frequency and the powerspectral density of the original time series. We will illustrate how it works in section 4. In this section, we introduce new results with respect to [2] that allow us to answertwo technical issues when implementing the algorithm. The first is related to the end ofsample data points, a topic usually discussed in other signal extraction techniques but notin SSA until now. The second deals with further grouping of the estimated componentsin different contexts. This second issue has been approached in SSA in different ways andis still an open topic within this methodology. Because one of the advantages of CiSSArelies on matching principal components with frequencies, we can use this result to formgroups with the desired frequencies. Therefore, CiSSA can be used to extract signals atdesired frequencies, i.e., acting as a band-pass filter. We have also introduced some newalternatives in order to form the groups, either in an expert user, semi-automated or fullyautomated way.
Notice that the number of times that a data point appears in the trajectory matrix dependson its proximity to the left upper, and right lower, corners of the matrix. Therefore,in step 4 of the algorithm, reconstruction, we may use a different number of elementswhen reconstructing each point of the unobserved extracted signals as we move to thetwo aforementioned corners of the elementary matrices by frequency. These two cornersrepresent the two extremes of the data, the beginning and the end of the sample. Asa consequence, the reconstruction at the extremes can be rather unstable because it isdone with very few elements. Notice that, for both extremes t = 1 and t = T , it is donewith only one element. To overcome this problem, we propose two alternative solutionsdepending on the type of signals. The MATLAB function, extend() , gives the possibilityof implementing these different alternatives through a parameter that can be specified bythe user as an optional input to cissa() . First solution: autoregressive forecasting/backcasting.
This consists of adjust-ing an autoregressive (AR) process to the first difference of the time series. The order ofthe AR process is large enough to pick up the dynamics of signals of very different nature.In particular, we use p = T / p being the order of the AR process. We can makeforecasts with this auxiliary AR process and afterwards integrate them in order to recoverthe level of the observed data. The process is similar at the beginning of the sample, butbackcasts are used instead of forecasts. The number of forecasts used in either extremeis the window length L . This option is of general use for any time series.6 econd solution: Mirroring. This proposal goes in line with that of [6] and consistsof mirroring the time series around the extremes. In this way, after we reach the end ofthe series at T , we add data points considering x T + j = x T − j +1 , j = 1 , ..., T . A similarsolution is applied at the beginning of the sample with x − j = x j , j = 1 , .., T . Thisextension can be used for stationary time series, and also works well for certain types ofnon-stationary time series such as amplitude and frequency modulated (AM-FM) signalsbut does not necessarily work well for trending time series. The outcome of the automated CiSSA algorithm is a set of M time series, one for eachfrequency, w k given by (1), k = 1 , · · · , M . Because the components are already identifiedby frequency, afterwards the user can group them according to their needs. The MATLAB function, group() , gives the possibility of implementing different grouping strategies afterexecuting cissa() . Alternatives range from a completely expert user option defining thegroups after analyzing the data, incorporating semi-automated options where the userintroduces simply a parameter related to the amount of variability they want to explain,or the relevance of the psd values they want to introduce, and even to providing a totallyautomated option devoted to economic time series.
Users in Economics.
Under this option economists only need to introduce thenumber of observations per year and automatically obtain the default signals typicallyrequired in a time series of economic activity: trend, business cycle and seasonality.
Expert user.
With this option the user introduces their own groups according to theirprevious knowledge of the problem or attending to their analysis of the power spectraldensity obtained in psd as an output from cissa() . Dimension reduction.
In this case, the group() function retains as many compo-nents as necessary to achieve a pre-specified amount of explained variability. It producesa single reconstructed time series as the sum of these components. Furthermore, thisfunction also provides the selected frequencies.
Largest psd grouping.
Now, the group() function considers the empirical distri-bution of the eigenvalues and selects those over a percentile specified by the user. Thefunction, again, produces a single reconstructed series as the sum of these componentsand provides the selected frequencies and the amount of variability explained.7
MATLAB functions
There are two main functions available to the user: cissa() and group() . The group() function needs to be run after cissa() and cannot be run on its own because it uses asinputs the output of cissa() . Firstly, we describe cissa() below.
The cissa() function performs CiSSA. Given an input series x t and a window length, L , it returns the M elementary reconstructed components (cid:101) x ( B k ) t by frequency given by(1), k = 1 , ..., M , and the estimated power spectral density of the input time series. Thesyntax of the function is: [Z, psd] = cissa(x,L,H) where the inputs of the function are: • x : Columm vector containing the original data. • L : Window length. • H : Optional. A parameter related to the characteristics of the time series and thekind of extension made at the beginning and end of the sample. It can take thefollowing values: – H=0 (Default) Autoregressive extension. – H=1
Mirroring. – H=2
No extension.The parameter H is the input of the internal extend() function. The option H=0 extendsthe data using an autoregressive forecasting/backcasting. It is indicated for any timeseries. The option
H=1 corresponds to mirroring the time series. It can be used withstationary time series and it also works well for AM-FM signals. Finally, the option
H=2 does not extend the data. It is suitable for deterministic time series. After performingCiSSA, the function returns the following outputs: • Z : Matrix whose M columns are the elementary reconstructed components by fre-quency. • psd : Column vector with the estimated power spectral density estimated at fre-quencies w k , k = 1 , ..., L through the eigenvalues of the circulant matrix of secondmoments. 8 .2 The group function Once we have extracted the components (saved in Z ) with cissa() , we can use the group() function to reconstruct the desired signals. The second MATLAB function thatwe provide uses as input arguments the two outputs from cissa() and an additionalargument to specify how we want to perform the grouping. All in all, this
MATLAB function groups the reconstructed components by frequency obtained with CiSSA intodisjoint subsets and computes their share of the corresponding power spectral density.The syntax of the function is: [rc, sh, kg] = group(Z,psd,I) where the inputs of the function are: • Z : Matrix whose M columns are the elementary reconstructed components by fre-quency obtained with cissa() . • psd : Column vector with the power spectral density estimated at frequencies w k , k =1 , , ..., L , given by (1) obtained with cissa() . • I : This input argument has four different options:1. A positive integer. It is the number of data per year in an economic timeseries. The function automatically computes the trend (oscillations with periodgreater than 8 years), the business cycle (oscillations with period between 8and 1.5 years) and a seasonal component.2. A cell array. Each cell contains a row vector with the desired values of k to beincluded in a group, k = 1 , , ..., M . The function computes the reconstructedcomponents for these groups.3. A number in the interval (0, 1). This number represents the accumulatedshare of the psd achieved with the sum of the share associated with the largesteigenvalues. The function computes the original reconstructed time series asthe sum of these components.4. A number in the interval (-1, 0). This number is a percentile (in positive) ofthe empirical distribution of the values of psd . The function computes theoriginal reconstructed time series as the sum of the reconstructed componentsby a frequency whose psd is greater than this percentile.After running this function, we obtain as output the following items: • rc : Matrix whose columns are the reconstructed components for each group or sub-set of frequencies. In the case of economic time series this matrix has 3 columns9orresponding to the trend, business cycle and seasonal components, respectively.Under option 2 (cell array), it returns as many columns as groups introduced. Andunder options 3 and 4 (largest psd and dimension reduction respectively), this ma-trix has just one column. • sh : Column vector with the share of the power spectral density for each group. • kg : Cell array where each cell contains a row vector with the values of k belongingto a group. Option 1) produces 3 groups, option 2) gives the groups introduced bythe user and options 3) and 4) produce a single group. In option 3), the values of k are sorted according to the share of total psd of their corresponding eigenvalues. In this section we illustrate the use of the functions cissa() and group() through threeexamples. Firstly, we will consider a case where the frequencies of oscillation of theunderlying components, are known to the analyst. This is an AM-FM synthetic signaland shows the performance of CiSSA in an expert user way. Secondly, we will analyze aphysical signal from voiced speech recognition illustrating how cissa() can process thesetype of signals. Within this example, we illustrate the use of options 2, 3 and 4 of thefunction group() through the analysis of the voiced signal of the word ”Hello”. Finally,we present an economic example to show the performance of cissa() and group() in anautomated way.
We have simulated a synthetic signal to illustrate the expert user alternative to definethe groups in the group() function. The original signal is the sum of two components:the first is an AM signal and the second is both an amplitude and frequency modulated(AM-FM) signal, similar to that used in [1]. Let x ( t ) = x ( t ) + x ( t ) where x ( t ) = a ( t ) cos ( w t ) and x ( t ) = a ( t ) sin ( w , t + w , t T ) with a ( t ) = 1 + 0 . cos ( w A, t ) and a ( t ) = 0 . . cos ( w A, t ). See that x ( t ) shows a linearly increasing frequency in timesince w ( t ) = w , + w , t T . We assume that the signal generated has a time lapse of 10seconds with sample frequency equal to 1000 observations per second. In this particularexample, we have chosen the following values f = 100 Hz , f , = 10 Hz , f , = 40 Hz , f A, = 1 Hz and f A, = 5 Hz , being w . = 2 πf . . The simulated data are generated with thefollowing MATLAB code:
T = 10; % Time lapse, in secondsfs = 1000; % Sample frequencyt = (1:T*fs) (cid:48) ; The simulated signal is saved in x . The time lapse between 2.2 and 2.6 seconds isrepresented in the left panel of Figure 1.To apply CiSSA we need to choose a value for L . It is advisable to choose a value of L which is a multiple of the oscillation periods. In this case, the period associated with x ( t ) is 10 and the periods associated with the frequency modulated signal x ( t ) rangefrom 20 to 100, and we have chosen L = 200. The input parameter H (used to extend thedata at the beginning and end of the sample) takes the value of 1 since this is an AM-FMsignal. The code for signal extraction applying our proposal is: L = 200;H = 1; % Extension by Mirroring[Z, psd] = cissa(x,L,H);
The right hand side of Figure 1 shows the estimated power spectral density whichis the output saved on the variable psd . From the latter, we can see two outstandingfeatures. Firstly, a peak at the normalized frequency 0.1 that corresponds to x ( t ). And,secondly, a plateau reflecting a constant gain between the normalized frequencies 0.01 and0.05 associated with the modulated signal x ( t ). Therefore, we should identify 2 sets inorder to make the groups: the first contains the isolated normalized frequency linked to x ( t ) that, according to (1), corresponds to k = 21; and, the second group includes thenormalized frequencies of the plateau that characterizes x ( t ) and are associated with theindexes k = 3 , ..., Z contains all the reconstructed signals by frequency, for k = 1 , ..., group() . The remaining two inputs are psd and I , where thelatter contains the values of k needed for grouping: I = cell(2); % 2 groupsI { } = 21; % Component with the isolated frequency in the first groupI { } = (3:11); % Components 3 to 11 in the second group[rc, sh, kg] = group(Z,psd,I); The output has 3 outcomes: rc that is used to save the reconstructions of x ( t ) and x ( t ); the 2 × sh that takes the values 95 and 4 indicating the percentage of Notice that, due to symmetry, we only present normalized frequencies up to 0.5. igure 1 . Graph of the synthetic series (a) and its estimated power spectral density (b). Figure 2 . Synthetic components (blue) and their corresponding CiSSA estimation (red).variability explained by each of the two reconstructed components, respectively; and,finally, in this particular case, since the grouping has been introduced manually by theuser kg = I .Figure 2 shows the reconstructed components stored in rc in red together with thecomponents generated in blue for the time interval between 2.2 and 2.6 seconds in orderto check the performance of the algorithm. As we can see, they are really close and CiSSAperforms quite well. 12 igure 3 . The original voiced signal ”Hello” (blue line) and its reconstruction (red line). As a second example to illustrate the performance and use of options 2 to 4 of input I in group() , we downloaded a voiced speech from the Cambridge dictionary with theword “Hello”. The goal is to find a unique group able to synthesize the original word,that could be similar to de-noising. Firstly, under the expert user option we decided onthe groups after looking at the values of the estimated spectral power density. Secondly,under the semi-automated largest psd grouping, we form one group with the elementaryreconstructed components with a frequency greater than a specified percentile of the powerspectral density. And thirdly, as an alternative semi-automated criterion, the dimensionreduction option returns one group that attains the desired share of explained variability.The voiced signal consists of 33000 observations and a sample frequency of 48000 Hzand is represented by the blue line in Figure 3.We ran cissa() with L = 6000, for the window length and used the default value for H , as the time series showed no stochastic trend and both options H=0 or H=1 are suitableoptions. The elementary reconstructed components by frequency output are stored in the M columns of matrix Z and the power spectral density in the vector psd . In the threecases we are going to consider just one group and the differences that arise in the way toselect it.Firstly, in an expert user option, we selected the components and introduced them asa group in group() . We did so by looking at the values of the estimated power density(dB) in psd , that is represented in Figure 4. As a first attempt, we formed the group with L is either a multiple or a fraction of the sample frequency. In this case, we chose a fraction becausethe sample length is smaller than the sample frequency. igure 4 . Power spectral density in dB (blue line) of the voiced signal ”Hello” andselected threshold (red line).the components with a positive value, that corresponded to normalized frequencies with psd over the black line in Figure 4. We stored the indexes k of those components withpositive power spectral density in I and used these values in group() . L = 6000;[Z, psd] = cissa(x, L);M = size(Z,2);k = (1:M);I = cell(1); % Define the number of groups (1 here)thres = 0; % Define a threshold for the psdI { } = k(10*log10(psd(k))>thres); % Select k with psd over the threshold[rc1, sh1, kg1] = group(Z,psd,I); The reconstructed signal is stored in rc1 . In sh1 we obtained the percentage ofexplained variability (74.3%) in the first attempt. Finally, in kg1 we saved the selectedcomponents that in this case coincided with I . The number of components with positive psd in dB was 29. As a result, the reconstructed signal with the previous groupingreproduces the sound / @"l@U /, leaving out the initial sound / h /. To incorporate the missingphoneme we change the threshold for selecting components. Choosing thres=-1.4 solvedthe problem. We can identify that the k linked to the sound /h/ is 228 that correspondsto the normalized frequency 0.0378. Figure 4 shows the estimated power spectral density(PSD) in dB with the above-mentioned threshold in red. The reconstructed signal (redline in Figure 3) quite accurately reproduces the sound / h@"l@U /.14n order to achieve the desired reconstruction in the previous example, we could al-ternatively introduce a group automatically formed by those components whose psd isover a percentile of the empirical distribution of the eigenvalues. To do so for the 95thpercentile, select I=-0.95 in group() as we show in the following piece of code: I = -0.95; % Share of explained variability[rc2, sh2, kg2] = group(Z,psd,I);
As in the previous case, rc2 stores the reconstructed signal, sh2 is now 96.4 % and kg2 contains the 150 indexes k of the selected components.Thirdly and finally, we illustrate a different option for grouping using the percentage ofaccumulated explained variability. Choosing at least 80 % of the explained variability, weintroduced this input in the group() function as a negative number in order to distinguishit from the psd largest percentile option as the following piece of code illustrates: I = 0.80;[rc3, sh3, kg3] = group(Z,psd,I);
Now rc3 stores the reconstructed signal that is shown in red in Figure 3. The exactexplained variability saved in sh3 is 80.1 % and kg3 contains the 41 indexes k of theselected components. This solution also corresponds with the second selected thresholdneeded in order to incorporate the sound /h/ to reproduce the word ”hello”. Indicators of economic activity typically show regular patterns such as trend, seasonal-ity and business cycle that are key to assessing the momentum of the economy. In thisrespect, Statistical Offices regularly produce seasonally adjusted time series and analystsdemand business cycle estimations of a myriad of economic indicators. Given that foreconomists the frequencies of interest are known beforehand, CiSSA can be used to es-timate these signals in an automated way. In this section, we explain and illustrate theautomated extraction of these signals by means of the use of cissa() and group() underthe automated option.The Spanish Energy Consumption, in millions of kilowatts, serves as a representativeof time series of monthly economic activity. The source is REE (acronysm of Red El´ectricade Espa˜na, the sole transmission agent and operator of the Spanish electricity system).Figure 5 shows the evolution of the log-transformation of this indicator from January1970 until October 2020. It clearly shows trend, business cycle oscillations and seasonality.The sample size is T = 610 observations. SSA does not require log-transformation of the time series but, as typically done in business cycleanalysis, we take logs in order to make the figures comparable with those usually handled by economicanalysts. igure 5 . Log Monthly Spanish Energy Consumption (in blue) and CiSSA estimatedtrend (dotted line in red). Figure 6 . Power spectral density estimation for the monthly Spanish Energy Consump-tion.To perform cissa() we just needed to decide on the parameter L . In this case wehave considered L=288 . The main rationale for this selection for L are that it has tobe a number below T / . cissa() output produced a matrix of 145 elementary reconstructed components by frequency Z and a vector psd that stores the CiSSA estimated power spectral density at normalized Notice that using this reasoning, alternative possible values for L could be L = 96 or L = 192. Themain results are robust for the choice of L . k = 1 , , ..., group() with I=12 (the number ofobservations per year). Before analyzing the output produced, we highlighted how thegroups are formed as fluctuations are assigned to each group according to their periodicity.It means that the trend is constructed as the sum of the components corresponding toperiodicities greater than 8 years. Therefore, the group for the trend comprises k = 1( ∞ ) and k = 2 , / / k = 4 , ..., / , / , / , / , /
12 and 6 /
12 andthat correspond with the values of k = 25 , , , ,
121 and 145. The values of the powerspectral density associated with the corresponding normalized frequencies for the selectedvalues of k are marked with a large dot in Figure 6.The code for the previous analysis is very simple and is shown in the following lines: L = 288;[Z, psd] = cissa(log(x), L);I = 12; % Number of observations in the year[rc, sh, kg] = group(Z,psd,I);
The output of group() is composed of three items. Firstly, rc contains three timeseries that correspond to the trend, business cycle and seasonal component as they areplotted in Figures 5, 7 and 8, respectively. Secondly, sh provides the share of the powerspectral density for each group that are 88.2 % for the trend, 8.5 % for the businesscycle and 1.5 % for the seasonal component. Finally, kg stores the indexes of the selectedcomponents in each group.The analysis of the extracted signals reveals that the estimated trend, see the dottedred line in Figure 5, is able to reflect the long-term behaviour of the Spanish log ofEnergy Consumption. With regard to the business cycle, Figure 7 shows that it matchesthe recessions as dated by the OECD for the Spanish economy. Finally, Figure 8 showsthe changing seasonal behaviour of the Spanish energy consumption where the drops insummer are less pronounced in August, and have even become positive in June and Julyin recent times. 17 igure 7 . Estimated Business Cycle for the monthly Spanish Energy Consumption andOECD announced recessions (shadowed areas). Figure 8 . Estimated seasonality for the monthly Spanish Energy Consumption.
The cissa() and group() functions can be used to perform Circulant Singular SpectrumAnalysis in a univariate time series. The only input needed are the data, the windowlength chosen for the analysis, and the option desired for grouping. The output gives thereconstructed components as well as the estimation of the power spectral density and theselected components for grouping. CiSSA can be applied in a variety of fields such as thosedescribed in the Applications section. The functions are useful for the analysis of station-ary and non-stationary time series. They are also very useful for extracting components ina non-linear time series and due to its non-parametric nature, is not necessary to specify18 model. Additionally, it can be run in an automated way when the frequency of oscil-lation is known beforehand, as is usually the case in economic analysis. The functionscan be downloaded from the web page https://es.mathworks.com/matlabcentral/fileexchange/84094-cissa-circulant-ssa-under-matlab?s_tid=srchtitle or https://github.com/jbogalo/CiSSA . References [1] G. Biagetti, P. Crippa, A. Curzi, S. Orcioni, and C. Turchetti. Analysis of the emgsignal during cyclic movements using multicomponent am-fm decomposition.
IEEEJournal of Biomedical and Health Informatics , 19(5):1672–1681, 2015.[2] J. B´ogalo, P. Poncela, and E. Senra. Circulant singular spectrum analysis: A newautomated procedure for signal extraction.
Signal Processing , 179, 2021. doi: 10.1016/j.sigpro.2020.107824.[3] D. Broomhead and G. King. Extracting qualitative dynamics from experimentaldata.
Physica D: Nonlinear Phenomena , 20(2-3):217–236, 1986.[4] D. Danilov and A. Zhigljavsky.
Principal components of time series: the “Caterpillar”method . Saint Petersburg Press, Saint Petersburg, Russian, 1997.[5] K. Fraedrich. Estimating the dimension of weather and climate attractors.
Journalof the Atmospheric Sciences , 43(5):419–432, 1986.[6] F. Gianfelici, G. Biagetti, P. Crippa, and C. Turchetti. Am-fm decomposition ofspeech signals: An asymptotically exact approach based on the iterated hilbert trans-form.
IEEE Workshop on Statistical Signal Processing Proceedings , pages 333–337,2005.[7] N. Golyandina and A. Korobeynikov. Basic singular spectrum analysis and forecast-ing with r.
Computational Statistics & Data Analysis , 71:934–954, 2014.[8] N. Golyandina, A. Korobeynikov, A. Shlemov, and K. Usevich. Multivariate and 2dextensions of singular spectrum analysis with the rssa package.
Journal of StatisticalSoftware , 67(2):1–78, 2015. doi: 10.18637/jss.v067.i02.[9] P. Lancaster.
Theory of Matrices . Academic Press, New York, US, 1969.[10] R. Vautard and M. Ghil. Singular spectrum analysis in nonlinear dynamics, withapplications to paleoclimatic time series.
Physica D: Nonlinear Phenomena , 35:395–424, 1989. 1911] R. Vautard, P. Yiou, and M. Ghil. Singular-spectrum analysis: A toolkit for short,noisy chaotic signals.