Boosting Factor-Specific Functional Historical Models for the Detection of Synchronisation in Bioelectrical Signals

David Rügamer, Department of Statistics, LMU Munich, Munich, Germany.
Sarah Brockhaus, Department of Statistics, LMU Munich, Munich, Germany.
Kornelia Gentsch, Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland.
Klaus Scherer, Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland.
Sonja Greven, Department of Statistics, LMU Munich, Munich, Germany.
Summary. The link between different psychophysiological measures during emotion episodes is not well understood. To analyse the functional relationship between electroencephalography (EEG) and facial electromyography (EMG), we apply historical function-on-function regression models to EEG and EMG data that were simultaneously recorded from 24 participants while they were playing a computerised gambling task. Given the complexity of the data structure for this application, we extend simple functional historical models to models including random historical effects, factor-specific historical effects, and factor-specific random historical effects. Estimation is conducted by a component-wise gradient boosting algorithm, which scales well to large data sets and complex models.
Keywords: factor-specific functional historical effect, functional data analysis, function-on-function regression, gradient boosting, signal synchronisation
1. Introduction
Bioelectrical signals such as electromyography (EMG), electroencephalography (EEG) or electrocardiogram (ECG) are variations in electrical energy that carry information about living systems (Semmlow and Griffel, 2014). An appropriate analysis of bioelectrical signals, usually obtained in the form of time series data, is a crucial point in many research areas, including (tele-)medicine, automotive technology, and psychology (Kang et al., 2006; Kaniusas, 2012). In the field of cognitive affective neuroscience, a particular interest lies in the link of measured brain activity recorded with the EEG, and peripheral response systems such as the heart rate or facial muscle activity. In this context, our motivating study (Gentsch et al., 2014) investigated the coherence between emotion components. In componential emotion theory, an emotional episode is thought to be an emergence of coherent or temporally correlated changes in emotion components, such as appraisals or facial expressions. This is referred to as synchronisation (Grandjean and Scherer, 2009).
The emotion components data. In the study of Gentsch et al., brain activity (EEG) as well as facial muscle activity (EMG) was simultaneously recorded. The data set at hand consists of time series of equidistant observed time points for both EEG and EMG signals, eight different study settings (conditions in a computerised gambling game) and 24 participants. The traditional approach of analysing EEG and EMG data is to calculate the average signal for each participant across all trials of one study setting. For EEG data, this is referred to as event-related potential analysis (see, e.g., Pfurtscheller and da Silva, 1999). Such an aggregation yields a reduced data set of N = 8 · 24 · 384 = 73,728 observed data points. At each of the N time points, measurements are available for three EMG and EEG electrodes. Figure 1 depicts one EEG and EMG signal for one participant and all eight study settings with a common starting point after stimulus onset.

Efferent signals from the brain (signals originating from the brain) innervate or activate facial muscles (see, e.g., Rinn, 1984). Therefore, it should be possible to trace back facial muscle activity recorded with facial EMG to brain activity captured with EEG. As certain cognitive processes can be related to different brain areas and facial regions, our particular interest lies in investigating the link between a selected EEG electrode signal and a specific EMG signal. We expect any association between these two signals to (a) be time-varying, (b) exhibit a temporal lag that is a priori unknown (even though a minimum lag can be inferred from the literature), (c) be specific to a study setting and/or (d) be only present during certain time intervals.

Fig. 1: Example for one EEG signal (Fz electrode) and one EMG signal (frontalis muscle for raising eyebrows) of one participant, averaged over all trials for each of the eight possible game conditions (line colours).

Existing methods for detecting synchronisation. Previous approaches to detect synchrony in brain activity and autonomic physiology data have mostly focused on coherence or cross-correlation. Examinations of EEG and EMG synchronisation can, inter alia, be found in Hollenstein and Crowell (2014); Mima and Hallett (1999); Brown (2000); Mima et al. (2000a,b); Grosse et al.
(2002); Quiroga et al. (2002); Bortel and Sovka (2006); Hashimoto et al. (2010). While coherence is a function of the frequency measuring the explained variance of one time series by another time series in the frequency domain, cross-correlation is a function of time, yielding the correlation of two time series for a given lag (see, e.g., Pawitan, 2005). With the aim to relate different time points of two signals to each other, we focus on methods in the time domain. Established methods are, however, concerned with the estimation of the association between two observed time series rather than the analysis of a large number of time series observations given in pairs of two signals. This applies for (cross-)correlation, which additionally does not provide the possibility to take covariates into account, as well as for other methods such as the generalised synchronisation approach based on the state-space representation (Diab et al., 2013) or autoregressive time series approaches (see, e.g., Ozaki, 2012). Furthermore, most of these approaches require the definition of a specific or a maximum time lag.
Function-on-function regression. As both the EEG and the EMG signal can be understood as noisy observations of functional variables, function-on-function regression approaches offer another possibility to describe and infer the relationship of such time series (see Morris, 2015, for a recent review). Function-on-function regression models adapt the principle of standard regression by allowing for a functional response as well as functional covariates. The so-called historical model (Malfait and Ramsay, 2003; Harezlak et al., 2007) is one possibility to explain a functional response Y(t), t ∈ T = [T_1, T_2] with T_1, T_2 ∈ ℝ, using a linear effect of the complete history of a functional covariate X(s), s ∈ T:

E(Y(t) | X = x) = ∫_{T_1}^{t} x(s) β(s, t) ds.   (1)

In contrast to the existing approaches discussed above, historical models allow us to relate a given time point of one time series to more than one time point in [T_1, t] of another time series. Early core work on functional historical models is limited to historical models with only one functional covariate. A multitude of application possibilities are conceivable and historical models have been used in different research areas including health and biological science (Malfait and Ramsay, 2003; Harezlak et al., 2007; Gervini, 2015; Brockhaus et al., 2016b). Brockhaus et al. (2016b) extended the framework of a simple historical model as in (1) to functional regression models with a high number of functional historical effects and potentially further covariate effects by utilizing gradient boosting for estimation. Alternative estimation procedures for flexible function-on-function regression models including historical effects are based on a mixed model representation (cf. Scheipl et al., 2015) or component-wise gradient boosting (cf. Brockhaus et al., 2015, 2016b).
These are implemented in the pffr function of the R package refund (Huang et al., 2015) and in the R package FDboost (Brockhaus and Rügamer, 2016), respectively.
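To make the estimation target in (1) concrete, the integral can be approximated by a simple Riemann sum on the observation grid. The following minimal Python sketch uses a hypothetical covariate trajectory and coefficient surface, both invented for illustration; the paper's data and effects are different.

```python
import numpy as np

# Riemann-sum evaluation of E(Y(t) | X = x) = int_{T1}^{t} x(s) beta(s, t) ds
# on a common equidistant grid with T1 = 0.  Covariate and surface are toys.
grid = np.linspace(0.0, 1.0, 101)        # observation grid for s and t
delta = grid[1] - grid[0]                # integration weight on an equidistant grid
x = np.sin(2 * np.pi * grid)             # hypothetical covariate trajectory x(s)

def beta(s, t):
    """Hypothetical coefficient surface beta(s, t)."""
    return np.exp(-5.0 * (t - s))

def historical_effect(t):
    """Approximate int_0^t x(s) beta(s, t) ds by a Riemann sum."""
    past = grid <= t                     # only the history s <= t contributes
    return float(np.sum(delta * x[past] * beta(grid[past], t)))

effect = np.array([historical_effect(t) for t in grid])
```

The same scheme, with quadrature weights Δ(s_r), underlies the numerical integration used throughout the paper.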
Proposed approach. In order to reflect the study design in this application, we extend functional historical models to historical effects that vary over one or two (penalised) categorical covariates to allow for subject-, setting- and subject-by-setting-specific effects. We provide mathematical concepts for the construction of design matrices, penalty matrices as well as suitable identifiability constraints. We integrate these concepts into the framework of Brockhaus et al. (2015, 2016b) and implement them for estimation via component-wise gradient boosting. We also speed up the estimation by making use of our particular model structure. By carrying out estimation with component-wise gradient boosting as in Brockhaus et al. (2015), our approach has several advantages. In particular, it can fit multiple factor-, subject- as well as subject-by-factor-specific functional effects, which is not possible in alternative approaches for function-on-function regression such as implemented in the pffr function in the R package refund (Huang et al., 2015). Furthermore, the algorithm allows for different loss functions and thus covers models beyond mean regression (Kneib, 2013), e.g., median, robust or quantile regression. It can deal with high-dimensional data sets that often go hand in hand with multi-sensor bioelectrical signal data collections, as well as settings with more covariates than observations. Our approach is able to find multi-modal effect surfaces and band effects, thereby covering special cases of time series approaches. In addition, we derive options to reduce computation time as well as memory requirements. All methods are implemented in the R package FDboost, an extension of the model-based boosting package mboost (Hothorn et al., 2016). The R code for our simulation as well as code and data for our application is provided in an online repository (https://github.com/davidruegamer/BoostingSignalSynchro).
2. Functional response models and historical effects
After outlining the functional historical model in subsection 2.1, we extend the model of Brockhaus et al. (2016b) to models with functional historical terms interacting with categorical covariates and to random functional historical effects in subsection 2.2.
We focus on additive functional regression models of the form (Brockhaus et al., 2015, 2016b)

ξ(Y(t) | X = x) = h(x)(t) = Σ_{j=1}^{J} h_j(x)(t),   (2)

where ξ is a transformation function for the conditional distribution of the functional response Y(t), t ∈ T. In our application ξ is equal to the conditional expectation E, although it could also be, e.g., the (pointwise) median or a quantile. The covariate set x comprises functional observations x_1(·), ..., x_{p_x}(·) and scalar covariates z_1, ..., z_{p_z} with p := p_x + p_z. The h_j(x)(t) are partial effects, which can depend on scalar as well as on functional covariates. In particular, this general model class includes models with one or more historical effects

h_j(x)(t) = ∫_{l(t)}^{u(t)} x_{k_j}(s) β_j(s, t) ds,   (3)

k_j ∈ {1, ..., p_x}, which can have general integration limits l(t) and u(t), e.g., defined by l(t) = T_1 and u(t) = t, l(t) = t − δ and u(t) = t, or partial histories l(t) = t − δ_l and u(t) = t − δ_u, t > δ_l > δ_u > 0, as in Harezlak et al. (2007). Functional historical effects are particularly suited to settings where both response Y(t) and covariates X_{k_j}(s) are observed over the same time interval, s, t ∈ T.

In practice, x_{k_j}(·) is observed on a grid s_1, ..., s_R and the integral as well as the smooth coefficient surface β_j(s, t) in (3) must be approximated. We use numerical integration and a tensor product spline basis expansion, respectively. For k = 1, ..., K_x and l = 1, ..., K_t, define the basis functions Φ^s_{j,k}(s) and Φ^t_{j,l}(t) for the s- and the t-direction of the coefficient surface β_j(s, t), respectively. Let θ_{j,k,l} be the corresponding basis coefficients and Δ(s_r) numerical integration weights for the observed time points s_r.
Then, the historical effect can be represented by (Scheipl et al., 2015; Brockhaus et al., 2016b)

∫_{l(t)}^{u(t)} x_{k_j}(s) β_j(s, t) ds ≈ B_j(x_{k_j}, t) θ_j   (4)

with θ_j = (θ_{j,1,1}, ..., θ_{j,K_x,K_t})^⊤ and B_j(x_{k_j}, t) = B^x_j(x_{k_j}, t) ⊗ B^t_j(t), using the Kronecker product ⊗ and defining

B^x_j(x_{k_j}, t) = [ Σ_{r=1}^{R} Δ(s_r) x_{k_j}(s_r, t) Φ^s_{j,1}(s_r)  · · ·  Σ_{r=1}^{R} Δ(s_r) x_{k_j}(s_r, t) Φ^s_{j,K_x}(s_r) ]

as well as B^t_j(t) = [Φ^t_{j,1}(t) · · · Φ^t_{j,K_t}(t)]. Let I(·) be the indicator function. Following Scheipl et al. (2015), for n observed curves x_{k_j,1}(·), ..., x_{k_j,n}(·) at grid points s_r, with x_{k_j}(s_r, t) = x_{k_j}(s_r) · I{l(t) ≤ s_r ≤ u(t)}, and response observations y_i(t_{i,d}) at potentially curve-specific time points t_{i,d} ∈ T, i = 1, ..., n, d = 1, ..., D_i, N = Σ_{i=1}^{n} D_i, the design matrix of a historical effect can be summarised by

B_j := B^x_j ⊙ B^t_j = (B^x_j ⊗ 1^⊤_{K_t}) ∗ (1^⊤_{K_x} ⊗ B^t_j),   (5)

where B^x_j ∈ ℝ^{N × K_x} with rows B^x_j(x_{k_j,i}, t_{i,d}), B^t_j ∈ ℝ^{N × K_t} with rows B^t_j(t_{i,d}), ⊙ is the row-wise tensor product, ∗ the Hadamard product (element-wise matrix multiplication) and 1^⊤_a a row vector of ones of length a. In the supplemental material, we provide a simple example of how to interpret estimated coefficient surfaces of historical effects, as we believe that this is an important part in using historical models. Regularisation of the coefficient vector θ_j in (4) is achieved by an anisotropic penalty.
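The row-wise tensor product identity in (5) is easy to verify numerically. A minimal NumPy sketch with toy dimensions follows; all matrices here are random stand-ins for the actual spline bases, invented for illustration.

```python
import numpy as np

# Row-wise tensor product B_j = B^x (row-tensor) B^t from (5), written as
# (B^x kron 1_{Kt}^T) * (1_{Kx}^T kron B^t).  Dimensions and entries are toys.
N, Kx, Kt = 12, 5, 4
rng = np.random.default_rng(1)
Bx = rng.normal(size=(N, Kx))            # stands in for the integrated covariate basis
Bt = rng.normal(size=(N, Kt))            # stands in for the t-direction spline basis

B = np.kron(Bx, np.ones((1, Kt))) * np.kron(np.ones((1, Kx)), Bt)

# Row i of B equals the Kronecker product of the i-th rows of Bx and Bt.
```

This evaluates the full tensor product basis only row-wise, which keeps the design at N × (K_x · K_t) rather than forming any larger intermediate object.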
Using the marginal penalties P^x_j ∈ ℝ^{K_x × K_x} and P^t_j ∈ ℝ^{K_t × K_t} of the historical effect basis in s- and t-direction, respectively, a quadratic penalty term can be constructed as

θ_j^⊤ P_j θ_j = θ_j^⊤ [ λ^x_j (P^x_j ⊗ I_{K_t}) + λ^t_j (I_{K_x} ⊗ P^t_j) ] θ_j = θ_j^⊤ [ λ^x_j P^x_j ⊕ λ^t_j P^t_j ] θ_j,   (6)

where λ^x_j, λ^t_j ≥ 0 are smoothing parameters and ⊕ is the Kronecker sum (Wood, 2006; Scheipl et al., 2015). More details on the penalisation and potential extensions can be found in the next subsection. Similarly, penalised basis expansions as in (4), (5), and (6) can also be constructed for a multitude of other effects of scalar and/or functional covariates, including all effects of scalar covariates in our proposed model for the emotion components data (Scheipl et al., 2015; Brockhaus et al., 2015). These include the smooth intercept h_j(x)(t) = α(t) as well as time-varying categorical or random effects

h_j(x)(t) = γ_{j,e}(t) · I(z_{q_j} = e),   (7)

where q_j ∈ {1, ..., p_z}, z_{q_j} is a categorical covariate with levels e ∈ {1, ..., η} and γ_{j,e}(t) the corresponding time-varying coefficient. The smoothness of the coefficient functions α(t) and γ_{j,e}(t) is obtained with a spline basis representation as in (4) and a Kronecker sum penalty as in (6), with P^x_j set to zero for categorical effects and P^x_j = I_{K_z} for (independent) functional random effects (see Brockhaus et al., 2015, for more details). In particular, for functional random effects, the quadratic penalty in (6) is equivalent to a normal distribution assumption on θ_j with zero mean and covariance proportional to the generalised inverse of P_j (Brumback et al., 1999), inducing a Gaussian process assumption for the functional random effects. Furthermore, we consider interaction effects of z_{q_j} and a second categorical covariate z_{q'_j} with levels f = 1, ...
, ϕ of the form

h_j(x)(t) = ρ_{j,e,f}(t) · I(z_{q_j} = e) · I(z_{q'_j} = f).   (8)

Identifiability constraints for time-varying categorical effects such as (7) and (8) are discussed in the following subsection.
In light of our application, we newly introduce factor-specific historical effects for functional regression models. Factor-specific historical effects can be useful when historical effects are assumed to vary, e.g., between different study settings or subjects. First, consider a categorical covariate z_{q_j} with levels e = 1, ..., η and a functional covariate x_{k_j}(s), which is modeled via a historical effect. A simple additive model of the form (2) would then include a main historical effect (3) and a factor-specific historical effect

h_j(x)(t) = I(z_{q_j} = e) · ∫_{l(t)}^{u(t)} x_{k_j}(s) β_{j,e}(s, t) ds.   (9)

Given a total of N observations and the covariate vector z_{q_j} = [(z_{q_j,1} ⊗ 1_{D_1})^⊤, ..., (z_{q_j,n} ⊗ 1_{D_n})^⊤]^⊤, the factor-specific historical effect is constructed similarly to (5). The design matrix is extended to

B_j = B^z_j(z_{q_j}) ⊙ B^x_j ⊙ B^t_j = B̃^x_j ⊙ B^t_j,   (10)

where B^z_j(z_{q_j}) is a design matrix for the factor variable depending on the constraints on β_{j,e}(s, t) (see below) and B̃^x_j = B^z_j(z_{q_j}) ⊙ B^x_j. An important special case is given for the unconstrained estimation of β_{j,e} when the observations are sorted by the factor levels e = 1, ..., η. This yields a block-diagonal incidence matrix B^z_j(z_{q_j}) = diag(1_{κ_1}, 1_{κ_2}, ..., 1_{κ_η}) ∈ ℝ^{N × η} and an N × ηK_x block-diagonal matrix B̃^x_j = diag(B^x_{j,1}, ..., B^x_{j,η}). Here, B^x_{j,e} ∈ ℝ^{κ_e × K_x} contains the rows Σ_{k=1}^{e−1} κ_k + 1, ..., Σ_{k=1}^{e} κ_k of B^x_j corresponding to all rows with factor level e, and κ_e is the total number of observation points for factor level e.
This special structure can be exploited for a more efficient computational implementation (see section 3.2 for more details). When the historical effect of x_{k_j} is not only factor- or subject-specific, but varies for a categorical covariate z_{q_j} with levels e = 1, ..., η as well as for subjects z_{q'_j} with levels f = 1, ..., ϕ, we let

h_j(x)(t) = I(z_{q_j} = e) · I(z_{q'_j} = f) · ∫_{l(t)}^{u(t)} x_{k_j}(s) β_{j,e,f}(s, t) ds.   (11)

The design matrix for the random factor-specific historical effect or doubly-varying historical effect (11) is then defined by extending B^z_j(z_{q_j}) in (10) to B^z_j(z_{q_j}, z_{q'_j}) = B^z_j(z_{q_j}) ⊙ B^z_j(z_{q'_j}). For these factor-specific historical effects (9) and (11), we have to carefully consider their identifiability and regularisation.
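The block-diagonal structure of the factor-specific design can be illustrated numerically. In this sketch the factor levels and the historical design matrix are invented toys; only the structure mirrors the construction in (10).

```python
import numpy as np

# Factor-specific design as in (10): with rows sorted by factor level, the
# row-wise tensor product of the incidence matrix with B^x is block-diagonal.
levels = np.array([0, 0, 0, 1, 1, 2])    # factor level of each observation row
N, Kx = levels.size, 3
eta = levels.max() + 1                   # number of factor levels
Bx = np.arange(N * Kx, dtype=float).reshape(N, Kx)  # toy historical design matrix

Bz = np.eye(eta)[levels]                 # N x eta incidence matrix
Bxz = np.kron(Bz, np.ones((1, Kx))) * np.kron(np.ones((1, eta)), Bx)

# Each row is nonzero only in the K_x-column block of its own factor level.
```

Because entire column blocks are zero outside a row's own level, fitting such a term decomposes into per-level subproblems, which is the structure exploited computationally in section 3.2.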
Identifiability constraints. In order to ensure that the main historical effect is separable from the factor-specific historical effects and vice versa, we impose the following constraint when both are included in the model:

Σ_{e=1}^{η} ψ_e · β_{j,e}(s, t) = 0  ∀ t ∈ T, s ∈ [l(t), u(t)],   (12)

where ψ_e are weights for each level e = 1, ..., η of the factor variable. Specifically, for observed curves i = 1, ..., n, we use ψ_e = Σ_{i=1}^{n} I(z_{q_j,i} = e), which coincides with equal weighting in the case of balanced factor levels. This also allows β_j(s, t) in (9) to be interpretable as average historical effect over the η subgroups. Constraint (12) ensures identifiability because the factor-specific historical effects are centred around the surface of the main effect for models including both (3) and (9). For the doubly-varying historical effects to be defined as deviations from both factor-specific historical effects, we impose

Σ_{e=1}^{η} ψ_{e,f} · β_{j,e,f}(s, t) = 0  ∀ t ∈ T, s ∈ [l(t), u(t)], f ∈ {1, ..., ϕ}, and   (13)

Σ_{f=1}^{ϕ} ψ_{e,f} · β_{j,e,f}(s, t) = 0  ∀ t ∈ T, s ∈ [l(t), u(t)], e ∈ {1, ..., η},   (14)

for which we use the weights ψ_{e,f} = Σ_{i=1}^{n} I(z_{q_j,i} = e, z_{q'_j,i} = f). To ensure identifiability and interpretability of the whole model, further constraints must be placed on effects other than the historical effects, i.e., when including time-varying effects in the model. As in Scheipl et al. (2015) and Brockhaus et al. (2015), all time-varying effects in our models are specified as deviations from the smooth intercept α(t). This ensures the identifiability of each effect and allows for a meaningful interpretation (as deviation from the sample mean α(t)). Consider the factor variable z_{q_j} and an effect as in (7). We then impose Σ_{e=1}^{η} ψ_e · γ_{j,e}(t) = 0 ∀ t ∈ T.
A similar constraint is enforced for interaction effects (8) with coefficients ρ_{j,e,f}(t): Σ_{e=1}^{η} ψ_{e,f} · ρ_{j,e,f}(t) = 0 ∀ t ∈ T, f ∈ {1, ..., ϕ}, and Σ_{f=1}^{ϕ} ψ_{e,f} · ρ_{j,e,f}(t) = 0 ∀ t ∈ T, e ∈ {1, ..., η}, i.e., each interaction effect has to be centred around its corresponding main effects. For details on the implementation, see section B in the supplementary material.

Parameterisation. The separation of the factor-specific historical effect and the corresponding main historical effect together with constraint (12) is particularly useful in the light of model selection. However, an alternative model formulation that does not separate main and factor-specific historical effects may sometimes be beneficial for the interpretation of estimated effects and the simplicity of the model definition. A historical model with main and factor-specific historical effects can be rewritten as ∫_{l(t)}^{u(t)} x_{k_j}(s) ( β_j(s, t) + I(z_{q_j} = e) · β_{j,e}(s, t) ) ds, combining main and factor-specific historical effects by estimating the sum β̃_{j,e}(s, t) := β_j(s, t) + I(z_{q_j} = e) · β_{j,e}(s, t) and thereby making constraint (12) obsolete.

Regularisation. For the regularisation of a factor-specific historical effect, the penalty depends on whether we want to regularise over the factor levels, e.g., for "random historical effects", or not, e.g., for study settings. In general, the quadratic penalty matrix in (6) is extended to an anisotropic penalty

P_j = λ^z_j P^z_j ⊕ [ λ^x_j P^x_j ⊕ λ^t_j P^t_j ],   (15)

where P^z_j is the K_z × K_z marginal penalty matrix over the factor levels and λ^x_j, λ^t_j, λ^z_j are the smoothing parameters controlling the regularisation of the historical effect part in s- as well as t-direction and of the factor variable part, respectively.
Usually, K_z is the number of factor levels (minus one, depending on the constraint on the effect) and P^z_j is a simple ridge penalty P^z_j = I_{K_z}. Whereas the factor-specific historical effect is therefore shrunk towards the main historical effect in a model with both main and factor-specific historical effect, the penalty in the alternative parametrisation without constraint on the factor-specific historical effect enforces shrinkage of β_{j,e} towards zero. In practice, the s- and t-directions of the historical effect are typically measured on the same scale (i.e., time), thus we introduce an isotropic penalty for the historical effect part by defining λ^t_j ≡ λ^x_j =: λ^h_j and P^x_j ⊕ P^t_j =: P^h_j. For the doubly-varying historical effect (11), the term λ^z_j P^z_j in (15) is replaced by [ λ^z_j P^z_j ⊕ λ^{z'}_j P^{z'}_j ]. If one or both factors are not penalised, the corresponding penalty matrices are set to zero.
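Weighted sum-to-zero constraints such as (12) are typically enforced by reparameterising with a basis of the null space of the weight vector; any coefficient in that basis satisfies the constraint automatically. A minimal sketch of this standard construction, with hypothetical weights (this mirrors the general idea, not the exact FDboost implementation):

```python
import numpy as np

# Enforcing sum_e psi_e * gamma_e = 0 by reparameterisation: Z spans the
# null space of psi^T, so gamma = Z @ theta satisfies the constraint by
# construction.  The weights psi are hypothetical observation counts.
psi = np.array([4.0, 2.0, 2.0])          # level weights psi_e

# Orthonormal null-space basis of psi^T from the full QR decomposition.
Q, _ = np.linalg.qr(psi.reshape(-1, 1), mode="complete")
Z = Q[:, 1:]                             # eta x (eta - 1) constraint basis

theta = np.array([0.7, -1.3])            # unconstrained coefficients
gamma = Z @ theta                        # constrained level effects
```

The design and penalty matrices are then transformed accordingly (B → BZ, P → Z^⊤PZ), which also explains why K_z is the number of levels minus one under such a constraint.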
3. Estimation: Component-wise gradient boosting
The estimation via component-wise gradient boosting (Bühlmann and Hothorn, 2007; Brockhaus et al., 2015) has several advantages. The main advantage of using component-wise boosting over conventional estimation procedures lies in the nature of component-wise fitting, as the feasibility of component-wise fitting procedures only depends on the most complex individual component. Adding partial effects step-by-step, boosting provides implicit variable selection and allows for model estimation in settings with
J > n or p > n. The component-wise gradient boosting algorithm for a function-on-function regression model was introduced by Brockhaus et al. (2015), and is based on the functional gradient descent (FGD) algorithm (cf. Bühlmann and Hothorn, 2007; Hothorn et al., 2016).
Loss function and empirical risk. In general, the component-wise FGD algorithm aims to minimize the expected loss E_{(Y,X)}(ρ((Y, X), h)) for response Y and covariates X with respect to the additive predictor h for a suitable loss function ρ. The loss is determined by the underlying regression problem, e.g., the L₂-loss for mean regression. In order to adapt the principle of FGD to functional observations, the loss function ℓ for a whole trajectory is defined as ℓ((Y, X), h) = ∫_T ρ((Y, X), h)(t) dt, i.e., the integrated pointwise loss ρ over the domain T. For potentially functional observations (y_i, x_i), i = 1, ..., n, the objective function, the risk, is then given by E_{(Y,X)}(ℓ((Y, X), h)), and the FGD algorithm for functional regression models aims at minimizing the empirical risk

n^{−1} Σ_{i=1}^{n} Σ_{d=1}^{D_i} w_i Υ(t_{i,d}) ρ((y_i, x_i), h)(t_{i,d}),

where sampling weights w_i are used to select or deselect all observations of one functional trajectory in resampling approaches and Υ(t) are weights of a numerical integration scheme used to approximate the integrated loss ℓ (Brockhaus et al., 2016b).

Routine and baselearners. In each step, the FGD algorithm evaluates a set of baselearners (in this case corresponding to penalised regression for the partial effects h_j), chooses the baselearner that best fits the negative gradient at the current estimate −∂/∂h E_{(Y,X)}(ℓ((Y, X), h)) and updates the fit in light of this choice. As in representation (4), we assume that every baselearner can be represented as a linear effect in θ_j ∈ ℝ^{K_j}, i.e., h_j(x)(t) = B_j(x_{k_j}, t) θ_j, with suitable penalty, e.g., (6) or (15).

Algorithm. The full algorithm is given by the following five steps:
Step 1: Set m = 0; initialise the estimates, e.g., θ̂^{[m]}_j ≡ 0 for each baselearner j ∈ {1, ..., J}, and define ĥ^{[m]}(x)(t) = Σ_{j=1}^{J} B_j(x, t) θ̂^{[m]}_j; choose a step-length ν ∈ (0, 1] and a maximal stopping iteration m_stop.

Step 2: Compute the negative gradient −∂/∂h ρ((y, x), h) and define the so-called pseudo residuals

u_i(t_{i,d}) := −∂/∂h ρ((y_i, x_i), h)(t_{i,d}) |_{h = ĥ^{[m]}}.

Step 3: Fit the baselearners j = 1, ..., J to the pseudo residuals,

ϑ̂_j = argmin_{ϑ ∈ ℝ^{K_j}} Σ_{i=1}^{n} Σ_{d=1}^{D_i} w_i Υ(t_{i,d}) { u_i(t_{i,d}) − B_j(x_{k_j,i}, t_{i,d}) ϑ }² + ϑ^⊤ P_j ϑ,

and find the best-fitting j*-th baselearner such that

j* = argmin_{j=1,...,J} Σ_{i=1}^{n} Σ_{d=1}^{D_i} w_i Υ(t_{i,d}) { u_i(t_{i,d}) − B_j(x_{k_j,i}, t_{i,d}) ϑ̂_j }².

Step 4: Set θ̂^{[m+1]}_{j*} = θ̂^{[m]}_{j*} + ν ϑ̂_{j*} and θ̂^{[m+1]}_j = θ̂^{[m]}_j ∀ j ≠ j*, and update ĥ^{[m]} accordingly.

Step 5: Set m = m + 1; as long as m ≤ m_stop, repeat steps 2 to 5.

The final model with corresponding parameters θ̂^{[m*]}_j, j = 1, ..., J, m* ∈ {1, ..., m_stop}, is chosen from the set of m_stop estimated models via cross-validation or other resampling methods on the level of curves (Brockhaus et al., 2015) in order to prevent over-fitting. This so-called early stopping of the boosting procedure introduces regularisation on coefficient estimates (Zhang and Yu, 2005). It is important to set equal degrees of freedom df_j for every baselearner j for a fair selection of baselearners (Hofner et al., 2011). A regularisation over factor levels for categorical covariates with a moderate or large number of factor levels is thus often necessary in practice as df_j would otherwise become very large.
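For intuition, Steps 1 to 5 can be sketched for the L₂ loss in a simplified scalar-response setting with two penalised least-squares base-learners. This is a generic illustration of component-wise boosting with invented data, not the FDboost implementation; integration weights and early stopping are omitted.

```python
import numpy as np

# Component-wise L2-boosting (Steps 1-5) in a simplified scalar setting.
rng = np.random.default_rng(0)
n = 200
X = [rng.normal(size=(n, 3)), rng.normal(size=(n, 3))]  # two base-learner designs
theta_true = np.array([1.0, -2.0, 0.5])
y = X[0] @ theta_true + 0.1 * rng.normal(size=n)        # only learner 0 is informative

P = [0.1 * np.eye(3) for _ in X]         # ridge penalties P_j
coefs = [np.zeros(3) for _ in X]         # Step 1: initialise all coefficients at 0
nu, m_stop = 0.1, 200                    # step length and number of iterations

for m in range(m_stop):
    u = y - sum(B @ th for B, th in zip(X, coefs))      # Step 2: residuals = neg. gradient of L2 loss
    fits, rss = [], []
    for B, Pj in zip(X, P):                             # Step 3: fit every base-learner
        th = np.linalg.solve(B.T @ B + Pj, B.T @ u)
        fits.append(th)
        rss.append(float(np.sum((u - B @ th) ** 2)))
    j_star = int(np.argmin(rss))                        # best-fitting base-learner
    coefs[j_star] = coefs[j_star] + nu * fits[j_star]   # Step 4: update only j*
# Step 5 is the loop itself; choosing m_stop by resampling is omitted here.
```

Because only the selected learner is updated in each iteration, uninformative learners accumulate (almost) no coefficient mass, which is the mechanism behind the implicit variable selection mentioned above.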
The smoothing parameters λ_j, which have a one-to-one correspondence with df_j, must therefore be computed and fixed appropriately beforehand for j = 1, ..., J. Model complexity and smoothness is then controlled for fixed ν by the stopping iteration, which is chosen by resampling.

The FDboost package, based on the mboost package, uses the Demmler-Reinsch orthogonalization (DRO, see, e.g., Ruppert et al., 2003), which avoids repeated matrix inversions to efficiently find a suitable λ_j. Nonetheless, computing the DRO may be very expensive, particularly for factor- and subject-specific historical effects, due to a singular value decomposition (SVD), and can take up a large share of total computing time. To tackle this problem, on the one hand, we recommend reducing the number of knots for (doubly-)varying historical effects to a small number (e.g., four), if this is not expected to lead to unwanted oversmoothing. On the other hand, we exploit the model structure for factor-specific historical effects and derive a representation that allows for a blockwise SVD with computation time on the order of an ordinary historical effect. This reduces overall computation time dramatically (see section C in the supplementary material for more details). For the application in section 5, for example, the most complex model with partially aggregated data could be fitted in under 16 minutes with less than 45 gigabytes of RAM, whereas the brute-force method (fitting the model with ten knots without exploitation of the model structure) failed, exceeding the memory limit of 1 terabyte of RAM after running for more than 10 days. Although the first approach can be a good (approximate) ad-hoc solution, the second approach is exact and thus generally recommended if feasible.

Due to the large fluctuation in bioelectrical signals, a very important aspect in the analysis of such signals is the assessment and quantification of uncertainty.
For the detection of synchronisation with a large number of potentially relevant time intervals of both signals, "significant" effects for specific time point combinations are of particular interest. Apart from rank-based p-values provided in the context of likelihood-based boosting (Binder et al., 2009) using permutations of the response, no general inferential framework in the classical statistical sense exists for boosting methods. An alternative approach is stability selection (Meinshausen and Bühlmann, 2010), which evaluates the importance of explanatory variables by looking at the stability of term selection under subsampling and has already been adapted for functional regression boosting (see, e.g., Brockhaus et al., 2015). In the emotion components application, however, the applied research question defines the chosen covariates and the statistical analysis needs to address the uncertainty of estimated coefficient surfaces. We therefore use a non-parametric curve-level bootstrap to assess the variability of estimated effects. Due to the shrinkage effect of boosting, the resulting uncertainty measures have to be interpreted with care.
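A curve-level bootstrap resamples whole trajectories rather than single time points, which preserves the within-curve dependence structure. A minimal sketch with invented data, where a placeholder estimator stands in for the boosting fit:

```python
import numpy as np

# Non-parametric curve-level bootstrap: resample whole curves, refit, and
# take pointwise quantiles of the refitted effect.  `fit_effect` is a
# placeholder for the actual boosting fit; all data are toys.
rng = np.random.default_rng(2)
n, D = 30, 50
t = np.linspace(0.0, 1.0, D)
curves = np.sin(2 * np.pi * t) + rng.normal(0.0, 0.5, size=(n, D))

def fit_effect(data):
    """Placeholder estimator: pointwise mean curve."""
    return data.mean(axis=0)

B = 200
boot = np.empty((B, D))
for b in range(B):
    idx = rng.integers(0, n, size=n)     # draw curves with replacement
    boot[b] = fit_effect(curves[idx])

lower, upper = np.quantile(boot, [0.025, 0.975], axis=0)  # pointwise 95% band
```

In the actual analysis each refit includes the full boosting procedure with its resampled stopping iteration, so the resulting bands also reflect the selection and shrinkage of the estimator.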
4. Simulations
We provide results for the estimation performance of simple historical effects (subsection 4.1), factor-specific historical effects (subsection 4.2) and for the uncertainty quantification via bootstrap (subsection 4.3). In section 4.4, we briefly address results on different parametrisations and boosting step-lengths.

Similar to our application, we use historical effects with integration limits l(t) = T_1 = 0 and u(t) = t − δ with a fixed lag δ. We compare the estimated surface with the underlying true function and, wherever possible, with an estimate using a functional additive mixed model as implemented in the pffr function in the R package refund (Scheipl et al., 2015). Apart from visual comparisons, we estimate the relative integrated mean squared error (reliMSE),

∫∫ ( β̂(s, t) − β(s, t) )² ds dt · ( ∫∫ β(s, t)² ds dt )^{−1},

by its discrete approximation in order to compare the estimates of our method, referred to as FDboost.

Simulation settings were generally based on several numbers n of observed curves, with D_i ≡ D observed grid points per trajectory and different signal-to-noise ratios SNR. For the following subsections the combinations were customised or restricted accordingly, in particular for simulations with very time-consuming bootstrap calculations. Whereas the number of curves in our application, n = 184, is within the range of simulated settings, we use fewer observations per trajectory in our simulations than available in our application (D = 384) in order to reduce computational time. Increasing the sampling density D in additional simulations with n = 160 almost always results in an improvement of estimation performance. The average estimated SNR in our application might, due to the shrinkage effect, potentially underestimate the true SNR. We also present results of another simulation with smaller n and D in the appendix.
The results suggest that, even for a small number of observations, the estimation performance is satisfactory when the sampling density is large enough. The results of our simulation studies are briefly summarised in the following sections; see the supplementary material for a full presentation of results.

4.1. Simple historical effects
Though estimation performance for simple historical effects has already been examined in Brockhaus et al. (2016b), we provide additional simulation results for complex multi-modal effect surfaces. The simulation settings are motivated by our application, in which several time windows may show a relationship between the two biosignals. We thus simulate data sets where the effect surface is multi-modal in both the s- and the t-direction. Samples were generated from the model

Y_i(t) = α(t) + ∫₀^{t−δ} x_i(s) β(s,t) ds + ε_i(t),   i = 1, …, n,   (16)

for which the functional covariate x_i(s) is simulated as a sum of κ natural cubic B-splines with independent random coefficients from a standard normal distribution. The true underlying coefficient surface is given by β(s,t) = sin(10·|s−t|)·cos(10t)·I(s ≤ t−δ), with I(s ≤ t−δ) = 1 if s ≤ t−δ and 0 otherwise. The independent Gaussian error process ε(t) with mean zero has constant variance σ², defined via the SNR = √Var(Ξ) / √σ², with Var(Ξ) being the empirical variance of the linear predictor.

In addition, we simulate effect surfaces with a band structure. This is done by using the data generating process in (16) and restricting the influence of x_i to values s for which s ≤ t − δ, s ≥ t − 0.1 and t below a fixed cutoff, for s, t ∈ [0, 1]. With 40 observed time points, the restriction s ≥ t − 0.1 corresponds to an autoregressive model with time-varying effects and a lag of 0.1/(1/40) = 4 time points. With this simulation, we want to investigate whether our approach is able to adequately recover the effect of x_i restricted to a certain number of lags without having to predefine lags. This would be an advantage over time-series models, which have to specify the assumed lag structure a priori, and would allow a corresponding dimension reduction without restricting the analysis.

Results. For combinations in which n and SNR are not both very small, our gradient-boosting approach works well and recovers the true underlying functional relationship. These findings are depicted in Figure 2. As can be seen in the upper row, both pffr and FDboost are able to recover the true underlying effect well (right panel), with FDboost having an advantage for low SNR and low n (left panel). For higher SNR, where FDboost shows less of an improvement than pffr compared to the low-SNR setting, boosting estimates may potentially be further improved by using a higher number of iterations (the number of iterations was limited for this subsection). In the supplementary material, we additionally provide estimates with average reliMSE for a smaller number of observations, visualising the deterioration in estimation performance with decreasing sample size.
FDboost outperforms pffr (lower row of Figure 2) for band surfaces in settings with a lower SNR, whereas for SNR = 10, pffr partly shows better performance. As exemplarily shown in the lower right panels of Figure 2, FDboost is often able to correctly detect the non-zero regions, whereas the typical estimated surface of pffr exhibits larger parts with false positive estimates.
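The data generating process (16) with the multi-modal surface can be sketched as follows (an illustrative Python sketch with assumed values n = 80, D = 40, δ = 0.05 and SNR = 1; the paper's simulations use natural cubic B-splines in R, which we replace here by a simple sine basis for self-containedness):

```python
import numpy as np

rng = np.random.default_rng(1)
n, D, delta, snr = 80, 40, 0.05, 1.0          # illustrative values, not the paper's exact grid
grid = np.linspace(0, 1, D)
ds = grid[1] - grid[0]

# functional covariates x_i(s): random combinations of a simple sine basis
# (standing in for the natural cubic B-spline basis used in the paper)
K = 11
basis = np.sin(np.pi * np.outer(grid, np.arange(1, K + 1)))   # (D, K)
x = rng.standard_normal((n, K)) @ basis.T                     # (n, D)

# multi-modal coefficient surface with the historical constraint s <= t - delta
S, T = np.meshgrid(grid, grid, indexing="ij")
beta = np.sin(10 * np.abs(S - T)) * np.cos(10 * T) * (S <= T - delta)

# response: intercept + Riemann-sum approximation of the historical integral
alpha = np.cos(2 * np.pi * grid)                              # an assumed intercept alpha(t)
lin_pred = x @ beta * ds                                      # (n, D): ∫ x_i(s) beta(s,t) ds
sigma = lin_pred.std() / snr                                  # SNR = sd(linear predictor) / sigma
y = alpha + lin_pred + sigma * rng.standard_normal((n, D))
```

The Riemann-sum step is the discrete analogue of the historical integral in (16); the indicator factor in beta enforces that only past covariate values up to t − δ influence the response.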
[Figure 2 about here.]

Fig. 2: Left panels: comparison of reliMSEs for the estimation of multi-modal surfaces (upper row) as well as band surfaces (lower row) and different settings of the SNR (columns). The x_i(s) were generated on the basis of 11 natural cubic B-splines with 15 knots. Right panels: example estimates of a multi-modal surface (upper row) for 640 observed trajectories and an SNR of 1, and estimates of a band-structured coefficient surface (lower row) for 320 observed trajectories and an SNR of 10, both with the respective average reliMSE.

4.2. Factor-specific and random historical effects

For random historical effects, we adapt the ideas of Scheipl and Greven (2016) and Brockhaus et al. (2016b, Web Appendix C) and generate random coefficient functions β_f(s,t) as linear combinations of cubic P-splines (Eilers and Marx, 1996) for n_subject = 10 factor levels (subjects). The coefficient functions β_f(s,t), f = 1, …, ϕ = 10, are then centred to comply with constraint (12). For factor-specific historical effects, we specify multiples ι(e) of one fixed coefficient function ϖ(s,t) = √s · cos(π√t), with ι(e) being centred coefficients drawn uniformly on an interval symmetric about zero for each factor level e = 1, …, η = 4, allowing for a more systematic examination of estimation accuracy in specific regions of the coefficient function. An additional doubly-varying effect is simulated by multiplying ϖ(s,t) with centred random coefficients drawn from a standard normal distribution.

In a first series of settings (correctly specified case), the data are generated on the basis of the fitted model, including a main historical effect and (i) a time-varying categorical effect as well as a factor-specific historical effect, (ii) a time-varying random effect as well as a random historical effect, (iii) combining (i) and (ii), or (iv) combining (iii) with a doubly-varying historical effect (full model). In a second series of settings, the model is misspecified by fitting a single historical effect, whereas the data are simulated using a main effect and (v) a factor-specific historical effect or (vi) a random historical effect, or alternatively (vii) by generating the data from the full model while fitting the model without the doubly-varying effect.

Results. Whereas the main historical effect for settings (i)-(iv) shows a similar logarithmic reliMSE as in the previous simulation settings in 4.1, the historical effects varying with a categorical covariate show more diverse performances and larger deviations. The factor-specific and random historical effect estimates mostly capture the main features of the true underlying surface, but are not estimated as reliably as the main historical effect.
Estimates for the doubly-varying historical effect are often shrunk almost to zero due to an insufficient number of observations. In settings (v) and (vi), where the true underlying model includes a factor-specific or random historical effect, estimation performance for the main historical effect is equally good when fitting the correct or the misspecified model. For setting (vii), the performance is practically the same for the estimation of the main historical effect. The difference in estimation performance varies more strongly for the factor-specific as well as the random historical effect and, in particular, indicates a better performance of the correctly specified model for high SNR and larger n. The fact that estimation performance is not affected more strongly is likely due to the orthogonality of the omitted effect to the effects included in the model, cf. (12)-(14).

4.3. Uncertainty quantification via bootstrap

In the following, we examine the ability of bootstrap intervals to correctly identify (non-)zero coefficients in the manner of conventional confidence intervals by looking at the inclusion of zero. On the basis of nonparametric bootstrap iterations, we calculate the false negative rate (FNR) and false positive rate (FPR) over the surface for each simulated data set. In addition, the frequencies of false negative (FFN) and false positive estimates (FFP) for each surface point across all data sets are obtained. We present results for a model including only one main historical effect, in addition to a model with main and factor-specific historical effects, for both of which the true coefficient surfaces are partly equal to zero. The true coefficient surface for the main historical effect is β(s,t) = Q_a{sin(|t−s| + 10)·cos(5s)}, and surfaces for factor-specific historical effects are simulated as multiples of ϖ(s,t) = Q_a{φ_{μ,σ}(s)·φ_{μ,σ}(t)}, where Q_a(x) = x·I(x ≥ a) and φ_{μ,σ}(·) is the normal density function with mean μ and variance σ². We additionally investigate the performance of our uncertainty quantification for a model including main and random historical effects, which are simulated as described in section 4.2.

Results. Figure 3 depicts the results for a simple historical effect simulation with SNR = 1, n = 160 and D = 40. Both the FNR and the FPR are small in all but a few cases. When decreasing the SNR to 0.1, the bootstrap approach yields a smaller FPR at the cost of a larger FNR. Considering the FFP and FFN, only a small share of all non-zero surface points reveal a large FFN, and likewise only a small share of all zero surface points reveal a large FFP. Plotting the FFN against the coefficient size indicates that large FFNs only occur for small coefficient values (below 0.6 if SNR = 0.1). The rightmost panel of Figure 3 reveals a strong relationship between the FFP and a smaller distance to non-zero points on the surface, with the FFP mostly small for points not next to a non-zero coefficient.
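The zero-inclusion check behind the FNR and FPR can be sketched as follows (an illustrative Python sketch; `zero_inclusion_rates` and its arguments are our own names, not part of the FDboost implementation):

```python
import numpy as np

def zero_inclusion_rates(boot_surfaces, true_surface, alpha=0.05):
    """Pointwise percentile bootstrap intervals over axis 0; a surface
    point is declared non-zero when its interval excludes zero.
    Returns the false negative rate and false positive rate."""
    lo = np.quantile(boot_surfaces, alpha / 2, axis=0)
    hi = np.quantile(boot_surfaces, 1 - alpha / 2, axis=0)
    detected = (lo > 0) | (hi < 0)            # interval excludes zero
    nonzero = true_surface != 0
    fnr = float(np.mean(~detected[nonzero]))  # missed truly non-zero points
    fpr = float(np.mean(detected[~nonzero]))  # falsely detected zero points
    return fnr, fpr
```

In the simulations above, the bootstrap surfaces come from refitting the boosted model on nonparametric resamples; here any array of shape (B, D, D) stands in for those refits.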
[Figure 3 about here.]

Fig. 3: Results for uncertainty quantification of a simple historical effect and data generated with SNR = 1, n = 160 and D = 40. FNR and FPR for each surface with boxplots over iterations (left panel), as well as FFN and FFP for each surface point over all simulation iterations with boxplots over surface points (second panel). Third panel: frequency of bootstrap intervals including zero plotted against the coefficient size for each truly non-zero coefficient surface point. Fourth panel: frequency of false positive estimates plotted against the minimal distance to a true non-zero point for each zero surface point.

Though the performance depends on the specific surface, the bootstrap approach finds the majority of non-zero coefficient points in simulations for a simple historical model and tends to have an FFN of almost zero. A large FFP only occurs for surface points that are directly adjacent to true non-zero coefficient points.

For a more complex model also including a factor-specific historical effect, the bootstrap approach works well regarding the detection of the truly non-zero surface area. However, it reveals
considerably higher FNR as well as higher FFN, particularly for smaller coefficients of both effect surfaces. In the case of correlated observations, for example given by repeated measurements per subject, we subsample on the level of independent observation units (subjects). In the simulation with a main and a random historical effect, higher frequencies of false positive estimates for the main historical effect occur, which, however, are again located around the true non-zero coefficient area.

In summary, the simulation results suggest that the bootstrap approach does not comply with the chosen confidence level in the manner of conventional confidence intervals, but proves to find most of the truly non-zero surface regions for all simulation settings. Large FFN and FFP mainly occur at the edges of non-zero coefficient areas, such that an interpretation of detected non-zero areas of the surface is still possible as long as exact pixel locations of edges are not taken at face value.
4.4. Parameterisations and step-lengths

In addition to the presented simulations, we investigate the performance of boosting for the different parameterisations introduced in section 2.2 and compare boosting estimates with step-lengths ν = 0.1 and ν = 1. The gradient boosting algorithm is defined for step lengths ν ∈ (0, 1]. In general, it is recommended to set the step length "sufficiently small" (Bühlmann and Hothorn, 2007) for predictive accuracy reasons. A larger step length and, in particular, ν = 1 requires much fewer iteration steps and therefore speeds up the model fit, but may result in a deterioration of prediction performance due to overfitting. Since we are primarily interested in the estimation performance of model components, we investigate whether, and how much, overfitting is a problem in our particular setting.

Results. For the two different parameterisations, performances differ on a relatively small scale, suggesting that the choice of parameterisation can be based on the given research question. In the comparison of step-lengths, there appears to be no clear best choice across all settings. Thus estimation with ν = 1 might be a reasonable alternative to smaller step-lengths, requiring less computing time and memory due to a smaller number of necessary iterations, especially in complex models applied to large data sets.
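The role of the step length ν can be illustrated with a toy component-wise L2-boosting loop using single covariates as base-learners (an illustrative Python sketch, not the mboost/FDboost implementation, whose base-learners are penalised spline surfaces):

```python
import numpy as np

def componentwise_l2_boost(X, y, nu=0.1, n_iter=200):
    """Toy component-wise L2-boosting: per iteration, fit every
    single-covariate least-squares base-learner to the current
    residuals and update only the best one by a step of length nu."""
    n, p = X.shape
    coef = np.zeros(p)
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        fits = X.T @ resid / (X ** 2).sum(axis=0)        # per-covariate LS fit to residuals
        rss = ((resid[:, None] - X * fits) ** 2).sum(axis=0)
        j = int(np.argmin(rss))                          # best-fitting base-learner
        coef[j] += nu * fits[j]                          # shrunken update
        resid -= nu * fits[j] * X[:, j]
    return coef
```

With ν = 1 the same loop takes the full least-squares step of the selected base-learner in each iteration, so far fewer iterations are needed, mirroring the trade-off between speed and overfitting discussed above.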
5. Application to the detection of synchronisation in bioelectrical signals
Gentsch et al. (2014) conducted a study in which participants played a computerised gambling game with real monetary outcomes. During the gambling rounds, Gentsch et al. modified three appraisal settings: (1) Goal conduciveness, which was related to the monetary outcome at the end of each gambling round (gain coded as G = 1, loss as G = 0), (2) Power, which allowed players to change the final outcome if the setting was high power (hp for short, coded as P = 1, else referred to as low power / lp with P = 0), and (3) Control. The control setting was manipulated in blocks in order to change the participant's subjective feeling about her ability to cope with the situation. Before a block with several gambling rounds started, participants were told whether they were going to have high or low power for the majority of upcoming games, which corresponds to high or low control settings (hc coded as C = 1, respectively lc with C = 0). In rounds with high control, for example, the player was told to frequently have high power, thereby trying to induce a subjective feeling of control over the situation, and vice versa for low control. Each participant played numerous gambling rounds for each of the eight appraisal settings, which we also refer to as trials.

Before performing statistical analyses, EEG- as well as EMG-signals are pre-processed (see the supplementary material for further details). After removing the data of one participant due to considerably deviating observations, which imply a defective or displaced sensor, several hundred gambling rounds, each with equally spaced EEG- and EMG-measurements within around 1500 milliseconds, are available for each of the 23 participants. Analogous to previous studies on synchronisation and, in particular, the study of Gentsch et al. (2014), we use aggregated observations for each participant and game condition by averaging the corresponding trials for each time point.
On the one hand, this results in less computing time and makes it feasible to quantify uncertainty in effect estimates via bootstrap; on the other hand, it is motivated by investigations on event-related potentials (ERPs). ERP analysis is a commonly practised method to draw inferences about neuronal activity. Neuronal activity is thought to be time-locked with a delay to a certain stimulus, which is why averaging over a large number of trials is used to cancel out random brain activity and to strengthen those parts of the signal that are commonly observed across all trials (see, e.g., Pfurtscheller and da Silva, 1999; Handy, 2005; Rousselet et al., 2008).

Instead of combining the (spatially correlated) EEG-signals in order to maximise the explanatory power of the analysis, the question of interest rather lies in the dominant influence of certain selected EEG-signals. We fit a model for each EEG-signal of interest (Fz-, FCz-, POz- and
Pz-electrode) in order to determine the direct effect on the facial muscle activity. In order to demonstrate the capability of our approach to handle high-dimensional data sets, we also provide sample code in the repository for fitting a model in which all EEG-signals are potentially included with historical, factor-specific and random historical effects. In the supplementary material, we additionally provide a visualisation of the selection frequencies in this model across boosting iterations.
It is predicted that facial expression is largely driven by efferent brain signals reflecting appraisal processes. We use the following maximal model

Y_il(t) = Σ_{j=1}^{13} h_j(x_il)(t) + ε_il(t),   (17)

for l = 1, …, n_setting = 8, i = 1, …, n_subject = 23, t ∈ T = [0 ms, 1500 ms] and D_i ≡ D = 384 observed time points in T. In (17), Y_il(t) represents a chosen EMG-signal for subject i, game condition l and time point t in the game. The h_j(x_il)(t), or, for short, h_j(t), are thirteen partial effects of covariates x_il, including a time-varying intercept, game condition effects (C, P, G) and EEG-signal effects depending on the selected electrode signal ω_il. Table 1 provides the details on each part of the linear predictor. For the integration limits, we use l(t) = 0 and a lead-parameter u(t) = t − δ, which is meaningful due to restrictions given by the neuro-anatomy of humans and is just below the time lag between EMG and EEG reported by Mima and Hallett (1999). In order to reflect subject-specific variation, we include time-varying random intercepts and subject-specific historical EEG-effects in the model.

Though game condition-specific historical effects may well be subject-specific, the simulations in the previous section suggest that, even if the true model corresponds to the full model, estimation performance is only slightly affected when using a misspecified model without the random factor-specific historical effect h_13(t). As a sensitivity analysis, we also fit the full model including h_13(t) on a finer aggregation of the data, for which we average over fewer trials per subject and thus obtain repeated measurements per subject-game condition combination.
For the historical effects, the estimated coefficient surfaces are depicted in Figure 4 for the EEG-covariate in the form of the 'Fz' electrode (in particular measuring intentional and motivational activities, Teplan (2002)) and the EMG-response signal of the frontalis muscle (which raises the eyebrows).

Table 1: Partial effects in the EMG-EEG-model

  Partial effect h_j(x_il)(t)                        Effect (of)
  h_1(t)  = α(t)                                     Intercept
  h_2(t)  = b_i(t)                                   Subject-specific intercepts
  h_3(t)  = γ_3(t) C_il                              Game condition control
  h_4(t)  = γ_4(t) P_il                              Game condition power
  h_5(t)  = γ_5(t) G_il                              Game condition goal conduciveness
  h_6(t)  = γ_6(t) C_il P_il                         Interaction of control and power
  h_7(t)  = γ_7(t) C_il G_il                         Interaction of control and goal cond.
  h_8(t)  = γ_8(t) P_il G_il                         Interaction of power and goal cond.
  h_9(t)  = γ_9(t) C_il P_il G_il                    Interaction of all game conditions
  h_10(t) = ∫₀^{t−δ} ω_il(s) β(s,t) ds               EEG-signal
  h_11(t) = ∫₀^{t−δ} ω_il(s) β_l(s,t) ds             EEG-signal (game condition-specific)
  h_12(t) = ∫₀^{t−δ} ω_il(s) b_i(s,t) ds             EEG-signal (subject-specific)
  h_13(t) = ∫₀^{t−δ} ω_il(s) b_{i,l}(s,t) ds         EEG-signal (subj.- and game cond.-specific)

The lower panel in these figures depicts the average EEG-signal per game condition, demeaned per time point by the overall mean and with negative or positive values highlighted in blue or red, respectively. Two further panels (left, centre) for the EMG-signal show the overall mean, the predictions with and without the historical effects (left), as well as the difference between these predictions (centre). For the predictions, the average EEG-signal per game condition was used. Additionally, corresponding bootstrap results for uncertainty assessment are incorporated in the figures by different degrees of transparency related to different pointwise bootstrap intervals BI_α = [q_{α/2}, q_{1−α/2}], with q_a the a%-bootstrap quantile. Surface points are coloured with the corresponding coefficient value and are less transparent if the specified bootstrap interval does not contain the value zero.

Figure 4 shows the sum of the estimated coefficient surfaces of the main and game condition-specific historical effects for the four high control settings (the other four surfaces are included in the online appendix). In all four effect surfaces a similar pattern can be found, which reflects the structure of the main historical effect.
The coefficients near the diagonal reveal a positive sign at early s, whereas the upper left as well as the upper right of the surface, visually separated by a thick black contour line, are estimated with a negative sign. In contrast to the upper left negative coefficient area, which is mostly indicated to be not different from zero by the bootstrap, the upper right negative coefficient area is indicated to be non-zero for all eight conditions, at least to some extent. The positive area in between those two negative subareas is mostly estimated to be either zero or non-zero but with relatively small coefficient values. The positive effect near the diagonal is estimated to have the largest values for hc settings in combination with hp / loss and lp / gain situations, and is found to be non-zero by the bootstrap only for the latter scenario. This very strong short-term synchronisation of EEG- and EMG-signals seems very reasonable from a theoretical point of view, as facial reactions including the raising of the eyebrows are usually brief in nature and are linked to appraisals such as novelty, which is consistent in the hc / lp / gain case with low power not being expected in a high control setting (Scherer, 2009).

The estimated effect can, on the one hand, be interpreted on the subject level. A person with a higher EEG-signal at a given time point s, for example, will on average show a higher EMG-signal (i.e. stronger muscle activity) at a later time point t, given the preceding EEG-signal and game condition remain the same. On the other hand, effects can be explained by relating the demeaned average EEG-signal for one game condition and the corresponding coefficients to the changes in the average EMG-signal, which is illustrated by the hc / lp / gain setting in Figure 4.
As EEG values related to this game condition are on average above the overall mean EEG values for s from about 300 ms onwards, the EEG seems to have an increasing effect on subsequent EMG values and thus muscle activity, with the effect lasting for a substantial period.

In theory, muscle activity should be traceable to brain signals. The results therefore indicate that brain activity measured at the Fz-electrode only contributes a relatively small amount to explaining the movement of the eyebrows (difference panels on the left of each plot in Figure 4). However, for the game condition hc / lp / gain, the model explains a considerable amount of EMG-activity (particularly visible in the difference plot of EMG predictions).

When reparameterising the factor-specific historical effects without a historical main effect, when boosting with step-length ν = 1, as well as in the full model with more finely aggregated data, the estimated effects are similar to the reported ones. Further results for the application are given in the online appendix, including results for the scalar covariates.

Gentsch et al. (2014) analysed EEG- and EMG-signals separately and made statements regarding differences in game conditions for one of the signals at a time. Although this and other similar strategies may yield results on significant changes in one signal for different study settings, no statement on the association of the two signals can be made. In contrast, investigating the emotion components data with our proposed approach facilitates the modelling of the synchronisation of EMG- and EEG-signals in the first place, and additionally allows the simultaneous EEG- and EMG-analysis to differ for influence factors given by the study design. Our method is therefore able to recreate parts of the theoretical emotion components model and leads to new insights on the underlying synchronisation process.
Specifically, we found associations between the EEG- and EMG-signals that are time-localised (without the need to prespecify time lags) and that differ between experimental settings, with the setting hc / lp / gain showing the clearest association.
6. Discussion
The focus of this paper is the development of a regression framework for the synchronisation analysis of bioelectrical signal data. Bioelectrical signals like EEG or EMG are recorded in many different research areas, for example in neuroscience or cognitive neuropsychology, where the goal is to develop an understanding of synchronisation processes in emotion episodes. In contrast to previous approaches, which are mostly based on coherence, cross-correlation or similar concepts (see, e.g., Mima and Hallett, 1999; Brown, 2000; Grosse et al., 2002), we use a function-on-function regression model (see, e.g., Morris, 2015) with factor-specific historical effects. Our model extends the simple historical model (Malfait and Ramsay, 2003; Harezlak et al., 2007; Brockhaus et al., 2016b) by factor-specific and / or random historical effects. As far as we know, no methods other than FDboost allow historical effects to vary with other covariates. We develop constraints to make the resulting estimates both interpretable and identifiable. This flexible class of function-on-function regression models is implemented in the R package FDboost. Using the component-wise gradient boosting approach of Brockhaus et al. (2015, 2016b) for estimation, our approach can deal with high-dimensional data, even p > n settings, and includes variable selection. The algorithm is able to recover different effect surfaces, including relationships assumed in time series approaches, and allows for potentially time-varying associations. The quality of the estimates is comparable to those of the function pffr of the R package refund for the special cases of function-on-function regression where pffr is applicable.

A bootstrap can be employed to assess the variability of the boosted estimates. While bootstrap intervals, due to the shrinkage, do not constitute confidence intervals with proper coverage, simulations show that the bootstrap approach is able to recover areas with non-zero effects very well and only shows larger FPR and FNR at the edges of true non-zero effect surfaces. A better uncertainty quantification would be a relevant avenue for future developments.

While we do not focus on this feature here, our approach can also model characteristics of the conditional response distribution other than the mean, such as the median or a quantile. A more complex yet interesting class of models would be obtained by combining functional regression models with generalised additive models for location, scale and shape, as done for scalar responses by Brockhaus et al. (2016a).

For the emotion components data, our model contributes to the understanding of the componential theory by estimating a functional relationship between the EEG and EMG signals without having to prespecify a certain time lag between these two signals. In addition, our proposed extension for historical models allows for appraisal-specific investigations of the synchronisation processes of emotion components.
Acknowledgements
We thank Fabian Scheipl for his help and useful comments. Sonja Greven, Sarah Brockhaus and David Rügamer acknowledge funding by Emmy Noether grant GR 3793/1-1 from the German Research Foundation. Kornelia Gentsch and Klaus Scherer were funded by an ERC Advanced Grant in the European Community's 7th Framework Programme under grant agreement No. 230331-PROPEREMO to Klaus Scherer, and by the National Center of Competence in Research (NCCR) Affective Sciences financed by the Swiss National Science Foundation (No. 51NF40-104897) and hosted by the University of Geneva.
References
Binder, H., Porzelius, C. and Schumacher, M. (2009) Rank-based p-values for sparse high-dimensional risk prediction models fitted by componentwise boosting.
Tech. Rep. FDM-PreprintNr.101 , University of Freiburg, Germany.Bortel, R. and Sovka, P. (2006) EEG-EMG coherence enhancement.
Signal Processing, 1737–1751.
Brockhaus, S., Fuest, A., Mayr, A. and Greven, S. (2016a) Signal regression models for location, scale and shape with an application to stock returns. URL: https://arxiv.org/abs/1605.04281. Submitted.
Brockhaus, S., Melcher, M., Leisch, F. and Greven, S. (2016b) Boosting flexible functional regression models with a high number of functional historical effects. Statistics and Computing. To appear.
Brockhaus, S. and Rügamer, D. (2016) FDboost: Boosting Functional Regression Models. URL: http://CRAN.R-project.org/package=FDboost. R package version 0.2-0.
Brockhaus, S., Scheipl, F., Hothorn, T. and Greven, S. (2015) The functional linear array model. Statistical Modelling, 279–300.
Brown, P. (2000) Cortical drives to human muscle: the piper and related rhythms. Progress in Neurobiology, 97–108.
Brumback, B. A., Ruppert, D. and Wand, M. P. (1999) Comment on "Variable selection and function estimation in additive nonparametric regression using a data-based prior". Journal of the American Statistical Association, 794–797.
Bühlmann, P. and Hothorn, T. (2007) Boosting algorithms: Regularization, prediction and model fitting (with discussion). Statistical Science, 477–505.
Diab, A., Hassan, M., Boudaoud, S., Marque, C. and Karlsson, B. (2013) Nonlinear estimation of coupling and directionality between signals: Application to uterine EMG propagation. 4366–4369. IEEE.
Eilers, P. H. C. and Marx, B. D. (1996) Flexible smoothing with B-splines and penalties. Statistical Science, 89–121.
Gentsch, K., Grandjean, D. and Scherer, K. R. (2014) Coherence explored between emotion components: Evidence from event-related potentials and facial electromyography. Biological Psychology, 70–81.
Gervini, D. (2015) Dynamic retrospective regression for functional data. Technometrics, 26–34.
Grandjean, D. and Scherer, K. (2009) Synchronization (and emotion). In The Oxford Companion to Emotion and the Affective Sciences (eds. D. Sander and K. Scherer).
Grosse, P., Cassidy, M. and Brown, P. (2002) EEG–EMG, MEG–EMG and EMG–EMG frequency analysis: physiological principles and clinical applications. Clinical Neurophysiology, 1523–1531.
Handy, T. (2005) Event-related Potentials: A Methods Handbook. A Bradford Book. MIT Press.
Harezlak, J., Coull, B. A., Laird, N. M., Magari, S. R. and Christiani, D. C. (2007) Penalized solutions to functional regression problems. Computational Statistics & Data Analysis, 4911–4925.
Hashimoto, Y., Ushiba, J., Kimura, A., Liu, M. and Tomita, Y. (2010) Correlation between EEG–EMG coherence during isometric contraction and its imaginary execution. Acta Neurobiol Exp (Wars), 76–85.
Hofner, B., Hothorn, T., Kneib, T. and Schmid, M. (2011) A framework for unbiased model selection based on boosting. Journal of Computational and Graphical Statistics, 956–971.
Hollenstein, T. and Crowell, S. (2014) Special issue: Whither concordance? Autonomic psychophysiology and the behaviors and cognitions of emotional responsivity. Vol. 98, 1–94.
Hothorn, T., Bühlmann, P., Kneib, T., Schmid, M. and Hofner, B. (2016) mboost: Model-Based Boosting. URL: http://CRAN.R-project.org/package=mboost. R package version 2.6-0.
Huang, L., Scheipl, F., Goldsmith, J., Gellar, J., Harezlak, J., McLean, M. W., Swihart, B., Xiao, L., Crainiceanu, C. and Reiss, P. (2015) refund: Regression with Functional Data. URL: http://CRAN.R-project.org/package=refund. R package version 0.1-13.
Kang, J. M., Yoo, T. and Kim, H. C. (2006) A wrist-worn integrated health monitoring instrument with a tele-reporting device for telemedicine and telecare. IEEE Transactions on Instrumentation and Measurement, 1655–1661.
Kaniusas, E. (2012) Fundamentals of Biosignals, 1–26. Springer Berlin Heidelberg.
Kneib, T. (2013) Beyond mean regression. Statistical Modelling, 275–303.
Malfait, N. and Ramsay, J. O. (2003) The historical functional linear model. Canadian Journal of Statistics, 115–128.
Meinshausen, N. and Bühlmann, P. (2010) Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 417–473.
Mima, T. and Hallett, M. (1999) Electroencephalographic analysis of cortico-muscular coherence: reference effect, volume conduction and generator mechanism. Clinical Neurophysiology, 1892–1899.
Mima, T., Matsuoka, T. and Hallett, M. (2000a) Functional coupling of human right and left cortical motor areas demonstrated with partial coherence analysis. Neuroscience Letters, 93–96.
Mima, T., Steger, J., Schulman, A. E., Gerloff, C. and Hallett, M. (2000b) Electroencephalographic measurement of motor cortex control of muscle activity in humans. Clinical Neurophysiology, 326–337.
Morris, J. S. (2015) Functional regression. Annual Review of Statistics and Its Application, 321–359.
Ozaki, T. (2012) Time Series Modeling of Neuroscience Data. Chapman & Hall/CRC Interdisciplinary Statistics. CRC Press.
Pawitan, Y. (2005) Coherence between time series. In Encyclopedia of Biostatistics, vol. 2. Wiley Online Library.
Pfurtscheller, G. and da Silva, F. L. (1999) Event-related EEG/MEG synchronization and desynchronization: basic principles. Clinical Neurophysiology, 1842–1857.
Quiroga, R. Q., Kreuz, T. and Grassberger, P. (2002) Event synchronization: a simple and fast method to measure synchronicity and time delay patterns. Physical Review E, 041904.
Rinn, W. E. (1984) The neuropsychology of facial expression: A review of the neurological and psychological mechanisms for producing facial expressions. Psychological Bulletin, 52.
Rousselet, G. A., Husk, J. S., Bennett, P. J. and Sekuler, A. B. (2008) Time course and robustness of ERP object and face differences. Journal of Vision, 3.
Ruppert, D., Wand, M. P. and Carroll, R. J. (2003) Semiparametric Regression. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge and New York: Cambridge University Press.
Scheipl, F. and Greven, S. (2016) Identifiability in penalized function-on-function regression models. Electronic Journal of Statistics, 495–526.
Scheipl, F., Staicu, A.-M. and Greven, S. (2015) Functional additive mixed models. Journal of Computational and Graphical Statistics, 477–501.
Cognition and Emotion, 1307–1351.
Semmlow, J. L. and Griffel, B. (2014) Biosignal and Medical Image Processing. CRC Press.
Teplan, M. (2002) Fundamentals of EEG measurement. Measurement Science Review, 1–11.
Wood, S. N. (2006) Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC, Boca Raton, Florida.
Zhang, T. and Yu, B. (2005) Boosting with early stopping: Convergence and consistency.
The Annals of Statistics, 1538–1579.

[Figure 4: four coefficient-surface panels (hc/hp/gain, hc/hp/loss, hc/lp/gain, hc/lp/loss); x-axis: time point of influence s of EEG [ms]; y-axis: time point being influenced t in EMG [ms]; image not recoverable from text.]

Fig. 4: Estimated coefficient surfaces for the model with EEG covariate 'Fz' (plot of average signals per game condition at bottom, with negative and positive values highlighted blue and red, respectively; signals are demeaned per time point by the overall mean), all four high control settings, and the EMG response signal of the frontalis muscle (left panels: overall mean (1) in grey, prediction without historical effects (2) in green, prediction with historical effects (3) using the average EEG signal per game condition in black; center panel: dashed line as difference between (1) and (2), solid line as difference between (1) and (3)). Surfaces correspond to the estimated main historical effect plus the game-condition-specific historical effect. Different degrees of transparency in the coefficient plot indicate surface points whose (1 − niveau)-bootstrap intervals do not contain the value zero. To obtain a reasonably sized image, estimated effects are visualised on a 40 × …