A comparison of oscillatory characteristics in covert speech and speech perception
A PREPRINT
Jae Moon
Institute of Biomedical Engineering, University of Toronto
Bloorview Research Institute, Holland Bloorview Kids Rehabilitation Hospital
[email protected]
Silvia Orlandi
Bloorview Research Institute, Holland Bloorview Kids Rehabilitation Hospital
[email protected]
Tom Chau
Institute of Biomedical Engineering, University of Toronto
Bloorview Research Institute, Holland Bloorview Kids Rehabilitation Hospital
[email protected]
September 8, 2020

ABSTRACT
Covert speech, the silent production of words in the mind, has been studied increasingly to understand and decode thoughts. This task has often been compared to speech perception as it brings about similar topographical activation patterns in common brain areas. In studies of speech comprehension, neural oscillations are thought to play a key role in the sampling of speech at varying temporal scales. However, very little is known about the role of oscillations in covert speech. In this study, we aimed to determine to what extent each oscillatory frequency band is used to process words in covert speech and speech perception tasks. Secondly, we asked whether the θ and γ activity in the two tasks are related through phase-amplitude coupling (PAC). First, a continuous wavelet transform was performed on epoched signals, and subsequently two-tailed t-tests between two classes were conducted to determine statistical distinctions in frequency and time. While the perception task dynamically used all frequencies with more prominent θ and γ activity, the covert task favoured higher frequencies, with significantly higher γ activity than perception. Moreover, the perception condition produced significant θ-γ PAC, suggesting a linkage of syllabic and phonological sampling. Although this coupling was found to be suppressed in the covert condition, we found significant pseudo-coupling between perception θ and covert speech γ. We report that covert speech processing is largely conducted by higher frequencies, and that the γ- and θ-bands may function similarly and differently across tasks, respectively. This study is the first to characterize covert speech in terms of neural oscillatory engagement. Future studies are directed to explore oscillatory characteristics and inter-task relationships with a more diverse vocabulary.

Covert speech (CS), the silent production of words in one's mind, is a fundamental trait in mental cognition (Alderson-Day and Fernyhough, 2012; Perrone-Bertolotti et al., 2014).
It is referred to as a linguistic form of thought and linked to a wide range of neurocognitive functions, such as reading, writing, planning, and memory (Alderson-Day et al., 2018; Morin et al., 2011, 2018). Due to its ubiquity, many researchers in the realm of brain-computer interfaces (BCIs) have been assessing this task to restore speech in motor-impaired individuals by decoding thoughts (DaSalla et al., 2009; Idrees and Farooq, 2016; Deng et al., 2010). However, CS BCIs are notorious for their difficulty in training, often requiring individuals to mentally rehearse each speech item numerous times for the system to learn a reliable control signal. Fortunately, research in neurolinguistics has linked CS with the task of speech perception (SP) due to theories and evidence describing the parallel nature of top-down and bottom-up language pathways. For instance, functional magnetic resonance imaging (fMRI) studies have revealed that CS and SP activate common brain regions along the linguistic processing pathway (Okada and Hickok, 2006; Shergill et al., 2002; Skipper et al., 2005; van de Ven et al., 2009; Venezia et al., 2016), and time-domain methods report that the pattern of activation in these brain regions is similar (Tian and Poeppel, 2010, 2012). These studies suggest that CS signals could be modeled based on SP signals and that a CS BCI could be trained through the passive perception of speech. Thus, being able to model CS from SP would help hurdle the fatigue barrier in CS BCIs and enhance their translational potential. However, in order to achieve this modelling, one must understand how CS and SP tasks comparatively utilize neural oscillations, the primary mechanism of information transmission in the brain (Buzsáki and Draguhn, 2004; Buzsáki et al., 2004; Morillon and Schroeder, 2015; Luo and Poeppel, 2007; Ding et al., 2017a).

Numerous studies support a bi-directional linkage between the perception and production systems of speech (Buchsbaum et al., 2001; Hickok and Poeppel, 2007, 2004; Poeppel, 2014; Tian and Poeppel, 2010; Okada and Hickok, 2006; Shergill et al., 2002; Skipper et al., 2005; van de Ven et al., 2009; Venezia et al., 2016).
It is thought that SP initiates in the auditory regions for direct processing of ongoing speech and ultimately maps the speech units onto an articulatory network via a sensorimotor interface (Hickok, 2014; Hickok and Poeppel, 2004). Speech production, on the other hand, initiates as an articulatory motor expression which, through the same sensorimotor interface, is transformed into auditory sensory targets in the temporal lobe (Hickok, 2014; Tian and Poeppel, 2012). Although the directionality of the two tasks may be opposed, they have been consistently shown to draw activations from common brain areas. Namely, they seem to converge and produce similar activation patterns largely in phonological networks where the fundamental contrastive speech units (phonemes) are realized (Tian and Poeppel, 2010; Hickok et al., 2011; Hickok and Poeppel, 2004; Hickok et al., 2009; Okada and Hickok, 2006; Okada et al., 2018). These studies critically highlight that CS and SP recruit activity from common brain regions, which likely subserve common functions across tasks.

These source localization studies invite the question: do CS and SP utilize frequency bands in a similar manner? In the brain, information transmission is characterized at various temporal and spatial scales through neural oscillations in the δ- (1-2.5Hz), θ- (4-7Hz), α- (8-11Hz), β- (13-30Hz), and γ- (30-60Hz) bands (Gross et al., 2013; Luo and Poeppel, 2007; Giraud et al., 2007; Di Liberto et al., 2015; Giraud and Poeppel, 2012; Poeppel and Assaneo, 2020). In SP, it has been established that lower frequency activity (e.g., θ) detects syllabic quantities through tracking of the speech envelope, whereas higher frequency activity (e.g., γ) parses temporally fine units of speech such as phonemes.
In addition, the fluctuating θ phase has been found to modulate the amplitude bursts of the γ-band through phase-amplitude coupling (PAC), imbuing a rhythmicity to the signal that enables the coordinated sampling of syllabic and phonological speech items (Hyafil et al., 2015; Assaneo and Poeppel, 2018; Hermes et al., 2014). In speech production, active sensing, a theorized predictive processing mechanism via a motor sampling of sensory faculties, would suggest that overt speech and its variants use neural oscillations similarly to SP, since acts of speech production effectively produce self-generated speech noises (Morillon and Schroeder, 2015). For instance, differential γ-band augmentations have been observed in phonological processing regions during overt and covert phoneme repetition tasks, suggesting a possible common role of γ activity with SP (Fukuda et al., 2010; Toyoda et al., 2014). More generally, dorsal stream motor areas have been found to have their own preferred rhythm of speech production (Restle et al., 2012; Assaneo and Poeppel, 2018; Poeppel and Assaneo, 2020), suggesting that the quasi-rhythmicity of the vocal tract articulators generates the cadence of the speech envelope which, in turn, improves speech intelligibility during comprehension (Giraud et al., 2000; Boemio et al., 2005; Trouvain, 2007). Hence, it is increasingly possible that neural oscillations in SP and CS serve similar functions.

However, no studies to date have directly investigated the relative oscillatory contributions and differences in CS and SP. The additive oscillatory components together form the broader characteristics of the signal, and thus efforts to produce models of CS based on signals from SP must first delineate the oscillatory differences across tasks. One such method of understanding how frequency characteristics differ across classes is through a t-test of the complex-valued coefficients of a time-frequency transform.
This method is referred to as a studentized continuous wavelet transform (t-CWT) and was first introduced by Bostanov (2004) with the intention of improving feature extraction methods for classification in BCI paradigms. Since then, it has been used in the classification of motor imagery (Darvishi and Al-Ani, 2007; Hsu et al., 2007) and the analysis of ERPs in real and simulated EEG data (Real and Kotchoubey, 2014). According to the latter study, distinguishing between ERPs with t-CWT produced high specificity and sensitivity under various signal-to-noise ratios in comparison to common peak detection methods. Studentizing time-frequency information in this manner allows for a direct statistical comparison of time-frequency information between two classes in order to detect frequency indices which are significantly different. Furthermore, compared to the discrete wavelet transform or the fast Fourier transform, the CWT is an ideal candidate for time series analysis due to its more fine-grained resolution and temporal stability with respect to frequency (Kimata et al., 2018). It is therefore sensible to implement t-CWT with a recording modality with strong temporal resolution such as EEG.

In the present study, we asked whether t-CWT can identify the frequency bands which are used to distinguish words within CS and SP, comparatively. Considering that CS lacks overt vocalization and thus salient self-stimulation, we hypothesized that lower frequency elements would be overshadowed by high frequency activity, such as β and γ. Subsequently, we asked whether the more pertinent oscillations of CS perform similar functions to those of SP, namely by testing for θ-γ PAC. However, in the very likely scenario that the θ-band does not play a major linguistically relevant role in CS, we tested whether CS's γ activity is 'pseudo-coordinated' (or pseudo-coupled) to SP's θ activity. Such a coupling would indicate that CS's γ-band response has a rhythmicity that is related to the putative tracking of syllabic quantities by SP's θ activity, thereby asserting that the γ-band performs a similar function across tasks. Therefore, we hypothesized that the roles of oscillations in SP and CS would be similar and different in the γ- and θ-bands, respectively. The remainder of the paper is organized as follows: in Section 2, we provide an overview of previous work on neural oscillations in CS and SP. In Sections 3 and 4, we provide the details of the study methodology and describe the results. In Section 5 we discuss the findings of the study, and we conclude the paper in Section 6.

Over the past two decades, neural oscillations have proven to be a window to understanding a wide variety of cognitive processes. The most important function of neural oscillations is to allow the brain to operate at multiple temporal and spatial scales such that information can be integrated into a holistic percept (Buzsáki and Draguhn, 2004; Buzsáki et al., 2004; Morillon and Schroeder, 2015).
Indeed, oscillations are thought to provide the most energy-efficient physical mechanism for synchrony and temporal coordination (Mirollo and Strogatz, 1990). In the study of language processing, the induction of such oscillations (e.g., δ, θ, α, β, γ) contributes to synchronized activities across spatially segregated neuronal assemblies for the coordinated processing of speech units at varying scales (Bastiaansen and Hagoort, 2006; Weiss and Mueller, 2003).

A multitude of studies have investigated the role of oscillations during SP, and a commonly synthesized interpretation from the literature is that SP multiplexes neuronal oscillations; that is, SP dynamically samples incoming acoustic information at multiple time scales simultaneously (Gross et al., 2013; Luo and Poeppel, 2007; Ding et al., 2017a; Giraud et al., 2007; Poeppel and Assaneo, 2020). A general rule of thumb is that the higher the frequency of oscillation, the finer the detail to which speech information is sampled. For instance, δ oscillations (1-2.5Hz) have been implicated in the processing of words, phrases, and sentences (Giraud et al., 2007; Morillon et al., 2012; Doelling et al., 2014; Ding et al., 2015). θ activity (4-7Hz) has been found to be critically sensitive to syllabic modulations, namely by tracking the ongoing speech envelope that contains a 4.5Hz syllabic speech rate (Luo and Poeppel, 2007; Giraud et al., 2007; Ghitza, 2013; Doelling et al., 2014). In contrast, high frequency γ activity (30-60Hz) has been found to index processing at the phonemic level, as intracortical studies have located a 'phonotopic' map of phonemes in regions of the superior temporal gyrus producing differential γ-band augmentations to phonemes (Chang et al., 2010; Moses et al., 2016; Pasley et al., 2012).
Although β activity (13-30Hz) has been commonly associated with motor-related potentials in motor imagery studies, in language this oscillation is thought to play a simultaneous role alongside θ and γ activity by conjoining phonological units into a broader syllabary, binding the activity of temporally segregated neuronal assemblies (Bastiaansen et al., 2010; Weiss and Mueller, 2003, 2012).

Of these oscillations, the most crucial seem to be the θ and γ bands. Evidence for this comes from numerous observations that the phase and amplitude of these frequency bands are coupled in order to synchronize the detection of syllabic boundaries and the parsing of phonemes (Lizarazu et al., 2019; Mai et al., 2016; Gross et al., 2013; Morillon et al., 2012). In other words, θ activity samples the input spike trains (induced by the speech waveform) to generate basic units and time references of speech for subsequent, finer-detailed processing by γ activity (Giraud and Poeppel, 2012). The purpose behind this phase-amplitude coupling (PAC) may be to temporally localize γ processing power to more descriptive parts of syllabic sound patterns that constitute reference time frames (Hyafil et al., 2015).

Thus, SP multiplexes in relevant frequency bands in order to detect incoming speech and parse it for speech comprehension (Pickering and Garrod, 2013). Although the main oscillatory contributions during SP have been fleshed out, there are vastly fewer studies investigating the role of oscillations in CS. The main feature that distinguishes CS from SP is corollary discharge. Corollary discharge is, in essence, a neural sensory prediction of the consequences of self-generated movements (Wolpert and Ghahramani, 2000; Cullen, 2004) and, in the case of speech, it is regarded as an auditory prediction of self-generated speech noises (Ford and Mathalon, 2005, 2019; Jack et al., 2019; Scott, 2013).
The sequential estimation mechanism of Tian and Poeppel (2012) theorizes that the principal reason why CS produces activation patterns similar to SP in temporal regions of the brain (Tian and Poeppel, 2010) is an auditory prediction of imagined articulation. Indeed, this corollary discharge during CS has been shown to be temporally precise and content-specific (Jack et al., 2019), sensory in nature (Scott, 2012, 2013), and to cancel out self-generated sounds (Okada et al., 2018). Therefore, CS and SP are bridged by a common sensory goal in the auditory domain. This fact invites the question: does CS use oscillations in a similar manner to SP?

The corollary discharge during speech production has been linked to a fronto-temporal γ-band synchrony (Chen et al., 2011). Moreover, investigations into auditory verbal hallucinations (AVH) have revealed that schizophrenic individuals exhibit significantly suppressed fronto-temporal γ synchrony, suggesting that an aberrant corollary discharge is responsible for thoughts manifesting as phantom perceptions (Uhlhaas et al., 2006; Uhlhaas and Singer, 2010; Gallinat et al., 2004; Ford and Mathalon, 2005, 2019; Mathalon and Ford, 2008). More relevant to neurolinguistics, intracranial recording studies of overt and covert phoneme repetition tasks have observed differential γ-band augmentations in temporal brain regions thought to be responsible for phonological processing (Fukuda et al., 2010; Toyoda et al., 2014). As the purpose of corollary discharge is to match the sensory consequences of self-generated actions, these results invite the hypothesis that the corollary discharge produced during CS, reflected in its γ-band response, may be phonological in nature, similar to SP.

Although these studies provide indirect evidence for a common γ-band function across tasks, the same may not be the case for θ. Hermes et al. (2014) showed that θ-γ PAC is suppressed during CS, with θ power being anti-correlated with high frequency activity. In contrast to findings of increased PAC in SP studies, the authors surmised that when there is no external input, brain areas may need to downregulate θ activity in order to allow local neuronal processing (Schroeder and Lakatos, 2009).
Therefore, it is increasingly possible that the θ-band plays a role alternative to the syllabic chunking seen in SP.

EEG is an appropriate modality in which to measure and analyze such brain oscillations due to its fast temporal resolution and ease of setup. In speech processing studies, this modality has been frequently used to characterize oscillations in phase entrainment (Zoefel and VanRullen, 2016), processing asymmetry (Morillon et al., 2012), speech intelligibility (Onojima et al., 2017), semantic evaluation of speech (Shahin et al., 2009), categorical processing (Bidelman, 2015), and oscillatory abnormalities in schizophrenic individuals (Uhlhaas and Singer, 2010; Ford and Mathalon, 2005). Although the spatial resolution of EEG is poor due to volume conduction effects, its strong temporal resolution presents this modality as an optimal medium for tracking the fast temporal dynamics of ongoing neural oscillations.

In summary, the findings outlined above demonstrate that certain oscillations may be functionally correlated or divergent in SP and CS. However, no studies have yet determined the relative oscillatory engagements across the two tasks. Hence, the present study used EEG to demonstrate how CS utilizes oscillations relative to SP and whether the most pertinent frequency bands (i.e., θ, γ) perform similar functions across tasks.

Ten adults between the ages of 20 and 40 without disabilities or known health conditions were recruited for this study (7 female, age X +/- Y; 3 male, age X +/- Y). All participants were right-handed to ensure consistency in the hemispheric dominance of neurolinguistic processing. Furthermore, all participants were native English speakers (i.e., English as a first language). The research ethics board of the Bloorview Research Institute approved this study. Participants provided written informed consent.

Participants donned a 128-electrode ActiCap EEG cap. Of these 128 channels, 64 were utilized (Fig. 1), with the ground electrode at AFz and the reference electrode at FCz. Channels Fp1 and Fp2 were used as ocular artifact detectors. Electrode coverage included the frontal, temporal, and temporo-parietal areas of both hemispheres, including midline components (Fz, Cz, CPz, and Pz). Data were sampled at 1000Hz and collected through BrainVision Recorder.
Each participant was seated comfortably approximately 50cm from the computer screen, which had a refresh rate of 75Hz. The screen was positioned in the central field of vision, and the light in the data collection room was turned off prior to beginning the computer task to minimize peripheral vision distractions. Prior to the experiment, prompted by a constant green cross, baseline signals (i.e., neural activity at rest) were recorded for a minute. During the session, SP and CS trial pairs were presented sequentially (Fig. 2). First, a blank screen with a duration jitter between 1-2 seconds was presented, followed by a green cross for 2 seconds. During this time, the audio of the speech token ('Blue' or 'Orange') was presented once. Succeeding this was another blank screen for 2 seconds, followed by a red cross for 2 seconds. Participants were instructed to covertly rehearse the speech token that they had just heard. Therefore, every SP trial was succeeded by a CS trial with the same word. For the rest trial, a letter R cued the participants to refrain from the speech task and fixate on a black cross held on the screen for 5 seconds. Each block consisted of 10 SP-CS trial pairs and 10 rest trials, and each session consisted of 5 blocks. Therefore, each session involved 50 trials/class, and participants underwent two sessions, each at least 2 days apart and at roughly the same times of the day. Each session took approximately one hour.

Figure 1: 64-channel ActiCap wet EEG montage.

Figure 2: The experimental protocol for SP, CS, and rest. SP and rest were preceded by a jitter of 1-2 seconds and followed by either the audio or the symbol R. In the former case, this was succeeded by a 2-second blank screen and a 2-second window for CS, demarcated by the red cross. In the latter case, the 2-second blank screen was followed by a black cross signalling rest for 5 seconds.
The two speech items ('Blue', 'Orange') were chosen as they differ in their number of syllables and phonemes as well as in their place and manner of articulation. As such, they were expected to engender substantially different patterns of neural activity suitable for discrimination. The rationale was that these variations would encourage the activation of different motor, somatosensory, and auditory neural representations, thereby enhancing signal discriminability. Speech stimuli were generated by the Google Cloud Text-to-Speech platform and presented at an approximate rate of 150 words per minute, which is within the range of natural speaking rates (Giraud et al., 2007; Luo and Poeppel, 2007). Phoneme models were generated through the Montreal Forced Aligner (McAuliffe et al., 2017).
Raw data were analyzed in EEGLAB (Delorme and Makeig, 2004). A 4th-order zero-phase high-pass Butterworth filter with a cutoff frequency of 1Hz was applied to remove baseline drift. Subsequently, a 4th-order zero-phase low-pass Butterworth filter with a cutoff frequency of 60Hz was applied to remove high frequency noise, as well as the high-γ band. The PREP pipeline (Bigdely-Shamlo et al., 2015) was then applied to remove line noise, detect noisy or outlier channels, and interpolate bad channels. Eye movement artifacts and muscular artifacts were removed in two separate steps using blind source separation through the EEGLAB plugin "Automatic Artifact Removal toolbox". Following preprocessing, a spline Laplacian was applied to establish local relationships between surface potentials and the underlying source activity (Babiloni et al., 2001). Data were downsampled to 256Hz prior to epoching. Data from sessions 1 and 2 were combined. There were a total of 5 classes: SP Blue (SPB), SP Orange (SPO), CS Blue (CSB), CS Orange (CSO), and rest (RST).

To determine how oscillations drive the distinction of words in CS and SP, we employed a t-CWT routine in order to find the frequency and time indices at which two sets of wavelet coefficients were significantly different. First, each epoched trial was zero-padded with 12 samples at the beginning and end. A CWT was then conducted on each signal, yielding a 55 frequency x 536 time sample matrix (frequencies above 60Hz were removed). The CWT was performed using equation (1):

W(s,t) = \frac{1}{\sqrt{s}} \int_{-\infty}^{\infty} f(\tau)\, \psi^{*}\!\left(\frac{\tau - t}{s}\right) d\tau    (1)

where W(s,t) represents the wavelet coefficients, s denotes the scale (or frequency), t denotes the time shift, and ψ is the wavelet function, which has zero mean. The CWT is thus a sort of template matching computation whereby the cross-covariance between the signal and the mother wavelet (here, a Morlet wavelet) is measured by shifting the latter back and forth at dilated and constricted scales.
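As a concrete illustration, equation (1) can be discretized with a complex Morlet mother wavelet. This is a minimal NumPy sketch, not the toolbox routine used in the study; the scale-to-frequency mapping assumes a Morlet centre-frequency parameter of w0 = 6, and the signal is a synthetic test tone.

```python
import numpy as np

def morlet_cwt(x, fs, freqs, w0=6.0):
    """Discretized Eq. (1): W(s,t) = s^(-1/2) * sum_tau f(tau) conj(psi((tau-t)/s)).
    A complex Morlet mother wavelet is used; scales follow s = w0*fs/(2*pi*f)."""
    coeffs = np.zeros((len(freqs), len(x)), dtype=complex)
    for i, f in enumerate(freqs):
        s = w0 * fs / (2 * np.pi * f)              # scale for analysis frequency f
        u = np.arange(-4 * s, 4 * s + 1) / s       # the (tau - t)/s grid
        psi = np.pi ** -0.25 * np.exp(1j * w0 * u - u ** 2 / 2)
        # 'same'-mode convolution slides conj(psi) across all time shifts t.
        coeffs[i] = np.convolve(x, np.conj(psi)[::-1], mode='same') / np.sqrt(s)
    return coeffs

# A pure 10 Hz tone should concentrate wavelet power in the 10 Hz row.
fs = 256
sig = np.sin(2 * np.pi * 10 * np.arange(3 * fs) / fs)
freqs = np.array([4.0, 10.0, 40.0])
W = morlet_cwt(sig, fs, freqs)
power = np.abs(W[:, fs:2 * fs]).mean(axis=1)       # average power away from edges
print(freqs[np.argmax(power)])                     # 10.0
```

The edge samples are excluded from the power average for the same reason the study restricts peak detection to the cone of influence: coefficients near the signal boundaries are contaminated by the finite wavelet support.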
The local extrema of W(s,t) signify the points in frequency and time that are best matched between the signal and the template wavelet, and can be visualized in the form of a time-frequency plot referred to as a scalogram.

To determine which mother wavelet suited the data best, each candidate mother wavelet (Bump, Haar, Morse, Morlet) was used to create wavelet coefficients, which were then used to reconstruct the signal via the inverse CWT. The correlation between the original and reconstructed signals was calculated by conducting cross-correlation tests, divided by the auto-correlation of the original signal. This analysis yielded a metric showing how closely the two waveforms were related. The Morlet wavelet was found to correlate best among all the mother wavelets considered, consistent with reports that this wavelet is useful for the detection of salient oscillations (Ende et al., 1998; Senkowski and Herrmann, 2002).

For each classification type and each of the 58 chosen channels, aggregated two-sample t-tests were conducted on the wavelet coefficients across all trials, yielding 55 frequency x 536 sample t-statistic and H (hypothesis test; 0 or 1) matrices. For each channel, the t-CWT was calculated by:

t_k(s,t) = \frac{\overline{W}_{kx}(s,t) - \overline{W}_{ky}(s,t)}{\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}}    (2)

where k is the channel and \overline{W}_{kx}(s,t), \overline{W}_{ky}(s,t), σ_x(s,t), and σ_y(s,t) denote the sample means and standard deviations of the wavelet coefficients across trials at each scale s and time t, with sample sizes n and m. Studentizing the wavelet coefficients in this manner enabled the statistical comparison of two classes, describing the frequency and time indices at which they are significantly different. As such, the complex-valued t-statistic matrices served as time-frequency scalograms, with greater magnitudes denoting greater differences. Thus, the absolute values of the t-statistic matrices were calculated and subsequently normalized to determine the regional maxima, with the condition that the maxima must be located within the cone of influence and, importantly, where H=1, i.e., where there is a significant difference between classes (Fig. 3). Detecting maxima only within the cone of influence mitigated the risk of detecting artifactual maxima in the scalograms. Each channel produced a different number of regional maxima.

Figure 3: Magnitude scalogram of a t-statistic. Red stars denote the time and scale indices of the peaks of the magnitude scalogram. Peaks were chosen under the criteria that they exist inside the cone of influence and, importantly, where H=1 (i.e., significant difference).

Wavelet coefficients of the CWT output were extracted at the frequency and time indices obtained through t-CWT. These values were extracted for each of the 58 channels and appended into a complex feature matrix. Subsequently, the real and imaginary values of these complex wavelet coefficients were obtained and appended side by side to form a matrix of trials x features, with the last column representing the class identity. For each participant, the original feature matrix was approximately 100 trials x 500 features.
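The studentization of equation (2) and the subsequent peak picking can be sketched as follows. The data here are simulated (random complex coefficients with one injected class difference), standing in for the per-channel trial coefficients used in the study.

```python
import numpy as np

def t_cwt_map(Wx, Wy):
    """Unequal-variance two-sample t-map over wavelet coefficients, per Eq. (2).
    Wx, Wy: complex arrays of shape (trials, scales, times) for the two classes.
    Returns a complex-valued t-statistic matrix of shape (scales, times)."""
    n, m = Wx.shape[0], Wy.shape[0]
    diff = Wx.mean(axis=0) - Wy.mean(axis=0)
    se = np.sqrt(Wx.var(axis=0, ddof=1) / n + Wy.var(axis=0, ddof=1) / m)
    return diff / se

rng = np.random.default_rng(0)
shape = (50, 55, 536)                       # trials x scales x time samples
Wx = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
Wy = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
Wx[:, 20, 100] += 3.0                       # inject a class difference at one cell
T = t_cwt_map(Wx, Wy)
# The regional maximum of the magnitude scalogram |t| marks the distinction.
peak = np.unravel_index(np.argmax(np.abs(T)), T.shape)
print(peak == (20, 100))                    # True
```

Note that NumPy's `var` on complex arrays returns the (real) variance of the magnitudes of the deviations, so the denominator is real while the numerator keeps the complex class difference, matching the description of the t-statistic matrices as complex-valued scalograms.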
However, Minimally Redundant Maximally Relevant (mRmR) feature selection was conducted to select the top 20 features, 20 being the approximate cutoff point for feature importance during mRmR. Subsequently, a 10-fold cross-validation was conducted on the 100 trials x 20 features matrices, which were classified through a support vector machine (SVM) with a radial basis function kernel. All classification types were binary.
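The selection step can be sketched as the greedy relevance-minus-redundancy ranking that mRmR performs. In this simplified, simulated version, plain correlation magnitudes stand in for the mutual information scores used by the actual mRmR criterion, and the feature indices and data are hypothetical.

```python
import numpy as np

def mrmr_rank(X, y, k):
    """Greedy minimal-redundancy-maximal-relevance feature ranking.
    Correlation magnitudes stand in for mutual information in this sketch."""
    n_feat = X.shape[1]
    relevance = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]          # start with the most relevant
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            # Penalize features that duplicate what is already selected.
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

rng = np.random.default_rng(2)
y = rng.integers(0, 2, 200).astype(float)           # binary class labels
X = rng.standard_normal((200, 30))
X[:, 5] += 2 * y                                    # one informative feature
X[:, 6] = X[:, 5] + 0.01 * rng.standard_normal(200) # a redundant near-copy
picked = mrmr_rank(X, y, 3)
print(picked[0] in (5, 6))                          # True: informative pair found first
```

The redundant near-copy is relevant on its own but is not chosen second, because its redundancy with the first pick cancels its relevance; this is the behaviour that distinguishes mRmR from a pure relevance ranking.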
Average classification accuracies and their standard deviations were obtained using the 10-fold cross-validation. Precision, recall, and F1-score were calculated for each of these folds and subsequently averaged.
\text{Precision} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalsePositives}}    (3)

\text{Recall} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalseNegatives}}    (4)

\text{F1-score} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}    (5)
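Equations (3)-(5) reduce to a few lines; the labels below are a toy example, not study data.

```python
def binary_scores(y_true, y_pred):
    """Precision, recall, and F1-score (Eqs. 3-5) for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 4 true positives, 1 false positive, 1 false negative:
p, r, f = binary_scores([1, 1, 1, 1, 1, 0, 0, 0],
                        [1, 1, 1, 1, 0, 1, 0, 0])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.8 0.8 0.8
```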
The importance of PAC is well documented in speech processing studies (Giraud and Poeppel, 2012; Hyafil et al., 2015; Assaneo and Poeppel, 2018; Voytek et al., 2013; Hermes et al., 2014). To determine significant θ-γ coupling in CS, the event-related phase-amplitude coupling (ERPAC) toolbox was utilized (Voytek et al., 2013). A 4th-order Butterworth filter was applied between 4-7Hz to obtain θ-band signals, after which the angle of the Hilbert transform was taken to obtain phase data. The γ band was calculated by filtering the signals between 30-60Hz, with a Butterworth filter order of 3r, where r is the sampling rate divided by the low frequency cutoff of the filter, rounded. The γ-band amplitude was obtained by taking the absolute value of the Hilbert transform. Within a specific channel, the PAC for each phase-amplitude pair was calculated through the circ_corrcl.m function from the CircStat toolbox (Berens, 2009) at each time point and across all trials. This function calculates the correlation coefficient (ρ) between a circular/angular variable (phase, φ) and a linear random variable (such as amplitude, a) by linearizing the phase variable into sine and cosine components:

\rho_{\phi a} = \sqrt{\frac{r_{ca}^2 + r_{sa}^2 - 2\, r_{ca}\, r_{sa}\, r_{cs}}{1 - r_{cs}^2}}    (6)

where r_{ca} = corr(\cos\phi[t], a[t]), r_{sa} = corr(\sin\phi[t], a[t]), r_{cs} = corr(\sin\phi[t], \cos\phi[t]), and corr(x, y) is the Pearson correlation between x and y under the assumption that the distributions of x and y are Gaussian. φ[t] and a[t] are the instantaneous phase and instantaneous analytic amplitude, respectively. Utilizing this function enabled the assessment of relationships between circular θ phase and linear γ amplitude at each time point and across trials. Subsequently, 1000 surrogate runs were conducted by shifting the trials of the amplitude data and testing the correlation between the phase data and the shifted amplitude data across trials at each time point.
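The filtering, Hilbert, and circular-linear correlation steps can be sketched as below. This is a minimal Python analogue of the MATLAB pipeline described above, not the study code: the signal is a synthetic trace with a 5 Hz carrier amplitude-modulating a 40 Hz burst, and second-order-sections filtering is used for numerical stability rather than the transfer-function form.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def circ_corrcl(phi, a):
    """Circular-linear correlation between phase phi and amplitude a, per Eq. (6)."""
    def corr(x, y):
        return np.corrcoef(x, y)[0, 1]
    rca = corr(np.cos(phi), a)
    rsa = corr(np.sin(phi), a)
    rcs = corr(np.sin(phi), np.cos(phi))
    return np.sqrt((rca**2 + rsa**2 - 2 * rca * rsa * rcs) / (1 - rcs**2))

fs = 256
t = np.arange(0, 4, 1 / fs)
theta = np.sin(2 * np.pi * 5 * t)                   # 5 Hz "theta" carrier
gamma = (1 + theta) * np.sin(2 * np.pi * 40 * t)    # theta-modulated 40 Hz burst
x = theta + 0.5 * gamma                             # composite synthetic trace

sos = butter(4, [4, 7], btype='band', fs=fs, output='sos')
phase = np.angle(hilbert(sosfiltfilt(sos, x)))      # theta-band instantaneous phase
sos = butter(4, [30, 60], btype='band', fs=fs, output='sos')
amp = np.abs(hilbert(sosfiltfilt(sos, x)))          # gamma-band amplitude envelope
print(circ_corrcl(phase, amp) > 0.5)                # True: strong theta-gamma coupling
```

Because the γ envelope here is constructed to follow the θ cycle, the circular-linear ρ comes out high; shuffling the amplitude trace relative to the phase trace, as in the surrogate runs described above, would drive it toward zero.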
These PAC ρ values were compared by first applying Fisher's z-transform to normalize the correlation coefficients:

z_{\rho} = \frac{1}{2} \ln\!\left(\frac{1 + \rho}{1 - \rho}\right)    (7)

and then calculating the difference between the z-transformed coefficients:

\triangle\rho_z = z(\rho_{true}) - z(\rho_{surrogate})    (8)

From this, the z-score can be calculated by:

z = \frac{\triangle\rho_z}{\sigma}    (9)

where σ is the standard error. z-scores were then transformed into p-values via a normal cumulative distribution function with μ = 0, σ = 1. The reported p-values denote the time points at which significant θ-γ PAC occurs against a surrogate population within a specific channel. For cross-task PAC (e.g., SP θ - CS γ), the class indices used for the amplitude calculation were simply switched to a CS class.

For the t-CWT results, the frequency indices were tabulated across all channels and participants and subsequently categorized into the five major bands. Shapiro-Wilk tests were conducted on each category to confirm non-normality. Subsequently, Wilcoxon rank sum tests were conducted for unequal medians.

To test for significant correlations between the θ phases of the SP and CS classes, the circ_corrcc.m function from the CircStat toolbox was invoked (Berens, 2009). This function assesses the correlation between two circular/angular random signals:
\rho_{\alpha\beta} = \frac{\sum_i \sin(\alpha_i - \bar{\alpha})\, \sin(\beta_i - \bar{\beta})}{\sqrt{\sum_i \sin^2(\alpha_i - \bar{\alpha})\, \sum_i \sin^2(\beta_i - \bar{\beta})}}    (10)

where α and β denote two samples of angular data and \bar{\alpha} and \bar{\beta} denote their means. Under the null hypothesis of no significant correlations, the p-value for this correlation was computed from a normally distributed test statistic. This enabled the testing of angular correlation between two sets of SP θ phase across all trials of all participants (1000 trials).
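Equation (10) can be sketched directly; this is a minimal NumPy version of CircStat's circ_corrcc, with the circular means taken as the angle of the resultant vector, applied to simulated phase samples.

```python
import numpy as np

def circ_corrcc(alpha, beta):
    """Circular-circular correlation between two angular samples, per Eq. (10)."""
    a_bar = np.angle(np.mean(np.exp(1j * alpha)))   # circular mean of alpha
    b_bar = np.angle(np.mean(np.exp(1j * beta)))    # circular mean of beta
    sa, sb = np.sin(alpha - a_bar), np.sin(beta - b_bar)
    return np.sum(sa * sb) / np.sqrt(np.sum(sa ** 2) * np.sum(sb ** 2))

rng = np.random.default_rng(1)
alpha = rng.uniform(-np.pi, np.pi, 1000)            # e.g., one set of theta phases
noisy = np.angle(np.exp(1j * (alpha + 0.2 * rng.standard_normal(1000))))
unrelated = rng.uniform(-np.pi, np.pi, 1000)
print(circ_corrcc(alpha, noisy) > 0.8)              # True: phases track alpha
print(abs(circ_corrcc(alpha, unrelated)) < 0.2)     # True: no relationship
```

Wrapping the noisy copy through `np.angle(np.exp(1j * ...))` keeps all angles in (-π, π], mirroring how instantaneous phases from the Hilbert transform are represented.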
Table 1: Classification scores for each classification type (SPO-CSO, CSB-CSO, SPB-CSB, SPO-CSO, SPB-RST, SPO-RST, CSB-RST, CSO-RST) and for each participant (1-10), reporting accuracy (Acc.), precision (Prec.), recall (Rec.), and F1-score. [Table values not recoverable from the source text.] Post-mRmR feature selection, features were classified using an SVM with a radial basis function kernel. Classification accuracies were calculated by averaging across the 10 cross-validation folds (standard deviation in brackets). Precision, recall, and F1-scores were calculated for each fold and subsequently averaged.

Extracting the real and imaginary values of the complex wavelet coefficients at these indices produced binary SVM classification accuracies significantly higher than chance level (Table 1) in most participants, with the exception of P2 and P7. Similarly, these features were found to have high precision, recall, and F1-scores across most participants. The two participants who performed relatively poorly had greater standard deviations across classifications and lower performance scores.

Assessing the frequency indices at which opposing classes are significantly different (cumulative over all 58 channels and participants) revealed significant differences in oscillatory characteristics between SP and CS (Fig. 4). Consistent with previous studies (Gross et al., 2013; Luo and Poeppel, 2007), SP multiplexed in all relevant frequency bands, namely δ, θ, β, and γ. After confirming non-normality of the distribution of frequencies through Shapiro-Wilk tests (p < 0.05), Wilcoxon rank-sum tests revealed that distinguishing SP classes engages significantly more of the lower-frequency δ and θ bands (p < 0.01, p < 0.0001), whereas distinguishing CS involves more usage of low γ (p < 0.05) (Fig. 4a). No significant differences in α and β were observed (p > 0.01). Distinguishing between corresponding SP and CS classes (e.g. Blue) showed that SP and CS classes differ similarly across words (p > 0.05) (Fig. 4b).
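The classification step can be sketched as follows. This is a minimal stand-in with simulated features, not the study's data or feature selection: the feature matrix, class structure, and scaling step are assumptions; only the RBF-kernel SVM and 10-fold cross-validation mirror the description above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
# stand-in features: real/imaginary parts of selected wavelet coefficients (trials x features)
n_trials, n_features = 100, 20
X = rng.standard_normal((n_trials, n_features))
y = np.repeat([0, 1], n_trials // 2)        # two classes, e.g. SPO vs CSO
X[y == 1, :5] += 1.0                        # injected class-dependent structure

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
scores = cross_val_score(clf, X, y, cv=10)  # accuracy per fold (stratified by default)
acc, sd = scores.mean(), scores.std()       # mean accuracy and SD, as in Table 1
```

Per-fold precision, recall, and F1 would be obtained analogously, e.g. with `cross_validate` and the corresponding scorers.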
The distinction of active classes from rest, compared between CS and SP, revealed significantly higher involvement of the δ and θ bands in SP (p < 0.05, p < 0.01), whereas CSO vs RST produced significantly higher β involvement than SPO vs RST (p < 0.05) (Fig. 4c, d).

To determine the topography of oscillatory differences, EEGLAB's (Delorme and Makeig, 2004) topoplot.m function was used on features selected through mRmR feature selection. Distinctions in channel locations were tabulated (cumulative across participants) for each frequency band (Fig. 5). SP showed relatively greater inter-participant consistency in the θ band, with widespread distinctions across temporal and temporo-parietal regions and high counts in this frequency (Fig. 5b). SP's γ distinctions were focally distributed, with lower consistency across participants (Fig. 5d).
Figure 4: Differential utilization of frequency bands in covert speech and speech perception. Each frequency and time index was determined by first conducting a two-sample t-test between two sets of wavelet coefficients. Maxima of the t-statistic were used as indices, provided that the hypothesis test returned a statistically significant difference. Data were pooled across all channels and participants. A Shapiro-Wilk test was conducted for each binary classification pair (e.g. SPB vs SPO) to confirm non-normality. Subsequently, a Wilcoxon rank-sum test was conducted to test for significantly different medians. The distinction of SP classes produced greater engagement of low-frequency δ and θ, whereas CS involved more low γ activity for distinction (a). No significant differences were observed when comparing binary classifications involving corresponding SP and CS classes (b). δ and θ activity contributed significantly more to distinguishing SP from rest than CS from rest (c, d). β activity contributed significantly more to the distinction of CSO vs RST than SPO vs RST (d). (*: p < 0.05; **: p < 0.01; ***: p < 0.001; ns: no significance).
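The index-selection procedure described in this caption can be sketched as below, using simulated wavelet coefficient magnitudes; the array shapes and the injected class effect are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
# simulated wavelet magnitudes: trials x frequencies x time, for two classes
A = rng.standard_normal((40, 30, 100))
B = rng.standard_normal((40, 30, 100))
B[:, 10, 40:60] += 1.5                       # class difference at one frequency/time window

# two-sample t-test at every (frequency, time) point
t_stat, p_val = ttest_ind(A, B, axis=0)

# take the location of the largest |t|, kept only if the test is significant there
freq_idx, time_idx = np.unravel_index(np.argmax(np.abs(t_stat)), t_stat.shape)
significant = p_val[freq_idx, time_idx] < 0.05
```

Tabulating `freq_idx` across channels and participants yields the per-band counts compared in panels (a)-(d).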
Figure 5: Topography of frequency indices (a-d) and time indices (e) of speech perception and covert speech tasks. (a-d) Calculated from 20 features selected through mRmR feature selection, as a tabulated sum across participants. Figures were generated through EEGLAB (Delorme and Makeig, 2004). The scale shows the number of tabulated distinctions in the frequency band; brighter colors mean a higher count. The distinction of CS classes involves higher frequencies, whereas the distinction of SP is largely based on lower frequencies. (e) Time indices were calculated by taking the median of time values across participants. Scales show the time in seconds; brighter and darker colors mean high and low latency, respectively. SP and CS both produced low-latency distinctions in the temporal and temporo-parietal regions, with high-latency activations in the frontal/motor regions.

In contrast, CS produced focal θ-band distinctions with low count/consistency and widespread γ-band distinctions with high consistency across participants in temporal and temporo-parietal regions. Both tasks produced a comparable number of β-band distinctions, but CS produced more counts of distinguishable β patterns in the right hemisphere (Fig. 5c). CS had minimal δ activity, whereas SP showed three foci of δ activity in the left hemisphere (Fig. 5a).

Event-related spectral perturbations (ERSPs) revealed strong transient synchronization in the θ band for CS and SP between 200-500 ms (Fig. 6a, b). For both tasks, this was succeeded by β-band synchronization starting at the offset of the θ-band synchronization. γ-band desynchronizations were observed for both tasks between 200-500 ms, but were more scattered for CS. Rest showed scattered and disorderly synchronizations in the α and β bands (Fig. 6c).

Considering the putative coordination between θ- and γ-band activity (Giraud and Poeppel, 2012; Hyafil et al., 2015), PAC between these frequency bands was assessed across all classes (Fig. 7).
Interestingly, single-channel PACs were observed specifically in right-hemisphere temporal channels only. Namely, in channel FT10, the PAC of both SP classes significantly departed from the surrogate PAC between 200-500 ms (p < 0.01), confirming that SP γ amplitude produces a rhythm in keeping with the cadence of the fluctuating θ phase. CS and rest were found to suppress or lack such a stable PAC, producing sparse distributions of significant departures from surrogates.

However, it was possible that CS θ activity may have served a separate function unlike that putatively observed in SP (Restle et al., 2012; Albouy et al., 2017), especially in the absence of salient stimulation. Therefore, the PAC between
Figure 6: Event-related spectral perturbation in perception (a), covert speech (b), and rest tasks (c) in channel T8. Figures were generated using EEGLAB's spectopo.m function (Delorme and Makeig, 2004) and data aggregated from participants 1-5. Results are FDR-corrected at p < 0.05. Frequency units are logarithmically spaced. The ERSPs for SP (a) and CS (b) both show early-latency synchronization in the θ band, followed by β-band synchronization. Rest showed a scattered synchronization of the α and β bands.

SP's θ and CS's γ was assessed to determine whether the γ amplitude of CS contains a task-specific rhythm (Fig. 8a, b). Significant pseudo-PAC (p < 0.01) occurred again between 200-500 ms for both words, confirming that CS's γ band produced a rhythmic fluctuation specific to the time course of SP's θ synchronization (Fig. 6a). Furthermore, relatively weaker but nevertheless significant pseudo-PACs were observed across words (e.g. SPB-CSO), meaning that the SP θ-CS γ relationship contained both general and specific portions. This lack of specificity between SP θ and CS γ was found to be due to significantly correlated θ phase patterns in SP of the two words between 200-500 ms (circular correlation test, p < 0.01) (Fig. 9a). On the other hand, CS did not produce correlative θ phase patterns across words (Fig. 9a) or across tasks (Fig. 9b).

However, this did not portend that the γ-band responses of CS and SP words were general, as significant cross-trial γ power correlations (Pearson correlation, p < 0.05) were observed across tasks. Such correlations were observed in left temporal and temporo-parietal regions, the right fronto-temporal edge, and along the right motor to somatosensory regions (Fig. 10), consistent with the topography of frequencies (Fig. 5). Importantly, significant γ-band power correlations were observed in channel FT10, where SP θ-γ and SP θ-CS γ PACs were observed.
Figure 7: Speech perception produces significant θ phase-γ amplitude coupling between 200-500 ms. Calculated through the ERPAC toolbox (Voytek et al., 2013). To determine whether the arising PAC relationships index an inter-trial relationship and are not an artifact of stimulus-evoked responses, we conducted a resampling analysis (surrogate testing) that randomizes the phase-amplitude relationship across trials. This was done 1000 times per sample, resulting in a distribution of possible surrogate PACs. SP produced significant PAC relationships occurring between 200-500 ms, while CS and rest produced relatively little and sparse PAC between θ phase and low γ amplitude. Dotted lines indicate a significant difference of the true PAC from surrogates (p < 0.05). Figures were generated from participant 5.
Figure 8: Speech perception θ phase predicts covert speech low γ amplitude between 200-500 ms. PACs were calculated across trials and per sample of the time series after a Hilbert transform, using the ERPAC toolbox (Voytek et al., 2013). Phase information was extracted only from SP classes and amplitude information only from the corresponding CS class (a, b). Statistical significance was calculated through surrogate testing. (a) and (b) show that SP's θ phase is significantly 'pseudo-coupled' to CS's low γ amplitude compared to surrogate PACs between 200-500 ms, with a coupling morphology similar to the SP PAC. Red dotted lines indicate a significant difference (p < 0.01) for blue lines.

Figure 9: θ activity is task-dependent and serves different functions in speech perception and covert speech. The θ phase of each signal was determined by calculating the angle of the Hilbert transform of the signal after a Butterworth band-pass filter between 4-7 Hz. The circular mean of the phase angles was calculated across all channels. Subsequently, the circular correlation between SPB and SPO θ phase was calculated for each time point and across trials. Data were pooled across all participants. θ phase was correlated in the two SP classes (a). Significant (p < 0.01) correlations were observed between 200-500 ms for SPB-SPO, whereas CSB-CSO produced no significant phase correlations. Furthermore, no relationship was observed between the θ phases of SP and CS (b). The grey line depicts significant circular correlations of θ phase between SPB and SPO. Red and blue lines depict the SP-CS pairs which produced no significant θ phase correlations.

Figure 10: Topography of cross-trial γ power correlations between corresponding classes. Data were pooled across all participants and sessions. γ amplitude was first extracted through a Butterworth filter band-passed between 30-60 Hz on preprocessed signals, then taking the absolute value of the Hilbert transform of said signals.
Cross-trial γ power was calculated by squaring the amplitude at each point in time, summing across trials, and normalizing by the number of trials to produce a single time vector of power. Then, the Pearson correlation coefficient was extracted for each channel between two opposing classes (e.g. SPB-CSB). High values (bright regions) correspond to high γ power correlation across tasks (Pearson's rho). Only correlations with p < 0.05 are depicted.
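The cross-trial power computation can be sketched as below for one channel; the simulated amplitude envelopes and their shared temporal profile are assumptions for illustration.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(5)
n_trials, n_time = 50, 300
# simulated gamma-band analytic amplitudes (trials x time) for two classes, one channel
shared = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(n_time) / 100)  # common temporal profile
amp_sp = shared + 0.2 * rng.standard_normal((n_trials, n_time))
amp_cs = shared + 0.2 * rng.standard_normal((n_trials, n_time))

def cross_trial_power(amp):
    """Square the amplitude per time point, sum over trials, normalize by trial count."""
    return (amp ** 2).sum(axis=0) / amp.shape[0]

# Pearson correlation of the two power time-vectors for this channel
r, p = pearsonr(cross_trial_power(amp_sp), cross_trial_power(amp_cs))
```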
Using a t-CWT method, the present study confirmed the hypothesis that CS largely utilizes higher-frequency oscillations relative to SP. Crucially, we conclude that the γ band likely functions similarly across tasks. Like SP, the γ activity of CS was found to contain a processing rhythm time-locked to the cadence of event-related SP θ phase. Specifically, we suggest that CS's γ activity likely depicts a phonological processing function similar to that of SP, as reported here and in previous studies (Giraud and Poeppel, 2012; Pasley et al., 2012; Chang et al., 2010, 2016). However, the lack of θ-γ PAC within CS suggests that CS's θ activity likely serves roles alternative to the syllabic chunking seen in SP, possibly in preparatory motor or memory-related activity (Restle et al., 2012; Albouy et al., 2017). The present study represents the first investigation into the differences in oscillatory engagement between CS and SP and reports a relationship between SP's θ and CS's γ activity. The findings reported here could enable the development of CS models based on SP signals, which can be used to train a CS BCI based on the passive perception of speech.

The main goal of the study was to assess the relative contribution and roles of the major oscillatory dynamics in CS relative to SP. Thus, the oscillatory references of SP will be discussed first. A commonly synthesized interpretation from studies on SP investigating the role of oscillations is that each frequency band contributes to a dynamic sampling of speech items at varying temporal scales, referred to as multiplexing (Gross et al., 2013). The greater the frequency, the greater the resolution and detail at which a speech item is sampled. In the present study, we identified that SP indeed utilizes the δ, θ, β, and γ frequency bands, with significantly more prominent distinctions in the δ and θ bands than CS.
This prominence can be attributed to the importance of tracking the speech envelope of salient percepts (Luo and Poeppel, 2007; Giraud and Poeppel, 2012) and syllabic chunking (Ghitza, 2012, 2013; Doelling et al., 2014), features which seem to be common across languages (Ding et al., 2017a; Varnet et al., 2017). Moreover, θ activity may produce generalized patterns during SP, as the frequency topography showed widespread and consistent distinctions across participants in the temporal and temporo-parietal channels.

The relatively high count of γ-band distinctions suggests that the driving force in delineating words in SP lies primarily in phonological processes (Chang et al., 2010). These activations were found focally in left and right temporal and temporo-parietal regions, which may loosely correspond to phonological regions of interest as reported in intracranial recording studies (Chang et al., 2010, 2016; Pasley et al., 2012). Moreover, the significant θ-γ PAC between 200-500 ms likely depicts a linkage between syllabic and phonological processes (Giraud and Poeppel, 2012), suggesting that the two oscillations may be inherently coordinated to focus γ activity to specific times within syllabic time references upheld by θ (Hyafil et al., 2015).

The timing of this PAC was likely linked to the strong θ-band synchronization occurring in this time period, as observed through ERSP. Interestingly, β-band synchronization followed shortly after. The β band, producing a comparable number of distinctions to γ, has been proposed to play a role in the binding of semantic (Weiss and Mueller, 2003) and syntactic information (Bastiaansen et al., 2010), being sensitive to the temporal alignment of ongoing speech (Rimmele et al., 2018).
The observation of enhanced β synchronization succeeding the θ-γ PAC suggests that coordinated phonological and syllabic sampling becomes bound/conjoined into whole-word percepts via β activity during SP.

CS, on the other hand, did not exhibit a diverse multiplexing relationship akin to SP, but rather favoured high-frequency activity, namely β and γ, for the distinction of words. The low count and sparse distribution of θ activity was likely due to the lack of salient percepts in CS, suggesting that, unlike in SP, θ activity in CS is focal and may serve a dissimilar function (discussed in Section 5.3). On the other hand, the large number of distinctions in the γ band suggested that this frequency is highly specific to word identity. Indeed, studies of overt and covert phoneme/word repetition tasks have shown differential γ-band augmentations in the temporal and temporo-parietal lobe (Fukuda et al., 2010; Pei et al., 2011; Toyoda et al., 2014), which may loosely correspond to the frequency topography reported here. What the frequency topography does firmly suggest is that the widespread and high count of β- and γ-band distinctions likely indicated a greater degree of inter-participant consistency and can, in turn, allude to the existence of generalized activation profiles in temporal and temporo-parietal channels during CS.

It is possible that the γ-band activity during CS corresponded to a corollary discharge, as enhanced fronto-temporal γ synchrony has been observed during speech production tasks relative to perception conditions (Chen et al., 2011; Ford and Mathalon, 2005). Furthermore, in investigations of auditory verbal hallucinations (AVH), fronto-temporal γ synchrony was found to be significantly suppressed in schizophrenic individuals (Uhlhaas et al., 2006; Uhlhaas and Singer, 2010; Gallinat et al., 2004), suggesting that an improper transmission of corollary discharge leads to phantom perceptions (Mathalon and Ford, 2008; van Lutterveld et al., 2011).
Although such inter-regional synchrony was not investigated here, the fact that a suppression of this auditory prediction produces phantom perceptions of internal thoughts suggests that the pattern of activity in corollary discharge reflects the potentials during SP of the same words.
This invites the hypothesis that CS's γ activity serves a 'mirrored phonological' function to SP's γ activity (discussed in Section 5.2). Similarly, it can follow that the observed β-band synchronization (occurring at the same time as in SP) also signifies the temporal binding routines of phonological speech units enacted by the γ band (Section 5.4).

It was observed that CS and SP engage their oscillations differentially during speech processing. Interestingly, γ distinctions were found to be the highest within tasks. As previously mentioned, a multitude of studies describe a phonological processing function of γ activity during SP (Chang et al., 2010, 2016; Pasley et al., 2012). Although the methods of the present study were not sensitive to determining phonological cognitive load, we specifically asked whether CS's γ activity contained a processing rhythm specific to θ activity. SP's γ activity has been shown to keep a rhythm with respect to the rise and fall (periodicity) of its θ phase, which, when coupled, putatively allows individual phonemes to be processed in the context of larger syllabic units (Giraud and Poeppel, 2012; Hyafil et al., 2015). The existence of such a rhythm specific to a time frame would lend support, but not proof, to the hypothesis that γ activity in CS serves a similar function to that in SP. Therefore, confirming this γ rhythm would resolve an important step toward modelling CS from SP through a demonstration of possible functional equivalence.

However, a θ-γ PAC was not observed within the CS condition. This was likely due to θ activity in speech production (and its variants) being responsible for non-linguistic portions of the task, such as motor (Restle et al., 2012) and memory-related processes (Albouy et al., 2017). Indeed, task-dependent θ phase correlations were not observed within CS (Fig. 9a). Thus, we asked if CS γ would produce a rhythm that corresponds to the event-related cadence of SP's θ activity.
Indeed, the results of the current study support this hypothesis, with significant SP θ-CS γ PAC also occurring at 200-500 ms. This coupling was found to be temporally sensitive and specific to this particular time period, as an otherwise random relationship would portend a sparsely distributed coupling pattern. This result confirms that CS's γ activity is rhythmic and, importantly, specific to the periodicity of SP's θ phase, which putatively tracks syllabic quantities through the stimulus envelope (Luo and Poeppel, 2007). Therefore, the observed cross-task pseudo-coupling supports the idea that the γ bands of SP and CS served similar functions.

The notion that this observed time-localized rhythmicity of CS γ activity corresponds to a phonological processing function like SP may be loosely entertained by the frequency topography, which shows that CS words are distinguished in the γ band more consistently in the temporal and temporo-parietal channels. Intracranial studies employing overt and covert phoneme repetition tasks also report γ activity in these regions, but specifically in the superior temporal gyrus and supramarginal gyrus (Fukuda et al., 2010; Toyoda et al., 2014), regions which have previously been shown to play a role in phonological processing through fMRI investigations (Okada and Hickok, 2006; van de Ven et al., 2009; Venezia et al., 2016). Although it is tempting to connect the present topographical results to the source localization in these studies, a word of caution is warranted, as EEG is known to have poor spatial resolution and the activations portrayed by the scalp map may not project ideally to the putative sources of speech processing.

In contrast to the loose functional correspondence depicted by the topographical results, the finding of γ rhythmicity in SP and CS in the same time frame (200-500 ms) substantiates the interpretation that γ activity served a similar function across tasks.
The temporally co-localized γ rhythms of SP and CS both seemed to correspond to transient θ-band synchronizations in the same time period, likely caused by an earlier phase reset. In SP, θ phase has been found to reset to the temporal edges of the speech envelope (Gross et al., 2013) in order to initiate the coordination of processing at syllabic and phonemic levels (Assaneo and Poeppel, 2018). Similarly, θ phase has been found to reset to the onset of CS, resulting in strong phase-locking between 250-500 ms that represents a temporal marker of CS processing (Yao et al., 2020). It remains inconclusive whether the present θ-band synchronization in CS was a result of phase resetting (Luo and Poeppel, 2007) or of greater evoked potentials (Obleser and Weisz, 2012). However, if, like overt speech, CS tracks self-generated and temporally regular speech through neural oscillations, then the θ phase would necessarily reset to cause enhanced synchronization across trials (Luo and Poeppel, 2007).

From a neural architectural perspective (i.e. neural circuits), such phase resetting has been proposed to underlie information transmission, such as communication through coherence (Roberts et al., 2013) and the phase-dependent coordination of large-scale neural networks for encoding and decoding during attention and goal-directed behaviours (Canavier, 2015; Voloh and Womelsdorf, 2016). It is thought that phase alignment through resetting forms predictable windows for integration, which aids the coordinated parsing of segments (Fries, 2009). Hence, the transient θ-band synchronizations observed here likely demarcated the points of processing in the tasks, indicating that CS and SP process words at the same time. Naturally, it then follows that the common occurrences of γ rhythms, both of which are modulated and pseudo-modulated by SP's θ phase, represented a similar function between SP and CS.
Since CS is a variant of speech production (only lacking overt articulation and production of sounds), it must generate a timely internal auditory prediction to match the processing of self-generated speech sounds (Jack et al., 2019; Scott, 2013).
Therefore, under the view that CS is equivalent to self-generated SP without feedback, we propose that CS's γ-band response may have represented a similar function, potentially relating to phonological processing.

While the above discussion supports the idea that the γ bands subserved similar functions across tasks, it may be further reasoned that γ activity during CS represents a 'mirrored phonological' activation pattern, as previously pondered. This hypothesis emerged out of studies of AVH, whereby an aberrant corollary discharge (i.e. γ synchrony) results in phantom perceptions (Mathalon and Ford, 2008; van Lutterveld et al., 2011; Ford and Mathalon, 2005). As the purpose of corollary discharge is to cancel out self-generated sounds, it follows that CS's γ-band response must continuously predict the sound patterns of ongoing speech. This invites the hypothesis that CS's γ activity may produce activation patterns similar to those of SP's γ-band response, upon some transformation. As the current study did not analyze any existing correlations between the γ activities, future studies are directed to employ distance correlation measures to quantify the predictability and/or dependence between the γ-band amplitudes of SP and CS.

However, some caution is warranted regarding the above interpretations, as the current study involved only two speech tokens. Hence, future studies are directed to design studies with more diverse arrays of speech tokens for understanding the relationship between SP's and CS's θ- and γ-band responses.

It was found that SP and CS produced rhythmic fluctuations of γ-band activity that correlated to the tracking of the speech envelope by SP's θ activity. However, the lack of θ-γ PAC occurring within
CS suggested that the γ-band response of CS retained its own processing rhythm in the absence of modulation by its θ phase. This observation is corroborated by a lack of θ phase correlations occurring between SP and CS (Fig. 9b). Thus, it is possible that θ activity served a function dissimilar to the syllabic chunking seen in SP, or that the relationship between θ and γ in CS cannot be described by a coupling of phase and amplitude. Similar to the current results, Hermes et al. (2014) showed that θ-γ PAC is suppressed during CS in Broca's area, the temporo-parietal junction, and the middle temporal gyrus, and that its θ power is anti-correlated with high-frequency power. Perhaps more counter-intuitively, θ-γ PAC has been reported to increase in patients during AVH as thoughts manifest as phantom perceptions (Koutsoukos et al., 2013); the inverse of which suggests that a normal thought would suppress this θ-γ coupling. These results beg the question: if CS's θ band synchronizes in the same time period (Fig. 6b) as the emergence of its γ rhythm, what is the role of θ with respect to γ in CS?

Considering that stimulation of a major dorsal stream area (the posterior inferior frontal gyrus) with θ burst stimulation facilitates speech repetition accuracy (Restle et al., 2012), it may be reasoned that θ activity during CS correlates to motor planning and activity. Indeed, 4-7 Hz also corresponds to the mandibular movement rate during articulation (Giraud et al., 2007), which is also demonstrated by an enhanced coupling between motor and auditory areas during syllable presentation at 4.5 Hz (Assaneo and Poeppel, 2018; Poeppel and Assaneo, 2020). These studies indicate that the θ band represents a preferred articulatory rhythm and thus has motor origins in speech production. If this interpretation holds, it would suggest that phonological and articulatory processing in CS are independent in the context of PAC, but potentially related by some other measure.
For instance, it may be possible that θ-based articulatory expressions induce a γ-based corollary discharge that pre-contains the sensory predictions outlined by the motor code and rules out the need for coupling between the two frequency bands. Indeed, θ coherence (Ford et al., 2002) and γ synchrony (Uhlhaas et al., 2006; Uhlhaas and Singer, 2010; Gallinat et al., 2004) have both been found to be significantly reduced in schizophrenic patients with AVH, suggesting that the independent suppression of synchronization in the two oscillations each plays a role in an aberrant corollary discharge mechanism. Reconciling the current results with the findings of these studies leads to the hypothesis that θ-based motor discharges may inform γ activity in CS, but not be linked through a PAC.

Alternatively, or perhaps in parallel, θ oscillations in the dorsal stream may also work by enhancing auditory working memory (Albouy et al., 2017), potentially in the form of access to lexical stores (Piai et al., 2014). This is consistent with the model proposed by Indefrey and Levelt (2004), where word production has been suggested to initiate with the lexical concept. More empirically, in a series of studies investigating oscillatory power during covert word reading (Bastiaansen et al., 2005) and lexical decision-making (Bastiaansen et al., 2008), θ power, or local synchrony, was found to be modulated as a function of lexicality, peaking between 300-500 ms. Thus, θ activity as accessing the mental lexicon is a sensible interpretation, as instantiating a lexical memory can initiate subsequent unitary/phonological processing by the γ band, without necessarily being linked through phase and amplitude. Indeed, γ-band activity has been suggested to be necessary for the formation of both phonological and lexico-semantic representations of words through repetition and homophone priming tasks (Matsumoto and Iidaka, 2008).
It is thus possible that θ activity registers the broader lexical framework for the auditory prediction and informs γ-band corollary discharge, which seemingly provides the sensory/phonological representation of the word. Although the singular role of θ activity in speech production is still debated, considering that slow oscillations can synchronize between widely distributed brain
areas (Buzsáki and Draguhn, 2004), the differing role of θ activity in CS may encompass motor-related activity and access to lexical memory simultaneously or in a cascading manner. However, the methods of the present study were not sensitive to understanding the motor or lexical load carried by CS's θ activity.

The processing of CS brought about a comparable number of distinctions in the β band as in the γ band. While β activity in speech production has been reported to play a role in motor activity and motor preparation (Mersov et al., 2016; Piai et al., 2015), in language-related processes it has, importantly, been proposed to serve as a top-down modulatory signal for the temporal management of ongoing speech (Rimmele et al., 2018). Indeed, in language processing, β activity has been shown to play a role in the timely binding of syntactic (Bastiaansen et al., 2010) and semantic (Weiss and Mueller, 2003) information. More broadly, Weiss and Mueller (2012) assert that synchronized β-band oscillations serve to bind the contents of a distributed set of neuronal populations into one coherent memory unit. The role of β activity in temporal binding routines is supported by studies revealing significant δ-β PAC (Arnal et al., 2015; Keitel et al., 2017, 2018; Morillon et al., 2019), for the registration of words, phrases, and sentences by δ must necessarily emerge from the temporal bindings along the hierarchy of speech units (phonemes to words).
Although the current study did not investigate this coupling, the observation of temporally co-localized β-band synchronizations between CS and SP, succeeding the common θ-band synchronization (discussed as signifying the processing of SP and CS), invites the hypothesis that the β band, too, served a similar function across tasks, namely in enacting binding routines of phonological/syllabic items into broader whole-word percepts.

Finally, it should be noted that the phase-amplitude relationships between SP's θ and CS's γ were found to be general, as weaker, but nevertheless existent, pseudo-couplings were observed also across words (e.g. SPB θ-CSO γ). However, this does not necessarily portend that the γ activities themselves are general, as no γ-band correlations were observed. Indeed, CS and SP both produced a significant number of distinctions in the γ band, suggesting that γ activity is likely specific to words. Instead, the general nature of the SP-CS PAC can be attributed to the lack of diversity in the current lexicon, which varied only between 1-2 syllables and was spoken at the same rate. This evidently led to non-divergent θ patterns: average θ phases were found to be significantly correlated between 200-500 ms in the two SP classes, making this frequency band less pertinent to the distinction of SP words than γ activity (Fig. 9a). In contrast, studies using a larger vocabulary and sentential speech tokens have shown that θ phase adjusts to the syllabic rate and the number of syllables (Assaneo and Poeppel, 2018; Lizarazu et al., 2019; Ding et al., 2017a,b). Therefore, future studies should experiment with a richer lexicon with more syllable counts, possibly embedded in sentential forms, in order to determine whether CS's γ activity forms a specific relationship to the putative syllabic tracking by SP's θ activity. Such studies will assist in determining whether this pseudo-coupling between SP and CS simply reflects task demands or reflects common neurolinguistic processing.
Furthermore, future studies should determine whether correlations exist between the γ-band responses of SP and CS. These investigations would provide substantial support for the hypothesis that CS γ activity also reflects a phonological process.

The present study is the first to investigate the similarities and differences in oscillatory engagement during CS and SP. We found that CS favours higher-frequency activity that likely reflects corollary discharge. Specifically, we found that CS's γ-band response has a rhythmic pattern similar to SP's, possibly representing a similar phonological process. These findings substantiate the results of Oppenheim and Dell (2008), who describe CS as containing robust phonological information, and further suggest that CS and SP may share a common function with respect to γ activity. In contrast, we assert that θ activity in CS and SP plays different roles, possibly via differential processing through the dorsal and ventral streams, respectively. Understanding the relative oscillatory engagements and their functional correlates in the two tasks is elemental to the modelling of CS based on SP signals. Therefore, the present work can lead to the development of CS BCIs trained through the passive perception of speech, which can help overcome the difficulties of training by rendering the training process passive. To achieve this modelling, we direct future studies to investigate the details of the relationship between SP's θ activity and CS's γ activity, as well as similarities in the γ-band responses, for further confirmation of a common function of γ activity.

We thank Christine Horner and Sarah Holman for their help in data collection, and Ka Lun Tam and Pierre Duez for assisting in developing the protocol.
References
Albouy, P., Weiss, A., Baillet, S., and Zatorre, R. J. (2017). Selective Entrainment of Theta Oscillations in the Dorsal Stream Causally Enhances Auditory Working Memory Performance. Neuron, 94(1):193–206.e5.
Alderson-Day, B. and Fernyhough, C. (2012). Inner Speech: Development, Cognitive Functions, Phenomenology, and Neurobiology. Cirugia Espanola, 90(9):545–547.
Alderson-Day, B., Mitrenga, K., Wilkinson, S., McCarthy-Jones, S., and Fernyhough, C. (2018). The varieties of inner speech questionnaire – Revised (VISQ-R): Replicating and refining links between inner speech and psychopathology. Consciousness and Cognition, 65(July):48–58.
Arnal, L. H., Doelling, K. B., and Poeppel, D. (2015). Delta-beta coupled oscillations underlie temporal prediction accuracy. Cerebral Cortex, 25(9):3077–3085.
Assaneo, M. F. and Poeppel, D. (2018). The coupling between auditory and motor cortices is rate-restricted: Evidence for an intrinsic speech-motor rhythm. Science Advances, 4(2):1–10.
Babiloni, F., Cincotti, F., Carducci, F., Rossini, P. M., and Babiloni, C. (2001). Spatial enhancement of EEG data by surface Laplacian estimation: The use of magnetic resonance imaging-based head models. Clinical Neurophysiology, 112(5):724–727.
Bastiaansen, M. and Hagoort, P. (2006). Chapter 12: Oscillatory neuronal dynamics during language comprehension. Progress in Brain Research, 159:179–196.
Bastiaansen, M., Magyari, L., and Hagoort, P. (2010). Syntactic unification operations are reflected in oscillatory dynamics during on-line sentence comprehension. Journal of Cognitive Neuroscience, 22(7):1333–1347.
Bastiaansen, M. C., Oostenveld, R., Jensen, O., and Hagoort, P. (2008). I see what you mean: Theta power increases are involved in the retrieval of lexical semantic information. Brain and Language, 106(1):15–28.
Bastiaansen, M. C., Van Der Linden, M., Ter Keurs, M., Dijkstra, T., and Hagoort, P. (2005). Theta responses are involved in lexical-semantic retrieval during language processing. Journal of Cognitive Neuroscience, 17(3):530–541.
Berens, P. (2009). CircStat: a MATLAB toolbox for circular statistics. Journal of Statistical Software, 31(10).
Bidelman, G. M. (2015). Induced neural beta oscillations predict categorical speech perception abilities. Brain and Language, 141:62–69.
Bigdely-Shamlo, N., Mullen, T., Kothe, C., Su, K.-M., and Widmann, A. (2015). The PREP pipeline: standardized preprocessing for large-scale EEG analysis. Frontiers in Neuroinformatics, 9(June):1–20.
Boemio, A., Fromm, S., Braun, A., and Poeppel, D. (2005). Hierarchical and asymmetric temporal sensitivity in human auditory cortices. Nature Neuroscience, 8(3):389–395.
Bostanov, V. (2004). BCI competition 2003 - Data sets Ib and IIb: Feature extraction from event-related brain potentials with the continuous wavelet transform and the t-value scalogram. IEEE Transactions on Biomedical Engineering, 51(6):1057–1061.
Buchsbaum, B. R., Hickok, G., and Humphries, C. (2001). Role of left posterior superior temporal gyrus in phonological processing for speech perception and production. Cognitive Science, 25:663–678.
Buzsáki, G. and Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science, 304(5679):1926–1929.
Buzsáki, G., Geisler, C., Henze, D. A., and Wang, X. J. (2004). Interneuron Diversity series: Circuit complexity and axon wiring economy of cortical interneurons. Trends in Neurosciences, 27(4):186–193.
Canavier, C. C. (2015). Phase-resetting as a tool of information transmission. Current Opinion in Neurobiology, 31:206–213.
Chang, C. K., Chiari, L., and Hutchison, D. (2016). Inclusive Smart Cities.
Chang, M. D., Sejdić, E., Wright, V., and Chau, T. (2010). Measures of dynamic stability: Detecting differences between walking overground and on a compliant surface. Human Movement Science, 29(6):977–986.
Chen, C. M. A., Mathalon, D. H., Roach, B. J., Cavus, I., Spencer, D. D., and Ford, J. M. (2011). The corollary discharge in humans is related to synchronous neural oscillations. Journal of Cognitive Neuroscience, 23(10):2892–2904.
Cullen, K. E. (2004). Sensory signals during active versus passive movement. Current Opinion in Neurobiology, 14(6):698–706.
Darvishi, S. and Al-Ani, A. (2007). Brain-computer interface analysis using continuous wavelet transform and adaptive neuro-fuzzy classifier. Annual International Conference of the IEEE Engineering in Medicine and Biology - Proceedings, pages 3220–3223.
DaSalla, C., Kambara, H., Koike, Y., and Sato, M. (2009). Spatial filtering and single-trial classification of EEG during vowel speech imagery. International Convention on Rehabilitation Engineering and Assistive Technology (ICREAT), 5:1–4.
Delorme, A. and Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134(1):9–21.
Deng, S., Srinivasan, R., Lappas, T., and D'Zmura, M. (2010). EEG classification of imagined syllable rhythm using Hilbert spectrum methods. Journal of Neural Engineering, 7(4).
Di Liberto, G. M., O'Sullivan, J. A., and Lalor, E. C. (2015). Low-frequency cortical entrainment to speech reflects phoneme-level processing. Current Biology, 25(19):2457–2465.
Ding, N., Melloni, L., Yang, A., Wang, Y., Zhang, W., and Poeppel, D. (2017a). Characterizing Neural Entrainment to Hierarchical Linguistic Units using Electroencephalography (EEG). Frontiers in Human Neuroscience, 11(September):1–9.
Ding, N., Melloni, L., Zhang, H., Tian, X., and Poeppel, D. (2015). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1):158–164.
Ding, N., Patel, A. D., Chen, L., Butler, H., Luo, C., and Poeppel, D. (2017b). Temporal modulations in speech and music. Neuroscience and Biobehavioral Reviews, 81:181–187.
Doelling, K. B., Arnal, L. H., Ghitza, O., and Poeppel, D. (2014). Acoustic landmarks drive delta-theta oscillations to enable speech comprehension by facilitating perceptual parsing. NeuroImage, 85:761–768.
Ende, M., Louis, A. K., Maass, P., and Mayer-Kress, G. (1998). EEG Signal Analysis by Continuous Wavelet Transform Techniques. In Nonlinear Analysis of Physiological Data, number 1, pages 213–219.
Ford, J. M. and Mathalon, D. H. (2005). Corollary discharge dysfunction in schizophrenia: Can it explain auditory hallucinations? International Journal of Psychophysiology, 58(2-3):179–189.
Ford, J. M. and Mathalon, D. H. (2019). Efference Copy, Corollary Discharge, Predictive Coding, and Psychosis. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 4(9):764–767.
Ford, J. M., Mathalon, D. H., Whitfield, S., Faustman, W. O., and Roth, W. T. (2002). Reduced communication between frontal and temporal lobes during talking in schizophrenia. Biological Psychiatry, 51(6):485–492.
Fries, P. (2009). Neuronal Gamma-Band Synchronization as a Fundamental Process in Cortical Computation. Annual Review of Neuroscience, 32(1):209–224.
Fukuda, M., Rothermel, R., Juhász, C., Nishida, M., Sood, S., and Asano, E. (2010). Cortical gamma-oscillations modulated by listening and overt repetition of phonemes. NeuroImage, 49(3):2735–2745.
Gallinat, J., Winterer, G., Herrmann, C. S., and Senkowski, D. (2004). Reduced oscillatory gamma-band responses in unmedicated schizophrenic patients indicate impaired frontal network processing. Clinical Neurophysiology, 115(8):1863–1874.
Ghitza, O. (2012). On the role of theta-driven syllabic parsing in decoding speech: Intelligibility of speech with a manipulated modulation spectrum. Frontiers in Psychology, 3(JUL):1–12.
Ghitza, O. (2013). The theta-syllable: A unit of speech information defined by cortical function. Frontiers in Psychology, 4(MAR):1–5.
Giraud, A. L., Kleinschmidt, A., Poeppel, D., Lund, T. E., Frackowiak, R. S. J., and Laufs, H. (2007). Endogenous Cortical Rhythms Determine Cerebral Specialization for Speech Perception and Production. Neuron, 56(6):1127–1134.
Giraud, A. L., Lorenzi, C., Ashburner, J., Wable, J., Johnsrude, I., Frackowiak, R., and Kleinschmidt, A. (2000). Representation of the temporal envelope of sounds in the human brain. Journal of Neurophysiology, 84(3):1588–1598.
Giraud, A. L. and Poeppel, D. (2012). Cortical oscillations and speech processing: Emerging computational principles and operations. Nature Neuroscience, 15(4):511–517.
Gross, J., Hoogenboom, N., Thut, G., Schyns, P., Panzeri, S., Belin, P., and Garrod, S. (2013). Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain. PLoS Biology, 11(12).
Hermes, D., Miller, K. J., Vansteensel, M. J., Edwards, E., Ferrier, C. H., Bleichner, M. G., van Rijen, P. C., Aarnoutse, E. J., and Ramsey, N. F. (2014). Cortical theta wanes for language. NeuroImage, 85:738–748.
Hickok, G. (2014). The architecture of speech production and the role of the phoneme in speech processing. Language, Cognition and Neuroscience, 29(1):2–20.
Hickok, G., Houde, J., and Rong, F. (2011). Sensorimotor Integration in Speech Processing: Computational Basis and Neural Organization. Neuron, 69(3):407–422.
Hickok, G., Okada, K., and Serences, J. T. (2009). Area Spt in the Human Planum Temporale Supports Sensory-Motor Integration for Speech Processing. Journal of Neurophysiology, 101(5):2725–2732.
Hickok, G. and Poeppel, D. (2004). Dorsal and ventral streams: A framework for understanding aspects of the functional anatomy of language. Cognition, 92(1-2):67–99.
Hickok, G. and Poeppel, D. (2007). The cortical organization of speech processing. Nature Reviews Neuroscience, 8(5):393–402.
Hsu, W. Y., Lin, C. C., Ju, M. S., and Sun, Y. N. (2007). Wavelet-based fractal features with active segment selection: Application to single-trial EEG data. Journal of Neuroscience Methods, 163(1):145–160.
Hyafil, A., Fontolan, L., Kabdebon, C., Gutkin, B., and Giraud, A. L. (2015). Speech encoding by coupled cortical theta and gamma oscillations. eLife, 4(MAY):1–45.
Idrees, B. M. and Farooq, O. (2016). Vowel classification using wavelet decomposition during speech imagery. pages 636–640.
Indefrey, P. and Levelt, W. J. (2004). The spatial and temporal signatures of word production components. Cognition, 92(1-2):101–144.
Jack, B. N., Le Pelley, M. E., Han, N., Harris, A. W., Spencer, K. M., and Whitford, T. J. (2019). Inner speech is accompanied by a temporally-precise and content-specific corollary discharge. NeuroImage, 198(March):170–180.
Keitel, A., Gross, J., and Kayser, C. (2018). Perceptually relevant speech tracking in auditory and motor cortex reflects distinct linguistic features. PLoS Biology, 16(3):1–19.
Keitel, A., Ince, R. A., Gross, J., and Kayser, C. (2017). Auditory cortical delta-entrainment interacts with oscillatory power in multiple fronto-parietal networks. NeuroImage, 147(November 2016):32–42.
Kimata, A., Yokoyama, Y., Aita, S., Nakamura, H., Higuchi, K., Tanaka, Y., Nogami, A., Hirao, K., and Aonuma, K. (2018). Temporally stable frequency mapping using continuous wavelet transform analysis in patients with persistent atrial fibrillation. Journal of Cardiovascular Electrophysiology, 29(4):514–522.
Koutsoukos, E., Angelopoulos, E., Maillis, A., Papadimitriou, G. N., and Stefanis, C. (2013). Indication of increased phase coupling between theta and gamma EEG rhythms associated with the experience of auditory verbal hallucinations. Neuroscience Letters, 534(1):242–245.
Lizarazu, M., Lallier, M., and Molinaro, N. (2019). Phase amplitude coupling between theta and gamma oscillations adapts to speech rate. Annals of the New York Academy of Sciences, (April).
Luo, H. and Poeppel, D. (2007). Phase Patterns of Neuronal Responses Reliably Discriminate Speech in Human Auditory Cortex. Neuron, 54(6):1001–1010.
Mai, G., Minett, J. W., and Wang, W. S. (2016). Delta, theta, beta, and gamma brain oscillations index levels of auditory sentence processing. NeuroImage, 133.
Mathalon, D. H. and Ford, J. M. (2008). Corollary discharge dysfunction in schizophrenia: Evidence for an elemental deficit. Clinical EEG and Neuroscience, 39(2):82–86.
Matsumoto, A. and Iidaka, T. (2008). Gamma band synchronization and the formation of representations in visual word processing: Evidence from repetition and homophone priming. Journal of Cognitive Neuroscience, 20(11):2088–2096.
McAuliffe, M., Socolof, M., Mihuc, S., Wagner, M., and Sonderegger, M. (2017). Montreal forced aligner: Trainable text-speech alignment using Kaldi. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2017-August:498–502.
Mersov, A. M., Jobst, C., Cheyne, D. O., and De Nil, L. (2016). Sensorimotor oscillations prior to speech onset reflect altered motor networks in adults who stutter. Frontiers in Human Neuroscience, 10(SEP2016):1–16.
Mirollo, R. E. and Strogatz, S. H. (1990). Synchronization of pulse-coupled biological oscillators. SIAM Journal on Applied Mathematics, 50(6):1645–1662.
Morillon, B., Arnal, L. H., Schroeder, C. E., and Keitel, A. (2019). Prominence of delta oscillatory rhythms in the motor cortex and their relevance for auditory and speech perception. Neuroscience and Biobehavioral Reviews, 107(September):136–142.
Morillon, B., Liégeois-Chauvel, C., Arnal, L. H., Bénar, C. G., and Giraud, A. L. (2012). Asymmetric function of theta and gamma activity in syllable processing: An intra-cortical study. Frontiers in Psychology, 3(JUL):1–9.
Morillon, B. and Schroeder, C. E. (2015). Neuronal oscillations as a mechanistic substrate of auditory temporal prediction. Annals of the New York Academy of Sciences, 1337(1):26–31.
Morin, A., Duhnych, C., and Racy, F. (2018). Self-reported inner speech use in university students. Applied Cognitive Psychology, 32(3):376–382.
Morin, A., Uttl, B., and Hamper, B. (2011). Self-reported frequency, content, and functions of inner speech. Procedia - Social and Behavioral Sciences, 30:1714–1718.
Moses, D. A., Mesgarani, N., Leonard, M. K., and Chang, E. F. (2016). Neural speech recognition: Continuous phoneme decoding using spatiotemporal representations of human cortical activity. Journal of Neural Engineering, 13(5):1–19.
Obleser, J. and Weisz, N. (2012). Suppressed alpha oscillations predict intelligibility of speech and its acoustic details. Cerebral Cortex, 22(11):2466–2477.
Okada, K. and Hickok, G. (2006). Left posterior auditory-related cortices participate both in speech perception and speech production: Neural overlap revealed by fMRI. Brain and Language, 98(1):112–117.
Okada, K., Matchin, W., and Hickok, G. (2018). Neural evidence for predictive coding in auditory cortex during speech production. Psychonomic Bulletin and Review, 25(1):423–430.
Onojima, T., Kitajo, K., and Mizuhara, H. (2017). Ongoing slow oscillatory phase modulates speech intelligibility in cooperation with motor cortical activity. PLoS ONE, 12(8):1–17.
Oppenheim, G. M. and Dell, G. S. (2008). Inner speech slips exhibit lexical bias, but not the phonemic similarity effect. Cognition, 106(1):528–537.
Pasley, B. N., David, S. V., Mesgarani, N., Flinker, A., Shamma, S. A., Crone, N. E., Knight, R. T., and Chang, E. F. (2012). Reconstructing speech from human auditory cortex. PLoS Biology, 10(1).
Pei, X., Leuthardt, E. C., Gaona, C. M., Brunner, P., Wolpaw, J. R., and Schalk, G. (2011). Spatiotemporal dynamics of electrocorticographic high gamma activity during overt and covert word repetition. NeuroImage, 54(4):2960–2972.
Perrone-Bertolotti, M., Rapin, L., Lachaux, J. P., Baciu, M., and Lœvenbruck, H. (2014). What is that little voice inside my head? Inner speech phenomenology, its role in cognitive performance, and its relation to self-monitoring. Behavioural Brain Research, 261:220–239.
Piai, V., Dahlslätt, K., and Maris, E. (2015). Statistically comparing EEG/MEG waveforms through successive significant univariate tests: How bad can it be? Psychophysiology, 52(3):440–443.
Piai, V., Roelofs, A., and Maris, E. (2014). Oscillatory brain responses in spoken word production reflect lexical frequency and sentential constraint. Neuropsychologia, 53(1):146–156.
Pickering, M. J. and Garrod, S. (2013). An integrated theory of language production and comprehension. Behavioral and Brain Sciences, 36(4):329–347.
Poeppel, D. (2014). The neuroanatomic and neurophysiological infrastructure for speech and language. Current Opinion in Neurobiology, 28:142–149.
Poeppel, D. and Assaneo, M. F. (2020). Speech rhythms and their neural foundations. Nature Reviews Neuroscience, 21(6):322–334.
Real, R. G. and Kotchoubey, B. (2014). Studentized continuous wavelet transform (t-CWT) in the analysis of individual ERPs: Real and simulated EEG data. Frontiers in Neuroscience, 8(SEP):1–9.
Restle, J., Murakami, T., and Ziemann, U. (2012). Facilitation of speech repetition accuracy by theta burst stimulation of the left posterior inferior frontal gyrus. Neuropsychologia, 50(8):2026–2031.
Rimmele, J. M., Morillon, B., Poeppel, D., and Arnal, L. H. (2018). Proactive Sensing of Periodic and Aperiodic Auditory Patterns. Trends in Cognitive Sciences, 22(10):870–882.
Roberts, M. J., Lowet, E., Brunet, N. M., TerWal, M., Tiesinga, P., Fries, P., and DeWeerd, P. (2013). Robust gamma coherence between macaque V1 and V2 by dynamic frequency matching. Neuron, 78(3):523–536.
Schroeder, C. E. and Lakatos, P. (2009). Low-frequency neuronal oscillations as instruments of sensory selection. Trends in Neurosciences, 32(1):9–18.
Scott, M. (2013). Corollary Discharge Provides the Sensory Content of Inner Speech. Psychological Science, 24(9):1824–1830.
Scott, S. K. (2012). The neurobiology of speech perception and production - Can functional imaging tell us anything we did not already know? Journal of Communication Disorders, 45(6):419–425.
Senkowski, D. and Herrmann, C. S. (2002). Effects of task difficulty on evoked gamma activity and ERPs in a visual discrimination task. Clinical Neurophysiology, 113(11):1742–1753.
Shahin, A. J., Picton, T. W., and Miller, L. M. (2009). Brain oscillations during semantic evaluation of speech. Brain and Cognition, 70(3):259–266.
Shergill, S. S., Brammer, M. J., Fukuda, R., Bullmore, E., Amaro, E., Murray, R. M., and McGuire, P. K. (2002). Modulation of activity in temporal cortex during generation of inner speech. Human Brain Mapping, 16(4):219–227.
Skipper, J. I., Nusbaum, H. C., and Small, S. L. (2005). Listening to talking faces: Motor cortical activation during speech perception. NeuroImage, 25(1):76–89.
Tian, X. and Poeppel, D. (2010). Mental imagery of speech and movement implicates the dynamics of internal forward models. Frontiers in Psychology, 1(OCT):1–23.
Tian, X. and Poeppel, D. (2012). Mental imagery of speech: linking motor and perceptual systems through internal simulation and estimation. Frontiers in Human Neuroscience, 6(November):1–11.
Toyoda, G., Brown, E. C., Matsuzaki, N., Kojima, K., Nishida, M., and Asano, E. (2014). Electrocorticographic correlates of overt articulation of 44 English phonemes: Intracranial recording in children with focal epilepsy. Clinical Neurophysiology, 125(6):1129–1137.
Trouvain, J. (2007). On the comprehension of extremely fast synthetic speech. pages 5–13.
Uhlhaas, P. J., Linden, D. E., Singer, W., Haenschel, C., Lindner, M., Maurer, K., and Rodriguez, E. (2006). Dysfunctional long-range coordination of neural activity during gestalt perception in schizophrenia. Journal of Neuroscience, 26(31):8168–8175.
Uhlhaas, P. J. and Singer, W. (2010). Abnormal neural oscillations and synchrony in schizophrenia. Nature Reviews Neuroscience, 11(2):100–113.
van de Ven, V., Esposito, F., and Christoffels, I. K. (2009). Neural network of speech monitoring overlaps with overt speech production and comprehension networks: A sequential spatial and temporal ICA study. NeuroImage, 47(4):1982–1991.
van Lutterveld, R., Sommer, I. E. C., and Ford, J. M. (2011). The Neurophysiology of Auditory Hallucinations – A Historical and Contemporary Review. Frontiers in Psychiatry, 2(May):1–7.
Varnet, L., Ortiz-Barajas, M. C., Erra, R. G., Gervain, J., and Lorenzi, C. (2017). A cross-linguistic study of speech modulation spectra. The Journal of the Acoustical Society of America, 142(4):1976–1989.
Venezia, J. H., Fillmore, P., Matchin, W., Lisette Isenberg, A., Hickok, G., and Fridriksson, J. (2016). Perception drives production across sensory modalities: A network for sensorimotor integration of visual speech. NeuroImage, 126:196–207.
Voloh, B. and Womelsdorf, T. (2016). A role of phase-resetting in coordinating large scale neural networks during attention and goal-directed behavior. Frontiers in Systems Neuroscience, 10(MAR):1–19.
Voytek, B., Esposito, M. D., Crone, N., and Knight, R. T. (2013). A method for event-related phase/amplitude coupling. NeuroImage, 64:416–424.
Weiss, S. and Mueller, H. M. (2003). The contribution of EEG coherence to the investigation of language. Brain and Language, 85(2):325–343.
Weiss, S. and Mueller, H. M. (2012). "Too many betas do not spoil the broth": The role of beta brain oscillations in language processing. Frontiers in Psychology, 3(JUN):1–15.
Wolpert, D. M. and Ghahramani, Z. (2000). Computational principles of movement neuroscience. Nature Neuroscience, 3(11s):1212–1217.
Yao, B., Taylor, J. R., Banks, B., and Kotz, S. A. (2020). Theta activity phase-locks to inner speech in silent reading. PsyArXiv, 44(0).
Zoefel, B. and VanRullen, R. (2016). EEG oscillations entrain their phase to high-level features of speech sound.