Spatial sampling of MEG and EEG revisited: From spatial-frequency spectra to model-informed sampling
Joonas Iivanainen, Antti J. Mäkinen, Rasmus Zetter, Matti Stenroos, Risto J. Ilmoniemi, Lauri Parkkonen
Joonas Iivanainen∗, Antti J. Mäkinen∗, Rasmus Zetter, Matti Stenroos, Risto J. Ilmoniemi, Lauri Parkkonen
Department of Neuroscience and Biomedical Engineering, Aalto University School of Science, FI-00076 Aalto, Finland
Abstract
In this paper, we analyze spatial sampling of electro- (EEG) and magnetoencephalography (MEG), where the electric or magnetic field is typically sampled on a curved surface such as the scalp. Using simulated measurements, we study the spatial-frequency content in EEG as well as in on- and off-scalp MEG. The analysis suggests that on-scalp MEG would generally benefit from three times more samples than EEG or off-scalp MEG. Based on the theory of Gaussian processes and experimental design, we suggest an approach to obtain sampling locations on surfaces that are optimal with respect to prior assumptions. Additionally, the approach allows controlling, e.g., the uniformity of the sampling locations in the grid. By simulating the performance of grids constructed with different priors, we show that for a low number of spatial samples, model-informed non-uniform sampling can be beneficial. For a large number of samples, uniform sampling grids yield nearly the same total information as the model-informed grids.
Keywords: magnetoencephalography, electroencephalography, on-scalp MEG, spatial sampling, Gaussian process, spatial frequency
1. Introduction
Sufficient sampling of a continuous signal ensures that the signal can be reconstructed from the samples without loss of relevant information. Theorems that dictate the necessary conditions for accurate signal reconstruction, such as the number of samples, are essential for guiding data acquisition (e.g., Shannon 1949; Pesenson 2000; McEwen & Wiaux 2011). The best known of these theorems is the Shannon–Nyquist sampling theorem of bandlimited functions (Shannon, 1949; Luke, 1999; Unser, 2000), which connects continuous 1-D signals and their discrete representations. The theorem also applies to separable multi-dimensional signals such as two- and three-dimensional images.

The necessary conditions imposed by sampling theorems cannot always be followed in spatial sampling. For example, there can be limitations on the number of samples and their locations, or the properties of the signal may not be fully known a priori. Further, if the sampling domain is not simple, there may be no applicable general theorem to guide the sampling; the optimal sampling positions have to be determined by other means, e.g., with statistical methods (Krause et al., 2008).

In this work, we analyze spatial sampling of electro- (EEG) and magnetoencephalography (MEG), where the electric and magnetic fields of the brain are measured on surfaces embedded in 3-D. We carry out a spatial-frequency analysis of

∗ These authors contributed equally to this work.
Email addresses: [email protected] (Joonas Iivanainen), [email protected] (Antti J. Mäkinen)

continuous field patterns using the eigenfunctions of the surface Laplace–Beltrami operator (Reuter et al., 2009; Bronstein et al., 2017). The basis formed by these functions can be seen as a natural generalization of the 1-D Fourier basis on a surface. We further describe how more compact bases can be constructed using prior information of the neuronal field patterns. We discuss how these basis-function representations of the field patterns relate to the number of spatial samples.

We investigate how to obtain optimal sample positions in EEG and MEG. We utilize Gaussian processes (Abrahamsen, 1997; Williams & Rasmussen, 2006) to encode prior knowledge and to make Bayesian inference of the field. This perspective is similar to kriging in geospatial sciences (Chilès & Desassis, 2018) and Gaussian-process regression in machine learning (Williams & Rasmussen, 2006). We introduce measures from experimental design (Lindley et al., 1956; Chaloner & Verdinelli, 1995; Krause et al., 2008) that can be used to quantify the optimality of the sample positions. We suggest a method that maximizes Shannon's information (Shannon, 1949) for a given number of samples and prior assumptions. We use the method to generate sampling grids for different EEG and MEG experiments, where either the whole brain or parts of it are of interest.
2. Spatial sampling of EEG and MEG
EEG and MEG are noninvasive neuroimaging techniques for measuring brain activity at millisecond time scale (Hämäläinen et al., 1993; Nunez et al., 2006). EEG and MEG measure the electric and magnetic field, respectively, due to neuronal currents. Adequate spatial sampling is necessary to capture the spatial detail of the continuous field due to brain activity. EEG setups typically involve 16–128 electrodes placed uniformly on the subject's scalp, while state-of-the-art MEG systems use around 300 superconducting quantum interference device (SQUID) sensors placed rigidly around the subject's head (Baillet, 2017).

Spatial sampling of EEG and MEG has again become a topic of interest. High-density EEG (hdEEG) with an electrode count of 128–256 or even more has been suggested to be beneficial (e.g., Brodbeck et al. 2011; Petrov et al. 2014; Grover & Venkatesh 2016; Hedrich et al. 2017; Robinson et al. 2017). In MEG, in contrast to conventional liquid-helium-cooled SQUID sensors that measure the field about 2 cm off the scalp, novel sensors such as high-Tc SQUIDs (Faley et al., 2017) have enabled on-scalp field sensing within millimetres from the head surface. Simulations have shown that the closer proximity of the sensors to the brain improves the spatial resolution of MEG and increases information about neuronal currents (Schneiderman, 2014; Boto et al., 2016; Iivanainen et al., 2017; Riaz et al., 2017). The number of sensors that is beneficial in on-scalp MEG is not currently well known. A recent study suggested that fewer sensors are needed in on-scalp MEG to achieve similar spatial discrimination performance as with SQUIDs (Tierney et al., 2019), while another study suggested, based on experiments, that a higher sensor count would be beneficial.

Preprint submitted to arXiv, June 5, 2020
3. Theory
The quasi-static electric potential or magnetic-field component is generated by distributed source activity $\vec{q}(\vec{r}_s)$ inside the brain. We call this field pattern on a surface a topography; it can be written in terms of Green's function $\vec{L}(\vec{r}, \vec{r}_s)$ as (Hämäläinen et al., 1993; Nunez et al., 2006)

$$t(\vec{r}) = \int \vec{q}(\vec{r}_s) \cdot \vec{L}(\vec{r}, \vec{r}_s)\,\mathrm{d}V_s. \quad (1)$$

By discretizing the neural source distribution $\vec{q}(\vec{r}_s)$ into a set of primary current dipoles $\vec{q}_i(\vec{r}_i) = q_i \hat{q}_i \delta(\vec{r} - \vec{r}_i)$, $i = 1, \dots, M$, where $q_i$ is the amplitude of the $i$th source at $\vec{r}_i$ and $\hat{q}_i$ its orientation, Eq. (1) becomes

$$t(\vec{r}) = \sum_i q_i\, \hat{q}_i \cdot \vec{L}(\vec{r}, \vec{r}_i) = \sum_i q_i\, t_i(\vec{r}), \quad (2)$$

where $t_i(\vec{r}) = \hat{q}_i \cdot \vec{L}(\vec{r}, \vec{r}_i)$ is the source topography, i.e., the topography due to a unit source $\hat{q}_i$ at $\vec{r}_i$. When the topography is also discretized to (or sampled at) $N$ points, the equation reduces to a linear matrix equation $\mathbf{t} = \mathbf{L}\mathbf{q}$, where $\mathbf{L}$ is the $(N \times M)$ lead-field matrix (Hämäläinen et al., 1993).

In MEG and EEG, we sample the topographies $t(\vec{r})$ on a curved 2-D surface embedded in 3-D space. Conventional 1-D Fourier analysis can be extended to spatial-frequency (SF) analysis on such surfaces using a suitable orthonormal function basis. The generating equation for the spatial-frequency basis $\{u_m\}$ (Levy, 2006) is the Helmholtz equation

$$-\nabla^2 u_m = k_m^2 u_m, \quad (3)$$

where $\nabla^2$ is the Laplace–Beltrami (LB) operator, i.e., the surface Laplacian. The equation is an eigenvalue equation for $-\nabla^2$, whose eigenfunctions $u_m$ represent the modes of standing waves on the surface (see Fig. 4) and eigenvalues $k_m^2$ are the squared (spatial) frequencies.
These functions have been used in a variety of applications from signal and geometry processing (Levy, 2006; Reuter et al., 2009) to cortical analysis (Qiu et al., 2006) and deep learning (Bronstein et al., 2017). On a compact surface, the eigenvalue spectrum $\{k_m^2, u_m\}$ is discrete and the eigenfunctions form an orthonormal basis with respect to the inner product (Levy, 2006)

$$\langle u_i, u_j \rangle = \int_S u_i(\vec{r})\, u_j(\vec{r})\,\mathrm{d}S = \delta_{ij}, \quad (4)$$

where $\delta_{ij}$ is the Kronecker delta.

The SF basis can be used to analyze the spatial-frequency content of the topographies. As the basis is orthonormal, any topography $t(\vec{r})$ can be expressed as a linear combination of SF basis functions

$$t(\vec{r}) = \sum_{m=1}^{\infty} a_m u_m(\vec{r}) = \sum_{m=1}^{\infty} \langle t, u_m \rangle\, u_m(\vec{r}), \quad (5)$$

where the coefficients are projections of $t(\vec{r})$ on $\{u_m\}$. Due to orthonormality, the topography energy can be decomposed as

$$\|t\|^2 = \langle t, t \rangle = \int_S |t(\vec{r})|^2\,\mathrm{d}S = \sum_{m=1}^{\infty} a_m^2, \quad (6)$$

a relation similar to Parseval's theorem for Fourier series. The squared coefficients $a_m^2 = \langle t, u_m \rangle^2$ comprise the energy spectrum of the topography in the sense of spectral signal analysis.

If the topography $t$ is solely due to source activity (Eq. (2)), the spectral coefficients can be written as $a_m = \sum_i q_i \langle t_i, u_m \rangle$. Assuming that the source amplitudes $q_i$ are independent random variables with zero mean ($\mathrm{E}(q_i q_j) = \delta_{ij}\,\mathrm{E}(q_i^2)$), the expected energy of $t$ can be decomposed using the spectra of the source topographies:

$$\mathrm{E}(\|t\|^2) = \sum_{m=1}^{\infty} \sum_i \mathrm{E}(q_i^2)\, \langle t_i, u_m \rangle^2. \quad (7)$$

Sampling theorems typically assume that the signals are bandlimited [e.g., in 1-D (Shannon, 1949); on a sphere (McEwen & Wiaux, 2011); on a surface/manifold (Pesenson, 2014, 2015)]. In our context, a bandlimited topography can be expressed with a limited number of SF basis functions $\{u_m\}$, i.e., $a_m = 0$ for $m > B$ ($k_m > k_B$).
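The decomposition of Eqs. (3)–(6) can be sketched numerically. The snippet below is a minimal stand-in: it uses a finite-difference Laplacian with zero-Neumann boundaries on a flat grid instead of a cotangent Laplacian on an actual scalp mesh, and a synthetic Gaussian bump as the "topography"; the grid size, spacing, and bump parameters are arbitrary choices for illustration.

```python
import numpy as np
from scipy.sparse import diags, identity, kron
from scipy.linalg import eigh

# Zero-Neumann finite-difference -Laplacian on an n-by-n flat grid
# (a simplified stand-in for the Laplace-Beltrami operator on a scalp mesh)
n, h = 24, 1.0
dA = h * h
main = np.full(n, 2.0)
main[[0, -1]] = 1.0                                   # Neumann end conditions
L1d = diags([main, -np.ones(n - 1), -np.ones(n - 1)], [0, -1, 1]) / h**2
L = (kron(L1d, identity(n)) + kron(identity(n), L1d)).toarray()

k2, U = eigh(L)               # eigenvalues k^2 = squared spatial frequencies
U = U / np.sqrt(dA)           # orthonormal w.r.t. <u_i, u_j> = sum u_i u_j dA

# A smooth synthetic "topography" on the grid
x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
t = np.exp(-((x - 8.0)**2 + (y - 12.0)**2) / 20.0).ravel()

a = U.T @ t * dA              # spectral coefficients a_m = <t, u_m>, Eq. (5)
energy = np.sum(t**2) * dA    # ||t||^2; equals sum(a**2) for the full basis
cum = np.cumsum(a**2) / energy
B99 = int(np.searchsorted(cum, 0.99)) + 1   # effective bandlimit at 99% energy
```

With the full discrete basis, `np.sum(a**2)` reproduces `energy` exactly (the discrete Parseval relation of Eq. (6)), and `B99` illustrates the effective-bandlimit choice discussed below.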
Figure 1 shows the spatial-frequency representation of a topography in MEG and EEG. Although most of the topography energy in general resides at low spatial frequencies, topographies are not strictly bandlimited. Thereby, standard sampling theorems cannot be directly applied to them.

To determine an effective bandlimit for a topography $t$, we express it as

$$t(\vec{r}) = t_B(\vec{r}) + t_r(\vec{r}) = \sum_{m=1}^{B} a_m u_m(\vec{r}) + \sum_{m=B+1}^{\infty} a_m u_m(\vec{r}), \quad (8)$$

where $t_B$ is the $B$-bandlimited representation of the topography and $t_r$ the residual. If $B$ is chosen such that every source topography $t_i$ is expressed up to $1-\alpha$ of its total energy (e.g., $1-\alpha = 99\%$), the expected energy of the residual is bounded as

$$\mathrm{E}(\|t_r\|^2) = \sum_i \mathrm{E}(q_i^2) \sum_{m=B+1}^{\infty} \langle t_i, u_m \rangle^2 \leq \sum_i \mathrm{E}(q_i^2)\, \alpha \|t_i\|^2 = \alpha\, \mathrm{E}(\|t\|^2). \quad (9)$$

Thus, any topography due to independent neural sources is expected to have $1-\alpha$ of its energy in the subspace spanned by the $B$ first SF basis functions. One way to obtain an effective bandlimit $B$ (and an effective sample spacing) for bounding the expected error is to study the topography spectra of the individual sources.

Here, we further analyze how an effective bandlimit $B$ can be chosen by studying the reconstruction of a topography from noisy samples. We first assume that the topography can be approximated with the $B$ first SF basis functions as $t \approx t_B$. To simplify the notation of Eq. (8), we express the summation over the basis functions as a dot product $t_B(\vec{r}) = \mathbf{u}(\vec{r})^\top \mathbf{a}$, where $\mathbf{u}(\vec{r})[m] = u_m(\vec{r})$ and $\mathbf{a}[m] = a_m$.

We model $N$ noisy samples of $t_B(\vec{r})$ at locations $\vec{r}_n$, stacked in a column vector $\mathbf{y}$, as

$$\mathbf{y} = \begin{bmatrix} \mathbf{u}(\vec{r}_1)^\top \\ \vdots \\ \mathbf{u}(\vec{r}_N)^\top \end{bmatrix} \mathbf{a} + \boldsymbol{\epsilon} = \mathbf{U}\mathbf{a} + \boldsymbol{\epsilon}, \quad (10)$$

where $\mathbf{U}$ is an $(N \times B)$ matrix with the vectors $\mathbf{u}(\vec{r}_n)^\top$ as its rows, i.e., $\mathbf{U}[n,m] = u_m(\vec{r}_n)$, and $\boldsymbol{\epsilon} \sim \mathcal{N}(0, \sigma^2 \mathbf{I})$ is white measurement noise. When $N \geq B$, we can estimate the coefficients $\mathbf{a}$ using the least-squares method

$$\hat{\mathbf{a}} = (\mathbf{U}^\top \mathbf{U})^{-1} \mathbf{U}^\top \mathbf{y} = \mathbf{G}\mathbf{y}, \quad (11)$$

which minimizes the sum of squared errors between the measurements and the model. The bandlimited topography $t_B$ can be reconstructed using the estimated coefficients as

$$\hat{t}_B(\vec{r}) = \mathbf{u}(\vec{r})^\top \hat{\mathbf{a}} = \mathbf{u}(\vec{r})^\top \mathbf{G}\mathbf{y} = \sum_n \Big( \sum_m \mathbf{G}[m,n]\, u_m(\vec{r}) \Big)\, y_n, \quad (12)$$

where the functions $\sum_m \mathbf{G}[m,n]\, u_m(\vec{r})$ are interpolation functions for the samples $y_n$.

If the reconstruction $\hat{t}_B$ matches the continuous topography $t$ on the whole sampling surface, we can argue that the sampling is adequate. To study the reconstruction error, we update the measurement model as $\mathbf{y} = \mathbf{U}\mathbf{a} + \mathbf{U}_r \mathbf{a}_r + \boldsymbol{\epsilon}$, where $\mathbf{U}_r[n,m] = u_{m+B}(\vec{r}_n)$ corresponds to the residual topography $t_r$. Inserting $\mathbf{y}$ into Eq. (11), the coefficient estimate can be expressed as

$$\hat{\mathbf{a}} = \mathbf{G}(\mathbf{U}\mathbf{a} + \mathbf{U}_r \mathbf{a}_r + \boldsymbol{\epsilon}) = \mathbf{a} + \mathbf{G}(\mathbf{U}_r \mathbf{a}_r + \boldsymbol{\epsilon}) = \mathbf{a} + \Delta\mathbf{a}, \quad (13)$$

where $\Delta\mathbf{a} = \mathbf{G}(\mathbf{U}_r \mathbf{a}_r + \boldsymbol{\epsilon})$ is the error in the estimated coefficients. The error consists of two parts: $\mathbf{G}\mathbf{U}_r \mathbf{a}_r$ is the error due to aliasing from components outside the band $m \leq B$, and $\mathbf{G}\boldsymbol{\epsilon}$ is the random error due to noise.

The expected reconstruction error can now be evaluated as

$$\mathrm{E}\|t - \hat{t}_B\|^2 = \mathrm{E}\|t_B - \hat{t}_B\|^2 + \mathrm{E}\|t_r\|^2 = \mathrm{E}\|\Delta\mathbf{a}\|^2 + \mathrm{E}\|\mathbf{a}_r\|^2. \quad (14)$$

Since $\mathrm{E}(\|\mathbf{G}\boldsymbol{\epsilon}\|^2) = \mathrm{Tr}[\sigma^2 (\mathbf{U}^\top \mathbf{U})^{-1}]$ and $\mathrm{E}(\boldsymbol{\epsilon}) = 0$, we can rewrite the error as

$$\mathrm{E}\|t - \hat{t}_B\|^2 = \mathrm{Tr}[\sigma^2 (\mathbf{U}^\top \mathbf{U})^{-1}] + \mathrm{E}(\|\mathbf{G}\mathbf{U}_r \mathbf{a}_r\|^2) + \mathrm{E}(\|\mathbf{a}_r\|^2). \quad (15)$$
Figure 1: Spatial-frequency representations of topographies in MEG and EEG. 'On scalp' refers to MEG measured within millimetres from the scalp, while 'off scalp' to that within 2 cm. For details, see Sec. 4.1. Left: Representation of a topography with an increasing number of spatial-frequency components. The maximum spatial frequency (1/cm) and the cumulative energy of the representation are shown. Right: Energy spectrum and cumulative energy of the topography as a function of spatial frequency.
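The least-squares estimate of Eq. (11) and the noise term of Eq. (15) can be sketched numerically. The example below is a hedged 1-D stand-in for the surface problem: it uses an orthonormal cosine (zero-Neumann) basis on the unit interval, and the band size, sample count, noise level, and random seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, sigma = 10, 40, 0.05      # band size, samples, noise std (illustrative)

def sf_basis(r, M):
    """Orthonormal cosine basis u_m(r) on [0, 1] (1-D SF-basis stand-in)."""
    m = np.arange(M)
    scale = np.where(m == 0, 1.0, np.sqrt(2.0))
    return scale * np.cos(np.pi * m * r[:, None])

a_true = rng.standard_normal(B)               # bandlimited coefficients
r_n = rng.uniform(0.0, 1.0, N)                # sample locations
U = sf_basis(r_n, B)                          # (N x B) matrix of Eq. (10)
y = U @ a_true + sigma * rng.standard_normal(N)

G = np.linalg.solve(U.T @ U, U.T)             # G = (U^T U)^-1 U^T, Eq. (11)
a_hat = G @ y                                 # least-squares coefficients

# Reconstruction t_B(r) on a dense grid, Eq. (12)
r = np.linspace(0.0, 1.0, 200)
t_hat = sf_basis(r, B) @ a_hat

# Noise part of the expected coefficient error: the trace term of Eq. (15)
noise_err = sigma**2 * np.trace(np.linalg.inv(U.T @ U))
```

With $N = 40$ random samples of a $B = 10$ band, $\mathbf{U}^\top\mathbf{U} \approx N\mathbf{I}$, so the noise term is roughly $\sigma^2 B/N$; increasing $N$ or spreading the samples shrinks it, as the text argues.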
If the noise term dominates the reconstruction error, $t_B$ can be considered a reasonably good approximation of $t$; the white-noise variance $\sigma^2$ can then be used to determine the number of components $B$ and the number of samples $N \geq B$ needed for the reconstruction. If the noise level is unknown, a threshold such as 1% expected residual can be used to determine the effective bandlimit $B$. As described in the previous section, this threshold may be obtained by finding the $B$ that covers at least 99% of the energy of every source topography; a similar threshold has been used previously (e.g., Ahonen et al. 1993; Grover & Venkatesh 2016). Last, both the noise and the aliasing error depend on the sample positions $\{\vec{r}_n\}$ via the matrix $\mathbf{U}$. We will quantify the optimality of the sample positions in Sec. 3.5.

In the presented analysis, we assumed a certain band, representing a subspace of all possible topographies, to reconstruct the topography from the samples. For a more flexible incorporation of prior assumptions, we next present a Bayesian approach to analyze the sampling and reconstruction.

In the Bayesian framework, the prior assumptions of the signals are cast to prior probability distributions. In this section, we express the topography as $t(\vec{r}) = \boldsymbol{\psi}(\vec{r})^\top \mathbf{a}$, where $\{\psi_m\}$ can be any set of (not necessarily orthonormal) basis functions (e.g., $u_m$ or $t_i$). We incorporate the prior knowledge by assigning a prior distribution for the coefficients $a_m$, which we consider random variables. We assume them to be Gaussian with a joint probability density $\mathcal{N}(\boldsymbol{\mu}_a, \mathbf{K}_a)$, where $\boldsymbol{\mu}_a$ is the prior mean of $\mathbf{a}$ and $\mathbf{K}_a$ is the covariance matrix of $\mathbf{a}$, representing the prior uncertainty about the coefficients.

A linear combination of basis functions with Gaussian random coefficients is a Gaussian random field, an extension of the Gaussian process (Williams & Rasmussen, 2006) to 3-D
Figure 2: Bayesian reconstruction of a topography. Left: Prior variance and a ground-truth topography. Right: Posterior variance and the topography reconstruction (the posterior mean field) after 1, 15, 30 and 100 noisy measurements (white dots) taken uniformly on the surface. Each additional sample decreases the posterior variance, yielding a better reconstruction of the topography.

space. From this perspective, $t(\vec{r}) = \boldsymbol{\psi}(\vec{r})^\top \mathbf{a}$ is a random topography, which can be described by its mean field

$$\mu_t(\vec{r}) = \mathrm{E}(t(\vec{r})) = \sum_m \boldsymbol{\mu}_a[m]\, \psi_m(\vec{r}) = \boldsymbol{\psi}(\vec{r})^\top \boldsymbol{\mu}_a \quad (16)$$

and covariance kernel

$$K_t(\vec{r}, \vec{r}\,') = \mathrm{Cov}(t(\vec{r}), t(\vec{r}\,')) = \sum_{m,m'} \mathbf{K}_a[m,m']\, \psi_m(\vec{r})\, \psi_{m'}(\vec{r}\,') = \boldsymbol{\psi}(\vec{r})^\top \mathbf{K}_a \boldsymbol{\psi}(\vec{r}\,'). \quad (17)$$

The mean field represents the expected value of the topography based on prior knowledge, and the variance $K_t(\vec{r}, \vec{r}\,' = \vec{r})$ describes the uncertainty around the mean. Every finite collection of samples of a random field is jointly distributed as $\mathcal{N}(\boldsymbol{\mu}, \mathbf{K})$, where $\mathbf{K}$ is the sample covariance matrix defined elementwise as $\mathbf{K}[n,n'] = K_t(\vec{r}_n, \vec{r}_{n'}) = \boldsymbol{\psi}(\vec{r}_n)^\top \mathbf{K}_a \boldsymbol{\psi}(\vec{r}_{n'})$, and $\boldsymbol{\mu}$ is the sample mean, i.e., $\boldsymbol{\mu}[n] = \mu_t(\vec{r}_n) = \boldsymbol{\psi}(\vec{r}_n)^\top \boldsymbol{\mu}_a$.

The effect of measurement is encoded in the posterior distribution of the topography, as illustrated in Fig. 2. Having a collection of (noisy) samples $\mathbf{y}$ at sampling points $R = \{\vec{r}_n \mid n = 1, \dots, N\}$, modeling the noise as $\boldsymbol{\epsilon} \sim \mathcal{N}(0, \boldsymbol{\Sigma})$, and assuming $\mathrm{E}(t(\vec{r})) =$
0, the posterior mean field and covariance are (Williams & Rasmussen, 2006)

$$\begin{aligned} \mu_t(\vec{r} \mid \mathbf{y}, R) &= \mathbf{k}(\vec{r})^\top (\mathbf{K} + \boldsymbol{\Sigma})^{-1} \mathbf{y}, \\ K_t(\vec{r}, \vec{r}\,' \mid R) &= K_t(\vec{r}, \vec{r}\,') - \mathbf{k}(\vec{r})^\top (\mathbf{K} + \boldsymbol{\Sigma})^{-1} \mathbf{k}(\vec{r}\,'), \end{aligned} \quad (18)$$

where $\mathbf{k}(\vec{r})$ is the covariance between the topography at the sample points and the topography at the point $\vec{r}$: $\mathbf{k}(\vec{r})[n] = K_t(\vec{r}, \vec{r}_n) = \boldsymbol{\psi}(\vec{r})^\top \mathbf{K}_a \boldsymbol{\psi}(\vec{r}_n)$.

The posterior mean $\mu_t(\vec{r} \mid \mathbf{y}, R)$ depends linearly on the samples $\mathbf{y}$ and can be considered as the reconstruction of the topography, similar to Eq. (12). The posterior covariance $K_t(\vec{r}, \vec{r}\,' \mid R)$ describes the uncertainty in the topography after the measurements; the noisier the samples, the more uncertainty is left. For $\vec{r} = \vec{r}\,'$, the term subtracted from $K_t(\vec{r}, \vec{r})$ is always positive ($(\mathbf{K} + \boldsymbol{\Sigma})^{-1}$ is positive definite). Hence, a new measurement always reduces the uncertainty about the topography (Fig. 2). Uncertainty is reduced at all points $\vec{r}$ that are correlated with the topography at the sampling location.

In Appendix A, we derive the following formulas for the mean and covariance of $t(\vec{r})$:

$$\begin{aligned} \mu_t(\vec{r} \mid \mathbf{y}, R) &= \boldsymbol{\psi}(\vec{r})^\top (\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\Psi} + \mathbf{K}_a^{-1})^{-1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}^{-1} \mathbf{y} = \boldsymbol{\psi}(\vec{r})^\top \hat{\mathbf{a}}, \\ K_t(\vec{r}, \vec{r}\,' \mid R) &= \boldsymbol{\psi}(\vec{r})^\top (\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\Psi} + \mathbf{K}_a^{-1})^{-1} \boldsymbol{\psi}(\vec{r}\,'), \end{aligned} \quad (19)$$

where $\hat{\mathbf{a}} = (\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\Psi} + \mathbf{K}_a^{-1})^{-1} \boldsymbol{\Psi}^\top \boldsymbol{\Sigma}^{-1} \mathbf{y}$ is the posterior mean of the coefficients and $(\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\Psi} + \mathbf{K}_a^{-1})^{-1}$ is the associated posterior covariance matrix. In the limit of white noise ($\boldsymbol{\Sigma} \to \sigma^2 \mathbf{I}$) and infinite SNR ($\sigma^2 \mathbf{K}_a^{-1} \to 0$), the posterior covariance of the coefficients approaches $\sigma^2 (\boldsymbol{\Psi}^\top \boldsymbol{\Psi})^{-1}$, which is the covariance of the coefficient error.
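The posterior formulas of Eq. (18) can be sketched on a 1-D toy "surface". In the snippet below, the cosine basis, the variance-decay prior, and all sizes are arbitrary stand-ins for the basis functions $\psi_m$ and the prior covariance $\mathbf{K}_a$ of the text.

```python
import numpy as np

rng = np.random.default_rng(1)
M, sigma = 30, 0.05                           # illustrative sizes

def psi(r):
    """Cosine basis as a stand-in for the basis functions psi_m."""
    m = np.arange(M)
    scale = np.where(m == 0, 1.0, np.sqrt(2.0))
    return scale * np.cos(np.pi * m * r[:, None])

K_a = np.diag(np.exp(-np.arange(M) / 5.0))    # decaying prior variances

r_grid = np.linspace(0.0, 1.0, 200)
Psi_g = psi(r_grid)
prior_var = np.einsum("im,mn,in->i", Psi_g, K_a, Psi_g)   # K_t(r, r)

# Draw a "true" topography from the prior and sample it noisily
a = np.sqrt(np.diag(K_a)) * rng.standard_normal(M)
r_n = np.linspace(0.05, 0.95, 15)
Psi_n = psi(r_n)
y = Psi_n @ a + sigma * rng.standard_normal(len(r_n))

K = Psi_n @ K_a @ Psi_n.T                     # sample covariance matrix
k = Psi_g @ K_a @ Psi_n.T                     # k(r) for every grid point
Sigma = sigma**2 * np.eye(len(r_n))

A = np.linalg.solve(K + Sigma, np.eye(len(r_n)))
post_mean = k @ A @ y                                       # Eq. (18), mean
post_var = prior_var - np.einsum("in,nm,im->i", k, A, k)    # Eq. (18), variance
```

As the text states, the subtracted term is nonnegative, so the posterior variance never exceeds the prior variance, and it drops most near the sample locations.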
At this limit, the equations are well-defined only when the number of samples $N$ is larger than the number of basis functions, i.e., the topography belongs to a subspace of $B \leq N$ basis functions.

In this section, we outline how to construct prior covariance kernels (Eq. (17)) for a random topography $t(\vec{r})$ on the measurement surface. In the later sections, we discuss how these kernels can be used to obtain sampling grids.

Spatial-frequency kernel.
The SF basis functions can be used to express the prior kernel as $K(\vec{r}, \vec{r}\,') = \mathbf{u}(\vec{r})^\top \mathbf{K}_{\mathrm{SF}}\, \mathbf{u}(\vec{r}\,')$, where $\mathbf{u}(\vec{r})[i] = u_i(\vec{r})$ is the $i$th eigenfunction of the LB operator and $\mathbf{K}_{\mathrm{SF}}$ is the covariance matrix of the spatial-frequency coefficients. To construct this kernel, knowledge of the sampling-domain geometry, i.e., of the measurement surface, is needed. For a detailed treatment of kernel representation with Laplacian eigenfunctions, we refer to the work by Solin and Särkkä (2020).

The assumption of smoothly varying bandlimited topographies (Sec. 3.1) can be encoded in this form by setting the diagonal of $\mathbf{K}_{\mathrm{SF}}$ to a constant up to $B$ components and to zero above $B$; the highest spatial frequency of the topographies is then assumed to be $k_B$. The covariance structure of this SF-bandlimited prior is visualized in Fig. 3 both in the coefficient domain $\mathbf{K}_{\mathrm{SF}}$ and in the spatial domain $K(\vec{r}, \vec{r}\,')$. This prior for the coefficients results in covariance kernels with sinc-like spatial profiles $K(\vec{r}, \vec{r}_i)$ and a uniform spatial variance $K(\vec{r}, \vec{r})$.

A physically more plausible prior kernel can be constructed by assigning the diagonal with a variance that decays towards the high spatial frequencies (SF variance decay; Fig. 3). This results in a nearly uniform spatial variance and spatially symmetric covariance profiles without sinc-like ringing.

Topography kernel.
Considering the source amplitudes $q_i$ in Eq. (2) to be Gaussian random variables (de Munck et al., 1992), the covariance kernel can be written as

$$K(\vec{r}, \vec{r}\,') = \mathbf{t}(\vec{r})^\top \mathbf{K}_q\, \mathbf{t}(\vec{r}\,') = \sum_{i,j} \mathbf{K}_q[i,j]\, t_i(\vec{r})\, t_j(\vec{r}\,'), \quad (20)$$

where $\mathbf{K}_q$ is the $(M \times M)$ source covariance matrix. The source topographies $t_i(\vec{r})$ can be considered as the basis functions that assemble the kernel and $q_i$ as the associated (possibly correlated) random coefficients. Compared to the SF kernel, more detailed prior knowledge of the random topography can be encoded in this form. For a set of $N$ sampling points, the kernel reduces to a covariance matrix $\mathbf{K} = \mathbf{L}\mathbf{K}_q \mathbf{L}^\top$.

A relevant topography kernel can be constructed by assuming the source amplitudes $q_i$ identically and independently distributed (IID source prior): $\mathbf{K}_q = q^2 \mathbf{I}$. This kernel $K_{\mathrm{IID}}(\vec{r}, \vec{r}\,') = q^2\, \mathbf{t}(\vec{r})^\top \mathbf{t}(\vec{r}\,')$ has a non-uniform spatial variance, and the spatial profiles $K(\vec{r}, \vec{r}_i)$ are asymmetric (Fig. 3).

Kernel eigendecomposition and topography eigenbasis.
Principal component analysis can be extended to random processes using the Karhunen–Loève decomposition, in which the process is expressed as a linear combination of orthonormal functions multiplied by uncorrelated random coefficients (Loeve, 1978; Stark & Woods, 1986); the orthonormal basis functions are obtained as the eigenfunctions of the corresponding covariance kernel (Mercer, 1909). The Karhunen–Loève theorem (Stark & Woods 1986, section 7.6) states that this basis has the minimal truncation error among all possible bases for representing the random process with a given number of components.

To analyze MEG and EEG topographies, we investigate the eigenbasis of $K_{\mathrm{IID}}$. The kernel can be written using the eigendecomposition as

$$K_{\mathrm{IID}}(\vec{r}, \vec{r}\,') = \mathbf{v}(\vec{r})^\top \mathbf{D}\, \mathbf{v}(\vec{r}\,') = \sum_m d_m\, v_m(\vec{r})\, v_m(\vec{r}\,'), \quad (21)$$

where the $v_m(\vec{r})$ form an orthonormal basis on the measurement surface, which we call the topography eigenbasis, and $\mathbf{D}$ is a diagonal matrix with the variances $d_m$ of the eigenfunctions on its diagonal. The coefficient covariances of the different priors in the topography eigenbasis are shown in Fig. 3.

Figure 3: Covariances of three priors (SF bandlimited: bandlimited in spatial-frequency basis; SF variance decay: variance roll-off in SF basis; IID source prior: identically and independently distributed neural sources generating the field). Left: Prior coefficient covariance in the SF basis and in the topography eigenbasis of IID sources. The covariance matrix and its diagonal are shown. Right: Spatial covariance. The spatial kernel and its decay are shown for one position on the measurement surface. The rightmost column shows the spatial variance. For details on the computation, see Sec. 4.1. Units in the plots are arbitrary.

We can express the expected energy (or total variance) of a random topography following the IID-source assumption using the variances of the eigencomponents as $\mathrm{E}(\|t\|^2) = \sum_{m=1}^{\infty} d_m$. With similar arguments as in Secs. 3.1 and 3.2, we can get an estimate for the number of samples needed to reconstruct the random topography, e.g., by truncating the series up to the number of components $\{v_m\}$ that capture 99% of the total variance.

The source topographies $t_i$ (Eq. 2) can also be analyzed using the eigenbasis $\{v_m\}$. Based on the Karhunen–Loève theorem, we expect that $\{v_m\}$ compresses the source-topography representation, i.e., on average, fewer components are needed when truncating $t_i$ in $\{v_m\}$ than in the SF basis $\{u_m\}$. Fig. 4 gives an illustration of the two bases. We study the decomposition of source topographies in these bases in Sec. 4.2.

Noise kernels and SNR.
Measurement noise can be modeled as a random topography $\nu(\vec{r})$ with an associated covariance kernel $K_\nu(\vec{r}, \vec{r}\,')$. Noise sources include the sensors themselves. Sensor noise is typically independent across the sensors with an equal variance $\sigma^2$, which can be modeled as an equivalent spatial white noise, i.e., a random topography with the kernel $K_\nu(\vec{r}, \vec{r}\,') = \sigma^2 \delta(\vec{r} - \vec{r}\,')$. In any orthonormal basis, the covariance of white noise is proportional to the identity matrix:

$$\mathrm{E}(a_i a_j) = \int\!\!\int u_i(\vec{r})\, \sigma^2 \delta(\vec{r} - \vec{r}\,')\, u_j(\vec{r}\,')\,\mathrm{d}S'\,\mathrm{d}S = \sigma^2 \langle u_i, u_j \rangle = \sigma^2 \delta_{ij}. \quad (22)$$

Noise that originates from sources other than the sensors is generally colored (i.e., not white) and can be modeled with a covariance function $K_\nu(\vec{r}, \vec{r}\,') = \boldsymbol{\psi}_\nu(\vec{r})^\top \mathbf{K}_\nu\, \boldsymbol{\psi}_\nu(\vec{r}\,')$, where $\boldsymbol{\psi}_\nu(\vec{r})$ are the basis functions for the noise topography and $\mathbf{K}_\nu$ is the prior covariance of the noise coefficients.

We can now define the signal-to-noise ratio (SNR) of a random topography $t(\vec{r})$. We start by diagonalizing the noise kernel as in Eq. (21): $K_\nu(\vec{r}, \vec{r}\,') = \mathbf{w}(\vec{r})^\top \boldsymbol{\Lambda}\, \mathbf{w}(\vec{r}\,')$, where the $w_i(\vec{r})$ are eigenfunctions of $K_\nu$ and $\boldsymbol{\Lambda}$ contains the associated noise variances on its diagonal. The noise eigenfunctions can be used to compose a whitening kernel $W(\vec{r}, \vec{r}\,') = \mathbf{w}(\vec{r})^\top \boldsymbol{\Lambda}^{-1/2}\, \mathbf{w}(\vec{r}\,')$, which, when applied to the noise topography, $\int W(\vec{r}, \vec{r}\,')\, \nu(\vec{r}\,')\,\mathrm{d}S'$, yields spatial white noise.
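The whitening construction can be sketched in its discrete form. Below, a squared-exponential matrix is an illustrative stand-in for a colored noise covariance $K_\nu$ (with a small jitter for numerical stability); the whitener is $W = w\,\Lambda^{-1/2} w^\top$, so the whitened noise covariance $W K_\nu W^\top$ equals the identity.

```python
import numpy as np

N = 50
r = np.linspace(0.0, 1.0, N)

# Synthetic colored noise covariance (squared-exponential stand-in for K_nu)
K_nu = 0.1 * np.exp(-(r[:, None] - r[None, :])**2 / (2 * 0.1**2)) \
       + 1e-6 * np.eye(N)                     # jitter for stability

lam, w = np.linalg.eigh(K_nu)                 # noise variances, eigenvectors
W = w @ np.diag(lam**-0.5) @ w.T              # discrete whitening kernel

white = W @ K_nu @ W.T                        # whitened noise covariance -> I

# Spatial SNR of a signal kernel K_t as in Eq. (23): diag(W K_t W^T)
K_t = np.exp(-(r[:, None] - r[None, :])**2 / (2 * 0.3**2))
snr = np.diag(W @ K_t @ W.T)
```

For white noise ($K_\nu = \sigma^2 I$), `W` reduces to scaling by $1/\sigma$, matching the remark in the text.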
Applying the whitener to a random topography, we get a whitened topography $\tilde{t}(\vec{r}) = \int W(\vec{r}, \vec{r}\,')\, t(\vec{r}\,')\,\mathrm{d}S'$. The spatial SNR, i.e., the expected SNR for a sample at $\vec{r}$, is the variance of the whitened topography

$$\mathrm{SNR}(\vec{r}) = K_{\tilde{t}}(\vec{r}, \vec{r}) = \int\!\!\int W(\vec{r}, \vec{r}\,')\, K_t(\vec{r}\,', \vec{r}\,'')\, W(\vec{r}, \vec{r}\,'')\,\mathrm{d}S'\,\mathrm{d}S''. \quad (23)$$

With white noise, the whitening operation only acts as scaling with $1/\sigma$. With colored noise, it also modifies the covariance structure of the random topography.

In this and the following section, we discuss how an optimal sampling grid $R$ can be constructed. A classical approach would be to choose the $R$ that minimizes the expected reconstruction error, involving the inversion of $\boldsymbol{\Psi}^\top \boldsymbol{\Psi}$ (Sec. 3.2). In the Bayesian approach, $R$ should minimize the posterior variance, which involves the inversion of $\mathbf{K} + \boldsymbol{\Sigma}$ (Sec. 3.3). One way to obtain an optimal sampling is then to find an $R$ that maximizes the "size" of either of these matrices. The matrix size can generally be quantified in multiple ways, giving different optimality criteria in the context of (Bayesian) experimental design (Krause et al., 2008; Chaloner & Verdinelli, 1995).

Figure 4: Illustration of the orthonormal basis functions that are used to represent topographies. Top: Representation of the spatial-frequency (SF) basis and the topography eigenbasis due to IID sources. Each column shows the SF function and eigenfunction with the corresponding index. Spatial frequencies of the SF functions are displayed. The energy spectrum of the eigenfunction in the SF basis is shown at the bottom. The SF basis was generated using the zero-Neumann boundary condition (Sec. 4.1). The eigenfunctions comprise multiple SF functions with an average spatial frequency increasing together with the index. Bottom: Random topographies and their energy spectra both in the SF basis and the eigenbasis. Topography energy is distributed to lower-index components in the eigenbasis than in the SF basis, indicating compression by the eigenbasis.

A common criterion is
A-optimality (Krause et al., 2008), measured either as $\mathrm{Tr}[(\boldsymbol{\Psi}^\top \boldsymbol{\Psi})^{-1}]$ or, more generally, as $\mathrm{Tr}[(\boldsymbol{\Psi}^\top \boldsymbol{\Sigma}^{-1} \boldsymbol{\Psi} + \mathbf{K}_a^{-1})^{-1}]$. However, when studying sampling on a bounded surface as in MEG and EEG, we should rather measure how well the sampling pattern captures the overall variance on the surface. To optimize the measurement in this sense, we define the fractional explained variance

$$\mathrm{FEV}(R) = 1 - \frac{\int K_t(\vec{r}, \vec{r} \mid R)\,\mathrm{d}S}{\int K_t(\vec{r}, \vec{r})\,\mathrm{d}S} = \frac{\int \mathbf{k}(\vec{r})^\top (\mathbf{K} + \boldsymbol{\Sigma})^{-1} \mathbf{k}(\vec{r})\,\mathrm{d}S}{\int K_t(\vec{r}, \vec{r})\,\mathrm{d}S}, \quad (24)$$

which ranges from 0 to 1, i.e., from no to all variance explained. This measure is related to I-optimality or integrated optimality (Atkinson, 2014).

The matrix size can also be measured using the determinant; this criterion is called
D-optimality (Chaloner & Verdinelli, 1995). In the Bayesian setting, this leads to maximization of the total information (Appendix B)
$$\mathrm{TI}(R) = \tfrac{1}{2}\log\frac{\det(K+\Sigma)}{\det(\Sigma)} = \tfrac{1}{2}\log\det(\tilde{K} + I), \qquad (25)$$
where $\tilde{K} = \Sigma^{-1/2} K \Sigma^{-1/2}$ is the whitened sample covariance matrix. The D-optimal $R$ minimizes the posterior entropy of the random-field coefficients $a$ (Sebastiani & Wynn, 2000), i.e., maximizes the information gained, e.g., from the neural sources. Previously, total information has been used in studies comparing MEG sensor arrays (Kemppainen & Ilmoniemi, 1989; Nenonen et al., 2004; Schneiderman, 2014; Iivanainen et al., 2017; Riaz et al., 2017).

Figure 5: Illustration of covariance kernel and sampling. Left: Coefficient variance as a function of spatial frequency for a strictly bandlimited process and for a process with a smooth variance decay. Middle: The covariance functions and samples. The red and cyan curves describe the covariance functions for two samples, while the spikes show the sample positions. Equispaced samples fit the zeros of the sinc functions. For a non-sinc covariance, there is no trivial choice for the optimal sampling distance. Right: Sample covariance matrices for the two cases, i.e., the covariance functions sampled at the equispaced locations. Dashed lines indicate the rows of the two samples.

In this section, we suggest a method to obtain sampling grids that maximize the total information for given prior assumptions. In Appendix B, we show that this corresponds to maximizing the diagonal elements of the whitened sample covariance matrix $\tilde{K}$ while simultaneously minimizing the absolute values of the off-diagonal elements. In other words, maximizing total information is equivalent to finding the sampling grid with the least correlations and maximal SNR.

As an illustrating example, we first discuss how the sample spacing in the Shannon–Nyquist theorem (Shannon, 1949; Jerri, 1977) can be seen to result from information maximization. When bandlimited functions are treated as random processes with a uniform prior variance for the spatial-frequency coefficients up to $k_B$, the kernel $K(x,x')$ can be calculated as the Fourier transform of a boxcar function. This results in a sinc-function kernel with zeros at equispaced intervals of $1/(2k_B)$, illustrated in Fig. 5. When the samples are placed at the zero crossings, the sample covariance matrix $K$ becomes diagonal, maximizing the total information.

To maximize the total information for a given number of samples with a more general covariance kernel, we reformulate the problem using the kernel eigenfunctions $v_n(\vec{r})$. We consider a (whitened) kernel $K(\vec{r},\vec{r}\,')$ with an eigendecomposition $v(\vec{r})^\top D\, v(\vec{r}\,')$.
The decomposition can be rewritten as $K(\vec{r},\vec{r}\,') = \tilde{v}(\vec{r})^\top \tilde{v}(\vec{r}\,')$, where $\tilde{v}(\vec{r}) = D^{1/2} v(\vec{r})$; i.e., the covariance between points $\vec{r}$ and $\vec{r}\,'$ can be calculated as a Euclidean dot product between $\tilde{v}(\vec{r})$ and $\tilde{v}(\vec{r}\,')$. This is similar to the kernel trick used in machine learning (Williams & Rasmussen, 2006), where the $\tilde{v}(\vec{r})$ are called feature vectors. Rewriting the squared distance between these vectors as
$$\|\tilde{v}(\vec{r}) - \tilde{v}(\vec{r}\,')\|^2 = \|\tilde{v}(\vec{r})\|^2 + \|\tilde{v}(\vec{r}\,')\|^2 - 2\,\tilde{v}(\vec{r})^\top \tilde{v}(\vec{r}\,') = K(\vec{r},\vec{r}) + K(\vec{r}\,',\vec{r}\,') - 2K(\vec{r},\vec{r}\,'), \qquad (26)$$
we can see that maximizing this distance with respect to $\vec{r}$ and $\vec{r}\,'$ corresponds to maximizing the diagonal elements of the sample covariance matrix $K$ while simultaneously minimizing the off-diagonal elements.

The sample configuration that yields maximal information can be found with the farthest-point sampling algorithm (Eldar et al., 1997; Schlömer et al., 2011), which attempts to maximize the pair-wise minimum distances between the sample points. Instead of maximizing the distances $|\vec{r} - \vec{r}\,'|$, we maximize $\|\tilde{v}(\vec{r}) - \tilde{v}(\vec{r}\,')\|$; otherwise, the algorithm works similarly. If $K[i,j] = K(\vec{r}_i,\vec{r}_j)$ are positive for neighbouring samples, the algorithm leads to minimization of the off-diagonal $|K[i,j]|$, and thereby to maximization of the total information.

Generally, the sampling grid given by the method follows the covariance structure of the kernel (Fig. 3).
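A minimal sketch of this feature-space farthest-point sampling is given below. The kernel matrix, the 1-D "surface", and the Gaussian covariance are illustrative stand-ins, not the head-surface kernels of this work; note that the squared feature distance of Eq. (26) requires only kernel evaluations, so no explicit eigendecomposition is needed in the loop.

```python
import numpy as np

def farthest_point_sampling(K, n_samples, start=0):
    """Greedy farthest-point sampling in the feature space of kernel K.

    Squared feature-space distances follow Eq. (26):
        d2(i, j) = K[i, i] + K[j, j] - 2 K[i, j].
    Each iteration adds the candidate with the largest distance to its
    nearest already-selected point.
    """
    diag = np.diag(K)
    chosen = [start]
    # Squared feature distances from all candidates to the first sample.
    min_d2 = diag + diag[start] - 2.0 * K[:, start]
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(min_d2))
        chosen.append(nxt)
        # Keep, per candidate, the distance to its nearest chosen point.
        min_d2 = np.minimum(min_d2, diag + diag[nxt] - 2.0 * K[:, nxt])
    return np.array(chosen)

# Toy example: a stationary Gaussian kernel on a 1-D "surface". For a
# translation-invariant kernel, the method returns a uniform grid.
x = np.linspace(0.0, 1.0, 101)
K = np.exp(-(x[:, None] - x[None, :])**2 / (2 * 0.2**2))
idx = farthest_point_sampling(K, 5)
assert np.allclose(np.sort(x[idx]), [0.0, 0.25, 0.5, 0.75, 1.0])
```

With a location-dependent kernel such as that of IID cortical sources (Sec. 3.4), the same greedy loop instead densifies the grid where the kernel variance is high and the correlation length short.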
For example, approximately uniform sampling grids can be generated using a spatial-frequency-bandlimited covariance with circularly symmetric sinc-like kernels (Sec. 4.4). This showcases a connection between our sampling method and the sampling theorems that assume bandlimited functions and use uniform sampling (Shannon, 1949; Pesenson, 2015).
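The bandlimited special case can be checked numerically: for a 1-D sinc kernel, samples at the spacing $1/(2k_B)$ decorrelate exactly, and the total information of Eq. (25) drops if the same number of samples is packed more densely. A sketch, in which the band limit, noise level, and sample count are arbitrary illustrative values:

```python
import numpy as np

k_B = 2.0                        # band limit (spatial frequency)
dx_nyq = 1.0 / (2.0 * k_B)       # Nyquist spacing = zero spacing of the sinc

def K_sinc(x):
    """Bandlimited-prior kernel: Fourier transform of a boxcar up to k_B."""
    return np.sinc(2.0 * k_B * (x[:, None] - x[None, :]))

def total_info(K, noise_var=0.01):
    """Total information (Eq. 25) with white noise: 1/2 log det(K/s^2 + I)."""
    _, logdet = np.linalg.slogdet(K / noise_var + np.eye(K.shape[0]))
    return 0.5 * logdet

# Samples at the zero crossings of the sinc kernel: the sample covariance
# matrix becomes diagonal (the identity), maximizing total information.
x_nyq = np.arange(16) * dx_nyq
assert np.allclose(K_sinc(x_nyq), np.eye(16))

# The same 16 samples packed at half the spacing are mutually correlated
# and therefore carry less total information.
x_half = np.arange(16) * 0.5 * dx_nyq
assert total_info(K_sinc(x_half)) < total_info(K_sinc(x_nyq))
```

The second assertion is Jensen's inequality at work: both grids have the same total variance (trace), but the correlated grid spreads it unevenly over the eigenvalues, which the concave logarithm penalizes.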
4. Simulations
In this section, we simulate spatial-frequency (SF) spectra of source topographies and covariance kernels of random topographies due to random source distributions. Further, we construct optimal sampling grids for different kernels and evaluate their performance. We begin by outlining the numerical methods.
We used a head model built using the example data of the SimNIBS software (version 2.0; Windhoff et al. 2013) in an earlier study (Stenroos & Nummenmaa, 2016). The head model consists of the outer surfaces of the white matter in the left and right hemispheres, gray matter, cerebellum and scalp, as well as the inner and outer surfaces of the skull.

We assumed a piecewise-constant isotropic conductor and computed the electric potential and magnetic field using a linear Galerkin boundary-element method with the isolated-source approach (Stenroos et al., 2007; Stenroos & Sarvas, 2012). We used two source spaces discretized into point-like dipolar source-current elements. To analyze the spatial-frequency content of the source topographies independently of the source orientation, we constructed a volumetric source space comprising sets of three orthogonal dipoles at 4-mm spacing inside the gray-matter surface (10 635 positions). For all other analyses, we used a cortically-constrained source space with dipoles distributed on the boundary of white and gray matter at 3-mm average spacing. The 20 324 sources were oriented normal to the surface, following the anatomical orientation of the apical dendrites of pyramidal neurons.

We constructed three volume conductor models from the anatomical head model (see Fig. 6C). The model 4C used in most simulations had the following compartments: brain (inside the outer surface of gray matter) and cerebellum, cerebrospinal fluid (CSF), skull, and scalp. The conductivity of the soft tissues (brain, cerebellum and scalp) was set to 0.33 S/m, while the conductivity of CSF was 1.79 S/m. For the skull conductivity, we used the value 0.33/50 S/m. Two other volume conductor models were constructed by removing compartments from 4C in order to illustrate how volume conduction affects the source topographies.
In a three-compartment model (3C), CSF was removed from 4C so that the conductivity boundaries were defined by the inner and outer skull surfaces as well as the scalp surface (brain, skull and scalp). In a single-compartment conductor (1C), all compartments except the scalp surface were removed.

We calculated the electric potential and the normal component of the magnetic field at the nodes of triangular meshes that represented the measurement surfaces. The measurement surfaces were generated from a dense scalp mesh by cutting the mesh above a plane defined roughly by the ears and nose. The cut mesh was resampled to 5 404 nodes and 10 421 triangles. The mesh for the electric potential (EEG) was obtained by projecting the node positions onto the scalp. An 'on-scalp' MEG surface was generated by inflating the mesh 4.5 mm away from the scalp. A more distant 'off-scalp' MEG surface was obtained by further inflating the mesh and smoothing it with the function smoothsurf in the iso2mesh MATLAB toolbox (Fang & Boas, 2009). The median distance of the nodes of the off-scalp MEG surface to the scalp was 2.4 cm (2.0–4.4 cm).

The SF basis vectors were computed by discretizing the LB operator on the triangle mesh in the weak form (Reuter et al., 2009). The discrete form of the eigenvalue equation (3) is
$$-C u_i = k_i^2 M u_i, \qquad (27)$$
where $u_i$ contains the nodal values of the $i$th SF function, $M$ is a mass matrix that takes into account the overlap of the piecewise-linear basis functions, and $C$ is the discrete LB operator. The matrices $C$ and $M$ were computed using the MATLAB functions cotmatrix and massmatrix included in gptoolbox (Jacobson et al., 2018). The zero-Neumann boundary condition, which sets the outward-facing derivative of $u$ to zero, was used in the spectral energy analysis, while the zero-Dirichlet boundary condition, setting $u$ to zero at the boundary, was used in the grid construction. The inner product of Eq.
(4) discretizes to
$$\langle u_i, u_j \rangle = \int_S u_i(\vec{r})\, u_j(\vec{r})\, \mathrm{d}S \approx u_i^\top M u_j. \qquad (28)$$
The IID source-topography kernel ($K_{\mathrm{IID}}$; Sec. 3.4) was constructed using the cortically-constrained source space and 4C. The eigenbasis of $K_{\mathrm{IID}}$ was computed by discretizing the kernel on the surface and solving for the discrete eigenvectors $v_i$.

We quantified the source-topography energies and their spectra to estimate the beneficial numbers of spatial samples, as discussed in Sec. 3.2. The spectra were computed using the SF basis and the eigenbasis of $K_{\mathrm{IID}}$. Specifically, we analyzed the number of SF components needed to capture 99% of the energy of each source topography; we denote this number by $B_{99}$. The maximum spatial frequency of those SF components was defined as the 99% bandwidth of the source topography, $k_{99}$. In order to quantify source-topography compression, we calculated the number of eigencomponents needed to achieve 99% energy, $M_{99}$, and the ratio $M_{99}/B_{99}$.

For the locations of the volumetric sources, the topography energies and $B_{99}$ were computed by summing the squared coefficients of the three orthogonal dipoles at the same position. To see how the properties of the volume conductor affect the spectra, the analysis was repeated in all volume conductor models. To model realistic source orientations, the energy and $B_{99}$ analysis was repeated in the cortically-constrained source space. The sensor spacing in each modality that corresponds to the maximum $B_{99}$ across the sources was estimated using uniform sampling grids (see Sec. 4.4).

We further inspected how many SF components were above a certain spatial white-noise level. We matched the noise levels of MEG and EEG: $B_{99}$ components of the highest-energy source topography were above the noise level. For on-scalp and off-scalp MEG, the same noise level was used.

Results.
Fig. 6A–B displays $B_{99}$ for each volumetric source position. Generally, $B_{99}$ decreases as a function of source depth. For the most superficial sources, $B_{99}$ ranges from 200 to 280 in on-scalp MEG, while in off-scalp MEG and in EEG it is up to 90. The bandwidth $k_{99}$ of the most superficial sources in on-scalp MEG is about 2.1 1/cm, while in off-scalp MEG and EEG it is approximately 1.0 and 1.2 1/cm, respectively.

The numbers of topography eigencomponents $M_{99}$ are shown in Fig. 6B. For on-scalp MEG, up to 179 components are required, while the corresponding counts for off-scalp MEG and EEG are 61 and 62. The number of eigencomponents $M_{99}$ is typically lower than the number of SF components $B_{99}$. The ratio $M_{99}/B_{99}$ is on average 81%, 67% and 73% in on-scalp MEG, off-scalp MEG and EEG, with ranges 42–200%, 30–130% and 30–180%, respectively.

Topography energy as a function of source depth is shown in Fig. 6E. For superficial sources, the energy is 6–7 times higher in on-scalp than in off-scalp MEG. The effect of the different compartments of the volume conductor on the topography energy spectra is visualized in Fig. 6D–E. In on-scalp and off-scalp MEG, the topography energy spectra are relatively unaffected by the CSF and skull in the volume conductor. On the contrary, the topography spectra in EEG show a dependency on the spatial filtering occurring due to volume conduction: for example, in 1C, the maximum $B_{99}$ is roughly 350, giving a $k_{99}$ of about 2.5 1/cm, while including the poorly conducting skull (3C) causes spatial low-pass filtering, leading to a maximum $B_{99}$ of 120.

Topography energies and SF component counts for the cortically-constrained sources are shown in Fig. 7. In MEG, superficial tangential sources have the highest energies. The maximum $B_{99}$ is about 300, 100 and 110 in on-scalp MEG, off-scalp MEG and EEG ($k_{99}$: 2.2, 1.0 and 1.3 1/cm). For these maximum $B_{99}$, the estimated uniform sample spacings are 1.5, 3.4 and 2.6 cm.
With the set noise level (0.5 fT and 2.4 nV, assuming a source amplitude of 20 nAm), at maximum 305, 90 and 105 SF components are above the noise in on-scalp MEG, off-scalp MEG and EEG, respectively; with an about 18 times higher noise level (9.2 fT and 44 nV), those counts are 114, 43 and 40.

We analyzed the spatial variance $K_{\mathrm{IID}}(\vec{r},\vec{r})$ (amount of signal) and the correlations $K_{\mathrm{IID}}(\vec{r}_i,\vec{r})$ (sample spacing; Sec. 3.4). The number of eigencomponents explaining 99% of the total variance was calculated to estimate the spatial degrees of freedom of the random source distribution (Sec. 3.4). The topography correlation length was quantified for each measurement point $\vec{r}_i$ by computing the distance along the surface at which $K_{\mathrm{IID}}(\vec{r}_i,\vec{r})$ had decayed to half of its maximum value (the 'half-maximum width'). The distances along the surfaces were computed using a method based on the heat equation (Crane et al., 2017), implemented as the MATLAB function heat_geodesic in gptoolbox (Jacobson et al., 2018).

Results.
Fig. 8 illustrates the kernels of the IID random source distribution for the three modalities. To explain 99% of the total variance, 88, 35 and 25 eigencomponents are needed in on-scalp MEG, off-scalp MEG and EEG, respectively. The variance is distributed nonuniformly on the measurement surface, with the highest values around the temporal cortex. The kernels are asymmetric, and the correlation length varies across the surface; the least correlated area is found on top of the temporal cortex. The correlation lengths are shortest in on-scalp MEG, as indicated by the smaller half-maximum widths compared to off-scalp MEG and EEG.
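As a 1-D analogue of the discretization in Eqs. (27)–(28), the sketch below assembles piecewise-linear finite-element stiffness and mass matrices on an interval with zero-Neumann boundaries and solves the generalized eigenvalue problem. Here $S$ plays the role of $-C$; the interval, element count, and SciPy solver are illustrative choices, while on the head surfaces cotmatrix and massmatrix of gptoolbox provide the corresponding matrices.

```python
import numpy as np
from scipy.linalg import eigh

# Piecewise-linear (hat-function) FEM on [0, pi], zero-Neumann boundaries.
L, n = np.pi, 200
h = L / (n - 1)

# Stiffness matrix S (the role of -C in Eq. 27).
d = np.full(n, 2.0 / h); d[[0, -1]] = 1.0 / h
S = np.diag(d) - np.diag(np.full(n - 1, 1.0 / h), 1) \
              - np.diag(np.full(n - 1, 1.0 / h), -1)

# Mass matrix M, accounting for the overlap of the hat functions (Eq. 28).
m = np.full(n, 2.0 * h / 3.0); m[[0, -1]] = h / 3.0
M = np.diag(m) + np.diag(np.full(n - 1, h / 6.0), 1) \
              + np.diag(np.full(n - 1, h / 6.0), -1)

# Generalized eigenproblem S u = k^2 M u; eigh returns M-orthonormal
# eigenvectors, i.e., u_i^T M u_j = delta_ij, matching Eq. (28).
k2, U = eigh(S, M)

# With Neumann boundaries on [0, pi], the exact eigenvalues are 0, 1, 4, 9, ...
assert np.allclose(k2[:4], [0.0, 1.0, 4.0, 9.0], atol=1e-2)
assert np.allclose(U.T @ M @ U, np.eye(n), atol=1e-8)
```

The eigenvalues $k^2$ give the squared spatial frequencies of the SF basis functions, and the $M$-orthonormality of the eigenvectors is exactly the discrete inner product used to compute topography energy spectra.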
The sampling grids were constructed by subsampling the nodal positions of the triangle meshes using the method described in Sec. 3.6. For a given kernel discretized on the mesh, we calculated its eigendecomposition and utilized a farthest-point sampling algorithm implemented in gptoolbox (Jacobson et al., 2018) to obtain the optimal sampling positions. We ran the optimization algorithm several times with random initial configurations to seek the global optimum, as the output of the algorithm was sensitive to the initial condition.

Figure 6: Basis-function representations of topographies of volumetric sources (A–B) and their dependency on the volume conductor (C–E). A: Number of spatial-frequency (SF) components needed to reach 99% of the energy ($B_{99}$) for each source position, visualised on orthogonal slices of the brain volume. B: Component counts in the SF basis ($B_{99}$) and eigenbasis ($M_{99}$) as a function of source depth. The dots correspond to individual sources, and the lines estimate the medians of the distributions as a function of source depth. Scatterplots across the source positions compare $M_{99}$ and $B_{99}$. C: Volume conductors with different numbers of compartments (4C, 3C and 1C). D: Energy spectra of the topographies averaged across the sources with different conductor models. E: Topography energy and $B_{99}$ with different conductor models.

Figure 7: Topography energy and spatial-frequency component count of cortically-constrained sources. Left:
Normalized topography energy (same normalization for on-scalp and off-scalp MEG). Center: Number of components needed to capture 99% of the topography energy ($B_{99}$). Right: Number of components above the indicated spatial white-noise level (given as a ratio to the source amplitude).

Figure 8: Topography kernel due to independent Gaussian sources with equal variance. A: Variance as a function of the kernel eigencomponent. The inset shows the normalized eigenvalues (variances). B: Distribution of the variance on the measurement surface. C: Spatial profile of the field kernel at one position on the measurement surface and its decay as a function of distance along the surface. The length at which the kernel has decayed to half (half-maximum width) is illustrated. D: Half-maximum width of the kernel across the measurement surface.
Grid construction.
We constructed sampling grids for scenarios in which random source distributions were defined in the brain: a global scenario, where the whole brain was assumed active and of interest, and a local scenario, where a region of interest (ROI) in the brain was defined. We compared the performance of uniform sampling to model-informed sampling for different numbers of spatial samples $N$. Uniform sampling was generated using bandlimited priors in the SF basis (white covariance for the $N$ lowest SF coefficients; SF bandlimited prior; Sec. 3.4). Model-informed sampling was generated by encoding the prior in the covariance of the cortically-constrained sources (IID source prior; Sec. 3.4). The grids were evaluated by computing the total information (TI) and the fractional explained variance (FEV) according to Eqs. (25) and (24) from the 'ground-truth' model of the signal and noise. Here, the ground-truth model corresponded to the IID source prior and spatial white noise. The grids were constructed on meshes with the regions of the eyes cropped out.

We performed two analyses. In the first, we compared uniform and model-informed sampling in EEG and MEG as a function of SNR. The standard deviation of the white noise ranged from 0.05σ to 20σ. The reference noise level σ was fixed in each modality so that it gave an average SNR (Eq. 23) of 1 for a source standard deviation of 7.7 pAm. Using this convention, σ was determined to be 9 fT, 3.7 fT and 42 nV in on-scalp MEG, off-scalp MEG and EEG, respectively.

In the second analysis, we compared uniform sampling of on-scalp and off-scalp MEG as a function of the spatial white-noise level. We set the reference source variance $q$ so that the average SNR of off-scalp MEG was 1 with a white-noise level of 3 fT. The white-noise level of on-scalp MEG ranged from 1 to 20 fT, and the source variance was varied (0.2$q$, $q$ and 5$q$). On-scalp MEG grids comprising 10–600 points were generated, while off-scalp MEG had 100 or 300 points.

Results.
Fig. 9 summarizes the first analysis in the global scenario. For model-informed sampling, the sample density is highest over the temporal lobe, which corresponds to the shorter correlation lengths and higher variance of the topography kernel shown in the previous analysis (Fig. 8). Model-informed sampling is especially beneficial in MEG at low SNR.

For source variance $q$, 31, 125 and 260 samples are needed in on-scalp MEG with noise levels of 3, 7 and 10 fT, respectively, to yield the same TI as 300 off-scalp MEG samples. The spacings corresponding to those sample numbers in on-scalp MEG are about 4.4, 2.2 and 1.5 cm (Fig. 9A). For a noise level of 15 fT, roughly 590 samples are needed. For a lower source variance of 0.2$q$, the corresponding sample numbers for equal TI are 33, 173 and 357, while for a higher variance of 5$q$ they are 34, 92 and 180. With the higher source variance, about 400 on-scalp samples with a noise level of 15 fT give the same TI as 300 off-scalp samples.
Figure 9: Comparison of uniform (SF bandlimited prior) and model-informed (IID source prior) sampling of the whole brain in on-scalp MEG, off-scalp MEG and EEG. A: Example on-scalp MEG sampling grids generated with the two priors and their average and minimum sample spacings. B: Total information (TI) and fractional explained variance (FEV) computed with different SNRs for uniform grids. C: Ratio of the TI obtained by model-informed and uniform sampling with different SNRs (top) and the number of additional samples required with uniform sampling to achieve the same TI as with model-informed sampling (bottom).
In the local scenario, the ground-truth model consisted of IID sources distributed in an ROI defined around the motor cortex. Both white (sensor) and colored (sensor + background brain activity) noise were considered. The ROI consisted of sources within a patch that had a radius of approximately 3 cm along the cortical surface (387 source elements; Fig. 11A). Sampling grids were constructed for on-scalp MEG and EEG.

The source variance $q$ was set so that the maximum SNR of on-scalp MEG was 3 with a spatial white-noise level of 9 fT. The white-noise level of EEG (24 nV) was set so that the maximum SNR was also 3. To model colored noise due to brain background activity, the sources outside the ROI were assumed active (IID Gaussian; variance $q$/100).

The topography kernel was whitened as described in Sec. 3.4, and the SNR was analyzed. The eigenvalues $\lambda_i$ of the whitened kernel were converted to bits as $\tfrac{1}{2}\log(\lambda_i + 1)$.

Results.
Fig. 11A shows the SNR distributions as well as the TI content across the eigencomponents. With white noise, the TI is 28 bits in MEG and 27 bits in EEG. In MEG, 19 eigencomponents have a TI content larger than 0.1 bits, while this number is 13 in EEG. With colored noise, the maximum SNR is 1.1 in MEG and 0.3 in EEG, while the TI is 18 and 12 bits, respectively. The number of eigencomponents above 0.1 bits is in this case 16 and 12 in MEG and EEG, respectively.

The grids constructed using the different priors for signal and noise are presented in Fig. 11B. Compared to uniform sampling, the model-informed grids are more densely distributed, especially when the colored-noise model is used. Compared to whole-head sampling, distributing the samples on the region with high SNR (Fig. 11A) is beneficial in terms of TI
Figure 10: Spatial sampling of independent sources (variance $q$) distributed over the whole cortex with on-scalp and off-scalp MEG. A: Signal-to-noise ratio averaged over the whole measurement surface as a function of the spatial white-noise level. B: Total information (TI) and fractional explained variance (FEV). Colored lines give the values for on-scalp MEG as a function of sample number and noise level. The vertical lines show the values for off-scalp MEG with 100 and 300 samples at a noise level of 3 fT. The source variance is $q$. C: Number of spatial samples needed in on-scalp MEG to achieve the same TI as in off-scalp MEG (noise level 3 fT; 100 or 300 samples) as a function of the on-scalp MEG white-noise level.

both in EEG and MEG. Grids constructed with source priors give the highest TI, especially in MEG (Fig. 11C). For example, to reach 5 bits of information in the situation with white noise, 11 samples are needed in the model-informed MEG grids, while 80 samples are needed in the whole-head grid; for EEG, the corresponding sample counts are 19 and 50. With colored noise, the corresponding sample counts are about 14 and 120 in MEG and 40 and 200 in EEG, respectively.
5. Discussion
We quantified the number of spatial samples beneficial for MEG and EEG by analyzing the spatial-frequency content of the source topographies. Based on a review of Gaussian processes and experimental design, we related the information metric used in MEG sensor-array designs to the covariance of random-field models. This relationship was used to derive an information-maximizing sampling algorithm. We applied the algorithm to generate sampling grids, which we used to quantify the benefit of model-informed sampling of random source distributions in comparison to uniform sampling.
To capture 99% of the topography energy of every source in the brain, approximately 300, 100 and 110 SF components were needed in on-scalp MEG, off-scalp MEG and EEG, respectively. These numbers represent the maximum number of spatial degrees of freedom in any topography due to neural activity. Additionally, they correspond to the number of uniform samples needed to achieve at most roughly 1% reconstruction error in noiseless conditions. These results are in good accordance with previously published results. For MEG, this corresponds to the "rule of thumb" presented by Ahonen et al. (1993): the sensor spacing should be approximately the distance of the sensors to the closest sources. For EEG, this is in line with the arguments by Srinivasan et al. (1998): the sensor count should be at least roughly 120 for adequate sampling of the EEG of a typical adult head.

The spatial bandwidths of the source topographies did not depend on the level of detail of the volume conductor in MEG. In contrast, the bandwidths in EEG were affected by conductivity differences within the head, demonstrating the well-known spatial low-pass filtering of EEG (Srinivasan et al., 1996). Therefore, the beneficial sensor spacing and the spatial correlations in MEG depend mainly on the measurement distance, while those in EEG also depend on the tissue conductivity contrasts.

The eigenbasis of the topography kernel of IID sources was shown to reduce the number of components needed to capture 99% of the source-topography energies, compressing the topography representations on average by 19, 33 and 27% compared to the SF basis (on-scalp MEG, off-scalp MEG, and EEG). This compression and the variance analysis of IID sources suggest that the sample numbers given by the SF analysis provide oversampling and thereby give a good upper limit on the beneficial sample number.
To generate sampling grids, we presented a method that utilizes prior information about the signal and noise in the form of a covariance kernel. Assuming random spatial-frequency coefficients up to some band limit, the kernel is isotropic and translation-invariant, and the method yields uniform sampling grids similar to those in the sampling theorems introduced in Sec. 3.1. With this approach, uniform grids may be constructed also on more complex surfaces and domains for
Figure 11: Spatial sampling of on-scalp MEG (left panel; A, C and E) and EEG (right; B, D and F) topographies due to uncorrelated sources in an ROI around the motor cortex (shown in the center in red on the inflated cortical surface). A & B: Eigencomponent distribution of the total information (TI) and spatial distribution of the SNR computed with spatial white noise as well as with colored noise (brain background activity + white noise). C & D: Example grids constructed with IID source and bandlimited SF (∼ uniform sampling) priors. With the IID source prior, two noise priors (white and colored) were considered. E & F: TI as a function of the number of samples in the grids computed using the two noise models.

applications that require uniform sampling. The kernel constructed from the topographies of IID random neural sources has a location-dependent shape and length, and the method yields nonuniform sampling grids.

In the case of whole-head sampling of MEG topographies due to random sources distributed over the cortex, nonuniform model-informed sampling was only slightly beneficial compared to uniform sampling. When the number of samples was small and the noise level high, nonuniform sampling yielded roughly 10–25% more information than uniform sampling. When the noise level was decreased or the number of samples increased, the performance of uniform and nonuniform sampling grids became roughly similar. With nonuniform sampling, the same total information was, however, reached with a smaller number of samples; the reduction in sample number was proportional to the total sample count and did not strongly depend on the SNR. In EEG, the benefits of nonuniform sampling were less clear; EEG may not benefit from it due to long-range correlations. However, the nonuniform grids generated with the method may have been suboptimal for EEG: the method assumed positive correlations, while EEG has substantial negative correlations at long distances (Fig. 8). Altogether, among the modalities, with similar SNR and number of samples, on-scalp MEG yielded the most information, and more measurements were needed to explain the same amount of variance.

When a cortical region of interest was defined, local high-density (nonuniform) spatial sampling was beneficial; dense sampling of the local components yielded higher total information than uniform sampling covering a larger surface area or the whole head. Dense sampling may be especially useful when colored noise fields (such as those generated by background brain activity) are present and the region of interest is small.
Local dense arrays could be beneficial in applications where a certain part of the brain is of interest and the number of sensors is limited (e.g., Iivanainen et al. 2019), or in brain–computer interfacing, where simple measurement setups are desirable.
The sampling-grid optimization presented here does not necessarily result in sensor arrays that can be readily implemented in practice. For example, the minimum distance between the sensors was not constrained, and the sensor dimensions were ignored. Modeling the sensor dimensions is important, as they limit the minimum distance between the sensors, and the sensor noise level is proportional to them in many sensor types (e.g., Mitchell & Alvarez 2019; Kemppainen & Ilmoniemi 1989). Moreover, spatial integration within a sensor can be viewed as spatial low-pass filtering, an effect that can be used to reduce aliasing due to high spatial frequencies (Roth et al., 1989).

The optimization method at its current stage is useful for understanding how to place sensors in specific experiments, such as when a certain brain region is measured. It can also be used to aid practical sensor-array design by informing whether it is beneficial to decrease sensor noise, add sensors, or switch from uniform to nonuniform sampling. In the future, the sensor dimensions and a constraint on the smallest allowed sensor distance could be included, and the method could be used with a population of head models so that a sensor array with the best average performance could be designed.

We analyzed scalar fields, which is sufficient in EEG. In contrast, the magnetic field is a vector field, and the orthogonal field components may each provide additional information (Iivanainen et al., 2017). The sensor orientation could be included in the optimization by applying vector basis functions to assemble the kernel. Also, (external) interference fields could be taken into account.
In this case, vector spherical harmonics could be useful (Taulu & Kajola, 2005); sampling grids with multiple layers of vector samples at different distances from the scalp may be beneficial (Nurminen et al., 2010, 2013).

With the generalization of the SF basis to an arbitrary measurement surface, standard 1-D signal-processing methods such as filtering and interpolation should be straightforward to apply to spatial EEG/MEG data. Spatial-frequency filtering could be useful in noise and interference rejection, as demonstrated by Graichen et al. (2015); the continuous SF basis functions make the filtering less dependent on the sensor configuration. As the SF basis is both data- and model-independent, the filtering methods may be useful in real-time applications, such as brain–computer interfaces. SF filtering as a part of preprocessing would be similar to the spline Laplacian (Nunez et al., 2019), providing flexibility in defining the filter coefficients. Filtering and interpolation may also help in data visualization.

Estimating the neural sources that could have generated the data, i.e., solving the bioelectromagnetic inverse problem (Sarvas, 1987), is often of interest. We did not explicitly consider the inverse problem. However, the total information should be an estimator of source-estimation performance, as it quantifies the size (or the stability of the inversion) of the modelled or measured measurement covariance matrix. In other terms, when the kernel is generated using source topographies, total-information maximization corresponds to minimization of the posterior entropy of the sources (Sec. 3.5). Moreover, the Gaussian random-field model has a direct connection to minimum-norm source estimation (Appendix A).
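The spatial-frequency filtering discussed above reduces to a projection, attenuation, and reconstruction in the SF basis. A minimal 1-D sketch, in which cosine (DCT-II) functions, the zero-Neumann eigenbasis of an interval, stand in for the discrete Laplace–Beltrami eigenfunctions of a head surface; the signal, interference, and cutoff are illustrative assumptions:

```python
import numpy as np

# Orthonormal 1-D SF basis: cosines on a uniform grid (DCT-II convention).
n = 128
x = (np.arange(n) + 0.5) * np.pi / n
basis = np.cos(np.outer(np.arange(n), x)) * np.sqrt(2.0 / n)
basis[0] /= np.sqrt(2.0)                  # orthonormalize the constant mode

signal = np.cos(3 * x)                    # low-SF topography of interest
interference = 0.5 * np.cos(40 * x)       # high-SF interference
data = signal + interference

coeffs = basis @ data                     # analysis: SF coefficients
coeffs[10:] = 0.0                         # low-pass: drop SF indices >= 10
filtered = basis.T @ coeffs               # synthesis: filtered topography

assert np.allclose(basis @ basis.T, np.eye(n))  # the SF basis is orthonormal
assert np.max(np.abs(filtered - signal)) < 1e-8
```

On a measurement surface, the same three lines of analysis, attenuation, and synthesis apply with the mesh eigenvectors of Eq. (27) and the mass-matrix inner product of Eq. (28) in place of the plain dot products.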
6. Conclusions
We analyzed the spatial sampling of EEG and MEG and suggested a method for designing optimal sampling positions. Our simulations of neuronal fields suggest that on-scalp MEG can benefit from 300 spatial samples, while for off-scalp MEG and EEG the beneficial number of samples is roughly 100. Our method for sample positioning can be used to design sampling grids that convey the most information about the neuronal sources. Such designs may be useful when the number of sensors is limited or when a certain region of the brain is of interest.
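The sample-positioning idea summarized above can be illustrated with a toy greedy search: at each step, add the candidate location that most increases the total information of the grid. The RBF kernel, candidate cloud, and noise level below are illustrative stand-ins, not the model or implementation used in this work.

```python
import numpy as np

# Toy sketch (not the authors' code): greedy sensor placement maximizing
# total information TI = 1/2 * log2 det(K_S / sigma^2 + I) over candidate
# sampling locations S. An RBF kernel stands in for a kernel assembled
# from neuronal source topographies.

rng = np.random.default_rng(0)
candidates = rng.uniform(-1, 1, size=(200, 3))   # candidate sensor positions
noise_var = 0.01                                 # white sensor-noise variance

def rbf_kernel(X, length_scale=0.5):
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

K = rbf_kernel(candidates)

def total_information(idx):
    Ks = K[np.ix_(idx, idx)]
    n = len(idx)
    return 0.5 * np.linalg.slogdet(Ks / noise_var + np.eye(n))[1] / np.log(2)

selected = []
for _ in range(10):                              # place 10 sensors greedily
    gains = [(total_information(selected + [j]), j)
             for j in range(len(candidates)) if j not in selected]
    _, best_j = max(gains)
    selected.append(best_j)

print("selected sites:", selected)
print(f"total information: {total_information(selected):.2f} bits")
```

Greedy maximization of this kind is near-optimal for such submodular information objectives (Krause et al., 2008), which is what makes the one-sensor-at-a-time strategy reasonable.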
Conflicts of interest
The authors declare that they have no conflict of interest.
Data and code availability statement
To facilitate further use in the research community, the information-maximizing sampling algorithm is available from the corresponding authors for academic use. The data and code sharing policy adopted by the authors complies with the requirements of Aalto University.
Acknowledgments
This work has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 686865 (project BREAKBEN), the European Research Council under grant agreement No 678578 (project HRMEG), the National Institute of Neurological Disorders and Stroke of the National Institutes of Health under Award Number R01NS094604, the Finnish Cultural Foundation under grant nos. 00170330 and 00180388 (JI), and the Vilho, Yrjö and Kalle Väisälä Foundation (AM). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding organizations.
References
Abrahamsen, P. (1997). A review of Gaussian random fields and correlation functions.
Ahonen, A. I., Hämäläinen, M. S., Ilmoniemi, R. J., Kajola, M., Knuutila, J. E. T., Simola, J., & Vilkman, V. A. (1993). Sampling theory for neuromagnetic detector arrays. IEEE Transactions on Biomedical Engineering, 859–869.
Atkinson, A. C. (2014). Optimal design. Wiley StatsRef: Statistics Reference Online, 1–17.
Baillet, S. (2017). Magnetoencephalography for brain electrophysiology and imaging. Nature Neuroscience, 327.
Boto, E., Bowtell, R., Krüger, P., Fromhold, T. M., Morris, P. G., Meyer, S. S., Barnes, G. R., & Brookes, M. J. (2016). On the potential of a new generation of magnetometers for MEG: A beamformer simulation study. PLOS ONE, e0157655.
Brodbeck, V., Spinelli, L., Lascano, A. M., Wissmeier, M., Vargas, M.-I., Vulliemoz, S., Pollo, C., Schaller, K., Michel, C. M., & Seeck, M. (2011). Electroencephalographic source imaging: a prospective study of 152 operated epileptic patients. Brain, 2887–2897.
Bronstein, M. M., Bruna, J., LeCun, Y., Szlam, A., & Vandergheynst, P. (2017). Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine, 18–42.
Budker, D., & Romalis, M. (2007). Optical magnetometry. Nature Physics, 227.
Chaloner, K., & Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science, 273–304.
Chilès, J.-P., & Desassis, N. (2018). Fifty years of kriging. In Handbook of Mathematical Geosciences (pp. 589–612). Springer.
Crane, K., Weischedel, C., & Wardetzky, M. (2017). The heat method for distance computation. Communications of the ACM, 90–99. URL: http://doi.acm.org/10.1145/3131280
Dale, A. M., & Sereno, M. I. (1993). Improved localization of cortical activity by combining EEG and MEG with MRI cortical surface reconstruction: a linear approach. Journal of Cognitive Neuroscience, 162–176.
de Munck, J. C., Vijn, P. C. M., & Lopes da Silva, F. H. (1992). A random dipole model for spontaneous brain activity. IEEE Transactions on Biomedical Engineering, 791–804.
Eldar, Y., Lindenbaum, M., Porat, M., & Zeevi, Y. Y. (1997). The farthest point strategy for progressive image sampling. IEEE Transactions on Image Processing, 1305–1315.
Faley, M., Dammers, J., Maslennikov, Y., Schneiderman, J., Winkler, D., Koshelets, V., Shah, N., & Dunin-Borkowski, R. (2017). High-Tc SQUID biomagnetometers. Superconductor Science and Technology, 083001.
Fang, Q., & Boas, D. A. (2009). Tetrahedral mesh generation from volumetric binary and grayscale images. In Biomedical Imaging: From Nano to Macro (pp. 1142–1145). IEEE.
Freeman, W. J., Holmes, M. D., Burke, B. C., & Vanhatalo, S. (2003). Spatial spectra of scalp EEG and EMG from awake humans. Clinical Neurophysiology, 1053–1068.
Graichen, U., Eichardt, R., Fiedler, P., Strohmeier, D., Zanow, F., & Haueisen, J. (2015). SPHARA—a generalized spatial Fourier analysis for multi-sensor systems with non-uniformly arranged sensors: Application to EEG. PLoS ONE, e0121741.
Grover, P., & Venkatesh, P. (2016). An information-theoretic view of EEG sensing. Proceedings of the IEEE, 367–384.
Hämäläinen, M., Hari, R., Ilmoniemi, R. J., Knuutila, J., & Lounasmaa, O. V. (1993). Magnetoencephalography—theory, instrumentation, and applications to noninvasive studies of the working human brain. Reviews of Modern Physics, 413.
Hedrich, T., Pellegrino, G., Kobayashi, E., Lina, J.-M., & Grova, C. (2017). Comparison of the spatial resolution of source imaging techniques in high-density EEG and MEG. NeuroImage, 531–544.
Hill, R. M., Boto, E., Rea, M., Holmes, N., Leggett, J., Coles, L. A., Papastavrou, M., Everton, S., Hunt, B., Sims, D. et al. (2020). Multi-channel whole-head OPM-MEG: Helmet design and a comparison with a conventional system. NeuroImage, 116995.
Iivanainen, J., Stenroos, M., & Parkkonen, L. (2017). Measuring MEG closer to the brain: Performance of on-scalp sensor arrays. NeuroImage, 542–553.
Iivanainen, J., Zetter, R., & Parkkonen, L. (2019). Potential of on-scalp MEG: Robust detection of human visual gamma-band responses. Human Brain Mapping.
Jacobson, A. et al. (2018). gptoolbox: Geometry Processing Toolbox. http://github.com/alecjacobson/gptoolbox
Jerri, A. J. (1977). The Shannon sampling theorem—its various extensions and applications: A tutorial review. Proceedings of the IEEE, 1565–1596.
Kemppainen, P., & Ilmoniemi, R. (1989). Channel capacity of multichannel magnetometers. In Advances in Biomagnetism (pp. 635–638). Springer.
Krause, A., Singh, A., & Guestrin, C. (2008). Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 235–284.
Levy, B. (2006). Laplace–Beltrami eigenfunctions towards an algorithm that "understands" geometry. In IEEE International Conference on Shape Modeling and Applications 2006 (SMI'06) (pp. 13–13). IEEE.
Lindley, D. V. (1956). On a measure of the information provided by an experiment. The Annals of Mathematical Statistics, 986–1005.
Loève, M. (1978). Probability Theory II (4th ed.). Springer.
Lüke, H. D. (1999). The origins of the sampling theorem. IEEE Communications Magazine, 106–108.
McEwen, J. D., & Wiaux, Y. (2011). A novel sampling theorem on the sphere. IEEE Transactions on Signal Processing, 5876–5887.
Mercer, J. (1909). Functions of positive and negative type, and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society of London, Series A, 415–446.
Mitchell, M. W., & Alvarez, S. P. (2019). Quantum limits to the energy resolution of magnetic field sensors. arXiv preprint arXiv:1905.00618.
Nenonen, J., Kajola, M., Simola, J., & Ahonen, A. (2004). Total information of multichannel MEG sensor arrays. In Proceedings of the 14th International Conference on Biomagnetism (Biomag2004) (pp. 630–631).
Nunez, P. L., Nunez, M. D., & Srinivasan, R. (2019). Multi-scale neural sources of EEG: genuine, equivalent, and representative. A tutorial review. Brain Topography, 193–214.
Nunez, P. L., Srinivasan, R. et al. (2006). Electric Fields of the Brain: The Neurophysics of EEG. Oxford University Press, USA.
Nurminen, J., Taulu, S., Nenonen, J., Helle, L., Simola, J., & Ahonen, A. (2013). Improving MEG performance with additional tangential sensors. IEEE Transactions on Biomedical Engineering, 2559–2566.
Nurminen, J., Taulu, S., & Okada, Y. (2010). Improving the performance of the signal space separation method by comprehensive spatial sampling. Physics in Medicine & Biology, 1491.
Pesenson, I. (2000). A sampling theorem on homogeneous manifolds. Transactions of the American Mathematical Society, 4257–4269.
Pesenson, I. Z. (2014). Multiresolution analysis on compact Riemannian manifolds. arXiv preprint arXiv:1404.5037.
Pesenson, I. Z. (2015). Sampling, splines and frames on compact manifolds. GEM—International Journal on Geomathematics, 43–81.
Petersen, K. B., & Pedersen, M. S. (2012). The Matrix Cookbook.
Petrov, Y., Nador, J., Hughes, C., Tran, S., Yavuzcetin, O., & Sridhar, S. (2014). Ultra-dense EEG sampling results in two-fold increase of functional brain information. NeuroImage, 140–145.
Qiu, A., Bitouk, D., & Miller, M. I. (2006). Smooth functional and structural maps on the neocortex via orthonormal bases of the Laplace–Beltrami operator. IEEE Transactions on Medical Imaging, 1296–1306.
Reuter, M., Biasotti, S., Giorgi, D., Patanè, G., & Spagnuolo, M. (2009). Discrete Laplace–Beltrami operators for shape analysis and segmentation. Computers & Graphics, 381–390.
Riaz, B., Pfeiffer, C., & Schneiderman, J. F. (2017). Evaluation of realistic layouts for next generation on-scalp MEG: spatial information density maps. Scientific Reports, 6974.
Robinson, A. K., Venkatesh, P., Boring, M. J., Tarr, M. J., Grover, P., & Behrmann, M. (2017). Very high density EEG elucidates spatiotemporal aspects of early visual processing. Scientific Reports, 16248.
Roth, B. J., Sepulveda, N. G., & Wikswo Jr, J. P. (1989). Using a magnetometer to image a two-dimensional current distribution. Journal of Applied Physics, 361–372.
Sarvas, J. (1987). Basic mathematical and electromagnetic concepts of the biomagnetic inverse problem. Physics in Medicine & Biology, 11.
Schlömer, T., Heck, D., & Deussen, O. (2011). Farthest-point optimized point sets with maximized minimum distance. In Proceedings of the ACM SIGGRAPH Symposium on High Performance Graphics (pp. 135–142). ACM.
Schneiderman, J. F. (2014). Information content with low- vs. high-Tc SQUID arrays in MEG recordings: The case for high-Tc SQUID-based MEG. Journal of Neuroscience Methods, 42–46.
Sebastiani, P., & Wynn, H. P. (2000). Maximum entropy sampling and optimal Bayesian experimental design. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 145–157.
Shannon, C. (1949). Communication in the presence of noise. Proceedings of the IRE, 10–21.
Slutzky, M. W., Jordan, L. R., Krieg, T., Chen, M., Mogul, D. J., & Miller, L. E. (2010). Optimal spacing of surface electrode arrays for brain–machine interface applications. Journal of Neural Engineering, 026004.
Solin, A., & Särkkä, S. (2020). Hilbert space methods for reduced-rank Gaussian process regression. Statistics and Computing, 419–446.
Srinivasan, R., Nunez, P. L., Tucker, D. M., Silberstein, R. B., & Cadusch, P. J. (1996). Spatial sampling and filtering of EEG with spline Laplacians to estimate cortical potentials. Brain Topography, 355–366.
Srinivasan, R., Tucker, D. M., & Murias, M. (1998). Estimating the spatial Nyquist of the human EEG. Behavior Research Methods, Instruments, & Computers, 8–19.
Stark, H., & Woods, J. W. (1986). Probability, Random Processes, and Estimation Theory for Engineers. Prentice-Hall.
Stenroos, M., Hunold, A., & Haueisen, J. (2014). Comparison of three-shell and simplified volume conductor models in magnetoencephalography. NeuroImage, 337–348.
Stenroos, M., Mäntynen, V., & Nenonen, J. (2007). A Matlab library for solving quasi-static volume conduction problems using the boundary element method. Computer Methods and Programs in Biomedicine, 256–263.
Stenroos, M., & Nummenmaa, A. (2016). Incorporating and compensating cerebrospinal fluid in surface-based forward models of magneto- and electroencephalography. PLOS ONE, 1–23. URL: https://doi.org/10.1371/journal.pone.0159595
Stenroos, M., & Sarvas, J. (2012). Bioelectromagnetic forward problem: isolated source approach revis(it)ed. Physics in Medicine & Biology, 3517.
Taulu, S., & Kajola, M. (2005). Presentation of electromagnetic multichannel data: the signal space separation method. Journal of Applied Physics, 124905.
Tierney, T. M., Mellor, S., O'Neil, G. C., Holmes, N., Boto, E., Roberts, G., Hill, R. M., Legget, J., Bowtell, R., Brookes, M. J. et al. (2019). Pragmatic spatial sampling for wearable MEG arrays. bioRxiv.
Unser, M. (2000). Sampling—50 years after Shannon. Proceedings of the IEEE, 569–587.
Williams, C. K., & Rasmussen, C. E. (2006). Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA.
Wilson, H., & Vrba, J. (2007). Comparison of MEG arrays—revisited. In International Congress Series (vol. 1300, pp. 619–622). Elsevier.
Windhoff, M., Opitz, A., & Thielscher, A. (2013). Electric field calculations in brain stimulation based on finite elements: an optimized processing pipeline for the generation and usage of accurate individual head models. Human Brain Mapping, 923–935.

Appendix A. Interpretations of posterior variables
The covariance kernel obeys the form $K(\vec{r}, \vec{r}\,') = \psi(\vec{r})^\top K_a \psi(\vec{r}\,')$. As $k(\vec{r}) = \Psi K_a \psi(\vec{r})$, we can read from Eq. (18) that the posterior covariance of the coefficients $a$ is

$$K_a^* = K_a - K_a \Psi^\top (K + \Sigma)^{-1} \Psi K_a. \quad (\mathrm{A.1})$$

Application of the Woodbury matrix identity (Petersen & Pedersen, 2012) yields

$$K_a^* = K_a - K_a \Psi^\top (K + \Sigma)^{-1} \Psi K_a = K_a - K_a \Psi^\top (\Psi K_a \Psi^\top + \Sigma)^{-1} \Psi K_a = (\Psi^\top \Sigma^{-1} \Psi + K_a^{-1})^{-1}, \quad (\mathrm{A.2})$$

which approaches the covariance of the coefficient error of a bandlimited topography (Eq. (15)) when $K_a^{-1}$ approaches zero and the measurement noise is white.

Similarly, the posterior mean of the coefficients is

$$\mu_a^* = K_a \Psi^\top (K + \Sigma)^{-1} y = K_a \Psi^\top (\Psi K_a \Psi^\top + \Sigma)^{-1} y, \quad (\mathrm{A.3})$$

which follows the form of the generalized minimum-norm estimate (Dale & Sereno, 1993). Inserting $\Sigma = \sigma^2 I$ and $K_a = \lambda^2 I$, the coefficients take the form $\Psi^\top (\Psi \Psi^\top + (\sigma^2/\lambda^2) I)^{-1} y$, which is the more common form of the estimator, $\sigma^2/\lambda^2$ being the Tikhonov regularization parameter.

The coefficient estimator can be further manipulated:

$$\begin{aligned}
\mu_a^* &= K_a \Psi^\top (K + \Sigma)^{-1} y \\
&= K_a \Psi^\top (K + \Sigma)^{-1} [(K + \Sigma) - K] \Sigma^{-1} y \\
&= K_a \Psi^\top [I - (K + \Sigma)^{-1} K] \Sigma^{-1} y \\
&= [K_a \Psi^\top - K_a \Psi^\top (K + \Sigma)^{-1} \Psi K_a \Psi^\top] \Sigma^{-1} y \\
&= [K_a - K_a \Psi^\top (K + \Sigma)^{-1} \Psi K_a] \Psi^\top \Sigma^{-1} y \\
&= K_a^* \Psi^\top \Sigma^{-1} y \\
&= (\Psi^\top \Sigma^{-1} \Psi + K_a^{-1})^{-1} \Psi^\top \Sigma^{-1} y, \quad (\mathrm{A.4})
\end{aligned}$$

which again reduces to the least-squares estimator of Eq. (11) when $K_a^{-1}$ approaches zero and the measurement noise is white.

Appendix B. Total information and covariance
Channel capacity or the total information (TI) conveyed by a noisy channel (Shannon, 1949) has been used to evaluate different sensor arrays in MEG (Kemppainen & Ilmoniemi, 1989; Nenonen et al., 2004; Iivanainen et al., 2017; Riaz et al., 2017). Here, we show how the channel capacity relates to the sample and noise covariance matrices, $K$ and $\Sigma$ (Sec. 3.3).

If the samples as well as the noise were uncorrelated, TI would be $\tfrac{1}{2}\sum_i \log(P_i + 1)$, where $P_i$ is the power signal-to-noise ratio of each measurement. If the measurements are correlated, they can be orthogonalized by the eigendecomposition of the whitened covariance matrix $\tilde{K} = \Sigma^{-1/2} K \Sigma^{-\top/2} = V P V^\top$, where $V$ contains the eigenvectors of $\tilde{K}$ and $P$ is a diagonal matrix of the $P_i$. Starting from the original formula, TI can now be written as

$$\mathrm{TI}(R) = \frac{1}{2}\sum_i \log(P_i + 1) = \frac{1}{2}\log\det(P + I) = \frac{1}{2}\log\det(\tilde{K} + I) = \frac{1}{2}\log\frac{\det(K + \Sigma)}{\det\Sigma},$$

which expresses the total information directly in terms of the sample and noise covariance matrices.
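The equivalence between the eigenvalue form and the determinant form of the total information can be checked numerically. The sketch below uses random covariance matrices as an illustrative assumption (not measured sensor data):

```python
import numpy as np

# Sketch (random toy covariances, not measured data): total information
# computed two ways — from the SNR eigenvalues P_i of the whitened sample
# covariance, and from the determinant identity det(K~ + I) = det(K+Sigma)/det(Sigma).

rng = np.random.default_rng(2)
n = 30
A = rng.standard_normal((n, n))
K = A @ A.T                          # sample (signal) covariance, PSD
Sigma = 0.05 * np.eye(n)             # white sensor-noise covariance

# Whitened covariance K~ = Sigma^-1/2 K Sigma^-T/2 via Cholesky factors
L = np.linalg.cholesky(Sigma)
K_tilde = np.linalg.solve(L, np.linalg.solve(L, K).T).T
P = np.linalg.eigvalsh(K_tilde)      # power SNRs of orthogonalized channels

ti_eigen = 0.5 * np.sum(np.log2(P + 1))
ti_det = 0.5 * (np.linalg.slogdet(K + Sigma)[1]
                - np.linalg.slogdet(Sigma)[1]) / np.log(2)

print(np.isclose(ti_eigen, ti_det))  # True
```

Base-2 logarithms give the total information in bits; `slogdet` is used in the determinant form to avoid overflow for larger arrays.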