[PDF] Fast estimation of a convolution type model for the intensity of spatial point processes

Abstract

Estimating the first-order intensity function in point pattern analysis is an important problem, and it has been approached so far from different perspectives: parametrically, semiparametrically or nonparametrically. Our approach is close to a semiparametric one. Motivated by eye-movement data, we introduce a convolution type model where the log-intensity is modelled as the convolution of a function β(⋅) , to be estimated, and a single spatial covariate (the image an individual is looking at for eye-movement data). Based on a Fourier series expansion, we show that the proposed model is related to the log-linear model with infinite number of coefficients, which correspond to the spectral decomposition of β(⋅) . After truncation, we estimate these coefficients through a penalized Poisson likelihood and prove infill asymptotic results for a large class of spatial point processes. We illustrate the efficiency of the proposed methodology on simulated data and real data.

Full PDF

FFast estimation of a convolution type model for the intensity ofspatial point processes

Francisco Cuevas–Pacheco ∗ , Jean–Fran¸cois Coeurjolly , and Marie–H´el`ene Descary Department of Mathematics, Universit´e du Qu´ebec `a Montr´eal University Grenoble Alpes, Grenoble-INP, LJK, 38000 Grenoble, France Departamento de Matem´atica, Universidad T´ecnica Federico Santa Mar´ıa, Avenida Espa˜na 1680,Valpara´ıso, Chile

Abstract

Estimating the ﬁrst-order intensity function in point pattern analysis is an importantproblem, and it has been approached so far from diﬀerent perspectives: parametrically,semiparametrically or nonparametrically. Our approach is close to a semiparametric one.Motivated by eye-movement data, we introduce a convolution type model where the log-intensity is modelled as the convolution of a function β ( · ), to be estimated, and a singlespatial covariate (the image an individual is looking at for eye-movement data). Based on aFourier series expansion, we show that the proposed model is related to the log-linear modelwith inﬁnite number of coeﬃcients, which correspond to the spectral decomposition of β ( · ).After truncation, we estimate these coeﬃcients through a penalized Poisson likelihood andprove inﬁll asymptotic results for a large class of spatial point processes. We illustrate theeﬃciency of the proposed methodology on simulated data and real data. Point processes, Intensity function, Convolution, Fourier transform.

Spatial point pattern data arise in many contexts where the interest lies in describing thedistribution of an event in space. Some examples include the locations of trees in a forest, golddeposits mapped in a geological survey, stars in a cluster star (see e.g. Møller and Waagepetersen,2003; Illian et al., 2008; Baddeley et al., 2015). One of the main interests when analyzing spatialpoint pattern data is to estimate the intensity which characterizes the probability that a point(or an event) occurs in an inﬁnitesimal ball around a given location. It is common to relate thedistribution of points to one (or more) non-random spatial covariate Z observed over the wholedomain W ⊂ R d . This can be done semiparametrically (Baddeley et al., 2015) by assumingthat ρ ( s ) = f { Z ( s ) } , s ∈ W, where f : R → R + or using a parametric model, like the log-linearmodel log ρ ( s ) = β + βZ ( s ) , s ∈ W, (1)where β , β ∈ R , which is certainly the most often used model. These models and in particularthe latter one can easily be extended in a multivariate setting and have been applied in manyvarious ﬁelds: e.g. to estimate intensity of species of trees (Waagepetersen, 2008; Choiruddinet al., 2020), of diseases locations in epidemiology surveillance (see e.g. Gatrell et al., 1996), oflocations of wildﬁre starts (see e.g. Xu and Schoenberg, 2011), etc. ∗ Corresponding author a r X i v : . [ s t a t . M E ] D ec he application we have in mind concerns eye-movement data which consist in locations ofretina ﬁxations recorded by an eye-tracker from individuals looking at an image (or a video). Itwas observed (see e.g. Deubel and Schneider, 1996; Wolfe and Horowitz, 2004; Cerf et al., 2008;Judd et al., 2009) that ﬁxations of the eyes are guided by local features of the image such asedges and colors, and by more global ones such as faces and objects. Parametric models havebeen investigated for example in Barthelm´e et al. (2013). Sometimes, the temporal feature isalso known and sequential spatio-temporal point processes models have been developed (see e.g.Penttinen and Ylitalo (2016) or Ylitalo et al. (2016)) to take this more complex situation intoaccount.From a biological point of view, when a human looks at an image, he explores locally a smallpart of the image. The resulting spatial average corresponds to the ﬁxation. Then the eye ofthe subject jumps to another part of the image, explores it locally (in an imperceptible way forthe human) and so on. In this paper, we assume we are only given a realization of a spatialpoint process (no temporal feature is observed) and focus on a log-convolution model whichrelaxes the log-linear model (1) and which may take into account that spatial moving averagecharacteristic. This new model is written aslog ρ ( s ) = ( β ∗ Z ) ( s ) , (2)where the unknown parameter β becomes a function in L ( R d ), and ∗ denotes the convolutionoperator ( f ∗ g )( s ) = (cid:82) R d f ( s − τ ) g ( τ ) d τ . This model allows the intensity function evaluated ata location s to depend not only on the single value Z ( s ), like in the log-linear model, but alsoon all values of the covariate function Z . Indeed, log ρ ( s ) can be seen as an inﬁnite weightedsum of the values of Z over W , where the weights are deﬁned through the function β . Withrespect to the previous application, Z could correspond to the raw image (in gray level) or tothe saliency map (see e.g. Barthelm´e et al., 2013) which is a prediction map of retina ﬁxationsbuilt independently from the data.Motivated by such an application, the aim of this paper is to consider the model (2) andto estimate nonparametrically the function β ( · ). Such a problem is close to a deconvolutionproblem (see Starck et al., 2002, and the references therein). Therefore, we intend to provide afast and eﬃcient nonparametric estimator of the function β ( · ) by borrowing classical ideas fromstandard signal processing, image analysis and functional data.As detailed in Section 2 our strategy is to take advantage of the Fourier basis which classicallyis able to handle eﬃciently convolutions. We decompose both Z and β in Fourier series andtruncate these series. We show in Section 2 that the resulting model will be close to a modelof the form log ρ ( s ) ≈ ψ (cid:62) y ( s ), where ψ and y ( s ) are p -dimensional vectors obtained fromFourier decompositions of β and Z ( s ) respectively. This approximation is nothing else thana multivariate version of (1) which is easily eastimated using standard methodology (see e.g.Baddeley et al., 2015; Coeurjolly et al., 2014; Guan et al., 2015). The length of the vector ψ to be estimated depends on the truncation of the Fourier series for β and Z and can get largevery quickly. Hence, we borrow ideas from Choiruddin et al. (2018) and implement a penalizedcomposite likelihood. More speciﬁcally, we provide adaptive Ridge and Lasso versions. Thisis detailed in Section 3 where we also present asymptotic results. Again motivated by theapplication, we present inﬁll asymptotic type results: i.e. we assume observing more and morepoints in the same observation domain. In comparison, results established by Choiruddin et al.(2018) were obtained under increasing domain assumptions. Section 4 presents a simulationstudy. In particular, we show that, for a large class of spatial point processes models, if thefunction β has a sparse representation in the spectral domain, our adaptive Lasso estimator isable to estimate quickly and eﬃciently the Fourier coeﬃcients of β and the function β itself bysimply using inverse Fourier transform. Section 5 presents a brief application to eye-movement2ata. Finally, proofs are postponed to A. Let X be a point process deﬁned on R d , i.e., a random countable subset of R d observed ina bounded set W ⊂ R d which is typically a rectangular region and where usually d = 2 , N ( A ) be the number of points of X falling in A ⊂ R d . We assume the intensity functionassociated with X exists, that is the function ρ : R d → R + such that E { N ( A ) } = (cid:82) A ρ ( s ) d s for A ⊂ R d and ρ ( s ) > s ∈ W is well-deﬁned.Let X be a spatial point process on R d . Let W ⊂ R d be a compact set of Lebesgue measure | W | which will play the role of the observation domain. We view X as a locally ﬁnite randomsubset of R d , i.e. the random number of points of X in B , N ( B ), is almost surely ﬁnite whenever B ⊂ R d is a bounded region. A realization of X in W is thus a set x = { x , x , . . . , x m } , where x i ∈ W and m is the observed (ﬁnite) number of points in W .Campbell theorem (see e.g. Møller and Waagepetersen, 2003) states that, assuming that X has ﬁrst-order (resp. second-order) intensity ρ (resp. ρ (2) ) is equivalent to saying that for anyfunction k : R d → [0 , ∞ ) or k : R d × R d → [0 , ∞ ) E (cid:88) s ∈ X k ( s ) = (cid:90) R d k ( s ) ρ ( s )d s (3) E (cid:54) = (cid:88) s , t ∈ X k ( s , t ) = (cid:90) R d (cid:90) R d k ( s , t ) ρ (2) ( s , t )d s d t . (4)Therefore, for instance, we may interpret ρ ( u )d u as the probability of occurrence of a point in aninﬁnitesimally small ball with center u and volume d u . We end this section with the deﬁnitionof the pair correlation function given by g ( s , t ) = ρ (2) ( s , t ) / { ρ ( s ) ρ ( t ) } (with the convention0 / X corresponds to a Poisson point process. Let us ﬁrst describe more rigorously the model (2). We assume that the covariate Z : R d → R is a function of L ( R d ) the space of square integrable functions equipped with the inner product (cid:104) f, g (cid:105) = (cid:82) R d f ( s ) g ( s ) d s . We assume without loss of generality that the process X is observed onthe hypercube W = [0 , d , and that both β and Z are functions with support W . Therefore it ispossible to expand β and Z using a Fourier basis of L ( W ). Indeed, let φ κ ( s ) = exp(2 π i κ · s ) = (cid:81) di =1 exp(2 π i κ i s i ) be the Fourier bases, where i is the imaginary unit, s = ( s , . . . , s d ) ∈ W ,and κ = ( κ , . . . , κ d ) ∈ Z d is a vector of frequencies, then we can write β ( s ) = (cid:88) κ ∈ Z d β κ φ κ ( s ) , Z ( s ) = (cid:88) κ ∈ Z d Z κ φ κ ( s ) , (5)where β κ = (cid:104) β, φ κ (cid:105) and Z κ = (cid:104) Z, φ κ (cid:105) correspond to the κ -Fourier coeﬃcients of β and Z respectively. By replacing the functions β and Z in the log-convolution model (2) by theirexpression given in (5), we havelog ρ ( s ) = ( β ∗ Z ) ( s ) = (cid:88) κ ∈ Z d (cid:88) l ∈ Z d β κ Z l (cid:90) W φ κ ( s − τ ) φ l ( τ ) d τ = (cid:88) κ ∈ Z d β κ Z κ φ κ ( s ) , (6)3here the last equality is obtained using the fact that φ κ ( s − τ ) = φ κ ( s ) φ κ ( − τ ) = φ κ ( s ) φ κ ( τ ),with z denoting the complex conjugate of z . Hence, our model can be rewritten in terms of thespectrum of β and Z . Note that since log ρ ( s ) is a real number, its imaginary part I [log ρ ( s )]must be zero. Moreover, since the Fourier coeﬃcients of a real function satisfy the Hermitiansymmetry property, then β κ = β − κ and Z κ = Z − κ , for κ ∈ Z d . Deﬁning Z d ⊕ as the set ofunique non-zero vectors κ ∈ Z d under the relation κ = − κ , and κ = as the d-dimensionalzero vector (yielding to φ κ ( s ) = exp(2 π i · s ) = 1), we can rewrite (6) aslog ρ ( s ) = β κ Z κ φ κ ( s ) + (cid:88) κ ∈ Z d ⊕ (cid:110) β κ Z κ φ κ ( s ) + β κ Z κ φ κ ( s ) (cid:111) = β κ Z κ + (cid:88) κ ∈ Z d ⊕ R [ β κ Z κ φ κ ( s )]= β κ Z κ + (cid:88) κ ∈ Z d ⊕ { R [ β κ ] R [ Z κ φ κ ( s )] − I [ β κ ] I [ Z κ φ κ ( s )] } , (7)with R [ z ] being the real part of z , and where the identities z + ¯ z = 2 R [ z ] and R [ z z ] = R [ z ] R [ z ] − I [ z ] I [ z ] have been used to obtain the second and last equality respectively. Inthe sequel we denote R [ β κ ] (respectively I [ β κ ]) by β R κ (respectively β I κ ).The log-convolution model (7) cannot be used directly since as already pointed out, itdepends on the covariate function Z through its spectrum which is not observed in practice.We propose to estimate it in an eﬃcient way by using the d -dimensional fast Fourier transform(FFT) of the sequence { Z ( s i ) } Ni =1 , where { s i } Ni =1 is a regular grid over W . The resultingapproximation of the spectrum has a ﬁnite number of terms { Z κ i } Ki =0 , where { κ i } Ki =0 is apartially ordered sequence of the obtained Fourier frequencies. This yields the following ﬁniteapproximation of the model (7):log ρ K ( s ) = β κ Z κ + K (cid:88) i =1 (cid:8) β R κ i R [ Z κ i φ κ i ( s )] − β I κ i I [ Z κ i φ κ i ( s )] (cid:9) . (8)Note that since φ κ ( s ) = 1, we have that β κ = (cid:104) β, φ κ (cid:105) = (cid:90) W β ( s ) d s and Z κ = (cid:104) Z , φ κ (cid:105) = (cid:90) W Z( s ) d s , are two real quantities. Moreover, from expression (8), we note that the parameter β κ is notidentiﬁable if Z κ = 0. To avoid this problem, we suppose without loss of generality that Z κ = (cid:82) W Z ( s ) d s = 1. Finally, we can write the ﬁnite approximation (8) of our originallog-convolution model (2) as a log-linear modellog ρ K ( s ) = β κ + β (cid:62) K Z ( s ) , (9)where β κ ∈ R and β K = ( β R κ , . . . , β R κ K , β I κ , . . . , β I κ K ) (cid:62) ∈ R K are the parameters to beestimated and Z ( s ) is the vector of covariates deﬁned as Z ( s ) = { R [ Z κ φ κ ( s )] , . . . , R [ Z κ K φ κ K ( s )] , − I [ Z κ φ κ ( s )] , . . . , − I [ Z κ K φ κ K ( s )] } (cid:62) . In the following, we assume that the approximation is exact that is there exists

K < ∞ (poten-tially very large) such that ρ = ρ K . Given the form of (9), it is therefore possible to estimatethe spectrum of the function β using existing estimation methods for log-linear models, giventhat the covariate function Z is such that there is at least one κ i ∈ Z d , i = 1 , . . . , K such that Z κ i is non-zero. Finally, once the spectrum of β is estimated, we can get back to the function β using the inverse fast Fourier transform. 4 Regularized estimation procedure and asymptotic results

To avoid any confusion with previous sections, we slightly change our notation in this section.The model (9) will therefore be seen as a particular case. We assume to observe a sequence ofspatial point processes X n with intensity ρ n parameterized by ψ ∈ Ψ ⊆ R p ( p ≥

1) aslog ρ n ( s ) = θ n + ψ (cid:62) y ( s ) , s ∈ W. (10)Here the parameter θ n should be regarded as a nuisance parameter and is such that θ n → ∞ as n → ∞ while y ( s ) = ( y ( s ) , . . . , y p ( s )) (cid:62) represents the vector of p spatial covariates availableat any location s ∈ W . As a matter of fact, W is assumed to be ﬁxed and the mean numberof points in W which is proportional to θ n grows with n . This setting, called inﬁll asymptoticsin the spatial statistics literature, is natural in the context of the present paper (the image isﬁxed).The inference is done by maximizing Q n ( · ), an adaptive Lasso regularized version of thePoisson likelihood (Choiruddin et al., 2020), namely Q n ( ψ ) = θ − n (cid:96) n ( ψ ) − p (cid:88) j =1 λ n,j | ψ j | , (11)where (cid:96) n ( ψ ) = (cid:88) s ∈ X n log ρ n ( s ) − (cid:90) W ρ n ( s )d s (12)and where λ n,j ≥ (cid:96) (1) n ( ψ ) ∈ R p be the derivative of (cid:96) n ( ψ )with respect to ψ , it is well-known that the sequence { (cid:96) (1) n ( ψ ) } n ≥ constitutes a sequence ofestimating equations. In that respect, we let ψ = ψ n = argmin ψ E { (cid:96) (1) n ( ψ ) } and ˆ ψ = ˆ ψ n =argmax ψ Q n ( ψ ). Moreover, let S n ( ψ ) = E (cid:110) − dd ψ (cid:62) (cid:96) (1) n ( ψ ) (cid:111) = − dd ψ (cid:62) (cid:96) (1) n ( ψ ) and Σ n ( ψ ) =Var { (cid:96) (1) n ( ψ ) } be the ( p, p ) matrices given by S n ( ψ ) = θ n (cid:90) W y ( s ) y ( s ) (cid:62) exp( ψ (cid:62) y ( s ))d s , Σ n ( ψ ) = S n ( ψ ) + θ n (cid:90) W (cid:90) W y ( s ) y ( t ) (cid:62) { g n ( s , t ) − } exp( ψ (cid:62) ( y ( s ) + y ( t ))d s d t . We assume our model is sparse in the sense that ψ can be decomposed as ψ = ( ψ (cid:62) , ψ (cid:62) ) (cid:62) where ψ ∈ R s , with s < p , and ψ = ∈ R p − s is a zero vector. Following this decompositionwe let y ( s ) = { y ( s ) (cid:62) , y ( s ) (cid:62) } (cid:62) and ψ = ( ψ (cid:62) , ψ (cid:62) ) (cid:62) and more generally for any vector z ∈ R p we use the notation z = ( z (cid:62) , z (cid:62) ) (cid:62) . For a square ( p, p ) positive semi-deﬁnite matrix M n , welet ν min ( M n ) denote its smallest eigenvalue, (cid:107) M n (cid:107) denote its spectral norm and M n, denoteits ( s, s ) top-left corner. Finally, we let a n = max j ≤ s λ n,j and b n = min j>s λ n,j . (13)Our main result is based on a set of assumptions that we gather under the assumption [H].[H] Ψ is an open bounded convex set of R p , sup s ∈ W (cid:107) y ( s ) (cid:107) ∞ < ∞ ,lim inf n ν min { θ − n S n ( ψ ) } >

0, lim inf n ν min { θ − n Σ n ( ψ ) } > (cid:107) Σ n ( ψ ) (cid:107) = O ( θ n ) andas n → ∞ Σ − / n, ( ψ ) (cid:96) (1) n, ( ψ ) d → N (0 , I s )in distribution. 5 heorem 1. We assume the set of assumptions [H] holds and let θ n → ∞ as n → ∞ . Thenwe have the two following statements.(i) Assume a n = O ( θ − / n ) , then ˆ ψ − ψ = O P ( θ − / n ) .(ii) Assume a n √ θ n → and b n √ θ n → ∞ , then as n → ∞ , P( ˆ ψ = 0) → and Σ − / n, ( ψ ) S n, ( ψ )( ˆ ψ − ψ ) d → N (0 , I s ) in distribution. Theorem 1 is the natural inﬁll version of Choiruddin et al. (2018, Theorem 1-2) where thevolume | W n | in the increasing domain asymptotics is replaced by θ n .We outline that following Choiruddin et al. (2018) and Ba and Coeurjolly (2020), resultscould be easily extended to the setting where p = p n diverges to inﬁnity. In such a setting, weclaim that we would require p n /θ n → a n √ θ n → b n (cid:112) θ n /p n → ∞ as n → ∞ . Also,similar results could be obtained for more general penalties such as the adaptive elastic net,including non-convex penalties such as SCAD or MC+ (see again Choiruddin et al., 2018). Thishas not been considered in this paper. In this section, we investigate the ﬁnite-sample properties of the adaptive Lasso estimatorintroduced in Section 3 through a simulation study. We also compare this estimator with anadaptive Ridge estimator and the standard Poisson likelihood estimator.We focus on planar point processes, set W = [0 , × [0 , β (four scenarios so). As models, we consider an inhomogeneousPoisson point process and an inhomogeneous Thomas process. For both types of processes, wesuppose that the intensity function ρ follows the log-convolution model given by (2) where thefunction β has a compactly supported spectrum. We set β ( s ) = β κ + (cid:80) K i =1 β κ i φ κ i ( s ), where K = 12 is the true number of frequencies, and { κ i } i =0 ⊂ Z is the partially ordered frequencysequence illustrated in Figure 1. We consider the two following spectral models for β :(a) β κ i = 0 . , i = 1 , . . . ,

12. The spectrum is illustrated on the left hand side of Figure 2, andit yields a symmetric function β which is illustrated on the top left corner of Figure 6.(b) β κ i = 0 . . κ yi , i = 1 , . . . ,

12 with κ yi being the y coordinate. The real (resp. imag-inary) part of the spectrum is illustrated on the left (resp. right) hand side of Figure 2,and it yields the asymmetric function β depicted on the top right corner of Figure 6.For each scenario, the covariate function Z is the image depicted on the left of Figure 3, andthe resulting log-intensity function for the symmetric (a) (resp. asymmetric (b)) function β isillustrated on the middle side (resp. right) of Figure 3. Finally, for each scenario, we considerthree diﬀerent values of the parameter β κ = β , where these values are chosen such thatthe expected number of points E { N ( W ) } is 200, 800, and 1800 (to emulate inﬁll asymptoticframework). For the Thomas process, we impose 100 clusters in average and set the scaleparameter to respect the expected number of points.For each scenario and each value of β , we simulate 1000 point patterns. For each simulatedpoint pattern, we estimate the spectrum of β using the log-linear model given by (9) with thethree following methods: • Poisson likelihood: maximizing the Poisson likelihood function;6 −404 −4 0 4

Figure 1: Partial order of the sequence of frequencies. The green boxes are the used frequenciesand the blue boxes are the conjugate ones. The zero frequency κ = (0 ,

0) is always included. • Ridge: maximizing an adaptive Ridge regularized version of the Poisson likelihood follow-ing Choiruddin et al. (2018); • Lasso: Adaptive Lasso estimator deﬁned in the previous section.The vector of covariates, say Z ( s ) in (9), was standardized, i.e. (cid:107)R [ Z κ i φ κ i ( s )] (cid:107) = 1 = (cid:107)I [ Z κ i φ κ i ( s )] (cid:107) for i = 1 , . . . , K .We assume that the true number of frequencies K , is unknown and we study the perfor-mance of the estimates of (9) using diﬀerent values of K . Since the partial ordering of thefrequencies illustrated in Figure 1 follows a spiral with center at the origin, we obtained thenumber K of frequencies to be considered from the sequence { t + 1) + 2( t + 1) } t =1 andits middle points { ( t + 1) + ( t + 2) + 2 t + 3 } t =1 , for a total of 11 considered values for K (namely K = 12 , , , , , , , , ,

98 and 112). The standard Poisson likelihood re-gression cannot handle a large number of covariates, so to avoid numerical problems, we onlyused K = 12 ,

18 and 24 for this estimator. Recall that the total number of estimated param-eters is p = 2 K plus an intercept. Our aim is to illustrate that even for a large value of K ,the regularized estimators are quite eﬃcient compared to the oracle which corresponds to thePoisson likelihood estimator with K = 12.It is well-known that the quality of the estimation obtained using adaptive Lasso or adaptiveRidge relies on the sequence { λ j } Kj =1 (Fan and Lv, 2010). We follow Zou (2006) and Choiruddinet al. (2018) and set λ j = λ/ | ˆ β Rj | where ˆ β Rj is the Ridge estimator of β j . This transforms theoriginal problem of ﬁnding a sequence { λ j } Kj =1 into the one of choosing the parameter λ whichhas been done by minimizing the weighted version of the BIC criteria (Choiruddin et al., 2018)WQBIC( λ ) = − (cid:96) { ˆ β ( λ ) } + s ( λ ) log | W | , where ˆ β ( λ ) is the estimate of { β κ i } Ki =1 for a given λ and s ( λ ) is the total number of non-zeroparameters. The minimization of WQBIC has been done by a grid search method on the interval[ λ min , λ max ] where λ min and λ max are obtained following Friedman et al. (2010).The implementation has been done using the R statistical software and results in a combina-tion of the spatstat package (Baddeley et al., 2015) devoted to spatial point pattern analysis7

202 −2 0 2 0.00.10.20.3

Real part of symmetric and asymmetric spectra −202 −2 0 2 −0.250.000.25

Imaginary part of the asymmetric spectrum

Figure 2: Real and imaginary part of the spectrum of β described in (a) and (b) of Section 4.and the glmnet package (Friedman et al., 2010) which provides Lasso and Ridge regularizedestimators for GLMs. We also used extensively the discrete Fast Fourier transform fft imple-mented in the base package.In order to evaluate the performance of the diﬀerent estimation procedures we calculatefor each of them the mean squared error (MSE) of ˆ β , and the mean integrated squared error(MISE) of the function ˆ β with β set to 0, that isMSE( ˆ β ) = 1 M M (cid:88) m =1 (cid:110) β − ˆ β ( m ) (cid:111) , IMSE (cid:40) K (cid:88) i =1 ˆ β κ i φ κ i ( s ) (cid:41) = K (cid:88) i =1 MSE( ˆ β κ i ) , where M = 1000 is the number of simulated point patterns and ˆ β ( m ) κ i is the estimate of β κ i forthe m th point pattern. The ﬁrst rows of Figures 4 and 5 contain four graphs, each correspondingto a diﬀerent simulation scenario. Each curve on a graph is the plot of log(MSE) against K , fora given method and a given value of the expected number of points. The second rows of thesetwo ﬁgures are constructed the same way but for log(MISE). We can see that both MSE andMISE decrease with respect to the expected number of points and increase with respect to K .However, the increment becomes smaller when the average number of points increases.In order to study the selection properties of the adaptive Lasso, we follow Choiruddin et al.(2018) and report the true positive rate (TPR) given by the number of selected coeﬃcients thatare truly diﬀerent from 0, hereafter true coeﬃcients, over the total number of true coeﬃcientsand the false positive rate (FPR) given by the number of selected coeﬃcients that are truly equalto 0, hereafter noisy coeﬃcients, over the total number of noisy coeﬃcients. In our context,TPR and FPR play complementary roles: TPR measures how well the method selects thefrequencies associated with true coeﬃcients, while FPR indicates how well the method dropsthe frequencies corresponding to noisy coeﬃcients. The third and fourth rows of Figures 4 and5 give the average TPR and FPR values with respect to the total number of frequencies K for8 aw image log r ( (cid:215) ) (symmetric spectrum) log r ( (cid:215) ) (asymmetric sptectrum)−2 0 2 Figure 3: Left : True image Z . Middle: Resulting convolution between Z and the symmetricfunction β deﬁned in Section 4 (a). Right: Resulting convolution between Z and the asymmetricfunction β deﬁned in Section 4 (b).diﬀerent values of expected number of points. In the two scenarios where the point patternsare realizations of a Poisson point process, we observe that TPR increases and FPR decreaseswith the expected number of points, meaning that Lasso is able to select the total number oftrue frequencies and discard the noisy frequencies. When the point patterns are realizations ofa Thomas process, the TPR behaves as previously, however, the FPR tends to increase withrespect to the expected number of points.Overall, what is interesting to point out is that if K is large, which from a practical pointof view means we take the maximal number of available frequencies, then the results remainvery satisfactory. The MSE and IMSE of respectively ˆ β abd ˆ β are not too much deterioratedcompared to the oracle estimator. Furthermore, the adaptive Lasso procedure is able to recoverthe true non-zero frequencies eﬃciently.Finally, Figure 6 illustrates the average of the M = 1000 estimated functions of β :ˆ β A ( s ) = M − M (cid:88) m =1 (cid:40) K (cid:88) i =0 ˆ β ( m ) κ i φ κ i ( s ) (cid:41) obtained with the adaptive Lasso and adaptive Ridge with K = 112, for our four scenarios andwhere β κ is set such that the expected number of points is 1800. In this section, we illustrate our methodology on a real dataset. Data were kindly providedby David Meary (LPNC, Grenoble) and is a small sample of a large study conducted on 3- to12-month old babies and a group of adults to understand visual diﬀerences (Helo et al., 2016).To illustrate this paper, we consider only the group of adults and one image. The true image isthe one used for the simulation study (see Figure 3). The locations of retina ﬁxations consist ina sample of 632 data points, shown in the top left of Figure 7. The points are superimposed onthe image of the spatial covariate Z , which is a saliency map provided by David Meary, followingHo-Phuoc et al. (2010). We estimate the intensity using the following methods/models:1. Nonparametric estimate: we make no assumption and estimate the intensity using thekernel density estimator (e.g. Baddeley et al., 2015) where the bandwidth parameter hasbeen selected using the proposal by Cronie and Van Lieshout (2018).9ethod/model AUCNonparametric estimate 0.869Semiparametric estimate 0.774Log-linear model 0.785Log-convolution model (A. Lasso) 0.900Log-convolution model (A. Ridge) 0.918Table 1: Table of area under ROC curve (AUC) for the diﬀerent estimates.2. Semiparametric estimate: we consider the semiparametric model ρ ( s ) = f { Z ( s ) } andestimate nonparametrically f as detailed in Baddeley et al. (2012).3. Log-linear model: we consider the log-linear regression model (1) with Z as covariate andestimate the single parameter β by using the Poisson likelihood.4. Log-convolution model: we consider the log-convolution model (2) and estimate the func-tion β from its spectrum estimated by adaptive Lasso or adaptive Ridge (both with initialestimate obtained from a Ridge penalization). In both cases, the total number of usedfrequencies was 112, sorted as in Figure 1, and the variables were normalized as in theprevious section.The resulting estimates are shown in Figure 7. Typically, we can notice that the nonparametricestimate and the ones from the log-convolution model are blurred versions of the image, whereasthe parametric and semiparametric estimates are transformed versions of the image. We canclearly see that the saliency map in this example does not fully explain locations of ﬁxationsin the top of the image which reveals why the parametric and semiparametric estimates alsomiss these points. The log-convolution model seems to be a good compromise between theparametric and nonparametric estimates. We have the nonparametric ﬂexibility while keepingthe interpretation on the inﬂuence of the spatial covariate Z .To evaluate numerically the performance of the estimates, we use the area under the ROCcurve (AUC) (Baddeley et al., 2015, Section 6.7.3), also implemented in the spatstat R soft-ware. The results are reported in Table 1, which shows that estimates from the log-convolutionmodel seem to exhibit a higher AUC than other estimates.For the two estimates from the log-convolution model, the estimated β functions are shownin Figure 8 and depict how the intensity function on a location s is proportional to a weightedversion of Z at the location s and its neighbours. The Adaptive Lasso estimate selects thirtyfrequencies. This paper has introduced a log-convolution model to parameterize the intensity of a spatialpoint process. This model can be seen as a good compromise between the log-linear model anda nonparametric estimator.For eye-movement data, we considered in this paper the saliency map. Even for such data, wecould consider a multivariate version of the log-convolution model, say log ρ ( s ) = (cid:80) Ii =1 β i ∗ Z i ( s )(for instance for I = 3, Z i could correspond to the RGB levels of the raw image). The proposedmethodology could be extended to estimate each function β i , however a restriction of the form (cid:80) Ii =1 (cid:82) W β i ( s ) d s = 1 would probably be required. The aforementioned restriction would onlyallow us to recover the shape of each β i ( · ) but not its zero frequency. Such an extension will10robably imply numerical complexities as the number of parameters to estimate will quicklygrow up.We considered d -dimensional point patterns in this paper, with a focus on the planar case.We believe that the model and methodology should be straightforwardly extended to circularor spherical point processes. Acknowledgements

The authors would like to thank David Meary for fruitful discussions and for providing us illus-trating data. The research of Jean–Fran¸cois Coeurjolly and Marie–H´el`ene Descary is supportedby the Natural Sciences and Engineering Research Council of Canada. Francisco Cuevas–Pacheco has been partially supported by the AC3E, UTFSM, under grant FB-0008

A Proof of Theorem 1

Proof.

The proof shares several similarities with Choiruddin et al. (2018, Theorems 1-2), exceptmainly that | W n | is replaced by θ n . We only give a sketch of the proof below. The notation κ hereafter stands for a generic positive constant which may vary from line to line.(i) Let d n = θ − / n + a n = O ( θ − / n ) by assumption. Following existing proofs, we have toprove that for any ε >

0, there exists Ω > (cid:40) sup (cid:107) ω (cid:107) =Ω ∆ n ( ω ) > (cid:41) ≤ ε where ∆ n ( ω ) = Q n ( ψ + d n ω ) − Q n ( ψ )for ω ∈ R p . We decompose ∆ n ( ω ) = T + T as T = θ − n { (cid:96) n ( ψ + d n ω ) − (cid:96) n ( ψ ) } T = p (cid:88) j =1 λ n,j ( | ψ j | − | ψ j + d n ω j | ) . Using Taylor expansion θ n T = d n ω (cid:62) (cid:96) (1) n ( ψ ) − d n ω (cid:62) S n ( ψ ) ω + 12 d n ω (cid:62) (cid:110) S n ( ψ ) − S n ( ˜ ψ ) (cid:111) ω for some ˜ ψ on the line segment between ψ and ψ + d n ω . From [H], it is left to check that for n large enough T ≤ d n (cid:107) ω (cid:107)(cid:107) (cid:96) (1) n (cid:107) θ − n − ˇ ν d n (cid:107) ω (cid:107) + κd n (cid:107) ω (cid:107) ≤ d n (cid:107) ω (cid:107)(cid:107) (cid:96) (1) n (cid:107) θ − n − ˇ ν d n (cid:107) ω (cid:107) where ˇ ν = lim inf ν min { θ − n S n ( ψ ) } . Regarding T , we have T = s (cid:88) j =1 λ n,j ( | ψ j | − | ψ j + d n ω j | ) ≤ κa n d n (cid:107) ω (cid:107) ≤ κd n (cid:107) ω (cid:107) . Hence, for n large enough, we can pick Ω large enough to ensure thatP (cid:40) sup (cid:107) ω (cid:107) =Ω ∆ n ( ω ) > (cid:41) ≤ P (cid:110) (cid:107) (cid:96) (1) n ( ψ ) (cid:107) ≥ κ (cid:112) θ n (cid:111) (cid:107) (cid:96) (1) n ( ψ ) (cid:107) = O P ( √ θ n ).(ii) This statement is proved into two parts. The ﬁrst one deals with the oracle propertywhile the second one is focused on the asymptotic normality result.The oracle property is proved if, under the assumption [H] and the conditions of Theorem 1,we prove that with probability tending to 1, for any ψ satisfying (cid:107) ψ − ψ (cid:107) = O P ( θ − / n ),and for any constant K > Q n (cid:110) ( ψ (cid:62) , (cid:62) ) (cid:62) (cid:111) = max (cid:107) ψ (cid:107)≤ K θ − / n Q n (cid:110) ( ψ (cid:62) , ψ (cid:62) ) (cid:62) (cid:111) . (14)And to prove (14), it is suﬃcient to show that with probability tending to 1 as n → ∞ , for any ψ satisfying (cid:107) ψ − ψ (cid:107) = O P ( θ − / n ), for some small ε n = K θ − / n , and for j = s + 1 , . . . , p , ∂Q n ( ψ ) ∂ψ j < < ψ j < ε n and ∂Q n ( ψ ) ∂ψ j > − ε n < ψ j < . (15)By Assumption [H], (cid:107) (cid:96) (1) n ( ψ ) (cid:107) = O P ( θ / n ) and there exists t ∈ (0 ,

1) such that ∂(cid:96) n ( ψ ) ∂ψ j = ∂(cid:96) n ( ψ ) ∂ψ j + t K (cid:88) l =1 ∂ (cid:96) n { ψ + t ( ψ − ψ ) } ∂ψ j ∂ψ l ( ψ l − ψ l ) = O P ( θ / n ) + O P ( θ n θ − / n ) = O P ( θ / n ) . Let 0 < ψ j < ε n (the other part of (15) is proved similarly). For n suﬃciently large,P (cid:18) ∂Q n ( ψ ) ∂ψ j < (cid:19) = P (cid:18) ∂(cid:96) n ( ψ ) ∂ψ j − θ n sign( ψ j ) < (cid:19) ≥ P (cid:18) ∂(cid:96) n ( ψ ) ∂ψ j < θ n b n (cid:19) = 1 + o (1)since ∂(cid:96) n ( ψ ) /∂ψ j = O P ( θ / n ) and b n θ / n −→ ∞ .We now focus on the asymptotic normality statement. As shown in (i) and from the oracleproperty, there is a root- θ n consistent local maximizer of Q n (cid:110) ( ψ (cid:62) , (cid:62) ) (cid:62) (cid:111) , which is regardedas a function of ψ , and that satisﬁes ∂Q n ( ˆ ψ ) ∂ψ j = 0 for j = 1 , . . . , s and ˆ ψ = ( ˆ ψ (cid:62) , (cid:62) ) (cid:62) . We nowuse a Taylor series expansion. There exists t ∈ (0 ,

1) and ˘ ψ = ˆ ψ + t ( ψ − ˆ ψ ) such that0 = ∂(cid:96) n ( ˆ ψ ) ∂ψ j − θ n λ n,j sign( ˆ ψ j )= ∂(cid:96) n ( ψ ) ∂ψ j + s (cid:88) l =1 ∂ (cid:96) n ( ψ ) ∂ψ j ∂ψ l ( ˆ ψ l − ψ l ) + s (cid:88) l =1 D n,jl ( ˆ β l − ψ l ) − θ n λ n,j sign( ˆ ψ j ) (16)where D n,jl = ∂ (cid:96) n ( ˘ ψ ) ∂ψ j ∂ψ l − ∂ (cid:96) n ( ψ ) ∂ψ j ∂ψ l . Now, let (cid:96) (1) n, ( ψ ) (resp. (cid:96) (2) n, ( ψ )) be the ﬁrst s components(resp. s × s top-left corner) of (cid:96) (1) n ( ψ ) (resp. (cid:96) (2) n ( ψ ) = − S n ( ψ )). Let also D n be the s × s matrix with entries D n,jl , j, l = 1 , . . . , s , let s n = { λ n, sign( ˆ ψ ) , . . . , λ n,s sign( ˆ ψ s ) } and M n = Σ − / n, ( ψ ). We premultiply both sides of (16) and rewrite them as M n (cid:110) (cid:96) (1) n, ( ψ ) − S n, ( ψ )( ˆ ψ − ψ ) (cid:111) = M n (cid:110) − D n ( ˆ ψ − ψ ) + θ n s n (cid:111) . (17)By Assumption [H], conditions of Theorem 1 and since ˆ ψ is a root-( θ n ) estimator, we have (cid:107) M n (cid:107) = O ( θ − / n ), (cid:107) D n (cid:107) = O P ( θ / n ) and (cid:107) ˆ ψ − ψ (cid:107) = O P ( θ − / n ), whereby we deduce ﬁrst (cid:107) M n D n ( ˆ ψ − ψ ) (cid:107) = O P ( θ − / n ) = o P (1) and θ n (cid:107) M n s n (cid:107) = O ( a n θ / n ) = o (1) , and second that M n (cid:96) (1) n, ( ψ ) − M n S n, ( ψ )( ˆ ψ − ψ ) = o P (1) which proves the result bySlutsky’s Theorem. 12 eferences Ba, I., Coeurjolly, J.F., 2020. High-dimensional inference for inhomogeneous gibbs point pro-cesses. arXiv preprint arXiv:2003.09830 .Baddeley, A., Chang, Y.M., Song, Y., Turner, R., 2012. Nonparametric estimation of thedependence of a spatial point process on spatial covariates. Statistics and its interface 5,221–236.Baddeley, A., Rubak, E., Turner, R., 2015. Spatial point patterns: methodology and applica-tions with R. CRC press.Barthelm´e, S., Trukenbrod, H., Engbert, R., Wichmann, F., 2013. Modeling ﬁxation locationsusing spatial point processes. Journal of Vision 13, 1–1.Cerf, M., Harel, J., Einhaeuser, W., Koch, C., 2008. Predicting human gaze using low-levelsaliency combined with face detection, in: Platt, J.C., Koller, D., Singer, Y., Roweis, S.T.(Eds.), Advances in Neural Information Processing Systems 20. Curran Associates, Inc., pp.241–248.Choiruddin, A., Coeurjolly, J.F., Letu´e, F., et al., 2018. Convex and non-convex regularizationmethods for spatial point processes intensity estimation. Electronic Journal of Statistics 12,1210–1255.Choiruddin, A., Cuevas-Pacheco, F., Coeurjolly, J.F., Waagepetersen, R., 2020. Regularizedestimation for highly multivariate log Gaussian Cox processes. Statistics and Computing 30,649–662.Coeurjolly, J.F., Møller, J., et al., 2014. Variational approach for spatial point process intensityestimation. Bernoulli 20, 1097–1125.Cronie, O., Van Lieshout, M.N.M., 2018. A non-model-based approach to bandwidth selectionfor kernel estimators of spatial intensity functions. Biometrika 105, 455–462.Deubel, H., Schneider, W., 1996. Saccade target selection and object recognition: Evidence fora common attentional mechanism. Vision Research 36, 1827–1837.Fan, J., Lv, J., 2010. A selective overview of variable selection in high dimensional featurespace. Statistica Sinica 20, 101.Friedman, J., Hastie, T., Tibshirani, R., 2010. Regularization paths for generalized linear modelsvia coordinate descent. Journal of statistical software 33, 1.Gatrell, A.C., Bailey, T.C., Diggle, P.J., Rowlingson, B.S., 1996. Spatial point pattern analysisand its application in geographical epidemiology. Transactions of the Institute of Britishgeographers , 256–274.Guan, Y., Jalilian, A., Waagepetersen, R., 2015. Quasi-likelihood for spatial point processes.Journal of the Royal Statistical Society. Series B, Statistical methodology 77, 677.Helo, A., R¨am¨a, P., Pannasch, S., Meary, D., 2016. Eye movement patterns and visual attentionduring scene viewing in 3-to 12-month-olds. Visual Neuroscience 33.Ho-Phuoc, T., Guyader, N., Gu´erin-Dugu´e, A., 2010. A functional and statistical bottom-upsaliency model to reveal the relative contributions of low-level visual guiding factors. CognitiveComputation 2, 344–359. 13llian, J., Penttinen, A., Stoyan, H., Stoyan, D., 2008. Statistical analysis and modelling ofspatial point patterns. volume 70. John Wiley & Sons.Judd, T., Ehinger, K., Durand, F., Torralba, A., 2009. Learning to predict where humans look,in: IEEE International Conference on Computer Vision (ICCV).Møller, J., Waagepetersen, R.P., 2003. Statistical inference and simulation for spatial pointprocesses. CRC Press.Penttinen, A., Ylitalo, A.K., 2016. Deducing self-interaction in eye movement data using se-quential spatial point processes. Spatial Statistics 17, 1–21.Starck, J.L., Pantin, E., Murtagh, F., 2002. Deconvolution in astronomy: A review. Publicationsof the Astronomical Society of the Paciﬁc 114, 1051.Waagepetersen, R., 2008. Estimating functions for inhomogeneous spatial point processes withincomplete covariate data. Biometrika 95, 351–363.Wolfe, J.M., Horowitz, T.S., 2004. What attributes guide the deployment of visual attentionand how do they do it? Neuroscience 5, 495–501.Xu, H., Schoenberg, F.P., 2011. Point process modeling of wildﬁre hazard in Los AngelesCounty, California. The Annals of Applied Statistics , 684–704.Ylitalo, A.K., S¨arkk¨a, A., Guttorp, P., et al., 2016. What we look at in paintings: A comparisonbetween experienced and inexperienced art viewers. The Annals of Applied Statistics 10, 549–574.Zou, H., 2006. The adaptive lasso and its oracle properties. Journal of the American statisticalassociation 101, 1418–1429. 14 oisson Thomas l og ( M SE ( b ^ )) l og ( I M SE ( b ^ )) T P R F P R

12 18 24 32 40 50 60 72 84 98 112 12 18 24 32 40 50 60 72 84 98 112−6−5−4−3−21.61.71.81.9809010001020304050

KMethod

Poisson Likelihood Ridge Lasso N

200 800 1800

Figure 4: Plot of the log(MSE) (ﬁrst row), log(MISE) (second row), TPR (third row), andFPR (fourth row) obtained for scenarios where the function β is symmetric. The expectedtotal number of points and the used method is detailed in the bottom label, whilst the totalnumber of frequencies K is detailed on the x -axis. The simulated point pattern is detailed ineach column. The performance of the Poisson likelihood estimate for N = 200 is not good so ithas been deleted to avoid distortions. 15 oisson Thomas l og ( M SE ( b ^ )) l og ( I M SE ( b ^ )) T P R F P R

12 18 24 32 40 50 60 72 84 98 112 12 18 24 32 40 50 60 72 84 98 112−5−4−3−2−12.052.102.152.202.252.3060708090100204060

KMethod

Poisson Likelihood Ridge Lasso N

200 800 1800

Figure 5: Plot of the log(MSE) (ﬁrst row), log(MISE) (second row), TPR (third row), and FPR(fourth row) obtained for scenarios where the function β is asymmetric. The expected totalnumber of points and the estimation method are detailed in the bottom label, whilst the totalnumber of frequencies K is detailed on the x -axis. The simulated point pattern is detailed ineach column. The performance of the Poisson likelihood estimates for N = 200 is not good soit has been deleted to avoid distortions. 16 llll lllll Symmetric spectrum model Asymmetric spectrum model T r ue f un c t i onLa ss o − P o i ss on R i dge − P o i ss onLa ss o − T ho m a s R i dge − T ho m a s −2−1 0 1 2 3 4 5 Figure 6: True β function (ﬁrst row) and average (rows 2–4) estimates under diﬀerent modelsconsidering E { N ( W ) } = 1800 and K = 112. The used β function is detailed in the columnname whilst the simulated point processes and the used estimation procedure is speciﬁed onthe label of each row. All plots share the same scale. The red dot is the center of the image(the coordinate (512,393)) and it has been added as a reference.17 d) Nonparametric estimation (e) Convolution model (Lasso) (f) Convolution model (Ridge)(a) Raw image and data (b) Parametric method (c) Semiparametric method (Baddeley et al)0 250 500 750 1000 0 250 500 750 1000 0 250 500 750 100002004006008000200400600800 Figure 7: Data points and saliency map (covariate Z ) are represented on the top left. The otherﬁgures represent estimates of the intensity function obtained with parametric (log-linear model),semiparametric and nonparametric methods. Estimates obtained from the log-convolutionmodel (Lasso and Ridge) are also represented. Images are centered and normalized for a bettervisualization. Adaptive Lasso Adaptive Ridge0 250 500 750 1000 0 250 500 750 10000200400600800

Figure 8: Adaptive Lasso and Ridge estimates of the ββ