A Differential Model of the Complex Cell
Miles Hansard and Radu Horaud
INRIA Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot, France 38330.
Keywords:
Spatial vision, complex cells, image processing.
Abstract
The receptive fields of simple cells in the visual cortex can be understood as linear filters. These filters can be modelled by Gabor functions, or by Gaussian derivatives. Gabor functions can also be combined in an 'energy model' of the complex cell response. This paper proposes an alternative model of the complex cell, based on Gaussian derivatives. The most important feature of the new model is that it accounts for the insensitivity of the complex response to small shifts of the image. The new model uses a linear combination of the first few derivative filters, at a single position, to approximate the first derivative filter at a series of adjacent positions. The maximum response, over all positions, gives a signal that is insensitive to small shifts of the image. This model, unlike previous approaches, is based on the scale-space theory of visual processing. In particular, the complex cell is built from filters that respond to the 2-D differential structure of the image. The computational aspects of the new model are studied in one and two dimensions, using the steerability of the Gaussian derivatives. The response of the model to basic images, such as edges and gratings, is derived formally. The response to natural images is also evaluated, using statistical measures of shift insensitivity. The relevance of the new model to the cortical image representation is discussed.

It is useful to distinguish between simple and complex cells in the visual cortex, as proposed by Hubel and Wiesel (1962). The original classification is based on four characteristic properties of simple cells, as follows. Firstly, the receptive field has distinct excitatory and inhibitory subregions. Secondly, there is spatial summation within each subregion. Thirdly, there is antagonism between the excitatory and inhibitory subregions. Fourthly, the response to any stimulus can be predicted from the receptive field map.
Cells that do not show these characteristics can be classified, by convention, as complex. The distinction between simple and complex cells has endured, subject to certain qualifications (although an alternative view is described by Mechler and Ringach, 2002). In particular, it has been argued that the principal characteristic of complex cells is the phase invariance of the response (Carandini, 2006; Movshon et al., 1978a). This means that a complex cell, which is tuned to a particular orientation and spatial frequency, is not sensitive to the precise location of the stimulus within the receptive field (Kjaer et al., 1997; Mechler et al., 2002). If the stimulus consists of a moving (or flickering) grating, then phase invariance can be quantified by the relative modulation $a_1/a_0$, where $a_1$ is the amplitude associated with the fundamental frequency of the response, and $a_0$ is the mean response (De Valois et al., 1982; Movshon et al., 1978a; Skottun et al., 1991). If this ratio is less than one, then the cell can be classified as complex.

The standard model of the simple cell is based on a linear filter that is localized in position, spatial frequency and orientation. The output of this filter is subject to a nonlinearity, such as squaring, and may also be normalized by the responses of nearby simple cells. The theoretical framework of this model is well advanced (as reviewed by Carandini et al., 2005; Dayan and Abbott, 2001), owing to the assumed linearity of the underlying spatial filters. Although the physiology of the complex cell is increasingly well-understood (as reviewed by Alonso and Martinez, 1998; Martinez and Alonso, 2003; Spitzer and Hochstein, 1988), the appropriate theoretical framework is less clear. It is useful to adopt the distinction between 'position' and 'phase' models that was made by Fleet et al. (1996) in the analysis of binocular processing.
It is invariance to position or phase that is of interest in the present context.

A position-invariant model of the complex cell can be constructed from a set of simple cells, as described by Hubel and Wiesel (1962). Suppose, for example, that the simple cells are represented by linear filters of a common orientation, but different spatial positions. If the responses of these filters are summed, then the corresponding complex cell will be tuned to an oriented element that appears anywhere in the union of the simple cell receptive fields (Spitzer and Hochstein, 1988). This scheme is the basis of the subunit model, which was further developed and tested by Movshon et al. (1978a,b). The results of these experiments are consistent with the idea that the complex response is based on a group of spatially linear subunits. The most straightforward way to combine the individual responses would be by rectifying and summing them, but the maximum could also be taken (Riesenhuber and Poggio, 1999). The main disadvantage of the subunit model is that it is too general in its basic form. For example, it allows arbitrarily complicated receptive fields to be constructed, as there is no intrinsic constraint on the positions, orientations or spatial frequencies of the subunits.

Alternatively, a phase-invariant model can be constructed from a pair of filters of different shapes (Adelson and Bergen, 1985). If the filters have a quadrature relationship, then their responses to a sinusoidal stimulus will differ in phase by $\pi/2$. It follows that the sum of the squared outputs will be invariant to the phase of the input. The application of this energy model to spatial vision (Atherton, 2002; Emerson et al., 1992; Morrone and Burr, 1988; Wundrich et al., 2004) is motivated by the observed phase-differences in the receptive fields of adjacent simple cells (Pollen and Ronner, 1981).
Indeed, these receptive fields can be well-represented by odd and even Gabor functions (Daugman, 1985; Jones and Palmer, 1987; Pollen and Ronner, 1983). There are two problems with the energy model, in the present context. Firstly, there is no generally agreed way to combine energy mechanisms across different frequencies and orientations (one approach is described by Fleet et al., 1996). This is an obstacle to the construction of mechanisms that show more complicated invariances, such as those found in areas MT and MST (Orban, 2008). Secondly, the quadrature filters that are best-suited to the energy model are not convenient for the general description of 2-D image-structure. The concept of phase itself becomes somewhat complicated in two or more dimensions (Felsberg and Sommer, 2001), and the quadrature representation of more complex image-features, such as edge curvature, is unclear. It must be emphasized that 2-D images contain important structures (e.g. luminance saddle-points) that have no analogue in the 1-D signals to which the energy model is ideally suited. A realistic framework for spatial vision must be capable of representing the full variety of 2-D structures (Ben-Shahar and Zucker, 2004; Dobbins et al., 1987; Petitot, 2003).

The present work is motivated by the difficulty of extending the energy model to more complicated 2-D image-features and spatial transformations. These extensions require a representation of the local image geometry, rather than the local phase and frequency structure. The new approach, like the energy model, is based on a set of odd and even linear filters that are located at the same position. The outputs of these filters are nonlinearly combined, again as in the energy model. The combination, however, involves the implicit construction of spatially-offset subunits. The complete model, in this sense, can be seen as a re-formulation of the Hubel and Wiesel (1962) scheme.

A minimal overview of the new model will now be given.
Let $S(\mathbf{x})$ be the original scalar image, where $\mathbf{x} = (x, y)^\top$, and consider a spatial array of simple cells, parameterized by preferred frequency and orientation. These receptive fields will be modelled by $k$-th order directional derivatives of the Gaussian kernel, $G_k(\mathbf{x}, \sigma, \theta) = (\mathbf{v} \cdot \nabla)^k\, G(\mathbf{x}, \sigma)$, where $\sigma$ is the spatial scale, $\theta$ is the orientation, and $\mathbf{v} = (\cos\theta, \sin\theta)^\top$. The simple cell representation $S_k(\mathbf{x}, \sigma, \theta)$ is given by the convolution of these filters with the image:

$$S_k(\mathbf{x}, \sigma, \theta) = G_k(\mathbf{x}, \sigma, \theta) \star S(\mathbf{x}). \quad (1)$$

In particular, consider the magnitude of the first derivative signal, $|S_1(\mathbf{x}, \sigma, \theta)|$. This will be large if there is a step-like edge at $\mathbf{x}$, with the luminance boundary perpendicular to $\mathbf{v}$. Now suppose that the edge is shifted by some amount in direction $\mathbf{v}$. This means that the magnitude $|S_1(\mathbf{x}, \sigma, \theta)|$ will fall, but the nonlinear function

$$C(\mathbf{x}, \sigma, \theta) = \max_t \bigl|S_1(\mathbf{x} + t\mathbf{v}, \sigma, \theta)\bigr|, \quad \text{where } |t| \le \rho \quad (2)$$

will remain large, unless the shift exceeds the range $\rho$. Equations (1 & 2) will be the basic models of simple cells $S_k(\mathbf{x}, \sigma, \theta)$ and complex cells $C(\mathbf{x}, \sigma, \theta)$ in this paper (full derivations are given in sec. 2). The complex cell, which inherits the scale and orientation tuning $(\sigma, \theta)$, has a receptive field of radius $\rho$, centred on position $\mathbf{x}$. It can be seen that (2) is just a special case of the Hubel and Wiesel (1962) subunit model, with simple cells distributed along the spatial axis $\mathbf{v}$, and 'max' being the combination rule. It has already been argued, in section 1, that this model is too general. For example, there is no natural limit on the size $\rho$ of the complex receptive field in (2).

Suppose, however, that access to the first-order directional structure around position $\mathbf{x}$ is replaced by access to the higher-order directional structure at position $\mathbf{x}$.
Mathematically, this means that the function $S_1(\mathbf{x} + t\mathbf{v}, \sigma, \theta)$ of the scalar $t$ is replaced by the values $S_k(\mathbf{x}, \sigma, \theta)$, indexed by $k = 1, \dots, K$. This is interesting for three reasons. Firstly, the model becomes inherently local, because the filters $G_k$ that compute the values $S_k$ are now centred at the same point $\mathbf{x}$. Secondly, the filters $G_k$ are symmetric or antisymmetric about the point $\mathbf{x}$, and resemble the Gabor functions used in the energy model. Thirdly, the values $S_k$ can be obtained from a linear transformation of the $K$-th order local jet at $\mathbf{x}$, and so this scheme is compatible with the scale-space theory described above.

To be more specific, it will be shown that the first-order structure in the neighbourhood $\mathbf{x} + t\mathbf{v}$, as in (2), can be estimated from a linear combination of the directional derivatives $S_k$,

$$S_1(\mathbf{x} + t\mathbf{v}, \sigma, \theta) \approx \sum_{k=1}^{K} P_k(t)\, S_k(\mathbf{x}, \sigma, \theta) \quad (3)$$

where the functions $P_k(t)$ are fixed polynomials. This approximation will then be substituted into the right-hand side of (2). It will be shown in section 2.2 that the approximation (3) can be motivated by a Maclaurin expansion in powers of $t$. This can also be interpreted, as shown in figure 1, as the synthesis of spatially offset filters, using the Gaussian derivatives as a basis. A matrix formulation of this model will be given in section 2.3. An optimal (and image-independent) construction of the polynomials $P_k(t)$ will be given in section 2.4. The case in which the derivatives on the right-hand side of (3) are in another direction $\phi \ne \theta$ is treated in section 2.5.
The model presented in this paper is quite different from the previous approaches, as explained above. The main contribution is a 'differential' model of the complex cell, which is exactly steerable, and which fits naturally into the geometric approach to image analysis (Koenderink and van Doorn, 1987, 1990). This shows that it is possible to analyze the local image geometry, and to obtain a shift-invariant response, using a common set of filters.

The body of the paper is organized as follows. The new model is developed in section 2, using linear algebra and least-squares optimization. The accuracy of the estimated filters is evaluated in section 3. This section also derives the exact response of the ideal filters to several basic stimuli. Some preliminary experiments on natural images are reported. The biological interpretation of the model, and its predictions, are discussed in section 4. Future directions, and the relationship of the model to scale-space theory (Koenderink, 1984), are also discussed.
The following notation will be used here. Matrices and vectors are written in bold, e.g. $\mathbf{M}$, $\mathbf{v}$, where $\mathbf{M}^\top$ is the transpose, and $\mathbf{M}^+$ is the Moore-Penrose inverse (Press et al., 1992). The convolution of functions is $F(x) \star G(x) = \int_{-\infty}^{\infty} F(x - y)\,G(y)\,\mathrm{d}y$. Some properties of the Gaussian derivatives $G_k(x, \sigma)$ will now be reviewed.

There is no particular spatial scale at which a natural image should be analyzed. It is therefore desirable to represent the image in a scale space, so that a range of resolutions can be considered (Koenderink and van Doorn, 1987). The preferred way to do this is by convolution with a Gaussian kernel. It follows that the structure of the image, at a given scale, can be analyzed via the spatial derivatives of the corresponding Gaussian. The $k$-th order derivatives of a 1-D Gaussian, $G_k = \mathrm{d}^k G / \mathrm{d}x^k$, can be expressed as

$$G_k(x, \sigma) = \left(\frac{-1}{\sigma\sqrt{2}}\right)^{\!k} H_k\!\left(\frac{x}{\sigma\sqrt{2}}\right) G(x, \sigma) \quad (4)$$

$$G(x, \sigma) = \exp\!\left(-\frac{x^2}{2\sigma^2}\right) \quad (5)$$

where $G(x, \sigma)$ is the original Gaussian, $k$ is a positive integer, and $H_k(x)$ is the $k$-th Hermite polynomial. The first seven Gaussian derivatives are shown in column one of figure 1. It will also be useful to introduce two normalizations of the Gaussian derivatives,

$$G_k^{0}(x, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, G_k(x, \sigma) \quad \text{and} \quad G_k^{1}(x, \sigma) = \frac{1}{2}\, G_k(x, \sigma) \quad (6)$$

which are defined so that $\int |G_k^{k}(x)|\,\mathrm{d}x = 1$. In particular, $G_0^{0}$ and $G_1^{1}$ are the $L_1$-normalized blurring and differentiating filters, respectively. This superscript notation will not be used unless a particular normalization is important (e.g. in sec. 3.2).

The two-dimensional Gaussian derivative, in direction $\theta$ with $\mathbf{x} = (x, y)^\top$, will be written $G_k(\mathbf{x}, \sigma, \theta) = (\mathbf{v} \cdot \nabla)^k\, G(\mathbf{x}, \sigma)$, as in section 1.1. Two special properties of these filters should be noted. Firstly, the filter $G_k(\mathbf{x}, \sigma, \theta)$ is separable in the local coordinate-system that is defined by the direction of differentiation.
This means that the 2-D filter can be obtained from the product of 1-D filters $G_k(x_\theta, \sigma)$ and $G(y_\theta, \sigma)$. Secondly, the Gaussian derivatives are steerable, meaning that $G_k(\mathbf{x}, \sigma, \theta)$ can be obtained from a linear combination of derivatives in other directions, $G_k(\mathbf{x}, \sigma, \phi_j)$, where $j = 1, \dots, k+1$. These facts make it possible to analyze a multidimensional filter, in many cases, in terms of 1-D functions (Freeman and Adelson, 1991).

The first derivative, $G_1$, will be used as the basic model of a complex subunit (which is also a simple cell receptive field). This choice is motivated by two observations. Firstly, it is well established that gradient filters can be used to detect edges, as well as more complex image-features (Canny, 1986; Harris and Stephens, 1988). Secondly, $G_1$ is the first zero-mean filter in the local-jet representation of the image, which is physiologically and mathematically convenient. The extension to higher-order subunits is straightforward, as discussed in section 4.3.

This section will put the system of simple cells, introduced in section 1.1, into a standard signal processing framework. This will be done in 1-D, in order to simplify the notation. The extension to 2-D is straightforward. The 1-D version of the simple cell response (1) is $S_k(u, \sigma) = \int_{-\infty}^{\infty} G_k(u - x, \sigma)\, S(x)\,\mathrm{d}x$. If $k = 1$, and the filter $G_1$ is offset by an amount $t$, then the convolution can be expressed as

$$S_1(u - t, \sigma) = \int_{-\infty}^{\infty} G_1(u - t - x, \sigma)\, S(x)\,\mathrm{d}x = -\int_{-\infty}^{\infty} G_1(x - t, \sigma)\, S(x - u)\,\mathrm{d}x. \quad (7)$$

Here the antisymmetry of $G_1(x, \sigma)$ has been used to show that the result is equal to the negative correlation of the filter and signal. It is evident that if $t$ could be kept equal to $u$, then the signal shift would have no effect on the result. The prototypical stimulus for $G_1$ is the step-edge $S(x) = \mathrm{sgn}(x)$. In this case the filter and signal are anti-correlated, and so the final response (7) is non-negative.
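The Hermite construction of equations (4) and (5) is easy to check numerically. A minimal sketch (assuming NumPy; `hermval` evaluates physicists' Hermite polynomials, which match the $H_k$ used here):

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def gauss(x, sigma):
    # Unnormalized Gaussian of eq. (5)
    return np.exp(-x**2 / (2 * sigma**2))

def gauss_deriv(x, sigma, k):
    # Eq. (4): G_k = (-1/(sigma*sqrt(2)))^k * H_k(x/(sigma*sqrt(2))) * G(x, sigma)
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0                     # select the k-th Hermite polynomial
    s = sigma * np.sqrt(2.0)
    return (-1.0 / s)**k * hermval(x / s, coeffs) * gauss(x, sigma)

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
# k = 1 agrees with direct differentiation: d/dx G = -(x / sigma^2) G
assert np.allclose(gauss_deriv(x, 1.0, 1), -x * gauss(x, 1.0))
# L1 norms behind the normalizations in eq. (6):
# the integral of |G_0| is sigma*sqrt(2*pi), and the integral of |G_1| is 2
print(np.abs(gauss(x, 1.0)).sum() * dx)           # ~ 2.5066
print(np.abs(gauss_deriv(x, 1.0, 1)).sum() * dx)  # ~ 2.0
```

The two printed integrals are the constants that make the blurring and differentiating filters $L_1$-normalized, as in (6).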
It was established in section 2.1 that the response $S_1(u - t, \sigma)$ can be constructed from the offset filters $G_1(x - t, \sigma)$. This means that the desired approximation (3) can be treated as a filter-design problem. The following notation will be adopted for the offset filters:

$$F(0, x) = G_1(x, \sigma), \qquad F(t, x) \approx G_1(x - t, \sigma) \quad (8)$$

which also depend on the spatial scale and derivative order, but it will not be necessary to make this explicit in the notation. It will suffice to analyze a single filter which, without loss of generality, is located at the origin $\mathbf{x} = (0, 0)^\top$ of the spatial coordinate-system. The linear response of this filter is defined in relation to (7) as

$$R(t, u) = -\int_{-\infty}^{\infty} F(t, x)\, S(x - u)\,\mathrm{d}x. \quad (9)$$

The complex response at $\mathbf{x} = (0, 0)^\top$, with reference to (2), can now be expressed as a function of the signal translation $u$;

$$C(u) = \max_t \bigl|R(t, u)\bigr| \quad \text{where } |t| \le \rho. \quad (10)$$

The actual value of $u$, in general, has no particular significance. It will be more important to consider the response $R(t, u)$ as $u$ changes. In particular, suppose that $|R(t, u)|$ is high at the stimulus position $u = u_0$. If the response is insensitive to slight translation of the signal, then $\partial C / \partial u \approx 0$ at $u_0$.

The approximation problem in (8) will now be addressed. The filter $F(t, x)$ can be defined in relation to the Maclaurin expansion of $G_1(x - t, \sigma)$ with respect to the offset $t$, as indicated in (8). If image-derivatives up to order $K$ are available, then the approximation is

$$F(t, x) = \sum_{k=0}^{K-1} \frac{(-t)^k}{k!}\, G_{k+1}(x, \sigma). \quad (11)$$

The key observation is that the filters $G_{k+1}(x, \sigma)$ in (11) are precisely those that compute the local jet coefficients, of order $1, \dots, K$, at the point $x = 0$. In other words, the family of shifted filters $F(t, x)$ has been obtained from the family of non-shifted derivatives $G_k(x, \sigma)$.
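The Maclaurin synthesis (11) can be sketched in a few lines (an assumed NumPy implementation, not taken from the paper): a shifted copy of $G_1$ is reproduced by derivatives centred at the origin.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def gauss_deriv(x, sigma, k):
    # k-th Gaussian derivative via the Hermite form of eq. (4)
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    s = sigma * np.sqrt(2.0)
    return (-1.0 / s)**k * hermval(x / s, coeffs) * np.exp(-x**2 / (2 * sigma**2))

def maclaurin_offset(x, sigma, t, K):
    # Eq. (11): F(t, x) = sum_{k=0}^{K-1} (-t)^k / k! * G_{k+1}(x, sigma)
    return sum((-t)**k / math.factorial(k) * gauss_deriv(x, sigma, k + 1)
               for k in range(K))

x = np.linspace(-5.0, 5.0, 101)
ideal = gauss_deriv(x - 0.5, 1.0, 1)          # target: G_1 shifted by t = 0.5
approx = maclaurin_offset(x, 1.0, t=0.5, K=7)
rms = np.sqrt(np.mean((approx - ideal)**2))   # small for |t| well inside rho
```

Increasing $K$ shrinks the error, while larger $|t|$ inflates it, which is the trade-off analyzed in section 3.1.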
It can be seen from (4, 5) and (11) that the estimated function $F(t, x)$ decreases to zero for large $|x|$, owing to the exponential tails of $G$.

The definition (11) is usable in practice, as will be shown in section 3.1. There are, however, two difficulties with the scheme described above. Firstly, although $F(t, x)$ is an approximation of $G_1(x - t, \sigma)$, the nature of this approximation (a polynomial with the given derivatives) may be inappropriate. Secondly, as expected, the approximation (11) is not well-behaved for large $|t|$. Both of these problems can be addressed by replacing the Maclaurin series with a more flexible construction of $F(t, x)$. This is done by substituting a general polynomial $P_k(t)$ in place of each monomial $(-t)^k / k!$ in (11), leading to

$$F(t, x) = \sum_{k=1}^{K} P_k(t)\, G_k(x, \sigma). \quad (12)$$

The $K$ polynomials $P_k(t)$ are constructed from standard monomial basis functions $t^j$ and coefficients $c_{jk}$. The order of each polynomial will be $K - 1$, for consistency with the original series approximation (11). It follows that the polynomials are

$$P_k(t) = \sum_{j=0}^{K-1} t^j c_{jk} \quad \text{where } 1 \le k \le K. \quad (13)$$

The problem has now been altered to that of finding $K^2$ appropriate coefficients $c_{jk}$. This will be treated, in sections 2.4 and 2.5, as the optimization of

$$\arg\min_{\mathbf{C}} \iint \bigl|F(t, x) - G_1(x - t, \sigma)\bigr|^2\,\mathrm{d}x\,\mathrm{d}t$$

where $F(t, x)$ is the family of filters defined in (12), and $\mathbf{C}$ is the matrix of coefficients $c_{jk}$. This optimization scheme generalizes immediately to filters in any number of dimensions. The simple Maclaurin scheme (11) remains a useful model, because the optimal polynomials are, in practice, close to the original monomials, $P_k(t) \approx t^{(k-1)}$, as can be seen in figure 1.

It is important to note that, once the coefficients $c_{jk}$ have been estimated, the location of the synthetic filter $F(t, x)$ can be varied continuously with respect to the offset $t$.
Any set of translated filters $F(t_i, x)$ can be obtained, provided $|t_i| \le \rho$ for $i = 1, \dots, M$, by re-sampling the monomial basis functions as $t_i^j$ and then repeating (12 & 13). Furthermore, the principle of shiftability states that the convolution $f(x - t) \star s(x)$ can be represented in a finite basis $f(x - t_i)$, provided that the filter $f$ is bandlimited (Perona, 1995; Simoncelli et al., 1992). The filters $F(t, x)$ are not bandlimited, but they do decay exponentially, as the Fourier transforms have a Gaussian factor. This means, in practice, that the linear response $R(t, u)$ in (9) can be represented by a suitable discretization $R(t_i, u)$, where the shift-resolution $\Delta t = 2\rho / (M - 1)$ can be chosen to achieve any desired accuracy.

It will be convenient to represent the filter construction in terms of matrices. This results in a compact formulation, and prepares for the least-squares estimation procedure that will be introduced in section 2.4. Suppose that $M$ filters, each of length $N$, are to be constructed, and that the highest available derivative is of order $K$. Each filter will be represented as a row-vector, so that the collection of offset filters forms an $M \times N$ matrix $\mathbf{F}$. Note that this representation applies in any number of dimensions, provided that the positions of the filter-samples are consistently identified with the column-indices of $\mathbf{F}$.

The columns of another matrix $\mathbf{P}$ will contain the $K$ polynomials $P_k(t)$ from equation (13). These polynomials must be sampled at $M$ points $t_i$, hence $\mathbf{P}$ has dimensions $M \times K$. Let the sampled monomial basis functions $t_i^j$ be the columns of the matrix $\mathbf{B}$, which must therefore have the same dimensions, $M \times K$. The monomials are weighted by the $K \times K$ matrix of coefficients $\mathbf{C}$, such that

$$\mathbf{P} = \mathbf{B}\mathbf{C} \quad (14)$$

where column $k$ of $\mathbf{C}$ contains the coefficients of $P_k$. Let each row of the $K \times N$ matrix $\mathbf{G}$ contain the sampled Gaussian derivative $G_k(x_i, \sigma)$.
Each offset filter should be a linear combination of the Gaussian derivatives, constructed from the polynomials $P_k$. It follows from (12) that

$$\mathbf{F} = \mathbf{P}\mathbf{G}. \quad (15)$$

Let the column-vector $\mathbf{s}$ contain the sampled signal, $\mathbf{s} = \bigl(S(x_1), \dots, S(x_N)\bigr)^\top$. This means that the response-vector $\mathbf{r} = \bigl(R(t_1, u), \dots, R(t_M, u)\bigr)^\top$ is obtained according to (9 & 15) as

$$\mathbf{r} = -\mathbf{F}\mathbf{s} = -\mathbf{P}\mathbf{G}\mathbf{s}. \quad (16)$$

This clearly shows that the response $\mathbf{r}$ is simply a linear transformation $\mathbf{P}$ of the $K$-th order Gaussian jet, $\mathbf{G}\mathbf{s} = \bigl(S_1(x, \sigma), \dots, S_K(x, \sigma)\bigr)^\top$. The implication is that the filter-bank $\mathbf{F}$ need not be explicitly constructed; rather, the response $\mathbf{r}$ is computed directly from the $K$ image derivatives $S_k$ at $x$.

It will now be shown that the $K^2$ unknown coefficients, contained in the matrix $\mathbf{C}$, can be obtained by standard least-squares methods. It should be emphasized that this is a filter-design problem; the matrix $\mathbf{P}$ is fixed for all signals, and the response $\mathbf{r}$ is obtained according to (16).

Let the $M \times N$ matrix $\mathbf{F}^\star$ contain the true derivative filters, such that the $ij$-th element is $G_1(x_j - t_i, \sigma)$. The approximation $\mathbf{F} \approx \mathbf{F}^\star$ can be expressed, according to (14 & 15), as the product $\mathbf{F}^\star \approx \mathbf{B}\mathbf{C}\mathbf{G}$, of which

$$\mathbf{C} = \mathbf{B}^+ \mathbf{F}^\star \mathbf{G}^+ \quad (17)$$

is the solution in the least-squares sense. This formulation requires two Moore-Penrose inverses, which can be computed from the singular-value decompositions of the monomial basis and Gaussian derivative matrices $\mathbf{B}$ and $\mathbf{G}$, respectively. It is, however, more efficient to solve this problem using QR decompositions, as follows. There are, in practice, more offsets than derivative filters, $M > K$, as well as more spatial samples than derivative filters,
$N > K$. The basis matrix has full column-rank $K$, and can be factored as $\mathbf{B} = \mathbf{Q}_B \mathbf{R}_B$. The derivative matrix has full row-rank $K$, and so its transpose can be factored as $\mathbf{G}^\top = \mathbf{Q}_G \mathbf{R}_G$. It follows that the solution (17) can be obtained via $\mathbf{B}^+ = \mathbf{R}_B^{-1} \mathbf{Q}_B^\top$ and $\mathbf{G}^+ = \mathbf{Q}_G \mathbf{R}_G^{-\top}$.
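The whole design procedure of sections 2.3 and 2.4 fits in a few lines of linear algebra. A sketch, assuming NumPy and illustrative sizes $M$, $N$, $K$ chosen here for the example (the QR-based pseudoinverses follow the text; `gauss_deriv` is the Hermite construction of eq. (4)):

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def gauss_deriv(x, sigma, k):
    # k-th Gaussian derivative via the Hermite form of eq. (4)
    coeffs = np.zeros(k + 1); coeffs[k] = 1.0
    s = sigma * np.sqrt(2.0)
    return (-1.0 / s)**k * hermval(x / s, coeffs) * np.exp(-x**2 / (2 * sigma**2))

sigma, K, M, N, rho = 1.0, 5, 21, 101, 1.0
x = np.linspace(-5.0, 5.0, N)                # spatial samples x_j
t = np.linspace(-rho, rho, M)                # offsets t_i
G = np.stack([gauss_deriv(x, sigma, k) for k in range(1, K + 1)])   # K x N
B = np.stack([t**j for j in range(K)], axis=1)                      # M x K
Fstar = np.stack([gauss_deriv(x - ti, sigma, 1) for ti in t])       # M x N targets

# Eq. (17) via QR: B = Q_B R_B (full column rank), G^T = Q_G R_G (full row rank)
Qb, Rb = np.linalg.qr(B)
Qg, Rg = np.linalg.qr(G.T)
B_pinv = np.linalg.solve(Rb, Qb.T)           # B+ = R_B^{-1} Q_B^T
G_pinv = Qg @ np.linalg.inv(Rg).T            # G+ = Q_G R_G^{-T}
C = B_pinv @ Fstar @ G_pinv                  # eq. (17)

P = B @ C                                    # eq. (14): sampled polynomials
F = P @ G                                    # eq. (15): synthesized offset filters
err = np.sqrt(np.mean((F - Fstar)**2))       # small over |t| <= rho

s_vec = np.sign(x)                           # step-edge stimulus
r = -(P @ (G @ s_vec))                       # eq. (16): responses from the jet
```

As equation (16) states, the responses `r` are obtained directly from the $K$ jet coefficients `G @ s_vec`, without building the filter-bank `F` explicitly.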
Figure 1: Construction of offset filters. Column 1: The Gaussian derivatives $G_k(x, \sigma)$, scaled for display, of orders $1, \dots, K$, where $K = 7$. Column 2: The corresponding polynomial interpolation functions $P_k(t)$, of order $K - 1$. Note that $P_k(t)$ resembles the monomial $t^{(k-1)}$. Column 3: Estimated filters $F(t_j, x)$, which are offset versions of $G_1(x, \sigma)$. Column 4: Ideal filters $F^\star(t_j, x)$. The synthesis equation is $F(t_j, x) = \sum_k P_k(t_j)\, G_k(x, \sigma)$, where each weight $P_k(t_j)$ corresponds to the $j$-th dot on the $k$-th polynomial in column two.

The least-squares construction of the filters $\mathbf{F}$ was described in the preceding section. The method is quite usable, but has two shortcomings. Firstly, if one of the shifts $t_i$ is zero, then $F(0, x) \approx G_1(x, \sigma)$, but it would be preferable to make this an exact equality, so that the original filter is returned as in (8). The second shortcoming of the method in 2.4 is that, in two or more dimensions, the orientation of the derivative filters in the basis-set $\mathbf{G}$ may not match that of the target-set $\mathbf{F}^\star$. Both of these problems will be solved below.

Figure 2: Synthesis of orientation-tuned subunits. The nine filters were synthesized from eight oriented derivatives centred at the origin (with $\sigma = 1$). The steered-additive solution of section 2.5.3 was used. Middle row:
The $G_1$ filter is steered to the axes of a hexagonal lattice; $\theta = 0^\circ, 60^\circ, 120^\circ$. Top and bottom rows:
Offset filters, synthesized at shifts of $t = \pm\rho$ in the corresponding directions (with $\rho = 1.5$). Note that each column can be interpreted as three subunits of an orientation-tuned complex cell. The filter amplitudes are scaled to the range $[-1, 1]$, with contour lines separated by increments of 0.2 units.

The requirement $F(0, x) = G_1(x, \sigma)$ is satisfied by an additive model, in which the polynomial $P_1(t)$ that weights $G_1(x, \sigma)$ is always unity, and all other polynomials pass through zero when $t = 0$. This implies the following partitioning of the derivative, monomial and coefficient matrices:

$$\mathbf{G} = \begin{pmatrix} \mathbf{g}^\top \\ \mathbf{G}_\Delta \end{pmatrix} \qquad \mathbf{B} = \begin{pmatrix} \mathbf{1} & \mathbf{B}_\Delta \end{pmatrix} \qquad \mathbf{C} = \begin{pmatrix} 1 & \mathbf{0}^\top \\ \mathbf{0} & \mathbf{C}_\Delta \end{pmatrix} \quad (18)$$

where $\mathbf{1}$ is the column-vector of $M$ ones, and $\mathbf{0}$ is the column-vector of $(K - 1)$ zeros. The $1 \times N$ vector $\mathbf{g}^\top$ contains the first derivative filter $G_1(x, \sigma)$, while the $(K - 1) \times N$ matrix $\mathbf{G}_\Delta$ contains the higher-order filters. The columns of the $M \times (K - 1)$ matrix $\mathbf{B}_\Delta$ contain the sampled monomials, excluding the constant vector $\mathbf{1}$. The unknown matrix $\mathbf{C}$ will be recovered in the form indicated, where $\mathbf{C}_\Delta$ has dimensions $(K - 1) \times (K - 1)$.

The product of $\mathbf{B}$ and $\mathbf{C}$, as in (14), now gives $\mathbf{P} = \begin{pmatrix} \mathbf{1} & \mathbf{P}_\Delta \end{pmatrix}$, where the columns of $\mathbf{P}_\Delta = \mathbf{B}_\Delta \mathbf{C}_\Delta$ are polynomials without constant terms. It follows that the product $\mathbf{P}\mathbf{G}$, as in (15), gives the additive approximation

$$\mathbf{F}^\star \approx \mathbf{1}\mathbf{g}^\top + \mathbf{B}_\Delta \mathbf{C}_\Delta \mathbf{G}_\Delta \quad (19)$$

where $\mathbf{1}\mathbf{g}^\top$ is the rank-one matrix containing $M$ identical rows $\mathbf{g}^\top$. Note that if the $i$-th row of $\mathbf{F}^\star$ corresponds to $t = 0$, then the $i$-th row of $\mathbf{B}_\Delta \mathbf{C}_\Delta$ must be zero, this being the evaluation of the polynomials $\sum_{j=1}^{K-1} t^j c_{jk}$ at $t = 0$. It follows that the $i$-th row of $\mathbf{F}^\star$ is exactly recovered from (19) as $\mathbf{g}^\top$, and so the constraint $F(0, x) = G_1(x, \sigma)$ has been imposed.

The unknown coefficients $\mathbf{C}_\Delta$ are recovered by subtracting $\mathbf{1}\mathbf{g}^\top$ from $\mathbf{F}^\star$, and then proceeding by analogy with (17).
This leads to

$$\mathbf{C}_\Delta = \mathbf{B}_\Delta^+ \bigl(\mathbf{F}^\star - \mathbf{1}\mathbf{g}^\top\bigr)\, \mathbf{G}_\Delta^+$$

where the matrices $\mathbf{B}_\Delta^+$ and $\mathbf{G}_\Delta^+$ can be obtained from the QR factorizations of $\mathbf{B}_\Delta$ and $\mathbf{G}_\Delta$, as before.

In two (or more) dimensions, it is assumed that the desired filters $G_1(\mathbf{x} - t\mathbf{v}, \sigma, \theta)$ have a common orientation, where $\mathbf{v} = (\cos\theta, \sin\theta)^\top$ is the direction of the derivative in 2-D. This leads to invariance with respect to translations of the signal in the given direction. The basis filters $G_k(\mathbf{x}, \sigma, \phi_\ell)$, however, will typically have a range of orientations $\phi_\ell \ne \theta$. This problem can be solved as follows.

Recall from section 2 that the $k$-th order Gaussian derivative is steerable with a basis of size $k + 1$. Now suppose that row $k$ of the matrix $\mathbf{G}$ is replaced by $k + 1$ rows, containing sampled filters $G_k(\mathbf{x}, \phi_\ell, \sigma)$ at $\ell = 1, \dots, k+1$ distinct orientations $\phi_\ell$. The enlarged matrix $\mathbf{G}_\phi$ now has dimensions $M_K \times N$, where

$$M_K = \sum_{k=1}^{K} (k + 1) = \frac{K(K + 3)}{2}. \quad (20)$$

It follows that there is a $K \times M_K$ 'steering' matrix $\mathbf{D}$ such that $\mathbf{G} = \mathbf{D}\mathbf{G}_\phi$ is exactly the $K \times N$ matrix of derivatives at the desired orientation. Moreover, if the approach of section 2.4 is applied to the $M_K \times N$ matrix $\mathbf{G}_\phi$, then a solution

$$\mathbf{F} = \mathbf{B}\mathbf{C}_\phi \mathbf{G}_\phi = \mathbf{B}\mathbf{C}\mathbf{G} \quad (21)$$

will be obtained. It follows that the two coefficient matrices are related by $\mathbf{C}_\phi = \mathbf{C}\mathbf{D}$. In summary, if the matrix $\mathbf{G}_\phi$ contains a sufficient number $M_K$ of differently oriented filters, then a set of translated filters $\mathbf{F}$ can be approximated in any common orientation $\theta$. There is no change to the algorithm described in section 2.4. It should, however, be noted that (20) shows a trade-off between translation invariance and steerability. Larger translation invariance requires more derivatives, but these become increasingly difficult to steer.

The steered solution, as described in the previous section, will not automatically be additive, in the sense of (19).
This problem will be solved, with reference to section 2.5.1, by putting an explicitly steered filter $\mathbf{g}_\theta^\top$ in place of $\mathbf{g}^\top$. The first derivative can be steered with respect to a basis of filters at distinct orientations $\phi_1$ and $\phi_2$ (these would be the first two rows of the $M_K \times N$ matrix $\mathbf{G}_\phi$). The standard steering equation (Freeman and Adelson, 1991) can be simplified in this case to

$$\begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} = \begin{pmatrix} \cos\phi_1 & \cos\phi_2 \\ \sin\phi_1 & \sin\phi_2 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}$$

where $\theta$, $\phi_1$ and $\phi_2$ are known angles. This system can be solved exactly for the unknown coefficients $p_1$ and $p_2$, resulting in

$$p_1 = \sin(\phi_2 - \theta)/\delta \quad \text{and} \quad p_2 = \sin(\theta - \phi_1)/\delta \quad \text{where} \quad \delta = \sin(\phi_2 - \phi_1). \quad (22)$$

It may be noted that if $\phi_1 = 0$ and $\phi_2 = \pi/2$, then the solution reduces to the usual coefficients $p_1 = \cos\theta$ and $p_2 = \sin\theta$ for the construction of the directional derivative from $\mathrm{d}/\mathrm{d}x$ and $\mathrm{d}/\mathrm{d}y$. The additive steered approximation can now be defined, using the new filter $G_1(\mathbf{x}, \theta, \sigma)$, as

$$\mathbf{F}^\star \approx \mathbf{1}\mathbf{g}_\theta^\top + \mathbf{B}_\Delta \mathbf{C}_{\phi\Delta} \mathbf{G}_{\phi\Delta} \quad \text{where} \quad \mathbf{g}_\theta^\top = p_1 \mathbf{g}_{\phi_1}^\top + p_2 \mathbf{g}_{\phi_2}^\top. \quad (23)$$

This system can be solved in the same way as (19). Note that the higher-order filters in $\mathbf{G}_{\phi\Delta}$ will be implicitly steered, as described in section 2.5.2.

Two issues are addressed in this evaluation, as follows.
Approximation : The accuracy of the least-squaresalgorithms from sections 2.4 and 2.5 is established in section 3.1.
Characterization : The response of theunderlying model from section 1.1 to basic stimuli, as well as to natural images, is analyzed in sections 3.2and 3.3 respectively. Note that the issues of approximation and characterization are addressed separately,in order to avoid mixing different sources of error. Hence section 3.1 will evaluate the approximate filters F , while sections 3.2 and 3.3 will analyze the ideal filters F (cid:63) . The accuracy of the filter approximations will be evaluated in this section, and it will be shown that theleast-squares methods are superior to the original Maclaurin expansion. The evaluation is based on theroot mean-square (RMS) error between the target and synthetic filters.The accuracy of a given filter-synthesis method is determined by two variables; the range of offsets,and the number of available derivatives (size of the basis). Better approximations can, in general, beobtained by reducing the range of offsets and/or increasing the size of the basis. The range ρ = 1 σ isthe smallest that results in a unimodal impulse response, as will be shown in section 3.2.1. It is thereforeimportant to analyze the corresponding approximation. In addition, the larger range ρ = 1 . σ will beanalyzed. This leaves the size of the basis (for which there is no prior preference) to be varied in eachcase.The method of evaluation is illustrated in figure 3. It can be seen that the furthest-offset filters beginto depart from the target shape. The RMS difference between the ideal and approximate filters, for eachtest, was measured over 51 offsets t i in the range ± ρ . Each filter was sampled at 101 points x j in therange ± σ , which contains the significantly nonzero part of all filters (see fig. 3).The RMS error, for basis sizes K = 3 , . . . , is shown in fig. 4. The meaning of the RMS error,in terms of filter distortion, can be gauged with reference to fig. 3. For example, it can be seen that theMaclaurin model with K = 8 and ρ = 1 . 
is poor, and this corresponds to a point around the middle of the RMS axis in fig. 4.

In the case of the Maclaurin approximation (top row) it can be seen that the error increases rapidly and monotonically with respect to the offset. The pattern is more complicated for the least-squares approximations, because the error has been minimized over an interval ±ρ, which effectively truncates the basis functions in x. Nonetheless, the lines corresponding to the different basis sizes remain nested; they cannot cross, because increasing the size of the basis cannot make the approximation worse. It is, however, possible for the lines to meet. In particular, the unconstrained lines meet in pairs at t = 0. This is because the target function at zero offset is anti-symmetric. It follows that the incorporation of a symmetric basis function G_k cannot improve an existing approximation of order k − 1. In the case of the additive approximation, all lines meet at t = 0, where the error is zero by construction.

A number of basic signals will be introduced below, and the ideal responses will be derived; further examples are given in Hansard and Horaud (2010). The responses are ‘ideal’ in the sense that the error of the least-squares approximations (sec. 2.4–2.5.3) will be ignored. This is primarily in order to obtain useful results, but there are two further justifications. Firstly, it has been demonstrated in the preceding section that the approximations are good, over an appropriate range ρ. Secondly, the approximation error can be made arbitrarily low, by using a large enough basis for the given range.

Recall that the offset filters F(t, x) are copies of the Gaussian derivative G(x, σ). It follows from (9) that the response function is covariant to the shift t, in the sense that

R(t, u) = R(0, u − t).   (24)
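The least-squares synthesis and its RMS evaluation can be sketched in Python as follows. This is a simplified illustration, not the paper's implementation: the Hermite-based derivative formula, the basis {G_1, . . . , G_K}, the sampling grid, and the fit to a single offset t (rather than to a whole range of offsets) are all assumptions made for brevity.

```python
import numpy as np
from numpy.polynomial import hermite as H

def gauss_deriv(x, k, sigma=1.0):
    """k-th derivative of exp(-x^2/(2 sigma^2)), using physicists' Hermite
    polynomials: d^k/dx^k e^{-u^2} = (-1)^k H_k(u) e^{-u^2}, u = x/(sigma*sqrt(2))."""
    u = x / (sigma * np.sqrt(2.0))
    c = np.zeros(k + 1)
    c[k] = 1.0
    return (-1.0) ** k * (sigma * np.sqrt(2.0)) ** (-k) * H.hermval(u, c) * np.exp(-u ** 2)

def synth_rms(t, K, sigma=1.0):
    """RMS error of the least-squares synthesis of the offset first-derivative
    filter G_1(x - t), from the centred basis {G_1, ..., G_K}."""
    x = np.linspace(-4.0 * sigma, 4.0 * sigma, 101)
    basis = np.stack([gauss_deriv(x, k, sigma) for k in range(1, K + 1)], axis=1)
    target = gauss_deriv(x - t, 1, sigma)
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return float(np.sqrt(np.mean((basis @ coef - target) ** 2)))
```

At zero offset the target lies in the basis, so the error vanishes; at a fixed offset the error cannot increase as the basis grows, which is the nesting property noted above.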
Figure 3: Filter deformation.
Top left:
The eighth-order Maclaurin synthesis (11) of the offset filters, with σ = 1, over a range ρ = ±1.5σ of offsets. Large errors are visible in the most extreme filters. Top right:
The approximation is much worse if the order of the basis is reduced by one.
Bottom left, right:
The least-squares approximation is much better, even if the additivity constraint is enforced.

It therefore suffices to obtain the linear response for the case t = 0, as the other responses are simply translations of this function. The linear response in this case is R(0, u) = G(x, σ) ⋆ S(x − u), by analogy with (7). Note that the normalized filter has been used, as defined in (6). It follows that R(0, u) can be obtained by blurring the signal with the filter G(x, σ), and then differentiating the result. The complex response C(u) is given by the max operation (10). Evidently C(u) is the upper envelope of the family |R(t, u)|, but it is possible to be more precise than this. In particular, the shift-insensitivity of the model can be quantified by determining the intervals of u over which C(u) is constant, as described below.

The response |R(0, u)| to a basic signal S(x − u) can be either symmetric or antisymmetric, and either periodic or aperiodic. However, a common property of the responses considered here is that the local maxima are all of equal height. Let |R(0, u⋆)| = R⋆ be a local maximum, and suppose that u is within range of this maximum, meaning that |u − u⋆| ≤ ρ. It follows that C(u) = R⋆, because the maximum in (10) will be found at t = u − u⋆, and |R(u − u⋆, u)| = |R(0, u⋆)| = R⋆ by (24). An intuitive summary of this is that each local maximum |R(0, u⋆)| generates a plateau C(u⋆ ± ρ) = R⋆ in the complex response. In order to make use of this interpretation, the function V(u) will be defined as the signed distance u − u⋆ to the nearest local maximum of |R(0, u)|. It follows that

C(u) = R⋆ if |V(u)| ≤ ρ;  max_{|t| ≤ ρ} |R(t, u)| otherwise.   (25)

This explicitly identifies the intervals, |V(u)| ≤ ρ, over which C(u) is constant.
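The plateau construction can be checked numerically. The sketch below uses an unnormalized first-derivative filter and computes C(u) directly as a sliding maximum, via the shift-covariance (24); the grid and the filter normalization are illustrative assumptions, not the paper's.

```python
import numpy as np

def g1(x, sigma=1.0):
    # first derivative of a Gaussian (overall normalization omitted)
    return -x / sigma ** 2 * np.exp(-x ** 2 / (2.0 * sigma ** 2))

def complex_response(signal, x, sigma=1.0, rho=1.5):
    """C(u) = max over |t| <= rho of |R(t, u)|.  By the covariance (24),
    R(t, u) = R(0, u - t), so C is a sliding maximum of |R(0, .)|."""
    dx = x[1] - x[0]
    r0 = np.convolve(signal, g1(x, sigma), mode='same') * dx
    a = np.abs(r0)
    k = int(round(rho / dx))
    return np.array([a[max(0, i - k):i + k + 1].max() for i in range(a.size)])
```

For a unit step at the origin, |R(0, u)| is a Gaussian bump, and the resulting C(u) is flat over an interval ±ρ around the edge, as predicted by the plateau argument.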
Note that if |V(u)| > ρ then the original definition (10) is used. The functions R(0, u) and V(u), as well as the constant R⋆, will now be derived for each of the basic signals. It should be emphasized that V(u) and R⋆ are only used to characterize the response; they are not part of the computational model.

The first test signal to be considered is the unit impulse, which can be used to characterize the initial linear stage of the model. The impulse is defined as

S_σ(x) = δ(x),   (26)
Figure 4: Error vs. offset.
Top row:
The Maclaurin error rises quickly as the target filter is offset from the centre of the basis. Each line represents a different basis size, K = 3, . . . , 8, as indicated. The left and right plots show ranges ρ = 1σ and ρ = 1.5σ, respectively. Middle row:
The unconstrained least-squares approximation is much better, especially for high-order bases.
Bottom row:
The additive approximation is also good, and ensures that the error is zero when there is no offset (as in the Maclaurin case).

where δ(x) is the Dirac distribution. It follows that the linear response G ⋆ S_σ is just the original normalized derivative filter,

R_σ(0, u) = G(u, σ).   (27)

The maxima of the linear response can be found by differentiating R_σ(0, u), and setting the result to zero. The derivative contains a factor σ² − u², and so the zeros are at ±σ. The peak is at −σ, and it follows that the maximum response is

R⋆_σ = G(−σ, σ).   (28)

Both extrema become peaks in |R_σ(0, u)|, and the extent of the response plateau is determined by the minimum distance from these. The distance function for the impulse response can now be defined as

V_σ(u) = u − sgn⁺(u) σ,   (29)

where sgn⁺ is the sign-function with the convention sgn⁺(0) = 1. If u = 0 then |V_σ(u)| = σ, and it follows from (25) that C(u) ≠ R⋆_σ if σ > ρ. It has already been established that |R_σ(0, u)| has maxima of R⋆_σ at ±σ, which implies that the response C(u) will be bimodal unless

σ ≤ ρ,   (30)

as illustrated in figure 5. This condition is strictly imposed, as it would be undesirable to have a bimodal response to a unimodal signal. In general, ρ should be made as large as possible for a given σ, in order to achieve as much shift-invariance as possible. Recall, for example, that the least-squares approximations in section 3.1 were demonstrated for ρ = 1.5σ.

Figure 5: Impulse response.
Left column: the top plot shows the unit impulse, S_σ(x), as in (26). The middle plot shows the response C(u). The bottom plot shows the distance function |V_σ(u)|, as in (29), along with the value of the maximum offset ρ = 1σ. The response is constant, C(u) = R⋆_σ, when |V_σ(u)| ≤ ρ. The critical case, σ = 1, ρ = 1, is plotted. Right column: as before, except that ρ < σ. The response becomes bimodal, which shows the importance of the condition σ ≤ ρ.

The second test signal to be considered is the unit step function. This is arguably the most important example, because it is the basic model for a luminance edge. Indeed, the current model is optimized for the detection of step-like edges, owing to the use of the first derivative as the offset filter (Canny, 1986). The step can be defined from the standard sign-function, as follows:

S_α(x) = (α/2)(1 + sgn(x)).   (31)

The step function is related to the integral Φ(u, σ) of the normalized Gaussian function G(x, σ) in the following way:

Φ(u, σ) = 1/(σ√(2π)) ∫_{−∞}^{u} G(x, σ) dx   (32)
        = 1/(ασ√(2π)) ∫_{−∞}^{∞} G(x, σ) S_α(u − x) dx.   (33)

The third integral is the convolution of G with S_α, and hence Φ(u, σ) is proportional to the smoothed step-edge. The linear response is given by the derivative,

R_α(0, u) = ασ√(π/2) dΦ(u, σ)/du   (34)
          = (α/2) G(u, σ).   (35)

This shows that the basic response is simply an un-normalized Gaussian, located at the step-discontinuity. The maximum response and the signed-distance function are evidently

R⋆_α = α/2  and  V(u) = u.   (36)

Let (u, R(u)) be the Cartesian coordinates of the response curve. The final response can be constructed from the Gaussian (35) by inserting the plateau (±ρ, α/2) in place of the maximum point (0, α/2). This is illustrated in figure 6.

Figure 6: Step response.
Left column:
The top plot shows the step-edge S_α(x), as defined in (31). The middle plot shows the response C(u). The bottom plot shows the distance function V_α(u), as in (36). Note that the response is unconditionally unimodal in this case. Right column:
As before, except with Gaussian noise (SD 0.075) added independently at each point. The response C(u) is not significantly affected.

The third class of signals to be considered are the sines and cosines. These are of central importance, owing to their role in the Fourier synthesis of more complicated signals. Furthermore, these functions are used to construct the 2-D grating patterns that are commonly used to characterize complex cells. It will be convenient to base the analysis on the cosine function

S_ξ(x) = cos(2πξx),   (37)

where ξ is the frequency. The Fourier transforms g(x) ↦ F_x[g](η) of the filter G(x, σ) and signal S_ξ(x) are

F_x[G](η) = σ√(2π) G(η, 1/(2πσ))   (38)
F_x[S_ξ](η) = (1/2)(δ(η − ξ) + δ(η + ξ)),   (39)

respectively, where η is the frequency variable. The convolution G ⋆ S_ξ can be obtained from the inverse Fourier transform of the product F_x[G] F_x[S_ξ]. The resulting cosine is attenuated by a scale-factor F_x[G](ξ), because F_x[S_ξ](η) is zero unless |η| = ξ. Differentiating cos(2πξx) with respect to x gives −sin(2πξx), along with a second scale-factor of 2πξ. The amplitude of the linear response is given by the product of the two scale-factors 2πξ and F_x[G](ξ), which can be expressed as

R⋆_ξ = √2 π^(3/2) σξ G(ξ, 1/(2πσ)).
It can be seen that the amplitude depends on the scale σ of the filter, as well as on the frequency ξ of the signal. The complete linear response is given by

R_ξ(0, u) = −R⋆_ξ sin(2πξu).

Note that a phase-shift u₀ can be introduced, if required, by substituting u − u₀ for u. The rectified sine |R_ξ(0, u)| is another periodic function, of twice the frequency. The peaks of this function are separated by a distance 1/(2ξ), and so

V_ξ(u) = (u mod 1/(2ξ)) − 1/(4ξ)   (40)

is a suitable distance function for the cosine signal (37). The case of sine signals is analogous, with sin replaced by cos in the linear response (3.2.3), and u replaced by u − 1/(4ξ) in the distance function (40).

Figure 7: Cosine response.
Left column: The top plot shows the cosine signal S_ξ(x), as in (37), of frequency ξ = 1/(6σ). The middle plot shows the response C(u), which is constant. The bottom plot shows the distance function |V_ξ(u)|, as in (40). Note that the critical case is plotted, in which the peaks of |V_ξ(u)| touch the line ρ = 1.5σ. The response is also constant for any higher frequency. Right column:
As before, but for a lower frequency. The distance function now crosses the line ρ = 1.5σ, and corresponding ‘notches’ appear in C(u).

It is important to see that the system response is entirely constant for frequencies that are not too low. Specifically, the extreme values of (40), with respect to u, are ±1/(4ξ), from which it follows that the response is identically R⋆_ξ if ξ ≥ 1/(4ρ). The corresponding constraint on the wavelength 1/ξ is

1/ξ ≤ 4ρ.   (41)

In order to interpret this result, recall that ρ ≥ σ is required for a unimodal impulse response (30). Furthermore, in section 3.1, it was shown that ρ ≈ 1.5σ is achievable in practice. This means that a constant response can be expected for frequencies as low as ξ = 1/(6σ).

This section makes a basic evaluation of the differential model, using the objective function of ‘slow feature analysis’ (Berkes and Wiskott, 2005; Wiskott and Sejnowski, 2002). The procedure is as follows. Each greyscale image is decomposed into i = 1, . . . , 36 orientation channels θ_i at scale σ = 2 pixels. This corresponds to a set of simple-cell responses S(x, σ, θ_i), with an angular separation of 5°. The steerability of S is not used (i.e. a separate convolution is done for each θ_i), in order to minimize any angular bias in the image sampling. A set of straight tracks

x_ijk = p_j ± k∆ · (cos θ_i, sin θ_i)^⊤,  where j = 1, . . . , m and k = 0, . . . , n,   (42)

is sampled from each 2-D response. The m = 100 random points p_j are sampled from a uniform distribution over the image; the sign ± is also random. The resolution ∆ is set to one pixel, and the number of steps along each path is n = 99. This gives a total of 10,000 samples from each orientation channel. The responses at non-integral positions x_ijk are obtained by bilinear interpolation. The samples are non-negative by definition, and a global scale factor γ is used to normalize the overall ijk-mean of γS(x_ijk, σ, θ_i).
The mean simple-cell response is then computed in each orientation channel,

E_S(i) = (1/mn) Σ_{j=1}^{m} Σ_{k=0}^{n} γS(x_ijk, σ, θ_i),   (43)

where the scaling by γ ensures that Σ_i E_S(i) = 1. The mean quadratic variation along the paths is also computed, in each orientation channel:

Q_S(i) = (1/mn) Σ_{j=1}^{m} Σ_{k=1}^{n} |γS(x_ijk, σ, θ_i) − γS(x_ij[k−1], σ, θ_i)|².   (44)

The coordinates x_ijk and x_ij[k−1] represent adjacent points (separated by ∆) on the j-th path in the i-th channel. In summary, E_S(i) measures the average response for orientation θ_i, and Q_S(i) measures the average spatial variation of this response in direction θ_i. Slow feature analysis finds filters that minimize the quadratic variation. The orientation tuning and slowness measurements are plotted in figures 8 and 9 as a function of θ_i, by attaching vertical bars ±√(Q_S(i)) to each point E_S(i). Each test is then repeated, using the complex response C(x, σ, θ) in place of the simple response S(x, σ, θ), giving measurements E_C(i) and Q_C(i).

Three test-images with a dominant global orientation are used. Firstly, a vertical cosine grating, I_cos(x, y) = cos(2πξx). The range is set to ρ = 1.5σ, as usual, and the wavelength is set to 1/ξ = 8σ. These values do not satisfy the limit (41), which ensures that the complex response will not be trivial. The simple-cell response is shown in figure 8 (top left), and two effects should be noted. Firstly, the response is tuned to the dominant orientation θ = 0, as can be seen from the unimodal shape of the curve. Secondly, there is a large variation in the response when the tracks are orthogonal to the grating, as shown by the large bars around θ = 0. This is because the filter falls in and out of phase with the image as it moves horizontally.
The corresponding complex response is shown in figure 8 (top right). It can be seen that the orientation tuning is preserved, while the response variation is greatly reduced. Figure 8 (bottom) repeats the test, but with noise added to the cosine grating, by mixing I_cos with a noise image I_uni, in which each pixel is independently sampled from the uniform distribution on [0, 1]. This means that a variable simple response is obtained as the filter moves in any direction, because the image is now truly 2-D. The complex response reduces the variation, as shown in figure 8 (bottom).

Figure 9 (top) shows results for a real image which has an orientation-structure similar to that of the grating. The results are analogous. Finally, the same test is performed on a natural image, which contains a mixture of foliage and rocks. Figure 9 (bottom) shows that, although there is no dominant orientation in the stimulus, the complex response remains much less variable than the simple response.

It has been shown that a differential model of the complex cell can be constructed from the local jet representation. The differential model, which works naturally in a basis of steerable filters, can be viewed as a constrained version of the Hubel and Wiesel (1962) subunit model.
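The track-based slowness measures (43)-(44) can be sketched as follows. For brevity, this version uses nearest-neighbour sampling instead of bilinear interpolation, omits the γ normalization, and clamps tracks at the image border; these are all simplifying assumptions.

```python
import numpy as np

def slowness(resp, theta, m=100, n=99, delta=1.0, seed=0):
    """Mean response E, cf. (43), and mean quadratic variation Q, cf. (44),
    along m random straight tracks of n steps, in direction theta, over a
    2-D response map `resp`."""
    rng = np.random.default_rng(seed)
    h, w = resp.shape
    d = delta * np.array([np.cos(theta), np.sin(theta)])
    E = Q = 0.0
    for _ in range(m):
        p = rng.uniform([0.0, 0.0], [w - 1.0, h - 1.0])  # random start point p_j
        sign = rng.choice([-1.0, 1.0])                   # random track direction
        pts = p + sign * np.arange(n + 1)[:, None] * d
        xi = np.clip(np.rint(pts[:, 0]).astype(int), 0, w - 1)
        yi = np.clip(np.rint(pts[:, 1]).astype(int), 0, h - 1)
        s = resp[yi, xi]
        E += s.mean()                  # average response along the track
        Q += np.mean(np.diff(s) ** 2)  # average squared step-to-step change
    return E / m, Q / m
```

A perfectly shift-insensitive response gives Q = 0, which is the property that distinguishes the complex output from the simple one in figures 8 and 9.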
The qualitative components in the present approach are similar to those of the Gabor energy model. Both models are based on oriented linear filters, which are centred at the same position. Likewise, both models

Figure 8: Cosine response.
Figure 8: Cosine response.
Top left:
Average simple cell response E_S(i) to a vertical cosine grating of wavelength 8σ. The curve indicates the mean response in each of 36 orientation channels θ_i, and has a clear peak at zero. The vertical bars ±√(Q_S(i)) indicate the RMS spatial variation of the response in the preferred direction of each orientation channel.
Top right:
Complex cell response E_C(i) to the same image. The orientation tuning is preserved, but the variability ±√(Q_C(i)) of the response is greatly reduced. Bottom left, right:
As before, but with noise added to the image.

output a combination of the nonlinearly transformed filter responses. The derivative operators G_k(x, σ), in the differential model, are interpreted as simple cells, at a common location x (Hawken and Parker, 1987; Young et al., 2001). These are constructed from LGN inputs, according to the classical model (Hubel and Wiesel, 1962). The offset filters F(t_i, x) have two possible interpretations, shown in fig. 10, as follows.

Firstly, the offset filters F(t_i, x) can be interpreted as an intermediate layer of simple cells, each of which has an RF that is a linear combination of other simple cell RFs. Let the row-vector f_i^⊤ represent F(t_i, x), and let s be the signal vector. It follows that the linear response is

r_i = f_i^⊤ s,   (45)

where f_i^⊤ = p_i^⊤ G, according to the filter-design equation F = PG in (15). The task of the complex cell C is to compute the (absolute) maximum of the linear responses, r_i. This interpretation corresponds to the HMAX model (Lampl et al., 2004; Riesenhuber and Poggio, 1999), but with additional relationships imposed on the underlying simple cells. An advantage of this interpretation is that it predicts a majority of simple cells with few oscillations, as is observed (Ringach, 2002). This follows from the fact that the offset filters are of lower order than their derivatives, together with the fact that an unlimited number of offsets can be obtained from a given derivative basis. Furthermore, suppose that a number of complex cells (e.g. of different orientations) C_n are constructed from the same basis G. A new layer of low-order offset filters F_n is required for each complex cell, and so the high-order filters in G are soon outnumbered in the ensemble {G, F_1, F_2, . . .}.

An alternative physiological implementation is as follows. Suppose that the complex cell has i = 1, . . . , M basal dendrites, each of which branches out to the K simple cells G_k(x, σ). The linear response

Figure 9: Image response.
Figure 9: Image response.
Top:
As in figure 8, but using a real image that contains a dominant vertical orientation. Left: The simple response shows variation Q_S(i) across all orientation channels. Right: The variation Q_C(i) of the complex response is much lower. Bottom:
As before, but using a natural image,with no dominant orientation.can then be expressed as r i = p (cid:62) i ( Gs ) (46)where Gs , the Gaussian jet response, is computed first. This interpretation requires no intermediatesimple cells. Instead, it places a fixed weight P ik on each dendritic branch, and requires a summation tobe performed within each dendrite. This seems to be quite compatible with the observed cell morphology;a typical complex cell has a small number of basal dendrites which, unlike those of simple cells, areextensively branched (Gilbert and Wiesel, 1979; Kelly and van Essen, 1974).The dendritic interpretation is more economical in the number of simple cells required, but lesscompatible with the observed simple-cell statistics (Ringach, 2002). It should be noted that a mixture ofthe two interpretations in fig. 10 is quite possible. For example, there could be a few intermediate simplecells, with the remaining filters implemented by dendritic summation. In all cases, each complex cell isassociated with odd and even simple cells g k , as is observed (Pollen and Ronner, 1981).The maximum (10) over the | r i | can be approximated by a barycentric combination, (cid:80) i w i | r i | . Anappropriate vector of weights can be computed as w i = | r i | β (cid:14) ( α + (cid:80) i | r i | β ) . This is a form of ‘softmax’(Bridle, 1989), with parameters α and β , as described in (Hansard and Horaud, 2010). Several neuralmodels of this operation have been proposed (Riesenhuber and Poggio, 1999; Yu et al., 2002). Forexample, the w i can be interpreted as the outputs of a network of mutually inhibitory neurons, whichreceive copies of the subunit responses r i . There is experimental evidence for similar arrangements, withrespect to both simple and complex cells (Carandini and Heeger, 1994; Heeger, 1992a; Lampl et al.,2004). 18igure 10: Two neural implementations of the complex cell C . 
These are schematic representations, with reduced numbers of derivative filters g_k and offset filters f_i. Left: The offset filters are identified with an intermediate layer of simple cells. Right: The offset filters are implicit in the weighted sums p_1^⊤ G and p_2^⊤ G performed by the dendritic tree of C.

This section will demonstrate that the response of the new model, to broad-band stimuli, can easily be distinguished from that of the standard energy model. Some qualitative predictions will also be discussed. The response of the new model to sinusoidal stimuli is similar to that of the energy model. Both responses are phase-invariant, provided that the stimulus frequency is not too low (see fig. 7). Consider, however, a luminance step-edge that is flashed (or shifted) across the RF. The Gabor energy is approximately Gaussian, as a function of the edge-position, with a peak in the centre of the RF. The differential response is much flatter, as shown in fig. 6. This suggests that an empirical measure of kurtosis could be used to distinguish between the two responses, as will be shown below.

Let the odd-symmetric Gabor filter be defined as F_∥(x, ξ, τ) = −G(x, τ) sin(2πξx), so that it matches the polarity of G. The even filter, following (Lehky et al., 2005), is defined by the numerical Hilbert transform F_⊥ = H(F_∥), in order to avoid the nonzero DC component that arises in the cosine-based definition. The Gabor energy of a signal S is then R_F = (F_∥ ⋆ S)² + (F_⊥ ⋆ S)². The envelope width τ of each Gabor filter is determined from the constraint that the bandwidth be equal to 1.5, which is realistic for complex cells (Daugman, 1985). A self-similar family of Gabor pairs, parameterized by frequency ξ, can now be defined. Quasi-Newton optimization is used to determine a frequency ξ_0, for which F_∥(x, ξ_0, τ_0) ≈ G(x, 1) in the L² sense.
Ten Gabor pairs with frequencies ξ_k = ξ_0 − k∆ξ are constructed, where k = 0, . . . , 9. The corresponding differential model, with target filter G(x, ξ_0/ξ_k), is also constructed for each pair. Four possible ranges are considered for the differential models, by setting ρ/σ = 1.0, 1.25, 1.5, 1.75.

Let p_ξ(u) be the response distribution, which gives the firing-rate for an image-edge, as a function of its offset u from the centre u = 0 of the complex RF. If P_ξ(u) is the cumulative distribution ∫ p_ξ(u) du, then the (uncentred) kurtosis κ of any p_ξ(u) can be estimated (Crow and Siddiqui, 1967) by

K(p_ξ) = [P_ξ^{−1}(1 − a) − P_ξ^{−1}(a)] / [P_ξ^{−1}(1 − b) − P_ξ^{−1}(b)].   (47)

The values a = 0.025 and b = 0.25 are used here (making K the ratio of the 95% and 50% confidence intervals). This estimate, which can be computed by linear interpolation between the samples of P_ξ(u), has the following interpretation. Suppose that the offsets u_a and u_b are associated with firing-rates a and b respectively. The response distributions are symmetric, and so the statistic (47) is simply the distance ratio K(p_ξ) = u_a/u_b, as illustrated in fig. 11. The kurtosis could also be estimated from the empirical moments of p_ξ(u), but (47) is much less sensitive to noise in the tails of the distribution.

The distributions considered here lie in and around the range from the uniform distribution (κ = 1.8, K = 1.9) to the normal distribution (κ = 3, K ≈ 2.9). It can be seen from fig. 11 that the Gabor response is approximately Gaussian, whereas the differential responses are much flatter. Furthermore, note that the line ρ = σ in fig. 11 shows the maximum kurtosis of the differential model, determined by the bound (30); this is still much lower than that of the Gabor energy.
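The quantile-ratio statistic (47) can be sketched as follows; this version estimates the quantiles from samples, rather than by interpolating P_ξ(u), which is a simplifying assumption.

```python
import numpy as np

def quantile_ratio_kurtosis(samples, a=0.025, b=0.25):
    """Robust kurtosis statistic, cf. (47): the ratio of the 95%
    inter-quantile range to the 50% inter-quantile range."""
    qa_lo, qa_hi = np.quantile(samples, [a, 1.0 - a])
    qb_lo, qb_hi = np.quantile(samples, [b, 1.0 - b])
    return (qa_hi - qa_lo) / (qb_hi - qb_lo)
```

For the reference distributions quoted above, the statistic gives K = 1.9 for the uniform distribution and K ≈ 2.9 for the normal distribution, since the corresponding quantile ratios are 0.95/0.5 and 1.96/0.674.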
Figure 11: Top Left: Gaussian derivative (black) and Gabor (grey) filters, matched subject to the bandwidth condition. Bottom Left: The kurtosis statistic (47) is the ratio of the two horizontal lines, shown here on a Gaussian distribution. Right: Kurtosis of the edge-response. Each line represents a complex cell model, parameterized by preferred frequency ξ. The top pair is obtained from the Gabor energy and its square-root. The bottom pair delimits the range of possible differential models. The Gabor and differential responses are easily distinguished. The dashed lines are estimates for reference distributions.

Furthermore, it can be seen that the kurtosis is approximately independent of frequency, which simplifies the comparison. It should be noted that a much better fit between G and F_∥ can be obtained if the bandwidth constraint is relaxed. This, however, makes the energy responses even more kurtotic.

The differential model makes several predictions about the configuration of simple and complex cells. Firstly, like the energy model, it predicts that both odd and even filters are required by the complex cell. Unlike the energy model, it does not require an exact quadrature relationship (indeed, the G_k basis is not orthogonal). More generally, an important property of the differential model is its robustness to deviations from the ideal simple-cell RF profiles. The derivative-of-Gaussian basis was used, in the present derivation, for mathematical clarity. However, all that is required is a basis that spans the space of desired subunit filters F(t_i, x).

The differential model also predicts a relationship between the scale σ of the subunits and the radius ρ of the resulting complex receptive field. This prediction, as in the case of the energy model, is probably too strict (i.e.
larger complex receptive fields should be possible). However, as discussed in the following section, the complex receptive fields can be extended by allowing multiple scales σ_j in the basis set of the differential model.

A qualitative prediction of the present model is that high-order derivative filters are required, in order to approximate the target filter over a sufficient range ρ. In particular, it was shown in section 3.2.1 that, for a unimodal impulse response, ρ ≥ σ is required. This means, in practice, that derivative filters of order five and beyond must be used in the approximation, as can be seen from figure 4. This is interesting, because very oscillatory filters have been observed in V1 (Young and Lesperance, 2001). These have a natural role as high-frequency processors in the Gabor model. Their role is less clear in the geometric approach, because estimates of the high-order image derivatives are of limited use. The present work suggests that these filters could have a different role, in providing a basis for spatially offset filters of low order.

There are several directions in which this model could be developed. One straightforward extension is to allow filters of different scales (as well as different orders) in the basis set. Preliminary experiments confirm that this extends the range ρ of translation invariance, as would be expected. This means that the complex cell receptive field could be made larger, relative to those of the underlying simple cells. Another extension would be to allow a variety of offset-filter shapes (with odd, even and mixed symmetry), rather than just the first derivative used here. This would lead to better agreement with the physiological data, which indicates a variety of receptive field shapes among the complex subunits (Gaska et al., 1987; Sasaki and Ohzawa, 2007; Touryan et al., 2005).
It would also be interesting to explore the relationship of the present work to the normalization model (Heeger, 1992b; Rust et al., 2005), and to other models of motion and spatial processing (Georgeson et al., 2007; Johnston et al., 1992).

The present work has concentrated on local shift-invariance, because this is a defining characteristic of complex cells. However, mechanisms that have other geometric invariances can be constructed in the same scale space framework. For example, consider the effect of a geometric scaling (x, y) → (αx, αy) on the operator G_k(x, y, σ, θ), which represents the k-th derivative of the normalized 2-D Gaussian, in direction θ. The scaling α has no effect on the shape of the RF, as can be seen from the equation G_k(αx, αy, ασ, θ) = G_k(x, y, σ, θ)/α^k. This leads to simple relationships between the responses of the RF family G_k(x, y, σ_ℓ, θ), where ℓ = 1, . . . , L defines a range of scales (Koenderink and van Doorn, 1992; Lindeberg, 1998). Future work will consider more complicated geometric invariances (e.g. 2-D affine), in connection with the larger RFs that are found in extrastriate areas.

Another direction would be to consider how the differential model could be learned from natural image data, by analogy with (Berkes and Wiskott, 2005; Karklin and Lewicki, 2009; Wiskott and Sejnowski, 2002). This could be done by fixing the local jet filters (i.e. simple cells), and then optimizing the linear transformation P. The transformation could be parameterized by coefficients C, given a basis B of smooth functions (e.g. the polynomials that were used here). Alternatively, P could be optimized directly, subject to smoothness constraints on the columns P_k(t). The variability of the response C(u) would be a suitable objective function for the learning process, by analogy with slow-feature analysis models (Berkes and Wiskott, 2005; Wiskott and Sejnowski, 2002).
The combination of the geometricand statistical approaches to image analysis is, more generally, a very promising aim. Acknowledgements
The authors would like to thank the reviewers for their help with the manuscript.
References
Adelson, E. H. and Bergen, J. R. (1985). Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A, 2(2):284–299.
Alonso, J.-M. and Martinez, L. M. (1998). Functional connectivity between simple cells and complex cells in cat striate cortex. Nature Neuroscience, 1(5):395–403.
Atherton, T. J. (2002). Energy and phase orientation mechanisms: A computational model. Spatial Vision, 15(4):415–441.
Ben-Shahar, O. and Zucker, S. W. (2004). Geometrical Computations Explain Projection Patterns of Long Range Horizontal Connections in Visual Cortex. Neural Computation, 16(3):445–476.
Berkes, P. and Wiskott, L. (2005). Slow feature analysis yields a rich repertoire of complex cell properties. Journal of Vision, 5(6):579–602.
Bridle, J. S. (1989). Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Fougelman-Soulie, F. and Hérault, J., editors, Neurocomputing: Algorithms, Architectures and Applications. Springer Verlag.
Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698.
Carandini, M. (2006). What simple and complex cells compute. J. Physiology, 577(2):463–466.
Carandini, M., Demb, J. B., Mante, V., Tolhurst, D. J., Dan, Y., Olshausen, B. A., Gallant, J. L., and Rust, N. (2005). Do we know what the early visual system does? J. Neuroscience, 25:10577–10597.
Carandini, M. and Heeger, D. (1994). Summation and Division by Neurons in Primate Visual Cortex. Science, 264:1333–1336.
Crow, E. and Siddiqui, M. (1967). Robust Estimation of Location. Journal of the American Statistical Association, 62(318):353–389.
Daugman, J. G. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Optical Soc. America, 2(7):1160–1169.
Dayan, P. and Abbott, L. F. (2001). Theoretical Neuroscience. MIT Press.
De Valois, R. L., Albrecht, D. G., and Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 21:545–559.
Dobbins, A., Zucker, S. W., and Cynader, M. S. (1987). Endstopped neurons in the visual cortex as a substrate for calculating curvature. Nature, 329:438–441.
Emerson, R. C., Bergen, J. R., and Adelson, E. H. (1992). Directionally Selective Complex Cells and the Computation of Motion Energy in Cat Visual Cortex. Vision Research, 32(2):203–218.
Felsberg, M. and Sommer, G. (2001). The Monogenic Signal. IEEE Transactions on Signal Processing, 49(12):3136–3144.
Fleet, D. J., Wagner, H., and Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12):1839–1857.
Freeman, W. T. and Adelson, E. H. (1991). The design and use of steerable filters. IEEE Trans. PAMI, 13(9):891–906.
Gaska, J. P., Pollen, D. A., and Cavanagh, P. (1987). Diversity of complex cell responses to even- and odd-symmetric luminance profiles in the visual cortex of the cat. Experimental Brain Research, 68:249–259.
Georgeson, M. A., May, K. A., Freeman, T. C. A., and Hesse, G. S. (2007). From filters to features: Scale-space analysis of edge and blur coding in human vision. Journal of Vision, 7(13):1–21.
Gilbert, C. and Wiesel, T. (1979). Morphology and Intracortical Projections of Functionally Characterised Neurones in the Cat Visual Cortex. Nature, 280(5718):120–125.
Hansard, M. and Horaud, R. (2010). Complex Cells and the Representation of Local Image-Structure. Technical Report 7485, INRIA.
Harris, C. and Stephens, M. (1988). A combined corner and edge detector. In Proc. 4th Alvey Vision Conference, pages 147–151.
Hawken, M. J. and Parker, A. J. (1987). Spatial properties of neurons in the monkey striate cortex. Proc. R. Soc. Lond. B, 231:251–288.
Heeger, D. J. (1992a). Half-Squaring in responses of cat striate cells. Visual Neuroscience, 9:427–443.
Heeger, D. J. (1992b). Normalization of Cell Responses in Cat Striate Cortex. Visual Neuroscience, 9:181–197.
Hubel, D. H. and Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiology, 160:106–154.
Johnston, A., McOwan, P. W., and Buxton, H. (1992). A computational model of the analysis of some first-order and second-order motion patterns by simple and complex cells. Proc. R. Soc. Lond. B, 250:297–306.
Jones, J. P. and Palmer, L. A. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiology, 58(6):1233–1258.
Karklin, Y. and Lewicki, M. S. (2009). Emergence of complex cell properties by learning to generalize in natural scenes. Nature, 457:83–86.
Kelly, J. and van Essen, D. (1974). Cell Structure and Function in the Visual Cortex of the Cat. J. Physiology, 238:515–547.
Kjaer, T. W., Gawne, T. J., Hertz, J. A., and Richmond, B. J. (1997). Insensitivity of V1 Complex Cell Responses to Small Shifts in the Retinal Image of Complex Patterns. J. Neurophysiology, 78:3187–3197.
Koenderink, J. and van Doorn, A. (1992). Generic Neighborhood Operators. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(6):597–605.
Koenderink, J. J. (1984). The structure of images. Biological Cybernetics, 50:363–370.
Koenderink, J. J. and van Doorn, A. J. (1987). Representation of local geometry in the visual system. Biological Cybernetics, 55:367–375.
Koenderink, J. J. and van Doorn, A. J. (1990). Receptive field families. Biological Cybernetics, 63:291–297.
Lampl, I., Ferster, D., Poggio, T., and Riesenhuber, M. (2004). Intracellular Measurements of Spatial Integration and the MAX Operation in Complex Cells of the Cat Primary Visual Cortex. J. Neurophysiology, 92:2704–2713.
Lehky, S. R., Sejnowski, T. J., and Desimone, R. (2005). Selectivity and sparseness in the responses of striate complex cells. Vision Research, 45:57–73.
Lindeberg, T. (1998). Edge Detection and Ridge Detection with Automatic Scale Selection. International Journal of Computer Vision, 30(2):117–154.
Martinez, L. M. and Alonso, J.-M. (2003). Complex Receptive Fields in Primary Visual Cortex. The Neuroscientist, 9(5):317–331.
Mechler, F., Reich, D. S., and Victor, J. D. (2002). Detection and Discrimination of Relative Spatial Phase by V1 Neurons. J. Neuroscience, 22(14):6129–6157.
Mechler, F. and Ringach, D. L. (2002). On the Classification of Simple and Complex Cells. Vision Research, 42:1017–1033.
Morrone, M. and Burr, D. (1988). Feature detection in human vision: A phase dependent energy model. Proc. R. Soc. Lond. B, 235:221–245.
Movshon, J. A., Thompson, I. D., and Tolhurst, D. J. (1978a). Receptive field organization of complex cells in the cat's striate cortex. J. Physiology, 283:79–99.
Movshon, J. A., Thompson, I. D., and Tolhurst, D. J. (1978b). Spatial summation in the receptive fields of simple cells in the cat's striate cortex. J. Physiology, 283:53–77.
Orban, G. A. (2008). Higher order visual processing in macaque extrastriate cortex. Physiological Reviews, 88:59–89.
Perona, P. (1995). Deformable kernels for early vision. IEEE Trans. PAMI, 17(5):488–499.
Petitot, J. (2003). The Neurogeometry of Pinwheels as a Sub-Riemannian Contact Structure. J. Physiol. Paris, 97:265–309.
Pollen, D. A. and Ronner, S. F. (1981). Phase relationships between adjacent simple cells in the visual cortex. Science, 212(4501):1409–1411.
Pollen, D. A. and Ronner, S. F. (1983). Visual cortical neurons as localized spatial frequency filters. IEEE Transactions on Systems, Man & Cybernetics, 13:907–916.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical Recipes in C. Cambridge University Press, 2nd edition.
Riesenhuber, M. and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019–1025.
Ringach, D. (2002). Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex. J. Neurophysiology, 88:455–463.
Rust, N., Schwartz, O., Movshon, J., and Simoncelli, E. (2005). Spatiotemporal Elements of Macaque V1 Receptive Fields. Neuron, 46:945–956.
Sasaki, K. S. and Ohzawa, I. (2007). Internal Spatial Organization of Receptive Fields of Complex Cells in the Early Visual Cortex. J. Neurophysiology, 98:1194–1212.
Simoncelli, E. P., Freeman, W. T., Adelson, E. H., and Heeger, D. J. (1992). Shiftable Multiscale Transforms. IEEE Transactions on Information Theory, 38(2):587–607.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., and Bonds, A. B. (1991). Classifying Simple and Complex Cells on the basis of Response Modulation. Vision Research, 31(7/8):1079–1086.
Spitzer, H. and Hochstein, S. (1988). Complex cell receptive field models. Progress in Neurobiology, 31:285–309.
Touryan, J., Felsen, G., and Dan, Y. (2005). Spatial Structure of Complex Cell Receptive Fields Measured with Natural Images. Neuron, 45:781–791.
Wiskott, L. and Sejnowski, T. (2002). Slow feature analysis: unsupervised learning of invariances. Neural Computation, 14(4):715–770.
Wundrich, I. J., von der Malsburg, C., and Würtz, R. P. (2004). Image Representation by Complex Cell Responses. Neural Computation, 16(4):2563–2575.
Young, R. A. and Lesperance, R. M. (2001). The Gaussian derivative model for spatial-temporal vision: II. Cortical data. Spatial Vision, 14(3):321–389.
Young, R. A., Lesperance, R. M., and Meyer, W. W. (2001). The Gaussian derivative model for spatial-temporal vision: I. Cortical model. Spatial Vision, 14(3):261–319.
Yu, A., Giese, M., and Poggio, T. (2002). Biophysiologically Plausible Implementations of the Maximum Operation.