A Differential Model of the Complex Cell
Miles Hansard and Radu Horaud
INRIA Rhône-Alpes, 655 Avenue de l'Europe, Montbonnot, France 38330.
Keywords:
Spatial vision, complex cells, image processing.
Abstract
The receptive fields of simple cells in the visual cortex can be understood as linear filters. These filters can be modelled by Gabor functions, or by Gaussian derivatives. Gabor functions can also be combined in an 'energy model' of the complex cell response. This paper proposes an alternative model of the complex cell, based on Gaussian derivatives. The most important feature of the new model is that it accounts for the insensitivity of the complex response to small shifts of the image. The new model uses a linear combination of the first few derivative filters, at a single position, to approximate the first derivative filter at a series of adjacent positions. The maximum response, over all positions, gives a signal that is insensitive to small shifts of the image. This model, unlike previous approaches, is based on the scale-space theory of visual processing. In particular, the complex cell is built from filters that respond to the 2-D differential structure of the image. The computational aspects of the new model are studied in one and two dimensions, using the steerability of the Gaussian derivatives. The response of the model to basic images, such as edges and gratings, is derived formally. The response to natural images is also evaluated, using statistical measures of shift insensitivity. The relevance of the new model to the cortical image representation is discussed.

It is useful to distinguish between simple and complex cells in the visual cortex, as proposed by Hubel and Wiesel (1962). The original classification is based on four characteristic properties of simple cells, as follows. Firstly, the receptive field has distinct excitatory and inhibitory subregions. Secondly, there is spatial summation within each subregion. Thirdly, there is antagonism between the excitatory and inhibitory subregions. Fourthly, the response to any stimulus can be predicted from the receptive field map.
Cells that do not show these characteristics can be classified, by convention, as complex. The distinction between simple and complex cells has endured, subject to certain qualifications (although an alternative view is described by Mechler and Ringach, 2002). In particular, it has been argued that the principal characteristic of complex cells is the phase invariance of the response (Carandini, 2006; Movshon et al., 1978a). This means that a complex cell, which is tuned to a particular orientation and spatial frequency, is not sensitive to the precise location of the stimulus within the receptive field (Kjaer et al., 1997; Mechler et al., 2002). If the stimulus consists of a moving (or flickering) grating, then phase invariance can be quantified by the relative modulation $a_1/a_0$, where $a_1$ is the amplitude associated with the fundamental frequency of the response, and $a_0$ is the mean response (De Valois et al., 1982; Movshon et al., 1978a; Skottun et al., 1991). If this ratio is less than one, then the cell can be classified as complex.

The standard model of the simple cell is based on a linear filter that is localized in position, spatial frequency and orientation. The output of this filter is subject to a nonlinearity, such as squaring, and may also be normalized by the responses of nearby simple cells. The theoretical framework of this model is well advanced (as reviewed by Carandini et al., 2005; Dayan and Abbott, 2001), owing to the assumed linearity of the underlying spatial filters. Although the physiology of the complex cell is increasingly well-understood (as reviewed by Alonso and Martinez, 1998; Martinez and Alonso, 2003; Spitzer and Hochstein, 1988), the appropriate theoretical framework is less clear. It is useful to adopt the distinction between 'position' and 'phase' models that was made by Fleet et al. (1996) in the analysis of binocular processing.
It is invariance to position or phase that is of interest in the present context.

A position-invariant model of the complex cell can be constructed from a set of simple cells, as described by Hubel and Wiesel (1962). Suppose, for example, that the simple cells are represented by linear filters of a common orientation, but different spatial positions. If the responses of these filters are summed, then the corresponding complex cell will be tuned to an oriented element that appears anywhere in the union of the simple cell receptive fields (Spitzer and Hochstein, 1988). This scheme is the basis of the subunit model, which was further developed and tested by Movshon et al. (1978a,b). The results of these experiments are consistent with the idea that the complex response is based on a group of spatially linear subunits. The most straightforward way to combine the individual responses would be by rectifying and summing them, but the maximum could also be taken (Riesenhuber and Poggio, 1999). The main disadvantage of the subunit model is that it is too general in its basic form. For example, it allows arbitrarily complicated receptive fields to be constructed, as there is no intrinsic constraint on the positions, orientations or spatial frequencies of the subunits.

Alternatively, a phase-invariant model can be constructed from a pair of filters of different shapes (Adelson and Bergen, 1985). If the filters have a quadrature relationship, then their responses to a sinusoidal stimulus will differ in phase by $\pi/2$. It follows that the sum of the squared outputs will be invariant to the phase of the input. The application of this energy model to spatial vision (Atherton, 2002; Emerson et al., 1992; Morrone and Burr, 1988; Wundrich et al., 2004) is motivated by the observed phase-differences in the receptive fields of adjacent simple cells (Pollen and Ronner, 1981).
Indeed, these receptive fields can be well-represented by odd and even Gabor functions (Daugman, 1985; Jones and Palmer, 1987; Pollen and Ronner, 1983). There are two problems with the energy model, in the present context. Firstly, there is no generally agreed way to combine energy mechanisms across different frequencies and orientations (one approach is described by Fleet et al., 1996). This is an obstacle to the construction of mechanisms that show more complicated invariances, such as those found in areas MT and MST (Orban, 2008). Secondly, the quadrature filters that are best-suited to the energy model are not convenient for the general description of 2-D image-structure. The concept of phase itself becomes somewhat complicated in two or more dimensions (Felsberg and Sommer, 2001), and the quadrature representation of more complex image-features, such as edge curvature, is unclear. It must be emphasized that 2-D images contain important structures (e.g. luminance saddle-points) that have no analogue in the 1-D signals to which the energy model is ideally suited. A realistic framework for spatial vision must be capable of representing the full variety of 2-D structures (Ben-Shahar and Zucker, 2004; Dobbins et al., 1987; Petitot, 2003).

The present work is motivated by the difficulty of extending the energy model to more complicated 2-D image-features and spatial transformations. These extensions require a representation of the local image geometry, rather than the local phase and frequency structure. The new approach, like the energy model, is based on a set of odd and even linear filters that are located at the same position. The outputs of these filters are nonlinearly combined, again as in the energy model. The combination, however, involves the implicit construction of spatially-offset subunits. The complete model, in this sense, can be seen as a re-formulation of the Hubel and Wiesel (1962) scheme.

A minimal overview of the new model will now be given.
Let $S(\mathbf{x})$ be the original scalar image, where $\mathbf{x} = (x, y)^\top$, and consider a spatial array of simple cells, parameterized by preferred frequency and orientation. These receptive fields will be modelled by $k$-th order directional derivatives of the Gaussian kernel, $G_k(\mathbf{x}, \sigma, \theta) = (\mathbf{v} \cdot \nabla)^k\, G(\mathbf{x}, \sigma)$, where $\sigma$ is the spatial scale, $\theta$ is the orientation, and $\mathbf{v} = (\cos\theta, \sin\theta)^\top$. The simple cell representation $S_k(\mathbf{x}, \sigma, \theta)$ is given by the convolution of these filters with the image:

$$S_k(\mathbf{x}, \sigma, \theta) = G_k(\mathbf{x}, \sigma, \theta) \star S(\mathbf{x}). \quad (1)$$

In particular, consider the magnitude of the first derivative signal, $|S_1(\mathbf{x}, \sigma, \theta)|$. This will be large if there is a step-like edge at $\mathbf{x}$, with the luminance boundary perpendicular to $\mathbf{v}$. Now suppose that the edge is shifted by some amount in direction $\mathbf{v}$. This means that the magnitude $|S_1(\mathbf{x}, \sigma, \theta)|$ will fall, but the nonlinear function

$$C(\mathbf{x}, \sigma, \theta) = \max_t \bigl|S_1(\mathbf{x} + t\mathbf{v}, \sigma, \theta)\bigr|, \quad \text{where } |t| \le \rho \quad (2)$$

will remain large, unless the shift exceeds the range $\rho$. Equations (1 & 2) will be the basic models of simple cells $S_k(\mathbf{x}, \sigma, \theta)$ and complex cells $C(\mathbf{x}, \sigma, \theta)$ in this paper (full derivations are given in sec. 2). The complex cell, which inherits the scale and orientation tuning $(\sigma, \theta)$, has a receptive field of radius $\rho$, centred on position $\mathbf{x}$. It can be seen that (2) is just a special case of the Hubel and Wiesel (1962) subunit model, with simple cells distributed along the spatial axis $\mathbf{v}$, and 'max' being the combination rule. It has already been argued, in section 1, that this model is too general. For example, there is no natural limit on the size $\rho$ of the complex receptive field in (2).

Suppose, however, that access to the first-order directional structure around position $\mathbf{x}$ is replaced by access to the higher-order directional structure at position $\mathbf{x}$.
Mathematically, this means that the function $S_1(\mathbf{x} + t\mathbf{v}, \sigma, \theta)$ of the scalar $t$ is replaced by the values $S_k(\mathbf{x}, \sigma, \theta)$, indexed by $k = 1, \dots, K$. This is interesting for three reasons. Firstly, the model becomes inherently local, because the filters $G_k$ that compute the values $S_k$ are now centred at the same point $\mathbf{x}$. Secondly, the filters $G_k$ are symmetric or antisymmetric about the point $\mathbf{x}$, and resemble the Gabor functions used in the energy model. Thirdly, the values $S_k$ can be obtained from a linear transformation of the $K$-th order local jet at $\mathbf{x}$, and so this scheme is compatible with the scale-space theory described above.

To be more specific, it will be shown that the first-order structure in the neighbourhood $\mathbf{x} + t\mathbf{v}$, as in (2), can be estimated from a linear combination of the directional derivatives $S_k$,

$$S_1(\mathbf{x} + t\mathbf{v}, \sigma, \theta) \approx \sum_{k=1}^{K} P_k(t)\, S_k(\mathbf{x}, \sigma, \theta) \quad (3)$$

where the functions $P_k(t)$ are fixed polynomials. This approximation will then be substituted into the right-hand side of (2). It will be shown in section 2.2 that the approximation (3) can be motivated by a Maclaurin expansion in powers of $t$. This can also be interpreted, as shown in figure 1, as the synthesis of spatially offset filters, using the Gaussian derivatives as a basis. A matrix formulation of this model will be given in section 2.3. An optimal (and image-independent) construction of the polynomials $P_k(t)$ will be given in section 2.4. The case in which the derivatives on the right-hand side of (3) are in another direction $\phi \ne \theta$ is treated in section 2.5.
The model presented in this paper is quite different from the previous approaches, as explained above. The main contribution is a 'differential' model of the complex cell, which is exactly steerable, and which fits naturally into the geometric approach to image analysis (Koenderink and van Doorn, 1987, 1990). This shows that it is possible to analyze the local image geometry, and to obtain a shift-invariant response, using a common set of filters.

The body of the paper is organized as follows. The new model is developed in section 2, using linear algebra and least-squares optimization. The accuracy of the estimated filters is evaluated in section 3. This section also derives the exact response of the ideal filters to several basic stimuli. Some preliminary experiments on natural images are reported. The biological interpretation of the model, and its predictions, are discussed in section 4. Future directions, and the relationship of the model to scale-space theory (Koenderink, 1984), are also discussed.
The following notation will be used here. Matrices and vectors are written in bold, e.g. $\mathbf{M}$, $\mathbf{v}$, where $\mathbf{M}^\top$ is the transpose, and $\mathbf{M}^+$ is the Moore-Penrose inverse (Press et al., 1992). The convolution of functions is $F(x) \star G(x) = \int_{-\infty}^{\infty} F(x - y)\,G(y)\,\mathrm{d}y$. Some properties of the Gaussian derivatives $G_k(x, \sigma)$ will now be reviewed.

There is no particular spatial scale at which a natural image should be analyzed. It is therefore desirable to represent the image in a scale space, so that a range of resolutions can be considered (Koenderink and van Doorn, 1987). The preferred way to do this is by convolution with a Gaussian kernel. It follows that the structure of the image, at a given scale, can be analyzed via the spatial derivatives of the corresponding Gaussian. The $k$-th order derivatives of a 1-D Gaussian, $G_k = \mathrm{d}^k G / \mathrm{d}x^k$, can be expressed as

$$G_k(x, \sigma) = \left(\frac{-1}{\sigma\sqrt{2}}\right)^{\!k} H_k\!\left(\frac{x}{\sigma\sqrt{2}}\right) G(x, \sigma) \quad (4)$$

$$G(x, \sigma) = \exp\!\left(-\frac{x^2}{2\sigma^2}\right) \quad (5)$$

where $G(x, \sigma)$ is the original Gaussian, $k$ is a positive integer, and $H_k(x)$ is the $k$-th Hermite polynomial. The first seven Gaussian derivatives are shown in column one of figure 1. It will also be useful to introduce two normalizations of the Gaussian derivatives,

$$G_k^{0}(x, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, G_k(x, \sigma) \quad \text{and} \quad G_k^{1}(x, \sigma) = \frac{1}{2}\, G_k(x, \sigma) \quad (6)$$

which are defined so that $\int |G_k^{k}(x)|\,\mathrm{d}x = 1$. In particular, $G_0^{0}$ and $G_1^{1}$ are the $L_1$-normalized blurring and differentiating filters, respectively. This superscript notation will not be used unless a particular normalization is important (e.g. in sec. 3.2).

The two-dimensional Gaussian derivative, in direction $\theta$ with $\mathbf{x} = (x, y)^\top$, will be written $G_k(\mathbf{x}, \sigma, \theta) = (\mathbf{v} \cdot \nabla)^k\, G(\mathbf{x}, \sigma)$, as in section 1.1. Two special properties of these filters should be noted. Firstly, the filter $G_k(\mathbf{x}, \sigma, \theta)$ is separable in the local coordinate-system that is defined by the direction of differentiation.
This means that the 2-D filter can be obtained from the product of 1-D filters $G_k(x_\theta, \sigma)$ and $G(y_\theta, \sigma)$. Secondly, the Gaussian derivatives are steerable, meaning that $G_k(\mathbf{x}, \sigma, \theta)$ can be obtained from a linear combination of derivatives in other directions, $G_k(\mathbf{x}, \sigma, \phi_j)$, where $j = 1, \dots, k+1$. These facts make it possible to analyze a multidimensional filter, in many cases, in terms of 1-D functions (Freeman and Adelson, 1991).

The first derivative, $G_1$, will be used as the basic model of a complex subunit (which is also a simple cell receptive field). This choice is motivated by two observations. Firstly, it is well established that gradient filters can be used to detect edges, as well as more complex image-features (Canny, 1986; Harris and Stephens, 1988). Secondly, $G_1$ is the first zero-mean filter in the local-jet representation of the image, which is physiologically and mathematically convenient. The extension to higher-order subunits is straightforward, as discussed in section 4.3.

This section will put the system of simple cells, introduced in section 1.1, into a standard signal processing framework. This will be done in 1-D, in order to simplify the notation. The extension to 2-D is straightforward. The 1-D version of the simple cell response (1) is $S_k(u, \sigma) = \int_{-\infty}^{\infty} G_k(u - x, \sigma)\, S(x)\,\mathrm{d}x$. If $k = 1$, and the filter $G_1$ is offset by an amount $t$, then the convolution can be expressed as

$$S_1(u - t, \sigma) = \int_{-\infty}^{\infty} G_1(u - t - x, \sigma)\, S(x)\,\mathrm{d}x = -\int_{-\infty}^{\infty} G_1(x - t, \sigma)\, S(x - u)\,\mathrm{d}x. \quad (7)$$

Here the antisymmetry of $G_1(x, \sigma)$ has been used to show that the result is equal to the negative correlation of the filter and signal. It is evident that if $t$ could be kept equal to $u$, then the signal shift would have no effect on the result. The prototypical stimulus for $G_1$ is the step-edge $S(x) = \mathrm{sgn}(x)$. In this case the filter and signal are anti-correlated, and so the final response (7) is non-negative.
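The Hermite construction of equations (4) and (5) is easy to check numerically. A minimal sketch (assuming NumPy; `hermval` evaluates physicists' Hermite polynomials, which match the $H_k$ used here):

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def gauss(x, sigma):
    # Unnormalized Gaussian of eq. (5)
    return np.exp(-x**2 / (2 * sigma**2))

def gauss_deriv(x, sigma, k):
    # Eq. (4): G_k = (-1/(sigma*sqrt(2)))^k * H_k(x/(sigma*sqrt(2))) * G(x, sigma)
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0                     # select the k-th Hermite polynomial
    s = sigma * np.sqrt(2.0)
    return (-1.0 / s)**k * hermval(x / s, coeffs) * gauss(x, sigma)

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]
# k = 1 agrees with direct differentiation: d/dx G = -(x / sigma^2) G
assert np.allclose(gauss_deriv(x, 1.0, 1), -x * gauss(x, 1.0))
# L1 norms behind the normalizations in eq. (6):
# the integral of |G_0| is sigma*sqrt(2*pi), and the integral of |G_1| is 2
print(np.abs(gauss(x, 1.0)).sum() * dx)           # ~ 2.5066
print(np.abs(gauss_deriv(x, 1.0, 1)).sum() * dx)  # ~ 2.0
```

The two printed integrals are the constants that make the blurring and differentiating filters $L_1$-normalized, as in (6).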
It was established in section 2.1 that the response $S_1(u - t, \sigma)$ can be constructed from the offset filters $G_1(x - t, \sigma)$. This means that the desired approximation (3) can be treated as a filter-design problem. The following notation will be adopted for the offset filters:

$$F(0, x) = G_1(x, \sigma), \qquad F(t, x) \approx G_1(x - t, \sigma) \quad (8)$$

which also depend on the spatial scale and derivative order, but it will not be necessary to make this explicit in the notation. It will suffice to analyze a single filter which, without loss of generality, is located at the origin $\mathbf{x} = (0, 0)^\top$ of the spatial coordinate-system. The linear response of this filter is defined in relation to (7) as

$$R(t, u) = -\int_{-\infty}^{\infty} F(t, x)\, S(x - u)\,\mathrm{d}x. \quad (9)$$

The complex response at $\mathbf{x} = (0, 0)^\top$, with reference to (2), can now be expressed as a function of the signal translation $u$;

$$C(u) = \max_t \bigl|R(t, u)\bigr| \quad \text{where } |t| \le \rho. \quad (10)$$

The actual value of $u$, in general, has no particular significance. It will be more important to consider the response $R(t, u)$ as $u$ changes. In particular, suppose that $|R(t, u)|$ is high at the stimulus position $u = u_0$. If the response is insensitive to slight translation of the signal, then $\partial C / \partial u \approx 0$ at $u_0$.

The approximation problem in (8) will now be addressed. The filter $F(t, x)$ can be defined in relation to the Maclaurin expansion of $G_1(x - t, \sigma)$ with respect to the offset $t$, as indicated in (8). If image-derivatives up to order $K$ are available, then the approximation is

$$F(t, x) = \sum_{k=0}^{K-1} \frac{(-t)^k}{k!}\, G_{k+1}(x, \sigma). \quad (11)$$

The key observation is that the filters $G_{k+1}(x, \sigma)$ in (11) are precisely those that compute the local jet coefficients, of order $1, \dots, K$, at the point $x = 0$. In other words, the family of shifted filters $F(t, x)$ has been obtained from the family of non-shifted derivatives $G_k(x, \sigma)$.
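The Maclaurin synthesis (11) can be sketched in a few lines (an assumed NumPy implementation, not taken from the paper): a shifted copy of $G_1$ is reproduced by derivatives centred at the origin.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def gauss_deriv(x, sigma, k):
    # k-th Gaussian derivative via the Hermite form of eq. (4)
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    s = sigma * np.sqrt(2.0)
    return (-1.0 / s)**k * hermval(x / s, coeffs) * np.exp(-x**2 / (2 * sigma**2))

def maclaurin_offset(x, sigma, t, K):
    # Eq. (11): F(t, x) = sum_{k=0}^{K-1} (-t)^k / k! * G_{k+1}(x, sigma)
    return sum((-t)**k / math.factorial(k) * gauss_deriv(x, sigma, k + 1)
               for k in range(K))

x = np.linspace(-5.0, 5.0, 101)
ideal = gauss_deriv(x - 0.5, 1.0, 1)          # target: G_1 shifted by t = 0.5
approx = maclaurin_offset(x, 1.0, t=0.5, K=7)
rms = np.sqrt(np.mean((approx - ideal)**2))   # small for |t| well inside rho
```

Increasing $K$ shrinks the error, while larger $|t|$ inflates it, which is the trade-off analyzed in section 3.1.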
It can be seen from (4, 5) and (11) that the estimated function $F(t, x)$ decreases to zero for large $|x|$, owing to the exponential tails of $G$.

The definition (11) is usable in practice, as will be shown in section 3.1. There are, however, two difficulties with the scheme described above. Firstly, although $F(t, x)$ is an approximation of $G_1(x - t, \sigma)$, the nature of this approximation (a polynomial with the given derivatives) may be inappropriate. Secondly, as expected, the approximation (11) is not well-behaved for large $|t|$. Both of these problems can be addressed by replacing the Maclaurin series with a more flexible construction of $F(t, x)$. This is done by substituting a general polynomial $P_k(t)$ in place of each monomial $(-t)^k / k!$ in (11), leading to

$$F(t, x) = \sum_{k=1}^{K} P_k(t)\, G_k(x, \sigma). \quad (12)$$

The $K$ polynomials $P_k(t)$ are constructed from standard monomial basis functions $t^j$ and coefficients $c_{jk}$. The order of each polynomial will be $K - 1$, for consistency with the original series approximation (11). It follows that the polynomials are

$$P_k(t) = \sum_{j=0}^{K-1} t^j c_{jk} \quad \text{where } 1 \le k \le K. \quad (13)$$

The problem has now been altered to that of finding $K^2$ appropriate coefficients $c_{jk}$. This will be treated, in sections 2.4 and 2.5, as the optimization of

$$\arg\min_{\mathbf{C}} \iint \bigl|F(t, x) - G_1(x - t, \sigma)\bigr|^2\,\mathrm{d}x\,\mathrm{d}t$$

where $F(t, x)$ is the family of filters defined in (12), and $\mathbf{C}$ is the matrix of coefficients $c_{jk}$. This optimization scheme generalizes immediately to filters in any number of dimensions. The simple Maclaurin scheme (11) remains a useful model, because the optimal polynomials are, in practice, close to the original monomials, $P_k(t) \approx t^{(k-1)}$, as can be seen in figure 1.

It is important to note that, once the coefficients $c_{jk}$ have been estimated, the location of the synthetic filter $F(t, x)$ can be varied continuously with respect to the offset $t$.
Any set of translated filters $F(t_i, x)$ can be obtained, provided $|t_i| \le \rho$ for $i = 1, \dots, M$, by re-sampling the monomial basis functions as $t_i^j$ and then repeating (12 & 13). Furthermore, the principle of shiftability states that the convolution $f(x - t) \star s(x)$ can be represented in a finite basis $f(x - t_i)$, provided that the filter $f$ is bandlimited (Perona, 1995; Simoncelli et al., 1992). The filters $F(t, x)$ are not bandlimited, but they do decay exponentially, as the Fourier transforms have a Gaussian factor. This means, in practice, that the linear response $R(t, u)$ in (9) can be represented by a suitable discretization $R(t_i, u)$, where the shift-resolution $\Delta t = 2\rho / (M - 1)$ can be chosen to achieve any desired accuracy.

It will be convenient to represent the filter construction in terms of matrices. This results in a compact formulation, and prepares for the least-squares estimation procedure that will be introduced in section 2.4. Suppose that $M$ filters, each of length $N$, are to be constructed, and that the highest available derivative is of order $K$. Each filter will be represented as a row-vector, so that the collection of offset filters forms an $M \times N$ matrix $\mathbf{F}$. Note that this representation applies in any number of dimensions, provided that the positions of the filter-samples are consistently identified with the column-indices of $\mathbf{F}$.

The columns of another matrix $\mathbf{P}$ will contain the $K$ polynomials $P_k(t)$ from equation (13). These polynomials must be sampled at $M$ points $t_i$, hence $\mathbf{P}$ has dimensions $M \times K$. Let the sampled monomial basis functions $t_i^j$ be the columns of the matrix $\mathbf{B}$, which must therefore have the same dimensions, $M \times K$. The monomials are weighted by the $K \times K$ matrix of coefficients $\mathbf{C}$, such that

$$\mathbf{P} = \mathbf{B}\mathbf{C} \quad (14)$$

where column $k$ of $\mathbf{C}$ contains the coefficients of $P_k$. Let each row of the $K \times N$ matrix $\mathbf{G}$ contain the sampled Gaussian derivative $G_k(x_i, \sigma)$.
Each offset filter should be a linear combination of the Gaussian derivatives, constructed from the polynomials $P_k$. It follows from (12) that

$$\mathbf{F} = \mathbf{P}\mathbf{G}. \quad (15)$$

Let the column-vector $\mathbf{s}$ contain the sampled signal, $\mathbf{s} = \bigl(S(x_1), \dots, S(x_N)\bigr)^\top$. This means that the response-vector $\mathbf{r} = \bigl(R(t_1, u), \dots, R(t_M, u)\bigr)^\top$ is obtained according to (9 & 15) as

$$\mathbf{r} = -\mathbf{F}\mathbf{s} = -\mathbf{P}\mathbf{G}\mathbf{s}. \quad (16)$$

This clearly shows that the response $\mathbf{r}$ is simply a linear transformation $\mathbf{P}$ of the $K$-th order Gaussian jet, $\mathbf{G}\mathbf{s} = \bigl(S_1(x, \sigma), \dots, S_K(x, \sigma)\bigr)^\top$. The implication is that the filter-bank $\mathbf{F}$ need not be explicitly constructed; rather, the response $\mathbf{r}$ is computed directly from the $K$ image derivatives $S_k$ at $x$.

It will now be shown that the $K^2$ unknown coefficients, contained in the matrix $\mathbf{C}$, can be obtained by standard least-squares methods. It should be emphasized that this is a filter-design problem; the matrix $\mathbf{P}$ is fixed for all signals, and the response $\mathbf{r}$ is obtained according to (16).

Let the $M \times N$ matrix $\mathbf{F}^\star$ contain the true derivative filters, such that the $ij$-th element is $G_1(x_j - t_i, \sigma)$. The approximation $\mathbf{F} \approx \mathbf{F}^\star$ can be expressed, according to (14 & 15), as the product $\mathbf{F}^\star \approx \mathbf{B}\mathbf{C}\mathbf{G}$, of which

$$\mathbf{C} = \mathbf{B}^+ \mathbf{F}^\star \mathbf{G}^+ \quad (17)$$

is the solution in the least-squares sense. This formulation requires two Moore-Penrose inverses, which can be computed from the singular-value decompositions of the monomial basis and Gaussian derivative matrices $\mathbf{B}$ and $\mathbf{G}$, respectively. It is, however, more efficient to solve this problem using QR decompositions, as follows. There are, in practice, more offsets than derivative filters, $M > K$, as well as more spatial samples than derivative filters,
$N > K$. The basis matrix has full column-rank $K$, and can be factored as $\mathbf{B} = \mathbf{Q}_B \mathbf{R}_B$. The derivative matrix has full row-rank $K$, and so its transpose can be factored as $\mathbf{G}^\top = \mathbf{Q}_G \mathbf{R}_G$. It follows that the solution (17) can be obtained via $\mathbf{B}^+ = \mathbf{R}_B^{-1} \mathbf{Q}_B^\top$ and $\mathbf{G}^+ = \mathbf{Q}_G \mathbf{R}_G^{-\top}$.
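The whole design procedure of sections 2.3 and 2.4 fits in a few lines of linear algebra. A sketch, assuming NumPy and illustrative sizes $M$, $N$, $K$ chosen here for the example (the QR-based pseudoinverses follow the text; `gauss_deriv` is the Hermite construction of eq. (4)):

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def gauss_deriv(x, sigma, k):
    # k-th Gaussian derivative via the Hermite form of eq. (4)
    coeffs = np.zeros(k + 1); coeffs[k] = 1.0
    s = sigma * np.sqrt(2.0)
    return (-1.0 / s)**k * hermval(x / s, coeffs) * np.exp(-x**2 / (2 * sigma**2))

sigma, K, M, N, rho = 1.0, 5, 21, 101, 1.0
x = np.linspace(-5.0, 5.0, N)                # spatial samples x_j
t = np.linspace(-rho, rho, M)                # offsets t_i
G = np.stack([gauss_deriv(x, sigma, k) for k in range(1, K + 1)])   # K x N
B = np.stack([t**j for j in range(K)], axis=1)                      # M x K
Fstar = np.stack([gauss_deriv(x - ti, sigma, 1) for ti in t])       # M x N targets

# Eq. (17) via QR: B = Q_B R_B (full column rank), G^T = Q_G R_G (full row rank)
Qb, Rb = np.linalg.qr(B)
Qg, Rg = np.linalg.qr(G.T)
B_pinv = np.linalg.solve(Rb, Qb.T)           # B+ = R_B^{-1} Q_B^T
G_pinv = Qg @ np.linalg.inv(Rg).T            # G+ = Q_G R_G^{-T}
C = B_pinv @ Fstar @ G_pinv                  # eq. (17)

P = B @ C                                    # eq. (14): sampled polynomials
F = P @ G                                    # eq. (15): synthesized offset filters
err = np.sqrt(np.mean((F - Fstar)**2))       # small over |t| <= rho

s_vec = np.sign(x)                           # step-edge stimulus
r = -(P @ (G @ s_vec))                       # eq. (16): responses from the jet
```

As equation (16) states, the responses `r` are obtained directly from the $K$ jet coefficients `G @ s_vec`, without building the filter-bank `F` explicitly.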
Figure 1: Construction of offset filters. Column 1: The Gaussian derivatives $G_k(x, \sigma)$, scaled for display, of orders $1, \dots, K$, where $K = 7$. Column 2: The corresponding polynomial interpolation functions $P_k(t)$, of order $K - 1$. Note that $P_k(t)$ resembles the monomial $t^{(k-1)}$. Column 3: Estimated filters $F(t_j, x)$, which are offset versions of $G_1(x, \sigma)$. Column 4: Ideal filters $F^\star(t_j, x)$. The synthesis equation is $F(t_j, x) = \sum_k P_k(t_j)\, G_k(x, \sigma)$, where each weight $P_k(t_j)$ corresponds to the $j$-th dot on the $k$-th polynomial in column two.

The least-squares construction of the filters $\mathbf{F}$ was described in the preceding section. The method is quite usable, but has two shortcomings. Firstly, if one of the shifts $t_i$ is zero, then $F(0, x) \approx G_1(x, \sigma)$, but it would be preferable to make this an exact equality, so that the original filter is returned as in (8). The second shortcoming of the method in 2.4 is that, in two or more dimensions, the orientation of the derivative filters in the basis-set $\mathbf{G}$ may not match that of the target-set $\mathbf{F}^\star$. Both of these problems will be solved below.

Figure 2: Synthesis of orientation-tuned subunits. The nine filters were synthesized from eight oriented derivatives centred at the origin (with $\sigma = 1$). The steered-additive solution of section 2.5.3 was used. Middle row:
The $G_1$ filter is steered to the axes of a hexagonal lattice; $\theta = 0^\circ, 60^\circ, 120^\circ$. Top and bottom rows:
Offset filters, synthesized at shifts of $t = \pm\rho$ in the corresponding directions (with $\rho = 1.5$). Note that each column can be interpreted as three subunits of an orientation-tuned complex cell. The filter amplitudes are scaled to the range $[-1, 1]$, with contour lines separated by increments of 0.2 units.

The requirement $F(0, x) = G_1(x, \sigma)$ is satisfied by an additive model, in which the polynomial $P_1(t)$ that weights $G_1(x, \sigma)$ is always unity, and all other polynomials pass through zero when $t = 0$. This implies the following partitioning of the derivative, monomial and coefficient matrices:

$$\mathbf{G} = \begin{pmatrix} \mathbf{g}^\top \\ \mathbf{G}_\Delta \end{pmatrix} \qquad \mathbf{B} = \begin{pmatrix} \mathbf{1} & \mathbf{B}_\Delta \end{pmatrix} \qquad \mathbf{C} = \begin{pmatrix} 1 & \mathbf{0}^\top \\ \mathbf{0} & \mathbf{C}_\Delta \end{pmatrix} \quad (18)$$

where $\mathbf{1}$ is the column-vector of $M$ ones, and $\mathbf{0}$ is the column-vector of $(K - 1)$ zeros. The $1 \times N$ vector $\mathbf{g}^\top$ contains the first derivative filter $G_1(x, \sigma)$, while the $(K - 1) \times N$ matrix $\mathbf{G}_\Delta$ contains the higher-order filters. The columns of the $M \times (K - 1)$ matrix $\mathbf{B}_\Delta$ contain the sampled monomials, excluding the constant vector $\mathbf{1}$. The unknown matrix $\mathbf{C}$ will be recovered in the form indicated, where $\mathbf{C}_\Delta$ has dimensions $(K - 1) \times (K - 1)$.

The product of $\mathbf{B}$ and $\mathbf{C}$, as in (14), now gives $\mathbf{P} = \begin{pmatrix} \mathbf{1} & \mathbf{P}_\Delta \end{pmatrix}$, where the columns of $\mathbf{P}_\Delta = \mathbf{B}_\Delta \mathbf{C}_\Delta$ are polynomials without constant terms. It follows that the product $\mathbf{P}\mathbf{G}$, as in (15), gives the additive approximation

$$\mathbf{F}^\star \approx \mathbf{1}\mathbf{g}^\top + \mathbf{B}_\Delta \mathbf{C}_\Delta \mathbf{G}_\Delta \quad (19)$$

where $\mathbf{1}\mathbf{g}^\top$ is the rank-one matrix containing $M$ identical rows $\mathbf{g}^\top$. Note that if the $i$-th row of $\mathbf{F}^\star$ corresponds to $t = 0$, then the $i$-th row of $\mathbf{B}_\Delta \mathbf{C}_\Delta$ must be zero, this being the evaluation of the polynomials $\sum_{j=1}^{K-1} t^j c_{jk}$ at $t = 0$. It follows that the $i$-th row of $\mathbf{F}^\star$ is exactly recovered from (19) as $\mathbf{g}^\top$, and so the constraint $F(0, x) = G_1(x, \sigma)$ has been imposed.

The unknown coefficients $\mathbf{C}_\Delta$ are recovered by subtracting $\mathbf{1}\mathbf{g}^\top$ from $\mathbf{F}^\star$, and then proceeding by analogy with (17).
This leads to

$$\mathbf{C}_\Delta = \mathbf{B}_\Delta^+ \bigl(\mathbf{F}^\star - \mathbf{1}\mathbf{g}^\top\bigr)\, \mathbf{G}_\Delta^+$$

where the matrices $\mathbf{B}_\Delta^+$ and $\mathbf{G}_\Delta^+$ can be obtained from the QR factorizations of $\mathbf{B}_\Delta$ and $\mathbf{G}_\Delta$, as before.

In two (or more) dimensions, it is assumed that the desired filters $G_1(\mathbf{x} - t\mathbf{v}, \sigma, \theta)$ have a common orientation, where $\mathbf{v} = (\cos\theta, \sin\theta)^\top$ is the direction of the derivative in 2-D. This leads to invariance with respect to translations of the signal in the given direction. The basis filters $G_k(\mathbf{x}, \sigma, \phi_\ell)$, however, will typically have a range of orientations $\phi_\ell \ne \theta$. This problem can be solved as follows.

Recall from section 2 that the $k$-th order Gaussian derivative is steerable with a basis of size $k + 1$. Now suppose that row $k$ of the matrix $\mathbf{G}$ is replaced by $k + 1$ rows, containing sampled filters $G_k(\mathbf{x}, \phi_\ell, \sigma)$ at $\ell = 1, \dots, k+1$ distinct orientations $\phi_\ell$. The enlarged matrix $\mathbf{G}_\phi$ now has dimensions $M_K \times N$, where

$$M_K = \sum_{k=1}^{K} (k + 1) = \frac{K(K + 3)}{2}. \quad (20)$$

It follows that there is a $K \times M_K$ 'steering' matrix $\mathbf{D}$ such that $\mathbf{G} = \mathbf{D}\mathbf{G}_\phi$ is exactly the $K \times N$ matrix of derivatives at the desired orientation. Moreover, if the approach of section 2.4 is applied to the $M_K \times N$ matrix $\mathbf{G}_\phi$, then a solution

$$\mathbf{F} = \mathbf{B}\mathbf{C}_\phi \mathbf{G}_\phi = \mathbf{B}\mathbf{C}\mathbf{G} \quad (21)$$

will be obtained. It follows that the two coefficient matrices are related by $\mathbf{C}_\phi = \mathbf{C}\mathbf{D}$. In summary, if the matrix $\mathbf{G}_\phi$ contains a sufficient number $M_K$ of differently oriented filters, then a set of translated filters $\mathbf{F}$ can be approximated in any common orientation $\theta$. There is no change to the algorithm described in section 2.4. It should, however, be noted that (20) shows a trade-off between translation invariance and steerability. Larger translation invariance requires more derivatives, but these become increasingly difficult to steer.

The steered solution, as described in the previous section, will not automatically be additive, in the sense of (19).
This problem will be solved, with reference to section 2.5.1, by putting an explicitly steered filter $\mathbf{g}_\theta^\top$ in place of $\mathbf{g}^\top$. The first derivative can be steered with respect to a basis of filters at distinct orientations $\phi_1$ and $\phi_2$ (these would be the first two rows of the $M_K \times N$ matrix $\mathbf{G}_\phi$). The standard steering equation (Freeman and Adelson, 1991) can be simplified in this case to

$$\begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} = \begin{pmatrix} \cos\phi_1 & \cos\phi_2 \\ \sin\phi_1 & \sin\phi_2 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \end{pmatrix}$$

where $\theta$, $\phi_1$ and $\phi_2$ are known angles. This system can be solved exactly for the unknown coefficients $p_1$ and $p_2$, resulting in

$$p_1 = \sin(\phi_2 - \theta)/\delta \quad \text{and} \quad p_2 = \sin(\theta - \phi_1)/\delta \quad \text{where} \quad \delta = \sin(\phi_2 - \phi_1). \quad (22)$$

It may be noted that if $\phi_1 = 0$ and $\phi_2 = \pi/2$, then the solution reduces to the usual coefficients $p_1 = \cos\theta$ and $p_2 = \sin\theta$ for the construction of the directional derivative from $\mathrm{d}/\mathrm{d}x$ and $\mathrm{d}/\mathrm{d}y$. The additive steered approximation can now be defined, using the new filter $G_1(\mathbf{x}, \theta, \sigma)$, as

$$\mathbf{F}^\star \approx \mathbf{1}\mathbf{g}_\theta^\top + \mathbf{B}_\Delta \mathbf{C}_{\phi\Delta} \mathbf{G}_{\phi\Delta} \quad \text{where} \quad \mathbf{g}_\theta^\top = p_1 \mathbf{g}_{\phi_1}^\top + p_2 \mathbf{g}_{\phi_2}^\top. \quad (23)$$

This system can be solved in the same way as (19). Note that the higher-order filters in $\mathbf{G}_{\phi\Delta}$ will be implicitly steered, as described in section 2.5.2.

Two issues are addressed in this evaluation, as follows.
Approximation : The accuracy of the least-squaresalgorithms from sections 2.4 and 2.5 is established in section 3.1.
Characterization : The response of theunderlying model from section 1.1 to basic stimuli, as well as to natural images, is analyzed in sections 3.2and 3.3 respectively. Note that the issues of approximation and characterization are addressed separately,in order to avoid mixing different sources of error. Hence section 3.1 will evaluate the approximate filters F , while sections 3.2 and 3.3 will analyze the ideal filters F (cid:63) . The accuracy of the filter approximations will be evaluated in this section, and it will be shown that theleast-squares methods are superior to the original Maclaurin expansion. The evaluation is based on theroot mean-square (RMS) error between the target and synthetic filters.The accuracy of a given filter-synthesis method is determined by two variables; the range of offsets,and the number of available derivatives (size of the basis). Better approximations can, in general, beobtained by reducing the range of offsets and/or increasing the size of the basis. The range ρ = 1 σ isthe smallest that results in a unimodal impulse response, as will be shown in section 3.2.1. It is thereforeimportant to analyze the corresponding approximation. In addition, the larger range ρ = 1 . σ will beanalyzed. This leaves the size of the basis (for which there is no prior preference) to be varied in eachcase.The method of evaluation is illustrated in figure 3. It can be seen that the furthest-offset filters beginto depart from the target shape. The RMS difference between the ideal and approximate filters, for eachtest, was measured over 51 offsets t i in the range ± ρ . Each filter was sampled at 101 points x j in therange ± σ , which contains the significantly nonzero part of all filters (see fig. 3).The RMS error, for basis sizes K = 3 , . . . , is shown in fig. 4. The meaning of the RMS error,in terms of filter distortion, can be gauged with reference to fig. 3. For example, it can be seen that theMaclaurin model with K = 8 and ρ = 1 . 
is poor, and this corresponds to a point around the middle of the RMS axis in fig. 4.

In the case of the Maclaurin approximation (top row) it can be seen that the error increases rapidly and monotonically with respect to the offset. The pattern is more complicated for the least-squares approximations, because the error has been minimized over an interval ±ρ, which effectively truncates the basis functions in x. Nonetheless, the lines corresponding to the different basis sizes remain nested; they cannot cross, because increasing the size of the basis cannot make the approximation worse. It is, however, possible for the lines to meet. In particular, the unconstrained lines meet in pairs at t = 0. This is because the target function at zero offset is anti-symmetric. It follows that the incorporation of a symmetric basis function G_k cannot improve an existing approximation of order k − 1. In the case of the additive approximation, all lines meet at t = 0, where the error is zero by construction.

A number of basic signals will be introduced below, and the ideal responses will be derived; further examples are given in Hansard and Horaud (2010). The responses are ‘ideal’ in the sense that the error of the least-squares approximations (sec. 2.4–2.5.3) will be ignored. This is primarily in order to obtain useful results, but there are two further justifications. Firstly, it has been demonstrated in the preceding section that the approximations are good, over an appropriate range ρ. Secondly, the approximation error can be made arbitrarily low, by using a large enough basis for the given range.

Recall that the offset filters F(t, x) are copies of the Gaussian derivative G(x, σ). It follows from (9) that the response function is covariant to the shift t, in the sense that

R(t, u) = R(0, u − t).   (24)
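The least-squares synthesis and its RMS evaluation can be sketched in Python as follows. This is a simplified illustration, not the paper's implementation: the Hermite-based derivative formula, the basis {G_1, . . . , G_K}, the sampling grid, and the fit to a single offset t (rather than to a whole range of offsets) are all assumptions made for brevity.

```python
import numpy as np
from numpy.polynomial import hermite as H

def gauss_deriv(x, k, sigma=1.0):
    """k-th derivative of exp(-x^2/(2 sigma^2)), using physicists' Hermite
    polynomials: d^k/dx^k e^{-u^2} = (-1)^k H_k(u) e^{-u^2}, u = x/(sigma*sqrt(2))."""
    u = x / (sigma * np.sqrt(2.0))
    c = np.zeros(k + 1)
    c[k] = 1.0
    return (-1.0) ** k * (sigma * np.sqrt(2.0)) ** (-k) * H.hermval(u, c) * np.exp(-u ** 2)

def synth_rms(t, K, sigma=1.0):
    """RMS error of the least-squares synthesis of the offset first-derivative
    filter G_1(x - t), from the centred basis {G_1, ..., G_K}."""
    x = np.linspace(-4.0 * sigma, 4.0 * sigma, 101)
    basis = np.stack([gauss_deriv(x, k, sigma) for k in range(1, K + 1)], axis=1)
    target = gauss_deriv(x - t, 1, sigma)
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return float(np.sqrt(np.mean((basis @ coef - target) ** 2)))
```

At zero offset the target lies in the basis, so the error vanishes; at a fixed offset the error cannot increase as the basis grows, which is the nesting property noted above.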
Figure 3: Filter deformation.
Top left:
The eighth-order Maclaurin synthesis (11) of the offset filters, with σ = 1, over a range ρ = ±1.5σ of offsets. Large errors are visible in the most extreme filters. Top right:
The approximation is much worse if the order of the basis is reduced by one.
Bottom left, right:
The least-squares approximation is much better, even if the additivity constraint is enforced.

It therefore suffices to obtain the linear response for the case t = 0, as the other responses are simply translations of this function. The linear response in this case is R(0, u) = G(x, σ) ⋆ S(x − u), by analogy with (7). Note that the normalized filter has been used, as defined in (6). It follows that R(0, u) can be obtained by blurring the signal with the filter G(x, σ), and then differentiating the result. The complex response C(u) is given by the max operation (10). Evidently C(u) is the upper envelope of the family |R(t, u)|, but it is possible to be more precise than this. In particular, the shift-insensitivity of the model can be quantified by determining the intervals of u over which C(u) is constant, as described below.

The response |R(0, u)| to a basic signal S(x − u) can be either symmetric or antisymmetric, and either periodic or aperiodic. However, a common property of the responses considered here is that the local maxima are all of equal height. Let |R(0, u⋆)| = R⋆ be a local maximum, and suppose that u is within range of this maximum, meaning that |u − u⋆| ≤ ρ. It follows that C(u) = R⋆, because the maximum in (10) will be found at t = u − u⋆, and |R(u − u⋆, u)| = |R(0, u⋆)| = R⋆ by (24). An intuitive summary of this is that each local maximum |R(0, u⋆)| generates a plateau C(u⋆ ± ρ) = R⋆ in the complex response. In order to make use of this interpretation, the function V(u) will be defined as the signed distance u − u⋆ to the nearest local maximum of |R(0, u)|. It follows that

C(u) = R⋆ if |V(u)| ≤ ρ;  max_{|t| ≤ ρ} |R(t, u)| otherwise.   (25)

This explicitly identifies the intervals, |V(u)| ≤ ρ, over which C(u) is constant.
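The plateau construction can be checked numerically. The sketch below uses an unnormalized first-derivative filter and computes C(u) directly as a sliding maximum, via the shift-covariance (24); the grid and the filter normalization are illustrative assumptions, not the paper's.

```python
import numpy as np

def g1(x, sigma=1.0):
    # first derivative of a Gaussian (overall normalization omitted)
    return -x / sigma ** 2 * np.exp(-x ** 2 / (2.0 * sigma ** 2))

def complex_response(signal, x, sigma=1.0, rho=1.5):
    """C(u) = max over |t| <= rho of |R(t, u)|.  By the covariance (24),
    R(t, u) = R(0, u - t), so C is a sliding maximum of |R(0, .)|."""
    dx = x[1] - x[0]
    r0 = np.convolve(signal, g1(x, sigma), mode='same') * dx
    a = np.abs(r0)
    k = int(round(rho / dx))
    return np.array([a[max(0, i - k):i + k + 1].max() for i in range(a.size)])
```

For a unit step at the origin, |R(0, u)| is a Gaussian bump, and the resulting C(u) is flat over an interval ±ρ around the edge, as predicted by the plateau argument.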
Note that if |V(u)| > ρ then the original definition (10) is used. The functions R(0, u) and V(u), as well as the constant R⋆, will now be derived for each of the basic signals. It should be emphasized that V(u) and R⋆ are only used to characterize the response; they are not part of the computational model.

The first test signal to be considered is the unit impulse, which can be used to characterize the initial linear stage of the model. The impulse is defined as

S_σ(x) = δ(x),   (26)
Figure 4: Error vs. offset.
Top row:
The Maclaurin error rises quickly as the target filter is offset from the centre of the basis. Each line represents a different basis size, K = 3, . . . , 8, as indicated. The left and right plots show ranges ρ = 1σ and ρ = 1.5σ, respectively. Middle row:
The unconstrained least-squares approximation is much better, especially for high-order bases.
Bottom row:
The additive approximation is also good, and ensures that the error is zero when there is no offset (as in the Maclaurin case).

where δ(x) is the Dirac distribution. It follows that the linear response G ⋆ S_σ is just the original normalized derivative filter,

R_σ(0, u) = G(u, σ).   (27)

The maxima of the linear response can be found by differentiating R_σ(0, u), and setting the result to zero. The derivative contains a factor σ² − u², and so the zeros are at ±σ. The peak is at −σ, and it follows that the maximum response is

R⋆_σ = G(−σ, σ).   (28)

Both extrema become peaks in |R_σ(0, u)|, and the extent of the response plateau is determined by the minimum distance from these. The distance function for the impulse response can now be defined as

V_σ(u) = u − sgn⁺(u) σ,   (29)

where sgn⁺ is the sign-function with the convention sgn⁺(0) = 1. If u = 0 then |V_σ(u)| = σ, and it follows from (25) that C(u) ≠ R⋆_σ if σ > ρ. It has already been established that |R_σ(0, u)| has maxima of R⋆_σ at ±σ, which implies that the response C(u) will be bimodal unless

σ ≤ ρ,   (30)

as illustrated in figure 5. This condition is strictly imposed, as it would be undesirable to have a bimodal response to a unimodal signal. In general, ρ should be made as large as possible for a given σ, in order to achieve as much shift-invariance as possible. Recall, for example, that the least-squares approximations in section 3.1 were demonstrated for ρ = 1.5σ.

Figure 5: Impulse response.
Left column: the top plot shows the unit impulse, S_σ(x), as in (26). The middle plot shows the response C(u). The bottom plot shows the distance function |V_σ(u)|, as in (29), along with the value of the maximum offset ρ = 1σ. The response is constant, C(u) = R⋆_σ, when |V_σ(u)| ≤ ρ. The critical case, σ = 1, ρ = 1, is plotted. Right column: as before, except that ρ < σ. The response becomes bimodal, which shows the importance of the condition σ ≤ ρ.

The second test signal to be considered is the unit step function. This is arguably the most important example, because it is the basic model for a luminance edge. Indeed, the current model is optimized for the detection of step-like edges, owing to the use of the first derivative as the offset filter (Canny, 1986). The step can be defined from the standard sign-function, as follows:

S_α(x) = (α/2)(1 + sgn(x)).   (31)

The step function is related to the integral Φ(u, σ) of the normalized Gaussian function G(x, σ) in the following way:

Φ(u, σ) = 1/(σ√(2π)) ∫_{−∞}^{u} G(x, σ) dx   (32)
        = 1/(ασ√(2π)) ∫_{−∞}^{∞} G(x, σ) S_α(u − x) dx.   (33)

The third integral is the convolution of G with S_α, and hence Φ(u, σ) is proportional to the smoothed step-edge. The linear response is given by the derivative,

R_α(0, u) = ασ√(π/2) dΦ(u, σ)/du   (34)
          = (α/2) G(u, σ).   (35)

This shows that the basic response is simply an un-normalized Gaussian, located at the step-discontinuity. The maximum response and the signed-distance function are evidently

R⋆_α = α/2  and  V(u) = u.   (36)

Let (u, R(u)) be the Cartesian coordinates of the response curve. The final response can be constructed from the Gaussian (35) by inserting the plateau (±ρ, α/2) in place of the maximum point (0, α/2). This is illustrated in figure 6.

Figure 6: Step response.
Left column:
The top plot shows the step-edge S_α(x), as defined in (31). The middle plot shows the response C(u). The bottom plot shows the distance function V_α(u), as in (36). Note that the response is unconditionally unimodal in this case. Right column:
As before, except with Gaussian noise (SD 0.075) added independently at each point. The response C(u) is not significantly affected.

The third class of signals to be considered are the sines and cosines. These are of central importance, owing to their role in the Fourier synthesis of more complicated signals. Furthermore, these functions are used to construct the 2-D grating patterns that are commonly used to characterize complex cells. It will be convenient to base the analysis on the cosine function

S_ξ(x) = cos(2πξx),   (37)

where ξ is the frequency. The Fourier transforms g(x) ↦ F_x[g](η) of the filter G(x, σ) and signal S_ξ(x) are

F_x[G](η) = σ√(2π) G(η, 1/(2πσ))   (38)
F_x[S_ξ](η) = (1/2)(δ(η − ξ) + δ(η + ξ)),   (39)

respectively, where η is the frequency variable. The convolution G ⋆ S_ξ can be obtained from the inverse Fourier transform of the product F_x[G] F_x[S_ξ]. The resulting cosine is attenuated by a scale-factor F_x[G](ξ), because F_x[S_ξ](η) is zero unless |η| = ξ. Differentiating cos(2πξx) with respect to x gives −sin(2πξx), along with a second scale-factor of 2πξ. The amplitude of the linear response is given by the product of the two scale-factors 2πξ and F_x[G](ξ), which can be expressed as

R⋆_ξ = √2 π^(3/2) σξ G(ξ, 1/(2πσ)).
It can be seen that the amplitude depends on the scale σ of the filter, as well as on the frequency ξ of the signal. The complete linear response is given by

R_ξ(0, u) = −R⋆_ξ sin(2πξu).

Note that a phase-shift u₀ can be introduced, if required, by substituting u − u₀ for u. The rectified sine |R_ξ(0, u)| is another periodic function, of twice the frequency. The peaks of this function are separated by a distance 1/(2ξ), and so

V_ξ(u) = (u mod 1/(2ξ)) − 1/(4ξ)   (40)

is a suitable distance function for the cosine signal (37). The case of sine signals is analogous, with sin replaced by cos in the linear response (3.2.3), and u replaced by u − 1/(4ξ) in the distance function (40).

Figure 7: Cosine response.
Left column: The top plot shows the cosine signal S_ξ(x), as in (37), of frequency ξ = 1/(6σ). The middle plot shows the response C(u), which is constant. The bottom plot shows the distance function |V_ξ(u)|, as in (40). Note that the critical case is plotted, in which the peaks of |V_ξ(u)| touch the line ρ = 1.5σ. The response is also constant for any higher frequency. Right column:
As before, but for a lower frequency. The distance function now crosses the line ρ = 1.5σ, and corresponding ‘notches’ appear in C(u).

It is important to see that the system response is entirely constant for frequencies that are not too low. Specifically, the extreme values of (40), with respect to u, are ±1/(4ξ), from which it follows that the response is identically R⋆_ξ if ξ ≥ 1/(4ρ). The corresponding constraint on the wavelength 1/ξ is

1/ξ ≤ 4ρ.   (41)

In order to interpret this result, recall that ρ ≥ σ is required for a unimodal impulse response (30). Furthermore, in section 3.1, it was shown that ρ ≈ 1.5σ is achievable in practice. This means that a constant response can be expected for frequencies as low as ξ = 1/(6σ).

This section makes a basic evaluation of the differential model, using the objective function of ‘slow feature analysis’ (Berkes and Wiskott, 2005; Wiskott and Sejnowski, 2002). The procedure is as follows. Each greyscale image is decomposed into i = 1, . . . , 36 orientation channels θ_i at scale σ = 2 pixels. This corresponds to a set of simple-cell responses S(x, σ, θ_i), with an angular separation of 5°. The steerability of S is not used (i.e. a separate convolution is done for each θ_i), in order to minimize any angular bias in the image sampling. A set of straight tracks

x_ijk = p_j ± k∆ · (cos θ_i, sin θ_i)^⊤,  where j = 1, . . . , m and k = 0, . . . , n,   (42)

is sampled from each 2-D response. The m = 100 random points p_j are sampled from a uniform distribution over the image; the sign ± is also random. The resolution ∆ is set to one pixel, and the number of steps along each path is n = 99. This gives a total of 10,000 samples from each orientation channel. The responses at non-integral positions x_ijk are obtained by bilinear interpolation. The samples are non-negative by definition, and a global scale factor γ is used to normalize the overall ijk-mean of γS(x_ijk, σ, θ_i).
The mean simple-cell response is then computed in each orientation channel,

E_S(i) = (1/mn) Σ_{j=1}^{m} Σ_{k=0}^{n} γS(x_ijk, σ, θ_i),   (43)

where the scaling by γ ensures that Σ_i E_S(i) = 1. The mean quadratic variation along the paths is also computed, in each orientation channel:

Q_S(i) = (1/mn) Σ_{j=1}^{m} Σ_{k=1}^{n} |γS(x_ijk, σ, θ_i) − γS(x_ij[k−1], σ, θ_i)|².   (44)

The coordinates x_ijk and x_ij[k−1] represent adjacent points (separated by ∆) on the j-th path in the i-th channel. In summary, E_S(i) measures the average response for orientation θ_i, and Q_S(i) measures the average spatial variation of this response in direction θ_i. Slow feature analysis finds filters that minimize the quadratic variation. The orientation tuning and slowness measurements are plotted in figures 8 and 9 as a function of θ_i, by attaching vertical bars ±√(Q_S(i)) to each point E_S(i). Each test is then repeated, using the complex response C(x, σ, θ) in place of the simple response S(x, σ, θ), giving measurements E_C(i) and Q_C(i).

Three test-images with a dominant global orientation are used. Firstly, a vertical cosine grating, I_cos(x, y) = cos(2πξx). The range is set to ρ = 1.5σ, as usual, and the wavelength is set to 1/ξ = 8σ. These values do not satisfy the limit (41), which ensures that the complex response will not be trivial. The simple-cell response is shown in figure 8 (top left), and two effects should be noted. Firstly, the response is tuned to the dominant orientation θ = 0, as can be seen from the unimodal shape of the curve. Secondly, there is a large variation in the response when the tracks are orthogonal to the grating, as shown by the large bars around θ = 0. This is because the filter falls in and out of phase with the image as it moves horizontally.
The corresponding complex response is shown in figure 8 (top right). It can be seen that the orientation tuning is preserved, while the response variation is greatly reduced. Figure 8 (bottom) repeats the test, but with noise added to the cosine grating, by mixing I_cos with a noise image I_uni, in which each pixel is independently sampled from the uniform distribution on [0, 1]. This means that a variable simple response is obtained as the filter moves in any direction, because the image is now truly 2-D. The complex response reduces the variation, as shown in figure 8 (bottom).

Figure 9 (top) shows results for a real image which has an orientation-structure similar to that of the grating. The results are analogous. Finally, the same test is performed on a natural image, which contains a mixture of foliage and rocks. Figure 9 (bottom) shows that, although there is no dominant orientation in the stimulus, the complex response remains much less variable than the simple response.

It has been shown that a differential model of the complex cell can be constructed from the local jet representation. The differential model, which works naturally in a basis of steerable filters, can be viewed as a constrained version of the Hubel and Wiesel (1962) subunit model.
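The track-based slowness measures (43)-(44) can be sketched as follows. For brevity, this version uses nearest-neighbour sampling instead of bilinear interpolation, omits the γ normalization, and clamps tracks at the image border; these are all simplifying assumptions.

```python
import numpy as np

def slowness(resp, theta, m=100, n=99, delta=1.0, seed=0):
    """Mean response E, cf. (43), and mean quadratic variation Q, cf. (44),
    along m random straight tracks of n steps, in direction theta, over a
    2-D response map `resp`."""
    rng = np.random.default_rng(seed)
    h, w = resp.shape
    d = delta * np.array([np.cos(theta), np.sin(theta)])
    E = Q = 0.0
    for _ in range(m):
        p = rng.uniform([0.0, 0.0], [w - 1.0, h - 1.0])  # random start point p_j
        sign = rng.choice([-1.0, 1.0])                   # random track direction
        pts = p + sign * np.arange(n + 1)[:, None] * d
        xi = np.clip(np.rint(pts[:, 0]).astype(int), 0, w - 1)
        yi = np.clip(np.rint(pts[:, 1]).astype(int), 0, h - 1)
        s = resp[yi, xi]
        E += s.mean()                  # average response along the track
        Q += np.mean(np.diff(s) ** 2)  # average squared step-to-step change
    return E / m, Q / m
```

A perfectly shift-insensitive response gives Q = 0, which is the property that distinguishes the complex output from the simple one in figures 8 and 9.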
The qualitative components in the present approach are similar to those of the Gabor energy model. Both models are based on oriented linear filters, which are centred at the same position. Likewise, both models

Figure 8: Cosine response.
Figure 8: Cosine response.
Top left:
Average simple cell response E_S(i) to a vertical cosine grating of wavelength 8σ. The curve indicates the mean response in each of 36 orientation channels θ_i, and has a clear peak at zero. The vertical bars ±√(Q_S(i)) indicate the RMS spatial variation of the response in the preferred direction of each orientation channel.
Top right:
Complex cell response E_C(i) to the same image. The orientation tuning is preserved, but the variability ±√(Q_C(i)) of the response is greatly reduced. Bottom left, right:
As before, but with noise added to the image.

output a combination of the nonlinearly transformed filter responses. The derivative operators G_k(x, σ), in the differential model, are interpreted as simple cells, at a common location x (Hawken and Parker, 1987; Young et al., 2001). These are constructed from LGN inputs, according to the classical model (Hubel and Wiesel, 1962). The offset filters F(t_i, x) have two possible interpretations, shown in fig. 10, as follows.

Firstly, the offset filters F(t_i, x) can be interpreted as an intermediate layer of simple cells, each of which has an RF that is a linear combination of other simple cell RFs. Let the row-vector f_i^⊤ represent F(t_i, x), and let s be the signal vector. It follows that the linear response is

r_i = f_i^⊤ s,   (45)

where f_i^⊤ = p_i^⊤ G, according to the filter-design equation F = PG in (15). The task of the complex cell C is to compute the (absolute) maximum of the linear responses, r_i. This interpretation corresponds to the HMAX model (Lampl et al., 2004; Riesenhuber and Poggio, 1999), but with additional relationships imposed on the underlying simple cells. An advantage of this interpretation is that it predicts a majority of simple cells with few oscillations, as is observed (Ringach, 2002). This follows from the fact that the offset filters are of lower order than their derivatives, together with the fact that an unlimited number of offsets can be obtained from a given derivative basis. Furthermore, suppose that a number of complex cells (e.g. of different orientations) C_n are constructed from the same basis G. A new layer of low-order offset filters F_n is required for each complex cell, and so the high-order filters in G are soon outnumbered in the ensemble {G, F_1, F_2, . . .}.

An alternative physiological implementation is as follows. Suppose that the complex cell has i = 1, . . . , M basal dendrites, each of which branches out to the K simple cells G_k(x, σ). The linear response

Figure 9: Image response.
Figure 9: Image response.
Top:
As in figure 8, but using a real image that contains a dominant vertical orientation. Left: The simple response shows variation Q_S(i) across all orientation channels. Right: The variation Q_C(i) of the complex response is much lower. Bottom:
As before, but using a natural image,with no dominant orientation.can then be expressed as r i = p (cid:62) i ( Gs ) (46)where Gs , the Gaussian jet response, is computed first. This interpretation requires no intermediatesimple cells. Instead, it places a fixed weight P ik on each dendritic branch, and requires a summation tobe performed within each dendrite. This seems to be quite compatible with the observed cell morphology;a typical complex cell has a small number of basal dendrites which, unlike those of simple cells, areextensively branched (Gilbert and Wiesel, 1979; Kelly and van Essen, 1974).The dendritic interpretation is more economical in the number of simple cells required, but lesscompatible with the observed simple-cell statistics (Ringach, 2002). It should be noted that a mixture ofthe two interpretations in fig. 10 is quite possible. For example, there could be a few intermediate simplecells, with the remaining filters implemented by dendritic summation. In all cases, each complex cell isassociated with odd and even simple cells g k , as is observed (Pollen and Ronner, 1981).The maximum (10) over the | r i | can be approximated by a barycentric combination, (cid:80) i w i | r i | . Anappropriate vector of weights can be computed as w i = | r i | β (cid:14) ( α + (cid:80) i | r i | β ) . This is a form of ‘softmax’(Bridle, 1989), with parameters α and β , as described in (Hansard and Horaud, 2010). Several neuralmodels of this operation have been proposed (Riesenhuber and Poggio, 1999; Yu et al., 2002). Forexample, the w i can be interpreted as the outputs of a network of mutually inhibitory neurons, whichreceive copies of the subunit responses r i . There is experimental evidence for similar arrangements, withrespect to both simple and complex cells (Carandini and Heeger, 1994; Heeger, 1992a; Lampl et al.,2004). 18igure 10: Two neural implementations of the complex cell C . 
These are schematic representations, with reduced numbers of derivative filters g_k and offset filters f_i. Left: The offset filters are identified with an intermediate layer of simple cells. Right: The offset filters are implicit in the weighted sums p_1^⊤ G and p_2^⊤ G performed by the dendritic tree of C.

This section will demonstrate that the response of the new model, to broad-band stimuli, can easily be distinguished from that of the standard energy model. Some qualitative predictions will also be discussed. The response of the new model to sinusoidal stimuli is similar to that of the energy model. Both responses are phase-invariant, provided that the stimulus frequency is not too low (see fig. 7). Consider, however, a luminance step-edge that is flashed (or shifted) across the RF. The Gabor energy is approximately Gaussian, as a function of the edge-position, with a peak in the centre of the RF. The differential response is much flatter, as shown in fig. 6. This suggests that an empirical measure of kurtosis could be used to distinguish between the two responses, as will be shown below.

Let the odd-symmetric Gabor filter be defined as F_∥(x, ξ, τ) = −G(x, τ) sin(2πξx), so that it matches the polarity of G. The even filter, following (Lehky et al., 2005), is defined by the numerical Hilbert transform F_⊥ = H(F_∥), in order to avoid the nonzero DC component that arises in the cosine-based definition. The Gabor energy of a signal S is then R_F = (F_∥ ⋆ S)² + (F_⊥ ⋆ S)². The envelope width τ of each Gabor filter is determined from the constraint that the bandwidth be equal to 1.5, which is realistic for complex cells (Daugman, 1985). A self-similar family of Gabor pairs, parameterized by frequency ξ, can now be defined. Quasi-Newton optimization is used to determine a frequency ξ_0, for which F_∥(x, ξ_0, τ_0) ≈ G(x, 1) in the L² sense.
Ten Gabor pairs with frequencies ξ_k = ξ_0 − k∆ξ are constructed, where k = 0, . . . , 9. The corresponding differential model, with target filter G(x, ξ_0/ξ_k), is also constructed for each pair. Four possible ranges are considered for the differential models, by setting ρ/σ = 1.0, 1.25, 1.5, 1.75.

Let p_ξ(u) be the response distribution, which gives the firing-rate for an image-edge, as a function of its offset u from the centre u = 0 of the complex RF. If P_ξ(u) is the cumulative distribution ∫ p_ξ(u) du, then the (uncentred) kurtosis κ of any p_ξ(u) can be estimated (Crow and Siddiqui, 1967) by

K(p_ξ) = [P_ξ^{−1}(1 − a) − P_ξ^{−1}(a)] / [P_ξ^{−1}(1 − b) − P_ξ^{−1}(b)].   (47)

The values a = 0.025 and b = 0.25 are used here (making K the ratio of the 95% and 50% confidence intervals). This estimate, which can be computed by linear interpolation between the samples of P_ξ(u), has the following interpretation. Suppose that the offsets u_a and u_b are associated with firing-rates a and b respectively. The response distributions are symmetric, and so the statistic (47) is simply the distance ratio K(p_ξ) = u_a/u_b, as illustrated in fig. 11. The kurtosis could also be estimated from the empirical moments of p_ξ(u), but (47) is much less sensitive to noise in the tails of the distribution.

The distributions considered here lie in and around the range from the uniform distribution (κ = 1.8, K = 1.9) to the normal distribution (κ = 3, K ≈ 2.9). It can be seen from fig. 11 that the Gabor response is approximately Gaussian, whereas the differential responses are much flatter. Furthermore, note that the line ρ = σ in fig. 11 shows the maximum kurtosis of the differential model, determined by the bound (30); this is still much lower than that of the Gabor energy.
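The quantile-ratio statistic (47) can be sketched as follows; this version estimates the quantiles from samples, rather than by interpolating P_ξ(u), which is a simplifying assumption.

```python
import numpy as np

def quantile_ratio_kurtosis(samples, a=0.025, b=0.25):
    """Robust kurtosis statistic, cf. (47): the ratio of the 95%
    inter-quantile range to the 50% inter-quantile range."""
    qa_lo, qa_hi = np.quantile(samples, [a, 1.0 - a])
    qb_lo, qb_hi = np.quantile(samples, [b, 1.0 - b])
    return (qa_hi - qa_lo) / (qb_hi - qb_lo)
```

For the reference distributions quoted above, the statistic gives K = 1.9 for the uniform distribution and K ≈ 2.9 for the normal distribution, since the corresponding quantile ratios are 0.95/0.5 and 1.96/0.674.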
Figure 11: Top Left: Gaussian derivative (black) and Gabor (grey) filters, matched subject to the bandwidth condition. Bottom Left: The kurtosis statistic (47) is the ratio of the two horizontal lines, shown here on a Gaussian distribution. Right: Kurtosis of the edge-response. Each line represents a complex cell model, parameterized by preferred frequency ξ. The top pair is obtained from the Gabor energy and its square-root. The bottom pair delimits the range of possible differential models. The Gabor and differential responses are easily distinguished. The dashed lines are estimates for reference distributions.

Furthermore, it can be seen that the kurtosis is approximately independent of frequency, which simplifies the comparison. It should be noted that a much better fit between G and F_∥ can be obtained if the bandwidth constraint is relaxed. This, however, makes the energy responses even more kurtotic.

The differential model makes several predictions about the configuration of simple and complex cells. Firstly, like the energy model, it predicts that both odd and even filters are required by the complex cell. Unlike the energy model, it does not require an exact quadrature relationship (indeed, the G_k basis is not orthogonal). More generally, an important property of the differential model is its robustness to deviations from the ideal simple-cell RF profiles. The derivative-of-Gaussian basis was used, in the present derivation, for mathematical clarity. However, all that is required is a basis that spans the space of desired subunit filters F(t_i, x).

The differential model also predicts a relationship between the scale σ of the subunits and the radius ρ of the resulting complex receptive field. This prediction, as in the case of the energy model, is probably too strict (i.e.
larger complex receptive fields should be possible). However, as discussed in the following section, the complex receptive fields can be extended by allowing multiple scales σ_j in the basis set of the differential model.

A qualitative prediction of the present model is that high-order derivative filters are required, in order to approximate the target filter over a sufficient range ρ. In particular, it was shown in section 3.2.1 that, for a unimodal impulse response, ρ ≥ σ is required. This means, in practice, that derivative filters of order five and beyond must be used in the approximation, as can be seen from figure 4. This is interesting, because very oscillatory filters have been observed in V1 (Young and Lesperance, 2001). These have a natural role as high-frequency processors in the Gabor model. Their role is less clear in the geometric approach, because estimates of the high-order image derivatives are of limited use. The present work suggests that these filters could have a different role, in providing a basis for spatially offset filters of low order.

There are several directions in which this model could be developed. One straightforward extension is to allow filters of different scales (as well as different orders) in the basis set. Preliminary experiments confirm that this extends the range ρ of translation invariance, as would be expected. This means that the complex cell receptive field could be made larger, relative to those of the underlying simple cells. Another extension would be to allow a variety of offset-filter shapes (with odd, even and mixed symmetry), rather than just the first derivative used here. This would lead to better agreement with the physiological data, which indicates a variety of receptive field shapes among the complex subunits (Gaska et al., 1987; Sasaki and Ohzawa, 2007; Touryan et al., 2005).
It would also be interesting to explore the relationship of the present work to the normalization model (Heeger, 1992b; Rust et al., 2005), and to other models of motion and spatial processing (Georgeson et al., 2007; Johnston et al., 1992).

The present work has concentrated on local shift-invariance, because this is a defining characteristic of complex cells. However, mechanisms that have other geometric invariances can be constructed in the same scale space framework. For example, consider the effect of a geometric scaling (x, y) → (αx, αy) on the operator G_k(x, y, σ, θ), which represents the k-th derivative of the normalized 2-D Gaussian, in direction θ. The scaling α has no effect on the shape of the RF, as can be seen from the equation G_k(αx, αy, ασ, θ) = G_k(x, y, σ, θ)/α^k. This leads to simple relationships between the responses of the RF family G_k(x, y, σ_ℓ, θ), where ℓ = 1, . . . , L defines a range of scales (Koenderink and van Doorn, 1992; Lindeberg, 1998). Future work will consider more complicated geometric invariances (e.g. 2-D affine), in connection with the larger RFs that are found in extrastriate areas.

Another direction would be to consider how the differential model could be learned from natural image data, by analogy with (Berkes and Wiskott, 2005; Karklin and Lewicki, 2009; Wiskott and Sejnowski, 2002). This could be done by fixing the local jet filters (i.e. simple cells), and then optimizing the linear transformation P. The transformation could be parameterized by coefficients C, given a basis B of smooth functions (e.g. the polynomials that were used here). Alternatively, P could be optimized directly, subject to smoothness constraints on the columns P_k(t). The variability of the response C(u) would be a suitable objective function for the learning process, by analogy with slow-feature analysis models (Berkes and Wiskott, 2005; Wiskott and Sejnowski, 2002).
The combination of the geometricand statistical approaches to image analysis is, more generally, a very promising aim. Acknowledgements
The authors would like to thank the reviewers for their help with the manuscript.
References
Adelson, E. H. and Bergen, J. R. (1985). Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A, 2(2):284–299.
Alonso, J.-M. and Martinez, L. M. (1998). Functional connectivity between simple cells and complex cells in cat striate cortex. Nature Neuroscience, 1(5):395–403.
Atherton, T. J. (2002). Energy and phase orientation mechanisms: A computational model. Spatial Vision, 15(4):415–441.
Ben-Shahar, O. and Zucker, S. W. (2004). Geometrical Computations Explain Projection Patterns of Long Range Horizontal Connections in Visual Cortex. Neural Computation, 16(3):445–476.
Berkes, P. and Wiskott, L. (2005). Slow feature analysis yields a rich repertoire of complex cell properties. Journal of Vision, 5(6):579–602.
Bridle, J. S. (1989). Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Fougelman-Soulie, F. and Hérault, J., editors, Neurocomputing: Algorithms, Architectures and Applications. Springer Verlag.
Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):679–698.
Carandini, M. (2006). What simple and complex cells compute. J. Physiology, 577(2):463–466.
Carandini, M., Demb, J. B., Mante, V., Tolhurst, D. J., Dan, Y., Olshausen, B. A., Gallant, J. L., and Rust, N. (2005). Do we know what the early visual system does? J. Neuroscience, 25:10577–10597.
Carandini, M. and Heeger, D. (1994). Summation and Division by Neurons in Primate Visual Cortex. Science, 264:1333–1336.
Crow, E. and Siddiqui, M. (1967). Robust Estimation of Location. Journal of the American Statistical Association, 62(318):353–389.
Daugman, J. G. (1985). Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Optical Soc. America, 2(7):1160–1169.
Dayan, P. and Abbott, L. F. (2001). Theoretical Neuroscience. MIT Press.
De Valois, R. L., Albrecht, D. G., and Thorell, L. G. (1982). Spatial frequency selectivity of cells in macaque visual cortex. Vision Research, 21:545–559.
Dobbins, A., Zucker, S. W., and Cynader, M. S. (1987). Endstopped neurons in the visual cortex as a substrate for calculating curvature. Nature, 329:438–441.
Emerson, R. C., Bergen, J. R., and Adelson, E. H. (1992). Directionally Selective Complex Cells and the Computation of Motion Energy in Cat Visual Cortex. Vision Research, 32(2):203–218.
Felsberg, M. and Sommer, G. (2001). The Monogenic Signal. IEEE Transactions on Signal Processing, 49(12):3136–3144.
Fleet, D. J., Wagner, H., and Heeger, D. J. (1996). Neural encoding of binocular disparity: Energy models, position shifts and phase shifts. Vision Research, 36(12):1839–1857.
Freeman, W. T. and Adelson, E. H. (1991). The design and use of steerable filters. IEEE Trans. PAMI, 13(9):891–906.
Gaska, J. P., Pollen, D. A., and Cavanagh, P. (1987). Diversity of complex cell responses to even- and odd-symmetric luminance profiles in the visual cortex of the cat. Experimental Brain Research, 68:249–259.
Georgeson, M. A., May, K. A., Freeman, T. C. A., and Hesse, G. S. (2007). From filters to features: Scale-space analysis of edge and blur coding in human vision. Journal of Vision, 7(13):1–21.
Gilbert, C. and Wiesel, T. (1979). Morphology and Intracortical Projections of Functionally Characterised Neurones in the Cat Visual Cortex. Nature, 280(5718):120–125.
Hansard, M. and Horaud, R. (2010). Complex Cells and the Representation of Local Image-Structure. Technical Report 7485, INRIA.
Harris, C. and Stephens, M. (1988). A combined corner and edge detector. In Proc. 4th Alvey Vision Conference, pages 147–151.
Hawken, M. J. and Parker, A. J. (1987). Spatial properties of neurons in the monkey striate cortex. Proc. R. Soc. Lond. B, 231:251–288.
Heeger, D. J. (1992a). Half-Squaring in responses of cat striate cells. Visual Neuroscience, 9:427–443.
Heeger, D. J. (1992b). Normalization of Cell Responses in Cat Striate Cortex. Visual Neuroscience, 9:181–197.
Hubel, D. H. and Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J. Physiology, 160:106–154.
Johnston, A., McOwan, P. W., and Buxton, H. (1992). A computational model of the analysis of some first-order and second-order motion patterns by simple and complex cells. Proc. R. Soc. Lond. B, 250:297–306.
Jones, J. P. and Palmer, L. A. (1987). An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J. Neurophysiology, 58(6):1233–1258.
Karklin, Y. and Lewicki, M. S. (2009). Emergence of complex cell properties by learning to generalize in natural scenes. Nature, 457:83–86.
Kelly, J. and van Essen, D. (1974). Cell Structure and Function in the Visual Cortex of the Cat. J. Physiology, 238:515–547.
Kjaer, T. W., Gawne, T. J., Hertz, J. A., and Richmond, B. J. (1997). Insensitivity of V1 Complex Cell Responses to Small Shifts in the Retinal Image of Complex Patterns. J. Neurophysiology, 78:3187–3197.
Koenderink, J. and van Doorn, A. (1992). Generic Neighborhood Operators. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(6):597–605.
Koenderink, J. J. (1984). The structure of images. Biological Cybernetics, 50:363–370.
Koenderink, J. J. and van Doorn, A. J. (1987). Representation of local geometry in the visual system. Biological Cybernetics, 55:367–375.
Koenderink, J. J. and van Doorn, A. J. (1990). Receptive field families. Biological Cybernetics, 63:291–297.
Lampl, I., Ferster, D., Poggio, T., and Riesenhuber, M. (2004). Intracellular Measurements of Spatial Integration and the MAX Operation in Complex Cells of the Cat Primary Visual Cortex. J. Neurophysiology, 92:2704–2713.
Lehky, S. R., Sejnowski, T. J., and Desimone, R. (2005). Selectivity and sparseness in the responses of striate complex cells. Vision Research, 45:57–73.
Lindeberg, T. (1998). Edge Detection and Ridge Detection with Automatic Scale Selection. International Journal of Computer Vision, 30(2):117–154.
Martinez, L. M. and Alonso, J.-M. (2003). Complex Receptive Fields in Primary Visual Cortex. The Neuroscientist, 9(5):317–331.
Mechler, F., Reich, D. S., and Victor, J. D. (2002). Detection and Discrimination of Relative Spatial Phase by V1 Neurons. J. Neuroscience, 22(14):6129–6157.
Mechler, F. and Ringach, D. L. (2002). On the Classification of Simple and Complex Cells. Vision Research, 42:1017–1033.
Morrone, M. and Burr, D. (1988). Feature detection in human vision: A phase dependent energy model. Proc. R. Soc. Lond. B, 235:221–245.
Movshon, J. A., Thompson, I. D., and Tolhurst, D. J. (1978a). Receptive field organization of complex cells in the cat's striate cortex. J. Physiology, 283:79–99.
Movshon, J. A., Thompson, I. D., and Tolhurst, D. J. (1978b). Spatial summation in the receptive fields of simple cells in the cat's striate cortex. J. Physiology, 283:53–77.
Orban, G. A. (2008). Higher order visual processing in macaque extrastriate cortex. Physiological Reviews, 88:59–89.
Perona, P. (1995). Deformable kernels for early vision. IEEE Trans. PAMI, 17(5):488–499.
Petitot, J. (2003). The Neurogeometry of Pinwheels as a Sub-Riemannian Contact Structure. J. Physiol. Paris, 97:265–309.
Pollen, D. A. and Ronner, S. F. (1981). Phase relationships between adjacent simple cells in the visual cortex. Science, 212(4501):1409–1411.
Pollen, D. A. and Ronner, S. F. (1983). Visual cortical neurons as localized spatial frequency filters. IEEE Transactions on Systems, Man & Cybernetics, 13:907–916.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical Recipes in C. Cambridge University Press, 2nd edition.
Riesenhuber, M. and Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11):1019–1025.
Ringach, D. (2002). Spatial Structure and Symmetry of Simple-Cell Receptive Fields in Macaque Primary Visual Cortex. J. Neurophysiology, 88:455–463.
Rust, N., Schwartz, O., Movshon, J., and Simoncelli, E. (2005). Spatiotemporal Elements of Macaque V1 Receptive Fields. Neuron, 46:945–956.
Sasaki, K. S. and Ohzawa, I. (2007). Internal Spatial Organization of Receptive Fields of Complex Cells in the Early Visual Cortex. J. Neurophysiology, 98:1194–1212.
Simoncelli, E. P., Freeman, W. T., Adelson, E. H., and Heeger, D. J. (1992). Shiftable Multiscale Transforms. IEEE Transactions on Information Theory, 38(2):587–607.
Skottun, B. C., De Valois, R. L., Grosof, D. H., Movshon, J. A., Albrecht, D. G., and Bonds, A. B. (1991). Classifying Simple and Complex Cells on the basis of Response Modulation. Vision Research, 31(7/8):1079–1086.
Spitzer, H. and Hochstein, S. (1988). Complex cell receptive field models. Progress in Neurobiology, 31:285–309.
Touryan, J., Felsen, G., and Dan, Y. (2005). Spatial Structure of Complex Cell Receptive Fields Measured with Natural Images. Neuron, 45:781–791.
Wiskott, L. and Sejnowski, T. (2002). Slow feature analysis: unsupervised learning of invariances. Neural Computation, 14(4):715–770.
Wundrich, I. J., von der Malsburg, C., and Würtz, R. P. (2004). Image Representation by Complex Cell Responses. Neural Computation, 16(4):2563–2575.
Young, R. A. and Lesperance, R. M. (2001). The Gaussian derivative model for spatial-temporal vision: II. Cortical data. Spatial Vision, 14(3):321–389.
Young, R. A., Lesperance, R. M., and Meyer, W. W. (2001). The Gaussian derivative model for spatial-temporal vision: I. Cortical model. Spatial Vision, 14(3):261–319.
Yu, A., Giese, M., and Poggio, T. (2002). Biophysiologically Plausible Implementations of the Maximum Operation.