A Neural Network with Local Learning Rules for Minor Subspace Analysis
Yanis Bahroun†
†Center for Computational Neuroscience, Flatiron Institute, Simons Foundation, New York, NY, USA
Dmitri B. Chklovskii†,*
*Neuroscience Institute, NYU Langone Medical Center, New York, NY, USA
ABSTRACT
The development of neuromorphic hardware and the modeling of biological neural networks require algorithms with local learning rules. Artificial neural networks using local learning rules to perform principal subspace analysis (PSA) and clustering have recently been derived from principled objective functions. However, no biologically plausible networks exist for minor subspace analysis (MSA), a fundamental signal processing task. MSA extracts the lowest-variance subspace of the input signal covariance matrix. Here, we introduce a novel similarity matching objective for extracting the minor subspace, Minor Subspace Similarity Matching (MSSM). Moreover, we derive an adaptive MSSM algorithm that naturally maps onto a novel neural network with local learning rules, and we give numerical results showing that our method converges at a competitive rate.
Index Terms — artificial neural networks, minor subspace analysis, dimensionality reduction.
1. INTRODUCTION
One of the most straightforward tasks that a neural network (NN) can perform is to learn a low-dimensional space of stimulus features that captures the directions of lowest variance, forming the minor subspace (Fig. 1B). This task is known as minor subspace analysis (MSA), and some datasets are well characterized by their directions of least variation [1, 2, 3]. MSA has been used for tasks such as total least squares regression [4], direction-of-arrival estimation [5], and others [6, 7]. MSA is also integral to problems more closely related to neuroscience, such as invariance learning [8] and slow feature analysis [9].

While online MSA algorithms with associated NNs exist [8, 10, 11], they are often the result of straightforward adaptations of Oja's learning rule for principal subspace analysis (PSA). However, these NNs inherit the non-locality of the learning rules characteristic of Oja's NNs. Besides understanding and modeling brain functions, increased biological realism in artificial NNs can be useful for handling large datasets or streaming tasks [12], and for the development of neuromorphic hardware [13]. In particular, biologically plausible NNs operate online, i.e., sample-by-sample, avoiding storage of large datasets in memory. A necessary, yet not sufficient, condition for biological plausibility is that the learning rules be local, because synapses only have access to information about the neurons they connect.

Our two main contributions are the following. Firstly, to overcome the non-local nature of the rules used in existing NNs, we propose a similarity matching objective function for MSA. Secondly, we show that such an objective is optimized by an online algorithm that maps onto a neural network with local learning rules (Fig. 1C). Numerical experiments show that, despite using local learning rules, our neural network performs competitively with existing methods that do not respect such constraints.
2. BACKGROUND AND PROBLEM SETTING
Given T centered input data samples (x_t)_{t=1}^T ∈ R^n, which define the input matrix X = [x_1, ..., x_T] ∈ R^{n×T}, the MSA problem aims at finding a set of m orthonormal vectors, denoted W = [w_1, ..., w_m]^⊤ ∈ R^{m×n}, such that the projections of x_t onto these vectors, denoted y_t := W x_t ∈ R^m, have minimum variance. We also define the output matrix Y = [y_1, ..., y_T] ∈ R^{m×T}. The empirical covariance matrix of X, denoted C_x = (1/T) X X^⊤, is assumed to be full rank. One formulation of the MSA problem is thus
$$\min_{W \in \mathbb{R}^{m\times n},\; WW^\top = I_m} \; \frac{1}{2}\operatorname{Tr}\!\left[ W C_x W^\top \right]. \qquad (1)$$
Suppose the eigen-decomposition of C_x is C_x = V_x Λ_x V_x^⊤, where Λ_x = diag(λ_{x1}, ..., λ_{xn}), with λ_{x1} ≥ ... ≥ λ_{xn} > 0 the eigenvalues of C_x. It is well known that the optimal solution of problem (1) is the projection of the input dataset X onto its minor subspace. The minor subspace is spanned by the columns of V_x corresponding to the m smallest eigenvalues of C_x, denoted V^{MS}_m = [v_{x,n−m+1}, ..., v_{x,n}]. Standard singular value decomposition and other offline methods exist [14] to extract the m-dimensional minor subspace.
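For concreteness, the offline solution can be obtained directly from an eigendecomposition of the empirical covariance. The following NumPy sketch (our illustration, not code from the paper; dimensions and data are arbitrary) computes a basis of the minor subspace and the corresponding projections:

```python
import numpy as np

def minor_subspace_offline(X, m):
    """Offline MSA: orthonormal basis of the m-dimensional minor subspace of C_x.

    X is an (n, T) matrix of centered samples; C_x = X X^T / T is assumed full rank.
    """
    n, T = X.shape
    C_x = X @ X.T / T
    eigvals, eigvecs = np.linalg.eigh(C_x)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, :m]                    # eigenvectors of the m smallest eigenvalues

# Example usage with synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2000))
X -= X.mean(axis=1, keepdims=True)           # center the samples
V_ms = minor_subspace_offline(X, m=2)
Y = V_ms.T @ X                               # minimum-variance projections y_t = V_ms^T x_t
```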
Fig. 1: (A) Single-layer NN performing online PSA by similarity matching (4) [17]. (B) Example of a sorted spectrum of an input covariance matrix, with four minor components highlighted. (C) Our proposed NN with local learning rules for MSA, derived from MSSM (8).
Various NNs exist for solving MSA in the online setting. It is natural to identify the inputs x_t ∈ R^n with the activity of n upstream neurons at time t. In response, the NN outputs an activity vector y_t ∈ R^m, with m the number of output neurons. For each time t, y_t is obtained by multiplying x_t by the corresponding synaptic weights W [15]. Existing learning rules used to train NNs for MSA result from adaptations of Oja's original work for principal subspace analysis (PSA). Indeed, PSA is a variance maximization problem formulated as
$$\max_{W \in \mathbb{R}^{m\times n},\; WW^\top = I_m} \; \frac{1}{2}\operatorname{Tr}\!\left[ W C_x W^\top \right]. \qquad (2)$$
Oja first proposed [15] a stochastic gradient ascent algorithm for solving PSA (2), leading to the popular Oja's rule. Thus, it appears natural to implement a stochastic gradient descent, instead of ascent, of the same objective function to obtain an algorithm for MSA. Oja's algorithm for MSA is then
$$\Delta W \approx -\eta \left( y_t x_t^\top - y_t y_t^\top W \right), \qquad (3)$$
with η > 0 the learning rate. However, besides the fact that such an update rule leads to diverging weights [15], implementing it in a single-layer NN architecture requires non-local learning rules [16]. Indeed, the last term in (3) implies that updating the weight of a synapse requires knowledge of the output activities of all other neurons, which are not available to the synapse.

To better understand our approach for building an MSA NN with local learning, we recall the similarity matching (SM) approach for deriving single-layer PSA NNs with local learning rules [17]. If the similarity of a pair of vectors is quantified by their scalar product, SM leads to the following objective:
$$\min_{Y \in \mathbb{R}^{m\times T}} \; \frac{1}{T^2}\left\| X^\top X - Y^\top Y \right\|_F^2. \qquad (4)$$
Despite their different forms, both PSA (2) and SM (4) lead to the same embeddings [18, 19]. Since C_x and X^⊤X have the same n non-zero eigenvalues and related eigenvectors, SM also projects X onto the subspace spanned by the m leading eigenvectors of C_x.

Fig. 2: Example of spectra of the different similarity matrices considered. The eigenvalues are ordered according to the eigenvalue index of X^⊤X. (A) shows the spectrum of X^⊤X, (B) that of [σ I_T − X^⊤X], and (C) that of [σ X^⊤ C_x^{-1} X − X^⊤X] used in MSSM (8).

A key insight of [17, 20] was that the optimization problem (4) can be converted algebraically to an online-tractable form by introducing dynamical variables W and M:
$$\min_{Y \in \mathbb{R}^{m\times T}} \; \min_{W \in \mathbb{R}^{m\times n}} \; \max_{M \in \mathbb{R}^{m\times m}} \; \frac{1}{T}\operatorname{Tr}\!\left( -4\, X^\top W^\top Y + 2\, Y^\top M Y \right) + 2\operatorname{Tr}\!\left( W^\top W \right) - \operatorname{Tr}\!\left( M^\top M \right). \qquad (5)$$
They also proposed an online algorithm based on alternating optimization [17] with respect to y_t and (W, M) that can be implemented by a single-layer NN (Fig. 1A) as:
$$\frac{d y_t(\gamma)}{d\gamma} = W x_t - M y_t(\gamma), \qquad (6)$$
$$\Delta W := \eta \left( y_t x_t^\top - W \right), \qquad \Delta M := \eta \left( y_t y_t^\top - M \right). \qquad (7)$$
As before, the activity of the upstream neurons encodes the input variables x_t. The output variables y_t are computed by the dynamics of activity (6) in a single layer of neurons. They also suggested that the elements of the matrices W and M are represented by the weights of synapses in feedforward and lateral connections, respectively. Crucially, unlike in (3), the resulting learning rules (7) are local.
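For reference, the online PSA network of [17] defined by (6) and (7) can be sketched in a few lines of NumPy. This is only an illustration of the update structure; the step sizes, number of inner iterations, and initialization suggested below are our own choices, not values from [17].

```python
import numpy as np

def psa_sm_step(x, W, M, eta=1e-3, gamma=0.1, n_iters=100):
    """One online step of PSA similarity matching.

    x : (n,) input sample; W : (m, n) feedforward weights; M : (m, m) lateral weights.
    """
    y = np.zeros(W.shape[0])
    for _ in range(n_iters):                 # neural dynamics (6), run to approximate convergence
        y += gamma * (W @ x - M @ y)
    W = W + eta * (np.outer(y, x) - W)       # Hebbian feedforward update (7)
    M = M + eta * (np.outer(y, y) - M)       # anti-Hebbian lateral update (7)
    return y, W, M

# A common initialization: W with small random entries, M the identity (keeps M invertible).
```

Note that both updates use only the pre- and postsynaptic activities available at each synapse, which is what makes them local.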
3. A SIMILARITY MATCHING APPROACH TO MINOR SUBSPACE ANALYSIS
To overcome the non-locality of existing learning rules for MSA, we propose exploring a similarity matching approach. We present the first similarity matching objective function for MSA, referred to in the following as Minor Subspace Similarity Matching (MSSM). We also derive an online algorithm for optimizing it.
To develop our MSSM algorithm, we are looking for a similarity matrix with the following property: the eigenvectors associated with its largest eigenvalues must span the same subspace as the eigenvectors associated with the smallest non-zero eigenvalues of the original similarity matrix X^⊤X (Fig. 2A).

A similar problem arises for the covariance matrix C_x, for which at least two methods exist for turning the smallest eigenvalues into the largest. One considers the eigenvalues of C_x^{-1}; the other considers σ I_n − C_x, with σ > λ_{x1}. However, neither of these tricks works when the similarity matrix X^⊤X is considered. Indeed, X^⊤X is not invertible if T > n, as it has T − n zero eigenvalues. Also, shifting the spectrum of X^⊤X by considering σ I_T − X^⊤X makes the null eigenvalues of X^⊤X the largest eigenvalues of the resulting similarity matrix (Fig. 2B), whose leading eigenvectors therefore do not span the m-dimensional minor subspace of C_x.

Let us now consider the matrix σ X^⊤ C_x^{-1} X. This matrix is the scaled similarity matrix of the whitened input. It has n non-zero eigenvalues, all equal to σT, and the same eigenvectors as X^⊤X; this simply results from the fact that C_x^{-1/2} X has all its singular values equal to √T. Assuming that σ > λ_{x1}, the matrix σ X^⊤ C_x^{-1} X − X^⊤X has the spectrum T(σ − λ_{xn}) ≥ ... ≥ T(σ − λ_{x1}) > 0, with T − n null eigenvalues (Fig. 2C). This matrix is thus the perfect candidate for the MSSM objective.

Our MSSM objective for discovering a low-dimensional subspace spanning the m-dimensional minor subspace of C_x, with σ ≥ λ_{x1}, is thus
$$\min_{Y \in \mathbb{R}^{m\times T}} \; \frac{1}{T^2}\left\| \sigma X^\top C_x^{-1} X - X^\top X - Y^\top Y \right\|_F^2, \qquad (8)$$
which follows from the proposition below, proved in Appendix A.

Proposition 1.
Optimal solutions Y* ∈ R^{m×T} of MSSM (8) are projections of the dataset X onto the m-dimensional minor subspace of C_x, spanned by V^{MS}_m defined in Section 2.
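Proposition 1 can be checked numerically: for random data and σ larger than the top eigenvalue of C_x, the top-m eigenvectors of σ X^⊤ C_x^{-1} X − X^⊤ X should span the same subspace as the minor-subspace projections of X. A small sanity check (our own construction, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, m = 5, 200, 2
X = rng.standard_normal((n, T)) * np.arange(1.0, n + 1)[:, None]  # rows with unequal variances
X -= X.mean(axis=1, keepdims=True)                                 # center the samples

C_x = X @ X.T / T
sigma = 1.1 * np.linalg.eigvalsh(C_x).max()                        # sigma > largest eigenvalue of C_x

S = sigma * X.T @ np.linalg.inv(C_x) @ X - X.T @ X                 # MSSM similarity matrix in (8)
top = np.linalg.eigh(S)[1][:, -m:]                                 # its top-m eigenvectors

V_ms = np.linalg.eigh(C_x)[1][:, :m]                               # minor subspace of C_x
Q, _ = np.linalg.qr((V_ms.T @ X).T)                                # orthonormal basis of the projections

# The two m-dimensional subspaces of R^T should coincide (difference of projectors ~ 0)
print(np.linalg.norm(top @ top.T - Q @ Q.T))
```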
We now propose a tractable min-max formulation of MSSM, instrumental for deriving an algorithm that maps onto a NN with local learning rules. We start by expanding the squared Frobenius norm and discarding the terms independent of Y to obtain
$$\min_{Y \in \mathbb{R}^{m\times T}} \; -\frac{2}{T^2}\operatorname{Tr}\!\left( X^\top (\sigma I_n - C_x) C_x^{-1} X Y^\top Y \right) + \frac{1}{T^2}\operatorname{Tr}\!\left( Y^\top Y Y^\top Y \right). \qquad (9)$$
We then introduce dynamical matrix variables W and M in place of (σ/T) Y X^⊤ C_x^{-1} and (1/T) Y Y^⊤, respectively. Similar substitution tricks are detailed in [20]. We can now rewrite (9) as the following min-max optimization problem
$$\min_{Y \in \mathbb{R}^{m\times T}} \; \min_{W \in \mathbb{R}^{m\times n}} \; \max_{M \in \mathbb{R}^{m\times m}} \; L(W, M, Y) \qquad (10)$$
with
$$L(W, M, Y) := \frac{1}{T}\operatorname{Tr}\!\left( -4\, X^\top (\sigma I_n - C_x) W^\top Y + 2\, Y^\top M Y \right) + 2\operatorname{Tr}\!\left( W C_x W^\top \right) - \operatorname{Tr}\!\left( M^\top M \right).$$
In the offline setting, we can solve (10) by alternating optimization [21]. We first minimize with respect to Y while holding (W, M) fixed, which admits the closed-form solution
$$Y = M^{-1}\left( \sigma W X - W C_x X \right). \qquad (11)$$
Holding Y fixed, we then perform a gradient descent-ascent step with respect to (W, M):
$$W \leftarrow W + 2\eta \left( \frac{1}{T} Y X^\top (\sigma I_n - C_x) - W C_x \right); \qquad (12)$$
$$M \leftarrow M + \frac{\eta}{\tau} \left( \frac{1}{T} Y Y^\top - M \right). \qquad (13)$$
Here, η > 0 is the learning rate for W, and τ > 0 is the ratio of the learning rates of W and M. The stability of similar learning rules is investigated in [22, 23].

We now propose an online implementation of (8) by observing that (10) can be decomposed so that optimal outputs at different time steps can be computed independently, as
$$\min_{W \in \mathbb{R}^{m\times n}} \; \max_{M \in \mathbb{R}^{m\times m}} \; \frac{1}{T}\sum_{t=1}^{T}\left[ 2\operatorname{Tr}\!\left( W C_x W^\top \right) - \operatorname{Tr}\!\left( M^\top M \right) + \min_{y_t \in \mathbb{R}^m} l_t(W, M, y_t) \right] \qquad (14)$$
with
$$l_t(W, M, y_t) := -4\, z_t\, x_t^\top W^\top y_t + 2\, y_t^\top M y_t, \qquad (15)$$
where z_t = (σ − ‖x_t‖²). The approximation of C_x by x_t x_t^⊤ is essential in the online setting, as the true covariance matrix is not available and must be approximated at each t.

We can thus solve (14) sample-by-sample, i.e., online, by first minimizing (15) with respect to the output variables y_t. To do so, we run the following neural dynamics, obtained by gradient descent, until convergence, while keeping (W, M) fixed:
$$\frac{d y_t(\gamma)}{d\gamma} = z_t W x_t - M y_t(\gamma). \qquad (16)$$
After the convergence of y_t, we update (W, M) by gradient descent-ascent as
$$W \leftarrow W + 2\eta_t \left( z_t y_t - W x_t \right) x_t^\top, \qquad (17)$$
$$M \leftarrow M + \frac{\eta_t}{\tau} \left( y_t y_t^\top - M \right). \qquad (18)$$
Similarly to PSA SM, our algorithm can be implemented by a NN with feedforward connections W and lateral connections M (Fig. 1C). Here, however, the output and W update rules are gated by the global factor z_t. Global gating factors like z_t have been used outside of the similarity matching framework for PCA and ICA [24, 25].
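Putting (16)-(18) together, a single online MSSM step can be sketched as follows. This is a minimal NumPy illustration of the update structure; initialization, step sizes, and the fixed number of inner iterations are our own choices rather than values prescribed by the paper.

```python
import numpy as np

def mssm_online_step(x, W, M, sigma, eta=1e-3, tau=0.5, gamma=0.05, n_iters=200):
    """One sample of the online MSSM algorithm.

    x     : (n,) centered input sample
    W     : (m, n) feedforward weights; M : (m, m) lateral weights (kept positive definite)
    sigma : scalar exceeding the largest eigenvalue of the input covariance
    """
    z = sigma - x @ x                              # global gating factor z_t = sigma - ||x_t||^2
    y = np.zeros(W.shape[0])
    for _ in range(n_iters):                       # neural dynamics (16), run until convergence
        y += gamma * (z * (W @ x) - M @ y)
    W = W + 2 * eta * np.outer(z * y - W @ x, x)   # gated feedforward update (17)
    M = M + (eta / tau) * (np.outer(y, y) - M)     # anti-Hebbian lateral update (18)
    return y, W, M

# Typical usage: initialize W with small random entries and M with the identity,
# then call mssm_online_step once per incoming sample x_t.
```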
Fig. 3: Plot of the deviation of the MSSM, CAL, and DKA solutions from the m-MS, over 5 runs. Evaluation on the synthetic dataset with linear spectrum for (A) m = 1, (B) m = 2, and (C) m = 4. Evaluation on the synthetic dataset with Gaussian spectrum for (D) m = 1, and (E). Inset: linear and Gaussian spectra.
4. NUMERICAL EXPERIMENTS
As an illustration of the capability of the NN derived from MSSM, we provide experimental results comparing our algorithm against two popular algorithms proposed in [26] and [27], denoted CAL and DKA, respectively. The competing algorithms use the following update rules:
$$\text{CAL}: \quad \Delta W = -\eta\left( W W^\top y x^\top - y y^\top W \right); \qquad (19)$$
$$\text{DKA}: \quad \Delta W = -\eta\left( (W W^\top)^2 y x^\top - y y^\top W \right). \qquad (20)$$
We evaluate our algorithm on two artificially generated datasets X ∈ R^{n×T}: one with a linear spectrum (λ_k proportional to k; Fig. 3A, B, and C) and one with a randomly generated Gaussian spectrum (Fig. 3D and E), shown respectively in the insets of Fig. 3A and Fig. 3C.

The performance of the online algorithms is measured by the subspace alignment error. Given matrices (W, M), we define the learned filters F = M^{-1} W (σ I_n − C_x). The subspace alignment error is then defined as the relative squared Frobenius-norm difference between the true normalized projector V^{MS}_m (V^{MS⊤}_m V^{MS}_m)^{-1} V^{MS⊤}_m and the learned normalized projector F^⊤ (F F^⊤)^{-1} F.

In Fig. 3, we show that, after convergence, the subspace spanned by the synaptic connections learned by our online algorithm is the same as that spanned by the true basis vectors. In these experiments, our algorithm appears to converge faster than CAL and DKA. However, no MSA algorithm, including those used here, has known provable convergence rates.
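For completeness, the subspace alignment error used in Fig. 3 can be computed from the learned (W, M) as in the sketch below (our NumPy rendering of the definition above; variable names are ours):

```python
import numpy as np

def subspace_alignment_error(W, M, C_x, V_ms, sigma):
    """Relative squared Frobenius-norm difference between learned and true projectors."""
    n = C_x.shape[0]
    F = np.linalg.inv(M) @ W @ (sigma * np.eye(n) - C_x)     # learned filters F = M^{-1} W (sigma I - C_x)
    P_learned = F.T @ np.linalg.inv(F @ F.T) @ F             # projector onto the row space of F
    P_true = V_ms @ np.linalg.inv(V_ms.T @ V_ms) @ V_ms.T    # projector onto the true minor subspace
    return np.linalg.norm(P_learned - P_true) ** 2 / np.linalg.norm(P_true) ** 2
```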
5. DISCUSSION
In this work, we proposed a novel similarity matching objective function and showed that the online optimization of this objective leads to the extraction of the minor subspace of the input covariance matrix. The online algorithm we derived maps naturally onto a NN using only local learning rules. Generalizing our work to the learning of other minor-subspace-based tasks, such as slow feature analysis [9], will open a path towards principled, biologically plausible networks for invariance learning [8], complementing the work on transformation learning from [28].
A. PROOF OF PROPOSITION 1
Our result is an extension of the work of Mardia connecting PSA and similarity matching, which was used in [22], with proofs in [29]. The result states that similarity matching is optimized by the projections of the inputs onto the principal subspace of their covariance, i.e., by performing PSA [18, 29].
Proposition A.
For X ∈ R^{n×T} and fixed m (1 ≤ m ≤ n), amongst all projections of X onto m-dimensional subspaces of R^n, the objective (4) is minimized when X is projected onto its principal coordinates in m dimensions (Mardia et al. [29], Theorem 14.4.1).

Now, to show our result we need two intermediate results. Firstly, that X^⊤X and X^⊤ C_x^{-1} X are simultaneously diagonalizable in the basis formed by the eigenvectors of X^⊤X. Secondly, that the subspace associated with the m largest eigenvalues of σ X^⊤ C_x^{-1} X − X^⊤X is the same as the subspace spanned by the eigenvectors associated with the m smallest non-zero eigenvalues of X^⊤X. Finally, we apply Prop. A to the MSSM similarity objective.

Step 1:
By the spectral theorem, there exists U ∈ O_T(R), composed of the eigenvectors of X^⊤X, such that X^⊤X = U Λ U^⊤, with Λ a diagonal matrix of the eigenvalues of X^⊤X sorted in decreasing order. We can now show that X^⊤ C_x^{-1} X is diagonalizable in the basis formed by the columns of U. Indeed, for all i ∈ {1, ..., n}, let u_i be the i-th column of U; by definition we have X^⊤ X u_i = λ_i u_i. As a result,
$$X^\top C_x^{-1} X u_i = \frac{T}{\lambda_i} X^\top \left[ C_x^{-1} \frac{1}{T} X X^\top \right] X u_i = \frac{T}{\lambda_i} X^\top X u_i = T u_i, \qquad (S.1)$$
which proves that all eigenvectors of X^⊤X are eigenvectors of X^⊤ C_x^{-1} X. We can now rewrite the difference between the two similarity matrices in the basis U as
$$\sigma X^\top C_x^{-1} X - X^\top X = \tilde{U}\, \operatorname{diag}\!\left( \tilde{\sigma} - \lambda_n, \ldots, \tilde{\sigma} - \lambda_1, 0, \ldots, 0 \right) \tilde{U}^\top$$
with Ũ = [u_n, ..., u_1, u_{n+1}, ..., u_T] and σ̃ = σT. We can then use Prop. A on the new similarity matrix to prove Prop. 1.

REFERENCES

[1] C. K. Williams and F. V. Agakov, "Products of Gaussians and probabilistic minor component analysis," Neural Computation, vol. 14, no. 5, pp. 1169–1182, 2002.
[2] M. Welling, C. Williams, and F. V. Agakov, "Extreme components analysis," in Advances in Neural Information Processing Systems, 2004, pp. 137–144.
[3] Y. Weiss and W. T. Freeman, "What makes a good model of natural images?," IEEE, 2007, pp. 1–8.
[4] Y. Gao, X. Kong, C. Hu, H. Zhang, and L. Hou, "Convergence analysis of Möller algorithm for estimating minor component," Neural Processing Letters, vol. 42, no. 2, pp. 355–368, 2015.
[5] X. Kong, C. Hu, and C. Han, "A self-stabilizing MSA algorithm in high-dimension data stream," Neural Networks, vol. 23, no. 7, pp. 865–871, 2010.
[6] X. Kong, C. Hu, and C. Han, "A dual purpose principal and minor subspace gradient flow," IEEE Transactions on Signal Processing, vol. 60, no. 1, pp. 197–210, 2011.
[7] T. D. Nguyen and I. Yamada, "A unified convergence analysis of normalized PAST algorithms for estimating principal and minor components," Signal Processing, vol. 93, no. 1, pp. 176–184, 2013.
[8] N. N. Schraudolph and T. J. Sejnowski, "Competitive anti-Hebbian learning of invariants," in Advances in Neural Information Processing Systems, 1992, pp. 1017–1024.
[9] L. Wiskott and T. J. Sejnowski, "Slow feature analysis: Unsupervised learning of invariances," Neural Computation, vol. 14, no. 4, pp. 715–770, 2002.
[10] F.-L. Luo, R. Unbehauen, and A. Cichocki, "A minor component analysis algorithm," Neural Networks, vol. 10, no. 2, pp. 291–297, 1997.
[11] G. Cirrincione, M. Cirrincione, J. Hérault, and S. Van Huffel, "The MCA EXIN neuron for the minor component analysis," IEEE Transactions on Neural Networks, vol. 13, no. 1, pp. 160–187, 2002.
[12] A. Giovannucci, V. Minden, C. Pehlevan, and D. B. Chklovskii, "Efficient principal subspace projection of streaming data through fast similarity matching," IEEE, 2018, pp. 1015–1022.
[13] C. Pehlevan, "A spiking neural network with local learning rules derived from nonnegative similarity matching," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2019, pp. 7958–7962.
[14] G. Allaire and S. M. Kaber, Numerical Linear Algebra, vol. 55, Springer, 2008.
[15] E. Oja, "Principal components, minor components, and linear neural networks," Neural Networks, vol. 5, no. 6, pp. 927–935, 1992.
[16] C. Pehlevan and D. B. Chklovskii, "Neuroscience-inspired online unsupervised learning algorithms: Artificial neural networks," IEEE Signal Processing Magazine, vol. 36, no. 6, pp. 88–96, 2019.
[17] C. Pehlevan and D. Chklovskii, "A normative theory of adaptive dimensionality reduction in neural networks," in Advances in Neural Information Processing Systems, 2015, pp. 2269–2277.
[18] T. F. Cox and M. A. Cox, Multidimensional Scaling, Chapman and Hall/CRC, 2000.
[19] C. K. Williams, "On a connection between kernel PCA and metric multidimensional scaling," in Advances in Neural Information Processing Systems, 2001, pp. 675–681.
[20] C. Pehlevan, A. M. Sengupta, and D. B. Chklovskii, "Why do similarity matching objectives lead to Hebbian/anti-Hebbian networks?," Neural Computation, vol. 30, no. 1, pp. 84–124, 2018.
[21] B. A. Olshausen and D. J. Field, "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, pp. 607–609, 1996.
[22] C. Pehlevan, T. Hu, and D. B. Chklovskii, "A Hebbian/anti-Hebbian neural network for linear subspace learning: A derivation from multidimensional scaling of streaming data," Neural Computation, vol. 27, no. 7, pp. 1461–1495, 2015.
[23] D. Lipshutz, Y. Bahroun, S. Golkar, A. M. Sengupta, and D. B. Chklovskii, "A biologically plausible neural network for multi-channel canonical correlation analysis," arXiv preprint arXiv:2010.00525, 2020.
[24] T. Isomura and T. Toyoizumi, "Error-gated Hebbian rule: A local learning rule for principal and independent component analysis," Scientific Reports, vol. 8, no. 1, pp. 1–11, 2018.
[25] T. Isomura and T. Toyoizumi, "Multi-context blind source separation by error-gated Hebbian rule," Scientific Reports, vol. 9, no. 1, pp. 1–13, 2019.
[26] T. Chen, S. I. Amari, and Q. Lin, "A unified algorithm for principal and minor components extraction," Neural Networks, vol. 11, no. 3, pp. 385–390, 1998.
[27] S. C. Douglas, S.-Y. Kung, and S.-i. Amari, "A self-stabilized minor subspace rule," IEEE Signal Processing Letters, vol. 5, no. 12, pp. 328–330, 1998.
[28] Y. Bahroun, A. Sengupta, and D. B. Chklovskii, "A similarity-preserving network trained on transformed images recapitulates salient features of the fly motion detection circuit," in Advances in Neural Information Processing Systems, 2019, pp. 14178–14189.
[29] K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis, Academic Press, 1979.