A Historical Review of Forty Years of Research on CMAC
Frank Z. Xing
School of Computer Science and Engineering, Nanyang Technological University
[email protected]
Abstract — The Cerebellar Model Articulation Controller (CMAC) is an influential brain-inspired computing model in many relevant fields. Since its inception in the 1970s, the model has been intensively studied and many variants of the prototype, such as KCMAC, MCMAC, and LCMAC, have been proposed. This review article focuses on how the CMAC model was gradually developed and refined to meet the demand of fast, adaptive, and robust control. Two perspectives, CMAC as a neural network and CMAC as a table look-up technique, are presented. Three aspects of the model are discussed: the architecture, the learning algorithms, and the applications. In the end, some potential future research directions on this model are suggested.
I. INTRODUCTION

The Cerebellar Model Articulation Controller (CMAC) was proposed by J. S. Albus in 1975 [2]. At that point in history, the concept of the perceptron [23] had already become popular, whereas effective learning schemes for tuning perceptrons [24] were not yet on the stage. In 1969, Minsky and Papert had also pointed out, in their book
Perceptrons: An Introduction to Computational Geometry, the limitation that the exclusive disjunction logic cannot be solved by the perceptron model. These facts made it less promising to consider CMAC in its neural network form. Consequently, although the name of CMAC appears bio-inspired enough, and the theory that the cerebellum is analogous to a perceptron had been proposed earlier [1], CMAC was emphasized as a table look-up technique that can adapt to real-time control systems. Nevertheless, the underlying bioscience mechanism was addressed again in 1979 by Albus [3], which leaves the CMAC model with two different ways of interpretation.

The structure of CMAC was originally described as two inter-layer mappings, illustrated in Fig. 1. The control functions are represented in a weighted look-up table, rather than by the solution of analytic equations or by analog computation [2]. If we use S to denote the sensory input vectors, and let A and P stand for the association cells and the response output vectors respectively, both the CMAC and the multilayer perceptron (MLP) model can be formalized as:

$$ \begin{cases} f: S \rightarrow A \\ g: A \rightarrow P \end{cases} $$

The final output can be calculated as:

$$ y = \sum_i A^*_i w_i $$

where the asterisk denotes a link or activation between a certain association cell and the output.

Fig. 1. The primary CMAC structure
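To make the two mappings concrete, the following is a minimal Python sketch of this forward pass for a one-dimensional input. The class name, the table size, and the simple neighboring-window activation are illustrative assumptions, not taken from any cited implementation.

```python
import numpy as np

class SimpleCMAC:
    """Minimal CMAC forward pass: f maps an input onto a small set of
    neighboring association cells; g sums the attached weights."""

    def __init__(self, n_cells=64, n_active=4):
        self.w = np.zeros(n_cells)   # weights of the association cells
        self.n_active = n_active     # cells activated per input

    def active_cells(self, s):
        # mapping f: S -> A, a window of n_active neighboring cells
        start = int(s) % (len(self.w) - self.n_active)
        return np.arange(start, start + self.n_active)

    def output(self, s):
        # mapping g: A -> P, i.e. y = sum over the activated weights
        return float(self.w[self.active_cells(s)].sum())
```

Only the n_active cells around an input are linked to the output, which is exactly the restricted association discussed next.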
The nuance between the MLP and CMAC models is that, for the mapping f, the MLP is fully connected while CMAC restricts the association to a certain neighboring range. This property of the mapping significantly accelerates the learning process of CMAC, which is considered a main advantage over other neural network models.

It is notable that CMAC may not represent accurately how the human cerebellum works, even at the most simplified level. For instance, recent biological evidence from eyelid conditioning experiments suggests that the cerebellum is capable of computing exclusive disjunction [34]. However, CMAC is still an important computational model, because the restriction on the mapping function effectively decreases the chance of being trapped in a local minimum during the learning process.

Despite the advantages mentioned above, CMAC has the following disadvantages as well [25]:
• many more weight parameters are needed compared to a normal MLP model
• the local generalization mechanism may cause some training data to be incorrectly interpolated, especially when the data is sparse relative to the size of the CMAC
• CMAC is a discrete model, so analytical derivatives do not exist

As a response to these problems, modified or high-order CMACs, storage space compression, and fast convergent learning algorithms have been continuously studied. These discoveries will be elaborated in the following sections. Recent advances in Big Data and computing power seem to have watered down these problems, but in many physical scenarios, computing power is still restricted and high-speed responses are required. This is the reason to further study CMAC-like models, though they have been in and out of fashion several times.

Although there have been a few other pioneering reviews on CMAC, for instance by Mohajeri et al. in 2009 [20], this article is distinctive for its chronological perspective, rather than an emphasis on detailed techniques. The remainder of the article is organized as follows: Section II provides the evolution trajectory of CMAC structures and efficient storage techniques. Section III discusses the learning algorithms. Section IV presents various circumstances in which the CMAC model has been applied. Section V summarizes the paper and instigates discussions about potential improvements that can be made to the CMAC model.

II. ARCHITECTURE

A. Basic Architectures
Before CMAC was proposed in the 1970s, the anatomy and physiology of the cerebellum had been studied for a long time. It is widely agreed that many different types of nerve cells are involved in cerebellar functioning. Fig. 2 shows the computational model proposed by Mauk and Donegan [18]. A simpler and more implementable model proposed by Albus [3] is exhibited in Fig. 3.
Fig. 2. Mauk’s model assumes complex interactions between different types of cells: an open arrow stands for plastic synapses; a filled circle arrow stands for inhibitory synapses; a hollow square arrow stands for excitatory synapses.
Based on the computational model of the cerebellum (Fig. 3), the primary CMAC structure shown in Fig. 1 was conceived. It is obvious that high-dimensional proximity of association rules cannot be captured in this primary form, because the nodes are arranged in a one-dimensional array. A simple solution to this problem is to introduce some nonlinearity to the first mapping. As a result, another layer called “conceptual memory” was soon added to the CMAC structure, which introduces one additional mapping to the primary structure. The function of the conceptual memory is illustrated in Fig. 4.
Fig. 3. Albus’ model [1] of the cerebellum resembles a feedforward perceptron. It assumes climbing fiber activity would cause parallel fiber synapses to be weakened. S stands for stellate b cells, B stands for basket cells.
If we use A to represent the actual memory (association cells) and M the conceptual memory used to encode S, then the conceptual mapping f is sparser and constrained within a certain range, while the mapping g could be random:

$$ \begin{cases} f: S \rightarrow M \\ g: M \rightarrow A \\ h: A \rightarrow P \end{cases} $$

When it comes to implementation, the connectionist perspective that recognizes CMAC as a neural network and the table look-up perspective are equivalent. Fig. 5 illustrates the difference at a conceptual level [25]. The upper part shows a two-input, one-output neural network structure; the lower part shows a two-input, one-output table look-up structure.

An intuitive observation from both Fig. 4 and Fig. 5 is that the number of weights increases exponentially with the number of input variables. This problem brings out two challenges: 1) the storage of weights becomes space-consuming; 2) the training process becomes difficult to converge and the waiting time before termination lengthens.
Fig. 4. CMAC with conceptual memory

Fig. 5. A conceptual-level comparison of the two different perspectives
A previously used technique to address the first challenge is called tile coding, which was later developed into an adaptive version as well [36]. The advantage of tile coding is that the number of features can be strictly controlled through tile splits. Another commonly employed trick is called hashing. This technique was applied to CMAC in the 1990s; it maps a large virtual space A to a smaller physical address space A′. Many common hashing functions f_h can be used, for instance from MD5 or DES, which are fundamental methods in cryptography. However, how to reduce collisions for specific problems according to the data properties is still considered an art.

Literature [25] introduced a hardware implementation which uses selected binary bits of the indexes as the hashing code, whereas other research, e.g. [35], claims that due to the diminishing learning rate and slower convergence, hashing is not effective for enhancing the approximation capability of CMAC. Therefore, many other attempts have been made, such as neighbor sequential schemes, CMAC with general basis functions, and adaptive coding [7], which is an idea similar to hashing in the sense of weight space compression.
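As a concrete illustration of address compression, the sketch below hashes a virtual association-cell index into a smaller physical table. The CRC32-modulo choice is an arbitrary stand-in for a generic f_h, not the bit-selection scheme of [25]:

```python
import zlib

PHYSICAL_SIZE = 4096  # size of the physical weight table A'

def physical_address(virtual_index: int) -> int:
    # f_h: A -> A'. Any roughly uniform hash works; CRC32 is used
    # here purely for illustration in place of MD5/DES.
    key = virtual_index.to_bytes(8, "little")
    return zlib.crc32(key) % PHYSICAL_SIZE
```

A collision makes two distant virtual cells share one physical weight, which is precisely the effect that [35] blames for the slower convergence.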
B. Modified Architectures

Since simply increasing the CMAC size gives diminishing returns, two directions of modification have been undertaken to push forward the research on CMAC. The first is to combine multiple low-dimensional CMACs. The second is to introduce other properties, for example spline neurons, fuzzy coding, or a cooperative PID controller.

The cascade CMAC architecture was first proposed in 1997 for the purpose of printer calibration [10] (Fig. 6). Input variables are sequentially added to keep each CMAC component two-dimensional.
Fig. 6. Cascade CMAC architecture
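The composition can be sketched as follows, where cmac_stages is a hypothetical list of trained two-input components, since [10] does not prescribe a programming interface:

```python
def cascade_output(cmac_stages, inputs):
    """Chain two-input CMAC components over a longer input vector."""
    # the first stage consumes the first two input variables
    y = cmac_stages[0](inputs[0], inputs[1])
    # each later stage pairs the previous output with one new input
    # variable, so every component stays two-dimensional
    for stage, x in zip(cmac_stages[1:], inputs[2:]):
        y = stage(y, x)
    return y
```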
Another method to combine multiple CMACs can be realized by a voting technique. If we regard the cascade CMAC model as a fusion of input information at the feature level, then voting CMACs can be reckoned as a fusion at the decision level. Each small CMAC accepts only a subset of the whole input space. In this case an important antecedent is that the input data is well partitioned. The reason for this requirement is that a voting lift can only be achieved by heterogeneous expert networks. Taking this into account, some prior knowledge of the input data or unsupervised clustering techniques can be applied at this stage.

If we make further efforts at dimension reduction, multiple levels of voting can be used. The architecture can then be reckoned as a Hierarchical CMAC (H-CMAC), which was described by Tham [32] in 1996. H-CMAC has several advantages, such as less storage space and fast adaptation for learning non-linear functions. A two-level H-CMAC is illustrated in Fig. 7. It is noticeable that the conventional CMAC components in different layers play different roles; the gating network works at a higher level.
Fig. 7. Hierarchical CMAC architecture, adapted from [32]

These architectures can be employed at the same time as more fundamental modifications along the second direction. In 1992, Lane et al. [11] described high-order CMAC as a CMAC with the binary neuron outputs replaced by spline functions. This modification brings about more spline parameters, but makes the output differentiable, which sometimes gives better performance because the learning phase goes deeper. Sharing the idea of allowing a more meticulous transfer function, a similar modification can be made by introducing linguistic rules and fuzzy logic.

The Linguistic CMAC (LCMAC) was proposed by He and Lawry in 2009, based on label semantics rather than mapping functions. A cascade of LCMAC series was further developed in 2015 [8]. Borrowing the terminology of “focal element” from evidence theory, the properties used to activate it are represented as membership functions (usually trapezoidal) of several attributes. Therefore, for each input tuple, the excited neurons form a hypercube in the matrix of all the weights serving as memory. The responsive output can be distributionally depicted as:

$$ P(a \mid x) = \prod_{d=1}^{N} m_{x_d}(F_{d_i}) $$

where P is the probability of some memory unit a being activated given the input vector x, F denotes a focal element, d_i is the index of the linguistic attributes, and m is the hidden weight for F called the “mass assignment”.

Fuzzy CMAC (FCMAC) is yet another form of fuzzy coding. The intuition for using a fuzzy representation is similar to that for using spline functions. For most well-defined problems, the nature of CMAC approximation is to use multiple steps to emulate a smooth surface. Proper selection of the fuzzy membership functions obviously relieves the pressure of weight storage and training. From my understanding, FCMAC is an inverse structure of many established Neuro-Fuzzy Inference Systems. Usually, two extra fuzzy/defuzzy layers are added next to the association layer; the consequents can be of Mamdani type, TSK type, weights, or a hybrid of them, e.g. the Parametric-FCMAC (PFCMAC) [21]. More advanced FCMAC models, perhaps inspired by spline methods, use interpolation to solve the discrete inference problem. In 2015, Zhou and Quek proposed FIE-FCMAC [39], which adds fuzzy interpolation and extrapolation to the rigid CMAC structure.

Recently, CMAC applications to more specific scenarios have been studied. For example, for the control of time-varying nonlinear systems, a combination of the Radial Basis Function network and the local learning feature of CMAC has been proposed (RBFCMAC). It is reported that using RBFs can prevent parameter drift and accelerate the synchronization speed to the changing system [17]. For this type of combination, besides RBF, the Wavelet Neural Network (WNN), fuzzy rule neurons, and recurrent mechanisms [15], or a mingling of them, can also be employed with the CMAC model simultaneously. Previous works, such as [38], have provided evidence that these features are effective for modeling complicated dynamic systems.

In a broader scenario, CMAC can be applied together with other control systems as well (Fig. 8). Traditionally, the role of CMAC at the primary stage is to assist the output of a main controller. As the training proceeds, the error between the CMAC and the actual model decays, and the CMAC takes charge from the main controller. Back-forward signal acceptors or conjugated CMACs are often used to accelerate this process [13]. More precisely, this arrangement is a change of information flow but not a change of architecture.

Fig. 8. A combined control system of CMAC and PID
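The handover can be sketched as below; here pid and cmac are hypothetical callables standing in for the two blocks of Fig. 8, and training the CMAC toward the total command is one common arrangement rather than necessarily the exact scheme of [13]:

```python
def control_step(cmac, pid, state, reference):
    u_pid = pid(reference - state)   # conventional feedback term
    u_cmac = cmac.output(state)      # learned feedforward term
    u = u_pid + u_cmac
    # Train the CMAC toward the total command: as its approximation
    # improves, u_pid shrinks and the CMAC gradually takes charge.
    cmac.train(state, target=u)
    return u
```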
Similarly, CMAC structures that modify the storage optimization methods, for example quantization [16] and multi-resolution [19], only result in architectural differences in a hardware implementation sense. According to my personal understanding, they share the same conceptual structure, though these techniques are of sufficient interest to be discussed in Section III.

III. LEARNING ALGORITHM

The original form of learning proposed by Albus is based on backward propagation of errors. The fast convergence of this algorithm was proved mathematically by subsequent research; specifically, the convergence rate is governed only by the size of the receptive fields of the association cells [37]. Some studies [25] suggest that it is useful to distinguish between target training and error training, although they share the same mathematical form. The weight updating rule is:

$$ w_i(k+1) = w_i(k) + \alpha \, \frac{\mathrm{Error}}{\mathrm{count}(A^*)} \, x_i $$

where k is the epoch index, α ∈ [0, 1] is the learning rate, and x_i is a state indicator of activation. The learning process is theoretically faster with a larger α, but overshooting may occur. If the difference between the output and the desired value, Δy = y − ŷ, well defines the error, target training and error training are equivalent. In certain cases, Error = c(y − ŷ) is a popular cost function as well.

Based on the original learning algorithm, many developments have been derived. Most of them can be categorized into two directions of improvement. The first relies on extra supervisory signals or value assignment mechanisms based on statistics. The second endeavors to optimize the use of memory. In a more practical sense, the dichotomy can be understood as learning to adjust the value of weights, and learning to adjust the number or size of weights.

A. Adjusting value of weights

Facing the trade-off between speed of convergence and learning stability, it is intuitive to consider using a relatively large α at the beginning, and to slow down the weight adjustment near the optimum point. This improvement is called an adaptive learning rate [14]. It can be achieved by using two CMAC components, one as the main controller and another as a supervisory or compensating controller. Another way to achieve this adaptation is to impose a resistant term on the weight updating rule:

$$ w_i(k+1) = w_i(k) + \alpha \, c^{\bar{\alpha}_i} \, \frac{\mathrm{Error}}{\mathrm{count}(A^*)} \, x_i $$

In the above rule, α and c are constants, and ᾱ_i denotes the average activation times of the memory units that have been activated by training sample i.

For both fixed and adaptive learning rates, weight adjustment starts from a randomized set of parameters. Experiments suggest that for sparse data, even if the training samples are within the local generalization range, perfect linear interpolation may not be achieved [25]. As a result, the approximation may appear to have many small zigzag patterns. Therefore, the weight smoothing technique was proposed. After each iteration, the weights are globally adjusted according to:

$$ w_i(A^*_j) = (1-\delta) \, w_i(A^*_j) + \frac{\delta}{\mathrm{count}(A^*)} \sum_{k=1}^{\mathrm{count}(A^*)} w_i(A^*_k) $$

where δ is the proportional coefficient measuring the share of the weight with index j that is replaced by the average of all activated weights A*.
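Putting the basic update rule and the smoothing step together, a minimal sketch (reusing the hypothetical SimpleCMAC object from Section I, with delta as the smoothing share) could be:

```python
def train_step(cmac, s, y_target, alpha=0.1, delta=0.05):
    """One error-training update followed by weight smoothing."""
    idx = cmac.active_cells(s)
    error = y_target - cmac.w[idx].sum()
    # distribute the error equally over the count(A*) activated cells
    cmac.w[idx] += alpha * error / len(idx)
    # smoothing: pull each activated weight toward their common mean
    cmac.w[idx] = (1 - delta) * cmac.w[idx] + delta * cmac.w[idx].mean()
```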
While the weight smoothing technique tries to affect as much memory as possible in one iteration, repeated activation of the same units may not be a good thing. In 2003, research [26] pointed out that assigning the error equally to each weight is not a meticulous method. During the learning process, if an associative memory unit has been activated many times, which means many relevant samples have already been learned, its weight should be closer to the desired value. In other words, the weight value is more “credible”. In this situation, less error should be assigned to it so that other memory units can learn faster. This rule is called learning based on credit assignment:

$$ w_i(k+1) = w_i(k) + \alpha \, \frac{f(i)^{-1}}{\sum_{j=1}^{g} f(j)^{-1}} \left( y - \sum_{j=1}^{g} w_j(k) \right) x_i $$

where g is a parameter regarding the degree of local generalization, and f(i) records the number of times memory unit i has been activated. Further research has proved that convergence is guaranteed when the learning rate α is below a certain bound.

Another family of learning rules mimics the Self-Organizing Feature Map (SOFM): given an input x = {x_1, x_2, ..., x_n}, the output can directly be the weight of the winning neuron. The weight updating rule can be formalized according to Hebbian theory:

$$ w_{\mathbf{x}}(k+1) = w_{\mathbf{x}}(k) + \alpha \, (y(k) - w_{\mathbf{x}}(k)) $$

Note that the index x can be time dependent. A modified version of the aforementioned rule involves not only inputs but also errors as feedback:

$$ w_{\mathbf{x},\mathbf{e}}(k+1) = w_{\mathbf{x},\mathbf{e}}(k) + \alpha \, (y(k) - w_{\mathbf{x},\mathbf{e}}(k)) $$

Using this learning mechanism, the SOFM-like CMAC is named MCMAC. In 2000, Ang and Quek [5] proposed learning with momentum, neighborhood, and averaged fuzzy output for both CMAC and MCMAC. For CMAC and MCMAC with momentum, the weight updating rules can be written as:

$$ \begin{cases} \Delta w_j(k+1) = \alpha \, \Delta w_j(k) + \eta \, \delta_j(k) \, y_j(k) \\ \Delta w_{\mathbf{x},\mathbf{e}}(k+1) = \alpha \, \Delta w_{\mathbf{x},\mathbf{e}}(k) + \lambda (1-\alpha) \, [y(k) - w_{\mathbf{x},\mathbf{e}}(k)] \end{cases} $$

where the first term represents momentum and the second term is a back propagation term with learning rate η and local gradient δ_j; j is the index of the activated weights.

Therefore, given a sequential learning process, the aggregated weight adjustment can be derived from the above rules:

$$ \Delta w_j(k+1) = \eta \sum_{i=1}^{k} \alpha^{k-i} \, \delta_j(i) \, y_j(i) = -\eta \sum_{i=1}^{k} \alpha^{k-i} \, \frac{\partial \mathrm{Error}}{\partial w_j(i)} $$

When the sign of ∂Error/∂w_j stays unchanged, Δw_j is accelerating; when the sign reverses, Δw_j slows down to stabilize the learning process.

The neighborhood learning rule proposed by Ang and Quek [5] serves the same purpose as the weight smoothing technique. However, they used a Gaussian function to put more attention on the neurons surrounding the winning neuron, instead of evenly adjusting each weight regardless of distance. The weight updating rule for MCMAC with momentum and neighborhood, in singular form, is:

$$ \Delta w_i(k+1) = \alpha \, \Delta w_i(k) + h_{ij} \, [\lambda (1-\alpha)(y(k) - w_i(k))] $$

where

$$ h_{ij} = \exp\left( -\frac{|r_j - r_i|^2}{\sigma^2} \right) $$

is the distance metric between neuron j and the winning neuron i.

Besides using additional terms such as momentum and neighborhood, kernel methods can also be applied to CMAC learning. In 2007, the Kernel CMAC (KCMAC) was proposed [9], [11] to reduce the CMAC dimension, which usually hazards the training speed and convergence. KCMAC treats the association memory as the feature space of a kernel machine, so the weights can be determined by solving an optimization problem. The supervised learning problem, using the errors e as slack variables, is:

$$ \min_{\mathbf{w}, \mathbf{e}} \ \frac{1}{2} \mathbf{w}^T \mathbf{w} + \beta \sum_{i=1}^{n} e_i^2 \quad \mathrm{s.t.} \quad y_i = \mathbf{w}^T \phi(u_i) + e_i, \ i = 1, 2, ..., n $$

where w denotes the weights, the coefficient β serves as a penalty parameter, φ(·) is the mapping function, and K(u, u′) = φ(u) · φ(u′) is the kernel function. The standard procedure for this problem is to solve for the maxima-minima of the Lagrangian function.
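For intuition, the dual of such a problem reduces to a regularized linear solve in the kernel matrix. The sketch below is an LS-SVM-style reading of this step, not necessarily the exact formulation of [9]:

```python
import numpy as np

def kcmac_fit(K, y, beta=1.0):
    """Solve (K + I / beta) a = y for the dual coefficients a, so that
    the prediction at a new input u is sum_i a_i * K(u, u_i)."""
    n = len(y)
    return np.linalg.solve(K + np.eye(n) / beta, y)
```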
Other learning methods can be employed as well, for instance Bayesian Ying-Yang (BYY) learning, as proposed in 2013 by Tian et al. [33]. The key idea behind BYY learning is to harmonize the joint probability written in two different product forms of Bayesian components. In this specific KCMAC case, the input u and the output z are observable, while w is a hidden variable. The joint distribution can be written in either the ying form or the yang form:

$$ \begin{cases} p_{ying}(\mathbf{u}, \mathbf{z}, \mathbf{w}) = p(\mathbf{w}) \, p(\mathbf{u} \mid \mathbf{w}) \, p(\mathbf{z} \mid \mathbf{u}, \mathbf{w}) \\ p_{yang}(\mathbf{u}, \mathbf{z}, \mathbf{w}) = p(\mathbf{u}) \, p(\mathbf{z} \mid \mathbf{u}) \, p(\mathbf{w} \mid \mathbf{u}, \mathbf{z}) \end{cases} $$

The goal is to maximize the harmony measure H(p_ying, p_yang):

$$ H(p_{ying}, p_{yang}) = \iiint p_{ying} \ln p_{yang} $$

In practice, ΔH is frequently calculated, depending on how the conditional probabilities are estimated, and the maximum of H is reached by heuristic search.
B. Adjusting number of weights

Adjusting the number of weights is chiefly realized by introducing multi-resolution and dynamic quantization techniques. As Section I explained, CMAC was first used for real-time control systems; consequently, its structural design fits hardware implementation well. Many CMAC variants, such as LCMAC and FCMAC, inherit the lattice-based division of memory units. Inputs are generally built on grids with equal spacing. This characteristic adds local constraints to the values of adjacent units.

If we consider the problem from a function approximation perspective, it is also rather intuitive that a locally complex shape needs a larger number of low-order elements to approximate it. Naturally, multi-resolution lattice techniques [19] were proposed in the 1990s. With prior knowledge, some metrics can be used to determine the resolution, for example the variation of the output within certain memory unit areas, which can be formalized as follows:

$$ \mathrm{resolution} \propto v = \frac{1}{N} \sum_{j=1}^{N} |y_j - \bar{y}| $$

where N is the number of output samples and v is the variation. Other attempts use a tree structure to manage the resolution hierarchically: a new high-resolution lattice is generated only when the variance threshold is exceeded.

The concept of quantization is almost identical to that of resolution. The nuance may be that the term quantization puts more emphasis on the discretization of a continuous signal. Furthermore, increasing the resolution will also cause the number of weights to grow, whereas quantization deals with a given memory capacity.

To the best of my knowledge, the idea of adaptive quantization was initially proposed in 2006 [16]. The algorithm used to interpolate points looks at the change of slopes, which is similar to the variation metric discussed above. The input space is initialized with a uniform quantization. For each point x_i, the slopes are calculated from neighboring points:

$$ \widehat{\mathrm{slope}}_j = \frac{f(x_j) - f(x_i)}{x_j - x_i}, \qquad j = 1, 2, \ldots $$

Here f stands for the mapping function between the input vector and the association cells. A change of sign along the corresponding directions indicates finer local structure, in other words fluctuation, so a new critical point is added to split the interval. The termination condition is that, within every interval, the difference of output values should not exceed an experimental constant C [28]. The adaptive quantization technique was later developed into the Pseudo Self-Evolving CMAC (PSECMAC) model [30], which further introduced a neighborhood activation mechanism.
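The slope test can be sketched as follows, assuming a scalar mapping f sampled at the current quantization points; the actual refinement procedure in [16], [28] is more elaborate:

```python
def refine_quantization(points, f, max_diff):
    """Insert a new critical point where neighboring slopes change
    sign (local fluctuation) and the output difference still exceeds
    the experimental constant max_diff."""
    refined = [points[0]]
    for left, mid, right in zip(points, points[1:], points[2:]):
        slope_l = (f(mid) - f(left)) / (mid - left)
        slope_r = (f(right) - f(mid)) / (right - mid)
        if slope_l * slope_r < 0 and abs(f(mid) - f(left)) > max_diff:
            refined.append((left + mid) / 2)  # split the interval
        refined.append(mid)
    refined.append(points[-1])
    return refined
```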
IV. APPLICATION

Though CMAC was first proposed for manipulator control, during the past decades it has been proved effective in robotic arm control, fractionator control, piezoelectric actuator control [22], mechanical systems, signal processing, intelligent driver warning systems, fuel injection, canal control automation, printer color reproduction, ATM cash demand [31], and many other fields [7]. This can be attributed to the fast learning and stable performance of CMAC models. Moreover, engineering-oriented software and toolkits also help to promote the application of CMAC. In the following parts, CMAC applications to two emerging engineering fields are elaborated.

A. Financial Engineering

The practical side of financial engineering has employed various instruments to model the price and risk of bonds, stocks, options, futures, and other derivatives. In most cases, historical data can be partitioned into chunks of a selected time span to enable a supervised learning process. In 2008, Teddy et al. [29] proposed an improved CMAC variant (PSECMAC) to model the price of a currency exchange American call option. Three variables are taken into consideration: the difference between the strike price and the current price, the time to maturity, and the pricing volatility. Thus the pricing function is formulated as:

$$ C = f(S - X, T, \sigma) $$

The article reported PSECMAC as the best-performing model among several CMAC-like systems. It also reported that most of the CMAC models produce better results than the Black-Scholes model in terms of RMSE, though for American options the Black-Scholes model is not a good benchmark, because it is sensitive to the details of the calculation.

This work further constructed an arbitrage system based on the pricing model. Positions are adjusted according to the Delta hedging ratio. Experiments suggested the model has a marginally positive ROI, omitting transaction costs.
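As a small illustration of this input encoding, the snippet below assembles the three features for some trained CMAC-like regressor; pricer and its output method are hypothetical placeholders:

```python
import numpy as np

def price_call(pricer, spot, strike, maturity, volatility):
    # feature vector (S - X, T, sigma) of the pricing function above
    features = np.array([spot - strike, maturity, volatility])
    return pricer.output(features)  # approximated call price C
```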
B. Adaptive Control
The application of CMAC in commercial devices may be even more advantageous. The pressure to reduce cost and the demand for embedded controllers make CMAC-like models a good choice. Recently, studies have been carried out on the adaptive control of wheelchairs for the disabled and seatless unicycles [12]. This type of problem can be formalized as controlling a set of variables within a certain range (e.g. speed and balance angle) under a set of unknown and varying variables (e.g. friction with the ground and the weight of the rider).

Li et al. [12] proposed a TSK-type FCMAC to synthesize the equations of adaptive steering control. The back-stepping error is associated with the torque τ, which simultaneously affects the critical moments:

$$ \begin{cases} \ddot{\theta} = \bar{A}(\theta, \dot{\theta}) + \bar{B}(\theta)\,(\tau - \mu \dot{\phi} - c \, \mathrm{sgn}(\dot{\phi})) \\ \ddot{\phi} = \bar{C}(\theta, \dot{\theta}) + \bar{D}(\theta)\,(\tau - \mu \dot{\phi} - c \, \mathrm{sgn}(\dot{\phi})) \end{cases} $$

As the output adapts to the moment parameters, the balance angle is controlled near zero. The performance of the FCMAC is benchmarked against a linear-quadratic regulator (LQR) built on the differential equations describing the state of the motor. Simulations suggest that the LQR is not able to make the speed and balance angle converge, while the FCMAC provides satisfactory results.

V. DISCUSSIONS

As reviewed in Section IV, CMAC has been proved effective in many classic control problems and has been applied to emerging engineering problems. However, this model seems to have encountered a bottleneck, because of the lack of fundamental breakthroughs during the past decade. Nowadays, the issues discussed are mainly focused on incremental modifications to the memory structure and the learning algorithm. The framework of error propagation, or minimization of a loss function, is kept unchanged.

According to my understanding, the limitation of current CMAC models can be ascribed to their oversimplification of the cerebellum structure. Therefore, the next generation of cerebellar models may adopt new discoveries from neuroscience. For example, the associative memory cells may take different roles rather than being treated identically. In fact, anatomical models usually feature several types of elementary cerebellar processing units. Meanwhile, the theory of Spike Timing Dependent Plasticity (STDP) suggests that the learning process of firing neurons may be ordered [6]. This feature can introduce far more complexity to the current learning algorithm.
REFERENCES

[1] J. S. Albus, “A Theory of Cerebellar Function,” Mathematical Biosciences, pp. 25-61, 1971.
[2] J. S. Albus, “A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC),” Trans. ASME, Series G, Journal of Dynamic Systems, Measurement and Control, 97, pp. 220-233, 1975.
[3] J. S. Albus, “Mechanisms of Planning and Problem Solving in the Brain,” Mathematical Biosciences, 45, pp. 247-293, 1979.
[4] P. C. E. An, W. T. Miller, and P. C. Parks, “Design Improvements in Associative Memories for Cerebellar Model Articulation Controllers,” in Proceedings of ICANN, pp. 1207-1210, 1991.
[5] K. K. Ang and C. Quek, “Improved MCMAC with momentum, neighborhood, and averaged trapezoidal output,” IEEE Transactions on Systems, Man and Cybernetics: Part B, 30(3), pp. 491-500, 2000.
[6] N. Caporale and Y. Dan, “Spike Timing-Dependent Plasticity: A Hebbian Learning Rule,” Annual Review of Neuroscience, 31(1), pp. 25-46, 2008.
[7] P. Duan and H. Shao, “CMAC Neural Network based Neural Computation and Neuro-control,” Information and Control (in Chinese), 28(3), 1999.
[8] H. He, Z. Zhu, A. Tiwari, and A. Mills, “A Cascade of Linguistic CMAC Neural Networks for Decision Making,” in International Joint Conference on Neural Networks (IJCNN), 2015.
[9] G. Horváth and T. Szabó, “Kernel CMAC with improved capability,” IEEE Transactions on Systems, Man and Cybernetics: Part B, 37(1), pp. 124-138, 2007.
[10] K.-L. Huang, S.-C. Hsieh, and H.-C. Fu, “Cascade-CMAC neural network applications on the color scanner to printer calibration,” in International Conference on Neural Networks, 1997.
[11] S. H. Lane, D. A. Handelman, and J. J. Gelfand, “Theory and development of higher-order CMAC neural networks,” IEEE Control Systems, April, pp. 23-30, 1992.
[12] Y.-Y. Li, C.-C. Tsai, F.-C. Tai, and H.-S. Yap, “Adaptive steering control using fuzzy CMAC for electric seatless unicycles,” in IEEE International Conference on Control & Automation, pp. 556-561, 2014.
[13] C. C. Lin and F. C. Chen, “On a new CMAC control scheme, and its comparisons with the PID controllers,” in Proceedings of the American Control Conference, pp. 769-774, 2001.
[14] C.-M. Lin and Y.-F. Peng, “Adaptive CMAC-based supervisory control for uncertain nonlinear systems,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34(2), pp. 1248-1260, 2004.
[15] C.-M. Lin and H.-Y. Li, “Self-organizing adaptive wavelet CMAC backstepping control system design for nonlinear chaotic systems,” Nonlinear Analysis: Real World Applications, 14(1), pp. 206-223, 2013.
[16] H.-C. Lu, M.-F. Yeh, and J.-C. Chang, “CMAC Study with Adaptive Quantization,” in IEEE Intl. Conf. on Systems, Man, and Cybernetics, Taipei, pp. 2596-2601, 2006.
[17] C. J. B. Macnab, “Using RBFs in a CMAC to prevent parameter drift in adaptive control,” Neurocomputing, pp. 45-52, 2016.
[18] M. D. Mauk and N. H. Donegan, “A Model of Pavlovian Eyelid Conditioning Based on The Synaptic Organization of Cerebellum,” Learn. Mem., 3, pp. 130-158, 1997.
[19] A. Menozzi and M.-Y. Chow, “On the training of a multi-resolution CMAC neural network,” in Proceedings of the IEEE International Symposium on Industrial Electronics, 1997.
[20] K. Mohajeri, G. Pishehvar, and M. Seifi, “CMAC neural networks structures,” in IEEE International Symposium on Computational Intelligence in Robotics and Automation, 2009.
[21] K. Mohajeri, M. Zakizadeh, B. Moaveni, and M. Teshnehlab, “Fuzzy CMAC structures,” in IEEE International Conference on Fuzzy Systems, 2009.
[22] Y.-F. Peng, R.-J. Wai, and C.-M. Lin, “Adaptive CMAC model reference control system for linear piezoelectric ceramic motor,” in IEEE International Symposium on Computational Intelligence in Robotics and Automation, 2003.
[23] F. Rosenblatt, “Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms,” Spartan Books, Washington DC, 1961.
[24] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning Internal Representations by Error Propagation,” Nature, 323(9), 1986.
[25] R. L. Smith, “Intelligent Motion Control with an Artificial Cerebellum,” PhD Thesis, The University of Auckland, New Zealand, Chapter 3, 1998.
[26] S.-F. Su, T. Tao, and T.-H. Hung, “Credit assigned CMAC and its application to online learning robust controllers,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 33(2), 2003.
[27] R. S. Sutton, “Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding,” in Advances in Neural Information Processing Systems, pp. 1038-1044, MIT Press, 1996.
[28] S. D. Teddy, E. M.-K. Lai, and C. Quek, “Hierarchically Clustered Adaptive Quantization CMAC and Its Learning Convergence,” IEEE Transactions on Neural Networks, 18(6), 2007.
[29] S. D. Teddy, C. Quek, and E. M.-K. Lai, “A cerebellar associative memory approach to option pricing and arbitrage trading,” Neurocomputing, 19(4), 2008.
[30] S. D. Teddy, C. Quek, and E. M.-K. Lai, “PSECMAC: A Novel Self-Organizing Multiresolution Associative Memory Architecture,” IEEE Trans. Neural Networks, 19(4), pp. 689-712, 2008.
[31] S. D. Teddy and S. K. Ng, “Forecasting ATM cash demands using a local learning model of cerebellar associative memory network,” International Journal of Forecasting, 27(3), pp. 760-776, 2011.
[32] C.-K. Tham, “A hierarchical CMAC architecture for context dependent function approximation,” in IEEE International Conference on Neural Networks, 1996.
[33] K. Tian, B. Guo, G. Liu, I. Mitchell, D. Cheng, and W. Zhao, “KCMAC-BYY: Kernel CMAC using Bayesian Ying-Yang learning,” Neurocomputing, 101, pp. 24-31, 2013.
[34] H. Voicu, “The Cerebellum: An Incomplete Multilayer Perceptron?,” Neurocomputing, 72, pp. 592-599, 2008.
[35] Z.-Q. Wang, J. L. Shiano, and M. Ginsberg, “Hash-coding in CMAC Neural Networks,” in IEEE International Conference on Neural Networks, Washington, pp. 1698-1703, 1996.
[36] S. Whiteson, M. E. Taylor, and P. Stone, “Adaptive Tile Coding for Value Function Approximation,” AI Technical Report AI-TR-07-339, University of Texas at Austin, 2007.
[37] Y.-F. Wong, “CMAC Learning is Governed by a Single Parameter,” in IEEE International Conference on Neural Networks, San Francisco, pp. 1439-1443, 1993.
[38] F. Z. Xing, E. Cambria, and X. Zou, “Predicting Evolving Chaotic Time Series with Fuzzy Neural Networks,” in International Joint Conference on Neural Networks (IJCNN), 2017.
[39] W. J. Zhou, D. L. Maskell, and C. Quek, “FIE-FCMAC: A novel fuzzy cerebellum model articulation controller (FCMAC) using fuzzy interpolation and extrapolation technique,” in