Self Organizing Classifiers: First Steps in Structured Evolutionary Machine Learning
Danilo Vasconcellos Vargas · Hirotaka Takano · Junichi Murata
Abstract
Learning classifier systems are evolutionary machine learning algorithms, flexible enough to be applied to reinforcement, supervised and unsupervised learning problems with good performance. Recently, self organizing classifiers were proposed, which are similar to learning classifier systems but have the advantage that, in their structured population, no balance between niching and fitness pressure is necessary. However, more tests and analysis are required to verify their benefits. Here, a variation of the first algorithm is proposed which uses a parameterless self organizing map (SOM). This algorithm is applied to challenging problems such as big, noisy, as well as dynamically changing continuous input-action mazes (growing and compressing mazes are included) with good performance. Moreover, a genetic operator is proposed which utilizes the topological information of the SOM's population structure, improving the results. Thus, the first steps in structured evolutionary machine learning are shown; the problems faced are, nonetheless, more difficult than the state-of-the-art continuous input-action multi-step ones.
D. V. Vargas · H. Takano · J. Murata
Kyushu University, Fukuoka, Japan
E-mail: [email protected], [email protected], [email protected]

1 Introduction

Learning Classifier Systems (LCS) are a family of algorithms inspired by evolution [29],[20]. They can be applied to reinforcement learning problems (actually, they can solve supervised learning and unsupervised learning problems [26] as well, but we will focus only on reinforcement learning in this article). Unlike most reinforcement learning algorithms, however, LCS algorithms do not use state-action look-up tables to predict payoff. To solve RL problems, LCS systems use a set of individuals with condition-action-prediction rules, i.e., they solve the problem with piecewise approximations [19]. In this manner, the difficulties that arise in complex problems, where a large number of states and/or actions is required, can be avoided.

However, the dynamic niches of solutions present in LCS allow over-generalized solutions with higher fitness to compete against and win over specialized ones in low-fitness niches (niches where good solutions receive low payoff when compared to other niches), even though the specialized solutions would have a better performance. One way of solving this problem is to separate a fitness defined on a niche from fitnesses defined on other niches (i.e., having a good fitness on other niches would not influence the present niche). This is exactly the niched fitness concept, which was introduced and used by Self Organizing Classifiers (SOC) (niched fitness is explained in Section 6). Notice that niched fitness requires well defined niches where the fitness of individuals can be measured and compared locally inside each niche. This concept is difficult to insert into current LCSs. In this article, the term niche follows the Hutchinsonian niche definition [16]: a Hutchinsonian niche is an n-dimensional hyper-volume composed of environmental features. Moreover, niching is basically clustering, i.e., the objective is to create niches (clusters) which are more similar in some sense.

SOC are a recently proposed family of evolutionary machine learning methods with a structured population (to the knowledge of the authors, the first evolutionary machine learning algorithm to use a structured population). The main objective behind SOC's construction was to overcome over-generalization problems (over-generalized solutions and related problems) in LCS [31]. In SOC systems, no balance between specialization and generalization is needed, since the niched fitness concept is used. Actually, the niched fitness concept indirectly requires a separation of the niching pressure from the fitness pressure; in other words, by requiring well defined niches, both objectives (i.e., the objective of creating good niches and the objective of reaching good solutions) need to be clearly stated. SOC uses evolution only for finding good solutions (fitness pressure), letting the SOM face the clustering objective in parallel (niching pressure). Indeed, this is similar to co-evolutionary approaches. The fitness for each niche can be strength based (a fitness directly proportional to the payoff; such fitnesses came into disuse in the LCS literature because of consequential over-generalization problems), because the niched fitness already solves the problems with over-generalized classifiers.

Regarding the relationship between SOC and LCS: although SOC and LCS possess many similarities, it may not be correct to classify SOC as an LCS.
There are many crucial differences that make SOC difficult to match with other LCS algorithms; see Table 1.

This article extends [31] to include a more robust method without modifying its simplicity. A parameterless SOM replaces the SOM algorithm, conferring better adaptation properties and fewer parameters. Previously, SOC was applied to some complex multi-step RL problems with optimum or near optimum results, even using small population sizes. This article shows results on four new challenging problems:

– Big mazes - Mazes with as much as four times the area of previous mazes.
– Noisy mazes - Mazes with noise.
– Changing mazes - Mazes which have their structure modified over a series of trials.
– Growing and compressing mazes - Mazes which increase or decrease in size and structure over trials.

Additionally, a new genetic operator is proposed which utilizes the topological information of the SOM population. Its use is shown to improve the results.
2 Related Work

LCS have been developed for a while, forming a wide and diverse literature. LCS are evolution-based systems capable of solving problems by evolving a set of agents with condition-action-prediction rules which cooperate or compete among themselves. Here we briefly review LCS applied to multi-step and/or continuous problems. For a detailed review of the literature, please refer to [29],[20].

In problems with continuous actions, LCS have been applied in many settings. To begin with, XCSF has been applied to function approximation [35],[10],[28]. Other works in function approximation include LCS with fuzzy logic [30],[8],[11], neural-based LCS algorithms [7],[8] and genetic programming-based ones [18]. The successes of LCS also span the control of robotic arms [24,9] and navigation problems [6,15].

However, applications to multi-step problems with continuous actions are restricted to the mobile robot in a corridor [6] and the empty room with noise [15]. Complex multi-step problems were solved only for discrete outputs [19].
3 Self Organizing Map

The commonly used SOM is an algorithm capable of producing a projection of the input, usually into a two dimensional space. One of the most important advantages of the method is the preservation of the topological relationships of the input in the constructed map. In a SOM, for every input x the grid of weights w_i compete for it (the closest weight wins the competition, i.e., the weight having minimum ||x - w_i||). After the winning cell is decided, all cells in the grid are updated by the following equations:

    \Delta w_i(t) = \epsilon(t)\, h_{i,c}\, \{x(t) - w_i(t)\},    (1)
    w_i(t+1) = w_i(t) + \Delta w_i(t),    (2)

where \Delta w_i(t) is the weight update in the current iteration t, \epsilon is the learning rate and h_{i,c} is the neighborhood function. x and w_i are respectively the input array and the weight of a given cell i when the winning cell index is c. The learning rate \epsilon is a monotonically decreasing function with respect to the number of times the grid update was realized, and the neighborhood function is usually an exponentially decreasing distance-based function. For example:

    \epsilon(t) = 0.1/t,    (3)
    h_{i,c} = e^{-dist(i,c)},    (4)

where dist is some distance metric applied on the grid, defining the topological relationship of the grid's cells. Figure 1 illustrates one iteration of the algorithm, showing how the grid adapts to a given two dimensional input.
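To make the update concrete, one SOM iteration can be sketched as follows. This is a minimal illustration, not the implementation used later in this article: the 3x3 grid, the 0.1/t learning rate of Eq. (3) and the Euclidean grid distance are assumptions made for the example.

    import numpy as np

    def som_step(weights, x, t, grid):
        # weights: (n_cells, dim) weight arrays; grid: (n_cells, 2) cell coordinates
        c = int(np.argmin(np.linalg.norm(weights - x, axis=1)))  # winner: minimum ||x - w_i||
        eps = 0.1 / t                                            # decreasing learning rate, Eq. (3)
        d = np.linalg.norm(grid - grid[c], axis=1)               # dist(i, c) measured on the grid
        h = np.exp(-d)                                           # neighborhood function, Eq. (4)
        weights += eps * h[:, None] * (x - weights)              # Eqs. (1)-(2)
        return c, weights

    # toy usage: a 3x3 grid adapting to random two dimensional inputs
    grid = np.array([(i, j) for i in range(3) for j in range(3)], dtype=float)
    weights = np.random.rand(9, 2)
    for t in range(1, 200):
        _, weights = som_step(weights, np.random.rand(2), t, grid)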
Table 1 Difference between systems

Algorithm     | Model                                     | Fitness
LCS           | set of condition-action-prediction rules  | Accuracy based
SOC           | dynamic state-table of prediction rules   | Strength based
Standard RL   | static state-action look-up tables        | Strength based
Fig. 1 Illustration of the SOM dynamics. The left part of the figure shows a two dimensional input (white dot) being presented to the SOM's grid (black dots connected by lines). In this scenario, the winning cell (closest to the input) is shown in red. Afterwards, the SOM's grid is updated and the resulting grid is shown on the right.
3.1 Parameterless SOM

A parameterless SOM is a SOM where the learning rate function is not necessary [5]. Instead, the learning rate is automatically defined by how well the SOM fits the input (in fact, this value is strongly related to the input's novelty [14],[23]). In the usual SOM, by making the learning rate monotonically decreasing with the number of iterations, the more the SOM is used the less it can learn/adapt to new information. The parameterless SOM sets the learning rate according to the error of the input (not a monotonically decreasing function), therefore it is able to increase the learning rate when a rare or unexpected input arrives. Thus, the parameterless SOM does not only possess fewer parameters, but also the ability to always adapt to changes in the environment.

Let the learning rate \epsilon be defined as:

    r(0) = ||x(0) - w_c(0)||,    (5)
    r(t) = max(||x(t) - w_c(t)||, r(t-1)),    (6)
    \epsilon(t) = ||x(t) - w_c(t)|| / r(t),    (7)

where w_c is the SOM winning cell's weight array. The weight update \Delta w_i(t) is similar to the SOM's weight update, changing only in relation to the new learning rate \epsilon and the modified neighborhood function h_{i,c}. Considering dist(i, c) the distance between cell i and the winner cell c, we have:

    \Theta(\epsilon(t)) = \epsilon(t)\, \theta_{max},  \Theta(\epsilon(t)) > \theta_{min},    (8)
    h_{i,c} = e^{-dist(i,c)^2 / \Theta(\epsilon(t))^2},    (9)

where \theta_{max} and \theta_{min} are respectively the maximum and minimum of \Theta(\epsilon(t)). In this article, \theta_{max} equals the SOM's area (width multiplied by the height of the grid) and \theta_{min} = 1 are used.
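The parameterless update differs from the standard one only in how the learning rate and neighborhood width are computed. A sketch following Eqs. (5)-(9), with variable names of our choosing; the running maximum r must be carried between calls:

    import numpy as np

    def plsom_step(weights, x, r, grid, theta_max, theta_min=1.0):
        c = int(np.argmin(np.linalg.norm(weights - x, axis=1)))  # winning cell
        err = np.linalg.norm(x - weights[c])                     # fitting error of the winner
        r = max(err, r)                                          # running maximum error, Eqs. (5)-(6)
        eps = err / r if r > 0 else 0.0                          # error-driven learning rate, Eq. (7)
        theta = max(eps * theta_max, theta_min)                  # neighborhood width, Eq. (8)
        d = np.linalg.norm(grid - grid[c], axis=1)
        h = np.exp(-(d ** 2) / theta ** 2)                       # neighborhood function, Eq. (9)
        weights += eps * h[:, None] * (x - weights)
        return c, weights, r

Because eps grows whenever the current error approaches the largest error seen so far, a rare or unexpected input immediately raises the learning rate, which is what confers the adaptation property discussed above.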
Fig. 2 Island model structure. Arrows indicate the infrequent immigration procedure, the circles are the individuals and the oval shapes are the subpopulations.
Fig. 3 Cellular algorithm structure. The shaded area indicates an example of neighborhood for the central individual.
4 Structured Evolutionary Algorithms

Structured evolutionary algorithms do not possess a panmictic population. Instead, they organize the individuals into a structured population [27,2]. As commonly considered in the literature, algorithms which have some sort of implicit structure will not be considered structured. In fact, to avoid this type of confusion, the name parallel evolutionary algorithms is sometimes used in the literature. The need for distinction derives from the fact that algorithms with implicit structure lose many of the benefits of ones with explicit structure.

Two types of structured EAs will be given as examples, which are somewhat related to the structure of the proposed method.

The first type is island models (also called distributed genetic algorithms) [4]. Figure 2 shows their structure. Basically, the population is divided into a number of subpopulations ("islands") with little genetic information exchanged between them.

The second type, cellular algorithms, are structured evolutionary algorithms where individuals are usually positioned at the vertices of a lattice graph (Figure 3 shows a common cellular structure). They interact solely with adjacent individuals defined by a neighborhood function [21,3].
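As a rough illustration of the two structures (the ring topology, migration size and von Neumann neighborhood below are arbitrary choices made for the example, not taken from the literature cited above):

    import random

    def island_migration(islands, k=1):
        # islands: list of subpopulations; copy k random emigrants to the next island on a ring
        for i, src in enumerate(islands):
            dst = islands[(i + 1) % len(islands)]
            dst.extend(random.sample(src, k))

    def lattice_neighbors(i, j, rows, cols):
        # von Neumann neighborhood of cell (i, j): the only individuals it may interact with
        cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
        return [(a, b) for a, b in cand if 0 <= a < rows and 0 <= b < cols]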
Fig. 4 Structured and unstructured division.
5 Structured Evolutionary Machine Learning

Methods in evolutionary machine learning can also be classified into structured and unstructured. As explained in Section 4, structured algorithms arrange their population in some sort of structure, whereas unstructured ones possess a single population set. That is, in the same way that structured evolutionary algorithms differ from unstructured panmictic ones, learning classifier systems can be classified, in this context, as a type of unstructured evolutionary algorithm, and self organizing classifiers can be seen as a structured evolutionary algorithm. Figure 4 shows a diagram illustrating this aspect. Note that implicitly structured algorithms such as the niched genetic algorithm cannot be considered structured following the definition in Section 4. (The terms structured and parallel, as well as unstructured and panmictic, will be used interchangeably when referring to algorithms.)

There is a motivation behind structured evolutionary machine learning. Indeed, some advantages of structured over unstructured algorithms should follow from those in the optimization field. To cite a few:

– Diversity - By restricting the interrelation of individuals, structured populations can preserve different subpopulations, i.e., avoid competition;
– Time Performance - Structured populations allow easier parallelization and therefore less running time when run on the appropriate system.
6 Concepts

Two relatively novel concepts are used in this article; both were introduced in [31]. They are respectively:

– Niched fitness - A fitness defined only in a given niche (place or circumstance). Outside of this niche, the fitness is undefined and therefore nonexistent. The objective here is to evaluate individuals by their performance in the given niche independently of how they behave in other niches, solving over-generalization problems. In fact, the niched fitness concept exposes the similarity between niching and multiobjectivization [22].
– SOM population - A 2D cell grid with each cell holding a subpopulation. The grid behaves as a SOM, self-organizing itself to the input and allowing only the subpopulation inside the winning cell to interact with the input. In other words, the SOM population is a mixture of both island models and cellular algorithms with a self-organizing structure (see Figure 5).

Fig. 5 SOM population structure. A self organizing map grid with subpopulations inside each cell.
7 Self Organizing Classifiers

SOC are a series of algorithms based on the SOM population, sharing a similar model and dynamics. The schematic of SOC is described in Figure 6 (the schematic style is the same as the one previously used to describe ZCS, XCS and many other systems of the LCS literature [34]).

A Q-learning based reinforcement scheme with niched fitness is used by SOC. The fitness update of each individual is done using the Widrow-Hoff rule [33]:

    F = F + \eta (\hat{F} - F),    (10)

where \eta is the learning rate, F is the current fitness and \hat{F} is a new fitness estimate. Consider an arbitrary classifier c activated at its SOM cell "cell" at time t-1. The fitness estimate for the pair (cell, c) is:

    \hat{F}(c, cell)_{t-1} = R_{t-1} + \gamma \max_{c' \in cell'} \{F(c', cell')\},    (11)

where R is the reward received, \gamma is the discount factor, \max_{c' \in cell'} \{F(c', cell')\} is the maximum fitness inside the activated cell cell' at the current cycle t, and c' is the classifier which has the current highest fitness in cell'.
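In code, the update of Eqs. (10)-(11) can be sketched as below. Fitness is stored per (cell, classifier) pair, so a classifier's fitness in one niche never leaks into another; the dictionary-based storage and the names are illustrative assumptions, not the authors' implementation:

    def update_niched_fitness(F, prev_cell, prev_clf, reward, cur_cell, cur_classifiers,
                              eta, gamma, f0=0.0):
        # F: dict mapping (cell, classifier) -> niched fitness; f0 is the initial fitness
        # Eq. (11): new estimate = reward + discounted best fitness inside the activated cell
        best_next = max(F.get((cur_cell, c), f0) for c in cur_classifiers)
        f_hat = reward + gamma * best_next
        key = (prev_cell, prev_clf)
        # Eq. (10): Widrow-Hoff rule, applied only in the niche where the classifier acted
        F[key] = F.get(key, f0) + eta * (f_hat - F.get(key, f0))
        return F[key]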
Fig. 6 Self Organizing Classifier schematic.
Similar to learning classifier systems, to reduce computational resources, the structure of the SOM population is implemented as a single array of classifiers, each with a given numerosity, indexed by the SOM population structure. In this manner, the numerosity is defined by the number of indexes a given individual possesses.
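A minimal sketch of this storage scheme (the names are ours, not the authors'): the population is one flat list, each cell keeps indices into it, and the numerosity of a classifier is the number of indices pointing at it.

    class ClassifierStore:
        def __init__(self):
            self.classifiers = []   # single array of (macro)classifiers
            self.cell_index = {}    # cell id -> list of indices into self.classifiers

        def add(self, cell, clf):
            # reuse the classifier if it is already stored, otherwise append it
            try:
                idx = self.classifiers.index(clf)
            except ValueError:
                idx = len(self.classifiers)
                self.classifiers.append(clf)
            self.cell_index.setdefault(cell, []).append(idx)

        def numerosity(self, idx):
            # numerosity = number of indexes a given individual possesses
            return sum(ids.count(idx) for ids in self.cell_index.values())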
8 Why SOM?

Before going into further details of the algorithm, we wish to clarify one question that might arise: among the vast number of algorithms that could cluster inputs, why was the SOM chosen?

The answer is that we chose the SOM not for what it does, but for what it gives as information. In this context, the SOM's interesting capabilities are as follows:

– Topology Preserving Projection - High-dimensional inputs are projected onto a two-dimensional topology preserving map.
– Novelty Measure - The error of the SOM's cell given the input is an approximation to the uniqueness of the input itself, which is a measure of novelty [14],[23]. In fact, this measure is used to affect the update of the other cells' weights in the parameterless SOM.

Evolutionary algorithms can use these pieces of information to better adapt to the problem at hand. Moreover, other procedures can be run on this information to improve the system as a whole. For example, genetic operators may exploit the SOM's structure by reproducing individuals with similar individuals in subpopulations adjacent to them. Observe that adjacent subpopulations are necessarily similar in input and therefore may share similar or even equal solutions.
9 Simplest Self Organizing Classifiers

Simplest Self Organizing Classifiers are implementations of the class of Self Organizing Classifiers (see Section 7). Figure 6 shows how the cycle is executed and Table 2 describes the execution cycle in detail.

For the sake of comprehensibility, we give here the name Simplest Self Organizing Classifier (SSOC) to the first, very simple algorithm developed in [31] and the name Simplest Self Organizing Classifier 2 (SSOC2) to the one described in this article. The method proposed here replaces the SOM by a parameterless SOM, removing two of the parameters, since the learning rate function is not necessary [5]. There are no other differences, therefore the description here applies to both SSOC and SSOC2.

The classifiers used code the action directly as an array of real numbers, i.e., the model is an array of real numbers which is mapped directly to the output.

Moreover, the SOM population has a subpopulation inside each cell, which is further divided into two groups: one of best individuals and the other of novel individuals. The best and novel groups have fixed sizes of β and ν respectively. Defining an EA cycle as an algorithm cycle in which the evolutionary algorithm (EA) is called, the following rules take place:

– Best individuals are the best fitted individuals inside the subpopulation in the last EA cycle.
– Novel individuals are renewed every EA cycle (the detailed process is described in the next section).

Table 2 Simplest Self Organizing Classifiers' Cycle

1. An input is received by the system.
2. The SOM population is activated on the input (a given individual will be returned to act):
   (a) The cell whose weight array is closest to the received input wins the competition.
   (b) Inside the winning cell, a random individual is chosen either from the novel group or from the best group (depending on whether it is an exploration or exploitation cycle).
   (c) The cells' weight arrays of the SOM population are updated by the SOM algorithm.
   (d) The chosen individual is returned.
3. The chosen individual acts on the system (in the case of SSOC, the individual's chromosome is the action itself).
4. The individual that acted previously, on the previously activated cell, has its fitness updated. The equation used to update it is given in Section 7 (notice that the fitness is updated only for the given cell even if the individual is present in other cells; see the niched fitness concept in Section 6).
5. Check if the EA should be called. If positive, execute the EA (see Section 9.1 for the complete description of the EA).

The SOM population begins without any classifiers. Classifiers are created when the respective cell wins the competition inside the SOM. On one hand, novel individuals are created as random classifiers. On the other hand, best individuals are, when possible, set equal to another cell's best individuals from the neighborhood (defined as the cells within a Chebyshev distance of less than or equal to four) which maximizes experience/chebyshevDistance (experience is defined in Section 9.1; a sketch of this copying rule is given at the end of this section). If not possible, best individuals are initialized in the same way as the novel individuals. Observe that a given cell's experience is set to zero every time evolution happens in it. This may seem counter-intuitive, but it alleviates a problem with hyperactive cells, which often possess a high error (see the cell's error for the growth of SOM in [1],[12]). A random selection of a close cell should produce similar results.

Cycles of exploration and exploitation are alternated (a cycle of exploration is followed by an exploitation cycle and so on). Within the SOM's winning cell in a given exploration or exploitation cycle, a random individual from respectively the novel or best group is chosen to act.

9.1 Evolutionary Algorithm

Evolution takes place inside a cell whenever that cell's experience (the number of times it has been activated since evolution last took place in it) surpasses ιS, where S is the number of subpopulation individuals (novel plus best individuals) present in each cell, and the parameter ι defines an experience per individual above which they should have an accurate fitness evaluation. After the evolution has been applied, the experience of the cell is set to zero.

By applying the evolutionary algorithm locally, SSOC2 (the same is valid for SSOC) respects the niched fitness concept. Its procedure consists of sorting the individuals of the given cell according to their fitness. On one hand, the current best β individuals substitute the previous best individuals and the remaining individuals are discarded (the index is removed and the individual's numerosity decreases; if the numerosity reaches 0, the individual is deleted). On the other hand, ν novel individuals are created using either:

1. Indexing - a copy of (index to) a randomly selected individual of the entire population;
2. Reproduction - creation by a genetic operator.

The two procedures above have equal probabilities of happening.
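A sketch of the copying rule mentioned above, under our reading of the text (cells keyed by grid coordinates, each holding an experience counter and a best group; these structures are assumptions); the radius of four follows the neighborhood definition:

    import math

    def init_best_from_neighborhood(cell, cells, radius=4):
        # pick the neighboring cell maximizing experience / chebyshevDistance
        best_src, best_score = None, -math.inf
        for other, data in cells.items():
            d = max(abs(cell[0] - other[0]), abs(cell[1] - other[1]))  # Chebyshev distance
            if 0 < d <= radius and data["best"]:
                score = data["experience"] / d
                if score > best_score:
                    best_src, best_score = other, score
        # fall back to random initialization (as for novel individuals) when no neighbor exists
        return None if best_src is None else list(cells[best_src]["best"])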
The differential evolution operator is used in both SSOC and SSOC2, motivated by its robustness and overall good results [25,32,17], even when compared against complex optimization algorithms (e.g., Estimation of Distribution Algorithms) [13]. In this paper, the differential evolution mutant vector is created by randomly choosing three vectors from the SOM's entire population of individuals (notice that individuals with numerosity bigger than one are counted only once).
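A sketch of this construction: three distinct chromosomes are drawn from the global population, each macroclassifier counted once, and combined in the usual DE/rand/1 fashion. The scale factor f below is a placeholder, since Table 3's corresponding value did not survive extraction:

    import random
    import numpy as np

    def de_mutant(population, f=0.5):
        # population: list of distinct real-valued chromosomes (numerosity > 1 counted once)
        r1, r2, r3 = (np.asarray(v, dtype=float) for v in random.sample(population, 3))
        return r1 + f * (r2 - r3)  # DE/rand/1 mutant vector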
10 Experiments
The definitions given in Section 10.1 are important, especially when discussing the adaptation of the system in question.

10.1 Definitions
Definition 1 Adaptation - The capability of a system to modify its behavior in order to better fit changes in the environment.
Definition 2 Adaptation's Time - The time necessary for a system to adapt to changes in the environment.
Fig. 7 Maze 1 - Static 10x20 maze problem.
A system with an adaptation’s time of zero canbe said to possess a generalized model where modifi-cations are unnecessary to fit both past and currentenvironments. In this sense, generality and adaptationare closely related terms. Although, it depends on thedetails about the system, e.g., the type of model used.10.2 EnvironmentsThe experiments were conducted on both mazes of Fig-ure 8 as well as on the dynamic mazes of Figures 9and 10. Figures 9 and 10 show each two maze states, ineach of these problems the maze is constantly chang-ing every 10000 trials between one state (right side)and the other state (left side). Notice that the problemdescribed in Figure 10 shows a maze which constantlyincreases and decreases in size. Agents act on all en-vironments with continuous (x,y) translation actions.The variable observed by the agent is the agent’s posi-tion which is also continuous.At every trial, the agent starts at a random positionon the environment. Naturally, starting inside a wall isnot possible. Reaching the goal would give the agent areward of 1000, hitting an obstacle would return − −
Additionally, agents cannot move more than 1 unit per step. In Table 3, t means the SOM iteration number, chebyshevDistance() is the Chebyshev distance between the current cell and the cell which won the SOM's competition, and random(a, b) is a function which returns a uniform random value between a and b. The cells of the SOM are only updated if the neighborhood function multiplied by the learning rate surpasses the cell update threshold.

Fig. 8 Maze 2 - Static 21x21 maze problem.
Fig. 9 Maze 3 - Dynamic maze problem with constant dimensions 10x10. The maze changes from the left to the right state and vice-versa every 10000 trials.
Fig. 10 Maze 4 - Dynamic maze problem where the dimensions also change, growing from 10x10 to 10x20 and vice-versa.

Following the design of [19], the performance is computed as the average number of steps to reach the goal during the last 100 trials. Moreover, trials cannot last more than 500 steps. Any trial which lasts more than 500 steps is terminated and a new trial is started with the agent, as usual, in a random position. Every result, when not stated otherwise, is averaged over 20 experiments.

To give an overall idea of the resulting behavior of the agent (or its fitness) after the algorithm has learned, a sampling strategy was used. First, we divide the maze into blocks of size 1x1; then the action of the agent (or the fitness of the winning cell) is sampled 100 times and averaged in a given 1x1 block. By repeating this process for all blocks inside the maze, we obtain a general idea of the behavior learned (or of the fitness distribution).
Table 3 Parameters

Differential Evolution CR        0.…
Differential Evolution F         random(0, …) × random(0, … − chebyshevDistance())
Cell update threshold            0.…
η (fitness learning rate)        …
β (best group size)              …
ν (novel group size)             …
ι (experience per individual)    …
γ (discount factor)              0.…
Initial fitness                  …
The initial fitness is chosen to enable a good spread of fitness.

SSOC2's sensitivity to the size of the population exists, but it is, surprisingly, not big even with such simple classifiers. The sensitivity happens due to two limitations:

1. The SOM's current incapability of growing to cope with the difficulty of the problem in question.
2. SSOC2's classifiers are as simple as possible. Therefore, the niching granularity needs to increase for the method to cope with different actions. In fact, it is just evolving piecewise constant approximations (the neural XCSF from [15] used piecewise nonlinear approximations to solve continuous input-action multi-step problems).

Therefore, with adaptable niching granularity (a growing and shrinking SOM population) as well as more complex classifiers (complex models), the sensitivity should disappear completely.

10.5 Noisy mazes

Real world problems always have some degree of noise involved. In some of them, the noise present cannot be disregarded, affecting some algorithms undesirably. To simulate mazes where noise cannot be disregarded, a ±5% noise variation is added to the observed variables.

The results are shown for Mazes 1 and 2 in Figures 14 and 15 respectively. For Maze 1, Figure 14 shows very good results with no visible issues. In fact, the performance is better than the results without noise (see Figure 11), i.e., it has fewer oscillations while converging to the same value. This happens because without noise the algorithm may get stuck, repeating the same input for a long time and consequently building a poor SOM model (the weights of the SOM's cells get near the repeated input). With noise, the input received by the algorithm will rarely be the same. Actually, the algorithm may even find itself stuck fewer times, since the noise is constantly changing the input, giving other SOM cells the chance to activate and therefore the possibility of different actions to arise.

Maze 2 shows a slightly worse performance and behavior. This happens because aliasing states with very different fitness appeared. To explain this phenomenon, first take a look at the fitness in parts A, B and C. Notice that parts A and C have greater values than part B. This is only possible because part C is getting a bit of fitness from the high fitness northwest portion of the map just above it. Due to the noise effects, part C may receive inputs from the northwest portion of the map, allowing it to receive a greater reward than it should. This influence of a high fitness aliasing state causes the unwise behavior at the frontiers. However, when the initial state is not in these problematic regions, the result is as good as without noise.

To give an idea of SSOC2's performance in relation to the evolutionary machine learning literature, Figure 16 shows the resulting behavior for the empty room with noise of [15].
Fig. 11 SSOC2's behavior (upper row), population (middle row) and performance (lower row) throughout the trials. The left column shows the results for the default 10x10 sized population and the right column shows results for the smaller 5x5 SOM population.

Both SSOC2 and the neural XCSF converge to the optimum behavior. However, SSOC2 used a SOM population of size 5x5, which is equivalent to a maximum of 175 individuals in a panmictic population (each cell of the SOM population has 7 individuals, see Table 3), while XCSF took 16000 individuals to solve it, i.e., SSOC2 used 91 times less population. Neither SSOC2 nor XCSF was optimized, and these population numbers are not the minimal populations required to solve the maze. However, it is still a quantitative measure that gives an overall picture of the population requirements of both algorithms on multi-step RL problems.

10.6 Changing Mazes

Problems in the real world are never static. To reflect this challenging characteristic, we consider a maze (see Figure 9) which changes from time to time. In other words, the agent's adaptation ability is put to the test.

Figure 17 shows the number of steps required by the agent over time to solve Maze 3. Notice that the first two peaks are bigger than the subsequent ones. This fact verifies the ability of the algorithm to reuse knowledge when possible. Moreover, the repetition of exponentially decreasing learning curves demonstrates the agent's ability to change its model to reflect the
environment whenever the environment changes. This verifies the agent's adaptation ability.

Fig. 12 SSOC2 applied to Maze 2 with a 20x20 SOM population and a discount factor of 0.99. The figure shows the behavior (upper left), fitness (upper right), population (lower left) and performance (lower right).

10.7 Growing and compressing mazes

A special case of dynamic mazes are mazes which not only change their internal structure but also their size. That is the case for Maze 4's growing maze (see Figure 10).

The results demonstrate that SSOC2 has a good capability of adaptation, which is also unchanged throughout the experiment (see Figure 18). Moreover, the decreasing peaks show the reuse of knowledge from the previous problem. The system seems to arrive at a stage where minimal steps are required to adapt, i.e., a minimal adaptation's time.
11 SSOC versus SSOC2
Here, the SSOC2 and SSOC algorithms are compared. In the SSOC algorithm, a monotonically decreasing function of the iteration number is used as the learning rate (a parameter absent in the parameterless SOM).

The main difference between SSOC and SSOC2 is SSOC's incapability of changing the model when an early input distribution was biased (or when the problem has just changed). Therefore, the results from Figure 19 are expected (SSOC was not able to continuously adapt, since its learning rate gets smaller and cannot increase with time). Figures 20 and 21 show respectively a smaller problem (solved by both algorithms) and a bigger problem where SSOC had difficulties. Similar to problems that change with time, big mazes may cause the distribution of inputs from a series of trials to differ strongly. To make the differences explicit, Figure 22 shows a comparison between the resulting SOM structures of both SSOC and SSOC2. It is clear that SSOC focuses on the frequency of the input while SSOC2 focuses on the novelty of the input (recall that the error between the SOM's cell and the input, which is used to determine the rate of learning, is an approximation to uniqueness, i.e., a measure of novelty [14],[23]). That explains the poorer coverage of SSOC. Having said that, a poor coverage does not mean a worse result; it depends on the system. For example, if the high frequency of inputs coincides with the most difficult inputs, the poor coverage would be a very useful mapping.

Fig. 13 Smaller population results. SSOC2 applied to Maze 2 with a discount factor of 0.99 (remaining parameters are the default, i.e., 10x10 SOM population, etc.). The figure shows the behavior (upper left), fitness (upper right), population (lower left) and performance (lower right).
12 Mixed Genetic Operator
SOM’s population is naturally a structure. However,until now the evolutionary algorithm did not take any recall that the error between the SOM’s cell and the in-put, which is used to determine the rate of learning, is anapproximation to uniqueness, i.e., a measure of novelty [14],[23] benefit from this aspect. Here we show that by usingthe topological preserving properties of SOM it is pos-sible to mix global and local solutions to obtain relevantimprovements.The mixed genetic operator used is, as before, basedon the differential evolution operator. But instead ofchoosing three random global solutions, each of thethree solutions chosen come with equal probability fromeither a random adjacent cell (local) or a random solu-tion (global). Results are shown on Figure 23. The num-ber of steps necessary to reach the goal is on averageapproximately 60 after the algorithm has converged.Recall that using the previous (only global) genetic op-erator the average reached approximately 70 steps (seeFigures 12). This result shows that evolutionary algo-rithms can explore the structure of the SOM with ge-netic operators, ending up also working on the inherentstructure of the problem. In other words, it is possible Fig. 14
SSOC2’s behavior and performance for Maze 1 withnoise. to efficiently reuse similar partial solutions (adjacentsolutions) to solve similar partial problems (adjacentcells).
13 Justification
SSOC2 has a simple model. Nonetheless, it is still possible to achieve good results in complex problems with it. One way to think about it (as described in Section 1) is to see that niched fitness is necessary to avoid over-generalized solutions, deriving the rest of the reasoning from it. However, one might also think that, at the heart of any problem, there is the necessity to divide and decouple as much as there is the necessity to solve the parts. In other words, divide and conquer (the name of a widely known algorithm, but most importantly a line of thought on how to solve problems in general) is essential. The hurdle is that this was always a multi-objective problem; treating it as a single-objective problem will consequently provoke either division or conquer strategies to prevail over the other, while we may want both to coexist (over-generalizing solutions derive exactly from the prevalence of conquer strategies over division strategies). By separating both the niching and fitness pressures into a cooperating system, SOC managed to overcome this problem. In this sense, niched fitness becomes a consequence.

Fig. 15
SSOC2’s behavior, fitness and performance for Maze2 with noise. SOM population’s size is 20x20 and the discountfactor is set to 0 . Fig. 16
Empty room maze with noise solved by SSOC2 with a SOM population's size of 5x5.
Fig. 17 Number of steps required by the agent to reach the objective in Maze 3. Vertical red dashed lines show when the maze changed. A horizontal dashed line at 20 is drawn for orientation purposes.
Fig. 18 SSOC2 performance on Maze 4. Vertical red dashed lines show when the maze changed. A horizontal dashed line at 20 is drawn for orientation purposes.
Fig. 19 Number of steps required by SSOC2 (red) and SSOC (blue) agents to reach the objective in Maze 4. Vertical dashed lines show when the maze changed.
Fig. 20 Results on Maze 1 for SSOC2 (red) and SSOC (blue).
Fig. 21 Results on Maze 2 for SSOC2 (red) and SSOC (blue).
Fig. 22 Two samples of the final distribution of the SOM's cells for SSOC2 (above) and SSOC (below) for Maze 2. The size of the points is directly proportional to their experience (number of times they were selected to act).
Fig. 23 SSOC2 with the mixed genetic operator applied to Maze 2.
14 Conclusion
This article deeply evaluated the benefits and challenges of self organizing classifiers (or, more generally, structured evolutionary machine learning). The following are the main points:

– SSOC2 - An improved version of SSOC (named SSOC2) was created through the substitution of the SOM by the parameterless SOM. Differences in the final SOM population structure were shown, as well as better results.
– State-of-art results on challenging problems - Results were shown on a variety of continuous input-action multi-step problems. Problems with noise and with dynamic environments were also considered. To the knowledge of the authors, these are the most difficult continuous input-action multi-step problems faced by an evolutionary machine learning algorithm to date. Although SSOC2 is made of very simple classifiers (SSOC2 is capable of only piecewise constant approximations), near optimal results were presented.
– Genetic operator - Tests have shown the advantages of using the SOM population's topological information inside the evolutionary algorithm. By using global and local solutions inside the genetic operator, the average number of steps required to reach the objective was reduced by approximately 10 steps.

Thus, with the good results on very difficult problems, it was possible to verify the strength of the approach even with a very simple internal model. There are many possible branches of research that can derive from SOC. In fact, structured evolutionary machine
learning algorithms have barely been researched. Numerous widely different algorithms are possible from the combination of parallel (structured) evolutionary algorithms, learning classifier systems and machine learning. This is just the first step.
References
1. D. Alahakoon, S. K. Halgamuge, and B. Srinivasan. Dynamic self-organizing maps with controlled growth for knowledge discovery. IEEE Transactions on Neural Networks, 11(3):601–614, 2000.
2. E. Alba and M. Tomassini. Parallelism and evolutionary algorithms. IEEE Transactions on Evolutionary Computation, 6(5):443–462, 2002.
3. E. Alba and J. Troya. Cellular evolutionary algorithms: Evaluating the influence of ratio. In Parallel Problem Solving from Nature PPSN VI, pages 29–38. Springer, 2000.
4. T. Belding. The distributed genetic algorithm revisited. In Proceedings of the Sixth International Conference on Genetic Algorithms, page 114. Morgan Kaufmann, 1995.
5. E. Berglund and J. Sitte. The parameterless self-organizing map algorithm. IEEE Transactions on Neural Networks, 17(2):305–316, 2006.
6. A. Bonarini, C. Bonacina, and M. Matteucci. Fuzzy and crisp representations of real-valued input for learning classifier systems. Learning Classifier Systems, pages 107–124, 2000.
7. L. Bull. On using constructivism in neural classifier systems. Parallel Problem Solving from Nature PPSN VII, pages 558–567, 2002.
8. L. Bull and T. O'Hara. Accuracy-based neuro and neuro-fuzzy classifier systems. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 905–911. Morgan Kaufmann Publishers Inc., 2002.
9. M. Butz and O. Herbort. Context-dependent predictions and cognitive arm control with XCSF. In Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation, pages 1357–1364. ACM, 2008.
10. M. Butz, P. Lanzi, and S. Wilson. Function approximation with XCS: Hyperellipsoidal conditions, recursive least squares, and compaction. IEEE Transactions on Evolutionary Computation, 12(3):355–376, 2008.
11. J. Casillas, B. Carse, and L. Bull. Fuzzy-XCS: A Michigan genetic fuzzy system. IEEE Transactions on Fuzzy Systems, 15(4):536–550, 2007.
12. M. Dittenbach, D. Merkl, and A. Rauber. The growing hierarchical self-organizing map. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), volume 6, pages 15–19. IEEE, 2000.
13. S. García, D. Molina, M. Lozano, and F. Herrera. A study on the use of non-parametric tests for analyzing the evolutionary algorithms' behaviour: a case study on the CEC'2005 special session on real parameter optimization. Journal of Heuristics, 15(6):617–644, 2009.
14. T. Hester and P. Stone. Intrinsically motivated model learning for a developing curious agent. In The Eleventh International Conference on Development and Learning (ICDL), November 2012.
15. G. Howard, L. Bull, and P. Lanzi. Towards continuous actions in continuous space and time using self-adaptive constructivism in neural XCSF. In Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pages 1219–1226. ACM, 2009.
16. G. Hutchinson. Concluding remarks. In Cold Spring Harbor Symposia on Quantitative Biology 22, pages 415–427, 1957.
17. A. Iorio and X. Li. Solving rotated multi-objective optimization problems using differential evolution. AI 2004: Advances in Artificial Intelligence, pages 861–872, 2005.
18. M. Iqbal, W. N. Browne, and M. Zhang. XCSR with computed continuous action. In AI 2012: Advances in Artificial Intelligence, pages 350–361. Springer, 2012.
19. P. Lanzi, D. Loiacono, S. Wilson, and D. Goldberg. XCS with computed prediction in multistep environments. In Proceedings of the 2005 Conference on Genetic and Evolutionary Computation, pages 1859–1866. ACM, 2005.
20. P. Lanzi and R. Riolo. A roadmap to the last decade of learning classifier system research (from 1989 to 1999). Learning Classifier Systems, pages 33–61, 2000.
21. B. Manderick and P. Spiessens. Fine-grained parallel genetic algorithms. In ICGA'89, pages 428–433, 1989.
22. J. Mouret. Novelty-based multiobjectivization. New Horizons in Evolutionary Robotics, pages 139–154, 2011.
23. E. Reehuis, M. Olhofer, M. Emmerich, B. Sendhoff, and T. Bäck. Novelty and interestingness measures for design-space exploration. In Proceedings of the Fifteenth Annual Conference on Genetic and Evolutionary Computation, pages 1541–1548. ACM, 2013.
24. P. Stalph and M. Butz. Learning local linear Jacobians for flexible and adaptive robot arm control. Genetic Programming and Evolvable Machines, 13(2):137–157, 2012.
25. R. Storn and K. Price. Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4):341–359, 1997.
26. K. Tamee, L. Bull, and O. Pinngern. Towards clustering with XCS. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pages 1854–1860. ACM, 2007.
27. M. Tomassini. Spatially Structured Evolutionary Algorithms. Springer, 2005.
28. H. Tran, C. Sanza, Y. Duthen, and T. Nguyen. XCSF with computed continuous action. In Proceedings of the 9th Annual Conference on Genetic and Evolutionary Computation, pages 1861–1869, 2007.
29. R. Urbanowicz and J. Moore. Learning classifier systems: a complete introduction, review, and roadmap. Journal of Artificial Evolution and Applications, 2009:1, 2009.
30. M. Valenzuela-Rendón. The fuzzy classifier system: A classifier system for continuously varying variables. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 346–353. Morgan Kaufmann, 1991.
31. D. V. Vargas, H. Takano, and J. Murata. Self organizing classifiers and niched fitness. In Proceedings of the Fifteenth Annual Conference on Genetic and Evolutionary Computation, pages 1109–1116. ACM, 2013.
32. J. Vesterstrom and R. Thomsen. A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In Congress on Evolutionary Computation (CEC 2004), volume 2, pages 1980–1987. IEEE, 2004.
33. B. Widrow and M. E. Hoff. Adaptive switching circuits. In 1960 IRE WESCON Convention Record, pages 96–104, New York, 1960. IRE.
34. S. Wilson. ZCS: A zeroth level classifier system. Evolutionary Computation, 2(1):1–18, 1994.
35. S. Wilson. Classifiers that approximate functions. Natural Computing, 1(2-3):211–234, 2002.