A Layered Learning Approach to Scaling in Learning Classifier Systems for Boolean Problems
Isidro M. Alvarez, Trung B. Nguyen, Will N. Browne, Mengjie Zhang
Isidro M. Alvarez [email protected]
Trung B. Nguyen [email protected]
Will N. Browne [email protected]
Mengjie Zhang [email protected]
School of Engineering and Computer Science, Victoria University of Wellington, Kelburn, Wellington 6140, New Zealand
Abstract
Learning classifier systems (LCSs) originated from cognitive-science research but migrated such that LCSs became powerful classification techniques. Modern LCSs can be used to extract building blocks of knowledge to solve more difficult problems in the same or a related domain. Recent work on LCSs showed that knowledge reuse through the adoption of Code Fragments, GP-like tree-based programs, into LCSs could provide advances in scaling. However, since solving hard problems often requires constructing high-level building blocks, which also results in an intractable search space, a limit of scaling will eventually be reached. Inspired by human problem-solving abilities, XCSCF* can reuse learned knowledge and learned functionality to scale to complex problems by transferring them from simpler problems using layered learning. However, this method was unrefined and suited only to the Multiplexer problem domain. In this paper, we propose improvements to XCSCF* to enable it to be robust across multiple problem domains. This is demonstrated on the benchmark Multiplexer, Carry-one, Majority-on, and Even-parity domains. The required base axioms necessary for learning are proposed, methods for transfer learning in LCSs are developed, and learning is recast as a decomposition into a series of subordinate problems. Results show that from a conventional tabula rasa, with only a vague notion of what subordinate problems might be relevant, it is possible to capture the general logic behind the tested domains, so the advanced system is capable of solving any individual n-bit Multiplexer, n-bit Carry-one, n-bit Majority-on, or n-bit Even-parity problem.
Keywords
Learning Classifier Systems, Code Fragments, Layered Learning, Scalability, Building Blocks, Genetic Programming.
Learning Classifier Systems (LCSs) were first introduced by Holland (1975) as cognitive systems designed to evolve a set of rules. LCSs were inspired by the principles of stimulus-response in cognitive psychology (Holland, 1975, 1976; Schaffer, 1985) for interaction with environments. LCSs morphed from being platforms to study cognition to become powerful classification techniques (Bull, 2015; Butz, 2006; Lanzi and Riolo, 2000).
An important strength of LCSs is their capability to subdivide the problem into niches that can be solved efficiently. This is made possible by integrating generality into the rules produced. This pressure towards generality means that one classifier can be a solution to a larger set of problem instances. In the proposed work, we seek to go beyond what the Michigan-style LCS currently offers with its niching strength. We start with XCS, an accuracy-based LCS that creates accurate building blocks of knowledge for an experienced niche, to develop a system that scales to problems of any size in a domain (Wilson, 1995; Butz and Wilson, 2000).

Although LCS techniques have facilitated progress in the field of machine learning, they have had a fundamental weakness. Each time a solution is produced for a given problem, the techniques tend to 'jettison' any learned knowledge and must start from a blank slate when tasked with a new problem.

The field of Developmental Learning in cognitive systems contains an idea known as the Threshold Concept (Falkner et al., 2013). This idea conveys the fact that in human learning there exist certain pieces of knowledge that are transformative in advancing the learning of a task. These concepts need to be learned in a particular order, thus providing the learner with viable progress towards learning more difficult ideas at a faster pace than otherwise. For instance, humans are taught mathematics in a certain progression; arithmetic is taught before trigonometry, and these two are taught before calculus. The empirical evidence indicates that this sequence will be more effective in fostering the learning of progressively more difficult mathematics (Falkner et al., 2013).

Related to the benefits of the threshold concept are Layered Learning (LL) and Transfer Learning (TL) in artificial systems. In LL, a sequence of knowledge is learned (Stone and Veloso, 2000).
LL requires crafting a series of problems, which enables the learning agent to learn successively harder problems. The benefits of TL are actualised when learning from one domain is transferred to aid learning in a similar or related domain. In essence, TL aims to extract the knowledge from one or more source tasks and apply the knowledge to the target task (Feng et al., 2015).

Current LCSs can be utilised to extract building blocks of knowledge in the form of GP-like trees, called Code Fragments (CFs). TL can then reuse these building blocks to solve more difficult problems in the same or a related domain. Past work showed that the reuse of knowledge through the integration of CFs into XCS, as a framework, can provide dividends in scaling (Iqbal et al., 2014).

Numerous systems using CFs have been developed. XCSCFC is a system that has extended XCS by replacing the condition of the classifiers with a number of CFs (Iqbal et al., 2014). Although XCSCFC exhibits better scalability than XCS, eventually a computational limit in scalability will be reached (Iqbal et al., 2013b). The reason for this is that multiple CFs can be used at the terminals; as the problem increases in size, any depth of tree could be created. Instead of using CFs in rule conditions, XCSCFA integrates CFs in the action part of classifiers (Iqbal et al., 2013a). This method produced optimal populations in both discrete domain problems and continuous domain problems. However, XCSCFA lacked scaling to very large problems, even where they had repeated patterns in the data.

In the preliminary work, XCSCF* (Alvarez et al., 2016) applied the threshold concepts, LL, and TL to enable it to solve the n-bit Multiplexer problem. However, this was only a single domain, so the question remains: was the approach robust and easy to implement across multiple domains?
Furthermore, the system output was human-interpretable only after two days' work, whereas it is desirable to generate more transparent solutions to n-bit problems. (Some fields define TL as transferring the underlying model (Pan and Yang, 2010).) It is also important to discover ontologies of functions that will map to numerous, heterogeneous patterns in data at scale. This will aid in evolving a compact and optimal set of classifiers at each of the proposed steps (Price and Friston, 2005). This work requires hand-crafted layers, where it is usual for humans to specify the problems for learning systems.

In this paper, we aim to develop improvements to XCSCF* that enable it to solve more general Boolean problems using LL. The idea behind this system is still to learn progressively more complex problems using hand-crafted layers. For each tested problem domain, i.e. the Multiplexer, Carry-one, Majority-on, and Even-parity domains, we propose a series of subproblems to enable the LL system to evolve the complex logic behind the tested problems. The Multiplexer and Carry-one problems lend themselves to research because they are difficult, highly non-linear and exhibit epistasis. In the Multiplexer domain, the importance of the data bits is dependent on the address bits, while in the Carry-one domain, the first bits of the two half bitstrings occur more frequently with larger niches in the search space. The Majority-on domain is known for its highly overlapped niches, which tend to overwrite optimal rules with over-general ones. Lastly, the Even-parity domain usually requires complex combinations of input attributes to generalise.

The specific research objectives are as follows:

• Develop methods such that learned knowledge and learned functionality can be reused for Transfer Learning of Threshold Concepts through LL.

• Determine the necessary axioms of knowledge, functions and skills needed for any system from which to commence learning.

• Demonstrate the efficacy of the introduced methods in complex domains, i.e.
the Multiplexer, Carry-one, Majority-on, and Even-parity domains.

It is hypothesised that crafting solutions to low-scale problems that scale to any problem in a domain is more plausible and practical than tackling each individually large-scale problem. This is considered a necessary step towards continuous learning systems, which will transition from interrogatable Boolean systems to practical real-world classification tasks.
Figure 1 depicts the main highlights of XCS, a Michigan-style LCS developed by Wilson (Wilson, 1995). On receiving a problem instance, a.k.a. an environment state, a match set [M] of classifiers that match the state is created from the rule population. Each available action from the match set is assigned a possible payoff. Based on this array of predicted payoffs, an action is chosen. The chosen action is used to form an action set [A] from the match set. The system executes the chosen action and receives a corresponding reward ρ. The action set is updated regarding the reward and the Genetic Algorithm (GA) may be applied (Wilson, 1995; Butz and Wilson, 2000). Subsumption takes place before the offspring are added to the population. If the new population size exceeds the limit, classifiers are chosen to be deleted until the population size is within the valid size.

Figure 1: XCS framework showing the processes in the main loop.

XCS differs from its predecessors in a number of key ways: (1) XCS uses the prediction accuracy to estimate rule fitness, which promotes a solution encompassing a full map of the problem via accurate and optimally general rules; (2) evolutionary operations operate within niches instead of the whole population; and (3) unlike the traditional LCS, XCS has no message list and therefore it is suitable for learning Markov environments only (Butz and Wilson, 2000; Wilson, 1995).

Using Reinforcement Learning (RL), XCS guides the population of classifiers towards increasing usefulness via numerous parameters, e.g. fitness. The main uses of this mechanism are to: 1) identify classifiers that are useful in obtaining future rewards; and 2) encourage the discovery of better rules (Urbanowicz and Moore, 2009). RL acts independently of covering, whereby if there is an empty match set, new rules are created to match the new situation (Bull, 2015). The rules or classifiers are composed of two main parts, the condition and the action. Originally the condition part utilised the ternary alphabet {0, 1, #} and the action part utilised the binary alphabet {0, 1} (Urbanowicz and Browne, 2017).

LCSs can select/deselect features using generality through the "don't care" operator. Originally the don't care symbol was a '#', giving the ternary condition alphabet {0, 1, #} (Holland, 1976). Since the initial introduction of LCSs, the number of applicable alphabets has been expanded to include more representations such as Messy Genetic Algorithms (mGAs), S-Expressions, Automatically Defined Functions, and Code Fragments. A Code Fragment (CF) is an expression, similar to a tree generated in Genetic Programming (Iqbal et al., 2012). CFs generate small blocks of code in binary trees with an initial maximum depth of two. CFs have also been expressed using sub-trees with more than two children, with varying degrees of success (Alvarez et al., 2014b). The initial depth was chosen, based on empirical evidence, to limit bloating caused by the introduction of large numbers of introns. Analysis suggests that there is an implicit pressure for parsimony (Iqbal et al., 2013c).

LCSs based on CFs can reuse learned information to scale to problems beyond the
capabilities of non-scaling techniques. One such technique is XCSCFC. This approach uses CFs to represent each condition bit, enabling feature construction in the condition of the classifiers. The action part uses the binary alphabet {0, 1} (Iqbal et al., 2014). An important benefit inherent in CFs is the decoupling between a CF and a position within the condition, i.e. the ordering of the CFs is unimportant. High-level CFs can capture the underlying complex patterns of data, but also pose a large search space. The recent LCS, XOF, introduced the Observed List to enable learning useful high-level CFs in rule conditions (Nguyen et al., 2019b,a). Another way to capture the complex patterns of data is to utilise CFs as rule actions while keeping the ternary representation for rule conditions (Iqbal et al., 2013a).

Previously it has been shown that rule-sets learned by a modified CF-based LCS system, termed XCSCF, can be reused in a manner similar to functions and their parameters in a procedural programming language (Alvarez et al., 2014a). These learned functions then become available to any subsequent tasks. These functions are composed of previously learned rule-sets that map inputs to outputs, which is a straightforward reformatting of the conditions and actions of rules:

'If <Conditions> Then <Actions>'  (1)

'If <Input> Then <Output>'  (2)

Function(Arguments: <Input>, Return: <Output>)  (3)

Eq. 1 is the standard way that a classifier would process its conditions to achieve an action, which is analogous to Eq. 2. Eq. 3 is the analogy of a function. These functions will take a number of arguments as their input (rule conditions) and will return an output (the effected action of the ruleset) (Alvarez et al., 2014a).

The technique used in XCSCF places emphasis on user-specified problems, rather than user-specified instances, which is a subtle but important change in emphasis in EC approaches. That is, the function set is partly formed from past problems rather than preset functions. The advantage of learning functions is that the related CFs (associated building blocks) are also formed and transferred to the new problem, which can bootstrap the search. However, this technique lacks a rich representation at the action part, which will need adapting due to the different types of action values expected in this current work, e.g. binary, integer, and bitstring.

An early attempt at scaling was the S-XCS system, which utilises optimal populations of rules that are learned in the same manner as in classical XCS (Ioannides and Browne, 2008). These optimal rules are then imported into S-XCS as messages, thus enabling abstraction. The system uses human-constructed functions, such as Multiply, Divide, PowerOf, ValueAt, and AddrOf, among others (Ioannides and Browne, 2008). Although these key functions provide the system with the scaffolding to piece together the necessary knowledge blocks, they have an inherent bias and might not be available to the system in large problem domains. For example, in the Boolean domain, the log and multiplication functions do not exist. It also assumes completely accurate populations, whereas the proposed system is required to learn both the population and functionality from scratch.
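The reformatting in Eqs. (1)-(3) can be sketched in code: a learned ruleset of 'If <Input> Then <Output>' rules over the ternary alphabet {0, 1, #} is wrapped as a callable function. The 3-bit Multiplexer ruleset below is an illustrative example of such an optimal population, not a ruleset reported by the authors.

```python
# Sketch of Eqs. (1)-(3): reusing a learned ruleset as a function.
# Conditions use the ternary alphabet {0, 1, #}, where '#' is "don't care".
def matches(condition, state):
    """True if every condition symbol is '#' or equals the state bit."""
    return all(c in ('#', s) for c, s in zip(condition, state))

def ruleset_to_function(rules):
    """Wrap 'If <Input> Then <Output>' rules as Function(Input) -> Output."""
    def fn(state):
        for condition, action in rules:
            if matches(condition, state):
                return action
        raise ValueError("no matching rule")  # XCS would invoke covering here
    return fn

# Illustrative optimal ruleset for the 3-bit Multiplexer (1 address bit).
mux3 = ruleset_to_function([("00#", 0), ("01#", 1), ("1#0", 0), ("1#1", 1)])
print(mux3("011"))  # address bit 0 selects the first data bit -> 1
```

Once wrapped this way, the function can be called by any subsequent learning task, which is the sense in which learned rulesets become reusable functions.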
If supervised learning is permitted (unlike in this work), the heterogeneous approach of ExSTraCS scales well, up to the 135-bit Multiplexer problem (Urbanowicz et al., 2012).
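To make the CF representation discussed above concrete, the following is a minimal sketch of a depth-two Code Fragment: a GP-like binary tree whose leaves index input bits and whose internal nodes apply Boolean operators. The operator set and tuple encoding here are illustrative assumptions, not the authors' implementation.

```python
# A depth-two Code Fragment (CF) as a GP-like binary tree.
# Internal nodes are (operator, left, right); terminals are bit indices.
AND = lambda a, b: a & b
OR = lambda a, b: a | b
NAND = lambda a, b: 1 - (a & b)

def eval_cf(node, state):
    """Evaluate a CF tree against a bitstring state such as '110010'."""
    if isinstance(node, int):                 # terminal: index into the input
        return int(state[node])
    op, left, right = node                    # internal node of the tree
    return op(eval_cf(left, state), eval_cf(right, state))

# A depth-two CF: NAND(AND(b0, b1), OR(b2, b3)).
cf = (NAND, (AND, 0, 1), (OR, 2, 3))
print(eval_cf(cf, "110010"))  # bits b0..b3 = 1,1,0,0 -> NAND(1, 0) = 1
```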
Evolutionary Computation Volume x, Number x
Previously, other Boolean problems have been solved successfully by using techniques similar to the proposed work. One of these is a general solution to the Parity problem described in (Huelsbergen, 1998). The technique is similar to the proposed work because it evolves a general solution that is capable of solving parity problems of any length. It can also address repeating patterns, similar to the loop mechanism of the proposed work. On the other hand, this technique makes use of predefined functions, making it a top-down approach. The proposed technique learns new functions, making it more flexible.

The preliminary work proposed XCSCF* with its various components (Alvarez et al., 2016). Since different types of actions are expected, e.g. Binary, Integer, Real, and String (Bitstring), it is proposed that the functions be created by a system with CFs in the action (XCSCFA), although any rule production system can also be used, e.g. XCS, XCSCFC, etc. This will facilitate the use of real and integer values for the action as well as enabling it to represent complex functionality. The proposed solution will reuse learned functionality at the terminal nodes as well as the root nodes of the CFs since this has been shown to be beneficial for scaling. XCSR would not be helpful here because in a number of the steps the permitted actions are not a number but a string, e.g. kBitString. Moreover, XCSR with Computed Continuous Action would present unnecessary complications to the work because the conditions of the classifiers do not require real values (Iqbal et al., 2012). Accordingly, it is necessary to explore further ways to expand the preliminary work to adapt to different domains.
In this section, we provide an analytical introduction to the tested problems that enables the training flow in layers. These flows help formalise the intermediate layers in Section 3.4. The problem understanding also provides an initial guess of the required building blocks (functions and skills) that should be provided beforehand to bootstrap the learning progress of the system. Although even these pre-provided building blocks could be divided into more elemental knowledge, this work does not aim to imitate the education of machine intelligence from scratch. Instead, this paper aims to show the ability of XCSCF* to learn progressively more complex tasks, which resembles human intelligence.

One of the underlying reasons for choosing the Boolean problem domains for the proposed work is that humans can solve this kind of problem by naturally combining functions from other related domains along with functionalities from other Boolean problems. Humans are also able to reason that some functions in their 'experiential toolbox' may be appropriate for solving the problem. The experiential toolbox is the whole of the learned functionality of the agent. These functions include multiplication, addition, power, and the notion of a number line. Therefore, the agent here must build up its toolbox of functions and associated pieces of knowledge (CFs). A computer program would make use of these functions and potentially many more, but it cannot intuit which are appropriate to the problem and which are not. Therefore, the agent will need guidance in its learning so that it may have enough cross-domain functions to solve the problem successfully. It will need to perform well with more functions than necessary, as the exact useful functions may not be known a priori. However, at this stage of paradigm development, the agent is not expected to be able to adjust to fewer functions than necessary.
The other reason is that Boolean problems are interrogatable, so that solutions to problems at scales beyond enumeration can still be verified.
In the Multiplexer problems, the number of address bits is related to the length of the message string and grows along with the length. The search space of the problem is also large enough to show the benefits of the proposed work. For example, for the 135-bit Multiplexer the search space consists of 2^135 combinations, which is immensely beyond enumerated search (Koza, 1991).

An example of a 6-bit Multiplexer is depicted in Figure 2. Determining the number of address bits k requires using the log function, as depicted in Equation 4; in this example k is 2. Then k bits must be extracted from the string of bits to produce the two address bits. The next step is to convert the address bits into decimal form; this requires knowledge of the power base 2 function as well as elementary looping, addition and subtraction functions. Depending on the approach to this step, multiplication may also be required. The two address bits translate to 1 in decimal form, as shown in Figure 2. The decimal number points to the data bit D1 that contains the value to be returned. The index begins at 0 and proceeds from the left towards the right, as shown in Figure 2.
Figure 2: 6-bit Multiplexer problem showing the address bits and the data bits of the condition; this distinction is not provided to the learning system.

Besides functions, the experiential toolbox will also contain skills. These are capabilities that the agent will have learned or will have been given beforehand; one example is the looping skill. Skills, unlike functions, do not have a return value, but can manipulate pointers to values (e.g. move around a bitstring). For example, a human understands all the operations required for counting k number of bits, starting from the left of the input string. Then a human would have to conceptualise how to convert the address bits to decimal, which requires the ability to multiply and add. If we wanted to increase the difficulty level, we could have the human determine the number of k address bits required for a particular problem:

k = ⌊log₂ L⌋  (4)

Equation 4 determines the number of k address bits by using the length L of the input. In this case the person would need familiarity with the log base 2 function as well as the floor function. A human would eventually determine the address bits with increasing difficulty, but a software system would have to learn this functionality before even attempting to solve the n-bit Multiplexer problem.
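The general Multiplexer logic described above can be sketched directly: Equation 4 gives the number of address bits, which are then decoded to select a data bit. This is a reference implementation of the target behaviour, not the evolved rulesets themselves.

```python
import math

# Sketch of the general n-bit Multiplexer logic: k = floor(log2(L)) address
# bits are decoded to select one of the remaining data bits.
def multiplexer(state):
    L = len(state)                    # length of the environment state
    k = math.floor(math.log2(L))     # Eq. (4): number of address bits
    address = int(state[:k], 2)      # extract k bits, convert binary to decimal
    return int(state[k + address])   # value at the selected data bit position

print(multiplexer("011100"))  # 6-bit case: address '01' -> data bit D1 -> 1
```

The same function handles any valid Multiplexer length (3, 6, 11, 20, ..., 135 bits), which is the sense in which a single general solution covers the whole domain.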
The Carry-one domain is the set of problems that check whether the addition of two binary numbers carries a one at the addition of the highest-level bits of the two numbers. Binary numbers are represented by bitstrings. The input of Carry-one problems is a bitstring concatenating the bitstrings of the two binary numbers to be added.

Humans can approach this problem in various ways. In this work, we design a training flow to check whether the summation of the two binary numbers, as a bitstring, has a length greater than the length of the two half-bitstrings representing the two binary numbers. First, the learning agent should learn to detach the two half-bitstrings to obtain the two binary numbers. However, the learning agent has no idea about which parts represent the bitstrings of the two binary numbers. Therefore, the first training step is to teach the learning agent to obtain the half length of the input attributes. Then, the next step is to train the agent to extract the two half-bitstrings as the two binary numbers to be added, with the knowledge of how many bits are to be used. After that, the learning agent is required to obtain the bitstring representing the result of the binary summation. Finally, the last training stage is to check whether a bit is carried at the highest-level bit. Humans can anticipate that the solutions for these processes would possibly require the following skills and functions: binary addition, head list extraction, tail list extraction, value comparison, division, and constant (the "Half Length" problem would require a constant of value 2).

Even-parity problems check whether the number of 1s in the input bitstring is even. The operations of this problem domain are straightforward. We devised the training flow with two steps in Section 3.4, comprising the "Sum Modulo 2" and "Is Even-parity" problems. For the Majority-on domain, the learning agent is asked to check whether the majority of bits in the input are 1s.
We can anticipate that this problem domain would involve the subproblem "Half Length" from the Carry-one domain. The second training step is to teach the learning agent to compare the summation of 1 bits with the output of the "Half Length" problem. These two domains would be expected to require summation of a bitstring, modulo, constant, and comparison skills.

According to the analysis above, the probable methods of separating the Multiplexer and Carry-one domains are shown in Figure 3 and Figure 4 respectively. These training flows require a "human teacher" to form these "curricula". The training flow of the Multiplexer domain has five main steps corresponding to the first six subproblems listed below, while the Carry-one domain is divided into six other subproblems, from the "Half Length" to the "Is Carried" problems. Each subsequent part builds upon the rules learned from the previous step as well as from the Axioms provided. Figure 5 illustrates the relationships between the Axioms, skills and learned functionality and their CF representation in Multiplexer subproblems. The figure also depicts how the type of problem faced can feed domain-specific functionality into the experiential toolbox of the system. This is shown by the arrow flowing from the Multiplexer domain towards the Experiential Toolbox. All subproblems for the four benchmark problems are described below, with samples provided in the Supplementary material. Table 2 shows the set of functions to be learned; note that these were furnished in order, as a curriculum. These functions correspond to the subproblems of the curriculum.
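The ground truths for the Even-parity and Majority-on domains, as analysed above, can be sketched as follows; function names mirror the subproblems but the code is an illustrative reference, not the learned rulesets.

```python
# Ground truths for the Even-parity and Majority-on domains, following the
# training flows described above.
def sum_mod_2(state):
    """The "Sum Modulo 2" subproblem: summation of all bits, modulo 2."""
    return sum(int(b) for b in state) % 2

def is_even_parity(state):
    """The "Is Even-parity" subproblem: True when the count of 1s is even."""
    return sum_mod_2(state) == 0

def is_majority_on(state):
    """Majority-on: compare the summation of 1 bits with the half length."""
    return sum(int(b) for b in state) > len(state) / 2

print(is_even_parity("1010"))   # two 1s -> True
print(is_majority_on("11010"))  # three of five bits set -> True
```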
Figure 3: Multiplexer training flow. Each stage of this flow is designed to obtain an aspect of the logic behind the Multiplexer domain. These stages follow the analysis of the Multiplexer problem in Section 3.

At each step, the system has access to leaf-node candidates, hard-coded functions, the learned CFs, and learned CF-ruleset functions. Table 1 shows a listing of all the skills made available to the system along with their system tags (used to interpret results) and their input/output data types. This function list is anticipated to be useful based on the above analysis of the benchmark problem domains. In addition to the required skills, we also provide extra skills that complement the anticipated ones. These skills could possibly provide unexpected solutions or at least test the ability of XCSCF* to ignore redundant or irrelevant skills.

It is important to note that the work presented here does not seek to provide a learning plan for a system to follow and ultimately arrive at the solution to a given problem. The aim here is to facilitate learning in a series of steps, where in this case the learned functionality could potentially help a system to arrive at a general solution to any set problem. In other words, it is important for the system to learn to mix and match the different learned functions in a way that contributes to learning; a way that will produce a general solution. The number of subordinate problems can always be increased in the future, e.g. learning basic functions such as an adder or a multiplier via Boolean functions, or even learning the log function via training data.
Multiplexer Address Length - kBitsGivenLength
The first step is to determine the number of k address bits that will contribute to the solution for the n-bit Multiplexer. The length function (Table 1) furnishes the system with the length of the environment state instead of the constant L in the previous version of XCSCF* (Alvarez et al., 2016). The training data-set used consists of instances of possible lengths and the corresponding number of address bits.
Figure 4: Carry-one training flow.

Table 1: Functionality Provided (hard-coded functions)

Functions | Tags | Input | Output
Floor | [ | Float | Integer
Ceiling | ] | Float | Integer
Log | { | Float | Float
Length | L | String | Integer
Power 2 Loop (binary to decimal) | d | String | Integer
Add | + | Floats, Integers | Integer
Subtract | − | Floats, Integers | Float
Multiply | ∗ | Floats | Integer
Divide | / | Floats | Integer
ValueAt | @ | String, Integer | Binary
Constant | c | None | Integer
StringSum | sum | String | Integer
BinaryAddition | ⊕ | Strings | String
BinarySubtraction | ⊖ | Strings | String
HeadList | ( | String, Integer | String
TailList | ) | String, Integer | String
isGreater | > | Floats, Integers | Binary
Modulo | % | Integers | Integer
Figure 5: Training encompasses different types of functions, skills and axioms. The experiential toolbox will contain general and problem-specific learned functionality. The question marks indicate the next domain and the functionality learned from it.

Table 2: Functions to be learned.

Functions | Tags | Input | Output
KBitsGivenLength | k_l | Integer | Integer
KBits | k | String | Integer
KBitString | k_s | String | Integer
Bin2Int | b_d | String | Float
AddressOf | d_c | String | Float
ValueAt_M | @ | String | Binary
HalfLength | h | Integer | Integer
HeadString | S_h | String | String
TailString | S_t | String | String
BinarySum | S_+ | String | String
LengthBinarySum | L_+ | String | Integer
isCarried | iC | String | Boolean
SumMod2 | sm | String | Integer
isEvenParity | iP_e | String | Boolean
isMajorityOn | iM | String | Boolean
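As an illustration of the first curriculum step (KBitsGivenLength), a training set of possible Multiplexer lengths and their address-bit counts can be constructed from the identity L = k + 2^k; the exact dataset used by the authors is not detailed here, so this construction is an assumption.

```python
import math

# Illustrative "kBits given Length" training set: valid n-bit Multiplexer
# lengths L = k + 2^k paired with the target number of address bits k.
training_set = [(k + 2 ** k, k) for k in range(1, 8)]
print(training_set)  # [(3, 1), (6, 2), (11, 3), (20, 4), (37, 5), (70, 6), (135, 7)]

# On these lengths the target function coincides with k = floor(log2(L)),
# i.e. Eq. (4) from the Multiplexer analysis.
assert all(math.floor(math.log2(L)) == k for L, k in training_set)
```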
Multiplexer Address Length - kBits
This step is to determine the number of k address bits when the input is the original input of the Multiplexer problem. The training dataset in this problem replaces the input lengths of the previous "kBits given Length" problem (k_l) with the input bitstring of the Multiplexer problem at various scales.
Multiplexer Address Bits - kBitString
This part extracts the first k bits from a given input string. The data-set will be random bit strings, say of length 6, and a given k length, where the action is the first k bits.

Multiplexer Data Channel - Bin2Int
This problem trains the learning agent to convert a binary number to a decimal integer. This is crucial because the system needs this information to determine the position of the data bit. However, this is not a trivial task, as the system would need to be cognizant of many functions that a human would potentially already have in their experiential toolbox. The data-set will be random strings with the action being the equivalent integer number.
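The target behaviour of this subproblem can be sketched with the elementary looping, power-of-2 and addition skills from Table 1; this is a reference implementation of the ground truth, not the learned ruleset.

```python
# Sketch of the Bin2Int ground truth: decoding a bitstring to a decimal
# integer using only looping, doubling (power of 2) and addition.
def bin2int(bits):
    value = 0
    for b in bits:                 # looping skill: move along the bitstring
        value = value * 2 + int(b) # shift by a power of 2 and add the bit
    return value

print(bin2int("101"))  # -> 5
```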
Multiplexer Data Bit Position - AddressOf
This functionality determines the location of the data bit given the input bitstring. This problem is to guide the learning agent to discover the addition of the address length and the decoded data channel. The data-set will be random strings and decoded addresses with the integer action.
Multiplexer Data Bit - ValueAt
The functionality to be learned is to return the referenced bit from a bitstring. The system is trained using a dataset of bitstrings of varied lengths with a reference integer and the corresponding output bit. This problem is actually a Multiplexer problem with variable scales.

Sum Modulo 2 - SumMod2
This problem is the first step in training the Even-parity problem domain. It determines whether the sum of the bits in the input bitstring is even or odd. The ground truth of this problem is the summation of all bits in the input bitstring, modulo 2.

Is Even-parity - isEvenParity
This step is the final step of training the Even-parity domain. This problem expects True if the number of on bits in the input bitstring is even and False otherwise. It is variable in size (from 1-bit to 11-bit Even-parity problems) to encourage only general solutions for the Even-parity problem domain. With varied scales, the solution can solve the problem at any scale. Even-parity problems at relatively small scales are already intractable for traditional XCS with the ternary encoding, because XCS must form a one-to-one mapping of instances to rules.
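The two parity subproblems admit a compact ground truth; a Python sketch of the relationship between them (our illustration):

```python
def sum_mod_2(bits):
    """SumMod2: sum of all bits modulo 2 (0 when the parity is even)."""
    return sum(int(b) for b in bits) % 2

def is_even_parity(bits):
    """isEvenParity: True when the number of on bits is even."""
    return sum_mod_2(bits) == 0
```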
Half Length - HalfLength
This is a regression problem, requiring the learning agent to return half the length of the input bitstring. It provides prerequisite knowledge for both the Carry-one and Majority-on domains.
Head String - HeadString
This is a step in the training flow of the Carry-one domain; see Figure 4. It trains the learning agent to obtain the first number of the addition, which is the first half of the input bitstring. The outputs are binary numbers represented by bitstrings.
Tail String - TailString
This step is similar to the “Head String” problem, but the expected output is the latter half of the input bitstring, i.e. the second number of the addition.
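The three string-manipulation subproblems above can be sketched together (our Python illustration, assuming even-length inputs as in the Carry-one domain):

```python
def half_length(bits):
    """HalfLength: half the length of the input bitstring."""
    return len(bits) // 2

def head_string(bits):
    """HeadString: the first half of the input (the first operand)."""
    return bits[:half_length(bits)]

def tail_string(bits):
    """TailString: the latter half of the input (the second operand)."""
    return bits[half_length(bits):]
```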
Layered Learning Approach for LCSs
Binary Summation of Two Strings - BinarySum
This problem requires the learning agent to add the outputs of the two preceding problems, which are the two input numbers of the Carry-one domain, also represented by bitstrings.
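The target functionality is ordinary binary addition; a minimal sketch (ours, not the evolved rule-set):

```python
def binary_sum(a, b):
    """BinarySum: add two binary strings; the result is a binary string."""
    return bin(int(a, 2) + int(b, 2))[2:]
```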
Length of Binary Sum - SumStringLength
The expected output of this problem is the length of the output of the preceding problem. This teaches the system to predict the length of the binary number resulting from adding the binary number of the first half to the binary number of the second half.
Is Carry-one - isCarried
This requires the general logic behind the Carry-one problem domain: determining whether a one is carried out of the highest bit when adding the binary number of the first half to the binary number of the second half. The scales of this problem were set to vary from 2-bit to 12-bit.

Is Majority On - isMajorityOn
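A carry out of the highest bit occurs exactly when the sum is longer than each half, which mirrors how the learned solution reuses SumStringLength and HalfLength; a hedged sketch (our Python, not the evolved CF tree):

```python
def is_carried(bits):
    """isCarried: True when adding the first half to the second half
    carries out of the highest bit, i.e. the sum is longer than a half."""
    half = len(bits) // 2                                        # HalfLength
    total = bin(int(bits[:half], 2) + int(bits[half:], 2))[2:]   # BinarySum
    return len(total) > half                                     # SumStringLength > HalfLength
```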
This problem is the final stage of training the Majority-on domain. It expects a returned value of True if more than half the bits in the input are on (1), and False otherwise. The sizes of the input bitstrings are randomly selected (from 1 bit upward).

This work disrupts the standard learning paradigms in EC, where the goal is to learn abilities using a top-down approach, by aligning it with LL. The proposed work uses a bottom-up approach, learning functions and using parts of functions, or entire functions, to solve more difficult problems. In other words, the method here is to specify the order of problems/domains (together with robust parameter values) while allowing the system to automatically adjust the terminal set through feature construction and selection, and ultimately develop the function set. This is analogous to a school teacher determining the order of threshold concepts for a student in a curriculum (Meyer and Land, 2006). The system can use learned rule-sets as functions, along with the associated building blocks, i.e. CFs, that capture any associated patterns; this is an advantage over pre-specifying functionality.

This method changes the intrinsic problem from finding an overarching ‘single’ solution that covers all instances or features of a problem to finding the structure (links) of sub-problems that constructs the overall solution. Learning the underlying patterns that describe the domain is anticipated to be more compact and reusable, as these patterns do not grow as the domain scales (unlike individual solutions, which can grow impractically large as the problem grows, e.g. DNF solutions to the Multiplexer problem).

We employed an adapted XCSCFA as the algorithm for the agents of XCSCF* that learn subproblems. This XCSCF* has the type-fitting property, which ensures that: (1) the type compatibility between connected nodes within generated CFs is verified; and (2) the output type of CFs is compatible with the required actions of the current problem environment.
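As with the other domains, the Majority-on target described above admits a one-line ground truth (our illustrative Python, not the evolved solution):

```python
def is_majority_on(bits):
    """isMajorityOn: True when more than half of the input bits are on (1)."""
    return 2 * bits.count("1") > len(bits)
```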
There are sufficient novel contributions to XCSCF* to warrant a new acronym, but as the old one is now superseded and the LCS field already has many acronyms, XCSCF* is retained.
Table 3: Leaf node candidates.
LEN is the length of the input bitstring.

Leaf node candidates              Tag             Type
Base CFs of separated attributes  D_0, D_1, etc.  type of attributes
List of all attributes            attlst          String
Constants                         1, 2, ..., LEN  Integer
In addition to type-fitting CFs, the next important adjustment is that pre-provided CFs have been introduced to this version of XCSCF*. Table 3 describes the pre-provided CFs, which are all candidates for the leaf nodes of CFs. First, the leaf-node candidates include the CFs representing input attributes, called “base CFs”, which resemble the base CFs in XOFs (Nguyen et al., 2019b).

In the previous version of XCSCF* (Alvarez et al., 2016), a constant L for the length of input bitstrings was provided as a possible leaf node for CFs. This feature is infeasible when the subproblems have variable scales. Therefore, the second change to XCSCF* is to provide another base CF listing all attributes (termed attlst) in the order provided by the problem. We hypothesise that this new feature improves the generality of the system by providing inborn knowledge. Lastly, to learn more general problems, it is necessary to provide the system with arbitrary constants. For Boolean problems, CF generation can access constants (constant CFs) of values up to the ‘length of the current input attributes’ as possible leaf nodes. The constant L of the previous implementation can be obtained via the provided function Length in Table 1.
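The leaf-node candidate set of Table 3 can be sketched as follows; this is an illustrative data structure with our own naming, not the authors' implementation:

```python
def leaf_candidates(attributes):
    """Leaf-node candidates per Table 3: base CFs (one per attribute),
    the whole attribute list `attlst`, and integer constant CFs up to LEN."""
    base_cfs = [("D%d" % i, a) for i, a in enumerate(attributes)]
    constants = [("const", c) for c in range(1, len(attributes) + 1)]
    return base_cfs + [("attlst", attributes)] + constants
```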
Type-fitting Code Fragments
Inspired by Strongly Typed GP (Montana, 1995), we propose here the type-fitting property for CFs, which reduces the search space by fitting each node with only compatible inputs and outputs. CFs with the new type-fitting property, called typed CFs, are designed to be workable and eligible. Being workable refers to the compatibility of the output of CFs with the expected actions of the target problem and the compatibility among the function nodes of a CF: the output type of the function in node cf_i must be compatible with the input types of the function in node cf_j that takes cf_i as input. Being eligible includes two conditions: the output type of the root-node function must be compatible with at least one of the action types of the problem, and the leaf nodes must be CFs from the leaf-node candidates. For example, when selecting a leaf node as the first input to a function that requires the first input to be of type String (e.g. sum, @, d, etc., see Section 3.4), the only compatible leaf-node candidate is attlst. Ultimately, the type-fitting property keeps learning agents from generating unworkable CFs.

Accordingly, generating typed CFs applies a top-down recursive process of generating tree nodes, i.e. the function genNode illustrated in Algorithm 1. We keep the depth limit of CFs at 2, as in the original definition of CFs (Iqbal et al., 2014). Generating a new CF needs to match the action types of the problem and the available output types from the leaf-node candidates. First, the top node of a typed CF must employ a function with output types compatible with the action types of the problem. Then the process recursively builds lower-level nodes that satisfy the type-fitting property. At any point when generating nodes, there is also a fixed probability of generating a leaf node from the leaf-node candidates, which stops the CF from going any deeper.

To reduce the search space and generate verifiable CFs, it is necessary to have
Algorithm 1
Typed CFs are generated by a recursive node-generating function, illustrated in Algorithm 1. The function is given the set of action types T_a, the type set of base CFs T_b, the expected output types T_o, the expected input types T_i, the intermediate level l_i, and a clustered set of all functions S_f.

procedure GenNode(T_o, T_i, l_i)
    T'_i = φ
    if l_i = 2 then set the output types T_o = T_a
    if l_i = 1 then set T'_i = T_b ∪ {Integer}
    filter the function set S_filtered from S_f by the required output types T_o and input types T_i
    f = randomSelect(S_filtered)
    for index i in f.inputs do
        if l_i − f.level > 0 and random[0, 1) < the leaf probability then
            f.inputs[i] = GenNode(f.input_types[i], T'_i, l_i − f.level)
        else
            S_bCF = φ                        (set of compatible base CFs)
            if Integer ∈ f.input_types[i] then
                c = randomSelect([1, ..., len(Atts)])
            for cf_base in all base CFs do
                if cf_base.out_types ∩ f.input_types[i] ≠ φ then
                    add cf_base to S_bCF
            f.inputs[i] = randomSelect(S_bCF)

compatibility rules among the four value types (Binary, Integer, Real, and String). We followed the sense of numerics, as well as the type compatibility of the implementation language (Python), to devise compatibility rules among types. Boolean variables are compatible with integers and floats, and integers are compatible with floats; the compatibility does not hold in the opposite direction. Lists are not compatible with other types.

The experiments were executed repeatedly, with each run having an independent random seed. The stopping criterion was when the agent completed the number of training instances allocated, which was chosen based on preliminary empirical tests on the convergence of the systems. The proposed systems were compared with XCSCFC and XCS. The settings for the experiments are common in the LCS field (Urbanowicz and Browne, 2017) and similar to the settings of the previous version of XCSCF* (Alvarez et al., 2016). They were as follows: the payoff; the learning rate β; the probability of applying crossover to an offspring χ; the probability of mutation µ; the probability of using a don’t-care symbol when covering P_don’tCare; the experience required for a classifier to be a subsumer Θ_sub = 20; the initial fitness value when generating a new classifier F_I; the fraction of classifiers participating in a tournament from an action set; and the error threshold ε_0. This new XCSCF* naively uses the same population size N = 1000 for all problems.

Figures 6a-6f show that training was successful on the sub-problems, which enabled XCSCF* to reuse the learned CF functionality of the Multiplexer problem. XCSCF* also successfully solved the subproblems of the Carry-one domain (Figures 7a-7f), the Even-parity domain (Figures 8a and 8b), and the Majority-on domain (Figures 7a and 8c) (note the use of the HalfLength problem twice). The numbers of rules after compaction and CFs generated for all problems were generally only one, except for the “Is Even-parity” problem, which showed a little diversity in the genotypes of the final solutions (see Section 5.3). Reusing solutions from small-scale problems to solve large-scale problems is plausible because maximally general rules remain general, without specific condition bits, when used in larger-scale problems. This requires the logic behind the rule actions of the final solutions to be generalisable to the learned problems.

Figure 6: Learning curves of the subproblems of the Multiplexer domain: (a) kBitsGivenLength, (b) kBits, (c) kBitString, (d) Bin2Int, (e) AddressOf, (f) ValueAt.

Figure 9 shows that only the proposed system XCSCF* and XCSCFC were able to solve the 135-bit Multiplexer problem. These experiments followed the standard explore and exploit phases of XCS. This shows scaling by relearning, but it is the capturing of the underlying patterns without retraining that is the aim of this work.
Figure 7: Learning curves of the subproblems of the Carry-one domain: (a) HalfLength, (b) HeadString, (c) TailString, (d) BinarySum, (e) SumStringLength, (f) isCarried.

Tests were conducted on the final rules produced by the final subproblems of the Multiplexer, Carry-one, Even-parity, and Majority-on domains to determine whether they were general enough to solve the corresponding problems at very large scales. Table 4 shows that the rule produced by the small-scale Multiplexer problem was able to solve the 1034-bit and even the 8205-bit Multiplexer problems. Similarly, the final rules of the final subproblems of the other domains achieved the same accuracies on the corresponding problems at all tested large scales. The system used to test the generality of the rules ran in straight exploitation: there was no covering, rule generation, or rule update. (Note that this is a vast number of possible instances, meaning that testing a million instances is a fractionally small sub-sample, but one that will identify many deficiencies.)

Figure 8: Learning curves of the subproblems of the Even-parity ((a) SumModulo2 and (b) isEvenParity) and Majority-on ((c) isMajorityOn) domains. The Majority-on domain also utilises the “Half Length” subproblem of the Carry-one domain (Figure 7a).

Table 4: Accuracy tested on large-scale problems, reusing solutions from final subproblems without training.

Problems               Accuracies
1034-bit Multiplexer   …

The rule produced by the “Multiplexer Data Bit” problem, the final subproblem of the Multiplexer domain, is illustrated in Table 5. This rule is maximally general, with no specified bit in its condition. It also appears very simple and neat for the general logic of the Multiplexer domain, which is not surprising given the functions accumulated in the experiential toolbox. The fully expanded tree in the rule action produced by the “Multiplexer Data Bit” problem is illustrated in Figure 10. Function nodes follow the function tags in Table 1. The dashed boxes in this figure are the reused learned functions, with names defined in Figure 2.

The tree in Figure 10 is the rule action of the single compacted rule for the n-bit Multiplexer problem. It accumulates a high-level function with many nested layers of complexity. This complex tree can encapsulate the logic behind the n-bit Multiplexer problem through the guidance of all Multiplexer subproblems. For instance, the main building block of this tree is the code fragment CF, which provides the data-bit position in the input bitstring using the function d_c learned from the “Data Bit Position” problem. This function d_c is itself a complex function involving an addition (+) of the outputs of two reused functions within it: k from the “Multiplexer Address Length” problem and b2d from the “Multiplexer Data Channel” problem. The function b2d converts the binary-string output of the function k_s from the “Multiplexer Address Bits” problem to a decimal value. k_s returns the first “Multiplexer Address Length” bits from attlst (the input bitstring) using the function k. The function k is also a nested function, reusing the simpler function k_l from the “Multiplexer Address Length”
Figure 9: Performance of XCSCF* and XCSCFC on the 135-bit Multiplexer problem. A Wilcoxon signed-rank test shows no significant difference once converged.

problem (with the Multiplexer scale as the input). The block k is reused twice in the final solution M@ for the Multiplexer domain. The logic of the n-bit Multiplexer problem in the compacted rule with M@ in Table 5 was validated on the 1034-bit and 8205-bit Multiplexer problems (see Table 4).

Table 5: Final rule learned before compaction while solving the “Multiplexer Data Bit” problem, the final subproblem of the Multiplexer domain, cf. Figure 10.

Condition              Action
attlst (fully general)  CF M@

Other final rules of the Carry-one, Majority-on, and Even-parity domains also achieve maximal generality, with all “don’t-care” bits in the condition part. These rules were also validated on the corresponding domains at very large scales. The trees in the rule actions of these final rules are illustrated in Figures 11, 12, and 13, respectively. Besides the Multiplexer domain, the Carry-one problem domain requires six subproblems to obtain the final logic, which results in a highly complex rule action. The final function iC has five distinct nested functions within it and three occurrences of the function h. The complexity of the solution for the n-bit Carry-one problem is comparable to that of the solution for the n-bit Multiplexer.

As the training flows of the Majority-on and Even-parity domains are straightforward, XCSCF* also discovered simpler rule actions in the final rules. XCSCF* yielded several different solutions for the Even-parity domain. The two most popular ones are illustrated in Figure 13. Solution 1 in this figure appeared in most runs. Another solution, found in only two runs, is identical in logic to solution 2, except that the node c (a constant CF) is replaced with another CF that uses the division operator between c and a larger value.

Figure 10: Multiplexer solution (M@). Function nodes follow the tags in Table 1. This solution uses nested learned functionalities, shown in dashed boxes, which follow the tags in Table 2.

It can be said that the reason XCSCF* is capable of solving problems at a much larger scale than previously is that human knowledge separated the problem into appropriate, simpler sub-problems. Nevertheless, it is still a difficult task to learn each sub-task in such a manner that the learned knowledge/functionality can be transferred, and then to learn to combine these blocks effectively. It is considered that the solutions yielded by XCSCF* for the tested problem domains, i.e. the Multiplexer, Carry-one, Even-parity, and Majority-on domains, contain the general logic of these domains and can solve these problems at any scale.

The way humans select sub-problems is similar to how they select function sets in standard EC approaches: too few or inappropriate selections prevent effective learning, while too many unnecessary components can inhibit training. In these experiments, a number of redundant functions, such as the ceiling and the multiplication, and functions useful for only one specific problem domain, were never used by the final evolved solutions. XCSCF*, however, can identify the correct combination of accumulated knowledge to build complex solutions for the tested tasks.
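Reading Figure 10 bottom-up, the compacted rule's action corresponds to the following nesting of the learned functions. This is an illustrative Python rendering using our own naming; the evolved CF tree, not this code, is the actual solution:

```python
def multiplexer_solution(attlst):
    """M@: ValueAt(attlst, d_c), where d_c = k + b2d(k_s(attlst))."""
    def k_l(n):                  # "kBits given Length": address bits for scale n
        k = 0
        while k + 2 ** k < n:
            k += 1
        return k
    k = k_l(len(attlst))         # "Multiplexer Address Length" (k)
    k_s = attlst[:k]             # "Multiplexer Address Bits" (k_s)
    b2d = int(k_s, 2)            # "Multiplexer Data Channel" (b2d)
    d_c = k + b2d                # "Multiplexer Data Bit Position" (d_c)
    return attlst[d_c]           # "Multiplexer Data Bit" (ValueAt, M@)
```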
Figure 11: Carry-one solution (iC).
Figure 12: Majority-on solution (iM).

Two main components of XCSCF* enable it to solve the tested problems fully. First, the supply of constants furnishes the required functionalities in the Carry-one, Even-parity, and Majority-on domains, as shown in the CFs of the final solutions. Second, the availability of the CF attlst also contributes to solving the Carry-one, Majority-on, and Even-parity domains, because it provides appropriate input for general functions, e.g.
StringSum, HeadList, and TailList. It could be argued that the environment state could still be input implicitly to all such functions. However, this method creates the complication of defining the environment state when these functions are nested
Figure 13: The two most common solutions of the Even-parity domain ((a) Solution 1, (b) Solution 2). Solution 1 was discovered more often than solution 2.

in rule-set functions. Furthermore, deciding which functions should take the environment state by default and which functions should choose other string inputs requires extra human intervention. An extra benefit of using attlst is that XCSCF* can now solve variable-scale problems in the tested domains. Previously, supplying a constant L meant that the problem scale could not change.

It is evident that the proposed work has benefited from the transfer of learned information from each of the sub-problems. Reusing functionalities enables the system to achieve neat and abstract solutions, although these solutions are actually complex, yet without bloat, when fully expanded. Although a defined recipe was not furnished to the system, it was able to form logical determinations as to the flow of the accumulated functionality; see Figure 10. This property of the system is similar to deriving a set of Threshold Concepts, where significant learning towards the final target problem only advances once the proper chain of functionality is formed and evaluated.

In this paper, we have introduced a developed LL system, XCSCF*, that can transfer learned knowledge and functionality. Starting from minimal general knowledge (functions and skills) of the Boolean domain and some specific basic knowledge necessary for the target problems, XCSCF* is capable of learning general solutions to complex problem domains, i.e. the Multiplexer, Carry-one, Majority-on, and Even-parity domains, through analogies to the LL approach.
By breaking down the problem domain into component sub-problems, providing the necessary axioms, and transferring learned functionality in addition to knowledge, it is possible to identify general rules that can then be applied to any-scale problems in the domain. Another important observation is that not all of the provided functionality was utilised in the final solutions.

Certain improvements to XCSCF* have been developed to enable learning the logic behind more general problems, such as the Multiplexer, Carry-one, Majority-on, and Even-parity domains. Removing the implicit connections between the instance and a few functions requires explicit connections between such functions and a newly provided attribute list, a more general input that replaces the human-generated constant L. These explicit connections allow the functions to take any string-type inputs. Therefore, the new XCSCF* provides more flexible logic and reduces the need for customisation to a given task. Also, the type-fitting property ensures that generated CFs are compatible within themselves as well as with the target problem, which makes it possible to divide the search space by the input and output types of the available functions. Thus, this style of learning system can have access to more functionality than is necessary for a single problem without inhibiting learning.

The general solutions from XCSCF* were validated by solving very difficult problems: the n-bit Multiplexer, n-bit Carry-one, n-bit Even-parity, and n-bit Majority-on problems. Although these problems comprise vastly sized search spaces, the proposed technique successfully discovered a minimal number (mostly one) of general rules. An advancement of this work is that the logic of complex problems was captured by simple trees when described by the learned functionalities. However, once fully expanded, the CF trees contain certain complex nested patterns. Thus, LL can facilitate interpreting complex tree solutions using the components learned at the intermediate stages of LL.

Future work seeks to create a continuous-learning system with base axioms and a number of problems, including their possible subproblems, to be solved simultaneously in a parallel architecture. The ‘toolbox’ of functions (learned functions and axioms), plus the complementary knowledge (CFs), will grow as problems are solved and will be available for addressing future problems. The linked knowledge of solved problems would demonstrate interesting meta-knowledge, a form of learning curricula, and possible relationships among known problems, such as the n-bit Multiplexer, the n-bit Carry-one, etc. A further research question is whether an XCS-based system with LL or parallel learning can solve real-valued datasets. The first step to consider is establishing real-valued datasets that suit LL.

References
Alvarez, I. M., Browne, W. N., and Zhang, M. (2014a). Reusing learned functionality in XCS: Code fragments with constructed functionality and constructed features. In Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, GECCO Comp '14, pages 969-976, New York, NY, USA. Association for Computing Machinery.

Alvarez, I. M., Browne, W. N., and Zhang, M. (2014b). Reusing learned functionality to address complex Boolean functions. In Simulated Evolution and Learning, Lecture Notes in Computer Science, pages 383-394. Springer International Publishing.

Alvarez, I. M., Browne, W. N., and Zhang, M. (2016). Human-inspired scaling in learning classifier systems: Case study on the n-bit multiplexer problem set. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, GECCO '16, pages 429-436, New York, NY, USA. Association for Computing Machinery.

Bull, L. (2015). A brief history of learning classifier systems: from CS-1 to XCS and its variants. Evolutionary Intelligence, 8(2-3):55-70.

Butz, M. V. (2006). Rule-Based Evolutionary Online Learning Systems. Number 191 in Studies in Fuzziness and Soft Computing. Springer-Verlag, Berlin, Germany.

Butz, M. V. and Wilson, S. W. (2000). An algorithmic description of XCS. Pages 253-272.

Falkner, N. J. G., Vivian, R. J., and Falkner, K. E. (2013). Computer science education: The first threshold concept. Pages 39-46. IEEE.

Feng, L., Ong, Y.-S., Tan, A.-H., and Tsang, I. W. (2015). Memes as building blocks: a case study on evolutionary optimization + transfer learning for routing problems. Memetic Computing, 7(3):159-180.

Holland, J. H. (1975). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. The University of Michigan Press, Ann Arbor.

Holland, J. H. (1976). Adaptation. Progress in Theoretical Biology, pages 263-293.

Huelsbergen, L. (1998). Finding general solutions to the parity problem by evolving machine-language representations. Genetic Programming, pages 158-166.

Ioannides, C. and Browne, W. (2008). Investigating scaling of an abstracted LCS utilising ternary and S-expression alphabets. In Bacardit, J., Bernadó-Mansilla, E., Butz, M. V., Kovacs, T., Llorà, X., and Takadama, K., editors, Learning Classifier Systems, pages 46-56. Springer Berlin Heidelberg, Berlin, Heidelberg.

Iqbal, M., Browne, W. N., and Zhang, M. (2012). XCSR with computed continuous action. In AI 2012: Advances in Artificial Intelligence, pages 350-361, Berlin, Heidelberg. Springer Berlin Heidelberg.

Iqbal, M., Browne, W. N., and Zhang, M. (2013a). Evolving optimum populations with XCS classifier systems. Soft Computing, 17(3):503-518.

Iqbal, M., Browne, W. N., and Zhang, M. (2013b). Extending learning classifier system with cyclic graphs for scalability on complex, large-scale Boolean problems. In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation, GECCO '13, pages 1045-1052, New York, NY, USA. Association for Computing Machinery.

Iqbal, M., Browne, W. N., and Zhang, M. (2013c). Learning overlapping natured and niche imbalance Boolean problems using XCS classifier systems. Pages 1818-1825. IEEE.

Iqbal, M., Browne, W. N., and Zhang, M. (2014). Reusing building blocks of extracted knowledge to solve complex, large-scale Boolean problems. IEEE Transactions on Evolutionary Computation, 18(4):465-480.

Koza, J. R. (1991). A hierarchical approach to learning the Boolean multiplexer function. 1:171-192.

Lanzi, P. L. and Riolo, R. L. (2000). A roadmap to the last decade of learning classifier system research (from 1989 to 1999). In Lanzi, P. L., Stolzmann, W., and Wilson, S. W., editors, Learning Classifier Systems, pages 33-61, Berlin, Heidelberg. Springer Berlin Heidelberg.

Meyer, J. H. F. and Land, R. (2006). Overcoming Barriers to Student Understanding: Threshold Concepts and Troublesome Knowledge. Routledge.

Montana, D. J. (1995). Strongly typed genetic programming. Evolutionary Computation, 3:199-230.

Nguyen, T. B., Browne, W. N., and Zhang, M. (2019a). Improvement of code fragment fitness to guide feature construction in XCS. In Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '19, pages 428-436, New York, NY, USA. Association for Computing Machinery.

Nguyen, T. B., Browne, W. N., and Zhang, M. (2019b). Online feature-generation of code fragments for XCS to guide feature construction. Pages 3308-3315. IEEE.

Pan, S. J. and Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359.

Price, C. J. and Friston, K. J. (2005). Functional ontologies for cognition: The systematic definition of structure and function. Cognitive Neuropsychology, 22(3-4):262-275.

Schaffer, J. D. (1985). Learning multiclass pattern discrimination. In Proceedings of the 1st International Conference on Genetic Algorithms, pages 74-79, USA. L. Erlbaum Associates Inc.

Stone, P. and Veloso, M. (2000). Layered learning. In López de Mántaras, R. and Plaza, E., editors, Machine Learning: ECML 2000, pages 369-381, Berlin, Heidelberg. Springer Berlin Heidelberg.

Urbanowicz, R., Granizo-Mackenzie, A., and Moore, J. (2012). Instance-linked attribute tracking and feedback for Michigan-style supervised learning classifier systems. In Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, GECCO '12, pages 927-934, New York, NY, USA. Association for Computing Machinery.

Urbanowicz, R. J. and Browne, W. N. (2017). Introduction to Learning Classifier Systems. SpringerBriefs in Intelligent Systems. Springer-Verlag, Berlin Heidelberg.

Urbanowicz, R. J. and Moore, J. H. (2009). Learning classifier systems: A complete introduction, review, and roadmap. Journal of Artificial Evolution and Applications, 2009.

Wilson, S. W. (1995). Classifier fitness based on accuracy. Evolutionary Computation, 3(2):149-175.