Conscious Intelligence Requires Lifelong Autonomous Programming For General Purposes
Juyang Weng
Department of Computer Science and Engineering, Cognitive Science Program, Neuroscience Program
Michigan State University, East Lansing, MI, 48824 USA
GENISAMA LLC, East Lansing, MI, 48824 USA
E-mail: [email protected]
Abstract
Universal Turing Machines [29, 10, 18] are well known in computer science, but they are about manual programming for general purposes. Although human children perform conscious learning (learning while being conscious) from infancy [24, 23, 14, 4], it has not been recognized that Universal Turing Machines can not only facilitate our understanding of Autonomous Programming For General Purposes (APFGP) by machines, but also enable early-age conscious learning. This work reports a new kind of AI: conscious learning AI from a machine's "baby" time. Instead of arguing about what static tasks a conscious machine should be able to do during its "adulthood", this work suggests that APFGP is a computationally clearer and necessary criterion for judging whether a machine is capable of conscious learning, so that it can autonomously acquire skills along its "career path". The results here report new concepts and experimental studies for early vision, audition, natural language understanding, and emotion, with conscious learning capabilities that are absent from traditional AI systems.
To be conscious during adulthood, must a system be conscious from its "infancy" all the way into its "adulthood"? All animals [36, 20] conduct lifelong APFGP, but traditional AI systems do not [17, 19, 7, 6]. This work argues that consciousness should not be an optional crown jewel for AI [4, 15] but a necessity for credible AI. Unconscious AI has resulted in machines that are brittle because they are unaware of themselves and the physical world around them. Machines can become highly conscious by bootstrapping their degree of consciousness through lifelong conscious learning.

Consciousness must apply to the different contexts that a life experiences. Furthermore, the term involves many entities, such as environment, awareness, cognition and behavior, which are learned through a lifetime based on genetically pre-positioned (i.e., developed) learning capabilities [24, 23, 4]. For example, how do the cattle or the humans in Fig. 1 learn consciousness from infancy so that, after they have grown up, they can navigate autonomously through the hustle and bustle of streets to reach home?

Can an artificial machine consciously learn to do the same and more? Weng et al. 2001 [33] proposed "autonomous mental development" as learning across a lifetime that must be task-nonspecific. However, the link between "autonomous development" and "consciousness" still lacks a computational minimal set.

Figure 1: Conscious cattle and conscious humans navigate on a busy street of New Delhi, India.

By "minimal set", we mean a minimal set of mechanisms from which a machine, natural or artificial, can bootstrap its degree of consciousness from "early age" to "later age", so that a degree of consciousness is already present during learning, not only after a static batch of learning.
This minimal set makes the causality of consciousness clearer.

In order to understand what that minimal set is in terms of computation, let us start with a model of computation well known in computer science but not directly related to consciousness until this work. Turing Machines, originally proposed by Alan Turing [29] in 1936, although not meant to explain consciousness, can help us understand how consciousness arises from computations by a machine, both natural and artificial.

A Turing Machine [10, 18], illustrated in Fig. 2, consists of an infinite tape, a read-write head, and a controller. The controller consists of a sequence of moves, where each move is a 5-word sentence of the following form:

    (q, γ) → (q′, γ′, d)

meaning that if the current state is q and the current input that the head senses on the tape is γ, then the machine enters the next state q′, writes γ′ onto the tape, and its head moves in direction d (left, right, or stay) but no more than one cell away.

Figure 2: An example of a Turing Machine. Each cell of the tape bears only a symbol. The controller has, at each integer time, a current state (e.g., 3).

Intuitively speaking, let us consider each symbol in the above 5-word expression as a "word". Then all such 5-word expressions are "sentences". Thus, a human-handcrafted "program" is a sequence of such 5-word sentences that the Turing Machine must follow in computation. Although such sentences are not a natural language, they are more precise than a natural language.
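The 5-word move format can be made concrete with a tiny interpreter. The following is a minimal sketch; the rule set and tape below are hypothetical illustrations, not from the paper:

```python
# Minimal Turing Machine sketch: each rule is (q, gamma) -> (q', gamma', d),
# where d in {-1, 0, +1} moves the head left, keeps it in place, or moves it right.
def run(rules, tape, state, head, max_steps=100):
    tape = dict(enumerate(tape))          # sparse tape; missing cells read as blank '_'
    for _ in range(max_steps):
        key = (state, tape.get(head, '_'))
        if key not in rules:              # no applicable move: halt
            break
        state, symbol, d = rules[key]
        tape[head] = symbol
        head += d
    return state, ''.join(tape[i] for i in sorted(tape))

# Hypothetical rule set that flips a binary string bit by bit, left to right.
rules = {
    (0, '0'): (0, '1', +1),
    (0, '1'): (0, '0', +1),
}
print(run(rules, '0110', state=0, head=0))  # -> (0, '1001')
```

Each dictionary entry is one 5-word "sentence"; the whole dictionary is the handcrafted "program" the machine must follow.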
How did Turing make the above machine general-purpose? All we need is to augment the meaning of the input on the tape: the tape contains two parts, a program P and data x. In his 1936 paper [29], Turing explained in detail how a Turing Machine can be constructed to emulate program P applied to data x. This new kind of Turing Machine is now called a Universal Turing Machine [10, 18]. We call it universal because the program P on the tape is open-ended, supplied by any user for any purpose. However, Universal Turing Machines are still not conscious.

To see the link between Universal Turing Machines and consciousness, we must break a series of restrictions in Turing Machines, as explained in the next section.

The eight conditions below were not well known to be necessary for consciousness. However, the APFGP capability in the title requires all of them. That said, they are still insufficient for giving rise to APFGP without the full Developmental Networks (DN) to be discussed in the next section. To facilitate memorization, let us summarize the eight conditions in eight words: Grounded, Emergent, Natural, Incremental, Skulled, Attentive, Motivated, Abstractive, giving the acronym GENISAMA. Let us explain each of them below.
Grounded:
Grounded means the sensors and effectors of a learner must be directly grounded in the physical world in which the learner lives or operates. IBM Deep Blue, IBM Watson, and AlphaGo are not grounded. Instead, it is humans who synthesize symbols from the physical world, thus shielding the machines from the rich physical game environments, including their human opponents.
Emergent: The signals in the sensors, effectors and all representations inside the "skull" of the learner must emerge automatically through interactions between the learner and the physical world by way of sensors, effectors, and a genome (aka developmental program). A genome is meant to fit the physical world through the entire life, not only for a specific task during the life. For example, fruit flies must do foraging, fighting and mating. Thus, task-specific handcrafting of representation in sensors, effectors, and inside the "skull" is inconsistent with consciousness. This emergence requirement rules out task-specific and handcrafted representations, such as the weight duplication in the convolution used by deep learning. Likewise, an artificial genetic algorithm without lifetime learning/development does not have anything to emerge, since each individual does not learn/develop in its lifetime.
Natural: The learner must use natural sensory and natural motor signals, instead of human hand-synthesized features from sensors or hand-synthesized class labels for effectors, because such symbols and labels are not natural without a human in the loop. For robots, natural signals are those directly available from a sensor (e.g., RGB pixel values from a camera) and raw signals for an effector/actuator. IBM Deep Blue, IBM Watson and AlphaGo all used handcrafted symbols for the board configurations and symbolic labels for game actions. Such symbols are not natural: not directly from cameras and not directly for robot arms.
Incremental: Because the current action from the learner will affect the next input to the learner (e.g., the current action "turn left" allows it to see the left view), learning must take place incrementally in time. IBM Deep Blue, IBM Watson and AlphaGo have used a batch learning method: all game configurations are available as a batch for the learner to learn. The learner is not conscious of how it has improved from early mistakes; it is not self-conscious.
Skulled: The skull closes the brain of the learner so that any teacher's direct manipulation of the internal brain representations (e.g., twisting internal parameters) is not permitted. For example, how can the brain be aware of what a neurosurgeon did inside the skull?
Attentive: The learner must learn how to attend to various entities in its environment, both the body and the extra-body environment. The entities include location (where to attend), type (what to attend to), and scale to attend to (e.g., body, face, or nose), as well as abstract concepts that the learner learned in life (e.g., am I doing the right thing?). IBM Deep Blue, IBM Watson and AlphaGo did not seem to think "what am I doing?".
Motivated: The beautiful logic that a Universal Turing Machine possesses to emulate any valid program does not give rise to consciousness as we know it. By motivation, we mean that the learner must learn motivation based on its intrinsic motives, such as pain avoidance, pleasure seeking, uncertainty awareness, and sensitivity to novelty. A system that is designed to do facial recognition does not have a motive to do things other than facial recognition. IBM Deep Blue, IBM Watson and AlphaGo did not feel happy when they won a game.
Abstractive: Although a shallow definition of consciousness means awareness, full awareness requires a general capability to abstract higher concepts from concrete examples. By higher concepts here we mean those concepts that a normal individual of a species is expected to be able to abstract. Consider the movie "Rain Man": if a kiss by a lady on the lips is sensed only as "wet", there is a lack of abstraction. A baby cannot abstract love from its first kiss, but a normal human adult is able to. Thus, abstraction requires learning.

With the above eight requirements, we are ready to discuss GENISAMA Universal Turing Machines as super machines capable of conscious learning.
Handcrafting a Universal Turing Machine is not hard. What is really hard is how to enable such a machine to grow automatically from the natural physical world so as to learn any programs and any data directly from its physical environment! We will see below how.

Table 1: Unfolding Time for APFGP in a Developmental Network

Time                0      1      2      3     ...  t
Actable world   W   W(0)   W(1)   W(2)   W(3)  ...  W(t)
Motor           Z   Z(0)   Z(1)   Z(2)   Z(3)  ...  Z(t)
Skulled brain   Y   Y(0)   Y(1)   Y(2)   Y(3)  ...  Y(t)
Sensor          X   X(0)   X(1)   X(2)   X(3)  ...  X(t)
Sensible world  W′  W′(0)  W′(1)  W′(2)  W′(3) ...  W′(t)

A DN in Fig. 3 is capable of learning any GENISAMA Universal Turing Machine. It grows one neuron at a time, to learn moves incrementally, one at a time. Such a machine is capable of APFGP, which motivates us to use APFGP as an alternative characterization of consciousness.

A brain is highly recurrent, meaning that a neuron sends signals to many other neurons but other neurons also send signals back, directly and indirectly. This recurrence has caused great difficulties in our understanding of how the brain works. We must unfold time so that the time-unfolded brain is not recurrent along the time axis. We consider five entities W, Z, Y, X, W′ at times t = 0, 1, 2, ..., as illustrated in Table 1.

The first row in Table 1 gives the sample times t.
The second row denotes the actable world W, such as a hammer acting on a nail.
The third row is the motor Z, which has muscles to drive effectors, such as arms, legs, and mouth.
The fourth row is the skull-closed brain Y. The computation inside the brain must be fully autonomous [33], without any pre-given tasks.
The fifth row is the sensor X, such as cameras, microphones, and touch sensors (e.g., skin).
The last row is the sensible world W′, such as surfaces of objects that reflect light received by cameras.

The actable world W is typically not exactly the same as the sensible world W′, because where sensors sense from and where effectors act on can be different.

Next, we discuss the rules about how a DN, denoted as N = (X, Y, Z), learns from the worlds W and W′. Extend the tape of the Turing Machine to record images X from sensors, instead of symbols σ.
Let X be the original emergent version of input, e.g., a vector that contains the values of all pixels. Extend the output of the Turing Machine, (q′, γ′, d), to be a muscle image from the motor Z, instead of symbols. Thus, the GENISAMA Turing Machine directly acts on the physical world.

Parallel computing model:
We treat X and Z as external because they can be "supervised" by the physical environment as well as "self-supervised" by the network itself. The internal area Y is closed (hidden) and cannot be directly supervised by external teachers. As in Table 1, we unfold the time t and allow the network to have three areas X(t), Y(t), and Z(t) that learn incrementally through time.
Figure 3: A DN in (b) grows a brain Y as a two-way bridge between the sensory bank X and the motor bank Z. All the connections are learned, updated and trimmed automatically by the DN. (a) A neuron's connections are highly recurrent, which requires our time-unfolded explanation.

For t = 0, 1, 2, ...:

    Z(0) Y(0) X(0) → Z(1) Y(1) X(1) → Z(2) Y(2) X(2) → ...    (1)

where → means that the neurons on the left adaptively link to the neurons on the right. Note that all neurons in every column t use only the values of column t − 1 to its immediate left, and nothing from other columns. This is true for all columns t, with integers t ≥ 1. Otherwise, iterations would be required. Namely, by unfolding time in the above expression, the highly recurrent operations in the recurrent DN become nonrecurrent in the time-unfolded DN. Thus, the DN runs in real time and does not have to slow down waiting for any iterations.

Using Eq. (1), we outline each of the motor area Z, brain area Y, and sensory area X, for t = 1, 2, ..., from an embryo, all the way to an adult, till possible death.

1. The motor area Z, starting from Z(0), represents many muscle signals in a developing body. Muscle cells in Z(t) at time t take inputs from the Y(t − 1) area and Z(t − 1), acting on the environment at time t using Z(t) through self-supervision: trials, errors, and practices.

2. Concurrently, the brain Y, starting from Y(0), also dynamically develops and grows. Each neuron in Y(t) gets multiple inputs from all three areas, X(t − 1), Y(t − 1) and Z(t − 1). Competition among neurons allows only a few Y neurons to win. These winner Y neurons (like an expert team) at the time-t column represent the firing of the brain at time t.

3. Likewise, the sensory area X, starting from X(0), also develops within a developing body.
What is different between the motor area Z and the sensory area X is that the former develops neurons that drive muscles (and also "feel" the world) while the latter develops receptors that sense the world.

Now, we have the minimal set of mechanisms, Eq. (1) along with paragraphs 1, 2 and 3 above, as the Computational Model of Emergence of Consciousness for natural and artificial machines to learn consciousness (and consciously learn) throughout a lifetime:
As time goes by, the learner appears more and more deeply aware of the world and itself by incrementally learning a lifetime program P along with lifetime data x (where P and x are not necessarily separated) from its world, while an optimal GENISAMA Universal Turing Machine emerges, as proven mathematically in [31]. Inside the brain, this machine autonomously and recursively generates (i.e., predicts, thinks, or dreams about), at each time t, the next sensory input X, the next brain response Y and the next motor Z, to make a larger, more sophisticated and increasingly integrated program P and to apply the program to the world as x for an open-ended variety of purposes.

In the eyes of humans, this learner becomes increasingly conscious. Although the lifelong-learned consciousness can be extremely complex, the minimal set of computational mechanisms is relatively simple for us to understand, thanks to Universal Turing Machines. The remaining detail of this work is discussed in Methods. See [32] for a less mathematical paper about APFGP oriented towards cognitive scientists and neuroscientists.
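The time-unfolded update of Eq. (1) can be sketched as a plain loop in which every area at column t + 1 reads only from column t, so no within-column iteration is ever needed. This is a minimal illustration; the toy area sizes and fixed random weights are assumptions, and learning is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: 4 motor (Z), 6 hidden (Y), 8 sensory (X) components.
Wy = rng.random((6, 4 + 8))   # Y(t+1) reads Z(t) and X(t)
Wz = rng.random((4, 6))       # Z(t+1) reads Y(t)
Wx = rng.random((8, 6))       # predicted X(t+1) reads Y(t)

def step(z, y, x):
    """One unfolded column update: every area at t+1 uses only column t."""
    y_next = Wy @ np.concatenate([z, x])   # hidden response from previous motor and sensor
    z_next = Wz @ y                        # next motor from previous hidden response
    x_next = Wx @ y                        # next sensory prediction from previous hidden response
    return z_next, y_next, x_next

z, y, x = np.zeros(4), np.zeros(6), rng.random(8)
for t in range(5):                         # unfolded time: each step reads only the previous column
    z, y, x = step(z, y, x)
```

Because each column depends only on its immediate predecessor, the loop runs in real time, which mirrors the claim that the time-unfolded DN never waits for iterations to converge.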
Methods
A Developmental Network is meant for consciousness because it is a holistic model of a biological brain that is also fully implementable on an artificial machine.

The following section presents Developmental Network 1 (DN-1). Developmental Network 2 (DN-2) differs from DN-1 primarily in the following sense. In DN-1, each of multiple Y areas has a static set of neurons, so that the competition within each area is based on a top-k principle. Namely, inhibition among neurons within each area is implicitly modeled by top-k competition. In DN-2, however, there is no static assignment of neurons to any regions, so that regions in DN-2 automatically emerge, along with their scales, cascades, and nesting. A major advantage of DN-2 is that a human programmer is not in the loop of deciding the distribution of X-Y-Z mechanisms, relieving humans from the intractable task of handcrafting consciousness. A major disadvantage of DN-2 is that its computational explanation is too sophisticated to be included in this paper. Let us leave DN-2 out of this paper and concentrate on DN-1 below.

The hidden Y area corresponds to the entire "brain".
In the following, we assume the brain has a single area Y, but it will enable many subareas to emerge.

The brain takes input from the vector (z, x), not just the sensory x but also the motor z, to produce an internal response vector y which represents the best match of (z, x) with one of many internally stored patterns of (z, x).

The winner-take-all learning rule, which is highly nonlinear and simulates parallel lateral inhibition in the internal (hidden) area Y, is sufficient to prove [31] that a DN that has sufficiently many hidden neurons learns any Turing Machine perfectly, immediately, and error-free. The n neurons in Y give a response vector y = (y_1, y_2, ..., y_n) in which only the best-matched neuron fires at value 1 and all other neurons do not fire, giving value 0:

    y_j = 1 if j = argmax_{1 ≤ i ≤ n} { f(t_i, z, b_i, x) }, and y_j = 0 otherwise, for j = 1, 2, ..., n,    (2)

where f is a function that measures the similarity between the top-down weight vector t_i and the top-down input vector z [25], as well as the similarity between the bottom-up weight vector b_i and the bottom-up input vector x. The value of similarity is the inner product of their length-normalized versions [31]. Corresponding to an FA, both the top-down weight and the bottom-up weight must match well for f to give a high value as an inner product.

The response vector y of the hidden Y area of the DN is then used by the Z and X areas to predict the next z and x, respectively, in discrete time t = 1, 2, 3, ...:

    [z(t − 1), x(t − 1)] → y(t) → [z(t + 1), x(t + 1)]    (3)

where → denotes the update of the right side using the left side as input. The first → above is highly nonlinear because of the top-1 competition, so that only one Y neuron fires (i.e., exactly one component in the binary y is 1). The second → consists simply of links from the single firing Y neuron to all firing neurons on the right side.

Like the transition function of a Turing Machine, each prediction of z(t + 1) in Eq.
(3) is called a transition, but now in real-valued vectors without any symbols. The same y(t) can also be used to predict the binary (or real-valued) x(t + 1) ∈ X in Eq. (3). The quality of the prediction of (z(t + 1), x(t + 1)) depends on how well the state Z abstracts the external world sensed by X. The more mature the DN is in its "lifetime" learning, the better its predictions.

The expression in Eq. (3) is extremely rich, as illustrated in Fig. 3: self-wiring within a Developmental Network (DN) as the control of a GENISAMA TM, based on statistics of activities through "lifetime", without any central controller, Master Map, handcrafted features, or convolution.

The above vector formalization is simple but very powerful in practice. The pattern in Z can represent the binary pattern of any abstract concept: context, state, muscles, action, intent, object type, object group, object relation. However, as far as the DN is concerned, they mean the same thing: a firing pattern of the Z area! Namely, unified numerical processing-and-prediction in the DN amounts to any of the abstract concepts above. In symbolic representations, it is a human who handcrafts every abstract concept as a symbol; but the DN does not have a human in the "skull". It simply learns, processes, and generates vectors. In the eyes of a human outside the "skull", the DN gets smarter and smarter.

Fig. 3(a) shows that each feature neuron has six fields in general: Sensory Receptive Field (SRF), Sensory Effective Field (SEF), Motor Receptive Field (MRF), Motor Effective Field (MEF), Lateral Receptive Field (LRF) and Lateral Effective Field (LEF). Fig. 3(b) shows the resulting self-wired architecture of the DN with Occipital, Temporal, Parietal, and Frontal lobes. Regulated by a general-purpose Developmental Program (DP), the DN self-wires by "living" in the physical world.
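Eqs. (2) and (3) can be sketched numerically as follows. This is a minimal toy illustration, not the authors' implementation: the area sizes are arbitrary, the weights are fixed random values rather than learned, and f is taken as the sum of the two inner products of length-normalized vectors, per the similarity definition above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, dz, dx = 10, 3, 5                      # toy sizes: hidden neurons, motor, sensory
T = rng.random((n, dz))                   # top-down weight vectors t_i
B = rng.random((n, dx))                   # bottom-up weight vectors b_i
Wz = rng.random((dz, n))                  # links from the firing Y neuron to the next z
Wx = rng.random((dx, n))                  # links from the firing Y neuron to the next x

def unit(v):
    """Length-normalize a vector."""
    return v / np.linalg.norm(v)

def respond(z, x):
    """Eq. (2): top-1 winner-take-all over inner products of normalized vectors."""
    sims = np.array([unit(T[i]) @ unit(z) + unit(B[i]) @ unit(x) for i in range(n)])
    y = np.zeros(n)
    y[np.argmax(sims)] = 1.0              # only the best-matched neuron fires
    return y

def predict(y):
    """Eq. (3): the single firing Y neuron drives the next motor z and sensory x."""
    return Wz @ y, Wx @ y

y = respond(rng.random(dz), rng.random(dx))
z_next, x_next = predict(y)
```

Note how the first arrow of Eq. (3) is the highly nonlinear argmax in `respond`, while the second arrow in `predict` is a plain linear read-out from the one firing component of y.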
The X and Z areas are supervised by the body and the physical world, which includes teachers. Through synaptic maintenance, some Y neurons gradually lose their early connections (dashed lines) with the X (Z) areas and become "later" ("early") Y areas. In the (later) Parietal and Temporal lobes, some neurons further gradually lose their connections with the (early) Occipital area and become rule-like neurons. These self-wired connections give rise to a complex dynamic network, with shallow and deep connections instead of a deep cascade of areas. Object location and motion are non-declarative concepts, and object type and language sequence are declarative concepts. Concepts and rules are abstract, with the desired specificities and invariances. The DN does not have any static Brodmann areas.

If a DN cannot learn as quickly as other normal animals, we may have to call it retarded, with only a limited consciousness compared to other animals of the same age. We do not want a DN to get stuck in a local minimum, as many nonlinear artificial systems have. Fortunately, every DN is optimal in the sense of maximum likelihood, as proven mathematically in [31]. Put intuitively, all DNs are optimal given the same learning environment, the same learning experience, and the same number of neurons in the "brain". There might be many possible network solutions, some of which get stuck in local minima in their search for a good network. However, each DN is the most likely one, without getting stuck in local minima. This is because, although a DN starts with random weights, all random weights result in the same network.

However, this does not mean that the learning environment is the best possible one, or that the number of neurons is the best possible one, for many lifetime tasks.
The search for a better educational environment will be a challenge for humans and their future children, both natural and artificial kinds.
Figure 4: Training, regular testing, and blind-folded testing sessions conducted on the campus of Michigan State University (MSU), at different times of day and under different natural lighting conditions (e.g., there are extensive shadows in the images). Disjoint testing sessions were conducted along paths that the machine had not learned. This is the first time visual awareness has been learned by GENISAMA Turing Machines.
This seems to be the first time that general-purpose vision, general-purpose audition, and general-purpose natural language, the three well-known bottleneck areas in AI, have been learned by a single type of network integrated with motivational learning. Other systems include [5, 11, 28, 3, 12, 16, 21, 9]. We conducted experiments in which a learning system acts as an emergent Turing Machine that learns one of three well-recognized bottleneck problems in AI: vision, audition and natural language acquisition. Hopefully, when such systems are mature enough after "living" and "learning" in the real physical world, they will look as though they have a certain degree of animal-like consciousness in the eyes of humans.
Vision from a “lifelong” retinal sequence:
How does a DN become visually conscious, demonstrated by its motor behaviors? Let it learn by artificially "living" in the real world!

Fig. 4 provides an overview of the extensiveness of the training, regular testing, and blind-folded testing sessions. The inputs to the DN were from the same mobile phone that performs the computation. They include the current image from the monocular camera and the current desirable direction from the Google Maps API and the Google Directions API. If the teacher imposes the state in Z, it is treated as the supervised state. Otherwise, the DN outputs its predicted state from Z. The DN learned to attend to critical visual information in the current image (e.g., scene type, road features, landmarks, and obstacles) depending on the context of the desired direction and the context state. Each state from the DN includes the heading direction or stop, the location of attention, the type of object to be detected (which detects a landmark), and the scale of attention (global or local), all represented as binary patterns. None is a symbol.

For further detail of learning vision-guided navigation and planning for navigation, see [35].

Below, we will see that auditory consciousness also uses the same DN, but with different "innate" parameters.

Figure 5: The sequences of concept 1 (dense, bottom) and concept 2 (sparse, top) for phoneme /u:/. The latest DNs do not need a human to provide any labels. Instead, they self-supervise themselves.

Audition from a "lifelong" cochlear sequence:
How does a DN become auditorily conscious, demonstrated by its motor behaviors? Let it learn by artificially "living" in the real world!

For the audition modality, each input image to X is a pattern that simulates the output from an array of hair cells in the cochlea. We model the cochlea in the following way. The cells at the base of the cochlea correspond to filters with a high pass band. The cells at the top correspond to filters with a low pass band. At the same height, cells have different phase shifts. Potentially, such a cochlear model could deal with music and other natural sounds, more general than the popular Mel Frequency Cepstral Coefficients (MFCCs) that are mainly for human speech processing. The performance will be reported elsewhere due to the limited space.

Take the phoneme /u:/ as an example, shown in Fig. 5. The state of concept 2 stays as silence while the inputs are silence frames. It becomes a "free" state when phoneme frames come in, and changes to the /u:/ state when the first silence frame shows up at the end. At the same time, the states of concept 1 count temporally dense stages. For more details of auditory learning using Developmental Networks, see [35].

Figure 6: The finite automaton for the English and French versions of some sentences. The DN learned a much larger finite automaton. Cross-language meanings of partial and full sentences are represented by the same state of meaning context q_i, i = 0, 1, 2, .... The language-specific context is represented by another concept: language type. The last letter is the return character that indicates the end of a sentence.

One may ask, what about higher consciousness, such as natural language understanding?

Natural languages from a "lifelong" word sequence:
How does a DN become language conscious, demonstrated by its motor behaviors? Let it learn by artificially "living" in the real world! Here, we assume grounded words are emergent patterns, not symbols.

As far as we know, this seems to be the first work that deals with language acquisition in a bilingual environment, largely because the DN learns directly from emergent patterns, both in word input and in action input (supervision), instead of static symbols.

The input to X is a 12-bit binary pattern, each representing a word, which potentially can represent 2^12 = 4,096 words using binary patterns. The system was taught 1,862 English and French sentences from [26], using a vocabulary of unique words (case sensitive). As an example of the sentences: English: "Christine used to wait for me every evening at the exit." French: "Christine m'attendait tous les soirs à la sortie."

The Z area was taught two concepts: language type (English, French, and language neutral, e.g., a number or name), represented by 3 neurons (top-1 firing), and the language-independent meanings as meaning states, as shown in Fig. 6. The latter is represented by 18 neurons (an 18-bit binary pattern), always with the top 5 neurons firing, capable of representing C(18, 5) = 8,568 possible combinations as states, although only a fraction of these possible meanings were actually recorded. Therefore, the Z area has 21 neurons, potentially capable of representing a huge number of binary patterns if all possible binary patterns are allowed.

However, the DN actually observed only the Z patterns (both concepts combined) that occurred in the training experience, and the corresponding distinct (Z, X) patterns, i.e., FA transitions. Consider a traditional symbolic FA using a symbolic transition table, whose rows times columns amount to tens of millions of table entries. Only a tiny fraction of the entries were detected by the hidden neurons, meaning that only a tiny fraction of the FA transition table was observed and accommodated by the DN. Namely, the DN has the potential to deal with n-tuples of words with a very large n, bounded by the DN size, because most unobserved n-tuples are never represented. The FA transition table is extremely large, but never generated.

Without adding noise to the input X, the recognition error is zero, provided that there is a sufficient number of Y neurons. We added Gaussian noise to the bits of X. Let α represent the relative power of the signal in the noisy signal. When α is 60%, the state recognition rate of the DN is around 98%. When α is 90%, the DN reaches a 0% error rate, again thanks to the power of the DN's internal interpolation, which converts a huge discrete (symbolic) problem into a considerably smaller continuous (numeric) problem. See [35] for more detail.

Emotional learning using the same network:
One may wonder, does this type of consciousness enable emotion? The DN model considers emotion to belong to a wider category known in neuroscience as motivation.

Motivation is very rich [2]. It has two major aspects, (a) and (b), in the current DN model. All reinforcement-learning methods other than the DN, as far as we know, are symbolic methods (e.g., Q-learning [27, 19]) and fall in aspect (a) below exclusively. The DN uses concepts (e.g., important events) instead of the rigid time-discount in Q-learning to avoid the failure of far goals.

(a) Pain avoidance and pleasure seeking to speed up the learning of important events. Signals from pain (aversive) sensors release a special kind of neurotransmitter (e.g., serotonin [1]) that diffuses into all neurons, suppressing the firing Z neurons but speeding up the learning rates of the firing Y neurons. Signals from sweet (appetitive) sensors release a special kind of neurotransmitter (e.g., dopamine [13]) that diffuses into all neurons, exciting the firing Z neurons and also speeding up the learning rates of the firing Y neurons. Higher pains (e.g., loss of loved ones and jealousy) and higher pleasures (e.g., praise and respect) develop at later ages from lower pains and pleasures, respectively.

(b) Synaptic maintenance, the growing and trimming of the spines of synapses, segments objects/events and motivates curiosity. Each synapse incrementally estimates the average error β between the pre-synaptic signal and the synaptic conductance (weight), represented by a kind of neurotransmitter (e.g., acetylcholine [37]). Each neuron estimates the average deviation β̄ as the average across all its synapses. The ratio β/β̄ is the novelty, represented by a kind of neurotransmitter (e.g., norepinephrine [37]) at each synapse. The synaptogenic factor f(β, β̄) at each synaptic spine and full synapse enables the spine to grow if the ratio is low (1.0 as default) and to shrink if the ratio is high (1.5 as default). See Fig.
3(b) for how a neuron can cut off their direct connections with Z to become early areas in theoccipital lobe or their direct connections with the X areas to become latter areas inside the parietal andtemporal lobes. However, we cannot guarantee that such “cut off” are 100% based on the statistics-based13iring theory here.See [34, 30, 8] for more details about motivational learning in DN. APFGP inside a network has a minimal set of computational mechanisms for conscious systems, naturaland artificial. The new APFGP characterization is clearer than existing other characterizations for thenotoriously vague term “consciousness” as we discussed in the first section. Hopefully, APFGP will giverise to richer animal-like artificial consciousness so that conscious AI receives a long-overdue credibilityfor AI. APFGP might also be useful as a computational model for unifying natural consciousness andartificial consciousness, due to its holistic nature backed by the new capability—APFGP of GENISAMAUniversal Turing Machines. Much exciting practical work on learning consciousness remains to be donein the future, including creating conscious AI and verifying APFGP on natural conscious systems. This isa constructive proof that natural consciousness is computational and does not need quantum mechanics,contrary to the quantum hypothesis by Roger Penpose [22]. Hopefully, artificial consciousness wouldfollow this methodology to approach human levels.
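Aspect (a) above can be sketched as a small numeric update rule. This is a minimal illustration, not the DN's actual equations: the function names, the 5x learning-rate boost, and the multiplicative gating of the Z pre-activation are assumptions made here for clarity.

```python
def modulated_update(w, x, fired, pain=0.0, sweet=0.0, base_rate=0.01):
    """Hebbian-like weight update whose learning rate is neuromodulated.

    Both aversive (serotonin-like, `pain`) and appetitive (dopamine-like,
    `sweet`) signals speed up the learning rate of a firing neuron, as in
    aspect (a).  The 5x boost factor is an illustrative assumption.
    """
    rate = base_rate * (1.0 + 5.0 * (pain + sweet))
    if fired:
        # Only firing neurons update; the weight moves toward the input x.
        w = (1.0 - rate) * w + rate * x
    return w


def z_preactivation(response, pain=0.0, sweet=0.0):
    """Pain suppresses, and pleasure excites, the motor (Z) pre-activation."""
    return response * (1.0 - pain) * (1.0 + sweet)
```

With `pain=1.0`, the firing neuron learns six times faster while its motor drive is fully suppressed, mirroring the suppress-but-learn-faster behavior described for aversive signals above.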
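Aspect (b), synaptic maintenance, can likewise be sketched using the default grow (1.0) and shrink (1.5) thresholds from the text. The incremental-averaging constant, the spine step size, and the class layout are assumptions for illustration only:

```python
class Synapse:
    """A synapse that tracks its own pre-synaptic/weight mismatch (beta)."""

    def __init__(self, weight=0.5):
        self.weight = weight
        self.beta = 0.0   # running average of |pre-synaptic signal - weight|
        self.spine = 1.0  # spine size in [0, 1]; 0 means cut off

    def observe(self, x, amnesic=0.1):
        # Incrementally estimate the average error beta (amnesic rate assumed).
        self.beta = (1.0 - amnesic) * self.beta + amnesic * abs(x - self.weight)


def synaptogenic_factor(beta, beta_bar, grow=1.0, shrink=1.5):
    """f(beta, beta_bar): +1 grow, -1 shrink, 0 keep (default thresholds)."""
    if beta_bar == 0.0:
        return 0
    ratio = beta / beta_bar  # novelty: synapse error relative to neuron average
    if ratio < grow:
        return 1
    if ratio > shrink:
        return -1
    return 0


def maintain(neuron_synapses, step=0.05):
    # The neuron averages beta across all its synapses to get beta_bar,
    # then grows or trims each spine according to the synaptogenic factor.
    beta_bar = sum(s.beta for s in neuron_synapses) / len(neuron_synapses)
    for s in neuron_synapses:
        f = synaptogenic_factor(s.beta, beta_bar)
        s.spine = min(1.0, max(0.0, s.spine + f * step))
```

A synapse whose input consistently matches its weight keeps a low β, so its spine is retained, while a consistently mismatched synapse accumulates a high β/β̄ ratio and is gradually trimmed, which is how a neuron could eventually cut off a whole class of connections as described above.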
References

[1] N. D. Daw, S. Kakade, and P. Dayan. Opponent interactions between serotonin and dopamine. Neural Networks, 15(4-6):603–616, 2002.
[2] R. J. Dolan. Emotion, cognition, and behavior. Science, 298(5596):1191–1194, 2002.
[3] C. Eliasmith, T. C. Stewart, X. Choo, T. Bekolay, T. DeWolf, Y. Tang, and D. Rasmussen. A large-scale model of the functioning brain. Science, 338:1202–1205, 2012.
[4] J. L. Elman, E. A. Bates, M. H. Johnson, A. Karmiloff-Smith, D. Parisi, and K. Plunkett. Rethinking Innateness: A Connectionist Perspective on Development. MIT Press, Cambridge, Massachusetts, 1997.
[5] C. R. Gallistel. Themes of thought and thinking. Science, 285:842–843, 1999.
[6] E. Gibney. Google reveals secret test of AI bot to beat top Go players. Nature, 541:142, 2017.
[7] A. Graves et al. Hybrid computing using a neural network with dynamic external memory. Nature, 538:471–476, 2016.
[8] Q. Guo, X. Wu, and J. Weng. Cross-domain and within-domain synaptic maintenance for autonomous development of visual areas. In Proc. the Fifth Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, pages 1–6, Providence, RI, August 13-16, 2015.
[9] E. A. Holm. In defense of the black box. Science, 364(6435):26–27, April 5, 2019.
[10] J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, Boston, MA, 2006.
[11] R. Jenkins and A. M. Burton. 100% accuracy in automatic face recognition. Science, 319(5862):435, 2008.
[12] M. I. Jordan and T. M. Mitchell. Machine learning: Trends, perspectives, and prospects. Science, 349:255–260, July 17, 2015.
[13] S. Kakade and P. Dayan. Dopamine: generalization and bonuses. Neural Networks, 15:549–559, 2002.
[14] L. C. Katz and C. J. Shatz. Synaptic activity and the construction of cortical circuits. Science, 274(5290):1133–1138, 1996.
[15] C. Koch. What is consciousness? Scientific American, 318(6):60–64, June 2018.
[16] B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350:1332–1338, 2015.
[17] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521:436–444, 2015.
[18] J. C. Martin. Introduction to Languages and the Theory of Computation. McGraw Hill, Boston, MA, 3rd edition, 2003.
[19] V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 518:529–533, 2015.
[20] P. R. Montague, P. Dayan, C. Person, and T. J. Sejnowski. Bee foraging in uncertain environments using predictive Hebbian learning. Nature, 377:725–728, 1995.
[21] M. Moravcik et al. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356:508–513, 2017.
[22] R. Penrose. Shadows of the Mind: A Search for the Missing Science of Consciousness. Oxford University Press, Oxford, 1994.
[23] J. Piaget. The Construction of Reality in the Child. Basic Books, New York, 1954.
[24] J. Piaget and M. Cook. The Origins of Intelligence in Children. International Universities Press, New York, 1952.
[25] Y. B. Saalmann, I. N. Pigarev, and T. R. Vidyasagar. Neural mechanisms of visual attention: How top-down feedback highlights relevant locations. Science, 316:1612–1615, 2007.
[26] R. Scriven, G. Amiot-Cadey, and Collins. Collins French Grammar. HarperCollins, Glasgow, 2011.
[27] R. S. Sutton and A. Barto. Reinforcement Learning. MIT Press, Cambridge, Massachusetts, 1998.
[28] J. B. Tenenbaum, C. Kemp, T. L. Griffiths, and N. D. Goodman. How to grow a mind: Statistics, structure, and abstraction. Science, 331:1279–1285, 2011.
[29] A. M. Turing. On computable numbers with an application to the Entscheidungsproblem. Proc. London Math. Soc., 2nd series, 42:230–265, 1936. A correction, ibid., 43, pp. 544–546.
[30] Y. Wang, X. Wu, and J. Weng. Synapse maintenance in the where-what network. In Proc. Int'l Joint Conference on Neural Networks, pages 2823–2829, San Jose, CA, July 31 - August 5, 2011.
[31] J. Weng. Brain as an emergent finite automaton: A theory and three theorems. International Journal of Intelligent Science, 5(2):112–131, 2015.
[32] J. Weng. A unified hierarchy for AI and natural intelligence through auto-programming for general purposes. Journal of Cognitive Science, 21:53–102, 2020.
[33] J. Weng, J. McClelland, A. Pentland, O. Sporns, I. Stockman, M. Sur, and E. Thelen. Autonomous mental development by robots and animals. Science, 291(5504):599–600, 2001.
[34] J. Weng, S. Paslaski, J. Daly, C. VanDam, and J. Brown. Modulation for emergent networks: Serotonin and dopamine. Neural Networks, 41:225–239, 2013.
[35] J. Weng, Z. Zheng, X. Wu, and J. Castro-Garcia. Auto-programming for general purposes: Theory and experiments. In Proc. International Joint Conference on Neural Networks, pages 1–8, Glasgow, UK, July 19-24, 2020.
[36] K. Wynn. Addition and subtraction by human infants. Nature, 358:749–750, 1992.
[37] A. J. Yu and P. Dayan. Uncertainty, neuromodulation, and attention. Neuron, 46:681–692, 2005.
Addendum
Acknowledgement:
The author would like to thank Zejia Zheng, Xiang Wu, and Juan Castro-Garcia for conducting experiments that will be further reported in [35].
Competing Interests
The author declares that he has no competing financial interests.