Saccade learning with concurrent cortical and subcortical basal ganglia loops

Abstract

The basal ganglia (BG) are a central structure involved in multiple cortical and subcortical loops. Some of these loops are believed to be responsible for saccade target selection. We study here how the very specific structural relationships of these saccadic loops can affect the ability to learn spatial and feature-based tasks. We propose a model of saccade generation with reinforcement learning capabilities, based on our previous basal ganglia and superior colliculus models. It is structured around the interactions of two parallel cortico-basal loops and one tecto-basal loop. The two cortical loops separately deal with spatial and non-spatial information to select targets in a concurrent way. The subcortical loop is used to make the final target selection leading to the production of the saccade. These different loops may work in concert or disturb each other regarding reward maximization. Interactions between these loops and their learning capabilities are tested on different saccade tasks. The results show the ability of this model to correctly learn basic target selection based on different criteria (spatial or not). Moreover, the model reproduces and explains training-dependent express saccades toward targets based on a spatial criterion. Finally, the model predicts that, in the absence of prefrontal control, the spatial loop should dominate.


Steve N’Guyen*, Charles Thurat and Benoît Girard

Institut des Systèmes Intelligents et de Robotique, Université Pierre et Marie Curie-Paris 6, CNRS UMR 7222, Paris, France
LPPA, Collège de France, CNRS UMR 7152, Paris, France


Keywords: basal ganglia, superior colliculus, saccades, decision making, reinforcement learning

The basal ganglia (BG) are a set of interconnected subcortical nuclei (Redgrave, 2007), which are thought to be central in the performance of action selection (Mink, 1996; Redgrave et al., 1999).

The BG are traditionally described as being composed of various parallel subcircuits with identical internal wiring, involved in different functions (from motor to cognitive ones), and belonging to a set of parallel cortico-baso-thalamo-cortical loops (Alexander et al., 1986b), as schematized in Fig. 1A. However, the BG also participate in purely subcortical loops (Groenewegen and Berendse, 1994; McHaffie et al., 2005, 2006; May, 2006). These loops are wired slightly differently, as the input to the BG is relayed through the thalamus and the BG output projects directly to the subcortical structures concerned (Fig. 1B), and they rely on different thalamic nuclei (pulvinar, lateral posterior, rostral and caudal intralaminar). In particular, they participate in loops with the superior colliculus (SC), which is well known for its laminar structure, its mapping of the visual field and its involvement in gaze orientation movements, including saccadic eye movements (Moschovakis et al., 1996; Lynch and Tian, 2006).

[Figure 1. Panel A: Cortex, Striatum, SN/GPi and Thalamus, with sensory inputs and motor outputs. Panel B: subcortical structure, Thalamus, Striatum and SN/GPi, with sensory inputs and motor outputs. Panel C: FEF, V4/IT, BG, Th, SG and SC, with slow and fast pathways carrying features or location information.]

Figure 1: A: general organization of cortical loops. B: general organization of subcortical loops. Filled arrowheads are excitatory connections, empty arrowheads are inhibitory connections. Dashed blocks are inhibitory structures. Note that the thalamic nuclei concerned differ between A (ventral anterior, ventrolateral, mediodorsal) and B (pulvinar, lateral posterior, rostral and caudal intralaminar). A and B adapted from (McHaffie et al., 2005). C: schematic representation of the relationships between the three modelled loops; note the type of information processed (either location or features of targets) and the delays (slow or fast).

We propose here a computational model of the interactions of subcortical and cortical BG loops in primates, processing either target position (spatial) information or target feature information, in the well-investigated framework of saccadic eye movements (Hikosaka et al., 2000). Indeed, cortico-basal loops dealing with the location of potential targets in the visual field, on the one hand, or with the detection of features of potential targets, on the other hand, have long been identified.
The superior colliculus (and thus the tecto-basal loop) is a bottleneck receiving all this information for the final decision; however, it also receives target location information earlier than the cortically processed information, through direct projections from the retina. We thus study the effects imposed by this hierarchical structure (where the highest-level modules have longer latencies, while the lowest-level module has a lower-latency shortcut that is specific to location information, Fig. 1C) on performance and saccadic reaction time in space-based and/or feature-based selection tasks, in order to identify predictions specific to this organization. These predictions apply to animals deprived of the dorsolateral prefrontal cortex (dlPFC), as this area is not included in our model and we can expect its inhibitory control on the superior colliculus to provide additional control over unwanted short-latency saccades (Koval et al., 2011).

We show that the fact that a purely spatial selection and learning system operates at the last level predicts that:
• saccadic reaction times should decrease with learning in spatial tasks only, allowing the generation of express saccades and causing short-latency activations in the FEF,
• performance in feature-based tasks should be lower than in spatial tasks, because of the perturbations caused by the subcortical spatial loop,
• in conjunction tasks, where both spatial and feature-based information determine the correct choice, errors are unavoidable when no choice should be made.

The subcortical loop (Fig. 2, dotted circuit) has access to visual inputs conveyed directly from the retina to the superficial layers of the superior colliculus, with a low latency. These retinal projections provide relatively rich visual information (Girman and Lund, 2007), but no color information.
As the SC layers are organized as stacked retinotopic maps of the visual field, and given the spatial receptive fields of the BG output neurons projecting to the SC (Hikosaka et al., 1983), it can be assumed that the competition among targets is here based on spatial position. This loop is a good candidate neural substrate to explain the accumulating evidence (see for example McPeek and Keller (2002); McPeek et al. (2003); McPeek and Keller (2004), among many others since 2000) that the SC performs target selection on its own, rather than solely executing cortical decisions.

Two cortical loops, projecting to the SC as a common output, are considered. The first one (Fig. 2, dashed circuit), comprising the frontal eye fields (FEF), also operates in the spatial domain, but contributes to saccade generation with longer latencies than the SC. This loop is known to be a common pathway for “cognitive” saccades, where working memory or sequence generation are involved; however, these aspects are not included in the proposed model (indeed, neither the SEF nor the pre-SEF is modelled). We hypothesize that the BG subcircuit involved in this loop is shared with the subcortical one (i.e. there is only one BG subcircuit dedicated to the spatial selection of targets). This choice of converging input is based on known anatomy, as the FEF appears to project to the “Oculomotor Striatum” (central/longitudinal caudate) (Stanton et al., 1988).

The second one (Fig. 2, dash-dot circuit) comprises V4 and IT and deals with the selection of targets exhibiting specific features (only color is used here, for simplicity). V4 is known to be selective for shape and color (Ogawa and Komatsu, 2004) and visuotopically organized (Gattass et al., 1988). Moreover, this region exhibits strong recurrent connections with IT (in particular the TE area) (Ungerleider et al., 2008).
The TE region of IT has been shown to be selective for features (and colors) and not visuotopically organized (its activity does not depend on object position) (Tompa and Sáry, 2010). More importantly, this TE area forms a loop with the basal ganglia (Middleton and Strick, 1996); it thus seems reasonable to hypothesize that colors and features could be selected through a cortical IT-BG-Th loop in a non-spatial fashion and then projected back to V4. In particular, the TE region projects to the “Visual Striatum” (tail of the caudate and caudal/ventral portion of the putamen) (Middleton and Strick, 1996), supporting the separation between the spatial and the feature loops. The superior colliculus is known to receive numerous projections from cortical areas, amongst which V4 (Fries, 1984; Lock et al., 2003). This pathway is compatible with the feature/color sensitivity observed in the intermediate layers of the SC (SCi), which appears with a longer latency than the luminance signal (White et al., 2009; White and Munoz, 2011).

To summarize, in this model two parallel mechanisms compete for target selection (Fig. 1C). The first one is “location” based and comprises two cooperating loops, one cortical and one subcortical. The second one is “feature” based and comprises one cortical loop. The details of the equations are given in Section 2.

The proposed model is intended to learn to generate saccades towards targets selected on the basis of their color and location in the visual field (cf. Fig. 2), depending on the reward contingencies experienced during interaction with the environment.

As said before, it is composed of three main loops going through the basal ganglia, which interact in both competitive and cooperative ways. The subcortical one corresponds to the SC-Th-BG circuit (dotted connections in Fig. 2): it gets its inputs from the direct projections from the retina to the superficial layers of the superior colliculus, along with the activity of the deep layers, and it selects among targets competing on a purely spatial dimension. This loop passes through the intralaminar nucleus (IL) thalamic relay (McHaffie et al., 2005). The cortical loops comprise a circuit dedicated to spatial competition (FEF-BG-Th, dashed connections in Fig. 2), which shares its BG circuit with the subcortical loop but uses a different thalamic relay (the paralamellar portion of the mediodorsal thalamic nucleus, MDpl) (Alexander et al., 1986a; Tian and Lynch, 1997), and another one dedicated to feature (namely color) selection (IT-BG-Th, dot-dashed connections in Fig. 2) via VAmc (Middleton and Strick, 1996).

Retinal information is transmitted to the SC, FEF and V4|IT with different latencies according to the literature: the SCs input latency is set to 41 ms (type I neurons) (Rizzolatti and Buchtel, 1980), the FEF one to 91 ms and the IT one to 122 ms (average over all TE sub-regions) (Lamme and Roelfsema, 2000).

The FEF module contains an input and an output retinotopic map sensitive to luminance. The V4|IT module contains one input and one output retinotopic map for each color. The SC module also contains several retinotopic maps, dealing with the direct retinal input (SCs), the FEF input, the V4|IT input, the summed activity of SCs, FEF and V4|IT (SCi output) and the motor activity (SCi motor).
For each of these structures, a selection loop through the BG occurs. We use rate-coding models of neurons (based on locally projected dynamical systems, lPDS (Girard et al., 2008)), which are defined as follows:

[Figure 2. Diagram of the model: FEF, V4/IT, BG spatial, BG color, thalamic relays (VAmc, IL), SC (SCs, SCi output, SCi motor) and SG. Legend: subcortical loop, spatial cortical loop, non-spatial cortical loop, excitatory connection, inhibitory connection.]

Figure 2: Structure of the model. BG: basal ganglia; FEF: frontal eye fields; SG: saccade generators; SC: superior colliculus; Th: thalamus; V4|IT: feature perception area including IT (TE region) interacting with the V4 visual cortex area. Dark gray shaded layers in the BG modules are input layers with reinforcement learning capabilities.

$$\dot{x} = \Pi_{[0,\mathit{max}]}\left(x(t), \frac{I(t) - x(t)}{\tau}\right) \tag{1}$$

where $I(t)$ represents the external inputs, $\tau$ the time constant, and $\Pi_{[0,\mathit{max}]}$ a projection operator ensuring that the neuron activity $x(t)$ remains within $[0, \mathit{max}]$.

The projection operator $\Pi_{[0,\mathit{max}]}$ simply acts on $\dot{x}$ so that the variable $x$ remains within a specified range of values. In our case (Euler integration with a 1 ms time step), this results in the following discrete update:

$$x(t + dt) = \min\left(\mathit{max}, \max\left(0, x(t) + \frac{dt}{\tau}\left(I(t) - x(t)\right)\right)\right) \tag{2}$$

This method is very similar to the classical way of converting the computed activity $x$ into a non-negative one, $y = \max(0, x)$, but here the non-linear “transfer function” is applied inside the differential equation. This comes at the cost of making it no longer a classical ordinary differential equation, but brings other benefits such as contraction, i.e. stability.

The basal ganglia model we use here (Girard et al., 2008) was formulated in this framework, so as to formally ensure its dynamical stability. For the sake of consistency, we thus use it for the rest of the model presented here. Only the external input part ($I(t)$) and the time constant ($\tau$) of this equation have to be specified to define such a neuron model. Thus, to simplify the writing, only $I(t)$ will be given in the next section, which provides a detailed description of the model, while the time constants and other model parameters are provided in the supplemental data section.

The BG exert an inhibitory influence on their target circuits, which prevents them from generating actions.
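As an illustration, the discrete update of Eq. (2) amounts to one leaky-integration Euler step followed by clipping. This is a minimal sketch, assuming time in milliseconds, $dt = 1$ ms as stated above and an upper bound $\mathit{max} = 1$; the actual time constants and bounds are those of the supplemental data, and the function name `lpds_step` is hypothetical.

```python
import numpy as np

def lpds_step(x, inp, tau, dt=1.0, x_max=1.0):
    """One Euler step of Eq. (2) for an array of lPDS neurons.

    x, inp : current activities and external inputs I(t)
    tau, dt: time constant and time step, both in ms (dt = 1 ms as in the text)
    x_max  : upper bound of the activity (assumed value, not from the paper)
    """
    x_new = x + (dt / tau) * (inp - x)   # leaky integration toward I(t)
    return np.clip(x_new, 0.0, x_max)    # projection onto [0, x_max]

# Hypothetical example: one neuron, tau = 10 ms, constant input 0.8 for 200 ms.
x = np.zeros(1)
for _ in range(200):
    x = lpds_step(x, np.array([0.8]), tau=10.0)
```

Because the clipping is applied at every step, the activity cannot leave $[0, \mathit{max}]$ whatever the input, which is the discrete counterpart of the projection operator in Eq. (1).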
Even without any inputs, the BG converge to a given level of inhibition, $GPi|SNr_{rest}$, sufficient to enforce this control. As previously proposed in (Arai et al., 1994; Das et al., 1996; Arai et al., 1999), we modeled the effect of the basal ganglia inhibition as a modulation of the excitatory inputs of the targeted systems. To ensure that no action can be generated at rest, this inhibitory gain modulation is normalized with respect to the $GPi|SNr_{rest}$ constant. Thus, the contributions of the BG outputs to the circuits they target take the following general form in the equations of the next section:

$$W_E \times I_E \times \left(1 - \frac{GPi|SNr}{GPi|SNr_{rest}}\right) \tag{3}$$

where $I_E$ is the excitatory input controlled by the BG inhibition and $GPi|SNr$ is the output of the BG neurons projecting to the considered circuit.

The feedback from the superior colliculus, which signals the end of the execution of a saccade, is also modeled as a modulation.

Most of the components of the model are 70 × 70 2D maps of lPDS neurons for each hemifield, respecting the complex-logarithmic geometry of the macaque superior colliculus, as modeled by Ottes et al. (1986). Unless specified otherwise, neurons of one map project to those of another map in a one-to-one manner. Visual inputs are simulated as Gaussian activities spreading over about a hundred neurons.

[Figure 3. Diagram of the color selection loop: IT input and output color channels, BG color, Th, AC color with reward input, and SG inhibition.]

Figure 3: Selection loop for color channels. Black arrowheads are excitatory connections, empty arrowheads are inhibitory connections.
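The complex-logarithmic geometry mentioned above is commonly written, following Ottes et al. (1986), as $u = B_u \ln\left(\sqrt{R^2 + 2AR\cos\varphi + A^2}/A\right)$ and $v = B_v \arctan\left(R\sin\varphi / (R\cos\varphi + A)\right)$, which maps a target at eccentricity $R$ and direction $\varphi$ to collicular coordinates $(u, v)$ in mm. The sketch below uses the constants usually quoted for that model ($A = 3°$, $B_u = 1.4$ mm, $B_v = 1.8$ mm/rad); they are not given in this section, and the way the 70 × 70 grid samples this map is not reproduced. The Gaussian bump shows how a visual input could be placed on the map; its width and the function names are hypothetical.

```python
import math

# Constants commonly quoted for the Ottes et al. (1986) collicular mapping.
A_DEG = 3.0   # degrees
B_U = 1.4     # mm
B_V = 1.8     # mm/rad

def visual_to_collicular(r_deg, phi_rad):
    """Map a target at eccentricity r_deg (deg) and direction phi_rad (rad)
    to collicular coordinates (u, v) in mm."""
    x = r_deg * math.cos(phi_rad) + A_DEG
    y = r_deg * math.sin(phi_rad)
    u = B_U * math.log(math.hypot(x, y) / A_DEG)  # sqrt(R^2 + 2AR cos(phi) + A^2) / A
    v = B_V * math.atan2(y, x)                     # equals atan(y / x) since x > 0 here
    return u, v

def gaussian_activity(u, v, u0, v0, sigma=0.3, amplitude=1.0):
    """Hypothetical Gaussian bump of input activity centred on (u0, v0) in the SC map."""
    d2 = (u - u0) ** 2 + (v - v0) ** 2
    return amplitude * math.exp(-d2 / (2.0 * sigma ** 2))

# Example: a target at R = 10 deg in the 45 deg direction.
u, v = visual_to_collicular(10.0, math.radians(45.0))
```

The logarithm compresses large eccentricities, so equal collicular distances correspond to larger visual distances in the periphery than near the fovea.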

Color information is processed by the cortical V4|IT-BG-Th loop. As stated previously, the V4 structure contains several retinotopic maps, each encoding a specific color (three are used here: red, green and blue). In order to deal with non-spatial color channels, the activity in each map is summed, providing a reduced number of independent channels. These channels are amplified in a closed-loop manner by the interaction of IT with the BG and Th (cf. Fig. 3). Thus, the BG selection occurring in the V4|IT-BG-Th loop deals with non-spatial color information only. These channels are then transformed back into retinotopic maps (cf. Fig. 4), and the resulting map (V4 output map) is projected to SCi. The activity fed to the channels is computed as follows:

$$IT^{out}_c = W^{IT_{out}}_{IT_{in}} \cdot IT^{in}_c \times \left(1 - W_{SG_{inhib}} \cdot SG_{inhib}\right) + W^{IT_{out}}_{Th_{IT}} \cdot Th^{IT}_c \tag{4}$$

$$Th^{IT}_c = IT^{out}_c \times \left(W^{Th_{IT}}_{IT_{out}} + W^{Th_{IT}}_{GPi} \times \left(1 - \frac{GPi|SNr^{color}_c}{GPi|SNr_{rest}}\right)\right) - W^{Th_{IT}}_{TRN_{IT}} \cdot TRN_{IT} + I_{Th} \tag{5}$$

with $c \in \{red, green, blue\}$, $IT^{in}_c$ the visual input channel for color $c$, $IT^{out}_c$ the activity of the IT layer connected with $Th_{IT}$, and $SG_{inhib}$ the ascending inhibition from the saccade generators. Thalamic activity depends on $IT^{out}_c$ and on the BG output nuclei $GPi|SNr^{color}$. The BG output thus gates part of the transmission between $IT^{out}$ and $Th_{IT}$ with a modulating inhibition. $TRN_{IT}$ is the activity of the globally inhibiting inputs from the thalamic reticular nucleus and $I_{Th}$ a constant tonic activity. $IT^{in}_c$ is fed to the reinforcement learning module for color ($AC_{color}$); details of the reinforcement learning are given below (Section 2.2.3). Then, the resulting channels, along with $IT^{out}_c$, are given as inputs to the BG. For full details about the Th and BG models, see (Girard et al., 2008).

[Figure 4. Diagram of the spatial-color transformation: color maps (V4 in) summed into input color channels (IT in), selection via the BG, output color channels (IT out) and V4 output map (V4 out).]

Figure 4: Spatial-color transformation.

Spatial information is processed by two cooperating loops. In the cortical FEF-BG-Th loop, the FEF receives visual information in its input map with a long latency (91 ms). This map is then fed to the selection loop (cf. Fig. 5) and the resulting activity is computed as follows:

$$FEF^{out}_{i,j} = W^{FEF_{out}}_{FEF_{in}} \cdot FEF^{in}_{i,j} \times \left(1 - W_{SG_{inhib}} \cdot SG_{inhib}\right) + W^{FEF_{out}}_{Th_{FEF}} \cdot Th^{FEF}_{i,j} \tag{6}$$

$$Th^{FEF}_{i,j} = FEF^{out}_{i,j} \times \left(W^{Th_{FEF}}_{FEF_{out}} + W^{Th_{FEF}}_{GPi|SNr} \times \left(1 - \frac{GPi|SNr^{spatial}_{i,j}}{GPi|SNr_{rest}}\right)\right) - W^{Th_{FEF}}_{TRN_{FEF}} \cdot TRN_{FEF} + I_{Th} \tag{7}$$

with $(i, j) \in [0, n]^2$, $FEF^{in}$ the visual input and $FEF^{out}$ the activity of the FEF layer connected with $Th_{FEF}$.

The two maps ($Th_{SC}$ and $FEF^{in}$) are concatenated and fed to the reinforcement learning module. We decided to keep both maps concatenated in order to preserve the full learning capabilities, and then to merge the resulting weighted maps back at the BG spatial input level, before BG selection. The merge is done by summing these maps and passing the result through a sigmoid ($f(x) = \ldots e^{\ldots(0.\ldots - x)} \ldots$), inducing a non-linearity and a minimal salience threshold. Similarly to the color loop, the resulting map, along with $FEF^{in}$, is given as input to the BG.

[Figure 5. Diagram of the spatial FEF selection loop: FEF input map, FEF output map, BG spatial, Th FEF, Th SC, AC spatial with reward input, and SG inhibition.]

Figure 5: Closed-loop selection-amplification of the spatial FEF map. Black arrowheads are excitatory connections, empty arrowheads are inhibitory connections.

In the SC-Th-BG loop, SCi receives inputs from V4|IT, FEF and the retina (via SCs). These inputs are summed with weights and fed to the selection loop (cf. Fig. 6). As stated previously, the BG spatial module is the same as in the FEF-BG-Th loop. The resulting activity is computed as follows:

$$SCi^{out}_{i,j} = \left(W^{SCi}_{SCs} \cdot SCs_{i,j} + W^{SCi}_{FEF} \cdot FEF^{out}_{i,j} + W^{SCi}_{V4|IT} \cdot V4|IT^{out}_{i,j}\right) \times \left(W^{SCi}_{SCi_{in}} + W^{SCi}_{BG_{amp}} \times \left(1 - \frac{SNr_{i,j}}{SNr_{rest}}\right)\right) \times \left(1 - W_{SG_{inhib}} \cdot SG_{inhib}\right) \tag{8}$$

$$Th^{SC}_{i,j} = SCi^{out}_{i,j} \times \left(W^{Th_{SC}}_{SCi_{out}} + W^{Th_{SC}}_{GPi|SNr} \times \left(1 - \frac{GPi|SNr^{spatial}_{i,j}}{GPi|SNr_{rest}}\right)\right) - W^{Th_{SC}}_{TRN_{SC}} \cdot TRN_{SC} + I_{Th} \tag{9}$$

with $(i, j) \in [0, n]^2$, $SCs$ the visual input from the superficial layer of the SC, and $GPi|SNr$ the inhibition from the output nucleus of the BG projecting to the SC.
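Equations (4)-(7) share one pattern: the cortical output layer receives its gated sensory input plus thalamic feedback, and the thalamic relay receives the output layer's activity, amplified when the BG disinhibit the channel through the normalized factor of Eq. (3). The sketch below illustrates that shared pattern on three color channels. It assumes hypothetical weights, a common time constant of 10 ms, $\mathit{max} = 1$, and a TRN input held at zero (in the model, the TRN is itself a dynamic population); the multiplicative SCi gating of Eq. (8) is not covered, and all function and weight names are illustrative.

```python
import numpy as np

def lpds(x, inp, tau, dt=1.0, x_max=1.0):
    # Euler step of Eq. (2); dt and tau in ms, x_max = 1 is an assumption.
    return np.clip(x + dt / tau * (inp - x), 0.0, x_max)

def bg_gate(gpi, gpi_rest):
    # Disinhibition factor (1 - GPi/GPi_rest) of Eq. (3): 0 at rest, 1 if GPi is silent.
    return 1.0 - gpi / gpi_rest

def loop_step(x_in, out, th, gpi, sg_inhib, trn, w, gpi_rest, i_th, tau=10.0):
    """One synchronous 1 ms update of a cortical output layer `out` and its
    thalamic relay `th`, using the input form shared by Eqs. (4)-(5) and (6)-(7)."""
    i_out = w["out_in"] * x_in * (1.0 - w["sg"] * sg_inhib) + w["out_th"] * th
    i_relay = out * (w["th_out"] + w["th_gpi"] * bg_gate(gpi, gpi_rest)) \
        - w["th_trn"] * trn + i_th
    return lpds(out, i_out, tau), lpds(th, i_relay, tau)

# Hypothetical example: three channels, the BG disinhibit channel 1 only.
w = {"out_in": 0.5, "sg": 1.0, "out_th": 0.5, "th_out": 0.2, "th_gpi": 0.8, "th_trn": 0.0}
x_in = np.array([0.6, 0.7, 0.6])
gpi_rest = 0.2
gpi = np.array([gpi_rest, 0.0, gpi_rest])
out, th = np.zeros(3), np.zeros(3)
for _ in range(300):
    out, th = loop_step(x_in, out, th, gpi, sg_inhib=0.0, trn=0.0,
                        w=w, gpi_rest=gpi_rest, i_th=0.0)
```

With these values the disinhibited channel settles near 0.7 in both layers, while the two channels held at the resting GPi level settle near 0.33 (output) and 0.07 (thalamus): the gated loop turns a small input difference into a large activity contrast.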

[Figure 6. Diagram of the spatial SC selection loop: FEF input and output maps, V4/IT output map, SCs input map, SCi output map, SC motor map, Th SC, BG spatial, AC spatial and SG inhibition.]

Figure 6: Closed-loop selection-amplification of the spatial SC map. Black arrowheads are excitatory connections, empty arrowheads are inhibitory connections.

The basal ganglia model used here was first described in (Girard et al., 2008) and is depicted in Figure 1A for cortical loops. Notice that for the subcortical loop the connectivity is slightly different, because of the position of the thalamus (cf. Fig. 1B). The parameters of the BG circuit involved in the spatial loop have been adapted so as to cope with the selection of 630 channels (see Table 1).

The n × n inputs from the spatial maps (here with n = 70 for each hemifield) converge on the m × m inputs (here m = 18 for each hemifield) by the Gaussian pyramids method. The input map size is reduced by first convolving it with a 5 × 5 normalized Gaussian kernel:

$$BG^{spatial}_{i,j} = (In * G)_{i,j} \tag{10}$$

with $In$ the input map, $G$ the normalized Gaussian kernel and $(i, j) \in [0, n]^2$. Then it is 2 × …

$$BG^{color}_c = W^{BG}_{IT_c} \sum_{i,j} IT^c_{i,j} \tag{11}$$

with $W^{BG}_{IT_c}$ a normalization constant. The output of the same circuit thus affects the whole color maps in the following manner:

$$V^{out}_{c,i,j} = \widehat{V}_{c,i,j} \cdot IT^{out}_c \tag{12}$$

with $\widehat{V}_{c,i,j}$ the normalized activity of the input map for color $c$, $\widehat{V}_{c,i,j} = V^{in}_{c,i,j} / \max(V^{in}_c)$, and $IT^{out}_c$ the output activity for a whole channel $c$.
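The input reduction of Eq. (10) and the color transformations of Eqs. (11)-(12) can be sketched with plain NumPy. This is a sketch under assumptions: the Gaussian width, the zero padding at the map borders and the factor-2 subsampling after each smoothing are assumed, and all names are hypothetical. Applying such a reduction twice to a 70-wide map yields an 18 × 18 map, consistent with m = 18, although the paper's exact procedure may differ.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2D Gaussian kernel G (entries sum to 1); sigma is an assumption."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def convolve_same(img, kernel):
    """'Same'-size 2D convolution with zero padding (valid for a symmetric kernel)."""
    k = kernel.shape[0] // 2
    padded = np.pad(img, k)
    out = np.zeros(img.shape)
    for di in range(kernel.shape[0]):
        for dj in range(kernel.shape[1]):
            out += kernel[di, dj] * padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out

def pyramid_reduce(img, kernel):
    """One Gaussian-pyramid level: smooth (Eq. 10), then keep every second row and column."""
    return convolve_same(img, kernel)[::2, ::2]

def color_channel_inputs(v_in, w_norm):
    """Eq. (11): one scalar BG input per color, the weighted sum of its map."""
    return w_norm * v_in.sum(axis=(1, 2))

def color_output_maps(v_in, it_out):
    """Eq. (12): each input map divided by its peak, scaled by its channel output."""
    peak = v_in.max(axis=(1, 2), keepdims=True)
    v_hat = np.divide(v_in, peak, out=np.zeros(v_in.shape), where=peak > 0)
    return v_hat * it_out[:, None, None]

# Hypothetical usage.
g = gaussian_kernel()
reduced = pyramid_reduce(pyramid_reduce(np.ones((70, 70)), g), g)
v_in = np.zeros((3, 70, 70))
v_in[0, 10, 10] = 2.0   # a single red target
bg_in = color_channel_inputs(v_in, w_norm=0.01)
v_out = color_output_maps(v_in, it_out=np.array([0.9, 0.0, 0.0]))
```

Summing each color map discards position, which is what lets the color BG circuit select among only three channels; Eq. (12) then restores the spatial layout from the original input maps.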

[Figure 7. Diagram of the BG model: cortex (IT), striatum (D1, D2, FS), GPe, STN, SNr/GPi, thalamus (Th, TRN), with disinhibition of the selected channel.]

Figure 7: A: Details of the BG model in the cortical loop (here, IT-BG-Th is shown, but an identical structure is used for FEF-BG-Th). Only 3 channels are represented, the middle one being the most salient. SNr/GPi and GPe are color-inverted, as channel activities in these structures are opposed (the middle channel, which is the most activated in input, is the weakest in these structures). The thalamus structure (Th) is composed of a ventral anterior nucleus and of the reticular nucleus (TRN), which constitutes a population without segregated channels. The striatum is composed of neurons bearing D1 and D2 dopamine receptors and of a population of fast-spiking interneurons (FS). Filled arrowheads are excitatory connections and empty arrowheads are inhibitory. Solid lines represent one-to-one connections and dotted lines represent one-to-all connections. Adapted from (Girard et al., 2008). B: Details of the BG model in the subcortical loop (SC-Th-BG). Same model as in A, except for the position of the thalamus.

[Figure 8. Left: Critic and Actor with a WTA selection stage, inputs (states), outputs (actions), reward R and the environment. Right: Critic and Actor (BG: striatum, GPi) with the SC as output, reward R and the environment.]

Figure 8: Schematic representation of the Actor-Critic reinforcement learning algorithm. Left: classical Actor-Critic, involving a winner-takes-all (WTA) selection mechanism. Right: Actor-Critic with the BG as selection mechanism.

The input to the basal ganglia circuits is biased by reward using the classical “Actor-Critic” TD(λ) learning algorithm (Sutton, 1988; Sutton and Barto, 1998; Montague et al., 1996). The TD error $\delta$ is computed according to

$$\delta = R_t + \gamma \times V_t - V_{t-1} \quad \text{with} \quad V_t = W_{Critic} \cdot Input_t \tag{13}$$

$R_t$ being the reward at time $t$, $V_t$ the estimated value function, $W_{Critic}$ the learned weights of the Critic, $Input_t$ the input matrix (spatial or color) and $\gamma$ the discount factor. The Critic's weights are then updated using eligibility traces $E_{Critic}$:

$$W_{Critic} \leftarrow W_{Critic} + \eta \times \delta \times E_{Critic} \quad \text{with} \quad E_{Critic} \leftarrow \lambda \times E_{Critic} + Input_{t-1} \tag{14}$$

$\eta$ being the learning rate and $\lambda$ the “forgetting” factor of the eligibility traces. The Critic's weight vector has size $N$, the same as $Input$, so connections here are of the “all-to-one” type. The action vector (weighted inputs) is computed as follows:

$$A_t = W_{Actor} \cdot Input_t \tag{15}$$

and the Actor's weights are updated as follows:

$$W_{Actor} \leftarrow W_{Actor} + \eta \times \delta \times E_{Actor} \quad \text{with} \quad E_{Actor} \leftarrow \alpha \times E_{Actor} + Input_{t-1} \otimes A_{t-1} \quad \text{and} \quad A_{t-1} = GPi_{t-1} \tag{16}$$

The Actor's weight matrix has size $N \times N$, so connections here are of the “all-to-all” type.

Compared to classical reinforcement learning (cf. Fig. 8, left), “States” are the inputs to be selected and “Actions” are the weighted inputs. Here, the BG compute a selection among these weighted inputs, thus playing the role of the “winner-takes-all” (cf. Fig. 8, right), and then disinhibit some structure (i.e. the SC), which will eventually trigger a real action.

The Actor's weights are initialized to an identity matrix in order to allow for an initial “standard” behavior (direct unweighted projection). A minimum value for the diagonal of the Actor's weights has been implemented ($W_{Actor}^{min} = 0.6$) in order to prevent the system from losing the ability to trigger saccades. The Critic's weights are initialized with random values in $[0, \ldots]$.

In order to compute the so-called “spatio-temporal transformation” (STT) required to convert a spatially coded target into a temporal sequence for the saccade burst generators (SBGs), we used the model first described in (Tabareau et al., 2007) (cf. Fig. 9). This model includes a visual map (the SCi output map described above) and a motor map (SC motor map) with a log-complex mapping, along with a colliculi gluing mechanism. The motor layer projects to the saccade generators, and both are controlled by a strong inhibition from the omnipause neurons (OPN).

Note that we slightly modified the “integrating-saturating” mechanism ($Int$ and $Sat$ in Fig. 9). This mechanism no longer inhibits the whole motor map in a subtractive manner, but now modulates the projection from the visual map to the motor map in a multiplicative manner:

$$I^{motor}_{i,j} = SCi^{out}_{i,j} \times \left(1 - W^{SCi}_{BG_{inhib}} \times SNr_{i,j}\right) \times \left(1 - W^{Mot}_{Sat} \cdot Sat\right) - W^{Motor}_{OPN} \cdot OPN \tag{17}$$

with $I^{motor}$ the input activity of the motor layer, $SCi^{out}$ the activity of the SCi output map described in Section 2.2.1, $OPN$ the output activity of the OPN and $(i, j) \in [0, n]^2$. This modification has the advantage of generating more realistic burst activities, more similar to the gamma functions used in (van Opstal and Goossens, 2008).

Notice that $Sat$ is used as the ascending inhibitory signal $SG_{inhib}$ in the other structures, signalling the execution of a saccade (Sommer and Wurtz, 2002).

The parameters of the model were hand-tuned. As much as possible, these tuning operations were performed by considering the various subsystems (BG models, generation of the motor command, convergence of the inputs on the SC, and reinforcement learning) in isolation and enforcing their correct operation.
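The learning rules of Eqs. (13)-(16) can be summarized in a small class. This is a minimal sketch, not the authors' implementation: the learning rate, discount and trace factors are hypothetical values (the actual ones are in the supplemental data), the upper bound of the Critic's random initialization is assumed to be 0.1, and the row-vector convention $A_t = Input_t \cdot W_{Actor}$ is chosen so that the trace term $Input_{t-1} \otimes A_{t-1}$ indexes $W_{Actor}$ consistently. The caller supplies $A_{t-1}$, which the text sets to $GPi_{t-1}$.

```python
import numpy as np

class ActorCritic:
    """Sketch of the TD(lambda) Actor-Critic of Eqs. (13)-(16) for N input channels."""

    def __init__(self, n, eta=0.01, gamma=0.9, lam=0.9, alpha=0.9, w_min=0.6, seed=0):
        rng = np.random.default_rng(seed)
        self.w_critic = rng.uniform(0.0, 0.1, n)  # upper bound 0.1 is an assumption
        self.w_actor = np.eye(n)                   # identity = direct unweighted projection
        self.e_critic = np.zeros(n)
        self.e_actor = np.zeros((n, n))
        self.eta, self.gamma, self.lam, self.alpha = eta, gamma, lam, alpha
        self.w_min = w_min                         # floor on the Actor diagonal

    def actions(self, inp):
        """Eq. (15): weighted inputs, handed to the BG for selection."""
        return inp @ self.w_actor

    def update(self, reward, inp, inp_prev, a_prev):
        """One learning step; a_prev stands for A_{t-1} (GPi_{t-1} in the text)."""
        v_t = self.w_critic @ inp
        v_prev = self.w_critic @ inp_prev
        delta = reward + self.gamma * v_t - v_prev                              # Eq. (13)
        self.e_critic = self.lam * self.e_critic + inp_prev                    # Eq. (14)
        self.w_critic = self.w_critic + self.eta * delta * self.e_critic
        self.e_actor = self.alpha * self.e_actor + np.outer(inp_prev, a_prev)  # Eq. (16)
        self.w_actor = self.w_actor + self.eta * delta * self.e_actor
        # Keep the diagonal at or above W_Actor^min so saccades remain possible.
        np.fill_diagonal(self.w_actor, np.maximum(np.diagonal(self.w_actor), self.w_min))
        return delta

# Hypothetical usage: channel 0 was presented and selected, then rewarded.
ac = ActorCritic(3)
x_prev = np.array([1.0, 0.0, 0.0])
delta = ac.update(reward=1.0, inp=np.zeros(3), inp_prev=x_prev, a_prev=x_prev)
```

A positive reward after channel 0 was presented and selected strengthens $W_{Actor}[0, 0]$, while repeated negative rewards can only push that diagonal weight down to the floor of 0.6.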

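[Figure 9. Diagram of the SC motor layer: SCi output map (V1, V2) and SC motor map (M1, M2) in azimuth and elevation (eccentricities 2° to 40°, directions -90° to 90°), integrating-saturating units (Int, Sat), OPN, and the upward and leftward saccade generators (LLB, EBN, TN, MN) driving the extra-ocular muscles, with SG inhibition sent to other structures.]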

Figure 9: Architecture of the motor layer of the SC. Only one colliculus (right hemifield) and two SBGs are represented (without cross projections), along with two neurons per map (V1 and V2 in the SCi output map, M1 and M2 in the motor map). Grey discs represent the Gaussian activity produced by a visual target (coordinate (10°, 10°), thus R = 10°, θ = 45°); insets in the saccade generators represent the temporal coding generated in the EBNs to control the muscles. Filled triangles are excitatory connections, empty triangles are inhibitory connections. Bold connections affect the whole map. Adapted from (Tabareau et al., 2007).

The parameters of the spatial BG loop had to be modified compared to the initial parameterization of (Girard et al., 2008), as the number of competing channels is much higher. This drastically changes the effects of diffuse projections, like those of the STN on the GPe and GPi. When 630 channels rather than 6 are exciting the GPi, the strength of this excitation has to be reduced, so as to avoid saturating the GPi neurons and to allow the one-to-one inhibitions from the striatum to be strong enough to counteract the excitation and thus allow selection. These modifications were made as follows: the BG model was isolated from the rest of the system and provided with 2D Gaussian inputs similar to those used in the tasks, with varied amplitudes. The parameters were adjusted until the selection of a single target with an amplitude between 0.6 and 1 was restored.
Finer adjustments were then made so that one or two distractors of lower amplitude would not disturb the selection process, and so that the simultaneous selection of multiple targets occurred only when they had very close amplitudes.

The parameters of the motor layers of the SC and of the saccade generators, which operate the spatio-temporal transformation, were almost identical to those of (Tabareau et al., 2007), except for slight modifications of the integration rate of the saturating mechanism, so as to adjust the duration of the motor bursts to more realistic values.

The parameters setting the strength of the contributions of all the different maps to the final SCi layer were adjusted so that: 1) imposing an input from the spatial system only, or from the color system only, would generate the corresponding saccade, and 2) simultaneously imposing a given target position in the spatial system and another one in the color system would result in an averaging saccade.

Finally, the parameters driving the temporal integration of reward in the learning modules, namely the discount factor γ and the eligibility trace factor λ, had to be large enough for learning to occur despite the relatively long delay between the appearance of a target and the effective reward delivery (≈ … ms). The learning rates were adjusted so that learning would converge to the best possible level of performance in approximately 20-25 sessions. The relative difference between $\eta_{spatial}$ and $\eta_{color}$ has to be considered in the light of: 1) the huge difference in the number of input weights to be adjusted in each system (1587600 in the spatial domain vs. 9 in the color one), and 2) the different extent of the input stimulation corresponding to one target (a 2D Gaussian input spreading over a hundred channels in the spatial domain vs. one single channel in the color domain).

We simulated 3 target selection tasks in which the system has to trigger a saccade toward one of the two displayed cues (cf. Fig. 10). A “spatial task” is aimed at verifying the model's ability to learn to choose a target based on spatial information only, a “color task” on color information only, and a “conjunction task” is used to study the interactions between these two. 10 runs were performed, and each experimental run is composed of 40 sessions of 12 trials.

[Figure 10. Timeline of a trial: black screen (50 ms), fixation (800 ms), gap (150-250 ms), cues (600 ms).]

Figure 10: Simulated sequence of visual stimuli. A 50 ms black screen is followed by a fixation cue for 800 ms. Then, after a random gap (between 150 and 250 ms), the two cues appear. The cues are displayed for at most 600 ms, after which the sequence loops back. If, during this interval, a saccade of sufficient amplitude (> … ° from the center) is detected, the trial ends and the sequence loops back. Rewards are given when the trial ends, which, depending on the task, may be triggered by the timer or by a saccade.

In the spatial task, the rewarded cue depends only on its position in the visual field, so the system has to learn to ignore the color information and to favor the spatial one. The model is able to learn the task, with a performance reaching ≈ …–95% (Fig. 11A); hence it is possible to find a parameterization of the model that yields a good level of performance after learning. The distribution of SRT is bimodal, with a very sharp low-latency peak (≈ 88 ms) and a second bump centered around ≈ 200 ms (cf. Fig. 11B). This behavior is very similar to the “express saccades” (short latencies) and “regular saccades” (longer latencies) described in (Fischer and Weber, 1993). Looking in detail at the evolution of these SRT, it appears that during the first half of the experiment (first 20 sessions = first 240 trials), saccade latencies mainly fall within the 200 ms mode (cf. Fig. 12). These saccades reflect the baseline timing of the system, without any selection bias from learning. During the second half of the experiment (where performance is close to 90%), saccade latencies fall within the 88 ms mode.

The weights learned by the color loop (Fig. 11D) indicate that the target colors (red or green) have not been learned: they have similar weight values (≈ …). […], while the FEF ones are around 1.

[Figure 11 panels: A, performance rate (mean performance) across sessions; B, reaction time histogram (time, ms); C, SC and FEF actor weights; D, IT actor weights (R, G, B).]

Figure 11: Results of the spatial task. The rewarded cue is the right one, regardless of its color. A: Performance across sessions (the bold line is the mean performance of the 10 runs, represented with dotted lines). B: Distribution of saccadic reaction time (SRT) over the whole experiment. C: Learned weights (averaged over 10 runs) for the Actor part of the spatial loop; for readability, the multidimensional weight matrix has been projected on the output: it represents, for each unit, the sum of the input weights coming from the whole map, for the SC (top) and the FEF (bottom); note also the different intensity scales for SC and FEF. D: Learned weights (averaged over 10 runs) for the Actor part of the color loop.
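The projection used in panel C can be sketched as follows. This assumes, hypothetically, that each run's weight matrix is stored as a list of rows, one row per output unit holding that unit's input weights, and that output units are laid out row-major on the map; `average_over_runs` and `project_on_output` are illustrative names. Averaging over runs and summing over inputs are both linear, so their order does not matter.

```python
def average_over_runs(weight_sets):
    """Element-wise mean of several (n_out x n_in) weight matrices."""
    n_runs = len(weight_sets)
    n_out, n_in = len(weight_sets[0]), len(weight_sets[0][0])
    return [[sum(w[i][j] for w in weight_sets) / n_runs for j in range(n_in)]
            for i in range(n_out)]


def project_on_output(weights, rows, cols):
    """Sum each output unit's input weights and reshape the sums as a rows x cols map."""
    sums = [sum(row) for row in weights]  # one value per output unit
    if len(sums) != rows * cols:
        raise ValueError("map shape does not match the number of output units")
    return [sums[r * cols:(r + 1) * cols] for r in range(rows)]


run_a = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]
run_b = [[3, 0, 0], [0, 3, 0], [0, 0, 3], [1, 1, 1]]
print(project_on_output(average_over_runs([run_a, run_b]), rows=2, cols=2))
# [[2.0, 2.0], [2.0, 3.0]]
```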

Figure 12: Spatial task reaction time histogram, with the first half of the experiment (top) and the second half (bottom) shown separately.

[…] causes strong activity in the spatial loop, with a quick disinhibition from the SNr as soon as the direct retina-to-SC signal appears. Activity is then transmitted to the motor layer even before visual information reaches the cortical visual areas, and rapidly triggers a saccade. This kind of saccade thus differs from “standard” ones, as it relies only on the direct retina-to-SC pathway. Indeed, before learning, the retina-to-SC input alone is not sufficient to trigger a saccade in our model: it needs either FEF or V4|IT input, which explains the longer SRT.

If we look at the details of neural activity in normal and express saccades (Figure 13), what appears for the spatial task (after learning) is that direct retinal input induces activity in the spatial loop, which is quickly disinhibited by the BG (thanks to the strong weights) and activates the SCi motor map. Moreover, as the same BG module is shared between the subcortical and the cortical loops, this disinhibition also affects the cortical loop and thus induces activity in FEF before visual information reaches it. This induced activity in fact depends on the baseline level of the Thalamus, and is a prediction of the model resulting from our choice of a single shared spatial BG module. The activity in the SC causes a disinhibition in the spatial BG circuit, which then also disinhibits the thalamo-FEF loop. As this loop is auto-excitatory and the thalamus has a baseline activity, this triggers a resonance between Cortex and Thalamus. Thus, the observed short-latency activity in FEF is not caused directly by visual input, but indirectly by subcortical visual activity. Yet express saccades depend only on the SC loop, and FEF has only a marginal impact on them.
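The resonance mechanism can be illustrated with a two-unit rate sketch. Everything in it is a hypothetical simplification, not the model's equations: one thalamic unit with a baseline drive, tonically inhibited by the SNr, in a positive-feedback loop with one FEF unit that receives no visual input; `simulate_loop` and all parameter values are illustrative. While the SNr inhibits the thalamus the loop stays silent; once the SNr is disinhibited (in the model, as a consequence of SC-driven selection), the thalamic baseline is amplified by the loop gain, here to 0.1 / (1 − 0.8) = 0.5 instead of the 0.1 an open loop would give.

```python
def clip01(x):
    """Rectify and saturate a firing rate to [0, 1]."""
    return min(1.0, max(0.0, x))


def simulate_loop(snr_inhibition, duration_ms=1000.0, dt_ms=1.0, tau_ms=10.0,
                  th_baseline=0.1, w_th_to_ctx=1.0, w_ctx_to_th=0.8):
    """Euler-integrate a thalamus (th) <-> cortex (ctx) loop; return final (ctx, th)."""
    ctx = th = 0.0  # no visual input reaches the cortical unit in this sketch
    for _ in range(int(duration_ms / dt_ms)):
        th_drive = th_baseline + w_ctx_to_th * ctx - snr_inhibition
        ctx_drive = w_th_to_ctx * th
        th += dt_ms / tau_ms * (-th + clip01(th_drive))
        ctx += dt_ms / tau_ms * (-ctx + clip01(ctx_drive))
    return ctx, th


for label, snr in [("SNr tonic inhibition", 0.5), ("SNr disinhibited", 0.0)]:
    ctx, th = simulate_loop(snr)
    print(f"{label}: cortex={ctx:.3f}, thalamus={th:.3f}")
```

With inhibition both units stay at zero; after disinhibition both settle near 0.5, without any visual input to the cortical unit.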
Nevertheless, simulations with FEF inactivation (after learning) lengthen the SRT by ≈ 15 ms, so this resonant FEF activity does contribute to the global behavior.

Notice that Figure 13 also exhibits some very short bursts of post-saccadic visual activity (better seen for SCs, but the mechanism is the same for all the structures). These bursts are provoked by the residual retinal activity that, because of the latencies, reaches each visual region after the eyes have already moved. This behavior is probably not significant, as it may be canceled by a different choice of parameters, for example for SG_inhib.

In the color task, the rewarded cue depends only on its color, so the system has to learn to ignore the spatial information and to favor the color one. Here, the average performance only reaches about 75% (cf. Fig. 14A): the system can learn the task, but errors are still made at a rather consistent rate. The performance is thus lower than in the spatial task, an effect most probably caused by the structure of the BG loops themselves, a point we discuss further in section 4.2. Color learning is very sensitive to noise in the spatial domain. Indeed, most of the time (≈ …) […]

In the conjunction task, the rewarded cue depends on both position and color (e.g. the red disk at the right position). When this conjunction is not presented (“no conjunction” case), the system is rewarded only if the eye position stays within a 2.5° circle around the center (“good average” behavior). Here, the average performance in the conjunction case reaches levels similar to those of the spatial task (around 95%, Fig. 15A), but in the “no conjunction” case the rewarded behavior (“good average”) is rarely produced. The errors made in this case tend to be mostly “color errors”, i.e. saccades toward the correct location but with the wrong color (around 90% of errors at the end of the experiment). “Spatial errors” occur when a saccade is triggered toward the correct color but at the wrong position.
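The reward rules of the three tasks can be summarized as a small predicate. This is a hypothetical sketch: the cue coordinates and the `Cue`, `within` and `is_rewarded` names are illustrative, and reusing the 2.5° radius of the “good average” criterion as the tolerance for landing on a cue is an assumption. As in the tasks above, the rewarded position is the right one and the rewarded color is red.

```python
from dataclasses import dataclass
from math import hypot

RADIUS_DEG = 2.5  # tolerance around fixation and (assumed) around each cue


@dataclass(frozen=True)
class Cue:
    x: float  # horizontal position in degrees (positive = right)
    y: float
    color: str  # "red" or "green"


def within(eye, x, y, radius=RADIUS_DEG):
    return hypot(eye[0] - x, eye[1] - y) <= radius


def is_rewarded(task, cues, eye):
    """eye: final eye position in degrees; (0, 0) if no saccade was made."""
    def lands_on(predicate):
        return any(predicate(c) and within(eye, c.x, c.y) for c in cues)

    def is_right(c):
        return c.x > 0

    def is_red(c):
        return c.color == "red"

    if task == "spatial":
        return lands_on(is_right)
    if task == "color":
        return lands_on(is_red)
    if task == "conjunction":
        def target(c):
            return is_right(c) and is_red(c)
        if any(target(c) for c in cues):
            return lands_on(target)
        return within(eye, 0.0, 0.0)  # "good average": eyes stay near fixation
    raise ValueError(f"unknown task: {task!r}")


no_conj = [Cue(-10.0, 0.0, "red"), Cue(10.0, 0.0, "green")]
print(is_rewarded("spatial", no_conj, (10.0, 0.0)))      # True
print(is_rewarded("color", no_conj, (10.0, 0.0)))        # False
print(is_rewarded("conjunction", no_conj, (10.0, 0.0)))  # False: a "color error"
print(is_rewarded("conjunction", no_conj, (0.5, 0.0)))   # True: "good average"
```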
However, at the beginning of learning and until the middle of the experiment, the system is still able to produce a small number of “good average” responses (≈ …) […]

[Figure 13 panels (time, 0–300 ms): FEF, D1, D2, SNr, Th (FEF), Th (SC), SCs, SCi, SCi motor, eye position and eye velocity, with markers for the FEF, SCs and V4 inputs.]

Figure 13: Activities of different neurons of the target channel in the spatial task. The target appears at t = 0. Dashed line: before learning. Solid line: after learning. SCs input, FEF input and V4 input represent the presence of the visual cue in the receptive field before learning (gray) and after learning (black). 1: Visual activity reaching SCs. 2: Beginning of the express saccade (after learning). 3: Visual activity reaching FEF. 4: Visual activity reaching V4|IT. 5: Beginning of the saccade before learning. 6: Indirect short-latency activity in FEF provoked by SC activity. 7: Small burst of post-saccadic visual activity provoked by the end of inhibition from SG_inhib.

[Figure 14 panels: A, performance rate (mean performance) across sessions; B, reaction time histogram (time, ms); C, SC and FEF actor weights; D, IT actor weights (R, G, B).]

Figure 14: Results of the color task. The rewarded cue is the red one, regardless of its position. A, B, C, D: same as Fig. 11.

The learned weights correspond well to the task: the right position is favored over the left one (cf. Fig. 15C), but the red color is only slightly favored over green (cf. Fig. 15D).

The saccade reaction times are more complex here: three modes can be seen (≈ 88 ms, ≈ 140 ms and ≈ 220 ms). These three modes are explained by the respective latencies imposed on the three pathways: SC (41 ms), FEF (91 ms) and V4|IT (122 ms). As in the spatial task, most of the 88 ms saccades occurred during the second half of the experiment, reflecting the specialization toward spatial selection. More precisely, saccade latencies shift from the 220 ms mode during roughly the first third of the experiment, to the 140 ms mode during the second third, and then to the 88 ms mode. This gradual shift in timing explains the lack of influence of the color loop, whose pathway latency is 122 ms: a saccade may be triggered by the spatial loops before feature information even reaches the color loop. Again, this effect is not specific to a given parameterization: the advantage of spatial decisions, caused by a subcortical circuit with earlier access to information, and thus with faster learning, is structural. Note that the 140 ms peak did not appear in the spatial task, because learning there is fast enough for the system to quickly switch to an “express saccade expert” mode. This is also explained by the information “redundancy” in our model between SC and FEF, the latter dealing with the same spatial information, only with a longer latency. In the conjunction task, this peak appears because the “difficulty” slows down learning, and thus the shift to an “express saccade expert”.

[Figure 15 panels: A, average choices in the conjunction case (conjunction; spatial and color error; bad average; bad saccade) and in the no conjunction case (color error; spatial error; good average; bad saccade); B, reaction time histogram (time, ms); C, SC and FEF actor weights; D, IT actor weights (R, G, B).]

Figure 15: Results of the conjunction task. The rewarded cue is the red one on the right; if it is not present, reward is given for fixating the center area. A: average choices in the conjunction (top) and no conjunction (bottom) cases. In the conjunction case, “conjunction” represents the correct choice, “spatial and color error” a movement toward the wrong cue, “bad average” an averaging saccade (both targets selected simultaneously), and “bad saccade” saccades that fall neither within a 2.5° radius of the center nor within that radius of any cue. In the no conjunction case, “good average” is a rewarded response keeping the eyes on the fixation point, “color” and “spatial” errors respectively represent movements to the green target on the right and to the red target on the left, and “bad saccade” a movement to any other position (generally between fixation and a cue, but outside the 2.5° radius). B, C, D: same as Figs. 11 and 14.

We described a model of the saccadic system with some very specific structural features:
• the cortico-basal circuits operate in various dimensions (selection based on spatial position, or on target features), with sensory inputs provided with a given latency,
• the subcortico-basal circuit operates on spatial information only, and with a shorter latency,
• all these circuits are subject to reinforcement learning at the level of the input of the basal ganglia.
We claim that this structure predicts very specific behaviors, especially in feature-based and space-and-feature-based decisions:
• In the spatial decision task, an ability to switch from long-latency to short-latency saccades (thanks to the learning of the subcortical circuit), an effect experimentally described in (Fischer et al., 1984).
• In this task, after the learning of the subcortical shortcut, an early burst of activity appears in the FEF, caused by resonant activity in the spatial circuit. This burst slightly contributes to the reduction of saccade latency.
• In the color decision task, the concurrently learning subcortical circuit reduces the efficiency of learning, compared to the spatial task. In normal animals, this effect could be cancelled by an external cognitive brake, for example the dlPFC, acting on the subcortical circuit. We thus predict that the deficit observed in simulation should be observed only in animals with prefrontal cortex deactivation.
• In the conjunction color-and-space-based task, again with the same prefrontal cortex deactivation, space should dominate, in the sense that when the conjunction is not presented, 1) inhibiting the response should be difficult, and this ability should disappear with learning, 2) the resulting errors should preferentially be directed toward the correct position in space rather than toward the target with the correct color, and 3) saccade latencies should decrease as in the purely spatial task, a clear clue that the subcortical spatial circuit has taken full control of the decisions.

Very few models have investigated the operation of multiple basal ganglia circuits in saccadic decision and learning (Girard and Berthoz, 2005), and even fewer took into account the existence of a purely subcortical loop.

The seminal model of Dominey & Arbib (Dominey and Arbib, 1992, 1995) is quite complete, with memory and sequence learning that we have not yet replicated. Nevertheless, some of its aspects now seem rather outdated. First, their model lacks the subcortical SC-Th-BG loop, which is now clearly identified: they only integrated cortical loops. This subcortical loop can operate faster than the cortical circuit, and one aim of our work is to explore their interactions. Second, the basal ganglia model they used is oversimplified. Indeed, it is only based on the direct/indirect interpretation of BG connectivity, from which they keep only the direct pathway. Consequently, concurrent channels cannot interact in the BG circuitry, which makes target selection problematic. Their SC motor layer thus requires an ad hoc winner-takes-all mechanism, whereas our more complete BG model solves these problems.

The model proposed in (Brown et al., 2004) includes a cortical loop dedicated to saccade strategy selection, and a subcortical loop dedicated to target selection. It also includes working memory mechanisms that we have not yet included. Their cortical “strategy” loop explicitly selects whether the target of a saccade will be based on the fixation cue, the target position or the target feature. Their subcortical loop lacks any thalamic relay and is entirely controlled by the cortical loop, making it unable to learn or to make saccades without it. Finally, the details of their BG circuitry suffer from limitations, discussed in detail in (Girard and Berthoz, 2005).

Chambers et al. (2005) proposed a model integrating both the subcortical and cortical pathways, without learning capabilities, in which a single up-to-date BG model dedicated to location-based selection integrates FEF and SC inputs. Using the various positive feedback loops of this circuitry, they show that manipulating the level of dopamine in their BG model generates reaction time and saccade size modifications reminiscent of the behavior of Parkinson’s disease patients. This model is equivalent to our spatial circuits, and does not explore learning or competition between cortical loops.

The model described in (Guthrie et al., 2013) integrates two cortical loops (“cognitive” and “motor”) interacting through different associative structures at both the cortical and striatal levels. They store in a sub-part of the Striatum all the possible combinations of spatial positions and features, which could create an obvious combinatorial problem in a realistic model with a full field-of-view representation and a rich feature space. This model has shown the ability to learn to select targets based on conjunctions of information between the two loops, but it does not include the SC and does not specify how the selection in the BG is transformed into a motor command. The associative striatal structure depends on the associative cortical one and provides a means of information transfer between loops. However, the BG architecture used is quite simplified, lacking the GPe and the GPe-STN connectivity. Finally, this model does not include any subcortical loop, and thus could not study possible interactions between cortical and subcortical loops.

Our results show that the system is able to learn basic behaviors such as the “spatial task” and the “color task”. Moreover, we observed quite different abilities for these tasks. A first difference appeared in the color task performance, which only rises to about 75%. This difference can be explained by the very structure of the model, in which the spatial loop intrinsically dominates the system, as it includes the SCi output map and has access to information before the color one. Thus, it can learn before the color loop processes information, and it has the last word on selection. This characteristic is confirmed in the “conjunction task”, where the system finally learned a “spatial task”. What is quite clear with this architecture is that a subcortical spatial choice should prevail when opposed to a color one. This characteristic was also observed in previous work with a simpler model without the cortical spatial loop (N’Guyen et al., 2010) and seems to be a prediction of this architecture. Such a prediction could be tested in animals with dlPFC inactivation, in a task where a spatial and a feature criterion contradict each other, as we expect the dlPFC to inhibit impulsive subcortical behavior. This prediction would not hold for the model proposed by Guthrie et al. (2013), as they explicitly represent conjunction information in the Striatum, which allows an experimental discrimination between the two models.

Moreover, another stable outcome of this model relates to saccade reaction time. We observed what resembles “express saccades” in the spatial task. In our case, these short-latency saccades occurred only after a period of learning. This training-dependent behavior is in accordance with previous observations in monkeys (and humans) (Fischer et al., 1984; Fischer and Ramsperger, 1986). However, it appears that monkeys are also able to trigger some rare, spontaneous express saccades without learning, which our model cannot reproduce. This behavior may be viewed as a kind of exploratory behavior, which is clearly lacking in our model.

These express saccades are only performed toward learned locations and never toward learned features. This suggests that the behavior is location dependent and not feature dependent, which is in accordance with results in monkeys (Fischer et al., 1984; Schiller and Haushofer, 2005). Indeed, the sensory pathway latencies imposed in the model rule out express saccades driven by the cortical color loop (122 ms), which readily explains the absence of such saccades in the color task. Therefore, the intrinsic architecture of the model predicts that correct express saccades cannot occur based on feature information. Moreover, in our system this spatial dependency is encoded in a retinocentric reference frame, and so does not depend on the location of the target in space, which is also in accordance with previous results (Schiller and Haushofer, 2005).

Furthermore, these express saccades do not seem to depend on FEF: simulations with FEF inactivation on a trained system only lengthen them by about 15 ms, which is quite in accordance with what was observed in lesion studies (Schiller et al., 1987).

Interestingly, we observed a short-latency burst of activity in FEF prior to the execution of the express saccade.
This activity is not caused by a direct visual input (it appears before visual input reaches FEF) but by indirect SC activity causing resonating activity in the cortical loop. Although an SC-to-FEF projection, either direct or through the Thalamus, has been hypothesized (Sommer and Wurtz, 1998; Everling and Munoz, 2000), this activity induced through BG disinhibition seems to be a new prediction of our model.

Notice that the express saccades we obtained could theoretically be shortened even more by a pre-disinhibition of the BG, which could be viewed as preparatory activity. It should then be possible to shorten latency by tens of milliseconds, possibly explaining the range of timings observed in living animals, from 70 to 90 ms. For example, preparatory activity in FEF during the gap period could either facilitate or even elicit disinhibition of the BG (Everling and Munoz, 2000). Whether this pre-disinhibition exists remains a question to be answered experimentally. However, this phenomenon was not observed in our system, and it may require some memory capacity that we did not implement.

If we look further at the SRT distributions, what is commonly observed in primates is a bimodal distribution of reaction times for a detection task (only one cue), which can be related to our spatial task. These two modes are in the ranges 80–100 ms and 130–160 ms. Moreover, as said before, these timings remain largely unmodified after an FEF lesion but are drastically changed after an SC lesion (Schiller et al., 1987). Our model produces a compatible bimodal distribution, but with a longer latency for the second mode, which involves the color loop. So our model does not seem to capture the exact mechanism explaining this precise timing.

In contrast, a unimodal distribution is observed in primates for a discrimination task (where the animal has to choose a cue based on a feature), which can be related to our color task.
In this case the distribution is wider, in the range 160–200 ms, without express saccades. Once again, this distribution remains unchanged after an FEF lesion but is modified after an SC lesion (Schiller et al., 1987). Here, the mechanism proposed by our model seems quite consistent with the experimental data.

Unfortunately, to the best of our knowledge there are no data on a spatial-feature conjunction task in the literature, but it is to be noted that a similar three-peak distribution was observed in a quite different task, where the primate had to choose between two targets (both rewarded) presented with a 50 ms offset (Schiller et al., 2004).

Noise is necessary in the system to allow the generation of saccades toward one target among two with similar predicted values, rather than systematically resulting in averaging saccades. While averaging saccades sometimes happen in behaving animals (Ottes et al., 1984), they are quite rare, and not as systematic as our model would produce them without perceptual noise. This is because the output of our BG does not represent a probability distribution over possible targets but a direct control signal that requires a unique choice. Yet our solution is probably a bit simplistic; a more plausible one would produce a selection with more competition between targets, such as “race models” (Bundesen, 1987; Ludwig et al., 2007). These mechanisms would, most of the time, allow the selection of a unique target between two perfectly identical cues. Moreover, these mechanisms could also produce an attentional engagement/disengagement behavior, which could produce the “gap effect” (Saslow, 1967; Braun and Breitmeyer, 1988) that our model cannot replicate.
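The contrast drawn here can be sketched with two generic mechanisms, neither of which is the model's own readout nor a specific published race model: a center-of-mass readout of two equally selected targets, which lands midway between them, and a minimal race in which noisy accumulators integrate toward a common threshold and the first to cross determines the target. The function names and all parameter values are hypothetical.

```python
import random


def center_of_mass(positions, activities):
    """Population readout: activity-weighted mean of the selected target positions."""
    total = sum(activities)
    return sum(p * a for p, a in zip(positions, activities)) / total


def race(drifts, threshold=1.0, noise_sd=0.1, rng=None, max_steps=100_000):
    """Noisy accumulators race to a threshold; return (winner index, steps taken)."""
    rng = rng or random.Random()
    acc = [0.0] * len(drifts)
    for step in range(1, max_steps + 1):
        for i, drift in enumerate(drifts):
            acc[i] += drift + rng.gauss(0.0, noise_sd)
        leader = max(range(len(acc)), key=acc.__getitem__)
        if acc[leader] >= threshold:
            return leader, step  # the first accumulator to cross wins outright
    raise RuntimeError("no accumulator reached the threshold")


# Two identical cues at -10 and +10 degrees.
print(center_of_mass([-10.0, 10.0], [1.0, 1.0]))  # 0.0: an averaging saccade
rng = random.Random(0)
winners = [race([0.02, 0.02], rng=rng)[0] for _ in range(1000)]
print(winners.count(0) / len(winners))  # close to 0.5; never an intermediate target
```

The race always yields a single cue, chosen about equally often when the two cues are identical, whereas the deterministic readout of two equally selected targets always lands between them.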

In our model we have chosen to include only one SC-Th-BG loop, but McHaffie et al. (2005) have identified at least two (maybe three) different loops involving different layers of the SC.

The first one links the SC superficial layers (SCs) to the BG via the lateral posterior (LP) and pulvinar nuclei of the thalamus, and projects back to the SC superficial layers (and possibly also to the deep layers). Given that SCs activity is mainly driven by direct retinal projections, it seems reasonable to think that this loop could be responsible for the selection of these retinal inputs. We did not implement this loop, which appeared redundant in our model as we included an SCs to SCi projection, but we can imagine a different mechanism with, for example, an SCs to SCi pathway gated by SNr inhibition.

The second loop, which we implemented in our model, links the SC deep layers (SCi) to the BG via the intralaminar thalamic nuclei (both caudal and rostral, which are segregated regions with different types of contact onto striatal medium spiny neurons, and thus may in fact form two parallel loops). The deep layers of the SC are known to receive afferent connections from multiple areas (sensory, premotor, motor, but also multisensory…) (May, 2006), thus probably conveying much higher-level information. Moreover, as good evidence indicates an SCs to SCi projection (Lee et al., 1997; Isa, 2002), it seems reasonable to think that this loop could be involved in the selection of sensory (or higher-order) targets for orienting behavior, as described in this work.

The conjunction task clearly requires the ability to select and combine feature and location, but we built our model on the conservative assumption that these different types of information are treated independently, by strictly separating feature and spatial loops in the learning stage. We thus stick to the assumption of parallel, functionally segregated loops as described in (Alexander et al., 1986a). This choice was also driven by anatomical considerations, as the TE region of IT seems to project to the “Visual Striatum” (Middleton and Strick, 1996), while the FEF seems to project to the “Oculomotor Striatum” (Stanton et al., 1988). This architecture should make learning quicker and generalization easier (i.e. we can directly learn that a color is rewarded regardless of its position, rather than learn each color/location combination). This assumption also has the clear advantage of keeping the system simple, without the need to learn all possible combinations of features and locations, which would cause a problem of combinatorial explosion. The disadvantage is that the system has no means to directly associate a feature/location pair and can only learn both separately, which explains the relatively poor performance on this task.

We hoped that each loop could learn to select separately and then produce the desired behavior once combined back at the SC level. However, with this architecture the only means to perform the correct behavior (trigger a saccade only if the correct cue appears at the correct position) is to trigger an averaging saccade between the two cues in the “no conjunction” case, thus keeping fixation close to the center. In our model, this becomes less and less probable as learning progresses, because the spatial loop becomes quicker than the feature one, so that feature information can no longer be included in the decision.
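To make the combinatorial cost concrete, the count below compares segregated loops with a single merged feature/location input. It assumes, as the weight counts given for the model suggest but do not state, that each loop uses a square all-to-all projection, so that 1587600 = 1260² spatial weights and 9 = 3² color weights (R, G, B); `all_to_all` is a hypothetical helper. Under this assumption, merging the inputs multiplies the weight count by roughly M², i.e. about 9 here, and the factor grows quadratically with the number of features.

```python
def all_to_all(n_in, n_out=None):
    """Number of weights of an all-to-all projection (square if n_out is omitted)."""
    return n_in * (n_in if n_out is None else n_out)


# Assumption: these channel counts reproduce the weight counts given in the text,
# 1260**2 == 1587600 (spatial) and 3**2 == 9 (color).
N_SPATIAL, M_COLOR = 1260, 3

separate = all_to_all(N_SPATIAL) + all_to_all(M_COLOR)  # segregated loops
merged = all_to_all(N_SPATIAL * M_COLOR)                 # conjunctive N x M input
print(separate, merged, round(merged / separate, 2))
# 1587609 14288400 9.0
```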
Notice that with an external brake (such as inhibition from the dlPFC) limiting the expression of express saccades, the task could probably be learned.

Different architectures can be proposed to alleviate this problem in more realistic ways. It is possible to combine all the information at different levels. FEF is known to receive inputs from multiple areas (Schall et al., 1995), being a convergence structure for the ventral and dorsal visual streams. In particular, in our case, IT (TE) is known to project to FEF (Schall et al., 1995), and we can imagine that FEF already combines spatial and non-spatial information. This combination could occur after feature selection and would then explain the observed salience map (Thompson et al., 2001).

Another possibility could be a combination at the level of the Striatum, allowing combinations of inputs to be learned, as done in (Guthrie et al., 2013). The disadvantage is that it multiplies the size of the input vector, as stated above: with N spatial channels and M color channels, the input size is N × M and the all-to-all weight matrix has (N × M)² entries. Even if Guthrie et al. (2013) invoked interesting biological bases, one can question whether this kind of combination is a problem in biological systems. The predictions we make about the conjunction case could help decide, based on experimental data, which architecture (separate or merged loops) is correct.

Finally, interactions between loops can also happen at the level of the Thalamus. Even if the FEF and IT loops do not share the same thalamic nuclei (VAmc for IT and MDpl for FEF), this mechanism could still be possible.

Disclosure/Conflict-of-Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgement

Funding:

This research is funded by the HABOT project (Emergence(s) Ville de Paris program).

Supplemental Data

Model parameters

Table 1: Parameters of the BG model in the spatial loop. The two independent Thalamus modules share the same parameters.

[Parameter values were lost in text extraction. Surviving labels: BG model size N, time constants τ, τ_STN, τ_FS, τ_FC, τ_TH, τ_TRN (ms), γ, connection weights among the striatal (D), FS, GPe, STN, GPi, TH, TRN and FCtx populations, tonic inputs I_D, I_STN, I_GPi, I_GPe, I_Th, and input weights, listed a second time for another BG module; SC and saccade generator parameters τ, τ_Sat, ε_OPN, ε_trig, ε_stop and the LLBN, OPN, motor, saturation, TN, BN and MN weights; weights of the SCs, FEF, V4|IT, SCi and BG inputs to the SCi map and W_SGinhib; GPi|SNr rest levels; learning parameters γ_spatial, η_spatial, λ_spatial, γ_color, η_color, λ_color.]

References

Alexander, G., DeLong, M., and Strick, P. L. (1986a). Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annual Review of Neuroscience, 9.

Alexander, G. E., DeLong, M. R., and Strick, P. L. (1986b). Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annual Review of Neuroscience, 9:357–381.

Arai, K., Das, S., Keller, E. L., and Aiyoshi, E. (1999). A distributed model of the saccade system: simulations of temporally perturbed saccades using position and velocity feedback. Neural Networks, 12(10):1359–1375.

Arai, K., Keller, E., and Edelman, J. (1994). Two-dimensional neural network model of the primate saccadic system. Neural Networks, 7:1115–1135.

Braun, D. and Breitmeyer, B. (1988). Relationship between directed visual attention and saccadic reaction times. Experimental Brain Research, 1988:546–552.

Brown, J., Bullock, D., and Grossberg, S. (2004). How laminar frontal cortex and basal ganglia circuits interact to control planned and reactive saccades. Neural Networks, 17(4):471–510.

Bundesen, C. (1987). Visual attention: Race models for selection from multielement displays. Psychological Research, pages 113–121.

Chambers, J. M., Gurney, K., Humphries, M., and Prescott, T. (2005). Mechanisms of choice in the primate brain: a quick look at positive feedback. In Bryson, J., Prescott, T., and Seth, A., editors, Modelling Natural Action Selection, pages 45–52. AISB Press, Brighton, UK.

Das, S., Keller, E. L., and Arai, K. (1996). A distributed model of the saccadic system: the effects of internal noise. Neurocomputing, 11(2):245–269.

Dominey, P. and Arbib, M. (1995). A model of corticostriatal plasticity for learning oculomotor associations and sequences. Journal of Cognitive Neuroscience.

Dominey, P. F. and Arbib, M. (1992). A Cortico-Subcortical Model for Generation of Spatially Accurate Sequential Saccades. Cerebral Cortex, 2(2):153–175.

Everling, S. and Munoz, D. P. (2000). Neuronal correlates for preparatory set associated with pro-saccades and anti-saccades in the primate frontal eye field. The Journal of Neuroscience, 20(1):387–400.

Fischer, B., Boch, R., and Ramsperger, E. (1984). Express-saccades of the monkey: effect of daily training on probability of occurrence and reaction time. Experimental Brain Research, pages 232–242.

Fischer, B. and Ramsperger, E. (1986). Human express saccades: effects of randomization and daily practice. Experimental Brain Research, pages 569–578.

Fischer, B. and Weber, H. (1993). Express saccades and visual attention. Behavioral and Brain Sciences, 16(03):553.

Fries, W. (1984). Cortical projections to the superior colliculus in the macaque monkey: a retrograde study using horseradish peroxidase. Journal of Comparative Neurology, 230(1):55–76.

Gattass, R., Sousa, A. P. B., and Gross, C. (1988). Organization and Extent of V3 and V4 of the Macaque. The Journal of Neuroscience, 8(June).

Girard, B. and Berthoz, A. (2005). From brainstem to cortex: computational models of saccade generation circuitry. Progress in Neurobiology, 77(4):215–255.

Girard, B., Tabareau, N., Pham, Q. C., Berthoz, A., and Slotine, J.-J. (2008). Where neuroscience and dynamic system theory meet autonomous robotics: a contracting basal ganglia model for action selection. Neural Networks, 21(4):628–641.

Girman, S. V. and Lund, R. D. (2007). Most superficial sublamina of rat superior colliculus: neuronal response properties and correlates with perceptual figure-ground segregation. Journal of Neurophysiology, 98(1):161–177.

Groenewegen, H. J. and Berendse, H. W. (1994). The specificity of the ’nonspecific’ midline and intralaminar thalamic nuclei. Trends in Neurosciences, 17(2):52–57.

Guthrie, M., Leblois, A., Garenne, A., and Boraud, T. (2013). Interaction between cognitive and motor cortico-basal ganglia loops during decision making: A computational study. Journal of Neurophysiology.

Hikosaka, O., Takikawa, Y., and Kawagoe, R. (2000). Role of the basal ganglia in the control of purposive saccadic eye movements. Physiological Reviews, 80(3):953–978.

Hikosaka, O., Wurtz, R., et al. (1983). Visual and oculomotor functions of monkey substantia nigra pars reticulata. IV. Relation of substantia nigra to superior colliculus. Journal of Neurophysiology, 49(5):1285–1301.

Isa, T. (2002). Intrinsic processing in the mammalian superior colliculus. Current Opinion in Neurobiology, 12(6):668–677.

Koval, M., Lomber, S., and Everling, S. (2011). Prefrontal cortex deactivation in macaques alters activity in the superior colliculus and impairs voluntary control of saccades.

Journal of Neuroscience , 31(23):8659–8668.Lamme, V. a. and Roelfsema, P. R. (2000). The distinct modes of vision oﬀeredby feedforward and recurrent processing.

Trends in neurosciences , 23(11):571–9.Lee, P. H., Helms, M. C., Augustine, G. J., and Hall, W. C. (1997). Role ofintrinsic synaptic circuitry in collicular sensorimotor integration.

Proceedingsof the National Academy of Sciences USA , 94:13299–13304.Lock, T. M., Baizer, J. S., and Bender, D. B. (2003). Distribution of corticotectalcells in macaque.

Experimental brain research. Experimentelle Hirnforschung.Expérimentation cérébrale , 151(4):455–70.Ludwig, C. J. H., Mildinhall, J. W., and Gilchrist, I. D. (2007). A populationcoding account for systematic variation in saccadic dead time.

Journal ofneurophysiology , 97(1):795–805.Lynch, J. and Tian, J.-R. (2006). Cortico-cortical networks and cortico-subcortical loops for the higher control of eye movements.

Progress in BrainResearch , 151:461–491.May, P. J. (2006). The mammalian superior colliculus: laminar structure andconnections.

Progress in brain research , 151:321–378.McHaﬃe, J., Jiang, H., May, P., Coizet, V., Overton, P., Stein, B., and Red-grave, P. (2006). A direct projection from superior colliculus to substancianigra pars compacta in the cat.

Neuroscience , 138:221–234.McHaﬃe, J. G., Stanford, T. R., Stein, B. E., Coizet, V., and Redgrave, P.(2005). Subcortical loops through the basal ganglia.

Trends in neurosciences ,28(8):401–7. 31cPeek, R., Han, J., and Keller, E. (2003). Competition between saccadegoals in the superior colliculus produces saccade curvature.

J Neurophysiol ,89(5):2577–2590.McPeek, R. and Keller, E. (2004). Deﬁcits in saccade target selection afterinactivation of superior colliculus.

Nat Neurosci , 7(7):757–763.McPeek, R. M. and Keller, E. L. (2002). Saccade target selection in the superiorcolliculus during a visual search task.

Journal of neurophysiology , 88(4):2019–34.Middleton, F. a. and Strick, P. L. (1996). The temporal lobe is a target of outputfrom the basal ganglia.

Proceedings of the National Academy of Sciences ofthe United States of America , 93(16):8683–7.Mink, J. W. (1996). The basal ganglia: Focused selection and inhibition ofcompeting motor programs.

Progress in Neurobiology , 50(4):381–425.Montague, P. R., Dayan, P., and Sejnowski, T. J. (1996). A framework for mes-encephalic dopamine systems based on predictive hebbian learning.

Journalof Neuroscience , 16(5):1936–1947.Moschovakis, A., Scuddert, C., and Highstein, S. (1996). The microscopicanatomy and physiology of the mammalian saccadic system.

Progress inNeurobiology , 50:133–254.N’Guyen, S., Pirim, P., Meyer, J.-A., and Girard, B. (2010). An Integrated Neu-romimetic Model of the Saccadic Eye Movements for the Psikharpax Robot.In

From animals to animats 11 .Ogawa, T. and Komatsu, H. (2004). Target selection in area V4 during a mul-tidimensional visual search task.

The Journal of neuroscience : the oﬃcialjournal of the Society for Neuroscience , 24(28):6371–82.Ottes, F., Gisbergen, J., and Eggermont, J. (1986). Visuomotors ﬁelds of thesuperior colliculus: a quantitative model.

Vision research , 26:857–873.Ottes, F. P., Van Gisbergen, J. a., and Eggermont, J. J. (1984). Metrics ofsaccade responses to visual double stimuli: Two diﬀerent modes.

Vision Re-search , 24(10):1169–1179.Redgrave, P. (2007). Basal ganglia.

Scholarpedia , 2(6):1825.Redgrave, P., Prescott, T. J., and Gurney, K. (1999). The basal ganglia: avertebrate solution to the selection problem?

Neuroscience , 89(4):1009–1023.Rizzolatti, G. and Buchtel, H. (1980). Neurons with complex visual properties inthe superior colliculus of the macaque monkey.

Experimental Brain Research ,42:37–42.Saslow, M. G. (1967). Eﬀects of components of displacement-step stimuli uponlatency for saccadic eye movement.

Journal of the Optical Society of America ,57(8):1024–1029. 32chall, J., Morel, A., King, D. J., and Bullier, J. (1995). Topography of visualcortex connections with frontal eye ﬁeld in macaque: convergence and segre-gation of processing streams.

The Journal of Neuroscience , 15(6):4464–4487.Schiller, P. H. and Haushofer, J. (2005). What is the coordinate frame utilized forthe generation of express saccades in monkeys?

Experimental brain research.Experimentelle Hirnforschung. Expérimentation cérébrale , 167(2):178–86.Schiller, P. H., Haushofer, J., and Kendall, G. (2004). An examination ofthe variables that aﬀect express saccade generation.

Visual neuroscience ,21(2):119–27.Schiller, P. H., Sandell, J. H., and Maunsell, J. H. (1987). The eﬀect of frontaleye ﬁeld and superior colliculus lesions on saccadic latencies in the rhesusmonkey.

Journal of neurophysiology , 57(4):1033–49.Sommer, M. a. and Wurtz, R. H. (1998). Frontal eye ﬁeld neurons ortho-dromically activated from the superior colliculus.

Journal of neurophysiology ,80(6):3331–5.Sommer, M. a. and Wurtz, R. H. (2002). A pathway in primate brain for internalmonitoring of movements.

Science , 296(5572):1480–1482.Stanton, G., Goldberg, M. E., and Bruce, C. J. (1988). Frontal eye ﬁeld eﬀerentsin the macaque monkey: I. Subcortical pathways and topography of striataland thalamic terminal ﬁelds.

Journal of Comparative Neurology , 271:473–492.Sutton, R. (1988). Learning to predict by the methods of temporal diﬀerences.

Machine Learning , 3:9–44.Sutton, R. S. and Barto, A. G. (1998).

Reinforcement Learning: An Introduc-tion . The MIT Press, Cambridge, MA.Tabareau, N., Bennequin, D., Berthoz, A., Slotine, J.-J., and Girard, B. (2007).Geometry of the superior colliculus mapping and eﬃcient oculomotor compu-tation.

Biological cybernetics , 97(4):279–92.Thompson, K., Bichot, N., and Schall, J. (2001). From attention to action infrontal cortex. In Braun, J., Koch, C., and Davies, J., editors,

Visual attentionand cortical circuits , pages 137–157. MIT.Tian and Lynch, J. C. (1997). Subcortical input to the smooth and saccadiceye movement subregions of the frontal eye ﬁeld in Cebus monkey.

The Jour-nal of neuroscience : the oﬃcial journal of the Society for Neuroscience ,17(23):9233–47.Tompa, T. and Sáry, G. (2010). A review on the inferior temporal cortex of themacaque.

Brain research reviews , 62(2):165–82.Ungerleider, L. G., Galkin, T. W., Desimone, R., and Gattass, R. (2008). Corti-cal connections of area V4 in the macaque.

Cerebral cortex (New York, N.Y.: 1991) , 18(3):477–99. 33an Opstal, a. J. and Goossens, H. H. L. M. (2008). Linear ensemble-codingin midbrain superior colliculus speciﬁes the saccade kinematics.

Biologicalcybernetics , 98(6):561–577.White, B. J., Boehnke, S. E., Marino, R. a., Itti, L., and Munoz, D. P. (2009).Color-related signals in the primate superior colliculus.

The Journal of neuro-science : the oﬃcial journal of the Society for Neuroscience , 29(39):12159–66.White, B. J. and Munoz, D. P. (2011). Separate visual signals for saccade initia-tion during target selection in the primate superior colliculus.

The Journal ofneuroscience : the oﬃcial journal of the Society for Neuroscience , 31(5):1570–8.Zheng, T. and Wilson, C. J. (2002). Corticostriatal combinatorics: the impli-cations of corticostriatal axonal arborizations.