How Do You Act? An Empirical Study to Understand Behavior of Deep Reinforcement Learning Agents
Richard Meyes, Moritz Schneider, and Tobias Meisen
Chair of Technologies and Management of the Digital Transformation, University of Wuppertal, Rainer-Gruenter-Straße 21, 42119 Wuppertal, Germany
{meyes, m.schneider-hk, meisen}@uni-wuppertal.de

Abstract.
The demand for more transparency of decision-making processes of deep reinforcement learning agents is greater than ever, due to their increased use in safety-critical and ethically challenging domains such as autonomous driving. In this empirical study, we address this lack of transparency following an idea that is inspired by research in the field of neuroscience. We characterize the learned representations of an agent's policy network through its activation space and perform partial network ablations to compare the representations of the healthy and the intentionally damaged networks. We show that the healthy agent's behavior is characterized by a distinct correlation pattern between the network's layer activations and the performed actions during an episode, and that network ablations which cause a strong change of this pattern lead to the agent failing its trained control task. Furthermore, the learned representation of the healthy agent is characterized by a distinct pattern in its activation space reflecting its different behavioral stages during an episode, which again, when distorted by network ablations, leads to the agent failing its trained control task. In conclusion, we argue in favor of a new perspective on artificial neural networks as objects of empirical investigation, just like biological neural systems in neuroscientific studies, paving the way towards a new standard of scientific falsifiability with respect to research on transparency and interpretability of artificial neural networks.
Keywords:
Transparency, Interpretability, Explainability, Deep Reinforcement Learning, Neuroscience
1 Introduction

Recent research on general-purpose artificial intelligence (AI) has seen some major breakthroughs in the past few years, spurred by the advances of deep reinforcement learning (DRL) algorithms utilized in environments with sparse rewards and complete information [1,2] or in complex multi-agent environments with incomplete information [3,4,5]. However, the research path leading up to today's pinnacle of these applications is marked by a crisis of reproducibility and required intense manual trial-and-error efforts, such as finding a good network initialization and subsequent hyper-parameter tuning, which can make all the difference between a working and a failing solution [6]. What complicates the problem even more is that many working solutions are interspersed with unwanted behavioral artifacts that manifest in the learned policy of agents, if the environment allows for such manifestation, e.g. in the domain of learning locomotion [7]. Such artifacts are commonly caused by incentivizing an agent to solely maximize a possibly richly shaped reward without any constraints on its policy. The usual approach of training agents to maximize their cumulative reward and quantitatively evaluating them solely based on this reward or any other performance metric, such as the Elo rating in chess, raises a key question:
How can we trust an agent, if we do not understand how its behavior emerges from its internal processes and the complex interplay of its individual functional components?
In this paper, we aim to contribute towards answering this question by following a research paradigm from the field of neuroscience based on empirical studies of large and complex neural systems. Such systems have been the objects of investigation for decades, starting with the influential work of Hubel and Wiesel in the 1950s [8], aiming to make them transparent and interpretable with respect to how their inner processes contribute to abstract concepts like consciousness and decision-making. Specifically, we investigate the behavior of DRL agents in three different classic control environments based on the learned representations of their policy networks, aiming to find a link between these representations and different behavioral stages during the execution of the trained policy. We characterize the actor's learned representations based on its layer activations during the execution of the policy and use network ablations (cf. section 3.2) to intentionally damage agents, evoking malfunctioning behavior, in order to compare the representations of the fully intact and damaged networks to each other.

First, we investigate the impact of network ablations of different sizes in different layers on the agent's capability to solve its trained control task and show that the agent exhibits a task-specific robustness to these ablations depending on their size and location. We further investigate how the activations of single units contribute to solving the control task, uncovering specific correlation patterns between these activations and the executed actions during an episode. Finally, we investigate patterns in the temporal evolution of the actor's layer activations and find that the healthy agent's learned representation contains distinct activation states that can be directly linked to the different behavioral stages of the policy that successfully solves the control task, ultimately providing a link between the agent's behavior and its internal processes.
2 Related Work

Most of the recent work on transparency, interpretability and explainability of AI comes from the field of computer vision (CV), where the main focus is commonly placed on investigations of convolutional neural networks (CNNs) and the importance of specific input variables for a network's output [9,10,11,12,13]. Similar efforts are made in the field of natural language processing (NLP), where recurrent neural networks (RNNs) are investigated for their representations of linguistic properties, contextual understanding or sentiment [14,15,16,17]. Typically, learned network representations are characterized via embedding methods like t-SNE [18] or UMAP [19], visualizing the high-dimensional activation space of neural networks to identify the role of specific network components in solving a given task [20,21,22,23,24]. To this end, network ablations were used to study the impact of single units on a network's performance [25], aiming to decide which units can be pruned without affecting a network's discriminative power [26,27,28]. Subsequently, network ablations revealed that a single unit's importance can be characterized by the magnitude of its weights [29] and the extent to which the distribution of its incoming weights changes during training [30,31]. Additionally, it was shown that units which are easily interpretable are not necessarily more important than units with a less accessible interpretability [32]. Recently, conflicting insights on methods for evaluating the similarity of learned network representations have been reported, demonstrating the early stage of current knowledge and thus the importance of, and the need for, more research on the topic [33,34]. In general, the extensive efforts of recent research aimed to map the classification result of a supervised-trained network to humanly interpretable explanations. We aim to extend these efforts towards the DRL domain, where, despite some work on understanding Deep Q-Networks and interpreting their learned policies in environments with a discrete action space [35], to the best of our knowledge, work on facilitating transparency and interpretability of learned representations by means of network ablations has not been conducted yet. However, in view of the fact that robust DRL and its application in real-world scenarios is still a matter of current research [6,36], we argue that a better interpretability of DRL agents is of utmost importance.
3 Experimental Setup

3.1 Environments and Agent Training

In this empirical study, we trained a DRL agent in three different classic control environments, namely the cart-pole swing-up (CPSU) environment [37], the pendulum swing-up (PSU) environment [38] and the cart-pole balance (CPB) environment [37] (cf. Figure 1). Although each environment poses an individual challenge, they share the partial objectives of controlling a cart on a rail or balancing a pendulum/pole in an upright position, providing some degree of comparability of the observed agent's behavior across tasks. We refrain from a more detailed explanation of the intricacies of these environments regarding their state space, action space and reward functions at this point, as they are well-known benchmark environments for DRL research and have been extensively explained elsewhere [37,38].

Fig. 1: Three exemplary rendered images of the respective control environments.
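As a hedged illustration, the three environments can be instantiated via OpenAI Gym and Roboschool [37,38]; the exact environment IDs below are our assumptions and may differ across installed versions:

```python
import gym
import roboschool  # importing registers the Roboschool environments with Gym

# Environment IDs are assumptions based on the cited benchmarks [37,38];
# they may differ depending on the installed Gym/Roboschool versions.
envs = {
    "CPSU": gym.make("RoboschoolInvertedPendulumSwingup-v1"),  # cart-pole swing-up [37]
    "PSU": gym.make("Pendulum-v0"),                            # pendulum swing-up [38]
    "CPB": gym.make("RoboschoolInvertedPendulum-v1"),          # cart-pole balance [37]
}

for name, env in envs.items():
    obs = env.reset()
    print(name, env.observation_space.shape, env.action_space.shape)
```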
As the object of investigation, we trained an actor-critic agent in the three described environments with the deep deterministic policy gradient algorithm as outlined in [39]. Both the actor and the critic network consist of two hidden layers with 400 units in the first layer and 300 units in the second layer, both layers using ReLU activations and layer normalization [40]. The critic is supplied with the actor's chosen actions, which are superimposed with an Ornstein–Uhlenbeck noise process [41], only in its second hidden layer. Each agent was trained for 800,000 time steps and optimized via Adam [42], with all other hyper-parameters being the same as in [39]. All computations were performed on a single machine containing two Intel Xeon Platinum 8168 processors with a total of 48 physical cores and 8 NVIDIA Tesla V100 32GB GPUs.
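The following is a minimal sketch of the actor network described above, assuming a PyTorch implementation; the class name, the placement of layer normalization before the ReLU, and the tanh output squashing are our assumptions, as the original implementation is not specified beyond the details given:

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Actor network as described above: 400/300 hidden units with ReLU
    activations and layer normalization [40]; the tanh output squashing is
    an assumption typical for DDPG with bounded continuous actions."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, 400)
        self.ln1 = nn.LayerNorm(400)
        self.fc2 = nn.Linear(400, 300)
        self.ln2 = nn.LayerNorm(300)
        self.out = nn.Linear(300, act_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h1 = torch.relu(self.ln1(self.fc1(obs)))  # first hidden layer (400 units)
        h2 = torch.relu(self.ln2(self.fc2(h1)))   # second hidden layer (300 units)
        return torch.tanh(self.out(h2))           # continuous action in [-1, 1]
```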
3.2 Study Design

We characterize the actor's learned representations based on its layer activations during policy execution. We use network ablations to intentionally damage the actor, evoking malfunctioning agent behavior, in order to compare the representations of the fully intact and damaged networks. To this end, we record the activation of each single unit within the fully intact actor and its predicted actions for each time step of an episode, in addition to the cumulative episodic reward, to establish a baseline recording. Additionally, we record the same data for each individual ablation case to compare it to the baseline recording.
Network Ablations.
We perform partial network ablations in a single layer with varying proportions of ablated units by manually clamping their activations to zero, effectively preventing any flow of information through the ablated units. We select the proportion of ablated units in a range from 5% to 30% in steps of 5% and then from 30% to 90% in steps of 10%. In addition, we deviate from this pattern once by ablating 33.33% of the units within a layer. The ablated units are selected in a sliding-window manner, with the window shifted across the layer, similar to sliding a kernel over an image in a CNN, while the window position is frozen during an episode. Note that the total number of ablations with the same proportion varies, as it depends on the size of the layer, the size of the window and the stride of the window. For instance, in a layer with 300 units, a chosen window size of 5% and a stride of 10 units, 15 units are ablated at once, resulting in 29 different network ablations in total. For all ablations, we chose a constant stride of 10 units to gather sufficient activation recordings for statistical analysis while keeping the computational effort manageable.
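The window enumeration and the zero-clamping can be sketched as follows; this is a minimal illustration assuming a PyTorch forward hook on the actor sketched above (the helper names and the hook mechanism are our assumptions, not the authors' implementation):

```python
import torch

def sliding_windows(layer_size: int, window_frac: float, stride: int = 10):
    """Start/stop indices of all ablation windows for one layer, mirroring
    the sliding-window scheme described above (e.g. 29 windows for 300
    units, a 5% window and stride 10)."""
    width = int(round(layer_size * window_frac))
    return [(start, start + width)
            for start in range(0, layer_size - width + 1, stride)]

def make_ablation_hook(start: int, stop: int):
    """Forward hook that clamps the output of the ablated units to zero.
    Here the hook sits on the layer-norm output; since ReLU(0) = 0, the
    ablated units' activations are exactly zero downstream."""
    def hook(module, inputs, output):
        output = output.clone()
        output[..., start:stop] = 0.0  # no information flows through these units
        return output
    return hook

# Hypothetical usage: ablate units 100 to 119 in the actor's first hidden
# layer (`actor.ln1` refers to the sketch above); removing the handle
# restores the healthy network.
# handle = actor.ln1.register_forward_hook(make_ablation_hook(100, 120))
```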
Extraction of Activation Patterns.
To determine how single units contribute to the control task, we calculate, for each unit, the Pearson correlation coefficient between its set of activations A_{i,j} = {a_t | t ∈ [0, T]} and the outputs of the actor network U = {u_t | t ∈ [0, T]}, where t denotes the time step within the episode, T denotes the total number of time steps per episode, i denotes the i-th layer and j the j-th unit within that layer.

Furthermore, to characterize the learned representations within a layer of the actor, we store the activations of each single unit in that specific layer for each time step of an episode in a matrix M ∈ ℝ^{T×N}, where T denotes the number of time steps per episode and N denotes the number of units per layer. We visualize the evolution of the actor's activations during an episode using an open-source Python implementation of UMAP [19] to embed the stored activations into a two-dimensional space, i.e. M′ ∈ ℝ^{T×2}. Thus, each point in the embedded space represents the activation of a specific layer of the actor network at a single time step of an episode. We chose the default parameters for the UMAP embeddings after an initial attempt at finding better values for the number of nearest neighbours or the minimum distance between data points yielded no significant visual improvement of the embeddings.
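A minimal sketch of both analysis steps, assuming NumPy arrays holding the recordings described above and the open-source umap-learn package (function and variable names are ours, not the authors'):

```python
import numpy as np
import umap  # umap-learn

def correlation_pattern(M: np.ndarray, U: np.ndarray) -> np.ndarray:
    """Pearson correlation between each unit's activations (columns of the
    T x N matrix M) and the actor's chosen actions U of shape (T,). Ablated
    units have constant zero activation and thus yield NaN (no coefficient
    is calculated for them, as noted in the text)."""
    M_c = M - M.mean(axis=0)
    U_c = U - U.mean()
    denom = M_c.std(axis=0) * U_c.std() * len(U)
    return (M_c * U_c[:, None]).sum(axis=0) / denom  # shape (N,)

def embed_activations(M: np.ndarray) -> np.ndarray:
    """Embed the T x N activation matrix into two dimensions with default
    UMAP parameters, as in the study design above."""
    return umap.UMAP().fit_transform(M)  # shape (T, 2)
```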
4 Results

4.1 Impact of Network Ablations

To establish a baseline evaluation, we train the healthy agent to achieve near state-of-the-art results in all three environments, i.e. a maximum total episodic reward of 886.… for the CPSU task, −….87 for the PSU task and 1000 for the CPB task. For reasons of performance comparability across the three environments, the absolute return is normalized so that the minimum return value in each environment is 0 and the respective baseline return value is 1.
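Written out, this normalization maps a raw return R to a normalized return R̃; the symbols R_min and R_base are our labels for the environment's minimum and baseline return:

```latex
\tilde{R} = \frac{R - R_{\min}}{R_{\mathrm{base}} - R_{\min}},
\qquad \tilde{R}_{\min} = 0, \quad \tilde{R}_{\mathrm{base}} = 1 .
```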
Fig. 2: Comparison of the normalized returns achieved as a result of ablations of 30% of the units (red bars) in comparison to the respective baselines (blue bars).

Figure 2 shows the normalized return for the baseline in comparison to all 29 network ablations in the first and second layer with a window size of 30% (120 units) for the three control tasks. For both swing-up tasks, most ablations in the first layer have a negative impact on the agent's capability to solve the tasks. Interestingly, there are some ablations that have little to no impact or even a positive impact, thus increasing the return. In case of the CPB task, ablating 30% of the units in the first layer does not affect the agent's capability to solve the task at all. Contrary to the first layer, all ablations in the second layer have a strong negative impact for the CPSU task and the CPB task (except for two cases); however, only a few ablations have a comparably negative impact for the PSU task, where many ablations have little to no impact or even a positive impact.

The negligible impact of ablations suggests that either the capacity of the network has not been exploited to its fullest extent, so that some units do not contribute to solving the task and could be pruned, or that the information represented by the ablated units is redundantly represented by other units, making the agent robust against network ablations. The positive impact of ablations suggests that some units may play competing roles in the learned representation and that resolving this competition by targeted ablations improves the agent's capability to solve a task. Both observations are consistent with previously reported findings on the impact of ablations in supervised-trained neural networks on image recognition tasks [30,31].

Fig. 3: Distributions of the normalized returns for all ablations performed in the first layer (left side) and second layer (right side).

Figure 3 shows the distributions of the normalized returns resulting from the different network ablations in the first layer and second layer for the three control tasks. On average, the return decreases proportionally to the amount of ablated units. Comparing the impacts in the first layer across the three tasks shows a similar trend for the CPSU and the PSU task, i.e. a slow but steady decrease of the achieved return with increasing sizes of ablations, but a much more robust behavior for the CPB task, where ablations of up to 50% generally do not affect the agent's capability to solve the task. Further, comparing the impacts in the second layer shows a similar trend for the CPSU and the CPB task, i.e. a strong and sudden decrease in the achieved return for small ablation sizes, but a much more robust behavior for the PSU task, where ablations of up to 33.33% only marginally affect the agent's capability to solve the task.

Interestingly, connecting the similarity of the ablation impacts with the similarity of the different tasks suggests that the first layer holds a representation of how to swing up the pole/pendulum, while the second layer holds a representation of how to control the moving cart. More precisely, ablations in the first layer impact the agent in both tasks in which a pole has to be swung up, while the representation for the task that merely requires balancing the pole is very robust against ablations in this layer. Analogously, ablations in the second layer strongly impact the agent in both tasks in which a cart has to be controlled, while the representation for the task without a cart is fairly robust against ablations in this layer. These results suggest that interlinked learning objectives for solving the task, such as controlling the cart, swinging up the pendulum and subsequently balancing it, are represented in different locations of the network. These observations are consistent with previously reported findings on the localized representations of specific classes in supervised-trained neural networks on image classification tasks [43,44,45].
4.2 Correlation Patterns of Single Unit Activations

Following the observations described above, we wonder what role the precise interplay of single unit activations (SUA) plays with respect to the agent's executed policy. More specifically, we ask whether the contribution of SUA to the executed actions during an episode shows a distinct pattern for the healthy agent and to what extent this pattern is distorted in case of ablations with a negative impact on the achieved return. To this end, we characterize this pattern via the set of Pearson correlation coefficients calculated for the activations of single units within a layer and the outputs of the actor network for each time step within an episode (cf. 3.2).

Fig. 4: Correlation patterns between single unit activations and the actor's chosen actions (positive correlation ≈ 1, negative correlation ≈ −1) for the baseline and four exemplary ablations, together with their differences to the baseline pattern and the corresponding returns (−0.49, −816.51, −906.35, −0.38).

Figure 4 shows this pattern for the baseline and four exemplary ablations of 5% of the units in the first layer in the CPSU task. Each row contains 400 entries corresponding to the 400 units in the first layer. Each entry contains the correlation value and shows how the unit's activation correlates with the actor's chosen action. The empty spaces in the rows mark the ablated units, for which no correlation coefficient is calculated. The top row shows the baseline correlation pattern in comparison to the following four rows, which show the correlation patterns corresponding to the four exemplary ablations. The bottom four rows show to what extent the patterns resulting from the ablations change compared to the baseline, specified by the difference between the baseline pattern and the ablation patterns.

The ablations of units 100 to 119 and 270 to 289, resulting in the agent's failure to solve the task, show a general increase in correlation between the SUA and the chosen actions and the strongest difference to the baseline pattern. A high correlation value indicates a unit's exclusive contribution to a specific control direction, i.e. whenever the cart is moved to either side, specific units are selectively active and contribute to the control in that direction. However, such distinct contributions of single units do not seem to constitute a robust representation, as we find that patterns with less distinct correlations between single unit activations and the chosen actions generally lead to higher returns. This observation shows some similarity to previously reported findings about the importance of single units in supervised-trained networks for image classification tasks. Specifically, networks that memorize well instead of generalizing are more reliant on units that show a high selectivity in their activation for specific classes, indicating that units which
selectively get activated for specific classes do not contribute as much to a robust and generalized representation as units with a less selective activation [32].

Fig. 5: Scatter plot of the mean and the variance of the correlation patterns for the baseline and all 29 ablations of size 5%, together with their corresponding returns, in the CPSU task.

In order to further solidify that notion, we compared the mean and the variance of the correlation patterns of all ablations with the mean and the variance of the baseline pattern, hypothesizing that high values for the mean and the variance, corresponding to strong and distinct correlations, result in a low return. Figure 5 shows a scatter plot of the mean and the variance of the correlation patterns for the baseline and all 29 ablations of size 5% and their corresponding returns. Confirming the hypothesis, ablations of units resulting in large values for the mean and the variance, e.g. units 100 to 119 (marked in the top right corner of the scatter plot), lead to low returns. Almost all other ablations with mean and variance values close to the baseline (points within the red ellipse) do not result in task failures but achieve returns comparable to the baseline. Interestingly, the ablation of units 270 to 289, which results in small values for the mean and the variance, also leads to a low return, suggesting that our hypothesis can be extended towards small values for the mean and the variance, corresponding to no clear contribution of most of the single units to the control task.

To further test the validity of the hypothesis across different sizes of ablations and across the three tasks, Figure 6 summarizes the effects of all ablations (5% to 90%) on the return and their dependency on the characteristics of the correlation patterns.
Fig. 6: Scatter plots showing the mean (x-axis) and variance (y-axis) of the correlation coefficients for all ablations of the specified layer: (a) first layer, CPSU; (b) first layer, PSU; (c) second layer, CPB. Each panel marks the baseline and the regime of high returns.

Analogously to Figure 5, the x- and y-axis show the mean and the variance of the correlation patterns. For the CPSU task, the highest return is generally achieved for patterns with a low variance, as ablations leading to larger variances show a decreased return. This suggests that the CPSU task requires single units to be generically involved in the control task and not to specialize too strongly on specific controls. On the contrary, for the PSU task, higher returns are generally achieved for patterns with a high mean and a high variance, suggesting a further refinement of our hypothesis with respect to task-specific characteristics. Interestingly, ablations that increase both values beyond the baseline lead to even higher returns, while patterns with low values lead to low returns. This suggests that the ability to swing up the pendulum requires the units to contribute to the control in a very specific rather than generic way. Consistently, a very clear picture emerges for the CPB task, where no swing-up is required and only patterns with low values for mean and variance result in high returns, verifying our initial hypothesis. In combination with the CPSU task, this suggests that the ability to control the moving cart requires a generic involvement of single units in the control task rather than specific roles.
4.3 Temporal Evolution of Layer Activations

Although the correlation patterns provide some insights on how the agent acts, they do not capture the temporal evolution of the learned representations and do not answer questions with respect to this evolution, e.g.: At what point during the episode does the agent fail? When does it diverge from the baseline behavior, and in what way? Does the agent go through different behavioral stages during an episode, and can these stages be linked to specific patterns in the learned representation? In order to answer these questions, we characterize the learned representations by embedding the layer activations recorded during an episode (cf. 3.2) and compare the representations of the baseline to the representations resulting from the ablations.

Fig. 7: Comparison of the temporal evolution of layer activations between the baseline and three exemplary ablation cases for the CPSU task: (a) ablation of units 20 to 39 (5%) in the first layer; (b) units 110 to 149 (10%) in the first layer; (c) units 260 to 289 (10%) in the second layer. Annotated stages in the plots include the start of the swing-up, the rail border, the start of the stabilization and balancing stages, the union of the two balancing paths, and the divergence of the ablated agent.
Figure 7 shows this comparison for three exemplary ablation cases in the CPSU task. Each scatter plot contains 1000 blue and 1000 red points corresponding to the layer activations at each time step during an episode for the baseline and the ablation case, respectively. Note that even though the baselines in (a) and (b) show the exact same values, they are embedded slightly differently, as the embeddings were calculated separately for each case. The three cases correspond to ablations which had no effect on the agent's capability to solve the task (Figure 7a) or which led to only half the return of the baseline (Figures 7b and c).

Figure 7a shows the evolution of the layer activations during an episode for the healthy and the damaged agent and how the different behavioral stages of the episode are linked to different sections of this evolution. Both the healthy and the damaged agent start by moving the cart to the side, accelerating the pendulum to swing it up. After the initial swing-up (upon reaching the rail border), the agent is required to compensate for the excess momentum of the pole via corresponding cart movement to stabilize its upright position. This change in behavior results in a jump in the activation space from the initial activation path, which corresponds to the initial swing-up behavior, to another path, which corresponds to the stabilization behavior. The difference in activations is likely due to the movement of the cart in the opposite direction upon reaching the rail border. Following the successful stabilization, the agent is required to balance the pole by rapidly switching the direction of the cart to maintain an upright pole position. Interestingly, this behavior is represented in the activation space by two paths, along which the layer activations progress as the agent acts throughout the episode. The layer activations repeatedly switch between these two paths, suggesting that the network constantly changes between two distinct activation states corresponding to the balancing act of the pole. At some point during the episode, these two paths merge together (union), as the balancing act leads to an almost static position of the cart and the pole. However, from a mechanical perspective, this constitutes an unstable equilibrium point for the pole, where small perturbations of the pole's angular position result in its downfall, triggering a renewed balancing act that is reflected by a renewed separation of the merged paths. This observation suggests that the convergence of the actor's activations towards a single final activation state is not sufficient to solve the task. Rather, a stable and continuous transition between two distinct activation states is necessary to sufficiently represent the balancing act. This observation seems somewhat surprising considering the weak correlations of SUA with the actor's chosen actions throughout an episode (cf. 4.2). Although the SUA do not correlate strongly with the network's executed actions, their combined activations lead to two distinct activation states of the network, each of them corresponding to the movement of the cart in one of the two possible directions during the balancing act.
This suggests that single units do not contribute individually to the control task, but rather as part of a larger conglomerate of units that constitutes the two different activation states.

Figure 7b shows an ablation case in which the agent fails to balance the pole continuously after the initial swing-up and drops it after a short period of holding it in the upright position, reattempting the swing-up and the balancing act. The layer activations diverge slightly from the baseline right from the start of the swing-up and diverge completely after a short period of the stabilization phase. Consequently, due to this divergence, the layer activations of the damaged agent do not show the emergence of two distinct paths connected to the balancing act, as the agent never succeeds in stabilizing the pole by compensating its excess momentum after the initial swing-up.

Interestingly, the existence of two distinct activation states is not exclusive to the actor's first layer but is also apparent in its second layer. Figure 7c shows an ablation case in layer two, in which the failure of the agent is caused by a drop of the pole after the initial swing-up and a short period of balancing, causing the pole to rotate at high speed until the end of the episode. The blue points resemble a similar pattern of the second layer's activations compared to the first layer, including the divergence of the activations along two distinct paths, the attempt to merge these paths and the renewed separation. The failure of the agent, i.e. the continuous rotation of the pole at high speed, is visible in the activation space by the circularly arranged red points, from which the agent is not able to recover back onto the stabilization path and the two connected paths corresponding to the balancing act.
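A sketch of how such a per-case comparison could be produced, building on the activation matrices from above; fitting UMAP on the concatenation of both recordings per case is our assumption, consistent with the note that the baseline is embedded slightly differently in each panel:

```python
import numpy as np
import umap
import matplotlib.pyplot as plt

def compare_embeddings(M_base: np.ndarray, M_abl: np.ndarray):
    """Embed baseline and ablated layer activations (each of shape (T, N))
    jointly, so both trajectories share one 2-D space per ablation case;
    variable names are ours, not the authors'."""
    T = M_base.shape[0]
    emb = umap.UMAP().fit_transform(np.vstack([M_base, M_abl]))
    plt.scatter(emb[:T, 0], emb[:T, 1], s=4, c="blue", label="Baseline")
    plt.scatter(emb[T:, 0], emb[T:, 1], s=4, c="red", label="Ablated")
    plt.legend()
    plt.show()
```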
5 Conclusion

In this paper, we conducted an empirical study to understand how a DRL agent acts, based on characterizing the learned representations of its policy network. We shed some light on the role of single units for the control task and found that, despite the absence of a strong correlation between their activations and the actor's chosen actions throughout an episode, agents that solve their tasks successfully show task-specific patterns of weakly correlated SUA, which get distorted by network ablations leading to low returns. The importance of these patterns for a successful solution of the control task suggests that the careful interplay between single units with respect to the executed policy is essential, rather than their sole and isolated behavior. However, we have only scratched the surface of how such patterns of joint activations can be characterized. In our future work, we plan to systematically investigate the role of functional neuron populations and their involvement in solving a given control task. Specifically, we plan to investigate the activations of sub-populations of neurons, aiming to uncover whether there is a link between their activations and the emergent agent behavior.

We further investigated the temporal evolution of the actor's layer activations during an episode and showed that, in case of the CPSU task, the consecutive steps executed during the episode to solve the task are precisely represented by the policy network and mapped onto its layer activations. We further showed that this mapping is essential for solving the task, as its distortion as a result of network ablations leads to low returns and failed attempts to solve the task. The arrangement of the consecutive points in the embedded activation space revealed that the agent runs along specific paths in its activation space and that diverging from these paths is fatal for its task performance. The most striking observation regarding these paths is that the actor's layer activations can be very different for very similar states. We naively expected that the layer activations would converge to a single specific activation vector, just as the consecutive states processed by the network become more and more similar to each other as the pole is balanced. However, we found that this is not the case, suggesting that the learned representations may contain some information that is encoded in the temporal dimension along which the states are ordered, i.e. that the same state evokes a different activation of the network depending on when it is presented to the network. In our future work, we plan to investigate how these distinct activation patterns evolve during training, aiming to answer the question whether the different behaviors are learned hierarchically, i.e. in a specific order, or whether they emerge collectively.

Considering that our study was limited to a single agent solving three distinct control tasks, the universality of our results is strongly limited and their implications for other networks and tasks are not clear. We plan to address this issue by transferring our study design to a larger number of different networks and control tasks, aiming to establish a scientific standard for the falsifiability of empirical studies conducted in the field of artificial neural networks. Ultimately, we aim to pave the way towards a new perspective of neuroscience-inspired empirical studies on artificial neural networks, exploiting them as a test bed for neuroscientific research.
Uncovering parallels between the structure and organization of represented knowledge in artificial and biological systems opens up measures and possibilities for initial large-scale studies in artificial systems before transferring them to biological systems. Specifically, this addresses the issue of reproducibility, which, despite modern experimental methods, is one of the most critical issues in modern neuroscience, stemming from the large differences between brains and the commonly small sample sizes in neuroscientific studies.
References
1. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7587, pp. 484–489, 2016.
2. D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, et al., "Mastering chess and shogi by self-play with a general reinforcement learning algorithm," arXiv preprint arXiv:1712.01815, 2017.
3. OpenAI, "OpenAI Five," https://blog.openai.com/openai-five/, 2018.
4. B. Baker, I. Kanitscheider, T. Markov, Y. Wu, G. Powell, B. McGrew, and I. Mordatch, "Emergent tool use from multi-agent autocurricula," arXiv preprint arXiv:1909.07528, 2019.
5. M. Jaderberg, W. M. Czarnecki, I. Dunning, L. Marris, G. Lever, A. G. Castaneda, C. Beattie, N. C. Rabinowitz, A. S. Morcos, A. Ruderman, et al., "Human-level performance in 3D multiplayer games with population-based reinforcement learning," Science, vol. 364, no. 6443, pp. 859–865, 2019.
6. A. Irpan, "Deep reinforcement learning doesn't work yet," 2018.
7. I. Popov, N. Heess, T. Lillicrap, R. Hafner, G. Barth-Maron, M. Vecerik, T. Lampe, Y. Tassa, T. Erez, and M. Riedmiller, "Data-efficient deep reinforcement learning for dexterous manipulation," arXiv preprint arXiv:1704.03073, 2017.
8. D. H. Hubel and T. N. Wiesel, "Receptive fields of single neurones in the cat's striate cortex," The Journal of Physiology, vol. 148, no. 3, pp. 574–591, 1959.
9. N. Papernot, P. McDaniel, I. Goodfellow, S. Jha, Z. B. Celik, and A. Swami, "Practical black-box attacks against machine learning," in Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519, ACM, 2017.
10. R. C. Fong and A. Vedaldi, "Interpretable explanations of black boxes by meaningful perturbation," in Proceedings of the IEEE International Conference on Computer Vision, pp. 3429–3437, 2017.
11. K. Faust, Q. Xie, D. Han, K. Goyle, Z. Volynskaya, U. Djuric, and P. Diamandis, "Visualizing histopathologic deep learning classification and anomaly detection using nonlinear feature space dimensionality reduction," BMC Bioinformatics, vol. 19, no. 1, p. 173, 2018.
12. J. Su, D. V. Vargas, and K. Sakurai, "One pixel attack for fooling deep neural networks," IEEE Transactions on Evolutionary Computation, 2019.
13. R. Fong, M. Patrick, and A. Vedaldi, "Understanding deep networks via extremal perturbations and smooth masks," 2019.
14. A. Karpathy, J. Johnson, and L. Fei-Fei, "Visualizing and understanding recurrent networks," arXiv preprint arXiv:1506.02078, 2015.
15. A. Radford, R. Jozefowicz, and I. Sutskever, "Learning to generate reviews and discovering sentiment," arXiv preprint arXiv:1704.01444, 2017.
16. A. Bau, Y. Belinkov, H. Sajjad, N. Durrani, F. Dalvi, and J. Glass, "Identifying and controlling important neurons in neural machine translation," arXiv preprint arXiv:1811.01157, 2018.
17. A. Madsen, "Visualizing memorization in RNNs," Distill, 2019. https://distill.pub/2019/memorization-in-rnns.
18. L. van der Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.
19. L. McInnes, J. Healy, and J. Melville, "UMAP: Uniform manifold approximation and projection for dimension reduction," arXiv preprint arXiv:1802.03426, 2018.
20. M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu, "Towards better analysis of deep convolutional neural networks," IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 1, pp. 91–100, 2016.
21. P. E. Rauber, S. G. Fadel, A. X. Falcao, and A. C. Telea, "Visualizing the hidden activity of artificial neural networks," IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 1, pp. 101–110, 2016.
22. Z. Elloumi, L. Besacier, O. Galibert, and B. Lecouteux, "Analyzing learned representations of a deep ASR performance prediction model," arXiv preprint arXiv:1808.08573, 2018.
23. D. V., "ConvNet playground," https://convnetplayground.fastforwardlabs.com, 2019.
24. S. Carter, Z. Armstrong, L. Schubert, I. Johnson, and C. Olah, "Activation atlas," Distill, vol. 4, no. 3, p. e15, 2019.
25. F. Dalvi, A. Nortonsmith, A. Bau, Y. Belinkov, H. Sajjad, N. Durrani, and J. Glass, "NeuroX: A toolkit for analyzing individual neurons in neural networks," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 9851–9852, 2019.
26. P. Molchanov, S. Tyree, T. Karras, T. Aila, and J. Kautz, "Pruning convolutional neural networks for resource efficient inference," arXiv preprint arXiv:1611.06440, 2016.
27. H. Li, A. Kadav, I. Durdanovic, H. Samet, and H. P. Graf, "Pruning filters for efficient convnets," arXiv preprint arXiv:1608.08710, 2016.
28. N. Cheney, M. Schrimpf, and G. Kreiman, "On the robustness of convolutional neural networks to internal architecture and weight perturbations," arXiv preprint arXiv:1703.08245, 2017.
29. F. Dalvi, N. Durrani, H. Sajjad, Y. Belinkov, D. A. Bau, and J. Glass, "What is one grain of sand in the desert? Analyzing individual neurons in deep NLP models," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019.
30. R. Meyes, M. Lu, C. W. de Puiseau, and T. Meisen, "Ablation studies in artificial neural networks," 2019.
31. R. Meyes, M. Lu, C. W. de Puiseau, and T. Meisen, "Ablation studies to uncover structure of learned representations in artificial neural networks," in Int'l Conf. Artificial Intelligence, 2019.
32. A. S. Morcos, D. G. Barrett, N. C. Rabinowitz, and M. Botvinick, "On the importance of single directions for generalization," arXiv preprint arXiv:1803.06959, 2018.
33. A. Morcos, M. Raghu, and S. Bengio, "Insights on representational similarity in neural networks with canonical correlation," in Advances in Neural Information Processing Systems, pp. 5727–5736, 2018.
34. S. Kornblith, M. Norouzi, H. Lee, and G. Hinton, "Similarity of neural network representations revisited," arXiv preprint arXiv:1905.00414, 2019.
35. T. Zahavy, N. Ben-Zrihem, and S. Mannor, "Graying the black box: Understanding DQNs," in International Conference on Machine Learning, pp. 1899–1908, 2016.
36. G. Dulac-Arnold, D. Mankowitz, and T. Hester, "Challenges of real-world reinforcement learning," arXiv preprint arXiv:1904.12901, 2019.
37. OpenAI, "OpenAI Roboschool," 2017.
38. G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," 2016.
39. T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," arXiv preprint arXiv:1509.02971, 2015.
40. J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer normalization," arXiv preprint arXiv:1607.06450, 2016.
41. G. E. Uhlenbeck and L. S. Ornstein, "On the theory of the Brownian motion," Physical Review, vol. 36, no. 5, p. 823, 1930.
42. D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
43. A. Veit, M. J. Wilber, and S. Belongie, "Residual networks behave like ensembles of relatively shallow networks," in Advances in Neural Information Processing Systems, pp. 550–558, 2016.
44. C. Olah, A. Satyanarayan, I. Johnson, S. Carter, L. Schubert, K. Ye, and A. Mordvintsev, "The building blocks of interpretability," Distill, 2018. https://distill.pub/2018/building-blocks.
45. I. Rafegas, M. Vanrell, L. A. Alexandre, and G. Arias, "Understanding trained CNNs by indexing neuron selectivity."