Can Single Neurons Solve MNIST? The Computational Power of Biological Dendritic Trees
Ilenna Simone Jones and Konrad Kording
Department of Neuroscience, University of Pennsylvania
Departments of Neuroscience and Bioengineering, University of Pennsylvania
September 4, 2020
Abstract
Physiological experiments have highlighted how the dendrites of biological neurons can nonlinearly process distributed synaptic inputs. This is in stark contrast to units in artificial neural networks that are generally linear apart from an output nonlinearity. If dendritic trees can be nonlinear, biological neurons may have far more computational power than their artificial counterparts. Here we use a simple model where the dendrite is implemented as a sequence of thresholded linear units. We find that such dendrites can readily solve machine learning problems, such as MNIST or CIFAR-10, and that they benefit from having the same input onto several branches of the dendritic tree. This dendrite model is a special case of a sparse network. This work suggests that popular neuron models may severely underestimate the computational power enabled by the biological fact of nonlinear dendrites and multiple synapses per pair of neurons. The next generation of artificial neural networks may significantly benefit from these biologically inspired dendritic architectures.

Introduction
Though the role of biological neurons as the mediators of sensory integration and behavioral output is clear, the computations performed within neurons have been a point of investigation for decades (McCulloch and Pitts, 1943; Hodgkin and Huxley, 1952; FitzHugh, 1961; Poirazi et al., 2003a; Mel, 2016). For example, the McCulloch and Pitts (M&P) neuron model is based on an approximation that a neuron linearly sums its input and maps this through a nonlinear threshold function, allowing it to carry out a selection of logic-gate-like functions, which can be expanded to create logic-based circuits (McCulloch and Pitts, 1943). The M&P neuron also sets the foundation for modern-day neurons in artificial neural networks (ANNs), where each neuron in the network linearly sums its input and maps this through a nonlinear activation function (Goodfellow et al., 2016; Lecun et al., 2015). ANNs, often made up of millions of these neurons, are demonstrably powerful algorithms that can be trained to solve complex problems, from reinforcement learning to natural language processing to computer vision (Lecun et al., 2015; Krizhevsky et al., 2006; Mnih et al., 2015; Devlin et al., 2018; Huval et al., 2015). However, M&P neurons and neurons of ANNs are point-neuron models that rely on linear sums of their inputs, whereas the observed physiology of biological neurons shows that dendrites impose nonlinearities on their synaptic inputs before summation at the soma (London and Häusser, 2005; Poirazi et al., 2003b; Antic et al., 2010; Agmon-Snir et al., 1998). This indicates that M&P and ANN neurons may radically underestimate what individual neurons can do.

Although many models of single neuron activity use linear point neurons (Ujfalussy et al., 2015), it is known that dendritic nonlinearities are responsible for a variety of neuronal dynamics and can be used to mechanistically explain the roles of biological neurons in a variety of behaviorally significant circuits (London and Häusser, 2005; Agmon-Snir et al., 1998; Barlow and Levick, 1965). For example, passive properties of dendrites lead to attenuation of current along the dendrite, allowing for low-pass filtering of inputs (London and Häusser, 2005; Rall, 1959). Active properties of dendrites allow synaptic clustering to result in super-linear summation of voltage inputs upon reaching the soma (Antic et al., 2010; Schiller et al., 2000; Branco and Häusser, 2011). These properties allow for important functions such as auditory coincidence detection and even logical operations within dendrites (Mel, 2016; London and Häusser, 2005; Agmon-Snir et al., 1998; Koch et al., 1983). To fully explore the scope of biological neuron function, it is then important to model more sophisticated computations within dendritic trees.

Models for individual neurons with meaningful dendrites have been proposed to better understand neuron computation (Mel, 2016; Gerstner and Naud, 2009). Biologically detailed approaches, such as multi-compartmental biophysical models (Hines and Carnevale, 1997), have been fitted to empirical data in order to study dendritic dynamics such as backpropagating action potentials and nonlinear calcium spikes (London and Häusser, 2005; Hay et al., 2011; Wilson et al., 2016). Poirazi et al. (2003a) pioneered a more abstracted approach to modelling single neurons that isolates the impact of including dendritic sigmoidal nonlinearities on predicting the neural firing rates produced by dendrite-complete biophysical models.
This novel approach used a sparsely connected two-layer ANN whose structure is analogous to that of a dendritic tree, showing it is possible to model individual neurons with ANNs.
While the morphology of a dendritic tree is key to modelling its computational capabilities (Mel, 2016; London and Häusser, 2005; Mel, 1993; Segev, 2006; Wilson et al., 2016), it may also be important to consider the role of repeated synaptic inputs to the same postsynaptic neuron. Complex computation in ANNs depends on dense connectivity, which repeats inputs to each node in each layer (Lecun et al., 2015). Empirically, electron microscopy studies have shown that a presynaptic axon synapses approximately 4 times per postsynaptic neuron (Kincaid et al., 1998). These studies also show evidence of a certain kind of repeated synapse called multi-synaptic boutons (MSBs) (Jones et al., 1997). MSBs have been shown to occur 11.5% of the time in rats living in enriched environments (Jones et al., 1997). Additionally, it has been shown that an in vitro long-term potentiation (LTP) induction protocol can increase the number of MSBs on the same dendrite 6-fold (Jones et al., 1997). LTP, involved in learning and memory (Bliss and Lomo, 1973; Stuchlik, 2014), can then lead to the replication of synapses between two neurons. This suggests that repeated synapses may be important for changing the computations a single neuron can do.

Contribution
By training and testing ANNs on complex tasks, the field of machine learning gains computational clarity (Goodfellow et al., 2016; Lecun et al., 2015). At the moment the field of neuroscience does not have this kind of in-depth computational clarity with individual, dendrite-complete neurons, despite the fact that we can describe the different behaviorally significant functions individual neurons are able to fulfill (London and Häusser, 2005; Agmon-Snir et al., 1998; Barlow and Levick, 1965; Gidon et al., 2020). If we consider a neuron as an input/output device with a binary tree as its dendritic tree, we may be able to test its ability to learn to perform complex tasks and gain insight into how dendritic trees impact the computation of a defined task.

Here we design a trainable, dendrite-complete neuron model in order to test its performance on binary classification tasks taken from the field of machine learning. The model comprises a sparse ANN: a binary tree in which each nonlinear unit receives only 2 inputs. The nonlinearities and structural constraints of this ANN can be compared to a linear point-neuron model, allowing us to test the impact of nonlinearities in a dendrite-like tree. The model also allows us to test the impact of repeated inputs on task performance. We found that our binary tree model, representing a single biological neuron, performs better than a comparable linear classifier. Furthermore, when repeated inputs are incorporated into our model, it approximately matches the performance of a comparable 2-layer fully connected ANN. These results demonstrate that complex tasks, for which it has been assumed that an ensemble of multiple relatively simple neuron models is required, can in fact be computed by a single, dendrite-complete neuron model.
Results

One of the classical questions in neuroscience is how dendrite structure and the various synaptic inputs to the dendritic tree affect computation (London and Häusser, 2005; Mel, 2016; Rall, 1959). Traditional neuron models are designed to best match observed neural dynamics (Poirazi et al., 2003a; Gerstner and Naud, 2009; Brette et al., 2011; Gouwens et al., 2018; Hay et al., 2011; Ahrens et al., 2006); however, with exceptions (Poirazi et al., 2003a; Ujfalussy et al., 2015; Gidon et al., 2020; Zador et al., 1992; Zador and Pearlmutter, 1996; Legenstein and Maass, 2011), the impacts of nonlinearities and, especially, of repeated inputs on the computational capabilities of neurons have yet to be quantified in the way we suggest. The computational abilities of ANNs can be judged by their performance on various complex tasks (Goodfellow et al., 2016; Lecun et al., 2015). Following this lead, we imposed dendritic binary tree structural constraints (Figure 1) on a trainable nonlinear ANN, resulting in a special case of sparsely connected ANN. We call this a 1-tree because it is similar to the structure of a single soma-connected subtree of a dendritic tree (Figure 1). By repeating this subtree structure multiple times and feeding each the exact same input, we create what we call a k-tree, where k is the number of repeated trees connected to a soma node. By using a trainable k-tree that has a biological structure constraint and repeated inputs, we can quantitatively judge the computational performance of this neuron model on complex tasks.

Figure 1: Novel ANN neuron model with repeated inputs. Left: Traced morphology of the dendrite of a human BA18 (occipital) cell (Travis et al., 2005). Soma location is marked in pink. Middle: A representation of a hypothetical neuron. Inputs in dark blue at the terminal ends of one subtree are repeated in light blue in 3 other subtrees. Upper right: Representation of a M&P-like linear point neuron model. Middle right: k-tree neuron model, where k = number of subtrees. Each input and hidden node has a leaky ReLU activation function, and the output node has a sigmoid activation function. Bottom right: Representation of a 2-layer fully connected neural network (FCNN). Each input and hidden node has a leaky ReLU activation function and the output node has a sigmoid activation function.

Neurons, arguably, produce binary outputs (the presence or absence of an action potential) (Hodgkin and Huxley, 1952). Therefore, to fairly judge an individual neuron model's performance on a complex task, we will use a binary classification task.
The complexity in the tasks comes from high-dimensional vector inputs: images taken from classic computer vision datasets used in the field of machine learning (Figure 2).

As controls for performance comparison, we used a linear discriminant analysis (LDA) linear classifier to approximate the performance of a linear point neuron model, and a fully connected neural network (FCNN) that is comparable in size to the k-tree. The linear classifier is relatively simple compared to the more parameter-complex k-tree and FCNN, and we expect it to be able to learn fewer functions (Dreiseitl and Ohno-Machado, 2002); therefore, its performance sets an expected lower bound. The FCNN is densely connected and consists of 2 layers. With its nonlinearities, we expect it to learn to express a greater variety of functions; therefore, its performance sets an expected upper bound. To compare the two ANNs, let n be the number of pixel inputs to each classifier, determining the number of parameters, P, needed in each network, and let h be the number of nodes in the hidden layer of the FCNN. Based on the constraints of each network, the FCNN then has P = h(n + 1) and the k-tree has P = k(2n − 1). To match the number of parameters of the FCNN to that of the k-tree, we set h = 2k (Table 1).

              256 inputs           1024 inputs            3072 inputs
  k       k-tree      FCNN      k-tree      FCNN       k-tree      FCNN
  1          511       514       2,047      2,050        6,143      6,146
  2        1,022     1,028       4,094      4,100       12,286     12,292
  4        2,044     2,056       8,188      8,200       24,572     24,584
  8        4,088     4,112      16,376     16,400       49,144     49,168
 16        8,176     8,224      32,752     32,800       98,288     98,336
 32       16,352    16,448      65,504     65,600      196,576    196,672

Table 1: ANN Parameter Size Comparison. Fully connected neural network (FCNN) architectures are matched in parameter size to the k-tree architectures.
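The counts in Table 1 follow directly from these two relations. As a minimal sketch (assuming only P = k(2n − 1) for the k-tree and P = h(n + 1) with h = 2k for the FCNN, as stated above), the table can be reproduced with:

```python
# Parameter counts for the k-tree and the size-matched 2-layer FCNN.

def ktree_params(n: int, k: int) -> int:
    # Each binary subtree over n inputs has 2n - 2 weights, plus one weight
    # connecting the subtree root to the soma node: k * (2n - 1) in total.
    return k * (2 * n - 1)

def fcnn_params(n: int, k: int) -> int:
    # Hidden layer of h = 2k units, each with n input weights and a bias.
    h = 2 * k
    return h * (n + 1)

if __name__ == "__main__":
    for n in (256, 1024, 3072):
        for k in (1, 2, 4, 8, 16, 32):
            print(f"n={n:4d} k={k:2d}  "
                  f"k-tree={ktree_params(n, k):7d}  FCNN={fcnn_params(n, k):7d}")
```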
Figure 2: Classification task datasets. We considered seven machine learning datasets of varying content and size, each with ten classes. For each dataset, two of the classes were chosen by selecting the least linearly separable pair using a linear discriminant analysis (LDA) linear classifier. Each image was vectorized in order to be compatibly presented to each model.

Classical models of neurons have been linear point neurons that do not take dendritic nonlinearities into consideration (McCulloch and Pitts, 1943; Hodgkin and Huxley, 1952; FitzHugh, 1961). By considering dendritic nonlinearity and structure, we designed a new neuron model: a nonlinear ANN with the structural constraints of a dendritic tree, called a 1-tree. We then compared the performance of this new model against a proxy for a point neuron, an LDA linear classifier. Focusing on one simple image classification task, a binary dataset of handwritten digits (MNIST), we compared the computational performance of the linear classifier and the nonlinear, structurally dendritic 1-tree. Significantly, the performance of the 1-tree is greater than that of the linear classifier (Table 2).

Table 2: k-tree mean performance compared to FCNN and LDA. Performance accuracy is listed as mean ± standard error over 10 trials for each of the seven datasets; p-values are calculated using Student's t-test. LDA and FCNN are used as the lower and upper bounds to which the k-tree is compared.

We compared the performance of the k-tree to that of a lower bound, LDA, and an upper bound, FCNN, doubling k 5 times, resulting in tests of k = 1, 2, 4, 8, 16, 32. In all cases, as k (the number of repeated dendritic subtrees) increases, so does the performance accuracy of the k-tree, approaching the upper bound. The lower FCNN performance for CIFAR10 and SVHN (Figure 3E-F, Table 2) may be due to the FCNN's failure to train in some trials, resulting in performances close to 50%. For most tasks we tried, the FCNN performed much better than the 1-tree.

The computational impact of repeated inputs to a dendritic tree is not clear; however, studies have shown increased repetition of inputs as a result of plasticity events (Toni et al., 1999), which has implications for learning and memory. By modifying the 1-tree, repeating the tree structure and its input k times, we obtain a k-tree neuron model (Figure 1). This can serve as a proxy for how repeated inputs might impact computational performance on various binary image classification tasks. Returning to the MNIST dataset, we tested k = 1, 2, 4, 8, 16, 32 and observed how increasing k gradually improves performance. For example, the 32-tree reaches a mean accuracy of 0.9635 on the MNIST binary classification task, compared with 0.9220 for the 1-tree (Table 2). As k increases, the k-tree neuron model improves its performance on the MNIST binary classification task, nearly meeting the performance of a comparable FCNN.

In order to see if this result generalizes, we tested the k-tree on 6 additional binary image classification datasets. All tasks see an increase in performance as the number of subtrees in the k-tree increases up to k = 32 (Figure 4B-G). The 32-tree meets the performance of the FCNN in the FMNIST, EMNIST, and USPS tasks (Table 2). As k increases, the k-tree neuron model improves its computational performance in all tasks such that it approaches the performance of a comparable FCNN.

Methods

Datasets

Knowing that the output of a neuron is binary (the presence or absence of an action potential), we chose to train our neuron model on a binary classification task. Using standard, high-dimensional computer vision datasets, we used a linear discriminant analysis (LDA) linear classifier to determine which 2 classes within each dataset were least linearly separable, by training the LDA linear classifier and testing it on pairs of classes (Figure 2). We used the MNIST (Lecun et al., 1998), Fashion-MNIST (Xiao et al., 2017), EMNIST (Cohen et al., 2017), Kuzushiji-MNIST (Clanuwat et al., 2018), CIFAR-10 (Krizhevsky, 2009), Street View House Numbers (SVHN) (Goodfellow et al., 2014), and USPS (Hastie et al., 2001) datasets.
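As an illustration, a minimal sketch of this pair-selection step is shown below. It uses scikit-learn's LinearDiscriminantAnalysis as one possible LDA implementation; the function and variable names are ours, not the authors' code.

```python
from itertools import combinations

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def least_separable_pair(X_train, y_train, X_test, y_test, n_classes=10):
    """Return the pair of class labels on which a trained LDA classifier
    scores worst on held-out data, i.e. the least linearly separable pair."""
    worst_pair, worst_acc = None, 1.0
    for a, b in combinations(range(n_classes), 2):
        train_mask = np.isin(y_train, (a, b))
        test_mask = np.isin(y_test, (a, b))
        clf = LinearDiscriminantAnalysis()
        clf.fit(X_train[train_mask], y_train[train_mask])
        acc = clf.score(X_test[test_mask], y_test[test_mask])
        if acc < worst_acc:
            worst_pair, worst_acc = (a, b), acc
    return worst_pair, worst_acc
```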
Controls

The controls we use are the LDA linear classifier and a fully connected neural network (FCNN). The linear classifier sets a baseline performance for the linear separability of the two classes per dataset, in addition to acting as a proxy for a linear point neuron model. The 2-layer FCNN is a comparable reference to see if k-tree performance meets or exceeds that of a densely connected network. The number of nodes in the hidden layer of the FCNN is twice the number of trees (k) in the k-tree it is compared to, and its output layer has 1 node.

Preprocessing

We used datasets from the torchvision (version 0.5.0) Python package. We padded the 28 by 28 resolution images with zeros so that they were 32 x 32, and flattened the images to 1-D vectors. We then split the shuffled training set into training and validation sets (for MNIST, the ratio was 1:5, so as to let the validation set size match the test set size). Then we split the resulting shuffled training and validation sets into 10 independent subsets; each subset was used for a different cross-validation trial.
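A minimal sketch of this preprocessing pipeline is shown below for MNIST. The padding and flattening use standard torchvision/PyTorch transforms; the split sizes follow the description above, and the exact splitting code here is illustrative rather than the authors' own.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Pad 28x28 images with zeros to 32x32, convert to tensors, and flatten to 1-D vectors.
transform = transforms.Compose([
    transforms.Pad(2),                               # 28x28 -> 32x32
    transforms.ToTensor(),
    transforms.Lambda(lambda x: torch.flatten(x)),   # 32x32 -> 1024-dim vector
])

train_full = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
test_set = datasets.MNIST(root="./data", train=False, download=True, transform=transform)

# Split the shuffled training set so the validation set matches the test set size
# (10,000 images for MNIST, a 1:5 validation-to-training ratio).
val_size = len(test_set)
train_set, val_set = random_split(train_full, [len(train_full) - val_size, val_size])

# Split the remaining training data into 10 independent subsets, one per
# cross-validation trial (50,000 MNIST training images -> 10 subsets of 5,000).
subsets = random_split(train_set, [len(train_set) // 10] * 10)
```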
Model architecture

Using PyTorch (version 1.4.0), we designed the k-tree model architecture as a feed-forward neural network with sparse binary-tree connections. The weight matrices of each layer, stored as dense tensors, were sparsified such that each node receives 2 inputs and produces 1 output. For example, the 1024-pixel images were fed to a 1-tree with 10 layers: the input layer is 1024 by 512, the 2nd layer 512 by 256, and so on, until the penultimate layer is reached with dimensions 2 by 1. The final layer is k by 1, where k is the number of subtrees in the k-tree; in this case it would be 1 by 1. In the special case of the 3072-pixel images, inputs were fed into a 1-tree with 11 layers: the input layer is 3072 by 1024, the 2nd layer is 1024 by 512, and so on. To account for the sparsification, we altered the initialization of the weight matrices: we used standard "Kaiming normal" initialization with a gain of 1/density applied to the sparsified dense-tensor weight matrices. We also created a "freeze mask" that recorded which weights were set to 0, in order to freeze those weights during training. For the forward step, we used a leaky ReLU with a slope of 0.01 for the nodes between layers, and a sigmoid nonlinearity at the final output node, which kept output values between 0 and 1.

Training

The model, inputs, and labels were loaded onto an Nvidia GeForce 1080 GPU using CUDA version 10.1. The batch size was 256. Early stopping was used such that training stops after 60 epochs in which no decrease in the loss is observed. Loss was calculated using binary cross-entropy loss. We used an Adam optimizer with a learning rate of 0.001. Within the training loop, immediately after the backward step and before updating the weights using the gradients, we zeroed out the gradients indicated by the freeze mask so as to keep the model sparsely connected. Each train-test loop was run for 10 trials, with a different training subset each trial and the same test set every trial. Trial averages and standard deviations were then calculated, and p-values were calculated using Student's t-test.
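The following is a minimal, self-contained sketch of the sparsification and gradient-freezing scheme described above: binary-tree masks applied to dense linear layers, with the same masks used to zero the gradients of pruned weights after each backward pass. It is an illustration under those stated assumptions, not the authors' released implementation (see the repository linked below); class and function names are ours, and the 1/density rescaling of the initialization is omitted.

```python
import torch
import torch.nn as nn

def binary_tree_mask(in_dim, out_dim):
    # Connectivity mask for one tree layer: output node j receives inputs 2j and 2j+1.
    mask = torch.zeros(out_dim, in_dim)
    for j in range(out_dim):
        mask[j, 2 * j] = 1.0
        mask[j, 2 * j + 1] = 1.0
    return mask

class KTree(nn.Module):
    def __init__(self, n_inputs=1024, k=1):
        super().__init__()
        self.k = k
        self.layers = nn.ModuleList()
        self.masks = []
        d = n_inputs
        while d > 1:
            # k parallel subtrees packed into one dense layer; a block-diagonal
            # mask keeps the subtrees disconnected from one another.
            layer = nn.Linear(d * k, (d // 2) * k)
            mask = torch.block_diag(*([binary_tree_mask(d, d // 2)] * k))
            with torch.no_grad():
                layer.weight *= mask  # sparsify the dense weight tensor
            self.layers.append(layer)
            self.masks.append(mask)
            d //= 2
        self.soma = nn.Linear(k, 1)            # final k-by-1 "soma" layer
        self.act = nn.LeakyReLU(0.01)

    def forward(self, x):
        x = x.repeat(1, self.k)                # present the same input to all k subtrees
        for layer in self.layers:
            x = self.act(layer(x))
        return torch.sigmoid(self.soma(x))     # output between 0 and 1

    def freeze_masked_grads(self):
        # "Freeze mask": zero the gradients of masked-out weights so that
        # pruned connections stay at zero during training.
        for layer, mask in zip(self.layers, self.masks):
            if layer.weight.grad is not None:
                layer.weight.grad *= mask

# One training step mirroring the described setup (Adam, lr 0.001, BCE loss).
model = KTree(n_inputs=1024, k=2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.BCELoss()
x = torch.rand(256, 1024)                      # placeholder batch of flattened images
y = torch.randint(0, 2, (256, 1)).float()      # placeholder binary labels
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
model.freeze_masked_grads()                    # applied after backward, before the update
optimizer.step()
```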
Discussion

Here we quantify the potential computational capabilities of an abstracted neuron model with dendritic features and repeated inputs. We designed a trainable neuron model: a sparse ANN with binary dendritic tree constraints made up of nonlinear nodes (Figure 1). The tree that resulted from this constraint was repeated k times with identical inputs in order to explore the impacts of repeated inputs. We judged the model by determining its performance on 7 high-dimensional binary image classification tasks (Figure 2), and compared its performance to a linear classifier, a lower bound, and a comparable FCNN, an upper bound. The 1-tree, with its nonlinear nodes and dendritic structure constraint, performed better than the linear classifier in almost all tasks (Figure 3). When we increased k of the k-tree from k = 1 to k = 32, we saw a consistent increase in k-tree performance across all tasks (Figure 4). In the case of the MNIST task, the performance of the 32-tree was close to the comparable FCNN performance. Surprisingly, the performance of the 32-tree in the FMNIST, EMNIST, and USPS tasks met that of the comparable FCNN. These findings emphasize the importance for modelers to consider both dendrites and synaptic input repetitions.

A limitation of this study is the relevance of our computational tasks. Although it is hard to know exactly what kind of input a neuron receives from its presynaptic connections, we do not believe the 1-dimensional vectorized input we provide our neuron model is biologically plausible. Randomly ordering the pixel input to these models overall decreases k-tree performance, implying that the order of the input impacts performance (see Figure S1 in the Supplementary Material). Further investigation may be needed to explore how the ordering of the 1-D pixel input might impact performance.

The binary tree structure to which we constrained an ANN to make the k-tree entails several assumptions. Each node of the tree is analogous to a compartment in a dendritic tree, and in biological dendritic trees each compartment receives an exclusive set of inputs. Therefore, we chose not to use convolution or any kind of weight sharing in our model. In addition, the synaptic weights and inter-node weights are real-valued free parameters; however, the weights analogous to inter-compartmental axial resistances (Rall, 1959; Huys et al., 2006) could only take positive values if they were to be biologically plausible. Future work could address this by constraining these free parameters to be positive.

In this study we used an abstracted model to give us insight into the impacts of biological constraints and properties. After all, these kinds of optimizations are not currently feasible in more realistic models of neurons. Using this model, we see how nonlinear dendrites increase a neuron model's task performance above that of a linear classifier, which serves as a proxy for models following the point-neuron assumption. Importantly, we see how, by repeating the inputs to this dendrite model, we can observe a consistent increase in task performance. These findings emphasize the importance for modelers to consider both dendrites and synaptic input repetitions.

Our results may also be relevant for the field of deep learning. The k-trees we consider are special cases of sparse ANNs, wherein there are only 2 inputs to all nodes after the first layer. These contrast with randomly-made sparse networks or pruned sparse networks (Frankle and Carbin, 2019), because they have very severe constraints.
It is then surprising that a k-tree could perform at the level of a comparable FCNN. We would be interested in future work comparing the performance of binary tree structures, inspired by biological dendrites, against the performance of less structured sparse ANNs with comparable edge density.

This study tests the classification performance of a dendrite-complete neuron model and compares it to a model that follows the point-neuron assumption, highlighting the importance of considering branching dendrite structure and nonlinearities when modeling neurons. We expand this test to consider the possibility of repeated synaptic inputs in our model, showing that the model consistently performs better with more repeated inputs to additional subtrees. We also see that the sparse network neuron model we designed can reach similar performance to a comparable densely connected network. Fundamentally, this study is a foray into directly considering a neuron's computational capability by training the model to perform complex tasks using deep learning methodology, which promises to further our insights into single neuron computation.

Acknowledgements

I would like to thank the members of the Kording Lab, specifically Roozbeh Farhoodi, Ari Benjamin, and David Rolnick, for their help over the development of this project.
Code Availability

The code for this project can be found at the following GitHub repository: https://github.com/ilennaj/ktree

References
H. Agmon-Snir, C. E. Carr, and J. Rinzel. The role of dendrites in auditory coincidence detection. Nature, 393:268–272, 1998.

M. B. Ahrens, Q. J. M. Huys, and L. Paninski. Large-scale biophysical parameter estimation in single neurons via constrained linear regression. Advances in Neural Information Processing Systems, 2006.

S. D. Antic, W. L. Zhou, A. R. Moore, S. M. Short, and K. D. Ikonomu. The decade of the dendritic NMDA spike. Journal of Neuroscience Research, 88(14):2991–3001, 2010.

H. B. Barlow and W. R. Levick. The mechanism of directionally selective units in rabbit's retina. The Journal of Physiology, 178(3):477–504, 1965.

T. V. P. Bliss and T. Lomo. Long-lasting potentiation of synaptic transmission in the dentate area of the unanaesthetized rabbit following stimulation of the perforant path. The Journal of Physiology, 232(2):357–374, 1973.

T. Branco and M. Häusser. Synaptic integration gradients in single cortical pyramidal cell dendrites. Neuron, 69(5):885–892, 2011.

R. Brette, B. Fontaine, A. K. Magnusson, C. Rossant, J. Platkiewicz, and D. F. M. Goodman. Fitting neuron models to spike trains. Frontiers in Neuroscience, 5:1–8, 2011.

T. Clanuwat, M. Bober-Irizar, A. Kitamoto, A. Lamb, K. Yamamoto, and D. Ha. Deep learning for classical Japanese literature. Advances in Neural Information Processing Systems, pages 1–8, 2018. URL http://arxiv.org/abs/1812.01718.

G. Cohen, S. Afshar, J. Tapson, and A. van Schaik. EMNIST: Extending MNIST to handwritten letters. Proceedings of the International Joint Conference on Neural Networks, pages 2921–2926, 2017.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint, 2018. URL http://arxiv.org/abs/1810.04805.

S. Dreiseitl and L. Ohno-Machado. Logistic regression and artificial neural network classification models: A methodology review. Journal of Biomedical Informatics, 35(5-6):352–359, 2002.

R. FitzHugh. Impulses and physiological states in theoretical models of nerve membrane. Biophysical Journal, 1(6):445–466, 1961.

J. Frankle and M. Carbin. The lottery ticket hypothesis: Finding sparse, trainable neural networks. pages 1–42, 2019.

W. Gerstner and R. Naud. How good are neuron models? Science, 326(5951):379–380, 2009.

A. Gidon, T. A. Zolnik, P. Fidzinski, F. Bolduan, A. Papoutsi, P. Poirazi, M. Holtkamp, I. Vida, and M. E. Larkum. Dendritic action potentials and computation in human layer 2/3 cortical neurons. Science, 367(6473):83–87, 2020.

I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.

I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet. Multi-digit number recognition from street view imagery using deep convolutional neural networks. pages 1–13, 2014.

N. W. Gouwens, J. Berg, D. Feng, S. A. Sorensen, H. Zeng, M. J. Hawrylycz, C. Koch, and A. Arkhipov. Systematic generation of biophysically detailed models for diverse cortical neuron types. Nature Communications, 9(1), 2018.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.

E. Hay, S. Hill, F. Schürmann, H. Markram, and I. Segev. Models of neocortical layer 5b pyramidal cells capturing a wide range of dendritic and perisomatic active properties. PLoS Computational Biology, 7(7), 2011.

M. L. Hines and N. T. Carnevale. The NEURON simulation environment. Neural Computation, 9(6):1179–1209, 1997.

A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117:500–544, 1952.

B. Huval, T. Wang, S. Tandon, J. Kiske, W. Song, J. Pazhayampallil, M. Andriluka, P. Rajpurkar, T. Migimatsu, R. Cheng-Yue, F. Mujica, A. Coates, and A. Y. Ng. An empirical evaluation of deep learning on highway driving. arXiv preprint, pages 1–7, 2015. URL http://arxiv.org/abs/1504.01716.

Q. J. M. Huys, M. B. Ahrens, and L. Paninski. Efficient estimation of detailed single-neuron models. Journal of Neurophysiology, 96(2):872–890, 2006.

T. A. Jones, A. Y. Klintsova, V. L. Kilman, A. M. Sirevaag, and W. T. Greenough. Induction of multiple synapses by experience in the visual cortex of adult rats. Neurobiology of Learning and Memory, 68(1):13–20, 1997.

A. E. Kincaid, T. Zheng, and C. J. Wilson. Connectivity and convergence of single corticostriatal axons. Journal of Neuroscience, 18(12):4722–4731, 1998.

C. Koch, T. Poggio, and V. Torre. Nonlinear interactions in a dendritic tree: Localization, timing, and role in information processing. Proceedings of the National Academy of Sciences of the United States of America, 80:2799–2802, 1983.

A. Krizhevsky. Learning multiple layers of features from tiny images. 2009.

A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 2006.

Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.

Y. Lecun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

R. Legenstein and W. Maass. Branch-specific plasticity enables self-organization of nonlinear computation in single neurons. Journal of Neuroscience, 31(30):10787–10802, 2011.

M. London and M. Häusser. Dendritic computation. Annual Review of Neuroscience, 28(1):503–532, 2005.

W. S. McCulloch and W. Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4):115–133, 1943.

B. Mel. Toward a simplified model of an active dendritic tree. In G. J. Stuart, N. Spruston, and M. Häusser, editors, Dendrites. Oxford Scholarship Online, 2016.

B. W. Mel. Synaptic integration in an excitable dendritic tree. Journal of Neurophysiology, 70(3):1086–1101, 1993.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.

P. Poirazi, T. Brannon, and B. W. Mel. Pyramidal neuron as two-layer neural network. Neuron, 37(6):989–999, 2003a.

P. Poirazi, T. Brannon, and B. W. Mel. Arithmetic of subthreshold synaptic summation in a model CA1 pyramidal cell. Neuron, 37(6):977–987, 2003b.

W. Rall. Physiological properties of dendrites. Annals of the New York Academy of Sciences, 96(4):1071–1092, 1959.

J. Schiller, G. Major, H. J. Koester, and Y. Schiller. NMDA spikes in basal dendrites. Nature, 404:285–289, 2000.

I. Segev. What do dendrites and their synapses tell the neuron? Journal of Neurophysiology, 95(3):1295–1297, 2006.

A. Stuchlik. Dynamic learning and memory, synaptic plasticity and neurogenesis: An update. Frontiers in Behavioral Neuroscience, 8:1–6, 2014.

N. Toni, P. Buchs, I. Nikonenko, C. R. Bron, and D. Muller. LTP promotes formation of multiple spine synapses between a single axon terminal and a dendrite. Nature, 402:421–425, 1999.

K. Travis, K. Ford, and B. Jacobs. Regional dendritic variation in neonatal human cortex: a quantitative Golgi study. Developmental Neuroscience, 27(5):277–287, 2005.

B. B. Ujfalussy, J. K. Makara, T. Branco, and M. Lengyel. Dendritic nonlinearities are tuned for efficient spike-based computations in cortical circuits. eLife, 4:1–51, 2015.

D. E. Wilson, D. E. Whitney, B. Scholl, and D. Fitzpatrick. Orientation selectivity and the functional clustering of synaptic inputs in primary visual cortex. Nature Neuroscience, 19(8):1003–1009, 2016.

H. Xiao, K. Rasul, and R. Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint, pages 1–6, 2017. URL http://arxiv.org/abs/1708.07747.

A. M. Zador and B. A. Pearlmutter. VC dimension of an integrate-and-fire neuron model. Proceedings of the Ninth Annual Conference on Computational Learning Theory, pages 10–18, 1996.

A. M. Zador, B. J. Claiborne, and T. H. Brown. Nonlinear pattern separation in single hippocampal neurons with active dendritic membrane. Advances in Neural Information Processing Systems, pages 51–58, 1992.

Supplementary Material
Figure S1: Permuted and randomized input trials.