The Yin-Yang dataset
L. Kriener, J. Göltz, M. A. Petrovici. Department of Physiology, University of Bern, 3012 Bern, Switzerland. Kirchhoff-Institute for Physics, Heidelberg University, 69120 Heidelberg, Germany.
Abstract
The Yin-Yang dataset was developed for research on biologically plausible error backpropagation and deep learning in spiking neural networks. It serves as an alternative to classic deep learning datasets, especially in algorithm- and model-prototyping scenarios, by providing several advantages. First, it is smaller and therefore faster to learn, making it better suited for deployment on neuromorphic chips with limited network sizes. Second, it exhibits a very clear gap between the accuracies achievable using shallow as compared to deep neural networks.
We introduce the Yin-Yang dataset for learning in hierarchical networks [1]. It is tailored to the requirements of research on biologically plausible error backpropagation algorithms, learning in spiking neural networks, and deep networks on neuromorphic hardware. These fields typically require small but at the same time not trivially solvable datasets to prototype and test network architectures and learning algorithms. The datasets commonly used for this purpose are the MNIST and Fashion-MNIST datasets [2, 3]. However, these require comparatively large networks, considering the size of the visible layer alone. Despite this ostensible difficulty, they can nevertheless be classified with high accuracy even by shallow networks. This is problematic because training a deep network with an imperfect learning algorithm can result in performance indistinguishable from that of a shallow network. A test on the MNIST dataset can therefore fail to reveal the inability of a training algorithm to propagate error signals through the network, as the achieved high accuracies obscure the underlying problem. The Yin-Yang dataset can provide an alternative for these testing and prototyping scenarios, as it is solvable by smaller networks, contains fewer samples and, most importantly, exhibits a large gap between the accuracies reached by shallow and deep networks.

Figure 1: Training, validation and test dataset.
Each dot in the yin-yang symbol represents one sample of the dataset. The color of the dot denotes its class ("Yin", "Yang" or "Dot"). This figure was generated using the default settings for random seeds and dataset sizes (5000 samples for the training set and 1000 samples each for the validation and test set).
Each sample in the dataset represents a point in a two-dimensional representation of the yin-yang symbol. Depending on their location in the symbol, the samples are classified into the "Yin", "Yang" or "Dot" class (Fig. 1). Even though the areas in the yin-yang symbol covered by the different classes have different sizes, the dataset is designed to be balanced, which means that all classes are represented by approximately the same number of samples. Note that the density of samples is therefore higher in the "Dot"-class regions, as the combined area of these regions is smaller than that of the others.

The samples are randomly generated using rejection sampling. The exact composition of a generated set of samples is therefore determined by the random seed and the dataset size. This makes it possible to produce multiple dataset versions by providing different random seeds and dataset sizes. In the default configuration, the training set has 5000 samples, while the validation and test sets have 1000 samples each, generated with different random seeds.

As can be seen in Fig. 1, the values of all samples in the dataset are strictly positive. This accommodates network models that require positive input values only (common in the field of biologically plausible networks, as firing rates as well as spike times are typically denoted by positive numbers). Because of that, the yin-yang symbol is not centered around zero. This, however, complicates training in neuron models without an intrinsic (learnable) bias. To facilitate training for these models, each sample in the dataset consists not only of the coordinates (x, y) determining the position in the yin-yang symbol, but additionally also of the values (1 − x, 1 − y). This effectively symmetrizes the input and removes the need for a bias, even though the yin-yang symbol is not centered around the origin of the coordinate system.

As a baseline for further applications of this dataset, we also provide some training results achieved with classical artificial neural networks. The full set of training parameters can be found in Table 2. In particular, we compare the performance of a deep and a shallow network. This comparison (Table 1 and Fig. 2) illustrates a manifest advantage of the Yin-Yang dataset compared to other commonly used datasets of comparable size: the shallow network is clearly unable to learn the features required to successfully classify the dataset, which leads to a gap of more than 30 % between the accuracies achieved by a shallow and a deep network. This can be crucial in, for example, research on biologically plausible forms of error backpropagation, possibly even in spiking neural networks, as it requires the underlying learning algorithm to be able to harness the additional representational power of latent neuronal layers.

Figure 2: Comparison of training results for a deep versus a shallow network. Network parameters are given in Table 2. A: Evolution of the validation and training error during training of a deep neural network. The dashed line indicates the test error after 300 epochs of training. B: Training result of the deep network illustrated on the test set. Each sample is colored corresponding to the class determined by the trained network. C: Evolution of the validation and training error during training of a shallow neural network. D: Training result of the shallow network illustrated on the test set.

Another advantage of the Yin-Yang dataset over many other commonly used datasets is the dimensionality of its samples and the network sizes required to learn the task. Each sample consists of only four input values (compared to, e.g., the 784 input channels required by MNIST), which significantly reduces the required fan-in for hidden neurons. This can be especially beneficial on neuromorphic platforms, where the number of synaptic connections to a neuron is very often limited by the chip architecture. Also, this dataset can be learned with a single hidden layer of reasonably small size (Fig. 3). For consistently high final accuracies, a hidden layer of more than a hundred neurons is required, but for a small proof-of-concept demonstration of a learning algorithm or hardware prototype, even 30 hidden units are enough to achieve results that would be impossible with shallow networks, or with algorithms that cannot profit from a network's representational hierarchy.

In addition to the results shown here, the dataset has already been used to showcase algorithms for error backpropagation in spiking neural networks in [4] and [5].

Figure 3: Impact of hidden layer size on network performance. A: Validation errors during training for three different network architectures with different hidden layer sizes (30, 100 and 400 hidden units). For each architecture, ten training runs with different random weight initializations are overlaid. B: Mean and standard deviation of the final test error depending on the hidden layer size of the network. The colored data points correspond to the runs shown in A.
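The rejection-sampling procedure described above can be sketched in a few lines of self-contained Python. This is a simplified illustration, not the repository's exact code: the radii (0.5 for the full disk, 0.1 for the dots), the dot positions, and the horizontal orientation of the symbol are assumptions made here for the sake of a runnable example.

```python
import math
import random

R_BIG = 0.5    # radius of the full yin-yang disk (assumed geometry)
R_SMALL = 0.1  # radius of the two small dots (assumed geometry)

def which_class(x, y):
    """Classify a point in [0, 1]^2 as Yin (0), Yang (1) or Dot (2).

    Points outside the big disk return None and are rejected. The dots
    are assumed to sit at (0.25, 0.5) and (0.75, 0.5).
    """
    if math.hypot(x - 0.5, y - 0.5) > R_BIG:
        return None
    d_left = math.hypot(x - 0.25, y - 0.5)
    d_right = math.hypot(x - 0.75, y - 0.5)
    if d_left < R_SMALL or d_right < R_SMALL:
        return 2  # "Dot" class
    # S-shaped boundary: the lower half of the disk, plus the bump of the
    # left half-disk, minus the bite of the right half-disk
    is_yin = d_left <= 0.25 or (y < 0.5 and d_right > 0.25)
    return 0 if is_yin else 1

def generate(n_per_class, seed=42):
    """Rejection-sample a balanced dataset of symmetrized 4D inputs."""
    rng = random.Random(seed)
    counts = [0, 0, 0]
    samples, labels = [], []
    while min(counts) < n_per_class:
        x, y = rng.random(), rng.random()
        c = which_class(x, y)
        # reject points outside the disk and classes that are already full
        if c is None or counts[c] >= n_per_class:
            continue
        counts[c] += 1
        # symmetrized input (x, y, 1 - x, 1 - y): removes the need for a bias
        samples.append((x, y, 1.0 - x, 1.0 - y))
        labels.append(c)
    return samples, labels
```

Note how balance is enforced: samples from classes that have already reached their quota are rejected, which is exactly why the sample density ends up higher in the small "Dot" regions than elsewhere.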
Table 1: Mean and standard deviation of training and test accuracy for 20 training runs with different random initializations. Training parameters can be found in Table 2.

network | test accuracy [%] | training accuracy [%]
deep network | 98.7 ± … | …
shallow network | … | …

Table 2: Training parameters used to produce the results in Fig. 2. Fig. 3 uses the same parameters except for the size of the hidden layer.

parameter name | value
activation function | ReLU
size input | 4
size hidden layer (for deep net) | 120
size output layer | 3
training epochs | 300
batch size | 20
optimizer | Adam [6]
Adam parameters (β, ε) | …
learning rate | …

Code and data availability
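The layer sizes in Table 2 can be turned into a minimal NumPy sketch of the deep baseline network. This shows the forward pass only; the weight initialization below is an arbitrary assumption, and the Adam training loop from Table 2 is omitted. The parameter count also makes the fan-in advantage over MNIST concrete (the 784-120-10 network is a hypothetical MNIST counterpart with the same hidden layer size, not a model from this paper).

```python
import numpy as np

def mlp_params(sizes):
    """Total number of weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

# sizes from Table 2: 4 inputs -> 120 hidden units (ReLU) -> 3 outputs
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.5, size=(4, 120))   # init scale is an assumption
b1 = np.zeros(120)
W2 = rng.normal(0.0, 0.1, size=(120, 3))
b2 = np.zeros(3)

def forward(x):
    """ReLU hidden layer followed by a softmax over the three classes."""
    h = np.maximum(0.0, x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()

# one symmetrized sample (x, y, 1 - x, 1 - y)
probs = forward(np.array([0.3, 0.6, 0.7, 0.4]))

# tiny parameter budget compared to a same-width MNIST network
n_yinyang = mlp_params([4, 120, 3])    # 963 parameters
n_mnist = mlp_params([784, 120, 10])   # 95410 parameters
```

The fan-in per hidden neuron is just 4 here, versus 784 for an MNIST input layer, which is what makes the task attractive for neuromorphic chips with tightly limited synaptic connectivity.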
Code for the Yin-Yang dataset is available at https://github.com/lkriener/yin_yang_data_set. The example notebook in the repository includes the plotting of the data samples (Fig. 1) and the training of deep and shallow networks (Fig. 2). Additional data is available on request from the authors.
References

1. Yin-Yang dataset repository. https://github.com/lkriener/yin_yang_data_set. Accessed: 2021-01-20.
2. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE (1998).
3. Xiao, H., Rasul, K. & Vollgraf, R. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv:1708.07747 (2017).
4. Göltz, J. et al. Fast and deep: energy-efficient neuromorphic learning with first-spike times. arXiv:1912.11443 (2019).
5. Wunderlich, T. C. & Pehle, C. EventProp: Backpropagation for Exact Gradients in Spiking Neural Networks. arXiv:2009.08378 (2020).
6. Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. arXiv:1412.6980 (2014).