Multi-function Convolutional Neural Networks for Improving Image Classification Performance
Luna M. Zhang
Abstract
Traditional Convolutional Neural Networks (CNNs) typically use the same activation function (usually ReLU) for all neurons with non-linear mapping operations. For example, the deep convolutional architecture Inception-v4 uses ReLU. To improve the classification performance of traditional CNNs, a new "Multi-function Convolutional Neural Network" (MCNN) is created by using different activation functions for different neurons. For n neurons and m different activation functions, there are a total of m^n − m MCNNs and only m traditional CNNs. Therefore, the best model is very likely to be chosen from the MCNNs because there are m^n − m more MCNNs than traditional CNNs. For performance analysis, two different datasets for two applications (classifying handwritten digits from the MNIST database and classifying brain MRI images into one of the four stages of Alzheimer's disease (AD)) are used. For both applications, an activation function is randomly selected for each layer of a MCNN. For the AD diagnosis application, MCNNs using a newly created multi-function Inception-v4 architecture are constructed. Overall, simulations show that MCNNs can outperform traditional CNNs in terms of multi-class classification accuracy for both applications. An important future research direction will be to efficiently select the best MCNN from the m^n − m candidate MCNNs. Current CNN software only provides users with partial MCNN functionality, since different layers can use different activation functions but individual neurons in the same layer cannot. Thus, modifying current CNN software systems such as ResNets, DenseNets, and Dual Path Networks to use multiple activation functions, and developing more effective and faster MCNN software systems and tools, would be very useful for solving difficult practical image classification problems.

1 Introduction

Deep learning techniques are very effective for computer vision applications [1-6]. In particular, in recent years, Convolutional Neural Networks (CNNs) have been successful for image classification in various important real-world applications such as medical imaging [4], self-driving cars [5], and board games [6]. For example, in 2017, CNNs detected skin cancer (about 72.1% accuracy) better than dermatologists did (about 66.0% accuracy) [4].

In 1979, Fukushima developed the earliest CNN [7]. It had several convolutional and pooling layers similar to modern CNNs. In 1998, LeCun et al. developed LeNet, which had outstanding performance for handwritten digit recognition and zip codes [8]. AlexNet had an architecture that was very similar to LeNet, but it was larger and deeper, and had convolutional layers stacked on top of each other [2]. In 2012, AlexNet was used in the ImageNet ILSVRC competition and significantly outperformed the runner-up [9].

In recent years, many powerful and popular CNNs have been created with outstanding performance results, including results from important competitions such as the ImageNet Large Scale Visual Recognition Competition (ILSVRC) [9-12] and the COCO Object Detection Task Competition [13-14]. Some examples are GoogLeNet [15], ResNets [3], DenseNets [16], and Dual Path Networks (DPNs) [17].

The purpose of convolution is to extract features from the input image. A filter slides over the input image (the convolution operation) to produce an output image that is commonly called a feature map of convolved features.
There can be many convolution layers in a CNN. Suppose we have a W × H × D image, and we apply m filters (F_1, F_2, ..., F_m) of dimension N × N × D that are convolved with the input image. Let a filter have D kernels: D N × N matrices w_k where k = 1, 2, ..., D. Let x be an N × N × D patch. Let b_n be the bias of a filter F_n. A convolved feature of a feature map among the m feature maps after applying a filter F_n for n = 1, 2, ..., m is:

$$\theta_n = \sum_{k=1}^{D}\sum_{i=1}^{N}\sum_{j=1}^{N} w_{nijk}\, x_{ijk} + b_n.$$

The rest of the convolved features are calculated similarly using the above formula.
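As a concrete illustration, the following minimal NumPy sketch computes the m feature maps from this formula with unit stride and no padding; the function name and array shapes are illustrative assumptions, not part of the paper:

```python
import numpy as np

def convolve_feature_maps(image, filters, biases):
    """image: (W, H, D); filters: (m, N, N, D); biases: (m,).
    Returns the m feature maps of convolved features theta_n."""
    W, H, D = image.shape
    m, N = filters.shape[0], filters.shape[1]
    out = np.zeros((m, W - N + 1, H - N + 1))
    for n in range(m):                          # one feature map per filter F_n
        for i in range(W - N + 1):
            for j in range(H - N + 1):
                patch = image[i:i + N, j:j + N, :]   # an N x N x D patch x
                # theta_n = sum over k, i, j of w_nijk * x_ijk, plus b_n
                out[n, i, j] = np.sum(filters[n] * patch) + biases[n]
    return out

# Example: a 28 x 28 grayscale image with m = 8 filters of size 3 x 3 x 1
maps = convolve_feature_maps(np.random.rand(28, 28, 1),
                             np.random.randn(8, 3, 3, 1), np.zeros(8))
```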
Seven typical activation functions (with their 3-letter abbreviations for later reference) are Linear (LIN), Rectifier (REL), Sigmoid (SIG), Hyperbolic Tangent (TAN), Softplus (PLS), Softsign (SGN), and Exponential Linear Unit (ELU).

A typical CNN uses a ReLU layer right after each convolutional layer (CONV) and before a pooling layer (POOL) to generate m new non-linear feature maps from the m feature maps. For example, a typical CNN architecture is: INPUT → [[CONV → REL] * M → POOL] * N → FCL → OUTPUT. Let * indicate repetition, and let M ≥ 1 and N ≥ 1.
Here, the same activation function (REL) is used to map all convolved features to new features. However, different activation functions may be used to achieve better classification accuracy.

An activation layer (AL) is defined as a layer of neurons where each neuron uses an activation function selected from a set of different activation functions. An AL transforms m feature maps into m new feature maps: the convolved feature θ_n is transformed to a new feature f(θ_n) by an activation function f. Different activation functions can be used for different individual convolved features and different feature sub-maps.

For example, let AL(REL, SIG) mean that in the same AL, some neurons use REL and the others use SIG. A new MCNN uses an AL right after each CONV and before POOL.

Let a convolution block (CB) be a CONV followed by an AL. For example, a MCNN's CB is [CONV → AL(REL, SIG)]. The traditional CNN's CB is [CONV → REL], which is equivalent to the newly defined notation [CONV → AL(REL)]; AL(REL) means that all neurons in the AL use REL.

An example of a MCNN architecture is INPUT → [CONV → AL(REL, SIG)] → POOL → [CONV → AL(REL, SIG, TAN)] → [CONV → AL(LIN)] → POOL → [CONV → AL(PLS, SIG)] → [CONV → AL(SGN, ELU)] → [CONV → AL(TAN, ELU, LIN)] → POOL → FCL → OUTPUT. Another example of a MCNN architecture is INPUT → [CONV → AL(REL)] → POOL → [CONV → AL(SIG)] → [CONV → AL(TAN)] → POOL → FCL → OUTPUT.
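A minimal NumPy sketch of an AL(REL, SIG) follows, assuming for illustration that the m feature maps are split evenly between the two functions (the split rule and all names are assumptions, not from the paper):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def activation_layer(feature_maps, functions):
    """feature_maps: (m, w, h). Transforms each convolved feature theta_n
    into f(theta_n), where f is the function assigned to map n."""
    m = feature_maps.shape[0]
    out = np.empty_like(feature_maps)
    for n in range(m):
        f = functions[n * len(functions) // m]  # assign functions to sub-maps
        out[n] = f(feature_maps[n])
    return out

# AL(REL, SIG): the first half of the maps use REL, the rest use SIG
new_maps = activation_layer(np.random.randn(8, 26, 26), [relu, sigmoid])
```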
Inception-v4 is an effective CNN architecture. For example, an ensemble system of three ResNets and one Inception-v4 network achieved a 3.08% top-5 error on the test set of the ImageNet classification challenge [18]. A CB using REL is a basic building block for constructing an Inception-v4 network. For example, Fig. 1 shows that every CB of an Inception-A architecture uses the same activation function (REL).

Figure 1: An Inception-A architecture based on Inception-v4 [18].

The newly created multi-function Inception-v4 consists of input, a multi-function stem, 4 × multi-function Inception-A, multi-function Reduction-A, 7 × multi-function Inception-B, multi-function Reduction-B, 3 × multi-function Inception-C, Max Pooling, Dropout (0.8), and Softmax. It does not have FCLs. For example, Fig. 2 shows that different CBs of a multi-function Inception-A architecture use 7 different activation functions.

Figure 2: A multi-function Inception-A architecture based on Inception-v4 [18].

Let a convolutional path be a series of connected max pooling layers and/or CBs starting from one filter concatenation stage to the next filter concatenation stage. A simple multi-function Inception-A with four convolutional paths is shown in Fig. 3.

Let there be a total of m AL neurons (each neuron uses an activation function selected from a set of j activation functions) and n fully-connected (FC) hidden neurons (each neuron uses an activation function selected from a set of k activation functions). For example, three FC hidden neurons in the first FC hidden layer may use SIG, LIN, and REL, and a FC hidden neuron in the third FC hidden layer may use TAN. Let a traditional CNN, which uses a single activation function for all neurons, be called a "Single-function CNN" (SCNN). Let "multi-function" mean that at least two different activation functions are used for different neurons, and let "single-function" mean that only one activation function is used for all neurons. Let the four types of general CNNs be denoted as SCNN-SS, MCNN-SM, MCNN-MS, and MCNN-MM; for each type, the total number of CNNs based on different function combinations is given in Table 1. Note that the neurons in a SCNN-SS should not all use LIN, since the entire transformation from the input to the final output should be non-linear.

Table 1: Four types of general CNNs
CNN Type   AL Neurons        FC Hidden Neurons   No. CNNs
SCNN-SS    single-function   single-function     jk
MCNN-SM    single-function   multi-function      j(k^n − k)
MCNN-MS    multi-function    single-function     (j^m − j)k
MCNN-MM    multi-function    multi-function      (j^m − j)(k^n − k)

Table 2 shows the case for CNNs without FCLs. Let the two types of general CNNs without FCLs be denoted as SCNN-S and MCNN-M.

Table 2: Two types of general CNNs without FCLs
CNN Type   AL Neurons        No. CNNs
SCNN-S     single-function   j
MCNN-M     multi-function    j^m − j

For example, with j = 7 activation functions and only m = 10 AL neurons, there are 7^10 − 7 = 282,475,242 candidate MCNN-M models but only 7 SCNN-S models.

2.5 New general algorithms for creating MCNNs

The following four general algorithms are developed to create the four types of MCNNs: MCNN-SM, MCNN-MS, MCNN-MM, and MCNN-M.

Algorithm 1 - Creating a MCNN-SM: 1) Build CBs where all AL neurons use the same activation function. 2) Build FC hidden layers where FC hidden neurons use different activation functions. 3) Build a complete MCNN-SM using the built CBs and FC hidden layers.

Algorithm 2 - Creating a MCNN-MS: 1) Build CBs where AL neurons use different activation functions. 2) Build FC hidden layers where FC hidden neurons use the same activation function.
3) Build a complete MCNN-MS using the built CBs and FC hidden layers.

Algorithm 3 - Creating a MCNN-MM: 1) Build CBs where AL neurons use different activation functions. 2) Build FC hidden layers where FC hidden neurons use different activation functions. 3) Build a complete MCNN-MM using the built CBs and FC hidden layers.

Algorithm 4 - Creating a MCNN-M: 1) Build CBs where AL neurons use different activation functions. 2) Build a complete MCNN-M using the built CBs.
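The algorithms above are stated at neuron granularity. A minimal Keras sketch of Algorithm 3 (MCNN-MM) at layer granularity, matching the layer-wise random selection used in the simulations below, might look as follows; the layer sizes and counts are illustrative assumptions, not the paper's configuration:

```python
import random
from tensorflow import keras
from tensorflow.keras import layers

ACTIVATIONS = ["linear", "relu", "sigmoid", "tanh", "softplus", "softsign"]

def build_mcnn(num_classes=10, num_cbs=3, num_fc=4):
    """Builds a MCNN in which each AL and each FC hidden layer is assigned
    a randomly chosen activation function (one function per layer)."""
    model = keras.Sequential()
    model.add(layers.Conv2D(64, 3, input_shape=(28, 28, 1)))       # CONV
    model.add(layers.Activation(random.choice(ACTIVATIONS)))      # AL
    model.add(layers.MaxPooling2D(2))                             # POOL
    for _ in range(num_cbs - 1):                                  # more CBs
        model.add(layers.Conv2D(64, 3))
        model.add(layers.Activation(random.choice(ACTIVATIONS)))
        model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    for _ in range(num_fc):                                       # FC hidden layers
        model.add(layers.Dense(64, activation=random.choice(ACTIVATIONS)))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```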
SCNNs and MCNNs with both ALs and FCLs are created and tested for the first application (classifying ten handwritten digits from the MNIST database). SCNNs and MCNNs with only ALs are also created and tested for the second application (classifying brain MRI images into one of the four stages of Alzheimer's disease (AD)). Python scripts were written using Keras and scikit-learn.
The MNIST dataset is a well-known classic benchmark dataset for classification algorithms in computer vision [21-22]. There are 42,000 training images and 28,000 testing images. 3 to 8 FC hidden layers were used, with 32, 64, or 128 neurons in each FC hidden layer. Six activation functions were used for the ALs and FC hidden layers: LIN, REL, SIG, TAN, PLS, and SGN. For all CNNs tested, the output layer used the softmax function. Due to current CNN software limitations, an activation function cannot be chosen for an individual neuron, so all neurons in the same AL or FC hidden layer must use the same activation function. SCNN-SS, MCNN-SM, MCNN-MS, and MCNN-MM models were compared in terms of test accuracy for the 10-class classification. For comparison, all MCNN models are bolded, and the models are ordered by decreasing test accuracy in Tables 3-8; in any table, the first row represents the CNN with the highest test accuracy and the last row the CNN with the lowest. Let "HL1" denote the first FC hidden layer, "HL2" the second FC hidden layer, etc., and let "AL1" denote the first activation layer, "AL2" the second activation layer, etc.
Tables 3-6 show results for MCNN-SM and SCNN-SS models with 5 to 8 FC hidden layers and 3 ALs (AL1: 64 neurons, AL2: 128 neurons, AL3: 128 neurons) using REL. For example, in Table 3, for the first MCNN-SM model (REL-LIN-REL-LIN-REL), all neurons in HL1 use REL, all neurons in HL2 use LIN, all neurons in HL3 use REL, all neurons in HL4 use LIN, and all neurons in HL5 use REL.
Table 3: Performance of MCNN-SM and SCNN-SS models with 5 FC hidden layers
HL1   HL2   HL3   HL4   HL5   Training Accuracy   Test Accuracy
REL   LIN   REL   LIN   REL   99.940%   99.443%
REL   REL   REL   REL   REL   99.940%   99.386%
REL   PLS   SIG   TAN   REL   99.910%   99.371%
LIN   LIN   LIN   LIN   LIN   99.940%   99.357%
REL   REL   LIN   LIN   LIN   99.910%   99.329%
REL   LIN   REL   LIN   LIN   99.950%   99.300%
PLS   PLS   PLS   PLS   PLS   99.940%   99.229%
TAN   TAN   TAN   TAN   TAN   99.750%   99.157%
REL   SIG   SOF   PLS   TAN   99.940%   99.157%
SIG   SIG   SIG   SIG   SIG   99.940%   99.114%

Table 4: Performance of MCNN-SM and SCNN-SS models with 6 FC hidden layers
HL1   HL2   HL3   HL4   HL5   HL6   Training Accuracy   Test Accuracy
REL   LIN   REL   LIN   REL   PLS   99.960%   99.543%
REL   LIN   REL   LIN   REL   LIN   99.940%   99.443%
PLS   LIN   REL   LIN   REL   PLS   99.900%   99.414%
REL   LIN   PLS   LIN   PLS   PLS   99.970%   99.400%
PLS   PLS   PLS   PLS   PLS   PLS   99.910%   99.386%
REL   REL   REL   REL   REL   REL   99.950%   99.371%
LIN   LIN   LIN   LIN   LIN   LIN   99.950%   99.186%

Table 5: Performance of MCNN-SM and SCNN-SS models with 7 FC hidden layers
HL1   HL2   HL3   HL4   HL5   HL6   HL7   Training Accuracy   Test Accuracy
LIN   LIN   PLS   LIN   LIN   REL   REL   99.962%   99.400%
LIN   LIN   LIN   PLS   REL   REL   REL   99.971%   99.314%
REL   REL   REL   REL   REL   REL   REL   99.971%   99.286%
LIN   LIN   PLS   PLS   PLS   REL   REL   99.952%   99.286%
REL   LIN   REL   LIN   REL   LIN   REL   99.964%   99.257%
LIN   LIN   LIN   LIN   LIN   LIN   LIN   99.969%   99.143%
PLS   PLS   PLS   PLS   PLS   PLS   PLS   99.950%   98.986%

Table 6: Performance of MCNN-SM and SCNN-SS models with 8 FC hidden layers
HL1   HL2   HL3   HL4   HL5   HL6   HL7   HL8   Training Acc.   Test Acc.
REL   REL   LIN   REL   REL   REL   LIN   REL   99.920%   99.500%
REL   LIN   PLS   LIN   REL   REL   PLS   REL   99.920%   99.457%
REL   PLS   LIN   REL   REL   REL   REL   REL   99.930%   99.429%
REL   LIN   PLS   LIN   PLS   LIN   REL   REL   99.990%   99.400%
REL   REL   LIN   REL   LIN   REL   LIN   REL   99.980%   99.371%
REL   REL   REL   REL   REL   REL   REL   REL   99.962%   99.357%
PLS   PLS   PLS   PLS   PLS   PLS   PLS   PLS   99.962%   99.243%
LIN   LIN   LIN   LIN   LIN   LIN   LIN   LIN   99.943%   99.186%
Table 7 shows results for MCNN-MS and SCNN-SS models with 3 ALs and 4 FC hidden layers (number of neurons: AL1: 64, AL2: 128, AL3: 128, HL1: 64, HL2: 64, HL3: 32, HL4: 64).

Table 7: Performance of MCNN-MS and SCNN-SS models with 3 ALs and 4 FC hidden layers
AL1   AL2   AL3   HL1-HL4   Training Accuracy   Test Accuracy
REL   SGN   REL   SGN   99.950%   99.400%
REL   TAN   LIN   PLS   99.957%   99.357%
REL   SGN   REL   REL   99.955%   99.342%
REL   LIN   REL   LIN   99.967%   99.328%
REL   REL   REL   REL   99.950%   99.300%
REL   REL   REL   SIG   99.955%   99.286%
REL   REL   REL   TAN   99.957%   99.286%
LIN   LIN   REL   REL   99.976%   99.285%
Table 8 shows results for MCNN-MM and SCNN-SS models with 2 ALs and 4 FC hidden layers (number of neurons: AL1: 64, AL2: 128, HL1: 64, HL2: 64, HL3: 32, HL4: 64).

Table 8: Performance of MCNN-MM and SCNN-SS models with 2 ALs and 4 FC hidden layers
AL1   AL2   HL1   HL2   HL3   HL4   Training Accuracy   Test Accuracy
LIN   REL   PLS   REL   LIN   PLS   99.302%   98.957%
LIN   REL   LIN   SIG   REL   PLS   99.414%   98.928%
REL   REL   PLS   PLS   PLS   PLS   99.429%   98.871%
LIN   REL   LIN   SIG   SGN   REL   99.474%   98.857%
REL   REL   REL   REL   REL   REL   99.317%   98.842%
REL   REL   LIN   LIN   LIN   LIN   99.510%   98.828%
LIN   REL   LIN   SIG   REL   SGN   99.317%   98.814%

3.1.4 Performance analysis
For the simulations, only a small number of MCNN models were trained, yet on average 3 MCNN models per table performed better than the best SCNN model (Tables 3-8). Therefore, it is relatively easy to find a MCNN model with better performance than a SCNN model, so it would be feasible to perform MCNN model selection to quickly identify the best MCNN model.
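A simple random-search selection over candidate activation combinations could be sketched as follows, reusing the hypothetical build_mcnn builder sketched earlier; the candidate budget and epoch count are illustrative assumptions:

```python
def select_best_mcnn(x_train, y_train, x_val, y_val, num_candidates=10):
    """Trains num_candidates randomly configured MCNNs and keeps the one
    with the highest validation accuracy."""
    best_acc, best_model = -1.0, None
    for _ in range(num_candidates):
        model = build_mcnn()                 # random activation assignment
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=5, verbose=0)
        _, acc = model.evaluate(x_val, y_val, verbose=0)
        if acc > best_acc:
            best_acc, best_model = acc, model
    return best_model, best_acc
```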
Comparison between MCNN-SM and SCNN-SS
Overall, 2 (out of 6) best-ranked MCNN-SM models achieved higher training accuracies than the best-ranked SCNN-SS models. Based on Tables 3-6, the SCNN-SS model using REL is ranked 4.25 on average. For one case, shown in Table 4, the SCNN-SS model using PLS is better than the SCNN-SS model using REL in terms of test accuracy; thus, a CNN using the popular REL is not always optimal. For both training and test accuracies in Tables 3-6, the best-ranked MCNN-SM models consistently placed 1st whereas the best-ranked SCNN-SS models did not. Overall, all 6 best-ranked MCNN-SM models achieved higher test accuracies than the best-ranked SCNN-SS models: the best-ranked MCNN-SM model improves on the best-ranked SCNN-SS model in every table, with differences in test accuracy ranging from approximately 0.01% to 0.16%.
Comparison between MCNN-MS and SCNN-SS
For testing, the best-ranked MCNN-MS model is more accurate than the best-ranked SCNN-SS model by 0.1%. As shown in Table 7, the top 4 models are MCNN models, and the SCNN-SS model using REL is ranked 5th.
Comparison between MCNN-MM and SCNN-SS
For testing, the best-ranked MCNN-MM model is more accurate than the best-ranked SCNN-SS model by 0.086%. As shown in Table 8, the top 4 models are MCNN models, and the SCNN-SS model using REL is ranked 5th.
An AD dataset with 436 MRI brain images (with additional data for 20 subjects) [23], which is pre-processed and ready to use, is used for performance analysis. This dataset has a cross-sectional collection of 416 subjects aged 18 to 96. This research work uses all brain MRI images for a 4-class classification problem to determine the AD stage (non-demented, very mild dementia, mild dementia, or moderate dementia) of a person [23-24].

Stratified 3-fold cross-validation was used to evaluate and compare SCNNs and MCNNs, with the calculation of comprehensive multi-class classification metrics (i.e., training accuracy, test accuracy, training F1-score, and test F1-score). An activation function set {REL, SIG, TAN} was used to build different MCNN-M and SCNN-S models. Inception-v4 was used to build a SCNN-S model using REL; Inception-v4 was then modified to use SIG or TAN instead of REL to create SCNN-S models using SIG or TAN. MCNN-M models were built using the multi-function Inception-v4 with convolutional paths as shown in Fig. 3.
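A hedged sketch of this evaluation protocol with scikit-learn's StratifiedKFold follows, assuming a factory build_fn that returns a fresh compiled Keras model and integer class labels (all names and the epoch count are illustrative):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import StratifiedKFold

def cross_validate(build_fn, images, labels):
    """Stratified 3-fold CV returning mean test accuracy and macro F1."""
    skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    accs, f1s = [], []
    for train_idx, test_idx in skf.split(images, labels):
        model = build_fn()                       # a fresh, compiled model
        model.fit(images[train_idx], labels[train_idx], epochs=5, verbose=0)
        preds = np.argmax(model.predict(images[test_idx]), axis=1)
        accs.append(accuracy_score(labels[test_idx], preds))
        f1s.append(f1_score(labels[test_idx], preds, average="macro"))
    return float(np.mean(accs)), float(np.mean(f1s))
```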
In Table 9, the top-ranked model is a MCNN-M model with the multi-function Inception-v4.
From Tables 9-12, MCNN-M models can outperform SCNN-S models: the best MCNN-M models achieve better test accuracies than the best SCNN-S models for three cases, the two best MCNN-M models have the same test accuracy as the best SCNN-S model for the remaining case, and the best MCNN-M model has higher training F1-scores and test F1-scores than the best SCNN-S model for all four cases. The best MCNN-M models are better than the SCNN-S models using REL based on Tables 9-12. For one case, shown in Table 10, the SCNN-S model using SIG is better than the SCNN-S model using REL. Thus, a SCNN model with Inception-v4 using the commonly used REL is not always the best.
A MCNN with a greater variety of activation functions can achieve better performance than a SCNN. Simulation results show that all of the tested MCNN-SM, MCNN-MS, MCNN-MM, and MCNN-M models performed better than all of the tested SCNN models (except for one case, which was a tie) in terms of image classification test accuracy. Interestingly, many MCNN models, and even some SCNN models using an activation function other than ReLU, were able to outperform SCNN models using ReLU. Therefore, using the same activation function for all AL neurons and all FC hidden neurons may not always be optimal for CNNs. Since there are many more candidate MCNN models than SCNN models, MCNN models have a much greater chance of performing better with an optimized set of activation function combinations than SCNN models. Although the differences in the average training and test accuracies between MCNN and SCNN models may not seem significant right now, MCNN models can at least be considered as an alternative variant of traditional CNNs for achieving better performance, and even the smallest difference may eventually become significant. The test accuracy improvements of MCNN models, together with the small numbers of MCNN models tested, show that it is feasible and not difficult to find MCNN models that perform better than SCNN models. MCNN models have the potential to perform much better than SCNN models, especially for various complex applications with big, complex image data.
More MCNN models based on currently powerful architectures such as ResNets, DenseNets, and DPNs will be created and tested. An automatic process for building, training, testing, and optimizing SCNN and MCNN models will be created. Different neurons on each AL and different neurons on each FC hidden layer may also use different activation functions; then there are a total of j^m k^n − jk different MCNN models based on Table 1. Thus, it is not practical to evaluate all models to find the best one, so a partial number of models can be created by randomly choosing activation functions for all neurons and then evaluated. A general method will be developed and then used to make a complex multi-function Inception-v4 architecture by assigning an activation function from a set of activation functions to each individual AL in each CB.

Better and faster algorithms will be developed to efficiently find the best set of different activation functions for all neurons of a MCNN to achieve the best performance for various important applications such as medical imaging for cancer detection [4] and brain imaging for mental illness diagnosis such as autism detection [25]. Developing intelligent high-speed optimization software for identifying the best MCNN model for a particular application will be a difficult long-term challenge.

5.2 CNN software enhancement

Current CNN software libraries such as Keras and scikit-learn can be used to create MCNN models by choosing different activation functions for ALs and FC hidden layers. However, they have the following limitation: an AL or FC hidden layer can only use one activation function for all of its neurons. These libraries can be improved by allowing different neurons in an AL or FC hidden layer to use different activation functions.

Existing CNN software systems based on single-function deep convolutional architectures such as Inception-v4 and Inception-ResNet-v2 [18] can be modified by choosing different activation functions for different neurons in all ALs and FC hidden layers to create MCNN software systems. Since MCNN model selection among a large number of candidate MCNNs would take a very long computational time for big image data, parallel algorithms will be developed to speed up the large-scale optimization process.
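As one possible realization of this enhancement, a custom Keras layer could assign each individual neuron (unit or channel) its own activation function. The class below is an illustrative sketch of such a layer, not an existing library API:

```python
import tensorflow as tf
from tensorflow.keras import activations, layers

class PerNeuronActivation(layers.Layer):
    """Applies activation_names[i] to the i-th unit of the input, so that
    different neurons in the same layer can use different functions."""

    def __init__(self, activation_names, **kwargs):
        super().__init__(**kwargs)
        self.fns = [activations.get(name) for name in activation_names]

    def call(self, inputs):
        # Split the last axis into single units and activate each separately.
        units = tf.split(inputs, num_or_size_splits=len(self.fns), axis=-1)
        return tf.concat([f(u) for f, u in zip(self.fns, units)], axis=-1)

# Example: a 4-neuron FC hidden layer where every neuron uses a different
# activation function, as in a MCNN-MM
inp = tf.keras.Input(shape=(8,))
hidden = layers.Dense(4)(inp)
out = PerNeuronActivation(["relu", "sigmoid", "tanh", "softplus"])(hidden)
```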
Acknowledgments
Data were provided in part by OASIS: Cross-Sectional: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382.
References

[1] LeCun, Y., Bengio, Y. & Hinton, G.E. (2015) Deep learning. Nature 521, pp. 436–444.
[2] Krizhevsky, A., Sutskever, I. & Hinton, G.E. (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pp. 1097–1105. Cambridge, MA: MIT Press.
[3] He, K., Zhang, X., Ren, S. & Sun, J. (2016) Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
[4] Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M. & Thrun, S. (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639):115–118.
[5] Nugraha, B.T., Su, S.-F. & Fahmizal, F. (2017) Towards self-driving car using convolutional neural network and road lane detector. In Proceedings of the 2nd International Conference on Automation, Cognitive Science, Optics, Micro Electro-Mechanical System, and Information Technology (ICACOMIT), pp. 65–69.
[6] Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Driessche, G.V.D., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T. & Hassabis, D. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529, pp. 484–503.
[7] Fukushima, K. (1979) Neural network model for a mechanism of pattern recognition unaffected by shift in position - Neocognitron. Transactions of the IECE J62-A(10):658–665.
[8] LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. (1998) Gradient-based learning applied to document recognition. Proc. IEEE 86(11):2278–2324.
[9] Large Scale Visual Recognition Challenge 2012 (ILSVRC2012).
[10] Large Scale Visual Recognition Challenge 2014 (ILSVRC2014).
[11] Large Scale Visual Recognition Challenge 2015 (ILSVRC2015).
[12] Large Scale Visual Recognition Challenge 2017 (ILSVRC2017).
[13] COCO 2015 Object Detection Task (2015). [Online]. Available: http://cocodataset.org/
[15] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. & Rabinovich, A. (2015) Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9.
[16] Huang, G., Liu, Z., Maaten, L.v.d. & Weinberger, K.Q. (2018) Densely connected convolutional networks. [Online]. Available: https://arxiv.org/abs/1608.06993
[17] Chen, Y., Li, J., Xiao, H., Jin, X., Yan, S. & Feng, J. (2018) Dual path networks. [Online]. Available: https://arxiv.org/abs/1707.01629
[18] Szegedy, C., Ioffe, S., Vanhoucke, V. & Alemi, A. (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), pp. 4278–4284.
[19] Clevert, D.-A., Unterthiner, T. & Hochreiter, S. (2016) Fast and accurate deep network learning by exponential linear units (ELUs). [Online]. Available: https://arxiv.org/abs/1511.07289
[20] Zhang, L.M. (2016) A new multifunctional neural network with high performance and low energy consumption. In Proceedings of the 15th IEEE International Conference on Cognitive Informatics and Cognitive Computing, pp. 101–109.
[21] LeCun, Y., Cortes, C. & Burges, C. (n.d.) MNIST handwritten digit database. [Online]. Available: http://yann.lecun.com/exdb/mnist/
[22] Digit recognizer.
[23] OASIS Brains Datasets.