XceptionTime: A Novel Deep Architecture based on Depthwise Separable Convolutions for Hand Gesture Classification
Elahe Rahimian†, Soheil Zabihi‡, Seyed Farokh Atashzar††, Amir Asif‡, and Arash Mohammadi†
† Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada
‡ Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada
†† Electrical & Computer Engineering, Mechanical & Aerospace Engineering, New York University, USA
ABSTRACT
Capitalizing on the need to address existing challenges associated with gesture recognition via sparse multichannel surface Electromyography (sEMG) signals, this paper proposes a novel deep learning model referred to as the XceptionTime architecture. The proposed XceptionTime is designed by integrating depthwise separable convolutions, adaptive average pooling, and a novel non-linear normalization technique. At the heart of the proposed architecture are several XceptionTime modules concatenated in series, designed to capture both the temporal and the spatial information-bearing contents of the sparse multichannel sEMG signals without the need for data augmentation and/or manual design of feature extraction. In addition, through the integration of adaptive average pooling, Conv1D, and the non-linear normalization approach, XceptionTime is less prone to overfitting, more robust to temporal translation of the input, and, more importantly, independent of the input window size. Finally, by utilizing depthwise separable convolutions, the XceptionTime network has far fewer parameters, resulting in a less complex network. The performance of XceptionTime is tested on a sub-Ninapro dataset, DB1, and the results show superior performance in comparison to existing counterparts, with accuracy improvements reported even for short window sizes.

Index Terms: Surface Electromyography (sEMG), Depthwise Separable Convolution, Adaptive Average Pooling
1. INTRODUCTION
Recent evolution in deep learning architectures, coupled with advancements in rehabilitation technologies, has resulted in a promising future for developing intuitive myoelectric prostheses. Surface Electromyography (sEMG) signals [1-3], derived from the action potentials of muscle fibers, have been used in the literature for hand motion recognition in advanced myoelectric prostheses. In this regard, gesture recognition and classification has attracted a great deal of interest from researchers due to its high potential for improving the quality of control over the actions of prostheses, which can significantly enhance the quality of life of hand-amputated individuals.

sEMG signals can be collected using sparse multichannel sEMG devices or, in more advanced cases, high-density sEMG (HD-sEMG) devices [4-6]. A multichannel system records the electrical activity of muscles through spatially distributed electrodes over stump muscles to extract temporal information regarding muscle activity. Multichannel recording secures several advantages, including the ability to obtain large amounts of data from different locations on the muscles, which enhances the sparsity of the information space regarding the activities of distributed motor units in the muscles, potentially allowing for enhanced classification quality.
Despite the unique advantages of multichannel recording (such as high-density systems), the multiplied size of the recorded information space, combined with a high sampling frequency (which can be in the kHz range and is needed for enhancing fine control), makes the processing computationally demanding. This in turn can add latency to the processing pipeline, challenging real-time implementation (which is imperative for the control of prosthetic systems). It should also be noted that although the performance of deep learning algorithms can motivate the use of the multichannel electrode space, applying/training deep models on signals obtained from sparse multichannel sEMG devices is very challenging, as such datasets are typically shallow. The paper aims at addressing this gap by designing a novel deep architecture with reduced computational burden to achieve high accuracy using sparse multichannel sEMG signals. The NinaPro [7, 8] database, which is the most widely accepted benchmark for sparse multichannel sEMG signal processing, is utilized to design the proposed novel deep architecture.
Prior Research: A common strategy for hand gesture recognition is to convert the multichannel sEMG recording over fixed time windows into images and then use Convolutional Neural Network (CNN)-based image classification models [4, 5, 9, 10] to perform the recognition task. The problem with such an approach is that only the spatial information of the sEMG signals is captured, without considering their sequential nature. Motivated by this fact, Reference [6] proposed a hybrid CNN and Recurrent Neural Network (RNN) architecture where both spatial and temporal features of the sEMG signals are captured. However, in [6], raw signals are first converted to images (via six sEMG image representation approaches) and then fed to the hybrid CNN-RNN architecture. The results obtained in [6] show that classification accuracy depends critically on the characteristics of the constructed images, revealing that there is still a major open question as to the optimal approach for converting sEMG signals into images and whether this is subject-dependent [11]. Moreover, in that work, the algorithm proposed in Reference [12] is utilized, which fuses various signal sequences into an activity image used for training purposes. Although this algorithm allows each sEMG sequence to be adjacent to all other sequences, it requires readjustment of the input signals, adding to the complexity of the model. To overcome these problems, we have recently [14] developed a new composite architecture that eliminates the need for converting the raw sEMG signals into images. Instead, the sEMG signals are fed directly into temporal-convolutional network architectures, capitalizing on the time-series nature of the underlying signals. Although this approach has advantages, i.e., there is no need for readjustment and the number of parameters is much lower than in counterparts using RNN modules, high accuracy can only be achieved by using the complete sEMG sequence (a large window of the sEMG sequence). On the other hand, the model in [13] is trained separately for each subject, limiting its generalization capabilities as a subject-independent model. Finally, in [15], the authors extracted classical sEMG feature sets and then combined these features with a CNN framework. Although this can help with the computational expense of the technique, extraction of optimal engineered features and construction of an optimal classifier are particularly challenging and can saturate the achievable accuracy in many cases.

Contributions: The paper aims to address the above-mentioned drawbacks of existing solutions, capitalizing on the fact that the problem of recognizing a large set of hand gestures using sparse multichannel sEMG signals is still far from solved, both in terms of recognition accuracy and system complexity. In this regard, we aim to design a novel deep-learning model to classify hand movements from raw sparse multichannel sEMG signals, without any additional information (such as in [6]), data augmentation (such as in [13]), or manual design of feature extraction (such as in [15]). The paper proposes a novel CNN architecture, constructed based on an innovative module referred to as the XceptionTime module. The algorithm is designed using the concept of the Inception Networks [16, 17].
In the proposed architecture, several XceptionTime modules are deployed to perform hand gesture recognition, where both the temporal and the spatial information-bearing contents of the sparse multichannel sEMG signals are captured. The proposed novel architecture is independent of the window size: changing the size of the input sequence requires no change or reconfiguration of the architecture itself (in existing deepnet solutions, this is required due to the incorporation of fully connected layers within the architecture). To achieve this goal, the proposed architecture employs Adaptive Average Pooling in the classification layer, which is less prone to over-fitting than traditionally-used fully connected layers [18]. Moreover, a novel method for normalization of the input, inspired by [19], is proposed, resulting in better performance both in terms of accuracy and training speed. Finally, by utilizing Depthwise Separable Convolutions, our network has far fewer parameters compared to the case where standard Conv1D convolutions are used [14], resulting in a less complex network. The proposed algorithm is tested on the DB1 sub-database from NinaPro and achieves an accuracy significantly superior to its counterparts in the literature on the same dataset.
2. MATERIAL AND METHODS
In this section, first, the database on which the proposed model is evaluated is described. Then, the pre-processing approach for preparing the dataset is explained.
As stated previously in Section 1, the performance of deep learning techniques using sparse multichannel sEMG is still far from optimal in terms of (i) recognition accuracy; (ii) complexity of the system; and (iii) sufficiency of the number of subjects and movements. Therefore, the proposed architecture is evaluated on a public scientific benchmark database, Ninapro [7, 8], which is the most widely accepted benchmark for evaluating models developed based on sparse multichannel sEMG signals. The first Ninapro database [7, 8], referred to as DB1, is used in this work, where the sEMG signals are acquired using 10 Otto Bock MyoBock 13E200 wireless electrodes (channels) at a sampling rate of 100 Hz. DB1 consists of 27 intact (healthy) subjects, where each subject repeats 52 gestures, including finger, hand, and wrist movements. The subjects repeated each gesture 10 times, with each repetition lasting 5 seconds followed by 3 seconds of rest. For the sake of comparison, and following the recommendations provided by the database as well as previous studies [4, 5, 8, 10], the testing set consists of repetitions 2, 5, and 7, while the remaining repetitions form the training set. Evaluating the proposed model on a sufficient number of subjects and hand gestures shows its capability to generalize for practical use in daily life.

Fig. 1: The electrical activity of muscles obtained from the 10 sensors: (a) the sEMG signals before normalization; (b) the sEMG signals after µ-law non-linear normalization.

Following the preprocessing procedure proposed in previous studies [4, 5, 8, 10], we adopted a 1st-order 1 Hz low-pass Butterworth filter to preprocess the electrical activities of the muscles. However, we develop and propose a new approach, referred to as µ-law normalization, for normalizing the sEMG signals in a nonlinear fashion based on the µ-law transformation [20].
This normalization approach has traditionally been used in speech and communication domains for quantization purposes. We propose, for the first time, to use it for normalization in the context of sEMG processing. The µ-law normalization is performed based on the following formulation:

F(x_t) = sign(x_t) · ln(1 + µ|x_t|) / ln(1 + µ),    (1)

where x_t denotes the input scalar to be normalized, and µ = 256 is utilized. This nonlinear normalization preprocesses the sEMG signals significantly better than linear normalization such as the commonly used Minmax normalization. In contrast to Minmax normalization, which linearly distributes signal values within a pre-defined range, the proposed µ-law normalization magnifies the outputs of sensors with small magnitude (in a logarithmic fashion), while keeping the scale of the sensors having larger values over time. As an illustrative example, Fig. 1 shows the 1 Hz low-pass filtered sEMG signals obtained from the sensors corresponding to the first repetition of the second gesture performed by one subject. As can be observed in Fig. 1(a), except for a few sensors, the values of the remaining sensors are close to zero. However, by using the proposed µ-law normalization (as shown in Fig. 1(b)), the outputs of all sensors are nonlinearly amplified.
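As a minimal sketch of the µ-law transformation of Eq. (1) (assuming inputs are pre-scaled to [-1, 1], under which the output also stays within [-1, 1]; the function name is ours for illustration):

```python
import numpy as np

def mu_law_normalize(x, mu=256):
    """Nonlinear mu-law normalization of Eq. (1): magnifies small-magnitude
    sensor values logarithmically while preserving sign and keeping the
    output within [-1, 1] for inputs in [-1, 1]."""
    x = np.asarray(x, dtype=np.float64)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

# Small values are strongly boosted, large values stay near the top of the scale:
x = np.array([0.0, 0.01, 0.5, 1.0])
y = mu_law_normalize(x)
```

Applied channel-wise to the filtered sEMG windows, this reproduces the effect shown in Fig. 1(b), where low-magnitude sensors become visible alongside the dominant ones.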
3. THE PROPOSED XceptionTime ARCHITECTURE
In [16], inspired by the Inception V4 architecture, a new deepnet model named "InceptionTime" has recently been proposed for time series classification. In [16] it is shown that InceptionTime, an equivalent of AlexNet for time series data, is more accurate and faster than its existing counterparts in time series classification. On the other hand, in [17], by replacing the Inception modules with depthwise separable convolutions, a new architecture named Xception is designed, which has better performance than Inception V3 on a large image classification dataset.

Fig. 2: (a) XceptionTime Module, which consists of two parallel paths: the first path includes a bottleneck (f@Conv1x1) followed by three Depthwise Separable Convolutions (f@SeparableConv(1,11,5), f@SeparableConv(1,21,10), and f@SeparableConv(1,41,20)); the second path includes MaxPool1D(s=1,k=3,p=1) followed by a Conv1×1. The outputs are channel-concatenated, so C_out = 4×f. (b) XceptionTime Architecture, which includes a series of XceptionTime modules with residual connections, followed by Adaptive Average Pooling layers and Conv1×1 layers with Batch Normalization (BN) and ReLU.

Motivated by the prior works [16, 17], we propose a novel deep architecture, XceptionTime, which is more accurate than the existing models for sparse sEMG-based hand gesture recognition. Furthermore, by deploying adaptive average pooling, the proposed end-to-end XceptionTime architecture is independent of the time window, meaning that for different time windows (e.g., 50-200 ms) there is no need to reconfigure and retrain the XceptionTime model. Besides, by replacing fully connected layers with adaptive average pooling, the proposed XceptionTime model is less prone to overfitting because there are no extra parameters to optimize [18].
By deploying adaptive average pooling, the proposed architecture is also more robust to temporal translation of the inputs, as the temporal information is summed out. In the following sub-sections, first the proposed XceptionTime module is introduced, followed by a description of the XceptionTime architecture consisting of stacked XceptionTime modules and adaptive average pooling layers.

3.1. XceptionTime Module

One of the challenging tasks in designing CNNs is selecting the right kernel size, which plays an important role in extracting global or local information. Inspired by Inception [21], as shown in Fig. 2(a), instead of committing ourselves to a filter of one specific size, we adopt multiple one-dimensional filters with different kernel sizes to extract short- and long-term time series features simultaneously, with the resulting feature maps concatenated to construct the output features. Moreover, to mitigate the computational cost as well as lessen overfitting, a bottleneck layer is used as the first component of the proposed XceptionTime module. In the bottleneck layer, f one-dimensional filters with kernel size one are utilized to transform the input with C_in channels into another time series with f channels.

One key difference between the proposed XceptionTime module and the InceptionTime module previously proposed in [16] is the deployment of depthwise separable convolutions, which significantly reduces the required number of parameters in the network. In a Depthwise Separable Convolution [22, 23], two convolutions are deployed, i.e., the Depthwise Convolution and the Pointwise Convolution. In the Depthwise Convolution, each channel of the input is convolved separately and the results are then stacked together; therefore, the temporal convolution is performed without changing the depth. The output of the Depthwise Convolution is fed to the Pointwise Convolution, where 1×1 convolutions are utilized to transform the number of channels coming from the Depthwise Convolution into a new channel depth. Later, in Section 4, it will be shown that by using depthwise separable convolutions, not only is the recognition accuracy increased, but the number of parameters is also reduced significantly.

To summarize, as shown in Fig. 2(a), the time series input with C_in channels is first fed to two parallel paths. The first path consists of a bottleneck, reducing the dimensionality of the input, followed by three depthwise separable convolutions, each with f filters of kernel size l, where l is set to 11, 21, or 41. In the second path, the input is fed to a MaxPooling layer followed by a Conv1×1 component, which produces an output with f channels. Finally, the resulting feature maps of the depthwise separable convolutions and the skip connection are concatenated in a channel-wise fashion. As shown in Fig. 2(a), the time series input with C_in channels is thereby transformed to an output with C_out channels, where C_out is four times the number of filters (f) used in the bottleneck as well as in the depthwise separable convolutions.

3.2. XceptionTime Architecture

The XceptionTime architecture is constructed based on the proposed XceptionTime modules described in Sub-section 3.1. More specifically, after preprocessing, the sEMG signals acquired from the 10 sensors are segmented by a window of length W ∈ {50 ms, 100 ms, 150 ms, 200 ms} (it is worth mentioning that W should be under 300 ms to satisfy the acceptable delay time [24]). A sliding window is used for segmentation of the sEMG signals. The proposed XceptionTime architecture (Fig. 2(b)) includes four XceptionTime modules, where the number of filters (f) is set to 16, 32, 64, and 128, respectively. Moreover, two residual connections [25] are deployed in the XceptionTime architecture to address the degradation problem. Each residual connection consists of a Conv1×1 layer, to match up the input and output dimensions, followed by Batch Normalization, which serves as regularization and reduces the internal covariate shift effect [26]. Moreover, in order to learn the complex structure of the data, a Rectified Linear Unit (ReLU) is applied to the summation of the outputs of the residual connection and the XceptionTime module.

As stated previously, one of the novelties of the proposed XceptionTime architecture is the independence of the architecture from the length of the time window. In other words, for an arbitrary window length, the XceptionTime architecture remains unchanged without any need for reconfiguration. To realize this objective, the output yielded from the summation of the fourth XceptionTime module and its residual connection is fed to an Adaptive Average Pooling layer, which transforms the input with window size W to a fixed length of 50. Then, the dimension of the time series is reduced to the number of classes (i.e., 52 in our settings) by stacking three Conv1×1 layers, each followed by Batch Normalization and ReLU. Finally, a second Adaptive Average Pooling layer is used to convert the length of the signal to one.

Table 1: (First Exp.) Comparison between the proposed XceptionTime model and the XceptionTime-V2 model. (Second Exp.) Results of the proposed model and XceptionTime-V2 when the input is normalized by Minmax.

Exp.     Normalization   Model              Accuracy (%)                              Model Parameters
                                            50ms    100ms   150ms   200ms   Trial
First    µ-law           XceptionTime       81.71   87.4    90.76   92.3    95.43    413,516
                         XceptionTime-V2    81.24   86.81   89.79   91.71   94.59    1,918,476
Second   Minmax          XceptionTime       71.49   82.63   88.94   90.51   92.2     413,516
                         XceptionTime-V2    68.95   79.56   87.17   89.61   90.08    1,918,476
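The parameter saving from depthwise separable convolutions can be checked with a back-of-the-envelope count. The sketch below (ours, ignoring bias terms and assuming a depthwise channel multiplier of one) compares a standard Conv1D against its depthwise + pointwise factorization for one branch of the module:

```python
# Parameter counts for a 1-D convolution layer, ignoring biases.
def conv1d_params(c_in, c_out, k):
    """Standard Conv1D: every output channel filters every input channel."""
    return c_in * c_out * k

def separable_conv1d_params(c_in, c_out, k):
    """Depthwise separable Conv1D: one k-tap filter per input channel,
    then a 1x1 pointwise convolution to mix channels."""
    depthwise = c_in * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example: a branch with f = 16 filters and the largest kernel size (41).
print(conv1d_params(16, 16, 41))            # 10496
print(separable_conv1d_params(16, 16, 41))  # 912
```

For this branch the factorization cuts the weight count by roughly an order of magnitude, which is consistent with the roughly 4.6x overall gap between XceptionTime (413,516 parameters) and the standard-convolution variant reported in Table 1.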
We use the Adam optimizer for training, with a learning rate of 0.001. The learning rate changes in a cycle with a length of 20 epochs; after every 20 epochs, the learning rate is divided by a fixed factor. The models are trained with mini-batches, and the cross-entropy loss is used as the classification objective.
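The window-size independence rests on the adaptive average pooling step. A minimal numpy sketch (ours, mimicking PyTorch-style bin boundaries, with the 512-channel intermediate shape taken from Fig. 2(b)) shows how any window length collapses to the same fixed length:

```python
import numpy as np

def adaptive_avg_pool1d(x, out_len):
    """Minimal adaptive average pooling over the last (time) axis: the input
    length is split into `out_len` roughly equal bins and each bin is
    averaged, so any window size maps to a fixed output length."""
    t = x.shape[-1]
    starts = (np.arange(out_len) * t) // out_len                  # floor(i*t/L)
    ends = -(-(np.arange(1, out_len + 1) * t) // out_len)         # ceil((i+1)*t/L)
    return np.stack([x[..., s:e].mean(axis=-1) for s, e in zip(starts, ends)],
                    axis=-1)

# Feature maps from any window length all reduce to a fixed length of 50:
for w in (50, 100, 150, 200):
    feats = np.random.randn(512, w)   # 512 channels, w time steps
    assert adaptive_avg_pool1d(feats, 50).shape == (512, 50)
```

Because the pooled length is fixed regardless of W, the Conv1×1 classification head that follows never needs reconfiguration, unlike a fully connected layer whose weight matrix is tied to the input length.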
4. EXPERIMENTS AND RESULTS
In this section, the performance of the proposed architecture is evaluated through a comprehensive set of experiments, and comparisons are provided with state-of-the-art models [4-6, 10, 13, 15] recently developed on the same dataset to illustrate the superior performance of the proposed XceptionTime over its counterparts.

Experiment 1: In this experiment, referred to as "First Exp." in the results, the objective is to validate our claim that incorporating Depthwise Separable Convolutions within the proposed XceptionTime architecture yields a much smaller model with significantly reduced complexity. For this purpose, we implemented a variant of the proposed architecture, referred to as XceptionTime-V2, where standard convolutions are deployed within the XceptionTime module instead of Depthwise Separable Convolutions. Table 1 shows the results, where it can be observed that while the accuracy of XceptionTime is slightly better than that of XceptionTime-V2 across all window lengths, its number of parameters is dramatically lower: the proposed XceptionTime model uses 413,516 parameters, while XceptionTime-V2 requires 1,918,476 parameters.

Experiment 2: In this experiment, referred to as "Second Exp." in the results, the objective is to validate the effectiveness of the proposed non-linear µ-law normalization within the proposed XceptionTime architecture. In this regard, Table 1 also shows results obtained when training with Minmax normalization for both variants of the proposed framework. From Table 1, it is observed that the accuracy of the model decreases when Minmax normalization is applied to the input. For instance, for a 50 ms window, the accuracy of the proposed XceptionTime framework with µ-law normalization is 81.71%, whereas using Minmax normalization reduces it to 71.49%. Another observation is that the degradation effect of discarding the proposed nonlinear normalization is even higher for XceptionTime-V2.

Experiment 3: The third experiment is performed to validate our claim that the proposed XceptionTime is applicable to different window sizes without the need for reconfiguration. We evaluate the performance when the proposed architecture is trained on a combination of different window sizes. In other words, instead of training the XceptionTime model with a single time window (as is done for the results in Table 1), inputs with different window sizes are fed into the network to increase its robustness during training. For the effectiveness of the training process, only windows with lengths of 50, 100, 150, and 200 ms are used as input. Table 2 shows the results obtained from XceptionTime trained with a combination of different window lengths and then tested separately on each window size. As can be seen, the performance of the model, except for the 50 ms time window, is improved in comparison to the case where the model was trained with a single window length (Table 1, First Exp.). In other words, not only can the proposed model handle different window sizes simultaneously, but this property can also be utilized to boost performance. Finally, Table 3 shows the performance of the proposed model in comparison to state-of-the-art results obtained on the same DB1 dataset of 52 hand gestures. As shown in Table 3, our architecture outperforms existing solutions while maintaining a reduced number of parameters.

Table 2: Accuracy when the proposed XceptionTime model is trained on a combination of different window lengths (i.e., 50, 100, 150, 200 ms) and then tested on different windows.

Exp.     Model           Accuracy (%)
                         50ms    100ms   150ms   200ms   Trial
Third    XceptionTime    77.87   87.64   91.81   93.91   95.44

Table 3: Comparison of the proposed XceptionTime with the state-of-the-art literature (number of parameters for [4, 5, 10] are reported from [13]).

Model                         Accuracy (%)                              Model Parameters
                              50ms    100ms   150ms   200ms   Trial
XceptionTime (First Exp.)     81.71   87.4    90.76   92.3    95.43    413,516
XceptionTime (Third Exp.)     77.87   87.64   91.81   93.91   95.44    413,516
GengNet [4]                   -       -       -       77.8    76.1     500,000
WeiNet [5]
HuNet [6]                     -       -       86.8    87      97.3     -
AtzoriNet [10]                -       -       66.59   -       -        85,000
TsinganosNet [13]             -       -       -       -       89.76    85,000
WeiNet [15]
5. CONCLUSION
With the goal of addressing identified shortcomings of existing models for gesture recognition via sparse multichannel surface Electromyography (sEMG) signals, the paper proposed the novel XceptionTime architecture. The proposed XceptionTime is designed by integrating depthwise separable convolutions, adaptive average pooling, and a novel non-linear normalization technique. To the best of our knowledge, this is the first time the proposed XceptionTime architecture is introduced; it has not been designed or utilized previously in any application. Its performance is evaluated on the benchmark sparse sEMG dataset, outperforming all existing counterparts. To support reproducibility, the code will be released on GitHub.

6. REFERENCES

[1] N. Jiang, S. Dosen, K.R. Muller, D. Farina, "Myoelectric Control of Artificial Limbs - Is There a Need to Change Focus? [In the Spotlight],"
IEEE Signal Process. Mag., vol. 29, pp. 150-152, 2012.
[2] D. Farina, R. Merletti, R.M. Enoka, "The Extraction of Neural Strategies from the Surface EMG," J. Appl. Physiol., vol. 96, pp. 1486-1495, 2004.
[3] M. Zia ur Rehman, S. Gilani, A. Waris, I. Niazi, G. Slabaugh, D. Farina, E. Kamavuako, "Stacked Sparse Autoencoders for EMG-based Classification of Hand Motions: A Comparative Multi Day Analyses between Surface and Intramuscular EMG," Appl. Sci., vol. 8, 1126, 2018.
[4] W. Geng, Y. Du, W. Jin, W. Wei, Y. Hu, and J. Li, "Gesture Recognition by Instantaneous Surface EMG Images," Scientific Reports, vol. 6, p. 36571, 2016.
[5] W. Wei, Y. Wong, Y. Du, Y. Hu, M. Kankanhalli, and W. Geng, "A Multi-stream Convolutional Neural Network for sEMG-based Gesture Recognition in Muscle-Computer Interface," Pattern Recognition Letters, 2017.
[6] Y. Hu, Y. Wong, W. Wei, Y. Du, M. Kankanhalli, and W. Geng, "A Novel Attention-based Hybrid CNN-RNN Architecture for sEMG-based Gesture Recognition," PLoS One, vol. 13, no. 10, 2018.
[7] M. Atzori, A. Gijsberts, I. Kuzborskij, S. Heynen, A.G.M. Hager, O. Deriaz, C. Castellini, H. Müller, and B. Caputo, "A Benchmark Database for Myoelectric Movement Classification," IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2013.
[8] M. Atzori, A. Gijsberts, C. Castellini, B. Caputo, A.G.M. Hager, S. Elsig, G. Giatsidis, F. Bassetto, and H. Müller, "Electromyography Data for Non-invasive Naturally-controlled Robotic Hand Prostheses," Scientific Data, vol. 1, 140053, 2014.
[9] P. Tsinganos, B. Cornelis, J. Cornelis, B. Jansen, and A. Skodras, "Deep Learning in EMG-based Gesture Recognition," PhyCS, pp. 107-114, 2018.
[10] M. Atzori, M. Cognolato, and H. Müller, "Deep Learning with Convolutional Neural Networks Applied to Electromyography Data: A Resource for the Classification of Movements for Prosthetic Hands," Frontiers in Neurorobotics, vol. 10, p. 9, 2016.
[11] P. Tsinganos, B. Cornelis, J. Cornelis, B. Jansen, and A. Skodras, "A Hilbert Curve Based Representation of sEMG Signals for Gesture Recognition," International Conference on Systems, Signals and Image Processing (IWSSIP), pp. 201-206, 2019.
[12] W. Jiang and Z. Yin, "Human Activity Recognition Using Wearable Sensors by Deep Convolutional Neural Networks," ACM International Conference on Multimedia, 2015, pp. 1307-1310.
[13] P. Tsinganos, B. Cornelis, J. Cornelis, B. Jansen, and A. Skodras, "Improved Gesture Recognition Based on sEMG Signals and TCN," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 1169-1173.
[14] E. Rahimian, S. Zabihi, S.F. Atashzar, A. Asif, A. Mohammadi, "sEMG-Based Hand Gesture Recognition via Dilated Convolutional Neural Networks," GlobalSIP, 2019.
[15] W. Wei, Q. Dai, Y. Wong, Y. Hu, M. Kankanhalli, and W. Geng, "Surface Electromyography-based Gesture Recognition by Multi-view Deep Learning," IEEE Transactions on Biomedical Engineering, 2019.
[16] H.I. Fawaz, B. Lucas, G. Forestier, C. Pelletier, D.F. Schmidt, J. Weber, G.I. Webb, L. Idoumghar, P. Muller, F. Petitjean, "InceptionTime: Finding AlexNet for Time Series Classification," arXiv:1909.04939, 2019.
[17] F. Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251-1258.
[18] M. Lin, Q. Chen, and S. Yan, "Network in Network," arXiv preprint arXiv:1312.4400, 2013.
[19] A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A Generative Model for Raw Audio," arXiv preprint arXiv:1609.03499, 2016.
[20] ITU-T Recommendation G.711, "Pulse Code Modulation (PCM) of Voice Frequencies," ITU, 1988.
[21] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going Deeper with Convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.
[22] L. Sifre and S. Mallat, "Rigid-Motion Scattering for Image Classification," Ph.D. dissertation, 2014.
[23] V. Vanhoucke, "Learning Visual Representations at Scale," ICLR invited talk, 2014.
[24] B. Hudgins, P. Parker, and R.N. Scott, "A New Strategy for Multifunction Myoelectric Control," IEEE Transactions on Biomedical Engineering, vol. 40, no. 1, pp. 82-94, 1993.
[25] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[26] S. Ioffe and C. Szegedy, "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," arXiv preprint arXiv:1502.03167, 2015.