IEEE SENSORS JOURNAL, VOL. 21, NO. 2, JANUARY 2021

A Novel Multi-Stage Training Approach for Human Activity Recognition from Multimodal Wearable Sensor Data Using Deep Neural Network

Tanvir Mahmud, Student Member, IEEE, A. Q. M. Sazzad Sayyed, Student Member, IEEE, Shaikh Anowarul Fattah, Senior Member, IEEE, and Sun-Yuan Kung, Life Fellow, IEEE
Abstract—Deep neural networks are an effective choice to automatically recognize human actions using data from various wearable sensors. These networks automate the feature extraction process, relying completely on data. However, various noises in time series data, together with complex inter-modal relationships among sensors, make this process more complicated. In this paper, we propose a novel multi-stage training approach that increases diversity in the feature extraction process to make accurate recognition of actions by combining varieties of features extracted from diverse perspectives. Initially, instead of using a single type of transformation, numerous transformations are applied to the time series data to obtain variegated representations of the features encoded in the raw data. An efficient deep CNN architecture is proposed that can be individually trained to extract features from the different transformed spaces. Later, these CNN feature extractors are merged into an optimal architecture, finely tuned for optimizing the diversified extracted features, through a combined training stage or multiple sequential training stages. This approach offers the opportunity to explore the features encoded in the raw sensor data using multifarious observation windows, with immense scope for efficient selection of features for the final convergence. Extensive experiments have been carried out on three publicly available datasets, providing consistently outstanding performance with average five-fold cross-validation accuracies of 99.29% on the UCI HAR database, 99.02% on the USC HAR database, and 97.21% on the SKODA database, outperforming other state-of-the-art approaches.

Index Terms—Sensor data processing, feature learning, CNN, activity recognition, multi-stage training.
I. INTRODUCTION

Activity recognition using wearable sensors has been a trending topic of research for its widespread applicability in diverse domains, ranging from health care services to military applications [1]. With the ubiquitous availability of modern mobile devices such as smartphones, tablets, and smartwatches, various types of sensor data are available that can be utilized effectively in numerous applications like activity recognition. Various types of sensor data, along with image and video data, have been employed for recognizing human activity [2]. In this work, we mainly focus on time series wearable sensor data, e.g., from accelerometers, gyroscopes, and magnetometers, as these are easy to obtain even with our smart devices and can be used to recognize human activity from a distant position on a real-time basis, since these sensors' data are very small in volume and easy to share through the internet.
T. Mahmud, A. Q. M. S. Sayyed and S. A. Fattah are with the Department of Electrical and Electronic Engineering, BUET, Dhaka 1000, Bangladesh (e-mail: [email protected]; [email protected]). S. Y. Kung is with the Department of Electrical Engineering, Princeton University, USA (e-mail: [email protected]).

A large variety of approaches has been applied to make the correct recognition, ranging from traditional feature-based approaches to end-to-end deep neural networks in recent times. Numerous hand-crafted feature extraction processes with shallow classifiers have been explored in the literature for utilizing multimodal sensor data in activity recognition [3]–[10]. Though these types of handcrafted features perform well in limited training data scenarios, the extraction of effective features gets very complicated with a larger number of sensors. Additionally, the process heavily demands domain expertise for proper selection of features, which becomes harder in the presence of random noises that occur very often in practical conditions.

To automate the complicated feature extraction process, various types of deep neural networks have been studied in the literature to recognize human activity from wearable sensor data [11]–[21]. Most of these approaches directly employ the collected raw sensor data for automated feature extraction using a deep neural network, such as a convolutional neural network (CNN) [11]–[13], recurrent neural network (RNN) [17], [18], long short-term memory (LSTM) network [19], hybrid CNN-LSTM network [20], or a more complicated LSTM-CNN-LSTM based hierarchical network [21]. Most of these networks are very deep in structure and, therefore, a large amount of data is required to train them properly. Moreover, due to random noises and perturbations in multi-modal sensor data from different sources, the process gets more intricate when operating on the raw data directly. Hence, with an increasing number of sensors, while having a small amount of data for some of the activity classes, this problem becomes critical for the automated extraction of features from raw sensor data using a deep network, which severely affects the performance.

In [22]–[25], various approaches have been introduced to represent the time series data in a modified space that makes the feature extraction process easier by reducing the effects of noise or random variations. These transformations of the time series sensor data provide more opportunities to explore the variations of features from different spaces. Though these transformations provide an efficient representation of some of the features in a different space, some other features may become complicated to extract from that particular space. However, different transformations provide diverse viewpoints to explore the feature space of the raw time series data.
Figure 1. Multiple training stages are utilized to incorporate features from numerous transformed representations of the input sensor data. In training stage-1, different feature extractors are individually trained to extract features from different transformed spaces. Features extracted from diverse representation spaces can be weighted, and these weight vectors can be jointly optimized either (a) in a combined additional training stage, or (b) through multiple sequential training stages that learn the weight vectors in a sequential manner.

Hence, similar to these studies, depending solely on a single transformed space for feature exploration limits the scope of feature extraction, which may result in lower performance in many circumstances. If features extracted from different transformed spaces can be incorporated in the final decision-making process, it will provide a more robust opportunity to analyze the information in the raw data. But the challenging task of integrating effective features from diverse transformed spaces through joint optimization, to reach the optimum performance in activity recognition, is yet to be attempted.

In this work, we propose a novel multi-stage training methodology to overcome these limitations by efficiently employing a multitude of time-series transformations that facilitates the exploration of diversified feature spaces. Instead of relying on a single transformed space, features from numerous transformed spaces are integrated together. An efficient deep convolutional neural network architecture is proposed that can be separately tuned and optimized as an efficient feature extractor for each transformed space. These CNN-based automated feature extractors reduce the complexity of manual feature selection from numerous transformed spaces. Afterwards, these separately optimized networks operating on different transformed spaces are combined into a unified architecture utilizing the proposed additional combined training stage or multiple sequential training stages, where features extracted from different transformed spaces are sequentially weighted, optimized, and integrated for the final activity recognition. Hence, different portions of this unified architecture are trained and optimized in several training stages, which makes it possible to optimize with a smaller amount of available training data. Moreover, different types of realistic data augmentation techniques have been introduced to increase the variations of the available data. The proposed approach opens scope for optimization of diversified features extracted from different transformed spaces and makes the process more resilient to noise and other random perturbations by exploiting the advantages provided by numerous representations of the raw data. Results of intensive experiments are presented using three publicly available datasets, showing very satisfactory performance compared to other state-of-the-art approaches.
II. METHODOLOGY
The proposed multi-stage training approach is represented in Fig. 1. In the first stage of training, individual feature extractors operating on different transformed spaces are trained in parallel with separate classifiers. In the literature, varieties of feature extractors for time-series data have been explored, ranging from PCA, ICA, and wavelet-based methods to modern CNN, DNN, LSTM, and numerous other deep learning methods [11], [15]–[17]. To overcome the additional complexities, mainly arising from the difficulty of feature selection and optimization over the different diversified transformed representations of the time series data, we propose deep CNN architectures as feature extractors for the different transformed domains. As it is completely data-driven, a deep CNN architecture can be trained as an efficient feature extractor for any representation of the data. For joint optimization of the multiple transformed feature spaces, the learning of this first training stage is transferred into a unified structure utilizing either an additional combined training stage (Fig. 1a) or a number of sequential training stages (Fig. 1b).

After completing the first stage of training, all the separate classifiers of the individual networks are removed. As a result, when input data are fed to the network, the representational features extracted from the different transformed domains by the trained feature extractors, which were fed into separate classifier units in the first training stage, become available. However, the feature quality can vary with the transformation of the raw data, which is visible when evaluating the performance of the separate feature extractors in the first stage. Hence, in the second and final stage of training (Fig. 1a), these feature vectors are multiplied by separate weighting vectors to increase the selectivity of the system. Later, all these weighted feature vectors are concatenated together, and a common dense classifier unit is trained to provide the exact prediction from these concatenated features. Therefore, these weighting vectors, along with the combined dense classifier unit, are learned in this stage of training, utilizing the data again.
Figure 2. Proposed (a) 1D convolutional neural network and (b) 2D convolutional neural network. The tensor dimensions shown after each operation are optimized for the UCI HAR database [26].

Figure 3. Proposed (a) 1D unit residual block and (b) 2D unit residual block. In (a), 'l' denotes the length and 'c' the number of channels of the tensor, while in (b), 'h' and 'w' denote the height and width of the tensor, respectively. Additionally, 'k' stands for kernel size, 'f' for filter number, 's' for strides, and 'BN' for batch normalization in the convolution.

In Fig. 1b, the proposed multi-stage sequential training is shown. In the two-stage training, as described, the final second-stage training learns the weighting vector for each feature map at the same time as the combined classifier. However, in the multi-stage sequential training, weighting vectors for only two feature vectors, extracted by the feature extractors trained in the previous stage, are learned along with the combined classifier at a time. In the following stage, the classifier is removed, and the merged weighted feature vectors of these two transformations undergo a similar next stage of training with one of the remaining feature vectors representing a different transformation. Thus, in each stage of sequential training, the weighted feature vector from an additional transformed space is accumulated with the combined feature extractors trained in the previous stage. This method of sequential training offers an additional opportunity to converge the individual feature representations corresponding to the variegated transformed spaces towards the final decision-making process by optimizing two feature vectors sequentially. Moreover, in deep learning-based approaches, these weighting vectors applied to separate feature vectors can easily be integrated by introducing a separate densely connected layer operating on each feature vector, accompanied by different weighting vectors.

A. Transformations on Time Series Data
Different types of transformations on the time series data have been utilized in the proposed approach. These are described briefly below.
1) Gramian Angular Field Transformation (GAF):
The Gramian angular field transformation maps a 1D time series into a 2D matrix representation. This encoding scheme preserves the temporal dependency of the original time series along the diagonal of the encoded matrix, while the non-diagonal entries essentially represent the correlation between samples [22]. In this transformation, $G: \mathbb{R}^N \rightarrow \mathbb{R}^{N \times N}$, the input time series, $X$, is transformed into polar coordinates $(r, \phi)$ after normalization:

$$\phi_i = \cos^{-1}(x_i), \quad -1 \le x_i \le 1, \; x_i \in X \tag{1}$$

$$r_i = \frac{t_i}{N}, \quad t_i \in \mathbb{N} \tag{2}$$

Here, $t_i$ is the time stamp and $N$ is a constant factor to regularize the span of the polar coordinate system. These polar angles are utilized to obtain the final transformed matrix $G$, where

$$G_{i,j} = \cos(\phi_i + \phi_j), \quad i, j = 1, 2, \ldots, N \tag{3}$$
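To make the mapping concrete, the following is a minimal NumPy sketch of the GAF encoding of (1)–(3); the min-max normalization step and the function name gaf_encode are illustrative choices, not part of the authors' implementation.

    import numpy as np

    def gaf_encode(x):
        """Encode a 1D time series as a Gramian angular field, Eqs. (1)-(3)."""
        # Normalize the series into [-1, 1] so that arccos is well defined.
        x = np.asarray(x, dtype=float)
        x_norm = 2 * (x - x.min()) / (x.max() - x.min()) - 1
        phi = np.arccos(np.clip(x_norm, -1.0, 1.0))     # polar angles, Eq. (1)
        # G[i, j] = cos(phi_i + phi_j), Eq. (3), computed for all pairs at once.
        return np.cos(phi[:, None] + phi[None, :])

    # Example: a 128-sample sensor window becomes a 128x128 image.
    window = np.sin(np.linspace(0, 4 * np.pi, 128))
    G = gaf_encode(window)
    print(G.shape)  # (128, 128)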
2) Recurrence Plotting:
The recurrence plot portrays the inherent recurrent behavior of a time series, e.g., irregular cyclicity and periodicity, in a 2D matrix [23]. This method provides a way to explore the m-dimensional phase space trajectory of the time series data for generating a 2D representation, by searching for points of trajectories that have returned to a previous state, and is represented by

$$R_{i,j} = \theta(\epsilon - \|s_i - s_j\|), \quad s_{(\cdot)} \in \mathbb{R}^m, \; i, j = 1, 2, \ldots, K \tag{4}$$

where $K$ is the number of considered states $s$, $\epsilon$ is a threshold distance, $\|\cdot\|$ a norm, and $\theta(\cdot)$ is the Heaviside function.
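A minimal sketch of (4), assuming a time-delay embedding with illustrative parameters dim, tau, and eps (the paper does not report its embedding settings):

    import numpy as np

    def recurrence_plot(x, dim=3, tau=1, eps=0.2):
        """Binary recurrence plot of a 1D series via time-delay embedding, Eq. (4)."""
        x = np.asarray(x, dtype=float)
        n_states = len(x) - (dim - 1) * tau            # number of embedded states K
        # Build m-dimensional states s_i = [x_i, x_{i+tau}, ..., x_{i+(m-1)tau}].
        states = np.stack([x[i * tau : i * tau + n_states] for i in range(dim)], axis=1)
        # Pairwise distances thresholded by the Heaviside function theta(eps - d).
        dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
        return (dists <= eps).astype(np.uint8)

    R = recurrence_plot(np.sin(np.linspace(0, 8 * np.pi, 200)))
    print(R.shape)  # (198, 198) for dim=3, tau=1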
3) Scattering Wavelet Transformation:
The scattering wavelet transform offers representational features of the time series data that are translation-invariant while remaining stable to deformations. This technique provides the opportunity to extract features from a very small amount of data [24]. A Morlet wavelet function, defined as the mother wavelet, undergoes a convolution operation with the raw time series data while being scaled and rotated, and thus creates different levels of representational features.
Figure 4. Schematic representation of the proposed multi-stage sequential training scheme. Here, (a) represents the individual training stage, (b) represents the combined training stage where all the pre-trained networks are merged using one additional training stage, and (c), (d), (e) represent the sequential training stages where pre-trained networks are converged sequentially towards the unified architecture. The tensor dimensions are optimized for the UCI HAR dataset [26].
Let $W_j$ and $U_j$ be the averaging operation and the complex modulus of the averaged signal, respectively, for order $j$ $(1, 2, \ldots, L)$ of the scattering coefficients. These coefficients can be described as

$$S_j = W_j U_j = |S_{j-1} \ast \psi_j| \ast \phi_j, \tag{5}$$

where $\phi_j$ represents the Gaussian low-pass filter and $\psi_j$ represents the Morlet wavelet function of order $j$. Therefore, a scattering representation, $S_X$, of the time series data, $X$, is obtained by concatenating the scattering coefficients of the different orders,

$$S_X = [S_0 X, S_1 X, \ldots, S_L X] \tag{6}$$

As multi-channel sensor data collected from numerous sensors have been used in this work, each channel of such time series data is transformed individually using any of these transformations, and all such transformed data are stacked together, maintaining similar time information in all the channels. Later, they undergo the feature extraction process utilizing deep neural networks.
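As an illustration of how such coefficients can be obtained in practice, a sketch using the third-party kymatio library is given below; the paper does not state which implementation it used, so the parameter values J and Q here are assumptions.

    import numpy as np
    from kymatio.numpy import Scattering1D  # pip install kymatio

    T = 128                     # window length
    J, Q = 4, 8                 # assumed scale / wavelets-per-octave settings
    scattering = Scattering1D(J=J, shape=(T,), Q=Q)

    x = np.random.randn(T).astype(np.float64)   # one channel of a sensor window
    Sx = scattering(x)                          # stacked order-0/1/2 coefficients
    print(Sx.shape)                             # (n_coeffs, T / 2**J)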
B. Proposed Deep Neural Network Architectures

For feature extraction and classification, two deep CNN architectures are proposed, as shown in Fig. 2a and Fig. 2b, optimized to operate in the 1D and 2D domains, respectively. Both are very similar to each other, as their shared objective is to extract features for activity recognition, with some modifications to operate in different domains for handling different dimensions of data. In general, the proposed CNN architecture mainly consists of a CNN base part followed by a top classifier layer. The CNN base part involves a number of convolution and pooling operations, while the top classifier layer consists of a series of densely connected layers followed by the final activation layer to generate the activity prediction. The operations performed are discussed below.

i) The input 1D time series data undergo an initial transformation operation, as discussed above, before entering the convolutional filtering of the deep network.

ii) Next, the tensor enters the convolutional base part, where it passes through a series of unit residual block operations to extract deep features from a broad spectrum. Different representations of these unit residual blocks are shown in Fig. 3, with some variations in operations for handling 1D (Fig. 3a) and 2D (Fig. 3b) data. In these blocks, the input tensor passes through two different operations in parallel, and the transformed tensors are added later to produce the final output tensor. Subsequently, a global average pooling operation is performed to extract the global features from each channel of the transformed tensor. This CNN base part extracts the effective temporal/spatial features through convolutional filtering and pooling operations required for the final decision. A minimal sketch of such a residual block is given below.

iii) After that, the tensor propagates through the top classifier block, where a series of densely connected layers explores the extracted features of the CNN base part to obtain a higher level of representation, with a softmax activation layer at the end to merge these representations into a specified class of action.

The values of the different convolutional kernel sizes, the number of convolutional layers in each unit block, and the number of unit residual blocks are established through experimentation to reach the maximum performance. Shallower networks are prone to underfit the training data, while deeper networks are prone to overfit. However, the proposed network effectively utilizes efficient separable convolutions along with residual operations to reduce vanishing gradient and overfitting issues and achieve optimum performance.
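The following Keras sketch illustrates one plausible reading of the 1D unit residual block of Fig. 3a (a separable-convolution path and a shortcut path added together), followed by global average pooling and the dense classifier; the exact kernel sizes, filter counts, and block depth are tuned per dataset by the authors, so the values below are assumptions.

    import tensorflow as tf
    from tensorflow.keras import layers

    def unit_residual_block_1d(x, filters, kernel_size=5, strides=2):
        """1D unit residual block: two parallel paths whose outputs are added."""
        # Main path: separable convolution with batch normalization.
        main = layers.SeparableConv1D(filters, kernel_size, strides=strides,
                                      padding="same")(x)
        main = layers.BatchNormalization()(main)
        main = layers.Activation("relu")(main)
        # Parallel shortcut path: 1x1 convolution to match the output shape.
        short = layers.Conv1D(filters, 1, strides=strides, padding="same")(x)
        short = layers.BatchNormalization()(short)
        return layers.Add()([main, short])

    # Assumed input: a 128-sample window with 9 sensor channels (UCI HAR style).
    inputs = layers.Input(shape=(128, 9))
    x = unit_residual_block_1d(inputs, 32)
    x = unit_residual_block_1d(x, 64)
    x = layers.GlobalAveragePooling1D()(x)           # CNN base output (feature vector)
    x = layers.Dense(64, activation="relu")(x)        # top classifier block
    outputs = layers.Dense(6, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)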
Algorithm 1: Proposed Two-Stage Training Method

Data: training samples X; training labels y_actual
Result: weight matrices D, F
/* Individual training begins */
for i <- 1 to N do
    Calculate X̂_i = T_i(X);
    Randomize D_{l,i} and F_i, for l = 1, ..., L;
    while the training error threshold is unsatisfied do
        Calculate f_i = F_i(X̂_i);
        Find y_{pred,i} = D_{L,i}(D_{L-1,i}(...(D_{1,i}(f_i))));
        Find loss L_i = L(y_{pred,i}, y_actual);
        Update D_{l,i} and F_i, for l = 1, ..., L;
    end
    Calculate d_i = F_i(X̂_i);
end
/* Combined training stage begins */
Randomize D_m, for m = 1, ..., L';
while the training error threshold is unsatisfied do
    for i <- 1 to N do
        Set f_i = D_{1,i}(d_i);
    end
    Set feature mapping group f = [f_1, f_2, ..., f_N];
    Find y_pred = D_{L'}(D_{L'-1}(...(D_1(f))));
    Find loss L = L(y_pred, y_actual);
    Update D_m, for m = 1, ..., L';
end

Here, F_i denotes the CNN base part of the i-th transform, D_{l,n} denotes the l-th densely connected layer of the n-th training stage, and T_i denotes the i-th transformation on the raw data.
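A hedged Keras sketch of the combined stage of Algorithm 1: the individually trained base models are frozen, their classifier heads discarded, and new per-branch weighting dense layers plus a common classifier are trained. The function name and layer sizes are illustrative, and base_models stands for the stage-1 CNN base parts.

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    def build_combined_model(base_models, branch_nodes, n_classes):
        """Stage-2 of Algorithm 1: merge frozen CNN base parts (sketch)."""
        inputs, branches = [], []
        for base, nodes in zip(base_models, branch_nodes):
            base.trainable = False                    # keep stage-1 extractors fixed
            inp = layers.Input(shape=base.input_shape[1:])
            feat = base(inp)                          # d_i = F_i(X̂_i)
            # Per-branch dense layer acting as the learnable weighting vector.
            branches.append(layers.Dense(nodes, activation="relu")(feat))
            inputs.append(inp)
        merged = layers.Concatenate()(branches)       # f = [f_1, ..., f_N]
        x = layers.Dense(128, activation="relu")(merged)
        out = layers.Dense(n_classes, activation="softmax")(x)
        return Model(inputs, out)

    # Usage sketch: per-branch node counts emphasize better-performing spaces
    # (e.g., 96 vs. 32 nodes, as discussed for the UCI configuration later).
    # model = build_combined_model([f_identity, f_gaf, f_rp, f_swt], [96, 32, 48, 64], 6)
    # model.compile(optimizer="adam", loss="categorical_crossentropy")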
Algorithm 2: Proposed Sequential Training Method

Data: training samples X; training labels y_actual
Result: weight matrices D, F
/* Individual training begins */
for i <- 1 to N do
    Calculate X̂_i = T_i(X);
    Randomize D_{l,i} and F_i, for l = 1, ..., L;
    while the training error threshold is unsatisfied do
        Calculate f_i = F_i(X̂_i);
        Find y_{pred,i} = D_{L,i}(D_{L-1,i}(...(D_{1,i}(f_i))));
        Find loss L_i = L(y_{pred,i}, y_actual);
        Update D_{l,i} and F_i, for l = 1, ..., L;
    end
end
/* Sequential training begins */
Initialize F_{merged,1} = F_1;
for n <- 2 to N do
    Set X̂_{merged,n} = [X̂_1, ..., X̂_{n-1}];
    Randomize D_{m,n}, for m = 1, ..., L';
    while the training error threshold is unsatisfied do
        Set f_{1,n} = D_{1,n}(F_{merged,n-1}(X̂_{merged,n}));
        Set f_{2,n} = D_{2,n}(F_n(X̂_n));
        Set feature mapping group f_n = [f_{1,n}, f_{2,n}];
        Find y^n_pred = D_{L',n}(D_{L'-1,n}(...(D_{3,n}(f_n))));
        Find loss L_n = L(y^n_pred, y_actual);
        Update weights of D_{m,n}, for m = 1, ..., L';
    end
    Calculate F̂_{n-1} = D_{1,n} ∘ F_{merged,n-1};
    Calculate F̂_n = D_{2,n} ∘ F_n;
    Set F_{merged,n} = [F̂_{n-1}, F̂_n];
end

Here, F_i denotes the CNN base part for the i-th transform, D_{l,n} denotes the l-th densely connected layer of the n-th training stage, and T_i denotes the i-th transformation on the raw data.

C. Proposed Multi-Stage Training Scheme
In the proposed training method, a number of training stages have been utilized to combine features from different transformed spaces. In Fig. 4, this scheme is represented schematically. The optimization of the individually trained feature extractors can be done in two stages or in a number of sequential stages. Algorithm 1 and Algorithm 2 are executed to implement the two-stage training scheme and the multi-stage sequential training scheme, respectively. The operations performed in the different stages are described below.

i) Individual training stage: This stage is common to both the two-stage and multi-stage training schemes. In this stage, separate CNN base parts with associated dense classifiers are trained individually to prepare each CNN base part as an efficient feature extractor for the respective transformed domain, as shown in Fig. 4a. Here, the identity transform is also used to incorporate features from the unaltered raw data along with the other transformations.
Figure 5. Effect of various types of augmentation on the sample data: (a) raw sample data collected from one accelerometer axis, with (b) scaling, (c) jittering, (d) permutation, (e) magnitude warping, and (f) time warping applied to the raw data.

However, some of these transformations contain more distinctive features related to the final activity recognition than others, which leads to variations in performance after being trained.

ii) Combined training stage:
After the first training stage, each CNN base part provides an effective feature vector from its respective transformed domain. An additional combined training stage is employed to combine all these individually trained feature extractors for the proposed two-stage training scheme, as shown in Fig. 4b. Though these architectures are similar in structure, having been trained with different representations of the transformed time series data, their extracted features will contain diverse characteristics. In this stage, all individual top dense classifier blocks trained in the first stage are removed, while all CNN base parts are used unaltered, as they are finely tuned as efficient feature extractors. Next, a separate densely connected layer is introduced on top of each CNN base part to reduce the extracted spatial/temporal features to a more general representation. These separate densely connected layers act as the weighting vectors for feature selection from the different transformed domains, as introduced in Fig. 1. Here, the number of nodes in these densely connected layers is varied to incorporate more features from the feature extractors that contain more information for the final classification. The information content of the features extracted by an individual CNN base part can be assessed by observing its performance in the first, individual training stage. Following that, the output feature vectors from these densely connected layers are concatenated and pass through a combined dense classifier block. This block, consisting of a series of densely connected layers, explores all the extracted features from the different transformed domains as a whole and merges them into the final prediction with a softmax classifier at the end. In this stage, all the newly introduced densely connected layers are optimized through further training with the data while keeping the CNN base parts unaltered as efficient feature extractors, as shown in Fig. 4b.
iii) Sequential training stages:
In the proposed multi-stage sequential training scheme, the individually trained feature extractors are optimized and converged into a unified architecture through a series of sequential training stages, as shown in Fig. 4c, 4d and 4e. In this approach, two of the CNN base units operating on different transformed spaces are optimized together at a time, by training an individual densely connected layer for each of the base units followed by feature concatenation and a combined dense classifier unit, as shown in Fig. 4c. Later, these combined two feature extractors are considered as an individual unit and further merged with the next CNN base part. Similarly, in the next stage, another set of separate densely connected layers with a combined dense classifier unit is trained, as shown in Fig. 4d and 4e. Therefore, through each training stage, a new CNN base part corresponding to another transformation is combined with the merged feature extractor. Moreover, each such stage merges these base feature extractor units by introducing newly trained densely connected layers, providing the most optimized features as a whole while utilizing all the existing features. As this approach optimizes two architectures at a time and contributes the merged architecture to the next training stage with the separate densely connected layers, discarding the classifier unit, it provides more opportunity to empirically select the number of nodes of the separate dense layers used for feature selection and concatenation. Moreover, the number of parameters to be trained in a single stage is also lower compared to the combined training stage described previously, and thus provides more opportunity to extract more general features combining all these feature extractors, at the expense of an increased number of training stages. Additionally, this sequential training approach is highly scalable and can incorporate a large number of feature spaces. Hence, features from an additional space can easily be integrated into the feature extraction process by utilizing additional training stages with separate feature extractors.
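A sketch of one sequential stage (Fig. 4c-4e), under the same assumptions as the previous snippet: the already merged extractor and one new frozen base part each receive a fresh weighting dense layer, and only the new layers are trained. merged_extractor, new_base, and the node counts are illustrative names and values.

    from tensorflow.keras import layers, Model

    def sequential_merge_stage(merged_extractor, new_base, nodes_merged,
                               nodes_new, n_classes):
        """One stage of Algorithm 2: merge an additional frozen feature space."""
        merged_extractor.trainable = False
        new_base.trainable = False
        inp_a = layers.Input(shape=merged_extractor.input_shape[1:])
        inp_b = layers.Input(shape=new_base.input_shape[1:])
        # Fresh weighting dense layers; e.g., 144 vs. 112 nodes as in Fig. 4c.
        f_a = layers.Dense(nodes_merged, activation="relu")(merged_extractor(inp_a))
        f_b = layers.Dense(nodes_new, activation="relu")(new_base(inp_b))
        merged = layers.Concatenate()([f_a, f_b])
        out = layers.Dense(n_classes, activation="softmax")(
            layers.Dense(128, activation="relu")(merged))
        return Model([inp_a, inp_b], out)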
D. Data Augmentation
As imbalance in the dataset makes the training process complicated for learning the distribution of the minority classes, data augmentation is a viable approach to mitigate such problems. In this work, we have utilized a combination of five techniques that incorporate realistic variations into the data and make the process more robust [27]. All such augmentations are applied to the training data only, leaving the testing data unaltered for a proper evaluation of the proposed methods. Jittering simulates the randomness of additive thermal noise and environmental perturbations in the acquired sensor data, while scaling simulates the effect of changing the sensor's dynamic range.
Table I. Confusion matrix on the UCI HAR dataset [26] for the proposed two-stage training on a test fold (classes: Walk, Upstairs, Downstairs, Sit, Stand, Lay).
Moreover, in the permutation operation, the input time window is divided into several segments, and these segments are randomly permuted to create a modified window, making the trainer robust against changes in the sequence of steps of a particular activity. In magnitude warping, a smoothly varying random noise is multiplied with the original time series signal to warp its magnitude, simulating random multiplicative noises that can be present in a real scenario, while in time warping, the sampling interval is smoothly varied to introduce variations in the time window. In Fig. 5, the individual effects of these augmentations on raw sample data are shown. To increase the diversity of the augmentation process, we have applied all five augmentation techniques sequentially to generate each augmented sample. Hence, in each sample, the effects of all five techniques are present, which provides more realistic random variations in the augmented samples. As there exists an imbalance in the number of samples per class in all three databases used in this study, the proposed augmentation process is applied at a higher rate to the minority classes, generating more augmented samples to balance out the training samples per activity class. Hence, a higher number of synthetic samples is generated for the classes with a smaller number of samples.
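The following NumPy sketch illustrates the five augmentations applied in the order described above; the noise scales, knot counts, and segment counts are illustrative assumptions, not the values used by the authors (who follow [27]).

    import numpy as np

    def smooth_curve(n, rng, n_knots=4, sigma=0.2):
        """Smoothly varying random curve around 1.0, via interpolated knots."""
        knots = rng.normal(1.0, sigma, size=n_knots)
        return np.interp(np.arange(n), np.linspace(0, n - 1, n_knots), knots)

    def augment(x, rng, n_segments=4):
        """Jittering, scaling, permutation, magnitude and time warping in sequence."""
        n = len(x)
        x = x + rng.normal(0, 0.05, size=n)                 # jittering (additive noise)
        x = x * rng.normal(1.0, 0.1)                        # scaling (dynamic range)
        parts = np.array_split(x, n_segments)               # permutation of segments
        x = np.concatenate([parts[i] for i in rng.permutation(n_segments)])
        x = x * smooth_curve(n, rng)                        # magnitude warping
        steps = np.clip(smooth_curve(n, rng), 0.5, 1.5)     # time warping: smoothly
        warped_t = np.cumsum(steps)                         # varying sampling grid
        warped_t = (n - 1) * warped_t / warped_t[-1]
        return np.interp(np.arange(n), warped_t, x)

    rng = np.random.default_rng(0)
    sample = np.sin(np.linspace(0, 4 * np.pi, 128))
    augmented = augment(sample, rng)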
III. RESULTS AND DISCUSSIONS
The three publicly available datasets used for this study are described below. A detailed comparative analysis of the obtained results is discussed later.
A. Description of the Databases
The UCI HAR database [26] contains 6 activities collected from 30 subjects at a 50 Hz sampling rate, using the 3-axis accelerometer, gyroscope, and magnetometer embedded in a smartphone placed on the waist. The USC HAR database [28] contains activities collected from 14 subjects at a 100 Hz sampling rate using a 3-axis accelerometer and gyroscope. The SKODA database [6] contains activities collected from a single subject in a car maintenance scenario using only a 3-axis accelerometer sampled at 98 Hz.
B. Experimentation
A five-fold cross-validation scheme is carried out for the evaluation of the proposed scheme on each database separately. The evaluation metrics obtained from each test fold are averaged to get the final values. All the augmentation techniques were applied to the training data only. The Adam optimizer was employed for optimization, with categorical cross-entropy as the loss function (L). The Keras deep learning library with the Python programming language was used for the implementation of the proposed neural networks. The Wilcoxon rank-sum test is used for statistical analysis of the average accuracy improvement obtained with the proposed scheme. The accuracies of the proposed schemes are statistically analyzed, and the statistical significance level is set to α = 0.01. The null hypothesis is that no significant improvement of average accuracy is achieved using the proposed scheme over the other existing best-performing approaches.
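For reference, a minimal SciPy sketch of such a significance test, using hypothetical per-fold accuracy lists (not the actual measured values):

    from scipy.stats import ranksums

    # Hypothetical per-fold accuracies (proposed vs. best existing approach).
    proposed = [0.991, 0.993, 0.992, 0.994, 0.990]
    baseline = [0.968, 0.965, 0.970, 0.967, 0.966]
    stat, p_value = ranksums(proposed, baseline)
    print(p_value < 0.01)   # reject the null hypothesis if True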
Table II. Confusion matrix on the UCI HAR dataset [26] for the proposed multi-stage training on a test fold (classes: Walk, Upstairs, Downstairs, Sit, Stand, Lay).

Table III. Average cross-validation performance analysis (precision, recall, and IoU score) on the various activities of the UCI HAR dataset [26] for the proposed two-stage and multi-stage training.
Comparison of Average Cross-Validation IoU scores onvarious activities of UCI HAR Database [26] obtained using differenttransformation schemes along with the proposed combined schemes oftwo-stage and multi-stage training. employed for optimization with categorical cross-entropy asloss function ( L ). Keras deep learning library was used withpython programming language for the implementation of theproposed neural networks. The Wilcoxon rank-sum test is usedfor statistical analysis of the average accuracy improvementobtained from the proposed scheme. The accuracies of theproposed schemes are statistically analyzed and the statisticalsignificance level is set to α = 0 . . The null hypothesisis that no significant improvement of average accuracy isachieved using the proposed scheme over the other existingbest performing approaches. EEE SENSORS JOURNAL, VOL. 21, NO. 2, JANUARY 2021 8
C. Performance Evaluation
The performance of the optimized networks is evaluated using the test data of the various datasets. Traditional evaluation metrics for the multi-class classification task, i.e., accuracy, precision, recall, and intersection-over-union (IoU) score, are employed for analyzing the performance. In Tab. I and II, confusion matrices are provided for the proposed two-stage and multi-stage training approaches on the UCI HAR database [26] for a specific test fold. Moreover, in Tab. III, the averaged cross-validation evaluation metrics are provided for both of these training approaches. It is clear that both approaches provide considerable performance in most of these classes, which are separated almost perfectly. However, the two-stage method slightly struggles to separate features between the walking and ascending upstairs activities, as these activities are closely inter-related in the feature space. In the case of multi-stage training, this problem is reduced, which signifies the robust optimization capability of this method, as it can separate features with close proximity.

In Fig. 6, the average cross-validation IoU scores of the networks optimized on the different transformed spaces, along with the final converged networks using both two-stage and multi-stage training, are compared for all the activities. It is visible that the identity transform, representing the unaltered raw data, provides better performance in most classes compared to the other transformed spaces in the case of individual training. However, irrespective of the performance, all the networks operating on separate transformed spaces extract features that are significantly different, as they work with diversified representations of the data. Through optimization of these features, as visible in Fig. 6, the proposed two-stage and multi-stage training approaches provide a sharp increase in IoU scores in all the activity classes compared to the individual training stage. However, the lower-performing transformed spaces are de-emphasized through a smaller number of densely connected nodes and smaller weights generated in the later training stages while merging, as shown in Fig. 4. For example, in the two-stage training configuration (Fig. 4b) for the UCI database, before the concatenation of features extracted from multiple transformed spaces, 96 densely connected nodes are provided following the identity-transformed feature space while 32 nodes are provided following the GAF-transformed space, as the identity-transformed features provided a higher average IoU score compared to the GAF-transformed space in the individual training stage. Hence, more nodes can be allotted to emphasize the individually better-performing feature spaces. Despite that, all of the transformed spaces contribute some new and valuable information that may be indistinguishable even in another space providing significantly better performance. Moreover, the later training stages are mainly dedicated to extracting the most distinguishable features while de-emphasizing the redundant features, and thus provide this higher IoU score.
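For clarity, the per-class IoU score used above is the Jaccard index computed from the confusion-matrix counts; a sketch with scikit-learn follows, where the label vectors are hypothetical examples.

    from sklearn.metrics import jaccard_score, precision_score, recall_score

    y_true = [0, 0, 1, 2, 2, 3, 4, 5, 5]   # hypothetical fold labels
    y_pred = [0, 1, 1, 2, 2, 3, 4, 5, 4]
    iou_per_class = jaccard_score(y_true, y_pred, average=None)
    precision = precision_score(y_true, y_pred, average="macro", zero_division=0)
    recall = recall_score(y_true, y_pred, average="macro", zero_division=0)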
Table IV. Number of trainable and non-trainable parameters in the various training stages for the proposed training schemes on the UCI HAR database [26].

Training Stage | Two-Stage Training (Trainable / Non-Trainable) | Multi-Stage Training (Trainable / Non-Trainable)
Stage-1        | all parameters trainable                       | all parameters trainable
Stage-2        | 82K / 8.2M                                     | 82K / 2.5M
Stage-3        | -                                              | 82K / 5.4M
Stage-4        | -                                              | 82K / 8.3M
Figure 7. Average cross-validation IoU score in the various training stages of the multi-stage sequential training on the different databases.

Moreover, in the multi-stage sequential training, two of the feature spaces are optimized at a time by integrating an additional feature space into the resultant feature space (Fig. 4c-4e). It should be noticed that more nodes are provided in the densely connected layer following the feature space of the respective transformation, to emphasize the features from the space that provided higher performance during the individual training stage. For example, in Fig. 4c, 144 nodes are provided for the identity-transformed feature space while 112 nodes are provided for the scattering wavelet transformed space, as features from the identity-transformed space performed better in the individual training stage. This manipulation of the number of nodes is done iteratively to reach the maximum performance. If the sequence of optimization is altered, the number of nodes applied to the extracted features of the different transformed spaces has to be adjusted accordingly. It is expected that, for any combination, the achieved performance will be similar if the number of nodes for the different transformed spaces is properly adjusted, which ensures the proper exploration of the generated feature spaces. However, in the sequential integration process, we have integrated the individually better-performing feature spaces in the early stages, with more nodes before feature concatenation.

In Tab. IV, the number of trainable and non-trainable parameters in the different training stages is presented for the UCI database. As stage-1 is similar for both methods, training the individual networks, all the parameters are set as trainable to train the different feature extractors.
Table V. Confusion matrix on the USC HAR dataset [28] for the proposed two-stage training on a test fold (classes: Walking Forward, Walking Left, Walking Right, Walking Upstairs, Walking Downstairs, Running, Jumping, Sitting, Standing, Sleeping, In Elevator).

Table VI. Confusion matrix on the USC HAR dataset [28] for the proposed multi-stage training on a test fold (same classes as Table V).

Table VII. Average cross-validation performance analysis (precision, recall, and IoU score) on the various activities of the USC HAR dataset [28] for the two-stage and multi-stage training.
However, the 2D representation of the data requires a larger network compared to the 1D representation, to train with the increased size of the transformed data. In the later training stages, most of the parameters are set as non-trainable to utilize the already-trained deep feature extractors, while some trainable parameters are introduced to merge the different networks. Though the number of parameters increases in the higher training stages, the number of trainable parameters in any single training stage is much smaller, which makes the network resilient against overfitting to the training data. In the multi-stage training, the different feature extractors are merged over multiple stages and thus the non-trainable parameters increase in steps, while in the two-stage training, all four feature extractors are combined in one stage, resulting in a larger number of non-trainable parameters in stage-2.

In Tab. V and VI, confusion matrices for the USC HAR dataset are provided for the two-stage and multi-stage training
Table VIII. Average cross-validation performance analysis (precision, recall, and IoU score) on the various activities of the SKODA dataset [6] for the two-stage and multi-stage training (classes: Null, Write on notepad, Open hood, Close hood, Check gaps on front door, Open left front door, Close left front door, Check trunk gaps, Open and close trunks, Check steering wheel).
Check steering wheel method, respectively. Moreover, in Tab. VII, the average cross-validation performance of the proposed schemes on differentactivity classes of this dataset is provided. It is clear thatboth these approaches provide consistent performance over for most of the classes. However, multi-stage trainingprovides a slight increase in incorrect predictions for someclosely related activities like among various walking actions,between standing and sitting actions. In Tab. VIII, the averagecrosss-validation performance of both the training approachesis presented on the SKODA dataset. Though most of the activ-ities contain close inter-relation in this dataset, our proposedtraining methods provide consistent performance over formost of the classes. However, some activities like openingand closing hood, opening, and closing doors, are difficult toseparate as expected. Despite that, comparable performances
Table IX. Central tendency (mean) and dispersion (standard deviation) measures of the average evaluation metrics (accuracy, average precision, average recall, and average IoU score) across the various cross-validation folds on the different datasets, for the two-stage and multi-stage training. The two-stage scheme attains a mean accuracy of 98.63% on the UCI database [26] and 97.91% on average over the three databases.
Table X. Comparison of the proposed schemes with other existing approaches on the different datasets.

UCI HAR Database [26]:
Work  | Method      | Acc. (%) | P-value
[3]   | MLP         | 86.1     | NA
[12]  | CNN         | 94.2     | NA
[7]   | DTW         | 95.3     | NA
[26]  | SVM         | 96       | NA
[17]  | Deep RNN    | 96.7     | NA
[4]   | SVM         | 97.1     | NA
Prop. | 2-Stage CNN | 98.63    | 3.4e-5
Prop. | M-Stage CNN | 99.29    | 5.1e-5

USC HAR Database [28]:
Work  | Method        | Acc. (%) | P-value
[29]  | MLP, J48      | 89.2     | NA
[8]   | Random Forest | 90.7     | NA
[16]  | CNN           | 93.2     | NA
[10]  | LS-SVM        | 95.6     | NA
[11]  | CNN           | 97       | NA
[17]  | Deep RNN      | 97.8     | NA
Prop. | 2-Stage CNN   | 98.57    | 2.5e-6
Prop. | M-Stage CNN   | 99.02    | 1.3e-6

SKODA Database [6]:
Work  | Method         | Acc. (%) | P-value
[6]   | HMM            | 86       | NA
[9]   | DBN            | 89.4     | NA
[15]  | Deep Conv LSTM | 91.2     | NA
[14]  | CNN            | 91.7     | NA
[30]  | Ensemble LSTM  | 92.4     | NA
[17]  | Deep RNN       | 92.6     | NA
Prop. | 2-Stage CNN    | 96.51    | 4.2e-4
Prop. | M-Stage CNN    | 97.21    | 2.8e-4

In Fig. 7, the average IoU scores in the different stages of the multi-stage training are shown. It is clear that each stage provides some improvement in performance by incorporating new features. In the first two stages, the trained network achieves a significant improvement in the average IoU score, mostly by utilizing the features from the identity transformation and the scattering wavelet transformation with the 1D deep CNN feature extractor. Nevertheless, the features from the other transformations exploited in the later stages still provide a considerable contribution, making the final network more optimized to separate challenging classes and thus attain a higher average IoU score. Hence, by integrating features from four transformed spaces in the proposed sequential training approach, a substantial improvement of the average IoU score is achieved in total compared to operating with the raw sensor data alone. In Tab. IX, the central tendency (mean) and dispersion (standard deviation) measures of the evaluation metrics on the different databases are provided. It should be noticed that the standard deviations of the performance over the various cross-validation folds are trivial in most cases, which signifies the generalizability of the proposed scheme. Moreover, a noticeable reduction of the standard deviation is achieved with the multi-stage training approach over its two-stage counterpart. Additionally, the average performances over all three databases are also reported, which signifies the robustness and consistency of the proposed scheme over numerous databases.

Various existing approaches are compared with the proposed ones in Tab. X on the different datasets. The average accuracies obtained from the proposed two-stage and multi-stage training methods are compared with the reported accuracies of a variety of state-of-the-art approaches. It can be noted that the proposed multi-stage scheme has improved the average accuracy from 86.1% to 99.29% (13.19% improvement) on the UCI database, from 89.2% to 99.02% (9.82% improvement) on the USC database, and from 86% to 97.21% (11.21% improvement) on the SKODA database. The improvement of the multi-stage approach over the two-stage training approach comes from its increased opportunity for optimization through multiple stages. However, the training complexity also increases, as more training stages need to be adjusted. As the p-values obtained from the statistical significance test on the different databases are considerably smaller than the predefined threshold of 0.01, we have to reject the null hypothesis, which suggests that a considerable improvement of average accuracy is achieved using the proposed schemes over the other existing approaches. Moreover, the following observations can be drawn from the analysis:

• On the UCI database, the shallow machine learning approaches ([4], [26]) provide comparatively better performance than the traditional deep learning-based approaches ([11], [12]) due to the smaller amount of available training data. It should be noticed that the proposed scheme exploits the available data by incorporating diverse representations of the feature space through multiple training stages, which extract the effective features without the overfitting issues that are predominant in other traditional deep learning approaches.
Hence, despite the smaller amount of available data, the proposed multi-stage deep CNN-based approach outperforms traditional shallow classifiers on the UCI database.

• Deep learning-based approaches mostly dominate on the USC database due to its higher number of training samples. Nevertheless, due to the larger number of activity classes, there exists additional complexity in the feature extraction process. The traditional deep learning-based methods mostly struggle in challenging cases, as these complicated deep network-based approaches operate solely on the raw sensor data. On the contrary, the proposed scheme employs deep feature extractors efficiently on numerous transformed spaces and splits the training
EEE SENSORS JOURNAL, VOL. 21, NO. 2, JANUARY 2021 11 cess into several stages that provides a better selection offeatures along with higher resilience over random noisesand perturbations. • The SKODA database is more complicated due to closeinter-relation of many activities along with a large numberof activity classes that result in smaller performance inmost-other approaches. However, the proposed schemeexplores a number of representational feature spacesinstead of a single space that not only increases thediversity of features but also assists better separationof inter-related activities. Hence, the proposed schemeprovides a sharp improvement of average accuracy (morethan . in multi-stage over the other best approach) inthis database.Though we have incorporated features from four trans-formed spaces (including identity transform) in this work, itis to be noted that the proposed sequential training scheme isadaptive and features from newly transformed spaces can beeasily integrated with the resultant feature space by includingadditional training stages. However, to incorporate effectivefeatures from new representational space, separate CNN-basedfeature extractors need to be incorporated which will increasethe total size of the network accordingly. But, in the traditionaltraining approach, if the whole system of the network istrained in a single training stage, it will be very complicatedto achieve convergence and the network will be highly proneto overfit with the training data that will limit the integrationof numerous transformations. Whereas, the proposed trainingscheme separately optimizes individual deep feature extractorsand integrates the extracted feature spaces in a sequentialmanner that makes it possible to exploit a large number offeature spaces which provides a significant advantage overthe traditional approaches. However, if a large number oftransformed spaces are integrated into the feature extractionprocess, the increased size of the network may limit itsapplication in mobile devices. Nevertheless, it is shown thatvery satisfactory performance is achieved by incorporating afewer number of transformed spaces only. Hence, to reducethe complexity of the network for mobile applications, afewer number of transformed spaces can be integrated intothe feature extraction process while for achieving more robustperformance, a large number of transformed feature spaces canbe utilized. IV. C ONCLUSION
In this paper, various types of human activities are recognized utilizing the proposed multi-stage training method. First, the raw data undergo numerous transformations to interpret the information encoded in the raw data in different spaces and thus obtain a diversified representation of the features. Afterwards, a separate deep CNN architecture is trained on each space to become an optimized feature extractor for that particular space for the final prediction of activity. Later, these tuned feature extractors are effectively merged into a final deep network, either through a combined training stage or through sequential stages of training, by exploring the extracted feature spaces exhaustively to attain the most robust and accurate feature representation. It is found that, instead of utilizing a trained CNN as a feature extractor from a single space, if multiple trained CNNs dealing with numerous transformed spaces are utilized together, a much better representation of the features can be obtained. Such an idea of multiple training stages, utilizing the initially trained CNN models from the preceding stages operating on different transformed spaces, offers a significant increase in performance in the average IoU scores. This method outperforms other state-of-the-art approaches on the different datasets by a considerable margin, with an 11.49% average accuracy improvement over the three databases. Therefore, the proposed scheme opens up a new approach of employing multiple training stages for deep CNNs deploying various transformed representations of data, which can also be utilized in very diversified applications by increasing the diversity of the extracted features.

REFERENCES

[1] N. Islam, Y. Faheem, I. U. Din, M. Talha, M. Guizani, and M. Khalil, "A blockchain-based fog computing framework for activity recognition as an application to e-healthcare services,"
Future Generation Computer Systems, vol. 100, pp. 569–578, 2019.
[2] A. Jalal, Y.-H. Kim, Y.-J. Kim, S. Kamal, and D. Kim, "Robust human activity recognition from depth video using spatiotemporal multi-fused features," Pattern Recognition, vol. 61, pp. 295–308, 2017.
[3] R.-A. Voicu, C. Dobre, L. Bajenaru, and R.-I. Ciobanu, "Human physical activity recognition using smartphone sensors," Sensors, vol. 19, no. 3, p. 458, 2019.
[4] A. Jain and V. Kanhangad, "Human activity classification in smartphones using accelerometer and gyroscope sensors," IEEE Sensors Journal, vol. 18, no. 3, pp. 1169–1177, 2017.
[5] R. C. Kumar, S. S. Bharadwaj, B. Sumukha, and K. George, "Human activity recognition in cognitive environments using sequential ELM." IEEE, 2016, pp. 1–6.
[6] P. Zappi, C. Lombriser, T. Stiefmeier, E. Farella, D. Roggen, L. Benini, and G. Tröster, "Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection," in European Conference on Wireless Sensor Networks. Springer, 2008, pp. 17–33.
[7] S. Seto, W. Zhang, and Y. Zhou, "Multivariate time series classification using dynamic time warping template selection for human activity recognition." IEEE, 2015, pp. 1399–1406.
[8] P. Vaka, F. Shen, M. Chandrashekar, and Y. Lee, "PEMAR: A pervasive middleware for activity recognition with smart phones." IEEE, 2015, pp. 409–414.
[9] M. A. Alsheikh, A. Selim, D. Niyato, L. Doyle, S. Lin, and H.-P. Tan, "Deep activity recognition models with triaxial accelerometers," in Workshops at the Thirtieth AAAI Conference on Artificial Intelligence, 2016.
[10] Y. Zheng, "Human activity recognition based on the hierarchical feature selection and classification framework," Journal of Electrical and Computer Engineering, vol. 2015, 2015.
[11] W. Jiang and Z. Yin, "Human activity recognition using wearable sensors by deep convolutional neural networks," in Proceedings of the 23rd ACM International Conference on Multimedia, 2015, pp. 1307–1310.
[12] V. Bianchi, M. Bassoli, G. Lombardo, P. Fornacciari, M. Mordonini, and I. De Munari, "IoT wearable sensor and deep learning: An integrated approach for personalized human activity recognition in a smart home environment," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8553–8562, 2019.
[13] B. Zhou, J. Yang, and Q. Li, "Smartphone-based activity recognition for indoor localization using a convolutional neural network," Sensors, vol. 19, no. 3, p. 621, 2019.
[14] D. Ravi, C. Wong, B. Lo, and G.-Z. Yang, "Deep learning for human activity recognition: A resource efficient implementation on low-power devices." IEEE, 2016, pp. 71–76.
[15] F. J. Ordóñez and D. Roggen, "Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition," Sensors, vol. 16, no. 1, p. 115, 2016.
[16] Y. Chen and Y. Xue, "A deep learning approach to human activity recognition based on single accelerometer." IEEE, 2015, pp. 1488–1492.
[17] A. Murad and J.-Y. Pyun, "Deep recurrent neural networks for human activity recognition," Sensors, vol. 17, no. 11, p. 2556, 2017.
[18] A. Gumaei, M. M. Hassan, A. Alelaiwi, and H. Alsalman, "A hybrid deep learning model for human activity recognition using multimodal body sensing data," IEEE Access, vol. 7, pp. 99152–99160, 2019.
[19] S. Chung, J. Lim, K. J. Noh, G. Kim, and H. Jeong, "Sensor data acquisition and multimodal sensor fusion for human activity recognition using deep learning," Sensors, vol. 19, no. 7, p. 1716, 2019.
[20] M. Lv, W. Xu, and T. Chen, "A hybrid deep convolutional and recurrent neural network for complex activity recognition using multimodal sensors," Neurocomputing, vol. 362, pp. 33–40, 2019.
[21] H. Yu, G. Pan, M. Pan, C. Li, W. Jia, L. Zhang, and M. Sun, "A hierarchical deep fusion framework for egocentric activity recognition using a wearable hybrid sensor system," Sensors, vol. 19, no. 3, p. 546, 2019.
[22] Z. Wang and T. Oates, "Encoding time series as images for visual inspection and classification using tiled convolutional neural networks," in Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[23] N. Hatami, Y. Gavet, and J. Debayle, "Classification of time-series images using deep convolutional neural networks," in Tenth International Conference on Machine Vision (ICMV 2017), vol. 10696. International Society for Optics and Photonics, 2018, p. 106960Y.
[24] S. Mallat, "Group invariant scattering," Communications on Pure and Applied Mathematics, vol. 65, no. 10, pp. 1331–1398, 2012.
[25] W. Lu, F. Fan, J. Chu, P. Jing, and S. Yuting, "Wearable computing for internet of things: A discriminant approach for human activity recognition," IEEE Internet of Things Journal, vol. 6, no. 2, pp. 2749–2759, 2018.
[26] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, "A public domain dataset for human activity recognition using smartphones," in ESANN, 2013.
[27] T. T. Um, F. M. Pfister, D. Pichler, S. Endo, M. Lang, S. Hirche, U. Fietzek, and D. Kulić, "Data augmentation of wearable sensor data for Parkinson's disease monitoring using convolutional neural networks," in Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017, pp. 216–220.
[28] M. Zhang and A. A. Sawchuk, "USC-HAD: a daily activity dataset for ubiquitous activity recognition using wearable sensors," in Proceedings of the 2012 ACM Conference on Ubiquitous Computing, 2012, pp. 1036–1043.
[29] C. Catal, S. Tufekci, E. Pirmit, and G. Kocabag, "On the use of ensemble of classifiers for accelerometer-based activity recognition," Applied Soft Computing, vol. 37, pp. 1018–1022, 2015.
[30] Y. Guan and T. Plötz, "Ensembles of deep LSTM learners for activity recognition using wearables," Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, vol. 1, no. 2, pp. 1–28, 2017.
Tanvir Mahmud (S'18) received the B.Sc. degree from the EEE Department, Bangladesh University of Engineering and Technology, where he is currently pursuing the master's degree. He is currently serving as a Lecturer with the EEE Department of Bangladesh University of Engineering and Technology. His research interests lie in VLSI circuit design, computer vision, approximate computing, biomedical signal processing, image processing, and machine learning.

A. Q. M. Sazzad Sayyed (S'18) received the B.Sc. degree from the EEE Department, Bangladesh University of Engineering and Technology, where he is currently pursuing the master's degree. He is currently serving as a Lecturer with the EEE Department of Southeast University, Dhaka. His research interests lie in computer vision, image processing, evolutionary computation, and machine learning.

Shaikh Anowarul Fattah (S'02–M'09–SM'16) received the B.Sc. and M.Sc. degrees from the Bangladesh University of Engineering and Technology (BUET), Bangladesh, and the Ph.D. degree in ECE from Concordia University, Canada. He held a visiting postdoctoral position and later was a visiting Research Associate with Princeton University, Princeton, NJ, USA. He has been serving as a Professor with the Department of EEE, BUET. He has published over 200 international journal articles and conference papers, with some best paper awards. His major research interests include biomedical engineering and signal processing. He is a Fellow of IEB. He regularly delivers keynote/invited/visiting talks in many countries. He received several prestigious awards, such as Concordia University's Distinguished Doctoral Dissertation Prize in ENS, the Dr. Rashid Gold Medal (M.Sc., BUET), the NSERC Postdoctoral Fellowship, the URSI Canadian Young Scientist Award 2007, and the BAS-TWAS Young Scientists Prize 2014. He is the General Chair of IEEE R10 HTC2017 and ICAICT 2020, and the TPC Chair of IEEE TENSYMP 2020, IEEE WIECON-ECE 2016, 2017, MediTec 2016, IEEE ICIVPR 2017, and ICAEE 2017. He is a Committee Member of IEEE PES (LRPC), IEEE SSIT (SDHTC), IEEE HAC (2018–2020), and R10. He was the Chair of the IEEE Bangladesh Section (2015–2016) and the Chair of the IEEE EMBS Bangladesh Chapter (2017–2019). He is the Founder Chair of the IEEE RAS and SSIT Bangladesh Chapters. He was an Editor of the Journal of Electrical Engineering of IEB. He is an Editor of the IEEE PES eNews, an Editorial Board Member of IEEE ACCESS, and an Associate Editor of CSSP (Springer).