B-HAR: an open-source baseline framework for in depth study of human activity recognition datasets and workflows
Florenc Demrozi, Member, IEEE, Cristian Turetta, and Graziano Pravadelli, Senior Member, IEEE
Abstract—Human Activity Recognition (HAR), based on machine and deep learning algorithms, is considered one of the most promising technologies to monitor professional and daily life activities for different categories of people (e.g., athletes, elderly, kids, employers) in order to provide a variety of services related, for example, to well-being, empowerment of technical performances, prevention of risky situations, and educational purposes. However, the analysis of the effectiveness and the efficiency of HAR methodologies suffers from the lack of a standard workflow, which might represent the baseline for the estimation of the quality of the developed pattern recognition models. This makes the comparison among different approaches a challenging task. In addition, researchers can make mistakes that, when not detected, definitely affect the achieved results. To mitigate such issues, this paper proposes an open-source, automatic, and highly configurable framework, named B-HAR, for the definition, standardization, and development of a baseline framework to evaluate and compare HAR methodologies. It implements the most popular data processing methods for data preparation and the most commonly used machine and deep learning pattern recognition models.
Index Terms—Human Activity Recognition, Inertial sensors, Machine learning, Deep learning, Open-source library
I. INTRODUCTION
In the last decade, with the advent of the Internet of Things (IoT), embedded sensors have been integrated into personal devices such as smartphones and smartwatches, but also into clothes and objects of daily life. This has pushed the definition of research directions and the development of smart applications for Human Activity Recognition (HAR). Meanwhile, HAR has become a popular topic due to its importance in many areas, including health care, well-being, interactive gaming, sports, and monitoring of activities of daily life (ADLs) in general, in both controlled and uncontrolled settings [1]–[3].

Manuscript received Month Day, 2021; revised Month Day, 2021; accepted Month Day, 2021. Date of publication Month Day, 2021; date of current version Month Day, 2021. The associate editor coordinating the review of this article and approving it for publication was Dr/Prof. XXXX XXXXX. This research has been partially supported by the FSE Projects n. 1695-0013-1463-2019 (Smart-PUMP) and n. 1695-0007-1463-2019 (Biofeedback) funded by Veneto Region. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. (Corresponding author: Florenc Demrozi.) F. Demrozi, C. Turetta and G. Pravadelli are with the Computer Science Department, University of Verona, Italy (e-mail: [email protected]).

Despite its increasing importance, research on HAR encounters multiple difficulties. Among them, the most critical are: i) the variety of daily activities to be recognized, ii) the intra-subject and inter-subject movement variability, iii) the trade-off between performance and privacy, iv) the need for computational efficiency and the actual availability of the embedded and portable devices where HAR algorithms should run, and v) the necessity of learning phases based on time-consuming data annotation processes [4], [5].

Concerning the data elaborated by HAR algorithms, they are typically collected from: i) cameras; ii) ambient sensors (e.g., temperature, humidity, brightness, seismic); and iii) wearable embedded sensors (e.g., inertial or physiological sensors) [6]–[8]. Cameras are broadly used in HAR; nevertheless, collecting video data presents privacy issues and, in addition, their elaboration requires high computational resources. For these reasons, many researchers have started to work extensively with other ambient and embedded sensors [9], [10]. In particular, accelerometer sensors have shown excellent results in HAR, and their use in combination with other sensors is rising rapidly. The diffusion of accelerometer sensors is strongly related to their capacity to map human body movement directly [3], [11] and also the human psychological state (e.g., pain level [12]).
Besides, accelerometer sensors present affordable costs, and they can be integrated into most of the wearable objects people own.

Concerning the algorithmic design, HAR has seen an increasing interest in Deep Learning (DL) methods, mainly due to their capability to work with raw data that does not require explicit data handling techniques such as normalization, filtering, or feature extraction [13]. However, since DL models produce high-accuracy results on large HAR datasets, smaller datasets with lower dimensionality require the application of different processing techniques [3].

Figure 1 presents the standard HAR workflow, which generally consists of: 1) the identification of data sources, i.e., the sensors and devices used to collect data; 2) data collection and pre-processing; 3) model selection and training; and 4) the evaluation of the achieved results. Despite this general workflow, the data pre-processing and model selection phases include a high number of non-standardized steps, which represent the most critical aspects for the quality of the HAR approach. In particular, data pre-processing includes different elaborations, such as normalization, noise removal, balancing, and feature extraction. Their application profoundly impacts the recognition model selection and the recognition quality. Indeed, no well-defined workflow that identifies how these data processing steps should be applied exists, nor are there precise rules for model selection and training.

Fig. 1. Overview of the HAR workflow: device identification (environmental, inertial, physiological, and integrated/smart sensors), data collection and pre-processing (noise removal, signal filtering, normalization, segmentation, time- and frequency-domain feature extraction), model selection and training (classic machine learning, deep learning), and model evaluation (average accuracy, sensitivity, specificity, precision, ROC, AUC, used testing datasets).

In addition, the testing phase is still an underestimated process that is not clearly described in most of the existing methodologies. The lack of a standard in the application of the workflow reported in Fig. 1 makes it very difficult to analyze, compare, and evaluate the quality of different HAR approaches, even when adopting the same dataset. Indeed, many datasets are available in the literature [14]–[23], which represent human activities perceived through inertial sensors, such as accelerometers, gyroscopes, and magnetometers, or physiological sensors, like electromyography (EMG), electrocardiogram (ECG), or electroencephalogram (EEG). However, to the best of our knowledge, at the state of the art there is no work (article or tool) that can be used as a baseline for fairly comparing different HAR methodologies.

To fill this gap, this paper provides the scientific community with Baseline-HAR (B-HAR), a Python framework that, starting from the target input dataset, i.e., time series of data perceived by inertial sensors, physiological sensors, and/or other types of data, provides the user with: i) the possibility of easily configuring the different phases of data pre-processing and training, thus avoiding errors; and ii) efficiency indicators of the most used classification models applied to the target dataset.

In particular, the main characteristics of B-HAR are:
• it unifies the general workflow of Fig. 1 into a single framework;
• it defines precise data pre-processing steps, which minimize possible user errors;
• it provides the user with baseline information about the most famous HAR datasets [14], [16], [17], [21], [22], [24], such that new approaches can be exhaustively compared with respect to existing results;
• given a new HAR dataset, it automatically applies the most used pattern recognition models to it, providing the user with baseline results.

The rest of the paper is organized as follows. Section II introduces some preliminary concepts. B-HAR and the underpinning methodology are described in Section III. An exhaustive experimental campaign is then presented in Section IV. Finally, Section V concludes the paper with final remarks.

II. PRELIMINARIES
This section offers an overview of basic pattern recognition techniques exploiting machine and deep learning, and of the data pre-processing techniques (e.g., noise removal, data balancing, segmentation, or feature extraction) that have been considered in B-HAR.
A. Machine learning overview
Classical Machine Learning (CML) algorithms are categorized into two major groups: unsupervised learning (aka clustering) and supervised learning. Unsupervised learning aims to identify existing patterns in input data without any information regarding the output [25]. Instead, supervised learning aims to construct a mathematical model that exemplifies the relationship between input and output training data. Besides, one or more pre-processing steps can be necessary, including feature extraction, segmentation, normalization, signal noise removal, data balancing, or feature projection [26]. The resulting model is applied to predict the outcome of unseen data samples. The most common CML algorithms are Naïve Bayes (NB), k-means clustering, Support Vector Machine (SVM), Random Forests (RF), Decision Trees (DT), and k-Nearest Neighbours (k-NN).
In the last decade, Deep Learning (DL) algorithms have also become popular in many domains. Starting from raw input data, DL techniques automatically identify optimal features that otherwise would remain unknown without any human knowledge [27]. Nevertheless, DL models present some limitations [28]: data interpretation is not easy (black box model), and DL requires large datasets for training and high-performance computational devices for execution. The most common DL models are Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory Networks (LSTMs) [29].
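As a concrete illustration of the supervised setting described above, the following minimal sketch (not part of B-HAR) classifies a new sample with a hand-written k-Nearest Neighbours rule; the toy data and the function name are invented for the example:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=1):
    """Classify a sample by majority vote among its k nearest training samples."""
    dist = np.linalg.norm(X_train - x, axis=1)         # Euclidean distances
    nearest = np.argsort(dist)[:k]                     # indices of the k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # majority label

# Toy accelerometer-magnitude features: activity "still" (0) vs "walk" (1)
X = np.array([[0.1], [0.2], [1.9], [2.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.15]), k=3))  # closer to the "still" cluster
```

The same majority-vote rule underlies the kNN and wkNN models that B-HAR trains later in the workflow.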
B. Data pre-processing
Besides the pattern recognition models, an essential step in the HAR workflow is the pre-processing of raw data. This is necessary, since raw data perceived by sensors are generally affected by different issues [30], such as hardware and environmental noise, errors related to null and/or missing values, and user mistakes made during the collection phase. In addition, the data can present an unbalanced distribution of classes (unbalanced datasets), or require the extraction of specific features before being elaborated. The data pre-processing step aims to reduce the effect of such issues and to prepare the data for the model training phase.

Fig. 2. Overview of the B-HAR structure: input (dataset, configuration file); data cleaning (error handling: mean/forward/backward/constant fill, interpolation; noise removal: low-pass, high-pass, band-pass, band-stop filters); data representation (raw data, segmentation, or time-/frequency-/spectral-domain feature extraction); pre-processing (drop of unnecessary data by classes or patients; train/test split (inter, intra); normalization (robust, standard, min-max); feature selection (variance, recursive feature elimination, L1-based, tree-based); balancing (random undersampling, Near Miss, Edited Nearest Neighbors; random oversampling, SMOTE, k-means SMOTE, ADASYN)); model training and testing via grid search (CNN, SVM, kNN, wkNN, LDA, QDA, RF, DT; only one of the ML/DL branches is executed); performance measures (confusion matrix, specificity, sensitivity, precision, F1-score, accuracy). Red bullets mark computationally expensive blocks; grey blocks are optional.

Concerning the hardware noise, most of the existing techniques in the literature make use of digital filters, such as low-pass, high-pass, and band-pass filters [31]–[33], or of statistical filters, such as the Kalman filter [34]. Furthermore, the unbalanced dataset issue is usually handled, through specific techniques, by reducing/increasing the number of samples in the most/less populated class [35]. Finally, since raw data do not provide interpretable characteristics, feature extraction approaches are adopted to derive interpretable characteristics in the time and frequency domains. However, no existing standard defines the correct order in which the previous pre-processing steps should be applied.

III. METHODOLOGY
B-HAR comprises six dedicated modules: one input module, one output module, and four computation modules including various sub-modules. Its structure is shown in Figure 2. Red bullets identify computationally heavy and time-consuming operations. The green bullet refers to a computation module that executes only one of its sub-modules. Grey sub-modules identify optional computations that B-HAR users may use or not. Internal sub-modules are executed in the order indicated by the dashed lines. Details on the structure are reported in the following sections.
A. Input module
B-HAR takes in input two files containing, respectively: i) the dataset, i.e., the values of the signals perceived by sensors during human activities, enriched by information concerning the testing subject identity, the data collection session, and the activity performed at a specific timestamp; and ii) a configuration file that defines the library workflow.
1) Dataset format:
Since HAR datasets present different structures and information types, B-HAR can flexibly handle different dataset formats. Fig. 3 shows the generic, customizable structure of the input dataset that can be provided to B-HAR; ✓ identifies mandatory data, while [✓] identifies optional data. B-HAR requires a dataset composed of signals from one or more sensors (Sensor_i^xyz), a label (A_ID) indicating the human activity performed by the testing subject, and his/her identifier (T_ID). B-HAR can handle datasets including data from different types of sensors (e.g., accelerometers, gyroscopes, magnetometers, or other physiological sensors) that can be related to different activities, each one identified by a different session identifier (S_ID). Besides, if time information lacks, B-HAR users have to indicate, in the configuration file, the discrete sampling frequency adopted by the sensors.

Fig. 3. B-HAR dataset structure: Time [✓] | Sensor_1^xyz ✓ | ... | Sensor_n^xyz ✓ (n ≥ 1) | A_ID ✓ | T_ID ✓ | S_ID [✓], where ✓ means the field is needed and [✓] means it is optional.
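To make the expected layout concrete, the following sketch loads a small CSV in the Fig. 3 format with pandas; the column names (Acc_x, A_ID, T_ID, S_ID, ...) are hypothetical placeholders, since B-HAR lets the user declare the actual header:

```python
import io
import pandas as pd

# Hypothetical CSV following the Fig. 3 layout: per-axis sensor columns,
# an activity label (A_ID), a subject identifier (T_ID), and an optional
# session identifier (S_ID). Time is optional if the sampling frequency
# is declared in the configuration file instead.
csv = io.StringIO(
    "Time,Acc_x,Acc_y,Acc_z,A_ID,T_ID,S_ID\n"
    "0.00,0.01,-0.02,0.98,walking,subj01,s1\n"
    "0.04,0.03,-0.01,0.97,walking,subj01,s1\n"
)
df = pd.read_csv(csv)
print(df.shape)  # two samples, seven columns
```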
2) Configuration file:
The B-HAR configuration file provides the user with total control over the computation modules and their sub-modules. Through such a file, B-HAR users can easily modify the data processing workflow and test different configuration pipelines. Table I shows the implemented configuration parameters (Column 2) alongside their description (Column 3) and possible values (Column 4).
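As an illustration, a configuration might look as follows; the parameter names come from Table I, but the concrete file syntax shown here (an INI-style layout) is an assumption and may differ from the released tool:

```ini
; Hypothetical B-HAR configuration sketch using parameter names from Table I.
path = ./datasets/my_har_dataset
separator = ,
has_header = true
header_type = tdc
time_window = 2
sampling_frequency = 25
overlap = 1
data_treatment = segmentation
use_ml = kNN, SVM, RF
use_dl = CNN
normalization_method = robust
split_method = inter
filter = lowpass
filter_order = 4
cut = 20
test_size = 0.25
sub_method = interpolate
data_balancing_method = smote
```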
B. Data cleaning
The quality of the collected data depends mainly on two factors: i) environment and hardware noise, and ii) the data collection technology and architecture, which often leads to the loss or corruption of data during its transmission. Missing or corrupted data must be examined and integrated to maintain the structure and the information of the time series. Instead, environmental and hardware noise is handled through a noise removal phase. The data cleaning phase is devoted to handling such issues.
1) Handling of data errors:
Missing and inconsistent data, such as NaN or Inf, represent one of the main obstacles to an accurate analysis of observed data. B-HAR provides five different methods to handle missing data:
• bfill: backward fills the missing values in the time series;
• ffill: forward fills the missing values in the time series;
• interpolate: performs linear interpolation for missing data based on previous and successive data points;
• constant(k): substitutes missing values with a constant value k;
• mean: substitutes missing data with the arithmetic mean of the previous and successive observations.
B-HAR returns an error if the input dataset has more than 5% of non-adjacent missing values [36].
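The five strategies map naturally onto standard pandas operations; the sketch below only illustrates their behavior and is not B-HAR's internal code:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

print(s.bfill().tolist())        # backward fill: [1.0, 3.0, 3.0, 5.0, 5.0]
print(s.ffill().tolist())        # forward fill:  [1.0, 1.0, 3.0, 3.0, 5.0]
print(s.interpolate().tolist())  # linear:        [1.0, 2.0, 3.0, 4.0, 5.0]
print(s.fillna(0.0).tolist())    # constant(k=0): [1.0, 0.0, 3.0, 0.0, 5.0]
# For single-sample gaps, the mean of the neighbouring observations
# coincides with the linear interpolation above.
```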
2) Noise removal:
Time series noise removal is an essential and indisputable step in HAR signal processing. B-HAR provides four different types of filters [37]:
• low-pass filters pass signals with a frequency lower than a selected cutoff and attenuate the rest;
• high-pass filters pass signals with a frequency higher than a selected cutoff and attenuate the rest;
• band-pass filters pass signals within a specific frequency range and attenuate frequencies outside it;
• band-stop filters pass most frequencies unaltered, but attenuate those in a specific range.
(Here, 5% refers to the samples in a single time series column.)
B-HAR users can also choose not to apply any noise removal technique, relying entirely on the pattern recognition models.
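As an illustration of such filtering (not B-HAR's internal code), the following sketch applies a 4th-order low-pass Butterworth filter with SciPy to attenuate a high-frequency component superimposed on a slow "movement" signal; the cutoff and the synthetic signal are invented for the example:

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 25.0                         # sampling frequency (Hz)
t = np.arange(0, 4, 1 / fs)
# 1 Hz "movement" component plus a 10 Hz "noise" component
x = np.sin(2 * np.pi * 1 * t) + 0.5 * np.sin(2 * np.pi * 10 * t)

b, a = butter(N=4, Wn=3.0, btype="lowpass", fs=fs)  # 4th order, 3 Hz cutoff
y = filtfilt(b, a, x)             # zero-phase (forward-backward) filtering

# The filtered signal is much closer to the clean 1 Hz component.
print(np.std(x - np.sin(2 * np.pi * t)) > np.std(y - np.sin(2 * np.pi * t)))
```

Zero-phase filtering (`filtfilt`) is a common choice for offline HAR pipelines because it does not shift the signal in time.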
C. Data representation
In general, the data collected for HAR models can be used in different forms:
• raw data, with no transformation;
• after feature extraction techniques have been applied;
• after segmentation, which groups the data into time windows.
Since single data observations in a time series are not statistically informative, segmentation is generally applied. Moreover, HAR models reason on ongoing activities over time, and not on single observations. To this aim, segmentation and feature extraction have become state-of-the-art procedures for HAR. B-HAR provides the user with the possibility of selecting one of the aforementioned data treatment procedures.
1) Raw data:
No data treatment is carried out by B-HAR in this case. The pre-processing module takes in input the original dataset. This representation becomes useful when the user has already prepared the data offline and he/she wants to use B-HAR just to study different pattern recognition models.
2) Segmentation:
Given an n × 1 time series HAR_D, sampled at a fixed frequency S_f (expressed in Hertz), and a time-window segment of dimension T_w (expressed in seconds), the basic segmentation process implemented in B-HAR returns a dataset SHAR_D of dimension (n/(T_w · S_f)) × (T_w · S_f). For example, given a 100 × 1 time series HAR_D, a sampling frequency S_f = 25 Hz, and a time window T_w = 2 seconds, the segmentation process returns a dataset SHAR_D of dimension (100/(25 · 2)) × (2 · 25), i.e., 2 × 50.
However, many methodologies maintain an overlapping fragment between two consecutive segments. This part provides the model with information on the preceding context. B-HAR can also handle this overlapping, if specified by the user.
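The segmentation rule above can be sketched in a few lines of NumPy; the function below is an illustration (not B-HAR's implementation) and reproduces the 2 × 50 example, plus the overlapping variant:

```python
import numpy as np

def segment(x, fs, tw, overlap=0.0):
    """Split a 1-D time series into windows of tw seconds sampled at fs Hz,
    keeping an overlap (in seconds) between consecutive windows."""
    win = int(tw * fs)                       # samples per window: Tw * Sf
    step = int((tw - overlap) * fs)          # hop between window starts
    starts = range(0, len(x) - win + 1, step)
    return np.stack([x[s:s + win] for s in starts])

x = np.arange(100)                           # 100 samples, e.g. 4 s at 25 Hz
print(segment(x, fs=25, tw=2).shape)         # (2, 50): (n/(Tw*Sf)) x (Tw*Sf)
print(segment(x, fs=25, tw=2, overlap=1).shape)  # (3, 50) with 1 s overlap
```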
3) Features extraction:
The feature extraction process explores the time and frequency domains of the input data. Time-domain features are mostly used because their extraction requires a lower computational effort compared to the extraction of frequency-domain features [30], [38]. Given the original time series dataset, the type of features to be extracted in the desired domain (time and/or frequency), the time-window size, and the overlapping size, B-HAR returns the corresponding time- and/or frequency-based feature representation. B-HAR uses the Time Series Feature Extraction Library (TSFEL) presented in [38].
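For illustration, a few typical time-domain features can be computed per window as follows; since B-HAR delegates this step to TSFEL, the hand-written function below is only a sketch of the idea:

```python
import numpy as np

def time_domain_features(window):
    """A few common time-domain features for one window of samples."""
    return {
        "mean": np.mean(window),
        "std": np.std(window),
        "rms": np.sqrt(np.mean(window ** 2)),   # root mean square
        "min": np.min(window),
        "max": np.max(window),
        "zero_crossings": int(np.sum(np.diff(np.sign(window)) != 0)),
    }

w = np.array([1.0, -1.0, 2.0, -2.0])
f = time_domain_features(w)
print(f["mean"], f["max"], f["zero_crossings"])  # 0.0 2.0 3
```

Applied per segmented window, such features turn each window into one fixed-length feature vector for the recognition models.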
D. Pre-processing
This module adopts a set of techniques to reduce the dataset size by dropping unnecessary data, selecting a subset of the extracted features, and balancing the dataset. In addition, it standardizes the data through normalization approaches.
Nr. | Parameter | Description | Values Range
1 | path | Path to the dataset directory. | String
2 | separator | .csv separator. | Separator character
3 | has_header | The input dataset file also contains the header. | Bool
4 | header_type | Header format of the input dataset. | tdc, tdcp, dc, dcp
5 | time_window | Desired window length, in seconds. | Float
6 | sampling_frequency | Sampling frequency of the loaded dataset, in Hz. | Integer
7 | overlap | Overlap between time windows, in seconds. | Float
8 | group_by | Show stats by attribute. | CLASS, P_ID
9 | data_treatment | Data treatment type. | segmentation, raw, features_extraction
10 | features_domain | Available domains for features extraction. | statistical, spectral, temporal, all
11 | features_selection | Toggle features selection. | Bool (true, false)
12 | use_ml | Implemented Machine Learning models. | kNN, wkNN, LDA, QDA, SVM, RF, DT
13 | use_dl | Implemented Deep Learning models. | CNN
14 | normalization_method | Normalize data. | none, minmax, robust, standard
15 | split_method | Train/Test split method. | intra, inter
16 | selection_method | Features selection technique. | variance, l1, tree-based, recursive
17 | n_features_to_select | Number of features to select; available only with the recursive selection method. | Integer
18 | data_balancing_method | Balancing techniques for unbalanced datasets (under/over sampling). | random under, near miss, edited nn / random over, smote, adasyn, kmeans smote
19 | sub_method | Method to substitute NaN and Inf values. | mean, forward, backward, constant, interpolate
20 | constant_value | The value which substitutes NaN and ±∞ values. | Float
21 | filter | Data filtering techniques. | lowpass, highpass, bandpass, bandstop
22 | filter_order | The order of the applied filter. | Integer (e.g., 4)
23 | cut | Low/high cutoff frequency, in Hz. | Integer (e.g., 20 Hz)
24 | test_size | The size of the test data. | Float (e.g., 25%)
25 | epochs | Training epochs for the CNN. | Integer (e.g., 100)
26 | k_fold | Number of folds for CNN k-fold training. | Integer (e.g., 3)
27 | loss_threshold | Bias used in CNN model ensembles. | Float (e.g., 0.4)
28 | use_features | If features selection is enabled, extracted features are used as input of the CNN. | Bool (true, false)

TABLE I: B-HAR configuration parameters. Gray lines define the configuration used in the experimental results section.
1) Drop of unnecessary data:
One of the main problems for HAR, and for pattern recognition techniques in general, is the presence in the dataset of imbalanced class distributions and of noisy data related to a specific tester (subject) or activity (class). To mitigate such an issue, B-HAR provides the possibility of excluding data from the input dataset. In particular, B-HAR can exclude data related to:
• a single tester or a group of testers;
• a single activity or a group of activities;
• a single data collection session or a group of sessions.
2) Training/testing approach:
In machine learning, data are usually split into two (training and testing) or three (training, validation, and testing) subsets. The recognition model is then trained on the training dataset (and cross-checked against the validation dataset when required) to make predictions on the testing dataset. However, in HAR, train and test datasets are usually partitioned by considering the presence of data collected from several different testers [25]. For such a purpose, B-HAR presents two different types of train/test splitting approaches:
• Inter-subject (Leave-out): B-HAR users select the data related to the human activities of a specific subset of testers to create the testing dataset (e.g., 10 of 100), while the remaining testers form the training dataset (e.g., 90 of 100). This approach guarantees that no subject contributes to both the training and the test dataset, minimizing the possibility of overfitting.
• Intra-subject (Hold-out): B-HAR provides the possibility of performing a traditional hold-out approach. B-HAR users give as input the test partition dimension, which usually goes from 15% to 25%, and a k value in the range 3 to 10. In particular, the dataset is initially divided into training and testing datasets (e.g., 75% train and 25% test); subsequently, a classic k-fold cross-validation approach is applied only over the training dataset.
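The inter-subject (leave-out) split can be sketched as follows; the helper below, with invented subject identifiers, is an illustration and not B-HAR's code:

```python
import numpy as np

def inter_subject_split(X, y, subjects, test_subjects):
    """Leave-out split: all windows of the chosen subjects go to the test set,
    so no subject appears in both training and test data."""
    mask = np.isin(subjects, test_subjects)
    return X[~mask], y[~mask], X[mask], y[mask]

X = np.arange(12).reshape(6, 2)                     # six windows, two features
y = np.array([0, 0, 1, 1, 0, 1])
subj = np.array(["s1", "s1", "s2", "s2", "s3", "s3"])

X_tr, y_tr, X_te, y_te = inter_subject_split(X, y, subj, test_subjects=["s3"])
print(len(X_tr), len(X_te))  # 4 2
```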
3) Normalization:
During the training phase, features with higher values govern the training process. However, such features are not necessarily those that best represent the characteristics of the dataset or that determine the final accuracy of the pattern recognition model. Data normalization transforms multi-scaled data to the same scale, where all variables positively influence the model, thus improving the stability and the performance of the learning algorithm [25]. Besides, when working with datasets where every single sample is represented by different features, normalization is performed independently for every single feature. B-HAR provides the following normalization techniques:
• Robust scaling: it scales each feature of the dataset HAR_D by subtracting the median (HAR_D^{Q2}) and then dividing by the interquartile range (IQR), defined as the difference between the third and the first quartile (HAR_D^{Q3} − HAR_D^{Q1}). The robust scaler of a dataset HAR_D is expressed as:

HAR_D^{norm} = (HAR_D − HAR_D^{Q2}) / (HAR_D^{Q3} − HAR_D^{Q1})   (1)

This scaler uses statistics that are robust to outliers, in contrast with the other scalers, which use statistics that are highly affected by outliers, such as the maximum, the minimum, the mean, and the standard deviation.
• Standard scaling (z-score): it maps the data into a distribution with mean 0 and standard deviation 1. Each normalized value is computed by subtracting the corresponding feature's mean and then dividing by the standard deviation. The standard scaler of a dataset HAR_D is expressed as:

HAR_D^{norm} = (HAR_D − HAR_D^{u}) / HAR_D^{s}   (2)

where HAR_D^{u} is the mean of the training dataset and HAR_D^{s} is its standard deviation.
• Min-Max scaling: it rescales each feature to the fixed range [0, 1] by subtracting the minimum value of the feature and then dividing by the range:

HAR_D^{norm} = (HAR_D − HAR_D^{min}) / (HAR_D^{max} − HAR_D^{min})   (3)
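Eq. (1) can be checked directly; the sketch below (an illustration, not B-HAR's code) shows how robust scaling keeps the bulk of the data on a small scale even in the presence of a strong outlier:

```python
import numpy as np

def robust_scale(x):
    """Eq. (1): subtract the median (Q2) and divide by the IQR (Q3 - Q1)."""
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return (x - q2) / (q3 - q1)

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # one strong outlier
print(robust_scale(x))                       # [-1.  -0.5  0.   0.5  48.5]
```

Note how the first four values stay within [−1, 0.5]: the median and IQR are unaffected by the outlier, unlike the mean and standard deviation used by z-score scaling.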
4) Features selection:
A higher number of features does not necessarily imply better results from the pattern recognition model, because features can impact the recognition process either positively or negatively. For this purpose, feature selection techniques have been defined to identify and order the features by importance. Such techniques, starting from a training dataset, automatically select the features that contribute most to the prediction accuracy. Moreover, the elimination of features reduces the time required for the training and testing phases. The main benefits of feature selection techniques are: i) reduction of overfitting, by eliminating redundant data, which consequently also reduces noise-related errors; ii) improvement of the accuracy, since misleading data are eliminated; and iii) reduction of the training time, due to fewer data points [25], [39]. B-HAR provides the following feature selection techniques:
• Variance: it removes all features whose variance does not exceed a defined threshold. By default, it removes zero-variance features, i.e., features with the same value in all samples.
• Recursive feature elimination (RFE): it fits a model and removes the least critical feature (or features) until a defined number of features is reached, without knowing in advance how many features are valid. Features are ranked by importance, recursively eliminating a small number of features per loop. Besides, RFE attempts to eliminate dependencies and collinearity that may exist in the model.
• Lasso regularization (L1): it is based on the Lasso regularized linear model, which estimates sparse coefficients; starting from the non-zero coefficients returned by the linear regression model, it effectively reduces the number of features upon which the given solution depends.
• Tree-based: it computes impurity-based feature importance, which in turn can be used to discard irrelevant features in cooperation with other feature selection techniques. Tree-based estimators, by definition, internally create an ordering of the features representing the training dataset, which makes them very suitable for use within feature selection methods.
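The variance-based technique is simple enough to sketch directly (the other techniques are typically delegated to libraries such as scikit-learn); the function below is an illustration, not B-HAR's implementation:

```python
import numpy as np

def variance_filter(X, threshold=0.0):
    """Keep only the feature columns whose variance exceeds the threshold;
    by default this drops zero-variance (constant) features."""
    keep = np.var(X, axis=0) > threshold
    return X[:, keep], keep

X = np.array([[1.0, 5.0, 0.1],
              [2.0, 5.0, 0.3],
              [3.0, 5.0, 0.2]])            # the second feature is constant
X_sel, kept = variance_filter(X)
print(X_sel.shape, kept.tolist())          # (3, 2) [True, False, True]
```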
5) Balancing:
Training a HAR model on an imbalanced dataset can introduce unique challenges during the model training process, returning a biased recognition model. At the state of the art, two categories of techniques manage this issue: i) oversampling the less populated classes up to the number of observations of the most populated class, and ii) undersampling the most populated classes down to the number of observations of the less populated class [35]. B-HAR provides different techniques to handle such issues:
• Undersampling, random: it randomly eliminates a subset of observations of the most populated classes such that all classes have the same population (equal to that of the initial less populated class). The main limitation is that the removed observations can be more informative than the kept ones.
• Undersampling, near miss: given two observations that belong to two different classes and are very similar to each other, it eliminates the observation belonging to the most populated class.
• Undersampling, edited nearest neighbors: it removes observations whose actual class label differs from the class of at least k nearest neighbors. It can undersample indiscriminately over all the existing classes or over a subset of the most populated classes.
• Oversampling, random: it randomly duplicates observations of the less populated classes such that all classes have the same population (equal to that of the initial most populated class). The main limitation is that the duplicated observations may lead to an overfitting problem.
• Oversampling, synthetic minority over-sampling technique (SMOTE): it randomly selects an observation a from the less populated class and identifies its k nearest neighbors from the same class. A synthetic observation s is generated as a convex combination of a and an observation b randomly selected from its k nearest neighbors.
• Oversampling, k-means SMOTE: it uses the well-known k-means unsupervised learning algorithm to generate new observations of the less populated class in safe and crucial input dataset areas. This technique avoids the generation of noise and effectively overcomes imbalances between and within classes.
• Oversampling, adaptive synthetic (ADASYN): it uses a weighted distribution for the different minority-class observations according to their learning difficulty. Then, the less populated classes are enriched with new synthetic observations of the harder-to-learn observations, rather than of the minority examples that are easier to learn [40].
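Random oversampling, the simplest of the techniques above, can be sketched as follows; the helper and its toy data are an illustration (balancing in practice is typically delegated to dedicated libraries such as imbalanced-learn):

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate random observations of each minority class until every class
    reaches the size of the most populated one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - n, replace=True)
        idx.extend(members.tolist() + extra.tolist())
    idx = np.array(idx)
    return X[idx], y[idx]

X = np.arange(10).reshape(5, 2)
y = np.array([0, 0, 0, 1, 1])            # class 1 is under-represented
Xb, yb = random_oversample(X, y)
print(np.bincount(yb).tolist())          # [3, 3]
```

The duplicated minority samples are exact copies, which is precisely why random oversampling can overfit, as noted above.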
E. Model training and testing
In this module, starting from the pre-processed input dataset, B-HAR takes a known set of input data and responses (training dataset) and trains a model to generate reasonable predictions on new data (testing dataset). In particular, B-HAR provides the user with the possibility of training eight different pattern recognition models: seven machine learning models, i.e., kNN, wkNN, Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), RF, DT, and SVM, and one deep learning model, i.e., CNN. SVM and CNN are the least efficient in terms of training time, while LDA and QDA are the least efficient in terms of required memory space.
B-HAR does not perform the usual model training phase; instead, it applies a grid search training approach, which exhaustively generates candidates from a given grid of parameters. For example, the SVM model learns from the training dataset by using different configuration parameters (e.g., kernel function ∈ [linear, polynomial, sigmoid, radial basis function], penalty ∈ [l1, l2], or loss function ∈ [hinge, squared hinge]) and is tested on the testing dataset for each configuration. Finally, for each tested model, B-HAR returns the configuration that achieved the best results in terms of sensitivity, specificity, precision, F1-score, and accuracy.

F. Performance measures
F. Performance measures
The quality of a pattern recognition model is usually measured by using accuracy as the main parameter. However, this metric can be misleading when an unbalanced training dataset is given as input. To overcome this issue, more representative metrics are used [41]. A confusion matrix clearly visualizes the performance of a model. For a problem with n classes, the confusion matrix C is an n × n matrix whose element c_ij counts the instances of actual class i that are predicted as class j (rows represent actual classes, columns represent predicted classes). The confusion elements for each class i are given by:
• true positives: tp_i = c_ii;
• false positives: fp_i = Σ_{l=1..n} c_li − tp_i;
• false negatives: fn_i = Σ_{l=1..n} c_il − tp_i;
• true negatives: tn_i = Σ_{l=1..n} Σ_{k=1..n} c_lk − tp_i − fp_i − fn_i.
On the basis of the confusion matrix, the following model quality metrics are computed by B-HAR:
Precision = tp / (tp + fp)
Specificity = tn / (fp + tn)
Sensitivity = tp / (tp + fn)
Accuracy = (tp + tn) / (tp + tn + fp + fn)
F1-Score = 2 × Precision × Sensitivity / (Precision + Sensitivity)
Moreover, to better explain the results and the behavior of the tested models, B-HAR shows the trend of the loss function and of the total accuracy at each step of the model training. Finally, B-HAR provides detailed statistics regarding the execution time of each module.
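The per-class bookkeeping derived from the confusion matrix can be expressed compactly. The sketch below is illustrative NumPy code (not B-HAR's implementation), with C[i, j] counting instances of actual class i predicted as class j:

```python
import numpy as np

def confusion_metrics(C):
    """Per-class quality metrics from a multi-class confusion matrix C,
    where C[i, j] counts instances of actual class i predicted as class j."""
    C = np.asarray(C, dtype=float)
    tp = np.diag(C)
    fp = C.sum(axis=0) - tp          # predicted as i but actually another class
    fn = C.sum(axis=1) - tp          # actually i but predicted as another class
    tn = C.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / C.sum(),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }
```

For example, for the binary matrix [[50, 10], [5, 35]], class 0 has tp = 50, fp = 5, fn = 10, tn = 35, hence a per-class accuracy of 0.85.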
IV. EXPERIMENTAL RESULTS
This section presents the results of an extensive experimental campaign performed by using B-HAR on seven of the most popular open-source HAR datasets, plus two dedicated datasets (i.e., Channel State Information (CSI) and Received Signal Strength Indicator (RSSI)) used to recognize the occupancy status (empty or occupied) of an environment starting from radio signal propagation patterns. The goal of the campaign is twofold: i) testing the flexibility and ease of use of B-HAR on datasets that are heterogeneous in the number of testers, kind of activities, and type of sensors, and ii) measuring the performance achieved by the state-of-the-art machine and deep learning models implemented in B-HAR on the selected datasets. Table II presents the main characteristics of the considered datasets. Column 1 shows the name and the reference of each dataset. Columns 2, 3, and 4 report, respectively, the number of testers, activities, and used sensors, where A refers to accelerometers and G to gyroscopes. Column 5 indicates the data sampling frequency. Finally, Column 6 presents the dimension of the time window and of the overlapping fragment in seconds.
TABLE II: CHARACTERISTICS OF THE DATASETS

Dataset           | #Subjects | #Activities | Used Sensors | Sampling Frequency | Ref. | (w, o)
WISDM v1 [14]     | 51        | 18          | 1 (A)        | 20 Hz              | [42] | (10, 0)
WISDM v2 [24]     | 225       | 6           | 1 (A)        | 20 Hz              | [42] | (10, 0)
DAPHNET [16]      | 10        | 2           | 3 (A)        | 65 Hz              | [11] | (3, 1)
HHAR (phone) [21] | 9         | 6           | 2 (A, G)     | 200 Hz             | [21] | (2, 1)
HHAR (watch) [21] | 9         | 6           | 2 (A, G)     | 200 Hz             | [21] | (2, 1)
PAPAM [17]        | 9         | 14          | 3 (A, G)     | 100 Hz             | [17] | (5, 1)
mHealth [22]      | 10        | 12          | 3 (A, G)     | 50 Hz              | [43] | (5, 2.5)
BLE RSSI [44]     | 4         | 2           | 4 (BLE)      | 200 Hz             | [44] | (1, 0)
802.11ac CSI      | 3         | 2           | 12 (AP)      | 40 Hz              |      | (1, 0)

A = accelerometer, G = gyroscope, BLE = Bluetooth Low Energy device, AP = access point, w = window (seconds), o = overlap (seconds)
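The (w, o) values in Table II define, per dataset, the segmentation window and overlap in seconds. A minimal sketch of such fixed-size sliding-window segmentation (illustrative code, not B-HAR's own implementation; the function name is hypothetical):

```python
import numpy as np

def segment(signal, fs, w, o):
    """Split a (samples, channels) signal into windows of w seconds
    overlapping by o seconds, given the sampling frequency fs in Hz."""
    win = int(w * fs)             # window length in samples
    step = win - int(o * fs)      # hop size in samples
    n = (len(signal) - win) // step + 1
    return np.stack([signal[i * step:i * step + win] for i in range(n)])
```

For instance, DAPHNET sampled at 65 Hz with (w, o) = (3, 1) yields 195-sample windows advancing by 130 samples, while (w, o) = (10, 0) for WISDM produces non-overlapping 200-sample windows at 20 Hz.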
B-HAR provides the users with two function calls. The first takes as input the target dataset and returns statistics that help the user in defining the B-HAR configuration file. In particular, this function call returns:
• the distribution of observations with respect to testers and activities, helping the user in the identification of possible under/overpopulation issues;
• the boxplot distribution for all or for specific activities or testers, helping the user in the identification of outliers or noisy data.
The second function call takes as input the configuration file and returns the following information:
• the performance metrics discussed above;
• the best-performing configuration returned by the grid-search engine;
• the plot of the loss and accuracy variation during the training process;
• if used, the most important features selected by the feature selection model;
• if required, the segmented or feature-based dataset;
• if required, execution-time statistics for each computation module and sub-module.
Concerning the experimental analysis, Table III shows the results achieved by using B-HAR on the datasets of Table II by applying six of the eight pattern recognition models reported in Section III-E. The configuration of B-HAR used for the experiments is shown by the gray rows of Table I. In the first block of Table III, data segmentation has been applied; in the second block, feature extraction in the time and frequency domains has been used.

TABLE III: B-HAR RESULTS ON THE SEGMENTED/FEATURE-BASED DATA REPRESENTATIONS OF THE MOST POPULAR HAR DATASETS. Classification results are shown in terms of [Specificity, Sensitivity, Precision, Accuracy, F1-Score].

Data Segmentation
Dataset           | kNN              | LDA              | QDA              | RF               | DT               | CNN
WISDM v1 [14]     | [98,61,62,61,62] | [95,12,12,12,12] | [95,20,17,20,17] | [99,76,76,76,76] | [97,55,53,55,53] | [96,26,26,26,26]
WISDM v2 [24]     | [94,69,78,69,78] | [64,40,34,40,34] | [86,65,58,65,58] | [96,90,91,90,91] | [92,77,77,77,77] | [60,62,62,66,60]
DAPHNET [16]      | [16,90,87,90,88] | [09,91,83,91,83] | [08,91,82,91,82] | [14,91,91,91,91] | [08,91,83,91,83] | [12,90,87,90,87]
PAPAM [17]        | [90,65,66,65,66] | [90,45,45,45,45] | [92,15,19,15,19] | [93,80,83,80,83] | [91,60,60,60,60] | [90,73,76,73,73]
HHAR (phone) [21] | [96,83,85,83,85] | [88,43,45,43,45] | [88,40,50,40,50] | [98,89,89,88,89] | [93,67,66,67,66] | [96,84,84,84,84]
HHAR (watch) [21] | [95,78,82,78,82] | [90,54,52,54,52] | [84,26,27,26,27] | [97,85,85,85,85] | [94,69,69,69,69] | [96,83,83,83,83]
mHealth [22]      | [81,76,81,76,81] | [75,38,59,38,59] | [09,91,82,91,82] | [73,85,85,85,85] | [74,77,77,77,77] | [65,80,80,80,80]
RSSI [44]         | [91,91,91,91,91] | [91,91,91,91,91] | [91,91,91,91,91] | [91,91,91,91,91] | [91,91,91,91,91] | [90,91,90,91,91]
CSI               | [93,93,93,93,93] | [93,93,93,93,93] | [92,92,92,92,92] | [93,93,93,93,93] | [93,93,93,93,93] | [92,92,92,92,92]

Features Extraction
Dataset           | kNN              | LDA              | QDA              | RF               | DT               | CNN
WISDM v1 [14]     | [97,56,57,56,57] | [97,49,49,49,49] | [96,42,45,42,45] | [99,84,84,84,84] | [98,60,60,60,60] | [95,25,24,25,24]
WISDM v2 [24]     | [98,92,92,92,92] | [96,86,86,86,86] | [82,61,56,61,56] | [97,94,93,94,93] | [96,87,87,87,87] | [59,41,17,41,17]
DAPHNET [16]      | [35,92,90,92,90] | [64,90,92,90,91] | [09,91,82,91,82] | [57,94,93,94,93] | [56,90,91,90,91] | [09,90,82,90,82]
PAPAM [17]        | [92,74,76,74,76] | [95,65,67,65,67] | [96,03,18,03,18] | [95,87,88,87,88] | [95,75,75,75,75] | [89,71,66,71,66]
HHAR (phone) [21] | [96,81,82,81,82] | [90,56,56,56,56] | [83,28,28,28,28] | [99,93,93,93,93] | [97,83,83,83,83] | [97,86,86,86,86]
HHAR (watch) [21] | [94,69,69,69,69] | [91,54,54,54,54] | [83,26,24,26,24] | [98,89,89,89,89] | [95,76,77,76,77] | [95,77,78,77,78]
mHealth [22]      | [92,89,90,89,90] | [98,88,91,88,91] | [30,71,52,71,52] | [96,94,94,94,94] | [82,87,87,87,87] | [64,82,78,82,78]
RSSI [44]         | [95,95,95,95,95] | [95,95,95,95,95] | [95,95,95,95,95] | [95,95,95,95,95] | [95,95,95,95,95] | [94,95,94,95,94]
CSI               | [99,99,99,99,99] | [99,99,99,99,99] | [97,98,97,97,97] | [99,99,99,99,99] | [99,99,99,99,99] | [99,99,99,99,99]

By observing the results reported in Table III, the user can identify which model performs best for each metric across the considered state-of-the-art datasets. Of course, users can provide B-HAR with new datasets and configure the framework by following different workflows. In this way, B-HAR is proposed as a baseline framework for fairly comparing existing and new HAR approaches and for easily identifying, via its grid-search engine, the approach that provides the best results on the target dataset.

V. CONCLUSIONS AND FUTURE EXTENSIONS
HAR based on inertial sensor data involves several processing steps that affect the quality of the achieved results. However, there is no clear pipeline that shows the order in which such steps have to be applied. Thus, researchers might make elementary mistakes that affect the quality of the results. This article proposed B-HAR, a framework that facilitates the study of the behavior of the most popular pattern recognition models in the context of HAR. B-HAR allows users to define elaboration pipelines, including different pre-processing steps, such as noise removal, segmentation, feature extraction, normalization, feature selection, and balancing, in the right order, and it returns the evaluation metrics for eight state-of-the-art recognition models. This reduces the possibility of incorrect usage of HAR methodologies and provides the users with a baseline framework for fairly comparing the corresponding results.
Future work will regard the creation of a GUI, the extension of each sub-module with a larger set of state-of-the-art techniques, and the integration of new recognition and regression models. Finally, we plan to provide users with the capability of inserting their own models directly into B-HAR.

REFERENCES
[1] V. Bianchi, M. Bassoli, G. Lombardo, P. Fornacciari, M. Mordonini, and I. De Munari, "IoT wearable sensor and deep learning: An integrated approach for personalized human activity recognition in a smart home environment," IEEE Internet of Things Journal, vol. 6, no. 5, pp. 8553–8562, 2019.
[2] A. Poli, G. Cosoli, L. Scalise, and S. Spinsante, "Impact of wearable measurement properties and data quality on ADLs classification accuracy," IEEE Sensors Journal, 2020.
[3] F. Demrozi, G. Pravadelli, A. Bihorac, and P. Rashidi, "Human activity recognition using inertial, physiological and environmental sensors: A comprehensive survey," IEEE Access, pp. 1–1, 2020.
[4] O. D. Lara and M. A. Labrador, "A survey on human activity recognition using wearable sensors," IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1192–1209, 2012.
[5] E. Fullerton, B. Heller, and M. Munoz-Organero, "Recognizing human activity in free-living using multiple body-worn accelerometers," IEEE Sensors Journal, vol. 17, no. 16, pp. 5290–5297, 2017.
[6] L. Pucci, E. Testi, E. Favarelli, and A. Giorgetti, "Human activities classification using biaxial seismic sensors," IEEE Sensors Letters, vol. 4, no. 10, pp. 1–4, 2020.
[7] E. Gambi, G. Temperini, R. Galassi, L. Senigagliesi, and A. De Santis, "ADL recognition through machine learning algorithms on IoT air quality sensor dataset," IEEE Sensors Journal, vol. 20, no. 22, pp. 13562–13570, 2020.
[8] J. Lu, X. Zheng, M. Sheng, J. Jin, and S. Yu, "Efficient human activity recognition using a single wearable sensor," IEEE Internet of Things Journal, 2020.
[9] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2625–2634.
[10] C. P. Burgos, L. Gärtner, M. A. G. Ballester, J. Noailly, F. Stöcker, M. Schönfelder, T. Adams, and S. Tassani, "In-ear accelerometer-based sensor for gait classification," IEEE Sensors Journal, vol. 20, no. 21, pp. 12895–12902, 2020.
[11] F. Demrozi, R. Bacchin, S. Tamburin, M. Cristani, and G. Pravadelli, "Towards a wearable system for predicting the freezing of gait in people affected by Parkinson's disease," IEEE Journal of Biomedical and Health Informatics, 2019.
[12] F. Demrozi, G. Pravadelli, P. J. Tighe, A. Bihorac, and P. Rashidi, "Joint distribution and transitions of pain and activity in critically ill patients," in , 2020, pp. 4534–4538.
[13] E. Kanjo, E. M. Younis, and C. S. Ang, "Deep learning analysis of mobile physiological, environmental and location sensor data for emotion detection," Information Fusion, vol. 49, pp. 46–56, 2019.
[14] J. R. Kwapisz, G. M. Weiss, and S. A. Moore, "Activity recognition using cell phone accelerometers," ACM SIGKDD Explorations Newsletter, vol. 12, no. 2, pp. 74–82, 2011.
[15] D. Roggen, A. Calatroni, M. Rossi, T. Holleczek, K. Förster, G. Tröster, P. Lukowicz, D. Bannach, G. Pirkl, A. Ferscha et al., "Collecting complex activity datasets in highly rich networked sensor environments," in . IEEE, 2010, pp. 233–240.
[16] M. Bachlin, M. Plotnik, D. Roggen, I. Maidan, J. M. Hausdorff, N. Giladi, and G. Troster, "Wearable assistant for Parkinson's disease patients with the freezing of gait symptom," IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 2, pp. 436–446, 2009.
[17] A. Reiss and D. Stricker, "Introducing a new benchmarked dataset for activity monitoring," in . IEEE, 2012, pp. 108–109.
[18] P. Zappi, T. Stiefmeier, E. Farella, D. Roggen, L. Benini, and G. Troster, "Activity recognition from on-body sensors by classifier fusion: Sensor scalability and robustness," in . IEEE, 2007, pp. 281–286.
[19] D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz, "A public domain dataset for human activity recognition using smartphones," in ESANN, 2013.
[20] M. Zhang and A. A. Sawchuk, "USC-HAD: A daily activity dataset for ubiquitous activity recognition using wearable sensors," in Proceedings of the 2012 ACM Conference on Ubiquitous Computing. ACM, 2012, pp. 1036–1043.
[21] A. Stisen, H. Blunck, S. Bhattacharya, T. S. Prentow, M. B. Kjærgaard, A. Dey, T. Sonne, and M. M. Jensen, "Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition," in Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, 2015, pp. 127–140.
[22] O. Banos, R. Garcia, J. A. Holgado-Terriza, M. Damas, H. Pomares, I. Rojas, A. Saez, and C. Villalonga, "mHealthDroid: A novel framework for agile development of mobile health applications," in International Workshop on Ambient Assisted Living. Springer, 2014, pp. 91–98.
[23] O. Baños, M. Damas, H. Pomares, I. Rojas, M. A. Tóth, and O. Amft, "A benchmark dataset to evaluate sensor displacement in activity recognition," in Proceedings of the 2012 ACM Conference on Ubiquitous Computing. ACM, 2012, pp. 1026–1035.
[24] J. W. Lockhart, G. M. Weiss, J. C. Xue, S. T. Gallagher, A. B. Grosner, and T. T. Pulickal, "Design considerations for the WISDM smart phone-based sensor mining architecture," in Proceedings of the Fifth International Workshop on Knowledge Discovery from Sensor Data, 2011, pp. 25–33.
[25] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[26] P. M. Domingos, "A few useful things to know about machine learning," Commun. ACM, vol. 55, no. 10, pp. 78–87, 2012.
[27] B. Shickel, P. J. Tighe, A. Bihorac, and P. Rashidi, "Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis," IEEE Journal of Biomedical and Health Informatics, vol. 22, no. 5, pp. 1589–1604, 2017.
[28] G. Marcus, "Deep learning: A critical appraisal," arXiv preprint arXiv:1801.00631, 2018.
[29] J. Brownlee, Master Machine Learning Algorithms: Discover How They Work and Implement Them From Scratch. Machine Learning Mastery, 2016.
[30] M. Shoaib, S. Bosch, O. D. Incel, H. Scholten, and P. J. Havinga, "A survey of online activity recognition using mobile phones," Sensors, vol. 15, no. 1, pp. 2059–2085, 2015.
[31] P. Nardi, "Human activity recognition: Deep learning techniques for an upper body exercise classification system," 2019.
[32] F. Ordóñez and D. Roggen, "Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition," Sensors, vol. 16, no. 1, p. 115, 2016.
[33] K. Li, R. Habre, H. Deng, R. Urman, J. Morrison, F. D. Gilliland, J. L. Ambite, D. Stripelis, Y.-Y. Chiang, Y. Lin et al., "Applying multivariate segmentation methods to human activity recognition from wearable sensors' data," JMIR mHealth and uHealth, vol. 7, no. 2, p. e11201, 2019.
[34] Y. Zhu, J. Yu, F. Hu, Z. Li, and Z. Ling, "Human activity recognition via smart-belt in wireless body area networks," International Journal of Distributed Sensor Networks, vol. 15, no. 5, p. 1550147719849357, 2019.
[35] G. Lemaître, F. Nogueira, and C. K. Aridas, "Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning," Journal of Machine Learning Research, vol. 18, no. 17, pp. 1–5, 2017. [Online]. Available: http://jmlr.org/papers/v18/16-365
[36] I. Pratama, A. E. Permanasari, I. Ardiyanto, and R. Indrayani, "A review of missing values handling methods on time-series data," in . IEEE, 2016, pp. 1–6.
[37] "SensorMotion," https://pypi.org/project/sensormotion/, accessed: 2020-11-24.
[38] M. Barandas, D. Folgado, L. Fernandes, S. Santos, M. Abreu, P. Bota, H. Liu, T. Schultz, and H. Gamboa, "TSFEL: Time series feature extraction library," SoftwareX, vol. 11, p. 100456, 2020.
[39] G. Chandrashekar and F. Sahin, "A survey on feature selection methods," Computers & Electrical Engineering, vol. 40, no. 1, pp. 16–28, 2014.
[40] H. He, Y. Bai, E. Garcia, and S. Li, "ADASYN: Adaptive synthetic sampling approach for imbalanced learning," in IEEE International Joint Conference on Neural Networks, 2008.
[41] D. M. Powers, "Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation," 2011.
[42] G. M. Weiss, K. Yoneda, and T. Hayajneh, "Smartphone and smartwatch-based biometrics using activities of daily living," IEEE Access, vol. 7, pp. 133190–133202, 2019.
[43] L. T. Nguyen, M. Zeng, P. Tague, and J. Zhang, "Recognizing new activities with limited training data," in Proceedings of the 2015 ACM International Symposium on Wearable Computers, 2015, pp. 67–74.
[44] F. Demrozi, F. Chiarani, and G. Pravadelli, "A low-cost BLE-based distance estimation, occupancy detection and counting system," in ACM/IEEE DATE, 2021.
Florenc Demrozi,
PhD in computer science, IEEE member, received the B.S. and M.E. degrees in Computer Science and Engineering from the University of Verona, Italy, in 2014 and 2016, respectively, and the Ph.D. degree in Computer Science from the University of Verona, Italy, in 2020. He is currently a Postdoctoral researcher and Temporary Professor at the Department of Computer Science, University of Verona, Italy, where he is a member of the ESD (Electronic Systems Design) Research Group, working on Ambient Intelligence (AmI), Ambient Assisted Living (AAL), and the Internet of Things (IoT).
Cristian Turetta received the B.S. and M.E. degrees in Computer Science and Engineering from the University of Verona, Italy, in 2017 and 2020, respectively. He is currently a research fellow at the Department of Computer Science, University of Verona, Italy, working on Ambient Assisted Living (AAL), the Internet of Things (IoT), and IoT security.