Sensor Data for Human Activity Recognition: Feature Representation and Benchmarking
Flávia Alves, Martin Gairing, Frans A. Oliehoek, Thanh-Toan Do
Department of Computer Science, University of Liverpool, United Kingdom; e-mail: [email protected], [email protected], [email protected]
Department of Intelligent Systems, Delft University of Technology, Netherlands; e-mail: [email protected]
Abstract
The field of Human Activity Recognition (HAR) focuses on obtaining and analysing data captured from monitoring devices (e.g. sensors). There is a wide range of applications within the field; for instance, assisted living, security surveillance, and intelligent transportation. In HAR, the development of Activity Recognition models is dependent upon the data captured by these devices and the methods used to analyse them, which directly affect performance metrics. In this work, we address the issue of accurately recognising human activities using different Machine Learning (ML) techniques. We propose a new feature representation based on consecutively occurring observations and compare it against previously used feature representations using a wide range of classification methods. Experimental results demonstrate that techniques based on the proposed representation outperform the baselines, and a better accuracy is achieved for both highly and less frequent actions. We also investigate how the addition of further features and their pre-processing techniques affect performance results, leading to state-of-the-art accuracy on a Human Activity Recognition dataset.
Over the past fifteen years, extensive research has been carried out in the field of Human Activity Recognition [9]. This has been largely motivated by the technological advancement in monitoring devices within several research areas. One example where this applies is the improvement of services in elderly care. As discussed in [14], any form of traditional methodology (e.g. in-person visits and telephone interviews) has its inherent limitations, and 24-hour continuous monitoring contributes towards mitigating the risks associated with them. Therefore, the potential that HAR has to detect physical and cognitive changes provides a great opportunity for the development of bespoke prevention plans.

HAR aims to infer the actions taken by an individual using monitoring sensors [6]. A generic activity recognition model takes as input the data collected by the sensors and aims to accurately classify the activities of the individual. We use the van Kasteren dataset [22], which consists of binary sensor activity from three different houses (A, B and C). The binary sensors capture human activity by indicating, for instance, if a door or a cupboard is open or closed, if the toilet is being flushed, or if a person is sitting on a couch, lying in bed or moving in a specific area. The dataset provides sensor readings in 60-second intervals.

In this paper, we present a thorough study of ML techniques applied to the classification task, including probabilistic models (Naïve Bayes, Hidden Markov Model, Hidden Semi-Markov Model and Conditional Random Field) and neural network based models (Recurrent Neural Network, Long Short-Term Memory Network, Gated Recurrent Unit, Multi-Layer Perceptron and a Long Short-Term Memory Network with a Conditional Random Field layer). The main contributions are: (i) A new feature representation (observation-based) is proposed and compared against the state-of-the-art results for other feature representations.
The proposed representation outperforms the others and, in general, is able to produce a better accuracy for both dominant and minor classes; (ii) We provide an extensive evaluation and analysis of the aforementioned classification models. Our analysis shows that the Conditional Random Field model performs best using an observation-based representation; (iii) Our best method produces state-of-the-art accuracy on the van Kasteren dataset.

A number of papers have proposed techniques for classifying the data in [22] and evaluated them using two evaluation metrics: the overall accuracy and the mean per class accuracy. The accuracy measures how often the predictions match the class labels, and the mean per class accuracy is the average of the per-class accuracies. Both generative (e.g. Naïve Bayes (NB) [22], Hidden Markov Models (HMMs) [23, 22]) and discriminative (Support Vector Machines (SVMs) [2], Conditional Random Fields (CRFs) [23, 22]) methods have been evaluated against this dataset. The state-of-the-art methods are Hidden Semi-Markov Models (HSMMs) and CRFs [23, 22], depending on which evaluation metric is being considered. From the literature we are able to identify the state-of-the-art methods which provide the best accuracy and mean per class accuracy, in particular, CRFs and HSMMs, respectively. The previous best results for those metrics, their standard deviations and our improved results are summarised in Table 1. Our results for HSMM and CRF differ from the ones that were published in [22]; in particular, the values of the mean per class accuracy that we obtained for the CRF method are significantly higher. The improved results are most likely
due to the enhancement of the MATLAB library L-BFGS (Limited-memory Broyden–Fletcher–Goldfarb–Shanno [4]). The improvement of the L-BFGS library in 2011 [3] has likely resulted in a better learning process of the Conditional Random Field model and, consequently, in an improved algorithm that yields a better performance.

Table 1: Previous best and our improved results per house (HSMM mean per class accuracy: 74.96 for house A, 65.18 for house B, 55.98 for house C).

Recently, Arifoglu et al. [2] applied SVMs and different types of Recurrent Neural Networks (RNNs) to the dataset. In their work, only a portion of the data is used for testing, which differs from the approach taken by van Kasteren et al. [22], where a full K-Fold cross validation is carried out. The results presented in [22] are therefore more trustworthy, hence we apply the same technique in this paper.

Singh et al. [20] applied an LSTM network to the dataset. Even though the results did not outperform state-of-the-art methods, this work demonstrated that LSTMs are capable of performing well given the temporal dependencies present in this dataset.

Other techniques such as stacked autoencoders [5] and modified weighted SVMs [1] have been considered in order to develop a classifier for the dataset. Furthermore, hybrid approaches have also been discussed and applied [10, 12, 17, 18, 19].

1.2 Roadmap

The rest of the paper is organised as follows. Section 2 introduces some of the ML models that were used. Section 3 presents the proposed feature representation and the pre-processing techniques utilised. Section 4 demonstrates the effect that the feature representation as well as the combination of different features has on a model's performance. We also show how our best results improve the state-of-the-art. Section 5 concludes this paper with pointers to future directions.
In this section, we present the task we aim to tackle and provide an overview of some of the ML models applied.

Given a dataset $\{(X_t^i, y_t)\}$, with $t = 1, \dots, T$ and $i = 1, \dots, N$, where $T$ is the number of data points and $N$ the number of features, the task is to learn a function $f : S^N \to \{1, \dots, c\}$, where $S$ is some abstract space and $c$ the number of activities. In this kind of task, both $\langle X, y \rangle$ need to be provided in order to perform supervised learning. For our dataset, $X$ represents the sensor data and $y$ the corresponding labels of the activities performed.

Naïve Bayes, Hidden Markov Model, Hidden Semi-Markov Model and Conditional Random Field constitute the state-of-the-art probabilistic models for this dataset. In the following sections we provide a brief description of those models.
The Naïve Bayes model assumes that data points are independently and identically distributed, which does not account for temporal dependencies or relations between data points with respect to an activity. Let $X = (x_1, \dots, x_T)$ be a sequence of data points and $y = (y_1, \dots, y_T)$ the corresponding labels. The joint probability of $y$ and $X$ is calculated as follows:

$$p(y, X) = \prod_{t=1}^{T} p(x_t \mid y_t)\, p(y_t),$$

where $p(x_t \mid y_t)$ is decomposed as

$$p(x_t \mid y_t) = \prod_{i=1}^{N} p(x_t^i \mid y_t)$$

by assuming that the features (e.g. sensors) are conditionally independent given an activity $y_t$ (the "naïve" conditional independence assumption). In other words, sensors $X^i$ and $X^j$ ($i, j \in \{1, \dots, N\}$, where $i \neq j$) are conditionally independent given label $y$. This assumption reduces the complexity of the classifier: given that $y$ occurs, knowledge of whether $X^i$ is active provides no information on the likelihood of $X^j$ being active, and vice versa.

The Hidden Markov Model is an extension of Naïve Bayes and is capable of modelling temporal dependencies between consecutive time steps. Following the same notation as in the previous section, the model relies on two independence assumptions: (i) $y_t$ is only dependent on $y_{t-1}$ (first order Markov assumption); (ii) $x_t$ is only dependent on $y_t$ (output independence assumption). Moreover, the HMM is also a stationary process, which implies that $p(y_t \mid y_{t-1}) = p(y_2 \mid y_1)$ for all $t \in \{2, \dots, T\}$. The joint probability is calculated as follows:

$$p(y, X) = \prod_{t=1}^{T} p(x_t \mid y_t)\, p(y_t \mid y_{t-1}).$$

We use maximum likelihood estimation (MLE) to estimate the parameters $\theta$ which maximise the likelihood of observing $y$ and $X$ given the model $\theta$: $\hat{\theta} = \arg\max_\theta P(y, X \mid \theta)$.

A Semi-Markov Model is a generalised Poisson process [8] where the holding times need not be independent and identically distributed.
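To make the Naïve Bayes factorisation above concrete, the decision rule can be sketched in a few lines of NumPy. The probability tables below are toy values, not estimates from the van Kasteren data, and the shapes are arbitrary:

```python
import numpy as np

# Toy Naive Bayes over binary sensor features (all probabilities are made up).
rng = np.random.default_rng(0)
C, N, T = 3, 4, 5                     # activities, sensors, time steps

prior = np.full(C, 1.0 / C)           # p(y)
p_on = rng.uniform(0.1, 0.9, (C, N))  # p(x^i = 1 | y)

X = rng.integers(0, 2, (T, N))        # observed binary sensor readings

# log p(x_t | y) = sum_i log p(x_t^i | y), by the conditional-independence assumption
log_lik = X @ np.log(p_on).T + (1 - X) @ np.log(1 - p_on).T   # shape (T, C)
log_post = log_lik + np.log(prior)

y_hat = log_post.argmax(axis=1)       # most likely activity per time step
print(y_hat)
```

In practice the tables `prior` and `p_on` would be estimated by counting label and sensor co-occurrences in the training folds.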
Although it is similar to a Markov renewal process [16], the Hidden Semi-Markov Model (HSMM) [24] is a stochastic process where a state has a corresponding length. The length of each state is determined by its duration. Therefore, this is a time-evolving process where the transition between states is made at jump times and depends upon the corresponding probability distributions.

The main difference between HMMs and HSMMs is the relaxation of the Markov assumption. In particular, HSMMs achieve this by modelling the duration of a state (e.g. an activity). Therefore, a new variable $d_t$ is introduced in this model and the joint probability is calculated as follows:

$$p(y, X, d) = \prod_{t=1}^{T} p(x_t \mid y_t)\, p(y_t \mid y_{t-1}, d_{t-1})\, p(d_t \mid d_{t-1}, y_t).$$

We use MLE to estimate the parameters $\theta$ which maximise the likelihood of observing $y$, $X$ and $d$ given the model $\theta$: $\hat{\theta} = \arg\max_\theta P(y, X, d \mid \theta)$.

2.1.4 Conditional Random Field

The Conditional Random Field model which is most structurally similar to the HMM is the linear-chain CRF. This model relies on the same independence assumptions as the HMM: (i) $y_t$ is only dependent on $y_{t-1}$ (first order Markov assumption); (ii) $x_t$ is only dependent on $y_t$ (output independence assumption). Unlike HSMMs, linear-chain CRF models do not explicitly model the duration of a state. The conditional distribution is calculated using the following expression:

$$p(y \mid X) = \frac{1}{Z(X)} \prod_{t=1}^{T} \exp\left( \sum_{l=1}^{L} \lambda_l f_l(y_t, y_{t-1}, x_t) \right),$$

where $f_l(y_t, y_{t-1}, x_t)$ is a feature function, $\lambda_l$ is a weight parameter and $L$ is the number of feature functions.
The potential function is the exponential of the weighted sum $\sum_l \lambda_l f_l(y_t, y_{t-1}, x_t)$, which can take any positive value; hence $Z(X)$ is needed as a normalisation term. A CRF is also a stationary process, and it uses the Conditional Maximum Likelihood Estimator (CMLE), which finds the CRF parameters $\theta$ that maximise the conditional likelihood of observing $y$ given the model $\theta$: $\hat{\theta} = \arg\max_\theta P(y \mid X, \theta)$. Therefore, unlike HMMs, which assume that the $x_t$ are conditionally independent, CRFs make no assumptions about $p(X)$.

One of the main differences between statistical and neural network models is related to interpretability. Unlike statistical ML models, neural network models do not provide interpretation, even though they do provide an effective representation of data properties [13]. In the following sections, three different recurrent neural network models are presented: RNN, LSTM and GRU.
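The role of the normalisation term $Z(X)$ in the linear-chain CRF above can be illustrated on a toy chain by brute-force enumeration of all label sequences (tractable only for tiny examples); the feature functions and weights below are hypothetical, not learned parameters:

```python
import itertools
import numpy as np

# Toy linear-chain CRF: two illustrative feature functions with made-up weights,
# normalised by summing the potential over every possible label sequence.
C, T = 2, 3                      # number of labels, chain length
x = np.array([0, 1, 1])          # toy observation sequence

lam = np.array([1.5, 0.8])       # weights lambda_l (hypothetical)

def feats(y_t, y_prev, x_t):
    # f_1: transition agreement, f_2: emission agreement (illustrative features)
    return np.array([float(y_t == y_prev), float(y_t == x_t)])

def potential(y):
    # unnormalised potential: exp of sum_t lambda . f(y_t, y_{t-1}, x_t),
    # starting at t=1 since y_0 is undefined in this toy chain
    s = sum(lam @ feats(y[t], y[t - 1], x[t]) for t in range(1, T))
    return np.exp(s)

Z = sum(potential(y) for y in itertools.product(range(C), repeat=T))
p = {y: potential(y) / Z for y in itertools.product(range(C), repeat=T)}
assert abs(sum(p.values()) - 1.0) < 1e-9   # Z makes the distribution sum to one
```

Real CRF training replaces this enumeration with dynamic programming (forward-backward), since the number of sequences grows as $c^T$.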
The RNN model considered is a fully-connected RNN, where the output is fed back to the input. Hence, RNNs contain loops, which is what allows these types of networks to learn temporal dependencies. Let $x = (x_1, \dots, x_T)$ be an input sequence and $h = (h_1, \dots, h_T)$ the hidden vector sequence computed by a recurrent neural network. In an RNN, the hidden vector $h^{(t)}$, at time step $t$, is computed as follows:

$$h^{(t)} = \phi(W_h h^{(t-1)} + W_x x^{(t)} + b_h),$$

where $\phi$ is the activation function. The parameters $W$ and $b$ are the weight matrices and bias vector, respectively.

2.2.2 Long Short-Term Memory Network

In long-term dependencies, when there is a large time gap between where specific information is stored and where it is needed, RNNs do not perform well; LSTMs [11] are a better and more robust solution. LSTMs are a type of RNN which is able to detect dependencies across long time windows. The LSTM architecture is composed of connected cells, and each cell contains three gates: the input ($i^{(t)}$), output ($o^{(t)}$) and forget ($f^{(t)}$) gates, which control the information that is added to or removed from the cell. Moreover, besides having an internal state $c^{(t)}$, a cell also contains a layer which produces the variable $\tilde{c}^{(t)}$. This variable represents the candidate values which may potentially be added to the internal state.

This type of network is able to learn the importance of features over time by storing information in the hidden layers. This is done by optimising the weights that affect the information flow. Consequently, LSTMs can lead to a better comprehension of data patterns, which makes them useful in the field of HAR. The following equations are used, in an iterative manner, to obtain the value $h^{(t)}$, at time step $t$, of the output vector of the cell. The symbol $\odot$ denotes element-wise multiplication.
$$\begin{aligned}
i^{(t)} &= \sigma(W_{ih} h^{(t-1)} + W_{ix} x^{(t)} + b_i) \\
f^{(t)} &= \sigma(W_{fh} h^{(t-1)} + W_{fx} x^{(t)} + b_f) \\
o^{(t)} &= \sigma(W_{oh} h^{(t-1)} + W_{ox} x^{(t)} + b_o) \\
\tilde{c}^{(t)} &= \phi(W_{ch} h^{(t-1)} + W_{cx} x^{(t)} + b_c) \\
c^{(t)} &= i^{(t)} \odot \tilde{c}^{(t)} + f^{(t)} \odot c^{(t-1)} \\
h^{(t)} &= o^{(t)} \odot \phi(c^{(t)}),
\end{aligned}$$

where $\sigma$ and $\phi$ are the activation functions.

The Gated Recurrent Unit [7] is a variation of the LSTM, in which the input and forget gates are combined into one and the cell state and hidden state are the same. Moreover, a new gate called the relevance gate is considered in this architecture; it calculates how relevant $c^{(t-1)}$ is to compute $c^{(t)}$. In a GRU, the equations used to obtain $h^{(t)}$, at time step $t$, are as follows:

$$\begin{aligned}
i^{(t)} &= \sigma(W_{ic} c^{(t-1)} + W_{ix} x^{(t)} + b_i) \\
r^{(t)} &= \sigma(W_{rc} c^{(t-1)} + W_{rx} x^{(t)} + b_r) \\
\tilde{c}^{(t)} &= \phi(W_{cc} (r^{(t)} \odot c^{(t-1)}) + W_{cx} x^{(t)} + b_c) \\
c^{(t)} &= i^{(t)} \odot \tilde{c}^{(t)} + (1 - i^{(t)}) \odot c^{(t-1)} \\
h^{(t)} &= c^{(t)},
\end{aligned}$$

where $\sigma$ and $\phi$ are the activation functions.

Figure 1: Relative frequencies of activities in houses A, B and C. (a) House A; (b) House B; (c) House C.

The dataset which will be used in the experiments refers to sensor activity in three different houses (A, B and C) [22]. The data represents the activation and deactivation of binary sensors, where a reading is provided every minute for time spans ranging from 14 to 25 days. As a result, the data contains long stretches where the sensor readings do not change. For example, for houses B and C, on average, the sensors change state only every one and a half hours. Van Kasteren et al. [22] used various types of binary sensors (e.g. passive infrared sensors, pressure mats, reed switches), which were placed in three different environments: houses A, B and C.
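The LSTM cell equations given earlier in this section translate almost line by line into NumPy. The sketch below uses toy dimensions and random weights, not the configuration used in our experiments:

```python
import numpy as np

# One step of the LSTM cell equations, with toy shapes and random weights.
rng = np.random.default_rng(1)
n_in, n_hid = 3, 4

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))   # gate activation (logistic)
phi = np.tanh                                 # cell activation

# For each gate g in {i, f, o, c}: recurrent weights W_{gh}, input weights
# W_{gx} and bias b_g, matching the equations above.
W = {g: (rng.standard_normal((n_hid, n_hid)),
         rng.standard_normal((n_hid, n_in)),
         np.zeros(n_hid)) for g in "ifoc"}

def lstm_step(x_t, h_prev, c_prev):
    pre = {g: Wh @ h_prev + Wx @ x_t + b for g, (Wh, Wx, b) in W.items()}
    i_t, f_t, o_t = sigma(pre["i"]), sigma(pre["f"]), sigma(pre["o"])
    c_tilde = phi(pre["c"])                  # candidate values
    c_t = i_t * c_tilde + f_t * c_prev       # element-wise products
    h_t = o_t * phi(c_t)
    return h_t, c_t

h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c)
```

In the experiments we rely on a standard deep learning framework rather than this hand-rolled cell; the sketch only makes the gating arithmetic explicit.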
In order to map the observations obtained from these sensors to activities, an annotation system was put in place [23]. The relative frequencies of activities in the three different houses are represented in Figure 1. Table 2 presents some information about this dataset, in particular, the number of sensors placed around the house, the number of activities, the age of the person who inhabited the house and how many days of data we have. In general, the most frequent labels in the three houses are 'Idle', 'Leave house' and 'Go to bed'. A slightly higher frequency of the label 'Idle' is noticeable for house C. On the other hand, the label 'Leave house' has a higher frequency in houses A and B.
Since there are long periods of time where the sensors do not change, learning temporal dependencies on this type of data requires a long history of previous data points, denoted as the look-back window. We have observed that there is a gradual increase of training time with higher values for the look-back window. To overcome this, we propose a new representation for sensor data called the observation-based (OB) representation, which combines consecutive data points with the same sensor readings into one data point. Hence, data points are merged if sensor readings remain unchanged.

Table 2: Details about the dataset

House  Sensors  Activities  Age  Duration (days)
A      14       10          26   25
B      23       13          28   14
C      21       16          57   19

Furthermore, three different feature representations were considered in [22]: raw, changepoint and last-fired. These were initially introduced in [23] and are a way of comparing how the data is given as an input and the impact that it has on the overall recognition performance. In the raw representation, a sensor takes value 1 when it is activated and 0 otherwise; with the changepoint representation, a sensor takes value 1 when it changes state and 0 otherwise; the last-fired representation makes the last sensor that changed state take value 1 until another sensor changes its state.

In comparison, our proposed representation is more expressive than the changepoint and last-fired representations, because it yields information about the current and/or most recent sensors that have changed their values, without having to provide a large value for the look-back window. The disadvantage of a large look-back window is that it may affect the classification of other activities which do not require all the information provided by the data that is fed into the network.
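A minimal sketch of the OB merge, assuming a toy two-sensor reading matrix (the activity labels, which must also be carried along in practice, are omitted for brevity):

```python
import numpy as np

# Observation-based (OB) merge: consecutive minutes with identical sensor
# readings collapse into one data point; delta_t counts the merged minutes.
readings = np.array([[1, 0],   # minute 1
                     [1, 0],   # minute 2 (unchanged -> merged into minute 1)
                     [0, 0],   # minute 3
                     [0, 0],   # minute 4
                     [0, 0],   # minute 5
                     [1, 1]])  # minute 6

merged, delta_t = [], []
for row in readings:
    if merged and np.array_equal(row, merged[-1]):
        delta_t[-1] += 1       # same readings: extend the current data point
    else:
        merged.append(row)     # readings changed: start a new data point
        delta_t.append(1)

print(len(merged), delta_t)    # -> 3 [2, 3, 1]
```

Six raw data points become three OB data points, and the resulting durations feed directly into the $\Delta t$ feature discussed next.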
When computing the OB representation, the variable $\Delta t$ is obtained by calculating how long the sensor readings remain unchanged. Since the dataset provides sensor readings in 60-second intervals, $\Delta t$ indicates the duration (in minutes) of no change for a sensor reading. We study the effect of using this variable as well as the hour variable, which represents the hour of a sensor reading. The information provided by the latter can be useful for classification purposes.

Figure 2: Frequencies of the $\Delta t$ variable for houses A, B and C. (a) House A; (b) House B; (c) House C.

The frequency of each possible value of the variable $\Delta t$ in house A is presented in Figure 2a, and we observe a similar distribution for houses B and C (Figures 2b and 2c). For house A, this variable can take values from 1 to 2732. In order to keep the number of features small, we further discretise $\Delta t$ into coarser bins. Hence, each bin essentially represents an interval. Based on the relative frequency, we considered two different ways of splitting this variable into intervals: one results in a total of 48 intervals (48i) and the other in a total of 7 (7i). The difference between the two lies in the importance of categorising smaller durations. Let $t_i \in \Delta t$. In 48i, the following cases were considered:
1. Each $t_i$ is uniquely encoded if $t_i \le 30$;
2. One encoding representation for each of the following intervals:

(a) $30 < t_i \le 40$    (g) $120 < t_i \le 150$   (m) $300 < t_i \le 360$
(b) $40 < t_i \le 50$    (h) $150 < t_i \le 180$   (n) $360 < t_i \le 420$
(c) $50 < t_i \le 60$    (i) $180 < t_i \le 210$   (o) $420 < t_i \le 480$
(d) $60 < t_i \le 80$    (j) $210 < t_i \le 240$   (p) $480 < t_i \le 540$
(e) $80 < t_i \le 100$   (k) $240 < t_i \le 270$   (q) $540 < t_i \le 600$
(f) $100 < t_i \le 120$  (l) $270 < t_i \le 300$   (r) $t_i > 600$

On the other hand, for 7i, each of the intervals below was uniquely encoded:

(a) $t_i \le 5$          (c) $30 < t_i \le 60$     (e) $120 < t_i \le 150$   (g) $t_i > 660$
(b) $5 < t_i \le 30$     (d) $60 < t_i \le 120$    (f) $150 < t_i \le 660$

We then encode each interval considering two different encoding processes: one-hot and unary-based encodings. The one-hot encoding process generates a square matrix, where the number of rows is the same as the number of values. Therefore, it creates new binary columns, indicating the presence of each possible value. The unary-based encoding also creates a square matrix of the same dimension as the matrix generated by the previous encoding process. The main difference between these two encoding processes lies in the interpretation of the binary columns. In the one-hot encoding, the binary columns indicate the presence of each possible value, therefore only one component in each column will take value one.
On the other hand, for the unary-based encoding, the binary columns indicate the presence of values that are less than or equal to each possible value; hence, without loss of generality, supposing the values are in ascending order, all the elements of the lower triangle of the matrix will take value one.

As for the hour variable, which can take values from 0 to 23, we also encode it using the two aforementioned processes (one-hot and unary-based encodings), but we consider each number a category, so for this particular variable we have exactly 24 values. Hence, each category is representative of the hour of the sensor reading. We encode this variable so that the values of the features are in the same range as the other features, because this makes training faster and reduces the chances of getting stuck in local optima.
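The 7i discretisation and the two encodings above can be sketched as follows; the helpers `bin_7i`, `one_hot` and `unary` are illustrative names, not code from our implementation:

```python
import numpy as np

# Discretise a duration delta_t (minutes) into the 7 intervals (7i), then
# encode the interval index with the one-hot and unary-based schemes.
edges = [5, 30, 60, 120, 150, 660]            # upper bounds of intervals (a)-(f)

def bin_7i(t):
    # interval index 0..6: first interval whose upper bound covers t, else the last
    return next((k for k, ub in enumerate(edges) if t <= ub), len(edges))

def one_hot(k, n=7):
    v = np.zeros(n, dtype=int)
    v[k] = 1                                  # presence of exactly one value
    return v

def unary(k, n=7):
    v = np.zeros(n, dtype=int)
    v[:k + 1] = 1                             # presence of all values <= the observed one
    return v

print(one_hot(bin_7i(45)))   # [0 0 1 0 0 0 0]  (45 min falls in 30 < t <= 60)
print(unary(bin_7i(45)))     # [1 1 1 0 0 0 0]
```

Stacking `one_hot(k)` for every `k` yields the identity-like square matrix described above, while stacking `unary(k)` in ascending order yields its lower-triangular counterpart.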
In the following experiments, an OB representation of the dataset is used in order to compare and evaluate against other feature representations. The OB representation is obtained by directly collecting information from the sensors, which corresponds to the data in its raw representation format. As demonstrated in Table 19 (Appendix A), the raw representation gives the worst results, irrespective of the algorithm. Hence, a good performance by both generative and discriminative algorithms always depends on considering a changepoint or last-fired representation. The OB representation provides a generalisation of the changepoint and last-fired representations. Some further discussion of the proposed method and analysis of the results is presented in the following sections.

These experiments were run using a K-Fold cross validation approach, where we cycle through each one of the days, using it for testing while the data corresponding to the remaining days is used for training. This is consistent with the technique applied by van Kasteren et al. [22]. The mean per class accuracy as well as the overall accuracy are presented as evaluation metrics, and the accuracy for each class is also calculated. In regard to the neural network models - RNN, LSTM, GRU, MLP and LSTM with a CRF layer (LSTMCRF) - we considered the following set of hyper-parameters: 128 for the number of units and a learning rate of 0.

We will be using the mean per class accuracy and the accuracy as evaluation metrics for our experiments. The latter can be defined as follows. Let $pred$ and $true$ be the $N$-dimensional arrays which contain the model's predictions and the true labels of each data point, respectively. Then, the accuracy is the percentage of correctly predicted activities, i.e.:

$$\text{accuracy} = \frac{|\{ i \in \{1, \dots, N\} \mid pred(i) = true(i) \}|}{N}.$$
Given the imbalance of the dataset, a classifier would not be properly evaluated if accuracy were the only metric used to assess its performance. Therefore, the accuracy for each class is also presented, in order to analyse whether the models are able to accurately classify not only highly frequent classes but also infrequent ones. Formally, the accuracy of a class $c$ is given by

$$\text{accuracy}_c = \frac{|\{ i \in \{1, \dots, N_c\} \mid pred_c(i) = true_c(i) \}|}{|\{ i \in \{1, \dots, N\} \mid true(i) = c \}|},$$

where $pred_c$ and $true_c$ are the $N_c$-dimensional arrays which contain the model's predictions and the true labels of each data point belonging to class $c$, respectively.

Lastly, we define the mean per class accuracy as follows. Let $c \in \{1, \dots, C\}$, where $C$ is the number of activities in a dataset. Then, the mean per class accuracy is calculated according to the following expression:

$$\text{mean per class accuracy} = \frac{1}{C} \sum_{c=1}^{C} \text{accuracy}_c.$$

The best values for the mean per class accuracy, overall accuracy and per-class accuracies are highlighted in bold.
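The three metrics above follow directly from their definitions; `pred` and `true` in the sketch below are toy arrays, not outputs of our models:

```python
import numpy as np

# Evaluation metrics computed directly from their definitions.

def accuracy(pred, true):
    # fraction of data points whose prediction matches the true label
    pred, true = np.asarray(pred), np.asarray(true)
    return np.mean(pred == true)

def class_accuracy(pred, true, c):
    # accuracy restricted to data points whose true label is c
    mask = np.asarray(true) == c
    return np.mean(np.asarray(pred)[mask] == c)

def mean_per_class_accuracy(pred, true):
    # unweighted average of the per-class accuracies
    classes = np.unique(true)
    return np.mean([class_accuracy(pred, true, c) for c in classes])

pred = [0, 0, 1, 1, 2]
true = [0, 1, 1, 1, 2]
print(accuracy(pred, true))                  # 0.8
print(mean_per_class_accuracy(pred, true))   # (1 + 2/3 + 1) / 3, about 0.889
```

The gap between the two numbers on imbalanced toy data is exactly why both metrics are reported: a dominant class can mask poor performance on rare ones.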
All the experiments presented in this section take into consideration neither the hour feature (NoToD, where ToD stands for Time of Day) nor the $\Delta t$ feature (NoDeltaT), i.e. the features hour and $\Delta t$ were not added to the dataset.

In the following tables (Tables 3, 4 and 5), we evaluate 8 different methods using a raw feature representation: NB, HMM, HSMM, CRF, LSTM, GRU, RNN and LSTMCRF. We considered a look-back window of 1, and this serves as a baseline for the experiments run in the next subsections. In particular, the results provided by the methods NB, HMM, HSMM and CRF were obtained by reproducing the experiments done in [22]. The CRF model outperformed the other models for houses A and C. Specifically, the accuracy achieved for house A was 91.85.

Table 3: Accuracy per label (Raw feature representation) - House A (Look-back window: 1)
Table 4: Accuracy per label (Raw feature representation) - House B (Look-back window: 1)
Table 5: Accuracy per label (Raw feature representation) - House C (Look-back window: 1)
We have also applied the LSTM, GRU, RNN and LSTMCRF methods to the raw and OB feature representations, considering look-back window values of 2, 5 and 10. In Tables 6, 7, 8, 9, 10 and 11, we present the results achieved for houses A, B and C across these look-back window values.

In regard to house A, we observe that, for all methods, this dataset does not require a large value for the look-back window in order to accurately classify highly frequent labels. LSTM is the method which provides the highest accuracy considering a look-back window of 2. Also, considering the mean per class accuracy, GRUs perform better than any of the other RNN-based methods. We observe that the optimal value for the look-back window here was 5, which differs by only 0.3 percentage points from the result obtained for the same method with a look-back window of 2; therefore, since the difference between the mean per-class accuracies is not significant, a small look-back window provides enough knowledge to achieve a good performance in this classification task.

For house B, LSTMCRF is the method which provides the highest accuracy considering a look-back window of 2. As for the mean per class accuracy, RNN with a look-back window of 5 is the method that performs best, but we observe once again that there is not a significant difference between the mean per class accuracies for look-back windows of 2 and 5.

Lastly, for house C, the RNN method achieved the highest values for the evaluation metrics considered, where look-back windows of 5 and 2 gave the best results for the mean per-class accuracy and the accuracy, respectively.

Table 6: Accuracy per label using LSTM and GRU (Raw vs OB feature representations) - House A (Look-back window: 2, 5 and 10)
Label           LSTM 2 (Raw/OB)  LSTM 5 (Raw/OB)  LSTM 10 (Raw/OB)  GRU 2 (Raw/OB)  GRU 5 (Raw/OB)  GRU 10 (Raw/OB)
'Idle'          25.27/87.69      31.12/81.29      39.07/68.29       26.13/86.26     30.57/81.51     37.9/68.6
'Leave house'   96.46/99.86      96.37/98.9       96.18/87.78       96.46/99.88     96.38/98.96     96.24/99.75
'Use toilet'    42.19/62.47      43.01/57.53      40.55/18.36       45.21/64.11     49.04/59.73     47.12/57.26
Table 7: Accuracy per label using RNN and LSTMCRF (Raw vs OB feature representations) - House A (Look-back window: 2, 5 and 10)
Table 8: Accuracy per label using LSTM and GRU (Raw vs OB feature representations) - House B (Look-back window: 2, 5 and 10)

Label                 LSTM 2 (Raw/OB)  LSTM 5 (Raw/OB)  LSTM 10 (Raw/OB)  GRU 2 (Raw/OB)  GRU 5 (Raw/OB)  GRU 10 (Raw/OB)
'Leaving the house'   81.62/81.33      80.72/71.73      82.64/56.73       81.65/83.55     85.32/80.09     78.74/65.5
Table 9: Accuracy per label using RNN and LSTMCRF (Raw vs OB feature representations) - House B (Look-back window: 2, 5 and 10)
Label     RNN 2 (Raw/OB)  RNN 5 (Raw/OB)  RNN 10 (Raw/OB)  LSTMCRF 2 (Raw/OB)  LSTMCRF 5 (Raw/OB)  LSTMCRF 10 (Raw/OB)
'Idle'    35.52/46.28     38.41/39.59     38.62/43.79      39.86/54.97         57.31/42.83         51.79/47.45
Table 10: Accuracy per label using LSTM and GRU (Raw vs OB feature representations) - House C (Look-back window: 2, 5 and 10)

Label                    LSTM 2 (Raw/OB)  LSTM 5 (Raw/OB)  LSTM 10 (Raw/OB)  GRU 2 (Raw/OB)  GRU 5 (Raw/OB)  GRU 10 (Raw/OB)
'Use toilet downstairs'  0.0/0.0          0.0/0.0          0.0/0.0           0.0/0.0         0.0/0.0         0.0/0.0
'Take shower'            0.0/3.68         4.74/14.74       2.11/0.0          0.53/6.84       2.63/7.37       2.63/8.95
'Brush teeth'            1.98/0.0         0.0/0.0          0.0/0.0           2.97/0.0        1.98/0.0        0.0/0.0
'Use toilet upstairs'    8.75/1.25        1.25/0.0         0.0/0.0           11.25/5.0       2.5/0.0         1.25/0.0
'Shave'                  0.0/0.0          2.9/0.0          0.0/0.0           0.0/1.45        1.45/0.0        0.0/0.0
'Go to bed'              68.39/96.26      74.87/96.39      78.09/88.53       70.11/93.43     74.68/97.19     75.36/91.92
'Get dressed'            7.14/7.14        7.14/19.64       12.5/10.71        8.04/15.18      9.82/20.54      4.46/25.0
Table 11: Accuracy per label using RNN and LSTMCRF (Raw vs OB feature representations) - House C (Look-back window: 2, 5 and 10)
[Table values garbled in extraction.]
In general, we observe that for lower look-back window values our proposed feature representation achieves significantly better results than the raw representation. Moreover, from the results obtained for the LSTM, GRU, RNN and LSTMCRF models, we conclude that the best accuracy for all houses was obtained by considering a look-back window of 2 and an OB feature representation of the data. In addition, the neural network models do not seem to benefit much from concatenating multiple data points for training, as those techniques learn temporal dependencies differently.

We also note that, when considering the raw feature representation, a longer look-back window is required for the LSTM models to obtain reasonable results. In particular, it becomes hard to accurately predict labels due to the long-term dependencies inherent to the raw feature representation. The results therefore indicate that there is an advantage in using the proposed feature representation. The OB feature representation is shown to be beneficial not only in obtaining a higher accuracy but also in decreasing the training time, given that a better performance is achieved with a low look-back window value.
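The look-back windows discussed above can be sketched as follows (an illustrative sketch, not the paper's exact pipeline; the toy sensor stream and labels are invented):

```python
# Build look-back windows over a stream of binary sensor readings,
# as used to give temporal context to the recurrent models
# (illustrative sketch only).

def make_windows(readings, labels, window):
    """Return (X, y) where X[i] holds the `window` most recent
    readings ending at time step i, and y[i] is the label at i."""
    X, y = [], []
    for t in range(window - 1, len(readings)):
        X.append(readings[t - window + 1 : t + 1])
        y.append(labels[t])
    return X, y

# Toy stream: 4 binary sensors sampled once per 60-second interval.
readings = [
    [1, 0, 0, 0],  # front door open
    [0, 0, 0, 0],
    [0, 1, 0, 0],  # toilet flush
    [0, 1, 0, 0],
    [0, 0, 1, 0],  # couch pressure mat
]
labels = ["leave_house", "idle", "use_toilet", "use_toilet", "relax"]

X, y = make_windows(readings, labels, window=2)
print(len(X), y[0])  # 4 idle
```

With window=2 each training example pairs the current interval with its predecessor; larger windows trade longer temporal context for fewer training examples and longer training time.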
In this experiment, we use the probabilistic models and a feed-forward neural network model with an OB feature representation. Unlike recurrent neural networks, models such as NB, HMM, HSMM, CRF and MLP are limited to a single "time step" (i.e. a look-back window of 1). However, it is possible to provide look-back information to these models. We accomplish this by feeding in a sequence which contains concatenated data points. Specifically, we add the most recent data points as further features of the current single data point. We consider 2, 5 and 10 as the possible numbers of recent data points to be concatenated with the current one.

We do not consider the raw representation for these models, as it would result in low-information signals, where repeated information would be given as input to the models in the form of equal concatenated data points.

For both the overall accuracy and the per-class accuracies, CRFs were able to outperform all the experiments done thus far by using an OB feature representation (Tables 12, 13 and 14). The best accuracy values were obtained by concatenating 5 data points for house A (97.14 ± …).

Table 12: Accuracy per label (OB feature representation) - House A (Data points concatenated: 2, 5 and 10)
[Table values garbled in extraction.]
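The concatenation of recent data points as extra features of the current one, as described above, can be sketched like this (an illustrative sketch; the helper name and toy vectors are ours, not the paper's):

```python
# Concatenate the k most recent data points onto the current one so
# single-step models (NB, HMM, HSMM, CRF, MLP) can see look-back
# context (illustrative sketch only).

def concat_recent(sequence, k):
    """Flatten each data point together with its k predecessors
    (zero-padded at the start of the sequence)."""
    dim = len(sequence[0])
    pad = [0] * dim
    out = []
    for t in range(len(sequence)):
        flat = []
        for j in range(t - k, t + 1):  # k previous points + current
            flat.extend(sequence[j] if j >= 0 else pad)
        out.append(flat)
    return out

frames = [[1, 0], [0, 1], [1, 1], [0, 0]]
flat = concat_recent(frames, k=2)
# Every output vector has (k + 1) * dim = 6 features.
print(flat[2])  # [1, 0, 0, 1, 1, 1]
```

Unlike the windowed sequences fed to recurrent models, here the window is flattened into one feature vector, so the model itself remains single-step.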
Table 13: Accuracy per label (OB feature representation) - House B (Data points concatenated: 2, 5 and 10)
[Table values garbled in extraction.]
Table 14: Accuracy per label (OB feature representation) - House C (Data points concatenated: 2, 5 and 10)
[Table values garbled in extraction.]
From the results presented in the last section, it is possible to conclude that CRF is the algorithm which performs best overall when using an OB feature representation. In this section, we show the results obtained by adding the time of day (hour) as a further feature to the dataset. In total, we considered fifteen different feature combinations in our experiments. In all the experiments presented in Section 4.2, the features hour (ToD) and ∆t (DeltaT) were not added to the dataset (
NoToD&NoDeltaT). In order to test and evaluate the need to better distinguish duration intervals, we considered all the other feature combinations, which result from adding a one-hot (unary-based) encoding of i intervals of the feature ∆t - OneHotDeltaTi (UnaryDeltaTi) - and/or a one-hot (unary-based) encoding of the feature hour - OneHotToD (UnaryToD) - to the dataset.

In the following experiment, we evaluate the improvements in accuracy obtained by considering more ∆t values. The results are shown in Table 15.

From the results, we see that the best performance for houses A, B and C resulted from the feature combinations UnaryToD&UnaryDeltaT7 (5 data points concatenated), UnaryToD&UnaryDeltaT48 (5 data points concatenated) and OneHotToD&UnaryDeltaT48 (10 data points concatenated), respectively. Furthermore, we observe that only house C significantly benefits from using more ∆t values and, generally, one-hot and unary-based encodings produce similar results for all houses.

Specifically, 98.… ± 1.62 was the best result achieved for house A, where a unary-based encoding with 7 bins was considered. For house B, the best result achieved was 96.… ± 6.35, by applying a unary-based encoding with 48 bins.

Table 15: Accuracy using CRF (OB feature representation) - Houses A, B and C
[Table body garbled in extraction: accuracy and standard deviation for each of the fifteen feature combinations, for houses A, B and C, with 2, 5 and 10 data points concatenated.]
For house C, the best result achieved was … ± 15.27, by using a unary-based encoding with 48 bins.
In this section, we present our best results as well as the corresponding confusion matrices, and compare them against the state-of-the-art (Tables 16, 17 and 18). The state-of-the-art methods for this dataset are HSMM and CRF using changepoint and last-fired feature representations, respectively [22].

Table 16: Accuracy and mean per class accuracy rates (%) and their standard deviation for state-of-the-art methods and our best method for house A - CRF using OB feature representation (UnaryToD&UnaryDeltaT7 - Data points concatenated: 5)
Label | HSMM (Changepoint) [22] | CRF (Last-fired) [22] | This paper
[Per-label accuracies garbled in extraction.]
Mean per class accuracy | 74.96 | 69.35 | …
Standard deviation | 12.10 | 12.07 | 12.43
Accuracy | 91.81 | 96.93 | …
Standard deviation | 5.88 | 2.11 | 1.62
Figure 3: House A: Confusion matrices of the aforementioned models (a) HSMM (Changepoint) (b) CRF (Last-fired) (c) CRF (OB)
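The accuracy, per-class accuracies and mean per-class accuracy (with standard deviation) reported in these tables can be computed from a confusion matrix as in the following sketch (the toy matrix is invented, not from the paper; whether the paper uses the population or sample standard deviation is not stated here, and `pstdev` is one choice):

```python
# Overall accuracy, per-class accuracies and mean per-class accuracy
# (with its standard deviation) from a confusion matrix (sketch).
from statistics import mean, pstdev

def metrics(cm):
    """cm[i][j] = number of instances of true class i predicted as j."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    per_class = [100.0 * cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return 100.0 * correct / total, per_class, mean(per_class), pstdev(per_class)

cm = [[80, 10, 10],   # rows: true class, columns: predicted class
      [ 5, 90,  5],
      [20, 20, 60]]
acc, per_class, mpca, sd = metrics(cm)
print(round(acc, 2), per_class)  # 76.67 [80.0, 90.0, 60.0]
```

Mean per-class accuracy weights every class equally, which is why it penalises misclassified infrequent activities far more than the overall accuracy does.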
For house A (Table 16), we observe that the accuracy of every label increased by applying a CRF model with our proposed representation. In particular, the label whose accuracy benefited the most from using the OB representation was 'Get snack', which improved by 45%. Other labels that had significant improvements were 'Take shower' (17%), 'Prepare breakfast' (17%) and 'Get drink' (23%). On average, considering the label accuracies, we observe an improvement of 13% between the best value obtained from the state-of-the-art methods (HSMM (Changepoint) and CRF (Last-fired)) and the CRF model with our proposed representation.

Table 17: Accuracy and mean per class accuracy rates (%) and their standard deviation for state-of-the-art methods and our best method for house B - CRF using OB feature representation (UnaryToD&UnaryDeltaT48 - Data points concatenated: 5)
Label | HSMM (Changepoint) [22] | CRF (Changepoint) [22] | This paper
[Per-label accuracies garbled in extraction.]
Mean per class accuracy | 65.18 | 58.06 | …
Standard deviation | 13.41 | 7.01 | 22.35
Accuracy | 82.27 | 94.99 | …
Standard deviation | 13.51 | 5.71 | 6.35
Figure 4: House B: Confusion matrices of the aforementioned models (a) HSMM (Changepoint) (b) CRF (Changepoint) (c) CRF (OB)
In regard to house B (Table 17), we observe that the accuracy of most of the labels improves, but the labels 'Take shower' and 'Get a drink' decrease by 1% and 14%, respectively. In particular, the accuracy of the label 'Take shower' decreases due to its being misclassified as 'Go to bed' and 'Prepare brunch'. As for the label 'Get a drink', it is classified 63% of the time as 'Idle', 'Brush teeth' or 'Prepare brunch'. Nevertheless, on average, we obtain an improvement of 10.3% between the best value obtained from the state-of-the-art methods and the CRF model with the OB representation.

Table 18: Accuracy and mean per class accuracy rates (%) and their standard deviation for state-of-the-art methods and our best method for house C - CRF using OB feature representation (OneHotToD&UnaryDeltaT48 - Data points concatenated: 10)
Label | HSMM (Last-fired) [22] | CRF (Last-fired) [22] | This paper
[Per-label accuracies garbled in extraction.]
Mean per class accuracy | 55.98 | 46.79 | …
Standard deviation | 15.4 | 15.63 | 18.99
Accuracy | 84.48 | 90.69 | …
Standard deviation | 13.17 | 9.05 | 15.27
Figure 5: House C: Confusion matrices of the aforementioned models (a) HSMM (Last-fired) (b) CRF (Last-fired) (c) CRF (OB)
We observe that the largest improvement in label accuracy was obtained for house C (Table 18): on average, there was an improvement of 22% between the best value obtained from the state-of-the-art methods (HSMM (Changepoint) and CRF (Last-fired)) and the CRF model with our proposed representation. One exception is the label 'Use toilet downstairs'. The highest accuracy for this label is obtained with the HSMM method and a last-fired representation. This occurs because, most of the time, the other two feature representations misclassify this highly infrequent label as 'Idle' (see the confusion matrices in Figures 5b and 5c). From Figure 1c, we know that this is a highly infrequent label in this dataset.

From the experiments above, we conclude that the OB representation outperformed the state-of-the-art feature representations; in general, there is a significant improvement not only in the per-class accuracies but also in the overall accuracy.

Even though CRFs outperform HSMMs from an overall accuracy standpoint, when considering the per-class accuracy, HSMMs are sometimes able to classify infrequent classes better than CRFs. This results from the learning process each method undertakes. Specifically, HSMMs build a model p(x_t | y_t) for each class, whereas CRFs use the same model for all classes by computing p(c | X), which causes competition among classes. Consequently, if a dataset is imbalanced, a higher likelihood may be obtained if the data points are classified as the dominant class(es) than if the less frequent classes are considered and some of the dominant ones are misclassified [23].

In this paper, we have presented a thorough study of different ML techniques for a standard HAR dataset. Our experiments show that a significant improvement was made in comparison to state-of-the-art methods in the HAR field. A new representation for the data that is given as input to a model was presented.
The results have shown that, by applying such a representation, models are better able to learn data patterns and, consequently, to successfully perform a classification task in the HAR domain for both dominant and minor classes.

By using an OB representation, we improved the mean per-class accuracy and the accuracy for house A by 13.44% and 2.…%, for house B by ….9% and 1.…%, and for house C by ….56% and 3.41%, for the respective evaluation metrics considered.

Given the results obtained with an observation-based representation, its usage may also be suitable and advantageous in other domains. Moreover, using adversarial zero-shot learning [15, 21] to recognise abnormal human activity is an interesting direction for future work.

Acknowledgements
The support of NVIDIA Corporation is gratefully acknowledged with the dona-tion of the Quadro P6000 GPU used for this research.
References

[1] M. Abidine, L. Fergani, B. Fergani, and M. Oussalah. The joint use of sequence features combination and modified weighted SVM for improving daily activity recognition. Pattern Analysis and Applications, 21:119–138, 2018.
[2] D. Arifoglu and H. Bouchachia. Activity recognition and abnormal behaviour detection with recurrent neural networks. Procedia Computer Science, 110:86–93, 2017.
[3] S. Becker. LBFGSB (L-BFGS-B) mex wrapper. Accessed: 2019-12-05.
[4] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16:1190–1208, 1995.
[5] G. Chen, A. Wang, S. Zhao, L. Liu, and C.-Y. Chang. Latent feature learning for activity recognition using simple sensors in smart homes. Multimedia Tools and Applications, 77:15201–15219, 2018.
[6] L. Chen, J. Hoey, C. Nugent, D. Cook, and Z. Yu. Sensor-based activity recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6):790–808, 2012.
[7] K. Cho, B. van Merriënboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1724–1734, 2014.
[8] P. C. Consul and F. Famoye. Generalized Poisson distribution. In Lagrangian Probability Distributions, pages 165–190. Birkhäuser, 2006.
[9] E. De la Hoz, P. Ariza, J. Medina, and M. Espinilla. Sensor-based datasets for human activity recognition – a systematic review of literature. IEEE Access, 6:59192–59210, 2018.
[10] K. Guo, Y. Li, Y. Lu, X. Sun, S. Wang, and R. Cao. An activity recognition-assistance algorithm based on hybrid semantic model in smart home. International Journal of Distributed Sensor Networks, 12, 2016.
[11] S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9:1735–1780, 1997.
[12] I. Ihianle. A Hybrid Approach to Recognising Activities of Daily Living from Patterns of Objects Use. PhD thesis, University of East London, Architecture Computing and Engineering, 2018.
[13] M. Karlaftis and E. Vlahogianni. Statistical methods versus neural networks in transportation research: Differences, similarities and some insights. Transportation Research Part C: Emerging Technologies, 19:387–399, 2011.
[14] J. Kaye, S. Maxwell, N. Mattek, T. Hayes, H. Dodge, M. Pavel, H. Jimison, K. Wild, L. Boise, and T. Zitzelberger. Intelligent systems for assessing aging changes: Home-based, unobtrusive, and continuous assessment of aging. The Journals of Gerontology, Series B, Psychological Sciences and Social Sciences, 66 Suppl 1:i180–i190, 2011.
[15] C. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 951–958, 2009.
[16] N. Limnios and G. Oprişan. Markov renewal processes. In Semi-Markov Processes and Reliability, pages 31–49. Birkhäuser, 2001.
[17] G. Okeyo, L. Chen, H. Wang, and R. Sterritt. A hybrid ontological and temporal approach for composite activity modelling. In IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications, pages 1763–1770, 2012.
[18] F. J. Ordóñez, P. De Toledo, and A. Sanchis. Activity recognition using hybrid generative/discriminative models on home environments using binary sensors. Sensors, 13:5460–5477, 2013.
[19] D. Riboni, L. Pareschi, L. Radaelli, and C. Bettini. Is ontology-based activity recognition really effective? In IEEE International Conference on Pervasive Computing and Communications Workshops, pages 427–431, 2011.
[20] D. Singh, E. Merdivan, I. Psychoula, J. Kropf, S. Hanke, M. Geist, and A. Holzinger. Human activity recognition using recurrent neural networks. In International Cross-Domain Conference for Machine Learning and Knowledge Extraction, pages 267–274, 2017.
[21] B. Tong, M. Klinkigt, J. Chen, X. Cui, Q. Kong, T. Murakami, and Y. Kobayashi. Adversarial zero-shot learning with semantic augmentation. In AAAI Conference on Artificial Intelligence, pages 2476–2483, 2018.
[22] T. van Kasteren, G. Englebienne, and B. Kröse. Human activity recognition from wireless sensor network data: Benchmark and software. In L. Chen, C. D. Nugent, J. Biswas, and J. Hoey, editors, Activity Recognition in Pervasive Intelligent Environments, volume 4, pages 165–186. Atlantis Press, 2011.
[23] T. van Kasteren, A. Noulas, G. Englebienne, and B. Kröse. Accurate activity recognition in a home setting. In Proceedings of the 10th International Conference on Ubiquitous Computing, pages 1–9, 2008.
[24] S.-Z. Yu. Hidden semi-Markov models. Artificial Intelligence, 174:215–243, 2010. Special Review Issue.
A Appendix
Below is a list of the results that we have compiled from the literature with regard to the three datasets (Houses A, B and C).

Table 19: Results obtained for Houses A, B and C
Model | Feature Representation | Mean per class accuracy (A, B, C) | Accuracy (A, B, C)
[Table values garbled in extraction.]