Improving state estimation through projection post-processing for activity recognition in football
Michał Ciszewski*, Jakob Söhl, Geurt Jongbloed
Applied Mathematics, Faculty of Electrical Engineering, Mathematics & Computer Science, Delft University of Technology, Delft, The Netherlands

Abstract
The past decade has seen an increased interest in human activity recognition. Most commonly, the raw data coming from sensors attached to body parts are unannotated, which creates a need for fast labelling methods. Part of the procedure is choosing or designing an appropriate performance measure. We propose a new performance measure, the Locally Time-Shifted Measure, which addresses the issue of timing uncertainty of state transitions in the classification result. Our main contribution is a novel post-processing method for binary activity recognition. It improves the accuracy of classification methods by correcting for unrealistically short activities in the estimate.

keywords: activity recognition, wearable sensors, post-processing, performance measures
In almost all areas of science and technology sensors are becoming more and more prevalent. In recent years we have seen applications of sensor technology in fields as diverse as energy saving in smart home environments [1], performance assessment in archery [2], activity monitoring of the elderly [3], recognition of human stress [4], detection of mooring ships [5], early detection of Alzheimer's disease [6], dietary monitoring [7] and recognition of emotional states [8], to name just a few.

Our main interest lies in the detection of human activities using sensors attached to the body. Sensors generate raw data without annotations, suggesting the use of unsupervised learning methods. If a pattern specified in advance is of interest, then supervised learning and labelled data are required. However, the task of labelling activities manually from sensor data is labour-intensive and prone to error, which creates the need for fast and accurate automated methods. Human activity recognition (HAR) has attracted much attention since its inception in the '90s. A plethora of methods are being used [9], with various deep learning techniques leading the charge [10, 11]. The goal of HAR is to find the sequence of activities performed by a person based on observed data. Data can come from different sources. Many researchers [12, 13, 14] use only sensors embedded in a smartphone to classify user activities. Radio-based activity recognition is less popular, but provides lower power consumption and lower production cost of the sensors [15]. Physical sensors, such as accelerometers or gyroscopes attached directly to the body, or video recordings from a camera, are the most popular sources of data for activity recognition. Placement of the sensors on the body differs based on the research topic. In some cases more than one accelerometer or gyroscope is used [16, 17], but most commonly only one sensor is attached to the body [18].

* Corresponding author: [email protected]
Similarly, cameras can either be placed on the subject [19, 20, 21] or they can observe the subject [22, 23, 24]. Rarely, both camera and inertial sensor data are captured at the same time [25].

In the case of multiple wearable sensors attached to different body parts, the data are highly time-dependent, and effective estimation should take into account the temporal structure of the time series. This leads to many challenges; to account for time dependencies, mainstream classification techniques need to be augmented. Alternatively, more complicated methods that are more difficult to train have to be deployed. Another challenge lies in the reliability of manual labelling (in the case of supervised learning). Quite often it is unreasonable to assume that the labels annotating the observed data are exact with regard to the timings of transitions from one activity to another [26]. Timing uncertainty can be caused by a deficiency of the manual labelling or by the inability to objectively detect boundaries between different activities. This issue is well known in the literature; for instance, Yeh et al. [27] introduced a scalable, parameter-free and domain-agnostic algorithm that deals with this problem in the case of one-dimensional time series.

In order to provide more context, we describe the dataset used for the evaluation of the methods that will be introduced later. Eleven amateur football players participated in a coordinated experiment at a training facility of the Royal Dutch Football Association of The Netherlands. Five Inertial Measurement Units (IMUs) were attached to both shanks, both thighs and the pelvis. Every IMU measures six features in time: magnitude and direction of acceleration in 3 dimensions (using a 3-axis accelerometer) and magnitude and direction of angular velocity in 3 dimensions (using a 3-axis gyroscope). Athletes were asked to perform exercises on command, e.g. 'jog for 10 meters' or 'long pass'.
For each athlete and exercise this resulted in a 30-dimensional time series (5 body parts times 6 features per IMU) of length varying from 4 to 14 seconds. Each athlete performed 70-100 exercises, which amounts to nearly 900 time series (each with a sampling frequency of 500 Hz). The time series are labelled with the command given to the athlete, but there are still other activities performed in each of the time series, for example standing still. This causes a problem; ignoring standing periods and treating them as part of the main signal pollutes the data and lowers the quality of the classification. Our goal is to sift through the time series for the activity of interest and to identify all time points as either standing or the other activity. Wilmes et al. [28] describe the experiment more closely.

For the remainder of this paper, the following naming convention will be used. We are given a multivariate time series and an associated univariate time series of states. In our context, a state corresponds to a specific human activity. Any sequence of states will be called a state sequence. A label is a state at a given time point in a state sequence. If a state sequence corresponds to the true underlying sequence of activities in a time series, then it will be called the true labels or the ground truth labels. An estimate of the true labels will be called the estimated labels. Specific to binary classification, the term event refers to a time interval in which a state sequence takes the value 1, while the label directly preceding and following this time interval is not 1.

To compare the quality of competing activity recognition methods, an appropriate performance evaluation metric has to be chosen. Commonly used criteria are accuracy, precision and the F-measure [9, 29]. Another approach is to use similarity measures for time series classification [30], such as Dynamic Time Warping or Minimum Jump Costs Dissimilarity.
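To make the event terminology above concrete, the following minimal sketch (our own illustration, not code from the paper) extracts the events of a binary state sequence stored as a Python list:

```python
def events(labels):
    """Return the events of a binary state sequence as (start, end) index
    pairs with `end` exclusive: maximal runs in which the label equals 1."""
    evs, start = [], None
    for i, s in enumerate(labels):
        if s == 1 and start is None:
            start = i                  # an event begins here
        elif s != 1 and start is not None:
            evs.append((start, i))     # the event ended just before i
            start = None
    if start is not None:              # an event runs until the end
        evs.append((start, len(labels)))
    return evs

print(events([0, 1, 1, 0, 0, 1, 0, 1, 1]))  # [(1, 3), (5, 6), (7, 9)]
```

With such end-exclusive pairs, fragmentation and merging can be detected simply by comparing the number of events in the true and the estimated labels.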
Our objective is to find a performance measure that satisfies problem-specific conditions which are usually not addressed by standard performance measures. In our case, the already mentioned timing uncertainty in the true labels, as well as event fragmentation and merging, are the problems of interest. Event fragmentation occurs when an event in the true labels is represented by more than one event in the estimated labels, whereas merging refers to several true events being represented by a single event in the estimated labels. Ward et al. [31] provide an excellent overview of different performance metrics used in activity recognition, proposing a solution to the problem of timing uncertainty as well as to event fragmentation and merging. The issues mentioned above are also addressed here, however in a different way. Our main focus regarding the performance measure for our application is on detecting time shifts in the estimated labels (which address the problem of timing uncertainty), while fragmented or merged events influence the performance of a classifier through the number of state transitions present in the estimated labels.

The second contribution of this paper is the introduction of a post-processing procedure, which projects a binary state sequence onto a certain subset. This subset of state sequences is characterized by a condition that bounds the state durations from below. It allows us to mitigate the problem of event fragmentation in cases where some domain-specific information about state durations is available. Based on empirical evidence, the performance (as measured by standard performance measures or by the one newly introduced here) of classical machine learning classifiers improves significantly by projecting the state sequence. This enables simple and fast but less accurate classification methods to be upgraded to accurate and fast classifiers.

The outline of the paper is as follows.
Section 2 introduces specialized performance measures for assessing the quality of classification in general and activity recognition in particular. Section 3 provides a method for improving any binary classification with a post-processing scheme that uses background knowledge in the specific context. In particular, it validates the state durations and provides an improved classification that satisfies the physical constraints on the state durations imposed by the context. Section 4 presents an application of the techniques in the setting of the football exercises just described.

In order to choose an appropriate performance measure for a given classification task, it is important to understand the problem-specific demands on the result. Just choosing the simplest or the most common performance measure can easily lead to results that do not truthfully represent the classifiers' performance as valued by the users. In this section, we aim to highlight the main characteristics of the classification of movements based on wearable sensors and to translate them into specific requirements on the performance measure.

First, physical restrictions need to be taken into account. The states considered in our application represent human activities. As such, they cannot be arbitrarily short; there is a lower bound on the duration of these states. Hence, estimated labels that violate this lower bound indicate bad performance. The lower bound condition requires two parameters: the lower bound and the penalty for each violation. The lower bound can either be estimated or determined by domain knowledge, while the penalty can be chosen more freely.

Second, the issue of timing uncertainty should also be addressed when designing the performance measure. To illustrate its importance more clearly, we present an example. Five people were asked to detect boundaries between activities in different time series using a visualization tool.
The tool outputs an animated stick figure model (a symbolic representation of the human body using only lines) given sensor data. Three time series were selected, each with one activity: running, jumping and a ball kick. The start and the end of each activity were recorded by the participants. Table 1 presents the results of the experiment.

The experiment indicates that there is indeed uncertainty regarding the state transitions. Granted that the sample size is very small, we notice more variation in the results referring to the end of activities than to the beginnings. Additionally, we see more variation in the results for the kick than for the jumping. So the boundaries of some activities seem to be more difficult to identify than others.
We define a class of Globally Time-Shifted distances (GTS distances), loosely inspired by the Skorokhod distance on the space of càdlàg functions [32, pp. 121]. Consider the set of states S = {s_1, ..., s_m} and a metric d on S. If d(s_i, s_j) = 1 − δ_ij for all i, j = 1, ..., m, where δ_ij is the Kronecker delta, equal to 1 if i = j and 0 otherwise, then d will be called the discrete metric on S. Figure 1 shows the metric space of states in the form of a weighted graph.

Let T denote the set of all càdlàg functions f : R → S with a finite number of discontinuities. We identify functions that are equal almost everywhere with respect to the Lebesgue measure on R. We define the standard distance between two trajectories:

dist : T × T ∋ (f, g) ↦ dist(f, g) = ∫_R d(f(t), g(t)) dt.    (1)

If d is a metric on S, then it can be shown that dist is a metric on T. If d is the discrete metric on S, then dist is the time spent by f in a state different from g.

The distance dist is an unsatisfying measure to compare two trajectories, since it does not incorporate the requirements posed in the previous section. In order to improve it, we start by modelling the timing uncertainty. Let f ∈ T be the ground truth state process and let f have n discontinuities j_1, ..., j_n. The locations of the discontinuities are corrupted by additive noise: j_i = J_i + X_i for all i = 1, ..., n, where J_i is the true and unknown location of the i-th jump. In general, X_1, ..., X_n are i.i.d. random variables, but in this section we will assume that X_1 = X_2 = ... = X_n (all jumps are moved by the same value; a global time shift).

We proceed to define the Globally Time-Shifted distances. The GTS distances are parametrized by two parameters. A parameter w controls the weight of misclassification occurring from the uncertainty of the true labels, while a parameter σ controls by how much activities may be shifted.

Definition 2.1 (Globally Time-Shifted distance).
Given w ≥ 0, σ > 0 and a metric d on S, we define a Globally Time-Shifted distance as

GTS_{w,σ}(f, g) = inf_{ε ∈ [−σ, σ]} { dist(f ∘ τ_ε, g) + w|ε| },

where τ_ε : R → R is the time shift defined by τ_ε(t) = t − ε.

A GTS distance is defined for f, g ∈ T and it is possible that the distance is infinite. Depending on the choice of the parameters, the GTS distance possesses certain properties. For w > 0 and σ = ∞, the GTS distance is an extended metric (it may attain the value ∞); a proof of this fact is given in the appendix. If w > 0 and σ is finite, then it is a semimetric, meaning that it has all properties required for a metric except for the triangle inequality. Indeed, consider the following example. Let d be the discrete metric on S = {0, 1}, and let f, g, h be the indicator functions f = 1_{[0, 1)}, g = 1_{[0.5, 1.5)}, h = 1_{[0.25, 1.25)}, with σ = 0.3 and w = 0.6. The shift needed to align f with g exceeds σ, so the infimum is attained at ε = 0.3, giving dist = 2(0.5 − 0.3) = 0.4 plus w · 0.3 = 0.18, while f and h (and likewise h and g) can be aligned exactly with shifts of 0.25, each costing w · 0.25 = 0.15. In this case we have

GTS_{w,σ}(f, g) = 0.58,
GTS_{w,σ}(f, h) + GTS_{w,σ}(h, g) = 0.3,

and we see that the triangle inequality does not hold.

The main downside of the GTS distance is the unrealistic assumption on the timing uncertainty. However, if we know that the ground truth labels preserve the true state durations, then it is a good choice. Consider a function f ∈ T with two state transitions j_1 and j_2. Let the estimate g ∈ T also feature two state transitions, j_1 − τ_1 and j_2 − τ_2. If τ_1 ≠ τ_2 (for example, if they have opposite signs), then there is no global time shift that can align the functions f and g. This implies that the true state durations need to be preserved in the estimate in order to align the functions using a global time shift.

The global time shift stresses the state durations, which is not always desirable: for instance, if the true labels do not preserve the real state durations, or if the additive noise terms in the locations of the jumps are independent. Here is an example: figure 2 shows f and its approximations g_i for i = 1, 2, 3. It is impossible to align f with any of the g_i with a single time shift; however, it would be possible if each state transition could be shifted 'locally'. Additionally, the GTS distance is sensible only when there is at most one event in the time series, which limits its use. Naturally, to accommodate both of these issues, a suitable modification is to replace the one global time shift with multiple local time shifts. We will introduce a measure of closeness between trajectories which conceptually can be seen as derived from the GTS measure.

Our approach can be compared to the one introduced in [31]. There the authors measure performance based on segments, which are intervals in which neither the ground truth labels nor the estimate change the state. If the state in the estimate and the state in the ground truth labels agree in a given segment, it is classified as correct. If not, the authors provide a variety of further classifications of segments, such as fragmenting segment or inserted segment. This provides a deeper level of error characterization, which is then used in different metrics of classifier performance.

In our case, the characterization will be focused on whether the error is caused by the timing uncertainty or by some other cause. We will be working with sequences of jumps; more specifically, given two sequences of state boundaries, we combine them and sort the resulting joint sequence in increasing order. Subsequent pairs of values in this sequence determine segments, understood as in [31]. We weigh different types of segments, and the result is a weighted average of segment lengths, which is supposed to reflect well the error magnitude of the classifier.

Figure 2: The function f represents the ground truth labels with an uncertainty around the state boundaries; the g_i are approximations of f.

We define segments formally and introduce a new distance on T.

Definition 2.2 (Segments). Let f, g ∈ T.
The elements of the smallest partition of R such that in each element of the partition neither f nor g changes state will be called segments. (A partition is smallest if it cannot be made coarser.)

Since functions from T are piecewise constant and have a finite number of discontinuities, there is always a finite number of segments. The general form of segments that we will use is as follows:

(−∞, a_1) ∪ ⋃_{i=1}^{l−1} [a_i, a_{i+1}) ∪ [a_l, ∞),    (2)

where a_1 < a_2 < ... < a_l, if f and g are not equal everywhere. Otherwise there is only one segment, consisting of the whole real line. By convention, a_0 = −∞ and a_{l+1} = ∞, and f(a_0) = f(a_1−) = lim_{x→−∞} f(x), f(a_{l+1}) = f(a_l).

As before, the parameter w controls the weight of misclassification occurring from the uncertainty of the true labels; the case w < 1 discounts errors attributable to this uncertainty.

Definition 2.3 (Locally Time-Shifted distance). Let w ≥ 0, σ > 0 and let d be a metric on S. Let f, g ∈ T and let their set of segments be denoted as in (2). We define the Locally Time-Shifted distance (LTS distance) as

LTS_{w,σ}(f, g) = Σ_{i=1}^{l−1} δ_i (a_{i+1} − a_i) d(f(a_i), g(a_i)),

where

δ_i = w  if a_{i+1} − a_i ≤ σ, f(a_{i−1}) = g(a_{i−1}) and f(a_{i+1}) = g(a_{i+1}),
δ_i = 1  otherwise.

If f(a_l) ≠ g(a_l), then LTS_{w,σ}(f, g) = ∞, and if there is only one segment (the functions are equal on the whole real line), then LTS_{w,σ}(f, g) = 0.

The LTS distance is an extended semimetric for w > 0. To see that the triangle inequality can fail, let f, g, h be the functions f = 1_{[0, +∞)}, g = 1_{[σ, +∞)}, h = 1_{[2σ, +∞)}. We have
LTS_{w,σ}(f, h) = 2σ · d(0, 1), while LTS_{w,σ}(f, g) = LTS_{w,σ}(g, h) = wσ · d(0, 1), so for w < 1 we obtain LTS_{w,σ}(f, h) > LTS_{w,σ}(f, g) + LTS_{w,σ}(g, h).

The LTS distance itself addresses the issue of timing uncertainty in the true labels. The issue that some states in the state sequence are too short still remains. Let γ > 0 and λ > 0. Given f ∈ T with discontinuities j_1, ..., j_n, we introduce a duration penalty term:

DP_{λ,γ}(f) = λ Σ_{k=1}^{n−1} 1_{{x : x < γ}}(j_{k+1} − j_k).

This term allows us to lower the performance of classifications with unrealistically short states.

In practice, we will need to extend the functions to the real line in order to use the LTS distance, as its definition applies only to functions defined on the whole of R. Hence, an extension is necessary. One natural choice would be to extend the first and the last state of each function indefinitely. However, this solution leads to a problem. Consider two functions f and g that differ only on the interval [0, A). No matter how small A is, the distance between f and g will always be infinite when using this extension, since in this case f and g are in different states on the whole half line (−∞, A). Both functions need to be extended by the same state for the distance to be finite. We therefore extend any function f defined on the interval [0, T] to the real line by setting its value to an arbitrary state s_0 outside of [0, T):

f*(t) = f(t) for t ∈ [0, T), and f*(t) = s_0 for t ∉ [0, T).    (3)

We combine the LTS distance and the duration penalty term to define the LTS measure of closeness of two trajectories.

Definition 2.4.
Let f be a function of true labels and g its estimate, both defined on [0, T]. The LTS measure is defined as:
LTS_{w,σ,λ,γ}(f, g) = exp( −LTS_{w,σ}(f*, g*)/T − DP_{λ,γ}(g) ).

The scaling through the division by T normalizes the LTS distance to the interval [0, ∞), and the function [0, +∞) ∋ x ↦ exp(−x) ∈ (0, 1] maps the sum of the LTS distance and the duration penalty term to the interval (0, 1]. The estimate g is closer to f if the LTS measure is closer to 1.

When choosing a classifier for the task of activity recognition, we are often faced with a dilemma. We can choose a classifier that captures the underlying nature of the data better, for instance a classifier that assumes time dependence in the data through the semi-Markov property. Then the computational cost of estimation can be high, and the estimation itself might be of lesser quality if there is not enough data or the quality of the data is poor. If we choose a simpler classifier, we have no problems learning the parameters of the method, but the restrictive assumptions of a simple classifier might not be satisfied. For a simple classifier one typically assumes independence between the observations, an assumption especially dangerous in the case of activity recognition, since the data coming from sensors are highly time-dependent. One way to mitigate this problem is to use the sliding window technique that equips each time point with some knowledge about the past and the future; however, simple classifiers (such as decision trees) are themselves not capable of using information about the distribution of durations in their prediction. The goal of this section is to provide a post-processing procedure that corrects a classifier's mistakes regarding the distribution of durations. From now on we focus specifically on the binary setting, so S = {0, 1} are the states.

Definition 3.1 (Function with bounded minimum duration of states). Given a parameter γ > 0, define G_γ ⊂ T, the set of functions with bounded minimum duration of states, such that for g ∈ G_γ we have:

• g = Σ_{i=1}^{N} 1_{[L_i, U_i)} for some N ∈ N and an increasing sequence L_1 < U_1 < L_2 < ... < U_N (we allow L_1 = −∞ and U_N = ∞),

• if N ≥ 1, then U_i − L_i ≥ γ for all i, and L_i − U_{i−1} ≥ γ for all i > 1.

We will project T onto G_γ. As a measure of closeness between functions from T and G_γ, we use the standard distance on T as defined in (1) (with the discrete metric d on S) together with a penalization of the jumps of g. In this case, the standard distance on T coincides with the L¹-distance (when S = {0, 1}). Let f ∈ T and g ∈ G_γ. Then we introduce the notation:

E_γ(f, g) = ‖f − g‖_1 + γ · J(g)/2,    (4)

where J(g) is the number of jumps of g. Given f ∈ T, our goal is to find f̂ ∈ G_γ such that

f̂ = arg min_{g ∈ G_γ} E_γ(f, g)    (5)

and then f̂ is called a projection of f onto G_γ.

The regularization penalizing high numbers of jumps narrows down the set of possible solutions to a finite nonempty subset of G_γ (as will be shown later), which guarantees the existence of f̂. The solution might not be unique, as illustrated by the following example. Let f = 1_{[0.4, 0.5)} + 1_{[0.6, +∞)} and γ = 0.2. Both f̂_1 = 1_{[0.4, +∞)} and f̂_2 = 1_{[0.6, +∞)} are projections of f. One could think of this as an issue; however, it reflects well our understanding of the original problem. The assumption is that f has impossibly short windows because it is uncertain which activity is actually performed in the interval [0.4, 0.6). Given such an f, we are unable to decide ourselves which solution is more suitable, hence it is only natural that the method also returns two possible options. Nevertheless, in real applications we can expect that such a situation will occur rarely.

In general, finding f̂ might not be an easy task. As an example, consider figure 3, where the function f was projected onto G_γ. Checking all possible functions from G_γ is naturally infeasible, but there also does not seem to be any clear rule with regard to which jumps should be present in a projection. Naively, we could think that the shorter segments are removed, and in general we can see this is somewhat true, but a good counterexample to this rule are the short activities in the interval [8, 9].

Figure 3: Example of projecting a function from T onto G_γ.

We will devise a method for finding a projection in an efficient manner. A function f can have multiple uninterrupted sequences of intervals shorter than γ. Each such sequence can be studied separately in order to find the optimal f̂, as proved in the appendix. The proof also implies that a projection will not introduce new jump locations, since in that case the L¹-penalty could always be reduced by moving the new jump locations to jump locations of the original function. Without loss of generality, we will assume that f has n ≥ 2 jumps j_i, i = 1, ..., n, such that 0 < j_i − j_{i−1} < γ for i = 2, ..., n, and that no other jumps are made. Since we can always consider the function 1 − f instead of f, we will also assume that f takes the value 0 on the interval (−∞, j_1).
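Before turning to the graph construction, the LTS machinery of section 2 can be sketched in code. The following is our own illustrative implementation, not the authors' code: a binary trajectory is encoded as an initial state plus a sorted list of jump times at which the state toggles, which also plays the role of the extension (3).

```python
import math

def lts_distance(f, g, w, sigma):
    """LTS distance (Definition 2.3) with the discrete metric on S = {0, 1}.
    f and g are pairs (initial_state, [jump times]); the state toggles at
    every jump. Returns math.inf if f and g disagree on an unbounded segment."""
    a = sorted(set(f[1]) | set(g[1]))           # joint boundaries a_1 < ... < a_l
    if not a:
        return 0.0 if f[0] == g[0] else math.inf
    def piece_states(init, jumps):
        # state on (-inf, a_1), [a_1, a_2), ..., [a_l, +inf)
        vals, s = [init], init
        for t in a:
            if t in jumps:
                s = 1 - s
            vals.append(s)
        return vals
    fs, gs = piece_states(*f), piece_states(*g)
    if fs[0] != gs[0] or fs[-1] != gs[-1]:      # disagreement of infinite length
        return math.inf
    total = 0.0
    for i in range(1, len(a)):                  # interior segment [a_i, a_{i+1})
        if fs[i] == gs[i]:
            continue
        short = a[i] - a[i - 1] <= sigma
        aligned = fs[i - 1] == gs[i - 1] and fs[i + 1] == gs[i + 1]
        total += (w if short and aligned else 1.0) * (a[i] - a[i - 1])
    return total

def duration_penalty(jumps, lam, gamma):
    """Duration penalty DP: lam for every inter-jump gap shorter than gamma."""
    return lam * sum(1 for u, v in zip(jumps, jumps[1:]) if v - u < gamma)

def lts_measure(f, g, w, sigma, lam, gamma, T):
    """LTS measure (Definition 2.4): closer to 1 means a better estimate g."""
    return math.exp(-lts_distance(f, g, w, sigma) / T
                    - duration_penalty(g[1], lam, gamma))

# a boundary shifted by 0.5 within the tolerance sigma is discounted by w
f, g = (0, [1.0, 3.0]), (0, [1.5, 3.0])
print(lts_distance(f, g, w=0.6, sigma=0.5))  # 0.6 * 0.5 = 0.3
```

A segment whose neighbours already agree and whose length is within σ is charged only w per unit length, exactly the discount for plausible timing error.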
Lastly, we use the following notation: j_0 = −∞ and j_{n+1} = ∞.

Now we introduce the problem of finding the shortest path in a graph, which, as will be shown, is equivalent to finding f̂. Let G = (V, A) be a directed graph such that the set of vertices V is given by

V = {j_0, j_1, ..., j_n, j_{n+1}} \ {j_2, j_{n−1}}    (6)

and the set of directed arcs is given by

A = ⋃_{l=0}^{n} A_l,    (7)

where:

A_0 = {(j_0, j_k) : k ∈ {1, ..., n+1} \ {n−1}, k mod 2 = 1},
A_l = {(j_l, j_k) : k ∈ {l+3, ..., n+1} \ {n−1}, k − l ≡ 1 mod 2},  for l = 1, ..., n−2,
A_{n−1} = ∅,
A_n = {(j_n, j_{n+1})}.

There is a correspondence between each path from j_0 to j_{n+1} and a sequence of jumps in the interval (j_1 − γ, j_n + γ). A path (j_0, j_{l_1}, ..., j_{l_m}, j_{n+1}) represents a function g with jumps at j_{l_1}, ..., j_{l_m} such that f(j_{l_k}) = g(j_{l_k}). As we can see, some jumps of f are not present in V, and many of the possible arcs are excluded from A as well. It is shown later that such V and A are sufficient to find an optimal f̂. However, not all paths correspond to a function from G_γ. We will introduce a weight function w : A → R_+ ensuring that every path of finite cost corresponds to a function from G_γ and, moreover, that the cost of the path coincides with the error E_γ(f, ·) of the corresponding function on the interval (j_1 − γ, j_n + γ). Let I_k = j_{k+1} − j_k for k = 0, ..., n. It is noteworthy that I_0 = I_n = ∞, while I_k < γ for k = 1, ..., n − 1. We introduce a penalty for a jump: J_k = γ/2 for k = 1, ..., n, and J_{n+1} = 0. The function H_γ : R → R, validating the assumption of the class G_γ, is defined as follows:

H_γ(x) = ∞ for x < γ, and H_γ(x) = 0 for x ≥ γ.

Now we define the weight function w:

w((j_k, j_l)) = H_γ(j_l − j_k) + Σ_{m = k+1, m ≡ k+1 mod 2}^{l−1} I_m + J_l,    (8)

for all possible arcs (j_k, j_l), where by convention Σ_{m=a}^{b} c_m = 0 for a > b. The first term in this formula ensures that [j_k, j_l] is an interval of length at least γ. The second term gives the L¹ norm of f − g on [j_k, j_l]. The last term adds the penalty for the jump at j_l if j_l is finite (the penalty for the jump at j_k was added on the previous arc of the path, if k > 0).

Lemma 3.1.
Let γ > 0 and f ∈ T. Let J denote the set of all discontinuities of the function f. If a function g ∈ G_γ contains jumps outside of J, or jumps from J but in the opposite direction than in f, then g cannot be a projection of f onto G_γ.

Theorem 3.1 (Problem equivalence). Let γ > 0 and let (j_1, ..., j_n) be the only discontinuities of a function f ∈ T. Let G = (V, A, w) be a weighted, directed graph defined as in (6), (7), (8) above. The task of finding a projection of f onto G_γ, defined as in (5), is equivalent to finding the shortest path from j_0 to j_{n+1} in the graph G.

The change in E_γ caused by an additional jump is at least γ/2. On the other hand, the weight of a jump can be at most γ/2 in order to study each uninterrupted sequence of intervals shorter than γ separately. This motivates our choice of γ/2 as the jump penalty.

As an example with γ = 0.2, consider the function f = 1_{[0.1, 0.25)} + 1_{[0.3, 0.45)} and all its possible projections: g_0 ≡ 0, g_1 = 1_{[0.1, 0.45)}, g_2 = 1_{[0.1, 0.3)}, g_3 = 1_{[0.25, 0.45)}. We can calculate the closeness of each g_i to f: E_γ(f, g_0) = 0.3, E_γ(f, g_1) = 0.25, E_γ(f, g_2) = 0.4 and E_γ(f, g_3) = 0.4, so g_1 is the projection of f onto G_0.2.

We now construct the graph G, defined as in (6), (7), (8), for this f. The sets of vertices and arcs are

V = {−∞, 0.1, 0.45, ∞},
A = {(−∞, 0.1), (−∞, ∞), (0.1, 0.45), (0.45, ∞)},

and the weights are shown in figure 4. There are two possible paths from −∞ to ∞. The path P_1 = (−∞, 0.1, 0.45, ∞) has cost equal to 0.25, while the path P_2 = (−∞, ∞) has cost 0.3. Since P_1 has the lower cost, we conclude again that g_1 is the projection of f onto G_0.2.

Figure 4: Graph G constructed for the function f.

In general, we can use the following known algorithm (exercise 22.4-2 in [33, pp. 614]) to find the shortest path between j_0 and j_{n+1}. Let G be any directed acyclic graph (DAG), w the corresponding weight function, s the source and e the end.

Algorithm 1 Finding the shortest path in a directed acyclic graph

procedure DAG-ShortestPaths(G, w, s, e)
    topologically sort the graph G
    for v ∈ V do
        v.d = ∞       // cost of the currently shortest path from s to v
        v.π = NULL    // predecessor of v on the currently shortest path
    s.d = 0
    for u ∈ V, taken in topologically sorted order do
        for each neighbour v of vertex u do
            if v.d > u.d + w((u, v)) then
                v.d = u.d + w((u, v))
                v.π = u
    shortestPath ← empty array
    currNode ← e
    while currNode ≠ s do
        append currNode to shortestPath
        currNode ← currNode.π
    append s to shortestPath
    reverse shortestPath
    return shortestPath, e.d

The total running time of the algorithm is Θ(|V| + |A|). It is also worth noting that in our case V is already topologically sorted. Figure 5 shows the linearity of the running time as a function of |V| + |A|, and figure 6 shows the graph size as a function of the number of jumps. It is worth noting that in the application to the real data we did not encounter sequences of jumps of length greater than 400.

Figure 5: Running time of Algorithm 1 as a function of |V| + |A|.
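Algorithm 1 translates directly into Python. The sketch below is our own rendering (the dict-based graph encoding and the vertex labels are assumptions); it relaxes the arcs in topological order and then walks the predecessor pointers back from the end vertex:

```python
import math

def dag_shortest_path(vertices, arcs, s, e):
    """Shortest path in a DAG. `vertices` must already be topologically
    sorted (as V is in our construction); `arcs` maps (u, v) to a weight.
    Returns the path from s to e and its cost; runs in Theta(|V| + |A|)."""
    d = {v: math.inf for v in vertices}     # v.d in Algorithm 1
    pred = {v: None for v in vertices}      # v.pi in Algorithm 1
    d[s] = 0.0
    succ = {v: [] for v in vertices}
    for u, v in arcs:
        succ[u].append(v)
    for u in vertices:                      # relax arcs in topological order
        for v in succ[u]:
            if d[v] > d[u] + arcs[(u, v)]:
                d[v] = d[u] + arcs[(u, v)]
                pred[v] = u
    path, node = [], e                      # recover the path backwards
    while node is not None:
        path.append(node)
        node = pred[node]
    return path[::-1], d[e]

# a graph with two candidate paths of costs 0.25 and 0.3, as in the example
V = ["j0", "j1", "j4", "j5"]
A = {("j0", "j1"): 0.1, ("j0", "j5"): 0.3, ("j1", "j4"): 0.15, ("j4", "j5"): 0.0}
path, cost = dag_shortest_path(V, A, "j0", "j5")
print(path, round(cost, 6))  # ['j0', 'j1', 'j4', 'j5'] 0.25
```

The cheaper path keeps the jumps j_1 and j_4, i.e. it merges the two short events into one long one rather than deleting them.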
Figure 6: Graph size |V| + |A| as a function of the number of jumps.
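For completeness, the construction of the graph (6)-(8) can also be sketched. The code below is our own reading of the construction (in particular, the exclusion of j_2 and j_{n−1} and the parity rules for the arcs follow the worked example), so treat it as an illustration rather than a reference implementation:

```python
import math

def build_graph(jumps, gamma):
    """Build the weighted graph (6)-(8) for a run of jumps j_1 < ... < j_n of f
    whose consecutive gaps are all shorter than gamma (f is 0 before j_1).
    Vertices are indices 0..n+1 with j_0 = -inf, j_{n+1} = +inf; indices 2 and
    n-1 are excluded. Illustrative sketch; assumes n >= 4 as in the example."""
    n = len(jumps)
    j = [-math.inf] + list(jumps) + [math.inf]     # j[0..n+1]
    I = [j[k + 1] - j[k] for k in range(n + 1)]    # interval lengths I_0..I_n
    J = [gamma / 2] * (n + 1) + [0.0]              # jump penalties, J_{n+1} = 0
    V = [k for k in range(n + 2) if k not in {2, n - 1}]
    def weight(k, l):
        h = 0.0 if j[l] - j[k] >= gamma else math.inf          # H_gamma term
        mid = sum(I[m] for m in range(k + 1, l) if (m - k) % 2 == 1)  # L1 cost
        return h + mid + J[l]
    A = {}
    for k in V:
        for l in V:
            if l <= k:
                continue
            if k == n:
                ok = l == n + 1           # A_n: only the arc to j_{n+1}
            elif k == 0:
                ok = l % 2 == 1           # the first jump of g must go up
            else:
                ok = l >= k + 3 and (l - k) % 2 == 1
            if ok:
                A[(k, l)] = weight(k, l)
    return j, V, A

# jumps of f = 1_[0.1,0.25) + 1_[0.3,0.45) with gamma = 0.2
j, V, A = build_graph([0.1, 0.25, 0.3, 0.45], 0.2)
print(V)                    # [0, 1, 4, 5]
print(round(A[(1, 4)], 6))  # 0.15
```

On this input the arc weights reproduce the error terms of the example: 0.1 for entering at j_1, 0.3 for skipping all jumps, 0.15 for bridging from j_1 to j_4 and 0 for leaving at j_4.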
We will now demonstrate the benefits of the post-processing by projection, utilizing the LTS measure to compare different methods of classification. First, we describe the dataset further. Recall that the data come from IMU sensors located on 5 different body parts: left shank (LS), right shank (RS), left thigh (LT), right thigh (RT) and pelvis (P). Each IMU sensor contains a 3-axis accelerometer (Acc) and a 3-axis gyroscope (Gyro). The naming convention will be as follows: LSAccX refers to the x-axis of the accelerometer located on the left shank. The data come in the form of short time series, each containing one exercise only. The type of the exercise is always given, but it is possible for the time series to contain other activities as well, such as standing. To show the advantages of post-processing by projection, we select only two states: standing and another activity, encoded as 0 and 1, respectively. 15 time series (representative of all possible actions performed by the athletes) were manually labelled time point by time point in order to be able to train classifiers, and these will form our sample.

In pre-processing we use the sliding window technique on the sensors [34]. This method transforms the original raw data using windows of fixed length d and a statistic of choice T: given a time point t, its neighbourhood of size d is fed to the statistic T for each variable separately. Performing the procedure for each time point results in a time series of the same dimension as the original one, but every observation is equipped with some knowledge about the past and the future through the statistic T and through forming the neighbourhoods of size d. Regarding the choice of the statistic T one needs to be careful, since the sensors are highly correlated with each other.
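As a sketch of this pre-processing step (our own illustration; truncating the window at the series edges is an assumption), a sliding-window variance separates standing from movement in a toy one-dimensional signal:

```python
import random
from statistics import pvariance

def sliding_feature(x, d, stat=pvariance):
    """Sliding-window pre-processing: replace each time point by the statistic
    of its length-d neighbourhood (the window is truncated at the edges)."""
    half = d // 2
    return [stat(x[max(0, i - half): i + half + 1]) for i in range(len(x))]

# toy 1-D signal: low variance while standing, high variance during activity
random.seed(0)
signal = ([random.gauss(0, 0.01) for _ in range(100)] +
          [random.gauss(0, 1.0) for _ in range(100)])
feat = sliding_feature(signal, d=25)
print(sum(feat[:100]) < sum(feat[100:]))  # True
```

The transformed series has the same length as the original, so the labels can be carried over time point by time point.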
The information about standing contained in one variable is comparable to that in another; namely, the variance of the signal is low when the person is standing (differences can occur when considering different legs; a low variance on one leg might be misleading, since the other leg might already be transitioning into another state).

10-fold cross-validation will be performed in order to select the best performing classification method. 10 time series will be used for training and 5 for testing, following a typical approach to k-fold cross-validation. The parameters of the LTS measure are selected as follows.

• We have limited information regarding how uncertain the locations of state transitions are, but based on the small experiment described in section 2.1 we select σ = 0.35 (the largest deviation between different ground truth labels).

• The weight w of the time shift represents the importance (or the certainty) of state transitions in the ground truth. It is selected as follows. We know that the visualization tool applied for manual labelling used 0.1s as the smallest step. Additionally, in our problem it is difficult to locate transitions between states objectively, hence we can expect state transitions to be misplaced on top of the limitations of the tool used for labelling. The experiment in section 2.1 regarding timing uncertainty in the ground truth labels shows that the standard deviation of the placement of state transitions ranges from 0.09s to 0.34s (keeping in mind that the sample size was very small). It seems that for some activities it is more difficult to identify the boundaries than for others. Hence, we allow for an additional 0.05s on top of the limitation of the visualization tool. If w = 0.6, then the maximum time shift σ = 0.35 is lowered by almost 0.15s in error measurement, hence we select w = 0.6.

• The lower bound γ on the duration of activities is selected as the length of the shortest activity in the learning dataset, which is equal to 0.8s in our case.

• The penalty λ represents the cost of additional or missing jumps in a state sequence compared to the ground truth labels. Since an estimate already pays an L¹ penalty for misclassification, the penalty λ should only be supplementary. We decide for a small positive λ, as illustrated in figure 7: both estimates are equal in performance without the penalization by λ, but with it estimate 1 performs better.

Figure 7: The importance of including a penalty for violating the lower bound on the duration of activities.

Before assessing classifiers on the training set, one needs to consider an appropriate feature set. Our variables are highly dependent on one another, so we start with feature selection. The setup is two-fold: first, we perform feature ranking using the Relieff algorithm [35]. The algorithm is iterative; the weights of the features are initialized to zero and R observations are randomly drawn from the training sample. For each observation r_i we find its k nearest hits H_i (samples with the same state as the drawn observation) and k nearest misses M_i (samples with a different state than the drawn observation) in the feature space with Euclidean distance. The weight of a feature a is updated as follows:

w(a) = w(a) + \sum_{i=1}^{R} \Big( \sum_{m \in M_i} |m(a) - r_i(a)| - \sum_{h \in H_i} |h(a) - r_i(a)| \Big).

In the next step we choose the 6 most relevant features based on the Relieff weights (6 features out of 30 results in an 80% reduction of the feature set). Then we test all possible combinations of these features, which is now computationally feasible, in order to find the best set for each of the classifiers.
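A minimal sketch of the binary Relieff update described above might look as follows; the omission of the usual 1/(R·k) normalisation, the assumption that features are scaled to comparable ranges, and all names are illustrative choices.

```python
import numpy as np

def relieff_weights(X, y, n_samples=100, k=5, rng=None):
    """Simplified binary Relieff feature ranking.

    X -- (n, p) feature matrix, features scaled to comparable ranges
    y -- (n,) binary labels
    A weight grows when nearest misses differ in that feature and
    shrinks when nearest hits do.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    w = np.zeros(p)
    for i in rng.choice(n, size=n_samples, replace=False):
        dist = np.linalg.norm(X - X[i], axis=1)   # Euclidean distances to sample i
        dist[i] = np.inf                          # exclude the sample itself
        same = (y == y[i])
        hits = np.argsort(np.where(same, dist, np.inf))[:k]    # k nearest same-class
        misses = np.argsort(np.where(~same, dist, np.inf))[:k] # k nearest other-class
        w += np.abs(X[misses] - X[i]).sum(axis=0)  # differing misses increase weight
        w -= np.abs(X[hits] - X[i]).sum(axis=0)    # differing hits decrease weight
    return w
```

On data where one feature separates the classes and another is pure noise, the informative feature receives a clearly larger weight, which is the property the ranking step relies on.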
The features selected by the Relieff algorithm are RTGyroX, RTGyroY, RTAccX, RTAccZ, LTAccY and PAccY.

Proceeding with the cross-validation, we select the following classifiers (with their abbreviations) to be assessed: DT - Decision Tree, kNN - k-Nearest Neighbors, LR - Logistic Regression, MLP - Multi-layer Perceptron, NB - Naive Bayes, RF - Random Forest, SVM - Support Vector Machine. The results of the 10-fold cross-validation are shown in table 2.

Classifier   OG Test           PP Test
MLP          0.916 +/- 0.031   0.972 +/- 0.008
LR           0.898 +/- 0.034   0.968 +/- 0.015
kNN          0.59  +/- 0.05    0.967 +/- 0.020
RF           0.83  +/- 0.07    0.966 +/- 0.017
SVC          0.894 +/- 0.034   0.966 +/- 0.017
DT           0.83  +/- 0.07    0.965 +/- 0.008
NB           0.88  +/- 0.04    0.944 +/- 0.023

Table 2: Average of the 10-fold cross-validation scores for all classifiers using the best sensor set for each of them. The pre-processing consisted of the sliding window technique in combination with summarizing by the standard deviation. The OG Test column gives the average of the LTS measure on the test set for the original classifier, while the PP Test column gives the same value for the post-processed classifier.

It is striking that all classifiers are on average within 2.8% of each other in test score. This is due to the post-processing by projection: the correction it provides brings all classifiers closer together. This astonishing result can be extended even further. The test score of a decision tree ranges from 59% to 86% for different sensor sets before post-processing, while using the post-processing results in a range of test scores from 93% to 96.5%, and this is not specific to decision trees only.

The example shows that the post-processing is crucial. First, it increases the accuracy of a given estimator on a given feature set by as much as 35%. Second, it diminishes the impact of feature selection, as the difference in accuracy between different feature subsets decreases substantially. Feature selection is of course still important, as it decreases the computational complexity of the problem and removes redundancy from the feature set.
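The exhaustive search over the six Relieff-selected features (2^6 − 1 = 63 non-empty subsets) can be sketched as follows, assuming a hypothetical `score` callback that returns a cross-validation score for a given feature subset.

```python
from itertools import combinations

def best_feature_subset(features, score):
    """Exhaustively score every non-empty subset of a small feature set.

    features -- iterable of feature names (6 names give 63 subsets)
    score    -- callable mapping a tuple of features to a CV score
    Returns the best-scoring subset and its score.
    """
    best_subset, best_score = None, float("-inf")
    for r in range(1, len(features) + 1):          # subset sizes 1..p
        for subset in combinations(features, r):   # all subsets of that size
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score
```

This brute-force search is only feasible because the ranking step first reduced the feature set from 30 to 6; with the full set it would require 2^30 − 1 evaluations.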
However, with methods that only rank features, such as Relieff, the choice of the threshold for classifying a feature as significant becomes less important. Finally, and most importantly, the post-processing by projection allows one to select a method according to criteria other than performance, namely computational speed.
This paper introduces measures of classifier performance in the task of activity recognition using wearable sensors. They address the issue of timing offsets as well as unrealistic classifications, while retaining the typical scalar output of a performance measure, allowing for easy comparisons between classifiers.

We have also introduced a post-processing scheme that improves estimates in the binary setting. It finds estimated activities that are too short and eliminates them in an optimal way by finding the shortest path in a directed acyclic graph.

Real-life football sensor data were used to assess the adequacy of the post-processing scheme. It significantly improved the performance of the classifiers. At the same time, the post-processed classifiers are closer to each other in performance than the original ones. This allows placing more importance on other criteria, such as the computational speed of a method.
Acknowledgments
We thank Erik Wilmes for providing football data of high quality and the stick-model animation tool. It was the basis for the analysis of our methods in section 4. We also thank Bart van Ginkel for the idea of how to generalize from the binary to the multiclass case. This work is part of the research programme CAS with project number P16-28 project 2, which is (partly) financed by the Dutch Research Council (NWO).
Proofs
The GTS distance with w > 0 and s = ∞ is an extended metric

Proof.
We will show that:
GTS_w(f, g) = inf_{ε ∈ ℝ} { dist(f ∘ τ_ε, g) + w|ε| }

is an extended metric on T.

0. Since for any ε both dist(f ∘ τ_ε, g) ≥ 0 and w|ε| ≥ 0, GTS_w is non-negative.

1. It is obvious that GTS_w(f, f) = 0 for any f ∈ T. Now assume that for some f, g ∈ T we have GTS_w(f, g) = 0. This implies that there exists a sequence (ε_n) with

dist(f ∘ τ_{ε_n}, g) + w|ε_n| → 0 as n → ∞.

Since dist(f ∘ τ_{ε_n}, g) + w|ε_n| is an upper bound of both dist(f ∘ τ_{ε_n}, g) and w|ε_n|, we have

|ε_n| → 0 and ∫_ℝ d(f ∘ τ_{ε_n}(t), g(t)) dλ(t) → 0 as n → ∞.

From Fatou's lemma we have

∫_ℝ liminf_{n→∞} d(f(t − ε_n), g(t)) dλ(t) = 0,

where λ is the Lebesgue measure on ℝ. Because f and g are càdlàg and d is continuous, this implies that for almost all t we have f(t−) = g(t) or f(t) = g(t), and so we conclude that f = g almost everywhere.

2. Let f, g ∈ T; we have the following:

GTS_w(f, g) = inf_ε { dist(f ∘ τ_ε, g) + w|ε| }
            = inf_ε { dist(g ∘ τ_{−ε}, f) + w|−ε| }
            = inf_{−ε} { dist(g ∘ τ_ε, f) + w|ε| }
            = inf_ε { dist(g ∘ τ_ε, f) + w|ε| }
            = GTS_w(g, f),

hence we conclude that GTS_w is symmetric.
3. Letting f, g, h ∈ T, we have the following:

GTS_w(f, g) = inf_ε { dist(f ∘ τ_ε, g) + w|ε| }
            = inf_{ε_1, ε_2} { dist(f ∘ τ_{ε_1} ∘ τ_{ε_2}, g) + w|ε_1 + ε_2| }
            ≤ inf_{ε_1, ε_2} { dist(f ∘ τ_{ε_1} ∘ τ_{ε_2}, h ∘ τ_{ε_2}) + dist(h ∘ τ_{ε_2}, g) + w|ε_1| + w|ε_2| }
            = inf_{ε_1, ε_2} { dist(f ∘ τ_{ε_1}, h) + w|ε_1| + dist(h ∘ τ_{ε_2}, g) + w|ε_2| }
            = inf_{ε_1} { dist(f ∘ τ_{ε_1}, h) + w|ε_1| } + inf_{ε_2} { dist(h ∘ τ_{ε_2}, g) + w|ε_2| }
            = GTS_w(f, h) + GTS_w(h, g),

which shows that GTS_w satisfies the triangle inequality and concludes the proof.

The LTS distance with w > 0 is a semimetric

Proof.
Let w > 0, σ > 0 and the metric d on S be fixed. We observe that LTS_{w,σ} is non-negative. Symmetry of LTS_{w,σ} follows directly from the definition. It only remains to show that LTS_{w,σ}(f, g) = 0 if and only if f = g for f, g ∈ T. We have LTS_{w,σ}(f, f) = 0, because there is only one segment. Assume now that LTS_{w,σ}(f, g) = 0 and f ≠ g. In that case, there exists more than one segment, and

LTS_{w,σ}(f, g) = \sum_{i=1}^{l−1} δ_i (a_{i+1} − a_i) d(f(a_i), g(a_i)) = 0 ⇒ f(a_i) = g(a_i) for all i = 1, 2, ..., l−1,

which implies that f = g and contradicts the assumption. We conclude that LTS_{w,σ}(f, g) = 0 iff f = g, which completes the proof.

A projection T → G_γ, f ↦ f̂, does not change states that last longer than γ, while if a state lasts exactly γ there exists a projection that does not change it

Proof.
Let f be a function with two neighbouring jumps j_1, j_2 satisfying the condition j_2 − j_1 ≥ γ. Since the interval is longer than or equal to γ, it satisfies the condition of the class G_γ. If j_2 − j_1 > γ, then no matter what the values of a projection f̂ are outside of the interval (j_1, j_2], it will always be cheaper to match the values of f on (j_1, j_2] and to take a penalty of at most 2 · γ/2 = γ for possible jumps at the boundary of the interval than to take an L¹-penalty of at least γ by assigning a value different from f on the interval (j_1, j_2]. If j_2 − j_1 = γ, then whether we match the values of f on (j_1, j_2) or remove the jumps at j_1 and j_2, we take a penalty of exactly γ. There is no unique projection in this case, but one of them leaves the state unchanged.

Proof of Lemma 3.1

Let f̂ be a projection of f onto G_γ. Assume that f̂ contains a jump j such that it is outside of the set J of discontinuities of f, or it is inside J but in the opposite direction than in f. Without loss of generality we assume j is a jump from 0 to 1. Amongst the jumps of f closest to j from the left and from the right, we denote by j_k the one from 0 to 1. Such a jump exists: otherwise, in the case j ∉ J, f̂ would differ from f on an infinite interval to the left or to the right of j, or there would exist a state in f̂ not present in f; while in the case j ∈ J but in the opposite direction than in f, this would imply that f has only one jump, at j, which is false, and hence f̂ could not have been a projection of f onto G_γ. Without loss of generality we assume j_k is the jump of f closest to j from the left. Let j_a be the jump of f̂ preceding j (if such a jump does not exist, then j_a = −∞). It is important to note that j_k ≥ j_a, unless j_k is the last jump of f (which can only happen if j ∉ J), in which case it is easy to see that f̂ can be improved upon to reduce the error (by removing the jumps j_a and j entirely), which contradicts its optimality. If j_k − j_a ≥ γ, then moving the jump of f̂ from j to j_k results in a reduction of the error by j − j_k, which contradicts the assumed optimality of f̂. If j_k − j_a < γ, then removing the jumps j_a and j from f̂ yields a better approximation (the increase in the L¹ norm is smaller than γ, while the decrease in the jump penalty is equal to γ). Hence f̂ cannot be a projection of f.
We use Lemma 3.1 twice to prove that a projection of a function from T onto G_γ can only make jumps at the same positions and in the same directions as the jumps in the projected function. (Footnote: the proof should first be conducted for the case of a jump outside of J and then for the case of a jump in the opposite direction than in f.) This leads to the fact that finding the shortest path in the graph defined in the paper is a problem equivalent to finding f̂.

References

[1] Wesllen S. Lima et al. "User activity recognition for energy saving in smart home environment". In: Proceedings of the 2015 IEEE Symposium on Computers and Communication (ISCC). New York, NY: IEEE, 2015, pp. 751–757.
[2] Markus Eckelt, Franziska Mally, and Angelika Brunner. "Use of Acceleration Sensors in Archery". In: Proc. 49 (2020), p. 98. url: https://doi.org/10.3390/proceedings2020049098.
[3] Stylianos Paraschiakos et al. "Activity recognition using wearable sensors for tracking the elderly". In: User Model. and User-Adapt. Interact. url: https://doi.org/10.1007/s11257-020-09268-2.
[4] Int. J. of Neural Syst. 27 (2017), p. 1650041. url: https://doi.org/10.1142/S0129065716500416.
[5] Maurits Waterbolk et al. "Detection of Ships at Mooring Dolphins with Hidden Markov Models". In: Transp. Res. Rec. url: https://doi.org/10.1177/0361198119837495.
[6] R. Varatharajan et al. "Wearable sensor devices for early detection of Alzheimer disease using dynamic time warping algorithm". In: Clust. Comput. 21 (2018), pp. 681–690. url: https://doi.org/10.1007/s10586-017-0977-2.
[7] Oliver Amft et al. "Analysis of Chewing Sounds for Dietary Monitoring". In: Proceedings of the UbiComp 2005: Ubiquitous Computing. Ed. by Michael Beigl et al. Vol. 3660. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2005, pp. 56–72.
[8] Agata Kołakowska, Wioleta Szwoch, and Mariusz Szwoch. "A Review of Emotion Recognition Methods Based on Data Acquired via Smartphone Sensors". In: Sens. 20 (2020), p. 6367. url: https://doi.org/10.3390/s20216367.
[9] Oscar D. Lara and Miguel A. Labrador. "A Survey on Human Activity Recognition using Wearable Sensors". In: IEEE Commun. Surv. & Tutor. 15 (2013), pp. 1192–1209. url: https://doi.org/10.1109/SURV.2012.110112.00192.
[10] L. Minh Dang et al. "Sensor-based and vision-based human activity recognition: A comprehensive survey". In: Pattern Recognit. 108 (2020), p. 107561. url: https://doi.org/10.1016/j.patcog.2020.107561.
[11] Jindong Wang et al. "Deep learning for sensor-based activity recognition: A survey". In: Pattern Recognit. Lett. 119 (2019), pp. 3–11. url: https://doi.org/10.1016/j.patrec.2018.02.010.
[12] Charissa Ann Ronao and Sung-Bae Cho. "Recognizing human activities from smartphone sensors using hierarchical continuous hidden Markov models". In: Int. J. of Distrib. Sens. Netw. 13 (2017), p. 1550147716683687. url: https://doi.org/10.1177/1550147716683687.
[13] Nicole A. Capela, Edward D. Lemaire, and Natalie Baddour. "Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients". In: PLoS One 10 (2015), e0124414. url: https://doi.org/10.1371/journal.pone.0124414.
[14] Carlos Aviles-Cruz et al. "Granger-causality: An efficient single user movement recognition using a smartphone accelerometer sensor". In: Pattern Recognit. Lett. 125 (2019), pp. 576–583. url: https://doi.org/10.1016/j.patrec.2019.06.029.
[15] Shuangquan Wang and Gang Zhou. "A review on radio based activity recognition". In: Digit. Commun. and Netw. url: https://doi.org/10.1016/j.dcan.2015.02.006.
[16] Ramona Rednic et al. "Wearable posture recognition systems: Factors affecting performance". In: Proceedings of 2012 IEEE-EMBS International Conference on Biomedical and Health Informatics. New York, NY: IEEE, 2012, pp. 200–203.
[17] Chun Zhu and Weihua Sheng. "Motion- and location-based online human daily activity recognition". In: Pervasive and Mob. Comput. url: https://doi.org/10.1016/j.pmcj.2010.11.004.
[18] Maria Cornacchia et al. "A Survey on Activity Detection and Classification Using Wearable Sensors". In: IEEE Sens. J. 17 (2016), pp. 386–403. url: https://doi.org/10.1109/JSEN.2016.2628346.
[19] Lu Li et al. "Indirect activity recognition using a target-mounted camera". In: Proceedings of the 2011 4th International Congress on Image and Signal Processing. Ed. by Peihua Qiu et al. New York, NY: IEEE, 2011, pp. 487–491.
[20] Michael S. Ryoo and Larry Matthies. "First-Person Activity Recognition: What Are They Doing to Me?" In: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. New York, NY: IEEE, 2013, pp. 2730–2737.
[21] Yoshihiro Watanabe et al. "Human gait estimation using a wearable camera". In: Proceedings of the 2011 IEEE Workshop on Applications of Computer Vision. New York, NY: IEEE, 2011, pp. 276–281.
[22] Kai-Tai Song and Wei-Jyun Chen. "Human activity recognition using a mobile camera". In: Proceedings of the 2011 8th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI). New York, NY: IEEE, 2011, pp. 3–8.
[23] Ivan Laptev et al. "Learning realistic human actions from movies". In: Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. New York, NY: IEEE, 2008, pp. 1–8.
[24] Yan Ke, Rahul Sukthankar, and Martial Hebert. "Efficient visual event detection using volumetric features". In: Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05). Vol. 1. New York, NY: IEEE, 2005, pp. 166–173.
[25] Chen Chen, Roozbeh Jafari, and Nasser Kehtarnavaz. "UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor". In: Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP). New York, NY: IEEE, 2015, pp. 168–172.
[26] Jamie A. Ward, Paul Lukowicz, and Gerhard Tröster. "Evaluating Performance in Continuous Context Recognition Using Event-Driven Error Characterisation". In: Location- and Context-Awareness. Ed. by Mike Hazas, John Krumm, and Thomas Strang. Berlin, Heidelberg: Springer, 2006, pp. 239–255.
[27] Chin-Chia Michael Yeh, Nickolas Kavantzas, and Eamonn Keogh. "Matrix Profile IV: Using Weakly Labeled Time Series to Predict Outcomes". In: Proceedings of the VLDB Endowment. Ed. by Peter Boncz and Ken Salem. Vol. 10. VLDB Endowment, 2017, pp. 1802–1812.
[28] Erik Wilmes et al. "Inertial Sensor-Based Motion Tracking in Football with Movement Intensity Quantification". In: Sens. 20 (2020), p. 2527. url: https://doi.org/10.3390/s20092527.
[29] Wesllen Sousa Lima et al. "Human Activity Recognition Using Inertial Sensors in a Smartphone: An Overview". In: Sens. 19 (2019), p. 3213. url: https://doi.org/10.3390/s19143213.
[30] Joan Serrà and Josep Lluis Arcos. "An empirical evaluation of similarity measures for time series classification". In: Knowl.-Based Syst. 67 (2014), pp. 305–314. url: https://doi.org/10.1016/j.knosys.2014.04.035.
[31] Jamie A. Ward, Paul Lukowicz, and Hans W. Gellersen. "Performance Metrics for Activity Recognition". In: ACM Trans. on Intell. Syst. and Technol. url: https://doi.org/10.1145/1889681.1889687.
[32] Patrick Billingsley. Convergence of Probability Measures. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc., 1999.
[33] Thomas H. Cormen et al. Introduction to Algorithms. 3rd ed. Cambridge, MA: The MIT Press, 2009.
[34] Thomas G. Dietterich. "Machine Learning for Sequential Data: A Review". In: Structural, Syntactic, and Statistical Pattern Recognition. Ed. by Terry Caelli et al. Vol. 2396. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2002, pp. 15–30.
[35] Igor Kononenko, Edvard Šimec, and Marko Robnik-Šikonja. "Overcoming the Myopia of Inductive Learning Algorithms with RELIEFF". In: Appl. Intell. url: https://doi.org/10.1023/A:1008280620621.