Improving state estimation through projection post-processing for activity recognition in football
Michał Ciszewski*, Jakob Söhl, Geurt Jongbloed
Applied Mathematics, Faculty of Electrical Engineering, Mathematics & Computer Science, Delft University of Technology, Delft, The Netherlands

Abstract
The past decade has seen an increased interest in human activity recognition. Most commonly, the raw data coming from sensors attached to body parts are unannotated, which creates a need for fast labelling methods. Part of the procedure is choosing or designing an appropriate performance measure. We propose a new performance measure, the Locally Time-Shifted Measure, which addresses the issue of timing uncertainty of state transitions in the classification result. Our main contribution is a novel post-processing method for binary activity recognition. It improves the accuracy of classification methods by correcting for unrealistically short activities in the estimate.

keywords: activity recognition, wearable sensors, post-processing, performance measures
In almost all areas of science and technology sensors are becoming more and more prevalent. In recent years we have seen applications of sensor technology in fields as diverse as energy saving in smart home environments [1], performance assessment in archery [2], activity monitoring of the elderly [3], recognition of human stress [4], detection of mooring ships [5], early detection of Alzheimer's disease [6], dietary monitoring [7] and recognition of emotional states [8], to name just a few.

Our main interest lies in the detection of human activities using sensors attached to the body. Sensors generate raw data without annotations, suggesting the use of unsupervised learning methods. If a pattern specified in advance is of interest, then supervised learning and labelled data are required. However, the task of labelling activities manually from sensor data is labour-intensive and prone to error, which creates the need for fast and accurate automated methods. Human activity recognition (HAR) has attracted much attention since its inception in the '90s. A plethora of methods are being used [9], with various deep learning techniques leading the charge [10, 11]. The goal of HAR is to find the sequence of activities performed by a person based on observed data. Data can come from different sources. Many researchers [12, 13, 14] use only sensors embedded in a smartphone to classify user activities. Radio-based activity recognition is less popular, but provides lower power consumption and lower production cost of the sensors [15]. Physical sensors, such as accelerometers or gyroscopes attached directly to the body, or video recordings from a camera, are the most popular sources of data for activity recognition. Placement of the sensors on the body differs based on the research topic. In some cases more than one accelerometer or gyroscope is used [16, 17], but most commonly only one sensor is attached to the body [18].

* Corresponding author: [email protected]
Similarly, cameras can either be placed on the subject [19, 20, 21] or they can observe the subject [22, 23, 24]. Rarely, both camera and inertial sensor data are captured at the same time [25].

In the case of multiple wearable sensors attached to different body parts, the data are highly time-dependent, and effective estimation should take into account the temporal structure of the time series. This leads to many challenges; to account for time dependencies, mainstream classification techniques need to be augmented. Alternatively, more complicated methods that are more difficult to train have to be deployed. Another challenge lies in the reliability of manual labelling (in the case of supervised learning). Quite often it is unreasonable to assume that the labels annotating the observed data are exact with regard to the timings of transitions from one activity to another [26]. Timing uncertainty can be caused by a deficiency of the manual labelling or by the inability to objectively detect boundaries between different activities. This issue is well known in the literature; for instance, Yeh et al. [27] introduced a scalable, parameter-free and domain-agnostic algorithm that deals with this problem in the case of one-dimensional time series.

In order to provide more context, we describe the dataset used for the evaluation of the methods that will be introduced later. Eleven amateur football players participated in a coordinated experiment at a training facility of the Royal Dutch Football Association of The Netherlands. Five Inertial Measurement Units (IMUs) were attached to both shanks, both thighs and the pelvis. Every IMU measures six features in time: magnitude and direction of acceleration in 3 dimensions (using a 3-axis accelerometer) and magnitude and direction of angular velocity in 3 dimensions (using a 3-axis gyroscope). Athletes were asked to perform exercises on command, e.g. 'jog for 10 meters' or 'long pass'.
For each athlete and exercise this resulted in a 30-dimensional time series (5 body parts times 6 features per IMU) of length varying from 4 to 14 seconds. Each athlete performed 70-100 exercises, which amounts to nearly 900 time series (each with a sampling frequency of 500 Hz). The time series are labelled with the command given to the athlete, but there are still other activities performed in each of the time series, for example standing still. This causes a problem; ignoring standing periods and treating them as part of the main signal pollutes the data and lowers the quality of the classification. Our goal is to sift through the time series for the activity of interest and to identify all time points as either standing or the other activity. Wilmes et al. [28] describe the experiment more closely.

For the remainder of this paper, the following naming convention will be used. We are given a multivariate time series and an associated univariate time series of states. In our context, a state corresponds to a specific human activity. Any sequence of states will be called a state sequence. A label is a state at a given time point in a state sequence. If a state sequence corresponds to the true underlying sequence of activities in a time series, then it will be called the true labels or the ground truth labels. An estimate of the true labels will be called the estimated labels. Specific to binary classification, the term event refers to a time interval in which a state sequence takes the value 1, while the label directly preceding and following this time interval is not 1.

To compare the quality of competing activity recognition methods, an appropriate performance evaluation metric has to be chosen. Commonly used criteria are accuracy, precision and the F-measure [9, 29]. Another approach is to use similarity measures for time series classification [30], such as Dynamic Time Warping or Minimum Jump Costs Dissimilarity.
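To make the event terminology above concrete, the following minimal sketch (our own illustration, not code from the paper) extracts the events of a binary state sequence stored as a Python list:

```python
def events(labels):
    """Return the events of a binary state sequence as (start, end) index
    pairs with `end` exclusive: maximal runs in which the label equals 1."""
    evs, start = [], None
    for i, s in enumerate(labels):
        if s == 1 and start is None:
            start = i                  # an event begins here
        elif s != 1 and start is not None:
            evs.append((start, i))     # the event ended just before i
            start = None
    if start is not None:              # an event runs until the end
        evs.append((start, len(labels)))
    return evs

print(events([0, 1, 1, 0, 0, 1, 0, 1, 1]))  # [(1, 3), (5, 6), (7, 9)]
```

With such end-exclusive pairs, fragmentation and merging can be detected simply by comparing the number of events in the true and the estimated labels.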
Our objective is to find a performance measure that satisfies problem-specific conditions which are usually not addressed by standard performance measures. In our case, the already mentioned timing uncertainty in the true labels, as well as event fragmentation and merging, are the problems of interest. Event fragmentation occurs when an event in the true labels is represented by more than one event in the estimated labels, whereas merging refers to several true events being represented by a single event in the estimated labels. Ward et al. [31] provide an excellent overview of different performance metrics used in activity recognition, proposing a solution to the problem of timing uncertainty as well as to event fragmentation and merging. The issues mentioned above are also addressed here, however in a different way. Our main focus regarding the performance measure for our application is on detecting time shifts in the estimated labels (which address the problem of timing uncertainty), while fragmented or merged events influence the performance of a classifier through the number of state transitions present in the estimated labels.

The second contribution of this paper is the introduction of a post-processing procedure, which projects a binary state sequence onto a certain subset. This subset of state sequences is characterized by a condition that bounds the state durations from below. It allows us to mitigate the problem of event fragmentation in cases where some domain-specific information about state durations is available. Based on empirical evidence, the performance (as measured by standard performance measures or by the one newly introduced here) of classical machine learning classifiers improves significantly by projecting the state sequence. This enables simple and fast but less accurate classification methods to be upgraded to accurate and fast classifiers.

The outline of the paper is as follows.
Section 2 introduces specialized performance measures for assessing the quality of classification in general and activity recognition in particular. Section 3 provides a method for improving any binary classification with a post-processing scheme that uses background knowledge in the specific context. In particular, it validates the state durations and provides an improved classification that satisfies the physical constraints on the state durations imposed by the context. Section 4 presents an application of the techniques in the setting of the football exercises just described.

In order to choose an appropriate performance measure for a given classification task, it is important to understand the problem-specific demands on the result. Just choosing the simplest or the most common performance measure can easily lead to results that do not truthfully represent the classifiers' performance as valued by the users. In this section, we aim to highlight the main characteristics of the classification of movements based on wearable sensors and to translate them into specific requirements on the performance measure.

First, physical restrictions need to be taken into account. The states considered in our application represent human activities. As such, they cannot be arbitrarily short; there is a lower bound on the duration of these states. Hence, estimated labels that violate this lower bound indicate bad performance. The lower bound condition requires two parameters: the lower bound and the penalty for each violation. The lower bound can either be estimated or determined by domain knowledge, while the penalty can be chosen more freely.

Second, the issue of timing uncertainty should also be addressed when designing the performance measure. To illustrate its importance more clearly, we present an example. Five people were asked to detect boundaries between activities in different time series using a visualization tool.
The tool outputs an animated stick figure model (a symbolic representation of the human body using only lines) given sensor data. Three time series were selected, each with one activity: running, jumping and a ball kick. The start and the end of each activity were recorded by the participants. Table 1 presents the results of the experiment.

The experiment indicates that there is indeed uncertainty regarding the state transitions. Granted that the sample size is very small, we notice more variation in the results referring to the end of activities than to the beginnings. Additionally, we see more variation in the results for the kick than for the jumping. So the boundaries of some activities seem to be more difficult to identify than others.
We define a class of Globally Time-Shifted distances (GTS distances), loosely inspired by the Skorokhod distance on the space of càdlàg functions [32, pp. 121]. Consider the set of states S = {s_1, ..., s_m} and a metric d on S. If d(s_i, s_j) = 1 − δ_ij for all i, j = 1, ..., m, where δ_ij is the Kronecker delta, equal to 1 if i = j and 0 otherwise, then d will be called the discrete metric on S. Figure 1 shows the metric space of states in the form of a weighted graph.

Let T denote the set of all càdlàg functions f : R → S with a finite number of discontinuities. We identify functions that are equal almost everywhere with respect to the Lebesgue measure on R. We define the standard distance between two trajectories:

dist : T × T ∋ (f, g) ↦ dist(f, g) = ∫_R d(f(t), g(t)) dt.    (1)

If d is a metric on S, then it can be shown that dist is a metric on T. If d is the discrete metric on S, then dist is the time spent by f in a state different from g.

The distance dist is an unsatisfying measure to compare two trajectories, since it does not incorporate the requirements posed in the previous section. In order to improve it, we start by modelling the timing uncertainty. Let f ∈ T be the ground truth state process and let f have n discontinuities j_1, ..., j_n. The locations of the discontinuities are corrupted by additive noise: j_i = J_i + X_i for all i = 1, ..., n, where J_i is the true and unknown location of the i-th jump. In general, X_1, ..., X_n are i.i.d. random variables, but in this section we will assume that X_1 = X_2 = ... = X_n (all jumps are moved by the same value; a global time shift).

We proceed to define the Globally Time-Shifted distances. The GTS distances are parametrized by two parameters. A parameter w controls the weight of misclassification occurring from the uncertainty of the true labels, while a parameter σ controls by how much activities may be shifted.

Definition 2.1 (Globally Time-Shifted distance).
Given w ≥ 0, σ > 0 and a metric d on S, we define a Globally Time-Shifted distance as

GTS_{w,σ}(f, g) = inf_{ε ∈ [−σ, σ]} { dist(f ∘ τ_ε, g) + w|ε| },

where τ_ε : R → R is the time shift defined by τ_ε(t) = t − ε.

A GTS distance is defined for f, g ∈ T and it is possible that the distance is infinite. Depending on the choice of the parameters, the GTS distance possesses certain properties. For w > 0 and σ = ∞, the GTS distance is an extended metric (it may attain the value ∞); a proof of this fact is given in the appendix. If w > 0 and σ is finite, then it is a semimetric, meaning that it has all properties required for a metric except for the triangle inequality. Indeed, consider the following example. Let d be the discrete metric on S = {0, 1}, and let f, g, h be the indicator functions f = 1_{[0, 1)}, g = 1_{[0.5, 1.5)}, h = 1_{[0.25, 1.25)}, with σ = 0.3 and w = 0.6. The shift needed to align f with g exceeds σ, so the infimum is attained at ε = 0.3, giving dist = 2(0.5 − 0.3) = 0.4 plus w · 0.3 = 0.18, while f and h (and likewise h and g) can be aligned exactly with shifts of 0.25, each costing w · 0.25 = 0.15. In this case we have

GTS_{w,σ}(f, g) = 0.58,
GTS_{w,σ}(f, h) + GTS_{w,σ}(h, g) = 0.3,

and we see that the triangle inequality does not hold.

The main downside of the GTS distance is the unrealistic assumption on the timing uncertainty. However, if we know that the ground truth labels preserve the true state durations, then it is a good choice. Consider a function f ∈ T with two state transitions j_1 and j_2. Let the estimate g ∈ T also feature two state transitions, j_1 − τ_1 and j_2 − τ_2. If τ_1 ≠ τ_2 (for example, if they have opposite signs), then there is no global time shift that can align the functions f and g. This implies that the true state durations need to be preserved in the estimate in order to align the functions using a global time shift.

The global time shift stresses the state durations, which is not always desirable: for instance, if the true labels do not preserve the real state durations, or if the additive noise terms in the locations of the jumps are independent. Here is an example: figure 2 shows f and its approximations g_i for i = 1, 2, 3. It is impossible to align f with any of the g_i with a single time shift; however, it would be possible if each state transition could be shifted 'locally'. Additionally, the GTS distance is sensible only when there is at most one event in the time series, which limits its use. Naturally, to accommodate both of these issues, a suitable modification is to replace the one global time shift with multiple local time shifts. We will introduce a measure of closeness between trajectories which conceptually can be seen as derived from the GTS measure.

Our approach can be compared to the one introduced in [31]. There the authors measure performance based on segments, which are intervals in which neither the ground truth labels nor the estimate change the state. If the state in the estimate and the state in the ground truth labels agree in a given segment, it is classified as correct. If not, the authors provide a variety of further classifications of segments, such as fragmenting segment or inserted segment. This provides a deeper level of error characterization, which is then used in different metrics of classifier performance.

In our case, the characterization will be focused on whether the error is caused by the timing uncertainty or by some other cause. We will be working with sequences of jumps; more specifically, given two sequences of state boundaries, we combine them and sort the resulting joint sequence in increasing order. Subsequent pairs of values in this sequence determine segments, understood as in [31]. We weigh different types of segments, and the result is a weighted average of segment lengths, which is supposed to reflect well the error magnitude of the classifier.

Figure 2: The function f represents the ground truth labels with an uncertainty around the state boundaries; the g_i are approximations of f.

We define segments formally and introduce a new distance on T.

Definition 2.2 (Segments). Let f, g ∈ T.
The elements of the smallest partition of R such that in each element of the partition neither f nor g changes state will be called segments. (A partition is smallest if it cannot be made coarser.)

Since functions from T are piecewise constant and have a finite number of discontinuities, there is always a finite number of segments. The general form of segments that we will use is as follows:

(−∞, a_1) ∪ ⋃_{i=1}^{l−1} [a_i, a_{i+1}) ∪ [a_l, ∞),    (2)

where a_1 < a_2 < ... < a_l, if f and g are not equal everywhere. Otherwise there is only one segment, consisting of the whole real line. By convention, a_0 = −∞ and a_{l+1} = ∞, and f(a_0) = f(a_1−) = lim_{x→−∞} f(x), f(a_{l+1}) = f(a_l).

As before, the parameter w controls the weight of misclassification occurring from the uncertainty of the true labels; the case w < 1 discounts errors attributable to this uncertainty.

Definition 2.3 (Locally Time-Shifted distance). Let w ≥ 0, σ > 0 and let d be a metric on S. Let f, g ∈ T and let their set of segments be denoted as in (2). We define the Locally Time-Shifted distance (LTS distance) as

LTS_{w,σ}(f, g) = Σ_{i=1}^{l−1} δ_i (a_{i+1} − a_i) d(f(a_i), g(a_i)),

where

δ_i = w  if a_{i+1} − a_i ≤ σ, f(a_{i−1}) = g(a_{i−1}) and f(a_{i+1}) = g(a_{i+1}),
δ_i = 1  otherwise.

If f(a_l) ≠ g(a_l), then LTS_{w,σ}(f, g) = ∞, and if there is only one segment (the functions are equal on the whole real line), then LTS_{w,σ}(f, g) = 0.

The LTS distance is an extended semimetric for w > 0. To see that the triangle inequality can fail, let f, g, h be the functions f = 1_{[0, +∞)}, g = 1_{[σ, +∞)}, h = 1_{[2σ, +∞)}. We have
LTS_{w,σ}(f, h) = 2σ · d(0, 1), while LTS_{w,σ}(f, g) = LTS_{w,σ}(g, h) = wσ · d(0, 1), so for w < 1 we obtain LTS_{w,σ}(f, h) > LTS_{w,σ}(f, g) + LTS_{w,σ}(g, h).

The LTS distance itself addresses the issue of timing uncertainty in the true labels. The issue that some states in the state sequence are too short still remains. Let γ > 0 and λ > 0. Given f ∈ T with discontinuities j_1, ..., j_n, we introduce a duration penalty term:

DP_{λ,γ}(f) = λ Σ_{k=1}^{n−1} 1_{{x : x < γ}}(j_{k+1} − j_k).

This term allows us to lower the performance of classifications with unrealistically short states.

In practice, we will need to extend the functions to the real line in order to use the LTS distance, as its definition applies only to functions defined on the whole of R. Hence, an extension is necessary. One natural choice would be to extend the first and the last state of each function indefinitely. However, this solution leads to a problem. Consider two functions f and g that differ only on the interval [0, A). No matter how small A is, the distance between f and g will always be infinite when using this extension, since in this case f and g are in different states on the whole half line (−∞, A). Both functions need to be extended by the same state for the distance to be finite. We therefore extend any function f defined on the interval [0, T] to the real line by setting its value to an arbitrary state s_0 outside of [0, T):

f*(t) = f(t) for t ∈ [0, T), and f*(t) = s_0 for t ∉ [0, T).    (3)

We combine the LTS distance and the duration penalty term to define the LTS measure of closeness of two trajectories.

Definition 2.4.
Let f be a function of true labels and g its estimate, both defined on [0, T]. The LTS measure is defined as:
LTS_{w,σ,λ,γ}(f, g) = exp( −LTS_{w,σ}(f*, g*)/T − DP_{λ,γ}(g) ).

The scaling through the division by T normalizes the LTS distance to the interval [0, ∞), and the function [0, +∞) ∋ x ↦ exp(−x) ∈ (0, 1] maps the sum of the LTS distance and the duration penalty term to the interval (0, 1]. The estimate g is closer to f if the LTS measure is closer to 1.

When choosing a classifier for the task of activity recognition, we are often faced with a dilemma. We can choose a classifier that captures the underlying nature of the data better, for instance a classifier that assumes time dependence in the data through the semi-Markov property. Then the computational cost of estimation can be high, and the estimation itself might be of lesser quality if there is not enough data or the quality of the data is poor. If we choose a simpler classifier, we have no problems learning the parameters of the method, but the restrictive assumptions of a simple classifier might not be satisfied. For a simple classifier one typically assumes independence between the observations, an assumption especially dangerous in the case of activity recognition, since the data coming from sensors are highly time-dependent. One way to mitigate this problem is to use the sliding window technique that equips each time point with some knowledge about the past and the future; however, simple classifiers (such as decision trees) are themselves not capable of using information about the distribution of durations in their prediction. The goal of this section is to provide a post-processing procedure that corrects a classifier's mistakes regarding the distribution of durations. From now on we focus specifically on the binary setting, so S = {0, 1} are the states.

Definition 3.1 (Function with bounded minimum duration of states). Given a parameter γ > 0, define G_γ ⊂ T, the set of functions with bounded minimum duration of states, such that for g ∈ G_γ we have:

• g = Σ_{i=1}^{N} 1_{[L_i, U_i)} for some N ∈ N and an increasing sequence L_1 < U_1 < L_2 < ... < U_N (we allow L_1 = −∞ and U_N = ∞),

• if N ≥ 1, then U_i − L_i ≥ γ for all i, and L_i − U_{i−1} ≥ γ for all i > 1.

We will project T onto G_γ. As a measure of closeness between functions from T and G_γ, we use the standard distance on T as defined in (1) (with the discrete metric d on S) together with a penalization of the jumps of g. In this case, the standard distance on T coincides with the L¹-distance (when S = {0, 1}). Let f ∈ T and g ∈ G_γ. Then we introduce the notation:

E_γ(f, g) = ‖f − g‖_1 + γ · J(g)/2,    (4)

where J(g) is the number of jumps of g. Given f ∈ T, our goal is to find f̂ ∈ G_γ such that

f̂ = arg min_{g ∈ G_γ} E_γ(f, g)    (5)

and then f̂ is called a projection of f onto G_γ.

The regularization penalizing high numbers of jumps narrows down the set of possible solutions to a finite nonempty subset of G_γ (as will be shown later), which guarantees the existence of f̂. The solution might not be unique, as illustrated by the following example. Let f = 1_{[0.4, 0.5)} + 1_{[0.6, +∞)} and γ = 0.2. Both f̂_1 = 1_{[0.4, +∞)} and f̂_2 = 1_{[0.6, +∞)} are projections of f. One could think of this as an issue; however, it reflects well our understanding of the original problem. The assumption is that f has impossibly short windows because it is uncertain which activity is actually performed in the interval [0.4, 0.6). Given such an f, we are unable to decide ourselves which solution is more suitable, hence it is only natural that the method also returns two possible options. Nevertheless, in real applications we can expect that such a situation will occur rarely.

In general, finding f̂ might not be an easy task. As an example, consider figure 3, where the function f was projected onto G_γ. Checking all possible functions from G_γ is naturally infeasible, but there also does not seem to be any clear rule with regard to which jumps should be present in a projection. Naively, we could think that the shorter segments are removed, and in general we can see this is somewhat true, but a good counterexample to this rule are the short activities in the interval [8, 9].

Figure 3: Example of projecting a function from T onto G_γ.

We will devise a method for finding a projection in an efficient manner. A function f can have multiple uninterrupted sequences of intervals shorter than γ. Each such sequence can be studied separately in order to find the optimal f̂, as proved in the appendix. The proof also implies that a projection will not introduce new jump locations, since in that case the L¹-penalty could always be reduced by moving the new jump locations to jump locations of the original function. Without loss of generality, we will assume that f has n ≥ 2 jumps j_i, i = 1, ..., n, such that 0 < j_i − j_{i−1} < γ for i = 2, ..., n, and that no other jumps are made. Since we can always consider the function 1 − f instead of f, we will also assume that f takes the value 0 on the interval (−∞, j_1).
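Before turning to the graph construction, the LTS machinery of section 2 can be sketched in code. The following is our own illustrative implementation, not the authors' code: a binary trajectory is encoded as an initial state plus a sorted list of jump times at which the state toggles, which also plays the role of the extension (3).

```python
import math

def lts_distance(f, g, w, sigma):
    """LTS distance (Definition 2.3) with the discrete metric on S = {0, 1}.
    f and g are pairs (initial_state, [jump times]); the state toggles at
    every jump. Returns math.inf if f and g disagree on an unbounded segment."""
    a = sorted(set(f[1]) | set(g[1]))           # joint boundaries a_1 < ... < a_l
    if not a:
        return 0.0 if f[0] == g[0] else math.inf
    def piece_states(init, jumps):
        # state on (-inf, a_1), [a_1, a_2), ..., [a_l, +inf)
        vals, s = [init], init
        for t in a:
            if t in jumps:
                s = 1 - s
            vals.append(s)
        return vals
    fs, gs = piece_states(*f), piece_states(*g)
    if fs[0] != gs[0] or fs[-1] != gs[-1]:      # disagreement of infinite length
        return math.inf
    total = 0.0
    for i in range(1, len(a)):                  # interior segment [a_i, a_{i+1})
        if fs[i] == gs[i]:
            continue
        short = a[i] - a[i - 1] <= sigma
        aligned = fs[i - 1] == gs[i - 1] and fs[i + 1] == gs[i + 1]
        total += (w if short and aligned else 1.0) * (a[i] - a[i - 1])
    return total

def duration_penalty(jumps, lam, gamma):
    """Duration penalty DP: lam for every inter-jump gap shorter than gamma."""
    return lam * sum(1 for u, v in zip(jumps, jumps[1:]) if v - u < gamma)

def lts_measure(f, g, w, sigma, lam, gamma, T):
    """LTS measure (Definition 2.4): closer to 1 means a better estimate g."""
    return math.exp(-lts_distance(f, g, w, sigma) / T
                    - duration_penalty(g[1], lam, gamma))

# a boundary shifted by 0.5 within the tolerance sigma is discounted by w
f, g = (0, [1.0, 3.0]), (0, [1.5, 3.0])
print(lts_distance(f, g, w=0.6, sigma=0.5))  # 0.6 * 0.5 = 0.3
```

A segment whose neighbours already agree and whose length is within σ is charged only w per unit length, exactly the discount for plausible timing error.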
Lastly, we use the following notation: j_0 = −∞ and j_{n+1} = ∞.

Now we introduce the problem of finding the shortest path in a graph, which, as will be shown, is equivalent to finding f̂. Let G = (V, A) be a directed graph such that the set of vertices V is given by

V = {j_0, j_1, ..., j_n, j_{n+1}} \ {j_2, j_{n−1}}    (6)

and the set of directed arcs is given by

A = ⋃_{l=0}^{n} A_l,    (7)

where:

A_0 = {(j_0, j_k) : k ∈ {1, ..., n+1} \ {n−1}, k mod 2 = 1},
A_l = {(j_l, j_k) : k ∈ {l+3, ..., n+1} \ {n−1}, k − l ≡ 1 mod 2},  for l = 1, ..., n−2,
A_{n−1} = ∅,
A_n = {(j_n, j_{n+1})}.

There is a correspondence between each path from j_0 to j_{n+1} and a sequence of jumps in the interval (j_1 − γ, j_n + γ). A path (j_0, j_{l_1}, ..., j_{l_m}, j_{n+1}) represents a function g with jumps at j_{l_1}, ..., j_{l_m} such that f(j_{l_k}) = g(j_{l_k}). As we can see, some jumps of f are not present in V, and many of the possible arcs are excluded from A as well. It is shown later that such V and A are sufficient to find an optimal f̂. However, not all paths correspond to a function from G_γ. We will introduce a weight function w : A → R_+ ensuring that every path of finite cost corresponds to a function from G_γ and, moreover, that the cost of the path coincides with the error E_γ(f, ·) of the corresponding function on the interval (j_1 − γ, j_n + γ). Let I_k = j_{k+1} − j_k for k = 0, ..., n. It is noteworthy that I_0 = I_n = ∞, while I_k < γ for k = 1, ..., n − 1. We introduce a penalty for a jump: J_k = γ/2 for k = 1, ..., n, and J_{n+1} = 0. The function H_γ : R → R, validating the assumption of the class G_γ, is defined as follows:

H_γ(x) = ∞ for x < γ, and H_γ(x) = 0 for x ≥ γ.

Now we define the weight function w:

w((j_k, j_l)) = H_γ(j_l − j_k) + Σ_{m = k+1, m ≡ k+1 mod 2}^{l−1} I_m + J_l,    (8)

for all possible arcs (j_k, j_l), where by convention Σ_{m=a}^{b} c_m = 0 for a > b. The first term in this formula ensures that [j_k, j_l] is an interval of length at least γ. The second term gives the L¹ norm of f − g on [j_k, j_l]. The last term adds the penalty for the jump at j_l if j_l is finite (the penalty for the jump at j_k was added on the previous arc of the path, if k > 0).

Lemma 3.1.
Let γ > 0 and f ∈ T. Let J denote the set of all discontinuities of the function f. If a function g ∈ G_γ contains jumps outside of J, or jumps from J but in the opposite direction than in f, then g cannot be a projection of f onto G_γ.

Theorem 3.1 (Problem equivalence). Let γ > 0 and let (j_1, ..., j_n) be the only discontinuities of a function f ∈ T. Let G = (V, A, w) be a weighted, directed graph defined as in (6), (7), (8) above. The task of finding a projection of f onto G_γ, defined as in (5), is equivalent to finding the shortest path from j_0 to j_{n+1} in the graph G.

The change in E_γ caused by an additional jump is at least γ/2. On the other hand, the weight of a jump can be at most γ/2 in order to study each uninterrupted sequence of intervals shorter than γ separately. This motivates our choice of γ/2 as the jump penalty.

As an example with γ = 0.2, consider the function f = 1_{[0.1, 0.25)} + 1_{[0.3, 0.45)} and all its possible projections: g_0 ≡ 0, g_1 = 1_{[0.1, 0.45)}, g_2 = 1_{[0.1, 0.3)}, g_3 = 1_{[0.25, 0.45)}. We can calculate the closeness of each g_i to f: E_γ(f, g_0) = 0.3, E_γ(f, g_1) = 0.25, E_γ(f, g_2) = 0.4 and E_γ(f, g_3) = 0.4, so g_1 is the projection of f onto G_0.2.

We now construct the graph G, defined as in (6), (7), (8), for this f. The sets of vertices and arcs are

V = {−∞, 0.1, 0.45, ∞},
A = {(−∞, 0.1), (−∞, ∞), (0.1, 0.45), (0.45, ∞)},

and the weights are shown in figure 4. There are two possible paths from −∞ to ∞. The path P_1 = (−∞, 0.1, 0.45, ∞) has cost equal to 0.25, while the path P_2 = (−∞, ∞) has cost 0.3. Since P_1 has the lower cost, we conclude again that g_1 is the projection of f onto G_0.2.

Figure 4: Graph G constructed for the function f.

In general, we can use the following known algorithm (exercise 22.4-2 in [33, pp. 614]) to find the shortest path between j_0 and j_{n+1}. Let G be any directed acyclic graph (DAG), w the corresponding weight function, s the source and e the end.

Algorithm 1 Finding the shortest path in a directed acyclic graph

procedure DAG-ShortestPaths(G, w, s, e)
    topologically sort the graph G
    for v ∈ V do
        v.d = ∞       // cost of the currently shortest path from s to v
        v.π = NULL    // predecessor of v on the currently shortest path
    s.d = 0
    for u ∈ V, taken in topologically sorted order do
        for each neighbour v of vertex u do
            if v.d > u.d + w((u, v)) then
                v.d = u.d + w((u, v))
                v.π = u
    shortestPath ← empty array
    currNode ← e
    while currNode ≠ s do
        append currNode to shortestPath
        currNode ← currNode.π
    append s to shortestPath
    reverse shortestPath
    return shortestPath, e.d

The total running time of the algorithm is Θ(|V| + |A|). It is also worth noting that in our case V is already topologically sorted. Figure 5 shows the linearity of the running time as a function of |V| + |A|, and figure 6 shows the graph size as a function of the number of jumps. It is worth noting that in the application to the real data we did not encounter sequences of jumps of length greater than 400.

Figure 5: Running time of Algorithm 1 as a function of |V| + |A|.
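Algorithm 1 translates directly into Python. The sketch below is our own rendering (the dict-based graph encoding and the vertex labels are assumptions); it relaxes the arcs in topological order and then walks the predecessor pointers back from the end vertex:

```python
import math

def dag_shortest_path(vertices, arcs, s, e):
    """Shortest path in a DAG. `vertices` must already be topologically
    sorted (as V is in our construction); `arcs` maps (u, v) to a weight.
    Returns the path from s to e and its cost; runs in Theta(|V| + |A|)."""
    d = {v: math.inf for v in vertices}     # v.d in Algorithm 1
    pred = {v: None for v in vertices}      # v.pi in Algorithm 1
    d[s] = 0.0
    succ = {v: [] for v in vertices}
    for u, v in arcs:
        succ[u].append(v)
    for u in vertices:                      # relax arcs in topological order
        for v in succ[u]:
            if d[v] > d[u] + arcs[(u, v)]:
                d[v] = d[u] + arcs[(u, v)]
                pred[v] = u
    path, node = [], e                      # recover the path backwards
    while node is not None:
        path.append(node)
        node = pred[node]
    return path[::-1], d[e]

# a graph with two candidate paths of costs 0.25 and 0.3, as in the example
V = ["j0", "j1", "j4", "j5"]
A = {("j0", "j1"): 0.1, ("j0", "j5"): 0.3, ("j1", "j4"): 0.15, ("j4", "j5"): 0.0}
path, cost = dag_shortest_path(V, A, "j0", "j5")
print(path, round(cost, 6))  # ['j0', 'j1', 'j4', 'j5'] 0.25
```

The cheaper path keeps the jumps j_1 and j_4, i.e. it merges the two short events into one long one rather than deleting them.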
Figure 6: Graph size |V| + |A| as a function of the number of jumps.
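For completeness, the construction of the graph (6)-(8) can also be sketched. The code below is our own reading of the construction (in particular, the exclusion of j_2 and j_{n−1} and the parity rules for the arcs follow the worked example), so treat it as an illustration rather than a reference implementation:

```python
import math

def build_graph(jumps, gamma):
    """Build the weighted graph (6)-(8) for a run of jumps j_1 < ... < j_n of f
    whose consecutive gaps are all shorter than gamma (f is 0 before j_1).
    Vertices are indices 0..n+1 with j_0 = -inf, j_{n+1} = +inf; indices 2 and
    n-1 are excluded. Illustrative sketch; assumes n >= 4 as in the example."""
    n = len(jumps)
    j = [-math.inf] + list(jumps) + [math.inf]     # j[0..n+1]
    I = [j[k + 1] - j[k] for k in range(n + 1)]    # interval lengths I_0..I_n
    J = [gamma / 2] * (n + 1) + [0.0]              # jump penalties, J_{n+1} = 0
    V = [k for k in range(n + 2) if k not in {2, n - 1}]
    def weight(k, l):
        h = 0.0 if j[l] - j[k] >= gamma else math.inf          # H_gamma term
        mid = sum(I[m] for m in range(k + 1, l) if (m - k) % 2 == 1)  # L1 cost
        return h + mid + J[l]
    A = {}
    for k in V:
        for l in V:
            if l <= k:
                continue
            if k == n:
                ok = l == n + 1           # A_n: only the arc to j_{n+1}
            elif k == 0:
                ok = l % 2 == 1           # the first jump of g must go up
            else:
                ok = l >= k + 3 and (l - k) % 2 == 1
            if ok:
                A[(k, l)] = weight(k, l)
    return j, V, A

# jumps of f = 1_[0.1,0.25) + 1_[0.3,0.45) with gamma = 0.2
j, V, A = build_graph([0.1, 0.25, 0.3, 0.45], 0.2)
print(V)                    # [0, 1, 4, 5]
print(round(A[(1, 4)], 6))  # 0.15
```

On this input the arc weights reproduce the error terms of the example: 0.1 for entering at j_1, 0.3 for skipping all jumps, 0.15 for bridging from j_1 to j_4 and 0 for leaving at j_4.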
We will now demonstrate the benefits of the post-processing by projection, utilizing the LTS measure to compare different methods of classification. First, we describe the dataset further. Recall that the data come from IMU sensors located on 5 different body parts: left shank (LS), right shank (RS), left thigh (LT), right thigh (RT) and pelvis (P). Each IMU sensor contains a 3-axis accelerometer (Acc) and a 3-axis gyroscope (Gyro). The naming convention will be as follows: LSAccX refers to the x-axis of the accelerometer located on the left shank. The data come in the form of short time series, each containing one exercise only. The type of the exercise is always given, but it is possible for the time series to contain other activities as well, such as standing. To show the advantages of post-processing by projection, we select only two states: standing and another activity, encoded as 0 and 1, respectively. 15 time series (representative of all possible actions performed by the athletes) were manually labelled time point by time point in order to be able to train classifiers, and these will form our sample.

In pre-processing we use the sliding window technique on the sensors [34]. This method transforms the original raw data using windows of fixed length d and a statistic of choice T: given a time point t, its neighbourhood of size d is fed to the statistic T for each variable separately. Performing the procedure for each time point results in a time series of the same dimension as the original one, but every observation is equipped with some knowledge about the past and the future through the statistic T and through forming the neighbourhoods of size d. Regarding the choice of the statistic T one needs to be careful, since the sensors are highly correlated with each other.
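As a sketch of this pre-processing step (our own illustration; truncating the window at the series edges is an assumption), a sliding-window variance separates standing from movement in a toy one-dimensional signal:

```python
import random
from statistics import pvariance

def sliding_feature(x, d, stat=pvariance):
    """Sliding-window pre-processing: replace each time point by the statistic
    of its length-d neighbourhood (the window is truncated at the edges)."""
    half = d // 2
    return [stat(x[max(0, i - half): i + half + 1]) for i in range(len(x))]

# toy 1-D signal: low variance while standing, high variance during activity
random.seed(0)
signal = ([random.gauss(0, 0.01) for _ in range(100)] +
          [random.gauss(0, 1.0) for _ in range(100)])
feat = sliding_feature(signal, d=25)
print(sum(feat[:100]) < sum(feat[100:]))  # True
```

The transformed series has the same length as the original, so the labels can be carried over time point by time point.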
The information about standing contained in one variable is comparable to that in another; namely, the variance of the signal is low when the person is standing (differences can occur when considering different legs; a low variance on one leg might be misleading, since the other leg might already be transitioning into another state).

10-fold cross-validation will be performed in order to select the best performing classification method. 10 time series will be used for training and 5 for testing, following a typical approach to k-fold cross-validation. The parameters of the LTS measure are selected as follows.

• We have limited information regarding how uncertain the locations of state transitions are, but based on the small experiment described in section 2.1 we select σ = 0.35 (the largest deviation between different ground truth labels).

• The weight w of the time shift represents the importance (or the certainty) of state transitions in the ground truth. It is selected as follows. We know that the visualization tool applied for manual labelling used 0.1s as the smallest step. Additionally, in our problem it is difficult to locate transitions between states objectively, hence we can expect state transitions to be misplaced on top of the limitations of the tool used for labelling. The experiment in section 2.1 regarding timing uncertainty in the ground truth labels shows that the standard deviation of the placement of state transitions ranges from 0.09s to 0.34s (keeping in mind that the sample size was very small). It seems that for some activities it is more difficult to identify the boundaries than for others. Hence, we allow for an additional 0.05s on top of the limitation of the visualization tool. If w = 0.6, then the maximum time shift σ = 0.35 is lowered by almost 0.15s in error measurement, hence we select w = 0.6.

• The lower bound γ on the duration of activities is selected as the length of the shortest activity in the learning dataset, which is equal to 0.8s in our case.

• The penalty λ represents the cost of additional or missing jumps in a state sequence compared to the ground truth labels. Since an estimate already pays an L¹ penalty for misclassification, the penalty λ should only be supplementary. We decide for a small positive λ, as illustrated in figure 7: both estimates are equal in performance without the penalization by λ, but with it estimate 1 performs better.

Figure 7: The importance of including a penalty for violating the lower bound on the duration of activities.

Before assessing classifiers on the training set, one needs to consider an appropriate feature set. Our variables are highly dependent on one another, so we start with feature selection. The setup is two-fold: first, we perform feature ranking using the Relieff algorithm [35]. The algorithm is iterative; the weights of the features are initialized to zero and R observations are randomly drawn from the training sample. For each observation r_i we find its k nearest hits H_i (samples with the same state as the drawn observation) and k nearest misses M_i (samples with a different state than the drawn observation) in the feature space with Euclidean distance. The weight of a feature a is updated as follows:

w(a) = w(a) + \sum_{i=1}^{R} \Big( \sum_{m \in M_i} |m(a) - r_i(a)| - \sum_{h \in H_i} |h(a) - r_i(a)| \Big).

In the next step we choose the 6 most relevant features based on the Relieff weights (6 features out of 30 results in an 80% reduction of the feature set). Then we test all possible combinations of these features, which is now computationally feasible, in order to find the best set for each of the classifiers.
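A minimal sketch of the binary Relieff update described above might look as follows; the omission of the usual 1/(R·k) normalisation, the assumption that features are scaled to comparable ranges, and all names are illustrative choices.

```python
import numpy as np

def relieff_weights(X, y, n_samples=100, k=5, rng=None):
    """Simplified binary Relieff feature ranking.

    X -- (n, p) feature matrix, features scaled to comparable ranges
    y -- (n,) binary labels
    A weight grows when nearest misses differ in that feature and
    shrinks when nearest hits do.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    w = np.zeros(p)
    for i in rng.choice(n, size=n_samples, replace=False):
        dist = np.linalg.norm(X - X[i], axis=1)   # Euclidean distances to sample i
        dist[i] = np.inf                          # exclude the sample itself
        same = (y == y[i])
        hits = np.argsort(np.where(same, dist, np.inf))[:k]    # k nearest same-class
        misses = np.argsort(np.where(~same, dist, np.inf))[:k] # k nearest other-class
        w += np.abs(X[misses] - X[i]).sum(axis=0)  # differing misses increase weight
        w -= np.abs(X[hits] - X[i]).sum(axis=0)    # differing hits decrease weight
    return w
```

On data where one feature separates the classes and another is pure noise, the informative feature receives a clearly larger weight, which is the property the ranking step relies on.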
The features selected by the Relieff algorithm are RTGyroX, RTGyroY, RTAccX, RTAccZ, LTAccY and PAccY.

Proceeding with the cross-validation, we select the following classifiers (with their abbreviations) to be assessed: DT - Decision Tree, kNN - k-Nearest Neighbors, LR - Logistic Regression, MLP - Multi-layer Perceptron, NB - Naive Bayes, RF - Random Forest, SVM - Support Vector Machine. The results of the 10-fold cross-validation are shown in table 2.

Classifier   OG Test           PP Test
MLP          0.916 +/- 0.031   0.972 +/- 0.008
LR           0.898 +/- 0.034   0.968 +/- 0.015
kNN          0.59  +/- 0.05    0.967 +/- 0.020
RF           0.83  +/- 0.07    0.966 +/- 0.017
SVC          0.894 +/- 0.034   0.966 +/- 0.017
DT           0.83  +/- 0.07    0.965 +/- 0.008
NB           0.88  +/- 0.04    0.944 +/- 0.023

Table 2: Average of the 10-fold cross-validation scores for all classifiers using the best sensor set for each of them. The pre-processing consisted of the sliding window technique in combination with summarizing by the standard deviation. The OG Test column gives the average of the LTS measure on the test set for the original classifier, while the PP Test column gives the same value for the post-processed classifier.

It is striking that all classifiers are on average within 2.8% of each other in test score. This is due to the post-processing by projection: the correction it provides brings all classifiers closer together. This astonishing result can be extended even further. The test score of a decision tree ranges from 59% to 86% for different sensor sets before post-processing, while using the post-processing results in a range of test scores from 93% to 96.5%, and this is not specific to decision trees only.

The example shows that the post-processing is crucial. First, it increases the accuracy of a given estimator on a given feature set by as much as 35%. Second, it diminishes the impact of feature selection, as the difference in accuracy between different feature subsets decreases substantially. Feature selection is of course still important, as it decreases the computational complexity of the problem and removes redundancy from the feature set.
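The exhaustive search over the six Relieff-selected features (2^6 − 1 = 63 non-empty subsets) can be sketched as follows, assuming a hypothetical `score` callback that returns a cross-validation score for a given feature subset.

```python
from itertools import combinations

def best_feature_subset(features, score):
    """Exhaustively score every non-empty subset of a small feature set.

    features -- iterable of feature names (6 names give 63 subsets)
    score    -- callable mapping a tuple of features to a CV score
    Returns the best-scoring subset and its score.
    """
    best_subset, best_score = None, float("-inf")
    for r in range(1, len(features) + 1):          # subset sizes 1..p
        for subset in combinations(features, r):   # all subsets of that size
            s = score(subset)
            if s > best_score:
                best_subset, best_score = subset, s
    return best_subset, best_score
```

This brute-force search is only feasible because the ranking step first reduced the feature set from 30 to 6; with the full set it would require 2^30 − 1 evaluations.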
However, with methods that only rank features, such as Relieff, the choice of the threshold for classifying a feature as significant becomes less important. Finally, and most importantly, the post-processing by projection allows one to select a method according to criteria other than performance, namely computational speed.
This paper introduces measures of classifier performance in the task of activity recognition using wearable sensors. They address the issue of timing offsets as well as unrealistic classifications, while retaining the typical scalar output of a performance measure, allowing for easy comparisons between classifiers.

We have also introduced a post-processing scheme that improves estimates in the binary setting. It finds estimated activities that are too short and eliminates them in an optimal way by finding the shortest path in a directed acyclic graph.

Real-life football sensor data were used to assess the adequacy of the post-processing scheme. It significantly improved the performance of the classifiers. At the same time, the post-processed classifiers are closer to each other in performance than the original ones. This allows placing more importance on other criteria, such as the computational speed of a method.
Acknowledgments
We thank Erik Wilmes for providing football data of high quality and the stick-model animation tool. It was the basis for the analysis of our methods in section 4. We also thank Bart van Ginkel for the idea of how to generalize from the binary to the multiclass case. This work is part of the research programme CAS with project number P16-28 project 2, which is (partly) financed by the Dutch Research Council (NWO).
Proofs
The GTS distance with w > 0 and s = ∞ is an extended metric

Proof.
We will show that:
GTS_w(f, g) = inf_{ε ∈ ℝ} { dist(f ∘ τ_ε, g) + w|ε| }

is an extended metric on T.

0. Since for any ε both dist(f ∘ τ_ε, g) ≥ 0 and w|ε| ≥ 0, GTS_w is non-negative.

1. It is obvious that GTS_w(f, f) = 0 for any f ∈ T. Now assume that for some f, g ∈ T we have GTS_w(f, g) = 0. This implies that there exists a sequence (ε_n) with

dist(f ∘ τ_{ε_n}, g) + w|ε_n| → 0 as n → ∞.

Since dist(f ∘ τ_{ε_n}, g) + w|ε_n| is an upper bound of both dist(f ∘ τ_{ε_n}, g) and w|ε_n|, we have

|ε_n| → 0 and ∫_ℝ d(f ∘ τ_{ε_n}(t), g(t)) dλ(t) → 0 as n → ∞.

From Fatou's lemma we have

∫_ℝ liminf_{n→∞} d(f(t − ε_n), g(t)) dλ(t) = 0,

where λ is the Lebesgue measure on ℝ. Because f and g are càdlàg and d is continuous, this implies that for almost all t we have f(t−) = g(t) or f(t) = g(t), and so we conclude that f = g almost everywhere.

2. Let f, g ∈ T; we have the following:

GTS_w(f, g) = inf_ε { dist(f ∘ τ_ε, g) + w|ε| }
            = inf_ε { dist(g ∘ τ_{−ε}, f) + w|−ε| }
            = inf_{−ε} { dist(g ∘ τ_ε, f) + w|ε| }
            = inf_ε { dist(g ∘ τ_ε, f) + w|ε| }
            = GTS_w(g, f),

hence we conclude that GTS_w is symmetric.
3. Letting f, g, h ∈ T, we have the following:

GTS_w(f, g) = inf_ε { dist(f ∘ τ_ε, g) + w|ε| }
            = inf_{ε_1, ε_2} { dist(f ∘ τ_{ε_1} ∘ τ_{ε_2}, g) + w|ε_1 + ε_2| }
            ≤ inf_{ε_1, ε_2} { dist(f ∘ τ_{ε_1} ∘ τ_{ε_2}, h ∘ τ_{ε_2}) + dist(h ∘ τ_{ε_2}, g) + w|ε_1| + w|ε_2| }
            = inf_{ε_1, ε_2} { dist(f ∘ τ_{ε_1}, h) + w|ε_1| + dist(h ∘ τ_{ε_2}, g) + w|ε_2| }
            = inf_{ε_1} { dist(f ∘ τ_{ε_1}, h) + w|ε_1| } + inf_{ε_2} { dist(h ∘ τ_{ε_2}, g) + w|ε_2| }
            = GTS_w(f, h) + GTS_w(h, g),

which shows that GTS_w satisfies the triangle inequality and concludes the proof.

The LTS distance with w > 0 is a semimetric

Proof.
Let w > 0, σ > 0 and the metric d on S be fixed. We observe that LTS_{w,σ} is non-negative. Symmetry of LTS_{w,σ} follows directly from the definition. It only remains to show that LTS_{w,σ}(f, g) = 0 if and only if f = g for f, g ∈ T. We have LTS_{w,σ}(f, f) = 0, because there is only one segment. Assume now that LTS_{w,σ}(f, g) = 0 and f ≠ g. In that case, there exists more than one segment, and

LTS_{w,σ}(f, g) = \sum_{i=1}^{l−1} δ_i (a_{i+1} − a_i) d(f(a_i), g(a_i)) = 0 ⇒ f(a_i) = g(a_i) for all i = 1, 2, ..., l−1,

which implies that f = g and contradicts the assumption. We conclude that LTS_{w,σ}(f, g) = 0 iff f = g, which completes the proof.

A projection T → G_γ, f ↦ f̂, does not change states that last longer than γ, while if a state lasts exactly γ there exists a projection that does not change it

Proof.
Let f be a function with two neighbouring jumps j_1, j_2 satisfying the condition j_2 − j_1 ≥ γ. Since the interval is longer than or equal to γ, it satisfies the condition of the class G_γ. If j_2 − j_1 > γ, then no matter what the values of a projection f̂ are outside of the interval (j_1, j_2], it will always be cheaper to match the values of f on (j_1, j_2] and to take a penalty of at most 2 · γ/2 = γ for possible jumps at the boundary of the interval than to take an L¹-penalty of at least γ by assigning a value different from f on the interval (j_1, j_2]. If j_2 − j_1 = γ, then whether we match the values of f on (j_1, j_2) or remove the jumps at j_1 and j_2, we take a penalty of exactly γ. There is no unique projection in this case, but one of them leaves the state unchanged.

Proof of Lemma 3.1

Let f̂ be a projection of f onto G_γ. Assume that f̂ contains a jump j such that it is outside of the set J of discontinuities of f, or it is inside J but in the opposite direction than in f. Without loss of generality we assume j is a jump from 0 to 1. Amongst the jumps of f closest to j from the left and from the right, we denote by j_k the one from 0 to 1. Such a jump exists: otherwise, in the case j ∉ J, f̂ would differ from f on an infinite interval to the left or to the right of j, or there would exist a state in f̂ not present in f; while in the case j ∈ J but in the opposite direction than in f, this would imply that f has only one jump, at j, which is false, and hence f̂ could not have been a projection of f onto G_γ. Without loss of generality we assume j_k is the jump of f closest to j from the left. Let j_a be the jump of f̂ preceding j (if such a jump does not exist, then j_a = −∞). It is important to note that j_k ≥ j_a, unless j_k is the last jump of f (which can only happen if j ∉ J), in which case it is easy to see that f̂ can be improved upon to reduce the error (by removing the jumps j_a and j entirely), which contradicts its optimality. If j_k − j_a ≥ γ, then moving the jump of f̂ from j to j_k results in a reduction of the error by j − j_k, which contradicts the assumed optimality of f̂. If j_k − j_a < γ, then removing the jumps j_a and j from f̂ yields a better approximation (the increase in the L¹ norm is smaller than γ, while the decrease in the jump penalty is equal to γ). Hence f̂ cannot be a projection of f.
We use Lemma 3.1 twice to prove that a projection of a function from T onto G_γ can only make jumps at the same positions and in the same directions as the jumps in the projected function. (Footnote: the proof should first be conducted for the case of a jump outside of J and then for the case of a jump in the opposite direction than in f.) This leads to the fact that finding the shortest path in the graph defined in the paper is a problem equivalent to finding f̂.

References

[1] Wesllen S. Lima et al. "User activity recognition for energy saving in smart home environment". In: Proceedings of the 2015 IEEE Symposium on Computers and Communication (ISCC). New York, NY: IEEE, 2015, pp. 751–757.
[2] Markus Eckelt, Franziska Mally, and Angelika Brunner. "Use of Acceleration Sensors in Archery". In: Proc. 49 (2020), p. 98. url: https://doi.org/10.3390/proceedings2020049098.
[3] Stylianos Paraschiakos et al. "Activity recognition using wearable sensors for tracking the elderly". In: User Model. and User-Adapt. Interact. url: https://doi.org/10.1007/s11257-020-09268-2.
[4] Int. J. of Neural Syst. 27 (2017), p. 1650041. url: https://doi.org/10.1142/S0129065716500416.
[5] Maurits Waterbolk et al. "Detection of Ships at Mooring Dolphins with Hidden Markov Models". In: Transp. Res. Rec. url: https://doi.org/10.1177/0361198119837495.
[6] R. Varatharajan et al. "Wearable sensor devices for early detection of Alzheimer disease using dynamic time warping algorithm". In: Clust. Comput. 21 (2018), pp. 681–690. url: https://doi.org/10.1007/s10586-017-0977-2.
[7] Oliver Amft et al. "Analysis of Chewing Sounds for Dietary Monitoring". In: Proceedings of the UbiComp 2005: Ubiquitous Computing. Ed. by Michael Beigl et al. Vol. 3660. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2005, pp. 56–72.
[8] Agata Kołakowska, Wioleta Szwoch, and Mariusz Szwoch. "A Review of Emotion Recognition Methods Based on Data Acquired via Smartphone Sensors". In: Sens. 20 (2020), p. 6367. url: https://doi.org/10.3390/s20216367.
[9] Oscar D. Lara and Miguel A. Labrador. "A Survey on Human Activity Recognition using Wearable Sensors". In: IEEE Commun. Surv. & Tutor. 15 (2013), pp. 1192–1209. url: https://doi.org/10.1109/SURV.2012.110112.00192.
[10] L. Minh Dang et al. "Sensor-based and vision-based human activity recognition: A comprehensive survey". In: Pattern Recognit. 108 (2020), p. 107561. url: https://doi.org/10.1016/j.patcog.2020.107561.
[11] Jindong Wang et al. "Deep learning for sensor-based activity recognition: A survey". In: Pattern Recognit. Lett. 119 (2019), pp. 3–11. url: https://doi.org/10.1016/j.patrec.2018.02.010.
[12] Charissa Ann Ronao and Sung-Bae Cho. "Recognizing human activities from smartphone sensors using hierarchical continuous hidden Markov models". In: Int. J. of Distrib. Sens. Netw. 13 (2017), p. 1550147716683687. url: https://doi.org/10.1177/1550147716683687.
[13] Nicole A. Capela, Edward D. Lemaire, and Natalie Baddour. "Feature selection for wearable smartphone-based human activity recognition with able bodied, elderly, and stroke patients". In: PLoS One 10 (2015), e0124414. url: https://doi.org/10.1371/journal.pone.0124414.
[14] Carlos Aviles-Cruz et al. "Granger-causality: An efficient single user movement recognition using a smartphone accelerometer sensor". In: Pattern Recognit. Lett. 125 (2019), pp. 576–583. url: https://doi.org/10.1016/j.patrec.2019.06.029.
[15] Shuangquan Wang and Gang Zhou. "A review on radio based activity recognition". In: Digit. Commun. and Netw. url: https://doi.org/10.1016/j.dcan.2015.02.006.
[16] Ramona Rednic et al. "Wearable posture recognition systems: Factors affecting performance". In: Proceedings of 2012 IEEE-EMBS International Conference on Biomedical and Health Informatics. New York, NY: IEEE, 2012, pp. 200–203.
[17] Chun Zhu and Weihua Sheng. "Motion- and location-based online human daily activity recognition". In: Pervasive and Mob. Comput. url: https://doi.org/10.1016/j.pmcj.2010.11.004.
[18] Maria Cornacchia et al. "A Survey on Activity Detection and Classification Using Wearable Sensors". In: IEEE Sens. J. 17 (2016), pp. 386–403. url: https://doi.org/10.1109/JSEN.2016.2628346.
[19] Lu Li et al. "Indirect activity recognition using a target-mounted camera". In: Proceedings of the 2011 4th International Congress on Image and Signal Processing. Ed. by Peihua Qiu et al. New York, NY: IEEE, 2011, pp. 487–491.
[20] Michael S. Ryoo and Larry Matthies. "First-Person Activity Recognition: What Are They Doing to Me?" In: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. New York, NY: IEEE, 2013, pp. 2730–2737.
[21] Yoshihiro Watanabe et al. "Human gait estimation using a wearable camera". In: Proceedings of the 2011 IEEE Workshop on Applications of Computer Vision. New York, NY: IEEE, 2011, pp. 276–281.
[22] Kai-Tai Song and Wei-Jyun Chen. "Human activity recognition using a mobile camera". In: Proceedings of the 2011 8th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI). New York, NY: IEEE, 2011, pp. 3–8.
[23] Ivan Laptev et al. "Learning realistic human actions from movies". In: Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition. New York, NY: IEEE, 2008, pp. 1–8.
[24] Yan Ke, Rahul Sukthankar, and Martial Hebert. "Efficient visual event detection using volumetric features". In: Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05). Vol. 1. New York, NY: IEEE, 2005, pp. 166–173.
[25] Chen Chen, Roozbeh Jafari, and Nasser Kehtarnavaz. "UTD-MHAD: A multimodal dataset for human action recognition utilizing a depth camera and a wearable inertial sensor". In: Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP). New York, NY: IEEE, 2015, pp. 168–172.
[26] Jamie A. Ward, Paul Lukowicz, and Gerhard Tröster. "Evaluating Performance in Continuous Context Recognition Using Event-Driven Error Characterisation". In: Location- and Context-Awareness. Ed. by Mike Hazas, John Krumm, and Thomas Strang. Berlin, Heidelberg: Springer, 2006, pp. 239–255.
[27] Chin-Chia Michael Yeh, Nickolas Kavantzas, and Eamonn Keogh. "Matrix Profile IV: Using Weakly Labeled Time Series to Predict Outcomes". In: Proceedings of the VLDB Endowment. Ed. by Peter Boncz and Ken Salem. Vol. 10. VLDB Endowment, 2017, pp. 1802–1812.
[28] Erik Wilmes et al. "Inertial Sensor-Based Motion Tracking in Football with Movement Intensity Quantification". In: Sens. 20 (2020), p. 2527. url: https://doi.org/10.3390/s20092527.
[29] Wesllen Sousa Lima et al. "Human Activity Recognition Using Inertial Sensors in a Smartphone: An Overview". In: Sens. 19 (2019), p. 3213. url: https://doi.org/10.3390/s19143213.
[30] Joan Serrà and Josep Lluis Arcos. "An empirical evaluation of similarity measures for time series classification". In: Knowl.-Based Syst. 67 (2014), pp. 305–314. url: https://doi.org/10.1016/j.knosys.2014.04.035.
[31] Jamie A. Ward, Paul Lukowicz, and Hans W. Gellersen. "Performance Metrics for Activity Recognition". In: ACM Trans. on Intell. Syst. and Technol. url: https://doi.org/10.1145/1889681.1889687.
[32] Patrick Billingsley. Convergence of Probability Measures. 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc., 1999.
[33] Thomas H. Cormen et al. Introduction to Algorithms. 3rd ed. Cambridge, MA: The MIT Press, 2009.
[34] Thomas G. Dietterich. "Machine Learning for Sequential Data: A Review". In: Structural, Syntactic, and Statistical Pattern Recognition. Ed. by Terry Caelli et al. Vol. 2396. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer, 2002, pp. 15–30.
[35] Igor Kononenko, Edvard Šimec, and Marko Robnik-Šikonja. "Overcoming the Myopia of Inductive Learning Algorithms with RELIEFF". In: Appl. Intell. url: https://doi.org/10.1023/A:1008280620621.