SummerTime: Variable-length Time Series Summarization with Applications to Physical Activity Analysis
Kevin M. Amaral, Zihan Li, Wei Ding, Scott Crouter, Ping Chen
Kevin Amaral
Computer Science, University of Massachusetts Boston
Boston, [email protected]

Scott Crouter, PhD
Exercise and Health, University of Tennessee-Knoxville
Knoxville, [email protected]

Zihan Li
Computer Science, University of Massachusetts Boston
Boston, [email protected]

Ping Chen, PhD
Computer Engineering, University of Massachusetts Boston
Boston, [email protected]

Wei Ding, PhD
Computer Science, University of Massachusetts Boston
Boston, [email protected]
Abstract — SummerTime seeks to summarize time series signals globally, providing a fixed-length, robust summarization of variable-length time series. Many classical machine learning methods for classification and regression depend on data instances with a fixed number of features. As a result, those methods cannot be directly applied to variable-length time series data. One common approach is to perform classification over a sliding window on the data and aggregate the decisions made at local sections of the time series in some way, through majority voting for classification or averaging for regression. The downside to this approach is that minority local information is lost in the voting process, and averaging assumes that each time series measurement is equal in significance. Also, since time series can be of varying length, the quality of votes and averages can vary greatly in cases where there is a close voting tie or a bimodal distribution in the regression domain. The summarization produced by the SummerTime method is a fixed-length feature vector which can be used in place of the time series dataset with classical machine learning methods. We use Gaussian Mixture Models (GMM) over small, same-length, disjoint windows in the time series to group local data into clusters. The time series' rate of membership in each cluster becomes a feature in the summarization. By making use of variational methods, the GMM converges to a more robust mixture, meaning the clusters are more resistant to noise and overfitting. Further, the model is naturally capable of converging to an appropriate cluster count. We validate our method on our [The dataset is created using our NIH awards. The name is removed in compliance with Triple Blind] dataset, an imbalanced physical activity dataset with variable-length time series structure. We compare our results to state-of-the-art studies in physical activity classification and show high-quality improvement when classifying with only the summarization. Finally, we show that regression using the summarization can augment energy expenditure estimation, producing more robust and precise results.
Index Terms — Time Series, Clustering, Classification, Regression, Summarization
I. INTRODUCTION
The physical activity (PA) is one of the most important areaswhen people try to know more about themselves.To record theactivities, time series data is abundant in the PA, especiallywhen researchers record the activities in the dynamic way.In this paper, we propose a data-driven based method whichprovides a variable-length summarization of physical activitytracks.The physical activity is defined as any movement of thebody that requires energy expenditure. Since the activitiesalways happen with multiple body movements, we named asequence of single movements as a specific activity. Then,the different combinations of single breakdown movementsare defined as different PA. Researchers need to match thereal movement track with the pre-defined activities whichcontains a specific sequence of movements. Since the differentPAs may consist of the same sub-sequence of single bodymovements, it is crucial to decide how to break the fullmovement track. However, it is hard to set an appropriatebreak in the movement track based on the traditional PAtechnique. In this paper, we apply the data-driven machinelearning method to provide a way to redefine the PAs asdifferent clusters. These clusters will have significant commonpatterns but may not follow the traditional PA definitions (likethe specific sequence of single movements). With the data-driven model, the data can reveal their hidden patterns byitself. Since no pre-defined movement sequence is required,the chance bias can be reduced significantly. Another benefitof our data-driven method is to improve the accuracy byconsidering the small effective events into the model. Thesmall effective events can also have significant effects to thefinal result, like the long-tail effect. The information from both a r X i v : . [ c s . L G ] F e b ajority and minority of effective events will contribute in themethod equally.Our method, SummerTime ( Time series
Summa e r ization),leverages the features of time windows as local informationabout the time series signal and use them to create a globaldescription of the signal which preserves the information of ev-ery time window within it. The traditional time series machinelearning methods depend on a fixed-length representation sothat they can build relationships between features and usethose to produce their results. However, it is difficult to applyclassical time series machine learning methods to the timeseries PA data. Since, though, the time series PA data is rich ininformation, it is rarely uniformly structured across instances,differing in length and scale. For instance, in regression meth-ods, the time series data can be trained by treating the shorterlength of features. However, building models in this way hasto loss parts of information in the data, which would leadto overfitting and to reduce the accuracy of models workingwith future unseen longer features. Our model, SummerTime ,attempts to encapsulate all global information about the timeseries PA signal as a fixed-length representation, producing afeature vector with a constant number of features for eachinstance. Not only including as much as information fromthe data, but also providing a competitive solution to dealwith variable-length time series PA data. The features ofthis global description are learnt latently using clusteringof the local time windows. This effectively treats the biasfrom manually-constructed features and results in a higher-quality interpretation of the features. This does not, however,fully remedy the bias from the manually constructed timewindow features. 
The prime assumptions we make are that the time window features are already of moderate-to-significant quality for locally describing the signals, and that the main obstacles of other time series methods lie in the additional assumptions they undertake to solve their respective problems. These assumptions have merit: we are able to generate potent summary features from the local time window features, and where we observe opportunities for competitive approaches to introduce biases, we make our best efforts to avoid those biases. In accounting for these biases, we observe overall better performance than each of the competitive methods. We hope to resolve any further biases by having an end-to-end data-driven solution. Our contributions are as follows:

• Fixed-length feature-vector representation of variable-length time series: This is one of the major obstacles that variable-length time series faces. Many approaches instead classify and regress on local information about the signal, or resize the signal to a uniform length across all instances. In either case, these approaches achieve a fixed-length representation so as to use classical methods. However, rescaling the signal relinquishes information about the proportion to time (i.e., the concept of an n-second interval is lost), and voting over the classifications and regressions done in the local time windows fails to encode global information about the signal as a whole. Our model achieves the representation without losing proportional or global information.

• End-to-end data-driven approach:
Our method,
SummerTime, is designed to fit between the time series dataset and the classical methods one wants to apply, performing feature construction independently of the experimenter and extracting all knowledge from the data itself. While this fits the idea of a black box, we are of the opinion that this is a valuable trait in a framework. By having the method be data-driven, we avoid the opportunity to introduce further bias from the experimenter and reduce the loss of information.

• Demonstration of improved robustness using only summarization features:
With our dataset, we show that classification using one of the other schemes, specifically voting over local time windows, is vulnerable to cross-class classification error between distant physical activity types. We then show that, using only features extracted by the SummerTime method, we were able to outperform them overall and also significantly reduce misclassification between near classes, which is a major demonstration of robustness.

• Demonstration of improved precision using summarization features to augment regression:
Again, in our dataset, we show that we can augment existing state-of-the-art regression models by including our
SummerTime features. Not only does doing so improve regression results, but the regression RMSE of the Run-type activities, a class close to the Walk-type activities, drops to around the same level as that of the Walk-type activities. This increased precision indicates that the
SummerTime features overcome a bias clearly present in all other regression methods we compare against, each with Run-type error well above Walk-type error.

The rest of this paper is organized as follows. The second section introduces the time series PA dataset we are going to analyze. The third section describes the problem of variable-length time series and the goal of classification and regression on those types of data. In the fourth section, we describe our proposed data-driven method, the
SummerTime
Framework, which solves the problem outlined in the third section of the paper. In the fifth section, we present our experimental results and the interpretations thereof. In the sixth section, we present related work. Finally, we conclude in the seventh section.

II. DATA DESCRIPTION
A. Dataset
In the medical field, measuring physical activity in children is crucial for learning the link between physical activity and health problems such as obesity and other metabolic issues. Measuring physical activity is generally done in two ways: (1) identifying the types of activities performed and (2) measuring caloric or energy expenditure. Accelerometers prove to be the ideal device for physical activity data collection. While they do not directly identify the activities being performed nor measure caloric expenditure, they provide an objective measurement which is information-rich and can be analyzed to predict and estimate these two quantities. They are also a low-cost and non-invasive alternative to indirect calorimetry, which measures caloric expenditure through a breathing device. However, analyzing accelerometer data to make these predictions and estimates is still a difficult, unsolved problem, and estimates made with state-of-the-art methods are still high-variance [7]. Furthermore, classical machine learning methods cannot be applied out-of-the-box in a free-living setting, since physical activity accelerometer data collected outside of a laboratory environment is generally unstructured or continuous-stream time series.

Fig. 1: The SummerTime framework diagram. The local window features are clustered to produce new summary features for use in the Classification and Regression phases. The Classification Phase makes use of only the newly constructed summaries from the Clustering Phase. In our experimental setting, the classification is of physical activity type. In the Regression Phase, the Window Features, the Summary Features, and the Classification result are all used to produce the physical activity energy expenditure estimate.

The dataset used for validation is a combined dataset from two studies performed by [Citation and Name Removed in compliance with Triple Blind]. The data was collected from child participants performing a wide variety of physical activities. The participants were outfitted with an Actigraph X accelerometer, which was used to collect tri-axial accelerometer counts at one count per second. Participants were also hooked up to a Cosmed K4b to collect energy expenditure measurements in MET (Metabolic Equivalent of Task) units. Start and end times were observed, as well as the physical activity (PA) type, for every bout of activity performed by each participant.

Our model will not predict cut-points for activities but will instead classify the PA type and estimate the physical activity energy expenditure (PAEE) of time series segments. As such, our dataset comes pre-segmented with true segment bounds.

The variable-length nature of the data is an allegory for the free-living setting, where activity length is not rigidly defined. The common approach to multi-instance data, data for which each instance has multiple parts, is to break it up into its smaller parts and build a model over the members of the multi-instances instead. We will do just that, but we will maintain the association between the multi-instance and its members by referring to the multi-instance as the activity bout and its members as activity windows which belong to the bout.

In total, 184 child participants' data were used. There were 98 male participants between 8 and 15 years old, and 86 female participants between 8 and 14 years old. For each participant, one bout of lying resting was performed for up to 30 minutes with a median time of 17 minutes. All other activities were performed for up to 10 minutes with a median time of 4 minutes. All activity data is collected at 1-second resolution. Activities included in the study were Computer Games, Reading, Light Cleaning, Sweeping, Brisk Track Walking, Slow Track Walking, Track Running, Walking Course, Playing Catch, Wall Ball, and Workout Video. These activity classes were binned into categories by the rigorousness of the associated activity; the category classes were Sedentary (Sed), Light Household Chores and Games (LHH), Moderate to Vigorous Household Chores and Sports (MtV), Walking (Walk), and Running (Run). See Table I for a breakdown of the categories, which are organized in order of ascending intensity.

The representation of each activity category in the dataset is not equal. Reference again Table I to see the huge disparity in the number of windows.

III. PROBLEM DESCRIPTION
Physical activities (PA) of humans consist of elemental movements of the body, such as leg raising and arm stretching. We define each such breakdown movement as an atomic event of the time series activity data, and each observed single activity is performed as a PA signal; each activity signal is thus a combination of multiple activity events. The time series PA data are a discrete sequence of events of the form D = {x_{t_1}, x_{t_2}, ...; t_i ∈ T} such that the index set T is temporal in nature. In many cases, the events need not occur at a constant frequency, but for the purposes of this paper we consider only the case when the time series has events at a constant frequency.

The two most popular prediction techniques are classification and regression. However, neither is directly suitable for variable-length time series PA data. The reasons, and our solutions, are introduced as follows.

Classification is the process in which a trained model maps time series data to a nominal class label. Traditional classification methods operate on a fixed number of features, such that they can be generalized as follows, where D is the space of instances which have k features and C is the space of class labels:

f(x_1, ..., x_k) : D → C    (1)

For a dataset of fixed-length vectors, we can apply those traditional methods out of the box. However, in the case of variable-length time series, traditional models cannot be applied directly.

One current solution is to break up each time series signal into equally-sized time series windows and to perform traditional classification over the features of the windows. The final result is then aggregated through voting, i.e., taking the mode. This results in a loss of minority information within the signal; however, minority information may also contribute to the classification, especially when the difference is not very significant.
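As a concrete illustration of this window-voting scheme (a sketch, not the paper's code: the per-window classifier here is any model with a scikit-learn-style `predict()`, an illustrative assumption), aggregation by majority vote can be written as:

```python
from collections import Counter

def classify_by_voting(window_features, window_classifier):
    """Classify a variable-length signal by majority vote over its windows.

    `window_classifier` is a hypothetical model exposing a scikit-learn-style
    predict(); its name and interface are illustrative assumptions.
    """
    votes = list(window_classifier.predict(window_features))
    label, count = Counter(votes).most_common(1)[0]
    confidence = count / len(votes)  # approaches 0.5 for an even class split
    return label, confidence
```

Note that when the window votes split nearly evenly, the returned confidence sits near 0.5, the coin-flip situation discussed in the text.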
For example, in a case where a signal has equal parts of two classes, the method reports 50% confidence in its classification result. This is akin to a coin flip.

Another current solution to variable-length time series data is to rescale each time series signal so that all instances are of the same length. If this is done through interpolation, estimating and including in-between points, we introduce bias from our choice of interpolation method; in non-smooth signal domains, estimates of in-between points can be radically inaccurate. Regardless of whether we account for that bias somehow, rescaling the time series removes proportionality from the signal, relinquishing the distinction between an n-second interval and an m-second interval. This causes another bias in that n-point intervals are treated equally, even if they correspond to vastly different stretches of real time.

In this paper, rather than merely converting variable-length time series data into a fixed-length summarized format, we propose a method that does so in a way that preserves global information about the signal (i.e., no data point's contribution is unaccounted for) and, in some way, preserves proportionality. Furthermore, our method works in a data-driven manner to avoid biases caused by humans, such as in the experimental setup or the labeling process.

Our solution is to cluster the time series window features and then use the ratios of membership in each cluster as the features of the summary. This technique fully encapsulates the global information of the PA signals, since the inclusion or exclusion of any event in the signal affects every summary feature. In this way, proportionality is accounted for by comparing variable-length time series signals without treating their events as equally frequent. Finally, the
Finally, theunsupervised clustering technique is applied to construct thesummary of the signals, which can avoid the bias causingby humans when assigning signals into the targeted clusterssignificantly.Regression part has a similar process to classification in thatwe are building a model which maps from a time series dataspace to a target space. However, the target space of regressionmethods is generally numeric and continuous, rather thancategorical and discrete as in a nominal space of classification.Regression models are usually of the form: f ( x , . . . , x k ) : D → R d (2)The same as classification case, traditional regression mod-els only can operate over data with the same number ofcomponents. One current solution for applying regressionmodels to variable-length data is to treat shorter instances ashaving missing values. The problem of this is that there arevariables which become specifically tuned to long instancesfrom the training set. When presented with new long-instancetest data from a different domain distribution, those variableswill impact the regression results in ways the training processcannot account for.Another current solution is to perform the regression locallyon windows and aggregate the regression result as a totalor average or similar statistic over the windows. In caseswhere the distribution of windows is multi-modal, an equally-weighted average can push results in the direction of themajority, heavily influenced by the number of points in thesignal. This is not ideal since it leads to the issues of propor-tionality like it in the classification. However, this influencecan be lessened without having to make major changes to theunderlying regression model.Our solution to the regression problem is to apply multipledistributions following the clusters from the classificationresults. 
In cases where the PA signals to be regressed over have separate classifications, that is, they differ from each other in some categorical way, regression models can be built for each category to produce better results than if a single model were designed to handle every category. So, to decide which regression model should be applied to a new time series PA signal, the classification of its events is necessary. The regression model is applied to each window and the results aggregated over the whole signal as usual, but we include, as regression variables, the time series summarization of the signal from the clustering. These variables carry information about the signal as a whole, which we demonstrate to have a positive effect on time series of differing lengths.

IV. METHODOLOGY
A. Modeling Approach
Compared with traditional methods, our SummerTime algorithm works on both fixed-length and variable-length time series data. Furthermore, our model can provide very competitive regression results by considering multiple distributions for the different kinds of PA signals. There are three major phases in the algorithm. The
SummerTime framework can be introduced as follows:
1) Cluster Generation: produce time series summarizations.
2) Classification: determine the class of the time series signals from the summarization.
3) Regression: select the appropriate regression model based on the previous phases and leverage the summaries to refine regression over the time series signals.
1) Cluster Generation Phase:
In this stage, we extract the cluster summarization from signals composed of different events.

Gaussian Mixture Models (GMMs) are probability distributions consisting of a linear combination of multivariate Gaussian distributions, each with its own mean and variance [3]. These individual Gaussian distributions will each correspond to a cluster of time-series activity windows which are most likely to belong to them. We utilize this clustering phase of the framework to produce a feature vector of ratios for each time series signal. This feature vector's components are of the form N_nk / N_n, where N_nk is the number of windows of signal n that belong to cluster k and N_n is the total number of windows in signal n.

Using the features of the time series windows, we can define the distribution p(x) over all windows x ∈ X. It is reasonable to assume that within the distribution there are K component distributions, to one of which each point x truly belongs. We further assume that those components are all normally distributed. As a result of these assumptions, we have Equation 3 as the distribution function:

p(x | π, µ, Σ) = Σ_{k=1}^{K} π_k N(x | µ_k, Σ_k)    (3)

Here, π_k is referred to as the mixing coefficient of the k-th component of the mixture distribution. We take Σ_k π_k = 1 so that π_k acts as the probability of an arbitrary x belonging to component k.

For x in the distribution, we can measure the probability of x belonging to component k by taking Equation 4. This is referred to as γ_k(x), the responsibility of component k for x:

γ_k(x) = π_k N(x | µ_k, Σ_k) / Σ_l π_l N(x | µ_l, Σ_l)    (4)

To determine which component k a data point x belongs to, we take the maximum of the responsibilities γ_k(x) over k:

K(x) = argmax_k γ_k(x)    (5)

Fig. 2: A single instance of Track Running as it makes its way through the clustering phase of the framework. (a) shows the x-axis accelerometer counts for the activity. (b) shows the breakdown of the signal into the 12-second windows for this axis. Not shown is the lag-1 autocorrelation feature. This is the standard means of constructing data instances from physical activity time series data. In (c), we show a PCA projection of the entire activity; the shapes of the data points indicate the clusters each activity window was assigned to. (d) and (e) demonstrate the feature construction from the clustering. The new features associated with an activity instance are the ratios of membership to each cluster.

We learn this model using variational Bayesian techniques, which allow us to learn an accurate number of clusters K as well as learn each of the K clusters very accurately without overfitting or producing singularity clusters [13].

This clustering function K(x) allows us to create the fixed-length feature vectors for the dataset. We now define N_nk as follows, where I_k is the usual indicator function and X_n is the set of all activity windows belonging to activity n:

N_nk = Σ_{x ∈ X_n} I_k(K(x))    (6)

N_n = |X_n|    (7)

This is the end-point for the clustering phase of the framework. For each activity n, we now have a single fixed-length vector ⟨N_n1/N_n, ..., N_nK/N_n⟩^T.
2) Classification Phase:
We choose Artificial Neural Network (ANN) models for classification. An ANN is a layered, graph-based, non-linear regression model commonly used in machine learning and statistical settings [4]. There is precedent in applying ANNs to the classification of time series signals [16] [17]. In this phase of the
SummerTime framework, we perform classification given only the feature vectors produced in the previous clustering phase.

As a result, our ANN model consists of K input nodes, where K is the latently-learned cluster-count variable corresponding to the number of clusters. The hidden layer consists of 25 hidden neurons. The output layer is a softmax layer which produces discrete categorical values corresponding to class predictions.

One drawback of using an ANN is that the model is generally considered a black box and is often over-parameterized. As a result, the number of hidden layer nodes was chosen empirically. However, this does not inhibit the model's strength at performing classification: by selecting the hidden layer hyperparameter empirically, we can assume that the network with the appropriate number of hidden layer neurons will perform best.

This is the end-point for the classification phase of the framework, which produces our final classification prediction for each time series signal. We then feed the fixed-length feature vector from the clustering phase and this classification result forward into the regression phase.
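A sketch of this phase using scikit-learn's `MLPClassifier` in place of the MATLAB ANN; only the 25-neuron hidden layer comes from the paper, and the remaining hyperparameters (iteration cap, seed, function name) are assumptions:

```python
from sklearn.neural_network import MLPClassifier

def train_summary_classifier(summary_vectors, labels, seed=0):
    """Train the classification-phase network: K cluster-ratio inputs,
    one hidden layer of 25 neurons, and softmax over the class labels
    (MLPClassifier applies softmax output for multi-class problems)."""
    clf = MLPClassifier(hidden_layer_sizes=(25,), max_iter=2000,
                        random_state=seed)
    return clf.fit(summary_vectors, labels)
```

Because the inputs are only the K membership ratios, the network's input width is fixed regardless of how long each original signal was.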
3) Linear n-Regression Phase: Results from the Two-Regression Model by [6] show that knowledge of a time series signal's class has a significant impact on accurate regression over the signal. We intend to leverage not only the physical activity class prediction from the classification phase of the framework, but also to reuse the fixed-length feature vector produced in the clustering phase.

For the regression model, we use |C| distinct linear regression models. Each model is trained independently of the others, each on one of the time series classes. Using the classification prediction, we select which of the |C| models to use in the regression phase. The regression model is applied over each window, rather than the full-length signal.

Each model is a linear regression model of the following form:

y = Xβ + ε    (8)

The rows of our design matrix X correspond to time series windows. The columns of X consist of a bias term, the features of the time series window, and the K additional cluster features constructed for the full-length time series signal. In total, the number of variables over which the regression is performed is W_count + K, where W_count is the number of features in the window. The model β is of the appropriate size and is learned with minimal error ε. The regression target is represented by y.

V. EXPERIMENTS AND RESULTS
A. Window Features
For use with our method, a time series signal must be segmented into equally-sized intervals, or windows. Each window must be assigned features that characterize the local information in that region of the signal. For this purpose, we chose to use percentile information, which has precedent in other works [16] [17].

For the percentile features, we characterize a window along each of its axes by the interval's 10th, 25th, 50th, 75th, and 90th percentiles, following the scheme used by those previous works. This encodes information about the distribution of points within the window while excluding the minimum and maximum, to be robust against outliers.

We also include an additional feature per axis: lag-1 autocorrelation. Autocorrelation measures the correlation between a signal's current value and its past values. By taking a lag of 1, we consider how correlated the values of the signal are with their immediately preceding values.

In total, each window has five percentile features per axis and one lag-1 autocorrelation feature per axis, for a total of 18 features per window.

Each activity bout in the dataset consists of a number of minutes of activity. For use with our method, each minute of activity is broken up into five 12-second windows. This length was chosen because it was the lowest reasonable limit we could justify to maximize the resolution of our time series signals: the fewer windows an activity bout has, the coarser the cluster summary features would be.

TABLE I: The types of activities present in the data set and the activity categories they fall under. The right-hand columns give the number of 12-second windows corresponding to each class and category. Sedentary activities are significantly over-represented within the data set and Running activities are under-represented, contributing to a severely imbalanced data set. Furthermore, Walking is the second most-represented category, having nearly eight times as many windows as Running, a class it is difficult to distinguish from.

Category                                  Class                   Windows (class)  Windows (category)
Sedentary                                 Lying Rest              14755            16475
                                          Playing Computer Games  860
                                          Reading                 860
Light Household and Games                 Light Cleaning          840              2505
                                          Sweeping                865
                                          Workout Video           800
Moderate-Vigorous Household and Sports    Wall Ball               845              1570
                                          Playing Catch           725
Walk                                      Brisk Track Walking     1210             3775
                                          Slow Track Walking      1000
                                          Walking Course          1565
Run                                       Track Running           485              485

However, it must be addressed that 12-second intervals do not divide evenly into 10th, 25th, 50th, 75th, and 90th percentiles. For these, we chose to use the nearest appropriate points: the 2nd, 3rd, 6th, 9th, and 11th points. Since our data resolution was 1 second, we are very limited in our choice of window lengths. The smallest evenly-divisible length would be 20; however, we chose 12 to grant additional resolution beyond that limit. An alternative to choosing the nearest appropriate points would be interpolation, but in this setting we cannot justify interpolating the data at these scales. Since we cannot account for the general behavior of the signal between measurements, there is no reasonable assumption to make for a best approximation of in-between points; choosing any interpolation scheme would introduce significant bias.

All regression, ANN, and other experiments performed herein were done using MATLAB 2018a [11]. The code for Variational Bayesian Gaussian Mixture Models is part of the Pattern Recognition and Machine Learning Toolbox on the MATLAB File Exchange [14].
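The window featurization described above can be sketched as follows. Note that this is an approximation: `np.percentile` interpolates between points rather than using the paper's nearest-point scheme, and the function name is illustrative.

```python
import numpy as np

def window_features(window):
    """Features for one (window_len, n_axes) window: the 10/25/50/75/90th
    percentiles plus lag-1 autocorrelation per axis -- six features per
    axis, 18 in total for tri-axial accelerometer counts."""
    feats = []
    for axis in window.T:
        feats.extend(np.percentile(axis, [10, 25, 50, 75, 90]))
        centered = axis - axis.mean()
        denom = float((centered ** 2).sum())
        # Lag-1 autocorrelation: correlation of the series with itself
        # shifted by one sample (0.0 for a constant axis).
        lag1 = float((centered[:-1] * centered[1:]).sum()) / denom if denom else 0.0
        feats.append(lag1)
    return np.array(feats)
```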
B. Leave-One-Person-Out Cross Validation
For cross-validating our method, we use leave-one-person-out: we train our framework on all of the data excluding the instances associated with a single participant, and repeat for each participant in the data set. We chose to test at the participant scope rather than the instance scope to avoid the bias we would incur by predicting PA type or estimating PAEE for one of a participant's activities after having trained on the other activities that same participant performed for the dataset.
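This protocol corresponds to scikit-learn's `LeaveOneGroupOut` with participants as the groups; a sketch (the function name and placeholder matrix are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def lopo_splits(n_instances, participant_ids):
    """Return (train, test) index arrays so that each fold holds out
    every activity instance belonging to exactly one participant."""
    logo = LeaveOneGroupOut()
    placeholder = np.zeros((n_instances, 1))  # split() only needs the shape
    return list(logo.split(placeholder, groups=participant_ids))
```

Grouping by participant guarantees that no participant's data ever appears in both the training and test sides of a fold.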
C. Competitive Methods
For the classification phase, we compare SummerTime with a baseline Artificial Neural Network (ANN) model. For the regression phase, we compare SummerTime with Linear Regression on Local Features with Voting, with a 5-Regression Model applied to the ANN classification, and with an ANN on Local Features with Voting.
1) ANN Classification with Voting: The ANN we compare classification against takes as input the time window features and produces an activity type classification per window. The classification result for the entire activity is aggregated by majority voting over the per-window predictions. The hidden layer consists of 25 hidden neurons, following the network topology from [16].

Comparison with this method shows that the summarization features produce better results than using the local features with voting, since the underlying classification method is equivalent.
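The voting aggregation can be sketched as follows; this is a generic majority vote, not the authors' MATLAB code:

```python
from collections import Counter

def vote_activity_label(window_predictions):
    # Aggregate per-window class predictions into one bout-level label by
    # majority vote; ties break toward the earliest-seen label, since
    # Counter.most_common is stable with respect to insertion order.
    counts = Counter(window_predictions)
    return counts.most_common(1)[0][0]
```

This is the aggregation step the text argues against: a minority of windows (e.g. one Run window in a mostly-Walk bout) contributes nothing to the final label.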
2) Linear Regression on Local Features: We compare the regression phase of SummerTime with a linear regression model. The design matrix of the linear regression consists of a bias term and the window features. The PAEE estimates are aggregated as sums over the windows of the activity as a whole. This model is a regression baseline for the bare-minimum attempt at estimating the energy expenditure from available variables.
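A minimal version of this baseline, assuming a simple layout of a bias column prepended to the window features (the function names are ours):

```python
import numpy as np

def fit_window_regression(X, y):
    # Least-squares fit of per-window targets on [1 | window features].
    A = np.hstack([np.ones((len(X), 1)), np.asarray(X, dtype=float)])
    w, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return w

def predict_bout_paee(w, bout_windows):
    # Per-window estimates are summed to give the bout-level PAEE estimate.
    A = np.hstack([np.ones((len(bout_windows), 1)),
                   np.asarray(bout_windows, dtype=float)])
    return float(np.sum(A @ w))
```

The summation step mirrors the aggregation described above: each window's estimate counts equally toward the bout total.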
3) 5-Regression Splitting on ANN Classification: We also compare the regression phase with the 5-regression model, but using the classification result of the ANN classifier. The regression model does not include the summarization features as input variables, but is otherwise the same as described in Section 3.3. Again, regression results are aggregated as sums for the activity as a whole.

Comparison with this method shows that our summarization features provide a notable improvement over performing the 5-regression model alone.
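The routing-and-summing scheme can be sketched like so; the per-class weight vectors here are placeholders standing in for the fitted 5-regression models:

```python
def five_regression_predict(window_features, window_classes, models):
    # Route each window to its class-specific linear model (bias first,
    # then one weight per feature) and sum the estimates over the bout.
    total = 0.0
    for x, c in zip(window_features, window_classes):
        w = models[c]
        total += w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return total
```

A misrouted window (e.g. a Run window classified as Walk) is scored by the wrong regressor, which is why the quality of the classification phase directly bounds the quality of the regression phase.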
4) ANN Regression:
Finally, we compare the regression phase with the ANN directly. The ANN is a powerful method and one of the main methods used for classifying physical activity type. Once again, regression results are aggregated as sums for the activity as a whole.

[Table II: 5x5 count matrix over Sed., LHH, MtV, Walk, and Run; most entries were not recoverable. The Sedentary true-count is 16455.]
TABLE II: Above, we have the counts of the Confusion Matrix of activity instances for the Classification Phase. Columns are predictions and rows are actuals. Along the diagonal are the in-class true counts.
[Table III: percentage matrix over Sed., LHH, MtV, Walk, and Run; entries not recoverable.]
TABLE III: Here, we show the Confusion Matrix for the Classification Phase as percentages, with Recall along the diagonal. Columns are predictions and rows are actuals.
D. Classification Results
In validating PA classification, we use multi-class accuracy. We will generally refer to in-class accuracy, as opposed to overall accuracy. The in-class accuracies of each class are more indicative of the success of our method, as they are not heavily influenced by the large number of easily-correctly-classified Sedentary activities. Changes in overall accuracy are less indicative of improvement because of the imbalance in the underlying data set.

In Table II, we show a confusion matrix detailing the classification of activity instances. Rows represent the ground-truth labels of each instance, whereas columns represent our classification model's predictions. Along the diagonal in bold script are the correctly classified instances. Along the diagonal of Table III, we see the recall of the classifier.

In Table IV, we show the classification results of an ANN, which achieves poor Run category recall. A recall of 8.25% is far worse than random guessing (an expectation of 20% for a 5-class classification). This indicates that the model is biased against making predictions of Running. While the ANN has a higher recall for the Light Household Chores category, it misclassifies a larger portion of instances outside of the LHH category.
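The in-class accuracies (recalls) reported along the diagonals of Tables III and IV can be computed from raw counts as follows; this is a generic helper, not the authors' code:

```python
def per_class_recall(confusion):
    # confusion[i][j] = count of instances with true class i predicted as j.
    # recall_i = diagonal count / row sum, the "in-class accuracy" above;
    # rows with no instances get a recall of 0.0.
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls
```

Because each recall is normalized by its own row total, a dominant class such as Sedentary cannot inflate the scores of the minority classes.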
E. Regression Results
In validating PAEE regression results, we make use of the root mean-squared error (RMSE) measure. The formula for RMSE is as follows, where $\hat{y}$ is the regression output and $y$ is the true EE (Energy Expenditure):

\mathrm{RMSE} = \sqrt{\mathbb{E}\left[(\hat{y} - y)^2\right]} \quad (9)

This measure acts as the sample standard deviation of differences between predicted and true values, meaning that we can interpret it directly as a measure of both accuracy and precision of prediction [18]. We calculate RMSE both in-class and overall.

In Table V, we show the RMSE values associated with the predicted METs for each regression model tested, by activity category. Our method out-performed each method overall and in each individual activity category as well, achieving an overall RMSE of 0.7206. It should be noted that the RMSE associated with the Running category is much larger than that of the Walking category in the other regression methods. In our method's predictions, the RMSE values of the two categories are much closer together, differing by only 0.0425 units. This indicates that the intensity of the activity does not have an impact on the precision of EE estimates in our method.

VI. RELATED WORK
Here we present major works on physical activity classification and energy expenditure prediction using machine learning methods. We also include important works in the greater field of time series analysis.

In 2006, Crouter et al. introduced the two-regression model, which alternates between a quadratic regression model and a linear regression model based on the coefficient of variation of each bout [6]. This novel approach broke the overall problem objective into two key parts: first, separating instances by their variability into two groupings based on their coefficients of variation; second, applying to each grouping a regression model more appropriate for instances of that variability.

In 2009, Ye et al. developed a new time series primitive called a shapelet. The shapelet is a subsequence pattern intended to be maximally representative of some class of time series. They demonstrate that the new concept is time-efficient, accurate, and interpretable [19].

In 2012, Trost et al. achieved improved physical activity classification accuracy as well as low root mean squared-error (RMSE) in energy expenditure estimation with an Artificial Neural Network (ANN) model [17].

In that same year, Mu et al. revisited the two-regression model of Crouter et al. and extended it to a number of regression models, one per activity type [12]. The data used in this study, including each activity bout therein, was structured rather variably, which made it analogous to a free-living data collection. This method utilized distance metric
TABLE IV: Here, we show the Confusion Matrix for the ANN classification as percentages, with Recall along the diagonal. Columns are predictions and rows are actuals. Take note of the misclassification of Run activities as Walking and as Light Household Chores and Games.
                    Sed.     LHH      MtV      Walk     Run      Overall
Linear Regression   [entries not recoverable]
ANN                 [entries not recoverable]
Our Method          0.1741   1.0406   1.4231   1.2268   1.2693   0.7206
TABLE V: Above, we show the RMSE for MET estimates of each Regression Model by Activity Class. Our method performs best in each class and overall. Note the similarity of RMSEs between Walking and Running, a similarity not shared by any of the other methods. This indicates that the intensity of the activity does not have an impact on the precision of EE estimates.

learning methods to learn the underlying block structure of variable-length activity bouts.

In 2014, Staudenmayer et al. expanded on the field with another ANN model, which they applied to their own data set [16]. However, their classification procedure targeted learned activity types, as opposed to expert-defined types. They produced these types through clustering based on signal activity levels.

Petitjean et al. developed a nearest-centroid classification method which constructs centroids that are meaningful to the Dynamic Time Warping (DTW) algorithm, to allow for time-efficient classification with the distance-based approach [15].

Bastian et al. published an evaluation of cutting-edge methods outside of the rigid laboratory setting and confirmed the activity classification community's suspicions that existing methods would not perform well in the free-living setting [1].

Hills et al. developed a time series classification method using shapelets to produce an alternative representation of time series signals where the new features are distances from each of k shapelets [9].

In 2015, Baydogan et al. made use of the dependency structure within time series to develop a representation and similarity measure which they validated on a wide variety of time series domains [2].

In 2017, David Hallac et al. developed a method that segments and clusters time series data using structural network relations rather than spatial distance to encode different groupings of time series segments [8].

In 2018, Stanislas Chambon et al.
developed a deep learning approach for sleep stage classification that learns end-to-end without computing spectrograms or extracting handcrafted features [5].

In 2019, Fazle Karim et al. proposed transforming the existing univariate time series classification models, LSTM-FCN and ALSTM-FCN, into a multivariate time series classification model by augmenting the fully convolutional block with a squeeze-and-excitation block to further improve accuracy [10].

VII. DISCUSSION
To resolve the lack of classical structure in the time series data, in this paper we chose to bridge the gap between the variable-length, empirically-chosen features for physical activity data and a latently-learned fixed-length representation, or summary. Our method,
SummerTime, leverages an existing feature construction and, through clustering of disjoint time windows, establishes a summary of the time series as a whole. The features of the summarization were extracted from the data in an unsupervised manner through Gaussian mixtures using a variational Bayesian approach. This allows our method to zero in on the number of representative features of the summarization, naturally performing feature selection. This clustering provides us with an informative fixed-length feature vector which contains a global description of the signal for a single activity bout. From this, we are able to perform classical machine learning methods on the summaries instead of the original instances. We show that this summarization is sufficient for problems like physical activity type classification and out-performs classification on each time window independently with majority voting. We then show that
SummerTime can augment energy expenditure predictions per window by including with each window's original features the summarization of the bout. In a result we find particularly notable,
SummerTime manages to achieve small, comparable error for both Walking and Running type activities, two activity types which normally have significant cross-class error. Overall,
SummerTime demonstrates low classification and regression error, robustness, and effectiveness on an imbalanced data set. We hope to further demonstrate the strengths of
SummerTime in more time series domains in the future.

VIII. ACKNOWLEDGEMENTS

[Content removed in compliance with Triple Blind. Will be added back in after the Triple Blind period.]
REFERENCES

[1] Thomas Bastian, Aurélia Maire, Julien Dugas, Abbas Ataya, Clément Villars, Florence Gris, Emilie Perrin, Yanis Caritu, Maeva Doron, Stéphane Blanc, Pierre Jallon, and Chantal Simon. Automatic identification of physical activity types and sedentary behaviors from triaxial accelerometer: laboratory-based calibrations are not enough. Journal of Applied Physiology, 118(6):716-722, 2015.
[2] Mustafa Gokce Baydogan and George Runger. Time series representation and similarity based on local autopatterns. Data Mining and Knowledge Discovery, 30(2):476-509, 2016.
[3] Christopher M. Bishop. Mixture models and EM. In Pattern Recognition and Machine Learning, chapter 9. Springer, 2006.
[4] Christopher M. Bishop. Neural networks. In Pattern Recognition and Machine Learning, chapter 5. Springer, 2006.
[5] Stanislas Chambon, Mathieu N. Galtier, Pierrick J. Arnal, Gilles Wainrib, and Alexandre Gramfort. A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(4):758-769, 2018.
[6] Scott E. Crouter, Kurt G. Clowers, and David R. Bassett. A novel method for using accelerometer data to predict energy expenditure. Journal of Applied Physiology, 100(4):1324-1331, 2006.
[7] Patty S. Freedson, Kate Lyden, Sarah Kozey-Keadle, and John Staudenmayer. Evaluation of artificial neural network algorithms for predicting METs and activity type from accelerometer data: validation on an independent sample. Journal of Applied Physiology, 111(6):1804-1812, 2011.
[8] David Hallac, Sagar Vare, Stephen Boyd, and Jure Leskovec. Toeplitz inverse covariance-based clustering of multivariate time series data. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17, pages 215-223, New York, NY, USA, 2017. ACM.
[9] Jon Hills, Jason Lines, Edgaras Baranauskas, James Mapp, and Anthony Bagnall. Classification of time series by shapelet transformation. Data Mining and Knowledge Discovery, 28(4):851-881, 2014.
[10] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Samuel Harford. Multivariate LSTM-FCNs for time series classification. Neural Networks, 116:237-245, 2019.
[11] MATLAB. Version R2018a. The MathWorks Inc., Natick, Massachusetts, 2018.
[12] Y. Mu, H. Z. Lo, W. Ding, K. Amaral, and S. E. Crouter. BiPart: Learning block structure for activity detection. IEEE Transactions on Knowledge and Data Engineering, 26(10):2397-2409, Oct 2014.
[13] Nikolaos Nasios and Adrian G. Bors. Variational learning for Gaussian mixture models. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 36(4):849-862, 2006.
[14] M. Chen. Pattern Recognition and Machine Learning Toolbox, 2016. MATLAB File Exchange.
[15] François Petitjean, Germain Forestier, Geoffrey I. Webb, Ann E. Nicholson, Yanping Chen, and Eamonn Keogh. Dynamic time warping averaging of time series allows faster and more accurate classification. In Data Mining (ICDM), 2014 IEEE International Conference on, pages 470-479. IEEE, 2014.
[16] J. Staudenmayer, D. Pober, S. Crouter, D. Bassett, and P. Freedson. An artificial neural network to estimate physical activity energy expenditure and identify physical activity type from an accelerometer. Journal of Applied Physiology, 107(4):1300-1307, Oct 2009.
[17] S. G. Trost, W. K. Wong, K. A. Pfeiffer, and Y. Zheng. Artificial neural networks to predict activity type and energy expenditure in youth. Medicine & Science in Sports & Exercise, 44(9):1801-1809, Sep 2012.
[18] Wikipedia contributors. Root-mean-square deviation. Wikipedia, The Free Encyclopedia, 2018.
[19] Lexiang Ye and Eamonn Keogh. Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 947-956, 2009.