SummerTime: Variable-length Time Series Summarization with Applications to Physical Activity Analysis
Kevin M. Amaral, Zihan Li, Wei Ding, Scott Crouter, Ping Chen
Kevin Amaral
Computer Science, University of Massachusetts Boston
Boston, [email protected]

Scott Crouter, PhD
Exercise and Health, University of Tennessee-Knoxville
Knoxville, [email protected]

Zihan Li
Computer Science, University of Massachusetts Boston
Boston, [email protected]

Ping Chen, PhD
Computer Engineering, University of Massachusetts Boston
Boston, [email protected]

Wei Ding, PhD
Computer Science, University of Massachusetts Boston
Boston, [email protected]
Abstract — SummerTime seeks to summarize time series signals globally, providing a fixed-length, robust summarization of variable-length time series. Many classical machine learning methods for classification and regression depend on data instances with a fixed number of features. As a result, those methods cannot be directly applied to variable-length time series data. One common approach is to perform classification over a sliding window on the data and aggregate the decisions made at local sections of the time series in some way, through majority voting for classification or averaging for regression. The downside to this approach is that minority local information is lost in the voting process, and averaging assumes that each time series measurement is equal in significance. Also, since time series can be of varying length, the quality of votes and averages can vary greatly in cases where there is a close voting tie or a bimodal distribution in the regression domain. The summarization produced by the SummerTime method is a fixed-length feature vector which can be used in place of the time series dataset with classical machine learning methods. We use Gaussian Mixture Models (GMM) over small, same-length, disjoint windows in the time series to group local data into clusters. The time series' rate of membership in each cluster becomes a feature in the summarization. By making use of variational methods, the GMM converges to a more robust mixture, meaning the clusters are more resistant to noise and overfitting. Further, the model is naturally capable of converging to an appropriate cluster count. We validate our method on our [The dataset is created using our NIH awards. The name is removed in compliance with Triple Blind] dataset, an imbalanced physical activity dataset with variable-length time series structure. We compare our results to state-of-the-art studies in physical activity classification and show high-quality improvement when classifying with only the summarization. Finally, we show that regression using the summarization can augment energy expenditure estimation, producing more robust and precise results.
Index Terms — Time Series, Clustering, Classification, Regression, Summarization
I. INTRODUCTION
The physical activity (PA) is one of the most important areaswhen people try to know more about themselves.To record theactivities, time series data is abundant in the PA, especiallywhen researchers record the activities in the dynamic way.In this paper, we propose a data-driven based method whichprovides a variable-length summarization of physical activitytracks.The physical activity is defined as any movement of thebody that requires energy expenditure. Since the activitiesalways happen with multiple body movements, we named asequence of single movements as a specific activity. Then,the different combinations of single breakdown movementsare defined as different PA. Researchers need to match thereal movement track with the pre-defined activities whichcontains a specific sequence of movements. Since the differentPAs may consist of the same sub-sequence of single bodymovements, it is crucial to decide how to break the fullmovement track. However, it is hard to set an appropriatebreak in the movement track based on the traditional PAtechnique. In this paper, we apply the data-driven machinelearning method to provide a way to redefine the PAs asdifferent clusters. These clusters will have significant commonpatterns but may not follow the traditional PA definitions (likethe specific sequence of single movements). With the data-driven model, the data can reveal their hidden patterns byitself. Since no pre-defined movement sequence is required,the chance bias can be reduced significantly. Another benefitof our data-driven method is to improve the accuracy byconsidering the small effective events into the model. Thesmall effective events can also have significant effects to thefinal result, like the long-tail effect. The information from both a r X i v : . [ c s . L G ] F e b ajority and minority of effective events will contribute in themethod equally.Our method, SummerTime ( Time series
Summa e r ization),leverages the features of time windows as local informationabout the time series signal and use them to create a globaldescription of the signal which preserves the information of ev-ery time window within it. The traditional time series machinelearning methods depend on a fixed-length representation sothat they can build relationships between features and usethose to produce their results. However, it is difficult to applyclassical time series machine learning methods to the timeseries PA data. Since, though, the time series PA data is rich ininformation, it is rarely uniformly structured across instances,differing in length and scale. For instance, in regression meth-ods, the time series data can be trained by treating the shorterlength of features. However, building models in this way hasto loss parts of information in the data, which would leadto overfitting and to reduce the accuracy of models workingwith future unseen longer features. Our model, SummerTime ,attempts to encapsulate all global information about the timeseries PA signal as a fixed-length representation, producing afeature vector with a constant number of features for eachinstance. Not only including as much as information fromthe data, but also providing a competitive solution to dealwith variable-length time series PA data. The features ofthis global description are learnt latently using clusteringof the local time windows. This effectively treats the biasfrom manually-constructed features and results in a higher-quality interpretation of the features. This does not, however,fully remedy the bias from the manually constructed timewindow features. 
The prime assumptions we make are that the time window features are already of moderate-to-significant quality for locally describing the signals, and that the main obstacles of other time series methods lie in the additional assumptions they undertake to solve their respective problems. These assumptions have merit: we are able to generate potent summary features from the local time window features, and where we observe opportunities for competitive approaches to introduce biases, we make our best efforts to avoid those biases. In accounting for these biases, we observe overall better performance than each of the competitive methods. We hope to resolve any further biases by having an end-to-end data-driven solution. Our contributions are as follows:

• Fixed-length feature-vector representation of variable-length time series: This is one of the major obstacles that variable-length time series faces. Many approaches instead classify and regress on local information about the signal, or resize the signal to a uniform length across all instances. In either case, these approaches achieve a fixed-length representation so as to use classical methods. However, rescaling the signal relinquishes information about the proportion to time (i.e., the concept of an n-second interval is lost), and voting over the classifications and regressions done in the local time windows fails to encode global information about the signal as a whole. Our model achieves the representation without losing proportional or global information.

• End-to-end data-driven approach:
Our method,
SummerTime, is designed to fit between the time series dataset and the classical methods one wants to apply, performing feature construction independently of the experimenter and extracting all knowledge from the data itself. While this fits the idea of a black box, we are of the opinion that this is a valuable trait in a framework. By having the method be data-driven, we avoid the opportunity to introduce further bias from the experimenter and reduce the loss of information.

• Demonstration of improved robustness using only summarization features:
With our dataset, we show that classification using one of the other schemes, specifically voting over local time windows, is vulnerable to cross-class classification error between distant physical activity types. We then show that, using only features extracted by the SummerTime method, we were able to outperform them overall and also significantly reduce misclassification between near classes, which is a major demonstration of robustness.

• Demonstration of improved precision using summarization features to augment regression:
Again, in our dataset, we show that we can augment existing state-of-the-art regression models by including our
SummerTime features. Not only does doing so improve regression results, but the regression RMSE of the Run-type activities, a class close to the Walk-type activities, drops to around the same level as that of the Walk-type activities. This increased precision indicates that the
SummerTime features overcome a bias clearly present in all other regression methods we compare against, each with Run-type error well above Walk-type error.

The rest of this paper is organized as follows. The second section introduces the time series PA dataset we are going to analyze. The third section describes the problem of variable-length time series and the goal of classification and regression on those types of data. In the fourth section, we describe our proposed data-driven method, the
SummerTime
Framework, which solves the problem outlined in the third section of the paper. In the fifth section, we present our experimental results and the interpretations thereof. In the sixth section, we present related work. Finally, we conclude in the seventh section.

II. DATA DESCRIPTION
A. Dataset
In the medical field, measuring physical activity in children is crucial for learning the link between physical activity and health problems such as obesity and other metabolic issues. Measuring physical activity is generally done in two ways: (1) identifying the types of activities performed and (2) measuring caloric or energy expenditure. Accelerometers prove to be the ideal device for physical activity data collection. While they do not directly identify the activities being performed nor measure caloric expenditure, they provide an objective measurement which is information-rich and can be analyzed to predict and estimate these two quantities. They are also a low-cost and non-invasive alternative to indirect calorimetry, which measures caloric expenditure through a breathing device. However, analyzing accelerometer data to make these predictions and estimates is still a difficult, unsolved problem, and estimates made with state-of-the-art methods are still high-variance [7]. Furthermore, classical machine learning methods cannot be applied out-of-the-box in a free-living setting, since physical activity accelerometer data collected outside of a laboratory environment is generally unstructured or continuous-stream time series.

Fig. 1: The SummerTime framework diagram. The local window features are clustered to produce new summary features for use in the Classification and Regression phases. The Classification Phase makes use of only the newly constructed summaries from the Clustering Phase. In our experimental setting, the classification is of physical activity type. In the Regression Phase, the Window Features, the Summary Features, and the Classification result are all used to produce the physical activity energy expenditure estimate.

The dataset used for validation is a combined dataset from two studies performed by [Citation and Name Removed in compliance with Triple Blind]. The data was collected from child participants performing a wide variety of physical activities. The participants were outfitted with an Actigraph X accelerometer, which was used to collect tri-axial accelerometer counts at one count per second. Participants were also hooked up to a Cosmed K4b to collect energy expenditure measurements in MET (Metabolic Equivalent of Task) units. Start and end times were observed, as well as the physical activity (PA) type, for every bout of activity performed by each participant.

Our model will not predict cut-points for activities but will instead classify the PA type and estimate the physical activity energy expenditure (PAEE) of time series segments. As such, our dataset comes pre-segmented with true segment bounds.

The variable-length nature of the data is an allegory for the free-living setting, where activity length is not rigidly defined. The common approach to multi-instance data, data for which each instance has multiple parts, is to break it up into its smaller parts and build a model over the members of the multi-instances instead. We will do just that, but we will maintain the association between the multi-instance and its members by referring to the multi-instance as the activity bout and its members as activity windows which belong to the bout.

In total, 184 child participants' data were used. There were 98 male participants between 8 and 15 years old, and 86 female participants between 8 and 14 years old. For each participant, one bout of lying resting was performed for up to 30 minutes with a median time of 17 minutes. All other activities were performed for up to 10 minutes with a median time of 4 minutes. All activity data is collected at 1-second resolution. Activities included in the study were Computer Games, Reading, Light Cleaning, Sweeping, Brisk Track Walking, Slow Track Walking, Track Running, Walking Course, Playing Catch, Wall Ball, and Workout Video. These activity classes were binned into categories by the rigorousness of the associated activity; the category classes were Sedentary (Sed), Light Household Chores and Games (LHH), Moderate to Vigorous Household Chores and Sports (MtV), Walking (Walk), and Running (Run). See Table I for a breakdown of the categories, which are organized in order of ascending intensity.

The representation of each activity category in the dataset is not equal. Reference again Table I to see the huge disparity in the number of windows.

III. PROBLEM DESCRIPTION
Physical activities (PA) of humans consist of elemental movements of the body, such as leg raising and arm stretching. We define each such breakdown movement as an atomic event of the time series activity data, and each observed single activity is performed as a PA signal; each activity signal is thus a combination of multiple activity events. The time series PA data are a discrete sequence of events of the form D = {x_{t_1}, x_{t_2}, ...; t_i ∈ T} such that the index set T is temporal in nature. In many cases, the events need not occur at a constant frequency, but for the purposes of this paper we consider only the case when the time series has events at a constant frequency.

The two most popular prediction techniques are classification and regression. However, neither is directly suitable for variable-length time series PA data. The reasons, and our solutions, are introduced as follows.

Classification is the process in which a trained model maps time series data to a nominal class label. Traditional classification methods operate on a fixed number of features, such that they can be generalized as follows, where D is the space of instances which have k features and C is the space of class labels:

f(x_1, ..., x_k) : D → C    (1)

For a dataset of fixed-length vectors, we can apply those traditional methods out of the box. However, in the case of variable-length time series, traditional models cannot be applied directly.

One current solution is to break up each time series signal into equally-sized time series windows and to perform traditional classification over the features of the windows. The final result is then aggregated through voting, i.e., taking the mode. This results in a loss of minority information within the signal; however, minority information may also contribute to the classification, especially when the difference is not very significant.
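As a concrete illustration of this window-voting scheme (a sketch, not the paper's code: the per-window classifier here is any model with a scikit-learn-style `predict()`, an illustrative assumption), aggregation by majority vote can be written as:

```python
from collections import Counter

def classify_by_voting(window_features, window_classifier):
    """Classify a variable-length signal by majority vote over its windows.

    `window_classifier` is a hypothetical model exposing a scikit-learn-style
    predict(); its name and interface are illustrative assumptions.
    """
    votes = list(window_classifier.predict(window_features))
    label, count = Counter(votes).most_common(1)[0]
    confidence = count / len(votes)  # approaches 0.5 for an even class split
    return label, confidence
```

Note that when the window votes split nearly evenly, the returned confidence sits near 0.5, the coin-flip situation discussed in the text.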
For example, in a case where a signal has equal parts of two classes, the method reports 50% confidence in its classification result. This is akin to a coin flip.

Another current solution to variable-length time series data is to rescale each time series signal so that all instances are of the same length. If this is done through interpolation, estimating and including in-between points, we introduce bias from our choice of interpolation method; in non-smooth signal domains, estimates of in-between points can be radically inaccurate. Regardless of whether we account for that bias somehow, rescaling the time series removes proportionality from the signal, relinquishing the distinction between an n-second interval and an m-second interval. This causes another bias in that n-point intervals are treated equally, even if they correspond to vastly different stretches of real time.

In this paper, rather than merely converting variable-length time series data into a fixed-length summarized format, we propose a method that does so in a way that preserves global information about the signal (i.e., no data point's contribution is unaccounted for) and, in some way, preserves proportionality. Furthermore, our method works in a data-driven manner to avoid biases caused by humans, such as in the experimental setup or the labeling process.

Our solution is to cluster the time series window features and then use the ratios of membership in each cluster as the features of the summary. This technique fully encapsulates the global information of the PA signals, since the inclusion or exclusion of any event in the signal affects every summary feature. In this way, proportionality is accounted for by comparing variable-length time series signals without treating their events as equally frequent. Finally, the
Finally, theunsupervised clustering technique is applied to construct thesummary of the signals, which can avoid the bias causingby humans when assigning signals into the targeted clusterssignificantly.Regression part has a similar process to classification in thatwe are building a model which maps from a time series dataspace to a target space. However, the target space of regressionmethods is generally numeric and continuous, rather thancategorical and discrete as in a nominal space of classification.Regression models are usually of the form: f ( x , . . . , x k ) : D → R d (2)The same as classification case, traditional regression mod-els only can operate over data with the same number ofcomponents. One current solution for applying regressionmodels to variable-length data is to treat shorter instances ashaving missing values. The problem of this is that there arevariables which become specifically tuned to long instancesfrom the training set. When presented with new long-instancetest data from a different domain distribution, those variableswill impact the regression results in ways the training processcannot account for.Another current solution is to perform the regression locallyon windows and aggregate the regression result as a totalor average or similar statistic over the windows. In caseswhere the distribution of windows is multi-modal, an equally-weighted average can push results in the direction of themajority, heavily influenced by the number of points in thesignal. This is not ideal since it leads to the issues of propor-tionality like it in the classification. However, this influencecan be lessened without having to make major changes to theunderlying regression model.Our solution to the regression problem is to apply multipledistributions following the clusters from the classificationresults. 
In cases where the PA signals to be regressed over have separate classifications, that is, they differ from each other in some categorical way, regression models can be built for each category to produce better results than if a single model were designed to handle every category. So, to decide which regression model should be applied to a new time series PA signal, the classification of its events is necessary. The regression model is applied to each window and the results aggregated over the whole signal as usual, but we include, as regression variables, the time series summarization of the signal from the clustering. These variables carry information about the signal as a whole, which we demonstrate to have a positive effect on time series of differing lengths.

IV. METHODOLOGY
A. Modeling Approach
Compared with traditional methods, our SummerTime algorithm works on both fixed-length and variable-length time series data. Furthermore, our model can provide very competitive regression results by considering multiple distributions for the different kinds of PA signals. There are three major phases in the algorithm. The
SummerTime framework can be introduced as follows:
1) Cluster Generation: produce time series summarizations.
2) Classification: determine the class of the time series signals from the summarization.
3) Regression: select the appropriate regression model based on the previous phases and leverage the summaries to refine regression over the time series signals.
1) Cluster Generation Phase:
In this stage, we extract the cluster summarization from signals composed of different events.

Gaussian Mixture Models (GMMs) are probability distributions consisting of a linear combination of multivariate Gaussian distributions, each with its own mean and variance [3]. These individual Gaussian distributions will each correspond to a cluster of time-series activity windows which are most likely to belong to them. We utilize this clustering phase of the framework to produce a feature vector of ratios for each time series signal. This feature vector's components are of the form N_nk / N_n, where N_nk is the number of windows of signal n that belong to cluster k and N_n is the total number of windows in signal n.

Using the features of the time series windows, we can define the distribution p(x) over all windows x ∈ X. It is reasonable to assume that within the distribution there are K component distributions, to one of which each point x truly belongs. We further assume that those components are all normally distributed. As a result of these assumptions, we have Equation 3 as the distribution function:

p(x | π, µ, Σ) = Σ_{k=1}^{K} π_k N(x | µ_k, Σ_k)    (3)

Here, π_k is referred to as the mixing coefficient of the k-th component of the mixture distribution. We take Σ_k π_k = 1 so that π_k acts as the probability of an arbitrary x belonging to component k.

For x in the distribution, we can measure the probability of x belonging to component k by taking Equation 4. This is referred to as γ_k(x), the responsibility of component k for x:

γ_k(x) = π_k N(x | µ_k, Σ_k) / Σ_l π_l N(x | µ_l, Σ_l)    (4)

To determine which component k a data point x belongs to, we take the maximum of the responsibilities γ_k(x) over k:

K(x) = argmax_k γ_k(x)    (5)

Fig. 2: A single instance of Track Running as it makes its way through the clustering phase of the framework. (a) shows the x-axis accelerometer counts for the activity. (b) shows the breakdown of the signal into the 12-second windows for this axis. Not shown is the lag-1 autocorrelation feature. This is the standard means of constructing data instances from physical activity time series data. In (c), we show a PCA projection of the entire activity; the shapes of the data points indicate the clusters each activity window was assigned to. (d) and (e) demonstrate the feature construction from the clustering. The new features associated with an activity instance are the ratios of membership to each cluster.

We learn this model using variational Bayesian techniques, which allow us to learn an accurate number of clusters K as well as learn each of the K clusters very accurately without overfitting or producing singularity clusters [13].

This clustering function K(x) allows us to create the fixed-length feature vectors for the dataset. We now define N_nk as follows, where I_k is the usual indicator function and X_n is the set of all activity windows belonging to activity n:

N_nk = Σ_{x ∈ X_n} I_k(K(x))    (6)

N_n = |X_n|    (7)

This is the end-point for the clustering phase of the framework. For each activity n, we now have a single fixed-length vector ⟨N_n1/N_n, ..., N_nK/N_n⟩^T.
2) Classification Phase:
We choose Artificial Neural Network (ANN) models for classification. An ANN is a layered, graph-based, non-linear regression model commonly used in machine learning and statistical settings [4]. There is precedent in applying ANNs to the classification of time series signals [16] [17]. In this phase of the
SummerTime framework, we perform classification given only the feature vectors produced in the previous clustering phase.

As a result, our ANN model consists of K input nodes, where K is the latently-learned cluster-count variable corresponding to the number of clusters. The hidden layer consists of 25 hidden neurons. The output layer is a softmax layer which produces discrete categorical values corresponding to class predictions.

One drawback of using an ANN is that the model is generally considered a black box and is often over-parameterized. As a result, the number of hidden layer nodes was chosen empirically. However, this does not inhibit the model's strength at performing classification: by selecting the hidden layer hyperparameter empirically, we can assume that the network with the appropriate number of hidden layer neurons will perform best.

This is the end-point for the classification phase of the framework, which produces our final classification prediction for each time series signal. We then feed the fixed-length feature vector from the clustering phase and this classification result forward into the regression phase.
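A sketch of this phase using scikit-learn's `MLPClassifier` in place of the MATLAB ANN; only the 25-neuron hidden layer comes from the paper, and the remaining hyperparameters (iteration cap, seed, function name) are assumptions:

```python
from sklearn.neural_network import MLPClassifier

def train_summary_classifier(summary_vectors, labels, seed=0):
    """Train the classification-phase network: K cluster-ratio inputs,
    one hidden layer of 25 neurons, and softmax over the class labels
    (MLPClassifier applies softmax output for multi-class problems)."""
    clf = MLPClassifier(hidden_layer_sizes=(25,), max_iter=2000,
                        random_state=seed)
    return clf.fit(summary_vectors, labels)
```

Because the inputs are only the K membership ratios, the network's input width is fixed regardless of how long each original signal was.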
3) Linear n-Regression Phase: Results from the Two-Regression Model by [6] show that knowledge of a time series signal's class has a significant impact on accurate regression over the signal. We intend to leverage not only the physical activity class prediction from the classification phase of the framework, but also to reuse the fixed-length feature vector produced in the clustering phase.

For the regression model, we use |C| distinct linear regression models. Each model is trained independently of the others, each on one of the time series classes. Using the classification prediction, we select which of the |C| models to use in the regression phase. The regression model is applied over each window, rather than the full-length signal.

Each model is a linear regression model of the following form:

y = Xβ + ε    (8)

The rows of our design matrix X correspond to time series windows. The columns of X consist of a bias term, the features of the time series window, and the K additional cluster features constructed for the full-length time series signal. In total, the number of variables over which the regression is performed is W_count + K, where W_count is the number of features in the window. The model β is of the appropriate size and is learned with minimal error ε. The regression target is represented by y.

V. EXPERIMENTS AND RESULTS
A. Window Features
For use with our method, a time series signal must be segmented into equally-sized intervals, or windows. Each window must be assigned features that characterize the local information in that region of the signal. For this purpose, we chose to use percentile information, which has precedent in other works [16] [17].

For the percentile features, we characterize a window along each of its axes by the interval's 10th, 25th, 50th, 75th, and 90th percentiles, following the scheme used by those previous works. This encodes information about the distribution of points within the window while excluding the minimum and maximum, to be robust against outliers.

We also include an additional feature per axis: lag-1 autocorrelation. Autocorrelation measures the correlation between a signal's current value and its past values. By taking a lag of 1, we consider how correlated the values of the signal are with their immediately preceding values.

In total, each window has five percentile features per axis and one lag-1 autocorrelation feature per axis, for a total of 18 features per window.

Each activity bout in the dataset consists of a number of minutes of activity. For use with our method, each minute of activity is broken up into five 12-second windows. This length was chosen because it was the lowest reasonable limit we could justify to maximize the resolution of our time series signals: the fewer windows an activity bout has, the coarser the cluster summary features would be.

TABLE I: The types of activities present in the data set and the activity categories they fall under. The right-hand columns give the number of 12-second windows corresponding to each class and category. Sedentary activities are significantly over-represented within the data set and Running activities are under-represented, contributing to a severely imbalanced data set. Furthermore, Walking is the second most-represented category, having nearly eight times as many windows as Running, a class it is difficult to distinguish from.

Category                                  Class                   Windows (class)  Windows (category)
Sedentary                                 Lying Rest              14755            16475
                                          Playing Computer Games  860
                                          Reading                 860
Light Household and Games                 Light Cleaning          840              2505
                                          Sweeping                865
                                          Workout Video           800
Moderate-Vigorous Household and Sports    Wall Ball               845              1570
                                          Playing Catch           725
Walk                                      Brisk Track Walking     1210             3775
                                          Slow Track Walking      1000
                                          Walking Course          1565
Run                                       Track Running           485              485

However, it must be addressed that 12-second intervals do not divide evenly into 10th, 25th, 50th, 75th, and 90th percentiles. For these, we chose to use the nearest appropriate points: the 2nd, 3rd, 6th, 9th, and 11th points. Since our data resolution was 1 second, we are very limited in our choice of window lengths. The smallest evenly-divisible length would be 20; however, we chose 12 to grant additional resolution beyond that limit. An alternative to choosing the nearest appropriate points would be interpolation, but in this setting we cannot justify interpolating the data at these scales. Since we cannot account for the general behavior of the signal between measurements, there is no reasonable assumption to make for a best approximation of in-between points; choosing any interpolation scheme would introduce significant bias.

All regression, ANN, and other experiments performed herein were done using MATLAB 2018a [11]. The code for Variational Bayesian Gaussian Mixture Models is part of the Pattern Recognition and Machine Learning Toolbox on the MATLAB File Exchange [14].
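The window featurization described above can be sketched as follows. Note that this is an approximation: `np.percentile` interpolates between points rather than using the paper's nearest-point scheme, and the function name is illustrative.

```python
import numpy as np

def window_features(window):
    """Features for one (window_len, n_axes) window: the 10/25/50/75/90th
    percentiles plus lag-1 autocorrelation per axis -- six features per
    axis, 18 in total for tri-axial accelerometer counts."""
    feats = []
    for axis in window.T:
        feats.extend(np.percentile(axis, [10, 25, 50, 75, 90]))
        centered = axis - axis.mean()
        denom = float((centered ** 2).sum())
        # Lag-1 autocorrelation: correlation of the series with itself
        # shifted by one sample (0.0 for a constant axis).
        lag1 = float((centered[:-1] * centered[1:]).sum()) / denom if denom else 0.0
        feats.append(lag1)
    return np.array(feats)
```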
B. Leave-One-Person-Out Cross Validation
For cross-validating our method, we use leave-one-person-out: we train our framework on all of the data excluding the instances associated with a single participant, and repeat for each participant in the data set. We chose to test at the participant scope rather than the instance scope to avoid the bias we would incur by predicting PA type or estimating PAEE for one of a participant's activities after having trained on the other activities that same participant performed for the dataset.
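This protocol corresponds to scikit-learn's `LeaveOneGroupOut` with participants as the groups; a sketch (the function name and placeholder matrix are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def lopo_splits(n_instances, participant_ids):
    """Return (train, test) index arrays so that each fold holds out
    every activity instance belonging to exactly one participant."""
    logo = LeaveOneGroupOut()
    placeholder = np.zeros((n_instances, 1))  # split() only needs the shape
    return list(logo.split(placeholder, groups=participant_ids))
```

Grouping by participant guarantees that no participant's data ever appears in both the training and test sides of a fold.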
C. Competitive Methods
For the classification phase, we compare SummerTime with a baseline Artificial Neural Network (ANN) model. For the regression phase, we compare SummerTime with Linear Regression on Local Features with Voting, with a 5-Regression Model applied to the ANN classification, and with an ANN on Local Features with Voting.
1) ANN Classification with Voting: The ANN we compare classification against takes as input the time window features and produces an activity type classification per window. The classification result for the entire activity is aggregated by majority voting over the per-window predictions. The hidden layer consists of 25 hidden neurons, following the network topology from [16].

Comparison with this method shows that the summarization features produce better results than using the local features with voting, since the underlying classification method is equivalent.
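The voting aggregation can be sketched as follows; this is a generic majority vote, not the authors' MATLAB code:

```python
from collections import Counter

def vote_activity_label(window_predictions):
    # Aggregate per-window class predictions into one bout-level label by
    # majority vote; ties break toward the earliest-seen label, since
    # Counter.most_common is stable with respect to insertion order.
    counts = Counter(window_predictions)
    return counts.most_common(1)[0][0]
```

This is the aggregation step the text argues against: a minority of windows (e.g. one Run window in a mostly-Walk bout) contributes nothing to the final label.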
2) Linear Regression on Local Features: We compare the regression phase of SummerTime with a linear regression model. The design matrix of the linear regression consists of a bias term and the window features. The PAEE estimates are aggregated as sums over the windows of the activity as a whole. This model is a regression baseline for the bare-minimum attempt at estimating the energy expenditure from available variables.
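A minimal version of this baseline, assuming a simple layout of a bias column prepended to the window features (the function names are ours):

```python
import numpy as np

def fit_window_regression(X, y):
    # Least-squares fit of per-window targets on [1 | window features].
    A = np.hstack([np.ones((len(X), 1)), np.asarray(X, dtype=float)])
    w, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return w

def predict_bout_paee(w, bout_windows):
    # Per-window estimates are summed to give the bout-level PAEE estimate.
    A = np.hstack([np.ones((len(bout_windows), 1)),
                   np.asarray(bout_windows, dtype=float)])
    return float(np.sum(A @ w))
```

The summation step mirrors the aggregation described above: each window's estimate counts equally toward the bout total.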
3) 5-Regression Splitting on ANN Classification: We also compare the regression phase with the 5-regression model, but using the classification result of the ANN classifier. The regression model does not include the summarization features as input variables, but is otherwise the same as described in Section 3.3. Again, regression results are aggregated as sums for the activity as a whole.

Comparison with this method shows that our summarization features provide a notable improvement over performing the 5-regression model alone.
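The routing-and-summing scheme can be sketched like so; the per-class weight vectors here are placeholders standing in for the fitted 5-regression models:

```python
def five_regression_predict(window_features, window_classes, models):
    # Route each window to its class-specific linear model (bias first,
    # then one weight per feature) and sum the estimates over the bout.
    total = 0.0
    for x, c in zip(window_features, window_classes):
        w = models[c]
        total += w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return total
```

A misrouted window (e.g. a Run window classified as Walk) is scored by the wrong regressor, which is why the quality of the classification phase directly bounds the quality of the regression phase.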
4) ANN Regression:
Finally, we compare the regression phase with the ANN directly. The ANN is a powerful method and one of the main methods used for classifying physical activity type. Once again, regression results are aggregated as sums for the activity as a whole.

[Table II: 5x5 count matrix over Sed., LHH, MtV, Walk, and Run; most entries were not recoverable. The Sedentary true-count is 16455.]
TABLE II: Above, we have the counts of the Confusion Matrix of activity instances for the Classification Phase. Columns are predictions and rows are actuals. Along the diagonal are the in-class true counts.
[Table III: percentage matrix over Sed., LHH, MtV, Walk, and Run; entries not recoverable.]
TABLE III: Here, we show the Confusion Matrix for the Classification Phase as percentages, with Recall along the diagonal. Columns are predictions and rows are actuals.
D. Classification Results
In validating PA classification, we use multi-class accuracy. We will generally refer to in-class accuracy, as opposed to overall accuracy. The in-class accuracies of each class are more indicative of the success of our method, as they are not heavily influenced by the large number of easily-correctly-classified Sedentary activities. Changes in overall accuracy are less indicative of improvement because of the imbalance in the underlying data set.

In Table II, we show a confusion matrix detailing the classification of activity instances. Rows represent the ground-truth labels of each instance, whereas columns represent our classification model's predictions. Along the diagonal in bold script are the correctly classified instances. Along the diagonal of Table III, we see the recall of the classifier.

In Table IV, we show the classification results of an ANN, which achieves poor Run category recall. A recall of 8.25% is far worse than random guessing (an expectation of 20% for a 5-class classification). This indicates that the model is biased against making predictions of Running. While the ANN has a higher recall for the Light Household Chores category, it misclassifies a larger portion of instances outside of the LHH category.
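The in-class accuracies (recalls) reported along the diagonals of Tables III and IV can be computed from raw counts as follows; this is a generic helper, not the authors' code:

```python
def per_class_recall(confusion):
    # confusion[i][j] = count of instances with true class i predicted as j.
    # recall_i = diagonal count / row sum, the "in-class accuracy" above;
    # rows with no instances get a recall of 0.0.
    recalls = []
    for i, row in enumerate(confusion):
        total = sum(row)
        recalls.append(row[i] / total if total else 0.0)
    return recalls
```

Because each recall is normalized by its own row total, a dominant class such as Sedentary cannot inflate the scores of the minority classes.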
E. Regression Results
In validating PAEE regression results, we make use of the root mean-squared error (RMSE) measure. The formula for RMSE is as follows, where $\hat{y}$ is the regression output and $y$ is the true EE (Energy Expenditure):

\mathrm{RMSE} = \sqrt{\mathbb{E}\left[(\hat{y} - y)^2\right]} \quad (9)

This measure acts as the sample standard deviation of differences between predicted and true values, meaning that we can interpret it directly as a measure of both accuracy and precision of prediction [18]. We calculate RMSE both in-class and overall.

In Table V, we show the RMSE values associated with the predicted METs for each regression model tested, by activity category. Our method out-performed each method overall and in each individual activity category as well, achieving an overall RMSE of 0.7206. It should be noted that the RMSE associated with the Running category is much larger than that of the Walking category in the other regression methods. In our method's predictions, the RMSE values of the two categories are much closer together, differing by only 0.0425 units. This indicates that the intensity of the activity does not have an impact on the precision of EE estimates in our method.

VI. RELATED WORK
Here we present major works on physical activity classification and energy expenditure prediction using machine learning methods. We also include important works in the greater field of time series analysis.

In 2006, Crouter et al. introduced the two-regression model, which alternates between a quadratic regression model and a linear regression model based on the coefficient of variation of each bout [6]. This novel approach broke the overall problem objective into two key parts: first, separating instances by their variability into two groupings based on their coefficients of variation; second, applying to each grouping a regression model more appropriate for instances of that variability.

In 2009, Ye et al. developed a new time series primitive called a shapelet. The shapelet is a subsequence pattern intended to be maximally representative of some class of time series. They demonstrate that the new concept is time-efficient, accurate, and interpretable [19].

In 2012, Trost et al. achieved improved physical activity classification accuracy as well as low root mean squared-error (RMSE) in energy expenditure estimation with an Artificial Neural Network (ANN) model [17].

In that same year, Mu et al. revisited the two-regression model of Crouter et al. and extended it to a number of regression models, one per activity type [12]. The data used in this study, including each activity bout therein, was structured rather variably, which made it analogous to a free-living data collection. This method utilized distance metric
TABLE IV: Here, we show the Confusion Matrix for the ANN classification as percentages, with Recall along the diagonal. Columns are predictions and rows are actuals. Take note of the misclassification of Run activities as Walking and as Light Household Chores and Games.
                    Sed.     LHH      MtV      Walk     Run      Overall
Linear Regression   [entries not recoverable]
ANN                 [entries not recoverable]
Our Method          0.1741   1.0406   1.4231   1.2268   1.2693   0.7206
TABLE V: Above, we show the RMSE for MET estimates of each Regression Model by Activity Class. Our method performs best in each class and overall. Note the similarity of RMSEs between Walking and Running, a similarity not shared by any of the other methods. This indicates that the intensity of the activity does not have an impact on the precision of EE estimates.

learning methods to learn the underlying block structure of variable-length activity bouts.

In 2014, Staudenmayer et al. expanded on the field with another ANN model, which they applied to their own data set [16]. However, their classification procedure targeted learned activity types, as opposed to expert-defined types. They produced these types through clustering based on signal activity levels.

Petitjean et al. developed a nearest-centroid classification method which constructs centroids that are meaningful to the Dynamic Time Warping (DTW) algorithm, to allow for time-efficient classification with the distance-based approach [15].

Bastian et al. published an evaluation of cutting-edge methods outside of the rigid laboratory setting and confirmed the activity classification community's suspicions that existing methods would not perform well in the free-living setting [1].

Hills et al. developed a time series classification method using shapelets to produce an alternative representation of time series signals where the new features are distances from each of k shapelets [9].

In 2015, Baydogan et al. made use of the dependency structure within time series to develop a representation and similarity measure which they validated on a wide variety of time series domains [2].

In 2017, David Hallac et al. developed a method that segments and clusters time series data using structural network relations rather than spatial distance to encode different groupings of time series segments [8].

In 2018, Stanislas Chambon et al.
developed a deep learning approach for sleep stage classification that learns end-to-end without computing spectrograms or extracting handcrafted features [5].

In 2019, Fazle Karim et al. proposed transforming the existing univariate time series classification models, LSTM-FCN and ALSTM-FCN, into a multivariate time series classification model by augmenting the fully convolutional block with a squeeze-and-excitation block to further improve accuracy [10].

VII. DISCUSSION
To resolve the lack of classical structure in the time series data, in this paper we chose to bridge the gap between the variable-length, empirically-chosen features for physical activity data and a latently-learned fixed-length representation, or summary. Our method,
SummerTime, leverages an existing feature construction and, through clustering of disjoint time windows, establishes a summary of the time series as a whole. The features of the summarization were extracted from the data in an unsupervised manner through Gaussian mixtures using a variational Bayesian approach. This allows our method to zero in on the number of representative features of the summarization, naturally performing feature selection. This clustering provides us with an informative fixed-length feature vector which contains a global description of the signal for a single activity bout. From this, we are able to perform classical machine learning methods on the summaries instead of the original instances. We show that this summarization is sufficient for problems like physical activity type classification and out-performs classification on each time window independently with majority voting. We then show that
SummerTime can augment energy expenditure predictions per window by including with each window's original features the summarization of the bout. In a result we find particularly notable,
SummerTime manages to achieve small, comparable error for both Walking and Running type activities, two activity types which normally have significant cross-class error. Overall,
SummerTime demonstrates low classification and regression error, robustness, and effectiveness on an imbalanced data set. We hope to further demonstrate the strengths of
SummerTime in more time series domains in the future.

VIII. ACKNOWLEDGEMENTS

[Content removed in compliance with Triple Blind. Will be added back in after the Triple Blind period.]
REFERENCES

[1] Thomas Bastian, Aurélia Maire, Julien Dugas, Abbas Ataya, Clément Villars, Florence Gris, Emilie Perrin, Yanis Caritu, Maeva Doron, Stéphane Blanc, Pierre Jallon, and Chantal Simon. Automatic identification of physical activity types and sedentary behaviors from triaxial accelerometer: laboratory-based calibrations are not enough. Journal of Applied Physiology, 118(6):716-722, 2015.
[2] Mustafa Gokce Baydogan and George Runger. Time series representation and similarity based on local autopatterns. Data Mining and Knowledge Discovery, 30(2):476-509, 2016.
[3] Christopher M. Bishop. Mixture models and EM. In Pattern Recognition and Machine Learning, chapter 9. Springer, 2006.
[4] Christopher M. Bishop. Neural networks. In Pattern Recognition and Machine Learning, chapter 5. Springer, 2006.
[5] Stanislas Chambon, Mathieu N. Galtier, Pierrick J. Arnal, Gilles Wainrib, and Alexandre Gramfort. A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26(4):758-769, 2018.
[6] Scott E. Crouter, Kurt G. Clowers, and David R. Bassett. A novel method for using accelerometer data to predict energy expenditure. Journal of Applied Physiology, 100(4):1324-1331, 2006.
[7] Patty S. Freedson, Kate Lyden, Sarah Kozey-Keadle, and John Staudenmayer. Evaluation of artificial neural network algorithms for predicting METs and activity type from accelerometer data: validation on an independent sample. Journal of Applied Physiology, 111(6):1804-1812, 2011.
[8] David Hallac, Sagar Vare, Stephen Boyd, and Jure Leskovec. Toeplitz inverse covariance-based clustering of multivariate time series data. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17, pages 215-223, New York, NY, USA, 2017. ACM.
[9] Jon Hills, Jason Lines, Edgaras Baranauskas, James Mapp, and Anthony Bagnall. Classification of time series by shapelet transformation. Data Mining and Knowledge Discovery, 28(4):851-881, 2014.
[10] Fazle Karim, Somshubra Majumdar, Houshang Darabi, and Samuel Harford. Multivariate LSTM-FCNs for time series classification. Neural Networks, 116:237-245, 2019.
[11] MATLAB. Version R2018a. The MathWorks Inc., Natick, Massachusetts, 2018.
[12] Y. Mu, H. Z. Lo, W. Ding, K. Amaral, and S. E. Crouter. BiPart: Learning block structure for activity detection. IEEE Transactions on Knowledge and Data Engineering, 26(10):2397-2409, Oct 2014.
[13] Nikolaos Nasios and Adrian G. Bors. Variational learning for Gaussian mixture models. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 36(4):849-862, 2006.
[14] M. Chen. Pattern Recognition and Machine Learning Toolbox, 2016. MATLAB File Exchange.
[15] François Petitjean, Germain Forestier, Geoffrey I. Webb, Ann E. Nicholson, Yanping Chen, and Eamonn Keogh. Dynamic time warping averaging of time series allows faster and more accurate classification. In Data Mining (ICDM), 2014 IEEE International Conference on, pages 470-479. IEEE, 2014.
[16] J. Staudenmayer, D. Pober, S. Crouter, D. Bassett, and P. Freedson. An artificial neural network to estimate physical activity energy expenditure and identify physical activity type from an accelerometer. Journal of Applied Physiology, 107(4):1300-1307, Oct 2009.
[17] S. G. Trost, W. K. Wong, K. A. Pfeiffer, and Y. Zheng. Artificial neural networks to predict activity type and energy expenditure in youth. Medicine & Science in Sports & Exercise, 44(9):1801-1809, Sep 2012.
[18] Wikipedia contributors. Root-mean-square deviation. Wikipedia, The Free Encyclopedia, 2018.
[19] Lexiang Ye and Eamonn Keogh. Time series shapelets: a new primitive for data mining. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 947-956, 2009.