Machine Learning-based Classification of Active Walking Tasks in Older Adults using fNIRS


Dongning Ma, Meltem Izzetoglu, Roee Holtzer, and Xun Jiao

Abstract: Decline in gait features is common in older adults and is an indicator of disability and mortality. Cortical control of gait, specifically in the prefrontal cortex as measured by functional near infrared spectroscopy (fNIRS), during dual-task walking has been shown to be moderated by age, gender, cognitive status, and various age-related disease conditions. In this study, we develop classification models using machine learning methods to classify active walking tasks in older adults based on fNIRS signals into either Single-Task-Walk (STW) or Dual-Task-Walk (DTW) conditions. The fNIRS measurements included oxyhemoglobin (HbO2) and deoxyhemoglobin (Hb) signals obtained from the prefrontal cortex (PFC) of subjects performing on-the-ground active walking tasks with or without a secondary cognitive task. We extract the fNIRS-related features by calculating the minimum, maximum, mean, skewness and kurtosis values of the Hb and HbO2 signals. We then use feature encoding to map the values into binary space. Using these features, we apply and evaluate various machine learning methods including logistic regression (LR), decision tree (DT), support vector machine (SVM), k-nearest neighbors (kNN), multilayer perceptron (MLP), and random forest (RF). Results showed that the machine learning models can achieve around 97% classification accuracy.

Index Terms: functional near infrared spectroscopy, machine learning, aging, active walking

I. INTRODUCTION

Mobility impairments are common in healthy aging as well as in age-related disease conditions such as mild cognitive impairments and dementia [1]–[7]. Limitations in walking, specifically decline in gait speed, are associated with various adverse outcomes including higher rates of morbidity, loss of independence and mortality [8], [9]. Hence, impairments in locomotion can affect individuals and their families detrimentally and pose a major public health challenge to society [9], [10].

Identifying mechanisms of mobility impairments is of vital importance in developing risk assessment and intervention procedures to ameliorate mobility decline and disability in aging populations. Motor control models of locomotion and robust associations between structural changes in frontal and subcortical brain regions and mobility outcomes have been established [11]–[13]. Even though converging evidence suggests a role for cognitive processes, specifically the executive functions, in explaining mobility performance and decline in older adults [14], [15], studies on the real-time assessment of functional neural correlates of simple and attention-demanding locomotion tasks are scarce. This gap could be in part due to the requirements of subject immobility and supine positioning in traditional neuroimaging modalities during scanning procedures, which make functional imaging of real, on-the-ground walking unattainable.

Recent studies have increasingly utilized an emerging neuroimaging modality, namely functional near infrared spectroscopy (fNIRS), to assess cortical control and functional networks of mobility in aging populations [1]–[7], [16]–[25]. fNIRS is an optics-based noninvasive, safe, portable, and wearable neuroimaging technique [26]–[30]. It can monitor relative changes in oxygenated hemoglobin (HbO2) and deoxygenated hemoglobin (Hb) associated with cognitive activity using the light–tissue interaction properties of light within the near infrared range (650–950 nm) [31]–[34].
fNIRS has been widely applied for the monitoring of functional activity in executive function, attention, memory, motor, visual, auditory and language domains and has been well validated against traditional neuroimaging methods [26]–[30]. Since it is less prone to movement artifacts and allows imaging of brain function in upright and mobile conditions, it is also well suited for the study of cognitive control of mobility in aging during active walking tasks [16]–[25].

While the tasks used in the investigation of functional brain mechanisms of mobility using fNIRS technology vary across studies, the most commonly implemented ones involve walking under single- and dual-task conditions [1]–[7], [16]–[25]. The well-validated dual-task walking (DTW) paradigm has been used to determine the effect of increased demands on attentional resources on gait performance, which has emerged as a key risk factor for incident frailty, disability and mortality [35]. Specifically, prior studies have found reproducible and statistically significant increases in HbO2 values, as measured by fNIRS from the PFC, in DTW as compared to STW, due to the greater cognitive demands that are inherent in the former walking condition [16]–[25]. Furthermore, it was found that cortical responses to task demands, specifically in the DTW condition, were moderated by age [17], gender and stress [18], fatigue level [19], medication use [21], and disease status including diabetes [22], multiple sclerosis (MS) [23], mild cognitive impairments [24], and neurological gait abnormalities [7].

Even though a growing number of studies that utilized fNIRS measures during DTW paradigms in older adults have repeatedly shown via statistical comparisons that hemodynamic biomarkers from the PFC can discriminate between various walking task conditions and disease populations, their automatic classification using machine learning algorithms has not yet been studied.
Automatic detection of attention-demanding versus simple walking tasks using discriminative hemodynamic features extracted from HbO2 and Hb not only can provide information on an individual’s use of his/her attentional resources but can also lead to diagnosis, monitoring and classification of different age-related disease conditions. fNIRS measures have previously been used in the classification of a wide range of tasks and disease populations in different age groups, such as in applications for monitoring of mental workload, motor imagery, auditory and visual perception, various brain-computer interfaces, pain assessment, anesthesia monitoring, attention deficit and hyperactivity disorder (ADHD) diagnosis, cognitive decline in traumatic brain injury, and diagnosis of various mental illnesses such as schizophrenia [35]–[43]. There are a few studies on the classification of intentions for the initiation or stopping of walking, its step size and speed, primarily for gait rehabilitation applications to control assistive devices such as prostheses or exoskeletons. Such studies primarily monitored motor areas and investigated classification of intentions or preparations for different types of gait in healthy young adults, where classification accuracy was found to be in the ∼80% range [44]–[47]. In this small number of prior works, fNIRS measures from the PFC during simple and attentionally demanding dual-task walking conditions, which are indicative of different cognitive states and disease conditions in elderly populations, were not studied with machine learning models and algorithms.

In this study, our aim is to achieve automatic classification between active walking under simple (STW) and more cognitively taxing (DTW) conditions in older adults using fNIRS measures from the PFC with high accuracy. We used our previously collected fNIRS data set from community-residing healthy older adults (n = 451) while they were performing STW and DTW tasks [6]. We extracted features from HbO2 and Hb signals such as maximum, minimum, mean, skewness and kurtosis, and used them with and without gender information and the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) outcome in various machine learning models to accurately classify the two walking activities. Among the evaluated machine learning algorithms, including logistic regression (LR), support vector machine (SVM), k-nearest neighbors (kNN), multilayer perceptron (MLP) and random forest (RF), our findings indicated that LR generated the highest accuracy (97%) when fNIRS features together with gender and RBANS scores are used. To the best of our knowledge, we are the first to apply machine learning methods to fNIRS-based walking task classification in older adults, achieving high accuracy.

This paper is organized as follows: In Section II, we introduce the participants and our task protocol. In Section III, we explain our proposed methods in detail. We present the results of our comprehensive experiments in Section IV and, finally, we provide concluding remarks and suggestions for future work in Section V.

II. PARTICIPANTS AND TASK PROTOCOL

A. Participants

The study involved a total of n = 451 community-dwelling older adults in lower Westchester County, NY, of age ≥ ± ± ±

B. Task Protocol

The task protocol used in this study involved two single-task conditions and one dual-task condition presented in a counterbalanced order using a Latin-square design to minimize task order effects on the outcome measures. The single-task conditions were 1) single-task walking (STW) and 2) the cognitive interference task (Alpha). In the STW condition, participants were asked to walk at their “normal pace” around a 4 × 14 foot electronic walkway (Zenometrics system with Zeno electronic walkway using ProtoKinetics Movement Analysis Software (PKMAS), Zenometrics, LLC; Peekskill, NY). In the Alpha condition, participants were asked to stand still on the electronic walkway while reciting alternate letters of the alphabet out loud (A, C, E, ...) for 30 seconds. In the dual-task walking (DTW) condition, participants were required to perform the two single tasks at the same time by walking around the walkway at their normal pace while reciting alternate letters of the alphabet. Participants were specifically asked to pay equal attention to both the walking and cognitive interference tasks to minimize task prioritization effects. In both STW and DTW conditions, participants were asked to walk on the instrumented walkway in three continuous loops that consisted of six straight walks and five left-sided turns. The duration of each task condition varied depending on the individual’s walking speed. Reliability and validity for this walking paradigm have been well established [50].
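As an illustration of the counterbalancing described above, the following minimal sketch builds one cyclic 3 × 3 Latin square over the three task conditions. The text does not state which Latin square was used or how participants were assigned to its rows, so the cyclic construction and the hypothetical `order_for_participant` rule are assumptions made purely for illustration.

```python
def latin_square_orders(conditions):
    """Return a cyclic Latin square: row i is the condition list rotated left by i."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]


CONDITIONS = ["STW", "Alpha", "DTW"]
ORDERS = latin_square_orders(CONDITIONS)


def order_for_participant(index):
    """Hypothetical assignment rule (an assumption): cycle through the rows."""
    return ORDERS[index % len(ORDERS)]


for row in ORDERS:
    print(row)
```

Every condition appears exactly once in each row and in each position, which is the property that balances task-order effects across participants.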

III. METHODS

An overview of the proposed methods utilized in this work is illustrated in Fig. 1. We have four major steps: data collection, data pre-processing, feature extraction and applying machine learning models:

• Data Collection: In data collection, participants were asked to complete the task protocol as instructed, during which their hemodynamic activations were collected using fNIRS. In addition, we also collected subject-related data (gender and RBANS) of the participants.

• Data Pre-processing: In data pre-processing, we apply different methods such as visual inspection, wavelet denoising, hemodynamic data conversion, and spline and low-pass filtering to obtain the HbO2 and Hb data of participants in the time domain for different task conditions.

• Feature Extraction: We extract features by calculating the maximum, minimum, mean, kurtosis and skewness values of the Hb and HbO2 signals in different task conditions. The values are averaged separately over the left and right hemispheres of the brain, corresponding to channels 1-8 and channels 9-16, respectively. We then combined those fNIRS-related hemodynamic data with the subject-related data of gender and RBANS scores to build the feature vectors used in the machine learning models.

• Applying Machine Learning: We applied various machine learning algorithms to establish machine learning models using the scikit-learn framework [51]. We fine-tuned the models by considering different hyperparameters and configurations, and analyzed the trade-off between classification accuracy and computational efficiency.

A. Data Collection

fNIRS System: We utilized the fNIRS Imager 1100 (fNIRS Devices, LLC, Potomac, MD) in this study to collect the hemodynamic activations in the PFC while participants were performing the task protocol [6], [26], [27], [52]. In this fNIRS device, the sensor consists of 4 LED light sources and 10 photodetectors configured as shown in Fig. 2(a), where each source-detector separation is set to 2.5 cm. The light sources on the sensor (Epitex Inc. type L4X730/4X805/4X850-40Q96-I) contain three built-in LEDs having peak wavelengths at 730, 805, and 850 nm, with an overall outer diameter of 9.2 ±

B. Data Pre-processing

First, visual inspection was performed on individual data from all voxels to identify and eliminate the ones with saturation, dark current conditions or extreme noise. Then, to eliminate spiky noise, wavelet denoising with the Daubechies 5 (db5) wavelet was applied to the raw intensity measurements at 730 and 850 nm wavelengths as proposed in [54] and widely applied in fNIRS studies [55]. The artifact-removed raw intensity measurements were then converted to changes in HbO2 and Hb using the modified Beer-Lambert law (MBLL) [25], [27], [34]. In the MBLL, previously published values for the conversion parameters, i.e., wavelength- and chromophore-dependent molar extinction coefficients (ε) and the age- and wavelength-adjusted differential pathlength factor (DPF), were used [25], [27], [56]. Finally, we applied spline filtering [57] followed by a finite impulse response low-pass filter with a cut-off frequency of 0.08 Hz [25], [58] to the HbO2 and Hb data separately to remove possible baseline shifts and to suppress physiological artifacts such as respiration and Mayer waves.

Data epochs corresponding to each task condition (STW, Alpha and DTW) were extracted to be used in further processing for feature extraction and machine learning model generation for automatic activity classification. fNIRS data acquisition and the electronic walkway system for gait analysis were synchronized using a central “hub” computer with E-Prime 2.0 software, where time stamps of the start and end points for each baseline and task condition were marked and recorded [16]–[25]. In order to correctly extract the data epochs during the exact walking task execution periods, a second-level time synchronization method was implemented. The HbO2 and Hb data epochs corresponding to the time interval between the first recorded foot contact with the walkway and the end of the 6th and final straight walk, algorithmically determined by PKMAS as previously described in [7], were extracted for the STW and DTW conditions. Finally, proximal 10-second baselines administered prior to each experimental task were used to determine the relative task-related changes in the extracted HbO2 and Hb data epochs for each task condition using the previously described baseline correction method (subtracting the average value of the proximal baseline data from the following task epoch data) [16]–[25].

As our prior studies had suggested that DTW and STW are the most pronounced differentiating task conditions in aging and age-related diseases [6], [7], [16]–[25], we only used HbO2 and Hb data epochs in the DTW and STW tasks in further feature extraction and machine learning model development to automatically classify these two tasks in this work.
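The baseline correction step described above (subtracting the average of the proximal baseline from the task epoch) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the sample values are made up, and the 2 Hz sampling rate is taken from the feature-extraction description in Section III-C.

```python
from statistics import fmean

SAMPLING_RATE_HZ = 2    # sampling rate stated in Section III-C
BASELINE_SECONDS = 10   # length of the proximal baseline stated in the text


def baseline_correct(baseline, task_epoch):
    """Subtract the mean of the proximal baseline from every task-epoch sample."""
    offset = fmean(baseline)
    return [sample - offset for sample in task_epoch]


# Made-up values for illustration only: a flat 10-second baseline (20 samples)
# followed by a short task epoch.
baseline = [0.5] * (SAMPLING_RATE_HZ * BASELINE_SECONDS)
task_epoch = [0.5, 0.75, 1.0, 0.25]
corrected = baseline_correct(baseline, task_epoch)
print(corrected)
```

The same correction would be applied independently to the HbO2 and Hb epochs of each channel and task condition.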

Fig. 1: Overview of the four steps: data collection (task protocol, fNIRS data, subject data), data pre-processing (visual inspection, noise elimination, MBLL data conversion, spline filter, low-pass filter), feature extraction (min, max, mean, skewness and kurtosis of Hb and HbO2 for the left and right hemispheres, combined with subject-related data into feature vectors, with the STW/DTW conditions as labels), and applying machine learning (train/test split, model training, evaluation of classification accuracy).

Fig. 2: fNIRS system: (a) the sensor pad; (b) sensor placement on the forehead with 16 voxel locations.

C. Feature Extraction

fNIRS-related features: The fNIRS-related features used in this study were extracted from the Hb and HbO2 data in the STW and DTW conditions. The data had different time lengths for each walking task and subject because normal pace, and hence task completion time, differed between subjects and task conditions. For consistency, we chose to use all of the time samples within the first 60 seconds of both the Hb and HbO2 data, which corresponds to at most 120 samples since the sampling rate was 2 Hz. Over all 16 channels of the fNIRS sensor pad, the total number of data points, namely the dimension of the fNIRS-related features, is 16 (channels) * 120 (samples) * 2 (Hb and HbO2) = 3840 at maximum, which is impractically high for machine learning. Therefore, to reduce dimensionality, we calculate statistical values (maximum, minimum, mean, skewness and kurtosis) as the features for each channel. We further split the 16 channels into the left hemisphere (channels 1-8) and the right hemisphere (channels 9-16) and calculate the average statistical values as the features for each hemisphere. Thus the dimension of the fNIRS-related features is reduced to 2 (hemispheres) * 5 (statistical features) * 2 (Hb and HbO2) = 20. This way, issues such as different data lengths due to varying walking speeds across individuals and missing channel recordings due to noise elimination were also resolved.

In addition, we observe that fNIRS features vary drastically across subjects: from Fig. 3 we can observe that, although HbO2 generally has a higher average and maximum value than Hb, both of them show a very similar normal distribution pattern. Such a distribution indicates that the features we extract from fNIRS data are strongly subject-dependent and thus are not suitable to feed directly into a machine learning model. Therefore, to eliminate biases from individual subjects, we further perform feature encoding within each subject. For each subject, we first collect the 20 features from STW and DTW, respectively. Then we compare each feature from STW with the corresponding feature from DTW. The feature with the higher value is set to 1 and the feature with the lower value is set to 0. Therefore, by feature encoding, we map a 20-dimension floating-point array into a 20-dimension binarized array based on the inequality between the two task conditions.

Fig. 3: Histogram of average Hb and HbO2 levels of different subjects under STW and DTW: (a) STW Hb; (b) STW HbO2; (c) DTW Hb; (d) DTW HbO2.

Subject-related features: In addition to the fNIRS-related features, we also considered other subject-related features such as gender (G) and neuropsychological status (S) based on RBANS, since they were previously found to be moderators of DTW performance [18]. These features were added as parallel columns alongside the fNIRS-related features to enhance classification accuracy.

Feature Vector and Label: In summary, we established feature vectors to be used in machine learning algorithms composed of different fNIRS-related features (maximum, minimum, mean, skewness and kurtosis on the left and right hemispheres) and subject-related features (G and S) as shown in Eq. 1. The corresponding features are described in detail in Table I. We used various machine learning models on these multiple fNIRS- and subject-related feature combinations to evaluate their performance in the classification of STW and DTW conditions, which will be discussed in Section IV. The labels were the corresponding task conditions (either STW or DTW) obtained from the task protocol during the data collection.

\[
\begin{bmatrix} G & S & \mathrm{HbL} & \mathrm{HbR} & \mathrm{HbO2L} & \mathrm{HbO2R} \end{bmatrix} \tag{1}
\]
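The feature extraction and within-subject encoding described in this section can be sketched in plain Python as below. This is a sketch under stated assumptions rather than the authors' implementation: all function names are hypothetical, skewness and kurtosis are computed from population moments with excess kurtosis (the text does not say which definition was used), a constant signal is given skewness and kurtosis of 0, and ties between STW and DTW are encoded as 0 for both conditions.

```python
from statistics import fmean

STATS = ("max", "min", "mean", "skewness", "kurtosis")


def channel_stats(signal):
    """Five summary statistics of one channel (moment definitions are assumed)."""
    mu = fmean(signal)
    m2 = fmean((x - mu) ** 2 for x in signal)
    m3 = fmean((x - mu) ** 3 for x in signal)
    m4 = fmean((x - mu) ** 4 for x in signal)
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0  # excess kurtosis
    return {"max": max(signal), "min": min(signal), "mean": mu,
            "skewness": skewness, "kurtosis": kurtosis}


def hemisphere_features(channels):
    """Average each statistic over the available channels of one hemisphere."""
    per_channel = [channel_stats(signal) for signal in channels]
    return [fmean(stats[name] for stats in per_channel) for name in STATS]


def subject_features(hb_left, hb_right, hbo2_left, hbo2_right):
    """Concatenate HbL, HbR, HbO2L and HbO2R features: 4 groups * 5 stats = 20."""
    features = []
    for group in (hb_left, hb_right, hbo2_left, hbo2_right):
        features.extend(hemisphere_features(group))
    return features


def encode_pair(stw, dtw):
    """Binary encoding within one subject: 1 marks the condition with the higher value."""
    stw_bits = [1 if s > d else 0 for s, d in zip(stw, dtw)]
    dtw_bits = [1 if d > s else 0 for s, d in zip(stw, dtw)]
    return stw_bits, dtw_bits


def feature_vector(gender, rbans, bits):
    """Assemble [G, S, fNIRS bits] in the order of Eq. 1."""
    return [gender, rbans] + list(bits)


# Made-up three-feature example: the last pair is a tie.
stw_bits, dtw_bits = encode_pair([0.2, 1.5, 0.7], [0.9, 1.1, 0.7])
print(stw_bits, dtw_bits)
```

Because each hemisphere is summarized by an average over whichever channels survived noise elimination, a missing channel simply drops out of the list passed to `hemisphere_features`.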

D. Machine Learning

We applied multiple machine learning algorithms, as listed in Table II, such as decision tree (DT), support vector machine (SVM), k-nearest neighbors (kNN), multilayer perceptron (MLP) and random forest (RF). We first used DT and kNN for their success and advantages in bioinformatics-related applications and their fast training and classification times [59], [60]. A DT classifier behaves like a flow chart with different layers, where each node applies a criterion on certain attributes (features) of the input data to classify them into different classes. kNN classifies an input by checking the classes of its k nearest neighbors. In our case, if the majority of the neighbors belong to STW, then the input is classified as STW, otherwise as DTW. For kNN, we tried two different values of k, 5 and 10, to observe their classification accuracy. However, because biomedical data can often be multi-modal and high-dimensional, DT and kNN suffer from disadvantages such as overgrown decision trees and high variation in the nearest neighbors, which could significantly impair model performance.

We further implemented the SVM classifier, known for handling high-dimensional data, which is common in biomedical applications. SVM tries to find a (hyper-)plane to divide (classify) the input data into categories. However, a significant drawback of SVM is that it requires quite a long time to train with a large volume of data, making it cost-ineffective. Recently, neural networks have been gaining growing attention in various types of classification tasks; therefore, we also tried the MLP classifier. A typical MLP consists of one input layer to take input features, one output layer for predicting classes, and one or more hidden layers with a specific number of nodes. Since network architecture has a drastic impact on classification accuracy, we implemented three MLP structures: two MLPs with one hidden layer of 10 and 50 nodes, respectively, and one MLP with two hidden layers of 10 nodes each. The major drawback of MLP is the model size needed to achieve competitive accuracy: to gain high accuracy, in general, more sophisticated network structures or more nodes need to be used, which reduces training and inference efficiency.

As an ensemble classifier built from decision trees, we further used RF for the classification task. An RF represents a “forest” of decision trees. Different decision trees inside the RF give different class predictions for the same input. The RF then uses a majority vote over these predictions to decide the final output class. RF considers the relations between input features and can prevent the issue of a single DT growing overly deep with large variance, yet RF requires higher computational complexity and more time to train than DT. We tested RFs of different structures with 5, 10, and 25 trees.

We adopted scikit-learn, a Python-based machine learning platform that supports various types of machine learning algorithms, as our machine learning framework [51]. The configurations and hyperparameters we used for these algorithms are shown in Table II. We randomly split the data into training and testing sets in a ratio of 80% to 20%. We used the training data to establish the model and then used the testing data to evaluate the model in terms of classification accuracy under each configuration. During classification, we also recorded the time taken to classify 1000 samples for each machine learning algorithm and configuration, to evaluate the computational cost. The machine learning experiments were executed on a computer with an Intel Xeon Silver 4110 1-core 2-thread CPU and 32 GB of memory.
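To make the 80%/20% split and the kNN majority vote described above concrete, here is a dependency-free sketch. It is not the scikit-learn code used in the study: the helper names are hypothetical, the data are a made-up toy set of perfectly separable binary vectors, and Hamming distance is an assumed choice (on purely binary vectors it ranks neighbors the same way as Euclidean distance).

```python
import random


def train_test_split(rows, labels, test_ratio=0.2, seed=0):
    """Randomly split rows and labels into (train, test) with the given test ratio."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_ratio)

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(indices[n_test:]), pick(indices[:n_test])


def hamming(a, b):
    """Number of positions where two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=5):
    """STW if a strict majority of the k nearest neighbors is STW, otherwise DTW."""
    order = sorted(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    nearest = order[:k]
    stw_votes = sum(train_y[i] == "STW" for i in nearest)
    return "STW" if stw_votes > len(nearest) / 2 else "DTW"


def accuracy(train_x, train_y, test_x, test_y, k=5):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Toy data (made up): 50 all-zero STW vectors and 50 all-one DTW vectors.
rows = [[0] * 20 for _ in range(50)] + [[1] * 20 for _ in range(50)]
labels = ["STW"] * 50 + ["DTW"] * 50
(train_x, train_y), (test_x, test_y) = train_test_split(rows, labels)
acc = accuracy(train_x, train_y, test_x, test_y, k=5)
print(len(train_x), len(test_x), acc)
```

The "otherwise DTW" branch mirrors the rule stated in the text, so an even split of votes (possible with k = 10) resolves to DTW in this sketch.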

IV. EXPERIMENTAL RESULTS

In this section, we present the experimental results on the machine learning model for the classification of active walking tasks in older adults. We evaluate the impact of different feature combinations as well as different machine learning algorithms on classification accuracy and computational efficiency.

TABLE I: Features used for machine learning. The 5 dimensions of each fNIRS-related feature are maximum, minimum, mean, skewness and kurtosis.

Group            Feature  Description                                                         Dimension  Data Type
Subject-related  RBANS    Repeatable Battery for the Assessment of Neuropsychological Status  1          integer
Subject-related  Gen      Gender (Male / Female)                                              1          binary
fNIRS-related    HbL      Features of Deoxygenated Hemoglobin from Left Hemisphere            5          binary
fNIRS-related    HbR      Features of Deoxygenated Hemoglobin from Right Hemisphere           5          binary
fNIRS-related    HbO2L    Features of Oxygenated Hemoglobin from Left Hemisphere              5          binary
fNIRS-related    HbO2R    Features of Oxygenated Hemoglobin from Right Hemisphere             5          binary

TABLE II: Machine learning algorithms and their configurations.

ML Algorithm                  Configuration
logistic regression (LR)
decision tree (DT)
random forest (RF)            n trees = 5
                              n trees = 10
                              n trees = 25
support vector machine (SVM)
k-nearest neighbors (kNN)     k = 5
                              k = 10
multilayer perceptron (MLP)   n nodes = 10
                              n nodes = 10, 10
                              n nodes = 50

Parameters absent from this table are default from Scikit-learn.

A. Classification Accuracy Using Different Features

Features can impact machine learning model performance considerably. To understand the impact of different features, both fNIRS-related and subject-related, we conducted a comprehensive series of ablation experiments to examine the model performance in terms of accuracy using different feature combinations. Here, we chose logistic regression as the machine learning algorithm in all comparisons.

Table III shows the classification accuracy of different fNIRS-related feature combinations. With all the features included, the accuracy reaches its highest value of 96.68%. First, by removing the maximum, minimum and mean features, the accuracy degrades to 80.74%, which indicates that those are critical features used by the machine learning algorithm. Next, by removing the kurtosis and skewness features from the overall feature set, the accuracy drops slightly to 95.9%. These two experiments reveal that although classification performance using solely the kurtosis and skewness features is inferior, they can work in synergy with the other features to improve accuracy.

We also investigate the relative importance of Hb and HbO2. Using only Hb or HbO2 features, the accuracy drops to 97.56% and further to 95.9% or 95.51% when we remove the kurtosis and skewness features. Based on these experiments, we observe no significant difference in importance between Hb and HbO2, and using both achieves the highest accuracy.
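One way to set up the feature subsets for such ablation experiments is to build a boolean mask over the 20 fNIRS-related columns, as sketched below. The column order (HbL, HbR, HbO2L, HbO2R, each holding the five statistics in a fixed order) and the helper names are assumptions made for illustration; the source does not specify how the columns were laid out.

```python
CHROMOPHORES = ("Hb", "HbO2")
HEMISPHERES = ("L", "R")
STATS = ("max", "min", "mean", "skewness", "kurtosis")

# Assumed column order of the 20 fNIRS-related features.
COLUMNS = [(c, h, s) for c in CHROMOPHORES for h in HEMISPHERES for s in STATS]


def ablation_mask(keep_chromophores=CHROMOPHORES, keep_stats=STATS):
    """True for every column whose chromophore and statistic are both kept."""
    return [c in keep_chromophores and s in keep_stats for c, _, s in COLUMNS]


def apply_mask(vector, mask):
    """Drop the columns of one feature vector that the mask excludes."""
    return [value for value, keep in zip(vector, mask) if keep]


# Example subsets mirroring the kinds of rows reported in Table III.
only_shape_stats = ablation_mask(keep_stats=("skewness", "kurtosis"))
only_hb_basic = ablation_mask(keep_chromophores=("Hb",),
                              keep_stats=("max", "min", "mean"))
print(sum(only_shape_stats), sum(only_hb_basic))
```

Each masked dataset would then be passed through the same train/test procedure so that accuracy differences can be attributed to the removed features.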

B. Classification Accuracy Using Different Algorithms

In machine learning, in addition to the selected features, the types of algorithms and their configurations can also strongly affect model classification performance. As introduced in Section III, to search for the best machine learning method and configuration for fNIRS-based active walking classification, we conducted a comprehensive series of experiments with the various machine learning algorithms and configurations shown in Table II. We present the classification accuracy and the time consumed for classifying 1000 samples using different machine learning algorithms and configurations in Table IV.

We can observe that for all the examined algorithms, the classification accuracy is above 95%. Algorithms such as random forest (RFC), support vector machine (SVM), k-nearest neighbors (kNN) and decision tree (DT) produce mediocre accuracy from 95.9% to 97.54%. Logistic regression (LR) and multilayer perceptron (MLP) with 10 or 50 hidden nodes achieve the highest accuracy of 96.68%.

kNN (k=1) is the fastest among all the examined algorithms, with less than 5 ms train time and 1.05 ms test time; however, its accuracy of 95.9% is the lowest of all algorithms. Of the two most accurate algorithms, LR and MLP, LR is around 70X faster in training and 2X faster in testing than MLP. This is because neural networks like MLP have a considerable number of parameters to train and require more iterations to obtain adequate accuracy.
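The speed comparison between LR and MLP can be checked with a few lines of arithmetic on the train and test times listed in Table IV. The dictionary below copies only the LR and MLP rows of that table; the helper name is hypothetical.

```python
# (train time, test time) in milliseconds, copied from the LR and MLP rows of Table IV.
TIMES_MS = {
    "LR": (7.4, 1.1),
    "MLP 10": (415.57, 1.87),
    "MLP 10, 10": (595.48, 1.99),
    "MLP 50": (627.94, 2.1),
}


def slowdown_relative_to(baseline, other):
    """Return (train ratio, test ratio) of `other` relative to `baseline`."""
    base_train, base_test = TIMES_MS[baseline]
    other_train, other_test = TIMES_MS[other]
    return other_train / base_train, other_test / base_test


for name in ("MLP 10", "MLP 10, 10", "MLP 50"):
    train_ratio, test_ratio = slowdown_relative_to("LR", name)
    print(f"{name}: {train_ratio:.1f}x train, {test_ratio:.1f}x test")
```

With these values the training ratios fall between roughly 56 and 85 and the testing ratios stay just under 2, which brackets the "around 70X" and "2X" figures quoted in the text.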

C. Classification Accuracy Under Data Reduction

One major challenge of using machine learning in fNIRS-related applications is the “Small-N” problem. It means thesamples, or the size of datasets available is extremely limited.To examine the performance of our model under limiteddataset, we perform two types of experiments regarding datasetreduction.First, we reduce the number of subjects used in training.Without dataset reduction, we have a total number of 451subjects by default. We record the classiﬁcation of the LRmodel using only 25%, 50% and 75% of them. We can observefrom Fig. 4(a) that, using reduced size of training datasetwill cause degradation in accuracy. Slight reduction in datasetsize, e.g., from 100% to 75% does not impact the accuracysigniﬁcantly. However, as the reduction grows, the accuracydegrades exponentially from around 97% to 73%.Second, we reduce the time length of fNIRS-related signalsused in the feature extraction. By default, we use 60 seconds

TABLE III : Ablation experiments: Classiﬁcation accuracy with different features included.

Hb HbO2Max Min Mean Kurtosis Skewness Max Min Mean Kurtosis Skewness Accuracy (%)X X X X X X X X X X 96.68X X X X 75.73X X X X X X 94.39X X X X X 96.13X X X X X 96.13X X X 94.39X X X 94.13

of fNIRS signals to extract features for the machine learning model. In the fNIRS experiments, most subjects completed the task in around 30–40 seconds for STW and 35–50 seconds for DTW, but the completion time varied considerably, from 20 seconds to 100 seconds, as shown in Fig. 5. Therefore, we evaluated the accuracy of the machine learning model when up to 30, 45, 60 and 90 seconds of data are considered, as shown in Fig. 4(b).

First, we observe that the machine learning model performs worse when shorter signals are used: the accuracy drops from around 97% to 81% when the time length is reduced from 60 seconds to 30 seconds. This outcome may suggest that the discriminative information in the selected fNIRS features between the two task conditions is diminished when shorter data segments are used.

However, a longer time length does not guarantee a higher accuracy. With 90 seconds of fNIRS signal, which is 30 seconds longer than the default 60 seconds, the accuracy does not increase further. One possible explanation is that too few participants had recordings longer than 60 seconds to make a significant difference to the already very high accuracy obtained when up to 60 seconds of data are used. This counter-intuitive observation may also indicate that the useful features of the fNIRS signal do not extend over the entire recording, since increasing the time length used for feature extraction from 60 seconds to 90 seconds does not yield a higher classification accuracy. Based on these data reduction experiments, the most useful features appear to lie within roughly the first 60 seconds.

TABLE IV: Classification accuracy and time using different machine learning algorithms and configurations

Algorithm     Accuracy (%)   Train Time (ms)   Test Time (ms)
LR            96.68          7.4               1.1
DT            95.45          2.87              1.04
RFC 5         95.87          9.35              1.86
RFC 10        96.13          14.02             2.27
RFC 25        96.39          31.95             3.62
SVM           96.00          6.59              2.3
kNN 5         95.09          8.65              1.25
kNN 1         92.73          4.95              1.05
MLP 10        95.45          415.57            1.87
MLP 10, 10    96.00          595.48            1.99
MLP 50        96.13          627.94            2.1
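The three measured quantities in Table IV (accuracy, train time, test time) can be obtained with a small timing harness. The following is only a minimal sketch under stated assumptions, not the setup used in this study: the `KNN` class is a hand-written pure-Python stand-in for a library classifier, the `benchmark` and `make_split` helpers are hypothetical names, the data are synthetic two-cluster points rather than fNIRS features, and reading "kNN 1" and "kNN 5" as the number of neighbors k is an assumption.

```python
import math
import random
import time


class KNN:
    """Tiny k-nearest-neighbors classifier (illustrative stand-in only)."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        # "Training" a kNN model only stores the labelled samples.
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for q in X:
            # Sort stored samples by Euclidean distance to the query point.
            nearest = sorted(
                (math.dist(q, p), label) for p, label in zip(self.X, self.y)
            )[: self.k]
            votes = [label for _, label in nearest]
            preds.append(max(set(votes), key=votes.count))  # majority vote
        return preds


def benchmark(model, X_train, y_train, X_test, y_test):
    """Return (accuracy in %, train time in ms, test time in ms)."""
    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    t1 = time.perf_counter()
    preds = model.predict(X_test)
    t2 = time.perf_counter()
    correct = sum(p == t for p, t in zip(preds, y_test))
    return 100.0 * correct / len(y_test), (t1 - t0) * 1e3, (t2 - t1) * 1e3


def make_split(n_per_class):
    """Synthetic data: label 0 and label 1 are two Gaussian clusters."""
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 5.0)):
        for _ in range(n_per_class):
            X.append([random.gauss(centre, 1.0) for _ in range(4)])
            y.append(label)
    return X, y


random.seed(0)
X_train, y_train = make_split(100)
X_test, y_test = make_split(30)

results = {
    f"kNN {k}": benchmark(KNN(k), X_train, y_train, X_test, y_test)
    for k in (1, 5)
}
for name, (acc, train_ms, test_ms) in results.items():
    print(f"{name:6s} acc={acc:6.2f}%  train={train_ms:8.3f} ms  test={test_ms:8.3f} ms")
```

Any object exposing `fit` and `predict` can be passed to `benchmark`, so the same harness would apply to library implementations of the other algorithms in the table. Absolute timings depend on hardware and implementation and are not comparable with the values reported above.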

Fig. 4: Classification accuracy under data reduction. (a) Dataset size reduction: accuracy (%) versus dataset size (25%, 50%, 75%, 100%). (b) Time length reduction: accuracy (%) versus time length (30, 45, 60, 90 sec).

Fig. 5: Histogram of time lengths for task completion, showing the number of subjects versus time length on task completion (sec). (a) STW. (b) DTW.
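The time length reduction in Fig. 4(b) amounts to keeping at most the first T seconds of each recording before computing the summary features (minimum, maximum, mean, skewness and kurtosis of the Hb and HbO2 signals). The sketch below illustrates that step under assumptions that are not taken from the study: the function names are hypothetical, the sampling rate `fs_hz` is a placeholder value, the signals are synthetic, and skewness and kurtosis are computed from population moments with kurtosis reported as excess kurtosis.

```python
import math


def truncate(signal, fs_hz, max_seconds):
    """Keep at most the first max_seconds of a signal sampled at fs_hz."""
    return signal[: int(round(fs_hz * max_seconds))]


def summary_features(signal):
    """Return [min, max, mean, skewness, excess kurtosis] of a segment."""
    n = len(signal)
    mean = sum(signal) / n
    # Central moments (population form, dividing by n).
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurt = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
    return [min(signal), max(signal), mean, skew, kurt]


def features_up_to(hbo2, hb, fs_hz, max_seconds):
    """Concatenate the five features of HbO2 and Hb over the first max_seconds."""
    return (summary_features(truncate(hbo2, fs_hz, max_seconds))
            + summary_features(truncate(hb, fs_hz, max_seconds)))


# Synthetic 50-second recording at an assumed 2 Hz sampling rate.
fs_hz = 2.0
hbo2 = [math.sin(0.1 * i) for i in range(100)]
hb = [0.5 * math.cos(0.1 * i) for i in range(100)]

for seconds in (30, 45, 60, 90):
    vec = features_up_to(hbo2, hb, fs_hz, seconds)
    used = len(truncate(hbo2, fs_hz, seconds)) / fs_hz
    print(f"up to {seconds:2d} s -> {used:4.1f} s used, {len(vec)} features")
```

Because slicing never extends a signal, a recording shorter than the cutoff is used in full. This matches the "up to" wording above and is one reason a 90-second cutoff may change little when few recordings exceed 60 seconds.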

V. CONCLUSION

In this study, we applied, for the first time, machine learning methods to extracted fNIRS-based hemodynamic features, together with gender and cognitive status information, for the classification of walking tasks in older adults. We extracted useful feature representations such as the maximum, minimum, mean, kurtosis and skewness values, on which we trained machine learning models using various algorithms including logistic regression, random forest and neural networks. We compared various feature combinations, machine learning models and hyperparameter configurations in terms of classification accuracy and computational efficiency to select the best model. Our machine learning model showed high performance on the classification of active walking tasks in older adults, with an accuracy of around 97% using logistic regression over fNIRS-related features combined with the subject-related features of gender and RBANS. Having shown that automatic classification of active walking tasks in older adults can be achieved with appropriate machine learning models and discriminative features, our future work will involve classifying different age-related disease populations and healthy controls using the methods studied here.