On using the Microsoft Kinect™ sensors in the analysis of human motion
M.J. Malinowski (a), E. Matsinos (a,*), S. Roth (b)

(a) Institute of Mechatronic Systems, School of Engineering, Zurich University of Applied Sciences (ZHAW), Technikumstrasse 5, CH-8401 Winterthur, Switzerland
(b) Institute of Applied Information Technology, School of Engineering, Zurich University of Applied Sciences (ZHAW), Steinberggasse 13, CH-8401 Winterthur, Switzerland
(*) E-mail: evangelos[DOT]matsinos[AT]zhaw[DOT]ch, evangelos[DOT]matsinos[AT]sunrise[DOT]ch
Abstract
The present paper aims at providing the theoretical background required for investigating the use of the Microsoft Kinect™ ('Kinect', for short) sensors (original and upgraded) in the analysis of human motion. Our methodology is developed in such a way that its application be easily adaptable to comparative studies of other systems used in capturing human-motion data. Our future plans include the application of this methodology to two situations: first, in a comparative study of the performance of the two Kinect sensors; second, in pursuing their validation on the basis of comparisons with a marker-based system (MBS). One important feature in our approach is the transformation of the MBS output into Kinect-output format, thus enabling the analysis of the measurements, obtained from different systems, with the same software application, i.e., the one we use in the analysis of Kinect-captured data; one example of such a transformation, for one popular marker-placement scheme ('Plug-in Gait'), is detailed. We propose that the similarity of the output, obtained from the different systems, be assessed on the basis of the comparison of a number of waveforms, representing the variation within the gait cycle of quantities which are commonly used in the modelling of the human motion. The data acquisition may involve commercially-available treadmills and a number of velocity settings: for instance, walking-motion data may be acquired at 5 km/h, running-motion data at 8 and 11 km/h. We recommend that particular attention be called to systematic effects associated with the subject's knee and lower leg, as well as to the ability of the Kinect sensors in reliably capturing the details in the asymmetry of the motion for the left and right parts of the human body. The previous versions of the study have been withdrawn due to the use of a non-representative database.

Key words: Biomechanics, motion analysis, treadmill, marker-based system, Kinect
Interestingly, the authors also made a point regarding the depth measurements of the Kinect sensor, which are subject to increasing uncertainty with increasing distance from the sensor, reaching a maximal value of about 4 cm at the most distal position of the sensor, which (according to the specifications) should not exceed about 4 m.
• Cavagna and Kaneko [6] studied the efficiency of motion in terms of the mechanical work done by the subject's muscles.
• Cavanagh and Lafortune [7] studied the ground reaction forces in running at about 16 km/h.
• Cairns, Burdett, Pisciotta, and Simon [8] analysed the motion of ten competitive race-walkers in terms of the ankle flexion, of the knee and hip angles, as well as of the pelvic tilt, obliquity, and rotation. The work discussed the main differences between walking and race-walking, and provided explanations for the peculiarity of the motion in the latter case, invoking the goal of achieving higher velocities (than in normal walking) while maintaining double support with a fully-extended knee and suppressing the vertical undulations of the subject's centre of mass (CM).
• Õunpuu [9] discussed important aspects of the biomechanics of gait, including the variation of relevant physical quantities within the gait cycle. That work may be used as a starting point for those in search of an overview of the topic. It must be borne in mind that the subjects used in Ref. [9] were children.
• Novacheck [10] also provided an introduction to the biomechanics of motion. Figs. 5 and 6 of that work contain the variation of the important angles (projections on the coronal, sagittal, and transverse planes) within the gait cycle, at three velocities: 4.32 km/h (walking), 11.52 km/h (running), and 14.04 km/h (sprinting). Fig. 9 therein provides the variation of the joint moments and powers (kinesics) in the sagittal plane within the gait cycle.
• In a subsequent article [11], Schache, Bennell, Blanch, and Wrigley investigated the inter-relations in the movement of the lumbar spine, pelvis, and hips in running, aiming at optimising the rehabilitation process in case of relevant injuries.

It is rather surprising that only one study, addressing the possibility of involving Kinect in the analysis of walking and running motion (i.e., not in a static mode or in slow motion), has appeared so far [12]. Using a methodology similar to the one proposed herein, the authors of that study came to the conclusion that the original sensor is unsuitable for applications requiring high precision; after analysing preliminary data, we have come to the same conclusion. Of course, it remains to be seen whether any improvement (in the overall quality of the output) can be obtained with the upgraded sensor.

Our aim in the present paper is to develop the theoretical background required for the comparison of the output of two measurement systems used (or intended to be used) in the analysis of human motion; we will give all important definitions and outline meaningful tests. Although this methodology has been developed for a direct application in the case of the Kinect sensors, other applications may use this scheme of ideas in order to obtain suitable solutions in other cases. The tests we propose in Section 5 should be sufficient to identify the important differences in the output of two such measurement systems. As such, they should (in a comparative study) pinpoint the essential differences in the performance of the two Kinect sensors or (if the second measurement system is an MBS) enable the validation of the Kinect sensors.

The material in the present paper has been organised as follows.
In Section 2, the output of the two Kinect sensors is described; subsequently, the output, obtained with one popular marker-placement scheme from an MBS, is detailed. A scheme of association of these two outputs is developed. The definitions of important quantities, used in the description of the motion, are given in Section 3. Section 4 describes one possibility for the data acquisition; in the second part of this section, we explain how one may extract characteristic forms (waveforms) from the motion data, representative of the subject's motion within one gait cycle. In Section 5, we outline our proposal for the necessary tests, to be performed on the waveforms obtained in the previous section. The last part contains a short summary of the paper and outlines two directions in future research.
To enable the analysis of the data with the same software application, the MBS output, obtained for the specific marker-placement scheme described in Subsection 2.2, will be transformed into Kinect-output format, using reasonable associations between the Kinect nodes and the marker locations; owing to the removal of the constant offsets in the data analysis (see Subsection 4.2.3), an exact matching between the Kinect nodes and the locations at which these markers are placed is not essential.
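A minimal sketch of such a transformation, assuming the association scheme detailed in Subsection 2.2 (marker names are those of the Plug-in Gait set; the knee, ankle, and hip equivalents require the corrections of Ref. [15] and are therefore omitted here):

```python
# Sketch of the marker-to-node association scheme of Subsection 2.2.
# The argument maps Plug-in-Gait marker names to (x, y, z) tuples.
def _mid(a, b):
    return tuple((u + v) / 2.0 for u, v in zip(a, b))

def _mean(*pts):
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def mbs_to_kinect(m):
    return {
        "HEAD": _mid(m["LFHD"], m["RFHD"]),
        "SHOULDER_CENTER": m["CLAV"],
        "SPINE": _mean(m["T10"], m["LPSI"], m["RPSI"]),
        "SHOULDER_LEFT": m["LSHO"], "SHOULDER_RIGHT": m["RSHO"],
        "ELBOW_LEFT": m["LELB"], "ELBOW_RIGHT": m["RELB"],
        "WRIST_LEFT": _mid(m["LWRA"], m["LWRB"]),
        "WRIST_RIGHT": _mid(m["RWRA"], m["RWRB"]),
        "HAND_LEFT": m["LFIN"], "HAND_RIGHT": m["RFIN"],
        "FOOT_LEFT": m["LTOE"], "FOOT_RIGHT": m["RTOE"],
    }
```

The node names follow the naming convention of the original Kinect sensor; the hip-related nodes, which involve the pelvic model of Ref. [15], would be appended to the returned dictionary after that correction has been applied.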
In the original Kinect sensor, the skeletal data ('stick figure') of the output comprises 20 time series of three-dimensional (3D) vectors of spatial coordinates, i.e., measurements of the (x, y, z) coordinates of the 20 nodes which the sensor associates with the axial and appendicular parts of the human skeleton. In coronal (frontal) view of the subject (sensor view), the Kinect coordinate system is defined with the x axis (medial-lateral) pointing to the left (i.e., to the right part of the body of the subject being viewed), the y axis (vertical) upwards, and the z axis (anterior-posterior) away from the sensor, see Fig. 1. The nodes 1 to 4 are main-body nodes, identified as HIP CENTER, SPINE, SHOULDER CENTER, and HEAD. The nodes 5 to 8 relate to the left arm: SHOULDER LEFT, ELBOW LEFT, WRIST LEFT, and HAND LEFT; similarly, the nodes 9 to 12 on the right arm are: SHOULDER RIGHT, ELBOW RIGHT, WRIST RIGHT, and HAND RIGHT. The eight remaining nodes pertain to the legs: the first four to the left (HIP LEFT, KNEE LEFT, ANKLE LEFT, and FOOT LEFT), the remaining four to the right (HIP RIGHT, KNEE RIGHT, ANKLE RIGHT, and FOOT RIGHT) leg of the subject. The nodes of the original sensor may be seen in Fig. 2.

In the upgraded Kinect sensor, some modifications have been made in the naming (and placement) of some of the nodes. The original node HIP CENTER has been replaced by SPINE BASE (and appears slightly shifted downwards); the original node SPINE has been replaced by SPINE MID (and appears slightly shifted upwards); finally, the original node SHOULDER CENTER has been replaced by NECK (and also appears slightly shifted upwards).
Five new nodes have been appended at the end of the list (which was a good idea, as this action enables easy adaptation of the analysis code processing the Kinect output): one of them is a body node (SPINE SHOULDER, node 21), whereas four nodes pertain to the subject's hands, HAND TIP LEFT (22), THUMB LEFT (23), HAND TIP RIGHT (24), and THUMB RIGHT (25). Evidently, emphasis in the upgraded sensor is placed on the orientation of the subject's hands (i.e., on gesturing). (The subject's left and right parts refer to what the subject perceives as the left and right parts of his/her body.)

In both versions, parallel to the captured video image, Kinect acquires an infrared image, generated by the infrared emitter (seen on the left of the original sensor in Fig. 1); captured with a CCD camera, this infrared image provides the means of extracting information on the depth z. The sampling rate in the Kinect output (for the video and the skeletal data, for both versions of the sensor) is 30 Hz.

The description of the algorithm, used in the determination of the 3D positions of the skeletal joints of the subject being viewed by the original sensor, may be found in Ref. [13]. Candidate values for the 3D positions of each skeletal joint are obtained via the elaborate analysis of each depth image separately. These positions may be used as starting points in an analysis featuring the temporal and kinematic coherence in the subject's motion; it is not clear whether such a procedure has been hardcoded in the preprocessing (hardware processing) of the captured data. Shotton et al. define 31 body segments covering the human body, some of which are used in order to localise skeletal joints, some to fill the gaps or yield predictions for other joints. In the development of their algorithm, Shotton et al. generated static depth images of humans (of children and adults) in a variety of poses (synthetic data). The application of their method results in the extraction of probability-distribution maps for the 3D positions of the skeletal joints; their joint proposals represent the modes (maxima) in these maps. According to the authors, the probability-distribution maps are both accurate and stable, even without the imposition of temporal or kinematic constraints. It must be borne in mind that the '3D positions of the joints' of Ref. [13] are surface estimates, pushed back into the body by a depth offset (ζ_c) of 39 mm (see end of Section 3 of Ref. [13]). Although the 'computational efficiency and robustness' of the procedure are praised in Ref. [13], it remains to be seen whether results of similar quality can be obtained in dynamic applications (e.g., when the subject is in motion).

Featuring several cameras, viewing the subject from different directions, MBSs provide powerful object-tracking solutions, yielding high-quality, low-latency data, at frame rates exceeding that of the Kinect sensors. Such systems reliably reconstruct the time series of the spatial coordinates of markers (reflective balls, flat markers, active markers, etc.) directly attached to the subject's body or to special attire worn by the subject. One popular placement scheme of the markers, known as 'Plug-in Gait' [14], uses a total of 39 markers (see Table 1). The MBS output for these markers may be transformed into Kinect-output format (for simplicity, we refer to the naming of the nodes in the original Kinect sensor) by using the following association scheme.
• The Kinect-equivalent HEAD is assigned to the midpoint of the marker positions LFHD and RFHD. The marker positions LBHD and RBHD, pertaining to the back of the head, are not used.
• The Kinect-equivalent SHOULDER CENTER is taken to be the marker position CLAV. The marker positions C7 and RBAK, which are placed on the back part of the body, are not used in comparisons with data acquired with the original Kinect sensor; in the upgraded Kinect sensor, SPINE SHOULDER may be identified with C7.
• The Kinect-equivalent SPINE is estimated as an average of the marker positions T10, LPSI, and RPSI.
• The Kinect-equivalent SHOULDER LEFT and SHOULDER RIGHT are taken to be the marker positions LSHO and RSHO, respectively. Regarding the upper part of the body, the marker positions LUPA, LFRA, RUPA, and RFRA are not used.
• The Kinect-equivalent ELBOW LEFT and ELBOW RIGHT are taken to be the marker positions LELB and RELB, respectively.
• The Kinect-equivalent WRIST LEFT and WRIST RIGHT are assigned to the midpoints of the marker positions LWRA and LWRB, and of RWRA and RWRB, respectively.
• The Kinect-equivalent HAND LEFT and HAND RIGHT are taken to be the marker positions LFIN and RFIN, respectively.
• The Kinect-equivalent KNEE LEFT and KNEE RIGHT are taken to be the corrected (according to Ref. [15]) marker positions LKNE and RKNE, respectively.
• The Kinect-equivalent ANKLE LEFT and ANKLE RIGHT are taken to be the corrected (according to Ref. [15]) marker positions LANK and RANK, respectively.
• The Kinect-equivalent FOOT LEFT and FOOT RIGHT are taken to be the marker positions LTOE and RTOE, respectively.
• The Kinect-equivalent HIP LEFT and HIP RIGHT positions are evaluated from the marker positions LASI, RASI, LPSI, and RPSI, according to Ref. [15]. Regarding the procedure set forth in that paper, a few comments are due. The positions of the hips are obtained therein using a model for the geometry of the pelvis, featuring three parameters (θ, β, and C), the values of which had been obtained from a statistical analysis of radiographic data of 25 subjects; however, the values of these parameters are poorly known (see page 583 of Ref. [15]). A simple analysis of the uncertainties given in Ref. [15] shows that, when following that method, the resulting uncertainties in the estimation of the positions of the hips are expected to exceed about 10 mm in each spatial direction. As a result, the positions of the hips, calculated from the MBS output according to that procedure, should not be considered as accurate as the rest of the information obtained from the MBS.
More importantly, it is not evident how the movement of the pelvis reflects itself in the motion of the four markers which are used in the extraction of its position and orientation; it is arguable whether any markers, placed on the surface of the human body, can capture the pelvic motion accurately.
• The Kinect-equivalent HIP CENTER is estimated as an average of the Kinect-equivalent HIP LEFT and HIP RIGHT, and of the marker position STRN.
• Regarding the lower part of the body, the marker positions LTHI, LTIB, LHEE, RTHI, RTIB, and RHEE are not used.

In regard to the markers placed on the human extremities, it must be borne in mind that their positions are also affected by rotations, not only by the translational motion of these extremities; the markers are placed at some distance from the actual rotation axes, which coincide with the longest dimension of the upper- and lower-extremity bones. For instance, rotating the left humerus by 90° around its long axis (assumed, for the sake of the argument, to align with the vertical axis y) will result in a movement of the marker LELB along a circular arc, thus affecting its x and z coordinates. On the other hand, the Kinect nodes are placed on (or, in any case, closer to) the rotation axes; as a result, they are expected to be less affected by such rotations. As such effects cannot be easily accounted for, it is evident that the association scheme, proposed in the present section, can only lead to an approximate comparison of the output of the two measurement systems.

3 Definitions and scoring options for assessing the similarity of waveforms
We will next describe how one may obtain estimates of three important angles in the sagittal plane, representing the level of flexion of the trunk, of the hip, and of the knee. Estimates for the left and right parts of the body will be obtained for the hip and knee angles.
• Trunk angle. This angle is obtained from the (y, z) coordinates of four points, comprising the nodes 1 (HIP CENTER) and 3 (SHOULDER CENTER), and two midpoints, namely of the nodes 13 (HIP LEFT) and 17 (HIP RIGHT), and of the nodes 5 (SHOULDER LEFT) and 9 (SHOULDER RIGHT). An unweighted least-squares fit on the (y, z) coordinates of these four points yields the slope α (with respect to the y axis) of the optimal straight line. The trunk angle is defined as θ_T = −arctan(α); θ_T = 0° in the upright position, positive for forward leaning.
• Hip angle. Two definitions of the hip angle have appeared in the literature: the angle may be defined with respect to the trunk or to the y axis; in the present paper, we adopt the latter definition. If the relevant hip coordinates are (y_H, z_H) and those of the knee are (y_K, z_K), the hip angle is obtained via the expression:

θ_H = arctan[(z_H − z_K)/(y_H − y_K)].  (1)

Two hip angles will be obtained: the left-hip angle θ_HL uses the nodes 13 (HIP LEFT) and 14 (KNEE LEFT); the right-hip angle θ_HR uses the nodes 17 (HIP RIGHT) and 18 (KNEE RIGHT).
• Knee angle. This is the angle between the femur (thigh) and the tibia (shank). Two definitions of the knee angle have appeared in the literature: the knee angle may be 180° or 0° in the extended position of the knee; we adopt the latter definition.
It will shortly become clear why we make use of both the sine and the cosine of the knee angle:

α ≡ sin(θ_K) = [(y_A − y_K)(z_K − z_H) − (y_K − y_H)(z_A − z_K)] / (L_f L_t)  (2)

and

β ≡ cos(θ_K) = [(y_K − y_H)(y_A − y_K) + (z_K − z_H)(z_A − z_K)] / (L_f L_t),  (3)

where the relevant ankle coordinates are (y_A, z_A), and L_f and L_t are the projected lengths of the femur and the tibia onto the sagittal plane, respectively:

L_f = sqrt[(y_K − y_H)² + (z_K − z_H)²] and L_t = sqrt[(y_A − y_K)² + (z_A − z_K)²].

We define the knee angle as:

θ_K = arccos(β), for α ≥ 0; θ_K = −arccos(β), otherwise.  (4)

(As it is not clear at which depth (and on which basis) the original sensor places node 2 (SPINE), this node should not be included in estimations involving the z coordinate.)

Two knee angles will be obtained: the left-knee angle θ_KL uses the nodes 13 (HIP LEFT), 14 (KNEE LEFT), and 15 (ANKLE LEFT); the right-knee angle θ_KR uses the nodes 17 (HIP RIGHT), 18 (KNEE RIGHT), and 19 (ANKLE RIGHT).

We define four angles in the coronal plane: the lateral trunk, the lateral hip, the lateral knee, and the lateral pelvic angles; the lateral pelvic angle is also called pelvic obliquity. Estimates for the left and right parts of the body will be obtained for the lateral hip and lateral knee angles.
• Lateral trunk angle. The same four points, which had been used in the evaluation of the trunk angle in the sagittal plane, are also used in extracting an estimate of the lateral trunk angle; of course, the (x, y) coordinates of these points must be used now. In addition to these nodes, node 2 (SPINE) may also be used. The lateral trunk angle is defined with respect to the y axis; θ_lT = 0° in the upright position, positive for tilting in the positive x direction (tilt of the subject to his/her right).
• Lateral hip angle. This angle describes hip abduction/adduction in the coronal plane.
Similarly to the hip angle in the sagittal plane, two definitions of the lateral hip angle are possible: the angle may be defined with respect to the trunk or to the y axis; herein, we adopt the latter definition. If the relevant hip coordinates are (x_H, y_H) and those of the knee are (x_K, y_K), the lateral hip angle is obtained via the expression:

θ_lH = −arctan[(x_H − x_K)/(y_H − y_K)].  (5)

Two lateral hip angles will be obtained: the lateral left-hip angle θ_lHL uses the nodes 13 (HIP LEFT) and 14 (KNEE LEFT); the lateral right-hip angle θ_lHR uses the nodes 17 (HIP RIGHT) and 18 (KNEE RIGHT).
• Lateral knee angle. This is the projection of the angle between the femur and the tibia onto the coronal plane:

θ_lK = arcsin{[(x_K − x_H)(y_A − y_K) − (x_A − x_K)(y_K − y_H)] / (L_f L_t)},  (6)

where L_f and L_t are now redefined as the projected lengths of the femur and the tibia onto the coronal plane, respectively:

L_f = sqrt[(x_K − x_H)² + (y_K − y_H)²] and L_t = sqrt[(x_A − x_K)² + (y_A − y_K)²].

The angle is defined positive when, with respect to the femur direction, the ankle appears (in coronal view) 'further away' from the subject's body. Of course, two lateral knee angles may be defined, corresponding to the left and right parts of the human body, θ_lKL and θ_lKR, respectively.
• Pelvic obliquity. This angle is defined as:

θ_lP = arctan[(y_HR − y_HL)/(x_HR − x_HL)],  (7)

where (x_HL, y_HL) and (x_HR, y_HR) are the (x, y) coordinates of the left and right hips, respectively.

In regard to motion analysis, a few additional angles may be found in the literature: the pelvic tilt and the angle describing the plantarflexion/dorsiflexion of the foot are defined in the sagittal plane; the hip, pelvic, and foot rotations in the transverse plane. We do not believe that the Kinect output can yield reliable (if any) information on these quantities.
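As a concrete sketch of the sagittal-plane definitions, Eq. (1) for the hip and Eqs. (2)-(4) for the knee (assuming (y, z) coordinates already expressed in the untilted frame; the sign handling follows the convention that hyperextension comes out negative):

```python
import math

# Hip angle, Eq. (1): inclination of the femur with respect to the y axis.
def hip_angle(yH, zH, yK, zK):
    return math.atan((zH - zK) / (yH - yK))

# Knee angle, Eqs. (2)-(4): the cosine (beta) fixes the magnitude,
# the sine (alpha) the sign.
def knee_angle(yH, zH, yK, zK, yA, zA):
    Lf = math.hypot(yK - yH, zK - zH)  # projected femur length
    Lt = math.hypot(yA - yK, zA - zK)  # projected tibia length
    alpha = ((yA - yK) * (zK - zH) - (yK - yH) * (zA - zK)) / (Lf * Lt)
    beta = ((yK - yH) * (yA - yK) + (zK - zH) * (zA - zK)) / (Lf * Lt)
    beta = max(-1.0, min(1.0, beta))   # guard against rounding
    return math.acos(beta) if alpha >= 0.0 else -math.acos(beta)
```

For a fully-extended leg (hip, knee, and ankle collinear) the knee angle vanishes; a 90° flexion yields π/2.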
The knee angle, obtained from the 3D vectors (x_K − x_H, y_K − y_H, z_K − z_H) and (x_A − x_K, y_A − y_K, z_A − z_K), will be called 'knee angle in 3D'; it is easily evaluated using expressions analogous to Eqs. (2)-(4). In view of the fact that the angle between 3D vectors is invariant under rotations (SO(3) rotation group) and translations in 3D, the knee angle in 3D is independent of the details regarding the alignment between the relevant coordinate systems (e.g., between the Kinect sensor and the MBS coordinate systems).

Two last comments are due.
(1) The trunk angle θ_T is positive in walking and running; it is difficult to maintain balance if one leans backwards while moving forwards. However, the trunk angle, obtained from the Kinect output, is frequently negative. This is due to the fact that the nodes of the Kinect output, which enter the evaluation of θ_T, do not represent locations on the spine.
(2) Due to the properties of the knee joint, the knee angle is expected to satisfy the condition θ_K ≥ 0. In practice, even in the fully-extended position, θ_K remains (for many subjects) positive; knee hyperextension is a deformity. However, owing to the placement of the nodes by Kinect, the knee angle (estimated from the Kinect output) may occasionally come out negative. To examine further such cases, we retain Eq. (4) in the evaluation of the knee angle.

One possibility to avoid these effects is to extract robust measures for the selected physical quantities from the data. For instance, one could use the variation of these quantities within the gait cycle or even their range of motion (RoM), i.e., the difference between the maximal and minimal values within the gait cycle. As long as an extremity moves as one rigid object, such measures (being differences of two values) are not affected by a constant bias which may be present in the data.

We propose that the similarity of corresponding waveforms (representing the variation of a quantity within the gait cycle, see Subsection 4.2.3) be judged on the basis of one (or more) of the following scoring options: Pearson's correlation coefficient, the Zilliacus error metric, the RMS error metric, Whang's score, and Theil's score. Assuming that a (0-centred) waveform from measurement system 1 (e.g., from one of the Kinect sensors) is denoted by k_i and the corresponding (0-centred) waveform from measurement system 2 (e.g., from the MBS) by q_i, these five scoring options are defined in Eqs. (8)-(12) (for details on the original works, see Ref. [16]); all sums are taken from i = 1 to N, where N stands for the number of bins used in the histograms yielding these waveforms.
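In code, the five scoring options of Eqs. (8)-(12) might read as follows (a sketch; k and q are the two 0-centred waveforms, already binned to a common length N):

```python
import math

# The five scoring options of Eqs. (8)-(12) for two 0-centred waveforms.
def scores(k, q):
    abs_diff = sum(abs(ki - qi) for ki, qi in zip(k, q))
    sq_diff = sum((ki - qi) ** 2 for ki, qi in zip(k, q))
    sk = math.sqrt(sum(x * x for x in k))
    sq = math.sqrt(sum(x * x for x in q))
    r = sum(ki * qi for ki, qi in zip(k, q)) / (sk * sq)   # Eq. (8)
    d_z = abs_diff / sum(abs(x) for x in q)                # Eq. (9)
    d_rms = math.sqrt(sq_diff) / sq                        # Eq. (10)
    d_w = abs_diff / (sum(abs(x) for x in k) + sum(abs(x) for x in q))  # Eq. (11)
    d_t = math.sqrt(sq_diff) / (sk + sq)                   # Eq. (12)
    return r, d_z, d_rms, d_w, d_t
```

For identical waveforms the function returns r = 1 and four vanishing scores, in agreement with the limiting behaviour stated in the text.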
(In our analyses, we normally use N = 50.)

Pearson's correlation coefficient: r = Σ k_i q_i / [sqrt(Σ k_i²) sqrt(Σ q_i²)]  (8)

Zilliacus error metric: d_z = Σ |k_i − q_i| / Σ |q_i|  (9)

RMS error metric: d_rms = sqrt[Σ (k_i − q_i)²] / sqrt(Σ q_i²)  (10)

Whang's score: d_w = Σ |k_i − q_i| / (Σ |k_i| + Σ |q_i|)  (11)

Theil's score: d_t = sqrt[Σ (k_i − q_i)²] / [sqrt(Σ k_i²) + sqrt(Σ q_i²)]  (12)

In case of identical waveforms (from the two measurement systems), r = 1; all other scores vanish (d_z = d_rms = d_w = d_t = 0). Evidently, Whang's score is the symmetrised version of the Zilliacus error metric, whereas Theil's score is the symmetrised version of the RMS error metric. Although the differences between the Zilliacus and the RMS error metric are generally small (as are those between Whang's and Theil's scores), we make use of all aforementioned scoring options in our research programme.

Other ways for testing the similarity of the output of different measurement systems have been put forth. For instance, some authors favour the use of the 'coefficient of multiple correlation' (CMC) [17,18,19,20]. Ferrari, Cutti, and Cappello [20] define the CMC as:

CMC = {1 − [Σ_{i=1}^{P} Σ_{j=1}^{W} Σ_{k=1}^{N} (w_ijk − w̄_.jk)² / (WN(P − 1))] / [Σ_{i=1}^{P} Σ_{j=1}^{W} Σ_{k=1}^{N} (w_ijk − w̄_.j.)² / (W(PN − 1))]}^{1/2},  (13)

where the triple array w_ijk contains the entire data, i.e., PW waveforms of dimension N (N depends on the gait cycle in Ref. [20]); P is the number of measurement systems being used in the study ('protocols', in the language of Ref. [20]) and W denotes the number of waveforms obtained within each measurement system. The averages w̄_.jk and w̄_.j. in Eq. (13) are defined as:

w̄_.jk = (1/P) Σ_{i=1}^{P} w_ijk,  (14)

w̄_.j. = (1/N) Σ_{k=1}^{N} w̄_.jk.  (15)

Unlike Pearson's correlation coefficient, 'directional information' for the association between the tested quantities is lost when using the CMC in an analysis. In its first definition [21], the CMC was bound between 0 and 1. However, the quantity CMC, obtained with Eq.
(13), is frequently imaginary (the ratio of the triple sums may be larger than 1); this is due to the use of w̄_.j., instead of the grand mean, along with the normalisation factor W(PN − 1), instead of WPN − 1.

Statistical tests may be carried out for Pearson's correlation coefficient ρ, where −1 < ρ < 1. The test when ρ = 0 involves the transformation:

t = r sqrt[(N − 2)/(1 − r²)].

The variable t is expected to follow the t-distribution (Student's distribution) with N − 2 degrees of freedom (DoF). The tests when ρ ≠ 0 involve Fisher's transformation; the details may be found in standard textbooks on Statistics. No tests are possible when ρ = 1, i.e., when attempting to judge the goodness of the association between waveforms, if ideally the waveforms should be identical. The only tests which can be carried out in such a case are those involving ρ = 0, i.e., investigating the presence of a statistically-significant correlation between the tested waveforms when the null hypothesis for no such effects is assumed to hold. In practice, the one-sided tests for N − 2 DoF yield the threshold values of r above which the correlation between the waveforms may be considered significant at a given confidence level.

A few alternatives to this scheme are possible. a) One possibility would be to employ a χ² function to assess the goodness of the association. The variability of the output across different sensors could also be assessed and this additional uncertainty could be taken into account in the tests. b) Another possibility would be to invoke analysis of variance (ANOVA), defining the reduced 'within-treatments' variation as

Ṽ_w = Σ_{i=1}^{P} Σ_{j=1}^{W} Σ_{k=1}^{N} (w_ijk − w̄_i.k)² / (PN(W − 1))  (16)

and the reduced 'between-treatments' variation as
1) DoF. The resulting p-value enables a decision on the ac-ceptance or rejection of the null hypothesis, i.e., of the observed effects beingdue to statistical fluctuation. c) A third possibility would be to histogram the difference of corresponding waveforms obtained from the two measurementsystems within the same gait cycle j ; the decision on whether the final wave-form is significantly different from 0 can be made on the basis of a numberof tests, including χ tests for the constancy and shape of the result of thehistogram. Nevertheless, to retain simplicity in the present paper, we have de-cided to make use in the data analysis of the simple scoring options introducedby Eqs.(8)-(12). 14 Data acquisition and analysis
The data acquisition may involve subjects walking and running on commercially-available treadmills. The placement of the treadmill must be such that the motion of the subjects be neither hindered nor influenced in any way by close-by objects. Prior to the data-acquisition sessions, the two measurement systems must be calibrated and the axes of their coordinate systems be aligned (spatial translations are insignificant). The measurement systems must then be left untouched throughout the data acquisition.

The original Kinect sensor also provides information on the elevation (pitch) angle at which it is set. During our extensive tests, we discovered that this information is not reliable, at least for the particular device we used in our experimentation. To enable the accurate determination of the elevation angle of the Kinect sensor, we set forth a simple procedure. The subject stands (in the upright position, not moving) at a number of positions on the treadmill belt, and static measurements (e.g., 5 s of Kinect data) at these positions are obtained and averaged. The elevation angle of the Kinect sensor may be easily obtained from the slope of the average (over a number of Kinect nodes, e.g., of those pertaining to the hips, knees, and ankles) (y, z) coordinates corresponding to these positions. The output data, obtained from the Kinect sensor, must be corrected (off-line) accordingly, to yield the appropriate spatial coordinates in the 'untilted' coordinate system. To prevent Kinect from re-adjusting the elevation angle during the data acquisition (which is a problematic feature), we attach its body onto a plastic structure mounted on a tripod.

It is worth mentioning that, as we are interested in capturing the motion of the subject's lower legs (i.e., of the ankle and foot nodes), the Kinect sensors must be placed at such a height that the number of lost lower-leg signals be kept reasonably small.
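The elevation-angle procedure described above might be sketched as follows (`points` holds the node-averaged (y, z) positions of the standing subject at the chosen belt positions; the sign and rotation conventions are our assumptions):

```python
import math

# Estimate the sensor's elevation (pitch) angle: points on a level floor
# should have constant y, so the fitted slope dy/dz gives the tilt.
def elevation_angle(points):
    n = len(points)
    ybar = sum(p[0] for p in points) / n
    zbar = sum(p[1] for p in points) / n
    slope = sum((z - zbar) * (y - ybar) for y, z in points) / \
            sum((p[1] - zbar) ** 2 for p in points)
    return math.atan(slope)

# Rotate a (y, z) pair into the untilted coordinate system.
def untilt(y, z, angle):
    c, s = math.cos(angle), math.sin(angle)
    return c * y - s * z, s * y + c * z
```

Each (y, z) coordinate in the output time series would be passed through `untilt` with the estimated angle before any further processing.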
Our past experience dictates that the Kinect sensor must be placed close to the minimal height recommended by the manufacturer, namely around 2 ft off the (treadmill-belt) floor. Placing the sensor higher (e.g., around the midpoint of the recommended interval, namely at 4 ft off the treadmill-belt floor) leads to many lost lower-leg signals (the ankle and foot nodes are not tracked), as the lower leg is not visible to the sensor during a sizeable fraction of the gait cycle, shortly after the toe-off (TO) instant.

The Kinect sensor may lose track of the lower parts of the subject's extremities (wrists, hands, ankles, and feet) for two reasons: either due to the particularity of the motion of the extremity in relation to the position of the sensor (e.g., the identification of the elbows, wrists, and hands becomes problematic in some postures, where the viewing angle of the ulnar bone by Kinect is small) or due to the fact that these parts of the human body are obstructed (behind the subject) for a fraction of the gait cycle. Assuming that these instances remain rare (e.g., below about 3% of the available data in each time series, namely one frame in 30), the missing values may be reliably obtained (interpolated) from the well-determined (tracked) data. Although, when normalised to the total number of the available values, the untracked signals usually appear 'harmless', as they represent a small fraction of the total amount of measurements, particular attention must be paid in order to ensure that no node be significantly affected, as in such a case the interpolation might not yield reliable results.

A few velocities may be used in the data acquisition: walking-motion data may be acquired at 5 km/h; running-motion data at 8 and 11 km/h. At each velocity setting, the subject must be given time (e.g., 1 min) to adjust his/her movements comfortably to the velocity of the treadmill belt.
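The gap-filling of rare untracked samples, mentioned above, can be sketched as follows (an untracked sample is represented by None, a convention we adopt purely for illustration):

```python
# Linear gap-filling of untracked samples: interior gaps are interpolated
# from the nearest tracked neighbours; edge gaps are held at the nearest
# tracked value.
def fill_gaps(series):
    vals = list(series)
    tracked = [i for i, v in enumerate(vals) if v is not None]
    if not tracked:
        raise ValueError("no tracked samples in the series")
    for i, v in enumerate(vals):
        if v is not None:
            continue
        left = max((j for j in tracked if j < i), default=None)
        right = min((j for j in tracked if j > i), default=None)
        if left is None:
            vals[i] = vals[right]
        elif right is None:
            vals[i] = vals[left]
        else:
            w = (i - left) / (right - left)
            vals[i] = (1 - w) * vals[left] + w * vals[right]
    return vals
```

The same routine would be applied independently to each coordinate of each affected node, after checking that the fraction of untracked samples remains below the quoted threshold.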
To obtain reliable waveforms from the Kinect-captured data, we recommend measurements spanning at least 2 min at each velocity.

The subject's motion is split into two components: the motion of the subject's CM and the motion of the subject's body parts relative to the CM. Of course, the accurate determination of the coordinates of the subject's physical CM from the Kinect or MBS output is not possible. As a result, the obtained CM should rather be considered to be one reference point, moving synchronously with the subject's physical CM. Ideally, these two points are related via a simple spatial translation (involving an unknown, yet constant 3D vector) at all times; if this condition is fulfilled, the obtained CM may be safely identified as the subject's physical CM, because a constant spatial separation between these two points does not affect the evaluation of the important quantities used in the modelling of the motion. At all time frames, we obtain the coordinates of the subject's CM from seven nodes, namely from the first three main-body nodes 1 to 3, from the shoulder nodes 5 and 9, as well as from the hip nodes 13 and 17. Being subject to considerable movement in walking and running motion, node 4 (HEAD) is not included in the determination of the coordinates of the subject's CM. Prior to further processing, the CM offsets (x_CM, y_CM, z_CM) are removed from the data; thus, the motion is defined relative to the subject's CM at all times. (The angles, defined in Subsection 3.1, involve differences of corresponding coordinates; as a result, they are not affected by the removal of the CM offsets from the data.) The largeness of the 'stray' motion of the subject may be assessed on the basis of the root-mean-square (rms) of the x_CM, y_CM, and z_CM distributions.

To investigate the stability of the motion over time, the data may be split into segments.
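The CM extraction and offset removal described above can be sketched as follows; the 0-based node indices are our mapping of the paper's node numbering (nodes 1-3, 5, 9, 13, 17), and the function names are ours:

```python
import numpy as np

# 0-based indices of the seven CM nodes quoted in the text:
# main-body nodes 1-3, shoulder nodes 5 and 9, hip nodes 13 and 17.
CM_NODES = [0, 1, 2, 4, 8, 12, 16]

def remove_cm(frames):
    """frames: array of shape (n_frames, n_nodes, 3), one (x, y, z)
    triplet per node. Returns the per-frame reference-CM trajectory and
    the node coordinates relative to that CM."""
    cm = frames[:, CM_NODES, :].mean(axis=1)
    return cm, frames - cm[:, None, :]

def stray_rms(cm):
    """Root-mean-square of the centred CM coordinates, one value per
    spatial direction; a measure of the subject's 'stray' motion."""
    c = cm - cm.mean(axis=0)
    return np.sqrt((c ** 2).mean(axis=0))
```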
In our data analysis, the duration of these segments may be chosen at will; up to the present time, we have made use of 10 and 12 s segments in the analysis of the Kinect-captured data. Within each of these segments, information which may be considered 'instantaneous' is obtained, thus enabling an examination of the 'stability' of the subject's motion at the specific velocity (see Subsection 4.2.2). The symmetry of the motion for the left and right parts of the human body may be investigated by comparing the corresponding waveforms. Finally, the largeness of the motion of the extremities may be examined on the basis of the RoMs obtained from these waveforms. We subsequently address some of these issues in somewhat more detail.

Ideally, the period of the gait cycle T is defined as the time lapse between successive time instants corresponding to identical postures of the human body (position and direction of motion of the human-body parts with respect to the CM). (Of course, the application of 'identicalness' in living organisms is illusional; no two postures can ever be expected to be identical in the formal sense.) We define the period of the gait cycle as the time lapse between successive most distal positions z of the same lower leg (i.e., of the ankle or of the ankle-foot midpoint). The arrays of time instants, at which the left or right lower leg is at its most distal position with respect to the subject's instantaneous CM, may be used in timing the waveforms corresponding to the left or right part of the human body.

The period of the gait cycle is related to two other quantities which are used in the analysis of motion data.
• The stride length L is the product of the velocity v and the period of the gait cycle: L = vT.
• The cadence C is defined as the number of steps per unit time; one commonly-used unit is the number of steps per min.
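The timing of the gait cycle, together with the two derived quantities above, can be sketched as follows. The function names are ours; we assume the sensor's nominal 30 frames per second, code the most distal positions as local maxima of the z coordinate (flip the sign of z if the depth axis runs the other way for a given setup), and use the fact that one gait cycle comprises two steps:

```python
import numpy as np

def cycle_starts(z, fps=30.0, min_period=0.4):
    """Frame indices at which the lower leg is at its most distal position
    (local maxima of the ankle z coordinate relative to the CM).
    min_period (s) suppresses spurious neighbouring maxima; 0.4 s is well
    below any plausible gait-cycle period at treadmill speeds."""
    gap = int(min_period * fps)
    peaks = [i for i in range(1, len(z) - 1) if z[i - 1] < z[i] >= z[i + 1]]
    starts = []
    for p in peaks:
        if not starts or p - starts[-1] >= gap:
            starts.append(p)
    return np.array(starts)

def periods(starts, fps=30.0):
    """Instantaneous gait-cycle periods T (s) between successive starts."""
    return np.diff(starts) / fps

def stride_length(v, T):
    """Stride length L = v * T, for belt velocity v."""
    return v * T

def cadence(T):
    """Cadence in steps per min (one gait cycle comprises two steps)."""
    return 120.0 / T
```

With this convention, a cadence of 180 steps per min corresponds to a gait-cycle period of 2/3 s.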
It has been argued (e.g., by Daniels [23]) that the minimal cadence in running motion should be (optimally) 180 steps per min, implying (as one gait cycle comprises two steps) a maximal period of the gait cycle of 2/3 s.

To examine the constancy of the period of the gait cycle throughout each session (according to our definition, each session involves one velocity), the values of the instantaneous period of the gait cycle are submitted to further analysis. The overall constancy is judged on the basis of a simple χ² test, assessing the goodness of the representation of the input data by one overall average value; the resulting p-value is obtained from the minimal χ² value.

Using the time-instant arrays from the analysis of the left and right lower-leg signals (as described in Subsection 4.2.1), each time series (pertaining to a specific node and spatial direction) is split into one-period segments, which are subsequently superimposed and averaged, to yield a representative movement for the node and spatial direction over the gait cycle. Finally, one average waveform for each node and spatial direction is obtained, representative of the motion at the particular velocity. The investigation of the asymmetry in the motion rests on the comparison of the waveforms obtained for corresponding left and right nodes and spatial directions.

Average waveforms for all nodes and spatial directions, representing the variation of the motion of that node (in 3D) within the gait cycle, are extracted separately for the left and right nodes of the extremities; waveforms are also extracted for the important angles introduced in Subsection 3.1. As mentioned in Subsection 4.2.1, the time instant at which the subject's left (right) lower leg is at its most distal position (with respect to the subject's CM) marks the start of each gait cycle (as well as the end of the previous one), suitable for the study of the left (right) part of the human body.
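The segmentation-and-averaging step just described can be sketched as follows (a minimal sketch; the 101-point gait-cycle fraction grid is our choice, not fixed by the text, and the function name is ours):

```python
import numpy as np

def average_waveform(series, starts, n_points=101):
    """Split one time series into one-period segments at the given
    cycle-start frames, resample each segment onto a common gait-cycle
    fraction grid (0 ... 1, n_points samples) and average, yielding one
    representative waveform for the node and spatial direction."""
    grid = np.linspace(0.0, 1.0, n_points)
    segments = []
    for a, b in zip(starts[:-1], starts[1:]):
        seg = series[a:b + 1]
        frac = np.linspace(0.0, 1.0, len(seg))
        segments.append(np.interp(grid, frac, seg))
    return np.mean(segments, axis=0)
```

The resampling onto a common fraction grid is what allows cycles of slightly different duration to be superimposed before averaging.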
In case left/right (L/R) information is not available (as, for example, for the trunk angle), the right lower leg may be used in the timing. All waveforms are subsequently zero-centred. The removal of the average offsets is necessary, given that the two measurement systems yield output which cannot be thought of as corresponding to the same anatomical locations. For instance, according to the 'Plug-in Gait' placement scheme, the markers for the shoulder are placed on top of the acromioclavicular joints; the Kinect nodes SHOULDER LEFT and SHOULDER RIGHT match better the physical locations of the shoulder joints.

The left and right waveforms yield two new waveforms, identified as the 'L/R average' (LRA) and the 'right-minus-left difference' (RLD); if emphasis is placed on the extraction of asymmetrical features in the motion from the Kinect output, the validation of the RLDs is mandatory.
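The construction of the LRA and RLD waveforms from a corresponding left/right pair can be sketched as follows (the function name is ours; both inputs are zero-centred first, as described above):

```python
import numpy as np

def lra_rld(left, right):
    """Build the 'L/R average' (LRA) and 'right-minus-left difference'
    (RLD) waveforms from a corresponding pair of left/right waveforms.
    Both waveforms are zero-centred first, since the two measurement
    systems do not refer to identical anatomical locations."""
    l = left - left.mean()
    r = right - right.mean()
    return (l + r) / 2.0, r - l
```

For a perfectly symmetrical motion the RLD waveform vanishes identically, which is what makes it a sensitive probe of L/R asymmetry.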
Comparison of the waveforms obtained from the two measurement systems
The comparisons of the waveforms obtained for the nodes of the extremities from the two measurement systems, as well as of those obtained for the important angles defined in Subsection 3.1, are sufficient in providing an estimate of the degree of the association of the output of the systems under investigation. If one of these systems is an MBS, such a comparison enables decisions on whether reliable information may be obtained from the tested Kinect sensor (assumed to be the second system); a common assumption in past studies [2,4,5,12] is that the inaccuracy of the MBS output is negligible compared to that of the Kinect sensor. (Of course, to obtain from the marker positions information on the internal motion, i.e., on the motion of the human skeletal structure, is quite another issue; we are not aware of works addressing this subject in detail.) As already mentioned, the theoretical background, developed in the present paper, also applies to a comparative study of the two Kinect sensors, identifying the similarities and the differences in their performance, but (of course) it cannot easily enable decisions on which of the two sensors performs better. In summary, irrespective of whether one of the two measurement systems is an MBS or not, the same tests are performed, but the interpretation of the results is different. We propose tests as follows:
• Identification of the node levels of the extremities and spatial directions with the worst association (e.g., with a similarity-index value in the first quartile of the distribution) between the waveforms of the two measurement systems.
• Determination of the similarity of the association between the waveforms pertaining to the upper and lower parts of the human body.
• Determination of the similarity of the association between the waveforms pertaining to the three spatial directions x, y, and z.
• Determination of the similarity of the association between the waveforms obtained from the raw lower-leg signals.

We propose separate tests for the LRA and RLD waveforms (see end of Subsection 4.2.3); if the reliable extraction of the asymmetry of the motion is not required in a study, one may use only the LRA waveforms. After studying the goodness of the association between the waveforms at fixed velocity, velocity-dependent effects may be investigated. We will now provide additional details on each of these tests.

The goodness of the association between the waveforms, obtained from the two measurement systems for the eight node levels of the extremities (SHOULDER, ELBOW, WRIST, HAND, HIP, KNEE, ANKLE, and FOOT) and spatial directions, may be assessed as follows. Separately for each of the five scoring options of Subsection 3.2, for each velocity setting, and for each spatial direction, the node levels may be ranked according to the goodness of the association of the waveforms of the two measurement systems. The node level with the worst association may be given the mark of 0, whereas the one with the best association the mark of 7.
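The ranking of the node levels, and the summing of the marks over scoring options and velocities, can be sketched as follows. This is an illustration under our own conventions: the helper names are ours, and we assume a higher similarity index means better association.

```python
import numpy as np

# The eight node levels of the extremities, as listed in the text.
NODE_LEVELS = ["SHOULDER", "ELBOW", "WRIST", "HAND",
               "HIP", "KNEE", "ANKLE", "FOOT"]

def rank_marks(similarity):
    """Marks 0 (worst association) to 7 (best) for the eight node levels,
    given one similarity-index value per level."""
    order = np.argsort(similarity)              # ascending: worst first
    marks = np.empty(len(similarity), dtype=int)
    marks[order] = np.arange(len(similarity))
    return marks

def total_marks(similarity_table):
    """Sum the ranking marks over scoring options and velocities.

    similarity_table : array of shape (n_options, n_velocities, 8, 3)
    Returns an 8 x 3 matrix of summed marks; the maximal possible entry
    is 7 * n_options * n_velocities."""
    n_opt, n_vel, n_levels, n_dirs = similarity_table.shape
    total = np.zeros((n_levels, n_dirs), dtype=int)
    for i in range(n_opt):
        for j in range(n_vel):
            for d in range(n_dirs):
                total[:, d] += rank_marks(similarity_table[i, j, :, d])
    return total
```

The quartile cuts on the resulting matrix then identify the node levels and spatial directions with the worst association.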
The sum of the ranking scores over all velocities and scoring options yields an 8 × 3 matrix, the maximal entry of which is 7 · 5 · N_v = 35 · N_v, where N_v is the number of the velocities used in the data acquisition; further analysis of the entries of this matrix yields relative information on the goodness of the association for the node levels of the extremities and spatial directions, e.g., it identifies those pertaining to the first quartile of the similarity-index distribution (worst association).

To assess the similarity of the waveforms of the two measurement systems, obtained for the nodes of the upper and lower extremities, one-factor ANOVA tests may be performed, separately for each of the five scoring options of Subsection 3.2, on the scores obtained at each velocity setting, for all upper-extremity nodes and spatial directions, and all lower-extremity nodes and spatial directions. The outlined test should be sufficient in determining whether the performance between the two measurement systems for the lower part of the human body (in relation to its upper part) deteriorates. It must also be investigated whether the aforementioned results are significantly affected after excluding the nodes with the worst association between the waveforms of the two measurement systems.

The goodness of the association between the waveforms, pertaining to the three spatial directions x, y, and z, may be determined after employing ANOVA tests. Similarly to the previous tests, it must be investigated whether the results are significantly affected after the exclusion of the nodes with the worst association between the waveforms of the two measurement systems.

Our past experience indicates that the y waveforms, corresponding to the raw lower-leg signals (i.e., the y offsets of the subject's CM are not removed from the signals), must be examined. This comparison is important for two reasons.
First, the lower-leg signals are used in timing the motion; second, we intend to use these signals in order to obtain the times (expressed as fractions of the gait cycle) of the initial contact (IC) and the TO [9,10]; the difference of these two values is the stance fraction. We had noticed in the past that a salient feature in the waveforms obtained from the original Kinect sensor is a pronounced peak appearing around the IC; this peak is less pronounced in the data obtained with the upgraded sensor, e.g., see Figs. 3 and 4. Although it cannot influence the timing of the motion (because of its position), this artefact complicates the determination of the stance fractions, at least when using the original sensor.

The goodness of the association between the RLD waveforms must be investigated in the case that emphasis is placed on the reliable determination of any asymmetric features in the motion. To establish whether the differences in the reliability of the LRA and RLD waveforms are significant, two-sided t-tests may be performed on the score distributions between corresponding LRA and RLD waveforms, a total of 15 · N_v tests (five scoring options, three tests per scoring option, N_v velocity settings). As it is not clear which type of t-test is more suitable, we propose that three tests be made per case: paired, homoscedastic, and unequal-variance.

Finally, we address the comparison of the RoMs obtained from the waveforms of the two measurement systems. It might be argued that one could simply use in a study the RoMs, rather than the waveforms, as representative of the motion of each node. Of course, given that each waveform is essentially replaced by one number, the information content in the RoMs is drastically reduced compared to that contained in the waveforms. Plotted versus one another (scatter plot), the ideal relation between the RoMs obtained from the two measurement systems should be linear with a slope equal to 1, both for the LRA and for the RLD waveforms.
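The statistical machinery of the proposed tests (the one-factor ANOVA of the score distributions and the three t-test variants) can be sketched in plain NumPy as follows. The helper names are ours; only the test statistics are computed, the corresponding p-values (via the F and Student-t distributions, e.g., scipy.stats) being left out to keep the sketch dependency-free:

```python
import numpy as np

def anova_f(*groups):
    """One-factor ANOVA F statistic: between-group mean square over
    within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = np.concatenate(groups).mean()
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(((np.asarray(g) - np.mean(g)) ** 2).sum()
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def t_statistics(a, b):
    """The three t statistics proposed in the text for comparing the LRA
    and RLD score distributions: paired, homoscedastic (pooled-variance),
    and unequal-variance (Welch)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n, m = len(a), len(b)
    # paired (requires n == m)
    d = a - b
    t_paired = d.mean() / (d.std(ddof=1) / np.sqrt(n))
    # homoscedastic: pooled variance
    sp2 = ((n - 1) * a.var(ddof=1) + (m - 1) * b.var(ddof=1)) / (n + m - 2)
    t_pooled = (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / n + 1.0 / m))
    # Welch (unequal variances)
    t_welch = (a.mean() - b.mean()) / np.sqrt(a.var(ddof=1) / n
                                              + b.var(ddof=1) / m)
    return t_paired, t_pooled, t_welch
```

For equal group variances the pooled and Welch statistics coincide; the three variants differ only when the pairing or the variance assumptions matter, which is precisely why the text proposes running all three.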
The comparison of the two straight-line slopes, obtained in the case of the LRA and the RLD waveforms, provides an independent assessment on the significance of the differences in the reliability of the LRA and RLD waveforms.

Our aim in the present paper was to develop the theoretical background required for the comparison of the output of two measurement systems used (or intended to be used) in the analysis of human motion; important definitions are given in Section 3, whereas the data acquisition and the first part of the data analysis are covered in Section 4. A list of meaningful tests, comprising the second part of the data analysis, is given in Section 5.

Although this methodology has been developed for a direct application in the case of the Microsoft Kinect TM ('Kinect', for short) [1] sensors, the use of which in motion analysis is our prime objective, its adaptation may yield solutions suitable in other cases. The outcome of the proposed tests of Section 5 should be sufficient in identifying the important differences in the output of two measurement systems. As such, these tests identify (in our case) differences in the performance of the two Kinect sensors (in a comparative study) or enable conclusions regarding the outcome of the validation of the output of either of the Kinect sensors (if the second measurement system is a marker-based system (MBS)).

As next steps in our research programme, we first intend to conduct a comparative study of the two Kinect sensors, after applying the methodology set forth herein. At a later stage, we will attempt to validate the output of the two Kinect sensors on the basis of standard MBSs.

Conflict of interest statement
The authors certify that, regarding the material of the present paper, they have no affiliations with or involvement in any organisation or entity with financial or non-financial interest.
References

th European LS-DYNA Conference, Salzburg, Austria, May 14-15, 2009.
[17] M.P. Kadaba et al., Repeatability of kinematic, kinetic, and electromyographic data in normal adult gait, J. Orthop. Res. 7 (1989) 849-860.
[18] J.L. McGinley, R. Baker, R. Wolfe, M.E. Morris, The reliability of three-dimensional kinematic gait measurements: a systematic review, Gait Posture 29 (2009) 360-369.
[19] P. Garofalo et al., Inter-operator reliability and prediction bands of a novel protocol to measure the coordinated movements of shoulder-girdle and humerus in clinical settings, Med. Biol. Eng. Comput. 47 (2009) 475-486.
[20] A. Ferrari, A.G. Cutti, A. Cappello, A new formulation of the coefficient of multiple correlation to assess the similarity of waveforms measured synchronously by different motion analysis protocols, Gait Posture 31 (2010) 540-542.
[21] H. Theil, Economic Forecasts and Policy, Second Edition, North-Holland, Amsterdam (1961).
[22] A. Ferrari et al., First in vivo assessment of "Outwalk": a novel protocol for clinical gait analysis based on inertial and magnetic sensors, Med. Biol. Eng. Comput. 48 (2010) 1-15.
[23] J. Daniels, Daniels' running formula, Third Edition, Human Kinetics (2013), pp.
26-28.
[24] http://db-maths.nuxit.net/CaRMetal/index en.html

Table 1
The notation for the marker positions according to the 'Plug-in Gait' placement scheme [14].

Marker number | Marker-position identifier | Placement
1 | LFHD | left front head
2 | RFHD | right front head
3 | LBHD | left back head
4 | RBHD | right back head
5 | C7 | 7th cervical vertebra
6 | T10 | 10th thoracic vertebra
7 | CLAV | clavicle
8 | STRN | sternum
9 | RBAK | right back (middle of the right scapula)
10 | LSHO | left shoulder
11 | LUPA | left upper arm
12 | LELB | left elbow
13 | LFRA | left forearm
14 | LWRA | left wrist A
15 | LWRB | left wrist B
16 | LFIN | left fingers (second metacarpal head, dorsum)
17 | RSHO | right shoulder
18 | RUPA | right upper arm
19 | RELB | right elbow
20 | RFRA | right forearm
21 | RWRA | right wrist A
22 | RWRB | right wrist B
23 | RFIN | right fingers (second metacarpal head, dorsum)
24 | LASI | left anterior superior iliac spine
25 | RASI | right anterior superior iliac spine
26 | LPSI | left posterior superior iliac spine
27 | RPSI | right posterior superior iliac spine
28 | LTHI | left thigh

Table 1 (continued)

Marker number | Marker-position identifier | Placement
29 | LKNE | left knee
30 | LTIB | left tibia
31 | LANK | left ankle
32 | LHEE | left heel, on the calcaneus
33 | LTOE | left toes, second metatarsal head
34 | RTHI | right thigh
35 | RKNE | right knee
36 | RTIB | right tibia
37 | RANK | right ankle
38 | RHEE | right heel, on the calcaneus
39 | RTOE | right toes, second metatarsal head

Fig. 1. The front view of the original Kinect sensor; also shown is the Kinect coordinate system.

Fig. 2. The 20 nodes of the original Kinect sensor. The figure has been produced with CaRMetal, a dynamic geometry free software (GNU-GPL license), first developed by R. Grothmann and recently under E. Hakenholz [24].

[Figure image: two panels, y (mm) versus f, labelled 'Kinect v1; left leg' and 'Kinect v2; left leg'.]
Fig. 3. Preliminary results for the waveforms for the raw y coordinate of the left lower leg (ankle) obtained from one subject, using both Kinect sensors. The quantity f is the fraction of the gait cycle. The sensors were attached onto a plastic structure mounted on a tripod; the difference in the y values simply reflects the higher position on the mount of the upgraded Kinect sensor.

[Figure image: two panels, y (mm) versus f, labelled 'Kinect v1; right leg' and 'Kinect v2; right leg'.]