Investigating naturalistic hand movements by behavior mining in long-term video and neural recordings
Satpreet H. Singh, Steven M. Peterson, Rajesh P. N. Rao, Bingni W. Brunton
Department of Electrical and Computer Engineering, University of Washington, Seattle, USA
Department of Biology, University of Washington, Seattle, USA
Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, USA
Center for Neurotechnology, Seattle, USA
University of Washington Institute for Neuroengineering, Seattle, USA
eScience Institute, University of Washington, Seattle, USA
† Author to whom correspondence should be addressed. E-mail: [email protected]
Abstract.
Objective: Recent technological advances in brain recording and artificial intelligence are propelling a new paradigm in neuroscience beyond the traditional controlled experiment. Rather than focusing on cued, repeated trials, naturalistic neuroscience studies neural processes underlying spontaneous behaviors performed in unconstrained settings. However, analyzing such unstructured data lacking a priori experimental design remains a significant challenge, especially when the data is multi-modal and long-term. Here we describe an automated approach for analyzing simultaneously recorded long-term, naturalistic electrocorticography (ECoG) and naturalistic behavior video data.
Approach: We take a behavior-first approach to analyzing the long-term recordings. Using a combination of computer vision, discrete latent-variable modeling, and string pattern-matching on the behavioral video data, we find and annotate spontaneous human upper-limb movement events. We then demonstrate applications of these naturalistic behavior events, along with their associated neural recordings, for neural encoding and decoding.
Main results: We show results from our approach applied to data collected for 12 human subjects over 7–9 days for each subject. Our pipeline discovers and annotates over 40,000 instances of naturalistic human upper-limb movement events in the behavioral videos. Analysis of the simultaneously recorded brain data reveals neural signatures of movement that corroborate prior findings from traditional controlled experiments. We also prototype a decoder for a movement initiation detection task to demonstrate the efficacy of our pipeline as a source of training data for brain-computer interfacing applications.
Significance: Our work addresses the unique data analysis challenges in studying naturalistic human behaviors, and contributes methods that may generalize to other neural recording modalities beyond ECoG. We publicly release our curated dataset, providing a resource to study naturalistic neural and behavioral variability at a scale not previously available.
Keywords: naturalistic behavior, computer vision, neural correlates, neural decoding, electrocorticography, brain-computer interfaces
1. Introduction
Neuroscience has long been interested in understanding brain activity associated with spontaneous behaviors in freely behaving subjects. Even so, hypotheses regarding brain function have typically been tested using carefully designed, well-controlled experimental tasks, where the timing of cues, stimuli, and behavioral responses is known precisely. Fortunately, recent technological advances have enabled us to study increasingly naturalistic and longer brain recordings, giving rise to a new paradigm called “naturalistic neuroscience” (Nastase et al., 2020; Huk et al., 2018; Gabriel et al., 2019; Markowitz et al., 2018; Wang et al., 2016), in which neural computations associated with such spontaneous behaviors are studied. Understanding such unstructured, long-term, and multi-modal data poses a substantial analytic challenge, due in part to the lack of a priori experimental design and the difficulty of isolating interpretable behavioral events.
Our work is related to several areas of active research in neuroscience, neuroengineering, and neuroethology that integrate techniques from machine learning, computer vision, and statistical modeling. Many recent methodological innovations have addressed the automated analysis of non-human animal behavior (Batty et al., 2019; Pereira et al., 2019; Nassar et al., 2019; Mathis et al., 2018; Johnson et al., 2016; Wiltschko et al., 2015) (see also Mathis and Mathis (2020) for a recent survey and Anderson and Perona (2014) for a perspective on this emerging area). A typical non-human naturalistic neuroscience experiment (Johnson et al., 2020; Markowitz et al., 2018; Berman, 2018) first collects simultaneously recorded behavioral video and neural activity data from one or more freely behaving subjects in an uncontrolled but sufficiently confined environment. Next, the video recordings are processed through an extensive pipeline consisting of steps such as: segmenting the subject(s) from the background, transforming subject pose to common coordinates using affine transformations, estimating the pose of body-parts across frames, and higher-level operations such as classifying pose or segmenting pose into actions. Combined with the simultaneously recorded neural data, such naturalistic behavior data are being used to shed light on previously intractable questions in behavioral neuroscience, often at unprecedented scale.

Human action-recognition methods from mainstream computer vision (Ramasamy Ramamurthy and Roy, 2018) are relevant but not directly applicable to the needs of naturalistic human neuroscience. Traditionally, action-recognition research has concerned itself with discriminating activities at a coarse level, such as sitting vs. walking (Ghorbani et al., 2020), and has often assumed the availability of a large corpus of labeled training data.
In contrast, to study the kinds of behaviors that interest neuroscientists and neuroengineers, we seek to localize fine-grained movements to sub-second temporal resolution, and ideally use the fewest behavioral labels possible (Seethapathi et al., 2019). Lastly, since it is not known which behaviors or behavioral characteristics will elicit neural responses worth studying further, a queryable representation that supports the flexibility to study several kinds of behaviors is desirable. Recent work by Fu et al. (2019) develops such representations for semi-automated exploration of scenes in general videos.

Our work is most closely related to recent work in human naturalistic neuroscience that combines computer vision with opportunistic clinical brain recordings, including Wang et al. (2016), Alasfour et al. (2019), and Chambers et al. (2019). In particular, we build on the work of Wang et al. (2018), using similar video data, and estimate the pose of human upper-body keypoints using neural networks. Gabriel et al. (2019) use optical flow and image partitioning to detect coarse limb movements from video taken in a clinical setting similar to ours and develop neural decoders for detecting these movements from brain data. Compared to Wang et al. (2018), who use a moving window heuristic on pose estimates to detect movements, we take a more principled approach to modeling the pose data. This allows us to localize movement events with finer temporal resolution and characterize entire movement trajectories, which in turn enables novel applications described later in the paper. We also use newer, more efficient computer vision methods (Mathis and Mathis, 2020; Nath et al., 2019) that allow us to process data at a scale that exceeds all of the aforementioned studies taken together in the number of subjects and duration of recordings analyzed. Finally, we focus on curating, characterizing, and making our dataset available to the research community to foster further research and development in this area.
We present a scalable behavior-mining approach to analyze simultaneously recorded naturalistic brain and behavior data, obtained opportunistically from human subjects undergoing long-term clinical monitoring prior to epilepsy surgery. Our video processing pipeline (Figure 1) first estimates the locations of keypoints (e.g. wrists and elbows) on the upper-body using a neural network trained on each subject (Mathis et al., 2018). We then segment the trajectory of each keypoint in time using discrete latent-variable models, building a discrete representation of pose dynamics. Interestingly, having a discrete, sequential representation of upper-limb pose simplifies the problem of detecting behavioral events to pattern-matching on strings. Using regular expressions corresponding to patterns of interest, we discover thousands of interpretable events per subject, an order of magnitude more observations than in a typical controlled human experiment. To study the rich naturalistic variability associated with these events, we also extract metadata including movement angle, magnitude, and duration.

Figure 1. Pipeline for behavioral video data processing. (a) Video frame showing estimated pose keypoints (colored dots) on human subject. (b) Autoregressive hidden Markov model (Section 3.1.2) robustly segmented pose trajectory into rest (shaded grey) and move (shaded light blue) states. (c) Raster plot of pose states (move in dark blue, rest in white) for several video-clips for pattern matching at scale. Red box depicts one movement initiation event matching a pattern of 15 contiguous rest states (0.5 s) followed by 15 contiguous move states (0.5 s).

Next, we explore the use of these behavioral events for neuroscience and neuroengineering applications by analyzing the simultaneously recorded brain data. Event-averaged spectrograms associated with our naturalistic human upper-limb movement initiation events corroborate and strengthen previous findings from controlled experiments (Miller et al., 2007) (see also Peterson et al. (2020)).
Preliminary investigations also suggest that our workflow could produce data useful for training brain-computer interface (BCI) decoders; trained on larger samples representative of naturalistic variability, such decoders may perform more robustly in real-world deployments.

Our key contributions in this paper are as follows. First, we present a highly automated, novel workflow for analyzing simultaneously recorded naturalistic long-term human brain and behavioral video data. Second, we develop a domain-relevant, robust, temporally precise, and queryable representation of human upper-limb pose. Third, to showcase our workflow, we demonstrate example applications in neuroscience and neuroengineering, suggesting that our approach and results are of broad interest. Finally, to support open science and facilitate further research in this area, we release our curated dataset consisting of annotated naturalistic events and associated neural recordings.
2. Dataset
Our dataset consists of human intracranial electrocorticography (ECoG) (Parvizi and Kastner, 2018) neural recordings and simultaneously recorded behavioral video recordings, obtained opportunistically from 12 patients with epilepsy for the duration of each patient's long-term (7–9 days) continuous clinical observation. The University of Washington Institutional Review Board for the protection of human subjects approved our study and all patients provided their informed written consent. Patient behavior was continuously recorded by a wall-mounted camera (RGB/infrared, 640 × 480 pixels) for real-time monitoring by an around-the-clock clinical team, except during intermittent equipment servicing or private times when the camera was switched off or turned away. Patients were observed performing their daily activities (including talking, eating, watching TV, using a computer or phone, sleeping, receiving clinical care, etc.) from the hospital bed while being tethered to a brain-recording interface. Each patient had about 90 electrodes implanted under the skull and dura, directly on their brain surface, including an 8 × 8 electrode grid.

We applied standard pre-processing steps to the ECoG data (Peterson et al., 2020; Gabriel et al., 2019; Miller, 2019; Miller et al., 2007; Schalk et al., 2007), including down-sampling to 500 Hz, 60 Hz line noise removal, large-amplitude artifact removal, and median-centering using a common reference across all electrodes. All electrode positions were localized and converted to Montreal Neurological Institute and Hospital (MNI) coordinates using the FieldTrip toolbox (Oostenveld et al., 2011; Stolk et al., 2018) in MATLAB. To aid interpretability, we restricted our analysis in this paper to the 64 grid electrodes covering one hemisphere per subject. We excluded electrodes with recording issues, such as persistent presence of artifacts. However, data for all available electrodes are provided in the publicly released dataset accompanying this paper. In most cases, we also did not analyze the neurally and behaviorally atypical data from the first two days of a patient's hospital stay, since patients were usually heavily medicated during this time while recovering from electrode implantation surgery.

Before processing the video data, we manually inspected and annotated it at a coarse (every 3 minutes or so) level of granularity to create an omit-list that was excluded from further processing.
The omit-list included long time-spans of sleep, times when a clinical or research team was actively working with the subject, private times, and times when applying computer-vision algorithms was impossible due to poor lighting conditions or severe occlusion of the subject's body. Almost all camera movement occurs around times when the clinical team is actively working with the patient; removing these times results in a mostly steady recording configuration, as seen in Figure 1(a). We also labeled and excluded times when the clinical team had placed seizure restraints on the subjects' hands, since these limited mobility and gave rise to unnatural movements. Completing these manual annotations took about 6–12 hours per subject, depending on their clinical treatment regime, activity and sleep schedule, and length of hospital stay. When analyzing ECoG and video data together, the two data-streams were synchronized using metadata extracted with the equipment manufacturer's (Natus Medical Incorporated) software.
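The ECoG preprocessing steps described above (down-sampling to 500 Hz, 60 Hz line-noise removal, and median-centering with a common reference) can be sketched in Python. This is a minimal illustration, not the authors' implementation: the notch quality factor and the function name `preprocess_ecog` are assumptions.

```python
import numpy as np
from math import gcd
from scipy import signal

def preprocess_ecog(data, fs_in, fs_out=500):
    """Sketch of standard ECoG preprocessing.

    data: (n_electrodes, n_samples) raw voltage traces.
    Steps: down-sample to fs_out, notch out 60 Hz line noise, and
    re-reference to the common median across electrodes. The notch
    quality factor Q=30 is an assumed, illustrative value.
    """
    # Down-sample (polyphase resampling applies an anti-alias filter).
    g = gcd(fs_out, fs_in)
    x = signal.resample_poly(data, fs_out // g, fs_in // g, axis=1)

    # 60 Hz notch filter, applied forward-backward for zero phase.
    b, a = signal.iirnotch(w0=60.0, Q=30.0, fs=fs_out)
    x = signal.filtfilt(b, a, x, axis=1)

    # Common median reference: subtract the across-electrode median.
    return x - np.median(x, axis=0, keepdims=True)
```

Large-amplitude artifact rejection, also mentioned above, is omitted here since it is threshold- and dataset-specific.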
3. Methods
We developed and validated a pipeline to extract temporally precise, interpretable movement events by processing the video data through pose-estimation, pose time-series segmentation, event detection, and finally, event metadata extraction (Figure 1).
To extract a subject's pose from raw video, we trained a state-of-the-art markerless pose estimation tool (Mathis et al., 2018) known for its speed and data efficiency (Nath et al., 2019). For training data, we manually annotated around 1000 frames per subject, chosen randomly from the entire duration of a subject's video data, preferentially sampling active, daytime hours over times when the subject was asleep. For each frame, we annotated up to 9 keypoints on the subject's body whenever visible (Bourdev and Malik, 2009). These keypoints were the nose, both wrists, elbows, shoulders, and ears (Figure 1a).

During prediction, the pose-estimation tool produced the (x, y) coordinates and a confidence estimate in [0, 1] for each keypoint per frame (Figure 1b). To quantify the performance of keypoint tracking, we estimated the pixel-wise RMS error to be 1.54 ± … pixels for …% of the manual annotations, excluding points below a confidence threshold of …. As an approximate scaling, … pixels in the video span about … cm in physical units, which is about the width of a human wrist. We estimated this scale by comparing standard human measurements (McDowell et al., 2009) with the median distance between shoulder keypoints (in pixels) at movement onset for a few subjects. On average, estimating pose for the entire duration of a subject's video took … GPU-hours per subject using AWS p2.16xlarge NVIDIA K80 GPUs.

We denoised the estimated pose trajectories by median filtering (window length … frames) and smoothing (window length …, 2nd-order Savitzky-Golay (Schafer, 2011) filter). Next, we segmented the pose time-series into discrete, interpretable states while preserving the temporal precision of the keypoint tracking. We applied a first-order autoregressive hidden semi-Markov model (ARHSMM) (Murphy, 2012) with two latent states to each keypoint's time-series (Figure 1b shows the left wrist). This model converts each keypoint's continuous pose dynamics into discretized dynamics consisting of rest and move states. Using a semi-Markov, rather than a Markov, model accounts for the bias that limbs tend to be at rest most of the time and mitigates unnecessary switching between latent states. Similar to Wiltschko et al. (2015), we fit the ARHSMM using the pyhsmm-autoregressive package in Python (Johnson and Willsky, 2013). The resulting states are at video frame-rate resolution and the segmentation is relatively robust to variation in lighting, camera angle, and level of activity in the video.
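A simplified sketch of the denoising and two-state discretization: the median and Savitzky-Golay filtering follow the pipeline above, while the hysteresis threshold on frame-to-frame speed is only a lightweight stand-in for the paper's ARHSMM, and all window lengths and thresholds are assumed values.

```python
import numpy as np
from scipy.signal import medfilt, savgol_filter

def discretize_keypoint(xy, med_window=5, sg_window=11,
                        speed_hi=2.0, speed_lo=0.5):
    """Denoise a keypoint trajectory and discretize it into
    rest ('R') / move ('M') states.

    xy: (n_frames, 2) keypoint coordinates in pixels. The hysteresis
    thresholding on per-frame speed is a simplified stand-in for the
    paper's two-state ARHSMM; like the semi-Markov model, the two
    thresholds discourage rapid state switching. Parameter values
    here are illustrative, not the paper's settings.
    """
    sm = medfilt(xy, kernel_size=(med_window, 1))
    sm = savgol_filter(sm, window_length=sg_window, polyorder=2, axis=0)

    # Frame-to-frame speed (px/frame), padded to keep n_frames values.
    speed = np.linalg.norm(np.diff(sm, axis=0), axis=1)
    speed = np.append(speed, speed[-1])

    states, moving = [], False
    for s in speed:
        if moving and s < speed_lo:
            moving = False          # drop back to rest
        elif not moving and s > speed_hi:
            moving = True           # movement onset
        states.append('M' if moving else 'R')
    return sm, ''.join(states)
```

The returned state string is at video frame-rate resolution, matching the discretized dynamics described in the text.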
Discretizing the pose trajectories facilitates the description of scientifically interesting behaviors performed spontaneously by the subject, even though they vary greatly in duration. Specifically, the task of finding different types of behavioral events thus reduces to string pattern matching on the discretized dynamics. For the behaviors we explore in the rest of this paper, we looked for movement initiation events by matching a pattern of 15 consecutive rest states (0.5 s), followed by at least 15 consecutive move states (0.5 s). Similarly, no-movement events are state sequences of rest states (… s) across both wrists and the nose. To create our database of wrist movements, we use regular expressions to quickly find thousands of non-overlapping instances of such patterns in the discretized pose dynamics for each subject.

Parameters for smoothing, hyperparameters for the ARHSMM segmentation model, and the choice of regular expressions for event detection were picked empirically by assessing performance on pose time-series derived from a small representative set of subject videos. We confirmed that the temporal accuracy of event boundaries matched our expectations by manually inspecting a few dozen random events of each movement type for each subject.

For each detected movement event, we extracted several metadata features from the continuous pose-dynamics associated with the movement.

Figure 2. Schematic of metadata extraction: [Top] Cartoon showing left-wrist movement at time of (A) movement initiation, (B) maximum displacement from initiation (reach), and (C) movement end. [Middle] Time course of left-wrist radial distance [px: pixels], and discretized state (R: rest, M: move). Extracted movement metadata include duration, start and end coordinates, among others (Sec. 3.1.4). [Bottom] Discretized state sequence for both left and right wrists, now showing movement initiation in the right wrist while the left wrist is still in motion. The dashed line shows the window over which bimanual overlap metadata is calculated, corresponding to the number of frames for which the opposing (left) wrist is in motion over that duration.
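Encoding each frame's state as a character ('R' for rest, 'M' for move, an assumed encoding), the movement-initiation search described above reduces to a single regular expression; the 15-state (0.5 s at 30 fps) pattern follows Figure 1.

```python
import re

def find_movement_initiations(states, n_rest=15, n_move=15):
    """Find movement initiation events in a discretized pose-state
    string: at least `n_rest` contiguous rest states ('R') followed
    by at least `n_move` contiguous move states ('M').

    Returns the frame indices where movement begins. `re.finditer`
    yields non-overlapping matches, mirroring the non-overlapping
    instances described in the text.
    """
    pattern = re.compile(r'R{%d,}(M{%d,})' % (n_rest, n_move))
    return [m.start(1) for m in pattern.finditer(states)]
```

Other event types, such as no-movement epochs, correspond to different patterns (e.g. a long run of 'R' across several keypoints' strings) queried the same way.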
These include movement-associated metadata (Figure 2) such as the (x, y) coordinates of the keypoint at the start and end of the event, the duration of the entire movement (up to the next rest state), and the rest duration before and after the movement.

Observed naturalistic hand movements often consisted of a hand reaching out, touching, or grabbing an object, then bringing the hand back to the body. Therefore, we defined the reach of a wrist movement to be its maximum radial displacement during the course of the event, as calculated from its location at the start of the event. We extracted the magnitude, angle, and duration for each reach. To measure the shape of a movement, we fit 1st-, 2nd-, and 3rd-degree polynomials to a keypoint's displacement trajectory. Differences between the quality of the fit (as measured by R²) to each polynomial type provide a rough measure of the “curviness” of the movement trajectory. We also estimated a movement's onset and offset speeds by calculating the keypoint's displacement change within short time windows around the start and end of the movement.

Since people often move both hands at the same time (i.e. “bimanually”), we augmented each movement event with metadata about the opposing wrist's movement, if any (Figure 2). By juxtaposing the discrete state sequences of both wrists, we calculated when the opposing hand starts to move (lead/lag time difference) and how long this movement overlaps with that of the primary hand (overlap duration).

False positives in event discovery were still present in the data at this stage due to pose estimation failures and unusual pose states. To compensate for failures in 2D pose estimation, we calculated movement-weighted confidence scores for each event and removed those below a manually determined threshold. To eliminate outlier pose states, we calculated the mean distance and mean angle between shoulder keypoints, then removed events from the top and bottom 5 percentiles of these quantities.
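The reach metadata extraction described above might look like the following sketch; the dictionary keys and the R²-difference "curviness" proxy are illustrative choices rather than the paper's exact definitions.

```python
import numpy as np

def reach_metadata(traj, fps=30):
    """Extract reach metadata from one movement event's trajectory.

    traj: (n_frames, 2) keypoint coordinates during the event, in
    pixels. Reach is the maximum radial displacement from the start
    position, as defined in the text; 'curviness' is a rough proxy
    comparing R^2 of degree-3 vs degree-1 polynomial fits to the
    displacement time-series.
    """
    disp = np.linalg.norm(traj - traj[0], axis=1)   # radial displacement
    i_reach = int(np.argmax(disp))
    dx, dy = traj[i_reach] - traj[0]

    meta = {
        'reach_magnitude_px': float(disp[i_reach]),
        'reach_angle_deg': float(np.degrees(np.arctan2(dy, dx))),
        'reach_duration_s': i_reach / fps,
    }

    # Gain in R^2 from a cubic over a linear fit: ~0 for straight
    # reaches, larger for curved trajectories.
    t = np.arange(len(disp))
    ss_tot = np.sum((disp - disp.mean()) ** 2)
    r2 = {}
    for deg in (1, 3):
        fit = np.polyval(np.polyfit(t, disp, deg), t)
        r2[deg] = 1.0 - np.sum((disp - fit) ** 2) / ss_tot
    meta['curviness'] = r2[3] - r2[1]
    return meta
```

Onset/offset speeds and bimanual lead/lag follow the same pattern, computed over short windows at the event boundaries.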
A core scientific question in systems neuroscience is how behaviors are encoded by the coordinated activation of brain regions. To examine the neural correlates of naturalistic movement initiation, we performed a time-frequency (TF) analysis of the neural recordings (Cohen, 2014) by averaging event-locked spectrograms for each subject, using hundreds of movement initiation events chosen to match movement statistics (reach magnitude, onset velocity, and shape) of a previous controlled experimental study (Miller et al., 2007). Using the aforementioned metadata to guide our search, we selected up to 200 events per day over 5 days for each of 12 subjects, and then further inspected the video for each event to remove any false positives (17.8 % mean ± …% s.d. events).

Figure 3. [Left] Number of right-wrist movement initiation events discovered per day for each of 12 subjects, totaling 475 to 3526 events per subject across their entire duration of clinical observation (268 ± 123 s.d. per day). [Right] Raster plot of right-wrist movement initiation event occurrences showing bursts of activity interspersed with periods of rest or omit-listed (Section 3.1.1) periods. See Figure A2 for equivalent plots for the left wrist.

Figure 4. A sample of 50 typical right-wrist trajectories (px: pixels; translated to start at origin) showing the diversity of naturalistic reach movements for a single subject (S10). Different colors represent different individual trajectories. Note the large variability in the movements, compared to what is normally captured by controlled experiments.

A grand challenge in neuroengineering is the development of BCIs that can be used to predict spontaneous activity and intentions outside the lab, in everyday settings (Shanechi, 2019; Smalley, 2019; Shanechi, 2018; Warren et al., 2016; Shenoy and Chestek, 2012). Here we performed a preliminary study leveraging our pipeline as a source of training data for a BCI decoder that detects wrist movement initiation events. Specifically, we trained separate classifiers, tailored to each subject, to discriminate between movement initiation events and no-movement epochs for each wrist using only features derived from the ECoG neural recordings.

Our decoder uses the Random Forest (RF) algorithm (Breiman, 2001; Murphy, 2012), which is typically considered one of the best off-the-shelf classification algorithms for small/medium sized datasets (Hastie et al., 2009). We used ECoG data from … s before to … s after each event to compute TF spectrograms at each of the 64 grid electrodes and used the flattened vector of TF bins as features for the classifier (TF bins were 200 ms × … Hz). Following prior work (… et al., 2019; Klosterman et al., 2016), a reduced subset of events from 3 consecutive days (typically days 3 through 5 of clinical monitoring) was used for training. We used events from the last day as the test set. To eliminate the confound of movement initiation in the opposing wrist, we further filtered events to exclude those with significant movement (≥ … s) in the opposing wrist within the ± … s window used for ECoG data. Positive (movement initiation) and negative (no-movement) examples were balanced by down-sampling negative examples.
This balancing eliminated bias in the training set and set up a chance-level baseline of 50 % test accuracy. Training and test supports were 633 ± 417 s.d. and 331 ± … s.d. events, respectively. We searched over hyperparameters including the number of trees (range: [50, …)) and maximum tree-depth (range: [3, …)). For each set of hyperparameters, 5-fold cross-validation holdout accuracy was used to measure performance. Final performance reported is from training using the best hyperparameters and corresponds to classifier accuracy on events from the withheld test day.
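A sketch of the decoder training just described: classes are balanced by down-sampling, and a Random Forest is tuned by cross-validation. The hyperparameter grids here are assumed values, since the paper's exact ranges are partly unspecified, and the function name is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

def train_movement_decoder(X_move, X_rest, seed=0):
    """Train a movement-initiation vs no-movement classifier from
    flattened time-frequency features.

    X_move, X_rest: (n_events, n_features) arrays of flattened TF
    bins. The larger class is down-sampled so the chance baseline is
    50 %, matching the balancing described in the text.
    """
    rng = np.random.default_rng(seed)
    n = min(len(X_move), len(X_rest))
    Xm = X_move[rng.choice(len(X_move), n, replace=False)]
    Xr = X_rest[rng.choice(len(X_rest), n, replace=False)]
    X = np.vstack([Xm, Xr])
    y = np.concatenate([np.ones(n), np.zeros(n)])

    # Assumed hyperparameter grids; the paper reports searching over
    # the number of trees and maximum tree-depth.
    grid = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid={'n_estimators': [50, 100], 'max_depth': [3, 6]},
        cv=5,  # 5-fold cross-validation, as in the text
    )
    grid.fit(X, y)
    return grid.best_estimator_
```

The returned estimator would then be evaluated on the withheld test day's events.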
Figure 5. Histograms of right-wrist movement initiation event metadata per subject for their entire duration of clinical observation: (a) Reach magnitudes [pixels] show a dominance of small movements, (b) reach durations [seconds] tended to be concentrated around ≈ … s, (c) bimanual lead/lag times [seconds], (d) bimanual overlap durations, and (e) reach angles [degrees; out/down/in/up].
4. Results
Our pipeline extracted 959 to 6745 individual wrist movement events per subject (487 ± 215 per day) across 12 subjects (Figures 3 and A2). We found large variability between subjects in the number of events discovered, which we attribute to inter-subject differences in cycles of sleep and wakeful activity and clinical treatment regimes (Figures 3 and A2). We also observed rich within-subject variability in the event metadata (Figures 4, 5, and A3), which further differentiates our dataset from those collected in controlled experiments. Since our subjects received no instructions for when and how to move, we expect the observed movement statistics to closely reflect the natural statistics of human upper-limb movements while seated.
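The event-averaged time-frequency analysis described in the Methods can be sketched for a single electrode as follows; the ±1.5 s window, the spectrogram parameters, and the function name are assumptions rather than the paper's exact settings.

```python
import numpy as np
from scipy.signal import spectrogram

def event_averaged_spectrogram(ecog, event_samples, fs=500,
                               pre_s=1.5, post_s=1.5):
    """Average time-frequency power around movement initiation
    events for one electrode.

    ecog: 1-D voltage trace; event_samples: sample indices of
    movement initiation (from the video pipeline, converted to the
    neural clock). Window and spectrogram settings are illustrative.
    """
    pre, post = int(pre_s * fs), int(post_s * fs)
    specs = []
    for s0 in event_samples:
        if s0 - pre < 0 or s0 + post > len(ecog):
            continue  # skip events too close to the recording edge
        seg = ecog[s0 - pre : s0 + post]
        f, t, Sxx = spectrogram(seg, fs=fs, nperseg=100, noverlap=50)
        specs.append(Sxx)
    mean_spec = np.mean(specs, axis=0)
    return f, t - pre_s, mean_spec   # times relative to movement onset
```

Averaging over hundreds of events is what makes the movement-locked power changes visible against the variability of spontaneous behavior.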
We observed movement-associated power increases in a high-frequency band (76–100 Hz) and decreases in a low-frequency band (8–32 Hz) across many cortical areas (Figure 6). This pattern is similar to what has been observed with controlled experimental trials in Miller et al. (2007), Volkova et al. (2019), and Yuan and He (2014). Furthermore, we strengthen prior findings by showing that these movement-associated patterns hold across 5 consecutive days, well beyond the timescale of a typical controlled experiment (less than … hours). To the best of our knowledge, this is the first reported instance of a TF analysis of spontaneous naturalistic movements using events discovered by an automated workflow. We elaborate on these results in our concurrently released preprint (Peterson et al., 2020), where we further investigated the consequences of the relatively higher variance of naturalistic movement statistics and modeled the contributions of the various movement metadata to the observed neural responses.

Figure 6. Neural correlates of movement initiation: Event-locked spectrograms, averaged by brain region (cyan color in insets) across 12 subjects, showed movement-associated high-frequency power increase and low-frequency power decrease. These patterns corroborate and strengthen previous findings from controlled experiments (Miller et al., 2007). See our companion preprint Peterson et al. (2020) for a deeper exploration of the behavioral and neural variability of these movements.

Figure 7.
Test set decoding accuracy for initiation of movement of the contralateral (side opposite electrode implant) and ipsilateral (same side) wrists: As expected, decoding of contralateral movements is slightly more accurate than ipsilateral in almost all cases.
Individual subject classifier performance varied widely between subjects, ranging from around chance levels to …% test accuracy (Figure 7), comparable to previously reported work (Gabriel et al., 2019; Wang et al., 2018). Classifier performance tended to be correlated with the extent of motor cortex coverage (Figures 8 and A1). Due to hemispheric lateralization of brain function, decoding contralateral limb movements is expected to be more accurate than decoding ipsilateral movements (Tam et al., 2019). Consequently, test set decoding accuracy (Figure 7) was higher for the contralateral wrist in almost all subjects. Since false positives (FPs) in the event data establish a ceiling on classifier accuracy, we estimated their prevalence by manually inspecting 100 randomly sampled events per event-type from each subject (5 % ± …% s.d. for no-movement, 22 % ± …% s.d. for contralateral, and 14 % ± …% s.d. for ipsilateral events). With more stringent rejection of FPs, we expect improved decoding performance and potentially more pronounced differences between contralateral and ipsilateral decoding accuracy.

To interpret the importance of spectral features in the decoder, we visualized Random Forest feature importance scores (Breiman, 2001; Hastie et al., 2009). We aggregated these scores in two ways to gain insight into their spatial (Figures 8 and A1) and frequency (Figure 9) components. Spectral features are indexed by electrode, time, and frequency. We define the Feature Importance aggregated by Electrode $FI_E(e)$ for electrode $e \in E$ as:

\[ FI_E(e) = \sum_{f \in F,\, t \in T} FI(t, f, e), \]

where $E$, $F$, and $T$ are the sets of electrodes, frequency-bins, and time-bins over which the spectral features are calculated, respectively. For the purpose of visualization, we normalized these values to get the Normalized Feature Importance aggregated by Electrode $NFI_E(e)$:

\[ NFI_E(e) = \frac{FI_E(e)}{\max_{e \in E} FI_E(e)}. \]
As seen in Figures 8 and A1, we found that electrodes over sensorimotor cortex, when available, dominated feature importance (e.g. Subjects S07, S06, S03 and S11 in Figure 7). When motor cortex coverage is limited or unavailable, decoding is still possible because inter-region correlations are known to exist in the brain (Tam et al., 2019; Miller et al., 2007; Schalk et al., 2007) and were likely exploited by the classifier, but with limited decoding capacity. To understand the contributions of various frequency-bins to decoding, we define analogous formulas for feature importances aggregated by frequency-bin:
\[ FI_F(f) = \sum_{e \in E,\, t \in T} FI(t, f, e), \qquad NFI_F(f) = \frac{FI_F(f)}{\max_{f \in F} FI_F(f)}. \]

As seen in Figure 9, we found that a low-frequency band (< 32 Hz) contributed the most relevant spectral features, consistent with prior controlled studies (Miller et al., 2007). If ipsilateral wrist movement was being decoded, or when motor cortex electrode coverage was lacking (e.g. contralateral S02 and S05 in Figures 8 and A1), spectral feature contributions tended to be more broadly distributed across the frequency spectrum and correlated with lower decoding accuracy.
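The aggregation formulas above map directly onto array reductions over a Random Forest's importance scores; the (time, frequency, electrode) layout of the flattened features assumed here is illustrative.

```python
import numpy as np

def aggregate_feature_importance(fi, shape):
    """Aggregate feature importances by electrode and by frequency
    bin, implementing NFI_E and NFI_F as defined in the text.

    fi: flat array of importances (e.g. a fitted RandomForest's
    feature_importances_); shape: (n_time, n_freq, n_elec) layout
    of the flattened TF features (an assumed ordering).
    """
    FI = fi.reshape(shape)              # (T, F, E)
    fi_e = FI.sum(axis=(0, 1))          # FI_E(e): sum over t and f
    fi_f = FI.sum(axis=(0, 2))          # FI_F(f): sum over t and e
    nfi_e = fi_e / fi_e.max()           # normalize by best electrode
    nfi_f = fi_f / fi_f.max()           # normalize by best frequency bin
    return nfi_e, nfi_f
```

Plotting `nfi_e` at each electrode's MNI position and `nfi_f` as a heatmap over frequency bins reproduces the two views shown in Figures 8 and 9.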
5. Discussion
In summary, we have developed a highly automated and scalable approach for analyzing long-term datasets of simultaneously collected human brain and naturalistic behavior data. Our workflow robustly uncovered and annotated thousands of human upper-limb movement events in behavior videos. To detect movement events, we first discretized pose time-series for each wrist into two latent states, indicating movement or rest, and then used regular expressions to look for user-specified patterns in the latent state sequences. This semi-supervised strategy allowed us to rapidly explore movements and their associated brain responses. Importantly, our curated naturalistic dataset supported direct comparison with existing literature from controlled experiments. To demonstrate the applicability of our workflow, we analyzed the brain data associated with the annotated events from two perspectives: characterizing neural correlates of movement, and decoding naturalistic movement initiation using ECoG data. Key to the success of our applications is the availability of a large number of repeated instances of movement initiation events, all available with high temporal precision, which is an essential requirement for generating event-averaged spectrograms (Cohen, 2014). The ability to select movements by magnitude, onset velocity, and complexity (using shape metadata) allowed us to match movement statistics between naturalistic and controlled experimental data, enabling a fair comparison. Furthermore, the ability to select events without opposing wrist activity allowed us to disambiguate confounds when comparing movement decoders for opposing wrists.

Figure 8. Contralateral and ipsilateral wrist movement initiation decoder feature importance scores aggregated by electrode (NFI_E), showing spatial contributions of different brain regions. Scores are normalized by dividing by the highest electrode score for each decoder. Electrode coverage over motor cortex is highly correlated with decoder accuracy; for instance, subjects having good motor cortex coverage (S07, S06, S03 and S11) have the highest decoding performance (Figure 7). See Figure A1 for a plot with all 12 subjects.

Figure 9. Decoder feature importance scores aggregated by frequency and normalized by dividing by the score of the highest frequency bin per subject (NFI_F). Heatmaps show that the most relevant spectral features tend to come from a low-frequency band (Miller et al., 2007). When motor cortex electrode coverage is lacking (e.g. contralateral S02 and S05) or when ipsilateral wrist movement is being decoded, spectral feature contributions tend to be more broadly distributed across the frequency spectrum and correlate with lower decoding accuracy.

Our work has a number of limitations that can be improved with further development. First, our strategy of discretizing individual keypoint time-series to two latent states and then pattern-matching on latent state sequences may be challenging to extend to more complex behaviors involving coordinated movement of more keypoints. When we increased the number of latent states in the pose segmentation process, we noticed that behavioral states were harder to interpret and that the associated ECoG responses were not easily separable. The automated analysis of behavior for simple model organisms such as worms (Gupta and Gomez-Marin, 2019), zebrafish (Johnson et al., 2020), flies (Berman et al., 2016) and mice (Luxem et al., 2020; Markowitz et al., 2018; Datta, 2019) has advanced to the extent of being able to automatically extract hierarchies of coordinated behavioral sequences (or grammars) from naturalistic videos. Except for some very limited work (Summers-Stay et al., 2012; Yang et al., 2014), such progress has been elusive in human computer vision, possibly due to the sheer complexity and variability of human movements in various contexts. Though not tailored to our temporal precision requirements, future research in fine-grained human action recognition in sports (Shao et al., 2020; Piergiovanni and Ryoo, 2018), domestic (Rohrbach et al., 2012) and industrial (Kobayashi et al., 2019) contexts could eventually provide methods that enable the collection of massive datasets of finely annotated human behavior.

All of our data was acquired opportunistically, and videos were recorded from a single clinical monitoring camera. Thus, a primary drawback of the event metadata generated by our pipeline is that they are derived from pose-estimation on single-camera RGB images, implying that all pose coordinates are 2D projections and that the fidelity of pose-derived metadata is limited. However, ongoing work on 3D human pose estimation may help alleviate this limitation (et al., 2020; Hansen et al., 2019; Sarafianos et al., 2016).

We controlled false positives in the event discovery process using a combination of pose-estimation confidence and a tedious manual omit-listing process. We found the confidence estimate provided by our pose-estimation tool to perform well under conditions of good visibility, but it was sensitive to variations arising from naturalistic lighting and occlusions. One potential source of improvement could come from using pose-estimation algorithms that employ body models, such as OpenPose (Cao et al., 2017). In our assessment, DeepLabCut (Mathis and Mathis, 2020; Nath et al., 2019) offered a better speed (cost) vs. accuracy tradeoff at the scale we deployed for pose-estimation. Future work is poised to take advantage of rapid innovations in computer vision, as more tools become available and accessible. While manual creation and review of an omit-list cannot be completely avoided for compliance with human research protocols, we believe that a stereoscopic or depth-based camera system could help detect occlusions better and lead to a reduction in false positives.

Finally, two limitations arise from the opportunistic data-collection paradigm itself. First, we have limited our study to a subject's wrists because they are relatively unconstrained and can perform spontaneous naturalistic movements, compared to the rest of the subject's body. Our subjects' heads are tethered to a brain recording device that partially restricts the movement of the rest of their upper body. The study of more naturalistic, especially more active, behaviors would require wireless recording. Second, ECoG data such as ours has been obtained opportunistically from a neuro-atypical patient population undergoing long-term monitoring preceding invasive epilepsy resection surgery. We note with caution that conclusions from analyzing such data might not generalize well to the broader, neuro-typical population.
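To make the two-state discretization and pattern-matching strategy discussed above concrete, the sketch below converts a wrist-speed trace into a rest/movement state string and uses a regular expression to locate movement-initiation events. It is a simplified stand-in under stated assumptions: a fixed speed threshold replaces the discrete latent-variable model used in the paper, and the function name, state alphabet, and run-length parameters are all illustrative.

```python
import re
import numpy as np

def find_movement_initiations(speed, threshold, min_rest=5, min_move=3):
    """Discretize a wrist-speed time series into two latent states
    ('r' = rest, 'm' = movement), then pattern-match on the state string:
    a movement initiation is a sufficiently long run of rest followed
    by a sufficiently long run of movement."""
    states = ''.join('m' if s > threshold else 'r' for s in speed)
    pattern = re.compile(r'r{%d,}(m{%d,})' % (min_rest, min_move))
    # Report the frame index at which each movement run begins
    return [m.start(1) for m in pattern.finditer(states)]

# Hypothetical speed trace: rest, a movement bout starting at frame 8, rest again
speed = np.array([0.1] * 8 + [2.0] * 4 + [0.1] * 8)
events = find_movement_initiations(speed, threshold=1.0)  # -> [8]
```

Because the event pattern is an ordinary regular expression, the same machinery extends to other user-specified templates (e.g. movement without opposing-wrist activity) by matching over the concatenated state strings of both wrists.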
Accompanying our manuscript, we have publicly released our curated dataset comprising neural data and event metadata for over 40,000 instances of naturalistic human upper-limb movement events, and an equal number of rest events, across 12 subjects over about a week of clinical monitoring each. We expect our dataset to be broadly applicable to BCI research, like previously released datasets such as the BCI competitions I–IV datasets (Sajda et al., 2003; Blankertz et al., 2004, 2006; Tangermann et al., 2012) or other ECoG data libraries (Miller, 2019) that were generated through controlled experimentation. While the aforementioned datasets consist of 10s–100s of repeated instances of a behavior per subject, our dataset provides 1000s of instances per subject. It captures rich naturalistic variability across multiple axes relating to the neural activity (subject, seizure foci, day of observation, electrode placement, and recording fidelity) and the behavior (subject activity profile, medication and treatment regime; wrist movement times, movement handedness and sequencing; and other event metadata).

We are working on refinements and extensions of the preliminary investigations presented here, and believe that our dataset could serve several other lines of scientific inquiry. A thorough analysis of neural encoding of wrist movement initiation is available in our simultaneously released preprint (Peterson et al., 2020). As a follow-up to the limited prototype described here, we are currently exploring the use of neural networks to build more expressive decoders (Roy et al., 2019) for events and their associated event metadata. The abundant availability of neural data also allows us to explore representation learning to obtain interpretable task-specific neural features (Pailla et al., 2019; Shiraishi et al., 2020), and transfer learning to adapt decoders trained on one subject to another (Wu et al., 2020; Elango et al., 2017; Shenoy et al., 2007).

Our dataset is amenable to several types of modeling goals and approaches, including unsupervised latent factor modeling to extract single-trial neural dynamics (Pandarinath et al., 2018a; Ly et al., 2018; Cole and Voytek, 2019; Pandarinath et al., 2018b; Zhao and Park, 2016), dynamical modeling of the electrocorticographic spectrum (Chaudhuri et al., 2018; Beck et al., 2018; Haller et al., 2018; Brunton et al., 2016), probabilistic modeling to better understand neural data variability across trials, subjects and brain regions (Omigbodun et al., 2016; Yang et al., 2019; Abbaspourazad et al., 2018; Yang et al., 2017), generative modeling to generate synthetic brain data (Hartmann et al., 2018; Aznan et al., 2019), and modeling the non-stationarity of the brain signal over long recording time spans (Farshchian et al., 2019; Klosterman et al., 2016; Shenoy et al., 2006). We hope that our dataset will enable further research on models of neural function that incorporate naturalistic variability.

The presence of false positives in the data also motivates exploring algorithms for machine learning with noisy labels (Rolnick et al., 2017; Natarajan et al., 2013; Han et al., 2018). This paradigm has been well studied for other applications of machine learning, such as computer vision (Li et al., 2017), where amassing large datasets with noisy labels is relatively inexpensive but quality labeling is expensive to obtain. The large behavioral variability associated with our neural data could also be used to investigate optimal training set selection, i.e. what types of and how much training data would be ideal for training a decoder (Wei et al., 2015, 2014; Krause et al., 2008). Such characterizations could be used to inform the engineering of BCIs, making them significantly more robust to the variations present in real-world deployments.
Code and dataset release
Code to reproduce several key plots in this manuscript is publicly available at: https://github.com/BruntonUWBio/singh2020. Our complete curated dataset, consisting of events and their metadata and associated neural data, can also be downloaded following the instructions provided at the aforementioned URL.
Acknowledgements
We thank John So for extensive help with manual annotation of the video data. This work benefited from and was enabled by the groundwork laid by Nancy X. R. Wang towards study approval, initial clinical data procurement, preprocessing, and manual annotation, and establishing the plausibility of movement initiation prediction using a subset of this clinical data. We thank the neurosurgeons Dr. Jeffrey G. Ojemann and Dr. Andrew Ko, and the staff and consenting patients at the University of Washington Harborview Medical Center in Seattle, for supporting this research. We thank Pierre Karashchuk, Kameron D. Harris, James Wu, Nile Wilson, Ariel Rokem, Renshu Gu, David J. Caldwell, and Preston Jiang for helpful discussions and suggestions. This work was funded by NSF award (1630178) and DOD/DARPA award (FA8750-18-2-0259) to BWB and RPNR, NSF award EEC-1028725 to RPNR, the Alfred P. Sloan Foundation (BWB), and the Washington Research Foundation (BWB).
Author Contributions
SHS, RPNR, and BWB conceived of the study/analysis. SHS and SMP performed the data analysis. SHS, SMP, RPNR, and BWB interpreted the results. SHS and BWB wrote the manuscript. SHS, SMP, RPNR and BWB edited the manuscript. RPNR and BWB acquired funding for the project.
References
H. Abbaspourazad, Y. Wong, B. Pesaran, and M. M. Shanechi. Identifying multiscale hidden states to decode behavior. In , pages 3778–3781. IEEE, 2018.

A. Alasfour, P. Gabriel, X. Jiang, I. Shamie, L. Melloni, T. Thesen, P. Dugan, D. Friedman, W. Doyle, O. Devinsky, et al. Coarse behavioral context decoding. Journal of Neural Engineering, 16(1):016021, 2019.

D. Anderson and P. Perona. Toward a Science of Computational Ethology. Neuron, 84(1):18–31, October 2014.

N. K. N. Aznan, A. Atapour-Abarghouei, S. Bonner, J. D. Connolly, N. Al Moubayed, and T. P. Breckon. Simulating brain signals: Creating synthetic EEG data via neural-based generative models for improved SSVEP classification. In , pages 1–8. IEEE, 2019.

E. Batty, M. Whiteway, S. Saxena, D. Biderman, T. Abe, S. Musall, W. Gillis, J. Markowitz, A. Churchland, J. Cunningham, et al. BehaveNet: nonlinear embedding and Bayesian neural decoding of behavioral videos. In Advances in Neural Information Processing Systems, pages 15680–15691, 2019.

A. M. Beck, E. P. Stephen, and P. L. Purdon. State space oscillator models for neural data analysis. In , pages 4740–4743. IEEE, 2018.

G. J. Berman, W. Bialek, and J. W. Shaevitz. Predictability and hierarchy in Drosophila behavior. Proceedings of the National Academy of Sciences, 113(42):11943–11948, 2016.

G. Berman. Measuring behavior across scales. BMC Biology, 16(1), December 2018.

B. Blankertz, K.-R. Müller, G. Curio, T. M. Vaughan, G. Schalk, J. R. Wolpaw, A. Schlögl, C. Neuper, G. Pfurtscheller, T. Hinterberger, et al. The BCI competition 2003: progress and perspectives in detection and discrimination of EEG single trials. IEEE Transactions on Biomedical Engineering, 51(6):1044–1051, 2004.

B. Blankertz, K.-R. Müller, D. J. Krusienski, G. Schalk, J. R. Wolpaw, A. Schlögl, G. Pfurtscheller, J. R. Millán, M. Schröder, and N. Birbaumer. The BCI competition III: Validating alternative approaches to actual BCI problems. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 14(2):153–159, 2006.

L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3D human pose annotations. In International Conference on Computer Vision, September 2009.

L. Breiman. Random forests. Machine Learning, 45(1):5–32, October 2001.

B. W. Brunton, L. A. Johnson, J. G. Ojemann, and J. N. Kutz. Extracting spatial–temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition. Journal of Neuroscience Methods, 258:1–15, 2016.

Z. Cao, T. Simon, S.-E. Wei, and Y. Sheikh. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7291–7299, 2017.

C. Chambers, N. Seethapathi, R. Saluja, H. Loeb, S. Pierce, D. Bogen, L. Prosser, M. J. Johnson, and K. P. Kording. Computer vision to automatically assess infant neuromotor risk. BioRxiv, page 756262, 2019.
Cerebral Cortex, 28(10):3610–3622, 2018.

M. Cohen. Analyzing Neural Time Series Data: Theory and Practice. MIT Press, January 2014.

S. Cole and B. Voytek. Cycle-by-cycle analysis of neural oscillations. Journal of Neurophysiology, 122(2):849–861, 2019.

S. R. Datta. Q&A: Understanding the composition of behavior. BMC Biology, 17(1):44, 2019.

V. Elango, A. N. Patel, K. J. Miller, and V. Gilja. Sequence transfer learning for neural decoding. BioRxiv, page 210732, 2017.

A. Farshchian, J. A. Gallego, J. P. Cohen, Y. Bengio, L. E. Miller, and S. A. Solla. Adversarial domain adaptation for stable brain-machine interfaces. In . OpenReview.net, 2019.

D. Fu, W. Crichton, J. Hong, X. Yao, H. Zhang, A. Truong, A. Narayan, M. Agrawala, C. Ré, and K. Fatahalian. Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels. arXiv:1910.02993 [cs], October 2019.

P. Gabriel, K. Chen, A. Alasfour, T. Pailla, W. Doyle, O. Devinsky, D. Friedman, P. Dugan, L. Melloni, T. Thesen, D. Gonda, S. Sattar, S. Wang, and V. Gilja. Neural Correlates of Unstructured Motor Behaviors. Journal of Neural Engineering, 16(6):066026, October 2019.

S. Ghorbani, K. Mahdaviani, A. Thaler, K. Kording, D. J. Cook, G. Blohm, and N. F. Troje. MoVi: A large multipurpose motion and video dataset. arXiv preprint arXiv:2003.01888, 2020.

S. Gupta and A. Gomez-Marin. A context-free grammar for Caenorhabditis elegans behavior. BioRxiv, page 708891, 2019.

M. Haller, T. Donoghue, E. Peterson, P. Varma, P. Sebastian, R. Gao, T. Noto, R. T. Knight, A. Shestyuk, and B. Voytek. Parameterizing neural power spectra. BioRxiv, page 299859, 2018.

B. Han, Q. Yao, X. Yu, G. Niu, M. Xu, W. Hu, I. Tsang, and M. Sugiyama. Co-teaching: Robust training of deep neural networks with extremely noisy labels. In Advances in Neural Information Processing Systems, pages 8527–8537, 2018.

L. Hansen, M. Siebert, J. Diesel, and M. P. Heinrich. Fusing information from multiple 2D depth cameras for 3D human pose estimation in the operating room. International Journal of Computer Assisted Radiology and Surgery, 14(11):1871–1879, 2019.

K. G. Hartmann, R. T. Schirrmeister, and T. Ball. EEG-GAN: Generative adversarial networks for electroencephalographic (EEG) brain signals. arXiv preprint arXiv:1806.01875, 2018.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics, 2009.

A. Huk, K. Bonnen, and B. He. Beyond Trial-Based Paradigms: Continuous Behavior, Ongoing Neural Activity, and Natural Stimuli. The Journal of Neuroscience, pages 1920–17, July 2018.

M. J. Johnson and A. S. Willsky. Bayesian Nonparametric Hidden Semi-Markov Models. Journal of Machine Learning Research, 14:673–701, February 2013.

M. Johnson, D. Duvenaud, A. Wiltschko, R. Adams, and S. Datta. Composing graphical models with neural networks for structured representations and fast inference. In Advances in Neural Information Processing Systems 29, pages 2946–2954. Curran Associates, Inc., 2016.

R. E. Johnson, S. Linderman, T. Panier, C. L. Wee, E. Song, K. J. Herrera, A. Miller, and F. Engert. Probabilistic models of larval zebrafish behavior reveal structure on many scales. Current Biology, 30(1):70–82, 2020.

P. Karashchuk, K. L. Rupp, E. S. Dickinson, E. Sanders, E. Azim, B. W. Brunton, and J. C. Tuthill. Anipose: a toolkit for robust markerless 3D pose estimation. BioRxiv, 2020.

S. L. Klosterman, J. R. Estepp, J. W. Monnin, and J. C. Christensen. Day-to-day variability in hybrid, passive brain-computer interfaces: Comparing two studies assessing cognitive workload. In , pages 1584–1590. IEEE, 2016.

T. Kobayashi, Y. Aoki, S. Shimizu, K. Kusano, and S. Okumura. Fine-grained action recognition in assembly work scenes by drawing attention to the hands. In , pages 440–446. IEEE, 2019.

A. Krause, H. B. McMahan, C. Guestrin, and A. Gupta. Robust submodular observation selection. Journal of Machine Learning Research, 9(Dec):2761–2801, 2008.

Y. Li, J. Yang, Y. Song, L. Cao, J. Luo, and L.-J. Li. Learning from noisy labels with distillation. In Proceedings of the IEEE International Conference on Computer Vision, pages 1910–1918, 2017.

K. Luxem, F. Fuhrmann, J. Kuersch, S. Remy, and P. Bauer. Identifying behavioral structure from deep variational embeddings of animal motion. BioRxiv, 2020.
Pages 110–114. IEEE, 2018.

J. Markowitz, W. Gillis, C. Beron, S. Neufeld, K. Robertson, N. Bhagat, R. Peterson, E. Peterson, M. Hyun, S. Linderman, B. Sabatini, and S. Datta. The Striatum Organizes 3D Behavior via Moment-to-Moment Action Selection. Cell, 174(1):44–58.e17, June 2018.

M. Mathis and A. Mathis. Deep learning tools for the measurement of animal behavior in neuroscience. Current Opinion in Neurobiology, 60:1–11, 2020.

A. Mathis, P. Mamidanna, K. Cury, T. Abe, V. Murthy, M. Mathis, and M. Bethge. DeepLabCut: Markerless pose estimation of user-defined body parts with deep learning. Technical report, Nature Publishing Group, 2018.

M. A. McDowell, C. D. Fryar, and C. L. Ogden. Anthropometric reference data for children and adults: United States, 1988–1994. Vital and Health Statistics. Series 11, Data from the National Health Survey, (249):1–68, 2009.

K. Miller, E. Leuthardt, G. Schalk, R. Rao, N. Anderson, D. Moran, J. Miller, and J. Ojemann. Spectral Changes in Cortical Surface Potentials during Motor Movement. Journal of Neuroscience, 27(9):2424–2432, February 2007.

K. J. Miller. A library of human electrocorticographic data and analyses. Nature Human Behaviour, 3(11):1225–1235, 2019.

K. P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.

J. Nassar, S. Linderman, M. Bugallo, and I. M. Park. Tree-structured recurrent switching linear dynamical systems for multi-scale modeling. In International Conference on Learning Representations, 2019.

S. A. Nastase, A. Goldstein, and U. Hasson. Keep it real: rethinking the primacy of experimental control in cognitive neuroscience. PsyArXiv, pages 1–12, January 2020.

N. Natarajan, I. S. Dhillon, P. K. Ravikumar, and A. Tewari. Learning with noisy labels. In Advances in Neural Information Processing Systems, pages 1196–1204, 2013.

T. Nath, A. Mathis, A. Chen, A. Patel, M. Bethge, and M. Mathis. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nature Protocols, 14:2152–2176, 2019.

A. Omigbodun, W. K. Doyle, O. Devinsky, D. Friedman, T. Thesen, and V. Gilja. Hidden-Markov factor analysis as a spatiotemporal model for electrocorticography. In , pages 1632–1635. IEEE, 2016.

R. Oostenveld, P. Fries, E. Maris, and J.-M. Schoffelen. FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Computational Intelligence and Neuroscience, 2011:156869, 2011.

T. Pailla, K. J. Miller, and V. Gilja. Autoencoders for learning template spectrograms in electrocorticographic signals. Journal of Neural Engineering, 16(1):016025, 2019.

C. Pandarinath, K. C. Ames, A. A. Russo, A. Farshchian, L. E. Miller, E. L. Dyer, and J. C. Kao. Latent factors and dynamics in motor cortex and their application to brain–machine interfaces. Journal of Neuroscience, 38(44):9390–9401, 2018.

C. Pandarinath, D. J. O'Shea, J. Collins, R. Jozefowicz, S. D. Stavisky, J. C. Kao, E. M. Trautmann, M. T. Kaufman, S. I. Ryu, L. R. Hochberg, et al. Inferring single-trial neural population dynamics using sequential auto-encoders. Nature Methods, page 1, 2018.

J. Parvizi and S. Kastner. Human intracranial EEG: promises and limitations. Nature Neuroscience, 21(4):474, 2018.

T. D. Pereira, D. E. Aldarondo, L. Willmore, M. Kislin, S. S.-H. Wang, M. Murthy, and J. W. Shaevitz. Fast animal pose estimation using deep neural networks. Nature Methods, 16(1):117, 2019.

S. M. Peterson, S. H. Singh, N. X. Wang, R. P. Rao, and B. W. Brunton. Behavioral and neural variability of naturalistic arm movements. BioRxiv, 2020.

A. Piergiovanni and M. S. Ryoo. Fine-grained activity recognition in baseball videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1740–1748, 2018.

S. Ramasamy Ramamurthy and N. Roy. Recent trends in machine learning for human activity recognition - A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4):e1254, 2018.

M. Rohrbach, S. Amin, M. Andriluka, and B. Schiele. A database for fine grained activity detection of cooking activities. In , pages 1194–1201. IEEE, 2012.

D. Rolnick, A. Veit, S. Belongie, and N. Shavit. Deep learning is robust to massive label noise. arXiv preprint arXiv:1705.10694, 2017.
Journal of Neural Engineering, 16(5):051001, 2019.

P. Sajda, A. Gerson, K.-R. Müller, B. Blankertz, and L. Parra. A data analysis competition to evaluate machine learning algorithms for use in brain-computer interfaces. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 11(2):184–185, 2003.

N. Sarafianos, B. Boteanu, B. Ionescu, and I. A. Kakadiaris. 3D human pose estimation: A review of the literature and analysis of covariates. Computer Vision and Image Understanding, 152:1–20, 2016.

R. W. Schafer. What is a Savitzky-Golay filter? IEEE Signal Processing Magazine, 28(4):111–117, 2011.

G. Schalk, J. Kubanek, K. Miller, N. Anderson, E. Leuthardt, J. Ojemann, D. Limbrick, D. Moran, L. Gerhardt, and J. Wolpaw. Decoding two-dimensional movement trajectories using electrocorticographic signals in humans. Journal of Neural Engineering, 4(3):264, 2007.

N. Seethapathi, S. Wang, R. Saluja, G. Blohm, and K. P. Kording. Movement science needs different pose tracking algorithms. arXiv preprint arXiv:1907.10226, 2019.

M. M. Shanechi. Brain–machine interfaces. In Dynamic Neuroscience, pages 197–218. Springer, 2018.

M. M. Shanechi. Brain–machine interfaces from motor to mood. Nature Neuroscience, 22(10):1554–1564, 2019.

D. Shao, Y. Zhao, B. Dai, and D. Lin. FineGym: A hierarchical video dataset for fine-grained action understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2616–2625, 2020.

K. Shenoy and C. Chestek. Neural prosthetics. Scholarpedia, 7(3):11854, 2012.

P. Shenoy, M. Krauledat, B. Blankertz, R. P. Rao, and K.-R. Müller. Towards adaptive classification for BCI. Journal of Neural Engineering, 3(1):R13, 2006.

P. Shenoy, K. J. Miller, J. G. Ojemann, and R. P. Rao. Generalized features for electrocorticographic BCIs. IEEE Transactions on Biomedical Engineering, 55(1):273–280, 2007.

Y. Shiraishi, Y. Kawahara, O. Yamashita, R. Fukuma, S. Yamamoto, Y. Saitoh, H. Kishima, and T. Yanagisawa. Neural decoding of electrocorticographic signals using dynamic mode decomposition. Journal of Neural Engineering, 2020.

E. Smalley. The business of brain-computer interfaces. Nature Biotechnology, 37(9):978, 2019.

A. Stolk, S. Griffin, R. van der Meij, C. Dewar, I. Saez, J. J. Lin, G. Piantoni, J.-M. Schoffelen, R. T. Knight, and R. Oostenveld. Integrated analysis of anatomical and electrophysiological human intracranial data. Nature Protocols, 13(7):1699–1723, 2018.

D. Summers-Stay, C. L. Teo, Y. Yang, C. Fermüller, and Y. Aloimonos. Using a minimal action grammar for activity understanding in the real world. Pages 4104–4111, 2012.

W. Tam, T. Wu, Q. Zhao, E. Keefer, and Z. Yang. Human motor decoding from neural signals: a review. BMC Biomedical Engineering, 1(1):22, 2019.

M. Tangermann, K.-R. Müller, A. Aertsen, N. Birbaumer, C. Braun, C. Brunner, R. Leeb, C. Mehring, K. J. Miller, G. Mueller-Putz, et al. Review of the BCI competition IV. Frontiers in Neuroscience, 6:55, 2012.

K. Volkova, M. A. Lebedev, A. Kaplan, and A. Ossadtchi. Decoding movement from electrocorticographic activity: A review. Frontiers in Neuroinformatics, 13, 2019.

N. Wang, J. Olson, J. Ojemann, R. Rao, and B. Brunton. Unsupervised Decoding of Long-Term, Naturalistic Human Neural Recordings with Automated Video and Audio Annotations. Frontiers in Human Neuroscience, 10, April 2016.

N. Wang, A. Farhadi, R. Rao, and B. Brunton. AJILE movement prediction: Multimodal deep learning for natural human neural recordings and video. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.

D. J. Warren, S. Kellis, J. G. Nieveen, S. M. Wendelken, H. Dantas, T. S. Davis, D. T. Hutchinson, R. A. Normann, G. A. Clark, and V. J. Mathews. Recording and decoding for neural prostheses. Proceedings of the IEEE, 104(2):374–391, 2016.

K. Wei, Y. Liu, K. Kirchhoff, and J. Bilmes. Unsupervised submodular subset selection for speech data. In , pages 4107–4111. IEEE, 2014.

K. Wei, R. Iyer, and J. Bilmes. Submodularity in data subset selection and active learning. In International Conference on Machine Learning, pages 1954–1963, 2015.

A. Wiltschko, M. Johnson, G. Iurilli, R. Peterson, J. Katon, S. Pashkovski, V. Abraira, R. Adams, and S. Datta. Mapping Sub-Second Structure in Mouse Behavior. Neuron, 88(6):1121–1135, December 2015.

D. Wu, Y. Xu, and B. Lu. Transfer learning for EEG-based brain-computer interfaces: A review of progresses since 2016. arXiv preprint arXiv:2004.06286, 2020.
Advances in Cognitive Systems, pages 67–86, 2014.

Y. Yang, E. F. Chang, and M. M. Shanechi. Dynamic tracking of non-stationarity in human ECoG activity. In , pages 1660–1663. IEEE, 2017.

Y. Yang, O. G. Sani, E. F. Chang, and M. M. Shanechi. Dynamic network modeling and dimensionality reduction for human ECoG activity. Journal of Neural Engineering, 16(5):056014, 2019.

H. Yuan and B. He. Brain–computer interfaces using sensorimotor rhythms: Current state and future perspectives. IEEE Transactions on Biomedical Engineering, 61:1425–1435, 2014.

Y. Zhao and I. M. Park. Interpretable nonlinear dynamic modeling of neural trajectories. In Advances in Neural Information Processing Systems, pages 3333–3341, 2016.
Appendix
Figure A1.
Contralateral and ipsilateral wrist movement initiation decoder normalized feature importance scores aggregated by electrode (NFI_E), showing spatial contributions of different brain regions for all 12 subjects. We see the same trend of motor cortex coverage being correlated with decoding accuracy, as was noted in Figure 8. Subjects having good motor cortex coverage (S07, S06, S03 and S11) have the highest decoding performance (Figure 7). Additionally, electrodes with high normalized feature importance tend to be more spatially localized in the case where good motor cortex coverage is available.
Figure A2.
Number of left-wrist movement initiation events discovered per day for each of 12 subjects, totaling 484 to 3338 events per subject across their entire duration of clinical observation (219 ± 104 s.d. per day). [Right] Raster plot of left-wrist movement initiation occurrences. See Figure 3 for equivalent plots for the right wrist.

[Figure panels: (a) Reach magnitude [pixels], (b) Reach duration [seconds], (c) Bimanual lead/lag [seconds], (d) Bimanual overlap, (e) Reach angle [degrees].]