Online Mobile App Usage as an Indicator of Sleep Behavior and Job Performance

Abstract

Sleep is critical to human function, mediating factors like memory, mood, energy, and alertness; therefore, it is commonly conjectured that a good night's sleep is important for job performance. However, both real-world sleep behavior and job performance are hard to measure at scale. In this work, we show that people's everyday interactions with online mobile apps can reveal insights into their job performance in real-world contexts. We present an observational study in which we objectively tracked the sleep behavior and job performance of salespeople (N = 15) and athletes (N = 19) for 18 months, using a mattress sensor and online mobile app. We first demonstrate that cumulative sleep measures are correlated with job performance metrics, showing that an hour of daily sleep loss for a week was associated with a 9.0% and 9.5% reduction in performance of salespeople and athletes, respectively. We then examine the utility of online app interaction time as a passively collectible and scalable performance indicator. We show that app interaction time is correlated with the performance of the athletes, but not the salespeople. To support that our app-based performance indicator captures meaningful variation in psychomotor function and is robust against potential confounds, we conducted a second study to evaluate the relationship between sleep behavior and app interaction time in a cohort of 274 participants. Using a generalized additive model to control for per-participant random effects, we demonstrate that participants who lost one hour of daily sleep for a week exhibited 5.0% slower app interaction times. We also find that app interaction time exhibits meaningful chronobiologically consistent correlations with sleep history, time awake, and circadian rhythms. Our findings reveal an opportunity for online app developers to generate new insights regarding cognition and productivity.


Online Mobile App Usage as an Indicator of Sleep Behavior and Job Performance

Chunjong Park∗, Morelle Arian∗, Xin Liu, Leon Sasson†, Jeffrey Kahn†, Shwetak Patel, Alex Mariakakis‡, Tim Althoff
University of Washington, Rise Science Inc.†, University of Toronto‡

ABSTRACT

Sleep is critical to human function, mediating factors like memory, mood, energy, and alertness; therefore, it is commonly conjectured that a good night’s sleep is important for job performance. However, both real-world sleep behavior and job performance are difficult to measure at scale. In this work, we demonstrate that people’s everyday interactions with online mobile apps can reveal insights into their job performance in real-world contexts. We present an observational study in which we objectively tracked the sleep behavior and job performance of salespeople (𝑁 = 15) and athletes (𝑁 = 19) for 18 months, leveraging a mattress sensor and online mobile app to conduct the largest study of this kind to date. We first demonstrate that cumulative sleep measures are significantly correlated with job performance metrics, showing that an hour of daily sleep loss for a week was associated with a 9.0% average reduction in contracts established for salespeople and a 9.5% average reduction in game grade for the athletes. We then investigate the utility of online app interaction time as a passively collectible and scalable performance indicator. We show that app interaction time is correlated with the job performance of the athletes, but not the salespeople. To support that our app-based performance indicator truly captures meaningful variation in psychomotor function as it relates to sleep and is robust against potential confounds, we conducted a second study to evaluate the relationship between sleep behavior and app interaction time in a cohort of 274 participants. Using a generalized additive model to control for per-participant random effects, we demonstrate that participants who lost one hour of daily sleep for a week exhibited average app interaction times that were 5.0% slower. We also find that app interaction time exhibits meaningful chronobiologically consistent correlations with sleep history, time awake, and circadian rhythms. The findings from this work reveal an opportunity for online app developers to generate new insights regarding cognition and productivity.

KEYWORDS

mobile app interaction, interaction time, sleep tracking, sleep behavior, job performance

∗ Both authors contributed equally to this research.

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.

WWW ’21, April 19–23, 2021, Ljubljana, Slovenia
© 2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-8312-7/21/04.
https://doi.org/10.1145/3442381.3450093

Sleep is essential to human function, affecting memory [89], energy [20], mood [17], and alertness [3]. The importance of sleep is widely accepted, yet a significant portion of the population does not get sufficient sleep at night [49], and an increasing number of people report experiencing sleep problems [12]. In recent years, sleep tracking has become more commonplace with the introduction of commercially available sleep-tracking technologies like smartphones, smartwatches, mattress sensors, and other devices [50]. Online mobile apps associated with such devices collect and manage sleep data so that users can learn about and improve upon their sleep behavior. In doing so, many people hope to feel more rested and be more productive at their workplace.

The impact of sleep on people’s psychomotor function has been widely studied, usually in controlled lab settings. For example, researchers have found that partial sleep deprivation over multiple days can affect people’s ability to perform simple tasks, such as mental math and reaction time tasks like the psychomotor vigilance test (PVT) [56, 85]. The consequences of sleep deprivation have even been found to be comparable to the cognitive and motor impairments experienced during alcohol intoxication [92]. Although prior literature suggests that poor sleep behavior can impact real-world job performance, this relationship has remained largely unquantified due to the lack of objective measures of both sleep behavior and job performance. Many careers involve a complex combination of cognitive and psychomotor tasks, so it is unclear how contrived tasks like the PVT translate to higher level performance. Furthermore, job performance assessment can require privacy-invasive methods that disrupt a person’s work.

Prior literature has leveraged technology interaction patterns as passive and scalable indicators of alertness and other aspects of psychomotor function [1, 36, 59, 66, 67, 69].
In our work, we extend this literature by investigating the relationships between smartphone app-based performance, objective sleep behavior metrics, and objective job performance metrics gathered from two concurrent studies carried out over 18 months. In our first study (STUDY 1), we recruited 34 employees from two organizations: a bankruptcy law firm consultancy (𝑁 = 15) and the National Football League (𝑁 = 19) […] is correlated with decreased job performance. We find that one hour of reduced time-in-bed daily for one week was associated with 9.0% fewer contracts established for the average salesperson and a 9.5% grade drop for the average athlete (Section 4.1).

Since job performance metrics can be difficult to capture in practice, we explore the possibility of using timed interactions with the sleep-tracking app as an unobtrusive indicator of broader psychomotor performance. We examined the amount of time participants spent interpreting the information on the app’s main screen as an instantiation of an app-based performance metric. We find that app interaction times were correlated with the athletes’ game performance (𝜌 = -0.296, 𝑝 = 0.046), but not the salespeople’s performance (𝜌 = -0.0752, 𝑝 = 0.4106) (Section 4.2).

Although STUDY 1 is larger than its predecessors, job performance can be extremely diverse and subject to many confounds that are infeasible to track (e.g., external personal issues leading to diminished performance). To corroborate the use of app-based performance as a valid indicator of psychomotor function via its relationship to sleep, we tracked the sleep behavior and app interaction times of 274 individuals (STUDY 2). We analyzed this data to determine whether app interaction time is sensitive to known sleep-related influences on psychomotor function while being robust to other individual-level effects like user-specific baselines and smartphone specifications. Across 7,200 tracked nights of sleep with more than 16,000 app interaction events, our analyses reveal that daily variations in app interaction time aligned with constructs in sleep biology, most notably circadian rhythm [3, 16, 25] and sleep inertia [3]. We find that app interaction time was negatively correlated with time-in-bed (𝜌 = -0.015, 𝑝 = 0.049), sleep history (𝜌 = -0.055, 𝑝 = 3.… × 10^-…), and sleep debt (𝜌 = -0.095, 𝑝 = 5.… × 10^-…). Furthermore, we demonstrate that participants who lost one hour of daily sleep over the previous week exhibited app interaction times that were 0.5 seconds slower (Section 5.1). In summary, our research investigates the following questions:

RQ.1 Is sleep behavior correlated with job performance? (STUDY 1, Section 4.1)
RQ.2 Is app-based performance correlated with job performance? (STUDY 1, Section 4.2)
RQ.3 Is app-based performance correlated with sleep behavior? (STUDY 2, Section 5.1)

In this section, we describe prior work on (1) sleep biology, (2) consumer sleep-tracking applications and their effects on users’ sleep behavior, (3) the relationship between sleep and performance, and (4) the use of technology to passively infer performance.

Dijk et al. [16, 25] describe sleep biology using an additive two-process model consisting of circadian rhythm, the 24-hour biological cycle that occurs in nearly all creatures, and homeostasis, the increasing pressure to sleep as one stays awake for longer periods of time. Akerstedt et al. [3] later added a third process: sleep inertia, the initial drowsiness that occurs immediately after waking up. Past studies have used two- and three-process models of sleep biology to understand the effects of sleep schedules on mood [35] and athletic performance [83]. Genetic predispositions and chronotyping (i.e., “morning person” vs. a “night owl”) have also been shown to affect sleep behavior [4, 81]. Matchock et al. [62] and Althoff et al. [7] find evidence of significant interaction between circadian rhythms and chronotyping on reaction times. A recent study suggests that female sleep duration is not strongly dependent on menstrual cycles [70].

To characterize the importance of sleep, many studies require that subjects adhere to strict sleep schedules that range from a full night’s rest to total sleep deprivation [71]; however, researchers have noted that natural sleep is more commonly characterized by partial sleep deprivation over multiple days, also known as chronic sleep restriction. Regardless of the specifics of one’s sleep schedule, researchers have noted the accumulation of sleep debt as an important metric of sleep behavior [26, 27, 41, 85]. Calculating sleep debt requires understanding an individual’s sleep need, the optimal amount of daily sleep an individual requires. Sleep need is often measured through a controlled study where a participant is subjected to extended time-in-bed over many days; under these conditions, sleep length typically decays exponentially over time and approaches an asymptote that represents the individual’s sleep need [48]. Since sleep need is often difficult to measure in uncontrolled settings, Kitamura et al. [47] propose a method of estimating sleep debt based on one’s history of time-in-bed. We utilize Kitamura et al.’s calculation of sleep debt as a cumulative sleep metric in our analyses. Furthermore, we leverage Akerstedt et al.’s three-process model to understand the correlations between sleep debt, job performance, and our app-based performance metric.

Traditional polysomnography studies utilize expensive sensors like EEGs and EMGs to get fine-grained information about how a person sleeps [42]. Given the growing desire for health-related self-tracking technologies, sleep tracking has become more commonplace. Ko et al. [50] provide a review of consumer sleep sensing technologies. Their review covers sleep-sensing form factors like smartphones [19, 65], smartwatches, wristbands, mattress sensors, and wireless radios [74]. Overall, these technologies are able to gather sleep metrics ranging from sleep duration to disturbance frequency. For instance, Min et al. [65] propose a mobile app that processes seven different sensor streams (e.g., motion, sound, light) to classify a person’s sleep state and sleep quality, while Rahman et al. [74] demonstrate that coarse body movements and subtle chest movements from breathing and heartbeats can be detected by measuring the reflections of high-frequency wireless signals.

Beyond exploring new ways of extracting sleep data, another line of research has explored how sleep data should be presented to users and how recommendations should be generated to improve sleep behavior. Bauer et al. [13] show that a recommendation-based peripheral display can serve as a low-effort, yet effective, method for improving awareness of healthy sleep behavior. Daskalova et al. [23] address the creation of personalized recommendations through guided self-experimentation. Their mobile app, SleepCoacher, tracks sleep behavior metrics from the accelerometer and microphone to generate data-driven recommendations; as users engage with these recommendations, SleepCoacher is able to measure whether the intervention had its intended effect. Daskalova et al. have also explored cohort-based sleep tracking and recommendations [22].

To the best of our knowledge, the aforementioned body of literature has not explored the opportunity of using interactions with a sleep-tracking system as an additional source of information. The act of examining sleep data on a smartphone requires users to exert cognitive load, which itself is tied to sleep behavior. We introduce the notion of app-based performance to investigate whether interactions with a sleep-tracking system can provide insight into sleep behavior or job performance.

In this work, we contextualize large-scale sleep data through job-based and mobile app-based performance measurements [5]. One of the most common tests that have been used for measuring psychomotor function is the psychomotor vigilance test (PVT) [14, 28, 44], during which a person is asked to respond to a visual signal by pressing a button. Researchers have employed a variety of other contrived tasks to measure cognitive and motor performance in relation to sleep. Pilcher and Huffcutt [71] provide a meta-analysis of 19 research studies that examine task performance and mood as a function of sleep restriction. The cognitive tasks Pilcher and Huffcutt list in their review include logical reasoning [15], mental math [11], visual search tasks [11], and word memory tasks [63]. The motor tasks they reference include exercise [61, 68], endurance tasks [60], and muscle strength tests [86].

The aforementioned tests have been used to examine the effects of sleep quality on various performance dimensions. Rajdev et al. [75] use the PVT to validate a mathematical model of psychomotor performance based on sleep debt. Ramakrishnan et al. [76] also use the PVT to validate their own phenotype-specific group-average model of psychomotor performance. Lo et al. [56] use a battery of seven cognitive tasks to find that partial sleep deprivation impairs a wide range of cognitive functions, subjective alertness, and mood. Lastly, Killgore et al. [45, 46] study the effects of total sleep deprivation on measures of emotional intelligence, constructive thinking, and decision-making during a gambling task.

Watson [91] provides a literature review on the interaction between sleep and athletic performance, drawing closer to our research questions on sleep and job performance. In one of the most closely related studies to our own STUDY 1, Mah et al. [57] examined the effects of sleep on collegiate basketball players. Their study entailed having athletes maintain their typical sleep schedule, after which they underwent a period of sleep extension with a minimum goal of 10 hours in bed each night. The athletes were evaluated using the PVT, subjective scales of sleepiness, and performance metrics specific to basketball practices (e.g., sprint time, shooting percentage). Furthermore, the athletes were asked to rate their own performance during practices and games. Although research like that of Mah et al. strives towards our goal of measuring high-level job performance rather than low-level task performance, their work falls short of capturing an objective measurement of in-game performance. In our work, we build on this literature by examining the correlation between sleep behavior and natural, widely accepted job performance metrics collected from the workplaces of athletes and salespeople (STUDY 1). Our study represents the largest effort to study this relationship to date, spanning multiple career categories and nearly 300 nights of data in a study population that more than doubles prior work [57].

Figure 1: The home screen of Rise Science’s sleep-tracking app shows data about the user’s most recent night of sleep.

Technology interaction patterns have been used as an indicator for understanding different aspects of performance. Regarding smartphones, indicators like app usage and app-specific productivity have been used to estimate alertness [1, 36, 59, 66, 67, 69]. For example, Murnane et al. [66] demonstrate that app usage patterns vary for individuals with different chronotypes and rhythms of alertness. Oulasvirta et al. [67] correlate frequent, short bursts of smartphone interaction (i.e., checking a notification) with inattentiveness. Other researchers have leveraged technology interaction to estimate higher level constructs like stress [32], mood [35, 64], academic performance [90], inebriation [10, 58], and accident risk [6].

Even with just the timing between two interaction events, researchers have been able to assess aspects of a person’s cognition. Vizer et al. [87] analyze variation in computer keystroke rate to infer increased stress levels. Althoff et al. [7] track users’ typing and click speed on a web search engine as a measure of psychomotor function. Their work shows that keystroke time and click time vary based on sleep duration, circadian rhythms, and the homeostatic process. Inspired by this prior work, we measure app interaction time as a potential non-intrusive indicator of performance (STUDY 1) and sleep (STUDY 2). Concretely, we demonstrate that this indicator correlates with athletic job performance and meaningfully reflects psychomotor performance variation due to biological functions (i.e., circadian rhythm and sleep inertia).

Both of our observational studies, STUDY 1 and STUDY 2, followed the same protocol. In this section, we first describe the technology that participants used to track and monitor their sleep behavior. We then describe the metrics that we use to quantify sleep behavior, job performance, and app-based performance. We conclude by detailing the procedures that were used to clean the dataset in preparation for analysis.


Participants in both STUDY 1 and STUDY 2 were recruited as existing users of Rise Science’s mobile app. We enrolled participants via targeted recruitment (salespeople and athletes) as well as broader recruitment calls, but participants from all sources were onboarded in a similar fashion. Each participant received a kit consisting of an Emfit QS and a sleep-tracking mobile app as shown in Figure 1. The Emfit QS is a highly sensitive pressure sensor that lies underneath the user’s mattress (or their preferred side of the mattress when the bed is shared). The sensor uses ballistocardiography to track heart rate, breathing rate, and movement. In past studies, the Emfit QS has been validated against a standard clinical heart rate monitor and polysomnography equipment [37, 51, 78]. Within the sleep-tracking mobile app, participants can access and visualize their own sleep data, view sleep session summaries, create sleep plans, and learn about the importance of sleep.

The data collection period started in May 2017 and ended in December 2018, spanning 592 days. Recruitment happened throughout that period, and participants joined and left the study at their own discretion. In STUDY 1 (Section 4), participants from the bankruptcy law firm consultancy were enrolled for 225 days, and participants from the NFL teams were enrolled for 450–520 days. In STUDY 2 (Section 5), participants were enrolled for 80–580 days. Demographic data like age and gender were not collected to minimize intrusion and maintain privacy. However, we know that most NFL players are between 23–27 years old [33], and a public database estimated that 71% of the bankruptcy law firm’s employees are millennials. In STUDY 2, we expect that most of the participants were under the age of 45 since self-tracking requires active engagement with sleep technology, which is more common in younger demographics [9].

Special care was taken to avoid coercion during recruitment. To protect participants’ privacy, employers were never told who enrolled in the study and were only given aggregated results after the study was done. Participants did not receive explicit instructions from the research team and were free to follow whatever sleep schedule they chose. Participants were also free to use the Emfit QS and sleep-tracking app at will; if they had to travel while participating in the study, they could choose to either bring the Emfit QS with them or leave it behind. The mobile app sent participants notifications, reminders, and recommendations for improving their sleep (e.g., reducing caffeine intake, dimming lights); participants were free to disable these features at any time. Our retrospective data analysis was conducted in accordance with the Institutional Review Board at the University of Washington.

Footnotes: Emfit QS: https://qs.emfit.com/. Sleep-tracking app: https://play.google.com/store/apps/details?id=com.risesci.risesciapp and https://apps.apple.com/us/app/rise-science/id1107659850?app=itunes&ign-mpt=uo%3D4. The firm’s name is removed to preserve their anonymity.

The Emfit QS reports the following metrics to describe a single night’s rest: bedtime, wake time, sleep midpoint, time-in-bed, and total sleep duration. Time-in-bed measures how long a person is in their bed, thus only requiring accurate presence detection. Total sleep duration, on the other hand, estimates how long a person is actually asleep in their bed, thus requiring both accurate presence and sleep detection. Because total sleep duration is more susceptible to sensing errors, we exclude it from the analyses reported in this paper and focus on time-in-bed measures, which can be measured with higher accuracy and validity. This choice is common in previous work as well [1, 2, 7, 8, 88]. Nevertheless, sleep duration and time-in-bed were strongly correlated in our dataset (𝜌 = …, 𝑝 < …). […]

$$\text{SleepDebt} = \sum_{i=1}^{7} -e^{-i/7} \cdot (\text{SleepNeed} - \text{TimeInBed}_i)$$

where 𝑖 is the number of days in the past. Note that the difference between sleep need and time-in-bed is weighted by a decaying exponential with a time constant of 7 days [77], indicating that recent measurements have greater importance. Whenever a participant skipped a day of tracking, we impute the missing time-in-bed value using their average time-in-bed over the past week. Significant imputation happened in only 12.7% of the weeks (see Section 3.6 for details). Sleep need is typically estimated in a controlled laboratory study, making it challenging to estimate sleep debt in the wild. Therefore, we estimate sleep need using the approach proposed by Kitamura et al. [47]. Their approach involves using long nights of sleep to predict the difference between sleep need and habitual sleep (i.e., the average time-in-bed over two weeks) for a minimum of four nights.

We also introduce a simplified sleep history metric that avoids the notion of sleep need but still captures an aggregate measure of sleep behavior:

$$\text{SleepHistory} = \frac{1}{\sum_{n=1}^{7} e^{-n/7}} \sum_{i=1}^{7} e^{-i/7} \cdot \text{TimeInBed}_i$$

The calculation of sleep history is normalized such that weights sum to one, making the metric more interpretable as a weighted average of time-in-bed over the past week.
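As a rough illustration of the two cumulative metrics, the sketch below computes sleep debt and sleep history from a list of nightly time-in-bed values. It assumes a seven-night window in which index 𝑖 = 1 is the most recent night and a decay constant of 7 days. The function names, the use of `None` for untracked nights, and the simplification of imputing a missing night with the mean of the tracked nights in the same window are illustrative assumptions, not the code used in the study.

```python
import math

TAU_DAYS = 7.0   # decay time constant stated in the text
WINDOW_DAYS = 7  # assumed window: the past week (i = 1..7)


def impute_missing(time_in_bed):
    """Replace None entries with the mean of the tracked nights (a simplification)."""
    tracked = [t for t in time_in_bed if t is not None]
    if not tracked:
        raise ValueError("no tracked nights in the window")
    mean = sum(tracked) / len(tracked)
    return [mean if t is None else t for t in time_in_bed]


def sleep_debt(time_in_bed, sleep_need):
    """Sum of -exp(-i/tau) * (sleep_need - time_in_bed_i); element 0 is i = 1."""
    hours = impute_missing(time_in_bed[:WINDOW_DAYS])
    return sum(
        -math.exp(-i / TAU_DAYS) * (sleep_need - t)
        for i, t in enumerate(hours, start=1)
    )


def sleep_history(time_in_bed):
    """Exponentially weighted average of time-in-bed; the weights sum to one."""
    hours = impute_missing(time_in_bed[:WINDOW_DAYS])
    weights = [math.exp(-i / TAU_DAYS) for i in range(1, len(hours) + 1)]
    return sum(w * t for w, t in zip(weights, hours)) / sum(weights)
```

Under this sign convention, a week of time-in-bed equal to the sleep need gives a sleep debt of zero, and sleeping less than the sleep need makes the value negative, so larger values of either metric correspond to more rest.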

For STUDY 1, we were able to utilize organizational partnerships to gather the job-specific performance metrics described below:

The salespeople who participated in our study work at a bankruptcy law firm consultancy. Their job entails fielding phone calls from potential clients in need of bankruptcy relief and referring those callers to an attorney. The employees collect a fee upon successfully hiring a client, which is the company’s primary revenue source. Employees in this company are evaluated on a variety of metrics related to that revenue stream, such as the amount they collect in fees. However, the distribution of fees is highly variable ($250–$1750) and primarily dependent upon the clients rather than the employees themselves. Therefore, we focus on the number of hires per day the salespeople were able to establish as their job performance metric. This metric follows a right-sided normal distribution [55] with a mean of 3.80 and a standard deviation of 3.27 hires per day. Although work hours were generally consistent across the company, we normalized the number of hires a salesperson made by the number of hours they worked that day to account for whatever variance remained.

| Category | Metric | Description |
|---|---|---|
| Sleep | Bedtime | Time at which the user got into their bed |
| Sleep | Wake time | Time at which the user got out of their bed |
| Sleep | Midpoint | Midpoint between start and end time |
| Sleep | Time-in-bed | The total time the user spent in bed during a single day including nighttime sleep and naps, regardless of whether they were sleeping |
| Sleep | Sleep debt | Weighted average of difference between sleep need and time-in-bed |
| Sleep | Sleep history | Weighted average of time-in-bed |
| Job Performance | Number of hires (salespeople) | Number of contracts made after consulting, normalized by the number of hours they work |
| Job Performance | Game grade (athletes) | Score of a player’s game performance out of 100 assigned through three independent experts |
| App Usage | Interaction time | Time between opening home screen of app to another screen by user’s touch input |

Table 1: A summary of the metrics we collect in our dataset through three data streams: (1) sleep metrics through the Emfit QS, (2) job performance through the participants’ employers, and (3) app usage through a sleep-tracking mobile app.

The athletes who participated in our study play in a professional American football league in the United States. We gather job performance metrics for the athletes’ performance during weekly games using Pro Football Focus (PFF). PFF evaluates athletes using the following procedure [72]: two experts score every play the athlete is involved in, a third expert resolves disagreements between those experts, an external group of ex-players and coaches verifies the scores, and then the scores are summed together and normalized to a grade between 0–100. Although PFF is not purely quantitative, the experts can account for in-game context that is lost by purely statistical methods (e.g., injuries, matchups). For this reason, PFF has been used in the past literature for assessing performance in football [18, 29, 73].

In American football, each player has their own unique skill set according to their position; for example, quarterbacks are typically known for their throwing ability and wide receivers are known for their speed and catching ability. The notion of positional specialization makes it difficult to compare athletes across positions in a purely quantitative way, especially since some skills are position-specific. Nevertheless, PFF’s method of expert observation and score normalization allows them to produce an overall game performance grade that can be used to compare athletes across positions.

Participants had to interact with a sleep-tracking app in order to examine their sleep summaries, so we leverage these interactions as a novel source of data. We take inspiration from Althoff et al. [7] by using app interaction time (the time between two touch events in the app) as an app-based performance metric. App interaction time is not meant to be a direct replacement of the PVT; instead, it serves as a more general measure of cognition by measuring the user’s ability to process information on the app’s screen. Interaction speed can be confounded by the content that is shown on the screen. To account for this confound, we restrict our analysis of app interaction time to transitions from the home screen (shown in Figure 1) to other endpoints within or outside of the app.

Sleep behavior, job performance, and app interaction metrics (Table 1) were collected from separate sources at different intervals. Therefore, post-processing was needed to join and collate them.
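To make the app interaction time metric described above more concrete, the following sketch extracts home-screen interaction times from a time-ordered event log. The event format (a list of `(timestamp, screen)` tuples), the screen names, and the function name are hypothetical; the app's actual logging schema is not described here.

```python
from datetime import datetime


def home_screen_interaction_times(events, home="home"):
    """Return seconds spent on the home screen before each transition away.

    events: list of (timestamp, screen_name) tuples sorted by timestamp.
    Only transitions that start on the home screen are kept, which mirrors
    the restriction to home-screen transitions described in the text.
    """
    durations = []
    for (t0, screen0), (t1, screen1) in zip(events, events[1:]):
        if screen0 == home and screen1 != home:
            durations.append((t1 - t0).total_seconds())
    return durations


# Hypothetical log: two visits to the home screen
log = [
    (datetime(2018, 1, 1, 8, 0, 0), "home"),
    (datetime(2018, 1, 1, 8, 0, 3), "sleep_summary"),
    (datetime(2018, 1, 1, 8, 0, 10), "home"),
    (datetime(2018, 1, 1, 8, 0, 12, 500000), "exit"),
]
interaction_times = home_screen_interaction_times(log)
```

Restricting the measurement to a single screen keeps the displayed content roughly comparable across events, so differences in timing are less likely to reflect differences in what the user was reading.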

We followed best practices in preparing mobile app data for analysis [40]. The calculation of time-in-bed included naps, which were either automatically annotated if the user's bedtime or wake time fell in the afternoon (12:00–18:00) or manually annotated by the user. Naps appeared in 9.3% of the nightly sleep metrics (62% automatically tagged vs. 38% manually annotated), contributing an additional 1.22 hours to time-in-bed on average. Sleep events in which participants spent more than 16 hours in bed in a single session were attributed to faulty sensing and removed from the dataset. The remaining nights, along with imputed averages for missing values, were used for calculating sleep debt and sleep history. A full week of sleep data was available for calculating 46.9% of the cumulative sleep metrics, meaning that no imputation was needed for them; three or more nights were missing in only 12.7% of the cumulative sleep metrics. When cumulative sleep metrics were calculated without imputation, the standard deviation of the times within the same week was only 1 hour and 10 minutes; this shows that variance within a week was limited, justifying the use of a short-term average. For the analyses related to app-based performance, interaction events shorter than 0.45 seconds (2.5th percentile) were excluded since these were likely accidental or automatically generated by the app itself; events longer than 54.83 seconds (97.5th percentile) were excluded since they likely indicated that the user was engaging in another activity.

Job performance data for the salespeople was collected on a daily basis. Therefore, every night of sleep that a salesperson tracked with their Emfit QS was collated with the job performance metric from the next day. Aligning the data streams for the athletes was more difficult since they had games on a weekly basis. The athletes also had to travel to games away from their home stadium, leaving larger gaps in their sleep-tracking data. To accommodate these issues, we aligned the weekly PFF grades with the sleep behavior metrics from the most recent tracked night of sleep within the two nights before the relevant game day; if no nights were tracked in that span, the game grade from that week was filtered out.

WWW ’21, April 19–23, 2021, Ljubljana, Slovenia. Park and Arian, et al.

STUDY 1 Statistics                                                              Salespeople   Athletes
Number of participants                                                          15            19
Total unique days with both sleep-tracking and job performance measurements     118           171
Total unique days with both app interaction and job performance measurements    122           46
Total nights of sleep tracked with app-based performance measure                234           418
Total nights of sleep tracked                                                   834           2,687
Total number of transitions between screens                                     679           909
Total number of times app was opened                                            425           691
Nights of sleep tracked per user (avg ± std)                                    46.33 ± …     … ± …
… (avg ± std)                                                                   7.283 ± …     … ± …
… (avg ± std)                                                                   28.25 ± …     … ± …

Table 2: Summary statistics for our dataset in STUDY 1 after the filtering described in Section 3.6.

                                      Raw Metrics                                            Per-Person Z-Normalized Metrics
Job Performance Metric                Time-in-Bed        Sleep Debt         Sleep History      Time-in-Bed        Sleep Debt         Sleep History
NFL Player Game Grades (N = 19)       -0.024 (p=0.751)   -0.095 (p=0.218)   -0.029 (p=0.711)   0.086 (p=0.263)    0.166* (p=0.031)   0.179* (p=0.020)
Salespeople Hires per Day (N = 15)    -0.067 (p=0.469)   0.218* (p=0.022)   … (p=0.690)        -0.102 (p=0.283)   0.164 (p=0.088)    -0.047 (p=0.634)

Table 3: Spearman correlation coefficients between sleep behavior and job performance. P-values are provided in parentheses; results with p-value < 0.05 are marked with an asterisk.
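The alignment rule for the athletes described above (use the most recent tracked night within the two nights before a game, otherwise drop that game) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the choice to key each night by the calendar date on which it began, and all sample values are assumptions made for the example.

```python
from datetime import date, timedelta


def align_game_grades(game_grades, sleep_by_night, lookback_nights=2):
    """Pair each game grade with the most recent tracked night before the game.

    game_grades:    dict mapping game date -> grade (hypothetical values).
    sleep_by_night: dict mapping the date a night began -> sleep metric
                    (this keying convention is an assumption).
    Games with no tracked night inside the lookback window are dropped.
    """
    aligned = []
    for game_day in sorted(game_grades):
        # Walk backwards from the night immediately before the game.
        for nights_back in range(1, lookback_nights + 1):
            night = game_day - timedelta(days=nights_back)
            if night in sleep_by_night:
                aligned.append((game_day, sleep_by_night[night], game_grades[game_day]))
                break  # keep only the most recent tracked night
    return aligned


# Hypothetical data: three weekly games and a sparse sleep log.
grades = {date(2019, 9, 8): 71.2, date(2019, 9, 15): 64.5, date(2019, 9, 22): 80.1}
sleep = {date(2019, 9, 6): 6.0, date(2019, 9, 7): 7.5, date(2019, 9, 13): 8.1}
pairs = align_game_grades(grades, sleep)
```

In this example the third game has no tracked night in the two preceding nights, so it is filtered out and only two pairs remain.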

Using D'Agostino's K² test [21], we determined that the job performance metrics in our dataset were non-normally distributed (number of hires: K² = 21.37, p = 2.3 × 10⁻⁵; game grades: K² = 14.87, p = 5.9 × 10⁻⁴). The same holds true for app-based performance (K² = …, p = …) and app event count (K² = 71.60, p = 2.8 × 10⁻¹⁶). Therefore, we use Spearman's rank correlation (ρ) across all correlational analyses throughout this paper.

Our first study investigates RQ.1 and RQ.2 within a cohort of 15 salespeople and 19 athletes. Table 2 shows the summary statistics of our dataset after post-processing. The large standard deviations in the various metrics are due to the logistics of our study. Participants were recruited throughout the 18-month-long period, so some people had many more opportunities to use the sleep-tracking tools than others.
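As a side note on the normality check reported earlier in this section: under the null hypothesis of normality, the K² statistic is approximately chi-squared distributed with two degrees of freedom, whose upper-tail probability has the closed form exp(−K²/2). The short sketch below applies that formula to the K² values quoted in the text; the function name is ours, and this is only a consistency check, not the authors' analysis code.

```python
import math


def k2_p_value(k2):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-k2 / 2.0)


# K^2 statistics quoted in the text.
reported = {"number of hires": 21.37, "game grades": 14.87, "app event count": 71.60}
p_values = {label: k2_p_value(k2) for label, k2 in reported.items()}

for label, p in p_values.items():
    print(f"{label}: p = {p:.1e}")
```

All three p-values fall far below 0.05, which is consistent with the decision to use a rank-based correlation rather than one that assumes normality.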

Using the objective job performance metrics we were able to obtain from our participants' employers, we first examine whether better sleep behavior is associated with better job performance. To the best of our knowledge, our study is the largest to date on this topic without any constraints on how participants slept or went about their daily jobs [56, 57]. The code for all of our analyses can be found in the GitHub repository associated with this project. For this analysis, we calculate correlation coefficients between the job performance metrics and three sleep behavior metrics: time-in-bed, sleep debt, and sleep history. Sleep metrics can vary across individuals due to genetic predisposition and other possible confounds [4, 81], so we repeat the analysis using sleep behavior metrics standardized as z-scores within each individual's data. Participants who did not track at least 5 nights of sleep were excluded from this analysis to ensure that the data was representative of their typical sleep behavior. The salespeople and athletes contributed data from 118 and 171 nights of sleep with corresponding job performance metrics, respectively.
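The two steps described above, per-person z-normalization followed by a rank correlation, can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the record layout, the function names, the use of the population standard deviation, and the sample values are all hypothetical; only the 5-night minimum comes from the text.

```python
from statistics import mean, pstdev


def z_normalize_per_person(records, min_nights=5):
    """records: list of (person_id, sleep_metric, performance) tuples.

    Returns (z_score, performance) pairs, dropping people with fewer than
    `min_nights` tracked nights (the exclusion rule stated in the text).
    """
    by_person = {}
    for person, sleep, perf in records:
        by_person.setdefault(person, []).append((sleep, perf))
    normalized = []
    for rows in by_person.values():
        if len(rows) < min_nights:
            continue
        values = [s for s, _ in rows]
        mu, sigma = mean(values), pstdev(values)  # population SD is an assumption
        if sigma == 0:
            continue  # a constant series has no defined z-score
        normalized.extend(((s - mu) / sigma, p) for s, p in rows)
    return normalized


def average_ranks(values):
    """Ranks starting at 1, with ties sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical records: p2 has only three nights and is therefore excluded.
records = [
    ("p1", 6.0, 2), ("p1", 7.0, 3), ("p1", 8.0, 4), ("p1", 7.5, 4), ("p1", 6.5, 3),
    ("p2", 9.0, 1), ("p2", 8.5, 2), ("p2", 9.5, 3),
]
normalized = z_normalize_per_person(records)
rho = spearman_rho([z for z, _ in normalized], [p for _, p in normalized])
```

Normalizing within each person before pooling means that a night counts as "short" relative to that person's own habits, not relative to the cohort average.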

The correlation coefficients between the sleep behavior and job performance metrics in our dataset are presented in Table 3. The analysis reveals positive, statistically significant correlations in some, but not all, cases. For the salespeople, sleep debt was positively correlated with the number of hires they made (ρ = 0.218, p = 0.022). For the athletes, normalized sleep history (ρ = 0.179, p = 0.020) and normalized sleep debt (ρ = 0.166, p = 0.031) were both positively correlated with game performance. Fewer correlations were found for the salespeople than the athletes, which could be due to the nature of their jobs. The athletes rely on millisecond-scale reaction times during their games, whereas salespeople do not need to operate at such a rapid pace. These results could imply that careers focused on physical and psychomotor skills may be more strongly affected by sleep behaviors than careers that focus primarily on cognition. The fact that multiple correlations emerged between cumulative sleep behavior metrics and job performance, combined with the lack of such correlations from single-day metrics, suggests that sleep over an extended period has a stronger impact on a person's job performance than a single night of sleep. Additionally, the general increase in correlation coefficients after the sleep behavior metrics were normalized within individuals supports the notion that sleep needs and behaviors vary between individuals.

Code available at https://github.com/cjpark87/mobile-app-sleep-performance.

[Figure 2 panels: (a) Sleep Debt vs. Hires Per Hour, ρ = 0.2181, p = 0.0215; (b) Normalized Sleep History vs. Overall Game Grade, ρ = 0.1785, p = 0.0195; (c) Normalized Sleep Debt vs. Overall Game Grade, ρ = 0.1659, p = 0.0307.]

Figure 2: Regression plots showing the effect sizes for the statistically significant results from Table 3. The job performance of both the (a) salespeople and (b+c) athletes is sensitive to cumulative sleep metrics. Throughout the paper, the data are binned into discrete and evenly distributed intervals (quintiles or deciles). Point estimates and error bars represent the mean estimate (black) and standard error (blue), respectively. Orange lines represent the best linear regression fit to the raw data along with standard errors (shaded area).

[Figure 3 panels: (a) Game Grade: Interaction Time (sec) vs. Overall Game Grade, ρ = -0.2964, p = 0.0455; (b) Hires Per Hour: Interaction Time (sec) vs. Hires Per Hour, ρ = -0.0752, p = 0.4106.]

Figure 3: Regression plots showing the effect sizes between app-based performance and (a) overall game grade and (b) hires per day. App interaction time is sensitive to the job performance of the athletes, but not the salespeople.

We further analyze the statistically significant correlations by measuring their effect sizes, which are shown in Figure 2. One hour of sleep debt for the average salesperson was associated with a 2.2% decrease in the number of hires they were able to make. Since sleep debt is a weighted sum of sleep deficits, another way to consider this effect size is that one hour of sleep loss the night before was associated with 1.9% fewer hires. The average salesperson made 3.8 hires per workday and collected $936 in fees per hire. Therefore, a 1.9% decrease translates to a $67 loss per day. The average athlete experienced a 2.0% drop (1.3 points) in their game grade when they lost one hour of sleep the night before. Although these performance decreases may appear small, they can accumulate over time or across multiple people on the same team. In fact, sleep debt implies that a deficit can be spread over multiple days, so one hour of sleep loss the night before is equivalent to 2.4 hours of sleep loss a week before or 0.2 hours of sleep loss every day for a week. A more severe, but not uncommon, scenario of losing an hour of sleep every day for a week is equivalent to losing 4.75 hours of sleep yesterday or 11.2 hours of sleep one week ago. On average, this level of sleep debt was associated with a 9.5% (6.2 points) reduction in game performance for the athletes and a 9.0% ($317) reduction in hires for the salespeople.
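The effect-size arithmetic for the salespeople and athletes can be retraced with the figures quoted in the text. The sketch below is only a worked check: the constant names are ours, every input is a rounded value from the paper, and the results therefore differ slightly from the reported dollar amounts.

```python
# Rounded inputs quoted in the text.
HIRES_PER_WORKDAY = 3.8
FEES_PER_HIRE_USD = 936
SALES_EFFECT_PER_HOUR = 0.019      # fewer hires per hour of sleep lost last night
ATHLETE_EFFECT_PER_HOUR = 0.020    # game-grade drop per hour of sleep lost last night
ATHLETE_POINTS_PER_HOUR = 1.3
WEEK_EQUIVALENT_HOURS = 4.75       # one hour lost nightly for a week, expressed as last-night hours

daily_fees = HIRES_PER_WORKDAY * FEES_PER_HIRE_USD              # 3556.8 USD
loss_one_hour_usd = SALES_EFFECT_PER_HOUR * daily_fees          # about 67.6 USD per day

sales_week_effect = SALES_EFFECT_PER_HOUR * WEEK_EQUIVALENT_HOURS      # about 0.090
athlete_week_effect = ATHLETE_EFFECT_PER_HOUR * WEEK_EQUIVALENT_HOURS  # 0.095
athlete_week_points = ATHLETE_POINTS_PER_HOUR * WEEK_EQUIVALENT_HOURS  # about 6.2 points
sales_week_loss_usd = sales_week_effect * daily_fees                   # about 321 USD per day

print(f"{loss_one_hour_usd:.2f} {sales_week_effect:.3f} {athlete_week_effect:.3f}")
```

With these rounded inputs the weekly loss for a salesperson comes out near $321 rather than the reported $317; the gap presumably reflects rounding in the published percentages.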

Having supported the hypothesis that better sleep behavior is correlated with heightened job performance, we now explore the possibility of leveraging passively captured app interaction data as a non-invasive indicator of job performance. We investigate this question on the basis that app-based performance provides an in-situ measurement of psychomotor and cognitive function that may be easier to track than sleep behavior or job performance itself.

To examine whether app interaction time could serve as a non-invasive indicator of job performance, we calculate the correlation between these two data sources. We also fit least squares models between app interaction time and job performance metrics to determine effect sizes. The salespeople and athletes contributed data from 122 and 46 unique days, respectively, with both app interaction and job performance measurements; note that this is a many-to-one relationship since participants frequently interacted with their app multiple times in the same day.
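A least squares fit of the kind mentioned above reduces to a slope and an intercept, with the slope serving as the effect size (performance units per second of interaction time). The sketch below implements ordinary least squares for a single predictor; the function name and the four data points are hypothetical and chosen only so the result is easy to verify by hand.

```python
def least_squares_fit(x, y):
    """Ordinary least squares for y = slope * x + intercept with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical points: interaction time in seconds vs. a performance score.
interaction_time_sec = [5.0, 10.0, 15.0, 20.0]
performance_score = [70.0, 67.5, 65.0, 62.5]
slope, intercept = least_squares_fit(interaction_time_sec, performance_score)
```

Because several interaction events can map to the same day's performance value, each event contributes one (x, y) point, so days with more app use carry more weight in the fit.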

Figure 3 shows real-world job performance against app interaction time for those participants. App interaction time was not significantly correlated with the number of hires the salespeople made (ρ = -0.0752, p = 0.411). A significant correlation was found between app interaction time and the athletes' game grade (ρ = -0.296, p = 0.0455). The effect size shows that athletes who were 10 seconds faster in their app interaction time had an average of 5 more points in game grades. Our app interaction metric is partly related to reaction time, so the discrepancy between athletes and salespeople in this analysis may be because the athletes' day-to-day activities require rapid, precise reactions; the salespeople's activities, on the other hand, are typically more forgiving with respect to psychomotor function. Another explanation could be that PFF includes contextual information, such as whether the opponent presented a favorable matchup during a game; the number of hires a salesperson can make in a given day is more dependent upon external factors (e.g., customer needs, health of the economy).

                     Raw Sleep Data                                                         Per-Person Z-Normalization of Sleep Data
                     Time-in-Bed         Sleep History             Sleep Debt                Time-in-Bed    Sleep History   Sleep Debt
Interaction Time     -0.015* (p=0.049)   -0.055* (p=3.9 × 10⁻¹¹)   -0.095* (p=5.2 × 10⁻³⁰)   … (p=0.483)    … (p=0.140)     … (p=0.230)

Table 4: Spearman correlation coefficients between sleep behavior and app-based performance. P-values are provided in parentheses; results with p-value < 0.05 are marked with an asterisk.

STUDY 2 Statistics                                                       Participants
Number of participants                                                   274
Total nights of sleep tracked with app-based performance measurements    7,195
Total nights of sleep tracked                                            30,618
Total number of transitions between screens                              16,336
Total number of times app was opened                                     11,140
Nights of sleep tracked per user (avg ± std)                             109.2 ± …
… (avg ± std)                                                            7.338 ± …
… (avg ± std)                                                            43.68 ± …
… (avg ± std)                                                            10.14 ± …

Table 5: Summary statistics for our dataset in STUDY 2 after the filtering described in Section 3.6.

The previous study revealed statistically significant correlations between our app-based performance measurement and athletic job performance, but not salesperson job performance. These findings highlight the fact that jobs can be extremely diverse (e.g., unique skill requirements and methods of rating performance), making it challenging to test the generalizability of our findings even further. If app interaction time is truly an indicator of performance, it should be sensitive to factors that are known to impact psychomotor function. Therefore, we conducted an exploration on a broader population of 274 participants to support the idea that app-based performance truly captures aspects of a person's psychomotor and cognitive function. Using the PVT, sleep researchers have demonstrated that psychomotor and cognitive function improve with better sleep behavior [75, 76]. Separately, computing researchers have shown that the timing between interaction events on a desktop or smartphone can be an indicator of psychomotor and cognitive function [7, 87]. Our third and final research question (RQ.3) aims to join these two bodies of literature. Table 5 shows the summary statistics of the dataset used for this analysis after post-processing.

[Figure 4 panels, each plotting App Interaction Time against a sleep metric: (a) Time-in-bed, ρ = -0.0154, p = 0.0490; (b) Sleep History, ρ = -0.0549, p = 3.9e-11; (c) Sleep Debt, ρ = -0.0948, p = 5.2e-30.]

Figure 4: Regression plots showing the effect sizes for the statistically significant results from Table 4. App interaction time is sensitive to many sleep metrics: (a) time-in-bed, (b) sleep history, and (c) sleep debt.

It is well established that psychomotor and cognitive function vary throughout the day due to circadian rhythms, homeostatic sleep drive, and sleep inertia, collectively forming the three-process model of sleep [3, 7, 34, 62]. Any performance indicator should therefore be sensitive to variations of time and sleep. To examine whether this is the case for our app-based performance metric, we evaluate the relationship between app interaction time and four different measures: time of day, time since wake-up, sleep debt, and sleep history. Beyond calculating the correlation between app interaction time and these measures, we also create a generalized additive model similar to the one proposed by Althoff et al. [7] to characterize app interaction time as a function of sleep behavior and time of day. We extend this model by incorporating random-effects intercepts for each user, which not only accommodate user-specific performance baselines but also account for device-specific effects like the rendering capabilities of the user's smartphone. If some participants took certain medications, regularly napped, or consistently consumed high quantities of caffeine, these confounds would be adjusted for through the random intercepts as well. Our participants logged 7,195 nights of sleep that were paired with at least one app interaction event during the same day.
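The intuition behind the per-user intercepts can be illustrated with a much simpler stand-in than the generalized additive model used in the paper. The sketch below is not a GAM and not the authors' method: it subtracts each user's mean interaction time (a crude fixed-effects analogue of a random intercept) and then averages the residuals by hour of day. The function name, the event layout, and the sample values are assumptions.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(events):
    """events: list of (user_id, local_hour, interaction_time_sec) tuples.

    Removes each user's own baseline, then averages what is left per hour,
    so slow devices or habitually slow users do not distort the time-of-day curve.
    """
    times_by_user = defaultdict(list)
    for user, _, seconds in events:
        times_by_user[user].append(seconds)
    baseline = {user: mean(times) for user, times in times_by_user.items()}

    residuals_by_hour = defaultdict(list)
    for user, hour, seconds in events:
        residuals_by_hour[hour].append(seconds - baseline[user])
    return {hour: mean(res) for hour, res in sorted(residuals_by_hour.items())}


# Hypothetical events: user "b" is uniformly slower (e.g., an older phone),
# yet both users show the same one-second swing between morning and night.
events = [
    ("a", 9, 4.0), ("a", 23, 6.0),
    ("b", 9, 9.0), ("b", 23, 11.0),
]
profile = hourly_profile(events)
```

A full model would estimate the time-of-day, time-since-wake, and sleep terms jointly as smooth functions and shrink the per-user intercepts toward a common mean, which this sketch does not attempt.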

Table 4 summarizes the correlation coefficients between sleep behavior and app interaction time. Time-in-bed (ρ = -0.015, p = 0.049), sleep history (ρ = -0.055, p = 3.9 × 10⁻¹¹), and sleep debt (ρ = -0.095, p = 5.2 × 10⁻³⁰) had negative correlations with app interaction time; in other words, participants with better sleep behaviors had faster app interaction times. Although the correlation coefficients on individual performance are rather small due to substantial variation within and across participants, these estimates align with findings from previous work [7, 56, 85]. When averaging the app interaction times of samples within a certain sleep metric bin, the effect sizes are practically meaningful and span differences of up to 2.5 seconds. For example, as shown in Figure 4(c), one additional hour of sleep debt was associated with app interaction times that were 0.175 seconds slower than the average. Another way to frame this effect size is that one hour less of daily sleep over the past week was associated with app interaction times that were 0.72 seconds slower. As before, the cumulative sleep behavior metrics exhibited stronger correlations than total time-in-bed; however, app-based performance correlated better with non-normalized sleep behavior metrics. This result suggests that the minimal complexity of the app interaction task engendered less variance across individuals. Moreover, we found that extended sleep does not improve psychomotor performance. Figure 4(b) shows that app interaction time was fastest when individuals had an average of 7.75 hours of daily sleep over the past week. Similar U-shaped relationships have also been reported in previous work on psychomotor performance [7] and other outcomes (e.g., mortality [53]).

Since we found statistically significant correlations between cumulative sleep behavior metrics and job performance, we used sleep history and sleep debt in generalized additive models. Figure 5 shows the variation of app interaction time as a function of time of day, time since wake-up, and the aforementioned metrics. We find that app interaction times are slowest at night and fastest between 3 and 6 PM; the difference between those extremes is approximately 1.5 seconds. Note that the relationship between app interaction time and time of day (Figure 5, top) generally aligns with circadian rhythm processes as measured through controlled sleep studies [3, 16, 25]. Our results also align with the chronobiological process of sleep inertia [3] since participants had slower app interaction times within one hour of waking up (Figure 5, middle). App interaction time decreases in the first six hours after wake-up and then begins to increase again, consistent with both the chronobiological process of homeostatic sleep drive [16] and previous work examining click speeds in search engines [7]. App interaction time decreased by an average of 0.4 seconds when sleep history improved from 6 to 8 hours, and app interaction time increased by 0.5 seconds beyond the threshold of -5 sleep debt hours. In other words, when participants lost one hour of sleep daily for a week, they exhibited app interaction times that were 5% (0.5 seconds) slower. Note that this estimate is slightly less than the estimate of 0.72 seconds in Figure 4(c). The difference between the two estimates is explained by the fact that the generalized additive model controls for the impacts of circadian rhythm, homeostatic sleep drive, and sleep inertia, as well as participant-specific baselines through random effects.
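The binned effect sizes discussed above rely on grouping samples into equally populated intervals of a sleep metric (quintiles or deciles, per the figure captions) and summarizing each bin by its mean and standard error. A minimal sketch of that binning step follows; the function name and the synthetic data are assumptions, and the paper's own plotting code may differ.

```python
from statistics import mean, stdev


def binned_means(x, y, n_bins=5):
    """Split (x, y) samples into equally populated bins of x.

    Returns one (mean_x, mean_y, standard_error_of_y) tuple per bin.
    Bins with fewer than two samples are skipped because their
    standard error is undefined.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    summary = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if len(chunk) < 2:
            continue
        xs = [u for u, _ in chunk]
        ys = [v for _, v in chunk]
        summary.append((mean(xs), mean(ys), stdev(ys) / len(ys) ** 0.5))
    return summary


# Synthetic example: interaction time falls by 0.5 s per unit of the sleep metric.
sleep_metric = list(range(10))
interaction_time = [10 - 0.5 * v for v in sleep_metric]
quintiles = binned_means(sleep_metric, interaction_time, n_bins=5)
```

Averaging within bins removes much of the event-to-event noise, which is why binned differences can look large even when the sample-level rank correlation is small.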
[Figure 5 panels, each plotting Contribution to Interaction Time (sec): (a) Accounting for Sleep History; (b) Accounting for Sleep Debt. Rows show Local Time (hr), Time Since Wakeup (hr), and the sleep metric (Sleep History or Sleep Debt).]

Figure 5: Generalized additive models of app interaction time accounting for (a) sleep history and (b) sleep debt. In both cases, the models account for (top row) the local time in the participant's time zone, (center row) time since wake-up, and (bottom row) sleep behavior. These models show that app interaction time is sensitive to sleep behaviors including circadian rhythm, time awake, and cumulative sleep metrics. Both models include random intercepts for each participant, and standard errors are shown.

Establishing the relationship between sleep behavior and job performance has been a challenge in the past due to the difficulty in collecting objective measures in real-world settings. By taking advantage of ubiquitous sleep-tracking technology and the increasing desire within companies to evaluate job performance through data, our research signifies a major step towards understanding this relationship. We demonstrate that an app-based performance metric is correlated with both job performance metrics and sleep behaviors in a way that is consistent with sleep biology. This highlights an interesting opportunity for future assessments of sleep and performance in uncontrolled settings. Below, we describe the implications and limitations of our work.

The PVT has been used to measure psychomotor and cognitive function in the wild [2]; however, the PVT can be disruptive if deployed at inopportune moments. Other prior work has required participants to adhere to a strict sleep schedule in order to measure the effects of sleep on behavior [56, 85]. In our work, we found that our instantiation of app-based performance was correlated with both better sleep behavior and athletic job performance, suggesting the potential power of a passive, nonintrusive performance indicator. Passive sensing through ubiquitous technologies like smartphones can enable continuous data collection for the study of populations that have traditionally been difficult to recruit to controlled studies.

We restricted our correlation analysis of app-based performance to comparable interactions within the sleep-tracking app that started from the home screen and involved single touches; however, not all interactions are created equal, nor does app interaction time tell the whole story about how the user is engaging with the app's content. Some screens require more time to process than others, and longer processing times may indicate that the user is engaging more with the displayed information. How app interaction time varies as a function of on-screen content could be explored further to enable more robust measurements. Beyond app interaction time, comparable performance metrics have also been elicited through other interactions like typing and web browsing [7, 82, 87]. Responses to alarms and notifications could also provide more natural opportunities for capturing app-based performance in the future.

One design recommendation that we propose for sleep-tracking apps involves personalized views of sleep metrics. Many researchers have noted that sleep behaviors are unique according to genetic predisposition and chronotyping [4, 81]. Throughout our analyses, there were cases when normalizing sleep behavior metrics according to each user's history produced statistically significant correlations, but the same was not true for the raw data. Presenting raw values in combination with data that is scaled relative to the individual could provide useful insights to users in the future. Because sleep quality is subjective and not well-defined [39, 79], future apps could also allow users to explore what sleep metrics matter to their perceived sleep quality. In fact, we posit that job performance may be influenced by a person's perception of their own sleep quality, so our research may inform ways of exploring this matter in the future.

Finally, lapses in sleep tracking and the resulting lack of data are an important consequence of real-world data collection that should be addressed. Our dataset exhibited an extreme case of this issue since athletes can be away from home for at least 3–4 days at a time; nevertheless, travel is a regular occurrence for many people. The cumulative sleep metrics in our dataset (sleep history and sleep debt) were most informative in our analyses related to sleep behavior. We used the average time-in-bed of nearby nights for imputation when a participant skipped a night of sleep tracking (Section 3.6). Future work could explore alternatives to this form of imputation, such as improving generative models through deep learning [30, 94] or multi-device sensing to remedy data gaps [50].
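The nearby-night imputation mentioned above can be sketched as a simple windowed average. The exact definition of "nearby" is given in Section 3.6 and is not reproduced here, so the window size, the function name, and the sample week below are all assumptions made for illustration.

```python
from statistics import mean


def impute_missing_nights(time_in_bed, window=1):
    """Fill untracked nights (None) with the mean of tracked nights nearby.

    time_in_bed: list of nightly hours in chronological order, None if untracked.
    window:      how many nights on each side count as "nearby" (an assumption).
    A gap with no tracked neighbours inside the window stays None.
    """
    filled = []
    for i, value in enumerate(time_in_bed):
        if value is not None:
            filled.append(value)
            continue
        lo = max(0, i - window)
        hi = min(len(time_in_bed), i + window + 1)
        neighbours = [v for v in time_in_bed[lo:hi] if v is not None]
        filled.append(mean(neighbours) if neighbours else None)
    return filled


# Hypothetical week with three untracked nights (e.g., travel to an away game).
week = [7.0, None, 8.0, 6.0, None, None, 7.5]
imputed = impute_missing_nights(week, window=1)
```

Only originally tracked nights feed the averages, so a long gap is filled from its edges instead of propagating already imputed values.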

PFF game grades are able to incorporate context because they are assigned by experts who watch the games and understand the athletes' match. Our other data streams, however, lacked such context. For example, the performance of salespeople depends on the demand for their goods and services. Job performance in general is also a function of experience and division of labor. Such information from managers and worker profiles could be incorporated for refined analyses in future work.

Sleep is known to be affected by a wide variety of factors: age [24, 31, 93], ambient light [52], caffeine intake [54], and diet [38], to name a few. The effect of travel between time zones (2–3 hour difference) has not been shown to significantly impact sleep [80], but an effect has been demonstrated on athletic performance [43]. Measuring these factors through sensors and accounting for their effects in statistical analyses could improve evidence of links between sleep behavior, job performance, and app usage.

Our dataset included participants from a bankruptcy law firm consultancy and the NFL, which allowed us to compare two populations with distinct job demands whose job performance can be quantified effectively. In both cases, we were able to identify sleep behavior metrics that correlated with job performance; however, the correlations manifested in different sleep behavior metrics (e.g., sleep debt for salespeople, personalized sleep history for athletes) (Section 4.1). Beyond the discrepancy between the two groups' job demands, the differences in results can also be attributed to idiosyncrasies within the job performance metrics themselves. For the salespeople, the number of hires an employee is able to make may depend on the state of the economy and the rate of bankruptcy in the country. For the athletes, the subjective nature of the expert's grades can manifest in anchoring effects towards common values [84]. We use rank-based correlation methods and per-person normalization to account for some of these idiosyncrasies (Section 4.1), but future work should explore and compare alternative sources of job performance data. Furthermore, an exciting avenue of research may entail the creation of a job performance metric that generalizes across different careers.

Although salespeople and athletes have very different job demands, they do not cover the entire spectrum of careers. Each profession has its own demands, which may not overlap with those of either profession included in our study. There was also an element of selection bias in our participant pool; the people who enrolled in our observational study may have been more excited to track their sleep and interact with the app than the average person, producing inflated app engagement measurements. Similarly, the observational and correlational nature of our data precludes us from making causal inferences. Learning how our findings may generalize to other populations remains an area of future work.

Lastly, there are many confounds that could have affected our datasets. People have unique habits that affect their sleep behavior and job performance [38, 52, 54]. Unique smartphone parameters like clock speed or operating system throttling due to current battery level affect app interaction time. We addressed within-person confounds as much as possible via statistical methods. For our correlational analyses, we examined both raw and per-person normalized sleep behavior metrics (Sections 4.1, 4.2, 5.1). For our generalized additive model of app interaction time against sleep behavior and time of day, we utilized random-effects intercepts to account for performance baselines, habits, and device specifications specific to each participant (Section 5.1). These steps helped us account for confounds that existed throughout a participant's enrollment in the study, including regular medication intake, naps, or caffeine consumption.

Many people recognize that improving sleep behavior benefits job performance, but the precise relationship between the two has been difficult to capture and quantify in the past. Our study advances the literature in this space by providing a correlational analysis between objectively measured sleep behavior metrics from a mattress sensor and job performance metrics from a bankruptcy law firm and the NFL. Our findings suggest that establishing good sleep behaviors over extended periods is more important to job performance than simply getting a good night's sleep one day prior. We also found evidence that passively captured app interaction metrics can serve as a useful indicator for some job performance and sleep measures, thereby highlighting another mechanism through which researchers can collect relevant psychomotor and cognitive performance measures at scale. It is our hope that our work inspires researchers to examine in-situ sleep behaviors and performance measures across diverse contexts to further develop our understanding of human performance.

ACKNOWLEDGMENTS

This research has been supported in part by NSF grant IIS-1901386, Bill & Melinda Gates Foundation (INV-004841), the Allen Institute for Artificial Intelligence, and a Microsoft AI for Accessibility grant.

REFERENCES

[1] Saeed Abdullah, Mark Matthews, Elizabeth L. Murnane, Geri Gay, and Tanzeem Choudhury. 2014. Towards circadian computing: "Early to bed and early to rise" makes some of us unhealthy and sleep deprived. In Proc. UbiComp ’14. 673–684.
[2] Saeed Abdullah, Elizabeth L Murnane, Mark Matthews, Matthew Kay, Julie A Kientz, Geri Gay, and Tanzeem Choudhury. 2016. Cognitive rhythms: unobtrusive and continuous sensing of alertness using a mobile phone. In Proc. UbiComp ’16. ACM Press, New York, New York, USA, 178–189.
[3] Torbjörn Åkerstedt and Simon Folkard. 1997. The three-process model of alertness and its extension to performance, sleep latency, and sleep length. Chronobiology International 14, 2 (jan 1997), 115–123.
[4] Karla V Allebrandt et al. 2010. CLOCK Gene Variants Associate with Sleep Duration in Two Independent Populations. Biological Psychiatry 67, 11 (jun 2010).
[5] Tim Althoff. 2017. Population-scale pervasive health. IEEE Pervasive Computing 16, 4 (2017).
[6] Tim Althoff, Eric Horvitz, and Ryen W White. 2018. Psychomotor function measured via online activity predicts motor vehicle fatality risk. npj Digital Medicine 1, 1 (2018).
[7] Tim Althoff, Eric Horvitz, Ryen W White, and Jamie Zeitzer. 2017. Harnessing the Web for Population-Scale Physiological Sensing. In Proc. WWW ’17. 113–122.
[8] Sonia Ancoli-Israel, Roger Cole, Cathy Alessi, Mark Chambers, William Moorcroft, and Charles P Pollak. 2003. The role of actigraphy in the study of sleep and circadian rhythms. Sleep 26, 3 (2003), 342–392.
[9] Consumer Electronics Association and National Sleep Foundation. 2015. Consumer Awareness and Perception of Sleep Technology. Consumer Electronics Association.
[10] Sangwon Bae, Denzil Ferreira, Brian Suffoletto, Juan C Puyana, Ryan Kurtz, Tammy Chung, and Anind K Dey. 2017. Detecting Drinking Episodes in Young Adults Using Smartphone-based Sensors. Proc. IMWUT ’17 1, 2 (jun 2017), 1–36.
[11] Joseph Baranski and Ross Pigeau. 1997. Self-monitoring cognitive performance during sleep deprivation: effects of modafinil, d-amphetamine and placebo. Journal of Sleep Research 6, 2 (jun 1997), 84–91.
[12] Mathias Basner, Kenneth M Fomberstein, Farid M Razavi, Siobhan Banks, Jeffrey H William, Roger R Rosa, and David F Dinges. 2007. American time use survey: sleep time and its relationship to waking activities. Sleep 30, 9 (2007).
[13] Jared Bauer, Sunny Consolvo, Benjamin Greenstein, Jonathan Schooler, Eric Wu, Nathaniel F Watson, and Julie Kientz. 2012. ShutEye: Encouraging Awareness of Healthy Sleep Recommendations with a Mobile, Peripheral Display. In Proc. CHI ’12. ACM Press, New York, New York, USA, 1401.
[14] Gregory Belenky et al. 2003. Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: A sleep dose-response study. Journal of Sleep Research 12, 1 (mar 2003), 1–12.
[15] Mark Blagrove, Carol Alexander, and James A Horne. 1995. The effects of chronic sleep reduction on the performance of cognitive tasks sensitive to sleep deprivation. Applied Cognitive Psychology 9, 1 (feb 1995), 21–40.
[16] Alexander A Borbély, Serge Daan, Anna Wirz-Justice, and Tom Deboer. 2016. The two-process model of sleep regulation: A reappraisal. Journal of Sleep Research 25, 2 (2016), 131–143.
[17] David H Brendel et al. 1990. Sleep Stage Physiology, Mood, and Vigilance Responses to Total Sleep Deprivation in Healthy 80-Year-Olds and 20-Year-Olds. Psychophysiology 27, 6 (nov 1990), 677–685.
[18] Nikhil Byanna and Diego Klabjan. 2016. Evaluating the Performance of Offensive Linemen in the NFL. arXiv preprint arXiv:1603.07593 (2016).
[19] Zhenyu Chen, Mu Lin, Fanglin Chen, Nicholas D Lane, Giuseppe Cardone, Rui Wang, Tianxing Li, Yiqiang Chen, Tanzeem Choudhury, and Andrew T Campbell. 2013. Unobtrusive sleep monitoring using smartphones. In Proc. PervasiveHealth ’13. IEEE, 145–152.
[20] Ronald D Chervin. 2000. Sleepiness, fatigue, tiredness, and lack of energy in obstructive sleep apnea. Chest.
[21] Ralph B. D'Agostino. 1971. An omnibus test of normality for moderate and large size samples. Biometrika 58, 2 (1971), 341–348.
[22] Nediyana Daskalova, Bongshin Lee, Jeff Huang, Chester Ni, and Jessica Lundin. 2018. Investigating the Effectiveness of Cohort-Based Sleep Recommendations. Proc. IMWUT ’18 2, 3 (2018), 1–19.
[23] Nediyana Daskalova, Danaë Metaxa-Kakavouli, Adrienne Tran, Nicole Nugent, Julie Boergers, John McGeary, and Jeff Huang. 2016. SleepCoacher: A Personalized Automated Self-Experimentation System for Sleep Recommendations. In Proc. UIST ’16. 347–358.
[24] Derk Jan Dijk and Jeanne F Duffy. 1999. Circadian regulation of human sleep and age-related changes in its timing, consolidation and EEG characteristics.
[25] Derk Jan Dijk, Jeanne F Duffy, and Charles A Czeisler. 1992. Circadian and sleep/wake dependent aspects of subjective alertness and cognitive performance. Journal of Sleep Research 1, 2 (1992), 112–117.
[26] David F Dinges. 2004. Sleep debt and scientific evidence. Sleep 27, 6 (sep 2004).
[27] David F Dinges, Frances Pack, Katherine Williams, Kelly A Gillen, John H Powell, Goeffrey E Ott, Caitlin Aptowicz, and Allen I Pack. 1997. Cumulative Sleepiness, Mood Disturbance, and Psychomotor Vigilance Performance Decrements During a Week of Sleep Restricted to 4-5 Hours per Night. Technical Report 4.
[28] David F Dinges and John W Powell. 1985. Microcomputer analyses of performance on a portable, simple visual RT task during sustained operations. Behavior Research Methods, Instruments, & Computers 17, 6 (1985), 652–655.
[29] Christopher C Dodson, Eric S Secrist, Suneel B Bhat, Daniel P Woods, and Peter F Deluca. 2016. Anterior Cruciate Ligament Injuries in National Football League Athletes From 2010 to 2013: A Descriptive Epidemiology Study. Orthopaedic Journal of Sports Medicine 4, 3 (mar 2016).
[30] Chenguang Fang and Chen Wang. 2020. Time Series Data Imputation: A Survey on Deep Learning Approaches. arXiv:cs.LG/2011.11347
[31] Irwin Feinberg. 1974. Changes in sleep cycle patterns with age. Journal of Psychiatric Research 10, 3-4 (1974), 283–306.
[32] Raihana Ferdous, Venet Osmani, and Oscar Mayora. 2015. Smartphone app usage as a predictor of perceived stress levels at workplace. In

Proc. PervasiveHealth 2015 .Institute of Electrical and Electronics Engineers Inc., 225–228. arXiv:1803.03863[33] Michael Gertz. 2017. NFL Census 2016 - ProFootballLogic.[34] Namni Goel, Mathias Basner, Hengyi Rao, and David F Dinges. 2013. Circadianrhythms, sleep deprivation, and human performance. In

Progress in MolecularBiology and Translational Science . Elsevier, 155–190.[35] Scott A Golder and Michael W Macy. 2011. Diurnal and Seasonal Mood Varywith Work, Sleep, and Daylength Across Diverse Cultures.

Science

Proc. CHI ’19 . 168.[37] G. Guerrero-Mora, Palacios Elvia, A. M. Bianchi, J. Kortelainen, M. Tenhunen,S. L. Himanen, M. O. Mendez, E. Arce-Santana, and O. Gutierrez-Navarro. 2012.Sleep-wake detection based on respiratory signal acquired through a PressureBed Sensor. In

Proceedings of the Annual International Conference of the IEEEEngineering in Medicine and Biology Society, EMBS . 3452–3455.[38] Shona L Halson. 2008. Nutrition, sleep and recovery.

European Journal of SportScience

8, 2 (mar 2008), 119–126.[39] Allison G Harvey, Kathleen Stinson, Katriina L Whitaker, Damian Moskovitz, andHarvinder Virk. 2008. The Subjective Meaning of Sleep Quality: A Comparisonof Individuals with and without Insomnia.

Sleep

31, 3 (mar 2008), 383–393.[40] Jennifer L Hicks, Tim Althoff, Peter Kuhar, Bojan Bostjancic, Abby C King, JureLeskovec, Scott L Delp, et al. 2019. Best practices for analyzing large-scale healthdata from wearables and smartphone apps.

NPJ digital medicine

2, 1 (2019).[41] Jim Horne. 2004. Is there a sleep debt?

Sleep

27, 6 (sep 2004), 1047–1049.[42] Vanessa Ibáñez, Josep Silva, and Omar Cauli. 2018. A survey on sleep assessmentmethods.

PeerJ

Medicineand Science in Sports and Exercise

25, 1 (jan 1993), 127–131.

WW ’21, April 19–23, 2021, Ljubljana, Slovenia Park and Arian, et al. [44] Megan E Jewett, Derk Jan Dijk, Richard E Kronauer, and David F Dinges. 1999.Dose-response relationship between sleep duration and human psychomotorvigilance and subjective alertness.

Sleep

22, 2 (1999), 171–179.[45] William DS Killgore, Thomas J Balkin, and Nancy J Wesensten. 2006. Impaireddecision making following 49 h of sleep deprivation.

Journal of Sleep Research

Sleep Medicine

Scientific Reports

6, 1 (dec 2016), 35812.[48] Elizabeth B Klerman and Derk Jan Dijk. 2008. Age-Related Reduction in theMaximal Capacity for Sleep-Implications for Insomnia.

Current Biology

18, 15(2008), 1118–1123.[49] Kristen L Knutson, Eve Van Cauter, Paul J Rathouz, Thomas DeLeire, and Diane SLauderdale. 2010. Trends in the prevalence of short sleepers in the USA: 1975-2006.

Sleep

33, 1 (2010), 37–45.[50] Ping-Ru T Ko, Julie A Kientz, Eun Kyoung Choe, Matthew Kay, Carol A Landis,and Nathaniel F Watson. 2015. Consumer Sleep Technologies: A Review of theLandscape.

Journal of Clinical Sleep Medicine

11, 12 (2015), 1455–1461.[51] Juha M. Kortelainen, Martin O. Mendez, Anna Maria Bianchi, Matteo Matteucci,and Sergio Cerutti. 2010. Sleep staging based on signals acquired through bedsensor.

IEEE Transactions on Information Technology in Biomedicine

14, 3 (may2010), 776–785.[52] Tomoaki Kozaki, Shingo Kitamura, Yuichi Higashihara, Keita Ishibashi, HirokiNoguchi, and Akira Yasukouchi. 2005. Effect of Color Temperature of LightSources on Slow-wave Sleep.

Journal of Physiological Anthropology and AppliedHuman Science

24, 2 (2005), 183–186.[53] Daniel F Kripke, Ruth N Simons, Lawrence Garfinkel, and E Cuyler Hammond.1979. Short and long sleep and sleeping pills: is increased mortality associated?

Archives of general psychiatry

36, 1 (1979), 103–116.[54] Hans Peter Landolt, Derk-Jan Dijk, Stephanie E Gaus, and Alexander A Borbély.1995. Caffeine reduces low-frequency delta activity in the human sleep EEG.

Neuropsychopharmacology

12, 3 (1995), 229–238.[55] F. C. Leone, L. S. Nelson, and R. B. Nottingham. 1961. The Folded NormalDistribution.

Technometrics

3, 4 (1961), 543–550.[56] June C Lo, Ju Lynn Ong, Ruth LF Leong, Joshua J Gooley, and Michael WL Chee.2016. Cognitive Performance, Sleepiness, and Mood in Partially Sleep DeprivedAdolescents: The Need for Sleep Study.

Sleep

39, 3 (2016), 687–698.[57] Cheri D Mah, Kenneth E Mah, Eric J Kezirian, and William C Dement. 2011. TheEffects of Sleep Extension on the Athletic Performance of Collegiate BasketballPlayers.

Sleep

34, 7 (jun 2011), 943–950.[58] Alex Mariakakis, Sayna Parsi, Shwetak N Patel, and Jacob O Wobbrock. 2018.Drunk User Interfaces: Determining Blood Alcohol Level through EverydaySmartphone Tasks. In

Proc. CHI ’18 , Vol. l. 1–13.[59] Gloria Mark, Shamsi T Iqbal, Mary Czerwinski, and Paul Johns. 2014. Boredmondays and focused afternoons: The rhythm of attention and online activity inthe workplace. In

Proc. CHI ’14 . 3025–3034.[60] Bruce J Martin. 1981. Effect of sleep deprivation on tolerance of prolongedexercise.

European Journal of Applied Physiology and Occupational Physiology

Medicine and Science in Sports and Exercise

13, 4 (1981), 220–223.[62] Robert L Matchock and J Toby Mordkoff. 2009. Chronotype and time-of-dayinfluences on the alerting, orienting, and executive components of attention.

Experimental brain research

Journal of Sleep Research

8, 3 (sep 1999), 185–188.[64] Abhinav Mehrotra, Robert Hendley, and Mirco Musolesi. 2016. Towards multi-modal anticipatory monitoring of depressive states through the analysis ofhuman-smartphone interaction. In

Proc. UbiComp ’16 . ACM, 1132–1138.[65] Jun-Ki Min, Afsaneh Doryab, Jason Wiese, Shahriyar Amini, John Zimmerman,and Jason I. Hong. 2014. Toss ’n’ turn: smartphone as sleep and sleep qualitydetector. In

Proc. CHI ’14 . 477–486.[66] Elizabeth L Murnane et al. 2016. Mobile manifestations of alertness. In

Proc.MobileHCI ’16 . ACM Press, New York, USA.[67] Antti Oulasvirta, Tye Rattenbury, Lingyi Ma, and Eeva Raita. 2012. Habits makesmartphone use more pervasive.

Personal and Ubiquitous Computing

16, 1 (2012).[68] GF Pickett and AF Morris. 1975. Effects of acute sleep and food deprivation ontotal body response time and cardiovascular performance.

The journal of sportsmedicine and physical fitness

15, 1 (mar 1975), 49–56. [69] Martin Pielot, Tilman Dingler, Jose San Pedro, and Nuria Oliver. 2015. Whenattention is not scarce-detecting boredom from mobile phone usage. In

Proc.UbiComp ’15 . 825–836.[70] Emma Pierson, Tim Althoff, Daniel Thomas, Paula Hillard, and Jure Leskovec.2021. Daily, weekly, seasonal and menstrual cycles in women’s mood, behaviourand vital signs.

Nature Human Behaviour (2021).[71] June J Pilcher and Allen I Huffcutt. 1996. Effects of sleep deprivation on perfor-mance.

Sleep

19, 4 (1996), 318–326.[72] Pro Football Focus. 2017. How We Grade. , 4 pages.[73] Matthew T Provencher et al. 2018. A History of Anterior Cruciate LigamentReconstruction at the National Football League Combine Results in InferiorEarly National Football League Career Participation.

Arthroscopy - Journal ofArthroscopic and Related Surgery

34, 8 (2018), 2446–2453.[74] Tauhidur Rahman et al. 2015. DoppleSleep: A contactless unobtrusive sleepsensing system using short-range doppler radar. In

Proc. UbiComp ’15 . 39–50.[75] Pooja Rajdev, David Thorsley, Srinivasan Rajaraman, Tracy L Rupp, Nancy JWesensten, Thomas J Balkin, and Jaques Riefman. 2013. A unified mathematicalmodel to quantify performance impairment for both chronic sleep restrictionand total sleep deprivation.

Journal of theoretical biology

331 (2013), 66–77.[76] Sridhar Ramakrishnan, Srinivas Laxminarayan, David Thorsley, Nancy J Wesen-sten, Thomas J Balkin, and Jaques Reifman. 2012. Individualized performanceprediction during total sleep deprivation: Accounting for trait vulnerability tosleep loss. In

Proc. EMBS ’12 . 5574–5577.[77] Sridhar Ramakrishnan, Nancy J Wesensten, Thomas J Balkin, and Jaques Reifman.2016. A Unified Model of Performance: Validation of its Predictions acrossDifferent Sleep/Wake Schedules.

Sleep

39, 1 (2016), 249–262.[78] Jukka Ranta, Timo Aittokoski, Mirja Tenhunen, and Mikko Alasaukko-Oja. 2019.EMFIT QS heart rate and respiration rate validation.

Biomedical Physics andEngineering Express

5, 2 (2019), 25016.[79] Ruth Ravichandran, Sang Wha Sien, Shwetak N Patel, Julie A Kientz, and Laura RPina. 2017. Making sense of sleep sensors: How sleep sensing technologiessupport and undermine sleep health. In

Proc. CHI ’17 . ACM, 6864–6875.[80] Louise K Richmond, Brian Dawson, Glenn Stewart, Stuart Cormack, David RHillman, and Peter R Eastwood. 2007. The effect of interstate travel on the sleeppatterns and performance of elite Australian Rules footballers.

Journal of Scienceand Medicine in Sport

10, 4 (jun 2007), 252–258.[81] Till Roenneberg, Anna Wirz-Justice, and Martha Merrow. 2003. Life betweenclocks: Daily temporal patterns of human chronotypes.

Journal of BiologicalRhythms

18, 1 (feb 2003), 80–90.[82] Martin Thirkettle, Jennifer Lewis, Darren Langdridge, and Graham Pike. 2018.A Mobile App Delivering a Gamified Battery of Cognitive Tests Designed forRepeated Play (OU Brainwave): App Design and Cohort Study.

JMIR SeriousGames

6, 4 (2018), e10519.[83] Eirunn Thun, Bjørn Bjorvatn, Elisabeth Flo, Anette Harris, and Ståle Pallesen.2015. Sleep, circadian rhythms, and athletic performance.

Sleep Medicine Reviews

23 (oct 2015), 1–9.[84] Amos Tversky and Daniel Kahneman. 1974. Judgment under uncertainty: Heuris-tics and biases.

Science

Sports Medicine

7, 4 (apr 1989), 235–247.[87] Lisa M Vizer, Lina Zhou, and Andrew Sears. 2009. Automated stress detectionusing keystroke and linguistic features: An exploratory study.

InternationalJournal of Human Computer Studies

67, 10 (oct 2009), 870–886.[88] Olivia J Walch, Amy Cochran, and Daniel B Forger. 2016. A global quantificationof “normal” sleep schedules using smartphone data.

Science advances

2, 5 (2016).[89] Matthew P Walker and Robert Stickgold. 2005. Sleep, Memory, and Plasticity.

Annual Review of Psychology

57, 1 (jan 2005), 139–166.[90] Rui Wang, Gabriella Harari, Peilin Hao, Xia Zhou, and Andrew T Campbell. 2015.SmartGPA: How smartphones can assess and predict academic performance ofcollege students. In

Proc. UbiComp ’15 . 1–13.[91] Andrew M Watson. 2017. Sleep and Athletic Performance.

Current Sports MedicineReports

16, 6 (2017), 413–418.[92] Ann M Williamson and Anne-Marie Feyer. 2000. Moderate sleep deprivation pro-duces impairments in cognitive and motor performance equivalent to legally pre-scribed levels of alcohol intoxication.

Occupational and Environmental Medicine

57, 10 (2000), 649–655.[93] In Young Yoon, Daniel F Kripke, Jeffrey A Elliott, Shawn D Youngstedt,Katharine M Rex, and Richard L Hauger. 2003. Age-related changes of circadianrhythms and sleep-wake cycles.