Reconstructing Detailed Browsing Activities from Browser History
Geza Kovacs
Stanford University
[email protected]
ABSTRACT
Users’ detailed browsing activity – such as what sites they are spending time on and for how long, and what tabs they have open and which one is focused at any given time – is useful for a number of research and practical applications. Gathering such data, however, requires that users install and use a monitoring tool over long periods of time. In contrast, browser extensions can gain instantaneous access to months of browser history data. However, the browser history is incomplete: it records only navigation events, missing important information such as time spent or which tab is focused. In this work, we aim to reconstruct time spent on sites using only users’ browsing histories. We gathered three months of browsing history and two weeks of ground-truth detailed browsing activity from 185 participants. We developed a machine learning algorithm that predicts whether the browser window is focused and active at one-second granularity with an F1-score of 0.84. During periods when the browser is active, the algorithm can predict which domain the user was looking at with 76.2% accuracy. We can use these results to reconstruct the total time spent online for each user with an R value of 0.96, and the total time each user spent on each domain with an R value of 0.92.

Author Keywords
browsing histories; browsing activities; browser focus; web browsing
ACM Classification Keywords
H.5.m. Information Interfaces and Presentation (e.g. HCI): Miscellaneous
INTRODUCTION
Knowing where users spend their time online, second-by-second, has numerous applications to both research and products. For example, productivity-tracking tools like RescueTime provide information about how much time users spend online on productivity and entertainment sites. Browsing activity data is also essential for studying phenomena such as self-interruptions, where users may take a break from work to spend time on other sites.

However, gathering browsing activity data is a long and intrusive process. It requires the end user to install a monitoring application, such as a browser extension, that continually logs where users are spending their time and transmits it to a server. This requires extensive permissions, which may make users wary of participation on suspicions that the extension may be malware. The user must also keep the extension installed over the duration of the study. Most problematically, a longitudinal study is required, with duration equivalent to the amount of browsing activity data desired.

Browsing histories, in contrast, can be instantaneously gathered by a browser extension. For a Chrome extension, this requires only a Browsing History permission, which is classified as low-risk. Browser histories can be automatically or manually scanned and filtered before sending [9], and the extension can be uninstalled as soon as the history has been transmitted to the server. Most promisingly, users’ browsing histories can store up to several months of historical browsing data, allowing us to instantly get results without a longitudinal study.

In this work we aim to reconstruct four pieces of browsing activity, using only browsing history:

• When is the browser focused and being actively used?
• What domain is the browser focused on at any given time?
• How much time did each user spend actively browsing?
• How much time did each user spend on each domain?

These tasks are non-trivial because the browsing history represents only a thin slice: it logs events when a new page is visited, not time spent within a page or switching/closing tabs. This makes naive time-estimation heuristics fail on pages where users might spend a long time without any record in the browsing history (e.g., watching a YouTube video, or scrolling down a Facebook news feed).

To train and evaluate our reconstruction mechanism, we gathered browsing histories, as well as two weeks of second-by-second browsing activities, from 185 participants recruited from Amazon Mechanical Turk. We utilize domain-related and temporal features in a random forest to outperform heuristics such as assuming a fixed time after a page visit, and are able to correctly reconstruct the time the user spent on domains with an R value of 0.92.

Figure 1. The top 10 domains for which we have time logged in our dataset.

RELATED WORK

Gathering Browsing Activities
Gathering browsing activities by logging them in a longitudinal study is a methodology that underlies a number of studies. For example, Mark et al. have conducted studies that relate browsing activities to sleep debt [6] and stress [5], as well as using them to investigate social media usage [8] and multitasking [4].

Eyebrowse is an application where users can voluntarily share their browsing activities [9]. They have gathered a dataset of browsing activities from their userbase. We complement Eyebrowse by gathering a larger longitudinal dataset of full browsing activities and introducing a model for reconstructing attention data. If our model is successful, however, it may threaten some measure of users’ security on Eyebrowse.
Estimating User Activities from Logs
Although no prior work has attempted to reconstruct browsing activities from browsing histories, there has been work on estimating user activities from logged data in other contexts. Huang et al. use mouse clicks and cursor movements to estimate users’ gaze on search engine result pages [3, 2]. Park et al. investigate the relationship between video view durations on YouTube and a video’s view count, number of likes per view, and sentiment in its comments [7]. They find that these factors have significant predictive power over the duration of video views, and are able to predict the duration of video views with an R value of 0.19.

DATASET

Dataset Collection
We first gathered a dataset of browsing activities and browser histories. We recruited 225 participants from Mechanical Turk, and asked them to install our extension, which collects browser histories and browsing activity events (window and tab focus and switches, as well as mouse and keyboard activity such as clicks and scrolls on pages), and transmits them to our servers.

We paid users $2 for installing the extension, and gave a bonus of $1 for each week they kept the extension installed. We excluded users if they uninstalled the extension or became inactive for more than 3 days. We were able to gather data from 185 users in this way; most of the 40 who dropped out uninstalled the extension shortly after receiving the initial $2 payment. We split the 185 users into training and test sets (93 users in the training set, 92 users in the test set).

In Figure 1 we show the total amount of time spent on each domain, across all users. The domain with the most time spent is Mechanical Turk (as our users are from Mechanical Turk), but the other sites are all broadly used and representative of the sites used by a general audience.
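The per-domain aggregation behind Figure 1 can be sketched as follows. This is a minimal stdlib sketch; the `(user, domain, seconds)` record shape is a hypothetical flattening of our logged spans, not the extension's actual format.

```python
# Sketch: total time per domain across all users, as plotted in Figure 1.
# Span records of (user, domain, seconds) are hypothetical placeholders.
from collections import Counter

def time_per_domain(spans):
    totals = Counter()
    for user, domain, seconds in spans:
        totals[domain] += seconds
    return totals

totals = time_per_domain([
    ("u1", "mturk.com", 120),
    ("u2", "mturk.com", 300),
    ("u1", "youtube.com", 200),
])
top = totals.most_common(2)  # largest domains first
```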
Reference Browser Activity Data
The reference browser activity dataset was obtained by logging open, close, switch, and change events for tabs and windows via Google Chrome’s tab and window APIs for extensions. We also logged when the user’s screen locked or the browser became idle (defined by Chrome as 1 minute without mouse or keyboard activity), via Chrome’s idle API for extensions. For each event, we logged which tabs and windows were open, the URLs they were visiting, and which tabs were focused.

We then transformed this data into spans of time, which record when the user starts and ends a period of activity on a URL: the start occurs when the URL is visited or gains tab focus, and the end occurs when the user navigates to a different page, closes the tab or window, switches to a different tab, browser window, or application, or when the browser becomes idle or the screen is locked.
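The event-to-span transformation above can be sketched as follows. The event field names (`time`, `type`, `url`) and the event-type vocabulary are hypothetical simplifications of what the extension actually logs; the real data carries full tab and window state.

```python
# Sketch: collapse a time-sorted stream of logged browser events into
# spans of continuous activity on a URL. A span starts on a visit or tab
# focus, and ends on navigation, tab/window close, tab switch, blur,
# idle, or screen lock.

END_EVENTS = {"navigate", "tab_close", "tab_switch", "window_blur", "idle", "screen_lock"}

def events_to_spans(events):
    """events: list of {"time": float, "type": str, "url": str}, time-sorted."""
    spans = []
    current = None  # (start_time, url) of the span in progress
    for ev in events:
        if current is not None and ev["type"] in END_EVENTS:
            spans.append({"url": current[1], "start": current[0], "end": ev["time"]})
            current = None
        if ev["type"] in ("visit", "tab_focus"):
            current = (ev["time"], ev["url"])
    return spans

spans = events_to_spans([
    {"time": 0.0, "type": "visit", "url": "a.com"},
    {"time": 30.0, "type": "tab_switch", "url": "b.com"},
    {"time": 30.0, "type": "tab_focus", "url": "b.com"},
    {"time": 90.0, "type": "idle", "url": "b.com"},
])
```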
History Data
The history data was obtained via Chrome’s history API for extensions. It includes the URL that was visited, the time it was visited at, as well as how the visit occurred (by clicking a link, reloading a page, navigation within a frame, etc.).

While attempting to find the correspondences between our history data and our reference browsing activities, we found there existed some differences. Obviously, there are many events, such as tab switches and time spent scrolling down a page, that are only represented in the browsing activity data. However, there are also some activities that occur in the browsing history but not the browsing activity data. One type of such event is navigation within frames, which we corrected for by eliminating them from the history (the browsing history explicitly marks navigation within frames as such).
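The frame-navigation filtering can be sketched as follows, keyed on the `transition` field that Chrome's history API attaches to each visit (`auto_subframe` and `manual_subframe` mark within-frame navigations). The flat dict shape here is a hypothetical simplification of the API's `VisitItem`.

```python
# Sketch: drop within-frame navigations from the history, using the
# transition types Chrome assigns to subframe navigations.

SUBFRAME_TRANSITIONS = {"auto_subframe", "manual_subframe"}

def filter_frame_navigations(visits):
    return [v for v in visits if v["transition"] not in SUBFRAME_TRANSITIONS]

visits = [
    {"url": "https://example.com/", "transition": "link"},
    {"url": "https://ads.example.net/frame", "transition": "auto_subframe"},
    {"url": "https://example.com/page2", "transition": "typed"},
]
filtered = filter_frame_navigations(visits)  # keeps only top-level visits
```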
RECONSTRUCTION PROCEDURE
Since our goal is to be able to reconstruct, second-by-second, whether the user’s browser is active and which domain they are browsing, we broke this procedure into two parts:

• Estimate the spans during which the browser is active (as opposed to the browser being closed, idle, or a different window being in focus).
• Within a span in which we believe the browser to be active, estimate which domain is being viewed at each point in time.
WHEN WAS THE BROWSER ACTIVE?
We consider a browser to be active at a particular second of time if the browser window is focused and there has been mouse movement/scrolling/clicking, keyboard activity, or navigation activity within the past minute. If the browser window loses focus, is closed, or the screen is locked, we consider the browser to be inactive from that second onwards.

Following common search engine practice, we consider a browsing session to be a continuous period of time from the first second when the browser is active, until 20 minutes after the last second when the browser is active, such that there is no continuous inactive period of more than 20 minutes.

Determining when the browser is active is thus a classification, for each second in the browsing session, of whether or not the browser is active. We consider a true positive to be when we correctly predict that the browser was active, a true negative to be when we correctly predict that the browser was inactive, a false positive to be when we predict the browser was active when it was in fact inactive, and a false negative to be when we predict the browser was inactive when it was in fact active. (We could alternatively define our task as classifying whether the browser is active or not for all seconds we have data for, including out-of-session times, but our models correctly classify all out-of-session times as being inactive, so with the exception of true negatives on out-of-session seconds, the tasks are equivalent.)

If we did not have browsing history data for a user and only had aggregate data about their activities, a baseline approach might simply classify a user as being active in all browsing sessions, or as inactive in all browsing sessions (whichever is more accurate for that particular user).
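The 20-minute session definition above can be sketched as follows. Input timestamps are a hypothetical simplification (a sorted list of seconds at which any activity occurred).

```python
# Sketch: group activity timestamps into browsing sessions, where a
# session extends until 20 minutes (1200 s) after its last activity.

SESSION_GAP = 20 * 60  # seconds

def sessionize(active_seconds):
    """active_seconds: sorted list of timestamps (s) of browser activity."""
    sessions = []
    for t in active_seconds:
        if sessions and t - sessions[-1][-1] <= SESSION_GAP:
            sessions[-1].append(t)  # continue the current session
        else:
            sessions.append([t])    # gap exceeded: start a new session
    # each session spans first activity to 20 min after the last activity
    return [(s[0], s[-1] + SESSION_GAP) for s in sessions]

sessions = sessionize([0, 60, 300, 5000])
```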
This approach achieves an F1-score of 0.72 and accuracy of 0.63.

If we do have browser history, a simple model for estimating when the browser was active is to guess that the browser remained active for some amount of time (for example, 1 minute or 2 minutes) after the last recorded event in the history. For example, if we set the threshold at 1 minute, this would achieve an F1-score of 0.64 and accuracy of 0.67. We tried various thresholds on our training data (each 1-minute threshold from 1 minute to 10), and found that a threshold of 5 minutes maximized both F1-score and accuracy. This model achieves an F1-score of 0.79 and accuracy of 0.76 on the test data.

We then developed a more sophisticated model for this binary classification problem using machine learning. It is based on the following intuitions:

• Browsing occurs in spans of activity: within a continuous browsing span, the navigation activities will be densely packed.
• The domain may influence the expected duration of the visit: it may be a domain that users tend to stay on for shorter or longer.
• The domain also influences how frequently navigation events will occur: consider a domain that displays content in a paginated format (where navigation events will occur frequently and will be recorded in the history), versus a single-page application with infinite scrolling (where no navigation events will be recorded in the history).

Figure 2. Performance of our machine learning method, versus various simpler approaches, on the task of predicting whether the user’s browser is active at a particular second within the browsing session.

We capture the importance of the domain and the browsing spans using the following features for classification. For all time-based features, we used the logarithm of the duration.

• Time between the most recent activity and next activity in the history. If short, then the user is likely actively browsing during that entire timespan.
• Time since the most recent activity in the history. If short, the user is likely still on that page.
• Time until the next activity in the history. If short, the user may have just switched back to the browser window but has not yet made a navigation event.
• Domain on which the previous browsing activity occurred in the history (categorical feature with 20 categories, representing the top 20 most popular domains in the training data).
• Domain on which the next browsing activity occurs in the history (categorical feature with 20 categories).
• RescueTime productivity level of the domain (categorical feature with 5 categories, drawn from the RescueTime community).

For the two categorical features with domains, the 20 domains we consider are the ones that had the most visits among the users in the training set. We consider only the top 20 domains because of the way categorical features are turned into a binary vector with length equal to the number of possible categories (in our case, 20 binary features to represent 20 possible domains), through a process known as one-hot encoding.
To avoid the curse of dimensionality (which would lead to increased model complexity, training time, and overfitting), we consider only the top 20 domains.

Productivity levels assign one of 5 categories to each domain: very productive, productive, neutral, distracting, or very distracting. These classifications were drawn from RescueTime, which obtained them from annotations by their userbase. Domains for which RescueTime does not have a productivity level are assigned a default neutral level. As this is also a categorical feature, it is transformed into a length-5 binary feature vector via one-hot encoding.

We then train a random forest with these features, using H2O’s implementation of the random forest algorithm with the default parameters [1].

Our model achieves an F1 score of 0.84 and accuracy of 0.80 on the task of predicting whether the browser is active or not at a given second of time. In Figure 2, we show the performance of our model on the task of classifying each in-session second as either active or inactive, compared to each baseline. Our model successfully classifies all out-of-session samples as true negatives, so on the task of predicting whether the browser is active or not at all times (including out-of-session, e.g., while the user is sleeping), the precision, recall, and F1 scores remain equal, while accuracy rises to 0.96.

Figure 3. In this graph, for each user in our test set, we plotted a point for the total active time they spent online (x-coordinate is the actual time, and the y-coordinate is the time our algorithm estimates).
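The feature encoding above (log-scaled durations plus one-hot domain vectors) can be sketched as follows. This is a simplified stand-in: the real model uses the top 20 domains, an additional productivity-level feature, and H2O's random forest; the 3-domain vocabulary and field choices here are illustrative placeholders.

```python
# Sketch of the feature encoding fed to the browser-active classifier:
# log durations plus one-hot domain blocks. Unknown domains encode as
# all zeros, mirroring the top-20 cutoff described in the text.
import math

TOP_DOMAINS = ["facebook.com", "youtube.com", "mturk.com"]  # really the top 20

def one_hot(domain, vocab):
    return [1.0 if domain == d else 0.0 for d in vocab]

def log_duration(seconds):
    return math.log(max(seconds, 1.0))  # clamp to avoid log(0)

def featurize(gap_s, since_prev_s, until_next_s, prev_domain, next_domain):
    return ([log_duration(gap_s), log_duration(since_prev_s), log_duration(until_next_s)]
            + one_hot(prev_domain, TOP_DOMAINS)
            + one_hot(next_domain, TOP_DOMAINS))

vec = featurize(30, 5, 25, "facebook.com", "example.org")
# vec: 3 duration features followed by two 3-wide one-hot blocks
```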
TOTAL TIME EACH USER SPENT BROWSING
Now that we have reconstructed whether a user is actively using the browser at any given point in time, we can estimate the total amount of time each user spent online. In Figure 3, for each user we have plotted the reference time the user spent online against the time our algorithm estimates that the user spent online (reconstructed by summing the active seconds predicted by our classifier). The result is well-correlated, with an R value of 0.96.

If we consider for each user the absolute error normalized by the total reference time spent online, and take the mean across users, the mean normalized absolute error for our predicted total online times is 0.15 (σ = 0.14). If we had instead used the 5-minute-threshold classifier for determining the active-browsing times for each user, the mean normalized absolute error for our predicted online times would be 0.19 (σ = 0.21).

WHICH DOMAIN WAS FOCUSED?
Now that we have an estimation of when the browser was active, we can determine which domain was focused at any given point in time. The active domain often does not match the most recent navigation event in the history, because users switch tabs or keep multiple windows open.

If we had only aggregate data about users’ browsing activity, we might simply always predict that the user is on the domain that they spend the most time on. This approach predicts the domain correctly on 31.6% of seconds (among the seconds the browser is active, on the users in the test set).

If we have browsing history data for a user, a simple heuristic for predicting which domain the user is on is to assume that they are browsing the page that was visited most recently. This is able to predict the domain correctly on 74.2% of seconds in the dataset (among the seconds the browser is active, on the users in the test set).

We developed a more sophisticated model which treats this as a multi-class classification problem. Our model attempts to decide between 4 classes for the domain the user is currently on:

• The domain in the most recent navigation event in the history, which we will refer to as C (for “current”).
• The domain in the next navigation event in the history, which we will refer to as N.
• The domain before C in the history (not matching C), which we will refer to as P1 (“past, one back”).
• The domain before P1 in the history (not matching C or P1), which we will refer to as P2 (“past, two back”).

The intuition behind our model is that if a user has tabbed over to a different tab, it must have been opened at some point in the past; this is what P1 and P2 are designed to keep track of (they approximate potential tabs that might be open in the background). The type of domain also matters: users are more likely to keep certain common sites, such as Facebook or Gmail, open in the background than other pages.
Finally, if a user has switched to a different tab, they may eventually navigate to another page from it, which will appear in the browsing history.

We chose these 4 classes because they account for most of the domains the user is on during browsing: in our test data, 92.4% of active browsing time will be on one of these domains. (As for the remaining 7.6% of time, for 6.3% the domain visit appears further back in the browsing history, while 1.3% does not appear in the history at all; we will discuss reasons for this in the Discussion section.)

Note that these classes can overlap (i.e., N can equal C, P1, or P2). In these cases, for the purpose of labeling samples in our training data, we labeled the sample as the most common class it could belong to (where the order of commonness is C, N, P1, P2). In the 7.6% of seconds where the active domain did not match any of C, N, P1, or P2, we did not include the sample in our training data.

The features we used are described below. For all time-based features, we used the logarithm of the duration. We will use the shorthand t(C) to refer to the time of the most recent navigation event in the history, t(N) to refer to the time of the next navigation event, t(P1) to refer to the time of the most recent history event where P1 appears, and t(P2) to refer to the time of the most recent history event where P2 appears.

Figure 4. Confusion matrix for our random forest, which classifies each active second of browsing as either the domain seen most recently in the history (C), the next domain in the history (N), the domain before C in the history (P1), or the domain before P1 in the history (P2).

• Time between t(C) and t(N). If short, the user will likely not be tabbing to other locations.
• Time that has elapsed since t(C). If short, the user is likely still on domain C.
• Time until t(N). If short and N was already open as a tab, the user may have switched to domain N.
• Time that has elapsed since t(P1). If long, the user is less likely to be on P1.
• Time that has elapsed since t(P2).
• Whether the referring visit id (the source page for the navigation event) of N equals the visit id of C. If true, the user had likely stayed on C prior to opening N.
• Whether the referring visit id of N equals the visit id of P1. If true, the user had likely switched tabs to P1 prior to opening N.
• Whether the referring visit id of N equals the visit id of P2.
• Which domain C, N, P1, and P2 are (each is a categorical feature with 20 categories).
• Whether N is the same domain as C, P1, or P2 (these 3 binary features help resolve overlap in classes).

The categorical features representing domains represent the 20 most common domains in the training set. The referring visit id feature makes use of a metadata field accessible via Chrome’s history API which tells us which prior link a particular visit came from, if it was accessed by clicking a link.

We then train a random forest with these features, using H2O’s implementation of the random forest algorithm with the default parameters [1].

Our model correctly predicts the domain in 82.5% of seconds where it is one of C, N, P1, or P2. The confusion matrix is shown in Figure 4, showing that most errors involve rarer classes being mispredicted as more common classes. However, because in 7.6% of the seconds the domain is not one of C, N, P1, or P2 and hence cannot be correctly classified by our model, our algorithm predicts the correct domain 76.2% of the time (among the seconds the browser is active, on the users in the test set).

Figure 5. In this graph, for each user in our test set, we plotted a point for each domain they visited representing the time spent on the domain (x-coordinate is the actual time, and the y-coordinate is the time our algorithm estimates).
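The construction of the four candidate classes (C, N, P1, P2) from the history can be sketched as follows. The `(time, domain)` visit dicts are a hypothetical flattening of the history data.

```python
# Sketch: derive the four candidate domain classes for a query time t
# from a time-sorted list of history visits. C is the most recent visit,
# N the next one, and P1/P2 the most recent past domains differing from
# C (and from each other), approximating background tabs.

def candidate_classes(visits, t):
    """visits: time-sorted [{"time": ..., "domain": ...}]."""
    past = [v for v in visits if v["time"] <= t]
    future = [v for v in visits if v["time"] > t]
    C = past[-1]["domain"] if past else None
    N = future[0]["domain"] if future else None
    P1 = next((v["domain"] for v in reversed(past) if v["domain"] != C), None)
    P2 = next((v["domain"] for v in reversed(past)
               if v["domain"] not in (C, P1)), None)
    return {"C": C, "N": N, "P1": P1, "P2": P2}

classes = candidate_classes([
    {"time": 0, "domain": "a.com"},
    {"time": 10, "domain": "b.com"},
    {"time": 20, "domain": "c.com"},
    {"time": 40, "domain": "d.com"},
], t=25)
```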
HOW MUCH TIME WAS SPENT ON EACH DOMAIN?
Having now reconstructed the domains where users spent their time on a second-by-second basis, we can now evaluate how accurately this can be used to compute overall time spent on domains. Overall time spent on domains is a useful piece of data that can be used in several time-tracking and productivity applications, as well as studies where time spent online or on a specific service is of interest.

In Figure 5, for each user in our test set, we plotted a point for each domain they visited, representing the relation between the actual time spent on the domain versus our combined prediction model’s predicted time (which was obtained by first determining the active seconds with our browser-active machine learning classifier, feeding these to our focused-domain machine learning classifier, and summing over the results).

Our reconstructed total time spent on each domain is well-correlated with the actual time spent. If we take the mean of the R value over all users, we achieve a mean R value of 0.92 (σ = 0.122). If instead of our machine learning classifiers we use the simpler 5-minute active-threshold and most-recent-domain heuristic classifiers, this achieves a mean R value of 0.91 (σ = 0.123).

We also computed for each user the absolute error summed over each domain prediction, and normalized it by the total time spent. With our machine learning classifiers, the mean of this normalized absolute error is 0.305 across users (σ = 0.129), while with the heuristic classifiers, the mean normalized absolute error is 0.344 (σ = 0.128).

DISCUSSION
We will now discuss sources of errors and limitations of our technique, and how they might be addressed.

If we look back to our plot of reconstructed total domain-focus times in Figure 5, we see that there are a handful of outliers where we predict a much lower amount of browsing than the reference. Many of these are due to rarer video sites or long single-page articles which are not among the top 20 domains. Here, users might spend several minutes actively browsing without any record in the history. A potential way to fix this issue is to estimate the amount of time it would take a user to consume the content within a given URL, using a headless browser. For example, for video content, we could scrape the page, see if there are any videos, and detect the length of the video. We might then predict whether the user had fully watched the video or not, based on whether the next event in the browsing history occurred around when the video would have finished playing. Analogously, for textual content, we might estimate the amount of time needed to read the article based on the amount of visible text, and develop a model to predict whether the user had fully read the article based on the surrounding browsing history. This information could then be used to correct our estimate of how much time the user had actually spent on the page.

Another underlying cause for some underestimates of time spent was that the user had partially cleared their browsing history; Chrome provides an option to clear browsing history from the past hour. Although we had attempted to exclude users who cleared their browsing histories from our training and test datasets, our technique had only detected when users had cleared at least a day’s worth of data (as our extension sent the histories to our servers on a daily basis).
Hence, when making computations and inferences using browsing histories, we must consider the possibility that the user may have partially cleared their history.

We had mentioned that during 1.3% of all active browsing seconds, the domain that the user was focused on did not appear anywhere in the preceding browsing history. Although part of this may have been due to users partially clearing their browsing histories, another cause was that URLs for certain non-http/https protocols are not logged in the history. Among these, the chrome://newtab page, which is visited when the user opens a new tab, accounts for nearly half (0.6% of total active browsing seconds), while other chrome:// URLs such as the bookmarks, downloads, settings, extension settings, and various extension-related pages contributed another 0.1% of total active browsing seconds.

An additional limitation of our technique is that browsing histories are not logged in incognito mode (Chrome’s term for its private browsing mode), so we might not be able to capture all browsing activity from users who use the incognito feature. However, this limitation is also shared by using a browser extension to log data, as browser extensions are disabled in incognito mode by default.
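The scheme-based distinction above can be sketched as follows: only http/https pages appear in the history, so active seconds spent on chrome:// pages can never be matched to a history event. The helper name is illustrative.

```python
# Sketch: flag URLs that can appear in the browsing history. Pages on
# non-http/https schemes (chrome://newtab, chrome://settings, ...) are
# never logged, contributing to the 1.3% of unmatchable active seconds.
from urllib.parse import urlparse

def is_history_visible(url):
    return urlparse(url).scheme in ("http", "https")

visible = is_history_visible("https://example.com/")   # logged in history
hidden = is_history_visible("chrome://newtab")          # never logged
```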
CONCLUSION
Browsing activity data, which tells us on a second-by-second basis whether the browser is active and which page is being viewed, is useful for many experiments and applications, but is difficult and time-consuming to gather, as it requires a longitudinal study. Browsing histories, in contrast, are easy to gather – we can access several months of browsing history data instantly simply by asking the user to install a Chrome extension – but do not capture key details, such as when the browser is in focus, when the user is actively browsing a page, and when the user switches windows or tabs.

In this paper we used browser histories to reconstruct estimates of 4 key elements of browsing activity: what times the browser is active, which domain the user is focused on when the browser is active, total time spent online, and total time spent on each domain. We first gathered a dataset by asking Mechanical Turk users to install our extension, which collects both longitudinal browsing activity data as well as browser logs. We then used this gathered dataset to train a pair of machine learning algorithms, one of which classifies whether the browser is active or not at a given second, and another which identifies which domain is focused when the browser is active. These metrics can be used to derive how much time the user spent on each domain, as well as the total time spent online.

These reconstructed browsing activities have many applications, both for research and productivity applications. For example, in the context of a productivity or time-tracking application, we can bootstrap the process with our estimates of time spent on each domain, allowing the user to see (approximate) results immediately based on several months of data.
They can also be used to develop more robust surveys and smarter interventions: rather than asking people to self-report how much time they spend on sites like Facebook, a survey can ask the user to install an extension that will locally compute an estimate based on the user’s browsing history, and fill out the question. If we collected these reconstructed browsing activities for a pool of potential participants and stored them in a database, experiments and interventions that target particular populations – for example, users who spend over four hours on Reddit each day – could much more effectively recruit participants based on their browsing activities.

Reconstructed browsing activity data could also potentially be used to gather data and identify patterns in browsing behaviors faster and at larger scale than the small datasets we can collect via longitudinal studies. We hope we might be able to use these at-scale reconstructed browsing activities to understand patterns of behaviors such as self-interruptions during web usage, and use this to develop interventions to improve users’ productivity. The ability to instantly reconstruct several months’ worth of browsing activities by asking the user to install an extension would open the gates to a new class of intelligent productivity-enhancement, survey, and data mining opportunities.
EXTENSION AND CODE
We have developed an open-source Chrome extension and reconstruction code that allows researchers to access an end-user’s reconstructed browsing activity and total time spent per-domain from their own websites, once the user has installed our extension. The Chrome extension is available at https://github.com/gkovacs/browserlog and the reconstruction code is at https://github.com/gkovacs/browsing-behavior-reconstuction-analysis
REFERENCES
1. Breiman, L. Random forests. Machine Learning 45, 1 (2001), 5–32.
2. Huang, J., White, R., and Buscher, G. User see, user point: gaze and cursor alignment in web search. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM (2012), 1341–1350.
3. Huang, J., White, R. W., and Dumais, S. No clicks, no problem: using cursor movements to understand and improve search. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM (2011), 1225–1234.
4. Mark, G., Iqbal, S., Czerwinski, M., and Johns, P. Focused, aroused, but so distractible: Temporal perspectives on multitasking and communications. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, ACM (2015), 903–916.
5. Mark, G., Wang, Y., and Niiya, M. Stress and multitasking in everyday college life: an empirical study of online activity. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM (2014), 41–50.
6. Mark, G., Wang, Y., Niiya, M., and Reich, S. Sleep debt in student life: Online attention focus, facebook, and mood.
7. Park, M., Naaman, M., and Berger, J. A data-driven study of view duration on youtube. In International AAAI Conference on Weblogs and Social Media (2016).
8. Wang, Y., Niiya, M., Mark, G., Reich, S. M., and Warschauer, M. Coming of age (digitally): An ecological view of social media use among college students. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, ACM (2015), 571–582.
9. Zhang, A. X., Blum, J., and Karger, D. R. Opportunities and challenges around a tool for social and public web activity tracking. In