Data@Hand: Fostering Visual Exploration of Personal Data on Smartphones Leveraging Speech and Touch Interaction

Young-Ho Kim, University of Maryland, College Park, MD ([email protected])
Bongshin Lee, Microsoft Research, Redmond, WA ([email protected])
Arjun Srinivasan*, Tableau Research, Seattle, WA ([email protected])
Eun Kyoung Choe, University of Maryland, College Park, MD ([email protected])

Figure 1: Data@Hand supports multimodal interactions to enable people to easily navigate and compare their personal health data on smartphones. People can execute a context-agnostic command, such as setting up a comparison by specifying two new periods, using a global speech button ○. They can feed a context to their utterance by touch, such as the start date ○, the target for comparison ○, or the time range for refining the view ○. (Please refer to our supplementary video, available at https://data-at-hand.github.io/chi2021, which demonstrates the interactions.)

ABSTRACT
Most mobile health apps employ data visualization to help people view their health and activity data, but these apps provide limited support for visual data exploration. Furthermore, despite its huge potential benefits, mobile visualization research in the personal data context is sparse. This work aims to empower people to easily navigate and compare their personal health data on smartphones by enabling flexible time manipulation with speech. We designed and developed Data@Hand, a mobile app that leverages the synergy of two complementary modalities: speech and touch. Through an exploratory study with 13 long-term Fitbit users, we examined how multimodal interaction helps participants explore their own health data. Participants successfully adopted multimodal interaction (i.e., speech and touch) for convenient and fluid data exploration. Based on the quantitative and qualitative findings, we discuss design implications and opportunities with multimodal interaction for better supporting visual data exploration on mobile devices.

* Arjun Srinivasan conducted this work while with Georgia Institute of Technology.
CCS CONCEPTS
• Human-centered computing → Visualization; Visualization systems and tools; Empirical studies in visualization; Ubiquitous and mobile computing systems and tools.

KEYWORDS
Personal informatics, data visualization, multimodal interaction, speech, smartphone
ACM Reference Format:
Young-Ho Kim, Bongshin Lee, Arjun Srinivasan, and Eun Kyoung Choe. 2021. Data@Hand: Fostering Visual Exploration of Personal Data on Smartphones Leveraging Speech and Touch Interaction. In CHI Conference on Human Factors in Computing Systems (CHI '21), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3411764.3445421
1 INTRODUCTION

Smartphones, equipped with high-resolution displays and powerful processors, are increasingly becoming a dominant way to access information [74]. A vast number of mobile health (or mHealth) apps, including wearable devices' companion apps (e.g., Fitbit App [30], Apple Health [3], Samsung Health [59], Garmin [33], and Mi Fit [36]), enable people to access their health and activity data collected over time. While these mHealth apps commonly employ data visualizations to help people view and understand personal data [2], they provide limited support for navigating and exploring the data. Furthermore, research on mobile data visualization is sparse [48]. Much of the mobile visualization research has been carried out with tablets [10], and only a handful of projects have recently begun to study data visualizations on smartphones (e.g., [11, 12, 61]) and smartwatches (e.g., [9, 13, 46]).

In this work, we investigate how to facilitate flexible data exploration on smartphones in the context of self-tracking data, while addressing several challenges smartphones pose. Due to their limited screen space, smartphones cannot afford a control panel of widgets (alongside the visualizations), which is an effective means to support dynamic queries [1]. It is distracting to navigate to a separate page to adjust the widgets and come back to the page with visualizations to see the effect. In addition, the lack of mouse input makes it difficult to perform two essential actions—(1) a precise selection and (2) details-on-demand using a mouse hover interaction—which are well supported in a desktop environment. Furthermore, while time is a primary dimension of self-tracking data, it is laborious to perform time-based interactions on smartphones, such as entering specific dates, times, and ranges. As a result, most mHealth apps tend to limit time manipulations. For example, the Fitbit App restricts people to viewing data by predefined time segments, such as one week, one month, three months, and one year. Inspired by previous research advocating the benefits of multimodal interaction [15, 20, 49, 50, 75], we incorporate an additional input modality, speech, to overcome these challenges. Speech-based interaction takes little screen space. Speech is flexible enough to cover the different ways that people specify dates (e.g., "Last Thanksgiving" or "Lunar New Year's Day") and date ranges (e.g., "2017" indicating the range from January 1, 2017 to December 31, 2017), which people are already familiar with.

Combining two complementary modalities, speech and touch, we designed and developed Data@Hand (Figure 1), a mobile app that facilitates visual data exploration. As a first step, informed by prior work on personal insights [17, 18], we support navigation and temporal comparisons of personal health data, as well as data-driven queries. To understand how speech and touch interaction can help lay individuals explore their data, we conducted an exploratory study with 13 long-term Fitbit users using Data@Hand.

We observed that participants successfully adopted multimodal interaction, using both speech and touch interactions while finding personal insights. Participants reported that they made deliberate choices between the two input modalities for a more convenient and fluid data exploration. Flexible time expressions enabled by speech-based natural language interaction helped them freely navigate data in a specific time frame (e.g., "Go to March 2020"), quickly set up comparisons (e.g., "Compare sleep ranges of winter and summer this year," illustrated in ○ in Figure 1), and easily execute data-driven queries (e.g., "Days I walked more than 10,000 steps last month"). Speech commands combined with touch input (e.g., ○, ○, and ○ in Figure 1) enabled easy modifications of the time components. For example, to change the start/end date, one can simply utter a specific date while holding on the start/end date label. Also, graphical widgets (e.g., calendar widget, data source drop-down list) served as a fallback to correct erroneous results of speech or to explore a set of categorical values. Being satisfied with their overall experiences, all but one participant expressed that they are willing to keep using Data@Hand after the study. The key contributions of this work are:

(1) The design and implementation of Data@Hand, the first mobile app that leverages the synergy of speech and touch input modalities for personal data exploration. Data@Hand helps people interact with their own personal data on smartphones by accessing their Fitbit data using the Fitbit REST API. It runs on both iOS and Android, using the Apple speech framework [4] and the Microsoft Cognitive Speech API [53] as speech recognizers. The Data@Hand source code is available at https://data-at-hand.github.io.

(2) An empirical study conducted with 13 long-term Fitbit users using Data@Hand. From the quantitative and qualitative analysis, we provide an understanding of how people explore their own data using speech and touch interaction on smartphones, uncovering situations and rationales for people's choice of interaction.

(3) Design implications and opportunities for multimodal interaction for mobile data visualization. Reflecting on our observations and participants' feedback, we draw design implications and opportunities for developing multimodal interaction to better support personal data exploration on mobile devices.
2 RELATED WORK

In this section, we cover related work in the areas of (1) visual exploration of personal data and (2) natural language and multimodal interaction for visual data exploration.
2.1 Visual Exploration of Personal Data

As collecting and reflecting on personal data has become commonplace, research on personal visualization has gained increasing attention [37, 49, 51, 73]. Personal visualizations equipped with interactivity enable visual data exploration, making it easy for people to understand and reflect on their data. As such, many Personal Informatics systems (e.g., [6, 18, 26, 28, 38, 72]) support visual data exploration to empower people to gain personal insights.

Because time is a primary dimension of personal data (or self-tracking data), systems that support visual data exploration strive to enable easy manipulation of the time component. For example, Visualized Self [18]—a web application that enables people to integrate and explore personal data from multiple self-tracking services—employs a timeline mini-map to enable rapid adjustment of the data scope. Activity River [6] takes a similar timeline-based approach, but its scope is fixed to a single day. Visual Mementos [72] supports visual exploration of personal location history; it incorporates a multidimensional selection widget for the precise scoping of event episodes in a series of location logs. Huang and colleagues [38] designed an on-calendar visualization tool that integrated people's physical activity data. They chose to leverage a calendar, an inherently time-based visualization with rich personal context, to make the data readily accessible. We note that these systems were designed for a desktop environment and did not investigate how their interfaces could be applied in a mobile environment with smaller screen space.

Commercial mHealth apps, including wearable devices' companion apps, also provide visual exploration capabilities. However, most of these apps are limited in terms of time navigation, making it hard to jump to an arbitrary time frame or to compare data from two different time frames. These commercial apps usually show daily information using a dashboard on their main page, aiming to promote self-awareness of the current performance. They are also constrained by the smartphone form factor, such as the small screen and imprecise touch input. Furthermore, existing widgets (e.g., calendar) for date entry are not flexible enough to handle the various ways to specify time. As a result, limited navigation support becomes a barrier to performing flexible data exploration, and in turn to obtaining personal data insights on smartphones.

The practitioner community has developed ample applications of data visualization in mobile apps and websites (refer to [56, 57] for curated practices). Data visualization is also commonly used on mobile form factors, such as smartphones and tablets, in research prototypes (e.g., [16, 22, 41, 42, 55, 60]) developed by UbiComp and Human-Computer Interaction researchers. However, research specifically focusing on mobile data visualization is sparse, and much of the mobile visualization research has been carried out with tablets [10]. As such, the research community has recently put effort into shaping a research agenda for mobile data visualization while calling for more research endeavors [14, 48, 49]. Although only a handful, mobile visualization studies have begun to pay attention to the smaller form factors (i.e., smartphones and smartwatches). They examined the effectiveness of visual representations (e.g., ranges on a timeline [11], animated transitions vs. small multiples of scatterplots [12], data comparison on smartwatches [9]) and interaction techniques (e.g., multivariate network exploration [25], pan-and-zoom timelines and sliders [61]). In addition, in their workshop paper, Choe and colleagues [15] envisioned a scenario where speech interaction could facilitate personal data exploration on mobile devices. Inspired by this line of research and vision, we contribute to mobile data visualization with Data@Hand, the first mobile app that leverages the synergy of speech and touch input modalities to augment personal data exploration.
2.2 Natural Language and Multimodal Interaction for Visual Data Exploration

Advancements in natural language understanding and speech recognition technology have promoted the design of natural language interfaces (NLIs) for data visualization [8, 21, 23, 32, 34, 35, 39, 40, 47, 62–65, 67, 69, 79]. These systems commonly exploit two advantages of natural language: (1) high flexibility in synthesizing multiple commands and parameters [21] and (2) low barriers in expressing intents and questions regarding the data [7, 66]. A majority of these systems are primarily designed for desktop settings and investigate typed natural language input (e.g., [21, 23, 32, 35, 62, 69]). On the other hand, another subset of prior systems focus on speech input and explore multimodal visualization interfaces that incorporate speech with other modalities such as pen and/or touch in post-WIMP [76] settings, including tablets [39, 64] and large displays [47, 65, 67].

Given the context of the smartphone form factor, the two existing systems most relevant to our work are the tablet-based systems Valletto [39] and InChorus [64]. Valletto allows people to specify charts through speech and then perform simple touch gestures, such as a rotate to flip axes and a swipe to change the visualization type. Exploiting the complementary nature of pen/touch and speech [19], InChorus illustrates a vocabulary of multimodal actions involving a wide range of visualizations for tabular data (e.g., selecting marks with a pen and saying "Remove others" to filter unselected points, pointing at an axis with a finger and speaking attribute names to specify data mappings).

Our work extends this line of research on multimodal visualization interfaces on mobile devices in two notable ways. First, unlike Valletto and InChorus, which were designed for tablets, Data@Hand is the first system specifically designed for smartphones. Correspondingly, we discuss the unique constraints we faced in the form of more limited screen space and lower precision with touch input [78], and how these constraints impacted our interface and interaction design (see DR3 in Section 3.1). Second, compared to prior NLIs that primarily target avid users of visualization tools (e.g., data analysts, developers, or managers in visualization-oriented products), our work targets a broader population of lay individuals interested in exploring their personal health data. In doing so, we highlight the role of lay individuals in the design of Data@Hand, and discuss in detail the interaction patterns individuals employed for performing tasks such as time manipulation, temporal comparisons, and data-driven queries with Data@Hand.
3 DATA@HAND

To enable flexible data exploration on smartphones, we designed and developed Data@Hand, a mobile app that leverages the benefits of two complementary input modalities, speech and touch. In this section, we describe our design rationales and the Data@Hand app, along with the implementation details.
3.1 Design Rationales

DR1: Use Simple and Familiar Visualizations to Support Lay Individuals.
Our target audience is people who collect their health and activity data using commercial wearable devices. They usually do not have expertise in data visualization and analytics. Therefore, we incorporated familiar visual representations and conventional charts that people commonly encounter in existing mHealth apps. For example, we used bar charts, line charts, and range charts to visualize daily measurement values (e.g., step counts and resting heart rates in Figure 1a and sleep ranges in Figure 2a). As for visualizing the aggregated data over a period, we designed custom representations called aggregation plots: a simplified version of box-and-whisker plots. We encoded only the average and the minimum and maximum values (for the range) because these metrics are more relevant to the personal tracking context than the median and percentiles. For example, Figure 2c shows the aggregated sleep ranges for February and August of 2020.
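To make this encoding concrete, the following minimal sketch (our illustration, not code from the Data@Hand repository; the DailyValue and AggregationStats names are hypothetical) computes the three statistics an aggregation plot shows from a period's daily values.

```typescript
// Illustrative sketch: the statistics an aggregation plot encodes
// (average, minimum, and maximum) for one data source over a period.
interface DailyValue {
  date: string;   // e.g., "2020-02-14"
  value: number;  // e.g., step count or resting heart rate for that day
}

interface AggregationStats {
  average: number;
  min: number;
  max: number;
}

function aggregatePeriod(values: DailyValue[]): AggregationStats | null {
  if (values.length === 0) return null; // nothing logged in this period
  const numbers = values.map((v) => v.value);
  const sum = numbers.reduce((acc, n) => acc + n, 0);
  return {
    average: sum / numbers.length,
    min: Math.min(...numbers),
    max: Math.max(...numbers),
  };
}
```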
DR2: Enable Flexible Time Manipulation to Help Identify Personal Insights.
Unlike general and broader visual data exploration, personal visualizations require different design considerations because of the nature of the personal data and the diverse personal data collection goals [37, 73]. In visual exploration with personal data (or self-tracking data), people look for specific personal insights, such as whether they achieved a certain personal goal (e.g., 10,000 steps per day), how their behaviors (e.g., step counts, sleep patterns) and emotional/physiological states (e.g., mood, heart rates) change over time, and what factors might have affected the changes (e.g., before & after the COVID-19 lockdown) [18]. Furthermore, as shown in prior work [17, 18, 51, 68], comparison by time segmentation is one of the most common visual exploration tasks that people actively perform to gain personal insights. However, flexible time manipulation—a key facilitator in drawing personal insights—is rarely supported in mHealth apps due to the limitations we described earlier (see Section 2.1). We strove to enable flexible time manipulation, focusing on navigation (Figure 1a) and temporal comparisons (Figure 1b & 1c and Figure 2c & 2d).
DR3: Leverage the Synergy of Speech and Touch Interactions on Smartphones.
Smartphones are increasingly becoming a dominant way to access information [74], and much of the personal data is collected from smartphones and wearable devices. As such, we wanted to facilitate easy access to people's own data on smartphones. To overcome the challenges smartphones pose, we leverage both speech and touch input modalities, which are complementary in nature: speech input affords a high freedom of expression without requiring much screen space, whereas touch input supports direct interaction [58, 64, 65, 67]. When combining the strengths of these two modalities, we aimed to provide a complementary set of operations [20] rather than providing an equivalent set of operations for each modality. We detail how we synergistically incorporated the two modalities in supporting a diverse set of operations on smartphones later in this section.
Data@Hand currently supports five health metrics that are retrieved from the Fitbit data sources: (1) step count, (2) resting heart rate, (3) sleep range, (4) hours slept, and (5) weight. For sleep, we included only one sleep log per day, the one marked as the main sleep. We chose the metrics based on their prevalence in commercial health apps according to a recent survey [44].

Data@Hand supports navigation, temporal comparison, and data-driven queries using four main pages: Home (Figure 1a), Data Source Detail (Figure 2a), Two-range Comparison (Figure 1b), and Cyclical Comparison (Figure 1c). As a default view, the Home page visualizes the past 7-day data for the five data sources. The Data Source Detail page shows detailed information for a single data source. Both comparison pages juxtapose aggregated measurement values, but in two different ways: the Two-range Comparison page plots two selected periods side by side, and the Cyclical Comparison page displays values within a specific period grouped by a predefined time cycle, such as days of the week (i.e., Sunday through Saturday) or months of the year (i.e., January through December).

Figure 2: Excerpt of the exploration flow in our usage scenario. ○ Zoe executes a data-driven query via natural language. ○ The system infers omitted parameters (e.g., a pre-selected date range as a comparison target) using the current screen information. ○ Zoe changes August 2019 to February 2020 by touch+speech interaction on the aggregation plot. ○ Zoe establishes the cyclical comparison to see this year's monthly trend.

These pages contain common interaction components. An app header and bottom toolbar are located on all pages (see Figure 1). The range widget on the app header is for manipulating date ranges, and the global microphone button on the toolbar is used to execute a speech command in a global scope—a command that is agnostic to the current context or page. We describe its functionalities in more detail below. The Home button is a shortcut that brings people back to the Home page, maintaining the current date range. The Compare button opens the configuration panel where people can configure the data source, comparison type, and date ranges to execute comparison queries.

People can execute data-driven queries by specifying a condition in natural language. Data@Hand responds by highlighting the matching data items in red (see Figure 2a). Also, the query bar (○ in Figure 2a), shown below the app header, contains parameter widgets for manipulating recognized parameters in a query (e.g., [Wake Time], [earlier than], [07:30 AM]) and shows the number of days that satisfy the condition (e.g., '5 days'). The system automatically updates the query result when people manipulate the time or data source on the screen, until they dismiss the query bar by swiping it to the left. The design was inspired by the ambiguity widgets from prior systems (e.g., [32, 62, 65, 67]).

Using the touch and speech input modalities, Data@Hand provides three types of interaction: touch-only, speech-only, and touch+speech. With Data@Hand, people can directly interact with graphical widgets with touch, as they would normally do with any mobile app. For example, they can tap on an item (e.g., label, chart) to select it, and swipe the range widget to shift the time frame back and forth. Using speech, people can issue powerful commands that are applicable to a global scope: for example, changing a data source and date range together with "Show me the step counts from this summer," or executing a data-driven query with "Highlight the days I walked more than 10,000 steps." To handle speech, Data@Hand adopts a "push-to-talk" technique: the system records the speech input while people are pressing the global microphone button.
Given that pressing the global microphone button serves only as a means to initiate speech interaction, we consider using speech with the global microphone button as speech-only interaction.

For the touch+speech interaction, people press on a target element while uttering speech commands that make use of the specific context the target element provides. For example, as shown in ○ in Figure 1, people can navigate to a different date range by uttering only a date (e.g., "January 1") while pressing on the "start date" part of the range widget. Furthermore, with touch+speech, people can perform a related command while keeping the same context. For example, as shown in ○ in Figure 1, after comparing the sleep ranges of winter and summer of this year (2020), they can simply utter "Summer 2019" while pressing on the aggregation plot for the winter to compare the summers of 2019 and 2020. As demonstrated in previous research [52, 64, 77], this context helps Data@Hand facilitate faster interactions and reduce the complexity of speech commands, and ultimately improves the user experience.
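To illustrate how the pressed element's context can disambiguate an utterance, the sketch below resolves a new date range from a parsed utterance plus the touched target. All type and function names here are hypothetical simplifications for exposition, not Data@Hand's actual code.

```typescript
// Illustrative sketch: combining a parsed utterance with the context
// supplied by the element the user is pressing (touch+speech).
type PressedElement =
  | { kind: 'none' }                                     // global microphone button (speech-only)
  | { kind: 'startDateLabel' }
  | { kind: 'endDateLabel' }
  | { kind: 'aggregationPlot'; side: 'left' | 'right' }; // a period plot on the comparison page

interface ParsedUtterance {
  date?: Date;                          // e.g., "January 1"
  period?: { start: Date; end: Date };  // e.g., "Summer 2019"
}

interface DateRange {
  start: Date;
  end: Date;
}

function resolveRange(
  current: DateRange,
  utterance: ParsedUtterance,
  pressed: PressedElement
): DateRange {
  if (pressed.kind === 'startDateLabel' && utterance.date) {
    // "January 1" while holding the start date label: keep the current end date.
    return { start: utterance.date, end: current.end };
  }
  if (pressed.kind === 'endDateLabel' && utterance.date) {
    return { start: current.start, end: utterance.date };
  }
  if (utterance.period) {
    // Speech-only, or pressing an aggregation plot: replace the whole range.
    return utterance.period;
  }
  return current; // nothing usable recognized; leave the view unchanged
}
```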
When people press the global microphone button or long-press a target element for touch+speech interaction, Data@Hand displays the speech input panel (Figure 3) while dimming the rest of the screen. If the target element is an aggregation plot, the speech input panel is embedded in a tooltip (Figure 3, left). As a preemptive guide, Data@Hand displays a contextual message (e.g., "Say a new date" for the start/end date) or example phrases on the panel (Figure 3, left) until people start to utter. While people are talking to the system, the panel displays a real-time dictation result (Figure 3, right) to make people aware of how their utterance is being recognized and to prevent the early release of the finger before the utterance is completely dictated.

After each execution of an operation, Data@Hand provides different types of feedback depending on its result. When Data@Hand could translate the utterance and execute a valid operation, it momentarily displays a confirmation message (○ in Figure 4) along with the undo button (○ in Figure 4) as a quick recovery option. If the translated operation is invalid, the system opens a contextual message dialog (○ in Figure 4). For example, if one utters just one date range using speech-only interaction on the Two-range Comparison page, the system suggests trying the same command using touch+speech interaction through the aggregation plot for disambiguation. When Data@Hand fails to translate the utterance, it informs people accordingly (○ in Figure 4).

Figure 3: The speech input panel that displays preemptive guides (left) and a real-time dictation result (right), before and during a speech interaction.

Figure 4: When the system could translate the utterance and execute a valid operation, it momentarily displays a confirmation message ○ along with the undo button ○. When the translated operation is not valid, the system opens a contextual message dialog ○. When the system fails to translate the utterance, it informs people accordingly ○.

Here we describe Data@Hand's interactions through a usage scenario: a self-tracker, Zoe, has been collecting data using a Fitbit band for almost five years. While this scenario emphasizes the speech-only and touch+speech interactions, most operations are also supported by touch-only interaction using graphical widgets. (Refer to our supplementary video to see more detailed interactions.)
Data Navigation & Data-Driven Queries.
Being curious about her long-term activity patterns, Zoe opens Data@Hand. The system initially shows the past seven days of data on the Home page. To extend the scope, she long-presses the start date and utters "January 1" (○ in Figure 1), and the system sets the date range to January 1, 2020 to today (August 27, 2020). Skimming the bar chart for step counts, she notices a dramatic drop in step counts since mid-March, which reminds her of the start of the COVID-19 lockdown. This plummet motivates her to explore her data since the lockdown took effect.

Zoe starts to explore her step counts, wanting to see how many days she achieved her daily step goal. She speaks "Days I met my step goal." Referring to her step count goal (10,000 steps) from her Fitbit account, Data@Hand highlights the days with step counts higher than 10,000. She finds that the highlighted days are dense earlier in the year but sparse since March. Zoe scrolls through the charts for other data sources. Once she reaches the chart for sleep range, she notices that her sleep has been pushed back since March. Seeing this, Zoe decides to take a detailed look at her sleep ranges. To narrow down the scope to a more recent period, she speaks "Sleep range of this month." The system opens the Data Source Detail page for August's sleep range. By asking "Days I woke up earlier than 7:30 AM," she learns that she woke up that early on just five days in August (Figure 2a).
Temporal Comparisons.
Zoe is curious about how her sleep differs from last year's. She utters "Compare with last August" (○ in Figure 2). Translating last August to August 2019, Data@Hand opens the Two-range Comparison page, comparing August 2019 against August 2020 (Figure 2b). Zoe notices that this August's average sleep schedule is shifted more than an hour later compared to August 2019. Also, the ranges of the bedtime and wake time in August 2020 are longer, implying an irregular sleep pattern. Zoe now wonders how the lockdown has affected her sleep. So, she changes the range from August 2019 to February 2020, the last full month before the lockdown, by uttering "February 2020" while long-pressing the aggregation plot for August 2019 (○ in Figure 2). She confirms that, compared to February, her sleep schedules for August are also shifted towards later hours and show more irregular bedtime and wake time (Figure 2c).

To see how the lockdown affected her sleep schedules through the monthly trend, she speaks "Show 2020 by month" (○ in Figure 2). Data@Hand opens the Cyclical Comparison page, with her sleep ranges in 2020 grouped by month (Figure 2d). She learns that her average sleep schedules have become more regular since May and that they have shifted to earlier hours. Zoe continues the exploration, switching to other data sources by swiping the data source widget on the app header.

We implemented Data@Hand in TypeScript [54] upon React Native [27] to support both iOS and Android. When a participant first signs in with a Fitbit account through OAuth 2, the system uses the Fitbit REST API [31] to download the Fitbit data of the entire period since the account creation to a local SQLite database. The system always uses the locally cached data to improve performance by minimizing network overheads.

We used the Apple speech framework [5] and Microsoft Cognitive Services [53] as speech-to-text recognizers on iOS and Android, respectively. We initially used the built-in speech recognizer of each OS. However, for Android, we decided to use Microsoft's Speech service because of the limitations of Android's built-in speech recognizer API. We appended a set of application-specific keywords (e.g., names of the data sources) and time expressions (e.g., "May" is likely to refer to a month rather than a verb) to the recognizers' vocabulary to improve accuracy for short phrases.

We implemented the system interpreter to work locally on the device. Receiving the recognized input text, the interpreter preprocesses it by performing part-of-speech tagging using the Compromise [43] JavaScript library and identifying parameters such as data sources, query conditions, and periods. To identify the time information mentioned in the input text, we used a customized version of Chrono [70], a natural language time parsing library. After the preprocessing, the interpreter infers the operation based on the tagged verbs and parameters, the current screen information, and the pressed element for the touch+speech interaction.
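The following sketch illustrates this preprocessing step using the two libraries named above (compromise for part-of-speech tagging and chrono-node for natural-language time expressions); the surrounding types and keyword matching are simplified assumptions for illustration, not the app's actual interpreter.

```typescript
// Illustrative sketch of utterance preprocessing: tag verbs, spot a data
// source keyword, and extract time expressions from the recognized text.
import nlp from 'compromise';
import * as chrono from 'chrono-node';

type DataSource = 'step count' | 'resting heart rate' | 'sleep range' | 'hours slept' | 'weight';

const DATA_SOURCES: DataSource[] = [
  'step count', 'resting heart rate', 'sleep range', 'hours slept', 'weight',
];

interface PreprocessedCommand {
  verbs: string[];          // tagged verb phrases, e.g., ["compare"]
  dataSource?: DataSource;  // first data-source keyword found, if any
  dates: Date[];            // time expressions resolved by chrono
}

function preprocess(utterance: string, referenceDate: Date = new Date()): PreprocessedCommand {
  const doc = nlp(utterance);
  const verbs: string[] = doc.verbs().out('array');
  const lower = utterance.toLowerCase();
  const dataSource = DATA_SOURCES.find((source) => lower.includes(source));
  // chrono resolves relative expressions ("last month") against a reference date.
  const dates = chrono.parse(utterance, referenceDate).map((result) => result.start.date());
  return { verbs, dataSource, dates };
}

// e.g., preprocess('Compare step count of January 2018 with January 2019')
// yields a "compare"-like verb, the 'step count' data source, and two dates.
```

A downstream step (not shown) would then map the tagged verbs and parameters, together with the current screen and the pressed element, to a concrete operation such as navigation, comparison, or a data-driven query.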
4 EXPLORATORY STUDY

We conducted an exploratory study with Data@Hand, employing a think-aloud protocol, to examine how multimodal interactions help people explore their data. As part of this study, participants interacted with their own Fitbit data on their smartphones. Due to the COVID-19 outbreak, all study sessions were held remotely via Zoom video call (in July 2020). In Section 4.2, we explain the precautionary actions we took to deliver a remote tutorial and to ensure close monitoring of the study session while mitigating potential privacy invasion. We refined both the system and the study protocol through six pilot sessions with Fitbit users recruited from Reddit. This study was approved by the Institutional Review Board of the University of Maryland at College Park.
4.1 Participants

We recruited 13 participants (P1–P13; nine females and four males) from Reddit by advertising the study on the subreddits for job postings in the United States. Our inclusion criteria were adults who (1) are native English speakers; (2) have used Fitbit wearables for at least six months and tracked at least three of the following measures: step count, heart rate, sleep, and weight; (3) are interested in looking at their Fitbit data; (4) are currently using an iPhone or Android phone; (5) have no visual, motor, or speech impairments; (6) have used voice recognition systems within the last six months with a generally positive experience; and (7) can understand simple charts.

The demographic and Fitbit usage information of our study participants is shown in Table 1. Participants' ages ranged from 23 to 46 (avg = 30.62). Ten participants were full-time employees, two were graduate students, and one was unemployed. At the time of the study, participants had used Fitbit for an average of 47 months based on their account creation date. Four participants had been tracking weight using Aria [29] or third-party Bluetooth scales (which share the data with Fitbit). The screen sizes of participants' smartphones ranged from 4.7 to 6.1 inches (eight participants used iPhones with a 4.7-inch screen). We offered a 30 USD Amazon gift card for their participation.

Table 1: Summary of demographics and the Fitbit experience of our study participants.

Alias | Age | Gender | Occupation | Fitbit usage | Fitbit wearable | Collected data
P1 | 27 | M | Client services trainer | 2y 4m | Versa lite edition | Step, HR, Sleep, Weight (Aria)
P2 | 27 | F | Chemical engineer | 4y 6m | Alta | Step, HR, Sleep
P3 | 28 | F | Freelance social media manager | 4y 10m | Versa | Step, HR, Sleep
P4 | 32 | M | Implementation specialist | 5y 1m | Ionic | Step, HR, Sleep, Weight (BT)
P5 | 35 | F | Freelance photographer | 5y 2m | Blaze | Step, HR, Sleep, Weight (BT)
P6 | 23 | F | Graduate student | 4y 8m | Alta HR | Step, HR, Sleep
P7 | 23 | F | Graduate student | 5y 6m | Charge 2 | Step, HR, Sleep, Weight
P8 | 46 | F | Product manager | 1y 7m | Versa | Step, HR, Sleep, Weight
P9 | 30 | F | Healthcare consultant | 5y 9m | Versa | Step, HR, Sleep, Weight (Aria)
P10 | 24 | M | Unemployed (varsity football player) | 1y 8m | Versa | Step, HR, Sleep, Weight
P11 | 28 | M | Software engineer | 4y 1m | Charge 3 | Step, HR, Sleep, Weight
P12 | 36 | F | Professional figure skater | 2y 1m | Alta | Step, HR, Sleep, Weight
P13 | 39 | F | Market survey manager | 4y 4m | Alta | Step, HR, Sleep, Weight

BT: Bluetooth scales which are from other vendors but can feed the data to Fitbit.
4.2 Study Procedure

A fully remote study using participants' own data required us to take extra precautions, such as mitigating potential invasion of privacy from the use of participants' own smartphones, preparing the training material, and establishing robust audio- and video-recording methods. Furthermore, because Fitbit allowed only 150 API calls per hour per account, we had to prefetch the Fitbit data before the study session. (Immediately after the study session, we deleted the participants' Fitbit data from our server.) To do so, we sent participants a link to a web page where they could sign an electronic consent form and fill out a pre-study questionnaire asking about their Fitbit usage patterns and experiences of using voice assistant systems. After completing the questionnaire, the participants were asked to sign in with their Fitbit accounts so that our crawler could cumulatively download participants' entire Fitbit data. We also delivered the Data@Hand app to participants through TestFlight (iOS) or Google Play beta testing (Android) so that they could install the app on their phone.
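A minimal sketch of such rate-limit-aware prefetching is shown below. The quota constant reflects the 150-calls-per-hour limit mentioned above; the request URLs and loop structure are assumptions for illustration, not the study's actual crawler.

```typescript
// Illustrative sketch: download a participant's data while respecting an
// hourly API quota by pausing when the per-hour budget is used up.
const HOURLY_QUOTA = 150;         // Fitbit allows 150 API calls per hour per account
const HOUR_MS = 60 * 60 * 1000;

async function fetchWithQuota(urls: string[], accessToken: string): Promise<unknown[]> {
  const results: unknown[] = [];
  for (let i = 0; i < urls.length; i++) {
    if (i > 0 && i % HOURLY_QUOTA === 0) {
      // Quota for this hour exhausted: wait for the next window before continuing.
      await new Promise((resolve) => setTimeout(resolve, HOUR_MS));
    }
    const response = await fetch(urls[i], {
      headers: { Authorization: `Bearer ${accessToken}` }, // OAuth 2 bearer token
    });
    results.push(await response.json());
  }
  return results;
}
```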
Participants joined a 90-minute study session via Zoom video call [80] from their computer. Figure 5 illustrates the setting of the remote study session. Via TeamViewer QuickSupport [71], participants shared their smartphone screen with the experimenter. Prior to the screen sharing, the experimenter instructed participants to remove any privacy-sensitive information from their home screen and to turn off all notifications. The experimenter then shared his monitor screen with the participant using Zoom's screen sharing feature, so the participant could see how their smartphone screen was being displayed to the experimenter. Using the recording feature in Zoom, the experimenter recorded the video call session, including the shared smartphone screen.
Tutorial.
After explaining the goal of the study, the experimenter gave a 40-minute tutorial, using an example dataset containing fabricated data generated based on the first author's four years of Fitbit data. The tutorial covered Data@Hand's key design components and interactions—data sources & charts, time manipulation, data navigation, temporal comparison, and data-driven queries. The experimenter introduced and demonstrated each feature using presentation slides and a video clip on a shared screen (refer to our supplementary material available at https://data-at-hand.github.io/chi2021), and gave participants a chance to practice before moving on to a new feature. In particular, participants were encouraged to practice the push-to-talk interaction while exploring the example dataset. We gave them example speech queries (via the shared Zoom screen) that they could use for practice, but also encouraged them to try out their own speech commands. We gave them enough time to practice until they felt comfortable using both the touch and speech input to interact with Data@Hand.
Free-form Exploration.
In this phase, the experimenter instructed participants to freely explore their own data with Data@Hand. We asked them to use any of the supported interaction modalities (touch, speech, touch+speech) of their choice. We also asked them to think aloud as they explored the data so that we could capture the insights they found and the challenges they faced, and understand their intentions and experiences. The experimenter observed how participants interacted with Data@Hand through the shared screen showing their smartphone screen and faces. We audio- and video-recorded each video call session, and the system logged the interaction history and uploaded the logs to our server. This free-form exploration phase lasted approximately 20 minutes.

Figure 5: Setting of the remote study session using video call and screen sharing. The participant shares the smartphone screen with the experimenter, and the experimenter shares his monitor screen with the participant.
Debriefing.
We conducted a semi-structured interview of around 10–15 minutes at the end of the session. We asked participants about their experiences with Data@Hand, difficulties and confusing features, and follow-up questions based on our observations in the exploration phase. We also asked them about the use cases of Data@Hand they could envision and their willingness to keep using the Data@Hand app after the study.
4.3 Analysis

We analyzed the video recordings and the interaction logs from the free-form exploration phase. We performed both quantitative and qualitative analysis to examine how participants used the speech and touch modalities in finding personal insights. For the quantitative analysis, we first extracted participants' interaction attempts to perform actions for data navigation, temporal comparisons, and data-driven queries by reviewing the exploration videos and interaction logs. We defined an interaction attempt as a series of low-level interactions (e.g., tapping, swiping) involved in obtaining a desired outcome. To modify a start date, for example, people may tap the start date on the range widget to invoke a calendar popup and then tap the target date. We treated this series of tap interactions as one attempt with touch-only interaction.

For the qualitative analysis, we analyzed the transcripts from the exploration phase to identify the types of personal insights, following Choe and colleagues' definition of a personal insight ("an individual observation about the data accompanied by visual evidence") [17, 18]. We extracted personal insights and categorized their types. For example, we extracted the following from P10's exploration session: (on the Cyclical Comparison page) "Just pretty interesting that I get my most steps on Saturdays." We coded this observation with three insight types: extreme ("most steps"), reference ("Saturdays"), and comparison by time segments (essential to identify the day with the most step counts in this case). We describe when and how participants gained the insights in Section 5.2.

We transcribed the audio recordings of the debriefing interviews, which were grouped according to the following aspects: (1) participants' rationales for choosing the input modalities; (2) new analyses/tasks/questions Data@Hand enabled; (3) challenges participants encountered; and (4) how participants envisioned the use cases of Data@Hand in their own contexts. We referenced this information when interpreting the video recordings and interaction logs, as well as to understand participants' general reactions to Data@Hand (reported in Section 5.3).
5 RESULTS

In this section, we report the results of our study in three parts: (1) interaction usage, (2) personal insights, and (3) general reactions to Data@Hand.
5.1 Interaction Usage

We identified 809 interaction attempts in total. Among these, 400 (49.4%) were touch-only, 281 (34.7%) were speech-only, and 128 (15.8%) were touch+speech interaction attempts. Among the 400 touch-only attempts, five were aborted by participants to perform the equivalent action using speech instead (e.g., P4 first opened a calendar picker to modify the start date, but he closed it and modified the start date using the touch+speech interaction). Among the 281 speech-only attempts, 32 failed due to recognition/interpretation errors, 16 were invalid actions (e.g., "Compare hours slept" without designating any comparative periods), and five were unsupported actions (e.g., attempting to execute a data-driven query on the aggregation plots). Among the 128 touch+speech attempts, eight were recognition/interpretation errors, and seven were invalid actions (e.g., uttering a date where a period is required). As a result, 736 (395 touch-only, 228 speech-only, and 113 touch+speech) interaction attempts were successfully executed, which we call operations from now on. Of these, we included only the 589 operations that are relevant to the three main features (time manipulation, temporal comparison, and data-driven queries) in further analysis (268 touch-only, 209 speech-only, and 112 touch+speech operations). The remaining 147 operations consisted of 127 touch-only and 19 speech-only operations for data source manipulation, and one touch+speech operation for a data-driven query. Figure 6 visualizes these operations by participant. In the following subsections, we describe participants' detailed usage of these input modalities for the three main features.
Time Manipulation. In total, participants manipulated time 470 times (see Table 2 for the summary). Participants specified time (T1–5; e.g., change the start/end date) or manipulated time as part of executing comparisons (C1–4) or data-driven queries (Q1). When manipulating time, participants actively used both speech and touch: to navigate to a new date range, participants used speech-only interaction 71 times (T3) and a calendar picker with touch 48 times (T1). To modify ranges on the Two-range Comparison page, participants tended to use touch+speech interaction on the aggregation plot instead of touch-only interaction. As shown in Table 3 (top), 12 participants used touch+speech 37 times (T5) while 3 participants used touch-only 13 times (T1).

When participants modified only the start or end date of the date range, their behaviors differed depending on the distance between their target date and the currently selected one. If the target date was close to the original one (especially within the same month), they preferred a calendar picker (T1) as it would require only a few taps. On the other hand, if the target date was far from the currently selected date (e.g., several months away), they preferred touch+speech interaction (T4), long-pressing the date label and mentioning the target date.

Three participants (P5, P11, P13) heavily used swipe to modify the date range (T2, 146 out of 170 times). For example, starting from the Data Source Detail page for weight in the year 2020, P13 swiped through to the year 2016, skimming the trend of each year (see swiping sequences in Figure 6). The touch-only swipe is a quick way to navigate through time using a preset date range.

Temporal Comparisons. Table 3 summarizes the operations to execute comparisons (C1–4), including the cases that modify time as part of the follow-up comparisons (T1–5). As shown in Figure 6, participants often performed a series of comparisons (184 operations, units in both green and yellow without an X mark) by refining the time range (or the data source). When executing comparison queries (C1–4), participants tended to choose the input modalities depending on the type of comparison. For two-range comparison, participants were inclined to use speech-only interaction (C2): only three participants used the Compare button to execute the two-range comparison query with touch-only interaction (C1, 4 instances). On the other hand, participants showed mixed patterns of modality for cyclical comparison: among the 12 participants who used the cyclical comparison, four participants (P1, P6, P12, and P13) used only touch (C3), two (P2 and P3) used only speech (C4), and the other six used both modalities.

Data-Driven Queries. Table 4 summarizes the operations dedicated to initializing and editing a data-driven query. Data@Hand supported only speech-only interaction to initialize a query (Q1), with touch-only interaction only for follow-up editing of the recognized parameters through the widgets on the query bar (Q2).

Figure 6: Sequences of operations—successfully executed interaction attempts—that are relevant to the three main features (time manipulation, temporal comparison, and data-driven query), by participant. Each unit on the horizontal axis represents one operation, and the color of the rectangles in a unit indicates the intended feature. The X mark indicates the initiation of a temporal comparison (and is thus only applicable to yellow rectangles; see C1–4 in Table 3). The border of a unit indicates the use of the speech modality (i.e., speech-only or touch+speech). A series of swipes to manipulate time is bundled or collapsed with a black horizontal line. This operation overview shows that participants used all three interaction types to perform various actions. Also, the prevalence of series of green+yellow units without an X suggests that participants often performed a series of comparisons with time refinement.
Table 2: Summary of operations that contributed to time manipulation, with the number of occurrences per participant and example utterances from participants. The Modality column indicates the input modality of the operation (T: touch-only, S: speech-only, TS: touch+speech). T1–5 indicate the operations to directly manipulate time, whereas C1–4 and Q1 indicate the operations to execute the comparison or data-driven queries with time parameters.

Action | Op. | Operation pattern (example utterance) | Modality | Total | Per-participant counts (non-zero)
Modify time directly | T1 | Tap start/end date label, then pick a date | T | 48 | 11 1 13 1 6 3 3 3 4 1 2
Modify time directly | T2 | Swipe range widget until reaching the target | T | 170 | 8 2 62 4 4 5 1 41 43
Modify time directly | T3 | Speak <(data source) period> ("Last 30 days" / "Step count in 2019") | S | 71 | 6 11 5 6 5 7 8 3 7 6 4 3
Modify time directly | T4 | Long-press start/end date label and speak <date> (start date + "January 1, 2019") | TS | 75 | 4 1 1 1 14 1 18 8 13 4 10
Modify time directly | T5 | Long-press aggregation plot and speak <period> (on the Two-range Comparison page, the left period plot + "January 2020") | TS | 37 (a) | 1 10 3 6 3 4 2 2 1 1 1 3
Execute comparison query | C1+3 | Tap Compare button, then configure parameters | T | 13 | 1 1 1 2 1 5 2
Execute comparison query | C2 | Speak ("Compare January 2018 with January 2019") | S | 30 | 2 4 6 3 3 1 3 2 1 1 3 1
Execute comparison query | C4 | Speak <(data source) cycle period> ("Show me sleep by month for 2020") | S | 16 (b) | 8 3 1 1 1 1 1
Execute data-driven query | Q1 | Speak <condition period> ("Maximum step count last month") | S | 10 | 1 1 3 5
Total | | | | 470 | P1: 25, P2: 34, P3: 37, P4: 17, P5: 86, P6: 34, P7: 16, P8: 13, P9: 39, P10: 21, P11: 65, P12: 22, P13: 61

(): There exist operations which do not include this parameter.
(a) The occurrences are a subset of the operations of the equivalent patterns in Table 3.
(b) The occurrences are a subset of the operations of the equivalent patterns in Table 4.
In total, participants executed data-driven queries 89 times (avg = 6.85). Eight participants edited a recognized query parameter using a parameter widget 24 times. A majority of the data-driven queries were invoked to identify extreme values (e.g., lowest step count) or days when they achieved a goal (e.g., days with step counts higher than 10,000). However, participants also used data-driven queries to identify unusual days (e.g., days with step counts less than 3,000 steps for lazy days [P5], or days with bedtimes later than 5:00 AM for days with sleep troubles [P12]).
5.2 Personal Insights

We extracted 367 data-driven observations (avg = 28.23, min = 10, max = 52) from the 13 participants and derived 694 personal insights. Table 5 summarizes the insight type categories, frequency, and example quotes for each category (refer to [18] for the definition of each category). Overall, participants gained various types of insights, covering most of those observed with a desktop personal data exploration tool [18] and from the data presentation videos of quantified-selfers (enthusiastic self-trackers) [17], except for three categories: comparison by multiple services, prediction, and distribution by category. In this section, we highlight notable categories and how Data@Hand supported gaining the insights.

Participants found 143 instances of comparison insights leveraging Data@Hand's two types of comparison: two-range comparison and cyclical comparison. They actively drew on existing knowledge, or external contexts, which were not captured in the data, in executing comparison queries; these contexts often served as a factor of comparison. For example, most participants were interested in comparing their activity level before and after the stay-at-home orders around mid-March 2020 caused by the COVID-19 outbreak; common patterns for this were to compare a recent month (after the lockdown) with the same month of previous years (e.g., July 2019 vs. July 2020), and to investigate the monthly trend of the year 2020. Other factors participants considered include job changes, the start of a new project, and school semesters. When participants were curious about a past period, they usually compared it with the recent period (e.g., "This month," "Last 90 days") as a reference.

Participants compared not only values (e.g., measurement values or their averages) but also other aspects of their data, such as trends and variability or consistency. On the comparison pages, participants often inferred the variability from the aggregation plots, as each plot showed the range of the values (e.g., "[Looking at the days-of-the-week comparison screen] ...the time I went to bed was most inconsistent on Mondays, Wednesdays, and Saturdays." – P7). About half of the variability instances (23 out of 49) were inferred from the aggregation plots. When comparing trends among different periods, however, participants relied on working memory, sequentially navigating to each period, because the comparison screens provided only the aggregated information.

Eight participants sought correlation insights by associating values from different data sources on the same day. The highlights provided in response to data-driven queries served as visual links, as the same days were highlighted across all data sources. Guided by data-driven queries, participants often scrolled through the charts on the Home page or switched the data source on the Data Source Detail page to identify similar patterns of peaks and drops against the superposed highlights. For instance, P13 highlighted days with a low resting heart rate (lower than 56) on the heart rate detail page, and then navigated to the step count page, finding the similarity between the highlighted days and the daily step count values on those days ("...the days my heart rate was low, was the days with my step count also, fairly low, for most days" – P13).
Table 3: Summary of operations that contributed to establishing a new comparison. C1–4 indicate the operations to execute a comparison query and T1–5 indicate the operations to directly manipulate time during the comparison, with their occurrences divided by the type of comparison.

Comparison type | Action | Op. | Operation pattern | Modality | Total | Per-participant counts (non-zero)
Two-range Comparison | Execute two-range comparison query | C1 | Tap Compare button, then configure period (period) | T | 4 |
Two-range Comparison | Execute two-range comparison query | C2 | Speak ("Compare January 2018 with January 2019") | S | 30 (a) | 2 4 6 3 3 1 3 2 1 1 3 1
Two-range Comparison | Modify time directly | T1 | Tap start/end date label, then pick a date | T | 13 | 3 8 2
Two-range Comparison | Modify time directly | T2 | Swipe range widget until reaching the target | T | 45 | 6 39
Two-range Comparison | Modify time directly | T4 | Long-press start/end date label and speak <date> | TS | |
Two-range Comparison | Modify time directly | T5 | Long-press aggregation plot and speak <period> | TS | 37 | 1 10 3 6 3 4 2 2 1 1 1 3
Cyclical Comparison | Execute cyclical comparison query | C3 | Tap Compare button, then configure (data source) cycle (period) | T | 22 | 1 2 1 4 5 1 1 1 4 2
Cyclical Comparison | Execute cyclical comparison query | C4 | Speak <(data source) cycle (period)> | S | 19 (b) | 9 3 1 1 2 1 1 1
Cyclical Comparison | Modify time directly | T1 | Tap start/end date label, then pick a date | T | |
Cyclical Comparison | Modify time directly | T2 | Swipe range widget until reaching the target | T | 54 | 13 4 1 36
Cyclical Comparison | Modify time directly | T3 | Speak <period> | S | 10 | 1 3 3 1 2
Cyclical Comparison | Modify time directly | T4 | Long-press start/end date label and speak <date> | TS | 17 | 5 4 7 1
Total | | | | | 259 | P1: 8, P2: 27, P3: 24, P4: 9, P5: 66, P6: 18, P7: 15, P8: 5, P9: 8, P10: 10, P11: 49, P12: 11, P13: 9

(): There exist operations which do not include this parameter.
(a), (b) The occurrences are a subset of the operations of the equivalent patterns in Table 2.

Table 4: Summary of operations to execute data-driven queries. Q1 indicates the operations to execute a query in natural language, and Q2 indicates the operations to use the parameter widgets on the query bar to edit the recognized query.

Op. | Operation pattern (example utterance) | Modality | Total | Per-participant counts
Q1 | Speak <condition (period)> ("Days I slept more than six hours") | S | 89 | P1: 4, P2: 6, P3: 9, P4: 1, P5: 3, P6: 4, P7: 19, P8: 6, P9: 11, P10: 4, P11: 9, P12: 7, P13: 6
Q2 | Tap parameter widget, then modify query parameter | T | 24 | 2 1 11 4 2 1 2 1 (non-zero)

(): There exist operations which do not include this parameter.
5.3 General Reactions to Data@Hand

In the debriefing interview, we gathered participants' feedback on Data@Hand. In general, participants expressed excitement about the flexible time navigation and comparison features Data@Hand offered. Participants described time manipulation in natural language as fast and flexible. For instance, P7 remarked, "I liked that I was able to change the date using the speech because I thought that was really easy rather than having to go through all of the different dates and months. And also it was cool to say like 'around this date' or 'this month' and it would get what you were talking about, whereas I think it could be hard to do in a traditional touch format."

Nine participants contrasted their experience with Data@Hand with their previous experience with the Fitbit App, in which it was tedious to reach a specific date or period. P1 mentioned, "If I was in the Fitbit App, I would have to go through a whole bunch of different screens to find one specific date and all of that data." Participants also appreciated being able to view two periods side by side on the Two-range Comparison page. Seven participants reported that they wanted a similar feature in the Fitbit App, with which they currently have to sequentially view the two periods while relying on working memory. P3 remarked, "It [Fitbit App] just isn't as efficient, but I definitely have gone and looked back. But it took so much time to do it, but by the time I got to the date from last year and then I got back to today, I forgot [the last year's value]."
Insight Type and Frequency (number of participants), with Example Quotes
Detail (263)
- Identify value (109, 12Ps): "What was that day? I got so much sleep. Wow, I got 13 and a half hours on that Monday." – P10
- Identify references (104, 13Ps): "Um.. wonder why it's slow in April." – P2; "It's pretty consistent, but in December I had a very broad range the hours." – P6
- Identify extreme (50, 12Ps): "It looks like December was probably my highest activity month if I look back at the whole trend." – P8
Comparison (143)
- Of two instances (66, 11Ps): "It's interesting to see my step count average is I was much more all over the place in October. But in February, I was much more consistent, which is better." – P1
- By time segments (43, 10Ps): "My average was cut about a half from January, February, down to March." – P4
- By factor (33, 10Ps): "My average (step count) was obviously higher six months ago than it is now because we're all locked in our houses." – P8
- Against external data (1, 1P): "I've been tracking what I eat. So I definitely used to weigh more. Look at that. this is like 150 lb." – P11
Recall (115)
- External context (79, 13Ps): "That [hourly step count chart] would remind me of what I did that day. I know, based on the time of day and the day of the week, that it was a hike that I went on and that's why I got the extra steps." – P8
- Confirmation (26, 8Ps): "I know I've been getting less sleep recently compared to before. That's what I wanted to see." – P11
- Contradiction (10, 5Ps): "...because nothing feels the same (after the COVID-19 outbreak). But it's interesting to see that the data looks less terrible than I expected, and I'm kind of happy." – P12
Value judgment (51, 10Ps): "My average bedtime is.. somewhere between 2 and 4 am.. pretty terrible." – P9
Variability (49, 12Ps): "Looks like I mean for everything in 2020, it seems to be my sleep is getting much more consistent, which is a good sign." – P1
Trend (42, 11Ps): "It's an increasing trend but like it's super low in the beginning of April which I find odd." – P2
Correlation (18, 8Ps): "The days that I have done the least amount of steps are the days of my heart rate is the lowest on average. That makes sense." – P3
Outlier (7, 3Ps): "Wow, there is definitely an outlier there." – P1
Data summary (6, 3Ps): "My average steps is 10760. Wow 4 millions (of total steps) *laugh* that's a lot. Range anywhere from eight to twenty four thousand, because sometimes I didn't wear it." – P6
Table 5: Visualization insight types identified from the transcripts from the exploration phase, frequency with the number of participants (Ps), and example quotes. The insight types are sorted by frequency.
Lastly, to gauge the utility of Data@Hand beyond the study session, we asked participants if they would be willing to use Data@Hand in real life. (Data@Hand on their phones will continue to work by retrieving recent data directly from the Fitbit server.) All but one participant said that they would keep the Data@Hand app for continued use after the study. One participant decided not to keep the app because she had not been engaging in physical activities as usual due to the COVID-19 lockdown.
In this section, we discuss lessons learned from the design and implementation of Data@Hand as well as from the exploratory study. We also reflect on implications for better supporting personal data exploration through multimodal interaction in mobile contexts.
Enabling flexible time manipulation was one of our key design rationales (DR2), which we strongly believe facilitates the insight-gaining process. We observed that Data@Hand's time manipulation feature was well received by participants, mainly due to the speech input that enabled flexible expressions of time that participants are already familiar with. To support time-based exploration of personal data, a system needs a time range specified by start and end time points (this is commonly applied in typical graphical range widgets). People, however, express time in a variety of ways, not necessarily specifying two time points. Examples include a range using the present date as an implicit anchor (e.g., "Last six months") and a range expressed with a semantic phrase (e.g., "This March," indicating the range from March 1, 2020 to March 31, 2020). Furthermore, people may not remember the date of notable events (e.g., the summer solstice), and some events are not represented with a fixed date (e.g., Thanksgiving in the United States). Accommodating these diverse time expressions in Data@Hand enabled participants to easily and flexibly set time ranges.

Major speech recognizers handle national holidays and seasons ingrained in our culture, but they do not know key personal events (e.g., job change, vacation, semester), which are important in exploring personal data. Thus, our study participants had to rely on their memory to perform time-based exploration using their personal contexts. Tagging personally meaningful events and being able to refer to them with speech could address this issue, offering great opportunities to enhance personal data exploration (e.g., "Compare my sleep range between spring semester and summer break").

Although our work focused on a much-needed mobile scenario of exploring personal health data, we envision that multimodal interactions leveraging time expressions can be similarly useful in other contexts. For example, flexible time navigation and comparison features can facilitate exploring personal data beyond the health and fitness domains, such as productivity, finance, and location history (e.g., "Compare last week's screen time with this week's," "Most expensive expense during the last quarter," "Show me the places that I visited in the last three months").
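As a concrete illustration of the flexible time expressions discussed above, the following sketch resolves a few spoken phrases into explicit start and end dates. It falls back to the Chrono date parser (cited in the references) for single dates; the handling of relative and semantic ranges here is a simplified illustration of the general idea, not Data@Hand's actual parsing pipeline.

```typescript
// Sketch: resolving a spoken time expression into an explicit [start, end] range.
// chrono-node is used only as a fallback for single dates; everything else is a
// deliberately simplified illustration, not Data@Hand's parsing pipeline.
import * as chrono from 'chrono-node';

interface DateRange { start: Date; end: Date; }

const NUMBER_WORDS: Record<string, number> =
  { one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7, eight: 8, nine: 9, ten: 10, twelve: 12 };
const MONTHS = ['january', 'february', 'march', 'april', 'may', 'june',
  'july', 'august', 'september', 'october', 'november', 'december'];

export function resolveTimeExpression(utterance: string, today: Date = new Date()): DateRange | null {
  // Relative ranges anchored at the present date, e.g., "last six months".
  const relative = utterance.match(/last\s+(?:(\w+)\s+)?(day|week|month|year)s?/i);
  if (relative) {
    const raw = (relative[1] ?? '').toLowerCase();
    const n = NUMBER_WORDS[raw] ?? (parseInt(raw, 10) || 1);
    const start = new Date(today);
    const unit = relative[2].toLowerCase();
    if (unit === 'year') start.setFullYear(start.getFullYear() - n);
    else if (unit === 'month') start.setMonth(start.getMonth() - n);
    else if (unit === 'week') start.setDate(start.getDate() - 7 * n);
    else start.setDate(start.getDate() - n);
    return { start, end: today };
  }

  // Semantic phrases denoting a whole month, e.g., "this March" => Mar 1-31.
  const monthly = utterance.match(/(this|last)\s+(january|february|march|april|may|june|july|august|september|october|november|december)/i);
  if (monthly) {
    const monthIndex = MONTHS.indexOf(monthly[2].toLowerCase());
    // Simplification: "this" means the current year, "last" the previous year.
    const year = today.getFullYear() - (monthly[1].toLowerCase() === 'last' ? 1 : 0);
    return { start: new Date(year, monthIndex, 1), end: new Date(year, monthIndex + 1, 0) };
  }

  // Fall back to Chrono for single dates ("January 1", "next Friday", ...).
  const single = chrono.parseDate(utterance, today);
  return single ? { start: single, end: single } : null;
}
```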
Our quantitative results showed that participants used both speech and touch modalities, individually and in tandem, performing all three types of interactions—touch-only, speech-only, and touch+speech. Our observations and participants' feedback also suggest that participants made deliberate choices between the two input modalities. In the debriefing interview, participants distinguished the advantages of the two modalities.
Speech interaction was generally considered to be fast and flexible, especially when making big shifts in terms of time ranges or executing commands involving multiple keywords (e.g., "Compare step counts of this month and last month," from the Home page). P13 remarked, "I would like to be able to use voice for more things just because it's easier and I've found myself just sometimes thinking like, 'Oh, would this be quicker if I use a voice command?' 'Would it be easier if I use a voice command?'"

On the other hand, touch interaction was preferred in some cases, such as shifting time frames successively with swipe (see the swiping sequences from Figure 6) or choosing a data source from a list with a tap. This is also reflected in the high number of touch-only operations for data source manipulation (127 out of 146). In addition, participants resorted to touch when they had difficulty remembering the exact keywords for a speech command (e.g., "month-of-the-year" for the cyclical comparison). Participants favored the touch+speech interaction when refining pre-executed commands (e.g., uttering a new date while pressing on the date button to modify only the start date). P4 noted, "I felt much more confident doing that [touch+speech] because I knew that it was only got to manipulate that one aspect of a comparison chart or only the start date, instead of having to be more precise with my speech in what I was asking."

We observed two different patterns in participants' use of speech commands—natural language (e.g., the two-range comparison and data-driven queries) and keyword-based utterances (e.g., uttering "Hours slept" to set the data source). They impose different technical challenges. The natural language commands were sensitive to the linguistic structure of participants' utterances. All 18 interpretation errors (the system recognized the utterance correctly but did not interpret the recognized text successfully) occurred during the natural phrasing of commands. To prevent such errors and improve the interpretation coverage of the system, we can collect utterance examples via crowdsourcing or from pilot studies to identify common linguistic structures people use to perform interactions. On the other hand, keyword-based commands were vulnerable to recognition errors [45] with the generalized speech-to-text recognizers we used. All eight errors related to the keyword phrases occurred at the recognizer level. Such recognition errors may be prevented by training the recognizer with keywords as shown in recent projects [64, 65]. However, using a customized recognizer is currently not feasible for smartphone apps without involving a remote server, which may cause additional delays and thus hamper the user experience.
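One lightweight, generic mitigation for such recognizer-level errors on keyword commands (not something Data@Hand implements) is to match the recognized transcript against the known keyword vocabulary by normalized edit distance, so that a near-miss such as "ours slept" still resolves to "hours slept." A minimal sketch:

```typescript
// Sketch: tolerant matching of a recognized transcript against a fixed keyword
// vocabulary (e.g., data source names). Illustrative only; the paper instead
// discusses training the recognizer itself on the keyword set.

const KEYWORDS = ['step count', 'hours slept', 'sleep range', 'resting heart rate', 'weight'];

// Classic dynamic-programming Levenshtein edit distance.
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Returns the closest keyword whose normalized distance is within a tolerance.
function matchKeyword(transcript: string, tolerance = 0.25): string | null {
  const text = transcript.trim().toLowerCase();
  let best: { keyword: string; score: number } | null = null;
  for (const keyword of KEYWORDS) {
    const score = editDistance(text, keyword) / Math.max(text.length, keyword.length);
    if (score <= tolerance && (!best || score < best.score)) best = { keyword, score };
  }
  return best ? best.keyword : null;
}

console.log(matchKeyword('ours slept'));  // "hours slept"
console.log(matchKeyword('set count'));   // "step count"
```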
Our main design goal with Data@Hand was to support the visual exploration of self-tracking data on smartphones. The smartphone form factor and the personal data context led to design choices that differ from general-purpose multimodal data exploration systems on tablet devices, such as InChorus [64] and Valletto [39]. The multimodal interaction of InChorus and Valletto focuses on constructing visualizations (e.g., performing data binding and visual encoding, specifying chart types), and their exploration is driven by attributes in a given table. For example, InChorus incorporates a wide range of multimodal interactions to support flexible visualization construction on tablets: people can choose the input modality they prefer, such as drag-and-drop, point & tap (using two fingers), point and write (with a pen), or speak (uttering a command with speech), to perform a mapping between data attributes and chart elements, such as axes and legends. Valletto supports simple touch gestures such as swiping and rotating for manipulating visual encoding (e.g., flipping the axes) and provides a persistent chat panel where people can speak to generate a new chart, exclude/include attributes, or ask analytic questions such as the correlation between two attributes. However, it would be challenging to provide all of such interactions on a smartphone, which has a smaller screen, needs to be held with a non-dominant hand, and may not support a pen.

Furthermore, in the personal data context, especially to assist lay individuals, it is not necessary to support flexible visualization construction: it is more important to facilitate easy navigation and comparison across the time dimension in a given chart (e.g., bar chart and line chart). Therefore, Data@Hand's multimodal interaction focuses on the manipulation of time parameters, while reducing the complexity of the interface.

Our study results suggest that participants can learn and use such multimodal interaction to find various insights from their self-tracking data, and their reactions were generally positive. We believe that Data@Hand's multimodal interaction achieved a good balance between flexibility and learnability by carefully considering the smartphone form factor and the personal data context together. In addition, we note that, for personal data exploration, Data@Hand's interaction can be transferred to tablet form factors.
While describing their real-world use cases in the debriefing, seven participants noted that they would be inclined to use only touch in public spaces for two main reasons: (1) they did not want to disturb others and (2) they were afraid that surrounding people might feel awkward seeing them verbalizing health-related queries. For example, P3 mentioned, "If you're only able to use speech, there's no privacy. You can't be at the doctor's office and be like, 'Tell me how much weight I've lost' or 'What day was I the fattest last year.'" These remarks align with the findings from previous research that privacy concerns can discourage the use of voice interaction in close proximity to other people [24], thus potentially limiting the applicability of speech-incorporated multimodal interaction in public settings.
According to our design rationale (DR1), we used basic charts that many people are already familiar with, such as bar charts and line charts. We also designed a new representation (i.e., aggregation plots) to support temporal comparison, which requires data aggregation. Furthermore, to efficiently communicate results for the data-driven queries, we enhanced basic charts with a highlighting capability while presenting data without aggregation (i.e., one bar/dot per day) on the Home page.

We see opportunities to improve visual representations to further enhance data exploration experiences, especially with long-term data. While participants could view year-long data without aggregation (which was shown to be readable in recent research [11]), the current charts would not scale to view a longer term beyond a year. One straightforward solution would be to use our aggregation plots with data grouping (e.g., by week, by month, by year), a common approach in existing mobile apps. However, the lower level of detail induced by the aggregation makes it difficult to highlight particular data points. It is an open research question how to effectively show query results on these aggregation plots, for example, highlighting days with steps over 10,000 on aggregation plots grouped by month.
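As one possible shape for such grouping, the sketch below aggregates daily values by month (min, max, mean) and attaches a per-month count of days that satisfy a query condition such as steps over 10,000; surfacing that count is merely one simple way the open question above might be approached. The data shapes are illustrative assumptions, not Data@Hand's implementation.

```typescript
// Sketch: grouping daily values by month and attaching a per-month count of
// days that satisfy a data-driven query (e.g., steps over 10,000).
// Illustrative data shapes; not Data@Hand's implementation.

interface DailyRecord { date: string; value: number; }   // date as "YYYY-MM-DD"

interface MonthlyAggregate {
  month: string;        // "YYYY-MM"
  min: number;
  max: number;
  mean: number;
  matchingDays: number; // days in this month satisfying the query condition
}

function aggregateByMonth(records: DailyRecord[], matches: (value: number) => boolean): MonthlyAggregate[] {
  const groups = new Map<string, number[]>();
  for (const r of records) {
    const month = r.date.slice(0, 7);
    if (!groups.has(month)) groups.set(month, []);
    groups.get(month)!.push(r.value);
  }
  return [...groups.entries()]
    .map(([month, values]) => ({
      month,
      min: Math.min(...values),
      max: Math.max(...values),
      mean: values.reduce((sum, v) => sum + v, 0) / values.length,
      matchingDays: values.filter(matches).length,
    }))
    .sort((a, b) => a.month.localeCompare(b.month));
}

// Example: monthly step aggregates with a count of 10,000+ step days.
const steps: DailyRecord[] = [
  { date: '2020-03-01', value: 12034 },
  { date: '2020-03-02', value: 4210 },
  { date: '2020-04-01', value: 10980 },
];
console.log(aggregateByMonth(steps, v => v > 10000));
```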
We had to address a number of challenges to convert our original plan of an in-person study into a fully remote one due to the COVID-19 outbreak. Here, we share some of the challenges and issues we encountered and how we alleviated them. First, it was infeasible to effectively demonstrate multimodal interactions (e.g., the push-to-talk recording) during the remote tutorial. We therefore prepared a video clip with subtitles and played it during the tutorial to introduce interaction methods to our participants. Second, we did not have control over participants' environment. Before running the pilot sessions, we sent a checklist to our participants (e.g., turning off smartphone notifications, connecting the laptop to a power cable) and asked them to follow the instructions. However, some participants did not actually comply with the provisions, even if they confirmed that they did. Furthermore, participants were occasionally distracted by pets or family members. To alleviate such issues in the main study sessions, we thoroughly checked whether participants had turned on the most strict do-not-disturb mode prior to the screen sharing. Also, when participants were interrupted, we paused the session and asked them to handle the situation (e.g., closing the room door).

On the other hand, we were pleasantly surprised by unexpected advantages that our original in-person protocol would not have offered. First, because we were less constrained by time and location, we could reach a broad audience with diverse backgrounds and occupations. Second, the screen sharing app enabled us to record the smartphone screen in high resolution, supporting better observations. Third, research team members in remote locations could attend and observe the study sessions without much interference (by turning off the webcam and muting the microphone). We demonstrated that a remote study can be a viable option for deploying and testing multimodal interactions in a mobile app, and we hope that this study can inform other researchers wanting to design and run similar types of remote studies.
We presented Data@Hand, a novel mobile app that combines two complementary input modalities, speech and touch, to support exploring personal health data on smartphones. Data@Hand supports three types of interactions—touch-only, speech-only, and touch+speech—to enable flexible time manipulation, temporal comparisons, and data-driven queries. To examine how multimodal interaction helps people explore their own data, we conducted an exploratory think-aloud study with 13 long-term Fitbit users. Participants successfully adopted multimodal interaction and used both speech and touch interactions while finding personal insights. We also learned when and why people choose one interaction modality over others. We highlighted several areas for future research, including incorporating personally meaningful events and contexts, improving the recognition and interpretation of speech commands, and refining visualizations for further enhancing data exploration. We also showed that a remote study can be a viable option for deploying and testing a mobile app with multimodal interaction.

In summary, our work contributes the first mobile app that leverages the synergy of speech and touch input modalities for personal data exploration, and a study conducted with participants' own long-term data on their own devices. We hope this work can inform and inspire researchers in the visualization and broader HCI communities to leverage multimodal interactions to foster fluid and flexible data exploration on smartphones.
ACKNOWLEDGMENTS
We thank our study participants for their time and efforts. We also thank Niklas Elmqvist, Snehesh Shrestha, and the HCIL members at the University of Maryland, College Park, who provided feedback on an early version of this paper. Jarrett Lee, Lily Huang, Rachael Zehrung, Yuhan Luo, Pramod Chundury, and Ignacio Jauregui also helped us improve the tutorial protocol and material. This study was in part supported by the National Science Foundation award
REFERENCES
[1] Christopher Ahlberg and Ben Shneiderman. 2003. Visual Information Seeking: Tight Coupling of Dynamic Query Filters with Starfield Displays. In
The Craft ofInformation Visualization . Morgan Kaufmann Publishers Inc., San Francisco, CA,USA, 7–13. https://doi.org/10.1145/191666.191775 [2] Majedah Alrehiely, Parisa Eslambolchilar, and Rita Borgo. 2018. A Taxonomy forVisualisations of Personal Physical Activity Data on Self-Tracking Devices andTheir Applications. In
Proceedings of the 32nd International BCS Human Computer Interaction Conference 32. BCS Learning and Development Ltd., Swindon, UK, 1–15. https://doi.org/10.14236/ewic/HCI2018.17 [3] Apple Inc. 2021. Health. Retrieved Jan 05, 2021 from [4] Apple Inc. 2021. SIRI shortcuts boost health and fitness routines - Apple Newsroom. Retrieved Jan 05, 2021 from [5] Apple Inc. 2021. Speech | Apple Developer. Retrieved Jan 05, 2021 from https://developer.apple.com/documentation/speech [6] Bon Adriel Aseniero, Charles Perin, Wesley Willett, Anthony Tang, and Sheelagh Carpendale. 2020. Activity River: Visualizing Planned and Logged Personal Activities for Reflection. In
Proceedings of the 2020 International Conference onAdvanced Visual Interfaces (AVI ’20) . ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3399715.3399921 [7] Jillian Aurisano, Abhinav Kumar, Alberto Gonzales, Khairi Reda, Jason Leigh,Barbara Di Eugenio, and Andrew Johnson. 2015. “Show me data”: ObservationalStudy of a Conversational Interface in Visual Data Exploration. In
IEEE VIS 2015(Poster) . IEEE, Washington, DC, USA, 2 pages.[8] Jillian Aurisano, Abhinav Kumar, Alberto Gonzalez, Jason Leigh, Barbara DiEugenio, and Andrew Johnson. 2016. Articulate2: Toward a ConversationalInterface for Visual Data Exploration. In
IEEE VIS 2016 (Poster) . IEEE, Washington,DC, USA, 3–4.[9] Tanja Blascheck, Lonni Besançon, Anastasia Bezerianos, Bongshin Lee, and PetraIsenberg. 2018. Glanceable Visualization: Studies of Data Comparison Perfor-mance on Smartwatches.
IEEE transactions on visualization and computer graphics
25, 1 (2018), 630–640. https://doi.org/10.1109/TVCG.2018.2865142 [10] Kerstin Blumenstein, Christina Niederer, Markus Wagner, Grischa Schmiedl,Alexander Rind, and Wolfgang Aigner. 2016. Evaluating Information Visualizationon Mobile Devices. In
Proceedings of the Beyond Time and Errors on Novel Evalua-tion Methods for Visualization - BELIV ’16 , Vol. 24-October. ACM Press, New York,New York, USA, 125–132. https://doi.org/10.1145/2993901.2993906 [11] Matthew Brehmer, Bongshin Lee, Petra Isenberg, and Eun Kyoung Choe. 2018.Visualizing Ranges Over Time on Mobile Phones: a Task-Based CrowdsourcedEvaluation.
IEEE Transactions on Visualization and Computer Graphics
25, 1 (2018),619–629. https://doi.org/10.1109/TVCG.2018.2865234 [12] Matthew Brehmer, Bongshin Lee, Petra Isenberg, and Eun Kyoung Choe. 2019. AComparative Evaluation of Animation and Small Multiples for Trend Visualiza-tion on Mobile Phones.
IEEE Transactions on Visualization and Computer Graphics
26, 1 (2019), 364–374. https://doi.org/10.1109/TVCG.2019.2934397 [13] Yang Chen. 2017. Visualizing Large Time-series Data on Very Small Screens.In
Proceedings of the Eurographics/IEEE VGTC Conference on Visualization: ShortPapers . The Eurographics Association, Goslar, Germany, 37–41. https://doi.org/10.2312/eurovisshort.20171130 [14] Eun Kyoung Choe, Raimund Dachselt, Petra Isenberg, and Bongshin Lee. 2019.Mobile Data Visualization (Dagstuhl Seminar 19292).
Dagstuhl Reports
MultimodalVis’18 Workshop at AVI 2018 . ACM, New York, NY, USA, 4 pages. [16] Eun Kyoung Choe, Bongshin Lee, Matthew Kay, Wanda Pratt, and Julie A. Kientz.2015. SleepTight: Low-burden, Self-monitoring Technology for Capturing andReflecting on Sleep Behaviors. In
Proceedings of the 2015 ACM International JointConference on Pervasive and Ubiquitous Computing (Osaka, Japan) (UbiComp ’15) .ACM, New York, NY, USA, 121–132. https://doi.org/10.1145/2750858.2804266 [17] Eun Kyoung Choe, Bongshin Lee, and M.c. Schraefel. 2015. CharacterizingVisualization Insights from Quantified Selfers’ Personal Data Presentations.
IEEEComputer Graphics and Applications
35, 4 (July 2015), 28–37. https://doi.org/10.1109/MCG.2015.51 [18] Eun Kyoung Choe, Bongshin Lee, Haining Zhu, Nathalie Henry Riche, and Do-minikus Baur. 2017. Understanding Self-reflection: How People Reflect on Per-sonal Data Through Visual Data Exploration. In
Proceedings of the 11th EAIInternational Conference on Pervasive Computing Technologies for Healthcare (Barcelona, Spain) (PervasiveHealth ’17) . ACM, New York, NY, USA, 173–182. https://doi.org/10.1145/3154862.3154881 [19] Philip R Cohen, Mary Dalrymple, Douglas B Moran, Fernando C N Pereira, andJoseph W Sullivan. 1989. Synergistic Use of Direct Manipulation and NaturalLanguage. In
Proceedings of the SIGCHI Conference on Human Factors in ComputingSystems (CHI ’89) . ACM, New York, NY, USA, 227–233. https://doi.org/10.1145/67450.67494 [20] Joëlle Coutaz, Laurence Nigay, Daniel Salber, Ann Blandford, Jon May, andRichard M. Young. 1995. Four Easy Pieces for Assessing the Usability of Mul-timodal Interaction: The Care Properties. In
Proceedings of the IFIP TC13 Inter-antional Conference on Human-Computer Interaction (INTERACT ’95) . SpringerUS, Boston, MA, 115–120. https://doi.org/10.1007/978-1-5041-2896-4_19 [21] Kenneth Cox, Rebecca E. Grinter, Stacie L. Hibino, Lalita Jategaonkar Jagadeesan,and David Mantilla. 2001. A Multi-Modal Natural Language Interface to an In-formation Visualization Environment.
International Journal of Speech Technology
4, 3-4 (2001), 297–314. https://doi.org/10.1023/A:1011368926479 [22] Pooja M Desai, Matthew E Levine, David J Albers, and Lena Mamykina. 2018.Pictures Worth a Thousand Words: Reflections on Visualizing Personal BloodGlucose Forecasts for Individuals with Type 2 Diabetes. In
Proceedings of the 2018CHI Conference on Human Factors in Computing Systems (CHI ’18) . ACM, NewYork, NY, USA, 1–13. https://doi.org/10.1145/3173574.3174112 [23] Kedar Dhamdhere, Kevin S. McCurley, Ralfi Nahmias, Mukund Sundararajan, andQiqi Yan. 2017. Analyza: Exploring Data with Conversation. In
Proceedings of the22nd International Conference on Intelligent User Interfaces (IUI ’17) . ACM Press,New York, New York, USA, 493–504. https://doi.org/10.1145/3025171.3025227 [24] Aarthi Easwara Moorthy and Kim-Phuong L. Vu. 2015. Privacy Concerns forUse of Voice Activated Personal Assistant in the Public Space.
InternationalJournal of Human-Computer Interaction
31, 4 (April 2015), 307–335. https://doi.org/10.1080/10447318.2014.986642 [25] Philipp Eichmann, Darren Edge, Nathan Evans, Bongshin Lee, Matthew Brehmer,and Christopher White. 2020. Orchard: Exploring Multivariate HeterogeneousNetworks on Mobile Phones.
Computer Graphics Forum
39, 3 (June 2020), 115–126. https://doi.org/10.1111/cgf.13967 [26] Daniel Epstein, Felicia Cordeiro, Elizabeth Bales, James Fogarty, and Sean Munson.2014. Taming Data Complexity in Lifelogs: Exploring Visual Cuts of PersonalInformatics Data. In
Proceedings of the 2014 Conference on Designing InteractiveSystems (DIS ’14) . ACM, New York, NY, USA, 667–676. https://doi.org/10.1145/2598510.2598558 [27] Facebook Inc. 2021. React Native · A framework for building native apps usingReact. Retrieved Jan 05, 2021 from https://reactnative.dev/ [28] Clayton Feustel, Shyamak Aggarwal, Bongshin Lee, and Lauren Wilcox. 2018.People Like Me: Designing for Reflection on Aggregate Cohort Data in PersonalInformatics Systems.
Proceedings of the ACM on Interactive, Mobile, Wearableand Ubiquitous Technologies
2, 3 (2018), 1–21. https://doi.org/10.1145/3264917 [29] Fitbit, Inc. 2021. Aria Scale. Retrieved Jan 05, 2021 from [30] Fitbit, Inc. 2021. Fitbit. Retrieved Jan 05, 2021 from [31] Fitbit, Inc. 2021. Fitbit Web API. Retrieved Jan 05, 2021 from https://dev.fitbit.com/build/reference/web-api/ [32] Tong Gao, Mira Dontcheva, Eytan Adar, Zhicheng Liu, and Karrie G. Karahalios.2015. DataTone: Managing Ambiguity in Natural Language Interfaces for DataVisualization. In
Proceedings of the 28th Annual ACM Symposium on User InterfaceSoftware & Technology (UIST ’15) . ACM Press, New York, New York, USA, 489–500. https://doi.org/10.1145/2807442.2807478 [33] Garmin Ltd. 2021. Garmin. Retrieved Jan 05, 2021 from https://garmin.com [34] Marti Hearst and Melanie Tory. 2019. Would You Like A Chart with That?Incorporating Visualizations into Conversational Interfaces. In . IEEE, Washington, DC, USA, 36–40. https://doi.org/10.1109/VISUAL.2019.8933766 [35] Enamul Hoque, Vidya Setlur, Melanie Tory, and Isaac Dykeman. 2017. ApplyingPragmatics Principles for Interaction with Visual Analytics.
IEEE Transactions onVisualization and Computer Graphics
24, 1 (2017), 309–318. https://doi.org/10.1109/TVCG.2017.2744684 [36] Huami Inc. 2021. Mi Fit. Retrieved Jan 05, 2021 from https://apps.apple.com/us/app/mi-fit/id938688461 [37] Dandan Huang, Melanie Tory, Bon Adriel Aseniero, Lyn Bartram, Scott Bateman,Sheelagh Carpendale, Anthony Tang, and Robert Woodbury. 2014. PersonalVisualization and Personal Visual Analytics.
IEEE Transactions on Visualizationand Computer Graphics
21, 3 (2014), 420–433. https://doi.org/10.1109/TVCG.2014.2359887 [38] Dandan Huang, Melanie Tory, and Lyn Bartram. 2016. A Field Study of On-Calendar Visualizations. In
Proceedings of Graphics Interface 2016 (Victoria, BritishColumbia, Canada) (GI 2016) . Canadian Human-Computer CommunicationsSociety / Société canadienne du dialogue humain-machine, Waterloo, Canada,13–20. https://doi.org/10.20380/GI2016.03 [39] Jan-Frederik Kassel and Michael Rohs. 2018. Valletto: A Multimodal Interface forUbiquitous Visual Analytics. In
Extended Abstracts of the 2018 CHI Conference onHuman Factors in Computing Systems (CHI EA ’18) . ACM, New York, NY, USA,1–6. https://doi.org/10.1145/3170427.3188445 [40] Jan-Frederik Kassel and Michael Rohs. 2019. Talk to Me Intelligibly: InvestigatingAn Answer Space to Match the User’s Language in Visual Analysis. In
Proceedingsof the 2019 on Designing Interactive Systems Conference (DIS ’19) . ACM, New York,NY, USA, 1517–1529. https://doi.org/10.1145/3322276.3322282 [41] Matthew Kay, Eun Kyoung Choe, Jesse Shepherd, Benjamin Greenstein, NathanielWatson, Sunny Consolvo, and Julie A. Kientz. 2012. Lullaby: A Capture &
Proceedings of the2012 ACM Conference on Ubiquitous Computing (Pittsburgh, Pennsylvania) (Ubi-Comp ’12) . ACM, New York, NY, USA, 226–234. https://doi.org/10.1145/
[42] Matthew Kay, Tara Kola, Jessica R Hullman, and Sean A Munson. 2016. When(ish) is My Bus? User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems. In
Proceedings of the 2016 CHI Conference on Human Factorsin Computing Systems (CHI ’16) . ACM, New York, NY, USA, 5092–5103. https://doi.org/10.1145/2858036.2858558 [43] Spencer Kelly. 2019. Compromise. Retrieved Jan 05, 2021 from http://compromise.cool/ [44] Yoojung Kim, Bongshin Lee, and Eun Kyoung Choe. 2019. Investigating dataaccessibility of personal health apps.
Journal of the American Medical InformaticsAssociation
26, 5 (May 2019), 412–419. https://doi.org/10.1093/jamia/ocz003 [45] Yea-Seul Kim, Mira Dontcheva, Eytan Adar, and Jessica Hullman. 2019. VocalShortcuts for Creative Experts. In
Proceedings of the 2019 CHI Conference onHuman Factors in Computing Systems (CHI ’19) . ACM Press, New York, New York,USA, 1–14. https://doi.org/10.1145/3290605.3300562 [46] Konstantin Klamka, Tom Horak, and Raimund Dachselt. 2020. Watch+Strap:Extending Smartwatches with Interactive StrapDisplays. In
Proceedings of the2020 CHI Conference on Human Factors in Computing Systems (CHI ’20) . ACM,New York, NY, USA, 1–15. https://doi.org/10.1145/3313831.3376199 [47] Abhinav Kumar, Barbara Di Eugenio, Jillian Aurisano, Andrew Johnson, AbeerAlsaiari, Nigel Flowers, Alberto Gonzalez, and Jason Leigh. 2017. MultimodalCoreference Resolution for Exploratory Data Visualization Dialogue: Context-Based Annotation and Gesture Identification. In
SEMDIAL 2017 (SaarDial) Work-shop on the Semantics and Pragmatics of Dialogue . ISCA, Saarbrücken, Germany,41–51. https://doi.org/10.21437/SemDial.2017-5 [48] Bongshin Lee, Matthew Brehmer, Petra Isenberg, Eun Kyoung Choe, RicardoLangner, and Raimund Dachselt. 2018. Data Visualization on Mobile Devices. In
Extended Abstracts of the 2018 CHI Conference on Human Factors in ComputingSystems (Montreal QC, Canada) (CHI EA ’18) . ACM, New York, NY, USA, ArticleW07, 8 pages. https://doi.org/10.1145/3170427.3170631 [49] Bongshin Lee, Eun Kyoung Choe, Petra Isenberg, Kim Marriott, John Stasko,and Theresa-Marie Rhyne. 2020. Reaching Broader Audiences With Data Visu-alization.
IEEE Computer Graphics and Applications
40, 2 (March 2020), 82–90. https://doi.org/10.1109/MCG.2020.2968244 [50] Bongshin Lee, Arjun Srinivasan, John Stasko, Melanie Tory, and Vidya Setlur.2018. Multimodal Interaction for Data Visualization. In
Proceedings of the 2018International Conference on Advanced Visual Interfaces (Castiglione della Pescaia,Grosseto, Italy) (AVI ’18) . Association for Computing Machinery, New York, NY,USA, Article 11, 3 pages. https://doi.org/10.1145/3206505.3206602 [51] Ian Li, Anind K. Dey, and Jodi Forlizzi. 2011. Understanding My Data, Myself:Supporting Self-reflection with Ubicomp Technologies. In
Proceedings of the 13thInternational Conference on Ubiquitous Computing (Beijing, China) (UbiComp ’11) .ACM, New York, NY, USA, 405–414. https://doi.org/10.1145/2030112.2030166 [52] Jean-claude Martin. 1998. TYCOON: Theoretical Framework and Software Toolsfor Multimodal Interfaces. In
Intelligence and Multimodality in Multimedia Inter-faces . AAAI Press, Palo Alto, CA, USA, 1–25.[53] Microsoft. 2021. Cognitive Speech Services | Microsoft Azure. Retrieved Jan 05,2021 from https://azure.microsoft.com/en-us/services/cognitive-services/speech-services/ [54] Microsoft. 2021. TypeScript. Retrieved Jan 05, 2021 from [55] Laura Pina, Sang-Wha Sien, Clarissa Song, Teresa M Ward, James Fogarty, Sean AMunson, and Julie A Kientz. 2020. DreamCatcher: Exploring How Parents andSchool-Age Children Can Track and Review Sleep Information Together.
Pro-ceedings of the ACM on Human-Computer Interaction
4, CSCW1 (2020), 1–25. https://doi.org/10.1145/3392882 [56] Irene Ros. 2021. MobileVis. Retrieved Jan 05, 2021 from http://mobilev.is/ [57] Sebastian Sadowski. 2021. Mobile InfoVis. Retrieved Jan 05, 2021 from https://mobileinfovis.com [58] Ayshwarya Saktheeswaran, Arjun Srinivasan, and John Stasko. 2020. Touch?Speech? or Touch and Speech? Investigating Multimodal Interaction for VisualNetwork Exploration and Analysis.
IEEE Transactions on Visualization and Com-puter Graphics
0, 0 (2020), 2168–2179. (In press).[59] Samsung. 2021. S Health. Retrieved Jan 05, 2021 from http://shealth.samsung.com [60] Hanna Schneider, Julia Wayrauther, Mariam Hassib, and Andreas Butz. 2019.Communicating Uncertainty in Fertility Prognosis. In
Proceedings of the 2019 CHIConference on Human Factors in Computing Systems . ACM, New York, NY, USA,1–11. https://doi.org/10.1145/3290605.3300391 [61] Michail Schwab, Sicheng Hao, Olga Vitek, James Tompkin, Jeff Huang, andMichelle A Borkin. 2019. Evaluating Pan and Zoom Timelines and Sliders. In
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems(CHI ’19) . ACM, New York, NY, USA, 1–12. https://doi.org/10.1145/3290605.3300786 [62] Vidya Setlur, Sarah E. Battersby, Melanie Tory, Rich Gossweiler, and Angel X.Chang. 2016. Eviza: A Natural Language Interface for Visual Analysis. In
Pro-ceedings of the 29th Annual Symposium on User Interface Software and Technology .ACM, New York, NY, USA, 365–377. https://doi.org/10.1145/2984511.2984588 [63] Vidya Setlur, Melanie Tory, and Alex Djalali. 2019. Inferencing UnderspecifiedNatural Language Utterances in Visual Analysis. In
Proceedings of the 24th In-ternational Conference on Intelligent User Interfaces . ACM, New York, NY, USA,40–51. https://doi.org/10.1145/3301275.3302270 [64] Arjun Srinivasan, Bongshin Lee, Nathalie Henry Riche, Steven M. Drucker, andKen Hinckley. 2020. InChorus: Designing Consistent Multimodal Interactions forData Visualization on Tablet Devices. In
Proceedings of the 2020 CHI Conference onHuman Factors in Computing Systems (Honolulu, HI, USA) (CHI ’20) . Associationfor Computing Machinery, New York, NY, USA, 1–13. https://doi.org/10.1145/3313831.3376782 [65] Arjun Srinivasan, Bongshin Lee, and John T. Stasko. 2020. Interweaving Multi-modal Interaction with Flexible Unit Visualizations for Data Exploration.
IEEETransactions on Visualization and Computer Graphics
0, 0 (2020), 15 pages. https://doi.org/10.1109/TVCG.2020.2978050 (Early access).[66] Arjun Srinivasan and John Stasko. 2017. Natural Language Interfaces for DataAnalysis with Visualization: Considering What Has and Could Be Asked. In
Proceedings of the Eurographics/IEEE VGTC Conference on Visualization: ShortPapers (Barcelona, Spain) (EuroVis ’17) . The Eurographics Association, Goslar,Germany, 55–59. https://doi.org/10.2312/eurovisshort.20171133 [67] Arjun Srinivasan and John Stasko. 2018. Orko: Facilitating Multimodal Inter-action for Visual Exploration and Analysis of Networks.
IEEE Transactionson Visualization and Computer Graphics
24, 1 (Jan. 2018), 511–521. https://doi.org/10.1109/TVCG.2017.2745219 [68] Poorna Talkad Sukumar, Gonzalo J. Martinez, Ted Grover, Gloria Mark, Sidney K.D’Mello, Nitesh V. Chawla, Stephen M. Mattingly, and Aaron D. Striegel. 2020.Characterizing Exploratory Behaviors on a Personal Visualization Interface UsingInteraction Logs. In
EuroVis 2020 - Short Papers , Andreas Kerren, Christoph Garth,and G. Elisabeta Marai (Eds.). The Eurographics Association, Goslar, Germany,1–5. https://doi.org/10.2312/evs.20201052 [69] Yiwen Sun, Jason Leigh, Andrew Johnson, and Sangyoon Lee. 2010. Articulate: ASemi-Automated Model for Translating Natural Language Queries into Meaning-ful Visualizations. In
International Symposium on Smart Graphics . Springer-Verlag,Berlin, Heidelberg, 184–195.[70] Wanasit Tanakitrungruang. 2014. Chrono. Retrieved Jan 05, 2021 from https://github.com/wanasit/chrono [71] TeamViewer. 2021. TeamViewer QuickSupport. Retrieved Jan 05, 2021 from [72] Alice Thudt, Dominikus Baur, Samuel Huron, and Sheelagh Carpendale. 2016.Visual Mementos: Reflecting Memories with Personal Data.
IEEE Transactionson Visualization and Computer Graphics
22, 1 (Jan. 2016), 369–378. https://doi.org/10.1109/TVCG.2015.2467831 [73] Alice Thudt, Bongshin Lee, Eun Kyoung Choe, and Sheelagh Carpendale. 2017.Expanding Research Methods for a Realistic Understanding of Personal Vi-sualization.
IEEE Computer Graphics and Applications
37, 2 (2017), 12–18. https://doi.org/10.1109/MCG.2017.23 [74] Viktoriya Trifonova. 2018. How Device Usage Changed in 2018 and Whatit Means for 2019. https://blog.globalwebindex.com/trends/device-usage-2019 .[75] Matthew Turk. 2014. Multimodal Interaction: A Review.
Pattern RecognitionLetters
36, 15 (Jan. 2014), 189–195. https://doi.org/10.1016/j.patrec.2013.07.003 [76] Andries van Dam. 2001. Post-Wimp User Interfaces: the Human Connection.In
Frontiers of Human-Centered Computing, Online Communities and VirtualEnvironments . Springer London, London, 163–178. https://doi.org/10.1007/978-1-4471-0259-5_11 [77] Minh Tue Vo and Alex Waibel. 1993. Multimodal Human-Computer Interaction.In
Proceedings of ISSD 1993 (ISSD ’93) . Waseda University, Shinjuku City, Tokyo,Japan, 1–8.[78] Benjamin Watson and Vidya Setlur. 2015. Emerging Research in Mobile Visual-ization. In
Proceedings of the 17th International Conference on Human-ComputerInteraction with Mobile Devices and Services Adjunct (Copenhagen, Denmark) (MobileHCI ’15) . Association for Computing Machinery, New York, NY, USA,883–887. https://doi.org/10.1145/2786567.2786571 [79] Bowen Yu and Claudio T. Silva. 2020. FlowSense: A Natural Language Interfacefor Visual Data Exploration within a Dataflow System.
IEEE Transactions onVisualization and Computer Graphics
26, 1 (Jan. 2020), 1–11. https://doi.org/10.1109/TVCG.2019.2934668 [80] Zoom Video Communications, Inc. 2021. Zoom. Retrieved Jan 05, 2021 from[80] Zoom Video Communications, Inc. 2021. Zoom. Retrieved Jan 05, 2021 from