Data-First Visualization Design Studies

Abstract

We introduce the notion of a data-first design study which is triggered by the acquisition of real-world data instead of specific stakeholder analysis questions. We propose an adaptation of the design study methodology framework to provide practical guidance and to aid transferability to other data-first design processes. We discuss opportunities and risks by reflecting on two of our own data-first design studies. We review 64 previous design studies and identify 16 of them as edge cases with characteristics that may indicate a data-first design process in action.


Michael Oppermann, Tamara Munzner
University of British Columbia

Figure 1: Refined and extended framework for data-first design studies. A new acquire stage is added for both obtaining and abstracting data. Discover is moved and renamed to elicit, to emphasize the elicitation of tasks from potential stakeholders.

Winnow focuses on analyzing the match between the abstract tasks of these stakeholders and the data abstraction. Nuances differ in the cast and design stages to incorporate the specific characteristics of the data-first approach.

Index Terms:

Human-centered computing—Visualization—Visualization design and evaluation methods—Visualization theory,concepts and paradigms

1 INTRODUCTION

Design studies are frequently used to conduct and report problem-driven visualization research with domain experts. A design study is characterized by its highly iterative nature, with a tight interplay between the analysis and abstraction of stakeholder needs, the transformation of domain-specific into domain-agnostic data, and the visualization design.

The design study methodology (DSM) by Sedlmair et al. [36] provides methodological guidance on how to conduct this type of applied visualization research, proposing a nine-stage sequence: learn, winnow, cast, discover, design, implement, deploy, reflect, and write. The DSM focuses on connecting with collaborators in the form of domain-expert stakeholders at an early winnow stage, with suggested criteria for narrowing down the set of potential collaborators, identifying their roles in the cast stage, and then undertaking a discover stage focused on problem characterization and task abstraction with the chosen stakeholder. The selection of an appropriate stakeholder dictates the relevant data, which is then abstracted from domain-specific to domain-agnostic form in the design stage, in parallel with iteratively creating appropriate visual idioms. We now characterize this DSM approach as a stakeholder-first ordering.

We find that this framework falls short of capturing the nature of design studies that are primarily initiated by acquiring an interesting real-world dataset rather than by selecting a specific stakeholder; we call these data-first design studies. In a data-first design study, the early selection of the data constrains appropriate choices for stakeholders. Data abstraction is carried out early and is followed by problem characterization and abstraction for multiple potential stakeholders.
Stakeholders are then chosen based on whether their tasks can be supported by the selected data source.

These non-traditional data-first design studies have not been explicitly reported or analyzed in the visualization literature. However, in retrospect we realize that they occur frequently in different variations, especially in class projects or visualization design competitions [10, 13, 19], but also in research contexts such as our own. We suspect that every visualization researcher has experienced stumbling across a dataset that aroused their curiosity. In some cases, these serendipitous discoveries lead to the exploration of interesting design alternatives, with the results and methods shared on social media or documented in blog posts. In other cases, especially with complex datasets or data with unique characteristics, this process can lead to novel visualization contributions [34].

In this paper, we reflect on two of our own design studies that we now consider to be examples of a data-first approach, the Bike Sharing Atlas [30] and Ocupado [31], which guided our initial thinking about this alternative methodological approach for design studies. We follow up with a review of 64 previous design studies, in which we note 16 edge cases that may implicitly indicate a data-first ordering. The full list of 64 studies is provided in the supplemental materials, and the 16 identified examples are provided in Appendix A.

We contribute a first characterization of data-first design studies, namely those triggered by data instead of specific stakeholder questions. We propose an adapted version of the nine-stage design study methodology framework, in which we introduce a new acquire stage, move and rename discover to elicit, and adjust other stages accordingly. In addition, we present a review of previous design studies and discuss opportunities and risks of these non-traditional design studies. Our goal is to provide practical guidance and aid the transferability of data-first design processes.

2 RELATED WORK

We discuss the extensive previous work on design studies and visualization process models to contextualize our proposed data-first approach. Literature on design studies emphasizes the need for real-world data and an early understanding of the data, but a stakeholder-first ordering is generally presumed.

In the original DSM [36], the order and timing of data acquisition are not addressed head-on; the closest guidance is the set of questions “Does real data exist, is it enough, and can I have it?” related to the pitfall of “no real data available (yet)” that is noted in the winnowing stage, where potential stakeholders are assessed.

The design study lite methodology [37] is an expedited version designed to fit within the time frame of undergraduate and graduate visualization courses. Recruiting community partners is a precondition in this framework. Although highly desirable, access to many stakeholders with course-relevant visualization needs is not always feasible. In larger class settings, students may not have access to stakeholders but could begin with data. The data-first methodology would provide a path for some or all such student groups to engage with domain experts at a later stage.

Munzner’s nested model for visualization design [27] and different types of process models assume a user with a specific domain problem at the start, who may act as a data provider as well. The activity-centered framework [22] considers it mandatory to request access to real data, or a data sample of sufficient size, early in the process. McKenna et al. [24] proposed a framework that links actions directly to the nested model through four activities, understand, ideate, make, and deploy, whereby a project can start with any activity. The understand activity encompasses the data acquisition and task elicitation of our proposed data-first approach. McCurdy et al. [23] suggest that a real-world domain problem is either expressed by domain experts or discovered by design researchers; the latter case is closely related to our framework adaptation. The human-centered design process by Lloyd and Dykes [21] highlights how effectively real and interesting data can engage participants in requirement elicitation, which is a key objective in data-first design studies.

Crisan and Munzner [9] proposed a conceptual framework of data reconnaissance and task wrangling to explore unfamiliar data landscapes. This framework puts data first but focuses on domain experts and how they refine data to support a specific analysis goal. Goodwin et al. [14] presented a creative design case study in the energy domain that explores possible visualization usage scenarios for a smart meter dataset; it is closely related to our suggested workflow but primarily discusses creativity techniques and assumes a set of users at the beginning.

In the context of evaluating visualization tools in large companies, Sedlmair et al. [35] distinguish between employee-pull and researcher-push solutions. Although the fundamental concept of pushing ideas from an outside perspective is similar, these authors regard it as advertising a specific tool. We instead propose to advertise data and the use of visualizations to solve a domain problem together.

3 CASE STUDIES

We introduce two of our own previously published design studies that we now consider to be examples of data-first design studies, before we generalize the underlying method in a refined DSM framework in Sect. 4 and discuss its implications in Sect. 5.

The Bike Sharing Atlas combines and visualizes distributed data from several hundred bike-sharing networks worldwide [12]. We show that the data produced by these networks reveal interesting insights, not only into patterns of bicycle usage, but also into the underlying spatiotemporal dynamics of a city. By working with users from different domains, including a bike sharing operator, a city planner, and urban sociologists, we illustrated how interactive visualization can help to open the data that is produced in our cities to a wider audience.

Ocupado is a set of visual decision-support tools centered around occupancy data for stakeholders in facilities management and planning [31]. We take WiFi device counts as a proxy for human presence, showing many new ways to leverage data that had previously been used only for automation in building control systems. We interviewed potential stakeholders from different domains and evaluated the conformity between tasks and data affordances. In a highly iterative process, and with extensive feedback from a set of core stakeholders, we developed and deployed Ocupado, which was eventually adopted by our industry partner.

These two projects differed from regular design studies in that we engaged with stakeholders after acquiring the data. We had worked with a bike sharing operator early on to analyze data from Vienna, but the discovery and collection of data from many station-based bike sharing networks worldwide [8] triggered additional use cases and aroused interest from new stakeholders working in different domains. The Ocupado project began through a collaboration with a startup [1] that estimates occupancy from WiFi signals for building automation.
We conjectured that, with suitable visualizations, this data could be useful to many additional user groups for decision-making and resource management, and we explicitly sought out these potential stakeholders.

In both design studies we had to consider whether to add other data sources to the mix to support potential stakeholder tasks. For example, for the bike sharing project, we included elevation profiles because the elevation differences between stations are an essential factor for the functioning of a bike sharing system, and the combined data can provide relevant insights. When we worked with custodial managers during the Ocupado project, we were asked to add work schedules that could be compared to occupancy statistics. We ultimately decided against integrating them because these schedules were not relevant for other stakeholders, and it was unclear whether and how they would facilitate core tasks related to occupancy analysis.

4 REFINED AND EXTENDED FRAMEWORK

We propose an alternative, extended version of the DSM framework, generalizing our experience as a guide for future data-first design studies. The data-first design study stages are shown in Fig. 1. One major change is to add an early acquire stage focused on both obtaining and abstracting data. The second change is to move the discover stage immediately after it and rename it elicit to reflect its task-based emphasis: obtaining information about tasks and abstracting them. We keep the same ordering and general function of the winnow and cast stages, though winnow is heavily dependent on the initially acquired data, and nuances differ in the cast stage. The focus of the design stage is narrowed to cover only idioms, since the data abstraction occurs earlier in the new acquire stage. The subsequent implement and deploy stages are unchanged, as is the initial learn stage.

Reflect and write are adjusted to also cover the peculiarities of the data-first process. As with the original DSM framework, jumping back to previous stages is common and necessary, in particular when adding new stakeholders.

We use the term visualizer to mean either a visualization researcher or practitioner. The adapted framework contains 10 stages:

Learn:

This stage remains unchanged from the original because it concerns general visualization knowledge, independent of the specific domain problem. A solid understanding of the visualization literature is crucial for every design study.

Acquire data:

A visualizer encounters, collects, generates, or obtains access to a dataset. The visualizer translates the data description into domain-independent language and begins to develop a data abstraction. Data sketches and descriptive statistics can provide first insights and guide the development of initial hypotheses about the data's underlying semantics. The data-first DSM framework emphasizes iteration, as does the original DSM, and thus anticipates that the data abstraction will be refined further once task abstractions are identified. Relevant questions:

What type of data am I working with, and what are the underlying semantics? Are there any data quality challenges, and is long-term access guaranteed? What is special about this data, and who would benefit from seeing and exploring it?

Figure 2: Simple conceptual model illustrating the challenges of finding an intersection of relevant tasks and data while keeping the problem space constrained. (a) The data and task axes both range from peripheral to core. (b) In the elicit stage, the visualizer examines the match between the initially acquired abstracted data and stakeholder tasks. (c) In the winnow stage, the visualizer can assess the benefits and risks of expanding the scope to additional data to expand the set of tasks and thus stakeholders.

Elicit tasks:

The visualizer seeks out multiple potential stakeholders and elicits from them domain-specific tasks that might be relevant for the chosen data abstraction. The visualizer both explains the initial data abstraction and learns about unsolved stakeholder needs. The visualizer then translates the tasks into domain-independent language and considers what data is required to address them.

Is there a match between the given data and stakeholder needs? Is more or other data required to answer domain-specific questions, and if so, how difficult is it to acquire and integrate?

Winnow:

The goal of the data-first DSM winnowing stage is to assess and prioritize the set of potential stakeholders according to the match between their abstract tasks and the data abstraction. This process may involve multiple rounds of task and data abstraction, as the understanding of the stakeholder needs and of the dataset semantics evolves from partial to more complete. We discuss these data and task perspectives in Sect. 5.1.

How frequent are their data-relevant tasks? How central are these tasks to the stakeholder’s primary mission? How many people in the organization deal with these tasks?

Cast:

The cast stage is similar to that of the original DSM, with the goal of identifying collaborator roles; here we note some previously proposed roles that have particular ramifications in the data-first case.

We named the promoter role in the Ocupado paper [31] and call it out here as essential in a data-first design study. The visualizer needs to serve in this role to reach out to potential stakeholders and present data and visualization opportunities. In addition, partners or stakeholders may act as promoters themselves when they undertake technical evangelism to other potential users about the benefits of a visualization system. When this evangelism includes demos of software prototypes, as it often does, the promoter may be involved as an intermediary in the task elicitation and winnowing stages. In this case, the promoters can serve as a conduit, relaying information back to the visualizers about stakeholder needs and how they align with prototype capabilities. In retrospect, we identify an example of this role occurring within our own previous Overview work [7].

The data producer and data consumer roles were first proposed by Kerzner et al. [18] within stakeholder-first design studies. In data-first studies, these two roles are most certainly held by different people and therefore require particular consideration. A data provider may need to ensure long-term access to the data and the functioning of the data pipeline. The visualizer may sometimes take on the data provider role in a later project stage, even if that role was played by somebody else at an early stage.

Design:

The design stage focuses only on visual idioms, whereas in the original DSM it also includes data abstraction work; in our framework, the data abstraction is carried out earlier, in the acquire stage.

Implement and deploy:

These stages remain unchanged from the original DSM.

Reflect and write:

Data-first design studies invite specific considerations during the reflection and writing stages.

How many potential stakeholders and domains have been considered, and how many were ultimately chosen? What are the differences and commonalities among the stakeholder tasks? Did the original data suffice, or was there a need to integrate data from additional sources?
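The three winnow-stage questions in the stage list above (how frequent the data-relevant tasks are, how central they are to the stakeholder's primary mission, and how many people deal with them) can be combined into a rough ranking of candidate stakeholders. The following Python sketch is only an illustration under stated assumptions: the 0 to 1 scales, the equal weighting, and the names `Stakeholder`, `priority`, and `rank` are hypothetical and not part of the framework, which relies on qualitative judgement.

```python
from dataclasses import dataclass


@dataclass
class Stakeholder:
    """A candidate stakeholder described by the three winnow-stage questions.

    All fields and scales are hypothetical illustrations, not part of the DSM.
    """
    name: str
    task_frequency: float   # 0..1: how often data-relevant tasks occur (assumed scale)
    task_centrality: float  # 0..1: how central the tasks are to the primary mission
    people_involved: int    # number of people in the organization dealing with the tasks


def priority(s: Stakeholder, max_people: int) -> float:
    """Equal-weight score in 0..1; the weighting is an assumption for illustration."""
    # Normalize head count against the largest candidate group.
    reach = s.people_involved / max_people if max_people > 0 else 0.0
    return (s.task_frequency + s.task_centrality + reach) / 3


def rank(stakeholders: list[Stakeholder]) -> list[Stakeholder]:
    """Order candidates from highest to lowest priority score."""
    max_people = max((s.people_involved for s in stakeholders), default=0)
    return sorted(stakeholders, key=lambda s: priority(s, max_people), reverse=True)


if __name__ == "__main__":
    # Invented example values, not data from either case study.
    candidates = [
        Stakeholder("stakeholder_a", 0.9, 0.8, 40),
        Stakeholder("stakeholder_b", 0.3, 0.9, 5),
        Stakeholder("stakeholder_c", 0.6, 0.4, 100),
    ]
    for s in rank(candidates):
        print(s.name)
```

A score like this mainly makes implicit tradeoffs explicit; a stakeholder with few but highly central tasks may still be the better partner, so it should inform rather than replace the judgement described for the winnow stage.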

5 REFLECTIONS AND RECOMMENDATIONS

We reflect on the challenges of the winnowing stage, the opportunities of the data-first approach, and its risks.

The key challenge in this non-traditional design study approach is to identify appropriate stakeholders, which requires determining whether there is a match between potential stakeholder tasks and the available data. To do so, the visualizer must assess the relevance of the identified tasks for the given data and also consider whether to integrate secondary data sources. These considerations take place during the winnow stage.

Fig. 2 illustrates a simple conceptual model for reasoning about the conformity between data and tasks, with both represented as aligned axes that run along a continuum from peripheral on the left to core on the right. At the core are the current data and the high-relevance tasks; other data and low-relevance tasks are relegated to the periphery. Fig. 2b shows several tasks arising from the elicit stage, assessed against the initial data selection; some are determined to be mismatches and others matches.

Fig. 2c illustrates the tradeoffs of incorporating additional datasets into the scope of a project during the winnow stage. A possible benefit of adding secondary data sources could be to support more tasks, and thus to expand the set of eligible stakeholders. However, the visualizer also needs to ensure that the problem space under consideration maintains sufficient cohesion to have clear system goals and design targets. Cohesion can be reasoned about as constraints on the number and variety of stakeholders and of tasks.
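The conceptual model of Fig. 2 can be loosely sketched as a coverage check: a task matches when every data source it requires lies within the current scope, and widening the scope with secondary data may admit more tasks and stakeholders at some cost to cohesion. The Python sketch below is a hypothetical illustration; the two-level core/peripheral label, the stakeholder cap used as a stand-in for cohesion, and all names and example tasks (loosely modeled on the bike sharing case, with an invented `events` source) are assumptions, not part of the original model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """An abstracted stakeholder task (hypothetical structure)."""
    stakeholder: str
    description: str
    relevance: str                 # "core" or "peripheral" for this stakeholder
    required_data: frozenset[str]  # abstract data sources the task depends on


def matching_tasks(tasks: list[Task], scope: frozenset[str]) -> list[Task]:
    """Tasks whose required data is fully covered by the data in scope."""
    return [t for t in tasks if t.required_data <= scope]


def assess_scope(tasks: list[Task], scope: frozenset[str], max_stakeholders: int) -> dict:
    """Summarize a data scope: matched core tasks, their stakeholders, and a
    crude cohesion check (an assumed cap on the number of stakeholders)."""
    core = [t for t in matching_tasks(tasks, scope) if t.relevance == "core"]
    stakeholders = sorted({t.stakeholder for t in core})
    return {
        "core_tasks": len(core),
        "stakeholders": stakeholders,
        "cohesive": len(stakeholders) <= max_stakeholders,
    }


# Invented example tasks for illustration only.
EXAMPLE_TASKS = [
    Task("operator", "find stations that are often empty", "core",
         frozenset({"station_status"})),
    Task("city planner", "relate usage to terrain", "core",
         frozenset({"station_status", "elevation"})),
    Task("sociologist", "compare usage with public events", "peripheral",
         frozenset({"station_status", "events"})),
]

if __name__ == "__main__":
    initial = frozenset({"station_status"})  # roughly Fig. 2b: initially acquired data
    expanded = initial | {"elevation"}       # roughly Fig. 2c: add a secondary source
    print(assess_scope(EXAMPLE_TASKS, initial, max_stakeholders=2))
    print(assess_scope(EXAMPLE_TASKS, expanded, max_stakeholders=2))
```

With these invented tasks, adding elevation data admits the city planner's task, while the sociologist's task stays out because it needs a further source and is only peripheral. Lowering `max_stakeholders` to 1 flags the expanded scope as no longer cohesive, mirroring the tradeoff described above.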

We have identified several opportunities offered by a data-first approach.

Capitalizes on possibilities that do not fit the traditional DSM process.

We consider data-first design studies an alternative way to approach visualization research collaborations, where interesting research questions can emerge through conversations that unfold later in the process than in the traditional one. Analogous to market pull versus technology push dynamics in product innovation strategies [5], the originally acquired data is used to push ideas from an outside perspective instead of pulling analysis questions from a domain expert. For example, using existing WiFi access points to sense occupancy patterns in hundreds of rooms without installing additional hardware yielded a completely new opportunity that Ocupado stakeholders had not envisioned before.

Launching a data-first design study without a specific stakeholder collaboration commitment at the beginning of a project is one way to approach remote work and to engage with domain experts in imperfect circumstances, such as under COVID-19 restrictions. The learn and acquire stages are conducted independently by the visualizer. When reaching out to potential stakeholders, the visualizer presents first ideas for solving a specific problem to guide the discussion, instead of conducting a fully open-ended brainstorming session, which can be challenging with only digital communication tools.

Allows very early data sketches or technology probes with real-world data.

Launching a design study project without access to real data is a common pitfall [36]. Synthetic or toy data often results in additional project obstacles and delays [18, 39], and may even lead down the wrong path of data abstraction and visualization design. Access to real-world datasets at the beginning of the project allows the creation of data sketches [21] and technology probes [16], both to guide the internal development of hypotheses and possible usage scenarios, and to inquire externally about the needs and desires of target user groups.

Supports gradual expansion of stakeholder set.

Multi-channel engagement with several stakeholders fosters serendipity and enables more impactful visualization designs, as demonstrated by Wood et al. [41] and as we also observed in our own data-first design studies [30, 31], while providing breadth for a convincing evaluation [34]. However, addressing the needs of multiple stakeholders involves the risk of conflicting tasks and an ever-expanding project scope. In data-first design studies, stakeholders are selected by evaluating the conformity between tasks and data capabilities. Although additional data could be added, the primary data source serves as a guide for the due diligence with stakeholders and helps to delimit the problem space.

Encourages progression of course or side projects into publishable results.

Data-first design studies may start during visualization courses or when exploring and visualizing data out of curiosity. Characteristics such as complex datasets, unique data properties, or combinations of different data sources that address high-relevance stakeholder tasks may suggest a fruitful direction for expanding initial efforts into interesting and novel knowledge contributions.

Hypothesized tasks.

In the data-first approach, visualizers begin exploration immediately after acquiring the initial data, with conjectures about potential stakeholders and usage scenarios. We consider the delayed involvement of potential domain experts to be a major risk, because the initial implementation is based on hypothesized rather than verified tasks. To mitigate this risk, technology probes should be limited in functionality and open-ended with respect to use, to avoid wasted effort [16].

Hammer looking for a nail.

The danger of having data at our fingertips is converging on a specific idea prematurely and developing a full-fledged prototype without in-depth engagement with stakeholders or consideration of the broader design space. In a design study, stakeholder participation needs to be maximized and many design alternatives must be considered; otherwise, stakeholders are reduced to the role of design verifiers.

No actual users.

In the worst case, the search for potential stakeholders may not succeed. Other types of research contributions, such as a novel visualization technique or algorithm, may then still be possible, but a successful design study would not be achievable, because it requires working with real users to solve their real-world problems. A major pitfall in this situation would be attempting to validate design decisions prematurely based on speculative tasks. A less extreme situation is when significant time is spent on promotion, leaving less energy for research questions. The pitfall in this case is a design study that takes longer than planned before the work is sufficiently complete to publish.

Mismatch between data and stakeholder tasks.

A clear match between data and stakeholder tasks is not always immediately obvious, and a shortcut to data and task abstractions can lead to a premature commitment to specific stakeholders. In some cases, proxy measures that stand in for a variable of interest entail a range of limitations. In other cases, the given data is out of date or needs to be augmented with other data sources. Such misinterpretations can lead to project detours or to a failed design study.

Joy of discovery versus actual needs.

The presentation of data through visualizations often sparks interest, particularly when potential stakeholders see their own data or data similar to theirs. We observed many meetings where domain experts enthusiastically began to conjecture about the many different ways they might be able to use the data. We encourage this process because it helps to assess the conformity between the given data and relevant tasks. However, at these initial meetings ideas are often proposed based on limited information; the challenge for the researcher is to discriminate between initial enthusiasm and substantive user needs. A central concern is to understand whether the visualization addresses a core task of the target user group, to avoid the pitfall of a task that is only of peripheral relevance. Fellow tool builders and gatekeepers [36], who frequently act as intermediaries in winnow-stage meetings, might misunderstand the specific needs of front-line analysts. The pitfall to avoid is the premature selection of stakeholders whose actual tasks do not in fact align with the data and who would then quickly abandon a visualization solution.

Data promises.

In stakeholder-first design studies, data is typically provided by or collected together with the domain expert. In data-first studies, decoupling the data producer from the consumer may increase the responsibility of the visualizer, who needs to ensure the functioning of the data pipeline throughout all stages of the design study. One risk is therefore building a strong collaboration with stakeholders but failing to deliver on the initial data promises. For example, generating interest by presenting data from one city, but not being able to follow through by collecting data from the city that is most relevant to the stakeholder, could be a major barrier to project success that is independent of the quality of the proposed visualization solution.

Although in many cases setting up a static database at the beginning is sufficient, for projects that rely on the availability of continuous data streams, external API or system-level changes can jeopardize the success of a design study. To guarantee long-term availability and avoid continuous engineering effort, it is imperative to offload this responsibility.

Stakeholders are likely to depend on data providers even after the research project concludes. Visualization researchers may, but do not necessarily, act as data providers in a data-first design study.

Closed design approach.

Data iteration [15], including the integration of secondary data sources into a visualization system, can be a crucial step in addressing a specific domain problem. One major risk in data-first design studies is keeping the focus too narrowly on the initial data without considering the wider problem space and without ensuring extensibility and adaptability from an engineering and design perspective.

6 REVIEW OF PREVIOUS DESIGN STUDIES

In parallel with formalizing our proposal for a refined data-first design study methodology framework, we conducted a literature review of a diverse set of design studies. Although our own two projects that started with a dataset rather than a specific stakeholder served as an existence proof, the prevalence of such studies in the previous visualization literature was unclear. We wanted to know whether data-first approaches have been explicitly or implicitly described previously, what type of data they have used, and whether there are any common underlying themes.

In total, we selected 64 design studies that were published between 1999 and 2019 (full list provided in the supplemental material). We used the work by Lam et al. [20], who identified 39 design studies, as a seed and extended this list with additional examples from Sedlmair [34] and by reviewing previous conference proceedings. In addition, we sourced design studies from vispubdata [17] by searching for the term “design study” in titles and abstracts. The vast majority of these design studies were published at InfoVis, with a few examples from VAST and short papers.

In our analysis we identified 16 edge cases (25%) whose motivation or process statements indicate characteristics of a data-first design process; they are listed in Appendix A. A definitive assignment to a stakeholder-first or data-first ordering is not possible based on this kind of retrospective interpretation. Nevertheless, these edge cases provide relevant insights and illustrate that data-first approaches are not at all unusual, although they are often underdocumented.

These potential instances of data-first design studies span a broad range of topics and data sources, such as music listening histories [4], code repositories [29], electricity consumption [14, 38], financial transaction flows [2], sports [32, 33], social media [25], hotel customer reviews [43], and genealogies [6].
The data is mostly acquired from publicly available databases, but in some cases it is scraped or generated by the researchers. For instance, Miranda et al. [25] collected Flickr activity and tweets to capture spatio-temporal patterns in a city. The authors illustrated how interactive visualizations of this data can help experts specializing in different domains to better understand people's behavior in public places. The BallotMaps [14] design study used electoral data from an open data repository to identify name and ordering biases in ballot papers. The publication does not mention external domain experts, but the study demonstrates with real-world examples how a visual approach can reveal significant biases that were not evident from statistical analysis alone.

Distinguishing data-first design studies from those in which researchers are their own stakeholders is particularly difficult, as there is sometimes no clear separation. Researchers with extensive expertise in a specific domain problem may be qualified to self-validate a visualization solution in an autobiographical design study, as we discuss in Sect. 7.3. In these 16 edge cases, it is often unclear whether the researchers stumbled upon data because of a broad interest in a topic or whether they had a particular analysis question themselves. The level of domain expertise is sometimes indicated; for example, one of the authors of TenniVis [33] had 35 years of tennis playing experience, but in most design studies it is not evident.

While some authors [3, 32, 43] report on early interviews with domain experts to elicit requirements, many design study projects include external stakeholders rather late in the process, for example, to participate in a summative evaluation instead of influencing the design process. We consider this late involvement of target users a major risk (see Sect. 5.3), particularly for data-first approaches, where real data is available and simple data sketches may progress to more complex solutions.

While this review is an attempt to analyze many different types of design studies from the last two decades, our selection is certainly incomplete, as many other design studies have been published. A further limitation is that our judgement is based on limited information about the design process, and additional investigations are required to better understand the implications of data-first approaches that were not explicitly reported. Interviewing visualization researchers would be a promising direction for future work.

DISCUSSION

We discuss the role of the researcher's domain expertise, the risks of repurposing data, and design studies conducted without collaboration with external domain experts.

The nature of data-first design studies requires a certain degree of domain familiarity and data understandability. Thus, not all datasets are appropriate for data-first design studies: the selection strongly depends on the personal background of the visualizer as it pertains to the research context. Initial data should be self-explanatory or well-documented, or the visualizer should have a strong personal interest in it.

Visualizers acquire and investigate data on their own before engaging in conversations with potential stakeholders. While specific use cases might not be immediately apparent and talks with different domain experts can lead to serendipitous findings, visualizers need to be aware of problems faced by domain experts and foreshadow potential tasks at the beginning of a project. We argue that visualizers need to be task-curious to advance this type of problem-driven research, which is methodologically very different from traditional design studies, where only stakeholders come forward with tasks.

A thorough understanding of the selected data and its limitations is essential. Although this understanding will typically grow over time, basic knowledge at the beginning is beneficial. Descriptive statistics and superficial explorations quickly reach their limits. In both of our data-first design studies, we recorded data from a live data stream. In the bike sharing project, although we recorded data from hundreds of cities, it followed a relatively simple and uniform scheme, which allowed us to detect data limitations without deep domain knowledge. In the Ocupado project, we worked closely with our industry partner and data provider to better understand data semantics and quality issues.

The data-first approach encourages visualizers to use available data and apply it to new problem contexts. For example, proxy variables may be used to answer questions when direct measurement is prohibitively expensive [26]. The repurposing of data [42] provides unique opportunities but entails a range of risks: data collectors may not have given consent for their data to be used in a particular way; the original data collection process may not be fully transparent, so the data quality and underlying assumptions cannot be judged reliably; and a lack of correlation with the variable of interest can ultimately lead analysts to draw wrong conclusions. Visualization researchers are responsible for determining whether data is fit for the new use and need to understand all of its ethical and analytical implications. As discussed in Sect. 5.1, one of the goals in the winnowing phase is to identify whether the data is appropriate for a given task or whether the task would be better served by a different dataset.

Following the terminology of Neustaedter and Sengers [28], we define the process of designing and evaluating visualizations through self-usage as autobiographical design studies, which we distinguish from data-first design studies. Although the DSM paper does not use this terminology, which was introduced at roughly the same time as its publication, the idea is clearly articulated in it: “While strong problem-driven work can result from situations where the same person holds both of these roles [visualization researchers and domain experts], we do not address this case further here” [36].

Autobiographical design draws on “extensive, genuine usage by those creating or building the system” [28]. Visualization experts act as primary stakeholders, and design iterations are based on their own experiences. This type of first-person research [11] usually emerges from an unmet personal need and allows tinkering with an idea in a research context. The researcher should have or quickly gain extensive expertise related to the domain problem in order to validate the effectiveness of a visualization.

In contrast, data-first design studies resemble autobiographical design only at the very beginning. Domain familiarity is important, but the process largely focuses on external domain experts [40] as target users, and their feedback, rather than insights from self-usage, shapes the design decisions.

CONCLUSION

We contribute the first characterization of a data-first design study for applied visualization projects that begin with an interesting real-world dataset rather than stakeholder selection. The motivation behind this approach stems from our own experience in conducting two data-first design studies. Problem-driven research motivated and inspired by data enables opportunities that do not fit in the traditional design study process, although it does introduce risks and tensions. We propose an adaptation of the design study methodology framework to provide practical guidance for researchers interested in this alternative way to approach research collaborations. A preliminary review of published design studies revealed that roughly one quarter of them (16 of 64) were edge cases which might have had an implicit data-first ordering. We conjecture that some of these researchers may have under-emphasized their true trajectory by bending their explanation closer to the previously articulated design study methodology framework. We hope this work encourages researchers to be more explicit in documenting and reflecting on various flavors of design studies.

ACKNOWLEDGMENTS

We thank Jürgen Bernard, Madison Elliott, Steve Kasica, Zipeng Liu, Francis Nguyen, and Ben Shneiderman for inspiring discussions and feedback. We also thank the anonymous reviewers for their comments.

REFERENCES

[1] Sensible Building Science, 2020. Accessed: 2020-07-07.
[2] A. Arleo, C. Tsigkanos, C. Jia, R. A. Leite, I. Murturi, M. Klaffenböck, S. Dustdar, M. Wimmer, S. Miksch, and J. Sorger. Sabrina: Modeling and visualization of financial data over time with incremental domain knowledge. In Proc. IEEE Visualization Conference (VIS), pp. 51–55, 2019.
[3] R. C. Basole, T. Clear, M. Hu, H. Mehrotra, and J. Stasko. Understanding interfirm relationships in business ecosystems with interactive visualization. IEEE Trans. Visualization and Computer Graphics, 19(12):2526–2535, 2013.
[4] D. Baur, F. Seiffert, M. Sedlmair, and S. Boring. The streams of our lives: Visualizing listening histories in context. IEEE Trans. Visualization and Computer Graphics, 16(6):1119–1128, 2010.
[5] M. Baxter. Product Design. CRC Press, 1995.
[6] A. Bezerianos, P. Dragicevic, J.-D. Fekete, J. Bae, and B. Watson. Geneaquilts: A system for exploring large genealogies. IEEE Trans. Visualization and Computer Graphics, 16(6):1073–1081, 2010.
[7] M. Brehmer, S. Ingram, J. Stray, and T. Munzner. Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool for Investigative Journalists. IEEE Trans. Visualization and Computer Graphics, 20(12):2271–2280, 2014.
[8] CityBikes API. https://api.citybik.es, 2016. Accessed: 2020-07-07.
[9] A. Crisan and T. Munzner. Uncovering data landscapes through data reconnaissance and task wrangling. In Proc. IEEE Visualization Conference (VIS), pp. 46–50, 2019.
[10] Data is Beautiful. DataViz Battle. https://reddit.com/r/dataisbeautiful. Accessed: 2020-07-07.
[11] A. Desjardins and A. Ball. Revealing tensions in autobiographical design in HCI. In Proc. Designing Interactive Systems Conference (DIS), pp. 753–764, 2018.
[12] J. E. Froehlich, J. Neumann, and N. Oliver. Sensing and predicting the pulse of the city through shared bicycling. In Int. Joint Conf. on Artificial Intelligence, 2009.
[13] GFDRR. VizRisk Challenge. https://understandrisk.org/vizrisk/, 2019. Accessed: 2020-07-07.
[14] S. Goodwin, J. Dykes, S. Jones, I. Dillingham, G. Dove, A. Duffy, A. Kachkaev, A. Slingsby, and J. Wood. Creative user-centered visualization design for energy analysts and modelers. IEEE Trans. Visualization and Computer Graphics, 19(12):2516–2525, 2013.
[15] F. Hohman, K. Wongsuphasawat, M. B. Kery, and K. Patel. Understanding and visualizing data iteration in machine learning. In ACM SIGCHI Conf. Human Factors in Computing Systems (CHI), pp. 1–13, 2020.
[16] H. Hutchinson, W. Mackay, B. Westerlund, B. B. Bederson, A. Druin, C. Plaisant, M. Beaudouin-Lafon, S. Conversy, H. Evans, H. Hansen, et al. Technology probes: Inspiring design for and with families. In ACM SIGCHI Conf. Human Factors in Computing Systems (CHI), pp. 17–24. ACM, 2003.
[17] P. Isenberg, F. Heimerl, S. Koch, T. Isenberg, P. Xu, C. D. Stolper, M. Sedlmair, J. Chen, T. Möller, and J. Stasko. vispubdata.org: A metadata collection about IEEE Visualization (VIS) publications. IEEE Trans. Visualization and Computer Graphics, 23(9):2199–2206, 2016.
[18] E. Kerzner, L. A. Butler, C. Hansen, and M. Meyer. A shot at visual vulnerability analysis. In Computer Graphics Forum, vol. 34, pp. 391–400, 2015.
[19] A. Kriebel and E. Murray. . John Wiley & Sons, 2018.
[20] H. Lam, M. Tory, and T. Munzner. Bridging from goals to tasks with design study analysis reports. IEEE Trans. Visualization and Computer Graphics, 24(1):435–445, 2017.
[21] D. Lloyd and J. Dykes. Human Centered Approaches in Geovisualization Design: Investigating Multiple Methods Through a Long Term Case Study. IEEE Trans. Visualization and Computer Graphics, 17(12):2498–2507, 2011.
[22] G. E. Marai. Activity-centered domain characterization for problem-driven scientific visualization. IEEE Trans. Visualization and Computer Graphics, 24(1):913–922, 2017.
[23] N. McCurdy, J. Dykes, and M. Meyer. Action design research and visualization design. In BELIV Workshop: Beyond Time and Errors - Novel Evaluation Methods for Visualization, pp. 10–18, 2016.
[24] S. McKenna, D. Mazur, J. Agutter, and M. Meyer. Design activity framework for visualization design. IEEE Trans. Visualization and Computer Graphics, 20(12):2191–2200, 2014.
[25] F. Miranda, H. Doraiswamy, M. Lage, K. Zhao, B. Gonçalves, L. Wilson, M. Hsieh, and C. T. Silva. Urban pulse: Capturing the rhythm of cities. IEEE Trans. Visualization and Computer Graphics, 23(1):791–800, 2016.
[26] M. R. Montgomery, M. Gragnolati, K. A. Burke, and E. Paredes. Measuring living standards with proxy variables. Demography, 37(2):155–174, 2000.
[27] T. Munzner. A Nested Model for Visualization Design and Validation. IEEE Trans. Visualization and Computer Graphics, 15(6):921–928, 2009.
[28] C. Neustaedter and P. Sengers. Autobiographical design in HCI research: designing and learning through use-it-yourself. In Proc. Designing Interactive Systems Conference (DIS), pp. 514–523, 2012.
[29] M. Ogawa and K.-L. Ma. code_swarm: A design study in organic software visualization. IEEE Trans. Visualization and Computer Graphics, 15(6):1097–1104, 2009.
[30] M. Oppermann, T. Möller, and M. Sedlmair. Bike Sharing Atlas: Visual Analysis of Bike-Sharing Networks. Int. Journal of Transportation (IJT), 6(1):1–14, 2018.
[31] M. Oppermann and T. Munzner. Ocupado: Visualizing Location-Based Counts Over Time Across Buildings. Computer Graphics Forum, 39(3), 2020.
[32] C. Perin, R. Vuillemot, and J.-D. Fekete. SoccerStories: A kick-off for visual soccer analysis. IEEE Trans. Visualization and Computer Graphics, 19(12):2506–2515, 2013.
[33] T. Polk, J. Yang, Y. Hu, and Y. Zhao. TenniVis: Visualization for tennis match analysis. IEEE Trans. Visualization and Computer Graphics, 20(12):2339–2348, 2014.
[34] M. Sedlmair. Design study contributions come in different guises: Seven guiding scenarios. In BELIV Workshop: Beyond Time and Errors - Novel Evaluation Methods for Visualization, pp. 152–161, 2016.
[35] M. Sedlmair, P. Isenberg, D. Baur, and A. Butz. Information visualization evaluation in large companies: Challenges, experiences and recommendations. Information Visualization, 10(3):248–266, 2011.
[36] M. Sedlmair, M. Meyer, and T. Munzner. Design Study Methodology: Reflections from the Trenches and the Stacks. IEEE Trans. Visualization and Computer Graphics, 18(12):2431–2440, 2012.
[37] U. H. Syeda, P. Murali, L. Roe, B. Berkey, and M. A. Borkin. Design study “lite” methodology: Expediting design studies and enabling the synergy of visualization pedagogy and social good. In ACM SIGCHI Conf. Human Factors in Computing Systems (CHI), pp. 1–13, 2020.
[38] J. J. Van Wijk and E. R. Van Selow. Cluster and calendar based visualization of time series data. In Proc. IEEE Symp. Information Visualization (InfoVis), pp. 4–9, 1999.
[39] K. Williams, A. Bigelow, and K. Isaacs. Visualizing a moving target: A design study on task parallel programs in the presence of evolving data and concerns. IEEE Trans. Visualization and Computer Graphics, 26(1):1118–1128, 2019.
[40] Y. L. Wong, K. Madhavan, and N. Elmqvist. Towards characterizing domain experts as a user group. In BELIV Workshop: Evaluation and Beyond - Methodological Approaches for Visualization, pp. 1–10, 2018.
[41] J. Wood, R. Beecham, and J. Dykes. Moving beyond sequential design: Reflections on a rich multi-channel approach to data visualization. IEEE Trans. Visualization and Computer Graphics, 20(12):2171–2180, 2014.
[42] P. Woodall. The data repurposing challenge: new pressures from data analytics. Journal of Data and Information Quality (JDIQ), 8(3-4):1–4, 2017.
[43] Y. Wu, F. Wei, S. Liu, N. Au, W. Cui, H. Zhou, and H. Qu. OpinionSeer: interactive visualization of hotel customer feedback. IEEE Trans. Visualization and Computer Graphics, 16(6):1109–1118, 2010.

A APPENDIX: EDGE CASE DESIGN STUDIES

[1] A. Arleo, C. Tsigkanos, C. Jia, R. A. Leite, I. Murturi, M. Klaffenböck, S. Dustdar, M. Wimmer, S. Miksch, and J. Sorger. Sabrina: Modeling and visualization of financial data over time with incremental domain knowledge. In Proc. IEEE Visualization Conference (VIS), pp. 51–55, 2019.
[2] R. C. Basole, T. Clear, M. Hu, H. Mehrotra, and J. Stasko. Understanding interfirm relationships in business ecosystems with interactive visualization. IEEE Trans. Visualization and Computer Graphics, 19(12):2526–2535, 2013.
[3] D. Baur, F. Seiffert, M. Sedlmair, and S. Boring. The streams of our lives: Visualizing listening histories in context. IEEE Trans. Visualization and Computer Graphics, 16(6):1119–1128, 2010.
[4] A. Bezerianos, P. Dragicevic, J.-D. Fekete, J. Bae, and B. Watson. Geneaquilts: A system for exploring large genealogies. IEEE Trans. Visualization and Computer Graphics, 16(6):1073–1081, 2010.
[5] S. Goodwin, J. Dykes, S. Jones, I. Dillingham, G. Dove, A. Duffy, A. Kachkaev, A. Slingsby, and J. Wood. Creative user-centered visualization design for energy analysts and modelers. IEEE Trans. Visualization and Computer Graphics, 19(12):2516–2525, 2013.
[6] U. Hinrichs, S. Forlini, and B. Moynihan. Speculative practices: Utilizing infovis to explore untapped literary collections. IEEE Trans. Visualization and Computer Graphics, 22(1):429–438, 2015.
[7] B. Kim, B. Lee, S. Knoblach, E. Hoffman, and J. Seo. Geneshelf: A web-based visual interface for large gene expression time-series data repositories. IEEE Trans. Visualization and Computer Graphics, 15(6):905–912, 2009.
[8] F. Miranda, H. Doraiswamy, M. Lage, K. Zhao, B. Gonçalves, L. Wilson, M. Hsieh, and C. T. Silva. Urban pulse: Capturing the rhythm of cities. IEEE Trans. Visualization and Computer Graphics, 23(1):791–800, 2016.
[9] M. Ogawa and K.-L. Ma. code_swarm: A design study in organic software visualization. IEEE Trans. Visualization and Computer Graphics, 15(6):1097–1104, 2009.
[10] C. Perin, R. Vuillemot, and J.-D. Fekete. SoccerStories: A kick-off for visual soccer analysis. IEEE Trans. Visualization and Computer Graphics, 19(12):2506–2515, 2013.
[11] T. Polk, J. Yang, Y. Hu, and Y. Zhao. TenniVis: Visualization for tennis match analysis. IEEE Trans. Visualization and Computer Graphics, 20(12):2339–2348, 2014.
[12] A. Thudt, D. Baur, S. Huron, and S. Carpendale. Visual mementos: Reflecting memories with personal data. IEEE Trans. Visualization and Computer Graphics, 22(1):369–378, 2015.
[13] J. J. Van Wijk and E. R. Van Selow. Cluster and calendar based visualization of time series data. In Proc. IEEE Symp. Information Visualization (InfoVis), pp. 4–9, 1999.
[14] J. Wood, D. Badawood, J. Dykes, and A. Slingsby. Ballotmaps: Detecting name bias in alphabetically ordered ballot papers. IEEE Trans. Visualization and Computer Graphics, 17(12):2384–2391, 2011.
[15] J. Wood, R. Beecham, and J. Dykes. Moving beyond sequential design: Reflections on a rich multi-channel approach to data visualization. IEEE Trans. Visualization and Computer Graphics, 20(12):2171–2180, 2014.
[16] Y. Wu, F. Wei, S. Liu, N. Au, W. Cui, H. Zhou, and H. Qu. OpinionSeer: interactive visualization of hotel customer feedback.