Fits and Starts: Enterprise Use of AutoML and the Role of Humans in the Loop
Anamaria Crisan
Tableau [email protected]
Brittany Fiore-Gartland
Tableau [email protected]
Figure 1: Levels of Automation in Data Science Work. From our interviews we illustrate the desired level of automation according to level of technical expertise in data science. We ground our findings in the levels of automation proposed by Parasuraman et al. [41] and Lee et al. [34]
ABSTRACT
AutoML systems can speed up routine data science work and make machine learning available to those without expertise in statistics and computer science. These systems have gained traction in enterprise settings where pools of skilled data workers are limited. In this study, we conduct interviews with 29 individuals from organizations of different sizes to characterize how they currently use, or intend to use, AutoML systems in their data science work. Our investigation also captures how data visualization is used in conjunction with AutoML systems. Our findings identify three usage scenarios for AutoML that resulted in a framework summarizing the level of automation desired by data workers with different levels of expertise. We surfaced the tension between speed and human oversight and found that data visualization can do a poor job balancing the two. Our findings have implications for the design and implementation of human-in-the-loop visual analytics approaches.
CCS CONCEPTS
• Human-centered computing → Empirical studies in HCI.
CHI ’21, May 08–13, 2021, Yokohama, Japan
© 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-XXXX-X/18/06...$15.00 https://doi.org/10.1145/1122445.1122456
KEYWORDS
Data Science, Automation, Machine Learning, Data Scientists
ACM Reference Format:
Anamaria Crisan and Brittany Fiore-Gartland. 2021. Fits and Starts: Enterprise Use of AutoML and the Role of Humans in the Loop. In
CHI’21: ACM Conference on Human Factors in Computing Systems, May 08–13, 2021, Yokohama, Japan.
ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/1122445.1122456
1 INTRODUCTION
Organizations are flush with data but bereft of individuals with the technical expertise required to transform these data into actionable insights [45]. To bridge this gap, organizations are increasingly turning toward automation in data science work, beginning with the adoption of techniques that automate the creation of machine learning models [13, 46]. However, the adoption of this technology into enterprise settings has not been seamless. Current AutoML offerings are limited in what they can flexibly support. End-to-end systems encompassing the full spectrum of data science work, from data preparation to communication, are not yet fully realized [34, 54]. Consequently, AutoML systems still require human intervention to be practically applicable [42, 46]. This mode of human-machine collaboration presents a number of challenges [2, 35], chief among them the importance of balancing the speed afforded by AutoML with the agency of individuals to interpret, correct, and refine automatically generated models and results [20]. Data visualization can play an important role in facilitating this human-machine collaborative process [20, 46], but there are few studies that examine if and how data visualization is used in real-world settings together with AutoML. To fill this gap, we conduct interviews with 29 individuals from organizations of different sizes, extending across different domains, to capture how they currently use, or plan to use, AutoML to carry out data science work.
We examine specifically if and how participants use data visualization as a way to integrate the human in the automation loop. Our investigation reveals that the practical use of AutoML technology in real-world settings requires considerable human effort. This effort is complicated by the need to trade off data work between individuals with different expertise, for example data scientists and business analysts. This trade-off is exacerbated by a data knowledge gap that participants believe AutoML technology is widening. While participants saw the value of data visualization as one way to facilitate human-in-the-loop interactions with AutoML tools, many still reported using visualization in a limited way. Participants found that creating quality visualizations for AutoML was often too difficult and time consuming and had the effect of slowing down automation, often with limited benefit. Moreover, participants reported a lack of useful visualization tools to support them in some of their more pressing needs, such as collaborating on data work among their diverse teams and with AutoML technology. Altogether, our study makes the following timely contributions to the existing literature on AutoML and the design of human-in-the-loop tools for data science:
• An interview study that presents real-world uses of AutoML technologies in enterprise settings, with a focus on the role of the human-in-the-loop facilitated by data visualization
• A summary of three use cases for AutoML according to different organizational needs
• A framework that illustrates the level of automation that is desirable for individuals with different levels of technical expertise.
As AutoML systems continue to gain traction in enterprise settings, our contributions will be a resource to the research communities developing human-in-the-loop approaches that support an appropriate balance of automation and human agency.
2 RELATED WORK
We review prior work that investigates the use of AutoML in data science, the ways that humans act within these processes, and current data visualization approaches that mediate these processes. As we reviewed this work, we were challenged by the varied use of the term ‘AutoML’. The preliminary goals of automation in machine learning began with the objective of removing the human specifically from the hyper-parameter tuning and model selection steps [50]. However, it quickly became clear that other steps, such as data preparation or feature engineering, were also critical to the success of hyper-parameter tuning. The scope of the term AutoML, and more recently “AutoAI” or “driverless AI”, began to encompass broader steps in the data science workflow [46, 54]. We observed that the terms AutoML, AutoAI, and the phrase ‘automation in data science’ are often used interchangeably in the literature. Here, we use the term AutoML to broadly encompass automation across multiple data science steps, from preparation to monitoring to deployment.
Data science leverages techniques from machine learning to derive new and potentially actionable insights from real-world data [3, 6, 12]. AutoML systems have been developed to automate the computational work involved in building a data analysis pipeline that enables individuals to derive these insights from data. Several commercial systems already exist and are used within different types of organizations, including AWS SageMaker AutoPilot [24], Google’s Cloud AutoML [26], Microsoft’s AutomatedML [29], IBM’s AutoAI [28], H2O Driverless AI [27], and DataRobot [25]. There are also implementations of AutoML that build upon widely used data science packages, such as auto-sklearn [15, 16] and TPOT [38, 39], which build on the scikit-learn [43] Python library. The focus of these AutoML systems is largely toward supervised tasks concerning feature engineering, hyper-parameter tuning, and model selection [14, 50, 54]. Recent innovations have proposed possible end-to-end solutions that also support data preparation [34, 50, 54], and it is likely that AutoML technologies will continue to expand toward broader end-to-end support.

The means and extent to which AutoML systems integrate with a computational data science pipeline is variable. Some AutoML systems exist as a single component within a larger pipeline that the analyst or data scientist creates, such as an automated feature selection step. At other times, AutoML systems can also create these pipelines with minimal user input. In their comprehensive analysis of existing AutoML tools, Zöller et al. [54] describe three common configurations for including AutoML in data science work. The first two configurations are “fixed structure pipelines”, where the AutoML system assumes a very specific configuration of the computational pipeline. The authors differentiate between fixed pipelines that are optimized for specific AutoML methods (for example, neural networks or random forests) and those that are not.
While these fixed systems are common, they have limitations when confronted with different data types and tasks. For example, image data or text data demand more flexibility within the structure of the computational pipeline. The second category is a “variable structure pipeline”, which refers to a fairly recent approach that aims to learn the appropriate steps within a data science pipeline [54]. TPOT [38, 39] is an example of one of the first variable pipelines. Unlike fixed models that execute a pre-determined set of processes, variable structure approaches learn a network of processes in response to different datasets and user objectives. While the stated goal of many of these AutoML systems is to effectively remove humans from many aspects of data science work [50], a view that data scientists themselves express [46], today these systems still rely on considerable human labor to be of use [19]. These limitations stem from both the complexity of data science work and the brittleness of the fixed structure pipelines that are in common use [54]. Our study catalogs this human labor across data science work and examines how visualization is used by individuals engaged in data work.
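To make the distinction concrete, the contrast between fixed and variable structure pipelines can be sketched with a toy, dependency-free example. Every name here (the "dataset", the pre-processing steps, the threshold "model") is invented for illustration and does not correspond to any real AutoML library:

```python
import itertools

# Toy "dataset": (feature, label) pairs; the numbers are invented for
# illustration and stand in for a real training set.
data = [(x, 1 if x > 5 else 0) for x in range(11)]

# Candidate pre-processing steps (hypothetical, not from any real AutoML tool).
def scale(rows):
    return [(x / 10.0, y) for x, y in rows]

def add_offset(rows):
    return [(x + 1.0, y) for x, y in rows]

def accuracy(rows, cutoff):
    """A stand-in 'model': predict 1 when the feature exceeds `cutoff`."""
    return sum((x > cutoff) == (y == 1) for x, y in rows) / len(rows)

def fixed_search(rows, cutoffs):
    """Fixed structure: the pipeline shape (scale -> model) is hard-coded;
    only the model hyper-parameter is explored."""
    prepared = scale(rows)
    return max((accuracy(prepared, c), ["scale"], c) for c in cutoffs)

def variable_search(rows, cutoffs):
    """Variable structure: the sequence of steps is itself part of the
    search space, alongside the hyper-parameter."""
    steps = {"scale": scale, "add_offset": add_offset}
    best = (0.0, [], None)
    for r in range(len(steps) + 1):
        for combo in itertools.permutations(steps, r):
            prepared = rows
            for name in combo:
                prepared = steps[name](prepared)
            for c in cutoffs:
                score = accuracy(prepared, c)
                if score > best[0]:
                    best = (score, list(combo), c)
    return best

cutoffs = [0.2, 0.5, 2.0, 5.0]
print(fixed_search(data, cutoffs))     # (1.0, ['scale'], 0.5)
print(variable_search(data, cutoffs))  # (1.0, [], 5.0)
```

Both searches reach the same accuracy on this toy data, but the variable search also discovers that an empty pre-processing sequence suffices; real variable pipelines such as TPOT search a vastly larger space of steps with genetic programming rather than exhaustive enumeration.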
Human-in-the-loop approaches provide a way to explicitly incorporate human interaction within automated processes.

Figure 2: Example Illustrations of Fixed and Variable Structure Pipelines. Adapted with modification from Zöller et al. [54].

Identifying when and how to add the human-in-the-loop within AutoML processes is important in order to appropriately balance the speed that automation affords with the importance of human guidance. Parasuraman et al. [41] proposed a model to help designers identify the appropriate type and level of automation for information-seeking processes. They define four broad functions for how automation is used: 1) information acquisition; 2) information analysis; 3) decision and action selection; and 4) action implementation. They argue that the level of automation, from none to fully automated, should be evaluated against human performance consequences, automation reliability, and costs of actions. When the impact of automation is both significant and potentially harmful, human intervention is essential. The question of when, how, and how much to automate remains critical to the discussion of AutoML technologies today. A number of recent studies in the HCI literature have examined this trade-off between automation and human intervention as it relates to AutoML technology. Lee et al. [34], Gil et al. [17], and Liao et al. [35] describe a set of interaction modalities for users to engage with AutoML systems. Lee et al. [34] categorize Parasuraman et al.’s [41] levels of low to high automation into three different modes of interaction: ‘user-driven’, ‘cruise-control’, and ‘autopilot’. In ‘cruise control’, a user directs an AutoML algorithm to a set of possible configurations to explore, as opposed to specifying a single and immediate next configuration. As an example, a configuration can mean the user setting a parameter for hyper-parameter tuning during model creation. Gil et al.
[17] describe a framework for human-guided machine learning (HGML), which is predicated on the ability to effectively map user actions to a so-called ‘AutoML planner’ capable of translating and executing the action. Similar to Gil et al., Liao et al. [35] propose a declarative way for the user to specify their objectives while allowing the system to automatically generate the underlying processes. By their descriptions, the systems proposed by Lee et al., Gil et al., and Liao et al. are akin to the variable structure pipelines described in the previous section, in that they learn the processes in the pipeline. Studies have also examined human-ML/AI collaboration as it pertains to model authoring and interpretation specifically. While these studies are not exclusive to AutoML, they highlight key challenges for interacting with and interpreting machine learning models in enterprise settings. As an example, Hong et al. [22] interviewed 20 individuals across different domains (the majority of whom identified as data scientists), and found that collaboration among different organizational roles was of chief importance for operationalizing machine learning models into organizational practices. Honeycutt et al. [21], Liao et al. [35], and Amershi et al. [2] describe the ways that information can be shared between humans and AutoML systems throughout a variety of interactions. Honeycutt et al. [21] identify ‘relevance feedback’ and ‘incremental learning’ as two general ways that humans can provide feedback to AutoML systems. Humans can provide relevance feedback, which informs the AutoML system about whether its actions were effective or not. For example, humans may provide labeled data or correct errors when they arise. Humans may also provide new information in the form of incremental feedback to AutoML systems, which can be used to correct for issues like concept drift in models that have been deployed into production settings. Liao et al. and Amershi et al.
focus on the flow of information in the opposite direction, which concerns the types of information humans require to interpret the results from AutoML systems. Liao et al. [35] conducted interviews with 20 UX design practitioners using a question bank to surface limitations in guidance targeting the development of explainable AutoML technologies. Their work demonstrates that the importance of the ML/AI results and their presentation is highly dependent on the question posed by the individual. Finally, Amershi et al. [2] propose a comprehensive set of 18 design guidelines that outline the appropriate modes of interaction when experts a) initially interact with an AutoML system; b) as the system is churning; c) when errors surface; and d) throughout user interactions. Studies that examine how people use AutoML technologies and how they respond to human-in-the-loop features are also emerging. Wang et al. [46] interviewed 20 data scientists across industries to interrogate their practices and perceptions of AutoML. They found that the benefits of AutoML for augmenting, but not replacing, human intuition were valued and appreciated by practitioners. Passi et al. [42] conducted an extensive six-month ethnographic study that involved over 50 data scientists. Their findings surface the different organizational needs and challenges of data workers as they collaborated with each other in the context of automation in data science work. Zhang et al. [53], Drozal et al. [13], and Honeycutt et al. [21] conducted controlled experiments to evaluate decision making and trust in AutoML technologies, but their studies did not recruit current practitioners. Both Zhang et al. and Honeycutt et al. conducted their research via Mechanical Turk, and Drozal et al. recruited undergraduate and graduate students in quantitative disciplines. Zhang et al. and Honeycutt et al.
both found that reporting accuracy data alone was not sufficient for improving confidence and trust in the results produced by AutoML systems. Honeycutt et al. observed that the act of interacting with a machine learning model reduced individuals’ confidence in the model’s performance even when the human guidance increased accuracy. These findings by Zhang et al. and Honeycutt et al. underscore the challenges of designing useful feedback mechanisms between humans and AutoML systems. While many human-in-the-loop approaches to support AutoML processes, and by extension data science work, exist, there are few
studies aimed at understanding how they are integrated by practitioners in enterprise settings. We found two studies that concretely explore AutoML in enterprise settings, and we build upon these findings in our present study in order to further assess attitudes of individuals in enterprise settings toward human-in-the-loop approaches.
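The two feedback channels discussed above, relevance feedback and incremental learning, can be sketched with a toy model. The class, its methods, and all numbers are invented for illustration; this is not the interface of any real AutoML system:

```python
class ThresholdClassifier:
    """Toy model: predict 1 when a feature exceeds a learned threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return 1 if x > self.threshold else 0

    def relevance_feedback(self, x, was_correct, step=0.5):
        """Relevance feedback: the human only says whether a prediction
        was right or wrong; the system nudges its boundary accordingly."""
        if not was_correct:
            # Move the boundary toward the misclassified point.
            self.threshold += step if self.predict(x) == 1 else -step

    def incremental_update(self, labeled_rows):
        """Incremental learning: the human supplies newly labeled data
        (e.g. after concept drift) and the model is refit on it."""
        ones = [x for x, y in labeled_rows if y == 1]
        zeros = [x for x, y in labeled_rows if y == 0]
        if ones and zeros:
            # Place the boundary midway between the two classes.
            self.threshold = (max(zeros) + min(ones)) / 2

model = ThresholdClassifier(threshold=2.0)
print(model.predict(3))                         # 1

# The human flags that prediction as wrong -> the boundary moves up.
model.relevance_feedback(3, was_correct=False)
print(model.threshold)                          # 2.5

# Later, drifted data arrives with fresh labels -> the model is refit.
model.incremental_update([(1, 0), (4, 0), (6, 1), (9, 1)])
print(model.threshold)                          # 5.0
```

The point of the sketch is the asymmetry of the two channels: relevance feedback carries one bit per interaction and only nudges the model, while incremental updates carry new labeled data and can reposition it entirely.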
Our work specifically focuses on visualization systems that support human-in-the-loop interactions for AutoML. Two prior and comprehensive state-of-the-art surveys capture the role of visualization in explaining [7] and building trust [8] in machine learning. Recent work by Yuan [51] demonstrates visual analytic approaches throughout the data science process, including prior to model building (data prep and feature engineering), during model building, and after model building (verification, deployment). These surveys show the diversity of approaches that are taken to support decision making throughout the data science pipeline. Here, we highlight five systems that collectively capture this diversity. Google Vizier [18] and ATMSeer [47] a) surface the complex latent space of models, b) search this space through interaction and visualization, and c) triage machine learning models. These systems present users with results from multiple models across their hyper-parameters through multiple coordinated views of the data. As with Google Vizier, PipelineProfiler [40] and AutoAIViz [48] make use of parallel coordinates plots to help users navigate the model search space and to highlight possible hyper-parameter settings. AutoAIViz shows the utility of conditional parallel coordinates plots to visualize subsequent steps in an AutoML pipeline based upon the user’s current selections. One limitation of visualization for AutoML in data science pipelines is the assumption of a fixed structure (see Section 2.1), making it difficult to visually compare variable AutoML pipelines.
To address this limitation, PipelineProfiler was developed as a wrapper for the auto-sklearn [15] package, supporting the visualization and comparison of different end-to-end AutoML implementations. Taken together, we believe that these systems represent a ‘cruise-control’ mode of including a human-in-the-loop, balancing between the slower ‘user-driven’ and the faster, but less transparent, ‘autopilot’ modes for executing and interacting with AutoML. Moreover, these systems, co-created with experts in design and data science, represent real implementations of the existing design guidance toward the use of visualization to help interrogate AutoML systems. However, it remains to be understood how such systems that are intended to build trust or transparency in AutoML actually get used, or, perhaps more concerning, whether they get used at all. Our study sought to surface the visualization strategies within AutoML in enterprise settings.
The current state of the art in AutoML is informed by multidisciplinary research endeavours spanning machine learning, human-computer interaction, and visualization. Given this research effort, there exist a number of AutoML offerings with varying types of pipeline configurations, from fixed to variable, that support different modes of interaction so that “intelligent services and users may collaborate efficiently to achieve the user’s goals” [23]. However, there remain few studies on how this technology is applied in enterprise settings, whether users can effectively leverage the benefits of this technology, and how adding the human-in-the-loop via visualization is viewed by enterprise users. Moreover, existing studies [22, 42, 46] looking at enterprise settings focused on specific themes, namely collaboration and trust, and did not closely examine how AutoML broadly intersects with data science work. Building on these prior findings, our study conducts a broader examination of AutoML and data science work that surfaces how AutoML is situated within organizational processes.

Table 1: Summary of the Participants.
3 METHODS
We conducted semi-structured interviews to develop an understanding of how AutoML is used to automate data science work. We were also interested in surfacing the role of the human-in-the-loop as it is mediated by data visualization tools, such as those used to explore data or support model tuning and selection.
We recruited participants through a snowball sampling approach [9], with the first point of contact being individuals that had participated in prior studies, were known to the authors, or other collaborators. We recruited and conducted interviews with 29 individuals that self-identified as data scientists, as analysts engaged in data-science-type work, or as managers overseeing a team comprising either entirely data scientists or a mixture of data scientists with others. The semi-structured interview format prompted participants to discuss data science work in their organization; if and how they currently use AutoML systems, or plan to deploy AutoML systems; and the ways they use data visualization, using the Tableau platform or other tools. Interviews were scheduled for approximately 60 minutes, audio-recorded, and transcribed. Our participant screening questionnaires and interview guides are provided as supplemental materials. Due to the nature of semi-structured interviews, the range of topics that participants chose to touch upon was quite broad. Moreover, due to the novelty and diversity of uses for AutoML technology, the perceptions and pain points described by our participants were not always overlapping. All interviews were conducted over video conferencing software. Participants, their organization sizes, and domains are summarized in Table 1.
Sensitizing concepts are an important component of qualitative research, as they ground the analysis in important emergent features and operate as a key interpretative device in data analysis [4]. At the outset of our study, we had some preliminary concepts that we were sensitized to from prior research we conducted that examined the nature of data science work and workers [10]; we used this prior research as part of our selective coding processes. In addition to this prior framework, we also had our own notions of concepts that could be pertinent to AutoML, visualization, and human-in-the-loop interactions specifically, and these informed our initial interview questions. As we completed interviews, debriefed, and conducted initial thematic coding of transcripts, we became sensitized to particular themes in our analysis that further refined our existing concepts of data science work and generated new ideas that we had not previously considered. Specifically, these emerging themes included the importance of different types and levels of participant expertise and participants’ use of, and attitudes around, “click” (low- or no-code) as opposed to “code-based” solutions. Analysis of participant pain points surfaced issues of tool switching, trust, and collaboration. Finally, we also found that predictive modeling was the primary way that these organizations applied AutoML technology. As we became sensitized to these concepts, we revised our interview guide to ask more pointed questions about these themes. We provide both the preliminary and modified interview guides in our supplemental materials.
Participants self-identified as either analysts or managers. Analysts were individuals engaged in the day-to-day tasks of data analysis, including data scientists, business analysts, or other technical analysts engaged in data science work. Managers oversaw teams that often contained a mixture of data scientists, business analysts, or other types of organizational decision makers. In total, 17 participants in our study were analysts and 12 were managers. Overall, participants had high data science expertise, although one could be classified more as a citizen data scientist, who did not have formal training in data science but was exploring this field with the aid of AutoML. Participants also represented organizations of different sizes performing a variety of functions. Four participants were at organizations with fewer than 100 individuals, 10 with between 100 and 1,000 individuals, 4 with between 1,000 and 10,000, 2 with between 10,000 and 50,000, and 9 with more than 50,000 individuals. Participants worked in a broad range of organizations across different industries focused on data analytics, finance, government, healthcare, management and consulting, security, telecommunications, and travel. We further stratified participants according to their current usage of AutoML technology.
Active users were those who reported that they, or members of their team, used a specific AutoML technology to conduct their work. We did not stipulate some required frequency of use (daily vs. not) or the number of individuals currently using this technology.
Experimenting users were those that reported creating proofs of concept or described at least some preliminary projects specifically for the purposes of exploring AutoML technologies. Unlike active users, those that were experimenting with the technology articulated that their use of AutoML was in the early stages and exploratory in nature. Finally, those individuals that we categorized as knowledgeable had high context for data science work, including AutoML, but were not using or planning to use this technology in their work. Among our 29 participants, 8 were active users, 10 were experimenting, and 11 were knowledgeable.
A prior study [10] used an open coding process to define a framework of data science work that comprises four higher-order processes and fourteen lower-order processes. We use the set of codes from this prior study to carry out a selective coding of our interview transcripts. Selective coding is a stage in grounded theory research that serves to organize the analysis around a core set of variables [5], in this case the processes of data science work, rather than derive and organize a new set of codes as is done in open and axial coding. The reason we use selective, as opposed to the more commonly used open and axial coding approaches [37], is that AutoML technology is relatively new and, as a result, participants have varied experiences with it. While our interviews captured a rich diversity of experiences with AutoML, this diversity also led to sparseness in our data that made it difficult to achieve theoretical saturation in an open coding process. Using a selective coding process allowed us to scaffold our analysis around a cohesive narrative of how AutoML is used across data science work. The selective coding process still makes use of constant comparison, which allowed us to eventually achieve theoretical saturation in our findings. The set of four higher-order processes and fourteen lower-order processes in this framework for data science work were:
• Preparation: Defining Needs, Data Gathering, Data Creation, Profiling, and Data Wrangling
• Analysis: Experimentation, Exploration, Modeling, Verification, and Interpretation
• Deployment: Monitoring and Refinement
• Communication: Dissemination and Documentation
The authors of [10] also indicated two lower-order processes, Collaboration and Pedagogy, that were identified as emergent but did not have sufficient context to place within the higher-order categories.
Figure 3: Example of Annotating Interviews with processes from an existing framework of data science work and workers [10]
In Figure 3, we exemplify how we performed a selective coding for these processes across our interviews. Some statements made explicit references to data science processes; for example, “Hard part - data discovery, data curation” is an explicit reference to the preparation higher-order process, as the terminology used can be linked directly to a higher-order or lower-order process in the existing framework. By comparison, some references to processes were more implicit and were inferred by the authors with other context from the interviews. For example, “still need to educate, and visualization is important for that. Still need someone who is thinking through the problem” was determined to be an implicit reference to communication processes, although there were no explicit terms specific to communication. As with putting any model into practice, the prior framework not only deepened our analysis but also generated natural tensions between our observations and our framing of data science. We leveraged these tensions as points of inquiry that allowed us to critique or expand upon these frameworks based upon the participants’ reported experiences. We reflect on our approach and propose modifications to it that we describe in Section 5.4 (as these modifications were motivated by our analysis) and again in the discussion (Section 6).
4 FINDINGS
We first present our general findings regarding prevailing attitudes toward AutoML and the role of automation in data science work. Then, we examine the intersection of AutoML and data science work at the level of higher-order processes.
In this section, we describe prevailing attitudes toward adopting automation in data science as described by our participants. We identified four primary themes that encapsulate these attitudes: the role of AutoML in driving productivity, the importance of tool integration, concerns about automating bad decisions, and, finally, the desire to limit the role of the human-in-the-loop.
AutoML was embraced with cautious optimism by participants at organizations of different sizes, but we found that it tended to be more widely used or explored at larger organizations. It is not clear what is driving these differences, but we believe that it may relate to the different amounts and rates of data collection at larger organizations that motivate a greater need for automation. Among larger organizations that had implemented data science automation tools, the primary reason for investing in AutoML was that it “just makes data scientists more productive” (P01) by automating aspects of their work and allowing them to triage to focus on more pressing problems. As a specific example, P07 indicated that there is

Lots of waste determining which models to work on. [I] Wonder [if we] should focus on managing the pipeline so [that] the things get through have more impact. [We] need a predictive model to figure out which predictive models are the ones to work on.

The automation of routine work, like model tuning and selection, was seen as a desirable way to shift the effort of human labor toward model verification and, if needed, correction tasks. While some participants felt strongly that technical expertise was required to safely use AutoML technology for productivity gains (a topic we return to in 4.1.3), others (P03, P12) saw the benefit of AutoML to democratize data work. Individuals without a background in statistics and computer science that occupy roles of “business analysts” or “moonlighters” [10, 33] would benefit from the lower barrier to entry that AutoML affords. This democratization effort may improve productivity in those roles, but it also opens a door to capabilities across self-service analytics that were previously inaccessible. Overall, AutoML systems reduce the amount of code that is needed to use data within data science workflows.
Individuals with high technical expertise, such as data scientists, can leverage AutoML systems to improve the speed and efficiency of routine tasks. For non-experts, AutoML systems can also democratize access to data science workflows and machine learning solutions.
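To make the code-reduction claim concrete, the model-selection loop that AutoML automates can be sketched in a few lines. This is a minimal, hypothetical illustration in plain Python, not any participant's system or a real AutoML product's API; the candidate models and the `auto_select` helper are invented here for exposition.

```python
# Hypothetical sketch of AutoML-style model selection: a search over
# candidate models, scored on held-out data, so the analyst writes far
# less selection code by hand.

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Simple least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def auto_select(candidates, train, valid):
    """Fit every candidate on `train`; return the name and the fitted
    model with the lowest mean squared error on `valid`."""
    (tx, ty), (vx, vy) = train, valid
    def mse(model):
        return sum((model(x) - y) ** 2 for x, y in zip(vx, vy)) / len(vx)
    fitted = {name: fit(tx, ty) for name, fit in candidates.items()}
    best = min(fitted, key=lambda name: mse(fitted[name]))
    return best, fitted[best]

if __name__ == "__main__":
    xs = list(range(10))
    ys = [2 * x + 1 for x in xs]
    name, model = auto_select(
        {"mean": fit_mean, "linear": fit_linear},
        train=(xs[:7], ys[:7]), valid=(xs[7:], ys[7:]))
    print(name)  # the linear candidate fits this data better than the mean
```

Commercial systems search far larger spaces of pipelines and hyperparameters, but the shape is the same: the human supplies data and candidates, and the system handles the fit-score-compare loop.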
Participants reported using a variety of commercial tools to facilitate AutoML work and the need to create custom solutions for the preparation, deployment, and communication of data science processes. Participants used or were actively investigating platforms like Alteryx (n=5) for automating their workflows and integrating with DataRobot (n=3) or H2O.ai (n=3) to facilitate automatic modeling steps. Dataiku was also used in lieu of Alteryx and was seen as a better tool for facilitating collaboration across processes. Participants also reported using SageMaker (n=2), Power BI (n=2, Azure), and Databricks (n=1). Participants reported leveraging libraries like TensorFlow (n=4), scikit-learn (n=2), and MXNet (n=1) for their AutoML work. Python, R, and their attendant notebook environments were described as being used by roles that had higher technical expertise. However, P02 observed that “businesses can’t deal with the notebooks” because “data scientist(s) are now [needing] to build things that can run in a production environment” in order to operationalize models. Moreover, as organizations seek to spread out the data work from data scientists to others in the organization, P15 observed that they were “starting to see heavier reliance on data science products that don’t require heavy coding.” For individuals without a data science or computer science background, the need to write code appears to be a barrier, but our participants’ comments indicate that AutoML can lower, or potentially remove, this barrier. Moreover, there is an increasing appetite for a “platform [that] helps people move between tools that they have selected” (P04) and where “75% of the organization could work”. Tool switching is common in data science because there are “Different tools for different analysis”, and yet we found most users would “prefer to stay in one environment” (P06).
Importantly, data science processes are not linear but occur in a “big recursive” (P09) loop. Constantly changing environments within multiple cycles of iteration and refinement is time consuming and impractical. Participants did report visualizing their data via systems like Tableau, or via charting libraries in R or Python, but many described their limited use:

    As organizations scale, they’re going to spend less and less time doing visualizations [...] the job is to deliver results of a model in some form [...] Data scientists aren’t going to deploy with ggplot, but they may use it for static reporting or just for their information. (P17)

This participant didn’t see visualization tools as scaling well alongside AutoML and other data science work, leading to abandonment, especially within results communication. Two other participants had to awkwardly move back and forth across modeling and visualization tools, and as a result the role of visualization was limited throughout the process.
Participants also clearly understood that blind trust in AutoML could lead to potentially catastrophic failures. P20 worries that “lots of people will try to predict things without really understanding. [...] People will make horrific mistakes and not realize they’ve made them.” P12 colorfully indicated that “having this [AutoML] tooling may just allow people to make stupid mistakes easier!” P05 observed that it’s “bad to slap together models and try to make decisions from it without understanding how things work. [That’s like] giving a loaded gun to a child”. Participants were also concerned about regulatory constraints, for example the European GDPR legislation, which requires organizations to be able to explain decisions made by automated data science technologies. Even without legislative pressures, there were internal organizational concerns around these technologies, especially when large financial decisions were involved. A tolerance for errors and failure was an important factor in evaluating the use of AutoML technology; however, in many cases perfection was not required. For instance, P12 observed that there were “lots of business use cases where 80% accurate could be okay”. P29’s observation echoes P12’s that it is desirable for automation to help surface failure points in data preparation and analysis processes:

    I think they [executives] want you to use those [automated insights] to look at a graph and say, “Oh wow, this is life changing. Let’s go make this change in our business.” We didn’t use it like that. We used it to make sure that the results we were getting back made common sense.

A surprising finding was the general concern around the use of AutoML by “citizen data scientists” or domain experts who were not formally trained in data science, statistics, or computer science.
P12 stated that while they understand organizations want to democratize data science work, it still worries them because “in practice you’ll still have to be pretty technical” to analyze data. P22 raised the issue of the overhead needed to ensure those “who aren’t as well versed...in the data science space are able to not make silly mistakes that they shouldn’t be making”. Perhaps the strongest stance we heard was from a participant who stated that they would “restrict it [AutoML] just to the data scientists” and “use it to get efficiency after demonstrating they know [what] they are doing” (P05). This attitude was a recurrent theme in our study.

Overall, from participants’ responses we see that the promise of automating data science is tempered by very real concerns of how things could go wrong. However, this does not mean that organizations are pulling back from their investment in these technologies. As P02 noted: “ambition is still ‘industry 4.0’ with lots of automation.”
Concerns toward safety and trust of AutoML in many ways highlight the value of humans-in-the-loop approaches to balance automation with human oversight. However, participants also expressed concerns about the inclusion of humans within the AutoML loop. For example, while P02 expressed that there was still “lots of room for humans-in-the-loop” innovations in their industry, they also stated that “the manual part, where you have to visualize something, is getting cut out as much as possible”. P01 stated that even as “there are lots of automated tools in place making decisions”, at the same time there was a lot of “anxiety in the firm about what people do with ML,” stemming from concerns of automating decisions at scale. We interpret these seemingly contradictory positions to mean that human-in-the-loop approaches are valuable when applied at the right time and in the right way.
We briefly summarize the key takeaways for participant attitudes toward AutoML. While participants expressed concerns about the potential to automate bad decision-making, there was also a growing interest in using AutoML technology to produce a ‘good enough’ result that could scope out the viability and possible issues with the data or machine learning product. AutoML allows for the creation of sophisticated tools with minimal code and offers opportunities to ‘fail fast’, which enables data scientists, and even so-called ‘citizen data scientists’, to surface issues earlier. However, what is most clear is that applying AutoML technology at suitable points in data science work is very important; otherwise it is dismissed as intrusive. To further explore when and where AutoML technologies could support data science, we analyzed interviews through the lens of an existing model for data science work.
In this section we summarize our findings on the use of AutoML in the data science process. We consider places where considerable human labor is required either to support the creation of a machine learning model or to interpret, communicate, and act on its findings. Following the framework described in Section 3.2, we begin by examining AutoML in data preparation, followed by analysis, deployment, and communication.
Participants understood that without robust data preparation the AutoML portion of their data science processes would be ineffective. P02 succinctly stated that,

    AutoML has never been the solution - it’s a shiny toy. [It’s] always going to be about the quality of the data - do you understand what you are modeling?
CHI ’21, May 08–13, 2021, Yokohama, Japan. Crisan and Fiore-Gartland, et al.
In our interviews, participants identified several challenges in data preparation work that still required a lot of human labor, from gathering data, profiling it, and wrangling it into shape for analysis. It is important to emphasize that participants rarely began by cleaning a single tabular dataset, but often needed to bring several data sources together. P12 reported that “40-50% of my team’s time [was spent] on Alteryx to bring data together”, while P23 reflected that “the most difficult part of my day is getting the data that I need to work with”. Once the data were gathered, participants faced difficulties with data profiling, a challenge that was also surfaced in prior work by Kandel [31] and Alspaugh [1]. P19 expressed that automation in

    Data profiling would be a huge win. I spend a ton of time having to explain the shape of data, and what shapes work best, how to explore the data, and how to refactor as needed

Even when participants have data gathered and profiled, they still need to assess its utility for further downstream analysis. Rapid iteration via AutoML plays a role in speeding up this manual process. P23 described that they “get [the data] to a point where maybe it’s 60% [clean], and they start to run algorithms...[in order to]...expand on the data itself”. The workflow described here resonates with rapid prototyping and failing fast to discover issues or limitations in the data. This observation also echoes the data reconnaissance and task wrangling processes previously described in [11], in which individuals acquire and quickly view data in order to assess its suitability for analysis and decide whether to pursue additional data sources. Despite the usefulness of AutoML-driven prototyping, challenges associated with data preparation remain. This reflects an emerging theme from our analysis that there is a growing need to more tightly couple investments in data prep and model building.
For instance, the experience of these challenges led P03 and their team to make more significant investments toward “tools for democratization of data prep and (to a lesser extent) model building”. It is not surprising that data preparation is both time consuming and important to the successful application of AutoML and data science more generally. Prior studies [33, 46] with data scientists and other domain experts have routinely pointed to this bottleneck for years, and visualization tools such as Trifacta (and its academic predecessors Wrangler [32] and Profiler [30]) and Tableau Prep have been developed to address this challenge. It is disconcerting that preparation continues to be such a significant bottleneck despite existing tools. Our observations suggest that one reason for this may be that existing tools for data preparation do not easily fit within a data worker’s analytics environment. By extension of these observations, AutoML technology needs to be well integrated with existing tooling environments, while also surfacing the manual labor and lack of adequate tooling for data preparation.
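As a concrete illustration of the profiling work P19 wished were automated, a per-column summary can be sketched as follows. This is a hypothetical, stdlib-only example; the `profile` function is invented here and does not correspond to any tool participants named.

```python
# Hypothetical sketch of automated data profiling: for each column,
# report the missing rate, distinct-value count, and a crude inferred
# type -- the "shape of data" explanation P19 spends time producing by hand.

def profile(rows):
    """Given rows as dicts, summarize every column that appears."""
    columns = {key for row in rows for key in row}
    report = {}
    for col in sorted(columns):
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        types = {type(v).__name__ for v in present}
        report[col] = {
            "missing_rate": 1 - len(present) / len(values),
            "distinct": len(set(present)),
            "inferred_type": types.pop() if len(types) == 1 else "mixed",
        }
    return report

if __name__ == "__main__":
    rows = [
        {"region": "west", "sales": 12.0},
        {"region": "east", "sales": None},
        {"region": "west", "sales": 9.5},
    ]
    for col, stats in profile(rows).items():
        print(col, stats)
```

Real profilers add distributional summaries and anomaly flags, but even this small report captures the kind of automated "first look" at data shape that participants wanted before investing further effort.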
Perhaps as a testament to the advancement of AutoML technology, participants reported that model building “is fast and easy” (P12). P25 further elaborated that

    If I can actually get through all the data stuff, then getting the predictive model is not really that hard. There’s a huge bunch of code to get the data ready for the model, then a tiny bit of code for the model. And then the rest of the work is delivery to the customer.

The desired amount of oversight and control over AutoML in the analysis appeared to vary by level of technical expertise. P26 felt that individuals with high expertise in statistics and/or computer science were less likely to use automation because “they want everything to be customizable”, whereas those with less technical expertise, whom they refer to as “citizen data scientists”, were “focused on integrating intelligence to their app” and tended to prefer higher levels of automation. This latter group is growing, as P03 observed: “the vast majority of data science type work is done by non-data-scientist[s]”. From participants’ responses we also noted that this group with a low or evolving technical expertise often needed heavy guidance, low- or no-code AutoML implementations, and visualization. P29 also observed how individuals with higher technical expertise could act as a rate-limiting step to analysts engaged in data work, but that automation could serve as a catalyst:

    [Analysts] maybe can do their own little, their own little predictions, off to the side. And they can do them fast [...] PhDs could always do it better, but are they there? Do they have time? The answer is almost always no. You can either do good or you can do nothing. Better is there, but you’re not getting better. They’re [PhDs] busy, working on the big problems.

Moreover, individuals without high technical expertise benefited from proper guidance, no-code solutions, and data visualization to help situate themselves within the analysis process, given that many “steps in (data science) workflow contain lots of details that are hidden” (P03). It was surprising that visualization was not more widely mentioned to steer the model authoring processes even though there exist a number of visualization systems to help individuals do so [7, 8, 53]. One possibility is that individuals with high technical expertise are constructing novel and bespoke models that existing data visualization tools cannot easily support. Instead, these technical experts might benefit from highly customized visualizations for certain classes of models, such as those generated by TensorBoard [49]. In contrast, individuals with lower technical expertise lack the background knowledge to orient themselves and effectively interact with visualizations exposing the mathematical underpinnings of machine learning models. They rely on the AutoML systems to make model decisions, and it may not be easy for them to correct and refine these models. While participants did raise concerns that AutoML-derived models and results were a ‘black box’, it also appears that technical acumen in statistics and computer science was perceived as necessary and possibly sufficient to ‘open up the black box’. Echoing concerns we summarized in Section 4.1.3, participants saw AutoML as potentially contributing to an existing “knowledge gap [that is] becoming wider” (P19), because the availability of AutoML meant that individuals with lower technical expertise could harness the power of machine learning without having to seriously engage with its technical and nuanced underpinnings.
Our interpretation of these concerns was that trust in the individual conducting the data analysis was as important as trust in the AutoML process itself. Moreover, collaboratively sharing knowledge or including human oversight may mitigate some of these concerns and could be achieved through visualization of data science processes. The differences in data science roles and the extension of data science work to individuals without formal training in statistics and computer science have several implications for the design of AutoML and visualization tools. While AutoML tools reduce or even eliminate the need to write code, it becomes important to consider what kinds of guardrails might need to be put in place. We believe that data visualization tools are an important component of such guardrails, but that we require a finer-grained understanding of data workers to design such tools effectively.
Participants used AutoML technologies to rapidly prototype viable solutions in both data preparation and analysis. The need for rapid prototyping stems from the challenges of generalizing AutoML to a variety of problems, which requires manual effort. P21 acknowledged that “AutoML is really hard, and I think we have so many operations with such nuance that we actually most of the time really... just want to be doing simple stuff correctly, rather than adding additional layers of complication.” P24 was much more explicit in stating that “every customer is different [...] [but] AutoML is supposed to be a generalized framework. So, that is a problem”. The challenges of AutoML to generalize to a variety of problems are known, especially when it concerns fixed-structure computational pipelines (which are the most common implementation of AutoML) [54]. Despite this lack of generalizability, participants found that they could leverage AutoML to rapidly prototype data science solutions. P23 offered a description of such a use case:

    I talk to clients daily. If I could get ML done just real quick, as a prototype into what we could build (in depth), that would be super helpful. [...] Can you get me 50% of the way to answering some question quickly, that would benefit me

P21 reported using automatically generated results to start a conversation with others ahead of making serious personnel or infrastructure investments. They shared that “when starting out with a client, we’ll run the default model and we’ll say, ‘Hey, here are some of the topics, some of the interesting trends that are coming out’ ” and then use the client’s reaction to further refine the default model or craft a different solution altogether. As we previously reported, P12 and P29 also make the case for ‘failing fast’ to discover issues in the data or analysis without expensive upfront investment in fully developing an automated data science pipeline.
This rapid and iterative use of AutoML to drive different data conversations evokes a complex picture of a data worker operating within multiple loops of data science and organizational processes. This prototyping scenario offers an evocative example of how the limitations of AutoML technology can be beneficially leveraged through human-in-the-loop interactions. We argue that with adequate guardrails in place, AutoML systems may also be able to further support the process of surfacing potentially more complex issues of bias.
The majority of machine learning models do not advance to production environments where they are applied to real data. Those that do are often required to go through a set of governed processes before they are deployed and are constantly monitored once they are out in the wild. These governed processes vary, as P01 described:

    [A] Governed workflow [is important]. Looking at what all teams are doing - are there divergences around governance, etc. e.g. models with a financial impact have a very stringent governance process.

The volume and variety of both data and models makes it challenging to monitor and govern AutoML models deployed in production. P03 reported that their practice was to “err on the side of letting people use tools [they preferred]” and to “monitor what tools are being used”. They emphasized that vigilance was important to ensure mistakes do not “clog up the server with bad content”, which might happen when pushing a model generated by AutoML into production without adequately vetting it. These problems exist for all software code in general, but may be further exacerbated by the novelty and complexity of AutoML. Moreover, the amount of data produced by automation can make it overwhelming to effectively govern models that need to be continually validated, and have a process that employs someone to “look for drift, look to increase accuracy and effectiveness of these models over time” (P07). Larger organizations working under enforceable regulatory constraints struggle to find the right balance between integrating a potentially valuable new technology like AutoML while conforming to these constraints:

    [There are] areas where critical models are developed that will likely have very strict controls, often imposed by a regulator [...] if you have no clue what that [model result] means, you are on pretty thin ice (P03)

As with preparation, the processes of governing deployed models still require considerable human effort.
Moreover, we believe that adding some sort of automation to these processes is desirable in order to reap the efficiency benefits of AutoML. Dashboards are often used to monitor changes in data [44], but participants did not report using dashboards for AutoML work even though they may already use dashboards for other types of work. We hypothesize that this relates to the tooling environment, and that monitoring and governance of AutoML systems require more specialized dashboards that are not well supported by existing tools. There may be fruitful work here for visualization and HCI research to improve governance processes through better monitoring and, at the least, to help data workers triage governance violations. Improved awareness and consideration for governance throughout the visualization design process can also inform the implementation of guardrails for AutoML throughout data science work.
Human oversight of automation is critical to detecting when something new or unexpected has happened, identifying the source of what has happened, and implementing appropriate corrective actions as needed. While there may be some ability to automatically detect anomalies, and thus make governed workflows easier to monitor, participants expressed doubt that such an approach would work in practice. P12 stated that if “the forecast is clearly wrong a human can detect this”, whereas it is harder for the AutoML tool to do so. Still, other participants articulated the limited abilities of analysts to intervene appropriately, suggesting that some analysts “wouldn’t necessarily know what to do next...[whereas]...a data scientist might know what to do next - for improving forecasts, for analyzing how good it might be.” P26 succinctly summarized that as “you’re only as good as what you debug”. These observations align with a recurrent theme in our findings that trust between the underlying technology and the data science teams is critical for the wider adoption of AutoML. Moreover, these observations expose the brittleness of AutoML technology and its reliance on iterative loops of correction and refinement with humans. In the next section, we also emphasize that AutoML loops are not closed systems; rather, these are loops that interact with many other loops of business processes and pre-existing modes of human collaboration. Thus, ‘debugging’ not only requires technical expertise that spans the preparation to deployment processes, but includes sufficient domain expertise to recognize and account for other ‘loops’ that interact with and beyond data science.
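The “look for drift” monitoring task P07 described can be illustrated with a deliberately simple check. This is a hypothetical sketch, not any participant's system: it flags when a live metric window shifts away from a deployment-time reference window, leaving the decision about what to do next to a human, in keeping with the oversight participants called for.

```python
# Hypothetical drift check for a deployed model: flag a live window whose
# mean (e.g., of prediction error) drifts too far from the reference
# window recorded at deployment. The alert asks for human review; it does
# not take corrective action itself.

from statistics import mean, stdev

def drift_alert(reference, live, threshold=3.0):
    """Return True when the live window's mean is more than `threshold`
    standard errors from the reference mean (a crude z-test)."""
    stderr = stdev(reference) / (len(reference) ** 0.5)
    return abs(mean(live) - mean(reference)) > threshold * stderr

if __name__ == "__main__":
    reference = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13, 0.08, 0.10]
    steady = [0.10, 0.11, 0.09, 0.12]
    shifted = [0.30, 0.32, 0.28, 0.31]
    print(drift_alert(reference, steady))   # no alert expected
    print(drift_alert(reference, shifted))  # alert: a human should review
```

Production monitoring would track many metrics and use more robust statistics, but the division of labor is the point: automation surfaces the anomaly, and a human with domain expertise decides whether and how to intervene.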
Communication and collaboration are essential data science processes, including but not limited to model automation [33, 46, 52]. AutoML systems require communication between humans and the technical system. P22 expresses one such mode of communication in which the AutoML system helps guide users in analysis by communicating “this is what it is that you’re about to do, and this is the impact it will have” and should also prompt users with respect to certain actions with “are you sure you want to do this?”. P17 also noted that more could be done to “walk the user through that [a data analysis, for example] given a kind of data to predict, here are the kinds of models and visualizations to use”. AutoML introduces a new mode of collaboration between humans as well. This new frontier can also be challenging to navigate, as P29 observed:

    So, there’s human interaction along the whole life cycle. And interpreting that human interaction is what we’re trying to get machine learning to do.

However, participants indicated that these diverse individuals must still work together to deliver actionable and safe results from automating technology. A common theme emerged around the desire to broaden data engagement across the organization, bringing more people into the data sensemaking loops. In P21’s words,

    [We] need our workforce to be more data savvy across the board. An engineer needs to be able to play with data as much as the MBA does [...] [and] giving them better tools will help with ramp up.

A big part of having teams work more effectively together is to provide more situational awareness of data science workflows and who has done which task. For instance, P06 described what would be needed to support teams working together across workflows. This support includes surfacing “notification[s] that people [are] working on the same step ...[and]... underlying metadata about how people were using the platform” (P06).
Integral to this collaboration was the ability to hand off different aspects of the data or analysis processes to different team members, likely with a different data science role:

    They can create a workflow, share it with other people, people can build off of that workflow, grab a table from that workflow and then build their own. That collaboration aspect of it was important to us. (P29)

In order to improve collaboration, some participants defined a viable solution around making workflows visual, interrogable, and extensible. Participants also highlighted the importance of communicating to individuals that were one step removed from the data analysis processes, most often communication to executives or other business leaders. In this process, visualization plays a clear role as a communication tool. A theme emerging from the analysis was that participants often framed this part of the process as more difficult than the modeling itself. This was due in part to the extra work required. For example, P19 describes the additional labor it takes, especially when the visualization tools are not well integrated into existing processes:

    There is a gap once your analysis is done on presenting the results. Nobody wants to spend more hours in another tool to build charts for explanation.

For instance, P05 described the challenges of authoring compelling visualizations and how “nice visualizations feel like a hack that the average user can’t build themselves”. The other challenge often cited was around the efficacy of visualizations as a medium of communication to drive business decisions or processes. Participants described the challenge of translating their work to business users that were not versed in modeling vernacular.
In one participant’s words, “we don’t want people to actually understand model jargon, we want to help them understand what the model is saying in business terms.” This participant relies on interactive visualizations to support the dialogues that they anticipate will happen when showing a snapshot of results to bridge the gap. Still, despite efforts to translate for business users, this participant’s team experienced a range of challenges in operationalizing their models. P26 describes how “the challenges that we saw as the data science team is...we give this [model or results] to them, but then actually, the action of implementation of this in the market sometimes doesn’t always pull through. So it’s like we did all this work, you said it was good, but now you have to take it to the last mile, actually get to marketing, creative, and content, and get it out to market.” They point to communication difficulties between data scientists and others at the organizations as exacerbating this ‘last mile’ problem, which results from “either lack of funding or sense of disbelief in prediction models and ML techniques” (P26). Validation measures may also be required for regulated industries, which can slow this process down.

Taken together, the collaborative nature of data science work imposes constraints on the design of visualization tools, which must be usable across the organization by individuals embedded within a variety of analytic, business, and governance processes.
Our examination of AutoML technology along the data science pipeline, both where it exists and where it does not, helps us to understand the current capabilities of this technology and how the technology and its surrounding ecosystem can be further developed to support data scientists and others. We see gaps for AutoML technology outside of data analysis processes that translate to unmet tooling needs in data preparation, governance, and deployment processes. These are also processes where considerable human labor is still required to make AutoML technology in data analysis viable. Automation that extends to these other processes, ideally with appropriate guardrails, could improve both the quality and speed of data science work. Moreover, the relationship between automation and data science expertise emerged as a critical consideration for what future tools should support, including the types of guardrails that should be built in. We were surprised to surface some of the tensions that existed between data science experts and data workers with different training. Emerging from this tension was one heavy-handed guardrail strategy, restricting access to AutoML technology, that many sought to implement. We believe this view has surfaced from a lack of adequate tools to support the safe creation, deployment, and governance of these models, and that there are many fertile opportunities for visualization research in this space. However, our analysis of participants’ comments reveals that existing visualization tools are falling short of their needs. Moreover, data visualization tools can have a steep learning curve, and there is little motivation to use them following intensive analysis. We underscore that it is critical to understand the diversity of teams that carry out data science work and the ways they intersect with many organizational processes.
In other words, visualization tools need to work for many humans engaged in many loops.
We now reflect on our findings and summarize the central themes that emerged from our analysis.
The general attitudes toward AutoML suggest three usage scenarios for this technology that are conditioned on the technical expertise (statistics and computer science) of the individual analyzing the data and the magnitude of consequences associated with errors. The first usage scenario is automating routine tasks, thereby reducing the coding efforts of data science teams and improving the speed of the analysis processes. A second usage scenario is the rapid exploration of potential data science solutions through low-effort prototyping. Such prototyping approaches can be used by individuals with varying degrees of technical expertise. It is possible that for individuals with high technical expertise (such as data scientists, generalists, research scientists, ML/AI engineers, and data shapers), prototyping allows them to quickly create a base framework that they further develop into novel solutions for arising technical challenges. For other individuals, prototyping enables them to have a conversation around the data with customers and other members of their organization. Prototyping also enables individuals and data science teams to fail fast and discover issues with their data and analysis before investing in considerable engineering effort. A third and final usage scenario is the use of AutoML toward democratizing the ability to create a machine learning model, empowering individuals that would not be able to build a model otherwise. In this third scenario, we argue that individuals require heavy guidance and guardrails from an AutoML system and may have very limited ability to identify errors or correct them.

The delineation of these usage scenarios is intended to guide visualization researchers as they explore opportunities to develop techniques or systems for AutoML.
Considerable human labor is still expended to prepare data, to govern and deploy a model, and to communicate the results to impacted individuals and other decision-makers. An end-to-end AutoML solution capable of addressing the full scope of such data science work does not currently exist, and as a result data workers, a group that includes individuals who are and are not data scientists, are finding ad hoc ways of bootstrapping AutoML technology into their work. In Figure 1, we outline a common set of eight steps synthesized from participants' responses describing AutoML use in enterprise settings. We further align these steps with higher-order data science processes. For data preparation and analysis, these steps were prototyping, exploring the results, and settling on a solution to implement. Should this solution reach a certain level of maturity, it is deployed into production following a verification of the solution (including compliance with regulations), where it is consistently monitored while in production. Finally, these deployed models can be used to take action through an inspection of the results that surfaces new insights for decision making. We illustrate the levels of automation [41] that we believe are desirable for future AutoML systems to support, considering the range of participant challenges and concerns this study surfaced. Importantly, the level of automation is not consistent across all data science processes. Human oversight is still required throughout data science work and is dictated by both regulatory requirements and organizational practices. Most automation likely needs to adopt a 'cruise control' mode of interaction [34], where humans can oversee and steer AutoML systems without needing to guide them at each step. Even this would be an improvement over current AutoML systems, which appear to oscillate between 'autopilot' and 'user-driven' modes.
We further illustrate the level of automation required by individuals with high expertise in computer science and/or statistics (Data Scientists, ML/AI or Data Engineers), and by those with low or evolving technical expertise in these areas (Business Analysts, Moonlighters). Individuals with high expertise can benefit from full automation, for example when speeding up routine work (Usage Scenario One) or when rapidly prototyping and exploring new solutions (Usage Scenario Two). Even in these two usage scenarios, individuals with high technical expertise still rely on considerable manual effort, but this might in fact be an appropriate use of their expertise, freeing them to focus on "bigger problems", especially if other trivially automated tasks are reliably handled by an AutoML system. Individuals with lower or evolving technical expertise require much more support and guidance and would rely much more on full automation to rapidly prototype solutions (Usage Scenario Two) or even to begin to engage in data science work more generally (Usage Scenario Three). However, while these individuals rely on AutoML systems to guide them, their domain expertise still needs to be incorporated in downstream steps. While Figure 1 is a useful illustrative summary of our findings, it needs to be further validated in future studies that assess its generalizability. We suggest how to do so in our Discussion section.
Taken together, these usage scenarios and levels of automation impose a set of constraints on the design of visualization tools that operate together with AutoML technologies. Visualization researchers need to carefully consider where and how automation is currently deployed, the diversity and expertise of data science teams, and the full breadth of data science processes. We have illustrated a set of steps and proposed, in Figure 1, the levels of automation that data workers with different levels of expertise desire. Importantly, by illustrating an end-to-end pipeline, we encourage visualization researchers to consider how changes across a workflow influence the kinds of data to be visualized and the fundamental tasks that these workflow steps support. For example, 'prototyping' may have different tasks associated with it depending on whether the analysts want to develop a new model, fail fast, or prototype a solution for a customer. The 'monitor' process in deployment could reasonably rely on high automation until the system requires human action, much like autopilot in aircraft. Alternatively, 'exploration' may require less automation if the user is expected to steer the algorithm. Without a concrete understanding of usage scenarios, data science steps, and levels of automation, researchers risk eliciting inappropriate tasks and creating visualization tools that will be dismissed because they are not well integrated into end-to-end data science workflows. Visualization researchers can reference our findings and the summary in Figure 1 as a guide to support their own task elicitation for the design and evaluation of visualization tools.
Lastly, we briefly reflect on our findings and propose modifications to the framework of data science work and workers reported in [10]. We remind the reader that this framework is described in Section 3.2 and delineates a set of higher-order and associated lower-order data science processes that we used as part of our selective coding analysis. First, we propose that Collaboration be added as a lower-order process of Communication. While collaboration was part of the original framework, there was not enough evidence to determine how it should be incorporated. This analysis suggests it belongs as a component of Communication, alongside the documentation and dissemination lower-order processes. Moreover, collaboration emphasizes the ways that individuals engage in multi-directional exchanges of knowledge and data products (data, code, models, documents), whereas dissemination refers to a more uni-directional exchange of knowledge from an individual to others. Second, we propose that governance be included as a lower-order process of deployment. While governance processes can technically encompass all of data science work, our findings point to its specific importance in managing the process of launching, monitoring, and refining machine learning models deployed into production settings. Finally, we propose a new higher-order process, Guidance, which follows communication. We assign three lower-order guidance processes based upon our analysis: human-machine guidance, human-human guidance (or pedagogy), and organizational guidance. Human-machine guidance describes the interplay between AutoML tools surfacing new data insights to humans and humans making corrections and refinements to AutoML models and results. Human-human guidance describes the collective work of building a data-savvy organization and other efforts to bridge the data science "knowledge gap"; alternatively, this could be referred to as a pedagogical process. Finally, organizational guidance refers to regulations and other organizational processes that impose constraints on the use of data, models, and the level of automation.
Visualization and HCI researchers have used enterprise studies to discover unmet needs of practitioners that have inspired new research trajectories and ultimately led to new techniques and tools. As we consider the future of AutoML in enterprise, we believe a "cruise control" level of interaction [34] (Figure 1) is the most likely to be adopted. However, we see significant barriers to implementing such a level of automation that stem from the diversity among data workers with different types of expertise, a complex tooling environment that needs to be integrated, and brittle workflows that still rely on considerable human effort. Although visualization can play a role in supporting 'cruise-control' type automation, it was not being widely used to that effect and, in some cases, was being actively removed from automated data science workflows. We believe this lack of uptake occurs because visualization tools are potentially misspecified for the tasks they need to support, which in turn stems from a poor understanding of how automation is used in data science work and where there are opportunities for human-in-the-loop interaction. Our study fills this gap by surfacing usage scenarios and illustrating automation throughout data science work, which informs the goals and tasks feeding into visualization design and evaluation.
Throughout our analysis, we found both AutoML and human-in-the-loop to be misnomers for the processes that participants were describing. First, we noted in Section 2 that AutoML is used to refer to an ever-expanding set of data science processes, from preparation to deployment, and as such is being used interchangeably with 'automating data science' (among other phrases). We argue this is limiting, as not all automation of data science needs to be in service of machine learning systems. Moreover, the notion of end-to-end AutoML obscures the human labor required for these systems to work, now and in the future, leaving inadequate support for human-machine collaboration. Echoing Wang's [46] language, we encourage researchers to augment data science with AutoML rather than automate it. This is more than a matter of semantics: the idea of augmenting data work explicitly makes space for human engagement and brings human needs to the forefront of consideration. Second, when we make explicit space for human engagement, we are encouraged to consider the diversity among data workers. As we summarize in our three usage scenarios, this type of engagement will vary depending on the goals of data workers and their level of technical expertise. Along with prior studies [42, 52], we found collaboration among data workers to be of critical importance to the success of data work. Commensurate with findings from Hong [22], we also show that trust amongst individuals engaged in data work was as important, or more so, than trust in AutoML.
Surprisingly, AutoML technology appeared to erode trust among collaborators of different technical expertise by enabling so-called "citizen data scientists" to potentially automate bad decision making. In theory, a human-in-the-loop paradigm for augmenting data science work can also be useful to understand the types of engagement between humans and machines that could ameliorate some of these trust concerns. However, here too we find that human-in-the-loop is a limiting term. An AutoML correction and refinement loop exists not only within a wider scope of data science processes but also within organizational processes. While the nomenclature of human-in-the-loop is not exclusive to a single individual interacting with AutoML, we argue that the notion of "humans-in-the-loops" more accurately captures how this technology is used within enterprise settings. We note that a limitation of our findings is that study participants were primarily, although with some exceptions, experts in data science. While several were managers who oversaw mixed teams, we nonetheless believe it is useful to follow up our findings by soliciting the views of those individuals who are not data scientists but work closely with them. As Visualization and HCI researchers continue to explore applications of technology like AutoML in data science work, we encourage them to consider the diversity of humans involved in data science work, their different needs, the varying degrees to which they benefit from AutoML technology, and the myriad organizational loops that are entangled within AutoML and data science.
Overall, we see that there are opportunities for visualization tools in data science work, especially in areas where considerable human labor already exists. We especially see that participants struggle to get an overview of data work and that this complicates their ability to effectively hand off data, models, and results within their organizations. A visual overview of data science workflows emerged as an organic solution and is a promising area of future research. But beyond this specific example, we hope that the usage scenarios we present will help researchers identify new unmet visualization needs around the use of AutoML that we did not surface here. However, the most troubling findings from our study concern the ecological validity of data visualization systems. We hypothesize that one reason visualization tools were not more widely used by participants was that they did not integrate well into existing data science tooling environments. This may be because existing visualization tools are developed as stand-alone systems into which it is difficult to import data and from which it is difficult to export results, or because existing systems do not scale well to the volumes and varieties of data that organizations collect, or even because these visualization systems are themselves too brittle to flexibly adapt to variable data science or AutoML workflows. Moreover, visualization tools may not cater well to individuals across the gradient of technical expertise, and thus may be too rudimentary for those with high technical expertise and too complex for those with lower expertise. We encourage researchers to use our findings as a guide for surfacing these threats to ecological validity early. Another fruitful area for visualization researchers is the creation of guardrails that surface potential issues with data, models, or results and alert individuals to them. The development of guardrails can help address concerns about automating bad decisions.
Our research indicates that guardrail design is contingent upon individual expertise, the context in which individuals are using AutoML, and the level of automation that is expected. Some areas, like data preparation, will require more human labor alongside tools that automate their processes. Others, like monitoring a deployed model, would rely on human labor primarily to respond to events, like the detection of model drift. Guardrails in both scenarios can help analysts contextualize and triage problems as they arise, but the design of these guardrails will differ between the two scenarios. Well-designed guardrails may also increase trust and collaboration not only between data workers and automated processes, but also among data science teams. While prior research has suggested design considerations [2] and potential analytic pitfalls across visual analytics processes [36], research is needed to bring these together to explore dynamic and adaptive visualization guardrails that are appropriate for an individual's current analytic context.
The lack of existing studies on AutoML use in enterprise settings was the motivating factor for carrying out this research. Our findings support prior research and shed new light on the challenges and uses of AutoML in enterprise settings. However, we also found that participants had quite different experiences in their use and expectations of AutoML. As a result, our findings were simultaneously rich in capturing the diversity of experiences and sparse in that some of them relied on a handful of observations. To produce a cohesive analysis of these experiences, we used an existing framework for data science work and workers as a scaffold. This sparseness of data and reliance on a scaffold is the primary limitation of our findings. Further work is needed to validate the generalizability of our findings, but this may be difficult due to the novelty of AutoML technology itself. One fruitful area of future work is to take the key insights from our research as constructs around which to develop a survey instrument that probes into AutoML use more specifically than our current interview study. We did not take this approach here because we felt we needed additional information on AutoML use in enterprise settings and beyond. A future survey instrument could also be used within a larger mixed-methods approach, such as a sequential explanatory design, which uses the survey results, in lieu of the framework we use here, as a more data-driven basis to inform a subsequent qualitative analysis.
Automating data science work through AutoML technology will continue to be commonplace in enterprise settings, especially at large organizations that work with large volumes of data. We identified three usage scenarios for AutoML that we argue are routine in current enterprise environments. These are automating routine work, rapidly prototyping a potential solution, and democratizing access to machine learning technology and data science work more generally. Moreover, we surface the complex handoff of data work
between AutoML systems and data workers, as well as between data workers having different levels of technical expertise. Indeed, AutoML systems still rely on considerable human effort to be effective, and even as this technology improves, human oversight will still be required to ensure it is safe and effective. While data visualization can play an important role together with AutoML, we find that it is used infrequently and is actively being minimized in data science work. We see our findings as having important implications for recasting the role of visualization in conjunction with AutoML and data science more generally.
ACKNOWLEDGMENTS
The authors wish to acknowledge and thank the study participants for sharing their insights with us. We also wish to acknowledge members of the Tableau Research, User Research, and Tableau CRM teams for their feedback on our study and findings.
REFERENCES
[1] Sarah Alspaugh, Nava Zokaei, Andrew Liu, Cindy Jin, and Marti A. Hearst. 2019. Futzing and Moseying: Interviews with Professional Data Analysts on Exploration Practices. IEEE Transactions on Visualization and Computer Graphics 25, 1 (2019), 22–31. https://doi.org/10.1109/TVCG.2018.2865040
[2] Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N. Bennett, Kori Inkpen, Jaime Teevan, Ruth Kikin-Gil, and Eric Horvitz. 2019. Guidelines for Human-AI Interaction. In Proc CHI'19. 1–13. https://doi.org/10.1145/3290605.3300233
[3] David M. Blei and Padhraic Smyth. 2017. Science and Data Science. Proceedings of the National Academy of Sciences.
[4] International Journal of Qualitative Methods 5, 3 (2006), 12–23. https://doi.org/10.1177/160940690600500304
[5] Anthony Bryant and Kathy Charmaz. 2007. The SAGE Handbook of Grounded Theory. Sage Publications, Los Angeles, Calif.
[6] Longbing Cao. 2017. Data Science: A Comprehensive Overview. Comput. Surveys 50, 3 (2017), 1–42. https://doi.org/10.1145/3076253
[7] Angelos Chatzimparmpas, Rafael M. Martins, Ilir Jusufi, and Andreas Kerren. 2020. A Survey of Surveys on the Use of Visualization for Interpreting Machine Learning Models. Information Visualization 19, 3 (2020), 207–233. https://doi.org/10.1177/1473871620904671
[8] Angelos Chatzimparmpas, Rafael M. Martins, Ilir Jusufi, Kostiantyn Kucher, Fabrice Rossi, and Andreas Kerren. 2020. The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations. Computer Graphics Forum 39, 3 (2020), 713–756. https://doi.org/10.1111/cgf.14034
[9] John W. Creswell and Cheryl N. Poth. 2018. Qualitative Inquiry & Research Design: Choosing Among Five Approaches (fourth ed.). Sage Publications, Los Angeles, Calif.
[10] Anamaria Crisan, Brittany Fiore-Gartland, and Melanie Tory. 2020. Passing the Data Baton: A Retrospective Analysis on Data Science Work and Workers. IEEE Transactions on Visualization and Computer Graphics (2020). https://doi.org/10.1109/TVCG.2020.3030340
[11] Anamaria Crisan and Tamara Munzner. 2019. Uncovering Data Landscapes through Data Reconnaissance and Task Wrangling. 46–50. https://doi.org/10.1109/VISUAL.2019.8933542
[12] David Donoho. 2017. 50 Years of Data Science. Journal of Computational and Graphical Statistics 26, 4 (2017), 745–766. https://doi.org/10.1080/10618600.2017.1384734
[13] Jaimie Drozdal, Justin Weisz, Dakuo Wang, Gaurav Dass, Bingsheng Yao, Changruo Zhao, Michael Muller, Lin Ju, and Hui Su. 2020. Trust in AutoML: Exploring Information Needs for Establishing Trust in Automated Machine Learning Systems. In Proc IUI'20. 297–307. https://doi.org/10.1145/3377325.3377501
[14] Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? Journal of Machine Learning Research 15, 90 (2014), 3133–3181. http://jmlr.org/papers/v15/delgado14a.html
[15] Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, and Frank Hutter. 2020. Auto-Sklearn 2.0: The Next Generation. https://arxiv.org/abs/2007.04074
[16] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, and Frank Hutter. 2015. Efficient and Robust Automated Machine Learning. In Proc NeurIPS'15. 2755–2763. https://doi.org/10.5555/2969442.2969547
[17] Yolanda Gil, James Honaker, Shikhar Gupta, Yibo Ma, Vito D'Orazio, Daniel Garijo, Shruti Gadewar, Qifan Yang, and Neda Jahanshad. 2019. Towards Human-Guided Machine Learning. In Proc IUI'19. 614–624. https://doi.org/10.1145/3301275.3302324
[18] Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, and D. Sculley. 2017. Google Vizier: A Service for Black-Box Optimization. In Proc KDD'17. 1487–1495. https://doi.org/10.1145/3097983.3098043
[19] Mary L. Gray and Siddharth Suri. 2019. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Houghton Mifflin Harcourt.
[20] Jeffrey Heer. 2019. Agency Plus Automation: Designing Artificial Intelligence into Interactive Systems. Proceedings of the National Academy of Sciences.
[21] Proc AAAI HCOMP'2020 8, 1, 63–72. https://ojs.aaai.org/index.php/HCOMP/article/view/7464
[22] Sungsoo Ray Hong, Jessica Hullman, and Enrico Bertini. 2020. Human Factors in Model Interpretability: Industry Practices, Challenges, and Needs. Proc CSCW'2020, Article 068 (2020), 26 pages. https://doi.org/10.1145/3392878
[23] Eric Horvitz. 1999. Principles of Mixed-Initiative User Interfaces. In Proc CHI'99.
[…] Proc CHI'11. 3363–3372. https://doi.org/10.1145/1978942.1979444
[31] Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. 2012. Enterprise Data Analysis and Visualization: An Interview Study. IEEE Transactions on Visualization and Computer Graphics 18, 12 (2012), 2917–2926. https://doi.org/10.1109/TVCG.2012.219
[32] Sean Kandel, Ravi Parikh, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. 2012. Profiler: Integrated Statistical Analysis and Visualization for Data Quality Assessment. In Proc AVI'12. 547–554. https://doi.org/10.1145/2254556.2254659
[33] Miryung Kim, Thomas Zimmermann, Robert DeLine, and Andrew Begel. 2018. Data Scientists in Software Teams: State of the Art and Challenges. IEEE Transactions on Software Engineering 44, 11 (2018), 1024–1038. https://doi.org/10.1109/TSE.2017.2754374
[34] D. Lee, Stephen Macke, Doris Xin, Angela Lee, Silu Huang, and Aditya G. Parameswaran. 2019. A Human-in-the-loop Perspective on AutoML: Milestones and the Road Ahead. IEEE Data Eng. Bull. 42, 2 (2019), 59–70. http://sites.computer.org/debull/A19june/p59.pdf
[35] Q. Vera Liao, Daniel Gruen, and Sarah Miller. 2020. Questioning the AI: Informing Design Practices for Explainable AI User Experiences. Proc CHI'20 (2020), 1–15. https://doi.org/10.1145/3313831.3376590
[36] Andrew McNutt, Gordon Kindlmann, and Michael Correll. 2020. Surfacing Visualization Mirages. Proc CHI'20, 1–16. https://doi.org/10.1145/3313831.3376420
[37] Judith S. Olson and Wendy A. Kellogg. 2014. Springer New York. https://doi.org/10.1007/978-1-4939-0378-8
[38] Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore. 2016. Evaluation of a Tree-Based Pipeline Optimization Tool for Automating Data Science. In Proc GECCO'16. 485–492. https://doi.org/10.1145/2908812.2908918
[39] Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, and Jason H. Moore. 2016. Evaluation of a Tree-Based Pipeline Optimization Tool for Automating Data Science. In Proc GECCO'16. 485–492. https://doi.org/10.1145/2908812.2908918
[40] Jorge Piazentin Ono, Sonia Castelo, Roque Lopez, Enrico Bertini, Juliana Freire, and Claudio Silva. 2020. PipelineProfiler: A Visual Analytics Tool for the Exploration of AutoML Pipelines. IEEE Transactions on Visualization and Computer Graphics (2020). https://doi.org/10.1109/TVCG.2020.3030361
[41] Raja Parasuraman, Thomas B. Sheridan, and Christopher D. Wickens. 2000. A Model for Types and Levels of Human Interaction with Automation. IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans 30, 3 (2000), 286–297. https://doi.org/10.1109/3468.844354
[42] Samir Passi and Steven J. Jackson. 2018. Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data Science Projects. Proc CSCW'2018, Article 136 (2018), 28 pages. https://doi.org/10.1145/3274405
[43] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, 85 (2011), 2825–2830. http://jmlr.org/papers/v12/pedregosa11a.html
[44] Alper Sarikaya, Michael Correll, Lyn Bartram, Melanie Tory, and Danyel Fisher. 2019. What Do We Talk About When We Talk About Dashboards? IEEE Transactions on Visualization and Computer Graphics.
[…] Proc. CSCW'19, 24. https://doi.org/10.1145/3359313
[47] Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J. Smith, Kalyan Veeramachaneni, and Huamin Qu. 2019. ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning. In Proc CHI'19. 1–12. https://doi.org/10.1145/3290605.3300911
[48] Daniel Karl I. Weidele, Justin D. Weisz, Erick Oduor, Michael Muller, Josh Andres, Alexander Gray, and Dakuo Wang. 2020. AutoAIViz: Opening the Blackbox of Automated Artificial Intelligence with Conditional Parallel Coordinates. In Proc IUI'20. 308–312. https://doi.org/10.1145/3377325.3377538
[49] Kanit Wongsuphasawat, Daniel Smilkov, James Wexler, Jimbo Wilson, Dandelion Mané, Doug Fritz, Dilip Krishnan, Fernanda B. Viégas, and Martin Wattenberg. 2018. Visualizing Dataflow Graphs of Deep Learning Models in TensorFlow. IEEE Transactions on Visualization and Computer Graphics 24, 1 (2018), 1–12. https://doi.org/10.1109/TVCG.2017.2744878
[50] Quanming Yao, Mengshuo Wang, Hugo Jair Escalante, Isabelle Guyon, Yi-Qi Hu, Yu-Feng Li, Wei-Wei Tu, Qiang Yang, and Yang Yu. 2018. Taking Human out of Learning Applications: A Survey on Automated Machine Learning. http://arxiv.org/abs/1810.13306
[51] Jun Yuan, Changjian Chen, Weikai Yang, Mengchen Liu, Jiazhi Xia, and Shixia Liu. 2020. A Survey of Visual Analytics Techniques for Machine Learning. Computational Visual Media (2020). https://doi.org/10.1007/s41095-020-0191-7
[52] Amy X. Zhang, Michael Muller, and Dakuo Wang. 2020. How Do Data Science Workers Collaborate? Roles, Workflows, and Tools. Proc CSCW'2020, Article 022 (May 2020), 23 pages. https://doi.org/10.1145/3392826
[53] Yunfeng Zhang, Q. Vera Liao, and Rachel K. E. Bellamy. 2020. Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making.