Falx: Synthesis-Powered Visualization Authoring

Abstract

Modern visualization tools aim to allow data analysts to easily create exploratory visualizations. When the input data layout conforms to the visualization design, users can easily specify visualizations by mapping data columns to visual channels of the design. However, when there is a mismatch between data layout and the design, users need to spend significant effort on data transformation. We propose Falx, a synthesis-powered visualization tool that allows users to specify visualizations in a similarly simple way but without needing to worry about data layout. In Falx, users specify visualizations using examples of how concrete values in the input are mapped to visual channels, and Falx automatically infers the visualization specification and transforms the data to match the design. In a study with 33 data analysts on four visualization tasks involving data transformation, we found that users can effectively adopt Falx to create visualizations they otherwise cannot implement.


Chenglong Wang, University of Washington

Yu Feng, University of California, Santa Barbara

Rastislav Bodik, University of Washington

Isil Dillig, University of Texas at Austin

Alvin Cheung, University of California, Berkeley

Amy J. Ko, University of Washington


ACM Reference Format:

Chenglong Wang, Yu Feng, Rastislav Bodik, Isil Dillig, Alvin Cheung, and Amy J. Ko. 2021. Falx: Synthesis-Powered Visualization Authoring. In CHI Conference on Human Factors in Computing Systems (CHI ’21), May 8–13, 2021, Yokohama, Japan. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/3411764.3445249

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

CHI ’21, May 8–13, 2021, Yokohama, Japan. © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM. ACM ISBN 978-1-4503-8096-6/21/05...$15.00. https://doi.org/10.1145/3411764.3445249

Modern visualization authoring tools, such as declarative visualization grammars like ggplot2 [50], Vega-Lite [37] and interactive visualization tools like Tableau [42] and Voyager [54], are built to reduce data analysts’ efforts in authoring visualizations in exploratory data analysis. At the heart of these tools, visualizations are specified using grammars of graphics [52], where every visualization can be succinctly specified using the following three components:

• A graphical mark that defines the geometric objects used to visualize the data (e.g., line, scatter plots, bars),
• A set of visual encodings that map data variables to visual channels (e.g., 𝑥, 𝑦-positions of points),
• A set of parameters that decide visualization details: coordinate system, scales of axes, legends and titles.

In practice, users only need to specify the mark and the visual encodings in order to create the visualization because many tools use a rule-based engine to automatically fill in parameters for visualization details (often referred to as “smart defaults”) unless the user wants further customization. The abstraction of graphical marks, visual encoding channels, and adoption of smart default parameters open an expressive design space for data analysts that allow them to rapidly construct visualizations for exploratory analysis through simple specifications. For example, to visualize the dataset in Figure 1 with three columns Date, Temp (for temperature) and Type as a scatter plot, the user can choose the graphical mark “point” with encodings {𝑥 ↦ Date, 𝑦 ↦ Temp, color ↦ Type}. The visualization tool then creates one point for each row in the input data, by mapping its values in columns Date and Temp to 𝑥, 𝑦-positions and assigning a color to each point based on its value in column Type. Here, the tool uses the default linear scale for 𝑥, 𝑦-axis and categorical scale for color, which are default parameters that the user does not need to specify explicitly. The final visualization is rendered in Figure 1 (right).

Footnote: In our paper, we refer to “expressive visualizations” as the set of visualizations that are supported by tools powered by grammars of graphics (e.g., visualizations in Tableau, Vega-Lite, ggplot2) as opposed to more general customized visualizations.

Figure 1: An example dataset and its scatter plot visualization that maps Date to 𝑥, Temp to 𝑦 and Type to color.

In fact, the simplicity of these high-level visualization grammars is grounded in their abstract data model. These grammars expect that the input table is organized in a layout that matches the visualization design [51]: (1) each relation forms a row in the input data and corresponds to exactly one geometric object in the visualization, and (2) each data variable forms a column that can be mapped to a visual channel. In practice, however, the mismatch between the data layout and the visualization design is common due to the following reasons [9, 51]:

• Tables exported from different sources (e.g., database, analysis tool, different team member) may have different layouts and they may not directly match the visualization design.
• Different analysis tasks require different visualization designs, and changes in the design can lead to different expected data layout.
• The data may need aggregation (e.g., average, count, cumulative sum) or additional computation to derive new values prior to visualization.

In all of these cases, data analysts cannot directly visualize the data with a simple specification. They have to conceptualize the expected data layout and utilize data transformation tools (e.g., tidyverse [51], Trifacta [17]) to transform the data to match the visualization design. These additional tasks create a barrier for data visualizations and greatly increase the effort required for exploratory analysis [7, 9, 18, 53]. For example, if the data analyst decides to change the visualization in Figure 1 to a bar chart with floating bars that show the temperature range during each day (Figure 2 right), the original data layout will no longer match the new design since the new design expects three data columns (date, lowest temperature, highest temperature) that map to 𝑥, 𝑦_min and 𝑦_max. As a result, the data analyst needs to transpose the table in Figure 1 using a pivot operation (to collect key-value pairs in the Type and Temp columns into new columns) before mapping data columns to visual channels (Figure 2 right).

Figure 2: A different visualization design requires transformation of the original input data. (The input table with columns Date, Temp, Type is pivoted into a table with columns Date, Low, High, which are mapped as Date → 𝑥, Low → 𝑦_min, High → 𝑦_max.)

We propose Falx, a synthesis-based visualization authoring tool to address the challenges outlined above. Falx builds on recent advances in program synthesis: many program synthesis tools (e.g., FlashFill [10], Wrex [6]) have been developed with the promise of automating challenging or repetitive programming tasks for end users by synthesizing programs from user demonstrations. In our design, instead of asking analysts to transform data and specify visualization manually, Falx asks analysts to demonstrate the visualization task using examples of mappings from concrete values in the input data (as opposed to table columns) to visual channels. Using these examples, Falx automatically synthesizes the programs to transform and visualize the full data, such that resulting visualizations are consistent with the examples (i.e., all example mappings are contained within the visualization). For example, for the data in Figure 2, the user can create an example bar to demonstrate the task and let Falx create the desired visualization for the full dataset (Figure 2 right). Sometimes, the examples can be ambiguous to Falx, and Falx may generate multiple visualizations that match the example but not necessarily the user intent. In such cases, analysts can interact with an exploration panel to inspect the synthesized visualizations and select the desired one. After that, analysts can further fine-tune details of the desired visualization through a post-processing panel.

Footnote: Demo available at https://falx.cs.washington.edu/

Falx’s design has many potential advantages. First, users of Falx specify visualizations by mapping values to visual channels: this approach inherits the simplicity from grammars of graphics but provides more expressiveness since users can use the same examples to specify visualization ideas for inputs with different layouts.
Second, Falx offloads the data transformation task to the program synthesizer so that users no longer need to conceptualize the expected data layout or transform the data. Finally, while program synthesizers by design can generate multiple results, users can effectively select and validate the desired visualization from synthesized candidates using the exploration panel in Falx. In general, rather than having to construct a visualization, data analysts demonstrate the task using examples and then select the desired visualization from a candidate pool, which shifts from the challenges of expression to the ease of recognition. With these designs, Falx aims to eliminate users’ prerequisites in data transformation and enable data analysts to rapidly author expressive visualizations.

We conducted a user study with 33 participants to test these design hypotheses, studying how users adapt to the new visualization process. Our results show that users of Falx, regardless of previous experience in visualization, can efficiently learn and solve challenging visualization tasks that cannot be easily solved using the baseline tool ggplot2. However, we also discovered challenges that users face when using the tool and strategies they adopt to solve the problems. We believe these discoveries lead to future opportunities in adopting synthesis-based visualization tools in practice and unveil other potential designs that can further improve the usability of such tools.

We first go through an example to illustrate the anticipated user experience in Falx (Section 2.2) compared to R (Section 2.1). In this example, a data analyst has the following dataset with New York and San Francisco temperature records from 2011-10-01 to 2012-09-30.

Date         New York   San Francisco
2011-10-01   63.4       62.7
2011-10-05   64.2       58.7
...          ...        ...
2012-09-25   63.2       53.3
2012-09-30   62.3       55.1

The analyst wants to create a visualization to compare the temperature in the two cities. First, the visualization should contain two lines to show temperature trends in the two cities; these two lines should be distinguished by color. Second, a bar chart should be layered on top of the line chart to show the temperature difference between the two cities for each date. Each bar should start from the New York temperature and end at the corresponding San Francisco temperature, and the color gradient of the bar should indicate the temperature difference between the two cities on that day. The desired visualization is shown in Figure 3.

Figure 3: A visualization that compares New York and San Francisco temperatures between 2011-10-01 and 2012-09-30.

We first illustrate how a data analyst, Eunice, would create this visualization in R using tidyverse [51] and ggplot2 [50], two widely-used libraries for data transformation and data visualization.

After loading the data into a data frame in R, Eunice decides to first create the line chart that shows temperature trends of the two cities. To do so, Eunice chooses the function geom_line from the ggplot2 library. In order to create lines with different colors for different categories, Eunice needs to supply four data variables to the geom_line function: two variables for specifying x and y positions, one for colors of the line, and the last one for groups of lines (i.e., which points belong to the same line). Since the input data does not have these variables, Eunice needs to use the tidyverse library to transform the input data. To do so, Eunice first conceptualizes the desired data layout: the data should have 3 fields: date (for 𝑥-axis), temperature (for 𝑦-axis), and city name (for color and group). Eunice recalls a function pivot_longer in tidyverse, which supports pivoting the table from a “wide” to a “long” format by collecting column names and values in the column as key-value pairs in the body content. Specifically, Eunice writes the following code to transform the data, which yields the data shown below it, matching Eunice’s expectation.

    library(tidyverse)  # attaches tidyr (pivot_longer), dplyr (mutate) and ggplot2

    # Collect the two city columns into (City, Temperature) key-value pairs.
    df1 <- pivot_longer(data = df,
                        cols = c("New York", "San Francisco"),
                        names_to = "City", values_to = "Temperature")

    Date         City            Temperature
    2011-10-01   New York        63.4
    2011-10-01   San Francisco   62.7
    ...          ...             ...
    2012-09-30   San Francisco   55.1

After data transformation, Eunice specifies the visualization using the following script. The script maps Date to 𝑥-axis, Temperature to 𝑦-axis, and City to both color and group. It generates the visualization in Figure 4a.

    # One line per city: City drives both the color and the grouping.
    plot1 <- ggplot(data = df1) +
      geom_line(aes(x = `Date`, y = `Temperature`,
                    color = `City`, group = `City`))

Eunice then proceeds to create bars on top of the first layer to visualize the temperature difference. Eunice first finds the function geom_rect from the library that supports floating bars. To visualize temperature difference, Eunice needs to specify positions of bars by mapping Date to 𝑥_min and 𝑥_max properties and mapping temperatures of the two cities to 𝑦_min and 𝑦_max; she also needs to map the temperature difference between the two cities to color to specify bar colors. Since the original data does not contain a column for temperature difference, Eunice uses the mutate function from tidyverse to transform the data. Using the following script, Eunice successfully creates the visualization in Figure 4b.

    # Derive the per-day temperature difference, then draw one floating bar per date.
    df2 <- mutate(df, Diff = `New York` - `San Francisco`)
    plot2 <- ggplot(df2) +
      geom_rect(aes(xmin = `Date`, xmax = `Date`,
                    ymin = `New York`, ymax = `San Francisco`,
                    fill = `Diff`))

Finally, Eunice restructures the code to combine the two layers together using a concatenation operator. She also fine-tunes some parameters in ggplot2 to improve visualization aesthetics (e.g., modify titles of the axes and change line chart to a step chart), which generates the visualization that matches her design in Figure 3.

Figure 4: Two visualizations created in R that compare New York and San Francisco temperatures. (a) A line chart that shows temperature trends. (b) A bar chart that visualizes temperature difference.
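To make the two transformations in the R walkthrough concrete, here is a minimal, dependency-free Python sketch of what a wide-to-long pivot and a derived column do to the first two rows of the temperature table. The function `pivot_longer` below is a hypothetical re-implementation written for illustration; it only mirrors the behaviour described in the text and is not the tidyverse function itself.

```python
def pivot_longer(rows, cols, names_to, values_to):
    """Collect the columns in `cols` into (name, value) pairs, one output row per pair."""
    long_rows = []
    for row in rows:
        # Columns that are not pivoted are copied into every output row.
        kept = {k: v for k, v in row.items() if k not in cols}
        for col in cols:
            long_rows.append({**kept, names_to: col, values_to: row[col]})
    return long_rows


# First two rows of the temperature table from the running example.
wide = [
    {"Date": "2011-10-01", "New York": 63.4, "San Francisco": 62.7},
    {"Date": "2011-10-05", "New York": 64.2, "San Francisco": 58.7},
]

# Layout needed by the line chart: one row per (date, city).
long = pivot_longer(wide, ["New York", "San Francisco"], "City", "Temperature")

# Layout needed by the bar chart: the wide table plus a derived Diff column.
# Rounding to one decimal avoids floating-point noise in the subtraction.
with_diff = [
    {**r, "Diff": round(r["New York"] - r["San Francisco"], 1)} for r in wide
]

for r in long:
    print(r)
for r in with_diff:
    print(r)
```

The two layers need two different layouts derived from the same input, which is the burden the walkthrough attributes to the analyst.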


Since Eunice is an experienced data analyst, she manages to go through these data transformation and visualization steps and eventually generates the desired visualization. However, a less experienced data analyst, Amelia, finds the visualization task challenging.

• First, Amelia is not familiar with the ggplot2 library, so she struggles to identify the right functions to use. For example, it is difficult for her to distinguish between geom_path and geom_line, and between geom_bar and geom_rect. She is also unfamiliar with how to compose multi-layered visualizations.
• Second, due to her lack of experience with ggplot2, she finds it difficult to conceptualize the expected input layout because different functions and tasks require different data layouts.
• Finally, due to her lack of experience with tidyverse, she needs to spend significantly more time finding the right operators and implementing the desired transformation.

Now we show how Amelia, a less experienced data analyst, uses Falx (Figure 5) to create the same visualization.

First, Amelia uploads the input data to Falx’s input panel (Figure 5) and examines the input data displayed in a tabular view. Amelia decides to first visualize temperature trends of the two cities using a line chart. Amelia goes to the demonstration panel to demonstrate how the first two data points of New York temperatures will be visualized. To do so, Amelia first clicks the “+” icon in the interface and selects a line element (Figure 6), and Falx pops up an editor panel for Amelia to specify properties of this line element. Amelia clicks on values in the input table and copies the values to specify properties of the line element as follows (Figure 6):

• The line segment starts at the point with 𝑥 = 2011-10-01, 𝑦 = 63.4 (New York temperature on 2011-10-01)
• The line ends at 𝑥 = 2011-10-05, 𝑦 = 64.2 (New York temperature on 2011-10-05)
• The color of the line is labeled as “New York”

After saving the edits, Falx registers the example and provides a preview that visualizes the example line segment (Figure 6) for Amelia to examine. Using this example, Amelia conveys the following visualization idea to Falx: “I want a line chart over the input data that contains the demonstrated line segment”. Amelia then presses the “Synthesize” button (in Figure 5) to ask Falx to find the desired line chart. Internally, Falx first infers the visualization specification and then runs a data transformation synthesizer to transform the input data to match the visualization specification. After approximately four seconds, Falx finds two visualizations that match the example and displays them at the bottom of the exploration panel (Figure 5). Both visualizations contain the example line segment demonstrated by Amelia but they generalize the example differently: the first visualization only visualizes New York temperatures, while the second generalizes the color dimension to other columns in the input data as well, resulting in a visualization that also contains San Francisco temperatures.

After briefly examining both candidates, Amelia finds the second visualization closer to the design in her mind, so she clicks the second visualization to enlarge it in the center view for a detailed check (Figure 5, top). In the center view, Amelia hovers on the visualization to check details like values of different points in each line. After confirming the visualization matches her design, Amelia moves on to the second layer visualization, which should display temperature differences between the two cities using a series of bars.

Next, Amelia creates an example bar to demonstrate how the temperature difference between the two cities on one date should be visualized (Figure 7 left): the bar is positioned at that date, it starts at the San Francisco temperature for that date, ends at the New York temperature, and its color shows the temperature difference between the two cities for that day.
Amelia runs the synthesizer to find visualizations that contain both the example line and the example bar. This time, after 9 seconds, Falx finds 8 candidate visualizations that match the examples (Figure 7 middle). To decide which visualization to pick, Amelia can either (1) add a second example bar to demonstrate the temperature difference of the two cities on another date to help Falx resolve the ambiguity, or (2) navigate candidates in the exploration panel to examine them. Amelia decides to use the second approach again. She first rules out some obviously incorrect visualizations (e.g., visualization 2 in Figure 7 middle), then compares similar visualizations, and finally selects the first visualization to check it in detail. After some examination, she decides it matches her design and proceeds to post-process the visualization.

The post processing panel (Figure 5) contains a GUI editor that allows Amelia to fine-tune visualization details and a program viewer for viewing and editing the synthesized program. Any changes made during the editing process are directly reflected on the center view panel (Figure 5) to provide immediate feedback. Using the post-processing panel, Amelia changes the line mark to step mark and modifies axis titles, which produces the visualization in Figure 7 right. Amelia is happy with this visualization and concludes the task. If Amelia wants to further customize the visualization (e.g., change color scheme, adjust bar spacing), she can directly edit the underlying Vega-Lite program.

In sum, Amelia creates the visualization by iterating through creating examples, exploring synthesized visualizations, and post processing.
In this process, she benefits from the following design decisions behind Falx:

• First, while two visualization layers require different data transformations, Amelia does not need to worry about this, as the transformation task is delegated to the underlying synthesizer. In fact, even if the input data comes with a different layout, Amelia can still solve the problem with the same examples.
• Second, Amelia specifies examples by choosing from a small set of visualization marks and specifying mappings from concrete data values to properties. This allows her to create visualizations without programming in the visualization grammar.
• Third, instead of asking Amelia to read synthesized programs to disambiguate synthesis results, Falx provides an exploration interface that allows Amelia to explore and examine results in the visualization space.
• Finally, Falx adopts a scalable synthesis algorithm to explore the exponential number of possible ways to transform and visualize the input data. Each synthesis run takes between 3 and 20 seconds, which makes Amelia comfortable iterating between giving examples and exploring the generated visualizations.
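The idea that a candidate visualization "contains" the demonstrated line segment, and that two candidates can both contain it while generalizing differently, can be sketched in a few lines of Python. This is an illustrative model only: the representation of a line chart as a set of segments and the helper `line_segments` are assumptions made for this sketch, not Falx's internal data structures.

```python
def line_segments(rows, x, y, color):
    """Abstractly render a line chart: one segment between consecutive points of each color group."""
    groups = {}
    for r in rows:
        groups.setdefault(r[color], []).append((r[x], r[y]))
    segments = set()
    for c, pts in groups.items():
        pts.sort()  # ISO dates sort chronologically as strings
        for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
            segments.add((x1, y1, x2, y2, c))
    return segments


# Long-format rows for the first two dates of the running example.
long_table = [
    {"Date": "2011-10-01", "City": "New York", "Temperature": 63.4},
    {"Date": "2011-10-01", "City": "San Francisco", "Temperature": 62.7},
    {"Date": "2011-10-05", "City": "New York", "Temperature": 64.2},
    {"Date": "2011-10-05", "City": "San Francisco", "Temperature": 58.7},
]

# The demonstrated segment: (x1, y1, x2, y2, color).
example = ("2011-10-01", 63.4, "2011-10-05", 64.2, "New York")

# Candidate 1 keeps only New York; candidate 2 generalizes color to every city.
ny_only = [r for r in long_table if r["City"] == "New York"]
candidate_1 = line_segments(ny_only, "Date", "Temperature", "City")
candidate_2 = line_segments(long_table, "Date", "Temperature", "City")

print(example in candidate_1, len(candidate_1))
print(example in candidate_2, len(candidate_2))
```

Both candidates are consistent with the single example, which is why the walkthrough relies on the exploration panel (or a further example) to pick between them.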

Figure 5: Falx interface has three panels: (1) Data analysts import data and create examples in the input panel. (2) Analysts explore and examine synthesized visualizations in the exploration panel. (3) Analysts edit visualization details in the post processing panel.

Figure 6: Amelia creates a line segment to demonstrate the visualization task.

In this section, we first provide a brief review of program synthesis and then discuss the design and implementation of Falx, our end-to-end synthesis tool for automating data visualization tasks.

In recent years, many program synthesis algorithms have been developed to automate challenging or repetitive tasks for end users by automatically generating programs from high-level specifications (e.g., demonstrations, input-output examples, natural language descriptions). For instance, programming-by-example (PBE) is a branch of program synthesis that aims to synthesize programs that satisfy input-output examples provided by the user; such tools have been used for string processing [11, 39], tabular data transformation [7, 46, 55], and program completion [12, 26, 32, 40, 41].

While there are different approaches to synthesize programs, one common method is to perform enumerative search over the space of programs by gradually expanding programs from a context-free grammar of some language [1, 8, 45, 55]. In general, these search techniques traverse the program space according to some cost metric and return the candidate programs that satisfy the user-provided specification. Here, the cost metric can be a model that measures simplicity of programs (e.g., based on number of expressions in the program) [8] or a statistical model that estimates the likelihood of the program being correct [2, 32]. To speed up the synthesis process, several recent methods use deduction rules to prune incorrect partial programs early in the search process [7, 8]. For instance, Morpheus [7] uses predefined axioms of table operators to detect conflicts before the entire program is generated.
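As a toy illustration of enumerative search with a size-based cost metric, the following Python sketch enumerates arithmetic expressions from a tiny grammar in order of increasing size and returns the first one consistent with a set of input-output examples. The grammar, the names `programs_of_size`, `evaluate` and `synthesize`, and the size bound are all assumptions made for this sketch; the synthesizers cited above work over far richer languages and use additional pruning.

```python
from itertools import product

# Toy grammar: expr ::= x | 1 | 2 | expr + expr | expr * expr
LEAVES = ["x", 1, 2]
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def programs_of_size(n):
    """Yield every expression tree with exactly n nodes (leaves plus operators)."""
    if n == 1:
        yield from LEAVES
        return
    # The operator takes one node; split the remaining n - 1 between its two children.
    for left_size in range(1, n - 1):
        right_size = n - 1 - left_size
        for op in OPS:
            for left, right in product(
                programs_of_size(left_size), programs_of_size(right_size)
            ):
                yield (op, left, right)


def evaluate(prog, x):
    """Evaluate an expression tree on the input value x."""
    if isinstance(prog, tuple):
        op, left, right = prog
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if prog == "x" else prog


def synthesize(examples, max_size=7):
    """Return the smallest program (by node count) that satisfies every example."""
    for size in range(1, max_size + 1):
        for prog in programs_of_size(size):
            if all(evaluate(prog, x) == y for x, y in examples):
                return prog
    return None


# Examples of the hidden function f(x) = 2x + 1.
found = synthesize([(1, 3), (2, 5), (3, 7)])
print(found)
```

Because candidates are visited smallest-first, the returned program is minimal under the node-count cost; a statistical cost model would simply change the enumeration order.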


Figure 7: Amelia’s interaction with Falx to create the second layer visualization.

The architecture of Falx is shown in Figure 8. To use Falx, a data analyst first provides an input table and creates examples to demonstrate the visualization idea. Once the analyst hits the “synthesize” button, the Falx interface sends the input and examples to the Falx server. Given the input data and an example visualization (in the form of a set of geometric objects), Falx synthesizes pairs of candidate data transformation and visualization programs such that the resulting visualization contains all geometric objects in the visualization example.

To synthesize visualizations consistent with examples from the user, Falx spawns multiple solver threads to solve the synthesis problem in parallel. In each solver thread, Falx first runs a visualization decompiler (step 1) to decompile the example visualization into a visualization program and an example table, such that applying the program on the example table yields the example visualization provided by the user. Then, Falx calls the data transformation synthesizer (step 2) to infer programs that can transform the input data to a table that contains the example table generated in step 1. Finally, for each candidate data transformation result, Falx generates a candidate visualization (step 3) by combining the transformed data with the visualization program synthesized in step 1 and compiling them to Vega-Lite or R scripts for rendering. Synthesized visualizations from all threads are collected and displayed in Falx’s exploration panel for the analyst to inspect. In what follows, we elaborate on the details of each step using the same running example in Section 2.

Step 1: Visualization Decompilation.

Internally, Falx represents visualizations as a simplified visualization grammar similar to ggplot2 and Vega-Lite. In this grammar, a visualization is defined by (1) graphical marks (line, bar, rectangle, point, area), (2) encodings that map data fields to visual channels (𝑥, 𝑦, size, color, shape, column, row), and (3) layers, which specify how basic charts are combined into compositional charts. Since Falx only uses this grammar as an intermediate language to capture visualization semantics, visualization details (e.g., scale types) are intentionally omitted. Falx goes through the following three steps to decompile a visualization.

• Falx first infers visualization layers from the user example. In particular, Falx partitions examples provided by the user into groups based on their geometric types and properties, and creates one visualization layer for each group. Each layer corresponds to a simple chart of a particular type (e.g., scatter plot, line chart).
• Then, for each layer, Falx creates one basic visualization and an example table. The example table contains the same number of columns as the number of visual channels in this layer (derived from properties of geometric objects), and the visualization is specified as encodings that map columns in the example table to visual channels.
• Finally, for each example table, Falx fills the table with values from the example geometric objects.
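The three decompilation steps above can be sketched as follows. This is a simplified illustration under several assumptions: example objects are plain dictionaries, layers are grouped by mark type only (the text also mentions geometric properties), and the channel names `y_min`/`y_max` as well as the concrete example values (taken from the first row of the temperature table, plus a made-up point) are hypothetical. It is not Falx's implementation.

```python
def decompile(examples):
    """Turn example geometric objects into (mark, encodings, example_table) layers."""
    # Step 1: partition the examples into layers, here simply by mark type.
    layers = {}
    for obj in examples:
        layers.setdefault(obj["mark"], []).append(obj["props"])

    result = []
    for mark, objs in layers.items():
        # Step 2: one placeholder column per visual channel used in this layer.
        channels = list(objs[0])
        columns = [f"C{i + 1}" for i in range(len(channels))]
        encodings = dict(zip(channels, columns))
        # Step 3: fill the example table with the values of the example objects.
        table = [tuple(o[ch] for ch in channels) for o in objs]
        result.append((mark, encodings, table))
    return result


# Hypothetical examples: one floating bar and one point.
examples = [
    {"mark": "bar",
     "props": {"x": "2011-10-01", "y_min": 62.7, "y_max": 63.4, "color": 0.7}},
    {"mark": "point",
     "props": {"x": "2011-10-01", "y": 63.4}},
]

for layer in decompile(examples):
    print(layer)
```

Each layer's example table then becomes the target of a separate data transformation synthesis problem.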

Example 3.1.

As shown in Figure 8 (step 1), given the two visual elements provided by the user, Falx infers that the desired visualization should be a multi-layer chart that is composed of a line chart in layer 1 and a bar chart in layer 2 and decompiles the two layers independently. For example, for the second layer, Falx generates a bar chart program Bar {𝑥 ↦ C1, 𝑦_min ↦ C2, 𝑦_max ↦ C3, color ↦ C4} with a one-row example table 𝑇 that holds the values of the example bar (its date, its start and end temperatures, and the temperature difference), where 𝑇 represents the desired output table that should be the result of the data transformation process. Column names C1, ..., C4 in the bar chart program correspond to names of the four columns in Table 𝑇.

Step 2: Data Transformation Synthesis.

After decompiling the examples into the visualization program and example tables 𝑇, together with the original input table 𝑇_in provided by the user, Falx reduces the visualization synthesis task into a data transformation synthesis task [7, 46, 47]. For each example table 𝑇, the data transformation synthesizer aims to synthesize a transformation program 𝑃_𝑡 that can transform the input table into a table that contains the example table, i.e., 𝑇 ⊆ 𝑃_𝑡(𝑇_in). Falx supports various types of transformation operators commonly used in the tidyverse library to handle different layouts of the input from the user (Figure 9).

Figure 8: The architecture of the Falx system. Each solver thread synthesizes visualizations that match user examples in three steps: (1) visualization decompilation, (2) data transformation synthesis, and (3) program generation.

Type          Operator       Description
Reshaping     pivot_longer   Pivot data from wide to long format
              pivot_wider    Pivot data from long to wide format
Filtering     select         Project the table on selected columns
              filter         Filter table rows with a predicate
Aggregation   group          Partition the table into groups based on values in selected columns
              summarise      For every group, aggregate values in a column with an aggregator
              cumsum         Calculate cumulative sum on a column for each group
Computation   mutate         Arithmetic computation on selected columns
              separate       String split on a column
              unite          Combine two string columns into one with string concatenation

Figure 9: Data transformation operators supported in Falx. For clarity, we omit the parameters of each operator.

The data transformation synthesizer uses an efficient algorithm to search for programs that are compositions of operators in Figure 9 satisfying the requirement 𝑇 ⊆ 𝑃_𝑡(𝑇_in). Falx starts the search process by constructing sketches of transformation programs (i.e., programs whose arguments are not filled) and then iteratively expands the search tree and fills arguments in these partial programs. To maintain efficiency in this combinatorial search process, Falx uses deduction to prune infeasible partial programs as early as possible (as used in prior work [7, 46, 47]). The deduction engine analyzes properties of partial programs using abstract interpretation [5] and prunes programs whose analysis results are inconsistent with the example output. Since each partial program corresponds to several dozen concrete programs, the deduction engine can dramatically prune the search space.

When the search algorithm encounters a concrete program (i.e., one with all arguments filled) that is consistent with the example output, Falx adds the program to the candidate pool. The search procedure terminates either when the designated search space is exhaustively visited or when the given search time budget is reached. All synthesized program candidates are sent to the post-processor to generate visualizations.
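The consistency requirement 𝑇 ⊆ 𝑃_𝑡(𝑇_in) can be illustrated with a small Python check. The text does not spell out the exact definition of table containment, so the sketch below assumes one plausible reading: the example table is contained in an output table if some ordered choice of distinct output columns makes every example row appear among the projected output rows. The function name `contains` and the concrete values (from the first rows of the temperature table) are illustrative assumptions.

```python
from itertools import permutations


def contains(output, example):
    """True if some ordered choice of output columns makes every example row appear in `output`."""
    if not example:
        return True
    width = len(example[0])
    n_cols = len(output[0]) if output else 0
    # permutations() yields nothing when width > n_cols, so a too-narrow table fails.
    for cols in permutations(range(n_cols), width):
        projected = {tuple(row[c] for c in cols) for row in output}
        if all(row in projected for row in example):
            return True
    return False


# Input table (Date, New York, San Francisco) and the result of adding a Diff column.
input_rows = [("2011-10-01", 63.4, 62.7), ("2011-10-05", 64.2, 58.7)]
mutated = [("2011-10-01", 63.4, 62.7, 0.7), ("2011-10-05", 64.2, 58.7, 5.5)]

# Example table for the bar layer: (date, start, end, difference).
example = [("2011-10-01", 62.7, 63.4, 0.7)]

print(contains(mutated, example))     # the Diff column makes the example row recoverable
print(contains(input_rows, example))  # three columns cannot cover a four-column example
```

Trying every column ordering is exponential in the table width, which is acceptable for a sketch on small example tables but would need a smarter matching strategy in practice.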

Example 3.2.

Figure 8-② shows the data transformation synthesis process for the second visualization layer (the bar chart) generated in step ①. Given the original input table $I$ (with three columns Date, SF, and NY) and the output table $T$ (with four columns C1, C2, C3, and C4) generated in the last step, Falx aims to transform $I$ into a table that contains the example table $T$. Starting from an empty program, Falx iteratively expands the unfilled arguments (represented as holes “□”) in the partial programs to traverse the search space. When Falx encounters a partial program cumsum(I, □), Falx abstractly analyzes it and concludes that it is infeasible because cumsum cannot transform an input table with three columns into an output table with four columns. Falx then expands the feasible partial programs (e.g., mutate(I, □)) and collects concrete programs that are consistent with the objective (e.g., mutate(I, Diff = NY − SF)).
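Example 3.2 can be replayed on a small table. The values below are made up for illustration (the paper does not list them), and `mutate_diff`, `cumsum_sf`, and `contains` are hypothetical helpers. Checking containment up to a one-to-one assignment of the placeholder columns C1 to C4 is our assumption about how $T \subseteq P_t(I)$ could be tested, as is the choice to let cumsum overwrite a column in place.

```python
from itertools import permutations

# Input table I as rows of (Date, SF, NY). Values are illustrative only.
I = [
    ("2020-01-01", 10, 14),
    ("2020-01-02", 12, 11),
    ("2020-01-03", 15, 19),
]


def mutate_diff(table):
    """mutate(I, Diff = NY - SF): append one computed column."""
    return [(date, sf, ny, ny - sf) for (date, sf, ny) in table]


def cumsum_sf(table):
    """cumsum on SF (assumed in-place): the column count stays at three."""
    total, out = 0, []
    for date, sf, ny in table:
        total += sf
        out.append((date, total, ny))
    return out


def contains(output, example):
    """True if every example row appears in `output` under some one-to-one
    assignment of example columns (C1, C2, ...) to output columns."""
    if not example:
        return True
    if not output:
        return False
    width_out, width_ex = len(output[0]), len(example[0])
    # permutations() yields nothing when width_ex > width_out, so a table
    # with too few columns can never contain the example.
    for cols in permutations(range(width_out), width_ex):
        projected = {tuple(row[c] for c in cols) for row in output}
        if all(row in projected for row in example):
            return True
    return False


# Example table T with placeholder columns C1..C4 (one demonstrated element).
T = [("2020-01-02", 12, 11, -1)]
```

Here `contains(mutate_diff(I), T)` holds, while `contains(cumsum_sf(I), T)` does not, because no three-column output can cover a four-column example.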

Optimization.

We made several optimizations on top of existing synthesis algorithms [7, 47] to reduce Falx’s response time. First, the major overhead in synthesis is the cost of analyzing partial programs using abstract interpretation, as it often requires running expensive operators like aggregation and pivoting on big tables. To reduce this overhead, Falx memoizes abstract interpretation results for partial programs so that they can be reused whenever possible.

Second, instead of aiming to find only one or a few candidate programs that match user inputs like prior algorithms, Falx expects to find as many different programs as possible that satisfy the examples to ensure the correct visualization is included. To ensure diverse outputs, different Falx solver threads start with different initial program sketches to search different portions of the search space in parallel. To improve responsiveness, Falx sets different timeouts for different threads to allow faster threads to respond to the user while other threads are searching for more complex transformations. In our implementation, we run two solver threads in parallel, one with a 5-second timeout and the other with a 20-second timeout. We chose these values based on our perception of how long an analyst would be willing to wait as well as the typical time Falx takes to finish traversing different parts of the search space.
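A minimal sketch of the two optimizations, under stated assumptions: `analyze` is a stand-in for the expensive abstract interpretation and is memoized with `functools.lru_cache`, and `search` stops when its time budget runs out. Representing sketches as tuples of operator names, the toy column-count rule, and all function names are hypothetical; only the 5-second and 20-second budgets come from the text.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=None)
def analyze(partial_program):
    """Stand-in for abstract interpretation of a partial program.

    Toy rule (an assumption): starting from a 3-column input, each 'mutate'
    adds one column and every other operator keeps the column count.
    """
    return 3 + sum(1 for op in partial_program if op == "mutate")


def consistent(sketch, target_columns):
    """Analyze every prefix so sketches sharing a prefix reuse cached results."""
    for i in range(1, len(sketch) + 1):
        if analyze(sketch[:i]) > target_columns:
            return False  # prune: columns only grow under the toy rule
    return analyze(sketch) == target_columns


def search(sketches, target_columns, budget_seconds):
    """Collect consistent sketches until the time budget is exhausted."""
    deadline = time.monotonic() + budget_seconds
    found = []
    for sketch in sketches:
        if time.monotonic() > deadline:
            break  # budget reached: return what has been found so far
        if consistent(sketch, target_columns):
            found.append(sketch)
    return found


def run_solvers(portions, target_columns, budgets=(5.0, 20.0)):
    """Run one solver per portion of the search space, each with its own budget."""
    with ThreadPoolExecutor(max_workers=len(portions)) as pool:
        futures = [
            pool.submit(search, portion, target_columns, budget)
            for portion, budget in zip(portions, budgets)
        ]
        return [future.result() for future in futures]
```

For simplicity, `run_solvers` waits for both solvers before returning; an interactive tool would instead show the fast solver's results as soon as they arrive. `analyze.cache_info()` reports how many analyses were served from the cache.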

Step 3: Processing Synthesized Visualizations.

As the final step in visualization synthesis, Falx generates visualizations by combining the visualization program generated in step 1 with table transformation programs generated in step 2.

Concretely, for each data transformation program, Falx applies the table transformation program to the input data to obtain a transformed output and unifies the output table schema with the schema in the visualization program, since the visualization program was filled with placeholder column names C1, C2, etc. Falx then instantiates other visualization details (e.g., scale type, axis domain, etc.) omitted in the visualization grammar and compiles the visualization program into a Vega-Lite (or R) script through syntax-directed translation. For example, in Figure 8-③, Falx generates an R script that both transforms the input and specifies the visualization. Furthermore, Falx notices that the values on the $x$-axis are dates instead of strings, so it changes the $x$-axis scale to a temporal scale using the function “scale_x_date()”.

After compilation, the post-processor removes semantically duplicate visualizations (i.e., visualizations with different specifications but with the same content and detail). Finally, Falx groups and ranks the visualizations based on the complexity of the programs (numbers of expressions). In this way, similar visualizations are grouped together to make comparison easier in the exploration process, and the complexity ranking allows users to explore visualizations constructed from simpler transformation programs first before jumping into complex ones. These visualizations are sent to the user interface for rendering to allow user exploration.

To understand Falx’s benefits and limitations and to examine how analysts might adopt synthesis-based visualization tools, we conduct a between-subjects evaluation centered on the following questions:

• Does Falx improve user efficiency in creating visualizations compared to a baseline tool?
• How does Falx change the visualization authoring process for different data analysts?
• What strategies do data analysts use to visualize data in Falx?

We recruited two groups of participants for the study: 16 participants (10 M, 5 F, 1 Unknown, Ages 23-51) for the Falx study, and another 17 participants (12 M, 4 F, Ages 19-60) for the baseline tool study (the R programming language). In the recruiting process, we screened participants by their ability to read a sample visualization. For the baseline group, we additionally required that all participants have experience with R (specifically the ggplot2 and tidyverse libraries) for data visualization.

Participants reported their experience in data visualization authoring based on the number of visualizations they created in the past 6 months using any tools. For the Falx study group, there were 6 participants experienced with some visualization tools (created >10 visualizations), 8 with moderate experience with visualization tools (created 1-10 visualizations), and 2 participants with zero experience in creating visualizations in the past. For the baseline group, there were 8 experienced participants (created >10 visualizations) and 9 participants with moderate experience (created 1-10 visualizations).

Each participant was asked to complete four visualization tasks; the Falx study group completed the tasks using Falx, and the baseline group used R. We chose R as the baseline tool due to its popularity among data analysts and its ability to support both data transformations and visualizations in the same context, whereas many other visualization tools require users to process data and specify visualizations in different contexts.

To better examine the use of Falx, participants in the Falx group first completed a 20-minute tutorial together with a warm-up task with a sample solution (creating a grouped line chart to visualize sea ice level change in the past 20 years). After the tutorial, participants were asked to solve four visualization tasks. For R participants, we also provided the same warm-up task with a sample solution to allow users to get familiar with the environment and the data loading process, so that participants could focus on solving the visualization tasks. During the user study, participants were allowed to refer to any resource on the Internet, including documentation and QA forums. We collected screen and audio recordings while participants completed tasks. We then interviewed them after all tasks were completed to reflect on their visualization process and strategies.

To conduct our user study, we developed four different visualization scenarios (Figure 10):

(a) Disaster Impact: A scatter plot that visualizes the number of people who died from five disasters in the last century.
(b) Electric Usage: A faceted heat map for hourly electric usage in each day during the first two months of 2019.
(c) Car Sales: A waterfall chart for the number of cars sold in a year. Each bar starts at the sales value in the previous month and ends at the sales value in the current month, and its color gradient reflects the increase/decrease compared to the previous month.
(d) Movie Awards: A layered line/scatter plot for visualizing winners of all four prestigious movie awards. For each celebrity, there are four points showing the years these awards were earned and a line showing the time span for the celebrity to win all four awards.

Figure 10: Study tasks. (a) Disaster impact (b) Electric usage (c) Car sales (d) Movie awards winners

For each visualization task, we provided as input a table that can be directly imported into the tools. We also explicitly described visualization designs to the participants in text so that participants could focus on implementation. Finally, we told participants that they did not need to optimize the design: a task was considered correctly solved as long as the semantics of the visualization created by the participant matched the example solution, regardless of the process and details. In this study, we did not restrict the time participants could spend on each task, but we provided users the option of quitting a task after spending more than 20 minutes without success. Thus, participants could complete each task with one of three outcomes: (1) submit a correct solution, (2) submit a wrong solution, or (3) give up after trying for at least 20 minutes.

We interviewed each participant after they finished all four tasks. For both Falx and baseline groups, we interviewed participants about (1) challenges they encountered while solving the tasks and their solutions, (2) common errors they made and how they fixed them, (3) their confidence about the solutions they submitted and what checks they performed to ensure correctness, and (4) what additional resources they used during the study and how they helped. We additionally asked participants in the Falx group to reflect on their visualization authoring process and interviewed them about (1) strategies adopted when creating examples to demonstrate the visualization task, (2) strategies adopted to explore the synthesized visualizations, and (3) their prior visualization experience and how Falx could potentially fit in their routine work.

The total session was less than 2 hours for all participants. To address learning effects or other carryover effects, we counterbalanced the tasks using a Latin square. We performed our analysis using mixed effect models, treating participants as a random effect and modeling tool, tasks, and experience level as fixed effects.
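The counterbalancing mentioned above can be sketched as follows. The paper does not say which Latin square was used or how participants were assigned to its rows, so the cyclic construction, the round-robin assignment, and the names `latin_square` and `task_order` are assumptions for illustration.

```python
TASKS = ["Disaster Impact", "Electric Usage", "Car Sales", "Movie Awards"]


def latin_square(items):
    """Cyclic Latin square: row i is `items` rotated left by i positions, so
    each item appears exactly once in every row and in every column."""
    n = len(items)
    return [[items[(i + j) % n] for j in range(n)] for i in range(n)]


def task_order(participant_index, items=TASKS):
    """Assign participants to rows of the square in round-robin fashion."""
    square = latin_square(items)
    return square[participant_index % len(items)]
```

For instance, `task_order(0)` returns the tasks in their listed order and `task_order(1)` starts with Electric Usage. A cyclic square balances the position at which each task appears; it does not balance which task immediately precedes which.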

Figure 11 shows the percentage of participants that correctly finished each task. Falx participants generally had higher completion rates, except on the disaster impact task. We observed a statistically significant difference in the completion rate in the car sales visualization ( 𝑝 < . ); others were not significant. Among nine failed tasks by Falx users, seven were due to incorrect solutions and, in two cases, participants quit the task after 20 minutes. Among 20 failed cases in the R study group, there were 9 incorrect solutions and 11 cases where participants quit after 20 minutes.

| Task | R (N = 17): n | R: % | Falx (N = 16): n | Falx: % |
|---|---|---|---|---|
| Disaster Impact | 16 | 94.1% | 14 | 87.5% |
| Electric Usage | 13 | 76.5% | 14 | 87.5% |
| Car Sales | 5 | 29.4% | 11 | 68.8% |
| Movie Awards | 14 | 82.4% | 16 | 100% |

Figure 11: The number and percentage of participants who correctly finished each study task.

Figure 12: Violin plot showing the amount of time participants spent on each task for both Falx and R study groups.

Figure 12 shows task completion time in Falx. We use $t_{Falx}$ and $t_R$ to show the mean and standard deviation of time participants in the Falx and R groups spent on each task, and $\mu_R - \mu_{Falx}$ to represent the difference of the mean time between the two groups. Using the Wilcoxon rank sum test with Holm’s sequential Bonferroni procedure for $p$-value correction, we observed a significant improvement in user efficiency for the car sales visualization ($t_{Falx}$ = ± s, $t_R$ = ± s, $\mu_R - \mu_{Falx}$ = s, $p$ < . ) and the electric usage visualization ($t_{Falx}$ = ± s, $t_R$ = ± s, $\mu_R - \mu_{Falx}$ = s, $p$ < . ). While Falx participants were also generally faster in the other two tasks, there was no significant difference for the movie awards visualization ($t_{Falx}$ = ± s, $t_R$ = ± s, $\mu_R - \mu_{Falx}$ = s, $p$ = . ) or the disaster impact visualization ($t_{Falx}$ = ± s, $t_R$ = ± s, $\mu_R - \mu_{Falx}$ = s, $p$ = . ).

Participants from the R study group noted that the key reason for failing on the car sales visualization task was the difficulty of finding the correct API (for the waterfall chart) together with the complex transformation behind it (which required calculating a cumulative sum). Falx users also noted they found the car sales visualization difficult due to unfamiliarity with the visualization type. On the other hand, R users reported that the movie awards visualization and the disaster impact visualization were relatively easier since they expected the same pivot operator to transform the input, which is commonly encountered by R users, and the visualization types were relatively standard (line chart and scatter plot).

We found no significant interaction between user experience level (defined in Section 4.1) and task completion time ($p$ = for all tasks in both study groups, using the Wilcoxon rank sum test with Holm’s sequential Bonferroni correction).

In this section, we describe qualitative feedback from participants in both groups about general (non-Falx related) visualization challenges both during the study and in their daily work, and how Falx can help with solving some of these challenges. We leave discussions of Falx-specific visualization challenges to Section 4.5.

As described in Section 4.2, we conducted a semi-structured interview with participants of both groups about visualization challenges they encountered both in the study and in their daily work, and how some of these challenges are typically overcome. To analyze this data, two of the researchers collaboratively conducted a qualitative inductive content analysis on the interviewer’s notes, with a sensitizing concept of visualization challenges and solutions. In this process, two researchers independently labeled interview notes and then collaboratively discussed and compared high level labels to resolve disagreements in the initial codes.
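For readers unfamiliar with the Holm’s sequential Bonferroni procedure used for $p$-value correction in the quantitative analysis, a minimal sketch follows. The helper name `holm_adjust` is hypothetical, and the $p$-values in the usage note are invented for illustration; they are not the study's results.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order.

    The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k), where
    m is the number of tests; a running maximum keeps the adjusted values
    monotone, and each value is capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

With four made-up tests, `holm_adjust([0.01, 0.04, 0.03, 0.005])` multiplies the smallest value by 4, the next by 3, and so on, so a result is declared significant only if its adjusted value stays below the chosen threshold.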

The first challenge frequently mentioned by participants was discovering or recalling the correct visualization function. In the R study group, 14 out of 17 participants described this challenge, especially for the car sales task that most participants failed on. Some participants noted that the difficulty came from both finding the right term to search and distinguishing similar candidate functions. For example, participant R14 noted that “I wasn’t aware that geom_rect() would be more helpful than geom_bar(). One thing that made it more challenging was the fact that this kind of bar chart has no proper name. I tried searching ‘non-contiguous bar charts in R’, but I didn’t get many useful results.” These challenges are also common in compositional charts: R10 noted “creating the line with the dots is something I never did before so didn’t know how to achieve it”. To address these challenges, participants noted that online example galleries and forums are “essential to their work” (R1). In addition, two participants had “an internal file – R code dictionary” (R7) and “a collection of some own code snippets” (R1) to reduce search effort.

We use R1-R17 to denote participants from the R study group and F1-F16 to denote participants from the Falx group.

Falx group participants also described facing similar challenges of finding the right functions in their daily work and noted that Falx could help address them. For example, F1 mentioned: “Falx can generate something that you cannot easily do. For example, the multi-layered visualization for the movie dataset would be very difficult to do in Excel or Google doc, you may need to specify some formula to specify relationship between two layers.” Participant F11 mentioned that Falx helped with complex tasks because “It allows you to start by creating a relatively simple visualization in the beginning, which is good, then it allows you to build more complex stuff on top of it which is also helpful.”

Data transformation was another frequently mentioned challenge, including both conceptualizing the expected data layout and implementing the transformation. For example, R17 mentioned “it [the car sales task] also seems to require some extra aggregation to get the starting and ending value for each rectangle to be drawn, which makes it even more difficult.” About implementation, R9 said that “the vocabulary of the tidyverse is critical for trying to do what you want to do, otherwise it is all impossible to achieve.”, and R14 mentioned that “I had an idea of what I needed to do, but I wasn’t able to search the right things on Google to arrive at a useful code snippet for it.”

Participants from the Falx group mentioned similar issues in their work routine. For example, “Tableau won’t do data preparation and you need to manually put them together” (F7), “pivoting table is already something at an intermediate level in Tableau and many people cannot use it” (F2). Lacking the skills to prepare data programmatically, some participants would do it manually. For example, “if I need to pivot data, I do it manually – e.g., just copy the data to a blank area [in Excel] and pivot it” (F8). Participants appreciated that Falx automatically handled data transformations. Participant F5 mentioned “I like the fact that it [Falx] solves the data transformation and visual encoding. I’m pretty familiar with visual encoding so it is fine when the data is in the right shape. But I find transforming data annoying.” Participant F15 mentioned “I didn’t think about data format at all in the process”. F7 mentioned “Tableau won’t do data preparation because you need to manually put them together and drag drop them for you. Falx is pretty automated on this.”

Due to the inherent challenge in visualization and data transformation in these tools, participants mentioned that many existing tools have a learning barrier for new users. For example, F4 mentioned that “the learning curve is pretty steep (Tableau), and we spent a lot of time learning these tools”. On the other hand, while Falx was a new visualization tool, most users found it easy to learn, despite some users requiring some time in the beginning to get used to “the paradigm shift from my normal understanding” (F6). For example, participant F4 mentioned that “the ramp up time [for Falx] is pretty short and it’s pretty easy to use.”, and F6 mentioned that “anyone with basic Excel knowledge should be able to use Falx”.

Since Falx is a new tool for data visualization, besides understanding its ability to address existing visualization challenges, we also investigated how participants used Falx to solve visualization tasks. We conducted an inductive content analysis on the interviewer’s notes about Falx experience similar to that in Section 4.4. In this section, we discuss observations about participants’ visualization process in Falx and their implications for future synthesizer-based visualization tool design.

Data analysts initiate interactions with Falx by creating examples. Because Falx is a synthesis-powered visualization tool, poorly constructed examples can be highly ambiguous and lead to long running time and a large number of visualization candidates. Also, while users can carefully create multiple examples to increase Falx’s performance, doing so requires more effort. Falx users identified the following strategies to create examples effectively:

• Sketching visualizations before demonstration: Three participants mentioned that sketching the visualization design on paper helped them understand the geometry of the visualization, and it helped them create better examples. For example, participant F13 mentioned “I sketch out first to get a general understanding of what the visualization would look like, and then use that to drop points.”
• Selecting representative data points to demonstrate: Seven participants mentioned that they considered using “representative points” (F7) when creating demonstrations in order to reduce ambiguity to Falx. For example, participant F1 mentioned that “[In the disaster impact task], I chose a cause that contains non-zero value in that year, because it’s a unique value that can avoid confusion of the tool”.
• Start from a few examples, add more later if necessary: Eight participants mentioned that they “tried to shoot for minimum input” (F6) for simplicity. In this way, they can “run the tool to see what it returns” (F1) before spending more effort on examples, and they would “add more to help narrow it down if there are many visualizations pop up” (F9). Additionally, participant F11 noted that “It’s easy to add multiple elements to mess up with the demonstration. A small number of elements make it easier to go back and fix”.
• Start with multiple examples to minimize interaction iterations: Instead of starting from minimal inputs, 6 participants preferred to create more examples in the beginning to “avoid ambiguity” (F2). They remarked that “it doesn’t take that much time to add data points” (P8) and multiple examples can “avoid having to wait and choosing from multiple solutions” (F8).

During the process of creating and revising examples, seven participants found the demo preview panel useful since it allowed them to “understand more about how a certain layout would look like” (F11) and it “helps put me on the right track of solving the task.” (F13). However, nine participants said they did not find it helpful because they “don’t know if it tells enough to help understand anything [about synthesis results]” (F7); they preferred to “just click synthesis to get the result since synthesis is pretty fast” (F14).

Some challenges participants encountered in creating examples included (1) unfamiliarity with terms in Falx (e.g., F4 mentioned “‘size’ is a term that I’m not familiar with.”) and (2) not being used to demonstrating visualization ideas using values (e.g., F6 mentioned “I was struggling with the paradigm shift about when to use values and when to use table headers”). In general, the fast response time of Falx enabled participants to get over these challenges through trial and error (e.g., F1 mentioned “If there is anything wrong, I’ll go back and do edits on the points.”), and they “get faster in later tasks once understand the difference” (F6). In the future, Falx could adopt a mixed-initiative interface [17] to improve the experience for new users. In addition, we observed that many participants felt like they were interacting with an intelligent tool (e.g., F13 mentioned “the tool is quite good at learning from what I demonstrated”) and they were willing to provide more informative inputs (e.g., F16 “tried to write the expression because I don’t know how Falx would do computation”). In the future, Falx could take advantage of this to support more complex visualization tasks by synthesizing programs from users’ more informative inputs besides examples (e.g., formulas that describe how certain values in the examples are derived from the input).

After creating examples to demonstrate the visualization task, users interact with Falx to explore the synthesized visualizations and identify the desired solution. Prior work [22, 27] has shown that a main barrier for adoption of synthesis-based programming tools is that users have difficulty understanding and trusting synthesized solutions, especially when there are many solutions consistent with the user demonstration.

We discovered from the interviews that many participants shared a similar 4-step process to select the desired visualization from synthesized visualizations, investigating visualizations from coarse to fine:

• Step 1: Check against the high-level picture. First, participants noted that it was easy to quickly exclude many visualizations that are obviously far from the desired visualization. For example, “having too many options is a bit overwhelming, but just keeping in mind what the result you look like can help narrow down the solution” (F11).
• Step 2: Check axes and invariants. After excluding the obviously wrong solutions, participants often investigated domains and ranges of each axis to further refine synthesis results. For example, “I first looked at color labels, I noticed they tend to be wrong in wrong visualizations – e.g., some charts only contain 2 labels instead of 4” (F16).
• Step 3: Compare similar visualizations. Then, participants investigated similar visualizations to find their differences. For example, “In the electric case, there is one mistake [in a candidate visualization] with 2019 showing up on y axis, it’s small and not obvious. But then, I was able to tell the difference by comparing the two visualizations directly, and notice that year showed up in the ’hour’ field” (F2).
• Step 4: Inspect visualization detail. Finally, participants “check carefully about the values to make sure they are correct” (F5). An example of such detailed checking is to check values in the chart against known values in the input data: “if there is a specific value that I know is correct – for example, in the last example (disasters), I knew the total death for 1961 was, then I hover over the output to check if the value is correct” (F6).


After these steps, participants were confident about the result. In fact, while participants mentioned that their confidence about solutions could be negatively affected by unfamiliarity with visualization types (e.g., F9 mentioned “I don’t do much heatmap so I’m less confident”), they mentioned that the checking process can raise their confidence about the chosen solution. For example, participants became more confident after “comparing them [candidates] with my sketch” (F6), “looking at solutions and finding their difference” (F14), or “checking details” (F2). They further noted that in many cases, “it’s almost impossible for Falx to get it wrong because these values are all pretty unique” (F14). In general, participants found the exploration panel “quite useful” because it “allows to choose the best visualization out of that” (F7).

In sum, Falx’s exploration panel allowed users to directly inspect solutions in the visualization space following a coarse-to-fine process, which helped them to disambiguate solutions and trust the chosen results. In the future, Falx’s interface could be improved to augment users’ exploration strategies. For example, Falx could directly summarize the differences among the synthesized visualizations to make comparisons easier. Also, Falx’s center view panel could support displaying traces that show how properties of each geometric object are derived from the input, which could make the synthesis process more transparent and make checking details easier.

Finally, participants reflected on how Falx might fit into their workflow. For example, F13 mentioned “I’ll absolutely use this if this is a product. Even as it is now I’ll use it”. Participants found several scenarios in which Falx can be helpful.

• Create visualizations for discussions and presentations. For example, F1 noted that “visualizations generated by Falx can meet standards of presentation slides” and “Falx can generate something that you cannot easily do in Excel”.
• Prototype complex analyses. For example, F16 mentioned “Falx is very useful in the prototyping stage because it’s very fast to use.” F7 further noted that they can “take a sample to visualize and then extend to the full visualization” using Falx for analyzing big datasets.
• Benefit non-experienced users. Six participants mentioned that Falx can be “more beneficial to new users that cannot create charts” (F2). Also, Falx can be “a good teaching tool to help people understand data” (F7).
• Reduce team collaboration effort. Participant F11 described that visualization readers were often different from visualization creators in their team, and modifying visualizations required team efforts. F11 mentioned that Falx could help with it: “a person presents me with a visualization, but I want to view something differently. Instead of getting back to the person to re-do it, I can probably just use Falx, which would be more efficient.”

However, several participants also mentioned that Falx may not fit well into their current workflow when they need “very high standard visualizations” (F1) that require extensive customization. Another limitation of the current version of Falx is the lack of “deep integration with other tools” (F1), e.g., databases for handling big datasets and data cleaning tools for “handling null / dirty data” (F4). But in general, participants thought that Falx would be helpful when used in the right scenarios and “would be pretty interesting to try Falx in some of these tasks” (F5).

Falx builds on top of prior research on grammar-based visualization tools, data transformation tools, program synthesis algorithms, and automated visualization design systems.

Grammar-based Visualization.

Following the initial publication of the Grammar of Graphics [52], high level grammars [37, 42, 50] for data visualizations have grown increasingly popular as a way of succinctly specifying visualization designs. In contrast to low level visualization languages like Protovis [4], D3 [13], and Vega [38] that are designed for creating highly-customizable explanatory visualizations, these high level grammars aim to enable analysts to rapidly construct expressive graphics in exploratory analysis. For example, ggplot2 [49, 50] and Vega-Lite [37] are two visualization grammars that allow users to specify visualizations using visual encodings. In both tools, low level visualization details are handled by default parameters unless users want customization. Tableau [42] adopts a graphical interface approach to enable users to rapidly create views to explore multidimensional databases. In Tableau, users drag-and-drop data variables onto visual encoding “shelves”, which are later translated into a high-level grammar similar to ggplot2. These tools expect the input data layout to match the design such that (1) each row corresponds to a graphical object, and (2) each column can be mapped to a visual channel. In practice, the mismatch between the design and the input data layout is common, which raises a barrier for creating visualizations [9, 53].

Falx formalizes visualizations in the same way, and synthesized programs are compiled to ggplot2 or Vega-Lite for rendering. Falx’s user interface also inherits the expressiveness and simplicity of the Grammar of Graphics design, by allowing users to create examples of visual encodings to demonstrate visualization ideas. The main difference is that Falx relaxes the constraints on input data layout and allows users to use layout-independent examples to demonstrate visualization ideas. Falx then automatically infers the visualization spec and synthesizes data transformations to match the data with the design from the examples, which saves users’ construction efforts.

Data Transformation Tools.

The need to prepare data for statistical analysis and visualization has led to the development of many tools for data transformation [6, 17, 31, 51]. Since different analysis objectives require different layouts, users need to frequently transform data throughout the analysis process [16, 51, 53]. Potter’s Wheel [31] is a graphical interface that allows users to interactively choose transformation operators and inspect transformation outputs. Wrangler [17] is a mixed-initiative data transformation tool which can suggest transformations based on the input data. Tidyverse [51] is a data transformation library in R, which allows users to interleave data transformation code, analysis code, and visualization code in the same environment to reduce the effort of context switching. Several synthesis-powered data transformation tools [3, 6, 7, 30, 46] have been proposed to help automate data transformation. For example, Prose [30] includes several programming-by-example tools that automatically synthesize programs for data cleaning and transformation from input-output examples. Morpheus [7] and Scythe [46] are two specialized data transformation synthesizers with better scalability and expressiveness.

Falx inherits the transformation language design in tidyverse [51], and Falx is a realization of prior program synthesis algorithms [7, 47] as an interactive system for visualization authoring. Falx’s main difference from automated data transformation tools is the unification of the visualization task and transformation tasks. In this way, Falx users do not need to conceptualize the expected data layout or frequently switch between visualization and data transformation tools. The unification also enables Falx users to easily explore synthesis results in the visualization space as opposed to the program space, which is considered challenging [27]. Besides data layout transformation, many data preparation tools also support data cleaning (e.g., handling missing data or invalid data) [48], data normalization (collecting non-relational data into relational format) [3], and string formatting [6, 11, 56]. Falx currently does not support directly visualizing dirty or non-relational data. In the future, Falx could work with these tools to further automate the visualization process.

Visualization Automation.

Automated visualization tools [15, 29, 35] have been proposed to help data analysts explore the visualization design space. Draco [29] and Dziban [24] use constraint logic approaches to model design knowledge, and they can recommend visualization designs from partial specifications. VizNet [15] uses a deep neural network trained on a visualization corpus to suggest designs. Voyager [54] combines recommendation and exploration for mixed-initiative design exploration. VisExemplar [35] allows users to demonstrate changes in the visualization layout to explore alternative visualization designs. Falx is complementary to these design automation tools. Falx allows users to implement visualization designs they have in mind without data layout constraints, while design automation tools help users explore visualization designs from a fixed data layout. A combination of the two approaches could potentially help users explore a larger visualization design space without data layout constraints.

User Interaction with Program Synthesizers.

In general, program synthesizers can be categorized into exploration tools and implementation tools. Synthesis-based exploration tools aim to generate a large number of solutions from users’ weak constraints to help users explore the search space [29, 43]. For example, Scout [43] is a synthesis-based exploration tool for discovering mobile layout ideas. In these tools, users interact with an exploration interface to navigate and save interesting solutions. Implementation tools [6, 7, 11, 30, 46, 56], instead, aim to synthesize programs that solve a concrete task (e.g., implement a design that a user already has in mind). In these tools, the main interaction objective is to help users rule out spurious programs that happen to be consistent with the user specification but are incorrect for the full task [27]. To address this challenge, Wrex [6] generates readable programs for users to inspect and edit; Regae [56] and FlashProg [27] interactively ask users disambiguating questions to refine synthesis results; PUMICE [23] lets users collaborate with the agent to recursively resolve any ambiguities or vagueness through conversations and demonstrations.

Falx is an implementation tool for data visualization. Falx’s contribution to the user interaction model is that it brings the exploration design (from exploration tools) to address the disambiguation and trust challenges in implementation tools. Allowing users to explore and examine synthesized programs in the visualization space lowers the barrier to user interaction (e.g., users do not need to be familiar with the underlying programs to disambiguate [27]) and increases users’ confidence in the solutions.
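The disambiguation problem described above can be illustrated with a toy sketch. The candidate programs, the examples, and the helper names below are hypothetical and chosen only for illustration; they do not reproduce the search procedure of Falx or of any cited synthesizer.

```python
# A toy "program space": each candidate maps two input columns to an output.
candidates = {
    "a + b": lambda a, b: a + b,
    "a * b": lambda a, b: a * b,
    "a ** b": lambda a, b: a ** b,
    "a - b": lambda a, b: a - b,
}


def consistent_programs(examples):
    """Return the names of all candidates that agree with every example."""
    return [
        name
        for name, program in candidates.items()
        if all(program(a, b) == expected for (a, b), expected in examples)
    ]


# One demonstration is ambiguous: three different programs reproduce it.
one_example = [((2, 2), 4)]
ambiguous = consistent_programs(one_example)

# A second demonstration rules out the spurious candidates.
two_examples = one_example + [((3, 2), 5)]
resolved = consistent_programs(two_examples)

# Alternatively, run every consistent program on the full input and let the
# user compare outputs instead of reading programs.
full_input = [(2, 2), (3, 2), (1, 5)]
outputs = {name: [candidates[name](a, b) for a, b in full_input] for name in ambiguous}
```

Here `ambiguous` contains three programs while `resolved` contains only `a + b`. The `outputs` dictionary shows the alternative route: the three consistent programs produce different results on the full input, so a user can pick the intended one by inspecting results, which loosely mirrors the idea of examining synthesized programs in the visualization space.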

Tools for More Expressive Visualizations.

Besides tools for standard visualization authoring, many visualization tools have been proposed to let designers create more expressive visualizations. Examples of these tools are Data Illustrator [25], Lyra [36], Charticulator [33], Data-driven Guides [20], and StructGraphics [44]. Besides high-level design layout (e.g., x, y, column) and standard mark properties (e.g., color, shape), these tools let users customize marks to create more expressive glyphs (e.g., compound marks, parametric marks). These tools expect users to prepare data in a tidy format to start with, but they support rich visualization designs. Falx, in comparison, supports standard visualization designs but automates data transformation.

Several design reconstruction tools (e.g., VbD [35], Liger [34], iVoLVER [28]) have been proposed to let designers create expressive visualizations by deconstructing and reconstructing existing visualization designs. Using these tools, users can transform existing visualizations into new ones by demonstrating desired design changes. Functionally, these tools are design exploration tools that take a visualization design as input and produce a new visualization design. They differ from Falx because Falx takes data as input and maps it to a visualization design for initial design authoring.

There are opportunities to combine Falx with these tools for better visualization authoring. Falx can work with expressive design tools to support authoring complex visualizations from non-tidy data: users can first design customized marks using example data values, and the tool would automatically synthesize bindings between data and these fine-grained mark properties from these examples. Falx can also work with design reconstruction tools: users would first use Falx to create an initial design from data and then interactively explore new designs by transforming the initial design.

We have presented Falx, a novel synthesis-powered visualization authoring tool that lets users demonstrate a visualization design using examples of visual encodings and then receive suggestions for visualization designs. Our goal was to create a system that does not require users to manually specify the visualization or worry about data transformations, thereby improving user efficiency and reducing the learning burden on novice analysts. Our study found that Falx often achieved these goals: Falx users were able to effectively adopt Falx to solve visualization tasks that they otherwise could not solve, and in some cases, they did so more quickly. We next discuss some implications of this work in guiding future research.

CHI ’21, May 8–13, 2021, Yokohama, Japan Chenglong Wang et al.

Data Layout-Flexible Visualization Exploration.

Besides visualization authoring, combining Falx with data exploration tools like Voyager [54], GraphScape [21], VbD [35] and Dziban [24] might enable new design exploration tools that allow users to discover both new relations in the dataset and new designs to visualize them. Using existing design exploration tools, users can explore diverse visualization designs from an input dataset; but since existing tools generate designs that are specific to the input data layout, the design space that can be explored is limited. Integrating Falx with these design exploration tools could enable novel design exploration tools that help users explore the design space without being constrained by data layouts. For example, in an anchored design exploration scenario [21, 24, 35], users can demonstrate data layout changes alongside design changes using this new tool to incrementally discover data insights from a larger design space. Similarly, Falx might also work with visualization recommendation engines [15, 29] to suggest layout-independent designs: starting from initial visualizations that users create with examples, the combined tool could find better designs for the dataset.

Visualization Learning.

As we discovered from our study, users often describe existing programming tools as “flexible, powerful” but “having a steep learning curve.” Falx can fill this gap by helping data analysts learn to create visualizations. Since Falx does not require its users to have programming expertise, new users can learn visualization and data transformation concepts using Falx by first creating visualizations using demonstrations and then inspecting synthesized programs. For example, Falx could generate readable code like Wrex [6] for users to learn to use visualization APIs, enabling them to access the flexibility and power of code.

Bootstrapping Complex Data Analysis.

Falx currently focuses on inexperienced data analysts, but it could also potentially benefit experienced data analysts by bootstrapping complex data analysis tasks. For example, data analysts could first create visualizations in Falx and then build complex analyses by iteratively editing synthesized programs. To achieve this goal, Falx needs more transparency and better integration with programming environments. For example, Falx could expose synthesized programs during the synthesis process and allow users to steer the synthesis process to better disambiguate results. Falx could also be integrated into programming environments like mage [19], Wrex [6] or Sketch-n-Sketch [14] to make program editing easier.

All of these possibilities, as well as prior work applying program synthesis to design (e.g., [29, 43]), suggest a promising future for augmenting design work with synthesis-based techniques. We hope Falx provides one exemplar for how to adapt core techniques in synthesis into powerful interactive tools that empower human creativity.

ACKNOWLEDGMENTS

This work has been supported in part by the NSF Grants ACI OAC–1535191, FMitF CCF-1918027, OIA-1936731, IIS-1546083, IIS-1955488, IIS-2027575, CCF-1723352, the Intel and NSF joint research center for Computer Assisted Programming for Heterogeneous Architectures (CAPA NSF CCF-1723352), Department of Energy award DE-SC0016260, the CONIX Research Center, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA CMU 1042741-394324 AM01, a grant from DARPA, FA8750–16–2–0032, as well as gifts from Adobe, Facebook, Google, Intel, VMWare and Qualcomm. We would also like to thank the anonymous reviewers for their insightful feedback on paper revision.

REFERENCES

[1] Aws Albarghouthi, Sumit Gulwani, and Zachary Kincaid. 2013. Recursive Program Synthesis. In Computer Aided Verification - 25th International Conference, CAV 2013, Saint Petersburg, Russia, July 13-19, 2013. Proceedings (Lecture Notes in Computer Science, Vol. 8044), Natasha Sharygina and Helmut Veith (Eds.). Springer, 934–950. https://doi.org/10.1007/978-3-642-39799-8_67
[2] Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. 2017. DeepCoder: Learning to Write Programs. In . OpenReview.net. https://openreview.net/forum?id=ByldLrqlx
[3] Daniel W Barowy, Sumit Gulwani, Ted Hart, and Benjamin Zorn. 2015. FlashRelate: extracting relational data from semi-structured spreadsheets using examples. ACM SIGPLAN Notices 50, 6 (2015), 218–228.
[4] Michael Bostock and Jeffrey Heer. 2009. Protovis: A graphical toolkit for visualization. IEEE transactions on visualization and computer graphics 15, 6 (2009), 1121–1128.
[5] Patrick Cousot and Radhia Cousot. 1977. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, Los Angeles, California, USA, January 1977, Robert M. Graham, Michael A. Harrison, and Ravi Sethi (Eds.). ACM, 238–252. https://doi.org/10.1145/512950.512973
[6] Ian Drosos, Titus Barik, Philip J. Guo, Robert DeLine, and Sumit Gulwani. 2020. Wrex: A Unified Programming-by-Example Interaction for Synthesizing Readable Code for Data Scientists. In CHI ’20: CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, April 25-30, 2020, Regina Bernhaupt, Florian ’Floyd’ Mueller, David Verweij, Josh Andres, Joanna McGrenere, Andy Cockburn, Ignacio Avellino, Alix Goguey, Pernille Bjøn, Shengdong Zhao, Briane Paul Samson, and Rafal Kocielnik (Eds.). ACM, 1–12. https://doi.org/10.1145/3313831.3376442
[7] Yu Feng, Ruben Martins, Jacob Van Geffen, Isil Dillig, and Swarat Chaudhuri. 2017. Component-based synthesis of table consolidation and transformation tasks from examples. In Proc. Conference on Programming Language Design and Implementation. ACM, 422–436.
[8] John K. Feser, Swarat Chaudhuri, and Isil Dillig. 2015. Synthesizing Data Structure Transformations from Input-output Examples. In Proc. Conference on Programming Language Design and Implementation. ACM, 229–239.
[9] Malu AC Gatto. 2015. Making research useful: Current challenges and good practices in data visualisation. (2015).
[10] Sumit Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, January 26-28, 2011, Thomas Ball and Mooly Sagiv (Eds.). ACM, 317–330. https://doi.org/10.1145/1926385.1926423
[11] Sumit Gulwani. 2011. Automating string processing in spreadsheets using input-output examples. In Proc. Symposium on Principles of Programming Languages. ACM, 317–330.
[12] Tihomir Gvero, Viktor Kuncak, Ivan Kuraj, and Ruzica Piskac. 2013. Complete completion using types and weights. In Proc. Conference on Programming Language Design and Implementation. ACM, 27–38.
[13] Jeffrey Heer and Michael Bostock. 2010. Declarative Language Design for Interactive Visualization. IEEE Trans. Vis. Comput. Graph. 16, 6 (2010), 1149–1156. https://doi.org/10.1109/TVCG.2010.144
[14] Brian Hempel, Justin Lubin, and Ravi Chugh. 2019. Sketch-n-Sketch: Output-Directed Programming for SVG. In Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, UIST 2019, New Orleans, LA, USA, October 20-23, 2019, François Guimbretière, Michael Bernstein, and Katharina Reinecke (Eds.). ACM, 281–292. https://doi.org/10.1145/3332165.3347925
[15] Kevin Zeng Hu, Snehalkumar (Neil) S. Gaikwad, Madelon Hulsebos, Michiel A. Bakker, Emanuel Zgraggen, César A. Hidalgo, Tim Kraska, Guoliang Li, Arvind Satyanarayan, and Çagatay Demiralp. 2019. VizNet: Towards A Large-Scale Visualization Learning and Benchmarking Repository. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, CHI 2019, Glasgow, Scotland, UK, May 04-09, 2019, Stephen A. Brewster, Geraldine Fitzpatrick, Anna L. Cox, and Vassilis Kostakos (Eds.). ACM, 662. https://doi.org/10.1145/3290605.3300892
[16] Sean Kandel, Jeffrey Heer, Catherine Plaisant, Jessie Kennedy, Frank Van Ham, Nathalie Henry Riche, Chris Weaver, Bongshin Lee, Dominique Brodbeck, and Paolo Buono. 2011. Research directions in data wrangling: Visualizations and transformations for usable and credible data. Information Visualization 10, 4 (2011), 271–288.
[17] Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. 2011. Wrangler: interactive visual specification of data transformation scripts. In Proceedings of the International Conference on Human Factors in Computing Systems, CHI 2011, Vancouver, BC, Canada, May 7-12, 2011, Desney S. Tan, Saleema Amershi, Bo Begole, Wendy A. Kellogg, and Manas Tungare (Eds.). ACM, 3363–3372. https://doi.org/10.1145/1978942.1979444
[18] Sean Kandel, Andreas Paepcke, Joseph M. Hellerstein, and Jeffrey Heer. 2012. Enterprise Data Analysis and Visualization: An Interview Study. IEEE Trans. Vis. Comput. Graph. 18, 12, 2917–2926. https://doi.org/10.1109/TVCG.2012.219
[19] Mary Beth Kery, Donghao Ren, Fred Hohman, Dominik Moritz, Kanit Wongsuphasawat, and Kayur Patel. 2020. mage: Fluid Moves Between Code and Graphical Work in Computational Notebooks. In UIST ’20: The 33rd Annual ACM Symposium on User Interface Software and Technology, Virtual Event, USA, October 20-23, 2020, Shamsi T. Iqbal, Karon E. MacLean, Fanny Chevalier, and Stefanie Mueller (Eds.). ACM, 140–151. https://doi.org/10.1145/3379337.3415842
[20] Nam Wook Kim, Eston Schweickart, Zhicheng Liu, Mira Dontcheva, Wilmot Li, Jovan Popovic, and Hanspeter Pfister. 2017. Data-Driven Guides: Supporting Expressive Design for Information Graphics. IEEE Trans. Vis. Comput. Graph.
ACM Human Factors in Computing Systems (CHI). http://idl.cs.washington.edu/papers/graphscape
[22] Tessa Lau. 2009. Why Programming-By-Demonstration Systems Fail: Lessons Learned for Usable AI. AI Mag. 30, 4 (2009), 65–67. https://doi.org/10.1609/aimag.v30i4.2262
[23] Toby Jia-Jun Li, Marissa Radensky, Justin Jia, Kirielle Singarajah, Tom M. Mitchell, and Brad A. Myers. 2019. PUMICE: A Multi-Modal Agent that Learns Concepts and Conditionals from Natural Language and Demonstrations. In Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, UIST 2019, New Orleans, LA, USA, October 20-23, 2019, François Guimbretière, Michael Bernstein, and Katharina Reinecke (Eds.). ACM, 577–589. https://doi.org/10.1145/3332165.3347899
[24] Halden Lin, Dominik Moritz, and Jeffrey Heer. 2020. Dziban: Balancing Agency & Automation in Visualization Design via Anchored Recommendations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 1–12.
[25] Zhicheng Liu, John Thompson, Alan Wilson, Mira Dontcheva, James Delorey, Sam Grigg, Bernard Kerr, and John T. Stasko. 2018. Data Illustrator: Augmenting Vector Design Tools with Lazy Data Binding for Expressive Visualization Authoring. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI 2018, Montreal, QC, Canada, April 21-26, 2018, Regan L. Mandryk, Mark Hancock, Mark Perry, and Anna L. Cox (Eds.). ACM, 123. https://doi.org/10.1145/3173574.3173697
[26] David Mandelin, Lin Xu, Rastislav Bodík, and Doug Kimelman. 2005. Jungloid mining: helping to navigate the API jungle. In Proc. Conference on Programming Language Design and Implementation. ACM, 48–61.
[27] Mikaël Mayer, Gustavo Soares, Maxim Grechkin, Vu Le, Mark Marron, Oleksandr Polozov, Rishabh Singh, Benjamin G. Zorn, and Sumit Gulwani. 2015. User Interaction Models for Disambiguation in Programming by Example. In Proceedings of the 28th Annual ACM Symposium on User Interface Software & Technology, UIST 2015, Charlotte, NC, USA, November 8-11, 2015, Celine Latulipe, Bjoern Hartmann, and Tovi Grossman (Eds.). ACM, 291–301. https://doi.org/10.1145/2807442.2807459
[28] Gonzalo Gabriel Méndez, Miguel A. Nacenta, and Sebastien Vandenheste. 2016. iVoLVER: Interactive Visual Language for Visualization Extraction and Reconstruction. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, May 7-12, 2016, Jofish Kaye, Allison Druin, Cliff Lampe, Dan Morris, and Juan Pablo Hourcade (Eds.). ACM, 4073–4085. https://doi.org/10.1145/2858036.2858435
[29] Dominik Moritz, Chenglong Wang, Greg L. Nelson, Halden Lin, Adam M. Smith, Bill Howe, and Jeffrey Heer. 2019. Formalizing Visualization Design Knowledge as Constraints: Actionable and Extensible Models in Draco. IEEE Trans. Vis. Comput. Graph. 25, 1 (2019), 438–448. https://doi.org/10.1109/TVCG.2018.2865240
[30] Oleksandr Polozov and Sumit Gulwani. 2015. FlashMeta: a framework for inductive program synthesis. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, part of SPLASH 2015, Pittsburgh, PA, USA, October 25-30, 2015, Jonathan Aldrich and Patrick Eugster (Eds.). ACM, 107–126. https://doi.org/10.1145/2814270.2814310
[31] Vijayshankar Raman and Joseph M Hellerstein. 2001. Potter’s wheel: An interactive data cleaning system. In VLDB, Vol. 1. 381–390.
[32] Veselin Raychev, Martin Vechev, and Eran Yahav. 2014. Code completion with statistical language models. In Proc. Conference on Programming Language Design and Implementation. ACM, 419–428.
[33] Donghao Ren, Bongshin Lee, and Matthew Brehmer. 2019. Charticulator: Interactive Construction of Bespoke Chart Layouts. IEEE Trans. Vis. Comput. Graph. 25, 1 (2019), 789–799. https://doi.org/10.1109/TVCG.2018.2865158
[34] Bahador Saket, Lei Jiang, Charles Perin, and Alex Endert. 2019. Liger: Combining Interaction Paradigms for Visual Analysis. CoRR abs/1907.08345 (2019). arXiv:1907.08345 http://arxiv.org/abs/1907.08345
[35] Bahador Saket, Hannah Kim, Eli T Brown, and Alex Endert. 2016. Visualization by demonstration: An interaction paradigm for visual data exploration. IEEE transactions on visualization and computer graphics 23, 1 (2016), 331–340.
[36] Arvind Satyanarayan and Jeffrey Heer. 2014. Lyra: An Interactive Visualization Design Environment. Comput. Graph. Forum 33, 3 (2014), 351–360. https://doi.org/10.1111/cgf.12391
[37] Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer. 2017. Vega-Lite: A Grammar of Interactive Graphics. IEEE Trans. Vis. Comput. Graph. 23, 1 (2017), 341–350. https://doi.org/10.1109/TVCG.2016.2599030
[38] Arvind Satyanarayan, Ryan Russell, Jane Hoffswell, and Jeffrey Heer. 2016. Reactive Vega: A Streaming Dataflow Architecture for Declarative Interactive Visualization. IEEE Trans. Vis. Comput. Graph. 22, 1 (2016), 659–668. https://doi.org/10.1109/TVCG.2015.2467091
[39] Rishabh Singh and Sumit Gulwani. 2016. Transforming spreadsheet data types using examples. In Proc. Symposium on Principles of Programming Languages. ACM, 343–356.
[40] Armando Solar-Lezama, Rodric M. Rabbah, Rastislav Bodík, and Kemal Ebcioglu. 2005. Programming by sketching for bit-streaming programs. In Proc. Conference on Programming Language Design and Implementation. ACM, 281–294.
[41] Armando Solar-Lezama, Liviu Tancau, Rastislav Bodik, Sanjit Seshia, and Vijay Saraswat. 2006. Combinatorial Sketching for Finite Programs. In Proc. International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 404–415.
[42] Chris Stolte, Diane Tang, and Pat Hanrahan. 2002. Query, analysis, and visualization of hierarchically structured data using Polaris. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada. ACM, 112–122. https://doi.org/10.1145/775047.775064
[43] Amanda Swearngin, Chenglong Wang, Alannah Oleson, James Fogarty, and Amy J. Ko. 2020. Scout: Rapid Exploration of Interface Layout Alternatives through High-Level Design Constraints. In CHI ’20: CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, April 25-30, 2020, Regina Bernhaupt, Florian ’Floyd’ Mueller, David Verweij, Josh Andres, Joanna McGrenere, Andy Cockburn, Ignacio Avellino, Alix Goguey, Pernille Bjøn, Shengdong Zhao, Briane Paul Samson, and Rafal Kocielnik (Eds.). ACM, 1–13. https://doi.org/10.1145/3313831.3376593
[44] Theophanis Tsandilas. 2020. StructGraphics: Flexible Visualization Design through Data-Agnostic and Reusable Graphical Structures. IEEE Transactions on Visualization and Computer Graphics (2020).
[45] Abhishek Udupa, Arun Raghavan, Jyotirmoy V. Deshmukh, Sela Mador-Haim, Milo M. K. Martin, and Rajeev Alur. 2013. TRANSIT: specifying protocols with concolic snippets. (2013), 287–296. https://doi.org/10.1145/2491956.2462174
[46] Chenglong Wang, Alvin Cheung, and Rastislav Bodík. 2017. Synthesizing highly expressive SQL queries from input-output examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017, Albert Cohen and Martin T. Vechev (Eds.). ACM, 452–466. https://doi.org/10.1145/3062341.3062365
[47] Chenglong Wang, Yu Feng, Rastislav Bodik, Alvin Cheung, and Isil Dillig. 2019. Visualization by example. Proceedings of the ACM on Programming Languages
Proc. ACM Program. Lang. 1, OOPSLA (2017), 62:1–62:26. https://doi.org/10.1145/3133886
[49] Hadley Wickham. 2010. A layered grammar of graphics. Journal of Computational and Graphical Statistics 19, 1 (2010), 3–28.
[50] Hadley Wickham. 2011. ggplot2. Wiley Interdisciplinary Reviews: Computational Statistics 3, 2 (2011), 180–185.
[51] Hadley Wickham et al. 2014. Tidy data. Journal of Statistical Software 59, 10 (2014), 1–23.
[52] Leland Wilkinson. 2012. The grammar of graphics. In Handbook of Computational Statistics. Springer, 375–414.
[53] Kanit Wongsuphasawat, Yang Liu, and Jeffrey Heer. 2019. Goals, Process, and Challenges of Exploratory Data Analysis: An Interview Study. CoRR abs/1911.00568 (2019). arXiv:1911.00568 http://arxiv.org/abs/1911.00568
[54] Kanit Wongsuphasawat, Dominik Moritz, Anushka Anand, Jock Mackinlay, Bill Howe, and Jeffrey Heer. 2015. Voyager: Exploratory analysis via faceted browsing of visualization recommendations. IEEE transactions on visualization and computer graphics 22, 1 (2015), 649–658.
[55] Navid Yaghmazadeh, Christian Klinger, Isil Dillig, and Swarat Chaudhuri. 2016. Synthesizing transformations on hierarchically structured data. In