Characterizing Automated Data Insights
Po-Ming Law * Georgia Institute of Technology
Alex Endert † Georgia Institute of Technology
John Stasko ‡ Georgia Institute of Technology

ABSTRACT
Many researchers have explored tools that aim to recommend data insights to users. These tools automatically communicate a rich diversity of data insights and offer such insights for many different purposes. However, there is a lack of structured understanding concerning what researchers of these tools mean by insight and what tasks in the analysis workflow these tools aim to support. We conducted a systematic review of existing systems that seek to recommend data insights. Grounded in the review, we propose 12 types of automated insights and four purposes of automating insights. We further discuss the design opportunities that emerged from our analysis.
Index Terms:
Human-centered computing—Visualization—Visualization theory, concepts and paradigms
INTRODUCTION
Providing insight has been recognized as a main goal of visualization [10]. However, gleaning data insights from visualization is a non-trivial task that requires domain knowledge, analysis expertise, and visualization literacy. To facilitate insight generation, some researchers have created systems that automatically communicate data insights to users [15, 37, 38]. For instance, Quick Insights in Power BI [15] suggests prominent trends and patterns within a data set that are presented as charts along with textual descriptions (Fig. 1).

Developers of these systems often use the term "insight" to refer to the automatically-extracted information (e.g., Quick Insights [15] and Automated Insights [1]). However, "insight" is an overloaded term that has been applied from multiple perspectives in the visualization community [23]. In the seminal work about insight-based evaluation, Saraiya et al. [31] regard insights as data findings. On the other hand, Sacha et al. [30] consider insights a product resulting from evaluating data findings with domain knowledge. Lacking a clarification of what insights are in the context of these automated systems can create confusion for researchers and hinder communication of ideas within the visualization community.

Furthermore, the early development of tools that automate the identification of data insights was motivated by the sentiment that data exploration involves inefficient manual specification of a large number of charts. These early tools (e.g., [13, 37]) focus on automatically surfacing potentially interesting charts to facilitate exploratory visual analysis. More recently, some researchers have explored the use of such automated tools beyond data exploration. They have investigated new applications such as focused question answering [22] and communication [35, 38].
Clearly, we are still working to understand the different purposes of automating data insights.

In order to gain a better understanding of automated systems that suggest data insights, we conducted a systematic review of publications that describe these systems. Based on the review of 20 relevant papers, we propose a framework to organize the types of automatically-generated insights (auto-insights) and the purposes of providing auto-insights. We further discuss four design implications based on the analysis. Our work can shed light on the existing landscape of tools that seek to recommend data insights to users.

* e-mail: [email protected]  † e-mail: [email protected]  ‡ e-mail: [email protected]

Figure 1: The Quick Insights interface [15]. Many visualization systems aim to automatically provide data insights to users.
METHODOLOGY
While some researchers regard insight as a product of human interpretation of data, in this paper, we define auto-insights as data observations revealed by automation (as opposed to human interpretation). This perspective of insight corresponds to Saraiya et al.'s definition of an insight as a unit of data discovery [31] and the way some laypeople consider data insights [4].

We focused on two forms of auto-insights: textual and visual insights. Textual insights (or "data facts" [35, 38]) are statements that describe statistical facts about data items, subsets, or aggregations (e.g., US cars have a higher average horsepower than Japanese cars for a car dataset). Visual insights are potentially interesting charts (e.g., a scatterplot that shows a high correlation). We call systems that recommend textual or visual insights auto-insight tools.
One of our objectives was to understand the purposes of automating insights: What tasks within the data analysis workflow do system designers think auto-insight tools are useful for? Hence, we focused on auto-insight tools that have a graphical user interface. Researchers of these tools often specify a target user group and user tasks. Reviewing related papers helped expose the purposes of automating insights in terms of the tasks auto-insights could facilitate. We excluded papers that describe algorithms for extracting and ranking auto-insights (e.g., [36]) because they often have foci unrelated to user tasks (e.g., optimizing computational efficiency). We also excluded commercial auto-insight tools that have not been published in academic conferences and journals as they tend to provide fewer details about the auto-insights they support. Our review further centered on tabular data as it is one of the most common data types.

With the focus on auto-insight user interfaces for tabular data, we collected a set of seed papers. We considered both systems that recommend textual insights and those that recommend visual insights. For systems that recommend textual insights, we started with two recent publications that used the term data facts to describe the textual insights offered by the systems [35, 38]. For systems that recommend visual insights, Lee [3] reviewed 12 relevant auto-insight tools. We found that some (e.g., SeeDB [37]) have multiple publications and included only the most highly cited paper for each tool. From the resulting 12 papers, we omitted two that do not depict user interfaces [7, 32]. Hence, we started with 2 (textual insight systems) + 10 (visual insight systems) = 12 seed papers.

We then gathered papers citing the seed papers from Google Scholar and those cited by the seed papers. We found 833 unique papers. Having collected the set, we reviewed only publications at nine relevant venues (InfoVis, VAST, TVCG, EuroVis, SIGCHI, AVI, VLDB, SIGKDD, and SIGMOD). The review resulted in eight additional papers that depict auto-insight tools [8, 9, 15, 19, 21, 22, 28, 29]. We again collected papers citing the eight papers and those cited by the eight papers and found 298 unique papers. We did not discover additional relevant papers from the set. The paper collection process, conducted in March 2020, yielded 12 (original seed papers) + 8 (additional papers) = 20 relevant papers.

During the process, we omitted some tools that offer recommendations other than textual and visual insights. For instance, Voyager [41] and Show Me [27] recommend perceptually-effective visualizations but do not proactively identify potentially interesting visualizations based on the statistical properties of data.
We analyzed the 20 papers by coding the types of auto-insights the tools present and the purposes of providing auto-insights to users. For the types of auto-insights, we used the fact taxonomy proposed by Chen et al. [11] as a foundation for the analysis. Their taxonomy depicts a comprehensive set of facts that can be discovered from tabular data and matches our definition of auto-insights as statistical facts in visual and/or textual forms. Grounded in our review of relevant papers, we found 12 auto-insight types (e.g., outliers and association). The 12 types comprise 11 types adapted from Chen et al.'s taxonomy [11] and a new type (i.e., visual motifs).

We observed that while some papers clearly depict the types of auto-insights their tools support (e.g., [12, 35]), others tend to be vague in the description. For example, some tools provide a flexible framework for including a rich set of auto-insights. Yet, the papers do not state clearly which types were included in the implementation. While coding auto-insight types, we looked for explicit statements about the provision of an auto-insight type in the paper and examined the accompanying figures and videos.
Regarding the purposes of automating insights, we referred to process models that illustrate different stages within the data analysis workflow [6, 18]. Alspaugh et al. [6] interviewed professional data analysts and proposed six phases in the analysis process: discover, wrangle, profile, model, explore, and report. Using these phases as a starting point, the coding resulted in four tasks within the data analysis workflow that auto-insight tools intend to support: exploratory analysis, communication, focused analysis, and data wrangling.

The tasks that an auto-insight tool supports can be ambiguous. For example, as stated in the title of the zenvisage paper ("Effortless Data Exploration with zenvisage"), zenvisage intends to support exploratory analysis [34]. However, it allows users to draw a line chart to query for charts with a similar temporal trend (focused analysis). Furthermore, an analyst could present the charts recommended by the tool in front of a manager (communication). In order not to over-interpret the tasks that a tool supports, we coded only the major purpose for which a tool provides auto-insights, determined by reading the title and the introduction of the paper. We coded zenvisage as an exploratory analysis tool because exploration is emphasized in the title of the paper [34].
Figure 2 (matrix): Rows list the 20 tools (Quick Insights [15], DataShot [38], DataSite [12], DIVE [17], Duet [21], Duo [22], Foresight [14], SeeDB [37], TSIs [9], VisPilot [24], Voder [35], Profiler [19], AutoVis [40], VizDeck [20], ScagExplorer [13], zenvisage [34], HCE [33], LensXPlain [29], View Space Explorer [8], Layered Storytelling [28]). Columns list the four purposes (exploratory analysis, communication, focused analysis, data wrangling) and the 12 types of auto-insights (value/derived value, trend, difference, extreme, rank, cluster, outliers, association, distribution, metadata, visual motifs, compound fact).
Figure 2: Result of the literature review. Each row is an auto-insight tool. A blue square indicates that a tool provides a type of auto-insights or puts emphasis on a particular purpose of automating insights.
ORGANIZATIONAL FRAMEWORK
The literature review resulted in 12 types of auto-insights and fourpurposes of providing auto-insights (Fig. 2).
We observed various approaches to recommending auto-insights. Tools that provide visual insights often score and rank charts to prioritize the presentation of the charts [42]. For example, to recommend correlation insights, Foresight ranks scatterplots by correlation coefficient [14]. Tools that recommend textual insights may present facts based on the attributes displayed in a chart or the noteworthy regions in a visualization. For instance, Voder generates textual descriptions in relation to the attribute combination in a chart [35]. Temporal summary images identify salient regions in a visualization and annotate them with descriptive facts [9]. Below, we present auto-insight types in descending order of occurrence frequency.
Outliers. Auto-insights about outliers are available in nine auto-insight tools. 5/20 tools provide auto-insights about outliers in a single variable. For example, Voder categorizes data values that are 1.5 times the interquartile range below the first quartile or above the third quartile as outliers [35]. 5/20 tools provide auto-insights about outliers in two dimensions. Notable examples are the systems that employ scagnostics for ranking scatterplots (e.g., [8, 13]). These systems utilize the outlying scagnostics measure to identify scatterplots with outlying points. Finally, Quick Insights communicates outliers in a time series that are highlighted in a line chart [15].
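As a concrete illustration, the 1.5 x IQR criterion described above can be sketched in a few lines. This is a minimal sketch of the standard Tukey fence rule; the function name and the quartile interpolation method are our own choices, not Voder's actual implementation.

```python
from statistics import quantiles

def iqr_outliers(values):
    """Flag values more than 1.5 * IQR below the first quartile or
    above the third quartile (Tukey's rule, as the text describes)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]
```

A tool could present each flagged value as a textual fact or highlight it in the corresponding univariate chart.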
Value/derived value. Auto-insights about a value of a single row in a table or a value derived from multiple rows in the table appear in 8/20 tools. The multimodal layered storytelling approach reveals prominent values in a timeline [28]. VizDeck measures the number of unique categories in a categorical variable and ranks bar charts by the unique category count [20]. DataSite finds the average of a numerical variable [12] while DataShot computes the proportion of categories in a categorical variable [38]. Both systems state the derived values (average and proportion) as textual descriptions.
Association. 7/20 tools provide auto-insights about association (i.e., a quantitative relationship between two numerical variables). These tools commonly identify either a linear relationship using Pearson correlation (7/20) or a non-linear relationship using more sophisticated measures (1/20). DataSite computes a Pearson correlation between two variables and presents it as a textual description alongside a scatterplot [12]. Hierarchical Clustering Explorer (HCE) ranks scatterplots by least-squares error curvilinear regression and quadracity to identify scatterplots that show a quadratic relationship [33].
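The correlation-based scoring that DataSite and Foresight are described as applying can be illustrated with a small sketch. The helper names and the list-of-tuples layout are illustrative assumptions; the actual systems have their own scoring pipelines.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two numerical columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_scatterplots(pairs):
    """Rank (name, xs, ys) column pairs by |r|, strongest association
    first, to decide which scatterplots to surface."""
    return sorted(pairs, key=lambda p: abs(pearson(p[1], p[2])), reverse=True)
```

Ranking by the absolute coefficient surfaces strong negative as well as strong positive relationships.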
Difference. An auto-insight about difference involves a quantitative comparison between distributions. 7/20 tools provide such auto-insights. With Duo [22], users can specify two groups of objects (e.g., cities in China and cities in the US). For each attribute (e.g., population), Duo compares the two groups to determine whether they have different distributions [22]. SeeDB ranks grouped bar charts by computing the earth mover's distance between the two probability distributions depicted in the charts [37]. AutoVis conducts a one-way ANOVA for charts that show continuous-categorical data and sorts the charts by p-value [40].
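For two distributions over the same ordered bins, the earth mover's distance reduces to the L1 distance between their cumulative distributions. The sketch below shows this one-dimensional form only; treating it as what SeeDB computes is an assumption on our part, not a description of SeeDB's implementation.

```python
from itertools import accumulate

def emd_1d(p, q):
    """Earth mover's distance between two discrete distributions over
    the same ordered categories: the L1 distance between their CDFs."""
    return sum(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))
```

A larger distance indicates a bigger difference between the two grouped distributions, so charts can be sorted by this score in descending order.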
Trend. We found auto-insights about temporal trends in 6/20 tools. These tools extract upward and downward trends (3/20), steady trends (2/20), and periodicity (2/20). With the ZQL language for specifying visualizations, zenvisage can order line charts based on the upward trends they show [34]. Temporal summary images add annotations to the flat regions in a time series visualization [9]. Quick Insights extracts time series that show seasonality [15].
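A least-squares slope gives a minimal stand-in for the upward/downward/steady classification these tools perform; the flatness threshold and function name here are illustrative assumptions, not any tool's actual parameters.

```python
def trend_direction(series, flat_tol=1e-3):
    """Classify a time series as 'upward', 'downward', or 'steady'
    from the sign of its least-squares slope."""
    n = len(series)
    mx, my = (n - 1) / 2, sum(series) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(series))
    den = sum((x - mx) ** 2 for x in range(n))
    slope = num / den
    if abs(slope) <= flat_tol:
        return "steady"
    return "upward" if slope > 0 else "downward"
```

Detecting periodicity or seasonality, as Quick Insights does, requires more machinery (e.g., autocorrelation analysis) and is beyond this sketch.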
Distribution. 5/20 systems communicate auto-insights about the distribution of a variable. Voder presents data facts about the range of a numerical variable [35]. Foresight ranks charts based on several measures of distribution, including dispersion, skewness, and heavy-tailedness, to reveal charts with a noteworthy distribution [14].
Extreme. 4/20 tools show auto-insights about the minimum and maximum values in a stream of values. For example, DataSite [12] and Voder [35] present the minimum and maximum values in a numerical variable as textual descriptions. Temporal summary images annotate the lowest and highest points in a time series [9].
Visual motifs. Visual motifs are unique patterns in a chart that do not fall into other auto-insight types. They include the special patterns in scatterplots identified by the scagnostics measures [39]. For example, the striated measure finds scatterplots with parallel lines. 4/20 tools identify visual motifs in scatterplots by utilizing scagnostics [8, 13, 40] or other measures (e.g., uniformity) [33].
Cluster. Only DataSite recommends auto-insights about clusters. DataSite employs K-means and DBSCAN to find clusters in a scatterplot [12]. It presents the clusters using a textual description and highlights the clusters in the scatterplot [12].
Metadata. Auto-insights about metadata provide information about a dataset. Such information includes missing values and other data quality issues [11]. Profiler uses detection routines to identify data quality issues and suggests charts to visualize the issues [19].
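The simplest metadata insight, counting missing values per column, can be sketched as follows. This is a toy version of the kind of detection routine the text attributes to Profiler; the list-of-dicts table layout is an assumption.

```python
def missing_value_report(table):
    """Count missing entries (None or empty string) per column of a
    list-of-dicts table."""
    counts = {}
    for row in table:
        for col, val in row.items():
            if val is None or val == "":
                counts[col] = counts.get(col, 0) + 1
    return counts
```

A tool could turn non-zero counts into textual facts or suggest a bar chart of missing-value counts per column.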
Rank. Such auto-insights involve sorting categories by a numerical variable. For a dataset of cars, DataShot recommends a data fact that says, "Compact, SUV, Midsize are the top 3 categories in the year of 2008" [35]. This fact is generated by ranking different types of cars by the numerical variable sales.
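Generating such a rank fact amounts to aggregating a measure per category and sorting. The sketch below aggregates by sum and uses a simple template; DataShot's actual fact templates and aggregation choices may differ.

```python
def top_k_fact(rows, category, measure, k=3):
    """Compose a rank fact: the top-k categories by a numerical
    variable, aggregated by sum over a list-of-dicts table."""
    totals = {}
    for row in rows:
        totals[row[category]] = totals.get(row[category], 0) + row[measure]
    top = sorted(totals, key=totals.get, reverse=True)[:k]
    return f"{', '.join(top)} are the top {k} categories by {measure}"
```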
Compound fact. Chen et al. [11] defined a compound fact as a fact that contains two or more facts. Voder recommends a fact by combining auto-insights about derived value and distribution [35]. For example, it generates the fact "Average Retail Price of SUV is 1.76 times Sedan" for a car dataset [35]. The fact includes a derived value (average) and is about the distribution of retail price.
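A fact like the one above composes a derived value (group averages) with a between-group comparison. A minimal sketch, with the template wording and function name as our own illustrative choices rather than Voder's:

```python
from statistics import mean

def ratio_fact(rows, group_col, value_col, g1, g2):
    """Compose a compound fact: the ratio between two groups' average
    values of a numerical column in a list-of-dicts table."""
    avg = lambda g: mean(r[value_col] for r in rows if r[group_col] == g)
    ratio = avg(g1) / avg(g2)
    return f"Average {value_col} of {g1} is {ratio:.2f} times {g2}"
```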
While the previous section regards what types of auto-insights the reviewed tools provide, this section illuminates how these auto-insights support tasks within the data analysis workflow. Early research efforts on auto-insight tools centered on supporting open-ended data exploration. However, we observed new applications of such tools in other aspects of data analysis.
Exploratory analysis. The majority of the auto-insight tools we reviewed intend to support exploratory data analysis (12/20). Several papers argue that data exploration typically involves specifying and examining a large number of charts (5/20). They comment that the process is non-trivial because of a large dataset (3/20) or the limited expertise of users (4/20). Tools such as Quick Insights [15] and Foresight [14] therefore rank charts based on the statistical properties of data (e.g., correlation) and show the potentially interesting ones to reduce the number of charts for user review.

Aside from surfacing potentially interesting charts, we observed other reasons for providing auto-insights to support data exploration. Wills and Wilkinson [40] argue that analysts might not know where to look at the beginning of data exploration, and automatically revealing interesting charts helps analysts enter the exploratory loop. Seo and Shneiderman [33] feel that visualization systems often leave analysts "uncertain about how to explore their data in an orderly manner." They created the HCE system to support a more systematic data exploration [33]. Lee et al. [24] suggest that users often encounter drill-down fallacies and developed VisPilot, which recommends bar charts to protect users from this analysis pitfall.
Communication. More recently, some researchers have developed tools that automatically generate data insights for communication purposes (4/20). These tools commonly present textual insights beside visualizations. They utilize textual insights to provide various benefits during communication with visualizations, including visualization interpretation, effective storytelling, and reflection. Voder seeks to scaffold visualization interpretation by presenting textual descriptions of charts [35]. DataShot [38] and Temporal summary images [9] focus on data-driven storytelling. DataShot generates infographic-like fact sheets to communicate key points in a dataset [38]. Temporal summary images annotate temporal visualizations to point "a viewer's attention to regions of interest," "suggest conclusions," and "provide data context" [9]. Martinez-Maldonado et al. [28] further found that automatically generated annotations of visualizations served educational purposes by encouraging reflection on performance in nursing simulations.
Focused analysis. Analysis exists on a spectrum from exploratory to directed. Data exploration is often more opportunistic and involves a vague goal, while focused analysis is directed by more concrete questions. We found 3/20 auto-insight tools that were designed to support more focused analysis. They help answer concrete yet high-level questions such as "Why is there a large high-income White population?" [29] and "What are the differences between New England colleges and Southeast colleges?" [22] While low-level questions such as "What is the admission rate of University X in 2020?" often have a single correct answer, high-level questions may have multiple reasonable answers. Auto-insight tools lend themselves to answering high-level questions by recommending possible answers. Users can then apply their domain knowledge to interpret the relevance of the recommendations.
Data wrangling. Another purpose of automatically extracting data insights is to support data wrangling. In the set of tools we reviewed, only Profiler serves this purpose. It detects anomalies in data and recommends visualizations to show the data quality issues [19].
DISCUSSION
Here, we discuss the design opportunities that emerged during the literature review and reflect on the limitations of our work.
Our organizational framework enables comparison of existing tools that aim to automate data insights. Reviewing the literature using our framework highlights prevailing approaches to automating insights. Furthermore, our review helps inspire new approaches by identifying under-examined spaces. This section discusses four design opportunities based on the observations from our analysis.
Compound facts. Figure 2 indicates several types of auto-insights that are rarely provided. A promising research avenue is the provision of compound facts. The auto-insight tools we reviewed often provide relatively simple facts. For a car data set, these facts might suggest whether a correlation is high or low or whether a temporal trend is upward or downward. While Voder generates compound facts, in the current implementation, these facts (e.g., Average Retail Price of SUV is 1.76 times Sedan) appear to be straightforward [35]. Some visualization researchers feel that the auto-insights generated by existing tools do not align with the conceptualization of human insights as deep and complex [5]. The view that existing auto-insights lack depth implies opportunities for investigating the feasibility and utility of generating more nuanced auto-insights that involve multiple auto-insight types and more sophisticated logic.

However, communicating more nuanced insights may involve a different set of design considerations. While text lends itself to communicating multiple pieces of information, creating effective visualizations to highlight different information at once is non-trivial. If designed properly, however, visualizations can ease user effort in interpreting the information. Developing algorithms for extracting compound facts and investigating the appropriate presentations of these facts are ripe for future research.
Beyond exploratory analysis. While many researchers have designed auto-insight tools for exploratory analysis, new applications have emerged. One such application is focused analysis. Tableau Explain Data [2] offers a recent example. As users select an outlying value, Explain Data automatically generates potential explanations for why the value is unusually high or low. However, many questions concerning focused analysis remain to be addressed. Future work will investigate other common high-level questions that auto-insight tools can help answer. In designing these tools, researchers should also investigate how these automated systems affect users during data analysis. Do they really make us better analysts? Do they introduce any side effects to the analysis process?

Besides the current applications of auto-insight tools, research could be devoted to exploring new application areas. We note that suggesting new applications entails conducting studies with real users to understand their needs. During the literature review, we noticed that auto-insight tools informed by formative user studies in specific domains are scarce. Exceptions include a tool created by Martinez-Maldonado et al. [28], who found that providing textual insights alongside a timeline helped support reflection in an education setting. Without significant understanding of users, the new applications identified may be divorced from real-world needs.
Explanations. Some researchers are concerned that a lack of transparency in auto-insight tools will cause user distrust of these tools [3]. Lee [3] provides an example of the non-transparency: How do users know that the insights provided "cover all the things that can be learned from the dataset"? In general, users may want to know why the auto-insights are generated and how they are generated. Although explanations have been recognized as an approach to inspiring user trust and promoting transparency in automated systems [26], none of the tools we reviewed explicitly provides explanations about the generation of auto-insights.

Lessons could be learned from other research areas regarding the provision of explanations. In context-aware systems, recommender systems, and machine learning systems, much research has been devoted to investigating the types of explanations that can be provided and the effectiveness of providing these explanations. For instance, Lim et al. [25] proposed five types of explanations for context-aware systems (i.e., what, why, why not, what if, and how to). Herlocker et al. [16] illustrated the process of automated collaborative filtering and derived explanations based on the operations in the process. These research areas hint at the types of explanations auto-insight tools can provide to users. Based on Lim et al.'s work [25], an auto-insight tool can provide what, why, why-not, what-if, and how-to explanations. A why explanation describes why an auto-insight is recommended to users while a why-not explanation provides reasons for why an auto-insight is not recommended. Grounded in the research by Herlocker et al. [16], a tool can explain the auto-insights by revealing the generation process. For instance, to explain an auto-insight about the correlation between two variables, a tool can describe how it extracts the two columns and removes the records that have missing values.
It can then explain how it computes a correlation coefficient and shows the auto-insight because the correlation is higher than a threshold. Future work is required to understand the effectiveness of such explanations in improving the transparency of auto-insight tools.
Information overload. Our review identified reducing information load as an important purpose of automating insights. Developers of auto-insight tools often argue that data exploration is non-trivial because users need to examine a large number of charts, and that recommending potentially interesting charts reduces information load [37]. As system developers strive to provide more complex auto-insights and explanations for how and why the auto-insights are generated, information overload might become a concern that defeats the original purpose of surfacing auto-insights. This is challenging because developers will need to strike a balance among providing rich auto-insights, instilling user trust, and avoiding information overload. We hope that our work will inspire researchers to consider a wide range of aspects of auto-insight tools that might affect users in different tasks within the data analysis workflow.
To keep the literature review manageable and focused, we only considered a representative set of tools that have a graphical user interface, that mine auto-insights from tabular data, and that appeared in top conferences and journals. Aside from the notable landmark publications we surveyed, future work will examine a broader set of publications that describe algorithms for auto-insight generation, that mine auto-insights from other types of data, and that have not been published (e.g., commercial products). Our work can offer a foundational understanding of existing approaches to automating insights for future literature reviews to build on.

While the types and purposes of auto-insights constitute two significant design dimensions of auto-insight tools, investigating other dimensions can paint a more holistic picture of the landscape of these tools. Moving forward, future work could explore other technical aspects (e.g., the techniques employed for mining the auto-insights) and design aspects (e.g., the methods for evaluating the data insights offered by the tools) of automating data insights.

Finally, our literature review started with a selection of seed papers. The seed papers might bias the final set of auto-insight tools we obtained. The 20 auto-insight tools we found, however, could serve as a resource for other researchers to collect a more comprehensive set of auto-insight tools in future studies.
CONCLUSION
In this paper, we have proposed a framework to organize tools that aim to automatically communicate data insights. Grounded in a review of 20 auto-insight tools, we identified 12 types of auto-insights and four purposes of offering these insights. We further discussed four design opportunities that emerged from the review. Resonating with an ongoing research endeavor to understand the automation of data insights, we hope that our work will offer a more consolidated understanding of existing tools that seek to recommend data insights to users.
REFERENCES

[1] Automated Insights: Natural language generation. https://automatedinsights.com. Accessed: 2020-03-31.
[2] Explain Data | Tableau Software. Accessed: 2020-03-31.
[3] Insight machines: The past, present, and future of visualization recommendation. https://medium.com... Accessed: 2020-03-31.
[4] What are data insights? https://algorithmia.com/blog/what-are-data-insights. Accessed: 2020-03-31.
[5] What's an insight? As I see it. https://jts3blog.wordpress.com/2018/02/22/whats-an-insight. Accessed: 2020-03-31.
[6] S. Alspaugh, N. Zokaei, A. Liu, C. Jin, and M. A. Hearst. Futzing and moseying: Interviews with professional data analysts on exploration practices. IEEE Transactions on Visualization and Computer Graphics, 25(1):22-31, 2018.
[7] A. Anand and J. Talbot. Automatic selection of partitioning variables for small multiple displays. IEEE Transactions on Visualization and Computer Graphics, 22(1):669-677, 2015.
[8] M. Behrisch, F. Korkmaz, L. Shao, and T. Schreck. Feedback-driven interactive exploration of large multidimensional data supported by visual classifier. In , pp. 43-52. IEEE, 2014.
[9] C. Bryan, K.-L. Ma, and J. Woodring. Temporal Summary Images: An approach to narrative visualization via interactive annotation generation and placement. IEEE Transactions on Visualization and Computer Graphics, 23(1):511-520, 2016.
[10] S. K. Card, J. D. Mackinlay, and B. Shneiderman. Readings in Information Visualization: Using Vision to Think. Morgan Kaufmann, 1999.
[11] Y. Chen, J. Yang, and W. Ribarsky. Toward effective insight management in visual analytics systems. In , pp. 49-56. IEEE, 2009.
[12] Z. Cui, S. K. Badam, M. A. Yalçın, and N. Elmqvist. DataSite: Proactive visual data exploration with computation of insight-based recommendations. Information Visualization, 18(2):251-267, 2019.
[13] T. N. Dang and L. Wilkinson. ScagExplorer: Exploring scatterplots by their scagnostics. In , pp. 73-80. IEEE, 2014.
[14] Ç. Demiralp, P. J. Haas, S. Parthasarathy, and T. Pedapati. Foresight: Recommending visual insights. Proceedings of the VLDB Endowment, 10(12):1937-1940, 2017.
[15] R. Ding, S. Han, Y. Xu, H. Zhang, and D. Zhang. QuickInsights: Quick and automatic discovery of insights from multi-dimensional data. In Proceedings of the 2019 International Conference on Management of Data, pp. 317-332, 2019.
[16] J. L. Herlocker, J. A. Konstan, and J. Riedl. Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, pp. 241-250, 2000.
[17] K. Hu, D. Orghian, and C. Hidalgo. DIVE: A mixed-initiative system supporting integrated data exploration workflows. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics, pp. 1-7, 2018.
[18] S. Kandel, A. Paepcke, J. M. Hellerstein, and J. Heer. Enterprise data analysis and visualization: An interview study. IEEE Transactions on Visualization and Computer Graphics, 18(12):2917-2926, 2012.
[19] S. Kandel, R. Parikh, A. Paepcke, J. M. Hellerstein, and J. Heer. Profiler: Integrated statistical analysis and visualization for data quality assessment. In Proceedings of the International Working Conference on Advanced Visual Interfaces, pp. 547-554, 2012.
[20] A. Key, B. Howe, D. Perry, and C. Aragon. VizDeck: Self-organizing dashboards for visual analytics. In Proceedings of the 2012 International Conference on Management of Data, pp. 681-684, 2012.
[21] P.-M. Law, R. C. Basole, and Y. Wu. Duet: Helping data analysis novices conduct pairwise comparisons by minimal specification. IEEE Transactions on Visualization and Computer Graphics, 25(1):427-437, 2018.
[22] P.-M. Law, S. Das, and R. C. Basole. Comparing apples and oranges: Taxonomy and design of pairwise comparisons within tabular data. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1-12, 2019.
[23] P.-M. Law, A. Endert, and J. Stasko. What are data insights to professional visualization users? In IEEE Visualization Conference (VIS). IEEE, 2020.
[24] D. J.-L. Lee, H. Dev, H. Hu, H. Elmeleegy, and A. Parameswaran. Avoiding drill-down fallacies with VisPilot: Assisted exploration of data subsets. In Proceedings of the 24th International Conference on Intelligent User Interfaces, pp. 186-196, 2019.
[25] B. Y. Lim, A. K. Dey, and D. Avrahami. Why and why not explanations improve the intelligibility of context-aware intelligent systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2119-2128, 2009.
[26] Z. C. Lipton. The mythos of model interpretability. Queue, 16(3):31-57, 2018.
[27] J. Mackinlay, P. Hanrahan, and C. Stolte. Show Me: Automatic presentation for visual analysis. IEEE Transactions on Visualization and Computer Graphics, 13(6):1137-1144, 2007.
[28] R. Martinez-Maldonado, V. Echeverria, G. F. Nieto, and S. B. Shum. From data to insights: A layered storytelling approach for multimodal learning analytics. 2020.
[29] Z. Miao, A. Lee, and S. Roy. LensXPlain: Visualizing and explaining contributing subsets for aggregate query answers. Proceedings of the VLDB Endowment, 12(12):1898-1901, 2019.
[30] D. Sacha, A. Stoffel, F. Stoffel, B. C. Kwon, G. Ellis, and D. A. Keim. Knowledge generation model for visual analytics. IEEE Transactions on Visualization and Computer Graphics, 20(12):1604-1613, 2014.
[31] P. Saraiya, C. North, and K. Duca. An insight-based methodology for evaluating bioinformatics visualizations.
IEEE Transactions onVisualization and Computer Graphics , 11(4):443–456, 2005.[32] S. Sarawagi, R. Agrawal, and N. Megiddo. Discovery-driven explo-ration of OLAP data cubes. In
International Conference on ExtendingDatabase Technology , pp. 168–182. Springer, 1998.[33] J. Seo and B. Shneiderman. A rank-by-feature framework for interac-tive exploration of multidimensional data.
Information Visualization ,4(2):96–113, 2005.[34] T. Siddiqui, A. Kim, J. Lee, K. Karahalios, and A. Parameswaran. Ef-fortless data exploration with zenvisage: An expressive and interactivevisual analytics system.
Proceedings of the VLDB Endowment , 10(4),2016.[35] A. Srinivasan, S. M. Drucker, A. Endert, and J. Stasko. Augmentingvisualizations with interactive data facts to facilitate interpretation andcommunication.
IEEE Transactions on Visualization and ComputerGraphics , 25(1):672–681, 2018.[36] B. Tang, S. Han, M. L. Yiu, R. Ding, and D. Zhang. Extracting top-kinsights from multi-dimensional data. In
Proceedings of the 2017International Conference on Management of Data , pp. 1509–1524,2017.[37] M. Vartak, S. Rahman, S. Madden, A. Parameswaran, and N. Polyzotis.SeeDB: Efficient data-driven visualization recommendations to supportvisual analytics.
Proceedings of the VLDB Endowment , 8(13):2182–2193, 2015.[38] Y. Wang, Z. Sun, H. Zhang, W. Cui, K. Xu, X. Ma, and D. Zhang.DataShot: Automatic generation of fact sheets from tabular data.
IEEETransactions on Visualization and Computer Graphics , 26(1):895–905,2019.[39] L. Wilkinson, A. Anand, and R. Grossman. Graph-theoretic scagnostics.In
IEEE Symposium on Information Visualization, 2005. INFOVIS2005. , pp. 157–164. IEEE, 2005.[40] G. Wills and L. Wilkinson. AutoVis: Automatic visualization.
Infor-mation Visualization , 9(1):47–69, 2010.[41] K. Wongsuphasawat, D. Moritz, A. Anand, J. Mackinlay, B. Howe,and J. Heer. Voyager: Exploratory analysis via faceted browsing ofvisualization recommendations.
IEEE Transactions on Visualizationand Computer Graphics , 22(1):649–658, 2015.[42] K. Wongsuphasawat, D. Moritz, A. Anand, J. Mackinlay, B. Howe, andJ. Heer. Towards a general-purpose query language for visualizationrecommendation. In