AI4VIS: Survey on Artificial Intelligence Approaches for Data Visualization
Aoyu Wu, Yun Wang, Xinhuan Shu, Dominik Moritz, Weiwei Cui, Haidong Zhang, Dongmei Zhang, Huamin Qu
Abstract — Visualizations themselves have become a data format. Akin to other data formats such as text and images, visualizations are increasingly created, stored, shared, and (re-)used with artificial intelligence (AI) techniques. In this survey, we probe the underlying vision of formalizing visualizations as an emerging data format and review recent advances in applying AI techniques to visualization data. We define visualization data as the digital representations of visualizations in computers and focus on visualizations in information visualization and visual analytics. We build our survey upon a corpus spanning ten different fields in computer science with an eye toward identifying important common interests. Our resulting taxonomy is organized around WHAT is visualization data and its representation, and WHY and HOW to apply AI to visualization data. We highlight a set of common tasks that researchers apply to visualization data and present a detailed discussion of AI approaches developed to accomplish those tasks. Drawing upon our literature review, we discuss several important research questions surrounding the management and exploitation of visualization data, as well as the role of AI in support of those processes. We make the list of surveyed papers and related material available online at ai4vis.github.io.
Index Terms — Survey; Data Visualization; Artificial Intelligence; Data Format; Machine Learning
1 INTRODUCTION

Data visualizations use visual representations of abstract data to amplify human cognition. Researchers traditionally investigate visualizations as artifacts created for people. This paper revisits this traditional perspective in line with the growing research interest in applying artificial intelligence (AI) to visualizations. Similar to common data formats like text and images, visualizations are increasingly created, shared, collected, and reused with the power of AI. Thus, we see that visualizations are becoming a new data format processed by AI. For instance, this trend was evident at the 2020 IEEE Visualization Conference, where multiple techniques were proposed for automating the creation of visualizations [1]–[5], retargeting visualizations [6], [7], and analyzing visualization ensembles [8], [9]. In light of this trend, new concepts and research problems are emerging, raising the need to organize existing literature and clarify the research landscape.

This survey describes the research vision of formalizing visualizations as an emerging data format and reviews recent advances in developing AI approaches for visualization data (AI4VIS). We define visualization data as the digital representations of visualizations in computers and focus on visualizations in information visualization and visual analytics. Nevertheless, AI is a broad notion that has been studied in different areas. Those areas have different motivations and research questions about applying AI to visualization data, the proposed techniques, and the content format used to represent visualizations. For instance, the
Web and Information Retrieval community advances the technology for searching visualizations (e.g., [10], [11]), while research in Computer Vision has recently studied visual question answering on charts (e.g., [12], [13]). Therefore, a comprehensive understanding of AI4VIS research requires a foundation in rich literature from diverse disciplines.

• A. Wu, X. Shu, and H. Qu are with the Hong Kong University of Science and Technology. Email: {awuac, xinhuan.shu, huamin}@cse.ust.hk. This work was done while A. Wu was an intern at MSRA.
• Y. Wang, W. Cui, H. Zhang, and D. Zhang are with Microsoft Research Asia. Email: {wangyun, weiwei.cui, haidong.zhang, dongmeiz}@microsoft.com.
• D. Moritz is with Carnegie Mellon University and Apple. Email: [email protected].

We construct a literature corpus by a relation-search approach [14], i.e., graph traversal over the citation and reference networks. This approach allows us to collect 98 publications from 10 research communities; publications from the visualization community account for roughly one-third. While not exhaustive, our corpus provides sufficient research instances to synthesize relevant work that contributes to the understanding of important common research questions and techniques. As shown in Figure 1, we organize and categorize AI4VIS research following a well-established why-what-how viewpoint [15]:
• Why apply AI to visualization data. We identify three common goals for AI4VIS research, namely visualization recommendation, enhancement, and analysis (Figure 1A). We classify those goals into subcategories to provide a comprehensive response to ongoing discussions like "why should we teach machines to read charts made for humans" [16].
• What is visualization data. We formalize the concept of visualization data by providing an overview of existing content formats of visualization data as well as their representations (Figure 1B).
• How to apply AI to visualization data. Most importantly, we contribute a task abstraction regarding how to apply AI to visualization data (Figure 1C). The task abstraction is critical since it allows for domain-agnostic and consistent descriptions of research questions across different disciplines. Besides, it facilitates an organized discussion reconciling distinct visualization paper types [17], i.e., a technique paper focuses on a single task, while a system paper might accomplish multiple tasks. In total, we note seven common tasks and discuss the AI approaches for each task separately.

Drawing upon the discussion, we outline research opportunities.
Fig. 1. We categorize the surveyed papers from three aspects: (A) goals for why to apply AI to visualization data; (B) what is the visualization data and its representation; and (C) how to apply AI to visualization data, where we outline seven tasks and separately discuss corresponding AI approaches.
We make the list of surveyed papers and related material available online at ai4vis.github.io. We hope that our survey will stimulate new theories, problems, techniques, and applications in this growing research area.
2 RELATED SURVEYS
Our ability to collect data has significantly exceeded our ability to analyze it, contributing to the emergence of AI approaches that automate these processes. Recently, there have been active ongoing discussions about how visualization research could be interwoven with artificial intelligence (AI) [18], [19]. However, AI has been defined and operationalized as a broad notion. To make our discussion concrete, our specific prospect is to formalize visualizations as an emerging format of data. We see visualizations as different from existing data types such as images and text, which raises many research questions regarding how AI facilitates the manipulation and analysis of visualizations.

Several surveys review techniques for automating the creation of visualizations. Saket et al. [20] discussed the prospect of learning visualization design and classified automated visualization design systems into knowledge-based (i.e., rule-based), data-driven (i.e., machine-learning), and hybrid approaches. This classification was systematically reviewed in a recent survey about visualization and infographics recommendation [21]. Besides, Qin et al. [22] drew on research in the database community to survey what makes data visualization more efficient and effective. In addition to automated creation and recommendation, Davila et al. [23] reviewed research over the past eight years to formalize chart mining, defined as "the process of automatic detection, extraction and analysis of charts to reproduce the tabular data that was originally used to create them". Wang et al. [24] recently surveyed machine learning models applied to visualizations. Different from them, our objective is to provide a comprehensive review of AI4VIS research, that is, a scaffold for emerging research problems to be formulated and understood (e.g., automatic assessment and summarization of visualizations).
3 METHODOLOGY
In this section, we describe the scope of our literature survey, the search methodology and corpus, as well as our analysis method.
This survey focuses on AI approaches applied to visualization data. We define visualization data as the digital representations of visualizations in computers. Thus, we include existing work that contributes AI techniques or systems that primarily focus on inputting or outputting visualization data. However, due to the wide scope of visualizations and related research, we restrict the scope of visualizations for manageability.
Excluding scientific visualizations.
We cover literature related to information visualization and visual analytics, in which visualizations are typically represented as charts and infographics. This restriction excludes literature that primarily focuses on scientific visualizations. Scientific visualizations represent scientific data such as flows and volumes, which are typically designed with a strong inherent reference to space and time [25]. We exclude them due to their heterogeneous nature, which yields different research challenges and interests [15].
Excluding research specific to a chart type.
The research problems and proposed techniques can vary among different chart types. For instance, a conference in the graph visualization community (i.e., the International Symposium on Graph Drawing) focuses on the graph layout problem, whereas its problems might not apply to other visualizations like histograms. Such chart-specific studies form a large body of work and therefore cannot be covered in a single survey; e.g., Behrisch et al. [26] devoted an entire survey to assessment metrics for different charts. Different from them, our goal is to identify common research problems (e.g., assessment, recommendation) irrespective of chart type. Nevertheless, we note a few chart-type-specific studies and discuss how they might fall into our taxonomy in section 8.
Excluding human interaction data.
Finally, we emphasize the central role of visualization data in the surveyed AI approaches. In other words, we do not consider work whose primary goal is to collect and analyze human interaction data generated when creating or using visualizations. Therefore, we exclude provenance data [15] and data collected for natural language interfaces [27].
To establish the corpus of papers discussed in this survey, we apply a relation-search method [14] to traverse and search the literature. Our method starts with a linear scan of the full papers published at the 2020 IEEE Visualization Conference to collect the starting points. This initial set contains nine papers [1]–[9]. We further augment these starting points with papers covered in related surveys [20]–[23]. If a paper is selected to be included, we traverse both its references and citations. We search for papers
breadth-first in an attempt to avoid over-focusing on one particular line of research.

Fig. 2. The number of papers per research area and per year.

Our relation-search method results in an interdisciplinary corpus consisting of 98 papers from 10 research areas. We decide the research area of each paper according to the classification of computer science areas by CSRankings (http://csrankings.org/). Figure 2 lists the research areas, indicating the interdisciplinary nature and widespread popularity of visualizations. The primary area is Visualization, accounting for around one-third (34/98). The next largest areas are Human-computer Interaction and Databases. Besides, we see research efforts from Artificial Intelligence, Data Mining, Computer Graphics, and Web & Information Retrieval.

Another finding is the increasing trend of applying AI methods to data visualization. Figure 2 shows that the total number of publications has been increasing steadily over the last decade, particularly with a surge since 2018 and a peak in 2020. Given the wide and ever-growing research efforts and interests, we believe that this topic will receive more attention in the near future.

We acknowledge the limitations of our search methodology, which is based on manual search over citation and reference graphs. Consequently, our survey should be seen as an effort to investigate the diversity of related research, provide sufficient research instances to contextualize the current research landscape, and indicate future research opportunities. As such, we claim neither comprehensiveness nor exhaustiveness. Instead, future research could augment the corpus by automated graph traversal.

Figure 1 provides an overview of our classification of surveyed papers, which is organized along the why-what-how axes. This organization is well-established in visualization-related surveys [15], [26]. Nevertheless, we introduce two modifications to the "what" and "how" axes in order to contextualize our discussion of AI4VIS research.

Firstly, from the "what" perspective, we discuss not only what visualization data is but also its representation (section 5). For representation, we review internal representations (how visualization data is stored and operated on in systems) and feature representations (how visualizations are converted into features that are mathematically and computationally convenient for analysis).

Secondly, from the "how" perspective, we organize the AI approaches with a novel task abstraction. We identify seven common tasks for AI4VIS research and discuss approaches for
each task in section 6. Our motivations for such a task abstraction are two-fold:
• Reconciling techniques with system papers. We find our corpus to be a mixture of technique and system papers, which confounds the discussion. A technique paper typically solves a single task, while a system paper could consist of multiple components, each for a different task [17]. As such, we aim to decompose system papers into abstract tasks that allow for a collectively exhaustive taxonomy of tasks.
• Unifying inconsistent vocabularies. Due to the interdisciplinary nature of our corpus, we find that tasks are usually described in inconsistent vocabularies. For instance, the task of extracting encoding choices from visualization images or specifications is described as deconstruction [1], [2] or chart mining [23]. Thus, we wish to establish a common vocabulary that enables consistent discussions for researchers from different areas to communicate the relevance and subtleties.

For the above purposes, we adopt a bottom-up approach by iteratively categorizing and labeling the tasks in surveyed papers. Thanks to the interdisciplinary nature of our corpus, we are able to start from several "seeding" tasks by referring to task taxonomies in other fields [28]. Subsequently, we verify whether each task maps to existing categories and, if not, discuss alternative task categorizations. Most tasks in the final taxonomy conform to well-known terminology, with a few exceptions that reflect the peculiarity of visualizations (e.g., visualization recommendation). We discuss the details of the task taxonomy in section 6. Our supplemental website, available at ai4vis.github.io, provides the details of our labels. In particular, we label tasks at the (sub-)section level and provide quotes from each paper to help readers understand why it falls into the task category.
4 GOAL: WHY APPLY AI TO VISUALIZATION DATA
The goals of applying AI to visualization data cover a wide spectrum, pursued by research efforts from different areas. We adopt a deductive classification method to create a mutually exclusive and collectively exhaustive taxonomy that better structures our discussion. Specifically, we subdivide goals along two axes deductively: whether visualizations are the input or the output, and whether there is a single visualization or many. We further merge outputting a single visualization and outputting many visualizations from an inductive perspective, i.e., we observe that they share the same sub-categorization. Therefore, we finally classify goals into three categories, which are further subdivided (Figure 3):
• Visualization Generation outputs single or many visualizations given different user inputs.
• Visualization Enhancement processes and applies enhancements to an input visualization.
• Visualization Analysis concerns organizing and exploiting a visualization collection.
One of the central research problems in the visualization community is to ease the creation of visualizations. This is important since authoring effective and elegant visualizations is challenging even for professionals [22]. It is typically tedious and time-consuming to craft visualizations that clearly convey the insights while satisfying effectiveness and aesthetic goals. As such, the ultimate goal of work in this category is to automatically generate visualizations.
Fig. 3. Matrix of goals of applying AI to visualization data, with sub-categories. We subdivide goals along two axes deductively.
We identified four subcategories that distinguish visualization generation approaches by user input.
Data-based generation outputs visualizations given a database or a data table. These approaches assist in visual data analysis and have been extensively studied over the last decades. Early research dates back to 1986 [29], and it still remains an important question today. Recent work like Draco [30], DeepEye [31], VizML [32], and DataShot [33] continues this direction.

Visual analysis is an iterative process where the next step of analysis often depends on earlier insights, motivating research on anchor-based generation. The problem is to recommend a visualization given an anchor visualization. For instance, SeeDB [34] recommends visualizations with large deviation from the anchor visualization, since these are deemed most "interesting" to users (see the sketch below). Similarly, DiVE [35] aims to diversify the visualization recommendations, while Dziban [36] targets maintaining consistency.

Related to anchor-based generation, design-based generation studies the problem of generating visualizations by injecting the target data into a reference design. This is referred to as style transfer in Harper and Agrawala's approach [37] or visualization-by-example [38]. Another recent example is Retrieve-Then-Adapt [1], which applies pre-defined design templates to user information.

The last category is context-based generation, where the input only provides some contextual information such as a natural language description [39] or news articles [40]. An important task for context-based generation is to recommend data that is most related to the given context.
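To make the anchor-based idea concrete, below is a minimal sketch of a deviation-based utility in the spirit of SeeDB [34]. It assumes the candidate and anchor views aggregate the same measure over the same categories; the specific distance function (here, Kullback-Leibler divergence) and the example numbers are illustrative choices, not SeeDB's exact formulation.

```python
import numpy as np

def deviation_utility(target_counts, reference_counts, eps=1e-9):
    """Score a candidate view by how much its normalized aggregate
    distribution deviates from the anchor view's distribution.
    Higher scores mark a more 'interesting' recommendation."""
    p = np.asarray(target_counts, dtype=float)
    q = np.asarray(reference_counts, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    # Kullback-Leibler divergence D(p || q); other distances also work.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical example: sales by region in a filtered subset vs. overall.
print(deviation_utility([40, 5, 5], [20, 15, 15]))
```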
The proliferation of visualizations gives rise to research efforts in enhancing the use of existing visualizations. An important question is how to retarget visualizations to different environments. For example, VisCode [6] and Chartem [7] study how to encode additional information in visualization images. MobileVisFixer [2] attempts to automatically convert web visualizations into mobile-friendly designs. Other work explores adding interactions to visualizations to improve legibility and interactivity. Graphical Overlays [41] uses layered elements to aid chart reading. Interaction+ [42] enhances visualizations with dynamic, interactive visual exploration. It is also common to summarize visualizations by generating natural language descriptions such as captions and annotations. This approach transforms visualizations from a visual to a non-visual modality, thereby enabling multimodal interactions [43] or enabling people with vision impairments or low vision to consume visualizations [44]. Related to natural language, several recent studies challenge machines to perform question answering on visualizations, that is, to generate answers given a question (e.g., [12]).
Finally, with the increasing availability of visualization data, research has constructed visualization databases and investigated methods for managing and analyzing these collections.
Retrieval has been largely studied in the fields of information systems and databases, helping users search for visualizations that match their needs. For instance, Retrieve-Then-Adapt [1] assists users in finding an example visualization that is suitable for encoding their data. Saleh et al. [45] developed a search engine that returns stylistically similar visualizations given a query visualization.

Another promising set of work has started to mine visualization collections to derive useful information such as visualization usage on the web [46] or in the scientific literature [11], [47], as well as design patterns in visualizations [48], [49] and multiple-view systems [8]. The mined patterns provide evidence for recommending visualizations. Besides, some work [9], [50] considers charts to be the analytical target and provides a visual analytics approach for analyzing data patterns in chart ensembles.
5 DATA: WHAT IS VISUALIZATION DATA
In this section, we formalize the concept of visualization data. Specifically, we discuss and categorize visualization data in terms of its raw data format (subsection 5.1) and representations (subsection 5.2). As shown in Figure 4, we classify visualization data formats into graphics, programs, and hybrid formats that blend the benefits of both. In addition to raw data, we note that visualizations are sometimes represented in carefully designed internal representation formats in surveyed systems. Internal representations are usually proposed to facilitate computation by removing unnecessary information, e.g., the VQL format [51] only stores data transformation and encoding without style information. As such, internal representations are usually not exposed (outputted and shared). Finally, we review feature representations, including feature engineering and feature learning. Feature representations are vital for machine learning tasks, converting visualizations into features that are mathematically and computationally convenient to analyze. We discuss them due to the increasing interest in applying machine learning to visualizations.

Fig. 4. Summary of the categorization of (A) visualization data and (B) its representation.

Visualization data can be stored in different content formats such as graphics and programs. The choice of content format directly influences the downstream operations possible on the visualization data, since different content formats have their own advantages and disadvantages. Here, we discuss three formats we identified in our survey: graphics, programs, and hybrid formats.
Graphics are a natural and expressive content format for visualizations, since visualizations are defined as graphical representations of data. It is common to author and store visualizations as raster graphics (bitmaps) for easy usage and sharing [52]. Nevertheless, raster graphics are a standalone and lossy representation of visualizations that loses the visualization semantics (e.g., chart type, visual encoding, underlying data). To perform automated analysis, reverse engineering is often a prerequisite, i.e., reconstructing the lost information from raster graphics using computer vision and machine learning approaches [53], [54]. However, reverse engineering remains an open problem with challenges to overcome in terms of robustness and accuracy [7]. In conclusion, the lossy nature of raster graphics hinders machines from easily interpreting and transforming the visualization [7].
Vector graphics represent a less lossy alternative. They have an advantage over raster graphics in that they can be scaled up without aliasing. Visualizations are usually stored in the Scalable Vector Graphics (SVG) format [52], which describes visual elements as shapes (e.g., rectangles and text) with styles (e.g., positions and fill color). Those low-level descriptions reduce the difficulty of reverse engineering, e.g., it is no longer necessary to apply computer vision techniques to detect objects such as text [55]. Besides, this format enables support for interactivity and animation. Nevertheless, high-level visualization semantics such as the visual encoding and underlying data are still lost, and their extraction requires considerable effort [2], [56].

To conclude, graphics are a human-friendly and expressive content format for visualizations. However, their lossy nature restricts machine interpretation and computation and requires reverse engineering. To that end, vector graphics are more amenable to reverse engineering than raster graphics.
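The following minimal sketch illustrates why vector graphics ease reverse engineering: element types and attributes can be read directly from the markup rather than detected from pixels. The SVG fragment and its class names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical SVG fragment for a two-bar bar chart.
svg = """
<svg xmlns="http://www.w3.org/2000/svg" width="120" height="100">
  <rect class="mark" x="10" y="40" width="30" height="60" fill="steelblue"/>
  <rect class="mark" x="60" y="20" width="30" height="80" fill="steelblue"/>
  <text class="axis-label" x="25" y="95">A</text>
  <text class="axis-label" x="75" y="95">B</text>
</svg>
"""

NS = "{http://www.w3.org/2000/svg}"
root = ET.fromstring(svg)
# Shapes and text arrive as typed elements with explicit attributes,
# so no object detection or OCR is needed.
for rect in root.iter(NS + "rect"):
    print("mark:", rect.get("x"), rect.get("height"), rect.get("fill"))
for label in root.iter(NS + "text"):
    print("text:", label.text)
```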
Researchers have developed approaches for describing and storing visualizations as computer programs. Programs retain the information necessary to construct the visualization, e.g., the underlying data. This information is usually represented by languages, which are classified into imperative and declarative programming.
Imperative visualization languages require users to specify step-by-step commands to create the visualization. For example, D3 [57] is a JavaScript-based toolkit that assists programmers in specifying graphical marks (e.g., bar, line) with visual attributes. Programs in imperative languages are open-ended and thus allow flexible creation of visualizations. However, this open-ended characteristic renders them unstructured [58], with irregularities and ambiguities that hinder efficient machine analysis to extract the semantics (e.g., visual encodings). As such, we observe few efforts in analyzing imperative programs. In a slightly different vein, Bolte and Bruckner [59] recently proposed Vis-a-Vis, a visual analytics approach for exploring visualization source code. The authors argued for more efforts on analyzing visualization programs.
Declarative visualization languages ask programmers to directly describe the desired result, usually referred to as a visualization specification. Specifications (e.g., Vega [60] and Vega-Lite [61]) encapsulate the step-by-step commands for visualization construction into semantic components such as data encodings, axes, and legend properties. This encapsulation is achieved by providing sensible defaults and introducing constraints with prescribed properties and structures. As such, declarative programs tend to be at most as expressive as imperative programs, depending on their design. Since specifications contain tags or markers that separate semantic elements and enforce hierarchies, they are deemed semi-structured and thus more amenable to computer processing tasks [62]. It is, therefore, a common practice to generate or collect specifications to conduct data-driven research, e.g., VizML [32] collects the Plotly corpus to train visualization recommendation systems.

Arguably, programs are not friendly to most people except programmers. As such, programs tend to be shared less by laypeople online, which hinders their collection and reuse. This is exemplified by the data collection for visualization research: even though programs are more commonly used in AI approaches than graphics, existing corpora mainly include graphics [46], [47], [63] or tabular datasets [64] instead of programs. This suggests the need for more recognition of balancing the machine- and human-friendliness of visualization formats.
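As a concrete illustration, below is a minimal Vega-Lite-style specification, written here as a Python dictionary; the data values and field names are made up. The point is that the encoding is an explicit, semi-structured component that a machine can read off directly.

```python
# A minimal Vega-Lite-style specification, expressed as a Python dict.
spec = {
    "data": {"values": [{"category": "A", "amount": 28},
                        {"category": "B", "amount": 55}]},
    "mark": "bar",
    "encoding": {
        "x": {"field": "category", "type": "nominal"},
        "y": {"field": "amount", "type": "quantitative"},
    },
}

# Unlike pixels or low-level drawing commands, the visual encoding is
# directly machine-readable; no reverse engineering is required.
for channel, enc in spec["encoding"].items():
    print(channel, "->", enc["field"], f"({enc['type']})")
```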
Recent research proposes several hybrid content formats that incorporate the benefits of both graphics and programs. Although such efforts remain limited, we provide our embryonic classification here, hoping to motivate future theories and models.

Two approaches aim to embed programs into graphics. VisCode [6] presents an embedding approach based on deep image steganography, that is, concealing visualization specifications and meta information within the bitmap image. Similarly, Chartem [7] encodes information (e.g., the visualization specification) in the background of a chart image. Both embedding techniques reduce the overhead of decoding the underlying visualization specification, showing that the encoded information can be extracted in an efficient and less error-prone manner. Besides, both approaches reduce interference with human perception by avoiding visually important areas of bitmap images.

Loom [65] takes a different approach and seeks to organize graphics by programs. Loom proposes to share interactive visualizations by filling the gap between two extremes, i.e., sharing non-interactive formats such as images, and sharing the data, source code, and software. Specifically, it formulates an interactive visualization as a standalone object built on an action tree. Each intermediate node of the tree represents an interaction such as hovering and clicking, and the leaf nodes store the resulting visualization images. Therefore, users can interact with the graphics as if they had access to the original source code and software. In this way, the hybrid approach provides a reproducible and sustainable format that promotes the sharability of visualizations.
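VisCode and Chartem rely on considerably more sophisticated, learned encodings; the sketch below only illustrates the general embed-programs-into-graphics idea using classic least-significant-bit (LSB) steganography on a made-up grayscale image.

```python
import numpy as np

def embed_lsb(pixels: np.ndarray, payload: bytes) -> np.ndarray:
    """Hide the payload in the least significant bit of each pixel byte."""
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    out = pixels.flatten().copy()
    assert bits.size <= out.size, "image too small for payload"
    out[: bits.size] = (out[: bits.size] & 0xFE) | bits
    return out.reshape(pixels.shape)

def extract_lsb(pixels: np.ndarray, n_bytes: int) -> bytes:
    bits = pixels.flatten()[: n_bytes * 8] & 1
    return np.packbits(bits).tobytes()

# Hypothetical 16x16 grayscale chart image and a tiny spec payload.
image = np.random.randint(0, 256, (16, 16), dtype=np.uint8)
stego = embed_lsb(image, b'{"mark":"bar"}')
print(extract_lsb(stego, 14))  # b'{"mark":"bar"}'
```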
Now that we have considered the different content formats of raw visualization data, the next challenge is its representation in AI approaches. Firstly, raw visualization data in the format of images or programs might not explicitly represent the semantic information needed (e.g., chart type). Thus, it is usually helpful to store and operate on visualization data in internal representation formats that facilitate processing. Secondly, raw data needs to be converted into feature representations to enable machine learning techniques. Thus, we discuss feature engineering and feature learning approaches used to extract visualization features.
TABLE 1. Summary of features of visualizations for machine learning tasks

Feature Engineering:
- Graphics: general image descriptors [11], [47], [54], [66]; element positions or regions [44], [46], [53], [54], [67]; element styles [46]
- Program: parameters [1], [2]; communicative signals [68]; design rules [30], [36]
- Text: statistical models [3], [66], [69], [70]
- Underlying Data: statistics [31], [32], [64], [71], [72]; one-hot vectors [73]

Feature Learning:
- Graphics: convolutional neural networks [12], [13], [40], [44], [50], [53], [69], [74]–[87]; autoencoders [6], [77]
- Program: autoencoders [9], [88]
- Text: embedding models [3], [73]
- Underlying Data: autoencoders [89]
Visualization programs tend to contain extraneous details (e.g., visual styles) or miss semantics (e.g., chart type) that might not meet the particular needs of research. Thus, systems usually express and operate on visualizations in simpler or more structured formats by removing unwanted or unnecessary specifications and adding customized information, which we refer to as internal representations. However, we find that most surveyed papers do not explicitly discuss the data structures of their internal representations. Nevertheless, we note three common formal internal representation formats that are designed towards the high-level goal of facilitating computation (Figure 5).

Draco [30] (Figure 5A) uses Answer Set Programming to express visualization specifications as logical facts. This falls into logic programming [90], which expresses problem domains as facts or rules and benefits logical computation. For instance, the logical facts in Draco can be used to check whether a specification satisfies compound rules encoding visualization design knowledge.

Literature from data mining and databases [31], [34], [35], [51], [70], [72], [91]–[94] usually uses relational programming to express visualizations as queries over databases (Figure 5B). Those relational queries facilitate operations on collections of visualizations such as composing, filtering, comparing, and sorting. This programming paradigm is also adopted by the CompassQL language in Voyager [95]–[97].

Finally, Wang et al. [38] proposed a set-theoretic programming representation that describes a visualization as a set of visual elements (Figure 5C). This representation facilitates set computation, e.g., determining whether a visualization is a superset of another.

Fig. 5. Internal representations of visualizations: (A) logic programming maps specifications into logical facts for computation [30]; (B) relational programming expresses visualizations in a query language for relational operations such as selection [31]; (C) set-theoretic programming facilitates set operations [38].
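To make the set-theoretic idea concrete, here is a minimal sketch in which a visualization is modeled as a set of facts; the tuple shape (mark, channel, field) is an illustrative simplification, not the exact representation of [38].

```python
# Model each visualization as a set of (mark, channel, field) facts.
scatter = frozenset({
    ("point", "x", "horsepower"),
    ("point", "y", "mpg"),
})
colored_scatter = scatter | {("point", "color", "origin")}

# Set operations now answer structural questions directly.
print(colored_scatter >= scatter)  # superset test -> True
print(colored_scatter - scatter)   # the encoding that was added
```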
In this section, we discuss the features of visualizations, which are the measurable properties serving as input to machine learning models. Features are extracted by feature engineering or feature learning [98]. Feature engineering is the process of using domain knowledge to extract features from raw data, while feature learning replaces this manual process with automated approaches that discover useful representations. For our discussion, we classify existing approaches according to the feature space, including graphics, program, text, and underlying data (Table 1). It should be noted that some papers use multiple features for different tasks [44], [53] or use hybrids via feature fusion to improve performance [54], [66], [69]. In the following text, we describe each category in detail.

Graphical features are the most common features of visualizations. The overarching goal is predictive tasks, e.g., predicting the chart type or the "goodness" of a visualization, or detection tasks. For those purposes, early work uses general image descriptors such as bag-of-keypoints [66] or patch descriptors [54]. These image descriptors are designed for general visual content, containing only low-level information such as shapes and regions. To raise the level of abstraction, researchers have proposed special domain descriptors that capture visualization-specific information and in many cases outperform the general descriptors (e.g., [53]). Examples include the regions of text elements (e.g., titles and labels) [54], positions of text and mark elements [44], [46], [53], the relative positions between text and marks [67], and the visual styles of elements [46]. Despite promising results, such a feature engineering process remains labour-intensive and requires expertise. Besides, it remains unclear whether human-crafted features are informative and discriminating enough for accomplishing machine learning tasks. Therefore, recent work has predominantly adopted automated feature learning by leveraging deep learning models. In particular, convolutional neural networks (CNNs) have been widely used to automatically learn spatial hierarchies of image features and have been shown to outperform early approaches [12], [13], [40], [44], [50], [53], [69], [74]–[76], [78]–[87]. VisCode [6] recently uses an autoencoder to learn an effective representation that can conceal additional information. Fu et al. [77] used the latent vectors of an autoencoder model to predict an assessment score for visualization images. However, these deep learning models currently face the same challenge as early general image descriptors in that they might not capture visualization-specific information. This limits their capability in high-level tasks such as automatic assessment [77] and visual question answering [12], where performance remains relatively unsatisfactory.
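Below is a minimal sketch of the feature-learning route: a small convolutional network that maps a chart bitmap to a learned feature vector and a chart-type prediction. The architecture, input size, and four-class label set are illustrative assumptions, not a model from any surveyed paper.

```python
import torch
import torch.nn as nn

class ChartClassifier(nn.Module):
    """Tiny CNN: learns features from chart bitmaps, predicts chart type."""
    def __init__(self, n_types: int = 4):  # e.g., bar, line, pie, scatter
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # 32-dim learned feature vector
        )
        self.head = nn.Linear(32, n_types)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))

model = ChartClassifier()
logits = model(torch.rand(1, 3, 128, 128))  # one dummy 128x128 RGB chart
print(logits.shape)  # torch.Size([1, 4])
```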
Program features are extracted from programs such as specifications. Probably the most straightforward representation is the parameters. For instance, MobileVisFixer [2] and Retrieve-then-Adapt [1] train models that learn to operate on chart parameters, e.g., positioning an element. Burns et al. [68] extracted communicative signals (such as whether a group of bars is colored differently from the other bars) that they fed into a Bayesian model. Draco [30] contains constraints over facts that encode visualization design knowledge. These constraints describe whether a visualization conforms to best practices of effective visual design. However, little work uses programs as training input, which might be due to several reasons, including the lack of training data and the overhead of reverse engineering to extract programs from visualization graphics. Still, programs are a promising representation as they contain high-level, visualization-specific information. This is exemplified by the recent work ChartSeer [9], where the Vega-Lite specifications of charts are converted into visualization embeddings by autoencoders. The resulting embeddings are used to measure similarities between charts to assist in analyzing chart ensembles, and were proven effective in controlled user studies. These results suggest that program features are promising for semantically characterizing charts.
Text features refer to the text content in visualizations such as titles. They are considered to improve feature informativeness by incorporating semantic information. For instance, two systems describe text information with statistical models, i.e., bag-of-words [66], [69], and combine the resulting text features with graphical features to improve performance on chart detection and classification tasks. Moreover, text features can capture the subject matter of visualizations. VizCommender [3] is a recent content-based recommendation system built on machine learning models for predicting the semantic similarity between two visualization repositories, which shows high agreement with a human majority vote. Chart Constellations [73] uses word embeddings to measure the similarity between charts. However, it remains unclear how text features could be effectively fused with graphical features to more comprehensively represent a visualization, e.g., how to balance text-based and style-based similarities.

The last category of visualization features lies in the underlying data that the visualization encodes. Chart Constellations [73] describes the encoded data columns with a one-hot vector encoding, which is then used to compute chart-wise similarities. Besides such descriptive purposes (i.e., describing a visualization), data features are found to be mainly predictive (i.e., predicting the visual encoding). Data2Vis [89] adopts a sequence-to-sequence autoencoder structure that models the input dataset in the JSON format. However, the sequence-to-sequence structure might not capture well the characteristics of the underlying data such as the data type. DeepEye [31], [70], [72] and VizDeck [71] perform feature engineering over data statistics such as the number of unique values in a column (see the sketch after the list below). VizML [32] further extends this approach to 841 features, including single- and pairwise-column features of the input dataset, and significantly outperforms Data2Vis and DeepEye. The analysis in VizML suggests that those features are not independent and some appear to be of little importance. Future research should propose effective feature selection approaches or complementary feature learning methods.

In summary, we find the following open research questions regarding visualization features:
• Learning visualization-specific features. Existing off-the-shelf computer vision or machine learning models were originally designed for general visual content or relational data. As such, they might fall short when applied to visualizations (e.g., visualization assessment [77] and Data2Vis [89]), since they do not capture informative, visualization-specific features. To address this problem, researchers have proposed feature engineering approaches to manually craft features, which, however, is labour-intensive and comes without guarantees of success. With the rapid development of deep learning techniques, we envision deep learning models tailored to visualizations that effectively learn visualization-specific features.
• Fusing multi-modal features. Visualizations are unlikely to be comprehensively represented by only one of the aforementioned features. In several cases, researchers have demonstrated that feature fusion can improve the performance of chart detection and classification [66], [69]. Promising avenues for future work lie in leveraging feature fusion for high-level tasks such as question answering and similarity-based recommendation, which are not well solved by single-modal features.
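As referenced above, here is a minimal sketch of hand-crafted, single-column data statistics in the spirit of DeepEye- and VizML-style feature engineering; the particular features and the toy table are illustrative, not the exact feature sets of those systems.

```python
import pandas as pd

def column_features(s: pd.Series) -> dict:
    """An illustrative subset of single-column statistics used as
    predictive features for visual encoding choices."""
    feats = {
        "n_unique": int(s.nunique()),
        "n_missing": int(s.isna().sum()),
        "is_numeric": bool(pd.api.types.is_numeric_dtype(s)),
    }
    if feats["is_numeric"]:
        feats.update(mean=float(s.mean()), std=float(s.std()))
    return feats

df = pd.DataFrame({"region": ["N", "S", "S", "E"], "sales": [10, 12, 9, 30]})
print({col: column_features(df[col]) for col in df.columns})
```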
6 TASKS: HOW TO APPLY AI TO VISUALIZATION DATA
In this section, we focus on common tasks that researchers apply to visualization data. We organize the observed tasks into seven primary tasks as follows:
• Transformation processes visualization graphics to output corresponding programs or another graphic.
• Assessment measures the absolute or relative quality of a visualization in terms of scores or rankings.
• Comparison estimates the similarity or other metrics between two visualizations.
• Querying refers to the problem of finding the target visualization relevant to a user query within visualization collections.
• Reasoning challenges machines to interpret visualizations to derive high-level information such as insights and summaries.
• Recommendation automates the creation of visualizations by suggesting data and/or visual encodings.
• Mining aims to discover insights from visualization databases.

Most tasks originate from well-known terminology. For instance, transformation, assessment, and visual reasoning are well-studied tasks in the field of computer vision [28], while querying and mining come from database and information systems research. Two exceptions are comparison and recommendation. Although comparison is similar to the image similarity search task in computer vision [28], we find a large body of visualization research studying other metrics (e.g., difference) between two visualizations, and thus chose the term comparison. Recommendation is a widely studied task in the visualization literature (e.g., [21], [95]). In the following text, we describe the problem statement and challenges, summarize the existing techniques, and outline open research questions for each task.
Transformation is the operation that converts between the content formats of visualizations. In particular, it is straightforward to transform visualization programs into graphics with visualization tools or libraries (e.g., [57], [61]). A more challenging problem is the reverse process, i.e., reconstructing programs from graphics. This process is also known as reverse engineering [53]. In the following text, we focus on the reverse engineering problem.
Relations to goals and other tasks.
Transformation is usually the first task for visualization enhancement and analysis, especially when the input is images. As such, it is often a prerequisite for the remaining tasks. For instance, the extracted information can be used for querying [48] and reasoning [99].
Relations to visualization data.
Several works study the problem of transforming a visualization image into another image by altering the data or visual styles [1], [37], [48]. Nevertheless, their approaches are built on reverse engineering, i.e., extracting the encodings first and then replacing the data. Little research has explored direct transformation in the image space [6], [7], [100], [101]. As such, we do not dedicate a separate discussion to image-to-image transformation at the current stage. Future research could extend our taxonomy with further developments in this field.
Visualization reverse engineering has been widely and extensively studied over the past decades. Early research dates back to 2001, when Zhou and Tan [102] proposed a learning-based paradigm for chart recognition. Since then, much research has been devoted to extracting semantic information from visualization images, such as chart types, visual encodings, and underlying data. Ideally, reverse engineering is expected to yield cycle-consistency, that is, its output should be able to re-generate the original visualization. Despite promising preliminary results, several challenges remain to be overcome, since much work makes simplifying assumptions about the expected input or output. For instance, several approaches take vector graphics as input [2], [37], [42], [46], [48], [56], [103], [104], assuming the type of each visual element is available in the SVG. Most work is limited to a predefined set of chart types, while little work [2], [37], [40], [48], [56], [75], [103], [105] applies to more bespoke visualizations. Besides, a large body of research focuses on extracting a particular portion of the information, such as chart classification [40], [47], [69], [75], [78]–[80], object separation [106], [107], and object clustering [2], [42], [103].

We summarize a conceptual framework of the reverse engineering process to provide an overview of existing technical developments and identify research gaps. We developed the framework via a bottom-up approach, where we abstract existing methods, identify their simplifying assumptions, and iteratively merge the results. As shown in Figure 6, we find that reverse engineering can be divided into two distinct phases. The first phase decomposes visualization graphics into semantic elements (e.g., axis, mark) through machine learning and computer vision techniques including object detection, classification, and clustering. The second phase performs mathematical computation over the decomposed semantic elements to extract the visual encoding and/or the underlying data. In the following text, we describe each phase in detail.
Decomposing.
The decomposing phase varies depending onthe input (Figure 6 A ). The primary step for raster graphics is todetect and classify visual elements such as text and shapes. This isapproached by traditional image processing techniques (e.g., edgedetection, morphological operations) in early work [44], [54], [67],[111] and machine learning or deep learning approaches (e.g., MarkRCNN) in work published in 2015 and later [10], [43], [44], [50],[53], [66], [76], [105], [112], [113]. This element recognition stepfaces chart-specific challenges, e.g., to cope with visual clutter in
Guide GuideScaleEncoding EncodingScaleData DataChartMark MarkRaster ImageVector Image Element ChartGuideMarkOthersElement Clustering AB Known Chart Type Unknown Chart TypeElement Detection & ClassificationChart Detection & Classification
Transformation
Fig. 6. The conceptual framework of the reverse engineering process:A It first decomposes input graphics into semantic groups such asguides (axes and legends) and marks; B The resulting information isfed into mathematical computation to extract the visual encoding and/orunderlying data, depending on whether the chart type is known. It remainsan open challenge to derive both visual encoding and data from bespoke,unknown chart types. line charts and scatterplots [106], [107]. In addition, since existingobject detection models are prone to rotation that is common for piesectors, Choi et al. [44] proposed a special heuristic to pie chartsby grouping the nearby pixels with the same color. The outputof this element recognition step is usually the position and classof each visual element, which are already available in the SVGspecifications of vector graphics. In other words, vector graphicsremove the overhead of element recognition.Chart detection and classification are another step of thedecomposing phase. This step faces two important choices: theclassifier and the feature representation. Classical classifiers (e.g.,support vector machine, random forest) [46], [54], [66], [114]have been gradually superseded by deep learning classifiers (e.g.,convolutional neural network (CNN)) [40], [44], [47], [50], [69],[75], [76], [78]–[80] in visualization classification tasks. This ismainly because CNNs can effectively learn abstract features fromraw visualization images, while classical classifiers require hand-crafted image features such as histograms of the image gradients(HOG) [114] and dense sampling [54], [66]. Several approachesseek to improve the representativeness of features by incorporatingelement-level features such as text [54], [69] and shape stylefeatures [46]. The last step of the decomposing phase addresseselement clustering, that is, to cluster visual elements into semanticgroups including guides (axes and legends), marks, and otherinformation such as annotations. This step is typically separatelydiscussed for text and shape elements. On one hand, clusteringtext is usually formalized as a classification problem, that is, toclassify and group text according to their text roles such as x-axis-label and legend-title [10], [43], [44], [50], [53], [54], [66],[67], [113], [115]. On the other hand, this classification-basedapproach is not always readily applicable to shape clustering, sincethe roles of shape depend on the chart type. As such, researchersusually simplify this problem by focusing on common chartswhere shapes are well-defined. For instance, Poco and Heer [53]trained a classifier to detect area, bar, line, and plotting shapes,and consequently grouped shapes of the same type. To supportmore bespoke visualizations, several approaches [2], [37], [48],
NDER REVIEW 9
TABLE 2The assessment task is classified by the output (row) and the method (column)
Rule-based Machine Learning HybridRankings effectiveness rankings [29] learning-to-rank [31], [40], [70], [72]
Scores convert rankings to scores [30], [96], [97]hand-crafted metrics [2], [5], [6], [39], [91], [92], [108]–[110] learning-to-rank with scores [1]predictive regression [64], [71], [77] learning-to-weighthand-crafted metrics [30] [56], [103] use the node hierarchy information to group shapenodes under the same ancestor. Nevertheless, those approachesare only applicable to vector graphics. Finally, the shape clustersare associated with the text clusters to identify axes, legends, andlabel-mark relationships.
Composing.
After the visualization graphic has been decomposed into semantic groups (e.g., guides and marks), the composing phase (Figure 6B) aims to extract the visual encoding and/or the underlying data from this semantic information. Different from the decomposing phase, which uses computer vision and machine learning techniques, the composing phase mainly uses heuristics that leverage domain knowledge about visualizations. We find two common methodological themes among those heuristics, depending on whether the chart type has been extracted in the previous phase.

The first class of heuristics uses information about the chart type and guides to determine the scale and the encoding, and is dominant in our corpus [43], [44], [50], [54], [66], [67], [76], [111]–[113], [115]. For instance, given a scatterplot with axes and legends, it is straightforward to derive the visual encodings, i.e., the x/y positions map to numerical values and the color encodes the categorical data, and to calculate the scale. Consequently, the underlying data can be computed by applying the reverse scale computations over the marks. However, those heuristics are often limited to a small set of chart types.

Another class [37], [48], [56] studies the more challenging problem of decoding bespoke visualizations where the chart type is unknown. However, these works focus on D3 charts whose underlying data is available by crawling the SVG nodes on the web. In this way, they develop heuristics to determine the scale from guides and data, and to derive the encoding from data and marks.
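The reverse scale computation mentioned above can be illustrated with a minimal sketch: given two axis ticks whose pixel positions and labels were already extracted in the decomposing phase, a linear scale can be inverted to recover data values from mark positions. The tick positions and the bar coordinate below are made up, and real charts also require handling log scales, inverted axes, and detection noise.

```python
def invert_linear_axis(pixel: float, tick_pixels, tick_values) -> float:
    """Map a mark's pixel coordinate back to a data value, given two
    axis ticks with known pixel positions and numeric labels."""
    (p0, p1), (v0, v1) = tick_pixels, tick_values
    return v0 + (pixel - p0) * (v1 - v0) / (p1 - p0)

# Hypothetical y-axis: tick "0" at pixel 300, tick "100" at pixel 100
# (data values grow upward while pixel coordinates grow downward).
bar_top = 180  # detected top pixel of a bar mark
print(invert_linear_axis(bar_top, (300, 100), (0, 100)))  # -> 60.0
```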
In conclusion, it remains an open challenge to derive both the visual encoding and the underlying data from bespoke visualization graphics whose chart types are not limited to common ones. An interesting future research direction would be to improve the current heuristics for determining visual encodings from bespoke visualizations, e.g., with machine-learning approaches.

Another primary challenge of reverse engineering lies in robustness and accuracy. As discussed above, the pipeline of reverse engineering usually consists of multiple sequentially dependent tasks that are prone to single points of failure, that is, the failure of one task spreads to the whole system. For instance, researchers have reported common failure cases such as text detection (e.g., [43], [53]), which impede the extraction of guides and consequently the visual encoding.

This motivates the use of semi-automatic approaches that address imperfect algorithms with human intervention [41], [116], [117]. For example, ChartSense [117] allows users to adjust incorrectly recognized data shapes. Nevertheless, its framework for incorporating automatic algorithms with human intervention is specific to chart types and therefore not readily applicable to more general situations. A related research direction is to investigate a general framework for bespoke visualizations, as the reverse of the process for constructing bespoke charts [118].
There is a long history of research on teaching machines to assess and rank the quality of data visualizations. As shown in Table 2, assessment outputs a numerical score of the visualization quality or measures the relative quality in terms of rankings.

TABLE 2. The assessment task is classified by the output (row) and the method (column)

Rankings:
- Rule-based: effectiveness rankings [29]
- Machine Learning: learning-to-rank [31], [40], [70], [72]

Scores:
- Rule-based: convert rankings to scores [30], [96], [97]; hand-crafted metrics [2], [5], [6], [39], [91], [92], [108]–[110]
- Machine Learning: learning-to-rank with scores [1]; predictive regression [64], [71], [77]
- Hybrid: learning-to-weight hand-crafted metrics [30]
Fig. 7. Input-output model of assessment.
Relations to goals and other tasks.
The key motivation of assessment is to improve visualization design, e.g., to derive scoring metrics that can be used as cost functions for automatic generation. That said, assessment is often combined with recommendation.
Relations to visualization data.
Most surveyed techniques take visualization programs as input, focusing on the visual encoding and data quality. Nevertheless, Fu et al. [77] propose an approach for assessing visualization images.
Assessment is challenging due to the human-centred nature of visualizations, which requires large-scale empirical experiments to understand what makes a visualization "good". However, the knowledge derived from large-scale empirical experiments is often represented as design guidelines instead of quantifiable rules. As such, much research aims to quantify knowledge about "good" visualizations. In 1986, Mackinlay [29] developed the APT system, which ranked effectiveness according to the accuracy rankings of quantitative perceptual tasks for different visual encoding channels. However, this ranking-based approach only reflects the relative quality of visualizations.
Scoring-based approaches are often more desirable, since scores measure the absolute quality and therefore benefit downstream tasks, e.g., scores can be used as the cost function for optimization. To that end, Voyager [96], [97] and Draco [30] map different single-criterion rankings to numerical scores. Besides, researchers often leverage domain knowledge to design hand-crafted, rule-based metrics that measure visualization quality, such as informativeness [39], interestingness [91], [92], [108], accuracy [91], [92], significance [109], saliency [110], visual importance [6], complexity [5], and mobile-friendliness [2]. However, designing hand-crafted metrics usually requires considerable effort. More critically, the design process is often unsystematic and lacks a strong methodological base. For example, Wu et al. [2] demonstrated that even seemingly reasonable metrics do not always survive experimental scrutiny. Besides, questions arise about how to weight scores to reflect the overall, multi-criteria quality. Several systems [5], [39], [91], [92], [96], [97] determine the weights for each score through manual refinements, which can become unsystematic.

As such, another line of research seeks to propose more systematic machine learning approaches that learn to rank and/or score visualizations from data collected in empirical studies. VizByWiki [40] and DeepEye [31], [70], [72] formulate a learning-to-rank problem, learning to rank visualizations from crowdsourced data. Retrieve-then-Adapt [1] extends the learning-to-rank model to simultaneously output paired scores. VizDeck [71] learns a linear scoring function from users' up- and downvotes. VizNet [64] demonstrates the feasibility of training a machine-learning model to predict the effectiveness of visual encodings from crowdsourced data.

Nevertheless, those machine-learning approaches face two major challenges: poor generalisability and poor explainability. First, the aforementioned models are trained over statistical or visual encoding properties, assuming the underlying dataset and specification are available. More importantly, they only support a limited number of chart types. To that end, Fu et al. [77] propose a more general approach that assesses the quality of visualization images, generalizes to different visualization types, and requires no additional information beyond the visualization images. Second, machine-learning models lack explainability, which might decrease trust. Moreover, they often exclude knowledge derived from empirical studies. To address these limitations, Draco [30] takes a hybrid perspective by encoding design knowledge as constraints and learning a weighting function to trade off those constraints, whereby outputting a final score. Besides automatic chart design, Draco can be used as a "visualization spell checker" that explains violations of design guidelines and why they matter.
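Below is a minimal sketch of the hybrid scoring idea: design knowledge enters as soft constraints, and learned weights trade off their violations into a single cost. The constraint names, weights, and candidates are illustrative; they are not Draco's actual constraint set or learned weights.

```python
# Hybrid assessment sketch: rule-based soft constraints + learned weights.
WEIGHTS = {"prefer_x_nominal": 0.3, "avoid_dual_axes": 1.2, "overplotting": 0.8}

def cost(violations: dict) -> float:
    """Weighted sum of soft-constraint violation counts; lower is better."""
    return sum(WEIGHTS[name] * count for name, count in violations.items())

candidates = [
    {"prefer_x_nominal": 0, "avoid_dual_axes": 1, "overplotting": 0},
    {"prefer_x_nominal": 1, "avoid_dual_axes": 0, "overplotting": 0},
]
best = min(candidates, key=cost)  # pick the lowest-cost design
print(best, cost(best))
```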
Existing approaches predominantly focus on objective qualities that can be measured via user studies (e.g., task performance and completion time). However, subjective metrics such as aesthetics remain relatively underexplored, despite being considered important features of good visualizations [119]. Harvesting crowdsourced data on the subjective quality of visualizations is difficult because crowdsourced judgments can be inconsistent and inaccurate. This underscores the need for methods that generate large-scale training datasets for visualization research in a reliable and sustainable manner. One promising way is to incorporate expert knowledge and crowdsourcing experiments in dataset generation. Going forward, we envision machine-learning approaches that not only assess the visualization but also provide insightful explanations. In this way, the approaches embrace explainability by translating ML models into human-readable explanations and even useful design guidelines.
Characterizing the similarity or other metrics between two visualizations is helpful when dealing with a visualization collection.
Fig. 8. Input-output model of comparison (input: two visualizations; output: a numerical value).
Relations to goals and other tasks.
Comparison is found to assist in visualization generation and analysis. Comparison metrics can be used as cost functions in 1) recommendation, to perform anchor-based visualization generation (e.g., [34]), and 2) querying, to perform “query-by-example” (e.g., [8]). Besides, assessment metrics can be used to compute the difference.
Relations to visualization data.
Comparison is studied on both visualization programs and graphics, as shown in Table 3.
TABLE 3
Comparison is classified by the method (row) and the input (column)

             Program            Text        Data        Graphics
Distance     [9], [35], [70]    [8], [45]   [3], [73]   [9], [34], [35], [51], [70], [120], [121]
Difference   [36], [73], [122]
Perhaps the most straightforward approach for comparing two visualizations is to calculate the difference. GraphScape [122] is a directed graph model where each link represents an edit operation (e.g., add field) and nodes denote the resulting visualizations. Each edit operation is registered with a cost, which is learned from human judgments. Dziban [36] further translates the graph model into a set of constraints and weights, similar to Draco [30]. In this way, both approaches explicitly model the difference between two charts as an operation associated with a numerical cost. Nevertheless, this difference-based approach becomes sophisticated when there exist multiple, often under-specified, operations between two charts, where graph traversal is essential for searching and weighting all possible paths. This is further complicated by the limitation that GraphScape only includes operations regarding data transformation and visual encodings. Although it is methodologically feasible to extend GraphScape to support other operations such as recoloring, such extensions are labor-intensive, without a guarantee of exhaustiveness.

Partly due to the above challenges of difference-based methods, most research adopts distance-based measurements [3], [8], [9], [34], [35], [45], [51], [70], [73], [120], [121]. The key idea is to convert a visualization into a feature vector and compute the distance between two feature vectors according to a distance function. Thus, the technical challenges of distance-based measurements are two-fold: the choice of features and the distance function. The features of visualizations vary among the underlying sources, as discussed in subsubsection 5.2.2. In the context of comparison, we identify four primary sources: graphics, text, data, and specifications. For instance, Saleh et al. [45] extract low-level visual features from graphics to learn style similarity, and Chen et al. [8] model the configuration pattern of multiple-view visualization systems as a 1 × 126 vector measuring the layout. Regarding text features, VizCommender [3] uses both hand-crafted features (e.g., TF-IDF) and learned features (e.g., Doc2Vec). More work uses hand-crafted features for the data [34], [51], [120], [121]. Finally, ChartSeer [9] uses deep learning approaches to convert Vega-Lite specifications into embeddings.

The derived features are subsequently fed into a distance function to derive the distance. Examples of common distance functions include mutual information [8], [120], Earth Mover's Distance [34], [51], the Bhattacharyya coefficient [121], and the Jaccard coefficient [35]. Nevertheless, the process of selecting distance functions is hardly detailed in the literature, leaving rationales and insights unexplored. This is worsened by a potential downside of distance-based measurements: the feature representation is less interpretable than the operations in difference-based measurements. Thus, it is often difficult to interpret the results, and one user study suggests that the similarity measurement “does not fully understand their (users') intent” [9].

An important question that then arises is how to measure the overall similarity when combining multiple sources. Naive feature concatenation is a natural way to combine features from different sources [35], [51]. For instance, Luo et al. [70] propose a feature vector concatenated from five aspects, each represented by a one-hot vector describing visualization types, x-axis, y-axis, group/bin operations, and aggregation functions. Another method is to compute an aggregated distance by weighting hybrid distances, e.g., the chart encoding distance, keyword tagging distance, and dimensional interaction distance of Xu et al. [73]. Nevertheless, both feature concatenation and distance aggregation assume a linear relationship among different vectors, which seems far from capturing real-world complexity and thus yields limited performance as perceived by users.
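As an illustration of distance aggregation, the following sketch combines per-source distances with a linear weighting, loosely following the hybrid-distance idea of Xu et al. [73]; the feature extractors, field names, and weights are placeholders, not any cited system's implementation.

```python
import numpy as np

def jaccard_distance(a, b):
    # Set-based distance for categorical features (encodings, keywords).
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b) if (a | b) else 0.0

def euclidean(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def chart_distance(c1, c2, weights=(0.5, 0.3, 0.2)):
    """Linearly aggregate per-source distances between two charts."""
    d_encoding = jaccard_distance(c1["encodings"], c2["encodings"])
    d_text = jaccard_distance(c1["keywords"], c2["keywords"])
    d_data = euclidean(c1["data_stats"], c2["data_stats"])
    return sum(w * d for w, d in zip(weights, (d_encoding, d_text, d_data)))

c1 = {"encodings": {"x:temporal", "y:quantitative"}, "keywords": {"sales"}, "data_stats": [0.2, 1.3]}
c2 = {"encodings": {"x:nominal", "y:quantitative"}, "keywords": {"sales", "2020"}, "data_stats": [0.4, 1.1]}
print(chart_distance(c1, c2))
```

The linear combination in chart_distance is exactly the simplification criticized above; learning the weights, or a nonlinear combiner, from user feedback remains the open problem.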
Comparison and assessment are closely related and share the same goal of outputting a numerical score. Comparison has been predominantly rooted in feature engineering and hand-crafted distance functions. The major downside is that it does not actually learn from user feedback and thus usually fails to meet users' intent. Unlike assessment, few machine learning approaches have been applied to comparison. Several approaches (e.g., [73]) use pre-trained ML models to perform feature learning. However, they do not fine-tune the models on user feedback data. Thus, proposing datasets and ML approaches for comparison is a clear step towards improving performance. Nevertheless, it is non-trivial to adapt ML approaches to comparison, since comparison involves two visualizations while standard ML models only take one entity as input. ScatterNet [123] addresses this issue. It is a deep learning model for predicting similarities between scatterplots by learning from crowdsourced human feedback data. Nevertheless, it is unclear how to adapt this approach to other statistical charts. A key challenge is that the CNN models used in ScatterNet are worse at capturing human perception in other charts [77], [81].
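A minimal sketch of the Siamese setup that makes pairwise input tractable is shown below; it is a generic architecture for illustration and does not reproduce ScatterNet's [123] actual network or training data.

```python
import torch
import torch.nn as nn

class SiameseChartNet(nn.Module):
    """Shared CNN encoder over two chart images; outputs a pairwise distance."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, img_a, img_b):
        za, zb = self.encoder(img_a), self.encoder(img_b)
        return torch.norm(za - zb, dim=1)   # larger distance = less similar

model = SiameseChartNet()
dist = model(torch.rand(4, 1, 128, 128), torch.rand(4, 1, 128, 128))
# Train with a contrastive or ranking loss on crowdsourced similarity judgments.
```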
Querying is the task of retrieving relevant visualizations that satisfy the users' needs from a visualization collection. It is a crucial component of Information Retrieval (IR) systems, which are also known as search engines, especially in the context of the web [124]. Querying in this context is distinct from a visualization query language (Figure 5 B): the latter specifies visualizations as a query into a database, while the former describes a query into a visualization collection.
Fig. 9. Input-output model of querying (input: a query and a visualization collection; output: visualizations).
Relations to goals and other tasks.
Querying is mainly for visualization retrieval (Table 4). It is often built upon other tasks like transformation (e.g., [48]) and comparison (e.g., [8]).
Relations to visualization data.
Querying directly in the image space can be difficult since semantic information is lost. As such, it is often performed on visualization programs, where semantic information such as titles and axis labels is available.
There are two main viewpoints that characterize querying: how to specify users' needs and how to return visualizations that match those needs.
TABLE 4
Querying is classified by the method (row) and the input (column)

        Keywords      Natural Language   Structural        Example
Exact                                    [8], [48], [76]
Best    [11], [125]   [126], [127]       [1], [10]         [45]
The simplest form of querying syntax is keywords. Keywords are popular since they are intuitive and easy to express. Choudhury and Giles [11] developed a search engine that allowed users to search figures by keywords in the captions. Similarly, Voder [125] supports keyword-based queries into data fields as well as general words like ‘outlier’ from the data facts associated with a visualization. Li et al. [126], [127] extend keywords to natural language queries by extracting structural keywords from queries and matching the extracted words with text in visualizations. Nevertheless, keyword-based queries often fail to disambiguate unstructured queries since words have multiple meanings.
Structural queries are a mechanism for resolving ambiguity and improving retrieval quality, and have been used in multiple systems [1], [8], [10], [48], [76]. They are built on keywords with the addition of structural constraints. For instance, DiagramFlyer [10] is a search engine where a query contains eight key structural fields (e.g., type, x-label, legend) that can uniquely describe a visualization. Those structural constraints make it possible to search information beyond the text in visualizations. For instance, Retrieve-then-Adapt [1] retrieves infographics based on a query composed of graphical and textual elements. Notably, visualization specifications are a functional candidate for structural queries. Hoque and Agrawala's [48] search engine indexes D3 visualizations as Vega-Lite specifications and supports queries in the Vega-Lite syntax. Accordingly, it lets users find visualizations based on a wide range of constraints such as mark types, encodings, and non-data-encoding attributes. In the best case, where the input query is a complete Vega-Lite specification, their search engine actually supports “query-by-example”. This example-based query is another format that offers an intuitive method for users to specify their intent. Saleh et al. [45] implemented a search engine for stylistic search over infographic corpora by returning stylistically similar images given a query image. However, more sophisticated analysis methods are necessary to capture characteristics beyond stylistic similarities.

Now that we have discussed the query syntax, the next challenge is how to reason about which visualizations are most relevant to the user-input query, which can be classified into exact-match and best-match methods. Exact-match techniques are used for filtering visualizations by strict conditions, e.g., to retrieve visual analytic systems containing four views [8], bar charts [48], or charts describing a dataset [76]. However, exact matching is not always possible, especially when the input conditions are too strict. As such, more systems use best-match approaches [1], [10], [11], [45], [125]–[127] that rank visualizations according to metrics. Those metrics measure the degree to which a visualization is relevant to the input query. They use natural language models for text components (e.g., synonyms [10], [125], text relevance [126], [127]) and similarity metrics [1], [8]. However, relevance or similarity metrics are insufficient for retrieving visualizations that are not only relevant but also of high quality. In response, Retrieve-then-Adapt [1] learns the distribution of visual elements in the corpus in an attempt to emphasize visualizations with common elements, assuming that the more frequent an element is, the better it is.
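The two matching regimes can be summarized in a few lines; the sketch below uses illustrative record fields and a naive term-counting relevance score rather than any cited system's implementation.

```python
def exact_match(corpus, **conditions):
    """Keep only visualizations whose structural fields equal the conditions."""
    return [v for v in corpus
            if all(v.get(field) == value for field, value in conditions.items())]

def best_match(corpus, query_terms, k=10):
    """Rank visualizations by a naive term-counting relevance score."""
    def relevance(v):
        text = " ".join([v.get("title", ""), v.get("caption", "")]).lower()
        return sum(term.lower() in text for term in query_terms)
    return sorted(corpus, key=relevance, reverse=True)[:k]

corpus = [{"type": "bar", "title": "Monthly sales by region"},
          {"type": "line", "title": "Average departure delay"}]
print(exact_match(corpus, type="bar"))
print(best_match(corpus, ["delay"], k=1))
```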
Research on indexing visualizations has been relatively limited in the past decade, leaving room to boost technical development. Due to the diversity of visualizations, even state-of-the-art methods are often restricted to certain types of visualizations, e.g., proportion-related infographics [1] or basic D3 visualizations [48]. How to generalize those approaches to more types of visualizations is a non-trivial issue that requires a deeper understanding of how visualizations should be indexed. For instance, Hoque and Agrawala's approach [48] indexes D3 visualizations in Vega-Lite syntax, which is insufficient for expressing user queries such as “sunburst diagrams”. Indexing becomes more challenging when applied to raster graphics (bitmap images), which deserve research efforts, e.g., reverse engineering. The intention gap is another challenge for querying visualizations. From a theoretical perspective, there are not enough empirical studies to understand user needs in searching for visualizations. Such studies are formative approaches that motivate the design space of visualization indexes. In a related vein, Oppermann et al. [3] recently found that information seeking is a core task when browsing visualization repositories, and that users were more interested in content rather than styles. Similar studies are needed to survey users and inform the design of future search engines for visualization. From a practical perspective, it is crucial to design convenient query interfaces that assist users in specifying their intent. For instance, it is easy for users to specify a region of interest on an example visualization and interactively browse the results. Hoque and Agrawala's query-by-example feature seems a promising start. We also notice a large body of research that studies query-by-pattern or query-by-sketch in time-series visualizations, e.g., [128], [129]. Future work could generalize those query paradigms to all visualization types.
Reasoning challenges machines to “read charts made for humans” [16]. Reasoning requires interpreting visualizations to derive high-level information such as insights, beyond extracting visual encodings and data via reverse engineering. Reasoning is distinct from assessment since reasoning usually outputs semantic information (e.g., insights, text summaries, visual importance maps) rather than a numerical score. As shown in Table 5, we find three common classes depending on the targeted output of the reasoning process: visual perceptual learning, chart summarization, and visual question answering. In the following text, we first describe each class separately, followed by an organized discussion of existing methods and research gaps.
Fig. 10. Input-output model of reasoning (input: a visualization; output: semantic information such as insights, text summaries, or answers).
Relations to goals and other tasks.
Reasoning is mainly for visualization enhancement (e.g., summarizing a chart into a natural language description [88]). It sometimes relies on reverse engineering to improve algorithm performance (e.g., [99]).
Relations to visualization data.
Reasoning is studied on both visualization images and programs.
TABLE 5
Reasoning is classified by the class (row) and the method (column)

                             Rule-based                        Machine Learning
Visual Perceptual Learning   [108]                             [74], [81]
Chart Summarization          [66], [68], [72], [130]–[132]     [13], [83], [88], [133]
Visual Question Answering    [67], [99]                        [12], [82], [84]–[86]
Visual perceptual learning aims to solve visual tasks by analyzing visual information. For instance, Temporal Summary Images [108] automatically extracts points of interest in charts according to predefined heuristics. Recently, deep learning methods, trained on labeled datasets, have been applied to improve machine perception of visualization images. Bylinskii et al. [74] presented neural network models to predict the human-perceived visual importance of visualization images. Similarly, Haehn et al. [81] evaluated the performance of CNNs on Cleveland and McGill's 1984 perception experiments [134] and concluded that off-the-shelf CNNs were not currently a good model for human graphical perception. Their initial results underscore the importance of continued research to improve performance.
Chart summarization becomes increasingly important with the rapid popularization of visualizations. Most existing approaches generate text summaries such as natural language descriptions or captions [66], [68], [72], [88], [130]–[133]. The simplest approach is to provide a short description of how to interpret the chart [72], [130], e.g., “This chart shows the trend of average departure delay in January”. More advanced approaches focus on explaining and communicating high-level insights conveyed by charts. Those approaches extract data patterns and subsequently convert the patterns to natural language according to pre-defined templates [66], [68], [131], [132], e.g., “X was the most/least frequent sub-category in A”. Liu et al. [133] offered an alternative perspective that learns the most noteworthy insights with deep learning approaches. A common limitation of the above work is that natural language summaries are generated via pre-defined templates and are therefore confined to few variations and limited generality. Thus, recent research proposes several end-to-end deep-learning solutions for generating chart captions [13], [83] or text summaries [88]. However, summarization remains highly under-explored since algorithm performance still has much room for improvement.
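A minimal sketch of the template-based pipeline, with illustrative pattern detectors and sentence templates, looks as follows.

```python
# Sketch of template-based chart summarization: detect simple data patterns,
# then fill pre-defined sentence templates, in the spirit of [66], [68],
# [131], [132]. Both the templates and detectors are illustrative.
TEMPLATES = {
    "extreme": "{x} had the highest {measure} ({value}).",
    "trend": "The {measure} shows an overall {direction} trend.",
}

def summarize(categories, values, measure):
    """Extract simple data patterns and verbalize them via templates."""
    top = max(zip(categories, values), key=lambda cv: cv[1])
    direction = "upward" if values[-1] > values[0] else "downward"
    return " ".join([
        TEMPLATES["extreme"].format(x=top[0], measure=measure, value=top[1]),
        TEMPLATES["trend"].format(measure=measure, direction=direction),
    ])

print(summarize(["Jan", "Feb", "Mar"], [4, 9, 6], "average delay"))
# -> Feb had the highest average delay (9). The average delay shows an
#    overall upward trend.
```

The limitation noted above is visible here: the output can only vary within whatever the template set anticipates.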
Visual question answering is another emerging research area that aims to answer a natural language question given a visualization image. Traditional methods [67] first decode visualizations into data tables and then parse template-based questions into queries over the data table to generate answers. Kim et al. [99] recently improved a natural language parser to support free-form, crowdsourced questions. Another line of research studies end-to-end deep learning approaches [12], [82], [84]–[86]. The key challenge is that answering questions about visualizations requires high-level reasoning of which existing visual question answering models are not capable [12], [84]. Besides, visualization images are sensitive to small local changes, i.e., shuffling the colors in a legend greatly alters a chart's information. Therefore, the major problem is how to learn features from visualization images and fuse them with features from natural language questions. DVQA [12] learns and fuses features via a sophisticated model containing multiple sub-networks, each responsible for a different component such as spatial attention. Later work expands this work with improvements
to the models; e.g., PlotQA [85] and LEAF-QA [82] explicitly apply reverse engineering to retrieve visual elements and feed the extracted information into sub-networks. Despite these advancements, there remains room for future work since datasets, models, and evaluation metrics are all lacking.
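The traditional “decode then query” pipeline [67], [99] can be sketched as below; the reverse-engineering step is assumed to have already produced the data table, and question parsing is reduced to illustrative keyword rules.

```python
def answer(question, table):
    """Answer a templated question over a data table decoded from a chart."""
    q = question.lower()
    if "highest" in q or "most" in q:
        return max(table, key=table.get)
    if "lowest" in q or "least" in q:
        return min(table, key=table.get)
    for category in table:              # "what is the value of <category>?"
        if category.lower() in q:
            return table[category]
    return None

decoded = {"USA": 38, "China": 32, "Japan": 27}  # assumed reverse-engineering output
print(answer("Which country has the highest value?", decoded))  # USA
```

End-to-end models such as DVQA [12] replace both stubbed steps with learned sub-networks, which is where the feature-fusion challenge discussed above arises.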
The reasoning task is currently undergoing changes as machine learning approaches, particularly deep learning models, are increasingly used. This change may be attributed to the rapid advancement and successful application of deep learning in visual reasoning. In the visualization context, research gaps emerge since off-the-shelf models for natural images are often shown to yield unsatisfactory performance on visualizations (e.g., [12], [81]). This gap is not surprising since visualizations contain relational information that is sensitive to small details not commonly present in natural images, i.e., a local change to a bar shape might significantly impact the encoded data and conveyed meanings. This research area remains largely under-explored. First, limited datasets are available, which hinders model development and validation. For instance, most existing datasets for visual question answering contain synthetic questions and charts, which are far from representative of actual use. Second, it would be pertinent to study feature learning models that are tailored to visualization images. The recent trend of decomposing end-to-end models into structures containing multiple sub-networks might be a promising direction (e.g., [85]).
Recommendation is an important step for automating the creation of visualizations. As shown in Table 6, there are three methods for recommending visualizations [95]:
• Data recommendation suggests interesting data, insights, or data transformations to be visualized from a database.
• Encoding recommendation determines the visual encoding (including both data and non-data encodings) given the data or other visualization elements.
• Hybrid recommendation decides both data and encodings.
Fig. 11. Input-output model of recommendation (input: data to be encoded; output: a visualization).
Relations to goals and other tasks.
Recommendation is mainly for visualization generation. It is related to assessment and comparison since the derived metrics are used as cost functions.
Relations to visualization data.
Recommendation outputs visualization programs and subsequently visualization graphics.
Data Recommendation.
Given a dataset, one step of recommending visualizations is to select the data fields to be visualized and, when applicable, corresponding data transformations as well. The simplest approach to deciding fields is enumeration. For instance, Voyager [96], [97] enumerates all possible fields according to a predefined display order by type and name. DataSite [131] improves this enumeration approach by computing and communicating data facts associated with the selected fields according to pre-defined templates (e.g., “Correlation of A was found between X and Y” if the selected data fields are X and Y). However, enumeration imposes a heavy burden on users, which motivates other research to recommend the most useful data facts, also known as insights.

TABLE 6
Recommendation is classified by the class (row) and the method (column)

           Optimization                                                    Prediction
Data       [4], [33]–[35], [51], [120], [125], [135], [136]
Encoding   [1], [2], [4], [5], [29], [30], [36], [39], [49], [75],         [32], [33], [89]
           [96], [108], [120], [125], [131], [137]–[140]
Hybrid     [4], [31], [33], [70], [72], [95], [97], [120], [131],
           [138], [141]

Recommending insights is often approached by proposing a taxonomy of insight types (e.g., extrema), with each type associated with an assessment metric. Examples are Foresight [135], Profiler [120], Voder [125], DataShot [33], VisClean [51], and Calliope [4]. One key challenge that then arises is the assessment metric. Voder [125] introduces threshold-based heuristics that classify data facts into different tiers, while the remaining systems propose various cost functions that better capture the differences between data facts. Remarkably, QuickInsights [136] proposes a unified formulation of insights and scoring metrics irrespective of the type. In the context of anchor-based generation, where the goal is to recommend data that meets some criteria with respect to anchor data, the aforementioned assessment metrics are augmented or replaced by comparison metrics in VisClean [51], DiVE [35], and SeeDB [34]. For instance, SeeDB [34] recommends data by deviation from an anchor visualization.

Given the above assessment or comparison metrics, the next challenge is to compute the best or top-k insights (see the sketch after this discussion). This is challenging since the space of data facts grows exponentially with the number of data table columns. As such, researchers have proposed multiple strategies to speed up the computation. For instance, Foresight [135] uses sketching to quickly approximate the costs. Other systems introduce efficient search algorithms to recommend the top-k insights. Those search algorithms are primarily progressive or iterative [4], [33]–[35], [51], outputting intermediate solutions that approximate the optimal one. That said, this data recommendation problem has not yet been formulated as a prediction problem solved by machine-learning models, probably due to the lack of labeled training data. Notably, two approaches explicitly adopt tree-based algorithms [4], [51], leveraging the idea of GraphScape [122] that the visualization design space can be modeled as a graph for greedy or dynamic programming.

Several systems recommend data according to input beyond a database. In particular, those inputs are related to natural language, which is beyond the core scope of this survey. Examples include natural language statements in Text-to-Viz [39], news articles in VizByWiki [40], keyword queries [72], and natural language interfaces (e.g., NL4DV [139] and FlowSense [142]).
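The enumerate-score-select loop underlying top-k insight recommendation can be sketched as follows; the single fact type (pairwise correlation) and the scoring metric are illustrative simplifications of systems such as QuickInsights [136], which score many fact types and aggressively prune the search.

```python
import heapq
from itertools import combinations
from statistics import correlation  # Python 3.10+

def top_k_insights(table, k=3):
    """Enumerate candidate facts, score them, and keep the k best."""
    heap = []
    for a, b in combinations(table.keys(), 2):
        score = abs(correlation(table[a], table[b]))
        heapq.heappush(heap, (score, f"{a} correlates with {b}"))
        if len(heap) > k:
            heapq.heappop(heap)          # discard the weakest candidate
    return sorted(heap, reverse=True)

table = {"price": [1, 2, 3, 4], "sales": [9, 7, 4, 2], "ads": [5, 5, 6, 5]}
print(top_k_insights(table, k=2))
```

Even this toy version makes the scalability issue visible: the candidate set grows combinatorially with the number of columns and fact types, which is what motivates the sketching and progressive search strategies cited above.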
Encoding Recommendation decides data encodings and/or non-data encodings for styling (e.g., positions). Data encodings are extensively studied in the literature. Early approaches date back to 1986, when the APT system [29] enumerated the visual encoding space and selected the “best” encodings according to assessment rankings in terms of expressiveness and effectiveness. This ranking-based recommendation is implemented and extended in later systems such as ShowMe [137] and Voyager [96]. In addition to ranking, several systems propose heuristic rules to decide visual encodings given the insights extracted from data recommendation or visual tasks; e.g., extreme insights or the task of finding extremes are mapped to histograms or scatterplots. Examples include DataVizard [138], DataSite [131], Profiler [120], NL4DV [139], Calliope [4], VizAssist [140], and Voder [125].

The above heuristic-based data-encoding recommenders have recently been superseded by machine learning approaches due to the increasing availability of datasets. Draco [30] and Dziban [36] learn to assess visualizations by weighting design rules. Subsequently, they formulate a constraint optimization problem to recommend the best or top-k visualizations. As discussed, such approaches combining assessment and optimization are commonly adopted for recommending insights. Other ML approaches seek to directly learn the mappings between data and visual encodings by training an end-to-end model, including Data2Vis [89], VizML [32], and DataShot [33].

In addition to data encodings, other approaches study how to recommend non-data-encoding attributes such as layouts and colors. Most approaches formulate optimization problems where the primary goal is to define the optimization target, that is, the assessment metrics. Several metrics are human-crafted cost functions [2], [5], [39], [75], [108], while other metrics are data-driven, including machine learning models trained on human assessment datasets [1] and distances to common patterns mined from a corpus [49]. Several systems contribute novel optimization algorithms to improve efficiency, including reinforcement learning [2] and Markov chain Monte Carlo methods [1]. Similar to recommending top-k insights, both algorithms are progressive.
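A minimal sketch of the heuristic rule tables used by such recommenders is given below; the rules are illustrative, not those of any cited system.

```python
def recommend_encoding(x_type, y_type):
    """Map a pair of field types to a chart form via hand-written rules."""
    rules = {
        ("quantitative", "quantitative"): ("scatterplot", {"x": "position", "y": "position"}),
        ("nominal", "quantitative"): ("bar chart", {"x": "category", "y": "length"}),
        ("temporal", "quantitative"): ("line chart", {"x": "time", "y": "position"}),
    }
    return rules.get((x_type, y_type), ("table", {}))

print(recommend_encoding("temporal", "quantitative"))  # ('line chart', ...)
```

ML-based recommenders such as VizML [32] effectively learn this lookup table from corpora of real (data, encoding) pairs instead of writing it by hand.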
Hybrid Recommendation decides both data and encodings. A straightforward approach for hybrid recommendation is to combine data and encoding recommendation sequentially. This approach is widely implemented in visualization recommenders including DataVizard [138], DataSite [131], Profiler [120], Calliope [4], Voder [141], and DataShot [33]. Other approaches take an end-to-end perspective, formulating the recommendation task as an optimization problem. Examples are Voyager [95], [97] and DeepEye and its extensions [31], [70], [72]. Thus, the core problem is to provide an overall assessment score regarding both the data and the encodings. The first step is to translate a visualization into a formal representation, i.e., a visualization query language (VQL). Assessment metrics are then proposed to evaluate the quality of a VQL representation. Finally, Voyager ranks the recommended visualizations, while DeepEye proposes efficient algorithms to generate the top-k visualizations.
An ongoing discussion in recommending visualizations concerns the problem formulation: optimization versus prediction. Optimization requires the careful design of optimization functions, which are often the assessment scores. Another important concern is efficient algorithms for solving the complex, sometimes multi-objective, optimization problem arising from the huge design space of visualizations. Generally speaking, optimization-based approaches have the potential to be extended to human-in-the-loop approaches, since the predicted assessment scores help users determine visualization quality. Nevertheless, hand-crafted cost functions for assessment are often insufficient. On the other hand, machine learning assessment requires massive training data labeled with human assessments and is therefore expensive. In contrast, prediction-based approaches such as VizML [32] only demand training data describing the dataset and visualizations, without the need for human labelling. This reduces the overhead of constructing datasets and thus helps boost performance. The above discussion opens up many interesting general questions: How can prediction-based ML models be interpreted to understand the reasoning process and potentially derive assessment scores? How can the costs of collecting training datasets be reduced? There exist other research gaps with respect to recommendation. For instance, few ML approaches have yet been applied to recommend insights and non-data encodings. Is it beneficial to collect training data for those tasks, such as predicting important data fields given a data table? Besides, many existing approaches only recommend top-k, separated visualizations given a large data table, while it is unlikely that “top-k visualizations fit all”. Future research should study how to recommend dashboards, and even visual analytics systems, for more comprehensive and intelligent data analysis. Generating a coherent data story comprising multiple data facts is another interesting question. Finally, existing systems are confined to well-known charts. Recent research in computer vision has proposed generative models for synthesizing images; an interesting question is whether generative models can be applied to recommend synthetic, novel visualizations.
Mining is an emerging task motivated by the rapid popularization and accumulation of visualization data online (Table 7). Generally, there are two kinds of mining tasks, i.e., mining design patterns and mining data patterns, which are discussed in the following text.
Fig. 12. Input-output model of mining (input: a visualization collection; output: patterns).
Relations to goals and other tasks.
The goal of mining is visualization analysis, i.e., to discover useful information or patterns from a visualization collection. Reverse engineering is often a prerequisite for mining to obtain semantic information.
Relations to visualization data.
Mining is studied on both visualization graphics and programs.
Mining Design Patterns.
The concept of design mining refers to leveraging data mining techniques to derive design principles from existing artifacts. Smart et al. [49] formally introduced design mining in visualizations and proposed an unsupervised clustering technique to derive common color ramps. However, design mining has been implicitly practiced in several earlier systems, albeit for different goals. Choudhury and Giles [11] investigated the design patterns (e.g., colored or not) of 300 line graphs sampled from top computer science conferences. Viziometrics [47] computes the average number of visualizations in academic papers from different research domains. In the visualization community, Beagle [46] automatically crawled SVG-based visualizations online to investigate how popular, e.g., line and bar charts are on the web. Hoque and Agrawala [48] performed a design demographics analysis of 7,860 D3 visualizations to identify common patterns, such as how frequently circles are used. More recently, Chen et al. [8] studied the composition and configuration patterns in multiple-view visualizations collected from visualization papers.

TABLE 7
Mining is classified by the class (row) and the method (column)

                 Statistics             Clustering   Visual Analysis
Design Pattern   [8], [11], [46]–[48]   [49]
Data Pattern                                         [9], [73]

The above systems mostly apply simple statistical analysis [8], [11], [46]–[48] or clustering techniques [49]. However, patterns might become meaningless numbers unless they are interpreted as insights or used to empower novel applications, e.g., to motivate design ideas. As such, several systems [8], [48] propose interactive visual interfaces for users to browse the visualization database and find example visualizations. Other approaches leverage mined design patterns to recommend visual designs [8], [49].
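The statistical flavor of design mining reduces to frequency counting over a crawled corpus, as the illustrative sketch below shows; the corpus records are made up, not taken from the cited crawls.

```python
from collections import Counter

# Sketch of simple statistical design mining, in the spirit of Beagle [46]
# and Hoque and Agrawala [48]: count chart types and SVG marks in a corpus.
corpus = [
    {"type": "bar", "marks": ["rect", "text"]},
    {"type": "line", "marks": ["path", "circle", "text"]},
    {"type": "bar", "marks": ["rect"]},
]

type_freq = Counter(v["type"] for v in corpus)
mark_freq = Counter(m for v in corpus for m in v["marks"])
print(type_freq.most_common())   # which chart types dominate the crawl
print(mark_freq.most_common(3))  # which marks are used most frequently
```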
Mining Data Patterns.
Another line of research aims to explore the data patterns encoded in visualization ensembles. This concept has been widely implemented in exploring visualization ensembles with respect to a specified type of chart (e.g., ScagExplorer [143] and TimeSeer [144]). We identify two approaches that are irrespective of chart type, namely Chart Constellations [73] and ChartSeer [9]. Both adopt a visual analytics approach, projecting charts into a 2D space to support clustering and interactive analysis. Chart Constellations [73] defines several cost functions for computing the distances between charts, while ChartSeer [9] uses features learned via deep learning models and improves the performance.
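A sketch of this chart-type-agnostic projection-and-clustering workflow is given below, with random vectors standing in for real chart embeddings (which [9] obtains from Vega-Lite specifications and [73] from hand-crafted features).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Embed each chart as a vector, project to 2D for an overview, and cluster.
embeddings = np.random.rand(50, 32)          # one vector per chart (stand-in)
xy = TSNE(n_components=2, perplexity=10).fit_transform(embeddings)
labels = KMeans(n_clusters=4, n_init=10).fit_predict(embeddings)
# xy drives the 2D overview; labels color the clusters for interactive analysis.
```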
There exist several promising directions for future work. For design patterns, existing approaches focus on visualization usage. It would be beneficial to mine underlying semantic patterns from visualization collections, e.g., the relationships between linked views in a multiple-view visualization, and the “good” or “bad” practices in existing visualization designs. Such guidelines could motivate the design of recommender systems that automate the creation of visualizations. For instance, many recommender systems in our survey rely on manual coding to derive the design space of visualizations (e.g., [2], [33], [39]). Automated mining of the design space would significantly reduce this manual effort. Most current mining techniques are limited to simple statistics. However, it is likely that there exist hidden patterns in visualization corpora. Therefore, one could explore visualization collections with advanced mining techniques and human-in-the-loop analytics to uncover those patterns. A research challenge would be to develop visual analytics systems for analyzing visualization collections.
FUTURE RESEARCH OPPORTUNITIES
Despite extensive research efforts, there remain ample research gaps and potential for future research. We have identified and discussed research opportunities regarding representation and each task in section 5 and section 6. In this section, we outline an organized overview of future research directions.
Our analysis reveals different content formats of visualization data that have been inconsistently adopted in the systems and techniques reviewed here (subsection 5.1). Such inconsistency impedes interoperability among different visualization systems. Being able to combine different systems and libraries is a common need in application settings. For instance, visualization generation tools such as VizML [32] recommend data encodings, while other systems like MobileVisFixer [2] adjust non-data encodings (visual styles). The two systems do not work well together since their formats are incompatible with each other. Notably, it is a common practice to select partial information from the full specifications as the intermediate format. This leads to the need for a common standard for visualizations that covers all existing partial formats, as well as derivative tools for auto-completing partial specifications to generate universally compatible formats.

Besides, since visualizations are naturally shared and stored in graphical formats, systems for visualization enhancement usually have to engage in reverse engineering to extract the underlying program. Despite extensive research efforts, reverse engineering remains computationally expensive and lacks robustness [53]. In particular, our analysis in subsection 6.1 suggests that it is currently impossible to perform reverse engineering on bespoke charts. Although recent work like Chartem [7] and VisCode [6] proposes new standards for storing programs in graphics, there is a long way to go before such standards are adopted and implemented in existing systems. Thus, continued research on reverse engineering is essential and beneficial to interoperability.
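One ingredient of such interoperability is mechanical: merging partial specifications. The sketch below deep-merges a data-encoding fragment with a style fragment into one Vega-Lite-like specification; the example specs are illustrative, not the actual output of the cited systems.

```python
def merge_specs(base, overlay):
    """Recursively merge two partial specifications; overlay wins on conflicts."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_specs(merged[key], value)
        else:
            merged[key] = value
    return merged

encoding_spec = {"mark": "bar",
                 "encoding": {"x": {"field": "month"}, "y": {"field": "sales"}}}
style_spec = {"width": 320,
              "encoding": {"x": {"axis": {"labelAngle": -45}}}}
print(merge_specs(encoding_spec, style_spec))
```

The hard part, of course, is not the merge itself but agreeing on a shared format in which such partial specifications can be expressed at all.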
Recent research has increasingly leveraged machine learning to generate or transform visualizations. However, it remains an open challenge to choose the “best” representation and machine learning models for visualizations. On the one hand, programs are compact and effective representations of visualizations that are computationally inexpensive [30], [32]. However, programs do not generalize well, since they are often limited to specific visualization types and might not apply to parameter values not observed during training. On the other hand, graphics appear to be a general representation. However, much research suggests that off-the-shelf models for natural images achieve unsatisfactory results on tasks such as visual perceptual learning [81], visual question answering [12], and assessment [77].

This gap is not surprising given the characteristics of visualization images. Compared with natural images, visualizations are particularly sensitive to local details, e.g., the sizes of graphical marks are of vital importance. Text information is also critical, taking on various roles (e.g., legends, axis labels) that have no tolerance for misinterpretation; e.g., missing a single character in an axis label leads to a serious blunder. These characteristics make it challenging to design a machine learning model tailored to visualizations. Recent work in visualization question answering [12], [82], [85] has proposed sophisticated models to capture visualization-specific features. However, such attempts are limited and their efficacy remains to be thoroughly evaluated.

Another important perspective lies in augmenting machine learning models with knowledge derived from empirical studies. Machine learning models are often criticized for poor explainability. Fortunately, empirical research in the visualization field has accumulated a valuable knowledge base about how visualizations should be interpreted, assessed, and created. It is therefore promising to incorporate that knowledge into ML models tailored to visualization research, as in Draco [30].
In this survey, we highlight the theme of considering visualizations as a data type. Moving forward, an evolving and promising theme is big visualization data, which concerns processing and analyzing visualization data at larger scales. This big-visualization-data perspective raises several issues that deserve research efforts. First, visualization database systems should effectively store, manage, and retrieve visualizations that are heterogeneous and often unstructured. Second, it would be beneficial to facilitate the sharing of visualizations to support their distribution and reuse. Sharing visualizations currently faces multiple problems, such as the dependency on the raw data table and an exponentially growing number of states under user interaction [65]. This underscores the research need for effectively sharing visualizations. Finally, it is unclear how to mine and analyze big visualization data, since existing approaches focus on small-scale visualization collections and mainly use simple statistics. These challenges demand a holistic adaptation of data mining techniques to visualization data, including but not limited to data cleaning, data transformation, data reduction, feature extraction, and analysis. For instance, when collecting visualization datasets by web crawling, it is important to clean the visualization collection by removing noise such as non-visualization images. Therefore, there are still many open research topics in analyzing big visualization data.
DISCUSSION AND LIMITATION
In this section, we discuss the limitations in terms of mutual exclusiveness and collective exhaustiveness, as well as generalizability.
In this survey, we use an inductive approach to organize the literature and construct our taxonomy, observing existing work and iteratively generalizing the classification. We note several dependencies among tasks. For instance, assessment and comparison metrics are often used as optimization functions for recommendation. However, those dependencies should not be interpreted as violations of mutual exclusiveness. Instead, dependencies suggest that tasks can be sequentially combined into a system pipeline for solving complex problems. Because we inductively collected the papers for this survey, we do not claim that our taxonomy is exhaustive, especially the task and goal taxonomy, as there exist many potential research questions that we have not yet observed. Therefore, it would be interesting to improve the taxonomy from a deductive perspective by referring to task taxonomies in related fields such as computer vision, artificial intelligence, and databases. For instance, image compression and style transfer are well-studied tasks in the computer vision community [28] that remain unexplored in the context of visualizations. That said, there exist promising directions for future research on applying artificial intelligence to visualizations.
In this survey, we make simplifying assumptions by focusing on charts and infographics, excluding scientific visualizations and work tailored to a specific type of visualization. Critically, an important concern regarding our taxonomy is how to generalize or extend it to a wider spectrum of visualization data. During our analysis, we found that most of the excluded work fits well into our proposed taxonomy. However, there exist a few exceptions that warrant future improvement. In the following text, we discuss notable extensions to our what-why-how taxonomy.
What.
Our what taxonomy primarily focuses on visualization data. However, visualizations are hardly considered standalone data in AI-empowered systems. Instead, it is often necessary to provide ground-truth labels for visualizations as auxiliary data, e.g., chart types [53]. Another type of auxiliary data is user-generated, i.e., interaction logs [145] and analysis provenance [15]. Future research should study the taxonomy of auxiliary data to better contextualize the opportunities for AI in visualization research.
Why.
The why taxonomy is organized along two axes: single or many visualizations versus inputting or outputting visualizations. A missing perspective lies in work that neither inputs nor outputs visualizations but instead exploits visualization data in an intermediate stage. For instance, Lallé et al. [146] collected eye-tracking data while users browsed visualizations and trained an ML model to predict the learning curves of visualizations. Our survey does not cover this kind of research due to our core theme of considering visualization as a data format, emphasizing how visualization data are processed and produced. Another perspective for improving our taxonomy is to expand the sub-categories by adding goals that are currently confined to a particular visualization type. For instance, several systems propose machine learning methods for brushing point-based visualizations [145], [147], which fall under the visualization enhancement category.
How.
Echoing the above discussion about “what”, there exist corresponding needs for identifying tasks for processing and analyzing auxiliary data. This is crucial since our current task taxonomy concerns visualization data only. Moreover, we notice another potential task for visualization data, namely visualization collection summarization, which aims to represent visualization collections in an effective and compact manner. However, existing approaches are limited and specific to visualization types, e.g., Scagnostics for scatterplots [143] and line charts [144]. In addition to those statistical summarization approaches, more recent visual analytics approaches use glyphs [9]. Future work could provide a systematic overview of visualization summarization as research efforts increase.
CONCLUSION
In this paper, we probe the concept of considering visualization as an emerging data format and investigate the advances in applying artificial intelligence to visualization data. We present a novel classification that enables readers to find relevant literature among a wide variety of research areas in computer science. Our classification can also help readers understand what techniques have been developed and find areas for future research. We hope that our survey can serve as a tutorial that helps stimulate new theories, problems, techniques, and applications.

ACKNOWLEDGMENTS
The authors would like to thank...

REFERENCES

[1] C. Qian, S. Sun, W. Cui, J.-G. Lou, H. Zhang, and D. Zhang, “Retrieve-then-adapt: Example-based automatic generation for proportion-related infographics,” IEEE Transactions on Visualization and Computer Graphics, 2020 (Early Access).
[2] A. Wu, W. Tong, T. Dwyer, B. Lee, P. Isenberg, and H. Qu, “MobileVisFixer: Tailoring web visualizations for mobile phones leveraging an explainable reinforcement learning framework,” IEEE Transactions on Visualization and Computer Graphics, 2020 (Early Access).
NDER REVIEW 17 [3] M. Oppermann, R. Kincaid, and T. Munzner, “Vizcommender: Comput-ing text-based similarity in visualization repositories for content-basedrecommendations,” IEEE Transactions on Visualization and ComputerGraphics, 2020 (Early Access).[4] D. Shi, X. Xu, F. Sun, Y. Shi, and N. Cao, “Calliope: Automaticvisual data story generation from a spreadsheet,” IEEE Transactionson Visualization and Computer Graphics, 2020 (Early Access).[5] Y. Kim and J. Heer, “Gemini: A grammar and recommender systemfor animated transitions in statistical graphics,” IEEE Transactions onVisualization and Computer Graphics, 2020 (Early Access).[6] P. Zhang, C. Li, and C. Wang, “Viscode: Embedding information invisualization images using encoder-decoder network,” IEEE Transactionson Visualization and Computer Graphics, 2020 (Early Access).[7] J. Fu, B. Zhu, W. Cui, S. Ge, Y. Wang, H. Zhang, H. Huang,Y. Tang, D. Zhang, and X. Ma, “Chartem: Reviving chart images withdata embedding,” IEEE Transactions on Visualization and ComputerGraphics, 2020 (Early Access).[8] X. Chen, W. Zeng, Y. Lin, H. M. Al-Maneea, J. Roberts, and R. Chang,“Composition and configuration patterns in multiple-view visualizations,”IEEE Transactions on Visualization and Computer Graphics, 2020(Early Access).[9] J. Zhao, M. Fan, and M. Feng, “Chartseer: Interactive steering exploratoryvisual analysis with machine intelligence,” IEEE Transactions onVisualization and Computer Graphics, 2020 (Early Access).[10] Z. Chen, M. Cafarella, and E. Adar, “Diagramflyer: A search enginefor data-driven diagrams,” in Proc. of the International Conference onWorld Wide Web (WWW), 2015, pp. 183–186.[11] S. Ray Choudhury and C. L. Giles, “An architecture for informationextraction from figures in digital libraries,” in Proc. of the InternationalConference on World Wide Web (WWW), 2015, pp. 667–672.[12] K. Kafle, B. Price, S. Cohen, and C. Kanan, “Dvqa: Understanding datavisualizations via question answering,” in Proc. of the IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR), 2018, pp. 5648–5656.[13] C. Chen, R. Zhang, E. Koh, S. Kim, S. Cohen, and R. Rossi, “Figurecaptioning with relation maps for reasoning,” in Proc. of the IEEE WinterConference on Applications of Computer Vision, 2020, pp. 1537–1545.[14] L. McNabb and R. S. Laramee, “Survey of surveys (sos)-mapping thelandscape of survey papers in information visualization,” in ComputerGraphics Forum, vol. 36, no. 3. Wiley Online Library, 2017, pp.589–617.[15] K. Xu, A. Ottley, C. Walchshofer, M. Streit, R. Chang, and J. Wen-skovitch, “Survey on the analysis of user interactions and visualizationprovenance,” Computer Graphics Forum, vol. 39, no. 3, 2020.[16] J. P. Ono, R. S. Hong, C. T. Silva, and J. Freire. (2018) Whyshould we teach machines to read charts made for humans? [Online].Available: https://vgc.poly.edu/ ∼ jhenrique/files/chi2019 workshop MLEvaluate Vis.pdf[17] B. Lee, K. Isaacs, D. A. Szafir, G. E. Marai, C. Turkay, M. Tory,S. Carpendale, and A. Endert, “Broadening intellectual diversity in visu-alization research papers,” IEEE Computer Graphics and Applications,vol. 39, no. 4, pp. 78–85, 2019.[18] “Visualization meets AI workshop,” https://visai2020.github.io/.[19] G. Andrienko, N. Andrienko, S. Drucker, J.-D. Fekete, D. Fisher,S. Idreos, T. Kraska, G. Li, K.-L. Ma, J. Mackinlay et al., “Big datavisualization and analytics: Future research challenges and emerging ap-plications,” in BigVis 2020: Big Data Visual Exploration and Analytics,2020.[20] B. Saket, D. Moritz, H. Lin, V. 
Dibia, C. Demiralp, and J. Heer,“Beyond heuristics: Learning visualization design,” arXiv preprintarXiv:1807.06641, 2018.[21] S. Zhu, G. Sun, Q. Jiang, M. Zha, and R. Liang, “A survey on automaticinfographics and visualization recommendations,” Visual Informatics,vol. 4, no. 3, pp. 24–40, 2020.[22] X. Qin, Y. Luo, N. Tang, and G. Li, “Making data visualization moreefficient and effective: A survey,” The VLDB Journal, vol. 29, no. 1, pp.93–117, 2020.[23] K. Davila, S. Setlur, D. Doermann, U. K. Bhargava, and V. Govindaraju,“Chart mining: A survey of methods for automated chart analysis,” IEEETransactions on Pattern Analysis and Machine Intelligence, 2020 (EarlyAccess).[24] Q. Wang, Z. Chen, Y. Wang, and H. Qu, “Applying machine learningadvances to data visualization: A survey on ml4vis,” arXiv preprintarXiv:2012.00467, 2020.[25] J. Kehrer and H. Hauser, “Visualization and visual analysis of multi-faceted scientific data: A survey,” IEEE Transactions on Visualizationand Computer Graphics, vol. 19, no. 3, pp. 495–513, 2012. [26] M. Behrisch, M. Blumenschein, N. W. Kim, L. Shao, M. El-Assady,J. Fuchs, D. Seebacher, A. Diehl, U. Brandes, H. Pfister et al., “Qualitymetrics for information visualization,” in Computer Graphics Forum,vol. 37, no. 3. Wiley Online Library, 2018, pp. 625–662.[27] A. Srinivasan and J. Stasko, “Natural language interfaces for data analysiswith visualization: Considering what has and could be asked,” in Proc.of the Eurographics/IEEE VGTC Conference on Visualization: ShortPapers, 2017, pp. 55–59.[28] Computer vision — paper with codes. [Online]. Available:https://paperswithcode.com/area/computer-vision[29] J. Mackinlay, “Automating the design of graphical presentations ofrelational information,” ACM Transactions on Graphics, vol. 5, no. 2,pp. 110–141, 1986.[30] D. Moritz, C. Wang, G. L. Nelson, H. Lin, A. M. Smith, B. Howe, andJ. Heer, “Formalizing visualization design knowledge as constraints:Actionable and extensible models in draco,” IEEE Transactions onVisualization and Computer Graphics, vol. 25, no. 1, pp. 438–448, 2018.[31] Y. Luo, X. Qin, N. Tang, and G. Li, “Deepeye: Towards automaticdata visualization,” in Proc. of the International Conference on DataEngineering (ICDE). IEEE, 2018, pp. 101–112.[32] K. Hu, M. A. Bakker, S. Li, T. Kraska, and C. Hidalgo, “Vizml: Amachine learning approach to visualization recommendation,” in Proc. ofthe Conference on Human Factors in Computing Systems (CHI). ACM,2019, pp. 1–12.[33] Y. Wang, Z. Sun, H. Zhang, W. Cui, K. Xu, X. Ma, and D. Zhang,“Datashot: Automatic generation of fact sheets from tabular data,” IEEETransactions on Visualization and Computer Graphics, vol. 26, no. 1,pp. 895–905, 2019.[34] M. Vartak, S. Rahman, S. Madden, A. Parameswaran, and N. Polyzotis,“Seedb: Efficient data-driven visualization recommendations to supportvisual analytics,” in Proc. of the VLDB Endowment InternationalConference on Very Large Data Bases, vol. 8, no. 13. NIH PublicAccess, 2015, p. 2182.[35] R. Mafrur, M. A. Sharaf, and H. A. Khan, “Dive: diversifying viewrecommendation for visual data exploration,” in Proc. of the ACMInternational Conference on Information and Knowledge Management(CIKM). ACM, 2018, pp. 1123–1132.[36] H. Lin, D. Moritz, and J. Heer, “Dziban: Balancing agency & automationin visualization design via anchored recommendations,” in Proc. ofthe ACM Conference on Human Factors in Computing Systems (CHI),2020, pp. 1–12.[37] J. Harper and M. 
Agrawala, “Converting basic d3 charts into reusablestyle templates,” IEEE Transactions on Visualization and ComputerGraphics, vol. 24, no. 3, pp. 1274–1286, 2017.[38] C. Wang, Y. Feng, R. Bodik, A. Cheung, and I. Dillig, “Visualization byexample,” Proc. of the ACM on Programming Languages, vol. 4, no. 49,pp. 1–28, 2019.[39] W. Cui, X. Zhang, Y. Wang, H. Huang, B. Chen, L. Fang, H. Zhang, J.-G.Lou, and D. Zhang, “Text-to-viz: Automatic generation of infographicsfrom proportion-related natural language statements,” IEEE Transactionson Visualization and Computer Graphics, vol. 26, no. 1, pp. 906–916,2019.[40] A. Y. Lin, J. Ford, E. Adar, and B. Hecht, “Vizbywiki: Mining datavisualizations from the web to enrich news articles,” in Proc. of theWorld Wide Web Conference (WWW), 2018, pp. 873–882.[41] N. Kong and M. Agrawala, “Graphical overlays: Using layered elementsto aid chart reading,” IEEE Transactions on Visualization and ComputerGraphics, vol. 18, no. 12, pp. 2631–2638, 2012.[42] M. Lu, J. Liang, Y. Zhang, G. Li, S. Chen, Z. Li, and X. Yuan,“Interaction+: Interaction enhancement for web-based visualizations,” inProc. of the IEEE Pacific Visualization Symposium (PacificVis). IEEE,2017, pp. 61–70.[43] C. Lai, Z. Lin, R. Jiang, Y. Han, C. Liu, and X. Yuan, “Automaticannotation synchronizing with textual description for visualization,” inProc. of the ACM Conference on Human Factors in Computing Systems(CHI), 2020, pp. 1–13.[44] J. Choi, S. Jung, D. G. Park, J. Choo, and N. Elmqvist, “Visualizing forthe non-visual: Enabling the visually impaired to use visualization,” inComputer Graphics Forum, vol. 38, no. 3. Wiley Online Library, 2019,pp. 249–260.[45] B. Saleh, M. Dontcheva, A. Hertzmann, and Z. Liu, “Learning stylesimilarity for searching infographics,” in Proc. of the Graphics InterfaceConference, 2015, pp. 59–64.[46] L. Battle, P. Duan, Z. Miranda, D. Mukusheva, R. Chang, andM. Stonebraker, “Beagle: Automated extraction and interpretation ofvisualizations from the web,” in Proc. of the ACM Conference on HumanFactors in Computing Systems (CHI), 2018, pp. 1–8. NDER REVIEW 18 [47] P.-s. Lee, J. D. West, and B. Howe, “Viziometrics: Analyzing visualinformation in the scientific literature,” IEEE Transactions on Big Data,vol. 4, no. 1, pp. 117–129, 2017.[48] E. Hoque and M. Agrawala, “Searching the visual style and structure ofd3 visualizations,” IEEE Transactions on Visualization and ComputerGraphics, vol. 26, no. 1, pp. 1236–1245, 2019.[49] S. Smart, K. Wu, and D. A. Szafir, “Color crafting: Automating theconstruction of designer quality color ramps,” IEEE Transactions onVisualization and Computer Graphics, vol. 26, no. 1, pp. 1215–1225,2020.[50] W. Dai, M. Wang, Z. Niu, and J. Zhang, “Chart decoder: Generatingtextual and numeric information from chart images automatically,”Journal of Visual Languages & Computing, vol. 48, pp. 101–109, 2018.[51] Y. Luo, C. Chai, X. Qin, N. Tang, and G. Li, “Interactive cleaning forprogressive visualization through composite questions,” in Proc. of theIEEE International Conference on Data Engineering (ICDE). IEEE,2020, pp. 733–744.[52] A. Satyanarayan, B. Lee, D. Ren, J. Heer, J. Stasko, J. Thompson,M. Brehmer, and Z. Liu, “Critical reflections on visualization authoringsystems,” IEEE Transactions on Visualization and Computer Graphics,vol. 26, no. 1, pp. 461–471, 2019.[53] J. Poco and J. Heer, “Reverse-engineering visualizations: Recoveringvisual encodings from chart images,” in Computer Graphics Forum,vol. 36, no. 3. Wiley Online Library, 2017, pp. 
353–363.[54] M. Savva, N. Kong, A. Chhajta, L. Fei-Fei, M. Agrawala, and J. Heer,“Revision: Automated classification, analysis and redesign of chartimages,” in Proc. of the Annual ACM Symposium on User InterfaceSoftware and Technology (UIST), 2011, pp. 393–402.[55] D. Moritz, “Text detection in screen images with a convolutional neuralnetwork,” The Journal of Open Source Software, vol. 2, no. 15, p. 235,Jul. 2017. [Online]. Available: https://doi.org/10.21105/joss.00235[56] J. Harper and M. Agrawala, “Deconstructing and restyling d3 visualiza-tions,” in Proc. of the ACM Symposium on User Interface Software andTechnology (UIST). ACM, 2014, pp. 253–262.[57] M. Bostock, V. Ogievetsky, and J. Heer, “D data-driven documents,”IEEE Transactions on Visualization and Computer Graphics, vol. 17,no. 12, pp. 2301–2309, 2011.[58] S. Haiduc, J. Aponte, and A. Marcus, “Supporting program comprehen-sion with source code summarization,” in the International Conferenceon Software Engineering, vol. 2. IEEE, 2010, pp. 223–226.[59] F. Bolte and S. Bruckner, “Vis-a-vis: Visual exploration of visualiza-tion source code evolution,” IEEE Transactions on Visualization andComputer Graphics, 2020 (Early Access).[60] A. Satyanarayan, R. Russell, J. Hoffswell, and J. Heer, “Reactive vega: Astreaming dataflow architecture for declarative interactive visualization,”IEEE Transactions on Visualization and Computer Graphics, vol. 22,no. 1, pp. 659–668, 2015.[61] A. Satyanarayan, D. Moritz, K. Wongsuphasawat, and J. Heer, “Vega-lite:A grammar of interactive graphics,” IEEE Transactions on Visualizationand Computer Graphics, vol. 23, no. 1, pp. 341–350, 2016.[62] P. Buneman, “Semistructured data,” in Proc. of theSIGACT-SIGMOD-SIGART Symposium on Principles of DatabaseSystems. ACM, 1997, pp. 117–121.[63] M. A. Borkin, A. A. Vo, Z. Bylinskii, P. Isola, S. Sunkavalli, A. Oliva, andH. Pfister, “What makes a visualization memorable?” IEEE Transactionson Visualization and Computer Graphics, vol. 19, no. 12, pp. 2306–2315,2013.[64] K. Hu, S. Gaikwad, M. Hulsebos, M. A. Bakker, E. Zgraggen, C. Hidalgo,T. Kraska, G. Li, A. Satyanarayan, and C¸ . Demiralp, “Viznet: Towardsa large-scale visualization learning and benchmarking repository,” inProc. of the ACM Conference on Human Factors in Computing Systems(CHI), 2019, pp. 1–12.[65] M. Raji, J. Duncan, T. Hobson, and J. Huang, “Dataless sharingof interactive visualization,” IEEE Transactions on Visualization andComputer Graphics, 2020 (Early Access).[66] S. R. Choudhury, S. Wang, and C. L. Giles, “Scalable algorithms forscholarly figure mining and semantics,” in Proc. of the InternationalWorkshop on Semantic Big Data, 2016, p. 1.[67] W. Huang and C. L. Tan, “A system for understanding imagedinfographics and its applications,” in Proc. of the ACM Symposiumon Document Engineering, 2007, pp. 9–18.[68] R. Burns, S. Carberry, S. Elzer, and D. Chester, “Automaticallyrecognizing intended messages in grouped bar charts,” in Proc. ofthe International Conference on Theory and Application of Diagrams.Springer, 2012, pp. 8–22.[69] E. Kim and K. F. McCoy, “Multimodal deep learning using im-ages and text for information graphic classification,” in Proc. of the ACM International Conference on Computers and Accessibility(SIGACCESS), 2018, pp. 143–148.[70] Y. Luo, X. Qin, C. Chai, N. Tang, G. Li, and W. Li, “Steerable self-driving data visualization,” IEEE Transactions on Knowledge and DataEngineering, 2020 (Early Access).[71] A. Key, B. Howe, D. Perry, and C. 
[72] Y. Luo, X. Qin, N. Tang, G. Li, and X. Wang, “Deepeye: Creating good data visualizations by keyword search,” in Proc. of the International Conference on Management of Data (SIGMOD). ACM, 2018, pp. 1733–1736.
[73] S. Xu, C. Bryan, J. K. Li, J. Zhao, and K.-L. Ma, “Chart constellations: Effective chart summarization for collaborative and multi-user analyses,” in Computer Graphics Forum, vol. 37, no. 3. Wiley Online Library, 2018, pp. 75–86.
[74] Z. Bylinskii, N. W. Kim, P. O’Donovan, S. Alsheikh, S. Madan, H. Pfister, F. Durand, B. Russell, and A. Hertzmann, “Learning visual importance for graphic designs and data visualizations,” in Proc. of the ACM Symposium on User Interface Software and Technology (UIST), 2017, pp. 57–69.
[75] R. Ma, H. Mei, H. Guan, W. Huang, F. Zhang, C. Xin, W. Dai, X. Wen, and W. Chen, “Ladv: Deep learning assisted authoring of dashboard visualizations from images and sketches,” IEEE Transactions on Visualization and Computer Graphics, 2020 (Early Access).
[76] N. Siegel, Z. Horvitz, R. Levin, S. Divvala, and A. Farhadi, “Figureseer: Parsing result-figures in research papers,” in Proc. of the European Conference on Computer Vision (ECCV). Springer, 2016, pp. 664–680.
[77] X. Fu, Y. Wang, H. Dong, W. Cui, and H. Zhang, “Visualization assessment: A machine learning approach,” in Proc. of the IEEE Visualization Conference (VIS). IEEE, 2019, pp. 126–130.
[78] P. Chagas, R. Akiyama, A. Meiguins, C. Santos, F. Saraiva, B. Meiguins, and J. Morais, “Evaluation of convolutional neural network architectures for chart image classification,” in Proc. of the International Joint Conference on Neural Networks (IJCNN). IEEE, 2018, pp. 1–8.
[79] S. Tsutsui and D. J. Crandall, “A data driven approach for compound figure separation using convolutional neural networks,” in Proc. of the International Conference on Document Analysis and Recognition (ICDAR), vol. 1. IEEE, 2017, pp. 533–540.
[80] B. Tang, X. Liu, J. Lei, M. Song, D. Tao, S. Sun, and F. Dong, “Deepchart: Combining deep convolutional networks and deep belief networks in chart classification,” Signal Processing, vol. 124, pp. 156–161, 2016.
[81] D. Haehn, J. Tompkin, and H. Pfister, “Evaluating ‘graphical perception’ with CNNs,” IEEE Transactions on Visualization and Computer Graphics, vol. 25, no. 1, pp. 641–650, 2018.
[82] R. Chaudhry, S. Shekhar, U. Gupta, P. Maneriker, P. Bansal, and A. Joshi, “Leaf-qa: Locate, encode & attend for figure question answering,” in Proc. of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 3512–3521.
[83] C. Chen, R. Zhang, S. Kim, S. Cohen, T. Yu, R. Rossi, and R. Bunescu, “Neural caption generation over figures,” in Adjunct Proc. of the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp) and Proc. of the 2019 ACM International Symposium on Wearable Computers (ISWC), 2019, pp. 482–485.
[84] S. E. Kahou, V. Michalski, A. Atkinson, Á. Kádár, A. Trischler, and Y. Bengio, “Figureqa: An annotated figure dataset for visual reasoning,” arXiv preprint arXiv:1710.07300, 2017.
[85] N. Methani, P. Ganguly, M. M. Khapra, and P. Kumar, “Plotqa: Reasoning over scientific plots,” in Proc. of the IEEE Winter Conference on Applications of Computer Vision (WACV), 2020, pp. 1527–1536.
[86] R. Reddy, R. Ramesh, A. Deshpande, and M. M. Khapra, “Figurenet: A deep learning model for question-answering on scientific plots,” in Proc. of the International Joint Conference on Neural Networks (IJCNN). IEEE, 2019, pp. 1–8.
[87] H. Singh and S. Shekhar, “Stl-cqa: Structure-based transformers with localization and encoding for chart question answering,” in Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 3275–3284.
[88] J. Obeid and E. Hoque, “Chart-to-text: Generating natural language descriptions for charts by adapting the transformer model,” in Proc. of the International Conference on Natural Language Generation, 2020, pp. 138–147.
[89] V. Dibia and Ç. Demiralp, “Data2vis: Automatic generation of data visualizations using sequence-to-sequence recurrent neural networks,” IEEE Computer Graphics and Applications, vol. 39, no. 5, pp. 33–46, 2019.
[90] V. Lifschitz, “Action languages, answer sets, and planning,” in The Logic Programming Paradigm. Springer, 1999, pp. 357–373.
[91] H. Ehsan, M. A. Sharaf, and P. K. Chrysanthis, “Efficient recommendation of aggregate data visualizations,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 2, pp. 263–277, 2017.
[92] ——, “Muve: Efficient multi-objective view recommendation for visual data exploration,” in Proc. of the International Conference on Data Engineering (ICDE). IEEE, 2016, pp. 731–742.
[93] T. Siddiqui, A. Kim, J. Lee, K. Karahalios, and A. Parameswaran, “Effortless data exploration with zenvisage: an expressive and interactive visual analytics system,” Proc. of the VLDB Endowment, vol. 10, no. 4, pp. 126–130, 2016.
[94] E. Wu, F. Psallidas, Z. Miao, H. Zhang, L. Rettig, Y. Wu, and T. Sellam, “Combining design and performance in a data visualization management system,” in Proc. of the Biennial Conference on Innovative Data Systems Research, 2017.
[95] K. Wongsuphasawat, D. Moritz, A. Anand, J. Mackinlay, B. Howe, and J. Heer, “Towards a general-purpose query language for visualization recommendation,” in Proc. of the Workshop on Human-In-the-Loop Data Analytics, 2016, pp. 1–6.
[96] ——, “Voyager: Exploratory analysis via faceted browsing of visualization recommendations,” IEEE Transactions on Visualization and Computer Graphics, vol. 22, no. 1, pp. 649–658, 2015.
[97] K. Wongsuphasawat, Z. Qu, D. Moritz, R. Chang, F. Ouk, A. Anand, J. Mackinlay, B. Howe, and J. Heer, “Voyager 2: Augmenting visual analysis with partial view specifications,” in Proc. of the ACM Conference on Human Factors in Computing Systems (CHI), 2017, pp. 2648–2659.
[98] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013.
[99] D. H. Kim, E. Hoque, and M. Agrawala, “Answering questions about charts and generating visual explanations,” in Proc. of the ACM Conference on Human Factors in Computing Systems (CHI), 2020, pp. 1–13.
[100] J. Brosz, M. A. Nacenta, R. Pusch, S. Carpendale, and C. Hurter, “Transmogrification: Causal manipulation of visualizations,” in Proc. of the ACM Symposium on User Interface Software and Technology (UIST), 2013, pp. 97–106.
[101] J. E. Zhang, N. Sultanum, A. Bezerianos, and F. Chevalier, “Dataquilt: Extracting visual elements from images to craft pictorial visualizations,” in Proc. of the Conference on Human Factors in Computing Systems (CHI). ACM, 2020, pp. 1–13.
[102] Y. Zhou and C. L. Tan, “Learning-based scientific chart recognition,” in Proc. of the IAPR International Workshop on Graphics Recognition. Citeseer, 2001, pp. 482–492.
[103] Q. Wang, Z. Li, S. Fu, W. Cui, and H. Qu, “Narvis: Authoring narrative slideshows for introducing data visualization designs,” IEEE Transactions on Visualization and Computer Graphics, vol. 25, no. 1, pp. 779–788, 2018.
[104] W. Huang, R. Liu, and C.-L. Tan, “Extraction of vectorized graphical information from scientific chart images,” in Proc. of the International Conference on Document Analysis and Recognition (ICDAR), vol. 1. IEEE, 2007, pp. 521–525.
[105] J. Poco, A. Mayhua, and J. Heer, “Extracting and retargeting color mappings from bitmap images of visualizations,” IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 1, pp. 637–646, 2017.
[106] W. Browuer, S. Kataria, S. Das, P. Mitra, and C. L. Giles, “Segregating and extracting overlapping data points in two-dimensional plots,” in Proc. of the ACM/IEEE-CS Joint Conference on Digital Libraries, 2008, pp. 276–279.
[107] S. Ray Choudhury, S. Wang, and C. L. Giles, “Curve separation for line graphs in scholarly documents,” in Proc. of the ACM/IEEE-CS Joint Conference on Digital Libraries, 2016, pp. 277–278.
[108] C. Bryan, K.-L. Ma, and J. Woodring, “Temporal summary images: An approach to narrative visualization via interactive annotation generation and placement,” IEEE Transactions on Visualization and Computer Graphics, vol. 23, no. 1, pp. 511–520, 2016.
[109] R. Savvides, A. Henelius, E. Oikarinen, and K. Puolamäki, “Significance of patterns in data visualisations,” in Proc. of the ACM International Conference on Knowledge Discovery & Data Mining (KDD), 2019, pp. 1509–1517.
[110] D. J.-L. Lee, H. Dev, H. Hu, H. Elmeleegy, and A. Parameswaran, “Avoiding drill-down fallacies with vispilot: assisted exploration of data subsets,” in Proc. of the International Conference on Intelligent User Interfaces (IUI), 2019, pp. 186–196.
[111] J. Gao, Y. Zhou, and K. E. Barner, “View: Visual information extraction widget for improving chart images accessibility,” in Proc. of the International Conference on Image Processing. IEEE, 2012, pp. 2865–2868.
[112] Z. Chen, Y. Wang, Q. Wang, Y. Wang, and H. Qu, “Towards automated infographic design: Deep learning-based auto-extraction of extensible timeline,” IEEE Transactions on Visualization and Computer Graphics, vol. 26, no. 1, pp. 917–926, 2019.
[113] R. A. Al-Zaidy and C. L. Giles, “A machine learning approach for semantic structuring of scientific charts in scholarly documents,” in Proc. of the Conference on Artificial Intelligence (AAAI), 2017, pp. 4644–4649.
[114] V. S. N. Prasad, B. Siddiquie, J. Golbeck, and L. S. Davis, “Classifying computer generated charts,” in Proc. of the International Workshop on Content-Based Multimedia Indexing. IEEE, 2007, pp. 85–92.
[115] R. A. Al-Zaidy, S. R. Choudhury, and C. L. Giles, “Automatic summary generation for scientific data charts,” in Proc. of the Conference on Artificial Intelligence (AAAI). AI Access Foundation, 2016, pp. 658–663.
[116] G. G. Méndez, M. A. Nacenta, and S. Vandenheste, “ivolver: Interactive visual language for visualization extraction and reconstruction,” in Proc. of the ACM Conference on Human Factors in Computing Systems (CHI), 2016, pp. 4073–4085.
[117] D. Jung, W. Kim, H. Song, J.-i. Hwang, B. Lee, B. Kim, and J. Seo, “Chartsense: Interactive data extraction from chart images,” in Proc. of the ACM Conference on Human Factors in Computing Systems (CHI), 2017, pp. 6706–6717.
[118] D. Ren, B. Lee, and M. Brehmer, “Charticulator: Interactive construction of bespoke chart layouts,” IEEE Transactions on Visualization and Computer Graphics, vol. 25, no. 1, pp. 789–799, 2018.
[119] D. McCandless, Information is Beautiful. Scotland, UK: Collins, 2009.
[120] S. Kandel, R. Parikh, A. Paepcke, J. M. Hellerstein, and J. Heer, “Profiler: Integrated statistical analysis and visualization for data quality assessment,” in Proc. of the International Working Conference on Advanced Visual Interfaces (AVI), 2012, pp. 547–554.
[121] P.-M. Law, R. C. Basole, and Y. Wu, “Duet: Helping data analysis novices conduct pairwise comparisons by minimal specification,” IEEE Transactions on Visualization and Computer Graphics, vol. 25, no. 1, pp. 427–437, 2018.
[122] Y. Kim, K. Wongsuphasawat, J. Hullman, and J. Heer, “Graphscape: A model for automated reasoning about visualization similarity and sequencing,” in Proc. of the ACM Conference on Human Factors in Computing Systems (CHI), 2017, pp. 2628–2638.
[123] Y. Ma, A. K. Tung, W. Wang, X. Gao, Z. Pan, and W. Chen, “Scatternet: A deep subjective similarity model for visual analysis of scatterplots,” IEEE Transactions on Visualization and Computer Graphics, vol. 26, no. 3, pp. 1562–1576, 2018.
[124] L. Cerulo and G. Canfora, “A taxonomy of information retrieval models and tools,” Journal of Computing and Information Technology, vol. 12, no. 3, pp. 175–194, 2004.
[125] A. Srinivasan, S. M. Drucker, A. Endert, and J. Stasko, “Augmenting visualizations with interactive data facts to facilitate interpretation and communication,” IEEE Transactions on Visualization and Computer Graphics, vol. 25, no. 1, pp. 672–681, 2018.
[126] Z. Li, S. Carberry, H. Fang, K. F. McCoy, and K. Peterson, “Infographics retrieval: A new methodology,” in Proc. of the International Conference on Applications of Natural Language to Data Bases/Information Systems. Springer, 2014, pp. 101–113.
[127] Z. Li, S. Carberry, H. Fang, K. F. McCoy, K. Peterson, and M. Stagitis, “A novel methodology for retrieving infographics utilizing structure and message content,” Data & Knowledge Engineering, vol. 100, pp. 191–210, 2015.
[128] M. Wattenberg, “Sketching a graph to query a time-series database,” in Proc. of the Extended Abstracts on Human Factors in Computing Systems, 2001, pp. 381–382.
[129] C. Fan, K. Matković, and H. Hauser, “Sketch-based fast and accurate querying of time series using parameter-sharing LSTM networks,” IEEE Transactions on Visualization and Computer Graphics, 2020 (Early Access).
[130] V. O. Mittal, J. D. Moore, G. Carenini, and S. Roth, “Describing complex charts in natural language: A caption generation system,” Computational Linguistics, vol. 24, no. 3, pp. 431–467, 1998.
[131] Z. Cui, S. K. Badam, M. A. Yalçın, and N. Elmqvist, “Datasite: Proactive visual data exploration with computation of insight-based recommendations,” Information Visualization, vol. 18, no. 2, pp. 251–267, 2019.
[132] S. Demir, S. Carberry, and K. F. McCoy, “Summarizing information graphics textually,” Computational Linguistics, vol. 38, no. 3, pp. 527–574, 2012.